
The tree (3D) ←→ Camera (2D)
The tree: the point we see with our eyes, a 3D point with world coordinates.
Camera: when we photograph the tree, the 3D point is mapped to a 2D image point.

3D Projection Geometry:
- The camera is at point C (camera center).
- A 3D point in space is projected through the pinhole onto the image plane.
- The image plane is drawn in front of the camera center in this drawing, a convention that avoids the flipped image a real pinhole produces.

2D Side View:
- Shows that a point at depth Z gets projected a distance f·X/Z (or f·Y/Z) from the center of the image plane, where f is the focal length.
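The side-view relation above can be sketched in a few lines of Python. This is a minimal illustration of an ideal pinhole only; the function name `project_side_view` and the numeric values are hypothetical, and the point is assumed to be given in the camera's own coordinate frame.

```python
def project_side_view(X, Y, Z, f):
    """Offset of the projected point from the image-plane center (ideal pinhole)."""
    if Z <= 0:
        raise ValueError("the point must be in front of the camera (Z > 0)")
    # Similar triangles: offset / f = X / Z, and likewise for Y
    return f * X / Z, f * Y / Z


# Hypothetical example: f = 800, point at X = 2, Y = 1, depth Z = 4
u, v = project_side_view(2.0, 1.0, 4.0, f=800.0)
print(u, v)  # 400.0 200.0
```

Doubling the depth Z halves both offsets, which is why distant objects appear smaller in the image.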
When we take a picture of the tree, the result is influenced by the camera's intrinsic parameters: the lens, the distance between the lens and the image sensor, the angle between the lens and the image sensor, and so on.
Therefore, when finding where 3D points are projected on the image or, conversely, restoring 3D coordinates from image coordinates, the effect of these internal factors must be removed to get an accurate result.
The process of finding the values of these internal parameters is called camera calibration.
- X, Y, Z: 3D coordinates of the point in the world
- [R|t]: extrinsic parameters (rotation matrix R, translation vector t), related to the outside world, such as the height of the camera and its orientation (pan, tilt)
- A: intrinsic camera matrix, the internal parameters of the camera itself, such as the focal length, aspect ratio, and principal point
- A[R|t]: camera matrix (projection matrix)
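The symbols listed above fit together in the standard pinhole projection equation. The sketch below writes it out; the scale factor s and the pixel coordinates (x, y) are notation assumed here rather than taken from the list.

```latex
s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
= A \, [R \mid t]
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
```

When the last row of A is (0, 0, 1), s is the depth of the point in the camera frame, and dividing by it yields the pixel position.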

- Camera intrinsic parameters
Focal length, Principal point, Skew coefficient.
- Focal length
The distance between the lens center and the image sensor, expressed in pixels.
- Principal point
The point where the perpendicular from the lens center meets the image sensor, given in pixel coordinates (Cx, Cy).

In the ideal case, Cx = width/2, Cy = height/2.

Ideally, the principal point coincides with the image center, but it can be offset by imperfections in the camera manufacturing process.
- Skew coefficient
How far the y-axis of the sensor's pixel array is tilted away from being perpendicular to the x-axis.
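One common way to collect the three intrinsic parameters into a single matrix (called A or K in this post) is sketched below. The separate focal lengths f_x and f_y, the principal point (c_x, c_y), and the symbol γ for the skew term are notation assumed for this illustration.

```latex
A = \begin{bmatrix}
f_x & \gamma & c_x \\
0   & f_y    & c_y \\
0   & 0      & 1
\end{bmatrix}
```

With square pixels (f_x = f_y = f) and no skew (γ = 0), this reduces to the simpler matrix with a single focal length and a principal point.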
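- Camera Projection (3D to 2D)
1. 3D point in camera coordinates: (X, Y, Z), or (X, Y, Z, 1) in homogeneous form.
2. Projection matrix P = K[I|0], made of two parts:
a. Intrinsic matrix K:
This matrix maps normalized camera coordinates to pixel coordinates:
K = [f 0 p_x; 0 f p_y; 0 0 1]
- f: focal length (in pixels)
- p_x, p_y: principal point (usually near the image center)
b. Augmented identity matrix [I|0]:
This just extracts (X, Y, Z) from the homogeneous 4D vector (X, Y, Z, 1).
3. Multiply it all together:
K[I|0](X, Y, Z, 1)ᵀ = (f·X + p_x·Z, f·Y + p_y·Z, Z)ᵀ
4. Final summary: dividing by the third component Z gives the pixel coordinates
x = f·X/Z + p_x, y = f·Y/Z + p_y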
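The projection steps above can be checked numerically with NumPy. This is a minimal sketch assuming zero skew and a single focal length; the values of `f`, `p_x`, `p_y`, and the test point are hypothetical.

```python
import numpy as np

# Hypothetical intrinsics, in pixels
f, p_x, p_y = 800.0, 320.0, 240.0

K = np.array([[f,   0.0, p_x],
              [0.0, f,   p_y],
              [0.0, 0.0, 1.0]])

# [I | 0]: 3x4 matrix that drops the trailing 1 of a homogeneous point
I0 = np.hstack([np.eye(3), np.zeros((3, 1))])

# Homogeneous point (X, Y, Z, 1) in camera coordinates
X_cam = np.array([2.0, 1.0, 4.0, 1.0])

# Homogeneous image point: (f*X + p_x*Z, f*Y + p_y*Z, Z)
x_h = K @ I0 @ X_cam

# Divide by the third component (the depth Z) to get pixel coordinates
x_pix = x_h[:2] / x_h[2]
print(x_pix)  # values: 720 and 440
```

The result matches f·X/Z + p_x = 800·2/4 + 320 and f·Y/Z + p_y = 800·1/4 + 240.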
- Camera extrinsic parameters
Rotation, Translation
3D←→2D
These are not properties of the camera itself: they depend on where and in which direction the camera is installed, and on how the world coordinate system is defined.
- World Coordinates to Camera Coordinates (Extrinsics)
This is a coordinate transformation from the world coordinate system to the camera coordinate system.
- X: point in the world coordinate system
- C: position of the camera center in the world coordinate system
- X − C: translation that moves the world origin to the camera center
- R: rotation matrix that aligns the world coordinate axes with the camera's coordinate axes
- t = −RC: the camera position expressed as a translation
So the transformation matrix becomes [R ∣ t], and a world point X maps to X_cam = R(X − C) = RX + t.
- Why Apply Rotation After Translation?
- Translation affects origin (moves things)
- Rotation affects direction (reorients the frame)
We first move the origin to the camera center (so the camera becomes the origin), and then rotate so that the coordinate axes line up with the camera's axes.
How?
1. We're transforming a 3D point using:
R: 3×3 rotation matrix
t: 3×1 translation vector
X: 3×1 world coordinate
2. Let's turn X into homogeneous coordinates by appending a 1: (X, 1)ᵀ, a 4×1 vector.
3. Augmented matrix form: we combine R and t into a single 3×4 matrix [R ∣ t].
Then do the multiplication: [R ∣ t](X, 1)ᵀ = RX + t
4. Final equation: X_cam = RX + t = R(X − C), since t = −RC
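The world-to-camera steps above can be verified numerically. The sketch below assumes a hypothetical pose (a 90 degree rotation about the world z-axis and an arbitrary camera center) and checks that translating then rotating, rotating then adding t, and multiplying by the 3×4 matrix [R ∣ t] all give the same camera-frame point.

```python
import numpy as np

# Hypothetical world-to-camera rotation: 90 degrees about the z-axis
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

C = np.array([1.0, 2.0, 3.0])   # camera center in world coordinates (assumed)
X = np.array([4.0, 6.0, 3.0])   # a world point (assumed)

t = -R @ C                      # translation vector t = -RC

X_cam_a = R @ (X - C)           # move the origin to C, then rotate
X_cam_b = R @ X + t             # rotate, then add t

Rt = np.hstack([R, t.reshape(3, 1)])   # 3x4 augmented matrix [R | t]
X_cam_c = Rt @ np.append(X, 1.0)       # multiply by the homogeneous point

assert np.allclose(X_cam_a, X_cam_b)
assert np.allclose(X_cam_a, X_cam_c)

# The camera center itself lands on the origin of the camera frame
assert np.allclose(R @ C + t, np.zeros(3))
```

The last check is a quick sanity test for any [R ∣ t]: the camera center must map to the origin of the camera frame.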
- Camera Coordinates to Image (Intrinsics)
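- Final Form: Projection Equation
x = K⋅R⋅[I ∣ −C]⋅X (up to scale), where X is the homogeneous world point (X, Y, Z, 1)ᵀ
This form is useful because: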
- It emphasizes that you're expressing everything relative to the camera center.
- It separates camera pose (center
But there is another formulation.
Project world coordinates (X,Y,Z) to image pixel coordinates (x,y) via:
- Intrinsic matrix K: focal length, principal point
- Extrinsic matrix [R ∣ t]: rotation + translation from world to camera
The difference between the two formulations is that P=K⋅R⋅[I ∣ −C] explicitly uses the camera center C in world coordinates, while P=K⋅[R ∣ t] folds it into the translation t=−RC.
With Camera Center C | Standard Projection |
---|---|
P=K⋅R⋅[I ∣ −C] | P=K⋅[R ∣ t] |

You want to... | Use... |
---|---|
Work with camera position explicitly | P=K⋅R⋅[I ∣ −C] |
Use calibration output or toolbox | P=K⋅[R ∣ t] |
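The two forms in the tables describe the same projection matrix whenever t = −RC. The sketch below checks this with NumPy under assumed values for K, R, and C; all numbers are hypothetical. It also shows how to get the camera center back from a calibration-style [R ∣ t], using C = −Rᵀt.

```python
import numpy as np

# Hypothetical intrinsics and pose
K = np.array([[800.0, 0.0,   320.0],
              [0.0,   800.0, 240.0],
              [0.0,   0.0,   1.0]])
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])    # world-to-camera rotation
C = np.array([1.0, 2.0, -5.0])      # camera center in world coordinates
t = -R @ C

# Form 1: explicit camera center, P = K R [I | -C]
P_center = K @ R @ np.hstack([np.eye(3), -C.reshape(3, 1)])

# Form 2: standard calibration form, P = K [R | t]
P_standard = K @ np.hstack([R, t.reshape(3, 1)])

assert np.allclose(P_center, P_standard)

# Going the other way: recover the camera center from R and t
C_recovered = -R.T @ t
assert np.allclose(C_recovered, C)

# Project a homogeneous world point and divide by depth
X_world = np.array([0.5, 1.0, 3.0, 1.0])
x_h = P_standard @ X_world
x_pix = x_h[:2] / x_h[2]
print(x_pix)  # values: 420 and 190
```

Since R is orthonormal, its inverse is its transpose, which is what makes the recovery of C from t a one-line operation.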
https://intuitive-robotics.tistory.com/110