RealityCapture supports several camera models, and you can specify which one should be used after all cameras are registered (aligned). This is called the final model. By default it is set to Brown3, which stands for the 3-parameter polynomial Brown model without tangential distortion. You can change this setting in Alignment settings -> Advanced -> Distortion model.
Each distortion model defines how points in 3D space are projected into 2D image space. RealityCapture can convert between these models. The selected Distortion model (see figure above) determines the camera model and the corresponding projection equation.
Camera Coordinate system
All models share a common first step: transforming a 3D point into the camera coordinate system.
Let X be a homogeneous 3D point, X = [Xx, Xy, Xz, Xw], R a 3x3 rotation (orthonormal) matrix and t a camera translation vector. The point in the camera coordinate system is then the 3x1 vector x = [u; v; w], given by
x = [ R t ] * X.
If Xw = 1, then point x represents the point X in the camera coordinate frame.
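The transformation x = [ R t ] * X can be sketched in a few lines of Python. The function name to_camera and the numeric values are illustrative assumptions, not part of RealityCapture.

```python
def to_camera(R, t, X):
    """Apply x = [R t] * X to a homogeneous 3D point X = [Xx, Xy, Xz, Xw].

    R is a 3x3 matrix given as a list of rows, t is a 3-element list.
    """
    Xx, Xy, Xz, Xw = X
    return [
        R[i][0] * Xx + R[i][1] * Xy + R[i][2] * Xz + t[i] * Xw
        for i in range(3)
    ]


# Made-up example: identity rotation and a shift of 5 units along the camera z axis.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 5.0]
print(to_camera(R, t, [1.0, 2.0, 3.0, 1.0]))  # [1.0, 2.0, 8.0]
```

With Xw = 0 the translation has no effect, so the same function also transforms directions.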
In the RealityCapture XMP file, the rotation matrix and the translation vector are stored as:
<xcr:Rotation>-0.600806990019897 0.799386597570746 -0.00346819369376912 -0.210825371930633 -0.162635135620312 -0.963899618846316 -0.771092486861039 -0.578386445454965 0.266243303052733</xcr:Rotation> <xcr:Position>2111.44219951044 1607.86624656544 2302.25896526736</xcr:Position>.
The rotation is stored row-wise, i.e.,
R = [-0.600806990019897 0.799386597570746 -0.00346819369376912
-0.210825371930633 -0.162635135620312 -0.963899618846316
-0.771092486861039 -0.578386445454965 0.266243303052733 ].
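As a rough illustration, the two XMP strings above can be turned into a row-wise 3x3 matrix and a 3-element vector as follows. The helper names parse_rotation and parse_vector are hypothetical; the check at the end only confirms that the rows behave like those of a rotation matrix.

```python
rotation_text = (
    "-0.600806990019897 0.799386597570746 -0.00346819369376912 "
    "-0.210825371930633 -0.162635135620312 -0.963899618846316 "
    "-0.771092486861039 -0.578386445454965 0.266243303052733"
)
position_text = "2111.44219951044 1607.86624656544 2302.25896526736"


def parse_rotation(text):
    """Split nine whitespace-separated numbers into three rows (row-wise storage)."""
    values = [float(v) for v in text.split()]
    if len(values) != 9:
        raise ValueError("expected 9 rotation entries")
    return [values[0:3], values[3:6], values[6:9]]


def parse_vector(text):
    """Parse a whitespace-separated list of numbers, e.g. xcr:Position."""
    return [float(v) for v in text.split()]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


R = parse_rotation(rotation_text)
position = parse_vector(position_text)

# Rows of a rotation matrix have unit length and are mutually orthogonal.
print(dot(R[0], R[0]), dot(R[0], R[1]))
```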
General Projection Equation
The projection of X into the image is defined as:
m = DistortionModel( x ),
where m is a 3x1 vector with m=[x; y; depth] and DistortionModel is an arbitrary function. Coordinates x and y are always in range <-0.5, 0.5>.
The following cases are considered:
1) calibration is applied after the lens distortion, i.e., DistortionModel( x ) = K * Distort( x ),
2) calibration is applied prior to the lens distortion, i.e., DistortionModel( x ) = Distort( K * x ),
where K is an upper triangular 3x3 calibration matrix
[ focal skew ppU
0 aspect*focal ppV
0 0 1 ].
Entries of this matrix are stored in XMP as follows:
1. Principal point coordinates
ppU = xcr:PrincipalPointU="0.00621063808526977",
ppV = xcr:PrincipalPointV="-0.0214264554930412",
2. Focal length, which XMP stores w.r.t. the 35mm film format,
focal = xcr:FocalLength35mm / 36,
3. The camera skew, skew = xcr:Skew="0",
4. Aspect ratio, aspect = xcr:AspectRatio="1".
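A minimal sketch of assembling K from these XMP attributes is shown below. The function name calibration_matrix is hypothetical, and the FocalLength35mm value of 36 is a made-up example chosen so that focal = 1; the principal point values are the ones quoted above.

```python
def calibration_matrix(focal_length_35mm, pp_u, pp_v, skew=0.0, aspect=1.0):
    """Build the upper triangular 3x3 calibration matrix K as a list of rows."""
    focal = focal_length_35mm / 36.0  # focal = xcr:FocalLength35mm / 36
    return [
        [focal, skew, pp_u],
        [0.0, aspect * focal, pp_v],
        [0.0, 0.0, 1.0],
    ]


K = calibration_matrix(
    focal_length_35mm=36.0,          # made-up value for illustration
    pp_u=0.00621063808526977,        # xcr:PrincipalPointU
    pp_v=-0.0214264554930412,        # xcr:PrincipalPointV
)
for row in K:
    print(row)
```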
Lens Distortion Models
The basic equation for the division model is as follows:
m = x + [0;0;k*r],
r = ||x-[0;0;1]||^2,
where k is the distortion parameter. Please note that it is assumed that x is de-homogenized, i.e., w = 1.
In the XMP, the distortion parameter k is the first element of the distortion vector:
<xcr:DistortionCoeficients>-0.0831553227672967 0 0 0 0 0</xcr:DistortionCoeficients>.
In RealityCapture, the division model is applied in image coordinates instead of camera coordinates. This means that:
DivisionModel( K^-1*m ) = x,
m = K*DivisionModel^-1( x ).
The inverse form leads to a more complicated projection equation, which we omit here.
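For orientation, the basic division-model equation m = x + [0;0;k*r] can be sketched as below. This covers only the basic camera-coordinate form, not the image-coordinate variant that RealityCapture applies; the function name division_basic and the sample point are assumptions for illustration.

```python
def division_basic(x, k):
    """Evaluate m = x + [0; 0; k*r] with r = ||x - [0; 0; 1]||^2.

    x is a 3-element point [xx, xy, w]; it is de-homogenized first so that w = 1.
    """
    xx, xy, w = x
    xx, xy = xx / w, xy / w
    r = xx * xx + xy * xy  # squared norm of x - [0; 0; 1] once w = 1
    return [xx, xy, 1.0 + k * r]


k = -0.0831553227672967  # first element of xcr:DistortionCoeficients in the example above
m = division_basic([0.5, 0.0, 1.0], k)
print(m)
```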
All other supported models are based on polynomial distortion, with the basic distortion equation:
Xd = Brown( x ) = x * ( 1 + k1*r^2 + k2*r^4 + k3*r^6 + k4*r^8 ... ) +
[t1*(r^2 + 2*xx^2) + 2*t2*xx*xy; t2*(r^2 + 2*xy^2) + 2*t1*xx*xy ],
where Xd is the distorted coordinate of an undistorted point x = [xx; xy; 1] and r is the distance of the point x from the center of distortion (approximated by [0; 0]):
r = ||x-[0;0;1]||, i.e., r^2 = xx^2 + xy^2.
Please note that the point x is normalized w.r.t. its w component. The complete 3D-to-2D equation then has the following form:
x = [ R t ] * X,
x’ = x / x.w,
Xd = Brown( x’ ),
m = K*Xd.
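The four steps above can be sketched end to end in plain Python. This is an illustrative reading of the equations, not RealityCapture's implementation: the function names brown and project are hypothetical, r^2 is taken as xx^2 + xy^2 (r being the distance from the distortion center), and the third component of Xd is assumed to stay 1 so that K can be applied directly.

```python
def brown(xp, radial, tangential=(0.0, 0.0)):
    """Distort a normalized point xp = [xx, xy] with coefficients k1..k4 and t1, t2."""
    xx, xy = xp
    k1, k2, k3, k4 = radial
    t1, t2 = tangential
    r2 = xx * xx + xy * xy  # r^2
    factor = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3 + k4 * r2**4
    dx = t1 * (r2 + 2.0 * xx * xx) + 2.0 * t2 * xx * xy
    dy = t2 * (r2 + 2.0 * xy * xy) + 2.0 * t1 * xx * xy
    return [xx * factor + dx, xy * factor + dy]


def project(R, t, K, X, radial, tangential=(0.0, 0.0)):
    """Project a homogeneous 3D point X to normalized image coordinates [m.x, m.y]."""
    # x = [R t] * X
    x = [R[i][0] * X[0] + R[i][1] * X[1] + R[i][2] * X[2] + t[i] * X[3]
         for i in range(3)]
    # x' = x / x.w (only the first two components are needed afterwards)
    xp = [x[0] / x[2], x[1] / x[2]]
    # Xd = Brown(x'), with an implicit third component of 1 (assumption)
    xd, yd = brown(xp, radial, tangential)
    # m = K * Xd, using the upper triangular structure of K
    return [K[0][0] * xd + K[0][1] * yd + K[0][2], K[1][1] * yd + K[1][2]]


# Made-up camera: identity pose and calibration, no distortion.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(project(I3, [0.0, 0.0, 0.0], I3, [1.0, 2.0, 4.0, 1.0], (0.0, 0.0, 0.0, 0.0)))
```

Passing only k1, k2 and k3 as non-zero values with the default tangential argument corresponds to the Brown3 case.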
In the XMP file, the distortion coefficients for the Brown model are stored in xcr:DistortionCoeficients as:
<xcr:DistortionCoeficients>k1 k2 k3 k4 t1 t2</xcr:DistortionCoeficients>.
For Brown3 and for other models without tangential distortion compensation, the unused coefficients (k4, t1 and t2 in the case of Brown3) are replaced with 0. The order of the entries in the vector is unchanged.
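Because the order of the entries is fixed, the vector can be unpacked by position. The sketch below assumes the six-entry layout k1 k2 k3 k4 t1 t2 described above; the function name parse_distortion is hypothetical.

```python
def parse_distortion(text):
    """Unpack 'k1 k2 k3 k4 t1 t2' into radial and tangential coefficient tuples."""
    values = [float(v) for v in text.split()]
    if len(values) != 6:
        raise ValueError("expected 6 distortion coefficients")
    k1, k2, k3, k4, t1, t2 = values
    return {"radial": (k1, k2, k3, k4), "tangential": (t1, t2)}


coeffs = parse_distortion("-0.0831553227672967 0 0 0 0 0")
print(coeffs["radial"], coeffs["tangential"])
```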
Recall that the x and y coordinates of m are in the range <-0.5, 0.5>. This makes the projection equations independent of the image scale, so an end-user can freely downscale images without breaking the equations.
To scale the point m to the image pixels:
scale = max( image width, image height ),
px = scale*m.x + image width / 2,
py = scale*m.y + image height / 2,
where px and py are coordinates in the image.
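The conversion to pixels can be sketched as follows, taking the x component of m for px and the y component for py. The function name to_pixels and the 6000 x 4000 image size are made-up examples.

```python
def to_pixels(m, image_width, image_height):
    """Scale normalized coordinates (m[0], m[1]) to pixel coordinates (px, py)."""
    scale = max(image_width, image_height)
    px = scale * m[0] + image_width / 2
    py = scale * m[1] + image_height / 2
    return px, py


print(to_pixels([0.0, 0.0], 6000, 4000))   # (3000.0, 2000.0)
print(to_pixels([0.5, 0.25], 6000, 4000))  # (6000.0, 3500.0)
```

Because both axes use the same scale (the longer image side), the shorter side of a non-square image covers a narrower normalized range than <-0.5, 0.5>.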