Understanding 3D transforms is essential to mathematics applied to the gaming industry or, more specifically to our current concerns, to the raytracing process of optical design software. The math behind them is relatively straightforward provided it is explained step by step. I derived most of the relations presented in this post when I started university, but the math itself is only high-school level.
The topic being relatively broad, I will limit myself here to the concepts that are mandatory to implement optical design software, but I will also digress a bit into game programming since I have some experience there as well.
Describing points in 3D space requires 3 degrees of freedom, so we can write any point P as a triplet
where P1, P2 and P3 are the three components of point P.
Up to now, we have not assigned any meaning to these components; they don't even have units yet. To give them meaning, we need to express the point P in a coordinate system. There are many coordinate systems, such as spherical coordinates, cylindrical coordinates, etc. The most popular, and often the easiest to work with, is the Cartesian coordinate system, defined by three perpendicular axes labelled x, y and z. It is convenient (but not mandatory) that the axes are isometric, meaning that a displacement of a given distance yields the same result whatever the axis. This definition may seem difficult (or very mathy), but it is only a formalisation of the 3D coordinate system everyone uses 99.9% of the time to think about the world or to implement video games. In this coordinate system, the components of P are lengths (i.e. meters, inches, cm, etc.). Note that you should always select the coordinate system that best represents your problem. For instance, a helix is best described in the cylindrical coordinate system.
In the Cartesian coordinate system, the three components are often labelled x, y and z:
Two points located in space give a direction. As an example, v is the direction from point O to point P and is written
v connects point O to point P in a straight line (more on this later).
Similar to points, directions also have three components that we can label (in the case of a Cartesian coordinate system) as
Point P can be obtained by adding the quantity v to O:
and we can generate any intermediate point of the straight line between O and P using a parameter k with 0≤k≤1:
This equation has one degree of freedom (k), all the other quantities being fixed. If we restrict k to [0,1] we have a line segment and if we do not restrict k we have a full line that extends beyond points O and P.
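A minimal Python sketch of this parametrization, assuming points and directions are stored as plain (x, y, z) tuples. The helper names `sub`, `add`, `scale` and `point_on_line` are hypothetical, chosen only for illustration:

```python
# Points and directions stored as plain (x, y, z) tuples.

def sub(p, q):
    """Direction from point q to point p, i.e. p - q."""
    return tuple(a - b for a, b in zip(p, q))

def add(p, v):
    """Point reached by moving from point p along direction v."""
    return tuple(a + b for a, b in zip(p, v))

def scale(k, v):
    """Direction v multiplied by the scalar k."""
    return tuple(k * a for a in v)

def point_on_line(O, P, k):
    """O + k (P - O): k in [0, 1] stays on the segment OP, other k extend the line."""
    return add(O, scale(k, sub(P, O)))

O = (0.0, 0.0, 0.0)
P = (2.0, 4.0, -6.0)
print(point_on_line(O, P, 0.5))  # (1.0, 2.0, -3.0), midpoint of OP
print(point_on_line(O, P, 1.0))  # (2.0, 4.0, -6.0), recovers P
print(point_on_line(O, P, 2.0))  # (4.0, 8.0, -12.0), beyond P
```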
The previous equation can be split into its component representation (here for a Cartesian system with x, y, z indices):
This is less compact but shows more clearly how each coordinate is computed, which is exactly what the computer will be doing.
Similarly, O is obtained from P by
Only the subtraction between two points is allowed. Adding two points has no meaning. On the other hand, directions can be added and subtracted without any problem. As an example, if v1 is the direction from O to P and v2 the direction from P to Q, then v1+v2 is the direction from O to Q:
Points O, P and Q also define a plane using two degrees of freedom k1 and k2 (provided Q is not aligned with O and P, in which case the system is degenerate):
This equation defines a triangle if we restrict k1, k2 and k1+k2 to [0,1].
Finally, if we add a fourth point U we get a volume using three degrees of freedom (provided these points are not coplanar):
This last equation will become very important in this post.
Note that the concept of a straight line has to be understood in the context of its coordinate system. Connecting point O to point P in spherical coordinates gives a straight line in the context of that coordinate system. If we were to display the path from O to P in Cartesian space, it would appear as an arc because we would be linearly interpolating the angular (and radial) coordinates of the original system.
Note also that, by convention, I am using capital letters for points and arrows for directions.
We could stop here, but I need to introduce one last concept: the homogeneous Cartesian coordinate system. We will add a 4th dimension to our system using a component w.
w will always be set to 1 for points and 0 for directions since directions are obtained by subtracting two points, therefore yielding 1-1=0.
Point P now becomes
P still has three degrees of freedom, so it still represents a point in 3D space. This 4th component will be very handy to represent some transformations later.
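To see why w comes out as 1 for points and 0 for directions, here is a small sketch that stores the fourth component explicitly and simply lets the arithmetic produce it. The helpers `point`, `direction`, `add4` and `sub4` are hypothetical names for illustration:

```python
def point(x, y, z):
    """Homogeneous point: w = 1."""
    return (x, y, z, 1.0)

def direction(x, y, z):
    """Homogeneous direction: w = 0."""
    return (x, y, z, 0.0)

def add4(a, b):
    return tuple(p + q for p, q in zip(a, b))

def sub4(a, b):
    return tuple(p - q for p, q in zip(a, b))

O = point(1.0, 2.0, 3.0)
P = point(4.0, 6.0, 3.0)

v = sub4(P, O)         # point - point: w = 1 - 1 = 0, a direction
print(v)               # (3.0, 4.0, 0.0, 0.0)
print(add4(O, v))      # point + direction: w = 1 + 0 = 1, the point P again
print(add4(v, direction(1.0, 0.0, 0.0))[3])  # direction + direction: w stays 0.0
print(add4(O, P)[3])   # point + point: w = 2.0, neither a point nor a direction
```

The w component acts as a built-in type check: any result whose w is neither 0 nor 1 came from a meaningless operation such as adding two points.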
For raytracing we will never have w different from 1 but video games do have some transforms that will affect this last component. Game engines then project the point back to 3D space by a process known as homogenization which consists of dividing all the terms by w:
This is very handy to represent perspective, which is an apparent change of the lateral (x,y) size of an object based on its depth. Imagine you have a transform that converts point P in 3D space to (Px,Py,Pz,Pz). Homogenising this point yields (Px/Pz,Py/Pz,1,1), i.e. the 2D coordinates (Px/Pz,Py/Pz). The newly generated (x,y) lateral coordinates now depend on the depth Pz of the object.
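The homogenization step and the toy perspective transform above can be sketched as follows. `toy_perspective` only implements the (Px,Py,Pz,Pz) example from the text; it is not a real camera projection matrix:

```python
def homogenize(p):
    """Divide every component of (x, y, z, w) by w; undefined when w == 0."""
    x, y, z, w = p
    if w == 0:
        raise ValueError("w = 0 (a direction): cannot homogenize")
    return (x / w, y / w, z / w, w / w)

def toy_perspective(p):
    """Toy transform from the text: (Px, Py, Pz, 1) -> (Px, Py, Pz, Pz)."""
    x, y, z, _ = p
    return (x, y, z, z)

near = (1.0, 1.0, 2.0, 1.0)
far = (1.0, 1.0, 4.0, 1.0)   # same lateral position, twice as deep
print(homogenize(toy_perspective(near)))  # (0.5, 0.5, 1.0, 1.0)
print(homogenize(toy_perspective(far)))   # (0.25, 0.25, 1.0, 1.0)
```

The two points share the same (x,y) but the deeper one lands closer to the center after homogenization, which is exactly the perspective effect.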
You should however not worry too much about homogenization as we do not need it for raytracing operations.
Let's come back to our equation defining a volume:
Point T has coordinates (Tx,Ty,Tz) in our Cartesian coordinate system. But point T can also be described by the triplet (k1,k2,k3) using the above equation. Both have three degrees of freedom and are therefore legitimate ways to represent a point in a volume (space). The difference between (Tx,Ty,Tz) and (k1,k2,k3) is that the former expresses the coordinates in the main coordinate system while the latter represents the point in the coordinate system defined by O, OP, OQ and OU.
If this last coordinate system were attached to a body with origin O and orientation OP, OQ and OU, we would say that (k1,k2,k3) is the representation of T in the local coordinate system of the body while (Tx,Ty,Tz) is the representation of T in the world coordinate system. Video games use the concepts of local space (LSP) and world space (WSP); I will borrow these notations.
To simplify computations, I will rename these local axes x, y and z, and we will assume that they are orthonormal, that is
where ● is the dot product operator, defined in a Cartesian coordinate system as
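Going from (k1,k2,k3) to T is a direct evaluation of the volume equation. Going back from T to (k1,k2,k3) means solving a 3×3 linear system. A minimal sketch using Cramer's rule (my choice of solver for illustration, not something prescribed here) could look like this; `local_coords` is a hypothetical helper name:

```python
def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def det3(a, b, c):
    """Determinant of the 3x3 matrix with columns a, b, c (scalar triple product)."""
    return dot(a, cross(b, c))

def local_coords(T, O, P, Q, U):
    """Solve T = O + k1*OP + k2*OQ + k3*OU for (k1, k2, k3) with Cramer's rule."""
    a, b, c = sub(P, O), sub(Q, O), sub(U, O)
    d = sub(T, O)
    D = det3(a, b, c)
    if abs(D) < 1e-12:
        raise ValueError("O, P, Q and U are coplanar: degenerate system")
    # Replace column i by d to get the numerator of k_i.
    return (det3(d, b, c) / D, det3(a, d, c) / D, det3(a, b, d) / D)

O, P, Q, U = (1, 1, 1), (3, 1, 1), (1, 2, 1), (1, 1, 5)
T = (2, 1.5, 3)
print(local_coords(T, O, P, Q, U))  # (0.5, 0.5, 0.5)
```

Nothing here requires OP, OQ and OU to be orthonormal; that assumption only comes later to make the inverse transform a simple set of dot products.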
and corresponds to the projection of direction a on direction b (when b has unit length).
The length of a vector a is defined as its Euclidean norm:
Orthonormal axes are therefore axes that have zero projection on each other and unit length.
As an example, if we define our axes as
we can verify the above equalities and write T as
These x, y and z are the conventional axes of the world space system, centered on the origin (0,0,0).
We will now define two important transformations.
The first transformation converts coordinates in local space to coordinates in world space. It is nothing but the formula we previously wrote:
where Twsp is the point in world space coordinates and Tlsp is the point in local space defined by axis x, y, z and center O.
The second transformation converts coordinates from world space to local space. Using the properties of the dot product defined above, we can write
Using these two transforms, we can project a point from local space to world space and vice versa. Expressing a point in local space often simplifies raytracing operations, such as computing the intercept of rays with surfaces.
Let's now develop each transform in component notation.
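Both transforms can be sketched directly from these definitions, before any matrix appears. This is a minimal version assuming orthonormal local axes given as 3-tuples. The frame used in the example (rotated 90° about world z and centered on (10,0,0)) is an arbitrary choice:

```python
def add(p, v):
    return tuple(a + b for a, b in zip(p, v))

def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def scale(k, v):
    return tuple(k * a for a in v)

def dot(a, b):
    """Cartesian dot product a.b = ax*bx + ay*by + az*bz."""
    return sum(p * q for p, q in zip(a, b))

def is_orthonormal(x, y, z, tol=1e-12):
    """True if the axes have unit length and are mutually perpendicular."""
    axes = (x, y, z)
    return all(abs(dot(axes[i], axes[j]) - (1.0 if i == j else 0.0)) <= tol
               for i in range(3) for j in range(3))

def lsp_to_wsp(T_lsp, O, x, y, z):
    """T_wsp = O + Tx*x + Ty*y + Tz*z."""
    tx, ty, tz = T_lsp
    return add(O, add(scale(tx, x), add(scale(ty, y), scale(tz, z))))

def wsp_to_lsp(T_wsp, O, x, y, z):
    """Project T_wsp - O on each local axis (valid for orthonormal axes only)."""
    d = sub(T_wsp, O)
    return (dot(d, x), dot(d, y), dot(d, z))

O = (10.0, 0.0, 0.0)
x, y, z = (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
T_wsp = lsp_to_wsp((1.0, 2.0, 3.0), O, x, y, z)
print(T_wsp)                          # (8.0, 1.0, 3.0)
print(wsp_to_lsp(T_wsp, O, x, y, z))  # (1.0, 2.0, 3.0)
```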
The first transform becomes (in a very generic expression)
which we can write as a product of matrices
Or, in a more compact form,
Conversely, we can write the second transform as
And, in a more compact form,
These two matrices are the most important concepts to remember from this post so be sure to understand them.
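A sketch of the two 4×4 matrices, assuming column vectors and matrices stored as lists of rows. Following the dot-product derivation, the translation column of the wsp to lsp matrix is -(x●O, y●O, z●O) rather than O itself. The frame values are an arbitrary example:

```python
def matmul(A, B):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def apply(M, v):
    """Multiply 4x4 matrix M by the column vector v = (x, y, z, w)."""
    return tuple(sum(M[i][k] * v[k] for k in range(4)) for i in range(4))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def lsp_to_wsp_matrix(O, x, y, z):
    """Columns: local axes (w = 0) and origin O (w = 1)."""
    return [[x[0], y[0], z[0], O[0]],
            [x[1], y[1], z[1], O[1]],
            [x[2], y[2], z[2], O[2]],
            [0.0, 0.0, 0.0, 1.0]]

def wsp_to_lsp_matrix(O, x, y, z):
    """Rows: local axes (transposed 3x3 block); last column -(x.O, y.O, z.O)."""
    return [[x[0], x[1], x[2], -dot(x, O)],
            [y[0], y[1], y[2], -dot(y, O)],
            [z[0], z[1], z[2], -dot(z, O)],
            [0.0, 0.0, 0.0, 1.0]]

IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

O = (10.0, 0.0, 0.0)
x, y, z = (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
M = lsp_to_wsp_matrix(O, x, y, z)
M_inv = wsp_to_lsp_matrix(O, x, y, z)

print(apply(M, (1.0, 2.0, 3.0, 1.0)))  # point: rotated and translated
print(apply(M, (1.0, 0.0, 0.0, 0.0)))  # direction: w = 0 ignores the translation
print(matmul(M_inv, M) == IDENTITY)    # True for this frame
```

The w component does the bookkeeping: points (w = 1) pick up the last column, directions (w = 0) do not.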
We can notice several important things in the above notations.
First, vectors (points and directions) are now expressed as column matrices. That is,
I removed the vector annotation on purpose when expressing directions as matrices.
Second, the last row will always be (0,0,0,1) and the last column will contain the amount to translate the point.
Third, the top-left 3×3 matrices representing the axes are transposes of each other. These matrices belong to the family of orthonormal matrices, which have the property that
A matrix is orthonormal if its column vectors ci respect the rule
This occurs here because when building the matrices we used the assumption that
Orthonormality may look anecdotal, but orthonormal matrices actually represent a very important family of meaningful transforms, since they conserve lengths. Indeed, the length L of a vector v is given by
or, in matrix form,
The length of the transformed vector v is therefore
where I is the identity matrix which does nothing to the vectors.
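A quick numerical check of this property, using a rotation about x (orthonormal, under the usual right-handed sign convention, which is an assumption here) and a non-uniform scaling (not orthonormal) as a counter-example:

```python
import math

def matvec(M, v):
    return tuple(sum(M[i][k] * v[k] for k in range(3)) for i in range(3))

def transpose(M):
    return [[M[j][i] for j in range(3)] for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def length(v):
    return math.sqrt(sum(c * c for c in v))

theta = math.radians(30.0)
c, s = math.cos(theta), math.sin(theta)
R = [[1.0, 0.0, 0.0],
     [0.0, c, -s],
     [0.0, s, c]]        # rotation about x: orthonormal
S = [[2.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]    # scaling along x: not orthonormal

v = (1.0, 2.0, 3.0)
print(length(v), length(matvec(R, v)))  # equal up to rounding
print(length(v), length(matvec(S, v)))  # lengths differ
RtR = matmul(transpose(R), R)           # close to the identity matrix
```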
Now that we know how to transform points and directions, I would like to add a small note about how to transform normals. A normal is a vector that is perpendicular to the surface at a given point. For a surface z=S(x,y), it is obtained from the partial derivatives of S with respect to x and y:
Normals are heavily involved in lighting computations in 3D games, and we need them as well in raytracing to compute the refraction of light rays.
The dot product between the vector OP and a unit normal n gives the distance between P and the plane perpendicular to n passing through the origin of the system O=(0,0,0). That is
I will skip the derivation because it is beyond the scope of this post. What matters is that this distance must be independent of the coordinate system in which we measure it, so
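A minimal sketch of this construction, taking the normal of z=S(x,y) as (-∂S/∂x, -∂S/∂y, 1) normalized (one of the two possible orientations) and estimating the derivatives with central finite differences, which is my choice for illustration. The sphere used as a test surface (radius R, apex at the origin) is also an assumption; on a sphere the normal must point toward the center, which gives an easy check:

```python
import math

def surface_normal(S, x, y, h=1e-6):
    """Unit normal of z = S(x, y), taken as (-dS/dx, -dS/dy, 1) normalized.

    The partial derivatives are estimated with central finite differences of step h.
    """
    dSdx = (S(x + h, y) - S(x - h, y)) / (2.0 * h)
    dSdy = (S(x, y + h) - S(x, y - h)) / (2.0 * h)
    n = (-dSdx, -dSdy, 1.0)
    L = math.sqrt(sum(c * c for c in n))
    return tuple(c / L for c in n)

R = 10.0

def sphere(x, y):
    """Sphere of radius R with its apex at the origin and its center at (0, 0, R)."""
    return R - math.sqrt(R * R - x * x - y * y)

x0, y0 = 3.0, 4.0
n = surface_normal(sphere, x0, y0)

# Sanity check: on a sphere the normal points toward the center (0, 0, R).
to_center = (-x0, -y0, R - sphere(x0, y0))
L = math.sqrt(sum(c * c for c in to_center))
expected = tuple(c / L for c in to_center)
print(n, expected)
```

In a real tracer the analytic derivatives of the surface would normally replace the finite differences.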
where n and OP are the transformed vectors of n and OP respectively. Since we have
For any coordinate system represented by M, the only way to transform n to conserve the distance d is to apply the rule
When the matrix M is orthonormal, this reduces to
which is the same form as the point/direction transform. This property of orthonormal systems has led many people to think that normals transform just like directions, because it works as long as the matrix is orthonormal (99% of the cases, I would say). It will, however, fail and generate erroneous results when skew transforms (or non-uniform scaling) are used.
Although we will use normals heavily in raytracing, we usually do not need to transform them unless we want to report them to the user in world space coordinates.
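To make the failure concrete, here is a sketch that assumes the usual result that normals transform with the inverse transpose (M⁻¹)ᵀ of the linear 3×3 part of M (translations do not affect normals). A tangent lying in the surface is transformed like a direction, and we check which normal stays perpendicular to it under a shear. `inverse_transpose` is a hypothetical helper name:

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def matvec(M, v):
    return tuple(dot(row, v) for row in M)

def inverse_transpose(M):
    """(M^-1)^T of a 3x3 matrix given as rows r0, r1, r2.

    The columns of M^-1 are (r1 x r2, r2 x r0, r0 x r1) / det(M),
    so those same vectors are the rows of (M^-1)^T.
    """
    r0, r1, r2 = M
    det = dot(r0, cross(r1, r2))
    if det == 0:
        raise ValueError("singular matrix")
    return [tuple(c / det for c in cross(r1, r2)),
            tuple(c / det for c in cross(r2, r0)),
            tuple(c / det for c in cross(r0, r1))]

M = [(1.0, 1.0, 0.0),
     (0.0, 1.0, 0.0),
     (0.0, 0.0, 1.0)]        # skew (shear) transform: x' = x + y

n = (1.0, 1.0, 0.0)          # normal of a plane ...
t = (1.0, -1.0, 0.0)         # ... and a tangent lying in that plane

t_new = matvec(M, t)                        # tangents transform like directions
n_naive = matvec(M, n)                      # direction rule: wrong here
n_right = matvec(inverse_transpose(M), n)   # inverse-transpose rule
print(dot(n_naive, t_new))   # -1.0: no longer perpendicular
print(dot(n_right, t_new))   # 0.0: still perpendicular
```

For an orthonormal M, (M⁻¹)ᵀ equals M, so both rules agree, which is why the naive version usually goes unnoticed.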
It is interesting to note that we can split our previous transforms into successive operations:
when our space is orthonormal.
In the previous notation, we can identify some important transforms:
is the translation transform. It adds the quantity (x,y,z) to the point to which it is applied. Its inverse is
is the centered lsp to wsp matrix for a body with axes x, y, z. Its inverse is
where the superscript T stands for transpose.
Any 4×4 matrix corresponds to a transform. Some of them have a physical meaning and some do not. Some keep the point in 3D space while others leave 3D space and later require a homogenization of the resulting vector to have a meaning.
What is important is that you can consider transforms as matrices, and that applying successive transforms to a point amounts to successive matrix multiplications. In raytracing we will use one or two local space transforms: one to define the orientation of the lens in space and another one to represent the clear aperture of the lens (most of the time it will be the identity transform). Video games can have many more cascaded transforms. A good example is skeletal animation, where each joint of a character depends on others (e.g. the tip of the finger depends on all the joints of the finger, the wrist, the elbow, the shoulder and how the shoulder connects to the rest of the body).
As you concatenate your local transform matrices, you can also concatenate the inverse transforms to yield the wsp to lsp transform. Technically, you could compute it from scratch by inverting the final local transform matrix, but this is not computationally friendly and it is better to build both transforms side by side:
where the transforms are applied in the order 1 to N.
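A sketch of building both chains side by side. It assumes column vectors and a product written M = M1·M2·…·MN; since the inverse of a product is the product of the inverses in reverse order, each new inverse is multiplied on the left of the running inverse. The transforms in the chain (translation, rotation about z, translation) are arbitrary examples:

```python
import math

def identity():
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def translation(tx, ty, tz):
    """Return (T, T^-1): the inverse simply translates by the opposite amount."""
    T, T_inv = identity(), identity()
    for i, d in enumerate((tx, ty, tz)):
        T[i][3] = d
        T_inv[i][3] = -d
    return T, T_inv

def rotation_z(theta):
    """Return (Rz, Rz^-1): the inverse of a rotation is its transpose."""
    c, s = math.cos(theta), math.sin(theta)
    R = [[c, -s, 0.0, 0.0],
         [s, c, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    R_inv = [list(col) for col in zip(*R)]
    return R, R_inv

def build_chain(pairs):
    """Concatenate (M_i, M_i^-1) pairs side by side.

    M     = M_1 M_2 ... M_N
    M_inv = M_N^-1 ... M_2^-1 M_1^-1
    """
    M, M_inv = identity(), identity()
    for Mi, Mi_inv in pairs:
        M = matmul(M, Mi)
        M_inv = matmul(Mi_inv, M_inv)
    return M, M_inv

M, M_inv = build_chain([translation(5.0, 0.0, 0.0),
                        rotation_z(math.radians(45.0)),
                        translation(0.0, 2.0, 1.0)])
check = matmul(M, M_inv)   # close to the identity, without ever inverting M
```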
In the previous section, I gave a generic expression of the orthonormal transform and its inverse. There are many simple transforms that are described by this generic equation.
We can mention the rotation matrix around the x axis:
Which rotates the local space around the x axis by an angle θ. Its inverse is
Similarly, there are the rotations around the y and z axes:
The inverse being
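A sketch of the three 4×4 rotation matrices. The sign convention (right-handed axes, positive angles counterclockwise when looking down the axis toward the origin) is an assumption on my part, since the original matrices are not reproduced here; flipping the sign of θ gives the other convention. The inverse is checked to be both the transpose and the rotation by -θ:

```python
import math

def rotation_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, c, -s, 0.0],
            [0.0, s, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_y(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transpose(M):
    return [list(col) for col in zip(*M)]

def apply(M, v):
    return tuple(sum(M[i][k] * v[k] for k in range(4)) for i in range(4))

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

quarter = math.pi / 2
print(apply(rotation_z(quarter), (1.0, 0.0, 0.0, 0.0)))  # about (0, 1, 0, 0): x -> y
for rot in (rotation_x, rotation_y, rotation_z):
    # Inverse = transpose = rotation by the opposite angle.
    print(max_abs_diff(transpose(rot(0.3)), rot(-0.3)))   # about 0
```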
In a former post, I showed how quaternions could be used to represent rotations about an arbitrary axis n, including the 3×3 matrix representation of that transform. You can easily upgrade this matrix to 4×4 by padding it with zeros and adding a final 1 at position (4,4), but I will not do it here as it is not used in raytracing. This transform plays a very important role in video games, more specifically in animation engines where smooth angular movement at joints is required.
We can also mention the reflection matrices, which flip one or more coordinate axes. For instance, the reflection on x is
You can also write the reflection around y, z, xy, xz, yz and xyz similarly.
Improper rotations are rotations around an axis followed by a reflection through the plane perpendicular to that axis. They are heavily used in group theory but, to my knowledge, have little use in raytracing, so I will skip them.
Among the matrices that are NOT orthonormal, there is the matrix that scales everything by factors (x,y,z):
Once you concatenate a matrix that is not orthonormal into a chain of transforms, the complete chain generally loses its orthonormal properties.
Concatenating simple transforms allows you to build a full transform easily. For instance, the matrix that rotates the system by an angle θ around x, about a point P, is
where the transforms on the left are applied first.
You can easily check that point P (and, more generally, every point on the rotation axis through P) is invariant under the rotation. This form of transform is very popular in CAD software, where P is then referred to as the anchor point of the transform. Typical transforms with anchor points are rotations and scaling.
Note that the order in which you concatenate transforms is important. For instance, there are 6 different orders in which to combine the three (pitch-yaw-roll) rotations around the X, Y and Z axes. NASA defined the following standard:
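A sketch of the anchored rotation, assuming column vectors. In that convention the rightmost matrix acts first, so the product is written T(P)·Rx(θ)·T(-P); read in application order it means "move P to the origin, rotate about x, move back". `rotate_x_about` is a hypothetical helper name:

```python
import math

def identity():
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def apply(M, v):
    return tuple(sum(M[i][k] * v[k] for k in range(4)) for i in range(4))

def translation(tx, ty, tz):
    T = identity()
    T[0][3], T[1][3], T[2][3] = tx, ty, tz
    return T

def rotation_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, c, -s, 0.0],
            [0.0, s, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rotate_x_about(P, theta):
    """Rotation by theta around the x-parallel axis through the anchor point P."""
    px, py, pz = P
    return matmul(translation(px, py, pz),
                  matmul(rotation_x(theta), translation(-px, -py, -pz)))

P = (1.0, 2.0, 3.0)
M = rotate_x_about(P, math.radians(90.0))
print(apply(M, P + (1.0,)))            # the anchor point P is unchanged (up to rounding)
print(apply(M, (1.0, 3.0, 3.0, 1.0)))  # P + (0, 1, 0) goes to about P + (0, 0, 1)
```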
while 3D computer graphics defined the VRML standard:
Since pitch, yaw and roll have a physical definition, we can interpret this as both the NASA and VRML standards having the X axis pointing forward from the body; but while NASA defines the Y axis as up and the Z axis as lateral, VRML does the opposite.
In optics I usually choose to have the Z axis pointing forward and I use the following rotation system
Rotations can be a nightmare to get working if you do not understand exactly how they are built!
When doing raytracing, transforms play an important role because the intersection between a ray and a surface is usually easier to compute in some reference system. It is common to describe surfaces perpendicular to the z axis, with the apex of the lens at the origin of the system.
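A tiny demonstration of why the order matters: two 90° rotations about x and y, applied to the same vector in the two possible orders, give different results. The right-handed sign convention for the matrices is an assumption:

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(M, v):
    return tuple(sum(M[i][k] * v[k] for k in range(3)) for i in range(3))

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

a = math.radians(90.0)
v = (1.0, 0.0, 0.0)
xy = matvec(matmul(rot_x(a), rot_y(a)), v)  # Ry acts first, then Rx
yx = matvec(matmul(rot_y(a), rot_x(a)), v)  # Rx acts first, then Ry
print([round(c, 9) for c in xy])  # approximately (0, 1, 0)
print([round(c, 9) for c in yx])  # approximately (0, 0, -1)
```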
For instance, the standard optical surface (as named in optical design software) is described by the equation
for radial coordinate r, curvature c, conic constant k and r²=x²+y².
This equation is expressed for a body aligned with the z axis and perpendicular to the x and y axes. It has its origin at 0 (i.e. z(0)=0).
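Assuming the standard surface takes the usual conic sag form z(r) = c·r² / (1 + √(1 − (1+k)·c²·r²)), a minimal sketch is shown below. For k = 0 it should reduce to a sphere of radius 1/c, and for k = -1 to the paraboloid z = c·r²/2, which makes for easy checks. `standard_sag` is a hypothetical name:

```python
import math

def standard_sag(r, c, k):
    """Sag z(r) = c r^2 / (1 + sqrt(1 - (1 + k) c^2 r^2)) of a conic surface.

    r: radial coordinate (r^2 = x^2 + y^2), c: curvature (1 / radius),
    k: conic constant (0 sphere, -1 paraboloid). z(0) = 0 at the apex.
    """
    arg = 1.0 - (1.0 + k) * c * c * r * r
    if arg < 0.0:
        raise ValueError("r lies outside the region where this conic is defined")
    return c * r * r / (1.0 + math.sqrt(arg))

R = 10.0
print(standard_sag(6.0, 1.0 / R, 0.0))   # sphere: R - sqrt(R^2 - r^2), about 2.0
print(standard_sag(6.0, 1.0 / R, -1.0))  # paraboloid: c r^2 / 2, about 1.8
```

Writing the sag with the square root in the denominator (rather than R − √(R² − r²)) keeps it valid for c = 0, where the surface is simply the plane z = 0.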
To compute the intersection of a ray of direction v originating from point P described by the equation
with the surface in a local system described by the transform Mlsp>wsp and its inverse Mwsp>lsp, we first transform the ray from the world coordinate system to local space using the wsp->lsp transform, then solve the system to find the intercept position in local space, and finally transform that local position back into world coordinates using the lsp->wsp transform.
Using the same method, we can go from surface to surface following the ray, each time transforming the ray from world space into the next surface's local space and transforming it back to world space after solving the interception problem (more on this in a following post).
Figure 1 illustrates a simple raytracing with a STOP and a plano-convex lens. Rays emerge from an object plane, intercept the STOP and then the first and second interfaces of the lens before reaching the image plane. Each surface (object, STOP, sphere, plane and image) has its own local space coordinate system, represented here by xy arrows.
Note that when performing raytracing in a system, we usually express the starting and ending positions in the local space of the first and last surfaces. That is, if rays emerge from an object plane and end at the image plane, the coordinates reported to the user will be in the local space of the object and image planes respectively, instead of in the world space coordinate system. This is purely for the convenience of the user and does not change anything in the math.
That's all for today! In the next post, we will develop the raytracing equations a little further to build our own optical design software!
I would like to give a big thanks to James, Daniel, Naif, Lilith, Cam, Samuel, Themulticaster, Sivaraman, Vaclav and Arif who have supported this post through Patreon. I also take the occasion to invite you to donate through Patreon, even as little as $1. I cannot stress it enough: you can really help me to post more content and run more experiments!
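A sketch of this workflow, with the simplest possible surface standing in for a real one: the local plane z = 0. The surface frame (origin at world z = 5, tilted 30° about y) and the helper names `frame_matrices` and `intercept_local_plane` are hypothetical. The point of the example is the round trip: ray to local space, solve there, hit point back to world space.

```python
import math

def apply(M, v):
    return tuple(sum(M[i][k] * v[k] for k in range(4)) for i in range(4))

def frame_matrices(O, x, y, z):
    """(M_lsp>wsp, M_wsp>lsp) for a frame with origin O and orthonormal axes x, y, z."""
    dot = lambda a, b: sum(p * q for p, q in zip(a, b))
    lsp_wsp = [[x[0], y[0], z[0], O[0]],
               [x[1], y[1], z[1], O[1]],
               [x[2], y[2], z[2], O[2]],
               [0.0, 0.0, 0.0, 1.0]]
    wsp_lsp = [[x[0], x[1], x[2], -dot(x, O)],
               [y[0], y[1], y[2], -dot(y, O)],
               [z[0], z[1], z[2], -dot(z, O)],
               [0.0, 0.0, 0.0, 1.0]]
    return lsp_wsp, wsp_lsp

def intercept_local_plane(P, v, lsp_wsp, wsp_lsp):
    """Intersect the world-space ray P + t v with the plane z = 0 of a local frame."""
    P_l = apply(wsp_lsp, P + (1.0,))   # points carry w = 1
    v_l = apply(wsp_lsp, v + (0.0,))   # directions carry w = 0
    if abs(v_l[2]) < 1e-12:
        return None                    # ray parallel to the plane
    t = -P_l[2] / v_l[2]               # solve P_l.z + t v_l.z = 0 in local space
    if t < 0.0:
        return None                    # plane lies behind the ray origin
    hit_l = tuple(P_l[i] + t * v_l[i] for i in range(3)) + (1.0,)
    return apply(lsp_wsp, hit_l)[:3]   # back to world space

# Hypothetical surface frame: origin at z = 5, tilted by 30 degrees about y.
a = math.radians(30.0)
c, s = math.cos(a), math.sin(a)
lsp_wsp, wsp_lsp = frame_matrices((0.0, 0.0, 5.0), (c, 0.0, -s), (0.0, 1.0, 0.0), (s, 0.0, c))
hit = intercept_local_plane((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), lsp_wsp, wsp_lsp)
print(hit)   # about (1, 0, 5 - tan(30 deg))
```

Swapping the plane for the standard surface only changes the "solve in local space" step; the two transforms around it stay the same.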