Published: 2022-05-24 | Categories: [»] Tutorials and [»] Optics.

In a [»] previous post we saw how the imaging quality of a system could be quantified through an accurate representation of [»] diffraction. More specifically, we described the Modulation Transfer Function (MTF) and explained how it helps characterize a system’s performance.

These evaluations are, however, computationally demanding: they require evaluating the PSF through the Rayleigh-Sommerfeld formula, which is O(N⁴), and then computing a Fourier transform to get the MTF. An alternative to the Rayleigh-Sommerfeld formula is to obtain the PSF, within the Fresnel-Fraunhofer approximation, from the Fourier transform of the pupil function built on the Wavefront Error (WFE) map. The WFE map represents the delay of each ray compared to a perfect spherical wave centered on the image point, that is, the wave that would produce a diffraction-limited PSF. This was detailed in our [»] previous post as well, including the benefits and drawbacks of each approach.

At this point, you may already have a gut feeling that, since the MTF is linked to the PSF and the PSF is linked to the WFE, you could maybe use the wavefront error map directly to quantify your system and skip the Fourier transforms altogether. You would be right, and this is well known to optical designers, who often use figures computed on the WFE to characterize their systems, the two most common being the peak-to-valley and the root-mean-square (rms) values.

An interesting thing about the rms value is that it is directly related to the ratio of the area under the MTF to the area under the MTF of the diffraction-limited system:

$$S \approx \exp\left[-\left(\frac{2\pi\,\mathrm{rms}}{\lambda}\right)^2\right]$$

where S is called the Strehl ratio, rms is the root-mean-square of the wavefront error map in meters and λ is the wavelength of the light being used.

This approximation is known as the Marechal formula and is relatively accurate for moderately aberrated systems (S>0.3). For reference, you may also hear about the Marechal criterion, which requires S≥0.80; this corresponds to an rms wavefront error of about λ/14 or better (exp[−(2π/14)²] ≈ 0.82).
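To get a feel for how good the Marechal approximation S ≈ exp[−(2π rms/λ)²] is, here is a minimal Python sketch (standard library only, not part of #DevOptical) that compares it with a direct evaluation of the Strehl ratio as |⟨exp(2iπW/λ)⟩|², the phasor average over a uniformly illuminated unit circular pupil. The helper names and the square-grid integration are illustrative assumptions, and the wavefront is expressed in units of the wavelength.

```python
import cmath
import math


def marechal_strehl(rms, wavelength):
    """Marechal approximation: S ~ exp(-(2*pi*rms/wavelength)**2)."""
    return math.exp(-(2.0 * math.pi * rms / wavelength) ** 2)


def pupil_average(func, n=201):
    """Average func(px, py) over the unit disk using a square n x n grid."""
    total = 0.0
    count = 0
    for i in range(n):
        px = -1.0 + 2.0 * i / (n - 1)
        for j in range(n):
            py = -1.0 + 2.0 * j / (n - 1)
            if px * px + py * py <= 1.0:
                total += func(px, py)
                count += 1
    return total / count


def wavefront_rms(wavefront, n=201):
    """rms of the wavefront over the pupil (piston removed)."""
    mean = pupil_average(wavefront, n)
    mean_sq = pupil_average(lambda x, y: wavefront(x, y) ** 2, n)
    return math.sqrt(max(mean_sq - mean * mean, 0.0))


def pupil_strehl(wavefront, wavelength, n=201):
    """Strehl ratio |<exp(2i*pi*W/wavelength)>|**2 over the unit pupil."""
    phasor = pupil_average(
        lambda x, y: cmath.exp(2j * math.pi * wavefront(x, y) / wavelength), n)
    return abs(phasor) ** 2


def defocus(x, y):
    """Example: 0.1 wave of defocus, W = 0.1 * (px**2 + py**2)."""
    return 0.1 * (x * x + y * y)


rms = wavefront_rms(defocus)
print(rms, marechal_strehl(rms, 1.0), pupil_strehl(defocus, 1.0))
```

For 0.1 wave of defocus both estimates come out around 0.97; increasing the defocus amplitude lets you watch the two drift apart as the system becomes more heavily aberrated.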

The Strehl ratio is often used as a rejection filter in the early stages of design optimization. It is a relatively quick way to sort out which systems have a chance to work and keep only the solutions that will make sense later, without having to compute the actual PSF and/or MTF. In my experience, however, it is rarely used as a figure of merit in the late stages of the design, where we need to make accurate predictions. Still, it is frequently mentioned in communication with clients, either as a system requirement or as a selling point of an off-the-shelf product, particularly for amateur astronomy telescope mirrors.

That being said, everything I mentioned above was just the introduction to today’s post. I am now going to discuss an extremely important topic in optical design: third-order aberration theory. Many junior optical designers tend to dismiss it as made obsolete by today’s technology (and so did I when I started), but that is an enormous mistake and you will never succeed in optical design if you do not understand it. As always, for the #DevOptical software, I’m going to make this as easy to use as possible but, unlike other optical design software, I will put third-order aberration theory at the core of the system rather than treating it as a side feature, because it is so important.

Third-order aberration theory dates back to a time when there were no computers to perform extensive raytracing of wavefront error maps, yet designers still needed a way to guide their choices, because the trial-and-error approach used so far was neither practical nor efficient. Through a mathematical treatment of the refraction of rays at glass interfaces, they came up with a series of formulas that predict an approximation of the wavefront error map from the raytrace of only two particular rays in the system. Not only does this save a lot of computations, it also gives the individual contribution of each interface to the overall aberration of the system through manageable mathematical formulas. All the major historical lenses were designed using these formulas and they had fantastic quality for their time. A good example is the Cooke triplet, invented by H. Dennis Taylor in 1893, which is still used today as a low-cost solution for inexpensive camera lenses because, despite having only three lenses, it corrects all the low-order aberrations (see below for a definition of aberration orders).

Sure, modern raytracing software will produce a more optimized design, because third-order aberration theory only gives you an approximation of the aberrations while full raytracing computes an exact solution, but third-order theory also gives you a methodology to create a base solution that you can then refine with modern tools. Optical design software, on the other hand, offers no overall methodology to guide you in creating your base system. In fact, during an official training with ZEMAX, we were taught to start from a known similar design and to reoptimize it for our specific conditions. In that regard, they are very good at selling you libraries of lenses taken from patents or the literature for you to adapt. This matters a lot: if you want to become a successful optical designer, you cannot rely on just copying other people’s work, and you need to understand how to design systems in the first place. That is the first reason I want to put third-order aberration theory at the center of our software.

The second reason is that third-order aberration theory is fast, much faster than complete raytracing. It can therefore be used as a rejection filter during design, just like the Marechal formula. If a design fails at the third-order level, it is guaranteed to give you trouble when you have to produce it, as highly aberrated systems tend to be very sensitive to tilt and decentering.

Now that you are convinced (I hope you are at this point) that it is important to understand third-order aberration theory, let’s dig into what it is and how to use it.

When we compute the wavefront of an object point located at coordinates (x,y), we trace bundles of rays through the system that intercept the pupil at coordinates (px,py). The only free parameters of the raytrace are the coordinates x, y, px and py; all other parameters, like the wavelength and temperature, are considered fixed. This means the wavefront for any object point can be adequately represented by a function W(x,y,px,py).

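If we now restrict ourselves to rotationally symmetric systems (and if your system is not, you can usually split it into rotationally symmetric subsystems), the wavefront can only depend on three rotationally invariant combinations of these coordinates, and we can write

$$W(x, y, p_x, p_y) = W\left(\vec{O}\cdot\vec{O},\ \vec{O}\cdot\vec{P},\ \vec{P}\cdot\vec{P}\right)$$

where

$$\vec{O} = (x, y), \qquad \vec{P} = (p_x, p_y)$$

are the vectors going from the optical axis to the object point and to the pupil intercept, respectively.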

The new parameters are therefore the squares of the distances from the optical axis to the object point and to the pupil intercept, plus the dot product of the vectors O and P, which carries the cosine of the angle between them.

Since the system is rotationally symmetric, there is no real distinction between picking any (x,y) point and picking a point on the y axis alone, (0,y). Writing the pupil coordinates as (xp,yp), we therefore end up with the generic description W(y², y·yp, xp²+yp²). Note that we still have no closed expression for that wavefront: it is merely a mathematical statement saying that the wavefront depends only on the height of the object point, the height of the pupil intercept and the cosine of the angle between the object point and the pupil intercept. So far, it represents any rotationally symmetric system.
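The rotational-symmetry argument can be checked numerically. The minimal sketch below (hypothetical helper names) computes the three invariants O·O, O·P and P·P, then rotates the object point and the pupil intercept together about the optical axis so that the object lands on the y axis. The invariants are unchanged, and O·P reduces to y·yp.

```python
import math


def invariants(x, y, px, py):
    """Rotational invariants (O.O, O.P, P.P) for O = (x, y) and P = (px, py)."""
    return (x * x + y * y, x * px + y * py, px * px + py * py)


def rotate(x, y, angle):
    """Rotate the 2D vector (x, y) by `angle` radians about the optical axis."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def to_meridional(x, y, px, py):
    """Rotate object and pupil points together so the object lies on the +y axis."""
    angle = math.pi / 2.0 - math.atan2(y, x)
    ox, oy = rotate(x, y, angle)
    qx, qy = rotate(px, py, angle)
    return ox, oy, qx, qy


# An arbitrary object point and pupil intercept.
before = invariants(0.3, 0.4, -0.2, 0.7)
ox, oy, qx, qy = to_meridional(0.3, 0.4, -0.2, 0.7)
after = invariants(ox, oy, qx, qy)
print(before, after, (ox, oy))
```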

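We can then compute the [∞] power series expansion of that generic function and obtain

$$W = a_1 + b_1 (x_p^2+y_p^2) + b_2\,y\,y_p + b_3\,y^2 + c_1 (x_p^2+y_p^2)^2 + c_2\,y\,y_p\,(x_p^2+y_p^2) + c_3\,y^2 y_p^2 + c_4\,y^2 (x_p^2+y_p^2) + c_5\,y^3 y_p + c_6\,y^4 + \dots$$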

If you check the units the coefficients need to produce a wavefront in meters, you get ai in meters, bi in meters⁻¹, ci in meters⁻³, etc. Third-order aberration theory focuses on the ci coefficients. These terms are of fourth degree in the coordinates, but the corresponding transverse ray aberrations, obtained by differentiating the wavefront with respect to the pupil coordinates, are of third degree, which is where the name comes from. There are higher-order aberrations (di and above) but only the ci can be easily computed from two rays. Also, systems with low third-order aberrations usually have even lower higher-order aberrations, so it is a good approximation to neglect them. More details on this later.

A specific remark concerns the bi, which are the first-order aberration coefficients. I will not discuss them here, but they correspond to paraxial optics (the so-called Gaussian image of the point). We have already set up a complete framework for processing paraxial rays in our previous posts, so I do not feel the need to extend the discussion on these coefficients: it would be redundant with what has already been said, only viewed from a different angle, and might consume unnecessary energy in a first introduction to system analysis.

You can remember that

$$b_1\,(x_p^2 + y_p^2)$$

represents a defocus, and

$$b_2\,y\,y_p$$

represents a tilt (lateral deviation).

To complete our analysis of the W function, recall that we mentioned in an earlier post that aberrations fall to zero as we get close to the optical axis. That means the aberration should be zero when the ray crosses the center of the pupil:

$$W(y, x_p = 0, y_p = 0) = a_1 + b_3\,y^2 + c_6\,y^4 + \dots = 0$$

If we exclude the case y=0, this can hold for every field height only if a1=b3=c6=0. The same reasoning applies to higher orders as well. Note that there is nothing magic here: it comes from the fact that we describe aberration as the difference between any ray and the ray passing through the center of the pupil, which defines the general offset of the wavefront. For convenience in our mathematical treatment, we choose this offset to be zero.
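Here is a small Python sketch of the truncated power series, useful to check the pupil-center argument. The term ordering is an assumption chosen so that the terms surviving at xp = yp = 0 are exactly a1, b3 y² and c6 y⁴: b1 (xp²+yp²), b2 y·yp, b3 y², then c1 (xp²+yp²)², c2 y·yp (xp²+yp²), c3 y² yp², c4 y² (xp²+yp²), c5 y³ yp and c6 y⁴. The function name and the dictionary-based interface are hypothetical.

```python
def wavefront_series(coef, y, xp, yp):
    """Power series of W(y^2, y*yp, xp^2 + yp^2), truncated after the c terms.

    coef maps names ("a1", "b1".."b3", "c1".."c6") to values; missing ones are 0.
    Assumed ordering, with u = y^2, v = y*yp, w = xp^2 + yp^2:
        a1 + b1*w + b2*v + b3*u
           + c1*w^2 + c2*v*w + c3*v^2 + c4*u*w + c5*u*v + c6*u^2
    """
    def k(name):
        return coef.get(name, 0.0)

    u = y * y
    v = y * yp
    w = xp * xp + yp * yp
    return (k("a1")
            + k("b1") * w + k("b2") * v + k("b3") * u
            + k("c1") * w * w + k("c2") * v * w + k("c3") * v * v
            + k("c4") * u * w + k("c5") * u * v + k("c6") * u * u)


# All coefficients set to 1 except a1, b3 and c6: the ray through the pupil
# center (xp = yp = 0) then sees no aberration at any field height.
coef = {name: 1.0 for name in ("b1", "b2", "c1", "c2", "c3", "c4", "c5")}
print([wavefront_series(coef, y, 0.0, 0.0) for y in (0.0, 0.5, 1.0)])  # [0.0, 0.0, 0.0]
```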

This is exactly the approach followed by Sir William Hamilton in the 1830s. The exact choice of rotationally invariant coordinates used here (rather than, for instance, their square roots) is made to stay consistent with the theory developed around Hamilton’s work. Other choices are possible, but they give different scalings of the aberrations.

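Let us now focus on the third-order aberrations only. By splitting the ci terms of our previous wavefront (with c6=0) into five functions Wi,

$$\begin{aligned}
W_1 &= c_1\,(x_p^2+y_p^2)^2 \\
W_2 &= c_2\,y\,y_p\,(x_p^2+y_p^2) \\
W_3 &= c_3\,y^2 y_p^2 \\
W_4 &= c_4\,y^2\,(x_p^2+y_p^2) \\
W_5 &= c_5\,y^3 y_p
\end{aligned}$$

we have

$$W \approx W_1 + W_2 + W_3 + W_4 + W_5$$

with W1 the spherical aberration, W2 the coma, W3 the astigmatism, W4 the field curvature and W5 the distortion.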

Spherical aberration is shown in Figure 1. It is independent of the field position, y, and increases with the fourth power of the radial coordinate in the aperture. You can interpret spherical aberration as a defocus that is proportional to the square of the radial pupil intercept position. This means that rays close to the optical axis will not focus at the same position as rays close to the rim of the aperture. It also means that doubling the aperture diameter multiplies this focus shift by 4 (and the wavefront error at the rim of the pupil by 16)! It is usually the dominant aberration in uncorrected systems with small field angles.

Coma is shown in Figure 2. It acts as a defocus that is proportional both to the field position and to the pupil intercept coordinate along the field direction. As a consequence, coma is always negligible for very small field angles. The name coma comes from the comet-like shape this aberration produces. Its shape is always oriented radially because of the y·yp dependence.

Astigmatism is shown in Figure 3. You can interpret astigmatism as a defocus that acts only along the direction of the field position. That means you will have two different focus positions, depending on whether you look along the field axis or perpendicular to it. Like coma, astigmatism is negligible for small field angles, since it depends on the square of the field position. Like field curvature, it also makes the system focus on a sphere rather than on a plane (more on this below).

Field curvature is shown in Figure 4. It acts as a defocus that is proportional to the square of the field height. Because of this quadratic dependence on field position, rays tend to focus on a sphere rather than on a plane. Unlike spherical aberration, coma and astigmatism, the image of a point remains a point, just not where you would expect it from a paraxial analysis.

Distortion is shown in Figure 5. It corresponds to a lateral shift (tilt) that depends on the cube of the field position. As with field curvature, the image of a point is still a point, but it is located at a different place than you would expect from a paraxial analysis. Distortion is well known to photographers and can be corrected in software, provided the correction to apply is known (either from the design or from a calibration).

Neglecting higher-order aberrations, we can say that the total aberration of any system is a linear combination of the five elementary aberrations shown above. Note that you may have encountered some of these aberrations before, as in our post on a [»] custom microscope objective from COTS parts. You now know the origin of these aberration types and names.
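The five elementary aberrations are easy to evaluate in code. The sketch below uses hypothetical function names and assumes normalized pupil coordinates, a normalized field y along the y axis, and the term forms W1 = c1 (xp²+yp²)², W2 = c2 y·yp (xp²+yp²), W3 = c3 y² yp², W4 = c4 y² (xp²+yp²) and W5 = c5 y³ yp. It returns each term and, through a finite-difference gradient, the quantity to which the transverse ray error is proportional. For pure distortion that gradient is the same everywhere in the pupil, which is the "image of a point is still a point" behavior described above.

```python
def third_order_terms(c, y, xp, yp):
    """Return (W1, ..., W5) for c = (c1, ..., c5).

    y: normalized field (0 on axis, 1 at full field), taken along the y axis.
    xp, yp: normalized pupil coordinates.
    """
    c1, c2, c3, c4, c5 = c
    r2 = xp * xp + yp * yp
    return (
        c1 * r2 * r2,          # W1: spherical aberration, independent of y
        c2 * y * yp * r2,      # W2: coma, linear in y
        c3 * y * y * yp * yp,  # W3: astigmatism, quadratic in y
        c4 * y * y * r2,       # W4: field curvature, quadratic in y
        c5 * y ** 3 * yp,      # W5: distortion, cubic in y
    )


def third_order_wavefront(c, y, xp, yp):
    """Total third-order wavefront: the sum of the five elementary terms."""
    return sum(third_order_terms(c, y, xp, yp))


def pupil_gradient(c, y, xp, yp, step=1e-6):
    """Central-difference gradient of W with respect to (xp, yp).

    The transverse ray error in the image plane is proportional to it.
    """
    w = third_order_wavefront
    gx = (w(c, y, xp + step, yp) - w(c, y, xp - step, yp)) / (2.0 * step)
    gy = (w(c, y, xp, yp + step) - w(c, y, xp, yp - step)) / (2.0 * step)
    return gx, gy


# Pure distortion: the gradient (hence the ray shift) is identical across the
# pupil, so the point is imaged as a point, just displaced.
distortion = (0.0, 0.0, 0.0, 0.0, 1.0)
print(pupil_gradient(distortion, 1.0, 0.0, 0.0), pupil_gradient(distortion, 1.0, 0.5, -0.3))
```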

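Still assuming that our wavefront is mostly described by third-order aberrations (which also means the system should be at focus, so that the bi terms play no role), we can compute the rms as

$$\mathrm{rms}^2 = \frac{1}{A}\iint_A W^2\,dA - \left(\frac{1}{A}\iint_A W\,dA\right)^2 = \sum_{i,j} c_i\,c_j\,Q_{i,j}$$

where A is the aperture (area) and the normalized products Qi,j, obtained by substituting W = W1 + ... + W5 in the integrals and dividing out the ci, can be precomputed for a circular aperture.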

The rms can then be obtained as the sum of the weighted products of the ci. This (relatively) elegant concept therefore allows us to evaluate an approximation of the Strehl ratio of a system from the knowledge of the ci coefficients alone. Note that the Qi,j matrix is symmetric and is valid only for unobscured circular apertures; if the aperture is different, the table must be recomputed. When using the two-ray approach described below to solve for the ci, the value of y should be normalized by the maximum field angle (that is, y=0 on-axis and y=1 at the maximum off-axis field).
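As an illustration of the rms computation, the sketch below assumes a uniformly illuminated, unobscured unit circular pupil, a normalized field y along the y axis, and the term forms W1 = c1 r⁴, W2 = c2 y·yp r², W3 = c3 y² yp², W4 = c4 y² r², W5 = c5 y³ yp with r² = xp²+yp². Under these assumptions, the Q values below were derived from the pupil averages ⟨ρⁿ⟩ = 2/(n+2), ⟨cos²θ⟩ = 1/2 and ⟨cos⁴θ⟩ = 3/8, with the powers of y pulled out of the matrix; they are not taken from the original table and may be normalized differently from it. A brute-force grid average is included as a cross-check, and the coefficients in the example are hypothetical.

```python
import math

# Exponent of the normalized field y in each term W1..W5.
FIELD_POWER = (0, 1, 2, 2, 3)

# Q[i][j] = <g_i g_j> - <g_i><g_j> over a uniform unit disk, where
# g = (r^4, yp*r^2, yp^2, r^2, yp) are the pupil parts of W1..W5.
# Products that are odd in yp average to zero.
Q = (
    (4 / 45, 0.0, 1 / 24, 1 / 12, 0.0),
    (0.0, 1 / 8, 0.0, 0.0, 1 / 6),
    (1 / 24, 0.0, 1 / 16, 1 / 24, 0.0),
    (1 / 12, 0.0, 1 / 24, 1 / 12, 0.0),
    (0.0, 1 / 6, 0.0, 0.0, 1 / 4),
)


def rms_from_coefficients(c, y):
    """rms wavefront error (units of c) at normalized field y, c = (c1..c5)."""
    var = 0.0
    for i in range(5):
        for j in range(5):
            var += c[i] * c[j] * y ** (FIELD_POWER[i] + FIELD_POWER[j]) * Q[i][j]
    return math.sqrt(max(var, 0.0))


def rms_brute_force(c, y, n=301):
    """Same quantity from a direct grid average of W over the unit disk."""
    s = s2 = 0.0
    count = 0
    for i in range(n):
        xp = -1.0 + 2.0 * i / (n - 1)
        for j in range(n):
            yp = -1.0 + 2.0 * j / (n - 1)
            r2 = xp * xp + yp * yp
            if r2 > 1.0:
                continue
            w = (c[0] * r2 * r2 + c[1] * y * yp * r2 + c[2] * y * y * yp * yp
                 + c[3] * y * y * r2 + c[4] * y ** 3 * yp)
            s += w
            s2 += w * w
            count += 1
    mean = s / count
    return math.sqrt(max(s2 / count - mean * mean, 0.0))


c = (0.05, -0.08, 0.03, 0.02, 0.01)  # hypothetical coefficients, in waves
print(rms_from_coefficients(c, 0.7), rms_brute_force(c, 0.7))
```

Feeding the resulting rms (in waves) into the Marechal formula then gives the quick Strehl estimate discussed at the beginning of the post.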

Now that we are comfortable with the power series expansion, we need to discuss how to compute the values of the ci for a given system. This is where the two-ray approach I mentioned comes into play!

The theory was first published by Philipp Ludwig von Seidel in 1856 and is still of utmost importance today. A complete derivation of the formulas can be found in Chapter 8 of Aberrations of Optical Systems by W. T. Welford. I must confess this book was really hard to find and I had to make some phone calls to get my hands on a copy. It is also relatively difficult to go through, so I don’t necessarily recommend reading it unless you are starving for an in-depth explanation of the formulas below.

The general idea used by Seidel was to propagate two rays through a single spherical interface: one passing through the center of the pupil and one passing through its edge. For a single interface, it is still manageable to express the relations analytically, as we did in our [»] post on raytracing, though you also have to approximate the square root function along the way. After a lot of substitutions (really, a lot of them) and equating the various terms with the power-series wavefront shown above, you end up with a series of formulas giving the so-called Seidel coefficients. Note that this choice of rays poses a problem for systems with a central obscuration, like most mirror-based telescopes.

There are two important remarks to be made.

First, an interesting fact that appears as you do the maths is that the coefficients do not depend on the actual reference sphere used for the wavefront. This is not the case for higher-order aberrations. That means only third-order aberrations are local features of the interfaces. Higher-order aberrations also depend on the spacing between the interfaces, which makes their coefficients harder to interpret than the third-order ones.

Second, all computations are made on the Gaussian image points of the lens, that is, their paraxial images! A direct consequence is that you do not need to perform full raytracing to obtain the wavefront estimate, only simple paraxial ABCD matrix computations. This makes the Seidel formulas really fast to apply.
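The paraxial computations mentioned above boil down to a few lines. This minimal sketch uses the y-nu form of paraxial raytracing (refraction n'u' = nu − hc(n' − n), transfer h → h + tu'), which is equivalent to multiplying (h, nu) by ABCD matrices; the lens data and function names are hypothetical. It traces the marginal and chief rays and checks that the Lagrange invariant L = n(ūh − uh̄) stays constant from surface to surface, a convenient sanity check before feeding the ray data to the Seidel formulas.

```python
def paraxial_trace(surfaces, h, u, n=1.0):
    """Paraxial (y-nu) trace, equivalent to applying ABCD matrices to (h, n*u).

    surfaces: list of (c, t, n_after) with c the curvature, t the distance to
    the next surface and n_after the index after the surface.
    Returns one tuple (h, u, u_after, n, n_after) per surface.
    """
    rows = []
    for c, t, n_after in surfaces:
        # Refraction: n' u' = n u - h c (n' - n)
        u_after = (n * u - h * c * (n_after - n)) / n_after
        rows.append((h, u, u_after, n, n_after))
        # Transfer to the next surface: h_next = h + t u'
        h = h + t * u_after
        u, n = u_after, n_after
    return rows


def lagrange_invariants(marginal, chief):
    """L = n (ubar h - u hbar), evaluated in front of every surface."""
    return [n * (ub * h - u * hb)
            for (h, u, _, n, _), (hb, ub, _, _, _) in zip(marginal, chief)]


# Hypothetical biconvex singlet (R = +/-50 mm, 5 mm thick, n = 1.5) with the
# stop on the first surface and the object at infinity.
lens = [(1 / 50, 5.0, 1.5), (-1 / 50, 95.0, 1.0)]
marginal = paraxial_trace(lens, h=10.0, u=0.0)  # edge of a 20 mm pupil
chief = paraxial_trace(lens, h=0.0, u=0.1)      # stop center, 0.1 rad field
print(lagrange_invariants(marginal, chief))
```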

Finally, since the maths leading to the formulas are quite lengthy, you are likely to run into different expressions of the same formulas. The ones I use here are those from W. T. Welford’s book, which can also be found in Handbook of Optical Systems, Volume 3: Aberration Theory and Correction of Optical Systems, which I have already quoted on this website. Using these, I was able to get the same results as Zemax OpticsStudio. A few years ago I also tried the formulas of Modern Optical Engineering by Warren J. Smith, but I failed to get correct results with them (I don’t necessarily blame the author; it may have been my fault too). They also lack the symmetrical beauty of W. T. Welford’s formulas, which I will consider as the reference from now on.

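In Seidel theory, the contribution of each interface in the system can be computed as

$$\begin{aligned}
S_I &= -A^2\,h\,\Delta\!\left(\frac{u}{n}\right) \\
S_{II} &= -A\bar{A}\,h\,\Delta\!\left(\frac{u}{n}\right) \\
S_{III} &= -\bar{A}^2\,h\,\Delta\!\left(\frac{u}{n}\right) \\
S_{IV} &= -L^2\,c\,\Delta\!\left(\frac{1}{n}\right) \\
S_V &= -\frac{\bar{A}}{A}\left[\bar{A}^2\,h\,\Delta\!\left(\frac{u}{n}\right) + L^2\,c\,\Delta\!\left(\frac{1}{n}\right)\right]
\end{aligned}$$

where h is the ray intercept height, c the surface curvature, u the incoming ray angle, u’ the exit ray angle, n the refractive index before the interface, n’ the refractive index after the interface, and

$$A = n\,i = n\,(h c + u), \qquad \bar{A} = n\,\bar{i} = n\,(\bar{h} c + \bar{u}), \qquad L = n\,(\bar{u} h - u \bar{h})$$

where (h, u) refers to the marginal ray and (h̄, ū) to the chief ray.

i is therefore the incidence angle of the marginal ray on the surface, ī is the incidence angle of the chief ray on the surface and L is the Lagrange invariant of the system.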

The quantity

$$\Delta(x) = x' - x$$

represents the change of x as it goes through the interface; for instance, Δ(u/n) = u’/n’ − u/n.

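Some logic emerges from the previous formulas that you can use to simplify computations since, for each surface,

$$S_{II} = \frac{\bar{A}}{A}\,S_I, \qquad S_{III} = \frac{\bar{A}}{A}\,S_{II}, \qquad S_V = \frac{\bar{A}}{A}\left(S_{III} + S_{IV}\right)$$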

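From Seidel’s derivations, the relations between the Si and the ci coefficients of the power series (in normalized pupil and field coordinates) are

$$c_1 = \frac{S_I}{8}, \qquad c_2 = \frac{S_{II}}{2}, \qquad c_3 = \frac{S_{III}}{2}, \qquad c_4 = \frac{S_{III} + S_{IV}}{4}, \qquad c_5 = \frac{S_V}{2}$$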

SI is therefore spherical aberration, SII coma, SIII astigmatism and SV distortion. The case of SIV is a bit more complex: it contributes to field curvature together with SIII and is called the Petzval curvature. If a system has no astigmatism, all object points focus on a sphere whose radius is given by the Petzval curvature.

The total Si terms are given by the sum of the individual Si over all interfaces. Every surface therefore contributes to the total aberration, and some surface contributions will balance others of opposite sign.
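Putting the pieces together, here is a hedged Python sketch of the per-surface Seidel contributions and their sums; it is not the #DevOptical implementation. It assumes Welford’s conventions, written out here for completeness: refraction n'u' = nu − hc(n' − n), A = n(hc + u), Ā = n(h̄c + ū), L = n(ūh − uh̄), S_I = −A²hΔ(u/n), S_II = −AĀhΔ(u/n), S_III = −Ā²hΔ(u/n), S_IV = −L²cΔ(1/n) and S_V = (Ā/A)(S_III + S_IV). The conversion to c1..c5 assumes c1 = S_I/8, c2 = S_II/2, c3 = S_III/2, c4 = (S_III + S_IV)/4 and c5 = S_V/2, with the marginal ray at the edge of the pupil and the chief ray at full field. Function names and the example surface are hypothetical.

```python
def seidel_contributions(surfaces, marginal, chief, n0=1.0):
    """Per-surface Seidel terms (S_I, S_II, S_III, S_IV, S_V).

    surfaces: list of (c, t, n_after); marginal and chief: (h, u) of each ray
    at the first surface, in the medium of index n0.
    """
    h, u = marginal
    hb, ub = chief
    L = n0 * (ub * h - u * hb)  # Lagrange invariant
    n = n0
    rows = []
    for c, t, n1 in surfaces:
        # Paraxial refraction of both rays: n' u' = n u - h c (n' - n)
        u1 = (n * u - h * c * (n1 - n)) / n1
        ub1 = (n * ub - hb * c * (n1 - n)) / n1
        A = n * (h * c + u)         # n * i for the marginal ray
        Ab = n * (hb * c + ub)      # n * i-bar for the chief ray
        d_un = u1 / n1 - u / n      # Delta(u/n)
        d_in = 1.0 / n1 - 1.0 / n   # Delta(1/n)
        s1 = -A * A * h * d_un
        s2 = -A * Ab * h * d_un
        s3 = -Ab * Ab * h * d_un
        s4 = -L * L * c * d_in
        if A != 0.0:
            s5 = (Ab / A) * (s3 + s4)
        elif s3 == 0.0 and s4 == 0.0:
            s5 = 0.0  # e.g. a dummy plane surface crossed by a ray with u = 0
        else:
            raise ValueError("A = 0 at this surface: S_V needs another form")
        rows.append((s1, s2, s3, s4, s5))
        # Transfer both rays to the next surface
        h, hb = h + t * u1, hb + t * ub1
        u, ub, n = u1, ub1, n1
    return rows


def seidel_totals(rows):
    """Sum each Seidel term over all surfaces."""
    return tuple(sum(column) for column in zip(*rows))


def wavefront_coefficients(S):
    """Map the totals to c1..c5 (normalized pupil and field)."""
    s1, s2, s3, s4, s5 = S
    return (s1 / 8.0, s2 / 2.0, s3 / 2.0, (s3 + s4) / 4.0, s5 / 2.0)


# Hypothetical surface R = -50 mm from air into n = 1.5, object at infinity,
# stop 50 mm in front of the surface, i.e. at its center of curvature.
surface = [(-1.0 / 50.0, 0.0, 1.5)]
marginal = (10.0, 0.0)     # 10 mm stop radius, parallel input ray
chief = (50.0 * 0.1, 0.1)  # from the stop center at 0.1 rad
totals = seidel_totals(seidel_contributions(surface, marginal, chief))
print(totals)
print(wavefront_coefficients(totals))
```

In this example the chief ray hits the surface at normal incidence (Ā = 0), so coma, astigmatism and distortion vanish while spherical aberration and the Petzval term remain, which is the Schmidt-telescope situation among the special cases listed next.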

Although this goes way beyond the scope of this article, a quick glance at the Seidel formulas shows that some aberrations can be cancelled completely under specific conditions. I will not cover them in detail, but we can list:

∆(u/n)=0 will cancel spherical aberration, coma and astigmatism. These are the Young-Weierstrass (aplanatic) points. Example usage: immersion lenses.

h=0 will cancel spherical aberration, coma and astigmatism. Example usage: field flatteners.

A=0 will cancel spherical aberration and coma. Example usage: the Offner concentric mirrors relay system.

Ā=0 will cancel coma, astigmatism and distortion. Example usage: the Schmidt telescope.
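These special cases can also be read directly off the per-surface formulas. This small sketch plugs arbitrary (hypothetical) values into Welford’s expressions S_I = −A²hΔ(u/n), S_II = −AĀhΔ(u/n), S_III = −Ā²hΔ(u/n), S_IV = −L²cΔ(1/n) and S_V = (Ā/A)(S_III + S_IV), zeroes one quantity at a time, and reports which terms vanish. S_V is left undefined when A = 0, since this form of it is singular there.

```python
NAMES = ("S_I", "S_II", "S_III", "S_IV", "S_V")


def surface_seidel(A, Ab, h, d_un, c, d_in, L):
    """Seidel terms of one surface from Welford's per-surface expressions."""
    s1 = -A * A * h * d_un
    s2 = -A * Ab * h * d_un
    s3 = -Ab * Ab * h * d_un
    s4 = -L * L * c * d_in
    s5 = (Ab / A) * (s3 + s4) if A != 0.0 else None  # singular form when A = 0
    return (s1, s2, s3, s4, s5)


# Arbitrary, hypothetical values for a generic surface.
BASE = dict(A=0.3, Ab=0.2, h=5.0, d_un=0.01, c=0.02, d_in=-0.3, L=1.0)


def cancelled(**override):
    """Names of the Seidel terms that vanish once `override` is applied to BASE."""
    terms = surface_seidel(**{**BASE, **override})
    return [name for name, value in zip(NAMES, terms) if value == 0.0]


print(cancelled(d_un=0.0))  # ['S_I', 'S_II', 'S_III']  aplanatic surface
print(cancelled(h=0.0))     # ['S_I', 'S_II', 'S_III']  surface at an image
print(cancelled(A=0.0))     # ['S_I', 'S_II']
print(cancelled(Ab=0.0))    # ['S_II', 'S_III', 'S_V']  stop at center of curvature
```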

When designing a system, you should always try to balance all the aberration types such that their sum is as close to zero as possible. Having all Si=0 means all ci will be zero too, and therefore rms=0 and a Strehl ratio equal to 1 (diffraction-limited system). Cancelling the sums is, however, not enough: you should also make sure that no individual contribution is very large when taken separately. Although their third-order contributions cancel out, this is not generally true for the higher orders, and a surface with a high third-order contribution will usually have non-negligible higher-order aberrations. Also, highly aberrated interfaces tend to have bad tolerance properties, and you are likely to end up with a system that is extremely difficult to build, from a manufacturing and/or assembly point of view. Design strategies, such as splitting single lenses into multiple ones, should be used to ensure no single surface contributes too much to the aberrations.

Balancing third-order aberrations using the formulas above is exactly what optical designers have been doing since empirical approaches were abandoned. Every degree of freedom allows you to minimize (if not cancel) one aberration at a time. For instance, in a single-lens system, you can alter the spherical aberration by playing with the lens bending (which is exactly what we did in our [»] former post, without mentioning where the formula came from). By also changing the aperture position relative to the lens, we can minimize both spherical aberration and coma, and so on.

Implementing this methodology gives you a base design (there can be multiple base designs achieving roughly the same performance, by the way). From the base design, and only from this point, you can switch to full raytracing optimization and tolerance analysis to optimize and validate your design, including all aberration orders.

As an example, the lens bendings of the long-working-distance system of Figure 6 were optimized to produce close-to-zero Seidel terms. The actual coefficients are given in Figure 7 and the predicted wavefront in Figure 8. All the results were validated against Zemax OpticsStudio and match perfectly. Note, however, that the actual raytraced off-axis wavefront error is λ/4 because of higher-order aberrations that become non-negligible here. The cause is probably the 50 µm of spherical aberration of the first element, which is compensated mostly by the strong bending of the second lens. If these results were not satisfactory, the next design step would probably be to replace the first lens with a doublet.

That is all for today! There is more to discuss, like the stop shift equations, the thin lens approximation of third-order aberration theory, design tips based on the third-order equations, etc., but this goes way beyond the scope of this post and we will have the occasion to come back to all of them later!

I would like to give a big thanks to James, Lilith, Cam, Samuel, Themulticaster, Sivaraman, Vaclav, Arif and Jesse who have supported this post through [∞] Patreon. Let me also take this occasion to invite you to donate through Patreon, even as little as \$1. I cannot stress it enough: you can really help me post more content and make more experiments! The device presented in this post was paid for 100% through the money collected on Patreon!
