Published: 2019-12-07 | Category: [»] Tutorials.

I recently presented [»] a method to find the methanol concentration of ethanol/methanol mixtures by using a method derived from a least-square fitting. I then promised to present a deeper introduction to the topic when I would have some time. That time has come :)

Before we start, I would like to point out that the method I will use here is not the one usually met on websites such as [∞] Wikipedia. It is actually the way I was introduced first to least squares fitting and has many advantages over the traditional description.

Also, you must be warned that you will someday encounter some I-know-it-all jackass who will pretend that standard (ordinary) least squares is not robust enough and will always fail to fit models properly. The truth is I have used the very same method I will describe here during my PhD thesis on hundred of thousands of holograms without a single failure and that the company I’m working for today use this exact same model on all of its instruments, computing thousands and thousands of high-quality data everyday around the world. So, when I tell you that OLS does work, you can take it for granted :) Obviously, as in all modelling operations, the results will depend on the quality of the input data and some procedure are better to handle extremely noisy data. But that does not mean that OLS is not an extremely good starting point!

So now that I set things straight, let’s dig into our problem.

Imagine you have some data with N points representing something of value. It can be 1-dimensionnal, like in a spectrum with wavelength-vs-intensity (y=f(x)) data, 2-dimensionnal like an image (z=f(x,y)) or n-dimensionnal to generalize (y=f(x0,…,xi)). The dimensionality of the problem is not an issue and you can always think about your data set as ‘N’ individual points. You would now like to fit a model on this data set because you know that it adequately represents your data. For instance, during my PhD thesis I was working a lot on holograms that had a parabolic phase offset due to an off-axis configuration. That offset had to be removed to leave only the phase of the sample being studied. Figure 1 shows an image taken during my PhD with yeasts cells with (left) and without (right) the parabolic background. The corrected image allows an effortless observation of the cell details.

Figure 1 - Parabolic background subtraction in holograms

In that case, the model was

Because we also have an experimental value zi for each couple of points (xi,yi) of the image we can compute the squared error of the dataset to the model

which requires the parameters a, b, c, d, e and f of the model.

The overall idea of the least-square method is to minimize the cost function to yield the parameters that would yield the best representation of the dataset (xi,yi,zi). There are optimization methods like the simplex algorithm (fminsearch in Matlab) designed to that but here we can use a simple algebraic resolution method.

Minimizing the cost function means cancelling all its partial derivatives in terms of model coefficients

You can see the cost function as a 6-dimensional function with variables (a,b,c,d,e,f) and its minimum requires (by definition of a minimum) that its gradient should be null at the position (a,b,c,d,e,f) that corresponds to a local or global minimum. Hence the 6 previous equations.

Computing the partial derivatives yields the system

which can be rearranged as

which can then be written using a matrix representation



the system can be solved as

In practice, it is generally not a good idea to explicitly inverse the matrix to solve the system of equations and method such as the Gauss-Jordan elimination or (much better) a SVD decomposition will yield better results. If using matlab, just use M\Y notation.

Two notes must be made about the matrix M: it will always be a square matrix and it will always be symmetric. So, the inverse will always exist as long as the model is not degenerated. Special case like matrices close to singularity are best processed using SVD algorithms but you shouldn’t worry about that if you are just using Matlab.

The previous results can be generalized for any dataset (xi,yi). Note that xi can be a n-dimensional vector.

If we would like to fit a model of M coefficient aj over M basis bj

The matrices will be

and equivalently

I left the results without proof but the equations are easily obtained by expressing the partial derivatives as we did previously.

One important thing to note is that the basis bj can be analytical functions or they can be dataset themselves. As example of the latter is when I presented the methanol/ethanol experiment I used the basis of pure ethanol and pure methanol spectra that I obtained experimentally.

That model was

For which we obtained the least-square solution

This model assumed that we had no loss of intensity and a more general model would have been

which relates to the previous one if there is no intensity loss, that is if α+β=1.

Solving the new model using the algebraic approach gives

The amount of methanol in the mixture would be expressed as

Which gives the previous results if α+β=1 (after some factorization).

I tested both model (α+β=1 and α+β≠1) and although they give slightly different results, the overall error when comparing Q to the actual wt% was the same (0.6% median error).

As you can see OLS are not really complicated and it is always the same method to apply. But it does not mean that the method is limited to straight-fitting operations. Actually, you can use OLS to do many things: integrating data, phase unwrapping etc. It is just a matter of expressing the correct set of equations. The data of Figure 1 were processed using a wrapped fit for instance.

Before we end, I would like to cover one special case which is the background removal of Raman spectra as presented on Figure 2.

Figure 2 - Experimental spectrum and fitted background

The basis I have used here are 3rd order b-splines with 6 knots producing a total of 6 functions with 6 parameters to fit. The internal details of b-splines functions are not important but you can see them as local polynoms that merges successively. Advantage of b-splines over a a6x6+a5x5+… model is that it is more robust and changes of a parameter value only has local effect on the generated function.

Basically, you can fit b-splines using the same technique as described previously but you will get unsatisfactory results with baselines that will have a tendency to pass through the peaks of your spectrum rather than supporting it. This is particularly evident when you have many intense peaks as they will “lift” the fit from the actual baseline. The solution is to define a curve that sticks to the data from below without ever getting above the experimental spectrum.

One way to achieve this behaviour is by using a custom cost functions such as the asymmetric quadratic function that I used to compute the baseline of Figure 2. The equation of the cost function is

where si is the actual spectrum data, ^ the fitted baseline and σ and the noise floor in the spectrum.

So the idea is to add to the cost function only the points that are below 3 times the noise floor. This will drive the curve to fit the data “from below”.

Unfortunately, we cannot use the previous algebraic procedure to solve such problems. There are different ways to do so but here I used a simplex algorithm (fminsearch in Matlab) to optimize the 6 coefficients. Just be aware that there are other ways to solve the problem.

One last remark concerns the value of σ. You can either use a fixed value that represent the noise of the spectrum (a priori known from previous experiments) or you can guess it from the data. The approach I used here was to initially fit the b-splines using the algebraic procedure, then remove peaks or abnormal regions using Tukey’s fence outliers detection and finally taking the standard deviation from the regions where the fit was more or less correct (the region between 1500 and 2500 cm-1 in Figure 2). This approach has the advantage of also leaving you with an initial guess of the model parameters which will speed up the simplex algorithm.

Spectrum background corrections does not always yield the desired results and you will need to play around with the parameters (number of knots, degree of the bspline etc.) and so I do not recommend it for automated processing of many spectra.

That’s all for today folks! I would like to give a big thanks to James who has supported this post through Patreon. If you’d like to support me too and get credits as well as many other stuff, consider [∞] donating :) You can make the difference!

Winter is coming and holidays too so I’ll take the occasion to get my hands dirty in the lab! See you really soon!

[⇈] Top of Page

You may also like:

[»] Quantifying Methanol in Ethanol using Raman Spectroscopy

[»] Robust Calibration Method for Spectrometers

[»] #DevOptical Part 19: A Quick Focus Algorithm

[»] Instancing Objects from Strings in C++

[»] Proportionality of the Least Square Coefficients in Raman Spectroscopy