Least squares

In regression analysis, least squares is a parameter estimation method that fits a model to a set of data points by minimizing the sum of the squares of the residuals. A residual is the difference between an observed value and the value the model predicts for it.

There are two main types of least-squares problems. Linear, or ordinary, least squares applies when the model is linear in its unknown parameters. The model itself need not be a straight line. This type has a closed-form solution that can be computed directly. Nonlinear least squares applies when the model depends nonlinearly on its parameters. It is usually solved iteratively, by approximating the model with a linear one at each step and refining the estimate.

Polynomial least squares describes the variance of a prediction of the dependent variable as a function of the independent variable and of the deviations of the data from the fitted curve. Least squares is widely used in science and mathematics because it gives a systematic way to estimate quantities from data.

History


The method of least squares grew from many ideas during the 1700s. Early thinkers like Isaac Newton and Roger Cotes suggested combining many observations to get a better estimate. Others, such as Tobias Mayer and Pierre-Simon Laplace, applied these ideas to study the movements of planets like Jupiter and Saturn.

Adrien-Marie Legendre published the first clear account of the method in 1805. Soon afterward it was widely used in astronomy and geodesy (mapping) in France, Italy, and Prussia. Carl Friedrich Gauss connected the method to probability theory. His successful prediction of the position of the asteroid Ceres, after it had been lost from view near the Sun, showed how accurate the method could be.

Problem statement

The goal is to adjust the parameters of a model function so that it fits a set of data points as well as possible. Each data point consists of an independent (input) variable, such as the number of hours studied, and a dependent (output) variable, such as the test score received. The model predicts the output from the input and a set of adjustable parameters.

The least-squares method measures the fit using the residuals, which are the differences between the observed outputs and the model's predictions. It squares each residual, adds the squares, and chooses the parameter values that make this sum as small as possible. The resulting model is the least-squares best fit.
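In symbols, one common way to write this, with data pairs (x_i, y_i) for i = 1, …, n and a model f(x, β) with parameter vector β (notation introduced here for illustration), is:

```latex
r_i = y_i - f(x_i, \boldsymbol{\beta}), \qquad
S(\boldsymbol{\beta}) = \sum_{i=1}^{n} r_i^{2}, \qquad
\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} S(\boldsymbol{\beta}).
```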

In the simplest case, a model consisting of a single constant, the least-squares estimate is the arithmetic mean of the observed values. More complex models, such as a straight line or a polynomial, have more parameters to adjust.
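The claim about the constant model can be checked numerically. The sketch below assumes NumPy, and the helper name `fit_constant` is hypothetical. It fits f(x, β) = β with a design matrix made of a single column of ones. The normal equation is then nβ = Σ y_i, so the estimate equals the mean.

```python
import numpy as np

def fit_constant(y):
    """Least-squares fit of the constant model f(x, beta) = beta.

    The design matrix is one column of ones, so the normal equation
    n * beta = sum(y) gives beta = mean(y).
    """
    y = np.asarray(y, dtype=float)
    X = np.ones((y.size, 1))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0]

y = [2.0, 4.0, 9.0]
print(fit_constant(y), np.mean(y))  # both approximately 5.0
```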

Limitations

Least squares accounts only for errors in the dependent variable, the quantity being observed, and treats the independent variables as exact. An alternative, total least squares, can account for errors in both the dependent and the independent variables.

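Regression is used in two distinct contexts, and this limitation matters differently in each:

For making predictions. The fitted model serves as a rule for predicting new values of the dependent variable, on the assumption that future observations are subject to the same kinds of error as the data used for fitting.
For finding a true relationship. Here the usual assumption is that the independent variables are measured without error. When they do contain errors, errors-in-variables models can give better estimates than ordinary least squares. Total least squares is one such approach, which balances the effects of errors in the different variables.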

Main articles: Regression analysis, Independent variable, Models of measurement error, Parameter estimates, Hypothesis testing, Confidence intervals

Solving the least squares problem

The least-squares fit is found by minimizing the sum of the squared differences between the observed values and the model's predictions. At the minimum, the gradient of this sum is zero. In other words, the partial derivative of the sum with respect to each model parameter is zero. Solving these conditions gives the best-fitting parameter values.
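Written out, using S for the sum of squared residuals r_i = y_i − f(x_i, β) over n observations and m for the number of parameters β_j (notation introduced here for illustration), the zero-gradient conditions are shown below. For a model that is linear in its parameters, they reduce to the normal equations. In that case X is the n × m matrix whose entries X_ij multiply β_j in the model.

```latex
\frac{\partial S}{\partial \beta_j}
  = 2\sum_{i=1}^{n} r_i \,\frac{\partial r_i}{\partial \beta_j} = 0,
  \qquad j = 1, \dots, m.

% Linear model: f(x_i, \boldsymbol{\beta}) = \sum_{j=1}^{m} X_{ij}\beta_j,
% so \partial r_i / \partial \beta_j = -X_{ij} and the conditions become
\mathbf{X}^{\mathsf{T}}\mathbf{X}\,\hat{\boldsymbol{\beta}} = \mathbf{X}^{\mathsf{T}}\mathbf{y}.
```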

Least-squares problems fall into two categories, linear and nonlinear. In linear least squares, the model is a linear combination of the parameters. The zero-gradient conditions are then linear equations that can be solved in closed form. In nonlinear least squares, the model depends nonlinearly on the parameters, and the conditions generally have no closed-form solution. One starts from an initial guess and refines it iteratively until the estimates converge.
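The two cases can be contrasted in a short sketch, assuming NumPy. For the linear case it solves the problem directly with `numpy.linalg.lstsq`. For the nonlinear case it uses the Gauss–Newton method, which is one common iterative scheme among several, such as Levenberg–Marquardt. Each Gauss–Newton step linearizes the model around the current estimate and solves a linear least-squares problem for the update. The exponential model, the data, the starting guess, and the function names are illustrative assumptions.

```python
import numpy as np

def linear_fit(X, y):
    """Linear least squares: direct, non-iterative solution."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def gauss_newton(f, jac, beta0, x, y, iters=20):
    """Nonlinear least squares by Gauss-Newton iteration.

    Linearize f(beta + step) ~ f(beta) + J @ step, then choose the step
    that minimizes ||r - J @ step||^2 (a linear least-squares problem).
    """
    beta = np.asarray(beta0, dtype=float)
    for _ in range(iters):
        r = y - f(x, beta)       # current residuals
        J = jac(x, beta)         # n x m Jacobian of the model
        step, *_ = np.linalg.lstsq(J, r, rcond=None)
        beta = beta + step
        if np.linalg.norm(step) < 1e-12:
            break
    return beta

# Linear example: straight line y = b0 + b1 * x with exact data.
x = np.array([0.0, 1.0, 2.0, 3.0])
y_lin = 1.0 + 2.0 * x
X = np.column_stack([np.ones_like(x), x])
print(linear_fit(X, y_lin))  # approximately [1, 2]

# Nonlinear example: hypothetical model y = a * exp(b * x).
def model(x, beta):
    a, b = beta
    return a * np.exp(b * x)

def model_jac(x, beta):
    a, b = beta
    e = np.exp(b * x)
    return np.column_stack([e, a * x * e])  # columns: df/da, df/db

y_nl = model(x, [2.0, 0.5])
print(gauss_newton(model, model_jac, [1.5, 0.4], x, y_nl))  # approximately [2, 0.5]
```

Gauss–Newton is only guaranteed to converge from starting guesses reasonably close to the solution. That is why the sketch starts near the true parameters.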

Example

Consider a spring that stretches when a force is applied to it. Hooke's law states that the extension of the spring, y, is proportional to the applied force, F. The law can be written y = kF, where k is a constant that characterizes the spring.

To estimate k, one measures the extension y_i at n different forces F_i. Each measurement contains some error, so the data points do not lie exactly on a line. The least-squares estimate of k is the value that minimizes the sum of the squared residuals, S = Σ (y_i − kF_i)². With this estimate, Hooke's law can predict the extension for any applied force.
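For this one-parameter model, setting dS/dk to zero gives the closed form k̂ = Σ F_i y_i / Σ F_i². The sketch below computes this estimate and then uses it for a prediction. The measurements are made-up numbers for illustration, and `fit_hooke` is a hypothetical helper name.

```python
import numpy as np

def fit_hooke(F, y):
    """Least-squares estimate of k in the model y = k * F.

    dS/dk = -2 * sum(F_i * (y_i - k * F_i)) = 0
    gives k_hat = sum(F_i * y_i) / sum(F_i ** 2).
    """
    F = np.asarray(F, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.dot(F, y) / np.dot(F, F)

# Hypothetical measurements: applied forces and measured extensions.
F = [1.0, 2.0, 3.0, 4.0]
y = [0.52, 0.98, 1.55, 2.01]
k_hat = fit_hooke(F, y)
print(k_hat)        # approximately 0.5057
print(k_hat * 5.0)  # predicted extension for a new force F = 5.0
```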

Uncertainty quantification

In least-squares calculations it is important to quantify how certain the results are. This is done by estimating the variance of each parameter estimate. The variance describes how widely the estimate would spread over repeated experiments, and so it indicates how reliable the estimate is.

These parameter variances depend on the variance of the measurement errors, which is usually unknown. The error variance is therefore replaced by an estimate computed from the minimized sum of squared residuals S: σ̂² = S/(n − m). Here n is the number of observations and m is the number of parameters. The denominator n − m, the number of statistical degrees of freedom, shows how the amount of data relative to the number of parameters affects the uncertainty.
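For a linear model, these quantities can be computed as in the sketch below, which assumes NumPy. It estimates σ̂² = S/(n − m) and then the parameter covariance σ̂² (XᵀX)⁻¹, where X is the design matrix. That covariance formula assumes uncorrelated errors with constant variance. The function name and the data are hypothetical.

```python
import numpy as np

def parameter_uncertainty(X, y):
    """Fit a linear least-squares model and estimate its uncertainty.

    Returns the parameter estimates, the error-variance estimate
    S / (n - m), and the standard error of each parameter, taken from
    the diagonal of sigma2_hat * inv(X^T X).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    S = r @ r                              # minimized sum of squares
    sigma2_hat = S / (n - m)               # n - m degrees of freedom
    cov = sigma2_hat * np.linalg.inv(X.T @ X)
    return beta, sigma2_hat, np.sqrt(np.diag(cov))

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])    # hypothetical measurements
X = np.column_stack([np.ones_like(x), x])  # straight-line model
beta, sigma2_hat, se = parameter_uncertainty(X, y)
print(beta, sigma2_hat, se)
```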

Statistical testing

If the probability distribution of the parameter estimates is known, or can be approximated, confidence limits can be computed and hypothesis tests carried out. A common assumption is that the errors follow a normal distribution. Under this assumption, the least-squares estimates are also the maximum-likelihood estimates.

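A key result is the Gauss–Markov theorem. It states that when the errors have zero mean, are uncorrelated, and have equal variances, the least-squares estimator of a linear model is the best linear unbiased estimator. Among all linear unbiased estimators, it has the smallest variance. The errors do not have to be normally distributed for large samples. With enough data, the central limit theorem often implies that the parameter estimates are approximately normally distributed, so the usual confidence intervals remain approximately valid.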
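As an illustration, consider a linear model with n observations, m parameters, design matrix X, and independent, normally distributed errors of constant variance. Using the error-variance estimate σ̂² = S/(n − m), the standard 100(1 − α)% confidence interval for a parameter β_j is given below. Here t is the corresponding quantile of Student's t distribution with n − m degrees of freedom.

```latex
\hat{\beta}_j \;\pm\; t_{1-\alpha/2,\; n-m}\,
  \sqrt{\hat{\sigma}^{2}\,\big[(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\big]_{jj}},
\qquad \hat{\sigma}^{2} = \frac{S}{n-m}.
```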

Weighted least squares

Main article: Weighted least squares

Weighted least squares is a generalization of least squares. It is used when the variance of the errors differs between observations (heteroscedasticity), for example when measurements at some input values are noisier than at others. Each squared residual is multiplied by a weight, typically the reciprocal of that observation's error variance. Less precise observations therefore have less influence on the fit.
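One simple way to compute a weighted fit is to multiply each row of the problem by the square root of its weight and then solve an ordinary least-squares problem. The sketch below assumes NumPy and takes weights w_i = 1/σ_i². The per-point standard deviations σ_i, the data, and the function name are hypothetical.

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Minimize sum_i w_i * (y_i - X_i @ beta)**2.

    Scaling row i of X and y by sqrt(w_i) turns the weighted problem
    into an ordinary least-squares problem.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sw = np.sqrt(np.asarray(w, dtype=float))
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 1.2, 1.9, 3.5])
sigma = np.array([0.1, 0.1, 0.5, 1.0])   # hypothetical error std devs
X = np.column_stack([np.ones_like(x), x])
print(weighted_least_squares(X, y, 1.0 / sigma**2))
```

With all weights equal, the result reduces to the ordinary least-squares fit.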

Relationship to principal components

The first principal component of a set of points is the line through their mean that comes closest to the points. Closeness is measured by the sum of the squared perpendicular distances from the points to the line, so all coordinates are treated symmetrically.

Linear least squares, by contrast, minimizes distances in only one direction, that of the dependent variable. Both methods minimize a sum of squared errors. However, least squares singles out the dependent variable, while principal component analysis treats all variables alike.
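The difference can be seen by comparing two slopes for the same two-dimensional data. The first minimizes vertical distances (least squares of y on x). The second comes from the first principal component, computed here from the singular value decomposition of the centered data. The sketch assumes NumPy, and the data and function names are illustrative.

```python
import numpy as np

def ols_slope(x, y):
    """Slope minimizing squared vertical (y-direction) distances."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)

def first_pc_slope(x, y):
    """Slope of the first principal component: the line through the mean
    minimizing squared perpendicular distances."""
    A = np.column_stack([x - x.mean(), y - y.mean()])
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    dx, dy = Vt[0]            # direction of largest variance
    return dy / dx            # ratio is unaffected by the sign of Vt[0]

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.2, 1.1, 1.8, 3.3, 3.9])
print(ols_slope(x, y), first_pc_slope(x, y))
```

Both lines pass through the mean of the points. For noisy data their slopes generally differ, and they agree when the points lie exactly on a line.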

Relationship to measure theory

The statistician Sara van de Geer used empirical process theory and the Vapnik–Chervonenkis dimension to prove that a least-squares estimator can be interpreted as a measure on the space of square-integrable functions.

Regularization

Main articles: Regularized least squares, Tikhonov regularization

In some settings a regularized version of least squares gives better results. Tikhonov regularization, known in statistics as ridge regression, adds a penalty proportional to the squared Euclidean norm of the parameter vector. This keeps the parameter values small and helps prevent overfitting.
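With penalty weight α, ridge regression minimizes ‖y − Xβ‖² + α‖β‖². Setting the gradient to zero gives the linear system (XᵀX + αI)β = Xᵀy. The sketch below solves that system with NumPy. For simplicity it penalizes every parameter, whereas in practice an intercept term is often left unpenalized. The function name and the small orthogonal example are illustrative assumptions.

```python
import numpy as np

def ridge(X, y, alpha):
    """Ridge (Tikhonov) least squares.

    Minimizes ||y - X beta||^2 + alpha * ||beta||^2; the minimizer solves
    (X^T X + alpha * I) beta = X^T y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(m), X.T @ y)

# Two orthogonal columns; y depends strongly on the first, weakly on the second.
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
y = np.array([3.1, 2.9, -2.9, -3.1])
print(ridge(X, y, 0.0))  # ordinary least squares: approximately [3, 0.1]
print(ridge(X, y, 4.0))  # shrunk toward zero: approximately [1.5, 0.05]
```

Both coefficients shrink, but neither becomes exactly zero.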

Another method, the lasso, penalizes or constrains the sum of the absolute values of the parameters (their L1 norm) instead. Unlike ridge regression, the lasso can drive some parameters exactly to zero, which removes the corresponding variables from the model. This makes the lasso useful for selecting the variables that matter most.
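The zeroing effect can be seen in a small solver. The sketch below minimizes ½‖y − Xβ‖² + λ‖β‖₁ by cyclic coordinate descent with soft-thresholding. This is one of several possible algorithms and is not presented as a standard implementation. Each coordinate update is the exact minimizer of the objective in that one parameter, and the soft-threshold sets it to exactly zero when its correlation with the partial residual is at most λ in magnitude. NumPy is assumed, and the names, data, and λ value are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso(X, y, lam, iters=500):
    """Minimize 0.5 * ||y - X beta||^2 + lam * ||beta||_1
    by cyclic coordinate descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = X.shape
    beta = np.zeros(m)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(iters):
        for j in range(m):
            # Partial residual with feature j's contribution removed.
            r_j = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r_j, lam) / col_sq[j]
    return beta

# Same kind of data as a ridge example: the second column barely matters.
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
y = np.array([3.1, 2.9, -2.9, -3.1])
print(lasso(X, y, lam=1.0))  # approximately [2.75, 0.0]
```

Here the second coefficient is set exactly to zero, whereas ridge regression would only shrink it.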