I find that Simple Linear Regression is a great introduction to machine learning because it is intuitive.
The context is, as the name suggests, simple: the dependent variable y is a function of only one independent variable, X.
The High School scenario
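When all the points lie on the same line, we are back in the scenario we saw in high school.
To find the equation of that line, $y = \beta_0 + \beta_1 X$, we just need to do the calculation below with any two points $(x_1, y_1)$ and $(x_2, y_2)$:
$$\beta_1 = \frac{y_2 - y_1}{x_2 - x_1}, \qquad \beta_0 = y_1 - \beta_1 x_1$$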
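As a minimal sketch, here is the two-point calculation in Python. The function name `line_through` is hypothetical, and the example points are made up so that they all lie on the line $y = 1 + 2X$.

```python
def line_through(p1, p2):
    """Return (beta0, beta1): the intercept and slope of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("x1 == x2: the line is vertical and has no slope")
    beta1 = (y2 - y1) / (x2 - x1)   # slope: rise over run
    beta0 = y1 - beta1 * x1         # intercept: solve y1 = beta0 + beta1 * x1
    return beta0, beta1


# Points that all lie on y = 1 + 2x give the same line whichever pair we pick
print(line_through((0, 1), (2, 5)))   # (1.0, 2.0)
print(line_through((1, 3), (4, 9)))   # (1.0, 2.0)
```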
Now, what happens if the points do not all fall on the same line?
In this case, applying the formula above to different pairs of points gives different results, and thus different lines.
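To see this concretely, the sketch below computes the line through every pair of four made-up points that do not share a common line. The `line_through` helper is hypothetical; each of the six pairs yields a different intercept and slope.

```python
from itertools import combinations


def line_through(p1, p2):
    """Return (beta0, beta1): the intercept and slope of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    beta1 = (y2 - y1) / (x2 - x1)
    beta0 = y1 - beta1 * x1
    return beta0, beta1


# Hypothetical points that do NOT all fall on one line
points = [(1, 2.0), (2, 4.5), (3, 5.5), (4, 8.5)]

# One line per pair of points: 4 points give 6 pairs
lines = [line_through(p1, p2) for p1, p2 in combinations(points, 2)]
for (p1, p2), (beta0, beta1) in zip(combinations(points, 2), lines):
    print(f"{p1} & {p2}: y = {beta0:.2f} + {beta1:.2f}x")
```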
For this reason, we won’t be searching for the line that passes through all the points anymore. Instead, we will find what we call the line of best fit.
But before we get there, we need to understand why such a case might happen.
The Error is… not Only Human
Next time you’re on a treadmill, monitor your pulse rate as a function of speed. Do this for a couple of sessions and then combine your data.
What you will find is that, at the same speed, your pulse rate varies from one session to the next. Isn’t that strange?
If you think about it, the same system regulates your pulse rate every time, so why would the output differ from one moment to another?
Well, maybe one day you were more tired and the next you had more energy. Maybe one day you had exercised beforehand and the next you had sat on the couch all day. We can reasonably assume that such factors affect your pulse rate.
The problem is that in practice you can’t monitor all the tiny variables that affect your pulse rate. That said, being unable to measure them does not mean we can ignore them. Since we don’t know the equation of the true underlying system, we have to account for them somehow if we want to find the line of best fit.
To take these variables into account, we lump them into what we call the error term ($\varepsilon$). Instead of assuming that each point lies exactly on the line, we attribute an error to each one. The equation becomes:
$$y = \beta_0 + \beta_1 X + \varepsilon$$
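The sketch below simulates the treadmill scenario under the model $y = \beta_0 + \beta_1 X + \varepsilon$. Everything in it is an assumption for illustration: the "true" parameter values are invented, and $\varepsilon$ is drawn from a normal distribution with mean zero, a common modelling choice rather than something required by the text above. The function name `simulate_pulses` is hypothetical.

```python
import random

# Hypothetical "true" parameters: in reality these are unknown.
TRUE_BETA0 = 60.0   # made-up baseline pulse (bpm)
TRUE_BETA1 = 8.0    # made-up extra beats per minute for each additional km/h


def simulate_pulses(speeds, noise_sd=5.0, seed=0):
    """Draw y = beta0 + beta1 * x + epsilon for each speed x.

    epsilon ~ Normal(0, noise_sd) stands in for everything we cannot measure
    (tiredness, earlier exercise and so on). The normal distribution is an assumption.
    """
    rng = random.Random(seed)   # seeded so the run is reproducible
    return [TRUE_BETA0 + TRUE_BETA1 * x + rng.gauss(0.0, noise_sd) for x in speeds]


speeds = [6, 8, 10, 12] * 3     # the same four speeds over three sessions
pulses = simulate_pulses(speeds)
for x, y in zip(speeds, pulses):
    print(f"speed {x:>2} km/h -> pulse {y:5.1f} bpm")
```

Because of $\varepsilon$, the same speed appears with a different pulse in each session, which is exactly why no single line passes through every point.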
Line of Best Fit
Intuitively, you can already trace a straight line in your mind just by looking at a chart with some points, like for the earlier examples. Let’s now formalize that intuition by using the error term and the parameters to find the line of best fit.
We will never know the true values of $\beta_0$ and $\beta_1$, so we will need to estimate them. To distinguish the estimates from the true values, we will write them as $\hat{\beta}_0$ and $\hat{\beta}_1$.
To come up with the line of best fit, we now consider the errors between the points in our data and our estimated regression line (in red in the chart below).
You can see that sometimes the points are above our line (positive errors) and sometimes below it (negative errors). To keep positive errors from cancelling out negative ones, we square each error before averaging them. With $n$ data points $(x_i, y_i)$, this gives our error function, called the Mean Squared Error (MSE):
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2$$
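A minimal Python version of the MSE, with hypothetical function names and made-up data. It also illustrates why we square: for a flat line at the mean of the $y_i$, the signed errors sum to exactly zero even though the line fits poorly, while the MSE is clearly positive.

```python
def mse(beta0_hat, beta1_hat, xs, ys):
    """Mean squared error of the line y = beta0_hat + beta1_hat * x on the data."""
    errors = [y - (beta0_hat + beta1_hat * x) for x, y in zip(xs, ys)]
    return sum(e ** 2 for e in errors) / len(errors)


def raw_error_sum(beta0_hat, beta1_hat, xs, ys):
    """Plain sum of signed errors, with no squaring (for comparison only)."""
    return sum(y - (beta0_hat + beta1_hat * x) for x, y in zip(xs, ys))


xs = [1, 2, 3, 4]
ys = [2.0, 4.5, 5.5, 8.5]       # made-up data points
y_mean = sum(ys) / len(ys)      # 5.125

# A flat line at the mean of y: signed errors cancel, squared errors do not
print(raw_error_sum(y_mean, 0.0, xs, ys))  # 0.0
print(mse(y_mean, 0.0, xs, ys))            # 5.421875
```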
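The line of best fit is the line that minimizes our error function. By setting the partial derivatives of the MSE with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$ to zero and solving, you can show that the line of best fit has parameters:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
where $\bar{x}$ and $\bar{y}$ are the means of the $x_i$ and the $y_i$.
With these values the MSE is at its minimum, so we have found the line of best fit.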
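The closed-form estimates can be sketched in pure Python as follows (function names hypothetical, data made up). As a sanity check, nudging either estimate away from the formula's values should only increase the MSE.

```python
def mse(beta0_hat, beta1_hat, xs, ys):
    """Mean squared error of the line y = beta0_hat + beta1_hat * x."""
    return sum((y - beta0_hat - beta1_hat * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_simple_ols(xs, ys):
    """Closed-form least-squares estimates (beta0_hat, beta1_hat)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of (x_i - x_bar)(y_i - y_bar); denominator: sum of (x_i - x_bar)^2
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    beta1_hat = sxy / sxx
    beta0_hat = y_bar - beta1_hat * x_bar   # the fitted line passes through (x_bar, y_bar)
    return beta0_hat, beta1_hat


xs = [1, 2, 3, 4]
ys = [2.0, 4.5, 5.5, 8.5]   # made-up data points

b0, b1 = fit_simple_ols(xs, ys)
best = mse(b0, b1, xs, ys)
print(f"beta0_hat = {b0:.3f}, beta1_hat = {b1:.3f}, MSE = {best:.4f}")

# Nudging either estimate away from the optimum should only increase the MSE
shifts = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
for d0, d1 in shifts:
    print(f"shift ({d0:+.1f}, {d1:+.1f}): MSE = {mse(b0 + d0, b1 + d1, xs, ys):.4f}")
```

In practice, a library routine such as NumPy's `numpy.polyfit(x, y, 1)`, which returns the slope before the intercept, computes the same least-squares line.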
Can we assume that minimizing the MSE automatically leaves us with a good simple linear regression model? Not necessarily. There are a couple of assumptions we need to validate, and we need some tools to assess the performance of our model.