There is one more bit of elegance to introduce: since the equations are linear in a and b, a matrix formulation is appropriate. Let:
(17)
(18)
Then we have the lovely solution:
(19)
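For readers who want to see the matrices written out, here is one way they might look. Treat it as a sketch under assumptions rather than a transcription of Equations 17 through 19: it assumes the fitted line has the form y = a + bx and that there are N data points (x_i, y_i). Only the name Q comes from the text; the names U (the right-hand vector) and c (the vector of unknowns) are labels chosen here for illustration.

```latex
% Assumed model: y = a + b x, fitted to N points (x_i, y_i).
% All sums run over i = 1..N. U and c are illustrative names.
\begin{align*}
Q &= \begin{bmatrix} N & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}, &
U &= \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}, &
c &= \begin{bmatrix} a \\ b \end{bmatrix} \\
Q\,c &= U
\quad\Longrightarrow\quad
c = Q^{-1} U
\end{align*}
```

Written this way, the four distinct summations are the sums of x, x squared, y, and xy, and every entry of Q involves only N and the x's.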
Implications
After all the math we needed to define the "best fit" criterion, the final result is satisfyingly simple. Only four of the summations are unique. Yes, we have to invert the matrix Q, but even that inversion is easy, because no matter how many data points we have, Q is always just a 2x2 matrix. The 2x2 matrix is special in that, unlike its larger cousins, we can write down its inverse directly, without any numerical tricks. First, the determinant is:
(20)
Then the inverse is given by swapping the two diagonal terms, negating the other two, and dividing every term by the determinant D:
(21)
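The recipe above (swap the diagonal terms, negate the other two, divide by the determinant) can be sketched in a few lines of code. This is a minimal illustration, not the author's implementation: it assumes the fitted line is y = a + b*x, and the function name `fit_line` and its variable names are made up for the example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x via the explicit 2x2 inverse.

    Assumption: the model is y = a + b*x, so Q = [[n, sx], [sx, sxx]].
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")

    # The four distinct summations.
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))

    # Determinant of Q.
    d = n * sxx - sx * sx
    if d == 0:
        raise ValueError("need at least two distinct x values")

    # Inverse of Q is (1/d) * [[sxx, -sx], [-sx, n]]:
    # diagonal terms swapped, off-diagonal terms negated, all divided by d.
    a = (sxx * sy - sx * sxy) / d
    b = (n * sxy - sx * sy) / d
    return a, b


if __name__ == "__main__":
    # Points lying exactly on y = 1 + 2x, listed in no particular order.
    print(fit_line([3, 0, 2, 1], [7, 1, 5, 3]))
```

Note that the determinant check is the only place the inversion can fail, and it fails only when all the x's are identical.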
It's important to note that Q doesn't depend on the measured y-values at all, only on the x's. In the general case, this fact doesn't matter much. But it has huge implications if I know the values of the x's in advance.
You'll note that in my simple example, I used values of x_i that were evenly spaced, starting with x_1 = 0. This is not at all required. Equations 16 through 18 work just as well if the x's are unevenly spaced. They don't even need to be in any kind of order. But in many cases, particularly in embedded work, I do know that the data is going to be measured at regular intervals. What's more, with a little judicious scaling and base-shifting, I can always turn the x-values into the sequence [0, 1, 2, 3, ...]. In this important special case, Q never changes for a given number of data points. It's a constant we know in advance. Therefore we also know its inverse in advance, and the need to invert the matrix at run time goes away completely.
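To make the special case concrete, here is a minimal sketch of precomputing the inverse once for x = 0, 1, ..., n-1 and reusing it for every new batch of measurements. It again assumes the model y = a + b*x, with x being the sample index; the names `make_fitter` and `fit` are hypothetical, chosen only for this example.

```python
def make_fitter(n):
    """Build a fit function for n samples taken at x = 0, 1, ..., n-1.

    Assumption: the model is y = a + b*x with x equal to the sample index.
    The inverse of Q is computed here, once, and never again.
    """
    if n < 2:
        raise ValueError("need at least two samples")

    # Closed-form sums for x = 0..n-1 (exact integer arithmetic).
    sx = n * (n - 1) // 2
    sxx = (n - 1) * n * (2 * n - 1) // 6
    d = n * sxx - sx * sx  # determinant of Q

    # Entries of the constant inverse, (1/d) * [[sxx, -sx], [-sx, n]].
    i00, i01, i11 = sxx / d, -sx / d, n / d

    def fit(ys):
        if len(ys) != n:
            raise ValueError("expected exactly %d samples" % n)
        # Only the two y-dependent sums are computed per batch.
        sy = sum(ys)
        sxy = sum(i * y for i, y in enumerate(ys))
        a = i00 * sy + i01 * sxy
        b = i01 * sy + i11 * sxy
        return a, b

    return fit


if __name__ == "__main__":
    fit4 = make_fitter(4)          # done once, e.g. at startup
    print(fit4([1, 3, 5, 7]))      # approximately (1.0, 2.0)
    print(fit4([2, 2, 2, 2]))      # approximately (2.0, 0.0)
```

If the real samples were taken at times t0 + i*dt, then a is the fitted value at t0 and the slope in real units is b/dt, which is the scaling and base-shifting step undone after the fit.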