More with Least Squares

One very cool thing about our formula for the coefficients of the least squares regression line, $$\mathtt{\left(X^{T}X\right)^{-1}X^{T}y}$$, is that it is the same whether we have one independent variable (univariate) or many independent variables (multivariate).
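To see why the formula does not change, it can help to write out $$\mathtt{X}$$ in both cases. This is only a sketch, and it assumes that each row of $$\mathtt{X}$$ holds one observation and that a column of ones is included so the fit has an intercept. The subscripts and the column order are notational choices made here, with $$\mathtt{x_{i1}}$$ and $$\mathtt{x_{i2}}$$ standing for the first and second input of observation $$\mathtt{i}$$.

$$
X_{\text{univariate}} =
\begin{bmatrix}
x_{11} & 1 \\
x_{21} & 1 \\
\vdots & \vdots \\
x_{n1} & 1
\end{bmatrix},
\qquad
X_{\text{multivariate}} =
\begin{bmatrix}
x_{11} & x_{12} & 1 \\
x_{21} & x_{22} & 1 \\
\vdots & \vdots & \vdots \\
x_{n1} & x_{n2} & 1
\end{bmatrix}
$$

In both cases the formula returns one coefficient per column of $$\mathtt{X}$$: a slope and an intercept for the first matrix, two slopes and an intercept for the second.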

Consider these data, showing the selling prices of some grandfather clocks at auction. The first scatter plot shows the age of the clock in years on the x-axis (100–200), and the second shows the number of bidders on the x-axis (0–20). Price (in pounds or dollars) is on the y-axis on each plot (500–2500).

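| Age (years) | Bidders | Price (\$) |
| --- | --- | --- |
| 127 | 13 | 1235 |
| 115 | 12 | 1080 |
| 127 | 7 | 845 |
| 150 | 9 | 1522 |
| 156 | 6 | 1047 |
| 182 | 11 | 1979 |
| 156 | 12 | 1822 |
| 132 | 10 | 1253 |
| 137 | 9 | 1297 |
| 113 | 9 | 946 |
| 137 | 15 | 1713 |
| 117 | 11 | 1024 |
| 137 | 8 | 1147 |
| 153 | 6 | 1092 |
| 117 | 13 | 1152 |
| 126 | 10 | 1336 |
| 170 | 14 | 2131 |
| 182 | 8 | 1550 |
| 162 | 11 | 1884 |
| 184 | 10 | 2041 |
| 143 | 6 | 854 |
| 159 | 9 | 1483 |
| 108 | 14 | 1055 |
| 175 | 8 | 1545 |
| 108 | 6 | 729 |
| 179 | 9 | 1792 |
| 111 | 15 | 1175 |
| 187 | 8 | 1593 |
| 111 | 7 | 785 |
| 115 | 7 | 744 |
| 194 | 5 | 1356 |
| 168 | 7 | 1262 |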

You can see in the notebook below that the first regression line, for the price of a clock as a function of its age, is approximately $$\mathtt{10.5x-192}$$. The second regression line, for the price of a clock as a function of the number of bidders at auction, is approximately $$\mathtt{55x+806}$$. As mentioned above, each of these univariate least squares regression lines can be calculated with the formula $$\mathtt{\left(X^{T}X\right)^{-1}X^{T}y}$$.
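The two univariate fits can be reproduced with a few lines of NumPy. The sketch below is one possible way to do it, not necessarily how the notebook does it. It assumes that $$\mathtt{X}$$ is built by placing the input column next to a column of ones, so the formula returns a slope followed by an intercept. The helper names `least_squares` and `with_intercept` are made up for this illustration.

```python
import numpy as np

# (age in years, number of bidders, price) for the 32 clocks in the table above.
clocks = np.array([
    (127, 13, 1235), (115, 12, 1080), (127, 7, 845), (150, 9, 1522),
    (156, 6, 1047), (182, 11, 1979), (156, 12, 1822), (132, 10, 1253),
    (137, 9, 1297), (113, 9, 946), (137, 15, 1713), (117, 11, 1024),
    (137, 8, 1147), (153, 6, 1092), (117, 13, 1152), (126, 10, 1336),
    (170, 14, 2131), (182, 8, 1550), (162, 11, 1884), (184, 10, 2041),
    (143, 6, 854), (159, 9, 1483), (108, 14, 1055), (175, 8, 1545),
    (108, 6, 729), (179, 9, 1792), (111, 15, 1175), (187, 8, 1593),
    (111, 7, 785), (115, 7, 744), (194, 5, 1356), (168, 7, 1262),
], dtype=float)

# Split the three columns into separate 1-D arrays.
age, bidders, price = clocks.T


def least_squares(X, y):
    """Evaluate (X^T X)^{-1} X^T y literally, as written in the text."""
    return np.linalg.inv(X.T @ X) @ X.T @ y


def with_intercept(*columns):
    """Place the input columns side by side and append a column of ones."""
    return np.column_stack(columns + (np.ones(len(columns[0])),))


# One input column each time, so each result is (slope, intercept).
slope_age, intercept_age = least_squares(with_intercept(age), price)
slope_bid, intercept_bid = least_squares(with_intercept(bidders), price)

print(f"price vs. age:     {slope_age:.2f}x {intercept_age:+.2f}")
print(f"price vs. bidders: {slope_bid:.2f}x {intercept_bid:+.2f}")
```

Inverting $$\mathtt{X^{T}X}$$ directly mirrors the formula and is fine for a small data set like this one. For larger or poorly conditioned problems, a solver such as `np.linalg.lstsq` is usually the safer choice.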

Combining age and number of bidders, we can use the same formula to calculate a multivariate least squares regression equation. This, of course, is no longer a line: with two input variables, as we have here, the line becomes a plane.

Our final regression equation for the predicted price becomes $$\mathtt{12.74x_{1}+85.82x_{2}-1336.72}$$, with $$\mathtt{x_1}$$ representing the age of a clock and $$\mathtt{x_2}$$ representing the number of bidders.
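As a rough check on these coefficients, here is a minimal NumPy sketch of the two-variable fit. It assumes the columns of $$\mathtt{X}$$ are ordered as age, bidders, and a column of ones, so the result comes back in the same order as the terms of the equation above. The variable names and the example clock at the end are illustrative only.

```python
import numpy as np

# (age in years, number of bidders, price) for the 32 clocks in the table above.
clocks = np.array([
    (127, 13, 1235), (115, 12, 1080), (127, 7, 845), (150, 9, 1522),
    (156, 6, 1047), (182, 11, 1979), (156, 12, 1822), (132, 10, 1253),
    (137, 9, 1297), (113, 9, 946), (137, 15, 1713), (117, 11, 1024),
    (137, 8, 1147), (153, 6, 1092), (117, 13, 1152), (126, 10, 1336),
    (170, 14, 2131), (182, 8, 1550), (162, 11, 1884), (184, 10, 2041),
    (143, 6, 854), (159, 9, 1483), (108, 14, 1055), (175, 8, 1545),
    (108, 6, 729), (179, 9, 1792), (111, 15, 1175), (187, 8, 1593),
    (111, 7, 785), (115, 7, 744), (194, 5, 1356), (168, 7, 1262),
], dtype=float)

# Design matrix: age, bidders, and a column of ones for the intercept.
X = np.column_stack([clocks[:, 0], clocks[:, 1], np.ones(len(clocks))])
y = clocks[:, 2]

# Same formula as in the univariate case: (X^T X)^{-1} X^T y.
coef_age, coef_bidders, intercept = np.linalg.inv(X.T @ X) @ X.T @ y
print(f"{coef_age:.2f}*x1 + {coef_bidders:.2f}*x2 {intercept:+.2f}")

# Cross-check against NumPy's built-in least squares solver.
check, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predicted price for a hypothetical 150-year-old clock with 10 bidders.
predicted = coef_age * 150 + coef_bidders * 10 + intercept
print(f"predicted price: {predicted:.0f}")
```

The only change from the univariate case is the extra column in `X`; the line that applies the formula is identical.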