A Modern Introduction to Probability and Statistics
From charlesreid1
Chapter 17: Basic Statistical Models
Linear Regression
For a bivariate data set $ (x_1, y_1), (x_2, y_2), \dots, (x_n, y_n) $:
Assume that $ x_1, x_2, \dots, x_n $ are not random (they are fixed values, e.g., chosen by the experimenter)
$ y_1, y_2, \dots, y_n $ are realizations of random variables $ Y_1, Y_2, \dots, Y_n $ that satisfy
$ Y_i = \alpha + \beta x_i + U_i $
for $ i = 1, \dots, n $,
where $ U_1, \dots, U_n $ are independent random variables with $ E[U_i] = 0 $ (the $ U_i $ model random fluctuations about the regression line, so on average they are zero) and $ Var(U_i) = \sigma^2 $ (every observation has the same variance, because we assume each random fluctuation has the same amount of variability)
Expectation of each $ Y_i $ is different:
$ E[Y_i] = E[\alpha + \beta x_i + U_i] = \alpha + \beta x_i + E[U_i] = \alpha + \beta x_i $
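A minimal simulation sketch of this model (Python with NumPy assumed; the parameter values and variable names are illustrative, not from the text): generate data from $ Y_i = \alpha + \beta x_i + U_i $ with fixed design points $ x_i $, then recover $ \alpha $ and $ \beta $ by least squares.

```python
import numpy as np

rng = np.random.default_rng(42)

# True (in practice unknown) parameters of Y_i = alpha + beta*x_i + U_i
alpha, beta, sigma = 2.0, 0.5, 0.3
n = 200

x = np.linspace(0, 10, n)            # fixed, non-random design points
u = rng.normal(0.0, sigma, size=n)   # independent fluctuations with E[U_i] = 0
y = alpha + beta * x + u             # realizations of Y_1, ..., Y_n

# Least-squares fit; np.polyfit returns coefficients highest degree first
beta_hat, alpha_hat = np.polyfit(x, y, deg=1)
print(alpha_hat, beta_hat)           # estimates should be close to 2.0 and 0.5
```

Note that $ E[Y_i] = \alpha + \beta x_i $ differs for each $ i $, which is exactly why the fitted line tracks the design points rather than a single common mean.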
Multiple Linear Regression
If the data were better described by a function such as
$ y = \alpha + \beta x + \gamma x^2 $
then the model is no longer simple linear regression; it is called multiple linear regression. (It is still "linear" because it is linear in the parameters $ \alpha, \beta, \gamma $, even though it is quadratic in $ x $.)
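A sketch of fitting such a quadratic model (Python/NumPy assumed; values are illustrative): because the model is linear in the parameters, ordinary least squares still applies, using a design matrix whose columns are $ 1 $, $ x $, and $ x^2 $.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative true parameters of y = alpha + beta*x + gamma*x^2
alpha, beta, gamma, sigma = 1.0, -2.0, 0.5, 0.4
n = 300

x = np.linspace(-3, 3, n)
y = alpha + beta * x + gamma * x**2 + rng.normal(0.0, sigma, size=n)

# Design matrix with columns 1, x, x^2 -- linear in (alpha, beta, gamma)
X = np.column_stack([np.ones(n), x, x**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_hat, beta_hat, gamma_hat = coef
print(alpha_hat, beta_hat, gamma_hat)
```

The same design-matrix trick extends to any model that is linear in its parameters, e.g., adding more powers of $ x $ or entirely different regressors.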
Chapter 20
Mean Squared Error (MSE)
Discussing unbiased estimators. The mean squared error of an estimator $ T $ of a parameter $ \theta $ is
$ MSE(T) = E[(T - \theta)^2] = Var(T) + (E[T] - \theta)^2 $
Comparison of two unbiased estimators:
1. Variance (spread): less spread means a better estimator
2. For unbiased estimators the bias term $ (E[T] - \theta)^2 $ vanishes, so $ MSE(T) = Var(T) $: the lower the spread, the lower the MSE, the better the estimator
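As a concrete sketch (Python/NumPy assumed; this is a standard textbook pair of estimators, used here for illustration): for a sample $ X_1, \dots, X_n $ from a $ Unif(0, \theta) $ distribution, both $ T_1 = 2 \bar{X}_n $ and $ T_2 = \frac{n+1}{n} \max_i X_i $ are unbiased for $ \theta $, but $ T_2 $ has the smaller variance, hence the smaller MSE.

```python
import numpy as np

rng = np.random.default_rng(1)

theta, n, reps = 1.0, 10, 20000
samples = rng.uniform(0.0, theta, size=(reps, n))

# Two unbiased estimators of theta for a Uniform(0, theta) sample:
t1 = 2.0 * samples.mean(axis=1)          # T1 = 2 * sample mean
t2 = (n + 1) / n * samples.max(axis=1)   # T2 = (n+1)/n * sample maximum

# Both estimators average out near theta = 1.0 (unbiasedness)...
print(t1.mean(), t2.mean())

# ...but since the bias is zero, MSE = variance, and T2's is smaller
mse1 = np.mean((t1 - theta) ** 2)
mse2 = np.mean((t2 - theta) ** 2)
print(mse1, mse2)
```

Here the simulated MSEs approximate $ Var(T_1) = \theta^2 / (3n) $ and $ Var(T_2) = \theta^2 / (n(n+2)) $, so $ T_2 $ is the better (more efficient) estimator for every $ n > 1 $.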