
Chapter 17: Basic Statistical Models

Linear Regression

For a bivariate data set $ (x_1, y_1), (x_2, y_2), \dots, (x_n, y_n) $:

Assume that $ x_1, x_2, \dots, x_n $ are not random

$ y_1, y_2, \dots, y_n $ are realizations of random variables $ Y_1, Y_2, \dots, Y_n $ that satisfy

$ Y_i = \alpha + \beta x_i + U_i $

for $ i = 1, \dots, n $,

where the $ U_i $ are independent random variables with $ E(U_i) = 0 $ (the random fluctuations about the regression line are expected to average out to zero) and $ Var(U_i) = \sigma^2 $ (every point has the same variance, since each random fluctuation is assumed to have the same amount of variability)

The expectation of each $ Y_i $ is different, since it depends on $ x_i $:

$ E[Y_i] = E[\alpha + \beta x_i + U_i] = \alpha + \beta x_i + E[U_i] = \alpha + \beta x_i $
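A minimal simulation sketch of this model (all parameter values here are made up for illustration): generate realizations of $ Y_i = \alpha + \beta x_i + U_i $, taking the $ U_i $ to be normal for concreteness (the model itself only requires $ E(U_i) = 0 $ and common variance $ \sigma^2 $), then recover $ \alpha $ and $ \beta $ by least squares.

```python
import numpy as np

# Hypothetical parameters for the model Y_i = alpha + beta * x_i + U_i
alpha, beta, sigma = 2.0, 0.5, 1.0
n = 100

rng = np.random.default_rng(42)
x = np.linspace(0, 10, n)                      # fixed, nonrandom x_i
u = rng.normal(loc=0.0, scale=sigma, size=n)   # U_i: E(U_i) = 0, Var(U_i) = sigma^2
y = alpha + beta * x + u                       # realizations of Y_i

# Least-squares estimates of the slope and intercept
beta_hat, alpha_hat = np.polyfit(x, y, deg=1)
print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")
```

With $ n = 100 $ points the estimates should land close to the true $ \alpha = 2 $ and $ \beta = 0.5 $.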

Multiple Linear Regression

If we suspected that the data were better described by a function like

$ y = \alpha + \beta x + \gamma x^2 $

then it is no longer simple linear regression; it is multiple linear regression. The model is still linear in the parameters $ \alpha $, $ \beta $, $ \gamma $: treating $ x $ and $ x^2 $ as two separate explanatory variables gives a linear model with two regressors.
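A minimal sketch of how such a fit could be done (parameter values made up): since the model is linear in $ \alpha, \beta, \gamma $, ordinary least squares on a design matrix with columns $ 1, x, x^2 $ still applies.

```python
import numpy as np

# Hypothetical quadratic data: y = alpha + beta*x + gamma*x^2 + noise
alpha, beta, gamma, sigma = 1.0, -2.0, 0.3, 0.5
n = 100

rng = np.random.default_rng(0)
x = np.linspace(0, 10, n)
y = alpha + beta * x + gamma * x**2 + rng.normal(0, sigma, n)

# Design matrix with columns [1, x, x^2]: the model is linear
# in the parameters, so ordinary least squares still works.
X = np.column_stack([np.ones(n), x, x**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"alpha_hat={coef[0]:.3f}, beta_hat={coef[1]:.3f}, gamma_hat={coef[2]:.3f}")
```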

Chapter 20: Efficiency and Mean Squared Error

Mean Squared Error (MSE)

Discussing unbiased estimators. The mean squared error of an estimator $ T $ for a parameter $ \theta $ is

$ MSE(T) = E\left[ (T - \theta)^2 \right] = Var(T) + \left( E[T] - \theta \right)^2 $

Comparison of two unbiased estimators:

1. Variance (spread): less spread means a better estimator

2. For an unbiased estimator the bias term $ E[T] - \theta $ vanishes, so $ MSE(T) = Var(T) $: the lower the spread, the lower the MSE, the better the estimator (see the simulation sketch below)
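A small simulation sketch illustrating the comparison, using a standard textbook setup (a $ U(0, \theta) $ sample; the specific values are made up): both $ T_1 = 2 \bar{X}_n $ and $ T_2 = \frac{n+1}{n} \max_i X_i $ are unbiased for $ \theta $, so comparing them comes down to comparing variances.

```python
import numpy as np

# Compare two unbiased estimators of theta for a Uniform(0, theta) sample:
#   T1 = 2 * sample mean          (unbiased since E[X] = theta / 2)
#   T2 = (n+1)/n * sample maximum (unbiased since E[max] = n/(n+1) * theta)
theta, n, trials = 5.0, 20, 100_000

rng = np.random.default_rng(1)
samples = rng.uniform(0, theta, size=(trials, n))

t1 = 2 * samples.mean(axis=1)
t2 = (n + 1) / n * samples.max(axis=1)

for name, t in [("T1", t1), ("T2", t2)]:
    bias = t.mean() - theta
    mse = np.mean((t - theta) ** 2)
    print(f"{name}: mean={t.mean():.4f}  bias={bias:+.4f}  "
          f"var={t.var():.4f}  MSE={mse:.4f}")
```

Both empirical biases should be near zero, while $ T_2 $ shows a much smaller variance and hence a much smaller MSE, making it the better of the two unbiased estimators.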