A Modern Introduction to Probability and Statistics
From charlesreid1
Chapter 17: Basic Statistical Models
Linear Regression
For a bivariate data set $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$:
Assume that $x_1, x_2, \dots, x_n$ are not random
$y_1, y_2, \dots, y_n$ are realizations of random variables $Y_1, Y_2, \dots, Y_n$ that satisfy
$Y_i = \alpha + \beta x_i + U_i$
for $i = 1, \dots, n$
where $U_1, \dots, U_n$ are independent random variables with $E(U_i) = 0$ (the random fluctuations are expected to be zero about the regression line) and $\operatorname{Var}(U_i) = \sigma^2$ (every point has the same variance, because each random fluctuation is assumed to have the same amount of variability)
Expectation of each $Y_i$ is different:
$E[Y_i] = E[\alpha + \beta x_i + U_i] = \alpha + \beta x_i + E[U_i] = \alpha + \beta x_i$
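The model can be checked numerically with a small simulation. This is only a sketch under stated assumptions: the values of `alpha`, `beta`, `sigma` and the `x` grid are made up for illustration, the helper name `simulate_responses` is hypothetical, and the $U_i$ are drawn from a normal distribution, which the model itself does not require (it only fixes the mean and variance).

```python
import numpy as np


def simulate_responses(x, alpha, beta, sigma, n_reps, seed=0):
    """Draw n_reps realizations of Y_i = alpha + beta * x_i + U_i for fixed x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # U_i: independent, mean 0, variance sigma^2.
    # Normality is an assumption made here, not part of the model.
    u = rng.normal(loc=0.0, scale=sigma, size=(n_reps, x.size))
    return alpha + beta * x + u


# Illustrative (assumed) parameter values and non-random x values
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
alpha, beta, sigma = 2.0, 0.5, 1.0

y = simulate_responses(x, alpha, beta, sigma, n_reps=200_000)

empirical_mean = y.mean(axis=0)      # estimates E[Y_i], one value per x_i
empirical_var = y.var(axis=0)        # estimates Var(Y_i), one value per x_i
theoretical_mean = alpha + beta * x  # alpha + beta * x_i

print("empirical E[Y_i]:  ", np.round(empirical_mean, 3))
print("alpha + beta * x_i:", theoretical_mean)
print("empirical Var(Y_i):", np.round(empirical_var, 3))
```

The empirical means should track $\alpha + \beta x_i$ (a different value for each $i$), while the empirical variances should all sit near $\sigma^2$.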
Multiple Linear Regression
If we considered that the data were better matched by a function like
$y = \alpha + \beta x + \gamma x^2$
then it's no longer simple linear regression, it's multiple linear regression (the model is still linear in the parameters $\alpha$, $\beta$, $\gamma$)
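Because the quadratic model is linear in its parameters, it can be fit with the same machinery as a straight line by treating $1$, $x$ and $x^2$ as columns of a design matrix. The sketch below uses ordinary least squares via `numpy.linalg.lstsq`; least squares is not discussed in these notes and is brought in here as an assumption, and the "true" parameter values and noise level are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed "true" parameters, used only to generate synthetic data
alpha, beta, gamma, sigma = 1.0, -2.0, 0.5, 0.3

x = np.linspace(-3.0, 3.0, 200)
y = alpha + beta * x + gamma * x**2 + rng.normal(0.0, sigma, size=x.size)

# Design matrix with one column per parameter: 1, x, x^2
X = np.column_stack([np.ones_like(x), x, x**2])

# Least-squares solution of X @ coef ~ y
coef, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
alpha_hat, beta_hat, gamma_hat = coef

print("estimated alpha, beta, gamma:", np.round(coef, 3))
```

The estimates should land close to the values used to generate the data, even though the fitted curve is not a straight line in $x$.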
Chapter 20
Mean Squared Error (MSE)
Discussing unbiased estimators
Comparison of two unbiased estimators:
1. Variance (spread): less spread means a better estimator
2. For an unbiased estimator the MSE equals the variance, so the lower the spread, the lower the MSE, and the better the estimator
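The comparison above can be illustrated by simulation. The choice of estimators is an assumption made for this example, not something taken from these notes: the sample mean and the sample median are both unbiased for the center $\mu$ of a normal distribution, so whichever has the smaller variance also has the smaller MSE. The values of `mu`, `sigma`, `n` and `n_reps` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed setup: normal samples of size n with known center mu
mu, sigma, n, n_reps = 5.0, 1.0, 25, 100_000
samples = rng.normal(mu, sigma, size=(n_reps, n))

# Two estimators of mu, evaluated on every simulated sample
means = samples.mean(axis=1)
medians = np.median(samples, axis=1)


def mse(estimates, true_value):
    """Monte Carlo estimate of E[(T - theta)^2]."""
    return np.mean((estimates - true_value) ** 2)


mse_mean = mse(means, mu)
mse_median = mse(medians, mu)

print("bias of mean:  ", means.mean() - mu)
print("bias of median:", medians.mean() - mu)
print("MSE of mean:   ", mse_mean, " variance:", means.var())
print("MSE of median: ", mse_median, " variance:", medians.var())
```

Both biases should be near zero, each MSE should nearly match the corresponding variance, and the sample mean should come out with the lower MSE, which makes it the better of the two estimators in this setting.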