Multivariate Statistical Modelling Based on Generalized Linear Models
From charlesreid1
Chapter 1: Book Outline and Notes
Chapter 2: Review of univariate generalized linear models and extensions
Chapter 3: Models for multicategorical responses (responses with more than two categories, either unordered or ordered)
Chapter 4: Selecting variables for models, variable reduction procedures, model checking, goodness of fit, residual analysis (e.g. detecting outliers or systematic trends)
Chapter 5: Semi- and non-parametric approaches
Chapter 6: Fixed-parameter models for time series (extends Ch. 2 and Ch. 3)
Chapter 7: Random effects models for non-normal data
Chapter 8: State space models for analyzing non-normal time series; they relate time series observations $ y_t $ to unobserved states, such as trend and seasonal components
Chapter 9: Survival models; identifying the factors that influence survival times or transitions between states
Chapter 2: Univariate Generalized Linear Models
Cross-sectional regression analysis: a univariate variable of primary interest, the response variable $ y $,
is explained by a vector of covariates $ x = (x_1, x_2, \dots, x_m) $
Data consist of observations on $ (y,x) $:
$ (y_i , x_i ), i = 1, \dots, n $
Definition of Univariate Generalized Linear Models
The classical linear model for ungrouped normal responses and deterministic covariates is:
$ y_i = z_i^{\prime} \beta + \epsilon_i $
where:
$ z_i $ = design vector, function of covariate vector $ x_i $
$ \beta $ = vector of unknown parameters
$ \epsilon_i $ = errors, normally distributed and independent, $ \epsilon_i \sim N(0, \sigma^2) $
Hence the observations $ y_i $ are independent and normally distributed with mean $ \mu_i = z_i^{\prime} \beta $,
$ y_i \sim N(\mu_i, \sigma^2), i=1, \dots, n $
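As a minimal sketch of the classical linear model, assuming a single covariate $ x_i $ and the design vector $ z_i = (1, x_i) $ (intercept plus slope), the snippet below simulates $ y_i = z_i^{\prime} \beta + \epsilon_i $ with normal errors and recovers $ \beta $ by ordinary least squares, which coincides with the MLE under normal errors. The parameter values, sample size, and helper names (`simulate`, `least_squares`) are illustrative assumptions, and only the Python standard library is used.

```python
import random


def simulate(beta0: float, beta1: float, sigma: float, n: int, seed: int = 0):
    """Draw (x_i, y_i) from y_i = beta0 + beta1 * x_i + eps_i, eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def least_squares(xs, ys):
    """Closed-form OLS estimate for the design vector z_i = (1, x_i)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


xs, ys = simulate(beta0=1.0, beta1=2.0, sigma=0.5, n=2000)
b0_hat, b1_hat = least_squares(xs, ys)
print(f"beta_hat = ({b0_hat:.3f}, {b1_hat:.3f})")
```

With 2000 observations the estimates should land close to the assumed true values $ \beta = (1, 2) $.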
A specific generalized linear model is fully characterized by three components:
- type of exponential family
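- response function $ h $ or, equivalently, link function $ g = h^{-1} $, relating the mean $ \mu_i $ to the linear predictor $ \eta_i = z_i^{\prime} \beta $ via $ \mu_i = h(\eta_i) $
- design vector $ z_i $, built from the covariates $ x_i $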
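The three components can be mirrored in a small data structure. This is a sketch under stated assumptions: `GLMSpec`, `logistic`, and `intercept_and_main_effects` are hypothetical names, the family is stored only as a label, and the mean is computed as $ \mu_i = h(z_i^{\prime} \beta) $.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class GLMSpec:
    """Hypothetical container for the three components of a GLM."""
    family: str                                   # exponential family, e.g. "binomial"
    response: Callable[[float], float]            # response function h: mu = h(eta)
    design: Callable[[Sequence[float]], list]     # maps covariates x_i to design vector z_i

    def mean(self, x: Sequence[float], beta: Sequence[float]) -> float:
        z = self.design(x)
        if len(z) != len(beta):
            raise ValueError("design vector and beta must have the same length")
        eta = sum(zj * bj for zj, bj in zip(z, beta))  # linear predictor eta_i = z_i' beta
        return self.response(eta)


def logistic(eta: float) -> float:
    """Logistic response function h(eta) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def intercept_and_main_effects(x: Sequence[float]) -> list:
    """Design vector z_i = (1, x_i1, ..., x_im)."""
    return [1.0, *x]


logit_model = GLMSpec("binomial", logistic, intercept_and_main_effects)
normal_model = GLMSpec("normal", lambda eta: eta, intercept_and_main_effects)

print(logit_model.mean([0.5, -1.0], [0.0, 2.0, 1.0]))   # eta = 0, so mu = 0.5
print(normal_model.mean([0.5, -1.0], [1.0, 2.0, 1.0]))  # identity link: mu = eta = 1.0
```

Swapping only the response function or the family label changes the model while the design vector stays the same, which is the point of separating the three components.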
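Example: the classical linear model above is the generalized linear model with normal responses, identity link ($ \mu_i = \eta_i $), and design vector $ z_i $.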
Exponential family
- important members: normal, binomial, Poisson, gamma, and inverse Gaussian distributions
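A standard way to write a member of the family is the natural form $ f(y \mid \theta, \phi) = \exp\{ (y\theta - b(\theta))/\phi + c(y, \phi) \} $, with $ E(y) = b^{\prime}(\theta) $ and $ var(y) = \phi b^{\prime\prime}(\theta) $ (prior weights are omitted here for simplicity). The sketch below checks this form for the Poisson distribution, where $ \theta = \log \mu $, $ b(\theta) = e^{\theta} $, $ \phi = 1 $ and $ c(y, \phi) = -\log y! $; the helper names and the mean $ \mu = 3.5 $ are illustrative assumptions.

```python
import math


def expfam_density(y: float, theta: float, phi: float, b, c) -> float:
    """Density in natural form exp{(y * theta - b(theta)) / phi + c(y, phi)}."""
    return math.exp((y * theta - b(theta)) / phi + c(y, phi))


# Poisson as an exponential family: theta = log(mu), b(theta) = exp(theta), phi = 1.
poisson_b = math.exp


def poisson_c(y: float, phi: float) -> float:
    """c(y, phi) = -log(y!) for the Poisson distribution."""
    return -math.lgamma(y + 1)


def poisson_pmf(y: int, mu: float) -> float:
    """Direct Poisson probability mu^y exp(-mu) / y!, for comparison."""
    return mu ** y * math.exp(-mu) / math.factorial(y)


mu = 3.5
theta = math.log(mu)
for y in range(6):
    via_family = expfam_density(y, theta, 1.0, poisson_b, poisson_c)
    print(y, round(via_family, 6), round(poisson_pmf(y, mu), 6))

# E(y) = b'(theta) and var(y) = phi * b''(theta); both equal mu for the Poisson.
step = 1e-4
b_prime = (poisson_b(theta + step) - poisson_b(theta - step)) / (2 * step)
b_second = (poisson_b(theta + step) - 2 * poisson_b(theta) + poisson_b(theta - step)) / step ** 2
print(round(b_prime, 4), round(b_second, 4))
```

The finite differences are only a numerical check; analytically $ b^{\prime}(\theta) = b^{\prime\prime}(\theta) = e^{\theta} = \mu $.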
Models for Continuous Responses
Normal distribution
Gamma distribution
Inverse Gaussian distribution
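The three continuous models differ mainly in how the variance depends on the mean. Writing $ var(y) = \phi v(\mu) $, the variance function is $ v(\mu) = 1 $ for the normal, $ v(\mu) = \mu^2 $ for the gamma, and $ v(\mu) = \mu^3 $ for the inverse Gaussian distribution. The sketch below encodes these; the dictionary and function names are hypothetical, and the simulation check assumes a gamma variable with shape 4 and scale 0.5 (mean 2, variance 1).

```python
import random
import statistics
from typing import Callable, Dict

# Variance functions v(mu) such that var(y) = phi * v(mu).
VARIANCE_FUNCTIONS: Dict[str, Callable[[float], float]] = {
    "normal": lambda mu: 1.0,                 # phi = sigma^2
    "gamma": lambda mu: mu ** 2,              # phi = 1 / shape
    "inverse_gaussian": lambda mu: mu ** 3,   # phi = 1 / lambda
}


def response_variance(family: str, mu: float, phi: float) -> float:
    """var(y) = phi * v(mu) for the continuous families above."""
    if family != "normal" and mu <= 0:
        raise ValueError("gamma and inverse Gaussian responses require mu > 0")
    return phi * VARIANCE_FUNCTIONS[family](mu)


# Gamma with shape nu = 4 and mean mu = 2: phi = 1/4, so var(y) = mu^2 / nu = 1.
print(response_variance("gamma", mu=2.0, phi=0.25))

# Monte Carlo check: random.gammavariate(alpha, beta) uses shape alpha and scale beta.
rng = random.Random(0)
draws = [rng.gammavariate(4.0, 0.5) for _ in range(20000)]
print(round(statistics.mean(draws), 3), round(statistics.variance(draws), 3))
```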
Models for Binary and Binomial Responses
Linear probability model
Probit model
Logit model
Complementary log-log model
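The four binary models share the linear predictor $ \eta_i = z_i^{\prime} \beta $ and differ only in the response function $ \pi_i = h(\eta_i) $: the identity for the linear probability model, the standard normal cdf $ \Phi $ for the probit model, $ \exp(\eta)/(1 + \exp(\eta)) $ for the logit model, and $ 1 - \exp(-\exp(\eta)) $ for the complementary log-log model. A minimal sketch, with hypothetical function names and $ \Phi $ computed from `math.erf`:

```python
import math


def linear_prob_response(eta: float) -> float:
    """Linear probability model: pi = eta, only valid for 0 <= eta <= 1."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("linear probability model needs 0 <= eta <= 1")
    return eta


def probit_response(eta: float) -> float:
    """Probit model: pi = Phi(eta), the standard normal cdf."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def logit_response(eta: float) -> float:
    """Logit model: pi = exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def cloglog_response(eta: float) -> float:
    """Complementary log-log model: pi = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


for eta in (-2.0, 0.0, 0.5, 2.0):
    print(eta, round(probit_response(eta), 4), round(logit_response(eta), 4),
          round(cloglog_response(eta), 4))
```

The probit and logit curves are symmetric about $ \eta = 0 $, i.e. $ h(-\eta) = 1 - h(\eta) $, whereas the complementary log-log curve is not; the linear probability model needs explicit restrictions to keep $ \pi_i $ in $ [0, 1] $.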
Models for Counted Data
Log-linear Poisson model
Linear Poisson model
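For counts, the log-linear Poisson model sets $ \log \mu_i = \eta_i $, so covariate effects act multiplicatively and $ \mu_i > 0 $ holds automatically; the linear Poisson model sets $ \mu_i = \eta_i $, so effects are additive but $ \eta_i $ must stay positive. The sketch below contrasts the two and includes the individual Poisson log-likelihood $ l_i(\mu_i) = y_i \log \mu_i - \mu_i - \log y_i! $; the coefficient values and function names are assumptions for illustration.

```python
import math


def loglinear_mean(eta: float) -> float:
    """Log-linear Poisson model: log(mu) = eta, so mu = exp(eta) > 0 for any eta."""
    return math.exp(eta)


def linear_mean(eta: float) -> float:
    """Linear Poisson model: mu = eta, which requires eta > 0."""
    if eta <= 0:
        raise ValueError("linear Poisson model needs a positive linear predictor")
    return eta


def poisson_loglik(y: int, mu: float) -> float:
    """Individual log-likelihood l(mu) = y * log(mu) - mu - log(y!)."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


# eta = beta0 + beta1 * x with assumed coefficients.
beta0, beta1 = 0.5, 0.3
for x in (0.0, 1.0, 2.0):
    eta = beta0 + beta1 * x
    print(x, round(loglinear_mean(eta), 4), round(linear_mean(eta), 4))
```

A unit increase in $ x $ multiplies $ \mu $ by $ e^{\beta_1} $ in the log-linear model and adds $ \beta_1 $ to $ \mu $ in the linear model.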
Likelihood Inference
Regression analysis with generalized linear models is based on likelihoods
This section contains inferential tools for:
- parameter estimation
- hypothesis testing
- goodness-of-fit tests
- more detailed material on model choice/checking: see Chapter 4
These tools assume that the model is completely and correctly specified
Maximum likelihood estimator (MLE): the estimate $ \hat{\beta} $ of the unknown parameter vector $ \beta $, obtained by maximizing the likelihood
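In generalized linear models the MLE usually has no closed form and is computed iteratively, typically by Fisher scoring: $ \beta^{(k+1)} = \beta^{(k)} + F(\beta^{(k)})^{-1} s(\beta^{(k)}) $, with score $ s(\beta) $ and Fisher information $ F(\beta) $. For the logit model (canonical link) these are $ s(\beta) = \sum_i z_i (y_i - \mu_i) $ and $ F(\beta) = \sum_i \mu_i (1 - \mu_i) z_i z_i^{\prime} $, and Fisher scoring coincides with Newton-Raphson. The pure-Python sketch below assumes a single covariate, the design vector $ z_i = (1, x_i) $, and simulated data with true $ \beta = (-0.5, 1) $ chosen only for illustration; it is not meant as a production fitter.

```python
import math
import random


def simulate_logistic(beta0, beta1, n, seed=1):
    """Draw (x_i, y_i) from a logit model with design vector z_i = (1, x_i)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        data.append((x, 1 if rng.random() < p else 0))
    return data


def fisher_scoring(data, max_iter=25, tol=1e-10):
    """Maximize the logit log-likelihood by Fisher scoring, starting from beta = 0."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Accumulate the score s(beta) = sum z_i (y_i - mu_i)
        # and the Fisher information F(beta) = sum mu_i (1 - mu_i) z_i z_i'.
        s0 = s1 = f00 = f01 = f11 = 0.0
        for x, y in data:
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = mu * (1.0 - mu)
            s0 += y - mu
            s1 += x * (y - mu)
            f00 += w
            f01 += w * x
            f11 += w * x * x
        # Scoring step: solve the 2x2 system F(beta) * delta = s(beta).
        det = f00 * f11 - f01 * f01
        d0 = (f11 * s0 - f01 * s1) / det
        d1 = (f00 * s1 - f01 * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


data = simulate_logistic(beta0=-0.5, beta1=1.0, n=5000)
b_hat = fisher_scoring(data)
print(f"beta_hat = ({b_hat[0]:.3f}, {b_hat[1]:.3f})")
```

At convergence the score is numerically zero, which is the likelihood equation $ s(\hat{\beta}) = 0 $ defining the MLE.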
Goodness-of-Fit Statistics
Two measures of model adequacy (goodness of fit) are:
Pearson statistic $ \chi^2 = \sum_{i=1}^{g} \frac{ \left( y_i - \hat{\mu}_i \right)^2 }{ v(\hat{\mu}_i) } $
Deviance $ D = - 2 \phi \sum_{i=1}^{g} \left[ l_i ( \hat{\mu}_i ) - l_i (y_i) \right] $
where
$ \hat{\mu}_i $ = estimated mean function
$ v(\hat{\mu}_i) $ = estimated variance function
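$ l_i(\mu_i) $ = individual log-likelihood of observation $ i $ as a function of its mean; $ l_i(y_i) $ is its value in the saturated model, where $ \mu_i = y_i $
$ \phi $ = dispersion parameter
$ g $ = number of groups (distinct covariate values) in grouped data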
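For Poisson responses ($ \phi = 1 $, $ v(\mu) = \mu $, $ l_i(\mu_i) = y_i \log \mu_i - \mu_i - \log y_i! $) the two statistics reduce to $ \chi^2 = \sum_i (y_i - \hat{\mu}_i)^2 / \hat{\mu}_i $ and $ D = 2 \sum_i [ y_i \log(y_i / \hat{\mu}_i) - (y_i - \hat{\mu}_i) ] $, using the convention $ 0 \log 0 = 0 $. A minimal sketch, where the counts and fitted means are made-up values rather than output of an actual fit:

```python
import math


def pearson_chi2(y, mu_hat):
    """Pearson statistic for Poisson data, where v(mu) = mu."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu_hat))


def poisson_deviance(y, mu_hat):
    """D = 2 * sum[y log(y / mu_hat) - (y - mu_hat)], with 0 * log 0 = 0 and phi = 1."""
    total = 0.0
    for yi, mi in zip(y, mu_hat):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total


y = [2, 0, 3]
mu_hat = [1.0, 1.0, 3.0]
print(pearson_chi2(y, mu_hat))      # 1 + 1 + 0 = 2.0
print(poisson_deviance(y, mu_hat))  # 4 log 2, about 2.7726
```

Both statistics are zero when the fitted means reproduce the data exactly, as in the saturated model.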
References
Generalized linear models:
- McCullagh and Nelder 1989 - standard source of information about generalized linear models
- Santner and Duffy 1989 - consider cross-classified data and univariate discrete data