Design and Modeling for Computer Experiments
From charlesreid1
Fang, Kai-Tai; Li, Runze; Sudjianto, Agus (2006). Design and Modeling for Computer Experiments. Chapman and Hall/CRC.
Contents
Chapter 1: Introduction
Concepts/Definitions
- Factor - controllable variable that is of interest in the experiment
- quantitative vs. qualitative
- quantitative - can be measured on a numerical scale (e.g. temperature, pressure, ratio, reaction rate, etc.)
- qualitative - values are categories (e.g. operators, material type, etc.); also called categorical factor or indicator factor
- computer experiments: factor = input variable
- Experimental domain - hypercube of all possible factor values (also called input variable space)
- DC approach: this is the initial hypercube
- Run/trial - implementation of level-combination in experimental environment
- Computer experiments: no random error, trials are deterministic
- Response - result of a run/trial based on purpose of experiment
- Can be a function: functional response
- Chapter 7: computer experiments with functional responses
- Responses also called outputs
- Factorial design - set of level-combinations with purpose of estimating main effects, interaction effects among factors
- symmetric - all factors have same number of levels
- asymmetric - factors have diff. numbers of levels
- full factorial design - all level combinations appear
- fractional factorial design - subset of all level combinations
- ANOVA models - factorial designs are based on statistical model
- e.g. 1 factor experiment, q levels, expressed as: <math>y_{ij} = \mu_j + \varepsilon_{ij}</math>
- <math>\mu</math> - overall mean of y
- <math>\mu_j</math> - true value of response y at <math>x_j</math>
- <math>\varepsilon_{ij}</math> - random error in ith replication of jth level of x (<math>i = 1, \dots, n_j</math>)
- all errors assumed independently/identically distributed according to <math>N(0, \sigma^2)</math>
- mean decomposed into <math>\mu_j = \mu + \alpha_j</math>, where <math>\alpha_j</math> is main effect of y at <math>x_j</math>, and satisfies <math>\sum_{j=1}^{q} \alpha_j = 0</math>
- e.g. 2 factor experiment (factor A, factor B): <math>y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}</math>
- <math>\mu</math> = overall mean
- <math>\alpha_i</math> / <math>\beta_j</math> = main effect of factor A/factor B at level i/level j
- <math>\varepsilon_{ijk}</math> = random error in kth trial at level combination <math>(i, j)</math>
- <math>(\alpha\beta)_{ij}</math> = interaction between A and B at level combination <math>(i, j)</math>, under restrictions: <math>\sum_i \alpha_i = \sum_j \beta_j = \sum_i (\alpha\beta)_{ij} = \sum_j (\alpha\beta)_{ij} = 0</math>
- Factorial design cost - for an s-factor experiment with q levels per factor, the number of main effects plus interactions is <math>q^s - 1</math>, and this increases exponentially as s increases
- Sparsity principle - number of relatively important effects and interactions in factorial design is small
- Hierarchical ordering principle - lower order effects are more likely to be important than higher order effects; main effects more likely to be important than interactions; effects of same order are equally likely to be important
- Optimal design - given an underlying relationship, different optimization approaches may be taken
- General regression model: <math>y = \beta_1 g_1(\mathbf{x}) + \cdots + \beta_m g_m(\mathbf{x}) + \varepsilon</math>
- <math>g_1, \dots, g_m</math> are specified or known functions
- <math>\varepsilon</math> is random error, with <math>E(\varepsilon) = 0</math> and <math>\mathrm{Var}(\varepsilon) = \sigma^2</math>
- This can be applied to several specific cases, e.g. linear model, quadratic model, etc., but g can also be nonlinear functions of x
- Rewriting in matrix form: <math>\mathbf{y} = \mathbf{G} \boldsymbol{\beta} + \boldsymbol{\varepsilon}</math>
- matrix G: design matrix
- row 1 = <math>[ g_1(\mathbf{x}_1) \dots g_m(\mathbf{x}_1) ]</math>
- row n = <math>[ g_1(\mathbf{x}_n) \dots g_m(\mathbf{x}_n) ]</math>
- matrix M: information matrix, <math>M = \mathbf{G}' \mathbf{G}</math>
- covariance matrix of least squares estimator: <math>\mathrm{Cov}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{G}' \mathbf{G})^{-1} = \sigma^2 M^{-1}</math>
- Optimization techniques:
- Want covariance matrix to be as small as possible, which suggests maximizing the information matrix <math>M</math> with respect to the design
- Many different proposed optimization criteria (a small numerical sketch follows this list)
- D-optimality: maximize determinant of <math>M</math>
- Equivalent to minimizing generalized variance (determinant of covariance matrix)
- A-optimality: minimize trace of <math>M^{-1}</math>
- equivalent to minimizing the sum of variances
- E-optimality: minimize largest eigenvalue of <math>M^{-1}</math>
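A minimal numerical sketch of how these criteria are computed from the information matrix, using a small hypothetical one-factor design and a quadratic basis (the design points and basis below are illustrative, not taken from the book):
<syntaxhighlight lang="python">
import numpy as np

# hypothetical 5-run design in one factor on [-1, 1], quadratic basis g(x) = (1, x, x^2)
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
G = np.column_stack([np.ones_like(x), x, x**2])    # design matrix G (rows g(x_i))
M = G.T @ G                                        # information matrix M = G'G
M_inv = np.linalg.inv(M)

d_value = np.linalg.det(M)                 # D-optimality: maximize det(M)
a_value = np.trace(M_inv)                  # A-optimality: minimize trace(M^-1)
e_value = np.linalg.eigvalsh(M_inv).max()  # E-optimality: minimize largest eigenvalue of M^-1

print(d_value, a_value, e_value)
</syntaxhighlight>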
Motivation
- Computer model:
- physical experiments to understand the relationship between the response y and the inputs <math>\mathbf{x}</math> are too expensive or time-consuming
- computer models important for investigating complicated physical phenomena
- one goal of computer experiments is to find an approximate model that is much simpler than the true but complicated model
- True model: <math>y = f(\mathbf{x})</math>
- Metamodel: <math>y = g(\mathbf{x}) \approx f(\mathbf{x})</math>
Comprehensive review papers:
- Sacks Welch Mitchell Wynn 1989
- Bates Buck Riccomagno Wynn 1996
- Koehler Owen 1996
Computer vs. physical experiments
- involve larger numbers of variables compared to typical physical experiments
- larger experiment domain or design space employed to explore nonlinear functions
- computer experiments are deterministic
Uses for metamodels:
- preliminary study and visualization
- prediction and optimization
- sensitivity analysis
- probabilistic analysis (effect of input uncertainty on variability of output variable; reliability and risk assessment applications)
- robust design and reliability-based design
Statistical approach for computer experiments:
- Design - find set of points in input space so that a model can be constructed; e.g. space filling designs
- Modeling - fitting highly adaptive models using various techniques; these are more complex, so straightforward interpretations are not available, and sophisticated ANOVA-like global sensitivity analysis is needed to interpret the metamodel
General discussion of computer models and their use in industry (internal combustion engine application)
- Robust design and probabilistic-based design optimization approaches have been proposed:
- Wu Wang 1998
- Du Chen 2004
- Kalagnanam Diwekar 1997
- Du Sudjianto Chen 2004
- Hoffman Sudjianto Du Stout 2003
- Yang et al 2001
- Simpson Booker Ghosh Giunta Koch Yang 2000
- Du et al 2004
- Factorial design widely used in industrial designs
- Montgomery 2001
- Wu Hamada 2000
Space-filling designs
- trying to minimize deviation between expensive/full model and metamodel
- stochastic approaches - e.g. Latin hypercube sampling (LHS; a small sketch follows this list)
- deterministic approaches - e.g. uniform design
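A minimal sketch of basic Latin hypercube sampling on the unit hypercube (one point drawn from each of n equal strata per dimension); the construction details and its many variants are the subject of Chapter 2:
<syntaxhighlight lang="python">
import numpy as np

def latin_hypercube(n, s, seed=None):
    """Basic LHS: n points in [0, 1]^s, exactly one point per stratum in each dimension."""
    rng = np.random.default_rng(seed)
    samples = np.empty((n, s))
    for j in range(s):
        perm = rng.permutation(n)                          # random stratum order for dimension j
        samples[:, j] = (perm + rng.uniform(size=n)) / n   # random position within each stratum
    return samples

design = latin_hypercube(n=10, s=3, seed=0)
</syntaxhighlight>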
Chapter 2/3: LHS and UD designs
Koehler Owen 1996: different way to classify approaches to computer experiments
- "There are two main statistical approaches to computer experiments, one based on Bayesian statistics and a frequentist one based on sampling techniques."
- LHS, UD = frequentist experimental designs
- optimal LHS designs = Bayesian designs
Modeling Techniques
Metamodels: can be represented using linear combination of set of specific basis functions
Univariate Functions
Polynomial models:
- popular for computer experiments
- 2nd order polynomial models most popular
- "response surfaces" refers to 2nd order polynomial models
- Myers Montgomery 1995
- Morris Mitchell 1995
- problems:
- unstable computations... bypassed by centering the variables, e.g. replace <math>x</math> with <math>x - \bar{x}</math> (a short fitting sketch follows this list)
- collinearity problem with high-order polynomials... bypassed by using orthogonal polynomial models
- splines - variation of polynomial models designed to work in the high-collinearity/high-order case, better than polynomials alone
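A minimal sketch of fitting a centered second-order polynomial (response surface) metamodel by least squares; the one-dimensional function f below is a hypothetical stand-in for an expensive computer model:
<syntaxhighlight lang="python">
import numpy as np

def f(x):                          # hypothetical expensive computer model (one input)
    return np.sin(3.0 * x) + x**2

x = np.linspace(0.0, 2.0, 9)       # design points
y = f(x)

xc = x - x.mean()                  # center the variable to stabilize the computation
G = np.column_stack([np.ones_like(xc), xc, xc**2])    # quadratic (second-order) basis
beta, *_ = np.linalg.lstsq(G, y, rcond=None)          # least-squares coefficients

def metamodel(x_new):
    z = np.asarray(x_new) - x.mean()
    return beta[0] + beta[1] * z + beta[2] * z**2
</syntaxhighlight>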
Fourier basis models:
- True model is approximated using Fourier regression, a set of periodic functions
- Number of terms increases exponentially with dimension
- In practice, one particular Fourier metamodel used
- Riccomagno, Schwabe, Wynn 1997
- Wavelets:
- used to improve Fourier basis
- work esp. well when function being approximated is not smooth
- Chui 1992
- Daubechies 1992
- Antoniadis Oppenheim 1995
polynomials, splines, Fourier bases, and wavelets are powerful for univariate functions, but lose effectiveness and applicability for multivariate functions
Multivariate Functions
Kriging model
- assumes <math>y(\mathbf{x}) = \mu + z(\mathbf{x})</math>, where <math>\mu</math> = overall mean of <math>y(\mathbf{x})</math>
- <math>z(\mathbf{x})</math> = Gaussian process with mean 0 and covariance function <math>\mathrm{Cov}\left( z(\mathbf{x}_i), z(\mathbf{x}_j) \right) = \sigma^2 R(\mathbf{x}_i, \mathbf{x}_j)</math>
- <math>\sigma^2</math> = unknown variance of <math>z(\mathbf{x})</math>
- R = correlation function with pre-specified functional form, some unknown parameters
- Typical correlation function: <math>R(\mathbf{x}_i, \mathbf{x}_j) = \exp\left( -\sum_{k=1}^{s} \theta_k \left( x_{ik} - x_{jk} \right)^2 \right)</math>
- <math>\theta_1, \dots, \theta_s</math> are unknowns
- Resulting metamodel can be written <math>\hat{g}(\mathbf{x}) = \hat{\mu} + \sum_{i=1}^{n} \hat{\beta}_i R(\mathbf{x}, \mathbf{x}_i)</math> (a small code sketch follows below)
- which is of the general form of a linear combination of basis functions
- advantage of Kriging approach: constructs the basis directly using the correlation function
- under certain conditions, it can be shown that resulting metamodel from Kriging approach is the best linear unbiased predictor (see Ch. 5.4.1)
- Gaussian Kriging approach admits a Bayesian interpretation
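A minimal sketch of the Kriging predictor with the Gaussian correlation function above, assuming the correlation parameters θ are fixed rather than estimated (parameter estimation and the BLUP derivation are covered in Chapter 5):
<syntaxhighlight lang="python">
import numpy as np

def gauss_corr(X1, X2, theta):
    """R(x_i, x_j) = exp(-sum_k theta_k (x_ik - x_jk)^2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2 * theta).sum(axis=2)
    return np.exp(-d2)

def kriging_fit(X, y, theta):
    R = gauss_corr(X, X, theta)                 # correlation matrix of the design points
    ones = np.ones(len(y))
    mu = ones @ np.linalg.solve(R, y) / (ones @ np.linalg.solve(R, ones))  # estimated mean
    beta = np.linalg.solve(R, y - mu * ones)    # coefficients of the correlation basis
    return mu, beta

def kriging_predict(x_new, X, mu, beta, theta):
    r = gauss_corr(np.atleast_2d(x_new), X, theta).ravel()
    return mu + r @ beta                        # metamodel: mu + sum_i beta_i R(x, x_i)
</syntaxhighlight>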
Bayesian interpolation
- Proposed by:
- Currin Mitchell Morris Ylvisaker 1991
- Morris Mitchell Ylvisaker 1993
- advantage: can easily incorporate auxiliary information
- e.g.: Bayesian Kriging method
- Morris et al 1993
Neural networks (multilayer perceptron network)
- mathematical definition...
- nonlinear optimization for a least squares objective function
- training algorithms
- etc.
Radial basis function methods
- used in neural network modeling, closely related to the Kriging approach
- generally, for design inputs <math>\{\mathbf{x}_1, \dots, \mathbf{x}_n\}</math> and associated outputs <math>\{y_1, \dots, y_n\}</math>:
- basis functions are <math>K \left( \left\Vert \mathbf{x} - \mathbf{x}_i \right\Vert / \theta \right), \ i = 1, \dots, n</math>
- <math>K(\cdot)</math> = kernel function
- <math>\theta</math> = smoothing parameter
- Resulting metamodel: <math>\hat{g}(\mathbf{x}) = \sum_{i=1}^{n} \beta_i K \left( \left\Vert \mathbf{x} - \mathbf{x}_i \right\Vert / \theta \right)</math> (sketched in code below)
- if the kernel function is taken to be the Gaussian kernel function (the density function of the normal distribution), the resulting metamodel has the same form as the Kriging metamodel
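A minimal sketch of a radial basis function metamodel with a Gaussian kernel and a fixed smoothing parameter θ, where the coefficients β_i are obtained by solving the interpolation system (an illustrative sketch, not the book's implementation):
<syntaxhighlight lang="python">
import numpy as np

def rbf_fit(X, y, theta):
    """Solve for beta_i in sum_i beta_i K(||x - x_i|| / theta), interpolating the data."""
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    K = np.exp(-(dist / theta) ** 2)            # Gaussian kernel matrix K(||x_i - x_j|| / theta)
    return np.linalg.solve(K, y)

def rbf_predict(x_new, X, beta, theta):
    d = np.sqrt(((np.atleast_2d(x_new)[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    return np.exp(-(d / theta) ** 2) @ beta
</syntaxhighlight>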
Local polynomial models
- Fan 1992
- Fan Gijbels 1996
- Concept: a data point closer to <math>\mathbf{x}</math> carries more information about the value of <math>f(\mathbf{x})</math> than one that is further away
- regression function estimator: running local average
- improved version of local average: locally weighted average, e.g. <math>\hat{g}(\mathbf{x}) = \sum_{i=1}^{n} w_i(\mathbf{x}) y_i</math>, where the weights <math>w_i(\mathbf{x})</math> sum to one and decrease with the distance between <math>\mathbf{x}_i</math> and <math>\mathbf{x}</math> (a short sketch follows below)
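A minimal sketch of a locally weighted average in one dimension; the Gaussian weight function and the bandwidth h are illustrative choices (the book develops local polynomial fitting more fully in Chapter 5):
<syntaxhighlight lang="python">
import numpy as np

def local_average(x0, x, y, h):
    """Locally weighted average at x0: points near x0 receive larger weights."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)      # Gaussian weights, largest at x0
    return (w * y).sum() / w.sum()
</syntaxhighlight>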
Chapter 5 - more explanations of these various modeling approaches, more modeling techniques, etc.
Book Map
Part II: design of computer experiments
Chapter 2:
- Latin hypercube sampling
- its modifications
- randomized orthogonal array
- symmetric Latin hypercube sampling
- optimal Latin hypercube designs
Chapter 3:
- uniform design
- measures of uniformity
- modified L2-discrepancies
- algebraic approaches for constructing several classes of uniform design
Chapter 4
- stochastic optimization techniques for constructing optimal space-filling designs
- heuristic optimization algorithms
- high-quality space filling designs under various optimality criteria
- popular algorithms
Chapter 5:
- introduction to various modeling techniques
- fundamental concepts
- logical progression from simple to increasingly complex models
- polynomial models
- splines
- kriging
- bayesian approaches
- neural networks
- local polynomials
- unified view of all models is provided
- Kriging is a central concept to this chapter
Chapter 6
- special techniques for model interpretation
- generalizations of the traditional ANOVA (analysis of variance) for linear models
- highly recommended that readers study this chapter, especially those interested in understanding the sensitivity of the output to the input variables
- beginning: traditional sum-of-squares decomposition (linear models)
- sequential sum of squares decomposition for general models
- Sobol functional decomposition (generalization of ANOVA decomposition)
- analytic functional decomposition (tensor product metamodel structures)
- computational technique: FAST (Fourier amplitude sensitivity test)
Chapter 7
- computer experiments with functional responses
- response is in the form of a curve, where response data are collected over a time, spatial, or operating interval (e.g. an experiment measuring engine noise over a range of speeds)
- analysis of functional responses in the context of design of experiments is a new area
Chapter 2: Latin Hypercube Sampling and its Modifications
Chapter 3: Uniform Experimental Design
Chapter 4: Optimization in Construction of Designs for Computer Experiments
Chapter 5: Metamodeling
Review of Modeling Concepts
Mean square error and prediction error
Mean square error (MSE): <math>\mathrm{MSE}(g) = E\left[ \left( f(\mathbf{x}) - g(\mathbf{x}) \right)^2 \right]</math>, with the expectation taken over <math>\mathbf{x}</math> in the experimental domain
No random error in computer experiments, so MSE = PE (prediction error)
Weighted mean square error (WMSE): <math>\mathrm{WMSE}(g) = \int_T \left( f(\mathbf{x}) - g(\mathbf{x}) \right)^2 w(\mathbf{x}) \, d\mathbf{x}</math>
<math>w(\mathbf{x})</math> is a weight function, with <math>\int_T w(\mathbf{x}) \, d\mathbf{x} = 1</math>
The weight function allows you to incorporate prior information about the distribution of <math>\mathbf{x}</math> over the domain
When no information about the distribution of <math>\mathbf{x}</math> over the domain is available, it is assumed to be uniformly distributed
In cases where computer experiments are expensive (on the order of hours per run), it is impractical to evaluate the prediction error directly
General strategy: estimate the prediction error of the metamodel g by using a cross validation procedure
For <math>i = 1, \dots, n</math>, let <math>g_{-i}</math> denote the metamodel based on the sample excluding <math>(\mathbf{x}_i, y_i)</math>.
Cross Validation for Prediction Error of Expensive Models
Cross validation score: <math>\mathrm{CV} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - g_{-i}(\mathbf{x}_i) \right)^2</math>
This gives a good estimate of the prediction error of g
Procedure also called "leave-one-out cross validation"
If the sample size n is large and the process of building a nonlinear metamodel is time-consuming (e.g. Kriging and/or neural network models), using CV scores becomes computationally too demanding (because n metamodels must be built)
To reduce computational burden even further, can modify procedure
For a pre-specified K, divide the sample into K groups of (roughly) equal size
Let <math>g_{-k}</math> be the metamodel built on the sample excluding the observations in the kth group
Let <math>\mathbf{y}_k</math> and <math>\hat{\mathbf{y}}_k</math> be the vectors of observed and predicted values for the kth group, where the predictions use <math>g_{-k}</math>
This yields a K-fold cross validation score: <math>\mathrm{CV}_K = \frac{1}{n} \sum_{k=1}^{K} \left\Vert \mathbf{y}_k - \hat{\mathbf{y}}_k \right\Vert^2</math> (a code sketch follows below)
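A minimal sketch of the K-fold cross validation score for a generic fitting routine; fit_metamodel and predict are hypothetical placeholders for whichever modeling technique is being assessed:
<syntaxhighlight lang="python">
import numpy as np

def k_fold_cv_score(X, y, K, fit_metamodel, predict, seed=None):
    """CV_K = (1/n) * sum over folds of squared errors on the held-out group."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, K)                        # K groups of (roughly) equal size
    sse = 0.0
    for fold in folds:
        train = np.setdiff1d(idx, fold)
        model = fit_metamodel(X[train], y[train])         # metamodel built without the k-th group
        sse += ((y[fold] - predict(model, X[fold])) ** 2).sum()
    return sse / n
</syntaxhighlight>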
Regularization Parameter
For most modeling procedures, the metamodel g depends on a regularization parameter, say <math>\lambda</math>
The cross validation score then also depends on <math>\lambda</math>, denoted <math>\mathrm{CV}(\lambda)</math> (either the leave-one-out or the K-fold version)
Goal is to minimize the cross validation score with respect to <math>\lambda</math>
Minimization done by searching over grid values of <math>\lambda</math> (a small sketch follows below)
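A minimal sketch of selecting the regularization parameter by a grid search over the leave-one-out cross validation score, illustrated here with a ridge-type regularized linear metamodel (the ridge form is an illustrative choice, not necessarily the book's procedure):
<syntaxhighlight lang="python">
import numpy as np

def loo_cv_ridge(G, y, lam):
    """Leave-one-out CV score for a ridge-regularized linear metamodel with parameter lam."""
    n, m = G.shape
    sse = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        Gi, yi = G[keep], y[keep]
        beta = np.linalg.solve(Gi.T @ Gi + lam * np.eye(m), Gi.T @ yi)
        sse += (y[i] - G[i] @ beta) ** 2
    return sse / n

def select_lambda(G, y, grid):
    """Minimize CV(lambda) by searching over a grid of candidate values."""
    scores = [loo_cv_ridge(G, y, lam) for lam in grid]
    return grid[int(np.argmin(scores))]
</syntaxhighlight>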
Theoretical properties of cross validation:
- Li 1987
General Metamodel Form
Metamodels can generally be expressed in the form: <math>\hat{g}(\mathbf{x}) = \sum_{j=1}^{L} \beta_j B_j(\mathbf{x})</math>
where <math>\{ B_j(\mathbf{x}) \}</math> are a set of basis functions defined over the experimental domain (hypercube)
This may be a polynomial basis function, covariance function (Kriging), radial basis function (neural networks), etc.
Outputs of computer experiments are deterministic: therefore metamodel construction is interpolation problem
Matrix notation: <math>\hat{g}(\mathbf{x}) = B(\mathbf{x})' \boldsymbol{\beta}</math>
where <math>B(\mathbf{x}) = \left( B_1(\mathbf{x}), \dots, B_L(\mathbf{x}) \right)'</math> and <math>\boldsymbol{\beta} = \left( \beta_1, \dots, \beta_L \right)'</math>
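A minimal sketch of this matrix form: build the basis matrix with entries B_j(x_i) and solve for β so that the metamodel interpolates the deterministic outputs (the basis functions passed in are hypothetical placeholders):
<syntaxhighlight lang="python">
import numpy as np

def fit_basis_metamodel(X, y, basis_funcs):
    """Solve B @ beta = y, where B[i, j] = basis_funcs[j](X[i]) (exact interpolation)."""
    B = np.column_stack([[bj(xi) for xi in X] for bj in basis_funcs])
    return np.linalg.solve(B, y)          # requires as many basis functions as runs

def evaluate(x, beta, basis_funcs):
    return sum(b * bj(x) for b, bj in zip(beta, basis_funcs))
</syntaxhighlight>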