Design and Modeling for Computer Experiments
From charlesreid1
Fang, Kai-Tai; Li, Runze; Sudjianto, Agus (2006). Design and Modeling for Computer Experiments. Chapman and Hall/CRC.
Chapter 1: Introduction
Concepts/Definitions
- Factor - controllable variable that is of interest in the experiment
- quantitative vs. qualitative
- quantitative - can be measured on numerical scale (e.g. T, P, ratio, rxn rate, etc.)
- qualitative - values are categories (e.g. operators, material type, etc.); also called categorical factor or indicator factor
- computer experiments: factor = input variable
- Experimental domain - hypercube of all possible factor values (also called input variable space)
- DC approach: this is the initial hypercube $ H = \prod_{i=1}^{s} \left[ \alpha_i, \beta_i \right] $
- Run/trial - implementation of level-combination in experimental environment
- Computer experiments: no random error, trials are deterministic
- Response - result of a run/trial based on purpose of experiment
- Can be a function: functional response
- Chapter 7: computer experiments with functional responses
- Responses also called outputs
- Factorial design - set of level-combinations with purpose of estimating main effects, interaction effects among factors
- symmetric - all factors have same number of levels
- asymmetric - factors have diff. numbers of levels
- full factorial design - all level combinations appear
- fractional factorial design - subset of all level combinations
- ANOVA models - factorial designs are based on statistical model
- e.g. 1 factor experiment, q levels, expressed as:
$ y_{ij} = \mu + \alpha_j + \epsilon_{ij} = \mu_{j} + \epsilon_{ij} , j = 1 \dots q, i = 1 \dots n_j $
- $ \mu $ - overall mean of y
- $ \mu_j $ - true value of response y at $ x_j $
- $ \epsilon_{ij} $ - random error in ith replication of jth level of x ($ x_j $)
- all errors $ \epsilon_{ij} $ assumed independently/identically distributed according to $ N(0,\sigma^2) $
- mean decomposed into $ \mu_j = \mu + \alpha_j $, where $ \alpha_j $ is main effect of y at x_j, and satisfies $ \sum_{j=1}^{q} \alpha_j = 0 $
- e.g. 2 factor experiment (factor A, factor B):
$ y_{ijk} = \mu + \alpha_i + \beta_j + ( \alpha \beta )_{ij} + \epsilon_{ijk}, \quad i=1,\dots,p, \; j=1,\dots,q, \; k=1,\dots,K $
- $ \mu $ = overall mean
- $ \alpha_i, \beta_j $ = main effect of factor A/factor B at level i/level j
- $ \epsilon_{ijk} $ = random error in the kth trial at level combination $ (i,j) $ of A and B
- $ ( \alpha \beta )_{ij} $ = interaction between A and B at level combination $ (i,j) $, under restrictions:
- $ \sum_{i=1}^{p} \alpha_i = 0 $
- $ \sum_{j=1}^{q} \beta_j = 0 $
- $ \sum_{i=1}^{p} ( \alpha \beta )_{ij} = \sum_{j=1}^{q} ( \alpha \beta )_{ij} = 0 $
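The decomposition above can be sketched numerically: for a full factorial with replicates, the effect estimates are just cell, row, and column means. A minimal sketch with made-up data (the factor levels, replicate count, and noise scale are illustrative):

```python
import numpy as np

# Hypothetical full factorial: p = 2 levels of A, q = 3 levels of B, K = 2 replicates.
# y[i, j, k] is the k-th observation at level combination (i, j).
rng = np.random.default_rng(0)
y = 10.0 + rng.normal(0.0, 0.1, size=(2, 3, 2))

cell_means = y.mean(axis=2)                    # estimates mu + alpha_i + beta_j + (alpha beta)_ij
mu_hat = cell_means.mean()                     # overall mean
alpha_hat = cell_means.mean(axis=1) - mu_hat   # main effects of A
beta_hat = cell_means.mean(axis=0) - mu_hat    # main effects of B
ab_hat = cell_means - mu_hat - alpha_hat[:, None] - beta_hat[None, :]

# The side conditions hold by construction:
assert np.isclose(alpha_hat.sum(), 0.0) and np.isclose(beta_hat.sum(), 0.0)
assert np.allclose(ab_hat.sum(axis=0), 0.0) and np.allclose(ab_hat.sum(axis=1), 0.0)
```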
- Factorial design cost - for an s-factor experiment with $ q_1,\dots,q_s $ levels, the number of main effects plus interactions is $ \prod_{j=1}^{s} q_j - 1 $, which grows exponentially as s increases
- Sparsity principle - number of relatively important effects and interactions in factorial design is small
- Hierarchical ordering principle - lower order effects are more likely to be important than higher order effects; main effects more likely to be important than interactions; effects of same order are equally likely to be important
- Optimal design - given an underlying relationship, different optimization approaches may be taken
- General regression model:
- $ y_{k} = \sum_{j=1}^{m} \beta_j g_j ( x_{k1}, \dots, x_{ks} ) + \epsilon_{k} = \sum_{j=1}^{m} \beta_j g_j (\boldsymbol{x}_k) + \epsilon_k, k=1,\dots,n $
- $ g_j(\cdot) $ are specified or known functions
- $ \epsilon $ is random error, $ E(\epsilon)=0 $ and $ Var(\epsilon)=\sigma^2 $
- This can be applied to several specific cases, e.g. linear model, quadratic model, etc., but g can also be nonlinear functions of x
- Rewriting in matrix form:
- $ \mathbf{y} = \mathbf{G} \boldsymbol{\beta} + \epsilon $
- matrix G: design matrix
- row 1 = $ [ g_1(\mathbf{x}_1) \dots g_m(\mathbf{x}_1) ] $
- row n = $ [ g_1(\mathbf{x}_n) \dots g_m(\mathbf{x}_n) ] $
- matrix M: information matrix
- $ \mathbf{M} = \frac{1}{n} \mathbf{G^{\prime} G} $
- covariance matrix of least squares estimator:
- $ Cov(\hat{\beta}) = \frac{ \sigma^2 }{ n } \mathbf{M}^{-1} $
- Optimization techniques:
- Want the covariance matrix to be as small as possible, which suggests maximizing $ \mathbf{M} $ with respect to the design $ D_n = \left\{ \mathbf{x}_k = (x_{k1},\dots,x_{ks}) \right\} $
- Many different proposed optimization criteria
- D-optimality: maximize determinant of $ \mathbf{M} $
- Equivalent to minimizing generalized variance (determinant of covariance matrix)
- A-optimality: minimize trace of $ \mathbf{M}^{-1} $
- equivalent to minimizing the sum of the variances $ \sum_{i=1}^{m} Var(\hat{\beta}_i) $
- E-optimality: minimize largest eigenvalue of $ \mathbf{M}^{-1} $
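As a rough sketch of how these criteria compare designs, the following computes $ \mathbf{M} $ for a simple linear model with basis $ g_1(x)=1, g_2(x)=x $ (an illustrative choice, not from the book) on two candidate designs and checks the D- and A-criteria:

```python
import numpy as np

def info_matrix(design, basis):
    """Information matrix M = G'G / n for given design points and basis g_j."""
    G = np.array([[g(x) for g in basis] for x in design])
    return G.T @ G / len(design)

# Illustrative basis for a simple linear model on [-1, 1]: g_1(x) = 1, g_2(x) = x.
basis = [lambda x: 1.0, lambda x: x]

M_end = info_matrix([-1.0, -1.0, 1.0, 1.0], basis)  # mass at the endpoints
M_mid = info_matrix([-0.5, -0.5, 0.5, 0.5], basis)  # points pulled inward

# D-optimality prefers the larger det(M); A-optimality the smaller tr(M^{-1}).
assert np.linalg.det(M_end) > np.linalg.det(M_mid)
assert np.trace(np.linalg.inv(M_end)) < np.trace(np.linalg.inv(M_mid))
```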
Motivation
- Computer model: $ y = f(\mathbf{x}) $
- physical experiments to understand relationship between response y and inputs $ x_j $ are too expensive or time consuming
- computer models important for investigating complicated physical phenomena
- one goal of computer experiments is to find an approximate model that is much simpler than the true but complicated model
- True model: $ y = f(\mathbf{x}) $
- Metamodel: $ \hat{y} = g( \mathbf{x} ) $
Comprehensive review papers:
- Sacks Welch Mitchell Wynn 1989
- Bates Buck Riccomagno Wynn 1996
- Koehler Owen 1996
Computer vs. physical experiments
- involve larger numbers of variables compared to typical physical experiments
- larger experiment domain or design space employed to explore nonlinear functions
- computer experiments are deterministic
Uses for metamodels:
- preliminary study and visualization
- prediction and optimization
- sensitivity analysis
- probabilistic analysis (effect of input uncertainty on variability of output variable; reliability and risk assessment applications)
- robust design and reliability-based design
Statistical approach for computer experiments:
- Design - find set of points in input space so that a model can be constructed; e.g. space filling designs
- Modeling - fitting highly adaptive models using various techniques; these models are more complex and do not admit straightforward interpretation, so sophisticated ANOVA-like global sensitivity analysis is needed to interpret the metamodel
General discussion of computer models and their use in industry (internal combustion engine application)
- Robust design and probabilistic design optimization approaches have been proposed:
- Wu Wang 1998
- Du Chen 2004
- Kalagnanam Diwekar 1997
- Du Sudjianto Chen 2004
- Hoffman Sudjianto Du Stout 2003
- Yang et al 2001
- Simpson Booker Ghosh Giunta Koch Yang 2000
- Du et al 2004
- Factorial design widely used in industrial designs
- Montgomery 2001
- Wu Hamada 2000
Space-filling designs
- trying to minimize deviation between expensive/full model and metamodel
- stochastic approaches - e.g. latin hypercube sampling (LHS)
- deterministic approaches - e.g. uniform design
Chapter 2/3: LHS and UD designs
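A bare-bones LHS construction (a sketch only; practical LHS designs are usually further optimized, as in Chapter 4) stratifies each dimension of $[0,1]^s$ into n equal intervals, places one point per stratum, and permutes the strata independently per dimension:

```python
import numpy as np

def latin_hypercube(n, s, seed=None):
    """Basic LHS: split [0, 1] into n strata per dimension, place one point
    uniformly within each stratum, and permute strata independently per dimension."""
    rng = np.random.default_rng(seed)
    samples = np.empty((n, s))
    for j in range(s):
        strata = rng.permutation(n)               # which stratum each row occupies
        samples[:, j] = (strata + rng.uniform(size=n)) / n
    return samples

X = latin_hypercube(10, 3, seed=42)
# Every column has exactly one point in each stratum [k/10, (k+1)/10):
for j in range(3):
    assert sorted(np.floor(X[:, j] * 10).astype(int)) == list(range(10))
```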
Koehler Owen 1996: different way to classify approaches to computer experiments
- "There are two main statistical approaches to computer experiments, one based on Bayesian statistics and a frequentist one based on sampling techniques."
- LHS, UD = frequentist experimental designs
- optimal LHS designs = Bayesian designs
Modeling Techniques
Metamodels: can be represented using linear combination of set of specific basis functions
Univariate Functions
Polynomial models:
- popular for computer experiments
- 2nd order polynomial models most popular
- $ g(\mathbf{x}) = \beta_0 + \sum_{i=1}^{s} \beta_i x_i + \sum_{1 \leq i \leq j \leq s} \beta_{ij} x_i x_j $
- "response surfaces" refers to 2nd order polynomial models
- Myers Montgomery 1995
- Morris Mitchell 1995
- problems:
- unstable computations... bypassed by centering variables, e.g. replace $ x_i $ with $ x_i - \overline{x_i} $
- collinearity problem with high-order polynomials... bypassed by using orthogonal polynomial models
- splines - variation of polynomial models designed to work in the high-collinearity/high-order case, better than polynomials alone
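A minimal sketch of fitting a second-order polynomial metamodel by least squares, with the inputs centered as described above to reduce numerical problems (the test function and design are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 10.0, size=(30, 2))      # 30 runs, s = 2 factors
y = 3.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] ** 2  # deterministic toy "computer model"

Xc = X - X.mean(axis=0)                        # centering stabilizes the computation
# Design matrix: intercept, linear terms, and the distinct second-order terms.
G = np.column_stack([np.ones(len(Xc)),
                     Xc[:, 0], Xc[:, 1],
                     Xc[:, 0] ** 2, Xc[:, 0] * Xc[:, 1], Xc[:, 1] ** 2])
beta, *_ = np.linalg.lstsq(G, y, rcond=None)

# The true model is itself quadratic, so the fit reproduces it to rounding error.
resid = y - G @ beta
assert np.max(np.abs(resid)) < 1e-8
```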
Fourier basis models:
- True model is approximated using Fourier regression, a set of periodic functions
- Number of terms increases exponentially with dimension
- In practice, a particular truncated Fourier metamodel is used
- Riccomagno, Schwabe, Wynn 1997
- Wavelets:
- used to improve Fourier basis
- work esp. well when function being approximated is not smooth
- Chui 1992
- Daubechies 1992
- Antoniadis Oppenheim 1995
Polynomials, splines, Fourier bases, and wavelets are powerful for univariate functions, but lose effectiveness and applicability for multivariate functions
Multivariate Functions
Kriging model
- assumes that $ y(\mathbf{x}) = \mu + z(\mathbf{x}) $
- $ \mu $ = overall mean of $ y(\mathbf{x}) $
- $ z(\mathbf{x}) $ = Gaussian process with mean 0 and covariance function $ Cov(z(\mathbf{x}_i),z(\mathbf{x}_j)) = \sigma^2 R(\mathbf{x}_i,\mathbf{x}_j) $
- $ \sigma^2 $ = unknown variance of $ z(\mathbf{x}) $
- R = correlation function with pre-specified functional form, some unknown parameters
- Typical correlation function: $ r(\mathbf{x}_1, \mathbf{x}_2) = \exp \left[ - \sum_{k=1}^{s} \theta_k (x_{1k} - x_{2k})^2 \right] $
- $ \theta_k $ are unknowns
- Resulting metamodel can be written $ g(\mathbf{x}) = \sum_{i=1}^{n} \beta_i r(\mathbf{x},\mathbf{x}_i) $
- which is of the general form of the linear combination of basis functions
- advantage of Kriging approach: constructs the basis directly using the correlation function
- under certain conditions, it can be shown that resulting metamodel from Kriging approach is the best linear unbiased predictor (see Ch. 5.4.1)
- Gaussian Kriging approach admits a Bayesian interpretation
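The interpolation step behind $ g(\mathbf{x}) = \sum_{i} \beta_i r(\mathbf{x}, \mathbf{x}_i) $ can be sketched as follows, using the Gaussian correlation with a fixed $ \theta $ (in a real Kriging fit, $ \theta $, $ \mu $, and $ \sigma^2 $ would be estimated, e.g. by maximum likelihood; the one-dimensional design and $ \theta $ here are illustrative):

```python
import numpy as np

def gauss_corr(X1, X2, theta):
    """r(x1, x2) = exp(-sum_k theta_k * (x1k - x2k)^2), for all pairs in X1 x X2."""
    d2 = (X1[:, None, :] - X2[None, :, :]) ** 2
    return np.exp(-(d2 * theta).sum(axis=-1))

# Illustrative one-dimensional design and a fixed theta.
X = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X[:, 0])
theta = np.array([10.0])

R = gauss_corr(X, X, theta)       # correlations among the design points
beta = np.linalg.solve(R, y)      # coefficients in g(x) = sum_i beta_i r(x, x_i)

def g(x_new):
    return gauss_corr(np.atleast_2d(x_new), X, theta) @ beta

# Deterministic outputs, so the metamodel interpolates the design points exactly.
assert np.allclose(g(X), y)
```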
Bayesian interpolation
- Proposed by:
- Currin Mitchell Morris Ylvisaker 1991
- Morris Mitchell Ylvisaker 1993
- advantage: can easily incorporate auxiliary information
- e.g.: Bayesian Kriging method
- Morris et al 1993
Neural networks (multilayer perceptron network)
- mathematical definition...
- nonlinear optimization for a least squares objective function
- training algorithms
- etc.
Radial basis function methods
- used for neural network modeling, closely related to the Kriging approach
- generally, for design inputs $ \mathbf{x}_1, \dots, \mathbf{x}_n $ and associated outputs $ y_1, \dots, y_n $, the basis functions are:
- $ K \left( \left\Vert \mathbf{x} - \mathbf{x}_i \right\Vert / \theta \right), i=1,\dots,n $
- $ K(\cdot) $ = kernel function
- $ \theta $ = smoothing parameter
- Resulting metamodel:
- $ g(\mathbf{x}) = \sum_{i=1}^{n} \beta_i K \left( \left\Vert \mathbf{x} - \mathbf{x}_i \right\Vert / \theta \right) $
- if the kernel function is taken to be the Gaussian kernel function (density function of the normal distribution), the resulting metamodel has the same form as the Kriging metamodel
Local polynomial models
- Fan 1992
- Fan Gijbels 1996
- Concept: data point closer to $ \mathbf{x} $ carries more information about the value of $ f(\mathbf{x}) $ than one that is further away
- regression function estimator: running local average
- improved version of local average: locally weighted average, e.g.
- $ g(\mathbf{x}) = \sum_{i=1}^{n} w_i ( \mathbf{x} ) y_i $
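A minimal locally weighted average of the form above, using Gaussian weights as the weighting scheme (the kernel choice and bandwidth are illustrative assumptions, not the book's specific estimator):

```python
import numpy as np

def local_average(x_new, X, y, h):
    """Locally weighted average of y: weights decay with distance from x_new."""
    d2 = ((X - x_new) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * h ** 2))   # Gaussian weights (illustrative kernel)
    return (w / w.sum()) @ y           # normalized so the weights sum to 1

X = np.array([[0.0], [0.5], [1.0]])
y = np.array([0.0, 1.0, 0.0])
# With a small bandwidth the estimate near x = 0.5 is dominated by the middle point.
assert local_average(np.array([0.5]), X, y, h=0.05) > 0.99
```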
Chapter 5 - more explanations of these various modeling approaches, more modeling techniques, etc.
Book Map
Part II: design of computer experiments
Chapter 2:
- Latin hypercube sampling
- its modifications
- randomized orthogonal array
- symmetric Latin hypercube sampling
- optimal Latin hypercube designs
Chapter 3:
- uniform design
- measures of uniformity
- modified L2-discrepancies
- algebraic approaches for constructing several classes of uniform design
Chapter 4
- stochastic optimization techniques for constructing optimal space-filling designs
- heuristic optimization algorithms
- high-quality space filling designs under various optimality criteria
- popular algorithms
Chapter 5:
- introduction to various modeling techniques
- fundamental concepts
- logical progression from simple to increasingly complex models
- polynomial models
- splines
- kriging
- bayesian approaches
- neural networks
- local polynomials
- unified view of all models is provided
- Kriging is a central concept to this chapter
Chapter 6
- special techniques for model interpretation
- generalizations of the traditional ANOVA (analysis of variance) for linear models
- highly recommended that readers study this chapter, especially those interested in understanding the sensitivity of the output to the input variables
- beginning: traditional sum-of-squares decomposition (linear models)
- sequential sum of squares decomposition for general models
- Sobol functional decomposition (generalization of ANOVA decomposition)
- analytic functional decomposition (tensor product metamodel structures)
- computational technique: FAST (fourier amplitude sensitivity test)
Chapter 7
- computer experiments with functional responses
- response is in the form of a curve, with response data collected over a time interval, space interval, or operating interval (e.g. an experiment measuring engine noise over a range of speeds)
- analysis of functional response in context of design of experiments is new area
Chapter 2: Latin Hypercube Sampling and its Modifications
Chapter 3: Uniform Experimental Design
Chapter 4: Optimization in Construction of Designs for Computer Experiments
Chapter 5: Metamodeling
Review of Modeling Concepts
Mean square error and prediction error
Mean square error (MSE):
$ MSE(g) = \int_{T} \left[ f(\mathbf{x}) - g(\mathbf{x}) \right]^2 d\mathbf{x} $
No random error in computer experiments, so MSE = PE (prediction error)
Weighted mean square error (WMSE):
$ WMSE(g) = \int_{T} \left[ f(\mathbf{x}) - g(\mathbf{x}) \right]^2 w(\mathbf{x}) d\mathbf{x} $
$ w(\mathbf{x}) \geq 0 $ is a weight function, with $ \int_{T} w(\mathbf{x}) d\mathbf{x} = 1 $
The weighting function allows you to incorporate prior information about the distribution of $ \mathbf{x} $ over the domain
When no information about distribution of x over domain is available, it is assumed to be uniformly distributed
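When $ f $ is cheap enough to sample, the (unweighted) MSE integral can be approximated by Monte Carlo with $ \mathbf{x} $ uniform on $ T $; a sketch with a toy $ f $ and $ g $ for which the exact value is $ 1/30 $:

```python
import numpy as np

def mse_monte_carlo(f, g, low, high, n=100_000, seed=0):
    """Approximate MSE(g) = integral over T of (f - g)^2, using x ~ Uniform(T)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(low, high, size=(n, len(low)))
    vol = float(np.prod(np.asarray(high) - np.asarray(low)))
    return vol * np.mean((f(X) - g(X)) ** 2)

# Toy case on T = [0, 1]: f(x) = x^2, g(x) = x, exact MSE = 1/30.
est = mse_monte_carlo(lambda X: X[:, 0] ** 2, lambda X: X[:, 0], [0.0], [1.0])
assert abs(est - 1.0 / 30.0) < 1e-3
```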
In cases where computer experiments are expensive (order of hours), impractical to evaluate prediction error directly
General strategy: estimate prediction error of metamodel g by using cross-validation procedure
for $ i=1,\dots,n $ let $ g_{-i} $ denote the metamodel based on the sample excluding $ (\mathbf{x}_i, y_i) $.
Cross Validation for Prediction Error of Expensive Models
Cross validation score:
$ CV_{1} = \frac{1}{n} \sum_{i=1}^{n} \left[ f(\mathbf{x}_i) - g_{-i}(\mathbf{x}_i) \right]^2 $
This gives a good estimate for the prediction error of g
Procedure also called "leave-one-out cross validation"
If sample size n is large, and process of building nonlinear metamodel is time-consuming (Kriging and/or neural network models), using $ CV_1 $ scores becomes computationally too demanding (b/c need to build n metamodels)
To reduce computational burden even further, can modify procedure
For pre-specified K, divide sample into K groups (equal sample size)
Let $ g_{(-k)} $ be the metamodel built on the sample excluding observations in the kth group
Let $ \mathbf{y}_{(k)},\mathbf{g}_{(k)} $ be the vectors of observed values and predicted values for the kth group, where predictions use $ g_{(-k)} $
This yields a K-fold cross validation score:
$ CV_{K} = \frac{1}{n} \sum_{k=1}^{K} \left( \mathbf{y}_{(k)} - \mathbf{g}_{(k)} \right)^{\prime} \left( \mathbf{y}_{(k)} - \mathbf{g}_{(k)} \right) $
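A sketch of the K-fold score for a generic fit/predict pair (here an ordinary least squares toy metamodel on an exactly linear response, chosen only so the expected score is essentially zero):

```python
import numpy as np

def k_fold_cv(X, y, fit, predict, K, seed=0):
    """K-fold CV score: (1/n) * sum over folds of the squared held-out errors."""
    idx = np.random.default_rng(seed).permutation(len(y))
    score = 0.0
    for held_out in np.array_split(idx, K):
        train = np.setdiff1d(idx, held_out)
        model = fit(X[train], y[train])            # g_{(-k)}: built without fold k
        resid = y[held_out] - predict(model, X[held_out])
        score += resid @ resid
    return score / len(y)

# Toy metamodel: ordinary least squares on [1, x] (illustrative choice only).
design = lambda X: np.column_stack([np.ones(len(X)), X])
fit = lambda X, y: np.linalg.lstsq(design(X), y, rcond=None)[0]
predict = lambda b, X: design(X) @ b

rng = np.random.default_rng(3)
X = rng.uniform(size=(40, 1))
y = 1.0 + 2.0 * X[:, 0]                            # exactly linear, so the score ~ 0
assert k_fold_cv(X, y, fit, predict, K=5) < 1e-10
```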
Regularization Parameter
For most modeling procedures: metamodel g depends on a regularization parameter, say $ \lambda $
Cross validation score depends on regularization parameter, denoted by $ CV(\lambda) $ (either $ CV_1 $ or $ CV_K $)
Goal is to minimize cross validation score with respect to $ \lambda $
$ \hat{\lambda} = \arg \min_{\lambda} CV(\lambda) $
Minimization done by searching over grid values of $ \lambda $
Theoretical properties of cross validation:
- Li 1987
General Metamodel Form
Metamodels can be generally expressed in the form:
$ g(\mathbf{x}) = \sum_{j=0}^{L} B_j (\mathbf{x}) \beta_j $
where $ B_j $ are a set of basis functions defined over experimental domain (hypercube)
This may be a polynomial basis function, covariance function (Kriging), radial basis function (neural networks), etc.
Outputs of computer experiments are deterministic: therefore metamodel construction is interpolation problem
Matrix notation:
$ \begin{array}{rcl} \mathbf{y} &=& (y_1,\dots,y_n)^{\prime} \\ \boldsymbol{\beta} &=& (\beta_0,\dots,\beta_L)^{\prime} \\ \mathbf{b(x)} &=& ( B_0(\mathbf{x}),\dots,B_L(\mathbf{x}))^{\prime} \end{array} $
and
$ \mathbf{B} = \left( \begin{array}{ccc} B_0 (\mathbf{x}_1) & \dots & B_L (\mathbf{x}_1) \\ B_0 (\mathbf{x}_2) & \dots & B_L (\mathbf{x}_2) \\ \dots & \dots & \dots \\ B_0 (\mathbf{x}_n) & \dots & B_L (\mathbf{x}_n) \end{array} \right) $
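Given any choice of basis, the matrix form above turns metamodel construction into a linear solve; when the number of basis functions equals n, solving $ \mathbf{B} \boldsymbol{\beta} = \mathbf{y} $ interpolates the design points exactly. A sketch with a hypothetical monomial basis in one input variable:

```python
import numpy as np

def basis_matrix(xs, L):
    """n x (L+1) matrix with B_j(x_i) = x_i ** j (hypothetical monomial basis)."""
    return np.vander(xs, N=L + 1, increasing=True)

xs = np.array([0.0, 0.3, 0.7, 1.0])   # n = 4 design points
y = np.cos(xs)                         # deterministic outputs
B = basis_matrix(xs, L=3)              # L + 1 = n, so B is square

beta = np.linalg.solve(B, y)           # interpolation: B beta = y
g = lambda x: basis_matrix(np.atleast_1d(np.asarray(x, float)), L=3) @ beta
assert np.allclose(g(xs), y)
```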