From charlesreid1

Overview of Experimental Design and Surrogate Models

The Problem Statement

Purpose: create a cheap representation of an expensive computer model

We're picking some input parameters, and some output variables

Normally there is a map from one to the other: the real function ,

And we're creating a surrogate model ,

This is sometimes called a "metamodel", because it's a model of a model

Classes of Surrogate Models

There are several classes or forms for

  • Space-filling designs
    • Latin hypercube
  • Uniform sampling
  • Neural networks
  • Gaussian
  • Polynomials (response surface methodology)

I won't cover all, I will only cover latin hypercube, space-filling, and response surface methodologies

Surrogate Modeling

When constructing surrogate models, important to distinguish between computer surrogate modeling (metamodeling) and experimental surrogate modeling

Big difference: experiments have random errors

Basic Concepts for Experiments

Analysis of Variance tables

Basic Concepts for Metamodeling

Metamodeling: regression on data without random errors

Trying to predict true value using surrogate model

Mean square error:

where R is the region in parameter space where the metamodel applies

Example

Example function:

Real function f:

function real = real_function()                                      
                                                                     
% Define the domain of the real function                             
x=0:(pi/32):2*pi;                                                    
                                                                     
real = 2*x.*cos(4*pi*x);

Surrogate function f:

function surrogate = surrogate_function()

% Define the region in which the function is valid
x = 0:(pi/32):2*pi;

surrogate = 0.9931 + 1.96*(x-0.5) - 76.8838*(x-0.5).^2 - 152.0006*(x-0.5).^3 ...
        + 943.8565*(x-0.5).^4 + 1857.1427*(x-0.5).^5 - 3983.9332*(x-0.5).^6 ...
        - 7780.7937*(x-0.5).^7 + 5756.3561*(x-0.5).^8 + 11147.1698*(x-0.5).^9;

Comparing the two functions:

ExpDesignTwoFunctions.png

And comparing their error:

ExpDesignTwoFunctionsError.png

Mean square error:

r=real_function;
s=surrogate_function;
MSE = sum( (r-s).^2 );

MSE = 1.6354

Latin Hypercube

Latin Hypercube is a way of sampling a space randomly, but in such a way that each dimension of the space is sampled.

For example, in the following figure, one sample falls into each bin of each of the x and y dimensions:

ExpDesignLatinHypercube.png

For a domain divided into bins, each bin has an equal marginal probability of

Algorithm

Purpose: create an experimental design with runs (number of samples to be taken), and input variables

The result should be a Latin hypercube design that is an matrix denoting the variable combinations at which to sample

Step 1: take independent permutations of integers

(note that indexes the dimension of the Latin hypercube, , and is the number of runs or experiments)

Step 2: Take Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle ns} random numbers Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle U_{k}^{j}} and compute the locations of the Latin hypercube samples as:

Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle x_{k}^{j}={\frac {\pi _{j}(k)-U_{k}^{j}}{n}}}

where Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle k=1\dots n} and

Variation

One variation is centered Latin hypercube sampling

Each sample location is given by:

Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle x_{k}^{i}={\frac {\pi ^{j}(k)-0.5}{n}}}

where Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle k=1\dots n} indexes which experiment (or run)

(this technique does not require random numbers)

LHS in Matlab

If you have the statistical toolbox (which CHPC @ University of Utah does), Matlab has an LHS function available to you: lhsdesign

Documentation is available here: http://www.mathworks.com/help/toolbox/stats/lhsdesign.html

Discussion Question

Why might the LHS sample (same as given above):

ExpDesignLatinHypercube.png

be better than the one given below?

ExpDesignLinearHypercube.png

Monte Carlo Sampling

Roulette.jpg
Roulette illustrates the random nature of the function samples used in Monte Carlo methods. A good random number generator (for example, a roulette table) is an essential part of any Monte Carlo method.

Monte Carlo sampling is essentially a brute-force technique in which random samples are taken until confidence that the entire space has been sampled is satisfactory.

Random numbers are used to create sampling points in each direction.

Think of Monte Carlo ray-tracing: you send out a whole bunch of rays, each in random directions, and from the result you determine the radiative flux. Mathematically, you're performing an integration by randomly sampling the function you want to integrate, then adding up all of the random samples:

Discussion Question

Why use Monte Carlo sampling vs. Latin Hypercube sampling?

Response Surfaces

See below for details of the response surface methodology.

Response Surface Methodology

The response surface methodology for surrogate models (and experimental design) uses polynomials to approximate the response surface.

For a function , the response surface is the parameter-space plot of the output function as a function of the input parameters Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle {\boldsymbol {x}}}

Last week, Alex showed us a preliminary response surface plot of the output, product yield Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle y_{P}} , as a function of two input paramters, and :

ToyProblemResponseSurface.png

The response surface methodology prescribes ways to sample the output function in order to produce the best polynomial representation possible

Terminology

A factor is an input parameter

A level is a discrete value that the parameter can take on

For example: if I want to investigate the effect of on Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle y_{P}} , I have to decide the discrete values of that I want to try

If I choose, for example, 1 and 10, then the factor has two levels Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle {1,10}}

Factorial Design

Factorial designs are experimental designs for linear functions of input parameters. That is, they assume a linear model (given a -dimensional input parameter vector) of the form:

Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle y({\boldsymbol {x}})=\alpha _{0}+\sum _{i=1}^{D}\alpha _{i}x_{i}+\sum _{j=1}^{D}\sum _{k=1}^{j}\alpha _{jk}x_{j}x_{k}+\dots }


One Factor At A Time (OFAAT)

OFAAT testing isolates the interaction effects of each variable, and assumes each variable is independent

Given: a set of factors Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle i=1\dots k}

The algorithm is as follows:

1. Fix the levels of all but one factor Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle i=2,3,\dots k} , and determine the optimal level for factor

2. Use the optimal level for factor . Fix levels for all but one factor Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle i=3,4,\dots k} and determine the optimal level for factor Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle i=2}

3. Use the optimal level for factor Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle i=1,2} . Fix levels for all but one factor Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle i=4,5,\dots k} , etc.

etc.

etc.

Full Factorial Design

The following discussion assumes use of two factor levels. Using more than 2 factor levels can potentially make factorial experimental design very complex.

The full factorial algorithm is as follows:

1. Assign the value Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle +1} or Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle -1} to the upper and lower factor levels, respectively.

2. Create a table of factor-level combinations. For example:

Factor-level combination enumeration Factor 1 Factor 2
1 +1 +1
2 +1 -1
3 -1 +1
4 -1 -1

3. Generate a sequence of random numbers, where is the number of factor-level combinations.

4. Run each sequence in the determined order

Fractional Factorial Design

Fractional factorial designs reduce the number of experimental runs required to construct the polynomial, but they throw out some information about interaction effects. Typically, information about the highest-order effects is thrown out.

The essential technique is to determine which experiments you do not want to run, and exclude those from Step 4 in the factorial design algorithm given above. And the essential technique for determining which experiments you do not want to run is to determine which combination of experiments will give information about higher-order interaction effects.

To do this, an expanded version of the above table is required: one that includes factor interactions. To determine, for example, the factor interaction between factors 1 and 2, Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle I_{12}} , simply multiply the factor 1 and factor 2 values (e.g. Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle +1\times +1} or Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle -1\times +1} , etc.).

Factor 1 = Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle I_{1}}

Factor 2 = Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle I_{2}}

Factor 3 = Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle I_{3}}

Factor 1-Factor 2 = Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle I_{12}}

Factor 2-Factor 3 = Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle I_{23}}

Factor 1-Factor 3 = Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle I_{13}}

Factor 1-Factor 2-Factor 3 = Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle I_{123}}

Factor-level combination enumeration Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle I_{1}} Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle I_{2}} Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle I_{3}} Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle I_{12}} Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle I_{23}} Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle I_{13}} Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle I_{123}}
1 +1 +1 +1 +1 +1 +1 +1
2 +1 +1 -1 +1 -1 -1 -1
3 +1 -1 +1 -1 -1 +1 -1
4 +1 -1 -1 -1 +1 -1 +1
5 -1 +1 +1 -1 +1 -1 -1
6 -1 +1 -1 -1 -1 +1 +1
7 -1 -1 +1 +1 -1 -1 +1
8 -1 -1 -1 +1 +1 +1 -1

Composite Design

Composite design builds on the full factorial design by adding three additional levels. Rather than just and Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle +1} , composite design also includes Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle -2,0,+2} .

Thinking about a three-dimensional factorial design, the factorial design sample points looks like a cube in parameter space. Composite design makes a "star" design.

The additional sample points consist of the new lower Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle -2} and upper Failed to parse (Conversion error. Server ("https://en.wikipedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle +2} levels for each factor, while keeping all other factors constant at the level. It also includes a center point, at which all factors have a factor level of .

For a two-parameter experimental design, this looks like:

Experimental Design Step Factor 1 Factor 2
Full factorial +1 +1
Full factorial +1 -1
Full factorial -1 +1
Full factorial -1 -1
Composite 0 0
Composite +2 0
Composite -2 0
Composite 0 +2
Composite 0 -2

Other Alternatives

Box-Behnkin

Box-Behnkin designs are intended to minimize the number of function evaluations required to construct a response surface (polynomial surrogate model). The design consists of three levels, rather than the five required by composite designs.

For more information see Box Behnken 1960 <ref name="BoxBehnken1960">Box and Behnken (1960). "Three Level Designs for the Study of Quantitative Variables". Technometrics 2 (4): 455-475. http://files.charlesmartinreid.com/BoxBehnken1960.pdf.  </ref>

Plackett Burman

Experimental designs requiring a very minimal number of sample points.<ref name="PlackettBurman"> Plackett and Burman (1946). "Design of optimum multifactorial experiments". Biometrika 33: 305-325.  </ref>

Sequential Assembly

References

<references />

Links to Papers