From charlesreid1

Overview of Experimental Design and Surrogate Models

The Problem Statement

Purpose: create a cheap representation of an expensive computer model

We pick a set of input parameters $ \boldsymbol{x} $ and a set of output variables $ \boldsymbol{y} $

Normally there is a map from one to the other: the real function $ f $,

$ \boldsymbol{y} = f(\boldsymbol{x}) $

We then create a surrogate model $ g $ that approximates $ f $ at much lower cost,

$ \boldsymbol{y} \approx g(\boldsymbol{x}) $

This is sometimes called a "metamodel", because it's a model of a model
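As a minimal sketch of the idea, the snippet below builds a polynomial surrogate from a handful of samples of a stand-in function. The function `f`, the interval, the number of samples, and the polynomial degree are all assumptions chosen for illustration; in practice each evaluation of $ f $ would be a run of the expensive model.

```matlab
% Hypothetical stand-in for the expensive model f (an assumption for illustration)
f = @(x) exp(-x) .* sin(2*x);

% Evaluate f at a small number of sample points
x_train = linspace(0, 2, 11);
y_train = f(x_train);

% Fit a degree-5 polynomial surrogate g by least squares
coeffs = polyfit(x_train, y_train, 5);
g = @(x) polyval(coeffs, x);

% Compare g against f on a finer grid
x_test = linspace(0, 2, 201);
max_err = max(abs(f(x_test) - g(x_test)));
fprintf('Maximum absolute error on test grid: %g\n', max_err);
```

Once `g` is built, it can be evaluated as often as needed without calling `f` again.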

Classes of Surrogate Models

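There are several classes of methods used in building $ g $. The list below mixes sampling designs, which choose the points at which $ f $ is evaluated, with functional forms for $ g $ itself:

  • Latin hypercube (sampling design)
  • Space-filling (sampling design)
  • Uniform (sampling design)
  • Neural networks (model form)
  • Gaussian (model form)
  • Polynomials (response surface methodology) (model form)

I won't cover all of these; I will only cover Latin hypercube, space-filling, and response surface methodologies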

Surrogate Modeling

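When constructing surrogate models, it is important to distinguish between computer surrogate modeling (metamodeling) and experimental surrogate modeling

Big difference: experiments have random errors, whereas a deterministic computer model returns the same output every time it is run with the same inputs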


Basic Concepts for Experiments

Analysis of Variance tables

Basic Concepts for Metamodeling

Metamodeling: regression on data without random errors

The goal is to predict the true value $ f(\boldsymbol{x}) $ using the surrogate model $ g(\boldsymbol{x}) $

Mean square error:

$ MSE(g) = \int_R \left( f(\boldsymbol{x}) - g(\boldsymbol{x}) \right)^2 d\boldsymbol{x} $

where R is the region in parameter space where the metamodel applies
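For a one-dimensional region the integral can be approximated numerically. The sketch below uses the trapezoidal rule (`trapz`); the functions `f` and `g` and the region $ R = [0, 1] $ are placeholders chosen for illustration, not the example treated on this page.

```matlab
% Placeholder functions (assumptions for illustration only)
f = @(x) sin(x);             % "real" function
g = @(x) x - x.^3/6;         % surrogate: two-term Taylor series of sin(x)

% Region R = [0, 1], discretized on a uniform grid
x = linspace(0, 1, 1001);

% Approximate the integral of (f - g)^2 over R with the trapezoidal rule
mse = trapz(x, (f(x) - g(x)).^2);
fprintf('MSE(g) ~ %g\n', mse);
```

Unlike a plain sum of squared differences, the trapezoidal rule accounts for the grid spacing, so the result does not grow as the grid is refined.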

Example

Example function:

Real function f:

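function y = real_function()
% Evaluate the real function f on a fixed grid.
% (The output is named y so it does not shadow the built-in real().)

% Define the domain of the real function
x = 0:(pi/32):2*pi;

% f(x) = 2 x cos(4 pi x)
y = 2*x.*cos(4*pi*x);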

Surrogate function g:

function surrogate = surrogate_function()
% Evaluate the surrogate g on the same grid as the real function.
% g is a 9th-degree polynomial in (x - 0.5), so it can only be
% expected to track f near x = 0.5.

% Define the region in which the function is evaluated
x = 0:(pi/32):2*pi;

surrogate = 0.9931 + 1.96*(x-0.5) - 76.8838*(x-0.5).^2 - 152.0006*(x-0.5).^3 ...
        + 943.8565*(x-0.5).^4 + 1857.1427*(x-0.5).^5 - 3983.9332*(x-0.5).^6 ...
        - 7780.7937*(x-0.5).^7 + 5756.3561*(x-0.5).^8 + 11147.1698*(x-0.5).^9;

Comparing the two functions:

ExpDesignTwoFunctions.png

And comparing their error:

ExpDesignTwoFunctionsError.png

Mean square error:

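% Evaluate both functions on the common grid
r = real_function;
s = surrogate_function;

% Sum of squared differences at the grid points
% (a discrete stand-in for the integral; not scaled by the grid spacing)
MSE = sum( (r-s).^2 );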

MSE = 1.6354

Monte Carlo Sampling

Monte Carlo sampling is essentially a brute-force technique: random samples are drawn from the input space until enough have been taken to be satisfied that the entire space has been adequately covered.
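A minimal sketch of plain Monte Carlo sampling on the unit hypercube follows. The values of `n` and `s` are arbitrary, and the bin count is included only to show that purely random samples can leave some bins of a dimension empty.

```matlab
n = 10;   % number of samples (arbitrary choice)
s = 2;    % number of input variables (arbitrary choice)

% Draw n points uniformly at random from the unit hypercube
X = rand(n, s);

% Split each dimension into n equal bins and count how many are occupied
bins = floor(X * n) + 1;          % bin index (1..n) of each sample
occupied = zeros(1, s);
for j = 1:s
    occupied(j) = numel(unique(bins(:, j)));
end
disp(occupied);   % typically less than n: some bins receive no sample
```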

Latin Hypercube

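Latin hypercube sampling is a way of sampling a space randomly while guaranteeing that the full range of each dimension is covered: each dimension is divided into equal-probability bins, and each bin contains exactly one sample.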

For example, in the following figure, one sample falls into each bin of each of the x and y dimensions:

ExpDesignLatinHypercube.png

For a domain divided into $ n $ bins along each dimension, each bin has an equal marginal probability of $ 1/n $

Algorithm

Purpose: create an experimental design with $ n $ runs (number of samples to be taken), and $ s $ input variables

The result should be a Latin hypercube design that is an $ n \times s $ matrix denoting the variable combinations at which to sample

Step 1: take $ s $ independent random permutations of the integers $ 1, \dots, n $, denoted $ \pi_{j}(1), \dots, \pi_{j}(n) $

(note that $ j $ indexes the dimension of the Latin hypercube, $ j=1 \dots s $, and $ n $ is the number of runs or experiments)

Step 2: take $ ns $ independent random numbers $ U_{k}^{j} $, uniformly distributed on $ [0,1] $, and compute the locations of the Latin hypercube samples as:

$ x_{k}^{j} = \frac{ \pi_{j}(k) - U_{k}^{j} }{ n } $

where $ k = 1 \dots n $ and $ j = 1 \dots s $
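The two steps can be sketched in a few lines of Matlab using only `randperm` and `rand`. This is an illustrative sketch, not the toolbox implementation, and the values of `n` and `s` are arbitrary.

```matlab
n = 8;   % number of runs (arbitrary choice)
s = 3;   % number of input variables (arbitrary choice)

X = zeros(n, s);   % the n-by-s Latin hypercube design
for j = 1:s
    p = randperm(n)';          % Step 1: random permutation pi_j(1..n)
    U = rand(n, 1);            % Step 2: uniform random numbers U_k^j
    X(:, j) = (p - U) / n;     % x_k^j = (pi_j(k) - U_k^j) / n
end

% Each column should have exactly one sample in each of the n bins
bins = ceil(X * n);
disp(sort(bins));   % every column reads 1, 2, ..., n
```

Each row of `X` is one run, and each column holds the values of one input variable on the unit interval.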

Variation

One variation is centered Latin hypercube sampling

Each sample location is given by:

$ x_{k}^{j} = \frac{ \pi_{j}(k) - 0.5 }{ n } $

where $ k = 1 \dots n $ indexes the experiment (or run) and $ j = 1 \dots s $ indexes the dimension

(this technique does not require the random numbers $ U_{k}^{j} $; only the permutations are random)
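A minimal sketch of the centered variant: the permutations are still drawn at random, but every sample sits at the midpoint of its bin. As before, `n` and `s` are arbitrary choices.

```matlab
n = 8;   % number of runs (arbitrary choice)
s = 3;   % number of input variables (arbitrary choice)

X = zeros(n, s);
for j = 1:s
    p = randperm(n)';            % random permutation pi_j(1..n)
    X(:, j) = (p - 0.5) / n;     % sample at the center of each bin
end

disp(sort(X(:, 1))');   % bin centers 1/(2n), 3/(2n), ..., (2n-1)/(2n)
```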

LHS in Matlab

If you have the Statistics Toolbox (now the Statistics and Machine Learning Toolbox), which CHPC @ University of Utah does, Matlab provides an LHS function: lhsdesign

Documentation is available here: http://www.mathworks.com/help/toolbox/stats/lhsdesign.html
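A short usage sketch, assuming the toolbox is installed. The bounds `lb` and `ub` are hypothetical ranges made up for illustration; `lhsdesign` itself returns points on the unit hypercube, so the design is rescaled afterwards.

```matlab
n = 8;   % number of runs (arbitrary choice)
s = 3;   % number of input variables (arbitrary choice)

% n-by-s Latin hypercube design on the unit hypercube
X = lhsdesign(n, s);

% Centered variant: samples at the midpoints of the bins
Xc = lhsdesign(n, s, 'smooth', 'off');

% Rescale the unit design to hypothetical physical ranges [lb, ub]
lb = [0 10 -1];
ub = [1 20  1];
Xs = bsxfun(@plus, lb, bsxfun(@times, X, ub - lb));
```

Each row of `Xs` is then one set of input values at which to run the expensive model.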

Space-Filling

Response surface

This is covered in more detail in the next section

Response Surface Methodology

Factorial Design

Fractional Factorial Design

Full Factorial Design

Composite Design

Other Alternatives

Box-Behnken

Other Designs