Revision as of 18:07, 20 June 2011

Overview of Experimental Design and Surrogate Models

The Problem Statement

Purpose: create a cheap representation of an expensive computer model

We're picking some input parameters, and some output variables

Normally there is a map from one to the other: the real function $$ f $$ ,

$\boldsymbol{y} = f(\boldsymbol{x})$

And we're creating a surrogate model $$ g $$ that approximates it,

$\hat{\boldsymbol{y}} = g(\boldsymbol{x}) \approx f(\boldsymbol{x})$

This is sometimes called a "metamodel", because it's a model of a model

Classes of Surrogate Models

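There are several classes or forms for $$ g $$ , and several designs for choosing the sample points used to build it. The list below mixes the two:

Latin hypercube (sampling design)
Space-filling (sampling design)
Uniform (sampling design)
Neural networks (model form)
Gaussian (model form)
Polynomials, i.e. response surface methodology (model form)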

I won't cover all of these; I will only cover Latin hypercube, space-filling, and response surface methodologies

Surrogate Modeling

When constructing surrogate models, it is important to distinguish between computer surrogate modeling (metamodeling) and experimental surrogate modeling

Big difference: experiments have random errors

Basic Concepts for Experiments

Analysis of Variance tables

Basic Concepts for Metamodeling

Metamodeling: regression on data without random errors

Trying to predict true value $f(\boldsymbol{x})$ using surrogate model $g(\boldsymbol{x})$

Mean square error:

$MSE(g) = \int_R \left( f(\boldsymbol{x}) - g(\boldsymbol{x}) \right)^2 d\boldsymbol{x}$

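where R is the region in parameter space where the metamodel applies (the integral is a true mean when R is scaled to unit volume, e.g. the unit hypercube; otherwise divide by the volume of R)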
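As a rough sketch of how this integral can be approximated in one dimension, the snippet below applies the trapezoidal rule on a uniform grid. The functions `f` and `g`, the region R = [0, 1], and the grid size are illustrative assumptions, not the functions used in this lecture.

```matlab
% Hypothetical stand-ins for the real function and its surrogate
f = @(x) x.^2;   % assumed "expensive" model
g = @(x) x;      % assumed surrogate

% Region R = [0, 1] (unit length), discretized on a uniform grid
x = linspace(0, 1, 1001);

% Trapezoidal approximation of the integral of (f - g)^2 over R
mse = trapz(x, (f(x) - g(x)).^2);
```

For this particular pair the integral can be done by hand and equals 1/30, which gives a quick check on the numerical value.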

Example

Example function:

Real function f:

function y = real_function()

% Sample grid on [0, 1]; the surrogate polynomial tracks f on this
% interval and diverges rapidly outside it
x = linspace(0, 1, 65);

% Real function f(x) = 2 x cos(4 pi x)
y = 2*x.*cos(4*pi*x);

Surrogate function g:

function y = surrogate_function()

% Same grid as real_function, so the two outputs can be compared pointwise
x = linspace(0, 1, 65);

% Ninth-order polynomial in (x - 0.5)
y = 0.9931 + 1.96*(x-0.5) - 76.8838*(x-0.5).^2 - 152.0006*(x-0.5).^3 ...
        + 943.8565*(x-0.5).^4 + 1857.1427*(x-0.5).^5 - 3983.9332*(x-0.5).^6 ...
        - 7780.7937*(x-0.5).^7 + 5756.3561*(x-0.5).^8 + 11147.1698*(x-0.5).^9;

Comparing the two functions:

And comparing their error:

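Mean square error (as written, the code sums the squared errors over the sample points; dividing by numel(r) gives the mean):

r = real_function;        % f evaluated on the sample grid
s = surrogate_function;   % g evaluated on the same grid
MSE = sum( (r-s).^2 );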

MSE = 1.6354

Monte Carlo Sampling

Monte Carlo sampling is essentially a brute-force technique: random samples are drawn until we are satisfied that the entire space has been adequately covered.
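A minimal sketch of plain Monte Carlo sampling is shown below. It draws a fixed number of points rather than testing for coverage, and the sample count `n`, the number of variables `s`, and the bounds `lo` and `hi` are all assumed values chosen for illustration.

```matlab
n = 100;   % number of samples (assumed)
s = 2;     % number of input variables (assumed)

% Each row is one sample drawn uniformly from the unit hypercube [0,1]^s
X = rand(n, s);

% Rescale each column to assumed bounds lo(j) <= x_j <= hi(j)
lo = [0, -1];
hi = [2*pi, 1];
Xs = repmat(lo, n, 1) + X .* repmat(hi - lo, n, 1);
```

Nothing here forces the samples to spread out, so some regions of the space may end up with clusters of points while others stay empty.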

Latin Hypercube

Latin Hypercube is a way of sampling a space randomly, but in such a way that each dimension is divided into equal-probability bins and every bin of every dimension is sampled exactly once.

For example, in the following figure, one sample falls into each bin of each of the x and y dimensions:

[Figure: ExpDesignLatinHypercube.png]

For a domain divided into $$ n $$ bins, each bin has an equal marginal probability of $$ 1/n $$

Algorithm

Purpose: create an experimental design with $$ n $$ runs (number of samples to be taken), and $$ s $$ input variables

The result should be a Latin hypercube design that is an $n \times s$ matrix denoting the variable combinations at which to sample

Step 1: take $$ s $$ independent random permutations of the integers $1 \dots n$ , written $\pi_{j}(1) \dots \pi_{j}(n)$

(note that $$ j $$ indexes the dimension of the Latin hypercube, $j=1 \dots s$ , and $$ n $$ is the number of runs or experiments)

Step 2: Take $$ ns $$ random numbers $U_{k}^{j}$ , uniformly distributed on (0,1), and compute the locations of the Latin hypercube samples as:

$x_{k}^{j} = \frac{ \pi_{j}(k) - U_{k}^{j} }{ n }$

where $k = 1 \dots n$ and $j = 1 \dots s$
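The two steps above can be sketched in a few lines. The values of `n` and `s` are assumed for illustration, and the design is generated on the unit hypercube.

```matlab
n = 10;   % number of runs (assumed)
s = 2;    % number of input variables (assumed)

X = zeros(n, s);             % n-by-s Latin hypercube design
for j = 1:s
    p = randperm(n)';        % Step 1: random permutation pi_j of 1..n
    U = rand(n, 1);          % Step 2: uniform random numbers on (0,1)
    X(:, j) = (p - U) / n;   % x_k^j = (pi_j(k) - U_k^j) / n
end
```

Each column of `ceil(n*X)` should then be a permutation of 1..n, i.e. one sample per bin in every dimension.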

Variation

One variation is centered Latin hypercube sampling

Each sample location is given by:

$x_{k}^{j} = \frac{ \pi_{j}(k) - 0.5 }{ n }$

where $k = 1 \dots n$ indexes which experiment (or run) and $j = 1 \dots s$ indexes the dimension

(this technique does not require the random numbers $U_{k}^{j}$ ; the permutations are still random)
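A sketch of the centered variation, again with assumed values of `n` and `s`: each sample sits at the midpoint of its bin, so only the permutations are drawn at random.

```matlab
n = 10;   % number of runs (assumed)
s = 2;    % number of input variables (assumed)

X = zeros(n, s);               % n-by-s centered Latin hypercube design
for j = 1:s
    p = randperm(n)';          % random permutation of 1..n for dimension j
    X(:, j) = (p - 0.5) / n;   % midpoint of bin p(k) in dimension j
end
```

Sorting any column gives the same set of bin midpoints, 0.5/n, 1.5/n, and so on up to (n - 0.5)/n.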

Space-Filling

Response surface

Response Surface Methodology

Factorial Design

Fractional Factorial Design

Full Factorial Design

Composite Design

Other Alternatives

Box-Behnken

Whatever Others

Experimental Design Lecture: Difference between revisions

From charlesreid1