Composite Experimental Design
From charlesreid1
Overview
Composite experimental design refers to sampling parameter space in successive stages so that a first- or second-order polynomial function can be constructed from the samples.
Explanation
Setting Up the Whole Design
1. Select 5 (or 3) levels for each variable. Code each level with a numerical value, typically between -1 and +1 (other coded ranges can be used; see Box and Draper 1987).
2. Create variable transforms to translate between the coded levels and the actual input parameter values (see below).
3. Create the full composite design matrix
4. Parse the full factorial matrix from above
5. Parse the fractional factorial matrix from above
6. Parse the one-factor-at-a-time matrix from above
7. Sample function in the following order:
- One factor at a time
- Fractional factorial
- Full factorial
- Full composite
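The steps above can be sketched in a few lines. This is a minimal Python illustration, not the author's Matlab code. The function names, the axial distance alpha = 2.0, the inclusion of a center point in the one-factor-at-a-time set, and the choice of half-fraction generator (last factor equals the product of the others) are all assumptions made for the example.

```python
from itertools import product


def full_factorial(n_vars):
    """All 2**n_vars corner points with coded levels -1 and +1."""
    return [tuple(p) for p in product((-1.0, 1.0), repeat=n_vars)]


def half_fraction(n_vars):
    """Half of the corners: the last level is the product of the others
    (one common generator choice, assumed here)."""
    rows = []
    for p in product((-1.0, 1.0), repeat=n_vars - 1):
        last = 1.0
        for level in p:
            last *= level
        rows.append(tuple(p) + (last,))
    return rows


def one_factor_at_a_time(n_vars, alpha):
    """Center point plus 2*n_vars axial points at coded distance alpha."""
    rows = [tuple(0.0 for _ in range(n_vars))]
    for i in range(n_vars):
        for sign in (-1.0, 1.0):
            row = [0.0] * n_vars
            row[i] = sign * alpha
            rows.append(tuple(row))
    return rows


def composite(n_vars, alpha):
    """Full composite design: axial/center points plus all corners."""
    return one_factor_at_a_time(n_vars, alpha) + full_factorial(n_vars)


def sampling_order(n_vars, alpha):
    """Points ordered so each stage extends the previous one:
    one-factor-at-a-time, then the half fraction, then the remaining corners."""
    ofat = one_factor_at_a_time(n_vars, alpha)
    frac = half_fraction(n_vars)
    in_fraction = set(frac)
    rest = [p for p in full_factorial(n_vars) if p not in in_fraction]
    return ofat + frac + rest
```

With this ordering, the function can be evaluated stage by stage, and every earlier stage is reused when the next, larger design is fitted.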
How Many Levels?
Whether to choose 3 or 5 levels depends on the application.
Typically, 3-level designs are chosen for experiments where each additional level makes the experimental setup harder, so the minimum number of levels is desirable.
In simulations, however, 5-level designs are usually preferable, because running at more levels costs the user little extra effort.
Variable Transforms
For a variable with range ,
- the transformed variable has the range for factorial design
- the transformed variable has the range for composite design
Linear Variables
To transform a linear variable to the variable :
To transform a linear variable to the variable :
Log Variables
To transform a log variable to the variable :
To transform a log variable to the variable :
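As an illustration of what such transforms can look like, here is a minimal Python sketch (not the author's Matlab code). It assumes the coded range is symmetric about zero, with the half-width passed as a parameter (1 for a factorial range, a larger value for a composite range), and it assumes base-10 logarithms for log variables. The function names are hypothetical.

```python
import math


def linear_to_coded(x, lo, hi, half_width=1.0):
    """Map x in [lo, hi] linearly onto [-half_width, +half_width]."""
    mid = 0.5 * (lo + hi)
    half_range = 0.5 * (hi - lo)
    return half_width * (x - mid) / half_range


def coded_to_linear(c, lo, hi, half_width=1.0):
    """Inverse of linear_to_coded."""
    mid = 0.5 * (lo + hi)
    half_range = 0.5 * (hi - lo)
    return mid + (c / half_width) * half_range


def log_to_coded(x, lo, hi, half_width=1.0):
    """Same mapping, applied to log10 of the variable and its bounds."""
    return linear_to_coded(math.log10(x), math.log10(lo), math.log10(hi), half_width)


def coded_to_log(c, lo, hi, half_width=1.0):
    """Inverse of log_to_coded."""
    return 10.0 ** coded_to_linear(c, math.log10(lo), math.log10(hi), half_width)
```

For a log variable spanning 1 to 100, the coded center 0 corresponds to 10 (the geometric midpoint), not to 50.5.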
Full Composite Design Matrix
Full Factorial
Fractional Factorial
One Parameter At A Time
Example
Problem Information
For details about the problem, including the input uncertainty map, see Example Problem for Experimental Design
Code
Computing Response Surface
See Response Surface Methodology for general information on response surface methodology.
See Composite Experimental Design Matlab Code for the actual Matlab code used to generate the results below.
A Note on Visualization
Response surfaces are difficult to visualize if they have more than 2 dimensions. For example, imagine reducing a 1-D function (a curve) by one dimension, to a single point.
Even worse is reducing by more than one dimension: for example, reducing a surface described by a 2-D polynomial to a 0-D point.
For this reason, it is important to use more reliable metrics than visual inspection in order to judge how well a response surface represents the actual response.
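A minimal Python sketch of the kind of metrics meant here, matching the quantities in the summaries on this page (MSE, MSE degrees of freedom, L-infinity norm of the residual, R^2). It is an illustration under stated assumptions, not the author's Matlab code: the function name is hypothetical, and MSE is taken as the residual sum of squares divided by (number of samples minus number of polynomial terms).

```python
def fit_metrics(actual, predicted, n_terms):
    """Summary statistics for a fitted response surface.

    actual, predicted: responses at the sample points
    n_terms: number of polynomial terms (fitted coefficients)
    """
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    sse = sum(r * r for r in residuals)          # residual sum of squares
    mean = sum(actual) / n
    sst = sum((a - mean) ** 2 for a in actual)   # total sum of squares
    dof = n - n_terms                            # error degrees of freedom
    return {
        "mse": sse / dof,
        "mse_dof": dof,
        "linf_resid": max(abs(r) for r in residuals),
        "r2": 1.0 - sse / sst,
    }
```

These numbers only measure agreement at the sample points; they say nothing about how the polynomial behaves between or outside them.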
A Note on Coefficient and Variable Order
The coefficient vector for each response surface is given below. The order of variables for the polynomials is:
1. mass flowrate
2. reaction rate
3. mixing length for mixing model
4. measurement location 1
5. measurement location 2
6. measurement location 3
Polynomial Powers Matrix
The polynomial powers matrix for an N-variable polynomial of degree M is a T-by-N matrix, where T is the number of distinct polynomial terms that can exist for a polynomial with N variables and degree M. Each row gives the power of each variable in one term.
The function allVL1 (available for download here: http://files.charlesmartinreid.com/ExperimentalDesign/allVL1.m) creates this matrix of permutations. Alternatively, the regstats and x2fx Matlab functions will automatically generate their own versions of this matrix (albeit in a different order than the allVL1 function).
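To make the structure of this matrix concrete, here is a minimal Python sketch that enumerates the rows. It is not allVL1 and makes no claim about how that function works internally; the name polynomial_powers is hypothetical. The sort key (total degree first, then ascending row order) was chosen because it reproduces the row order of the polynomial powers matrices listed on this page.

```python
from itertools import product


def polynomial_powers(n_vars, degree):
    """Rows of non-negative integer powers whose sum is <= degree.

    Rows are ordered by total degree, then lexicographically,
    so the constant term comes first.
    """
    rows = [p for p in product(range(degree + 1), repeat=n_vars)
            if sum(p) <= degree]
    rows.sort(key=lambda p: (sum(p), p))
    return rows


def evaluate(powers, coeffs, x):
    """Evaluate sum_k coeffs[k] * prod_i x[i] ** powers[k][i]."""
    total = 0.0
    for row, b in zip(powers, coeffs):
        term = b
        for xi, p in zip(x, row):
            term *= xi ** p
        total += term
    return total
```

For two variables and degree 2 this gives the rows (0,0), (0,1), (1,0), (0,2), (1,1), (2,0). Note that in this ordering the first linear term belongs to the last variable.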
My Polynomial Term Ordering
When I run the regstats function in Matlab, I always specify the form of the polynomial powers matrix using the allVL1 function by running allVL1(number_of_vars, degree_of_polynomial, '<=').
This function, in turn, sorts the matrix of polynomial powers the same way that Matlab's sortrows function would sort the rows. More info here: http://www.mathworks.com/help/techdoc/ref/sortrows.html
Each response surface is available to download below, and the model form is specified in each .mat file. Alternatively, you can download the allVL1 function and run it for yourself.
Matlab's Polynomial Term Ordering
If no polynomial powers matrix is specified when running the regstats function in Matlab, then the polynomial powers for an n-variable polynomial are ordered in the same way that Matlab's x2fx function orders them.
Documentation page for x2fx describing ordering: http://www.mathworks.com/help/toolbox/stats/x2fx.html
- First order non-interaction terms
- Second order interaction terms
- Second order non-interaction terms
- Third order interaction terms
- Third order non-interaction terms
Quadratic Surface, 6 Dimensions
Download the response surface here: http://files.charlesmartinreid.com/ExperimentalDesign/ResponseSurface_6dim_2deg.mat
The .mat file contains 2 variables:
- response_surface - structure containing information about the response surface (coefficient vector is in response_surface.beta)
- model - matrix containing the polynomial powers of each variable (variable order given in the section above, #A Note on Coefficient and Variable Order; description of the polynomial powers matrix given in the section above, #Polynomial Powers Matrix)
A quadratic response surface, a full quadratic function of the 6 input parameters, was computed using Matlab's regstats command [1].
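For reference, a full quadratic in six variables can be written in the general form below. The symbols (y for the response, x_i for the inputs, beta for the coefficients) are notation assumed here for illustration and are not taken from the original page, and the term order in this formula is not the order of the coefficient vector.

```latex
y = \beta_0 + \sum_{i=1}^{6} \beta_i x_i + \sum_{i=1}^{6} \sum_{j=i}^{6} \beta_{ij} \, x_i x_j
```

This has 1 constant term, 6 linear terms, and 21 second-order terms (6 squares and 15 pairwise interactions), for a total of 28, which matches the number of terms reported in the summary for this surface.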
Because the response surface is six dimensions, graphical representation is difficult (see preceding section). However, the surface was visualized using the mean values of each of the 4 non-visualized dimensions. The two dimensions visualized were and .
The resulting polynomial coefficient vector is:
b(1) = 4.0870e+03
b(2) = -2.0956e+03
b(3) = -1.2574e+03
b(4) = -4.1912e+02
b(5) = -2.6527e-01
b(6) = 8.2956e-02
b(7) = -8.3864e+02
b(8) = 4.1912e+02
b(9) = 4.0102e-09
b(10) = 4.1912e+02
b(11) = 1.2271e-08
b(12) = 1.0050e-08
b(13) = 4.1912e+02
b(14) = 1.2039e-10
b(15) = 1.1920e-10
b(16) = 1.1952e-10
b(17) = 7.9500e-02
b(18) = 1.2627e-11
b(19) = 1.2676e-11
b(20) = 1.2491e-11
b(21) = 6.4480e-03
b(22) = -9.1954e-04
b(23) = 9.1895e-09
b(24) = 7.8094e-09
b(25) = 8.7553e-09
b(26) = 1.4867e-02
b(27) = 1.1544e-02
b(28) = 4.1922e+02
for the polynomial powers matrix:
0 0 0 0 0 0
0 0 0 0 0 1
0 0 0 0 1 0
0 0 0 1 0 0
0 0 1 0 0 0
0 1 0 0 0 0
1 0 0 0 0 0
0 0 0 0 0 2
0 0 0 0 1 1
0 0 0 0 2 0
0 0 0 1 0 1
0 0 0 1 1 0
0 0 0 2 0 0
0 0 1 0 0 1
0 0 1 0 1 0
0 0 1 1 0 0
0 0 2 0 0 0
0 1 0 0 0 1
0 1 0 0 1 0
0 1 0 1 0 0
0 1 1 0 0 0
0 2 0 0 0 0
1 0 0 0 0 1
1 0 0 0 1 0
1 0 0 1 0 0
1 0 1 0 0 0
1 1 0 0 0 0
2 0 0 0 0 0
The resulting response surface, holding all other parameters constant at their mean value, looks like:
Some key statistics for the response surface are given here:
---------------------------------------------------
Response surface summary of information:
Number of variables in response surface is 6.
Number of terms in polynomial is 28.
Degree of response surface is 2.
MSE = 0.03845480
MSE DoF = 17
L-inf norm resid = 0.34272386
R^2 = 0.86371957
adjusted R^2 = 0.64727417
---------------------------------------------------
Quadratic Surface, 2 Dimensions
Download the response surface here: http://files.charlesmartinreid.com/ExperimentalDesign/ResponseSurface_2dim_2deg.mat
The .mat file contains 2 variables:
- response_surface - structure containing information about the response surface (coefficient vector is in response_surface.beta)
- model - matrix containing the polynomial powers of each variable (variable order given in the section above, #A Note on Coefficient and Variable Order; description of the polynomial powers matrix given in the section above, #Polynomial Powers Matrix)
Regressing on only the two visualized dimensions (a polynomial of the same form, but lower in dimension) gives a polynomial coefficient vector of:
b(1) = 0.2019
b(2) = -0.1065
b(3) = 0.1115
b(4) = 0.0269
b(5) = -0.0145
b(6) = -0.0009
for the polynomial powers matrix:
0 0
0 1
1 0
0 2
1 1
2 0
It also results in the following response surface:
This surface has the following statistics:
---------------------------------------------------
Response surface summary of information:
Number of variables in response surface is 2.
Number of terms in polynomial is 6.
Degree of response surface is 2.
MSE = 0.00690353
MSE DoF = 39
L-inf norm resid = 0.13735696
R^2 = 0.93490530
adjusted R^2 = 0.92655983
---------------------------------------------------
Removing the 4 non-visualized dimensions changes the response surface statistics substantially: R^2 rises from 0.864 to 0.935, adjusted R^2 rises from 0.647 to 0.927, and the MSE falls from 0.0385 to 0.0069.
Also of note, the 2-dimensional surface predicts a response greater than 1, physically impossible for the response of interest (mass fractions). However, this is a constraint that is not incorporated into the regression procedure.
As polynomial degrees increase, this characteristic of the response surfaces (predicting impossible or non-physical responses) becomes more exaggerated.
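The reported statistics for the two quadratic surfaces can be cross-checked against each other. The Python sketch below (an illustration, not the author's Matlab code) recomputes adjusted R^2 from R^2. It assumes the usual definition for a model with a constant term, and it assumes the number of sample points equals the number of terms plus the MSE degrees of freedom (28 + 17 = 6 + 39 = 45). The function name is hypothetical.

```python
def adjusted_r2(r2, n_terms, error_dof):
    """Adjusted R^2 for a model with an intercept.

    n_samples is inferred as n_terms + error_dof (an assumption).
    """
    n_samples = n_terms + error_dof
    return 1.0 - (1.0 - r2) * (n_samples - 1) / error_dof


# Values taken from the two summaries above.
adj_6dim = adjusted_r2(0.86371957, n_terms=28, error_dof=17)
adj_2dim = adjusted_r2(0.93490530, n_terms=6, error_dof=39)
```

Both results agree with the reported adjusted R^2 values to within rounding. The comparison shows why the 6-dimensional fit is penalized so heavily: it spends 28 coefficients on the same number of samples, leaving only 17 degrees of freedom for the error estimate.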
Cubic Surface, 6 Dimensions: Trouble in Paradise
Download the response surface here: http://files.charlesmartinreid.com/ExperimentalDesign/ResponseSurface_6dim_3deg.mat
The .mat file contains 2 variables:
- response_surface - structure containing information about the response surface (coefficient vector is in response_surface.beta)
- model - matrix containing the polynomial powers of each variable (variable order given in the section above, #A Note on Coefficient and Variable Order; description of the polynomial powers matrix given in the section above, #Polynomial Powers Matrix)
A 6-dimensional cubic response surface has 84 coefficients, far more than the number of sample points obtained with a composite design. However, keeping only a few of the 3rd order terms and eliminating the rest significantly reduces the number of coefficients.
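The coefficient counts quoted here follow from a standard counting formula: a full polynomial of degree M in N variables has C(N + M, M) terms. A short Python check (the function name is hypothetical):

```python
from math import comb


def n_full_terms(n_vars, degree):
    """Number of terms in a full polynomial of the given degree."""
    return comb(n_vars + degree, degree)


full_quadratic = n_full_terms(6, 2)   # all terms up to 2nd order
full_cubic = n_full_terms(6, 3)       # all terms up to 3rd order
reduced_cubic = full_quadratic + 9    # quadratic plus 9 selected cubic terms
```

The full cubic adds 56 third order terms to the quadratic model, so keeping only 9 of them is a large reduction.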
A cubic model was used that was the same as the quadratic models described above, but with the addition of 9 third order terms (the last 9 rows of the polynomial powers matrix below). The coefficient vector is:
b(1) = 351.8
b(2) = -1.065e-06
b(3) = -2.132e-07
b(4) = -4.034e-08
b(5) = -6.49
b(6) = -0.1842
b(7) = -1048
b(8) = 4.272e-07
b(9) = -4.456e-10
b(10) = 1.437e-07
b(11) = -1.3e-09
b(12) = -5.911e-10
b(13) = 9.099e-08
b(14) = -9.027e-12
b(15) = -2.151e-11
b(16) = -3.873e-12
b(17) = 6.377
b(18) = -3.172e-12
b(19) = -1.141e-12
b(20) = -3.025e-12
b(21) = -1.529
b(22) = 0.3625
b(23) = -1.38e-09
b(24) = -8.897e-10
b(25) = -1.015e-09
b(26) = 0.01404
b(27) = 0.00944
b(28) = 1048
b(29) = -349.3
b(30) = 0.001737
b(31) = -2.122
b(32) = -6.063e-08
b(33) = -3.193e-08
b(34) = -5.694e-08
b(35) = -0.003435
b(36) = 2.63
b(37) = -0.5728
where the order for the first 28 terms is the same as for the quadratic models above, and the remaining 9 terms follow the order of the last 9 rows of the matrix. This makes the polynomial powers matrix:
0 0 0 0 0 0
0 0 0 0 0 1
0 0 0 0 1 0
0 0 0 1 0 0
0 0 1 0 0 0
0 1 0 0 0 0
1 0 0 0 0 0
0 0 0 0 0 2
0 0 0 0 1 1
0 0 0 0 2 0
0 0 0 1 0 1
0 0 0 1 1 0
0 0 0 2 0 0
0 0 1 0 0 1
0 0 1 0 1 0
0 0 1 1 0 0
0 0 2 0 0 0
0 1 0 0 0 1
0 1 0 0 1 0
0 1 0 1 0 0
0 1 1 0 0 0
0 2 0 0 0 0
1 0 0 0 0 1
1 0 0 0 1 0
1 0 0 1 0 0
1 0 1 0 0 0
1 1 0 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0
0 3 0 0 0 0
0 0 3 0 0 0
0 0 0 3 0 0
0 0 0 0 3 0
0 0 0 0 0 3
1 1 1 0 0 0
0 1 2 0 0 0
0 2 1 0 0 0
This model presents an interesting problem. The 6-dimensional response surface that results, plotted in 2 dimensions (again using mean values for non-visualized dimensions), looks like this:
Note the range of the response: this is clearly a fishy response surface (maximum predicted is on the order of 1000???). However, looking at the statistics shows that the polynomial creates a perfect fit!
---------------------------------------------------
Response surface summary of information:
Number of variables in response surface is 6.
Number of terms in polynomial is 37.
Degree of response surface is varied, deg is a matrix.
Max degree = 3.
MSE = 0.00000000
MSE DoF = 8
L-inf norm resid = 0.00000000
R^2 = 1.00000000
adjusted R^2 = 1.00000000
---------------------------------------------------
At every experimental design sample point, the polynomial response prediction exactly matches the actual response, resulting in 0 error.
The knee-jerk reaction is that something must be wrong - the response surface is wrong, there was some mistake, the software should have come up with a more "reasonable" response surface to fit the sample points. However, regression (and the whole idea of using response surfaces to represent complex functions) is a double-edged sword: you can make the function evaluation much, much cheaper - but the price you pay is a significant loss of information.
One may suggest an alternative validation technique: create a low-dimensional response surface, which is easier to fit with a "reasonable" or "sensible" polynomial, and validate it. The feasible (validated) values for each of those dimensions are then used to create a second low-dimensional response surface, which is validated in turn. This yields a new feasible set, which can be combined with the old feasible set, and so on, until all dimensions have been covered and valid ranges for all input parameter values have been determined.
However, this approach is not equivalent, nor is it an improvement. When creating the low-dimensional response surface, one must select values for the other, ignored dimensions; these values are uncertain and are merely guesses. Changing the values of non-regressed variables will likely result in significant changes in the regression results (i.e. the response surface).
Box-Behnken Designs
The relationship between composite and Box-Behnken designs is that, for three factors, combining a face-centered (i.e. 3-level) composite design with a Box-Behnken design gives the full 3-level factorial design. In that sense, face-centered composite and Box-Behnken designs are both fractions of a full factorial design.
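For three factors, this relationship can be checked directly by building both designs as sets of coded points. The Python sketch below is an illustration, not the author's code. The function names are hypothetical, and the Box-Behnken construction used here is the pairwise one (two factors at -1 or +1, all others at 0, plus a center point).

```python
from itertools import combinations, product


def face_centered_composite(n_vars):
    """Corners, face centers (axial points at distance 1), and center point."""
    points = set(product((-1, 1), repeat=n_vars))
    for i in range(n_vars):
        for sign in (-1, 1):
            p = [0] * n_vars
            p[i] = sign
            points.add(tuple(p))
    points.add(tuple([0] * n_vars))
    return points


def box_behnken(n_vars):
    """Pairwise Box-Behnken points plus the center point."""
    points = {tuple([0] * n_vars)}
    for i, j in combinations(range(n_vars), 2):
        for a, b in product((-1, 1), repeat=2):
            p = [0] * n_vars
            p[i], p[j] = a, b
            points.add(tuple(p))
    return points


def three_level_full_factorial(n_vars):
    """All combinations of the coded levels -1, 0, +1."""
    return set(product((-1, 0, 1), repeat=n_vars))


union_3 = face_centered_composite(3) | box_behnken(3)
union_4 = face_centered_composite(4) | box_behnken(4)
```

With 3 factors the union contains all 27 points of the 3-level full factorial, and the two designs share only the center point. With 4 factors the same construction gives 49 of the 81 full factorial points, so the exact equivalence does not carry over to higher dimensions.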