Empirical Model-Building and Response Surfaces
Box, George; Draper, Norman (1987). Empirical Model-Building and Response Surfaces. Wiley and Sons. ISBN 0-471-81033-9.
Contents
- 1 Chapter 1: Introduction to Response Surface Methodology
- 2 Chapter 2: Use of Graduating Functions
- 3 Chapter 3: Least Squares for Response Surface Work
- 4 Chapter 4: Factorial Designs at 2 Levels
- 5 Chapter 5: Blocking and Fractionating Factorial Designs (skipping...)
- 6 Chapter 6: Use of Steepest Ascent for Process Improvement
- 7 Chapter 7: Fitting Second-Order Models
- 8 Chapter 8: Adequacy of Estimation and the Use of Transformation
- 9 Chapter 9: Exploration of Maxima and Ridge Systems with Second-Order Response Surfaces
- 10 Chapter 10: Occurrence and Elucidation of Ridge Systems I
- 11 Chapter 11: Occurrence and Elucidation of Ridge Systems II
- 11.1 Examples
- 11.2 Canonical Analysis to Characterize Ridge Phenomena
- 11.3 Example: Consecutive Chemical System with Near Stationary Planar Ridge
- 11.4 Example: Small reactor study yielding rising ridge surface
- 11.5 Example: Stationary ridge in five variables
- 11.6 Economic importance of dimensionality of maxima/minima
- 11.7 Method for obtaining desirable combination of several responses
- 11.8 Appendix 11A: Calculations for ANOVA
- 11.9 Appendix 11B: Ridge analysis (alternative to canonical analysis)
- 12 Chapter 12: Links Between Empirical and Theoretical Models
- 13 Chapter 13: Design Aspects of Variance, Bias, and Lack of Fit
- 14 Chapter 14: Variance-Optimal Designs
- 15 Chapter 15: Practical Choice of a Response Surface Design
Chapter 1: Introduction to Response Surface Methodology
Questions when planning initial set of experiments:
1. Which input variables should be studied?
2. Should the input variables be examined in their original form, or should transformed input variables be employed?
3. How should response be measured?
4. At which levels of a given input variable should experiments be run?
5. How complex a model is necessary in a particular situation?
6. How shall we choose qualitative variables?
7. What experimental arrangement (experimental design) should be used?
Chapter 2: Use of Graduating Functions
Polynomial approximations:
- a polynomial of degree d can be thought of as a Taylor series expansion of the true underlying theoretical function y(x) truncated after terms of dth order
- the higher the degree d, the more closely the Taylor series can approximate the true function
- the smaller the region R over which y(x) is being approximated with the polynomial approximation, the better the approximation
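A minimal numpy sketch (mine, not from the book) illustrating both points, using exp(x) as a stand-in for an "unknown" y(x): the maximum approximation error falls as the degree d rises and as the region R shrinks.

```python
import numpy as np

# Illustration: approximate y(x) = exp(x) by least-squares polynomials of degree d
# over symmetric regions R = [-h, h] of different half-widths h.
def max_approx_error(degree, half_width, n_pts=200):
    x = np.linspace(-half_width, half_width, n_pts)
    y = np.exp(x)                          # stand-in for the true underlying function
    coeffs = np.polyfit(x, y, degree)      # least-squares polynomial fit
    return np.max(np.abs(np.polyval(coeffs, x) - y))

for d in (1, 2, 3):
    for h in (2.0, 1.0, 0.5):
        print(f"degree {d}, half-width {h}: max error {max_approx_error(d, h):.2e}")
```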
Issues with application of polynomial approximations:
- least squares - how does it work? what are its assumptions?
- standard errors of coefficients - how to estimate the standard deviations of the linear coefficients?
- adequacy of fit - approximating an unknown theoretical function empirically; need to be able to check whether a given degree of approximation is adequate; how can analysis of variance (ANOVA) and examination of residuals (observed - fitted values) help to check adequacy of fit?
- designs - what designs are suitable for fitting polynomials of first and second degrees? (Ch. 4, 5, 15, 13)
- transformations - how can one find transformations (generally)?
Chapter 3: Least Squares for Response Surface Work
Method of Least Squares
Least squares helps you to understand a model of the form:
y = f(x,t) + e
where:
E(y) = eta = f(x,t)
is the mean level of the response y which is affected by k variables (x1, x2, ..., xk) = x
It also involves p parameters (t1, t2, ..., tp) = t
e is experimental error
To examine this model, experiments would run at n different sets of conditions, x1, x2, ..., xn
would then observe corresponding values of response y1, y2, ..., yn
Two important questions:
1. does postulated model accurately represent the data?
2. if model does accurately represent data, what are best estimates of parameters t?
start with second question first
Given: the function f(x, t) evaluated at each experimental run
n discrepancies: y_u - f(x_u, t), u = 1, ..., n
Method of least squares selects the value of t that makes the sum of squares smallest:
S(t) = sum_{u=1}^{n} [ y_u - f(x_u, t) ]^2
the minimizing choice of t is denoted t-hat, the least-squares estimate
are least-squares estimates of t good?
their goodness depends on the nature of the distribution of their errors
least-squares estimates are appropriate if you can assume that experimental errors:
are statistically independent, have constant variance, and are normally distributed
these are "standard assumptions"
Linear models
this is a limiting case, where f(x, t) is linear in the parameters: eta = t_1 z_1 + t_2 z_2 + ... + t_p z_p, with the z's known functions of x
adding experimental error: y = t_1 z_1 + t_2 z_2 + ... + t_p z_p + e
a model of this form is linear in the parameters
Algorithm
Formulate a problem with n observed responses, p parameters...
this yields n equations of the form
y_1 = t_1 z_{11} + t_2 z_{21} + ... + t_p z_{p1} + e_1
y_2 = t_1 z_{12} + t_2 z_{22} + ... + t_p z_{p2} + e_2
etc. (in general, y_u = t_1 z_{1u} + t_2 z_{2u} + ... + t_p z_{pu} + e_u)
This can be written in matrix form: y = Z t + e
and the dimensions of each matrix are:
- y = n x 1
- Z = n x p
- t = p x 1
- epsilon = n x 1
the sum of squares function is given by:
S(t) = (y - Zt)'(y - Zt)
or, S(t) = sum_{u=1}^{n} e_u^2
this can be rewritten as:
S(t) = y'y - 2 t'Z'y + t'Z'Z t
Rank of Z
If there are relationships between the different input parameters (z's), then the matrix Z can become singular
e.g. if there is a relationship z2 = c z1, then t1 z1 + t2 z2 = (t1 + c t2) z1, so you can only estimate the linear combination t1 + c t2
reason: when z2 = c z1, changes in z1 can't be distinguished from changes in z2
Z (an n x p matrix) is said to be full rank p if there are no linear relationships of the form:
a_1 z_1 + a_2 z_2 + ... + a_p z_p = 0
if there are q > 0 independent linear relationships, then Z has rank p - q
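A minimal numpy sketch (illustrative data, not the book's) of the matrix formulation and the rank issue: solve the normal equations Z'Z t-hat = Z'y, then show that a relationship z3 = c z2 drops the rank of Z below p.

```python
import numpy as np

# Least squares in matrix form: y = Z t + eps, Z is n x p.
rng = np.random.default_rng(0)
n, p = 8, 3
Z = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])  # full rank p
t_true = np.array([2.0, 1.0, -0.5])
y = Z @ t_true + 0.1 * rng.normal(size=n)

t_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)    # solves the normal equations Z'Z t = Z'y
S_min = (y - Z @ t_hat) @ (y - Z @ t_hat)    # minimized sum of squares S(t_hat)
print("t_hat =", t_hat, " S(t_hat) =", round(S_min, 4))

# Rank deficiency: if z3 = c * z2, only the combination t2 + c*t3 is estimable,
# and Z'Z is singular; the rank of Z falls to p - 1.
c = 2.0
Z_bad = np.column_stack([Z[:, 0], Z[:, 1], c * Z[:, 1]])
print("rank of Z_bad:", np.linalg.matrix_rank(Z_bad))
```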
Analysis of Variance: 1 regressor
Assume the simple model y_u = t + e_u
This states that y is varying about an unknown mean t
Suppose we have 3 observations of y
Then the model can be written, run by run, as y_1 = t + e_1, y_2 = t + e_2, y_3 = t + e_3
so that, in matrix form,
[ 4 ]   [ 1 ]       [ e_1 ]
[ 1 ] = [ 1 ] t  +  [ e_2 ]
[ 1 ]   [ 1 ]       [ e_3 ]
Suppose the model posited a particular value of the parameter t, e.g. t = 0.5
Then you could check the null hypothesis H0: t = 0.5
If true, the mean observation vector is given by E(y) = 1 (0.5),
or,
[ 0.5 ]   [ 1 ]
[ 0.5 ] = [ 1 ] 0.5
[ 0.5 ]   [ 1 ]
and the appropriate "observation breakdown" (whatever that means?) is:
Associated with this observation breakdown is an analysis of variance table:
Source | Degrees of freedom (df) | Sum of squares (squared length), SS | Mean square, MS | Expected value of mean square, E(MS) |
---|---|---|---|---|
Model | 1 | 6.75 | 6.75 | |
Residual | 2 | 3.00 | 1.50 | |
Total | 3 | 9.75 | | |
sum of squares: squared lengths of vectors
degrees of freedom: number of dimensions in which vector can move (geometric interpretation)
the model says that, whatever the data are, the systematic part of y must lie in the direction of the vector 1 = (1, 1, 1)', which gives only one degree of freedom.
Whatever the data, the residual vector must be perpendicular to 1 (that is what least squares enforces), and so it can move in 2 directions and has 2 degrees of freedom
Now, looking at the null hypothesis:
the model component is a measure of the discrepancy between the POSTULATED model (t = 0.5) and the ESTIMATED model (t = t-hat)
Making the "standard assumptions" (earlier), the expected value of this sum of squares, assuming the null model is true, is sigma^2
For the residual component it is 2 sigma^2 (or, in general, nu sigma^2, where nu is the number of degrees of freedom of the residuals)
Thus a measure of discrepancy from the null hypothesis is the ratio F = (model mean square) / (residual mean square)
if the null hypothesis were true, then the top and bottom would both estimate the same sigma^2
So if F is different from 1, that indicates departure from null hypothesis
The MORE F differs from 1, the more doubtful the null hypothesis becomes
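A quick sketch of this single-regressor F test (made-up observations and scipy for the F distribution; not the book's numbers):

```python
import numpy as np
from scipy import stats

# Model y_u = t + eps_u, null hypothesis t = t0.
y = np.array([2.1, 1.2, 1.8])
t0 = 0.5
t_hat = y.mean()

n = len(y)
ss_model = n * (t_hat - t0) ** 2           # discrepancy of estimated from postulated model, 1 df
ss_resid = np.sum((y - t_hat) ** 2)        # residual, n - 1 df
F = (ss_model / 1) / (ss_resid / (n - 1))  # both mean squares estimate sigma^2 under H0
p_value = stats.f.sf(F, 1, n - 1)
print(f"F = {F:.2f}, p = {p_value:.3f}")
```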
Least squares: 2 regressors
The previous model, y = t + e, said y was represented by a mean plus an error.
Instead, suppose that there are systematic deviations from the mean, associated with an external variable (e.g. humidity in the lab).
Now the equation is that of a straight line:
y = t_1 z_1 + t_2 z_2 + e, with z_1 = 1 (a column of ones) and z_2 the external variable
So now the revised least-squares model is:
- i.e. the fitted vector y-hat lies in the plane defined by linear combinations of the vectors z_1 and z_2
because z_1'z_2 is not zero, these two vectors are NOT at right angles
the least-squares values t-hat_1, t-hat_2 produce the fitted vector y-hat = t-hat_1 z_1 + t-hat_2 z_2
these least-squares values make the squared length of the residual vector y - y-hat as small as possible
The normal equations express the fact that the residual vector must be perpendicular to both z_1 and z_2:
z_1'(y - y-hat) = 0 and z_2'(y - y-hat) = 0
also written (in matrix form) as Z'(y - Z t-hat) = 0, i.e. Z'Z t-hat = Z'y
Now suppose a null hypothesis specifying values of both t_1 and t_2 is investigated
Then the mean observation vector under that hypothesis can be written down as before
Source | Degrees of freedom | SS | MS | F |
---|---|---|---|---|
Model (t_1 and t_2) | 2 | 3.345 | 2.23 | |
Residual | 1 | 1.50 | | |
Total | 3 | | | |
and so
Orthogonalizing second regressor
In the above example, z_1 and z_2 are not orthogonal
One can replace z_2 by a component z_{2.1} that IS orthogonal to z_1
To do this, use the least squares property that the residual vector is orthogonal to the space in which the predictor variables lie
Regard z_2 as the "response" vector and z_1 as the predictor variable
You then obtain the fitted part c-hat z_1, with c-hat = z_1'z_2 / z_1'z_1
so the residual vector is z_{2.1} = z_2 - c-hat z_1, which is orthogonal to z_1
now the model can be rewritten in terms of z_1 and z_{2.1}
This gives three least-squares equations:
1. 2. 3.
The analysis of variance becomes:
Source | df | SS |
---|---|---|
Response function with z_1 only | 1 | |
Extra due to z_2 (given z_1) | 1 | |
Residual | 1 | |
Total | 3 | |
Generalization to p regressors
With n observations and p parameters:
the n relations implicit in the response function can be written y = Z t + e
Assuming Z is of full rank, let t-hat be the vector of estimates given by the normal equations Z'Z t-hat = Z'y
The sum of squares function is S(t) = (y - Zt)'(y - Zt) = (y - Z t-hat)'(y - Z t-hat) + (t - t-hat)'Z'Z(t - t-hat)
because the cross-product term is zero from the normal equations
Furthermore, because Z'Z is positive definite, S(t) is minimized when t = t-hat
So the solution to the normal equations producing the least squares estimate is t-hat = (Z'Z)^{-1} Z'y:
Source | df | SS |
---|---|---|
Response function | p | t-hat'Z'y |
Residual | n - p | y'y - t-hat'Z'y |
Total | n | y'y |
Bias in Least-Squares Estimators if Inadequate Model
Say the data were being fit with the model E(y) = Z_1 t_1,
but the true model that should have been used is E(y) = Z_1 t_1 + Z_2 t_2
t_1 would be estimated by t-hat_1 = (Z_1'Z_1)^{-1} Z_1'y
but under the true model, E(t-hat_1) = t_1 + A t_2, where A = (Z_1'Z_1)^{-1} Z_1'Z_2
The matrix A is the bias or alias matrix
Unless A = 0, t-hat_1 will represent t1 AND t2, not just t1
A = 0 when Z_1'Z_2 = 0, which happens if the regressors in Z1 are orthogonal to the regressors in Z2
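A tiny numpy sketch of the alias matrix (illustrative one-factor design, not from the book): fit an intercept-plus-linear model when the truth also contains a quadratic term, and compute A = (Z1'Z1)^{-1} Z1'Z2.

```python
import numpy as np

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
Z1 = np.column_stack([np.ones_like(x), x])       # fitted: intercept + linear term
Z2 = (x ** 2).reshape(-1, 1)                     # omitted: quadratic term
A = np.linalg.solve(Z1.T @ Z1, Z1.T @ Z2)
print(A)   # first row nonzero: the intercept estimate is biased by the quadratic term;
           # the linear coefficient is not, because the design is symmetric (odd moment zero)
```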
Confidence Intervals
(examples given for orthogonal and non-orthogonal design... looks interesting but didn't understand it fully)
Chapter 4: Factorial Designs at 2 Levels
I think this approach has a problem... Can only lead to LINEAR models. Chapter 7 begins to deal with 2nd order models.
However, I'm not completely screwed. Composite designs: Chapter 9 details central composite designs, which consist of factorial designs for first-order effects, plus more points to determine higher-order terms.
Brief explanation of 2-level factorial designs
Designation of lower/upper level with -1/+1
Analysis of Factorial Design
Main effect of a given variable, as defined by Yates (1937), is the average difference in the level of response as one moves from low to high level of that variable
Example: the effect of variable 1 is estimated by the average response at the +1 level of variable 1 minus the average response at its -1 level,
and similarly for variables 2 and 3 (a small computational sketch follows below)
Factorial designs also make calculation of interactions possible... i.e. is effect of 1 different at the two different levels of 3?
Example given of calculating multiple interactions...
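A short numpy sketch of these calculations for a 2^3 factorial (made-up responses, not the book's textile data):

```python
import numpy as np
from itertools import product

design = np.array(list(product([-1, 1], repeat=3)))   # 8 runs; columns are x1, x2, x3
y = np.array([60.0, 72.0, 54.0, 68.0, 52.0, 83.0, 45.0, 80.0])

def main_effect(col):
    # Yates definition: average response at the +1 level minus average at the -1 level
    return y[design[:, col] == 1].mean() - y[design[:, col] == -1].mean()

def interaction(col_a, col_b):
    sign = design[:, col_a] * design[:, col_b]
    return y[sign == 1].mean() - y[sign == -1].mean()

print("main effects:", [round(main_effect(i), 2) for i in range(3)])
print("12 interaction:", round(interaction(0, 1), 2))
```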
Variance, Standard Errors
For the complete design, if each observation has variance sigma^2, the variance of an estimated effect is 4 sigma^2 / N (an effect is the difference of two averages of N/2 runs each)
or, if there are r replicates, the denominator becomes rN
In practice, we still need an estimate of the experimental error variance
Suppose we're given an estimate s^2 of sigma^2; then the estimated variance of an effect is 4 s^2 / N
and the corresponding standard errors are the square roots of these estimated variances:
so the effects of each variable, with their standard errors, are:
Variable I:
Variable 1:
Variable 2:
Variable 3:
Variable 12:
etc...
Regression Coefficients
If you fit a first degree polynomial to textile data, you can obtain:
The estimated regression coefficients and their errors are half of the main effects and their standard errors
Factor of one half comes from the definition of an effect: the difference in response on moving from the -1 level to the +1 level of a given variable x_i, so it corresponds to the change in y when x_i changes by 2 units
A regression coefficient is the change in y when x_i is changed by 1 unit
Dye example
Example of factorial design
Analysis of the results shows that the data are adequately explained in terms of linear effects in only three of the six variables
Linear equations in those three variables were fitted by least squares to all 64 data points:
The fitted equations are not functions of the remaining variables, but this does NOT mean that a value of 0 has been substituted for the nonsignificant coefficients
A value of 0 would not be the best estimate
Regard the fitted equations as best estimates in the three-dimensional subspace of the full six-dimensional space in which the remaining variables are held at their average values
Next, obtain an estimate of the error standard deviation from the residual sum of squares:
Variable 1
Source of variation | SS | df | MS | F ratio |
---|---|---|---|---|
Total SS | 8,443.41 | 64 | | |
Correction factor (SS due to the mean) | 7,918.77 | 1 | | |
Corrected total SS | 524.63 | 63 | | |
Due to first retained variable | 48.825 | 1 | 48.825 | 13.47 |
Due to second retained variable | 142.50 | 1 | 142.50 | 39.32 |
Due to third retained variable | 115.83 | 1 | 115.83 | 31.96 |
Residual | 217.47 | 60 | 3.624 | |
etc... this table also exists for variable 4 and variable 6
Potential bias in standard deviation estimates:
- biased upward because of several small main effects and interactions that are being ignored
- biased downward because of effect of selection (only large estimates taken to be real effects)
these s values were used to estimate the standard errors of the coefficients shown in parentheses beneath the coefficients
Diagnostic Checking of Fitted Models
Plots of residuals vs. NSCORES (normal scores)
Plots of residuals vs. time order
Don't understand exactly what they're getting from these plots... (NSCORES are expected normal order statistics, so plotting residuals against them gives a normal probability plot; "Time Order" is just the run sequence, which can reveal drift or serial correlation)
Response Surface Analysis
Application of model to manufacturing: want to restrict one variable to 20, another variable to 26... and maximize strength
Response surface: looking at three-dimensional cube... these two constraints create two planes
The two planes intersect and create a line PQ, and along this line the strength varies from 11.08 (Q) to 12.46 (P)
Estimated difference in strengths at P and Q given by:
and the variance is given by:
where , and is the squared distance between the points P and Q in the scale of the x's
So the standard deviation of the estimated difference is the square root of this variance
substituting the estimate s^2 for sigma^2 gives a standard error
For variable 1, the standard error is 0.53
This means that the difference is 2.6 times larger than the standard error, meaning we can be confident the strength is in fact higher at P than at Q
Appendix 4A: Yates' Method for Obtaining Factorial Effects
This has got to be the worst description of a mathematical technique, ever.
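For reference, here is a sketch of Yates' algorithm as it is commonly stated elsewhere (my reconstruction, not the appendix's wording): with the 2^k responses listed in standard (Yates) order, apply k passes of pairwise sums followed by pairwise differences, then scale.

```python
import numpy as np

def yates(y):
    """Yates' algorithm for a 2^k factorial; y must be in standard (Yates) order."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    k = int(np.log2(n))
    col = y.copy()
    for _ in range(k):
        pairs = col.reshape(-1, 2)
        col = np.concatenate([pairs.sum(axis=1), pairs[:, 1] - pairs[:, 0]])
    effects = col / (n / 2)          # contrasts divided by 2^(k-1) give the effects
    effects[0] = col[0] / n          # first entry is the grand average
    return effects                   # order for k = 3: mean, 1, 2, 12, 3, 13, 23, 123

# illustrative responses in standard order (1), a, b, ab, c, ac, bc, abc
print(np.round(yates([28, 36, 18, 31, 27, 38, 19, 30]), 3))
```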
Chapter 5: Blocking and Fractionating Factorial Designs (skipping...)
Chapter 6: Use of Steepest Ascent for Process Improvement
Expensive and impractical to explore entire operability region (i.e. entire region in which the system could be operated)
But this should not be the objective
Instead, explore subregion of interest
For new/poorly understood systems, need to apply a preliminary procedure to find these subregions of interest where a particular model form (e.g. 2nd order polynomial) will apply
One method: one factor at a time method
Alternative method: steepest ascent method (Box: this is more effective, economical)
Steepest Ascent Method
Example: chemical system whose yield depends on time, temperature, concentration
Early stage of investigation: planar contours of first-degree equation can be expected to provide fair approximation in immediate region of point P far from optimum
Direction at right angles to contour planes is in direction of steepest ascent, if pointing toward higher yield values
Exploratory runs performed along path of steepest ascent
Best point found, or interpolated estimated maximum point on path, could be made base for new first-order design, from which further advance might be possible
After one or two applications of steepest ascent, first-order effects will no longer dominate, first order approximation will be inadequate
Second order methods (Chapter 7, Chapter 9) will then have to be applied
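A minimal sketch of how the path of steepest ascent is laid out from a fitted first-order model (illustrative coefficients, not the chemical example's):

```python
import numpy as np

# After fitting y_hat = b0 + b1*x1 + b2*x2 + b3*x3 in coded units, the path of
# steepest ascent runs from the design center in the direction of (b1, b2, b3),
# i.e. perpendicular to the planar contours, toward higher predicted yield.
b = np.array([2.4, -1.1, 0.6])                 # fitted first-order coefficients
direction = b / np.linalg.norm(b)              # unit direction of steepest ascent
for step in (0.5, 1.0, 1.5, 2.0):              # candidate exploratory runs along the path
    print("coded settings:", np.round(step * direction, 3))
```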
Chapter 7: Fitting Second-Order Models
Chapter 8: Adequacy of Estimation and the Use of Transformation
Chapter 9: Exploration of Maxima and Ridge Systems with Second-Order Response Surfaces
At first glance this chapter just appears to be a re-hash of the earlier chapter on ridge systems and optimization.
However, on second glance, section 2 discusses a composite design used to construct second-order response surface for a polymer elasticity.
9.2 Example: Polymer Elasticity
Illustrating example to elucidate nature of maximal region for polymer elasticity experiment
Central Composite Design
The design employed was second order central composite design
Such design consists of two-level factorial (or fractional factorial), chosen to allow estimation of all first-order and two-factor interaction terms
This is augmented with additional points to estimate pure quadratic effects
These designs are discussed in more detail in Chapter 15
Using standard factorial coding, the 3 variable values are converted to -1/+1:
First, determine the low and high levels of each variable
Next, determine the midlevel: (high + low) / 2
And last, the semirange: (high - low) / 2
So that the "standard factorial coding" is: coded value = (natural value - midlevel) / semirange
First set of runs: factorial design, coded factorial variable values were -1 and +1
Second set of runs: three-dimensional "star", coded factorial variable values were -2, 0, and +2
Block difference: 1+ week between first and second set of runs, allowing much time for systematic differences
Experiment was run in two blocks of eight runs (Chapter 5 terminology)
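A sketch of the 16-run design described above (k = 3; star distance and block split follow the text, run order and exact layout are illustrative):

```python
import numpy as np
from itertools import product

cube = np.array(list(product([-1, 1], repeat=3)), dtype=float)   # block 1: 8 factorial runs
star = np.vstack([2 * np.eye(3), -2 * np.eye(3)])                # 6 axial points at +/-2
centers = np.zeros((2, 3))                                       # 2 center points
block2 = np.vstack([star, centers])                              # block 2: 8 runs
design = np.vstack([cube, block2])
block = np.array([-1] * 8 + [1] * 8)                             # blocking variable x_B
print(design.shape, "runs; block column vs. x columns:", design.T @ block)  # all zeros
```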
Estimation/Elimination of Block Differences
If all 16 runs had been performed under the conditions of the first block:
the constant term in the second degree polynomial could be written beta_0 - delta
If all 16 runs had been performed under the conditions of the second block:
the constant term in the second degree polynomial could be written beta_0 + delta
The true mean difference between the blocks is then 2 delta
This makes the model: y = (second degree polynomial in the x's) + delta x_B + error,
where the variable x_B is a blocking (indicator) variable
This variable is -1 for the first block, +1 for the second block
Next (this is where he loses me): the blocking (indicator) variable x_B is orthogonal to each column in the model matrix Z
Then it follows from Section 3.9 that, because of this orthogonality, the block term can be estimated independently of the other coefficients
So you can separate out the "blocks" contribution from the residual sum of squares
This "blocks" contribution, with 1 degree of freedom, is given by the squared block contrast divided by 16: (sum of block-2 responses - sum of block-1 responses)^2 / 16
And then references a page that doesn't seem to talk about anything related (p. 513)........
He then presents this table:
Source | SS | degrees of freedom | MS |
---|---|---|---|
Blocks | 26.6 | 1 | 26.6 (F = 8.9) |
Residual after removal of blocks | 15.0 | 5 | 3.0 |
Residual before removal of blocks | 41.6 | 6 | |
Because blocking is orthogonal, it does not change the estimated coefficients in the model
But the portion of the original residual sum of squares accounted for by the systematic block difference is removed, and that increases the experimental accuracy
It is possible to analyze this residual variance further and isolate measures of adequacy of fit
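A small sketch of extracting the 1-df blocks sum of squares (made-up residuals, not the polymer data), using the single-contrast formula above:

```python
import numpy as np

# SS_blocks = (block contrast)^2 / N, with the +/-1 block indicator as the contrast.
resid = np.array([ 1.2, -0.8,  2.1, -1.5,  0.4, -0.9,  1.1, -0.6,   # block 1
                  -2.0,  1.4, -1.8,  2.2, -0.7,  1.6, -1.3,  0.9])  # block 2
x_B = np.array([-1] * 8 + [1] * 8)
ss_before = resid @ resid                        # residual SS before removing blocks
ss_blocks = (x_B @ resid) ** 2 / len(resid)      # 1 degree of freedom
ss_after = ss_before - ss_blocks                 # residual SS after removing blocks
print(round(ss_blocks, 2), round(ss_after, 2))
```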
Importance of Blocking
Blocking is important!!!
Once the factorial design is completed, there are a couple of different ways to proceed
Had first-order effects been large compared with their standard errors, and large compared with estimated interaction terms, then application of steepest ascent would be appropriate (no second-order model)
Using steepest ascent would lead to a maximum in (likely) a different location
Sequential Assembly of Designs
Second part of design added with knowledge that the second degree polynomial equation could now be estimated
A change in level could have occurred between two blocks of runs
Possibility of sequential assembly of different kinds of designs: discussed in more detail in Section 15.3
Examination of Fitted Surface
Location of maximum of fitted surface
Investigation of Adequacy of Fit
Isolating residual degrees of freedom for composite design
Before accepting fitted second degree equation, one must consider lack of fit
N observations fitted to linear model with p parameters:
Fitting process itself will introduce p linear relationships among the N residuals
If N is large with respect to p:
- the effect of induced dependence among residuals will be slight
- Plotting techniques employed to examine residuals useful in revealing inadequacies in model
As p becomes larger, and as it approaches N:
- patterns caused by induced dependencies become dominant, and can mask those due to model inadequacies
Section 7.4: Using a factorial design, it is possible to obtain information on adequacy of fit by isolating and identifying individual residual degrees of freedom associated with feared model inadequacies
For fitting a polynomial of degree n, it is important to consider possibility that polynomial of higher degree is needed
This focuses attention on the characteristics of the estimates when the feared alternative model applies, but the simpler assumed model has been fitted
Contemplation of fitted model embedded in more complex one makes it possible to answer two questions:
1. to what extent are original estimates of coefficients biased if the more complex model is true?
2. what are appropriate checking functions to warn of the possible need for a more complex model?
both questions critically affected by choice of design
Thoughts: need for adequate surrogate models is desperate... If we don't have good surrogate models, all of the hard computational work goes to waste.
The investigation of model fitness, experimental design, and statistical analysis of the results is just as important as development of the model itself.
ESPECIALLY in the case of moving toward predictive science, questions (1) and (2) above are CRITICAL!
Bias Characteristics of the Design
Can write extended third-order polynomial model in form:
y = Z1 beta1 + Z2 beta2 + epsilon
Or in orthogonalized form,
y = Z1 ( beta1 + A beta2 ) + ( Z2 - Z1 A ) beta2 + epsilon
where Z1 beta1 includes all terms up to and including second order, and Z2 beta2 has all terms of third order
Alias or bias matrix A:
It shows that only the estimates of first order terms are biased by third-order terms
If a third-order model is appropriate, and if b1 b2 and b3 are previous least-squares estimates, then
E(b1) = beta1 + 2.5 beta111 + 0.5 beta122 + 0.5 beta133
E(b2) = beta2 + 2.5 beta222 + 0.5 beta112 + 0.5 beta233
E(b3) = beta3 + 2.5 beta333 + 0.5 beta113 + 0.5 beta223
Checking Functions for the Design
Examination of matrix Z2 - Z1 A reveals rather remarkable circumstance
of the 10 columns, only 4 are independent; these 4 are simple multiples of 4 columns of Z2*
(some analysis I don't quite follow...)
Thus, although we cannot obtain estimates of each of the third-order effects individually, using this design we can isolate certain linear combinations of them (certain alias groups)
size of these combinations can indicate particular directions in which there may be lack of fit
example: if these linear combinations of the observations (the alias-group contrasts, I think?) were excessively large, it could indicate that a transformation of one of the predictor variables might be needed to obtain an adequate representation using a second degree equation
Transformation aspect: discussed in more detail in Section 13.8
Complete Breakup of Residual Sum of Squares
16 runs in composite design used
10 degrees of freedom = estimation of second degree polynomial
1 degree of freedom = blocking
1 degree of freedom = pure error comparison in which two center points are compared
4 degrees of freedom remain
- can be associated with possible lack of fit from neglected third order terms
- alternatively, with need for transformation variables
Source | SS | degrees of freedom | MS |
---|---|---|---|
Blocks | 26.6 | 1 | 26.6 |
111 | 7.2 | 1 | 7.2 |
222 | 2.6 | 1 | 2.6 |
333 | 1.6 | 1 | 1.6 |
123 | 1.2 | 1 | 1.2 |
Pure error | 2.3 | 1 | 2.3 |
Residual sum of squares | 41.6 | 6 | |
Since none of the mean squares is excessively large compared with the others, and they do not contradict the earlier estimate of the error variance, there is no reason to suspect lack of fit
Chapter 10: Occurrence and Elucidation of Ridge Systems I
The occurrence of unusual ridge-shaped systems can be understood by noting that factors like temperature, time, pressure, concentration, etc. are regarded as "natural" variables only because they can be conveniently manipulated and measured
Individual fundamental variables (e.g. collision of two types of molecules) often a function of multiple variables
This is why you may see multiple min/max or optimal levels of a fundamental variable
Example: measuring an observable that is a function of voltage, but all you can measure is current and resistance (presuming Ohm's law existence unknown)
This leads to a ridge system, where (along the ridge) the voltage is maximal
Elucidation of Stationary Regions/Maxima/Minima by Canonical Analysis
Canonical analysis: writing second degree equation in form in which it can be more readily understood
involves elimination of all cross-product terms
Examples
(several examples and forms of canonical analysis given)
Appendix 10A: Simple explanation of canonical analysis
(Geometrical explanation of canonical analysis)
Chapter 11: Occurrence and Elucidation of Ridge Systems II
One of the most important uses of response surface techniques: detection, description, exploitation of ridge systems
Examples
Stationary ridge
Rising ridge
Canonical Analysis to Characterize Ridge Phenomena
Example: Consecutive Chemical System with Near Stationary Planar Ridge
(example given: Box and Youle, 1955)
Chemical system
Transformation of variables
Canonical analysis
Direct fitting of canonical form
Exploiting canonical form
Example: Small reactor study yielding rising ridge surface
Example: Stationary ridge in five variables
Economic importance of dimensionality of maxima/minima
Method for obtaining desirable combination of several responses
Appendix 11A: Calculations for ANOVA
Appendix 11B: Ridge analysis (alternative to canonical analysis)
Chapter 12: Links Between Empirical and Theoretical Models
Chapter 13: Design Aspects of Variance, Bias, and Lack of Fit
a response y is measured, with a mean value eta = E(y), believed to depend on a set of variables x_1, ..., x_k
The exact functional relationship eta = f(x_1, ..., x_k)
is usually unknown, and perhaps unknowable
Flight of a bird, fall of a leaf, flow of water through a valve: exact equations for such phenomena are out of reach, yet we can still expect to approximate the main features of the relationship
This book: employ crude polynomial approximations, exploiting local smoothness properties
Adequate LOCALLY (flight of bird can be approximated by straight line function of time for short times, maybe quadratic function at long times)
Low order terms of Taylor series approximation can be used over region of interest
This lies within larger region of operability
if g is the polynomial approximation to eta over the region of interest,
"The fact that the polynomial is an approximation does not necessarily detract from its usefulness because all models are approximations. Essentially, all models are wrong, but some are useful. However, the approximate nature of the model must always be borne in mind."
Suppose the following:
epsilon is the vector of random errors, with zero vector mean
(where the n different y_u are n observations of the response); then the true model is not
y = g + epsilon
but is actually y = g + delta + epsilon,
where delta = eta - g
is a vector of discrepancies that should be small over the region of interest
There are TWO types of errors that must be taken into account:
1. Systematic, or bias, errors delta = eta - g: the difference between the expected value of the response and its approximating function
2. Random errors
Systematic errors are always to be expected
Since the time of Gauss they have largely been ignored, and most attention has been focused on random error
(Nice mathematical results are possible when this is done)
In choosing a design, ignoring systematic error is not an innocuous approximation, and may lead to misleading results
Competing effects of bias and variance
Example: an unknown function eta(xi) on an interval, looking at a plot of eta vs. xi
Region of interest: an interval of xi values
Approximating eta using a straight line: y-hat = b_0 + b_1 x
And the errors in the observations have variance sigma^2
Next step is to apply the coding transformation x = (xi - center of interval) / (half-width of interval)
to convert the interval of interest into the interval -1 <= x <= +1
Now suppose least squares is used to find b_0 and b_1
Mean squared error calculation:
the MSE in estimating eta(x) with y-hat(x) is, for N design points and error variance sigma^2, E[ y-hat(x) - eta(x) ]^2
Symbolically: N E[ y-hat(x) - eta(x) ]^2 / sigma^2 = N V[ y-hat(x) ] / sigma^2 + N [ E y-hat(x) - eta(x) ]^2 / sigma^2
i.e., the standardized mean squared error is equal to variance plus squared bias at x
Integrated Mean Squared Error
Can integrate the variance and squared bias over the region of interest R:
and if the integrated (standardized) mean squared error is denoted M, then M = V + B,
where V is the integrated variance contribution and B the integrated squared-bias contribution
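A numerical sketch of this V + B decomposition for the 1-D case (my illustration, assuming a quadratic truth eta = beta_0 + beta_1 x + beta_11 x^2, a fitted straight line, and R = [-1, 1]):

```python
import numpy as np

def integrated_V_and_B(design_x, beta11=1.0, sigma2=1.0, grid=np.linspace(-1, 1, 2001)):
    N = len(design_x)
    Z = np.column_stack([np.ones(N), design_x])            # first-order model matrix
    H = np.linalg.inv(Z.T @ Z)
    Zx = np.column_stack([np.ones_like(grid), grid])
    var = sigma2 * np.einsum('ij,jk,ik->i', Zx, H, Zx)     # Var[y_hat(x)] over the grid
    # bias: E[y_hat(x)] - eta(x); the omitted x^2 term is aliased onto (1, x)
    A = np.linalg.solve(Z.T @ Z, Z.T @ (design_x ** 2))
    bias = (Zx @ A - grid ** 2) * beta11
    V = N / sigma2 * var.mean()                            # standardized, averaged over R
    B = N / sigma2 * (bias ** 2).mean()
    return V, B

for spread in (0.5, 0.8, 1.2):                             # spread trades variance against bias
    x = spread * np.array([-1.0, -0.5, 0.5, 1.0])
    V, B = integrated_V_and_B(x)
    print(f"spread {spread}: V = {V:.2f}, B = {B:.2f}, M = {V + B:.2f}")
```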
Regions of Interest, Regions of Operability for k Dimensions
When using standard designs:
Choice of approximation function and selected neighborhood are implicit as soon as experimenter decides on type of design, variables to investigate, levels of variables, and transformations to use
Example: chemist exploring effect of 2 types of catalyst; picks experimental design factors to be total catalyst weight (sum of 2 variables) and catalyst weight ratios (ratio of 2 variables)
other scientists might have selected different ranges for variables, or selected to use different transforms for design factors
Such differences would not necessarily have any adverse effect on the end result of the investigation. The iterative strategy we have proposed for the exploration of response surfaces is designed to be adaptive and self-correcting. For example, an inappropriate choice of scales or of a transformation can be corrected as the iteration proceeds. However, for a given experimental design, a change of scale can have a major influence on:
1. the variance of estimated effects
2. the sizes of systematic errors
Regions of Interest: Weight Functions
"Region of interest" discussion implies accuracy of prediction is uniform over interval R
Sometimes this is not the case
Might need high accuracy at some point P in predictor variable space, and can tolerate reduced accuracy away from P
e.g. think of Gaussian vs. top hat
Introduce a weight function w(x)
Minimize a weighted mean squared error integrated over the whole operability region O,
with the weight w(x) concentrated where accurate prediction is needed
Weight functions should also be normalized, so that the integral of w(x) over O equals 1
1-D Weight Function Example
Fitted equation: y-hat = b_0 + b_1 x
True model: eta = beta_0 + beta_1 x + beta_11 x^2
Suppose N runs are made at levels x_1, ..., x_N
Define the design moments (averages of powers of the x_u over the design)
it can be shown that the integrated mean squared error is
(big expression for M)
Want to minimize M
can choose the design to eliminate one term, and this can be done by making the design symmetric about the center (odd moments zero)
The only design characteristic that then remains is the second moment, which measures the spread of the sample points (a small spread means near-neighbors of the center point are selected, etc.)
"All-Variance" Case: bias term is totally ignored
"All-Bias" Case: variance term is totally ignored
Results from 1-D case: optimal value for V = B is close to that for all-bias designs, dramatically different from all-variance designs
This suggests that, if a simplification is to be made in the design problem, it might be better to ignore the effects of sampling variation rather than those of bias
Designs Minimizing Bias
Designs minimizing squared bias are of practical importance
Consideration of properties of such designs are important
Example: fitted polynomial model of degree d1, actual model of degree d2 > d1
also, define the moments of the weight function (region of interest),
and the corresponding moments of the design
Necessary and sufficient condition for the squared bias B to be minimized: a matrix condition relating the design moments to the weight-function moments
A not necessary, but sufficient, condition is that the two sets of moments be equal
Elements of the weight-function moment matrices are of the form: weighted integrals of products of powers of the x's,
and these are moments of the weight function.
Elements of the design moment matrices are of the form: averages of the same products of powers over the design points,
and these are moments of the design points.
All are of order d1 + d2 or less
Thus the sufficient condition above states that, up to and including order d1 + d2, the design moments must equal the weight function moments
Previous section: the all-bias design was obtained by setting the design moments equal to the corresponding weight-function moments
(Conclusions are...? I don't know exactly. He gives an example for fitting a response function plane to a real function of degree 2 within a spherical region of interest R...)
Detecting Lack of Fit
Two features from a model are desired:
1. Good estimation of the response eta(x)
2. Detection of model inadequacy
Box and Draper 1959, 1963: consider experimental design strategies that satisfy the first criterion, then narrow down to those that also satisfy the second criterion
Consider mechanics of making test of goodness of fit using ANOVA
Estimating p parameters
Using N observations
Results in N - p residual degrees of freedom, split into f degrees of freedom for lack of fit and e degrees of freedom for pure error
Repeated observations made at certain points provide the e pure error degrees of freedom
Total number of observations: N = p + f + e
ANOVA table:
Source | df | E(MS) |
---|---|---|
Parameter estimates | p | |
Lack of fit | f | sigma^2 plus a term reflecting systematic lack of fit (the noncentrality) |
Pure error | e | sigma^2 |
Total | N | |
the noncentrality parameter measures the systematic (bias) contribution to the lack-of-fit mean square
sigma^2 = experimental error variance, or expectation of the unbiased pure error mean square
the expected value of the lack of fit mean square exceeds sigma^2 when the model is inadequate
Test of lack of fit: comparison of mean square for lack of fit to mean square for error, via test
Noncentrality parameter takes the general form:
where = lack of fit sum of squares
Good detectability of general lack of fit can be obtained by choosing a design that makes large
This can be achieved by putting certain conditions on the experimental design moments
An order d1 design provides high detectability for terms of order d2 if:
1. all odd design moments of order d1 + d2 or less are zero
2. a certain ratio of design moments (given in the book) is large
Example: for a first order design (d1 = 1), the ratio should be large to provide high detectability of quadratic lack of fit; for a second order design (d1 = 2), it should be large to provide high detectability of cubic lack of fit.
Example:
Two example designs are given:
- one design (A) is sensitive to lack of fit produced by interaction term, completely insensitive to lack of fit produced by quadratic terms
- one design (B) is sensitive to lack of fit produced by quadratic terms alone, not sensitive to lack of fit due to interaction terms
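A compact sketch of the lack-of-fit test mechanics described above (made-up data with replicated points, not the book's): fit a straight line, split the residual SS into pure error and lack of fit, and compare mean squares with an F test.

```python
import numpy as np
from scipy import stats

x = np.array([0., 0., 1., 1., 2., 2., 3., 3.])
y = np.array([1.1, 0.9, 2.8, 3.1, 3.6, 3.4, 3.9, 4.1])

Z = np.column_stack([np.ones_like(x), x])
b = np.linalg.lstsq(Z, y, rcond=None)[0]
ss_resid = np.sum((y - Z @ b) ** 2)

# pure error: deviations of replicates from their own cell means
ss_pe, df_pe = 0.0, 0
for level in np.unique(x):
    yy = y[x == level]
    ss_pe += np.sum((yy - yy.mean()) ** 2)
    df_pe += len(yy) - 1

df_lof = len(np.unique(x)) - Z.shape[1]            # distinct x levels minus parameters
ss_lof = ss_resid - ss_pe
F = (ss_lof / df_lof) / (ss_pe / df_pe)
print(f"lack-of-fit F({df_lof},{df_pe}) = {F:.2f}, p = {stats.f.sf(F, df_lof, df_pe):.3f}")
```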
Detecting Variable Transformability
Construction of designs to detect whether a variable should be transformed to yield a simpler model
Discussion of both first and second order models...
Second Order Models
For example, a function may possess asymmetrical maximum which, after suitable variable transformation, can be represented by quadratic function
Parsimonious class of designs of this type: central composite arrangements, in which a "cube" (a two-level factorial with coded points (+/-1, +/-1, ..., +/-1), or a fraction of it of adequate resolution) is augmented by an added "star" with axial points at coded distance +/- alpha and by added center points
(note that n_c is the number of cube points, n_s the number of star points, c_c the number of cube center points, c_s the number of star center points, and thus N = n_c + n_s + c_c + c_s)
Before accepting utility of fitted equation, need to be reassured on two questions:
1. Is there evidence from data of serious lack of fit?
2. If not, is the change in y-hat over the experimental region explored by the design large enough, compared with the standard error of y-hat, to indicate that the response surface is adequately estimated?
ANOVA table: throws light on both questions
- elements (row) for:
- mean
- blocks
- first order extra
- second order extra
- lack of fit
- lack of fit
- lack of fit
- pure error
Main concern: marked lack of fit of second order model
Design of experiment table: factors/levels table
Need for transformation would be associated with appearance of third order terms
Associated with the design table are four possible third-order columns, namely those created by:
These form two sets of two items
Suppose these third-order columns orthogonalized with respect to low-order vectors (regress them against 6 columns for )
Then take residuals to yield columns from , from , etc.
In vector notation:
The curvature contrast (curvature contrast has expectation zero if assumption of a quadratic model is true) associated with is:
in this case, is the average of the responses at the second level (so, for composite design, most likely).
is average of response at level , etc.
is measure of overall non-quadricity in the direction
Corresponding measure in diredction is
General Formulas
General composite designs contain:
1. a "cube" consisting of a (full factorial) or (fractional factorial), made up of points of type for resolution replicated times, leading to the number of points
2. a star, that is, points on the predictor variable axes replicated times leading to points (assuming )
3. Center points, number , where in cube, in star
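As a bookkeeping check on these counts (numbers taken from the Chapter 9 polymer example): with k = 3, a full factorial cube (r_c = 1, so n_c = 2^3 = 8), an unreplicated star (r_s = 1, so n_s = 2k = 6), and c_c + c_s = 2 center points, the total is N = 8 + 6 + 2 = 16 runs.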
Chapter 14: Variance-Optimal Designs
Ignoring of bias: the theory that follows rests on assumption that graduating polynomial is the response function
This polynomial must be regarded as a local approximation of an unknown response function
Two sources of error: variance error and bias error
Designs which take account of bias tend not to place points at the extremes of region of interest, which is where credibility of approximating function is most strained
Aims in selecting experimental design must be multifaceted
Desirable design properties:
- generate satisfactory distribution of information throughout region of interest R
- ensure that the fitted value y-hat(x) at a point x be as close as possible to the true value eta(x)
- give good detectability of lack of fit
- allow transformations to be estimated
- allow experiments to be performed in blocks
- allow designs of increasing order to be built up sequentially
- provide internal estimate of error
- be insensitive to wild observations and to violation of usual normal theory assumptions
- require minimum number of experimental runs
- provide simple data patterns that allow ready visual appreciation
- ensure simplicity of calculation
- behave well when errors occur in the settings of predictor variables, the x's
- not require impractically huge number of levels of predictor variables
- provide check on constancy of variance assumption
Orthogonal designs
Orthogonality: important design principle (Fisher and Yates)
Rotatability: logical extension of orthogonality
Chapter 15: Practical Choice of a Response Surface Design
The relevant experimental circumstances (right-hand column) have to be taken into account to determine the relative importance of the design characteristics (left-hand column)
Characteristics of design | Relevant experimental circumstances |
---|---|
Allows check of fit | |
Allows estimation of transformations | |
Permits sequential assembly | |
Can be run in blocks | |
Provides independent estimate of error | |
Robustness of distribution of design points | |
Number of runs required | |
Simplicity of data pattern | |
Sequential assembly
Many examples of designs used sequentially, e.g. using steepest ascent with first-order designs, then finding sufficiently promising region, then creating second-order model inside that region
Illustration: three-phase sequential construction of design
- I: regular simplex (4 of 8 cube corners) and 2 center points
- II: complementary simplex (remaining 4 cube corners) and 2 (additional) center points
- III: six axial points (star)
Phase I: orthogonal first-order design, checks for overall curvature (via contrast of average response of center points with average response on cube)
If first-order effects were large compared with their standard errors, and no serious curvature exhibited, then would explore indicated direction of steepest ascent
If doubts about adequacy of first-degree polynomial model, then Step II would be performed
Steps (I+II) together are still a first order design, but they give a much better chance of detecting lack of fit, because the additional information permits estimation of two-factor interactions
If first-order terms dominate, move to more promising region
If first-order effects small, and two-factor interactions dominate or strong evidence of curvature, move to Step III (second order)
Other Factors
Robustness
Minimization of effects of wild observations (not sure what that means, exactly)
Number of Runs
Experimental design should focus experimental effort on whatever is judged important in the context
Suppose that, in addition to estimating the p parameters of the assumed model form, f contrasts are needed to check adequacy of fit, further b contrasts are needed for blocking, and an estimate of experimental error is needed having e degrees of freedom
To obtain independent estimates of all items of interest, require a design with at least N = p + f + b + e runs
Relative importance of checking fit, blocking, and obtaining an error estimate will differ in different situations
The minimum value of N runs will correspondingly differ
The corresponding minimum design will only be adequate if the experimental error variance is below a critical value
When the error variance is larger, designs larger than the minimum design are needed to obtain estimates of sufficient precision
Even with the error variance small, designs with more than the minimum number of runs are not wasteful
Depends on whether additional degrees of freedom are genuinely used to achieve the experimenter's objectives
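For a concrete count (using the Chapter 9 composite design as the illustration): a full second degree polynomial in k = 3 variables has p = 10 coefficients; with f = 4 contrasts for checking fit (the third-order alias groups), b = 1 blocking contrast, and e = 1 pure-error degree of freedom, the minimum is N = 10 + 4 + 1 + 1 = 16 runs, which is exactly the size of that design.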
Simple Design Patterns
Question: why use design patterns, instead of randomized designs?
Statistician's task:
1. Inductive criticism
2. Deductive estimation
(2) involves deducing consequences of given assumptions, in the light of the data obtained; this is easily done with randomized designs
(1) involves two questions:
a) what function should be fitted in the first place?
b) how to examine residuals from fitted function to understand deviations from initial model? (relation to the predictor variables, how to find appropriate model modification)
Factorial/composite designs use patterns of experimental points allowing such comparisons to be made
Inductive criticism is enhanced by the possibility of being able to plot the original data and residuals against one predictor variable for each individual level of another, or vice versa
First Order Designs
two-level factorial (Chapter 4)
fractional factorial (Chapter 5)
use of these in estimating steepest ascent (Chapter 6)
convenient curvature check: obtained by adding center points to factorial points (Section 6.3)
first-order designs can play role of initial building blocks in construction of second-order designs
Plackett Burman designs: useful first-order designs (Section 5.4); can also be used as initial building blocks for smaller second-order designs (Section 15.5)
Koshal first-order design can be used for "extreme economy of experimentation"
Regular Simplex Designs
Adapted from Box (1952), "Multifactor designs of first order", Biometrika 39, p. 49-57
Simplex in k dimensions: figure formed by joining any k+1 = N points that do not lie in a (k-1)-dimensional space
Example: (k + 1) = 3 points not all on the same straight line (because if they were on a line, that would be in a 1-dimensional space)
Example: (k + 1) = 4 points not all on the same plane (because if they were, that would be in a 2-dimensional space)
Triangle in 2 dimensions
Tetrahedron in 3 dimensions
Regular simplex = all edges are equal
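A quick numpy sketch of one standard way to construct a regular simplex of k + 1 points (my construction and scaling, not necessarily the one in Box (1952)):

```python
import numpy as np

def regular_simplex(k):
    # The k+1 standard basis vectors of R^(k+1) form a regular simplex (all pairwise
    # distances sqrt(2)); centering them places the points in a k-dimensional subspace.
    pts = np.eye(k + 1)
    pts -= pts.mean(axis=0)
    return pts

P = regular_simplex(3)                      # tetrahedron: 4 points in a 3-D subspace
d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
print(np.round(d, 3))                       # all off-diagonal distances are equal
```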