Empirical Model-Building and Response Surfaces
Box, George; Draper, Norman (1987). Empirical Model-Building and Response Surfaces. Wiley and Sons. ISBN 0-471-81033-9.
Contents
- 1 Chapter 1: Introduction to Response Surface Methodology
- 2 Chapter 2: Use of Graduating Functions
- 3 Chapter 3: Least Squares for Response Surface Work
- 4 Chapter 4: Factorial Designs at 2 Levels
- 5 Chapter 5: Blocking and Fractionating Factorial Designs (skipping...)
- 6 Chapter 6: Use of Steepest Ascent for Process Improvement
- 7 Chapter 7: Fitting Second-Order Models
- 8 Chapter 8: Adequacy of Estimation and the Use of Transformation
- 9 Chapter 9: Exploration of Maxima and Ridge Systems with Second-Order Response Surfaces
- 10 Chapter 10: Occurrence and Elucidation of Ridge Systems I
- 11 Chapter 11: Occurrence and Elucidation of Ridge Systems II
- 11.1 Examples
- 11.2 Canonical Analysis to Characterize Ridge Phenomena
- 11.3 Example: Consecutive Chemical System with Near Stationary Planar Ridge
- 11.4 Example: Small reactor study yielding rising ridge surface
- 11.5 Example: Stationary ridge in five variables
- 11.6 Economic importance of dimensionality of maxima/minima
- 11.7 Method for obtaining desirable combination of several responses
- 11.8 Appendix 11A: Calculations for ANOVA
- 11.9 Appendix 11B: Ridge analysis (alternative to canonical analysis)
- 12 Chapter 12: Links Between Empirical and Theoretical Models
- 13 Chapter 13: Design Aspects of Variance, Bias, and Lack of Fit
- 14 Chapter 14: Variance-Optimal Designs
- 15 Chapter 15: Practical Choice of a Response Surface Design
Chapter 1: Introduction to Response Surface Methodology
Questions when planning initial set of experiments:
1. Which input variables should be studied?
2. Should the input variables be examined in their original form, or should transformed input variables be employed?
3. How should response be measured?
4. At which levels of a given input variable should experiments be run?
5. How complex a model is necessary in a particular situation?
6. How shall we choose qualitative variables?
7. What experimental arrangement (experimental design) should be used?
Chapter 2: Use of Graduating Functions
Polynomial approximations:
- a polynomial of degree d can be thought of as a Taylor series expansion of the true underlying theoretical function y(x) truncated after terms of dth order
- the higher the degree d, the more closely the Taylor series can approximate the true function
- the smaller the region R over which y(x) is being approximated with the polynomial approximation, the better the approximation
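A minimal numpy sketch (mine, not from the book) illustrating both points, using exp(x) as a stand-in for an "unknown" y(x): the maximum approximation error falls as the degree d rises and as the region R shrinks.

```python
import numpy as np

# Illustration: approximate y(x) = exp(x) by least-squares polynomials of degree d
# over symmetric regions R = [-h, h] of different half-widths h.
def max_approx_error(degree, half_width, n_pts=200):
    x = np.linspace(-half_width, half_width, n_pts)
    y = np.exp(x)                          # stand-in for the true underlying function
    coeffs = np.polyfit(x, y, degree)      # least-squares polynomial fit
    return np.max(np.abs(np.polyval(coeffs, x) - y))

for d in (1, 2, 3):
    for h in (2.0, 1.0, 0.5):
        print(f"degree {d}, half-width {h}: max error {max_approx_error(d, h):.2e}")
```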
Issues with application of polynomial approximations:
- least squares - how does it work? what are its assumptions?
- standard errors of coefficients - how to estimate the standard deviations of the linear coefficients?
- adequacy of fit - approximating an unknown theoretical function empirically; need to be able to check whether a given degree of approximation is adequate; how can analysis of variance (ANOVA) and examination of residuals (observed - fitted values) help to check adequacy of fit?
- designs - what designs are suitable for fitting polynomials of first and second degrees? (Ch. 4, 5, 15, 13)
- transformations - how can one find transformations (generally)?
Chapter 3: Least Squares for Response Surface Work
Method of Least Squares
Least squares helps you to understand a model of the form:
y = f(x,t) + e
where:
E(y) = eta = f(x,t)
is the mean level of the response y which is affected by k variables (x1, x2, ..., xk) = x
It also involves p parameters (t1, t2, ..., tp) = t
e is experimental error
To examine this model, experiments would run at n different sets of conditions, x1, x2, ..., xn
would then observe corresponding values of response y1, y2, ..., yn
Two important questions:
1. does postulated model accurately represent the data?
2. if model does accurately represent data, what are best estimates of parameters t?
start with second question first
Given: the function f(x, t) evaluated at each experimental run
n discrepancies: y_u - f(x_u, t), u = 1, ..., n
Method of least squares selects the value of t that makes the sum of squares smallest:
S(t) = sum_{u=1}^{n} [ y_u - f(x_u, t) ]^2
the minimizing choice of t is denoted t-hat, the least-squares estimate
are least-squares estimates of t good?
their goodness depends on the nature of the distribution of their errors
least-squares estimates are appropriate if you can assume that experimental errors:
are statistically independent, have constant variance, and are normally distributed
these are "standard assumptions"
Linear models
this is a limiting case, where f(x, t) is linear in the parameters: eta = t_1 z_1 + t_2 z_2 + ... + t_p z_p, with the z's known functions of x
adding experimental error: y = t_1 z_1 + t_2 z_2 + ... + t_p z_p + e
a model of this form is linear in the parameters
Algorithm
Formulate a problem with n observed responses, p parameters...
this yields n equations of the form
y_1 = t_1 z_{11} + t_2 z_{21} + ... + t_p z_{p1} + e_1
y_2 = t_1 z_{12} + t_2 z_{22} + ... + t_p z_{p2} + e_2
etc. (in general, y_u = t_1 z_{1u} + t_2 z_{2u} + ... + t_p z_{pu} + e_u)
This can be written in matrix form: y = Z t + e
and the dimensions of each matrix are:
- y = n x 1
- Z = n x p
- t = p x 1
- epsilon = n x 1
the sum of squares function is given by:
S(t) = (y - Zt)'(y - Zt)
or, S(t) = sum_{u=1}^{n} e_u^2
this can be rewritten as:
S(t) = y'y - 2 t'Z'y + t'Z'Z t
Rank of Z
If there are relationships between the different input parameters (z's), then the matrix Z can become singular
e.g. if there is a relationship z2 = c z1, then t1 z1 + t2 z2 = (t1 + c t2) z1, so you can only estimate the linear combination t1 + c t2
reason: when z2 = c z1, changes in z1 can't be distinguished from changes in z2
Z (an n x p matrix) is said to be full rank p if there are no linear relationships of the form:
a_1 z_1 + a_2 z_2 + ... + a_p z_p = 0
if there are q > 0 independent linear relationships, then Z has rank p - q
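A minimal numpy sketch (illustrative data, not the book's) of the matrix formulation and the rank issue: solve the normal equations Z'Z t-hat = Z'y, then show that a relationship z3 = c z2 drops the rank of Z below p.

```python
import numpy as np

# Least squares in matrix form: y = Z t + eps, Z is n x p.
rng = np.random.default_rng(0)
n, p = 8, 3
Z = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])  # full rank p
t_true = np.array([2.0, 1.0, -0.5])
y = Z @ t_true + 0.1 * rng.normal(size=n)

t_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)    # solves the normal equations Z'Z t = Z'y
S_min = (y - Z @ t_hat) @ (y - Z @ t_hat)    # minimized sum of squares S(t_hat)
print("t_hat =", t_hat, " S(t_hat) =", round(S_min, 4))

# Rank deficiency: if z3 = c * z2, only the combination t2 + c*t3 is estimable,
# and Z'Z is singular; the rank of Z falls to p - 1.
c = 2.0
Z_bad = np.column_stack([Z[:, 0], Z[:, 1], c * Z[:, 1]])
print("rank of Z_bad:", np.linalg.matrix_rank(Z_bad))
```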
Analysis of Variance: 1 regressor
Assume the simple model y_u = t + e_u
This states that y is varying about an unknown mean t
Suppose we have 3 observations of y
Then the model can be written, run by run, as y_1 = t + e_1, y_2 = t + e_2, y_3 = t + e_3
so that, in matrix form,
[ 4 ]   [ 1 ]       [ e_1 ]
[ 1 ] = [ 1 ] t  +  [ e_2 ]
[ 1 ]   [ 1 ]       [ e_3 ]
Suppose the model posited a particular value of the parameter t, e.g. t = 0.5
Then you could check the null hypothesis H0: t = 0.5
If true, the mean observation vector is given by E(y) = 1 (0.5),
or,
[ 0.5 ]   [ 1 ]
[ 0.5 ] = [ 1 ] 0.5
[ 0.5 ]   [ 1 ]
and the appropriate "observation breakdown" (whatever that means?) is:
Associated with this observation breakdown is an analysis of variance table:
Source | Degrees of freedom (df) | Sum of squares (squared length), SS | Mean square, MS | Expected value of mean square, E(MS) |
---|---|---|---|---|
Model | 1 | 6.75 | 6.75 | |
Residual | 2 | 3.00 | 1.50 | |
Total | 3 | 9.75 | | |
sum of squares: squared lengths of vectors
degrees of freedom: number of dimensions in which vector can move (geometric interpretation)
the model says that, whatever the data are, the systematic part of y must lie in the direction of the vector 1 = (1, 1, 1)', which gives only one degree of freedom.
Whatever the data, the residual vector must be perpendicular to 1 (that is what least squares enforces), and so it can move in 2 directions and has 2 degrees of freedom
Now, looking at the null hypothesis:
the model component is a measure of the discrepancy between the POSTULATED model (t = 0.5) and the ESTIMATED model (t = t-hat)
Making the "standard assumptions" (earlier), the expected value of this sum of squares, assuming the null model is true, is sigma^2
For the residual component it is 2 sigma^2 (or, in general, nu sigma^2, where nu is the number of degrees of freedom of the residuals)
Thus a measure of discrepancy from the null hypothesis is the ratio F = (model mean square) / (residual mean square)
if the null hypothesis were true, then the top and bottom would both estimate the same sigma^2
So if F is different from 1, that indicates departure from null hypothesis
The MORE F differs from 1, the more doubtful the null hypothesis becomes
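A quick sketch of this single-regressor F test (made-up observations and scipy for the F distribution; not the book's numbers):

```python
import numpy as np
from scipy import stats

# Model y_u = t + eps_u, null hypothesis t = t0.
y = np.array([2.1, 1.2, 1.8])
t0 = 0.5
t_hat = y.mean()

n = len(y)
ss_model = n * (t_hat - t0) ** 2           # discrepancy of estimated from postulated model, 1 df
ss_resid = np.sum((y - t_hat) ** 2)        # residual, n - 1 df
F = (ss_model / 1) / (ss_resid / (n - 1))  # both mean squares estimate sigma^2 under H0
p_value = stats.f.sf(F, 1, n - 1)
print(f"F = {F:.2f}, p = {p_value:.3f}")
```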
Least squares: 2 regressors
The previous model, y = t + e, said y was represented by a mean plus an error.
Instead, suppose that there are systematic deviations from the mean, associated with an external variable (e.g. humidity in the lab).
Now the equation is that of a straight line:
y = t_1 z_1 + t_2 z_2 + e, with z_1 = 1 (a column of ones) and z_2 the external variable
So now the revised least-squares model is:
- i.e. the fitted vector y-hat lies in the plane defined by linear combinations of the vectors z_1 and z_2
because z_1'z_2 is not zero, these two vectors are NOT at right angles
the least-squares values t-hat_1, t-hat_2 produce the fitted vector y-hat = t-hat_1 z_1 + t-hat_2 z_2
these least-squares values make the squared length of the residual vector y - y-hat as small as possible
The normal equations express the fact that the residual vector must be perpendicular to both z_1 and z_2:
z_1'(y - y-hat) = 0 and z_2'(y - y-hat) = 0
also written (in matrix form) as Z'(y - Z t-hat) = 0, i.e. Z'Z t-hat = Z'y
Now suppose a null hypothesis specifying values of both t_1 and t_2 is investigated
Then the mean observation vector under that hypothesis can be written down as before
Source | Degrees of freedom | SS | MS | F |
---|---|---|---|---|
Model (t_1 and t_2) | 2 | 3.345 | 2.23 | |
Residual | 1 | 1.50 | | |
Total | 3 | | | |
and so
Orthogonalizing second regressor
In the above example, z_1 and z_2 are not orthogonal
One can replace z_2 by a component z_{2.1} that IS orthogonal to z_1
To do this, use the least squares property that the residual vector is orthogonal to the space in which the predictor variables lie
Regard z_2 as the "response" vector and z_1 as the predictor variable
You then obtain the fitted part c-hat z_1, with c-hat = z_1'z_2 / z_1'z_1
so the residual vector is z_{2.1} = z_2 - c-hat z_1, which is orthogonal to z_1
now the model can be rewritten in terms of z_1 and z_{2.1}
This gives three least-squares equations:
1. 2. 3.
The analysis of variance becomes:
Source | df | SS |
---|---|---|
Response function with z_1 only | 1 | |
Extra due to z_2 (given z_1) | 1 | |
Residual | 1 | |
Total | 3 | |
Generalization to p regressors
With n observations and p parameters:
the n relations implicit in the response function can be written y = Z t + e
Assuming Z is of full rank, let t-hat be the vector of estimates given by the normal equations Z'Z t-hat = Z'y
The sum of squares function is S(t) = (y - Zt)'(y - Zt) = (y - Z t-hat)'(y - Z t-hat) + (t - t-hat)'Z'Z(t - t-hat)
because the cross-product term is zero from the normal equations
Furthermore, because Z'Z is positive definite, S(t) is minimized when t = t-hat
So the solution to the normal equations producing the least squares estimate is t-hat = (Z'Z)^{-1} Z'y:
Source | df | SS |
---|---|---|
Response function | p | t-hat'Z'y |
Residual | n - p | y'y - t-hat'Z'y |
Total | n | y'y |
Bias in Least-Squares Estimators if Inadequate Model
Say the data were being fit with the model E(y) = Z_1 t_1,
but the true model that should have been used is E(y) = Z_1 t_1 + Z_2 t_2
t_1 would be estimated by t-hat_1 = (Z_1'Z_1)^{-1} Z_1'y
but under the true model, E(t-hat_1) = t_1 + A t_2, where A = (Z_1'Z_1)^{-1} Z_1'Z_2
The matrix A is the bias or alias matrix
Unless A = 0, t-hat_1 will represent t1 AND t2, not just t1
A = 0 when Z_1'Z_2 = 0, which happens if the regressors in Z1 are orthogonal to the regressors in Z2
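A tiny numpy sketch of the alias matrix (illustrative one-factor design, not from the book): fit an intercept-plus-linear model when the truth also contains a quadratic term, and compute A = (Z1'Z1)^{-1} Z1'Z2.

```python
import numpy as np

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
Z1 = np.column_stack([np.ones_like(x), x])       # fitted: intercept + linear term
Z2 = (x ** 2).reshape(-1, 1)                     # omitted: quadratic term
A = np.linalg.solve(Z1.T @ Z1, Z1.T @ Z2)
print(A)   # first row nonzero: the intercept estimate is biased by the quadratic term;
           # the linear coefficient is not, because the design is symmetric (odd moment zero)
```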
Confidence Intervals
(examples given for orthogonal and non-orthogonal design... looks interesting but didn't understand it fully)
Chapter 4: Factorial Designs at 2 Levels
I think this approach has a problem... Can only lead to LINEAR models. Chapter 7 begins to deal with 2nd order models.
However, I'm not completely screwed. Composite designs: Chapter 9 details central composite designs, which consist of factorial designs for first-order effects, plus more points to determine higher-order terms.
Brief explanation of 2-level factorial designs
Designation of lower/upper level with -1/+1
Analysis of Factorial Design
Main effect of a given variable, as defined by Yates (1937), is the average difference in the level of response as one moves from low to high level of that variable
Example: the effect of variable 1 is estimated by the average response at the +1 level of variable 1 minus the average response at its -1 level,
and similarly for variables 2 and 3 (a small computational sketch follows below)
Factorial designs also make calculation of interactions possible... i.e. is effect of 1 different at the two different levels of 3?
Example given of calculating multiple interactions...
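A short numpy sketch of these calculations for a 2^3 factorial (made-up responses, not the book's textile data):

```python
import numpy as np
from itertools import product

design = np.array(list(product([-1, 1], repeat=3)))   # 8 runs; columns are x1, x2, x3
y = np.array([60.0, 72.0, 54.0, 68.0, 52.0, 83.0, 45.0, 80.0])

def main_effect(col):
    # Yates definition: average response at the +1 level minus average at the -1 level
    return y[design[:, col] == 1].mean() - y[design[:, col] == -1].mean()

def interaction(col_a, col_b):
    sign = design[:, col_a] * design[:, col_b]
    return y[sign == 1].mean() - y[sign == -1].mean()

print("main effects:", [round(main_effect(i), 2) for i in range(3)])
print("12 interaction:", round(interaction(0, 1), 2))
```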
Variance, Standard Errors
For the complete design, if each observation has variance sigma^2, the variance of an estimated effect is 4 sigma^2 / N (an effect is the difference of two averages of N/2 runs each)
or, if there are r replicates, the denominator becomes rN
In practice, we still need an estimate of the experimental error variance
Suppose we're given an estimate s^2 of sigma^2; then the estimated variance of an effect is 4 s^2 / N
and the corresponding standard errors are the square roots of these estimated variances:
so the effects of each variable, with their standard errors, are:
Variable I:
Variable 1:
Variable 2:
Variable 3:
Variable 12:
etc...
Regression Coefficients
If you fit a first degree polynomial to textile data, you can obtain:
The estimated regression coefficients and their errors are half of the main effects and their standard errors
Factor of one half comes from the definition of an effect: the difference in response on moving from the -1 level to the +1 level of a given variable x_i, so it corresponds to the change in y when x_i changes by 2 units
A regression coefficient is the change in y when x_i is changed by 1 unit
Dye example
Example of factorial design
Analysis of the results shows that the data are adequately explained in terms of linear effects in only three of the six variables
Linear equations in those three variables were fitted by least squares to all 64 data points:
The fitted equations are not functions of the remaining variables, but this does NOT mean that a value of 0 has been substituted for the nonsignificant coefficients
A value of 0 would not be the best estimate
Regard the fitted equations as best estimates in the three-dimensional subspace of the full six-dimensional space in which the remaining variables are held at their average values
Next, obtain an estimate of the error standard deviation from the residual sum of squares:
Variable 1
Source of variation | SS | df | MS | F ratio |
---|---|---|---|---|
Total SS | 8,443.41 | 64 | | |
Correction factor (SS due to the mean) | 7,918.77 | 1 | | |
Corrected total SS | 524.63 | 63 | | |
Due to first retained variable | 48.825 | 1 | 48.825 | 13.47 |
Due to second retained variable | 142.50 | 1 | 142.50 | 39.32 |
Due to third retained variable | 115.83 | 1 | 115.83 | 31.96 |
Residual | 217.47 | 60 | 3.624 | |
etc... this table also exists for variable 4 and variable 6
Potential bias in standard deviation estimates:
- biased upward because of several small main effects and interactions that are being ignored
- biased downward because of effect of selection (only large estimates taken to be real effects)
these s values were used to estimate the standard errors of the coefficients shown in parentheses beneath the coefficients
Diagnostic Checking of Fitted Models
Plots of residuals vs. NSCORES (normal scores)
Plots of residuals vs. time order
Don't understand exactly what they're getting from these plots... (NSCORES are expected normal order statistics, so plotting residuals against them gives a normal probability plot; "Time Order" is just the run sequence, which can reveal drift or serial correlation)
Response Surface Analysis
Application of model to manufacturing: want to restrict one variable to 20, another variable to 26... and maximize strength
Response surface: looking at three-dimensional cube... these two constraints create two planes
The two planes intersect and create a line PQ, and along this line the strength varies from 11.08 (Q) to 12.46 (P)
Estimated difference in strengths at P and Q given by:
and the variance is given by:
where , and is the squared distance between the points P and Q in the scale of the x's
So the standard deviation of the estimated difference is the square root of this variance
substituting the estimate s^2 for sigma^2 gives a standard error
For variable 1, the standard error is 0.53
This means that the difference is 2.6 times larger than the standard error, meaning we can be confident the strength is in fact higher at P than at Q
Appendix 4A: Yates' Method for Obtaining Factorial Effects
This has got to be the worst description of a mathematical technique, ever.
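For reference, here is a sketch of Yates' algorithm as it is commonly stated elsewhere (my reconstruction, not the appendix's wording): with the 2^k responses listed in standard (Yates) order, apply k passes of pairwise sums followed by pairwise differences, then scale.

```python
import numpy as np

def yates(y):
    """Yates' algorithm for a 2^k factorial; y must be in standard (Yates) order."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    k = int(np.log2(n))
    col = y.copy()
    for _ in range(k):
        pairs = col.reshape(-1, 2)
        col = np.concatenate([pairs.sum(axis=1), pairs[:, 1] - pairs[:, 0]])
    effects = col / (n / 2)          # contrasts divided by 2^(k-1) give the effects
    effects[0] = col[0] / n          # first entry is the grand average
    return effects                   # order for k = 3: mean, 1, 2, 12, 3, 13, 23, 123

# illustrative responses in standard order (1), a, b, ab, c, ac, bc, abc
print(np.round(yates([28, 36, 18, 31, 27, 38, 19, 30]), 3))
```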
Chapter 5: Blocking and Fractionating Factorial Designs (skipping...)
Chapter 6: Use of Steepest Ascent for Process Improvement
Expensive and impractical to explore entire operability region (i.e. entire region in which the system could be operated)
But this should not be the objective
Instead, explore subregion of interest
For new/poorly understood systems, need to apply a preliminary procedure to find these subregions of interest where a particular model form (e.g. 2nd order polynomial) will apply
One method: one factor at a time method
Alternative method: steepest ascent method (Box: this is more effective, economical)
Steepest Ascent Method
Example: chemical system whose yield depends on time, temperature, concentration
Early stage of investigation: planar contours of first-degree equation can be expected to provide fair approximation in immediate region of point P far from optimum
Direction at right angles to contour planes is in direction of steepest ascent, if pointing toward higher yield values
Exploratory runs performed along path of steepest ascent
Best point found, or interpolated estimated maximum point on path, could be made base for new first-order design, from which further advance might be possible
After one or two applications of steepest ascent, first-order effects will no longer dominate, first order approximation will be inadequate
Second order methods (Chapter 7, Chapter 9) will then have to be applied
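A minimal sketch of how the path of steepest ascent is laid out from a fitted first-order model (illustrative coefficients, not the chemical example's):

```python
import numpy as np

# After fitting y_hat = b0 + b1*x1 + b2*x2 + b3*x3 in coded units, the path of
# steepest ascent runs from the design center in the direction of (b1, b2, b3),
# i.e. perpendicular to the planar contours, toward higher predicted yield.
b = np.array([2.4, -1.1, 0.6])                 # fitted first-order coefficients
direction = b / np.linalg.norm(b)              # unit direction of steepest ascent
for step in (0.5, 1.0, 1.5, 2.0):              # candidate exploratory runs along the path
    print("coded settings:", np.round(step * direction, 3))
```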
Chapter 7: Fitting Second-Order Models
Chapter 8: Adequacy of Estimation and the Use of Transformation
Chapter 9: Exploration of Maxima and Ridge Systems with Second-Order Response Surfaces
At first glance this chapter just appears to be a re-hash of the earlier chapter on ridge systems and optimization.
However, on second glance, section 2 discusses a composite design used to construct second-order response surface for a polymer elasticity.
9.2 Example: Polymer Elasticity
Illustrating example to elucidate nature of maximal region for polymer elasticity experiment
Central Composite Design
The design employed was second order central composite design
Such design consists of two-level factorial (or fractional factorial), chosen to allow estimation of all first-order and two-factor interaction terms
This is augmented with additional points to estimate pure quadratic effects
These designs are discussed in more detail in Chapter 15
Using standard factorial coding, the 3 variable values are converted to -1/+1:
First, determine the low and high levels of each variable
Next, determine the midlevel: (high + low) / 2
And last, the semirange: (high - low) / 2
So that the "standard factorial coding" is: coded value = (natural value - midlevel) / semirange
First set of runs: factorial design, coded factorial variable values were -1 and +1
Second set of runs: three-dimensional "star", coded factorial variable values were -2, 0, and +2
Block difference: 1+ week between first and second set of runs, allowing much time for systematic differences
Experiment was run in two blocks of eight runs (Chapter 5 terminology)
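A sketch of the 16-run design described above (k = 3; star distance and block split follow the text, run order and exact layout are illustrative):

```python
import numpy as np
from itertools import product

cube = np.array(list(product([-1, 1], repeat=3)), dtype=float)   # block 1: 8 factorial runs
star = np.vstack([2 * np.eye(3), -2 * np.eye(3)])                # 6 axial points at +/-2
centers = np.zeros((2, 3))                                       # 2 center points
block2 = np.vstack([star, centers])                              # block 2: 8 runs
design = np.vstack([cube, block2])
block = np.array([-1] * 8 + [1] * 8)                             # blocking variable x_B
print(design.shape, "runs; block column vs. x columns:", design.T @ block)  # all zeros
```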
Estimation/Elimination of Block Differences
If all 16 runs had been performed under the conditions of the first block:
the constant term in the second degree polynomial could be written beta_0 - delta
If all 16 runs had been performed under the conditions of the second block:
the constant term in the second degree polynomial could be written beta_0 + delta
The true mean difference between the blocks is then 2 delta
This makes the model: y = (second degree polynomial in the x's) + delta x_B + error,
where the variable x_B is a blocking (indicator) variable
This variable is -1 for the first block, +1 for the second block
Next (this is where he loses me): the blocking (indicator) variable x_B is orthogonal to each column in the model matrix Z
Then it follows from Section 3.9 that, because of this orthogonality, the block term can be estimated independently of the other coefficients
So you can separate out the "blocks" contribution from the residual sum of squares
This "blocks" contribution, with 1 degree of freedom, is given by the squared block contrast divided by 16: (sum of block-2 responses - sum of block-1 responses)^2 / 16
And then references a page that doesn't seem to talk about anything related (p. 513)........
He then presents this table:
Source | SS | degrees of freedom | MS |
---|---|---|---|
Blocks | 26.6 | 1 | 26.6 (F = 8.9) |
Residual after removal of blocks | 15.0 | 5 | 3.0 |
Residual before removal of blocks | 41.6 | 6 | |
Because blocking is orthogonal, it does not change the estimated coefficients in the model
But the portion of the original residual sum of squares accounted for by the systematic block difference is removed, and that increases the experimental accuracy
It is possible to analyze this residual variance further and isolate measures of adequacy of fit
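A small sketch of extracting the 1-df blocks sum of squares (made-up residuals, not the polymer data), using the single-contrast formula above:

```python
import numpy as np

# SS_blocks = (block contrast)^2 / N, with the +/-1 block indicator as the contrast.
resid = np.array([ 1.2, -0.8,  2.1, -1.5,  0.4, -0.9,  1.1, -0.6,   # block 1
                  -2.0,  1.4, -1.8,  2.2, -0.7,  1.6, -1.3,  0.9])  # block 2
x_B = np.array([-1] * 8 + [1] * 8)
ss_before = resid @ resid                        # residual SS before removing blocks
ss_blocks = (x_B @ resid) ** 2 / len(resid)      # 1 degree of freedom
ss_after = ss_before - ss_blocks                 # residual SS after removing blocks
print(round(ss_blocks, 2), round(ss_after, 2))
```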
Importance of Blocking
Blocking is important!!!
Once the factorial design is completed, there are a couple of different ways to proceed
Had first-order effects been large compared with their standard errors, and large compared with estimated interaction terms, then application of steepest ascent would be appropriate (no second-order model)
Using steepest ascent would lead to a maximum in (likely) a different location
Sequential Assembly of Designs
Second part of design added with knowledge that the second degree polynomial equation could now be estimated
A change in level could have occurred between two blocks of runs
Possibility of sequential assembly of different kinds of designs: discussed in more detail in Section 15.3
Examination of Fitted Surface
Location of maximum of fitted surface
Investigation of Adequacy of Fit
Isolating residual degrees of freedom for composite design
Before accepting fitted second degree equation, one must consider lack of fit
N observations fitted to linear model with p parameters:
Fitting process itself will introduce p linear relationships among the N residuals
If N is large with respect to p:
- the effect of induced dependence among residuals will be slight
- Plotting techniques employed to examine residuals useful in revealing inadequacies in model
As p becomes larger, and as it approaches N:
- patterns caused by induced dependencies become dominant, and can mask those due to model inadequacies
Section 7.4: Using a factorial design, it is possible to obtain information on adequacy of fit by isolating and identifying individual residual degrees of freedom associated with feared model inadequacies
For fitting a polynomial of degree n, it is important to consider possibility that polynomial of higher degree is needed
This focuses attention on the characteristics of the estimates when the feared alternative model applies, but the simpler assumed model has been fitted
Contemplation of fitted model embedded in more complex one makes it possible to answer two questions:
1. to what extent are original estimates of coefficients biased if the more complex model is true?
2. what are appropriate checking functions to warn of the possible need for a more complex model?
both questions critically affected by choice of design
Thoughts: need for adequate surrogate models is desperate... If we don't have good surrogate models, all of the hard computational work goes to waste.
The investigation of model fitness, experimental design, and statistical analysis of the results is just as important as development of the model itself.
ESPECIALLY in the case of moving toward predictive science, questions (1) and (2) above are CRITICAL!
Bias Characteristics of the Design
Can write extended third-order polynomial model in form:
y = Z1 beta1 + Z2 beta2 + epsilon
Or in orthogonalized form,
y = Z1 ( beta1 + A beta2 ) + ( Z2 - Z1 A ) beta2 + epsilon
where Z1 beta1 includes all terms up to and including second order, and Z2 beta2 has all terms of third order
Alias or bias matrix A:
It shows that only the estimates of first order terms are biased by third-order terms
If a third-order model is appropriate, and if b1 b2 and b3 are previous least-squares estimates, then
E(b1) = beta1 + 2.5 beta111 + 0.5 beta122 + 0.5 beta133
E(b2) = beta2 + 2.5 beta222 + 0.5 beta112 + 0.5 beta233
E(b3) = beta3 + 2.5 beta333 + 0.5 beta113 + 0.5 beta223
Checking Functions for the Design
Examination of matrix Z2 - Z1 A reveals rather remarkable circumstance
of the 10 columns, only 4 are independent; these 4 are simple multiples of 4 columns of Z2*
(some analysis I don't quite follow...)
Thus, although we cannot obtain estimates of each of the third-order effects individually, using this design we can isolate certain linear combinations of them (certain alias groups)
size of these combinations can indicate particular directions in which there may be lack of fit
example: if these linear combinations of the observations (the alias-group contrasts, I think?) were excessively large, it could indicate that a transformation of one of the predictor variables might be needed to obtain an adequate representation using a second degree equation
Transformation aspect: discussed in more detail in Section 13.8
Complete Breakup of Residual Sum of Squares
16 runs in composite design used
10 degrees of freedom = estimation of second degree polynomial
1 degree of freedom = blocking
1 degree of freedom = pure error comparison in which two center points are compared
4 degrees of freedom remain
- can be associated with possible lack of fit from neglected third order terms
- alternatively, with need for transformation variables
Source | SS | degrees of freedom | MS |
---|---|---|---|
Blocks | 26.6 | 1 | 26.6 |
111 | 7.2 | 1 | 7.2 |
222 | 2.6 | 1 | 2.6 |
333 | 1.6 | 1 | 1.6 |
123 | 1.2 | 1 | 1.2 |
Pure error | 2.3 | 1 | 2.3 |
Residual sum of squares | 41.6 | 6 | |
Since none of the mean squares is excessively large compared with the others, and they do not contradict the earlier estimate of the error variance, there is no reason to suspect lack of fit
Chapter 10: Occurrence and Elucidation of Ridge Systems I
The occurrence of unusual ridge-shaped systems can be understood by noting that factors like temperature, time, pressure, concentration, etc. are regarded as "natural" variables only because they can be conveniently manipulated and measured
Individual fundamental variables (e.g. collision of two types of molecules) often a function of multiple variables
This is why you may see multiple min/max or optimal levels of a fundamental variable
Example: measuring an observable that is a function of voltage, but all you can measure is current and resistance (presuming Ohm's law existence unknown)
This leads to a ridge system, where (along the ridge) the voltage is maximal
Elucidation of Stationary Regions/Maxima/Minima by Canonical Analysis
Canonical analysis: writing second degree equation in form in which it can be more readily understood
involves elimination of all cross-product terms
Examples
(several examples and forms of canonical analysis given)
Appendix 10A: Simple explanation of canonical analysis
(Geometrical explanation of canonical analysis)
Chapter 11: Occurrence and Elucidation of Ridge Systems II
One of the most important uses of response surface techniques: detection, description, exploitation of ridge systems
Examples
Stationary ridge
Rising ridge
Canonical Analysis to Characterize Ridge Phenomena
Example: Consecutive Chemical System with Near Stationary Planar Ridge
(example given: Box and Youle, 1955)
Chemical system
Transformation of variables
Canonical analysis
Direct fitting of canonical form
Exploiting canonical form
Example: Small reactor study yielding rising ridge surface
Example: Stationary ridge in five variables
Economic importance of dimensionality of maxima/minima
Method for obtaining desirable combination of several responses
Appendix 11A: Calculations for ANOVA
Appendix 11B: Ridge analysis (alternative to canonical analysis)
Chapter 12: Links Between Empirical and Theoretical Models
Chapter 13: Design Aspects of Variance, Bias, and Lack of Fit
a response y is measured, with a mean value eta = E(y), believed to depend on a set of variables x_1, ..., x_k
The exact functional relationship eta = f(x_1, ..., x_k)
is usually unknown, and perhaps unknowable
Flight of a bird, fall of a leaf, flow of water through a valve: exact equations for such phenomena are out of reach, yet we can still expect to approximate the main features of the relationship
This book: employ crude polynomial approximations, exploiting local smoothness properties
Adequate LOCALLY (flight of bird can be approximated by straight line function of time for short times, maybe quadratic function at long times)
Low order terms of Taylor series approximation can be used over region of interest
This lies within larger region of operability
if g is the polynomial approximation to eta over the region of interest,
"The fact that the polynomial is an approximation does not necessarily detract from its usefulness because all models are approximations. Essentially, all models are wrong, but some are useful. However, the approximate nature of the model must always be borne in mind."
Suppose the following:
epsilon is the vector of random errors, with zero vector mean
(where the n different y_u are n observations of the response); then the true model is not
y = g + epsilon
but is actually y = g + delta + epsilon,
where delta = eta - g
is a vector of discrepancies that should be small over the region of interest
There are TWO types of errors that must be taken into account:
1. Systematic, or bias, errors delta = eta - g: the difference between the expected value of the response and its approximating function
2. Random errors
Systematic errors are always to be expected
Since the time of Gauss they have largely been ignored, and most attention has been focused on random error
(Nice mathematical results are possible when this is done)
In choosing a design, ignoring systematic error is not an innocuous approximation, and may lead to misleading results
Competing effects of bias and variance
Example: an unknown function eta(xi) on an interval, looking at a plot of eta vs. xi
Region of interest: an interval of xi values
Approximating eta using a straight line: y-hat = b_0 + b_1 x
And the errors in the observations have variance sigma^2
Next step is to apply the coding transformation x = (xi - center of interval) / (half-width of interval)
to convert the interval of interest into the interval -1 <= x <= +1
Now suppose least squares is used to find b_0 and b_1
Mean squared error calculation:
the MSE in estimating eta(x) with y-hat(x) is, for N design points and error variance sigma^2, E[ y-hat(x) - eta(x) ]^2
Symbolically: N E[ y-hat(x) - eta(x) ]^2 / sigma^2 = N V[ y-hat(x) ] / sigma^2 + N [ E y-hat(x) - eta(x) ]^2 / sigma^2
i.e., the standardized mean squared error is equal to variance plus squared bias at x
Integrated Mean Squared Error
Can integrate the variance and squared bias over the region of interest R:
and if the integrated (standardized) mean squared error is denoted M, then M = V + B,
where V is the integrated variance contribution and B the integrated squared-bias contribution
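A numerical sketch of this V + B decomposition for the 1-D case (my illustration, assuming a quadratic truth eta = beta_0 + beta_1 x + beta_11 x^2, a fitted straight line, and R = [-1, 1]):

```python
import numpy as np

def integrated_V_and_B(design_x, beta11=1.0, sigma2=1.0, grid=np.linspace(-1, 1, 2001)):
    N = len(design_x)
    Z = np.column_stack([np.ones(N), design_x])            # first-order model matrix
    H = np.linalg.inv(Z.T @ Z)
    Zx = np.column_stack([np.ones_like(grid), grid])
    var = sigma2 * np.einsum('ij,jk,ik->i', Zx, H, Zx)     # Var[y_hat(x)] over the grid
    # bias: E[y_hat(x)] - eta(x); the omitted x^2 term is aliased onto (1, x)
    A = np.linalg.solve(Z.T @ Z, Z.T @ (design_x ** 2))
    bias = (Zx @ A - grid ** 2) * beta11
    V = N / sigma2 * var.mean()                            # standardized, averaged over R
    B = N / sigma2 * (bias ** 2).mean()
    return V, B

for spread in (0.5, 0.8, 1.2):                             # spread trades variance against bias
    x = spread * np.array([-1.0, -0.5, 0.5, 1.0])
    V, B = integrated_V_and_B(x)
    print(f"spread {spread}: V = {V:.2f}, B = {B:.2f}, M = {V + B:.2f}")
```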
Regions of Interest, Regions of Operability for k Dimensions
When using standard designs:
Choice of approximation function and selected neighborhood are implicit as soon as experimenter decides on type of design, variables to investigate, levels of variables, and transformations to use
Example: chemist exploring effect of 2 types of catalyst; picks experimental design factors to be total catalyst weight (sum of 2 variables) and catalyst weight ratios (ratio of 2 variables)
other scientists might have selected different ranges for variables, or selected to use different transforms for design factors
Such differences would not necessarily have any adverse effect on the end result of the investigation. The iterative strategy we have proposed for the exploration of response surfaces is designed to be adaptive and self-correcting. For example, an inappropriate choice of scales or of a transformation can be corrected as the iteration proceeds. However, for a given experimental design, a change of scale can have a major influence on:
1. the variance of estimated effects
2. the sizes of systematic errors
Regions of Interest: Weight Functions
"Region of interest" discussion implies accuracy of prediction is uniform over interval R
Sometimes this is not the case
Might need high accuracy at some point P in predictor variable space, and can tolerate reduced accuracy away from P
e.g. think of Gaussian vs. top hat
Introduce a weight function w(x)
Minimize a weighted mean squared error integrated over the whole operability region O,
with the weight w(x) concentrated where accurate prediction is needed
Weight functions should also be normalized, so that the integral of w(x) over O equals 1
1-D Weight Function Example
Fitted equation: y-hat = b_0 + b_1 x
True model: eta = beta_0 + beta_1 x + beta_11 x^2
Suppose N runs are made at levels x_1, ..., x_N
Define the design moments (averages of powers of the x_u over the design)
it can be shown that the integrated mean squared error is
(big expression for M)
Want to minimize M
can choose the design to eliminate one term, and this can be done by making the design symmetric about the center (odd moments zero)
The only design characteristic that then remains is the second moment, which measures the spread of the sample points (a small spread means near-neighbors of the center point are selected, etc.)
"All-Variance" Case: bias term is totally ignored
"All-Bias" Case: variance term is totally ignored
Results from 1-D case: optimal value for V = B is close to that for all-bias designs, dramatically different from all-variance designs
This suggests that, if a simplification is to be made in the design problem, it might be better to ignore the effects of sampling variation rather than those of bias
Designs Minimizing Bias
Designs minimizing squared bias are of practical importance
Consideration of properties of such designs are important
Example: fitted polynomial model of degree d1, actual model of degree d2 > d1
also, define the moments of the weight function (region of interest),
and the corresponding moments of the design
Necessary and sufficient condition for the squared bias B to be minimized: a matrix condition relating the design moments to the weight-function moments
A not necessary, but sufficient, condition is that the two sets of moments be equal
Elements of the weight-function moment matrices are of the form: weighted integrals of products of powers of the x's,
and these are moments of the weight function.
Elements of the design moment matrices are of the form: averages of the same products of powers over the design points,
and these are moments of the design points.
All are of order d1 + d2 or less
Thus the sufficient condition above states that, up to and including order d1 + d2, the design moments must equal the weight function moments
Previous section: the all-bias design was obtained by setting the design moments equal to the corresponding weight-function moments
(Conclusions are...? I don't know exactly. He gives an example for fitting a response function plane to a real function of degree 2 within a spherical region of interest R...)
Detecting Lack of Fit
Two features from a model are desired:
1. Good estimation of the response eta(x)
2. Detection of model inadequacy
Box and Draper 1959, 1963: consider experimental design strategies that satisfy the first criterion, then narrow down to those that also satisfy the second criterion
Consider mechanics of making test of goodness of fit using ANOVA
Estimating p parameters
Using N observations
Results in N - p residual degrees of freedom, split into f degrees of freedom for lack of fit and e degrees of freedom for pure error
Repeated observations made at certain points provide the e pure error degrees of freedom
Total number of observations: N = p + f + e
ANOVA table:
Source | df | E(MS) |
---|---|---|
Parameter estimates | p | |
Lack of fit | f | sigma^2 plus a term reflecting systematic lack of fit (the noncentrality) |
Pure error | e | sigma^2 |
Total | N | |
the noncentrality parameter measures the systematic (bias) contribution to the lack-of-fit mean square
sigma^2 = experimental error variance, or expectation of the unbiased pure error mean square
the expected value of the lack of fit mean square exceeds sigma^2 when the model is inadequate
Test of lack of fit: comparison of mean square for lack of fit to mean square for error, via test
Noncentrality parameter takes the general form:
where = lack of fit sum of squares
Good detectability of general lack of fit can be obtained by choosing a design that makes large
This can be achieved by putting certain conditions on the experimental design moments
An order d1 design provides high detectability for terms of order d2 if:
1. all odd design moments of order d1 + d2 or less are zero
2. a certain ratio of design moments (given in the book) is large
Example: for a first order design (d1 = 1), the ratio should be large to provide high detectability of quadratic lack of fit; for a second order design (d1 = 2), it should be large to provide high detectability of cubic lack of fit.
Example:
Two example designs are given:
- one design (A) is sensitive to lack of fit produced by interaction term, completely insensitive to lack of fit produced by quadratic terms
- one design (B) is sensitive to lack of fit produced by quadratic terms alone, not sensitive to lack of fit due to interaction terms
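A compact sketch of the lack-of-fit test mechanics described above (made-up data with replicated points, not the book's): fit a straight line, split the residual SS into pure error and lack of fit, and compare mean squares with an F test.

```python
import numpy as np
from scipy import stats

x = np.array([0., 0., 1., 1., 2., 2., 3., 3.])
y = np.array([1.1, 0.9, 2.8, 3.1, 3.6, 3.4, 3.9, 4.1])

Z = np.column_stack([np.ones_like(x), x])
b = np.linalg.lstsq(Z, y, rcond=None)[0]
ss_resid = np.sum((y - Z @ b) ** 2)

# pure error: deviations of replicates from their own cell means
ss_pe, df_pe = 0.0, 0
for level in np.unique(x):
    yy = y[x == level]
    ss_pe += np.sum((yy - yy.mean()) ** 2)
    df_pe += len(yy) - 1

df_lof = len(np.unique(x)) - Z.shape[1]            # distinct x levels minus parameters
ss_lof = ss_resid - ss_pe
F = (ss_lof / df_lof) / (ss_pe / df_pe)
print(f"lack-of-fit F({df_lof},{df_pe}) = {F:.2f}, p = {stats.f.sf(F, df_lof, df_pe):.3f}")
```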
Detecting Variable Transformability
Construction of designs to detect whether a variable should be transformed to yield a simpler model
Discussion of both first and second order models...
Second Order Models
For example, a function may possess asymmetrical maximum which, after suitable variable transformation, can be represented by quadratic function
Parsimonious class of designs of this type: central composite arrangements, in which a "cube" (a two-level factorial with coded points (+/-1, +/-1, ..., +/-1), or a fraction of it of adequate resolution) is augmented by an added "star" with axial points at coded distance +/- alpha and by added center points
(note that n_c is the number of cube points, n_s the number of star points, c_c the number of cube center points, c_s the number of star center points, and thus N = n_c + n_s + c_c + c_s)
Before accepting utility of fitted equation, need to be reassured on two questions:
1. Is there evidence from data of serious lack of fit?
2. If not, is the change in y-hat over the experimental region explored by the design large enough, compared with the standard error of y-hat, to indicate that the response surface is adequately estimated?
ANOVA table: throws light on both questions
- elements (row) for:
- mean
- blocks
- first order extra
- second order extra
- lack of fit
- lack of fit
- lack of fit
- pure error
Main concern: marked lack of fit of second order model
Design of experiment table: factors/levels table
Need for transformation would be associated with appearance of third order terms
Associated with the design table are four possible third-order columns, namely those created by:
These form two sets of two items
Suppose these third-order columns orthogonalized with respect to low-order vectors (regress them against 6 columns for )
Then take residuals to yield columns from , from , etc.
In vector notation:
The curvature contrast (curvature contrast has expectation zero if assumption of a quadratic model is true) associated with is:
in this case, is the average of the responses at the second level (so, for composite design, most likely).
is average of response at level , etc.
is measure of overall non-quadricity in the direction
Corresponding measure in diredction is
General Formulas
General composite designs contain:
1. a "cube" consisting of a (full factorial) or (fractional factorial), made up of points of type for resolution replicated times, leading to the number of points
2. a star, that is, points on the predictor variable axes replicated times leading to points (assuming )
3. Center points, number , where in cube, in star
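As a bookkeeping check on these counts (numbers taken from the Chapter 9 polymer example): with k = 3, a full factorial cube (r_c = 1, so n_c = 2^3 = 8), an unreplicated star (r_s = 1, so n_s = 2k = 6), and c_c + c_s = 2 center points, the total is N = 8 + 6 + 2 = 16 runs.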
Chapter 14: Variance-Optimal Designs
Ignoring of bias: the theory that follows rests on assumption that graduating polynomial is the response function
This polynomial must be regarded as a local approximation of an unknown response function
Two sources of error: variance error and bias error
Designs which take account of bias tend not to place points at the extremes of region of interest, which is where credibility of approximating function is most strained
Aims in selecting experimental design must be multifaceted
Desirable design properties:
- generate satisfactory distribution of information throughout region of interest R
- ensure that the fitted value y-hat(x) at a point x be as close as possible to the true value eta(x)
- give good detectability of lack of fit
- allow transformations to be estimated
- allow experiments to be performed in blocks
- allow designs of increasing order to be built up sequentially
- provide internal estimate of error
- be insensitive to wild observations and to violation of usual normal theory assumptions
- require minimum number of experimental runs
- provide simple data patterns that allow ready visual appreciation
- ensure simplicity of calculation
- behave well when errors occur in the settings of predictor variables, the x's
- not require impractically huge number of levels of predictor variables
- provide check on constancy of variance assumption
Orthogonal designs
Orthogonality: important design principle (Fisher and Yates)
Rotatability: logical extension of orthogonality
Chapter 15: Practical Choice of a Response Surface Design
The relevant experimental circumstances (right-hand column) have to be taken into account to determine the relative importance of the design characteristics (left-hand column)
Characteristics of design | Relevant experimental circumstances |
---|---|
Allows check of fit | |
Allows estimation of transformations | |
Permits sequential assembly | |
Can be run in blocks | |
Provides independent estimate of error | |
Robustness of distribution of design points | |
Number of runs required | |
Simplicity of data pattern | |
Sequential assembly
Many examples of designs used sequentially, e.g. using steepest ascent with first-order designs, then finding sufficiently promising region, then creating second-order model inside that region
Illustration: three-phase sequential construction of design
- I: regular simplex (4 of 8 cube corners) and 2 center points
- II: complementary simplex (remaining 4 cube corners) and 2 (additional) center points
- III: six axial points (star)
Phase I: orthogonal first-order design, checks for overall curvature (via contrast of average response of center points with average response on cube)
If first-order effects were large compared with their standard errors, and no serious curvature exhibited, then would explore indicated direction of steepest ascent
If doubts about adequacy of first-degree polynomial model, then Step II would be performed
Steps (I+II) together are still a first order design, but they give a much better chance of detecting lack of fit, because the additional information permits estimation of two-factor interactions
If first-order terms dominate, move to more promising region
If first-order effects small, and two-factor interactions dominate or strong evidence of curvature, move to Step III (second order)
Other Factors
Robustness
Minimization of effects of wild observations (not sure what that means, exactly)
Number of Runs
Experimental design should focus experimental effort on whatever is judged important in the context
Suppose that, in addition to estimating the p parameters of the assumed model form, f contrasts are needed to check adequacy of fit, further b contrasts are needed for blocking, and an estimate of experimental error is needed having e degrees of freedom
To obtain independent estimates of all items of interest, require a design with at least N = p + f + b + e runs
Relative importance of checking fit, blocking, and obtaining an error estimate will differ in different situations
The minimum value of N runs will correspondingly differ
The corresponding minimum design will only be adequate if the experimental error variance is below a critical value
When the error variance is larger, designs larger than the minimum design are needed to obtain estimates of sufficient precision
Even with the error variance small, designs with more than the minimum number of runs are not wasteful
Depends on whether additional degrees of freedom are genuinely used to achieve the experimenter's objectives
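For a concrete count (using the Chapter 9 composite design as the illustration): a full second degree polynomial in k = 3 variables has p = 10 coefficients; with f = 4 contrasts for checking fit (the third-order alias groups), b = 1 blocking contrast, and e = 1 pure-error degree of freedom, the minimum is N = 10 + 4 + 1 + 1 = 16 runs, which is exactly the size of that design.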
Simple Design Patterns
Question: why use design patterns, instead of randomized designs?
Statistician's task:
1. Inductive criticism
2. Deductive estimation
(2) involves deducing consequences of given assumptions, in the light of the data obtained; this is easily done with randomized designs
(1) involves two questions:
a) what function should be fitted in the first place?
b) how to examine residuals from fitted function to understand deviations from initial model? (relation to the predictor variables, how to find appropriate model modification)
Factorial/composite designs use patterns of experimental points allowing such comparisons to be made
Inductive criticism is enhanced by the possibility of being able to plot the original data and residuals against one predictor variable for each individual level of another, or vice versa
First Order Designs
two-level factorial (Chapter 4)
fractional factorial (Chapter 5)
use of these in estimating steepest ascent (Chapter 6)
convenient curvature check: obtained by adding center points to factorial points (Section 6.3)
first-order designs can play role of initial building blocks in construction of second-order designs
Plackett Burman designs: useful first-order designs (Section 5.4); can also be used as initial building blocks for smaller second-order designs (Section 15.5)
Koshal first-order design can be used for "extreme economy of experimentation"
Regular Simplex Designs
Adapted from Box (1952), "Multifactor designs of first order", Biometrika 39, p. 49-57
Simplex in k dimensions: figure formed by joining any k+1 = N points that do not lie in a (k-1)-dimensional space
Example: (k + 1) = 3 points not all on the same straight line (because if they were on a line, that would be in a 1-dimensional space)
Example: (k + 1) = 4 points not all on the same plane (because if they were, that would be in a 2-dimensional space)
Triangle in 2 dimensions
Tetrahedron in 3 dimensions
Regular simplex = all edges are equal
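A quick numpy sketch of one standard way to construct a regular simplex of k + 1 points (my construction and scaling, not necessarily the one in Box (1952)):

```python
import numpy as np

def regular_simplex(k):
    # The k+1 standard basis vectors of R^(k+1) form a regular simplex (all pairwise
    # distances sqrt(2)); centering them places the points in a k-dimensional subspace.
    pts = np.eye(k + 1)
    pts -= pts.mean(axis=0)
    return pts

P = regular_simplex(3)                      # tetrahedron: 4 points in a 3-D subspace
d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
print(np.round(d, 3))                       # all off-diagonal distances are equal
```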