Box, George; Draper, Norman (1987). Empirical Model-Building and Response Surfaces. Wiley and Sons. ISBN 0-471-81033-9.

Chapter 1: Introduction to Response Surface Methodology

Questions when planning initial set of experiments:

1. Which input variables should be studied?

2. Should the input variables be examined in their original form, or should transformed input variables be employed?

3. How should response be measured?

4. At which levels of a given input variable should experiments be run?

5. How complex a model is necessary in a particular situation?

6. How shall we choose qualitative variables?

7. What experimental arrangement (experimental design) should be used?

Chapter 2: Use of Graduating Functions

Polynomial approximations:

a polynomial of degree d can be thought of as a Taylor series expansion of the true underlying theoretical function y(x) truncated after terms of dth order
the higher the degree d, the more closely the Taylor series can approximate the true function
the smaller the region R over which y(x) is being approximated with the polynomial approximation, the better the approximation
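
A quick numerical check of the last two points. This is only a sketch: the "true" function is assumed to be exp(x) expanded about 0, which is my choice and not an example from the book, and the helper names are made up.

```python
import math
import numpy as np

def taylor_exp(x, d):
    """Taylor polynomial of exp(x) about 0, truncated after the d-th order term."""
    return sum(x**k / math.factorial(k) for k in range(d + 1))

def max_error(d, half_width, n_points=201):
    """Largest absolute error of the degree-d polynomial over [-half_width, half_width]."""
    xs = np.linspace(-half_width, half_width, n_points)
    return float(np.max(np.abs(np.exp(xs) - taylor_exp(xs, d))))

# Error shrinks as the degree d grows and as the region R shrinks.
for d in (1, 2, 3):
    print(d, max_error(d, 1.0), max_error(d, 0.5))
```

For each degree the error over the narrower region is smaller, and for each region the error drops as d increases.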

Issues with application of polynomial approximations:

least squares - how does it work? what are its assumptions?
standard errors of coefficients - how to estimate the standard deviations of the linear coefficients?
adequacy of fit - approximating an unknown theoretical function empirically; need to be able to check whether a given degree of approximation is adequate; how can analysis of variance (ANOVA) and examination of residuals (observed - fitted values) help to check adequacy of fit?
designs - what designs are suitable for fitting polynomials of first and second degrees? (Ch. 4, 5, 15, 13)
transformations - how can one find transformations (generally)?

Chapter 3: Least Squares for Response Surface Work

Method of Least Squares

Least squares helps you to understand a model of the form:

y = f(x,t) + e

where:

E(y) = eta = f(x,t)

is the mean level of the response y which is affected by k variables (x1, x2, ..., xk) = x

It also involves p parameters (t1, t2, ..., tp) = t

e is experimental error

To examine this model, experiments would be run at n different sets of conditions, x1, x2, ..., xn

and the corresponding values of the response, y1, y2, ..., yn, would then be observed

Two important questions:

1. does postulated model accurately represent the data?

2. if model does accurately represent data, what are best estimates of parameters t?

start with second question first

Given: function f(x,t) for each experimental run

n discrepancies:

$$ \{ y_1 - f(x_1,t) \}, \{ y_2 - f(x_2,t) \}, ..., \{ y_n - f(x_n,t) \} $$

Method of least squares selects, as the best value of t, the one that makes the sum of squares smallest:

$S(t) = \sum_{u=1}^{n} \left[ y_u - f \left( x_u, t \right) \right]^2$

S(t) = sum of squares function

minimizing choice of t is denoted

$\hat{t}$

are least-squares estimates of t good?

their goodness depends on the nature of the distribution of their errors

least-squares estimates are appropriate if you can assume that experimental errors:

$\epsilon_u = y_u - \eta_u$

are statistically independent, have constant variance, and are normally distributed

these are "standard assumptions"

Linear models

this is a limiting case, where

$\eta = f(x,t) = t_1 z_1 + t_2 z_2 + ... + t_p z_p$

adding experimental error $\epsilon = y - \eta$ :

$y = t_1 z_1 + t_2 z_2 + ... + t_p z_p + \epsilon$

model of this form is linear in the parameters

Algorithm

Formulate a problem with n observed responses, p parameters...

this yields n equations of the form

y_1 = t_1 z_{11} + t_2 z_{21} + ...

y_2 = t_1 z_{12} + t_2 z_{22} + ...

etc...

This can be written in matrix form:

$\mathbf{y} = \mathbf{Z t} + \boldsymbol{\epsilon}$

and the dimensions of each matrix are:

y = n x 1
Z = n x p
t = p x 1
epsilon = n x 1

the sum of squares function is given by:

$S(\mathbf{t}) = \sum_{u=1}^{n} \left( y_u - t_1 z_{1u} - t_2 z_{2u} - ... - t_p z_{pu} \right)^2$

or,

$S(t) = ( y - Zt )^{\prime} ( y - Zt )$

setting the derivatives of S(t) with respect to t to zero gives the normal equations:

$\mathbf{ Z^{\prime} Z t = Z^{\prime} y }$
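
A minimal sketch of solving the normal equations with NumPy. The four data points and the straight-line model are hypothetical, only there to have something to fit.

```python
import numpy as np

# Hypothetical data: four runs of a straight-line model y = t1 + t2*x + error.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 7.1])

# Z is n x p: a column of ones (z1) and a column holding x (z2).
Z = np.column_stack([np.ones_like(x), x])

# Normal equations: (Z'Z) t = Z'y.
t_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)

def S(t):
    """Sum of squares function S(t) = (y - Zt)'(y - Zt)."""
    r = y - Z @ t
    return float(r @ r)

print(t_hat, S(t_hat))
```

In practice `np.linalg.lstsq(Z, y, rcond=None)` gives the same estimates without forming Z'Z explicitly, which is numerically safer when the columns of Z are nearly dependent.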

Rank of Z

If there are relationships between the different input parameters (z's), then the matrix Z can become singular

e.g. if there is a relationship z2 = c z1, then eta = t1 z1 + t2 z2 = (t1 + c t2) z1, so you can only estimate the linear combination t1 + c t2

reason: when z2 = c z1, changes in z1 can't be distinguished from changes in z2

Z (an n x p matrix) is said to be full rank p if there are no linear relationships of the form (with the a's not all zero):

a_1 z_1 + a_2 z_2 + ... + a_p z_p = 0

if there are q > 0 independent linear relationships, then Z has rank p - q
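
A small sketch of the rank statement, using made-up columns: z2 is an exact multiple of z1 (so q = 1) and z3 is unrelated, giving rank p - q = 3 - 1 = 2.

```python
import numpy as np

z1 = np.array([1.0, 2.0, 3.0, 4.0])
z2 = 3.0 * z1                            # exact linear relationship z2 = c*z1 with c = 3
z3 = np.array([1.0, -1.0, 1.0, -1.0])    # not a linear combination of z1 and z2

Z = np.column_stack([z1, z2, z3])        # n = 4, p = 3
rank_full = np.linalg.matrix_rank(Z)     # one relationship, so rank is p - 1

# Dropping the redundant column leaves a matrix of full column rank.
Z_reduced = np.column_stack([z1, z3])
rank_reduced = np.linalg.matrix_rank(Z_reduced)

print(rank_full, rank_reduced)
```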

Analysis of Variance: 1 regressor

Assume simple model $y = \beta + \epsilon$

This states that y is varying about an unknown mean $\beta$

Suppose we have 3 observations of y, $\mathbf{y} = (4, 1, 1)'$

Then the model can be written as $y = z_1 t + \epsilon$

and $$ z_1 = (1, 1, 1) ' $$

and $t = \beta$

so that

[ 4 ]   [ 1 ]     [ \epsilon_1 ]
[ 1 ] = [ 1 ] t + [ \epsilon_2 ]
[ 1 ]   [ 1 ]     [ \epsilon_3 ]

Suppose a particular value of the parameter t is postulated, e.g. $$ t_0 = 0.5 $$

Then you could check the null hypothesis, e.g. $$ H_0 : t = t_0 = 0.5 $$

If true, the mean observation vector is given by $\eta_0 = z_1 t_0$

or,

[ 0.5 ]   [ 1 ]
[ 0.5 ] = [ 1 ] 0.5
[ 0.5 ]   [ 1 ]

and the appropriate "observation breakdown" (whatever that means?) is:

$y - \eta_0 = ( \hat{y} - \eta_0 ) + ( y - \hat{y} )$

Associated with this observation breakdown is an analysis of variance table:

Source	Degrees of freedom (df)	Sum of squares (square of length), SS	Mean square, MS	Expected value of mean square, E(MS)
Model	1	$\vert \hat{y} - \eta_0 \vert^2 = ( \hat{t} - t_0 )^2 \sum z_1^2$	6.75	$\sigma^2 + ( t - t_0 )^2 \sum z_1^2$
Residual	2	$\vert y - \hat{y} \vert^2 = \sum ( y - \hat{t} z_1 )^2$	3.00	$\sigma^2$
Total	3	$\vert y - \eta_0 \vert^2 = \sum ( y - \eta_0 )^2 = 12.75$

sum of squares: squared lengths of vectors

degrees of freedom: number of dimensions in which vector can move (geometric interpretation)

the model $y = z_1 t + \epsilon$ says whatever the data is, the systematic part $\hat{y} - \eta_0 = ( \hat{t} - t_0) z_1$ of $y - \eta_0$ must lie in the direction of $$ z_1 $$ , which gives $\hat{y} - \eta_0$ only one degree of freedom.

Whatever the data, the residual vector must be perpendicular to $$ z_1 $$ (why?), and so it can move in 2 directions and has 2 degrees of freedom

Now, looking at the null hypothesis:

the component $\vert \hat{y} - \eta_0 \vert^2 = ( \hat{t} - t_0 )^2 \sum z_1^2$ is a measure of discrepancy between POSTULATED model $\eta_0 = z_1 t_0$ and ESTIMATED model $\hat{y} = z_1 \hat{t}$

Making "standard assumptions" (earlier), expected value of sum of squares, assuming model is true, is $( t - t_0 )^2 \sum z_1^2 + \sigma^2$

For the residual component it is $2 \sigma^2$ (or, in general, $\nu_2 \sigma^2$ , where $\nu_2$ is number of degrees of freedom of residuals)

Thus a measure of discrepancy from the null hypothesis $$ t = t_0 $$ is $F = \frac{ \vert \hat{y} - \eta_0 \vert^2 / 1 }{ \vert y - \hat{y} \vert^2 / 2 }$

if the null hypothesis were true, then the top and bottom would both estimate the same $\sigma^2$

So if F is different from 1, that indicates departure from null hypothesis

The MORE F differs from 1, the more doubtful the null hypothesis becomes
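
The table entries for this one-regressor example can be reproduced directly from y = (4, 1, 1)', z_1 = (1, 1, 1)' and t_0 = 0.5. A sketch (variable names are mine):

```python
import numpy as np

y = np.array([4.0, 1.0, 1.0])
z1 = np.ones(3)
t0 = 0.5                                  # value postulated under H0

t_hat = (z1 @ y) / (z1 @ z1)              # least-squares estimate, here the mean of y
y_hat = t_hat * z1
eta0 = t0 * z1

ss_model = float(np.sum((y_hat - eta0) ** 2))   # 1 degree of freedom
ss_resid = float(np.sum((y - y_hat) ** 2))      # 2 degrees of freedom
ss_total = float(np.sum((y - eta0) ** 2))       # 3 degrees of freedom

F = (ss_model / 1) / (ss_resid / 2)
print(t_hat, ss_model, ss_resid, ss_total, F)
```

This gives a model sum of squares of 6.75, a residual mean square of 3.00 and a total of 12.75, matching the table. It also shows why the residual is perpendicular to z_1: the residuals sum to zero.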

Least squares: 2 regressors

Previous model, $y = \beta + \epsilon$ , said y was represented with a mean $$ \beta $$ plus an error.

Instead, suppose that there are systematic deviations from the mean, associated with an external variable (e.g. humidity in the lab).

Now equation is for straight line: $y = \beta_0 + \beta_1 x + \epsilon$

or, $y = z_1 t_1 + z_2 t_2 + \epsilon$

So now the revised least-squares model is: $\eta = z_1 t_1 + z_2 t_2$

$\eta = E(y)$ - i.e. $\eta$ is in the plane defined by linear combinations of vectors $$ z_1, z_2 $$

because $z_1^{\prime} z_2 = \sum z_1 z_2 \neq 0$ , these two vectors are NOT at right angles

the least-squares values $\hat{t_1}, \hat{t_2}$ produce a vector $\hat{\hat{y}} = z_1 \hat{t_1} + z_2 \hat{t_2}$

these least-squares values make the squared length $\sum ( y - \hat{\hat{y}} )^2 = \vert y - \hat{\hat{y}} \vert^2$ of the residual vector as small as possible

The normal equations express fact that residual vector must be perpendicular to both $$ z_1 $$ and $$ z_2 $$ :

$\begin{array}{rcl} z_1^{\prime} ( y - \hat{\hat{y}} ) &=& 0 \\ z_2^{\prime} ( y - \hat{\hat{y}} ) &=& 0 \end{array}$

also written as:

$\begin{array}{rcl} \sum z_1 ( y - \hat{t_1} z_1 - \hat{t_2} z_2 ) &=& 0 \\ \sum z_2 ( y - \hat{t_1} z_1 - \hat{t_2} z_2 ) &=& 0 \end{array}$

also written (in matrix form) as:

$\mathbf{Z^{\prime}} ( \mathbf{y - Z \hat{t} } ) = 0$

Now suppose the null hypothesis was investigated for $t_1 = t_{10} = 0.5$ and $t_2 = t_{20} = 1.0$

Then the mean observation vector $\eta_0$ is represented as $\eta_0 = t_{10} z_1 + t_{20} z_2$

Source	Degrees of freedom	SS	MS	F
Model $$ z_1 $$ and $$ z_2 $$	2	$\vert \hat{\hat{y}} - \eta_0 \vert^2 = \sum \left[ \left( \hat{t_1} - t_{10} \right) z_1 + \left( \hat{t_2} - t_{20} \right) z_2 \right]^2 = 6.69$	3.345	2.23
Residual	1	$\vert y - \hat{\hat{y}} \vert^2 = \sum \left( y - \hat{t_1} z_1 - \hat{ t_2 } z_2 \right)^2 = 1.50$	1.50
Total	3	$\vert y - \eta_0 \vert^2 = \sum \left( y - \eta_0 \right)^2 = 8.19$

$y - \eta_0 = \left( \hat{\hat{y}} - \eta_0 \right) + \left( y - \hat{\hat{y}} \right)$

and so

$F_0 = \frac{ \vert \hat{\hat{y}} - \eta_0 \vert^2 / 2 }{ \vert y - \hat{\hat{y}} \vert^2 / 1 } = 2.23$
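
A sketch that reproduces the two-regressor table. The notes never list z_2, so the vector below is an ASSUMPTION: it is a choice that is consistent with every number quoted here (fitted coefficients 1.5 and 2.5, sums of squares 6.69, 1.50 and 8.19).

```python
import numpy as np

y = np.array([4.0, 1.0, 1.0])
z1 = np.ones(3)
z2 = np.array([0.8, 0.2, -0.4])   # ASSUMED: not given in the notes, chosen to match their numbers
Z = np.column_stack([z1, z2])

# Least-squares estimates from the normal equations Z'Z t = Z'y.
t_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)
y_hat2 = Z @ t_hat

# Mean vector under the null hypothesis t1 = 0.5, t2 = 1.0.
eta0 = Z @ np.array([0.5, 1.0])

ss_model = float(np.sum((y_hat2 - eta0) ** 2))   # 2 degrees of freedom
ss_resid = float(np.sum((y - y_hat2) ** 2))      # 1 degree of freedom
ss_total = float(np.sum((y - eta0) ** 2))        # 3 degrees of freedom

F0 = (ss_model / 2) / (ss_resid / 1)
print(t_hat, ss_model, ss_resid, ss_total, F0)
```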

Orthogonalizing second regressor

In the above example, $$ z_1 $$ and $$ z_2 $$ are not orthogonal

One can find the vectors $$ z_1 $$ and $z_{2 \cdot 1}$ that are orthogonal

To do this, use least squares property that residual vector is orthogonal to space in which the predictor variables lie

Regard $$ z_2 $$ as "response" vector and $$ z_1 $$ as predictor variable

You then obtain $\hat{z_2} = 0.2 z_1$ (how?)

so the residual vector is $z_{2 \cdot 1} = z_2 - \hat{z_2} = z_2 - 0.2 z_1$

now the model can be rewritten as $\eta = \left( t_1 + 0.2 t_2 \right) z_1 + t_2 \left( z_2 - 0.2 z_1 \right) = t z_1 + t_2 z_{2 \cdot 1}$

This gives three least-squares equations:

1. $\hat{y} = 2 z_1$

2. $\hat{\hat{y}} = 1.5 z_1 + 2.5 z_2$

3. $\hat{\hat{y}} = 2.0 z_1 + 2.5 z_{2 \cdot 1}$

The analysis of variance (taking $\eta_0 = 0$ , so that the total is $\sum y^2 = 18.0$ ) becomes:

Source	df	SS
Response function with $$ z_1 $$ only	1	$\vert \hat{y} - \eta_0 \vert^2 = \left( \hat{t} - t_0 \right)^2 \sum z_1^2 = 12.0$
Extra due to $$ z_2 $$ (given $$ z_1 $$ )	1	$\vert \hat{\hat{y}} - \hat{y} \vert^2 = \hat{t}_2^2 \sum z_{2 \cdot 1}^2 = 4.5$
Residual	1	$\vert y - \hat{\hat{y}} \vert^2 = \sum \left( y - \hat{\hat{y}} \right)^2 = 1.5$
Total	3	$\vert y - \eta_0 \vert^2 = \sum \left( y - \eta_0 \right)^2 = 18.0$
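
A sketch of the orthogonalization step, which also answers the "(how?)": regressing z_2 on z_1 is a one-regressor least-squares fit, so the coefficient is $\sum z_1 z_2 / \sum z_1^2$. As before, the z_2 vector is an ASSUMPTION chosen to be consistent with the numbers in these notes, and the sums of squares take $\eta_0 = 0$.

```python
import numpy as np

y = np.array([4.0, 1.0, 1.0])
z1 = np.ones(3)
z2 = np.array([0.8, 0.2, -0.4])    # ASSUMED: not given in the notes

# Regress z2 on z1; the residual z21 is orthogonal to z1.
c = (z1 @ z2) / (z1 @ z1)          # about 0.2
z21 = z2 - c * z1

# With orthogonal regressors each coefficient can be estimated separately.
t_z1_only = (z1 @ y) / (z1 @ z1)   # coefficient of z1 (same with or without z21)
t2 = (z21 @ y) / (z21 @ z21)       # coefficient of z21

ss_z1 = float(t_z1_only ** 2 * (z1 @ z1))                         # response function with z1 only
ss_extra = float(t2 ** 2 * (z21 @ z21))                           # extra due to z2 given z1
ss_resid = float(np.sum((y - t_z1_only * z1 - t2 * z21) ** 2))    # residual
ss_total = float(y @ y)                                           # total, with eta0 = 0

print(c, t_z1_only, t2, ss_z1, ss_extra, ss_resid, ss_total)
```

The three components add up to the total (12.0 + 4.5 + 1.5 = 18.0) because z_1, z_{2.1} and the residual vector are mutually orthogonal.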

Generalization to p regressors

With n observations and p parameters:

n relations implicit in response function can be written

$\boldsymbol{\eta} = \mathbf{Z t}$

Assuming Z is full rank, and letting $\hat{\mathbf{t}}$ be the vector of estimates given by normal equations

$\left( \mathbf{ y - \hat{y} } \right)^{\prime} \mathbf{Z} = \left( y - Z \hat{t} \right)^{\prime} Z = 0$

Sum of squares function is $S(t) = (y - \eta)^{\prime} (y - \eta) = (y - \hat{y})^{\prime} (y - \hat{y}) + ( \hat{y} - \eta )^{\prime} (\hat{y} - \eta)$

because cross-product is zero from the normal equations

$S(t) = S(\hat{t}) + (\hat{t} - t)^{\prime} \mathbf{Z^{\prime} Z} ( \hat{t} - t )$

Furthermore, because $\mathbf{Z^{\prime} Z}$ is positive definite, $$ S(t) $$ is minimized when $t = \hat{t}$

So the solution to the normal equations producing the least squares estimate is the one where $t = \hat{t}$ :

$\hat{t} = ( \mathbf{Z^{\prime} Z} )^{-1} \mathbf{Z^{\prime} y}$

Source	df	SS
Response function	p	$\vert \hat{y} - \eta \vert^2 = (\hat{t} - t)^{\prime} \mathbf{Z^{\prime} Z} ( \hat{t} - t )$
Residual	n-p	$\vert y - \hat{y} \vert^2 = \sum ( y - \hat{y} )^2$
Total	n	$\vert y - \eta \vert^2 = \sum ( y - \eta )^2$
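
A numerical check of the decomposition $S(t) = S(\hat{t}) + (\hat{t} - t)^{\prime} Z^{\prime} Z (\hat{t} - t)$. The matrix Z, the responses y and the trial vector t are all randomly generated or arbitrary, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 3
Z = rng.normal(size=(n, p))       # hypothetical regressor matrix (full rank with probability 1)
y = rng.normal(size=n)            # hypothetical responses

# Least-squares estimate from the normal equations.
t_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)

def S(t):
    """Sum of squares function S(t) = (y - Zt)'(y - Zt)."""
    r = y - Z @ t
    return float(r @ r)

t = np.array([0.3, -1.0, 2.0])    # arbitrary trial parameter vector
d = t_hat - t
lhs = S(t)
rhs = S(t_hat) + float(d @ (Z.T @ Z) @ d)
print(lhs, rhs)
```

The two printed values agree to rounding error, and since the quadratic form is non-negative, S(t) can never fall below S(t_hat).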

Bias in Least-Squares Estimators if Inadequate Model

Say data was being fit with a model $y = Z_1 t_1 + \epsilon$ ,

but the true model that should have been used is $y = Z_1 t_1 + Z_2 t_2 + \epsilon$

$$ t_1 $$ would be estimated by $\hat{t_1} = (\mathbf{ Z_1^{\prime} Z_1 } )^{-1} \mathbf{ Z_1^{\prime} y }$

but using true model, $\begin{array}{rcl} E( \hat{t_1} ) &=& ( \mathbf{Z_1^{\prime} Z_1} )^{-1} \mathbf{Z_1^{\prime}} E(\mathbf{y}) \\ &=& ( \mathbf{ Z_1^{\prime} Z_1 } )^{-1} \mathbf{Z_1^{\prime}} (\mathbf{Z_1 t_1} + \mathbf{Z_2 t_2} ) \\ &=& \mathbf{t_1 + A t_2} \end{array}$

The matrix A is the bias or alias matrix

$A = \left( \mathbf{ Z_1^{\prime} Z_1 } \right)^{-1} \mathbf{ Z_1^{\prime} Z_2 }$

Unless A = 0, $\hat{t_1}$ will represent t1 AND t2, not just t1

A = 0 when $\mathbf{Z_1^{\prime} Z_2} = 0$ , which happens if regressors in Z1 are orthogonal to regressors in Z2
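
A sketch of the alias matrix for a simple made-up case: a straight line is fitted at x = -1, 0, 1, but the true model also has an x^2 term. The parameter values are hypothetical and the responses are noise-free so the bias shows up exactly.

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0])
Z1 = np.column_stack([np.ones(3), x])   # regressors in the fitted model: 1, x
Z2 = (x ** 2).reshape(-1, 1)            # regressor left out of the fitted model: x^2

# Alias matrix A = (Z1'Z1)^{-1} Z1'Z2.
A = np.linalg.solve(Z1.T @ Z1, Z1.T @ Z2)

# Hypothetical true parameters.
t1 = np.array([1.0, 2.0])
t2 = np.array([3.0])
y_mean = Z1 @ t1 + Z2 @ t2              # E(y) under the true model

# Fitting only Z1 to E(y) gives E(t1_hat) = t1 + A t2.
t1_hat = np.linalg.solve(Z1.T @ Z1, Z1.T @ y_mean)
print(A.ravel(), t1_hat, t1 + A @ t2)
```

Here the intercept absorbs two thirds of the quadratic coefficient, while the slope is unbiased because x is orthogonal to x^2 over these three points.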

Confidence Intervals

(examples given for orthogonal and non-orthogonal design... looks interesting but didn't understand it fully)

Chapter 4: Factorial Designs at 2 Levels

I think this approach has a problem... Can only lead to LINEAR models. Chapter 7 begins to deal with 2nd order models.

However, I'm not completely screwed. Composite designs: Chapter 9 details central composite designs, which consist of factorial designs for first-order effects, plus more points to determine higher-order terms.

Brief explanation of 2-level factorial designs

Designation of lower/upper level with -1/+1

Analysis of Factorial Design

Main effect of a given variable, as defined by Yates (1937), is the average difference in the level of response as one moves from low to high level of that variable

Example: effect of variable 1 is estimated by the average response at the high level of variable 1 minus the average response at its low level: $\begin{array}{l} \frac{1}{4} \left( y_{2=+1,3=+1} + y_{2=-1,3=+1} + y_{2=+1,3=-1} + y_{2=-1,3=-1} \right)_{1=+1} \\ - \frac{1}{4} \left( y_{2=+1,3=+1} + y_{2=-1,3=+1} + y_{2=+1,3=-1} + y_{2=-1,3=-1} \right)_{1=-1} \\ = 0.75 \end{array}$

and effect of variable 2 is $$ -0.59 $$

and effect of variable 3 is $$ -0.35 $$

Factorial designs also make calculation of interactions possible... i.e. is effect of 1 different at the two different levels of 3?

Example given of calculating multiple interactions...
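
A sketch of computing main effects and a two-factor interaction for a 2^3 design. The responses are HYPOTHETICAL (not the book's textile data); they are built from known coefficients so the answers can be checked by eye.

```python
import itertools
import numpy as np

# Full 2^3 design in coded units (-1 / +1), one row per run.
X = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
x1, x2, x3 = X.T

# HYPOTHETICAL responses, not the textile data.
y = 10.0 + 1.5 * x1 - 0.5 * x2 + 0.25 * x1 * x2

def effect(contrast, y):
    """Mean response where contrast = +1 minus mean response where contrast = -1."""
    return float(y[contrast > 0].mean() - y[contrast < 0].mean())

main_1 = effect(x1, y)
main_2 = effect(x2, y)
main_3 = effect(x3, y)
inter_12 = effect(x1 * x2, y)     # interaction uses the product column as its contrast
print(main_1, main_2, main_3, inter_12)
```

Each effect comes out as twice the corresponding coefficient used to build y, since moving from -1 to +1 is a change of 2 coded units.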

Variance, Standard Errors

For complete $$ 2^k $$ design, if $V(y) = \sigma^2$ :

$V(\mbox{grand mean}) = \frac{ \sigma^2 }{ 2^k }$

$V(\mbox{effect}) = \frac{4 \sigma^2}{ 2^k }$

or, if there are r repeats, then the denominators become $$ r 2^k $$

In practice, still need an estimate $$ s^2 $$ of the experimental error variance $\sigma^2$

Suppose we're given estimate of $$ s^2 = 0.0050 $$ ; then

$\hat{V}(\overline{y}) = 0.000625$

$\hat{V}(\mbox{effect}) = 0.0025$

and corresponding standard errors are the square roots:

$s(\overline{y}) = 0.025$

$s(\mbox{effect}) = 0.05$
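
The arithmetic above, written out. This assumes an unreplicated 2^3 design (k = 3, r = 1), which is what the quoted numbers imply since 0.0050 / 0.000625 = 8.

```python
import math

k = 3          # number of two-level factors (2^3 = 8 runs)
r = 1          # number of repeats of the full design
s2 = 0.0050    # estimate of sigma^2 quoted in the notes

n_runs = r * 2 ** k
var_mean = s2 / n_runs            # V(grand mean) = sigma^2 / (r 2^k)
var_effect = 4 * s2 / n_runs      # V(effect) = 4 sigma^2 / (r 2^k)

se_mean = math.sqrt(var_mean)
se_effect = math.sqrt(var_effect)
print(var_mean, var_effect, se_mean, se_effect)
```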

so the effect of each variable, with its standard error, is:

Variable I: $\overline{y} = 2.745 \pm 0.025$

Variable 1: $0.75 \pm 0.05$

Variable 2: $-0.59 \pm 0.05$

Variable 3: $-0.35 \pm 0.05$

Variable 12: $0.03 \pm 0.05$

etc...

Regression Coefficients

If you fit a first degree polynomial to textile data, you can obtain:

$\hat{y} = (2.745 \pm 0.025) + (0.375 \pm 0.025) x_1 - (0.295 \pm 0.025) x_2 - (0.175 \pm 0.025) x_3$

The estimated regression coefficients $$ b_1 = 0.375, b_2 = -0.295, b_3 = -0.175 $$ and their errors are half of the main effects and their standard errors

Factor of one half comes from definition of effect: difference in response on moving from the -1 level to the +1 level of given variable $$ x_i $$ , so it corresponds to change in y after changing $$ x_i $$ by 2 units

Regression coefficient $$ b_i $$ is the change in y when $$ x_i $$ is changed by 1 unit
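
A sketch confirming the factor of one half: fitting a first-degree polynomial by least squares to a 2^3 design in coded units gives coefficients equal to half the main effects, and an intercept equal to the grand mean. The eight responses are HYPOTHETICAL, not the textile data.

```python
import itertools
import numpy as np

# Full 2^3 design in coded units, one row per run.
X = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))

# HYPOTHETICAL responses, one per run.
y = np.array([2.1, 1.9, 1.6, 1.5, 2.9, 2.6, 2.3, 2.2])

# First-degree model: y = b0 + b1 x1 + b2 x2 + b3 x3.
Z = np.column_stack([np.ones(len(y)), X])
b = np.linalg.lstsq(Z, y, rcond=None)[0]

# Main effects: average at the high level minus average at the low level.
effects = np.array([y[X[:, i] > 0].mean() - y[X[:, i] < 0].mean() for i in range(3)])

print(b, effects / 2)
```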

Dye example

Example of $$ 2^6 $$ factorial design

Analysis of results shows that the data are adequately explained in terms of linear effects in only three variables, $$ x_1 , x_4 , x_6 $$

Linear equations in $$ x_1 , x_4 , x_6 $$ fitted by least squares to all 64 data points:

$\begin{array}{rcl} \hat{y}_1 &=& (11.12 \pm 0.24)+(0.87 \pm 0.24) x_1+(1.49 \pm 0.24) x_4+(1.35 \pm 0.24) x_6 \\ \hat{y}_2 &=& (16.95 \pm 0.74)-(5.64 \pm 0.74) x_1-(0.17 \pm 0.74) x_4+(5.42 \pm 0.74) x_6 \\ \hat{y}_3 &=& (28.28 \pm 0.67)+(0.19 \pm 0.67) x_1-(1.50 \pm 0.67) x_4-(4.44 \pm 0.67) x_6 \end{array}$

These are not a function of $$ x_2, x_3, x_5 $$ but this does NOT mean substituting value of 0 for nonsignificant coefficients

Value of 0 would not be best estimate

Regard fitted equations as best estimates in the three dimensional subspace of the full six-dimensional space in which $$ x_2, x_3, x_5 $$ are at average values

Next, obtain estimates of the standard deviations from the residual sums of squares:

Response 1

Source of variation	SS	df	MS	F ratio
Total SS = $\sum y^2$	8,443.41	64
Correction factor, SS due to $b_0$ : $(\sum y)^2 / 64$	7,918.77	1
Corrected total SS	524.63	63
Due to $b_1$ : $b_1 \sum x_1 y = ( \sum x_1 y )^2 / \sum x_1^2$	48.825	1	48.825	13.47
Due to $b_4$ : $b_4 \sum x_4 y = ( \sum x_4 y )^2 / \sum x_4^2$	142.50	1	142.50	39.32
Due to $b_6$ : $b_6 \sum x_6 y = ( \sum x_6 y )^2 / \sum x_6^2$	115.83	1	115.83	31.96
Residual	217.47	60	3.624 ( $$ s_1 = 1.9038 $$ )

$\sum y = 711.9; \sum x_1 y = 55.9; \sum x_4 y = 95.5; \sum x_6 y = 86.1$
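
The table can be rebuilt from the sums quoted above plus the total sum of squares. A sketch, relying on the fact that each x column of a 2^6 design in coded units has $\sum x_i^2 = 64$:

```python
import math

n = 64
total_ss = 8443.41                               # sum of y^2 from the table
sum_y = 711.9
sum_xy = {"b1": 55.9, "b4": 95.5, "b6": 86.1}    # sums of x_i * y

correction = sum_y ** 2 / n                      # SS due to b0
corrected_total = total_ss - correction

# In coded units sum(x_i^2) = 64, so SS due to b_i = (sum x_i y)^2 / 64.
ss = {name: v ** 2 / n for name, v in sum_xy.items()}
coef = {name: v / n for name, v in sum_xy.items()}

resid_ss = corrected_total - sum(ss.values())
resid_df = n - 1 - len(ss)                       # 60
resid_ms = resid_ss / resid_df
s1 = math.sqrt(resid_ms)

F = {name: v / resid_ms for name, v in ss.items()}   # each SS has 1 df
se_coef = s1 / math.sqrt(n)                      # standard error of each coefficient

print(correction, corrected_total, ss, resid_ss, resid_ms, s1, F, coef, se_coef)
```

The coefficients come out as 0.87, 1.49 and 1.35 with standard error 0.24, agreeing with the fitted equation for $\hat{y}_1$.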

etc... corresponding tables also exist for responses 2 and 3

Potential bias in standard deviation estimates:

biased upward because of several small main effects and interactions that are being ignored
biased downward because of effect of selection (only large estimates taken to be real effects)

these s values were used to estimate the standard errors of the coefficients, shown after the ± signs in the fitted equations

Diagnostic Checking of Fitted Models

Plots of residuals vs. $\hat{y}_1, \hat{y}_2, \hat{y}_3$

Plots of residuals vs. Time Order

Don't understand exactly what they're getting from these plots... or what "NSCORES" are... or what "Time Order" is

Response Surface Analysis

Application of model to manufacturing: want to restrict one variable to 20, another variable to 26... and maximize strength

Response surface: looking at three-dimensional cube... these two constraints create two planes

The two planes intersect and create a line PQ, and along this line the strength varies from 11.08 (Q) to 12.46 (P)

Estimated difference in strengths at P and Q given by:

$\begin{array}{rcl} \hat{y}_P - \hat{y}_Q &=& b_1 ( x_{1P} - x_{1Q} ) + b_4 ( x_{4P} - x_{4Q} ) + b_6 ( x_{6P} - x_{6Q} ) \\ &=& 12.46 - 11.08 \\ &=& 1.38 \end{array}$

and the variance is given by:

$\begin{array}{rcl} V(\hat{y}_P - \hat{y}_Q) &=& \left[ (x_{1P} - x_{1Q})^2 + (x_{4P} - x_{4Q})^2 + (x_{6P} - x_{6Q})^2 \right] V(b_i) \\ &=& \overline{PQ}^2 V(b_i) \end{array}$

where $V(b_i) = \frac{ \sigma^2 }{n}$ , and $\overline{PQ}^2$ is the squared distance between the points P and Q in the scale of the x's

So the standard deviation of $(\hat{y}_P - \hat{y}_Q) = 1.38$ is $\overline{PQ} \frac{ \sigma }{ n^{\frac{1}{2}} }$

when this is evaluated, it is:

$\left[ (x_{1P} - x_{1Q})^2 + (x_{4P} - x_{4Q})^2 + (x_{6P} - x_{6Q})^2 \right]^{\frac{1}{2}} \frac{\sigma}{ 64^{\frac{1}{2}} } = 0.2809 \sigma$

substituting the values of $$ s_1, s_2, s_3 $$ for $\sigma$ gives the corresponding standard errors

For response 1, with $s_1$ substituted for $\sigma$ , the standard error is 0.53

This means that the difference $\hat{y}_P - \hat{y}_Q = 1.38$ is 2.6 times larger than the standard error, meaning we can be confident the strength is in fact higher at P than at Q
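
The last few steps as arithmetic. The coordinates of P and Q are not recorded in these notes, so the sketch starts from the quoted factor 0.2809 rather than recomputing the distance PQ.

```python
import math

n = 64
factor = 0.2809          # PQ / sqrt(n), as quoted in the notes
s1 = 1.9038              # residual standard deviation for response 1
diff = 12.46 - 11.08     # estimated strength at P minus strength at Q

pq_distance = factor * math.sqrt(n)   # implied distance between P and Q in coded units
se_diff = factor * s1                 # standard error of the difference
ratio = diff / se_diff                # difference in units of its standard error

print(pq_distance, se_diff, ratio)
```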

Appendix 4A: Yates' Method for Obtaining Factorial Effects

This has got to be the worst description of a mathematical technique, ever.

Chapter 5: Blocking and Fractionating Factorial Designs (skipping...)

Chapter 6: Use of Steepest Ascent for Process Improvement

Expensive and impractical to explore entire operability region (i.e. entire region in which the system could be operated)

But this should not be the objective

Instead, explore subregion of interest

For new/poorly understood systems, need to apply a preliminary procedure to find these subregions of interest where a particular model form (e.g. 2nd order polynomial) will apply

One method: one factor at a time method

Alternative method: steepest ascent method (Box: this is more effective, economical)

Steepest Ascent Method

Example: chemical system whose yield depends on time, temperature, concentration

Early stage of investigation: planar contours of first-degree equation can be expected to provide fair approximation in immediate region of point P far from optimum

Direction at right angles to contour planes is in direction of steepest ascent, if pointing toward higher yield values

Exploratory runs performed along path of steepest ascent

Best point found, or interpolated estimated maximum point on path, could be made base for new first-order design, from which further advance might be possible

After one or two applications of steepest ascent, first-order effects will no longer dominate, first order approximation will be inadequate

Second order methods (Chapter 7, Chapter 9) will then have to be applied
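
A minimal sketch of laying out exploratory runs along the path of steepest ascent. It assumes a fitted first-order model $\hat{y} = b_0 + \sum b_i x_i$ in coded units, so the gradient is just the vector of coefficients. The function name, coefficients, step size and design center are all hypothetical.

```python
import numpy as np

def steepest_ascent_path(b, center, step, n_steps):
    """Points along the steepest-ascent direction of a first-order model.

    b       : first-order coefficients (b1, ..., bk), excluding the intercept
    center  : design center in coded units
    step    : distance between successive exploratory runs (coded units)
    n_steps : number of exploratory runs
    """
    b = np.asarray(b, dtype=float)
    center = np.asarray(center, dtype=float)
    direction = b / np.linalg.norm(b)     # unit vector at right angles to the contour planes
    return np.array([center + k * step * direction for k in range(1, n_steps + 1)])

# Hypothetical coefficients for two variables.
b = [3.0, 4.0]
path = steepest_ascent_path(b, center=[0.0, 0.0], step=0.5, n_steps=3)
print(path)
```

The responses measured at these points would then suggest where to center the next first-order design, or signal that first-order effects no longer dominate.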
