Overview

Monte Carlo sampling is essentially a brute-force technique in which random samples are taken until confidence that the entire space has been sampled is satisfactory.

Random numbers are used to create sampling points in each direction.

Think of Monte Carlo ray-tracing: you send out a whole bunch of rays, each in random directions, and from the result you determine the radiative flux. Mathematically, you're performing an integration by randomly sampling the function you want to integrate, then adding up all of the random samples:

$\int f(x)dx\approx {\frac {1}{N}}\sum _{i}f(x_{i})$

Explanation

Transforming Variables

For a distribution that is a function of $m$ variables $x_{1},\dots ,x_{m}$ :

Each variable has its own range, $\alpha _{i}\leq x_{i}\leq \beta _{i}$

This range must be converted to $[0,1]$

${\hat {x}}_{i}={\frac {x_{i}-\alpha _{i}}{\beta _{i}-\alpha _{i}}}$

so that $x_{i}\in \left[0,1\right]\forall i=1\dots m$

Log Scale

If you're using a log scale, i.e. sampling logarithmically more at $\alpha$ than $\beta$

${\hat {x}}_{i}={\frac {\log {(x_{i})}-\log {(\alpha _{i})}}{\log {(\beta _{i})}-\log {(\alpha _{i})}}}$


Histogram of samples in normal, non-transformed space. Samples are preferentially grouped toward lower end of range.	When the samples are plotted in the logarithm-transformed space, the preferential grouping disappears, as each bin now represents an order of magnitude, which drastically stretches the higher-end bins (so they include more samples) and squishes the lower-end bins (so they include fewer samples).

Selecting Samples

An $m$ -element vector of random numbers is generated to correspond with a single sample.

A number of sample points are selected, and the ${\hat {x}}_{i}$ and corresponding $x_{i}$ are stored

the function is evaluated at each input variable value $x_{i}$

all random input vectors and corresponding output vectors are stored/saved

Running in Parallel

Because the code suited itself to being run in parallel, I was able to use the Matlab Parallel Computing toolbox by changing the for loop over all samples that evaluates the function at each sample.

help parfor

Analysis of Results

Example

Problem Information

For details about the problem, including the input uncertainty map, see Example Problem for Experimental Design

What I am trying to learn:

What does "true" function look like via MC sampling?
How many samples are required for MC?
How high a degree does the polynomial go to, and is that a function of the number of samples?
How does the polynomial compare to a Fourier analysis?

Plan: save the state after 10, 100, 1k, 10k samples and evaluate after each.

Code

Main article: Monte Carlo Experimental Design Matlab Code

Computing Response Surface

See Response Surface Methodology for general information on response surface methodology.

contains 2 variables:
- response_surface - structure containing information about the response surface (coefficient vector is in response_surface.beta)
- model - this is a matrix containing the polynomial powers of each variable (variable order given in section Composite Experimental Design#A Note on Coefficient and Variable Order; description of polynomial powers matrix given in section Composite Experimental Design#Polynomial Powers Matrix)

A quadratic response surface was computed using all of the information from the Monte Carlo samples. There were 10,000 samples in total.

The resulting polynomial coefficient vector is:

b(1) = 175.9 
b(2) = -55.47 
b(3) = -59.65 
b(4) = -75.2 
b(5) = 1.12 
b(6) = 0.1184 
b(7) = -44.47 
b(8) = 4.065 
b(9) = -0.8079 
b(10) = 5.412 
b(11) = -10.95 
b(12) = 25.88 
b(13) = 21.01 
b(14) = -0.1112 
b(15) = 0.2904 
b(16) = -0.3872 
b(17) = -0.003535 
b(18) = -0.03342 
b(19) = -0.009973 
b(20) = -0.01873 
b(21) = 0.0003347 
b(22) = -0.0003496 
b(23) = 21.21 
b(24) = 16.26 
b(25) = 21.64 
b(26) = -0.5508 
b(27) = 0.0131 
b(28) = -10.72

for the polynomial powers matrix:

     0     0     0     0     0     0
     0     0     0     0     0     1
     0     0     0     0     1     0
     0     0     0     1     0     0
     0     0     1     0     0     0
     0     1     0     0     0     0
     1     0     0     0     0     0
     0     0     0     0     0     2
     0     0     0     0     1     1
     0     0     0     0     2     0
     0     0     0     1     0     1
     0     0     0     1     1     0
     0     0     0     2     0     0
     0     0     1     0     0     1
     0     0     1     0     1     0
     0     0     1     1     0     0
     0     0     2     0     0     0
     0     1     0     0     0     1
     0     1     0     0     1     0
     0     1     0     1     0     0
     0     1     1     0     0     0
     0     2     0     0     0     0
     1     0     0     0     0     1
     1     0     0     0     1     0
     1     0     0     1     0     0
     1     0     1     0     0     0
     1     1     0     0     0     0
     2     0     0     0     0     0

The response surface, plotted with the non-visualized dimensions set to a constant (their means), is:

and some important statistics are:

---------------------------------------------------
Response surface summary of information:
Number of variables in response surface is 6. 
Number of terms in polynomial is 28. 
Degree of response surface is 2.

MSE =			 0.04265621 
MSE DoF = 		 5007 

L-inf norm resid = 	 0.53414457 

R^2 =			 0.68956066 
adjusted R^2 =		 0.68788663 
---------------------------------------------------

Quadratic Surface, 2 Dimensions

Download the response surface here: http://files.charlesmartinreid.com/ExperimentalDesign/MCResponseSurface_2dim_2deg.mat

contains 2 variables:
- response_surface - structure containing information about the response surface (coefficient vector is in response_surface.beta)
- model - this is a matrix containing the polynomial powers of each variable (variable order given in section Composite Experimental Design#A Note on Coefficient and Variable Order; description of polynomial powers matrix given in section Composite Experimental Design#Polynomial Powers Matrix)

The same set of Monte Carlo samples was fit to a quadratic surface, but with 2 variables instead of 6.

The resulting polynomial coefficient vector is:

b(1) = 0.2606 
b(2) = -0.01793 
b(3) = 0.0367 
b(4) = -0.003453 
b(5) = 0.0003092 
b(6) = -0.0003485

for the polynomial powers matrix:

This results in a response surface that looks similar to the 6-dimensional quadratic response surface:

The statistics show that the fit is better for the 2-dimensional surface than for the 6-dimensional surface. This, combined with the fact that he response surfaces look similar, means we can conclude that the additional dimensions are probably independent of the two visualized dimensions, or that they ave a minimal impact on the response.

---------------------------------------------------
Response surface summary of information:
Number of variables in response surface is 2. 
Number of terms in polynomial is 6. 
Degree of response surface is 2.

MSE =			 0.04267264 
MSE DoF = 		 5029 

L-inf norm resid = 	 0.50344056 

R^2 =			 0.68807653 
adjusted R^2 =		 0.68776641 
---------------------------------------------------

Cubic Surface, 6 Dimensions

Download the response surface here: http://files.charlesmartinreid.com/ExperimentalDesign/MCResponseSurface_6dim_3deg.mat

contains 2 variables:
- response_surface - structure containing information about the response surface (coefficient vector is in response_surface.beta)
- model - this is a matrix containing the polynomial powers of each variable (variable order given in section Composite Experimental Design#A Note on Coefficient and Variable Order; description of polynomial powers matrix given in section Composite Experimental Design#Polynomial Powers Matrix)

The polynomial coefficient vector is:

b(1) = 9.4335e+04 
b(2) = -7.1360e+04 
b(3) = -1.3930e+04 
b(4) = 2.4439e+04 
b(5) = 6.3177e+01 
b(6) = -7.3399e-01 
b(7) = -4.7962e+04 
b(8) = 2.5084e+04 
b(9) = 1.2428e+04 
b(10) = 9.7792e+02 
b(11) = -9.2480e+03 
b(12) = -5.1276e+03 
b(13) = -5.7739e+03 
b(14) = -1.3153e+02 
b(15) = 1.5852e+01 
b(16) = -1.7894e+01 
b(17) = 7.7344e-01 
b(18) = -1.1606e+00 
b(19) = -1.8691e+00 
b(20) = 5.6333e+00 
b(21) = 5.4130e-02 
b(22) = -3.5192e-03 
b(23) = 1.7109e+03 
b(24) = -1.8055e+03 
b(25) = -6.1622e+03 
b(26) = 9.2918e+01 
b(27) = 2.2992e+00 
b(28) = 2.4321e+04 
b(29) = -3.2712e+03 
b(30) = -2.0094e+03 
b(31) = -8.4970e+02 
b(32) = 6.7127e+02 
b(33) = 5.4902e+02 
b(34) = 1.6649e+03 
b(35) = 3.9418e+01 
b(36) = -8.6106e+01 
b(37) = 2.8060e+03 
b(38) = 8.4698e+02 
b(39) = 4.0922e+01 
b(40) = -4.1602e+01 
b(41) = 3.6996e+01 
b(42) = 2.7639e+01 
b(43) = -3.6318e+01 
b(44) = 1.9221e+01 
b(45) = -1.3503e-01 
b(46) = 4.1899e-01 
b(47) = 3.5445e-02 
b(48) = 3.4742e-03 
b(49) = 1.0020e-01 
b(50) = 9.6863e-01 
b(51) = 1.0226e-01 
b(52) = -1.0614e+00 
b(53) = -7.4942e-01 
b(54) = -6.7971e-01 
b(55) = -2.5682e-02 
b(56) = 1.6590e-03 
b(57) = 6.0159e-04 
b(58) = -3.3300e-04 
b(59) = 6.4587e-04 
b(60) = 4.1078e-04 
b(61) = 8.1751e-04 
b(62) = -4.3508e-06 
b(63) = 1.0649e-05 
b(64) = 1.0716e+03 
b(65) = -3.1054e+02 
b(66) = -9.7221e+02 
b(67) = 2.0289e+03 
b(68) = -9.5553e+02 
b(69) = 2.5068e+02 
b(70) = -1.2136e+01 
b(71) = -2.9474e+00 
b(72) = -8.2761e+00 
b(73) = -5.5199e-01 
b(74) = -1.3527e-01 
b(75) = -2.5765e-01 
b(76) = -6.1860e-01 
b(77) = 4.1224e-03 
b(78) = -3.8697e-04 
b(79) = -1.8981e+03 
b(80) = 1.5007e+03 
b(81) = 5.7488e+02 
b(82) = -1.3082e+01 
b(83) = -3.1649e-01 
b(84) = -3.6841e+03

for the polynomial powers matrix:

     0     0     0     0     0     0
     0     0     0     0     0     1
     0     0     0     0     1     0
     0     0     0     1     0     0
     0     0     1     0     0     0
     0     1     0     0     0     0
     1     0     0     0     0     0
     0     0     0     0     0     2
     0     0     0     0     1     1
     0     0     0     0     2     0
     0     0     0     1     0     1
     0     0     0     1     1     0
     0     0     0     2     0     0
     0     0     1     0     0     1
     0     0     1     0     1     0
     0     0     1     1     0     0
     0     0     2     0     0     0
     0     1     0     0     0     1
     0     1     0     0     1     0
     0     1     0     1     0     0
     0     1     1     0     0     0
     0     2     0     0     0     0
     1     0     0     0     0     1
     1     0     0     0     1     0
     1     0     0     1     0     0
     1     0     1     0     0     0
     1     1     0     0     0     0
     2     0     0     0     0     0
     0     0     0     0     0     3
     0     0     0     0     1     2
     0     0     0     0     2     1
     0     0     0     0     3     0
     0     0     0     1     0     2
     0     0     0     1     1     1
     0     0     0     1     2     0
     0     0     0     2     0     1
     0     0     0     2     1     0
     0     0     0     3     0     0
     0     0     1     0     0     2
     0     0     1     0     1     1
     0     0     1     0     2     0
     0     0     1     1     0     1
     0     0     1     1     1     0
     0     0     1     2     0     0
     0     0     2     0     0     1
     0     0     2     0     1     0
     0     0     2     1     0     0
     0     0     3     0     0     0
     0     1     0     0     0     2
     0     1     0     0     1     1
     0     1     0     0     2     0
     0     1     0     1     0     1
     0     1     0     1     1     0
     0     1     0     2     0     0
     0     1     1     0     0     1
     0     1     1     0     1     0
     0     1     1     1     0     0
     0     1     2     0     0     0
     0     2     0     0     0     1
     0     2     0     0     1     0
     0     2     0     1     0     0
     0     2     1     0     0     0
     0     3     0     0     0     0
     1     0     0     0     0     2
     1     0     0     0     1     1
     1     0     0     0     2     0
     1     0     0     1     0     1
     1     0     0     1     1     0
     1     0     0     2     0     0
     1     0     1     0     0     1
     1     0     1     0     1     0
     1     0     1     1     0     0
     1     0     2     0     0     0
     1     1     0     0     0     1
     1     1     0     0     1     0
     1     1     0     1     0     0
     1     1     1     0     0     0
     1     2     0     0     0     0
     2     0     0     0     0     1
     2     0     0     0     1     0
     2     0     0     1     0     0
     2     0     1     0     0     0
     2     1     0     0     0     0
     3     0     0     0     0     0

and the response surface looks like:

Some important statistics are:

---------------------------------------------------
Response surface summary of information:
Number of variables in response surface is 6. 
Number of terms in polynomial is 84. 
Degree of response surface is 3.

MSE =			 0.02514706 
MSE DoF = 		 4951 

L-inf norm resid = 	 0.51330364 

R^2 =			 0.81903400 
adjusted R^2 =		 0.81600023 
---------------------------------------------------

Quartic Response Surface

This response surface is available for download here: http://files.charlesmartinreid.com/ExperimentalDesign/MCResponseSurface_6dim_4deg.mat

For the sake of brevity, the full coefficients and powers matrix won't be printed here (they are included in the response surface file above).

The plot and relevant statistics are given here:

500px

---------------------------------------------------
Response surface summary of information:
Number of variables in response surface is 6. 
Number of terms in polynomial is 210. 
Degree of response surface is 4.

MSE =			 0.02069806 
MSE DoF = 		 9790 

L-inf norm resid = 	 0.37829408 

R^2 =			 0.85452284 
adjusted R^2 =		 0.85141715 
---------------------------------------------------

It is clear that despite having a high-degree polynomial with a large number (210) of coefficients, the polynomial fit is still quite poor, and increasing the degree of the polynomial does not greatly increase the polynomial's fit to the data.

With the composite design response surface, the (reduced) third degree polynomial fit all of the data points exactly, and yielded 0 mean square error and an r-squared value of 1.0. However, this is because there were only 45 sample points, and almost as many polynomial coefficients - 37.

References

Matlab

Statistics toolbox documentation: http://www.mathworks.com/help/toolbox/stats/bq_676m-2.html
- Very helpful page on linear regression in general: goes through linear reg., multiple linear reg., multivariate linear reg., polynomial reg., multivariate polynomial reg. (response surfaces), etc.

x2fx function documentation: http://www.mathworks.com/help/toolbox/stats/x2fx.html

regstats function documentation: http://www.mathworks.com/help/toolbox/stats/regstats.html

scatter3 (3D scatter plot) function documentation: http://www.mathworks.com/help/techdoc/ref/scatter3.html
- Modifying axes properties: http://www.mathworks.com/help/techdoc/ref/axes_props.html
- Creating axes graphics objects: http://www.mathworks.com/help/techdoc/ref/axes.html

surf function documentation: http://www.mathworks.com/help/techdoc/ref/surf.html

Monte Carlo Experimental Design

From charlesreid1

Contents

Overview

Explanation

Transforming Variables

Log Scale

Selecting Samples

Running in Parallel

Analysis of Results

Example

Problem Information

Code

Computing Response Surface

A Note on Visualization

A Note on Coefficient and Variable Order

Polynomial Powers Matrix

Quadratic Surface, 6 Dimensions

Quadratic Surface, 2 Dimensions

Cubic Surface, 6 Dimensions

Quartic Response Surface

References

Matlab