Monte Carlo Experimental Design
From charlesreid1
Overview
Monte Carlo sampling is essentially a brute-force technique in which random samples are taken until confidence that the entire space has been sampled is satisfactory.
Random numbers are used to create sampling points in each direction.
Think of Monte Carlo ray-tracing: you send out a whole bunch of rays, each in random directions, and from the result you determine the radiative flux. Mathematically, you're performing an integration by randomly sampling the function you want to integrate, then adding up all of the random samples:
$ \int f(x) dx \approx \frac{1}{N} \sum_{i} f( x_i ) $
Explanation
Transforming Variables
For a distribution that is a function of $ m $ variables $ x_1, \dots, x_m $:
Each variable has its own range, $ \alpha_i \leq x_i \leq \beta_i $
This range must be converted to $ [0,1] $
$ \hat{x}_i = \frac{ x_i - \alpha_i }{ \beta_i - \alpha_i } $
so that $ x_i \in \left[ 0, 1 \right] \forall i = 1 \dots m $
Log Scale
If you're using a log scale, i.e. sampling logarithmically more at $ \alpha $ than $ \beta $
$ \hat{x}_i = \frac{ \log{(x_i)} - \log{(\alpha_i)} }{ \log{(\beta_i)} - \log{(\alpha_i)} } $
Selecting Samples
An $ m $-element vector of random numbers is generated to correspond with a single sample.
A number of sample points are selected, and the $ \hat{x}_i $ and corresponding $ x_i $ are stored
the function is evaluated at each input variable value $ x_i $
all random input vectors and corresponding output vectors are stored/saved
Running in Parallel
Because the code suited itself to being run in parallel, I was able to use the Matlab Parallel Computing toolbox by changing the for loop over all samples that evaluates the function at each sample.
help parfor
Analysis of Results
More about linear regression here:
- http://en.wikipedia.org/wiki/Linear_regression
- http://www.mathworks.com/help/toolbox/stats/mvregress.html
More about multivariate regression/analysis here:
"Multiple linear regression" is a model for one response variable ("y"), and one or more predictor variables ("X").
"Multivariate linear regression" broadens that to more than one response variable ("Y").
The idea is that the response variables ("Y") are correlated with each other within each (independent) observation.
- from http://www.mathworks.de/matlabcentral/newsreader/view_thread/154512
How to do it either:
- the Matlab mvregress style
- the linear algebra B\Y style
- http://www.mathworks.co.uk/matlabcentral/newsreader/view_thread/304252
Constructing a Statistical Model
Classical statistical models: assume data are Gaussian, linear in structure
Aim is to use more flexible statistical-model-based tools for data analysis
Generalized linear models: Nelder and Wedderburn (1972)
Example
For details about the problem, see Example Problem for Experimental Design
For the input uncertainty map, see Example Problem for Experimental Design
What I am trying to learn:
- What does "true" function look like via MC sampling?
- How many samples are required for MC?
- How high a degree does the polynomial go to, and is that a function of the number of samples?
- How does the MC-polynomial compare to the MC-Fourier analysis?
Plan: save the state after 10, 100, 1k, 10k samples and evaluate after each.