From charlesreid1

A short course in data analysis

Yes, we have nice fancy machine learning for these huge data sets where we're literally swimming in data

But what about the real world? The real, real world? Where you have to keep your thermocouples from melting, and your pressur sensors cost $400 a pop, and only last for a few experimental runs, and your mass flow controller is acting kind of wonky and has been drifting over the past few days. And how well do I know the temperature? How do I quantify the uncertainty in my instrument's reading, and once I do, how do I use that information?

These are the kinds of problems that engineers come up against in the real world. And it's an important part of data analysis - particularly, analysis of sparse, expensive data.

This is a short course on data analysis for complex, integrated, expensive systems where data is hard to come by.

1. start with your variable selection process

2. screening runs - process information as you go - pick out variables that are likely to be important - design your experiment around them

3. variable selection/reduction process - narrow down the variables you are considering - based on analysis of the results of the screening process

4. box behnken design - polynomial/quadratic model (you're greatly simplifying your very complex model, but... but... the justification is... the models are so integrated that any effects are smoothed out and result in smooth behavior.)

5. further narrow in, by adding points to improve your model

6. Now you have a response surface model, which is a (much much much much much cheaper) version of what your full, expensive simulation tool would have predicted was the case had you spent a week babysitting a case in the queue of the supercomputer. You can pound away at it 10 million times and not break a sweat. Use the response surface model to explore the input-output parameter space.

What values of input parameters will make the output parameters consistent with experimental data? What is the consistent set?

the set of all inputs x in X conditional on l < y < u