Statistical Treatment of Experimental Data: Difference between revisions
From charlesreid1
Revision as of 23:33, 4 November 2017
"Statistical Treatment of Experimental Data" by Green and Margerison (Elsevier)
Chapter 2 - Probability
Basic definitions:
- set of all possible outcomes from random experiment - sample space
- discrete - countable number of possible outcomes (can also be infinite - as in, number of particles emitted)
- continuous - all possible real values in certain interval or series of intervals may occur
- univariate - only one number is recorded
- multivariate - more than one value obtained from single performance of an experiment
- event - set of outcomes in the sample space
- probability of an event A as outcome is P(A)
- addition law (for mutually exclusive events A and B): P(A U B) = P(A) + P(B)
- venn diagram: if two events are not mutually exclusive, split into three mutually exclusive events (D - (D and E)), (E - (D and E)), (D and E)
- product law (for independent events A and B): P(A and B) = P(A) * P(B)
- conditional probability: P(C | D) = P(C and D)/P(D)
- independent - two or more performances of an experiment are called independent if probabilities of different outcomes in one are unaffected by outcomes in the other
- replicates - independent repeat performances of an experiment
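The addition law and the conditional probability formula above can be checked by brute-force enumeration. The following is a minimal sketch, assuming a single toss of a fair six-sided die; the events `even`, `odd`, and `at_least_four` and the helper `prob` are illustrative names, not taken from the book.

```python
from fractions import Fraction

# Sample space for one toss of a fair six-sided die (assumed example).
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a set of outcomes), all outcomes equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
odd = {1, 3, 5}
at_least_four = {4, 5, 6}

# Addition law for mutually exclusive events: P(A U B) = P(A) + P(B)
p_union_exclusive = prob(even | odd)
p_sum_exclusive = prob(even) + prob(odd)

# Events that overlap: split into three mutually exclusive pieces
# (D - (D and E)), (E - (D and E)), (D and E), as in the Venn diagram note.
both = even & at_least_four
p_union_overlap = prob(even - both) + prob(at_least_four - both) + prob(both)

# Conditional probability: P(C | D) = P(C and D) / P(D)
p_even_given_high = prob(even & at_least_four) / prob(at_least_four)

print(p_union_exclusive, p_union_overlap, p_even_given_high)
```

Exact fractions are used so that the equalities hold without floating-point tolerance.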
Probability models:
- discrete uniform model - each outcome equally likely (e.g., tossing a fair die)
- random sampling - drawing random sample of size s from batch of size N (random means, all samples of size s equally likely to be chosen); number of possible samples is N choose s
$ C(N,s) = \dfrac{N!}{s! (N-s)!} $
- if r of the N items are special, number of ways of drawing sample containing d specials (number of ways of choosing d specials and s-d non-specials) is:
$ C(r,d) \times C(N-r, s-d) $
- Dividing by the total number of possible samples gives the probability of drawing d specials:
$ P(d\mbox{ specials}) = \dfrac{ C(r,d) C(N-r, s-d) }{ C(N,s) } \qquad d = 0, 1, ..., \min(r,s) $
- this is the definition of hypergeometric distribution (special case of the uniform model)
- example: bag with 3 red and 4 blue discs, no replacement; random sample of size 2 (=s) from batch of size 7 (=N) with 3 (=r) special (red). Probability that exactly 1 (=d) disc in the sample is special (red) is P(R and B) = (3 choose 1) * (4 choose 1) / (7 choose 2) = 12/21 = 4/7
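The disc example can be reproduced numerically. This is a small sketch using only the standard library; the function name `hypergeom_pmf` is an illustrative choice, and the argument names follow the N, r, s, d notation of the notes.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(d, N, r, s):
    """P(d specials) when drawing s items without replacement
    from a batch of N items, r of which are special."""
    return Fraction(comb(r, d) * comb(N - r, s - d), comb(N, s))

# Bag with 3 red (special) and 4 blue discs, sample of size 2.
p_one_red = hypergeom_pmf(1, N=7, r=3, s=2)

# The probabilities over all possible values of d should sum to 1.
total = sum(hypergeom_pmf(d, N=7, r=3, s=2) for d in range(0, min(3, 2) + 1))

print(p_one_red, total)
```

Note that the sum includes d = 0 (no red disc drawn), which is needed for the probabilities to add up to 1.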
Chapter 3 - Random Variables
More definitions/concepts:
- Random variables are a function on the sample space (corresponding to each outcome, random variable takes a particular value that is a realization of it)
- Sample space comprises all possible values of random variable
- Convention - capital letters denote random variables, small letters denote realizations
- e.g., if X is discrete random variable, $ P(X=x) $ denotes probability of event comprising all outcomes for which X takes the value x; this can also be written $ P(x) $
- e.g., if X is a continuous random variable, $ P(x < X \leq x + dx) $ is probability of the event comprising all outcomes for which X falls into the interval (x, x+dx]
- Realizations of random variables are not necessarily outcomes in the sample space. Example: if tossing a die, could assign outcome as 0 if even and 1 if odd
- Random variables also called statistics or variates
Probability Density Functions
Density function:
- If random variable X is continuous, can specify probability density function f(x)
- The integral of f(x) over any interval A gives probability of X belonging to A, denoted $ P(X \in A) $, equivalent to $ P(A) $
$ P(X \in A) = P(A) = \int_{A} f(x) dx $
- Integral of f(x) over entire space, -infinity to +infinity, yields 1 by definition (f(x) takes value 0 where X cannot occur)
- Discrete case: use sum instead of integral, and sum over probability p(x) of single outcomes x:
$ P(X \in A) = P(A) = \sum_{x \in A} p(x) $
Joint density:
- Can extend definitions above to joint density
- Two outcomes are recorded for each performance of experiment
- Two corresponding random variables X and Y
- If continuous, joint density $ f(x,y) $ such that:
$ P(x < X \leq x+dx \mbox{ and } y < Y \leq y + dy) = f(x,y) dx dy $
$ P(X \in A \mbox{ and } Y \in B) = \int_{A} \int_{B} f(x,y) \, dy \, dx $
Likewise, integral over entire space of possible outcomes for X and Y will yield 1.
Independence:
- Two random variables are independent if:
$ P( X \in A \mbox{ and } Y \in B) = P(X \in A) \times P(Y \in B) $
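The independence condition can be tested by enumeration on a small discrete example. The sketch below assumes two tosses of a fair die, with X the first result and Y the second; the sets `A` and `B`, the helper `prob`, and the comparison with the sum of the two dice are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die); assumed example.
outcomes = list(product(range(1, 7), repeat=2))

def prob(predicate):
    """Probability that a predicate on an outcome (x, y) holds."""
    hits = sum(1 for o in outcomes if predicate(o))
    return Fraction(hits, len(outcomes))

A = {1, 2}   # event for X (first die)
B = {5, 6}   # event for Y (second die)

p_joint = prob(lambda o: o[0] in A and o[1] in B)
p_x = prob(lambda o: o[0] in A)
p_y = prob(lambda o: o[1] in B)
independent = (p_joint == p_x * p_y)

# Counter-example: X and the sum S = X + Y are not independent.
p_joint_sum = prob(lambda o: o[0] in A and o[0] + o[1] >= 11)
p_sum = prob(lambda o: o[0] + o[1] >= 11)
sum_independent = (p_joint_sum == p_x * p_sum)

print(independent, sum_independent)
```

Strictly, independence requires the product rule to hold for every choice of A and B; the code checks a single pair, which is enough to show dependence but not to prove independence.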
(Cumulative) Distribution Function
(Cumulative) distribution function F for a random variable X is defined for discrete and continuous random variables as:
for continuous:
$ F(x) = P(X \leq x) = \int_{-\infty}^{x} f(y) dy $
for discrete:
$ F(x) = P(X \leq x) = \sum_{y \leq x} p(y) $
It follows that:
$ P(a < X \leq b) = F(b) - F(a) $
Statisticians use the term "distribution function" differently from physicists/chemists, who usually apply the term to the probability density. The two are different functions: for the normal distribution, for example, the density is the bell-shaped curve and the distribution function is its integral.
For a quantity $ 0 \leq \beta \leq 1 $ we can denote the $ \beta $ quantile as $ \xi_{\beta} $ - this is the quantity such that $ F(\xi_{\beta}) = \beta $
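To make the distinction between density, distribution function, and quantile concrete, here is a short sketch using the standard normal distribution. It relies on `statistics.NormalDist` from the Python standard library (Python 3.8+); the choice of the standard normal and of the 0.975 quantile is only an example.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1); assumed example.
Z = NormalDist(mu=0.0, sigma=1.0)

density_at_0 = Z.pdf(0.0)   # f(0): height of the bell curve at 0
cdf_at_0 = Z.cdf(0.0)       # F(0) = P(Z <= 0)

# P(a < Z <= b) = F(b) - F(a)
p_interval = Z.cdf(1.0) - Z.cdf(-1.0)

# beta quantile: the value xi such that F(xi) = beta
beta = 0.975
xi = Z.inv_cdf(beta)

print(density_at_0, cdf_at_0, p_interval, xi)
```

The density and the distribution function take different values at the same point (about 0.399 versus 0.5 at zero), which is why the two terms should not be used interchangeably.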
Expectation
Define expectation using distribution function:
$ E(g(X)) = \sum g(x) p(x) $
$ E(g(X)) = \int_{-\infty}^{+\infty} g(x) f(x) dx $
These forms are both included in the Stieltjes integral form:
$ E(g(X)) = \int_{0}^{1} g(x) dF(x) $
This represents whichever of the two forms (discrete or continuous) defined above applies. The limits 0 and 1 refer to the values of F, which runs from 0 to 1 as x runs over its whole range.
Distribution mean of X also called mean of distribution F(x)
$ \mu = E(X) = \int_{0}^{1} x dF(x) $
The rth non-central moment of X or of distribution F(x) is given by:
$ \mu_{r}' = E(X^r) = \int_{0}^{1} x^r dF(x) \qquad r=1, 2, ... $
The rth central moment of X or of distribution F(x) is given by:
$ \mu_{r} = E((X-\mu)^r) = \int_{0}^{1} (x-\mu)^r dF(x) \qquad r = 1, 2, \dots $
(Integral must be finite, of course.)
Distribution variance of X or of F(x) is the second central moment, $ \mu_{2} $, also denoted $ \sigma^2 $, defined by:
$ \sigma^2 = E((X-\mu)^2) = \int_{0}^{1} (x-\mu)^2 dF(x) $
Can represent variance of X by symbol V(X).
Standard deviation $ \sigma $ is the square root of the distribution variance; it is more useful because it has the same units as $ \mu $ and $ X $ itself.
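The definitions of expectation, moments, and variance can be illustrated on a discrete distribution, where the integral reduces to a sum over p(x). This is a minimal sketch assuming a fair six-sided die; the helper names `expectation`, `noncentral_moment`, and `central_moment` are illustrative.

```python
from fractions import Fraction

# p(x) for a fair six-sided die (assumed example).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(g):
    """E(g(X)) = sum of g(x) p(x) over all outcomes x."""
    return sum(g(x) * p for x, p in pmf.items())

def noncentral_moment(r):
    """r-th non-central moment, E(X^r)."""
    return expectation(lambda x: x ** r)

mu = noncentral_moment(1)   # distribution mean

def central_moment(r):
    """r-th central moment, E((X - mu)^r)."""
    return expectation(lambda x: (x - mu) ** r)

variance = central_moment(2)        # sigma^2
std_dev = float(variance) ** 0.5    # sigma, same units as X

print(mu, variance, std_dev)
```

The result also satisfies the familiar identity that the variance equals the second non-central moment minus the squared mean.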
Moment generating function represented by symbol $ M_{X}(t) $ (t is a dummy variable) defined through expression:
$ M_{X}(t) = E(e^{tX}) \qquad t \geq 0 $
Expanding the exponential function using a Taylor series yields:
$ M_{X}(t) = 1 + \mu_1' t + \dfrac{ \mu_2' t^2 }{2!} + \dots + \dfrac{ \mu_r' t^r }{r!} + O(t^{r+1}) $
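The series expansion can be checked numerically by comparing the moment generating function with its truncated moment series. The sketch below assumes a fair six-sided die and a small value of t; the function names `mgf` and `mgf_series` are illustrative.

```python
from math import exp, factorial

# p(x) for a fair six-sided die (assumed example).
pmf = {x: 1 / 6 for x in range(1, 7)}

def mgf(t):
    """Moment generating function M_X(t) = E(exp(t X))."""
    return sum(exp(t * x) * p for x, p in pmf.items())

def noncentral_moment(r):
    """r-th non-central moment E(X^r); equals 1 for r = 0."""
    return sum(x ** r * p for x, p in pmf.items())

def mgf_series(t, order):
    """Truncated series: sum over r = 0..order of mu_r' t^r / r!."""
    return sum(noncentral_moment(r) * t ** r / factorial(r)
               for r in range(order + 1))

t = 0.1
error_order_2 = abs(mgf(t) - mgf_series(t, 2))
error_order_6 = abs(mgf(t) - mgf_series(t, 6))

print(error_order_2, error_order_6)
```

The error shrinks as more moments are included, which is the sense in which the moments "generate" the function for small t.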
Characteristic function and probability generating function:
- closely related to moment generating function
Characteristic function definition:
$ cf = E(e^{itX}) $
Probability generating function:
$ pgf = E(t^{X}) $
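The relationship between the three generating functions can be made explicit with a small numerical check: replacing t by e^t in the probability generating function gives the moment generating function, and the characteristic function is the same expectation with an imaginary exponent. The sketch below assumes a fair six-sided die; the function names are illustrative.

```python
import cmath
from math import exp, isclose

# p(x) for a fair six-sided die (assumed example).
pmf = {x: 1 / 6 for x in range(1, 7)}

def mgf(t):
    """Moment generating function E(exp(t X))."""
    return sum(exp(t * x) * p for x, p in pmf.items())

def cf(t):
    """Characteristic function E(exp(i t X)); complex-valued."""
    return sum(cmath.exp(1j * t * x) * p for x, p in pmf.items())

def pgf(t):
    """Probability generating function E(t^X)."""
    return sum(t ** x * p for x, p in pmf.items())

t = 0.3
mgf_matches_pgf = isclose(mgf(t), pgf(exp(t)))

# The derivative of the pgf at t = 1 gives the mean (central difference).
h = 1e-5
mean_from_pgf = (pgf(1 + h) - pgf(1 - h)) / (2 * h)

print(mgf_matches_pgf, abs(cf(t)), mean_from_pgf)
```

Unlike the moment generating function, the characteristic function is bounded in modulus by 1 for every real t, so it exists for any distribution.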