Statistical Treatment of Experimental Data

"Statistical Treatment of Experimental Data" by Green and Margerison (Elsevier)

Chapter 2 - Probability

Basic definitions:

set of all possible outcomes from random experiment - sample space
discrete - countable number of possible outcomes (can also be infinite - as in, number of particles emitted)
continuous - all possible real values in certain interval or series of intervals may occur
univariate - only one number is recorded
multivariate - more than one value is obtained from a single performance of an experiment
event - set of outcomes in the sample space
probability of an event A as outcome is P(A)
addition law: if A and B are mutually exclusive, P(A U B) = P(A) + P(B); in general, P(A U B) = P(A) + P(B) - P(A and B)
venn diagram: if two events are not mutually exclusive, split into three mutually exclusive events (D - (D and E)), (E - (D and E)), (D and E)
product law: if A and B are independent, P(A and B) = P(A) * P(B); in general, P(A and B) = P(A) * P(B | A)
conditional probability: P(C | D) = P(C and D)/P(D)
independent - two or more performances of an experiment are called independent if probabilities of different outcomes in one are unaffected by outcomes in the other
replicates - independent repeat performances of an experiment
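
A minimal sketch of the Venn-diagram split and the conditional probability formula, assuming a fair six-sided die. The events D ("even") and E ("greater than 3") and the helper `prob` are chosen here for illustration and are not from the book:

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (assumed for illustration).
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of the sample space), all outcomes equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

D = {2, 4, 6}   # "roll is even"
E = {4, 5, 6}   # "roll is greater than 3"; overlaps with D

# Split D U E into three mutually exclusive events and add their probabilities.
p_union = prob(D - (D & E)) + prob(E - (D & E)) + prob(D & E)

# Conditional probability P(D | E) = P(D and E) / P(E).
p_D_given_E = prob(D & E) / prob(E)

print(p_union)       # 2/3
print(p_D_given_E)   # 2/3
```

Adding the three mutually exclusive pieces gives the same result as computing P(D U E) directly from the set union.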

Probability models:

discrete uniform model - each outcome equally likely (e.g., rolling a fair die)
random sampling - drawing random sample of size s from batch of size N (random means, all samples of size s equally likely to be chosen); number of possible samples is N choose s

$C(N,s) = \dfrac{N!}{s! (N-s)!}$

if r of the N items are special, number of ways of drawing sample containing d specials (number of ways of choosing d specials and s-d non-specials) is:

$C(r,d) \times C(N-r, s-d)$

Dividing by the total number of possible samples, C(N,s), gives the probability of drawing d specials:

$P(d\mbox{ specials}) = \dfrac{ C(r,d) C(N-r, s-d) }{ C(N,s) } \qquad d = 0, 1, 2, ..., \min(r,s)$

this is the definition of hypergeometric distribution (special case of the uniform model)
example: bag with 3 red and 4 blue discs, no replacement; random sample of size 2 (=s) from batch of size 7 (=N) with 3 (=r) special (red). probability that the sample contains exactly 1 (=d) special (red) disc is P(R and B) = (3 choose 1) * (4 choose 1) / (7 choose 2) = 12/21 = 4/7
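
The disc example can be checked numerically. This is a small sketch using Python's standard `math.comb`; the function name `hypergeom_pmf` is made up for these notes:

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(d, N, r, s):
    """P(d specials) in a random sample of size s from N items, r of which are special."""
    return Fraction(comb(r, d) * comb(N - r, s - d), comb(N, s))

# Bag with 3 red (special) and 4 blue discs, sample of size 2 without replacement.
p_one_red = hypergeom_pmf(1, N=7, r=3, s=2)
print(p_one_red)   # 4/7

# Probabilities over every possible count of specials (0, 1 or 2 here) sum to 1.
total = sum(hypergeom_pmf(d, N=7, r=3, s=2) for d in range(0, 3))
print(total)       # 1
```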

Chapter 3 - Random Variables

More definitions/concepts:

A random variable is a function on the sample space (for each outcome, the random variable takes a particular value, called a realization of it)
Sample space comprises all possible values of random variable
Convention - capital letters denote random variables, small letters denote realizations
e.g., if X is discrete random variable, $$ P(X=x) $$ denotes probability of event comprising all outcomes for which X takes the value x; this can also be written $$ P(x) $$
e.g., if X is a continuous random variable, $P(x < X \leq x + dx)$ is probability of the event comprising all outcomes for which X falls into the interval (x, x+dx]
Realizations of random variables are not necessarily outcomes in the sample space. Example: if tossing a die, could assign outcome as 0 if even and 1 if odd
Random variables also called statistics or variates

Probability Density Functions

Density function:

If random variable X is continuous, can specify probability density function f(x)
The integral of f(x) over any interval A gives probability of X belonging to A, denoted $P(X \in A)$ , equivalent to $$ P(A) $$

$P(X \in A) = P(A) = \int_{A} f(x) dx$

Integral of f(x) over the entire space, from -infinity to +infinity, yields 1 by definition (f takes value 0 where X cannot occur)
Discrete case: use sum instead of integral, and sum over probability p(x) of single outcomes x:

$P(X \in A) = P(A) = \sum_{x \in A} p(x)$
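
As a rough numerical illustration of the continuous case, assume a hypothetical density f(x) = 2x on [0, 1] (zero elsewhere). This density and the simple midpoint-rule integrator below are assumptions made for this sketch, not taken from the book:

```python
def f(x):
    """Hypothetical density: f(x) = 2x on [0, 1], zero where X cannot occur."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def integrate(func, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return sum(func(a + (i + 0.5) * h) for i in range(n)) * h

# Integrating over a range that covers the whole support gives (approximately) 1.
total = integrate(f, -1.0, 2.0)

# P(X in A) for the interval A = (0.5, 1]; the exact value is 1 - 0.25 = 0.75.
p_A = integrate(f, 0.5, 1.0)

print(round(total, 2), round(p_A, 2))
```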

Joint density:

Can extend definitions above to joint density
Two outcomes are recorded for each performance of experiment
Two corresponding random variables X and Y
If continuous, joint density $$ f(x,y) $$ such that:

$P(x < X \leq x+dx \mbox{ and } y < Y \leq y + dy) = f(x,y) dx dy$

$P(X \in A \mbox{ and } Y \in B) = \int_{A} \int_{B} f(x,y) dy dx$

Likewise, integral over entire space of possible outcomes for X and Y will yield 1.

Independence:

Two random variables X and Y are independent if, for all sets A and B:

$P( X \in A \mbox{ and } Y \in B) = P(X \in A) \times P(Y \in B)$
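
A sketch of checking the independence condition by enumeration, assuming two fair dice with X the first roll and Y the second. The particular events, and the sum S = X + Y used as a dependent counterexample, are chosen here for illustration:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (x, y) outcomes for two fair dice (assumed setup).
outcomes = list(product(range(1, 7), repeat=2))
p_each = Fraction(1, len(outcomes))

def prob(pred):
    """Probability that pred(x, y) holds."""
    return sum(p_each for (x, y) in outcomes if pred(x, y))

# X even and Y > 4: the joint probability factorizes.
p_joint = prob(lambda x, y: x % 2 == 0 and y > 4)
p_A = prob(lambda x, y: x % 2 == 0)
p_B = prob(lambda x, y: y > 4)
print(p_joint, p_A * p_B)      # 1/6 1/6

# X even and S = X + Y >= 10: X and S are not independent, so it does not factorize.
p_joint_dep = prob(lambda x, y: x % 2 == 0 and x + y >= 10)
p_S = prob(lambda x, y: x + y >= 10)
print(p_joint_dep, p_A * p_S)  # 1/9 1/12
```

One pair of events that factorizes does not prove independence, since the condition has to hold for every choice of A and B; a single pair that fails is enough to rule it out.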

(Cumulative) Distribution Function

(Cumulative) distribution function F for a random variable X is defined for discrete and continuous random variables as:

for continuous:

$F(x) = P(X \leq x) = \int_{-\infty}^{x} f(y) dy$

for discrete:

$F(x) = P(X \leq x) = \sum_{y \leq x} p(y)$

It follows that:

$P(a < X \leq b) = F(b) - F(a)$

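Statisticians use the term "distribution function" differently from physicists/chemists, who usually apply the term to the probability density. The two are different functions: for the normal distribution, for example, the density is the bell-shaped curve, while the distribution function is its integral (an S-shaped curve rising from 0 to 1).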

For a quantity $0 \leq \beta \leq 1$ we can denote the $\beta$ quantile as $\xi_{\beta}$ - this is the quantity such that $F(\xi_{\beta}) = \beta$
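
A short sketch of the distribution function and quantiles, using the standard normal distribution as an assumed example and `statistics.NormalDist` from the Python standard library:

```python
from statistics import NormalDist

# Standard normal distribution, chosen here only as a convenient example.
Z = NormalDist(mu=0.0, sigma=1.0)

# Probability of an interval from the distribution function: F(b) - F(a).
a, b = -1.0, 1.0
p_interval = Z.cdf(b) - Z.cdf(a)

# The beta quantile xi_beta satisfies F(xi_beta) = beta.
beta = 0.975
xi = Z.inv_cdf(beta)

print(round(p_interval, 4))   # 0.6827
print(round(xi, 2))           # 1.96
print(round(Z.cdf(xi), 3))    # 0.975
```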

Expectation

Define expectation of a function g(X) using the probability function p(x) (discrete case, first) or the density f(x) (continuous case, second):

$E(g(X)) = \sum g(x) p(x)$

$E(g(X)) = \int_{-\infty}^{+\infty} g(x) f(x) dx$

Both forms are included in the Stieltjes integral form:

$E(g(X)) = \int_{0}^{1} g(x) dF(x)$

This represents whichever of the two forms above (discrete or continuous) applies. Here the limits 0 and 1 refer to the range of F(x), so the integration covers all values of x.

Distribution mean of X also called mean of distribution F(x)

$\mu = E(X) = \int_{0}^{1} x dF(x)$

The rth non-central moment of X or of distribution F(x) is given by:

$\mu_{r}' = E(X^r) = \int_{0}^{1} x^r dF(x) \qquad r=1, 2, ...$

The rth central moment of X or of distribution F(x) is given by:

$\mu_{r} = E((X-\mu)^r) = \int_{0}^{1} (x-\mu)^r dF(x) \qquad r = 1, 2, \dots$

(Integral must be finite, of course.)

Distribution variance of X or of F(x) is the second central moment, $\mu_2$ , also denoted $\sigma^2$ , defined by:

$\sigma^2 = E((X-\mu)^2) = \int_{0}^{1} (x-\mu)^2 dF(x)$

Can represent variance of X by symbol V(X).

Distribution standard deviation $\sigma$ is the square root of the variance; more useful because it has units that match $\mu$ and $$ X $$ itself.
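
A minimal sketch of the discrete forms of these definitions, assuming a fair six-sided die as the distribution. The helper `expect` is a name introduced for these notes:

```python
from fractions import Fraction
import math

# Probability function p(x) of a fair die (assumed example distribution).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(g):
    """E(g(X)) = sum over x of g(x) p(x)."""
    return sum(g(x) * p for x, p in pmf.items())

mu = expect(lambda x: x)                  # distribution mean
mu2_prime = expect(lambda x: x ** 2)      # second non-central moment
var = expect(lambda x: (x - mu) ** 2)     # second central moment (variance)
sigma = math.sqrt(var)                    # standard deviation, same units as X

print(mu)    # 7/2
print(var)   # 35/12
```

The variance also equals the second non-central moment minus the squared mean, which the values above satisfy (91/6 - 49/4 = 35/12).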

Moment generating function represented by symbol $M_{X}(t)$ (t is a dummy variable) defined through expression:

$M_{X}(t) = E(e^{tX}) \qquad t \geq 0$

Expanding the exponential function using a Taylor series yields:

$M_{X}(t) = 1 + \mu_1' t + \dfrac{ \mu_2' t^2 }{2!} + \dots + \dfrac{ \mu_r' t^r }{r!} + O(t^{r+1})$
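
The expansion implies that the rth derivative of the moment generating function at t = 0 is the rth non-central moment. A rough numerical check, assuming a fair six-sided die and simple central finite differences (both are choices made for this sketch):

```python
import math

def mgf(t):
    """M_X(t) = E(exp(tX)) for a fair six-sided die (assumed example)."""
    return sum(math.exp(t * x) for x in range(1, 7)) / 6.0

h = 1e-3  # finite-difference step; small but not so small that round-off dominates

# First derivative at t = 0 approximates the first non-central moment E(X) = 3.5.
m1 = (mgf(h) - mgf(-h)) / (2 * h)

# Second derivative at t = 0 approximates the second non-central moment E(X^2) = 91/6.
m2 = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h ** 2

print(round(m1, 3), round(m2, 3))
```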

Characteristic function and probability generating function:

closely related to moment generating function

Characteristic function definition:

$cf = E(e^{itX})$

Probability generating function:

$pgf = E(t^{X})$

Covariance

Covariance of two random variables X and Y:

$C(X,Y) = E( (X - \mu_X) (Y - \mu_Y) )$

Variance is special case of covariance, C(X,X)

Distribution correlation coefficient is a "normalized" covariance - normalized by the square root of the product of the individual variances (i.e., by the product of the two standard deviations):

$\rho(X,Y) = \dfrac{ C(X,Y) }{ \sqrt{ V(X) V(Y) } }$
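
A small sketch of covariance and correlation for a discrete joint distribution. The joint probability table below is hypothetical, made up for illustration:

```python
from fractions import Fraction
import math

# Hypothetical joint probabilities p(x, y) for two binary random variables.
joint = {
    (0, 0): Fraction(2, 5), (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10), (1, 1): Fraction(2, 5),
}

def expect(g):
    """E(g(X, Y)) = sum over (x, y) of g(x, y) p(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)

cov_xy = expect(lambda x, y: (x - mu_x) * (y - mu_y))   # C(X, Y)
var_x = expect(lambda x, y: (x - mu_x) ** 2)            # V(X) = C(X, X)
var_y = expect(lambda x, y: (y - mu_y) ** 2)            # V(Y) = C(Y, Y)

rho = cov_xy / math.sqrt(var_x * var_y)

print(cov_xy)          # 3/20
print(round(rho, 3))   # 0.6
```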
