Statistical Treatment of Experimental Data

"Statistical Treatment of Experimental Data" by Green and Margerison (Elsevier)

Chapter 2 - Probability

Basic definitions:

set of all possible outcomes from random experiment - sample space
discrete - countable number of possible outcomes (possibly infinitely many, e.g., number of particles emitted)
continuous - all possible real values in certain interval or series of intervals may occur
univariate - only one number is recorded
multivariate - more than one value obtained from a single performance of an experiment
event - set of outcomes in the sample space
probability of an event A as outcome is P(A)
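addition law (for mutually exclusive events A and B): P(A U B) = P(A) + P(B)
venn diagram: if two events D and E are not mutually exclusive, split into three mutually exclusive events (D - (D and E)), (E - (D and E)), (D and E); applying the addition law gives P(D U E) = P(D) + P(E) - P(D and E)
product law (for independent events A and B): P(A and B) = P(A) * P(B)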
conditional probability: P(C | D) = P(C and D)/P(D)
independent - two or more performances of an experiment are called independent if probabilities of different outcomes in one are unaffected by outcomes in the other
replicates - independent repeat performances of an experiment

Probability models:

discrete uniform model - each outcome equally likely (e.g., tossing a fair die)
random sampling - drawing random sample of size s from batch of size N (random means, all samples of size s equally likely to be chosen); number of possible samples is N choose s

$C(N,s) = \dfrac{N!}{s! (N-s)!}$

if r of the N items are special, number of ways of drawing sample containing d specials (number of ways of choosing d specials and s-d non-specials) is:

$C(r,d) \times C(N-r, s-d)$

Dividing by the total number of possible samples gives the probability of drawing d specials:

$P(d\mbox{ specials}) = \dfrac{ C(r,d) C(N-r, s-d) }{ C(N,s) } \qquad d = 0, 1, ..., \min(r,s)$

(terms with s-d > N-r are zero, since C(n,k) = 0 for k > n)

this is the definition of the hypergeometric distribution (it follows from applying the uniform model to all possible samples)
example: bag with 3 red and 4 blue discs, drawn without replacement; random sample of size 2 (=s) from batch of size 7 (=N) with 3 (=r) special (red). The probability that exactly 1 (=d) disc in the sample is special (red) is P(R and B) = (3 choose 1)(4 choose 1)/(7 choose 2) = 12/21 = 4/7 ≈ 0.571
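A minimal Python sketch of the hypergeometric probability above, using the standard-library `math.comb`. The function name `hypergeom_pmf` is hypothetical (not from the book); it reproduces the bag example.

```python
from math import comb

def hypergeom_pmf(d: int, N: int, r: int, s: int) -> float:
    """P(d specials) when drawing s items without replacement from N items, r of them special."""
    # Guard negative arguments; math.comb already returns 0 when k > n,
    # so impossible combinations (d > r or s - d > N - r) contribute zero.
    if d < 0 or s - d < 0:
        return 0.0
    return comb(r, d) * comb(N - r, s - d) / comb(N, s)

# Bag example: N = 7 discs, r = 3 red (special), sample of s = 2
p_one_red = hypergeom_pmf(1, N=7, r=3, s=2)
print(p_one_red)  # 12/21 = 4/7
```

Summing `hypergeom_pmf(d, 7, 3, 2)` over d = 0, 1, 2 gives 1, as it must.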

Chapter 3 - Random Variables

More definitions/concepts:

A random variable is a function on the sample space (for each outcome, the random variable takes a particular value, which is a realization of it)
Its sample space comprises all possible values of the random variable
Convention - capital letters denote random variables, small letters denote realizations
e.g., if X is a discrete random variable, $P(X=x)$ denotes the probability of the event comprising all outcomes for which X takes the value x; this can also be written $P(x)$
e.g., if X is a continuous random variable, $P(x < X \leq x + dx)$ is the probability of the event comprising all outcomes for which X falls into the interval (x, x+dx]
Realizations of random variables are not necessarily outcomes in the sample space. Example: when tossing a die, could assign the value 0 to an even outcome and 1 to an odd outcome
Random variables also called statistics or variates

Probability Density Functions

Density function:

If random variable X is continuous, can specify probability density function f(x)
The integral of f(x) over any interval A gives the probability of X belonging to A, denoted $P(X \in A)$, equivalent to $P(A)$

$P(X \in A) = P(A) = \int_{A} f(x) dx$

The integral over the entire space, $-\infty$ to $+\infty$, yields 1 by definition (f(x) takes the value 0 where X cannot occur)
Discrete case: use a sum instead of an integral, summing the probabilities p(x) of the single outcomes x in A:

$P(X \in A) = P(A) = \sum_{x \in A} p(x)$

Joint density:

Can extend definitions above to joint density
Two outcomes are recorded for each performance of experiment
Two corresponding random variables X and Y
If continuous, joint density $f(x,y)$ such that:

$P(x < X \leq x+dx \mbox{ and } y < Y \leq y + dy) = f(x,y) dx dy$

$P(X \in A \mbox{ and } Y \in B) = \int_{A} \int_{B} f(x,y) dy dx$

Likewise, integral over entire space of possible outcomes for X and Y will yield 1.

Independence:

Two random variables are independent if, for all sets A and B:

$P( X \in A \mbox{ and } Y \in B) = P(X \in A) \times P(Y \in B)$

Cumulative Distribution Function

Cumulative distribution function F for a random variable X is defined for discrete and continuous random variables as:

for continuous:

$F(x) = P(X \leq x) = \int_{-\infty}^{x} f(y) dy$

for discrete:

$F(x) = P(X \leq x) = \sum_{y \leq x} p(y)$

It follows that:

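$P(a < X \leq b) = F(b) - F(a)$

For a continuous X this also equals $P(a \leq X \leq b)$, since single points have zero probability; for a discrete X the endpoint a must be excluded.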

Statisticians use the term "distribution function" differently from physicists/chemists, who usually apply the term to the probability density. The two are different functions: e.g., for the normal distribution the density is the bell-shaped curve, while the distribution function is the S-shaped cumulative curve rising from 0 to 1.

For a value $0 \leq \beta \leq 1$, the $\beta$ quantile, denoted $\xi_{\beta}$, is the value such that $F(\xi_{\beta}) = \beta$
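A small sketch of a discrete distribution function, assuming a fair six-sided die as the example distribution (not from the text). It uses exact fractions and shows that F(b) - F(a) gives P(a < X ≤ b): for a discrete variable the left endpoint a is excluded. The helper names `cdf` and `prob_interval` are hypothetical.

```python
from fractions import Fraction

# Fair six-sided die: p(x) = 1/6 for x = 1..6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf(x, pmf=pmf):
    """F(x) = P(X <= x) = sum of p(y) over all y <= x."""
    return sum((p for y, p in pmf.items() if y <= x), Fraction(0))

def prob_interval(a, b, pmf=pmf):
    """P(a < X <= b), counted directly from the probability function."""
    return sum((p for y, p in pmf.items() if a < y <= b), Fraction(0))

print(cdf(3))               # 1/2
print(cdf(5) - cdf(2))      # 1/2, i.e. P(X in {3, 4, 5})
print(prob_interval(2, 5))  # 1/2, the same value
```

Including the endpoint, P(2 ≤ X ≤ 5) = 4/6, which differs from F(5) - F(2); for a continuous variable the distinction disappears.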

Expectation

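Define the expectation of a function g(X) using the probability function p(x) (discrete case) or the density f(x) (continuous case):

$E(g(X)) = \sum g(x) p(x) \qquad \mbox{(discrete)}$

$E(g(X)) = \int_{-\infty}^{+\infty} g(x) f(x) dx \qquad \mbox{(continuous)}$

These forms are both included in the Stieltjes integral form, written with the distribution function F(x):

$E(g(X)) = \int_{-\infty}^{+\infty} g(x) dF(x)$

which represents whichever of the two (discrete or continuous) forms defined above applies.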

The distribution mean of X, also called the mean of distribution F(x), is:

$\mu = E(X) = \int_{-\infty}^{+\infty} x dF(x)$

The rth non-central moment of X or of distribution F(x) is given by:

$\mu_{r}' = E(X^r) = \int_{-\infty}^{+\infty} x^r dF(x) \qquad r=1, 2, ...$

The rth central moment of X or of distribution F(x) is given by:

$\mu_{r} = E((X-\mu)^r) = \int_{-\infty}^{+\infty} (x-\mu)^r dF(x) \qquad r = 1, 2, \dots$

(The integrals must be finite, of course.)

The distribution variance of X (or of F(x)) is the second central moment, $\mu_2$, also denoted $\sigma^2$, defined by:

$\sigma^2 = E((X-\mu)^2) = \int_{-\infty}^{+\infty} (x-\mu)^2 dF(x)$

Can represent variance of X by symbol V(X).

The standard deviation $\sigma$ is the square root of the distribution variance; it is often more useful because it has the same units as $\mu$ and $X$ itself.

Moment generating function represented by symbol $M_{X}(t)$ (t is a dummy variable) defined through expression:

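$M_{X}(t) = E(e^{tX}) \qquad \mbox{for real } t \mbox{ where the expectation exists}$

Expanding the exponential function using a Taylor series yields:

$M_{X}(t) = 1 + \mu_1' t + \dfrac{ \mu_2' t^2 }{2!} + \dots + \dfrac{ \mu_r' t^r }{r!} + O(t^{r+1})$

so the rth non-central moment $\mu_r'$ is the rth derivative of $M_X(t)$ at $t = 0$.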

Characteristic function and probability generating function:

closely related to moment generating function

Characteristic function definition:

$cf = E(e^{itX})$

Probability generating function:

$pgf = E(t^{X})$

Covariance

Covariance of two random variables X and Y:

$C(X,Y) = E( (X - \mu_X) (Y - \mu_Y) )$

Variance is special case of covariance, C(X,X)

The distribution correlation coefficient is a "normalized" covariance, normalized by the standard deviations of the individual variables:

$\rho(X,Y) = \dfrac{ C(X,Y) }{ \sqrt{ V(X) V(Y) } }$

Properties of Expectation

Useful properties of expectation include:

Expectation of a constant is the constant

$$ E(a) = a $$

Expectation is linear, which simplifies its application to a linear model:

$$ E(aX + b) = a E(X) + b $$

Expectation of sum is sum of individual expectations:

$$ E(X + Y + Z) = E(X) + E(Y) + E(Z) $$

Properties of Variance

This can be applied to the variance expression to get a useful identity:

$\begin{array}{rcl} V(X) &=& E((X-\mu)^2) \\ &=& E(X^2 - 2 \mu X + \mu^2) \\ &=& E(X^2) - E( 2 \mu X) + E(\mu^2) \\ &=& E(X^2) - 2 \mu^2 + \mu^2 \\ &=& E(X^2) - \mu^2 \end{array}$

The last line yields the identity:

$$ V(X) = E(X^2) - ( E(X) )^2 $$

Likewise,

$$ V(aX + b) = a^2 V(X) $$

Covariance identity can likewise be derived:

$\begin{array}{rcl} C(X,Y) &=& E( (X-\mu_{x}) (Y-\mu_{y}) ) \\ &=& E(XY - \mu_{x} Y - \mu_{y} X + \mu_{x} \mu_{y} ) \\ &=& E(XY) - 2 \mu_{x} \mu_{y} + \mu_{x} \mu_{y} \\ &=& E(XY) - \mu_{x} \mu_{y} \end{array}$

In the special case where X and Y are independent, the expectation of the product becomes the product of the expectations, $E(XY) = E(X) E(Y)$. In this special case, $C(X,Y) = 0$ and therefore $\rho(X,Y) = 0$. (The converse does not hold in general: zero covariance does not imply independence.)

If we consider the variance of the sum of two random variables, we can find a relationship between the variance of the individual variables and their covariance:

$\begin{array}{rcl} V(X+Y) &=& E(((X+Y) - (\mu_{x} + \mu_{y}) )^2) \\ &=& E( ( (X-\mu_{x}) + (Y-\mu_{y}) )^2 ) \\ &=& E( ( X-\mu_{x})^2) + E((Y-\mu_{y})^2) + 2 E((X-\mu_{x})(Y-\mu_{y})) \end{array}$

This yields the identity:

$$ V(X+Y) = V(X) + V(Y) + 2 C(X,Y) $$

Likewise,

$$ V(X - Y) = V(X) + V(Y) - 2 C(X,Y) $$
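The same algebra applies to sample estimates (divisor n-1), since the sample mean of X+Y is the sum of the sample means. The identities can therefore be checked exactly, up to floating-point rounding, on any paired data. A sketch in plain Python; the data values and helper names (`mean`, `cov`, `var`) are made up for illustration.

```python
def mean(v):
    return sum(v) / len(v)

def cov(x, y):
    """Sample covariance with divisor n - 1."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def var(x):
    return cov(x, x)  # variance is the special case C(X, X)

# Illustrative paired observations (not from the text)
x = [2.1, 3.4, 1.9, 5.0, 4.2]
y = [1.0, 2.2, 0.7, 3.9, 2.5]
s = [a + b for a, b in zip(x, y)]   # X + Y
d = [a - b for a, b in zip(x, y)]   # X - Y

print(var(s), var(x) + var(y) + 2 * cov(x, y))  # equal up to rounding
print(var(d), var(x) + var(y) - 2 * cov(x, y))  # equal up to rounding

rho = cov(x, y) / (var(x) * var(y)) ** 0.5      # normalized covariance
print(rho)
```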

Example

Evaluate the mean and variance of a rectangular distribution.

Definition of rectangular distribution:

$\begin{array}{rcl} f(x) &=& k \qquad \mbox{for } a \leq x \leq b \\ f(x) &=& 0 \qquad \mbox{otherwise} \end{array}$

We know that

$\int_{-\infty}^{+\infty} f(x) dx = 1$

Therefore

$$ k(b-a) = 1 $$

$k = \dfrac{1}{b-a}$

Now the mean can be computed as:

$\mu = E(X) = \int_{a}^{b} k x dx = \int_{a}^{b} \dfrac{x}{b-a} dx = \dfrac{a+b}{2}$

which is simply the midpoint of the interval [a, b].

The density is symmetrical about this value.

Further, the expectation of $X^2$ is:

$E(X^2) = \int_{a}^{b} k x^2 dx = \int_{a}^{b} \dfrac{x^2}{b-a} dx = \dfrac{b^2 + ab + a^2}{3}$

This result can be used to compute the variance:

$\begin{array}{rcl} V(X) &=& E(X^2) - \mu^2 \\ &=& \dfrac{b^2 + ab + a^2}{3} - \dfrac{(b+a)^2}{4} \\ &=& \dfrac{(b-a)^2}{12} \end{array}$

In the special case where $a = -c$, $b = c$ (with c > 0), we get:

$$ E(X) = 0 $$

$V(X) = \dfrac{c^2}{3}$
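The closed forms above can be cross-checked by integrating the rectangular density numerically. A sketch, assuming the midpoint rule with 10,000 panels (an arbitrary choice) and the interval [2, 5] purely as an example; `midpoint_integral` and `uniform_moments` are hypothetical helper names.

```python
def midpoint_integral(g, a, b, n=10_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def uniform_moments(a, b):
    k = 1.0 / (b - a)                             # constant density on [a, b]
    mu = midpoint_integral(lambda x: k * x, a, b)
    ex2 = midpoint_integral(lambda x: k * x * x, a, b)
    return mu, ex2 - mu ** 2                      # V(X) = E(X^2) - mu^2

mu, var = uniform_moments(2.0, 5.0)
print(mu, (2.0 + 5.0) / 2)          # both approximately 3.5
print(var, (5.0 - 2.0) ** 2 / 12)   # both approximately 0.75
```

For the symmetric case, `uniform_moments(-c, c)` returns a mean near 0 and a variance near c²/3.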

Sampling

If we replicate an experiment n times, we produce a vector of observations $\mathbf{X} = \left[ X_1, X_2, X_3, \dots, X_n \right]$

Subscripts label the observations.

Consider a function of these observations $T(X) = T(X_1, X_2, X_3, \dots, X_n)$

Two valuable statistics are the sample mean $\overline{X}$ and sample variance $s^2$. These are defined as:

$\begin{array}{rcl} \overline{X} &=& \sum \dfrac{X_i}{n} \\ s^2 &=& s^2(X) = \dfrac{ \sum \left(X_i - \overline{X} \right)^2 }{ (n-1) } \\ &=& \dfrac{ \sum X_i^2 - \left( \dfrac{ \left( \sum X_i \right)^2 }{n} \right) }{(n-1)} \end{array}$

Note the exception to the capital-letter convention: $s^2$ is a random variable (a function of the $X_i$), even though it is written with a small letter.

It is important to distinguish the sample statistics from the distribution parameters. Just as the sample is drawn from the entire population, the sample statistics estimate the distribution parameters; as the sample size grows, the sample statistics approach the distribution parameters.

However, we also have to remember that $\overline{X}$ and $s^2$ themselves have a distribution: different samples lead to different values of these two statistics.

Properties of the distributions of $\overline{X}$ and $s^2$:

$E(\overline{X}) = \dfrac{ \sum \mu}{n} = \mu = E(X)$

The Xs are independent, so we can also get

$V(\overline{X}) = \dfrac{ \sum \sigma^2 }{n^2} = \dfrac{n \sigma^2}{n^2} = \dfrac{\sigma^2}{n}$

As n increases, the distribution of $\overline{X}$ becomes more concentrated about the mean, but slowly: increasing n fourfold only halves the standard deviation of $\overline{X}$.

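The expectation of $s^2$ can be derived using an identity, valid for any constant a:

$\sum \left( X_i - a \right)^2 = \sum \left[ \left( X_i - \overline{X} \right) + \left( \overline{X} - a \right) \right]^2 = \sum \left( X_i - \overline{X} \right)^2 + n \left( \overline{X} - a \right)^2$

(the cross term vanishes because $\sum \left( X_i - \overline{X} \right) = 0$).

Setting $a = \mu$ and taking expectations, using $E((X_i - \mu)^2) = \sigma^2$, $\sum (X_i - \overline{X})^2 = (n-1)s^2$ and $E((\overline{X} - \mu)^2) = V(\overline{X}) = \sigma^2/n$, we get:

$n \sigma^2 = E \left( (n-1)s^2 \right) + n V(\overline{X}) = (n-1) E(s^2) + \sigma^2$

and therefore,

$E(s^2) = \sigma^2$

Thus $s^2$ is an unbiased estimator of $\sigma^2$: its expectation equals the distribution variance. This is why the divisor is n-1 rather than n.

The sample variance $s^2$ is said to have n-1 degrees of freedom: the n deviations $X_i - \overline{X}$ sum to zero, so only n-1 of them are free.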
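Because $E(\overline{X}) = \mu$, $V(\overline{X}) = \sigma^2/n$ and $E(s^2) = \sigma^2$ are exact statements about the average over all possible samples, they can be checked without simulation by enumerating every equally likely sample from a small discrete distribution. A sketch, assuming a fair die and n = 3 replicates (both arbitrary choices), using exact fractions; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)                                         # fair die, each face probability 1/6
mu = Fraction(sum(faces), 6)                                # 7/2
sigma2 = sum((Fraction(x) - mu) ** 2 for x in faces) / 6    # 35/12

def sample_mean(xs):
    return Fraction(sum(xs), len(xs))

def sample_var(xs):
    """s^2 with the n - 1 divisor."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

n = 3
samples = list(product(faces, repeat=n))   # all 6**n equally likely samples
w = Fraction(1, len(samples))              # probability of each sample

E_xbar = sum(sample_mean(s) for s in samples) * w
V_xbar = sum((sample_mean(s) - mu) ** 2 for s in samples) * w
E_s2 = sum(sample_var(s) for s in samples) * w

print(E_xbar == mu, V_xbar == sigma2 / n, E_s2 == sigma2)   # True True True
```

Replacing the divisor `len(xs) - 1` with `len(xs)` makes the last comparison fail: the average then comes out as (n-1)σ²/n.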

Chapter 4: Important Probability Distributions

Outline

This chapter covers the following distributions:

binomial distribution - used for the number of successes in a fixed number of independent trials with binary outcomes
Poisson distribution - used for counts of events (non-negative integers) occurring randomly at a constant average rate
Poisson process - used for distribution of event times/frequencies
exponential distribution - used for distribution of time elapsed (e.g., between successive events of a Poisson process)
gamma distribution - distribution of sum of n independent exponential variates with same mean
normal distribution - most important and widely-used distribution, used for distribution of continuous random variables
chi squared distribution - another widely-used distribution, models distribution of e.g., sum of squares of n independent standard normal variates
Student's t distribution - used for tests on, and confidence intervals for, the mean of a normal distribution
F distribution - used in tests involving comparison of two distribution variances (ANOVA)
distribution of sample mean and sample variance for normal case - important extension of discussion of normal distribution

Binomial distribution

If the outcome of each trial of an experiment falls into one of two complementary events, A ("success", probability p) and not A ("failure", probability q = 1 - p), the number of successes in n independent trials can be modeled using the binomial distribution.

Running n binomial trials results in n outcomes.

For K successes out of n trials, K is a discrete random variable whose sample space is the set of possible numbers of successes, $0, 1, \dots, n$

K has a binomial distribution $B(p,n)$. The name comes from the fact that the probabilities $P(K=k)$, $k = 0, 1, \dots, n$, are found from the binomial expansion of $(p+q)^n$

The probability of any particular sequence comprising k S's and (n-k) F's, e.g., SSFSSSFFSF (k = 6, n = 10), is $ppqpppqqpq = p^k q^{n-k}$, because the trials are independent

The number of sequences containing k S's is the number of ways of choosing k positions from n, $C(n,k)$ (n choose k). Summing the probabilities of these $C(n,k)$ simple events gives the probability $P(K=k)$.

$P(K=k) = P(k) = C(n,k) p^k q^{n-k} = C(n,k) p^k (1-p)^{n-k} \qquad k = 0, 1, \dots, n$

Note by definition, $C(n,k) = \dfrac{n!}{k!(n-k)!}$

$P(k)$ is the term in $p^k$ in the expansion of $(p+q)^n$

Total probability for all k is:

$\sum_{k} P(k) = \sum_{k} C(n,k) p^k q^{n-k} = \left( p+q \right)^n = 1^n = 1$

To compute probabilities of successive values of k using a recurrence relation:

$\dfrac{P(k)}{P(k-1)} = \dfrac{(n-k+1)}{k} \times \dfrac{p}{q} \qquad P(0) = q^n$

This can be used to calculate P(0), P(1), P(2), etc. It is a good idea to independently calculate the last probability in the sequence to check it, or sum the probabilities to ensure they sum to 1.
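A minimal sketch of the recurrence in Python, assuming 0 < p < 1 so that q is nonzero; the function name `binomial_pmf_recurrence` is hypothetical. It applies both checks suggested above: the last term computed independently, and the total.

```python
from math import comb

def binomial_pmf_recurrence(n, p):
    """P(0..n) for B(p, n) via P(k) = P(k-1) * (n-k+1)/k * p/q, starting at P(0) = q**n."""
    q = 1.0 - p
    probs = [q ** n]
    for k in range(1, n + 1):
        probs.append(probs[-1] * (n - k + 1) / k * p / q)
    return probs

probs = binomial_pmf_recurrence(10, 0.3)

# Checks: last probability computed independently (P(n) = p**n), and the total
print(abs(probs[-1] - 0.3 ** 10) < 1e-12, abs(sum(probs) - 1) < 1e-12)  # True True

# Direct formula for comparison
direct = [comb(10, k) * 0.3 ** k * 0.7 ** (10 - k) for k in range(11)]
print(max(abs(a - b) for a, b in zip(probs, direct)))  # rounding-level difference
```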

Binomial distribution example

The probability that a single performance of an experiment yields a usable result is 60%.

We perform the experiment 5 times.

Question 1: What is the distribution of the number of usable results?

Question 2: What is the probability of at least 2 unusable results?

Question 1:

Start by calculating probabilities using direct method. Example:

$\begin{array}{rcl} P(0) &=& 1 \times 0.6^0 \times 0.4^5 = 0.01024 \\ P(1) &=& 5 \times 0.6^1 \times 0.4^4 = 0.07680 \\ P(2) &=& 10 \times 0.6^2 \times 0.4^3 = 0.2304 \end{array}$

or by recurrence method:

$\begin{array}{rcl} P(1) &=& P(0) \times \dfrac{5}{1} \times \dfrac{3}{2} \\ P(2) &=& P(1) \times \dfrac{4}{2} \times \dfrac{3}{2} \end{array}$

etc...

Question 2:

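Let $U = 5 - K$ be the number of unusable results. At least 2 unusable results means $U \geq 2$, i.e., $K \leq 3$:

$P(U \geq 2) = P(K \leq 3) = P(0) + P(1) + P(2) + P(3)$

To do this in a simpler way, use the complement:

$P(U \geq 2) = 1 - P(4) - P(5) = 1 - 0.25920 - 0.07776 = 0.66304$

Similarly, the probability of at least 2 usable results is:

$P(K \geq 2) = 1 - P(0) - P(1) = 0.91296$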
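The whole example fits in a few lines of Python. This sketch (using the direct formula via `math.comb`) prints the distribution of K, the number of usable results, for n = 5 and p = 0.6, and then the two tail probabilities: at least 2 usable results, P(K ≥ 2), and at least 2 unusable results, P(5 - K ≥ 2) = P(K ≤ 3).

```python
from math import comb

n, p = 5, 0.6
q = 1 - p
P = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]  # K = number of usable results

for k, pk in enumerate(P):
    print(k, round(pk, 5))

at_least_2_usable = 1 - P[0] - P[1]                  # P(K >= 2)
at_least_2_unusable = sum(P[k] for k in range(n - 1))  # P(5 - K >= 2) = P(K <= 3)
print(round(at_least_2_usable, 5), round(at_least_2_unusable, 5))  # 0.91296 0.66304
```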

Binomial distribution mean and variance

Mean can be written as:

$\mu = E(K) = \sum_{k=0}^{n} k P(k)$

Simplifying:

$\begin{array}{rcl} \mu &=& \sum_{k=1}^{n} k C(n,k) p^k q^{n-k} \\ &=& \sum_{k=1}^{n} \dfrac{ k n! }{k! (n-k)!} p^k q^{n-k} \\ &=& np \sum_{k=1}^{n} \dfrac{ (n-1)! }{(k-1)! (n-k)!} p^{k-1} q^{n-k} \\ &=& np (p+q)^{n-1} \\ &=& np \end{array}$

For the variance, compute $E(K(K-1))$ to find $E(K^2)$:

$E(K(K-1)) = \sum_{k=2}^{n} k (k-1) P(k) = n(n-1)p^2$

Now,

$$ E(K^2) = E(K(K-1)) + E(K) = n(n-1)p^2 + np $$

which gives

$\begin{array}{rcl} \sigma^2 &=& E(K^2) - (E(K))^2 \\ &=& n(n-1)p^2 + np - n^2 p^2 \\ &=& np(1-p) \\ &=& npq \end{array}$
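A numerical check of $\mu = np$ and $\sigma^2 = npq$, following the same route as the derivation (compute $E(K(K-1))$, then $E(K^2)$). A sketch; `binomial_mean_var` is a hypothetical name and n = 12, p = 0.25 are arbitrary test values.

```python
from math import comb

def binomial_mean_var(n, p):
    """Mean and variance of B(p, n) by summing directly over the probabilities."""
    q = 1 - p
    P = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(P))
    e_fact = sum(k * (k - 1) * pk for k, pk in enumerate(P))  # E(K(K-1))
    var = e_fact + mean - mean**2                              # E(K^2) - (E(K))^2
    return mean, var

m, v = binomial_mean_var(12, 0.25)
print(m, 12 * 0.25)         # both approximately 3.0
print(v, 12 * 0.25 * 0.75)  # both approximately 2.25
```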

Additive property: if K1 and K2 are independently distributed as $B(p, n_1)$ and $B(p, n_2)$ (same p), the distribution of their sum K1+K2 is $B(p, n_1 + n_2)$. This extends to the sum of m independent binomial random variables with the same p.

Relation to other distributions: the binomial distribution can be used to approximate the hypergeometric distribution when the sample size s is small compared to the batch size N. In this case, sampling without replacement (hypergeometric distribution) is well approximated by sampling with replacement (binomial, with p = r/N).
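A sketch of the approximation in Python: for a fixed sample size s = 5 and a fixed special fraction r/N = 0.25 (both arbitrary choices), the largest pointwise gap between the hypergeometric and binomial probabilities shrinks as the batch size N grows. The helper names are hypothetical.

```python
from math import comb

def hypergeom_pmf(d, N, r, s):
    """P(d specials) drawing s of N items without replacement, r of the N special."""
    if d < 0 or s - d < 0:
        return 0.0
    return comb(r, d) * comb(N - r, s - d) / comb(N, s)

def binom_pmf(k, n, p):
    """P(K = k) for K ~ B(p, n)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

s = 5                      # fixed sample size
worst_by_N = []
for N in (20, 200, 20000):
    r = N // 4             # 25% of the batch is special, so p = r/N = 0.25
    worst = max(abs(hypergeom_pmf(d, N, r, s) - binom_pmf(d, s, r / N))
                for d in range(s + 1))
    worst_by_N.append(worst)
    print(N, worst)        # the largest gap shrinks as N grows relative to s
```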

Poisson distribution

Relates to the number of events that occur per given segment of time or space, when the events occur randomly in time or space at a certain average rate.

Examples: number of particles emitted by radioactive source, number of faults per given length of yarn, number of typing errors per page of manuscript, number of vehicles passing a given point on a road.

Use K to represent random variable on this space. Define the Poisson distribution as the distribution in which probability that K = k is given by:

$P(K = k) = P(k) = \dfrac{ m^k e^{-m}}{k!} \qquad k=0, 1, 2, \dots$

Shorthand: $K \sim Pn(m)$

Poisson distribution has free parameter m.

$\sum_{k=0}^{\infty} P(k) = e^{-m} \sum_{k=0}^{\infty} \dfrac{m^k}{k!} = e^{-m} e^{m} = 1$

Recurrence relation:

$P(k) = \dfrac{m P(k-1)}{k} \qquad P(0) = e^{-m}$

If we increase the size of each segment by a factor a, the number of events per segment is distributed according to $Pn(am)$
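The Poisson recurrence is convenient in code because it avoids computing large powers and factorials separately. A sketch with the hypothetical name `poisson_pmf_recurrence`, compared against the direct formula for m = 3.2 (an arbitrary value). One assumption to flag: `exp(-m)` underflows to 0 for m above roughly 745, so very large means would need a log-space version.

```python
from math import exp, factorial

def poisson_pmf_recurrence(m, kmax):
    """P(0..kmax) for Pn(m) via P(k) = m * P(k-1) / k, starting at P(0) = exp(-m)."""
    probs = [exp(-m)]
    for k in range(1, kmax + 1):
        probs.append(m * probs[-1] / k)
    return probs

m = 3.2
probs = poisson_pmf_recurrence(m, 40)
direct = [m**k * exp(-m) / factorial(k) for k in range(41)]

print(max(abs(a - b) for a, b in zip(probs, direct)))  # rounding-level difference
print(sum(probs))  # very close to 1; the tail beyond k = 40 is negligible for m = 3.2
```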

Poisson distribution mean and variance

To compute the mean via direct method:

$\mu = E(K) = \sum_{k=0}^{\infty} \dfrac{ k m^k e^{-m} }{k!} = \sum_{k=1}^{\infty} \dfrac{m^k e^{-m}}{(k-1)!}$

this becomes

$\mu = m e^{-m} \sum_{k-1=0}^{\infty} \dfrac{m^{k-1}}{(k-1)!} = m e^{-m} e^{m} = m$

so for the Poisson distribution $Pn(m)$,

$\mu = m$

The free parameter m is therefore the expected value of the random variable K.

The variance can be calculated as above by first computing E(K(K-1)):

$E(K(K-1)) = \sum_{k=0}^{\infty} \dfrac{ k (k-1) m^k e^{-m}}{k!} = m^2 e^{-m} \sum_{k-2=0}^{\infty} \dfrac{m^{k-2}}{(k-2)!} = m^2$

Therefore,

$\sigma^2 = E(K(K-1)) + E(K) - (E(K))^2$

which becomes

$\sigma^2 = m$

Therefore, the mean and variance of a Poisson distribution are the same.
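A numerical check that the mean and variance of $Pn(m)$ coincide, summing over the probabilities (generated by the recurrence) and following the derivation's route through $E(K(K-1))$. A sketch; `poisson_moments` is a hypothetical name, and truncating the sum at k = 200 is an assumption that is harmless for m = 11.60.

```python
from math import exp

def poisson_moments(m, kmax=200):
    """Mean and variance of Pn(m) by direct summation over k = 0..kmax (truncated)."""
    p = exp(-m)            # P(0); the k = 0 term adds nothing to either sum
    mean = e_fact = 0.0
    for k in range(1, kmax + 1):
        p *= m / k                     # recurrence P(k) = m P(k-1) / k
        mean += k * p
        e_fact += k * (k - 1) * p      # accumulates E(K(K-1))
    var = e_fact + mean - mean ** 2    # sigma^2 = E(K(K-1)) + E(K) - (E(K))^2
    return mean, var

mean, var = poisson_moments(11.60)
print(mean, var)   # both approximately 11.6
```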

Additivity property: if two variables K1 and K2 are independently distributed as Pn(m1) and Pn(m2), then the distribution of their sum is $Pn(m_1 + m_2)$

Relationship to other distributions: Poisson distribution is useful approximation to binomial distribution B(p,n) for small p and large n. Number of successes approximately distributed as Pn(np). (Also, the normal distribution can be used to approximate the Poisson distribution.)
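A sketch of the Poisson approximation to the binomial: holding np = 2 fixed (an arbitrary choice) while n grows and p shrinks, the largest pointwise gap between B(p, n) and Pn(np) over k = 0..14 decreases. Helper names are hypothetical.

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """P(K = k) for K ~ B(p, n); comb returns 0 for k > n."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, m):
    """P(K = k) for K ~ Pn(m), direct formula (fine for small k)."""
    return m**k * exp(-m) / factorial(k)

worst_by_n = []
for n in (10, 100, 10000):
    p = 2 / n                        # keep the mean np = 2 fixed
    worst = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, n * p)) for k in range(15))
    worst_by_n.append(worst)
    print(n, worst)                  # gap shrinks as p gets small and n large
```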

Poisson distribution example

Suppose a laboratory counter is arranged to measure the cosmic ray background, recording the number of particles arriving in intervals of 0.1 s. A very large number of measurements is made, a histogram is obtained, and the mean is estimated.

Plotting P(k) vs. k shows the distribution is not quite symmetrical. (The smaller m is, the more skewed the distribution becomes.)

The mean obtained this way is 11.60, giving the parameter m for the distribution.

Repeating the experiment with a radioactive source close to the detector, the mean number of particles over the same 0.1 s interval is 98.73. We assume the numbers of particles arriving at the detector from the radioactive source and from cosmic rays are independent, so we have two independent variables, each Poisson distributed with a different mean.

By the additivity property, if the source alone gives counts distributed as $Pn(m_s)$, the total count is distributed as $Pn(m_s + 11.60)$. Matching this to the measured mean, $m_s + 11.60 = 98.73$, so the number of particles from the radioactive source alone is distributed as:

$$ Pn(98.73 - 11.60) = Pn(87.13) $$

Note that it is the means that are subtracted: the difference of two independent Poisson counts is not itself Poisson distributed.
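A numerical check of the additivity argument, sketched under the assumption that background and source counts are independent Poisson variables with means 11.60 and 87.13, as in the example. The distribution of a sum of independent counts is the convolution of their probability functions, which should reproduce Pn(98.73). The helper `poisson_pmf_list` is hypothetical; it uses the recurrence rather than `m**k / k!`, because `98.73**k` overflows a float for k in the hundreds.

```python
from math import exp

def poisson_pmf_list(m, kmax):
    """Pn(m) probabilities for k = 0..kmax via the recurrence P(k) = m P(k-1) / k."""
    probs = [exp(-m)]
    for k in range(1, kmax + 1):
        probs.append(m * probs[-1] / k)
    return probs

kmax = 250
background = poisson_pmf_list(11.60, kmax)  # cosmic rays alone
source = poisson_pmf_list(87.13, kmax)      # inferred source-only mean
total = poisson_pmf_list(98.73, kmax)       # source + background, as measured

# Distribution of the sum of two independent counts = convolution of their pmfs
conv = [sum(background[j] * source[k - j] for j in range(k + 1)) for k in range(kmax + 1)]
print(max(abs(a - b) for a, b in zip(conv, total)))  # tiny: the sum behaves as Pn(98.73)
```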
