From charlesreid1

"Statistical Treatment of Experimental Data" by Green and Margerison (Elsevier)

Chapter 2 - Probability

Basic definitions:

  • set of all possible outcomes from random experiment - sample space
  • discrete - countable number of possible outcomes (can also be infinite - as in, number of particles emitted)
  • continuous - all possible real values in certain interval or series of intervals may occur
  • univariate - only one number is recorded
  • multivariate - more than one value obtained from a single performance of an experiment
  • event - set of outcomes in the sample space
  • probability of an event A as outcome is P(A)
  • addition law (for mutually exclusive events): P(A U B) = P(A) + P(B)
  • Venn diagram: if two events D and E are not mutually exclusive, split D U E into three mutually exclusive events: (D - (D and E)), (E - (D and E)), and (D and E)
  • product law (for independent events): P(A and B) = P(A) * P(B)
  • conditional probability: P(C | D) = P(C and D)/P(D)
  • independent - two or more performances of an experiment are called independent if probabilities of different outcomes in one are unaffected by outcomes in the other
  • replicates - independent repeat performances of an experiment

Probability models:

  • discrete uniform model - each outcome equally likely (e.g., tossing an unbiased die)
  • random sampling - drawing random sample of size s from batch of size N (random means, all samples of size s equally likely to be chosen); number of possible samples is N choose s

C(N,s) = \dfrac{N!}{s! (N-s)!}

  • if r of the N items are special, number of ways of drawing sample containing d specials (number of ways of choosing d specials and s-d non-specials) is:

C(r,d) \times C(N-r, s-d)

  • Another way to write this:

P(d\mbox{ specials}) = \dfrac{ C(r,d) C(N-r, s-d) }{ C(N,s) } \qquad d = 0, 1, ..., \min(r,s)

  • this is the definition of hypergeometric distribution (special case of the uniform model)
  • example: a bag contains 3 red and 4 blue discs; draw, without replacement, a random sample of size 2 (= s) from the batch of size 7 (= N), of which 3 (= r) are special (red). The probability that the sample contains 1 (= d) special (red) disc is P(R and B) = (3 choose 1)(4 choose 1) / (7 choose 2) = 12/21 (checked numerically below)
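A quick numerical check of this example (a minimal sketch in Python; the function name hypergeom_pmf is my own):

from math import comb

def hypergeom_pmf(d, N, r, s):
    """P(d specials) when drawing a sample of size s, without replacement,
    from a batch of N items of which r are special."""
    return comb(r, d) * comb(N - r, s - d) / comb(N, s)

# Bag with 3 red (special) and 4 blue discs, sample of size 2:
print(hypergeom_pmf(1, N=7, r=3, s=2))   # 12/21 = 0.5714...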

Chapter 3 - Random Variables

More definitions/concepts:

  • A random variable is a function on the sample space (corresponding to each outcome, the random variable takes a particular value, which is a realization of it)
  • Sample space comprises all possible values of random variable
  • Convention - capital letters denote random variables, small letters denote realizations
  • e.g., if X is discrete random variable, P(X=x) denotes probability of event comprising all outcomes for which X takes the value x; this can also be written P(x)
  • e.g., if X is a continuous random variable, P(x < X \leq x + dx) is the probability of the event comprising all outcomes for which X falls into the interval (x, x+dx)
  • Realizations of random variables are not necessarily outcomes in the sample space. Example: when tossing a die, one could assign the random variable the value 0 if the outcome is even and 1 if odd
  • Random variables also called statistics or variates

Probability Density Functions

Density function:

  • If random variable X is continuous, can specify probability density function f(x)
  • The integral of f(x) over any interval A gives probability of X belonging to A, denoted P(X \in A), equivalent to P(A)

P(X \in A) = P(A) = \int_{A} f(x) dx

  • The integral over the entire space, from -infinity to +infinity, yields 1 by definition (f(x) takes the value 0 where X cannot occur)
  • Discrete case: use a sum instead of an integral, summing the probabilities p(x) of the single outcomes x:

P(X \in A) = P(A) = \sum_{x \in A} p(x)

Joint density:

  • Can extend definitions above to joint density
  • Two outcomes are recorded for each performance of experiment
  • Two corresponding random variables X and Y
  • If continuous, joint density f(x,y) such that:

P(x < X \leq x+dx \mbox{ and } y < Y \leq y + dy) = f(x,y) dx dy

P(X \in A \mbox{ and } Y \in B) = \int_{A} \int_{B} f(x,y) dy dx

Likewise, integral over entire space of possible outcomes for X and Y will yield 1.


  • Two random variables are independent if:

P( X \in A \mbox{ and } Y \in B) = P(X \in A) \times P(Y \in B)

Cumulative Distribution Function

Cumulative distribution function F for a random variable X is defined for discrete and continuous random variables as:

for continuous:

F(x) = P(X \leq x) = \int_{-\infty}^{x} f(y) dy

for discrete:

F(x) = P(X \leq x) = \sum_{y \leq x} p(y)

It follows that:

P(a < X \leq b) = F(b) - F(a)

Statisticians use the term "distribution function" differently from physicists and chemists, who usually apply the term to the probability density. For the normal distribution, for example, the density and the (cumulative) distribution function are different functions.

For a quantity 0 \leq \beta \leq 1 we can denote the \beta quantile as \xi_{\beta} - this is the quantity such that F(\xi_{\beta}) = \beta


Define expectation using distribution function:

E(g(X)) = \sum g(x) p(x)

E(g(X)) = \int_{-\infty}^{+\infty} g(x) f(x) dx

Both of these forms are included in the Stieltjes integral form:

E(g(X)) = \int_{0}^{1} g(x) dF(x)

This represents whichever of the two (discrete or continuous) forms defined above applies.

The distribution mean of X is also called the mean of the distribution F(x):

\mu = E(X) = \int_{0}^{1} x dF(x)

The rth non-central moment of X or of distribution F(x) is given by:

\mu_{r}' = E(X^r) = \int_{0}^{1} x^r dF(x) \qquad r=1, 2, ...

The rth central moment of X or of distribution F(x) is given by:

\mu_{r} = E((X-\mu)^r) = \int_{0}^{1} (x-\mu)^r dF(x) \qquad r = 1, 2, \dots

(Integral must be finite, of course.)

The variance of X, or of the distribution F(x), is the second central moment \mu_{2}, also denoted \sigma^2, defined by:

\sigma^2 = E((X-\mu)^2) = \int_{0}^{1} (x-\mu)^2 dF(x)

Can represent variance of X by symbol V(X).

The standard deviation \sigma is the square root of the variance of the distribution; it is often more useful because it has the same units as \mu and X itself.

Moment generating function represented by symbol M_{X}(t) (t is a dummy variable) defined through expression:

M_{X}(t) = E(e^{tX}) \qquad t \geq 0

Expanding the exponential function using a Taylor series yields:

M_{X}(t) = 1 + \mu_1' t + \dfrac{ \mu_2' t^2 }{2!} + \dots + \dfrac{ \mu_r' t^r }{r!} + \dots
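The name arises because differentiating M_X(t) r times and setting t = 0 picks out the rth non-central moment (a standard property, added here for completeness):

\mu_r' = \left. \dfrac{ d^r M_X(t) }{ dt^r } \right|_{t=0}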

Characteristic function and probability generating function:

  • closely related to moment generating function

Characteristic function definition:

cf = E(e^{itX})

Probability generating function:

pgf = E(t^{X})


Covariance of two random variables X and Y:

C(X,Y) = E( (X - \mu_X) (Y - \mu_Y) )

Variance is a special case of covariance: V(X) = C(X,X)

Distribution correlation coefficient is a "normalized" covariance - normalized by the standard deviations (the square roots of the variances) of the individual variables:

\rho(X,Y)  = \dfrac{ C(X,Y) }{ \sqrt{ V(X) V(Y) } }

Properties of Expectation

Useful properties of expectation include:

Expectation of a constant is the constant

E(a) = a

Can simplify application of expectation operator to linear model

E(aX + b) = a E(X) + b

Expectation of sum is sum of individual expectations:

E(X + Y + Z) = E(X) + E(Y) + E(Z)

Properties of Variance

This can be applied to the variance expression to get a useful identity:

V(X) &=& E((X-\mu)^2) \\
&=& E(X^2 - 2 \mu X + \mu^2) \\
&=& E(X^2) - E( 2 \mu X) + E(\mu^2) \\
&=& E(X^2) - 2 \mu^2 + \mu^2 \\
&=& E(X^2) - \mu^2

The last line yields the identity:

V(X) = E(X^2) - ( E(X) )^2


V(aX + b) = a^2 V(X)

Covariance identity can likewise be derived:

C(X,Y) &=& E( (X-\mu_{x}) (Y-\mu_{y}) ) \\
&=& E(XY - \mu_{x} Y - \mu_{y} X + \mu_{x} \mu_{y} ) \\
&=& E(XY) - 2 \mu_{x} \mu_{y} + \mu_{x} \mu_{y} \\
&=& E(XY) - \mu_{x} \mu_{y}

In the special case where X and Y are independent, the expectation of the product becomes the product of the expectations, E(XY) = E(X) E(Y). In this special case, C(X,Y) = 0 and therefore \rho(X,Y) = 0

If we consider the variance of the sum of two random variables, we can find a relationship between the variance of the individual variables and their covariance:

V(X+Y) &=& E(((X+Y) - (\mu_{x} + \mu_{y}) )^2) \\
&=& E( ( (X-\mu_{x}) + (Y-\mu_{y}) )^2 ) \\
&=& E( ( X-\mu_{x})^2) + E((Y-\mu_{y})^2) + 2 E((X-\mu_{x})(Y-\mu_{y}))

This yields the identity:

V(X+Y) = V(X) + V(Y) + 2 C(X,Y)


V(X - Y) = V(X) + V(Y) - 2 C(X,Y)
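These identities are easy to check numerically. The following is a minimal sketch assuming NumPy (the variables and the chosen dependence of y on x are mine); the sample versions of variance and covariance satisfy the same identity exactly, up to rounding:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)      # deliberately correlated with x

# Sample analogues of V(X+Y) and V(X) + V(Y) + 2 C(X,Y) (all with divisor n-1)
lhs = np.var(x + y, ddof=1)
rhs = np.var(x, ddof=1) + np.var(y, ddof=1) + 2 * np.cov(x, y)[0, 1]
print(lhs, rhs)                             # agree to rounding error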


Example: evaluate the mean and variance of a rectangular (continuous uniform) distribution.

Definition of rectangular distribution:

f(x) &=& k \qquad \mbox{for } a \leq x \leq b \\
f(x) &=& 0 \qquad \mbox{otherwise}

We know that

\int_{-\infty}^{+\infty} f(x) dx = 1


k(b-a) = 1

k = \dfrac{1}{b-a}

Now the mean can be computed as:

\mu = E(X) = \int_{a}^{b} k x dx = \int_{a}^{b} \dfrac{x}{b-a} dx = \dfrac{a+b}{2}

which is simply the midpoint of the interval.

The density is symmetrical about this value.

Further, the expectation of X^2 is:

E(X^2) = \int_{a}^{b} k x^2 dx = \int_{a}^{b} \dfrac{x^2}{b-a} dx = \dfrac{b^2 + ab + a^2}{3}

This result can be used to compute the variance:

V(X) &=& E(X^2) - \mu^2 \\
&=& \dfrac{b^2 + ab + a^2}{3}  - \dfrac{(b+a)^2}{4} \\
&=& \dfrac{(b-a)^2}{12}

In the special case where a = -c, b = c, we get:

E(X) = 0

V(X) = \dfrac{c^2}{3}
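A quick simulation check of the rectangular-distribution results (a sketch assuming NumPy; the endpoints a = 2, b = 5 are arbitrary):

import numpy as np

a, b = 2.0, 5.0
rng = np.random.default_rng(1)
x = rng.uniform(a, b, size=1_000_000)

print(x.mean(), (a + b) / 2)        # both ~ 3.5
print(x.var(), (b - a) ** 2 / 12)   # both ~ 0.75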


If we replicate an experiment n times, we produce a vector of observations \mathbf{X} = \left[ X_1, X_2, X_3, \dots, X_n \right]

Subscripts label the observations.

Consider a function of these observations T(X) = T(X_1, X_2, X_3, \dots, X_n)

Two valuable statistics are the sample mean \overline{X} and sample variance s^2. These are defined as:

\overline{X} &=& \sum \dfrac{X_i}{n} \\
s^2 &=& s^2(X) = \dfrac{  \sum \left(X_i - \overline{X} \right)^2  }{ (n-1) } \\
&=& \dfrac{ \sum X_i^2 - \left( \dfrac{ \left( \sum X_i \right)^2 }{n} \right) }{(n-1)}

Note the notation: the lower-case letter s is conventionally used for the sample variance, even though s^2 = s^2(\mathbf{X}) is itself a statistic (a function of the random variables X_i).

It is important to distinguish the sample parameters from the distribution parameters. The sample is used to estimate the parent population, just as the sample parameters estimate the distribution parameters. In the limit where the sample size equals the population size, the sample parameters equal the distribution parameters.

However, we also have to remember that \overline{X} and s^2 themselves have a distribution. Using different sample populations leads to different values for these two parameters.

Properties of the distributions of \overline{X} and s^2:

E(\overline{X}) = \dfrac{ \sum \mu}{n} = \mu = E(X)

The Xs are independent, so we can also get

V(\overline{X}) = \dfrac{ \sum \sigma^2 }{n^2} = \dfrac{n \sigma^2}{n^2} = \dfrac{\sigma^2}{n}

As n increases, the distribution of \overline{X} becomes more concentrated about the mean, but only slowly: increasing n fourfold only halves the standard deviation of \overline{X}.

The expectation of s^2 can be derived using an identity:

\sum \left( X_i - a \right)^2 = \sum \left[ \left( \left( X_i - \overline{X} \right) + \left( \overline{X} - a \right) \right)^2 \right]

Setting a = \mu and taking expectations (the cross term vanishes because \sum (X_i - \overline{X}) = 0), we get:

n \sigma^2 = E \left( (n-1)s^2 \right) + n V(\overline{X}) = (n-1) E(s^2) + \sigma^2

and therefore,

E(s^2) = \sigma^2

Thus the sample variance s^2 is an unbiased estimator of the distribution variance \sigma^2.

The sample variance s^2 (and the associated sum of squares \sum (X_i - \overline{X})^2) is said to have n-1 degrees of freedom.
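The behaviour of \overline{X} and s^2 can be illustrated by simulation (a minimal sketch assuming NumPy; the normal parent distribution, sample size, and number of replicate samples are arbitrary choices):

import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n = 10.0, 3.0, 25
reps = 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)    # divisor n-1, as defined above

print(xbar.mean(), mu)              # E(xbar) = mu
print(xbar.var(), sigma**2 / n)     # V(xbar) = sigma^2 / n
print(s2.mean(), sigma**2)          # E(s^2) = sigma^2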

Chapter 4: Important Probability Distributions


This chapter covers the following distributions:

  • uniform distribution - dice etc.
  • binomial distribution - used for trials (binary outcomes)
  • poisson distribution - used for the distribution of outcomes that are counts (non-negative integers)
  • poisson process - used for the distribution of event times/frequencies
  • exponential distribution - used for distribution of time elapsed
  • gamma distribution - distribution of sum of n independent exponential variates with same mean
  • normal distribution - most important and widely-used distribution, used for distribution of continuous random variables
  • chi squared distribution - another widely-used distribution, models distribution of e.g., sum of squares of n independent standard normal variates
  • student's t distribution - used for tests on, and confidence intervals for, normal distributions
  • F distribution - used in tests involving comparison of two distribution variances (ANOVA)
  • distribution of sample mean and sample variance for normal case - important extension of discussion of normal distribution

Uniform distribution

When a "fair" process (such as a six-sided die) occurs, it has a uniform distribution.

In general, a continuous uniform variable X may take any value between a and b; its density is constant, f(x) = \dfrac{1}{b-a}, on the interval a \leq x \leq b and zero elsewhere (this is the rectangular distribution treated above).


Binomial distribution

If the outcome of an experiment is divided into two complementary events, A and not-A, the experiment can be modeled using the binomial distribution. Call the occurrence of A a success (S), with probability p, and the occurrence of not-A a failure (F), with probability q = 1 - p.

Running n independent trials results in n outcomes.

Let K be the number of successes out of the n trials; K is a discrete random variable whose sample space is the number of times a success may occur, 0, 1, \dots, n

K has a binomial distribution, denoted B(p,n). The name comes from the fact that the probabilities P(K=k), k = 0, 1, \dots, n, are the terms in the binomial expansion of (p+q)^n

Probability of any particular sequence, e.g., SSFSSSFFSF, consisting of k S's and (n-k) F's, is ppqpppqqpq = p^k q^{n-k} (here k = 6, n = 10), because the trials are independent

Number of sequences containing k S's is the number of ways of choosing k items from n, C(n,k) (n choose k). Need to sum the probabilities of C(n,k) simple events to find the probability P(K=k).

P(K=k) = P(k) = C(n,k) p^k q^{n-k} = C(n,k) p^k (1-p)^{n-k} \qquad k = 0, 1, \dots, n

Note by definition, C(n,k) = \dfrac{n!}{k!(n-k)!}

P(k) is the term in p^k in the expansion of (p+q)^n

Total probability for all k is:

\sum_{k} P(k) = \sum_{k} C(n,k) p^k q^{n-k} = \left( p+q \right)^n = 1^n = 1

Probabilities of successive values of k can be computed using a recurrence relation:

\dfrac{P(k)}{P(k-1)} = \dfrac{(n-k+1)}{k} \times \dfrac{p}{q} \qquad P(0) = q^n

This can be used to calculate P(0), P(1), P(2), etc. It is a good idea to independently calculate the last probability in the sequence to check it, or sum the probabilities to ensure they sum to 1.
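A minimal sketch of the recurrence in Python (the function name is mine); the final line performs the sum-to-1 check suggested above:

def binomial_pmf_by_recurrence(n, p):
    """Return [P(0), P(1), ..., P(n)] for B(p, n) using
    P(k)/P(k-1) = (n - k + 1)/k * p/q, with P(0) = q**n."""
    q = 1.0 - p
    probs = [q ** n]
    for k in range(1, n + 1):
        probs.append(probs[-1] * (n - k + 1) / k * p / q)
    return probs

probs = binomial_pmf_by_recurrence(5, 0.6)   # n and p as in the example below
print(probs)        # P(0) = 0.01024, P(1) = 0.0768, P(2) = 0.2304, ...
print(sum(probs))   # 1.0 (up to rounding)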

Binomial distribution example

The probability that a single performance of the experiment will yield a usable result is 60%.

We perform the experiment 5 times.

Question 1: What is distribution of number of usable results?

Question 2: What is the probability of at least 2 usable results?

Question 1:

Start by calculating probabilities using direct method. Example:

P(0) &=& 1 \times 0.6^0 \times 0.4^5 = 0.01024 \\
P(1) &=& 5 \times 0.6^1 \times 0.4^4 = 0.07680 \\
P(2) &=& 10 \times 0.6^2 \times 0.4^3 = 0.2304 

or by recurrence method:

P(1) &=& P(0) \times \dfrac{5}{1} \times \dfrac{3}{2} \\
P(2) &=& P(1) \times \dfrac{4}{2} \times \dfrac{3}{2}


Question 2:

To find the probability of at least 2 usable results, we need to find P(K \geq 2)

P(K \geq 2) = P(2) + P(3) + P(4) + P(5)

A simpler way to compute this:

P(K \geq 2) = 1 - P(0) - P(1)

This is:

P(K \geq 2) = 0.91296
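A direct check of this answer (standard library only):

from math import comb

p, n = 0.6, 5
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
print(1 - pmf[0] - pmf[1])   # 0.91296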

Binomial distribution mean and variance

Mean can be written as:

\mu = E(K) = \sum_{k=0}^{n} k P(k)


\mu &=& \sum_{k=1}^{n} k C(n,k) p^k q^{n-k} \\
&=& \sum_{k=1}^{n} \dfrac{ k n! }{k! (n-k)!} p^k q^{n-k} \\
&=& np (p+q)^{n-1} \\
&=& np

For the variance, compute E(K(K-1)) to find E(K^2):

E(K(K-1)) = \sum_{k=2}^{n} k (k-1) P(k) = n(n-1)p^2


E(K^2) = E(K(K-1)) + E(K) = n(n-1)p^2 + np

which gives

\sigma^2 &=& E(K^2) - (E(K))^2 \\
&=& n(n-1)p^2 + np - n^2 p^2 \\
&=& np(1-p) \\
&=& npq

Additive property: if K1 and K2 are independently distributed as B(p, n_1) and B(p, n_2), the distribution of their sum K1+K2 is given by B(p, n_1 + n_2). This holds for the sum of m independent binomially distributed random variables that share the same p.

Relation to other distributions: Binomial distribution can be used to approximate the hypergeometric distribution, when sample size s is small compared to batch size. In this case, sampling without replacement (hypergeometric distribution) is well-approximated by sampling with replacement (binomial).

Poisson distribution

Relates to the number of events that occur per given segment of time or space, when the events occur randomly in time or space at a certain average rate.

Examples: number of particles emitted by radioactive source, number of faults per given length of yarn, number of typing errors per page of manuscript, number of vehicles passing a given point on a road.

Use K to represent random variable on this space. Define the Poisson distribution as the distribution in which probability that K = k is given by:

P(K = k) = P(k) = \dfrac{ m^k e^{-m}}{k!} \qquad k=0, 1, 2, \dots

Shorthand: K \sim Pn(m)

Poisson distribution has free parameter m.

\sum_{k=0}^{\infty} P(k) = e^{-m} \sum_{k=0}^{\infty} \dfrac{m^k}{k!} = e^{-m} e^{m} = 1

Recurrence relation:

P(k) = \dfrac{m P(k-1)}{k} \qquad P(0) = e^{-m}

If we increase the size of each segment by a factor a, number of events per segment is distributed according to Pn(am)
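The recurrence can be used just as in the binomial case. A sketch in Python (the function name and the cutoff kmax are mine; kmax just needs to be large enough that the neglected tail is negligible, and m = 11.6 anticipates the cosmic-ray example below):

import math

def poisson_pmf_by_recurrence(m, kmax):
    """Return [P(0), ..., P(kmax)] for Pn(m) using P(k) = m P(k-1) / k."""
    probs = [math.exp(-m)]
    for k in range(1, kmax + 1):
        probs.append(probs[-1] * m / k)
    return probs

probs = poisson_pmf_by_recurrence(11.6, 60)
print(sum(probs))                                # ~ 1
print(sum(k * p for k, p in enumerate(probs)))   # mean ~ 11.6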

Poisson distribution mean and variance

To compute the mean via direct method:

\mu = E(K) = \sum_{k=0}^{\infty} \dfrac{ k m^k e^{-m} }{k!} = \sum_{k=1}^{\infty} \dfrac{m^k e^{-m}}{(k-1)!}

this becomes

\mu = m e^{-m} \sum_{k-1=0}^{\infty} \dfrac{m^{k-1}}{(k-1)!} = m e^{-m} e^{m} = m

so for the Poisson distribution Pn(m),

\mu = m

The parameter m is therefore the expected value of the random variable K.

The variance can be calculated as above by first computing E(K(K-1)):

E(K(K-1)) = \sum_{k=0}^{\infty} \dfrac{ k (k-1) m^k e^{-m}}{k!} = m^2 e^{-m} \sum_{k-2=0}^{\infty} \dfrac{m^{k-2}}{(k-2)!} = m^2


\sigma^2 = E(K(K-1)) + E(K) - (E(K))^2

which becomes

\sigma^2 = m

Therefore, the mean and variance of a Poisson distribution are the same.

Additivity property: if two variables K1 and K2 are independently distributed as Pn(m1) and Pn(m2), then the distribution of their sum is Pn(m_1 + m_2)

Relationship to other distributions: the Poisson distribution is a useful approximation to the binomial distribution B(p,n) for small p and large n; the number of successes is then approximately distributed as Pn(np). (Also, the normal distribution can be used to approximate the Poisson distribution when m is large.)

Poisson distribution example

Suppose a laboratory counter is arranged to measure the cosmic ray background. It records the number of particles arriving in intervals of 0.1 s. A very large number of measurements is made, a histogram is obtained, and the mean is estimated.

Plotting P(k) vs. k shows the distribution is not quite symmetrical. (The smaller m is, the more skewed the distribution becomes.)

Mean obtained this way is 11.60, giving the parameter m for the distribution.

Repeating the experiment with a radioactive source close to the detector, the mean number of particles over the same 0.1 s interval is 98.73. We assume that the numbers of particles arriving at the detector from the radioactive source and from cosmic rays are independent, so we have two independent variables, each Poisson distributed with a different mean.

The additivity property then implies that the number of particles arriving from the radioactive source alone is distributed as:

Pn(98.73 - 11.60) = Pn(87.13)

Poisson process distribution

Closely related to the Poisson distribution - a Poisson process is a process in which events occur randomly in time or space. The Poisson process description is framed in terms of the time (or space) per event rather than the number of events per unit time.

The number of events per given time interval has a Poisson distribution, while the intervals between consecutive events have an exponential distribution.

The probability of an occurrence of an event in the time interval (t, t+\delta t) is \lambda \delta t + o(\delta t), where \lambda is a constant characteristic of the process and o(\delta t) is small compared with \delta t.

Consider probability of occurrence of n events in interval (0, t + \delta t) where n \geq 1.

We only need to consider two possibilities:

A: n events occur in the interval (0,t) and none occur in the next \delta t

B: n-1 events occur in the interval (0,t) and 1 occurs in the next \delta t

(Other possibilities have an extremely small probability.)

We use P(n,t) to denote the probability that n events have occurred in the interval (0,t).

P(A) = P(n,t) \times (1 - \lambda \delta t) + o(\delta t)

P(B) = P(n-1,t) \times \lambda \delta t + o(\delta t)


P(n, t+\delta t) = P(A) + P(B) + o(\delta t)

Therefore, we can get

P(n, t+\delta t) = P(n,t) \times (1 - \lambda \delta t) + P(n-1,t) \times \lambda \delta t + o(\delta t)

and that gives an approximation to the time derivative,

\dfrac{ P(n, t+\delta t) - P(n,t)}{\delta t} = \lambda \left( P(n-1,t) - P(n,t) \right) + \dfrac{o(\delta t)}{\delta t}

In the limit of \delta t \rightarrow 0 the derivative becomes

\dot{P}(n,t) = \lambda \left( P(n-1,t) - P(n,t) \right)

which can be integrated successively to build up a recurrence. For n = 0 there is no P(n-1,t) term, so \dot{P}(0,t) = -\lambda P(0,t), which integrates (with P(0,0) = 1) to:

P(0,t) = e^{- \lambda t }

and the recurrence relation gives

P(n,t) = \dfrac{ (\lambda t)^n e^{- \lambda t} }{n!}

Number of occurrences in the time interval (0,t) is distributed as Pn(\lambda t)

Density of distribution of time to first occurrence of an event:

f_1 (t) = \lambda e^{-\lambda t}

Similarly, the density of the distribution of the time to the nth event, f_n(t), is:

f_n(t) = \dfrac{ \left( \lambda t \right)^{n-1} e^{- \lambda t} \lambda }{ (n-1)! }
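The two descriptions of the process (exponential gaps between events, Poisson counts per interval) can be checked against each other by simulation (a sketch assuming NumPy; \lambda, the interval length, and the truncation at 40 gaps are my own choices):

import numpy as np

rng = np.random.default_rng(3)
lam, t, reps = 2.0, 5.0, 100_000

# Build each realization from exponential inter-event times, then count
# how many events fall in (0, t).
gaps = rng.exponential(1.0 / lam, size=(reps, 40))   # 40 >> expected count
arrival_times = gaps.cumsum(axis=1)
counts = (arrival_times <= t).sum(axis=1)

print(counts.mean(), lam * t)   # mean count ~ lambda * t
print(counts.var(), lam * t)    # variance ~ lambda * t, as for Pn(lambda*t)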

Exponential distribution

Distribution of time elapsed, space covered, etc., before a randomly located event occurs.

Time elapsed between consecutive events in a Poisson process has an exponential distribution.

Example: lifetime of a component in a piece of apparatus; distance traveled between successive collisions in a low pressure gas.

Continuous random variable for which sample space is the positive real numbers, x \geq 0

Random variable X has the exponential distribution if density f(x) is given by:

f(x) = a e^{-ax} \qquad x \geq 0, a > 0

As required by density function,

\int_{0}^{\infty} f(x) dx = 1

The corresponding distribution function is:

F(x) = 1 - e^{-ax}

Mean is given by:

\mu &=& E(X) = \int_{0}^{\infty} a x e^{-ax} dx \\
&=& \left[ -x e^{-ax} \right]_{0}^{\infty} + \int_{0}^{\infty} e^{-ax} dx \\
&=& \dfrac{1}{a}

The mean or expectation of the exponential distribution a e^{-ax} is \dfrac{1}{a}

To find variance, start by finding E(X^2)

E(X^2) = \int_{0}^{\infty} a x^2 e^{-ax} dx = \left[ -x^2 e^{-ax} \right]_{0}^{\infty} + \int_{0}^{\infty} 2 x e^{-ax} dx

E(X^2) = \dfrac{2}{a^2}


\sigma^2 = \dfrac{2}{a^2} - \dfrac{1}{a^2}

\sigma^2 = \dfrac{1}{a^2}

Relationship to other distributions: exponential distribution is connected with Poisson processes. Also closely related to the Gamma distribution - it is the simplest case of the Gamma distribution.

Exponential distribution example

Di-tertiary-butyl peroxide (DTBP) decomposes at 154.6 C in the gas phase by a first-order process, with rate constant k = 3.46 \times 10^{-4} s^{-1}.

Number of molecules N(t) of DTBP remaining at time t after reaction is given by:

N(t) = N_0 e^{-kt}

Decrease in number of molecules of DTBP -dN(t) during time interval t to t+dt is:

-dN(t) = N(0) k e^{-kt} dt

If T denotes the survival time of a randomly chosen DTBP molecule, then

P(t < T \leq t + dt) = - \dfrac{dN(t)}{N(0)}

This leads to

P(t < T \leq t + dt) = k e^{-kt} dt

Thus the density of the survival time is

f(t) = k e^{-kt}

The average survival time of DTBP molecules is 1/k = 2.89 \times 10^{3} s
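A quick check of this figure, plus the fraction of molecules still intact after one mean lifetime (the second line is my own illustration, following directly from N(t) = N_0 e^{-kt}):

import math

k = 3.46e-4            # first-order rate constant, 1/s
print(1.0 / k)         # mean survival time, ~ 2.89e3 s
print(math.exp(-1.0))  # fraction remaining after one mean lifetime, ~ 0.37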

Gamma distribution

Gamma distribution is related to the exponential distribution. It is used to model the distribution of the sum of n independent exponential variates, each with the same mean.

(Also related to chi-squared distribution.)

Random variable X has gamma distribution if

f(x) = \dfrac{ a^b x^{b-1} e^{-ax} }{ \Gamma(b) } \qquad x \geq 0; a, b > 0

Shorthand, denote as X \sim Gm(a,b)

b (often an integer) called the number of degrees of freedom

Ratio \dfrac{ \Gamma(b+1) }{ \Gamma(b) } = b

If b is an integer, \Gamma(b) = (b-1)!

Gamma distribution with one degree of freedom is same as exponential distribution

Gamma distribution mean and variance

We can use the identity/property

\int_{0}^{\infty} x^r e^{-x} dx = \Gamma(r+1)


E(X^s) = \int_{0}^{\infty} \dfrac{ a^b x^{b+s-1} e^{-ax} }{ \Gamma(b) } dx


E(X^s) = \dfrac{ \Gamma(b+s) }{ a^s \Gamma(b) }


\mu = E(X) = \dfrac{ \Gamma(b+1)}{a \Gamma(b) } = \dfrac{b}{a}


\sigma^2 = E(X^2) - (E(X))^2 = \dfrac{ \Gamma(b+2)}{a^2 \Gamma(b)} - \dfrac{1}{a^2} \left( \dfrac{\Gamma(b+1)}{\Gamma(b)} \right)^2

which becomes

\sigma^2 = \dfrac{b}{a^2}

Additivity property: if we have two random variables X1 and X2 independently distributed as Gm(a, b_1) and Gm(a, b_2), then the sum of these variables X1 + X2 is distributed as Gm(a, b_1 + b_2)

(This can be extended to sums of multiple variables.)
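The defining property (a Gm(a,b) variate as the sum of b independent exponential variates) and the mean and variance results can be checked together by simulation (a sketch assuming NumPy; a and b are arbitrary choices):

import numpy as np

rng = np.random.default_rng(4)
a, b = 0.5, 4      # Gm(a, b) built as the sum of b exponentials with mean 1/a
x = rng.exponential(1.0 / a, size=(200_000, b)).sum(axis=1)

print(x.mean(), b / a)      # mean = b/a
print(x.var(), b / a**2)    # variance = b/a^2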

Connecting Gamma distribution and Poisson distribution

If we have a random variable Z that is distributed according to the Gamma distribution, Z \sim Gm(1,m), where m is an integer, then we can obtain the following result:

P(Z > c) = P(K \leq m-1) \qquad K \sim Pn(c)

To interpret: consider a Poisson process in which events occur at an average rate of 1 per second; Z seconds represents the waiting time until the occurrence of the mth event. The probability that this waiting time is greater than c seconds is just the probability that not more than m-1 events have occurred during the time interval (0,c) seconds, i.e.,

P(Z > c) = P(K \leq m-1) \qquad K \sim Pn(c)

Can also be expressed as:

P(Z > c) = e^{-c} \left( 1 + c + \dfrac{c^2}{2!} + \dots + \dfrac{c^{m-1}}{(m-1)!} \right)

Gamma distribution example

A car is fifth in a queue of vehicles waiting at a toll booth. Waiting time is the sum of four service times for preceding vehicles. Service times are independently exponentially distributed with mean of 20 seconds.

Q: What is probability that car in question will have to wait more than 90 seconds?

Let the service time be denoted T seconds. Then T has the exponential density f(t) = a e^{-at}, with E(T) = \dfrac{1}{a} = 20

If waiting time is W seconds, W is sum of 4 independent exponential variates, each with a parameter a = \dfrac{1}{20}

Hence, W \sim Gm(\dfrac{1}{20}, 4)

Therefore P(W > 90) can be obtained by rescaling (aW \sim Gm(1,4)) and using c = \dfrac{90}{20} in the equation from the preceding section:

P(W > 90) = P(K \leq 4-1)

where K \sim Pn\left(\dfrac{90}{20}\right) = Pn(4.5)

Plugging in:

P(W > 90) = e^{-4.5} \left( 1 + 4.5 + \dfrac{4.5^2}{2} + \dfrac{4.5^3}{6} \right) = 0.3423
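Checking the numerical answer (standard library only):

import math

c, m = 90 / 20, 4    # c = 4.5; K ~ Pn(c); want P(K <= m-1)
p = math.exp(-c) * sum(c**k / math.factorial(k) for k in range(m))
print(p)             # ~ 0.3423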

Normal distribution

Random variable X is normally distributed if its probability density is given by

f(x) = \left( 2 \pi \sigma^2 \right)^{-1/2} \exp \left( \dfrac{ - (x-\mu)^2 }{ 2 \sigma^2 } \right) \qquad -\infty < x < \infty


X \sim N(\mu, \sigma^2)

Random variable Z can be written as a "standardized form" of X if:

Z = \dfrac{ X - \mu}{\sigma}

The probability density of Z, denoted \phi(z), is obtained from:

\phi(z) = \dfrac{ f(x) }{ \left| \frac{dz}{dx} \right| }

The density becomes:

\phi(z) = \dfrac{1}{\sqrt{ 2 \pi }} \exp \left( \dfrac{-z^2}{2} \right) \qquad -\infty < z < \infty

Z is the standard normal variate, and is denoted Z \sim N(0,1)

Normal distribution mean and variance

Expectation of Z:

E(Z) = \int_{-\infty}^{\infty} \dfrac{1}{\sqrt{ 2 \pi}} z e^{-\frac{1}{2} z^2 } dz = 0

(because integrand is odd.)

E(Z^2) = \int_{-\infty}^{\infty} \dfrac{1}{\sqrt{ 2 \pi}} z^2 e^{-\frac{1}{2} z^2 } dz = 1


V(Z) = 1

Now we can use these to find E(X) and V(X):

X = \sigma Z + \mu


E(X) = \sigma E(Z) + \mu = \mu

V(X) = \sigma^2 V(Z) = \sigma^2

Additive property: if we have two normally distributed random variables X1 and X2, described by the normal distributions N(\mu_1, \sigma_1^2) and N(\mu_2, \sigma_2^2), the distribution of their sum X1 + X2 is the normal distribution N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)

Normal distribution example

Use tabulated values of \Phi(z) = \int_{-\infty}^{z} \phi(y) dy to answer the questions below.

Note that \Phi(-z) = 1 - \Phi(z)

Suppose you have a physical quantity distributed as N(3,4).

Q1: What is probability of observing X > 3.5?

Q2: What is probability of observing X < 1.2?

Q3: What is probability of observing 2.5 < X < 3.5?

Question 1: convert X to Z by plugging in to definition: Z = \dfrac{X - \mu}{\sigma} = \dfrac{X - 3}{2}. Now:

P(X > 3.5) &=& P(Z > 0.25) \\
&=& 1 - \Phi(0.25) \\
&=& 1 - 0.5987 \\
&=& 0.4013

Question 2: again, convert X to Z. Now:

P(X < 1.2) &=& P(Z < -0.9) \\
&=& \Phi(-0.9) \\
&=& 1 - \Phi(0.9) \\
&=& 1 - 0.8159 \\
&=& 0.1841

Question 3: convert from X to Z, which gives

P(2.5 < X < 3.5) &=& P(-0.25 < Z < 0.25) \\
&=& \Phi(0.25) - \Phi(-0.25) \\
&=& 2 \Phi(0.25) - 1 \\
&=& 1.1974 - 1 \\
&=& 0.1974
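The three answers can be reproduced without tables by writing \Phi(z) in terms of the error function (a sketch using only the standard library; Phi is my own helper):

import math

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 3.0, 2.0                  # X ~ N(3, 4)
print(1 - Phi((3.5 - mu) / sigma))    # P(X > 3.5)       ~ 0.4013
print(Phi((1.2 - mu) / sigma))        # P(X < 1.2)       ~ 0.1841
print(Phi(0.25) - Phi(-0.25))         # P(2.5 < X < 3.5) ~ 0.1974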

Chi squared distribution

Random variable X is distributed as chi squared with \nu degrees of freedom if density given by:

f(x) = \dfrac{ x^{\frac{1}{2}\nu - 1} e^{-\frac{1}{2} x} }{ \Gamma( \frac{1}{2} \nu) 2^{\frac{1}{2} \nu} } \qquad \nu > 0, 0 \leq x < \infty

Shorthand: X \sim \chi_{\nu}^2

Example of this type of random variable: sum of squares of n independent standard normal variates, distributed as \chi_{n}^2

Equivalently, if X_1, X_2, \dots, X_n independent random variables, each distributed as N(\mu, \sigma^2), then \dfrac{ \sum (X_i - \mu)^2 }{ \sigma^2 } distributed as \chi_n^2

It can also be shown that \dfrac{ \sum (X_i - \overline{X})^2 }{ \sigma^2 } = \dfrac{(n-1)s^2}{\sigma^2} is also distributed as \chi_{n-1}^2, independently of \overline{X}

If X \sim \chi_{\nu}^2, the mean and variance are given by:

Mean: E(X) = \nu

Variance: V(X) = 2 \nu

Additive property: if X1 and X2 are independently distributed as \chi_{\nu_1}^2 and \chi_{\nu_2}^2, then their sum X1+X2 is distributed as \chi_{\nu_1 + \nu_2}^2
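A simulation check of the defining property and of the mean and variance (a sketch assuming NumPy; \nu = 6 is an arbitrary choice):

import numpy as np

rng = np.random.default_rng(5)
nu = 6
# Sum of squares of nu independent standard normal variates
x = (rng.normal(size=(200_000, nu)) ** 2).sum(axis=1)

print(x.mean(), nu)       # mean = nu
print(x.var(), 2 * nu)    # variance = 2 * nu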