Revision as of 18:42, 21 October 2010

October 7, 2010

Statistical Inference (Casella and Berger)

wikipedia:Set (mathematics)

wikipedia:Probability interpretations

http://en.wikipedia.org/wiki/Set_%28mathematics%29

http://en.wikipedia.org/wiki/Probability_interpretations

Set Theory:

Union - combination of two sets

Complement - everything that's not in A

Empty - no elements

Definitions:

experiment - any activity generating observable results

outcome - result of experiment (IMPORTANT TO KEEP STRAIGHT! don't confuse events and outcomes)

trial - single performance of experiment

sample space - set of all possible outcomes

countable/uncountable:
- countable = the outcomes can be put in one-to-one correspondence with the counting numbers, or with a finite subset of them (e.g. 1/n for n = 1, 2, 3, ...)
- uncountable = no such one-to-one correspondence can be made
-- infinite loop: you can do an infinite loop, but still count it
-- flipping a coin: countable; temperature: uncountable

event - any subset of the sample space

Example:

experiment - roll a die

outcome - 1, or 2, or 3, or 4, or 5, or 6

trial - one roll of the die

sample space - {1,2,3,4,5,6} (COUNTABLE)

event - may be {1}, or {1,2,3}, etc...

Operators:

Union, empty set, complement, intersection

Commutative:

Associative:

Distributive:

DeMorgan's Law:
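The standard textbook statements of these four laws can be checked on small finite sets. The sketch below is only an illustration: the universe S and the sets A, B, C are arbitrary choices, not taken from the lecture.

```python
# Arbitrary finite sets chosen for illustration (assumptions, not from the notes).
S = set(range(1, 11))   # universe / sample space
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {4, 6, 8, 10}


def complement(X, universe=S):
    """Everything in the universe that is not in X."""
    return universe - X


# Commutative: order does not matter for union or intersection.
commutative = (A | B == B | A) and (A & B == B & A)

# Associative: grouping does not matter.
associative = ((A | B) | C == A | (B | C)) and ((A & B) & C == A & (B & C))

# Distributive: intersection distributes over union, and union over intersection.
distributive = (A & (B | C) == (A & B) | (A & C)) and (
    A | (B & C) == (A | B) & (A | C)
)

# DeMorgan: the complement of a union is the intersection of the complements,
# and the complement of an intersection is the union of the complements.
de_morgan = (complement(A | B) == complement(A) & complement(B)) and (
    complement(A & B) == complement(A) | complement(B)
)

print(commutative, associative, distributive, de_morgan)
```

A check on one choice of sets is not a proof, but it is a quick way to catch a misremembered identity.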

Call a set abnormal if it is a member of itself (otherwise it's normal)

Example: the set of all squares is not itself a square, so it is not a member of the set of squares, and is normal. The complementary set, containing all non-squares, is itself not a square, so it is a member of itself, and is abnormal.

Consider the set of all normal sets (Russell's paradox). Is it normal or abnormal? If it were normal, it would belong to the set of all normal sets, i.e. it would be contained in itself, and would therefore be abnormal. If it were abnormal, it would be contained in itself, so it would be one of the normal sets, and would therefore be normal.

You can resolve this using more rigorous set theory...

More Definitions:

Disjoint ("set" term) / mutually exclusive ("probability" term) - if the intersection of two sets is the null set, they are mutually exclusive

Partition - take a group of sets; if the union of these sets is the sample space, and they are mutually exclusive, this is a partition

Distinction between probability theory that has a physical meaning (and is therefore "contaminated" by intuition) and a more abstract probability theory that doesn't have a corresponding physical meaning

Axiomatic probability theory (Kolmogorov)

A probability is a function that follows 3 axioms:

Sample space $$ S $$

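(The domain) $\sigma$ -algebra $\mathfrak{B}$ : a collection of subsets of $$ S $$ (the events) that contains $$ S $$ and is closed under complements and countable unions (so the set of events is fully consistent)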

Function P -> probability over the domain $\mathfrak{B}$

1. $P(A) \geq 0$ for all $A \in \mathfrak{B}$

2. $$ P(S) = 1 $$

3. If $A \in \mathfrak{B}$ and $B \in \mathfrak{B}$ are disjoint, then $P(A \bigcup B) = P(A) + P(B)$

More generally, for pairwise disjoint $A_{1}, A_{2}, \dots \in \mathfrak{B}$ , $P \left( \bigcup_{i=1}^{\infty} A_{i} \right) = \sum_{i=1}^{\infty} P(A_{i})$

This is a mathematician's viewpoint: a clean definition. As long as a function follows these rules, it is a probability.

What is the probability of the null set?

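Create a partition: $\{ S, \emptyset \}$ , since $S \bigcup \emptyset = S$ and $S \bigcap \emptyset = \emptyset$

The probability of the sample space is $$ P(S) = 1 $$

By axiom 3, $P(S) + P(\emptyset) = P(S \bigcup \emptyset) = P(S) = 1$ , so $P(\emptyset) = 1 - P(S) = 0$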

$P(A) \leq 1$

$$ P(A^c) = 1-P(A) $$

If $A \subset B$ then $P(A) \leq P(B)$

The size of the set is directly related to the probability: enlarging an event can never decrease its probability

Another way to do this is using measure theory (another route, besides rigorous set theory, that leads to probability theory)

[wikipedia:Measure theory]

[wikipedia:Sigma-algebra]

Bonferroni's inequality: $P(A \bigcap B) \geq P(A) + P(B) - 1$

Reading Assignment Discussion

Classical Definition of Probability (Laplace, 1812)

"If a random experiment can result in $$ N $$ mutually exclusive and equally likely outcomes and if $$ N_A $$ of these outcomes result in the occurrence of the event $$ A $$ , the probability of $$ A $$ is defined by $P(A) = P\{A\} = {N_A \over N}$ ."

Example: rolling a die

Event A might be rolling a 1... or rolling a 1 or a 2...

What if one side of the die is weighted to favor 5? This definition doesn't work... There is no mathematical proof to show that the outcomes are mutually exclusive and equally likely.

Frequency

This is the limit, as the number of experimental trials performed $$ N $$ (trials must be performed under "identical" conditions) goes to infinity, of the fraction of trials in which the event of interest occurs, $N_A / N$ :

$P(A) = \lim_{N \rightarrow \infty} {N_A \over N}$
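The frequency definition can be illustrated with a simulated fair die. This is a sketch under assumptions: the pseudo-random generator stands in for "identical" trials, the event (rolling a 1), the seed, and the trial counts are arbitrary choices, and any finite run only approximates the limit.

```python
import random


def relative_frequency(event, n_trials, rng):
    """Fraction N_A / N of fair-die rolls whose outcome lands in `event`."""
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) in event)
    return hits / n_trials


rng = random.Random(0)   # fixed seed so the run is repeatable
event_a = {1}            # event A: "roll a 1"
frequencies = {
    n: relative_frequency(event_a, n, rng) for n in (100, 10_000, 100_000)
}

for n, freq in frequencies.items():
    print(f"N = {n:>7}: N_A / N = {freq:.4f}   (classical value 1/6 = {1 / 6:.4f})")
```

The printed fractions should drift toward 1/6 as N grows, but never reach the limit exactly, which is the first limitation listed in these notes.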

Limitations:

can never actually perform an infinite number of trials
what does "identical" mean? e.g. if you're performing a turbulence experiment, how can you initialize everything the exact same way?
(Tony): what's your infinity?
- (Sean): whatever it is, it's not finite

Probability can be seen as lack of information (e.g. fluid mechanics is deterministic, so we could describe it if we know the state perfectly - we use probability to fill in/make up for the lack of knowledge)

Quantum theory: in the double-slit experiment, the very state of nature is random and needs to be defined using probability

Probability of an outcome in the future: judging confidence based on prior experience... statement of confidence

Bayesian vs. frequentist

Side discussion:

wikipedia:Noether's theorem

Advantage of Axiomatic Approach

It provides a rigorous mathematical framework, removing bias/preference

When you get a result, you can't necessarily specify the meaning

October 11, 2010

Derivation of Bonferroni's Inequality

Show:

$P(A \bigcap B) \geq P(A) + P(B) - 1$

Proof:

$P(A \bigcup B) = P(A) + P(B) - P(A \bigcap B)$

$P(A \bigcap B) = P(A) + P(B) - P(A \bigcup B)$

But in general,

$P(A \bigcup B) \leq 1$

so $- P(A \bigcup B) \geq -1$ . Plugging this back in,

$P(A \bigcap B) \geq P(A) + P(B) - 1$
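Bonferroni's inequality in the form $P(A \bigcap B) \geq P(A) + P(B) - 1$ can be checked by brute force on a small sample space. The sketch below assumes a fair six-sided die (an illustrative choice) and tests every pair of events, using exact fractions.

```python
from fractions import Fraction
from itertools import combinations

S = (1, 2, 3, 4, 5, 6)   # sample space of a fair die (assumed for illustration)


def P(event):
    """Classical probability: favorable outcomes over total outcomes."""
    return Fraction(len(event), len(S))


# Every subset of S is an event: 2**6 = 64 of them.
events = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

# The identity the proof starts from.
inclusion_exclusion_holds = all(
    P(A | B) == P(A) + P(B) - P(A & B) for A in events for B in events
)

# The inequality itself.
bonferroni_holds = all(
    P(A & B) >= P(A) + P(B) - 1 for A in events for B in events
)

print(len(events), inclusion_exclusion_holds, bonferroni_holds)
```

The bound is only informative when $P(A) + P(B) > 1$ ; otherwise the right-hand side is not positive and the inequality holds trivially.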

On Wikipedia: more general case; more than two sets

wikipedia:Boole's inequality

Conditional Probability and Bayes' Rule

if $A,B \subset S$ are events, and $P(B) \neq 0$ ,

$P(A|B) = \frac{P(A \bigcap B)}{P(B)}$

And as a result, we get Bayes' Rule:

$P(A|B) = P(B|A) \frac{ P(A) }{ P(B) }$

Derivation:

$\frac{ P(A) }{ P(B) } \times P(B|A) = \frac{ P(A) }{ P(B) } \times \frac{ P(B \bigcap A) }{ P(A) } = \frac{ P(A \bigcap B) }{ P(B) } = P(A|B)$

Example application of conditional probability:

Run a combustion simulation... "Given that the temperature is in range X, what is the concentration range?"

Statistical Independence

Definition: $P(A \bigcap B) = P(A) * P(B)$

Consequences:

$$ P(A|B) = P(A) $$

Random Variable

wikipedia:Random variable

A random variable is a mapping from a sample space to real numbers.

Example: Dice roll

Mapping rolls to a set $\{1,2,3,4,5,6\}$

Example: Morse code

Looking at statistics of morse code...

Mapping dots and dashes to $\{0,1\}$ (or alternatively $\{1,2\}$ )

IMPORTANT: Sample space is different from the random variable

Induced Probability Function

Let $$ P(B) $$ be the probability of an event $B \in \mathfrak{B}$ on the sample space $$ S $$ , which has outcomes $S_{i}$

(e.g. there's a sample space, and in the sample space there are events...

Before, we were talking about probability of events. Now, we're talking about probability of a random variable)

Random variable $$ X(s) $$ , set of possible realizations of the random variable $\chi$ , $A \subset \chi$

$P_{X} (X \in A) = P( \{ S_{i} \in S : X(S_{i}) \in A \} )$

The $S_{i}$ that satisfy this condition are the outcomes that are in B

Cumulative distribution function

Definition:

$F_{X}(x) = P_{X} (X \leq x) \forall x$

Example: Dice roll

Probability of rolling a given number is constant (1/6)

Cumulative distribution function is a staircase whose steps lie along a line, because for x=1, cumulative probability is 1/6; for x=2, cumulative probability is 2/6; and so on.
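A minimal sketch of the die CDF, built directly from $F_{X}(x) = P_{X} (X \leq x)$ with exact fractions. The names `pmf` and `cdf` are illustrative choices.

```python
from fractions import Fraction

# Fair die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def cdf(x):
    """F_X(x) = P(X <= x): add up the probability of every face not exceeding x."""
    return sum((p for face, p in pmf.items() if face <= x), Fraction(0))


for x in (0.5, 1, 2, 2.5, 6, 100):
    print(f"F({x}) = {cdf(x)}")
```

Evaluating between the integers (for example at 2.5) shows that the function is flat between jumps, so it is defined for every real x even though the die only takes six values.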

This definition is more general than the integral definition, because the integral definition requires a cumulative distribution function that is differentiable

Sometimes there will be situations where a probability density function can't be defined, and only a cumulative distribution function can be defined

$F_{X}(x) \geq 0$

Further information:

$\displaystyle{ \lim_{x \rightarrow -\infty} F_{X}(x) } = 0$

$\displaystyle{ \lim_{x \rightarrow +\infty} F_{X}(x) } = 1$

Example

Given 5% of men are colorblind, and 0.25% of women are colorblind: if a person is chosen at random (assuming men and women are equally likely, $P(M) = P(W) = 0.50$ ) and they are colorblind, what is the probability that they are male?

$\begin{align} P(CB|M) &=& 0.05 \\ P(CB|W) &=& 0.0025 \\ P(M|CB) &=& ? \end{align}$

$P(M|CB) = \frac{ P(M \bigcap CB)}{P(CB)} = \frac{ P(CB|M) P(M) }{ P(CB) } = \frac{ (0.05) (0.50) }{ P(M) P(CB|M) + P(CB|W)P(W) } = 0.952$
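The arithmetic in this example can be reproduced in a few lines. The sketch assumes, as the calculation above does, that a randomly chosen person is a man or a woman with probability 0.50 each; the variable names are illustrative.

```python
# Priors (assumed equal, matching the 0.50 used in the worked calculation).
p_m = 0.50
p_w = 0.50

# Conditional probabilities given in the problem statement.
p_cb_given_m = 0.05
p_cb_given_w = 0.0025

# Total probability of being colorblind (the denominator of Bayes' rule).
p_cb = p_cb_given_m * p_m + p_cb_given_w * p_w

# Bayes' rule: P(M|CB) = P(CB|M) P(M) / P(CB).
p_m_given_cb = p_cb_given_m * p_m / p_cb
p_w_given_cb = p_cb_given_w * p_w / p_cb

print(round(p_m_given_cb, 3))   # 0.952
```

The two posteriors sum to 1, so the probability that the person is a woman is the remaining 0.048 or so.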

Identically Distributed

Two random variables $$ X, Y $$ are identically distributed if $P(X \in A) = P(Y \in A)$ for every event $$ A $$

We have two experiments, each of which we run a trial of, and each gives an outcome. If every event has the same probability for both random variables, they are identically distributed.

The actual values of $$ X, Y $$ are not important. e.g., if $$ X $$ and $$ Y $$ are the rolls of two fair dice, the probability of each face is 1/6 for both, so they are identically distributed, even though one die may show a 2 while the other shows a 4.

Probability Mass Function, Probability Density Function

Difference: one's continuous, one's discrete

Probability Mass Function (PMF): $f_{X}(x) = P(X=x)$

Discrete cases only

Probability Density Function (PDF): $f_{X}(x) = \frac{ d }{ dx } F_{X}(x)$

Continuous cases only

where $F_{X}$ is the cumulative distribution function:

$F_{X}(x) = \displaystyle{ \int_{-\infty}^{x} f_{X}(x') dx' }$

Given properties of the CDF, what are the properties of the PDF?

$\displaystyle{ \int_{-\infty}^{\infty} f_{X}(x) dx } = 1$

Since the CDF is monotonically non-decreasing, the analogous property for the PDF is $f_{X}(x) \geq 0$

Above two properties are necessary and sufficient conditions for a function to be considered a PDF.

Further properties:

$P(a \leq X \leq b) = \displaystyle{ \int_{a}^{b} f_{X}(x) dx } = F_{X}(b) - F_{X}(a)$

Example

Assume $\lambda > 0$ is a fixed positive constant

define function:

$f(x) = \begin{cases} \frac{1}{2} \lambda \exp(-\lambda x) & x \geq 0 \\ \frac{1}{2} \lambda \exp(\lambda x) & x < 0 \end{cases}$

Part a

Is $$ f(x) $$ a PDF? Check the two conditions:

$f(x) \geq 0 \qquad \forall x$

$\begin{align} \int_{-\infty}^{+\infty} f(x) dx &=& \int_{-\infty}^{0} f(x) dx + \int_{0}^{\infty} f(x) dx \\ & = & \frac{1}{2} \lambda \frac{1}{\lambda} \exp(\lambda x) \vert_{-\infty}^{0} + \frac{1}{2} \lambda ( - \frac{1}{\lambda} ) \exp(- \lambda x) \vert_{0}^{+\infty} \\ & = & 1 \end{align}$

Part b

Find probability that a random variable $$ X $$ with this PDF is less than t, i.e. $P(X<t) \forall t \in \mathbb{R}$

$P(X < t) = \int_{-\infty}^{t} f(x) dx$

For this one, you have to integrate over the two branches of $$ f $$ separately, depending on the sign of t.

$\int_{-\infty}^{t} f(x) dx = \begin{cases} \frac{1}{2} \exp( \lambda t ) & t < 0 \\ 1 - \frac{1}{2} \exp(- \lambda t) & t \geq 0 \end{cases}$

Another name for this is the cumulative distribution function.
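Both parts of this example can be sanity-checked numerically. The sketch below assumes $\lambda = 1.5$ (an arbitrary positive value), truncates the infinite limits at ±40 where the tails are negligible, and uses a simple midpoint rule; `integrate` is a hypothetical helper written here, not a library call.

```python
import math

lam = 1.5   # arbitrary positive constant, chosen for illustration


def pdf(x):
    """f(x) = (lam / 2) * exp(-lam * |x|), the two-branch function from the example."""
    return 0.5 * lam * math.exp(-lam * abs(x))


def cdf(t):
    """Closed-form result from Part b."""
    if t < 0:
        return 0.5 * math.exp(lam * t)
    return 1.0 - 0.5 * math.exp(-lam * t)


def integrate(func, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (i + 0.5) * step) for i in range(n))


lower = -40.0   # stands in for -infinity; the tail beyond it is negligible
total = integrate(pdf, lower, 40.0)                      # Part a: should be close to 1
gaps = {t: abs(integrate(pdf, lower, t) - cdf(t)) for t in (-2.0, -0.5, 0.0, 0.7, 3.0)}

print(f"integral of f over the real line ~ {total:.6f}")
print(f"largest |numerical - closed form| = {max(gaps.values()):.2e}")
```

Agreement at a handful of t values on both sides of zero gives some confidence that the two branches of the closed form are joined correctly at t = 0.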

Transformations

Expectation

Definition:

$E( g(X) ) = \displaystyle{ \int_{-\infty}^{+\infty} g(x) f_{X}(x) dx }$

Linearity:

$E( a g_{1}(X) + b g_{2}(X) + c) = a E(g_{1}(X)) + b E(g_{2}(X)) + c$

Positivity:

If the function $g(x) > 0 \forall x$ , then $$ E(g(X)) > 0 $$

If $g_{1}(x) > g_{2}(x) \forall x$ , then $E(g_{1}(X)) > E(g_{2}(X))$

Moments:

nth moment $\mu_{n} = E( X^{n} )$

Central moments:

nth central moment $\mu^{\prime}_{n} = E( (X - \mu_{1})^{n} )$

2nd central moment: variance

Moment Generating Function (MGF)

$M_{X}(t) = E( \exp( t X ) )$

If you're looking at $$ M(-t) $$ , it's a Laplace transform. If it's $$ M(-it) $$ , it's a Fourier transform.

wikipedia:Moment generating function

$\begin{align} M_{X}(t) &=& \int_{-\infty}^{+\infty} \exp( tx ) f_{X}(x) dx \\ &=& \int_{-\infty}^{+\infty} ( 1 + tx + \frac{t^2 x^2}{2!} + \frac{t^3 x^3}{3!} + ... ) f_{X}(x) dx \\ &=& \mu_{0} + t \mu_{1} + \frac{t^2}{2!} \mu_{2} + ... \\ \mu_{n} &=& M_{X}^{(n)}(0) \end{align}$

So knowing all the moments is equivalent to knowing the PDF (as long as the MGF exists in a neighborhood of $$ t = 0 $$ ).
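The relation $\mu_{n} = M_{X}^{(n)}(0)$ can be illustrated on a discrete case where the expectation is a finite sum. The sketch assumes a fair six-sided die and approximates the derivatives with central finite differences (step `h` is an arbitrary small value), so the results match the exact moments only approximately.

```python
import math

faces = range(1, 7)   # fair die, assumed for illustration


def mgf(t):
    """M_X(t) = E(exp(t X)) for a fair six-sided die."""
    return sum(math.exp(t * k) for k in faces) / 6


h = 1e-4   # finite-difference step

# Central differences approximate M'(0) and M''(0).
first_moment = (mgf(h) - mgf(-h)) / (2 * h)
second_moment = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h**2

# Exact moments straight from the definition E(X^n).
exact_first = sum(faces) / 6
exact_second = sum(k**2 for k in faces) / 6

variance = second_moment - first_moment**2   # 2nd central moment

print(first_moment, exact_first)
print(second_moment, exact_second)
print(variance)
```

Note that `mgf(0.0)` is exactly 1, which is the zeroth moment $\mu_{0}$ in the series expansion.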

October 14, 2010

Bernoulli Trial - two outcomes, known probability for each (e.g. heads or tails, or picking black and white marbles out of an urn)

Binomial distribution - probability of k successes in n Bernoulli trials
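A short sketch of the binomial probability, using the standard formula $P(k) = \binom{n}{k} p^k (1-p)^{n-k}$ (stated here from general knowledge, since the notes do not write it out). The function name `binomial_pmf` and the example of 10 fair coin flips are illustrative choices.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5   # ten fair coin flips, chosen for illustration
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(f"P(5 heads in 10 flips) = {probabilities[5]:.4f}")
print(f"sum over all k = {sum(probabilities):.6f}")
```

Summing over every possible k gives 1, which is the normalization a probability mass function has to satisfy.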

Normal distribution (and central limit theorem) - the sum of a large number of independent random variables with the same distribution (finite mean and variance), once centered and scaled, approaches a Gaussian distribution

When doing an experiment - if there are a whole bunch of independent causes of error, all the errors add up, and you can expect a normal (Gaussian) distribution

October 18, 2010

Example 1: Expectation, CDF

Let $$ X $$ be a continuous non-negative random variable

Let $$ f $$ denote the PDF

$$ f(x) = 0 $$ for $$ x < 0 $$

Show that $E(X) = \displaystyle{ \int_{0}^{\infty} \left( 1-F_{X} (x) \right) dx }$

$x = \int_{0}^{x} 1 dx'$

Next, substitute this into the definition of the expectation:

$E(X) = \int_{0}^{\infty} x f(x) dx = \int_{x=0}^{x=\infty} \left[ \int_{x'=0}^{x'=x} 1 dx' \right] f(x) dx$

Swap the order of integration over the region $0 \leq x' \leq x$ :

$E(X) = \int_{x' = 0}^{x' = \infty} \left[ \int_{x=x'}^{x=\infty} f(x) dx \right] dx'$

(The two pieces below have to stay inside a single integral over $$ x' $$ , because $\int_{0}^{\infty} 1 dx'$ diverges on its own.)

The quantity in square brackets is $\int_{x=0}^{x=\infty} f(x) dx - \int_{x=0}^{x=x'} f(x) dx$ , where the first integral is 1, and the second integral is the CDF at $$ x' $$ .

$E(X) = \int_{0}^{\infty} \left( 1 - F_{X} (x') \right) dx'$
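The identity $E(X) = \int_{0}^{\infty} ( 1 - F_{X} (x) ) dx$ can be checked numerically for one non-negative distribution. The sketch assumes an exponential distribution with rate 2 (an arbitrary choice, for which the mean is 1/2), truncates the upper limit at 40, and uses a hypothetical midpoint-rule helper.

```python
import math

lam = 2.0   # rate of the exponential distribution, chosen for illustration


def pdf(x):
    """f(x) = lam * exp(-lam * x) for x >= 0."""
    return lam * math.exp(-lam * x)


def cdf(x):
    """F(x) = 1 - exp(-lam * x) for x >= 0."""
    return 1.0 - math.exp(-lam * x)


def integrate(func, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (i + 0.5) * step) for i in range(n))


upper = 40.0   # stands in for infinity; the tail beyond it is negligible

mean_from_pdf = integrate(lambda x: x * pdf(x), 0.0, upper)       # definition of E(X)
mean_from_cdf = integrate(lambda x: 1.0 - cdf(x), 0.0, upper)     # the identity shown

print(mean_from_pdf, mean_from_cdf, 1.0 / lam)
```

Both integrals should agree with each other and with the known mean 1/λ to within the accuracy of the quadrature.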

Example 2: Choosing Keys

A man has a set of $$ n $$ keys. He wants to open his door, which will open with exactly 1 key, but he doesn't know which one, and he is trying keys at random.

Part A

Find the mean number of trial attempts, if a key that fails can be picked again (sampling with replacement).

Use a negative binomial: http://en.wikipedia.org/wiki/Negative_binomial_distribution

Specifically, use a geometric distribution: http://en.wikipedia.org/wiki/Geometric_distribution

So we know that the expectation of the geometric distribution is:

$E(X) = \frac{1}{p}$

where $$ p $$ is the probability of success in the Bernoulli trial,

$p = \frac{1}{n}$

So that the mean number of trial attempts is:

$$ E(X) = n $$
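The result $E(X) = n$ can be checked by summing the geometric series directly. The sketch assumes $n = 8$ keys (an arbitrary choice), uses the standard geometric probability $P(X = x) = (1-p)^{x-1} p$ , and truncates the infinite sum at a point where the remaining terms are negligible.

```python
n_keys = 8          # number of keys, chosen for illustration
p = 1 / n_keys      # probability of picking the right key on any one attempt


def geometric_pmf(x, p):
    """Probability that the first success happens on attempt x (x = 1, 2, ...)."""
    return (1 - p) ** (x - 1) * p


max_attempts = 2000   # truncation point; terms beyond it are vanishingly small

total_probability = sum(geometric_pmf(x, p) for x in range(1, max_attempts + 1))
mean_attempts = sum(x * geometric_pmf(x, p) for x in range(1, max_attempts + 1))

print(total_probability, mean_attempts)
```

The truncated sum should land very close to n, matching $E(X) = 1/p$ .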

Part B

What if there is no replacement?

With no attempts,

$$ P(X = 0)=0 $$

After the first attempt,

$P(X=1) = \frac{1}{n}$

After the second attempt,

$P(X=2) = \frac{ (n-1) }{ n (n-1) }$

And the third attempt,

$P(X=3) = \frac{ (n-1)(n-2) }{ n (n-1) (n-2) }$

And so on. Each time, all terms cancel out except

$P(X=x) = \frac{1}{n}$

So we can take the expectation of that:

$E(X) = \displaystyle{ \sum_{x=1}^{n} } \frac{x}{n}$

And using the formula for the first $$ n $$ counting numbers,

$E(X) = \frac{1}{n} \frac{n (n+1)}{2}$

$E(X) = \frac{n+1}{2}$
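The cancellation argument for the no-replacement case can be verified with exact fractions. The sketch assumes $n = 8$ keys (an arbitrary choice); `pmf_without_replacement` is an illustrative name that multiplies the chain of failure probabilities by the final success probability.

```python
from fractions import Fraction


def pmf_without_replacement(x, n):
    """P(X = x): the first x - 1 keys drawn are wrong, and the x-th is right."""
    probability = Fraction(1)
    remaining = n
    for _ in range(x - 1):
        probability *= Fraction(remaining - 1, remaining)   # draw a wrong key
        remaining -= 1                                      # set it aside
    return probability * Fraction(1, remaining)             # draw the right key


n_keys = 8   # chosen for illustration
pmf = {x: pmf_without_replacement(x, n_keys) for x in range(1, n_keys + 1)}
mean_attempts = sum(x * p for x, p in pmf.items())

print(pmf)
print(mean_attempts)   # 9/2
```

Every attempt number from 1 to n comes out equally likely, which is why the expectation reduces to the average of the first n counting numbers.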

Multivariate

Set theory

Moved into random variables (mapping from sample space to the real numbers)

Now we want to do a new type of mapping into multiple random variables

n-dimensional random vector - a mapping from a sample space into $\mathbf{R}^n$ Euclidean space.

Joint Probability Mass Function: $f_{X,Y,\dots} (x,y,\dots) = P(X=x, Y=y, \dots )$

Joint Probability Density Function: $P( (X, Y, \dots) \in A ) = \int_{A} \dots \int f_{X,Y,\dots} (x,y,\dots) dx dy \dots$

Joint Cumulative Distribution Function: $P(X \leq x, Y \leq y, \dots ) = F_{X,Y,\dots} (x,y,\dots) = \int_{-\infty}^{x} \int_{-\infty}^{y} \dots f_{X,Y,\dots} (x',y',\dots) \dots dy' dx'$

Question: Why is joint PDF defined in terms of P, whereas the univariate PDF is defined in terms of the CDF?

Answer: Boundaries of multivariate PDFs are often non-trivial, and are not nice even "rectangles"... You need to know the boundaries of the PDF really well to use the CDF, so the joint CDF is not used as often.

Joint Conditional PDF: $f_{X|Y} (x|y) = \frac{ f_{X,Y} (x,y) }{ f_{Y} (y) }$ (in the discrete case this is $P(X=x|Y=y)$ )

The conditional PDF is just a renormalization.

But how is $f_{Y} (y)$ defined, if it's a multivariate PDF?

$f_{Y} (y) = \int_{x=-\infty}^{\infty} f_{X,Y} (x,y) dx$

This is the marginal PDF...

Marginal PDF: $f_{X} (x) = \int_{Y} \dots \int_{Z} f_{X,Y,\dots,Z} (x,y,\dots,z) dz \dots dy$

Definition of independence of $$ X $$ and $$ Y $$ : $f_{X,Y} (x,y) = f_{X} (x) f_{Y} (y)$

$P ( { X \in A, Y \in B } ) = P({ X \in A}) * P({ Y \in B })$

$$ E(g(X) h(Y)) = E(g(X)) * E(h(Y)) $$

If $X$ and $Y$ are independent and $Z = X + Y$ , then the moment generating function of $Z$ is $M_{Z} (t) = M_{X} (t) * M_{Y} (t)$
- This was used in deriving Central Limit Theorem

For example, if

$X \sim Norm(\mu_{x}, \sigma_{x}^2)$

$Y \sim Norm(\mu_{y}, \sigma_{y}^2)$

are independent, then

$Z = X+Y \sim Norm( \mu_{x} + \mu_{y}, \sigma_{x}^2 + \sigma_{y}^2 )$

$$ E(X) = E(E(X|Y)) $$

Define a new variable: covariance

Covariance: $Cov(X,Y) = E((X-\mu_{x})(Y-\mu_{y})) = E(XY) - \mu_{X} \mu_{Y}$

Correlation: $\rho_{XY} = \frac{ Cov(X,Y) }{\sigma_{X} \sigma_{Y}}$

Note: Just because the covariance is 0, i.e. $$ E(XY) = E(X) E(Y) $$ , does not mean that $$ X $$ and $$ Y $$ are independent; zero covariance doesn't imply $f_{X,Y} (x,y) = f_{X} (x) f_{Y} (y)$

Bivariate Normal Distribution

Means $\mu_{X}, \mu_{Y}$
Variances $\sigma_{X}^2, \sigma_{Y}^2$
Correlation $\rho$

$f_{X,Y} (x,y) = \displaystyle{ \frac{ \exp \left\{ \frac{-1}{2(1-\rho^2)} \left[ \left( \frac{x-\mu_{x}}{\sigma_{x}} \right)^2 - 2 \rho \left( \frac{x - \mu_{x}}{\sigma_{x}} \right) \left( \frac{y-\mu_{y}}{\sigma_{y}} \right) + \left( \frac{y-\mu_y}{\sigma_y} \right)^2 \right] \right\} }{ 2 \pi \sigma_x \sigma_y \sqrt{ 1 - \rho^2 } } }$

Transform of a PDF

$\overrightarrow{X}, f_{ \overrightarrow{X} } ( \overrightarrow{x} )$

$\overrightarrow{ U } \rightarrow U_{1} = g_{1} ( \overrightarrow{X} ), U_{2} = g_{2} ( \overrightarrow{X} ), \dots$

$h = g^{-1}$ (assuming $g$ is one-to-one, so that $\overrightarrow{X} = h( \overrightarrow{U} )$ )

$J = \det \left( \left[ \frac{\partial h_i}{\partial u_j} \right]_{i,j} \right)$

The transformed PDF is:

$f_{\overrightarrow{U}} (\overrightarrow{u}) = f_{\overrightarrow{X}} \left[ h_1 (\overrightarrow{u}), h_2 (\overrightarrow{u}), \dots \right] \cdot \vert J \vert$
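A small worked instance of the transformation formula, as a sketch under assumptions not in the notes: $X_1, X_2$ are independent standard normals, and the transform is $U_1 = X_1 + X_2$ , $U_2 = X_1 - X_2$ . The inverse is $h_1 = (u_1 + u_2)/2$ , $h_2 = (u_1 - u_2)/2$ , so $J = -1/2$ . The transformed PDF should then equal the product of two normal PDFs with mean 0 and variance 2.

```python
import math


def f_x(x1, x2):
    """Joint PDF of two independent standard normal variables."""
    return math.exp(-(x1**2 + x2**2) / 2) / (2 * math.pi)


def h(u1, u2):
    """Inverse transform: recover (x1, x2) from u1 = x1 + x2, u2 = x1 - x2."""
    return (u1 + u2) / 2, (u1 - u2) / 2


# Jacobian determinant of h: the matrix of partial derivatives is
# [[1/2, 1/2], [1/2, -1/2]], so J = (1/2)(-1/2) - (1/2)(1/2).
jacobian = 0.5 * (-0.5) - 0.5 * 0.5


def f_u(u1, u2):
    """Transformed PDF: f_X evaluated at h(u), times |J|."""
    x1, x2 = h(u1, u2)
    return f_x(x1, x2) * abs(jacobian)


def normal_pdf(u, mean, variance):
    """Univariate normal PDF, used as an independent reference."""
    return math.exp(-((u - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


points = [(0.0, 0.0), (1.0, -2.0), (0.3, 2.5), (-1.7, 0.4)]
max_gap = max(
    abs(f_u(u1, u2) - normal_pdf(u1, 0.0, 2.0) * normal_pdf(u2, 0.0, 2.0))
    for u1, u2 in points
)

print(jacobian, max_gap)
```

Forgetting the factor $\vert J \vert$ would leave the transformed function integrating to 2 instead of 1, which is an easy way to spot the omission.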

Multivariate Example 1

X and Y have the distribution:

		X
		1	2	3
Y	2	$\frac{1}{12}$	$\frac{1}{6}$	$\frac{1}{12}$
	3	$\frac{1}{6}$	$$ 0 $$	$\frac{1}{6}$
	4	$$ 0 $$	$\frac{1}{3}$	$$ 0 $$

Part A

Show that X and Y are not independent.

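One way to try: show that the covariance is nonzero (a nonzero covariance rules out independence).

$E(XY) = \displaystyle{ \sum_{i=1}^{3} \sum_{j=2}^{4} x_{i} y_{j} f_{X,Y} (x_{i}, y_{j}) }$

For this table, $E(XY) = 6$ and $E(X) E(Y) = 2 \cdot 3 = 6$ , so the covariance is zero and this test is inconclusive.

A more fundamental way: using the definition of independence

$f_{X,Y} (x,y) \stackrel{?}{=} f_{X} (x) * f_{Y} (y)$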

So sum up each row/column and put it in a new row/column

		X
		1	2	3	(sum)
Y	2	$\frac{1}{12}$	$\frac{1}{6}$	$\frac{1}{12}$	$\frac{1}{3}$
	3	$\frac{1}{6}$	$$ 0 $$	$\frac{1}{6}$	$\frac{1}{3}$
	4	$$ 0 $$	$\frac{1}{3}$	$$ 0 $$	$\frac{1}{3}$
	(sum)	$\frac{1}{4}$	$\frac{1}{2}$	$\frac{1}{4}$

Then show that the product of the two marginal PDFs is not equal to the joint PDF value

i.e. pick row i and column j, and if the sum of the joint PDF across the whole row i, times the sum of the joint PDF across the whole column j, does not equal the joint PDF at location (row i col j), then we know the definition of independence is not met. For example, at X = 2, Y = 3 the joint PDF is 0, but the product of the marginals is 1/2 times 1/3 = 1/6.
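The row and column sums, the independence check, and the covariance for this table can all be computed exactly. The sketch below stores the table as a dictionary keyed by (x, y); the variable names are illustrative.

```python
from fractions import Fraction as F

xs = (1, 2, 3)
ys = (2, 3, 4)

# Joint PMF from the table, keyed by (x, y).
joint = {
    (1, 2): F(1, 12), (2, 2): F(1, 6), (3, 2): F(1, 12),
    (1, 3): F(1, 6),  (2, 3): F(0),    (3, 3): F(1, 6),
    (1, 4): F(0),     (2, 4): F(1, 3), (3, 4): F(0),
}

# Marginals: sum each column (for X) and each row (for Y).
f_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
f_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

# Independence requires joint == product of marginals at every cell.
independent = all(joint[(x, y)] == f_x[x] * f_y[y] for x in xs for y in ys)

# Covariance: E(XY) - E(X) E(Y).
e_x = sum(x * f_x[x] for x in xs)
e_y = sum(y * f_y[y] for y in ys)
e_xy = sum(x * y * joint[(x, y)] for x in xs for y in ys)
covariance = e_xy - e_x * e_y

print(f_x, f_y)
print(independent, covariance)
```

For this particular table the covariance works out to exactly zero even though the variables are not independent, so only the cell-by-cell comparison settles Part A.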

Part B

Give a probability table for random variables U and V with the same marginals as X and Y but are independent.

So, we want to keep the "sum" column and row. Then we want to multiply the sum for row i by the sum for column j to fill in each entry:

$f_{U,V} (u,v) = f_{U} (u) * f_{V} (v)$

$f_{U} = f_{X}; f_{V} = f_{Y}$

U - 1 follows a discrete binomial distribution (2 trials, success probability 1/2)

V is uniformly distributed over {2, 3, 4}
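The independent table for U and V can be built by multiplying the marginals cell by cell. The sketch below copies the marginal sums from the table in Part A; the dictionary layout is an illustrative choice.

```python
from fractions import Fraction

# Marginals carried over from X and Y (the "sum" row and column).
f_u = {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)}
f_v = {2: Fraction(1, 3), 3: Fraction(1, 3), 4: Fraction(1, 3)}

# Independence: every cell is the product of its row and column marginals.
joint_uv = {(u, v): f_u[u] * f_v[v] for u in f_u for v in f_v}

# Print one row per value of V, with columns U = 1, 2, 3.
for v in f_v:
    print(v, [str(joint_uv[(u, v)]) for u in f_u])

# Summing the new table recovers the original marginals.
recovered_f_u = {u: sum(joint_uv[(u, v)] for v in f_v) for u in f_u}
recovered_f_v = {v: sum(joint_uv[(u, v)] for u in f_u) for v in f_v}
```

Every row of the new table comes out as 1/12, 1/6, 1/12, so unlike the original table it has no zero entries.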

Notes

If they're independent, they WILL have a zero covariance

(So it follows that, if the covariance is nonzero, there is no way they can be independent)

But, just because the covariance is zero doesn't mean they are independent

October 21, 2010

Review

Hypergeometric - out of n objects, picking k objects without replacement, the probability that x of them are successes

Binomial distribution - out of n objects, picking k objects (Bernoulli trials) with replacement, the probability that x of them are successes

Normal distribution - derived using Central Limit Theorem; central concept is, if you take a large number of independent random variables distributed with the same mean and variance, the distribution of their (centered and scaled) sum approaches a normal distribution as the number goes to infinity

Independence of random variables - the joint PDF is equal to the product of the marginal PDFs

 [[Category:Science]]
 [[Category:Math]]

ProbabilityDiscussion: Difference between revisions

From charlesreid1
