Distribution models

3
Ignacio Cascos
Depto. Estadística, Universidad Carlos III
Outline
1. Discrete distributions
Binomial distribution
Geometric distribution
Poisson distribution
2. Continuous distributions
Uniform distribution
Exponential distribution
Normal distribution
Central Limit Theorem
3. Multivariate normal distribution
Binomial distribution
A random experiment consists of n trials such that:
- the trials are independent;
- each trial results either in a success or in a failure;
- the probability of a success, p, remains constant.
The random variable that equals the number of trials that result in a success follows a binomial distribution with parameters n = 1, 2, … and 0 < p < 1,
X ~ B(n, p)
X≡ number of (indep.) trials that result in a success
Binomial distribution
Probability mass function:
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k \in \{0, 1, \ldots, n\}.$$
It is possible to write X = X1 + … + Xn, where X1, …, Xn are independent random variables with Xi ~ B(1, p).
Parameters:
E[X] = np ; Var[X] = np(1−p)
If X ~ B(n1, p) and Y ~ B(n2, p) are independent, then X + Y ~ B(n1 + n2, p).
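As a quick numerical sanity check of the mass function and parameters above, the following Python sketch (standard library only; `binom_pmf` and `convolve` are illustrative helper names, not part of any library) evaluates B(50, 0.7), recovers E[X] = np and Var[X] = np(1−p) by direct summation, and checks the additivity property by convolving the mass functions of B(3, p) and B(4, p).

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def convolve(a: list[float], b: list[float]) -> list[float]:
    """Mass function of X + Y for independent X, Y with mass functions a, b on 0, 1, 2, ..."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out


n, p = 50, 0.7
probs = [binom_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(probs))
var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
print(mean, n * p)             # both approximately 35
print(var, n * p * (1 - p))    # both approximately 10.5

# Additivity: the mass function of B(3, p) + B(4, p) should match B(7, p) term by term.
sum_pmf = convolve([binom_pmf(k, 3, p) for k in range(4)],
                   [binom_pmf(k, 4, p) for k in range(5)])
direct = [binom_pmf(k, 7, p) for k in range(8)]
print(max(abs(x - y) for x, y in zip(sum_pmf, direct)))  # rounding-level difference
```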
Binomial distribution
[Figure: probability mass functions of B(5, 0.7) and B(50, 0.7)]
Example (Four flatmates and a die)
Four flatmates, Alice, Bob, Charly, and Dave, roll a 6-sided die every night in order to decide who washes the dishes after dinner. If the outcome is 1, Alice washes; if it is 2, Bob does; for 3 or 4 Charly must wash the dishes; and for 5 or 6 it is Dave's turn.
a) What is the probability that Charly washes the
dishes at most twice in a week (7 days)?
b) How many days (dinners) must we wait on
average until it is Alice's turn to wash the dishes?
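For part a), Charly washes whenever the die shows 3 or 4, so each night is a trial with success probability p = 2/6 = 1/3, and the number of nights he washes in a week is B(7, 1/3). A minimal Python sketch (the helper name `binom_cdf` is illustrative) adds up the first three terms of the mass function; the exact value is (128 + 448 + 672)/2187 = 1248/2187.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ B(n, p), by summing the mass function."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


p_charly = 2 / 6                     # outcomes 3 or 4 out of 6
prob_a = binom_cdf(2, 7, p_charly)   # at most twice in 7 nights
print(round(prob_a, 4))              # 0.5706
```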
Geometric distribution
In a series of independent trials with constant
probability of a success, 0<p<1, the random
variable denoting the number of trials until the
first success follows a geometric distribution
with parameter p
X ~ G(p)
X≡ number of (indep) trials until the first success
Geometric distribution
Probability mass function:
$$P(X = k) = (1-p)^{k-1} p, \quad k \in \{1, 2, 3, \ldots\}.$$
Parameters: E[X] = 1/p ; Var[X] = (1−p)/p²
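Part b) of the flatmates example is a geometric question: Alice washes only when the die shows 1, so the number of nights until her first turn is G(1/6), and E[X] = 1/p = 6 nights. The Python sketch below (the helper name `geom_pmf` is illustrative) checks both parameters by summing the mass function over a truncated support; truncating at 2000 trials is an assumption that leaves a negligible tail.

```python
def geom_pmf(k: int, p: float) -> float:
    """P(X = k) for X ~ G(p): first success on trial k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p


p_alice = 1 / 6                 # Alice washes only on a 1
ks = range(1, 2001)             # truncated support; (5/6)**2000 is negligible
mean = sum(k * geom_pmf(k, p_alice) for k in ks)
var = sum((k - mean) ** 2 * geom_pmf(k, p_alice) for k in ks)
print(mean, 1 / p_alice)                  # both approximately 6
print(var, (1 - p_alice) / p_alice**2)    # both approximately 30
```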
Geometric distribution
[Figure: probability mass functions of G(0.3) and G(0.5)]
Poisson distribution
Assume that certain events occur in a fixed interval or region (a period of time, an area, a volume, …) independently of one another, with a known average number of events λ > 0 in that interval. The random variable that equals the number of events occurring in the interval follows a Poisson distribution with parameter λ,
X ~ ℘(λ)
X≡ number of events in the interval
Poisson distribution
Probability mass function:
$$P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k \in \{0, 1, 2, \ldots\}.$$
Parameters: E[X] = λ ; Var[X] = λ
If X ~ ℘(λ1) and Y ~ ℘(λ2) are independent, then X + Y ~ ℘(λ1 + λ2)
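The Poisson facts above can be checked numerically in a few lines. This Python sketch (standard library only; `poisson_pmf` is an illustrative helper) recovers E[X] = Var[X] = λ for λ = 3 on a truncated support (an assumption: the tail beyond k = 99 is negligible), and verifies additivity by convolving ℘(1) and ℘(3) and comparing with ℘(4).

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


lam = 3.0
ks = range(100)  # truncated support; negligible tail for lam = 3
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
print(mean, var)  # both approximately 3

# Additivity: P(X + Y = k) by convolution, compared with the Poisson(lam1 + lam2) mass function.
lam1, lam2 = 1.0, 3.0
max_diff = max(
    abs(sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2) for j in range(k + 1))
        - poisson_pmf(k, lam1 + lam2))
    for k in range(20)
)
print(max_diff)  # rounding-level difference
```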
Poisson distribution
[Figure: probability mass functions of ℘(1) and ℘(3)]
Example (radioactive material)
A sample of radioactive material emits, on average,
15 alpha particles per minute. If the number of alpha
particles emitted follows a Poisson distribution, what is
the probability of 10 alpha particles being emitted in:
a) 1 minute?
b) 2 minutes?
c) Many years later, the material averages 6 alpha
particles emitted per min. What is the probability of
at least 6 alpha particles being emitted in 1 minute?
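A short Python sketch of the three computations, assuming (as the example states) Poisson counts with mean proportional to the length of the interval: λ = 15 for one minute, λ = 30 for two minutes, and λ = 6 for part c), where P(X ≥ 6) = 1 − P(X ≤ 5). The helper `poisson_pmf` is illustrative.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


rate_per_min = 15
p_a = poisson_pmf(10, rate_per_min * 1)   # a) 1 minute: lambda = 15
p_b = poisson_pmf(10, rate_per_min * 2)   # b) 2 minutes: lambda = 30
# c) lambda = 6: P(X >= 6) = 1 - P(X <= 5)
p_c = 1 - sum(poisson_pmf(k, 6) for k in range(6))
print(round(p_a, 4), p_b, round(p_c, 4))  # approximately 0.0486, 1.5e-05 and 0.5543
```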
(Continuous) Uniform distribution
A random variable uniformly distributed in the interval (a,b) represents a number chosen at random between a and b. The selection is made in such a way that the probability that the random variable lies in any interval inside (a,b) depends only on the length of that interval,
X ~ U(a,b)
(Continuous) Uniform distribution
Density function:
$$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{if } x \in (a,b) \\ 0 & \text{if } x \notin (a,b) \end{cases}$$
Cumulative distribution function:
$$F(x) = \begin{cases} 0 & \text{if } x \le a \\ \dfrac{x-a}{b-a} & \text{if } a < x < b \\ 1 & \text{if } x \ge b \end{cases}$$
Parameters: E[X] = (a+b)/2 ; Var[X] = (b−a)²/12
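The following Python sketch (standard library only; `unif_cdf` is an illustrative helper) implements the cumulative distribution function above for U(1, 3), shows that two intervals of the same length inside (1, 3) receive the same probability, and compares the mean and variance formulas with a seeded Monte Carlo sample drawn with `random.uniform`.

```python
import random


def unif_cdf(x: float, a: float, b: float) -> float:
    """F(x) for X ~ U(a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


a, b = 1.0, 3.0
# Two intervals of length 0.5 inside (1, 3) get the same probability, 0.25.
print(unif_cdf(1.7, a, b) - unif_cdf(1.2, a, b))
print(unif_cdf(2.9, a, b) - unif_cdf(2.4, a, b))

# Monte Carlo check of E[X] = (a+b)/2 = 2 and Var[X] = (b-a)**2/12 = 1/3.
random.seed(0)
xs = [random.uniform(a, b) for _ in range(100_000)]
m = sum(xs) / len(xs)
v = sum((x - m) ** 2 for x in xs) / len(xs)
print(m, v)
```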
(Continuous) Uniform distribution
[Figure: density and cumulative distribution function of U(1, 3)]
Exponential distribution
The random variable that equals the distance (for instance, the time) between successive events in a Poisson process with rate λ > 0 (λ events per unit, on average) follows an exponential distribution with parameter λ,
X ~ Exp(λ)
X≡ distance between successive events
Exponential distribution
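Density function:
$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$
Cumulative distribution function:
$$F(x) = \begin{cases} 1 - e^{-\lambda x} & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$
Parameters: E[X] = λ⁻¹ ; Var[X] = λ⁻²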
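A minimal Python check of the exponential formulas, using the mean-10 case (λ = 0.1). Note that the standard library's `random.expovariate` takes the rate λ, not the mean; `exp_cdf` is an illustrative helper. The simulated mean and the empirical proportion of draws at or below 10 are compared with 1/λ and F(10) = 1 − e⁻¹.

```python
import random
from math import exp


def exp_cdf(x: float, lam: float) -> float:
    """F(x) for X ~ Exp(lam), with lam the rate (mean 1/lam)."""
    return 1 - exp(-lam * x) if x > 0 else 0.0


lam = 0.1  # mean 1/lam = 10
random.seed(1)
# random.expovariate takes the rate, i.e. 1 / mean.
xs = [random.expovariate(lam) for _ in range(100_000)]
sample_mean = sum(xs) / len(xs)
frac_below_10 = sum(x <= 10 for x in xs) / len(xs)
print(sample_mean)                       # close to 1/lam = 10
print(frac_below_10, exp_cdf(10, lam))   # both close to 1 - e^{-1}, about 0.632
```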
Exponential distribution
[Figure: density and cumulative distribution function of the exponential distribution with mean 10]
Exponential distribution
Lack of memory property.
For an exponential random variable T,
given t1,t2>0
P(T > t1+t2 | T > t1) = P(T > t2)
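The lack of memory follows from P(T > t) = e^{−λt}: the conditional probability is e^{−λ(t1+t2)}/e^{−λt1} = e^{−λt2}. The Python sketch below checks this identity exactly and by simulation; the values λ = 0.5, t1 = 2 and t2 = 1 are arbitrary choices for illustration.

```python
import random
from math import exp

lam, t1, t2 = 0.5, 2.0, 1.0

# Exact: P(T > t) = exp(-lam * t), so the conditional probability is a ratio of survivals.
conditional = exp(-lam * (t1 + t2)) / exp(-lam * t1)
print(conditional, exp(-lam * t2))   # equal up to rounding

# Simulation: among draws with T > t1, the fraction with T > t1 + t2.
random.seed(2)
draws = [random.expovariate(lam) for _ in range(200_000)]
survivors = [t for t in draws if t > t1]
empirical = sum(t > t1 + t2 for t in survivors) / len(survivors)
print(empirical)  # close to exp(-0.5), about 0.607
```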
Example (radioactive material)
On average, a sample of radioactive material emits
15 alpha particles per minute.
a) What is the average time between the emission of
two alpha particles?
b) What is the probability that the time between the
emission of two alpha particles is longer than 10 sec?
c) The last alpha particle was emitted 10 seconds ago. What is the probability that more than 10 additional seconds pass before the next particle is emitted?
Normal distribution
The most widely used model for the distribution
of a random variable is a normal (or Gaussian)
distribution. Among other relevant properties, it arises as the limiting distribution in the Central Limit Theorem. A normal
distribution is determined by two parameters, the
mean µ and the standard deviation σ>0,
X ~ N(µ,σ)
Normal distribution
Density function of the standard normal N(0,1):
$$f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$$
Density function of N(µ,σ):
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
Parameters: E[X] = µ ; Var[X] = σ²
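The normal density has no elementary antiderivative, but its cumulative distribution function can be written with the error function as Φ(z) = (1 + erf(z/√2))/2, available in Python's `math` module. This sketch (`norm_pdf` and `norm_cdf` are illustrative helpers) codes both functions and uses a midpoint-rule integration over µ ± 10σ for N(2, 1.5), an arbitrary example, to check total mass 1, mean µ and variance σ².

```python
from math import erf, exp, pi, sqrt


def norm_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma), sigma being the standard deviation."""
    z = (x - mu) / sigma
    return exp(-z * z / 2) / (sigma * sqrt(2 * pi))


def norm_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """CDF of N(mu, sigma) via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


# Midpoint-rule check that N(2, 1.5) integrates to 1 with mean 2 and variance 2.25.
mu, sigma = 2.0, 1.5
lo, hi, steps = mu - 10 * sigma, mu + 10 * sigma, 20_000
h = (hi - lo) / steps
xs = [lo + (i + 0.5) * h for i in range(steps)]
total = sum(norm_pdf(x, mu, sigma) for x in xs) * h
mean = sum(x * norm_pdf(x, mu, sigma) for x in xs) * h
var = sum((x - mu) ** 2 * norm_pdf(x, mu, sigma) for x in xs) * h
print(total, mean, var)                 # approximately 1, 2, 2.25
print(norm_cdf(0), norm_cdf(1.96))      # 0.5 and about 0.975
```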
Normal distribution
[Figure: density and cumulative distribution function of N(0, 1)]
Normal distribution
[Figure: densities of N(0, 1) (black) and N(2, 1) (red); densities of N(0, 0.5) (red), N(0, 1) (black) and N(0, 2) (blue)]
Normal distribution
Properties.
1. If X ~ N(µ,σ), then for every a ≠ 0 and every b, aX+b ~ N(aµ+b, |a|σ).
2. If X ~ N(µ1,σ1) and Y ~ N(µ2,σ2) are independent, then for every a and b (not both zero), aX+bY ~ N(aµ1+bµ2, (a²σ1²+b²σ2²)^{1/2}).
Standardization. Given X ~ N(µ,σ), the random variable (X−µ)/σ follows a standard normal distribution, N(0,1).
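A seeded simulation can illustrate property 2 and standardization. The parameter values below (X ~ N(1, 2), Y ~ N(−3, 0.5), a = 2, b = −4) are arbitrary choices; `random.gauss(mu, sigma)` takes the standard deviation as its second argument. The sketch only checks means and standard deviations, not the normal shape itself.

```python
import random
from math import sqrt

random.seed(3)
mu1, s1, mu2, s2 = 1.0, 2.0, -3.0, 0.5   # X ~ N(1, 2), Y ~ N(-3, 0.5), independent
a, b = 2.0, -4.0
n = 100_000

xs = [random.gauss(mu1, s1) for _ in range(n)]
ys = [random.gauss(mu2, s2) for _ in range(n)]
ws = [a * x + b * y for x, y in zip(xs, ys)]


def mean_sd(values):
    """Sample mean and (population-style) standard deviation."""
    m = sum(values) / len(values)
    return m, sqrt(sum((v - m) ** 2 for v in values) / len(values))


# Property 2 predicts mean a*mu1 + b*mu2 = 14 and sd sqrt(a^2 s1^2 + b^2 s2^2) = sqrt(20).
print(mean_sd(ws), (a * mu1 + b * mu2, sqrt(a**2 * s1**2 + b**2 * s2**2)))
# Standardization: (X - mu1) / s1 should have mean about 0 and sd about 1.
print(mean_sd([(x - mu1) / s1 for x in xs]))
```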
Table for the N(0,1) cdf
Example (Capacitors)
A machine makes capacitors with a mean value
of 25 µF and a standard deviation of 6 µF.
Assuming that capacitance follows a Gaussian
distribution, find the probability that the value of
capacitance exceeds 31 µF.
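By standardization, P(X > 31) = P(Z > (31 − 25)/6) = 1 − Φ(1). In place of the printed table, a short Python sketch can evaluate Φ through `math.erf` (the helper `std_normal_cdf` is illustrative):

```python
from math import erf, sqrt


def std_normal_cdf(z: float) -> float:
    """Phi(z) for the standard normal N(0, 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


mu, sigma = 25.0, 6.0           # capacitance in microfarads
z = (31.0 - mu) / sigma         # standardized value: 1.0
p_exceed = 1 - std_normal_cdf(z)
print(z, round(p_exceed, 4))    # 1.0 0.1587
```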
Central Limit Theorem
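Given n independent random variables X1, X2, …, Xn with finite means and variances, E[Xi] = µi and Var[Xi] = σi², the distribution of their sum is approximately normal for large n (under mild conditions; for instance, when the Xi are identically distributed):
$$X_1 + X_2 + \cdots + X_n \approx N\left(\sum_{i=1}^{n} \mu_i, \Big(\sum_{i=1}^{n} \sigma_i^2\Big)^{1/2}\right)$$
As a rule of thumb, the approximation is usually good for n > 30.
If the variables are discrete (integer-valued), we apply a continuity correction: each integer value k is represented by the interval (k − 0.5, k + 0.5) of the approximating normal distribution.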
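A seeded simulation gives a feel for the theorem. In this Python sketch each sum has 30 independent U(0, 1) terms (mean 1/2, variance 1/12 each), so the normal approximation is N(15, √2.5); the empirical proportion of sums at or below 16 is compared with Φ((16 − 15)/√2.5). The choice of uniform terms and of the threshold 16 is arbitrary.

```python
import random
from math import erf, sqrt


def std_normal_cdf(z: float) -> float:
    """Phi(z) for the standard normal N(0, 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


random.seed(4)
n_terms, n_sums = 30, 20_000
# Each X_i ~ U(0, 1): mean 1/2, variance 1/12, so the sum has mean 15 and variance 2.5.
sums = [sum(random.random() for _ in range(n_terms)) for _ in range(n_sums)]
mu, sd = n_terms * 0.5, sqrt(n_terms / 12)

empirical = sum(s <= 16 for s in sums) / n_sums
approx = std_normal_cdf((16 - mu) / sd)
print(empirical, approx)  # both about 0.74
```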
Normal approximations
Normal approximation to the Binomial distribution. A Binomial distribution B(n,p) with n > 30 and np(1−p) > 5 is approximately
$$N\left(np, \sqrt{np(1-p)}\right)$$
[Figure: probability mass function of B(50, 0.7) and density of N(35, 3.24)]
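To see what the continuity correction buys, this Python sketch compares the exact value of P(X ≤ 33) for X ~ B(50, 0.7) with the normal approximation N(35, √10.5), evaluated once at 33.5 (with correction) and once at 33 (without). The cut-off k = 33 is an arbitrary illustration; the corrected value should be noticeably closer to the exact one.

```python
from math import comb, erf, sqrt


def std_normal_cdf(z: float) -> float:
    """Phi(z) for the standard normal N(0, 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


n, p = 50, 0.7
mu, sd = n * p, sqrt(n * p * (1 - p))          # 35 and about 3.24

k = 33
exact = sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))
with_cc = std_normal_cdf((k + 0.5 - mu) / sd)   # continuity correction
without_cc = std_normal_cdf((k - mu) / sd)
print(exact, with_cc, without_cc)
```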
Example (Four flatmates and a die)
Four flatmates, Alice, Bob, Charly, and Dave, roll a 6-sided die every night in order to decide who washes the dishes after dinner. If the outcome is 1, Alice washes; if it is 2, Bob does; for 3 or 4 Charly must wash the dishes; and for 5 or 6 it is Dave's turn.
c) How many days (dinners) must we wait so that, with probability at least 0.95, Bob has washed the dishes at least 11 times?
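For part c), the number of times Bob washes in n nights is B(n, 1/6), and we need the smallest n with P(X ≥ 11) ≥ 0.95. A sketch using the normal approximation with continuity correction, P(X ≥ 11) ≈ P(Y > 10.5), simply increases n until the target is reached (the helper names are illustrative). At the resulting n, both conditions n > 30 and np(1−p) > 5 hold.

```python
from math import erf, sqrt


def std_normal_cdf(z: float) -> float:
    """Phi(z) for the standard normal N(0, 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


p_bob = 1 / 6   # Bob washes only on a 2


def approx_prob_at_least_11(n: int) -> float:
    """Normal approximation with continuity correction to P(X >= 11), X ~ B(n, 1/6)."""
    mu, sd = n * p_bob, sqrt(n * p_bob * (1 - p_bob))
    return 1 - std_normal_cdf((10.5 - mu) / sd)


n = 1
while approx_prob_at_least_11(n) < 0.95:
    n += 1
print(n, approx_prob_at_least_11(n))  # 100 days, probability about 0.951
```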
Normal approximations
Normal approximation to the Poisson distribution. A Poisson distribution ℘(λ) with λ > 5 is approximately N(λ, √λ)
[Figure: probability mass function of ℘(49) and density of N(49, 7)]
Example (radioactive material)
On average, a sample of radioactive material emits
15 alpha particles per minute. What is the approximate
probability of 10 alpha particles being emitted in:
a) 1 minute?
b) Many years later, the material averages 6 alpha
particles emitted per min. What is the probability of
at least 6 alpha particles being emitted in 1 minute?
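With the continuity correction, P(X = 10) for ℘(15) is approximated by P(9.5 < Y < 10.5) with Y ~ N(15, √15), and P(X ≥ 6) for ℘(6) by P(Y > 5.5) with Y ~ N(6, √6). This Python sketch computes both approximations next to the exact Poisson values; note that λ = 6 is only just above the λ > 5 threshold, so a visible gap is to be expected.

```python
from math import erf, exp, factorial, sqrt


def std_normal_cdf(z: float) -> float:
    """Phi(z) for the standard normal N(0, 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_approx_between(lo: float, hi: float, lam: float) -> float:
    """P(lo < Y < hi) for Y ~ N(lam, sqrt(lam))."""
    sd = sqrt(lam)
    return std_normal_cdf((hi - lam) / sd) - std_normal_cdf((lo - lam) / sd)


# a) lambda = 15: P(X = 10) is approximated by P(9.5 < Y < 10.5)
approx_a = normal_approx_between(9.5, 10.5, 15)
exact_a = exp(-15) * 15**10 / factorial(10)

# b) lambda = 6: P(X >= 6) is approximated by P(Y > 5.5)
approx_b = 1 - std_normal_cdf((5.5 - 6) / sqrt(6))
exact_b = 1 - sum(exp(-6) * 6**k / factorial(k) for k in range(6))

print(round(approx_a, 4), round(exact_a, 4))  # about 0.045 vs 0.0486
print(round(approx_b, 4), round(exact_b, 4))  # about 0.581 vs 0.5543
```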
Bivariate normal distribution
The density function of a random vector that follows a bivariate normal distribution with mean vector (µ1,µ2) and covariance matrix Σ is
$$f(x_1, x_2) = \frac{1}{2\pi |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x_1-\mu_1, x_2-\mu_2)\, \Sigma^{-1} \begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{pmatrix}\right).$$
If
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$
then
$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2 - 2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right)\right]\right\}.$$
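The two expressions for the bivariate normal density should agree. This Python sketch (helper names are illustrative) evaluates the general matrix form, using the closed-form inverse of a 2×2 matrix, and the form in terms of σ1, σ2 and ρ at an arbitrary point, with σ1 = 1, σ2 = 3 and an assumed ρ = 0.8. It also checks that for ρ = 0 the joint density factorizes into the product of the normal marginals.

```python
from math import exp, pi, sqrt


def bvn_pdf_matrix(x1, x2, mu1, mu2, s11, s12, s22):
    """Bivariate normal density from the general formula, Sigma = [[s11, s12], [s12, s22]]."""
    det = s11 * s22 - s12 * s12
    d1, d2 = x1 - mu1, x2 - mu2
    # Quadratic form d^T Sigma^{-1} d, using the closed-form inverse of a 2x2 matrix.
    quad = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return exp(-quad / 2) / (2 * pi * sqrt(det))


def bvn_pdf_rho(x1, x2, mu1, mu2, sigma1, sigma2, rho):
    """Same density written in terms of sigma1, sigma2 and the correlation rho."""
    z1, z2 = (x1 - mu1) / sigma1, (x2 - mu2) / sigma2
    q = (z1 * z1 + z2 * z2 - 2 * rho * z1 * z2) / (1 - rho * rho)
    return exp(-q / 2) / (2 * pi * sigma1 * sigma2 * sqrt(1 - rho * rho))


def norm_pdf(x, mu, sigma):
    """Univariate N(mu, sigma) density."""
    return exp(-((x - mu) / sigma) ** 2 / 2) / (sigma * sqrt(2 * pi))


sigma1, sigma2, rho = 1.0, 3.0, 0.8
args = (0.7, -1.2, 0.0, 0.5)   # x1, x2, mu1, mu2 (arbitrary)
v_matrix = bvn_pdf_matrix(*args, sigma1**2, rho * sigma1 * sigma2, sigma2**2)
v_rho = bvn_pdf_rho(*args, sigma1, sigma2, rho)
print(v_matrix, v_rho)   # equal up to rounding

# With rho = 0 the joint density factorizes into the product of the marginals.
print(bvn_pdf_rho(*args, sigma1, sigma2, 0.0),
      norm_pdf(0.7, 0.0, sigma1) * norm_pdf(-1.2, 0.5, sigma2))
```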
Bivariate normal distribution
If X1 and X2 have a bivariate normal distribution with mean vector (µ1,µ2) and covariance matrix Σ, the marginal distributions of X1 and X2 are normal,
X1 ~ N(µ1,σ1) and X2 ~ N(µ2,σ2),
where σ1² and σ2² are the diagonal entries of Σ. The correlation coefficient ρ measures the (linear) dependence between the variables.
Bivariate normal distribution
[Figure: bivariate normal densities for ρ = 0, σ1 = 1, σ2 = 3; ρ = 0, σ1 = σ2; ρ = 0.8, σ1 = σ2; and ρ = −0.8, σ1 = σ2]
Bivariate normal distribution
Properties. Given (X1,X2) a normal random vector with mean vector (µ1,µ2) and covariance matrix
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$
1. if ρ = 0, then X1 and X2 are independent;
2. given a1, a2 ∈ ℝ (not both zero), a1X1 + a2X2 is normal, with mean a1µ1 + a2µ2 and variance a1²σ1² + a2²σ2² + 2a1a2ρσ1σ2.
Example
Given (X1,X2) a normal bivariate random vector
with mean vector (50,45) and covariance matrix
$$\Sigma = \begin{pmatrix} 6 & 2 \\ 2 & 4 \end{pmatrix}.$$
Determine:
a) P(4X1+X2 ≥ 250)
b) P(X1+4X2 ≥ 220)
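Both parts reduce to a univariate normal: a1X1 + a2X2 has mean a1µ1 + a2µ2 and variance a1²Var[X1] + a2²Var[X2] + 2a1a2Cov[X1, X2], read off Σ. For a), 4X1 + X2 has mean 245 and variance 16·6 + 4 + 2·4·2 = 116; for b), X1 + 4X2 has mean 230 and variance 6 + 16·4 + 2·4·2 = 86. A Python sketch (the helper `prob_at_least` is illustrative):

```python
from math import erf, sqrt


def std_normal_cdf(z: float) -> float:
    """Phi(z) for the standard normal N(0, 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


mu = (50.0, 45.0)
cov = ((6.0, 2.0), (2.0, 4.0))   # Var[X1] = 6, Cov[X1, X2] = 2, Var[X2] = 4


def prob_at_least(a1: float, a2: float, c: float) -> float:
    """P(a1*X1 + a2*X2 >= c); the linear combination is normal."""
    m = a1 * mu[0] + a2 * mu[1]
    v = a1**2 * cov[0][0] + a2**2 * cov[1][1] + 2 * a1 * a2 * cov[0][1]
    return 1 - std_normal_cdf((c - m) / sqrt(v))


p_a = prob_at_least(4, 1, 250)   # mean 245, variance 116
p_b = prob_at_least(1, 4, 220)   # mean 230, variance 86
print(round(p_a, 4), round(p_b, 4))  # about 0.321 and 0.860
```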