Exercise

Graduate School of ISEE
June 25, 2014
確率・統計特論 (Probability and Statistics) #10
来嶋 秀治 (Shuji Kijima)
Note: You must indicate the information sources you use.
Today's topics: Linear regression, multivariate distributions, model selection
1. Linear regression
1-i. Linear regression: simple linear regression. Observed values x, y are supposed to satisfy y = α + βx + e with e ∼ N(0, σ²) for unknown parameters α, β. Given k samples (x1, y1), (x2, y2), …, (xk, yk), we try to estimate α and β. The least squares estimator (α̂, β̂) is defined by

    (\hat{\alpha}, \hat{\beta}) = \arg\min_{\alpha', \beta'} \sum_{i=1}^{k} \bigl(y_i - (\alpha' + \beta' x_i)\bigr)^2,
and the analytic solution is

    \hat{\beta} = s_{x,y} / s_x^2, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x},

where

    \bar{x} = \sum_{i=1}^{k} x_i / k, \quad \bar{y} = \sum_{i=1}^{k} y_i / k, \quad \overline{x^2} = \sum_{i=1}^{k} x_i^2 / k, \quad \overline{xy} = \sum_{i=1}^{k} x_i y_i / k,

    s_x^2 = \overline{x^2} - \bar{x}^2, \qquad s_{x,y} = \overline{xy} - \bar{x} \cdot \bar{y}.
Remember Cov[X, Y] = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y].
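The closed-form solution above can be computed directly. The following is a minimal sketch of that computation; the function name and sample data are illustrative, not from the handout.

```python
# Sketch: closed-form simple linear regression, following the formulas above.

def fit_simple_regression(xs, ys):
    """Return (alpha_hat, beta_hat) for the model y = alpha + beta*x + e."""
    k = len(xs)
    x_bar = sum(xs) / k
    y_bar = sum(ys) / k
    x2_bar = sum(x * x for x in xs) / k
    xy_bar = sum(x * y for x, y in zip(xs, ys)) / k
    s_x2 = x2_bar - x_bar ** 2       # sample variance of x
    s_xy = xy_bar - x_bar * y_bar    # sample covariance of x and y
    beta_hat = s_xy / s_x2
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

# On noiseless data generated from y = 1 + 2x, the estimator recovers
# alpha = 1 and beta = 2 exactly.
alpha_hat, beta_hat = fit_simple_regression([0, 1, 2, 3], [1, 3, 5, 7])
```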
1-ii. Linear regression: multiple regression. Observed values y, x1, …, xn are supposed to satisfy y = α + \sum_{i=1}^{n} β_i x_i + e with e ∼ N(0, σ²) for unknown parameters α, β1, …, βn. Multiple regression is the estimation of α, β1, …, βn from k given samples (x_1, y_1), …, (x_k, y_k), where each x_j ∈ Rⁿ (=⇒ Ex 2).
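The estimation of 1-ii reduces to a least squares problem with an intercept column. A minimal sketch using numpy's least-squares solver, with illustrative data (not from the handout):

```python
# Sketch: multiple regression via least squares (numpy), assuming the model
# y = alpha + beta_1*x_1 + ... + beta_n*x_n + e.
import numpy as np

def fit_multiple_regression(X, y):
    """X: (k, n) matrix of samples, y: length-k vector.
    Returns (alpha_hat, beta_hat) minimizing the sum of squared residuals."""
    k = X.shape[0]
    A = np.hstack([np.ones((k, 1)), X])   # prepend a column of 1s for the intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

# Noiseless data generated from y = 1 + 2*x1 - 3*x2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 1 + 2 * X[:, 0] - 3 * X[:, 1]
alpha_hat, beta_hat = fit_multiple_regression(X, y)
```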
1-iii. Other regressions. In general, we can consider an arbitrary functional relationship that the random variables satisfy. For example, observed values x, y may be supposed to satisfy y = e′Cx^α with log(e′) ∼ N(0, σ²) for unknown parameters α, C (=⇒ Ex 1). The logistic regression is another example of a non-linear regression.
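The power-law model above becomes linear after taking logarithms: log y = log C + α log x + log e′. A minimal sketch of this log-log reduction, with illustrative data:

```python
# Sketch: fitting y = e' * C * x**alpha by taking logarithms, so that
# log y = log C + alpha * log x + log e' is a simple linear regression.
import math

def fit_power_law(xs, ys):
    """Return (C_hat, alpha_hat) for y = C * x**alpha via log-log regression."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    k = len(lx)
    lx_bar = sum(lx) / k
    ly_bar = sum(ly) / k
    s_x2 = sum(v * v for v in lx) / k - lx_bar ** 2
    s_xy = sum(u * v for u, v in zip(lx, ly)) / k - lx_bar * ly_bar
    alpha_hat = s_xy / s_x2                           # slope of the log-log fit
    C_hat = math.exp(ly_bar - alpha_hat * lx_bar)     # exp of the intercept
    return C_hat, alpha_hat

# Noiseless data generated from y = 3 * x**2.
C_hat, alpha_hat = fit_power_law([1.0, 2.0, 4.0], [3.0, 12.0, 48.0])
```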
2. Multivariate distributions
(i) Multivariate normal distribution N (µ, Σ)
(µ = (µ1 , . . . , µn )⊤ ∈ Rn , Σ = (σi,j ) ∈ Rn×n : variance-covariance matrix. )
    f(x) = \frac{1}{(2\pi)^{n/2} \sqrt{\det(\Sigma)}} \exp\Bigl( -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \Bigr) \qquad (x ∈ Rⁿ)
where

    \Sigma = \begin{pmatrix}
      \sigma_1^2   & \sigma_{1,2} & \cdots & \sigma_{1,n} \\
      \sigma_{1,2} & \sigma_2^2   & \cdots & \sigma_{2,n} \\
      \vdots       & \vdots       & \ddots & \vdots       \\
      \sigma_{1,n} & \sigma_{2,n} & \cdots & \sigma_n^2
    \end{pmatrix}
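The density formula above can be evaluated directly with numpy. A minimal sketch; the parameter values at the end are illustrative:

```python
# Sketch: evaluating the multivariate normal density from the formula above.
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, for x, mu in R^n and an n x n covariance sigma."""
    n = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm

# With sigma = I and x = mu, the exponential factor is 1, so the density
# equals 1 / (2*pi)^(n/2); for n = 2 this is 1 / (2*pi).
mu = np.zeros(2)
sigma = np.eye(2)
value = mvn_pdf(mu, mu, sigma)
```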
(ii) Dirichlet distribution Dir(α) (α = (α1, …, αn) ∈ Rⁿ_{>0})
Sample space Ω = {x = (x1, …, xn) ∈ Rⁿ_{>0} | x1 + · · · + xn = 1},

    f(x) = \frac{1}{B(\alpha)} \prod_{i=1}^{n} x_i^{\alpha_i - 1} \qquad (x ∈ Ω)

where

    B(\alpha) = \int_{\Omega} \prod_{i=1}^{n} x_i^{\alpha_i - 1} \, dx = \frac{\prod_{i=1}^{n} \Gamma(\alpha_i)}{\Gamma\bigl( \sum_{i=1}^{n} \alpha_i \bigr)}.
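The Gamma-function identity for B(α) makes the density easy to evaluate. A minimal sketch; the α values used below are illustrative:

```python
# Sketch: the Dirichlet normalizing constant B(alpha) and density, from the
# Gamma-function identity above.
import math

def dirichlet_B(alpha):
    """B(alpha) = prod_i Gamma(alpha_i) / Gamma(sum_i alpha_i)."""
    num = math.prod(math.gamma(a) for a in alpha)
    return num / math.gamma(sum(alpha))

def dirichlet_pdf(x, alpha):
    """Density of Dir(alpha) at a point x on the simplex x_1 + ... + x_n = 1."""
    dens = math.prod(xi ** (a - 1) for xi, a in zip(x, alpha))
    return dens / dirichlet_B(alpha)

# Dir(1, 1, 1) is uniform on the 2-simplex: B(1,1,1) = 1/Gamma(3) = 1/2,
# so the density is the constant 2 everywhere on the simplex.
value = dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])
```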
(iii) Multinomial distribution M(p, K) (p = (p1, …, pn) ∈ Rⁿ_{>0}, \sum_{i=1}^{n} p_i = 1)
Sample space Ω = {z ∈ Zⁿ_{≥0} | z1 + · · · + zn = K}

    f(z) = \frac{K!}{z_1! \, z_2! \cdots z_n!} \, p_1^{z_1} \cdots p_n^{z_n} \qquad (z ∈ Ω)
Remark that the multivariate normal and Dirichlet distributions are continuous, while the multinomial distribution is discrete.
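The multinomial mass function can be computed directly from the formula. A minimal sketch; the parameters below are illustrative:

```python
# Sketch: the multinomial probability mass function f(z) from the formula above.
import math

def multinomial_pmf(z, p):
    """P(Z = z) under M(p, K), with K = sum(z)."""
    K = sum(z)
    coef = math.factorial(K)
    for zi in z:
        coef //= math.factorial(zi)   # multinomial coefficient K!/(z_1!...z_n!)
    prob = float(coef)
    for zi, pi in zip(z, p):
        prob *= pi ** zi
    return prob

# With n = 2 this reduces to the binomial pmf: for K = 3, p = (0.5, 0.5),
# P(z = (2, 1)) = C(3, 2) * 0.5**3 = 0.375.
value = multinomial_pmf([2, 1], [0.5, 0.5])
```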
3. How to select a model? —Occam's razor
AIC (Akaike's information criterion; 赤池情報量基準): Given k samples x1, …, xk, the AIC for a distribution (stochastic model) f(x; θ) with parameters θ = (θ0, θ1, …, θq) is defined by

    \mathrm{AIC} := -2 \sum_{i=1}^{k} \log L(\hat{\theta}_0, \ldots, \hat{\theta}_q; x_i) + 2(q + 2)
                  = -2(\text{maximum log-likelihood}) + 2(\text{number of parameters}),

where θ̂ denotes the maximum likelihood estimator. A model with a small AIC is good.
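As a concrete instance, consider fitting a Gaussian model N(µ, σ²) to data; its MLEs are the sample mean and the (biased) sample variance, and the AIC follows from the definition above. A minimal sketch with illustrative data:

```python
# Sketch: AIC for a simple Gaussian model N(mu, sigma^2), whose MLEs are the
# sample mean and the biased sample variance.
import math

def gaussian_aic(xs):
    """AIC = -2 * (maximum log-likelihood) + 2 * (number of parameters)."""
    k = len(xs)
    mu = sum(xs) / k
    var = sum((x - mu) ** 2 for x in xs) / k
    log_lik = sum(-0.5 * math.log(2 * math.pi * var)
                  - (x - mu) ** 2 / (2 * var) for x in xs)
    n_params = 2                      # mu and sigma^2
    return -2 * log_lik + 2 * n_params

aic = gaussian_aic([1.2, 0.8, 1.1, 0.9, 1.0])
```

In practice, one computes such a value for each candidate model on the same data and prefers the model with the smallest AIC.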
Remark that there are many other similar criteria, such as BIC (Bayesian Information Criterion) and MDL (Minimum Description Length). Relationships among some of them have been studied well.
Exercises
Ex 1. Observed values x and y are supposed to satisfy y = e′Cx^α with log(e′) ∼ N(0, σ²) for unknown parameters α and C. Given k samples (x1, y1), …, (xk, yk), estimate α and C.
Hint: consider the logarithm of y = e′Cx^α.
Ex 2. Observed values x, y, z are supposed to satisfy z = ax + by + c for unknown parameters a, b, c.
Given k samples (x1 , y1 , z1 ), . . . , (xk , yk , zk ), estimate a, b, c.
Reference (in Japanese)
樺島祥介, 北川源四郎, 甘利俊一, 赤池弘次, 下平英寿, 土谷隆 (eds.), 室田一雄 (ed.), 赤池情報量規準 AIC—モデリング・予測・知識発見 (Akaike Information Criterion AIC: Modeling, Prediction, and Knowledge Discovery), 共立出版 (Kyoritsu Shuppan) (2007).