STAT 380: Lecture 11
Matthew Schofield
Motivation
Have some function that we want to maximise
Often a likelihood
Could be another function
Unable/unwilling to do it analytically
e.g. MLEs for Gamma or Beta distribution
y1 , . . . , yn ∼ Gamma(α, β)  (iid)
y1 , . . . , yn ∼ Beta(α, β)  (iid)
Options
Option 1: Use a built-in optimisation routine in R?
nlm
optim
optimize
Problem: need to understand what’s going on!
Option 2: Do it yourself.
What we will focus on here...
Bisection, Newton, and Secant Algorithms
Often we will use the above R functions to check our work
Question
What do the bisection and Newton algorithms do?
Find solutions to the equation f (x) = 0
How can we use them to maximize our likelihood?
Use the methods on the derivative of the likelihood
Numerically Solving an Equation
We have an equation f (x) = 0
Rewrite that equation to give something of the form x = h(x)
Set up an iterative equation: x(i+1) = h(x(i))
Iterate (i.e. continually update) until convergence
Infinite number of h(x) we can choose
e.g. suppose log(x) − x² = 0
We could rewrite that as: x = √(log(x))
We could rewrite that as: x = exp(x²)
etc.
Numerically Solving an Equation
Set up an iterative equation: x(i+1) = h(x(i))
Iterate (i.e. continually update) until convergence
Whole lot of theory (MATH 361) that determines
1. whether there will be a solution
2. whether that solution is unique
3. whether or not the equation will converge
4. if convergent, the (approximate) rate at which it converges
Depending on h(·) the iteration can actually diverge
The speed of convergence depends on the algorithm and the
choice of h(·)
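As a concrete illustration (not from the slides), here is a minimal sketch of the iteration x(i+1) = h(x(i)) in R. The equation cos(x) − x = 0, rewritten as x = h(x) with h(x) = cos(x), is used purely because this particular choice of h happens to converge.

h = function(x) cos(x)

x = 1                                  # initial guess
for (i in 1:100) {
  x.new = h(x)                         # apply x(i+1) = h(x(i))
  if (abs(x.new - x) < 1e-10) break    # stop when successive values are very close
  x = x.new
}
x.new                                  # about 0.739, the fixed point of cos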
Convergence
We want the successive xi values to get closer and closer to
the true answer
Refer to this as a convergent algorithm
Take a practical view of convergence
Usually only know that the xi values get closer and closer to
some answer
Stop the algorithm if successive xi values are very close
Bisection Method
The bisection method is one of the simplest and most
intuitive optimisation algorithms.
Given we wish to find f (x) = 0, the steps are:
1. Set i = 0 and make an initial guess at the solution: x0 .
2. Define a lower and upper limit such that xlow < x0 < xup .
3. Set i ← i + 1 and set x(i) = (xup + xlow)/2.
4. Set xlow = x(i) if SIGN[f(x(i))] = SIGN[f(xlow)], and set xup = x(i) otherwise.
5. Return to step 3. Repeat until convergence: xup − xlow is small.
The solution is the final midpoint, x = (xlow + xup)/2.
Bisection Method
The bisection method steers the search by comparing the sign of the
function at the current midpoint with its sign at the limits.
Each successive iteration halves the interval length.
The function f is required to be continuous between the
limits, but not necessarily differentiable.
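A minimal R sketch of these steps, using a while loop on the interval width as the stopping rule (the function name and arguments are our own choices, not taken from the slides):

bisection = function(f, x.low, x.up, epsilon = 1e-10) {
  # assumes f(x.low) and f(x.up) have opposite signs
  while ((x.up - x.low) > epsilon) {
    x.mid = (x.up + x.low)/2                  # step 3: the midpoint
    if (sign(f(x.mid)) == sign(f(x.low))) {
      x.low = x.mid                           # step 4: root lies in the upper half
    } else {
      x.up = x.mid                            # step 4: root lies in the lower half
    }
  }
  return((x.low + x.up)/2)                    # midpoint of the final (small) interval
}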
Example
Suppose we want to maximise f where
f(x) = −x(log(x) − 1) − x²/4
Find the derivative and set to zero (on board):
Problem
Analytically finding the solution to
f′(x) = −log(x) − x/2 = 0 is impossible (at least for me).
Example: functions in R
f = function(x) {
  # f(x): the function we want to maximise
  fx = -x * (log(x) - 1) - x^2/4
  return(fx)
}
f.prime = function(x) {
  # f'(x): the derivative of f
  fx = -log(x) - x/2
  return(fx)
}
Example: plotting f (x) and f 0 (x)
curve(f, from = 0.1, to = 2, n = 1001)
abline(h = 0, lty = 2)
curve(f.prime, add = T, col = "blue", n = 1001)
legend("bottomleft", legend = c("f", "d/dx f", "zero"), lty = c(1,
1, 2), col = c("black", "blue", "black"))
Example: plotting f(x) and f′(x)
[Figure: f(x) and f′(x) plotted for x from 0.1 to 2, with a dashed horizontal line at zero. Legend: f, d/dx f, zero.]
Example: bisection method
[Figure: f.prime(x) plotted for x between roughly 0.4 and 1.1, with tick marks showing the successive bisection midpoints closing in on the root.]
Solution is approx 0.7034674...
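Using the bisection sketch from earlier on f.prime reproduces this value. The slide does not show the starting interval, so the limits 0.4 and 1.1 below are simply read off the plot:

bisection(f.prime, x.low = 0.4, x.up = 1.1)   # approximately 0.7034674, as above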
Convergence of bisection method
After k steps, the length of the interval is (1/2)^k = 2^(−k) times
its initial length.
The convergence rate is linear
i.e. the rate at which the interval reduces is constant.
Based on this we can define a stopping rule.
Stopping rule for bisection method
There are no hard-and-fast laws regarding stopping rules for
these kinds of algorithms.
Usually base it on the length of the interval
Define some small constant ε
Define convergence to have taken place when (xup − xlow) ≤ ε
Given a certain value for ε, how many steps will it take to
achieve convergence?
Stopping rule for bisection method
Two options:
1. For a given ε, use our knowledge about the bisection method to
find the number of iterations k to ensure the desired degree of
accuracy
Run a for loop for that many iterations (a sketch of this appears below)
Hard to do for the algorithms considered next
2. Run until xup − xlow ≤ ε
Run a while loop
The exact choice of ε will typically depend upon the scale of
the problem.
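Option 2 is the while loop used in the earlier bisection sketch. For completeness, here is a sketch of option 1, with the number of iterations k computed in advance (the formula for k is derived on the next slide; the helper name is our own):

bisection.k = function(f, x.low, x.up, epsilon = 1e-10) {
  # assumes epsilon is smaller than the initial interval width
  z = x.up - x.low                          # initial interval width
  k = ceiling(log(epsilon/z)/log(0.5))      # iterations needed for width < epsilon
  for (i in 1:k) {
    x.mid = (x.up + x.low)/2
    if (sign(f(x.mid)) == sign(f(x.low))) x.low = x.mid else x.up = x.mid
  }
  return((x.low + x.up)/2)
}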
Finding k for bisection method
Define z to be the initial interval width
Want final interval width < ε
Want to find k (number of iterations) such that:
(1/2)^k z < ε
(1/2)^k < ε/z
k log(0.5) < log(ε/z)
k > log(ε/z) / log(0.5)
Example: bisection method
If we want accuracy to ε = 0.01 with starting interval z = 1
z = 1
epsilon = 0.01
(k = ceiling(log(epsilon/z)/log(0.5)))
## [1] 7
If we want accuracy to ε = 1e-10 with starting interval z = 1
epsilon = 1e-10
(k = ceiling(log(epsilon/z)/log(0.5)))
## [1] 34
Using built-in functions
Check our code with built-in functions
optimize
help(optimize)
optimize(f, interval = c(0, 10), maximum = TRUE)
## $maximum
## [1] 0.7035
##
## $objective
## [1] 0.8272
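A similar check can be run with optim (a sketch: method = "Brent" is the one-dimensional option, and control = list(fnscale = -1) turns the default minimisation into maximisation):

optim(par = 0.5, fn = f, method = "Brent", lower = 0, upper = 10,
      control = list(fnscale = -1))   # $par should again be approximately 0.7035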
Exercise: MLE for the Rayleigh distribution
We observe the n = 6 data points
x = c(0.34, 0.77, 1.37, 1.64, 2.46, 0.26)
We believe they are independent samples from the same
Rayleigh distribution.
Exercise: MLE for the Rayleigh distribution
The Rayleigh distribution:
is defined for values x ≥ 0
is controlled by the parameter σ > 0
has density function
f(x; σ) = (x/σ²) exp(−x²/(2σ²)).
Given a sample {x1 , . . . , xn } of size n, the log-likelihood for σ is (up to
an additive constant that does not depend on σ):
ℓ(σ | x1 , . . . , xn ) = −n log(σ²) − (Σi xi²)/(2σ²)
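To make the formula concrete, here is a sketch of the log-likelihood as an R function, evaluated with the data given earlier (the name loglik is our own; the derivative and the bisection steps are left for the exercises):

x = c(0.34, 0.77, 1.37, 1.64, 2.46, 0.26)       # the data from the previous slide
n = length(x)
loglik = function(sigma) {
  -n * log(sigma^2) - sum(x^2)/(2 * sigma^2)    # up to an additive constant
}
curve(loglik, from = 0.5, to = 3)               # the curve peaks at the MLE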
Exercises
1. Starting with the pdf of the Rayleigh distribution, show that
the log-likelihood in the previous slide is correct
2. Find the derivative of the log-likelihood. Specify a function (in
R) that evaluates the derivative (using the data given) at a
given value (or values) of σ
3. Plot the derivative of the log-likelihood function and use this
plot to find appropriate values for σlow and σup that are either
side of the value of σ that leads to a derivative of 0
Exercises
4. Write R code that runs one iteration of the bisection method: i.e.
start with your values of σlow and σup and (i) find the
midpoint σi and (ii) use it to replace either σlow or σup .
5. Write R code that runs k iterations of the bisection method
using a for loop. Run it for k = 7 and k = 34 iterations.
6. Write R code that runs the bisection method until
σup − σlow < ε for some ε using a while loop. Run it with
ε = 0.01 and ε = 1e-10.
7. This is a case where the MLE can be obtained in closed form
(i.e. you can algebraically solve for it). Find the analytical MLE
and use it to ensure your functions above are correct.