PROJECT WORK IN SF2943 TIME SERIES ANALYSIS 2014
FILIP LINDSKOG & PIERRE NYQUIST
1. About the project work
This course has mandatory project work as an essential learning activity. The course
aims to provide you with sufficient skills to analyze time series data, and such skills are
not likely obtained by only attending lectures and solving exercises from the course literature. You learn how to analyze time series data by analyzing time series data. The
problems below that define the project work are intentionally a bit vaguely formulated.
You are required to use good judgement in deciding how to address the problems. You are
encouraged to let your curiosity lead you to investigate other aspects of the problems as well.
Make sure that you read the following project instructions carefully!
The project work is evaluated on the basis of your written report and gives between 0 and
10 bonus points on the written exam. The project work is done in groups of 1 to 3
students.
You may use whatever software you like (e.g. Matlab, R, ITSM2000), use available
functions for time series analysis, or write your own functions. Whatever you choose, you
are required to state precisely what you are using and what the specific function does. If
you only give the name of a function and its output, we will probably not be convinced,
when evaluating your report, that you know what you are doing.
Use the LaTeX template available from the course webpage to write your project report.
Clear, concise, and well-written reports will be rewarded. Unnecessarily lengthy reports
should be avoided.
There are four mandatory seminars: two project discussion seminars (10 April and 24
April), one project presentation seminar (6 May), and one seminar where you present
solutions to problems from old exams (15 May).
For the first project discussion seminar you should come prepared to discuss the project
work with other groups, and you are required to bring plots and preliminary results from
your project work in order to make the discussions meaningful for all parties.
For the second project discussion seminar you are required to bring a preliminary version
of your project report containing at least partial solutions to Problems 1-8.
For the project presentation seminar you must be prepared to present your solutions to
Problems 1-8. We will inform you one day in advance what part of the project work you
should present. If one member of your group gives a presentation that indicates that he/she
does not fully understand the work of your group, then that will affect the evaluation of
the entire project work of your group.
Date: 2014-04-01.
Towards the end of the course you should study at least three old exams and select
two problems that you find interesting and relevant for the course: Problem 9 below. You
should be prepared to present solutions to these two problems on 15 May. The day before,
on 14 May, you should hand in a very brief report explaining why you selected the two
problems, what they test, and if possible suggest modifications to the problems that make
them more relevant or interesting.
To ensure that you are sufficiently prepared to participate in the project work, each
member of your group is required to pass a multiple choice test that tests basic knowledge
of time series analysis. Passing the test means giving at least 6 correct answers out of 10.
If you fail the test at least one day prior to the deadline, you will have one more
opportunity to pass it before the deadline. Failure to pass the test prior to
the deadline disqualifies you from participating in the project work.
2. Deadlines
• 15:00 on 9 April 2014: deadline for sending your answers to the multiple choice
test by email to lindskog<AT>kth<DOT>se or pierren<AT>kth<DOT>se.
• During the lecture on 30 April 2014: hand in your complete project report, written
on paper.
• During the lecture on 14 May 2014: hand in your report (half a page to one page) on the
two chosen exam problems, written on paper.
3. Project work
Problem 1. How do you know whether it is plausible that a time series data set is a
realization of white noise, or even a sequence of independent and identically distributed
(iid) random variables?
Simulate iid samples of varying sizes and compute the sample autocorrelations for different lags. It is claimed that the sample autocorrelations for different lags, based on a
common sufficiently large sample, are approximately independent and normally distributed
with zero mean and variance 1/n (n is the sample size). Investigate this claim by doing
relevant simulations of iid samples, performing the relevant tests, and analyzing the results.
Apply the other methods discussed in Section 1.6 of the course textbook on iid samples
and study their performance.
Repeat the analysis with the iid sample replaced by low order AR and MA processes
with small coefficients. How large must the coefficients be in order to detect deviations
from white noise? What is the effect of the sample size? What can you say about the
possibility to detect a small trend in the data?
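As a starting point for the simulations in Problem 1, the following is a minimal sketch in Python with NumPy (our choice of language; any software is allowed). It simulates iid standard normal samples, computes sample autocorrelations, and compares the scaled values with the claimed N(0, 1/n) behaviour. The function name `sample_acf`, the seed, and the values of `n`, `max_lag`, and `n_rep` are illustrative assumptions, not part of the project instructions.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(1), ..., rho_hat(max_lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    gamma0 = np.dot(d, d) / n  # sample autocovariance at lag 0
    return np.array([np.dot(d[:n - h], d[h:]) / n / gamma0
                     for h in range(1, max_lag + 1)])

rng = np.random.default_rng(0)      # seed chosen arbitrarily
n, max_lag, n_rep = 500, 20, 1000   # illustrative sizes

# Each row holds the sample ACF of one simulated iid N(0,1) sample.
acfs = np.array([sample_acf(rng.standard_normal(n), max_lag)
                 for _ in range(n_rep)])

# Under the claim, sqrt(n) * rho_hat(h) is approximately N(0,1),
# so about 5% of the values should fall outside +-1.96.
scaled = np.sqrt(n) * acfs
frac_outside = np.mean(np.abs(scaled) > 1.96)
print("mean of scaled ACF:    ", scaled.mean())
print("variance of scaled ACF:", scaled.var())
print("fraction outside bounds:", frac_outside)
```

Replacing `rng.standard_normal(n)` with a simulated AR or MA sample, or adding a small trend, lets you reuse the same loop for the remaining questions in this problem.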
Problem 2. Consider the AR(2) time series model
\[
X_t - 1.3 X_{t-1} + 0.65 X_{t-2} = Z_t, \qquad \{Z_t\} \sim \mathrm{WN}(0, 280), \tag{1}
\]
where the white noise sequence consists of independent and normally distributed random
variables. Is the time series stationary? Causal? Compute and plot the autocorrelation
function and the spectral density of the time series. Explain how it can be seen from
the plots that the autocorrelation function and the spectral density come from the AR(2)
model (1).
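The quantities asked for in Problem 2 can be computed numerically along the following lines. This is a hedged sketch, not a prescribed solution: it checks causality through the zeros of the AR polynomial, computes the autocorrelation function from the Yule-Walker recursion, and evaluates the spectral density with the convention f(λ) = σ²/(2π |φ(e^{-iλ})|²) on [-π, π]. The function names `ar2_acf` and `ar2_spectral_density` are our own, and plotting is left out.

```python
import numpy as np

# Model (1) rewritten as X_t = phi1 X_{t-1} + phi2 X_{t-2} + Z_t.
phi1, phi2, sigma2 = 1.3, -0.65, 280.0

# Zeros of phi(z) = 1 - phi1 z - phi2 z^2 (np.roots expects the
# coefficient of the highest power first).
zeros = np.roots([-phi2, -phi1, 1.0])
is_causal = bool(np.all(np.abs(zeros) > 1.0))

def ar2_acf(phi1, phi2, max_lag):
    """Autocorrelations rho(0..max_lag) of a causal AR(2), max_lag >= 1."""
    rho = np.empty(max_lag + 1)
    rho[0] = 1.0
    rho[1] = phi1 / (1.0 - phi2)
    for h in range(2, max_lag + 1):
        rho[h] = phi1 * rho[h - 1] + phi2 * rho[h - 2]
    return rho

def ar2_spectral_density(lam, phi1, phi2, sigma2):
    """Spectral density sigma2 / (2 pi |phi(exp(-i lam))|^2)."""
    z = np.exp(-1j * np.asarray(lam, dtype=float))
    return sigma2 / (2.0 * np.pi) / np.abs(1.0 - phi1 * z - phi2 * z**2) ** 2

rho = ar2_acf(phi1, phi2, 30)
lam = np.linspace(-np.pi, np.pi, 2000, endpoint=False)
f = ar2_spectral_density(lam, phi1, phi2, sigma2)

print("zeros of the AR polynomial:", zeros, "moduli:", np.abs(zeros))
print("causal:", is_causal)
print("frequency of the spectral peak:", abs(lam[np.argmax(f)]))
```

A useful sanity check is that the spectral density integrates to the variance γ(0) = σ² / (1 - φ₁ρ(1) - φ₂ρ(2)) over [-π, π].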
Problem 3. Simulate 100 samples of size 200 from the AR(2) model (1), with independent
and normally distributed $Z_t$.
(a) For each sample, use at least two different methods for estimating the model parameters.
For each method, make a scatter plot of the 100 pairs of estimated coefficients of the AR
polynomial. Which method is best in the sense that it produces the smallest one-step
mean-squared prediction error? Compare the histograms (with the same x-axis) of the
one-step prediction errors. Also investigate whether the filtered residuals from the fitted
model and the simulated data have the same (normal) distribution as the original simulated
sample $\{Z_t\}$.
(b) For each sample, fit both an AR(2) model and an AR(10) model to the data. Compare
how well the two fitted models do one-step and three-step predictions.
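One possible way to set up the simulation study in Problem 3 is sketched below in Python with NumPy. It simulates from model (1) using a burn-in period to approximate a stationary start and applies Yule-Walker estimation, which is only one of the estimation methods you may choose. The function names, the burn-in length of 500, and the seed are assumptions made for illustration.

```python
import numpy as np

def simulate_ar2(n, phi1, phi2, sigma, rng, burn_in=500):
    """Simulate X_t = phi1 X_{t-1} + phi2 X_{t-2} + Z_t with Gaussian Z_t."""
    z = sigma * rng.standard_normal(n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(2, n + burn_in):
        x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + z[t]
    return x[burn_in:]  # drop the burn-in so the start values do not matter

def yule_walker_ar2(x):
    """Yule-Walker estimates (phi1, phi2) and white noise variance."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    gamma = np.array([np.dot(d[:n - h], d[h:]) / n for h in range(3)])
    R = np.array([[gamma[0], gamma[1]],
                  [gamma[1], gamma[0]]])
    phi = np.linalg.solve(R, gamma[1:])
    sigma2 = gamma[0] - phi @ gamma[1:]
    return phi, sigma2

rng = np.random.default_rng(1)  # seed chosen arbitrarily
estimates = np.array([
    yule_walker_ar2(simulate_ar2(200, 1.3, -0.65, np.sqrt(280.0), rng))[0]
    for _ in range(100)
])
print("mean of the 100 estimates:", estimates.mean(axis=0))
print("std of the 100 estimates: ", estimates.std(axis=0))
```

The array `estimates` has one row per simulated sample, so its two columns can be passed directly to a scatter plot routine.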
Problem 4. Consider the AR(1) process
\[
X_t = 0.8 X_{t-1} + Z_t, \qquad \{Z_t\} \sim \mathrm{WN}(0, 1). \tag{2}
\]
A causal autoregressive process can be represented as a moving average process of infinite
order. In particular, this holds for the AR(1) process above. Simulate 100 samples of
size 200 from the AR(1) model, with independent and normally distributed $Z_t$. For each
sample, fit both an AR(1) model and an MA(10) model to the data. Compare how well
the two fitted models do one-step and three-step predictions.
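As a benchmark for the prediction comparison, it may help to write out what the true model (2) implies when its parameters are known. The notation $P_n X_{n+h}$ for the best linear predictor of $X_{n+h}$ given observations up to time $n$ is our assumption about notation; the white noise variance 1 is taken from (2).

```latex
\begin{align*}
X_t &= \sum_{j=0}^{\infty} 0.8^{j} Z_{t-j}, \\
P_n X_{n+h} &= 0.8^{h} X_n, \\
E\bigl[(X_{n+h} - P_n X_{n+h})^2\bigr]
  &= \sum_{j=0}^{h-1} 0.8^{2j}
   = \frac{1 - 0.8^{2h}}{1 - 0.8^{2}}.
\end{align*}
```

For $h = 1$ the mean-squared error is 1, and for $h = 3$ it is $1 + 0.64 + 0.4096 = 2.0496$. Fitted models with estimated parameters cannot be expected to do better than these values on average.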
Problem 5. Consider the AR(1) process (2) with independent $Z_t$ and with $c + Z_t$ being
lognormally distributed for an appropriate choice of $c$. Simulate 100 samples of size 200
from the AR(1) model. For each sample fit an AR(1) model to the data. Make a scatter plot
of the estimated parameter pair corresponding to the AR(1) coefficient and the standard
deviation of the white noise sequence. Compare how well the fitted models do one-step
predictions with the corresponding findings in Problem 4.
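One way to construct the noise in Problem 5 is sketched below. If $Y = \exp(\mu + sN)$ with $N$ standard normal, then $E[Y] = \exp(\mu + s^2/2)$ and $\mathrm{Var}(Y) = (e^{s^2} - 1)\exp(2\mu + s^2)$, so $Z = Y - c$ has mean 0 and variance 1 when $c = E[Y]$ and $\mu = -\tfrac{1}{2}\bigl(s^2 + \ln(e^{s^2} - 1)\bigr)$. The shape parameter `s` is a free choice that the problem does not specify; the value `s = 1.0`, the function names, and the burn-in length are assumptions for illustration.

```python
import numpy as np

def lognormal_noise_params(s):
    """Return (mu, c) such that exp(mu + s N) - c has mean 0 and variance 1."""
    mu = -0.5 * (s**2 + np.log(np.expm1(s**2)))
    c = np.exp(mu + 0.5 * s**2)  # c equals the mean of the lognormal variable
    return mu, c

def simulate_ar1_lognormal(n, phi, s, rng, burn_in=500):
    """Simulate X_t = phi X_{t-1} + Z_t with centred lognormal noise Z_t."""
    mu, c = lognormal_noise_params(s)
    z = rng.lognormal(mean=mu, sigma=s, size=n + burn_in) - c
    x = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + z[t]
    return x[burn_in:], z[burn_in:]

s = 1.0                          # assumed shape parameter
mu, c = lognormal_noise_params(s)
rng = np.random.default_rng(2)   # seed chosen arbitrarily
x, z = simulate_ar1_lognormal(200, 0.8, s, rng)
print("mu =", mu, " c =", c)
print("sample mean and variance of the noise:", z.mean(), z.var())
```

Because the lognormal distribution is heavy-tailed, the sample variance of the noise in a sample of size 200 can differ noticeably from 1; this is expected and is part of what the problem asks you to explore.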
Problem 6. Consider the six time series data sets in
Data_Series_1.txt, Data_Series_2.txt, ..., Data_Series_6.txt.
Whenever necessary, transform the data series into time series that appear to be realizations of stationary time series. How do you know whether you have succeeded in removing
possible trends and seasonal components? Choose a stationary time series model for the
transformed time series data. Present the analysis you do to decide on the appropriate
time series model class: AR(p), MA(q), or ARMA(p, q). What are appropriate values for p
and/or q? Estimate the parameters of the time series model. There are several approaches
available for parameter estimation. Choose at least two approaches that seem appropriate
for the data, compare the results, and elaborate on the reasons for preferring one approach
over another.
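Differencing is one common way to remove trend and seasonal components before fitting a stationary model. The sketch below applies it to a synthetic series rather than to the course data files, since their layout is not described here. The period 12, the trend slope, and the helper name `difference` are illustrative assumptions.

```python
import numpy as np

def difference(x, lag=1):
    """Lag-d difference: returns x_t - x_{t-lag} for t = lag, ..., n-1."""
    x = np.asarray(x, dtype=float)
    return x[lag:] - x[:-lag]

# Synthetic example: linear trend + seasonal component of period 12 + noise.
rng = np.random.default_rng(3)  # seed chosen arbitrarily
t = np.arange(120)
deterministic = 2.0 + 0.5 * t + 3.0 * np.sin(2.0 * np.pi * t / 12.0)
series = deterministic + rng.standard_normal(len(t))

# Lag-12 differencing removes the seasonal component and turns the linear
# trend into a constant; lag-1 differencing then removes that constant.
stationary_candidate = difference(difference(series, lag=12), lag=1)
print("length after differencing:", len(stationary_candidate))
print("sample mean after differencing:", stationary_candidate.mean())
```

Note that differencing changes the dependence structure of the noise (here it introduces moving average terms), which should be kept in mind when choosing the model class for the transformed series.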
Problem 7. Choose time series data that you would like to analyze. Present the time
series and explain why you want to analyze it. Analyze the time series using appropriate
techniques used in your solution to Problem 6, and possibly other techniques that seem
relevant here. Present your analysis and describe your findings.
Problem 8. The data set in co2_mm_mlo.txt contains average monthly atmospheric carbon dioxide levels measured at Mauna Loa, Hawaii, since 1958 (available from the website http://www.esrl.noaa.gov/gmd/ccgg/trends/). Predict the average carbon dioxide
level in July 2014 and comment on the accuracy of the forecast.
Problem 9. Go through and analyze three previous exams in the course. The exams are
available from the course webpage. Choose two problems from the three exams that you
find particularly relevant for the course. Present the solutions to the problems and explain
what the problems test and why the problems are relevant exam problems. Suggest, if
possible, modifications to the problems that would make them better.