PROJECT WORK IN SF2943 TIME SERIES ANALYSIS 2014

FILIP LINDSKOG & PIERRE NYQUIST

Date: 2014-04-01.

1. About the project work

This course has mandatory project work as an essential learning activity. The course aims to provide you with sufficient skills to analyze time series data, and such skills are unlikely to be obtained only by attending lectures and solving exercises from the course literature. You learn how to analyze time series data by analyzing time series data.

The problems below, which define the project work, are intentionally formulated somewhat vaguely. You are required to use good judgement in deciding how to address the problems. You are encouraged to let your curiosity lead you to investigate other aspects of the problems as well.

Make sure that you read the following project instructions carefully!

The project work is evaluated on the basis of your written report and gives bonus points on the written exam in the range 0 to 10 points. The project work is done in groups of 1 to 3 students.

You may use whatever software you like (e.g. Matlab, R, ITSM2000), use available functions for time series analysis, or write your own functions. Whatever you choose, you are required to state precisely what you are using and what the specific function does. If you only give the name of the function and its output, then we will probably not, when evaluating your report, be convinced that you know what you are doing.

Use the LaTeX template available from the course webpage to write your project report. Clear, concise, and well-written reports will be rewarded; unnecessarily lengthy reports should be avoided.

There are four mandatory seminars: two project discussion seminars (10 April and 24 April), one project presentation seminar (6 May), and one seminar where you present solutions to problems from old exams (15 May).

To the first project discussion seminar you should come prepared to discuss the project work with other groups, and you are required to bring plots and preliminary results from your project work in order to make the discussions meaningful for all parties.

To the second project discussion seminar you are required to bring a preliminary version of your project report containing at least partial solutions to Problems 1-8.

For the project presentation seminar you must be prepared to present your solutions to Problems 1-8. We will inform you one day in advance what part of the project work you should present. If one member of your group gives a presentation that indicates that he/she does not fully understand the work of your group, then that will affect the evaluation of the entire project work of your group.

Towards the end of the course you should study at least three old exams and select two problems that you find interesting and relevant for the course: Problem 9 below. You should be prepared to present solutions to these two problems on 15 May. The day before, on 14 May, you should hand in a very brief report explaining why you selected the two problems, what they test, and, if possible, suggesting modifications to the problems that make them more relevant or interesting.

To ensure that you are sufficiently prepared to participate in the project work, each member of your group is required to pass a multiple choice test that tests basic knowledge of time series analysis. Passing the test means having at least 6 correct answers out of 10. If you fail the test once, at least one day prior to the deadline, then you will have another opportunity to pass the test prior to the deadline. Failure to pass the test prior to the deadline disqualifies you from participating in the project work.

2. Deadlines

• 15:00 on 9 April 2014: deadline for sending your answers to the multiple choice test by email to lindskog<AT>kth<DOT>se or pierren<AT>kth<DOT>se.
• During the lecture on 30 April 2014: hand in your complete project report, written on paper.
• During the lecture on 14 May 2014: hand in your report of 1/2 to 1 page on the chosen two exam problems, written on paper.

3. Project work

Problem 1. How do you know whether it is plausible that a time series data set is a realization of white noise, or even a sequence of independent and identically distributed (iid) random variables? Simulate iid samples of varying sizes and compute the sample autocorrelations for different lags. It is claimed that the sample autocorrelations for different lags, based on a common sufficiently large sample, are approximately independent and normally distributed with zero mean and variance 1/n (n is the sample size). Investigate this claim by doing relevant simulations of iid samples, performing the relevant tests, and analyzing the results. Apply the other methods discussed in Section 1.6 of the course textbook to iid samples and study their performance. Repeat the analysis with the iid sample replaced by low order AR and MA processes with small coefficients. How large must the coefficients be in order to detect deviations from white noise? What is the effect of the sample size? What can you say about the possibility of detecting a small trend in the data?

Problem 2. Consider the AR(2) time series model

(1)  $X_t - 1.3X_{t-1} + 0.65X_{t-2} = Z_t$,  $\{Z_t\} \sim \mathrm{WN}(0, 280)$,

where the white noise sequence consists of independent and normally distributed random variables. Is the time series stationary? Causal? Compute and plot the autocorrelation function and the spectral density of the time series. Explain how it can be seen from the plots that the autocorrelation function and the spectral density come from the AR(2) model (1).

Problem 3.
Simulate 100 samples of size 200 from the AR(2) model (1), with independent and normally distributed $Z_t$s.
(a) For each sample, use at least two different methods for estimating the model parameters. For each method, make a scatter plot of the 100 pairs of estimated coefficients of the AR polynomial. Which method is best in the sense that it produces the smallest one-step mean-squared prediction error? Compare the histograms (with the same x-axis) of the one-step prediction errors. Also investigate whether the filtered residuals from the fitted model and the simulated data have the same (normal) distribution as the original simulated sample $\{Z_t\}$.
(b) For each sample, fit both an AR(2) model and an AR(10) model to the data. Compare how well the two fitted models do one-step and three-step predictions.

Problem 4. Consider the AR(1) process

(2)  $X_t = 0.8X_{t-1} + Z_t$,  $\{Z_t\} \sim \mathrm{WN}(0, 1)$.

A causal autoregressive process can be represented as a moving average process of infinite order. In particular, this holds for the AR(1) process above. Simulate 100 samples of size 200 from the AR(1) model, with independent and normally distributed $Z_t$s. For each sample, fit both an AR(1) model and an MA(10) model to the data. Compare how well the two fitted models do one-step and three-step predictions.

Problem 5. Consider the AR(1) process (2) with independent $Z_t$s and with $c + Z_t$ being lognormally distributed for an appropriate choice of $c$. Simulate 100 samples of size 200 from the AR(1) model. For each sample, fit an AR(1) model to the data. Make a scatter plot of the estimated parameter pair corresponding to the AR(1) coefficient and the standard deviation of the white noise sequence. Compare how well the fitted model does one-step predictions with the findings in Problem 4.

Problem 6. Consider the six time series data sets in Data_Series_1.txt, Data_Series_2.txt, ..., Data_Series_6.txt.
Whenever necessary, transform the data series into time series that appear to be realizations of stationary time series. How do you know whether you have succeeded in removing possible trends and seasonal components? Choose a stationary time series model for the transformed time series data. Present the analysis you do to decide on the appropriate time series model class: AR(p), MA(q), or ARMA(p, q). What are appropriate values for p and/or q? Estimate the parameters of the time series model. There are several approaches available for parameter estimation. Choose at least two approaches that seem appropriate for the data, compare the results, and elaborate on the reasons for preferring one approach over another.

Problem 7. Choose time series data that you would like to analyze. Present the time series and explain why you want to analyze it. Analyze the time series using appropriate techniques used in your solution to Problem 6, and possibly other techniques that seem relevant here. Present your analysis and describe your findings.

Problem 8. The data set in co2_mm_mlo.txt contains average monthly atmospheric carbon dioxide levels measured at Mauna Loa, Hawaii, since 1958 (available from the website http://www.esrl.noaa.gov/gmd/ccgg/trends/). Predict the average carbon dioxide level in July 2014 and comment on the accuracy of the forecast.

Problem 9. Go through and analyze three previous exams in the course. The exams are available from the course webpage. Choose two problems from the three exams that you find particularly relevant for the course. Present the solutions to the problems and explain what the problems test and why the problems are relevant exam problems. Suggest, if possible, modifications to the problems that would make them better.
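
As a possible starting point for the simulations asked for in Problem 1, the following is a minimal sketch in Python with NumPy (just one of many tools you may use). It simulates iid standard normal samples, computes sample autocorrelations, and checks two consequences of the claimed N(0, 1/n) approximation: that roughly 95% of the sample autocorrelations fall within ±1.96/√n, and that n times their empirical variance is close to 1. The function names sample_acf and acf_coverage, the choice of 20 lags, 500 replications, and the seed are illustrative assumptions, not part of the course material, and the sketch does not address the independence part of the claim.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag (divisor n at every lag)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    gamma0 = np.dot(d, d) / n  # sample autocovariance at lag 0
    return np.array(
        [np.dot(d[: n - h], d[h:]) / n / gamma0 for h in range(1, max_lag + 1)]
    )


def acf_coverage(n, max_lag=20, n_rep=500, seed=0):
    """Simulate n_rep iid N(0, 1) samples of size n (assumed settings).

    Returns the fraction of sample autocorrelations inside +-1.96/sqrt(n)
    and n times their empirical variance, pooled over lags and replications.
    """
    rng = np.random.default_rng(seed)
    bound = 1.96 / np.sqrt(n)
    acfs = np.array(
        [sample_acf(rng.standard_normal(n), max_lag) for _ in range(n_rep)]
    )
    inside = np.mean(np.abs(acfs) <= bound)
    scaled_var = n * acfs.var()
    return inside, scaled_var


if __name__ == "__main__":
    for n in (50, 200, 1000):
        inside, scaled_var = acf_coverage(n)
        print(f"n={n}: fraction inside bounds={inside:.3f}, n*variance={scaled_var:.3f}")
```

Pooling over lags is a simplification; examining each lag separately, testing normality, and replacing rng.standard_normal(n) with a simulated AR or MA sample are natural next steps for the remaining questions in Problem 1.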