Use of R in the UK Office for National Statistics

Duncan Elliott
Time Series Analysis Branch
Survey Methodology and Statistical Computing Division

Outline
• Brief history of R in ONS
• Examples of analysis
    dlm package for modelling unemployment
    spatstat package for crime statistics
• Use in producing National Statistics
    Current: MortalitySmooth package
    Future: ?

Brief history of R in ONS

Year   Version          ONS users
2004   2.0.1            1
2006   2.3.1            5
2011   2.12.1           30
2014   3.0.2 (32-bit)   60

• Used as a research tool for
    Spatial analysis, small area estimation, time series analysis, sample design & estimation

Brief history of R in ONS
• R used to call X-12-ARIMA to analyse multiple seasonal time series efficiently (2006)
• Development of informal training (2007)
• Smoothing of mortality rates (2010)
• Establishment of R Testing Group (2011)
    Methodologists & IT specialists
    Pilots for disclosure tool, admin data processing, production-standard graphics, experience of other NSIs
    Conclusion: useful research tool, not yet for general production systems
• Establishment of R Development Group (2012)
    Aim: testing for production environments
    Development of formal training (2013)

How R is currently used at ONS
• R for Windows on individual workstations
• Restricted use of R
    Delay in updating versions
    No direct access to packages on CRAN or other repositories
    Cannot be used for regular production of a National Statistic (one exception)
• Most users access R via the R GUI (some use of RStudio)
• Used for analysis and research in a growing number of areas at ONS

Analysis with the dlm package
• Giovanni Petris (2010). An R Package for Dynamic Linear Models. Journal of Statistical Software, 36(12), 1-16.
  URL http://www.jstatsoft.org/v36/i12/
• Unemployment statistics currently published as rolling quarterly data
• Aim: state space model for
    Modelling potential discontinuities
    Accounting for Survey Error Autocorrelation (SEA) and Rotation Group Bias (RGB)
    Extracting monthly signal
    Removal of some unobserved components

LFS Survey Design & Estimation
• Quarterly survey with rotating panel design
• 40,000 households per quarter
• Respondents interviewed for 5 successive waves at three-monthly intervals
• Typically wave 1 CAPI, waves 2-5 CATI
• Rolling quarterly estimates use calibration weighting
• Imputation for non-response = roll forward for one period, else zero weight

Cohort and Wave structure

Period / Cohort 1   2   3   4   5   6   7   8   9   10  11  12
Jan-Mar 2012    W5  W4  W3  W2  W1
Apr-Jun 2012        W5  W4  W3  W2  W1
Jul-Sep 2012            W5  W4  W3  W2  W1
Oct-Dec 2012                W5  W4  W3  W2  W1
Jan-Mar 2013                    W5  W4  W3  W2  W1
Apr-Jun 2013                        W5  W4  W3  W2  W1
Jul-Sep 2013                            W5  W4  W3  W2  W1
Oct-Dec 2013                                W5  W4  W3  W2  W1

Multivariate model
• Observations are monthly wave-specific estimates for waves j = 1, 2, ..., 5

\[
\begin{aligned}
y_t^j &= Y_t + a^j + e_t^j \\
Y_t &= L_t + S_t + I_t \\
L_t &= L_{t-1} + R_{t-1} + w_t^L, & w_t^L &\sim N(0, \sigma_L^2) \\
R_t &= R_{t-1} + w_t^R, & w_t^R &\sim N(0, \sigma_R^2) \\
S_t &= -\sum_{i=1}^{10} S_{t-i} + w_t^S, & w_t^S &\sim N(0, \sigma_S^2) \\
I_t &= w_t^I, & w_t^I &\sim N(0, \sigma_I^2)
\end{aligned}
\]

Multivariate model
• Model for wave-specific errors: each $e_t^j$ is related to the error of the same respondents three months earlier, plus a disturbance term
• For example, $e_t^3$ is related to $e_{t-3}^2$

        w1          w2          w3          w4          w5
t-3     e^1_{t-3}   e^2_{t-3}   e^3_{t-3}   e^4_{t-3}   e^5_{t-3}
t-2     e^1_{t-2}   e^2_{t-2}   e^3_{t-2}   e^4_{t-2}   e^5_{t-2}
t-1     e^1_{t-1}   e^2_{t-1}   e^3_{t-1}   e^4_{t-1}   e^5_{t-1}
t       e^1_t       e^2_t       e^3_t       e^4_t       e^5_t

Multivariate model
• State space model

\[
\begin{aligned}
y_t^j &= Y_t + a^j + e_t^j = L_t + S_t + I_t + a^j + e_t^j \\
y_t &= F\theta_t \\
\theta_t &= G\theta_{t-1} + w_t
\end{aligned}
\]

• State vector

\[
\begin{aligned}
\theta_t &= (\theta_t^Y, \theta_t^e) \\
\theta_t^Y &= (L_t, R_t, S_t, S_{t-1}, \ldots, S_{t-10}, I_t, a^2, a^3, \ldots, a^5) \\
\theta_t^e &= (e_t^1, e_t^2, \ldots, e_t^5, e_{t-1}^1, e_{t-1}^2, \ldots, e_{t-1}^5, e_{t-2}^1, e_{t-2}^2, \ldots, e_{t-2}^5)
\end{aligned}
\]

Pseudo Survey Error Autocorrelation
• Estimates based on Pfeffermann et al (1998)

Monthly unemployment, UK, aged 16+

spatstat: visualising crime data
• Adrian Baddeley, Rolf Turner (2005). spatstat: An R Package for Analyzing Spatial Point Patterns. Journal of Statistical Software, 12(6), 1-42.
  URL http://www.jstatsoft.org/v12/i06/
• Crime data is currently released by police.uk at postcode level
• Rich data source, but current presentation could be improved to enable better understanding of trends and complex patterns
• Kernel smoothing done in R, using the spatstat package

Vehicle crime in Greater London
[Figure: map with labelled locations Shepherd's Bush and Ilford]

New methods for small area estimation
• Early research ongoing into new methods of estimating income for small areas (MSOA level)
• Method proposed by Molina and Rao (2010) has been implemented in R as part of this research
    Difficult to implement in our standard software
    Preferred tool for academics involved
• Potentially allows production of quantities of income, such as the median, at MSOA level which were not previously available

R in production of a National Statistic
• 2010 review of mortality rates estimation
    Recommended use of 2-dimensional p-spline
    Method not available in standard software
    Carlo G. Camarda (2012). MortalitySmooth: An R Package for Smoothing Poisson Counts with P-Splines. Journal of Statistical Software, 50(1), 1-24.
    URL http://www.jstatsoft.org/v50/i01/

Unsmoothed mortality improvement rates for females in the UK
[Figure: axes Year (1960 to 2010) and Age; colour scale Improvement Rate]

Smoothed mortality improvement rates for females in the UK

Testing R for production
• Evaluation of survey (Lumley, 2004 and 2010), and ReGenesees (ISTAT, 2014)
    e.g. zero hours contracts, business surveys
• Statistical functions in CORD
    Benchmarking function (Dagum & Cholette, 2006)
    Forecasting? Splining? Seasonal adjustment?
• … other areas of the generic statistical business process model

What would we like to learn?
• What barriers to using R have there been in your organisations and how have you overcome them?
• Experience of organisations where R has been used for systems development (e.g. hosting/calling from servers/integrating with other software)
• Functions relevant for National Accounts
• R and big data

References
• Adrian Baddeley, Rolf Turner (2005). spatstat: An R Package for Analyzing Spatial Point Patterns. Journal of Statistical Software, 12(6), 1-42. URL http://www.jstatsoft.org/v12/i06/
• Carlo G. Camarda (2012). MortalitySmooth: An R Package for Smoothing Poisson Counts with P-Splines. Journal of Statistical Software, 50(1), 1-24. URL http://www.jstatsoft.org/v50/i01/
• Estela Bee Dagum & Pierre A. Cholette (2006). Benchmarking, Temporal Distribution, and Reconciliation Methods for Time Series. Lecture Notes in Statistics 186, Springer.
• ISTAT (2014). http://www.istat.it/it/strumenti/metodi-e-software/software/regenesees
• T. Lumley (2012). "survey: analysis of complex survey samples". R package version 3.28-2.
• T. Lumley (2004). Analysis of complex survey samples. Journal of Statistical Software, 9(1), 1-19.
• Molina, I. and Rao, J.N.K. (2010). 'Small area estimation of poverty indicators'. Canadian Journal of Statistics, Vol. 38 No. 3, pp. 369-385.
• Giovanni Petris (2010). An R Package for Dynamic Linear Models. Journal of Statistical Software, 36(12), 1-16. URL http://www.jstatsoft.org/v36/i12/
• Pfeffermann, D., Feder, M. and Signorelli, D. (1998). 'Estimation of autocorrelations of survey errors with application to trend estimation in small samples'. Journal of Business and Economic Statistics, Vol. 16, pp. 339-348.

Thank you

Contact: [email protected]  +44 16 33 45 56 20

Acknowledgements: Kieran Martin, Ria Sanderson, Daniel Ayoubkhani & Gary Brown
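
Appendix: illustrative dlm sketch

As a rough illustration of the trend, seasonal and irregular components described on the "Multivariate model" slides, the following is a minimal univariate sketch using the dlm package (Petris, 2010). It is not the ONS model: it ignores the wave-specific terms (the a^j and e_t^j components), the data are simulated, and the variances are fixed at arbitrary values chosen only for illustration. The object names (`trend`, `seas`, `signal_hat` and so on) are hypothetical.

```r
# Sketch only: simulated data and fixed variances are illustrative assumptions.
library(dlm)  # must be installed from CRAN

set.seed(1)
n <- 120  # ten years of monthly observations (hypothetical)
y <- ts(50 + cumsum(rnorm(n, sd = 0.5)) +      # slowly moving level
          5 * sin(2 * pi * seq_len(n) / 12) +  # seasonal pattern
          rnorm(n),                            # irregular
        start = c(2004, 1), frequency = 12)

# L_t and R_t: local linear trend (order-2 polynomial DLM).
# dV plays the role of the irregular variance; dW holds the
# level and slope disturbance variances.
trend <- dlmModPoly(order = 2, dV = 1, dW = c(0.25, 0.001))

# S_t: seasonal component; for monthly data dlmModSeas uses 11 states,
# the same count as S_t, ..., S_{t-10} in the state vector.
seas <- dlmModSeas(frequency = 12, dV = 0, dW = c(0.01, rep(0, 10)))

mod <- trend + seas           # combined model with 13 states
sm <- dlmSmooth(y, mod)       # Kalman filter followed by smoother
states <- dropFirst(sm$s)     # drop the time-0 prior row

level_hat    <- states[, 1]   # smoothed L_t
seasonal_hat <- states[, 3]   # smoothed S_t
signal_hat   <- level_hat + seasonal_hat  # smoothed L_t + S_t
```

In practice the variances would be estimated rather than fixed, for example with `dlmMLE()`, and the wave-level model would need a multivariate observation equation with the survey error terms added to the state vector.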