Panel data
Applied Time Series Econometrics
Spring 2014

Advantages of panel data
• More observations
• Time dimension allows taking dynamics into account
• More variation in data (over time, over cross sections)
• Unobserved variables that may be correlated with the variables in the model can be eliminated

Handling panel data
• Some concepts:
• Panel data / longitudinal data / pooled cross section – time series data:
  – Same observation units followed over time
• Balanced panel:
  – Information available on all observation units for all time periods
• Unbalanced panel:
  – For some units, not all time periods available
• Unbalanced panel caused by:
  – Some survey respondents stop participation
    • sample attrition
  – New individuals taken to sample (to replace exited ones)
  – Rotation of sample
  – Demographic events
    • in firm data: entry of new firms, exit of firms, mergers
    • in individual data: deaths, births
  – Errors in data
• Pseudo panel: separate cross sections are aggregated to cells (e.g. based on age cohort, sex etc.) that are followed over time
  – Sometimes the term pseudo panel is also used for aggregate panel data, e.g. industry panels or country panels (in contrast to "true" panels that cover micro units)

• The data set can be ordered in alternative ways:
  – Stacked by cross section: list all years for the 1st cross section unit, then for the 2nd, etc.
  – Stacked by year: list all observations for the 1st year, then for the 2nd year, etc.
  – In either case, variables as columns
  – Both are ok, but you must have a time identifier and a cross-section identifier as variables (columns)
  – The data can be unbalanced (no need to have "empty" rows)
• The data can also be in columns by cross section ("wide form") and converted to stacked data ("long form") in Stata

• Data set stacked by cross-section unit or by date
  – Columns: date, unit, variable $y_{it}$, variables $x_{it}'$ ($1 \times K$ vector)

Stacked by cross-section unit:
\[
\left[
\begin{array}{cccc}
1 & 1 & y_{11} & x_{11}' \\
\vdots & \vdots & \vdots & \vdots \\
T & 1 & y_{1T} & x_{1T}' \\
1 & 2 & y_{21} & x_{21}' \\
\vdots & \vdots & \vdots & \vdots \\
T & 2 & y_{2T} & x_{2T}' \\
\vdots & \vdots & \vdots & \vdots \\
1 & N & y_{N1} & x_{N1}' \\
\vdots & \vdots & \vdots & \vdots \\
T & N & y_{NT} & x_{NT}'
\end{array}
\right]
\]

Stacked by date:
\[
\left[
\begin{array}{cccc}
1 & 1 & y_{11} & x_{11}' \\
1 & 2 & y_{21} & x_{21}' \\
\vdots & \vdots & \vdots & \vdots \\
1 & N & y_{N1} & x_{N1}' \\
\vdots & \vdots & \vdots & \vdots \\
T & 1 & y_{1T} & x_{1T}' \\
T & 2 & y_{2T} & x_{2T}' \\
\vdots & \vdots & \vdots & \vdots \\
T & N & y_{NT} & x_{NT}'
\end{array}
\right]
\]

• A panel need not have time and cross section as the dimensions
• You could have for example region and industry as the dimensions
  – In this case, the regions must have code numbers; treat them as if they were "time periods"

• Stata data set wagepanel.dta in the course homepage
• Number of cross-section units: 545
• Number of time periods: 8
• Number of obs.: 8 x 545 = 4360
• Variables:

  No.       Variable       Description
  1.        nr             person number (cross section identifier)
  2.        year           1980 to 1987 (time identifier)
  3.        black          =1 if black
  4.        exper          labor market experience in years
  5.        hisp           =1 if Hispanic
  6.        hours          annual hours worked
  7.        married        =1 if married
  8.-16.    occ1 to occ9   occ1=1 if occupation = 1 etc.
  17.       educ           years of schooling
  18.       union          =1 if union member
  19.       lwage          log(wage)
  20.-26.   d81 to d87     d81=1 if year = 1981 etc.
  27.       expersq        exper^2

• When you have imported panel data to Stata, you have to tell the program that it is a panel
• Statistics → Time series → Setup and utilities → Declare dataset to be time series
  – Then give both the time variable and the panel ID variable
• Command xtset nr year (also tsset nr year works)
  – Where year and nr are the names of the variables in the wagepanel.dta data set that indicate time and cross-section
  – Commands that start with "xt" are panel commands

• Time series operators L., F., D.
work also with panel data

Pooled estimation
• If we use OLS directly for the panel, this is usually called pooled estimation
  $y_{it} = \alpha + x_{it}'\beta + \varepsilon_{it}$
• i=1,...,N (cross sections); t=1,...,T (time)
• Use normal regression (regress)

[Output: Pooled estimation]

• Estimation methods that take into account the panel nature of the data are fixed effects (FE) and random effects (RE) estimation
  $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
• $\alpha_i$ = fixed effect or random effect
  – depending on the application, also called individual effect, firm effect etc.; other terms: unobserved effect, idiosyncratic effect
  – The terms fixed effects and random effects are misleading, since both allow $\alpha_i$ to be random!

Fixed effects (FE) estimation
• A) least squares dummy variables estimation (LSDV)
  – Include a dummy variable for each cross section unit (leaving one out) and estimate the model with OLS
  – The coefficients of the dummies estimate the unobserved effects
  – Not useful if there are many cross section units; in the wage data N=545
  – In principle possible to include the dummies in the form i.nr, e.g. reg lwage educ i.nr, but there may be problems with matrix size

• Incidental parameters problem
  – When the number of observations increases, we can use asymptotic properties of the estimators, e.g. consistency
  – In time series data $T \to \infty$; in cross-section data $N \to \infty$; in panel data $T$ and/or $N \to \infty$
  – If in panel data T is fixed and the model has cross section dummies, then when $N \to \infty$ the number of parameters to be estimated also increases and there is no gain from more observations → parameters of the "fixed effects" are not consistent

• The idea in the other fixed effects methods is to get rid of the unobserved effect $\alpha_i$
• B) take differences over time
  $\Delta y_{it} = \Delta x_{it}'\beta + \Delta\varepsilon_{it}$
• $\Delta\alpha_i = 0$, so the "fixed" effect disappears
• First observation for each cross section is lost
• OLS used for the differenced model; use regress and the D. operator for the variables
• This is called first difference transformation
• Sometimes long differences (e.g. 3-year differences) are used, but this requires more data
• Variables that are constant over time drop out

[Output: First-difference estimation (note: constant dropped)]

• C) take differences from cross-section means (demeaning)
• Estimate the model
  $y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)$, where $\bar{y}_i = \sum_t y_{it} / T$, etc.
• Again, the unobserved effect disappears, because the mean of $\alpha_i$ is $\alpha_i$ (so $\alpha_i - \alpha_i = 0$)
• OLS used for the demeaned model
• This is called within transformation
• If T=2, first-difference and within approaches give the same results

[Figure: regression line estimated with pooled OLS versus regression line estimated with FE, shown against the true regression lines of cross-section units 1 and 2; axes x and y]

• Examples:
  – $y_{it}$ = log(wage); $\alpha_i$ = unobserved ability; $x_{it}$ includes education, which is correlated with ability (high ability → more education)
  – $y_{it}$ = log(output); $\alpha_i$ = unobserved managerial ability; $x_{it}$ includes logs of inputs (labor and capital), which are correlated with managerial ability (good management → firm grows → uses more labor and capital)
• In both cases OLS estimates are inconsistent
• The unobservable can be eliminated in FE estimation

• Estimation of FE (within) model in Stata
• Statistics → Longitudinal/Panel data → Linear models → Linear regression
• In the estimation window, specify the model in the normal way and choose model type
• Command for example xtreg lwage exper hours, fe
  – Options fe=fixed effects, re=random effects, be=between, pa=population averaged (rarely used in econometrics)

[Output: FE (within) estimation]

• Some notes on the output
• Stata output shows a constant even for the fixed effects model
  – This is an "average" constant
    • In the demeaning, the overall averages are added back to the variables
• Output gives 3 different $R^2$ values
  – Within: for demeaned model
    • $R^2$ in dummy variable OLS would be much higher
  – Between: for time averaged model
  – Overall: for pooled data
    • Between and overall $R^2$ are not quite equal to the "traditional" $R^2$

• What has to be assumed in FE estimation?
• Errors uncorrelated with the explanatory variables:
  $E[(x_{it} - \bar{x}_i)'(\varepsilon_{it} - \bar{\varepsilon}_i)] = 0$
• This implies that $x_{it}$ are uncorrelated with all (past and future) errors, since they are part of the average error
  – This is called the strong exogeneity assumption

Dynamic panel data with FE
• Strong exogeneity fails (at least) when there are lagged values of the dependent variable:
  $y_{it} = \alpha_i + x_{it}'\beta + \gamma y_{i,t-1} + \varepsilon_{it}$
  $y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + \gamma (y_{i,t-1} - \bar{y}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)$
• The transformed lagged dependent variable is correlated with the transformed error
• Estimation by GMM (generalized method of moments): a combination of GLS and instrumental variables, with lagged values of variables as instruments

• Statistics → Longitudinal/panel data → Dynamic panel data (DPD)
  – xtabond (Arellano-Bond)
  – xtdpdsys (Blundell-Bond)
  – xtdpd (both of above)
• Fairly complicated, require many choices (lag lengths, instruments, endogeneity of variables)

Clustering
• Transformed errors for cross section unit i (individual, firm etc.) tend to be correlated with each other (they all include the same average error)
  – Typical approach: correct standard errors for "clustering" (cluster = cross section unit)
  – Command for example xtreg lwage exper, fe vce(cluster nr) or specify the standard error option from the panel data menu (SE/Robust)
  – nr is the name of the cross-section identifier

[Output: FE with standard error corrected for clustering]

Several fixed effects
• It is also possible to specify fixed time effects (effects that vary over time, but not across cross sections)
  – "two-way" model
  – Dummy variables for time periods (in the example data, dummies d81 to d87; leave out one of them!)
• Sometimes three-way models
  – for example if the employer of individuals is known, there could be individual effects, firm effects, and time effects
  – Complicated with large data sets (e.g. 1 million individuals, 10000 firms)

Random effects (RE) estimation
• Include the unobserved effect in the error term:
  $y_{it} = x_{it}'\beta + u_{it}$, $u_{it} = \alpha_i + \varepsilon_{it}$, $E(\alpha_i) = 0$, $\mathrm{Var}(\alpha_i) = \sigma_\alpha^2$
  – Note: Now x includes a constant
• Use generalized least squares (GLS) or maximum likelihood (ML) to estimate the model, taking into account the error structure (all errors for i are correlated with each other, since they include the same random $\alpha_i$)

[Output: RE estimation]

Issues in choosing between FE and RE
• Traditional view: In FE the "effects" are time invariant parameters to be estimated, in RE they are random terms
• When the data set is a random sample of a large population, RE may be appropriate
• If data cover certain individuals / firms / etc., FE is appropriate
  – For example, data on all the biggest firms in Finland, data on OECD countries, etc.

• Contemporary view:
• The "effects" are random in both approaches, the main issue is whether they are correlated with the x's (allowed in FE) or not (assumed in RE)
• In FE, inferences are conditional on the effects, in RE inferences are unconditional
• Out-of-sample projections of $y_{it}$ are possible with RE, but not with FE, since $\alpha_i$ is not known for out-of-sample observations

• In FE, variables that do not change over time (for a cross section unit) cannot be used
  – Their mean is constant, so the difference from the mean is zero (the variable is "wiped out" in the within transformation)
  – E.g. education cannot be included if nobody's educational level changes over time
  – Other examples:
    • Female dummy in wage equation
    • Many country characteristics in country panels

• FE is based on variation within cross section units
• Average the data for each cross section and use the averages in estimation of the model
  $\bar{y}_i = \bar{x}_i'\beta + \bar{\varepsilon}_i$
  – → we get the between estimator (BE), which is based on variation between cross section units
• It can be shown that the RE estimator can be written as a combination of the FE and BE estimators
  – RE is a "compromise" of FE and BE

[Output: Between estimation]

• Breusch-Pagan LM (Lagrange multiplier) test; tests the hypothesis that $\mathrm{Var}(\alpha_i) = 0$
  – If the hypothesis is not rejected, pooled OLS can be used (instead of RE)
  – After xtreg estimation with option re, use command xttest0
  – Or Statistics → Longitudinal/Panel data → Linear models → Lagrange multiplier test for random effects
  – The test output gives a chi-square test statistic and p-value
  – A high value for the test statistic (and small p-value) would indicate rejection of the hypothesis $\mathrm{Var}(\alpha_i) = 0$, i.e. rejection of the hypothesis of no random effects

[Output: Breusch-Pagan test for random effects]

• Testing RE vs. FE
• Hausman test
  – Tests whether the coefficients in FE and RE estimations are equal
  – If $x_{it}$ is correlated with $\alpha_i$, FE is consistent but RE is not, so the estimates should be different
  – If equality of FE and RE estimates is not rejected, both are consistent and we conclude that $x_{it}$ is not correlated with $\alpha_i$ → RE can be used
  – If equality of estimates is rejected, we conclude that $x_{it}$ is correlated with $\alpha_i$ → FE should be used

• Hausman test in Stata
  – Statistics → Postestimation → Tests → Hausman specification test
  – With commands: After xtreg estimation with option fe, store the estimates with estimates store fe_e
    • Here fe_e is an arbitrary name given for the stored estimates
  – Then estimate the same model (i.e. same variables) with xtreg and option re, and store the estimates: estimates store re_e
  – Finally, give command hausman fe_e re_e
  – A high value for the test statistic (and small p-value) would indicate rejection of the hypothesis that FE and RE estimates are equal; if rejected, FE should be used
  – The test sometimes does not work (involves inverting a matrix that may not be invertible)

[Output: Hausman test]
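
The commands mentioned in these slides can be collected into one short do-file. This is only a sketch: it assumes that wagepanel.dta has been downloaded to the current working directory, and the regressors exper and hours are just the ones used in the examples above, not a recommended specification.

```stata
* Sketch of the workflow from the slides.
* Assumption: wagepanel.dta is in the current working directory.
use wagepanel.dta, clear

* Declare the panel: nr = cross-section identifier, year = time identifier
xtset nr year

* Pooled OLS
regress lwage exper hours

* FE (within) estimation with standard errors corrected for clustering
xtreg lwage exper hours, fe vce(cluster nr)

* FE with default standard errors, stored for the Hausman test
xtreg lwage exper hours, fe
estimates store fe_e

* RE estimation, followed directly by the Breusch-Pagan LM test
xtreg lwage exper hours, re
xttest0
estimates store re_e

* Hausman test: are the FE and RE coefficient estimates equal?
hausman fe_e re_e
```

The clustered FE regression is run separately and not stored, because hausman is typically applied to estimates with conventional standard errors. The names fe_e and re_e are arbitrary labels, as noted above.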