Causal Analysis Project
Interim Report No. 1
December, 1968

CORRELATIONAL PROPERTIES OF SIMULATED PANEL DATA WITH CAUSAL CONNECTIONS BETWEEN TWO VARIABLES

Donald C. Pelz
assisted by Spyros Magliveras, with technical appendices by Robert A. Lew

SURVEY RESEARCH CENTER
The University of Michigan
Ann Arbor, Michigan

TABLE OF CONTENTS, WITH ABSTRACTS*

A. Introduction
   Two questions
   Intuitive expectations (Figures 1.1 to 1.3)
   Multiple models for the same empirical data (Figure 1.4)

   Abstract: If one has cross-sectional survey data in which a correlation appears between two variables x and y, it is seldom possible to infer the direction of causal influence (if any) between them. However, with panel data containing measurements on the same individuals at two or more times, a difference in the "cross-lagged correlations" has been suggested as a means of inferring causal priority. This report will explore the question: given known causal connections between two simulated variables, will this cross-lagged differential appear? The reverse question is more difficult: if a cross-lagged differential is observed, can one infer causal connections? Intuitive conjectures are presented on how the presence of causal influence in simulated data will affect the correlations between two variables as the measurement lag increases.

B. General Characteristics of Two-Variable Model
   Two sources of consistency (Figures 1.5a to d)
   Types of variables (Figure 1.5e)

   Abstract: To observe what correlational properties will follow from known connections, simulated variables x and y have been generated. At successive cycles of the operation, an individual's score on either variable may be influenced by an earlier score on the other variable. Each variable may have differing degrees of short-term consistency (the score is dependent on the immediately prior score), or long-term consistency (the score is also dependent on an enduring individual constant). The program computes and graphs autocorrelations and cross-correlations as a function of measurement lag.

*For the meaning of special terms, see Glossary, pp. vi ff.

C. Technical Details of Two-Variable Model
   Short-term consistency
   Long-term consistency
   Influence of one variable on the other
   Distributed influence

   Abstract: Recursion equations are given for simulating x and y variables. Short-term consistency is controlled by rho coefficients (ρ_x and ρ_y respectively) multiplying the prior value of the same variable. Long-term consistency is introduced by individual zeta constants (ζ_ix and ζ_iy respectively). Causal influence of either variable on the other may be introduced through a coefficient c multiplying a prior value of the other variable. The causal influence may be exerted within a single causal interval or distributed over several intervals.

D. Autocorrelations
   Effects of short-term consistency (Figure 1.6)
   Effects of long-term consistency (Figure 1.7)
   Effects of causal influence by x on y (Figure 1.8)
   Effect of x influence on y autocorrelation given long-term consistency (Figure 1.9)

   Abstract: The simulated autocorrelations corresponded closely to the theoretical expectations. In the absence of long-term consistency, autocorrelations declined slowly or rapidly toward the zero asymptote depending on the magnitude of the rho coefficient. When long-term consistency was introduced, autocorrelations declined as expected toward a non-zero asymptote. When x was allowed to influence y, the consistency of the latter was increased (autocorrelograms were higher). These observed results corresponded to mathematical derivations described in technical appendices.

E. Cross-Correlations
   Effects of variation in short-term consistency (Figures 1.10 and 1.11)
   Implications for cross-lagged differential
   Delay in maximum (Figure 1.12)
   Effects of varying the amount of causal influence by x on y (Figure 1.13)
   Effects of introducing long-term consistency (Figures 1.14 and 1.15)
   Effect of marked inequality in short-term consistency of x and y (Figure 1.16)
   Effect of marked inequality in long-term consistency of x and y (Figure 1.17)
   Effect of introducing correlation between zetas for x and y (Figures 1.18 and 1.19)
   Effects of distributing the causal influence (Figure 1.20)

   Abstract: Data were shown on cross-correlations between simulated variables x and y, where x was given a unidirectional influence on y. The greater the short-term consistency in x and y, the higher the correlogram, and the stronger the cross-lagged differential. Other effects were greater span of the correlogram, and delay in occurrence of its maximum value. As the causal influence of x on y was made stronger, the correlogram became higher, but neither its span nor its delay in maximum was affected. As long-term consistency was increased the correlogram became higher but flatter, so that the cross-lagged differential was obscured. Succeeding sections investigated the effects of various modifications. Generally it was observed that in the presence of high short-term consistency the cross-lagged differential appeared under various conditions, but it was weakened by the presence of long-term consistency, especially in the causal variable.

F. Reciprocal Influence
   Effect of reciprocal influence on autocorrelations (Figure 1.21)
   Effect of reciprocal influence on cross-correlations (Figure 1.22)
   Differing magnitude of influence (Figure 1.23)
   Effect of long-term consistency (Figures 1.24 and 1.25)

   Abstract: In the preceding sections, x was allowed a unidirectional influence on y. In the present section reciprocal or bidirectional influences were introduced; caution was needed since high short-term consistency resulted in unstable variances. Given reciprocal congruent influences or positive feedback (x →(+) y →(+) x), the cross-correlogram showed two positive peaks as expected, one on either side of zero lag. Given a positive and a negative influence or a negative feedback loop (x →(+) y →(-) x), the correlogram showed one positive and one negative peak as expected. When long-term consistency was introduced, these shapes were preserved but became much flatter.

G. In Conclusion

   Abstract: In reply to the first large question, whether known causal connections will give rise to a cross-lagged differential, the answer thus far with simulated data has been affirmative, provided that each variable has at least moderate short-term consistency, and that long-term consistency is low in at least the causal variable. Still unsettled, and the subject for future investigation, is the reverse question of whether the observation of difference in empirical cross-lagged correlations will permit inferences about underlying causal connections.

APPENDICES

GLOSSARY OF TERMS AND SYMBOLS

Score (x_it, y_it): Values on simulated variables x and y respectively, for individual i at time t.

Cycle: Single operation of computer program creating a set of scores for N individuals at a given time.
Successive cycles are equivalent to successive time units.

Lag: Interval of time between successive measurements of any variable, expressed in cycles; also called measurement interval.

Autocorrelation (r(x_t, x_{t+k}); r(y_t, y_{t+k})): Correlation between a set of scores for N individuals on one variable at time t and scores for the same individuals on the same variable at a later time t+k, where k = 1, 2, 3, ....

Autocorrelogram: Graph of the autocorrelation, plotted as a function of k.

Cross-correlation: Correlation between a set of scores for N individuals on one variable at time t and those on another variable at time t+k, where k = ..., -3, -2, -1, 0, 1, 2, 3, .... A negative k indicates that y is measured before x; a positive k, that x is measured before y.

Cross-correlogram: Graph of the cross-correlation, plotted as a function of k.

Cross-lagged correlation: Any cross-correlation between two variables measured at different times, i.e., k ≠ 0.

Cross-lagged differential, or difference in cross-lagged correlations (r(x_t, y_{t+k}) - r(y_t, x_{t+k})): Difference between two cross-correlations, in one of which x precedes y by k time units, and in the other y precedes x by the same lag. In a cross-correlogram, the differential appears as a difference in height at equal distances on either side of zero lag, k = 0.

Short-term consistency: Characteristic of a variable such that scores at time t+1 are dependent on immediately prior scores, the degree of dependence being governed by a rho coefficient ρ_x or ρ_y.

Rho (ρ_x, ρ_y): Coefficient of short-term consistency. See above.

Long-term consistency: Characteristic of a variable such that scores for each individual fluctuate through time around an individual zeta constant, ζ_ix or ζ_iy.

Zeta (ζ_ix, ζ_iy): Constant of long-term consistency. See above.

Causal influence (x → y): One variable is said to have a causal influence on another if scores on the latter at any time t are dependent on scores on the former at a prior time t-g, where the interval g is called the causal interval.

Causal interval (g): See above. Causal influence can be distributed over several intervals, i.e., y_t can depend not only on x_{t-g} but also on x_{t-g+1}, x_{t-g+2}, ..., x_{t-r}.

Coefficient of causal influence (c_xy, c_yx): Coefficient governing the extent to which one variable is influenced by the prior scores on the other variable.

Congruent influence (x →(+) y, y →(+) x): As one variable increases, the other increases.

Incongruent influence (x →(-) y, y →(-) x): As one variable increases, the other decreases.

Causal connection: Any causal influence between x and y in either direction, whether congruent or incongruent.

Span: In a cross-correlogram, the distance between that point on the k axis at which the correlogram begins to rise from an essentially horizontal slope, to that point at which the correlogram again becomes essentially horizontal. Where periodic fluctuations occur, the span may be indeterminate.

Delay in maximum: The distance, in k units, between that point where k = g and that point where the cross-correlogram reaches its maximum height.

Asymptote: Height of an autocorrelogram or a cross-correlogram after its slope has become essentially horizontal.

Causal priority: An inference that the hypothesis x → y is more tenable than the hypothesis y → x. The question of whether such inferences can be drawn lies beyond the scope of the present report.

CORRELATIONAL PROPERTIES OF SIMULATED PANEL DATA WITH CAUSAL CONNECTIONS BETWEEN TWO VARIABLES*

A. Introduction

If one has cross-section survey data (a variety of measures obtained on a population of individuals at one point in time) and observes a correlation between two variables x and y, it is seldom possible to distinguish by statistical analysis between two plausible hypotheses: "x is causally prior to y" (x → y), and "y is causally prior to x" (y → x).

*Conducted under a grant from the National Science Foundation, GS-1873, with supplementary aid from the National Broadcasting Company. Discussions began early in 1967 involving the author, Spyros Magliveras as programmer, and Graham Kalton, visiting lecturer in sociology and sampling statistics from the London School of Economics, who gave fruitful guidance in structuring the simulated model and expressing some of its properties from time series theory.
Robert Lew has pushed forward the derivation of mathematical properties of the model. George Gluski assisted Lew in revising and expanding the computer program. Cyrus Ulberg developed a program for examining lagged correlations for actual panel data.

However, if one has panel data (measurements on the same individuals at two or more times), a statistical procedure called "differential in cross-lagged correlations" has been proposed as a means of deciding which of the two hypotheses* is more tenable (Pelz and Andrews, 1964; Campbell and Stanley, 1963; Campbell, 1964).

Imagine a population of persons characterized by variables x and y, where the x and y scores for each individual change somewhat from one time to the next, but do not change radically or erratically. (We shall call such variables "moderately consistent through time.") With such variables the autocorrelation between x at one time and x at a later time will be high for adjacent measurements, and will decline as the time between measurements increases.

*Rozelle and Campbell (1969) have pointed out that one must consider at least four rival hypotheses, two asserting that one variable influences the other in a congruent fashion (as one increases, the other increases), and two based on incongruent influences (as one variable increases, the other decreases). Representing the two types of influence by "+" and "-" respectively, we must consider the hypotheses: x →(+) y, x →(-) y, y →(+) x, y →(-) x. One may also hypothesize reciprocal connections, such as x → y → x; common influence by a third variable w; etc. In the introductory discussion above, for the sake of simplicity let us assume only unidirectional congruent influences: x →(+) y and y →(+) x.

Imagine also that the value of x at time t tends to produce, in a linear fashion, a corresponding (congruent) value in variable y after a certain number of time units which we shall call the "causal interval,"* designated by the symbol g. In such a system, x and y are measured for all individuals at time t and again at time t+k (k = time lag between measurements). From the four sets of scores, six correlation coefficients (or other measures of association) can be obtained as indicated by the following lines:

[Diagram: the four sets of scores x_t, y_t, x_{t+k}, and y_{t+k}, joined by six lines, one for each pair.]

*It is usually assumed that causation is instantaneous: the causal factor must be present in time and space when the effect occurs. In actual data, of course, the value of x may change some time before a corresponding change is noted in y. For example, motivation to act may arise weeks before action occurs. In this investigation we shall not review philosophical debates on the meaning of "causation." In this paper, causal influence will be used in the sense of functional dependence, as illustrated in the recursion equations for y as a function of prior values of x or vice versa; see below, pages 24-25.

Suppose that we have chosen a measurement lag k which is close to the causal interval g needed for x to influence y.
The following observations seem intuitively plausible. (a) A relatively strong correlation should be observed between x_t and y_{t+k}, since the causal influence is strongest precisely over this time interval. (b) We may still observe positive correlations between simultaneous scores: r(x_t, y_t) and r(x_{t+k}, y_{t+k}).* Since x and y are both reasonably consistent through time (that is, x_t will be positively correlated with x_{t+k}, and the same for y_t and y_{t+k}), the strong diagonal correlation will be reflected moderately in the correlations when x and y are simultaneous. (c) The weakest correlation should appear along the opposite diagonal, y_t and x_{t+k}, since the corresponding x and y values are each remote in time from those x and y values which are causally linked.

*This notation for correlation coefficients will be used to avoid double subscripts. It is consistent with the notation adopted by Lew in the appendices.

Campbell has independently suggested the same expectation: if variable X could be said to cause variable O, then "the 'effect' should correlate higher with a prior 'cause' than with a subsequent 'cause,' i.e., r(X_1, O_2) > r(X_2, O_1)." (Subscripts 1 and 2 here refer to first and second measurements respectively of X and O. See Campbell and Stanley, 1963, p. 238.)

The method has generated plausible outcomes with real data. Pelz and Andrews (1964) applied it to data on height and weight of growing boys and obtained results generally consistent with what had been expected.* The method also produced a consistent unidirectional ordering among 12 measures of consumer behavior and expectations from a survey panel.
It proved possible to arrange these in a directed network of causal priorities in which only one inconsistency appeared out of 35 comparisons.

*Because of the extreme consistency of the variables, these effects appeared only when partial correlations were used. That is, where H_1, H_2, W_1, W_2 represent height and weight on first and second measurements respectively, the partial correlation r(H_1 W_2 · W_1) exceeded r(W_1 H_2 · H_1) in three of six trials.

Two questions

However, the validity of the procedure as a basis for choosing among causal hypotheses is far from established. Two entirely separate questions must be considered.

I. Suppose we know in fact that variable x influences variable y in a congruent direction (x →(+) y); i.e., a change in the state of x for each individual is followed (with some margin of indeterminacy) by a corresponding change in the state of y. If in fact x →(+) y, will it hold true that a significant difference in the cross-lagged correlations will appear? Will r(x_t, y_{t+k}) > r(y_t, x_{t+k})?

Evidence to date suggests that such a result will often appear, provided certain conditions hold, such as that x and y are reasonably consistent through time (neither markedly stable nor unstable), that the causal influence is not instantaneous, that the interval of measurement is reasonably close to the interval of causation, that the influence of x on y is linear, etc.

II. Even if the answer to the above is affirmative, we must ask the second question: can we reason in the reverse direction?
That is, if we observe empirically a substantial difference in the cross-lagged correlations, i.e., if r(x_t, y_{t+k}) > r(y_t, x_{t+k}), can we infer that x →(+) y? Answer: by no means. There may be several other models (perhaps an infinite number) of causal connection between these variables which might equally well generate the observed differential in correlations. One hypothetical illustration is given below, pp. 10-12. How we are to distinguish among these alternatives remains a major problem.

Answering these two questions will not be easy. This report is confined to question I. Given known causal connections among certain variables, what differences in cross-lagged correlations (and what other correlational properties) can be expected?

Intuitive expectations

Suppose we have generated simulated variables x_t and y_t (the subscript t standing for successive time units), where each variable is moderately consistent through time, and we have let y be influenced congruently by an earlier value of x, say x_{t-5}. Now, correlations are obtained between x_t and y_{t+k}, for each k = 0, 1, 2, .... What should we observe? It seemed reasonable to expect a cross-correlogram somewhat like that pictured in the upper half of Figure 1.1. (The illustrations given below were initial intuitive conjectures; actual results will be given later.)

[Figure 1.1 plot: cross-correlation r(x_t, y_{t+k}), from -1.00 to +1.00, against k (interval of measurement or lag).]

Figure 1.1. Expected shape of cross-correlograms between x and y where interval of measurement varies and causal interval = 5.
The upper and lower curves should be produced by congruent and incongruent influences respectively of x on y.

Thus, if we measure y long before x (k = -25), or we measure x long before y (k = +25), we should observe approximately a zero cross-correlation. The measures are too remote in time for them to reflect the influence of x on y within five time units. However, if we measure x just five units before y (k = +5 in Figure 1.1), a clearly positive xy correlation should appear.

As the measurement interval k departs from this causal interval, the correlation between x and y should become smaller and smaller. It will not drop immediately to zero, however, because x and y are each moderately autocorrelated. Hence a correlation between x at one time and y at another (say) will also generate smaller correlations between the scores at neighboring times, etc.

What if x influences y in a negative or incongruent fashion: the higher the x, the lower the y? The cross-correlogram should be mirrored upside down, as in the lower curve of Figure 1.1. As the measurement interval between x and y gets closer to the causal interval of 5, an increasingly negative correlation between x and y should appear, reaching its greatest magnitude at k = +5.

In the same way, if y is allowed to influence x with a causal interval of (say) 5 time units, then we should observe a maximum xy correlation when the measurement of y precedes that of x by 5 units, i.e., when k = -5.

What about a bidirectional or reciprocal influence of x and y on each other? Suppose x →(+) y with a causal interval of 5 time periods, and y →(+) x with a causal interval of 8 units.
We should then observe, as shown in the solid curve in Figure 1.2, a double-humped curve with one peak at k = -8 (reflecting the y →(+) x influence), and another peak at k = +5 (reflecting the x →(+) y influence).

[Figure 1.2 plot: cross-correlation r(x_t, y_{t+k}), from -1.00 to +1.00, against k (lag), from -15 to +15.]

Figure 1.2. Expected shape of cross-correlograms given reciprocal influence of x on y (with causal interval of 5) and y on x (with causal interval of 8).

If one of the influences is congruent and the other incongruent (say, x →(+) y but y →(-) x), a cross-correlogram such as the dashed curve in Figure 1.2 is to be expected.

In both of these figures we expected the correlations to approach zero at either extreme, i.e., when x and y are measured far apart. There seems no reason to expect any remote influence of x on y. This value at either extreme may be called the "no-cause asymptote."

Might the no-cause asymptote be other than zero? Of course. If x and y were both influenced by a third variable which generated a prevailing positive or negative correlation between them, we would expect a definite xy correlation to appear even between remote measurements. Two possible correlograms with non-zero asymptotes are illustrated in Figure 1.3. Note in the lower curve that a negative influence of y on x (y →(-) x) should produce a minimum correlation between the two measurements when y precedes x by the causal interval, although all correlations could remain positive.

[Figure 1.3 plot: cross-correlation r(x_t, y_{t+k}) against k (lag).]

Figure 1.3.
Expected correlograms when, because of a common third cause w, a prevailing xy correlation is generated; no-cause asymptote departs from zero.

Multiple models for the same empirical data

Suppose, in some empirical panel data with x and y measured 5 time units apart, we observed that the cross-lagged correlation was low when y preceded x (k = -5), high when x preceded y (k = +5), and intermediate when the two were measured simultaneously (k = 0). These results correspond to the three circles in Figure 1.4.

Could we then infer that x →(+) y? By no means. Three hypothetical models which equally well fit these observed correlations are illustrated in Figure 1.4.

[Figure 1.4 plot: cross-correlation r(x_t, y_{t+k}) against k (lag), showing curves A, B, and C together with three circled points.]

Figure 1.4. Empirical correlations represented by the three circles might fit several different models, properties of which are described in text.

Correlogram A might arise if x affected y positively (x →(+) y) with a causal lag of 4. Curve B could arise if y → x with a causal lag of 4, and the presence of third factors introduced a positive no-cause asymptote. Still another possible pattern is illustrated by curve C, in which x and y both influence the other positively, but with a different causal interval.

The empirical data, based on two measurements only, might equally well fit three different models. To be sure, additional measurements at three or more points of time would help to distinguish among these models. Thus, if we also measured at k = +10, we might be able to prefer one of these three models over the other two.
(Even so we might not be able to discriminate between minor variations of that model differing, e.g., in causal interval.)

The reader is reminded that the above discussion is based purely on conjecture. It seemed reasonable to expect such cross-correlograms, given the causal influences indicated. As will be seen later, results with simulated data have generally corroborated these intuitions, but some unexpected departures have also appeared. It is hoped that the latter can be accounted for in terms of mathematical properties of the systems generated (see technical appendices by Robert Lew).

B. General Characteristics of Two-Variable Model

In order to explore the first of the two large questions (what correlational properties will follow from known causal connections), my colleagues and I have begun generating simulated data by means of a computer (currently the IBM 360/67).

For a population of N hypothetical individuals, normally distributed* variables x_it and y_it are created as described below, and are allowed to operate through successive periods of "time," each time unit represented by a cycle or single operation of the computer program. Each variable is given a specified consistency over time, by means of introducing a small to large normally distributed random error term (with mean = 0) at successive steps.

*More precisely, normal distributions are created at time = 0; since subsequent "error" terms are also normal, the resultant distributions probably remain normal. The property of normality is convenient but not necessary. It nowhere appears in the mathematical derivations in the appendices.
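
A simulation of this general kind can be sketched in a few lines. The sketch below is illustrative only and is not the project's program. It assumes a simple first-order recursion in which each score is a coefficient times the individual's prior score plus a unit-variance normal error, and it adds a congruent influence of x on y through a hypothetical coefficient `c` acting at causal interval `g`. The function names, the parameter values (`n`, `rho_x`, `rho_y`, `c`, `g`, the 20-cycle warm-up and 50-cycle run), and the choice to average over every available starting time are assumptions made for the example.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def simulate(n=300, burn_in=20, cycles=50, rho_x=0.7, rho_y=0.7,
             c=0.5, g=5, seed=1):
    """Generate x and y scores for n individuals; keep the last `cycles` cycles.

    Assumed recursion (illustrative, not taken from the report):
        x[t] = rho_x * x[t-1] + error
        y[t] = rho_y * y[t-1] + c * x[t-g] + error
    """
    rng = random.Random(seed)
    xs = [[rng.gauss(0, 1) for _ in range(n)]]
    ys = [[rng.gauss(0, 1) for _ in range(n)]]
    for t in range(1, burn_in + cycles):
        # The influence of x on y starts once a score g cycles back exists.
        cause = xs[t - g] if t >= g else [0.0] * n
        new_x = [rho_x * xs[-1][i] + rng.gauss(0, 1) for i in range(n)]
        new_y = [rho_y * ys[-1][i] + c * cause[i] + rng.gauss(0, 1)
                 for i in range(n)]
        xs.append(new_x)
        ys.append(new_y)
    # Discard the warm-up cycles so the system has settled.
    return xs[burn_in:], ys[burn_in:]


def cross_correlation(xs, ys, k):
    """Average r(x_t, y_{t+k}) over every t for which both cycles exist."""
    cycles = len(xs)
    values = [pearson(xs[t], ys[t + k])
              for t in range(cycles) if 0 <= t + k < cycles]
    return sum(values) / len(values)


xs, ys = simulate()
r_x_first = cross_correlation(xs, ys, +5)   # x measured 5 cycles before y
r_y_first = cross_correlation(xs, ys, -5)   # y measured 5 cycles before x
print(round(r_x_first, 2), round(r_y_first, 2))
```

Under these assumptions the averaged correlation at k = +5 should come out well above the one at k = -5, which is the cross-lagged differential that question I asks about.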

After the two variables have gone through several cycles, a unidirectional influence of x on y is introduced with a specified causal interval, such that an individual's y score at a particular time (y_it) is influenced by his x score at a specified prior time (x_{i,t-g}). This influence can either be positive (congruent) or negative. Bidirectional or reciprocal influences can also be created, but for now only the unidirectional situation will be discussed.

After allowing the causal system to become established during an initial period (such as 20 cycles), we allow the system to operate through 50 more cycles during which the program computes the correlational properties of the resulting data.

The program computes, first, the autocorrelation of each variable, e.g., r(x_t, x_{t+k}), for each t and each lag from k = 1, 2, ... 25 cycles. The program prints a correlogram showing how the average* of these autocorrelations varies as the lag increases.

*In the course of 50 cycles there are 49 possibilities for an autocorrelation with lag of 1 (x_t and x_{t+1}), 48 instances for autocorrelation with lag of 2, etc. Appendix D describes how the program selects a subset of 25 examples of each lag in order to obtain an average correlation for each lag from k = 1 to k = 25.

The program also computes cross-correlations, i.e., r(x_t, y_{t+k}), where the interval between the two measurements can vary from k = -25 (y is measured 25 cycles before x) to k = +25 (x is measured 25 cycles before y). Again, an average of these lagged cross-correlations is computed, and a
Causal a n a l y s i s 1 page 14 * graph of the r e s u l t i n g cross-lagged Two sources of correlogram i s printed. consistency The next sections w i l l describe some properties of the simulated v a r i a b l e s which are created by the computer program. One property i s that each i n d i v i d u a l have some degree of consistency in x and y over time. nized. Two d i s t i n c t sources of consistency can be recog- One w i l l be c a l l e d short-term or cycle-to-cycle consistency, r e - f l e c t i n g the f a c t that a person i s not l i k e l y to change sharply from one time to the next. be high. The autocorrelation between adjacent measurements w i l l I n the recursion equation for generating x, we l e t x a t time t ( i ) depend l i n e a r l y upon the i n d i v i d u a l ' s immediately p r i o r value x t (x^ t ) and a random e r r o r which can e i t h e r be small (high cycle-to-cycle consistency) or large (low c y c l e - t o - c y c l e c o n s i s t e n c y ) . Under such a system, the autocorrelation w i l l drop c l o s e r and closer to zero as the i n t e r v a l between measurements increases. who An i n d i v i d u a l s t a r t e d high on x could a f t e r 50 time u n i t s end up a t the opposite extreme, and v i c e v e r s a . But i n r e a l l i f e t h i s does not often happen; there i s u s u a l l y some long-term consistency due to stable p e r s o n a l i t y factors or the s o c i a l environment. Thus i n panel studies of p o l i t i c a l behavior and a t t i t u d e s , i t i s not uncommon to observe that the autocorrelation over a long i n t e r v a l i s almost the same as the autocorrelation over a short i n t e r v a l . See appendix D for method of s e l e c t i n g 25 examples of each l a g for the purpose of averaging. 
To achieve such long-term consistency we have assigned each individual a stable underlying tendency or constant (we call it a zeta value) such that the mean zeta among individuals is zero, and each individual's scores vary from cycle to cycle around his zeta value.

Figures 1.5a through d show how x scores of two hypothetical individuals might vary over time, under differing combinations of short-term and long-term consistency.

Figures 1.5a through d here

The greater the differences among individual zeta values (i.e., the larger the zeta variance relative to total variance), the more long-term consistency is introduced. In our program, the distribution of zetas is made normal, and the variance of zetas can be altered from large (for high long-term consistency) to zero (for no long-term consistency). Both short-term and long-term components for x can be varied independently, of course, from those for y.

Types of variables

The two sources of consistency described above have been used, in various combinations, to generate many types of variables. Some of these types are illustrated in Figure 1.5e.

Figure 1.5e here

[Figure 1.5a to d. Four panels, x score plotted against time: a. High short-term consistency, no long-term consistency; b. High short-term consistency, high long-term consistency; c. Low short-term consistency, no long-term consistency; d. Low short-term consistency, high long-term consistency.]

Figure 1.5a to d. Schematic representation of x scores for two individuals over time. Dashed line corresponds to zeta constant for each individual (both of these = 0 in Figures a and c).

[Figure 1.5e. Vertical axis: autocorrelation $r(x_t, x_{t+k})$, from .00 to 1.00; horizontal axis: k (lag).]

Figure 1.5e. Depending on magnitude of short-term and long-term consistency in a variable, its autocorrelation as measurement interval (k) increases can be made to vary in shape. Curves I - VI are described in text.

Curves I and II represent differing degrees of short-term or cycle-to-cycle consistency, with no long-term effects. Given low short-term consistency as in curve I, the autocorrelation is visible only over a few time units. Even when the short-term consistency is high (as in curve II), the autocorrelation eventually decays to zero if a sufficiently long interval between measurements is allowed.

The remaining curves illustrate autocorrelation when the long-term consistency is moderate (III and IV) or high (V and VI). Even with a long time between successive measurements, the autocorrelation remains positive. Broken and solid curves in each set show what happens when short-term consistency is low or high respectively.

How will variation in these characteristics affect the cross-correlograms between x and y? Intuitively it seemed likely that given low consistency from either source (as in curve I) the cross-correlograms suggested in Figures 1.1 to 1.4 above would rise and fall sharply. Given high consistency, either short-term (as in curve III) or long-term (curve V), the cross-correlograms should rise and fall much more gradually. Some empirical results given below (pp. 35-47) generally supported these expectations, but also revealed important differences in the effect of short-term and long-term consistency.

C.
Technical Details of Two-Variable Model

In this section only, it will be desirable to use a notation somewhat more complex than that used previously. It will be consistent with the notation in the technical appendices.

Short-term consistency

Consider first the situation in which there is no long-term consistency, and no causal influence of either variable on the other. The discussion will be in terms of the x variable; it will apply equally to y.

For a population of N individuals, a score $x_{i0}$ for each individual i at time t = 0 is assigned by random selection from a normally distributed population of x values with mean = 0 and variance $\sigma_x^2$ specified as desired. Each individual's expected value thus is 0. (The assumption of normality is convenient but is not necessary for the mathematical derivations in the appendices.)

At each successive time t = 1, 2, 3, ..., a value $x_{it}$ is generated by the following recursion equation:

$$x_{it} = \rho_x\, x_{i,t-1} + e_{xt} \qquad (1)$$

where: $\rho_x$ is a number between 0 and 1,* constant over t and i; $e_{xt}$ is a random normally distributed error term with mean = 0 and variance (of specified magnitude) constant over t and i; values of $e_{xt}$ are independent across individuals and across time.

*Technically $\rho_x$ could exceed 1, but then the variance of $x_{it}$ cannot be controlled; it increases without bound as t increases.

The x's generated in this way represent one of the simplest autoregressive time series, the Markov series (see Kendall and Stuart, The Advanced Theory of Statistics, 1966, Vol. 3, pp. 405 ff.). The individual's score $x_{it}$ at any time is a linear combination of a certain fraction ($\rho_x$) of his immediately prior score $x_{i,t-1}$, and a random error term $e_{xt}$ which can be interpreted as the effect of "unknown other variables." It is assumed that $\rho_x$ is the same for all individuals (although one can imagine a situation in which some individuals are linked more closely from one time to the next than are other individuals).

For our purposes it is desirable that the mean and variance of x be independent of time; they should not change radically or systematically with time. (In real data one is often confronted with means and variances that systematically rise, fall, or fluctuate over time. For our present model such effects would be inconvenient, although possibly they could be incorporated in future versions.)

If the variance of $x_t$ is to remain independent of time, and if the variance of $e_{xt}$ is assumed fixed over time, then from expression (1) it must hold true for any specified time that:

$$\mathrm{Var}(x_t) = \rho_x^2\,\mathrm{Var}(x_t) + \mathrm{Var}(e_{xt}) \qquad (2)$$

Hence the values of $\rho_x$, $\mathrm{Var}(x_t)$, and $\mathrm{Var}(e_{xt})$ are interdependent:

$$\rho_x = \sqrt{1 - \frac{\mathrm{Var}(e_{xt})}{\mathrm{Var}(x_t)}} \qquad (3)$$

and

$$\mathrm{Var}(e_{xt}) = (1 - \rho_x^2)\,\mathrm{Var}(x_t) \qquad (4)$$

Once the values of $\rho_x$ and of $\mathrm{Var}(x_t)$ are specified, the value of $\mathrm{Var}(e_{xt})$ is fixed.

The reader will note that expression (3) is one form of the expression for a correlation coefficient. And in fact, $\rho_x$ is the theoretical autocorrelation between adjacent values of the Markov series $x_t$. This rho coefficient governs the short-term consistency of the x variable. The larger it is, the more closely succeeding values of $x_{it}$ are governed by the immediately prior value.

Theoretical expectation for autocorrelation. As the lag k between successive sets of $x_t$
increases, it is known* that the theoretical autocorrelation between $x_t$ and $x_{t+k}$ will be simply $\rho_x$ raised to the kth power:

$$\rho(x_t, x_{t+k}) = \rho_x^k \qquad (5)$$

(In the left-hand term $\rho$ is used instead of the usual r to represent the theoretical rather than empirical autocorrelation. For derivation, see Appendix A.1 (7) and proof of (7).) In Figure 1.6 below the reader will be able to compare theoretical autocorrelations with those obtained from simulated data.

*See Kendall and Stuart, op. cit., p. 405.

Long-term consistency

To introduce long-term consistency, each individual is assigned an expected value not of zero but of an individual constant zeta ($\zeta_{ix}$). That is, his scores over time can be conceived as deviating around his individual zeta. We shall designate the new series of x values for each individual as $x'_{it}$, where the expected value $E(x'_{it}) = \zeta_{ix}$. Similar statements can be made for the y variable.

In such a time series, each individual i at time t = 0 is assigned two randomly selected values:* an individual constant $\zeta_{ix}$ from a normal distribution of zetas with $E(\zeta_x) = 0$ and $\mathrm{Var}(\zeta_x)$ specified; and an initial deviation value $x_{i0}$ from a normal distribution having $E(x_0) = 0$ and $\mathrm{Var}(x_0)$ specified. His initial score is the sum of these:

$$x'_{i0} = x_{i0} + \zeta_{ix} \qquad (6)$$

*Again the assumption of normality is convenient but not necessary for mathematical derivation.

Successive values of $x'_{it}$ are then generated by the recursion equation:

$$x'_{it} = \rho_x\, x'_{i,t-1} + \tau_x \zeta_{ix} + e_{xt} \qquad (7)$$

Also:

$$y'_{it} = \rho_y\, y'_{i,t-1} + \tau_y \zeta_{iy} + e_{yt}$$

(The tau coefficient $\tau_x$ or $\tau_y$ will be discussed shortly.)

The variance of $\zeta_x$ and of $x_0$ can be set independently at any desired values. For convenience we have allowed the sum of these two to equal an arbitrary total (initial) variance.* In the simulated data shown later, we have set $\mathrm{Var}(\zeta_x) + \mathrm{Var}(x_0) = 20$.

*In the later sections, this total initial variance is represented by a simple notation: Var(x). In the notation of the present section it would be: $\mathrm{Var}(x'_0)$.

This procedure assumes that an individual's zeta is fixed throughout time. In real life, of course, individuals are not so stable. Their long-term consistencies (due to personality, sociological conditions, etc.) might show a mild variation through time, as well as upward or downward trends.

A multivariate model now being constructed will permit the first of these effects. The individual's x score can be influenced by some very stable (but not completely constant) third variable, and his y score similarly can be influenced by a very stable (but not constant) fourth variable. When long-term consistency is introduced through other variables, the $\zeta_{ix}$ and $\zeta_{iy}$ constants are not needed for this purpose and can be set at zero for all individuals. Other complexities, however, such as systematically rising or falling means or variances, must be ignored for the present.

Where we generate non-zero variances of $\zeta_x$ and $\zeta_y$, it is necessary to know whether any correlation exists between the set of $\zeta$'s for x and y respectively. If a substantial correlation does exist, one would expect that even in the absence of causal connection between x and y, a prevailing correlation between them would appear (i.e., non-zero asymptote in the xy correlogram). In the computer program, therefore, one may specify what correlation between $\zeta_x$ and $\zeta_y$ is desired.
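The step of giving the two sets of zetas a specified correlation can be sketched as follows. This is only an illustration in Python, not the original program: the function names `draw_zetas` and `pearson`, the seed, and the mixing construction (one shared normal draw plus one independent draw) are assumptions made for the example.

```python
import math
import random


def draw_zetas(n, var_zx, var_zy, r, seed=0):
    """Return n (zeta_x, zeta_y) pairs: normal, mean 0, with the given
    variances and population correlation r. (Hypothetical helper.)"""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # shared standard normal component
        w = rng.gauss(0.0, 1.0)  # independent standard normal component
        zeta_x = math.sqrt(var_zx) * u
        # r*u + sqrt(1 - r^2)*w has unit variance and correlation r with u
        zeta_y = math.sqrt(var_zy) * (r * u + math.sqrt(1.0 - r * r) * w)
        pairs.append((zeta_x, zeta_y))
    return pairs


def pearson(a, b):
    """Product-moment correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    s_aa = sum((p - mean_a) ** 2 for p in a)
    s_bb = sum((q - mean_b) ** 2 for q in b)
    return s_ab / math.sqrt(s_aa * s_bb)


pairs = draw_zetas(5000, var_zx=6.0, var_zy=6.0, r=0.5, seed=1)
zeta_x = [p[0] for p in pairs]
zeta_y = [p[1] for p in pairs]
# Sample correlation; it should lie near the requested 0.5 for large n.
print(round(pearson(zeta_x, zeta_y), 2))
```

With r set to zero the two sets of zetas are independent, so any prevailing xy correlation in later cycles would have to come from causal influence rather than from the constants.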
A note on the tau coefficient ($\tau_x$ or $\tau_y$). Although $\mathrm{Var}(\zeta_x)$ can be set independently of $\mathrm{Var}(x_0)$, there is an interdependency between $\tau_x$ and $\rho_x$. In expression (7) let us specify what expected value is desired for each term, that is, its average over many time periods. Since $x'_{it}$ is intended to deviate around the individual constant $\zeta_{ix}$, the expected value desired is $E(x'_{it}) = \zeta_{ix}$; the expected value of $x'_{i,t-1}$ is of course identical. The expected value for the error term is by definition zero, and for each constant it is simply that constant. Substituting these expected values for the corresponding terms in expression (7) we have:

$$\zeta_{ix} = \rho_x \zeta_{ix} + \tau_x \zeta_{ix} + 0 \qquad (8)$$

Solving, we find that

$$\tau_x = (1 - \rho_x) \qquad (9)$$

Thus if the expected value for each individual $E(x'_{it})$ and hence the expected value for all individuals $E(x'_t)$ are to remain constant through time, the tau coefficient $\tau_x$ in (7) must be set $= 1 - \rho_x$. Without this, the mean of $x'_t$ would not remain independent of time.

Theoretical expectation for autocorrelation. Given a non-zero zeta variance, a theoretical expression for the autocorrelation as lag k increases can be shown to be the following (see Appendix B.1):

$$\rho(x'_t, x'_{t+k}) = \frac{\rho_x^k\,\mathrm{Var}(x_0) + \mathrm{Var}(\zeta_x)}{\mathrm{Var}(x_0) + \mathrm{Var}(\zeta_x)} \qquad (10)$$

Note that if $\mathrm{Var}(\zeta_x)$ is set at zero this expression reduces to (5). Also, as k increases, the theoretical autocorrelation approaches an asymptote in k as follows:

$$\rho(x'_t, x'_{t+k}) \rightarrow \frac{\mathrm{Var}(\zeta_x)}{\mathrm{Var}(x_0) + \mathrm{Var}(\zeta_x)} \qquad (11)$$

Thus with large measurement intervals the autocorrelation approaches an asymptote not of zero, as in the case where there is no zeta effect, but rather of the ratio between the zeta variance and total variance. This effect may be seen in Figure 1.7 below.
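Recursion (7) with the tau setting of (9), and its comparison with expression (10), can be sketched as follows. This is a minimal Python illustration rather than the program used for the report. The function names, the population size, and the seed are assumptions, and the error variance is taken to be $(1 - \rho_x^2)\,\mathrm{Var}(x_0)$, i.e. expression (4) applied to the deviation part of the score, which is also an assumption.

```python
import math
import random


def theoretical_autocorr(rho, var_x0, var_zeta, k):
    """Expression (10): autocorrelation at lag k with a zeta component."""
    return (rho ** k * var_x0 + var_zeta) / (var_x0 + var_zeta)


def simulate_x(n, cycles, rho, var_x0, var_zeta, seed=0):
    """Return scores[t][i] generated by recursion (7) with tau = 1 - rho."""
    rng = random.Random(seed)
    tau = 1.0 - rho                                # expression (9)
    sd_e = math.sqrt((1.0 - rho ** 2) * var_x0)    # assumed error s.d.
    zeta = [rng.gauss(0.0, math.sqrt(var_zeta)) for _ in range(n)]
    # Initial score = initial deviation + individual zeta.
    row = [rng.gauss(0.0, math.sqrt(var_x0)) + z for z in zeta]
    scores = [row]
    for _ in range(cycles):
        row = [rho * x + tau * z + rng.gauss(0.0, sd_e)
               for x, z in zip(row, zeta)]
        scores.append(row)
    return scores


def pearson(a, b):
    """Product-moment correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    s_aa = sum((p - mean_a) ** 2 for p in a)
    s_bb = sum((q - mean_b) ** 2 for q in b)
    return s_ab / math.sqrt(s_aa * s_bb)


# Zeta variance is .30 of a total variance of 20, as in the lower curves
# described for Figure 1.7.
scores = simulate_x(n=3000, cycles=30, rho=0.7, var_x0=14.0, var_zeta=6.0, seed=2)
for k in (1, 5, 20):
    simulated = pearson(scores[5], scores[5 + k])
    expected = theoretical_autocorr(0.7, 14.0, 6.0, k)
    print(k, round(simulated, 2), round(expected, 2))
```

Each simulated value here comes from a single pair of cycles; the report's program averages 25 instances per lag (Appendix D), which this sketch does not attempt to reproduce.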
Influence of one variable on the other

After the x' and y' series have been created by expression (7) and allowed to operate for several cycles, we now allow $y'_{it}$ for each individual to be influenced by his x' at a specified earlier time ($x'_{i,t-g}$, where g is called the causal interval). Essentially we create another time series which may be designated $Y'_{it}$.*

*It is possible, of course, to create another time series $X'_t$ in which y influences x, and thus to generate reciprocal influences. Some simulated data of this type are shown at the end of the report. Mathematical properties of such systems are formidable, however, and are not covered in the appendices. In subsequent sections, for convenience, the two variables however generated will simply be designated $x_t$ and $y_t$ respectively.

The relative weight exerted by $x'_{i,t-g}$ is governed by a coefficient $c_{xy}$ (read: causal influence of x on y), which may be set between 0 and ±1,* the sign determining whether the influence is to be congruent (if positive) or incongruent (if negative).

*There is no inherent necessity that c be < |1|, but intuitively it does not make sense to say that one variable can influence another by an amount greater than itself.

The recursion equation for Y' then becomes:

$$Y'_{it} = \rho_y\, Y'_{i,t-1} + \tau_y \zeta_{iy} + c_{xy}\, x'_{i,t-g} + e_{yt} \qquad (12)$$

The size of the tau coefficient must be determined. The expected value of the x' term is zero, so that in terms of expected values (12) reduces to (8), although other properties of the Y' series are not as simple. Pending further investigation we shall continue to set $\tau_y = (1 - \rho_y)$.

Theoretical expectation for cross-correlation. For the condition of unidirectional influence, the theoretically expected cross-correlation is discussed in Appendix B.2.
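A stripped-down version of recursion (12) can be sketched as follows, to show how a lagged influence of x on y produces unequal cross-correlations on the two sides of k = 0. This Python sketch is illustrative only: it sets all zetas to zero, uses the same rho for both variables, applies expression (4) for both error variances, and reports the correlation at a single cycle t rather than an average over cycles. The function names, sample size, and seed are assumptions.

```python
import math
import random


def simulate_xy(n, cycles, rho, c_xy, g, var0=20.0, seed=0):
    """Return (xs, ys) with xs[t][i] and ys[t][i]; zetas are set to zero."""
    rng = random.Random(seed)
    sd0 = math.sqrt(var0)
    sd_e = math.sqrt((1.0 - rho ** 2) * var0)   # expression (4)
    xs = [[rng.gauss(0.0, sd0) for _ in range(n)]]
    ys = [[rng.gauss(0.0, sd0) for _ in range(n)]]
    for t in range(1, cycles + 1):
        x_new = [rho * x + rng.gauss(0.0, sd_e) for x in xs[t - 1]]
        if t >= g:
            cause = xs[t - g]        # x score g cycles earlier
        else:
            cause = [0.0] * n        # no influence until x has g cycles of history
        y_new = [rho * y + c_xy * cx + rng.gauss(0.0, sd_e)
                 for y, cx in zip(ys[t - 1], cause)]
        xs.append(x_new)
        ys.append(y_new)
    return xs, ys


def pearson(a, b):
    """Product-moment correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    s_aa = sum((p - mean_a) ** 2 for p in a)
    s_bb = sum((q - mean_b) ** 2 for q in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def cross_corr(xs, ys, t, k):
    """r(x_t, y_{t+k}); positive k means x is measured before y."""
    return pearson(xs[t], ys[t + k])


xs, ys = simulate_xy(n=2000, cycles=60, rho=0.7, c_xy=0.2, g=4, seed=3)
for k in (-8, -4, 0, 4, 8):
    print(k, round(cross_corr(xs, ys, 40, k), 2))
```

Measuring at cycle 40 leaves an initial period for the system to settle, in the spirit of the 20-cycle establishment period described earlier.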
Distributed influence

In recursion equation (12) it is assumed that the causal influence of x on Y occurs after precisely g time units. Another pattern is possible, and is incorporated in the computer program, namely that $Y'_{it}$ may be influenced not only by $x'_{i,t-g}$ but also by $x'_{i,t-1}$, $x'_{i,t-2}$, ..., $x'_{i,t-g+1}$. Most of the output shown below assumes a causal influence over a fixed interval g, but toward the end of the paper some examples are shown in which the influence of x on y is distributed through several intervals.

D. Autocorrelations

This section and the one following will present a variety of correlograms produced by simulated variables possessing different properties of short-term and long-term consistency, where a unidirectional influence x → y was established. Section D will describe the autocorrelational results, and section E the cross-correlational.

Effects of short-term consistency

Let us start with looking at effects of varying the short-term consistency in variable x, with no long-term component.*

*To plot these curves, as in Figures 1.6 ff., only positive values of the time lag k need be shown, since the autocorrelograms are by definition symmetrical. For maximum use of space in the following charts, therefore, the k scale in the left half is the mirror image of that in the right half, permitting two separate sets of data to be shown on the same chart.

Figure 1.6 here

To generate the curves shown in Figure 1.6, the rho coefficients governing short-term consistency for variable x were set successively from a moderate value of $\rho_x$ = .70 to an extremely high value of $\rho_x$ = .99. In the left side of the chart are plotted the theoretical values for the autocorrelation of x as lag k increases, according to expression (5), p. 21.
With a moderate rho of .70, the theoretical autocorrelation declined to an asymptote of zero after 15 time intervals. The higher the rho the more slowly the autocorrelations dropped, but all of them were directed toward a zero asymptote if a sufficiently long interval of remeasurement were allowed.

[Figure 1.6. Vertical axis: autocorrelation, .00 to 1.00; horizontal axis: k (lag), 25 to 0 for theoretical curves at left and 0 to 25 for simulated curves at right; run numbers at right margin.]

Figure 1.6. Theoretical and simulated values of autocorrelation for x as rho (short-term consistency) increased from .70 to .99. Scale of time lag (k) for theoretical values at left is mirror image of that for simulated values at right. The latter on the average corresponded well to theoretical expectation except for extremely high rho.

The right side of the chart shows results from 2-3 simulated runs with each set of parameters. (For use in referring back to the original data, each run has been given an arbitrary number shown at the right of the chart.) From run to run, the simulated curves deviated somewhat from the theoretical.* The average of two or more curves approximated the theoretical curves, although for very high values of rho the simulated curves seemed to fall slightly below the theoretical. The reason for the deviation is not clear; perhaps it is due simply to sampling error.

The reader will note the similarity between the autocorrelograms in Figure 1.6, and curves I and II sketched intuitively in Figure 1.5e indicating low to high short-term consistencies.

*For the simulated curves we have examined, once they begin to deviate from theoretical they will continue this way, because of the interdependence between successive states.
For more accurate estimates of the theoretical expectation, several separate runs could be generated and averaged. At this stage of rough exploration, however, the need for precise estimation was not strong enough to justify the additional step.

Effects of long-term consistency

To generate long-term consistency we assign each individual $\zeta_{ix}$ and $\zeta_{iy}$ constants around which his scores on x and y respectively are allowed to deviate. By making the variance of $\zeta_x$ or $\zeta_y$ small or large compared to the total variance of either variable (see p. 24), we are able to create a small to large component of long-term consistency in either variable. Effects of long-term consistency on autocorrelations of x are shown in Figure 1.7. (Effects for y will be similar.) For additional discussion see Appendix B.3.

Figure 1.7 here

Four curves are shown. The broken and solid curves differed in short-term consistency ($\rho_x$ moderate and high respectively). The members of each pair differed in amount of long-term consistency; the ratio of zeta variance to total variance was .30 at the bottom and .70 at the top.

The theoretical curves in the left half of the chart were derived from expression (10), p. 24; they had corresponding asymptotes of .30 and .70. The four simulated curves shown in the right half of the chart corresponded rather closely to the theoretical expectations; all declined to their respective asymptotes governed by the proportion of zeta variance to total variance.

To see what the curves would look like given the same rho coefficients and no long-term consistency, the reader may look back at the bottom two pairs of curves in Figure 1.6. The higher the rho coefficient, the more slowly the autocorrelograms declined toward their respective asymptotes.
In Figure 1.7 may be seen variables of types III to VI sketched in Figure 1.5e.

[Figure 1.7. Vertical axis: autocorrelation, .00 to 1.00; horizontal axis: k (lag), 25 to 0 at left and 0 to 25 at right.]

Figure 1.7. Autocorrelations with different combinations of long-term and short-term consistency. In the lower curves the long-term consistency (governed by ratio of zeta variance to total variance) was moderate, and in the upper curves it was high. These asymptotes were reached slowly (a) or rapidly (b), depending on short-term consistency (governed by rho).

Effects of causal influence by x on y

Thus far we have shown results for the x variable only. If we were to look at results for the y variable without influence by x, results would be analogous, since (except for the influence of x) y is generated in the same way.

What, then, will be the effect on autocorrelations of y when this is influenced by the individual's prior x value at g time units earlier, where g is the causal interval? Figure 1.8 shows results when x and y had short-term consistency only (rho's ranging from .40 to .95),* and the influence of x on y was moderate (causal coefficient $c_{xy}$ was set at +.20).

*Unless otherwise specified, x and y variables were generated with $\rho_x = \rho_y$.

Figure 1.8 here

For comparison, autocorrelations for x in the same runs are shown at the left and those for y at the right. If there were no causal influence between the two the autocorrelations for each pair should be similar, since both had the same rho's and no zeta's. Yet the reader will note that when rho's were relatively high, the autocorrelation for y dropped more slowly than did the corresponding curve for x. In other words, the actual short-term consistency of y became greater. It appeared that some part of the consistency in x was being added to the existing consistency in y.

[Figure 1.8. Simulated autocorrelation curves for x at left and for y at right; vertical axis: autocorrelation; horizontal axis: k (lag).]

Figure 1.8. Pairs of x and y variables were created with identical short-term coefficients ranging from $\rho$ = .40 to $\rho$ = .95, and no long-term consistency; x was given a moderate causal influence on y ($c_{xy}$ = +.20). As rho's increased, the short-term consistency of y appeared to rise faster than for the corresponding x. Above $\rho$ = .95, the variance of y became too unstable to justify plotting the autocorrelation. (For discussion, see Appendix B.3.)

The reader may wonder whether the increase in y's consistency was due to giving x a positive influence; would giving x a negative influence (making the coefficient $c_{xy}$ negative) reduce rather than raise the consistency of y? Empirical tests gave the same result either way: consistency of y was increased. Mathematical derivation yielded the same result; the increase depended on $c_{xy}^2$ (see Appendix B.3).

Effect of x influence on y autocorrelation given long-term consistency

We saw previously (Figure 1.7) that in the presence of long-term consistency created by variance among the zeta's, the autocorrelations declined to a non-zero asymptote equivalent to the ratio of zeta variance to total variance. How will this picture be affected, if y is allowed to be influenced by a prior value of x? Some results are shown in Figure 1.9.

Figure 1.9 here

In all cases, the level of the y curves was raised.
From the simulated results one cannot tell whether each y curve was heading toward a higher asymptote than its corresponding x curve, or whether it was simply dropping more slowly. The mathematical derivations, though, indicate that the asymptotic y autocorrelations were indeed higher, and that this effect depended on the magnitude of $\mathrm{Var}(\zeta_x)$ but not on $\mathrm{Var}(\zeta_y)$. See Appendix B.3.

[Figure 1.9. Simulated autocorrelation curves for x at left and for y at right; horizontal axis: k (lag).]

Figure 1.9. With short-term consistency fixed at a high value ($\rho_x = \rho_y$ = .90) and x having a moderate effect on y ($c_{xy}$ = +.20), long-term consistency was varied by setting variance of each zeta = .30, .50, and .70 of total variance. The autocorrelations for y were raised distinctly above those for x.

Let us leave these somewhat technical questions and move on to the topic of cross-correlations, which is more central to our concern with cross-lagged differentials.

E. Cross-correlations

Effects of variation in short-term consistency

As under the discussion of autocorrelations, we shall start with simpler examples and proceed to more complex. First let us allow the short-term consistency of both x and y to rise, with no long-term consistency, and study the effect on the cross-correlations.* In all the examples in Figure 1.10, x was given a small causal influence on y ($c_{xy}$ = +.10), and the rho's were allowed to vary from .70 to .95. In Figure 1.11 the same set of rho's was used, but the causal influence was made negative ($c_{xy}$ = -.10).

Figure 1.10 and 11 here

The intuitive expectations sketched in Figure 1.1 were generally borne out.
When x was given a positive influence on y (the causal interval in all cases was set at g = 4), the correlation between x and y became increasingly positive as the lag k approached the causal interval g (Figure 1.10). When x was given a negative influence on y, the correlation between them became increasingly negative as lag k approached causal interval g (Figure 1.11). See Appendix B.3.

*Unless otherwise specified, we always set $\rho_x = \rho_y$ and $\mathrm{Var}(\zeta_x) = \mathrm{Var}(\zeta_y)$.

[Figure 1.10. Vertical axis: cross-correlation, .00 to .80; horizontal axis: k (lag), -25 to +25, with the causal interval marked; curves labeled by run number and by $\rho_x$ and $\rho_y$.]

Figure 1.10. Effect on the cross-correlations of variation in short-term consistency, with x exerting a small positive influence on y ($c_{xy}$ = +.10). As $\rho_x = \rho_y$ increased from .70 to .95, the cross-correlograms (a) became higher, (b) increased in span, (c) reached a maximum height after increasing delay beyond the causal interval.

[Figure 1.11. Vertical axis: cross-correlation, +.10 to -.80; horizontal axis: k (lag), with the causal interval marked; curves labeled by run number and by $\rho_x$ and $\rho_y$.]

Figure 1.11. Parameters here were the same as in Figure 1.10, but this time the influence of x on y was mildly negative ($c_{xy}$ = -.10). As short-term consistency increased (from $\rho$ = .70 to $\rho$ = .90), negative correlograms of increasing height, span, and delay were generated.

Other interesting features appeared as the rho's increased. (a) The correlograms became successively higher (either more positive or negative, depending on the sign of $c_{xy}$).
Note that the magnitude of causal influence ($c_{xy}$) was unchanged; only the short-term consistency varied. (b) The span of the cross-correlogram became wider; that is, it started to rise sooner and declined to zero later. (c) More surprising, the point of maximum height did not occur at causal interval g but was increasingly delayed. The latter effect has been derived mathematically; see pp. 40-42, and Appendices A.6 and B.3.

Implications for cross-lagged differential. Let us return for a moment to what started this investigation: the question of whether causal connections might be inferred from a "differential in cross-lagged correlations." The latter quantity, as the reader will recall from pages 4 and 5 (see also Glossary, p. vi), is the difference between two xy correlations, in one of which x is measured k time units before y, and in the other y is measured k time units before x. In cross-correlograms such as Figures 1.10 and 11, this differential appears as a difference in the height of the correlogram at equal distances on either side of k = 0.

Now if x does exert a causal influence on y, under what conditions will a differential in the cross-correlations become visible? Its visibility will be affected by two characteristics of the correlogram. (a) One is the height. If the differential is computed at that k where the correlogram is maximum (height at this point being compared with height at negative k of same size), the magnitude of the differential increases as the rho's increase. (b) A second characteristic is the span of the correlogram: roughly the distance between its rise from zero at the left extreme and its return to zero at the right extreme (for another definition see Glossary, p.
vii). Given the short span generated by rho's of .70, the cross-lagged differential was visible for k's in a limited range: from about k = +3 to about +10. (Exact definition of the range will depend upon specification of sampling variability, which we have not attempted to pursue.) Given the broader span generated by rho's of .90, the differential was visible over a much greater range: from k = about +3 to +25 or more. And for the still broader span for rho's of .95, it is clear (had the k axis been extended) that the differential would remain even over a range up to k = +50 or more. Thus even when the lag in measurement was much longer than the causal interval, the differential persisted when short-term consistency was high.

Now since both the height and span are affected by rho,* it tentatively appears that the greater the short-term consistency of x and y (magnitude of causal influence remaining constant), the more clearly a causal influence of x on y will be visible in the differential in cross-lagged correlations, even when the measurement lag is much longer than the causal interval.

*One might suppose that span is merely a function of height, but the data in Figure 1.13 below will show that height can increase without an increase in span.

(Several questions must be explored before this statement can be affirmed--such as whether the conclusion depends equally on the short-term consistencies of x and y, or whether one matters more. Some simulated data when the two rho's differ sharply will be shown below in Figure 1.16, page 49.)

Another important aspect of Figures 1.10 and 11 is the third characteristic: (c) the fact that the maximum height of the correlogram occurred later than the causal interval, increasingly so with higher rho's.
Given high short-term consistencies, the researcher would be advised to select a measurement interval which was probably "too long" (i.e., longer than the causal interval) rather than "too short."

Delay in maximum. An expression has been obtained relating the amount of delay in the maximum of the correlogram to the size of ρ_x and ρ_y respectively.* The relationship is plotted in Figure 1.12.

Figure 1.12 here

*The expression is simplified when the time dimension t is allowed to become very large compared to the measurement lags k--technically, when the expression becomes "asymptotic in t." Under this condition, the delay in maximum of the correlogram depends negligibly on g and t, and almost entirely on ρ_x and ρ_y. Figure 1.12 plots this t-asymptotic relationship, described in Appendix A.6 and justified in Appendix C.

[Plot: regions of equal delay (Delay = 0, 1, 2, ...) over ρ_x and ρ_y, each axis running from .00 to 1.00.]
Figure 1.12. Theoretical expectation for amount by which maximum of xy cross-correlogram will be delayed beyond the causal interval, depending on the short-term consistency in the two variables. E.g., where ρ_x = ρ_y = .90 a delay of about 4 time periods can be expected.

For example, if ρ_x and ρ_y both = .50, there will be no delay in the maximum; the latter will fall at the causal interval. But if ρ_x and ρ_y both = .95, then a delay of 10 time periods will be observed. This theoretical expectation agrees reasonably well with the empirical curves in Figures 1.10 and 1.11. (Because of the flatness of the higher curves and rounding of the correlations to two decimal places, the exact maximum in some curves is ambiguous.)
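The dependence of the delay on the two rho's can be explored numerically with a small sketch. This is not the expression of Appendix A.6. It assumes a simplified stationary form of the model, x(t) = ρ_x·x(t-1) + noise and y(t) = ρ_y·y(t-1) + c·x(t-g) + noise, with independent noise terms and no long-term component. Under that assumption the cross-covariance of x(t) with y(t+g+m) is proportional to the sum over j ≥ 0 of ρ_y^j · ρ_x^|m-j|, and the delay is the m at which that sum is largest. The function names are illustrative only.

```python
def cross_cov_shape(rho_x, rho_y, m, terms=2000):
    """Relative cross-covariance of x(t) with y(t + g + m).

    Assumed stationary sketch: y accumulates past x values with weights
    rho_y**j, and x is autocorrelated as rho_x**|lag|. The infinite sum
    is truncated after `terms` terms.
    """
    return sum(rho_y ** j * rho_x ** abs(m - j) for j in range(terms))


def delay_of_maximum(rho_x, rho_y, max_delay=100):
    """Periods beyond the causal interval at which the curve is highest."""
    heights = [cross_cov_shape(rho_x, rho_y, m) for m in range(max_delay + 1)]
    return max(range(max_delay + 1), key=heights.__getitem__)


if __name__ == "__main__":
    for rho in (0.50, 0.70, 0.90):
        print(rho, delay_of_maximum(rho, rho))
```

Under these assumptions the sketch gives no delay when both rho's are .50 and a delay of 4 periods when both are .90, in line with the values quoted above. Because the variances are constant in the stationary case, the maximum of the covariance and of the correlation fall at the same lag.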
Note from Figure 1.12 that ρ_y is more critical in determining this maximum than ρ_x. Even though the latter is extremely high, there can still be zero delay over ρ_y values up to .50. But if ρ_y is very high, delay will be introduced even for small values of ρ_x.*

*Lemma 2 in Appendix C shows that (asymptotically in t) the boundary line between the region where the delay is zero and the region where the delay is one has the equation ρ_y(1 + ρ_x) = 1. I.e., there is zero delay if and only if ρ_y(1 + ρ_x) < 1.

Effect of varying the amount of causal influence by x on y

The previous chart (Figure 1.12) implies that the amount of delay in the maximum point of the cross-correlogram is affected only by the rho values of x and y respectively (short-term consistency), and not by the presence of long-term consistency (non-zero zeta variance) or the amount of causal influence of x on y. These expectations are borne out empirically by curves shown in the next few charts.

Figure 1.13, for example, shows what happens when short-term consistencies of x and y were fixed respectively at a moderate level (ρ_x = ρ_y = .70 in the upper chart), and at a high level (ρ_x = ρ_y = .90 in the lower chart). The three curves in each chart are generated by a successively stronger degree of influence by x on y (arbitrary levels of the causal coefficient c_xy were selected for upper and lower charts respectively for plotting convenience).
Figure 1.13 here

So long as rho remained fixed, increasing the causal influence of x on y had a single effect: the correlogram became higher, and the cross-lagged differential became more marked, at least for lags k reasonably close to the causal interval g. But increasing the causal influence did not increase the span of the correlogram, nor did it affect the amount of delay in the maximum point.

In other words, if there exists in fact a causal influence of x on y, then the stronger this causal influence, the more clearly it will show up in a differential in cross-lagged correlations, providing the short-term consistency of both variables is at least moderate, and the measurement interval k is reasonably close to the causal interval g.

The lower part of Figure 1.13 shows that given larger rho's, all three curves possessed a wider span and a later maximum than was true in the upper chart. Again, increasing the causal coefficient merely increased the height of the correlograms without affecting their span or delay in maximum.

[Plot: two charts of cross-correlation against k (lag), three curves each, labeled by causal influence of x on y; causal interval marked.]
Figure 1.13. Effect of increasing the influence of x on y. Short-term consistencies of x and y were ρ = .70 (upper chart) and .90 (lower chart). As causal influence increased, the height of cross-correlograms increased, but span and location of maximum did not.

Effects of introducing long-term consistency

We saw in Figures 1.7 and 9 that introducing long-term consistency altered the asymptote of the autocorrelations.
Instead of declining to zero for very long intervals of measurement, the autocorrelations stabilized (as expected) at some positive value. What effect would this property have on the cross-correlations?

In the early section where intuitive expectations were given (pp. 6-12 and Figures 1.1 to 4), no conjectures were offered on this aspect. Over short intervals (k = 5 or 10) high autocorrelations can be generated by either short-term or long-term consistency. It seemed likely, perhaps, that moderate consistency from either long-term or short-term sources would permit the cross-lagged differential to appear, but that extremely high consistency from either source might cause the cross-correlogram to rise and fall very slowly and thus obscure the cross-lagged differential. (The actual effect of short-term consistency was different, as reported in pp. 35-41.)

Figures 1.14 and 15 illustrate the actual effects of introducing long-term consistency. In Figure 1.14 the short-term component was moderate (rho = .70), and in Figure 1.15 it was high (rho = .90).

Figures 1.14 and 15 here

The results were distinctly surprising. As long-term consistency increased, the curves did become flatter as expected, but instead of rising gradually from an asymptote of zero, a non-zero asymptote was generated.

[Plot: cross-correlation against k (lag), one curve per run, labeled by the ratio of zeta variance to total variance; causal interval marked.]
Figure 1.14. Effect on the cross-correlogram of increasing long-term consistency. Here short-term consistency was set moderate (ρ_x = ρ_y = .70), and causal influence was moderate (c_xy = +.20).
As long-term consistency increased (variance of ζ_x and ζ_y increasing from .00 to .70 of total variance), the curves became higher but flatter.

[Plot: cross-correlation against k (lag), one curve per run, labeled by the ratio of zeta variance to total variance; causal interval marked.]
Figure 1.15. Another example of effect on cross-correlogram of increasing long-term consistency. Here short-term consistency was set high (ρ_x = ρ_y = .90); causal influence was again moderate (c_xy = +.20). As long-term consistency increased, the curves became markedly flatter.

(This property has been confirmed mathematically; see Appendix B.3.) One would not be surprised to find a non-zero asymptote in the presence of some correlation between ζ_x and ζ_y, but in these charts the ζ_x ζ_y correlation was approximately zero. Given a condition of long-term consistency in both variables, and given some causal influence of x on y, a permanent positive correlation was generated between them, even for measurement intervals remote from the causal interval. (Let us reserve for later discussion the question of whether this effect was generated by the long-term consistencies in both x and y, or whether one of them was mainly responsible.)

We now could refine our initial conjecture about the effects of consistency on the cross-lagged differential. The higher the short-term consistency (in our model, the higher the rho's), the more strongly the presence of a causal connection was revealed by a difference in cross-lagged correlations. But the higher the long-term consistency (in our model, the larger the zeta variances), the more a causal influence of x on y was obscured.
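The two effects described above, a cross-lagged differential near the causal interval and a non-zero asymptote once long-term consistency is present, can be reproduced in a rough simulation sketch. The recursion below is an assumption made for illustration, not the report's exact equations: each score is rho times the prior score plus an individual zeta constant plus unit-variance noise, and y additionally receives c_xy times the x score g cycles earlier. The zeta constants for x and y are drawn independently. The parameter values and function names are hypothetical and do not correspond to any run in the figures.

```python
import random


def simulate_panel(n=1000, cycles=120, rho=0.7, c_xy=0.4, g=4,
                   zeta_sd=0.0, seed=1):
    """Return a list of (x, y) score histories, one pair per individual.

    Assumed recursion (a sketch, not the report's exact equations):
        x[t] = rho * x[t-1] + zeta_x + noise
        y[t] = rho * y[t-1] + zeta_y + c_xy * x[t-g] + noise
    zeta_x and zeta_y are independent constants fixed per individual.
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        zeta_x = zeta_sd * rng.gauss(0, 1)
        zeta_y = zeta_sd * rng.gauss(0, 1)
        x, y = [0.0], [0.0]
        for t in range(1, cycles):
            x_lagged = x[t - g] if t >= g else 0.0
            y.append(rho * y[-1] + zeta_y + c_xy * x_lagged + rng.gauss(0, 1))
            x.append(rho * x[-1] + zeta_x + rng.gauss(0, 1))
        panel.append((x, y))
    return panel


def pearson(a, b):
    """Product-moment correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def cross_corr(panel, k, t0=80):
    """Correlation across individuals of x at cycle t0 with y at t0 + k."""
    return pearson([x[t0] for x, _ in panel], [y[t0 + k] for _, y in panel])


def cross_lagged_differential(panel, k, t0=80):
    """Height of the correlogram at +k minus its height at -k."""
    return cross_corr(panel, k, t0) - cross_corr(panel, -k, t0)
```

Calling cross_corr for k from -25 to +25 traces one correlogram. In this sketch, setting zeta_sd above zero lifts the curve at lags remote from the causal interval, which is the non-zero asymptote discussed in the text; how closely the heights match the report's figures depends on the assumed scaling of the variances.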
To summarize, we may say that as long-term consistency increased: (a) the span was unaffected (span being defined as the distance from the point of rise from a horizontal base, whether zero or non-zero, to the point of return to the horizontal); (b) amount of delay in maximum was unaffected; (c) the height of the cross-correlogram was raised somewhat, but (d) this advantage was more than offset by increasing flatness, so that the cross-lagged differential was reduced rather than increased.

Effect of marked inequality in short-term consistency of x and y

Thus far variables x and y have been generated with identical parameters for the rho's and zeta variances. What will happen if these parameters are markedly different? Let us first consider the short-term consistencies. Will the cross-lagged differential be obscured--or conceivably reversed--if x is very much more consistent than y, or vice versa? Some simulated results are shown in Figure 1.16.

Figure 1.16 here

Four cross-correlograms are plotted, in which different degrees of short-term consistency for x and y were combined. As consistency of x declined from high (ρ_x = .95) to low (ρ_x = .40), the left-hand portion of the curves shrank in span--that is, it remained flat longer, and rose more abruptly as it approached the causal interval.
Also, when these changes were accompanied by rising consistency of y, from a low ρ_y value to .95, the span of the right-hand portion increased--although the highest ρ_x value (.95) seemed to offset the effect of the smallest ρ_y. Hence the consistency of x seemed to affect mainly the left side of the curve, and the consistency of y to affect mainly the right side.

However, in all curves the cross-lagged differential was maintained. If one takes equal distances on either side of the middle (k = 0), the curve at the right was then generally higher than that at the left.

[Plot: cross-correlation against k (lag), four curves labeled by ρ_x and ρ_y.]
Figure 1.16. Effect of inequality in short-term consistency of x and y. (In all curves, x was given a strong influence on y: c_xy = +.40.) Decreasing ρ_x caused the left portion of the correlogram to shorten in span. Increasing ρ_y generally caused the right portion to increase in span. But in all curves the cross-lagged differential remained for k's up to +15 or more.

Thus, marked inequality in the short-term consistency of x and y did not obscure the emergence of the cross-lagged differential from a causal connection.

Effect of marked inequality in long-term consistency of x and y

Next, how will the cross-correlograms be influenced if the long-term consistency in x is much larger than that for y, or vice versa? Six curves showing different combinations of these quantities are plotted in Figure 1.17.
Figure 1.17 here

A base of comparison is the dotted curve at the bottom, where neither x nor y had any long-term consistency (ratio of zeta variance to total variance = .00 for both variables).

It is remarkable to note, first, that when only y was given long-term consistency (ratio of ζ_y to total variance = .70, in the next-to-bottom curve) there was almost no change from the comparison curve! The cross-correlogram rose sharply, with cross-lagged differentials almost as distinct as for the comparison curve.

When moderate to high long-term consistency was introduced into x, however (ratio of ζ_x to total variance = .50 or .70), the cross-correlograms immediately flattened at a high level, regardless of the long-term component in y. The implication for cross-lagged differentials seems to be: long-term consistency in the "cause" will obscure the differential, whereas long-term consistency in the "effect" may not.* (In Appendix B.3 it is shown that the effect of ζ_y disappears and only that of ζ_x remains.)

[Plot: cross-correlation against k (lag), six curves labeled by the zeta variance ratios of x and y.]
Figure 1.17. Effect on the cross-correlogram of differences in the long-term consistency of x and y. In each curve, short-term consistencies were made high (ρ_x = ρ_y = .90), and causal influence slight (c_xy = +.10). When y alone had long-term consistency, the correlogram was almost unchanged; but when x alone or both x and y had long-term consistency, the curves became much flatter.

When both variables had high long-term components (zeta variance ratios ζ_x = ζ_y = .70), the cross-correlogram was even flatter. This seems plausible. As each variable approaches complete long-term consistency, there is no longer room for causal influence to affect the cross-correlation, except by generating a high, flat asymptote.

Effect of introducing correlation between zeta's for x and y

In all of the correlograms thus far where x and y possessed some long-term consistency (variances of ζ_x and ζ_y were non-zero), the computer established a correlation between ζ_x and ζ_y close to zero. What will happen, now, if a positive or negative correlation is introduced between ζ_x and ζ_y? It seemed likely, first, that a prevailing non-zero cross-correlation between x and y would be generated, even if x had no causal influence on y. (We saw above that the mere existence of a ζ component will also generate a non-zero asymptote if x does influence y.) Second, it seemed likely that the more positive the ζ_x ζ_y correlation, the more positive would be the asymptotic xy cross-correlogram.

*For this and other tentative conclusions, the reader is reminded that they apply to the one hypothetical model we have developed.

The effects of a ζ_x ζ_y correlation, we thought, would resemble the effects of a third variable which influences both x and y (see the conjecture in Figure 1.3). Figure 1.4 sketched how such a non-zero asymptote might give rise to an ambiguity. Where it occurs as in curve B, and y exerts a negative or incongruent influence on x, the cross-correlogram should dip toward the zero line when y precedes x by a suitable interval.
The cross-lagged differential (the difference between the first and last circles in Figure 1.4) could be the same for curve B as for curve A, where x exerts a positive effect on y.

We have tried to create curve B by introducing correlation between ζ_x and ζ_y in our model, but so far have been unsuccessful. Some results thus far are plotted in Figures 1.18 and 19.

Figures 1.18 and 19 here

Thus in Figure 1.18, all of the short-term consistencies were moderate (ρ_x = ρ_y = .70), and causal influence of x on y was mildly positive (c_xy = +.20). A positive asymptote was created, as noted earlier. Now, in the five curves, the ζ_x ζ_y correlation was varied from strongly positive to strongly negative (+.60 to -.60). But even a strong negative correlation between ζ_x and ζ_y failed to pull the asymptotic cross-correlation below zero! See Appendix B.3.

[Plot: cross-correlation against k (lag), five curves labeled by run number; causal interval marked.]
Figure 1.18. Effect on cross-correlogram of correlation between long-term constants. In all curves, short-term consistency was moderate (ρ_x = ρ_y = .70), x had a moderate positive influence on y (c_xy = +.20), and variance of zeta's was set at .50 of total variance. When correlation between ζ_x and ζ_y was varied from +.60 to -.60, the asymptote was pulled down, but not past zero.

[Plot: cross-correlation against k (lag), curves labeled by the ζ_x ζ_y correlation; causal interval marked.]
Figure 1.19. Effects of correlation between long-term constants (cont'd). Parameters were the same as in the preceding chart, except that here the influence of x on y was negative (c_xy = -.20), creating a negative asymptote. When the correlation between ζ_x and ζ_y was varied from -.60 to +.60, the asymptote was pulled up but not past zero.

Figure 1.19 shows the corresponding picture when x exerted a moderate negative influence on y (c_xy = -.20). Now a prevailing (asymptotic) negative correlogram was generated. But again, even a powerful positive correlation between ζ_x and ζ_y failed to raise this asymptote above the zero line.

In short: in experiments thus far we have been unable to generate a cross-correlogram which was moved from a prevailing positive asymptote toward zero by the negative causal influence of x on y. That is, we have been unable to mask a congruent influence x → y by creating a negative ζ_x ζ_y correlation, or vice versa.

Effects of distributing the causal influence

In all the foregoing examples the causal influence of x on y was exerted at one specific interval g (called the causal interval). The actual recursion equation for y, however, may be extended to include more than one causal interval. The y variable at time t can be simultaneously influenced not only by x at time t-g, but also by other prior values of x.* What will happen if the influence of x on y is distributed over several time intervals instead of concentrating at a single one? We suspected that the cross-correlogram would increase in span. Some results of one trial are shown in Figure 1.20.

Figure 1.20 here

*The mathematical model corresponding to distributed causal influence has not yet been developed.
[Plot: cross-correlation against k (lag), four curves (a) to (d), each labeled with its causal coefficients by interval; causal interval marked.]
Figure 1.20. Effects of concentrating vs. distributing the causal influence c_xy. Short-term consistency was moderate (ρ_x = ρ_y = .70). In all curves the total influence of x on y was the same, but it was distributed differently through time. In curve (a) c_xy was concentrated at one causal interval; in (b) and (c) it was spread symmetrically over three or five intervals; in curve (d) the causal influence declined exponentially. See text for discussion.

The four curves all used about the same total influence of x on y. But in curve (a) this was concentrated at interval 3; in curve (b) it was evenly distributed over intervals 2, 3, and 4; in curve (c) it was distributed in a symmetrical pyramid between intervals 1 and 5. The cross-correlograms were surprisingly similar. The third curve was a little broader in span than the others, but the random variation in these correlograms is such that one cannot be assured of a genuine difference.

In the fourth curve (d) the causal coefficients decreased exponentially, being greatest at interval 1 and successively lower at intervals 2 through 5. The resulting span was about the same as before although the maximum (as one might expect) was closer to k = 0.

The last distribution of c_xy is intuitively appealing. It seems plausible that the "cause" should influence the "effect" most strongly in the period immediately following, and exert less and less influence as the "effect" variable becomes more and more remote.
In future trials, a much broader distribution of causal influence will be used to see at what point the correlogram is noticeably flattened. For the time being, one may tentatively conclude that whether the causal influence is concentrated or distributed slightly through time has little effect on the cross-correlogram. In further tests of correlational properties of simulated data, we shall continue to use a single causal interval.

F. Reciprocal Influence

Thus far simulated data have been shown in which variable x exerted a unidirectional influence on variable y. The basic model, however, permits bidirectional or reciprocal influences of x and y on each other, and several results with this feature will be presented in the present section.*

The approach here must be cautious. In certain preliminary runs, when x and y had high short-term consistency (rho's = .90 or higher), strange things happened to the distributions of x and y. Means and variances were no longer stable; in particular, over the customary 50 cycles, some variances increased explosively (by a factor of 1,000 or more). Let us start, then, with variables having low short-term consistency (ρ_x = ρ_y = .50), and no long-term consistency.

Effect of reciprocal influence on autocorrelations

In Figure 1.21 are shown results with two runs, one containing a positive feedback loop (x → y → x, both influences positive) and one a negative feedback loop (x → y → x, with the y → x influence negative). In order to keep the two influences somewhat out of phase, the causal lags were made different: g = 3 and 5 respectively. The coefficient c_yx denotes the influence on x of the prior value of y.

Figure 1.21 here

*The mathematical model for reciprocal influence has not yet been developed.
[Plot: autocorrelations of variable x and variable y against k (lag), for two runs; secondary effects marked.]
Figure 1.21. Effect of reciprocal influence on autocorrelations. Short-term consistencies were weak (ρ_x = ρ_y = .50). In the upper curve, x and y were made to influence each other positively (c_xy = c_yx = +.40), with causal lags of g = 3 and g = 5 respectively. In the lower curve, the x → y influence was positive, while the y → x influence was the same in size but negative; causal lags as before. Secondary effects due to the feedback loops appeared.

In place of the previous autocorrelations which declined steadily either to a zero or some positive asymptote, certain periodic fluctuations appeared at intervals of about 9 to 19 respectively. (Note that the sum of the two causal intervals was 8.) The positive feedback loop appeared to generate a major and a minor secondary peak; the negative feedback loop produced a major valley followed by a minor peak.

One thus faces the fact that a variable can be more strongly correlated with itself over a longer interval (such as k = 9) than over a shorter one (such as k = 5). Furthermore, even when a variable is reasonably self-consistent (rho is positive), the presence of negative feedback loops can generate negative autocorrelations over certain intervals.

Effect of reciprocal influence on cross-correlations

The reader will recall from Figure 1.2 the intuitive expectation that if each variable influenced the other, two peaks or valleys should be observed, one on either side of k = 0. The cross-correlograms generated by the variables just discussed are plotted in Figure 1.22.
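A reciprocal-influence run of the kind just described can be sketched in a few lines. The parameter values are those given for Figure 1.21 (rho = .50, coefficients of .40 in size, causal lags of 3 and 5), but the recursion itself is an assumption made for illustration and is not taken from the report: each variable is rho times its own prior score, plus a coefficient times the other variable's score at the stated lag, plus unit-variance noise. Function names, sample size, and the number of cycles are hypothetical.

```python
import random


def simulate_reciprocal(c_xy, c_yx, g_xy=3, g_yx=5, rho=0.5,
                        n=2000, cycles=150, seed=7):
    """Simulate n individuals with x -> y and y -> x influences.

    Assumed recursion (illustrative, not the report's exact equations):
        x[t] = rho * x[t-1] + c_yx * y[t-g_yx] + noise
        y[t] = rho * y[t-1] + c_xy * x[t-g_xy] + noise
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        x, y = [0.0], [0.0]
        for t in range(1, cycles):
            x_past = x[t - g_xy] if t >= g_xy else 0.0
            y_past = y[t - g_yx] if t >= g_yx else 0.0
            new_x = rho * x[-1] + c_yx * y_past + rng.gauss(0, 1)
            new_y = rho * y[-1] + c_xy * x_past + rng.gauss(0, 1)
            x.append(new_x)
            y.append(new_y)
        panel.append((x, y))
    return panel


def cross_corr(panel, k, t0=110):
    """Correlation across individuals of x at cycle t0 with y at t0 + k."""
    xs = [x[t0] for x, _ in panel]
    ys = [y[t0 + k] for _, y in panel]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sd_x = sum((a - mean_x) ** 2 for a in xs) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ys) ** 0.5
    return cov / (sd_x * sd_y)
```

Evaluating cross_corr for k from -25 to +25 on a run with c_yx = +.40 and on a run with c_yx = -.40 traces the two curves. In this sketch one would expect a clearly positive correlation at k = +3 in both runs, and at k = -5 a positive value under positive feedback and a negative value under negative feedback.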
Figure 1.22 here

The expected effects did indeed appear. The influence of x → y generated a peak at k = +3, identical with the causal interval of g = 3; the effect appeared almost identically in two different runs. The positive effect of y → x likewise generated a peak at k = -5, corresponding again to the causal interval g = 5. Correspondingly, the negative influence of y → x produced a "negative peak" or valley of equal magnitude in the opposite direction, as predicted.

[Plot: cross-correlation against k (lag) for two runs, with the effects of x → y and y → x and the secondary effects marked.]
Figure 1.22. Effect of reciprocal influence on cross-correlations. Same parameters as in previous figure. The x → y influence produced a positive peak when x was measured before y (k = g = +3); the positive and negative y → x influences produced positive and negative peaks respectively when y was measured before x (k = g = -5). Secondary effects again appeared.

In addition to these main effects certain secondary peaks and valleys appeared, after an interval of about k = +13 and -15. Although the correlograms did not extend over a long enough time to be sure, we suspect that these secondary effects would be periodic but smaller and smaller, and the cross-correlograms would eventually reach an asymptote of zero (in the absence of long-term consistency).

What are the implications for the emergence of cross-lagged differentials from causal influences? Clearly, if both x → y and y → x, and the causal intervals are nearly the same, then any measurement at two times only may completely obscure the pattern.
Only if one has several measurements, so that cross-correlations over differing intervals k can be obtained, will the bimodal pattern be clear.

Of course if a negative feedback loop exists, a strong cross-lagged differential will appear provided one's measurement interval is reasonably close to both causal intervals. However, the differential is produced by two entirely different influences: x → y (positive) and y → x (negative). If both lagged correlations are equally strong and in opposite directions, one might well suspect a negative feedback loop.

Differing magnitude of influence

In the two previous runs, the causal influence of each variable on the other was made equal (c_xy = c_yx in magnitude). What will happen if the two influences are made unequal in magnitude? Some results are given in Figure 1.23.

Figure 1.23 here

Again low short-term consistency was used. The effect of x on y was made more than twice as strong in both curves (c_xy = +.50) as the effect of y on x (c_yx = +.20 and -.20 respectively). One would expect the peak due to x → y to be higher than the peak due to y → x. This difference in fact appeared, although the difference in height of the respective peaks was less than the difference in causal coefficients.

If one has cross-correlograms on a set of real data where measurements are taken at several time intervals, so that two clear peaks or a peak-and-valley can be discerned, the relative height of the two peaks or valleys may indicate the relative magnitude of the two causal influences.

Effect of long-term consistency

What will happen, given reciprocal influence of x and y on each other, when there is also long-term consistency in the two variables?
And what will happen, furthermore, if the zeta constants used to create long-term consistency are either uncorrelated between x and y, or correlated? For clarity each of these conditions ought to be introduced separately, but both were used in generating the data in the next two figures. First, we see autocorrelations, in Figure 1.24.

[Plot: cross-correlation against k (lag) for two runs, with the effects of x → y and y → x marked.]
Figure 1.23. In these cross-correlograms, x was allowed to influence y more strongly (c_xy = +.50) than y influenced x (c_yx = +.20 and -.20 in upper and lower curves respectively). Other parameters remained the same as in the previous chart. Peaks appeared with same lags and direction as before, but those due to y → x were smaller than those due to x → y.

Figure 1.24 here

Autocorrelations are shown for a pair of x and y variables having moderate long-term consistency (zeta variance set at half of total variance), with a strong correlation introduced between the zeta's (this was specified to be +.60, but because of random factors the actual correlations between zeta's were .68 and .62 for the two runs). Here the causal influence of x on y was made stronger (c_xy = +.50) than the influence of y on x (c_yx = +.20 and -.20 respectively for the two curves).

Under positive feedback (x → y → x, both influences positive), the autocorrelations of both x and y became extremely high and practically flat. The combination of moderate long-term consistency and correlation between the long-term constants had the same effect as introducing extremely high long-term consistency. Under these circumstances, one is not hopeful of finding a cross-lagged differential (as the next chart indeed shows).

For the other pattern of negative feedback (x → y → x, with the y → x influence negative), however, the effect was less severe.
Both autocorrelations shifted above the zero line (note the contrast with a previous autocorrelation in Figure 1.21), although this shift was more marked for the y variable than for the x.

The cross-correlograms generated by the above variables are shown in Figure 1.25.

Figure 1.25 here

Figure 1.24. Autocorrelations when long-term consistency was introduced into each variable (zeta variance = .50 of total variance), with a strong correlation between the zeta's (about +.65). Causal influence of x→y was made stronger (c_xy = +.50, g = 3) than the influence of y→x (c_yx = +.20 and -.20 for upper and lower curves, g = 5). In contrast with Figure 1.21 the autocorrelations all became positive, especially for variable y. With positive feedback (upper curve) they became practically flat.

Figure 1.25. Cross-correlations for same data as in previous chart, given long-term consistency with a strong correlation between the zeta's. The same peaks and valleys appeared as in previous cross-correlograms (Figures 1.22 and 1.23), but they were almost obliterated by positive feedback (upper curve).

The presence of long-term consistency, coupled with strong correlation between the long-term constants, raised all cross-correlations clearly above zero (in comparison with the pattern shown in Figures 1.22 and 1.23). However, under positive feedback, each variable had become so stable that there was almost no possibility for variation in the cross-correlogram.
Slight peaks appeared in the same places as before, but they were almost obliterated. Under negative feedback, some cross-lagged differential remained. But unless one had measurements at several intervals, it would be hard to disentangle the effect of x→y from that of y→x.

G. Conclusion

Simulated panel data containing two variables x and y have been generated by computer,* to investigate how lagged correlations between the variables will be affected by the presence of known causal connections.

*More recently there has been created a ten-variable simulation which will permit more complex causal connections such as multiple influences on the same variable, causal chains, interaction effects, etc. The variables are created in the same fashion as in the two-variable model. In addition to these "true" values a corresponding set of "measured" values is also created by the addition of a specified component of measurement unreliability.

When x was allowed to influence y (but not the reverse), and both variables had at least moderate short-term consistency (i.e., each value of an individual's x and y score is a linear combination of his immediately prior value on that variable and a random error term), a clear difference in the "cross-lagged correlation" appeared. That is, the correlation between x values at a given time and subsequent y values was greater than the correlation between y values at a given time and subsequent x values. The difference was stronger the greater the short-term consistency, and appeared despite marked inequality in this characteristic for x and y.
The cross-lagged differential was obscured, however, by the presence of long-term consistency as this was created in the model by introducing individual constants for each variable around which the individual's x and y scores fluctuated. Long-term consistency appeared to matter mainly in the causal variable.

Hence in the presence of certain conditions which are often approximated in real data (moderate consistency in each variable over time, linear relationships among variables, etc.), the introduction of known causal connections in fact generated clear-cut differences in the cross-lagged correlations, even when the lag (measurement interval) departed from the interval of causation.

There still remains the important question of whether, given observed cross-lagged correlations in empirical data, one can reason in the reverse direction and infer causal connections. Some next steps will be to examine sets of actual panel data. An effort will be made to fit to these data different simulated models which may vary on such features as level of short-term consistency, presence or absence of long-term consistency, unidirectional or reciprocal influence, positive or negative influence, correlation among the long-term constants, etc. In Figure 1.4, for example, were shown three curves differing sharply in such properties. Perhaps it will prove possible to eliminate some types of models as essentially incompatible with the empirical results, and thus narrow the range of models among which causal interpretations can be sought.
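The one-way case summarized in this conclusion can be re-created in a few lines. What follows is a minimal sketch, not the program used for the report. It assumes unit-variance normal errors, no long-term (zeta) component, and illustrative short-term consistencies rho_x = rho_y = .70; the values c_xy = +.50 and g = 3 are taken from the runs described earlier. All function and parameter names are hypothetical.

```python
import random
from math import sqrt


def pearson(a, b):
    """Product-moment correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / sqrt(s_aa * s_bb)


def simulate_panel(n_people=2000, n_cycles=40, rho_x=0.7, rho_y=0.7,
                   c_xy=0.5, g=3, seed=1):
    """Simulate x and y for each person; x influences y after g cycles.

    Assumed recursions (an illustration, not the report's exact program):
        x[t] = rho_x * x[t-1] + error
        y[t] = rho_y * y[t-1] + c_xy * x[t-g] + error
    y never influences x.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_people):
        x = [rng.gauss(0.0, 1.0)]
        y = [rng.gauss(0.0, 1.0)]
        for t in range(1, n_cycles):
            x.append(rho_x * x[t - 1] + rng.gauss(0.0, 1.0))
            # The causal term is switched on once x[t-g] exists.
            influence = c_xy * x[t - g] if t >= g else 0.0
            y.append(rho_y * y[t - 1] + influence + rng.gauss(0.0, 1.0))
        xs.append(x)
        ys.append(y)
    return xs, ys


def cross_lagged(xs, ys, t, k):
    """Return (r(x_t, y_{t+k}), r(y_t, x_{t+k})) across persons."""
    x_t = [x[t] for x in xs]
    y_t = [y[t] for y in ys]
    x_later = [x[t + k] for x in xs]
    y_later = [y[t + k] for y in ys]
    return pearson(x_t, y_later), pearson(y_t, x_later)


xs, ys = simulate_panel()
r_xy, r_yx = cross_lagged(xs, ys, t=30, k=3)
print(f"r(x_t, y_t+3) = {r_xy:.2f}   r(y_t, x_t+3) = {r_yx:.2f}")
```

Under these assumptions the first correlation should come out well above the second, which is the cross-lagged differential discussed above. Adding an individual constant to each recursion would be one way to explore the obscuring effect of long-term consistency.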
APPENDICES: Table of Contents

Comments on mathematical model for causal analysis
Summary of Notation in Appendices A and B
Introduction to Appendix A
Contents of Appendices A, B, and C
Appendix A. Moments of {x_t} and {Y_t}
  A.1. The common model for {x_t} and {y_t}
  A.2. The time series {Y_t}
  A.3. Asymptotic formulas
  A.4. Variance of Y_t
  A.5. Covariance, asymptotic covariance, asymptotic correlation
  A.6. Covariance and correlation between x_s and Y_t
Appendix B. Moments of {x_it} and {Y_it}
  B.1. Exact results for {x_it}, {y_it}, {Y_it}
  B.2. Asymptotic results for {Y_it}
  B.3. Interpretation of B.2. Relevance of results
Introduction to Appendix C
Appendix C. Justification of Figure 1 in A.6
  Lemma 1
  Lemma 2
  Lemma 3
  Lemma 4
  Lemma 5
  Lemma 6
  Lemma 7
  Numerical accuracy of Figure 1 in A.6
Appendix D. Computation of Correlations for Simulated Data

Comments on Mathematical Model for Causal Analysis

Among possible examples of data with correlation structures which may contain insight into causation patterns in that data, panel data serves as a useful realization for describing the mathematical model developed here. For simplicity define a panel to be a fixed group of persons who report periodically on their behavior. Suppose the i-th person in a panel of size N reports at times t = 0, 1, 2, ... two numbers, x'_it and Y'_it, where i = 1, 2, ..., N. The measurements x'_it and Y'_it vary then according to both the individual i and the time t.
For a given time t and individual i, x'_it and Y'_it are random variables with finite means (expectations) E(x'_it) and E(Y'_it) and finite positive variances Var(x'_it) and Var(Y'_it). To develop formulas for means, variances and correlations for x'_it and Y'_it as functions of i and t we fix on the i-th individual, suppress the subscript i in the notation so that x'_t and Y'_t replace x'_it and Y'_it, and study the parallel sequences of random variables

  x'_0, x'_1, x'_2, ..., x'_t, ...
  Y'_0, Y'_1, Y'_2, ..., Y'_t, ...

In statistical terminology the above sequences are called time series and written {x'_t} and {Y'_t} where t = 0, 1, 2, .... Finally, we extend these results for the individual to the group.

To express the relationship between x'_t and Y'_t requires introducing a third sequence of random variables y'_t, t = 0, 1, 2, .... Roughly speaking, {x'_t} and {y'_t} are sequences of dependent variables which develop independently as sequences, and combine to form the sequence {Y'_t}. That is, x'_t may depend on previous (in time) values of x' but not on any y' value or values. Similarly, y'_t depends only on y' values.

Summary of Notation in Appendices A and B

s, t - Non-negative integer valued subscripts representing time.
j, k, n - Integer valued subscripts used to denote values of t.
N - Size of population of individuals.
i - Non-negative integer valued subscript corresponding to an individual in the population.
x'_it, x'_t - x measurement (random variable) associated with the i-th individual in the population at time t. When discussion focuses on one individual the subscript i is suppressed.
y'_it, y'_t - y measurement (analogous to x measurement above), independent of x measurement.
ζ_ix, ζ_iy - Means (expectations) of x'_it, y'_it, which are constant over t.
σ_x², σ_y² - Variances of x'_it, y'_it, constant with respect to both time t and individual i. Thus Var(x) = Var(x'_it) = σ_x² and Var(y) = Var(y'_it) = σ_y².
ρ_x (ρ_y) - Autocorrelation of x'_{i,t+1} with x'_it (y'_{i,t+1} with y'_it) given fixed i; constant over t and i.
Y'_it, Y'_t - Measurement formed from a linear combination of terms from the {x'_it} and {y'_it} series. Latter notation used when the i subscript can be suppressed.
T - Positive integer value of t where terms from {x'_it} first combine to give Y'_it. For t < T, Y'_it is defined equal to y'_it.
g - Causal interval, i.e. number of cycles before time t ≥ T at which x'_{i,t-g} "influences" Y'_it. In the recursive definition, Y'_it is given as depending linearly on Y'_{i,t-1}, x'_{i,t-g}, and an error term.
x_t, y_t, Y_t - The variables x'_t, y'_t, Y'_t translated to have mean zero.
{e_xt}, {e_yt} - The sequences of independent random variables into which the dependent sequences of variables {x_t}, {y_t} decompose.
[symbol illegible] - See A.2 (11.1), (11.2).
@ - Asymptotic. See A.3.
E - Expectation.
Var - Variance.
Cov(x_s, Y_t) - Covariance of x_s with Y_t.
ρ(x_s, Y_t) - Correlation coefficient between x_s and Y_t.
c, c_xy - Coefficient of x-term in recursion for Y_t and Y_it.
[symbol illegible] - Coefficient of [illegible] in recursion for Y_t and Y_it.
[symbol illegible] - See B.2 (4).

Introduction to Appendix A

Formulas are derived for the correlation coefficients between pairs of random variables drawn from within and between three time series: two parallel time series {x_t} and {y_t}, and {Y_t} formed from them (t = 0, 1, 2, ...). The series Y_t is defined to equal y_t for t = 0, 1, ..., T-1, where the integer T ≥ 1 is called the generating period.
At time T, Y_T is set equal to a linear combination of Y_{T-1}, x_{T-g}, and an error term. The non-negative integer g is called the (size of the) causal interval. Thereafter, for t > T, Y_t is defined to be the same linear combination of Y_{t-1}, x_{t-g}, and an error term.

Because correlation coefficients are invariant under translation, it proves convenient to replace x'_t and y'_t by the random variables (provided the expected values E(x'_t) and E(y'_t) exist)

  x_t = x'_t - E(x'_t)
  y_t = y'_t - E(y'_t)        (t = 0, 1, 2, 3, ...)

Hence, the prime serves only to indicate random variables which have not been translated so as to have zero mean.

We assume that {x_t} and {y_t} have the basic characteristics of autocorrelated (Markov) time series. (See Kendall and Stuart, The Advanced Theory of Statistics, Vol. 3, pp. 405, 418.) The full force of the stationarity assumption is not needed, hence not assumed here. Instead, we require only that x_t and y_t have finite positive variances σ_x² and σ_y² constant over t [remainder of the condition illegible].

Contents of Appendices A, B and C

In Appendix A moments of the three time series {x_t}, {y_t}, and {Y_t} are developed. By moments we mean the statistical parameters expectation, variance, covariance and correlation coefficient. The formulas in Appendix A apply only to the individual but form the basis from which the results for the total population or group are derived.

Appendix B contains a summary, discussion, and abridged derivations of formulas for moments of the population.
In particular, for the model set forth in [equation reference illegible], page 25 of the main report, and B.2 of the appendix, B.2 (10) gives the asymptotic (in s and t) autocorrelation between Y_is and Y_it (where s and t are times) and B.2 (11) the asymptotic cross-correlation between x_is and Y_it. Sections D and E of the main report focus on these two correlations. The interpretation of the formulas in B.2 and their relevance to Sections D and E are treated in B.3.

The variables {x'_it} and {Y'_it} are the variables under study in the main report. The variables {y'_it} combine with {x'_it} to form {Y'_it}. Define for the population, i = 1, 2, ..., N and t = 1, 2, ...:

  ζ_ix = E(x'_it)                                                    (1.3)
  ζ_iy = E(y'_it)                                                    (1.4)
  σ_x² = Var(x) = Var(x'_it | i), the variance of x'_it given i      (1.5)
  σ_y² = Var(y) = Var(y'_it | i), the variance of y'_it given i      (1.6)

The means ζ_ix and ζ_iy depend on the individual i but are constant over time t. The variance of the random variables x'_it or y'_it for any fixed individual i does not depend on the individual or the time. We indicate this by suppressing subscripts i and t.

These properties enable us to write [equations (1.7) and (1.8) illegible]. That is, all the variables x'_it (y'_it) are identical as variables except for their means ζ_ix (ζ_iy). Formulas (1.3) through (1.8) use the notation of the main report and link Appendices A and B.

For each individual i the model set forth represents the measurement Y_it made at time t by a linear combination of Y_{i,t-1} (the Y measurement one unit before), x_{i,t-g} (the x measurement g units before), and an error term.
In this sense x_{i,t-g} is the primary influence of the variables {x_it} on Y_it. This leads naturally to the conjecture

  ρ(x_{i,t-g}, Y_it) ≥ ρ(x_is, Y_it)

for fixed integers t = g, g+1, g+2, ... and s = 0, 1, 2, .... The seven lemmas comprising Appendix C provide an answer to this question in the case s and t are very large. The results are summarized in Figure 1 of Appendix A.6 and in Figure 1.12, page 40.

Appendix A. Moments of {x_t} and {Y_t}

A.1. The Common Model for {x_t} and {y_t}

The series {x_t} and {y_t} differ only in the values of the parameters σ_x, σ_y, ρ_x and ρ_y. Thus, only the model for {x_t} is developed. We express the dependent random variables x_t in terms of independent ones e_xt to obtain simpler derivations. Analogously y_t is expressed in terms of e_yt.

Model. Let {x_t} (t = 0, 1, 2, ...) be a sequence of random variables with common mean zero and common variance σ_x². Let {e_xt} (t = 0, 1, 2, ...) be a sequence of independent random variables. Let ρ_x be a real number such that |ρ_x| < 1. Define

  x_0 = e_x0
  x_t = ρ_x x_{t-1} + e_xt        (t = 1, 2, 3, ...)

[The remainder of A.1 and all of A.2, The time series {Y_t}, are handwritten derivations that are not legible in this copy.]
A.3. Asymptotic formulas

From the equation for Y_t given by (10) we derive expressions for the variance of Y_t, and the covariance and correlation between Y_s and Y_t and between x_s and Y_t. Also simpler "asymptotic" formulas for these expressions are given.

Conventions throughout the remainder of Section A: s > T, t > T, and n = |s - t|, the absolute difference. The prefix "@" stands for "asymptotic", e.g. "@Cov" means asymptotic covariance.

Definition of asymptotic

By asymptotic we mean that the integers g, T, and n = |s - t| are minute compared to s and t, so that expressions such as ρ_x^s, ρ_x^t, ρ_y^s, and ρ_y^t (but not of the form ρ_x^{|s-t|}) may be regarded as zero. Intuitively, taking the example of time, if t represents the present, then s is the near future or past and g and T are the remote past. In other words the values of x and y at times before T have negligible influence on the values x_t and y_t as compared to the influence of x_s on x_t (and y_s on y_t) if s is less than t, or vice versa if t is less than s.

[Sections A.4, Variance of Y_t, and A.5, Covariance, asymptotic covariance, asymptotic correlation, are handwritten derivations that are not legible in this copy.]
A.6. Covariance and Correlation between x_s and Y_t

Consistent with the notation in A.5 we define the correlation coefficient between x_s and Y_t to be

  ρ(x_s, Y_t) = Cov(x_s, Y_t) / (σ_x sqrt(Var(Y_t))).

Asymptotically Var(Y_t) does not depend on t (see (15)), so that ρ(x_s, Y_t) varies in s and t as Cov(x_s, Y_t) varies.

In (9.1), the definition of Y_t, x_{t-g} enters as "the influence" of the x series on Y_t. Intuitively one expects therefore that Cov(x_s, Y_t) and ρ(x_s, Y_t) attain their maximum values when s = t-g. Considering only non-negative values of ρ_x and ρ_y, the maximum asymptotic value occurs at s = t-g if ρ_y(1 + ρ_x) ≤ 1, and at some time s before t-g if ρ_y(1 + ρ_x) > 1.

If we think of time s as the present, then we expect maximum correlation g time units in the future. If the maximum occurs at time t after s+g, then we say the maximum is delayed t-g-s time units. Thus, we define the delay d as follows: d = t-g-s where s ≤ t-g, so that d assumes only non-negative integer values.

Figure 1 on the following page shows the regions of the unit square (with abscissa ρ_x and ordinate ρ_y) on which the delay d = 0, 1, 2, ..., 10. The table below Figure 1 gives the areas of these regions. The logical justifications of the table, Figure 1, and the statements above comprise Section C. At this point we conclude Section A by obtaining expressions for Cov(x_s, Y_t), @Cov(x_s, Y_t), and @ρ(x_s, Y_t).

[Figure 1. Regions of the unit square (abscissa ρ_x, ordinate ρ_y, each running from .00 to 1.00) labelled by delay: Delay = 0, Delay = 1, Delay = 2, and so on. The accompanying table of areas is not legible in this copy.]
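The delay rule in A.6 can be checked numerically. The sketch below rests on assumptions that the legible text does not spell out in full: that Y_t = ρ_y Y_{t-1} + c x_{t-g} + error, and that {x_t} is a stationary first-order autoregressive series, so that Cov(x_s, x_u) is proportional to ρ_x^|s-u|. Under those assumptions the asymptotic covariance between x_{t-g-d} and Y_t is proportional to the sum over j ≥ 0 of ρ_y^j ρ_x^|d-j|. The function names are hypothetical.

```python
def cross_cov_shape(rho_x, rho_y, d):
    """Sum over j >= 0 of rho_y**j * rho_x**abs(d - j), in closed form.

    Proportional (under the stated assumptions) to the asymptotic
    covariance between x at time t-g-d and Y at time t.
    Requires 0 <= rho_x < 1 and 0 <= rho_y < 1.
    """
    # Terms with j <= d are summed directly.
    head = sum(rho_y ** j * rho_x ** (d - j) for j in range(d + 1))
    # Terms with j > d form a geometric series.
    tail = rho_x * rho_y ** (d + 1) / (1.0 - rho_x * rho_y)
    return head + tail


def best_delay(rho_x, rho_y, max_delay=30):
    """Integer delay d in 0..max_delay that maximizes the covariance."""
    return max(range(max_delay + 1),
               key=lambda d: cross_cov_shape(rho_x, rho_y, d))


for rho_x, rho_y in [(0.5, 0.5), (0.7, 0.7), (0.9, 0.9)]:
    rule = rho_y * (1 + rho_x)
    print(f"rho_x={rho_x}, rho_y={rho_y}: rho_y(1+rho_x)={rule:.2f}, "
          f"delay={best_delay(rho_x, rho_y)}")
```

In this sketch a delay greater than zero appears exactly when ρ_y(1 + ρ_x) exceeds 1, which agrees with the condition stated above; evaluating best_delay over a grid of (ρ_x, ρ_y) values would trace out regions of the kind shown in Figure 1.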
[The remainder of A.6 and Sections B.1 and B.2 of Appendix B are handwritten derivations that are not legible in this copy.]

B.3. Interpretation of B.2. Relevance of results.

The autocorrelation formula (10) and cross-correlation formula (11) illuminate the simulated results in Sections D and E of the main report. By virtue of the definition of asymptotic in s and t in A.3, only @Cov(Y_is, Y_it) and @Cov(x_is, Y_it) explicitly depend on s and t. Thus @Var(Y_it) is constant over t and so are the denominators in (10) and (11).
As the lag |s-t| increases (that is, as k or -k increases in magnitude, where k = |s-t| in the main report), @Cov(Y_s, Y_t) and @Cov(x_s, Y_t) approach zero by the covariance formulas in A.5 and A.6. From (10), as the lag increases @ρ(Y_is, Y_it) approaches [expression illegible], and from (11) @ρ(x_is, Y_it) approaches [expression illegible]. These expressions give the asymptotes referred to on page [illegible].

It follows from [equation reference illegible] that when [condition illegible], @ρ(x_is, Y_it) does not depend on ζ_iy, or depends on it only negligibly. It follows that the moments of Y_it do not depend on ζ_iy, as observed on pages 33 and 57 of the main report. More precisely, when [coefficient illegible] is small compared to c_xy, the terms with c_xy dominate the ζ_iy terms.

The last point helps explain why on page [illegible] the cross-correlation fails to reach zero despite the strong negative correlation between ζ_ix and ζ_iy. In (11), [expression illegible] decreases slowly as Cov(ζ_ix, ζ_iy) grows more negative because [coefficient illegible] is small.

When ζ_ix and ζ_iy are independent, and hence Cov(ζ_ix, ζ_iy) = 0 in [equation reference illegible], the variance Var(Y_it) is monotone increasing in [coefficient illegible], Var(ζ_iy), c_xy², and Var(ζ_ix). As noted on page [illegible], the autocorrelation (10) is monotone increasing in Var(ζ_ix) and Var(ζ_iy) since, as the denominator in (10) increases, @ρ(Y_is, Y_it) approaches one.

Finally, with regard to comments on pages 33 and [illegible], we discuss the effect of varying c_xy in (10) and (11). From [equation reference illegible] of Section A.6, @Cov(x_s, Y_t) is linear in c_xy.
If [coefficient illegible] is close to zero, then the numerator in (11) becomes essentially linear in c_xy and the denominator in (11) depends essentially on c_xy only through c_xy². Thus, if a positive c_xy is replaced by a c_xy of the same magnitude but negative, the numerator reverses sign while the denominator of (11) is unchanged, since (-c_xy)² = c_xy². This verifies that @ρ(x_is, Y_it) merely changes sign when c_xy changes sign, as observed on page [illegible].

On the other hand, when [equation reference illegible] holds or [coefficient illegible] is close to zero, the autocorrelation (10) depends essentially on c_xy through c_xy² and no linear terms in c_xy. Thus, @ρ(Y_is, Y_it) remains fixed when c_xy changes sign, as observed on page 33.

Introduction to Section C

When the correlations ρ_x and ρ_y are near one, the maximum asymptotic (in s and t) correlation between x_is and Y_it is delayed. That is, it does not occur at s = t-g; instead Y_it has maximum correlation with some x_is where s is more than g time units before t. As might be expected, the delay increases continuously as ρ_x and ρ_y increase together. The lemmas in Section C consider only 0 ≤ ρ_x < 1 and 0 ≤ ρ_y < 1. They treat the delay d as a non-negative real variable, except in Lemma 7, which restricts d to non-negative integer values. Thus, in Figure 1 of A.6, the integer value of delay gives the maximum correlation in the sense that

  ρ(x_{i,t-g-d}, Y_it) ≥ ρ(x_is, Y_it)

for all non-negative integer values of s. However, if non-integer values of s are allowed, Figure 1 will look similar but will have different boundary curves slanting diagonally toward the upper left corner of the unit square.
The notation in Section C is only slightly related to previous notation. For simplicity x replaces ρ_x and y replaces ρ_y. For strictly positive x and y, k is defined equal to y/x. This k is unrelated to any k used previously in the report.

[Lemmas 1 through 7 and the note on numerical accuracy of Figure 1 in A.6 are handwritten and are not legible in this copy.]

Appendix D. Computation of Correlations for Simulated Data

As idealized, the parallel time series {x_t}, {Y_t} have fixed means and variances over time t, so that correlation between, say, x_t and x_{t+k} depends only on k. In simulation, however, the means and variances, while extremely stable, vary due to random selection of values and the fact that in finite time Y_t cannot reach its limiting distribution.

Autocorrelations and cross-correlations are computed for lags k = 0, ±1, ±2, ..., ±25. Two methods are used to compute the correlation coefficient for lag k, which provide a quick and easy, albeit crude, means of judging the stability of the time series.

Method 1. From 50 cycles of the simulation producing 50 arrays of values, say x_i1, x_i2, ..., x_i50 for i = 1, 2, ..., 100, 25 "central" values of t are chosen, 25 separate correlations r(x_t, x_{t+k}) computed, and the average of the 25 values r(x_t, x_{t+k}) used to estimate the correlation for lag k.

Method 2. As above, 25 central values of t are chosen. However, instead of separate correlations the x values are pooled into two classes, namely t-values and t+k-values, and one overall correlation for lag k computed.

Central Values of t

For t = 1, 2, ..., 50, to select 25 values of t for lag k = 25 involves all 50 values of t in the computation, since t = 1 goes with t = 26, t = 2 with t = 27, ..., and t = 25 with t = 50. In general for lag k, 25 + k computing values of t are needed. We require these computing values to be "central" in the sense that they be consecutive and the minimum t used in computation be as near to zero as the maximum t is to 50. For example for k = 10, 35 computing values are needed, namely t = 8 through t = 42. Note 8 is eight from zero and 42 eight from 50.
The "central" values of t are then t = 8 through t = 32.

The diagram below illustrates which 25 t values correspond to particular lags. The axes represent times t = 1, 2, ..., 50. Each square therefore corresponds to a pair of times t and t+k. The squares inside the serrated diamond shape labelled with 0 are the twenty-five pairs of times used for lag k = 0. Those labelled 5 correspond to lag k = 5, those labelled -5 to lag k = -5, etc.

[Diagram of t values: both axes are time t, running from 1 to 50.]
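The choice of central values and the two averaging methods of Appendix D can be sketched as follows. This is an illustration only, not the original program. The tie-breaking rule when the unused cycles cannot be split evenly, the restriction to lags 0 ≤ k ≤ 25, and the demonstration series (300 persons, first-order autoregression with ρ = .80) are assumptions of the sketch, and all names are hypothetical.

```python
import random
from math import sqrt

N_CYCLES = 50    # cycles t = 1, ..., 50
N_CENTRAL = 25   # number of central t values used per lag


def pearson(a, b):
    """Product-moment correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / sqrt(s_aa * s_bb)


def central_values(k):
    """Return ((first, last) computing t, (first, last) central t) for lag k."""
    if not 0 <= k <= N_CYCLES - N_CENTRAL:
        raise ValueError("this sketch handles lags 0..25 only")
    unused = N_CYCLES - (N_CENTRAL + k)
    first = unused // 2 + 1   # assumed tie-break when unused is even
    return (first, first + N_CENTRAL + k - 1), (first, first + N_CENTRAL - 1)


def method1(series_a, series_b, k):
    """Average of 25 separate correlations r(a_t, b_{t+k})."""
    _, (lo, hi) = central_values(k)
    rs = [pearson(series_a[t - 1], series_b[t - 1 + k])
          for t in range(lo, hi + 1)]
    return sum(rs) / len(rs)


def method2(series_a, series_b, k):
    """One overall correlation after pooling t-values and t+k-values."""
    _, (lo, hi) = central_values(k)
    pooled_t, pooled_tk = [], []
    for t in range(lo, hi + 1):
        pooled_t.extend(series_a[t - 1])
        pooled_tk.extend(series_b[t - 1 + k])
    return pearson(pooled_t, pooled_tk)


def demo_series(n_people=300, rho=0.8, seed=7):
    """series[t-1] holds every person's value at cycle t (hypothetical data)."""
    rng = random.Random(seed)
    start_sd = 1.0 / sqrt(1.0 - rho ** 2)   # start in the limiting distribution
    current = [rng.gauss(0.0, start_sd) for _ in range(n_people)]
    series = [current]
    for _ in range(N_CYCLES - 1):
        current = [rho * v + rng.gauss(0.0, 1.0) for v in current]
        series.append(current)
    return series


x = demo_series()
r1 = method1(x, x, k=2)
r2 = method2(x, x, k=2)
print(central_values(10), round(r1, 2), round(r2, 2))
```

For k = 10 the function reproduces the example in the text (computing values 8 through 42, central values 8 through 32). When the series is stable, the two methods should give nearly the same estimate, and a noticeable gap between them is one crude sign of drifting means or variances.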