PH144B Spring 2014
Assignment #7 (Final Exam Homework)

Due: Wednesday, May 14, 2014, at the end of the day. Place it in the box outside of 101 Haviland. Contact David Lein if you want to turn it in early.

Unlike your previous homework assignments, work on this exam independently.

General Task:

Using the CHDS Data Set, test the hypothesis that pregnancy weight-gain trajectories over time (i.e., rates of weight gain) are equal for both halves of a pregnancy. Also, examine the correlation between the weight-gain trajectories and birth weight.

Specifically:

• Create an appropriate SAS Data Set with a unique observation for each pregnancy, comprising the following variables: (1) PREG -- the CHDS pregnancy ID variable, (2) BETA1 and BETA2 -- the trajectories of pregnancy weight gain (betas from simple regressions, in kilos/day) for the first and second half of each pregnancy, (3) the difference between the two BETA variables, and (4) BIRTHWEIGHT -- the birthweight of the child associated with each pregnancy.

• Compute the following: (1) the mean of each of the two betas across all pregnancies, (2) a paired t-test to assess the equivalence of the betas, and (3) the correlations among the birthweight and the two weight-gain betas.

Data Sets:

To perform these analyses, you will use two CHDS raw data sets: (1) CHDS Basic, basic.dat, and (2) wtgainwide.dat. basic.dat resides in the C:\PH144 directories in the computer labs and on the class web site; wtgainwide.dat resides on the class web site. You should already have experience reading the CHDS Basic data set from previous exercises.

The wtgainwide.dat data set is a pregnancy-based raw (or ASCII) data set. There is one record for each pregnancy. Each record consists of the unique CHDS pregnancy ID variable; the month, day, and year of LMP (last menstrual period, a proxy for the date of conception); and up to twenty-six measurement sets, each giving the month, day, and year of a medical exam and the mother’s body weight in pounds at the time of the exam.
This data set is written in free-field format with spaces as delimiters.

wtgainwide.dat record layout (free field), order of variables:

    preg        (pregnancy ID)
    lmpmonth    (month of LMP)
    lmpday      (day of LMP)
    lmpyear     (year of LMP)
    exmonth1    (month of exam #1)
    exday1      (day of exam #1)
    exyear1     (year of exam #1)
    weight1     (mother’s body weight at exam #1, in lbs.)
    exmonth2    (month of exam #2)
    exday2      (day of exam #2)
    exyear2     (year of exam #2)
    weight2     (mother’s body weight at exam #2, in lbs.)
    . . .
    exmonth26   (month of exam #26)
    exday26     (day of exam #26)
    exyear26    (year of exam #26)
    weight26    (mother’s body weight at exam #26, in lbs.)

To complete this assignment, you will need to (not necessarily in this order):

1. Read in the wtgainwide.dat data set -- use the provided MACRO to help.
2. Create SAS Dates from months, days, and years, and convert weight measurements to metric.
3. Reconfigure the wide weight-gain data set into a long, date-based data set.
4. Create a first- and second-half-of-pregnancy variable, perhaps called PREGHALF. Assume that the first 135 days of the pregnancy are the first half and that records after 135 days are in the second half.
5. Create a data set of individual weight-gain trajectories (betas) from simple regressions -- two for each pregnancy (first half and second half of the pregnancy).
6. Read in the CHDS Basic data set and convert birthweight to metric.
7. Combine (merge) the betas with the birthweight variable as appropriate -- only include pregnancies with betas.
8. As described above, assemble a pregnancy-specific data set (one record per pregnancy) with the following variables: PREG, BETA1 (beta from the first half of the pregnancy), BETA2 (beta from the second half of the pregnancy), the difference between BETA1 and BETA2, and BIRTHWEIGHT.
9. Run PROC MEANS as appropriate to compute the mean values of the two beta variables and a paired t-test for their difference. (Pairing is based on the two betas for each pregnancy.)
10. Run PROC CORR to compute the correlations among the two betas and birthweight.

Some Help in Reading wtgainwide.dat

The following SAS code has been posted on the class web site, http://socrates.berkeley.edu/~dlein. You may use it in your program to read in the wtgainwide.dat data set.

    filename WG 'c:\ph144\wtgainwide.dat';   /* or similar */
    options mprint;

    data wtgainwide;
      infile WG lrecl=500;
      input preg lmpmonth lmpday lmpyear @@;
      %macro out;
        %do i = 1 %to 25;
          input exmonth&i exday&i exyear&i weight&i @@;
        %end;
      %mend;
      %out;
      input exmonth26 exday26 exyear26 weight26;
    run;

Read through this SAS Data Step and make sure that you understand the use of the SAS Macro Statements. Note the final resolution into conventional SAS Language that appears in the log file (if you use the mprint option). The following discussion may help to explain some of the features of this SAS Data Step. Also, consider expanding the SAS Macro to help with the requested tasks.

Recall that SAS allows the use of a single dash to refer to groups of variables that have the same character prefix and numeric suffixes (e.g., x1-x33, or age1-age12, which refers to the full range of twelve variables). The single-dash convention may be used in SAS Procedures (e.g., in a SAS Var Statement or SAS Model Statement) as well as in a SAS Input Statement or SAS Put Statement. There is, however, no such shortcut for referring to groups of variables in which each group shares the same numeric suffix. For example, (x1 y1 z1) – (x12 y12 z12) does not work as a method of referring to x1 y1 z1 x2 y2 z2 … x12 y12 z12. It is, in fact, a curious shortcoming of SAS.

An easy solution, however, is to program the SAS Macro Facility to generate SAS Statements with text substitution to refer to each set of variables. In this case, we can generate SAS Input Statements. Generating multiple Input Statements, along with the use of the double trailing @ sign, allows SAS to input multiple groups of variables and assign them to a single output record.
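The text-substitution idea can be tried out on a small made-up data set before applying it to the real file. The sketch below is only an illustration under stated assumptions, not a solution to any part of the assignment: the data set name toy, the macro name readsets, the variables id, x1-x3, y1-y3, and z1-z3, and the inline data values are all invented for this example. It follows the same pattern as the posted code, with a macro loop that generates one held Input Statement per set of variables and a final Input Statement without a trailing @@.

```sas
/* Toy illustration only: every name and data value here is hypothetical. */
options mprint;   /* show the SAS statements that the macro generates */

/* Generate one INPUT statement per set of variables (x&i y&i z&i). */
%macro readsets(n);
  %do i = 1 %to &n;
    input x&i y&i z&i @@;
  %end;
%mend readsets;

data toy;
  input id @@;       /* read the ID and hold the record             */
  %readsets(2)       /* generates: input x1 y1 z1 @@;               */
                     /*            input x2 y2 z2 @@;               */
  input x3 y3 z3;    /* last set: no trailing @@, record is released */
  datalines;
101 1 2 3 4 5 6 7 8 9
102 10 11 12 13 14 15 16 17 18
;
run;

/* Single-dash lists work within one prefix, so the interleaved */
/* variables can be listed prefix by prefix here.               */
proc print data=toy;
  var id x1-x3 y1-y3 z1-z3;
run;
```

With the mprint option on, the log should show the generated Input Statements, which is a convenient way to check that the macro resolves to the SAS code you intended.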
A double trailing @ sign says to the SAS Data Step, “Hold the line.” That is, do not move the pointer to a new line; instead, just keep on reading data values until there are no more on the line.

Also, note the use of the lrecl=500 option in the SAS Infile Statement. lrecl is an abbreviation for “logical record length.” The default logical record length in SAS is 256 characters. If you want to read a file whose records are longer than 256 characters, you must override the default by specifying a record length that is equal to or longer than the longest record in your data file. The longest record in the wtgainwide.dat file is 385 characters, so lrecl must be set to at least 385. lrecl may be specified in the SAS Infile Statement or the SAS Filename Statement.

HAVE A GREAT SUMMER!