Stata: A short introduction
Applied Time Series Econometrics 31E01300, Spring 2014

General
• Stata is a general statistical package that is widely used by econometricians, biometricians, and social statisticians
• To use Stata efficiently, it is necessary to learn commands
• However, you can also point-and-click, and Stata shows the commands implied by the "clicks", which makes the commands easy to learn

Starting
• Start the program by clicking the shortcut symbol
• Or: if you already have a data file, click its name
• When you start the program, you will see
– Title
– Menu row
– Shortcut symbols
– Command window
– Review window
– Results window
– Variables window
– Address line

Shortcut symbols:
• Open file
• Save file
• Print
• Start/close log
• Viewer
• Edit graph
• Open do file
• Edit data
• View data
• Variables manager
• Stop execution

• Log is a record of everything that appears in the Results window, including commands and results
• Viewer: for example, a help file is shown in a viewer
• Data editor: input or change data
– Not necessary to use, most likely better not to use
• Data browser: you can look at the data, but not change it (read-only)
• Do file: a file with commands

Menus:
• File: open a data file, import data, print etc.
• Edit: edit text
• Data: go to data editor or browser, combine data sets, get descriptions of data, matrix operations
• Graphics: graphics tools
• Statistics: statistics and estimation commands
• Window: determine which windows are shown
• Help: search help files (on the computer) or add-on programs available on the net; manual in pdf form

Reading in data
• To try something quickly with a small number of observations, you can type in the data or copy & paste data in the Data Editor
• To import data from a file, click File > Import > Excel spreadsheet
– This opens a window where you can choose the Excel file and the worksheet name.
• Similarly, if the data is in a text file, choose the format in the Import menu
• There are also some specialized file transfer programs (e.g. StatTransfer) to convert data directly, e.g. from Excel to Stata

• It is useful to organize the data in Excel so that variable names are in the first row and the observation number (or the date in time series) is in the first column
• When reading in a very large data set, new versions of Stata automatically adjust the memory size; with older versions of Stata you had to make sure that there is enough memory for the data
– To increase memory, write in the Command window for example:
    set memory 50m

• If a Stata data file already exists, you can start Stata by clicking the Stata data file name (file with .dta ending)
• Or start by opening Stata and then click the Open file shortcut symbol, or click File > Open, and choose an existing data file

Saving your data
• Click the save button and the dataset (with possible changes) replaces the old one with the same name
• To give a new name, click File > Save As and give a name to the data set
– Stata data file names always have the ending .dta
• Note: it is always important to have a safe copy of your original, unchanged data
• When you generate new variables etc., save the new data file with another name than the original file
• If the data set is small, a good policy is to do the variable creation always "from scratch" with a do-file (see below)

After reading in data
• In the Variables window:
– Variable names
– Possibly some explanations (Labels)
– Variable type and format
– If the type is str (string), the variable has been read as text and not numbers
• It could be genuinely text (like automobile names) or just numbers coded as text, in which case they can be converted to numbers
• In the Results window: the command that has been executed (and output when applicable)
• In the Review window: the command that has been executed

• There are two other ways of giving commands in
Stata (besides point-and-click)
• 1) Write the command in the Command window
• Then press Enter and Stata executes the command
• The command appears in the Review window and the result in the Results window

• Examples of commands to read in data
    import excel "C:\Documents\try.xlsx", sheet("Sheet1")
– Reads an Excel file; 'Documents' stands for whatever folder you saved your data file in
    use "C:\Documents\datasetname.dta", clear
– Reads a Stata file that already exists
• An example of a command to save data
    save "C:\Documents\newfilename.dta"

• 2) Write the commands in a do-file and run it
• Click the Do-file shortcut symbol and an empty do-file appears.
• Write the commands there
• You can also copy commands from the Review window and paste them in the do-file
• Then highlight the command(s) and click the symbol in the extreme right (Do-button)
• Stata executes the command(s) (for example, data is read).
• Now you can save the do-file and repeat the same thing in the future

[Screenshot: Do-file Editor toolbar with the Do-button marked]

• What's the point in using a do-file with commands?
– You have a record of what you have done: how you handled the data, generated variables, exact estimation commands used etc.
– you (and others!) can always replicate what you have done
– easy to add new variables, change estimation methods etc.
– easier to detect errors
– you can apply user-written extensions to Stata commands

• Often easiest to start by doing initial estimations using point-and-click mode; then copy the estimation commands to a do-file
• When you have learnt the basics, it is faster to write the commands directly
• To learn about commands, see the manual and the help file

• When you read data or save datasets to your computer or memory stick, it is often useful to define the path so you don't have to repeat it
• Example: you keep the files in the folder project in the documents folder and have the Stata dataset mydata there (run these lines together in a do-file, since a local macro exists only while the do-file runs)
    local mypath "C:/documents/project"
    cd "`mypath'"
    use mydata.dta

Example file
• We will use several example files which will be available on the home page
– The dataset icecream.dta contains time series data from a paper by Hildreth and Lu (1960)
– four-week observations from 18 March 1951 to 11 July 1953
– consumption: consumption of ice cream per head (pints)
– income: avg. family income per week (US $)
– price: price of ice cream (per pint)
– temp: average temperature (in Fahrenheit)
– time: index from 1 to 30

Looking at the data
• Viewing the data set
– Click the Data Browser button
– You see the variable names and values
– You can sort the data according to some variable by clicking a cell in the column to sort by, and then clicking Data > Sort
– You can order the columns by clicking Data > Data utilities > Change order of variables
• Or drag variables in the Variables window of the Data browser to the order you want

– Note:
– Decimals are written with periods, not commas
– Missing values are denoted by dots ( . )
• No missing values in the icecream data set

Printing data and summary statistics
• Click Data > Describe data
• There are several options, which all lead to a window where you can choose the variables you are interested in
– List: lists variable values in the Results window
– Describe: gives information on e.g.
the range and distribution of the variables and their format
– Summary statistics: mean, variance, min, max

• Descriptive statistics can also be obtained by clicking Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Summary statistics
– There you can also e.g. get correlations of variables

• The same with commands:
– you can write the command
    summarize consumption price income temp
– Or
    list consumption price income temp
– You can write these either in the Command window or in a do-file
• To tabulate the distribution of variables with discrete values, use tabulate or tab, for example:
    tab temp

• If several variable names have the same beginning, e.g. if we have variables x1 and x2, you can write summarize x* to get a summary of both
• Most commands can be shortened, for example instead of summarize, you could write sum
• if specifies conditions, e.g. if you want to summarize variable consumption when variable temp exceeds 60, choose the if/by sheet in the summary statistics window and write the condition temp>60 there
– The corresponding command is
    summarize consumption if temp>60
– Several if-conditions can be combined: if temp>60 & income<80

• by makes it possible to do an operation for separate subgroups
– For example, we could summarize ice cream consumption by temperature
– Choose the if/by sheet in the summary statistics window, click Repeat command by groups there, and write temp as the Variables that define groups
– With a command:
    by temp, sort: sum consumption
  or
    bysort temp: sum consumption
– Here most "groups" have only one observation, so the operation did not make much sense

Create variables
• Click Data > Create or change data > Create new variable
• Example: convert the temperature, which is in Fahrenheit (°F), to Celsius (°C)
• Give a name to the new variable, e.g. ctemp
• Then write the expression for the new variable: (temp-32)*5/9
– By clicking Create, you get mathematical etc.
functions, and from Data > Create or change data > Create new variable (extended) some additional expressions
– You can alternatively write the command
    generate ctemp=(temp-32)*5/9
  in the Command window or a do-file

• To change the content of a variable, click Data > Create or change data > Change contents of a variable
– With commands: for example, the income variable is weekly income. If you wanted to convert that to annual income without changing the name of the variable, you could write
    replace income = 52*income
• To rename a variable, click Data > Variable utilities > Rename a variable
– Alternatively, write a command, e.g.
    rename temp temp_fahrenheit

• Some additional points:
– Stata allows long variable names, so try to avoid short, cryptic names (difficult to remember afterwards what they are)
– Upper case and lower case letters are treated as different, so for example temp and Temp would be different variable names
– If you try to create a new variable and give it a name that an existing variable has, you'll get an error message

• Often useful to add explanations of the variables, i.e. labels
– Click the shortcut symbol for the Variables manager, or click Data > Variables manager, then choose a variable and write its label (description)
– The label appears in the Variables window
– You can also choose value labels: click the value labels manager, give the values and their labels, and give a name for the labels

– Example: give the temperature variable temp the label "Fahrenheit" and the variable ctemp the label "Celsius"
– Do it in the menu, or use commands
    label variable temp "Fahrenheit"
    label variable ctemp "Celsius"
– The variable labels now appear e.g. in graphs and tables instead of the variable names

– Example: generate a dummy variable for temperatures above 15 °C.
Then give the label values high and low for the values 1 and 0, and the name templabel for the value labels:
    gen hightemp = ctemp>15
    label variable hightemp "dummy for high temp"
    label define templabel 1 "high" 0 "low"
    label values hightemp templabel
• We need the last command to attach the value labels to the variable hightemp
– If we now give the command tab hightemp, we see the value labels instead of 1's and 0's

• In the previous example, we used a logical expression ctemp>15
• This returns 1 if the expression is true and 0 if not
• Similarly, one can use >, <, >=, <=, != (not equal), == (equal)

Some examples on plotting data
• Click Graphics > Histogram
– Then choose the variable
– Note: several other choices are available, concerning if/by, graph titles etc.
– Alternatively write a command, e.g.
    histogram ctemp
• To get a smoothed continuous distribution graph, click Graphics > Smoothing and densities > Kernel density
– Alternatively, use the command
    kdensity ctemp

• Plot ice cream demand (consumption) against temperature (ctemp)
• Click Graphics > Twoway graph
– Then click Create, and you get a window where you can choose the graph type (take Scatter) and choose the variables
– Click Submit and you'll get the graph
– Note that this is named Plot 1 in the Twoway graph window. You can now specify several plots and include them in the same graph
– Alternatively, you'll get the same graph with the command
    scatter consumption ctemp
– Line graph:
    line consumption time

[Figure: scatter plot of cons (vertical axis) against Celsius (horizontal axis)]

[Figure: line plot of cons (vertical axis) against time (horizontal axis)]

• Saving a graph
– After getting the graph, you can save it
• File > Save as and give a name, or use the command graph save graphname, where graphname is the name you want to give
• Saved as a file with the ending .gph
– You can also copy the graph and paste it in a Word document
– Graphs in Stata are mostly not as flexible as in Excel
– In Stata the graphs can be edited, for example text can be added, color changed etc.
using the Graph editor
• If you have saved two graphs and given them e.g. the names graph1 and graph2, you can combine them into one graph:
    graph combine graph1.gph graph2.gph

Using estimation commands
• Example: simple regression model
• Click Statistics > Linear models and related > Linear regression
• Then choose the dependent variable and the independent variables
– A constant term is added automatically
– Alternatively, write a command, e.g.:
    regress consumption ctemp income

Log file
• It is useful to have a record of what you have done during a session; this is kept in a log file
• To start a log file, click the log shortcut symbol, then choose the file name and location
– Choose the log file type
– log: text file that can be directly read by Word (e.g. copy tables to text)
– smcl: formatted log file, has to be "translated" to a text file before use
• The Results window shows that the log has been started
• In the Review window you see the corresponding command, which looks something like
    log using "C:/Folder name/logfilename.log"
• You can see the log file by clicking File > Log > View
• To stop logging, click the log shortcut and Close log file, or write the command
    log close

• All this you can do in a do-file with commands (i.e.
write these in a do-file, save it, highlight the commands and click the Do-button)
• Example:
    log using "C:\Folder name\logfilename.log", replace
    use "C:\Folder name\icecream.dta", clear
    generate ctemp=(temp-32)*5/9
    sum consumption ctemp income price
    reg consumption ctemp income price
    save "C:\Folder name\icecream_modified.dta", replace
    log close

Some command & do-file hints:
• In the above do-file, the log command has ", replace" at the end
– This replaces the existing log file with a new one
– ", append" adds the new log to the end of the old log file
– Of course you can always give a new name to the log file
• The use command has ", clear" at the end
– This clears possible old data from memory before reading the new data
• The save command has ", replace" at the end
– Replaces the old data file with a new one
– Be careful with this: don't destroy what may be the only copy of your original data

• It is useful to write some explanations and comments in the do-file
– Easier to remember afterwards what you did and why
– Comments can be added by starting a line with *
– Or use /* and */ and include comments between them
• Lines can be broken by /// at the end of the line. The command continues on the next line
• Examples:
    *Regress icecream consumption on income
    /* Regress icecream consumption on temperature
    and income */
    reg consumption ///
        ctemp income

• Some users prefer to end all commands with some symbol, e.g. a semicolon ;
• In this case, you have to tell Stata that the 'delimiter' is ;
    #delimit ;
    reg consumption temp income;
– Note that in this case you can continue the command over several lines; only the last line of a command ends with ;
• To change back to commands ending with a line break (carriage return), use #delimit cr (this is the default)

Time series operations
• When the data set consists of time series variables, we can use special time series operations, but first we have to tell Stata that we have time series
– The data has to have a variable that is a time identifier, e.g.
time in the icecream.dta data
– Statistics > Time series > Setup and utilities > Declare dataset to be time series data
– With a command (assume time is the name of the identifier):
    tsset time

• In models with a time series dimension, we often need lags
• The lag operator is L. ; for two lags L2., etc.
– L.x is x_{t-1}, L2.x is x_{t-2} etc.
– Note the dot "." after L
• You can either generate lagged variables:
    gen lagincome = L.income
  and use the generated variables in the estimation command, for example
    reg consumption lagincome
• Or: specify the lags in the estimation command:
    reg consumption L.income

• To use differences over time, the operator is D. (or D2. for twice differencing etc.)
– D.x is x_t - x_{t-1}, D2.x is (x_t - x_{t-1}) - (x_{t-1} - x_{t-2})
• You can generate
    gen dcons = D.consumption
  or
    gen dcons = consumption - L.consumption
• Or use the operator directly in estimation:
    reg D.consumption D.income
• You can also combine these, e.g.
    gen dlagcons = D.L.consumption
• Note: when using lagged or differenced variables, you lose observations from the beginning of the data period

• For leads, use F. , F2. etc.
– F.x is x_{t+1}, F2.x is x_{t+2} etc.
• For seasonal differences, use S. , S2. etc.
– S.x is x_t - x_{t-1} (the same as D.x), S2.x is x_t - x_{t-2} etc.
• E.g. to get 4-quarter differences of consumption, use
    gen dcons = S4.consumption
  (or gen dcons = consumption - L4.consumption)
• These can be combined with lags, e.g. L.S4.consumption

• Using multiple lags:
    reg consumption L(1,2,3).income
    reg consumption L(1/3).income        (lags from 1 to 3)
    reg consumption L(1(2)5).income      (lags from 1 to 5 with an increment of 2)
• Alternatively, you can generate the lags; in this case, it is useful to give them names ending in the lag number: gen c1=L.consumption, gen c2=L2.consumption, and similarly gen i1=L.income, gen i2=L2.income
• These can easily be handled in the commands (a range such as i1-i3 picks up the variables in the order in which they are stored in the data set):
    reg consumption i1-i3 c1-c3

Dates
• Suggestions on coding dates e.g.
in an Excel file that you read into Stata
– Annual data: years are already numbers, so no problem
– Quarterly data: 1980q1, etc., or q1-1980 etc.
– Monthly data: 1996Jan, 1996Feb, etc., or Jan1996 etc., or 1996m01, 1996m02, etc.
– Daily data: 1987Dec13 etc., or 13dec1987, or 13121987
– also other formats possible

• Time codes of this kind are strings (text) in the Stata data file.
• For example, the file nyse.dta has information on the monthly volume of the NY stock exchange, with a string variable obs showing the months in the form 1900m01 etc.
• To generate a number for the months, generate a new variable:
    gen month = monthly(obs,"YM")
– YM tells Stata that we have used a coding like 1980m1 where the year comes first and then the month; note the use of quotation marks and capital letters ("YM")

• Stata produces a new variable month, a series of integers counted from the beginning of year 1960; with monthly data, January 1960 is period 0
• With monthly data, there are 60*12=720 monthly observations in 1900m1 to 1959m12, so the numbering in the data set starts from -720
• A similar method applies to quarterly, daily, etc. data, e.g.
    gen day = daily(date,"YMD")
  if the original string variable is date

• Then in the tsset command, one can specify the month variable as the time dimension:
    tsset month
• Better still, use the command
    tsset month, monthly
– Now the variable month is displayed with a monthly date format, i.e. as 1900m1 etc. (these are no longer strings), although it still has the numerical values -720,…; this is convenient e.g. in graphs
– If the option , monthly is not used, the periods are numbered from 109 even in figures where we might prefer to have actual dates
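
The date steps above can be tried without any external file. The following do-file is a minimal sketch, not the course's own example: the data set is invented (six months starting in 1900m01, with a made-up variable volume), and only the variable names obs and month follow the text. It assumes a Stata version in which monthly() accepts the "YM" mask, as used above.

```stata
* Minimal sketch with invented data: six string month codes and a made-up series
clear
set obs 6
gen str7 obs = "1900m0" + string(_n)    // "1900m01", ..., "1900m06" (hypothetical values)
gen volume = 10 + _n                     // made-up numbers, for illustration only

* Convert the string codes to Stata's monthly date numbers (1960m1 = 0)
gen month = monthly(obs, "YM")

* Declare the time dimension; the monthly option applies a monthly display format
tsset month, monthly

* With the data declared as time series, the lag operator can be used directly
gen lagvolume = L.volume

list obs month volume lagvolume, noobs
```

If the conversion works as described in the text, the first observation corresponds to 1900m1 and is stored as -720 (60 years times 12 months before 1960m1), and lagvolume is missing for the first observation because there is no earlier period.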