Use the UNIVAR macro.

1. Issue the following commands to SAS (with appropriate substitutions for "U:\").

    LIBNAME GFLib 'U:\dm\sasdata' ;
    LIBNAME RCLib 'U:\dm\RC' ;

2. Read your Excel file into SAS with the EXCELSAS macro. Note that this step only needs to be done once. In other words, if you close SAS down today but decide that you want to apply the UNIVAR macro to the same data set on another day, you do not need to read your Excel file into SAS a second time. This is because the EXCELSAS macro actually creates a permanent SAS dataset.

3. Apply the UNIVAR macro. Detailed instructions on how to fill in the blanks are given on pages 60-63 of the textbook; I will make a few comments below. I used the following for the illustration in Lecture 3.

    RCLib.saheart sbp chd 95 subject 1 U:\dm\RC\ U:\dm\RC\ word

Comments:

You can put any continuous variable from the data set in the second field. This variable does not have to be what you would consider the response variable. You can also put more than one variable in the second field, if you separate them by spaces. Since the length of the second field is limited, you may need more than one invocation of the UNIVAR macro if you are interested in looking at several continuous variables.

You can put any categorical variable from the data set in the third field, or you can leave the third field blank. The former option performs exploratory data analysis on the continuous variable(s) within strata determined by the categorical variable, while the latter option does not entail any stratification. What I have above looks at the distribution of systolic blood pressure among subjects with coronary heart disease and at the distribution of systolic blood pressure among subjects without coronary heart disease.

4. Based on the exploratory data analysis, you can define transformed versions of the continuous variable(s) and remove extreme outliers that are mistakes for which corrections are not possible.
The macro creates new data sets with outliers removed, but I would not use these, since the macro discards moderate outliers (which are often valid observations) along with extreme outliers (which may be mistakes). Here is sample code that creates a new SAS data set {saheartclean.sas7bdat} starting from {saheart.sas7bdat} by removing all observations for which sbp > 200 and defining new variables as the logarithm and square root of systolic blood pressure.

    DATA RCLib.saheartclean;
        SET RCLib.saheart;
        /* Drop observations with sbp above 200; missing sbp values are kept. */
        if (sbp > 200) then delete;
        /* Transformed versions of systolic blood pressure (log is the natural log). */
        logsbp = log(sbp);
        sqrtsbp = sqrt(sbp);
    RUN;

5. You may want to apply the UNIVAR macro again to see how the transformed versions of the continuous variable(s) turn out.

    RCLib.saheartclean logsbp chd 95 subject 2 U:\dm\RC\ U:\dm\RC\ word

Use the FREQ (or FREQUENCY) macro.

1. Issue the following commands to SAS (with appropriate substitutions for "U:\").

    LIBNAME GFLib 'U:\dm\sasdata' ;
    LIBNAME RCLib 'U:\dm\RC' ;

2. Read your Excel file into SAS with the EXCELSAS macro.

3. Apply the FREQ (or FREQUENCY) macro. Detailed instructions on how to fill in the blanks are given on pages 53-55 of the textbook; I will make a few comments below. I used the following for the illustration in Lecture 3.

    RCLib.saheart chd famhist freq midpoint color 3 U:\dm\RC\ U:\dm\RC\ word

Comments:

Use the second field for the categorical variable whose distribution you want to examine. Optionally, use the third and fourth fields for other categorical variables by which you want to stratify. What I have above looks at the distribution of coronary heart disease among subjects with a family history and among subjects without a family history.
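If you want to cross-check the FREQ macro's tabulation by hand, a rough sketch with the standard PROC FREQ procedure is given below. This is not the macro's own code, and the macro's output (including any charts it draws) may look different. The sketch assumes only that the RCLib library points to 'U:\dm\RC' as above and that saheart contains the variables chd and famhist.

```sas
/* Sketch only: standard PROC FREQ, not the FREQ macro itself. */
LIBNAME RCLib 'U:\dm\RC' ;

PROC FREQ DATA=RCLib.saheart;
    /* Rows are levels of famhist, columns are levels of chd.          */
    /* NOCOL and NOPERCENT suppress column and overall percentages,    */
    /* leaving counts and row percentages, i.e. the distribution of    */
    /* chd within each family-history stratum.                         */
    TABLES famhist*chd / NOCOL NOPERCENT;
RUN;
```

The row percentages in the resulting table correspond to the stratified distributions described in the comments above: one row for subjects with a family history and one for subjects without.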