Use Enterprise Miner to fit a regression tree.

1. Issue the following commands to SAS (with appropriate
   substitutions for "U:\").

LIBNAME GFLib 'U:\dm\sasdata' ; 

LIBNAME RCLib 'U:\dm\RC' ;
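
To confirm that both libraries were assigned, one option
(a minimal sketch, not part of the original instructions)
is to list the data sets in each library.  An error in the
SAS log at this point usually means that the path in the
corresponding LIBNAME statement needs adjusting.

/* List the members of each library (names only, no */
/* variable-level detail).                           */
PROC CONTENTS DATA=GFLib._ALL_ NODS;
RUN;

PROC CONTENTS DATA=RCLib._ALL_ NODS;
RUN;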

2. If necessary, apply macros to convert Excel data to
SAS (EXCELSAS, see MacroInstr.txt), to divide the data 
into training/test or training/validation/test subsets
(RANSPLIT, see MacroInstr.txt), or to explore the data
(UNIVAR and FREQ/FREQUENCY, see MacroInstr2.txt).
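
If the RANSPLIT macro is not at hand, a random split can
also be sketched directly in a DATA step.  This is not the
macro's own method, only an illustration.  The data set 
names 'RCLib.FEV' and 'RCLib.FEVvalid', the seed 636, and 
the 50/25/25 proportions are assumptions made for the
example.

/* Hypothetical names: RCLib.FEV is assumed to hold the */
/* full data; FEVvalid is an assumed validation name.   */
DATA RCLib.FEVtrain RCLib.FEVvalid RCLib.FEVtest;
SET  RCLib.FEV;
U = RANUNI(636);   /* uniform(0,1) draw, fixed seed */
IF U < 0.50 THEN OUTPUT RCLib.FEVtrain;
ELSE IF U < 0.75 THEN OUTPUT RCLib.FEVvalid;
ELSE OUTPUT RCLib.FEVtest;
DROP U;
RUN;

Because each observation is assigned independently, the
subset sizes will only be approximately 50%, 25%, and 25%
of the full data.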

3. If necessary, assign numerical designations to 
categories and shorten variable names.  Illustrative
code may be found in MacroInstr5.txt. 
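
For orientation only, the kind of recoding and renaming
meant here might look like the sketch below.  It is not
taken from MacroInstr5.txt.  The data set names and the
variables 'HeightInInches' and 'SmokingStatus' (with values
'Yes' and 'No') are hypothetical.

/* Hypothetical input data set and variable names. */
DATA RCLib.FEV2;
SET  RCLib.FEV (RENAME=(HeightInInches=HT));
/* Assign numerical designations to the categories. */
IF SmokingStatus = 'Yes' THEN SMOKE = 1;
ELSE IF SmokingStatus = 'No' THEN SMOKE = 0;
ELSE SMOKE = .;    /* anything else becomes missing */
DROP SmokingStatus;
RUN;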

4. In SAS, go to the "Solutions" menu.  Go to "Analysis"
and then select "Enterprise Miner".  Do not invoke the
tutorial (unless you want to do so to satisfy your own
curiosity).  From the "File" menu choose "New" and then
"Project".  A box will appear with "Name" and "Location"
fields as well as with "Create", "Cancel", and "Browse" 
buttons.  Press the "Browse" button and select a 
subdirectory such as 'U:\dm\RC'.  In the "Name" field,
type a name like 'RegTree'.  Then press "Create".  

5. In the left panel of the SAS Enterprise Miner window, 
you will see a diagram with 'RegTree' and, immediately 
below it, 'Untitled'.  You can right-click on 'Untitled' 
to assign a name such as 'FEV'.

6. Near the top of the SAS Enterprise Miner window, 
click the "Input Data Source" icon (far left) and, while 
holding the mouse button down, drag it into the right 
panel.  Assuming that you have training, validation, and 
test data, repeat this process twice so that you have 
three "Input Data Source" icons in the right panel.  

7. Double-click on one of the "Input Data Source" icons 
in the right panel.  Press the "Select" button and then 
choose an appropriate library such as 'RCLib'.  Choose a 
training data set like 'FEVTRAIN'.  In the "Role" field,
change "RAW" to "TRAIN".  Then click on the "Variables"
tab near the top of the "Input Data Source" box.  You 
can alter entries in the "Model Role" column by
right-clicking and then selecting "Set Model Role".  You
want the response variable to be identified as "target",
any
(candidate) explanatory variables to be identified as
"input", and the ID variable (if any) to be identified
as "ID".  Any variables that you know you will not be 
using at all may be identified as "rejected".  Also, 
make any necessary adjustments in the "Measurement"
column.  When finished, click the white on red X in the
upper right corner of the "Input Data Source" box and
confirm the changes.  Assuming that you have validation 
and test data, repeat this process twice (except that
"RAW" will be changed to "VALIDATE" and "TEST").

8. Near the top of the SAS Enterprise Miner window, 
click the "Tree" icon (middle) and drag it into the 
right panel.  By holding the left mouse button down,
'draw' arrows from the data set icons to the "Tree" 
icon.  Also, drag the "Reporter" icon (right) into
the right panel, then 'draw' an arrow from the "Tree"
icon to the "Reporter" icon.

9. Double-click on the "Tree" icon.  Click the "Basic"
tab.  Choose "Variance Reduction" for the splitting
criterion, deselect "Treat missing as an acceptable
value", and change the number of "Surrogate rules 
saved in each node" to 3.  When finished, close the
"Tree" box and assign a model name such as 'FEVEx'.

10. Right-click the "Tree" icon and choose "Run".  
You will be asked if you want to view the results,
which include average squared error figures for the 
training and validation data sets as more leaves are 
added to the regression tree.  When finished, click
the white on red X.  You can view a schematic of the 
regression tree by right-clicking the "Tree" icon, 
choosing "Interactive", and selecting "Start".

11. Right-click on the "Reporter" icon and choose
"Run".  You can "Open" the report now or simply note
to which subdirectory it has been saved.  Some
important items in the report not found in the results 
you already examined are the average squared error for
the test data set and the "English rules" describing
the regression tree.

12. Suppose that you want to see the tree's
predictions for specific individuals in the test data 
set (or, actually, for any data set that has the same 
explanatory variables as the training and validation 
data sets).  You can do so by imitating the approach
shown below.

DATA RCLib.FEVtestPred;
SET  RCLib.FEVtest;

/* Replace this comment with the contents of
   {http://www.richardcharnigo.net/CPH636S09/FEVreport/em_report_822392082.txt},
   which is accessible from the "Datastep Score Code" link at
   {http://www.richardcharnigo.net/CPH636S09/FEVreport/em_report.html}. */

RUN;

PROC PRINT DATA=RCLib.FEVtestPred;
VAR P_FEV;
RUN;
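
As a cross-check on the report, the average squared error
for the test data can be recomputed from the predictions.
The sketch below assumes that the response variable is
named 'FEV' (inferred from the prediction name 'P_FEV')
and writes to a temporary data set 'WORK.FEVtestErr',
which is a name chosen for illustration.

/* Assumes the target variable is named FEV. */
DATA WORK.FEVtestErr;
SET  RCLib.FEVtestPred;
SQERR = (FEV - P_FEV)**2;   /* squared prediction error */
RUN;

/* The mean of SQERR is the average squared error. */
PROC MEANS DATA=WORK.FEVtestErr N MEAN;
VAR SQERR;
RUN;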


Use Enterprise Miner to fit a classification tree.

Steps 1 through 8 are essentially the same as those
presented above for fitting a regression tree.

9. Double-click on the "Tree" icon.  Click the "Basic"
tab.  Choose "Gini Reduction" for the splitting
criterion, deselect "Treat missing as an acceptable
value", and change the number of "Surrogate rules 
saved in each node" to 3.  When finished, close the
"Tree" box and assign a model name such as 'SAEx'.

10. Right-click the "Tree" icon and choose "Run".  
You will be asked if you want to view the results,
which include misclassification rates for the training 
and validation data sets as more leaves are added to 
the classification tree.  When finished, click the 
white on red X.  You can view a schematic of the 
classification tree by right-clicking the "Tree" icon, 
choosing "Interactive", and selecting "Start".

11. Right-click on the "Reporter" icon and choose
"Run".  You can "Open" the report now or simply note
to which subdirectory it has been saved.  Some
important items in the report not found in the results 
you already examined are the misclassification rate 
for the test data set and the "English rules" 
describing the classification tree.

12. Suppose that you want to see the tree's estimated
probabilities for specific individuals in the test data 
set (or, actually, for any data set that has the same 
explanatory variables as the training and validation 
data sets).  You can do so by imitating the approach
shown below.

DATA RCLib.SAtest2Pred;
SET  RCLib.SAtest2;

/* Replace this comment with the contents of
   {http://www.richardcharnigo.net/CPH636S09/SAreport/em_report_822554338.txt},
   which is accessible from the "Datastep Score Code" link at
   {http://www.richardcharnigo.net/CPH636S09/SAreport/em_report.html}. */

RUN;

PROC PRINT DATA=RCLib.SAtest2Pred;
VAR P_CHD1;
RUN;
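
If only the individuals with the highest estimated
probabilities are of interest, the predictions can be
sorted before printing.  This is a sketch rather than part
of the original procedure.  The temporary data set name
'WORK.SAsorted' and the choice of 10 observations are
assumptions.

/* Sort a copy so that RCLib.SAtest2Pred is unchanged. */
PROC SORT DATA=RCLib.SAtest2Pred OUT=WORK.SAsorted;
BY DESCENDING P_CHD1;
RUN;

/* Print the 10 largest estimated probabilities along */
/* with the observed outcome.                          */
PROC PRINT DATA=WORK.SAsorted (OBS=10);
VAR CHD P_CHD1;
RUN;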

13. Suppose that you want to obtain correct
classification rates on the test data set (or, 
actually, any data set that has the same explanatory
variables as the training and validation data sets) 
with thresholds other than 0.50.  You can do so by 
imitating the approach shown below, after having 
performed step 12.  An individual is classified as 
CHD = 1 when the estimated probability is at least the
threshold.  If you insert 
	WHERE CHD = 1;
immediately after the line starting with  PROC MEANS,
the code will give you sensitivities instead of 
correct classification rates.  If you instead insert 
	WHERE CHD = 0;
immediately after the line starting with  PROC MEANS,
the code will give you specificities instead of
correct classification rates.  

DATA RCLib.SAtest2Pred;
SET  RCLib.SAtest2Pred;
ESTRISK = P_CHD1;
PREDWITHCUTOFF05 = 1 - (ESTRISK < 0.05);
PREDWITHCUTOFF10 = 1 - (ESTRISK < 0.10);
PREDWITHCUTOFF15 = 1 - (ESTRISK < 0.15);
PREDWITHCUTOFF20 = 1 - (ESTRISK < 0.20);
PREDWITHCUTOFF25 = 1 - (ESTRISK < 0.25);
PREDWITHCUTOFF30 = 1 - (ESTRISK < 0.30);
PREDWITHCUTOFF35 = 1 - (ESTRISK < 0.35);
PREDWITHCUTOFF40 = 1 - (ESTRISK < 0.40);
PREDWITHCUTOFF45 = 1 - (ESTRISK < 0.45);
PREDWITHCUTOFF50 = 1 - (ESTRISK < 0.50);
PREDWITHCUTOFF55 = 1 - (ESTRISK < 0.55);
PREDWITHCUTOFF60 = 1 - (ESTRISK < 0.60);
PREDWITHCUTOFF65 = 1 - (ESTRISK < 0.65);
PREDWITHCUTOFF70 = 1 - (ESTRISK < 0.70);
PREDWITHCUTOFF75 = 1 - (ESTRISK < 0.75);
PREDWITHCUTOFF80 = 1 - (ESTRISK < 0.80);
PREDWITHCUTOFF85 = 1 - (ESTRISK < 0.85);
PREDWITHCUTOFF90 = 1 - (ESTRISK < 0.90);
PREDWITHCUTOFF95 = 1 - (ESTRISK < 0.95); 
CORRECT05 = (PREDWITHCUTOFF05 = CHD);
CORRECT10 = (PREDWITHCUTOFF10 = CHD);
CORRECT15 = (PREDWITHCUTOFF15 = CHD);
CORRECT20 = (PREDWITHCUTOFF20 = CHD);
CORRECT25 = (PREDWITHCUTOFF25 = CHD);
CORRECT30 = (PREDWITHCUTOFF30 = CHD);
CORRECT35 = (PREDWITHCUTOFF35 = CHD);
CORRECT40 = (PREDWITHCUTOFF40 = CHD);
CORRECT45 = (PREDWITHCUTOFF45 = CHD);
CORRECT50 = (PREDWITHCUTOFF50 = CHD);
CORRECT55 = (PREDWITHCUTOFF55 = CHD);
CORRECT60 = (PREDWITHCUTOFF60 = CHD);
CORRECT65 = (PREDWITHCUTOFF65 = CHD);
CORRECT70 = (PREDWITHCUTOFF70 = CHD);
CORRECT75 = (PREDWITHCUTOFF75 = CHD);
CORRECT80 = (PREDWITHCUTOFF80 = CHD);
CORRECT85 = (PREDWITHCUTOFF85 = CHD);
CORRECT90 = (PREDWITHCUTOFF90 = CHD);
CORRECT95 = (PREDWITHCUTOFF95 = CHD);
RUN;

TITLE ' ';
PROC MEANS DATA=RCLib.SAtest2Pred MEAN;
VAR CORRECT05 CORRECT10 CORRECT15 CORRECT20 CORRECT25
    CORRECT30 CORRECT35 CORRECT40 CORRECT45 CORRECT50
    CORRECT55 CORRECT60 CORRECT65 CORRECT70 CORRECT75
    CORRECT80 CORRECT85 CORRECT90 CORRECT95;
RUN;
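
As an alternative to running PROC MEANS twice with
different WHERE statements, a CLASS statement reports the
means separately for CHD = 0 and CHD = 1 in one run.  The
sketch below uses only variables created in step 13, with
three cutoffs chosen for illustration.  In the CHD = 1 row
the means are sensitivities, and in the CHD = 0 row they
are specificities.

PROC MEANS DATA=RCLib.SAtest2Pred MEAN;
CLASS CHD;    /* one row of means per observed outcome */
VAR CORRECT25 CORRECT50 CORRECT75;
RUN;

/* Cross-tabulate observed against predicted outcome   */
/* for a single cutoff; the row percentages give the   */
/* same sensitivity (CHD = 1 row, predicted 1) and     */
/* specificity (CHD = 0 row, predicted 0).             */
PROC FREQ DATA=RCLib.SAtest2Pred;
TABLES CHD*PREDWITHCUTOFF50 / NOCOL NOPERCENT;
RUN;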