Use Enterprise Miner for nearest neighbor analysis.

1. Issue the following commands to SAS (with appropriate substitutions for "U:\").

    LIBNAME GFLib 'U:\dm\sasdata';
    LIBNAME RCLib 'U:\dm\RC';

2. If necessary, apply macros to convert Excel data to SAS (EXCELSAS, see MacroInstr.txt), to divide the data into training/test or training/validation/test subsets (RANSPLIT, see MacroInstr.txt), or to explore the data (UNIVAR and FREQ/FREQUENCY, see MacroInstr2.txt).

3. Determine the mean and standard deviation of each explanatory variable in the training data set. Use this information to create standardized versions of the explanatory variables, which will be used in the nearest neighbors analysis. (The standardized versions will have mean 0 and standard deviation 1 in the training data set and will have approximately but not exactly mean 0 and standard deviation 1 in other data sets with the same explanatory variables.) Note, however, that there is no need (or good reason) to use a standardized version of the response variable in the nearest neighbors analysis.

    proc means data=RCLIB.diabtr mean stddev;
        var NPREG GLU BP BMI AGE;
    RUN;

    data RCLIB.diabtr;
        set RCLIB.diabtr;
        STNPREG = (NPREG - 3.53) / 3.1923362;
        STGLU   = (GLU - 123.03) / 31.1628032;
        STBP    = (BP - 71.36) / 11.4508347;
        STBMI   = (BMI - 33.22) / 6.2331146;
        STAGE   = (AGE - 32.63) / 10.5434694;
    RUN;

    data RCLIB.diabva;
        set RCLIB.diabva;
        STNPREG = (NPREG - 3.53) / 3.1923362;
        STGLU   = (GLU - 123.03) / 31.1628032;
        STBP    = (BP - 71.36) / 11.4508347;
        STBMI   = (BMI - 33.22) / 6.2331146;
        STAGE   = (AGE - 32.63) / 10.5434694;
    RUN;

    data RCLIB.diabte;
        set RCLIB.diabte;
        STNPREG = (NPREG - 3.53) / 3.1923362;
        STGLU   = (GLU - 123.03) / 31.1628032;
        STBP    = (BP - 71.36) / 11.4508347;
        STBMI   = (BMI - 33.22) / 6.2331146;
        STAGE   = (AGE - 32.63) / 10.5434694;
    RUN;

4. In SAS, go to the "Solutions" menu. Go to "Analysis" and then select "Enterprise Miner". From the "File" menu, choose "New" and then "Project".
A box will appear with "Name" and "Location" fields as well as "Create", "Cancel", and "Browse" buttons. Press the "Browse" button and select a subdirectory such as 'U:\dm\RC'. In the "Name" field, type a name like 'Neighbor'. Then press "Create".

5. In the left panel of the SAS Enterprise Miner window, you will see a diagram with 'Neighbor' and, immediately below it, 'Untitled'. You can right-click on 'Untitled' to assign a name such as 'Diabetes'.

6. Near the top of the SAS Enterprise Miner window, click the "Input Data Source" icon (far left) and, while holding the mouse button down, drag it into the right panel. Assuming that you have training, validation, and test data, repeat this process twice so that you have three "Input Data Source" icons in the right panel.

7. Double-click on one of the "Input Data Source" icons in the right panel. Press the "Select" button and then choose an appropriate library such as 'RCLib'. Choose a training data set like 'DIABTR'. In the "Role" field, change "RAW" to "TRAIN". Then click on the "Variables" tab near the top of the "Input Data Source" box. You can alter entries in the "Model Role" column by right-clicking and then selecting "Set Model Role". You want the response variable to be identified as "target", the (standardized) explanatory variables to be identified as "input", and the ID variable (if any) to be identified as "ID". Any variables that you know will not be used may be identified as "rejected". Also, make any necessary adjustments in the "Measurement" column. When finished, click the white-on-red X in the upper right corner of the "Input Data Source" box and confirm the changes. Assuming that you have validation and test data, repeat this process twice (except that "RAW" will be changed to "VALIDATE" and "TEST").

8. Click the "Tools" tab at the lower left part of the SAS Enterprise Miner window. Drag the "Memory-Based Reasoning" icon into the right panel.
By holding the left mouse button down, 'draw' arrows from the data set icons to the "Memory-Based Reasoning" icon. Also, drag the "Reporter" icon into the right panel, then 'draw' an arrow from the "Memory-Based Reasoning" icon to the "Reporter" icon.

9. Double-click on the "Memory-Based Reasoning" icon. A "Tools" menu will appear near the top left of the screen. Go to "Tools" and then "Settings". You may reset the "Number of Neighbors" to whatever you wish, and you may deselect the "Weight Dimensions" option. Then choose "OK". Click the "Output" tab in the "Memory-Based Reasoning" window. Select "Process or Score:" Training, Validation, and Test. Close the "Memory-Based Reasoning" window and assign a model name such as 'DiabNeigh'.

10. Right-click the "Memory-Based Reasoning" icon and choose "Run". You will be asked if you want to view the results. You can say "No", as you will acquire what you need in the next step.

11. Right-click the "Reporter" icon and choose "Run". You can "Open" the report now or view it later by noting to which subdirectory it has been saved. For a continuous target, you are interested in the average squared error for the training, validation, and test data sets. For a categorical target, you are interested in the misclassification rate for the training, validation, and test data sets. You can find this information by clicking the "Output" link. For either kind of target, you can click the "Datastep Score Code" link to get SAS code that will allow you to generate predictions for each and every individual in any data set that has the same (standardized) explanatory variables as the training data set. You will need to modify the SAS code as indicated below.

ORIGINAL (ignore portions before and after this one)

    proc pmbr score=_last_ out=work._tmpmbr
              data=EMDATA.VIEW_GF2 dmdbcat=EMPROJ.dm_DGM00000
              k=16 method=RDTREE;
        target DIAB;
        var STNPREG STGLU STBP STBMI STAGE;
    data &_oldscr;
        set work._tmpmbr;

MODIFIED (assuming you want predictions on DIAB for individuals in the test data set)

    proc pmbr score=RCLIB.diabte out=RCLIB.diabtepred
              data=EMDATA.VIEW_GF2 dmdbcat=EMPROJ.dm_DGM00000
              k=16 method=RDTREE;
        target DIAB;
        var STNPREG STGLU STBP STBMI STAGE;
    RUN;

    proc print data=RCLIB.diabtepred;
        var P_DIAB1;
    RUN;
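
Optional alternative for step 3: instead of retyping the training means and standard deviations into three DATA steps, the statistics can be saved by PROC MEANS and reused. The following is only a sketch under stated assumptions, not part of the course macros. The names WORK.TRSTATS, the M_ and S_ variables, and the macro STANDARDIZE are hypothetical and chosen for illustration; the data set and variable names are the ones used in step 3. Like step 3, it overwrites each data set in place.

```sas
/* Sketch only. WORK.TRSTATS, the M_xxx / S_xxx names, and the      */
/* macro name STANDARDIZE are hypothetical, chosen for illustration. */

/* Save the TRAINING means and standard deviations in a one-row data set. */
proc means data=RCLIB.diabtr noprint;
    var NPREG GLU BP BMI AGE;
    output out=work.trstats
           mean=M_NPREG M_GLU M_BP M_BMI M_AGE
           std=S_NPREG S_GLU S_BP S_BMI S_AGE;
run;

/* Add standardized variables to a data set using the training statistics. */
/* Assumes the data set has no other variables whose names start with      */
/* M_ or S_, because those prefixes are dropped at the end.                 */
%macro standardize(ds);
    data &ds;
        /* Read the single row of statistics once; variables read by SET */
        /* are retained, so they are available for every observation.    */
        if _n_ = 1 then set work.trstats(drop=_TYPE_ _FREQ_);
        set &ds;
        STNPREG = (NPREG - M_NPREG) / S_NPREG;
        STGLU   = (GLU   - M_GLU)   / S_GLU;
        STBP    = (BP    - M_BP)    / S_BP;
        STBMI   = (BMI   - M_BMI)   / S_BMI;
        STAGE   = (AGE   - M_AGE)   / S_AGE;
        drop M_: S_:;
    run;
%mend standardize;

/* The same training statistics are applied to all three data sets. */
%standardize(RCLIB.diabtr)
%standardize(RCLIB.diabva)
%standardize(RCLIB.diabte)
```

Because the statistics are read at full precision rather than typed in by hand, the standardized training variables should have mean 0 and standard deviation 1 up to rounding error, and the validation and test data sets are still scaled with the training values, as step 3 requires.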