For Analytic Solver, partition the data set into 50% training, 30% validation, and 20% test, and use 12345 as the default random seed. If the values of a predictor variable are in character format, treat that predictor as a categorical variable. Otherwise, treat it as a numerical variable.
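For readers working outside Analytic Solver, the 50/30/20 partition can be sketched in plain Python. This is only an illustration: the function name `partition_indices` is hypothetical, and because Python's random number generator differs from the one in Analytic Solver, seed 12345 will not reproduce the same rows in each partition or the same tree.

```python
import random

def partition_indices(n_rows, fractions=(0.5, 0.3, 0.2), seed=12345):
    """Shuffle row indices and split them into training, validation, and test."""
    rng = random.Random(seed)          # seeded generator for a repeatable split
    indices = list(range(n_rows))
    rng.shuffle(indices)

    n_train = round(fractions[0] * n_rows)
    n_valid = round(fractions[1] * n_rows)

    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # remaining rows go to the test set
    return train, valid, test

# Example with a hypothetical row count of 1000 (not the size of the actual data file).
train, valid, test = partition_indices(1000)
print(len(train), len(valid), len(test))
```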
The Excel worksheet of the accompanying data file is used to classify individuals as likely or unlikely to attend church using five predictor variables: years of education (Educ), annual income (Income in $), age, sex (F = female, M = male), and marital status (Married, Y = yes, N = no). The outcome variable is Church (1 = attends, 0 = otherwise). Create a classification tree model for predicting whether an individual is likely to attend church.
a-1. How many leaf nodes are in the best-pruned tree and in the minimum error tree?
a-2. Which of the following is NOT a rule that can be derived from the best-pruned tree?
multiple choice
- If age is greater than or equal to 34.5 then the individual is likely to go to church.
- If age is greater than or equal to 34.5 then the individual is not likely to go to church.
- If age is less than 34.5 then the individual is not likely to go to church.
- If age is greater than or equal to 34.5, the income is greater than or equal to 10,600, and the age is greater than or equal to 62.5 then the individual is likely to go to church.
b. What are the accuracy rate, sensitivity, specificity, and precision of the best-pruned tree on the test data?
Note: Round your answers to 2 decimal places.
c. Generate the ROC curve. What is the area under the ROC curve (or AUC value)?
Note: Round your answer to 4 decimal places.
d. Score the cases in the Church_Score worksheet using the best-pruned tree. What percentage of the individuals in the score data set are likely to go to church based on a cutoff probability value of 0.5?
Note: Round your answer to 2 decimal places.
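The measures requested in parts b through d follow standard definitions, which can be sketched in Python as below. This is a minimal illustration, not Analytic Solver's implementation: the function names are hypothetical, the toy values of `actual` and `probs` are invented for demonstration and are not taken from the data file, and classifying a case as class 1 when its probability is greater than or equal to the cutoff is an assumption about how ties at the cutoff are handled.

```python
def classification_metrics(actual, predicted):
    """Accuracy, sensitivity, specificity, and precision, with class 1 as the target class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of actual 1s predicted as 1
        "specificity": tn / (tn + fp),   # share of actual 0s predicted as 0
        "precision": tp / (tp + fp),     # share of predicted 1s that are actual 1s
    }

def auc(actual, probs):
    """AUC as the probability that a random class-1 case outscores a random class-0 case."""
    pos = [p for a, p in zip(actual, probs) if a == 1]
    neg = [p for a, p in zip(actual, probs) if a == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

def percent_at_or_above_cutoff(probs, cutoff=0.5):
    """Percentage of scored cases classified as class 1 (assumes >= cutoff)."""
    return 100.0 * sum(1 for p in probs if p >= cutoff) / len(probs)

# Toy data, invented for illustration only.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
predicted = [1 if p >= 0.5 else 0 for p in probs]

metrics = classification_metrics(actual, predicted)
print({name: round(value, 2) for name, value in metrics.items()})
print(round(auc(actual, probs), 4))
print(round(percent_at_or_above_cutoff(probs), 2))
```

In the exercise itself, these values are read from the Analytic Solver output for the test partition and the Church_Score worksheet; the sketch only shows how each figure is defined.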