Imagine your company has pulled you into a non-traditional actuarial assignment: helping the marketing team better understand the effectiveness of its marketing campaigns and predicting how many people will sign up for your health insurance plans based on the company's marketing efforts. The dataset you've been given (marketing.csv) contains the marketing dollars spent on different media and the number of member sign-ups those campaigns generated.
There are 5 fields in this dataset:
- TV – is the amount of marketing dollars (in thousands) spent on TV advertising.
- Internet – is the amount of marketing dollars (in thousands) spent on online advertising.
- Mailing – is the amount of marketing dollars (in thousands) spent on advertising via mail.
- Members – is the number of people (in thousands) who signed up for health insurance given the various marketing campaigns.
- Region – is the region of the United States that each marketing campaign was focused on.
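A minimal standard-library Python sketch of reading marketing.csv, assuming the header row uses exactly the field names listed above. Because Region is categorical, it is one-hot encoded, with the alphabetically first region dropped as a baseline so the indicator columns are not collinear with a model intercept. The helper name `load_marketing` and the `Region_<name>` column naming are hypothetical.

```python
import csv
from typing import Dict, List, Tuple

# Assumes the CSV header is exactly: TV,Internet,Mailing,Members,Region
NUMERIC_FIELDS = ["TV", "Internet", "Mailing"]  # predictors, thousands of dollars
TARGET = "Members"                               # sign-ups, in thousands


def load_marketing(path: str) -> Tuple[List[Dict[str, float]], List[str]]:
    """Read the CSV and one-hot encode Region (hypothetical helper)."""
    with open(path, newline="") as f:
        raw = list(csv.DictReader(f))
    # Sort region labels so the indicator-column order is reproducible.
    regions = sorted({r["Region"] for r in raw})
    # Drop the first region as the baseline category.
    dummy_names = [f"Region_{name}" for name in regions[1:]]
    rows = []
    for r in raw:
        row = {k: float(r[k]) for k in NUMERIC_FIELDS + [TARGET]}
        for name in regions[1:]:
            row[f"Region_{name}"] = 1.0 if r["Region"] == name else 0.0
        rows.append(row)
    return rows, dummy_names
```

The returned indicator names can be appended to the numeric predictors whenever a candidate model should account for region.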
For this writing assignment,
(1) Randomly split this dataset into two parts with an 80:20 ratio; that is, 80% of the observations go into the training set and 20% go into the testing set.
(2) Using the training set, fit candidate models for predicting Members, and choose the best model based on its RMSE on the testing set, i.e., the model with the lowest test RMSE.
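Steps (1) and (2) can be sketched in Python with only the standard library. This is one possible approach, not a prescribed one: it assumes the candidate models are ordinary least-squares linear regressions on subsets of the predictors (the assignment does not fix a model family), uses an arbitrary fixed seed (42) so the split can be reproduced, and scores each candidate by its test-set RMSE = sqrt((1/n) * sum((y_i - yhat_i)^2)). All function names here are hypothetical.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]


def train_test_split(rows: Sequence, train_frac: float = 0.8, seed: int = 42):
    """Step (1): shuffle with a fixed seed, then cut at train_frac (80:20 by default)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def fit_ols(rows: Sequence[Row], features: Sequence[str], target: str) -> List[float]:
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    X = [[1.0] + [r[f] for f in features] for r in rows]
    y = [r[target] for r in rows]
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; drop a collinear feature")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coefs: Sequence[float], row: Row, features: Sequence[str]) -> float:
    return coefs[0] + sum(c * row[f] for c, f in zip(coefs[1:], features))


def rmse(coefs: Sequence[float], rows: Sequence[Row], features: Sequence[str], target: str) -> float:
    sq_errors = [(r[target] - predict(coefs, r, features)) ** 2 for r in rows]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def all_subsets(predictors: Sequence[str]) -> List[List[str]]:
    """Every non-empty subset of the predictors, smallest subsets first."""
    return [list(c) for k in range(1, len(predictors) + 1)
            for c in combinations(predictors, k)]


def select_best(train: Sequence[Row], test: Sequence[Row],
                candidates: Sequence[Sequence[str]], target: str = "Members"
                ) -> Tuple[float, List[str], List[float]]:
    """Step (2): fit each candidate on train; keep the lowest test RMSE."""
    results = []
    for feats in candidates:
        coefs = fit_ols(train, feats, target)
        results.append((rmse(coefs, test, feats, target), list(feats), coefs))
    return min(results, key=lambda t: t[0])
```

With only three spending predictors plus region indicators, an exhaustive subset search stays cheap; richer candidates (interaction or polynomial terms) could be added as extra derived columns and compared the same way.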
Write up the full procedure you used to arrive at your best model. Your write-up should include four parts: Introduction (briefly introduce the data and the goal of the project), Methods (the steps you used to find the best model, including any necessary graphs), Results (report your best model and its test RMSE, including any necessary graphs), and Discussion (briefly summarize the advantages of your model and its potential limitations).