Final Project

Requirements

You will analyze a dataset using both a regression analysis and a classification analysis. An unsupervised learning analysis is not required for the project, but it could be useful.

Collaboration:

You may work in teams of up to 4 members. Individual work is fine.

Final deliverables:

Only one team member has to submit these materials to Moodle. The due date is Friday, May 5th at 11:59pm CST.

Submit a final knitted HTML file (it must knit without errors) and the corresponding Rmd file containing the code for your analysis.
- For cleanliness, please add this R chunk to the top of your Markdown file:

knitr::opts_chunk$set(echo = TRUE, eval = TRUE, warning = FALSE, message = FALSE, tidy = TRUE)

Include a video presentation of your project, under 15 minutes long, that addresses the items in the Grading Rubric below.
- Upload the video itself to Moodle. If it’s too large, share a link to a recording on the web or in a shared drive.
- All team members should have an equal speaking role in the presentation.
- Please use a visual aid (that is, something for the viewer to look at). I think the easiest way to do this is to record a Zoom meeting while sharing your screen with some slides.

Grading Rubric

Data context (10 points)
- Clearly describe what the cases in the final clean dataset represent.
- Broadly describe the variables used in your analyses.
- Who collected the data? When, why, and how? Answer as much of this as the available information allows.
Research questions (10 points)
- Research question(s) for the regression task make clear the outcome variable and its units.
- Research question(s) for the classification task make clear the outcome variable and its possible categories.
Regression - Methods (10 points)
- Describe the regression models you tried.
- Describe what you did to evaluate models.
  - Indicate how you estimated quantitative evaluation metrics.
  - Indicate what plots you used to evaluate models.
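
One common way to estimate a quantitative evaluation metric is k-fold cross-validation. The following is only a minimal sketch of that idea in base R, not a required approach: the built-in `mtcars` data, the two candidate formulas, the choice of 5 folds, and the helper name `cv_rmse` are all assumptions made for illustration.

```r
# Hypothetical example: 5-fold cross-validated RMSE for two candidate
# linear models on the built-in mtcars data (outcome: mpg, miles per gallon).
set.seed(123)  # make the random fold assignment reproducible

k <- 5
n <- nrow(mtcars)
# Randomly assign each row to one of k folds of near-equal size
folds <- sample(rep(1:k, length.out = n))

# Illustrative helper: average test-fold RMSE for a given model formula
cv_rmse <- function(formula, data, folds) {
  outcome <- all.vars(formula)[1]  # name of the response variable
  fold_rmse <- sapply(sort(unique(folds)), function(f) {
    train <- data[folds != f, ]    # fit on all other folds
    test  <- data[folds == f, ]    # evaluate on the held-out fold
    fit   <- lm(formula, data = train)
    pred  <- predict(fit, newdata = test)
    sqrt(mean((test[[outcome]] - pred)^2))
  })
  mean(fold_rmse)
}

# Compare the candidate models on the same folds
results <- data.frame(
  model   = c("mpg ~ wt", "mpg ~ wt + hp"),
  cv_rmse = c(
    cv_rmse(mpg ~ wt, mtcars, folds),
    cv_rmse(mpg ~ wt + hp, mtcars, folds)
  )
)
print(results)
```

Because the RMSE is in the units of the outcome (here, miles per gallon), a table like `results` can be interpreted directly in context.
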
Regression - Results (10 points)
- Display evaluation metrics for your different models.
- Summarize your final model and justify your model choice using a combination of the following:
  - Analytically: evaluation metrics, plots, variable importance
  - Interpretability/simplicity
- Show and interpret some representative examples of residual plots for your final model. Do the residuals show any systematic trends or biases, or are the results acceptable?
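
As a rough illustration of a residual plot, the sketch below plots residuals against fitted values in base R. The `mtcars` data and the model `mpg ~ wt + hp` are stand-ins chosen only for the example; your own final model and plotting tools may differ.

```r
# Hypothetical example: residuals vs. fitted values for a linear model
fit <- lm(mpg ~ wt + hp, data = mtcars)

res         <- resid(fit)   # observed minus predicted mpg
fitted_vals <- fitted(fit)  # predicted mpg

plot(fitted_vals, res,
     xlab = "Fitted values (mpg)",
     ylab = "Residuals (mpg)",
     main = "Residuals vs. fitted values")
abline(h = 0, lty = 2)                        # reference line at zero
lines(lowess(fitted_vals, res), col = "red")  # smoothed trend in the residuals
```

If the smoothed trend stays close to the dashed zero line and the spread looks roughly constant, that suggests no obvious systematic bias; a curve or funnel shape suggests the model is missing structure.
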
Regression - Summary (10 points)
- Interpret your final model for important predictors, and provide general interpretations about what you learned.
- Interpret evaluation metrics in context, with units. Is this an acceptable amount of error?
- All summaries should be stated in the original context of the data.
Classification - Methods (10 points)
- Describe the classification models you tried.
- Describe what you did to evaluate models.
  - Indicate how you estimated classification evaluation metrics.
Classification - Results (10 points)
- Display evaluation metrics for your different models.
- Summarize your final model and justify your model choice using a combination of the following:
  - Analytically: accuracy metrics, variable importance
  - Interpretability/simplicity
  - Plots: ROC, grouped boxplots with probability threshold, etc.
- Does the model show acceptable results in terms of accuracy, sensitivity, or specificity?
Classification - Summary (10 points)
- Interpret your final model for important predictors, and provide general interpretations about what you learned.
- Interpret evaluation metrics in context. Is this an acceptable accuracy/sensitivity/specificity, relative to the no-information rate (NIR)?
- All summaries should be stated in the original context of the data.
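
To make the comparison with the no-information rate concrete, here is a minimal base R sketch. Everything in it is an assumption made for illustration: the `mtcars` data, the outcome `am` (0 = automatic, 1 = manual), the single predictor `wt`, and the 0.5 probability threshold. The metrics are also computed on the training data for brevity, so they are optimistic; for the project you would estimate them with whatever out-of-sample method you describe in your Methods section.

```r
# Hypothetical example: accuracy, sensitivity, specificity, and the
# no-information rate (NIR) for a logistic regression classifier.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

prob   <- predict(fit, type = "response")  # estimated P(am = 1)
pred   <- ifelse(prob > 0.5, 1, 0)         # classify with a 0.5 threshold
actual <- mtcars$am

# Confusion matrix counts, treating am = 1 as the "positive" class
tp <- sum(pred == 1 & actual == 1)
tn <- sum(pred == 0 & actual == 0)
fp <- sum(pred == 1 & actual == 0)
fn <- sum(pred == 0 & actual == 1)

accuracy    <- (tp + tn) / length(actual)
sensitivity <- tp / (tp + fn)  # share of true positives correctly identified
specificity <- tn / (tn + fp)  # share of true negatives correctly identified

# NIR: accuracy from always predicting the most common class
nir <- max(table(actual)) / length(actual)

print(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, nir = nir))
```

A model is only adding value if its accuracy is meaningfully above `nir`, since the NIR can be reached without using any predictors at all.
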
Code (20 points)
- A knitted, error-free HTML file and the corresponding Rmd file are submitted.
- Code corresponding to all analyses above is present and correct.