Final Project

Requirements

You will analyze a dataset using both a regression analysis and a classification analysis. An unsupervised learning analysis is not required for the project but could be useful.

Collaboration:

  • You may work in teams of up to 4 members. Individual work is fine.



Final deliverables:

Only one team member has to submit these materials to Moodle. The due date is Friday, May 5th at 11:59pm Central Time.

  • Submit a final knitted HTML file (it must knit without errors) and the corresponding Rmd file containing the code for your analysis.
    • For cleanliness, please add this R chunk to the top of your R Markdown file:
knitr::opts_chunk$set(echo = TRUE, eval = TRUE, warning = FALSE, message = FALSE, tidy = TRUE)
  • Include a video presentation of your project, under 15 minutes long, that addresses the items in the Grading Rubric below.
    • Upload the video itself to Moodle. If it’s too large, share a link to a recording on the web or in a shared drive.
    • All team members should have an equal speaking role in the presentation.
    • Please use a visual aid (that is, something for the viewer to look at). I think the easiest way to do this is to record a Zoom meeting, then share your screen with some slides.



Grading Rubric

  • Data context (10 points)
    • Clearly describe what the cases in the final clean dataset represent.
    • Broadly describe the variables used in your analyses.
    • Who collected the data? When, why, and how? Answer as much of this as the available information allows.
  • Research questions (10 points)
    • Research question(s) for the regression task make clear the outcome variable and its units.
    • Research question(s) for the classification task make clear the outcome variable and its possible categories.
  • Regression - Methods (10 points)
    • Describe the regression models you tried.
    • Describe what you did to evaluate models.
      • Indicate how you estimated quantitative evaluation metrics.
      • Indicate what plots you used to evaluate models.
  • Regression - Results (10 points)
    • Display evaluation metrics for your different models.
    • Summarize your final model and justify your model choice using a combination of the following:
      • Analytically: evaluation metrics, plots, variable importance
      • Interpretability/simplicity
    • Show and interpret some representative examples of residual plots for your final model. Do the residuals show any systematic trends or biases, or are the results acceptable?
  • Regression - Summary (10 points)
    • Interpret your final model for important predictors, and provide general interpretations about what you learned.
    • Interpret evaluation metrics in context, with units. Is this an acceptable amount of error?
    • All summarizations should be made in the original context of the data.
  • Classification - Methods (10 points)
    • Describe the classification models you tried.
    • Describe what you did to evaluate models.
      • Indicate how you estimated classification evaluation metrics.
  • Classification - Results (10 points)
    • Display evaluation metrics for your different models.
    • Summarize your final model and justify your model choice using a combination of the following:
      • Analytically: accuracy metrics, variable importance
      • Interpretability/simplicity
      • Plots: ROC, grouped boxplots with probability threshold, etc.
    • Does the model show acceptable results in terms of accuracy, sensitivity, or specificity?
  • Classification - Summary (10 points)
    • Interpret your final model for important predictors, and provide general interpretations about what you learned.
    • Interpret evaluation metrics in context. Is this an acceptable accuracy/sensitivity/specificity, relative to the no-information rate (NIR)?
    • All summarizations should be made in the original context of the data.
  • Code (20 points)
    • Knitted, error-free HTML and corresponding Rmd file submitted
    • Code corresponding to all analyses above is present and correct
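
As a rough illustration of the kind of evaluation the rubric asks about, here is a minimal sketch in base R. It is not a required approach. The built-in mtcars data, the 5 folds, the chosen predictors, and the 0.5 probability threshold are all assumptions made only for this example, and NIR is taken to mean the no-information rate (the accuracy you would get by always guessing the most common class).

```r
# Illustration only: uses R's built-in mtcars data, not your project dataset.
set.seed(1)                        # make the random fold assignment reproducible
k <- 5                             # number of folds (an arbitrary choice here)
folds <- sample(rep(1:k, length.out = nrow(mtcars)))

# Regression: cross-validated RMSE for a linear model of mpg
sq_err <- numeric(0)
for (i in 1:k) {
  train <- mtcars[folds != i, ]
  test  <- mtcars[folds == i, ]
  fit   <- lm(mpg ~ wt + hp, data = train)
  sq_err <- c(sq_err, (test$mpg - predict(fit, newdata = test))^2)
}
cv_rmse <- sqrt(mean(sq_err))      # in the outcome's units (miles per gallon)

# Classification: cross-validated class predictions for transmission type (am)
pred_class <- integer(nrow(mtcars))
for (i in 1:k) {
  train     <- mtcars[folds != i, ]
  test_rows <- which(folds == i)
  fit  <- glm(am ~ wt, data = train, family = binomial)
  prob <- predict(fit, newdata = mtcars[test_rows, ], type = "response")
  pred_class[test_rows] <- as.integer(prob > 0.5)   # 0.5 probability threshold
}

actual      <- mtcars$am
accuracy    <- mean(pred_class == actual)
sensitivity <- sum(pred_class == 1 & actual == 1) / sum(actual == 1)
specificity <- sum(pred_class == 0 & actual == 0) / sum(actual == 0)
nir         <- max(table(actual)) / length(actual)  # always guess the majority class

round(c(cv_rmse = cv_rmse, accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, nir = nir), 3)
```

When you interpret numbers like these, compare the accuracy against the NIR rather than against 0, and report the RMSE in the units of your outcome variable.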