Learning Objectives
Learning objectives for our course topics are listed below. Use these to guide your synthesis of video and reading material.
Introduction to Statistical Machine Learning
- Formulate research questions that align with regression, classification, or unsupervised learning tasks
Evaluating Regression Models
- Create and interpret residuals vs. fitted values and residuals vs. predictor plots to identify potential modeling improvements and to address ethical concerns
- Interpret MSE, RMSE, MAE, and R-squared in a contextually meaningful way (a by-hand computation is sketched after this list)
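A minimal NumPy sketch (not from the course materials) of computing all four metrics by hand; the observed and predicted values are made up:

```python
import numpy as np

y     = np.array([3.0, 5.0, 7.0, 9.0])   # observed outcomes (made up)
y_hat = np.array([2.5, 5.5, 6.0, 9.5])   # model predictions (made up)

resid = y - y_hat
mse  = np.mean(resid**2)                  # mean squared error
rmse = np.sqrt(mse)                       # same units as y
mae  = np.mean(np.abs(resid))             # mean absolute error
r2   = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)  # R-squared
print(mse, rmse, mae, r2)
```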
Overfitting and cross-validation
- Explain why training/in-sample model evaluation metrics can provide a misleading view of true test/out-of-sample performance
- Accurately describe all steps of cross-validation to estimate the test/out-of-sample version of a model evaluation metric (see the sketch after this list)
- Explain what role CV has in a predictive modeling analysis and its connection to overfitting
- Explain the pros/cons of higher vs. lower k in k-fold CV in terms of sample size and computing time
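A minimal k-fold CV sketch, assuming a hypothetical model object with generic `fit`/`predict` methods; it averages held-out MSE across the folds to estimate out-of-sample error:

```python
import numpy as np

def kfold_cv_mse(model, X, y, k=10, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))          # shuffle observations once
    folds = np.array_split(idx, k)         # k roughly equal folds
    errors = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)    # everything not in this fold
        model.fit(X[train], y[train])      # fit on the other k-1 folds
        pred = model.predict(X[fold])      # predict the held-out fold
        errors.append(np.mean((y[fold] - pred) ** 2))
    return np.mean(errors)                 # CV estimate of test MSE
```

Larger k means bigger training sets (less pessimistic bias) but more model fits, hence more computing time.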
Subset selection
- Clearly describe the forward and backward stepwise selection algorithms and explain why they are examples of greedy algorithms (forward stepwise is sketched after this list)
- Compare best subset and stepwise algorithms in terms of optimality of output and computational time
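A minimal forward stepwise sketch, assuming a hypothetical scoring helper `cv_mse(features)` that evaluates a candidate feature set; at each step it adds the single best predictor and never revisits earlier choices, which is exactly what makes it greedy:

```python
def forward_stepwise(all_features, cv_mse):
    selected, remaining = [], list(all_features)
    best_score = float("inf")
    while remaining:
        scored = [(cv_mse(selected + [f]), f) for f in remaining]
        score, best_f = min(scored)        # best single addition this step
        if score >= best_score:            # no improvement: stop early
            break
        best_score = score
        selected.append(best_f)            # commit to this choice forever
        remaining.remove(best_f)
    return selected
```

Best subset search would instead score all 2^p feature sets, which guarantees the optimal set but is far more expensive.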
LASSO (shrinkage/regularization)
- Explain how ordinary and penalized least squares are similar and different with regard to (1) the form of the objective function (both forms are written out after this list) and (2) the goal of variable selection
- Explain how the lambda tuning parameter affects model performance and how this is related to overfitting
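For reference, the two objective functions in their standard textbook forms; the only difference is the added penalty term:

```latex
% Ordinary least squares:
\min_{\beta}\ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\Big)^2

% LASSO (penalized least squares):
\min_{\beta}\ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\Big)^2
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Setting lambda to 0 recovers OLS; increasing lambda shrinks coefficients toward zero, with some hitting exactly zero (variable selection); too large a lambda underfits.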
KNN Regression and the Bias-Variance Tradeoff
- Clearly describe / implement by hand the KNN algorithm for making a regression prediction (see the sketch after this list)
- Explain how the number of neighbors relates to the bias-variance tradeoff
- Explain the difference between parametric and nonparametric methods
- Explain how the curse of dimensionality relates to the performance of KNN (not in the video; will be discussed in class)
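A minimal by-hand KNN regression sketch in NumPy: the prediction at a new point `x0` is the average response of its K nearest training neighbors under Euclidean distance:

```python
import numpy as np

def knn_predict(X_train, y_train, x0, k=3):
    dists = np.sqrt(((X_train - x0) ** 2).sum(axis=1))  # distance to x0
    nearest = np.argsort(dists)[:k]                      # indices of K nearest
    return y_train[nearest].mean()                       # average their responses
```

Smaller K gives lower bias but higher variance; larger K smooths more, trading variance for bias.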
Modeling Nonlinearity: Polynomial Regression and Splines
- Explain the advantages of splines over global transformations and other types of piecewise polynomials
- Explain how splines are constructed by drawing connections to variable transformations and least squares (a basis-construction sketch follows this list)
- Explain how the number of knots relates to the bias-variance tradeoff
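A minimal sketch of the truncated power basis for a cubic spline (knot locations and data are made up): the spline fit is just ordinary least squares on these transformed columns, which is the promised connection to variable transformations:

```python
import numpy as np

def cubic_spline_basis(x, knots):
    cols = [x, x**2, x**3]                          # global cubic terms
    for kappa in knots:                             # one extra column per knot
        cols.append(np.clip(x - kappa, 0, None) ** 3)  # (x - knot)_+^3 term
    return np.column_stack(cols)

x = np.linspace(0, 10, 50)
y = np.sin(x)                                       # made-up response
B = cubic_spline_basis(x, knots=[3.0, 7.0])         # hypothetical knot choices
design = np.column_stack([np.ones_like(x), B])      # add an intercept column
beta, *_ = np.linalg.lstsq(design, y, rcond=None)   # fit by least squares
```

More knots means a more flexible fit (lower bias, higher variance); fewer knots the reverse.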
Local Regression and Generalized Additive Models
- Clearly describe the local regression algorithm for making a prediction (sketched after this list)
- Explain how the bandwidth (span) relates to the bias-variance tradeoff
- Describe some different formulations for a GAM (how the arbitrary functions are represented)
- Explain how to make a prediction from a GAM
- Interpret the output from a GAM
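A minimal local regression sketch for one predictor, using the common tricube weighting (the span default and data setup are assumptions, not course code): to predict at `x0`, fit a weighted least squares line to the nearest fraction of points and read off the fit at `x0`:

```python
import numpy as np

def loess_predict(x, y, x0, span=0.5):
    n_local = max(3, int(np.ceil(span * len(x))))   # number of points in window
    d = np.abs(x - x0)
    window = np.argsort(d)[:n_local]                # nearest span-fraction of data
    w = (1 - (d[window] / d[window].max()) ** 3) ** 3  # tricube weights
    X = np.column_stack([np.ones(n_local), x[window]])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[window])
    return beta[0] + beta[1] * x0                   # local line evaluated at x0
```

A smaller span uses fewer points (lower bias, higher variance); a larger span smooths more. A GAM prediction then just adds up fitted smooth functions like this, one per predictor, plus an intercept.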
Logistic regression
- Use a logistic regression model to make hard (class) and soft (probability) predictions (see the sketch after this list)
- Interpret non-intercept coefficients from logistic regression models in the data context
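A minimal sketch with made-up coefficients showing the soft-vs-hard distinction:

```python
import numpy as np

beta  = np.array([-1.0, 0.8, 0.5])   # intercept + two slopes (made up)
x_new = np.array([1.0, 2.0, -0.5])   # 1 for the intercept, then predictor values

log_odds = beta @ x_new              # linear predictor (log odds scale)
p_hat = 1 / (1 + np.exp(-log_odds))  # soft prediction: estimated P(Y = 1 | x)
y_hat = int(p_hat >= 0.5)            # hard prediction at a 0.5 threshold
# exp(beta_j) is the multiplicative change in odds for a one-unit
# increase in predictor j, holding the others fixed.
```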
Evaluating classification models
- Calculate (by hand from confusion matrices) and contextually interpret overall accuracy, sensitivity, and specificity (a by-hand sketch follows this list)
- Construct and interpret plots of predicted probabilities across classes
- Explain how a ROC curve is constructed and the rationale behind AUC as an evaluation metric
- Appropriately use and interpret the no-information rate to evaluate accuracy metrics
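A minimal by-hand sketch from a hypothetical 2x2 confusion matrix (rows = truth, columns = prediction; counts are made up):

```python
tp, fn = 40, 10     # true positives, false negatives (hypothetical counts)
fp, tn = 5, 45      # false positives, true negatives (hypothetical counts)
n = tp + fn + fp + tn

accuracy    = (tp + tn) / n          # overall accuracy
sensitivity = tp / (tp + fn)         # true positive rate
specificity = tn / (tn + fp)         # true negative rate
nir = max(tp + fn, fp + tn) / n      # no-information rate: accuracy of always
                                     # predicting the majority class
print(accuracy, sensitivity, specificity, nir)
```

A useful classifier should beat the no-information rate, not just post a high-sounding accuracy.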
Decision trees
- Clearly describe the recursive binary splitting algorithm for tree building for both regression and classification
- Compute the weighted average Gini index to measure the quality of a classification tree split (see the sketch after this list)
- Compute the sum of squared residuals to measure the quality of a regression tree split
- Explain how recursive binary splitting is a greedy algorithm
- Explain how different tree parameters relate to the bias-variance tradeoff
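A minimal sketch of scoring one candidate classification split with the weighted average Gini index (the class counts are made up):

```python
def gini(counts):
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)   # impurity of one node

left  = [30, 10]    # class counts in the left child (hypothetical)
right = [5, 55]     # class counts in the right child (hypothetical)
n = sum(left) + sum(right)
weighted_gini = sum(left) / n * gini(left) + sum(right) / n * gini(right)
# Recursive binary splitting greedily picks the split that minimizes this
# value (or the sum of squared residuals, for a regression tree).
```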
Bagging and random forests
- Explain the rationale for bagging
- Explain the rationale for selecting a random subset of predictors at each split (random forests)
- Explain how the size of the random subset of predictors at each split relates to the bias-variance tradeoff
- Explain the rationale for and implement out-of-bag error estimation for both regression and classification (a regression version is sketched after this list)
- Explain the rationale behind the random forest variable importance measure and why it is biased towards quantitative predictors (in class)
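A minimal out-of-bag sketch for bagged regression, assuming a hypothetical `make_tree()` factory that returns a model with `fit`/`predict`; each observation is predicted only by the trees whose bootstrap sample excluded it:

```python
import numpy as np

def bagged_oob_mse(make_tree, X, y, B=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    preds, counts = np.zeros(n), np.zeros(n)
    for _ in range(B):
        boot = rng.integers(0, n, size=n)        # bootstrap sample, with replacement
        oob = np.setdiff1d(np.arange(n), boot)   # observations this tree never saw
        tree = make_tree()
        tree.fit(X[boot], y[boot])
        preds[oob] += tree.predict(X[oob])       # accumulate OOB predictions
        counts[oob] += 1
    seen = counts > 0                            # skip rare never-OOB cases
    return np.mean((y[seen] - preds[seen] / counts[seen]) ** 2)
```

For classification, the same bookkeeping applies but with majority votes and a misclassification rate instead of MSE.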
K-means clustering
- Clearly describe / implement by hand the k-means algorithm (see the sketch after this list)
- Describe the rationale for how clustering algorithms work in terms of within-cluster variation
- Describe the tradeoff of more vs. fewer clusters in terms of interpretability
- Implement strategies for interpreting / contextualizing the clusters
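A minimal by-hand k-means sketch: the assignment and update steps each reduce total within-cluster variation, so the algorithm converges (empty clusters simply keep their old centroid here, for simplicity):

```python
import numpy as np

def kmeans(X, k=3, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]   # random initial centroids
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)                       # assignment step
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])  # update step
        if np.allclose(new, centers):                   # converged
            break
        centers = new
    return labels, centers
```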
Hierarchical clustering
- Clearly describe / implement by hand the hierarchical clustering algorithm (a complete-linkage sketch follows this list)
- Compare and contrast k-means and hierarchical clustering in their outputs and algorithms
- Interpret cuts of the dendrogram for single and complete linkage
- Describe the rationale for how clustering algorithms work in terms of within-cluster variation
- Describe the tradeoff of more vs. fewer clusters in terms of interpretability
- Implement strategies for interpreting / contextualizing the clusters
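A minimal agglomerative sketch with complete linkage (single linkage would swap `max` for `min` in the cluster distance): start with every point in its own cluster and repeatedly merge the two clusters whose farthest pair of points is closest:

```python
import numpy as np

def hclust_complete(X, n_clusters=2):
    clusters = [[i] for i in range(len(X))]            # singletons to start
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))

    def linkage(a, b):                                 # complete linkage:
        return max(D[i, j] for i in a for j in b)      # max pairwise distance

    while len(clusters) > n_clusters:
        pairs = [(linkage(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        _, a, b = min(pairs)                           # closest pair of clusters
        clusters[a] = clusters[a] + clusters[b]        # merge b into a
        del clusters[b]
    return clusters
```

Cutting a dendrogram at a given height corresponds to stopping these merges once every remaining cluster pair exceeds that linkage distance.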