DATA 311 – Lecture 23 – Classification and Regression Metrics Worksheet

Names:

Regression Metrics

Suppose you are evaluating regression results on a validation set. Your model produces a prediction \(y_i^\mathrm{pred}\) for each datapoint \(i\), while the corresponding ground-truth label is \(y_i^\mathrm{true}\).

  1. Computing the average error over a whole validation set would look like: \[ \frac{1}{n}\sum_i \left(y_i^\mathrm{true} - y_i^\mathrm{pred}\right) \] Why wouldn’t this be a good idea, and how would you fix it?

  2. What is the tradeoff in choosing MSE versus MAE to measure regression performance on a set of predictions?

  3. The coefficient of determination is defined as \(1 - \frac{SS_\mathrm{res}}{SS_\mathrm{tot}}\), where \(SS_\mathrm{res} = \sum_i \left(y_i^\mathrm{true} - y_i^\mathrm{pred}\right)^2\) is the residual sum of squares and \(SS_\mathrm{tot} = \sum_i \left(y_i^\mathrm{true} - \bar{y}\right)^2\) is the total sum of squares, with \(\bar{y}\) the mean of the true labels. (A short code sketch computing these regression metrics follows this list.)

    1. What is the coefficient of determination if the predictions are perfect?

    2. What is the coefficient of determination if you use a regressor that predicts the mean label?

    3. What happens to the coefficient of determination if your predictions are worse than the mean?
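
The sketch below is for reference only, not an answer key: one way the metrics above might be computed with NumPy. The arrays `y_true` and `y_pred` are made-up example values.

```python
# A minimal sketch of the regression metrics above; the label and
# prediction arrays are hypothetical example values.
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # hypothetical ground-truth labels
y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # hypothetical model predictions

errors = y_true - y_pred
mse = np.mean(errors ** 2)        # mean squared error
mae = np.mean(np.abs(errors))     # mean absolute error

ss_res = np.sum(errors ** 2)                      # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
r2 = 1 - ss_res / ss_tot                          # coefficient of determination

print(f"MSE = {mse:.3f}, MAE = {mae:.3f}, R^2 = {r2:.3f}")
```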

Classification Metrics

As a reminder, binary classification predictions fall into four categories: true positives, false positives, true negatives, and false negatives. (A short counting sketch follows the question below.)

  1. Let TP be the number of true positives, and so on for the other three. Define accuracy in terms of these quantities.
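
As a rough illustration (assuming made-up 0/1 label arrays), the four counts might be tallied like this:

```python
# A minimal sketch counting the four outcome categories for a binary
# classifier; y_true and y_pred are hypothetical 0/1 label arrays.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # hypothetical true labels
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])   # hypothetical predicted labels

tp = np.sum((y_pred == 1) & (y_true == 1))   # predicted positive, truly positive
fp = np.sum((y_pred == 1) & (y_true == 0))   # predicted positive, truly negative
tn = np.sum((y_pred == 0) & (y_true == 0))   # predicted negative, truly negative
fn = np.sum((y_pred == 0) & (y_true == 1))   # predicted negative, truly positive

print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")
```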

For each of the following questions, your task is to game the metric: describe either a classification task or a classification strategy for which the given metric would not be a good measure of the model’s true performance. For the sake of example, imagine the classification task is a test that predicts cancer.

  1. Game it: when is accuracy not a good measure?

  2. Precision is how often you’re right when you say it’s positive: \(\frac{TP}{TP+FP}\). Game it.

  3. Recall is how many of the positive examples you get right: \(\frac{TP}{TP+FN}\). Game it. (A short sketch computing precision and recall follows this list.)

  4. The precision for class \(c\) is \(\frac{\textrm{\# correctly labeled } c}{\textrm{\# labeled class } c}\), while the recall for class \(c\) is \(\frac{\textrm{\# correctly labeled } c}{\textrm{\# with true label } c}\). Given a confusion matrix, how would you calculate:

    1. The precision for a certain class?

    2. The recall for a certain class?
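
The sketch below simply restates the binary precision and recall formulas above in code; the counts are made-up example values, not answers to the questions.

```python
# A minimal sketch of binary precision and recall from hypothetical counts.
tp, fp, fn = 8, 2, 4   # made-up true positive, false positive, false negative counts

precision = tp / (tp + fp)   # of the examples predicted positive, the fraction truly positive
recall = tp / (tp + fn)      # of the truly positive examples, the fraction predicted positive

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```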