Names:
For each of the following prediction scenarios, come up with the strongest baseline you can think of that does not require any machine learning.
Predict whether an email message is spam. The training data contains equal numbers of spam (positive) and non-spam (negative) examples.
Your task is to predict whether an MRI scan shows a tumor or not. The training data contains 90% non-tumor images (negative examples) and 10% tumor images (positive examples).
Given all weather measurements from today and prior, predict whether it will rain tomorrow.
For the NHANES body measurement dataset, predict a person’s leg length given their height.
Suppose you are evaluating regression results on a validation set. Your model produces predictions \(y_i^\mathrm{pred}\) for each datapoint \(i\), while the corresponding ground truth labels are \(y_i^\mathrm{true}\).
Computing the average error over a whole validation set of \(n\) datapoints would look like \(\frac{1}{n}\sum_i \left(y_i^\mathrm{true} - y_i^\mathrm{pred}\right)\). Why wouldn’t this be a good idea, and how would you fix it?
What is the tradeoff in choosing MSE vs MAE to measure regression performance on a dataset?
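The sketch below is one way to experiment with these error measures on toy data. The function names and the toy arrays are assumptions made for illustration; they are not part of the worksheet. Try comparing the signed average with MAE and MSE, then see how each reacts when a single prediction is far off.

```python
import numpy as np

def mean_signed_error(y_true, y_pred):
    """Average of the raw (signed) differences y_true - y_pred."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(diff))

def mae(y_true, y_pred):
    """Mean absolute error: average of |y_true - y_pred|."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

def mse(y_true, y_pred):
    """Mean squared error: average of (y_true - y_pred)^2."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(diff ** 2))

# Hypothetical toy data: every prediction is off by exactly 1,
# alternating between too high and too low.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([2.0, 1.0, 4.0, 3.0])

# Hypothetical second model: three perfect predictions and one large miss.
y_pred_outlier = np.array([1.0, 2.0, 3.0, 14.0])

for name, pred in [("off by one", y_pred), ("one outlier", y_pred_outlier)]:
    print(name,
          "signed:", mean_signed_error(y_true, pred),
          "MAE:", mae(y_true, pred),
          "MSE:", mse(y_true, pred))
```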
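The coefficient of determination is defined as \(1 - \frac{SS_\mathrm{res}}{SS_\mathrm{tot}}\), where \(SS_\mathrm{res} = \sum_i \left(y_i^\mathrm{true} - y_i^\mathrm{pred}\right)^2\) is the residual sum of squares and \(SS_\mathrm{tot} = \sum_i \left(y_i^\mathrm{true} - \bar{y}\right)^2\) is the total sum of squares, with \(\bar{y}\) the mean of the true labels.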
What is the coefficient of determination if the predictions are perfect?
What is the coefficient of determination if you use a regressor that predicts the mean label?
What happens to the coefficient of determination if your predictions are worse than always predicting the mean label?
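To check your answers numerically, here is a minimal sketch of the coefficient of determination. It assumes the standard definitions: \(SS_\mathrm{res}\) is the sum of squared differences between the true labels and the predictions, and \(SS_\mathrm{tot}\) is the sum of squared differences between the true labels and their mean. The function name `r_squared` and the toy data are made up for illustration.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Assumes the true labels are not all identical; otherwise SS_tot is 0
    and the ratio is undefined.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)

# Hypothetical toy data.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])
score = r_squared(y_true, y_pred)
print(score)
```

Try passing `y_true` itself as the prediction, then an array filled with `y_true.mean()`, then something deliberately bad, and compare the results with your answers to the three questions above.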
As a reminder, we can classify binary classification predictions into four categories:
TP - True positives (correctly labeled positive)
TN - True negatives (correctly labeled negative)
FP - False positives (incorrectly labeled positive; was actually negative)
FN - False negatives (incorrectly labeled negative; was actually positive)
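The four categories can be counted directly from a list of true labels and a list of predictions. The following is a small sketch, assuming labels are encoded as 1 (positive) and 0 (negative); the function name `confusion_counts` and the example labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels encoded as 1 (positive) / 0 (negative)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            counts["TP"] += 1   # correctly labeled positive
        elif pred == 0 and truth == 0:
            counts["TN"] += 1   # correctly labeled negative
        elif pred == 1 and truth == 0:
            counts["FP"] += 1   # labeled positive, was actually negative
        else:
            counts["FN"] += 1   # labeled negative, was actually positive
    return counts

# Hypothetical example labels.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)
```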
For each of the following questions, your task is to game the metric: describe either a classification task or a classification strategy for which the given metric would not be a good measure of the model’s true performance. For the sake of example, imagine the classification task is a test that predicts cancer.
Game it: when is accuracy not a good measure?
Precision is how often you’re right when you say it’s positive: \(\frac{TP}{(TP+FP)}\). Game it.
Recall is how many of the positive examples you are right about: \(\frac{TP}{(TP + FN)}\). Game it.
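One way to explore these questions is to compute all three metrics from the four counts and try out degenerate strategies. The sketch below assumes a hypothetical screening population of 1000 patients, 10 of whom have cancer; the numbers and function names are invented for illustration. A metric whose denominator is zero is returned as `None`, since it is undefined.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    """TP / (TP + FP); None when nothing was labeled positive."""
    return tp / (tp + fp) if (tp + fp) > 0 else None

def recall(tp, fn):
    """TP / (TP + FN); None when there are no positive examples."""
    return tp / (tp + fn) if (tp + fn) > 0 else None

# Hypothetical population: 1000 patients, 10 with cancer.
# Strategy A: always predict "no cancer".
always_negative = {"tp": 0, "tn": 990, "fp": 0, "fn": 10}
# Strategy B: always predict "cancer".
always_positive = {"tp": 10, "tn": 0, "fp": 990, "fn": 0}

for name, c in [("always negative", always_negative),
                ("always positive", always_positive)]:
    print(name,
          "accuracy:", accuracy(c["tp"], c["tn"], c["fp"], c["fn"]),
          "precision:", precision(c["tp"], c["fp"]),
          "recall:", recall(c["tp"], c["fn"]))
```

Neither strategy looks at the patient at all, so it is worth asking which of the printed numbers still look good and why.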
The precision for class \(c\) is \(\frac{\textrm{\# correctly labeled } c}{\textrm{\# labeled class } c}\), while the recall for class \(c\) is \(\frac{\textrm{\# correctly labeled } c}{\textrm{\# with true label } c}\). Given a confusion matrix, how would you calculate:
The precision for a certain class?
The recall for a certain class?
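As a sketch of how these two quantities can be read off a confusion matrix, the code below assumes that rows index the true class and columns index the predicted class (the convention scikit-learn uses; other sources transpose it, in which case the two axes swap). The 3-class matrix is invented for illustration.

```python
import numpy as np

# Hypothetical confusion matrix: cm[i, j] = number of examples whose
# true class is i and whose predicted class is j (assumed convention).
cm = np.array([
    [5, 1, 0],
    [2, 6, 2],
    [0, 1, 8],
])

def per_class_precision(cm):
    """Diagonal entry divided by its column sum (# labeled as that class)."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=0)

def per_class_recall(cm):
    """Diagonal entry divided by its row sum (# with that true label)."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)

print("precision per class:", per_class_precision(cm))
print("recall per class:   ", per_class_recall(cm))
```

This sketch assumes every class is predicted at least once and appears at least once in the true labels; otherwise a column or row sum is zero and the ratio is undefined.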