Lecture 31 - Modeling and Evaluation 2

Announcements

Goals

So you've made some predictions...

How good are they? Assume we're in a supervised setting, so we have some ground truth labels for our training and validation data. Should you call it good and present your results, or keep tweaking your model?

You need an evaluation environment. What do you need to make this?

Make it convenient; make it informative

With good reason, the book recommends that you package all your evaluation machinery into a single-command program (this could also be a single notebook, or a sequence of cells in a notebook).

You should output your candidate model's performance:

It's also a good idea to output:
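Whatever specific numbers you decide to report, the overall shape of such a single-command program is simple. Below is a minimal sketch; the file names, the choice of model, and the metrics printed are illustrative assumptions, not a required interface:

```python
# evaluate.py -- run with: python evaluate.py
# Minimal sketch of a single-command evaluation script. The file names,
# model, and metrics below are placeholder assumptions for illustration.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

def main():
    # Load pre-split training and validation data (hypothetical file names).
    train = pd.read_csv("train.csv")
    val = pd.read_csv("val.csv")
    X_train, y_train = train.drop(columns=["label"]), train["label"]
    X_val, y_val = val.drop(columns=["label"]), val["label"]

    # Fit the current candidate model.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # Report performance on both splits, so over- and underfitting are visible.
    for name, X, y in [("train", X_train, y_train), ("val", X_val, y_val)]:
        preds = model.predict(X)
        print(f"{name} accuracy: {accuracy_score(y, preds):.3f}")
        print(classification_report(y, preds))

if __name__ == "__main__":
    main()
```

The payoff is that re-running the whole evaluation after every model tweak costs one command rather than a sequence of manual steps.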

Baselines

The first rule of machine learning is to start without machine learning. (Google says so, so it must be true.)

Why?

Example: you are a computer vision expert working on biomedical image classification, trying to predict whether an MRI scan shows a tumor or not. Your training data contains 90% non-tumor images (negative examples) and 10% tumor images (positive examples).
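Before training anything, notice that a "model" that always answers "no tumor" is already right 90% of the time. A quick sketch with synthetic stand-in data (the features and labels below are randomly generated, not real MRI data) makes the point using scikit-learn's `DummyClassifier` as the no-ML baseline:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the 90% negative / 10% positive MRI labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))             # fake image features
y = (rng.random(1000) < 0.10).astype(int)  # ~10% positives (tumor)

# "No machine learning" baseline: always predict the majority class.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X, y)

print("baseline accuracy:", accuracy_score(y, baseline.predict(X)))
# ~0.90 -- so 90% accuracy from a real model is not impressive here.
```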

Baseline Brainstorm

Example prediction problems:

Ideas?

My ideas:

Evaluation Metrics

So you've made some predictions... how good are they?

Let's consider regression first. Our model is some function that maps an input datapoint to a numerical value:

$y_i^\mathrm{pred} = f(x_i)$

and we have a ground-truth value $y_i^\mathrm{true}$ for $x_i$.

How do we measure how wrong we are?
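The specific metrics brainstormed in lecture aren't written out here, but three common candidates are mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE). A quick sketch of all three:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: the average size of the mistakes."""
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    """Mean squared error: penalizes large mistakes more heavily."""
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    """Root mean squared error: MSE, but back in the units of y."""
    return np.sqrt(mse(y_true, y_pred))

y_true = np.array([10.0, 1.0, 5.0])
y_pred = np.array([12.0, 2.0, 5.5])
print(mae(y_true, y_pred), mse(y_true, y_pred), rmse(y_true, y_pred))
```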

Problem with any of the above:

You can make your error metric as small as you want! Just rescale everything by some large constant $k$: $$ X \leftarrow X / k, \qquad \mathbf{y}^\mathrm{true} \leftarrow \mathbf{y}^\mathrm{true} / k, \qquad \mathbf{y}^\mathrm{pred} \leftarrow \mathbf{y}^\mathrm{pred} / k $$
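A quick numerical check of that claim, using RMSE as the stand-in metric (any absolute-error metric behaves the same way):

```python
import numpy as np

y_true = np.array([10.0, 1.0, 5.0])
y_pred = np.array([12.0, 2.0, 5.5])
rmse = lambda a, b: np.sqrt(np.mean((a - b) ** 2))

k = 1000.0
print(rmse(y_true, y_pred))          # original error
print(rmse(y_true / k, y_pred / k))  # same model, rescaled data: error / k
```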

Also: is 10 vs. 12 a bigger error than 1 vs. 2?

Solutions:
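One direction the solutions can take (my assumption, since the list above isn't filled in) is to use scale-invariant metrics: a relative error such as MAPE, which measures each mistake as a fraction of the true value, or $R^2$, which compares the model's squared error to a predict-the-mean baseline. A sketch showing that both survive the rescaling trick, and that MAPE treats 10 vs. 12 as a much smaller mistake than 1 vs. 2:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error: each error relative to the true value."""
    return np.mean(np.abs((y_true - y_pred) / y_true))

def r2(y_true, y_pred):
    """R^2: 1 minus the model's squared error over the mean-baseline's."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

y_true = np.array([10.0, 1.0, 5.0])
y_pred = np.array([12.0, 2.0, 5.5])
k = 1000.0

# Both metrics are identical before and after rescaling by k.
print(mape(y_true, y_pred), mape(y_true / k, y_pred / k))
print(r2(y_true, y_pred), r2(y_true / k, y_pred / k))

# MAPE also sees 10 vs. 12 as a 20% error, but 1 vs. 2 as a 100% error.
```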