Lecture 19 - Generalization, Continued

Hyperparameter Tuning and Regularization

Announcements

  • New seats, new friends, as of Monday:
In [7]:
import random
random.seed(518)
datafolk = "Alli Keira Malik Erika Narina Sebastian Josh Dylan Haden Zach Maven Marcus Finnley".split()
random.shuffle(datafolk)
print(datafolk[:4])
print(datafolk[4:9])
print(datafolk[9:])
['Malik', 'Josh', 'Alli', 'Erika']
['Marcus', 'Keira', 'Maven', 'Dylan', 'Zach']
['Narina', 'Finnley', 'Haden', 'Sebastian']

Goals

  • Know why and how to subdivide datasets into training, validation, and test sets
  • Understand what hyperparameters are and how to tune them using a validation set
  • Know how cross-validation works and why you might want to use it

Notes

Lecture notes for today's content:

  • See the notebook from last lecture (L18)
  • See the whiteboard notes from L18 and today (L19)

Warm-up

Questions 1–3 on today's worksheet

Whiteboard:

  • MLBot 1.0 - Train/val/test splits
  • Cross-validation
  • Hyperparameters and Regularization

Hyperparameter Tuning and Regularization: Activity

0. Imports and Data Splits

In [8]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = fetch_openml('vehicle', version=1, as_frame=False, parser='auto')
X, y = data.data, data.target

# This converts features to z-scores:
X = StandardScaler().fit_transform(X)

train_frac = 0.6

X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=1-train_frac, random_state=311)
X_val,   X_test, y_val,   y_test  = train_test_split(X_temp, y_temp, test_size=0.5, random_state=311)

print(f"Train: {len(X_train)}\n  Val: {len(X_val)}\n Test: {len(X_test)}")
Train: 507
  Val: 169
 Test: 170
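
One detail worth noticing: the cell above fits the scaler on all of X before splitting, so the validation and test rows contribute to the means and standard deviations used for scaling. A common alternative is to fit the scaler on the training split only and reuse those statistics for the other splits. The sketch below illustrates that ordering; it uses synthetic data from make_classification as a stand-in so it runs without a download, and the `Xd_*` / `yd_*` names are hypothetical, not part of the lecture code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; the lecture uses the OpenML 'vehicle' dataset)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=6, n_informative=4, random_state=0
)

# Same 60/20/20 splitting pattern as the lecture cell
Xd_train, Xd_temp, yd_train, yd_temp = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=0
)
Xd_val, Xd_test, yd_val, yd_test = train_test_split(
    Xd_temp, yd_temp, test_size=0.5, random_state=0
)

# Fit the scaler on the training rows only...
scaler = StandardScaler().fit(Xd_train)

# ...then apply the same training-set means and stds to every split
Xd_train_s = scaler.transform(Xd_train)
Xd_val_s = scaler.transform(Xd_val)
Xd_test_s = scaler.transform(Xd_test)
```

With this ordering, only the training columns are guaranteed to have mean 0 and standard deviation 1; the validation and test columns will be close to that but not exact, which is expected.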

1. Train a family of models with different values of $C$

In [9]:
C_values = [0.001, 0.01, 0.1, 1, 10, 100, 1000]

train_accs, val_accs = [], []

for C in C_values:
    svm = SVC(kernel='linear', C=C) # make the classifier
    svm.fit(X_train, y_train) # train the classifier
    train_accs.append(svm.score(X_train, y_train)) # evaluate on the training set
    val_accs.append(  svm.score(X_val,   y_val))   # evaluate on the validation set

best_C = C_values[np.argmax(val_accs)]

2. Plot the training and validation accuracy for each model

In [10]:
fig, ax = plt.subplots(figsize=(9, 5))

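# Label each curve so the two lines can be told apart in the legend
ax.plot(C_values, train_accs, label='Training accuracy')
ax.plot(C_values, val_accs, label='Validation accuracy')
ax.legend()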

ax.set_xscale('log')
ax.set_xlabel('C', fontsize=12)
ax.set_ylabel('Accuracy', fontsize=12)
ax.set_title('SVM Hyperparameter Tuning', fontsize=13)
plt.tight_layout()
plt.show()
[Figure: training and validation accuracy versus C (log-scaled x-axis), titled "SVM Hyperparameter Tuning"]

3. Evaluate the best model on the held-out test set

Use the best C chosen from the validation curve and compute the performance on the held-out test set.

Important: we should run this only when we are finished tweaking our modeling assumptions and hyperparameters. After we evaluate on the test set, it's no longer unseen!

In [11]:
final_model = SVC(kernel='linear', C=best_C)
final_model.fit(X_train, y_train)

test_acc  = final_model.score(X_test, y_test)
val_acc   = max(val_accs)

print(f"Validation accuracy (C={best_C}): {val_acc:.1%}")
print(f"Test accuracy       (C={best_C}): {test_acc:.1%}")
Validation accuracy (C=10): 78.7%
Test accuracy       (C=10): 83.5%
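
The activity above picks C using a single validation split. Cross-validation, the other whiteboard topic, can play the same role: each candidate C is scored on several train/validation folds and the fold accuracies are averaged. The following is a minimal sketch of that idea, assuming 5 folds, a shortened grid of C values, and synthetic stand-in data from make_classification; the `cv_*`, `X_dev`, and `X_hold` names are hypothetical and not from the lecture code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption; the lecture uses the OpenML 'vehicle' dataset)
X_cv, y_cv = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)

# Set the test set aside first; cross-validation only ever sees the rest
X_dev, X_hold, y_dev, y_hold = train_test_split(
    X_cv, y_cv, test_size=0.2, random_state=0
)

cv_C_values = [0.01, 0.1, 1, 10]
cv_mean_accs = []

for C in cv_C_values:
    # The pipeline refits the scaler inside each fold, using that fold's training rows only
    model = make_pipeline(StandardScaler(), SVC(kernel='linear', C=C))
    fold_accs = cross_val_score(model, X_dev, y_dev, cv=5)  # one accuracy per fold
    cv_mean_accs.append(fold_accs.mean())

cv_best_C = cv_C_values[int(np.argmax(cv_mean_accs))]

# Refit on all non-test data with the chosen C, then touch the test set once
cv_final = make_pipeline(StandardScaler(), SVC(kernel='linear', C=cv_best_C))
cv_final.fit(X_dev, y_dev)
cv_test_acc = cv_final.score(X_hold, y_hold)

print(f"Best C by 5-fold CV: {cv_best_C}")
print(f"Test accuracy: {cv_test_acc:.1%}")
```

Compared with a single split, every non-test example is used for both training and validation across the folds, at the cost of fitting each candidate model five times.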