DATA 311 - Lab 7: PCA and Clustering

Scott Wehrwein

Fall 2021

Introduction

In this lab, we’ll use PCA to help us visualize the learned feature extraction process of a machine learning model.
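As a rough illustration of the idea, and not the starter notebook's actual code, the sketch below uses scikit-learn's PCA to project high-dimensional feature vectors down to two dimensions so they can be scatter-plotted. The random `features` array is a hypothetical stand-in for the activations a model produces; the 200 x 64 shape is an arbitrary assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for a model's learned features (random data here):
# 200 examples, each described by a 64-dimensional feature vector.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 64))

# Fit PCA and keep the top 2 principal components for a 2D plot.
pca = PCA(n_components=2)
projected = pca.fit_transform(features)

print(projected.shape)                # (200, 2)
print(pca.explained_variance_ratio_)  # fraction of variance captured by each component
```

To visualize the result, pass `projected[:, 0]` and `projected[:, 1]` to a scatter plot (for example matplotlib's `plt.scatter`), optionally coloring each point by its class label.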

Collaboration Policy

You are required to complete this lab in pairs. I highly recommend collaborating synchronously, as each partner will be responsible for understanding (and being able to independently explain) every aspect of your submission. As a reminder, here’s the collaboration policy for labs done in pairs from the syllabus:

For labs done in pairs, any and all collaboration is permissible between members of the same pair. That said, both members must understand and be able to explain in detail all aspects of their submission. For this reason, “pair programming” is highly recommended; you should not split the tasks up for each group member to complete independently. I reserve the right to meet with any student one-on-one and ask them to explain any part of their submission to me in detail.

Getting Started

For this lab, you’ll need to install the scikit-learn package (imported in Python as sklearn): pip install scikit-learn. There is a starter notebook for this lab, which you can download here.

Your Tasks

The background explanations and tasks are detailed in the starter notebook. Read the instructions carefully: the harder part of this lab is probably figuring out what I’m asking you to do, not actually doing it.

If you’re stuck, ask questions early and often. My solution comes in under 30 lines of code; you don’t need to match that, but if you’re writing a lot more, you’re probably missing an easier way to do something.

Finally, please make sure you have results for all parts; I’ve put “TODO” in the notebook in each place you need to do some work. Don’t forget to do 1.2, which is a written answer.

Submitting your work

Modify the starter notebook and submit it to Canvas (one group member only), then fill out the Week 7 Survey on Canvas (both group members). Your submission will not be considered complete until both group members have submitted the survey.

Rubric

Part 1 is worth 20 points:

Part 2 is worth 10 points:

Extra Credit

Up to 3 points for a thorough quantitative analysis of cluster quality in all feature maps.
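One possible starting point for the extra credit is sketched below, assuming cluster quality is measured with the silhouette score (the handout does not name a metric). It runs k-means for several values of k and compares the scores. The `make_blobs` data is a synthetic placeholder for the points from a single feature map; in an actual analysis you would repeat the loop for each feature map.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for points from one feature map (hypothetical data):
# 300 points in 2D drawn from 3 Gaussian blobs.
X, true_labels = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)

# Cluster with k-means for several values of k and score each clustering.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Silhouette score lies in [-1, 1]; higher means tighter, better-separated clusters.
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

The silhouette score is convenient because it needs no ground-truth labels; if class labels are available, a label-based measure such as scikit-learn's `adjusted_rand_score` is another reasonable option.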