DATA 311 - Final Project

Scott Wehrwein

Winter 2023

Overview

The final project brings together just about everything we’ve covered in this course, spanning the full data science pipeline from coming up with a question to presenting your results.

Deliverables

Groups

You will work in groups of 3; other group sizes (2 or 4) will be allowed only by request and only to satisfy divisibility constraints. You are free to choose your groups.

Topics

The topic for the project is left open-ended so you can explore something of interest to you. The requirements are:

A Note on Flexibility

It’s pretty easy to scope a project incorrectly or go down a path where good data isn’t available, so we may need to iterate a little on the proposals before settling on a topic. It’s also possible that you’ll encounter unexpected roadblocks; in this case, well-justified pivots will be allowed at the milestone, provided that you end up with a quality project. Pivots made because you didn’t start early enough and ran out of time are not considered well-justified.

Proposal and Feasibility

Due Date: Friday, 2/24

Write a proposal document for your final project. One person from each group will submit a single proposal as a PDF by the deadline above. Make it as short or long as needed (ideally no longer than needed), but I expect these to be about 1-2 pages. The document should include the following sections:

  1. Group members
  2. Question: What is your motivating research question? What are you looking to learn from this project?
  3. Data: Describe the data you will work with and convince me that (a) it exists and (b) you can get it.
  4. Feasibility: Convince me that your data can address your question. Be sure to talk specifically about an exploratory component and a machine learning component. Some uncertainty about specifics is okay, but your plan should be plausible; if there are significant foreseeable risks, include contingency plans.
  5. Milestone deliverable: describe what you plan to have done at the milestone deadline. This should tell me how the Milestone guideline listed in the Overview section above applies to your project.
  6. Roadmap: do your best to break the project into subtasks that will take one group member no more than a week to accomplish. For each task, give a tentative allocation of which group member(s) will accomplish it and when it will be done.

Milestone Report

Due Date: Sunday, 3/5 (originally Friday, 3/3)

Submit a milestone report describing your progress, and supporting notebooks, code, etc. Submit a PDF file with the following information:

  1. List your group members
  2. Address any unresolved feedback from your proposal.
  3. Describe the status of each of the tasks requested for the Milestone. This will include:
  4. If any of the above goals were not met, explain why and detail your plan for completing them. If any changes in scope, goals, or roadmap are necessary, explain why and what your updated plan is.

Alongside your report, you should submit a zip file containing artifacts showing your progress. This will likely take the form of Jupyter notebooks, but if something else makes sense, go for it. I expect to see at least evidence of exploratory analysis results and baseline/evaluation setup. You do not need to submit the data necessary to run the notebooks, but you should submit the notebooks with up-to-date outputs that show me what I’d see if I did run them.
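As one way to prepare the zip file, here is a minimal Python sketch. It assumes your notebooks are .ipynb files sitting in a single folder; the function names and the folder layout are illustrative, not part of the assignment. It flags notebooks whose code cells have no saved outputs, then zips only the notebooks so data files are left out.

```python
import json
import zipfile
from pathlib import Path


def notebooks_missing_outputs(folder):
    """Return names of notebooks in `folder` where no code cell has saved output.

    Hypothetical helper: relies on the standard .ipynb JSON layout, in which
    each code cell carries an "outputs" list.
    """
    missing = []
    for path in sorted(Path(folder).glob("*.ipynb")):
        nb = json.loads(path.read_text(encoding="utf-8"))
        code_cells = [c for c in nb.get("cells", []) if c.get("cell_type") == "code"]
        if code_cells and not any(c.get("outputs") for c in code_cells):
            missing.append(path.name)
    return missing


def zip_notebooks(folder, zip_path):
    """Write every .ipynb in `folder` (and nothing else) into `zip_path`."""
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(Path(folder).glob("*.ipynb")):
            zf.write(path, arcname=path.name)
            names.append(path.name)
    return names
```

For example, calling notebooks_missing_outputs("project") before zip_notebooks("project", "milestone.zip") tells you which notebooks to re-run and save first. This only checks that some output was saved; it cannot tell whether the outputs are up to date.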

Final Report

Due Date: Friday, 3/10

Expectations

Your final submission must present interesting results on your original motivating question. The exploratory analysis component must be Correct, Convincing, and Clear, and should include at least some polished visualizations in the spirit of Lab 3.

Expectations for the machine learning component depend on the style of approach.

Submission

Blog Post Notebook

This is a writeup of your project for a general audience. It will be a notebook (titled blog.ipynb), but you should think of it as a blog post. It should talk not only about your results, but also provide background on what you set out to do, why it is interesting and worth reading about, and the size, source, etc. of the data you used. It should walk the reader through your most interesting findings and provide discussion and interpretation of what the results mean and their implications. The blog post should address both the exploratory analysis you did and the predictions you made, though it can focus more on one or the other if one of them turned out to be more interesting. The blog post should include at least some Lab 3-quality visualizations to support the exposition; that is, the visualizations should be highly polished and adhere to the principles of visualization aesthetics, though you don’t need to explain your designs here as you did in Lab 3.

Supporting Notebooks

Your blog notebook should also refer to one or more additional notebooks (in .ipynb format) containing the details of your analysis, which you will submit alongside the blog post. These notebooks should be correct, clear, and convincing according to the guidelines used for many labs in this course. The audience here is an interested reader of your blog post who wants to dive into the details of your analysis and potentially reproduce it; as such, the notebooks together should contain everything needed to reproduce your results.

Submission Logistics

Submit a single zip file to Canvas containing:

Submissions will be posted on the course webpage and linked from a final project showcase page for posterity.

Final Presentations

In class on 3/7 and 3/8

We will use Tuesday and Wednesday (and, if needed, Friday) of the last week of classes to give brief (5-7 minute) presentations of the final projects. These are relatively informal presentations that will give you a chance to see the fun and interesting results found by other groups. You may make separate presentation slides, or just show and talk about your blog post, but make sure that you keep it short and focus on the highlights of what was interesting about your findings. If you have slides or any other visuals to show, you will need to send them to me at least one hour before the beginning of class so we can present from a single computer. The presentation schedule will be posted as we approach the last week.