DATA 311 - Final Project

Scott Wehrwein

Fall 2021

Final Project Reports

Here are the reports from the Fall 2021 final projects:

Overview

In the final project, you will put together just about everything we’ve covered in this course, working through the full data science pipeline from coming up with a question to presenting your results. The project will run for about four weeks, with one deliverable due each week.

Topics

The topic for the project is left open-ended so you can explore something of interest to you. The requirements are:

A Note on Flexibility

It’s pretty easy to scope a project incorrectly or go down a path where good data isn’t available, so we may need to iterate a little on the proposals before settling on a topic. It’s also possible that you’ll encounter unexpected roadblocks; in this case, well-justified pivots will be allowed at each milestone, provided that you end up with a quality project at the end. Pivots because you didn’t start early enough and ran out of time are not considered well-justified.

Proposal

Due Date: Friday, 11/12

Write a proposal document for your final project. One person from each group will submit a single proposal as a PDF by 10pm Friday, 11/12. Make it as short or long as needed (ideally no longer than needed), but I expect these to be about 1-2 pages. The document should include the following sections:

  1. Group members
  2. The goal of your analysis:
    1. Your motivating research question. What are you looking to learn from this project?
    2. Describe the data you’re working with. By the time you submit the proposal, you should be very sure the data exists, and ideally have acquired it and made sure it’s readable and contains what you think it does. If your project involves a substantial scraping component, you should have a proof-of-concept scraper running, though you need not have all the data collected yet.
    3. At a high level, describe how you plan to use your data to answer your question. Be sure to talk specifically about an exploratory component and a machine learning component.
  3. Milestone 1 deliverable: describe what you plan to have done at the first milestone deadline. This should tell me how the Milestone 1 guideline listed in the Overview section above applies to your project.
  4. Milestone 2 deliverable: describe what you plan to have done at the second milestone deadline. This should tell me how the Milestone 2 guideline listed in the Overview section above applies to your project.
  5. Roadmap: do your best to break the project into subtasks that will take one group member no more than a week to accomplish. For each task, give a tentative allocation of which group member(s) will accomplish it and when it will be done.

Milestone 1 Report

Due Date: Monday, 11/22

Submit a milestone report describing your progress, along with supporting notebooks, code, etc. The report should be a PDF file with the following information:

  1. Group members
  2. Address any details that were not included in the proposal that I called out in my feedback; for example, if your prediction task or data split strategy was not fully specified, let me know what you settled on.
  3. A description of the status of each of the tasks requested for Milestone 1. This will include:
  4. If any of the above goals were not met, explain why and detail your plan for completing them. If any changes in scope, goals, or roadmap are necessary, explain why and what your updated plan is.

Alongside your report, you should submit a zip file containing artifacts showing your progress. This will likely take the form of Jupyter notebooks, but if something else makes sense, go for it. I expect to see at least evidence of exploratory analysis results and baseline/evaluation setup. You do not need to submit the data necessary to run the notebooks, but you should submit the notebooks with up-to-date outputs that show me what I’d see if I did run them.
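As a rough illustration of what a baseline/evaluation setup could look like, here is a minimal sketch that assumes a classification task. The toy data, the 80/20 split, the majority-class baseline, and all function names are assumptions made for illustration; they are not requirements of the project, and your own task may call for a different split strategy or metric.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def majority_baseline(train_labels):
    """Build a predictor that always outputs the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda features: most_common


def accuracy(predict, examples):
    """Fraction of (features, label) pairs that the predictor gets right."""
    correct = sum(1 for features, label in examples if predict(features) == label)
    return correct / len(examples)


# Hypothetical toy data: 100 examples, about three quarters labeled "a".
examples = [((i,), "a" if i % 4 else "b") for i in range(100)]

train, test = train_test_split(examples)
baseline = majority_baseline([label for _, label in train])
baseline_accuracy = accuracy(baseline, test)
print(f"Majority-class baseline accuracy on held-out data: {baseline_accuracy:.2f}")
```

Fixing the split and the metric early means any later model can be compared against the same held-out data, and the baseline gives a number that the model should beat.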

Milestone 2 Report

Due Date: Monday, 11/29

Submit a milestone report describing your progress, along with supporting notebooks, code, etc. The report should be a PDF file with the following information:

  1. Group members
  2. Address any issues raised in your Milestone 1 feedback.
  3. A description of the status of each of the tasks requested for Milestone 2. This will include:
  4. If any of the above goals were not met, explain why and detail your plan for completing them. If any changes in scope, goals, or roadmap are necessary, explain why and what your updated plan is.

Alongside your report, you should submit a zip file containing artifacts showing your progress. This will likely take the form of Jupyter notebooks, but if something else makes sense, go for it. This should include (some of this may not have changed since MS1 if it was satisfactory at that point, but you should include it anyway):

You do not need to submit the data necessary to run the notebooks, but you should submit the notebooks with up-to-date outputs that show me what I’d see if I did run them.

Final Report

Due Date: Monday, 12/6

Blog Post

This is a writeup of your project for a general audience. It should not only talk about your results, but also provide background on what you set out to do, why it is interesting and worth reading about, and the size, source, etc. of the data you used. It should walk the reader through your most interesting findings and provide discussion and interpretation of what the results mean and their implications. The blog post should address both the exploratory analysis you did and the predictions you made, though it can focus more on one or the other if one of them turned out to be more interesting. The blog post should include at least some Lab 6-quality visualizations to support the exposition; that is, the visualizations should be highly polished and adhere to the principles of visualization aesthetics, though you don’t need to explain your designs here as you did in Lab 6.

The blog post should be in HTML format in a file called index.html, submitted along with any necessary supporting files (e.g., images). One simple way to create an HTML file is to write your blog post in a Jupyter notebook then Download As HTML. You can embed images in Markdown using syntax like: ![](name_of_image.png).
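Images linked from Markdown may remain separate files after export, so it can be worth checking that every image index.html refers to is actually sitting next to it before you submit. The following is a minimal sketch using only the Python standard library; the helper name missing_local_images and the demo file names are hypothetical, and it assumes image paths are plain relative paths with no URL-encoding.

```python
import tempfile
from html.parser import HTMLParser
from pathlib import Path


class ImageSourceCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def missing_local_images(html_path):
    """Return the relative image paths referenced by the page but not on disk."""
    html_path = Path(html_path)
    collector = ImageSourceCollector()
    collector.feed(html_path.read_text(encoding="utf-8"))
    missing = []
    for src in collector.sources:
        # Skip remote images and images already embedded as data URIs.
        if src.startswith(("http://", "https://", "data:")):
            continue
        if not (html_path.parent / src).exists():
            missing.append(src)
    return missing


# Demo in a temporary folder: one referenced image exists, one does not.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "plot_a.png").write_bytes(b"placeholder")
    (folder / "index.html").write_text(
        '<html><body><img src="plot_a.png"><img src="plot_b.png"></body></html>',
        encoding="utf-8",
    )
    demo_missing = missing_local_images(folder / "index.html")

print(demo_missing)
```

Running missing_local_images on your own index.html before zipping should return an empty list if all supporting image files are in place.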

Notebooks

Your blog post should also provide links to the Jupyter notebooks (in .ipynb format) containing the details of your analysis. These notebooks should be correct, clear, and convincing according to the guidelines used for many labs in this course. The audience here is an interested reader of your blog post who wants to dive into the details of your analysis and potentially reproduce it. As such, the notebooks together should contain everything needed to reproduce your results.

Submission

Submit a single zip file to Canvas containing:

Submissions will be posted on the course webpage and linked from a final project showcase page for posterity.

Final Presentations

1:00pm - 3:00pm on Tuesday, 12/7

We will use our final exam slot to give brief (~5 minute) presentations of the final projects. These are informal presentations that will give you a chance to see the fun and interesting results found by other groups. You may make separate presentation slides, or just show and talk about your blog post, but make sure that you’re keeping it short and talking about the highlights of what was interesting about your findings. If you have slides or any other visuals to show, you will need to send them to me the night before the presentation so we can present from a single computer.