DATA 311 Project Rubric

Intermediate Deliverables

Proposal (10 points)

Your proposal should be well-considered and complete. We will provide feedback and iterate until we have agreed on a satisfactory path. The 10 points for this deliverable will be assigned for completeness and responsiveness to feedback, if any.

Milestone (10 points)

Your milestone deliverable should demonstrate that you have completed data collection, curation, cleaning, etc. Any changes in scope or plan should be clearly documented. The 10 points for this deliverable will be assigned for completeness and responsiveness to feedback, if any.

Final Deliverable - Part I (50 Points)

DATA COLLECTION AND CURATION (30 Points)

This process is highly dependent on the project. To allow for different approaches, the rubric below lists more than 30 points in total, but the score for this section is capped at 30 points. Some techniques have N/A listed for certain levels and are not eligible for those point values. The term "data points" in the rubric refers to the equivalent of a pandas row.

Each technique is scored at one of five levels: Extensive Use (8 Points), Full Use (6 Points), Partial Use (4 Points), Weak Use (2 Points), or No Use (0 Points).

Web Scraping
    Extensive Use (8 Points): Full Use and an extra technique or process not covered in class, scrapes many (3+) sites for a significant number of data points, or scrapes a site that is difficult to scrape.
    Full Use (6 Points): Uses web scraping to gather a significant part of the data set, 5 or more data points.
    Partial Use (4 Points): Uses web scraping to gather some part of the data set, more than 1 data point.
    Weak Use (2 Points): Uses web scraping for a single data point.
    No Use (0 Points): Does not use web scraping.

Web Scraping (Pagination)
    Extensive Use (8 Points): N/A
    Full Use (6 Points): N/A
    Partial Use (4 Points): Paginates over many (3+) web pages using loops.
    Weak Use (2 Points): Paginates over a few (<3) websites manually or with loops.
    No Use (0 Points): Does not paginate to collect data.

APIs
    Extensive Use (8 Points): Full Use and either uses additional APIs or interacts with advanced features an API provides.
    Full Use (6 Points): Uses an API to gather a significant part of the data set, 5 or more data points.
    Partial Use (4 Points): Uses an API to gather some part of the data set, more than 1 data point.
    Weak Use (2 Points): Uses an API to gather a single data point.
    No Use (0 Points): Does not use an API.

Data Cleaning
    Extensive Use (8 Points): Full Use and some complex cleaning that involves a procedure such as a function or library call.
    Full Use (6 Points): Partial Use and has at least one data point with a complex clean (more than just lower-casing a string or indexing a value).
    Partial Use (4 Points): Cleans up at least 3 data points so they can be used properly for analysis.
    Weak Use (2 Points): Uses basic cleaning for a couple (<3) data points.
    No Use (0 Points): Does not clean data.

Data Creation
    Extensive Use (8 Points): N/A
    Full Use (6 Points): Partial Use and uses procedures beyond statistics and basic functions to create at least 1 of the data points.
    Partial Use (4 Points): Creates at least 2 data points from collected data points.
    Weak Use (2 Points): Creates at least 1 data point from collected data.
    No Use (0 Points): Does not use data creation.

Missing Data
    Extensive Use (8 Points): N/A
    Full Use (6 Points): N/A
    Partial Use (4 Points): Removes or fills in missing data and documents the reasoning and consequences of doing so.
    Weak Use (2 Points): Removes or fills in missing data.
    No Use (0 Points): Does not deal with missing data.

Data Merging
    Extensive Use (8 Points): Full Use and extra cleaning or advanced merging techniques were required to do so successfully.
    Full Use (6 Points): Merges two datasets, with at least 3 new data points added as a result.
    Partial Use (4 Points): Merges two datasets, with fewer than 3 data points added as a result.
    Weak Use (2 Points): N/A
    No Use (0 Points): Data merging is not used.

NLP
    Extensive Use (8 Points): In-depth NLP methods are used to create at least 3 data points.
    Full Use (6 Points): NLP methods are used to create at least 3 data points.
    Partial Use (4 Points): NLP methods are used to create fewer than 3 data points.
    Weak Use (2 Points): N/A
    No Use (0 Points): NLP is not used.

Other Techniques
    Extensive Use (8 Points): Other technique that requires substantial work to pull off.
    Full Use (6 Points): Other technique that requires a good deal of work to pull off.
    Partial Use (4 Points): Other technique to collect at least 3 data points.
    Weak Use (2 Points): Other technique to collect 1 data point.
    No Use (0 Points): No other technique used.
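
As a rough illustration of how several of these techniques can appear together in a notebook, the sketch below walks through data cleaning, missing-data handling, data merging, and data creation with pandas. It is only a sketch: the DataFrames, column names (name, count, area, count_per_area), and values are invented toy data, not a required structure or a grading template.

```python
import pandas as pd

# Toy "collected" data (invented for illustration): messy keys, numbers stored
# as strings with thousands separators, and one missing value.
raw = pd.DataFrame({
    "name": [" Alpha", "beta ", "Gamma", "delta"],
    "count": ["1,200", "350", None, "4,000"],
})

# Toy second source (also invented), for example the result of an API call.
extra = pd.DataFrame({
    "name": ["alpha", "beta", "delta"],
    "area": [10.0, 7.0, 20.0],
})

# Data cleaning: normalize the key column and parse the numeric strings.
clean = raw.copy()
clean["name"] = clean["name"].str.strip().str.lower()
clean["count"] = pd.to_numeric(clean["count"].str.replace(",", "", regex=False))

# Missing data: drop rows without a count and record how many were lost.
# Reasoning (documented, as the rubric asks): a count cannot be guessed here,
# and filling it in would distort the ratio computed below.
rows_before = len(clean)
clean = clean.dropna(subset=["count"])
n_dropped = rows_before - len(clean)

# Data merging: attach the second source on the cleaned key.
merged = clean.merge(extra, on="name", how="left", validate="one_to_one")

# Data creation: derive a new value from the collected ones.
merged["count_per_area"] = merged["count"] / merged["area"]

print(f"dropped {n_dropped} row(s) with a missing count")
print(merged)
```

Note that the merge only lines up because the key column was cleaned first, which is the kind of dependency between steps worth documenting in the notebook.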

DOCUMENTATION (10 Points)

Each metric is scored from 5 Points down to 0 Points.

Documentation
    5 Points: The entire notebook documents confusing code and all considerations that guided the direction of the collection.
    4 Points: The notebook has good comments, but there are a few places where confusing code or steps are not documented.
    3 Points: The notebook has good comments, but there are more than a few places where confusing code or steps are not documented.
    2 Points: There are some comments in the notebook, but some confusing decisions and steps are left undocumented.
    1 Point: There are a few comments in the notebook, but they are lacking.
    0 Points: There are no comments or no comments of value in the notebook.

Repeatability
    5 Points: The notebook deliverable runs fully on its own by just hitting "run all cells" (on the WWU JupyterHub instance running the DATA311 environment).
    4 Points: The notebook runs almost fully on its own by just hitting "run all cells" (e.g., a file must be downloaded manually).
    3 Points: The notebook has more than one manual step required to be able to be run.
    2 Points: The notebook has less than 3 manual steps required to be able to be run.
    1 Point: The notebook has less than 6 manual steps required to be able to be run.
    0 Points: Almost all or all steps involving data collection require manual intervention of some kind.
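
One way to avoid a manual download step is to have the notebook fetch a file only when it is not already present. The sketch below is a minimal example under assumptions: ensure_file is a hypothetical helper name, the URL is a placeholder, and the demo uses a stub downloader so it runs without network access. In a real notebook the default downloader, urllib.request.urlretrieve, would be used instead.

```python
import tempfile
import urllib.request
from pathlib import Path


def ensure_file(path, url, download=urllib.request.urlretrieve):
    """Download url to path only when the file is not already there."""
    path = Path(path)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        download(url, path)
    return path


# Demo with a stub downloader (hypothetical) that records each call and
# writes a tiny CSV, standing in for a real network download.
calls = []


def fake_download(url, dest):
    calls.append(url)
    Path(dest).write_text("id,value\n1,10\n")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data" / "raw.csv"
    # The second call finds the file on disk and skips the download.
    ensure_file(target, "https://example.com/raw.csv", download=fake_download)
    ensure_file(target, "https://example.com/raw.csv", download=fake_download)
    contents = target.read_text()

print(f"downloads performed: {len(calls)}")
```

Because the helper skips the download when the file exists, re-running all cells does not repeat the request.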

ETHICAL COLLECTION (10 Points)

Ethical Collection
    10 Points: An ethical collection process is followed: robots.txt was checked and adhered to, and websites and API endpoints were not hit with many requests over a short period of time.
    5 Points: Some ethics were not followed for part of the collection.
    0 Points: Ethics were blatantly ignored during data collection.
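
The two practices named above, checking robots.txt and spacing out requests, can be sketched with the Python standard library as follows. This is an illustrative sketch, not a required implementation: the robots.txt content, the user agent name, the example.com URLs, and the stub fetch function are all assumptions made so the example runs offline. A real notebook would load the site's actual robots.txt (for example with RobotFileParser.set_url followed by read) and make real requests.

```python
import time
import urllib.robotparser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

USER_AGENT = "data311-project-bot"  # assumed name; choose your own


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_fetch(urls, parser, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Fetch only the URLs robots.txt allows, pausing between requests."""
    pages = {}
    for url in urls:
        if not parser.can_fetch(USER_AGENT, url):
            continue  # robots.txt disallows this path, so skip it
        if pages:
            sleep(delay_seconds)  # spread requests out over time
        pages[url] = fetch(url)
    return pages


# Build the paginated URLs with a loop rather than listing them by hand.
urls = [f"https://example.com/listings/page/{n}" for n in range(1, 4)]
urls.append("https://example.com/private/admin")  # disallowed above

parser = build_parser(ROBOTS_TXT)
pauses = []
pages = polite_fetch(
    urls,
    parser,
    fetch=lambda url: f"<html>stub for {url}</html>",  # stand-in for a real request
    sleep=pauses.append,  # record pauses instead of actually waiting
)
print(len(pages), "pages fetched with", len(pauses), "pauses")
```

Passing fetch and sleep in as arguments keeps the politeness logic separate from the network code, which also makes it easy to show in the notebook that the checks are in place.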

Final Deliverable - Part II (50 Points)

ANALYSIS (40 Points)

Each metric is scored at 5 Points, 2 Points, or 0 Points.

Soundness
    5 Points: Figure choice and result selection make sense and help to tell the story of the data.
    2 Points: Some figure choices and results do not make sense for telling the story of the data.
    0 Points: Figure choice and results have little or no power in telling the story of the data.

Results
    5 Points: Results are clearly stated in the writing.
    2 Points: Some results are skipped over in the writing (over-reliance on figures).
    0 Points: Swaths of results are not mentioned in the writing of the report.

Framing
    5 Points: The results are framed clearly in the report; interesting and important results are explicitly highlighted.
    2 Points: Most results are framed clearly in the report and some of the interesting and important results are highlighted, but there is room for more.
    0 Points: Little or no attempt is made to make the results clear and engaging to the audience.

Style
    5 Points: The writing is in a style that is appropriate for sharing technical results with an audience.
    2 Points: The writing style is informal but effective at delivering information.
    0 Points: The writing is too informal and unclear because of style choices.

Execution
    5 Points: Writing is free of spelling and grammar errors, and sentence structure is clear.
    2 Points: Some errors in spelling, grammar, or structure.
    0 Points: The writing contains many spelling mistakes and grammar errors and is unclear.

Flow
    5 Points: Arguments and storytelling in the report follow a clear path.
    2 Points: Arguments and storytelling in the report can be followed, but could be clearer.
    0 Points: There is little to no clarity in the arguments and storytelling in the report.

Formatting
    5 Points: Report writing is clear and written in markdown cells.
    2 Points: Report writing is clear but written as code comments.
    0 Points: Formatting is inconsistent or unclear.

Figures
    5 Points: Graphs and figures follow good graph principles.
    2 Points: Graphs have some flaws that do not follow good graph principles.
    0 Points: Graphs do not follow good principles, or there are graphs missing.

REFLECTION (10 Points)

Reflection
    10 Points: The reflection is clear and thoughtful.
    5 Points: There is a reflection but it is not as clear as it could be.
    0 Points: The reflection is lacking clarity and thoughtfulness or is missing.