DATA 311 - Fundamentals of Data Science

Scott Wehrwein

Fall 2025

Course Overview

Basics

Staff

Where and When

Lectures:

What is this course about?

Synopsis from the WWU Course Catalog

Introduction to the fundamentals of data science, focusing on techniques for collecting, processing, visualizing and organizing data. Applied machine learning concepts will also be covered, including fundamentals of machine learning experimentation and the use of libraries to perform clustering, classification and regression. Includes lab.

Official Course Outcomes

On completion of this course students will demonstrate:

Textbook

The following books are recommended, but not required:

Assessment

Data science is a practical pursuit, and this course takes a particularly practical-minded approach to it. We will focus less on the mathematical underpinnings of the tools of data science and more on strategies for successfully using those tools to extract insights from data. As such, the assessment in this course is entirely project-based. Grades will be calculated as a weighted average of scores on the following course components, each of which is described in more detail below:

The standard letter grade ranges apply (i.e., 90–100% is an A, 80–90% is a B, and so on). The calculated raw percentages may be curved at the instructor’s discretion, but any such curve used will not lower anyone’s grade. “+” or “-” cutoffs will be decided at the instructor’s discretion.

Students who demonstrate mastery of the material will get grades in the A range, and it is my goal to give as many A’s as possible.

Lab Assignments

Labs will be spent getting started on the lab assignment for the week. The labs comprise the bulk of the out-of-class workload for this course, so you should plan to allow significant time to complete them outside of the lab period. Some labs will be done individually, while others may be completed in pairs. Lab attendance is required, but missing one lab will not count against you. If you miss lab but submit the lab assignment on time, you will receive 50% of the score you would have received had you attended.

To get full credit for a lab, you must both attend lab and hand in the deliverable by the deadline, which will typically be Thursday night at 10:00pm the week following the lab. Some (possibly all) labs will have a short pre-lab assignment, due by the start of your lab period. Attending a lab section other than the one you are registered for requires permission from both me and the lab TA(s). If you do not attend lab but do submit the deliverable on time, you will receive half credit (i.e., your score will be multiplied by 0.5). This non-attendance penalty is automatically waived for one lab; if you have a legitimate reason for missing additional labs, contact me ahead of time.

Quizzes

Weekly quizzes will be given, generally at the start of class on Fridays, covering material up to but not including that day’s topic. Quizzes usually focus on material from the preceding Friday, Monday, and Wednesday’s classes. I have a strict policy against makeup quizzes, but your lowest quiz grade will be dropped. If special circumstances cause you to miss taking more than one quiz, please talk to me ahead of time.
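As a rough illustration of the drop-lowest policy, here is a minimal sketch in Python. It assumes each quiz is recorded as a percentage and that the quiz component is the plain average of the remaining scores; the function name `quiz_average` and the averaging scheme are illustrative assumptions, not the official grade calculation.

```python
def quiz_average(scores: list[float]) -> float:
    """Average quiz percentage after dropping the single lowest score.

    Assumption (not from the syllabus): all quizzes are weighted equally
    and a missed quiz is recorded as 0.
    """
    if not scores:
        raise ValueError("no quiz scores recorded")
    if len(scores) == 1:
        # Assumption: with only one quiz on record, nothing is dropped.
        return float(scores[0])
    kept = sorted(scores)[1:]  # discard the lowest score
    return sum(kept) / len(kept)


# Example: the 60 is dropped and the other three scores are averaged.
example = quiz_average([80, 100, 60, 90])
```

Under these assumptions, a single missed quiz (recorded as 0) would be the score that gets dropped.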

In-Class Activities and Reading Responses

My goal is to make the lecture component of this course as interactive as possible. Activities may include class discussions, individual writing prompts, and group work; activities will often have a deliverable that is handed in. In-class activities will be graded on completion only (i.e., if you make an honest effort, you will receive full credit).

I will also assign a few (likely 3 ± 1) Data Ethics assignments, often involving an assigned reading, that touch on the interactions between data science and society, often with a focus on ethical considerations. A short individual written response will be submitted before class, and an in-class discussion will follow.

Final Exam

The final exam will be given in our usual lecture classroom at the University-appointed time. The exam may involve a written component and/or a practical component that is done on a computer in a Jupyter environment. Details of the exam format will be announced by the start of the last week of classes.

The final exam will be cumulative. You may use two double-sided 8.5x11 sheets of handwritten notes, but all other resources (books, internet, friends, AI) are prohibited. Per the University Academic Policies, a student who fails to take a final examination without making prior arrangements acceptable to the instructor receives a failing grade for the course.

I do not release final exams or final exam grades. This means that at the end of the quarter, your score on Canvas will not reflect your final grade in the course. If you wish to see your graded final exam, you can review it in person in my office starting at the beginning of the following quarter, either by visiting my office hours or by emailing me to make an appointment.

Resources

Help with Course Content

If you are stuck, struggling, or need help on any aspect of the course, you have several avenues for seeking help:

Other Resources

If you have concerns that go beyond the course material, you are welcome to talk to me. The following resources are also available to support you. https://cs.wwu.edu/diversity-equity-inclusion

Community Ambassadors

The Computer Science department has faculty and student community ambassadors. The role of these ambassadors is to hear concerns, feedback, or questions from students, faculty and staff, especially (but not limited to) those related to equity, inclusion and diversity issues. We hope that the Community Ambassadors can advise and also guide people to college, university or external resources.

You can find more information on Community Ambassadors and contact details for faculty and student ambassadors at the following link: https://cs.wwu.edu/diversity-equity-inclusion.

University Resources

As a reminder, the following University resources are always available:

Logistics

Course Webpage / Syllabus

The Schedule section of this page will be kept up-to-date as the quarter progresses with topics, links to all lecture materials (notes, resources, etc), as well as links to assignment and lab handouts. I suggest bookmarking this page (including the #schedule at the end will link you straight to that part of the page); if you forget the URL and need to find your way back here, you can find the link on the Syllabus page in Canvas.

Canvas

I generally minimize the use of Canvas in favor of sharing materials via the course webpage. However, we will use Canvas for announcements, grades, quizzes, and submission of assignments. Lab and assignment writeups will be linked from both the course webpage and the corresponding assignment on Canvas. Lecture materials, readings, etc. will only be posted on the course webpage.

Computing Resources

CS Department Labs

The CS department maintains a set of Computer Science computer labs separate from the general university labs. These systems are all set up with the software that you need to complete the work for this class. You can find a list of these rooms and more information about them in the CS Support documentation. You will use your regular University username and password to log in. These labs are open to all CS students (that’s you!) any time except when scheduled for a class or other activity. CF 405 is never booked, so it’s always available. Labs are open 24/7, although the building locks at 11pm so you won’t be able to enter later than that.

JupyterHub

Most of our practical work in this class will be done in Jupyter notebooks. The officially supported environment for working with Jupyter notebooks is the department-hosted JupyterHub instance. You can start up a Jupyter server by visiting https://csci-head.cluster.cs.wwu.edu/; if you’re off campus, you’ll need to connect to the VPN first. Lecture 1 will cover the basics for starting a server and working in Jupyter notebooks, and you can find a quickstart guide here.

When finished working with your server, please shut it down by going to File > Hub Control Panel and hitting the red “Stop My Server” button. Your files will persist, and you can restart your server the next time you want to resume work.

Gradescope

Quizzes and exams may be graded and returned to you via an online tool called Gradescope. You will receive an email around the time of the first quiz with instructions on how to set a password for the account that has been created for you. Logging in for the first time is the same process as resetting your password - begin by clicking the “forgot password” link. Thereafter, you can access graded quizzes and exams by logging into your account on https://www.gradescope.com/.

Feedback

I take student feedback seriously. I appreciate any feedback you’re willing to give, and I will do my best to act on constructive feedback when possible. I will solicit feedback through surveys periodically throughout the course, but you are welcome and encouraged to provide feedback at any time in my office hours or by email; if you desire anonymity, you can fill out this Google Form.

Communication Guidelines

Announcements

I will make all course-related announcements either in class or on Canvas. In-class announcements will be posted on the Schedule table on the course webpage. It is your responsibility to make sure that you see Canvas announcements promptly and check the in-class announcements if you miss class. Canvas should be configured to send you an email notification by default, but if you are unsure, please come see me in office hours.

Email

Email is the best way to get in touch with me. I do my best to check email regularly and respond when I can, but I am not able to be instantly responsive all the time. If you have something time-sensitive, email is the medium that I am most likely to see first. You can use Canvas messages as an alternative; these simply go to my email.

Grace

The policies for this course have “grace” built in: you have slip days for labs, your lowest quiz gets dropped, you can miss one lab without an attendance penalty, and you can miss up to three in-class activity submissions without penalty.

If any of the above forms of “grace” apply to your situation, you do not need to contact me: the grace policies are applied automatically. If you have used up all of a certain kind of grace and extenuating circumstances will cause you to go beyond the allowed grace, please contact me by email or in person to explain your situation.

Canvas Submission Comments

I do not read Canvas submission comments, so please do not use them. If you have a message for me and/or the TAs, please use email instead.

See Me After Class

Many quick questions can be resolved in a timely fashion by talking to me after class instead of using email or waiting for office hours. I will be available in the 10 minutes following lecture, so please feel free to use this time.

Schedule

This table contains a rough outline of a schedule for the quarter. As the quarter progresses, I will update it with more detail on past and upcoming topics. You will also find links to all course materials I post. Unless otherwise noted, References refer to chapters/sections in the Skiena book.

| Date | Topics | Assignments | References |
|---|---|---|---|
| 09/24 (0) | Introduction and overview; What is data science? What is data? [slides; typed notes; worksheet; whiteboard] | Start of Quarter Survey (Canvas) | 1.1, 1.3 |
| 09/26 | Data types; numerical data; Jupyter [typed notes; worksheet; whiteboard] | | jupyterhub quickstart |
| 09/29 (1) | Data science tools overview; Multidimensional arrays; numpy [typed notes; worksheet (ipynb)] | Lab 1: numpy | Numpy Illustrated; McKinney 4; Numpy Quickstart |
| 10/01 | Tabular data and Dataframes; pandas basics [notebook: ipynb, html] | | McKinney 5; 10 mins to Pandas; Intro to Pandas Data structures |
| 10/03 | Minimum viable prob/stat; pandas - basic stats and histograms [notebook / typed notes: ipynb, html; worksheet; whiteboard] | | Skiena 2.1-2.2; McKinney 5.3 |
| 10/06 (2) | Formulating data questions; Conditional Probability and Independence [notebook: ipynb, html; worksheet; whiteboard] | Lab 2: pandas | Skiena 1.2; Skiena 2.1 |
| 10/08 | Data Ethics 1 Discussion | Data Ethics 1 | |
| 10/10 | Visualization: Principles | | Tufte excerpt; Skiena 6; McKinney 9 |
| 10/13 (3) | Visualization: Practice | Lab 3: visualization | |
| 10/15 | Processing: outliers and missing data, numerical normalization | | |
| 10/17 | Processing: text normalization, NLP basics | | |
| Week of 10/20 (4) | (Responsible) Data collection; Structured data formats; Scraping; APIs | Lab 4: data processing; Ethics 2 | |
| Week of 10/27 (5) | Exploratory Data Analysis | Lab 5: data collection | |
| Week of 11/03 (6) | ML fundamentals; KNN | Lab 6: EDA | |
| Week of 11/10 (7) | ML for Data Analysis: clustering, dimensionality reduction | Lab 7 - ML for Data Analysis; Ethics 3 | |
| Week of 11/17 (8) | Generalization; ML experimentation and evaluation | Lab 8 | |
| Week of 11/24 (9) | Buffer / Thanksgiving | | |
| Week of 12/01 (10) | Buffer / Language models, fairness/bias, AI | | |
| Friday, 12/12 | Final Exam - 10:30 am - 12:30 pm | | |

Course Policies

Professionalism

I am committed to maintaining an inclusive, supportive, and professional environment in all academic settings including lectures, labs, and course-related online spaces. Students are expected to live up to the ACM Code of Ethics and Professional Conduct. This is the ethical code adopted by nearly every software professional. Failing to follow the ACM Code of Ethics and Professional Conduct can negatively affect course grades up to and including a failing grade for the course. Conduct is also considered when determining admission to the major.

Attendance

I do not explicitly track attendance. However, in-class activities cannot be made up after the fact. At least three of these assessments can be missed without affecting your grade. For labs, the 50% absence penalty will be waived for one lab. If you have reasons that you need to miss class or lab beyond these limits, please talk to me about case-by-case exceptions. If you will be missing more than an occasional class here and there, or if you have any concerns about the effect of absences on your grade, please have a conversation with me about it.

Late Work

You have three “slip days” that you may use at your discretion to submit labs late. Slip days apply only to labs and cannot be applied to any other deadline. You may use slip days one at a time or together - for example, you might submit each of three labs one day late, or submit one lab three days late. Each slip day moves the deadline by exactly 24 hours; if you submit later than that, you will need to use a second slip day, if available.

After your slip days are exhausted, a penalty of 10% * floor(hours_late/24 + 1) (that is, 10% per day or partial day late) will be applied. This is calculated as a percentage of the total points possible, not of the points earned.
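To make the arithmetic concrete, here is a minimal sketch of the late policy in Python. It assumes hours_late is measured from the effective deadline (after any slip days have been applied) and that a score cannot drop below zero; the function names and both of those assumptions are illustrative and not part of the official policy.

```python
import math


def slip_days_needed(hours_late: float) -> int:
    """Slip days required to cover a submission: one per started 24-hour block."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def late_penalty_percent(hours_late: float) -> int:
    """Penalty from the syllabus formula: 10% * floor(hours_late/24 + 1)."""
    if hours_late <= 0:
        return 0  # on-time submissions are not penalized
    return 10 * math.floor(hours_late / 24 + 1)


def penalized_score(points_earned: float, points_possible: float,
                    hours_late: float) -> float:
    """Subtract the penalty, computed on points possible (not points earned)."""
    penalty_points = points_possible * late_penalty_percent(hours_late) / 100
    # Assumption: the score is clamped at zero rather than going negative.
    return max(0.0, points_earned - penalty_points)


# Example: a 50-point lab scored 45 and submitted 30 hours late with no slip
# days left loses 20% of 50 points (10 points).
example = penalized_score(45, 50, 30)
```

Because the penalty is taken from the points possible, the same lateness costs the same number of points regardless of how well the lab was done.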

The time of your submission will be recorded when you submit it on Canvas, so other than submitting your assignment and corresponding survey late, you do not need to take any action to use a slip day. Your grading feedback will include a note of how many slip days have been applied.

Academic Honesty

The academic honesty guidelines for this course differ somewhat from those of a typical CS course. Much of the code you write will be written in chunks of a few lines at a time. The challenge will more often be knowing which library functions to use and how to correctly apply them, rather than solving complex algorithmic problems.

Some labs will be done individually, while others may be done in pairs. For all lab assignments, you are welcome and encouraged to discuss the lab with your classmates. You should feel free to exchange ideas for how to solve pieces of an assignment; this collaboration may be as detailed as suggesting which library function to use and an English description of what you might use it for. You may not copy anyone else’s code, nor should you allow anyone else to copy your code. Finally, most tasks of most labs will ask you to intersperse descriptive text with your code, to explain what the code is doing. This text must be your own and cannot be copied from, or even “inspired by” anyone else’s text. If you did get help on how to code up a task, you can prove that you understand the solution well by explaining it in your notebook.

For labs done in pairs, any and all collaboration is permissible between members of the same pair. That said, both members must understand and be able to explain in detail all aspects of their submission. For this reason, “pair programming” is highly recommended - you should not split the tasks up for each group member to complete independently. I reserve the right to meet with any student one-on-one and ask them to explain any part of their submission to me in detail.

Viewing or sharing code with anyone that you’re not paired with on any assignment is an academic honesty violation. If you’re discussing an assignment with a classmate, it is safest to do so away from computers.

On the Use of AI Tools

AI tools such as ChatGPT and GitHub Copilot raise significant, new, and unanswered questions about their role in education, and indeed in the work your education is likely preparing you for. No one knows what this means for the future!

While it’s possible for tools to empower and make us more efficient, this does not preclude the necessity of learning the fundamentals of the tasks those tools can perform for us. Even though we have calculators, it’s still important to know how to do arithmetic. Furthermore, current AI tools are significantly less reliable than calculators, so it’s critical to be able to check their work. For this reason, I encourage you to be careful how you use AI tools and avoid deceiving yourself as to whether your understanding is solid, or whether you could have written that code yourself.

AI tools are not strictly prohibited, but your usage must pass a simple test: if you received the same kind of help from a human, would it be within the letter and spirit of the above collaboration policy? If yes, then it’s probably fine. If not, then you should avoid it. This means that, for example, code completion tools are off the table: this is analogous to a human standing over your shoulder and telling you what to type.

If you have any questions about what is and is not allowed under the academic honesty policies for this course, please talk to me.

Changes to the Syllabus

This syllabus is subject to change. Changes, if any, will be announced in class or online. Students will be held responsible for all changes.

University Policies

All University-wide policies apply to this course, including those outlined at http://syllabi.wwu.edu. These policies cover issues including: