Syllabus

Welcome to Intro to Big Data Systems! We'll deploy and use distributed systems to store and analyze large datasets. Unstructured and structured approaches to storage will be covered. Analysis will involve learning new query languages, processing streaming data, and training machine learning models. Systems covered include Docker, PyTorch, HDFS, Spark, Cassandra, Kafka, and more.

Revisions to Syllabus

Learning Objectives

Lecture

We meet 3 times a week -- see the lecture schedule here.

I'll ask questions during lecture via TopHat. Though in-person attendance is not required, you can earn extra credit by answering these correctly. Answering TopHat questions remotely is not permitted.

Readings

We'll be learning about many different big data systems, so no single textbook closely corresponds to the lecture content. Your primary resource will therefore be the lectures and the notes you take in them.

We will, however, have recommended (though optional) readings for many systems. We'll select from O'Reilly textbooks because you can read them free online via the Madison Public Library. You just need to do the following:

  1. get a library card (free)
  2. sign into the O'Reilly collection with your card number
  3. search for the assigned book

Here are some of the main texts we'll reference this semester:

Communication

We message the class regularly via Canvas announcements. We recommend updating your Canvas settings so that the "Announcement" option is "Notify immediately" so that you don't miss something important.

See the help page for details about how to contact us.

Email guidelines: When working with a project partner and emailing course staff, one student should send the email and CC their partner. If you email a TA and don't hear back within 2 business days, you may CC the instructor on the same thread to escalate. If you need to contact both the instructor and a TA about the same topic, include both on one email thread rather than starting separate threads.

We have various forms for you to leave (optionally anonymous) feedback and thank TAs.

Course Components

There will be an opportunity to earn 104 points during the semester (100 regular points and 4 extra credit points).

Point distribution

Grade thresholds will be applied to points (not percents!) as follows:

Exams

There will be 3 in-class midterms and a final exam during finals week. All exams are cumulative. They will usually be a mix of multiple-choice questions and other formats (e.g., short answer, or illustrating how a system works). All exams are closed book/notes.

The final exam will be Tuesday, May 5, 2026, 5:05-7:05 PM.

Missed Midterm Policy: You may miss up to 2 midterms. The first miss is no questions asked, and the second time you must provide credible medical evidence or proof of travel to your instructor. If you are not able to take any midterms, other options such as medical withdrawal should be considered. You must communicate midterm misses via the miss form as soon as possible, and no later than the day of the midterm.

At the end of the semester, the grade percent for each of your validly-missed midterms will be replaced with your percent on the final. This is basically equivalent to putting more weight on the final. The final exam cannot be skipped or taken before the regular time. It may only be rescheduled to a later date, if necessary.
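As an illustration only, the replacement rule above can be sketched in a few lines of Python. The function name and the use of `None` to mark a validly-missed midterm are assumptions made for this sketch, not part of any official grading script.

```python
def effective_midterm_percents(midterm_percents, final_percent):
    """Return midterm percents with validly-missed midterms replaced.

    midterm_percents: list of percents (0-100), with None marking a
        validly-missed midterm (a convention assumed for this sketch).
    final_percent: the student's percent on the final exam.
    """
    return [final_percent if p is None else p for p in midterm_percents]


# Example: the second midterm was validly missed, so it takes the final's percent.
print(effective_midterm_percents([80.0, None, 90.0], 70.0))
```

Because the missed midterm simply inherits the final's percent, the final effectively counts for its own weight plus the weight of each missed midterm.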

Online Policy Quiz

There is an online quiz covering course policies. You may repeat this quiz as many times as you like. If the instructor believes it is necessary for a student to review course policies, the instructor may clear their score, requiring them to take it again.

Projects

There will be 8 substantial programming projects. These can optionally be done with a single partner. AI policy will vary by project, to expose you to different tools (check the specific policy for each project).

Hand-in Worksheets

We will do a variety of non-graded worksheets in class. You do not need to hand these in. We will also have some that you must hand in (we will share more details about which ones and when they are due).

Hand-in worksheets must be submitted in person at lecture, either by you or a trusted friend, on the due date or the following lecture. They cannot be submitted at office hours or online.

You may lose points on worksheets for: illegible name or writing, using staples/paperclips/glue, wrong answers, or paper that is crumpled or torn (these can jam the scanner).

Skill Demos

Twice during the semester, you will need to schedule a time to demonstrate certain skills on a virtual machine to a TA. We will communicate more details about this.

Extra Credit

The extra credit opportunities will add up to more than 4, but nobody can earn more than 4 extra credit points in the course. One TopHat point will be worth 0.2 course points in the extra credit category. You may only do TopHats in your registered section. There will be enough TopHat opportunities that you can miss many and still max out, providing flexibility for illness or technical issues. When there are technical issues, please do not ask the instructor to re-open the question. With hundreds of students, and 5-10% having technical issues in any given lecture, this is not a scalable form of flexibility, unfortunately.
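The conversion and cap described above amount to a small piece of arithmetic. The sketch below is only an illustration: the function and constant names are hypothetical, and it covers TopHat points alone.

```python
EXTRA_CREDIT_CAP = 4.0     # syllabus: nobody can earn more than 4 extra credit points
POINTS_PER_TOPHAT = 0.2    # syllabus: one TopHat point is worth 0.2 course points


def extra_credit_points(tophat_points):
    """Convert TopHat points to course extra credit, capped at the maximum."""
    return min(tophat_points * POINTS_PER_TOPHAT, EXTRA_CREDIT_CAP)


# Once the cap is reached, additional TopHat points no longer change the result.
for earned in (5, 20, 30):
    print(earned, extra_credit_points(earned))
```

Under these numbers, 20 TopHat points are enough to reach the cap, which is why missing some questions does not prevent you from maxing out.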

Academic Misconduct

Project Policies

Be sure to read and understand the full project collaboration policies here.

TopHat Policies

TopHat questions are intended for in-class participants. Students who submit any TopHat question remotely are not eligible for any extra credit for the course. We might notice this by passing around a sign-up sheet following a TopHat question. AI use is NOT permitted on TopHats. Comparing answers with other students in the same section IS allowed.

Piazza Policies

Do not post project code snippets that are >5 lines long.

Exam Policies

Worksheets

You may discuss the worksheets with other students IN PERSON. If you do, you must still submit individually. For example, you can meet and discuss answers. You MAY NOT just send answers via photo or otherwise, saying "here is what I did" or similar. You MAY NOT use AI on worksheets.

Recommendation Letters

Earning a recommendation letter is much harder than earning an A in this course. At a minimum, I'll want to see you doing something complex and interesting beyond the assignments. For a typical letter, I'll have collaborated with a student on some project for multiple months, with many iterations of feedback.

Most grad schools require recommenders to fill out long forms rating students on various abilities (see an example below). If you're asking me, make sure I would be able to fill out such a form without needing to put "I don't know" as my answer to many of the questions.