Syllabus

Welcome to Intro to Big Data Systems! We'll deploy and use distributed systems to store and analyze large datasets. Unstructured and structured approaches to storage will be covered. Analysis will involve learning new query languages, processing streaming data, and training machine learning models. Systems covered include Docker, PyTorch, HDFS, Spark, Cassandra, Kafka, and more.

Revisions to Syllabus

Learning Objectives

Lecture

We meet 3 times a week -- see the lecture schedule here.

I'll ask questions during lecture via TopHat. Though in-person attendance is not required, you can earn extra credit by answering these correctly. Answering TopHat questions remotely is not permitted.

Readings

We'll be learning about many different big data systems, and so no textbook closely corresponds to the lecture content. Thus, attending lectures and taking notes will be your primary resource.

We will have recommended (though optional) readings for many systems, however. We'll select from O'Reilly textbooks because you can read them free online via the Madison Public Library. You just need to do the following:

  1. get a library card (free)
  2. sign into the O'Reilly collection with your card number
  3. search for the assigned book

Here are some of the main texts we'll reference this semester:

Communication

We message the class regularly via Canvas announcements. We recommend setting the "Announcement" option in your Canvas settings to "Notify immediately" so that you don't miss something important.

See the help page for details about how to contact us.

We have various forms for you to leave (optionally anonymous) feedback, report lab attendance, and thank TAs.

Course Components

There will be an opportunity to earn 104 points during the semester (100 regular points and 4 extra credit points).

Point distribution

Grade thresholds will be applied to points (not percents!) as follows:

The extra credit opportunities will add up to more than 4 points, but nobody can earn more than 4 extra credit points in the course. One TopHat point will be worth 0.2 course points in the extra credit category.
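As a rough illustration of the cap described above, here is a minimal Python sketch. It is not the official grade calculation; the function and constant names are made up for this example, and it assumes TopHat is the only source of extra credit being counted.

```python
# Hypothetical names; the values 0.2 and 4 come from the syllabus text.
COURSE_POINTS_PER_TOPHAT_POINT = 0.2
EXTRA_CREDIT_CAP = 4.0


def extra_credit(tophat_points):
    """Convert TopHat points to course extra credit, capped at 4 points."""
    raw = tophat_points * COURSE_POINTS_PER_TOPHAT_POINT
    return min(raw, EXTRA_CREDIT_CAP)


if __name__ == "__main__":
    for earned in (5, 20, 30):
        print(earned, "TopHat points ->", extra_credit(earned), "extra credit points")
```

Under these assumptions, anything beyond 20 TopHat points no longer changes the result, since the cap has already been reached.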

Exams and In-Person Quizzes

These will be taken in person and will be mostly multiple choice. Exams 1 and 2 will be in the evening, and exam 3 will be during finals week. The in-person quizzes will be 30 minutes, during lecture time. All exams/quizzes are cumulative.

There will be alternate options for exams, but not for in-person quizzes. If you must miss an in-person quiz (e.g., due to illness), and the instructor approves this, the other in-person quiz and the exams will receive greater weight, without scaling. For example, a quiz is worth 10 of the 65 exam and in-person quiz points. So if you miss a quiz, the other quiz will be worth 10 * (65 / 55) points and each exam will be worth 15 * (65 / 55) points. If you can make it to neither an exam nor its alternate, it will similarly be reweighted (with instructor approval).
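The reweighting arithmetic in the example above can be sketched in Python. This is only an illustration, not the official grade calculation: the component names and the breakdown into two 10-point quizzes and three 15-point exams are assumptions inferred from the 65-point total, and the function name is hypothetical.

```python
# Assumed breakdown (inferred from the 65-point total in the example):
# two in-person quizzes at 10 points each, three exams at 15 points each.
COMPONENTS = {
    "quiz1": 10,
    "quiz2": 10,
    "exam1": 15,
    "exam2": 15,
    "exam3": 15,
}


def reweight(components, missed):
    """Spread the points of one missed component over the remaining ones.

    Each remaining component is multiplied by total / (total - missed points),
    so the remaining components still add up to the original total.
    """
    total = sum(components.values())
    remaining = total - components[missed]
    factor = total / remaining
    return {
        name: points * factor
        for name, points in components.items()
        if name != missed
    }


if __name__ == "__main__":
    new_weights = reweight(COMPONENTS, "quiz1")
    for name, points in new_weights.items():
        print(f"{name}: {points:.2f}")
    print(f"total: {sum(new_weights.values()):.2f}")
```

With one quiz missed, the factor is 65 / 55, so the other quiz is worth about 11.82 points and each exam about 17.73 points, which together still sum to 65.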

If you take an exam or quiz, you cannot drop it and reweight after the fact.

Exam 3 cannot be skipped (only rescheduled to a later date, if necessary).

Quizzes

We'll have occasional online, multiple-choice Canvas quizzes. There is no time limit, and they are open book/notes/AI. You may do them as a group with other CS 544 students if you like (though each student must submit separately).

Projects

There will be 8 substantial programming projects. AI policy will vary by project, to expose you to different tools (check the specific policy for each project). Projects can optionally be done with a single partner.

Academic Misconduct

Project Policies

Be sure to read and understand the full project collaboration policies here.

TopHat Policies

TopHat questions are intended for in-class participants. Students who submit any TopHat answer remotely are not eligible for any extra credit for the course. We might check for this by passing around a sign-up sheet following a TopHat question.

Piazza Policies

Do not post project code snippets that are >5 lines long.

Exam and In-Person Quiz Policies

Online Quiz Policies

Allowed
NOT allowed

Recommendation Letters

Earning a recommendation letter is much harder than earning an A in this course. At a minimum, I'll want to see you doing something complex and interesting beyond the assignments. For a typical letter, I'll have collaborated with a student on some project for multiple months, with many iterations of feedback.

Most grad schools require recommenders to fill out long forms rating students on various abilities (see an example below). Make sure that if you're asking me, I would be able to fill out such a form without needing to put "I don't know" as my answer to many of the questions.