Course Schedule
Part 1: Resources and Deployment
Week 1
Mon, Sep 4
Labor Day
Week 2
Mon, Sep 11
Deployment (Linux Pipelines)
Read: Designing Data-Intensive Applications, Kleppmann ("Batch Processing with Unix Tools" section of Chapter 10, "Batch Processing")
Watch: Lecture
Slides: PDF
Wed, Sep 13
Deployment (Docker)
Released: P1 (Docker)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Quiz: week 1
Fri, Sep 15
Compute Resources (PyTorch Basics)
Read: Machine Learning with PyTorch and Scikit-Learn ("PyTorch's computation graphs", "PyTorch tensor objects for storing and updating model parameters", and "Computing gradients via automatic differentiation" sections of chapter 13, "Going Deeper - Mechanics of PyTorch")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 3
Mon, Sep 18
Compute Resources (PyTorch Optimization)
Read: Machine Learning with PyTorch and Scikit-Learn ("Building input pipelines in PyTorch" and "Building an NN model in PyTorch" sections of chapter 12, "Parallelizing Neural Network Training with PyTorch")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Sep 20
Compute Resources (Threads)
Read: Fluent Python, 2nd Edition ("What's New in This Chapter" through "A Bit of Jargon" in chapter 19, "Concurrency Models in Python")
Due: P1
Released: P2 (PyTorch, COVID)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Quiz: week 2 and before (cumulative)
Fri, Sep 22
Compute Resources (Locks)
Read: Mastering Concurrency in Python ("Working With Threads In Python" chapter)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Week 4
Mon, Sep 25
Memory Resources (Caching)
Read: Systems Performance, Gregg (6.2.2; "CPU Caches" and "Latency" subsections of 6.4.1)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Fri, Sep 29
Memory Resources (PyArrow)
Read: Gallery of Processor Cache Effects (Examples 1 and 2)
Due: P2
Released: P3 (Threads+Caching+gRPC, Model Serving)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 5
Mon, Oct 2
Network Resources (Overview)
Read: Designing Data-Intensive Applications, Kleppmann (Chapter 4, "Encoding and Evolution")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Oct 4
Network Resources (gRPC+Compose)
Read: gRPC Basics Tutorial
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 4 and before (cumulative)
Week 6
Mon, Oct 9
Storage Resources (Formats and DBs)
Read: Designing Data-Intensive Applications, Kleppmann ("Transaction Processing or Analytics?" and "Column-Oriented Storage" sections of Chapter 3, "Storage and Retrieval")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Oct 11
SQL Databases (MySQL)
Read: MySQL Crash Course, Silva (Chapters 3+5); Designing Data-Intensive Applications, Kleppmann ("The Meaning of ACID" section in Chapter 7, "Transactions")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 5 and before (cumulative)
Part 2: Clusters and Hadoop Ecosystem
Week 7
Mon, Oct 16
HDFS Practice
Read: Mastering Hadoop 3, Singh et al. ("Deep Dive Into the Hadoop Distributed File System" chapter)
Released: P4 (HDFS, Loans)
Due: P3
Watch: Lecture
Anki Flashcards: Deck
Wed, Oct 18
MapReduce
Read: Learning Spark, 2nd edition by Damji et al. (sections "The Importance of an Optimal Storage Solution", "Databases", and "Data Lakes" of chapter 9, "Building Reliable Data Lakes with Apache Spark")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 6 and before (cumulative)
Week 8
Wed, Oct 25
Spark DataFrames
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 4, "Spark SQL and DataFrames: Introduction to Built-in Data Sources")
Watch: Lecture
Anki Flashcards: Deck
Fri, Oct 27
Spark SQL
Read: Designing Data-Intensive Applications, Kleppmann ("Reduce-Side Joins and Grouping" section of Chapter 10, "Batch Processing")
Due: P4
Released: P5 (Spark, Loans)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 9
Mon, Oct 30
Spark Internals and Performance
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 7, "Optimizing and Tuning Spark Applications")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Nov 1
Spark Machine Learning
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 10, "Machine Learning with MLlib")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 8 and before (cumulative)
Fri, Nov 3
Wide Tables: HBase and Cassandra
Read: Cassandra, The Definitive Guide, by Carpenter et al. (Chapter 4, "The Cassandra Query Language")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 10
Wed, Nov 8
Cassandra Partitioning
Read: Cassandra, The Definitive Guide, by Carpenter et al. (sections "Data Centers and Racks" to "Hinted Handoff" of Chapter 6, "The Cassandra Architecture")
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Quiz: week 9 and before (cumulative)
Fri, Nov 10
Cassandra Replication
Due: P5
Released: P6 (Cassandra, Weather)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 11
Mon, Nov 13
Streaming: Kafka Concepts
Read: Kafka, The Definitive Guide, 2nd edition by Shapira et al. ("Enter Kafka" section of Chapter 1, "Meet Kafka")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Nov 15
Streaming: Kafka Demos
Watch: Lecture
Anki Flashcards: Deck
Quiz: week 10 and before (cumulative)
Fri, Nov 17
Streaming: Kafka Reliability
Read: Kafka, The Definitive Guide, 2nd edition by Shapira et al. (Chapter 7, "Reliable Data Delivery")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 12
Mon, Nov 20
Streaming: Spark Programming
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 8, "Structured Streaming")
Watch: Lecture
Anki Flashcards: Deck
Wed, Nov 22
Streaming: Spark Concepts
Due: P6
Released: P7 (Kafka, Weather Stations)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 11 and before (cumulative)
Fri, Nov 24
Thanksgiving Break
Part 3: The Cloud
Week 13
Wed, Nov 29
BigQuery 1
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 12 and before (cumulative)
Fri, Dec 1
BigQuery 2
Read: Google BigQuery: The Definitive Guide, by Lakshmanan et al. ("BigQuery Geographic Information Systems" section of Chapter 8, "Advanced Queries")
Watch: Lecture
Anki Flashcards: Deck
Week 14
Mon, Dec 4
Cancelled
Please use the extra time to work on P7.
Wed, Dec 6
BigQuery 3
Read: Google BigQuery: The Definitive Guide, by Lakshmanan et al. (Chapter 9, "Machine Learning in BigQuery")
Due: P7
Released: P8 (BigQuery, Loans)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 13 and before (cumulative)
Week 15