Course Schedule
Part 1: Resources and Deployment
Week 1
Mon, Sep 4
Labor Day
Week 2
Mon, Sep 11
Deployment (Linux Pipelines)
Read: Designing Data-Intensive Applications, Kleppmann ("Batch Processing with Unix Tools" section of Chapter 10, "Batch Processing")
Watch: Lecture
Slides: PDF
Wed, Sep 13
Deployment (Docker)
Released: P1 (Docker)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Quiz: week 1
Fri, Sep 15
Compute Resources (PyTorch Basics)
Read: Machine Learning with PyTorch and Scikit-Learn ("PyTorch's computation graphs", "PyTorch tensor objects for storing and updating model parameters", and "Computing gradients via automatic differentiation" sections of Chapter 13, "Going Deeper - Mechanics of PyTorch")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 3
Mon, Sep 18
Compute Resources (PyTorch Optimization)
Read: Machine Learning with PyTorch and Scikit-Learn ("Building input pipelines in PyTorch" and "Building an NN model in PyTorch" sections of Chapter 12, "Parallelizing Neural Network Training with PyTorch")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Sep 20
Compute Resources (Threads)
Read: Fluent Python, 2nd Edition ("What's New in This Chapter" through "A Bit of Jargon" in Chapter 19, "Concurrency Models in Python")
Due: P1
Released: P2 (PyTorch, COVID)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Quiz: week 2 and before (cumulative)
Fri, Sep 22
Compute Resources (Locks)
Read: Mastering Concurrency in Python ("Working With Threads In Python" chapter)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Week 4
Mon, Sep 25
Memory Resources (Caching)
Read: Systems Performance, Gregg (6.2.2; "CPU Caches" and "Latency" subsections of 6.4.1)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Fri, Sep 29
Memory Resources (PyArrow)
Read: Gallery of Processor Cache Effects (Examples 1 and 2)
Due: P2
Released: P3 (Threads+Caching+gRPC, Model Serving)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 5
Mon, Oct 2
Network Resources (Overview)
Read: Designing Data-Intensive Applications, Kleppmann (Chapter 4, "Encoding and Evolution")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Oct 4
Network Resources (gRPC+Compose)
Read: gRPC Basics Tutorial
Slides: PDF
Anki Flashcards: Deck
Quiz: week 4 and before (cumulative)
Part 2: Clusters and Hadoop Ecosystem
Week 6
Mon, Oct 9
Storage Resources (Formats and DBs)
Read: Designing Data-Intensive Applications, Kleppmann ("Transaction Processing or Analytics?" and "Column-Oriented Storage" sections of Chapter 3, "Storage and Retrieval")
Old Slides: PDF
Wed, Oct 11
SQL Databases (MySQL)
Read: MySQL Crash Course, Silva (Chapters 3+5); Designing Data-Intensive Applications, Kleppmann ("The Meaning of ACID" section in Chapter 7, "Transactions")
Old Slides: PDF
Quiz: week 5 and before (cumulative)
Week 7
Mon, Oct 16
HDFS
Read: Mastering Hadoop 3, Singh et al. ("Deep Dive Into the Hadoop Distributed File System" chapter)
Released: P4 (HDFS, Loans)
Wed, Oct 18
MapReduce and Spark
Read: Learning Spark, 2nd edition by Damji et al. (sections "The Importance of an Optimal Storage Solution", "Databases", and "Data Lakes" of Chapter 9, "Building Reliable Data Lakes with Apache Spark")
Watch: PDF
Quiz: week 6 and before (cumulative)
Fri, Oct 20
Spark (Structured APIs)
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 4, "Spark SQL and DataFrames: Introduction to Built-in Data Sources")
Week 8
Mon, Oct 23
Midterm (in class)
Wed, Oct 25
Spark Grouping and Joining
Read: Designing Data-Intensive Applications, Kleppmann ("Reduce-Side Joins and Grouping" section of Chapter 10, "Batch Processing")
Old Slides: PDF
Fri, Oct 27
Spark Internals and Performance 1
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 7, "Optimizing and Tuning Spark Applications")
Due: P4
Released: P5 (Spark, Loans)
Old Slides: PDF
Week 9
Mon, Oct 30
Spark Internals and Performance 2
Wed, Nov 1
Spark Machine Learning
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 10, "Machine Learning with MLlib")
Old Slides: PDF
Quiz: week 8 and before (cumulative)
Fri, Nov 3
Wide Tables: HBase and Cassandra
Read: Cassandra, The Definitive Guide, by Carpenter et al. (Chapter 4, "The Cassandra Query Language")
Old Slides: PDF
Week 10
Mon, Nov 6
Cassandra Query Language (CQL)
Wed, Nov 8
Cassandra Partitioning+Replication
Read: Cassandra, The Definitive Guide, by Carpenter et al. (sections "Data Centers and Racks" to "Hinted Handoff" of Chapter 6, "The Cassandra Architecture")
Old Slides: PDF
Old Worksheet: PDF
Quiz: week 9 and before (cumulative)
Week 11
Mon, Nov 13
Cassandra Storage Engine
Read: Cassandra, The Definitive Guide, by Carpenter et al. (sections "Memtables, SSTables, and Commit Logs" to "Summary" of Chapter 6, "The Cassandra Architecture")
Old Slides: PDF
Wed, Nov 15
Streaming: Kafka Concepts
Read: Kafka, The Definitive Guide, 2nd edition by Shapira et al. ("Enter Kafka" section of Chapter 1, "Meet Kafka")
Old Slides: PDF
Old Worksheet: PDF
Quiz: week 10 and before (cumulative)
Fri, Nov 17
Streaming: Kafka Demos
Week 12
Mon, Nov 20
Streaming: Kafka Reliability
Read: Kafka, The Definitive Guide, 2nd edition by Shapira et al. (Chapter 7, "Reliable Data Delivery")
Old Slides: PDF
Wed, Nov 22
Streaming: Spark Programming
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 8, "Structured Streaming")
Due: P6
Released: P7 (Kafka, Weather Stations)
Quiz: week 11 and before (cumulative)
Fri, Nov 24
Thanksgiving Break
Week 13
Fri, Dec 1
BigQuery 1
Part 3: The Cloud
Week 14
Mon, Dec 4
BigQuery 2
Read: Google BigQuery: The Definitive Guide, by Lakshmanan et al. ("BigQuery Geographic Information Systems" section of Chapter 8, "Advanced Queries")
Old Slides: PDF
Wed, Dec 6
BigQuery 3
Read: Google BigQuery: The Definitive Guide, by Lakshmanan et al. (Chapter 9, "Machine Learning in BigQuery")
Due: P7
Released: P8 (BigQuery, Loans)
Old Slides: PDF
Quiz: week 13 and before (cumulative)
Fri, Dec 8
BigQuery 4
Week 15
Wed, Dec 13
Review
Due: P8