Course Schedule
Part 1: Resources and Deployment
Week 1
Mon, Jan 23
No Class
Week 2
Mon, Jan 30
Deployment (Linux Pipelines)
Read: Designing Data-Intensive Applications, Kleppmann ("Batch Processing with Unix Tools" section of Chapter 10)
Watch: Lecture
Wed, Feb 1
Memory Resources (Caching)
Read: Systems Performance, Gregg (6.2.2; "CPU Caches" and "Latency" subsections of 6.4.1)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Quiz: week 1
Fri, Feb 3
Compute Resources (PyTorch Numbers)
Released: P1 (PyTorch, COVID)
Read: Machine Learning with PyTorch and Scikit-Learn ("PyTorch's computation graphs", "PyTorch tensor objects for storing and updating model parameters", and "Computing gradients via automatic differentiation" sections of chapter 13, "Going Deeper - Mechanics of PyTorch")
Watch: Lecture
Slides: PDF
Week 3
Wed, Feb 8
Compute Resources (PyTorch ML)
Read: Machine Learning with PyTorch and Scikit-Learn ("Building input pipelines in PyTorch" and "Building an NN model in PyTorch" sections of chapter 12, "Parallelizing Neural Network Training with PyTorch")
Watch: Lecture
Slides: PDF
Quiz: week 2 and before (cumulative)
Fri, Feb 10
Compute Resources (Threads)
Read: Fluent Python, 2nd Edition ("What's New in This Chapter" through "A Bit of Jargon" in chapter 19, "Concurrency Models in Python")
Watch: Lecture
Slides: PDF
Worksheet: PDF
Week 4
Mon, Feb 13
Compute Resources (Locks)
Read: Mastering Concurrency in Python ("Working With Threads In Python" chapter)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Wed, Feb 15
Network Resources (RPC)
Due: P1
Released: P2 (gRPC, K/V Store)
Read: Designing Data-Intensive Applications, Kleppmann (Chapter 4, "Encoding and Evolution")
Watch: Lecture
Slides: PDF
Quiz: week 3 and before (cumulative)
Week 5
Wed, Feb 22
Storage Resources (File Systems)
Watch: Lecture
Slides: PDF
Quiz: week 4 and before (cumulative)
Fri, Feb 24
Storage Resources (Formats and DBs)
Read: Designing Data-Intensive Applications, Kleppmann ("Transaction Processing or Analytics?" and "Column-Oriented Storage" sections of Chapter 3, "Storage and Retrieval")
Watch: Lecture
Slides: PDF
Part 2: Clusters and Hadoop Ecosystem
Week 6
Mon, Feb 27
SQL Databases (MySQL)
Read: MySQL Crash Course, Silva (Chapters 3+5), Designing Data-Intensive Applications, Kleppmann ("The Meaning of ACID" section in Chapter 7, "Transactions")
Watch: Lecture
Slides: PDF
Wed, Mar 1
Hadoop Ecosystem
Due: P2
Released: P3 (HDFS, Loans)
Watch: Lecture
Slides: PDF
Quiz: week 5 and before (cumulative)
Fri, Mar 3
HDFS
Read: Mastering Hadoop 3, Singh et al. ("Deep Dive Into the Hadoop Distributed File System" chapter)
Watch: Part 1, Part 2
Week 7
Mon, Mar 6
MapReduce and Spark
Read: Learning Spark, 2nd edition by Damji et al. (sections "The Importance of an Optimal Storage Solution", "Databases", and "Data Lakes" of chapter 9, "Building Reliable Data Lakes with Apache Spark")
Watch: Part 1, Part 2, Part 3
Slides: PDF
Wed, Mar 8
Spark (Structured APIs)
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 4, "Spark SQL and DataFrames: Introduction to Built-in Data Sources")
Watch: Lecture
Quiz: week 6 and before (cumulative)
Week 8
Mon, Mar 13
Spring Break
Wed, Mar 15
Spring Break
Fri, Mar 17
Spring Break
Week 9
Mon, Mar 20
Midterm (in class)
Due: P3
Wed, Mar 22
Spark Grouping and Joining
Read: Designing Data-Intensive Applications, Kleppmann ("Reduce-Side Joins and Grouping" section of Chapter 10, "Batch Processing")
Watch: Lecture
Slides: PDF
Fri, Mar 24
Spark Internals and Performance 1
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 7, "Optimizing and Tuning Spark Applications")
Watch: Lecture
Slides: PDF
Week 10
Wed, Mar 29
Wide Tables: HBase and Cassandra
Read: Cassandra, The Definitive Guide, by Carpenter et al. (Chapter 4, "The Cassandra Query Language")
Watch: Lecture
Slides: PDF
Quiz: week 9 and before (cumulative)
Week 11
Mon, Apr 3
Cassandra Partitioning+Replication
Read: Cassandra, The Definitive Guide, by Carpenter et al. (sections "Data Centers and Racks" to "Hinted Handoff" of Chapter 6, "The Cassandra Architecture")
Due: P4
Released: P5 (Cassandra, Weather)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Fri, Apr 7
Cassandra Storage Engine
Read: Cassandra, The Definitive Guide, by Carpenter et al. (sections "Memtables, SSTables, and Commit Logs" to "Summary" of Chapter 6, "The Cassandra Architecture")
Watch: Lecture
Slides: PDF
Week 12
Mon, Apr 10
Streaming: Kafka Concepts
Read: Kafka, The Definitive Guide, 2nd edition by Shapira et al. ("Enter Kafka" section of Chapter 1, "Meet Kafka")
Watch: Lecture
Slides: PDF
Worksheet: PDF
Fri, Apr 14
Streaming: Kafka Reliability
Due: P5
Released: P6 (Streaming+MLlib, Weather)
Read: Kafka, The Definitive Guide, 2nd edition by Shapira et al. (Chapter 7, "Reliable Data Delivery")
Watch: Lecture
Slides: PDF
Week 13
Mon, Apr 17
Streaming: Spark Programming
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 8, "Structured Streaming")
Watch: Lecture
Wed, Apr 19
Streaming: Spark Concepts
Watch: Lecture
Slides: PDF
Quiz: week 12 and before (cumulative)
Fri, Apr 21
Spark Machine Learning
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 10, "Machine Learning with MLlib")
Watch: Lecture
Slides: PDF
Part 3: The Cloud
Week 14
Wed, Apr 26
BigQuery 1
Released: P7 (BigQuery, Optional)
Watch: Lecture
Quiz: week 13 and before (cumulative)
Fri, Apr 28
BigQuery 2
Due: P6
Watch: Lecture
Read: Google BigQuery: The Definitive Guide, by Lakshmanan et al. ("BigQuery Geographic Information Systems" section of Chapter 8, "Advanced Queries")
Slides: PDF
Week 15
Mon, May 1
BigQuery 3
Read: Google BigQuery: The Definitive Guide, by Lakshmanan et al. (Chapter 9, "Machine Learning in BigQuery")
Watch: Lecture
Slides: PDF