Course Schedule
Part 1: Resources
Week 1
Mon, Sep 1
Labor Day!
Week 2
Mon, Sep 8
Deployment (Linux Pipelines)
Read: Designing Data-Intensive Applications, Kleppmann ("Batch Processing with Unix Tools" section of Chapter 10, "Batch Processing")
Watch: Lecture
Slides: PDF
Wed, Sep 10
Deployment (Docker)
Release: P1 (Docker)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Online Quiz: week 1
Fri, Sep 12
Network Resources (Overview)
Read: Designing Data-Intensive Applications, Kleppmann (Chapter 4, "Encoding and Evolution")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 3
Mon, Sep 15
Network Resources (gRPC)
Read: gRPC Basics Tutorial
- note: Tyler's recording didn't capture video, so sharing Meena's recording
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 4
Mon, Sep 22
Memory Resources (Caching)
Read: Systems Performance, Gregg (6.2.2; "CPU Caches" and "Latency" subsections of 6.4.1)
Due: P1
Release: P2 (Network+Memory)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 5
Mon, Sep 29
Memory Resources (PyArrow)
Read: Gallery of Processor Cache Effects (Examples 1 and 2)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Oct 1
Compute Resources (Threads)
Read: Fluent Python, 2nd Edition ("What's New in This Chapter" through "A Bit of Jargon" in Chapter 19, "Concurrency Models in Python")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Fri, Oct 3
Compute Resources (Locks)
Read: Mastering Concurrency in Python ("Working With Threads In Python" chapter)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 6
Mon, Oct 6
Storage Resources (File Systems)
Due: P2
Release: P3 (Compute+Storage)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Oct 8
Storage Resources (Formats and DBs)
In-Person Midterm (evening)
Read: Designing Data-Intensive Applications, Kleppmann ("Transaction Processing or Analytics?" and "Column-Oriented Storage" sections of Chapter 3, "Storage and Retrieval")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Fri, Oct 10
SQL Databases (MySQL)
Read: MySQL Crash Course, Silva (Chapters 3+5), Designing Data-Intensive Applications, Kleppmann ("The Meaning of ACID" section in Chapter 7, "Transactions")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Part 2: Clusters
Week 7
Wed, Oct 15
Hadoop
Read: Mastering Hadoop 3, Singh et al. ("Deep Dive Into the Hadoop Distributed File System" chapter)
Release: P4 (HDFS, Loans)
Due: P3
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Online Quiz: week 6 and before (cumulative)
Fri, Oct 17
MapReduce
Read: Learning Spark, 2nd edition by Damji et al. (sections "The Importance of an Optimal Storage Solution", "Databases", and "Data Lakes" of Chapter 9, "Building Reliable Data Lakes with Apache Spark")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 8
Wed, Oct 22
Spark DataFrames
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 4, "Spark SQL and DataFrames: Introduction to Built-in Data Sources")
Watch: Lecture
Anki Flashcards: Deck
Fri, Oct 24
Spark SQL
Read: Designing Data-Intensive Applications, Kleppmann ("Reduce-Side Joins and Grouping" section of Chapter 10, "Batch Processing")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 9
Mon, Oct 27
Spark Internals and Performance
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 7, "Optimizing and Tuning Spark Applications")
Due: P4
Release: P5 (Spark, Loans)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Oct 29
In-Person Quiz (in class)
Catchup+Quiz
Fri, Oct 31
Spark Machine Learning API
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 10, "Machine Learning with MLlib")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 10
Wed, Nov 5
Wide Tables: HBase and Cassandra
Read: Cassandra, The Definitive Guide, by Carpenter et al. (Chapter 4, "The Cassandra Query Language")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Online Quiz: week 9 and before (cumulative)
Fri, Nov 7
Cassandra Query Language (CQL)
Week 11
Mon, Nov 10
Cassandra Partitioning
Read: Cassandra, The Definitive Guide, by Carpenter et al. (sections "Data Centers and Racks" to "Hinted Handoff" of Chapter 6, "The Cassandra Architecture")
Due: P5
Release: P6 (Cassandra, Weather)
Wed, Nov 12
Cassandra Replication
Fri, Nov 14
Streaming: Kafka Concepts
Read: Kafka, The Definitive Guide, 2nd edition by Shapira et al. ("Enter Kafka" section of Chapter 1, "Meet Kafka")
Week 12
Mon, Nov 17
Streaming: Kafka Demos
Wed, Nov 19
In-Person Midterm (evening)
Review/Catchup
Fri, Nov 21
Streaming: Kafka Reliability
Read: Kafka, The Definitive Guide, 2nd edition by Shapira et al. (Chapter 7, "Reliable Data Delivery")
Due: P6
Release: P7 (Kafka, Weather Stations)
Week 13
Mon, Nov 24
Streaming: Spark Programming
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 8, "Structured Streaming")
Wed, Nov 26
Streaming: Spark Concepts
Fri, Nov 28
Thanksgiving Break
Part 3: Cloud
Week 14
Mon, Dec 1
The Cloud
Wed, Dec 3
BigQuery 1: Basics
Online Quiz: week 13 and before (cumulative)
Fri, Dec 5
BigQuery 2: Data Sources + Geo Data
Read: Google BigQuery: The Definitive Guide, by Lakshmanan et al. ("BigQuery Geographic Information Systems" section of Chapter 8, "Advanced Queries")
Due: P7
Release: P8 (Cloud Services)
Week 15
Mon, Dec 8
BigQuery 3: Machine Learning
Read: Google BigQuery: The Definitive Guide, by Lakshmanan et al. (Chapter 9, "Machine Learning in BigQuery")
Wed, Dec 10
BigQuery 4: Cost
Fri, Dec 12
No Class
Due: P8