Course Schedule
Part 1: Resources
Week 1
Mon, Jan 20
Martin Luther King Day!
Week 2
Mon, Jan 27
Deployment (Linux Pipelines)
Read: Designing Data-Intensive Applications, Kleppmann ("Batch Processing with Unix Tools" section of Chapter 10, "Batch Processing")
Watch: Lecture
Slides: PDF
Wed, Jan 29
Deployment (Docker)
Release: P1 (Docker)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 1
Fri, Jan 31
Network Resources (Overview)
Read: Designing Data-Intensive Applications, Kleppmann (Chapter 4, "Encoding and Evolution")
Watch: Part 1: Dockerfiles
Watch: Part 2: Networking Intro
Watch: Part 3: Web Server Demo
Watch: Part 4: gRPC
Slides: PDF
Anki Flashcards: Deck
Week 3
Wed, Feb 5
Network Resources (Compose)
Quiz: week 2 and before (cumulative)
Fri, Feb 7
Memory Resources (Caching)
Read: Systems Performance, Gregg (6.2.2; "CPU Caches" and "Latency" subsections of 6.4.1)
Week 4
Wed, Feb 12
Memory Resources (PyArrow)
Read: Gallery of Processor Cache Effects (Examples 1 and 2)
Quiz: week 3 and before (cumulative)
Fri, Feb 14
Compute Resources (Threads)
Read: Fluent Python, 2nd Edition ("What's New in This Chapter" through "A Bit of Jargon" in Chapter 19, "Concurrency Models in Python")
Week 5
Mon, Feb 17
Compute Resources (Locks)
Read: Mastering Concurrency in Python ("Working With Threads In Python" chapter)
Wed, Feb 19
Storage Resources (File Systems)
Quiz: week 4 and before (cumulative)
Fri, Feb 21
Storage Resources (Formats and DBs)
Read: Designing Data-Intensive Applications, Kleppmann ("Transaction Processing or Analytics?" and "Column-Oriented Storage" sections of Chapter 3, "Storage and Retrieval")
Due: P2
Release: P3 (Compute+Storage)
Week 6
Mon, Feb 24
SQL Databases (MySQL)
Read: MySQL Crash Course, Silva (Chapters 3 and 5); Designing Data-Intensive Applications, Kleppmann ("The Meaning of ACID" section of Chapter 7, "Transactions")
Wed, Feb 26
Midterm (in class)
Fri, Feb 28
HDFS Overview
Part 2: Clusters
Week 7
Mon, Mar 3
HDFS Practice
Read: Mastering Hadoop 3, Singh et al. ("Deep Dive Into the Hadoop Distributed File System" chapter)
Release: P4 (HDFS, Loans)
Due: P3
Wed, Mar 5
MapReduce
Read: Learning Spark, 2nd edition by Damji et al. (sections "The Importance of an Optimal Storage Solution", "Databases", and "Data Lakes" of Chapter 9, "Building Reliable Data Lakes with Apache Spark")
Quiz: week 6 and before (cumulative)
Fri, Mar 7
Spark RDDs
Week 8
Mon, Mar 10
Spark DataFrames
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 4, "Spark SQL and DataFrames: Introduction to Built-in Data Sources")
Wed, Mar 12
Spark SQL
Read: Designing Data-Intensive Applications, Kleppmann ("Reduce-Side Joins and Grouping" section of Chapter 10, "Batch Processing")
Quiz: week 7 and before (cumulative)
Fri, Mar 14
Spark Internals and Performance 1
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 7, "Optimizing and Tuning Spark Applications")
Due: P4
Release: P5 (Spark, Loans)
Week 9
Mon, Mar 17
Spark Internals and Performance 2
Wed, Mar 19
Spark Machine Learning
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 10, "Machine Learning with MLlib")
Quiz: week 8 and before (cumulative)
Fri, Mar 21
Wide Tables: HBase and Cassandra
Read: Cassandra, The Definitive Guide, by Carpenter et al. (Chapter 4, "The Cassandra Query Language")
Week 10
Mon, Mar 24
Spring Break
Wed, Mar 26
Spring Break
Fri, Mar 28
Spring Break
Week 11
Mon, Mar 31
Cassandra Query Language (CQL)
Wed, Apr 2
Cassandra Partitioning
Read: Cassandra, The Definitive Guide, by Carpenter et al. (sections "Data Centers and Racks" to "Hinted Handoff" of Chapter 6, "The Cassandra Architecture")
Due: P5
Release: P6 (Cassandra, Weather)
Quiz: week 10 and before (cumulative)
Fri, Apr 4
Cassandra Replication
Week 12
Mon, Apr 7
Midterm (in class)
Wed, Apr 9
Streaming: Kafka Concepts
Read: Kafka, The Definitive Guide, 2nd edition by Shapira et al. ("Enter Kafka" section of Chapter 1, "Meet Kafka")
Fri, Apr 11
Streaming: Kafka Demos
Part 3: Cloud
Week 13
Mon, Apr 14
Streaming: Kafka Reliability
Read: Kafka, The Definitive Guide, 2nd edition by Shapira et al. (Chapter 7, "Reliable Data Delivery")
Due: P6
Release: P7 (Kafka, Weather Stations)
Wed, Apr 16
Streaming: Spark Programming
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 8, "Structured Streaming")
Quiz: week 12 and before (cumulative)
Fri, Apr 18
Streaming: Spark Concepts
Week 14
Mon, Apr 21
The Cloud
Wed, Apr 23
BigQuery 1
Quiz: week 13 and before (cumulative)
Fri, Apr 25
BigQuery 2
Read: Google BigQuery: The Definitive Guide, by Lakshmanan et al. ("BigQuery Geographic Information Systems" section of Chapter 8, "Advanced Queries")
Due: P7
Release: P8 (BigQuery, Loans)
Week 15
Mon, Apr 28
BigQuery 3
Read: Google BigQuery: The Definitive Guide, by Lakshmanan et al. (Chapter 9, "Machine Learning in BigQuery")
Wed, Apr 30
BigQuery 4
Quiz: week 14 and before (cumulative)
Fri, May 2
Review
Due: P8