Course Schedule
Part 1: Resources
Week 1
Mon, Sep 2
Labor Day
Week 2
Mon, Sep 9
Deployment (Linux Pipelines)
Read: Designing Data-Intensive Applications, Kleppmann ("Batch Processing with Unix Tools" section of Chapter 10)
Watch: Lecture
Slides: PDF
Wed, Sep 11
Deployment (Docker)
Released: P1 (Docker)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Quiz: week 1
Fri, Sep 13
Network Resources (Overview)
Read: Designing Data-Intensive Applications, Kleppmann (Chapter 4, "Encoding and Evolution")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 3
Mon, Sep 16
Network Resources (gRPC)
Read: gRPC Basics Tutorial
Watch: Lecture Part 1 (concepts)
Watch: Lecture Part 2 (demo)
Slides: PDF
Anki Flashcards: Deck
Wed, Sep 18
Network Resources (Compose)
Watch: Lecture
Slides: PDF
Quiz: week 2 and before (cumulative)
Fri, Sep 20
Memory Resources (Caching)
Read: Systems Performance, Gregg (6.2.2; "CPU Caches" and "Latency" subsections of 6.4.1)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Week 4
Wed, Sep 25
Memory Resources (PyArrow)
Read: Gallery of Processor Cache Effects (Examples 1 and 2)
Watch: Lecture Part 1 (CPU Cache)
Watch: Lecture Part 2 (OS Cache)
Slides: PDF
Anki Flashcards: Deck
Quiz: week 3 and before (cumulative)
Fri, Sep 27
Compute Resources (Threads)
Read: Fluent Python, 2nd Edition ("What's New in This Chapter" through "A Bit of Jargon" in Chapter 19, "Concurrency Models in Python")
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Week 5
Mon, Sep 30
Compute Resources (Locks)
Read: Mastering Concurrency in Python ("Working With Threads In Python" chapter)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Wed, Oct 2
Storage Resources (File Systems)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 4 and before (cumulative)
Fri, Oct 4
Storage Resources (Formats and DBs)
Read: Designing Data-Intensive Applications, Kleppmann ("Transaction Processing or Analytics?" and "Column-Oriented Storage" sections of Chapter 3, "Storage and Retrieval")
Due: P2
Released: P3 (Compute+Storage)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 6
Mon, Oct 7
SQL Databases (MySQL)
Read: MySQL Crash Course, Silva (Chapters 3 and 5); Designing Data-Intensive Applications, Kleppmann ("The Meaning of ACID" section in Chapter 7, "Transactions")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Oct 9
Midterm (in class)
Part 2: Clusters
Week 7
Mon, Oct 14
HDFS Practice
Read: Mastering Hadoop 3, Singh et al. ("Deep Dive Into the Hadoop Distributed File System" chapter)
Released: P4 (HDFS, Loans)
Due: P3
Watch: Lecture
Anki Flashcards: Deck
Wed, Oct 16
MapReduce
Read: Learning Spark, 2nd edition by Damji et al. (sections "The Importance of an Optimal Storage Solution", "Databases", and "Data Lakes" of Chapter 9, "Building Reliable Data Lakes with Apache Spark")
Quiz: week 6 and before (cumulative)
Fri, Oct 18
Spark RDDs
Week 8
Mon, Oct 21
Spark DataFrames
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 4, "Spark SQL and DataFrames: Introduction to Built-in Data Sources")
Wed, Oct 23
Spark SQL
Read: Designing Data-Intensive Applications, Kleppmann ("Reduce-Side Joins and Grouping" section of Chapter 10, "Batch Processing")
Fri, Oct 25
Spark Internals and Performance
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 7, "Optimizing and Tuning Spark Applications")
Due: P4
Released: P5 (Spark, Loans)
Week 9
Mon, Oct 28
Spark Machine Learning
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 10, "Machine Learning with MLlib")
Wed, Oct 30
Wide Tables: HBase and Cassandra
Read: Cassandra, The Definitive Guide, by Carpenter et al. (Chapter 4, "The Cassandra Query Language")
Quiz: week 8 and before (cumulative)
Fri, Nov 1
Cassandra Query Language (CQL)
Week 10
Mon, Nov 4
Cassandra Partitioning
Read: Cassandra, The Definitive Guide, by Carpenter et al. (sections "Data Centers and Racks" to "Hinted Handoff" of Chapter 6, "The Cassandra Architecture")
Worksheet: PDF
Wed, Nov 6
Cassandra Replication
Due: P5
Released: P6 (Cassandra, Weather)
Quiz: week 9 and before (cumulative)
Fri, Nov 8
Review
Week 11
Mon, Nov 11
Midterm (in class)
Wed, Nov 13
Streaming: Kafka Concepts
Read: Kafka, The Definitive Guide, 2nd edition by Shapira et al. ("Enter Kafka" section of Chapter 1, "Meet Kafka")
Quiz: week 10 and before (cumulative)
Fri, Nov 15
Streaming: Kafka Demos
Week 12
Mon, Nov 18
Streaming: Kafka Reliability
Read: Kafka, The Definitive Guide, 2nd edition by Shapira et al. (Chapter 7, "Reliable Data Delivery")
Wed, Nov 20
Streaming: Spark Programming
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 8, "Structured Streaming")
Due: P6
Released: P7 (Kafka, Weather Stations)
Quiz: week 11 and before (cumulative)
Fri, Nov 22
Streaming: Spark Concepts
Part 3: Cloud
Week 13
Mon, Nov 25
The Cloud
Wed, Nov 27
BigQuery 1
Quiz: week 12 and before (cumulative)
Fri, Nov 29
Thanksgiving Break
Week 14
Mon, Dec 2
BigQuery 2
Read: Google BigQuery: The Definitive Guide, by Lakshmanan et al. ("BigQuery Geographic Information Systems" section of Chapter 8, "Advanced Queries")
Wed, Dec 4
BigQuery 3
Read: Google BigQuery: The Definitive Guide, by Lakshmanan et al. (Chapter 9, "Machine Learning in BigQuery")
Due: P7
Released: P8 (BigQuery, Loans)
Quiz: week 13 and before (cumulative)
Fri, Dec 6
BigQuery 4
Week 15
Mon, Dec 9
Cloud Deployment
Wed, Dec 11
Review
Due: P8