Course Schedule
Part 1: Resources
Week 1
Mon, Sep 2
Labor Day
Week 2
Mon, Sep 9
Deployment (Linux Pipelines)
Read: Designing Data-Intensive Applications, Kleppmann ("Batch Processing with Unix Tools" section of Chapter 10, "Batch Processing")
Watch: Lecture
Slides: PDF
Wed, Sep 11
Deployment (Docker)
Released: P1 (Docker)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Quiz: week 1
Fri, Sep 13
Network Resources (Overview)
Read: Designing Data-Intensive Applications, Kleppmann (Chapter 4, "Encoding and Evolution")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 3
Mon, Sep 16
Network Resources (gRPC)
Read: gRPC Basics Tutorial
Watch: Lecture Part 1 (concepts)
Watch: Lecture Part 2 (demo)
Slides: PDF
Anki Flashcards: Deck
Wed, Sep 18
Network Resources (Compose)
Watch: Lecture
Slides: PDF
Quiz: week 2 and before (cumulative)
Fri, Sep 20
Memory Resources (Caching)
Read: Systems Performance, Gregg (6.2.2; "CPU Caches" and "Latency" subsections of 6.4.1)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Week 4
Wed, Sep 25
Memory Resources (PyArrow)
Read: Gallery of Processor Cache Effects (Examples 1 and 2)
Watch: Lecture Part 1 (CPU Cache)
Watch: Lecture Part 2 (OS Cache)
Slides: PDF
Anki Flashcards: Deck
Quiz: week 3 and before (cumulative)
Fri, Sep 27
Compute Resources (Threads)
Read: Fluent Python, 2nd Edition ("What's New in This Chapter" through "A Bit of Jargon" in Chapter 19, "Concurrency Models in Python")
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Week 5
Mon, Sep 30
Compute Resources (Locks)
Read: Mastering Concurrency in Python ("Working With Threads In Python" chapter)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Wed, Oct 2
Storage Resources (File Systems)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 4 and before (cumulative)
Fri, Oct 4
Storage Resources (Formats and DBs)
Read: Designing Data-Intensive Applications, Kleppmann ("Transaction Processing or Analytics?" and "Column-Oriented Storage" sections of Chapter 3, "Storage and Retrieval")
Due: P2
Released: P3 (Compute+Storage)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 6
Mon, Oct 7
SQL Databases (MySQL)
Read: MySQL Crash Course, Silva (Chapters 3+5); Designing Data-Intensive Applications, Kleppmann ("The Meaning of ACID" section in Chapter 7, "Transactions")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Oct 9
Midterm (in class)
Part 2: Clusters
Week 7
Mon, Oct 14
HDFS Practice
Read: Mastering Hadoop 3, Singh et al. ("Deep Dive Into the Hadoop Distributed File System" chapter)
Released: P4 (HDFS, Loans)
Due: P3
Watch: Lecture
Anki Flashcards: Deck
Wed, Oct 16
MapReduce
Read: Learning Spark, 2nd edition by Damji et al. (sections "The Importance of an Optimal Storage Solution", "Databases", and "Data Lakes" of Chapter 9, "Building Reliable Data Lakes with Apache Spark")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 6 and before (cumulative)
Week 8
Mon, Oct 21
Spark DataFrames
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 4, "Spark SQL and DataFrames: Introduction to Built-in Data Sources")
Watch: Lecture
Anki Flashcards: Deck
Wed, Oct 23
Spark SQL
Read: Designing Data-Intensive Applications, Kleppmann ("Reduce-Side Joins and Grouping" section of Chapter 10, "Batch Processing")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Fri, Oct 25
Spark Internals and Performance 1
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 7, "Optimizing and Tuning Spark Applications")
Due: P4
Released: P5 (Spark, Loans)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 9
Wed, Oct 30
Spark Machine Learning
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 10, "Machine Learning with MLlib")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 8 and before (cumulative)
Fri, Nov 1
Wide Tables: HBase and Cassandra
Read: Cassandra, The Definitive Guide, by Carpenter et al. (Chapter 4, "The Cassandra Query Language")
Watch: Lecture Part 1: PLANET
Watch: Lecture Part 2: HBase and Cassandra
Watch: Lecture Part 3: Getting Started
Slides: PDF
Anki Flashcards: Deck
Week 10
Wed, Nov 6
Cassandra Partitioning
Read: Cassandra, The Definitive Guide, by Carpenter et al. (sections "Data Centers and Racks" to "Hinted Handoff" of Chapter 6, "The Cassandra Architecture")
Due: P5
Released: P6 (Cassandra, Weather)
Watch: Lecture
Slides: PDF
Worksheet: PDF
Anki Flashcards: Deck
Quiz: week 9 and before (cumulative)
Week 11
Mon, Nov 11
Midterm (in class)
Wed, Nov 13
Streaming: Kafka Concepts
Read: Kafka, The Definitive Guide, 2nd edition by Shapira et al. ("Enter Kafka" section of Chapter 1, "Meet Kafka")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 10 and before (cumulative)
Week 12
Mon, Nov 18
Streaming: Kafka Reliability
Read: Kafka, The Definitive Guide, 2nd edition by Shapira et al. (Chapter 7, "Reliable Data Delivery")
Watch: Lecture Part 1: GROUP BY
Watch: Lecture Part 2: Reliability
Watch: Lecture Part 3: Exactly Once
Slides: PDF
Anki Flashcards: Deck
Wed, Nov 20
Streaming: Spark Programming
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 8, "Structured Streaming")
Due: P6
Released: P7 (Kafka, Weather Stations)
Watch: Lecture
Anki Flashcards: Deck
Quiz: week 11 and before (cumulative)
Part 3: Cloud
Week 13
Mon, Nov 25
The Cloud
Wed, Nov 27
BigQuery 1
Quiz: week 12 and before (cumulative)
Fri, Nov 29
Thanksgiving Break
Week 14
Mon, Dec 2
BigQuery 2
Read: Google BigQuery: The Definitive Guide, by Lakshmanan et al. ("BigQuery Geographic Information Systems" section of Chapter 8, "Advanced Queries")
Wed, Dec 4
BigQuery 3
Read: Google BigQuery: The Definitive Guide, by Lakshmanan et al. (Chapter 9, "Machine Learning in BigQuery")
Due: P7
Released: P8 (BigQuery, Loans)
Quiz: week 13 and before (cumulative)
Fri, Dec 6
BigQuery 4
Week 15
Mon, Dec 9
Cloud Deployment
Wed, Dec 11
Review
Due: P8