Course Schedule
Part 1: Resources
Week 1
Mon, Jan 20
Martin Luther King Day!
Week 2
Mon, Jan 27
Deployment (Linux Pipelines)
Read: Designing Data-Intensive Applications, Kleppmann ("Batch Processing with Unix Tools" section of Chapter 10, "Batch Processing")
Watch: Lecture
Slides: PDF
Wed, Jan 29
Deployment (Docker)
Release: P1 (Docker)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 1
Fri, Jan 31
Network Resources (Overview)
Read: Designing Data-Intensive Applications, Kleppmann (Chapter 4, "Encoding and Evolution")
Watch: Part 1: Dockerfiles
Watch: Part 2: Networking Intro
Watch: Part 3: Web Server Demo
Watch: Part 4: gRPC
Slides: PDF
Anki Flashcards: Deck
Week 3
Mon, Feb 3
Network Resources (gRPC)
Read: gRPC Basics Tutorial
Watch: Part 1: gRPC Demo
Watch: Part 2: gRPC repeated values
Watch: Part 3: Docker port forwarding
Slides: PDF
Anki Flashcards: Deck
Wed, Feb 5
Network Resources (Compose)
Watch: Lecture
Slides: PDF
Quiz: week 2 and before (cumulative)
Fri, Feb 7
Memory Resources (Caching)
Read: Systems Performance, Gregg (6.2.2; "CPU Caches" and "Latency" subsections of 6.4.1)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 4
Wed, Feb 12
Memory Resources (PyArrow)
Read: Gallery of Processor Cache Effects (Examples 1 and 2)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 3 and before (cumulative)
Fri, Feb 14
Compute Resources (Threads)
Read: Fluent Python, 2nd Edition ("What's New in This Chapter" through "A Bit of Jargon" in Chapter 19, "Concurrency Models in Python")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 5
Mon, Feb 17
Compute Resources (Locks)
Read: Mastering Concurrency in Python ("Working With Threads In Python" chapter)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Feb 19
Storage Resources (File Systems)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 4 and before (cumulative)
Fri, Feb 21
Storage Resources (Formats and DBs)
Read: Designing Data-Intensive Applications, Kleppmann ("Transaction Processing or Analytics?" and "Column-Oriented Storage" sections of Chapter 3, "Storage and Retrieval")
Due: P2
Release: P3 (Compute+Storage)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 6
Mon, Feb 24
SQL Databases (MySQL)
Read: MySQL Crash Course, Silva (Chapters 3+5), Designing Data-Intensive Applications, Kleppmann ("The Meaning of ACID" section in Chapter 7, "Transactions")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Feb 26
Midterm (in class)
Part 2: Clusters
Week 7
Mon, Mar 3
Hadoop
Read: Mastering Hadoop 3, Singh et al. ("Deep Dive Into the Hadoop Distributed File System" chapter)
Release: P4 (HDFS, Loans)
Due: P3
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Mar 5
MapReduce
Read: Learning Spark, 2nd edition by Damji et al. (sections "The Importance of an Optimal Storage Solution", "Databases", and "Data Lakes" of Chapter 9, "Building Reliable Data Lakes with Apache Spark")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 6 and before (cumulative)
Week 8
Mon, Mar 10
Spark DataFrames
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 4, "Spark SQL and DataFrames: Introduction to Built-in Data Sources")
Watch: Part 1: Spark RDD Demos
Watch: Part 2: Spark DataFrames
Watch: Part 3: Spark with CSVs and Parquets
Anki Flashcards: Deck
Wed, Mar 12
Spark SQL
Read: Designing Data-Intensive Applications, Kleppmann ("Reduce-Side Joins and Grouping" section of Chapter 10, "Batch Processing")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 7 and before (cumulative)
Fri, Mar 14
Spark Internals and Performance
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 7, "Optimizing and Tuning Spark Applications")
Due: P4
Release: P5 (Spark, Loans)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 9
Mon, Mar 17
Spark Machine Learning API
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 10, "Machine Learning with MLlib")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Fri, Mar 21
Wide Tables: HBase and Cassandra
Read: Cassandra, The Definitive Guide, by Carpenter et al. (Chapter 4, "The Cassandra Query Language")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 10
Mon, Mar 24
Spring Break
Wed, Mar 26
Spring Break
Fri, Mar 28
Spring Break
Week 11
Mon, Mar 31
Cassandra Query Language (CQL)
Watch: Part 1: CQL Demos
Watch: Part 2: Cassandra Partitioning
Anki Flashcards: Deck
Wed, Apr 2
Cassandra Partitioning
Read: Cassandra, The Definitive Guide, by Carpenter et al. (sections "Data Centers and Racks" to "Hinted Handoff" of Chapter 6, "The Cassandra Architecture")
Due: P5
Release: P6 (Cassandra, Weather)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 10 and before (cumulative)
Week 12
Mon, Apr 7
Midterm (in class)
Wed, Apr 9
Streaming: Kafka Concepts
Read: Kafka, The Definitive Guide, 2nd edition by Shapira et al. ("Enter Kafka" section of Chapter 1, "Meet Kafka")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 13
Mon, Apr 14
Streaming: Kafka Reliability
Read: Kafka, The Definitive Guide, 2nd edition by Shapira et al. (Chapter 7, "Reliable Data Delivery")
Due: P6
Release: P7 (Kafka, Weather Stations)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Apr 16
Streaming: Spark Programming
Read: Learning Spark, 2nd edition by Damji et al. (Chapter 8, "Structured Streaming")
Watch: Lecture
Anki Flashcards: Deck
Quiz: week 12 and before (cumulative)
Part 3: Cloud
Week 14
Mon, Apr 21
The Cloud
Watch: Part 1: Cloud Intro
Watch: Part 2: Cloud Platforms
Watch: Part 3: Creating a Cloud VM
Slides: PDF
Anki Flashcards: Deck
Wed, Apr 23
BigQuery 1: Basics
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 13 and before (cumulative)
Fri, Apr 25
BigQuery 2: Data Sources + Geo Data
Read: Google BigQuery: The Definitive Guide, by Lakshmanan et al. ("BigQuery Geographic Information Systems" section of Chapter 8, "Advanced Queries")
Due: P7
Release: P8 (Cloud Services)
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Week 15
Mon, Apr 28
BigQuery 3: Machine Learning
Read: Google BigQuery: The Definitive Guide, by Lakshmanan et al. (Chapter 9, "Machine Learning in BigQuery")
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Wed, Apr 30
BigQuery 4: Cost
Watch: Lecture
Slides: PDF
Anki Flashcards: Deck
Quiz: week 14 and before (cumulative)