Designing Modern Web-Scale Applications, ECE 1724, Fall 2024

Course Description

The last decade has seen an enormous shift in computing, with web-scale applications driving the rise of cloud computing and big data processing. This course discusses the principles, key technologies and trends in the design of web-scale applications. The course will examine and compare the architectures and the infrastructure needed to support several types of web-scale applications. Students will learn how these applications are designed to achieve high scalability, reliability and availability.

Students are expected to read, analyze and discuss seminal and cutting-edge research in this area. The aim is to both learn from prior work and extract exciting research ideas. A course project will provide concrete experience and deeper understanding of the material.

The course covers advanced topics, broadly in the areas of distributed systems, operating systems, storage and databases, with a focus on web-scale applications. The goal is to survey research in this area, rather than focus on a specific topic.

Prerequisites

This course builds on the following undergraduate courses taught at the University of Toronto: operating systems (ECE344) and distributed systems (ECE419). It assumes that students are familiar with the contents of these courses. If you have not taken these or similar courses, you will not have sufficient background to take this course.

Students are expected to have strong coding skills and be experienced in languages such as Java, Python, and C++. They should have experience building and debugging a significant software system. If unsure, consult with the instructor.

Textbooks

There are no required textbooks for this course. The optional textbooks are:

Distributed Systems: Concepts and Design (Fifth Edition), by George Coulouris, Jean Dollimore, Tim Kindberg, and Gordon Blair, 2011.
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems, by Martin Kleppmann, 2017.
Guide to Reliable Distributed Systems: Building High-Assurance Applications and Cloud-Hosted Services, by Kenneth P. Birman, 2012.

Course Announcements and Questions

Course announcements will be made on Quercus Announcements. You should post any general questions about the course on Quercus Discussions. If you need to contact the instructor directly, please send an email.

Grading Policy

Grades will be based on quizzes and a class project. This year, the course has two grading options:

Course Project: With this option, the student will do a course project and two quizzes. This option is required for MASc and PhD students.
- 2 Quizzes: 50%
- Class project: 50%
All Quizzes: With this option, the student will do four quizzes and no course project. This option is suggested for MEng students.
- 4 Quizzes: 100% (each is 25%)

Course Readings

Each week, this class will cover a set of papers that focus on a specific aspect of the course. These papers are seminal or recent research papers, broadly in the areas of distributed systems, operating systems, storage, and databases. Students are typically expected to read one or two papers each week. The instructor will present the papers, and the class will then discuss them. Since students are expected to take part in the discussion, they should read the papers before class. Expect to spend 4-6 hours per week reading the papers critically.

Class Project and Project Ideas

A major component of this course is devoted to a term-long project. The topic of the project is largely up to you, but to help you choose a project, a sample list of projects is provided below. This list should help students determine whether their own projects are of reasonable size and scope.

Students will generally work in groups on a term project that pushes the state of the art in the design of a web-scale application. Students will typically be required to implement and evaluate a substantial software system.

The project deliverables are as follows:

Project Description (5%): 1 page. Due ~~Oct 2~~ Oct 7, 2024.
Project Status Report (10%): 3-4 pages. Due ~~Nov 9~~ Nov 16, 2024.
Project Final Report (35%): 8-10 pages. Due ~~Dec 4~~ Dec 15, 2024.

We will be holding project presentations on Dec 4, 2024 and Dec 11, 2024 during class. Each presentation should be roughly 15 minutes long, followed by 5 minutes of Q&A.

Here is a list of project ideas.

Note: students should not do the course project if they choose the All Quizzes option.

Quizzes

There will be four quizzes in the course, held roughly every 3 weeks. Each quiz will cover the material from the preceding 3 weeks. Students doing the course project can choose to take any two quizzes. Students choosing the All Quizzes option must take all four quizzes.

The quizzes will be held on ~~Tuesdays, 7-9 pm~~ Wednesdays in class from 4:10-5:40 pm, on the following dates:
~~Sep 24~~ Oct 2, 2024
~~Oct 15~~ Oct 16, 2024
~~Nov 12~~ Nov 13, 2024
~~Dec 3~~ Dec 4, 2024

The format of the quiz will be announced in class.

There will be no assignments in this course.

There will be no final exam in this course.

Lecture Material

The instructor will make lecture material available here as topics are covered in class.

1.	Introduction
	Overview of the course
	Introduction to the course
2.	Consensus and Coordination
	Overview
	Overview of linearizability
	Raft slides	In Search of an Understandable Consensus Algorithm. USENIX ATC 2014.	Video, Extended Paper
	ZooKeeper slides	ZooKeeper: Wait-free Coordination for Internet-scale Systems. USENIX ATC 2010.	Video
3.	Cluster Storage Systems
	Overview
	GFS slides	The Google File System. SOSP 2003.
	Bigtable slides	Bigtable: A Distributed Storage System for Structured Data. OSDI 2006.	Audio
4.	Transactional Stores
	Overview
	Sinfonia slides	Sinfonia: A New Paradigm for Building Scalable Distributed Systems. SOSP 2007.
5.	Wide Area Storage Systems
	Overview
	Dynamo slides	Dynamo: Amazon's Highly Available Key-value Store. SOSP 2007.	Experiences and Video
	Spanner slides	Spanner: Google's Globally Distributed Database. ACM TOCS 2013.	Video, Extended Paper
6.	Data Parallel Frameworks
	Overview
	MapReduce slides	MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004.
	Spark slides	Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. NSDI 2012.	Video
7.	Resource Management
	Overview
	Mesos slides	Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. NSDI 2011.	Video
	Twine slides	Twine: A Unified Cluster Management System for Shared Infrastructure. OSDI 2020.	Video
8.	Stream Processing
	Overview
	MillWheel slides	MillWheel: Fault-Tolerant Stream Processing at Internet Scale. VLDB 2013.
	Noria slides	Noria: dynamic, partially-stateful data-flow for high-performance web applications. OSDI 2018.	Audio, Extended Video
9.	Graph Processing
	Overview
	Pregel slides	Pregel: A System for Large-Scale Graph Processing. SIGMOD 2010.
	PowerGraph slides	PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. OSDI 2012.	Video

Reading List

This is a list of papers that provides background on the material that will be covered in the course. We will cover a subset of this material in class. This background material will be especially helpful when choosing your course project, for motivating your project and for writing about the research work related to your project.

Most of these papers can be accessed from the ACM web site. If you cannot access ACM articles directly, please read the following instructions for accessing the papers.

Introduction

A View of Cloud Computing. CACM 2010.
The Dangers of Replication and a Solution. SIGMOD 1996.
Efficient Reading of Papers in Science and Technology.
How (and How Not) to Write a Good Systems Paper. Operating Systems Review 1983.

Consensus and Coordination

Cluster Storage Systems

The Hadoop Distributed File System. MSST 2010.
Fast Crash Recovery in RAMCloud. SOSP 2011.
Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency. SOSP 2011.

Transactional Stores

ECE1724, Fall 2024
University of Toronto