About me
I am a professor in the Electrical and Computer Engineering Department at the University of Toronto, where I also serve as Chair of the Computer Engineering Group. My research focuses on systems software, with an emphasis on developing practical solutions to improve the availability and performance of large-scale software systems.
I hold the Canada Research Chair in Systems Software and am a recipient of the McCharles Prize for Early Career Research Distinction. I have served as Program Co-chair of SOSP, HotOS, APSys, and the inaugural Student Research Competition at SOSP. I currently serve as the Vice-Chair of ACM SIGOPS. I have also received a few teaching awards, including the Gordon Slemon Award and the Student Choice Award (upper-year instructor) of the Faculty of Engineering.
I founded a startup company called YScope with my PhD students to bring our research into practice. Check out CLP, an open-source tool that can compress text and JSON logs and search compressed logs without decompression. This Uber Engineering Blog describes a deployment case study of CLP.
I received my Ph.D. from the Computer Science Department of the University of Illinois, Urbana-Champaign, under the supervision of a great advisor, Yuanyuan Zhou. I was also a visiting PhD student in the awesome System and Networking group of the University of California, San Diego. My CV is here.
I am always looking for self-motivated students to work with me. If you are interested, please submit your application here.
News
- Hacker News [1], [2],
 - Discussions from HBase developers, which prompted a series of reactions to address the problems we mentioned in the paper.
 - Twitter discussions: see this, this, and this (if you're looking for a screenshot that summarizes our paper, see this or this).
 - Blog: the morning paper (where it is also considered a highlight of 2016), It Will Never Work In Theory, Another word for it, Metadata, Fifty Quick Ideas to Improve Your Tests, Postmortem lessons, and some discussions on Google+.
 - And quite a few emails sent to us from developers...
 
Selected publications
- μSlope: High Compression and Fast Search on Semi-Structured Logs. In the Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI'24), July, 2024. Pages 529-544. [Code]
 - Relational Debugging -- Pinpointing Root Causes of Performance Problems. In the Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI'23), July, 2023. Pages 65--80. [SIGOPS Blog, Code]
 - Investigating Managed Language Runtime Performance: Why JavaScript and Python are 8x and 29x slower than C++, yet Java and Go can be Faster? In the Proceedings of the 2022 USENIX Annual Technical Conference (ATC'22), July 11-13, 2022. Pages 835--852. [USENIX ;login: article] [Code]
 - Hubble: Performance Debugging with In-Production, Just-In-Time Method Tracing on Android. In the Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI'22), July 11-13, 2022. Pages 787--803.
 - ctFS: Replacing File Indexing with Hardware Memory Translation through Contiguous File Allocation for Persistent Memory. In the Proceedings of the 20th USENIX Conference on File and Storage Technologies (FAST'22), February 22-24, 2022. Best paper award runner-up. [ACM Transactions on Storage article] [USENIX ;login: article] [Code]
 - Understanding and Detecting Software Upgrade Failures in Distributed Systems. In the Proceedings of the 28th ACM Symposium on Operating Systems Principles (SOSP'21), October 25-28, 2021. [Code]
 - CLP: Efficient and Scalable Search on Compressed Text Logs. In the Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI'21). July 14--16, 2021. Pages 183--198. [Code]
 - M3: End-to-End Memory Management in Elastic Systems Software Stack. In the 16th ACM European Conference on Computer Systems (EuroSys 2021), April, 2021. Pages 507-522. [Code]
 - The Inflection Point Hypothesis: A Principled Debugging Approach for Locating the Root Cause of a Failure. In the 27th ACM Symposium on Operating Systems Principles (SOSP’19), October 2019, Huntsville, Ontario, Canada. [Press: The morning paper] [USENIX ;login: article]
 - An Analysis of Performance Evolution of Linux's Core Operations. In the 27th ACM Symposium on Operating Systems Principles (SOSP’19), October 2019, Huntsville, Ontario, Canada. [Press: The morning paper] [Code]
 - Log20: Fully Automated Optimal Placement of Log Printing Statements under Specified Overhead Threshold. In the 26th ACM Symposium on Operating Systems Principles (SOSP’17), October 2017, Shanghai, China. [Press: The morning paper][Code][Impact: licensed by Netflix]
 - Pensieve: Non-Intrusive Failure Reproduction for Distributed Systems using the Event Chaining Approach. In the 26th ACM Symposium on Operating Systems Principles (SOSP’17), October 2017, Shanghai, China.
 - Non-intrusive Performance Profiling of Entire Software Stacks based on the Flow Reconstruction Principle. Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI’16), November 2016, Savannah, GA.
 - Don't Get Caught In the Cold, Warm-up Your JVM: Understand and Eliminate JVM Warm-up Overhead In Data-parallel Systems. Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI’16), November 2016, Savannah, GA. [Press: Invited publication: USENIX ;login: 42(1), The Next Platform][Code]
 - Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-intensive Systems. Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI’14), October 2014, Broomfield, CO.
 - lprof: A Non-intrusive Request Flow Profiler for Distributed Systems. In the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI’14), October 2014, Broomfield, CO. *: Equally contributed.
 - Do Not Blame Users for Misconfigurations. Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP’13), November 2013.
 - Be Conservative: Enhancing Failure Diagnosis with Proactive Logging. Proceedings of the 9th ACM/USENIX Symposium on Operating Systems Design and Implementation (OSDI’12), Hollywood, CA, October 2012.
 - Improving Software Diagnosability via Log Enhancement. ACM Transactions on Computer Systems (TOCS), February 2012. Fast-forwarded from ASPLOS'11.
 - SherLog: Error Diagnosis by Connecting Clues from Run-time Logs. In the Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’10), pages 143-154, Pittsburgh, PA, March 2010.
 - /* iComment: Bugs or Bad Comments? */ In the Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP’07), pages 145-158, October 2007.
 
Full publication list
Group
It is fun to work with the following incredible people:
Post-doc:
- Huangshi Tian
 
Graduate students:
- Adrian Chiu
 - Ruibin Li
 - Zhihao Lin
 - Jack Luo
 - David Marcovitch
 - ChenXing Yang
 - Yi Fan Yu
 
Alumni:
- Yongle Zhang, PhD 2020, First Employment: Assistant Professor, Department of Computer Science, Purdue University. Winner of The SIGOPS Dennis M. Ritchie Thesis Award.
 - Xu Zhao, PhD 2021, First Employment: Research Scientist@Facebook. Winner of Facebook Fellowship.
 - Kirk Rodrigues, PhD 2023, First Employment: Co-founder of YScope.
 - David Lion, PhD 2023, First Employment: Co-founder of YScope.
 - Xiang (Jenny) Ren, PhD 2024, First Employment: Assistant Professor, College of Computer Science, Northeastern University.
 - Ming Zhang, Visiting PhD student from HUST, First Employment: OS researcher at Huawei.
 - Hailong Sun, visiting scholar, now Professor at Beihang University
 - Sitao Wang, Master of Applied Science 2025, First Employment: Engineer at YScope.
 - Chaoyue Gong, Master of Applied Science 2025, First Employment: Engineer at YScope.
 - Devin Gibson, Master of Applied Science 2024, First Employment: Engineer at YScope.
 - Xiaochong Wei, Master of Applied Science 2024, First Employment: Engineer at YScope.
 - Haiqi Xu, Master of Applied Science 2024, First Employment: Engineer at YScope.
 - Rishikesh Devsot, Master of Applied Science 2023, First Employment: Engineer at YScope.
 - Rui Wang, Master of Applied Science 2023, First Employment: Engineer at YScope.
 - Zhuqi Jin, Master of Applied Science 2022, now at Meta.
 - Serhei Makarov, Master of Applied Science 2018, now at Red Hat.
 - Muhammad FaizanUllah (Undergraduate thesis) -> Microsoft
 - Neil Newman (Undergraduate thesis) -> graduate school@UBC
 - Alan Chung (Undergraduate thesis)
 
Teaching
- ECE344 Operating Systems: [Winter25][Winter24][Winter23][Winter22][Winter21][Winter20][Winter18][Winter17][Winter16][Winter15][Winter14][Winter13]
 - ECE454 Computer Systems Programming: [Fall18][Fall14][Fall13]
 - ECE244 Programming Fundamentals: [Fall22][Fall17][Fall16]
 - ECE1759 Graduate OS: [Fall25][Fall24][Fall23][Fall22][Fall21][Fall20][Fall17][Fall16][Fall14]
 
Program committee
- 2025: SOSP (PC Co-chair with Rebecca Isaacs)
 - 2024: SOSP, OSDI
 - 2023: SOSP, OSDI, EuroSys, EuroSys Poster (PC co-chair)
 - 2022: OSDI
 - 2021: OSDI, SOSP, HAOC, ASPLOS
 - 2020: OSDI, NSDI
 - 2019: HotOS (PC Co-chair with Jinyang Li), APSys (PC Co-chair with Yu Hua)
 - 2018: OSDI, EuroSys, ASPLOS (ERC)
 - 2017: SOSP, Student Research Competition@SOSP'17 (chair)
 - 2016: ASPLOS (also chair of poster and lightning session)
 - 2015: USENIX Annual Technical Conference, USENIX LISA, SOSP (poster PC)
 - 2014: OSDI (external review committee), USENIX Annual Technical Conference, SIGMETRICS, USENIX ICAC
 - 2012: USENIX Workshop on Managing Systems Automatically and Dynamically
 
Misc
I play a lot of sports, including basketball, skiing, swimming, and running. I was the captain of Beihang's CSE basketball team when I was an undergrad and co-captain of the UIUC CS faculty & grad-student basketball team in the intramural games. I have also run some marathons and half-marathons (see a not-so-recent photo here). When I have more time, I also play accordion and piano.

