I am an Assistant Professor in the Edward S. Rogers Sr. Department of Electrical and Computer Engineering and am cross-appointed to the Department of Computer Science at the University of Toronto. I work on computer architecture and computer systems. My goal is to enable software engineers to easily harness the full potential of parallel systems with many thousands of cores. My current research focuses on (i) new programming models and compilers that easily express challenging types of parallelism and (ii) new large-scale multicore and accelerator architectures that extract the parallelism.
Before joining the University of Toronto, I was a Research Scientist in the Systems group at Facebook AI Research. I earned a PhD from the Massachusetts Institute of Technology, where I worked with Daniel Sanchez in the Computer Science and Artificial Intelligence Laboratory. Previously, I was a software engineer at AeroFS, a Y Combinator startup, working on distributed file sharing. I earned an MASc while working with Greg Steffan and a BASc in Engineering Science from the University of Toronto.
Publications
MultiQueue-based FPGA routing: relaxed A* priority ordering for improved parallelism. In Proc. of the 23rd IEEE international conference on Field Programmable Technology (FPT). December 2024. (Best Paper Nominee) [text] [slides] [talk] [code] | FPT 2024 |
Multi bucket queues: efficient concurrent priority scheduling. In Proc. of the 36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). June 2024. [text] [doi] [slides] [code] | SPAA 2024 |
When is parallelism fearless and zero-cost with Rust? In Proc. of the 36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). June 2024. [text] [doi] [slides] [code] | SPAA 2024 |
Intergenerational embodied carbon. In Proc. of the workshop on Hot Topics in Ethical Computer Systems (HotEthics). April 2024. [text] [slides] | HotEthics 2024 |
Disintegrating manycores: which applications lose and why? In Proc. of the 16th international workshop on Network on Chip Architectures (NoCArc). October 2023. (Best Paper Award) [text] [doi] [slides] | NoCArc 2023 |
Brief announcement: is the problem-based benchmark suite fearless with Rust? In Proc. of the 35th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). June 2023. [text] [doi] [slides] [code] | SPAA 2023 |
Altocumulus: Scalable scheduling for nanosecond-scale remote procedure calls. In Proc. of the 55th IEEE/ACM international symposium on Microarchitecture (MICRO-55). October 2022. [text] [doi] | MICRO 2022 |
A scalable architecture for reprioritizing ordered parallelism. In Proc. of the 49th ACM/IEEE International Symposium on Computer Architecture (ISCA-49). June 2022. [text] [doi] [slides] [talk] | ISCA 2022 |
Understanding and improving failure tolerant training for deep learning recommendation with partial recovery. In Proc. of the 4th Conference on Machine Learning and Systems (MLSys). April 2021. [text] [talk] | MLSys 2021 |
T4: Compiling sequential code for effective speculative parallelization in hardware. In Proc. of the 47th ACM/IEEE International Symposium on Computer Architecture (ISCA-47). June 2020. [text] [doi] [slides] [talk] [code] | ISCA 2020 |
A hardware and software architecture for pervasive parallelism. PhD Thesis, Massachusetts Institute of Technology. 2019. (George M. Sprowls PhD thesis award in computer science, 2nd place) [text] | PhD Thesis |
Harmonizing speculative and non-speculative execution in architectures for ordered parallelism. In Proc. of the 51st IEEE/ACM international symposium on Microarchitecture (MICRO-51). October 2018. [text] [doi] [slides] [lightning] | MICRO 2018 |
SAM: optimizing multithreaded cores for speculative parallelism. In Proc. of the 26th international conference on Parallel Architectures and Compilation Techniques (PACT-26). September 2017. [text] [doi] [slides] | PACT 2017 |
Fractal: an execution model for fine-grain nested speculative parallelism. In Proc. of the 44th ACM/IEEE International Symposium on Computer Architecture (ISCA-44). June 2017. [text] [doi] [slides] [press: MIT News] | ISCA 2017 |
Data-centric execution of speculative parallel programs. In Proc. of the 49th IEEE/ACM international symposium on Microarchitecture (MICRO-49). October 2016. (Honorable mention for IEEE Micro's Top Picks) [text] [doi] [slides] | MICRO 2016 |
Unlocking ordered parallelism with the Swarm architecture. IEEE Micro's Top Picks from the Computer Architecture Conferences, 36(3). May/June 2016. [text] [doi] [press: MIT News, EEJournal, TOP500, HPCwire] | IEEE Micro 2016 |
A scalable architecture for ordered parallelism. In Proc. of the 48th IEEE/ACM international symposium on Microarchitecture (MICRO-48). December 2015. (Selected for IEEE Micro's Top Picks issue of "most significant papers in computer architecture based on novelty and long-term impact") [text] [doi] [slides] | MICRO 2015 |
Understanding and improving Bloom filter configuration for lazy address-set disambiguation. MASc Thesis, University of Toronto. 2011. [text] [slides] [code] | MASc Thesis |
Application-specific signatures for transactional memory in soft processors. ACM Transactions on Reconfigurable Technology and Systems, 4(3). August 2011. [doi] | ACM TRETS 2011 |
Understanding Bloom filter intersection for lazy address-set disambiguation. In Proc. of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). June 2011. [text] [doi] [slides] [code] | SPAA 2011 |
Application-specific signatures for transactional memory in soft processors. In Proc. of the 6th international symposium on Applied Reconfigurable Computing (ARC). March 2010. (Best Paper Award) [doi] | ARC 2010 |
Teaching
Winter 2025 | ECE1755 Parallel Computer Architecture and Programming |
Winter 2024 | ECE1755 Parallel Computer Architecture and Programming |
Fall 2023 | ECE552 Computer Architecture and ECE253 Digital and Computer Systems |
Winter 2023 | ECE1755 Parallel Computer Architecture and Programming |
Fall 2022 | ECE552 Computer Architecture and ECE253 Digital and Computer Systems |
Winter 2022 | ECE1755 Parallel Computer Architecture and Programming |
Fall 2021 | ECE552 Computer Architecture |
Winter 2021 | ECE1755 Parallel Computer Architecture and Programming |
Fall 2020 | ECE552 Computer Architecture |