I teach at the Department of Electrical and Computer Engineering at the University of Toronto, where I have the privilege of working with a team of very talented graduate students. I have also taught at Northwestern University, USA, the University of Athens, Greece, and the Hellenic Open University, Greece, and served as an invited professor at the École polytechnique fédérale de Lausanne, Switzerland. I received a Bachelor's and a Master's degree from the University of Crete, Greece, and a Ph.D. from the University of Wisconsin-Madison.

My research interests lie primarily in the design of performance-, energy-, and/or cost-optimized computing engines for various application domains. Most of my work thus far has been on high-performance general-purpose systems. My current work emphasizes highly specialized computing engines for Deep Learning. I will also be serving as the Director of the newly formed Natural Sciences and Engineering Research Council Strategic Partnership Network on Machine Learning Hardware Acceleration (NSERC COHESA), a partnership of 19 researchers across 7 universities involving 8 industrial partners.

For the work I have done with my students and collaborators, I have been awarded the ACM SIGARCH Maurice Wilkes mid-career award, a National Science Foundation CAREER Award, two IBM Faculty Partnership awards, a Semiconductor Research Innovation award, an IEEE Top Picks in Computer Architecture Research selection, and a MICRO conference Hall of Fame induction. I have served as the Program Chair for the ACM/IEEE International Symposium on Microarchitecture and the ACM/IEEE International Symposium on Performance Analysis of Systems and Software. I am also a Fellow of the ACM.

We are developing, designing, and demonstrating a novel class of hardware accelerators for Deep Learning networks whose key feature is that they are value-based. Conventional accelerators rely mostly on the structure of the computation, that is, which calculations are performed and how they communicate. Value-based accelerators further boost performance by taking advantage of expected properties of the runtime value stream, such as dynamically redundant or ineffectual computations, the distribution of values, or even their bit content. In short, our accelerator designs reduce the amount of work that needs to be performed for existing neural models, and do so transparently to the model designer.

Why are we pursuing these designs? Because Deep Learning is transforming our world by leaps and bounds. One of the three drivers behind Deep Learning's success is the computing hardware that enabled its first practical applications. While algorithmic improvements will allow Deep Learning to evolve, much hinges on hardware's ability to keep delivering ever higher performance and data storage and processing capability. As Dennard scaling has ceased, the only viable way to do so is through architectural specialization.
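To make the idea concrete, here is a minimal software sketch of the simplest value property we exploit: a multiply-accumulate where either operand is zero cannot change the output, so a value-based design can skip it. This is an illustration only, not our hardware, and all names in it are invented for this example:

<code python>
import random

def effectual_work(activations, weights):
    """Return (total MACs, effectual MACs): all multiply-accumulates in a
    dot product vs. only those whose product can change the result."""
    total = len(activations)
    effectual = sum(1 for a, w in zip(activations, weights)
                    if a != 0 and w != 0)
    return total, effectual

random.seed(0)
# ReLU layers leave roughly half the activations at zero, and pruning
# zeroes out many weights; both make MACs ineffectual at runtime.
acts = [max(0.0, random.gauss(0, 1)) for _ in range(1024)]
wts  = [random.gauss(0, 1) if random.random() > 0.3 else 0.0
        for _ in range(1024)]

total, useful = effectual_work(acts, wts)
print(f"{useful}/{total} MACs are effectual -> "
      f"up to {total / useful:.1f}x ideal speedup from skipping the rest")
</code>

A fixed-schedule design must execute all of these MACs; a value-based design observes the operands at runtime and performs only the effectual ones, with no change to the model itself.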

The figure below highlights the potential and motivation of some of the methods we have developed:

A: remove zero activations; W: remove zero weights; Ap: use dynamic per-group precision for activations; Ae: skip ineffectual terms after Booth-encoding the activations (a sketch of the Ap idea follows below). We also have designs that exploit weight precision (see LOOM below) and yet-to-be-released designs that exploit further properties :)
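As an illustration of the Ap idea, here is a small hypothetical sketch, assuming non-negative, quantized post-ReLU activations; the function names are ours for this example only. It estimates how few bits groups of runtime activation values actually need compared to a fixed 16-bit datapath:

<code python>
import random

def bits_needed(group):
    """Smallest unsigned bit width that represents every activation in
    the group (assumes non-negative, quantized activation values)."""
    return max(1, max(v.bit_length() for v in group))

def avg_group_precision(activations, group_size=16, full_precision=16):
    groups = [activations[i:i + group_size]
              for i in range(0, len(activations), group_size)]
    avg = sum(bits_needed(g) for g in groups) / len(groups)
    # A bit-serial engine whose runtime scales with operand precision
    # would see roughly full_precision / avg speedup on this stream.
    return avg, full_precision / avg

random.seed(0)
# Post-ReLU activations are small far more often than a fixed 16-bit
# datapath assumes, so per-group precision is usually much lower.
acts = [int(max(0.0, random.gauss(0, 1)) * 64) for _ in range(4096)]

avg, speedup = avg_group_precision(acts)
print(f"average per-group precision: {avg:.1f} bits -> "
      f"~{speedup:.1f}x over a fixed 16-bit datapath")
</code>

The hardware detects each group's precision on the fly, so the benefit comes entirely from the values the model happens to produce, with no retraining required.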

These IEEE Micro and IEEE Computer articles present our rationale and summarize some of our designs. The most recently publicly disclosed design is Bit-Tactical, which targets, but does not require, sparse networks.

The tables below summarize key characteristics of several of our designs: