
FPGA Research

At UofT, we have the largest group of researchers working on many aspects of FPGA technology. Here I describe my involvement in these projects.

FPGA technology is still new and developing, so there are many interesting aspects to study and improve, including:

Reconfigurable Processors

An interesting use of reconfigurable technology is to use it for computing. To a first order, the goal is to compile code into a hardware description instead of a sequence of machine instructions. The hope is that the hardware configuration will result in a shorter processing time. This is the first time that we have had a technology that really allows us to explore alternatives to the typical von Neumann style of architectures.

There are hurdles to overcome before this becomes a mainstream technology. One is that reconfigurable technology is inherently slower than full custom or gate array technology, so this approach may not provide speedup in all cases. A more serious issue is that creating a good hardware description is an extremely difficult problem that will provide opportunities for many more Ph.D. dissertations. It will be a while longer before you get new flags on the GNU C compiler that target FPGAs.

The fact that there are so many problems also means that there are many interesting research topics and many different approaches that one can take. The Berkeley Brass Research Group is a key player and maintains a good list of links to other work.

Many are looking at using FPGAs as the complete solution, meaning that the complete functionality of the computation is described as hardware. The approach that I am taking, along with some others, is to include both a programmable processor and reconfigurable hardware. Here, we have been focusing on the interface between the two components [1,2,3,4,5,6,7]. This type of processor could be used in two situations: one where the reconfigurable hardware aids in the computation, and the other where the reconfigurable hardware is used to develop an interface to the outside world. The latest work was done by Jorge Carrillo [6,7]. He has created a OneChip Homepage.

Application-Specific Reconfigurable Multiprocessors

Now (circa 2003) that FPGAs have become so large and powerful, it is possible to think of ways to use them for aspects of high-performance computing. One way to speed up computation is to use an application-specific processor (ASP), defined to be a processor that does a computation by taking advantage of specific knowledge about the problem. This typically means that custom hardware is developed. With the advent of VLSI and microprocessor technology, it became more difficult to develop ASPs in a timely manner. This is because VLSI development is a risky and long process, and microprocessor speeds often improve faster than it is possible to build an ASP to solve the same problem, i.e., by the time the chip is done, the software implementation using a current microprocessor can run at about the same speed.

However, with modern FPGAs, the risk and time of hardware development is significantly reduced. There is also the added advantage that the designs (the HDL code) are generally portable. Like recompiling a C program for a faster processor, it is possible to recompile your HDL design for a newer and better FPGA. Therefore, there are now emerging opportunities to achieve, once again, a leap in performance above software implementations on microprocessors by developing specialized hardware.
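The race between hardware development time and microprocessor improvement can be put into rough numbers. The sketch below assumes, purely for illustration, that microprocessor performance doubles at a fixed interval while the hardware design is frozen on the day work begins. The function name, the doubling period, the development times, and the initial speedup are all hypothetical and are not figures from this work.

```python
def remaining_speedup(initial_speedup, dev_months, doubling_months):
    """Speedup custom hardware still has over software on the day it ships.

    Assumption (hypothetical): microprocessor performance doubles every
    `doubling_months`, and the hardware's speed is fixed when design starts.
    """
    software_gain = 2.0 ** (dev_months / doubling_months)
    return initial_speedup / software_gain


# Hypothetical VLSI-style project: 4x faster on paper, 36 months to build,
# processors doubling every 18 months. Software has caught up by delivery.
print(remaining_speedup(4.0, 36, 18))  # 1.0

# The same 4x design delivered in 6 months keeps most of its lead.
print(remaining_speedup(4.0, 6, 18))
```

Under these assumptions, shortening the development time matters as much as the raw speedup, which is the opening that FPGAs provide.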

This project is starting by investigating the use of FPGAs for computing the molecular dynamics (MD) problem as it applies to biomolecular computing. Some early work shows the promise of this approach [8,9].

The goal is to build a machine that, in many ways, looks like a multiprocessor, but the processing nodes are FPGAs that can be programmed to implement a particular computation. We use the term Application-Specific Reconfigurable Multiprocessor for this class of architectures.

For the MD problem, we are currently calling our system the TMD Machine. The advantage of the TMD Machine is that it should be at least 100 times less costly than the large-scale multiprocessors and distributed computing systems currently in use for this application, and run faster, resulting in a significant (at least 100x) price/performance advantage. The machine should also be significantly less complex and easier to build than current multiprocessors. Only commodity parts will be used.
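The price/performance claim follows from simple arithmetic, sketched here. The function and its parameter names are illustrative only. The 100x cost ratio is the target stated above, and the speed ratios are hypothetical placeholders rather than measurements.

```python
def price_performance_advantage(cost_ratio, speed_ratio):
    """Factor by which performance-per-dollar improves.

    cost_ratio  = baseline cost / new cost    (100 means 100x cheaper)
    speed_ratio = new speed / baseline speed  (1 means equally fast)
    """
    return cost_ratio * speed_ratio


# 100x cheaper at merely equal speed already gives the 100x floor.
print(price_performance_advantage(100, 1))  # 100

# Any speedup on top of that multiplies the advantage (2x is hypothetical).
print(price_performance_advantage(100, 2))  # 200
```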

In many ways, this project is a super Transmogrifier. The next item discusses the Transmogrifier in more detail.

David Chui, Sam Lee and Chris Comis are grad students of mine working on this project. Professor Régis Pomès and his student, Chris Madill, are collaborators on this project from the Biochemistry world.

More details to come as the project develops.

The Transmogrifier

This is a large project that is joint work with Professors Rose and Lewis.

The Transmogrifier is in the class of machines called FPGA Systems. These are machines consisting of a number of FPGAs connected so that they can be used as a reconfigurable hardware platform.

A simple description of the Transmogrifier is that it is an interconnection of a number of FPGAs using programmable interconnect chips. In a sense, we are building a very large FPGA using existing off-the-shelf FPGAs and field-programmable interconnection chips.
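As a rough mental model only, a programmable interconnect chip can be thought of as a configurable table that joins pins on different FPGAs into nets. The toy class below is a hypothetical illustration of that idea. Its names, its port numbering, and its one-net-per-port rule are assumptions and do not describe the actual Transmogrifier hardware or software.

```python
class ProgrammableInterconnect:
    """Toy model of a field-programmable interconnect chip (hypothetical).

    A port is an (fpga_name, pin_number) pair. Configuring the chip joins
    ports into nets; clearing and reconfiguring it rewires the system
    without touching the FPGAs themselves.
    """

    def __init__(self):
        self._net_of = {}    # port -> net id
        self._next_net = 0

    def connect(self, *ports):
        """Join the given ports into one new net and return its id."""
        for port in ports:
            if port in self._net_of:
                raise ValueError(f"port {port} is already routed")
        net = self._next_net
        self._next_net += 1
        for port in ports:
            self._net_of[port] = net
        return net

    def connected(self, a, b):
        """True if ports a and b are currently on the same net."""
        net = self._net_of.get(a)
        return net is not None and net == self._net_of.get(b)

    def clear(self):
        """Erase the configuration so the system can be rewired."""
        self._net_of.clear()


xbar = ProgrammableInterconnect()
xbar.connect(("fpga0", 12), ("fpga3", 7))
print(xbar.connected(("fpga0", 12), ("fpga3", 7)))  # True
print(xbar.connected(("fpga0", 12), ("fpga1", 7)))  # False
```

From the point of view of a user circuit, the FPGAs plus a configurable routing table of this kind behave like one larger programmable device.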

We started with the Transmogrifier 1 (TM-1), a small system [10] using a board from Aptix that contained one of their interconnect chips. This board had four Xilinx XC4010 FPGAs.

Based on what we learned from the TM-1, we set out to build a One-Million gate system (the TM-2), but with the rapid improvement of technology, we have now built a Two-Million gate system (the TM-2a) [11,12]. This system uses 10K50 (TM-2) or 10K100 (TM-2a) devices from Altera and programmable interconnect chips from I-Cube.

An important element of the TM-2 work is the design environment. The Transmogrifier-2 is accessible over the network and we have developed a protocol and C libraries that make it easy to communicate with the user circuit from a program running on a workstation somewhere on the network.

The machine is intended to be used for hardware prototyping and reconfigurable computing. Currently we are trying out some neat graphics algorithms, some image processing, and face recognition. My students have implemented a highly pipelined DES cracker [13] that tries to find the key that has been used for a message encrypted by DES. On the 16-board TM-2a, we can achieve 800M key cracks/second, which we claim to be the fastest programmable solution that fits in a cubic metre! The only faster solutions use a large number of workstations coordinated over the Internet (distributed.net) or customized hardware. A slide presentation is also available.
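To put the 800M keys/second figure in perspective, the arithmetic below estimates how long an exhaustive search of the 56-bit DES key space would take at that rate. It is a back-of-the-envelope sketch: it assumes the quoted rate is sustained for the whole search and that the key is uniformly random, so on average half the key space is tried. The function name is illustrative.

```python
KEYSPACE = 2 ** 56   # DES has a 56-bit key
RATE = 800e6         # keys/second on the 16-board TM-2a (figure from the text)
BOARDS = 16


def search_days(keyspace, keys_per_second, fraction=1.0):
    """Days needed to try `fraction` of the key space at a constant rate."""
    seconds = fraction * keyspace / keys_per_second
    return seconds / 86400.0


print(RATE / BOARDS)                              # 50000000.0 keys/s per board
print(round(search_days(KEYSPACE, RATE)))         # 1042 days, worst case
print(round(search_days(KEYSPACE, RATE, 0.5)))    # 521 days, on average
```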

After the construction of the TM-2a, we developed the next generation, the TM-3. Having had our fill of building multi-board systems, we decided that our next system must fit on one board. The TM-3 has four Xilinx XCV2000E Virtex chips, the latest generation devices at the time, and a fixed interconnect.

As FPGA technology continues to progress, we are now developing the TM-4. It is the M.A.Sc. project of Josh Fender working under the supervision of Prof. Rose. The improvements over the TM-3 architecture are to provide faster interconnect between the FPGAs (using LVDS), and faster and more external memory. Again, we will use the latest in technology. For the TM-4, we will use four Altera Stratix S80 devices.

Check out the TM-2 Home Page, the TM-3 Home Page and the TM-4 Home Page!

Architecture

This is the study of what the FPGA should look like inside so that it can be used to implement interesting circuits efficiently. At UofT, we did a lot of the early work in this area. I have been involved in several architectural studies [14,15,16,17] and there have been others done here by Professors Rose, Lewis, and Brown.

Circuit Design and Implementation

Much of this aspect is closely related to the architectural work that we do here. To keep our architectural studies honest, we also try to build real FPGA chips to make sure that our architectural modeling has been done correctly [18,19,20,21,22,23].

Programmable Analogue Arrays

We have also played with programmable analogue arrays. Professor Gulak is the leader of this work. Professor Gulak and I supervised Paul Chow (no relation!) on a project to build a field-programmable mixed analogue/digital array [24,25]. This combines digital FPGA circuitry with the programmable analogue circuits. The main issue we explored was the interface between the two devices.

I also worked with Professor Gulak and Dean D'Mello who used a language called Ballistic to describe an analogue FPGA [26]. Ballistic is a high-level layout language.

Just Building Things

Sometimes, we want to build something and FPGAs are a reasonable medium for doing so. One of our most successful projects was to build a Viterbi decoder [27,28,29] using Xilinx FPGAs, which Xilinx generously provided. This was a demonstration of another project that developed the architecture for the decoder.

On-Line Publications Copyright Information

General Disclaimer

The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Bibliography

1: Robert Jeschke.
An FPGA Based Reconfigurable Coprocessor for the IBM PC.
Master's thesis, University of Toronto, Department of Electrical and Computer Engineering, Toronto, Ontario, M5S 3G4, 1994.
2: Ralph D. Wittig.
OneChip: An FPGA Processor With Reconfigurable Logic.
Master's thesis, University of Toronto, Department of Electrical and Computer Engineering, Toronto, Ontario, M5S 3G4, 1995.
thesis.ps (509116), thesis.ps.gz (151124) .
3: Ralph D. Wittig and Paul Chow.
OneChip: An FPGA Processor With Reconfigurable Logic.
In The Fourth Annual IEEE Symposium on FPGAs for Custom Computing Machines FCCM'96, pages 126-135. IEEE, March 1996.
paper.ps (155887), paper.ps.gz (49472), paper.pdf (1059698), slides.ps.gz (29628) .
4: Jeffrey A. Jacob and Paul Chow.
Memory Interfacing and Instruction Specification for Reconfigurable Processors.
In International Symposium on Field-Programmable Gate Arrays, pages 145-154. ACM/SIGDA, February 1999.
paper.ps (811055), paper.ps.gz (236224), paper.pdf (95228) .
5: Jeffrey A. Jacob.
Memory Interfacing for the OneChip Reconfigurable Processor.
Master's thesis, University of Toronto, Department of Electrical and Computer Engineering, Toronto, Ontario, M5S 3G4, 1998.
thesis.ps.gz (645782) .
6: Jorge Ernesto Carrillo Esparza.
Evaluation of the OneChip Reconfigurable Processor.
Master's thesis, University of Toronto, Department of Electrical and Computer Engineering, Toronto, Ontario, M5S 3G4, September 2000.
thesis.ps.gz (143442) .
7: Jorge E. Carrillo E. and Paul Chow.
The Effect of Reconfigurable Units in Superscalar Processors.
In International Symposium on Field-Programmable Gate Arrays, pages 141-150. ACM, February 2001.
paper.ps (363276), paper.ps.gz (146728), paper.pdf (201396) .
8: Ian Kuon, Navid Azizi, Ahmad Darabiha, Aaron Egier, and Paul Chow.
FPGA-Based Supercomputing: An Implementation for Molecular Dynamics.
ACM International Symposium on Field-Programmable Gate Arrays Poster Session, February 2004.
poster.pdf (3663118) .
9: Navid Azizi, Ian Kuon, Aaron Egier, Ahmad Darabiha, and Paul Chow.
Reconfigurable Molecular Dynamics Simulator.
In Symposium on Field-Programmable Custom Computing Machines (FCCM'04), pages 197-206. IEEE, April 2004.
paper.pdf (173430), slides.pdf (356937), slidesdetailed.pdf (526704) .
10: David Galloway, David Karchmer, Paul Chow, David Lewis, and Jonathan Rose.
The Transmogrifier: The University of Toronto Field-Programmable System.
In 1994 Canadian Workshop on Field-Programmable Devices, Kingston, Ontario, June 1994.
paper.ps.gz (20279) .
11: David M. Lewis, David R. Galloway, Marcus van Ierssel, Jonathan Rose, and Paul Chow.
The Transmogrifier-2: A 1 Million Gate Rapid Prototyping System.
In 1997 International Symposium on Field-Programmable Gate Arrays, pages 53-61. ACM/SIGDA, February 1997.
paper.ps.gz (409501) .
12: David M. Lewis, David R. Galloway, Marcus van Ierssel, Jonathan Rose, and Paul Chow.
The Transmogrifier-2: A 1 Million Gate Rapid Prototyping System.
IEEE Transactions on VLSI Systems, 6(2):188-198, June 1998.
paper.pdf (190473) .
13: Ivan Hamer and Paul Chow.
DES Cracking on the Transmogrifier 2a.
In Cetin Kaya Koc and Christof Paar, editors, Cryptographic Hardware and Embedded Systems, pages 13-24. Springer-Verlag Lecture Notes in Computer Science (LNCS 1717), 1999.
Presented August 12, 1999. paper.ps.gz (80614), slides .
14: Jonathan Rose, Robert J. Francis, Paul Chow, and David Lewis.
The Effect of Logic Block Complexity on Area of Programmable Gate Arrays.
In Custom Integrated Circuits Conference, pages 5.3.1-5.3.5. IEEE, May 1989.
paper.pdf (331773) .
15: Jonathan Rose, Robert J. Francis, David Lewis, and Paul Chow.
Architecture of Field Programmable Gate Arrays: The Effect of Logic Block Functionality on Area Efficiency.
IEEE Journal of Solid-State Circuits, 25(5):1217-1225, October 1990.
paper.pdf (828316) .
16: Satwant Singh, Jonathan Rose, David Lewis, Kevin Chung, and Paul Chow.
Optimization of Field-Programmable Gate Array Logic Block Architecture for Speed.
In Custom Integrated Circuits Conference, pages 6.1.1-6.1.6. IEEE, May 1991.
paper.pdf (409783) .
17: Satwant Singh, Jonathan Rose, Paul Chow, and David Lewis.
The Effect of Logic Block Architecture on FPGA Performance.
IEEE Journal of Solid-State Circuits, 27(3):281-287, March 1992.
paper.pdf (724665) .
18: Paul Chow, Soon Ong Seo, Dennis Au, Terrence Choy, Bahram Fallah, David Lewis, Cherry Li, and Jonathan Rose.
A 1.2 µm CMOS FPGA Using Cascaded Logic Blocks and Segmented Routing.
In Will Moore and Wayne Luk, editors, FPGAs, chapter 3.2, pages 91-102. Abingdon EE&CS Books, 15 Harcourt Way, Abingdon OX14 1NV, England, 1991.
Presented at the Oxford 1991 International Workshop on Field Programmable Logic and Applications, paper.ps.gz (60692), paper.pdf (51599) .
19: Kevin Chung, Satwant Singh, Jonathan Rose, and Paul Chow.
Using Hierarchical Logic Blocks to Improve the Speed of Field-Programmable Gate Arrays.
In Will Moore and Wayne Luk, editors, FPGAs, chapter 3.3, pages 103-113. Abingdon EE&CS Books, 15 Harcourt Way, Abingdon OX14 1NV, England, 1991.
Presented at the Oxford 1991 International Workshop on Field Programmable Logic and Applications.
20: Soon Ong Seo.
A High Speed Field-Programmable Gate Array Using Programmable Minitiles.
Master's thesis, University of Toronto, Department of Electrical and Computer Engineering, Toronto, Ontario, M5S 3G4, 1994.
Co-supervised with Jonathan Rose.
21: Paul Chow, Soon Ong Seo, Kevin Chung, Gerard Paez, and Jonathan Rose.
A High-Speed FPGA Using Programmable Mini-tiles.
In Symposium on Integrated Systems, previously the Conference on Advanced Research in VLSI, pages 103-122, March 1993.
paper.ps (664109), paper.ps.gz (101397) .
22: Paul Chow, Soon Ong Seo, Jonathan Rose, Kevin Chung, Gerard Páez-Monzón, and Immanuel Rahardja.
The Design of an SRAM-Based Field-Programmable Gate Array, Part I: Architecture.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 7(2):191-197, June 1999.
paper.pdf (178930) .
23: Paul Chow, Soon Ong Seo, Jonathan Rose, Kevin Chung, Gerard Páez-Monzón, and Immanuel Rahardja.
The Design of an SRAM-Based Field-Programmable Gate Array, Part II: Circuit Design and Layout.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 7(3):321-330, September 1999.
paper.pdf (528590) .
24: Paul Chow.
A Field-Programmable Mixed-Analog-Digital Array.
Master's thesis, University of Toronto, Department of Electrical and Computer Engineering, Toronto, Ontario, M5S 3G4, 1994.
Co-supervised with Glenn Gulak, thesis.ps.gz (204218) .
25: Paul Chow, Paul Chow, and P. Glenn Gulak.
A Field-Programmable Mixed-Analog-Digital Array.
In 1995 International Symposium on Field-Programmable Gate Arrays, pages 104-109. ACM/SIGDA, February 1995.
paper.pdf (113781) .
26: Dean D'Mello.
Synthesis of FPAA Cores Using an Intermediate Language Layout Language Approach.
Master's thesis, University of Toronto, Department of Electrical and Computer Engineering, Toronto, Ontario, M5S 3G4, 1996.
27: David Yeh, Paul Chow, and Gennady Feygin.
A Multiprocessor Viterbi Decoder Using Xilinx FPGAs.
In 1994 Canadian Workshop on Field-Programmable Devices, Kingston, Ontario, June 1994.
paper.ps.gz (44485) .
28: David Chun-Chin Yeh.
A Multiprocessor Viterbi Decoder Using Xilinx FPGAs.
Master's thesis, University of Toronto, Department of Electrical and Computer Engineering, Toronto, Ontario, M5S 3G4, 1995.
thesis.ps.gz (196452) .
29: David Yeh, Gennady Feygin, and Paul Chow.
RACER: A Reconfigurable Constraint-Length 14 Viterbi Decoder.
In The Fourth Annual IEEE Symposium on FPGAs for Custom Computing Machines FCCM'96, pages 60-69. IEEE, March 1996.
paper.ps (190678), paper.ps.gz (58685), paper.pdf (1059698), slides.ps.gz (196452) .


Paul Chow 2005-01-02