At UofT, we have the largest group of researchers working on many aspects of FPGA technology. Here I describe my involvement in these projects.
FPGA technology is still new and developing, so there are many interesting aspects to study and improve, including the topics described below.
There are hurdles to overcome before reconfigurable computing becomes a mainstream technology. Reconfigurable technology is inherently slower than full-custom or gate-array technology, so this approach may not provide a speedup in all cases. A more serious issue is that automatically creating a good hardware description is an extremely difficult problem, one that will provide material for many more Ph.D. dissertations. It will be a while yet before the GNU C compiler gets new flags that target FPGAs.
The fact that there are so many problems also means that there are many interesting research topics, and many different approaches that one can take. The Berkeley Brass Research Group is a key player and maintains a good list of links to other work.
Many groups are looking at using FPGAs as the complete solution, meaning that all of the functionality of the computation is described as hardware. The approach that I am taking, along with some others, is to combine a programmable processor with reconfigurable hardware. Here, we have been focusing on the interface between the two components [1,2,3,4,5,6,7]. This type of processor could be used in two situations: one where the reconfigurable hardware assists with the computation, and another where the reconfigurable hardware is used to build an interface to the outside world. The latest work was done by Jorge Carrillo [6,7], who has created a OneChip Homepage.
However, with modern FPGAs, the risk and time of hardware development are significantly reduced. There is also the added advantage that the designs (the HDL code) are generally portable: just as a C program can be recompiled for a faster processor, an HDL design can be recompiled for a newer and better FPGA. There are therefore emerging opportunities to once again achieve a leap in performance over software implementations on microprocessors by developing specialized hardware.
This project begins by investigating the use of FPGAs to compute the molecular dynamics (MD) problem as it applies to biomolecular computing. Some early work shows the promise of this approach [8,9].
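To give a sense of the computation involved, here is a minimal sketch of a naive pairwise force loop of the kind that dominates the run time of an MD time step. It assumes a Lennard-Jones potential in reduced units (epsilon = sigma = 1 by default), with no cutoff radius and no periodic boundaries; the function name `lj_forces` is hypothetical, and this is not the TMD design.

```python
# Hypothetical sketch: naive O(N^2) Lennard-Jones force calculation.
# Assumes reduced units, no cutoff radius, and no periodic boundary
# conditions. This only illustrates the arithmetic, not the TMD hardware.

def lj_forces(positions, epsilon=1.0, sigma=1.0):
    """Return the Lennard-Jones force vector [fx, fy, fz] on each particle.

    positions: list of (x, y, z) tuples.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(i + 1, n):          # visit each pair once
            xj, yj, zj = positions[j]
            dx, dy, dz = xi - xj, yi - yj, zi - zj
            r2 = dx * dx + dy * dy + dz * dz
            s6 = (sigma * sigma / r2) ** 3  # (sigma/r)^6
            # |F|/r = 24*eps*(2*(sigma/r)^12 - (sigma/r)^6) / r^2; a positive
            # value is repulsive and pushes particle i away from particle j.
            f_over_r = 24.0 * epsilon * (2.0 * s6 * s6 - s6) / r2
            for k, d in enumerate((dx, dy, dz)):
                forces[i][k] += f_over_r * d
                forces[j][k] -= f_over_r * d  # Newton's third law
    return forces
```

The double loop evaluates N(N-1)/2 independent pair interactions built from the same few multiplies and adds, which is the sort of regular arithmetic that can be deeply pipelined in hardware.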
The goal is to build a machine that, in many ways, looks like a multiprocessor, but the processing nodes are FPGAs that can be programmed to implement a particular computation. We use the term Application-Specific Reconfigurable Multiprocessor for this class of architectures.
For the MD problem, we are currently calling our system the TMD Machine. The advantage of the TMD Machine is that it should cost at least 100 times less than the large-scale multiprocessors and distributed computing systems currently used for this application, and also run faster, resulting in a significant (at least 100x) price/performance advantage. The machine should also be significantly less complex and easier to build than current multiprocessors. Only commodity parts will be used.
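The price/performance figure follows from combining the cost and speed targets. Write C for system cost and T for the time to complete a given MD simulation, with subscript TMD for the TMD Machine and MP for a conventional multiprocessor, and take performance to be 1/T. Assuming the stated targets C_MP / C_TMD >= 100 and T_TMD <= T_MP:

```latex
\frac{(\text{performance}/\text{price})_{\mathrm{TMD}}}{(\text{performance}/\text{price})_{\mathrm{MP}}}
  = \frac{1/(T_{\mathrm{TMD}}\,C_{\mathrm{TMD}})}{1/(T_{\mathrm{MP}}\,C_{\mathrm{MP}})}
  = \frac{T_{\mathrm{MP}}}{T_{\mathrm{TMD}}}\cdot\frac{C_{\mathrm{MP}}}{C_{\mathrm{TMD}}}
  \ge 1 \cdot 100 = 100
```

Any speedup beyond break-even multiplies the advantage further; for example, a machine that is twice as fast at one hundredth of the cost would have a 200x price/performance advantage.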
In many ways, this project is a super Transmogrifier. The next item discusses the Transmogrifier in more detail.
David Chui, Sam Lee and Chris Comis are grad students of mine working on this project. Professor Régis Pomès and his student, Chris Madill, are collaborators on this project from the Biochemistry world.
More details to come as the project develops.
The Transmogrifier is in the class of machines called FPGA Systems. These are machines consisting of a number of FPGAs connected so that they can be used as a reconfigurable hardware platform.
A simple description of the Transmogrifier is that it is an interconnection of a number of FPGAs using programmable interconnect chips. In a sense, we are building a very large FPGA using existing off-the-shelf FPGAs and field-programmable interconnection chips.
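As a toy illustration of the idea, and not the programming model of the Aptix or I-Cube parts, the sketch below treats a field-programmable interconnect chip as a crossbar: configuring the chip amounts to choosing which input pin drives each output pin, so a signal leaving one FPGA can be steered to any other FPGA attached to the chip. The class `CrossbarChip` and its methods are hypothetical.

```python
# Hypothetical toy model of a field-programmable interconnect chip.
# Each output port can be configured to follow any one input port;
# one input may fan out to several outputs.

class CrossbarChip:
    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.driver = {}  # output port -> input port that drives it

    def connect(self, in_port, out_port):
        """Configure the chip so that out_port follows in_port."""
        for p in (in_port, out_port):
            if not 0 <= p < self.num_ports:
                raise ValueError("port out of range")
        if out_port in self.driver:
            raise ValueError("output port already driven")
        self.driver[out_port] = in_port

    def propagate(self, pin_values):
        """Map {input port: bit} to {output port: bit} for configured routes."""
        return {out: pin_values[src] for out, src in self.driver.items()
                if src in pin_values}
```

For example, if ports 0 to 3 were wired to one FPGA and ports 4 to 7 to another, `chip.connect(1, 6)` would route a signal from the first FPGA to the second; changing the configuration re-wires the system without touching the boards.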
We started with the Transmogrifier 1 (TM-1), a small system [10] using a board from Aptix that contained one of their interconnect chips. This board had four Xilinx XC4010 FPGAs.
Based on what we learned from the TM-1, we set out to build a one-million-gate system (the TM-2), but with the rapid improvement in technology, we have now built a two-million-gate system (the TM-2a) [11,12]. These systems use Altera 10K50 (TM-2) or 10K100 (TM-2a) devices and programmable interconnect chips from I-Cube.
An important element of the TM-2 work is the design environment. The Transmogrifier-2 is accessible over the network and we have developed a protocol and C libraries that make it easy to communicate with the user circuit from a program running on a workstation somewhere on the network.
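The actual TM-2 protocol and C library calls are not described here, so the following is only a hypothetical illustration of the idea: a workstation program packs a request (a command, the address of a register in the user circuit, and a data word) into a fixed-size message that could be sent over a TCP connection, and decodes messages coming back. The 9-byte big-endian layout, the command codes, and the function names are all invented for this sketch.

```python
import struct

# Hypothetical message layout (NOT the real TM-2 protocol):
#   1 byte   command (CMD_READ or CMD_WRITE)
#   4 bytes  address of a register in the user circuit
#   4 bytes  data word (ignored for reads)
# "!" selects network (big-endian) byte order with no padding.
MSG_FORMAT = "!BII"
MSG_SIZE = struct.calcsize(MSG_FORMAT)  # 9 bytes
CMD_READ, CMD_WRITE = 0x01, 0x02

def pack_request(cmd, address, data=0):
    """Encode one request for the user circuit."""
    if cmd not in (CMD_READ, CMD_WRITE):
        raise ValueError("unknown command")
    return struct.pack(MSG_FORMAT, cmd, address, data)

def unpack_message(buf):
    """Decode one fixed-size message into (cmd, address, data)."""
    if len(buf) != MSG_SIZE:
        raise ValueError("expected %d bytes, got %d" % (MSG_SIZE, len(buf)))
    return struct.unpack(MSG_FORMAT, buf)
```

For example, `pack_request(CMD_WRITE, 0x10, 42)` yields 9 bytes that a client could pass to `socket.sendall()`. A fixed message size keeps the receiving side simple, since it always reads exactly `MSG_SIZE` bytes per message.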
The machine is intended to be used for hardware prototyping and reconfigurable computing. Currently we are trying out some neat graphics algorithms, some image processing, and face recognition.
My students have implemented a highly pipelined DES cracker [13] that tries to find the key that was used to encrypt a message with DES. On the 16-board TM-2a, we can test 800 million keys per second, which we claim is the fastest programmable solution that fits in a cubic metre! A slide presentation is also available. The only faster solutions use either a large number of workstations coordinated over the internet (distributed.net) or customized hardware.
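To put the rate of 800 million keys per second in perspective, the arithmetic below estimates how long an exhaustive search of the 56-bit DES key space would take at that rate. It also computes a per-board rate under the assumption that the work is split evenly across the 16 boards. On average, the correct key is found after searching half of the key space.

```python
DES_KEY_BITS = 56
KEYS_PER_SECOND = 800e6   # 16-board TM-2a rate
BOARDS = 16
SECONDS_PER_DAY = 86400

def search_times(key_bits=DES_KEY_BITS, rate=KEYS_PER_SECOND):
    """Return (worst_case_days, average_days) for a brute-force key search."""
    keyspace = 2 ** key_bits
    worst = keyspace / rate / SECONDS_PER_DAY
    return worst, worst / 2   # on average, half the key space is searched

# Assumes the load is divided evenly among the boards.
per_board = KEYS_PER_SECOND / BOARDS
worst_days, avg_days = search_times()
print(f"per board: {per_board / 1e6:.0f}M keys/s")
print(f"worst case: {worst_days:.1f} days, average: {avg_days:.1f} days")
```

This works out to 50M keys/second per board, and about 1,040 days (roughly 2.9 years) to search the whole key space, or about 520 days on average.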
After the construction of the TM-2a, we developed the next generation, the TM-3. Having had our fill of building multi-board systems, we decided that our next system must fit on one board. The TM-3 has four Xilinx XCV2000E Virtex chips (the latest-generation devices at the time) connected by a fixed interconnect.
As FPGA technology continues to progress, we are now developing the TM-4, the M.A.Sc. project of Josh Fender working under the supervision of Prof. Rose. The improvements over the TM-3 architecture are faster interconnect between the FPGAs (using LVDS) and faster, larger external memory. Again, we will use the latest technology: the TM-4 will use four Altera Stratix S80 devices.
Check out the TM-2 Home Page, the TM-3 Home Page and the TM-4 Home Page!
FPGA architecture is the study of what an FPGA should look like inside so that it can implement interesting circuits efficiently. At UofT, we did much of the early work in this area. I have been involved in several architectural studies [14,15,16,17], and others have been done here by Professors Rose, Lewis, and Brown.
I also worked with Professor Gulak and Dean D'Mello, who used Ballistic, a high-level layout language, to describe an analogue FPGA [26].
General Disclaimer
The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.