Abstract. FPGAs are becoming an attractive platform for accelerating many computations, including scientific applications. However, the high development cost and short life span of FPGA designs have limited their adoption by the scientific computing community. We believe that FPGA-based scientific computing would become far more practical if there were hardware libraries that were portable to any FPGA, with performance that scales with the resources of the FPGA. To illustrate this idea, we have implemented one common supercomputing library function: the LU factorization method for solving systems of linear equations. This paper discusses issues in making the design both portable (which is primarily about the ease of use of memory external to the FPGA) and scalable. The compute engine is automatically generated to match the FPGA capabilities and external memory through the use of a wide range of parameters. We compared the performance of the engine on the largest FPGA available (an Altera Stratix III 3S340) to a single processor core fabricated in the same 65 nm IC process, and show that it performs LU factorization 2.2 times faster on matrices on the order of 10,000 x 10,000 elements, and that the energy dissipated per useful GFLOP is 5 times lower. We also note that, unless the very best software libraries from processor manufacturers are used, the FPGA results would be far better than those quoted.
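For readers unfamiliar with the library function named in the abstract, the following is a minimal software sketch of LU factorization with partial pivoting, followed by the triangular solves that yield the solution of a linear system. It is a textbook formulation in plain Python, not the hardware compute engine described in the paper; the function names `lu_factor` and `lu_solve` are illustrative choices, and partial pivoting is an assumption, since the abstract does not state the pivoting strategy used.

```python
def lu_factor(a):
    """Factor a square matrix so that P*A = L*U (textbook sketch).

    Returns (lu, perm): L (unit diagonal, below the diagonal) and U
    (on and above the diagonal) packed into one matrix, plus perm,
    where row i of the factored matrix is original row perm[i].
    """
    n = len(a)
    lu = [list(map(float, row)) for row in a]  # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the largest-magnitude entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Eliminate below the pivot and update the trailing submatrix.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored as L[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A*x = b given the packed factors from lu_factor."""
    n = len(lu)
    # Forward substitution: L*y = P*b (L has an implicit unit diagonal).
    y = [0.0] * n
    for i in range(n):
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    # Back substitution: U*x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


if __name__ == "__main__":
    A = [[2, 1, 1], [4, -6, 0], [-2, 7, 2]]
    b = [5, -2, 9]
    lu, perm = lu_factor(A)
    print(lu_solve(lu, perm, b))
```

The triply nested update loop is where almost all of the work lies: the factorization costs roughly 2n^3/3 floating-point operations for an n x n matrix, which is why matrices on the order of 10,000 x 10,000 elements make a meaningful benchmark.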