Building a Multi-FPGA Virtualized Restricted Boltzmann Architecture Using Embedded MPI

Charles Lo and Paul Chow

University of Toronto

February, 2011

Machines (RBMs). However, the network size for most is limited by the amount of available on-chip memory. Therefore, many FPGAs are required to implement the very large networks used in real-world applications. A virtualized design can time-multiplex the hardware resources and handle much larger networks, but it suffers a performance penalty from context switching. In this paper, we present a number of improvements to a virtualized FPGA architecture for RBMs. First, we take advantage of 16-bit arithmetic to pack larger networks onto a chip. Second, a custom DMA engine is designed to reduce the performance impact of the large number of memory transactions. Finally, the architecture is scaled to multiple FPGAs to gain additional performance through coarse-grained parallelism. The design effort required to implement these changes is minimized through the use of an embedded MPI framework.
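
As a rough illustration of the virtualization idea described above, the following Python sketch computes RBM hidden-unit pre-activations one tile of weights at a time, using 16-bit fixed-point weights. It is a software analogy only, not the authors' hardware design: the Q4.12 fixed-point format, the names `to_fixed` and `tiled_preactivations`, the `tile_rows` parameter, and the use of binary visible units are all assumptions made for illustration. Each tile load stands in for a context switch and its associated memory transfer.

```python
FRAC_BITS = 12  # assumed Q4.12 format; the abstract only says "16-bit arithmetic"


def to_fixed(x):
    """Quantize a float to a signed 16-bit fixed-point integer, saturating."""
    q = round(x * (1 << FRAC_BITS))
    return max(-32768, min(32767, q))


def tiled_preactivations(weights, visible, tile_rows):
    """Compute the weight-matrix / visible-vector product tile by tile.

    weights:   list of rows, one row of 16-bit fixed-point ints per hidden unit
    visible:   list of 0/1 visible-unit states (binary units are an assumption)
    tile_rows: hypothetical number of hidden units whose weights fit on chip
    Returns (pre-activations as fixed-point ints, number of tile loads).
    """
    out = []
    loads = 0
    for start in range(0, len(weights), tile_rows):
        # Slicing stands in for transferring one tile from off-chip memory.
        tile = weights[start:start + tile_rows]
        loads += 1
        for row in tile:
            # Binary visible units simply select which weights are summed;
            # Python ints act as a wide accumulator, so nothing overflows here.
            out.append(sum(w for w, v in zip(row, visible) if v))
    return out, loads


# Small usage example with made-up weights.
W = [[to_fixed(x) for x in row]
     for row in [[0.5, -0.25, 1.0], [0.125, 0.75, -1.5], [2.0, 0.0, -0.5]]]
v = [1, 0, 1]
acts, loads = tiled_preactivations(W, v, tile_rows=2)
print([a / (1 << FRAC_BITS) for a in acts], loads)
```

With three hidden units and room for only two rows at a time, the sketch needs two tile loads. Shrinking `tile_rows` raises the number of loads, which mirrors the trade-off between supported network size and context-switch overhead that the abstract describes.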