The goal of this work is to encourage simpler software development over difficult hardware design by offering FPGA designers the option of scaling the performance of data parallel computations using a soft vector processor. Currently, FPGA designers implement their systems primarily in hardware using an HDL description, with a few least-critical computations and system control tasks implemented in C and executed on a soft processor such as Altera's Nios II or Xilinx's Microblaze. If these soft processors could scale their performance, they could be used to implement more critical computations hence reducing the amount of laborious hardware design necessary to implement a complete FPGA system. In the long term, our goal is to develop EDA tools for automatically generating soft processors that exploit both the properties in an application and the features in an FPGA.
To explore the feasibility of soft vector processor we have designed VESPA which implements a full soft vector processor and can execute industry-standard EEMBC benchmarks on the Altera DE3 board connected to a DDR2 memory. The VESPA processor is comprised of a MIPS scalar processor generated from SPREE and a hand-made vector coprocessor implementing most integer and fixed-point instructions from the VIRAM instruction set. VESPA's two key features are its scalability and flexibility.
VESPA is highly parameterized allowing it to finely span a large design space. Its parameters allow a user to modify its compute architecture, instruction set, and memory system in several ways. A table listing its parameters are shown below.
|Memory Crossbar Lanes||M||1,2,4,8,... L|
|Multiplier Lanes||X||1,2,4,8,... L|
|Register File Banks||B||1,2,4,...|
|ALU per Bank||APB||true/false|
|Maximum Vector Length||MVL||2,4,8,16,...|
|Vector Lane Bit-Width||W||1,2,3,4,..., 32|
|Enable Each Vector Instruction||-||on/off|
|ICache Depth (KB)||ID||4,8,...|
|ICache Line Size (B)||IW||16,32,64,...|
|DCache Depth (KB)||DD||4,8,...|
|DCache Line Size (B)||DW||16,32,64,...|
|DCache Miss Prefetch (num cache lines)||DPK||1,2,3,...|
|Vector Miss Prefetch (num cache lines)||DPV||1,2,3,...|
Combining all the above parameters results in an enormous design space. After pruning less interesting points, we explore the design space of 768 configurations as shown in the figure below. In terms of area, VESPA can span a range of 28x between the smallest and biggest configurations. In terms of wall clock time, the range is 18x between the fastest and slowest configurations.