Vector Extended Soft Processor Architecture (VESPA)

A parameterized soft vector processor fully implemented on a DE3 FPGA board

The goal of this work is to encourage simpler software development over difficult hardware design by offering FPGA designers the option of scaling the performance of data parallel computations using a soft vector processor. Currently, FPGA designers implement their systems primarily in hardware using an HDL description, with a few least-critical computations and system control tasks implemented in C and executed on a soft processor such as Altera's Nios II or Xilinx's Microblaze. If these soft processors could scale their performance, they could be used to implement more critical computations hence reducing the amount of laborious hardware design necessary to implement a complete FPGA system. In the long term, our goal is to develop EDA tools for automatically generating soft processors that exploit both the properties in an application and the features in an FPGA.

To explore the feasibility of soft vector processor we have designed VESPA which implements a full soft vector processor and can execute industry-standard EEMBC benchmarks on the Altera DE3 board connected to a DDR2 memory. The VESPA processor is comprised of a MIPS scalar processor generated from SPREE and a hand-made vector coprocessor implementing most integer and fixed-point instructions from the VIRAM instruction set. VESPA's two key features are its scalability and flexibility.

Scalability

To lure FPGA designers into using a soft vector processor it must be able to effectively convert area into performance. Across a set of benchmarks, many from the EEMBC benchmark suite, we show that an FPGA designer can increase the number of vector lanes from 1 to 32 and achieve up to 27x speedup (average 15x) as seen in the figure below.

Flexibility

VESPA is highly parameterized allowing it to finely span a large design space. Its parameters allow a user to modify its compute architecture, instruction set, and memory system in several ways. A table listing its parameters are shown below.

Parameter Symbol Value Range
Compute
Vector Lanes L 1,2,4,8,16,...
Memory Crossbar Lanes M 1,2,4,8,... L
Multiplier Lanes X 1,2,4,8,... L
Register File Banks B 1,2,4,...
ALU per Bank APB true/false
Instruction Set
Maximum Vector Length MVL 2,4,8,16,...
Vector Lane Bit-Width W 1,2,3,4,..., 32
Enable Each Vector Instruction - on/off
Memory
ICache Depth (KB) ID 4,8,...
ICache Line Size (B) IW 16,32,64,...
DCache Depth (KB) DD 4,8,...
DCache Line Size (B) DW 16,32,64,...
DCache Miss Prefetch (num cache lines) DPK 1,2,3,...
Vector Miss Prefetch (num cache lines) DPV 1,2,3,...

Combining all the above parameters results in an enormous design space. After pruning less interesting points, we explore the design space of 768 configurations as shown in the figure below. In terms of area, VESPA can span a range of 28x between the smallest and biggest configurations. In terms of wall clock time, the range is 18x between the fastest and slowest configurations.


People

Publications

Links

Download


[ECE Home] [APSC Home Page] [UofT Home Page]
Last Updated June 26, 2009;