Michael Stumm: Alumni

Ph.D. Alumni: Duncan G. Elliott

Reference:

Duncan G. Elliott
Computational RAM: A Memory SIMD Hybrid
Ph.D. Thesis, Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada, 1998.
Co-supervised by: Prof. W. Martin Snelgrove

Supervisor(s):

Michael Stumm
W. Martin Snelgrove

Download Thesis:

PDF

Abstract:

In this thesis, a novel computer architecture called Computational RAM (C-RAM) is proposed and implemented. C-RAM is semiconductor random access memory with processors incorporated into the design, while retaining a memory interface. C-RAM can be used to build an inexpensive, massively parallel computer. Applications that contain the appropriate parallelism will typically run thousands of times faster on C-RAM than on the CPU. This work includes the design and implementation of the architecture as a working chip with 64 processor elements (PEs), a PE design for a 2048-PE 4 Mbit DRAM, and applications.

C-RAM is the first processor-in-memory architecture that is scalable across many generations of DRAM. This scalability is obtained by pitch-matching narrow 1-bit PEs to the memory and restricting communication to 1-dimensional interconnects. The PEs are pitch-matched to memory columns so that they can be connected to the sense amplifiers. The 1-bit wide data path is suitable for a narrow, arrayable VLSI implementation, is compatible with memory redundancy, and has the highest performance/cost ratio among hardware arithmetic algorithms. For scalability, the memory arrays and memory-style packaging limit the internal interprocessor communications to 1-dimensional networks. Of these networks, both a broadcast bus network and a left-right nearest-neighbor network are implemented.
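
The idea of 1-bit PEs operating in lockstep on memory columns can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the thesis's PE design: it assumes operands are stored one bit per memory row (least significant bit first), that each PE keeps a single carry bit, and that a nearest-neighbor step simply passes one bit to the PE on the right. All function names are hypothetical.

```python
from typing import List, Tuple

def to_bit_rows(values: List[int], width: int) -> List[List[int]]:
    """Lay out one integer per PE column, one bit per memory row (LSB first)."""
    return [[(v >> bit) & 1 for v in values] for bit in range(width)]

def from_bit_rows(rows: List[List[int]]) -> List[int]:
    """Reassemble the per-column integers from bit rows (LSB first)."""
    n = len(rows[0])
    return [sum(rows[bit][pe] << bit for bit in range(len(rows))) for pe in range(n)]

def simd_bit_serial_add(a_rows: List[List[int]], b_rows: List[List[int]]) -> List[List[int]]:
    """Bit-serial addition: every PE runs the same 1-bit full-adder step per row."""
    n = len(a_rows[0])
    carry = [0] * n  # assumed: one carry register per PE
    out = []
    for a_row, b_row in zip(a_rows, b_rows):
        sum_row = [a ^ b ^ c for a, b, c in zip(a_row, b_row, carry)]
        carry = [(a & b) | (c & (a ^ b)) for a, b, c in zip(a_row, b_row, carry)]
        out.append(sum_row)
    out.append(carry)  # the final carry-out becomes the top bit row
    return out

def shift_right_neighbor(row: List[int], fill: int = 0) -> List[int]:
    """Nearest-neighbor step: each PE receives the bit held by its left neighbor."""
    return [fill] + row[:-1]

def demo() -> Tuple[List[int], List[int]]:
    a = [3, 5, 7, 250]
    b = [1, 10, 9, 6]
    total = from_bit_rows(simd_bit_serial_add(to_bit_rows(a, 8), to_bit_rows(b, 8)))
    shifted = shift_right_neighbor([1, 0, 1, 1])
    return total, shifted
```

In this model an 8-bit addition takes eight row steps regardless of how many PEs there are, which is why the approach favors applications with many independent data elements.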

C-RAM requires little overhead over the existing memory to exploit much of the internal memory bandwidth. When C-RAM PEs are added to DRAM, more than 25% of the internal memory bandwidth is exploited at a cost of less than 25% in terms of silicon area and power. The memory bandwidth internal to memory chips at the sense amplifiers can be 3,000 times the memory bandwidth at the CPU. By placing SIMD PEs adjacent to those sense amplifiers, this internal memory bandwidth can be better utilized.
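
As a rough back-of-the-envelope reading of these figures (an illustration only, assuming the 25% utilization and the 3,000-fold ratio apply to the same configuration and can simply be multiplied; the symbols are introduced here and do not appear in the thesis):

```latex
% B_cpu:  memory bandwidth available at the CPU
% B_int:  memory bandwidth at the sense amplifiers, up to 3000 * B_cpu
% B_cram: internal bandwidth exploited by the C-RAM PEs, more than 0.25 * B_int
\[
  B_{\mathrm{cram}} > 0.25 \, B_{\mathrm{int}}
  \quad\text{and}\quad
  B_{\mathrm{int}} \le 3000 \, B_{\mathrm{cpu}}
  \;\Longrightarrow\;
  B_{\mathrm{cram}} \text{ can exceed } 0.25 \times 3000 \, B_{\mathrm{cpu}} = 750 \, B_{\mathrm{cpu}}
\]
```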

The performance of C-RAM has been demonstrated in a wide range of application areas, and speedups of several orders of magnitude compared to a typical workstation have been shown in the fields of signal and image processing, database, and CAD.

Keywords:

Computer architecture, memory, parallel system, DRAM, SIMD, processor-in-memory architecture

BibTeX:

@phdthesis(Elliott-PhD98,
    author = {Duncan G. Elliott},
    title = {Computational RAM: A Memory SIMD Hybrid},
    school = {Department of Electrical and Computer Engineering, University of Toronto},
    address = {Toronto, Canada},
    supervisors = {Michael Stumm, W. Martin Snelgrove},
    year = {1998},
    keywords = {Computer architecture, memory, parallel system, DRAM, SIMD, processor-in-memory architecture}
)