ECE1388F VLSI Design Methodology
Final Project:
Cache Chip Design
Jennifer Pham
Cintia Man
Shahriar Shahramian
Oleksiy Tyshchenko
January 31, 2005
This document describes the design of a synchronous write-through cache memory that implements the least recently used (LRU) replacement algorithm for efficient data access. In the LRU algorithm, the content of the least recently used cell is replaced whenever a write cycle is executed, while the content of the most recently used cell remains untouched in the cache. The cache chip consists of an address path and a data path. The address path is responsible for mapping the global memory address (typically referring to external DRAM) to the physical location of the data in the cache. The data path returns the content of that physical memory location to the requesting device (typically a processor or controller). The address path is implemented as an LRU content-addressable memory (CAM).
Figure 2.1 illustrates the overall block diagram of the cache chip. The processor is assumed to have an address space of 16 bits. The address bits (15:0) are latched with flip-flops before they are sent to the LRU CAM and the SRAM, which is organized as 256 rows x 4 blocks x 32 bits.

Figure 2.1: System Block Diagram
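As a rough, non-authoritative illustration of the address mapping implied by Figure 2.1 and by the READ/WRITE descriptions that follow, the short Python sketch below splits a 16-bit processor address into a 14-bit tag (the field searched in the LRU CAM) and a 2-bit block-select field (choosing one of the four 32-bit blocks in an SRAM row). The function name and representation are illustrative only; they are not taken from the design database.

def split_address(addr: int) -> tuple[int, int]:
    """Split a 16-bit processor address into the fields used by the cache.

    Assumed mapping, consistent with the READ/WRITE descriptions:
      tag   = address bits 15:2 (14 MSBs, searched in the LRU CAM)
      block = address bits 1:0  (selects one of the 4 SRAM blocks in a row)
    """
    assert 0 <= addr < (1 << 16), "processor address space is 16 bits"
    tag = addr >> 2       # 14 most significant bits
    block = addr & 0b11   # 2 least significant bits
    return tag, block

# Example: address 0xBEEF maps to tag 0x2FBB and block 3.
print(split_address(0xBEEF))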
When it comes to replacing an element in the cache, an LRU algorithm has been employed. A custom LRU CAM block has been designed for this purpose; its schematic, layout, and simulation results are presented below. The overall algorithmic description of the behavior of our cache chip during READ/WRITE cycles is as follows (a short behavioral sketch of the four cases is given after the WRITE MISS steps):
READ HIT
1. The 14 most significant bits of the address match an entry in the LRU CAM (cache hit).
2. The two least significant bits of the address are used to select the appropriate SRAM block.
3. The data is read from the SRAM and sent back to the processor.
4. The corresponding LRU CAM ROW COUNTER is reset to 00000.
READ MISS
1. The 14 most significant bits of the address do not match any entry in the LRU CAM (cache miss).
2. The processor is notified and the LRU CAM is searched with patterns to find the LRU element.
3. Meanwhile, the 4 words of data corresponding to the 14 most significant bits of the address are brought in from main memory.
4. The data from main memory is also sent to the processor.
5. The LRU DECISION block selects the LRU CAM row; the new address is written into that row, while the 4 new words are stored in the SRAM at the same row number.
6. The corresponding LRU CAM ROW COUNTER is reset to 00000.
WRITE HIT
1. The 14 most significant bits of the address match an entry in the LRU CAM (cache hit).
2. The two least significant bits of the address are used to select the appropriate SRAM block.
3. The new data coming from the processor is updated in the SRAM.
4. Meanwhile, the data in main memory is also updated. This is necessary because the cache is a write-through cache.
5. The corresponding LRU CAM ROW COUNTER is reset to 00000.
WRITE MISS
1. The 14 most significant bits of the address do not match any entry in the LRU CAM (cache miss).
2. The processor is notified and the LRU CAM is searched with patterns to find the LRU element.
3. Meanwhile, the 4 words of data corresponding to the 14 most significant bits of the address are updated in main memory and brought back into the cache.
4. The LRU DECISION block selects the LRU CAM row; the new address is written into that row, while the 4 new words are stored in the SRAM at the same row number.
5. The corresponding LRU CAM ROW COUNTER is reset to 00000.
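The Python sketch below is a minimal behavioral model of the four cases above, intended only to make the control flow explicit. It assumes one LRU CAM row (tag plus 5-bit ROW COUNTER) per SRAM row, and it assumes the counters age by saturating increment whenever some other row is accessed; the document specifies only that a counter is reset to 00000 on use and that the LRU DECISION block finds the LRU element, so the aging policy and all names in the sketch are our assumptions, not the implemented circuitry.

from dataclasses import dataclass, field

NUM_ROWS = 256        # one CAM row assumed per SRAM row (SRAM: 256 rows x 4 blocks x 32 bits)
WORDS_PER_ROW = 4     # four 32-bit words per row
COUNTER_MAX = 31      # ROW COUNTERs are 5 bits wide ("reset to 00000")

@dataclass
class CacheModel:
    """Behavioral sketch of the write-through LRU cache; not the actual hardware."""
    main_memory: dict                     # models external DRAM, keyed by 14-bit tag
    tags: list = field(default_factory=lambda: [None] * NUM_ROWS)
    data: list = field(default_factory=lambda: [[0] * WORDS_PER_ROW for _ in range(NUM_ROWS)])
    counters: list = field(default_factory=lambda: [COUNTER_MAX] * NUM_ROWS)

    def _touch(self, row: int) -> None:
        # Assumed aging: every other ROW COUNTER increments (saturating at 31),
        # and the counter of the row just used is reset to 00000.
        for r in range(NUM_ROWS):
            if r != row and self.counters[r] < COUNTER_MAX:
                self.counters[r] += 1
        self.counters[row] = 0

    def _lru_row(self) -> int:
        # LRU DECISION: select the row whose counter is largest (least recently used).
        return max(range(NUM_ROWS), key=lambda r: self.counters[r])

    def _fetch_line(self, tag: int) -> list:
        # Bring the 4 words corresponding to the 14 MSBs of the address from main memory.
        return list(self.main_memory.setdefault(tag, [0] * WORDS_PER_ROW))

    def read(self, addr: int) -> int:
        tag, block = addr >> 2, addr & 0b11
        for row, t in enumerate(self.tags):
            if t == tag:                      # READ HIT
                self._touch(row)
                return self.data[row][block]
        row = self._lru_row()                 # READ MISS: replace the LRU element
        self.tags[row] = tag
        self.data[row] = self._fetch_line(tag)
        self._touch(row)
        return self.data[row][block]          # the fetched data also goes to the processor

    def write(self, addr: int, value: int) -> None:
        tag, block = addr >> 2, addr & 0b11
        line = self.main_memory.setdefault(tag, [0] * WORDS_PER_ROW)
        line[block] = value                   # write-through: main memory is always updated
        for row, t in enumerate(self.tags):
            if t == tag:                      # WRITE HIT
                self.data[row][block] = value
                self._touch(row)
                return
        row = self._lru_row()                 # WRITE MISS: replace the LRU element
        self.tags[row] = tag
        self.data[row] = self._fetch_line(tag)   # updated line brought back from main memory
        self._touch(row)

In this model the LRU choice is made by directly comparing counter values, whereas the chip searches the ROW COUNTER bit patterns in the LRU CAM (step 2 of the miss cases) to locate the LRU element.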
In terms of division of tasks, here is an estimate:
· LRU Block Circuitry: Shahriar
· SRAM Column Circuitry: Jen
· Peripheral Circuitry for LRU,
· System Integration: All team members
a) Circuit Schematic

Figure 3.1: LRU

Figure 3.2: LRU

Figure 3.3: LRU Decision Block Schematic
b) Cell Layout

Figure 3.4: LRU

Figure 3.5: LRU Decision Block Layout
c) Simulation Results

a) Circuit Schematic

Figure 4.1:

Figure 4.2:

Figure 4.3:
b) Cell Layout

Figure 4.4:

Figure 4.5: MLSA Layout
c) Simulation Results


Figure 5.1: SRAM Column Circuitry

Figure 5.2: SRAM 6T Cell

Figure 5.3: Column Decoder

Figure 5.4: Column Multiplexer

Figure 5.5: Read Tri-state Circuitry

Figure 5.6: SRAM Sense Amplifier (SA)

Figure 5.7: Write Tri-state Circuitry

Figure 5.8: Write Bit-line Driver

Figure 5.9: Input/Output Flip Flop

Figure 5.10: SRAM Cell Layout

Figure 5.11: Column Decoder Layout

Figure 5.12: Column Multiplexer Layout

Figure 5.13: Read Tri-state Layout

Figure 5.14: SRAM Sense Amplifier Layout

Figure 5.15: Write Tri-state and Bit-line Driver Layout

Figure 5.16: Input/Output Flip Flop Layout
Read 1 from SRAM: d = data @ SRAM, I/O = data out @ PAD, CLK = 200MHz
Read Access Time (Read_Precharge → IO) = 5.34ns

Read 0 from SRAM: d = data @ SRAM, I/O = data out @ PAD, CLK = 200MHz

Write 1 and 0 to SRAM: I/O = data in @ PAD, d = data @ SRAM, CLK = 200MHz
Write Time (CLK → d) = 2.28ns
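As a rough sanity check (our own arithmetic, not part of the reported results), the measured times can be related to the 200 MHz clock period, under the assumption that each access would need to fit in a whole number of clock cycles:

import math

CLK_PERIOD_NS = 1e9 / 200e6          # 200 MHz clock -> 5.0 ns period

for name, t_ns in [("read access (Read_Precharge -> IO)", 5.34),
                   ("write (CLK -> d)", 2.28)]:
    cycles = math.ceil(t_ns / CLK_PERIOD_NS)   # cycles needed if the access is clock-aligned
    print(f"{name}: {t_ns} ns ~ {cycles} cycle(s) of {CLK_PERIOD_NS:.1f} ns")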


Figure 6.1: Cache Chip Layout Architecture

Figure 6.2: Overall Chip With Pads

Figure 6.3: Overall Chip