# Circuit Implementations of the Differential Capacitance Read Scheme (DCRS) for Ferroelectric Random-Access Memories (FeRAM)

Yadollah Eslami, Student Member, IEEE, Ali Sheikholeslami, Senior Member, IEEE, Shoichi Masui, Member, IEEE, Toru Endo, and Shoichiro Kawashima, Member, IEEE

*Abstract*—This paper presents two circuit implementations for the differential capacitance read scheme (DCRS) in ferroelectric random-access memories (FeRAM). Compared to the conventional read scheme, DCRS achieves a faster read access by activating the sense amplifiers immediately after a wordline is activated. By relying on the capacitance difference instead of the charge difference, DCRS avoids raising the highly capacitive platelines until after the read is complete. We have implemented this scheme in a 0.35- $\mu$ m CMOS+Ferro test chip that includes an array of  $256 \times 32$  2T-2C cells. The test chip measures an access time of 45 ns at a power supply of 3 V.

*Index Terms*—Differential capacitance, fast read scheme, ferroelectric memory, memory circuit design, nondriven plateline, nonvolatile memory.

## I. INTRODUCTION

FERROELECTRIC random-access memories (FeRAM) are well known for their low power, low voltage, and fast write operations compared to other nonvolatile memories [1]. These features have given FeRAM an advantage over Flash memory and EEPROM in applications such as contactless smart cards, digital cameras, PDAs, and cellular phones. The read access time of FeRAM, however, still lags behind those of EEPROM and Flash memory. Fig. 1 compares the typical read access time of a 512 $\times$ 64, two-transistor two-capacitor (2T-2C) FeRAM using four different read schemes. Among the four, the conventional [1], the bitline driven [2], and the nondriven plateline [3] read schemes utilize the charge difference between a ferroelectric capacitor storing a "1" $(Q_1)$ and a "0" $(Q_0)$, as shown in Fig. 2, to detect the data stored in a cell. The differential capacitance read scheme (DCRS) [4], however, utilizes the *capacitance* difference of the ferroelectric capacitor in states "1" $(C_1)$ and "0" $(C_0)$, as defined in Fig. 2, for the same purpose.

As depicted in Fig. 1, a large portion of the read access time is spent on driving the highly capacitive plateline (in the conventional read scheme) or on charge sharing between the cell capacitor and the bitline capacitor (in the bitline driven and the nondriven plateline read schemes). DCRS, on the other hand, compares the capacitance of the cell capacitor against a reference capacitance and starts the detection process right after a memory row is selected, eliminating the charge transfer and charge sharing times to achieve a smaller read access time. We have shown previously by simulation [4] that DCRS can achieve a 20% to 40% smaller read access time than the other three read schemes. In this paper, we present two circuit implementations of DCRS along with a comparative study of their complexity, area overhead, and sensitivity to mismatches. We also present the architecture of a test chip designed to implement the proposed circuits, along with its measurement results.

The rest of this paper is organized as follows. In Section II, we briefly review the operation of DCRS. In Section III, we present the two proposed circuits and discuss their operation. Section IV discusses the DCRS cycle time. In Section V, we study the effects of mismatches, fatigue, imprint, and relaxation on the read scheme. The test chip architecture and its measurement results are presented in Sections VI and VII, respectively, and the conclusions are provided in Section VIII.

Fig. 1. Comparing read access times among various read schemes.

Fig. 2. Ferroelectric capacitor hysteresis loop defining "0", "1", $Q_0$, $Q_1$, $C_0$, and $C_1$.

Fig. 3. (a) 2T-2C architecture, direct DCRS implementation; (b) equivalent circuit during read cycle.

Fig. 4. DCRS read cycle timing.

Manuscript received March 2, 2004; revised July 15, 2004. This work was supported by Fujitsu Labs, Japan, and by NSERC, Canada.

Y. Eslami and A. Sheikholeslami are with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada (e-mail: eslamiy@eecg.utoronto.ca; ali@eecg.utoronto.ca).

S. Masui, T. Endo, and S. Kawashima are with Fujitsu Laboratories Ltd., Kanagawa 211-8588, Japan (e-mail: masui@flab.fujitsu.co.jp; tendo@flab.fujitsu.co.jp; poosan@flab.fujitsu.co.jp).

Digital Object Identifier 10.1109/JSSC.2004.835813

0018-9200/04\$20.00 © 2004 IEEE

## II. DCRS PRIMER

Fig. 3(a) presents the basic concept of DCRS [4]. We describe the operation of this circuit during the read cycle using the equivalent circuit shown in Fig. 3(b) and the timing diagram of Fig. 4. Assuming a constant current  $I_s$ , the voltage on the bitline connected to a cell storing a "0" (BL0) will rise faster than the voltage on a bitline connected to a cell storing a "1" (BL1) because  $C_0$  is smaller than  $C_1$  (see Fig. 2). The voltage slew rates of BL0 and BL1 are given by the following equations:

$$SR_0 = I_s / C_{\rm BL0} \tag{1}$$

$$SR_1 = I_s / C_{\rm BL1} \tag{2}$$

where  $C_{\text{BL0}}$  and  $C_{\text{BL1}}$  are defined as

$$C_{\rm BL0} = C_{\rm BL} + C_0 \tag{3}$$

$$C_{\rm BL1} = C_{\rm BL} + C_1. \tag{4}$$

Fig. 5 shows typical simulation results where $V_{\rm BL0}$ and $V_{\rm BL1}$ rise according to (1) and (2). This simulation is based on a bitline with 256 cells ($C_{\rm BL} \cong 500$ fF) and a 1-$\mu$m$^2$ ferroelectric cell capacitor ($C_0 \cong 50$ fF and $C_1 \cong 150$ fF). In less than 6 ns, a voltage difference of more than 400 mV develops between BL0 and BL1. The CSC signal controls the current $I_{\rm s}$ in (1) and (2) and can be generated on-chip.
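As a rough numerical illustration of (1)-(4), the following sketch evaluates the slew-rate ratio and the differential bitline voltage under an idealized constant charging current. The capacitance values are the typical ones quoted above; the constant-current assumption and the helper names (`slew_rate`, `differential_voltage`, `current_for_target`) are illustrative only and are not taken from the simulated circuit, whose PMOS current varies as the bitlines rise.

```python
# Idealized constant-current model of the DCRS bitline ramp, eqs. (1)-(4).
C_BL = 500e-15   # bitline capacitance for 256 cells (value quoted in the text)
C_0 = 50e-15     # cell capacitance in state "0" (value quoted in the text)
C_1 = 150e-15    # cell capacitance in state "1" (value quoted in the text)


def slew_rate(i_s, c_cell, c_bl=C_BL):
    """Slew rate in V/s of a bitline loaded by c_bl + c_cell."""
    return i_s / (c_bl + c_cell)


def differential_voltage(i_s, t):
    """V_BL0 - V_BL1 after t seconds of constant-current charging."""
    return (slew_rate(i_s, C_0) - slew_rate(i_s, C_1)) * t


def current_for_target(delta_v, t):
    """Constant current that develops delta_v between the bitlines in time t."""
    return delta_v / differential_voltage(1.0, t)


sr_ratio = slew_rate(1.0, C_0) / slew_rate(1.0, C_1)   # equals C_BL1 / C_BL0
i_needed = current_for_target(0.4, 6e-9)               # 400 mV in 6 ns

print(f"SR0/SR1 = {sr_ratio:.3f}")
print(f"I_s for 400 mV in 6 ns = {i_needed * 1e6:.0f} uA")
```

In this simplified model the slew-rate ratio depends only on the capacitances (650/550, about 1.18), while the time needed to reach a given sense voltage scales inversely with the charging current.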

## III. CIRCUIT IMPLEMENTATIONS

Fig. 3(a) shows a direct implementation of the read scheme where the current sources are implemented by two PMOS transistors, controlled by CSC. After the bitlines are precharged to ground (see the timing diagram in Fig. 4), WL and CSC are asserted, causing the saturation current of the PMOS transistors, $I_s$, to charge up the bitline capacitors. When a sufficient voltage difference has developed on the bitlines, the sense amplifier is turned on by activating the SAP and SAN signals, and the stored data is detected. The relative timing of CSC, SAP, and SAN is critical in this design. Referring to Fig. 5, since the voltages on both bitlines start from 0 V and eventually reach $V_{\rm DD}$, the sense amplifier must be turned on while the difference voltage $(V_{\rm BL0}-V_{\rm BL1})$ is larger than the sense amplifier offset voltage. If SAP and SAN are activated too early or too late, the voltage difference available to the sense amplifier is less than the required minimum and may cause an erroneous read. Note also that this design requires extra circuitry to generate the CSC signal.

Fig. 5. $V_{\rm BL1}$ and $V_{\rm BL0}$ for 2T-2C architecture.

Fig. 6(a) shows the second proposed circuit, the simplified DCRS, which can be operated with the two different timings depicted in Fig. 6(b) and (c). This circuit uses the PMOS transistors of the sense amplifier as the current sources by activating the SAP signal and hence does not need the extra activation signal CSC. In the SAN-delayed timing shown in Fig. 6(b), after the bitlines are precharged to ground, the SAP signal is asserted and the PMOS transistors of the sense amplifier act as current sources, charging up BL and $\overline{\rm BL}$ at the rates given by (1) and (2), while the SAN signal is delayed with respect to SAP to keep the NMOS transistors of the sense amplifier in the OFF state. Note that in this case, the current $I_{\rm s}$ is controlled automatically by the positive feedback of the sense amplifier. For example, if the cell is storing a "0", BL rises faster than $\overline{\rm BL}$ and hence reduces the $V_{\rm GS}$ of M2, which is charging up the $\overline{\rm BL}$ capacitors. This reduces the current to $\overline{\rm BL}$ and develops the differential voltage on BL and $\overline{\rm BL}$ more rapidly. The sense amplifier starts the detection process as soon as the SAN signal is asserted. The positive feedback effect of this circuit on $V_{\rm BL}$ and $V_{\overline{\rm BL}}$ can be seen in Fig. 7 (compare with Fig. 5), which shows a read cycle for a cell storing a "0" with SAN kept inactive during the cycle. This implementation requires a delay circuit for delaying SAN with respect to SAP. The delay time is not critical as long as it is longer than the time required for the difference voltage on the bitlines $(V_{\rm BL0} - V_{\rm BL1})$ to grow beyond the sense amplifier offset voltage.
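The feedback mechanism described above can be explored with a small forward-Euler sketch. It is only a toy model under stated assumptions: a square-law PMOS in saturation, linear bitline capacitances taken from Section II, the NMOS pair held off (SAN inactive), and hypothetical values for `V_TP` and `K_P` that are not taken from the test chip.

```python
# Toy model: cross-coupled PMOS pair charging two bitlines (SAN inactive).
V_DD = 3.0        # supply voltage (from the text)
V_TP = 0.7        # |V_Tp| of the PMOS devices, hypothetical
K_P = 45e-6       # square-law factor in A/V^2, hypothetical
C_BL0 = 550e-15   # C_BL + C_0, typical values from Section II
C_BL1 = 650e-15   # C_BL + C_1, typical values from Section II


def pmos_current(v_gate):
    """Saturation current of a PMOS with source at V_DD and gate at v_gate."""
    overdrive = V_DD - v_gate - V_TP
    return K_P * overdrive ** 2 if overdrive > 0 else 0.0


def simulate(t_end=6e-9, dt=1e-12):
    """Return a list of (t, v_bl0, v_bl1, i_bl0, i_bl1) samples."""
    v0 = v1 = 0.0                          # both bitlines precharged to ground
    t = 0.0
    trace = []
    # Stop once v0 - v1 reaches |V_Tp|: beyond that point the faster device
    # leaves saturation and the square-law expression no longer applies.
    while t < t_end and v0 - v1 < V_TP:
        i0 = pmos_current(v1)              # device charging BL0, gate tied to BL1
        i1 = pmos_current(v0)              # device charging BL1, gate tied to BL0
        trace.append((t, v0, v1, i0, i1))
        v0 += i0 * dt / C_BL0
        v1 += i1 * dt / C_BL1
        t += dt
    return trace


trace = simulate()
t, v0, v1, i0, i1 = trace[-1]
print(f"t = {t * 1e9:.2f} ns: V_BL0 = {v0:.2f} V, V_BL1 = {v1:.2f} V, "
      f"I_BL0 = {i0 * 1e6:.0f} uA, I_BL1 = {i1 * 1e6:.0f} uA")
```

In this model the bitline with the smaller load rises first, which lowers the gate drive of the device charging the other bitline, so the slower bitline always receives the smaller current and the voltage difference keeps growing. The sketch ignores the nonlinearity of the ferroelectric capacitors and the triode region of the transistors.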

Fig. 6(c) shows the simplified timing for this implementation, in which the SAP and SAN signals are activated simultaneously. The circuit starts exactly the same way as in the SAN-delayed timing. When both SAN and SAP are activated after precharging the bitlines to ground, the NMOS transistors in the sense amplifier are in the cutoff region because their $V_{\rm GS} < V_{\rm Tn}$. However, the positive feedback effect of the sense amplifier on the PMOS transistors, as discussed for the SAN-delayed timing, is enhanced as the voltages on BL and $\overline{\rm BL}$ pass the threshold voltage ($V_{\rm Tn}$) of the NMOS transistors. At this point, the bitline connected to the ferroelectric capacitor storing a "1" receives less charging current from its PMOS transistor and is also pulled down by the sense amplifier NMOS transistor, so the amplification process starts automatically. The advantage of this timing over that of Fig. 6(b) is that it does not need a delay circuit to delay SAN with respect to SAP.

Fig. 6. Simplified DCRS (a) using sense amplifier PMOS transistors as the current sources, (b) SAN-delayed timing, and (c) simultaneous SAP-SAN timing.

Fig. 7. Positive feedback effect on $V_{\rm BL}$ and $V_{\overline{\rm BL}}$ for simplified DCRS with SAN inactive.



Fig. 8. Cell capacitances seen by the PL during write-back in DCRS.

## IV. DCRS CYCLE TIME

Fig. 1 shows clearly that DCRS has a smaller read access time than the other read schemes; however, this does not necessarily result in a smaller read cycle time. In this section, we compare the read cycle times of DCRS and the conventional read scheme. The read operation in both DCRS and the conventional read scheme is destructive and requires a "write-back." The write-back time can be longer in DCRS than in the conventional read scheme. This is because the write-back in DCRS includes both the rise and fall times of PL, as opposed to just the fall time required by the conventional read scheme (the rise of PL takes place during the read access time). Assuming identical PL drivers in both schemes, the duration of the rise and fall times is a function of the PL capacitance. To compare the PL capacitance for both read schemes, consider a 2T-2C cell storing a "0" (the same argument applies to a cell storing a "1"), as depicted in Fig. 8(a). The PL capacitance contributed by this cell for the conventional read scheme is

$$C_{\rm PL} = \frac{C_{\rm BL} \cdot C_0}{C_{\rm BL} + C_0} + \frac{C_{\rm BL} \cdot C_1}{C_{\rm BL} + C_1}. \tag{5}$$

To find the PL capacitance of a 2T-2C cell for DCRS during the write-back, we use Fig. 8(b), which shows the capacitances of the two capacitors seen by PL when $V_{\rm BL} = V_{\rm DD}$ and $V_{\overline{\rm BL}} = 0$ V after the activation of the sense amplifiers. The capacitance $C'_0$ in this figure refers to the capacitance of $C_{\overline{\rm FE}}$ seen by PL. $C'_0$ is larger than $C_0$ because point "1" on the Q-axis is moved toward the origin due to the small voltage experienced by this capacitor. The PL capacitance contributed by this cell is

$$C'_{\rm PL} = C_0 + C'_0. \tag{6}$$

Using the typical values of $C_{\rm BL} \cong 500$ fF, $C_0 \cong 50$ fF, $C_1 \cong 150$ fF, and $C'_0 \cong 100$ fF, (5) and (6) predict $C_{\rm PL} \cong 160$ fF and $C'_{\rm PL} \cong 150$ fF for the conventional read scheme and DCRS, respectively. This shows that the capacitance experienced by PL from each cell is typically slightly larger for the conventional read scheme than for DCRS. This makes the rise and fall of PL slightly faster in DCRS, but the write-back time of DCRS will be longer than that of the conventional read scheme (since it includes both a rise and a fall time). The cycle times of both read schemes, however, are almost identical.
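The per-cell plateline capacitances quoted above follow directly from (5) and (6). The short check below reproduces them; the helper name `series` is introduced here for illustration, and all values are the typical ones given in the text, expressed in femtofarads.

```python
def series(a, b):
    """Equivalent capacitance of two capacitors connected in series."""
    return a * b / (a + b)


# Typical values from the text, in fF.
C_BL, C_0, C_1, C_0_PRIME = 500.0, 50.0, 150.0, 100.0

c_pl_conventional = series(C_BL, C_0) + series(C_BL, C_1)   # eq. (5)
c_pl_dcrs = C_0 + C_0_PRIME                                  # eq. (6)

print(f"conventional: {c_pl_conventional:.1f} fF, DCRS: {c_pl_dcrs:.1f} fF")
```

In the conventional scheme each cell capacitor appears in series with its bitline capacitance, whereas in (6) the two cell capacitances add directly.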

## V. MISMATCH EFFECTS

As mentioned in Section II, DCRS compares the voltages developed on a cell bitline and its reference bitline. The reference bitline for a 2T-2C array is the bitline connected to the capacitor in the same cell storing the complement data [see Fig. 3(a)]. The voltage developed on a bitline in DCRS is a function of $C_{\rm BL}$ (bitline capacitance + cell capacitance) and $I_s$ (the charging current). Any mismatch of either of these parameters for the cell bitline and the reference bitline, in addition to the mismatches in the sense amplifier, may cause an error in the detection of the stored bit. As suggested by (1)–(4), $I_s$, $C_{\rm BL}$, $C_0$, and $C_1$ are the major parameters that affect the read scheme performance. To investigate the effects of parameter mismatches on the read scheme, assume a charging current mismatch of $\Delta I_{\rm s}$ and a capacitance mismatch of $\Delta C_{\rm BL}$ (total bitline capacitance mismatch including the cell capacitance mismatch). Then, the worst case $SR_0$ and $SR_1$ are given by

$$SR_{0(WorstCase)} = \frac{\left(I_s - \frac{\Delta I_s}{2}\right)}{\left(C_{\rm BL0} + \frac{\Delta C_{\rm BL}}{2}\right)} \tag{7}$$

$$SR_{1(WorstCase)} = \frac{\left(I_s + \frac{\Delta I_s}{2}\right)}{\left(C_{\rm BL1} - \frac{\Delta C_{\rm BL}}{2}\right)}. \tag{8}$$

Recall that DCRS operates on the premise that for a matched circuit $SR_0 > SR_1$. Equations (7) and (8) represent the worst-case slew rates because they represent a smaller current charging a larger capacitor (for $SR_0$) or a larger current charging a smaller capacitor (for $SR_1$). Despite mismatches, the read scheme will function properly as long as $SR_{0(WorstCase)} > SR_{1(WorstCase)}$. This is equivalent to

$$\frac{\left(1-\frac{\Delta I_s}{2I_s}\right)}{\left(1+\frac{\Delta I_s}{2I_s}\right)} > \frac{\left(1+\frac{\Delta C_{\rm BL}}{2C_{\rm BL0}}\right)}{\left(\frac{C_{\rm BL1}}{C_{\rm BL0}}-\frac{\Delta C_{\rm BL}}{2C_{\rm BL0}}\right)}. \tag{9}$$

Based on (9), to tolerate a 10% mismatch in $I_{\rm s}$ and a 10% mismatch in $C_{\rm BL}$, $C_{\rm BL1}/C_{\rm BL0}$ must be greater than 1.21. Based on our simulation results, this condition holds for up to 256 cells per bitline using 1-$\mu$m$^2$ ferroelectric cell capacitors.
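The 1.21 bound can be recovered by solving (9) for $C_{\rm BL1}/C_{\rm BL0}$. The sketch below does so under the assumption that the capacitance mismatch is expressed relative to $C_{\rm BL0}$; the function name and its arguments are illustrative and do not appear in the original analysis.

```python
def min_capacitance_ratio(rel_di, rel_dc):
    """Smallest C_BL1 / C_BL0 that still satisfies inequality (9).

    rel_di is Delta I_s / I_s and rel_dc is Delta C_BL / C_BL0
    (normalizing the capacitance mismatch to C_BL0 is an assumption).
    """
    current_term = (1 - rel_di / 2) / (1 + rel_di / 2)   # left-hand side of (9)
    return (1 + rel_dc / 2) / current_term + rel_dc / 2


both = min_capacitance_ratio(0.10, 0.10)          # 10% current and 10% capacitance
current_only = min_capacitance_ratio(0.10, 0.0)   # special case of (10)
cap_only = min_capacitance_ratio(0.0, 0.10)       # special case of (11)

print(f"both: {both:.3f}, current only: {current_only:.3f}, "
      f"capacitance only: {cap_only:.3f}")
```

With both mismatches at 10% the required ratio is about 1.21, matching the figure quoted above; with only one mismatch present the requirement relaxes to about 1.11 or 1.10, respectively.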

For special cases when only one of the two mismatches exists, (9) can be simplified as

$$\frac{C_{\rm BL0}}{C_{\rm BL1}} < \frac{\left(1 - \frac{\Delta I_s}{2I_s}\right)}{\left(1 + \frac{\Delta I_s}{2I_s}\right)} \quad \text{when} \quad \Delta C_{\rm BL} = 0 \tag{10}$$

$$\Delta C_{\rm BL} < C_1 - C_0 \qquad \text{when} \quad \Delta I_{\rm s} = 0. \tag{11}$$
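As a worked step, (11) follows from (9) using the definitions (3) and (4). Setting $\Delta I_{\rm s} = 0$ makes the left-hand side of (9) equal to 1; multiplying both sides by the denominator of the right-hand side (assumed positive) and by $C_{\rm BL0}$ then gives

$$C_{\rm BL1} - \frac{\Delta C_{\rm BL}}{2} > C_{\rm BL0} + \frac{\Delta C_{\rm BL}}{2} \quad \Longrightarrow \quad \Delta C_{\rm BL} < C_{\rm BL1} - C_{\rm BL0} = C_1 - C_0.$$

With the typical values quoted in Section II ($C_0 \cong 50$ fF and $C_1 \cong 150$ fF), this bound evaluates to 100 fF.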

$\Delta I_{\rm s}$ is often caused by mismatches in the threshold voltage $(V_{\rm Tp})$ or saturation currents $(I_{\rm sat})$ of the PMOS transistors. Our simulation results indicate that a 100-mV mismatch in $V_{\rm Tp}$ of the PMOS current sources reduces the sense voltage by 25%, as depicted in Fig. 9, but does not create an error. Similarly, a 10% mismatch in $I_{\rm sat}$ reduces the sense voltage by 25%, with no error in circuit operation.

Fig. 9. $I_{\rm s}$ mismatch effect on DCRS.

TABLE I COMPARISON OF MISMATCH TOLERANCE OF CONVENTIONAL AND DCRS READ SCHEMES (✔: NO FAILURE UP TO 100 mV MISMATCH)

| Mismatch Type                         | Conventional (1X) | Conventional (3X) | DCRS (1X) | DCRS (3X) |
|---------------------------------------|-------------------|-------------------|-----------|-----------|
| PMOS $V_{th0}$ only                   | ✔                 | ✔                 | ✔         | ✔         |
| PMOS and NMOS $V_{th0}$               | ✔                 | ✔                 | 90 mV     | ✔         |
| PMOS and NMOS $V_{th0}$ and 10% $I_d$ | ✔                 | ✔                 | 50 mV     | ✔         |

 $\Delta C_{\rm BL}$  is caused by unequal parasitic capacitances on the bitlines and/or tolerances in the size of the cell capacitors due to process variations. Table I compares the mismatch tolerance of the conventional read scheme and DCRS for 1X and 3X cells, obtained by simulating a 256 × 64, 2T-2C memory array. It shows that DCRS is as robust to process variations as the conventional read scheme.

So far we have considered the effects of mismatch on DCRS. Now, we consider the effects of the ferroelectric material imperfections, namely, fatigue, imprint, and relaxation, as depicted in Fig. 10, on the read scheme. In this figure, the $C^*$ values represent the capacitances after the imperfection has occurred. In all three cases, it is important to note that DCRS functions based on the large-signal capacitance, and not on the small-signal capacitance, of the capacitors. It can be seen from the figure that in all three cases, $C_0$ is increased and $C_1$ is decreased after the imperfection has occurred, i.e., $C_0^* > C_0$ and $C_1^* < C_1$, with imprint having the worst effect. Referring to (1) and (2), a larger $C_0^*$ decreases $SR_0$ and a smaller $C_1^*$ increases $SR_1$, both reducing the difference between $SR_0$ and $SR_1$ and hence reducing the sense margin for the read scheme. Therefore, ferroelectric material imperfections reduce the sense margin in DCRS, but the read scheme will not fail as long as $C_0^*$ remains smaller than $C_1^*$.

Fig. 10. Ferroelectric materials imperfections: (a) Normal; (b) fatigued; (c) imprinted; (d) relaxed.

| Parameter               | Value                    |
|-------------------------|--------------------------|
| Technology              | 0.35 µm, 3 LM            |
| Ferroelectric Capacitor | Planar                   |
| Power Supply            | 3.0 V                    |
| Memory Array (2T-2C)    | 8 kbits                  |
| Memory Array (1T-1C)    | 8 kbits                  |
| Sense Amplifier         | CCSA Sensing             |
| Special Feature         | Bitline Sampling Monitor |

Fig. 11. Test chip layout and specifications.
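To make the shrinking margin concrete, the sketch below evaluates the normalized slew-rate difference $(SR_0 - SR_1)/SR_0$ from (1)-(4) for equal charging currents. The fresh values are the typical ones from Section II; the degraded values $C_0^* = 70$ fF and $C_1^* = 120$ fF are purely hypothetical and are not read from Fig. 10.

```python
def relative_margin(c_bl, c0, c1):
    """(SR_0 - SR_1) / SR_0 for equal charging currents, from (1)-(4)."""
    return (c1 - c0) / (c_bl + c1)


C_BL = 500.0                                      # fF, typical value from Section II

fresh = relative_margin(C_BL, 50.0, 150.0)        # C_0 and C_1 from Section II
degraded = relative_margin(C_BL, 70.0, 120.0)     # hypothetical C_0* and C_1*
collapsed = relative_margin(C_BL, 100.0, 100.0)   # limit case C_0* = C_1*

print(f"fresh: {fresh:.3f}, degraded: {degraded:.3f}, collapsed: {collapsed:.3f}")
```

The margin stays positive, and the sign of the slew-rate difference is preserved, until the degraded capacitances meet, which is the failure condition stated above.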

## VI. TEST CHIP ARCHITECTURE

An FeRAM test chip was designed to implement the proposed circuits for DCRS in 0.35-$\mu$m CMOS+Ferro technology. The test chip layout and specifications and its die photo are shown in Fig. 11 and Fig. 12, respectively. The memory is divided into two arrays, the 2T-2C array (256 rows × 32 columns) and the 1T-1C array (256 rows × 64 columns), as shown in Fig. 13(a). The 1T-1C array has its cells interleaved for ease of layout design, hence a total of 8 K cells are implemented. Two special features are implemented on the test chip to study the read scheme performance over a relatively wide range of bitline and cell capacitances. First, four different cell capacitor sizes ($C_{\rm FE}$) are used in every four consecutive rows of the memory, starting from the minimum size (1X = 1 $\mu$m$^2$ capacitor area) to four times the minimum (4X), as shown in Fig. 13(b). Second, four bitline sizes are employed in four different blocks of each array. In the first block, each bitline is connected to 64 cells, in the second to 128, in the third to 192, and in the fourth to 256 cells, as shown in Fig. 14. Since the bitline capacitance $(C_{\rm BL})$ is the sum of the diffusion capacitances of the cell access transistors and the parasitics of the metal conductors, the bitline capacitances of these blocks are expected to vary from one to four times the minimum bitline capacitance. Therefore, $C_{\rm BL2} = 2C_{\rm BL1}$, $C_{\rm BL3} = 3C_{\rm BL1}$, and $C_{\rm BL4} = 4C_{\rm BL1}$ in Fig. 14 for both arrays. Using these two features, 16 different combinations of bitline and cell capacitances (4 cell capacitor sizes $\times$ 4 bitline capacitances) are available for testing in each array. To keep the memory array homogeneous, the unconnected cells are kept in place in the array (represented by the dashed lines in Fig. 14) but are not connected to the bitlines and are never accessed. Friendly cells are also placed all around the memory array, but not at the interface of the 2T-2C and 1T-1C arrays. Friendly cells are unused memory cells located around the memory array. They provide the cells at the edge of the array with the same coupling effects as the cells in the middle of the array.

Fig. 12. Test chip die photo.

Fig. 13. (a) Memory array architecture in test chip and (b) different cell sizes in the array.

Fig. 14. Memory array architecture (dashed lines indicate not accessed cells).

Fig. 15. (a) Folded bitline structure for 1T-1C architecture. (b) Reference capacitance implementation. (c) Equivalent circuit when accessed.

A special addressing scheme is used to simplify the address decoding circuitry of the test chip. A 64-bit shift register is used to select a row segment consisting of four consecutive rows. Similarly, a 16-bit shift register is used to select a column segment consisting of four consecutive columns. These shift registers are reset to "00...001" by their corresponding reset signals, selecting the first row and column segments of the array. The lone "1" in each shift register can be shifted circularly to select a different segment by applying an external clock to the corresponding shift register. A two-bit row address (column address), provided off-chip, is decoded to select a row (column) in a segment. The cell at the intersection of the selected row and column is the active cell, which is accessed in any given cycle. One PMOS transistor is connected to every bitline, controlled by a common input signal CSC, which implements the current source for that bitline. One cross-coupled-inverter sense amplifier per bitline pair is implemented, controlled by an externally provided SAP signal and its internally generated complement, SAN.
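The addressing scheme described above can be modeled behaviorally as follows. This is a minimal sketch, not the chip's decoder: the class and function names are hypothetical, and the shift direction (rotate left) and the mapping of the two-bit address to a position within a segment are assumptions.

```python
class OneHotShiftRegister:
    """Circular shift register holding a single '1' that selects a segment."""

    def __init__(self, width):
        self.width = width
        self.bits = 1

    def reset(self):
        self.bits = 1                       # the "00...001" reset state

    def clock(self):
        # Rotate left by one position (the direction is an assumption).
        msb = (self.bits >> (self.width - 1)) & 1
        self.bits = ((self.bits << 1) & ((1 << self.width) - 1)) | msb

    def selected_segment(self):
        return self.bits.bit_length() - 1


def active_cell(row_sr, col_sr, row_addr, col_addr):
    """Combine the segment selects with the two-bit off-chip addresses."""
    if not (0 <= row_addr < 4 and 0 <= col_addr < 4):
        raise ValueError("row and column addresses are two bits wide")
    row = 4 * row_sr.selected_segment() + row_addr
    col = 4 * col_sr.selected_segment() + col_addr
    return row, col


rows = OneHotShiftRegister(64)   # 64 segments x 4 rows = 256 rows
cols = OneHotShiftRegister(16)   # 16 segments x 4 columns = 64 columns
for _ in range(5):
    rows.clock()                 # advance the row segment select five times
print(active_cell(rows, cols, row_addr=2, col_addr=3))
```

Because each row segment spans four consecutive rows, the two-bit row address effectively chooses among the four rows of a segment, which is where the four cell capacitor sizes are laid out.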

A folded bitline architecture is used for the 1T-1C array, and every bitline pair shares a single reference capacitor, as shown in Fig. 15(a). Two types of reference capacitors are implemented in the 1T-1C array: a parallel combination of two ferroelectric capacitors in series $((C_0 + C_1)/2)$, as shown in Fig. 15(b) and Fig. 15(c), introduced in [4], and an oversized reference capacitor [1]. In order to compare each cell capacitor with different reference capacitor sizes, four reference capacitor sizes, 1X to 4X, are used in the test chip.

Fig. 16. Block diagram of the sampling monitor circuitry.

A sampling monitor circuit [5] is implemented on every bitline of the memory array to monitor the bitline voltage as a function of time. On the test chip layout, it is located right under the sense amplifier on each column, as highlighted in Fig. 11. The block diagram of the circuit is shown in Fig. 16. The circuit operates in one of two modes: the "calibration" mode and the "sampling" mode. In the calibration mode, an off-chip signal, $V_{cal}$, is sampled and sent to the output. This mode is used to tune the sampling frequency and the off-chip low-pass filters to reproduce $V_{cal}$. In the sampling mode, the selected bitline (BL) and its reference bitline (BLB) are sampled periodically to produce the BLout and BLBout signals, respectively. In both modes, the sampling clock frequency is very close (but not exactly equal) to the frequency of the signals being sampled. This guarantees that the samples taken from different periods of the input waveforms are close enough to reproduce the original periodic signals after passing through a simple low-pass filter.
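The principle behind this sampling mode, often called equivalent-time sampling, can be illustrated with a few lines of code. The sketch uses a sine wave as a stand-in for the repetitive bitline waveform and hypothetical periods of 100 ns and 101 ns; none of these values are taken from the test chip.

```python
import math


def waveform(t, period):
    """Stand-in periodic signal; the real input would be the bitline voltage."""
    return math.sin(2 * math.pi * t / period)


def undersample(period, sample_period, n_samples):
    """Take one sample per sampling clock edge at t = n * sample_period."""
    return [waveform(n * sample_period, period) for n in range(n_samples)]


T_SIGNAL = 100e-9    # hypothetical repetition period of the read cycle
T_SAMPLE = 101e-9    # hypothetical sampling period, slightly longer

samples = undersample(T_SIGNAL, T_SAMPLE, 100)

step = T_SAMPLE - T_SIGNAL     # each sample lands this much later in the cycle
stretch = T_SAMPLE / step      # time-axis expansion of the reconstructed output

print(f"effective resolution: {step * 1e9:.1f} ns, stretch factor: {stretch:.0f}")
```

Sample number n captures the waveform at phase n times `step` within its period, so one full period is traced out slowly over many input cycles, and the stretched replica is low enough in frequency to be recovered off-chip with a simple low-pass filter.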

## VII. SIMULATION AND MEASUREMENT RESULTS

A bitline with 256 cells, each with the minimum size $(1X = 1\ \mu m^2)$ ferroelectric capacitor, is considered for simulation. Since DCRS is a destructive readout, a write-back is required at the end of a read cycle. Therefore, a write cycle is followed by two read cycles in all simulations and measurements. The first read cycle verifies that the write has been successful and also performs a write-back to the cell. The second read cycle verifies that the write-back has been successful. With this sequence, we write both "0" and "1" into the cell and read them from the cell successfully. Since both proposed circuits, with all three timings presented in Section III, show similar results in the test chip measurements, in the following we present the results for the simplified DCRS [Fig. 6(a)] with the simultaneous SAP-SAN activation timing [Fig. 6(c)].



Fig. 17. DCRS simulation results for 2T-2C FeRAM architecture.



Fig. 18. $V_{\rm BL}$ and $V_{\overline{\rm BL}}$ waveforms on the test chip (2T-2C cell).



Fig. 19.  $D_{out}$ , RD, WR, WL waveforms on the test chip (2T-2C cell).

### A. 2T-2C Array

Fig. 17 shows the simulated voltages on BL and $\overline{\rm BL}$ for a 2T-2C memory cell. A differential voltage of 380 mV is developed between BL and $\overline{\rm BL}$ at the sensing point. A successful read operation is confirmed by monitoring the bitline voltages of the test chip for both read "0" and read "1", as shown in Fig. 18. The output data signal, $D_{out}$, along with the control signals RD, WR, and WL measured on the test chip, is shown in Fig. 19.



Fig. 20. DCRS simulation results for 1T-1C FeRAM architecture (oversized reference capacitor).



Fig. 21. DCRS simulation results for 1T-1C FeRAM architecture (proposed reference capacitor).

The read scheme performs well for the 2T-2C array for all combinations of the bitline and cell capacitances and more than 98% of the cells are functional. The failing cells are located at the boundary of 2T-2C and 1T-1C arrays. The missing friendly cells at the boundary are the cause of failure for these cells.

### B. 1T-1C Array

Fig. 20 and Fig. 21 show the simulation results for the 1T-1C array with an oversized reference capacitor [1] and the reference capacitor of Fig. 15(b), respectively. It can be seen that in both cases, more than 300 mV of voltage difference is available to the sense amplifier during read "0" and read "1". This suggests a successful read from the cell for both reference capacitors. The measurement results, however, reveal that this is true for the cells with the oversized reference capacitor, but not for the cells with the reference capacitor of Fig. 15(b). In the latter case, the reference bitline always rises faster than the bitline connected to the cell. This indicates that the reference capacitor always exhibits a smaller capacitance than both $C_0$ and $C_1$. Further measurement results show that the ferroelectric capacitors of the reference capacitor switch much more slowly than the cell capacitors. This is in agreement with the results published in [6] and [7], where the switching time of the ferroelectric material is inversely proportional to the applied voltage. Referring to Fig. 15(c), any voltage developed on the reference bitline during the read cycle is divided equally between the upper and lower parallel branches. Since this voltage is relatively small ($V_{\rm DD}/2$ at its peak), the ferroelectric material of the capacitors in state "1" switches more slowly than that of a capacitor experiencing a full $V_{\rm DD}$. This in turn results in a capacitance smaller than the expected $C_1$. The measurement results suggest that other reference generation techniques need to be researched for 1T-1C DCRS [8].
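The failure mode described above can be reproduced with a very simple comparison model. The sketch assumes equal charging currents and equal bitline parasitics on the cell and reference bitlines, so that the bitline with the smaller attached capacitance rises faster; the function name and the 30-fF degraded reference value are hypothetical.

```python
C_0, C_1 = 50.0, 150.0   # fF, typical cell capacitances from Section II


def sensed_bit(c_cell, c_ref):
    """Bit read by a 1T-1C DCRS comparison against a reference capacitance.

    The cell bitline rising faster than the reference (smaller capacitance)
    is read as "0"; otherwise the cell is read as "1".
    """
    return 0 if c_cell < c_ref else 1


ideal_ref = (C_0 + C_1) / 2   # intended reference of Fig. 15(b): 100 fF
weak_ref = 30.0               # hypothetical reference that falls below C_0

ideal_reads = (sensed_bit(C_0, ideal_ref), sensed_bit(C_1, ideal_ref))
weak_reads = (sensed_bit(C_0, weak_ref), sensed_bit(C_1, weak_ref))

print(f"ideal reference: {ideal_reads}, weak reference: {weak_reads}")
```

With the intended reference between $C_0$ and $C_1$ the two states are distinguished, whereas a reference below $C_0$ makes the reference bitline win every comparison, so both states produce the same result in this model.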

## VIII. CONCLUSION

We presented two circuit implementations for the DCRS and demonstrated that they operate successfully over a wide range of bitline and cell capacitances. Both the simulation and measurement results of a test chip implemented in 0.35- $\mu$ m CMOS+Ferro technology confirm that the DCRS speeds up the FeRAM read access by up to 40% compared to the conventional read scheme. We have also shown, analytically and by simulation, that DCRS is robust to process variations.

## ACKNOWLEDGMENT

The authors thank the anonymous reviewers for their technical comments on this work.

## REFERENCES

- [1] A. Sheikholeslami and P. G. Gulak, "A survey of circuit innovations in ferroelectric random-access memories," *Proc. IEEE*, vol. 88, pp. 667–689, May 2000.
- [2] H. Hirano, T. Honda, N. Moriwaki, T. Nakakuma, A. Inoue, G. Nakane, S. Chaya, and T. Sumi, "2-V/100-ns nonvolatile ferroelectric memory architecture with bitline-driven read scheme & nonrelaxation reference cell," in *IEEE Int. Symp. VLSI Circuits Dig. Tech. Papers*, 1996, pp. 48–49.
- [3] H. Koike, T. Otsuki, T. Kimura, M. Fukuma, Y. Hayashi, Y. Maejima, K. Amanuma, M. Tanabe, T. Matsuki, S. Saito, T. Takeuchi, S. Kobayashi, T. Kunio, T. Hase, Y. Miyasaka, N. Shohota, and M. Takada, "A 60 ns 1 Mb nonvolatile ferroelectric memory with nondriven cell plate line write/read scheme," in *IEEE ISSCC Dig. Tech. Papers*, 1996, pp. 368–369.
- [4] Y. Eslami, A. Sheikholeslami, S. Masui, T. Endo, and S. Kawashima, "A differential-capacitance read scheme for FeRAMs," in *IEEE Int. Symp. VLSI Circuits Dig. Tech. Papers*, June 2002, pp. 298–301.
- [5] R. Ho, B. Amrutur, K. Mai, B. Wilburn, T. Mori, and M. Horowitz, "Applications of on-chip samplers for test and measurement of integrated circuits," in *Symp. VLSI Circuits, Dig. Tech. Papers*, 1998, pp. 138–139.
- [6] A. K. Tagantsev, I. Stolichnov, and N. Setter, "Non-Kolmogorov-Avrami switching kinetics in ferroelectric thin films," *Phys. Rev. B*, vol. 66, 214109, 2002.
- [7] J. Chow, A. Sheikholeslami, J. S. Cross, and S. Masui, "A voltage-dependent switching-time (VDST) model of ferroelectric capacitors for low-voltage FeRAM circuits," in *IEEE Int. Symp. VLSI Circuits Dig. Tech. Papers*, June 2004, pp. 448–449.
- [8] T. Chandler, A. Sheikholeslami, S. Masui, and M. Oura, "An adaptive reference generation scheme for 1T1C FeRAMs," in *IEEE Int. Symp. VLSI Circuits Dig. Tech. Papers*, 2003, pp. 173–174.



Ali Sheikholeslami (S'98–M'99–SM'02) received the B.Sc. degree from Shiraz University, Shiraz, Iran, in 1990 and the M.A.Sc. and Ph.D. degrees from the University of Toronto, Toronto, ON, Canada, in 1994 and 1999, respectively, all in electrical and computer engineering.

In 1999, he joined the Department of Electrical and Computer Engineering, University of Toronto, where he is currently an Assistant Professor and holds the L. Lau Junior Chair in Electrical and Computer Engineering. His research interests are in the areas of analog and digital integrated circuits, high-speed signaling, VLSI memory design (including SRAM, DRAM, and CAMs), and ferroelectric memories. He has collaborated with industry on various VLSI design projects in the past few years, including work with Nortel, Canada, in 1994, with Mosaid, Canada, since 1996, and with Fujitsu, Japan, since 1998. He is currently supervising three active research groups in the areas of ferroelectric memories, CAMs, and high-speed signaling. He has coauthored several journal and conference papers, as well as a book chapter on ferroelectric memories. He holds several patents on both ferroelectric memories and CAMs.

Dr. Sheikholeslami received the Best Professor of the Year Award in 2000 and 2002 by the popular vote of the undergraduate students in the Department of Electrical and Computer Engineering, University of Toronto. He served on the Memory Subcommittee of the IEEE International Solid-State Circuits Conference (ISSCC) from 2001 to 2004. He has served on the Technology Directions Subcommittee of the same conference since 2002. He presented a tutorial on ferroelectric memory design at the ISSCC 2002.



Shoichi Masui (M'90) received the B.S. and M.S. degrees from Nagoya University, Nagoya, Japan, in 1982 and 1984, respectively.

From 1994 to 1999, he was with Nippon Steel Corporation, Sagamihara, Japan, where he was engaged in research on SOI devices, nonvolatile memory circuit design, and its application to radio frequency identification (RFID) ICs. From 1990 to 1992, he was a Visiting Scholar at Stanford University, Stanford, CA, where he researched substrate-coupling noise in mixed-signal ICs. In 1999, he joined Fujitsu Ltd., and since 2000 he has been with Fujitsu Laboratories Ltd., where he is currently a Research Fellow and is engaged in the design of ferroelectric random access memory (FeRAM) for smart cards, RFIDs, and reconfigurable logic LSIs. In 2001, he was a Visiting Scholar at the University of Toronto, Canada, where he researched FeRAM design and its application to reconfigurable logic LSIs. He contributed to a chapter of *Ferroelectric Random Access Memories Fundamentals and Applications* (Berlin: Springer-Verlag, 2004). In 2004, he received a commendation from the Minister of Education, Culture, Sports, Science, and Technology, Japan.



**Toru Endo** was born in Shizuoka Prefecture, Japan, in 1960. He received the B.E. degree in electronic engineering from Tokai University, Kanagawa, Japan, in 1983, and the M.E. degree in electronic engineering from Meiji University, Kanagawa, in 1985.

He joined Fujitsu Ltd., Kawasaki, Japan, in 1985, where he designed Bipolar PROMs and BiCMOS SRAMs. From 1995 to 1999, he worked on the development of Flash memory. In 1999, he moved to Fujitsu Laboratories Ltd. and started to research and develop FeRAMs. Since 2002, he has been with the FeRAM division of Fujitsu Ltd., Kawasaki, Japan.



Yadollah Eslami (S'00) is a Ph.D. candidate in the Department of Electrical and Computer Engineering at the University of Toronto, Canada. He received the B.Sc. degree in electrical engineering from Shiraz University, Iran, and the M.Sc. degree in communication systems from Isfahan University of Technology, Iran, in 1985 and 1987, respectively. He was a Lecturer with the Electrical and Computer Engineering Department of the Isfahan University of Technology from 1987 to 1999. His research interests are in the areas of ferroelectric memories, VLSI memories, and microprocessor architecture.



Shoichiro Kawashima (M'83) was born in Yokohama, Japan, in 1958. He received the B.S. degree in applied physics from Tokyo University, Tokyo, Japan, in 1982.

He joined Fujitsu Ltd., Kawasaki, Japan, in 1982, where he was engaged in the development of 16-kb/16-Mb MOS static RAMs. In 1994, he went to Fujitsu Laboratories Ltd., Kawasaki, where he researched low-power SRAMs and DSPs. Since 2002, he has been with the FeRAM division of Fujitsu Ltd., Kawasaki.

Mr. Kawashima is a member of the Japan Society of Applied Physics and the Institute of Electronics, Information, and Communication Engineers of Japan.