# AN ADAPTIVE 4-PAM DECISION-FEEDBACK EQUALIZER FOR CHIP-TO-CHIP SIGNALING

## Marcus van Ierssel, Joyce Wong, Ali Sheikholeslami

University of Toronto, Dept. of Elec. and Comp. Eng. Email: {vane, jwongh, ali}@eecg.toronto.edu

Abstract - This paper presents a 4-PAM adaptive decision-feedback equalizer (DFE) for chip-to-chip signaling. The DFE adapts to the channel impulse response by observing a calibration sequence sent across the channel. Uninterrupted signaling is maintained across a parallel bus by providing an additional channel and using multiplexors to reroute the signals of the channel being calibrated. Using the intermittent calibration sequence instead of the conventional LMS adaptation technique removes the need to generate an error signal, eliminating the associated analog blocks. Also presented is a novel method of using the DFE adaptation circuits to extract the system's pulse response. The complete transceiver is implemented in a 0.18 $\mu$m CMOS process.

#### I. Introduction

Even with CMOS process scaling allowing higher and higher levels of system integration, the need for inter-chip interconnect bandwidth continues to grow. This increased bandwidth demand is satisfied using increased pin counts as well as increased signaling rates. As these rates reach 10 Gb/sec and beyond [1][2], detailed knowledge of the channel inter-symbol interference (ISI) becomes all the more crucial. Since the channel is not known prior to the design, or if known might change with temperature and process variations, the only viable technique to compensate for ISI is adaptive equalization.

The 4-PAM equalizer presented in [3] performs adaptive equalization in the *transmitter*. Adaptive transmit equalization requires a bi-directional wire or back-channel to communicate parameter updates from the receiver to the transmitter. It also increases the high-frequency content of the transmitted signal, which increases crosstalk, and reduces the maximum transmit voltage swing; both effects lower the signal-to-noise ratio.

This paper describes an adaptive decision-feedback equalizer (DFE) design, implemented in the receiver, which avoids these problems. A DFE is an equalizer that uses the past decisions of the receiver and an estimate of the channel impulse response to create and subtract a replica of the ISI from the current symbol [4][5][6]. The LMS algorithm used in conventional adaptive DFEs requires an error signal to guide the adaptation. This error signal is defined as the difference between the input and output of the slicer that reduces the analog input to a limited number of levels (2 for binary, 4 for 4-PAM, etc.). In a high-speed DFE application where the speed of a technology is being pushed to its limit, the delay in the slicer decision circuit approaches the symbol period. When this happens, by the time the output is ready, the analog value of the input has changed significantly and can no longer be used to derive the error signal. One solution to this problem is to delay the analog value using an analog shift register, but this requires additional power and adds complexity. To avoid these problems, this work describes an adaptive DFE using an intermittent calibration sequence to directly measure and cancel ISI. This technique allows the digital output of the DFE to be used directly to guide the adaptation process without having to generate a separate error signal.

Fig. 1 Equalization architecture

Our system uses a 4-phase clock, with consecutive 4-PAM symbols that are decoded by interleaved DFEs, as shown in Fig. 1. In the rest of this paper we discuss detailed system and circuit implementation of this design, particularly our technique of using a calibration sequence to adaptively determine the DFE filter parameters  $h_1$ ,  $h_2$ , and  $h_3$ , and comparator reference levels '2' and '-2'.

#### II. System Description

Adaptive equalization is performed during system operation by transmitting an intermittent calibration sequence. To provide uninterrupted signaling, the system uses N+1 channels to transmit N signals. For test purposes we have chosen N=4 to demonstrate a small, yet non-trivial implementation. As shown in Fig. 2, multiplexors are used to take one channel out of service at a time, seamlessly transferring its signaling duties to the previously out-of-service channel. The newly out-of-service channel uses the calibration sequence to adjust the DFE filter parameters and the phase of the receiver's clock.
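The channel rotation can be illustrated with a short Python sketch. It is only a behavioral model under stated assumptions: the function name `rotate_calibration`, the dictionary representation of the multiplexor settings, and the round-robin order in which channels are taken out of service are all hypothetical and are not specified in this paper.

```python
def rotate_calibration(assignment, spare, next_out):
    """Move the signal carried on channel `next_out` to the current spare channel.

    assignment: dict mapping in-service channel index -> signal index.
    spare: index of the channel that is currently out of service.
    Returns (new_assignment, new_spare); `next_out` becomes the new spare.
    """
    new_assignment = dict(assignment)
    new_assignment[spare] = new_assignment.pop(next_out)
    return new_assignment, next_out


# N = 4 signals on N + 1 = 5 channels; channel 4 starts out of service.
assignment, spare = {0: 0, 1: 1, 2: 2, 3: 3}, 4
for _ in range(5):
    # Assumed round-robin order: calibrate the channel after the current spare.
    assignment, spare = rotate_calibration(assignment, spare, (spare + 1) % 5)
```

At every step of the loop all four signals remain assigned to some channel, which is the property the extra channel is meant to provide.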

Fig. 3 shows a simplified block diagram of the transceiver. The transceiver design uses 4-PAM signaling and 4-phase clocking, which results in the transmission of 8 bits on each channel during each system clock period. The transmitter contains a pseudo-random bit sequence (PRBS) data generator producing an 8-bit wide output. These 8 bits are then applied to an 8-to-2 serializer and then driven off-chip with a 4-level PAM driver. The receiver portion of the chip comprises the adaptive DFE, a phase recovery block, and a data retiming block. Built-in test structures include channel pulse monitoring capability and an error detector that compares the transmitter's PRBS data sequence with the received data sequence and counts the resulting errors. The following sections describe the design and implementation of the DFE and the phase recovery blocks.

Fig. 2 One channel used to allow continuous adaptation

0-7803-8445-8/04/\$20.00 ©2004 IEEE

#### III. Adaptive DFE

As shown in Fig. 1, the DFE described in this paper subtracts ISI due to the past three symbols, and can be characterized by the following equation:

 $\hat{d}_n = \mathrm{slice}\left[d(t_n) - h_1\hat{d}_{n-1} - h_2\hat{d}_{n-2} - h_3\hat{d}_{n-3}\right]$ 

where $d(t)$ is the received continuous waveform, $\hat{d}_n$ is a DFE decision, and $h_1$, $h_2$, and $h_3$ are the filter parameters. By making the DFE adaptive, these filter parameters are automatically adjusted to match the channel characteristics. This DFE design also performs two additional functions: First, it determines the reference levels needed for decoding 4-PAM signaling. Second, it provides data phase recovery for the receiver.
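The decision equation above can be exercised with a minimal behavioral sketch in Python. The channel model (a unit main cursor followed by three post-cursor taps), the tap values, and the function names are assumptions made for illustration; they do not describe the circuit of Fig. 1.

```python
def slice_4pam(x, ref=2.0):
    """Map an analog sample to one of {3, 1, -1, -3} using thresholds +ref, 0, -ref."""
    if x > ref:
        return 3
    if x > 0:
        return 1
    if x > -ref:
        return -1
    return -3


def dfe_decide(samples, h, ref=2.0):
    """d_hat[n] = slice(d[n] - h1*d_hat[n-1] - h2*d_hat[n-2] - h3*d_hat[n-3])."""
    decisions = []
    for d in samples:
        # Past decisions, most recent first; missing history contributes nothing.
        past = decisions[::-1][:len(h)]
        isi = sum(hk * dk for hk, dk in zip(h, past))
        decisions.append(slice_4pam(d - isi, ref))
    return decisions


def channel(symbols, h):
    """Hypothetical channel: each sample is the symbol plus post-cursor ISI."""
    out = []
    for n, s in enumerate(symbols):
        isi = sum(hk * symbols[n - 1 - k] for k, hk in enumerate(h) if n - 1 - k >= 0)
        out.append(s + isi)
    return out


tx = [3, -3, 1, -1, 3, 3, -3, 1]
taps = [0.4, 0.2, 0.1]          # assumed values of h1, h2, h3
rx = channel(tx, taps)
equalized = dfe_decide(rx, taps)            # recovers tx
unequalized = dfe_decide(rx, [0.0] * 3)     # mis-decodes some symbols
```

In this example the second received sample is $-3 + 0.4 \cdot 3 = -1.8$, which an unequalized slicer would decode as '-1' instead of '-3'.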

##### A. Equalization Filter Parameters

The adaptation of equalization parameters in the DFE is accomplished through the direct measurement and cancellation of ISI during transmission of the repeated calibration sequence '3000'. This technique is shown conceptually in Fig. 4 for the $h_1$ parameter. The $h_1$ parameter is determined by the amplitude of the ISI in the symbol period following the '3' in the calibration sequence. The normal set of valid transmit symbols in our 4-PAM system is {3, 1, -1, -3}; however, during adaptation the symbol '0' is added to allow the direct measurement of the system impulse response. The '0' symbol is implemented by splitting the transmit driver into two equal halves; during data transmission the two halves support each other, while during '0' transmission they oppose each other. Assuming the ISI in the system is limited to four symbol periods, the receiver samples the pulse amplitude of the '3' symbol followed by three samples of ISI during the '0' samples. If the ISI is zero or is equalized these three samples will become zero. Without the '0' symbol, an alternative calibration sequence would be '3 -3 -3' for which the same three samples would be non-zero even in the absence of ISI. With the '3000' sequence, the three samples of the ISI in the calibration sequence are used directly as the DFE adaptation control signals. The samples during the first, second, and third '0' symbols are used to adapt $h_1$, $h_2$, and $h_3$ respectively. During these '0's the equalization parameters are adjusted up or down until the ISI is eliminated. An up/down counter driving an 8-bit binary-coded DAC is used for this purpose, controlled by a comparator as shown in Fig. 4(b). The same comparator is used during channel operation to distinguish a '1' from a '-1'. If the received sample is greater than zero, the DFE parameter is increased; otherwise it is decreased. This counter modifies the equalization parameter until it converges and oscillates around the zero-ISI setting, as shown in Fig. 4(c). At the end of the calibration sequence, the parameter value is frozen and used for equalization when the channel is returned to operational mode.

Fig. 4 Adaptive DFE technique (a) received pulse sequence (b) adaptation block diagram (c) equalization parameter tracking
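The sign-driven adaptation described above can be sketched as a behavioral Python model. The pulse-response values, the DAC step size `lsb`, and the offset-binary mapping of the 8-bit counter code to a signed tap value are assumptions chosen for illustration, not parameters taken from the implemented circuit.

```python
def adapt_taps(pulse, n_frames=300, lsb=0.005, bits=8):
    """Adapt h1..h3 from the repeated '3000' sequence using only comparator signs.

    pulse: hypothetical sampled pulse response [p0, p1, p2, p3] per unit symbol.
    Returns the tap values [h1, h2, h3] after each calibration frame.
    """
    mid = 1 << (bits - 1)        # assumed mapping: counter code 128 means h = 0
    max_code = (1 << bits) - 1
    codes = [mid, mid, mid]      # one up/down counter per tap
    history = []
    for _ in range(n_frames):
        for k in range(3):
            h_k = (codes[k] - mid) * lsb
            # Sample during the (k+1)-th '0': ISI of the '3' minus the DFE replica.
            residual = 3 * pulse[k + 1] - h_k * 3
            # The zero-reference comparator steers the counter up or down.
            step = 1 if residual > 0 else -1
            codes[k] = min(max_code, max(0, codes[k] + step))
        history.append([(c - mid) * lsb for c in codes])
    return history


history = adapt_taps([1.0, 0.4, 0.2, -0.1])
h1, h2, h3 = history[-1]   # each settles within about one LSB of p1, p2, p3
```

Because only the sign of the residual is used, each tap keeps toggling by one LSB around its target after convergence, which corresponds to the behavior shown in Fig. 4(c).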

##### B. 4-PAM Reference Generation

The 4-PAM signaling used in this design requires three reference levels for the receiver comparators. The valid symbols are elements of {3, 1, -1, -3}, requiring reference levels of 2, 0, and -2. The zero reference is inherent to the comparator design, leaving the non-zero references 2 and -2 to be determined. These two references can be considered as one due to the differential nature of the design. Reference generation is accomplished using a procedure similar to the one described above for the equalization filter parameters. Instead of adapting $h_1$ to offset the first ISI sample, as shown in Fig. 4(b), the '2'/'-2' reference level on the comparator is adjusted during the '3000' calibration sequence, using an up/down counter controlled by the '2' comparator output, until it equals the sampled amplitude of the '3' pulse. Because this results in the reference level being adapted to '3' instead of the required '2', the comparator reference needs to be scaled. This is done by implementing the reference level using 3 current sources; during calibration all three are enabled, while during equalizer operation only 2 are enabled.
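A small Python sketch can illustrate the 3-to-2 scaling of the reference level. The received '3' amplitude of 2.7, the step size `lsb`, and the function name are hypothetical; the model only captures the idea of three equal sources during calibration and two during operation.

```python
def calibrate_reference(peak_amplitude, n_frames=400, lsb=0.01, bits=8):
    """Track the sampled '3' pulse with an up/down counter, then scale to '2'.

    Returns (calibration_level, operating_level) in the same arbitrary units
    as peak_amplitude.
    """
    max_code = (1 << bits) - 1
    code = 0
    for _ in range(n_frames):
        ref_cal = 3 * code * lsb              # calibration: all three sources enabled
        step = 1 if peak_amplitude > ref_cal else -1
        code = min(max_code, max(0, code + step))
    unit = code * lsb                         # contribution of a single source
    return 3 * unit, 2 * unit                 # operation: only two sources enabled


cal_level, op_level = calibrate_reference(2.7)
```

With the assumed amplitude, the calibration level settles near 2.7 and the operating level near 1.8, which is two thirds of the '3' amplitude, as required for the '2' threshold.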

##### C. Timing Recovery

While the transmitter and receiver in this system use a common clock, the phase of the received signal is unknown, and must be recovered. The concept behind the phase recovery technique is shown in Fig. 5. Phase recovery is again achieved in a similar manner to the DFE adaptation procedure. A retiming sequence of '3 3 -3 -3' is repeatedly transmitted, and the receiver adjusts the local clock phase to align with the '-3' to '3' transition. This adjustment is accomplished by using one of the receiver's comparators as a phase detector driving an up/down counter. This counter controls an adjustable phase interpolator, which generates the local clock phase. During data transmission, a programmable offset is added to the recovered phase to place the sampling point in the center of the bit period instead of at the bit boundary.
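The phase recovery loop can be modeled with a short Python sketch. The sinusoidal approximation of the band-limited '3 3 -3 -3' waveform, the 256-step interpolator range per clock cycle, and the half-UI offset value are assumptions made for illustration.

```python
import math

PHASES = 256  # assumed interpolator steps per clock cycle (4 UI)


def received(phase, crossing):
    """Hypothetical '3 3 -3 -3' waveform, rising through zero at `crossing`."""
    return 3 * math.sin(2 * math.pi * (phase - crossing) / PHASES)


def recover_phase(crossing, start=0, n_steps=400):
    """Bang-bang loop: a comparator sample drives the up/down phase counter."""
    code = start
    for _ in range(n_steps):
        late = received(code, crossing) > 0   # sample already past the rising edge
        code = (code + (-1 if late else 1)) % PHASES
    return code


def sampling_phase(edge_code, offset=PHASES // 8):
    """Add the programmable offset; half a UI is 32 steps when 4 UI span 256."""
    return (edge_code + offset) % PHASES


edge = recover_phase(100.5)      # settles on one of the two codes next to the edge
sample_at = sampling_phase(edge)
```

After lock, the counter alternates between the two codes on either side of the transition, and the offset moves the data sampling point from the symbol boundary to the symbol center.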

#### IV. Implementation

The transceiver uses a 4-phase clock to reduce the clock speed of the system. The receiver implements the 4-phase clocking scheme using 4-way parallel interleaving of the DFE. This interleaving has three advantages: First, interleaving reduces the speed requirement of the latched comparators; their reset phase can be performed during clock phases when other comparators are latching. Second, interleaving simplifies the signal routing of the previous decision feedback data. For example, the $\hat{d}_{n-1}$ input of any of the interleaved DFEs can be hardwired to the comparator output of the DFE operating on the previous clock phase. Third, interleaving allows all of the DFE filter parameters to be adapted concurrently. Because the calibration sequence '3 0 0 0' is four symbols long, it is distributed across the four interleaved DFEs. The DFE branch receiving the '3' symbol calibrates the 4-PAM reference level, while the other three branches calibrate the filter parameters $h_1$, $h_2$, and $h_3$.
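The concurrent calibration across the interleaved branches amounts to a fixed mapping from branch to adapted parameter. The following Python sketch shows that mapping; the branch numbering and the names `ROLES` and `calibration_roles` are hypothetical.

```python
ROLES = ("reference", "h1", "h2", "h3")


def calibration_roles(branch_with_3, n_branches=4):
    """Return {branch index: parameter adapted} for one alignment of '3 0 0 0'.

    branch_with_3 is the interleaved branch that samples the '3' symbol; the
    following branches sample the first, second, and third '0' symbols.
    """
    return {(branch_with_3 + k) % n_branches: ROLES[k] for k in range(n_branches)}


roles = calibration_roles(2)
# Branch 2 tracks the reference level; branches 3, 0, and 1 adapt h1, h2, and h3.
```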

Due to the large number of signal additions and subtractions required by the DFE, all signal operations are performed in current mode, where they are easily implemented using multiple differential pairs summing their currents into a common load. As shown in Fig. 6, the core of this DFE can be broken down into three main blocks. The first block converts the received signal from voltage mode to current mode. The second block sums the DFE feedback signal and the reference levels needed for 4-PAM decoding, all in current mode using parallel differential pairs. The third block is a comparator that uses the sum of these currents as its input. The comparator outputs are used for DFE feedback and as up/down control for the DFE adaptation counters, and they also provide the decoded data that proceeds to the retiming block of Fig. 3. This DFE core is repeated in each of the 4 interleaved branches. These interleaved branches are further subdivided for each of the 3 reference levels $\{\pm 2, 0\}$, resulting in a total of 12 DFE cores. The resulting system, comprising the 12 DFE cores as well as a bias control block, is shown in Fig. 6. The bias control block consists of up/down counters driving D/A current sources that implement the equalization parameters and reference level.

The DFE core described above is implemented using a folded cascode architecture, shown in Fig. 7. The differential pair on the left performs the input voltage-to-current conversion using source degeneration to set the transconductance. A zero-peaking capacitor is added in parallel to the degeneration resistor to compensate for the pole created at the current summing node. The next three differential pairs implement the decision feedback signals using current sources weighted by $h_1$, $h_2$, and $h_3$. The reference level is implemented with three additional differential pairs, as discussed in Section III-B, which are omitted from the figure for clarity. The column on the right side of the schematic has a current source on top, with a cascode stage below it. Below the cascode stage is a current steering stage that directs current either into the clocked comparator below, or directly to ground. This keeps all the transistors that drive the current summing node in saturation during comparator reset. Below the current steering stage is the clocked comparator. This is implemented as a sense-amp latch comprising two back-to-back inverters and two reset transistors. The folded cascode design provides a low impedance at the high-capacitance current-summing nodes which, with the added benefit of the zero-peaking capacitor, increases the bandwidth of the circuit. In addition, the folded cascode architecture provides more headroom in a 1.8 V supply environment.

The full transceiver has been designed and implemented in a 0.18  $\mu$ m CMOS process. The die photo of the 5.3 mm<sup>2</sup> test chip is shown in Fig. 8.

#### V. Measurement and Simulation Results

Our initial testing of the chip confirms the correct operation of many of the design's low-level functions, including automatic timing recovery and adaptation of the DFE parameters and 4-PAM reference level. While these design elements have been shown to be functional, a design error in the clock generation block has prevented full functionality tests. The rest of this section presents the measurement results from the built-in pulse-response monitor, and simulation results of the DFE that demonstrate its correct operation.

We have successfully tested the use of the DFE adaptation circuits to monitor the pulse response of the system's channel using a technique similar to that of a digital sampling oscilloscope. The received calibration sequence is sampled at 256 evenly spaced clock phases over many clock cycles. The receiver clock phase is adjusted using a programmable phase-offset register. The sample amplitude at each phase is measured using the circuits designed to adaptively determine the 4-PAM reference level. As described in Section III-B, finding the 4-PAM reference level is accomplished by finding the amplitude of a symbol '3' pulse at its peak sampling point. If the DFE remains in calibration mode when the receiver's clock phase changes, the new value of the 'reference level' provides a measure of the sample's amplitude. This measurement is monitored externally using the chip's scan chain. Plotting this measure against clock phase produces the pulse response over one clock cycle (4 UI). Fig. 9 shows the pulse response of a short channel on our test board. The pulse response extracted using the DFE adaptation circuits is nearly identical to the direct measurement using an oscilloscope.
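The pulse-response extraction procedure can be sketched as a phase sweep around the reference-tracking loop. In this Python model the Gaussian pulse shape, its amplitude, and the step size `lsb` are hypothetical stand-ins for the real channel and DAC; only the sweep-and-settle structure follows the description above.

```python
import math


def pulse(phase):
    """Hypothetical received '3' pulse over one clock cycle of 256 phase steps."""
    return 2.7 * math.exp(-((phase - 64) / 30.0) ** 2)


def measure_amplitude(amplitude, lsb=0.03, bits=8, n_frames=300):
    """Let the reference up/down counter settle on the sample; return its code."""
    max_code = (1 << bits) - 1
    code = 0
    for _ in range(n_frames):
        step = 1 if amplitude > code * lsb else -1
        code = min(max_code, max(0, code + step))
    return code


def extract_pulse_response(n_phases=256, lsb=0.03):
    """Sweep the phase-offset setting and record the settled reference level."""
    return [measure_amplitude(pulse(p), lsb) * lsb for p in range(n_phases)]


response = extract_pulse_response()   # one amplitude estimate per clock phase
```

Each entry of `response` approximates the waveform at that phase to within about one counter step, so plotting it against phase reproduces the pulse shape in the manner of a sampling oscilloscope.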

Simulated or measured eye diagrams are often used to demonstrate the correct operation of a high-speed signaling system with equalization. Due to the nature of a DFE, there is no external node on which to probe an equalized signal. Instead, the simulated eye diagram for the current-summing node in one of the interleaved DFEs is shown in Fig. 10 with the equalization turned both off and on. With equalization off, no eye is visible, while a clean eye opening is visible with the equalizer turned on. Because the feedback input to the interleaved DFE is only valid for one out of every four symbols, the eye pattern shows a 4-symbol cycle, where the eye is open only during one symbol period. The signal levels shown in the eye diagram represent a $12\,\mu$A differential current on a nominal $60\,\mu$A inside the latched comparator. These simulation results are for a signaling rate of 2 Gb/s over the equivalent of 2 m of FR-4 PCB trace. The transmitted signal level is 200 mV/level (differential), ranging from 1.5 V to 1.8 V.

Fig. 10 Simulated eye diagram of the low-impedance current-summing nodes shown in Fig. 7.

#### VI. Conclusion

The proposed adaptive equalization technique uses direct measurement and cancellation of ISI during an intermittent calibration sequence to determine the equalizer parameters. This eliminates some of the analog blocks that LMS adaptation would require to generate an error signal. Simulation results show that the design is capable of equalizing at bit rates up to 2 Gb/s. The symbol rate in this design is limited by the delay of the feedback path in the DFE. To increase the symbol rate, future designs should focus on reducing or mitigating this delay.

#### Acknowledgments

The authors would like to thank Bill Walker and Hirotaka Tamura of Fujitsu Labs for their input. The authors would also like to acknowledge the generous funding of Fujitsu Labs of America and NSERC of Canada.

#### References

[1] S. Kaeriyama, M. Mizuno, "A 10Gb/s/ch 50mW 120x130µm<sup>2</sup> clock and data recovery circuit," *IEEE Dig. Tech. Papers, ISSCC*, pp. 70-71, 2003.

[2] H. Takauchi, H. Tamura, et al., "A CMOS multi-channel 10Gb/s Transceiver," *IEEE Dig. Tech. Papers, ISSCC*, pp. 72-73, 2003.

[3] J. Stonick, G. Wei, J. Sonntag, D. Weinlader, "An adaptive PAM-4 5Gb/s backplane transceiver in 0.25 µm CMOS," *IEEE J. Solid-State Circuits*, pp. 436-443, March 2003.

[4] B.S. Song, D. Soo, "NRZ timing recovery technique for bandlimited channels," *IEEE J. Solid-State Circuits*, pp. 514-520, April 1997.

[5] Y.S. Sohn, S.J. Bae, H.J. Park, S.I. Cho, "A 1.2 Gbps CMOS DFE receiver with the extended sampling time window for application to the SSTL channel," *IEEE Symp. VLSI Circuits*, pp. 92-93, 2002.

[6] J. Zerbe, C. Werner, et al., "Equalization and clock recovery for a 2.5 - 10Gb/s 2-PAM/4-PAM backplane transceiver cell," *IEEE Dig. Tech. Papers, ISSCC*, pp. 80-81, 2003.