## 6.6 A 22.5-to-32Gb/s 3.2pJ/b Referenceless Baud-Rate Digital CDR with DFE and CTLE in 28nm CMOS

Wahid Rahman<sup>1</sup>, Danny Yoo<sup>1</sup>, Joshua Liang<sup>1</sup>, Ali Sheikholeslami<sup>1</sup>, Hirotaka Tamura<sup>2</sup>, Takayuki Shibasaki<sup>2</sup>, Hisakatsu Yamaguchi<sup>2</sup>

<sup>1</sup>University of Toronto, Toronto, Canada <sup>2</sup>Fujitsu Laboratories, Kawasaki, Japan

Baud-rate clock and data recovery circuits (CDRs) are becoming more prevalent in high-speed receiver designs as they offer lower power consumption by sampling the received data only once per UI [1,2]. This reduces the number of front-end comparators and clock distribution networks [1]. However, current baud-rate CDRs require an external reference clock [1,2], which adds to system complexity in pin count and clock generation. While frequency detectors (FDs) allow CDR designs to operate without a reference clock and across a wide capture range [3-5], current FDs are not designed for baud-rate CDRs. Moreover, current FDs rely on sharp data edges and are not designed for the significant ISI caused by channel loss at high data rates [3-5]. This work presents a reference-less baud-rate CDR that operates from 22.5Gb/s to 32Gb/s with channel loss up to -14.8dB at Nyquist. An FD scheme is proposed that automatically controls an adjustable PD to correct any frequency error. This eliminates the need for a separate frequency acquisition loop in the CDR. The CDR, with a CTLE and a 1-tap DFE, is fabricated in 28nm CMOS. The entire receiver consumes 3.2pJ/b at 32Gb/s with PRBS-31 data.

Figure 6.6.1 illustrates the proposed baud-rate receiver architecture with frequency detection. Full-rate clocking is shown here for conceptual purposes. The incoming waveform is equalized by a CTLE before being sampled at the baud rate (i.e., once per UI) by three comparators. Two of the comparators are set to $\pm \alpha$ thresholds to be shared by a 1-tap look-ahead DFE [1]; a third comparator is set to a zero threshold. The outputs of the three comparators form the resulting sample $S_n$, which is then fed to a pattern filter and selected if it matches a predefined data sequence. A valid $S_n$ is then used by the proposed FD and the adjustable baud-rate phase detector (PD) in the digital CDR to bring the clock frequency closer to the data rate. Frequency detection operates as follows. Slow/fast clock detectors indicate whether there is a frequency error between the recovered clock and the incoming data (Slow<sub>FD</sub> and Fast<sub>FD</sub>) and feed these indicators to the frequency correction logic, where they are accumulated over time. The FD filter then compares the raw accumulated FD output against programmable filter thresholds ($\pm 30$ from a range of $\pm 150$). If the high or low threshold is exceeded, the FD identifies the clock as too slow or too fast, respectively. The FD then digitally adjusts the PD characteristic (PD Adjust Slow/Fast) such that the average PD output corrects for the frequency error over time. If the filter thresholds are not exceeded, the FD allows the PD to operate normally (PD Adjust Normal). An FD lock detector measures the average activity from the slow/fast clock detectors and determines whether frequency lock is achieved. This occurs when the frequency error falls below ±800ppm, which is within the PD capture range.
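The accumulate-and-threshold behaviour of the FD filter can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the synthesized logic: the class name `FDFilter`, the +1/-1 weighting of Slow<sub>FD</sub>/Fast<sub>FD</sub> events, and the saturating accumulator are hypothetical; only the ±30 threshold and ±150 range come from the text.

```python
from dataclasses import dataclass


@dataclass
class FDFilter:
    """Hypothetical accumulate-and-threshold model of the FD filter."""
    threshold: int = 30   # programmable filter threshold (from the text)
    limit: int = 150      # accumulator range (from the text)
    acc: int = 0

    def update(self, slow_fd: bool, fast_fd: bool) -> str:
        # Assumption: a slow-clock event counts +1, a fast-clock event -1.
        self.acc += int(slow_fd) - int(fast_fd)
        # Assumption: the accumulator saturates at +/- limit.
        self.acc = max(-self.limit, min(self.limit, self.acc))
        if self.acc > self.threshold:
            return "PD_ADJUST_SLOW"    # high threshold exceeded: clock too slow
        if self.acc < -self.threshold:
            return "PD_ADJUST_FAST"    # low threshold exceeded: clock too fast
        return "PD_ADJUST_NORMAL"


fd = FDFilter()
decisions = [fd.update(slow_fd=True, fast_fd=False) for _ in range(31)]
```

In this sketch the filter stays in the normal mode for the first 30 slow events and switches to the slow adjustment only once the accumulated value exceeds the threshold.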

The pattern filter and normal PD operation are depicted in Fig. 6.6.2. Similar to [1], both the rising and the falling RX waveforms corresponding to TX data patterns "011" and "100" are considered (although only the rising waveform is shown). From the post-CTLE waveform, three consecutive samples $S_{n-1}$, $S_n$, and $S_{n+1}$ are quantized into four distinct voltage "zones". Sequences for which $S_{n-1}$ and $S_{n+1}$ are in Zones 0 and 3, respectively, are identified. Among these sequences, $S_n$ is selected for PD/FD operation only if it falls in Zone 1, 2, or 3; sequences for which $S_n$ lies in Zone 0 are ignored. This ensures the 1-tap look-ahead DFE correctly recovers the "011" data. PD logic is defined by the $S_n$ zone. For normal PD operation, if $S_n$ is in Zone 1 or 2, the recovered clock (CK<sub>REC</sub>) is early and the PD output (PD<sub>OUT</sub>) is DN. If $S_n$ is in Zone 3, CK<sub>REC</sub> is late and PD<sub>OUT</sub> is UP.
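A minimal Python sketch of the rising-edge pattern filter and normal PD decision is given below. It assumes that the four zones are delimited by the three comparator thresholds ($-\alpha$, 0, $+\alpha$), so that the zone index equals the number of thresholds a sample exceeds; the function names `zone` and `pd_rising` are hypothetical, and the mirrored falling-edge case is omitted.

```python
from typing import Optional


def zone(v: float, alpha: float) -> int:
    """Zone index 0..3: how many of the thresholds (-alpha, 0, +alpha) v exceeds.

    Assumption: zones are ordered from lowest to highest voltage.
    """
    return sum(v > t for t in (-alpha, 0.0, alpha))


def pd_rising(s_prev: float, s_n: float, s_next: float,
              alpha: float) -> Optional[str]:
    """Normal PD decision for a rising '011' waveform, or None if filtered out."""
    z_prev, z_n, z_next = (zone(v, alpha) for v in (s_prev, s_n, s_next))
    # Pattern filter: S_{n-1} in Zone 0, S_{n+1} in Zone 3, S_n not in Zone 0.
    if z_prev != 0 or z_next != 3 or z_n == 0:
        return None
    # Zone 3: clock late, output UP. Zones 1 and 2: clock early, output DN.
    return "UP" if z_n == 3 else "DN"


early = pd_rising(-1.0, 0.1, 1.0, alpha=0.5)    # S_n in Zone 2
late = pd_rising(-1.0, 0.8, 1.0, alpha=0.5)     # S_n in Zone 3
ignored = pd_rising(-1.0, -0.9, 1.0, alpha=0.5) # S_n in Zone 0
```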

Figure 6.6.3 illustrates the operation of the FD under three cases: normal clock ($f_{CK}=f_{DATA}$), slow clock ($f_{CK}<f_{DATA}$), and fast clock ($f_{CK}>f_{DATA}$). For a normal clock, jitter and CDR dynamics move selected $S_n$ samples about the stable phase lock point, as shown in the 1<sup>st</sup> row of the table. For this normal PD logic, the average PD characteristic is zero and the PD ensures that the CDR maintains lock for both the VCO frequency and phase. If the recovered clock is slow, the clock drifts with respect to the data, as shown in the 2<sup>nd</sup> row of the table. As observed on rising waveforms, selected $S_n$ samples drift from Zones 1 to 2 to 3 over time. The slow clock detector in the FD recognizes this and, if it persists, issues a Slow Adjust signal to the PD. This signal changes the PD characteristic such that the average PD output is positive. Over time, this positive average increases the VCO frequency and corrects the slow clock. A similar procedure is done for a fast clock, as shown in the 3<sup>rd</sup> row of the table. In all three cases, there exists a stable phase lock point in the PD characteristic. This ensures the CDR automatically phase-locks once FD<sub>LOCK</sub> is declared. At this point, the PD resumes normal operation.
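The zone-drift idea can be illustrated with a small Python sketch. It is one plausible behavioural model, not the authors' detector: the function `drift_votes` is hypothetical, the descending direction for a fast clock (Zones 3 to 2 to 1) is an assumed mirror image of the slow case described above, and jumps of more than one zone (such as the wrap from Zone 3 back to Zone 1) are simply ignored.

```python
from typing import Iterable, Tuple


def drift_votes(zones: Iterable[int]) -> Tuple[int, int]:
    """Count (slow, fast) votes from consecutive selected S_n zones (1..3)."""
    seq = list(zones)
    slow = fast = 0
    for prev, cur in zip(seq, seq[1:]):
        if cur == prev + 1:
            slow += 1   # 1->2 or 2->3: drift described for a slow clock
        elif cur == prev - 1:
            fast += 1   # 3->2 or 2->1: assumed drift for a fast clock
        # Equal zones and two-zone jumps carry no vote in this sketch.
    return slow, fast


slow_case = drift_votes([1, 1, 2, 2, 3, 3, 1, 2, 3])
fast_case = drift_votes([3, 2, 1, 3, 2, 1])
locked_case = drift_votes([2, 3, 2, 3, 2])
```

With a persistent drift the votes are one-sided, whereas dithering about the lock point produces votes that roughly cancel, which is the kind of signal an accumulating filter can separate.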

Figure 6.6.4 presents the receiver schematic for a quarter-rate implementation. The CTLE consists of an adjustable source-degenerated stage followed by a CML buffer to drive ten sampling comparators. The CTLE provides up to 4.0dB gain at 7GHz; this equalizes the channel response up to the first post-cursor ISI for the 1-tap DFE. The ten double-tail comparators operate at quarter-rate to sample the RX waveform once per UI. Of these comparators, eight correspond to the DFE levels ($\pm \alpha$) for all four clock phases and two correspond to the zero level. While this restricts the $S_n$ samples to be available only at clock phases CK0° and CK180°, it is sufficient for CDR operation and relaxes the CTLE design. A four-stage CML ring VCO operates at quarter-rate ($f_{CK}/4$) to generate the four high-speed quadrature clock phases. The measured VCO tuning range is 5.6-9.0GHz. The demuxed comparator samples are processed by the synthesized digital back-end operating at $f_{CK}/32$ (CK<sub>CORE</sub>) from 703.1MHz to 1.125GHz. Within the digital back-end, the digital loop filter of the CDR generates coarse and fine codes to control a 10b segmented current DAC. The fine DAC is designed to span 2 LSBs of the coarse DAC. The current DAC achieves a resolution of 3.0MHz/LSB through an I-to-V conversion.
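The clock frequencies quoted above follow from simple division of the full-rate clock, as the short Python sketch below shows. The helper name `clock_plan` is hypothetical, and the sketch assumes NRZ signalling so that $f_{CK}$ in GHz equals the data rate in Gb/s.

```python
def clock_plan(data_rate_gbps: float) -> dict:
    """Derive the VCO and digital core clock frequencies for a given data rate."""
    f_ck_ghz = data_rate_gbps            # assumption: NRZ, one bit per full-rate cycle
    return {
        "vco_ghz": f_ck_ghz / 4,         # quarter-rate ring VCO
        "core_mhz": f_ck_ghz / 32 * 1e3, # digital back-end clock, f_CK/32
    }


low = clock_plan(22.5)   # lowest locked data rate
mid = clock_plan(28.0)   # VCO at 7.0GHz
top = clock_plan(36.0)   # corresponds to the 9.0GHz upper VCO limit
```

Under these assumptions, 22.5Gb/s maps to a core clock of about 703.1MHz, and the 1.125GHz upper figure corresponds to the 9.0GHz end of the VCO tuning range rather than to the 32Gb/s equipment limit.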

A CDR prototype is fabricated in TSMC 28nm CMOS technology and consumes 65.2mW to 102.0mW when operating from 22.5Gb/s to 32Gb/s, respectively (without I/O buffers). Figure 6.6.5 summarizes the measurement results. FD operation is verified by open-loop response and closed-loop capture range measurements. For the open-loop FD response, frequency error ($f_{ERR}=[f_{DATA}-f_{CK}]/f_{CK}$) is measured by forcing the VCO in open loop to 7.0GHz ($f_{CK}$=28GHz) and transmitting 22.5-32Gb/s PRBS-31 data over a 5" Tyco channel with Nyquist loss ranging from -10.1dB to -14.8dB. The maximum data rate ($f_{DATA}$) of the measurement equipment is limited to 32Gb/s, corresponding to $f_{ERR}$ ≤ +14.3% (interval A in the Fig. 6.6.5 open-loop measurements). To characterize $f_{ERR}$ > 14.3% (interval B), $f_{DATA}$ is held constant at 32Gb/s and $f_{CK}$ is reduced. Closed-loop capture range is measured by initializing the VCO in closed loop to 7.0GHz ($f_{CK}$=28GHz) and observing the widest range of data rates for which the CDR acquires lock. The CDR locks down to 22.5Gb/s and up to 32Gb/s when no TX jitter is applied, achieving a capture range of 9.5Gb/s (34%). The capture range is limited by the VCO lower limit and the equipment data rate upper limit. Applying 0.2UI<sub>PP</sub> TX SJ reduces the capture range to 25%. The FD improves CDR capture range by up to 227×. Jitter tolerance measurements are shown with the FD enabled and disabled. A real-time oscilloscope measures a maximum FD lock time of 10.1ms for FD<sub>LOCK</sub>.
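The frequency-error and capture-range figures can be cross-checked with the short Python sketch below. The function name `freq_error` is hypothetical, and normalizing the capture range to the 28GHz initial clock is an assumption inferred from the numbers (9.5/28 ≈ 34%); the text does not state the denominator explicitly.

```python
def freq_error(f_data_gbps: float, f_ck_ghz: float) -> float:
    """Fractional frequency error (f_DATA - f_CK) / f_CK."""
    return (f_data_gbps - f_ck_ghz) / f_ck_ghz


F_CK_GHZ = 28.0                           # VCO forced to 7.0GHz (quarter-rate)

upper = freq_error(32.0, F_CK_GHZ)        # equipment limit, about +14.3%
lower = freq_error(22.5, F_CK_GHZ)        # lowest locked rate, about -19.6%

# Assumption: capture range is expressed relative to the initial 28GHz clock.
capture_range = (32.0 - 22.5) / F_CK_GHZ  # about 34%

# FD lock is declared below +/-800ppm, which lies inside the PD capture range.
LOCK_LIMIT = 800e-6
residual_in_pd_range = abs(freq_error(28.02, F_CK_GHZ)) < LOCK_LIMIT
```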

Figure 6.6.6 compares the performance of this work against prior work. The entire receiver, including equalizers, competes favourably in terms of power efficiency against prior works with no equalization. Figure 6.6.7 shows the die micrograph.

## Acknowledgements:

The authors would like to thank CMC Microsystems for providing CAD tools and measurement equipment, NSERC for partial funding support, and Nikola Nedovic for technical assistance.

## References:

[1] T. Shibasaki, et al., "A 56Gb/s NRZ-Electrical 247mW/lane Serial-Link Transceiver in 28nm CMOS," *ISSCC*, pp. 64-65, Feb. 2016.

[2] R. Dokania, et al., "A 5.9pJ/b 10Gb/s Serial Link with Unequalized MM-CDR in 14nm Tri-Gate CMOS," *ISSCC*, pp. 184-185, Feb. 2015.

[3] G. Shu, et al., "A 4-to-10.5Gb/s 2.2mW/Gb/s Continuous-Rate Digital CDR with Automatic Frequency Acquisition in 65nm CMOS," *ISSCC*, pp. 150-151, Feb. 2014.

[4] S. Huang, et al., "An 8.2-to-10.3Gb/s Full-Rate Linear Reference-less CDR Without Frequency Detector in 0.18µm CMOS," *ISSCC*, pp. 152-153, Feb. 2014.

[5] S. Jalali, et al., "A Reference-Less Single-Loop Half-Rate Binary CDR," *IEEE JSSC*, vol. 50, pp. 2037-2047, Sept. 2015.











Figure 6.6.5: Measurement results: open-loop FD response; CDR capture range vs. TX SJ at 200 MHz; CDR JTOL for 28Gb/s PRBS-31; and lock time vs. freq. error (w/ and w/o  $0.2UI_{PP}$  TX SJ at 200 MHz).







Figure 6.6.4: Schematic of complete baud-rate receiver. Baud-rate sampling is implemented with four phases of a quarter-rate clock.

|                      | ISSCC 2014 [3]      | ISSCC 2014 [4]        | JSSC 2015 [5]         | This work             |
|----------------------|---------------------|-----------------------|-----------------------|-----------------------|
| Technology           | 65nm CMOS           | 0.18µm BiCMOS         | 65nm CMOS             | 28nm CMOS             |
| Supply Voltage (V)   | 1.2/1.0             | 1.8                   | N/A                   | 0.9                   |
| Baud-rate?           | No                  | No                    | No                    | Yes                   |
| Data rate (Gb/s)     | 4-10.5<br>(Δ = 6.5) | 8.2-10.3<br>(Δ = 2.1) | 8.5-12.1<br>(Δ = 3.6) | 22.5-32*<br>(Δ = 9.5) |
| Capture Range        | 65%                 | 21%                   | 36%                   | 34%                   |
| Channel Loss (dB)    | None reported       | None reported         | 7.0                   | 14.8                  |
| Equalization         | None                | None                  | None                  | CTLE + 1-tap DFE      |
| Total Power (mW)     | 22.5<br>@ 10Gb/s    | 174<br>@ 10.3Gb/s     | 43.0<br>@ 12.1Gb/s    | 102.0<br>@ 32Gb/s     |
| FoM (pJ/b)           | 2.25                | 16.89                 | 3.55                  | 3.19                  |

\*Equipment limit (maximum data rate = 32Gb/s)

Figure 6.6.6: Performance comparison.
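The FoM row of the comparison can be reproduced by dividing total power by data rate, since mW divided by Gb/s gives pJ/b directly. The Python sketch below uses the power and data-rate values from the table; the names `DESIGNS` and `energy_per_bit_pj` are hypothetical.

```python
# (total power in mW, data rate in Gb/s), taken from the comparison table
DESIGNS = {
    "[3]": (22.5, 10.0),
    "[4]": (174.0, 10.3),
    "[5]": (43.0, 12.1),
    "This work": (102.0, 32.0),
}


def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """Energy per bit: 1e-3 W / 1e9 b/s = 1e-12 J/b, i.e. pJ/b."""
    return power_mw / rate_gbps


fom = {name: energy_per_bit_pj(p, r) for name, (p, r) in DESIGNS.items()}
```

The computed values agree with the tabulated FoM to within rounding, and the roughly 3.2pJ/b obtained for this work includes the CTLE and DFE, whereas the other entries report no equalization.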

