# A 30Gb/s 2x Half-Baud-Rate CDR

Danny Yoo<sup>1</sup>, Mohammad Bagherbeik<sup>1</sup>, Wahid Rahman<sup>1</sup>, Ali Sheikholeslami<sup>1</sup>, Hirotaka Tamura<sup>2</sup>, Takayuki Shibasaki<sup>2</sup>

<sup>1</sup>University of Toronto, Toronto, Canada

<sup>2</sup>Fujitsu Laboratories, Kawasaki, Japan

*Abstract*—This paper presents a 2x half-baud-rate clock and data recovery technique that locks to the edge by performing 2x oversampling at half-baud-rate (every other UI). A test-chip was fabricated in TSMC 28nm HPC CMOS technology demonstrating a 30 Gb/s 2x half-baud-rate CDR with a Tyco 5" channel with 13.06 dB loss at Nyquist. The total power consumption is measured to be 79.2 mW (FOM of 2.64 pJ/bit) for 30 Gb/s PRBS31 input data.

Keywords—CMOS; Receiver; CDR; Clock and Data Recovery; Baud-Rate; Wireline

## I. INTRODUCTION

In several generations of high-speed links, 2x oversampling bang-bang phase detectors (BBPDs), in which the data is sampled at both the center and the edge and the clock is locked to the edge, have been prominent due to their robustness and simple hardware implementation. The recent trend, however, has shifted towards baud-rate phase detectors, such as the Mueller-Müller PD (MMPD), as a means of reducing power consumption by sampling only once per UI [1, 2, 3, 4], or even less often at sub-baud-rate [5]. As indicated in [2], however, MMPDs, which lock to the center of the data, are sensitive to equalization and to asymmetry in the pulse response. In this paper, we propose a scheme that combines the benefits of the 2x oversampling BBPD with those of the MMPD in a 2x half-baud-rate CDR. In this scheme, we collect two samples (2x) from every other UI (half-baud-rate), effectively sampling the data at baud-rate on average to save power, but lock to the data edge like a BBPD for robustness. By sampling the data and edge in every other UI, while using the same VCO topology as a 2x oversampling BBPD, the proposed scheme halves the number of clock phases to be distributed, thereby reducing the associated clock distribution power, which often constitutes a major portion of the total power consumption. We present an implementation of this scheme in a 30Gb/s receiver fabricated in 28nm CMOS.

## II. 2X HALF-BAUD-RATE SCHEME

Fig. 1 illustrates the basic concept behind the proposed 2x half-baud-rate scheme. The eye diagram shown corresponds to the output of a front-end equalizer with one significant post-cursor ISI term, while all other ISI terms are assumed to be minimized via equalization. We sample a UI with three comparators at the edge phase, $\phi_e$, with their outputs labeled DL, ED, and DH, and with one comparator at the center phase, $\phi_c$, with its output labeled DM, while we skip sampling the following UI altogether. Instead, we rely on ISI to recover the previous bit, which belongs to an unsampled UI. In doing so, we perform 4 comparisons in every other UI, or on average 2 comparisons per UI. By having the center and edge samples, albeit in every other UI, this scheme inherits the benefits of a 2x oversampling BBPD by locking to the edge, as will be demonstrated later in Fig. 2. By skipping every other UI, the proposed scheme shares the benefits of reduced hardware and low power consumption with the baud-rate MMPD. Compared to the Alexander PD (2x BBPD), the proposed 2x half-baud-rate PD requires half the number of clock phases and hence reduces the power of the clock distribution network.

Fig. 1. Operation of the proposed 2x half-baud-rate clock and data recovery.
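The sampling step described above can be sketched in a few lines of Python. This is only an illustrative model, not the authors' circuit: the function name `sample_ui`, the `UiSamples` container, and the threshold assignment (DL against $-V_{ref}$, ED against zero, DH against $+V_{ref}$, DM against zero, each output 1 when the sample exceeds its threshold) are assumptions inferred from the text.

```python
from typing import NamedTuple


class UiSamples(NamedTuple):
    """Comparator outputs for one sampled UI (names follow the text)."""
    DL: int  # edge-phase sample vs. -Vref (assumed threshold)
    ED: int  # edge-phase sample vs. 0     (assumed threshold)
    DH: int  # edge-phase sample vs. +Vref (assumed threshold)
    DM: int  # center-phase sample vs. 0   (assumed threshold)


def sample_ui(v_edge: float, v_center: float, vref: float) -> UiSamples:
    """Hypothetical model of the four comparisons made in one sampled UI."""
    return UiSamples(
        DL=int(v_edge > -vref),
        ED=int(v_edge > 0.0),
        DH=int(v_edge > vref),
        DM=int(v_center > 0.0),
    )


# Four comparisons are made in a sampled UI and none in the skipped UI,
# so the average is two comparisons per UI.
COMPARISONS_PER_SAMPLED_UI = len(UiSamples._fields)
AVERAGE_COMPARISONS_PER_UI = COMPARISONS_PER_SAMPLED_UI / 2
```

For example, `sample_ui(0.05, 0.4, 0.2)` models an edge sample that lies between the two reference levels, which is the case the phase detector treats as a transition.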

#### A. Phase Detector


The 2x half-baud-rate phase detector (PD) logic can be explained by observing the samples from the current UI (n) in Fig. 1. If the data at $\phi_e$ falls between $-V_{ref}$ and $+V_{ref}$, we conclude that there is a data transition ($0 \rightarrow 1$ or $1 \rightarrow 0$) at this phase, and hence we judge early/late from the outputs of the edge (ED) and data (DM) comparators, similar to the logic of a 2x oversampling BBPD. If these two bits are identical, the clock is late; otherwise, it is early, as shown in the phase detector table of Fig. 1.
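A minimal Python sketch of this decision rule follows. It assumes that DL = 1 and DH = 0 encode "the edge sample lies between $-V_{ref}$ and $+V_{ref}$", and that the detector holds when no transition is detected; the function name `phase_detect` is hypothetical.

```python
def phase_detect(DL: int, ED: int, DH: int, DM: int) -> str:
    """Return 'EARLY', 'LATE', or 'HOLD' for one sampled UI.

    Assumption: the edge sample lies between -Vref and +Vref
    exactly when DL == 1 and DH == 0.
    """
    transition = (DL == 1) and (DH == 0)
    if not transition:
        # Assumed behaviour: no phase update without a detected transition.
        return "HOLD"
    # Per the text: identical edge and data bits mean the clock is late.
    return "LATE" if ED == DM else "EARLY"
```

Only ED and DM decide the direction of the phase update; DL and DH merely qualify whether the update is issued.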

978-1-5386-9395-7/19/\$31.00 ©2019 IEEE



Fig. 2. Characteristics of the proposed 2x half-baud-rate PD versus those of the conventional Mueller-Müller PD.

#### B. Data Decoder

The data decoder (DD) only needs to observe the outputs of the data comparators (DH, DL, and DM) to decode the current bit and the previous bit, which comes from an unsampled UI. Similar to a 1-tap speculative DFE, the DD recovers the data by slicing the data eye at a threshold that is adjusted depending on the previous bit sequence. If the outputs of all three comparators are 0, $D_{n-1}$ and $D_n$ are both 0. Similarly, if the outputs of all three comparators are 1, $D_{n-1}$ and $D_n$ are both 1. If the data at $\phi_e$ falls between $-V_{ref}$ and $+V_{ref}$, it implies a transition between $D_{n-1}$ and $D_n$. Therefore, by observing the sign of DM (which indicates $D_n$), we can find $D_{n-1} = \overline{D_n}$. The truth table explaining the data decoder logic is also shown in Fig. 1.
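The three cases described above can be written as a small Python function. This is a sketch under stated assumptions rather than the authors' logic: it again takes DL = 1 and DH = 0 to mean that the edge sample lies between the reference levels, and it returns `None` for comparator combinations the text does not describe. The name `decode` is hypothetical.

```python
from typing import Optional, Tuple


def decode(DL: int, DH: int, DM: int) -> Optional[Tuple[int, int]]:
    """Return (D_prev, D_cur) for one sampled UI, or None if unspecified.

    D_prev is the bit of the preceding, unsampled UI (D_{n-1});
    D_cur is the bit of the sampled UI (D_n).
    """
    if (DL, DH, DM) == (0, 0, 0):
        return (0, 0)  # all comparators low: both bits are 0
    if (DL, DH, DM) == (1, 1, 1):
        return (1, 1)  # all comparators high: both bits are 1
    if DL == 1 and DH == 0:
        # Assumed encoding of "edge sample between -Vref and +Vref":
        # a transition, so D_n = DM and D_{n-1} is its complement.
        return (1 - DM, DM)
    return None  # combination not covered by the description in the text
```

Returning `None` keeps the sketch from guessing at table entries that are visible only in Fig. 1.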

## III. PD CHARACTERISTIC COMPARISON

Fig. 2 compares the simulated PD characteristics of the proposed 2x half-baud-rate scheme against those of the Mueller-Müller scheme from [1]. Both schemes display similar PD characteristics over a 1 UI period when properly tuned and equalized (solid black curves). However, the MMPD degrades significantly as the equalization setting and/or the comparator reference level $V_{ref}$ diverge from their optimal values. The first row shows the effect of comparator offset on the PD characteristics of both schemes for a PRBS7 input pattern. As the offset increases, the MMPD begins to exhibit a dead zone, while the proposed PD (like the 2x oversampling BBPD) remains dead-zone-free. Similarly, the second row illustrates how the MMPD exhibits a dead zone if the channel is not properly equalized by the front-end equalizer, i.e., if the residual ISI is increased. Residual ISI here refers to ISI other than the first post-cursor ISI and is denoted by $\beta$ in the response $1 + \alpha Z^{-1} + \beta Z^{-2}$ given in the table. The proposed PD (like the 2x oversampling BBPD) is less sensitive to residual ISI than the MMPD.
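To make the role of $\beta$ concrete, the following Python sketch enumerates the baud-spaced sample levels produced by the response $1 + \alpha Z^{-1} + \beta Z^{-2}$. It assumes NRZ symbols of $\pm 1$ and a discrete-time model $y_n = d_n + \alpha d_{n-1} + \beta d_{n-2}$; the numerical values of $\alpha$ and $\beta$ are illustrative and not taken from the paper, and the function name `sampled_levels` is hypothetical.

```python
from itertools import product
from typing import List


def sampled_levels(alpha: float, beta: float) -> List[float]:
    """Distinct values of y = d0 + alpha*d1 + beta*d2 for d0, d1, d2 in {-1, +1}."""
    levels = {
        round(d0 + alpha * d1 + beta * d2, 9)
        for d0, d1, d2 in product((-1, 1), repeat=3)
    }
    return sorted(levels)


# Illustrative values only (not from the paper).
ideal = sampled_levels(alpha=0.5, beta=0.0)     # first post-cursor only
residual = sampled_levels(alpha=0.5, beta=0.1)  # with residual ISI
```

With $\beta = 0$ the samples fall on four levels set by the current and previous bits; a nonzero $\beta$ splits each of them in two, which is the extra spread that the text refers to as residual ISI.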



Fig. 3. Proposed quarter-rate implementation of 2x half-baud-rate CDR.

## IV. CIRCUIT IMPLEMENTATION

#### A. System Architecture

As previously stated, the proposed PD works well when the front-end equalizes long-tail ISI, leaving only one significant post-cursor, similar to the case of a 1-tap DFE. The proposed 2x half-baud-rate PD is prototyped on an inductor-less analog CDR with a CTLE that equalizes the long-tail ISI. A quarter-rate implementation is shown in Fig. 3. Unlike the prior work in [5], integration over multiple UIs is not required in the front-end, which reduces circuit complexity.

#### B. Building Blocks

The front-end of the proposed architecture is a CTLE with 2 boost stages. The CTLE has 4 bits of control (16 settings) for both the source degeneration capacitor and resistor. The output of the CTLE is sampled by a total of 8 double-tail latch comparators in a quarter-rate clocking scheme. The PD and the DD blocks, highlighted in red, are implemented for operation at 7.5Gb/s using custom high-speed digital logic gates according to the truth tables shown in Fig. 1. The outputs of these custom digital logic cells are flopped to avoid the glitches that are innate to combinational logic. The data path following the DD consists of a 4-to-32 deserializer that down-samples the data and feeds it to a digital BERT, which validates the recovered PRBS pattern by interleaving all 32 parallel paths. The digital BERT is the only synthesized digital block on this receiver. The clock recovery consists of a PD followed by a majority voter (MV), a charge pump (CP), a loop filter (LF), and a ring VCO. In a quarter-rate implementation, CK $0^{\circ}/180^{\circ}$ represents the edge phase, $\phi_e$, and CK $45^{\circ}/225^{\circ}$ represents the center phase, $\phi_c$. In other words, a conventional 4-stage ring VCO is sufficient to generate the required phases for the proposed 2x half-baud-rate scheme, since it naturally produces clocks at $0^{\circ}/45^{\circ}/90^{\circ}/135^{\circ}$ and their inverses $180^{\circ}/225^{\circ}/270^{\circ}/315^{\circ}$.

Fig. 4. Measurement setup with a Tyco 5" channel. PRBS input data is generated and probed to the input pad of the open-cavity QFN package.

Fig. 5. Measured S21 insertion loss of the Tyco 5" channel with 36" cables using a VNA.
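The quarter-rate phase assignment can be checked with a little arithmetic. The Python sketch below assumes that "quarter-rate" means one clock period spans four UIs, so that a phase in degrees maps linearly onto a position in UIs; the helper name `phase_to_ui` is hypothetical.

```python
DATA_RATE_GBPS = 30.0
RATE_DIVISION = 4  # quarter-rate clocking

clock_ghz = DATA_RATE_GBPS / RATE_DIVISION  # clock frequency in GHz
ui_per_period = RATE_DIVISION               # one clock period spans 4 UI


def phase_to_ui(phase_deg: float) -> float:
    """Convert a clock phase in degrees to a position in UI within one period."""
    return phase_deg / 360.0 * ui_per_period


edge_positions = [phase_to_ui(p) for p in (0, 180)]     # phi_e sampling instants
center_positions = [phase_to_ui(p) for p in (45, 225)]  # phi_c sampling instants

# Four comparators (DL, ED, DH, DM) per sampled UI, two sampled UIs per period.
comparators_total = 4 * len(edge_positions)
```

Under this assumption the edge samples land at 0 UI and 2 UI and the center samples at 0.5 UI and 2.5 UI, i.e., half a UI after each edge sample and only in every other UI, while the comparator count matches the 8 comparators mentioned in the text.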

## V. MEASUREMENT RESULTS

#### A. Test Setup

A test-chip of the proposed CDR was fabricated in the TSMC 28nm HPC process (0.9V supply) and packaged in a 5x5 open-cavity QFN. Fig. 4 illustrates the test setup used for measuring the 30Gb/s 2x half-baud-rate CDR. An SHF 12104A bit pattern generator was used to generate both PRBS7 and PRBS31. Input data is passed through the Tyco 5" channel using 36" SMA cables before being connected to the GSGSG probe head. Fig. 5 depicts the measured S21 insertion loss of the Tyco 5" channel, with 13.06dB loss at Nyquist.

Fig. 6. Measured clock spectrum and phase noise for locked CDR at 30 Gb/s for PRBS31 (left) and PRBS7 (right).

Fig. 7. Measured jitter tolerance with 30Gb/s PRBS31 and PRBS7 (BER < 1E-12).

#### B. Measured Results

Fig. 6 illustrates the measured spectrum of the divided recovered clock (CK/16) for PRBS31 and PRBS7 when the CDR is locked. The integrated jitter from the phase noise plot is $823.5\,\text{fs}_{\text{rms}}$ for PRBS31 and $731.8\,\text{fs}_{\text{rms}}$ for PRBS7. In addition, the capture range was measured to be -2300ppm to +66000ppm. The wider range in the positive direction is due to the asymmetric nature of the 2x half-baud-rate PD logic, where the data sample always follows the edge sample, not the other way around. This property makes frequency acquisition available for free in one direction without adding any additional feedback loop in the CDR. In other words, in the positive direction, where the incoming data is faster than the CDR's initial VCO frequency, the PD is able to pull up the VCO frequency by +66000ppm (equivalently about 2Gb/s) to a frequency lock, and to track the phase at the same time to achieve a phase lock.

Fig. 8. Power and area breakdown per block and the die photo.

|                            | ISSCC 10 [1]                      | ISSCC 15 [2]       | ISSCC 17 [3]                      | ISSCC 18 [4]                | CICC 18 [5]            | This Work               |
|----------------------------|-----------------------------------|--------------------|-----------------------------------|-----------------------------|------------------------|-------------------------|
| Technology                 | 32nm CMOS                         | 14nm CMOS          | 65nm CMOS                         | 16nm CMOS FinFET            | 65nm CMOS              | 28nm CMOS               |
| Supply Voltage             | 0.95V                             | 0.9V               | 1.0V, 1.2V                        | 0.85V, 0.9V, 1.2V           | 1.0V                   | 0.9V                    |
| Data-rate                  | 11.8 Gb/s                         | 10 Gb/s            | 60 Gb/s                           | 19-56 Gb/s                  | 15.2 Gb/s              | 30 Gb/s                 |
| Signalling                 | NRZ                               | NRZ                | NRZ                               | PAM4                        | NRZ                    | NRZ                     |
| Inductor                   | No                                | No                 | Yes                               | Yes                         | Yes                    | No                      |
| Sampling Rate              | Baud-rate                         | Baud-rate          | Baud-rate                         | Baud-rate                   | Half-baud-rate         | 2x half-baud-rate       |
| Channel Loss               | 25 dB                             | 24 dB              | 21 dB                             | 32 dB                       | 11 dB                  | 13.06 dB                |
| CDR Type                   | Digital                           | Digital            | Digital                           | Digital                     | Digital                | Analog                  |
| Freq Offset<br>Tracking BW | None Reported                     | None Reported      | None Reported                     | +/-200ppm                   | None Reported          | -2333 to +66666ppm      |
| Equalization               | 3-tap TX FFE<br>CTLE<br>4-tap DFE | CTLE<br>4-tap DFE  | CTLE<br>3-tap DFE<br>2-tap RX FFE | CTLE<br>ADC based DFE & FFE | CTLE                   | CTLE<br>Data Decoder    |
| Jitter (PRBS31)            | None Reported                     | None Reported      | None Reported                     | None Reported               | 1.12 ps<sub>rms</sub>  | 823.5 fs<sub>rms</sub>  |
| BER                        | <10<sup>-15</sup>                 | <10<sup>-12</sup>  | <10<sup>-12</sup>                 | <10<sup>-12</sup>           | <10<sup>-12</sup>      | <10<sup>-12</sup>       |
| Total RX Power             | 43 mW                             | 59 mW*             | 136 mW                            | 545 mW*                     | 29.3 mW                | 79.2 mW                 |
| FOM                        | 3.64 pJ/bit                       | 5.9 pJ/bit*        | 2.26 pJ/bit                       | 9.7 pJ/bit*                 | 1.9 pJ/bit             | 2.64 pJ/bit             |

\* Entire transceiver (TX + RX)

Fig. 9. Performance comparison with recently published CDRs.
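The equivalence between the capture range in ppm and a data-rate offset can be verified with a short calculation. The helper name `ppm_to_gbps` is hypothetical; the input values are the ones quoted in the text.

```python
def ppm_to_gbps(offset_ppm: float, data_rate_gbps: float) -> float:
    """Convert a frequency offset in ppm to an offset in Gb/s at a given data rate."""
    return offset_ppm * 1e-6 * data_rate_gbps


positive_limit = ppm_to_gbps(66000, 30.0)  # about 1.98 Gb/s, i.e. roughly 2 Gb/s
negative_limit = ppm_to_gbps(-2300, 30.0)  # about -0.069 Gb/s (69 Mb/s)
```

The positive limit is therefore roughly 29 times larger in magnitude than the negative one, which quantifies the asymmetry of the capture range.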

The measured jitter tolerance with sinusoidal jitter injected at the input is shown in Fig. 7. The jitter tolerance for both PRBS31 and PRBS7 passes the IEEE 802.3 masks. As expected, PRBS7 yields a higher jitter tolerance curve than PRBS31 due to lower pattern-dependent ISI. The lower-than-expected jitter tolerance in both cases is due to the VCO's higher-than-expected center frequency, which is caused by a process shift to the fast-fast (FF) corner. This higher clock frequency could not be remedied by reducing either the supply voltage or the bias current, due to the lower gain through the ring oscillator's delay stages in the FF corner.

## VI. CONCLUSION

Fig. 8 illustrates the die photo with its power breakdown per block. The total power consumption is 79.2mW with an FOM of 2.64pJ/bit at 30Gb/s. Aside from the VCO and the clock power, which were over-designed to reduce phase noise in a ring architecture, only 25mW is consumed by all other blocks. As a result, FOM could be much improved for the proposed 2x half-baud-rate scheme. The total die area is 1.232mm<sup>2</sup> and the area occupied by all the building blocks is only 0.135mm<sup>2</sup>.
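The figure of merit quoted above is simply power divided by data rate, since 1 mW per Gb/s equals 1 pJ/bit. A minimal Python check follows; the helper name `fom_pj_per_bit` is hypothetical, and the second calculation is only an illustration of the 25mW figure mentioned in the text, not a measured result.

```python
def fom_pj_per_bit(power_mw: float, data_rate_gbps: float) -> float:
    """Energy per bit in pJ/bit: mW divided by Gb/s."""
    return power_mw / data_rate_gbps


total_fom = fom_pj_per_bit(79.2, 30.0)           # full receiver
non_clock_portion = fom_pj_per_bit(25.0, 30.0)   # blocks other than VCO and clocking
```

The first value reproduces the reported 2.64 pJ/bit; the second, about 0.83 pJ/bit, shows how much of the energy per bit is attributable to the blocks other than the VCO and clock distribution.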

The table in Fig. 9 compares the performance of the proposed 2x half-baud-rate CDR to those of published CDRs. In conclusion, to the best of our knowledge, the proposed CDR is the first fully analog 2x half-baud-rate CDR reported that locks to the edge by performing 2x oversampling at half-baud-rate (every other UI). In doing so, the benefits of a traditional 2x oversampling BBPD in terms of robustness, and a baud-rate MMPD in terms of power-saving are combined.

## ACKNOWLEDGEMENTS

The authors would like to thank CMC Microsystems for providing CAD tools and measurement equipment, MOSIS for technology access, and Hossein Shakiba, Joshua Liang, and Behzad Dehlaghi for careful review and feedback.

## References

- [1] F. Spagna et al., "A 78mW 11.8Gb/s serial link transceiver with adaptive RX equalization and baud-rate CDR in 32nm CMOS," 2010 IEEE International Solid-State Circuits Conference - (ISSCC), San Francisco, CA, 2010, pp. 366-367.
- [2] R. Dokania et al., "10.5 A 5.9pJ/b 10Gb/s serial link with unequalized MM-CDR in 14nm tri-gate CMOS," 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers, San Francisco, CA, 2015, pp. 1-3.
- [3] J. Han, Y. Lu, N. Sutardja and E. Alon, "6.2 A 60Gb/s 288mW NRZ transceiver with adaptive equalization and baud-rate clock and data recovery in 65nm CMOS technology," 2017 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2017, pp. 112-113.
- [4] P. Upadhyaya et al., "A fully adaptive 19-to-56Gb/s PAM-4 wireline transceiver with a configurable ADC in 16nm FinFET," 2018 IEEE International Solid-State Circuits Conference - (ISSCC), San Francisco, CA, 2018, pp. 108-110.
- [5] D. Kim, W. Choi, A. Elkholy, J. Kenney and P. K. Hanumolu, "A 15Gb/s 1.9pJ/bit sub-baud-rate digital CDR," 2018 IEEE Custom Integrated Circuits Conference (CICC), San Diego, CA, 2018, pp. 1-4.