## 4×, 3-level, blind ADC-based receiver

N. Kovacevic, M. S. Jalali, J. Liang, C. Ting, A. Sheikholeslami, M. Kibune and H. Tamura

The design of a 4× blind analogue-to-digital converter (ADC)-based receiver implemented in 65 nm CMOS technology is presented. The ADC, which has three levels with two adjustable thresholds, effectively implements a speculative decision-feedback equaliser. By reducing the ADC resolution and by simplifying the digital clock and data recovery design, the power consumption is reduced by a factor of 2 compared with previous works. Measurement results confirm a bit error rate of  $<10^{-12}$  at 5 Gbit/s with a high-frequency jitter tolerance of 0.39 and 0.31 UI<sub>pp</sub> for a 9.3 and a 12.9 dB FR4 channel, respectively. The entire receiver consumes 63 and 86 mW for the respective channels.

*Introduction:* The main benefit of blind analogue-to-digital converter (ADC)-based clock and data recovery (CDR) circuits over phase-tracking ADC-based CDRs [1–3], both depicted in Fig. 1, is that they completely eliminate the analogue feedback required for clock recovery. This significantly simplifies the design, allowing the digital CDR to be designed independently of the ADC and eliminating the need for co-simulation of the ADC and the digital backend. In contrast with a phase-tracking CDR, which recovers a physical clock from the data, a blind CDR finds the phase of the recovered clock as a digital code and uses that code for data recovery. The challenge in blind ADC-based CDRs is to reduce their power consumption so that they can compete with their phase-tracking counterparts.



Fig. 1 Phase tracking receiver (Fig. 1a) against blind ADC-based receiver (Fig. 1b), and timing recovery for phase tracking against blind architectures (Fig. 1c)



Fig. 2 Complete 4×, 1.5b blind ADC-based CDR

Previous blind ADC-based CDRs [4–8] consumed a major portion of their power in their ADCs to sample the received data at high resolutions (5 bits in [4] and 3 bits in [8]). These high resolutions were needed for the digital blocks following the ADCs to accurately recover the value and the phase of the data. The ADC resolution also determines the amount of digital equalisation that can be achieved in the digital domain [9].

To reduce power consumption, the ADC resolution (n) must be reduced. The corresponding loss in voltage (amplitude) resolution can be partially compensated by increasing the time resolution by means of the oversampling ratio (OSR) [8]. Assuming a flash ADC architecture is used, the analogue power consumption of the ADC is proportional to the number of comparators used for each unit interval (UI): $(2^n - 1) \times \mathrm{OSR}$. In this Letter, we propose a 4×, 1.5 bit system (OSR = 4, n = 1.5), where the three-level ADC needs two comparators per sample. The number of comparator uses per UI is thereby reduced to 8, from 62 in [4] and 21 in [8], reducing the ADC power consumption accordingly.
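The comparator counts quoted above can be reproduced with a few lines of arithmetic. The following is a minimal sketch; the function name is hypothetical, and the OSR of 2 for [4] is an assumption inferred from the quoted count of 62 rather than a value stated in this Letter.

```python
def comparators_per_ui(levels: int, osr: int) -> int:
    """Comparator uses per unit interval for a flash ADC.

    A flash ADC with `levels` output levels needs (levels - 1) comparators,
    which equals 2**n - 1 for an integer resolution of n bits.
    """
    return (levels - 1) * osr


# (output levels, oversampling ratio)
designs = {
    "[4]": (2 ** 5, 2),   # 5 bit; OSR = 2 is inferred from the count of 62
    "[8]": (2 ** 3, 3),   # 3 bit, 3x oversampling
    "this work": (3, 4),  # three levels ("1.5 bit"), 4x oversampling
}

for name, (levels, osr) in designs.items():
    print(f"{name}: {comparators_per_ui(levels, osr)} comparator uses per UI")
```

Counting output levels rather than bits avoids the non-integer result that $2^{1.5} - 1$ would give for the nominal 1.5 bit resolution.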

The low ADC resolution, however, severely limits the amount of digital equalisation that can be achieved. To compensate for this, we move the comparator thresholds closer to zero, effectively implementing a speculative decision-feedback equaliser (DFE), to cancel the first post-cursor of intersymbol interference (ISI). In other words, we distribute the comparator thresholds non-uniformly in the voltage domain to compensate for the reduction in n. This allows the proposed architecture to halve the power consumption compared with previous work [8], without sacrificing jitter tolerance.

*Proposed architecture:* The full receiver architecture is shown in Fig. 2. The received 5 Gbit/s data are first equalised through a continuous-time linear equaliser (CTLE), implemented as a tunable source-degenerated differential pair, followed by a tunable variable gain amplifier. The equalised data are then sampled by eight pairs of time-interleaved comparators, each operating with one phase of an eight-phase 2.5 GHz clock. The clock phases are derived from a 10 GHz clock divider for an equivalent sampling rate of 20 Gsamples/s, which provides 4× sampling of the 5 Gbit/s data. Each pair of comparators forms a 1.5 bit (three-level) ADC, where the comparator offsets correspond to the first post-cursor ISI. For this reason, this ADC is equivalent to a one-tap speculative DFE in the analogue domain. This CDR architecture could also be considered a speculative DFE with blind oversampling. However, in contrast with the conventional loop-unrolled DFE, blind sampling means that no sample is guaranteed to be at the centre of the eye. Data decisions (DDs) must therefore be made using a combination of all the samples.
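The interleaving arithmetic in this paragraph can be checked directly. This is a small illustrative sketch with hypothetical function names; it only restates the figures given in the text.

```python
def equivalent_sample_rate(num_phases: int, phase_clock_hz: float) -> float:
    """Aggregate sampling rate of time-interleaved samplers."""
    return num_phases * phase_clock_hz


def oversampling_ratio(sample_rate_hz: float, data_rate_bps: float) -> float:
    """Number of samples taken per unit interval."""
    return sample_rate_hz / data_rate_bps


fs = equivalent_sample_rate(num_phases=8, phase_clock_hz=2.5e9)
osr = oversampling_ratio(fs, data_rate_bps=5e9)
print(f"{fs / 1e9:.0f} Gsamples/s, OSR = {osr:.0f}, "
      f"sample spacing = {1 / osr:.2f} UI")
```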

In Fig. 2, the comparator defined as POS has a  $+\alpha$  differential offset with respect to the data common mode, whereas the comparator defined as NEG has a  $-\alpha$  offset. As in conventional speculative DFEs, the POS comparators remove the ISI when the previous bit is a 1, whereas the NEG comparators produce the correct data when the previous bit is a 0. The comparators are implemented as slightly modified StrongARM latches that compare the input against the differential offset ( $\alpha$ ) [10]. Offset due to mismatch is calibrated via 3 bit binary-encoded current tails attached to both sides of the differential summing nodes. These current tails can be calibrated either manually or automatically via a small digital block located next to the ADC core.
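The POS/NEG speculation described above can be illustrated with a toy behavioural model. This sketch is not the circuit in this Letter: it assumes one sample per UI taken at the eye centre, ideal ±1 symbols, a single noiseless post-cursor of normalised weight `alpha`, and hypothetical function names throughout.

```python
import random


def comparator_pair(v: float, alpha: float) -> tuple:
    """Return the (POS, NEG) decisions for one sample voltage v."""
    pos = int(v > +alpha)  # correct when the previous bit was 1
    neg = int(v > -alpha)  # correct when the previous bit was 0
    return pos, neg


def resolve_speculation(pairs, prev_bit: int = 0) -> list:
    """Pick POS or NEG for each UI according to the previously decided bit."""
    bits = []
    for pos, neg in pairs:
        prev_bit = pos if prev_bit == 1 else neg
        bits.append(prev_bit)
    return bits


def toy_channel(bits, alpha: float, prev_bit: int = 0) -> list:
    """Ideal +/-1 symbols plus one post-cursor of weight alpha (no noise)."""
    out = []
    for b in bits:
        out.append((2 * b - 1) + alpha * (2 * prev_bit - 1))
        prev_bit = b
    return out


random.seed(0)
tx = [random.randint(0, 1) for _ in range(64)]
rx = toy_channel(tx, alpha=0.3)
decided = resolve_speculation([comparator_pair(v, 0.3) for v in rx])
print("bit errors:", sum(a != b for a, b in zip(tx, decided)))
```

In this model the two comparator outputs together encode the three ADC levels, for example `(0, 1)` for a sample lying between the two thresholds.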



**Fig. 3** Digital CDR architecture (Fig. 3a), phase detector (Fig. 3b), DFE selection block (Fig. 3c)

The digital CDR, shown in Fig. 3, consists of a phase detector (PD), a loop filter, a DD block and a cycle-slip detection block. The PD is made up of two slices (POS and NEG), which detect the data zero-crossing phases ($\Phi_X^P$, $\Phi_X^N$) corresponding to the two speculative levels. As shown in Fig. 3*b*, phase detection is done by XORing adjacent samples in groups of five (A, B, C, D, E) per UI window. The last sample in the group (E) is the first sample of the next UI. The final zero-crossing phase, $\Phi_X$, is estimated by the average of $\Phi_X^P$ and $\Phi_X^N$. A third-order low-pass filter then estimates the average zero-crossing phase, $\Phi_{\text{AVE}}$. The DD block selects the sample closest to the UI centre as the transmitted bit. This is unlike the DD block in [8], in which a second-order interpolation was needed to accurately estimate the UI centre for use in the digital DFE. The DD block selects one sample in each UI window that is closest to the picking phase, defined as $\Phi_{\text{PICK}} = \Phi_{\text{AVE}} + 0.5$ UI, for both the POS and NEG sets. The correct speculative bit per UI window is selected by a chain of MUXes controlled by bits from the previous UIs, as shown in Fig. 3*c*.
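The XOR-based phase detection and the sample-picking rule can be sketched in software as follows. This is a simplified illustration under several assumptions that are not taken from the Letter: a transition between two adjacent samples is placed midway between them, windows without exactly one transition are ignored, the loop filter is omitted, and UI-boundary bookkeeping (cycle slips) is not modelled. All names are hypothetical.

```python
OSR = 4  # samples per UI


def crossing_phase(window):
    """Estimate the zero-crossing phase (in UI) from samples (A, B, C, D, E).

    E is the first sample of the next UI. Returns None when the window
    does not contain exactly one transition.
    """
    edges = [window[i] ^ window[i + 1] for i in range(OSR)]
    if sum(edges) != 1:
        return None
    # Assumption: the crossing lies midway between the two differing samples.
    return (edges.index(1) + 0.5) / OSR


def combine(phi_p, phi_n):
    """Average the POS and NEG slice estimates; None if either is missing."""
    if phi_p is None or phi_n is None:
        return None
    return (phi_p + phi_n) / 2


def pick_index(phi_ave: float) -> int:
    """Index of the sample closest to PHI_PICK = PHI_AVE + 0.5 UI."""
    phi_pick = (phi_ave + 0.5) % 1.0

    def circular_distance(i: int) -> float:
        d = abs(i / OSR - phi_pick)
        return min(d, 1.0 - d)

    return min(range(OSR), key=circular_distance)


print(crossing_phase((0, 0, 1, 1, 1)))  # transition between B and C
print(pick_index(0.1))                  # picking phase of 0.6 UI
```

With only four sample phases per UI, the picked sample can be up to 0.125 UI away from the picking phase.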

*Experimental results:* The receiver was fabricated in Fujitsu's 65 nm CMOS technology. Fig. 4*a* illustrates the jitter tolerance when PRBS7 data are transmitted over a 32-inch channel, corresponding to a loss of 9.3 dB at Nyquist, for $\alpha$ being 0, 60 and 100 mV. The maximum jitter tolerance of 0.39 UI<sub>pp</sub> is obtained when $\alpha = 60$ mV. When the channel length is increased to 48 inches, corresponding to a loss of 12.9 dB at Nyquist, the CTLE must be activated to maintain the required bit error rate (BER). The jitter tolerance for this channel without the CTLE is zero, as shown in Fig. 4*b*. With the CTLE enabled, the receiver recovers error-free data (BER < 10<sup>-12</sup>) with a high-frequency jitter tolerance of 0.31 UI<sub>pp</sub>.



**Fig. 4** Measured jitter tolerance: 32-inch FR4 channel (9.3 dB loss at 2.5 GHz) with CTLE disabled (Fig. 4a); 48-inch FR4 channel (12.9 dB loss at 2.5 GHz) with and without CTLE enabled (Fig. 4b)

Fig. 5*a* shows a die photograph of the implemented chip, annotated with the area of each block. The total area is ~50% smaller than the total area of the 3× design in [8]. The total measured power of the chip is 63.5 mW (12.7 mW/Gbit/s) without the CTLE (i.e. for the 32-inch channel) and 86.3 mW (17.3 mW/Gbit/s) with the CTLE (i.e. for the 48-inch channel), 15 mW of which is consumed by the clock divider. Compared with [8], the design in this Letter consumes similar clocking power while consuming half the digital power. Owing to the reduced number of comparator uses per UI, the ADC power is also expected to drop by about 50%, but the measured ADC power shows a drop of only 20%. On further investigation, we found that an error in the design integration phase caused a floating node in the automatic calibration block. This resulted in extra static power consumption in the offset calibration blocks, leading to increased ADC power consumption. After correcting the design error, the total power of the chip is expected to be 53.5 mW (10.7 mW/Gbit/s) without the CTLE and 76.3 mW (15.3 mW/Gbit/s) with the CTLE. The ADC power consumption would then also drop to approximately half of that in [8].
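The power figures above are mutually consistent, which a short calculation makes explicit. The sketch below only rearranges numbers reported in this Letter; the 22.8 mW difference between the two operating modes is derived here by subtraction and is not a separately reported measurement.

```python
DATA_RATE_GBPS = 5.0


def efficiency(power_mw: float) -> float:
    """Power efficiency in mW/Gbit/s, rounded to one decimal place."""
    return round(power_mw / DATA_RATE_GBPS, 1)


adc_mw, digital_mw, clock_mw = 30.5, 18.0, 15.0   # measured, CTLE disabled
total_no_ctle = adc_mw + digital_mw + clock_mw    # 63.5 mW
total_ctle = 86.3                                  # measured, CTLE enabled
ctle_overhead = total_ctle - total_no_ctle         # about 22.8 mW (derived)

adc_fixed_mw = 20.5                                # expected after the fix
saving = adc_mw - adc_fixed_mw                     # 10 mW

for label, p in [("w/o CTLE", total_no_ctle),
                 ("w/ CTLE", total_ctle),
                 ("w/o CTLE, fixed", total_no_ctle - saving),
                 ("w/ CTLE, fixed", total_ctle - saving)]:
    print(f"{label}: {p:.1f} mW, {efficiency(p)} mW/Gbit/s")
```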



Fig. 5 Die micrograph (Fig. 5a); power efficiency against channel loss (Fig. 5b)

Table 1 summarises this work and compares it against ADC-based receivers in terms of their power consumption and channel attenuation. Fig. 5*b* shows the power efficiency against channel loss. Compared with other blind ADC-based CDRs handling a comparable channel loss [5, 6], this work achieves more than 2× better power efficiency, and achieves almost 2× better power efficiency than [8] while tolerating more loss.

Table 1: Comparison with previous works

| CDR | Data rate (Gbit/s) | Tech. (nm) | Ch. attn. (dB) | ADC res. (bits) | ADC power (mW) | Digital power (mW) | Total power (mW/Gbit/s) |
|-----|--------------------|------------|----------------|-----------------|----------------|--------------------|-------------------------|
| [1] | 7.5 | 65 | 24 | 4.5 | NA | NA | 26.4 |
| [2] | 10 | 65 | 26 | 6 | NA | NA | 50 |
| [3] | 10.3 | 40 | 34 | 6 | 195 | NA | NA |
| [4] | 5 | 65 | 10 | 5 | 110 | 68.4 | 35.7 |
| [5] | 5 | 65 | 15 | 5 | NA | NA | 56 |
| [6] | 5 | 65 | 13.3 | 5 | NA | 57.6 | 42.2 |
| [7] | 10 | 65 | NA | 5 | 109 | 111.6 | 30.6 |
| [8] | 5 | 65 | 6 | 3 | 38.4 | 42 | 19 |
| This work w/o CTLE | 5 | 65 | 9.3 | 1.5 | 30.5<sup>a</sup> | 18 | 12.7<sup>b</sup> |
| This work w/ CTLE | 5 | 65 | 12.9 | 1.5 | 30.5<sup>a</sup> | 18 | 17.3<sup>b</sup> |

<sup>a</sup>Expected to be 20.5 mW after fixing the error in the calibration block, reducing the total power to 10.7 and 15.3 mW/Gbit/s. <sup>b</sup>Includes 15 mW of clocking power.

*Conclusion:* By merging the speculative DFE into the ADC, the number of comparators, and therefore the analogue power consumption of a blind ADC-based CDR, is reduced by almost a factor of 2. Moreover, by oversampling the data by a factor of 4, a simplified DD scheme is used to lower the digital power consumption. Overall, this work reduces the power consumption by a factor of 2 compared with previous works.

*Acknowledgments:* The authors acknowledge the CMC for providing test equipment and the NSERC for partial funding.

© The Institution of Engineering and Technology 2015

22 December 2014

doi: 10.1049/el.2014.4441

One or more of the Figures in this Letter are available in colour online.

N. Kovacevic, M. S. Jalali, J. Liang, C. Ting and A. Sheikholeslami (Department of Electrical and Computer Engineering, University of Toronto, 10 King's College Road, Toronto, Ontario, Canada M5S 3G4)

E-mail: liangj@ece.utoronto.ca

M. Kibune and H. Tamura (Fujitsu Laboratories Limited, 4-1-1 Kamikodanaka Nakahara-ku, Kawasaki, Kanagawa 211-8588, Japan)

## References

- 1 Harwood, M., et al.: 'A 12.5 Gb/s SerDes in 65 nm CMOS using a baud-rate ADC with digital receiver equalization and clock recovery'. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Pap., San Francisco, CA, USA, February 2007, pp. 436–437
- 2 Cao, J., et al.: 'A 500 mW digitally calibrated AFE in 65 nm CMOS for 10 Gb/s serial links over backplane and multimode fiber'. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Pap., Grenoble, France, February 2009, pp. 370–371
- 3 Zhang, B., et al.: 'A 195 mW/55 mW dual-path receiver AFE for multistandard 8.5-to-11.5 Gb/s serial links in 40 nm CMOS'. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Pap., San Francisco, CA, USA, February 2013, pp. 34–35
- 4 Tyschenko, O., Sheikholeslami, A., Tamura, H., Kibune, M., Yamaguchi, H., and Ogawa, J.: 'A 5 Gb/s ADC-based feed-forward CDR in 65 nm CMOS', *IEEE J. Solid-State Circuits*, 2010, 45, (6), pp. 1091–1098
- 5 Yamaguchi, H., et al.: 'A 5 Gb/s transceiver in 65 nm CMOS'. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Pap., San Francisco, CA, USA, February 2010, pp. 168–169
- 6 Sarvari, S., Tahmoureszadeh, T., Sheikholeslami, A., Tamura, H., and Kibune, M.: 'A 5 Gb/s speculative DFE for 2x blind ADC-based receivers in 65 nm CMOS'. IEEE Symp. VLSI Circuits Dig. Tech. Pap., Honolulu, HI, USA, June 2010, pp. 69–70
- 7 Ting, C., Liang, J., Sheikholeslami, A., Kibune, M., and Tamura, H.: 'A blind baud-rate ADC-based CDR'. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Pap., San Francisco, CA, USA, February 2013, pp. 122–123
- 8 Jalali, M.S., Ting, C., Abiri, B., Sheikholeslami, A., Kibune, M., and Tamura, H.: 'A 3x blind ADC-based CDR'. IEEE Asian Solid-State Circuits Conf. (ASSCC) Dig. Tech. Pap., Singapore, November 2013, pp. 349–352
- 9 Sheikholeslami, A., and Tamura, H.: 'Design metrics for blind ADC-based wireline receivers (invited paper)'. Proc. IEEE Custom Integrated Circuit Conf. (CICC), San Jose, CA, USA, September 2013, pp. 1–4
- 10 Chammas, M., and Murmann, B.: 'A 12 GS/s 81 mW 5 bit time-interleaved flash ADC with background timing skew calibration', *IEEE J. Solid-State Circuits*, 2011, 46, (4), pp. 838–847