# A 4-GS/s Single Channel Reconfigurable Folding Flash ADC for Wireline Applications in 16-nm FinFET

Luke Wang, Marc-Andre LaCroix, and Anthony Chan Carusone

Abstract—This brief presents a 4-GS/s single channel folding flash analog-to-digital converter (ADC) designed to be time-interleaved for wireline receivers in 16-nm FinFET CMOS. The resolution of the ADC is scalable to enable power savings depending on link modulation format (2 PAM/4 PAM) and link loss. A 1-bit folding stage determines the MSB, while the LSBs are determined by a 5-bit full flash where each comparator can be individually enabled/disabled. At 6-bit resolution, the ADC including a variable gain amplifier achieves an SNDR of 30.7 dB and an SFDR of 40.6 dB at Nyquist frequency while consuming 34.4 mW from a 0.9-V supply, yielding an FOM of 303 fJ/conv-step. At lower resolutions of 5, 4, and 3 bits, the FOM remains low at 295, 320, and 399 fJ/conv-step, respectively, at Nyquist frequency.

Index Terms—Analog-to-digital converter (ADC), high speed, flash, folding, FinFET, reconfigurable, wireline.

## I. INTRODUCTION

ANALOG-to-digital converter (ADC) based receivers are used in wireline links for their ability to compensate for high-loss (>20 dB at Nyquist) and crosstalk channels. The front-end ADC is normally followed by a synthesized digital signal processing (DSP) engine which implements equalization and/or forward error correction (FEC). The synthesized DSP is easily adapted to cover link conditions across multiple standards. In addition, the DSP allows for better resilience against process variations as the design moves to smaller CMOS technology nodes. Traditional mixed-signal receivers do not offer as much portability and reconfigurability as an ADC-based design, but may offer lower power consumption. With the advent of higher order modulation formats such as 4PAM, the architecture of traditional mixed-signal receivers began to resemble that of a flash ADC [1], where multiple comparators are used in parallel to quantize the signal. This in turn makes the power advantage of mixed-signal receivers less obvious. In addition, several ideas such as embedding equalization in the ADC itself [2]-[4] and utilizing non-uniform quantization [5] to minimize the bit-error-rate (BER) further

Manuscript received May 6, 2017; accepted July 7, 2017. Date of publication July 12, 2017; date of current version November 22, 2017. This work was supported in part by the Huawei Canada Research Centre and in part by NSERC. This brief was recommended by Associate Editor C. W. Wu. (*Corresponding author: Luke Wang.*)

L. Wang and A. C. Carusone are with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S3G4, Canada (e-mail: luke.wang@isl.utoronto.ca).

M.-A. LaCroix is with Huawei Canada, Ottawa, ON K2K 3J1, Canada.



Fig. 1. Conceptual 8-way time-interleaved ADC.

blurs the distinction between ADC-based receivers and mixed-signal receivers. In fact, [3]–[5] indicate that the true resolution needed for a 2PAM link is only 3-4 bits. This is in stark contrast to the common use of 6-7 bit ADCs in receivers [6]–[8], which are significantly over-designed.

For an ADC-based receiver, the front-end ADC must be time-interleaved to achieve good power efficiency at speeds of 10GS/s and above. Fig. 1 shows a conceptual 8-way time-interleaved ADC operating at a sampling frequency of fs, where each channel operates at fs/8, sampling the input sequentially in time. All 8 channels are driven by a shared input buffer. Each individual channel is usually a successive approximation register (SAR) [2] or flash ADC [7], [8], as these architectures provide the best power efficiency at the resolutions needed for wireline links. In this brief, a folding flash ADC is designed as part of a time-interleaved ADC in 16nm FinFET CMOS. The flash ADC has lower latency than its SAR counterpart, allowing more latency to be budgeted for more aggressive error correction coding and a higher BW CDR for better jitter tracking. In addition, power scalability is much more prominent in a flash architecture than in a SAR architecture. By individually controlling the enable of each comparator in the array, the flash ADC in this brief is scalable in resolution and can also be used as a non-uniform quantizer.

## **II. ADC ARCHITECTURE**

## A. System Architecture

The ADC architecture is shown in Fig. 2. It consists of a track and hold (T&H) using an NMOS switch, a variable gain



Fig. 2. Folding flash ADC architecture.



Fig. 3. (a) Clock generation circuitry and (b) timing diagram.

amplifier (VGA), a 1-bit folding stage, and a 5-bit full flash. The sampling capacitor  $C_s$  of ~15fF consists of metallization and the input capacitance of the VGA. The 1-bit folding stage [9] is activated based on the MSB comparator decision, and simply exchanges the differential inputs if the sign is negative. This simple folding operation allows the number of comparators to be reduced by almost half. The VGA allows for input range adjustment depending on transmitter swing and channel attenuation, essentially extending the dynamic range of the ADC. It also allows for calibration of gain mismatch when the channel is integrated in a time-interleaved ADC. A PMOS source follower (SF) is used to buffer the large input capacitance of the 5-bit full flash. As part of the design, the current of the PMOS buffer can be increased 2x depending on the resolution of the flash.
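The folding operation described above can be sketched as a short behavioural model. This is only an illustration under stated assumptions: the uniform threshold placement at multiples of one LSB, the mid-scale output mapping, and the function names `fold`, `flash5`, and `convert` are hypothetical and are not taken from the design itself. The 500 mV differential full-scale range is the nominal value quoted later in the brief.

```python
# Behavioural sketch of a 1-bit folding stage followed by a 5-bit flash.
# Threshold placement and output coding are assumptions for illustration.

FSR = 0.5                 # differential full-scale range in volts (500 mVppd)
N_BITS = 6                # overall resolution
LSB = FSR / 2**N_BITS     # 7.8125 mV
N_FLASH = 31              # comparators in the 5-bit flash


def fold(v_diff):
    """MSB decision plus folding: a negative input has its differential
    lines exchanged, so the flash only ever sees a non-negative voltage."""
    msb = 1 if v_diff >= 0 else 0
    return msb, abs(v_diff)


def flash5(v_folded):
    """Sum the decisions of 31 comparators (assumed thresholds at k * LSB)."""
    return sum(1 for k in range(1, N_FLASH + 1) if v_folded >= k * LSB)


def convert(v_diff):
    """Return a 6-bit code (0..63), unfolding the flash count around mid-scale."""
    msb, v_folded = fold(v_diff)
    count = flash5(v_folded)
    return 32 + count if msb else 31 - count


if __name__ == "__main__":
    for v in (-0.2499, -1.5 * LSB, 0.0, 1.5 * LSB, 0.2499):
        print(f"{v * 1e3:8.2f} mV -> code {convert(v)}")
```

With folding, 1 MSB comparator plus 31 flash comparators cover the same 64 levels that would otherwise need 63 comparators in a plain flash, which is the near-halving mentioned in the text.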

## B. Clock Generation and Timing

The clock generation and timing diagram are shown in Fig. 3a and Fig. 3b respectively. An on-chip PLL generates a differential clock at 16.25GHz with <200fs rms jitter, which is used by a divide-by-4 to generate 8 phases,  $\phi_0 \dots \phi_7$ , of an  $f_{clk} = 4.0625$ GHz clock spaced at 1/(8 $f_{clk}$ ). Two adjacent phases of this clock are then gated to generate a sampling clock CLK<sub>S</sub> with 1/8 duty cycle. In the 8-way time-interleaved configuration in Fig. 1, this allows only 1 T&H/channel to load the input at a time, therefore improving the bandwidth of the input buffer. In terms of potential input buffer design, in this technology, a unity gain buffer with an output impedance of 50 $\Omega$ , driving up to 400 $\mu$ m of total interconnect (up to



Fig. 4. (a) Source-degenerated VGA for gain control, (b) MSB comparator with offset cancellation.

8 subADCs) and one T&H at a time can provide 20GHz BW while consuming 10mW. After the input is sampled, ~61.5ps is allocated for the output of the VGA to settle before CLK<sub>MSB</sub> triggers the MSB comparator. A simple delay line creates CLK<sub>MSB</sub> from a delayed version of  $\phi_0$ . The decision of the comparator activates the folding switches in the appropriate direction immediately after it is ready. The switches are then opened slightly before CLK<sub>MSB</sub> falls low. This allows the voltage to remain unaffected by the kickback caused by the reset of the MSB comparator. The full 5-bit flash is then activated by CLK<sub>LSB</sub> which is simply a buffered version of  $\phi_0$ . Note that the folder activation time depends on the magnitude of the input to the MSB comparator, but is guaranteed by design (considering the best case MSB comparator decision time and routing delay) to be after the previous LSB decision is finished. In this way, the folding switches are essentially used to pipeline the conversion.
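The timing numbers in this subsection follow from simple arithmetic on the PLL frequency, reproduced in the sketch below. The variable names are illustrative, and the reading of the ~61.5ps settling window as two phase spacings is an inference from the numbers, not something stated in the text.

```python
# Timing arithmetic for the clock generation described above.
# Names are illustrative; t_settle as "two phase steps" is an assumption.

F_PLL = 16.25e9        # PLL output frequency in Hz (from the text)
DIVIDE_RATIO = 4       # divide-by-4 generating the 8 phases
N_PHASES = 8

f_clk = F_PLL / DIVIDE_RATIO          # per-channel sampling clock, Hz
t_clk = 1.0 / f_clk                   # clock period, s
phase_spacing = t_clk / N_PHASES      # 1 / (8 * f_clk), s
t_track = phase_spacing               # CLK_S high time at 1/8 duty cycle
t_settle = 2 * phase_spacing          # assumed origin of the ~61.5 ps window

if __name__ == "__main__":
    print(f"f_clk         = {f_clk / 1e9:.4f} GHz")
    print(f"clock period  = {t_clk * 1e12:.2f} ps")
    print(f"phase spacing = {phase_spacing * 1e12:.2f} ps")
    print(f"track window  = {t_track * 1e12:.2f} ps")
    print(f"VGA settling  = {t_settle * 1e12:.2f} ps")
```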

## III. CIRCUIT DESIGN CONSIDERATIONS

## A. System Level Considerations

The ADC was designed to operate at 0.9V supply with a nominal full-scale range of 500mV<sub>ppd</sub> & an input common mode of 400mV. The VGA, shown in Fig. 4a, is a simple differential pair with PMOS degeneration. The control V<sub>TUNE</sub> is generated by a 5-bit R-DAC. The MSB comparator, shown in Fig. 4b, is a double-tail latch [10] with offset cancellation implemented using a signed 5-bit C-DAC (2 bit thermometer, 3 bit binary) at D<sub>IN</sub> & D<sub>IP</sub>. The C-DAC covers >  $3\sigma$ ( $\sigma = 1.5$ LSB) of comparator offset with a nominal step size of approximately 1/4 LSB. The C-DAC provides lower thermal noise than adding an additional input pair or current DAC for calibration. The MSB comparator is sized  $\sqrt{2} \times$  larger than the LSB comparators for lower noise & offset since its decision is more crucial.
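As a quick check of the offset-calibration range quoted above, the sketch below compares the C-DAC span with the $3\sigma$ offset requirement. It assumes the signed 5-bit code runs from -31 to +31 (the range given in the caption of Fig. 10) and uses the nominal 1/4 LSB step; the variable names are illustrative.

```python
# C-DAC coverage check, all quantities in LSB.
# Assumes a sign bit plus 5-bit magnitude, i.e. codes -31..+31.

DAC_BITS = 5
STEP_LSB = 0.25       # nominal C-DAC step
SIGMA_LSB = 1.5       # comparator offset standard deviation

max_code = 2**DAC_BITS - 1                  # largest code magnitude
dac_range_lsb = max_code * STEP_LSB         # correction range per polarity
three_sigma_lsb = 3 * SIGMA_LSB             # required coverage
margin_lsb = dac_range_lsb - three_sigma_lsb
sigmas_covered = dac_range_lsb / SIGMA_LSB

if __name__ == "__main__":
    print(f"C-DAC range   : +/-{dac_range_lsb} LSB")
    print(f"3-sigma offset: +/-{three_sigma_lsb} LSB")
    print(f"margin        : {margin_lsb} LSB ({sigmas_covered:.2f} sigma covered)")
```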

## B. Design of 5-Bit Flash ADC

The 5-bit flash ADC, buffered by the PMOS SF, consists of 31 comparators, a resistor ladder, and a Wallace tree adder as an encoder. The LSB comparator, shown in Fig. 5a, is also a double-tail latch with offset calibration using a signed 5-bit C-DAC with a nominal step of 1/4 LSB. An additional input pair



Fig. 5. (a) LSB comparator and (b) input stage shown with added kickback mitigation.



Fig. 6. (a) Kickback reduction by staggering comparator activation and (b) by cancelling systematic average kickback through shifting of references  $V_{REF}$  by 2 LSBs.

is added to connect to the references  $V_{REF}$  from the resistor ladder. Each dynamic comparator is clock-gated with an enable signal, thereby allowing the ADC quantization levels and resolution to be flexible. The comparators are sized for a simulated thermal noise of  $1.2 \text{mV}_{rms}$  compared to a quantization noise level of  $2.255 \text{mV}_{rms}$ . In the 16nm FinFET process, the thermal noise may be  $2 \times$  larger than a planar process, therefore the comparator size is large enough to cause significant kickback at the input to the flash. To alleviate this issue, three techniques were used.
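The quantization noise figure above follows from the ideal uniform-quantizer expression LSB/$\sqrt{12}$ with a 500 mV full-scale range and 6 bits. The sketch below reproduces it and combines it with the simulated comparator thermal noise as an uncorrelated root-sum-square. The full-scale sine-wave assumption for the signal power and the variable names are illustrative, and other noise sources (VGA, sampling, jitter) are ignored.

```python
# Noise budget sketch: quantization noise vs. simulated comparator noise.
# Assumes an ideal uniform quantizer and a full-scale sine input.
import math

FSR = 0.5                 # differential full-scale range, V
N_BITS = 6
THERMAL_RMS = 1.2e-3      # simulated comparator thermal noise, V rms

lsb = FSR / 2**N_BITS
q_noise_rms = lsb / math.sqrt(12)                   # ideal quantization noise
total_rms = math.hypot(q_noise_rms, THERMAL_RMS)    # uncorrelated sum

signal_rms = (FSR / 2) / math.sqrt(2)               # full-scale sine, V rms
sqnr_db = 20 * math.log10(signal_rms / q_noise_rms)
snr_db = 20 * math.log10(signal_rms / total_rms)

if __name__ == "__main__":
    print(f"LSB                 = {lsb * 1e3:.4f} mV")
    print(f"quantization noise  = {q_noise_rms * 1e3:.3f} mV rms")
    print(f"quantization-only SNR = {sqnr_db:.2f} dB")
    print(f"with comparator noise = {snr_db:.2f} dB")
```

Sizing the comparators so that thermal noise sits well below the quantization noise keeps the combined noise close to the quantization limit, at the cost of larger devices and therefore more kickback.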

## C. Kickback Mitigation

The first technique is to add kickback cancellation [11] to the comparator as shown in Fig. 5b. First, the source of the input transistor is reset through M1 instead of left floating, thereby minimizing the drift on the sampled voltage. Second, cross-coupling transistors M2 & M3 are added to convert any differential kickback to common mode. Third, transistors M4 & M5 are added at the gate of the input devices to supply charge.

The second technique, shown in Fig. 6a, is to stagger the comparators in 3 delayed groups such that the kickback is reduced. The disadvantage is that the LSB comparator decision time is reduced, but in this technology the comparator time constant is small enough to guarantee a meta-stability rate of  $10^{-12}$ . Finally, since the folding ideally ensures only positive differential voltages appear at the input of the flash,



Fig. 7. Simulation result of offset calibration for RCC extracted ADC before and after reference shift of 2 LSBs without random mismatch. Converged DAC code for each LSB comparator is shown.



Fig. 8. Resistor ladder and control of common mode and LSB size through current control.

the kickback is always in the same direction. Therefore, the resistor ladder taps are shifted down by 2 LSBs to account for this systematic offset as shown in Fig. 6b. Fig. 7 shows simulation results where the RCC extracted ADC was calibrated for offset before and after the shift. This simulation did not include random mismatch, therefore the calibrated offset results primarily from kickback and minor systematic offset after layout. The DAC codes shown suggest a kickback that introduces a systematic offset with a range of 16 DAC codes or  $\sim$ 9mV. After the shift, additional margin is available in the negative range of the DAC.

## D. Reference Generation

For reference generation, the R-ladder shown in Fig. 8 draws only  $50\mu$ A, with a 1.2pF decoupling capacitor at each node for a maximum single-ended droop of 1/2 LSB. A single-stage op-amp is used in feedback with a loop gain of 40dB to set the common mode of the ladder. This common mode is adjustable through an R-DAC, and the bias current of the ladder is also adjustable. This enables LSB size and therefore input full scale range adjustment, or additional calibration range for PVT variations.

## E. Folding Considerations

It should be noted that the 1 bit folding operation essentially doubles the highest frequency content of the waveform. This can be easily seen by considering a triangular wave input: the 1 bit folded waveform is simply a triangular wave at double the frequency with a DC offset. Similarly, a Nyquist signal at  $f_{in}$  near  $f_s/2$  now appears near  $f_s$ , which when aliased becomes a very "slow" signal near DC, while a signal at  $f_{in}$ 

Fig. 9. Die photo & layout floorplan with ADC components labelled. Total active area is  $60\mu$m × $290\mu$m.



Fig. 10. Measured offset DAC calibration code histogram, total DAC range is +/-31.

near  $f_s/4$  appears near  $f_s/2$ . Therefore, the settling requirement after folding is hardest for a signal near  $f_s/4$ . As a result, in this design the SFDR and therefore SNDR degrades near this frequency instead of  $f_s/2$ . This can be alleviated by resetting the folding node after each conversion at the cost of a further reduction in the comparators' decision time or additional pipelining which adds latency.
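The frequency argument above can be illustrated with a small helper that doubles the input frequency (the effect of 1-bit folding on the fundamental) and then aliases the result into the first Nyquist zone. This is a first-order sketch only: it tracks the dominant component after folding and ignores the DC term and higher harmonics, and the function names are hypothetical.

```python
# Apparent frequency at the folding node after sampling, first-order view.

FS = 4.0625e9   # sampling rate, Hz


def alias(f, fs=FS):
    """Map a frequency into the first Nyquist zone [0, fs/2]."""
    f = f % fs
    return fs - f if f > fs / 2 else f


def folded_frequency(f_in, fs=FS):
    """Folding doubles the fundamental; sampling then aliases it."""
    return alias(2 * f_in, fs)


if __name__ == "__main__":
    for ratio in (0.05, 0.125, 0.25, 0.375, 0.49, 0.5):
        f_out = folded_frequency(ratio * FS)
        print(f"f_in = {ratio:5.3f} fs -> folded node sees {f_out / FS:5.3f} fs")
```

The sweep shows the apparent frequency peaking at $f_s/2$ for an input at $f_s/4$ and falling back toward DC as the input approaches Nyquist, consistent with the settling argument in the text.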

## F. Process Considerations

The nominal supply in this process is 0.8V; however, in order to achieve sufficient linearity in this design, an analog supply of 0.9V was used. Layout rules are also more restrictive, requiring uniformity and additional dummification. More importantly, layouts must meet EM requirements, further restricting design choices. In particular, FinFETs suffer from the self-heating effect (SHE), which can cause up to a 20°C increase in temperature, proportional to the number of fins. For minimum length devices, this prototype restricts the number of fins to be  $\leq 8$, targeting a maximum temperature of 80°C without SHE.

## **IV. MEASUREMENTS**

The ADC was fabricated in the TSMC 16nm FinFET CMOS process. The die photo & layout floorplan are shown in Fig. 9 with a total active area of approximately  $60\mu$m × $290\mu$m. To calibrate the comparator offsets, an off-chip DAC with an accuracy of +/-2mV sweeps the input and, through a binary search using 1000 averaged decisions per iteration, the C-DAC code of each comparator is changed until its offset is minimized. The comparator offset standard deviation measured in this way is 1.56 LSB, with the final C-DAC settings between -14 & 21 as shown in Fig. 10, thereby validating that the C-DAC range of +/-31 is more than sufficient. The input-referred thermal noise of the ADC was also extracted to be 3.61mV<sub>rms</sub> by Gaussian fitting an output histogram with zero input.
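The calibration loop described above can be sketched as a binary search over the C-DAC code with majority voting on averaged decisions. This is a behavioural illustration, not the measurement procedure itself: the comparator model, the 0.3 LSB noise level, the choice to hold the input at the nominal threshold instead of sweeping it, and the names `make_comparator` and `calibrate` are all assumptions.

```python
# Binary-search offset calibration sketch, all voltages in LSB.
# The comparator model and its noise level are assumed for illustration.
import random

STEP_LSB = 0.25                 # nominal C-DAC step
CODE_MIN, CODE_MAX = -31, 31    # signed C-DAC code range
N_AVG = 1000                    # decisions averaged per iteration


def make_comparator(offset_lsb, noise_lsb, rng):
    """Return a noisy comparator whose offset is reduced by the C-DAC code."""
    def decide(v_in_lsb, code):
        residual = offset_lsb - code * STEP_LSB
        return 1 if v_in_lsb + residual + rng.gauss(0.0, noise_lsb) > 0 else 0
    return decide


def calibrate(decide):
    """Find the smallest code at which the comparator no longer trips high
    on the majority of decisions with the input held at its threshold."""
    lo, hi = CODE_MIN, CODE_MAX
    while lo < hi:
        mid = (lo + hi) // 2
        ones = sum(decide(0.0, mid) for _ in range(N_AVG))
        if ones > N_AVG // 2:
            lo = mid + 1        # residual offset still positive
        else:
            hi = mid            # residual offset zero or negative
    return lo


OFFSET_LSB = 2.1                # hypothetical comparator offset
code = calibrate(make_comparator(OFFSET_LSB, 0.3, random.Random(0)))
residual_lsb = OFFSET_LSB - code * STEP_LSB

if __name__ == "__main__":
    print(f"converged code = {code}, residual offset = {residual_lsb:+.2f} LSB")
```

Averaging many decisions per step matters because the comparator noise is comparable to the C-DAC step; a single decision near the trip point would be close to a coin flip.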

Fig. 11 shows the DNL/INL before and after calibration: DNL improves from -0.907/1.444 to -0.757/0.542, and the INL improves from -2.364/2.047 to -1.035/0.195.



Fig. 11. DNL/INL before (grey dashed) and after (black solid) calibration.



Fig. 12. SNDR/SFDR performance up to Nyquist ( $F_s = 4.0625$ GS/s).

Fig. 12 shows the SNDR/SFDR before and after calibration. Note that the SFDR is lowest for input frequencies around  $f_S/4$  as this results in the highest frequency signal at the output of the folding stage as discussed previously. Fig. 13 shows the spectrum of the ADC at a near Nyquist input of 2.0243GHz. An SNDR of 30.68dB and an SFDR of 40.6dB are achieved at Nyquist with a total power consumption of 34.44mW from a 0.9V supply, yielding an FOM of 303fJ/conv-step including the VGA and all clock distribution & generation (including 2 stages of clock buffers operating at 16GHz but excluding the PLL). The total power consists of a static power of 5.72mW (16.6%) and a dynamic power of 28.72mW (83.4%). The PMOS SF that buffers the 5-bit flash consumes 70.8% of the total static power. The comparators in total consume 87.2% of the total dynamic power, with each LSB comparator consuming 772.5 $\mu$ W, leading to a significant power saving if some of them can be disabled depending on the target resolution for the link. Fig. 14 shows ADC SNDR vs. resolution as comparators are disabled. For resolutions < 4, the PMOS SF current is halved. FOMs of 295fJ/conv-step, 320fJ/conv-step, 399fJ/conv-step, and 814fJ/conv-step are achieved for resolutions of 5, 4, 3, & 2 bits respectively.
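The reported figure of merit can be reproduced from the measured numbers, assuming the conventional Walden definition $P / (2^{\mathrm{ENOB}} f_s)$ with $\mathrm{ENOB} = (\mathrm{SNDR} - 1.76)/6.02$. The brief does not spell out the formula, so treating it as the Walden FOM is an assumption, and the function names below are illustrative.

```python
# Walden figure-of-merit sketch using the measured 6-bit numbers.
# The FOM definition is assumed; the text only quotes the result.

FS = 4.0625e9   # sampling rate, Hz


def enob(sndr_db):
    """Effective number of bits from SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w, enob_bits, fs_hz=FS):
    """Energy per conversion step in joules."""
    return power_w / (2**enob_bits * fs_hz)


enob_6b = enob(30.68)
fom_6b = walden_fom(34.44e-3, enob_6b)

if __name__ == "__main__":
    print(f"ENOB = {enob_6b:.2f} bits")
    print(f"FOM  = {fom_6b * 1e15:.1f} fJ/conv-step")
```

The same function can be applied to the power and ENOB of the reduced-resolution modes; small differences from the quoted values are expected there because the published ENOB figures are rounded.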

Table I compares the performance of this ADC at various resolutions with other folding flash ADCs. Single channel SAR ADCs may be more power-efficient; however, in addition to requiring a much higher interleaving factor ( $\sim 4 \times$ ), the SAR architecture cannot scale power as efficiently as the flash architecture. The power efficiency is comparable, except for [9], which has no gain stage and a BER of 1.3e-3 that is not suitable for wireline use. In this brief, the Wallace encoder provides better





Fig. 13. Spectrum of input at 2.0243GHz, 4096-point FFT ( $F_s = 4.0625$ GS/s).



Fig. 14. SNDR versus ADC resolution ( $F_s = 4.0625$ GS/s).



Fig. 15. SNDR/SFDR@Nyquist for all VGA settings ( $F_s = 4.0625$ GS/s).



Fig. 16. SNDR/SFDR performance up to 24GHz ( $F_s = 4.0625$ GS/s).

tolerance to bit errors. The VGA in this brief has a measured maximum DC gain of 5.8dB with an SFDR better than 38dB for all gain settings as shown in Fig. 15 with a Nyquist rate input. For time-interleaving to higher frequencies, Fig. 16 shows the ADC SNDR/SFDR up to 24GHz, where the insertion loss due to equipment, PCB, & package was compensated by increasing the signal source amplitude. The performance is 4.27 ENOB at 16GHz, limited by the 3<sup>rd</sup> harmonic.

TABLE I

Summary & Comparison of Folding Flash ADCs

| Ref.               | [9]  | [6]  | [7]  | [8]  | This Work | This Work | This Work | This Work |
|--------------------|------|------|------|------|-----------|-----------|-----------|-----------|
| Node [nm]          | 90   | 40   | 40   | 28   | 16        | 16        | 16        | 16        |
| FSR [V]            | 0.8  | 1    | N/A  | 0.7  | 0.5       | 0.5       | 0.5       | 0.5       |
| Fs [GS/s]          | 1.75 | 2.2  | 10.3 | 10.3 | 4.0625    | 4.0625    | 4.0625    | 4.0625    |
| Res. [bits]        | 5    | 7    | 6    | 6    | 6         | 5         | 4         | 3         |
| ENOB @ Nyquist     | 4.7  | 5.92 | 4.56 | 4.59 | 4.8       | 4.3       | 3.7       | 2.75      |
| Power [mW]         | 2.2  | 27.4 | 139  | 40.9 | 34.4      | 23.5      | 16.9      | 10.9      |
| FOM [fJ/conv-step] | 50   | 220  | 590  | 330  | 303       | 295       | 320       | 399       |

## V. CONCLUSION

A 4GS/s folding flash ADC was presented that can be used as a single channel in a time-interleaved ADC for wireline applications. This ADC addresses the prominent overdesign in ADC-based receivers by allowing for resolution reduction based on the operating link conditions. At resolutions from 3 bits to 6 bits, the ADC maintains a good FoM ranging from 295 to 399 fJ/conv-step at Nyquist.

## ACKNOWLEDGMENT

The authors thank specifically Muhammad Ali Khan, Mark Roberts, and Trevor Monson for layout support; Semyon Lebedev, David Cassan, and Davide Tonietto for comments and discussions; and Rudy Beerkens and Andrew Marshall for the die photo.

## REFERENCES

- [1] O. Elhadidy, A. Roshan-Zamir, H.-W. Yang, and S. Palermo, "A 32 Gb/s 0.55 mW/Gbps PAM4 1-FIR 2-IIR tap DFE receiver in 65-nm CMOS," in *Proc. Symp. VLSI Circuits*, Kyoto, Japan, Jun. 2015, pp. C224–C225.
- [2] E. Z. Tabasy, A. Shafik, K. Lee, S. Hoyos, and S. Palermo, "A 6 bit 10 GS/s TI-SAR ADC with low-overhead embedded FFE/DFE equalization for wireline receiver applications," *IEEE J. Solid-State Circuits*, vol. 49, no. 11, pp. 2560–2574, Nov. 2014.
- [3] E.-H. Chen, R. Yousry, and C.-K. K. Yang, "Power optimized ADC-based serial link receiver," *IEEE J. Solid-State Circuits*, vol. 47, no. 4, pp. 938–951, Apr. 2012.
- [4] J. Kim et al., "Equalizer design and performance trade-offs in ADC-based serial links," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 58, no. 9, pp. 2096–2107, Sep. 2011.
- [5] Y. Lin et al., "A study of BER-optimal ADC-based receiver for serial links," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 63, no. 5, pp. 693–704, May 2016.
- [6] M. Miyahara, I. Mano, M. Nakayama, K. Okada, and A. Matsuzawa, "A 2.2GS/s 7b 27.4mW time-based folding-flash ADC with resistively averaged voltage-to-time amplifiers," in *Proc. Int. Solid-State Circuits Conf.*, San Francisco, CA, USA, Feb. 2014, pp. 388–389.
- [7] B. Zhang et al., "A 40 nm CMOS 195 mW/55 mW dual-path receiver AFE for multi-standard 8.5–11.5 Gb/s serial links," *IEEE J. Solid-State Circuits*, vol. 50, no. 2, pp. 426–439, Feb. 2015.
- [8] B. Raghavan et al., "A 125 mW 8.5-11.5 Gb/s serial link transceiver with a dual path 6-bit ADC/5-tap DFE receiver and a 4-tap FFE transmitter in 28 nm CMOS," in *Proc. Symp. VLSI Circuits*, Honolulu, HI, USA, Jun. 2016, pp. 1–2.
- [9] B. Verbruggen, J. Craninckx, M. Kuijk, P. Wambacq, and G. Van der Plas, "A 2.2mW 5b 1.75GS/s folding flash ADC in 90nm digital CMOS," in *Proc. Int. Solid-State Circuits Conf.*, Feb. 2008, pp. 252–611.
- [10] M. Miyahara and A. Matsuzawa, "A low-offset latched comparator using zero-static power dynamic offset cancellation technique," in *Proc. Asian Solid-State Circuits Conf.*, Nov. 2009, pp. 233–236.
- [11] K.-M. Lei, P.-I. Mak, and R. P. Martins, "Systematic analysis and cancellation of kickback noise in a dynamic latched comparator," *Analog Integr. Circuits Signal Process.*, vol. 77, no. 2, pp. 277–284, Nov. 2013.