# A 35-GS/s, 4-Bit Flash ADC With Active Data and Clock Distribution Trees

Shahriar Shahramian, Student Member, IEEE, Sorin P. Voinigescu, Senior Member, IEEE, and Anthony Chan Carusone, Senior Member, IEEE

Abstract—This paper presents a 35-GS/s, 4-bit flash ADC-DAC with active data and clock distribution trees. At mm-wave clock frequencies, skew due to mismatch in the clock and data distribution paths is a significant challenge for both flash and time-interleaved converter architectures. A full-rate front-end track and hold amplifier (THA) may be used to reduce the effect of skew. However, it is found that the THA output must then be distributed to the comparators with a bandwidth greater than the sampling frequency in order to preserve the flat regions of the track and hold waveform. Instead, if the data and clock distribution have very low skew, the THA can be omitted thus obviating the associated nonlinearities and resulting in improved performance. In this work, a tree of fully symmetric and linear BiCMOS buffers, called a "data tree", distributes the input to the comparator bank with a measured 3-dB bandwidth of 16 GHz. The data tree is integrated into a complete 4-bit ADC including a full-rate input THA that can be disabled and a 4-bit thermometer-code DAC for testing purposes. The chip occupies 2.5 mm  $\times$  3.2 mm including pads and is implemented in 0.18  $\mu$ m SiGe BiCMOS technology. The ADC consumes 4.5 W from a 3.3 V supply while the DAC operates from a 5 V supply and consumes 0.5 W. The ADC has 3.7 ENOB with a 3-dB effective resolution bandwidth of 8 GHz and a full-scale differential input range of 0.24  $V_{pp}$ . With the THA enabled, the performance degrades rapidly beyond 8 GHz to less than 1-bit, but with the THA disabled, the ENOB remains better than 3-bits for inputs up to 11 GHz with an SFDR of better than 26 dB.

Index Terms—Active clock distribution, active data distribution, analog to digital converter (ADC), BiCMOS amplifiers, digital to analog converter (DAC), DSP-based equalizers, flash data converters, mm-wave data converters, SiGe BiCMOS HBT, track and hold amplifier (THA), transimpedance amplifier (TIA).

#### I. INTRODUCTION

A robust and integrated wireline receiver solution is to employ digital signal processing (DSP) for dispersion and intersymbol interference (ISI) compensation and recovery of the clock. DSP-based equalization can be employed for optical links or to address the need to push higher data rates through existing low frequency copper wireline infrastructures. Electrical equalization (if possible) is an inexpensive alternative to replacing the existing wired infrastructure. However, a major bottleneck in realizing a DSP-based equalizer is the implementation of the

Manuscript received June 10, 2008; revised March 12, 2009. Current version published May 28, 2009. This work was supported by Gennum and Jazz Semiconductor.

The authors are with the Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada (e-mail: sshahram@eecg.utoronto.ca).

Digital Object Identifier 10.1109/JSSC.2009.2020657

preceding analog-to-digital converter (ADC). DSP-based equalizers using baud-rate ADCs have been demonstrated in the past [1], [2]. Other applications for mm-wave ADCs include satellite and wireless communication [3], [4], high speed soft-decision-based forward error correction systems [5], military radar systems [6] and instrumentation (i.e., wide bandwidth sampling oscilloscopes) [7]. Furthermore, depending on the application, the desired ENOB varies. For instance, [3] has demonstrated a powerful forward error correction for 10 Gb/s optical communication systems using a 3-bit soft decision IC. For DSP-based optical communications, [8] recommends sampling at twice the data rate combined with an ENOB of over 4 bits in the frequency band of interest.

High speed ADCs reported in the literature use a variety of technologies and architectures. The highest sampling rates have been realized with SiGe and InP technologies [6], [8], [9]. To overcome the capacitive load associated with the comparator bank, in mm-wave flash or time interleaved architectures, different techniques have been employed.

To achieve a combination of high sampling rate and high resolution, time-interleaved architectures have been demonstrated in [7] and [10]. These designs rely on parallelism and require periodic calibration for offset, gain, skew and mismatch correction. A simplified block diagram of this approach is shown in Fig. 1(a). The reported CMOS implementations use time-interleaved THAs to drive banks of time interleaved sub-ADCs. In [7], the front-end consists of 80 parallel THAs and a SiGe amplifier is used to drive the 4 pF of total input capacitance. In [10], a power splitter is used to break up the input capacitance of the THAs. The latter approach requires a full scale differential input of 1.2  $\rm V_{\rm pp}$ .

The highest reported sampling rate of 40 GHz was achieved using a SiGe technology ( $f_T/f_{\rm MAX}=210/310$  GHz) for a 3-bit ADC [6]. In this approach (shown in Fig. 1(b)) the comparator bank is treated as a lumped load to be driven by a THA.

A 5-bit, 22 GS/s ADC is presented in [8] where the input capacitance of the comparators is absorbed along a transmission line. This approach, which was originally implemented in [9], is shown in Fig. 1(c). No track-and-hold amplifier (THA) or other active input stage was integrated. The implemented ADC in [8] requires a differential input signal amplitude of 1.28  $V_{\rm pp}$  and precise matching between the delay of the input data path and the clock distribution network to avoid clock-to-data skew.

This paper presents an alternative architecture which is shown in Fig. 1(d). The fabricated ADC uses a tree of linear buffers to drive the capacitive load of the comparator bank. A TIA with 12 dB of differential gain is used as a relatively low-noise front-end amplifier. This combination allows the



Fig. 1. Flash converter architectures: (a) A time interleaved architecture with sub-rate THAs and multiphase clock generator (e.g., [7], [10]). (b) Direct driving the comparator bank using a full rate THA (e.g., [6]). (c) Using a transmission line to distribute the input and the clock signal (e.g., [8]). (d) Implementation of active distribution networks to route the input and clock signal to the comparator bank, as in this work.

ADC to process small input signals (0.24  $V_{\rm pp}$  differential), and drive the comparator bank through a symmetric data tree for minimized skew. A fully symmetric bipolar clock distribution network is employed to minimize clock skew. This technique relies only on the matching between identical blocks in each distribution tree. A THA may be inserted between the front-end amplifier and the data tree, but it will be shown that unless sufficient bandwidth is provided in the data tree such a track and hold amplifier can actually degrade the performance of the ADC. Section II discusses the impact of the data tree on the ADC performance and the efficacy of the THA for high

frequency input signals. Measurements of a fabricated breakout comprising a THA followed by an active data tree are also presented. The implementation and measurement results of the fabricated 35-GS/s flash ADC are discussed in Section III. Concluding remarks are presented in Section IV.

#### II. IMPACT OF THE ACTIVE DATA TREE ON ADC PERFORMANCE

By using a tree of linear buffers to drive the comparators of the ADC, the large capacitance associated with the comparator bank is divided amongst buffers with reduced fan-out. Since the input capacitance of the data tree is significantly smaller than that of the entire comparator bank, a high gain front-end amplifier (in this design, a TIA) can then be used to drive the data tree with sufficient bandwidth (Fig. 1(d)). It is important to note however, that in contrast to purely passive data distribution, this active implementation consumes power.

A simple approach is to design the data tree using identical amplifiers with a fan-out of k, in which case the required depth of the tree is  $\lceil \log_k(2^n-1) \rceil$  for an n-bit flash converter. In order to minimize the number of cascaded stages, the highest fan-out which satisfies the overall required data tree bandwidth must be chosen. In practice, layout considerations and symmetry requirements of the data tree must also be considered. To achieve a data tree bandwidth of  $\omega_{\rm clk}/2$ , each buffer with a fan-out of k must have a 3 dB frequency of at least [11]

$$\omega_{\text{3dB\_Buffer}} \ge \frac{\omega_{\text{clk}}}{2\sqrt{2^{\frac{1}{\lceil \log_k(2^n - 1) \rceil}} - 1}}. \tag{1}$$
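As a rough numerical check of (1), the sketch below evaluates the tree depth and the minimum per-buffer bandwidth. It assumes every buffer is an identical single-pole stage, which is the model behind (1); the function names are illustrative and do not come from the original work.

```python
import math

def tree_depth(n_bits, fan_out):
    """Number of cascaded stages needed to reach 2^n - 1 comparators."""
    comparators = 2 ** n_bits - 1
    depth, reach = 0, 1
    # Integer loop avoids floating-point trouble with ceil(log_k(...)).
    while reach < comparators:
        reach *= fan_out
        depth += 1
    return depth

def min_buffer_bandwidth(f_clk, n_bits, fan_out):
    """Per-buffer 3 dB frequency so that the whole tree reaches f_clk / 2."""
    m = tree_depth(n_bits, fan_out)
    return f_clk / (2.0 * math.sqrt(2.0 ** (1.0 / m) - 1.0))

depth = tree_depth(4, 2)
f_buffer = min_buffer_bandwidth(35e9, 4, 2)
print(f"depth = {depth}, per-buffer bandwidth >= {f_buffer / 1e9:.1f} GHz")
```

For a 4-bit converter with a fan-out of two and a 35 GHz clock, this gives a depth of four stages and a per-buffer bandwidth of roughly 40 GHz under the single-pole assumption.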

In an n-bit converter and under otherwise ideal conditions, a front-end amplifier and data tree bandwidth of  $\omega_{\rm clk}/2$  can achieve an ENOB of better than (n-0.5)-bits up to the Nyquist frequency. However, sources of sampling uncertainty can further degrade the performance of the ADC. The maximum achievable SNR in the presence of sampling jitter is [12]

$$\mathrm{SNR}_{\mathrm{MAX}} = 20 \log \left( \frac{1}{\omega_{\rm in} \tau_i} \right). \tag{2}$$
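To give a feel for the magnitudes in (2), the following sketch evaluates the limit for one illustrative operating point. The choice of an rms timing error of 0.05 clock periods at a 35 GHz clock and an 11 GHz input is an assumption made here for illustration only; the function name is hypothetical.

```python
import math

def snr_max_db(f_in_hz, tau_rms_s):
    """Timing-uncertainty-limited SNR from (2), with omega_in = 2*pi*f_in."""
    return 20.0 * math.log10(1.0 / (2.0 * math.pi * f_in_hz * tau_rms_s))

f_clk = 35e9
tau = 0.05 / f_clk            # assumed rms timing error: 5% of the clock period
snr = snr_max_db(11e9, tau)
print(f"tau = {tau * 1e12:.2f} ps, SNR_MAX = {snr:.1f} dB")
```

Under these assumptions the timing error is about 1.43 ps and the bound is close to 20 dB, which is below the roughly 26 dB quantization limit of an ideal 4-bit converter.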

In flash ADCs, three major sources of timing uncertainty are clock skew, data skew and random jitter of the clock signal, but clock and data skew dominate if the ADC is driven by a low phase noise clock [8]. To combat clock and data skew in mm-wave ADCs, a THA may be employed. A full rate THA can be inserted after the front-end amplifier. For half the clock period the THA holds the input signal constant, ideally reducing the rate of change of the input signal to zero. During this time, the clocked comparators make decisions based on the held value. For the second half of the clock period, the THA tracks the input signal. In the presence of an ideal THA, the sampling uncertainty due to clock and data skew is eliminated.

The THA produces the desired "zero slope" regions in the time domain by introducing high frequency content to the spectrum of the input signal. This high frequency content must be preserved otherwise the hold mode behavior of the THA is lost. In practice, poles introduced at the output of the THA and comparator pre-amplifiers attenuate this content, thus reducing the efficacy of the THA. Fig. 2 shows the block diagram of a 4-bit flash ADC used to model this phenomenon. In our behavioral



Fig. 2. This model is used to simulate the efficacy of a THA in the presence of data tree bandwidth limitations. Without sufficient data tree bandwidth, the THA is unable to combat the effects of the clock and data skew ( $\tau_{d1-15}$  and  $\tau_{c1-15}$ , respectively).



Fig. 3. Impact of the data tree bandwidth on the THA waveform: (a)  $\omega_{\rm 3dB}=2\omega_{\rm clk}$ , (b)  $\omega_{\rm 3dB}=0.8\omega_{\rm clk}$ , (c)  $\omega_{\rm 3dB}=0.5\omega_{\rm clk}$ . The input signal frequency is  $\omega_{\rm in}=0.14\omega_{\rm clk}$ . The hold mode behavior of the THA is mostly lost if the data tree bandwidth is limited.

simulations, an ideal THA is followed by an  $N\text{th}$  order low-pass filter

$$T_{\rm LPF}(s) = \left(\frac{1}{1 + \frac{s}{\omega_o}}\right)^N \tag{3}$$

with a 3 dB bandwidth of  $\omega_{3\mathrm{dB}}$ . The low-pass filter represents the frequency response of the data tree. For a constant 3 dB bandwidth, simulation results are similar for  $N=\{2,3,4\}$ . The results shown here are for  $N=3$ . Clock and data skew are modeled by delaying the input and clock of each comparator by independent Gaussian distributed random variables ( $\tau_{\mathrm{d1}}$  to  $\tau_{\mathrm{d15}}$  and  $\tau_{\mathrm{c1}}$  to  $\tau_{\mathrm{c15}}$  in Fig. 2) with zero mean and standard deviations of  $0.05T_{\mathrm{clk}}$  where  $T_{\mathrm{clk}}$  is the clock period of the ADC. Behavioral simulations of one hundred 4-bit ADCs with a data tree bandwidth of  $\omega_{\mathrm{clk}}/2$  and no THA show that with increasing input frequency, the average ENOB decreases due to skew. For instance, while 95% of all simulated ADCs achieve an ENOB of better than 3.5 bits for an input frequency of  $0.08\omega_{\mathrm{clk}}$ , only 3%

of the simulated ADCs achieve the same ENOB for an input frequency of  $0.22\omega_{\rm clk}$ .
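The skew part of this behavioral experiment can be sketched in a few lines. The code below is a simplified stand-in, not the authors' simulator: it omits the THA and the low-pass filter of (3), uses a coherent sine-projection estimate of SNDR, and averages over 20 instances instead of reporting percentiles. All names and the record length are assumptions, so the numbers will not reproduce Fig. 4.

```python
import math
import random

N_COMP = 15                                   # comparators in a 4-bit flash ADC
THRESHOLDS = [(i - 8) / 8 for i in range(1, N_COMP + 1)]  # -7/8 ... +7/8 of full scale

def simulate_enob(f_in_rel, skew_sigma, rng, n_samples=509):
    """ENOB of one ADC instance. f_in_rel = f_in / f_clk, skew_sigma in units of T_clk."""
    # Static timing error per comparator: clock delay minus data delay.
    skew = [rng.gauss(0.0, skew_sigma) - rng.gauss(0.0, skew_sigma)
            for _ in range(N_COMP)]
    # Coherent sampling: an integer number of input cycles in the record.
    cycles = round(f_in_rel * n_samples)
    w = 2.0 * math.pi * cycles / n_samples    # input phase advance per clock period
    out = []
    for n in range(n_samples):
        ones = sum(1 for th, tau in zip(THRESHOLDS, skew)
                   if math.sin(w * (n + tau)) > th)
        out.append((ones - 7.5) / 8)          # thermometer count -> mid-level value
    # Project the output onto the input frequency to separate signal from error.
    mean = sum(out) / n_samples
    a = 2.0 / n_samples * sum(v * math.sin(w * n) for n, v in enumerate(out))
    b = 2.0 / n_samples * sum(v * math.cos(w * n) for n, v in enumerate(out))
    p_signal = (a * a + b * b) / 2.0
    p_total = sum((v - mean) ** 2 for v in out) / n_samples
    sndr_db = 10.0 * math.log10(p_signal / (p_total - p_signal))
    return (sndr_db - 1.76) / 6.02

def mean_enob(f_in_rel, skew_sigma, n_adcs=20, seed=1):
    """Average ENOB over several ADC instances with independent skew draws."""
    rng = random.Random(seed)
    return sum(simulate_enob(f_in_rel, skew_sigma, rng) for _ in range(n_adcs)) / n_adcs

ideal = mean_enob(0.22, 0.0)
skewed = mean_enob(0.22, 0.05)
print(f"mean ENOB without skew: {ideal:.2f} bits, with skew: {skewed:.2f} bits")
```

Even this reduced model shows the qualitative trend described in the text: with skew of $0.05T_{\rm clk}$ on both paths, the ENOB at an input frequency of $0.22\omega_{\rm clk}$ falls well below the skew-free value.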

An ideal THA followed by a data tree with infinite bandwidth can fully eliminate the effect of the skew in more than 99.7% of all simulated ADCs resulting in an ENOB of 4-bits for any input frequency. However, an ideal THA followed by a finite-bandwidth data tree can be expected to have performance somewhere between these two extremes. The goal is to investigate the data tree bandwidth requirements such that a THA can successfully combat skew.

Fig. 3 shows the input to the comparator bank for a sinusoidal input signal  $x(t) = \sin(\omega_{\rm in}t)$  with  $\omega_{\rm in} = 0.14\omega_{\rm clk}$  where  $\omega_{\rm clk}$  is the clock frequency of the ADC in rad/s. With a very high bandwidth data tree, the slope during the hold time is zero, providing immunity against skew. However, as the bandwidth is reduced, the input to the comparator bank increasingly resembles the sinusoid at the ADC input. Hence, the THA provides little improvement in performance unless the bandwidth of the following stages is high enough.



Fig. 4. Simulated ENOB of a 4-bit Flash ADC in the presence of clock and data skew with standard deviations of  $0.05T_{\rm Clk}$ . It can be observed that if the data tree bandwidth is limited to  $0.5\omega_{\rm clk}$ , a THA offers little improvement in performance.

The simulated ADC ENOB profile is shown in Fig. 4. It shows the minimum ENOB achieved by the best 95% of all simulated ADCs. With a data tree bandwidth of  $\omega_{\rm clk}/2$ , no THA, and in the presence of skew, an ENOB better than 3-bits can only be expected for input frequencies up to  $0.18\omega_{\rm clk}$ . With the THA enabled and the data tree bandwidth extended to beyond  $\omega_{\rm clk}$ , nearly all simulated ADCs achieved 4-bits of ENOB. However, it can be observed that if the bandwidth of the data tree is close to the Nyquist frequency,  $\omega_{\rm 3dB} = \omega_{\rm clk}/2$ , implementing a THA gives little improvement in performance. Therefore, if a THA is to be employed effectively, the bandwidth of the data tree must exceed the clock frequency; typically a very difficult design specification for mm-wave conversion rates.

#### A. THA and Data Tree Breakout

To investigate the impact of the data tree on the ADC performance, a breakout of the TIA, THA and data tree has been fabricated. This breakout allows for the characterization of the TIA, THA, and the BiCMOS data tree. The front-end amplifier is a low noise broadband TIA, followed by a buffering stage which drives the THA. The buffering stage provides biasing current to the THA and improves the single-ended to differential conversion when the TIA is driven single-ended [14]. Unlike [6], [7], [10], the implemented THA needs to drive only a small output capacitance. A data tree of BiCMOS cascode buffers is employed. The tree is intended for use in a 4-bit flash converter. Each stage has a fan-out of two and, to save area in this breakout, only one branch of the tree is implemented with the other branches terminated on chip. A BiCMOS 50  $\Omega$  driver allows the analog output of the tree to be taken off-chip.

1) Transimpedance Amplifier (TIA): A transimpedance amplifier can be designed to provide simultaneous noise and impedance matching without the need for 50  $\Omega$  matching resistors [13]. Fig. 5 shows the schematic of the TIA included in this ADC. The TIA is employed as the front-end amplifier and



Fig. 5. Schematic of the implemented fully bipolar TIA. The TIA has 12 dB of simulated differential gain and drives the THA.

is driven as a voltage amplifier. In order to achieve the desired 12 dB of differential gain, the TIA is implemented using SiGe HBTs only. In this design, we have added 12  $\Omega$  degeneration resistors to improve the linearity of the TIA without significantly degrading the noise figure. Diode connected transistors  $Q_3$  and  $Q_4$  prevent the breakdown of  $Q_1$  and  $Q_2$ . A switched emitter-follower THA is also implemented following the design



Fig. 6. Schematic of the MOS-HBT cascode differential pair used in the data tree. CMOS input transistors offer high linearity while the bipolar cascode transistors offer low output capacitance.

methodology described in [14]. The TIA has a simulated bandwidth of better than 20 GHz. This simulation is also supported by the measurement results presented in [14].

2) BiCMOS Data Tree: Each block in the BiCMOS data tree consists of a MOS-HBT cascode amplifier shown in Fig. 6. The BiCMOS cascode amplifiers have a tail current of 8 mA with the MOSFETs biased near the peak  $f_T$  current density  $(J_{\rm pfT}=0.3~{\rm mA}/\mu{\rm m})$ . Although smaller nMOS transistors are more susceptible to mismatches, biasing at  $J_{\rm pfT}$  was necessary to achieve a combination of high bandwidth and high linearity. In this design, the layout of each data tree block is optimized to minimize mismatches by using dummy resistors and transistors, and by interdigitating and merging the nMOS transistors with common sources in a single well. The 0.18  $\mu$ m nMOS pair offers a simulated differential input compression point of 1.5  $V_{\rm pp}$  at low frequencies. The HBT common-base transistors create low impedance cascode nodes with high output slew rate and bandwidth. The data tree is implemented completely symmetrically to minimize systematic sources of skew, comparator dependent signal attenuation and offsets. Series-shunt inductive peaking is also used throughout the data tree. Series peaking is provided by the interconnect between the data tree blocks. The simulated bandwidth of a single block with a fan-out of two is 45 GHz and the simulated gain of the entire data tree is 0 dB.

#### B. THA and Data Tree Measurement Results

All measurements on the breakout are done single-ended and using wafer probing. The measured and post-extraction simulated small-signal characteristics of the TIA, THA and the data tree are shown in Fig. 7. The BiCMOS output driver used to drive off-chip 50  $\Omega$  loads has a simulated loss of 6 dB. The

combination of the TIA, THA and the data tree has a 3 dB bandwidth of 16 GHz. The TIA input return loss is better than  $-10\,\mathrm{dB}$  up to 20 GHz. Although a similar THA in the same technology demonstrated a bandwidth exceeding 40 GHz [14], the bandwidth of this design is limited by the many buffer stages required to drive the capacitive load of the comparator bank. While each buffer stage has a bandwidth of 45 GHz on its own, cascading four of them along with the THA and TIA greatly reduces the overall bandwidth down to 16 GHz. Fig. 8 shows the measured time domain outputs of the breakout for input frequency  $\omega_{\rm in}=5$  GHz and clock frequency  $\omega_{\rm clk}=35$  GHz without [Fig. 8(a)] and with [Fig. 8(b)] the THA enabled. It can be observed that the hold mode behavior of the THA has nearly disappeared in Fig. 8(b) due to the insufficient bandwidth of the data tree.
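The bandwidth shrinkage from cascading can be illustrated with a simple model. The sketch below treats every stage as an independent single-pole low-pass and solves numerically for the overall 3 dB frequency. Modeling the TIA as a 20 GHz pole and the THA as a 40 GHz pole is an assumption based on the "better than 20 GHz" and "exceeding 40 GHz" figures quoted in the text; the model also ignores the inductive peaking used in the real circuit, so it should be expected to be pessimistic compared with the measured 16 GHz.

```python
def cascade_3db(poles_hz):
    """3 dB frequency of cascaded single-pole stages, found by bisection."""
    def gain_squared(f):
        g = 1.0
        for p in poles_hz:
            g /= 1.0 + (f / p) ** 2
        return g

    # The overall response is already at or below -3 dB at the lowest pole.
    lo, hi = 0.0, min(poles_hz)
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if gain_squared(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

tree_only = cascade_3db([45e9] * 4)                 # four data tree buffers
full_chain = cascade_3db([20e9, 40e9] + [45e9] * 4)  # assumed TIA and THA poles added
print(f"tree only: {tree_only / 1e9:.1f} GHz, full chain: {full_chain / 1e9:.1f} GHz")
```

With these assumed poles, four 45 GHz buffers alone come out near 19.6 GHz and the full chain near 12 GHz, which illustrates why the cascade, and not any single stage, sets the overall bandwidth.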

A significant disadvantage of enabling the switched emitter-follower (SEF) THA is that it introduces nonlinearities which can actually degrade the SNDR of the ADC. Two major sources of this are the nonlinear modulation of the base-emitter junction voltage of the SEF during the charging and discharging of the hold capacitance [14], and the mixing of the harmonics of the input signal with the clock signal which folds the harmonics back into the Nyquist bandwidth of the converter. Fig. 9 shows the measured output spectrum of a fabricated breakout, with and without the THA activated. To disable the THA, it is switched to track-mode only. In the case where the THA is disabled [Fig. 9(a)], the second and third harmonics at the output are mostly due to the nonlinearities of the TIA. However, once the THA is activated [Fig. 9(b)], mixed products of the input signal harmonics, as well as the clock feed-through signal, are present at the output. The third order harmonic also increases in magnitude. Depending on the resolution of the ADC and the severity of the clock jitter and skew, the benefit of a THA can outweigh the nonlinearities introduced by it. However, in the case where the data distribution bandwidth is insufficient for proper operation of the THA, it is better to omit the implementation of the THA altogether. In this work, since the data tree bandwidth is insufficient for proper operation of the THA at the targeted clock frequencies, minimizing clock and data skew is of great importance. This criterion was strictly observed by designing symmetric and well-matched clock and data distribution

#### III. THE 4-BIT FLASH ADC-DAC CHAIN

The fabricated ADC uses the same TIA and BiCMOS data tree as the breakout and merges the buffer and THA. The block diagram of the implemented ADC-DAC chain is shown in Fig. 10. The data tree of BiCMOS cascode buffers distributes the THA output to the entire comparator bank. Each of the 15 comparators, required for 4-bit operation, consists of an offset amplifier, a high gain preamplifier, a flip-flop and a differential pair. A 16th dummy comparator is also employed to maintain the symmetry of the data tree and provide a single thermometer code output off-chip for testing. The comparator bank produces a 15-level thermometer code. For testing, a 15-level thermometer code DAC has been implemented on chip. No decoder logic or bubble correction circuitry has been implemented. The clock tree comprises 17 series-shunt inductively peaked bipolar



Fig. 7. Small-signal characteristics of the fabricated breakout. Measured single-ended s-parameters and simulated single-ended gain of the TIA, THA, data tree and overall s-parameters of the breakout after extraction are shown. The combination of the TIA, THA and the data tree has a measured small-signal 3 dB bandwidth of 16 GHz.



Fig. 8. Measured output of the fabricated breakout: (a) THA OFF; (b) THA ON. Due to the limited data tree bandwidth, the hold mode behavior of the THA is nearly eliminated.

differential pairs with a fan-out of two per stage; the final stages drive four latches each. A separate clock path drives the THA. A two-stage tuneable delay cell has been implemented on chip to align the THA clock with the comparator clock. The active delay cells consist of phase interpolating blocks between fast



Fig. 9. Measured spectra of the breakout output showing the nonlinearity impact of the THA: (a) THA disabled; (b) THA enabled showing intermodulation products of the input and clock signals in the output spectrum.

and slow paths [15]. The THA can be disabled by forcing it into track mode only via external controls.

#### A. ADC-DAC Circuit Descriptions

1) Merged BiCMOS Cascode and THA: In the design of the ADC, a MOS-HBT BiCMOS cascode buffer is merged with the switched emitter-follower THA as shown in the schematics on



Fig. 10. Block diagram of the implemented 35-GS/s, 4-bit, Flash ADC-DAC. The ADC uses data and clock distribution to drive the comparator bank.



Fig. 11. Schematic of the merged MOS-HBT BiCMOS cascode and THA. The total capacitance at node (a) is reduced by connecting the collector of transistor  $Q_9$  to the low impedance cascode node (b) instead.

Fig. 11. The tail current of the THA is 6 mA per side. Compared with [14], the total capacitance at node (A) is reduced by connecting the collector of transistor  $Q_9$  to the low impedance cascode node (B) instead. This approach eliminates the need for

level shifting diodes, and further isolates the hold capacitor,  $\mathrm{C}_{\mathrm{H}},$  from the clock signal.

2) Comparator Design: Offset currents were used in this design to establish the quantization levels without a resistor



Fig. 12. Schematic of MOS-HBT BiCMOS cascode offset amplifier. The offset currents are drawn from the low impedance cascode nodes to mitigate their impact on bandwidth.

ladder [6]. Each of the comparators is preceded by a differential BiCMOS cascode offset amplifier illustrated in Fig. 12. The unit quantization current in this design is 120  $\mu$ A, which in combination with the 170  $\Omega$  load resistors, produces a differential quantization level of 40.8 mV at the inputs of the comparators. Furthermore, since the simulated gain of the data tree is 0 dB, one LSB referred back to the input of the data tree is also 40.8 mV. The offset currents are drawn from the low impedance cascode nodes, (A) and (B), to mitigate their impact on bandwidth. In order to maintain equal delay, the total capacitance at nodes (A) and (B) is kept constant for all comparators. This is accomplished by adding dummy "off" current sources where needed. Offset mismatches introduced by the data tree also present themselves at the output of this block. These types of mismatches may be corrected by calibration of the offset currents. Similarly, data tree gain variations may also be corrected through this type of calibration. However, in this design, due to the large quantization levels and the low data tree gain, no calibration technique has been deemed necessary. Fully bipolar cascode pre-amplifiers with 15 dB of differential gain and >25 GHz bandwidth follow the BiCMOS offset amplifiers. The comparator's metastability window, which refers to the smallest input voltage required to switch a comparator's state, is reduced by the addition of these pre-amplifiers. The preamplifiers also isolate the offset stages from flip-flop kickback. Each flip-flop has two latches with 4 mA tail current and emitter-followers both on the clock and data paths. Inverting stages with 4 mA tail current follow every flip-flop to eliminate the latch induced clock feedthrough and drive the DAC. The combination of the pre-amplifier and flip-flop has a simulated metastability window of better than 20 mV at 35 Gbps.
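The quoted quantization level can be reproduced with a short calculation. The sketch below assumes that one unit offset current shifts the two load resistor voltages in opposite directions, so that the differential step is twice the single-ended product of current and resistance; that reading is an interpretation that matches the 40.8 mV stated in the text, and the variable names are illustrative.

```python
I_UNIT_A = 120e-6        # unit quantization current from the text
R_LOAD_OHM = 170.0       # load resistor from the text
TREE_GAIN_DB = 0.0       # simulated data tree gain from the text

# Assumed: the differential step is twice the single-ended I*R drop.
lsb_at_comparator_v = 2.0 * I_UNIT_A * R_LOAD_OHM

# Refer the step back through the data tree gain.
lsb_at_tree_input_v = lsb_at_comparator_v / (10.0 ** (TREE_GAIN_DB / 20.0))

print(f"LSB at comparator inputs: {lsb_at_comparator_v * 1e3:.1f} mV")
print(f"LSB at data tree input:   {lsb_at_tree_input_v * 1e3:.1f} mV")
```

Because the data tree gain is 0 dB, both values coincide at 40.8 mV, in agreement with the text.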

3) Thermometer Code DAC: The comparator bank produces a 15-level thermometer code. In implementations where thermometer-to-binary conversion is required, bubble error removal can be applied in the form of three-input NAND gates at the cost of increased power consumption [16]. Another method to remove bubble errors that does not increase the power consumption is by adding extra "voting" transistors to the latches as proposed by [17]. However, in this approach the added capacitive loading in the latches may reduce the maximum conversion frequency.

For testing purposes, a 15-level thermometer code DAC has been implemented on this chip. Its schematic is reproduced in Fig. 13. It consists of a differential, double bipolar cascode. Each thermometer code bit drives one CML inverter with a tail current of 4 mA. The output currents of these CML inverters are summed in the first cascode nodes in groups of four. The remaining four output currents are summed in the second cascode nodes in pairs. Finally the remaining two output currents are summed at the on-chip 50  $\Omega$  resistors. Simulations have shown that, in this technology and with this DAC topology, buffers smaller than 4 mA do not have sufficient bandwidth to operate at 35 Gbps. Each thermometer code bit produces a differential swing of 200 mV $_{\rm pp}$  for a total differential output swing of 3 V $_{\rm pp}$ . The emitter-followers driving the inverting stages of the DAC operate from 3.3 V while the double cascodes require a 5 V power supply (Fig. 13).
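An idealized model of this DAC is a unary sum of the thermometer bits. The sketch below assumes perfectly matched 200 mV steps and ignores bandwidth and settling; the function name is hypothetical. In this idealized model the output depends only on the number of ones, so a single bubble in the code does not change the output relative to the same count without a bubble.

```python
STEP_MV = 200            # differential swing per thermometer bit, from the text
N_BITS_THERMO = 15       # thermometer code width

def dac_output_mv(thermometer_bits):
    """Differential output swing in mV for a 15-bit thermometer code (ideal model)."""
    if len(thermometer_bits) != N_BITS_THERMO:
        raise ValueError("expected 15 thermometer bits")
    return STEP_MV * sum(thermometer_bits)

full_scale = dac_output_mv([1] * 15)
clean_code = dac_output_mv([1] * 7 + [0] * 8)
bubble_code = dac_output_mv([1] * 6 + [0, 1] + [0] * 7)   # same count, one bubble
print(full_scale, clean_code, bubble_code)
```

The full-scale value of 3000 mV matches the 3 V$_{\rm pp}$ total swing quoted in the text.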

4) Clock Distribution Network: The challenging task of distributing the 35 GHz clock to 32 latches (16 master–slave comparators) is accomplished by a fully bipolar clock distribution tree. The tree consists of series-shunt inductively peaked stages with a fan-out of two per stage; the final stages drive four latches



Fig. 13. Schematic of the implemented 15-level thermometer code DAC. Thermometer code inputs (D0–D15) are summed together at different nodes of the double cascode DAC.



Fig. 14. Chip micrograph of the ADC-DAC, die area is  $2.5 \times 3.2 \text{ mm}^2$ .

each. Each stage has an 8 mA tail current with 500 mV$_{\rm pp}$  of differential output swing. A separate clock path drives the THA. A two-stage tuneable delay cell is also implemented on chip [15]. It is used to align the clock and data signals and consists of phase interpolating blocks between fast and slow paths.



Fig. 15. Measured ADC INL and DNL with a 35-GHz clock frequency.

#### B. ADC-DAC Measurement Results

The ADC-DAC chain has been implemented in Jazz Semiconductor's SBC18HX, 0.18  $\mu$ m SiGe BiCMOS HBT technology. The process offers an  $f_T$  of 160 GHz/50 GHz for bipolar and nMOS devices respectively. The ADC operates from a 3.3 V supply while the DAC requires a 5 V supply. The chip occupies

|                                            | This Work             | [6]            | [7]                 | [8]                   | [9]                | [10]              |
|--------------------------------------------|-----------------------|----------------|---------------------|-----------------------|--------------------|-------------------|
| Resolution (bits)                          | 4                     | 3              | 8                   | 5                     | 3                  | 6                 |
| Sampling Rate (GS/s)                       | 35                    | 40             | 20                  | 22                    | 24                 | 24                |
| SFDR (dB) at $\le 1$ GHz                   | 28.5                  | 28             | N/A                 | 35                    | 24.6 <sup>1</sup>  | 44                |
| SNDR (dB) at $\le 1$ GHz                   | 24.1                  | 18.6           | 40.8                | 30                    | 19.5 <sup>1</sup>  | 33.34             |
| SFDR (dB) at 11 GHz                        | 27.3                  | 25             | N/A                 | < 20                  | 23.3 <sup>2</sup>  | 35.3              |
| SNDR (dB) at 11 GHz                        | 19.8                  | N/A            | N/A                 | < 15                  | 15.6 <sup>2</sup>  | 23.1              |
| ENOB (bits)                                | 3.7                   | 2.8            | 6.5                 | 4.7                   | 2.95 <sup>1</sup>  | 5.25              |
| ERBW (GHz)                                 | 8                     | N/A            | 2                   | 5                     | N/A                | 5.6               |
| Full Scale Input (V<sub>pp</sub> Diff.)    | 0.24                  | N/A            | 0.5                 | 1.28                  | 1                  | 1.2               |
| ADC Power Consumption                      | 4.5 W                 | 3.8 W          | 9.0 W <sup>3</sup>  | 3.0 W <sup>3</sup>    | 3.84 W <sup>3</sup> | 1.2 W <sup>3</sup> |
| Architecture                               | Flash                 | Flash          | Time Interleave     | Flash Interpolating   | Flash              | Time Interleave   |
| Technology ($f_T$)                         | SiGe BiCMOS (160 GHz) | SiGe (210 GHz) | 0.18 µm CMOS (N/A)  | SiGe BiCMOS (150 GHz) | InP (150 GHz)      | 90 nm CMOS (N/A)  |

Table I. Performance summary and comparison.

<sup>2</sup> Measured at 20 GS/s and 10 GHz input. <sup>3</sup> Includes the on-chip decoder power.



Fig. 16. Measured ENOB and SFDR versus input frequency for a 35-GHz clock frequency. The THA degrades the performance of the ADC for input frequencies above 8 GHz.



Fig. 17. Output spectrum of a 7.98 GHz input signal sampled at 35 GHz after corrections. The THA is disabled.

an area of 2.5 mm $\times$ 3.2 mm and the ADC-DAC chain consumes a total of 5 W (4.5 W for the ADC). The data and clock trees require 0.65 W and 1 W, respectively. The comparator bank consumes 2.4 W while the TIA and THA use a total of 0.23 W. The adjustable delay cells and biasing circuitry need a total of 0.22 W. Fig. 14 shows the chip micrograph.
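As a quick consistency check, the per-block figures quoted above can be summed and compared against the 4.5 W ADC total and the 5 W chain total. The short sketch below does only that arithmetic; the dictionary keys and variable names are illustrative labels chosen here, not identifiers from the design.

```python
# Power breakdown quoted in the text, in watts.
# The key names are illustrative labels, not names from the design database.
adc_blocks_w = {
    "data tree": 0.65,
    "clock tree": 1.00,
    "comparator bank": 2.40,
    "TIA and THA": 0.23,
    "delay cells and biasing": 0.22,
}

adc_total_w = sum(adc_blocks_w.values())
chain_total_w = 5.0                    # ADC-DAC chain total quoted in the text
dac_w = chain_total_w - adc_total_w    # share left over for the DAC

# Print each block with its fraction of the ADC total.
for name, watts in adc_blocks_w.items():
    share = 100.0 * watts / adc_total_w
    print(f"{name:>24s}: {watts:4.2f} W ({share:4.1f} %)")
print(f"ADC total: {adc_total_w:.2f} W, remaining for DAC: {dac_w:.2f} W")
```

The blocks add up to the quoted ADC total, and the remainder matches the 0.5 W DAC consumption stated in the abstract. The comparator bank accounts for more than half of the ADC power.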

The INL and DNL were measured differentially by applying a full-scale DC sweep at the input and capturing the DAC outputs while clocking the ADC at 35 GHz. Fig. 15 shows that the INL and DNL of the ADC are below 0.5 LSB for all analog input levels. The measured INL and DNL of the DAC are better than 0.1 LSB for all digital codes.
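The text does not say how INL and DNL were extracted from the DC sweep. One common approach, sketched below as an assumption, is an endpoint fit on the input levels at which the output code changes. The function name `inl_dnl` and the transition levels are hypothetical; the levels are synthetic values built around a nominal 15 mV LSB (0.24 V full scale over 16 codes), not measured data from this chip.

```python
def inl_dnl(transitions):
    """Return (dnl, inl) in LSB from a list of code-transition voltages.

    Endpoint fit (an assumed method): the LSB is the average step between
    the first and last transition, so INL is zero at both ends.
    """
    n = len(transitions)
    lsb = (transitions[-1] - transitions[0]) / (n - 1)
    # DNL: deviation of each step width from one LSB.
    dnl = [(transitions[k + 1] - transitions[k]) / lsb - 1.0 for k in range(n - 1)]
    # INL: deviation of each transition from the endpoint-fit straight line.
    inl = [(transitions[k] - (transitions[0] + k * lsb)) / lsb for k in range(n)]
    return dnl, inl


# Synthetic example only (NOT measured data): a 4-bit converter has
# 15 transitions; each is offset from its ideal position by a few mV.
nominal_lsb = 0.24 / 16
offsets_mv = [0, 2, -1, 3, 0, -2, 1, 0, 2, -3, 1, 0, -1, 2, 0]
transitions = [-0.105 + k * nominal_lsb + off * 1e-3
               for k, off in enumerate(offsets_mv)]

dnl, inl = inl_dnl(transitions)
worst_dnl = max(abs(x) for x in dnl)
worst_inl = max(abs(x) for x in inl)
print(f"worst |DNL| = {worst_dnl:.2f} LSB, worst |INL| = {worst_inl:.2f} LSB")
```

With a 15 mV LSB, a 0.5 LSB bound corresponds to transition errors of roughly 7.5 mV, which gives a sense of the comparator offset and reference accuracy implied by the measured result.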

All dynamic measurements are performed single-ended by wafer probing only one of the differential pair outputs. The ADC was tested with the same input signal power for all input frequencies. After accounting for the $\sin(x)/x$ DAC roll-off and high-frequency losses in the wafer probes and cables, the SFDR and ENOB of the ADC are plotted in Fig. 16 for both THA operation modes. At low frequencies the ENOB is 3.7 bits (24 dB SNDR). Fig. 17 shows a sample output spectrum of a 7.98 GHz input signal with the THA disabled. The effective resolution bandwidth is about 8 GHz irrespective of whether the THA is enabled or not. However, beyond 8 GHz the SFDR and ENOB degrade more rapidly with the THA activated. This is the result of the input signal harmonics mixing with the clock signal and showing up at low frequencies, within the bandwidth of the data tree. The third-order harmonic of the input signal is also degraded due to the SEF THA nonlinearities. Therefore, the THA actually degrades the performance of this ADC. In this design, with 4 bits of precision, careful layout is sufficient to prevent skew from becoming a performance limitation. Table I presents a performance summary and comparison.

<sup>1</sup> Measured at 2 GS/s.
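Three of the quantities discussed above follow from standard relations, and a minimal sketch may help the reader reproduce them. It shows the ideal-quantizer relation SNDR = 6.02 ENOB + 1.76 dB, the $\sin(x)/x$ roll-off of a zero-order-hold DAC, and the frequency to which an input harmonic folds after sampling. The function names are illustrative, and taking $x = \pi f / f_s$ with the DAC clocked at the 35 GHz sampling rate is an assumption. This is an illustration, not the authors' correction procedure.

```python
import math


def enob_from_sndr(sndr_db):
    """Ideal-quantizer relation: SNDR = 6.02 * ENOB + 1.76 dB."""
    return (sndr_db - 1.76) / 6.02


def sinc_rolloff_db(f_hz, fs_hz):
    """Amplitude roll-off sin(x)/x of a zero-order hold, x = pi*f/fs, in dB."""
    x = math.pi * f_hz / fs_hz
    if x == 0:
        return 0.0
    return 20.0 * math.log10(abs(math.sin(x) / x))


def alias_hz(f_hz, fs_hz):
    """Frequency in [0, fs/2] at which a tone at f_hz appears after sampling."""
    f = f_hz % fs_hz
    return min(f, fs_hz - f)


fs = 35e9  # sampling clock used in the measurements
print(f"ENOB at 24.1 dB SNDR: {enob_from_sndr(24.1):.2f} bits")
for f_in in (7.98e9, 11e9):
    rolloff = sinc_rolloff_db(f_in, fs)
    h3 = alias_hz(3 * f_in, fs)
    print(f"f_in = {f_in / 1e9:5.2f} GHz: DAC roll-off {rolloff:.2f} dB, "
          f"3rd harmonic folds to {h3 / 1e9:.2f} GHz")
```

For an 11 GHz input, the third harmonic at 33 GHz lies 2 GHz from the 35 GHz clock and therefore folds to 2 GHz, well inside the data tree bandwidth. This is one concrete instance of the harmonic mixing described above.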

#### IV. CONCLUSION

A 4-bit, 35-GS/s flash ADC-DAC chain was demonstrated in this paper. The ADC uses active clock and data trees to distribute the input and the clock to the comparators. Because both clock and data are symmetrically distributed, this implementation minimizes timing skew, achieving 3.7 ENOB with an effective resolution bandwidth of 8 GHz and 3 ENOB for inputs up to 11 GHz without the full-rate THA.

#### ACKNOWLEDGMENT

The authors would like to acknowledge Gennum for financial support, Jazz Semiconductor for fabrication, and CMC for CAD tools.

#### REFERENCES

- [1] M. Harwood, N. Warke, R. Simpson, T. Leslie, A. Amerasekera, S. Batty, D. Colman, E. Carr, V. Gopinathan, S. Hubbins, P. Hunt, A. Joy, P. Khandelwal, B. Killips, T. Krause, S. Lytollis, A. Pickering, M. Saxton, D. Sebastio, G. Swanson, A. Szczepanek, T. Ward, J. Williams, R. Williams, and T. Willwerth, "A 12.5 Gb/s SerDes in 65 nm CMOS using a baud-rate ADC with digital receiver equalization and clock recovery," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2007, pp. 436–437.
- [2] H. Bae, J. B. Ashbrook, J. Park, N. R. Shanbhag, A. C. Singer, and S. Chopra, "An MLSE receiver for electronic dispersion compensation of OC-192 fiber links," *IEEE J. Solid-State Circuits*, vol. 41, no. 11, pp. 2541–2554, Nov. 2006.
- [3] J. C. Jensen and L. E. Larson, "A broadband 10 GHz track-and-hold in Si/SiGe HBT technology," *IEEE J. Solid-State Circuits*, vol. 36, no. 3, pp. 325–330, Mar. 2001.
- [4] Y. Lu, W.-M. Kuo, X. Li, R. Krithivasan, J. D. Cressler, Y. Borokhovych, H. Gustat, B. Tillack, and B. Heinemann, "An 8-bit 12 Gsample/sec SiGe track-and-hold amplifier," in *Proc. Bipolar/BiCMOS Circuits and Technology Meeting (BCTM)*, 2005, pp. 148–151.
- [5] H. Tagami, T. Kobayashi, Y. Miyata, K. Ouchi, K. Sawada, K. Kubo, K. Kuno, H. Yoshida, K. Shimizu, T. Mizuochi, and K. Motoshima, "A 3-bit soft-decision IC for powerful forward error correction in 10-Gb/s optical communication systems," *IEEE J. Solid-State Circuits*, vol. 40, no. 8, pp. 1695–1705, Aug. 2005.
- [6] W. Cheng, W. Ali, M.-J. Choi, K. Liu, T. Tat, D. Devendorf, L. Linder, and R. Stevens, "A 3 b 40 GS/s ADC-DAC in 0.12 μm SiGe," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2004, pp. 262–263.
- [7] K. Poulton, R. Neff, B. Setterberg, B. Wuppermann, T. Kopley, R. Jewett, J. Pernillo, C. Tan, and A. Montijo, "A 20 GS/s 8 b ADC with a 1 MB memory in 0.18 μm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2003, pp. 318–319.
- [8] P. Schvan, D. Pollex, S. Wang, C. Falt, and N. Ben-Hamida, "A 22 GS/s 5 b ADC in 0.13 μm SiGe BiCMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2006, pp. 572–573.
- [9] H. Nosaka, M. Nakamura, M. Ida, K. Kurishima, T. Shibata, M. Tokumitsu, and M. Muraguchi, "A 24-Gsps 3-bit Nyquist ADC using InP HBTs for electronic dispersion compensation," in *IEEE MTT-S Int. Microwave Symp. Dig.*, 2004, pp. 101–104.
- [10] P. Schvan, J. Bach, C. Falt, P. Flemke, R. Gibbins, Y. Greshishchev, N. Ben-Hamida, D. Pollex, J. Sitch, S.-C. Wang, and J. Wolczanski, "A 24 GS/s 6 b ADC in 90 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2008, pp. 544–545.
- [11] E. Sackinger, *Broadband Circuits for Optical Fiber Communications*. New York: Wiley, 2005.
- [12] M. Shinagawa, Y. Akazawa, and T. Wakimoto, "Jitter analysis of high-speed sampling systems," *IEEE J. Solid-State Circuits*, vol. 25, no. 2, pp. 220–224, Feb. 1990.
- [13] H. Tran, F. Pera, D. S. McPherson, D. Viorel, and S. P. Voinigescu, "6-kΩ 43-Gb/s differential transimpedance-limiting amplifier with auto-zero feedback and high dynamic range," *IEEE J. Solid-State Circuits*, vol. 39, no. 10, pp. 1680–1689, Oct. 2004.
- [14] S. Shahramian, A. C. Carusone, and S. P. Voinigescu, "Design methodology for a 40-Gsamples/s track and hold amplifier in 0.18-μm SiGe BiCMOS technology," *IEEE J. Solid-State Circuits*, vol. 41, no. 10, pp. 2233–2240, Oct. 2006.
- [15] T. O. Dickson, E. Laskin, I. Khalid, R. Beerkens, J. Xie, B. Karajica, and S. P. Voinigescu, "An 80-Gb/s 2<sup>31</sup>-1 pseudorandom binary sequence generator in SiGe BiCMOS technology," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2735–2745, Dec. 2005.
- [16] M. Steyaert, R. Roovers, and J. Craninckx, "A 100 MHz 8 bit CMOS interpolating A/D converter," in *Proc. IEEE Custom Integrated Circuits Conf. (CICC)*, May 1993, pp. 28.1.1–28.1.4.
- [17] J. van Valburg and R. J. van de Plassche, "An 8-b 650-MHz folding ADC," *IEEE J. Solid-State Circuits*, vol. 27, no. 12, pp. 1662–1666, Dec. 1992.



**Shahriar Shahramian** (S'06) received the B.A.Sc. (Hons) degree in computer engineering from the University of Toronto, Toronto, ON, Canada, in 2003. He enrolled in the Master's program in electrical engineering at the University of Toronto in 2003 and transferred to the Ph.D. program in 2004. He is currently working toward the Ph.D. degree in the Department of Electrical and Computer Engineering, University of Toronto.

His research interests include the design of high-speed and mm-wave integrated circuits with a focus on high-speed A/D converters and DSP-based equalizers. He also completed an internship program at Alcatel-Lucent in 2008 where he worked on E-Band wireless transceivers.

Mr. Shahramian received the Aloha Award in recognition of his B.A.Sc. thesis. He was also the recipient of the Ontario Graduate Scholarship in 2003 and the University of Toronto Fellowship from 2004 to 2006. He won the Best Paper Award at the Compound Semiconductor IC Symposium in 2005. He also received six teaching awards between 2005 and 2008 at the University of Toronto.



**Sorin P. Voinigescu** (SM'02) received the M.Sc. degree in electronics from the Polytechnic Institute of Bucharest, Romania, in 1984, and the Ph.D. degree in electrical and computer engineering from the University of Toronto, Toronto, ON, Canada, in 1994.

From 1984 to 1991, he worked in R&D and academia in Bucharest, where he designed and lectured on microwave semiconductor devices and integrated circuits. Between 1994 and 2002, he was with Nortel Networks and Quake Technologies in Ottawa, Canada, where he was responsible for projects in high-frequency characterization and statistical scalable compact model development for Si, SiGe, and III-V devices. He later conducted research on wireless and optical fiber building blocks and transceivers in these technologies. In 2002, he joined the University of Toronto, where he is a full Professor. He has authored or coauthored over 100 refereed and invited technical papers spanning the simulation, modeling, design, and fabrication of high-frequency semiconductor devices and circuits. His research and teaching interests focus on nanoscale semiconductor devices and their application in integrated circuits at frequencies beyond 200 GHz.

Dr. Voinigescu received the Nortel President Award for Innovation in 1996 and is a member of the TPCs of the IEEE CSICS and BCTM. He is a co-recipient of the Best Paper Award at the 2001 IEEE CICC and at the 2005 IEEE CSICS, and of the Beatrice Winner Award at the 2008 IEEE ISSCC. His students have won Best Student Paper Awards at the 2004 IEEE VLSI Circuits Symposium, the 2006 SiRF Meeting, 2006 RFIC Symposium, and 2006 BCTM.



**Anthony Chan Carusone** (SM'08) received the B.A.Sc. and Ph.D. degrees from the University of Toronto, Toronto, ON, Canada, in 1997 and 2002, respectively, during which time he received the Governor-General's Silver Medal.

Since 2001, he has been with the Department of Electrical and Computer Engineering at the University of Toronto, where he is currently an Associate Professor. In 2008, he was a visiting researcher at the University of Pavia, Italy, and later at the Circuits Research Lab of Intel Corporation, Hillsboro, Oregon.

Prof. Chan Carusone was a coauthor of the best paper at the 2005 Compound Semiconductor Integrated Circuits Symposium and the best student papers at both the 2007 and 2008 Custom Integrated Circuits Conferences. He is an appointed member of the Administrative Committee of the IEEE Solid-State Circuits Society, a member and past chair of the Analog Signal Processing Technical Committee for the IEEE Circuits and Systems Society, and a member and past chair of the Wireline Communications subcommittee of the Custom Integrated Circuits Conference. He serves as a guest editor for both the IEEE JOURNAL OF SOLID-STATE CIRCUITS and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS. He is currently Editor-in-Chief of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS.