# 5–10 Gb/s 70 mW Burst Mode AC Coupled Receiver in 90-nm CMOS

Masum Hossain and Anthony Chan Carusone, Senior Member, IEEE

*Abstract*—A low-power burst mode receiver architecture is presented for AC coupled links in which low-frequency signal components are attenuated by the channel. The receiver combines a nonlinear path, a hysteresis latch that recovers the missing low-frequency content, with a linear path that boosts the high-frequency component by taking advantage of the high-pass channel response. By optimally combining the two paths, the front-end recovers NRZ signals up to 13 Gb/s while consuming only 26 mW in 90 nm CMOS. A low-power and area-efficient clock recovery scheme uses the linear path to injection lock an oscillator. A simple theory and a simulation technique for ILO-based receivers are discussed. The clock recovery technique is verified with experimental results at 5–10 Gb/s in 90 nm CMOS, consuming 70 mW and acquiring lock within 1.5 ns.

*Index Terms*—Burst mode, dicode channel, AC coupling, half-rate, injection locking.

## I. INTRODUCTION

HIGH-SPEED links with small AC coupling capacitances are increasing in importance. For example, wireless interconnects using either inductive or capacitive coupling between stacked dice can achieve high density [1], [2], [3]. These interconnects also introduce spectral nulls at DC. As a result, the receiver sees a stream of positive and negative pulses corresponding to the rising and falling edges of the transmitted data. Receivers capable of recovering NRZ signals from these narrow pulses are referred to in this work as AC coupled receivers. They are not to be confused with receivers for links with a relatively large DC blocking capacitor, where the received waveform still looks like an NRZ signal with some baseline wander [4]. Fabrication of such interconnects is challenging due to the required alignment and heat dissipation [5]. The focus of this work is to present power- and area-efficient I/O circuits that do not limit the interconnect density and that reduce the heat that must be dissipated. One possible implementation is shown in Fig. 1: a shared PLL can perform frequency acquisition globally [6], while skew compensation is done individually per link.

The present status of AC coupled receivers is summarized in Fig. 2, where the power efficiency is plotted for different bit rates. The primary focus of existing AC coupled receivers is NRZ signal recovery, and several front-ends for this purpose have been demonstrated with excellent power efficiency. However, when clock recovery is included, their power efficiency degrades significantly [1].

The authors are with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario M5S 3G4, Canada (e-mail: masum@eecg.utoronto.ca; tcc@eecg.utoronto.ca).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2009.2039535

Fig. 1. AC coupled pulse transceivers for high density I/Os.

Fig. 2. Summary of existing AC coupled pulse receivers.

Burst mode capability is also desirable for high-density AC coupled interconnects, as it reduces power and area consumption [7]. In [7], no timing recovery is performed; the receiver relies entirely on matching between the data and forwarded-clock paths. This requires very accurate matching of the interconnect and stacked-die coupling. On the other hand, the clock recovery techniques used in [1] and [2] are too slow to allow burst mode operation. A timing recovery scheme that can recover a clock within several bit periods is sought. Such fast locking techniques are already employed in other wireline applications such as passive optical network (PON) systems. For example, the 10 Gb/s receiver presented in [8] is AC coupled and also provides fast locking. However, the 2 W of power consumed in [8] is not acceptable for chip-to-chip applications.

Manuscript received June 20, 2009; revised October 29, 2009. Current version published February 24, 2010. This paper was approved by Associate Editor Jafar Savoj. This work was supported by Intel Corporation and Broadcom.

Fig. 3. Equivalent circuit of an AC coupled link including transmitter, channel and receiver.

In this work, we provide both NRZ recovery and fast locking with significantly reduced power consumption (70 mW). Compared to the previous work in [9], we have made several modifications: (i) We use a hysteresis latch with a variable decision threshold. This allows the receiver to be implemented with less preamp gain, which translates into a 20% power saving. (ii) The high-pass linear signal path introduced in [9] is further leveraged to facilitate clock recovery over a broad frequency range. (Clock recovery was not considered in [9].) It will be shown later that the use of this linear path results in improved jitter and lower power consumption compared with [1]. (iii) Unlike [8], [10], [11], we use a half-rate injection-locked oscillator (ILO) for clock recovery, which further reduces the power consumption. Half-rate injection locking has previously been used for DC coupled channels [12]; taking advantage of the high-pass channel response, it is adopted here for AC coupled receivers. In summary, the high-pass response of the AC coupled channel is exploited for both equalization and clock recovery.

In Section II we will first focus on the NRZ signal recovery circuitry including experimental results. In Section III, we will discuss burst mode timing recovery techniques where the linear path is used to extract timing information. Some theory required to evaluate such a CDR will be derived in Section IV. The implementation and experimental results are discussed in Section V.

## II. NRZ SIGNAL RECOVERY

NRZ signal recovery from AC coupled links is studied in, for example, [1], [2], [5]. A simple capacitively coupled channel is shown in Fig. 3. Since the coupling capacitances are small (on the order of 50–500 fF), the time constant RC is much less than a bit period and the channel response is approximately sRC over the band of interest. Hence, the transmitted NRZ signal is differentiated. On the receiver side, the channel's differentiation can be undone by comparing the received signal to a threshold that depends upon the last bit, similar to the peak detection previously employed in differentiating magnetic storage channels [13]. At the targeted data rates of 5–10 Gb/s, multiple lower-rate data streams can be aggregated onto each pair. The transmitter may employ either current-mode buffers or voltage-mode buffers as in [1].
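As a rough illustration of this differentiation, the sketch below passes ±1 NRZ data through an ideal first-order RC high-pass, discretized with backward Euler. The values RC = 5 ps and T_bit = 100 ps are illustrative assumptions rather than prototype values, and the model ignores pad capacitance and driver impedance. Each transition produces a pulse whose polarity follows the edge direction, and the pulse decays long before the end of the bit.

```python
import math

def nrz_levels(bits, samples_per_bit):
    """Map bits {0, 1} to NRZ levels {-1, +1}, held for samples_per_bit samples."""
    return [1.0 if b else -1.0 for b in bits for _ in range(samples_per_bit)]

def rc_highpass(x, rc, dt):
    """First-order high-pass y + RC*dy/dt = RC*dx/dt, backward-Euler discretized:
    y[k] = a * (y[k-1] + x[k] - x[k-1]),  a = RC / (RC + dt).
    """
    a = rc / (rc + dt)
    y, prev_x, prev_y = [], x[0], 0.0  # channel assumed settled before t = 0
    for xk in x:
        prev_y = a * (prev_y + xk - prev_x)
        prev_x = xk
        y.append(prev_y)
    return y

def hp_gain(f, rc):
    """|H(j 2 pi f)| for H(s) = sRC / (1 + sRC); close to 2*pi*f*RC when f << 1/(2 pi RC)."""
    w_rc = 2 * math.pi * f * rc
    return w_rc / math.sqrt(1 + w_rc ** 2)

T_BIT = 100e-12   # 10 Gb/s (illustrative)
RC = 5e-12        # assumed RC << T_bit
SPB = 50          # samples per bit
bits = [0, 1, 1, 0, 1, 0, 0, 0, 1]
v_rx = rc_highpass(nrz_levels(bits, SPB), RC, T_BIT / SPB)

# Sample just after each bit boundary: a large pulse on an edge, ~0 otherwise.
for i in range(1, len(bits)):
    print(bits[i - 1], "->", bits[i], f"{v_rx[i * SPB]:+.3f}")
```

The last line of the model, `hp_gain`, makes the "approximately sRC" statement concrete: far below the RC corner, the magnitude is essentially 2πfRC.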

#### A. Hysteresis Latch (Nonlinear Path)

The functionality described above is equivalent to decision feedback equalization (DFE) [13]. Unlike a conventional DFE, it can be implemented without any clock using a simple hysteresis latch, as shown in Fig. 4(a). The decision threshold is updated based on the most recently decoded bit, $v_{th}(n) = \beta V(n)$. Note that this feedback circuit has only two stable operating points, at $+V_o$ and $-V_o$. The output is forced high (to $+V_o$) whenever $A_v v_{in}$ exceeds $\beta V_o$, and it is forced low (to $-V_o$) whenever $A_v v_{in}$ falls below $-\beta V_o$. When $-\beta V_o < A_v v_{in} < +\beta V_o$, the output holds its previous state.
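The latch behavior described above can be sketched as a clockless comparator whose threshold sign follows its own output. This is a behavioral model only: it assumes an ideal RC high-pass channel with illustrative RC = 5 ps and T_bit = 100 ps, an arbitrary βV_o = 0.5 with A_v = 1, and it ignores the finite settling times discussed below.

```python
def received_pulses(bits, rc, t_bit, spb):
    """+/-1 NRZ through an ideal first-order RC high-pass (backward Euler)."""
    a = rc / (rc + t_bit / spb)
    x = [1.0 if b else -1.0 for b in bits for _ in range(spb)]
    y, prev_x, prev_y = [], x[0], 0.0
    for xk in x:
        prev_y = a * (prev_y + xk - prev_x)
        prev_x = xk
        y.append(prev_y)
    return y

def hysteresis_latch(v_in, a_v, beta_v_o, v_o=1.0, initial_sign=-1.0):
    """Clockless DFE: go to +V_o above +beta*V_o, to -V_o below -beta*V_o,
    otherwise hold the previous decision."""
    state, out = initial_sign * v_o, []
    for v in v_in:
        if a_v * v > beta_v_o:
            state = v_o
        elif a_v * v < -beta_v_o:
            state = -v_o
        out.append(state)  # inside the hysteresis window the state is held
    return out

SPB = 50
bits = [0, 1, 1, 0, 1, 0, 0, 0, 1, 1]
v_rx = received_pulses(bits, rc=5e-12, t_bit=100e-12, spb=SPB)
v_out = hysteresis_latch(v_rx, a_v=1.0, beta_v_o=0.5)  # initial state matches bits[0] = 0
recovered = [1 if v_out[i * SPB + SPB // 2] > 0 else 0 for i in range(len(bits))]
print(recovered)  # → [0, 1, 1, 0, 1, 0, 0, 0, 1, 1]
```

Runs of identical bits produce no pulses, so the latch simply holds its state; that held state is what restores the DC content removed by the channel.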

The circuit implementation is shown in Fig. 4(b) [9]. Here  $A_v = g_{m-\mathrm{in}}R_{\mathrm{in}}$  and the feedback gain  $\beta$  is equal to  $g_{m2}R_{\mathrm{in}}$ . To provide hysteresis, the small signal loop gain  $g_{m2}R_{\mathrm{in}}g_{m3}R_{L1}$  should be greater than 1. Since the feedback is positive, the output amplitude increases exponentially until it completely switches the differential pairs  $g_{m2}$  and  $g_{m3}$  at which point the feedback signal is  $\beta V_o = I_{\mathrm{tail}}R_{\mathrm{in}}$ . Hence, the input referred threshold of the latch is

$$|V_{\text{in-th}}| = \frac{\beta V_o}{A_v} = \frac{I_{\text{tail}}}{g_{m\text{-in}}}. \tag{1}$$

Note that by adjusting the tail current sources ($I_{\text{tail}}$), the decision threshold levels can be adjusted according to the input signal swing. The transient behavior of the decision threshold and output voltage is shown in Fig. 5(a) for two settings of $I_{\text{tail}}$. Two settling times are associated with this circuit: the threshold settling time, $T_{V_{\text{th}}}$, and the output settling time, $T_{V_o}$. To enable a fast response, the capacitive loading at the internal nodes is minimized by using $g_{m3}$ as an output buffer.
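Equation (1) and the loop-gain condition can be checked numerically. The device values below (g_m-in = 10 mS, g_m2 = g_m3 = 5 mS, R_in = R_L1 = 400 Ω, I_tail from 100 µA to 400 µA) are hypothetical round numbers chosen only to illustrate the arithmetic. They are not taken from the prototype. With g_m-in held fixed, the threshold scales linearly with I_tail.

```python
def latch_threshold(i_tail, gm_in):
    """Input-referred threshold |V_in-th| = beta*V_o / A_v = I_tail / g_m-in, from (1)."""
    return i_tail / gm_in

def loop_gain(gm2, r_in, gm3, r_l1):
    """Small-signal positive-feedback loop gain g_m2*R_in*g_m3*R_L1."""
    return gm2 * r_in * gm3 * r_l1

GM_IN = 10e-3  # hypothetical input-pair transconductance (S)
for i_tail in (100e-6, 200e-6, 400e-6):
    v_th = latch_threshold(i_tail, GM_IN)
    print(f"I_tail = {i_tail * 1e6:.0f} uA -> |V_in-th| = {v_th * 1e3:.0f} mV")

# Hysteresis requires a loop gain above 1.
print(loop_gain(5e-3, 400.0, 5e-3, 400.0) > 1.0)   # 2 * 2 = 4
print(loop_gain(1e-3, 400.0, 5e-3, 400.0) > 1.0)   # 0.4 * 2 = 0.8
```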



Fig. 4. Hysteresis latch topology for NRZ recovery: (a) NRZ recovery with DFE; (b) proposed Implementation.

For larger threshold voltages, the signal takes longer to switch from $-V_{\rm th}$ to $+V_{\rm th}$. Note that as $I_{\rm tail}$ is increased, the threshold voltage increases linearly $(|V_{\rm th}| \propto I_{\rm tail})$ but the transconductance increases more slowly $(g_{m2} \propto \sqrt{I_{\text{tail}}})$, and as a result the settling time also increases. This effect can be mitigated by simultaneously changing the tail currents of both $g_{m2}$ and $g_{m3}$. The measured threshold voltage and settling time $(T_{V_{th}})$ are shown in Fig. 5(b) as functions of the tail current. In chip-to-chip burst mode applications as described in [7], the threshold voltage can be adapted to the channel using offline calibration. The threshold settling time $T_{V_{\rm th}}$ varies from 60 ps to 80 ps for different threshold voltages, which suggests a maximum achievable data rate of 13 Gb/s. However, the output settling time $T_{V_{o}}$ is longer (120 ps), which would limit the achievable data rate to 8 Gb/s. Existing techniques use cascode transistors and inductive peaking to achieve 10 Gb/s [8]; these techniques cannot be used in this application, where area and power efficiency are critical. Thus, the hysteresis latch loses some of the signal's high-frequency content while restoring the DC component of the received signal. Fortunately, this high-frequency content can be recovered by adding a parallel linear path, as shown in Fig. 6.
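The data-rate limits quoted above follow from requiring each settling time to fit within one bit period, so that the maximum rate is roughly 1/T_settle. A minimal check of that arithmetic:

```python
def max_bit_rate(t_settle):
    """Bit rate (b/s) at which one settling time just fills one bit period."""
    return 1.0 / t_settle

SETTLING_TIMES = {
    "T_Vth (fastest)": 60e-12,
    "T_Vth (slowest)": 80e-12,
    "T_Vo": 120e-12,
}
for name, t in SETTLING_TIMES.items():
    print(f"{name}: {t * 1e12:.0f} ps -> {max_bit_rate(t) / 1e9:.1f} Gb/s")
```

The slowest threshold settling (80 ps) gives 12.5 Gb/s, which the text rounds to 13 Gb/s. The 120 ps output settling gives about 8.3 Gb/s.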

#### B. Linear Path

A linear path is added in parallel to the hysteresis latch using a broadband amplifier that has the same circuit topology as the hysteresis circuit. Swapping the feedback nodes makes the feedback negative instead of positive, which improves the bandwidth of the linear path [14]. The transfer function of this amplifier is second order, and the feedback gain $g_{mf}R_{in}$ is chosen to provide a maximally flat frequency response, as evidenced by the measured frequency response of the preamp and linear amplifier shown in Fig. 7. Since the AC coupled channel response is inherently high pass, the linear path can provide 14 dB of boost at 8 GHz with respect to the low-frequency (< 200 MHz) gain (Fig. 7). In summary, the linear path provides better signal integrity at high frequency (e.g., alternating 1s and 0s), whereas the nonlinear path provides better signal integrity at low frequency (e.g., long runs of consecutive 1s or 0s). Combining them improves the overall signal integrity at both low and high frequencies, much like an equalizer.

Fig. 5. Threshold adjustment with  $I_{tail}$  in the proposed hysteresis latch: (a) settling time of the decision threshold and output voltage; (b) variation of threshold voltage and settling time as a function of tail current.



Fig. 6. Dual path receiver architecture with linear amplifier and analog adder.



Fig. 7. Measured linear path response with and without the AC coupled channel.



Fig. 8. Die photo of the dual path receiver in 90 nm CMOS.

#### C. Experimental Results

A prototype front-end is implemented in 90-nm CMOS; the die photo is shown in Fig. 8. The benefit of the linear path is shown in measurements at 10 Gb/s in Fig. 9. The linear path bandwidth is set according to the data rate. The relative weight of the linear and nonlinear paths in this prototype is set manually to maximize the eye opening. For verification of error-free recovery, the transmitted sequence and the recovered equalized sequence are captured in Fig. 10. Note the bits highlighted by the arrows, where the linear path restores bits that would otherwise be missed by the hysteresis latch alone. The maximum achievable data rate of 13 Gb/s is limited by the hysteresis settling time of 80 ps.

Fig. 9. The effect of the linear path on a recovered 10 Gb/s eye diagram for  $2^7 - 1$  data: (a) without linear path; (b) with linear path.

Fig. 10. Transmitted and recovered  $2^7 - 1$  sequence at 12 Gb/s with and without equalization captured by a pattern-locked oscilloscope. Arrows on the top indicate errors in the unequalized pattern which are corrected by equalization with the linear path.

## III. TIMING RECOVERY

NRZ signals have a spectral null at $1/T_{bit}$. To extract a clock tone, the NRZ signal is passed through a nonlinearity. The resulting clock tone may then be isolated with a bandpass filter for timing recovery. This method of clock recovery is known as 'the nonlinear spectral line method'. Traditionally, an off-chip high-Q dielectric resonator is used as the bandpass filter to eliminate high-frequency (pattern-dependent) jitter in the extracted tone [15].

AC coupled channels provide a distinct advantage for burst mode clock recovery; the channel response itself filters high frequency jitter [16]. Thus, the CDR jitter tracking bandwidth can be extended to accommodate fast locking without a significant jitter penalty. However, the relatively slow settling time of the hysteresis latch output reintroduces some pattern-dependent high-frequency jitter. Hence, to take advantage of the channel response the hysteresis latch output should not be used for timing recovery. Previous methods for clock recovery in AC coupled links are shown in Fig. 11. Fortunately in this work, the high-frequency content in the linear path, already being used for equalization, can also be used for clock recovery, as shown in Fig. 12. That signal is passed through a nonlinearity and an integrated injection locked oscillator with moderate Q is used in place of an off-chip bandpass filter.

Fig. 11. AC coupled pulse receivers: (a) as in [1]; (b) as in [8].

Fig. 13(a) shows the general implementation of the nonlinear spectral line method. An analog multiplier provides the required nonlinearity; three possible implementations are shown in Fig. 13(b)–(d). The first method uses a delay element and an XOR gate [12], [10] [Fig. 13(b)]. To support 5–10 Gb/s operation, this technique requires a broadband tunable delay element that provides a delay of $T_{\rm bit}/2$. In the second method, the linear path output is rectified by multiplying it with the recovered NRZ signal [Fig. 13(c)]. Finally, in the third method, the output of the linear path is squared to extract the clock tone [Fig. 13(d)]. To compare their performance, we consider simulated waveforms at $V_{\rm NRZ}$ and $V_{\rm Slope}$. All three methods generate a clock tone at 10 GHz, but their jitter performance varies significantly, as shown in the eye diagrams of Fig. 14. To obtain more insight, the jitter spectrum is also plotted. The peak-to-peak pattern-dependent jitter of the first and second methods is 12 ps and 14 ps, respectively. Significant portions of this jitter are due to ISI on $V_{\rm NRZ}$, mostly introduced by the finite settling time of the hysteresis latch. These ISI components appear as periodic jitter at multiples of 0.5/(pattern length), resulting in spurs in the jitter spectra of Fig. 14. Since the ILO JTB is on the order of hundreds of MHz, this jitter cannot be filtered. The peak-to-peak jitter of the third method, employed in this work, is $2.5 \times$ lower than that of the other techniques because it uses only the waveform $V_{\text{Slope}}$, which has less ISI.
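The spectral-line argument can be checked with a single DFT bin. In the minimal sketch below, the linear-path output V_Slope is idealized as the circular first difference of oversampled ±1 NRZ, which gives an impulse at every edge. This idealization, the 256-bit random pattern and the 8 samples per bit are assumptions, and no ISI or jitter is modeled. The NRZ waveform has exactly zero content at 1/T_bit, while its squared slope has a spectral line there whose amplitude is four times the number of transitions, because every squared edge pulse adds in phase at that frequency.

```python
import cmath
import random

def dft_bin(x, k):
    """Single DFT coefficient X[k] = sum_n x[n] * exp(-j 2 pi k n / N)."""
    n_total = len(x)
    return sum(v * cmath.exp(-2j * cmath.pi * k * n / n_total) for n, v in enumerate(x))

random.seed(1)
N_BITS, SPB = 256, 8                      # pattern length and samples per bit (assumed)
bits = [random.randint(0, 1) for _ in range(N_BITS)]
nrz = [1.0 if b else -1.0 for b in bits for _ in range(SPB)]

# Idealized V_Slope: circular first difference, i.e. +/-2 at each edge, 0 elsewhere.
slope = [nrz[n] - nrz[n - 1] for n in range(len(nrz))]   # nrz[-1] wraps around
squared = [s * s for s in slope]                          # the squaring nonlinearity

k_clk = N_BITS   # DFT bin at exactly f = 1/T_bit
nrz_line = abs(dft_bin(nrz, k_clk))
sq_line = abs(dft_bin(squared, k_clk))
transitions = sum(1 for i in range(N_BITS) if bits[i] != bits[i - 1])

print(f"|X_nrz(1/T_bit)|     = {nrz_line:.1e}")
print(f"|X_squared(1/T_bit)| = {sq_line:.1f} (= 4 x {transitions} transitions)")
```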

The implemented squaring circuit is shown in Fig. 15. The extracted clock amplitude sets the injection strength of the ILO. Resistor $R_{\rm com}$ shifts the common-mode level, making it compatible with the injection inputs of the ILO. The extracted clock tone is captured on-die with an oscilloscope at 5 Gb/s and shown in Fig. 16.

Fig. 12. Proposed dual path AC coupled pulse receiver with clock recovery using linear path.

Fig. 13. Clock recovery with nonlinear spectral line method: (a) general block diagram; (b) as in [10]; (c) as in [17]; (d) this work.

## IV. CLOCK RECOVERY USING ILO

Injection locking is functionally equivalent to a large-bandwidth PLL but can be implemented with smaller area and lower power. Existing ILO-based CDRs use T-FFs [12] and LC VCOs [10], [17]. Alternatively, ring oscillators can be used for injection locking; they are compared to LC oscillators in this work. The locking behavior of an ILO was studied in [18] for small injection strengths. For LC oscillators, this study was extended to large injection strengths in [19]. In this work, our focus is the transient locking behavior for both small and large injection strengths, while keeping the VCO topology general as in [20].

The general ILO model shown in Fig. 17 is adopted from [21], [19]. The phasor diagram in Fig. 17 is drawn with respect to the injected frequency, $\omega_{inj}$. Let the ILO's instantaneous oscillation frequency be $\omega$. Thus, the oscillator output phasor $I_{osc} = |I_{osc}|e^{j\theta}$ rotates with an instantaneous angular frequency $\omega - \omega_{inj}$. Let $\omega_0$ be the ILO free-running frequency (i.e., the frequency at which it oscillates with no injection) and $\Delta \omega = \omega_0 - \omega_{inj}$ the inherent frequency difference. The phasor $I_L$ is the vector sum of $I_{inj}$ and $I_{osc}$: $I_L = I_{osc} + I_{inj} = |I_L|e^{j(\theta-\phi)}$, where $\phi = \angle H_{VCO}$ is the phase response of the ILO. Finally, let K be the amplitude of the injecting signal normalized to that of the ILO output, $K = |I_{inj}|/|I_{osc}|$. To study the phase tracking of the ILO, we derive the transient phase response of the VCO output starting at time t = 0 with an arbitrary phase difference $\theta_0$. It is shown in the Appendix that this phase difference decreases exponentially to zero and can be expressed as

$$\theta(t) \approx \theta_0 e^{-t/\tau}. \tag{2}$$

For small injection strengths K < 1, time constant  $\tau$  is

$$\tau = 1 / \left( \sqrt{\frac{K^2}{A^2} - \Delta \omega^2} \right). \tag{3}$$

For large injection strengths, $K \ge 1$, $\tau$ is

$$\tau = 1/\left(\sqrt{\frac{1}{A^2} - \Delta\omega^2}\right).\tag{4}$$

The constant A captures the VCO topology's effect on  $\tau$ .

$$A \approx -\left. \frac{d \tan \phi}{d\omega} \right|_{\omega=\omega_0}. \tag{5}$$

Fig. 14. Simulation results comparing different timing recovery schemes in both time and frequency domain at 10 Gb/s.

For the parallel *RLC* resonant tank, it was shown in [18], [19] that $A = 2Q/\omega_0$, where Q is the quality factor of the tank circuit. Similarly, for ring oscillators it is shown in [20] that $A \cong \frac{n}{2\omega_0}\sin\frac{2\pi}{n}$, where n is the number of delay stages in the ring.
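Equations (3)–(5) and the two expressions for A can be combined into a small calculator for the locking time constant. The 5 GHz, n = 4 ring and the 10 GHz, Q = 3.5 tank match the operating points compared in Fig. 18; K = 0.1 and the detuning values are illustrative. The function returns None when |Δω| ≥ K/A (or 1/A for K ≥ 1), where (3) and (4) have no real solution, i.e., outside the lock range.

```python
import math

def a_lc(q, w0):
    """A = 2Q/w0 for a parallel RLC tank [18], [19]."""
    return 2 * q / w0

def a_ring(n, w0):
    """A ~= n/(2 w0) * sin(2 pi/n) for an n-stage ring [20]."""
    return n / (2 * w0) * math.sin(2 * math.pi / n)

def ilo_time_constant(k, a, dw=0.0):
    """tau from (3) (K < 1) or (4) (K >= 1); None outside the lock range."""
    k_eff = min(k, 1.0)          # (4) is (3) evaluated with K = 1
    radicand = (k_eff / a) ** 2 - dw ** 2
    if radicand <= 0:
        return None              # |dw| >= K/A: no real decay rate, the ILO cannot lock
    return 1.0 / math.sqrt(radicand)

W_RING = 2 * math.pi * 5e9
W_LC = 2 * math.pi * 10e9
A_RING = a_ring(4, W_RING)
A_LC = a_lc(3.5, W_LC)

for name, a in (("ring, n=4, 5 GHz", A_RING), ("LC, Q=3.5, 10 GHz", A_LC)):
    tau = ilo_time_constant(0.1, a)
    print(f"{name}: A = {a:.3e} s, tau(K=0.1) = {tau * 1e12:.0f} ps")
```

These first-order values are only indicative. Fig. 18(a) reports transistor-level lock times, which need not reproduce the ratio of the two time constants printed here.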

Note that the single-time-constant response in (2) implies a first-order low-pass transfer function. Thus, when the input injecting clock is phase modulated at a modulation frequency $\omega_{\text{jitter}}$, the resulting output phase modulation is related to that of the input by a stable first-order low-pass input jitter transfer function

$$\text{JTF}_{\text{INPUT}}(\omega_{\text{jitter}}) = \frac{1}{1 + j\omega_{\text{jitter}}/\omega_P} \tag{6}$$

where  $\omega_P = 1/\tau$  is also known as the jitter tracking bandwidth (JTB). Jitter tolerance ( $J_{\text{TOL}}$ ) can be obtained from the jitter transfer function as described in [22, p. 330].

$$\begin{aligned} J_{\text{TOL}}(\omega_{\text{jitter}}) &= \left| \frac{0.5}{1 - \text{JTF}_{\text{INPUT}}(\omega_{\text{jitter}})} \right| \\ &= \left| 0.5 \frac{1 + j\omega_{\text{jitter}}/\omega_P}{j\omega_{\text{jitter}}/\omega_P} \right|. \end{aligned} \tag{7}$$
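Equations (6) and (7) are easy to evaluate directly. In the sketch below, the JTB f_P = ω_P/2π = 300 MHz is a hypothetical value in the hundreds-of-MHz range mentioned in Section III; only the ratio f/f_P matters. Jitter tolerance approaches 0.5 UI well above the JTB and grows roughly as 0.5·f_P/f below it.

```python
import math

def jtf_input(f_jit, f_p):
    """First-order low-pass input jitter transfer (6); f/f_P equals omega/omega_P."""
    return 1 / (1 + 1j * f_jit / f_p)

def jtol_ui(f_jit, f_p):
    """Jitter tolerance (7) in UI: |0.5 / (1 - JTF_INPUT)|."""
    return abs(0.5 / (1 - jtf_input(f_jit, f_p)))

F_P = 300e6  # hypothetical JTB
for f in (3e6, 30e6, 300e6, 3e9, 30e9):
    print(f"{f / 1e6:>8.0f} MHz: |JTF| = {abs(jtf_input(f, F_P)):.3f}, "
          f"J_TOL = {jtol_ui(f, F_P):.2f} UI")
```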

TABLE I SUMMARY OF ILO BASED CDR PARAMETERS

| Parameter | Expression |
|---|---|
| Jitter tracking bandwidth ($\omega_P$) | $\sqrt{\frac{K^2}{A^2} - \Delta \omega^2}$ |
| Phase step response ($\theta(t)$) | $1 - \theta_0 e^{-\omega_P t}$ |
| Jitter transfer function ($JTF_{INPUT}$) | $\frac{1}{1+j\omega_{jitter}/\omega_P}$ |
| Jitter tolerance ($J_{TOL}$) | $\mid 0.5 \frac{1+j\omega_{jitter}/\omega_P}{j\omega_{jitter}/\omega_P}\mid$ |
| Phase noise ($S_{out}$) | $\frac{\omega_P^2 S_{inj} + \omega_{jitter}^2 S_{ILO}}{\omega_P^2 + \omega_{jitter}^2}$ |

The above first-order expressions presume a high transition density in the incoming data, such as can be provided by a line code. When the injection signal is derived from random data, the ILO drifts towards its natural frequency of oscillation in the absence of data transitions and thus accumulates jitter. On the arrival of the next transition, the VCO frequency and phase are pulled back to the injected frequency and phase. In the presence of L consecutive identical digits (CIDs), the effective injection strength is reduced by a factor of L, with an attendant shift in the pole of $JTF_{INPUT}$:

$$\omega_P = \sqrt{\frac{K^2}{L^2 A^2} - \Delta \omega^2}. \tag{8}$$
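Equation (8) implies a maximum run length: once K/(LA) falls to |Δω|, the pole is no longer real and the first-order model predicts loss of lock during the run. The sketch below assumes the n = 4, 5 GHz ring value A = 2/ω_0 from the ring expression for A, K = 0.1 and a hypothetical residual detuning of 2π × 12 MHz. None of these are measured values.

```python
import math

W0 = 2 * math.pi * 5e9
A = 2 / W0                   # n = 4 ring: A = n/(2 w0) * sin(2 pi/n) = 2/w0
K = 0.1                      # illustrative injection strength
DW = 2 * math.pi * 12e6      # hypothetical free-running detuning (rad/s)

def jtb_with_cids(k, a, dw, run_length):
    """omega_P from (8); None once K/(L*A) <= |dw| (no real pole)."""
    radicand = (k / (run_length * a)) ** 2 - dw ** 2
    return math.sqrt(radicand) if radicand > 0 else None

def max_run_length(k, a, dw):
    """Largest integer L with K/(L*A) > |dw|, i.e. L < K/(A*|dw|)."""
    if dw == 0:
        return math.inf
    return math.ceil(k / (a * abs(dw))) - 1

for run in (1, 2, 5, 10, 20, 21):
    w_p = jtb_with_cids(K, A, DW, run)
    print(run, "unlocked" if w_p is None else f"{w_p / (2 * math.pi) / 1e6:.0f} MHz")
print("max run length:", max_run_length(K, A, DW))
```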

This reduction in JTB due to CIDs has been verified experimentally. The ILO theory presented in this section is summarized in Table I; it can be applied to any VCO by deriving the appropriate parameter A.

## V. CLOCK RECOVERY IMPLEMENTATION

#### A. LC Versus Ring ILO

The expressions in Table I show how lock time, jitter tracking and jitter tolerance can be traded off against each other. If we define the effective Q of a ring oscillator as $Q = (\omega_0/2) d\phi/d\omega|_{\omega=\omega_0}$ [23], we see that for a fixed resonant frequency $\omega_0$, the effective Q of a ring is proportional to $n\sin(2\pi/n)$ [23]. This explains why more stages make a ring more frequency-stable and, like a high-Q LC oscillator, slower to track phase steps in an injecting input, giving it a lower JTB. In general, increasing the injection strength K results in a faster phase response and hence a higher jitter tracking bandwidth. Thus, to improve the lock time of an injection-locked LC oscillator, one can either increase the injection strength K or reduce Q. Both approaches result in increased power consumption. Furthermore, providing the tuning range required to support 5–10 Gb/s operation is very difficult with an LC oscillator.
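Combining the ring expression for A with Q = (ω_0/2) dφ/dω, and using |A| ≈ |dφ/dω| near ω_0 (where φ ≈ 0), gives Q ≈ ω_0 A/2 = (n/4) sin(2π/n). Applying the same step to the tank value A = 2Q/ω_0 returns the tank Q, which serves as a consistency check. The sketch tabulates this effective Q for a few n. Under this model, Q rises with n but saturates at π/2, since n sin(2π/n) approaches 2π.

```python
import math

def ring_effective_q(n):
    """Q ~= (w0/2) * A with A = n/(2 w0) * sin(2 pi/n), i.e. Q = (n/4) * sin(2 pi/n)."""
    return n / 4 * math.sin(2 * math.pi / n)

def lc_q_from_a(a, w0):
    """The same Q = (w0/2) * A relation applied to an LC tank's A = 2Q/w0."""
    return w0 * a / 2

for n in (3, 4, 5, 6, 8, 16):
    print(f"n = {n:2d}: effective Q = {ring_effective_q(n):.3f}")
```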



Fig. 15. Schematic of the Gilbert multiplier used for clock extraction.



Fig. 16. Recovered NRZ signal and corresponding extracted clock tone at 5 Gb/s.



Fig. 17. Injection locked oscillator model and corresponding phasor diagram.

Ring oscillators can provide a wider tuning range and faster locking than their *LC* counterparts, but they also have worse phase noise and higher power consumption in the 5–10 GHz range. Since the CDR will be designed with a large JTB, a significant portion of the oscillator phase noise can be filtered, but the power consumption remains a problem. To help overcome this, a half-rate architecture is adopted in which a 5 GHz oscillator is injected with a 10 GHz recovered clock tone. Thus, the ILO works as an injection-locked divider, which can be used to directly demultiplex the recovered NRZ data. The theory derived above for ILOs injected near their fundamental frequency still applies to half-rate injection, with one exception: the output-referred lock range is divided by two. A critical parameter for the ring ILO is the number of stages n. Increasing the number of stages results in higher power consumption and longer lock time. The performance of this ring-oscillator-based half-rate scheme is compared to LC-ILO-based full-rate clock recovery in Fig. 18. For the same injection strength, K = 0.1, a 5 GHz 4-stage ring ILO locks $2.5 \times$ faster than a 10 GHz (Q = 3.5) LC ILO [Fig. 18(a)]. The phase noise of the ILO is shaped by a first-order high-pass transfer function

$$\text{JTF}_{\text{ILO}}(\omega_{\text{jitter}}) = \frac{j\omega_{\text{jitter}}/\omega_P}{1 + j\omega_{\text{jitter}}/\omega_P}. \tag{9}$$

If  $S_{ILO}$  is the ILO phase noise and  $S_{inj}$  is the phase noise of the injected clock, then the phase noise of the recovered clock can be expressed as

$$S_{\text{out}}(\omega_{\text{jitter}}) = |\text{JTF}_{\text{INPUT}}(\omega_{\text{jitter}})|^2 S_{\text{inj}}(\omega_{\text{jitter}}) + |\text{JTF}_{\text{ILO}}(\omega_{\text{jitter}})|^2 S_{\text{ILO}}(\omega_{\text{jitter}}). \tag{10}$$

Using transfer functions from (6) and (9), (10) can be written as

$$S_{\rm out}(\omega_{\rm jitter}) = \frac{\omega_P^2 S_{\rm inj}(\omega_{\rm jitter}) + \omega_{\rm jitter}^2 S_{\rm ILO}(\omega_{\rm jitter})}{\omega_P^2 + \omega_{\rm jitter}^2}. \tag{11}$$
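A minimal numeric version of (11) follows. It assumes a flat injected-clock noise floor of −120 dBc/Hz and a free-running ring ILO whose phase noise is −80 dBc/Hz at 1 MHz offset and falls at 20 dB/decade; both spectra and the 300 MHz JTB are hypothetical. The weighting in (11) shifts from S_inj to S_ILO as the offset frequency passes f_P, and at f = f_P the output is the average of the two spectra. The last line applies the standard rule that ideal frequency division by N lowers phase noise by 20 log10 N, about 6 dB for N = 2.

```python
import math

F_P = 300e6  # hypothetical JTB, omega_P / (2 pi)

def s_inj(f):
    """Injected-clock phase noise: flat -120 dBc/Hz (assumed)."""
    return 10 ** (-120 / 10)

def s_ilo(f):
    """Free-running ILO phase noise: -80 dBc/Hz at 1 MHz, -20 dB/decade (assumed)."""
    return 10 ** (-80 / 10) * (1e6 / f) ** 2

def s_out(f, f_p=F_P):
    """Recovered-clock phase noise from (11), in linear units."""
    return (f_p ** 2 * s_inj(f) + f ** 2 * s_ilo(f)) / (f_p ** 2 + f ** 2)

def db(x):
    return 10 * math.log10(x)

for f in (1e6, 10e6, 100e6, 300e6, 1e9, 3e9):
    print(f"{f / 1e6:>6.0f} MHz: S_ILO = {db(s_ilo(f)):7.1f}, "
          f"S_out = {db(s_out(f)):7.1f} dBc/Hz")

print(f"divide-by-2 phase-noise change: {20 * math.log10(1 / 2):.2f} dB")
```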

Equation (11) is validated by simulations in Fig. 18(b). The phase noise of two different free-running ILOs (one ring and one LC), $S_{\rm ILO}$, and of a much quieter injecting clock, $S_{\rm inj}$, are obtained from transistor-level simulations. These are substituted into (11) to obtain the dashed-line predictions of the phase noise under injection locking in Fig. 18(b). Under the same conditions, transistor-level simulation results are plotted with solid lines in Fig. 18(b) and match the dashed-line theory very well. Even with moderate injection strength, most of the ILO's inherent phase noise is filtered, and its effect on the recovered clock is insignificant except at very high offset frequencies. The phase noise of the recovered clock is 6 dB lower than that of the injected reference because the ILO divides the injecting clock by 2. Finally, the ring oscillator has a larger JTB and, hence, is more tolerant to jitter when injection-locked to the received data [Fig. 18(c)].

Fig. 18. Comparison of ILO performance for *LC* vs ring VCO. The *LC* VCO is operating at 10 GHz, Q = 3.5. The ring oscillator is operating at 5 GHz with n = 4 stages: (a) lock time as a function of injection strength; (b) phase noise of the free running VCO and corresponding recovered clock; (c) jitter tolerance with a frequency offset,  $\Delta \omega = 2\pi \times 10^6$ .

#### B. ILO Design and Implementation

A 4-stage ring oscillator is designed with stages 1 and 3 used for injection and stages 2 and 4 used to tune the free-running oscillation frequency (Fig. 19). The differentially tunable delay stage results in less amplitude variation over the 2 to 6 GHz frequency range than single-ended tuning. Transmitted NRZ data, the extracted clock tone and the corresponding locked clock phases are shown in Fig. 19. The 4-stage ring provides an in-phase clock (Phase 0°) locked to the data edges and a quadrature clock (Phase 90°) to sample the centre of the data eye. The half-rate receiver directly demultiplexes 10 Gb/s NRZ data (Fig. 20).
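In a differential ring of n identical stages, one period spans 2n stage delays, so successive stage outputs are spaced by 180°/n. For n = 4 that is 45°, and outputs two stages apart are 90° apart. With a half-rate clock, one period covers 2 UI, so a 90° offset corresponds to 0.5 UI, i.e., the eye centre when the 0° phase is aligned to the data edges. The sketch below only illustrates this arithmetic and makes no claim about which taps the prototype uses.

```python
def ring_stage_phases(n):
    """Phase (degrees) of each stage output in an n-stage differential ring.

    The oscillation period equals 2*n stage delays, so each stage adds 180/n degrees.
    """
    return [i * 180.0 / n for i in range(n)]

def phase_to_ui(phase_deg, ui_per_period=2.0):
    """Convert a clock phase to unit intervals; a half-rate clock period spans 2 UI."""
    return phase_deg / 360.0 * ui_per_period

phases = ring_stage_phases(4)
print(phases)                               # → [0.0, 45.0, 90.0, 135.0]
print(phase_to_ui(phases[2] - phases[0]))   # → 0.5
```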

#### C. ILO Non-Ideality

Although injection locking provides area and power efficiency, it suffers from several limitations. Firstly, although the oscillator output $V_{\rm ILO}$ is phase locked to the equalizer output $V_{\rm EQ}$ by injection locking, the actual sampling phase suffers additional delay through the clock buffers preceding the sampling FFs. This additional delay mismatch causes a static phase offset, $\theta_{\rm error} = \Delta T_{\rm clk} - \Delta T_{\rm data}$, which is not corrected since it is outside the phase tracking loop. As a result, jitter tolerance outside the tracking bandwidth is degraded:

$$J_{\rm TOL}(\omega_{\rm jitter}) = \left| \frac{0.5}{1 - \text{JTF}_{\rm INPUT}(\omega_{\rm jitter})} \right| - \theta_{\rm error}. \tag{12}$$

Identical buffer stages were used in the clock and data paths to match the delay through those paths as closely as possible. To compensate for any remaining skew, the free-running VCO frequency is detuned away from the injection frequency, which introduces a steady-state phase shift $\theta_{ss} \approx \sin^{-1}(A\Delta\omega/K)$. Doing so does reduce the JTB, but as long as the delay mismatch is kept below 0.1 UI by careful layout, only a small frequency offset is required and the change in JTB is negligible. In this prototype, under a locked condition, the ILO frequency was manually tuned to maximize timing margin. In a practical implementation, this tuning can be automated as in, for example, [24] using an on-die oscilloscope. It can be performed once during calibration and then turned off during normal operation to avoid any significant power and area overhead.
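The claim that a small detuning barely changes the JTB can be made quantitative with (3): since sin θ_ss = AΔω/K, the JTB becomes (K/A) cos θ_ss. The sketch assumes the n = 4, 5 GHz ring (A = 2/ω_0) and K = 0.1. It also assumes that a 0.1 UI skew at 10 Gb/s corresponds to 18° of the 5 GHz half-rate clock (one clock period spans 2 UI); this phase-referencing convention for half-rate injection is an assumption.

```python
import math

W0 = 2 * math.pi * 5e9   # 5 GHz half-rate ring
A = 2 / W0               # n = 4 ring: A = n/(2 w0) * sin(2 pi/n)
K = 0.1                  # illustrative injection strength

def steady_state_phase(k, a, dw):
    """theta_ss ~= asin(A*dw/K) in radians; valid while |A*dw/K| <= 1."""
    return math.asin(a * dw / k)

def detuning_for_phase(k, a, theta):
    """Detuning (rad/s) that produces a static phase shift theta."""
    return k * math.sin(theta) / a

def jtb(k, a, dw):
    """omega_P = sqrt(K^2/A^2 - dw^2) from (3)."""
    return math.sqrt((k / a) ** 2 - dw ** 2)

theta = 2 * math.pi * 0.1 / 2        # 0.1 UI of a 2-UI clock period = 18 degrees
dw = detuning_for_phase(K, A, theta)
print(f"required detuning: {dw / (2 * math.pi) / 1e6:.1f} MHz")
print(f"JTB reduction factor: {jtb(K, A, dw) / (K / A):.3f}")   # equals cos(theta)
```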

In a high density proximity coupled application, these ILOs will be packed densely. Thus, coupling between the ILOs is a major concern. In the case of a ring oscillator, coupling mainly occurs through the supply and substrate network. Fortunately, higher jitter tracking bandwidth provides better immunity to supply noise. Moreover, the VCO delay cells are implemented with current mode logic, providing good supply and substrate noise immunity compared with static CMOS logic.

#### D. Simulation Techniques

To evaluate the ILO's jitter transfer characteristics, two techniques are used. First, the jitter transfer function is generated from the phase noise of the free-running VCO and the phase noise of the recovered clock. Using (6), the ILO jitter transfer can be written as

$$\begin{aligned} |\text{JTF}_{\text{INPUT}}(\omega_{\text{jitter}})|^{2} &= \frac{\omega_{P}^{2}}{\omega_{\text{jitter}}^{2} + \omega_{P}^{2}} \\ &= 1 - \frac{\omega_{\text{jitter}}^{2}}{\omega_{\text{jitter}}^{2} + \omega_{P}^{2}}. \end{aligned} \tag{13}$$



Fig. 19. Schematic of the ring oscillator based ILO and corresponding timing diagram.



Fig. 20. Block diagram of the demux.

Since  $S_{\rm ILO} \gg S_{\rm inj}$  for a ring oscillator, around the cutoff frequency the output phase noise from (11) can be approximated as

$$S_{\rm out}(\omega_{\rm jitter}) \approx \frac{\omega_{\rm jitter}^2 S_{\rm ILO}(\omega_{\rm jitter})}{\omega_{\rm jitter}^2 + \omega_P^2}. \tag{14}$$

Combining (13) and (14), the jitter transfer function can be expressed in terms of the phase noise  $S_{\text{ILO}}$  and  $S_{\text{out}}$ :

$$|\text{JTF}_{\text{INPUT}}(\omega_{\text{jitter}})|^2 \approx 1 - \frac{S_{\text{out}}(\omega_{\text{jitter}})}{S_{\text{ILO}}(\omega_{\text{jitter}})}. \tag{15}$$



Fig. 21. Normalized jitter transfer function for different injection strengths with n = 4 stages, an oscillation frequency of 5 GHz and an injection frequency of 10 GHz.

This method is used to estimate the jitter transfer of the ring ILO in simulation for different injection strengths, and compared with the theoretical results of Table I and to simulations where the injected signal is sinusoidally phase modulated in Fig. 21. Good agreement is observed.
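The first simulation technique can be sketched as follows. Two synthetic spectra stand in for the simulated free-running and locked phase-noise curves: S_ILO is an assumed −20 dB/decade ring characteristic, and S_out is generated from (14) with a hypothetical JTB of 300 MHz. Applying (15) and bisecting for |JTF|² = 1/2 recovers that JTB. With real simulation data, the same procedure would be applied to tabulated spectra instead of these closed forms.

```python
import math

F_P_TRUE = 300e6  # hypothetical JTB used to synthesize the stand-in spectra

def s_ilo(f):
    """Stand-in free-running ring phase noise: -80 dBc/Hz at 1 MHz, -20 dB/decade."""
    return 1e-8 * (1e6 / f) ** 2

def s_out(f):
    """Stand-in locked phase noise, approximated by (14)."""
    return f ** 2 * s_ilo(f) / (f ** 2 + F_P_TRUE ** 2)

def jtf_sq_estimate(f):
    """|JTF_INPUT|^2 ~= 1 - S_out / S_ILO, from (15)."""
    return 1 - s_out(f) / s_ilo(f)

def find_3db(lo=1e6, hi=1e11, iters=60):
    """Bisect in log frequency for |JTF|^2 = 0.5; |JTF|^2 decreases with f."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if jtf_sq_estimate(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

f_p_est = find_3db()
print(f"estimated JTB: {f_p_est / 1e6:.1f} MHz")
```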

For burst mode applications, the lock time is an important specification, which requires designers to simulate the CDR's response to a phase step. The system's step response can also be used to estimate its jitter transfer characteristics. Estimates so obtained are shown in Fig. 22 and are also in good agreement with the developed theory (Fig. 21).

Authorized licensed use limited to: The University of Toronto. Downloaded on March 17,2010 at 13:31:38 EDT from IEEE Xplore. Restrictions apply.

Fig. 22. Transient phase response and corresponding jitter transfer function for different injection strengths with n = 4 stages, an oscillation frequency of 5 GHz and an injection frequency of 10 GHz. (Panels: normalized phase step response versus time (UI), and normalized JTF (dB) versus frequency (MHz); simulation and theory for K = 0.08, 0.12, 0.16, 0.20 and 0.24.)
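The step-response route can be sketched in the same way. For a linear system, the JTF is the Fourier transform of the impulse response, and the impulse response is the time derivative of the phase step response, so a sampled step response can be turned into |JTF(jω)|² by a discrete Fourier sum over its increments. The sketch below assumes a hypothetical first-order response with corner `OMEGA_P` = 2π × 300 MHz standing in for a simulated trace such as those in Fig. 22, and it requires the trace to have settled before `T_STOP`.

```python
import cmath
import math

# Hypothetical stand-in for a simulated, normalized phase step response
# (assumed first-order tracking; values are illustrative, not from the paper)
OMEGA_P = 2 * math.pi * 300e6   # assumed tracking corner, rad/s
DT = 1e-12                       # sample spacing, s
T_STOP = 20e-9                   # long enough for the response to settle

times = [n * DT for n in range(int(round(T_STOP / DT)) + 1)]
step_resp = [1.0 - math.exp(-OMEGA_P * t) for t in times]

def jtf_from_step(times, step_resp, omega):
    """Estimate JTF(j*omega) as the Fourier transform of d(step)/dt.

    Each increment of the step response equals an impulse-response sample
    times the time step; it is weighted by exp(-j*omega*t) at the interval
    midpoint and summed.
    """
    acc = 0j
    for k in range(len(times) - 1):
        t_mid = 0.5 * (times[k] + times[k + 1])
        acc += (step_resp[k + 1] - step_resp[k]) * cmath.exp(-1j * omega * t_mid)
    return acc

for f in (100e6, 300e6, 1e9):
    w = 2 * math.pi * f
    est = abs(jtf_from_step(times, step_resp, w)) ** 2
    theory = OMEGA_P**2 / (w**2 + OMEGA_P**2)   # first-order form of (13)
    print(f"{f / 1e6:7.0f} MHz: |JTF|^2 est {est:.4f}, theory {theory:.4f}")
```

Nothing in `jtf_from_step` assumes a first-order system; the comparison against (13) is meaningful here only because the synthetic input happens to be first order.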

With a large JTB, it is possible to generate a relatively low jitter clock from a low power (hence, noisy) ring oscillator. Increasing the injection strength K results in faster locking, higher JTB and better jitter tolerance, as illustrated in Figs. 21 and 22.

##### E. Experimental Results

For experimental verification, a complete receiver including the equalizer, half-rate clock recovery and demultiplexer is implemented in 90-nm CMOS (Fig. 23). For testability, probe pads (each with a parasitic capacitance of 25 fF) and buffers are included at the outputs of the front-end, edge detector and VCO. An AC coupled channel is emulated with on-die 80 fF coupling capacitors and 50 Ω termination resistors. Probe pads with 25 fF capacitance are also included on either side of the coupling capacitors to characterize their frequency response. A schematic of the emulated channel is shown in Fig. 23. Excluding the probe pads and test structures, the active area of the receiver is less than 0.3 mm<sup>2</sup>. Experimental verification and optimization of the front-end equalizer were documented in Section II, and the extracted clock tone was shown in Section III, so this section focuses mainly on the clock recovery results.
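To see why the emulated channel delivers edge pulses rather than an NRZ waveform, a first-order estimate of its frequency response can be computed from the component values above. The sketch assumes an ideal voltage source driving one 80 fF coupling capacitor into a single 50 Ω termination, with one 25 fF probe pad loading the receiver side. Source impedance, the transmit-side pad and interconnect parasitics are ignored, so the numbers are indicative only.

```python
import math

# Simplified first-order model of the emulated AC coupled channel (assumption):
# ideal source -> series coupling capacitor -> shunt termination || probe pad.
C_COUPLE = 80e-15   # F, coupling capacitor (from the text)
C_PAD = 25e-15      # F, receiver-side probe pad (from the text)
R_TERM = 50.0       # ohm, termination (from the text)

def channel_gain(f_hz):
    """|H(j*2*pi*f)| for H(s) = s*R*Cc / (1 + s*R*(Cc + Cpad))."""
    w = 2 * math.pi * f_hz
    num = w * R_TERM * C_COUPLE
    den = math.hypot(1.0, w * R_TERM * (C_COUPLE + C_PAD))
    return num / den

# High-pass corner and the capacitive-divider gain reached well above it
f_corner = 1 / (2 * math.pi * R_TERM * (C_COUPLE + C_PAD))
hf_limit = C_COUPLE / (C_COUPLE + C_PAD)

print(f"high-pass corner: {f_corner / 1e9:.1f} GHz, HF gain limit {hf_limit:.2f}")
for f in (0.5e9, 2.5e9, 5e9):
    print(f"{f / 1e9:4.1f} GHz: {20 * math.log10(channel_gain(f)):6.1f} dB")
```

Under these assumptions the high-pass corner sits near 30 GHz, far above the 5 GHz Nyquist frequency of 10 Gb/s data, so across the data band the channel behaves approximately as a differentiator and the gain rises by about 20 dB per decade.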

Fig. 23. Implemented complete receiver in 90 nm CMOS.

The receiver was designed to support data rates from 5 to 10 Gb/s, requiring the ILO to tune from 2.5 GHz to 5 GHz. The ILO has a tuning range from 2 GHz to 6 GHz, as shown in Fig. 24(a), providing some margin for process and temperature variations. The simulated and measured lock ranges are shown over the ILO's tuning range for different injection strengths, K, in Fig. 24(b). Note that the injection frequency is twice the oscillation frequency, since the VCO is used as an injection-locked divider. Theoretically, the lock range increases with injection strength and oscillation frequency, $\omega_{\text{lock}} \propto K \omega_0$. At higher frequencies, deviation from this trend is observed due to unaccounted-for parasitics at the injection nodes. The receiver is tested with a 10 Gb/s external PRBS source at the input. At 10 Gb/s, the input was provided single-ended to avoid any intra-pair skew in the test setup. To improve the single-ended signal integrity, the common-mode node $V_{\rm com}$ is coupled to ground with a 10 pF capacitance; this would not be required in an application where differential inputs are always provided.

The ILO's free-running and locked spectra are shown in Fig. 25. The recovered clock and corresponding demuxed data are shown in Fig. 26, and the corresponding jitter transfer function is shown in Fig. 27. To study jitter accumulation during consecutive identical digits (CIDs), the phase noise for both an alternating input data pattern (with at most 1 CID) and a $2^7 - 1$ PRBS input data pattern (with at most 7 CIDs) is also plotted in Fig. 27. The measured jitter transfer function is in good agreement with theory and simulation. However, the measured phase noise of both the free-running VCO and the recovered clock was significantly higher than in simulation, due to supply noise not accounted for in the theoretical model. For example, the simulated recovered clock phase noise is better than -125 dBc/Hz at 1 MHz offset (Fig. 18), whereas the measured results in Fig. 27 show -120 dBc/Hz at 1 MHz offset. Similarly, the simulated ring VCO phase noise is better than -110 dBc/Hz at 30 MHz offset (Fig. 18), compared to a measured -100 dBc/Hz at the same offset frequency. A burst mode test pattern comprising 10 ns of no data (consecutive 0s) followed by 5 ns of alternating data (1s and 0s) is shown along with the ILO clock output captured on an oscilloscope in Fig. 28. The measured lock time is less than 1.5 ns, which is as expected for the measured lock range of 900 MHz.
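The link between lock range and lock time can be made concrete with a rough calculation based on the small-injection analysis of Appendix A, where the phase error decays as exp(−t·√((K/A)² − Δω²)) and ω_LOCK ≈ K/A. The sketch below assumes that Case I applies, that the free-running frequency lies well inside the lock range, and that the quoted 900 MHz is the one-sided lock range ω_LOCK/2π; all three are interpretations rather than statements from the measurement.

```python
import math

# Rough lock-time estimate from the measured lock range (sketch, assumptions):
#  * Case I dynamics of Appendix A: phase error ~ exp(-t*sqrt((K/A)^2 - dw^2));
#  * lock range omega_lock ~ K/A, with the 900 MHz figure read as omega_lock/(2*pi).
F_LOCK = 900e6                     # Hz, from the text (interpretation assumed)
omega_lock = 2 * math.pi * F_LOCK  # rad/s
tau = 1 / omega_lock               # decay time constant for dw = 0, s

def settle_time(residual_fraction, dw=0.0):
    """Time for the phase error to shrink to residual_fraction of its start."""
    rate = math.sqrt(omega_lock**2 - dw**2)   # valid only for |dw| < omega_lock
    return math.log(1 / residual_fraction) / rate

print(f"tau = {tau * 1e12:.0f} ps")
for frac in (0.1, 0.01):
    print(f"settle to {frac:.0%} of initial error: {settle_time(frac) * 1e9:.2f} ns")
```

With this reading, τ is about 177 ps and the phase error falls to 1% of its initial value in roughly 0.8 ns. If 900 MHz instead denotes the full two-sided range, τ and both settling times double. The exponential form is also a small-angle result, so this is an order-of-magnitude check rather than a prediction.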



Fig. 24. Measured tuning range and lock range of the implemented 5-GHz four-stage ring ILO.



Fig. 25. Spectrum of the free running and recovered clock.

#### VI. CONCLUSION

The proposed clock recovery method is compared with previously reported burst mode receivers in Table II. Note that an AC coupled receiver consumes more power than a comparable DC coupled receiver because of the additional circuitry required for NRZ signal recovery. The simple, inductorless implementation of the proposed architecture results in a small area of $0.3 \text{ mm}^2$. Although a low-power ring oscillator with poor phase noise is used as the ILO, the jitter of the recovered clock is still comparable to that of existing *LC*-VCO-based CDRs.
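The FoM row of Table II appears to be receiver power divided by bit rate, using the maximum rate (10 Gb/s) for this work. The short script below recomputes it under that assumption. It reproduces the tabulated values except for [17], where 230 mW at 10.3 Gb/s gives about 22.3 pJ/bit against the listed 23.

```python
# Energy-per-bit FoM for the Table II entries, assuming FoM = power / bit rate.
# (power in W, bit rate in b/s), copied from Table II
receivers = {
    "[8]": (1.2, 10e9),
    "[1]": (117e-3, 3e9),
    "[10]": (175e-3, 20e9),
    "[11]": (200e-3, 10e9),
    "[17]": (230e-3, 10.3e9),
    "This work": (70e-3, 10e9),   # 5-10 Gb/s; the 10 Gb/s upper end is used
}

def fom_pj_per_bit(power_w, bit_rate_bps):
    """Energy per bit, converted from J/bit to pJ/bit."""
    return power_w / bit_rate_bps * 1e12

for name, (power, rate) in receivers.items():
    print(f"{name:>10}: {fom_pj_per_bit(power, rate):6.2f} pJ/bit")
```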

#### APPENDIX A: TRANSIENT PHASE RESPONSE

To derive the transient phase response of an ILO, we define a variable that captures the impact of the oscillator topology on the injection dynamics, as in [20]:

$$A \cong -\left. \frac{d \tan \Phi}{d\omega} \right|_{\omega = \omega_0}.$$
 (16)



Fig. 26. Recovered clock and retimed demuxed data.

The analysis in [19] may then be generalized, resulting in the following locking equation:

$$\frac{d\theta}{dt} = -\frac{1}{A} \frac{K \sin \theta}{(1 + K \cos \theta)} + \Delta \omega.$$
(17)

The locking equation can be solved for  $\theta(t)$  in two particular cases: small and large injection strength.

1) Case I, Small Injection  $(K \ll 1)$: In this case, $1 + K\cos\theta \approx 1$, and the locking equation (17) simplifies to

$$\frac{d\theta}{\Delta\omega - \frac{K}{A}\sin\theta} = dt.$$
 (18)

|                    | [8]            | [1]        | [10]          | [11]        | [17]          | This work     |
|--------------------|----------------|------------|---------------|-------------|---------------|---------------|
| AC/DC coupled      | AC coupled     | AC coupled | DC coupled    | DC coupled  | DC coupled    | AC coupled    |
| Clock recovery     | Full-rate GVCO | DLL        | Full-rate ILO | Gated VCO   | Full-rate ILO | Half-rate ILO |
| Technology         | 0.13 μm        | 0.18 μm    | 90 nm         | 0.18 μm     | SiGe          | 90 nm         |
| Lock time          | <1 ns          |            | 50 ps         | <3.2 ns     |               | <1.5 ns       |
| Bit rate           | 10 Gb/s        | 3 Gb/s     | 20 Gb/s       | 10 Gb/s     | 10.3 Gb/s     | 5–10 Gb/s     |
| Clock jitter (RMS) | 3.2 ps         | 7 ps       | 1.2 ps        | 1.47 ps     | 1.45 ps       | 2.2 ps        |
| Clock jitter (p-p) | 19.6 ps        |            | 8 ps          |             |               | 15.5 ps       |
| Receiver power     | 1.2 W          | 117 mW     | 175 mW        | 200 mW      | 230 mW        | 70 mW         |
| Area               | 6.25 mm²       |            | 0.96 mm²      | 3.4 mm²     | 0.5 mm²       | 0.3 mm²       |
| FoM (pJ/bit)       | 120            | 39         | 8.75          | 20          | 23            | 7.0           |

TABLE II

Comparison of State-of-the-Art Burst Mode Clock Recovery Techniques

Fig. 27. Normalized jitter transfer function and phase noise of the recovered clock.

Fig. 28. Experimental verification of lock time.

To find the transient phase response, we integrate both sides. There are two possible solutions, depending on the frequency difference between the injected clock tone and the free-running ILO frequency. First, we consider a frequency offset small enough to keep the ILO within its lock range, $\Delta \omega < \omega_{\text{LOCK}} \approx K/A$. In this case the integration yields:

$$\frac{-1}{\mu} \log \left[ \frac{-\frac{K}{A} + (\Delta\omega)\sin\theta + (\mu)\cos\theta}{\Delta\omega - \frac{K\sin\theta}{A}} \right] = t \qquad (19)$$

where

$$\mu = \sqrt{\frac{K^2}{A^2} - \Delta\omega^2} \tag{20}$$

To simplify further, we assume that the frequency offset $\Delta \omega$ is much smaller than the lock range, i.e., $\Delta \omega \ll \omega_{\text{LOCK}} \approx K/A$. This assumption is valid when the CDR performs frequency acquisition so that the ILO's self-resonant frequency $\omega_0$ is tuned very close to the frequency set by the incoming data rate. With that assumption, (19) simplifies to

$$\frac{1-\cos\theta}{\sin\theta} = e^{-\left(\sqrt{\frac{K^2}{A^2} - \Delta\omega^2}\right)t}.$$
(21)

Substituting  $\sin(\alpha) = 2\sin(\alpha/2)\cos(\alpha/2)$  and  $1 - \cos(\alpha) = 2\sin^2(\alpha/2)$, the left-hand side of (21) becomes $\tan(\theta/2)$, so that

$$\theta(t) = 2 \tan^{-1} \left( e^{-\left(\sqrt{\frac{K^2}{A^2} - \Delta\omega^2}\right)t} \right) + C \approx \theta_0 e^{\left(-\sqrt{\frac{K^2}{A^2} - \Delta\omega^2}\right)t}$$
(22)

where  $\theta_0$  is the initial difference between the injected clock phase and the free-running VCO phase. Substituting  $A = 2Q/\omega_0$  gives the same transient expression derived in [19] for *LC* oscillators.
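The Case I result can be checked numerically. The sketch below integrates (18) with a classic fourth-order Runge–Kutta step and compares the result with the exact zero-offset solution tan(θ/2) = tan(θ₀/2)·e^{−(K/A)t}, of which (22) is the small-angle form. The values of `A`, `K` and `THETA0` are hypothetical; A in particular depends on the oscillator topology through (16).

```python
import math

# Hypothetical parameters (assumptions for illustration, not from the paper)
A = 1.0 / (2 * math.pi * 5e9)   # assumed topology factor A from (16), s/rad
K = 0.05                        # small injection strength (Case I)
THETA0 = 0.3                    # initial phase error, rad
RATE = K / A                    # decay rate in (22) with zero frequency offset

def case1_rhs(theta, dw=0.0):
    """Right-hand side of the small-injection locking equation (18)."""
    return dw - (K / A) * math.sin(theta)

def rk4(theta, t_end, steps=2000, dw=0.0):
    """Integrate d(theta)/dt = case1_rhs(theta) from 0 to t_end with classic RK4."""
    h = t_end / steps
    for _ in range(steps):
        k1 = case1_rhs(theta, dw)
        k2 = case1_rhs(theta + 0.5 * h * k1, dw)
        k3 = case1_rhs(theta + 0.5 * h * k2, dw)
        k4 = case1_rhs(theta + h * k3, dw)
        theta += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return theta

def exact_case1(t):
    """Closed-form solution of (18) for dw = 0: tan(theta/2) = tan(theta0/2) exp(-RATE t)."""
    return 2 * math.atan(math.tan(THETA0 / 2) * math.exp(-RATE * t))

for n in (1, 3, 5):
    t = n / RATE
    print(f"t = {n}/rate: numeric {rk4(THETA0, t):.5f}, "
          f"exact {exact_case1(t):.5f}, small-angle (22) {THETA0 * math.exp(-RATE * t):.5f}")
```

With a nonzero offset (`dw`), (18) settles instead to the static phase error θ = sin⁻¹(ΔωA/K), which is why the offset must stay within the lock range.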


2) Case II, Large Injection  $(K \approx 1)$: In this case, $K\sin\theta/(1 + K\cos\theta) \approx \sin\theta/(1 + \cos\theta) = \tan(\theta/2)$, and the locking equation (17) simplifies to

$$\frac{d\theta}{\Delta\omega - \frac{1}{A}\tan(\theta/2)} = dt.$$
 (23)

Similar to the previous case, the time-domain phase variation is obtained by integrating both sides:

$$\frac{A^2}{\Delta\omega^2 A^2 + 1} \left[ \Delta\omega\,\theta - \frac{2}{A} \log\left( \Delta\omega\cos\frac{\theta}{2} - \frac{1}{A}\sin\frac{\theta}{2} \right) \right] = t. \quad (24)$$

Within the lock range and for a small frequency offset, i.e.,  $\Delta \omega \ll \omega_{\text{LOCK}} \approx K/A$, $\theta(t)$ can be further simplified:

$$\theta(t) = 2\sin^{-1}[e^{-t/(2A)}] \approx \theta_0 e^{-t/(2A)}.$$
 (25)

In summary, for both small and large injection, the phase difference decreases exponentially to zero. For small injection strength, the time constant is a strong function of K, $\tau = A/K$, whereas for large injection the time constant is independent of K, $\tau = 2A$. In [19], this conclusion was derived for *LC* oscillators, whereas in this work we have generalized it to any VCO topology by appropriately defining A.
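Linearizing (17) about θ = 0 (sin θ ≈ θ, cos θ ≈ 1) gives a small-angle decay rate of K/[A(1 + K)], which tends to K/A for K ≪ 1 and equals 1/(2A) at K = 1, consistent with (23) since tan(θ/2) ≈ θ/2. The sketch below integrates the full equation (17) from a small initial error and extracts the decay rate for several K, using a hypothetical value of A.

```python
import math

# Hypothetical topology factor A (assumption, not from the paper), s/rad
A = 1.0 / (2 * math.pi * 5e9)

def locking_rhs(theta, k, dw=0.0):
    """Full locking equation (17)."""
    return -(1 / A) * k * math.sin(theta) / (1 + k * math.cos(theta)) + dw

def integrate(theta, k, t_end, steps=4000):
    """Classic RK4 integration of (17) from t = 0 to t_end."""
    h = t_end / steps
    for _ in range(steps):
        k1 = locking_rhs(theta, k)
        k2 = locking_rhs(theta + 0.5 * h * k1, k)
        k3 = locking_rhs(theta + 0.5 * h * k2, k)
        k4 = locking_rhs(theta + h * k3, k)
        theta += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return theta

def measured_rate(k, theta0=1e-3):
    """Small-angle decay rate of (17), from the phase error after a time 2A."""
    t = 2 * A
    return math.log(theta0 / integrate(theta0, k, t)) / t

print(f"{'K':>5} {'A*rate (numeric)':>17} {'K/(1+K)':>8} {'K (Case I)':>11}")
for k in (0.05, 0.2, 0.5, 1.0):
    print(f"{k:5.2f} {A * measured_rate(k):17.4f} {k / (1 + k):8.4f} {k:11.4f}")
```

The numeric rate tracks K/(1 + K) across the range, which shows how the Case I expression overestimates the decay rate as K approaches 1.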

#### ACKNOWLEDGMENT

The authors would like to thank Intel Corp. and Broadcom for funding this research and CMC for providing fabrication facilities.

#### References

- [1] L. Luo, J. M. Wilson, S. E. Mick, J. Xu, L. Zhang, and P. D. Franzon, "A 3 Gb/s AC coupled chip-to-chip communication using a low swing pulse receiver," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2005, pp. 522–523.
- [2] K. Kanda, D. Antono, K. Ishida, H. Kawaguchi, T. Kuroda, and T. Sakurai, "1.27 Gb/s/pin 3 mW/pin wireless superconnect (WSC) interface scheme," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2003, pp. 186–127.
- [3] A. Fazzi et al., "3D capacitive interconnections with mono- and bidirectional capabilities," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2007, pp. 356–357.
- [4] E. Fang, G. Asada, R. Kumar, S. Hale, P. Ken, and M. Leary, "A 5.2 Gb/s HyperTransport integrated AC coupled receiver with DFR DC restore," in *IEEE Symp. VLSI Circuits Dig.*, Jun. 2007, pp. 34–35.
- [5] R. Drost, R. Hopkins, R. Ho, and I. Sutherland, "Proximity communication," *IEEE J. Solid-State Circuits*, vol. 39, no. 9, pp. 1529–1535, Sep. 2004.
- [6] A. Tajalli, P. Muller, and Y. Leblebici, "A power-efficient clock and data recovery circuit in 0.18-μm CMOS technology for multi-channel short-haul optical data communication," *IEEE J. Solid-State Circuits*, vol. 42, no. 10, pp. 2288–290, Oct. 2007.
- [7] N. Miura *et al.*, "An 11 Gb/s inductive-coupling link with burst transmission," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2008, pp. 288–290.
- [8] M. Nogawa *et al.*, "A 10 Gb/s burst-mode CDR IC in 0.13 μm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2005, pp. 228–229.
- [9] M. Hossain and A. Chan Carusone, "A 14 Gb/s 32 mW AC coupled receiver in 90-nm CMOS," in *Symp. VLSI Circuits Dig.*, Kyoto, Japan, Jun. 2007, pp. 186–187.
- [10] J. Lee and M. Liu, "A 20-Gb/s burst-mode clock and data recovery circuit using injection-locking technique," *IEEE J. Solid-State Circuits*, vol. 43, no. 3, pp. 619–630, Mar. 2008.
- [11] C.-F. Liang, S.-C. Hwu, and S.-I. Liu, "A 10 Gbps burst-mode CDR circuit in 0.18 μm CMOS," in *Proc. IEEE Custom Integrated Circuits Conf.*, Sep. 2006.
- [12] Murata and T. Otsuji, "A novel clock recovery circuit for fully monolithic integration," *IEEE Microw. Theory Tech.*, vol. 12, no. 12, pp. 2528–2533, Dec. 1999.

- [13] R. W. Wood and R. W. Donaldson, "Decision feedback equalization of the DC null in high-density digital magnetic recording," *IEEE Trans. Magn.*, vol. 14, pp. 218–221, Jul. 1978.
- [14] S. Galal and B. Razavi, "10-Gb/s limiting amplifier and laser/modulator driver in 0.18-μm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2138–2146, Dec. 2003.
- [15] P. Monteiro, J. Matos, A. Gamerio, and J. Rocha, "10 Gb/s timing recovery circuit using dielectric resonators and active bandpass filters," *IEEE Electron. Lett.*, vol. 28, Apr. 1992.
- [16] B. Casper and F. O'Mahony, "Clocking analysis, implementation and measurement techniques for high-speed data links: A tutorial," *IEEE Trans. Circuits Syst. I*, vol. 56, no. 1, pp. 683–688, Jan. 2009.
- [17] J. Zhan, J. Duster, and K. Kornegay, "Full-rate injection-locked 10.3 Gb/s clock and data recovery circuit in a 45 GHz ft SiGe process," in *Proc. IEEE Custom Integrated Circuits Conf.*, San Jose, CA, Sep. 2005, vol. 2.
- [18] R. Adler, "A study of locking phenomena in oscillators," *Proc. IRE*, vol. 33, pp. 351–357, Jun. 1946.
- [19] L. J. Paciorek, "Injection locking of oscillators," *Proc. IEEE*, vol. 53, pp. 1723–1728, Nov. 1965.
- [20] M. Hossain and A. Chan Carusone, "CMOS oscillators for clock distribution and injection-locked deskew," *IEEE J. Solid-State Circuits*, vol. 44, no. 8, pp. 2138–2153, Aug. 2009.
- [21] H. R. Rategh and T. H. Lee, "Superharmonic injection-locked frequency dividers," *IEEE J. Solid-State Circuits*, vol. 34, no. 6, pp. 813–821, Jun. 1999.
- [22] B. Razavi, Design of Integrated Circuits for Optical Communications, 1st ed. Cambridge, U.K.: Cambridge Univ. Press, 2002.
- [23] B. Razavi, "A study of phase noise in CMOS oscillators," *IEEE J. Solid-State Circuits*, vol. 31, no. 3, pp. 331–343, Mar. 1996.
- [24] F. O'Mahony et al., "A 27 Gb/s forwarded clock I/O receiver using an injection-locked LC-DCO in 45 nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2008.



**Masum Hossain** received the B.Sc. degree in electrical engineering from Bangladesh University of Engineering and Technology, Bangladesh, and the M.Sc. degree from Queen's University, Canada, in 2002 and 2005, respectively. During his M.Sc. work, he worked on a K-band wireless receiver in CMOS. Since 2005, he has been working towards the Ph.D. degree in electrical engineering at the University of Toronto, Canada.

From September 2007 to January 2008, he was with the Intel Circuit Research Lab (CRL) as a graduate intern. Currently, he is working for Gennum Corporation in the Analog and Mixed Signal division. His research interests include mixed signal circuits for high-speed chip-to-chip communications, low-power VCOs, phase interpolators and clock recovery techniques.

Mr. Hossain won the Best Student Paper Award at the 2008 IEEE Custom Integrated Circuits Conference (CICC).



Anthony Chan Carusone (S'96–M'02–SM'08) completed the B.A.Sc. and Ph.D. degrees at the University of Toronto, Toronto, ON, Canada, in 1997 and 2002, respectively, during which time he received the Governor-General's Silver Medal.

Since 2001, he has been with the Department of Electrical and Computer Engineering at the University of Toronto where he is currently an Associate Professor. In 2008 he was a visiting researcher at the University of Pavia, Italy, and later at the Circuits Research Lab of Intel Corporation, Hillsboro, OR.

Prof. Chan Carusone was a coauthor of the best paper at the 2005 Compound Semiconductor Integrated Circuits Symposium and the best student papers at both the 2007 and 2008 Custom Integrated Circuits Conferences. He is an appointed member of the Administrative Committee of the IEEE Solid-State Circuits Society and the Board of Governors of the Circuits and Systems Society, a member and past chair of the Analog Signal Processing Technical Committee for the IEEE Circuits and Systems Society, and a member and past chair of the Wireline Communications subcommittee of the Custom Integrated Circuits Conference. He has served as a guest editor for both the IEEE JOURNAL OF SOLID-STATE CIRCUITS and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS. He served on the editorial board of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS from 2006 to 2009 when he was Editor-in-Chief.