# **CLOCK RECOVERY IN HIGH-SPEED MULTILEVEL SERIAL LINKS**

Faisal A. Musa and Anthony Chan Carusone Email: tcc@eecg.utoronto.ca Dept. of Electrical and Computer Engineering, University of Toronto 10 King's College Rd., Toronto, CANADA, M5S 3G4

## ABSTRACT

This paper introduces a simple and hardware-efficient clock recovery method for high-speed serial links and compares its performance with conventional techniques. Conventional methods are conceptually complex and difficult to realize since they rely on data transitions to recover the clock by oversampling the received signal. In contrast, the new method monitors one or more signal levels and aligns the clock sampling phase with the maximum vertical data eye opening by using the minimum mean squared error algorithm. Besides being easy to implement in a standard CMOS technology, the new method requires only baud rate sampling and is independent of the data transition density. Behavioral simulations predict lower recovered-clock jitter for this method than for a conventional bang-bang phase detector based architecture.

## 1. INTRODUCTION

The explosion in the number of Internet users has led to an enormous amount of data being handled by the Internet backbone. The resulting demand for network data bandwidth has motivated the development of high-speed serial links. Applications for serial links include transferring voice, data and high resolution graphics via coaxial cable, network backplanes and optical fibres at data rates from 622 Mb/s to 10 Gb/s, with a near future target of 40 Gb/s [1]. However, finite transmission channel bandwidth, CMOS circuit speed limitations and stringent jitter requirements on the clock sources are the main obstacles to increasing the data rate. One solution is to use multilevel pulse amplitude modulation (PAM) signaling, which encodes multiple bits per symbol and requires less bandwidth than conventional non-return to zero (NRZ) signaling for the same data rate. However, clock and data recovery (CDR) of multilevel PAM signals is complicated by the existence of asymmetric transitions, which result in zero crossings that are displaced from the midpoint between two consecutive symbols. These transitions must be ignored by the phase detector (PD) in the CDR. Moreover, as the number of levels in the PAM signal is increased to save bandwidth over conventional NRZ transmission, the CDR circuit increases in complexity and power consumption.

This paper describes a novel multilevel PAM timing recovery technique that requires neither monitoring of the data transitions nor oversampling of the received waveform. It uses a bang-bang PD to generate early/late pulses based on the minimum mean squared error (MMSE) criterion [2]. Called Sign-Sign MMSE (SSMMSE), this method is simple both conceptually and in terms of hardware implementation. It also requires only baud rate sampling of the received waveform, which eliminates the need for quadrature clocks in a half rate system. There is almost no penalty in circuit complexity and power consumption as the number of signal levels is increased, because this method can achieve clock recovery by monitoring a single level.

## 2. TIMING RECOVERY FROM MULTILEVEL SIGNALS

Timing recovery with a linear or bang-bang PD is accomplished by using data transitions to update the phase of the receiver sampling clocks. For multilevel signals, only symmetric zero crossings (i.e. transitions to the same magnitude level but opposite polarity) are to be considered, as illustrated in Fig. 1. Conventional methods require on-chip coding to guarantee a high enough symmetric zero crossing density for reliable clock recovery [3].



Fig. 1. Symmetric and asymmetric zero crossings in a multilevel signal.

Instead of monitoring zero crossings, SSMMSE timing recovery monitors one or more signal levels and adjusts the clock phase such that the error between the sampled value of data and the corresponding signal level is minimized.

### 2.1 Linear Phase Detector based CDR

Linear phase detectors produce pulses that are proportional to the phase error between the transition edge of the transmitted data and the sampling clock. The width or the amplitude of the PD pulses is varied linearly in accordance with the phase error.

For multilevel PAM timing recovery, extra circuitry is required to allow the PD output to be processed only during symmetric zero crossings. This is achieved by sampling the data both at the transition edge and at the center of the data eye [3]. The former samples are used to calculate the phase error whereas the latter samples are digitized and are used to detect symmetric zero crossings in a transition detector. Once a symmetric zero crossing is detected by the transition detector, the corresponding phase error is utilized to correct the sampling phase. Fig. 2 illustrates this PD for a full rate clock.

The advantage of this technique is the low jitter of the recovered clock. However, this method is less robust since the PD is basically analog. Also, to reduce the on-chip frequency requirement, it is always desirable to recover a half or lower rate clock. For oversampling at twice the baud rate using a half or lower rate clock, the VCO must be capable of generating precisely matched multiple clock phases. This leads to a complicated VCO structure and higher power consumption.



Fig. 2. Multilevel PAM timing recovery using a linear phase detector [3] with a full rate clock.

### 2.2 Bang-Bang Phase Detector based CDR

The bang-bang (or Alexander) phase detector generates a binary output that indicates whether the clock leads or lags the data. The data is sampled on the rising and falling edges of the full rate clock, and the PD generates early/late pulses by comparing three consecutive data samples (denoted by 'a', 'b', 'c'). During a data transition (i.e. a and c are unequal), if the data samples taken on consecutive rising and falling edges of the clock have the same logic level (i.e. a and b are equal), then the clock is early; otherwise the clock is late:

$$a = b \neq c \rightarrow \text{Early pulse}$$

$$a \neq b = c \rightarrow \text{Late pulse}$$

These early and late pulses can be used to drive a charge pump. When the loop is in lock, alternate early and late pulses are applied to the charge pump.
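The decision rule above can be sketched in a few lines of Python. This is a minimal illustration for binary samples only, not the circuit of Fig. 3; the function name `alexander_pd` and the `"hold"` output for the no-transition case are assumptions made for this example.

```python
def alexander_pd(a: int, b: int, c: int) -> str:
    """Classify three consecutive binary samples a, b, c (each 0 or 1).

    Hypothetical helper for illustration: returns "early", "late" or "hold".
    """
    if a == c:
        # No data transition between a and c, so no timing information.
        return "hold"
    if a == b:
        # a = b != c -> early pulse
        return "early"
    # a != b = c -> late pulse
    return "late"


if __name__ == "__main__":
    for samples in [(0, 0, 1), (0, 1, 1), (1, 1, 0), (1, 0, 0), (1, 1, 1)]:
        print(samples, alexander_pd(*samples))
```

For binary samples with a and c unequal, b must equal exactly one of them, so the two pulse conditions are mutually exclusive and cover every transition.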

For multilevel PAM timing recovery, only the a, b and c samples taken in the vicinity of symmetric transitions are applied to the PD logic to generate early/late pulses. A possible implementation is shown in Fig. 3 for a full rate clock. This method is more robust than the linear method since the PD is essentially digital. However, it suffers from higher jitter in the recovered clock. Also, for half or lower rate clocks, this method requires the VCO to produce multiple clock phases.



Fig. 3. Multilevel PAM timing recovery using Alexander phase detector with a full rate clock.

### 2.3 SSMMSE Phase Detector based CDR

In this method, the following two quantities are required:

1) The slope of the received signal at the sampling instant.

2) The error between the sampled value and a particular signal level.

These two quantities are used to correct the VCO sampling phase (i.e. the rising edge of the full rate clock) such that the mean squared error [2] is minimized. As a result, the VCO positions the sampling phase at the maximum data eye opening. The technique is best understood by referring to the eye diagram of Fig. 4.



Fig. 4. Eye diagram for a 4-PAM signal.

Let A be the sampled data due to a clock sampling phase of  $t_1$ . The sampling phase being incorrect, the sampled data deviates from the desired signal level (which is +0.5 in this case) thus producing a finite positive error. Therefore, to decrease the mean squared error, the VCO frequency must decrease so that the sampled data point is C instead of A. Point C differs from point A in terms of slope and error. Thus a knowledge of these two quantities can be utilized to delay the clock sampling phase. Again, if the sampling phase is  $t_3$  (i.e. point B) the VCO frequency must increase to advance the clock phase to  $t_2$ . The decision to advance or delay the sampling phase can be based on the signs of the error and the slope at the sampling point as shown in Table 1. Here a positive error/slope is denoted by 1 and a negative error/slope by 0:

 TABLE 1. SSMMSE PD Truth Table

| Sampling point<br>(Fig. 4) | Error<br>(e) | Slope<br>(s) | Sampling<br>Phase |
|----------------------------|--------------|--------------|-------------------|
| A                          | 1            | 0            | Early             |
| B                          | 1            | 1            | Late              |
| D                          | 0            | 0            | Late              |
| E                          | 0            | 1            | Early             |

Consequently, early/late pulses can be generated using XOR/XNOR gates:

$$Early = e \oplus s$$

$$Late = \overline{e \oplus s}$$
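As a quick check of the PD logic against Table 1, the following Python sketch evaluates an XOR for the early output and an XNOR for the late output on each row of the table. The function name `ssmmse_pd` and the dictionary `TABLE_1` are hypothetical names introduced for this illustration.

```python
from typing import Tuple


def ssmmse_pd(e: int, s: int) -> Tuple[int, int]:
    """Return (early, late) for error sign e and slope sign s.

    Encoding follows Table 1: 1 = positive, 0 = negative.
    """
    early = e ^ s      # XOR: error and slope signs differ
    late = 1 - early   # XNOR: error and slope signs agree
    return early, late


# Rows of Table 1: sampling point -> (e, s)
TABLE_1 = {"A": (1, 0), "B": (1, 1), "D": (0, 0), "E": (0, 1)}

if __name__ == "__main__":
    for point, (e, s) in TABLE_1.items():
        early, late = ssmmse_pd(e, s)
        print(point, "Early" if early else "Late")
```

Points A and E come out early and points B and D come out late, which is the assignment listed in Table 1.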

The error with respect to a particular signal level can be generated by a comparator. A level detector ensures that only data samples lying between the reference levels $V_{ref1}$ and $V_{ref2}$ are applied to the PD logic. The additional hardware required to generate the slope is minimal since this is usually readily available elsewhere. For example, an integrate and dump circuit, which is commonly used to perform filtering in many receivers [4], has the following input-output relationship at any instant $\tau$:

$$y(kT+\tau) = \int_{(k-1)T+\tau}^{kT+\tau} u(t)dt$$

where u and y are the input and output signals respectively. Thus the derivative of the output can be expressed as,

$$\dot{y}(kT+\tau) = u(kT+\tau) - u((k-1)T+\tau)$$

The sign of the slope can be obtained by comparing the current value of the input with a sample of u delayed by one symbol period, T. Fig. 5 shows the complete block diagram of SSMMSE timing recovery when monitoring +0.5 level. When monitoring more than one level, the PD outputs of the individual levels must be combined by an OR gate.
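The relation between the integrator output slope and two input samples spaced by T can be checked numerically. The sketch below is only an illustration under stated assumptions: the sinusoidal test input, the values of `T` and `OMEGA`, and all function names are hypothetical and not taken from the paper.

```python
import math

T = 1.0      # symbol period in arbitrary units (assumed)
OMEGA = 2.0  # angular frequency of the assumed test input


def u(t: float) -> float:
    """Assumed test input: a sinusoid standing in for the received signal."""
    return math.sin(OMEGA * t)


def y(t: float) -> float:
    """Integral of u over the window [t - T, t], evaluated in closed form."""
    return (math.cos(OMEGA * (t - T)) - math.cos(OMEGA * t)) / OMEGA


def slope_sign(t: float) -> int:
    """Sign of dy/dt from two input samples: 1 if positive, else 0."""
    return 1 if u(t) > u(t - T) else 0


def numeric_slope(t: float, h: float = 1e-6) -> float:
    """Central-difference estimate of dy/dt, used only as a cross-check."""
    return (y(t + h) - y(t - h)) / (2.0 * h)


if __name__ == "__main__":
    for t in (0.3, 1.1, 2.5, 4.0):
        print(t, slope_sign(t), numeric_slope(t), u(t) - u(t - T))
```

The comparison in `slope_sign` plays the role of the comparator between the current input and its copy delayed by one symbol period; `numeric_slope` exists only to confirm that the two agree.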



Fig. 5. Multilevel PAM timing recovery using SSMMSE PD with a full rate clock (+0.5 level is being monitored).

As mentioned before, the other two methods, if implemented with a half or lower rate clock, would require a complicated VCO that generates perfectly matched multiple clock phases. SSMMSE is a simpler alternative for high-speed serial links since it always requires half as many clock phases as the other two methods. For example, to retime the data with a quarter rate clock, the other two methods would require eight clock phases separated by 45°, whereas the SSMMSE method requires only four clock phases separated by 90°. However, its jitter performance is inferior to that of linear PD based CDRs.
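The phase counts in the quarter rate example follow from simple arithmetic, sketched below in Python. The model is an assumption made for illustration: one clock phase per sample taken within one clock period, with two samples per symbol for the transition-based methods and one for SSMMSE. The function name `clock_phases` is hypothetical.

```python
from typing import Tuple


def clock_phases(samples_per_symbol: int, rate_divisor: int) -> Tuple[int, float]:
    """Return (number of clock phases, phase spacing in degrees).

    samples_per_symbol: 2 for edge-plus-centre sampling, 1 for baud rate sampling.
    rate_divisor: 1 for a full rate clock, 2 for half rate, 4 for quarter rate.
    """
    # One clock period spans rate_divisor symbols, each needing
    # samples_per_symbol sampling instants (assumed model).
    n = samples_per_symbol * rate_divisor
    return n, 360.0 / n


if __name__ == "__main__":
    print("oversampling, quarter rate:", clock_phases(2, 4))
    print("baud rate, quarter rate:   ", clock_phases(1, 4))
```

With a quarter rate clock this gives eight phases at 45° for the oversampling methods and four phases at 90° for baud rate sampling, the figures quoted above.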

## 3. SIMULATION RESULTS

This section presents behavioral simulation results for linear, bang-bang and SSMMSE PD based CDRs recovering the clock of a 4 Gsymbol/s 4-PAM signal. The serial link was realized by a coaxial cable model that was based on the transmission line model in [5] and introduced additive white Gaussian noise at an SNR of 20 dB. The VCO ($K_{VCO}$ = 0.4 GHz/V), loop filter (R1 = 10 k$\Omega$, C1 = 10 pF, C2 = 1 pF) and charge pump models were derived from [6]. Since all components of the CDRs are ideal, the behavioral simulations predict only pattern dependent jitter. The VCO and loop filter components were identical for all three CDRs, but the charge pump gain was varied to ensure approximately the same settling time (since the PD gain is different for the three CDRs). The loop bandwidth could have been chosen as the constant parameter instead of the settling time to realize similar loop dynamics for the three CDRs. However, determining the loop bandwidth of nonlinear PD based CDRs is complicated since the PD gain is not well defined. Moreover, since the loop bandwidth and settling time are inversely related, a constant settling time implies a fixed loop bandwidth for all practical purposes.

Table 2 shows that for the same charge pump gain, two different sampling clock phases are produced for the SSMMSE CDR when monitoring a maximum/minimum level (+1.5/-1.5) and a mid-level (+0.5/-0.5). The difference can be explained by referring to Fig. 6 where the mean squared error is plotted against sampling phase for two different levels.

TABLE 2. Comparison of SSMMSE PD based CDR performance (single level)

| Monitored<br>Level (V) | Charge Pump<br>Gain (μA/rad) | Settling<br>Time (μs) | Average Phase<br>in lock (rad) | RMS<br>Jitter (ps) | Peak to Peak<br>Jitter (ps) |
|------------------------|------------------------------|-----------------------|--------------------------------|--------------------|-----------------------------|
| +0.5 or -0.5           | 1.6                          | 0.41                  | 1.13                           | 3.2                | 12.48                       |
| +1.5 or -1.5           | 1.6                          | 0.33                  | 1.35                           | 1.8                | 7.6                         |



Fig. 6. Mean squared error vs. sampling phase for the SSMMSE CDR for the +1.5 and +0.5 levels.

The minimum mean squared error (MMSE) occurs at 1.1 rad and 1.35 rad for the +0.5 level and the +1.5 level respectively, in agreement with the average phase in lock obtained in the transient simulation (Table 2). Although the minimum error is larger for the +1.5 level, the jitter is lower. This is because SSMMSE is not concerned with the magnitude of the mean squared error: it only uses the MMSE algorithm to decide whether an early or a late pulse should be applied to the charge pump. The width and amplitude of the early/late pulses remain unchanged irrespective of the MSE. The sharpness of the dip of the curves in Fig. 6 near the MMSE indicates the jitter. A sharper dip (as observed when monitoring the +1.5 level) implies less variation around the sampling phase corresponding to the MMSE, thus producing less jitter. The acquisition of phase for the two CDRs is shown in Fig. 7.



Fig. 7. Phase acquisition for SSMMSE CDR (single level).

When monitoring more than one level simultaneously, several different combinations can exist. Table 3 shows the different combinations along with simulation results. To ensure the same settling time, the charge pump gain is divided by N when monitoring N levels simultaneously. The best performance is obtained when monitoring the maximum and minimum signal levels simultaneously. The performance degrades as the number of mid-levels in the combination increases.
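The gain scaling can be reproduced with a short Python sketch. It reads the scaling as division of the single-level gain of Table 2 by N, which is what the gain entries of Table 3 suggest; the constant and function names are hypothetical.

```python
SINGLE_LEVEL_GAIN_UA_PER_RAD = 1.6  # single-level charge pump gain from Table 2


def charge_pump_gain(n_levels: int) -> float:
    """Charge pump gain in uA/rad when n_levels levels are monitored at once."""
    if n_levels < 1:
        raise ValueError("n_levels must be at least 1")
    # Assumed reading of the scaling rule: divide the single-level gain by N.
    return SINGLE_LEVEL_GAIN_UA_PER_RAD / n_levels


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, round(charge_pump_gain(n), 2))
```

For N = 2, 3 and 4 this gives 0.8, about 0.53 and 0.4 μA/rad, matching the gains listed in Table 3.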

TABLE 3. Comparison of SSMMSE PD based CDR performance (more than one level)

| Monitored<br>Levels (V)                 | Charge Pump<br>Gain (μA/rad) | Settling<br>Time (μs) | Average Phase<br>in lock (rad) | RMS<br>Jitter (ps) | Peak to Peak<br>Jitter (ps) |
|-----------------------------------------|------------------------------|-----------------------|--------------------------------|--------------------|-----------------------------|
| Two mid-levels                          | 0.8                          | 0.41                  | 1.13                           | 2.8                | 10.53                       |
| Max. & min. levels                      | 0.8                          | 0.36                  | 1.36                           | 1.2                | 5.3                         |
| Max./min. level and<br>mid-level        | 0.8                          | 0.39                  | 1.27                           | 1.82               | 7                           |
| Two mid-levels and max./min. level      | 0.53                         | 0.42                  | 1.23                           | 1.9                | 7.4                         |
| Max. & min. levels<br>and one mid-level | 0.53                         | 0.39                  | 1.31                           | 1.24               | 5                           |
| Four levels                             | 0.4                          | 0.4                   | 1.28                           | 1.5                | 6                           |

Table 4 summarizes simulation results for linear, Alexander and SSMMSE PD based CDRs. The linear and Alexander PD based CDRs align the clock rising edge with the mid-point of the horizontal data eye opening, while the SSMMSE PD aligns it with the maximum vertical data eye opening (Fig. 8). Since these two points are not the same in the 4 Gsymbol/s 4-PAM data signal shown in Fig. 9, the SSMMSE clock rising edge is not aligned with the other two clock rising edges (Fig. 9). This also accounts for the difference in average phase in lock (Table 4).



Fig. 8. Horizontal & vertical eye openings for NRZ data.

TABLE 4. Comparison of CDR performance.

| Phase Detector<br>Type               | Charge Pump<br>Gain (μA/rad) | Settling<br>Time (μs) | Average Phase<br>in lock (rad) | RMS<br>Jitter (ps) | Peak to Peak<br>Jitter (ps) |
|--------------------------------------|------------------------------|-----------------------|--------------------------------|--------------------|-----------------------------|
| Linear                               | 2000                         | 0.36                  | 0.86                           | 0.5                | 1.8                         |
| Alexander                            | 420                          | 0.32                  | 0.85                           | 1.9                | 8.2                         |
| SSMMSE (two<br>levels = +1.5 & -1.5) | 0.8                          | 0.36                  | 1.36                           | 1.2                | 5.3                         |





## 4. CONCLUSIONS

Different CDR techniques for multilevel signals were presented. Simulation results predict that the linear PD based CDR has the lowest jitter and Alexander PD based CDR has the most jitter. Both CDRs require oversampling at twice the baud rate. In contrast, the SSMMSE PD based CDR requires only baud rate sampling of the received data and hence half the number of clock phases as that of the other two methods. The ultimate choice of the CDR method may depend upon the particular channel and shape of the received eye.

## REFERENCES

- [1] J. Khoury and K. Lakshmikumar, "High-speed serial transceivers for data communication systems," *IEEE Communications Magazine*, pp. 160-165, July 2001.
- [2] E. Lee and D. Messerschmitt, *Digital Communication*, Kluwer Academic Publishers, Massachusetts, 1997.
- [3] R. Farjad-Rad, C. Yang, M. Horowitz, and T.H. Lee, "A 0.3μm CMOS 8-Gb/s 4-PAM serial link transceiver," *IEEE Journal of Solid State Circuits*, vol. 35, no. 5, pp. 757-764, May 2000.
- [4] J. L. Zerbe, P.S. Chau, C. W. Werner, W. Stonecypher, H.J. Liaw, G.J. Yeh, T.P. Thrush, S.C. Best, and K.S. Donnelly, "A 2 Gb/s/pin 4-PAM parallel bus interface with transmit crosstalk cancellation, equalization, and integrating receivers," in *IEEE Int. Solid State Circuits Conf.*, Feb. 2001, pp. 66-67.
- [5] D. Johns and D. Essig, "Integrated circuits for data transmission over twisted-pair channels," *IEEE Journal of Solid State Circuits*, vol. 32, pp. 398-406, March 1997.
- [6] D. Johns and K. Martin, *Analog Integrated Circuit Design*, John Wiley & Sons, 1997.