# An 8mW Frequency Detector for 10Gb/s Half-Rate CDR using Clock Phase Selection

Mohammad Sadegh Jalali<sup>1</sup>, Ravi Shivnaraine<sup>1</sup>, Ali Sheikholeslami<sup>1</sup>, Masaya Kibune<sup>2</sup>, Hirotaka Tamura<sup>2</sup> <sup>1</sup>Department of Electrical and Computer Engineering, University of Toronto, Canada, <sup>2</sup>Fujitsu Laboratories Limited, Japan

Abstract—A half-rate single-loop CDR with a new frequency detection scheme is introduced. The proposed frequency detector selects between the clock phases (I and Q) to reduce cycle slipping, hence improving lock time and capture range. This frequency detector, implemented within a 10Gb/s CDR in Fujitsu 65nm CMOS, consumes only 8mW, but improves the capture range by up to  $3.6 \times$ . The measured capture range with the FD is from 8.675Gb/s to 11Gb/s.

## I. INTRODUCTION

Most clock and data recovery (CDR) circuits include two locking mechanisms: one for frequency and one for phase. Conventional frequency detectors (FDs) in reference-less CDRs are based on either the analog [1-2] or the digital implementation of the quadricorrelator architecture [3-7]. Digital quadricorrelator-based frequency detectors (DQFD) for half-rate CDRs typically sample four phases of the clock ($0^{\circ}$, $45^{\circ}$, $90^{\circ}$, and $135^{\circ}$) at each data edge to uniquely identify phase rotation, as illustrated in Fig. 1. Accordingly, as the data edge crosses the quadrant boundaries of the clock (CK), the FD asserts pulses, directly contributing to the charge-pump (CP) current, slowing down phase rotation, and pushing the VCO towards lock [3-4]. When the frequency error is close to zero, the FD becomes inactive, and the phase detector (PD) takes full control of the VCO. Due to concurrent operation, the two loops can interfere with each other [5, 6] and delay phase locking. In [6], the FD and PD loops are decoupled: the FD changes the VCO frequency by switching capacitors in and out of the tank. However, this complicates the VCO design.

In contrast, we propose an embedded FD, shown in Fig. 1, that uses one loop and two phases of the clock (I and Q). The proposed embedded FD affects the CP through the PD, instead of directly controlling the CP current, enabling the PD to deal with frequency offset. Prior to lock, the clock phases  $(CK_I \text{ and } CK_Q)$  into the PD are continually swapped (in a manner described in Section II) to reduce frequency error. We will show in this paper that this clock phase selection (CPS) scheme has a much lower power consumption and complexity than the DQFD. We implement the proposed FD within a half-rate CDR in 65nm CMOS, and demonstrate that the proposed FD, using only 8mW, on average increases the capture range by  $3.6 \times$  to 8.675Gb/s - 11Gb/s.

The rest of this paper is organized as follows. Section II introduces the basic idea of CPS-based frequency detectors. Section III presents the FD implementation along with the circuits used in the CDR. Simulation and experimental results from a test chip fabricated in a 65nm CMOS process are included in Section IV. Finally, Section V concludes the paper.



Fig. 1. Basic architecture of conventional and proposed half-rate FD

## II. PROPOSED FREQUENCY DETECTION TECHNIQUE

Fig. 2 compares the operation principle of a conventional half-rate DQFD [3-4] with that of the proposed FD in the presence of a large frequency offset. Assuming a positive frequency offset ( $f_{DATA} > f_{CK}$ ), the data phase in terms of the clock phase rotates clockwise, as shown in Fig. 2(a). In a conventional CDR, in regions 3-4 and 7-8, phase error is positive while in the other regions, it is negative. As a result, the PD output increases the VCO frequency in half of the regions and decreases it in the other half. This makes the net PD output near zero and causes cycle slipping. The DQFD, however, is able to detect the direction of this phase rotation by sampling all clock phases on the data edge and tracking the change in polarity of these samples. The FD then asserts pulses to oppose phase rotation. The sum of the PD and the FD outputs reduces frequency error [3-4].

Fig. 2. Operation principle of conventional (a) and proposed half-rate FD (b) and (c)

Fig. 2(b) shows the CP current ($\propto \Phi_{err}$) when $CK_I$ or $CK_Q$ is selected as the PD clock ($CK_{REC}$ in Fig. 1). In this figure, the net charge delivered to the loop filter with $CK_I$ and with $CK_Q$ is shown in blue (light) and red (dark), respectively. Here, as in the conventional case, cycle slipping occurs regardless of whether $CK_I$ or $CK_Q$ is used. However, we make a critical observation: since $CK_I$ and $CK_Q$ are 90° apart, at any given time the phase error with respect to either $CK_I$ or $CK_Q$ will move the VCO frequency in the correct direction. Therefore, if $CK_{REC}$ switches between $CK_I$ and $CK_Q$ at appropriate times, cycle slipping can be reduced (ideally avoided altogether), and the need for a secondary FD loop is obviated. This is done by comparing $CK_I$ and $CK_Q$ on the rising edge of the data in regions 1-2 and 5-6 and choosing the one closest to the eye center. We disable switching in regions 3-4 and 7-8. This is conceptually demonstrated in Fig. 2(b). We observe that the PD output in regions 1-2 and 5-6 averages to zero, while in regions 3-4 and 7-8 the PD output averages to a positive value. Overall, therefore, the VCO frequency moves in the direction of reducing the frequency offset over every cycle-slip period. Similarly, as shown in Fig. 2(c), when $f_{DATA} < f_{CK}$, switching between $CK_I$ and $CK_Q$ results in a net negative CP current, again moving the VCO frequency in the direction of reducing the frequency offset.
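The observation that one of the two clocks always yields a PD output of the desired sign can be illustrated numerically. The following is only a minimal sketch under stated assumptions: the PD is modeled as an ideal normalized sawtooth with a 180° period (one UI at half rate), and the selector is an idealized oracle that knows the sign of the frequency offset. It does not model the region-based switching rule of Fig. 2 or the actual circuit; the names `pd_sawtooth`, `pd_i`, `pd_q`, and `pd_selected` are hypothetical.

```python
import numpy as np

# Assumption: the half-rate PD characteristic repeats every 1 UI = 180 deg of clock phase.
PD_PERIOD_DEG = 180.0


def pd_sawtooth(phase_deg):
    """Assumed normalized linear PD output: a sawtooth in [-0.5, 0.5) with a 180-degree period."""
    return np.mod(phase_deg, PD_PERIOD_DEG) / PD_PERIOD_DEG - 0.5


# Data phase relative to the clock, sampled uniformly over one full rotation (bin midpoints).
phase = np.arange(0.25, 360.0, 0.5)

pd_i = pd_sawtooth(phase)         # PD clocked by CK_I
pd_q = pd_sawtooth(phase - 90.0)  # PD clocked by CK_Q, 90 degrees away

# With a fixed PD clock, the output averages to zero over a rotation (cycle slipping).
avg_i = pd_i.mean()
avg_q = pd_q.mean()

# Idealized oracle (not the paper's circuit): at each phase, use whichever clock
# gives the non-negative PD output, which is the desired sign when f_DATA > f_CK.
pd_selected = np.maximum(pd_i, pd_q)
avg_selected = pd_selected.mean()

print(f"average PD output, CK_I only:       {avg_i:+.3f}")
print(f"average PD output, CK_Q only:       {avg_q:+.3f}")
print(f"average PD output, ideal selection: {avg_selected:+.3f}")
```

In this toy model the single-clock averages vanish, while the selected output is never negative and has a nonzero mean, which is the net push on the VCO that the text describes. Replacing `np.maximum` with `np.minimum` gives the mirrored case for $f_{DATA} < f_{CK}$.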

As we will see later in this paper, compared to half-rate DQFDs, the proposed FD is simpler to implement, offers reduced lock time and increased capture range while consuming less power.

## III. CDR ARCHITECTURE AND IMPLEMENTATION

Fig. 3 shows a block-level implementation of the proposed half-rate FD embedded in the PD loop. We use a linear half-rate PD [8], along with a differential ring VCO. After a data rising edge, one of the two clock phases, $CK_I$ or $CK_Q$, is chosen by the proposed FD and fed back to the PD. If $CK_{REC}$ is positive at the rising edge of the data, the higher (in amplitude) of $CK_I$ and $CK_Q$ is chosen, while if it is negative, the lower of the two is chosen. Fig. 3 also shows an example where the edge falls in region 6, and the PD clock is swapped from $CK_I$ to $CK_Q$. This simple FD logic removes the need for a secondary FD loop, uses a total of only 45 transistors, and consumes $4.75 \times$ less power in simulations than the half-rate FDs in [3] and [4]. After the CDR locks, the FD becomes inactive, since the data edge occurs at the same phase and the swapping stops.
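The selection rule stated above can be written as a small decision function. This is a hedged sketch rather than the implemented logic: it assumes that "higher" and "lower" refer to the signed sampled values, it resolves ties in favor of $CK_I$ (an arbitrary choice), and it omits the regions where switching is disabled (Section II). The function name `select_pd_clock` and the sample values are hypothetical.

```python
def select_pd_clock(ck_rec, ck_i, ck_q):
    """Return 'I' or 'Q' from clock values sampled at a data rising edge.

    Assumptions (not from the paper): values are signed amplitudes,
    ties go to CK_I, and the switching-disabled regions are not modeled.
    """
    if ck_rec > 0:
        # CK_REC positive at the data edge: choose the higher of CK_I and CK_Q.
        return "I" if ck_i >= ck_q else "Q"
    # CK_REC negative at the data edge: choose the lower of the two.
    return "I" if ck_i <= ck_q else "Q"


# Hypothetical sampled values (ck_rec, ck_i, ck_q); CK_REC is currently CK_I in each case.
samples = [
    (0.4, 0.4, -0.7),    # positive, CK_I is higher -> keep CK_I
    (-0.3, -0.3, -0.9),  # negative, CK_Q is lower  -> swap to CK_Q
    (0.2, 0.2, 0.8),     # positive, CK_Q is higher -> swap to CK_Q
]
choices = [select_pd_clock(*s) for s in samples]
for s, c in zip(samples, choices):
    print(s, "->", "CK_" + c)
```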

While in a conventional FD, after phase locking, phase can deviate from its locked position by  $1UI_{pp}$  without activating the FD, in a CPS-based FD, this zone is reduced to  $0.5UI_{pp}$ . This is because the CPS-based FD uses the instantaneous phase information to feed the desirable clock phase to the phase detector. To solve this problem, the *FDlock* signal shown in Fig. 3 can be used to turn the CPS-based FD off.



Fig. 3. Half-rate linear CDR with the proposed half-rate FD

Fig. 4 shows the circuit implementation of the VCO. The VCO delay cell is based on a differential pair with a cross-coupled stage. The delay of each stage is controlled by $V_{TUNE}$, which adjusts the transconductance of the negative-$g_m$ stage, varying the delay. $V_{TUNE}$ is used in a differential fashion to maintain a constant common-mode level at the VCO output. The single-ended-to-differential converter circuit converts the single-ended $V_{CNT}$ to the differential $V_{TUNE}$.



Fig. 4. Circuit implementation of the VCO

Fig. 5 shows the circuit implementation of the charge-pump. The current associated with the error signal is twice as large as the current associated with the reference signal, due to the half-rate nature of the PD [8]. An on-chip DAC is used to adjust the CP current during capture range measurements.



Fig. 5. Circuit diagram of the charge-pump

## IV. SIMULATION AND EXPERIMENTAL RESULTS

Fig. 6(a) shows the behavioral simulation results of the system with a 3% frequency offset. The figure shows that the CDR does not acquire lock until the FD is turned on. Fig. 6(b) characterizes the response of the proposed FD versus frequency offset with a PRBS7 10Gb/s input and compares it to the response of the previous FD [3] with the same inputs, CP currents, and loop filter values. The average gain of the proposed FD is 1.9 times that of the conventional FD. Fig. 6(c) shows the lock time of the CDR with a PRBS7 10Gb/s input (defined as the time it takes for the CDR to start producing error-free data) with a conventional half-rate FD [3] and with a CPS-based FD, where both systems are simulated under the same conditions. On average, the CDR with the CPS-based FD locks 1.8 times faster than with the conventional frequency detector. In addition, the CDR with the conventional FD fails to lock for very large frequency offsets (indicated by the blue region), as predicted by the results in Fig. 6(b). Our behavioral simulations also show that the proposed FD locks 2.7 times faster than the FD in [4].

Fig. 7 shows the measured tuning range of the ring VCO, which extends from 3.94GHz to 6.25GHz.

Fig. 8(a) shows the measured recovered demultiplexed eye of the CDR with and without the FD. In both cases, we initialize the VCO frequency to 5GHz. With the FD off, the CDR does not lock to a PRBS7 input at 8.675Gb/s, while lock is acquired once the frequency detector is turned on. Fig. 8(b) shows the spectrum of the recovered CK before and after lock, where the incoming data rate is 9.7Gb/s. In the first case, the CDR loop is opened and the VCO frequency is held at 5GHz (corresponding to 10Gb/s). This forced frequency error causes constant swapping of $CK_I$ and $CK_Q$, creating spurs in the clock spectrum. Closing the loop locks the CDR; here, the clock spectrum is clean even though the FD is on. This is because the frequency error is zero and clock swapping no longer occurs.

Fig. 6. Behavioral simulation results for the proposed FD: (a) the system locks only after the FD is turned on, (b) normalized charge-pump current versus frequency error with a PRBS7 pattern, (c) lock time of the CDR with a PRBS7 pattern

Fig. 9 shows the measured capture range of the CDR (defined as $(f_{max}-f_{min})/10\,\text{Gb/s}$) with and without the FD, and its jitter tolerance (JT). The VCO frequency is initialized to 5GHz, and the PRBS7 data rate is swept. As expected, increasing the CP current improves the capture range. The maximum locking range of the CDR without the frequency detector is from 9.5Gb/s to 10.15Gb/s, while with the FD this range increases to 8.675Gb/s to 11Gb/s. Due to the bandwidth limitation of the test fixture, the maximum reliable data rate for measurements was found to be 11Gb/s. Limited CP swing, caused by charge-pump current sources entering triode, results in the CDR capture range being less than the VCO tuning range. The use of a linear phase detector further limits the CP swing. A bang-bang PD can be used if an even higher capture range is needed. The measured JT of the CDR for a BER less than $10^{-12}$ at 10Gb/s is $0.2UI_{pp}$ at high frequency.
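As a quick check of the capture-range definition above, the reported locking limits can be plugged into $(f_{max}-f_{min})/10\,\text{Gb/s}$. The sketch below uses only the numbers quoted in this section; the helper name `capture_range_percent` is hypothetical.

```python
NOMINAL_RATE_GBPS = 10.0  # normalization used in the capture-range definition


def capture_range_percent(f_min_gbps, f_max_gbps):
    """Capture range as a percentage of the nominal 10Gb/s data rate."""
    return (f_max_gbps - f_min_gbps) / NOMINAL_RATE_GBPS * 100.0


with_fd = capture_range_percent(8.675, 11.0)     # measured limits with the FD on
without_fd = capture_range_percent(9.5, 10.15)   # measured limits with the FD off
improvement = with_fd / without_fd

print(f"capture range with FD:    {with_fd:.2f}%")
print(f"capture range without FD: {without_fd:.2f}%")
print(f"improvement factor:       {improvement:.2f}x")
```

With these inputs, the capture range with the FD works out to the 23.25% listed in Table I, and the ratio of the two ranges is close to the $3.6 \times$ improvement quoted in the abstract.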

Fig. 8. (a) Demuxed eye for the CDR with and without the FD with an 8.675Gb/s PRBS7 input, (b) clock spectrum before and after lock

Fig. 9. Capture range and JT measurement results

The chip is fabricated in Fujitsu's 65nm CMOS process. The die photo is shown in Fig. 10. The CDR area is $350 \times 400 \mu m^2$, of which $100 \times 65 \mu m^2$ is occupied by the proposed FD. At a 1.2V supply and operating at 10Gb/s, the chip consumes a total power of 37.2mW when the FD is on and 28.8mW when the frequency detector is off. Hence, the CPS-based FD consumes only 8.4mW.

Finally, Table I summarizes the results and compares this work against previous work. Note also that simulating all frequency detectors in 65nm CMOS at 10Gb/s (with the same gates) results in a power consumption of 6mW for the proposed FD and 29.5mW and 28.6mW for the FDs in [3] and [4], respectively. Since the details of the designs in [6] and [7] are not available, they cannot be simulated, and their exact transistor counts cannot be obtained.
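The power figures quoted in this section can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the variable names are hypothetical.

```python
# Measured chip power at 1.2V and 10Gb/s, in mW.
measured_fd_on_mw = 37.2
measured_fd_off_mw = 28.8
measured_fd_power_mw = measured_fd_on_mw - measured_fd_off_mw

# Simulated FD power in 65nm CMOS at 10Gb/s, in mW.
sim_proposed_mw = 6.0
sim_prior_mw = {"[3]": 29.5, "[4]": 28.6}
sim_ratio = {ref: power / sim_proposed_mw for ref, power in sim_prior_mw.items()}

print(f"measured FD power: {measured_fd_power_mw:.1f} mW")
for ref, ratio in sim_ratio.items():
    print(f"simulated power of FD in {ref} relative to proposed FD: {ratio:.2f}x")
```

The measured difference reproduces the 8.4mW attributed to the FD, and the simulated ratios land between roughly 4.8 and 4.9, in the neighborhood of the $4.75 \times$ figure quoted in Section III.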

## V. CONCLUSION

In this paper, a novel clock-phase-selection-based frequency detector for half-rate CDRs was introduced. It was shown that by changing the PD clock (switching between $CK_I$ and $CK_Q$) at the right time, the phase detector can itself deal with frequency offset, removing the need for a secondary FD loop. Based on this idea, a chip was fabricated in Fujitsu's 65nm CMOS process. It was shown that the FD increases the capture range to 23.25% while consuming only 8mW. Our simulation and measurement results show that this inherent FD consumes much less power and area than its conventional equivalents.

Fig. 10. Die photo

TABLE I COMPARISON OF CDR RESULTS

| FD        | Type      | Tech. (nm) | Lock rate (Gb/s) | Lock range (%) | No. of transistors | FD power (mW) |
|-----------|-----------|------------|------------------|----------------|--------------------|---------------|
| [3]       | Half-rate | 180        | 3.125            | 11.52          | 156                | 30.6          |
| [4]       | Half-rate | 180        | 10               | 14.3           | 147                | 42.2          |
| [6]       | Full-rate | 65         | 10               | 30             | >73                | NA            |
| [7]       | Full-rate | 180        | 3.125            | 16             | >76                | 15.5          |
| This work | Half-rate | 65         | 10               | 23.25          | 45                 | 8.4           |

#### ACKNOWLEDGMENT

The authors would like to acknowledge CMC Microsystems for the provision of test equipment and CAD tools.

#### REFERENCES

- [1] D. Richman, "Color-Carrier Reference Phase Synchronization Accuracy in NTSC Color Television," *Proceedings of the IRE*, vol. 42, pp. 106-133, Jan. 1954.
- [2] B. Razavi, "A 2.5-Gb/sec 15-mW Clock Recovery Circuit," *IEEE Journal of Solid-State Circuits*, vol. 31, pp. 472-480, April 1996.
- [3] R. Yang, S. Chen, and S. Liu, "A 3.125-Gb/s Clock and Data Recovery Circuit for the 10-Gbase-LX4 Ethernet," *IEEE Journal of Solid-State Circuits*, vol. 39, pp. 1356-1360, Aug. 2004.
- [4] J. Savoj and B. Razavi, "A 10-Gb/s CMOS Clock and Data Recovery Circuit with a Half-Rate Binary Phase/Frequency Detector," *IEEE Journal of Solid-State Circuits*, vol. 38, pp. 13-21, Jan. 2003.
- [5] D. Dalton, S. Fallahi, M. Kargar, M. Khanpour, and A. Momtaz, "A 12.5-Mb/s to 2.7-Gb/s Continuous-Rate CDR With Automatic Frequency Acquisition and Data-Rate Readback," *IEEE Journal of Solid-State Circuits*, vol. 40, pp. 2713-2725, Dec. 2005.
- [6] N. Kocaman, K. Chai, E. Evans, M. Ferriss, D. Hitchcox, P. Murray, S. Selvanayagam, P. Shepherd, and L. DeVito, "An 8.5-11.5Gbps SONET Transceiver with Referenceless Frequency Acquisition," *IEEE Custom Integrated Circuits Conference*, pp. 1-4, Sep. 2012.
- [7] M. Lee and T. Lee, "A clock and data recovery circuit with wide linear range freq. detector," *IEEE International Symposium on VLSI Design, Automation and Test*, pp. 121-124, April 2008.
- [8] J. Savoj and B. Razavi, "A 10-Gb/s CMOS Clock and Data Recovery Circuit with a Half-Rate Linear Phase Detector," *IEEE Journal of Solid-State Circuits*, vol. 36, pp. 761-768, May 2001.