# A 32/16-Gb/s Dual-Mode Pulsewidth Modulation Pre-Emphasis (PWM-PE) Transmitter With 30-dB Loss Compensation Using a High-Speed CML Design Methodology

Horace Cheng, Member, IEEE, Faisal A. Musa, Member, IEEE, and Anthony Chan Carusone, Senior Member, IEEE

Abstract—Pulse-width modulation pre-emphasis (PWM-PE) is a relatively new technique for compensating severe losses in wireline channels by varying the duty cycle of the transmitted pulse. The technique has been demonstrated up to 5 Gb/s and requires high-speed digital logic to accommodate narrow pulses in the transmitted bit stream. This work targets data rates beyond 10 Gb/s and extends PWM-PE to 4-PAM signals in addition to binary mode transmission. The target speed is achieved by designing the transmitter using current mode logic (CML) blocks that combine relatively large logic swings and incomplete switching of the tail current. Implemented in a 0.13- $\mu$ m CMOS process to accommodate the wide output swing of 1.2 Vpp per side, the transmitter compensates up to 30 dB of loss at one-half the symbol rate and operates up to 16 Gsymbols/s.

*Index Terms*—CMOS, current mode logic (CML), pulse-amplitude modulation (PAM), pulsewidth modulation pre-emphasis (PWM-PE).

#### I. INTRODUCTION

PULSEWIDTH modulation pre-emphasis (PWM-PE) [1], [2] is used to compensate severe losses in wireline channels by shaping the transmitted pulse response. Pulse shaping is performed by varying the duty cycle of the pulse to achieve high-frequency boost and low-frequency attenuation. The optimal pulse duty cycle corresponds to a flat combined response of the transmitter and lossy channel. This technique has been demonstrated in CMOS to compensate 33 dB loss at 2.5 GHz [1] and 22 dB loss at 1.25 GHz [2]. Such enormous loss compensation capacity requires wide swing at the transmitter in order to avoid low amplitude signals at the receiving end (e.g., 30-mV receiver sensitivity with 30-dB loss channel requires a transmitter with at least 1-V swing). However, [1] and [2] report transmitter swings of 600 mVpp at 5 Gb/s and 700 mVpp at 4 Gb/s. This work achieves 30-dB loss compensation at data rates up to 16 Gb/s

Manuscript received February 23, 2009; revised May 08, 2009. First published June 10, 2009; current version published August 26, 2009. This work was supported by Gennum Corporation and NSERC. This paper was recommended by Guest Editor W. A. Serdijn.

H. Cheng was with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M4X 1G7, Canada. He is now with Synopsys Inc., Bedminster, NJ 07921-1537 USA (e-mail: horace.cheng@gmail.com).

F. A. Musa and A. C. Carusone are with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M4X 1G7, Canada (e-mail: faisal.a.musa@gmail.com; tcc@eecg.toronto.edu).

Digital Object Identifier 10.1109/TCSI.2009.2024903

and with twice the transmitter swing of [1]. This is also the first ever reported 4-PAM PWM-PE transmitter with a data rate of 32 Gb/s.

To safely avoid device breakdown, and without external transformers, a swing over 1 Vpp restricted the design to 130-nm CMOS technology. However, achieving data rates beyond 10 Gb/s with current mode logic (CML) gates in 130 nm requires circuits designed for very high speed. Moreover, PWM-PE requires particularly high-speed CML gates since PWM creates pulses that are a fraction of a bit period in duration. Hence, the speed requirements for this work are comparable to those achieved in [3]–[5]. Examining the CML design methodologies in [3]–[5], we find that they use very large logic swings. However, this contradicts the design prescriptions for high-speed CML found in [6]–[10]. After briefly describing the PWM-PE technique in Section II, we resolve the contradiction in Section III and arrive at a design methodology that combines relatively large logic swings and incomplete switching of the tail current to achieve very high-speed operation. Section IV describes the dual-mode 2-PAM/4-PAM transmitter [11] and its building blocks. Measurement results are presented in Section V, and conclusions are presented in Section VI.

#### II. PWM-PE

PWM-PE as proposed in [1] and [2] has the following pulse response:

$$p_{\rm PWM}(t) = \begin{cases} 0, & t < 0\\ 1, & 0 \le t < d \cdot T_s\\ -1, & d \cdot T_s \le t < T_s\\ 0, & t \ge T_s \end{cases}$$

where d and  $T_s$  denote the duty cycle and symbol period, respectively. PWM-PE uses the timing resolution within one symbol period to shape the transmit-pulse response, while keeping the transmitted waveform at its full-swing values. The pulse response comprises a positive portion of duration  $d \cdot T_s$  and a negative portion of duration  $(1-d) \cdot T_s$ . Fig. 1 shows PWM-PE pulses for different settings of the duty cycle d, with a normalized amplitude of  $\pm 1$ . PWM-PE with d = 50% is equivalent to a Manchester-encoded signal.

Fig. 2 shows the magnitude of the transmit spectrum as a function of the duty cycle d. Note that as the pre-emphasis setting d changes, the pulse spectrum no longer has nulls at integer multiples of  $1/T_s$ . Low-frequency components are suppressed,



Fig. 1. Time-domain pulse shape of PWM pre-emphasis; d = 100%, 80%, 52.5%.



Fig. 2. Pulse spectrum of PWM pre-emphasis; dashed: d = 100% (NRZ); circle: d = 80%; square: d = 52.5%.

while the high-frequency components are boosted, which serves to compensate for losses at high frequencies.
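As a rough numerical check of the spectral behaviour described above, the Fourier transform of the pulse response $p_{\rm PWM}(t)$ can be evaluated in closed form. The sketch below is illustrative only: the function name `pwm_pulse_spectrum` and the normalization $T_s = 1$ are assumptions made here and are not part of the original work.

```python
import numpy as np

def pwm_pulse_spectrum(f, d, Ts=1.0):
    """Fourier transform of the PWM-PE pulse: +1 on [0, d*Ts), -1 on [d*Ts, Ts)."""
    f = np.atleast_1d(np.asarray(f, dtype=float))
    w = 2.0 * np.pi * f
    # At f = 0 the transform equals the pulse area: d*Ts - (1 - d)*Ts.
    spectrum = np.full(f.shape, (2.0 * d - 1.0) * Ts, dtype=complex)
    nz = w != 0.0
    # Closed form for f != 0: (1 - 2 e^{-j w d Ts} + e^{-j w Ts}) / (j w)
    spectrum[nz] = (1.0 - 2.0 * np.exp(-1j * w[nz] * d * Ts)
                    + np.exp(-1j * w[nz] * Ts)) / (1j * w[nz])
    return spectrum

if __name__ == "__main__":
    freqs = np.array([0.0, 0.5, 1.0])  # in units of 1/Ts
    for d in (1.0, 0.8, 0.525):
        mag = np.abs(pwm_pulse_spectrum(freqs, d))
        print(f"d = {d:5.3f}: |P(f)| at f*Ts = 0, 0.5, 1 ->", np.round(mag, 3))
```

Under this idealized model the dc content scales as $|2d-1| \cdot T_s$, while the magnitude at $1/(2T_s)$ stays at $2T_s/\pi$ for every d, so lowering d reduces the low-frequency content relative to the content at half the symbol rate.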

Two architectures have been reported in literature [1], [2] to implement PWM-PE. The first approach, shown in Fig. 3(a) [1], uses a delay block and a logical-OR. The amount of delay determines the duty-cycle of the clock signal, and hence the amount of pre-emphasis in this implementation. This architecture requires two separate control voltages to adjust the duty-cycle. The second architecture, shown in Fig. 3(b), uses DC offset currents to control the clock's duty-cycle [2]. This approach is



Fig. 3. PWM-PE topology. (b) PWM-PE generation via DCC.



Fig. 4. Output pulses of the Gray-coded 4-PAM transmitter with PWM-PE.

better suited to generating high pre-emphasis since zero offset current yields 50% duty-cycle at any clock frequency.

The ability to disable pre-emphasis for low-loss channels or testing purposes is also an important consideration. From the circuit block diagrams, since the clock input is ac-coupled, it is evident the first approach [Fig. 3(a)] can only create a 100% duty-cycle (no pre-emphasis) if the delay is greater than half of the clock period. Using the second approach [Fig. 3(b)], however, pre-emphasis is disabled by disconnecting the clock input and applying a large offset current to the DCC. Based on the simpler circuit implementation and the ability to disable pre-emphasis reliably, the architecture with duty-cycle control (DCC) is chosen in this work.

Although previously only applied to binary signals, PWM-PE is a linear operation and, hence, can be applied to a 4-PAM system. The corresponding 4-PAM symbol pulse shapes are shown in Fig. 4, assuming a Gray line code. The 2-bit inputs M and L are mapped to Gray line code bits A, B and C using the following logic:

$$A = M \cdot \overline{L} \tag{1}$$

$$B = M = M \cdot 1 \tag{2}$$

$$C = M + L = \overline{\overline{M} \cdot \overline{L}} \tag{3}$$

where the MSB input is denoted by M and the LSB input is denoted by L. Each of the three bits A, B, and C is combined



Fig. 5. Circuit simulation setup. (a) Cascade of CML buffers with a fanout, k = 2 used for circuit simulation. (b) Schematic of CML buffer used for circuit simulation.

with a variable-duty-cycle clock and passed through XOR gates to realize PWM pulses. The pulses are then linearly combined to generate the PWM-PE signal. Note that the Gray encoder also facilitates operation in binary (2-PAM) mode. If the LSB is set to logic 0, (1)–(3) reduce to A = B = C = MSB. Hence, the three outputs of the encoder switch in unison with the MSB input, thereby generating a full-swing binary output.
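The encoder logic in (1)–(3) can be checked with a few lines of code. The following is a minimal sketch; the function name `gray_thermometer_encode` is hypothetical, and the output level is taken as the count of active bits on the assumption that A, B, and C are equally weighted when combined.

```python
def gray_thermometer_encode(m, l):
    """Map Gray-coded input bits (M, L) to thermometer bits (A, B, C) per (1)-(3)."""
    a = m & (1 - l)   # A = M AND (NOT L)
    b = m             # B = M
    c = m | l         # C = M OR L
    return a, b, c

if __name__ == "__main__":
    # The Gray sequence 00, 01, 11, 10 should activate 0, 1, 2, 3 thermometer bits.
    for m, l in [(0, 0), (0, 1), (1, 1), (1, 0)]:
        a, b, c = gray_thermometer_encode(m, l)
        print(f"M={m} L={l} -> A={a} B={b} C={c} level={a + b + c}")

    # Binary (2-PAM) mode: with L = 0 all three outputs follow M.
    for m in (0, 1):
        print(f"L=0, M={m} ->", gray_thermometer_encode(m, 0))
```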

#### III. CML DESIGN METHODOLOGY IN CMOS

In this work, the various logic gates in the PWM-PE transmitter were implemented using current mode logic (CML). In this section we describe and compare conventional CML design and the high-speed CML design methodology used for designing logic gates in the implemented PWM-PE transmitter. It is shown that the high-speed CML gates are superior in terms of speed, swing, and silicon area.

#### A. Background: Square Law CML Design

CML circuits are commonly used in pre-driver, delay-locked loop (DLL), and clock distribution circuits. In these cases, a cascade of roughly constant fanout buffers is employed. Hence, this is the basis of the simulation setup, shown in Fig. 5, that is used to investigate the design methodology of high-speed CML logic. All simulations in this section are performed in 0.13  $\mu$ m CMOS at the typical (tt) design corner and at a temperature of 75 °C. The schematic of a generalized CML buffer is shown in Fig. 5(b). The tail currents (=  $I_{TAIL}$ ), device widths (=  $W \cdot n$ ), and load resistors (= R) are scaled by the fanout factor (= k = 2) along the cascade of buffers. Each CML buffer has the same current density  $J = I_{TAIL}/W$  and full-switching voltage  $\Delta V_{MAX} = I_{TAIL}R$ . Assuming a source-to-body voltage of 400 mV, the threshold voltage of the differential pair devices is 480 mV.

Generally, either the load resistor or tail current of the last stage is fixed by the desire to achieve a particular time constant, current level, or matching resistance at the output. Hence, the design of the entire cascade of buffers boils down to choosing  $\Delta V_{\text{MAX}} = I_{\text{TAIL}}R$  and W.

Much prior work has analyzed CML circuits using a small-signal analysis [6]–[8]. Assuming a square-law MOS model, in order to ensure that the tail currents in all buffers are fully switched to one side of the differential pair, it may be shown that [6], [8]

$$A_v = g_m \cdot R \ge \sqrt{2} = 3 \text{ dB} \tag{4}$$

$$V_t \ge I_{\text{TAIL}} \cdot R = \Delta V_{\text{MAX}}. \tag{5}$$

In (4),  $g_m$  is the small-signal transconductance of the differential pair devices with zero differential input. The threshold voltage,  $V_t$ , in (5) is that of the differential pair devices.

Hence, when a square-law MOS model is assumed and full switching of the differential pair devices is required, CML design proceeds as follows:  $\Delta V_{MAX}$  is chosen close to  $V_t$ . Then the width W of the differential pair devices is chosen to achieve a small-signal gain of at least 3 dB. Rewriting (4) in terms of the device width gives

$$A_v = R \sqrt{\frac{\mu_n C_{\text{ox}} I_{\text{TAIL}} W}{L}} \ge \sqrt{2} = 3 \text{ dB} \tag{6}$$

$$\Rightarrow W \ge \frac{2L}{\mu_n C_{\text{ox}} I_{\text{TAIL}}} \cdot \frac{1}{R^2}. \tag{7}$$

Here L is the gate length of the differential pair devices (assumed to be the minimum permitted by the given CMOS technology),  $\mu_n$  is the mobility of the n-channel devices, and  $C_{\text{ox}}$  is the oxide capacitance. Once the device width W,  $\Delta V_{\text{MAX}}$ , the fanout ratio k, and the load resistance R of the last stage are known, the design parameters of all stages in the cascade ( $I_{\text{TAIL}}$ , R, and W) may be computed.

This approach is limited for very high-speed CML design in several respects. First, the requirement that the CML buffers fully switch all their tail current can be relaxed. If some current is permitted to flow in the "off" branch of a CML buffer, the resistance R of the preceding stage can be reduced, thus decreasing the time constant at that node and increasing the speed of the cascade. Second, the speed of a CML gate is largely influenced by its open-circuit time constant (OCTC) at the output nodes [12], [13]. With the assumption that transistor device parasitic capacitances are the dominant form of load capacitance at the output node of a CML buffer, the total load capacitance is proportional to the widths of the input and output transistors connected to that node. Hence, the OCTC at the output node is roughly proportional to W and R:

$$\tau \propto RW. \tag{8}$$

Maintaining the square-law MOS model, (6) may be rearranged to show that

$$A_v \propto R\sqrt{W}. \tag{9}$$

Based on (8) and (9), it is possible to reduce the *RC* time constant of a CML stage without affecting its small-signal gain by simultaneously increasing its load resistance by some factor x and decreasing its width by a factor  $x^2$  (hence, increasing its current density). In fact, this observation is consistent with several published high-speed CML designs [3]–[5]. The CML circuits



Fig. 6. VTC of CML circuits in different operating regions. (a) Full-switching region. (b) Soft-switching region. (c) Attenuating region.

in these published works all employ high current densities to reduce load capacitances. The values of  $\Delta V_{\text{MAX}}$  of these CML designs exceed the device threshold voltage, in contradiction to (5). Of course, biasing the differential pair devices with such high current densities reduces the accuracy of the square-law MOS model that underlies (4) and (5) in the first place. A simulation-based design methodology is therefore required.
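The scaling argument based on (8) and (9) can be written out explicitly. As a worked step under the same square-law proportionalities, scale the load resistance to $R' = xR$ and the width to $W' = W/x^2$ with $x > 1$ (the primed symbols are notation introduced here for illustration only):

$$A_v' \propto R'\sqrt{W'} = xR \cdot \frac{\sqrt{W}}{x} = R\sqrt{W}, \qquad \tau' \propto R'W' = xR \cdot \frac{W}{x^2} = \frac{RW}{x}.$$

The small-signal gain is unchanged while the time constant falls by the factor x. If the tail current is held fixed, $\Delta V_{\text{MAX}} = I_{\text{TAIL}} R'$ grows by the same factor x, which is consistent with the large swings observed in the designs cited above.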

#### B. High-Speed CML Design

This section describes a high-speed CML design methodology that does not require full-switching of the tail current in a CML buffer, and uses high current densities to minimize device sizes.

Fig. 6 shows three possible voltage transfer characteristics (VTCs) of CML buffers. VTCs illustrate the large-signal input-output characteristics of a CML stage graphically and can be easily obtained from a circuit simulator. The small-signal gain is given by the slope of the VTC at the origin. In Fig. 6(a), the CML gate is designed to provide full switching. In this case even a small input voltage will result in the full-switching output swing of  $\Delta V_{MAX}$  after a few stages. Under the square-law assumption, a small-signal gain of 3 dB is required to ensure this full-switching operation. In Fig. 6(c), the CML gate has a small-signal gain less than unity. This region of operation is generally avoided by circuit designers since it results in a signal with very low swing at the final output of a chain of CML buffers. VTCs as depicted in Fig. 6(b) fit between these two extremes and will be referred to as soft-switching CML operation.

Characteristics of the soft-switching region are as follows.

- 1) The small-signal gain is between 0 dB and 3 dB.
- 2) A small amount of current flows through the "off" device in the CML differential pair. As a result, the output voltage is less than the full-switching value of  $\Delta V_{\text{MAX}}$ . In fact, the output voltage is given by  $V_{\text{out}} = (I_{\text{on}} - I_{\text{off}}) \cdot R$ .
- 3) The logic signal will be given by  $V^*$ , defined graphically in Fig. 6(b) as the point where the VTC intersects with the unity-gain line  $V_{\rm in} = V_{\rm out}$ . In a full-switching CML buffer, as in Fig. 6(a),  $V^* = \Delta V_{\rm MAX}$ . But in a soft-switching buffer,  $V^* < \Delta V_{\rm MAX}$ , which implies that the current in the on branch of the buffer is less than the total tail current.



Fig. 7. The three regions of operation for CML circuits.

In a cascade of soft-switching CML buffers with identical VTCs, as more stages are added to the cascade, the output voltage eventually settles to  $V^*$  as inputs above  $V^*$  are compressed while inputs below are amplified. This is shown in Fig. 7. In contrast, the attenuating region results in a diminishing output voltage, and full switching results in an output swing of  $\Delta V_{\text{MAX}}$ .
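The convergence of a cascade toward $V^*$ can be illustrated with a toy model. The sketch below assumes a tanh-shaped VTC, which is a stand-in chosen here for simplicity and not the simulated characteristic of the buffers in this work; the names `toy_vtc` and `cascade_output` are hypothetical, and the resulting $V^*$ values will not match those reported for the actual circuits.

```python
import math

def toy_vtc(v_in, gain, dv_max=1.0):
    """Hypothetical tanh-shaped VTC with small-signal gain `gain` at the origin."""
    return dv_max * math.tanh(gain * v_in / dv_max)

def cascade_output(v_in, gain, n_stages, dv_max=1.0):
    """Pass an input amplitude through n identical stages; return the final amplitude."""
    v = v_in
    for _ in range(n_stages):
        v = toy_vtc(v, gain, dv_max)
    return v

if __name__ == "__main__":
    for gain in (1.3, 0.8):        # above and below unity small-signal gain
        for v0 in (0.1, 1.0):      # small and large starting amplitudes
            v = cascade_output(v0, gain, n_stages=60)
            print(f"gain={gain}, start={v0}: amplitude after 60 stages = {v:.4f}")
```

With a gain above unity, both starting amplitudes settle to the same fixed point of the VTC (the intersection with $V_{\rm in} = V_{\rm out}$), whereas a gain below unity drives the amplitude toward zero, mirroring the soft-switching and attenuating regions described above.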

Note that allowing soft-switching operation permits reduced load resistance of CML gates compared to that required to maintain full-switching. Low values of resistance are desirable for high-speed CML design as they translate to reductions in the RC time constants.

Having defined a suitable region of operation we now focus on our choices for  $\Delta V_{MAX}$  and device width, W. The square-law design methodology, with its emphasis on maintaining the saturation region of operation, is too conservative for high-speed CML design. Referring to (5), the square law model limits the  $\Delta V_{MAX}$  to  $V_t$ . The validity of the square law model is questionable for high-speed CML design with high current densities and/or nano-scale MOSFETs since velocity saturation and/or critical vertical electrical fields in MOSFETs may result in sub-quadratic voltage-current relationships. Furthermore, there is no strong reason to avoid operating the differential pair devices in triode. Hence, if we alleviate the requirement imposed by (5), the only limit on  $\Delta V_{MAX}$  is that imposed by the finite supply voltage and headroom required to

|                           | Square-law design | High-speed design |
|---------------------------|-------------------|-------------------|
| Tail current              | 5 mA              | 5 mA              |
| R                         | 80 Ω              | 160 Ω             |
| W                         | $54\ \mu$m        | $17\ \mu$m        |
| $\Delta V_{MAX}$          | 0.4 V             | 0.8 V             |
| $V^*$                     | 0.4 V             | 0.78 V            |
| Relative RC time constant | 1.0               | 0.63              |
| dc small-signal gain      | 3 dB              | 2.2 dB            |

TABLE I COMPARISON OF CML DESIGN METHODOLOGIES

keep the tail current device from entering triode. In summary, the following procedure is advocated to achieve the highest possible operating speed for CML logic.

- Step 1) Select  $\Delta V_{\text{MAX}} = I_{\text{TAIL}}R$  as large as possible given supply and headroom constraints.
- Step 2) Simulate the VTC while parametrically sweeping the device width W.
- Step 3) Select the smallest width W that robustly provides a logic swing  $V^*$  with sufficient noise margin.
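Steps 2 and 3 amount to a search over candidate widths. The following sketch shows only that selection logic; `select_min_width` is a hypothetical helper, and `vstar_of_width` stands for whatever routine returns the simulated $V^*$ for a given width (a circuit-simulator run in practice). The placeholder function used in the example is not a device model.

```python
def select_min_width(widths_um, vstar_of_width, vstar_required):
    """Return the smallest candidate width whose V* meets the requirement, or None."""
    for w in sorted(widths_um):
        if vstar_of_width(w) >= vstar_required:
            return w
    return None

def placeholder_vstar(width_um):
    """Placeholder only: stands in for a simulated V* (in volts); NOT a device model."""
    return min(0.8, 0.05 * width_um)

if __name__ == "__main__":
    candidates = [18, 2, 10, 14, 6]   # arbitrary candidate widths in micrometres
    print(select_min_width(candidates, placeholder_vstar, vstar_required=0.75))
```

To reflect the word "robustly" in Step 3, the same search could be repeated with the $V^*$ obtained at each process and temperature corner, keeping the largest of the resulting widths.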

## C. Simulation Study

This section presents simulation results to validate the discussions in the previous sections. All simulation results are based on the simulation setup shown in Fig. 5. To accommodate the large swing requirements of the design, the supply voltage was set to 1.8 V. This is consistent with high-speed CML designs reported in [14] and [15] in 0.13  $\mu$ m CMOS.

Using the square-law design methodology, the full-switching voltage of  $\Delta V_{\text{MAX}}$  is set as close as possible to  $V_t$  of the differential pair devices. Since  $V_t = 480 \text{ mV}$ , we choose  $\Delta V_{\text{MAX}} = 400 \text{ mV}$ . Given a tail current of 5 mA, the load resistance is  $R = 80 \Omega$ , and the small-signal voltage gain of the buffer reaches 3 dB for a differential pair device width of  $W = 54 \mu \text{m}$ .

Using the high-speed design methodology, the full-switching voltage of  $\Delta V_{\text{MAX}}$  is set to 800 mV, just enough to provide the tail transistor 400 mV headroom. For the 0.13  $\mu$ m CMOS technology used in this work, the drain-source breakdown voltage was 1.6 V. Thus, for a swing of 800 mV, the drain voltage of the device is 1.8 V - 0.8 V = 1 V. Assuming a 400 mV drop across the tail transistor, the drain-source voltage of the input transistor is 1 V - 400 mV = 600 mV, which is well below the breakdown voltage.

For the same tail current, 5 mA, an 800 mV swing implies a load resistance of 160  $\Omega$ , double that used for the square-law model. A parametric sweep of the device width W yields  $W = 17 \,\mu\text{m}$  with  $V^* = 780 \,\text{mV}$  at dc. The device width is more than 3× smaller than that required by the square-law methodology, reducing the RC product of the high-speed design by approximately 37% compared to the square-law design in spite of the larger load resistance and output swing. It has a lower small-signal gain, 2.2 dB, as a consequence of soft-switching operation. A larger value of load resistance would be required for a gain of 3 dB or higher, but that would degrade the RC time constant. A comparison of the two designs is summarized in Table I. The values in Table I show the following.

1) For a fixed power dissipation, a larger swing and a higher speed are possible by following the high-speed design methodology. Note that the larger the output swing of a



Fig. 8. 3-D VTCs of CML circuits. (a) Square law design:  $W = 54 \ \mu$ m,  $\Delta V_{\text{MAX}} = 400 \text{ mV}$ . (b) High-speed design:  $W = 17 \ \mu$ m,  $\Delta V_{\text{MAX}} = 800 \text{ mV}$ .

CML buffer, the lower the output common-mode level. As long as the tail transistor in the next stage is in saturation, the low output common mode of the previous stage should not be a problem. For multiple-input gates (such as the XOR gates used in this work; Section IV-D), stacked transistors may cause the tail transistor to enter triode. For these gates, the swing  $\Delta V_{\text{MAX}}$  was lowered to 600 mV and a chain of 800 mV swing CML buffers was added to the output of the gate to restore the swing back to 800 mV.

2) The high-speed CML buffers require smaller transistors and larger resistors and hence occupy less silicon area. Note that smaller resistances normally lead to larger parasitic capacitance since they are normally realized by placing several resistors in parallel. This also results in larger wiring capacitance and larger area in the integrated circuit. Hence, larger resistances are advantageous for several reasons.
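The entries of Table I can be cross-checked with simple arithmetic. The sketch below recomputes $\Delta V_{\text{MAX}} = I_{\text{TAIL}}R$ and the relative RC time constant from the tabulated R and W, taking $\tau \propto RW$ as in (8); the function names are hypothetical.

```python
def full_switching_swing(i_tail_a, r_ohm):
    """Delta V_MAX = I_TAIL * R, in volts."""
    return i_tail_a * r_ohm

def relative_rc(r_ohm, w_um, r_ref_ohm, w_ref_um):
    """Ratio of R*W products, using tau proportional to R*W as in (8)."""
    return (r_ohm * w_um) / (r_ref_ohm * w_ref_um)

if __name__ == "__main__":
    i_tail = 5e-3                                  # 5 mA tail current for both designs
    square_law = {"R": 80.0, "W": 54.0}            # ohms, micrometres (Table I)
    high_speed = {"R": 160.0, "W": 17.0}           # ohms, micrometres (Table I)

    print("Square-law swing (V):", full_switching_swing(i_tail, square_law["R"]))
    print("High-speed swing (V):", full_switching_swing(i_tail, high_speed["R"]))
    ratio = relative_rc(high_speed["R"], high_speed["W"],
                        square_law["R"], square_law["W"])
    print(f"Relative RC of high-speed design: {ratio:.2f}")
    print(f"Width ratio: {square_law['W'] / high_speed['W']:.2f}x")
```

The RW ratio evaluates to about 0.63, matching the tabulated relative RC time constant and the approximately 37% reduction quoted in the text.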

Until now, the VTCs discussed have been simulated at dc. To compare the high-speed performance of the two designs, the VTCs can be plotted at high frequencies. This is done in simulation by exciting the CML circuit with a single tone sinusoid and observing its output. By varying the input amplitude and measuring the output amplitude it is possible to obtain the entire VTC at any particular frequency. This data is represented



Fig. 9.  $V^*$  versus frequency for a fanout, k = 2; square law design (solid), high-speed design (dashed).

across a wide range of frequencies in Fig. 8 for the two designs assuming a fanout of k = 2. Note that the output swings are normalized to  $\Delta V_{\text{MAX}}$  in each case. The intersection of these surface plots with the plane  $V_{\text{in}} = V_{\text{out}}$  results in the plots of frequency-dependent  $V^*$  for the two designs in Fig. 9. Note that  $V^*$  for the square-law design drops below 80% of its dc value at 2.5 GHz. On the other hand, the high-speed CML design maintains  $V^* > 0.8 \times \Delta V_{\text{MAX}}$  up to 3.7 GHz.

Although it provides a significant speed improvement over the square law design methodology, the high-speed CML design discussed so far is not suited for target speeds beyond 10 GHz in 0.13  $\mu$ m CMOS. To further improve the speed, designers can introduce inductive peaking and reduce the fan-out ratio, k. Inductive peaking [16] is commonly used to increase the bandwidth of CML. By introducing inductors with values of  $L = CR^2/3.1$ , it is possible to increase the 3-dB bandwidth of the CML stages by 60%. Further improvement in speed can be achieved by reducing the fan-out ratio k at the expense of power and area. As the fanout ratio is reduced, more stages are required to provide sufficient drive strength at the output of a cascade. Fig. 10 is a plot of  $V^*$  versus frequency for CML design with  $\Delta V_{\text{MAX}} = 800 \text{ mV}$  and  $W = I_{\text{TAIL}} \cdot 3.33 \mu \text{m/mA}$ with fanout ratios of k = 1.5 and k = 1.2 and peaking inductors chosen based on the formula  $L = CR^2/3.1$  where C and R are both functions of the fanout, k. The effectiveness of inductive peaking is illustrated by plotting  $V^*$  versus frequency without any inductive peaking on the same graph for fanout ratio of k = 1.5. Note that for a fixed fanout ratio of k = 1.5,  $V^*$ equals 600 mV at 15 and 23 GHz without and with inductive peaking, respectively. Hence, inductive peaking improves the large-signal speed of the CML logic by 53% in this case.

Based on Fig. 10, the cascade of buffers with k = 1.5 is sufficient for those parts of the transmitter system that require a signal bandwidth up to 10 GHz. However, for clock distribution requiring 20 GHz signal bandwidth, only the cascade of buffers with k = 1.2 is suitable. Furthermore, larger inductor values given by  $L = CR^2/2.4$  are used in the clock path, where non-linear phase response can be tolerated in exchange for greater bandwidth extension. Finally, since the design parameters of



Fig. 10. Effect of fanout, k and inductive peaking on  $V^*$  versus frequency plots; fanout k = 1.5 without inductive peaking (dotted); fanout k = 1.5 with inductive peaking (solid); fanout k = 1.2 with inductive peaking (dashed).



Fig. 11. Block diagram of transmitter.

CML circuits are closely linked by simple design equations, a CML cascade can be designed quickly once the fanout ratio and the design of the first stage in the cascade are determined. The design equations for the parameters of a CML buffer ( $W_2$ ,  $I_2$ ,  $R_2$ , and  $L_2$ ) in terms of the preceding buffer's parameters ( $W_1$ ,  $I_1$ ,  $R_1$ , and  $L_1$ ) and the fanout, k, are given as follows:

$$W_2 = k \cdot W_1 \tag{10}$$

$$I_2 = k \cdot I_1 \tag{11}$$

$$R_2 = \frac{\Delta V}{I_2} = \frac{\Delta V}{k \cdot I_1} = \frac{R_1}{k} \tag{12}$$

$$L_2 = \frac{L_1}{k}.\tag{13}$$
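Equations (10)–(13) lend themselves to a short script that sizes a whole cascade from its first stage. This is a minimal sketch: the helper names are hypothetical, the first-stage W, I, and R are taken from the high-speed column of Table I, and the first-stage inductance of 1 nH is an arbitrary placeholder rather than a value from this work.

```python
def next_stage(stage, k):
    """Apply (10)-(13): width and current scale up by k, resistance and inductance down by k."""
    return {
        "W": stage["W"] * k,   # (10)
        "I": stage["I"] * k,   # (11)
        "R": stage["R"] / k,   # (12)
        "L": stage["L"] / k,   # (13)
    }

def build_cascade(first_stage, k, n_stages):
    """Return a list of n_stages parameter sets, starting from first_stage."""
    stages = [dict(first_stage)]
    for _ in range(n_stages - 1):
        stages.append(next_stage(stages[-1], k))
    return stages

if __name__ == "__main__":
    # W in micrometres, I in amperes, R in ohms, L in henries (L is a placeholder value).
    first = {"W": 17.0, "I": 5e-3, "R": 160.0, "L": 1e-9}
    for n, s in enumerate(build_cascade(first, k=1.5, n_stages=4), start=1):
        swing = s["I"] * s["R"]
        print(f"stage {n}: W={s['W']:.1f} um, I={s['I']*1e3:.2f} mA, "
              f"R={s['R']:.1f} ohm, L={s['L']*1e12:.0f} pH, swing={swing:.2f} V")
```

The swing $I \cdot R$ and the current density $I/W$ stay constant along the cascade. The scaling in (13) is also consistent with $L = CR^2/3.1$ if the load capacitance C is assumed to scale with device width, since C grows by k while $R^2$ shrinks by $k^2$.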

Temperature and process variation deteriorate the performance of the CML buffer chain and may cause the CML buffer to operate in the attenuating region. In order to avoid this undesired effect and ensure robustness, the entire transmitter chip was designed using a slow process corner and at 100 °C. Process and temperature effects on CML transfer curves are illustrated in Fig. 14 for the CML buffers used in the critical clock path.

Device mismatch produces an offset voltage in CML buffers [6]. If the offset is large enough, it can cause logical errors. However, since the transmitter chip uses relatively large width devices to ensure large current handling capacity, the resulting



Fig. 12. Schematic of DCC stage.

mismatch-induced offset was small. Moreover, the high-speed design methodology results in large logic levels ensuring that transmitted bits will not be corrupted by the offset.

#### IV. APPLICATION TO 2-PAM/4-PAM TRANSMITTER

This section describes the application of the high-speed CML design methodology to a 2-PAM/4-PAM transmitter [11]. The transmitter uses pulsewidth modulation based pre-emphasis (PWM-PE) [2] to combat losses introduced by a channel at data rates up to 32 Gb/s. The fundamental parameter that controls the strength of pre-emphasis is the duty cycle of the clock.

Fig. 11 illustrates the block diagram of a 32/16 Gb/s 4-PAM/2-PAM transmitter with PWM-PE [11]. Input buffers were inserted for broadband single-ended to differential conversion of data and clock inputs. A binary-to-thermometer Gray code encoder is implemented in CML. The clock input passes through a duty cycle control (DCC) circuit, and its output is combined with the thermometer Gray-coded data streams at the XOR gates to create three PWM-PE data streams. These are each passed through a cascade of CML buffers to drive three parallel CML differential output stages connected to a common load. This section describes the design of each logic gate based on the high-speed CML design methodology discussed so far.

## A. Cascades of Buffers

Cascades of CML buffers are required for two reasons in this design. Firstly, they are used to provide single-ended to differential conversion for the data and clock inputs. Secondly, they are used to buffer the outputs of complex logic blocks (duty cycle control, Gray encoder, XOR) which have high self-loading and, hence, limited drive capability.

The buffers at various stages of the transmitter have different bandwidth requirements due to their different signal contents. Data signals MSB, LSB, A, B, and C have bandwidth requirements of 10 GHz. Due to the broadband nature of the data signals, inductive peaking for linear phase response is used along the data signal paths. Inductors with a value  $L = CR^2/3.1$ can extend the 3-dB bandwidth of a circuit by 60% while maintaining a linear phase response up to its 3-dB bandwidth [16]. A fanout of 1.5 is the maximum for which the value of  $V^*$  remains close to the target swing of 780 mV up to 10 GHz.

Clock signals require twice as much bandwidth, 20 GHz. Maximally flat inductive peaking is used in the clock path to provide extra bandwidth. These inductors have a value  $L = CR^2/2.4$  and can extend the 3-dB bandwidth of a circuit by 72%. The additional bandwidth is gained at the cost of non-linear phase response. Since the clock is a single-tone, the resulting non-linear phase response is not a problem. Furthermore, a lower fanout value of 1.2 is chosen.
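The bandwidth-extension figures quoted for the two inductor choices can be checked against a simple shunt-peaking model. The sketch below assumes a load consisting of a resistor R in series with an inductor L, in parallel with a single capacitance C, and writes $L = CR^2/m$; this first-order model and the function names are assumptions made here for illustration.

```python
def shunt_peaked_gain_sq(x, m):
    """|Z(jw)/R|^2 for a shunt-peaked RC load, with x = w*R*C and L = C*R^2/m."""
    num = 1.0 + (x / m) ** 2
    den = (1.0 - x * x / m) ** 2 + x * x
    return num / den

def bandwidth_extension(m, lo=0.1, hi=10.0, iters=100):
    """Bisection for the -3 dB point, relative to the unpeaked bandwidth 1/(R*C)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if shunt_peaked_gain_sq(mid, m) > 0.5:
            lo = mid      # still above half power: the -3 dB point lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for m in (3.1, 2.4):
        print(f"L = C*R^2/{m}: bandwidth extension factor = {bandwidth_extension(m):.2f}")
```

Under this model the two settings give extension factors close to the 60% and 72% improvements stated above, with the larger inductor ($m = 2.4$) providing the greater extension.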

## B. Duty Cycle Control (DCC)

Fig. 12 is the schematic of the DCC CML circuit. The differential inputs,  $V_{\text{offset}+}$  and  $V_{\text{offset}-}$ , are derived from a single off-chip control voltage by a single-ended to differential conversion circuit.<sup>1</sup> These inputs introduce a dc offset current which adjusts the duty cycle of the differential clock applied at CLK+ and CLK-. The switching differential pair has a tail current of 16 mA while the offset differential pair has a tail current of 13 mA. The target duty-cycle tuning range is between 50% and 75%. When the offset pair is balanced, the balanced dc offset current introduces no distortion to the switching signal. A positive differential input in the offset pair introduces offset current in the load resistance such that the zero-crossing is changed to increase the positive duty cycle of the clock waveform. Due to the presence of static current introduced by the offset differential pair, the extra voltage drop at the load resistors reduces the voltage headroom available to the transistors. This issue is addressed by reducing the output voltage swing from 800 to 600 mV. The dc bias voltages of the DCC stage in its balanced state are shown in Fig. 12.
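The relation between the dc offset and the resulting duty cycle can be sketched for an idealized case. The example assumes a purely sinusoidal clock of amplitude `clk_amplitude` at the DCC output, a dc offset `v_offset` added to it, and ideal limiting in the following stages; these assumptions and the function name are introduced here for illustration and do not describe the actual circuit waveforms.

```python
import math

def duty_cycle_from_offset(v_offset, clk_amplitude):
    """Fraction of the period for which A*sin(wt) + v_offset is positive."""
    ratio = v_offset / clk_amplitude
    if ratio >= 1.0:
        return 1.0        # offset exceeds the clock amplitude: output never crosses zero
    if ratio <= -1.0:
        return 0.0
    return 0.5 + math.asin(ratio) / math.pi

if __name__ == "__main__":
    for ratio in (0.0, 0.3, math.sin(math.pi / 4), 1.0):
        d = duty_cycle_from_offset(ratio, 1.0)
        print(f"offset/amplitude = {ratio:.3f} -> duty cycle = {100 * d:.1f}%")
```

In this idealization a zero offset gives a 50% duty cycle, an offset of about 0.71 times the clock amplitude reaches the 75% end of the target tuning range, and an offset at or above the clock amplitude holds the output at a constant level.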

The output of the DCC stage also has a lower common-mode voltage than the other CML stages. In addition, for high duty-cycle settings the negative cycle of the clock is very narrow and hence has small amplitude. This small amplitude is insufficient to switch the subsequent CML gate. To address this issue, a cascade of five CML stages with high gain is placed at the output

<sup>1</sup>Although the DCC is controlled off-chip in this implementation, in a final product, an adaptation loop can be used to control the DCC through a back channel. A possible solution has been discussed in [17].



Fig. 13. Block diagram of DCC stage with high-gain buffers with the width of differential pair devices labeled for each stage.



Fig. 14. Transfer characteristics of high-gain buffers in clock path over process, temperature and resistor variations (solid line corresponds to fast corner,  $0 \,^{\circ}$ C and 10% increase in resistor values; dotted line corresponds to typical corner, 75  $^{\circ}$ C and normal resistor values and crossed line corresponds to slow corner, 100  $^{\circ}$ C and 10% decrease in resistor values).

of the DCC stage. A block diagram is shown in Fig. 13. The five high-gain CML stages are identical. The high-gain stages are all designed with larger device width,  $W = I_{\text{TAIL}} \cdot 5 \,\mu\text{m/mA}$ , to increase their transconductance. A low fanout of one is used to provide high gain up to 20 GHz.

The effects of process and temperature on the transfer curves of the high-gain CML stages are illustrated in Fig. 14. A variation of  $\pm 10\%$  in resistance values was assumed in these simulations. Under worst-case conditions, the gain remains larger than one, which ensures robust operation in the clock path.

In addition to the ability to adjust the duty-cycle of the clock signal, the DCC circuit is also capable of turning off pre-emphasis in the transmitted data. By disconnecting the clock input into the transmitter and applying a large dc voltage to the DCC offset control circuit, the offset differential pair in Fig. 12 generates a sufficient dc output voltage to fully-switch the subsequent high-gain buffers. Hence, the *pwm\_clk* signal (Fig. 13) applied to the inputs of the XOR gates would be at a constant logic level and would not alter the content of the input data signal, thereby passing conventional NRZ signals to the output stage.
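The role of the XOR stage, including the case where pre-emphasis is turned off, can be illustrated at the waveform level. The sketch below is a behavioural model only; the function names, the sampling grid, and the clock polarity (low during the first $d \cdot T_s$ of each symbol) are assumptions chosen here so that the output matches the pulse response defined in Section II.

```python
import numpy as np

def pwm_clock(n_symbols, duty, sps):
    """Hypothetical pwm_clk samples: 0 for the first duty*sps samples of each symbol, then 1."""
    n_low = int(round(duty * sps))
    one_symbol = np.concatenate([np.zeros(n_low, dtype=int),
                                 np.ones(sps - n_low, dtype=int)])
    return np.tile(one_symbol, n_symbols)

def pwm_pe_stream(bits, duty, sps=20):
    """XOR the data with the duty-cycle-controlled clock and map {0, 1} to {-1, +1}."""
    data = np.repeat(np.asarray(bits, dtype=int), sps)
    clk = pwm_clock(len(bits), duty, sps)
    return 2 * np.bitwise_xor(data, clk) - 1

if __name__ == "__main__":
    bits = [1, 0, 1, 1]
    print("d = 80% :", pwm_pe_stream(bits, 0.8, sps=10))
    print("d = 100%:", pwm_pe_stream(bits, 1.0, sps=10))   # constant clock: plain NRZ
```

With the clock held at a constant level (d = 100%), the XOR output is the unmodified NRZ data, while d = 50% yields a Manchester-like waveform whose average over each symbol is zero.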

#### C. Thermometer Gray Code Encoder

The two binary input data-streams are encoded into three equally weighted streams, A, B, and C as shown in Fig. 11. Each of the three encoded streams drives the same load, making their delays well-matched. The circuits of the three branches



Fig. 15. Basic designs for  $A = M \cdot \overline{L}$ . (a) Basic AND gate. (b) Improved AND gate.



Fig. 16. Mirrored AND topology. (a) Mirrored AND gate schematic. (b) Block diagram of mirrored AND gate with CML buffers used as delay cells.

can be duplicated. Furthermore, note that by using Gray coding, any level mistaken for a neighboring level results in only one bit error. This results in a lower BER than a binary-encoded link. The drawback is increased complexity in the encoder logic. Combinational CML gates were designed to implement the Gray encoding logic given in (1) to (3). A simple CML gate that implements the logical AND in (1) is shown in Fig. 15(a) [13]. Unlike standard CML XOR and MUX gates, the circuit in Fig. 15(a) is asymmetrical in that it presents different capacitive loads at the two differential outputs. This results in data-dependent jitter at its output, as shown in Fig. 17(a). The circuit shown in Fig. 15(b) attempts to resolve this issue by adding another differential pair on top of the L+ input. Nevertheless, a large amount of systematic jitter, about 6–8 ps, is present in the output waveform, as shown in Fig. 17(b).

The systematic jitter is mainly the result of the delay that occurs due to the mismatch between the upper switching diff pair and the DC-biased pair. For the upper switching diff pair, the center node is not precharged as in the case of the DC biased diff-pair. Thus, in order to match the center nodes in terms of pre-charge voltage, the signal at the bottom switching pair can



Fig. 17. Simulated eye diagrams. (a) Basic AND gate of Fig. 15(a): 5.5 ps jitter (b) Improved AND gate of Fig. 15(b): 8 ps jitter. (c) Mirrored AND gate of Fig. 16(a): 2 ps jitter.



Fig. 18. Schematic of XOR stage.

be applied earlier than the upper switching pair. Simulations reveal that the systematic jitter can be reduced by delaying the inputs to the upper switching pair by 3–4 ps. Fortunately, the intrinsic gate delay of a CML buffer is in the range of 3–4 ps, and can therefore be used to provide the required delay. Thus, a new topology with identical MSB and LSB data paths is achieved by dividing the circuit in Fig. 15(b) into two identical parts and mirroring the inputs to the top and bottom differential pairs. The schematic of this design is shown in Fig. 16(a). Note that an early version of the LSB input is applied to the bottom differential pair with the MSB on the top differential pair on the left half of the circuit, and the reverse for the right half of the circuit, resulting in a more symmetric design. The block diagram is shown in Fig. 16(b). This more symmetrical implementation of the logical AND has the additional benefit that any small skew between the MSB and LSB streams has less impact on the overall systematic jitter at the output than in a design without the mirrored topology. Simulated eye diagrams at the output of the basic AND gate [Fig. 15(a)], improved AND gate [Fig. 15(b)], and mirrored AND gate [Fig. 16(a)] are shown in Fig. 17.



Fig. 19. Block diagram of XOR stage with high-gain buffers.

The mirrored AND gate topology reduces the systematic jitter from 5.5–8 ps to 2 ps, as shown in Fig. 17(c).

All three logical expressions in (1)–(3) may be written as logical AND operations and, hence, use the same topology and device sizes to match their delays.
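A truth-table sketch can make the thermometer Gray-code mapping concrete. Only the expression for A (M AND NOT L) is taken from the caption of Fig. 15; equations (1) to (3) are not reproduced in this section, so the expressions used below for B and C are assumptions chosen to give a thermometer code over Gray-ordered levels, each written as an AND to mirror the remark above. The function and variable names are hypothetical.

```python
def thermometer_gray_encode(m, l):
    """Map a Gray-coded bit pair (MSB m, LSB l) to three equally weighted streams."""
    a = m & (1 - l)                   # A = M AND (NOT L), per the Fig. 15 caption
    b = m & m                         # assumed: B = M, expressed as an AND
    c = 1 - ((1 - m) & (1 - l))       # assumed: C = NOT(NOT M AND NOT L) = M OR L
    return a, b, c


# Gray ordering of the four 4-PAM levels, lowest to highest (assumed mapping).
GRAY_ORDER = [(0, 0), (0, 1), (1, 1), (1, 0)]

if __name__ == "__main__":
    for m, l in GRAY_ORDER:
        a, b, c = thermometer_gray_encode(m, l)
        # The output level is the number of streams that are high.
        print(f"M={m} L={l} -> A={a} B={b} C={c} level={a + b + c}")
```

Under these assumptions, moving between neighboring levels flips exactly one of M and L, which is the property that limits a neighboring-level decision error to a single bit error.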

#### D. XOR

The design of the XOR gate is challenging since it generates the PWM-PE signal with increased high-frequency content. Fig. 18 is the schematic of the XOR CML gate, with annotated bias voltages and currents. Again, the output swing is reduced from 800 to 600 mV by reducing the load resistance. The lower load resistance reduces the RC time constant at the critical output node of the XOR gate. Along with inductive peaking and a low fanout to the subsequent CML buffer stage, sufficient bandwidth for the PWM-PE signal is maintained. Similar to the design of the DCC stage, high-gain low-fanout (k = 1) buffers were used at the output of the XOR CML circuit, as shown in Fig. 19. They restore the output waveform to a full voltage swing of 800 mV.

#### E. Output Driver

The last circuit component in the overall design is the output driver. The primary functions of the output driver are to provide a large output swing and to combine the three PWM-PE data streams to form the 4/2 PAM output waveform with PWM-PE.

The transmitter is designed for use in a doubly terminated 75- $\Omega$  dc-coupled link. However, during testing an oscilloscope serves as the receiver and must be protected by a dc-blocking capacitor. Hence, a bias tee is introduced to provide a dc path for the additional bias current required by the output stage. Together with an external 75- $\Omega$  resistor as shown in Fig. 20, it emulates the dc path that would be provided by a dc-coupled receiver.

To accommodate the large supply voltage, a cascode topology was chosen [18], [19]. The cascode devices prevent breakdown by shielding the input differential pair from excessive drain–source voltages and also reduce the input Miller capacitance. The performance of the output driver is very



Fig. 20. Schematic of output driver.



Fig. 21. Die photograph of transmitter.

sensitive to the value of the gate bias on the cascode devices,  $V_{\rm cascode}$ . It is important to bias the cascode devices so that their drain–source voltage,  $V_{\rm ds}$ , never exceeds the breakdown limit of 1.6 V during operation.  $V_{\rm cascode}$  is an input reference voltage taken from off-chip with a value of 2 V. The high output swing can be reduced for low-loss channels by reducing the tail current of the output driver via the gate bias voltage,  $V_{\rm gs\_cascode}$ , provided as an external input to the chip.

#### V. MEASUREMENTS

Measurement results of the fabricated design are detailed in this section. All measurements were performed on-wafer. Fig. 21 is a die photo of the transmitter. The prototype transmitter integrates 165 spiral inductors and 80 CML gates into a die area of 1.71 mm × 1.83 mm. The design was implemented in IBM CMRF8SF 0.13  $\mu$ m technology. CMRF8SF technology provides 8 layers of metal with three thick copper and two thick aluminum layers. All devices in CML differential pairs have minimum drawn gate length of 0.12  $\mu$ m. Inductors are implemented using the 4th and 5th metal layers. Load resistors in the CML circuits are implemented in a special metal resistance layer. Where possible, neighboring metal layers are used to distribute  $V_{\rm DD}$  and ground to ensure low series resistance and increase decoupling capacitance for the supply voltage.

The total power consumption is 1.58 W. With a tail current of 50 mA, the output driver consumes 150 mW from a 3-V supply. The remaining 1.43 W is consumed from the 1.8-V supply, of which 35% is consumed by the clock distribution, 37% by the encoder, and the rest for the generation and buffering of the PWM-PE data.
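The power figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the variable names are illustrative, and the 28% share is simply what remains after the clock-distribution and encoder percentages.

```python
# Values quoted in the text.
total_power_w = 1.58          # total power consumption
driver_current_a = 0.050      # output-driver tail current (50 mA)
driver_supply_v = 3.0         # output-driver supply
clock_fraction = 0.35         # share of the 1.8-V supply power used by clock distribution
encoder_fraction = 0.37       # share used by the encoder

# Derived quantities.
driver_power_w = driver_current_a * driver_supply_v        # output driver
core_power_w = total_power_w - driver_power_w              # drawn from the 1.8-V supply
pwm_fraction = 1.0 - clock_fraction - encoder_fraction     # PWM-PE generation and buffering

breakdown_w = {
    "output driver": driver_power_w,
    "clock distribution": clock_fraction * core_power_w,
    "encoder": encoder_fraction * core_power_w,
    "PWM-PE generation and buffering": pwm_fraction * core_power_w,
}

if __name__ == "__main__":
    for block, power in breakdown_w.items():
        print(f"{block:34s} {1000 * power:7.1f} mW")
    print(f"{'sum':34s} {1000 * sum(breakdown_w.values()):7.1f} mW")
```

The output driver accounts for 150 mW, which leaves 1.43 W for the 1.8-V supply, consistent with the text.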

In the test set-up, an off-chip phase shifter was used in the clock path to align the clock with the data at the input of the XOR stages. A re-timer flip flop in front of the XOR stage using the clock signal would solve this practical problem, and significantly improve the output eye.

Skewed PRBS  $2^{31} - 1$  patterns were used for both the LSB and MSB inputs in all measurements. The SMA cable and connectors have a 50  $\Omega$  characteristic impedance, but the terminations in the Tx circuits are designed for a 75  $\Omega$  coaxial cable; hence, some reflections appear in the eye diagrams.

#### A. 2-PAM Measurements

For testing in 2-PAM mode, the LSB input is terminated to ground. Fig. 22 illustrates the effect of duty-cycle control on binary (2-PAM) transmit mode. An external control voltage of 0 V yields 50% duty-cycle whereas a voltage of 0.56 V provides 76.3% duty-cycle as shown in Fig. 22(a) and (b). High control




Fig. 22. Transmitter output eye diagrams at 16.25 Gb/s in binary transmit mode: Duty-cycle control.



Fig. 23. Transmitter output eye diagrams at 16.25 Gb/s in binary transmit mode: Amplitude control at 50% duty-cycle.

voltages result in the NRZ output signal shown in Fig. 22(c). In NRZ mode, the clock is totally switched to one side and the two-level CML logic circuit (XOR) is made to work as a single-level CML buffer stage. Systematic jitter is clearly visible in the NRZ mode of transmission. This is due to the asymmetrical nature of the CML combinational logic gates in the thermometer Gray-code encoder and skew in signal paths on chip. The transmitter output has better jitter performance with PWM-PE, as the clock signal realigns transitions at the XOR gates. The output swing of the transmitter can be controlled by an external bias current that is mirrored to the tail current of the cascode output stage shown in Fig. 20. The eye diagrams shown in Fig. 23 are obtained by adjusting the total tail current of the output stage from 100% to 25% of the maximum while keeping the duty-cycle at 50%.

To determine the loss compensation capacity of the transmitter, a 30-m-long coaxial cable channel was used. Fig. 24(a) shows the measured transfer characteristics of the channel. The channel had a loss of 30.3 dB at 8.125 GHz (i.e., half the bit rate of 16.25 Gb/s). Fig. 24(b) illustrates the transmitter output in binary mode with 53.2% duty-cycle. The corresponding channel output eye diagram is shown in Fig. 24(c). The output waveform has an eye amplitude of approximately 30 mV. The BER bathtub curve is shown in Fig. 24(d).  $2 \times 10^7$  measurements were taken and the oscilloscope extrapolated a BER better than  $10^{-12}$  with approximately 0.25 UI margin at the channel output.
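To put the 30.3-dB figure in perspective, the loss can be converted to a voltage ratio. The sketch below is a rough sanity check only: it treats the 1.2-V per-side swing as the amplitude of a single tone at 8.125 GHz, which is an assumption made for illustration. The measured eye amplitude depends on the full pulse spectrum and on the pre-emphasis setting, so the result should be read as an order-of-magnitude estimate.

```python
def loss_db_to_voltage_ratio(loss_db):
    """Convert a loss in dB to the corresponding output/input voltage ratio."""
    return 10.0 ** (-loss_db / 20.0)


loss_at_nyquist_db = 30.3     # measured channel loss at 8.125 GHz (from the text)
tx_swing_mv = 1200.0          # per-side transmitter swing (assumed as the tone amplitude)

ratio = loss_db_to_voltage_ratio(loss_at_nyquist_db)
tone_at_receiver_mv = tx_swing_mv * ratio

if __name__ == "__main__":
    print(f"voltage ratio at 8.125 GHz: {ratio:.4f}")
    print(f"1.2-V tone after the channel: {tone_at_receiver_mv:.1f} mV")
```

The ratio is roughly 0.03, so a swing on the order of 1 V is reduced to a few tens of millivolts, which is the same order as the measured 30-mV eye.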



Fig. 24. Transmitter measurement results at 16.25 Gb/s with binary mode of operation. (a) Channel characteristics. (b) Output of transmitter (53.2% duty-cycle). (c) Output of channel. (d) BER bathtub curve.

#### B. 4-PAM Measurements

In 4-PAM mode, both MSB and LSB inputs are used. The effect of duty-cycle variation on the 4-PAM transmit mode is shown in Fig. 25. Similar to binary transmit mode, jitter is much greater in NRZ mode than with PWM-PE. The asymmetrical nature of the CML AND and OR gates in the thermometer Gray-code encoder contributes to the jitter, as does any skew between the MSB and LSB signal paths.

The frequency response of the channel, which consists of six sections of 1-m-long SMA cables and the corresponding connectors, is shown in Fig. 26(a). For a data rate of 32.5 Gb/s (16.25 GSymbol/s), the channel has a loss of 8.9 dB at half the output symbol rate (8.125 GHz) and a loss of 17.2 dB at half the bit rate (16.25 GHz), as shown in Fig. 26(a). By manually adjusting the duty-cycle of the transmit pulse, and thus the amount of pre-emphasis of the transmitter output, it is possible to obtain an open eye at the output of the channel. Fig. 26(b) illustrates the transmitter 4-PAM output with 64% duty-cycle. The corresponding channel output eye diagram is shown in Fig. 26(c). Since the output eye is over-equalized, the



Fig. 25. Transmitter output eye diagrams at 32.5 Gb/s in 4-PAM transmit mode: Duty-cycle control. (a) 4-PAM output eye with 50% duty cycle. (b) 4-PAM output eye in NRZ mode.



Fig. 26. Transmitter measurement results at 32.5 Gb/s with 4-PAM mode of operation. (a) Channel characteristics. (b) Output of transmitter with 64% duty cycle (vertical scale = 200 mV/div). (c) Output of channel with 64% duty cycle (vertical scale = 200 mV/div).

TABLE II COMPARISON OF 2-PAM TRANSMITTERS.

| Ref.      | CMOS Process | Output Swing (mV) | Data Rate (Gb/s) | Loss Compensation (dB) | Power (mW)  |
|-----------|--------------|-------------------|------------------|------------------------|-------------|
| [4]       | 0.13-µm | 260             | 40           | -                    | $\sim 2700$ |
| [5]       | 0.13-µm | 350             | 30           | -                    | 150         |
| [20]      | 0.18-µm | 4000            | 13.6         | -                    | 600         |
| [1]       | 0.13-µm | 600             | 5            | 33 (@ 2.5GHz)        | 110         |
| [2]       | 90-nm   | 700             | 4            | 22 (@ 1.25GHz)       | -           |
| This work | 0.13-µm | 1250            | 16           | 30.3 (@ 8 GHz)       | 1578        |

TABLE III COMPARISON OF 4-PAM TRANSMITTERS.

| Ref.      | CMOS Process | Output Swing (mV) | Data Rate (Gb/s) | Loss Compensation (dB) | Power (mW) |
|-----------|--------------|-------------------|------------------|------------------------|------------|
| [21]      | 90-nm SOI | 520    | 25     | 3 (@ 6.25GHz)  | 101.8 |
| [22]      | 90-nm     | 800    | 24     | 14.5 (@ 6GHz)  | 510   |
| [23]      | 0.18-µm   | 600    | 10     | -              | 120   |
| [24]      | 0.25-µm   | 600    | 10     | 3.7 (@ 2.5GHz) | 222   |
| [25]      | 0.4-µm    | 1100   | 10     | -              | 1000  |
| This work | 0.13-µm   | 1250   | 32     | 8.9 (@ 8 GHz)  | 1578  |

maximum DC levels are basically reduced to the inner-eye for the 4-PAM signal in Fig. 26(c).
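The frequencies at which the 4-PAM channel loss is quoted follow directly from the symbol rate. The short sketch below reproduces that arithmetic and looks up the two loss values stated in the text; the function name and the lookup table are illustrative only.

```python
import math


def nyquist_frequency_ghz(bit_rate_gbps, pam_levels):
    """Half the symbol rate for a PAM link carrying log2(pam_levels) bits per symbol."""
    bits_per_symbol = math.log2(pam_levels)
    symbol_rate_gsym = bit_rate_gbps / bits_per_symbol
    return symbol_rate_gsym / 2.0


# Channel loss of the six-section SMA cable channel, as quoted in the text (GHz -> dB).
measured_loss_db = {8.125: 8.9, 16.25: 17.2}

if __name__ == "__main__":
    for levels in (4, 2):
        f_nyq = nyquist_frequency_ghz(32.5, levels)
        print(f"{levels}-PAM at 32.5 Gb/s: Nyquist = {f_nyq} GHz, "
              f"channel loss = {measured_loss_db[f_nyq]} dB")
```

At the same 32.5-Gb/s bit rate, 4-PAM signaling places its Nyquist frequency at 8.125 GHz, where this channel attenuates by 8.9 dB, whereas binary signaling would have to contend with 17.2 dB at 16.25 GHz.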

Table II compares this work with current state-of-the-art CMOS binary transmitters. Its loss compensation is amongst the best for binary transmitters, but the power consumption is much higher because pushing the CMOS technology to its limits necessitated the use of very low fanout CML design. A comparison with published CMOS 4-PAM transmitters is

provided in Table III. This transmitter is the first to incorporate PWM-PE in 4-PAM in addition to being the fastest 4-PAM transmitter in CMOS reported to date. The capability to switch between 4-PAM and 2-PAM, adjustable pre-emphasis (50%–75% duty-cycle, or NRZ), and adjustable output amplitude makes it suitable for use in a wide range of electrical wireline links.

## VI. CONCLUSIONS

CML design methodology is typically based on a square-law MOSFET model. However, the square-law design methodology leads to designs that are too conservative for high-speed operation. By relaxing the requirement for the switching transistors to operate in the saturation region at all times and relaxing the requirement of full switching, faster CML circuits can be designed. This approach is used to design the CML gates in a 2-PAM/4-PAM CMOS transmitter incorporating PWM-PE. The transmitter can operate at data rates up to 16 Gsymbols/s with an output swing of  $1.2V_{pp}$  per side and can compensate up to 30 dB loss at 8 GHz in binary mode. In 4-PAM mode, the designed transmitter achieves the largest swing and speed reported to date.

#### REFERENCES

- [1] J.-R. Schrader, E. A. M. Klumperink, J. L. Visschers, and B. Nauta, "Pulse-width modulation pre-emphasis applied in a wireline transmitter, achieving 33 dB loss compensation at 5-Gb/s in 0.13-μm CMOS," *IEEE J. Solid-State Circuits*, vol. 41, no. 4, pp. 990–999, Apr. 2006.

- [2] J. Schrader, E. Klumperink, and B. Nauta, "Wireline equalization using pulse-width modulation," in *Proc. 2006 Custom Integr. Circuits Conf.* (CICC), 2006, pp. 591–598.
- [3] J. Kim, J.-K. Kim, B.-J. Lee, N. Kim, D.-K. Jeong, and W. Kim, "A 20-GHz phase-locked loop for 40-Gb/s serializing transmitter in 0.13-μm CMOS," *IEEE J. Solid-State Circuits*, vol. 41, no. 4, pp. 899–908, Apr. 2006.
- [4] J. Kim, J.-K. Kim, B.-J. Lee, M.-S. Hwang, H.-R. Lee, S.-H. Lee, N. Kim, D.-K. Jeong, and W. Kim, "Circuit techniques for a 40-Gb/s transmitter in 0.13-μm CMOS," in *Proc. 2005 Int. Solid State Circuits Conf.* (*ISSCC*), 2005, pp. 150–151.
- [5] P. Westergaard, T. O. Dickson, and S. P. Voinigescu, "A 1.5-V, 20/30-Gb/s CMOS backplane driver with digital pre-emphasis," in *Proc. 2004 Custom Integr. Circuits Conf. (CICC)*, 2004, pp. 23–26.
- [6] P. Heydari and R. Mohanavelu, "Design of ultrahigh-speed low-voltage CMOS CML buffers and latches," *IEEE Trans. Very Large Scale Integr.* (VLSI) Syst., vol. 12, pp. 1081–1093, 2004.
- [7] P. Heydari and R. Mohanavelu, "Design of ultra high-speed CMOS CML buffers and latches," in *Proc. 2003 Int. Symp. on Circuits Syst.* (*ISCAS*), 2003, vol. 2, pp. 208–211.
- [8] M. M. Green and U. Singh, "Design of CMOS CML circuits for high-speed broadband communications," in *Proc. 2003 Int. Symp. on Circuits and Systems (ISCAS)*, 2003, pp. 204–207.
- [9] A. Tajalli and Y. Leblebici, "A slew controlled LVDS output driver circuit in 0.18 μm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 44, pp. 538–548, Feb. 2009.
- [10] M. Alioto and G. Palumbo, "Power-aware design of nanometer MCML tapered buffers," *IEEE Trans. Circuits Syst. II, Expr. Briefs*, vol. 55, no. 1, pp. 16–20, Jan. 2008.
- [11] H. Cheng and A. C. Carusone, "A 32/16 Gb/s 4/2-PAM transmitter with PWM pre-emphasis and 1.2 Vpp per side output swing in 0.13  $\mu$ m CMOS," in *Proc. 2008 Custom Integr. Circuits Conf. (CICC)*, 2008, pp. 635–638.
- [12] T. O. Dickson, K. H. K. Yau, T. Chalvatzis, A. M. Mangan, E. Laskin, R. Beerkens, P. Westergaard, M. Tazlauanu, M.-T. Yang, and S. P. Voinigescu, "The invariance of characteristic current densities in nanoscale MOSFETs and its impact on algorithmic design methodologies and design porting of si(ge) (Bi)CMOS high speed building blocks," *IEEE J. Solid-State Circuits*, vol. 41, no. 8, pp. 1830–1845, Aug. 2006.
- [13] J. Rogers, C. Plett, and F. Dai, *Integrated Circuit Design for High-Speed Frequency Synthesis*. Reading, MA: Artech House, 2006.
- [14] J.-K. Kim, J. Kim, S.-Y. Lee, S. Kim, and D.-K. Jeong, "A 26.5–37.5 GHz frequency divider and a 73-GHz-BW CML buffer in 0.13 μm CMOS," in *Proc. IEEE Asian Solid-State Circuits Conf.*, 2007, pp. 148–151.
- [15] U. Singh and M. M. Green, "High-frequency CML clock dividers in 0.13-µm CMOS operating up to 38 GHz," *IEEE J. Solid-State Circuits*, vol. 40, no. 8, pp. 1658–1661, Aug. 2005.
- [16] S. S. Mohan, M. del Mar Hershenson, S. P. Boyd, and T. H. Lee, "Bandwidth extension in CMOS with optimized on-chip inductors," *IEEE J. Solid-State Circuits*, vol. 35, no. 3, pp. 346–355, Mar. 2000.
- [17] A. Ho, V. Stojanovic, F. Chen, C. Werner, G. Tsang, E. Alon, R. Kollipara, J. Zerbe, and M. Horowitz, "Common-mode backchannel signaling system for differential high-speed links," in *VLSI Circuits 2004 Dig. Tech. Papers Symp.*, 2004, pp. 352–355.
- [18] T. Kuo and B. Lusignan, "A 1.5 W class-F RF power amplifier in 0.2 μm CMOS technology," in *Proc. 2001 Int. Solid State Circuits Conf. (ISSCC)*, 2001, pp. 154–155.
- [19] A.-J. Annema, G. Geelen, and P. de Jong, "5.5-V I/O in a 2.5-V 0.25-µm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 36, no. 3, pp. 528–538, Mar. 2001.
- [20] D. Li and C. Tsai, "10-13.6 Gbit/s 0.18 μm CMOS modulator drivers with 8 Vpp differential output swing," *Electron. Lett.*, vol. 41, no. 11, pp. 643–644, 2005.
- [21] C. Menolfi, T. Toifl, R. Reutemann, M. Ruegg, P. Buchmann, M. Kossel, T. Morf, and M. Schmatz, "A 25 Gb/s PAM4 transmitter in 90 nm CMOS SOI," in *Proc. 2005 Int. Solid State Circuits Conf. (ISSCC)*, 2005, pp. 77–78.
- [22] A. Amirkhany, A. Abbasfar, J. Savoj, M. Jeeradit, B. Garlepp, V. Stojanovic, and M. Horowitz, "A 24 Gb/s software programmable multichannel transmitter," in VLSI Circuits 2007 Dig. Tech. Papers Symp., 2007, pp. 38–39.

- [23] K. Farzan and D. A. Johns, "A CMOS 10-Gb/s power-efficient 4-PAM transmitter," *IEEE J. Solid-State Circuits*, vol. 39, no. 3, pp. 529–532, Mar. 2004.
- [24] C. Lin and C. Tsai, "Multi-gigabit pre-emphasis design and analysis for serial link," *IEICE Trans. Electron.*, vol. E88-C, no. 10, pp. 2009–2019, Oct. 2005.
- [25] R. Farjad-Rad, C.-K. K. Yang, M. A. Horowitz, and T. H. Lee, "A 0.4-μm CMOS 10-Gb/s 4-PAM pre-emphasis serial link transmitter," *IEEE J. Solid-State Circuits*, vol. 34, no. 5, pp. 580–585, May 1999.



Horace Cheng (M'08) received the B.A.Sc. degree from the University of Waterloo, Waterloo, ON, Canada, in 2005, and the M.A.Sc. degree from the University of Toronto, Toronto, ON, Canada, in 2008. His graduate research focused on high-speed CMOS transmitters with pulse-amplitude and pulsewidth modulation to combat lossy serial links.

He joined Synopsys Inc. as an R&D Engineer in 2008. Since then, he has been mainly working on high-speed circuits for various SERDES standards, with an emphasis on transmitter design.

Mr. Cheng was awarded the NSERC Canada Graduate Scholarships for his graduate studies.



**Faisal A. Musa** (S'03–M'08) received the Ph.D. degree from the University of Toronto, Toronto, ON, Canada, in 2007.

During the summer of 2004, he worked on the design of high-speed clock recovery systems at Intel's Circuits Research Labs, Hillsboro, OR. From 2006 to 2008, he was with Gennum Corporation, Burlington, ON, Canada, working on the design and verification of high-speed integrated circuits for video and data communication applications. During the fall of 2008, he joined the Department of Electrical and Computer

Engineering, University of Toronto, as a part-time Lecturer, and is currently a Research Associate. His research interests include modeling, design, and implementation of high-speed integrated circuits for chip-to-chip communications.



Anthony Chan Carusone (S'96-M'02–SM'08) received the B.A.Sc. and Ph.D. degrees from the University of Toronto, Toronto, ON, Canada, in 1997 and 2002, respectively.

Since 2001, he has been with the Department of Electrical and Computer Engineering, University of Toronto, where he is currently an Associate Professor. In 2008, he was a Visiting Researcher with the University of Pavia, Pavia, Italy, and later at the Circuits Research Laboratory, Intel Corporation, Hillsboro, OR.

Prof. Carusone is a member and past chair of the Analog Signal Processing Technical Committee for the IEEE Circuits and Systems (CAS) Society, a member and past chair of the Wireline Communications subcommittee of the Custom Integrated Circuits Conference, and an appointed member of the Administrative Committee of the IEEE Solid-State Circuits Society and the Board of Governors of the IEEE Circuits and Systems (CAS) Society. He has served as a Guest Editor for both the IEEE JOURNAL OF SOLID-STATE CIRCUITS and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS. Since 2006, he has served on the editorial board of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, for which he is currently Editor-in-Chief. While at the University of Toronto, he was the recipient of the Governor General's Silver Medal. As a coauthor, he was the recipient of the Best Paper Award at the 2005 Compound Semiconductor Integrated Circuits Symposium and the Best Student Paper Awards at both the 2007 and 2008 Custom Integrated Circuits Conferences.