# A 60 mW per Lane, $4 \times 23$-Gb/s $2^7-1$ PRBS Generator

Ekaterina Laskin, Student Member, IEEE, and Sorin P. Voinigescu, Senior Member, IEEE

Abstract—An ultra-low-power,  $2^7 - 1$  PRBS generator with four, appropriately delayed, parallel output streams was designed. It was fabricated in a 150-GHz  $f_T$  SiGe BiCMOS technology and measured to work up to 23 Gb/s. The four-channel PRBS generator consumes 235 mW from 2.5 V, which results in only 60 mW per output lane. The circuit is based on a 2.5-mW BiCMOS CML latch topology, which, to the best of our knowledge, represents the lowest power for a latch operating above 10 Gb/s. A power consumption and speed comparison of series and parallel PRBS generation techniques is presented. Low-power BiCMOS CML latch topologies are analyzed using the OCTC method.

*Index Terms*—Current-mode logic, OCTC, pseudo-random bit sequence generator, SiGe BiCMOS, technology scaling.

# I. INTRODUCTION

PSEUDO-RANDOM bit sequence (PRBS) generators and checkers are widely used for testing the correct functionality of broadband integrated circuits, such as re-timers, SERDES blocks, and transceivers. State-of-the-art circuits often outperform commercially available test equipment. To avoid this testing problem, PRBS generators can be integrated on the same chip as the device under test for built-in self-test (BIST) purposes. For these applications, it is important that the generator be able to produce as long a sequence as possible, while consuming low power. Early high-speed PRBS generators employed III-V HBT technologies [1], [2], Si bipolar [3]–[5], and more recently SiGe bipolar [6]–[8], SiGe BiCMOS [9]–[11], and CMOS [12] technologies. Our group has recently reported a record 80-Gb/s PRBS generator with a $2^{31}-1$ sequence length [13], [14]. However, due to the long sequence length, it was too large and power hungry to be used as an on-chip self-test block. The work in this paper is part of an effort to reduce the power consumption of PRBS generators, while maintaining the speed.

Previously, the design of PRBS generators has been limited to full-rate [9], half-rate [2], [6], [7], [11], or quarter-rate [13], [14] series architectures. In these implementations, further reduction of the core generator clock rate significantly complicates the design of the rest of the generator. As part of this work, it will be shown that parallel PRBS generation techniques [10], [15] can be applied to design PRBS generators that use a low clock rate in the core, and can still achieve a very high output bit-rate, low power consumption, and small area, even for sequence lengths greater than $2^7-1$.

The authors are with the Edward S. Rogers, Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada (e-mail: laskin@eecg.toronto.edu).

Digital Object Identifier 10.1109/JSSC.2006.878112

This paper reports an ultra-low-power  $2^7-1$  PRBS generator with four, appropriately delayed, parallel output streams at 23 Gb/s each, which can be further multiplexed to an aggregate PRBS output at 92 Gb/s with minimal circuitry. The low-power performance of the circuit is facilitated by topology choice and transistor-level circuit optimization. At the system level, power is optimized by employing a parallel, as opposed to series, PRBS generator topology, which avoids additional phase shifting circuitry and is suitable for generating signals that can be multiplexed directly. At the transistor level, power usage is optimized by the design of a low-power SiGe BiCMOS CML latch. Avenues for further power reduction and speed improvement are also discussed.

# II. COMPARISON OF PRBS GENERATOR ARCHITECTURES FOR HIGH-SPEED OPERATION

For very high-speed generation of PRBS sequences, it is useful to know which architecture is optimal for a particular application. The different options that can be considered are parallel versus series PRBS generator architectures and the level of multiplexing. The level of multiplexing determines how much slower, relative to the final output, the core generator is operated, thus requiring proportionally less power. However, if the multiplexing level is too deep, too much power might be spent in the multiplexer itself.

Series PRBS generators are linear feedback shift registers, where the length of the register $n$ and the feedback function determine the length of the sequence $p = 2^n - 1$ [16]. For multiplexing the sequence to $q$ times the original bit rate, $q$ original sequences, spaced apart by $(p+1)/q$ bits in phase, are required [17]. An efficient, $O(\log(n))$, algorithm exists for obtaining the phase shifts [18]; nevertheless, the number of XOR gates required to implement the phase shifts in hardware grows exponentially with $q$.
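The series construction can be illustrated with a short behavioral sketch in Python. It models a seven-stage shift register with feedback from stages 6 and 7 (one common convention for $x^7 + x^6 + 1$; the tap numbering, the seed, and all function names are assumptions made for illustration), then interleaves $q$ copies of the sequence offset by $2^n/q = (p+1)/q$ bits to mimic ideal multiplexing. It is not the phase-shifting network produced by the algorithm of [18]; it only checks that streams with this spacing interleave into the same PRBS.

```python
def lfsr_sequence(n_bits=7, taps=(7, 6), seed=1):
    """Return one full period of a serial (Fibonacci) LFSR output.

    taps are 1-indexed stage numbers whose XOR feeds stage 1.
    """
    state = [(seed >> i) & 1 for i in range(n_bits)]  # state[0] is stage 1
    start = list(state)
    bits = []
    while True:
        bits.append(state[-1])            # serial output taken from the last stage
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]   # shift by one stage
        if state == start:                # initial state reached again: one period done
            return bits


def multiplex(seq, q):
    """Interleave q copies of seq, each delayed by (p + 1) / q bits."""
    p = len(seq)
    assert (p + 1) % q == 0, "q must divide p + 1 = 2**n"
    d = (p + 1) // q
    return [seq[(m + j * d) % p] for m in range(p) for j in range(q)]


def is_rotation(a, b):
    """True if list b is a cyclic rotation of list a."""
    return len(a) == len(b) and any(b == a[r:] + a[:r] for r in range(len(a)))


prbs7 = lfsr_sequence()
muxed = multiplex(prbs7, q=4)
print(len(prbs7))                              # sequence length p
print(is_rotation(prbs7, muxed[:len(prbs7)]))  # multiplexed stream vs. original
```

The script reports a period of 127 bits, and the rotation check confirms that the multiplexed stream is the original sequence with a different starting phase.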

In contrast, in the parallel PRBS generator architecture, the phase shifted sequences are available directly from the generator. The $n \times n$ transition matrix $\mathbf{T}$, which can be obtained from the characteristic polynomial of the PRBS, proves useful for constructing parallel PRBS generators. A procedure for translating $\mathbf{T}^q$ into the PRBS generator schematic with $q$ parallel outputs is given in [19]. The resulting $q$ outputs are phase shifted appropriately for direct multiplexing.
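A minimal Python sketch of this idea follows. It builds a transition matrix for a seven-stage register with feedback from stages 6 and 7 (the tap convention, the state ordering, and the helper names are assumptions for illustration), raises it to the $q$-th power over GF(2), and checks that advancing the state with the $q$-th power while reading $q$ bits per clock reproduces the serial bit stream. It does not reproduce the schematic-translation procedure of [19], and for simplicity it assumes $q \le n$.

```python
def transition_matrix(n_bits=7, taps=(7, 6)):
    """Companion matrix T over GF(2): next_state = T * state (mod 2)."""
    T = [[0] * n_bits for _ in range(n_bits)]
    for t in taps:
        T[0][t - 1] = 1              # row 0 forms the feedback bit
    for i in range(1, n_bits):
        T[i][i - 1] = 1              # remaining rows shift the register
    return T


def mat_mul(A, B):
    """Matrix product over GF(2)."""
    n = len(A)
    return [[sum(A[i][k] & B[k][j] for k in range(n)) & 1 for j in range(n)]
            for i in range(n)]


def mat_vec(A, v):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, v)) & 1 for row in A]


def mat_pow(A, q):
    """A raised to the q-th power (q >= 1) over GF(2)."""
    result = A
    for _ in range(q - 1):
        result = mat_mul(result, A)
    return result


def serial_bits(T, state, count):
    """One output bit per clock, taken from the last stage."""
    bits = []
    for _ in range(count):
        bits.append(state[-1])
        state = mat_vec(T, state)
    return bits


def parallel_bits(T, state, q, clocks):
    """q output bits per clock; the state advances by T**q each clock."""
    Tq = mat_pow(T, q)
    n = len(state)
    bits = []
    for _ in range(clocks):
        bits.extend(state[n - 1 - i] for i in range(q))  # lanes 0 .. q-1
        state = mat_vec(Tq, state)
    return bits


T = transition_matrix()
seed = [1, 0, 0, 0, 0, 0, 0]
print(parallel_bits(T, seed, q=4, clocks=64) == serial_bits(T, seed, 256))
```

Each lane of `parallel_bits` runs at one $q$-th of the serial rate, which is the property the parallel architecture exploits.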

Table I presents a comparison of series and parallel generators, in terms of the number of gates required and the maximum fanout of gates needed to build the generator. Fanout in the PRBS generator chain determines the maximum speed at which the core generator can be operated for a given gate topology. The number of gates determines the area of the PRBS generator. The latter is also related to operation speed because greater area implies that some gates have to drive longer lines, which limits the overall achievable bit rate. The overall power that the generator will consume is directly proportional to the number of blocks required to build it. Note that the count of blocks indicated in Table I does not include the multiplexers required to increase core generator output to the final bit rate.

Manuscript received February 3, 2006; revised April 19, 2006. This work was supported in part by NSERC and Micronet.

TABLE I COMPARISON OF THE CIRCUITRY REQUIRED FOR SERIES AND PARALLEL PRBS GENERATORS OF DIFFERENT SIZES

| Generator Type | Rate | Series: DFFs (Shift Reg.) | Series: DFFs (Re-timing) | Series: XORs | Series: Max. Fanout | Parallel: DFFs | Parallel: XORs | Parallel: Max. Fanout |
|---|---|---|---|---|---|---|---|---|
| $2^7 - 1$, $x^7 + x^6 + 1$ | full | 7 | 0 | 1 | 2 | - | - | - |
| | 1/2 | 7 | 2 | 2 | 2 | 7 | 2 | 2 |
| | 1/4 | 7 | 4 | 5 | 3 | 7 | 4 | 2 |
| | 1/8 | 7 | 8 | 11 | 3 | 8 | 8 | 2 |
| $2^{15} - 1$, $x^{15} + x^{14} + 1$ | full | 15 | 0 | 1 | 2 | - | - | - |
| | 1/2 | 15 | 2 | 2 | 2 | 15 | 2 | 2 |
| | 1/4 | 15 | 4 | 5 | 3 | 15 | 4 | 2 |
| | 1/8 | 15 | 8 | 15 | 4 | 15 | 8 | 2 |
| | 1/16 | 15 | 16 | 38 | 8 | 16 | 16 | 2 |
| $2^{31} - 1$, $x^{31} + x^{28} + 1$ | full | 31 | 0 | 1 | 2 | - | - | - |
| | 1/2 | 31 | 2 | 2 | 2 | 31 | 2 | 2 |
| | 1/4 | 31 | 4 | 6 | 2 | 31 | 4 | 2 |
| | 1/8 | 31 | 8 | 17 | 4 | 31 | 8 | 2 |
| | 1/16 | 31 | 16 | 48 | 7 | 31 | 16 | 2 |
| | 1/32 | 31 | 32 | 186 | 7 | 32 | 32 | 2 |

To illustrate the differences between series and parallel topologies, an example is provided in Fig. 1 for a $2^7-1$, 1/8-rate PRBS generator. Fig. 1(a) shows the series topology implementation where seven D-flip-flops (DFFs) are used in the shift register, 11 XOR gates are required to form eight phase-shifted sequences according to the algorithm given in [18], and another eight DFFs are used to re-time the signals to equalize their delays. Re-timing is essential to achieve correct operation above 10 Gb/s. The corresponding parallel topology implementation can be derived from $\mathbf{T}^8$ of a $2^7-1$ PRBS. The parallel topology shown in Fig. 1(b) uses eight XOR gates and eight DFFs. Hence, in the parallel implementation, all outputs are shifted and re-timed.

Parallel PRBS generators can be constructed for any sequence length. As an example, a parallel, eight-output,  $2^{31}-1$  generator with an 8-to-4 multiplexer is shown in Fig. 1(c). Assuming that the four multiplexer outputs are 20-Gb/s signals, such that an 80-Gb/s output is produced after further 4-to-1 multiplexing, the shown configuration is much more power efficient than an 80-Gb/s series  $2^{31}-1$  PRBS generator which requires phase shifting circuitry [14].

From Table I, it is apparent that parallel PRBS generators outperform series generators in all cases where the PRBS is generated below the full data rate and multiplexing is applied. In practice, for the sequence to be generated at full rate, a complex and power-hungry design is required for each flip-flop in the chain. On the other hand, when multiplexing is used, only the last stage of multiplexing needs to operate at the full data rate. This greatly simplifies the overall design and results in a smaller and more power-efficient circuit.
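The comparison can be made concrete by totaling the block counts of Table I. The short Python sketch below does this for the $2^7-1$ rows only; weighting every DFF and XOR gate equally is a simplifying assumption made here for illustration, and the names are hypothetical.

```python
# Rows copied from Table I for the 2^7 - 1 generator:
# (rate, series shift-register DFFs, series re-timing DFFs, series XORs,
#  parallel DFFs, parallel XORs)
TABLE_I_PRBS7 = [
    ("1/2", 7, 2, 2, 7, 2),
    ("1/4", 7, 4, 5, 7, 4),
    ("1/8", 7, 8, 11, 8, 8),
]


def block_totals(rows):
    """Map each rate to (series block count, parallel block count)."""
    totals = {}
    for rate, sr_dff, rt_dff, s_xor, p_dff, p_xor in rows:
        totals[rate] = (sr_dff + rt_dff + s_xor, p_dff + p_xor)
    return totals


for rate, (series, parallel) in block_totals(TABLE_I_PRBS7).items():
    print(f"rate {rate}: series {series} blocks, parallel {parallel} blocks")
```

For the 1/8-rate case this gives 26 blocks for the series generator and 16 for the parallel one, and the gap widens as the multiplexing ratio grows.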

Parallel PRBS generators have several other advantages over series generators in high-speed applications. First, the fanout of the XOR gates and flip-flops is uniform throughout the structure, making it easier to design each block and to lay them out. Second, re-timing of each combinational logic gate is essential for correct operation above 10 Gb/s because gate delays are a large fraction of the clock cycle. Conveniently, parallel generators are structured such that all parallel outputs are automatically re-timed and there is only one XOR gate between each two flip-flops. On the other hand, series generators require a very large number of XOR gates to produce appropriately shifted sequences as the multiplexing ratio is increased (Table I) making them highly impractical. Third, since all outputs of the parallel PRBS generators are retimed, the first stage of multiplexing can be simplified. Instead of employing the usual high-speed multiplexer that consists of five latches and a selector [20], only one latch and a selector are needed in this case. This further saves power and area of the overall generator.

# III. HIGH-SPEED LOGIC TOPOLOGIES

## A. CML Latch Design

This section will present several possible BiCMOS current-mode logic (CML) latch topologies. BiCMOS CML logic based on the MOS-HBT cascode [21] is employed throughout this work for several reasons. First, the $V_{GS}$ of MOSFETs is lower than the $V_{BE}$ of HBTs and thus allows lowering the supply voltage. Second, since MOSFETs are better switches, they are used on the clock path, resulting in the MOS-HBT cascode having lower input time constant $\tau_{in} \approx R_G(C_{GS} + C_{GD})$ compared to that of an HBT-only cascode $\tau_{in} \approx R_B(C_{BE}+2C_{BC})$. Third, in this logic family the upper transistors are bipolar because they provide higher gain and better sensitivity on the data path. The latch is chosen as a representative block for analysis because it contains the largest output capacitance and because it operates at the full clock-rate frequency. The design of latches shown in Fig. 2 will be described first, followed by a performance comparison.

Fig. 1. Parallel and series implementations of PRBS generators. (a) Series $2^7 - 1$ PRBS generator with eight outputs. (b) Parallel $2^7 - 1$ PRBS generator with all eight re-timed outputs. (c) Parallel $2^{31} - 1$ PRBS generator with eight re-timed outputs and 8-to-4 MUX.

The design of a CML logic gate starts by selecting the DC voltage levels at each node. The DC levels have to be such that when the inputs and output nodes are balanced (have zero differential signal) then MOS transistors are in saturation and HBTs are in the active region. Thus, the $V_{DS}$ of nMOS and $V_{CE}$ of HBT transistors have to be approximately 0.7 and 0.9 V, respectively (in a 0.13-$\mu$m SiGe BiCMOS technology). Next, the tail current $I_{\text{tail}}$ and load resistors $R_L$ are chosen to produce the desired voltage swing $\Delta V$:

$$\Delta V = I_{\text{tail}} R_L. \tag{1}$$

$\Delta V$ is the single-ended voltage difference between the logic-low and logic-high levels of the gate. When the inputs and output are balanced, the voltage drop across $R_L$ is $\Delta V/2$. The minimum supply voltage $V_{DD}$ required for this gate is given by the sum of all $V_{BE}$ and $V_{GS}$ voltage drops in the transistor stack and the voltage drop across $R_L$. The power consumption of the gate is then $I_{\text{tail}}V_{DD}$, independent of the switching speed.

The switching speed of the gate depends on  $\Delta V$ ,  $I_{\text{tail}}$ , and node capacitances:

$$\text{switching time} \propto \frac{\Delta V \times (C/W)}{I_{\text{tail}}/W} \tag{2}$$

where $C/W$ and $I_{\text{tail}}/W$ are technology parameters. Hence, to increase the switching speed, the bias point must be chosen such that the $\Delta V$ needed to fully switch the transistors is small. Also, the transistors themselves must be small to minimize capacitance, while the current must be as large as possible.

Fig. 2. CML latch schematics. (a) BiCMOS CML latch. (b) BiCMOS CML latch without current source.

However, increasing the current density $I_{\text{tail}}/W$ beyond the peak-$f_T$ current density of 0.3 mA/$\mu$m increases $\Delta V$ without improving speed [22]. For MOSFETs, the region below 0.15 mA/$\mu$m current density bias corresponds to operation according to the square law model. In the square law region, the swing required to fully switch the transistor is given by [23]

$$\Delta V > \sqrt{2} V_{\rm EFF} \tag{3}$$

where  $V_{\text{EFF}} = V_{GS} - V_{Tn}$  is the effective gate voltage when the tail current is split equally between the two branches and  $V_{Tn}$  is a function of  $V_{DS}$ . However, when the device is biased at or above the peak- $f_T$  current density, the square law no longer applies. In this case the swing required to switch the transistor is approximately [21]

$$\Delta V > 2V_{\rm EFF}.\tag{4}$$

The $\Delta V$ of (4) is larger than that of (3) because both the coefficient and $V_{\text{EFF}}$ are larger. Thus, for best performance, MOSFETs have to be biased close to half peak-$f_T$ current density. Therefore, the bias current $I_{\text{tail}}$ is chosen to be [21], [24]

$$I_{\text{tail}} = W_{\text{gate}} \times J_{\text{pfTMOS}} = W_{\text{gate}} \times 0.3 \, \frac{\text{mA}}{\mu \text{m}}. \tag{5}$$

As a result, the current through the differential pair MOSFETs varies between zero and full peak- $f_T$  current density when the inputs are switched to one side. The  $V_{\rm EFF}$  that corresponds to half peak- $f_T$  current density is 400 mV in a 0.13- $\mu$ m technology. To account for temperature and process variations,  $\Delta V$  is chosen to be 400 to 500 mV [24].
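The MOSFET bias rules above reduce to a few lines of arithmetic. The following Python sketch combines (1) and (5) with the power expression $I_{\text{tail}}V_{DD}$; the function name and the returned field names are hypothetical, and the example inputs (1 mA, 400 mV, 2.5 V) are taken from values quoted elsewhere in this paper.

```python
J_PFT_MOS_MA_PER_UM = 0.3  # peak-f_T current density quoted in the text


def size_mos_cml(i_tail_ma, delta_v_mv, vdd_v):
    """Gate width, load resistor and power for a MOS differential pair."""
    w_gate_um = i_tail_ma / J_PFT_MOS_MA_PER_UM     # Eq. (5) solved for W_gate
    r_load_ohm = delta_v_mv / i_tail_ma             # Eq. (1): mV / mA = ohm
    power_mw = i_tail_ma * vdd_v                    # I_tail * V_DD
    balanced_ma_per_um = (i_tail_ma / 2) / w_gate_um  # each side carries I_tail / 2
    return {
        "w_gate_um": w_gate_um,
        "r_load_ohm": r_load_ohm,
        "power_mw": power_mw,
        "balanced_ma_per_um": balanced_ma_per_um,
    }


print(size_mos_cml(i_tail_ma=1.0, delta_v_mv=400.0, vdd_v=2.5))
```

With balanced inputs each transistor carries half the tail current, so the balanced current density works out to 0.15 mA/$\mu$m, half the peak-$f_T$ value, as the text requires.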

The swing needed to fully switch a bipolar transistor can be as low as 6 times the thermal voltage [25], but in practice needs to be 200 to 300 mV when temperature, process variations, and  $R_E I_{\text{tail}}$  voltage drop are taken into account [25]. The tail current for SiGe HBT transistors is chosen such that it corresponds to 0.75 times peak- $f_T$  current density when the inputs are balanced, or to 1.5 times peak- $f_T$  current density when the inputs are switched [26], [27]:

$$I_{\text{tail}} = 1.5 \times w_e \times l_e \times J_{\text{pfTHBT}}. \tag{6}$$

Even though in the HBT case the peak-$f_T$ current density is not constant across technologies, it is still constant for different-sized HBTs, when the bias current is normalized to the emitter area.

The performance of latches can be compared based on their time constant  $\tau$ . The time constant is suitable for comparison because both the propagation delay through the latch and the rise and fall times are proportional to it. An approximation of  $\tau$  for latches can be derived similarly to that for cascode circuits [21], using the open-circuit time-constants (OCTCs) and accounting for the fanout k (7). When deriving (7), it is assumed that the latch is loaded by a similar latch, where the tail current is  $k \times I_{\text{tail}}$ and in which all transistors and their capacitances are k times larger. Also, it is assumed that the output of the latch is connected to the top (bipolar) pair and not to the bottom MOSFET pair. Fig. 3 illustrates the relevant parasitic capacitances used to derive the latch time constant  $\tau$ . The first term of (7) is the time constant at the clock input of the latch ( $\tau_{in}$ ). The second term  $(\tau_{\rm mid})$  is the time constant in the middle (cascode) node of the latch. The third term  $(\tau_{out})$  represents the charging of output capacitances by the tail current. It takes into account the Miller capacitance of the latching pair. The fourth term ( $\tau_{\text{fanout}}$ ) describes the fanout of the latch.

$$\begin{aligned}
\tau_{\text{BiCMOS-Latch}} &\approx \tau_{\rm in} + \tau_{\rm mid} + \tau_{\rm out} + \tau_{\rm fanout} \\
&\approx \frac{R_G}{R_L} \frac{\Delta V}{I_{\rm tail}} \left\{ C_{GS} + \left( 1 + \frac{g_{m,\rm MOS}}{g_{m,\rm HBT}} \right) C_{GD} \right\} \\
&\quad + \frac{2C_{BE} + C_{DB} + C_{GD}}{g_{m,\rm HBT}} \\
&\quad + \frac{\Delta V}{I_{\rm tail}} \cdot \frac{\left\{ 2C_{BC} + 2C_{CS} + C_{BE} + (1 + g_{m,\rm HBT}R_L) C_{BC} + C_{\rm INT} \right\}}{1.6} \\
&\quad + k \left( \frac{\Delta V}{I_{\rm tail}} + \frac{R_B}{k} \right) \frac{\left\{ C_{BE} + (1 + g_{m,\rm HBT}R_L) C_{BC} \right\}}{1.6}.
\end{aligned} \tag{7}$$
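For readers who wish to evaluate (7) numerically, a direct Python transcription is sketched below. The function and argument names are hypothetical, and the device values in the example call are round placeholders chosen only to exercise the code, not data for either technology. Treating the case without inductors as a divisor of 1 instead of 1.6 is also an assumption made here.

```python
def latch_tau(dv, i_tail, r_g, r_b, gm_mos, gm_hbt,
              c_gs, c_gd, c_db, c_be, c_bc, c_cs, c_int,
              k=1, peaking=1.6):
    """Evaluate the four terms of Eq. (7) in SI units.

    peaking is the divisor applied to the output and fanout terms
    (1.6 in the equation; 1.0 is assumed here to model no inductor).
    """
    r_l = dv / i_tail                                  # Eq. (1)
    miller = (1 + gm_hbt * r_l) * c_bc                 # Miller-multiplied C_BC
    tau_in = (r_g / r_l) * (dv / i_tail) * (c_gs + (1 + gm_mos / gm_hbt) * c_gd)
    tau_mid = (2 * c_be + c_db + c_gd) / gm_hbt
    tau_out = (dv / i_tail) * (2 * c_bc + 2 * c_cs + c_be + miller + c_int) / peaking
    tau_fanout = k * (dv / i_tail + r_b / k) * (c_be + miller) / peaking
    return {
        "in": tau_in,
        "mid": tau_mid,
        "out": tau_out,
        "fanout": tau_fanout,
        "total": tau_in + tau_mid + tau_out + tau_fanout,
    }


# Placeholder values for illustration only (not process data).
example = dict(dv=0.4, i_tail=1e-3, r_g=100.0, r_b=50.0,
               gm_mos=2e-3, gm_hbt=20e-3,
               c_gs=4e-15, c_gd=2e-15, c_db=2e-15, c_be=5e-15,
               c_bc=1e-15, c_cs=1e-15, c_int=2e-15)

with_peaking = latch_tau(**example)
without_peaking = latch_tau(**example, peaking=1.0)
print(with_peaking["total"] < without_peaking["total"])
```

Returning the four terms separately makes it easy to see which node dominates for a given set of device parameters.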

Equation (7) includes the impact of shunt peaking inductors that appear in the latches of Fig. 2. The inductors are added as part of the load and are used to extend the bandwidth of the circuit by reducing the effect of the output capacitance (which is dominant).

TABLE II DEVICE SIZES FOR 1-mA LATCHES IN EACH CONFIGURATION

| Latch | NMOS $W/L$ (0.13 $\mu$m) | HBT $l_e$ (0.13 $\mu$m) | $R_L$ (0.13 $\mu$m) | $L$ (0.13 $\mu$m) | NMOS $W/L$ (90 nm) | HBT $l_e$ (90 nm) | $R_L$ (90 nm) | $L$ (90 nm) |
|---|---|---|---|---|---|---|---|---|
| Fig. 2(a) | $2 \times \frac{2\,\mu\text{m}}{0.13\,\mu\text{m}}$ | $0.64\,\mu\text{m}$ | $400\,\Omega$ | 0 | $2 \times \frac{2\,\mu\text{m}}{90\,\text{nm}}$ | $0.50\,\mu\text{m}$ | $200\,\Omega$ | 0 |
| Fig. 2(b) | $2 \times \frac{1\,\mu\text{m}}{0.13\,\mu\text{m}}$ | $0.64\,\mu\text{m}$ | $400\,\Omega$ | 0 | $2 \times \frac{1\,\mu\text{m}}{90\,\text{nm}}$ | $0.50\,\mu\text{m}$ | $200\,\Omega$ | 0 |
| Fig. 2(a) | $2 \times \frac{2\,\mu\text{m}}{0.13\,\mu\text{m}}$ | $0.64\,\mu\text{m}$ | $400\,\Omega$ | 2 nH | $2 \times \frac{2\,\mu\text{m}}{90\,\text{nm}}$ | $0.50\,\mu\text{m}$ | $200\,\Omega$ | 500 pH |
| Fig. 2(b) | $2 \times \frac{1\,\mu\text{m}}{0.13\,\mu\text{m}}$ | $0.64\,\mu\text{m}$ | $400\,\Omega$ | 2 nH | $2 \times \frac{1\,\mu\text{m}}{90\,\text{nm}}$ | $0.50\,\mu\text{m}$ | $200\,\Omega$ | 500 pH |

Fig. 3. BiCMOS CML latch half-circuit with parasitics.

The peaking inductors do not affect the biasing of the transistors, but they reduce the output time constant of the latch, and thus increase its speed. For flat group delay response (i.e., minimum deterministic jitter), the inductor value is selected according to [28]

$$L = \frac{C_{\text{out}} R_L^2}{3.1} \tag{8}$$

where  $R_L$  is the load resistance and  $C_{out}$  is the total capacitance at the output node. This value of L improves the output time constant 1.6 times as indicated in (7).
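As a quick numerical check, (8) can be inverted to find the output capacitance implied by a given inductor and load resistor. The Python sketch below applies this to the inductor and resistor pairs listed in Table II (2 nH with 400 $\Omega$, and 500 pH with 200 $\Omega$). That those values were chosen exactly according to (8) is an assumption made here, since the text does not state $C_{\text{out}}$; the function names are hypothetical.

```python
def peaking_inductance(c_out_f, r_load_ohm):
    """Eq. (8): inductance for flat group delay, in henries."""
    return c_out_f * r_load_ohm ** 2 / 3.1


def implied_c_out(l_h, r_load_ohm):
    """Eq. (8) solved for C_out, in farads."""
    return 3.1 * l_h / r_load_ohm ** 2


c_130nm = implied_c_out(2e-9, 400.0)     # 0.13-um column of Table II
c_90nm = implied_c_out(500e-12, 200.0)   # 90-nm column of Table II
print(f"{c_130nm * 1e15:.2f} fF, {c_90nm * 1e15:.2f} fF")
```

Under that assumption both technologies imply the same output capacitance of about 38.75 fF, because halving $R_L$ and quartering $L$ leaves $L/R_L^2$ unchanged.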

## B. Power and Speed Optimization of CML Latches

The CML latch presented in the previous section requires a supply voltage of 2.5 V. As seen in Fig. 2(a), 0.4 V is allocated on the transistor that sets the tail current in the latch. This transistor can be eliminated to reduce power consumption without sacrificing performance. The new latch configuration is shown in Fig. 2(b). Now, the latch can operate from 1.8 V, or lower, with the same total current as before. The speed performance is maintained because  $\Delta V$ ,  $I_{\text{tail}}$ , and all capacitances are kept constant. However, precaution must be taken in the design process to ensure that the current through the latches of Fig. 2(b) is the same as in the latches of Fig. 2(a). The supply voltage can be further reduced with newer process technologies, in which smaller voltage drops are needed for each stacked MOSFET transistor.

Biasing of the CML latches of Fig. 2(b) proceeds as before. Since there now are two separate branches that go to ground, the current in each branch is $I_{\text{branch}} = 0.5 \cdot I_{\text{tail}}$, where $I_{\text{tail}}$ is the corresponding tail current of the latches in Fig. 2(a). The MOS transistors are sized such that $W_{\text{gate}} = I_{\text{branch}}/(0.5J_{\text{pfTMOS}}) = I_{\text{branch}}/(0.15 \text{ mA}/\mu\text{m})$ when there is zero differential clock input. The SiGe HBT transistors are sized such that $w_e \times l_e = I_{\text{branch}}/(0.75J_{\text{pfTHBT}})$ when there is zero differential data input to the latch. This choice of $I_{\text{branch}}$ and transistor sizes results in peak-$f_T$ current density biasing and thus maintains the optimal switching characteristics described in Section III-A. The time constant $\tau$ for the latch of Fig. 2(b) can also be calculated using (7).

## C. Performance Comparison and Scaling

This section presents a performance comparison of the various latches described earlier. The comparison is carried out both with hand calculations, based on technology data, and with simulations of the two latches under identical conditions. The calculations and the simulations are conducted for two technologies, to be able to predict the feasibility of the proposed latch topologies for future applications. The first is a production 0.13- $\mu$ m SiGe BiCMOS technology with transistor  $f_T$  of 150 GHz [29]. The second is a 90-nm SiGe BiCMOS technology under development with transistor  $f_T$  of 220 GHz [30].

To make the comparison fair, all latches were designed to operate with a total current consumption of 1 mA. A current of 1 mA was chosen because it is the current that allows a minimum-size HBT in the 0.13-$\mu$m SiGe BiCMOS technology to be biased for maximum speed. Next, the maximum bit rate at which the latch operated properly was observed. Operation is considered proper when the output swing is equal to the designed swing.

For hand calculations, (7) was employed. The device sizes used to realize the 1-mA latches are given in Table II for each latch configuration. The inductors can be designed as multi-metal spirals with narrow width (because high Q is not needed) and minimum spacing to maximize inductance per area [26]. Table III compares the performance of the latches based on power consumption, the calculated time constant  $\tau$ , and the maximum simulated speed of operation. The latches in each simulation had a fanout of 1.  $\Delta V$  is different between the two cases because more voltage is needed to fully switch a 0.13- $\mu$ m MOSFET than a 90-nm MOSFET [22].

The latch shown in Fig. 2(a) (but without inductors) was fabricated in 0.13-$\mu$m SiGe BiCMOS technology, as part of the PRBS generator that will be described in the next section. The latch was found to work correctly up to 12 Gb/s. The performance of this latch after fabrication agrees closely with simulation results that include layout parasitics. This confirms the validity of the simulation results presented for the other latch topologies. The analysis presented in this section demonstrates that it is possible to reduce the supply voltage without increasing the tail current, thus saving power. Furthermore, it is possible to significantly increase the speed of a latch, with the sacrifice of some area, by adding 500-pH peaking inductors that can be designed with a diameter of 10 $\mu$m [22].

TABLE III PERFORMANCE COMPARISON OF THE DIFFERENT LATCH TOPOLOGIES

| Latch | $\tau$ (0.13 $\mu$m) | Power (0.13 $\mu$m) | $\Delta V$ (0.13 $\mu$m) | Bit Rate (0.13 $\mu$m) | $\tau$ (90 nm) | Power (90 nm) | $\Delta V$ (90 nm) | Bit Rate (90 nm) |
|---|---|---|---|---|---|---|---|---|
| Fig. 2(a) ($L = 0$) | 14.1 ps | 2.5 mW | 400 mV | 15 Gb/s | 6.3 ps | 2.2 mW | 200 mV | 30 Gb/s |
| Fig. 2(b) ($L = 0$) | 13.9 ps | 1.8 mW | 400 mV | 16 Gb/s | 5.9 ps | 1.5 mW | 200 mV | 30 Gb/s |
| Fig. 2(a) | 9.1 ps | 2.5 mW | 400 mV | 19 Gb/s | 4.2 ps | 2.2 mW | 200 mV | 40 Gb/s |
| Fig. 2(b) | 8.9 ps | 1.8 mW | 400 mV | 19 Gb/s | 3.9 ps | 1.5 mW | 200 mV | 40 Gb/s |

Fig. 4. System schematic of the designed four-output $2^7 - 1$ PRBS generator.

# IV. CHIP DESIGN

## A. Chip Architecture

A $2^7-1$ parallel PRBS generator was designed using the concepts presented above. The block diagram of the chip is shown in Fig. 4. All signals in the system are differential. The only high-speed input to the system is an 11.5-GHz clock signal, which is distributed to all the components of the chip using a tree of clock buffers. A $2^7-1$ PRBS generator produces eight parallel pseudo-random bit sequences, which are shifted appropriately for direct multiplexing. An 8-to-4 multiplexer combines the eight sequences into four sequences at 23 Gb/s each. The four outputs are also shifted with respect to each other, such that they can be directly multiplexed to 92 Gb/s. One of the four outputs is provided off-chip for testing.

As discussed above, there are two topology options, series and parallel, to implement a PRBS generator with parallel outputs. In the case of the series PRBS generator [Fig. 1(a)], re-timing flip-flops are required after the combinational logic to align all signals with the clock, before multiplexing. The second problem with combinational logic is that it requires the fanouts of the shift register flip-flops to be different, and therefore have different delays. Even very small timing variations can significantly affect operation at high speeds. The total number of gates needed in this case is 11 XOR gates, 15 D-flip-flops, and four clock buffers, resulting in an estimated power of 263 mW for 12-Gb/s operation. The parallel generator [Fig. 1(b)] avoids the problems mentioned above thanks to its regular structure. The outputs are automatically re-timed and delayed appropriately. The fanout for all XOR gates and flip-flops is uniform, thus delays are equalized. The total number of gates needed in this case is eight XOR gates, eight D-flip-flops, and two clock buffers, consuming approximately 140 mW at 12 Gb/s. The parallel PRBS generator was chosen to be implemented for this system because it saves area and 47% power compared to the series generator.
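The quoted saving follows directly from the two power estimates, as the short Python check below shows. It also recomputes the per-lane figure from the 235-mW total reported in the abstract; the variable and function names are hypothetical.

```python
def relative_saving(reference_mw, candidate_mw):
    """Fraction of power saved by the candidate relative to the reference."""
    return (reference_mw - candidate_mw) / reference_mw


series_mw = 263.0    # estimated series generator power at 12 Gb/s
parallel_mw = 140.0  # estimated parallel generator power at 12 Gb/s
saving = relative_saving(series_mw, parallel_mw)

per_lane_mw = 235.0 / 4  # measured chip power divided by four output lanes

print(f"saving: {saving:.1%}")        # saving: 46.8%
print(f"per lane: {per_lane_mw} mW")  # per lane: 58.75 mW
```

Both results round to the figures quoted in the text: about 47% power saving and roughly 60 mW per lane.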

TABLE IV LATCH DEVICE SIZES AND BIASING

## B. High-Speed Blocks

Once high-level system simulations were completed, each individual block was designed at the transistor level and simulated using Spectre. The design of each block will be given in this section.

1) Latch: Three types of latches were designed for different parts of the system. All three employ the same basic BiCMOS CML topology of Fig. 2(a) without inductive peaking but have different component values. This is done to customize each latch to its load conditions and thus save power where the load (fanout) is small. Transistor sizes and biasing conditions of the three types of latches are summarized in Table IV.

In this work, the goal was to achieve the lowest power consumption possible. Therefore, latches with low fanout, like master latches of DFFs, were designed with 1-mA tail current according to (5) and (6). Simulations with extracted parasitics indicated that the 1-mA latches worked up to 12 Gb/s, which met the design goal, so it was not necessary to further increase the current in the master latches for achieving the desired bit rate.

The output swing  $\Delta V = R_L \times I_{\text{tail}}$  of the latches was changed depending on the next stage following the latch. If the latch was used to drive the HBT pair of a BiCMOS block (as is the case in the master latch of a DFF),  $\Delta V$  was set to 300 mV, which is adequate to fully switch an HBT differential pair. If the latch was used to drive the MOS pair through a stage of emitter-followers, then  $\Delta V$  was set to 500 mV, which is required to switch a MOSFET differential pair in 0.13- $\mu$ m technology. This configuration was employed for latches inside the multiplexer.

In places where the fanout of a latch was larger than 2, the latch tail current was increased to 2 mA. Transistor sizes were scaled accordingly. This configuration was used in the slave latches of DFFs that had to drive two XOR gates, a 2-to-1 MUX, and the associated interconnect.

All latches and gates in this chip use the BiCMOS CML logic topology, but differ from previous designs [21]. In this design the feedback source followers are removed to save power, and peaking inductors are removed to save area. These changes are possible because the parallel PRBS architecture allows the shift register to operate at lower bit-rates than in [14].

2) *DFF:* D-flip-flops were used in the core part of the PRBS generator. A schematic of the DFF is illustrated in Fig. 5(a), showing the master and slave latches and the emitter-followers at the clock inputs.

The DFF topology is also an improved version of the one presented in [21]. The clock source followers are replaced by emitter-followers which are able to drive a larger capacitance per unit current. To reduce the load on the clock distribution buffers and to save power compared to a DFF configuration where each latch has its own emitter followers, this DFF contains only one set of emitter followers for both latches.

Fig. 5. D-flip-flop and 2-to-1 multiplexer schematics. (a) DFF. (b) 2-to-1 MUX.

The DFFs used in the PRBS generator employ the 1-mA latch as the master and the 2-mA latch as the slave (Table IV). The slave latch of each generator DFF needs a larger tail current because it has to drive two XOR gates and a 2-to-1 MUX. Together with the clock emitter-followers, this results in a current of 5 mA from 2.5 V, thus a power dissipation of only 12.5 mW for a DFF that operates at 12 Gb/s.

3) Selector, XOR, and AND Gates: In addition to latches, the other digital blocks that are used in this system include selectors, XOR gates, and AND gates. They are also based on the BiCMOS CML logic topology. Selectors are employed in the final stage of each 2-to-1 MUX. To achieve a 24-Gb/s operation, the tail current was chosen to be 2 mA, with a single-ended swing of 250 mV. Transistors were sized by following the same procedure as for the latch.

XOR gates and AND gates are designed with 1-mA tail currents because their fanout is 1 in most cases. However, they differ from the latch and the selector topology by having emitter-followers at one of the inputs. These emitter-followers are necessary to step down the DC voltage level from the top HBT pair to the bottom MOS transistor pair. They cannot be shared between gates.

4) 24-Gb/s 2-to-1 MUX: The 2-to-1 MUX block is repeated four times to build the 8-to-4 MUX that outputs four 24-Gb/s PRBS streams. The 2-to-1 MUX schematic is shown in Fig. 5(b). Note that only one latch and one selector are used to build the MUX, unlike the more common configuration of five latches and a selector [20]. This is acceptable because the signals going from the PRBS generator into the MUX are already re-timed, as can be seen in the system schematic (Fig. 4).

Since the non-latched input to the selector comes from the generator DFFs, which have 500-mV swing, the latched input must also have 500-mV swing. Therefore, a 1-mA latch with 500-mV swing (Table IV) is used in the 2-to-1 MUX in front of the selector. The clock emitter followers are shared between the latch and the selector of the 2-to-1 MUX, as in the DFF.

Fig. 6. Buffer schematic.

TABLE V BUFFER COMPONENT SIZES AND BIASING

|            | Clock Buffer       | Data Buffer           | Output Buffer                    |
|------------|--------------------|-----------------------|----------------------------------|
| $I_{tail}$ | 2 mA               | 1 mA                  | 12 mA                            |
| Swing      | 450 mV             | 550 mV                | 300 mV                           |
| $R_L$      | $220\ \Omega$      | $550\ \Omega$         | $50\ \Omega \parallel 50\ \Omega$ |
| $I$        | 1 mA               | 0.5 mA                | 3 mA                             |
| $Q_1, Q_2$ | $l_e = 2\ \mu m$   | $l_e = 0.64\ \mu m$   | $l_e = 11\ \mu m$                |
| $Q_3, Q_4$ | $l_e = 4\ \mu m$   | $l_e = 0.64\ \mu m$   | $l_e = 6\ \mu m$                 |

5) Clock, Data, and Output Buffers: One of the most important parts of the PRBS generator and checker system is the clock tree. It is a tree of CML buffers designed to deliver the 12-GHz clock signal synchronously to all latches in the system. The schematic of one clock buffer is shown in Fig. 6. It consists of an HBT differential pair preceded by emitter followers. Transistor sizes and bias are summarized in Table V. The swing is set to 450 mV, to be able to switch the MOS transistors at the clock inputs of the latches. The tail current in the differential pair is set to 2 mA for adequate bandwidth.

To reduce the number of clock buffers in the system, and thus to save power, the fanout of each buffer is set to 4. (This is illustrated in Fig. 4.) This high fanout is possible because in each flip-flop, the emitter followers on the clock path are shared between the two latches. They also serve as the final stage of clock buffering.

It is very important to ensure that the paths traveled by the clock signal have identical delays. Thus, attention was paid in the layout to provide equal-length connections between clock buffers and from the clock buffers to the flip-flops.

In addition to clock buffers, data buffers and 50-$\Omega$ output buffers are also described in Table V. Data buffers are employed as intermediate buffers to enhance the signal, or before driving a large load. The 50-$\Omega$ output buffers are used only on the outputs, to drive external 50-$\Omega$ loads. The 300-mV swing requirement, together with the on-chip 50-$\Omega$ load in parallel with the external 50-$\Omega$ load (25 $\Omega$ in total), sets the tail current in these buffers to 12 mA.
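As a quick consistency check, the Table V entries can be run through the swing relation $\Delta V = R_L \times I_{\text{tail}}$ given earlier for the latches. The sketch below assumes the same relation applies to the buffers; the function and dictionary names are illustrative only.

```python
def swing_mv(i_tail_ma, r_load_ohm):
    """Single-ended swing dV = R_L * I_tail, in mV (mA * ohm = mV)."""
    return i_tail_ma * r_load_ohm


def parallel(r1_ohm, r2_ohm):
    """Equivalent resistance of two resistors in parallel."""
    return r1_ohm * r2_ohm / (r1_ohm + r2_ohm)


# Tail currents, load resistors, and tabulated swings taken from Table V.
buffers = {
    "clock": {"i_tail_ma": 2.0, "r_load_ohm": 220.0, "table_mv": 450.0},
    "data": {"i_tail_ma": 1.0, "r_load_ohm": 550.0, "table_mv": 550.0},
    "output": {"i_tail_ma": 12.0, "r_load_ohm": parallel(50.0, 50.0), "table_mv": 300.0},
}

computed = {}
for name, b in buffers.items():
    computed[name] = swing_mv(b["i_tail_ma"], b["r_load_ohm"])
    print(f"{name:>6}: computed {computed[name]:.0f} mV, Table V {b['table_mv']:.0f} mV")
```

The data and output buffers reproduce the tabulated swings exactly. The clock buffer product is 440 mV, about 2% below the tabulated 450 mV, which is consistent with rounded table entries.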



Fig. 7. Die photo of the fabricated chip.



Fig. 8. Measurement setup for the PRBS generator.

# V. FABRICATION AND RESULTS

The chip was fabricated in the STMicroelectronics 0.13- $\mu$ m SiGe BiCMOS technology with HBT  $f_T$  of 150 GHz [29] and six metal layers. The die photo of the fabricated chip is shown in Fig. 7, with the PRBS generator and checker identified. The total, pad-limited chip area is 1 mm × 0.8 mm. The PRBS generator and 8-to-4 MUX together occupy an area of 393  $\mu$ m × 178  $\mu$ m and consume 235 mW. A small area is achieved partly because inductors are not employed anywhere in this design. The PRBS checker and error counter have an area of 308  $\mu$ m × 349  $\mu$ m and power consumption of 350 mW. They could not be tested at this time due to the unavailability of an on-chip CDR circuit. The rest of the power is consumed in the output buffers, adding up to a total measured power consumption of 940 mW.
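The power figures above can be cross-checked with a few lines of arithmetic. This is only a bookkeeping sketch using the numbers quoted in the paragraph; the variable names are illustrative.

```python
# Measured power figures quoted in the text, in mW.
TOTAL_MW = 940.0
GENERATOR_AND_MUX_MW = 235.0
CHECKER_AND_COUNTER_MW = 350.0
OUTPUT_LANES = 4

# Power left over for the output buffers, per the text.
remaining_mw = TOTAL_MW - GENERATOR_AND_MUX_MW - CHECKER_AND_COUNTER_MW

# Generator plus 8-to-4 MUX power, divided over the four output lanes.
per_lane_mw = GENERATOR_AND_MUX_MW / OUTPUT_LANES

print(f"remaining power: {remaining_mw:.0f} mW")
print(f"power per lane:  {per_lane_mw:.2f} mW")
```

The per-lane result, 58.75 mW, is the value rounded to 60 mW in the title and abstract.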

The PRBS generator part of the chip was tested using an Agilent E4448A PSA series spectrum analyzer for verifying the bit-rate and periodicity of the generated PRB-sequence on one of the two differential outputs. Furthermore, an Agilent 86100C DCAJ oscilloscope was employed to monitor the other differential output. The oscilloscope is capable of identifying, locking, and characterizing the jitter of digital sequences as long as $2^{15}-1$ at data rates beyond 40 Gb/s. In the absence of a 40-Gb/s BERT, use of the oscilloscope was essential for confirming the correctness of the generated sequence.

The measurement setup for the PRBS generator circuit is shown in Fig. 8. The input clock and the output PRBS signal are provided onto and off the chip using differential 67-GHz GSGSG probes. The clock was applied to only one side of the differential input. The output signal was taken from both sides of the differential output. One side was connected through a DC-blocking capacitor to the remote head of the digital oscilloscope. The other output was connected through another blocking capacitor to the spectrum analyzer.

Fig. 9. Measured PRBS generator performance at 23 Gb/s. (a) Spectrum of the generated PRBS at 23 Gb/s. (b) Spectrum of the generated PRBS at 23 Gb/s (zoomed). (c) Eye diagram of the generated PRBS at 23 Gb/s. (d) Locked time-domain sequence at 23 Gb/s.

The $2^7-1$ PRBS generator (together with the 8-to-4 MUX) was tested by applying a clock signal and verifying the correctness of the generated sequence. The measurement results for the 23-Gb/s PRBS, obtained with an 11.5-GHz clock signal, are summarized in Fig. 9(a)-(d). Fig. 9(a) shows the spectrum of the 23-Gb/s PRBS output. It has a $\sin(x)/x$-type shape with nulls at multiples of the clock frequency, indicating non-return-to-zero (NRZ) logic. A zoomed-in version of the same spectrum is shown in Fig. 9(b), with spectral tones spaced apart by 180.9 MHz. This tone spacing is equal to the bit-rate divided by the sequence length, 180.9 MHz $\approx$ (23 Gb/s)/(127 bits), indicating that the correct pattern length of 127 bits is achieved. Fig. 9(c) demonstrates a fully open eye diagram at 23 Gb/s. However, this does not guarantee that every bit of the generated sequence is correct. To confirm the correctness of the sequence, the oscilloscope was locked to a 127-bit long pattern, and the pattern was checked bit-by-bit by scrolling through it [Fig. 9(d)]. The PRBS outputs at 12 and 23 Gb/s were saved using the oscilloscope, and plotted against an ideal $2^7-1$ PRBS, as illustrated in Fig. 10. Correct PRBS generation was also obtained with clock frequencies as low as 100 MHz, demonstrating the very wide bandwidth of the PRBS generator.
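The two checks described above, pattern length and tone spacing, can be mimicked in software. The following is a minimal sketch, not the chip's logic: it assumes a 7-bit Fibonacci LFSR with feedback from stages 7 and 6, one common $2^7-1$ PRBS choice, and the function names are hypothetical.

```python
def prbs7(n_bits, seed=0x7F):
    """Return n_bits of a 2^7-1 PRBS from a 7-bit Fibonacci LFSR.

    Assumption: feedback from stages 7 and 6, i.e. b[n] = b[n-7] XOR b[n-6].
    The taps used on the chip are not restated in this section.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(n_bits):
        bits.append((state >> 6) & 1)                 # output the oldest stage
        feedback = ((state >> 6) ^ (state >> 5)) & 1  # XOR of stages 7 and 6
        state = ((state << 1) | feedback) & 0x7F
    return bits


def smallest_period(bits):
    """Smallest p such that bits[i] == bits[i + p] for every valid i."""
    n = len(bits)
    for p in range(1, n):
        if all(bits[i] == bits[i + p] for i in range(n - p)):
            return p
    return n


def tone_spacing_hz(bit_rate_bps, sequence_length):
    """Spacing of the spectral tones of a periodic bit pattern."""
    return bit_rate_bps / sequence_length


length = smallest_period(prbs7(3 * 127))
spacing = tone_spacing_hz(23e9, length)
print(f"pattern length: {length} bits")
print(f"tone spacing at 23 Gb/s: {spacing / 1e6:.1f} MHz")
```

For a nominal rate of exactly 23 Gb/s the computed spacing is about 181.1 MHz, within roughly 0.1% of the measured 180.9 MHz.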

Fig. 10. Measured 23-Gb/s (top), measured 12-Gb/s (middle), and ideal (bottom) time domain $2^7 - 1$ PRB-sequences.

Fig. 11. Measured PRBS generator performance at 24 Gb/s. (a) Eye diagram of the output at 24 Gb/s. (b) Spectrum of the generated PRBS at 24 Gb/s (zoomed).

With a 12-GHz input clock and a 24-Gb/s output, a wide open eye was obtained [Fig. 11(a)]. Also, the spectral tones have the expected spacing of 189.2 MHz $\approx$ (24 Gb/s)/(127 bits) [Fig. 11(b)]. However, the oscilloscope could not be locked to the sequence to observe it in the time domain due to a rather noisy spectrum. Therefore, although the spectrum indicates that all logic blocks inside the generator operate up to 24 Gb/s, their delay relative to the clock cycle time means that correct PRBS operation can only be guaranteed up to 23 Gb/s. The $2^7-1$ PRBS generator produces four appropriately delayed parallel output streams at 23 Gb/s each, which can be further multiplexed to an aggregate PRBS output at 92 Gb/s with minimal circuitry. The four-channel PRBS generator consumes 235 mW from 2.5 V, which results in only 60 mW per output lane.
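The idea that appropriately delayed copies of a PRBS can be multiplexed into the same PRBS at a higher rate can be illustrated with a short simulation. This sketch rests on two assumptions that are not taken from the chip: the LFSR taps (stages 7 and 6), and lane delays derived from the decimation property of m-sequences, which for an $M$-way multiplexer ($M$ a power of two) gives a lane-to-lane delay of $M^{-1} \bmod 127$ bits. The actual lane phases used on the chip are not given in this section.

```python
PERIOD = 2**7 - 1  # 127 bits


def prbs7_one_period(seed=0x7F):
    """One period of a 2^7-1 PRBS (assumed taps: stages 7 and 6)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(PERIOD):
        bits.append((state >> 6) & 1)
        feedback = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | feedback) & 0x7F
    return bits


def lane_delay(ways, period=PERIOD):
    """Delay between adjacent lanes: modular inverse of `ways` (Python 3.8+)."""
    if ways < 1 or ways & (ways - 1):
        raise ValueError("ways must be a power of two")
    return pow(ways, -1, period)


def multiplex(sequence, ways):
    """Bit-interleave `ways` delayed copies of one period of `sequence`."""
    n = len(sequence)
    d = lane_delay(ways, n)
    lanes = [[sequence[(i + r * d) % n] for i in range(n)] for r in range(ways)]
    return [lanes[j % ways][j // ways] for j in range(ways * n)]


def is_rotation(candidate, reference):
    """True if `candidate` is a cyclic shift of `reference`."""
    return any(candidate == reference[k:] + reference[:k]
               for k in range(len(reference)))


base = prbs7_one_period()
for ways in (2, 4):
    fast = multiplex(base, ways)
    ok = is_rotation(fast[:PERIOD], base) and fast == fast[:PERIOD] * ways
    print(f"{ways}:1 MUX, lane delay {lane_delay(ways)} bits, same PRBS: {ok}")
```

Under these assumptions, four lanes delayed by 0, 32, 64, and 96 bits interleave into a cyclically shifted copy of the same 127-bit sequence at four times the lane rate.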

In the generator core, latches that consume 2.5 mW are switching at 12 Gb/s. To the best of our knowledge, this is the lowest power latch operating above 10 Gb/s in any technology [31]. This BiCMOS CML latch implementation works with 1-mA tail current from a 2.5-V supply. Other recently reported sub-3.3-V bipolar logic families [7], [11], [32] consume significantly more power because they require doubling the tail current for a given logic function. While 130-nm or 90-nm MOS CML latches operate from 1.5-V or lower supplies, they require more than 2 times higher tail currents and inductive peaking to operate above 10 Gb/s, thus offsetting the advantage provided by the lower supply voltage [12], [33].

TABLE VI COMPARISON OF RECENT PRBS GENERATORS

| Reference           | Power  | Bit-rate | Length       | Technology                                                 | FOM [pJ/bit] |
|---------------------|--------|----------|--------------|------------------------------------------------------------|--------------|
| This Work [31]      | 243 mW | 23 Gb/s  | $2^7 - 1$    | 0.13-$\mu$m SiGe BiCMOS, 150 GHz $f_T/f_{MAX}$             | 1.51         |
| T. O. Dickson [14]  | 9.8 W  | 80 Gb/s  | $2^{31} - 1$ | 0.13-$\mu$m SiGe BiCMOS, 150 GHz $f_T/f_{MAX}$             | 3.95         |
| D. Kucharski [11]   | 550 mW | 40 Gb/s  | $2^7 - 1$    | SiGe BiCMOS, 120 GHz $f_T$                                 | 1.97         |
| O. Wohlgemuth [8]   | 950 mW | 86 Gb/s  | $2^7 - 1$    | SiGe bipolar, 200 GHz $f_T$ / 240 GHz $f_{MAX}$            | 1.58         |
| H. D. Wohlmuth [12] | 205 mW | 13 Gb/s  | $2^7 - 1$    | 0.12-$\mu$m CMOS, 100 GHz $f_T$ / 50 GHz $f_{MAX}$         | 2.26         |
| H. Knapp [7]        | 1.5 W  | 100 Gb/s | $2^7 - 1$    | SiGe bipolar, 200 GHz $f_T/f_{MAX}$                        | 2.15         |
|                     | 1.9 W  | 54 Gb/s  | $2^{11} - 1$ |                                                            | 3.21         |
| H. Veenstra [2]     | 1.75 W | 58 Gb/s  | $2^7 - 1$    | InP HBT, 170 GHz $f_T$                                     | 4.33         |
| S. Kim [10]         | 1.32 W | 50 Gb/s  | $2^7 - 1$    | 0.18-$\mu$m SiGe BiCMOS, 120 GHz $f_T$ / 100 GHz $f_{MAX}$ | 3.77         |
| R. Malasani [9]     | 2.3 W  | 15 Gb/s  | $2^{31} - 1$ | 0.25-$\mu$m SiGe BiCMOS, 80 GHz $f_T$ / 100 GHz $f_{MAX}$  | 4.95         |
| H. Knapp [6]        | 1.2 W  | 40 Gb/s  | $2^7 - 1$    | SiGe bipolar, 106 GHz $f_T$ / 145 GHz $f_{MAX}$            | 4.29         |

To compare the PRBS generator described here to previously reported work, a figure of merit (FOM) for PRBS generators is introduced in (9). The FOM includes the power consumption of the generator, the sequence length, and bit rate of the generator output. The FOMs of this PRBS generator and other previously published PRBS generators are summarized in Table VI. It should be noted that some of the references report more than the core generator in their power consumption. The power consumption of 243 mW indicated in Table VI for this work accounts for the parallel PRBS generator core (145 mW), the clock distribution tree (40 mW), one 2:1 MUX (13 mW) and a 50- $\Omega$ output buffer (45 mW):

$$\mathrm{FOM} = \frac{\mathrm{Power}}{\log_2 (\mathrm{Length}) \times \mathrm{bitrate}}. \tag{9}$$
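The FOM in (9) is straightforward to evaluate numerically. The sketch below applies it to two rows of Table VI and checks that the quoted power breakdown sums to 243 mW; the function name and argument names are illustrative, and the sequence length is taken as $2^n - 1$ for an $n$-stage register.

```python
from math import log2


def fom_pj_per_bit(power_w, bit_rate_bps, register_length):
    """FOM from (9): Power / (log2(Length) * bitrate), returned in pJ/bit."""
    sequence_length = 2**register_length - 1
    return power_w / (log2(sequence_length) * bit_rate_bps) * 1e12


# Power breakdown for this work, in mW, as listed in the text.
breakdown_mw = {
    "parallel PRBS generator core": 145,
    "clock distribution tree": 40,
    "one 2:1 MUX": 13,
    "50-ohm output buffer": 45,
}
total_mw = sum(breakdown_mw.values())

this_work = fom_pj_per_bit(total_mw * 1e-3, 23e9, 7)
dickson_14 = fom_pj_per_bit(9.8, 80e9, 31)

print(f"total power: {total_mw} mW")
print(f"This work:   {this_work:.2f} pJ/bit")
print(f"Ref. [14]:   {dickson_14:.2f} pJ/bit")
```

These two results round to the 1.51 and 3.95 pJ/bit entries of Table VI; the remaining rows can be evaluated the same way.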

## VI. CONCLUSION

A review and comparison of PRBS generator topologies was presented, along with their applicability for high-speed and/or low-power implementation.

Low-power BiCMOS CML latch topologies were analyzed using the OCTC technique. Based on simulated results and a fabricated latch, these topologies are expected to be suitable for faster digital circuits, or for lower-power circuits operating at the same speed.

A $2^7-1$ PRBS generator chip was designed based on this latch and CML family, fabricated, and characterized. The design was optimized for low power consumption at the architecture and circuit level. A 2.5-V 1-mA latch is used on the 12-Gb/s path. To the best of our knowledge, this is the lowest-power latch clocked above 10 GHz. The generator produces four parallel PRBS outputs at 23 Gb/s while consuming 235 mW, requiring 60 mW for each PRBS output.

#### ACKNOWLEDGMENT

The authors thank B. Sautreuil and R. Beerkens for their support and STMicroelectronics for fabrication. The authors also thank NSERC and Micronet for financial support, OIT and CFI for test equipment, and CMC for CAD tools and support.

#### REFERENCES

- [1] M. G. Chen and J. K. Notthoff, "A 3.3 V, 21 Gb/s PRBS generator in AlGaAs/GaAs HBT technology," *IEEE J. Solid-State Circuits*, vol. 35, no. 9, pp. 1266–1270, Sep. 2000.
- [2] H. Veenstra, "1–58 Gb/s PRBS generator with <1.1 ps RMS jitter in InP technology," in *Proc. ESSCIRC*, Sep. 2004, pp. 359–362.
- [3] M. Bussmann, U. Langmann, W. J. Hillery, and W. W. Brown, "PRBS generation and error detection above 10 Gb/s using a monolithic Si bipolar IC," J. Lightw. Technol., vol. 12, no. 2, pp. 353–360, Feb. 1994.
- [4] F. Schumann and J. Böck, "Silicon bipolar IC for PRBS testing generates adjustable bit rates up to 25 Gbit/s," *Electron. Lett.*, vol. 33, pp. 2022–2023, Nov. 1997.
- [5] O. Kromat, U. Langmann, G. Hanke, and W. J. Hillery, "A 10-Gb/s silicon bipolar IC for PRBS testing," *IEEE J. Solid-State Circuits*, vol. 33, no. 1, pp. 76–85, Jan. 1998.
- [6] H. Knapp, M. Wurzer, T. F. Meister, J. Böck, and K. Aufinger, "40 Gbit/s 2<sup>7</sup>-1 PRBS generator IC in SiGe bipolar technology," in *Proc. Bipolar/BiCMOS Circuits and Technology Meeting*, Sep. 2002, pp. 124–127.
- [7] H. Knapp, M. Wurzer, W. Perndl, K. Aufinger, J. Böck, and T. F. Meister, "100-Gb/s 2<sup>7</sup>-1 and 54-Gb/s 2<sup>11</sup>-1 PRBS generators in SiGe bipolar technology," *IEEE J. Solid-State Circuits*, vol. 40, no. 10, pp. 2118–2125, Oct. 2005.
- [8] O. Wohlgemuth, W. Müller, T. Link, R. Lederer, and P. Paschke, "2<sup>7</sup>-1 SiGe PRBS generator IC up to 86 Gbit/s," in *Proc. Gallium Arsenide Applications Symp.*, Amsterdam, The Netherlands, Oct. 2004, pp. 335–338.
- [9] R. Malasani, C. Bourde, and G. Gutierrez, "A SiGe 10-Gb/s multipattern bit error rate tester," in *Proc. IEEE Radio Frequency Integrated Circuits (RFIC) Symp.*, Jun. 2003, pp. 321–324.
- [10] S. Kim, M. Kapur, M. Meghelli, A. Rylyakov, Y. Kwark, and D. Friedman, "45-Gb/s SiGe BiCMOS PRBS generator and PRBS checker," in *Proc. IEEE Custom Integrated Circuits Conference* (CICC), Sep. 2003, pp. 313–316.
- [11] D. Kucharski and K. Kornegay, "A 40 Gb/s 2.5 V 2<sup>7</sup>-1 PRBS generator in SiGe using a low-voltage logic family," in *IEEE ISSCC Dig. Tech. Papers*, San Francisco, CA, Feb. 2005, pp. 340–341.
- [12] H.-D. Wohlmuth and D. Kehrer, "A low power 13-Gb/s 2<sup>7</sup>-1 pseudo random bit sequence generator IC in 120 nm bulk CMOS," in *Proc. Symp. Integrated Circuits and Systems Design (SBCCI)*, Sep. 2004, pp. 233–236.
- [13] T. O. Dickson, E. Laskin, I. Khalid, R. Beerkens, J. Xie, B. Karajica, and S. P. Voinigescu, "A 72 Gb/s 2<sup>31</sup>-1 PRBS generator in SiGe BiCMOS technology," in *IEEE ISSCC Dig. Tech. Papers*, San Francisco, CA, Feb. 2005, pp. 342–345.
- [14] T. O. Dickson, E. Laskin, I. Khalid, R. Beerkens, J. Xie, B. Karajica, and S. P. Voinigescu, "An 80-Gb/s 2<sup>31</sup>-1 pseudorandom binary sequence generator in SiGe BiCMOS technology," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2735–2745, Dec. 2005.
- [15] W. McFarland, K. Springer, and C.-S. Yen, "1-Gword/s pseudorandom word generator," *IEEE J. Solid-State Circuits*, vol. 24, no. 6, pp. 747–751, Jun. 1989.

- [16] S. W. Golomb, Shift Register Sequences. San Francisco, CA: Holden-Day, 1967.
- [17] F. Sinnesbichler, A. Ebberg, A. Felder, and R. Weigel, "Generation of high-speed pseudorandom sequences using multiplex techniques," *IEEE Trans. Microw. Theory Tech.*, vol. 44, no. 12, pp. 2738–2742, Dec. 1996.
- [18] A. N. Van-Luyn, "Shift register connections for delayed versions of m-sequences," *Electron. Lett.*, vol. 14, pp. 713–715, Oct. 1978.
- [19] J. J. O'Reilly, "Series-parallel generation of m-sequences," *The Radio and Electronic Engineer*, vol. 45, pp. 171–176, Apr. 1975.
- [20] B. Razavi, Design of Integrated Circuits for Optical Communications. New York: McGraw-Hill, 2003.
- [21] T. O. Dickson, R. Beerkens, and S. P. Voinigescu, "A 2.5-V 45-Gb/s decision circuit using SiGe BiCMOS logic," *IEEE J. Solid-State Circuits*, vol. 40, no. 4, pp. 994–1003, Apr. 2005.
- [22] T. O. Dickson, K. H. K. Yau, T. Chalvatzis, A. Mangan, E. Laskin, R. Beerkens, P. Westergaard, M. Tazlauanu, M. Yang, and S. P. Voinigescu, "The invariance of characteristic current densities in nanoscale MOSFETs and its impact on algorithmic design methodologies and design porting of Si(Ge) (Bi)CMOS high-speed building blocks," *IEEE J. Solid-State Circuits*, vol. 41, no. 8, pp. 1830–1845, Aug. 2006.
- [23] A. S. Sedra and K. C. Smith, *Microelectronic Circuits*, 5th ed. New York: Oxford Press, 2004.
- [24] S. P. Voinigescu, T. O. Dickson, T. Chalvatzis, A. Hazneci, E. Laskin, R. Beerkens, and I. Khalid, "Algorithmic design methodologies and design porting of wireline transceiver IC building blocks between technology nodes," in *Proc. IEEE Custom Integrated Circuits Conf.*, 2005, pp. 110–117.
- [25] T. E. Collins, V. Manan, and S. I. Long, "Design analysis and circuit enhancements for high-speed bipolar flip-flops," *IEEE J. Solid-State Circuits*, vol. 40, no. 5, pp. 1166–1174, May 2005.
- [26] T. O. Dickson, M.-A. LaCroix, S. Boret, D. Gloria, R. Beerkens, and S. P. Voinigescu, "30–100-GHz inductors and transformers for millimeter-wave (Bi)CMOS integrated circuits," *IEEE Trans. Microw. Theory Tech.*, vol. 53, no. 1, pp. 123–133, Jan. 2005.
- [27] S. P. Voinigescu, T. Dickson, R. Beerkens, I. Khalid, and P. Westergaard, "A comparison of Si CMOS, SiGe BiCMOS, and InP HBTs technologies for high-speed and millimeter-wave ICs," in *Proc. Si Monolithic Integrated Circuits in RF Systems*, Atlanta, GA, 2004, pp. 111–114.
- [28] T. H. Lee, The Design of CMOS Radio Frequency Integrated Circuits, 2nd ed. New York: Cambridge, 2004.
- [29] M. Laurens, B. Martinet, O. Kermarrec, Y. Campidelli, F. Deleglise, D. Dutartre, G. Troillard, D. Gloria, J. Bonnouvrier, R. Beerkens, V. Rousset, F. Leverd, A. Chantre, and A. Monroy, "A 150 GHz $f_T/f_{max}$ 0.13-$\mu$m SiGe:C BiCMOS technology," in *Proc. Bipolar/BiCMOS Circuits and Technology Meeting*, Sep. 2003, pp. 199–202.
- [30] P. Chevalier, C. Fellous, L. Rubaldo, F. Pourchon, S. Pruvost, R. Beerkens, F. Saguin, N. Zerounian, B. Barbalat, S. Lepilliet, D. Dutartre, D. Celi, I. Telliez, D. Gloria, F. Aniel, F. Danneville, and A. Chantre, "230-GHz self-aligned SiGeC HBT for optical and millimeter-wave applications," *IEEE J. Solid-State Circuits*, vol. 40, no. 10, pp. 2025–2034, Oct. 2005.
- [31] E. Laskin and S. P. Voinigescu, "A 60 mW per lane, 4 × 23-Gb/s 2<sup>7</sup>-1 PRBS generator," in *Proc. IEEE Compound Semiconductor Integrated Circuit Symp.*, Oct. 2005, pp. 192–195.
- [32] Y. Amamiya, Y. Suzuki, J. Yamazaki, A. Fujihara, S. Tanaka, and H. Hida, "1.5-V low supply voltage 43-Gb/s delayed flip-flop circuit," in *Proc. IEEE Gallium Arsenide Integrated Circuits (GaAs IC) Symp.*, Nov. 2003, pp. 169–172.

- [33] K. Kanda, D. Yamazaki, T. Yamamoto, M. Horinaka, J. Ogawa, H. Tamura, and H. Onodera, "40 Gb/s 4:1 MUX/1:4 DEMUX in 90 nm standard CMOS," in *IEEE ISSCC Dig. Tech. Papers*, San Francisco, CA, Feb. 2005, pp. 152–153.



**Ekaterina Laskin** (S'04) received the B.A.Sc. (Hons) degree in computer engineering and the M.A.Sc. degree in electrical engineering from the University of Toronto, Toronto, ON, Canada, in 2004 and 2006, respectively. She is currently working toward the Ph.D. degree at the Department of Electrical and Computer Engineering, University of Toronto.

Her research interests include the design of high-speed and millimeter-wave integrated circuits, with a focus on mm-wave imaging systems.

Ms. Laskin was a University of Toronto Scholar from 2000 to 2004. She received the Natural Sciences and Engineering Research Council of Canada (NSERC) undergraduate student research award in industry and university in 2002 and 2003. She was the recipient of the NSERC Postgraduate Scholarship and currently holds the NSERC Canada Graduate Scholarship.



**Sorin P. Voinigescu** (M'90–SM'02) received the M.Sc. degree in electronics from the Polytechnic Institute of Bucharest, Bucharest, Romania, in 1984, and the Ph.D. degree in electrical and computer engineering from the University of Toronto, Toronto, ON, Canada, in 1994.

From 1984 to 1991, he worked in R&D and academia in Bucharest, Romania, where he designed and lectured on microwave semiconductor devices and integrated circuits. From 1994 to 2002 he was with Nortel Networks and with Quake Technologies in Ottawa, ON, Canada, where he was responsible for projects in high-frequency characterization and statistical scalable compact model development for Si, SiGe, and III-V devices. He also led the design and product development of wireless and optical fiber building blocks and transceivers in these technologies. In 2000, he co-founded and was the CTO of Quake Technologies, the world's leading provider of 10 Gb Ethernet transceiver ICs. In September 2002, he joined the Department of Electrical and Computer Engineering, University of Toronto, as an Associate Professor. He has authored or co-authored over 70 refereed and invited technical papers spanning the simulation, modeling, design, and fabrication of high-frequency semiconductor devices and circuits. His research and teaching interests focus on nanoscale semiconductor devices and their application in integrated circuits at frequencies up to and beyond 100 GHz.

Dr. Voinigescu received Nortel's President Award for Innovation in 1996. He is a co-recipient of the Best Paper Award at the 2001 IEEE Custom Integrated Circuits Conference and at the 2005 Compound Semiconductor IC Symposium. His students have won the Best Student Paper Award at the 2004 IEEE VLSI Circuits Symposium, at the 2006 RFIC Symposium, and at the 2006 IEEE SiRF Meeting.