# High-Speed CMOS Analog Viterbi Detector for 4-PAM Partial-Response Signaling

Bahram Zand and David A. Johns, Fellow, IEEE

Abstract—In this paper, a 1-Gb/s analog Viterbi detector based on a 4-PAM duobinary scheme is discussed with experimental results for a 0.25- $\mu$ m CMOS implementation. This chip is the first analog integrated implementation of a reduced state sequence detector. Pipelining and parallel processing have been incorporated in this design for high-speed operation. Due to test equipment limitations, experimental results are given for 200-Mb/s operation while simulation results indicate a speed of 1 Gb/s. Power dissipation is 55 mW from a 2.5-V supply. The active area occupies 0.78 mm<sup>2</sup>. Although a duobinary scheme has been the focus of this work for its application in optical links, this design can be readily modified or extended to other partial-response signaling schemes such as dicode and PR4.

*Index Terms*—CMOS analog integrated circuits, multilevel systems, partial response signaling, sequence detection, Viterbi detection.

## I. INTRODUCTION

THE EXPONENTIAL growth of high-speed data communication transceivers is often hindered by interface components and transmission link shortcomings. For example, in wired links there is a limited bandwidth that is dependent on distance and cabling, while in line-of-sight free-space optical links, a bandwidth limitation occurs due to photodiodes with large depletion capacitance and LEDs. Two common techniques to combat bandwidth limitations are multilevel modulation and partial-response signaling (PRS) [1].

Multilevel modulation schemes reduce the required bandwidth for a given bitrate and, hence, increase channel efficiency. A simple multilevel transmission scheme is M-level pulse amplitude modulation (M-PAM), where each pulse conveys  $log_2(M)$  bits of information by mapping each combination of  $log_2(M)$  bits to one of M specified levels. Partial-response signaling also improves channel efficiency, but in this case, the improvement occurs by allowing a controlled amount of intersymbol interference (ISI), thereby reducing noise enhancement (since less equalization is required). However, to take full advantage of PRS, maximum-likelihood sequence detection (MLSD) is required, of which the Viterbi algorithm is most commonly used [2], [3]. Unfortunately, when combining M-PAM modulation with PRS, the decoding complexity often makes this approach impractical. Fortunately, reduced-state

Publisher Item Identifier S 0018-9200(02)05872-9.



Fig. 1. Two-state trellis diagram.

sequence detection (RSSD) can be used to reduce decoding complexity with little compromise in performance [4]–[8]. While a digital realization of RSSD for 125-Mb/s digital transmission over unshielded twisted pair cabling has been reported [9], there have been no previous reports of analog RSSD implementations. Because an analog Viterbi detector eliminates the front-end analog-to-digital converter (A/D), which alone consumes about 300 mW at 500 MS/s, lower power and smaller area are the greatest advantages of this design.

This paper presents the design and circuit implementation of a 1-Gb/s analog reduced-state sequence detector based on 4-PAM duobinary PRS. The design was fabricated in a 0.25-$\mu$m CMOS process and consumes 55 mW from a 2.5-V supply when operating at 200 Mb/s. Although post-layout simulations indicate correct operation up to 1 Gb/s, the chip was tested only up to 200 Mb/s because of test-equipment limitations.

## II. REDUCED-STATE VITERBI DETECTOR

The Viterbi algorithm is a practical technique for realizing a maximum-likelihood sequence detector. By measuring the difference between the actual value of the received signal and its expected value, one can assign metrics for each branch and state. Final detection will be based on revealing the sequence with the least accumulated branch metrics. These states and branches are stretched in time and are shown in trellis diagrams.

For a two-state trellis diagram, we follow the results in [12] to calculate branch metrics,  $b_{ji}(k)$ , and state metrics,  $m_i(k)$ , as shown in Fig. 1, in which  $b_{ji}(k)$  measures the amount of error between the expected and the received values, while  $m_i(k)$  for each state at time k represents the least accumulated branch metrics from the origin to that specific state.

State metrics  $m_0(k)$  and  $m_1(k)$  for time k can be evaluated based on the previous state metrics and the branch metrics as follows:

$$\begin{cases}
m_0(k) = \min\{m_0(k-1) + b_{00}(k),\ m_1(k-1) + b_{10}(k)\}\\
m_1(k) = \min\{m_0(k-1) + b_{01}(k),\ m_1(k-1) + b_{11}(k)\}.
\end{cases} \tag{1}$$
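The recursion in (1) is an add-compare-select step. The following is a minimal Python sketch of one such step, assuming the branch metrics are supplied as a dictionary keyed by (j, i); the function name `acs_step` and the data layout are illustrative and are not taken from the design.

```python
def acs_step(m_prev, b):
    """One add-compare-select update of (1) for a two-state trellis.

    m_prev: (m0(k-1), m1(k-1)), the previous state metrics.
    b: dict mapping (j, i) to the branch metric b_ji(k),
       where j is the starting state and i the ending state.
    Returns the new state metrics and, for each ending state,
    the starting state of the surviving branch.
    """
    m_new, survivors = [], []
    for i in (0, 1):
        # Add: extend both candidate paths into ending state i.
        candidates = [m_prev[j] + b[(j, i)] for j in (0, 1)]
        # Compare and select: keep the smaller accumulated metric.
        j_best = 0 if candidates[0] <= candidates[1] else 1
        m_new.append(candidates[j_best])
        survivors.append(j_best)
    return tuple(m_new), tuple(survivors)


metrics, survivors = acs_step((0, 4), {(0, 0): 2, (1, 0): 1, (0, 1): 8, (1, 1): 1})
```

In this example, state 0 is reached most cheaply from state 0 and state 1 from state 1, so both paths survive unchanged.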

Manuscript received November 5, 2001; revised January 24, 2002.

B. Zand was with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada. He is now with Snowbush Microelectronics, Toronto, ON M5S 2T9, Canada (e-mail: zand@eecg.utoronto.ca).

D. A. Johns is with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada.

Since branch metrics are defined only for comparative purposes and in order to avoid overflowing, the difference metrics are introduced such that [16]

$$\Delta m(k) = m_0(k) - m_1(k). \tag{2}$$

One can determine this difference metric and branch extensions based on the branch metrics conditions such that

$$\Delta m(k) = b_{00}(k) - b_{01}(k) \quad \text{if} \quad \begin{cases} \Delta m(k-1) < b_{10}(k) - b_{00}(k) \\ \Delta m(k-1) < b_{11}(k) - b_{01}(k) \end{cases} \tag{3.1}$$

$$\Delta m(k) = b_{00}(k) - b_{11}(k) + \Delta m(k-1) \quad \text{if} \quad \begin{cases} \Delta m(k-1) < b_{10}(k) - b_{00}(k) \\ \Delta m(k-1) > b_{11}(k) - b_{01}(k) \end{cases} \tag{3.2}$$

$$\Delta m(k) = b_{10}(k) - b_{01}(k) - \Delta m(k-1) \quad \text{if} \quad \begin{cases} \Delta m(k-1) > b_{10}(k) - b_{00}(k) \\ \Delta m(k-1) < b_{11}(k) - b_{01}(k) \end{cases} \tag{3.3}$$

$$\Delta m(k) = b_{10}(k) - b_{11}(k) \quad \text{if} \quad \begin{cases} \Delta m(k-1) > b_{10}(k) - b_{00}(k) \\ \Delta m(k-1) > b_{11}(k) - b_{01}(k) \end{cases} \tag{3.4}$$
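The four cases (3.1) to (3.4) follow from substituting (2) into (1): the first condition in each case states which branch survives into state 0, and the second which branch survives into state 1. The sketch below, with illustrative function names, implements the difference-metric update and a reference version that evaluates (1) in full. Ties are resolved toward state 1 here, which is an assumption; it does not change the value of the difference metric.

```python
def delta_update(dm_prev, b):
    """Difference-metric update following (3.1)-(3.4).

    dm_prev: delta m(k-1) = m0(k-1) - m1(k-1).
    b: dict mapping (j, i) to the branch metric b_ji(k).
    Returns delta m(k) and the surviving starting state for
    ending states 0 and 1.
    """
    from_0_into_0 = dm_prev < b[(1, 0)] - b[(0, 0)]
    from_0_into_1 = dm_prev < b[(1, 1)] - b[(0, 1)]
    if from_0_into_0 and from_0_into_1:        # case (3.1)
        return b[(0, 0)] - b[(0, 1)], (0, 0)
    if from_0_into_0:                          # case (3.2)
        return b[(0, 0)] - b[(1, 1)] + dm_prev, (0, 1)
    if from_0_into_1:                          # case (3.3)
        return b[(1, 0)] - b[(0, 1)] - dm_prev, (1, 0)
    return b[(1, 0)] - b[(1, 1)], (1, 1)       # case (3.4)


def delta_direct(m_prev, b):
    """Reference: evaluate (1) in full, then take the difference as in (2)."""
    m0 = min(m_prev[0] + b[(0, 0)], m_prev[1] + b[(1, 0)])
    m1 = min(m_prev[0] + b[(0, 1)], m_prev[1] + b[(1, 1)])
    return m0 - m1
```

Because only the difference is stored, the accumulated metrics never grow without bound, which is the overflow concern that motivates (2).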

Extending this discussion to multilevel schemes, the full-state trellis diagram for 4-PAM modulation with levels of -1, -1/3, +1/3, and +1 V, encoded with a duobinary scheme, is shown in Fig. 2, where each branch is labeled by a pair (*input* data, duobinary-coded data). For this modulation, although the full-state Viterbi algorithm works well, its circuit implementation is quite complex. RSSD lowers this complexity while maintaining almost the same performance as full-state Viterbi detection. For a two-state RSSD, the idea is to retain the two most probable states at each time and ignore the other states. According to the adjacency relation [13]<sup>1</sup>, these two states will always be two neighboring states. As a result, for the diagram in Fig. 2, the remaining states at each time can be (0,1), (2,1), or (2,3). As an example, depending on the level of the received sample, possible branch extensions initiating from the states (0,1) are shown in Fig. 3.
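As a small illustration of the signal being detected, the sketch below forms noiseless duobinary samples as the sum of the current and previous 4-PAM levels, y(k) = a(k) + a(k-1). The function name and the `initial_state` argument are assumptions made for this example only.

```python
from fractions import Fraction

# 4-PAM levels in volts for states 0..3.
LEVELS = [Fraction(-1), Fraction(-1, 3), Fraction(1, 3), Fraction(1)]


def duobinary_encode(symbols, initial_state=0):
    """Return the noiseless duobinary samples for 4-PAM symbols (0..3).

    initial_state is a hypothetical starting state for the example.
    """
    out, prev = [], initial_state
    for s in symbols:
        out.append(LEVELS[prev] + LEVELS[s])  # current level plus previous level
        prev = s
    return out


# Every distinct value that the sum of two 4-PAM levels can take.
all_levels = sorted({LEVELS[j] + LEVELS[i] for j in range(4) for i in range(4)})
samples = duobinary_encode([1, 3, 2])
```

The set `all_levels` contains seven values from -2 V to +2 V in steps of 2/3 V, which is why the duobinary 4-PAM signal is a seven-level signal.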

A few facts need to be clarified in Fig. 3. First, category pairs a–b and c–d each have three branches in common and can only be distinguished by the fourth branch. To do so, a threshold value can be set by averaging y(k) values of these noncommon branches. Second, other possible categories will not occur in duobinary coding [13]. Third, the next states in the categories a and d will always be (0,1) and (2,3), respectively, while in the categories b and c, next states will be (0,1) or (2,1) for b and (2,1) or (2,3) for c, depending on the Viterbi algorithm results. Fourth, the highest and the lowest quantization thresholds for this example are 0 and -4/3 V, respectively; these thresholds

<sup>1</sup>Note that a dicode sequence, rather than the duobinary sequence used here, was examined in [13].



Fig. 2. Full state trellis diagram for a 4-PAM signaling. Branch labels represent the pairs of *uncoded* and *encoded* signals.



Fig. 3. Typical possible survivors in duobinary 4-PAM RSSD starting from the states (0,1).

for the starting states (2,1) are 2/3 and -2/3 V and are 4/3 and 0 V for starting states (2,3) [18].

This suggests that by grouping odd and even states into two hyperstates, we can represent any of the categories in Fig. 3 by a trellis diagram, as shown in Fig. 1, with the difference that the branch metrics are a function of their originating states [13]. Following this idea, the full-state trellis diagram in Fig. 2 will be reduced to the two-hyperstate diagram in Fig. 4. As an example, any of the categories in Fig. 3 which are a subset of the above diagram are shown in the two-state trellis diagram in Fig. 5. Having this state reduction in place, we can proceed to the next stage, which is basically the same as two-state Viterbi detection.

Applying mean-square error criterion and denoting any starting state by j and ending state by i, the branch metrics will be equal to

$$b'_{ji}(k) = \left[y(k) - \frac{2(j+i-3)}{3}\right]^2, \quad j = 0, 1, 2, 3; \quad i = 0, 1, 2, 3. \tag{4}$$



Fig. 4. Two-hyperstate trellis diagram.



Fig. 5. Two-state presentation of the categories in Fig. 3.

By removing common terms which are independent of a particular state and applying a factor of 1/4, the branch metrics reduce to

$$b_{ji}(k) = \frac{3 - (j+i)}{3} \left[ y(k) + \frac{3 - (j+i)}{3} \right], \quad j = 0, 1, 2, 3; \quad i = 0, 1, 2, 3. \tag{5}$$
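Expanding the square in (4) gives $b'_{ji}(k) = y^2(k) + 4\,b_{ji}(k)$, so the two metrics differ only by the common term $y^2(k)$ and the factor of 4, and they rank the branches identically. The sketch below checks this numerically; the function names are illustrative.

```python
from fractions import Fraction


def metric_full(y, j, i):
    """Squared-error branch metric of (4) for the transition j -> i."""
    return (y - Fraction(2 * (j + i - 3), 3)) ** 2


def metric_reduced(y, j, i):
    """Simplified branch metric of (5) for the transition j -> i."""
    c = Fraction(3 - (j + i), 3)
    return c * (y + c)


y = Fraction(1, 5)  # an arbitrary received sample, in volts
pairs = [(j, i) for j in range(4) for i in range(4)]
# Both metrics select the same most likely transition sum j + i.
best_full = min(pairs, key=lambda p: metric_full(y, *p))
best_reduced = min(pairs, key=lambda p: metric_reduced(y, *p))
```

The reduced form needs no squaring of the received signal, only a scaled copy of y(k) plus a constant offset.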

Using the above equation, the branch metrics for the example categories of c and d are shown in Fig. 6. Complete possible branch extensions and their metrics starting from the adjacent states I (0,1), III (2,1), and V (2,3) are presented in Table I. By assuming each category of branch extensions in Table I as a full two-state trellis diagram as exemplified in Fig. 5 and applying (3.1)–(3.4) to the associated branch metrics, the number of extended branches at each category is reduced to two under the specific conditions. These conditions and their pertaining branch extensions for the categories c and d, as an example, are presented in Table II, while complete tables for all categories in Table I can be found in [18]. Note that in Table II, branch metrics in the category c can be either positive or negative depending on the value of y(k). Due to this fact, an extra threshold -1/3 for this category and other threshold levels 1, +1/3, and -1 for the other categories not shown in this table have been introduced to differentiate between the distinct signs



Fig. 6. Typical branch metrics for the example categories c and d.

TABLE I BRANCH EXTENSIONS AND THEIR METRICS



TABLE II BRANCH EXTENSION AND DIFFERENCE METRIC UPDATE OF STATE (0,1)

Present State = (0,1)

| Category | y(k) | Δm(k) | Condition | (Qu, Qd) | ((s/p)<sub>d</sub>, (s/p)<sub>u</sub>) | Branch Extension | State |
|---|---|---|---|---|---|---|---|
| d | $0 < y(k)$ | $\frac{1}{3}\left(y(k)+\frac{1}{3}\right)$ | $\Delta m(k-1) < -\frac{1}{3}\left(y(k)+\frac{1}{3}\right)$ | (0, 1) | (1, 0) | (diagram) | (2,3) |
| d | $0 < y(k)$ | $-\Delta m(k-1)$ | $-\frac{1}{3}\left(y(k)+\frac{1}{3}\right) < \Delta m(k-1) < \frac{1}{3}\left(-y(k)+\frac{1}{3}\right)$ | (0, 0) | (0, 0) | (diagram) | (2,3) |
| d | $0 < y(k)$ | $-\frac{1}{3}\left(-y(k)+\frac{1}{3}\right)$ | $\Delta m(k-1) > \frac{1}{3}\left(-y(k)+\frac{1}{3}\right)$ | (1, 0) | (0, 1) | (diagram) | (2,3) |
| c | $-\frac{1}{3} < y(k) < 0$ | $\frac{1}{3}\left(y(k)+\frac{1}{3}\right)$ | $\Delta m(k-1) < -\frac{1}{3}\left(y(k)+\frac{1}{3}\right)$ | (0, 1) | (1, 0) | (diagram) | (2,3) |
| c | $-\frac{1}{3} < y(k) < 0$ | $-\Delta m(k-1)$ | $-\frac{1}{3}\left(y(k)+\frac{1}{3}\right) < \Delta m(k-1) < \frac{1}{3}\left(y(k)+\frac{1}{3}\right)$ | (0, 0) | (0, 0) | (diagram) | (2,3) |
| c | $-\frac{1}{3} < y(k) < 0$ | $-\frac{1}{3}\left(y(k)+\frac{1}{3}\right)$ | $\Delta m(k-1) > \frac{1}{3}\left(y(k)+\frac{1}{3}\right)$ | (1, 0) | (0, 1) | (diagram) | (2,1) |
| c | $-\frac{2}{3} < y(k) < -\frac{1}{3}$ | $\frac{1}{3}\left(y(k)+\frac{1}{3}\right)$ | $\Delta m(k-1) < \frac{1}{3}\left(y(k)+\frac{1}{3}\right)$ | (0, 1) | (1, 0) | (diagram) | (2,3) |
| c | $-\frac{2}{3} < y(k) < -\frac{1}{3}$ | $\Delta m(k-1)$ | $\frac{1}{3}\left(y(k)+\frac{1}{3}\right) < \Delta m(k-1) < -\frac{1}{3}\left(y(k)+\frac{1}{3}\right)$ | (0, 0) | (1, 1) | (diagram) | (2,1) |
| c | $-\frac{2}{3} < y(k) < -\frac{1}{3}$ | $-\frac{1}{3}\left(y(k)+\frac{1}{3}\right)$ | $\Delta m(k-1) > -\frac{1}{3}\left(y(k)+\frac{1}{3}\right)$ | (1, 0) | (0, 1) | (diagram) | (2,1) |





Fig. 7. Front-end quantizer circuit.

of these metrics. Some parameters such as Qu, Qd, $(s/p)_u$, and $(s/p)_d$ in Table II will be explained in later sections. As seen in Table II, with the knowledge of the present state and the level of the input signal, threshold levels for the final two comparators can be set and the difference metrics can be updated as the result of this final comparison. Finally, received data can be identified by keeping track of the surviving branch transitions in a path memory. In the next section, the circuit implementation for this type of detection will be elaborated.

## III. ANALOG RSSD CIRCUIT DESIGN

#### A. General Design

The information embedded in the complete version of Table II and also the other tables associated with the starting states (2,1) and (2,3) [18] give the main information for circuit implementation of the 4-level reduced-state Viterbi detector. Two comparator stages at the front and back end of the circuit, corresponding to the conditions of, respectively, columns 2 and 4 of Table II, as well as the offset combiners in the middle, form the analog core of this circuit. This analog core is supported by digital circuitry which sets the dc offset value and sign for the input signals as a function of present state and input level. This digital circuit also controls the path memory and defines the next state based on the outputs from the back-end comparators and the existing state.

The front-end circuit is composed of nine comparators which quantize the sampled input signal with steps of 1/3 V starting from +4/3 V and ending at -4/3 V (Fig. 7). Ten outputs, p1-10, of these comparators, along with the current state information, are input to the digital part to select the desired offset and polarity for y(k) and y(j).
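A behavioral sketch of the front-end quantizer is given below. The nine thresholds follow the text; how the ten outputs p1-10 are encoded is not specified there, so the ten-region index returned here is an assumption made for illustration.

```python
from fractions import Fraction

# Nine comparator thresholds from +4/3 V down to -4/3 V in 1/3-V steps.
THRESHOLDS = [Fraction(4 - n, 3) for n in range(9)]


def quantize(y):
    """Return the thermometer code and the region index of sample y.

    The thermometer code has one bit per comparator (1 if y exceeds that
    threshold). Region 0 lies above +4/3 V and region 9 below -4/3 V, so
    nine comparators separate ten regions.
    """
    thermometer = [1 if y > t else 0 for t in THRESHOLDS]
    region = 9 - sum(thermometer)
    return thermometer, region


code, region = quantize(Fraction(1, 2))  # a sample between 1/3 V and 2/3 V
```

Nine thresholds divide the input range into ten regions, which is one plausible reading of the ten outputs mentioned above.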

As shown in Fig. 8, two combinations of y(k), each with appropriate polarity and offset, form threshold levels for the two comparators at the back end. Difference metrics will be updated and surviving branches will be identified upon the termination of this final comparison. In this figure, a few digital signals control the signal offsets and their polarity. As implied from the third and fourth columns of Table II and the other complementary tables [18], there are only three distinct absolute offset values; these are 5/3 V, 1 V, and 1/3 V, which are selectable by the digital signals C53, C10, and C13, respectively. Difference metrics which are extracted from one of the upper or lower threshold levels are selected and stored by the multiplexer sample-and-hold (Mux-S/H) for the succeeding comparison based on the following three possible conditions for the comparator outputs Qu and Qd. In the case of Qu = 1 and Qd = 0, the upper threshold voltage will be chosen, whereas in the case Qu = 0 and Qd = 1, the lower threshold level will be adopted. For the last possible case, when Qu = Qd = 0, no replacement for the former difference metric will take place and the only possible variation is its polarity which, indeed, will rely on the conditions of the current state and the quantized level.

Fig. 8. Analog core of the processing circuit.

Although the structure in Fig. 8 is complete and applicable, it suffers from the existence of two S/Hs in the signal path, which deteriorates the update speed. To improve speed performance, we notice that in Table II,  $\Delta m(k)$  is always a function of y(k) or  $\Delta m(k-1)$ , depending on the output of the two final comparators, which also implies that  $\Delta m(k-1)$  is a function of y(j), j < k [12]. This suggests that the circuit in Fig. 8 can be upgraded to the circuit shown in Fig. 9. Two ping-pong S/Hs at the input will store y(k) and y(j). The conditions of Qu and Qd, as addressed before, will rule on whether the position of the input sampling switch in this structure will be toggled or remain unchanged. The new configuration operates at higher frequencies due to removal of one S/H from the signal path.

Realization of the circuit in Fig. 9 can be simplified if all additions and subtractions are performed in current mode, as shown in Fig. 10. A fully differential structure ensures significant suppression of common-mode noise and interference in the circuit. The select switches pick one of the distinct offset levels of 1/3, 1, and 5/3 V controlled by the digital input controls. Input transconductors (V/I) convert input signals and selected offsets to current before they are combined via pull-up resistors. Also, polarity switches simply interchange the input and output connections based on the control inputs to change the polarity of alternative signals.

Since in Fig. 10 arithmetic operations are in current mode (and also to reuse circuit blocks), the quantizing structure in Fig. 7 is modified to the one shown in Fig. 11. Using this configuration, the transconductors V/I-9 employed in Fig. 10 can also be reused for quantization with extra output currents.



Fig. 9. Improved structure for the analog core.



Fig. 10. Practical structure for circuit realization of Fig. 9.

Unfortunately, the structure in Fig. 9 still suffers from significant delays within one sample period. Operations such as sampling, quantization, digital-circuit delay, voltage-to-current conversion, and the last-stage comparison create a delay of more than 8 ns, which is too long to achieve the desired speed. These delays can be mitigated by splitting these tasks across different cycles in a pipelined structure, which will be addressed in the next section.
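A rough timing budget illustrates the problem, assuming the 1-Gb/s target rate and the 2 bits per symbol carried by 4-PAM:

$$R_s = \frac{R_b}{\log_2 M} = \frac{1\ \text{Gb/s}}{2} = 500\ \text{MS/s}, \qquad T = \frac{1}{R_s} = 2\ \text{ns}, \qquad \frac{8\ \text{ns}}{T} = 4.$$

A delay of more than 8 ns therefore spans more than four sample periods, so these operations cannot all complete within a single period at the target rate.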

#### B. Pipelining Structure

Due to the long processing time needed for complete computation during one sample period, the entire operation for one sample is divided into four consecutive cycles, which start with sampling and continue with quantization, digital assessment, and finally back-end comparison and difference metric update.

As depicted in Fig. 12, five S/Hs store five samples of the incoming signal. These samples are saved in the capacitors through the transistor switches controlled by  $S_{1-5}$  (1) before being converted to current by the corresponding transconductors. These currents which are proportional to the samples at each S/H are steered to different stages in the pipelining structure for subsequent analysis. The switches controlled by  $S_{1-5}$  (2) deliver the desired current to the quantizer, while the other switches,  $S_{1-5}$  (4) and  $S_{1-5}$  (5), take two other currents for the difference metric update process. Upon the completion



Fig. 11. Current-mode realization of the front-end quantizer.



Fig. 12. Circuit structure for pipelining.

of this process on each sample, that sample will be replaced by a new input sample at the same S/H. This implies that one S/H and one transconductor are devoted to each sample for a complete process.

To elaborate on the preceding discussion, suppose a new round of the process is begun by assuming S/H(1) samples y(m) at time 0. Denoting each clock period by T, sample y(m) should have been stored and settled by S/H(1) before time T. At the start of the second period, T, S/H(2) will start sampling y(m + 1) at the same time as y(m), which is being held in S/H(1), is under the quantization process. At time




Fig. 13. Selective switches and connections in the pipelining configuration of the circuit.

T+1, the quantizer outputs produced by y(m) will be used as part of the inputs for digital assessment. At the same time, y(m+2) is sampled by S/H(3) and y(m+1), being held in S/H(2), is undergoing the quantization process. At time T+2, as presented in Fig. 13, this rotation will continue by saving y(m+3) in S/H(4) and quantizing y(m+2) while y(m+1) is in a waiting state for its digital assessment. Meanwhile, sample y(m-1), which we assume had already been stored in S/H(5), and y(m), stored in S/H(1), will jointly proceed to the final comparison and the difference metric update operation. The roles of y(m) and y(m-1) in Fig. 13 are analogous to those of y(k) and y(j) in Fig. 9, respectively, and as discussed earlier, the update of the sample-and-holds containing y(k) and y(j) depends on the results of Qu and Qd. This means that for the period starting at T+3, if either Qu or Qd is 1, the next sample, y(m+4), will be stored in S/H(5) [Fig. 14(a)]. Otherwise, if Qu = Qd = 0, S/H(5) will retain its sample and y(m+4) will be stored in S/H(1) [Fig. 14(b)].

As shown in Fig. 14, there is one selective switch [SW (1-5)] assigned to each of the five S/Hs which, based on the existing conditions, controls the flow of sampled signals to the different processing stages. The digital controller for each switch is made up of seven D-flip-flops and two multiplexers and is shown in more detail in Fig. 15. Each switching controller has four outputs. At each period, only one of the outputs will be active and the others will remain inactive. This is in compliance with the fact that samples should be in different positions during the detection process. In this circuit, the flow of active state from $S_n(1)$ to $S_n(4)$ is unconditional, while the state transitions from $S_n(4)$ to $S_n(5)$ and $S_n(5)$ to $S_n(1)$ are conditional and depend on Qu and Qd. For Qu or Qd = 1, there will be a routine flow of states from $S_n(4)$ to $S_n(5)$ and $S_n(5)$ to $S_n(1)$, whereas for the case Qu = Qd = 0, the state in $S_n(4)$ will be moved to $S_n(1)$ rather than $S_n(5)$, and $S_n(5)$ will preserve its own state.
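The rotation described above can be sketched as a small state machine. The position numbering used here (1 = sampling, 2 = quantization, 3 = digital assessment, 4 = y(k), 5 = y(j)) is an assumption inferred from the switch labels in the text, and the function name is illustrative.

```python
def advance(positions, qu, qd):
    """Advance every S/H by one clock period.

    positions: dict mapping S/H number (1..5) to its pipeline position,
    with 1 = sampling, 2 = quantization, 3 = digital assessment,
    4 = y(k), and 5 = y(j).
    qu, qd: outputs of the two back-end comparators.
    """
    updated = qu == 1 or qd == 1
    nxt = {}
    for sh, pos in positions.items():
        if pos in (1, 2, 3):
            nxt[sh] = pos + 1               # unconditional flow
        elif pos == 4:
            nxt[sh] = 5 if updated else 1   # y(k) becomes y(j), or is resampled
        else:
            nxt[sh] = 1 if updated else 5   # y(j) is resampled, or is retained
    return nxt


# Situation at time T+2 in the text: S/H(4) samples, S/H(3) is quantized,
# S/H(2) awaits digital assessment, S/H(1) holds y(k), and S/H(5) holds y(j).
start = {4: 1, 3: 2, 2: 3, 1: 4, 5: 5}
case_a = advance(start, 1, 0)   # Qu or Qd = 1
case_b = advance(start, 0, 0)   # Qu = Qd = 0
```

In both cases exactly one S/H occupies each position after the update, which matches the requirement that the samples be in different positions during detection.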

In this pipelining configuration, there is still one unresolved problem. Recall from Table II that the information about the selection of dc offset and signal polarity which are generated by the digital assessment circuitry depends on the knowledge of the present state, which is based on the acquisition of Qu and Qd information from the last period. However, in the pipelining



Fig. 14. Typical rotation of S/Hs when (a) Qu or Qd = 1 and (b) Qu = Qd = 0.



Fig. 15. Digital switching controller.

structure, the digital assessment and the difference metric update execute simultaneously, which means that the comparator outputs cannot be known until the end of the cycle. To avoid most of this delay, the digital assessment block is triplicated and each block pre-evaluates its outputs for one of the three possible cases of (Qu, Qd) = (1,0), (0,1), and (0,0). At the end of the cycle, once the final comparator outputs are established, one of these three sets of results is chosen as the correct output set.
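This triplication is a form of speculative evaluation: every possible outcome is computed in advance and the correct one is selected late. A minimal sketch follows, in which `assess` is a hypothetical stand-in for the digital assessment logic, whose actual inputs and outputs are not detailed in the text.

```python
# The three possible (Qu, Qd) outcomes of the back-end comparators.
CASES = [(1, 0), (0, 1), (0, 0)]


def speculative_assess(assess, state, region):
    """Evaluate the assessment logic once per possible comparator outcome.

    assess: hypothetical function (state, region, qu, qd) -> control settings.
    Returns a table from which the correct entry is picked once the
    comparator outputs have settled.
    """
    return {case: assess(state, region, *case) for case in CASES}


def resolve(table, qu, qd):
    """Select the precomputed result matching the actual comparator outputs."""
    return table[(qu, qd)]


def example_assess(state, region, qu, qd):
    """Placeholder logic for the example: echo the inputs."""
    return (state, region, qu, qd)


table = speculative_assess(example_assess, "I", 3)
chosen = resolve(table, 0, 1)
```

The cost is three copies of the logic; the benefit is that only a final selection remains on the critical path after Qu and Qd become valid.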




Fig. 16. Path memory configuration. L: latch. p: parallel input. S/P = 1 => serial loading. S/P = 0 => parallel loading.



Fig. 17. Transconductor circuit.

#### C. Path Memory

The final step in detecting the received data is to keep track of the past states in the path memory. Based on the information from the present state and the branch extension protocols in Table II, the input digital state information (I, V) will be propagated through the path memory using serial/parallel control signals (S/P), and the final detected bits will be recovered at the output. Simulation shows that a memory depth of about 30 bits is sufficient for the path memory to converge, that is, for the upper and lower pairs of the path memory in Fig. 16 to recover the same data (d1 = d2).
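The serial/parallel loading can be pictured with a conventional register-exchange sketch. The wiring of Fig. 16 is not visible in the text, so the arrangement below, in which a row either keeps its own history (serial loading, S/P = 1) or copies the other row's history (parallel loading, S/P = 0), is an assumption and not a description of the fabricated circuit.

```python
def path_memory_step(upper, lower, sp_u, sp_d, sym_u, sym_d):
    """One update of a two-row register-exchange path memory.

    upper, lower: lists holding the survivor histories of the two rows.
    sp_u, sp_d: serial/parallel controls; 1 keeps the row's own history
                (serial loading), 0 copies the other row (parallel loading).
    sym_u, sym_d: newest symbols appended to each row.
    The oldest entry of each row is shifted out to keep the depth fixed.
    """
    src_u = upper if sp_u == 1 else lower
    src_d = lower if sp_d == 1 else upper
    return src_u[1:] + [sym_u], src_d[1:] + [sym_d]


# With (s/p)_u = 0 and (s/p)_d = 1, both survivors descend from the lower row.
new_upper, new_lower = path_memory_step([0, 1, 2], [3, 2, 1], 0, 1, 2, 3)
```

After such a merge the two rows agree everywhere except in the newest symbol, which is the mechanism by which the oldest entries of both rows converge to the same data.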

## IV. BUILDING BLOCKS

#### A. Voltage-to-Current Converter (V/I)

Voltage-to-current converters (transconductors) play a critical role in this design, as all mathematical operations in this design are in current mode. The transconductor with p-channel inputs [14] depicted in Fig. 17 has a transconductance gain of

$$\frac{i_o}{v_{id}} \approx \frac{1}{R} \tag{6}$$

and can accommodate inputs at a low bias level while remaining highly linear if R is kept constant. Wide poly resistors laid out at a close distance from each other can provide good linearity and matching with the other transconductors. The main advantage of this transconductor configuration is its capability to support multiple outputs to be used in the circuits demonstrated in Figs. 10 and 11.



Fig. 18. Front-end preamplifiers of the comparator and their connections.

#### B. Comparators

Nine comparators at the front end and two comparators at the back end are the key parts in this detector. The input offset of CMOS comparators can be significant in a precise design, so offset cancellation is unavoidable. The comparator employed in this design incorporates two cascaded preamplifiers (Fig. 18) which are coupled to the input signal by $C_1$ and $C_2$. Offset cancellation and bias adjustment are performed by the MOS switches, which short the output to the input and connect the other side of the coupling capacitors to the reference voltages [15].

#### C. Input Quantizing Circuit

Fig. 19 is the circuit realization of V/I-9 in Figs. 10 and 11. Nine differential outputs enable the sampled signal in quantization position to be compared with nine reference levels in current mode, as shown in Fig. 11. The nine reference levels are generated using differential resistive ladders [17] and five two-differential-output transconductors (V/I-2), four of which each introduce two symmetric levels, (+4/3, -4/3), (+1, -1), (+2/3, -2/3), and (+1/3, -1/3), just by exchanging one of the output connections. The last V/I-2 presents the 0-V level (refer to Figs. 11 and 20). These reference levels, as well as the input signal level, have been downscaled by the ratio of 3/10 in practice because of circuit swing limitations. Once in quantizing mode, output transistors controlled by S2 will be turned on for the process. In digital assessment mode, none of the output transistors are on, because during this cycle only digital operations are carried out, based on the previous quantization results. In y(k) and y(j) positions, transistors controlled by S4 and S5, respectively, will be turned on for the operation depicted in Fig. 10. Although the sampling mode involves no computation that would require the output transistors of the corresponding V/I-9 to be on, the y(j) position is uncertain until the very end of the previous cycle, as explained in Fig. 14; S1 therefore keeps the output transistors of the V/I-9 in sampling mode on for a possible switch to y(j) if both Qu and Qd are zero. To reduce power consumption, only the transistors engaged in the computational operations (quantization and difference metric update) will be on in that particular cycle and the rest will be kept off. The latter transistors will be turned on slightly before any operational cycle to avoid any delay caused by activating an off transistor.
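The effect of the 3/10 downscaling on the nine reference levels can be tabulated with a short script. Only the scaling ratio and the nominal levels come from the text; the variable names are illustrative.

```python
from fractions import Fraction

SCALE = Fraction(3, 10)  # downscaling ratio quoted in the text

# Nominal reference levels from +4/3 V down to -4/3 V in 1/3-V steps.
nominal = [Fraction(n, 3) for n in range(4, -5, -1)]
# Levels after downscaling, as applied in the circuit.
scaled = [SCALE * v for v in nominal]

for v_nom, v_chip in zip(nominal, scaled):
    print(f"{str(v_nom):>5} V -> {float(v_chip):+.1f} V")
```

With this ratio the outermost references of +4/3 V and -4/3 V become +0.4 V and -0.4 V, and adjacent references are spaced 0.1 V apart.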

In all modes except quantization, all transistors controlled by S2 are off and hence, the *enable* signal disconnects the gate of these transistors to reduce capacitive load at the gates of active transistors. In addition, as illustrated in Fig. 20, switch transistors with their gates always grounded are also included in the V/I-2 circuit for matching purposes.

Fig. 19. Nine-differential-output transconductor (V/I-9).

Fig. 20. Reference generating circuit. (a) Differential ladder resistors. (b) Two-differential-output transconductor (V/I-2).

# V. EXPERIMENTAL RESULTS

The RSSD Viterbi detector was fabricated in a 0.25- $\mu$ m CMOS process. Fig. 21 shows a photograph of the chip. The active area is measured to be 0.78 mm<sup>2</sup>, of which 75% is occupied by the analog portion. Two differential pairs for the input clock and input signal, along with five pairs of differential outputs, form the major I/O pins. The output signals comprise four digital signals for path memory data propagation control and one output clock for synchronization purposes.

To focus on the analog core of the circuit, the path memory part was not included in the layout. However, the path memory consists of approximately 120 D-flip-flops operating at 500 MHz, which would increase the power consumption by 25% and increase the area by 10%.

To test the chip, an external high-speed D/A was used to generate a 7-level 4-PAM duobinary differential input signal. The three digital inputs to this D/A were generated from a pattern generator. A controlled amount of noise was added to this input signal using a noise generator and combiners. Due to speed limitations of the available equipment such as the logic analyzer and data generator, measurements were carried out up to 100 MS/s (200 Mb/s) and the results are shown in Fig. 22. These results indicate a close agreement between the experimental and simulation results. Deviation of the experimental and simulated results from the ideal case at low signal-to-noise ratio (SNR) is due to model inaccuracies. A summary of the chip measured results and specifications is shown in Table III.

Fig. 21. Chip photograph.

Fig. 22. Measured bit-error rate performance.

TABLE III
PERFORMANCE SUMMARY

| Chip              | Analog RSSD                                                                                             |
|-------------------|---------------------------------------------------------------------------------------------------------|
| Modulation        | 4-PAM                                                                                                   |
| Coding            | (1+D) partial response                                                                                  |
| Symbol rate       | 500 MS/s (1 Gb/s), simulation<br>100 MS/s (200 Mb/s), experimental, due to equipment limitations        |
| Power consumption | 112 mW at 1 Gb/s (simulation)<br>55 mW at 200 Mb/s (experimental)                                       |
| Power supply      | 2.5 V                                                                                                   |
| Process           | 0.25-μm CMOS                                                                                            |
| Active area       | 0.78 mm<sup>2</sup>                                                                                     |
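To illustrate why the test stimulus has seven levels and needs three D/A input bits, the following sketch applies the (1+D) response to a 4-PAM sequence. The symbol alphabet {-3, -1, +1, +3}, the initial symbol, and the level-to-code mapping are assumptions chosen for illustration; the actual pattern-generator mapping used in the measurements is not specified here.

```python
# Assumed 4-PAM alphabet (not taken from the measurement setup).
PAM4 = (-3, -1, 1, 3)


def duobinary(symbols, initial=PAM4[0]):
    """Apply the (1 + D) response: y(k) = a(k) + a(k-1)."""
    out = []
    prev = initial
    for a in symbols:
        out.append(a + prev)
        prev = a
    return out


def possible_levels():
    """Enumerate every value y(k) can take for the assumed alphabet."""
    return sorted({a + b for a in PAM4 for b in PAM4})


def dac_code(level):
    """Hypothetical mapping of a level in {-6, ..., 6} to a code 0..6."""
    return (level + 6) // 2


symbols = [3, 3, -3, 1]
levels = duobinary(symbols, initial=-3)
print(levels)
print(possible_levels())
print([dac_code(v) for v in levels])
```

Under these assumptions the duobinary output takes the seven values -6, -4, ..., +6, so seven distinct codes are needed and a three-bit D/A input is sufficient.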

# VI. CONCLUSION

Analog integrated Viterbi detectors have already demonstrated their ability to operate at high speed while consuming low power. With an ever-increasing demand for higher data rates and the limitations of existing channels, multilevel schemes have drawn attention for their lower bandwidth requirement. In this paper, a complete design procedure for a 500-MS/s (1-Gb/s) analog Viterbi detector for 4-PAM duobinary partial-response signaling has been elaborated, and experimental results based on this design have been demonstrated. Due to the limitations of the testing equipment, testing was conducted at 100 MS/s (200 Mb/s), while simulations indicate that the detector should operate at 500 MS/s. The power consumption of the analog decoder was measured to be 55 mW from a 2.5-V supply. This design approach can also be extended to other partial-response signaling schemes such as dicode and class-IV systems, where a high degree of detection reliability and low power consumption are of concern.
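The relation between the quoted symbol rates and bit rates follows from each 4-PAM symbol carrying $log_2(4) = 2$ bits. The sketch below checks this and also derives an energy-per-bit figure from the quoted power numbers; the energy-per-bit values are a derived illustration, not results reported in this paper, and the function names are hypothetical.

```python
import math


def bit_rate(symbol_rate_hz, levels=4):
    """Bit rate of M-PAM: symbol rate times log2(M) bits per symbol."""
    return symbol_rate_hz * math.log2(levels)


def energy_per_bit(power_w, bit_rate_bps):
    """Derived figure: average energy spent per detected bit, in joules."""
    return power_w / bit_rate_bps


sim_rate = bit_rate(500e6)   # simulated operating point
exp_rate = bit_rate(100e6)   # measured operating point
print(sim_rate, exp_rate)
print(energy_per_bit(0.112, sim_rate))  # 112 mW at 1 Gb/s (simulation)
print(energy_per_bit(0.055, exp_rate))  # 55 mW at 200 Mb/s (experimental)
```

The two operating points are not directly comparable, since the measured figure is taken at one fifth of the simulated rate.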

## REFERENCES

- [1] P. Kabal and S. Pasupathy, "Partial-response signaling," *IEEE Trans. Commun.*, vol. COM-23, pp. 921–934, Sept. 1975.
- [2] G. D. Forney, Jr., "Maximum-likelihood sequence estimation of digital sequences in the presence of intersymbol interference," *IEEE Trans. Inform. Theory*, vol. IT-18, pp. 363–378, May 1972.
- [3] \_\_\_\_\_, "The Viterbi algorithm," *Proc. IEEE*, vol. 61, pp. 268–278, Mar. 1973.
- [4] S. Olcer, "Reduced-state sequence detection of multilevel partial-response signals," *IEEE Trans. Commun.*, vol. 40, pp. 3–6, Jan. 1992.
- [5] A. Duel-Hallen and C. Heegard, "Delayed decision-feedback sequence estimation," *IEEE Trans. Commun.*, vol. 37, pp. 428–436, May 1989.
- [6] P. R. Chevillat and E. Eleftheriou, "Decoding of trellis-encoded signals in the presence of intersymbol interference and noise," *IEEE Trans. Commun.*, vol. 37, pp. 669–676, Jul. 1989.
- [7] F. L. Vermuelen and M. E. Hellman, "Reduced-state Viterbi decoding for channels with intersymbol interference," in *Proc. IEEE Int. Conf. Communications*, 1974, pp. 37.B.1–37.B.4.
- [8] M. V. Eyuboglu and S. U. Qureshi, "Reduced-state sequence estimation for coded modulation on intersymbol interference channels," *IEEE J. Select. Areas Commun.*, vol. 7, pp. 989–995, Aug. 1989.

- [9] G. Cherubini, S. Olcer, and G. Ungerboeck, "A quaternary partial-response class-IV transceiver for 125-Mbit/s data transmission over unshielded twisted-pair cables: Principles of operation and VLSI realization," *IEEE J. Select. Areas Commun.*, vol. 13, pp. 1656–1669, Dec. 1995.
- [10] R. Farjad-Rad, C. K. Yang, M. Horowitz, and T. Lee, "A 0.4-µm CMOS 10-Gb/s 4-PAM pre-emphasis serial link transmitter," *IEEE J. Solid-State Circuits*, vol. 34, pp. 580–585, May 1999.
- [11] J. L. Zerbe, P. S. Chau, C. W. Werner, T. P. Thrush, H. J. Liaw, B. W. Garlepp, and K. S. Donelly, "1.6-Gb/s/pin 4-PAM signaling and circuits for a multidrop bus," *IEEE J. Solid-State Circuits*, vol. 36, pp. 752–760, May 2001.
- [12] M. H. Shakiba, "Analog Viterbi detection for partial-response signaling," Ph.D. dissertation, Univ. Toronto, Toronto, Canada, 1997.
- [13] S. Olcer and G. Ungerboeck, "Difference-metric Viterbi decoding of multilevel class-IV partial-response signals," *IEEE Trans. Commun.*, vol. 42, no. 2, pp. 1558–1570, Feb./Mar./Apr. 1994.
- [14] D. A. Johns and K. Martin, *Analog Integrated Circuit Design*. New York: Wiley, 1997.
- [15] I. Mehr and D. Dalton, "A 500-MSample/s 6-bit Nyquist-rate ADC for disk-drive read-channel applications," *IEEE J. Solid-State Circuits*, vol. 34, pp. 912–920, July 1999.
- [16] M. J. Ferguson, "Optimal reception for binary partial-response channels," *Bell Syst. Tech. J.*, vol. 51, no. 2, pp. 493–505, Feb. 1972.
- [17] Y. Tamba and K. Yamakido, "A CMOS 6-b 500-MSample/s ADC for a hard disk-drive read channel," in *IEEE Int. Solid-State Circuits Conf.* 1999 Dig. Tech. Papers (ISSCC), Feb. 1999, pp. 324–325.
- [18] B. Zand, "High-speed optical wireless communications using reduced-state sequence detection," Ph.D. dissertation, Univ. Toronto, Toronto, Canada, 2002.



**Bahram Zand** received the B.Sc. and M.Sc. degrees from the Sharif University of Technology, Tehran, Iran, in 1985 and 1986, respectively, and the Ph.D. degree from the University of Toronto, Toronto, ON, Canada, in 2002, all in electrical engineering.

He is currently with Snowbush Microelectronics, Toronto, working on high-speed data communication systems. During 1987–1992, he was a Researcher and Lecturer in the Department of Electrical and Computer Engineering, Sharif University of Technology. From 1992 to 1996, he was a Design Manager with Maharan Engineering Company, Iran. During 2001–2002, he was an Analog Design Engineer with Insilicon Canada. His research has been focused on high-speed free-space optical communication systems and the design of analog integrated circuits for applications in digital communications.



**David A. Johns** (S'81–M'89–SM'94–F'01) received the B.A.Sc., M.A.Sc., and Ph.D. degrees from the University of Toronto, Toronto, ON, Canada, in 1980, 1983 and 1989, respectively.

In 1988, he joined the University of Toronto where he is currently a Full Professor. He has ongoing research programs in the general area of analog integrated circuits with particular emphasis on circuits and systems for digital communications. His research work has resulted in more than 40 publications. He is coauthor of a textbook entitled *Analog Integrated Circuit Design* (New York: Wiley, 1997) and has given numerous industrial short courses. Together with academic experience, he has four years of semiconductor industrial experience and is co-founder of a microelectronics company called Snowbush.

Dr. Johns received the 1999 IEEE Darlington Award. He served as an associate editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—PART II from 1993 to 1995 and for PART I from 1995 to 1997.