# High-Speed Optical Wireless Communications using Reduced-State Sequence Detection

by Bahram Zand

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy, Graduate Department of Electrical and Computer Engineering, University of Toronto

© Copyright by Bahram Zand 2002

#### **Abstract**

Driven by the need for high-speed connectivity in short distances and the costs and difficulties of deploying cables, this thesis discusses the design of short-distance optical wireless data communications with the target speed of 1 Gb/s.
In addition to exploring the effect of individual components in this link, two blocks on the receiver side, the front-end transimpedance amplifier and the back-end detector, were designed and implemented; their performance is summarized below.

A transimpedance amplifier with differential dc-coupled photocurrent sensing was integrated in a standard 0.35 µm CMOS process. It achieves a transimpedance gain of 33 kΩ and a bandwidth of 255 MHz with a 2 pF photodiode capacitance. The design exhibits a 40 dB power supply rejection ratio and an average input noise of $6.8\ \mathrm{pA}/\sqrt{\mathrm{Hz}}$. Power dissipation is 30 mW from a 3 V supply. An active dc photocurrent rejection circuit was also included to prevent the circuit output from saturating under intense background light.

A 1 Gb/s analog Viterbi detector based on a 4-PAM duobinary scheme was designed in a 0.25 µm CMOS process. This chip is the first integrated implementation of an *analog* reduced-state sequence detector. Pipelining and parallel processing have been incorporated in this design for high-speed operation. Due to test equipment limitations, experimental results are given for 200 Mb/s operation, while simulation results indicate a speed of 1 Gb/s. Power dissipation is 55 mW from a 2.5 V supply, and the chip occupies 0.78 mm<sup>2</sup> of area. Although a duobinary scheme has been the focus of this work for its application in optical links, this design can be readily modified or extended to other PRS schemes such as dicode and PR4.

## Acknowledgment

I wish to express my deepest gratitude to Prof. David A. Johns for his invaluable assistance and inspirational supervision. I am indebted to him for all the academic, financial and moral support he provided for me. I would like to thank the members of my PhD thesis committee, Profs. S. Pasupathy, W.T. Ng, F. Najm, J. Long and G. Cauwenberghs.
My thanks are also extended to the administration staff in the Department of Electrical and Computer Engineering, and in particular to Sarah Cherian and Judith Levene for their compassionate support, and to Jennifer Rodrigues for her help at any time.

I am always honored to have the opportunity to learn from so many sincere friends. My special thanks go to Dr. Mohammad Hossein Shakiba, Profs. Khoman Phang and Anthony Carusone, Amir Hadji-Abdolhamid, Steve Hranilovic, Shahriar Mirabbasi, Takis Zourntos, Tooraj Esmailian and Mehrdad Ramezani. I also learnt from the fruitful conversations and debates with Sebastian Magierowski, Raj Mahadevan, Cameron Lacy, Kasra Ardalan, Sotoudeh Hamedi-Hagh and my other colleagues in room EA104.

To my parents, for their continual support and devotion and for their encouragement that made this journey possible for me. My heart is always with you.

To my brother, Behnam, who cheerfully took over all my responsibilities when I left home and gave me the peace of mind to accomplish this work. May God help me to deserve your loving dedication.

To my wife Fariba and my daughter Neekoo, for their relentless inspiration throughout this work. How could I even imagine this achievement without your perpetual support? Thank you so much for your patience and love.
# **Table of Contents**

- CHAPTER 1: Introduction
- CHAPTER 2: Background
  - 2.1 Partial Response Signalling and Detection Techniques
    - 2.1.1 Nyquist System
    - 2.1.2 Partial Response Signalling
    - 2.1.3 Detection Techniques in PRS
    - 2.1.4 Analog Viterbi Detectors
  - 2.2 Optical Wireless Communications
    - 2.2.2 Modulation Schemes
    - 2.2.3 Noise
    - 2.2.4 Preamplifiers
  - 2.3 Application of Modulation and Detection Techniques in Optical Wireless Communications
  - 2.4 Summary
  - 2.5 References
- CHAPTER 3: Fully Differential DC-Coupled Transimpedance Amplifier
  - 3.1 Circuit Implementation
    - 3.1.1 The Photodiode Biasing Circuit
    - 3.1.2 The Differential Amplifier
    - 3.1.3 The DC Photocurrent Rejection Circuit
  - 3.2 Output Buffer
  - 3.3 Noise Performance
  - 3.4 Experimental Results
  - 3.5 Summary
  - 3.6 References
- CHAPTER 4: Analog Reduced-State Sequence Detection: System and Circuit Design
  - 4.1 4-PAM Signalling
  - 4.2 Reduced-State Viterbi Detector
    - 4.2.1 System Approach
    - 4.2.2 Performance Evaluation
    - 4.2.3 RSSD for other PRS
  - 4.3 Analog RSSD
    - 4.3.1 Circuit Design
    - 4.3.2 Pipelining Structure
    - 4.3.3 Path Memory
    - 4.3.4 Comparator Offset Effects
  - 4.4 Building Blocks
    - 4.4.1 Voltage-to-Current Converter
    - 4.4.2 Comparators
    - 4.4.3 Input Quantizing Circuit
    - 4.4.4 Input Sample and Holds
    - 4.4.5 Offset Generators
    - 4.4.6 Clock Generator
  - 4.5 Summary
  - 4.6 References
- CHAPTER 5: Analog Reduced-State Sequence Detection: Experimental Results
  - 5.1 Layout
    - 5.1.1 Clock Buffers
    - 5.1.2 Digital I/O Line Translators
  - 5.2 Test Set-Up
  - 5.3 Experimental Results
  - 5.4 Summary
  - 5.5 References
- CHAPTER 6: Conclusions and Future Directions
  - 6.1 Summary and Conclusions
  - 6.2 Future Directions
- Study of Adjacency in Branch

### **List of Tables**

- Table 2.1: Specification for some optical wireless and RF systems
- Table 2.2: Optical power and bandwidth requirements for some intensity modulation schemes
- Table 3.1: Performance Summary
- Table 3.2: Comparison with previous work
- Table 4.1: Branch extension and their metrics
- Table 4.2: Branch extension and difference metric update of state I
- Table 4.3: Branch extension and difference metric update of state III
- Table 4.4: Branch extension and difference metric update of state V
- Table 4.5: Branch extension and difference metric update of state I in dicode PRS
- Table 5.1: Performance Summary
- Table A.1: Survived metrics for the adjacent states $b_0(n-1)$ and $b_1(n-1)$

### **List of Figures**

- Fig. 2.1: The minimum-bandwidth Nyquist systems with normalized $T_s(1/f_s)=1$ sec, a: Impulse response, b: Frequency response
- Fig. 2.2: The duobinary PRS system with normalized $T_s(1/f_s)=1$ sec, a: Impulse response, b: Frequency response
- Fig. 2.3: A partial-response encoder model
- Fig. 2.4: Symbol-by-symbol detection using DFE
- Fig. 2.5: Symbol-by-symbol detection using precoder/slicer
- Fig. 2.6: A typical two-state trellis diagram
- Fig. 2.7: A typical transceiver with adaptive equalizers at the front of the detector
- Fig. 2.8: The receiver architecture in the presence of digital Viterbi detector
- Fig. 2.9: The receiver architecture in the presence of analog Viterbi detector
- Fig. 2.10: A typical optical wireless transceiver system
- Fig. 2.11: Three optical preamplifier structures, a: low input impedance, b: high input impedance, c: transimpedance amplifier
- Fig. 2.12: Common-source transimpedance amplifiers
- Fig. 2.13: Common-gate transimpedance amplifiers
- Fig. 2.14: Common-gate transimpedance amplifiers with RGC circuit
- Fig. 2.15: AC-coupled fully differential transimpedance amplifier
- Fig. 2.16: A typical optical transceiver model
- Fig. 2.17: Frequency response of the optical channel without equalization
- Fig. 2.18: Detection techniques, a: peak detection, b: DFE, c: Viterbi decoder
- Fig. 2.19: Frequency response for pulse slimming equalizer (symbol-rate=1GHz)
- Fig. 2.20: Frequency response for pulse slimming equalizer (symbol-rate=1GHz)
- Fig. 2.21: BER versus SNR for different detection techniques in 2-PAM scheme
- Fig. 2.22: Frequency response for pulse slimming equalizer (symbol-rate=500MHz)
- Fig. 2.23: Frequency response for 1+D equalizer (symbol-rate=500MHz)
- Fig. 2.24: SER versus SNR for different detection techniques using 4-PAM scheme
- Fig. 3.1: Basic structure of the proposed transimpedance amplifier
- Fig. 3.2: Photodiode bias input stage
- Fig. 3.3: Equivalent half-circuit of RGC
- Fig. 3.4: Differential amplifier circuit
- Fig. 3.5: Single-ended circuit for frequency response calculation
- Fig. 3.6: Simplified model for the second stage
- Fig. 3.7: Error amplifier
- Fig. 3.8: DC photocurrent rejection, a: dc rejection in progress, b: dc rejection completed
- Fig. 3.9: Output buffer
- Fig. 3.10: Frequency response of the output buffer
- Fig. 3.11: Input half-circuit used for noise calculation
- Fig. 3.12: Circuit test set-up
- Fig. 3.13: Measured preamplifier frequency response for different photodiode capacitances
- Fig. 3.14: Measured output noise
- Fig. 3.15: Eye diagrams with C<sub>pd</sub>=5pF: a) 200 Mbps, b) 400 Mbps
- Fig. 3.16: Chip Micrograph
- Fig. 4.1: A two-state trellis diagram
- Fig. 4.2: Full state trellis diagram for a 4-PAM duobinary PRS scheme. Branch labels represent the pairs of *uncoded* and *encoded* signals
- Fig. 4.3: Typical possible survivors in duobinary 4-PAM RSSD starting from the states (0,1)
- Fig. 4.4: Two-state reduced trellis diagram
- Fig. 4.5: Two state presentation of the categories in Fig. 4.3
- Fig. 4.6: Typical branch metrics for the example categories (c) and (d)
- Fig. 4.7: Minimum-distance error event paths for a full-state and reduced-state detector in 4-PAM Signalling
- Fig. 4.8: SER performance comparison of MLSD and RSSD
- Fig. 4.9: Full state trellis diagram for a 4-PAM dicode PRS scheme. Branch labels represent the pairs of *uncoded* and *encoded* signals
- Fig. 4.10: Front-end quantizer circuit
- Fig. 4.11: Analog core of the processing circuit
- Fig. 4.12: Improved structure for the analog core
- Fig. 4.13: Practical structure for circuit realization of Fig. 4.11
- Fig. 4.14: Current-mode realization of the front-end quantizer
- Fig. 4.15: Digital processing blocks
- Fig. 4.16: The circuit processing stages in pipelining structure
- Fig. 4.17: Selective switches and connections in the circuit pipelining configuration
- Fig. 4.18: Typical rotation of S/Hs when a: Qu or Qd =1 and b: Qu=Qd=0
- Fig. 4.19: Rotation management digital circuit
- Fig. 4.20: Three possible path memory inputs
- Fig. 4.21: Path memory a: circuit, b: digital controls
- Fig. 4.22: Simulated SER performance comparison, a: 5% dc offset in the front-end comparators, b: the same dc offset in the back-end comparators
- Fig. 4.23: Transconductor circuit
- Fig. 4.24: Transconductor a: simplified circuit, b: half-circuit open-loop circuit
- Fig. 4.25: Front-end preamplifiers of the comparator and their connections
- Fig. 4.26: The comparator circuit
- Fig. 4.27: Nine-differential-output transconductor (V/I-9)
- Fig. 4.28: Reference generating circuit a: differential ladder resistors, b: two-differential-output transconductor (V/I-2)
- Fig. 4.29: Input sample-and-holds circuit
- Fig. 4.30: Offset generating circuit
- Fig. 4.31: The clock generating circuit
- Fig. 4.32: Output waveforms of the clock generator
- Fig. 5.1: Chip photograph
- Fig. 5.2: Digital two-stage buffer
- Fig. 5.3: Differential low-swing input to full-swing output translator circuit
- Fig. 5.4: Full-swing input to low-swing differential output translator circuit
- Fig. 5.5: Test set-up
- Fig. 5.6: Input seven-level encoded signal eye diagram a: high SNR, b: low SNR
- Fig. 5.7: Input test signal and clock generator hardware
- Fig. 5.8: Measured BER performance
- Fig. A.1: Possible branch extensions from the adjacent states $b_0(n-1)$ and $b_1(n-1)$

#### **CHAPTER** 1

## Introduction

Optical communications has revolutionized information technology in the past few years.
Light beams, which lend themselves admirably to the ever-increasing demand for higher data rates, enjoy an unmatched channel bandwidth because photons, which constitute an optical signal, react weakly to their environment and to each other, as opposed to electrons [1]. Nowadays, the mention of optical communications most likely implies fiber-optic communications. With an exponential increase in the number of nodes, a load as high as 11 Tb/s is expected for the global Internet backbone by the year 2005 [2], which has urged wide-area networks (WANs) and local-area networks (LANs) to switch their media from copper to fiber. This in turn has prompted a tremendous amount of investigation into low-cost, low-power integrated fiber-optic transceivers. Despite the enormous achievements in this area, fiber optics still remains an expensive choice for network expansion, mostly due to installation and trenching costs.

Alternatively, optical wireless communications offers the speed of optics without the expense of fiber. Optical wireless communications (known within the industry as free-space optics (FSO)) has compelling economic advantages for gigabit-per-second rates over metropolitan distances of a few city blocks, or as a last-mile access that connects end-users with Internet service providers [3][4]. These free-space systems, which in a full-scale and reliable setup require less than one fifth of the budget needed for ground-based fiber optics [5], possess other advantages such as fast installation and flexible configuration.

Optical equipment generally works at one of two wavelengths, 850 nm or 1550 nm. Lasers for 850 nm are much less expensive and are therefore favored for applications over moderate distances.
On the other hand, 1550 nm lasers are less harmful to the eye, as their radiation is mostly absorbed by the cornea before reaching the retina; hence eye-safety regulations allow these longer-wavelength beams to operate at powers about two orders of magnitude higher than the shorter-wavelength beams. Higher transmit power translates into a longer range of operation at the same speed or, alternatively, a higher bit rate over the same distance. However, for applications over shorter ranges, within a building or a room and with multiple users, 1550 nm laser systems become unjustifiably expensive, and 850 nm lasers face eye-safety regulations [6][7] that make it difficult for them to operate efficiently.

Short-distance optical wireless communications targets low-cost, high-speed data exchange. Operating at the 850 nm wavelength, inexpensive LEDs can produce substantial launch powers and yet be eye safe [7][8]. This is because LEDs, unlike lasers, are not point-source devices and do not damage the retina of the eye. However, the short-distance optical wireless environment is far from ideal. Optical receivers in this type of link must use photodiodes with a large active area to alleviate fine-alignment problems, which in turn imposes a relatively large capacitance at the input of the receiver and seriously degrades frequency performance. Furthermore, LEDs are slower than laser diodes and their rise and fall times are significant, which also introduces a delay at the transmitter side. Stray light such as sunlight and artificial lighting adds noise components to the received signal and limits the receiver sensitivity and dynamic range. When dealing with a channel with these imperfections, sophisticated methods need to be employed for channel coding and data detection to maintain high data throughputs.
In band-limited channels, partial response signalling [9] and multi-level modulation schemes can be invoked to establish high data throughputs, at the cost of more complexity on the receiver side. Sequence detection has been proved to be the optimal technique [10] for recovering the received data, and Viterbi detection is a practical algorithm for realizing maximum-likelihood sequence detection [11]. In the past few years, analog Viterbi detectors have attracted a lot of attention, as they can operate with lower power than a traditional digital architecture, mostly due to the elimination of A/Ds from the front-end [12-16]. With the increased number of states in multi-level schemes, the complexity of Viterbi decoders becomes more serious, and an algorithm that reduces the number of states, and hence the complexity, becomes much more appealing. Reduced-state Viterbi detection is an algorithm which retains only the most probable states and ignores the others, thereby reducing complexity without any notable compromise in performance.

The present work studies the implementation of a 1 Gb/s optical wireless communication link for short-range applications. Assuming a trivial design on the transmitter side, most attention has been paid to the receiver. A new architecture is introduced for the front-end transimpedance amplifier, as well as an analog reduced-state Viterbi detector for 4-PAM duobinary partial-response signalling.

The organization of this thesis is as follows. Chapter 2 provides the basic background needed for the rest of the thesis. The two major discussions in this chapter are:

- Partial response signalling and its detection techniques
- Optical wireless communications

In the first section of this chapter, the general idea behind partial-response signalling and its prominent advantage over Nyquist systems in band-limited channels is given.
This is followed by a discussion of the two main detection techniques for this type of signalling, symbol-by-symbol detection and sequence detection, where the DFE and the Viterbi detector are addressed as special cases of these methods, respectively. In the next section of chapter 2, two major intensity modulation schemes, PAM and PPM, are reviewed, and different preamplifier architectures for the receiver front-end module are studied. In the last section of this chapter, as a case study for a typical optical wireless channel, equalizer characteristics as well as symbol error-rate performance for different types of detection techniques and signalling schemes are investigated.

Chapter 3 describes the design and implementation of a new fully differential transimpedance amplifier. This circuit employs a dc-coupled configuration to sense the photocurrent fully differentially, and hence improves the signal-to-noise and common-mode rejection ratios. The proposed transimpedance amplifier uses a regulated cascode circuit at the input to lower the input impedance and so isolate the photodiode capacitance from the rest of the circuit. With this isolation, the circuit bandwidth is shown to be fairly independent of large variations in input capacitance. An active dc rejection circuit included in this transimpedance amplifier eliminates the effect of dc currents produced by ambient light, and hence prevents the circuit from saturating under intense background light. Experimental results such as frequency response, noise performance, and transient response are also presented in this chapter.

In chapter 4, after establishing a comprehensive basis for the system design of an analog reduced-state sequence detector, the circuit-level design of the various components in this detector is elaborated.
It is shown in this chapter that, as a result of unavoidable delays in the analog and digital processing circuits, attaining the desired speed (1 Gb/s) within a sample period is difficult, and hence techniques such as pipelining and parallel processing have been incorporated to achieve the target speed. A discussion of practical nonidealities and imperfections, and of the required accuracy for the comparators in the front-end and back-end stages, is also given in this chapter.

In chapter 5, experimental results and layout issues are explained. For the layout, matching considerations and provisions for substrate noise reduction are described. The test set-up, followed by the performance evaluation, is covered in the final part of this chapter.

Finally, in chapter 6, a summary of the thesis contributions is given and directions for further work are discussed.

#### References

- [1] M. Montrose, *Printed Circuit Board Design Techniques for EMC Compliance*, IEEE Press, New York, 1996.
- [2] J. Savoj and B. Razavi, "A 10-Gb/s CMOS Clock and Data Recovery Circuit with a Half-Rate Linear Phase Detector," *IEEE J. Solid-State Circuits*, Vol. 36, No. 5, May 2001, pp. 761-767.
- [3] D. W. Faulkner, D. B. Payne, J. R. Stern and J. W. Ballance, "Optical Networks for Local Loop Applications," *IEEE J. Lightwave Technol.*, Vol. 7, No. 11, Nov. 1989, pp. 1741-1751.
- [4] T. Kwok, "A Vision for Residential Broadband Services: ATM-to-the-Home," *IEEE Network*, Vol. 9, No. 5, Sep.-Oct. 1995, pp. 14-28.
- [5] H. A. Willebrand and B. S. Ghuman, "Fiber Optics without Fiber," *IEEE Spectrum*, Aug. 2001, pp. 40-45.
- [6] International Electrotechnical Commission, "Safety of Laser Products Part I: Equipment Classification, Requirements and User's Guide," *Group Safety Publication*, Ref. No. 825-1, 1993.
- [7] D. J. T. Heatley, D. R. Wisely, I. Neild, and P. Cochrane, "Optical Wireless: The Story So Far," *IEEE Communications Magazine*, Dec. 1998, pp. 72-82.
- [8] J. M. Kahn and J. R. Barry, "Wireless Infrared Communications," *Proceedings of the IEEE*, Vol. 85, No. 2, Feb. 1997, pp. 263-298.
- [9] P. Kabal and S. Pasupathy, "Partial-Response Signaling," *IEEE Trans. Commun.*, Vol. 23, No. 9, Sep. 1975, pp. 921-934.
- [10] E. A. Lee and D. G. Messerschmitt, *Digital Communication*, Kluwer Academic Publishers, 1994.
- [11] G. D. Forney, Jr., "Maximum-Likelihood Sequence Estimation of Digital Sequences in the Presence of Intersymbol Interference," *IEEE Trans. Inform. Theory*, Vol. 18, No. 3, May 1972, pp. 363-378.
- [12] T. W. Matthews and R. R. Spencer, "An Integrated Analog CMOS Viterbi Detector for Digital Magnetic Recording," *IEEE J. Solid-State Circuits*, Vol. 28, No. 12, Dec. 1993, pp. 1294-1302.
- [13] M. H. Shakiba, D. A. Johns and K. W. Martin, "An Integrated 200-MHz 3.3-V BiCMOS Class-IV Partial-Response Analog Viterbi Decoder," *IEEE J. Solid-State Circuits*, Vol. 33, No. 1, Jan. 1998, pp. 61-75.
- [14] A. Demosthenous and J. Taylor, "Low-Power CMOS and BiCMOS Circuits for Analog Convolutional Decoders," *IEEE Trans. Circuits and Systems-II*, Vol. 46, No. 8, Aug. 1999, pp. 1077-1080.
- [15] K. He and G. Cauwenberghs, "Performance of Analog Viterbi Decoding," *Proc. Midwest Symp. Circuits and Systems*, Vol. 1, 2000, pp. 2-5.
- [16] K. He and G. Cauwenberghs, "Integrated 64-State Parallel Analog Viterbi Decoder," *Proc. IEEE Int. Symp. Circuits and Systems*, Vol. 4, 2000, pp. 761-764.

#### **CHAPTER** 2

## Background

This chapter is a survey of the required background for the thesis. In the first section, partial response signalling schemes and their detection techniques are reviewed. Optical wireless communications, with emphasis on modulation schemes and transimpedance amplifiers, is addressed in the second section. Finally, some special applications of the introduced modulation and detection techniques in optical wireless communication systems are investigated in the last section.
#### 2.1 Partial Response Signalling and Detection Techniques

In digital data communication systems, for an interference-free data exchange, signals require a broad-spectrum channel extending from low frequencies up to frequencies high enough to accommodate their main frequency content. Free-space wireless and twisted-pair wires, as well as magnetic and optical storage systems, are examples of popular links for data transmission. Unfortunately, many practical data communication channels are band-limited and unable to provide the required bandwidth, especially for high data rates. For example, unshielded twisted-pair (UTP) telephone copper lines, which were originally intended only for voice-band communications, are currently under investigation for inclusion in next-generation broadband access networks. Very high rate digital subscriber line (VDSL) targets downstream data transmission as high as 55 Mbps over UTPs. Meanwhile, in optical storage systems, a great deal of research is underway to increase the existing capacity of DVDs from 4.7 Gbytes to 17 Gbytes [1][2].

With the increase in the data transmission rate and the bandwidth limitation of the available channels, the effect of adjacent pulses on the received pulse is unavoidable. This effect is a common form of interference in band-limited channels called intersymbol interference (ISI). ISI is a major source of bit errors in the recovered data at the receiver, and its effect worsens as the channel is pushed toward higher data rates.

ISI rejection in band-limited channels can be achieved by incorporating high-pass equalizers. However, their noise enhancement at high frequencies deteriorates the noise performance. On the other hand, there exist equalizers with less high-frequency boost that result in a controlled amount of intersymbol interference.
In this section, after introducing some minimum-bandwidth communication systems such as Nyquist and partial response signalling (PRS) systems, detection techniques that can be used to retrieve the transmitted data are addressed.

#### 2.1.1 Nyquist Systems

The fundamental idea behind Nyquist systems is that the received sample values should not be a function of adjacent samples. This requires that the sampled values of the impulse response of the channel be zero at the non-corresponding data instants. A particular family of pulses which satisfy the Nyquist requirements and are widely used are raised cosine pulses [3]. The minimum bandwidth requirement for these pulses is $BW = 1/(2T)$ for a symbol period of $T$ and an excess-bandwidth coefficient of 0, as shown in Fig. 2.1. Since the design of brick-wall filters as shown in Fig. 2.1.b is practically impossible, it is common practice to use some excess bandwidth, thereby reducing the sharpness of the frequency transition edge.

Fig. 2.1: The minimum-bandwidth Nyquist systems with normalized $T_s(1/f_s)=1$ sec, a: Impulse response, b: Frequency response

However, in this thesis, we will focus on minimum bandwidth systems (discussed below) and so the reader is referred to [3] for more information on Nyquist systems.

#### 2.1.2 Partial Response Signalling

As discussed above, although the minimum channel bandwidth for zero intersymbol interference is half the data symbol rate, in practice some excess bandwidth is always required. To maintain the same bandwidth as the brick-wall filter shown in Fig. 2.1.b but with a smoother shape, one can introduce some controlled ISI in the time domain. With a predictable amount of ISI, the correct transmitted data can be recovered by using a more complex design in the receiver. These minimum-bandwidth filters can be realized by introducing non-zero values for the pulse filter at non-corresponding instants. For the simple case of the duobinary signalling scheme, $1+D$ [4], depicted in Fig.
2.2, where *D* denotes a unit delay, in addition to the desired non-zero value at time zero in Fig. 2.2.a, a non-zero value at time *1* is also introduced as controlled interference. The frequency response of the channel is shown in Fig. 2.2.b and demonstrates a low-pass response with a bandwidth of $1/(2T)$.

Fig. 2.2: The duobinary PRS system with normalized $T_s(1/f_s)=1$ sec, a: Impulse response, b: Frequency response

The concept behind the duobinary scheme was then generalized in [5], leading to the signalling technique named partial response signalling (PRS). The construction of PRS is illustrated in Fig. 2.3, where the filter function is a polynomial given by (2.1) which is band-limited by an ideal low-pass filter to $1/(2T)$.

$$F(D) = k_0 + \sum_{i=1}^{N-1} k_i D^i = k_0 \left[ 1 + \sum_{i=1}^{N-1} (k_i / k_0) D^i \right] = k_0 [1 + K(D)]$$ (2.1)

In practice, an appropriate polynomial which best characterizes the channel behavior is incorporated as the model. For example, in optical communications, in addition to the duobinary partial response, other polynomials representing low-pass channels, such as $(1+D)^n(1+kD+D^2)$, are also used to model optical fiber, wireless and recording channels [6][7][8]. In other applications, PRS modeling of magnetic recording channels with nulls at dc can be easily realized by introducing a $(1-D)^n$ factor into the coding polynomial [9][10].

Fig. 2.3: A partial-response encoder model

#### 2.1.3 Detection Techniques in PRS

The detection of partial response signals can be accomplished by removing the introduced ISI based on the polynomial model of the channel. Two detection methods, symbol-by-symbol detection and maximum-likelihood sequence detection, are generally applied to retrieve the transmitted symbols. The first method normally employs a decision-feedback equalizer (DFE) to remove ISI.
The latter method, with somewhat more complexity, minimizes the probability of error by considering the whole sequence of symbols and takes advantage of the information embedded in the ISI.

#### A. Symbol-by-Symbol Detection

The DFE is a commonly used method in symbol-by-symbol detection. Compared with linear equalizers such as zero-forcing filters, DFEs operate with less noise enhancement. In this detection method, as illustrated in Fig. 2.4, the transmitted data is recovered by subtracting the effect of previously detected symbols from the current received sample.

Fig. 2.4: Symbol-by-symbol detection using DFE

In practice, two major factors, noise and error propagation, impact DFE performance. In Fig. 2.4, by applying (2.1) and assuming $k_0=1$ for simplicity, the input sample is given by

$$y(n) = x(n) + \sum_{i=1}^{N-1} k_i x(n-i) \tag{2.2}$$

In the presence of additive noise, the input to the slicer is given by

$$s(n) = y(n) + v(n) - \sum_{i=1}^{N-1} k_i \hat{x}(n-i) \tag{2.3}$$

where $v(n)$ is a noise sample and $\hat{x}(n-i)$ is a past decision. Substituting (2.2) into (2.3) gives

$$s(n) = x(n) + v(n) + \sum_{i=1}^{N-1} k_i [x(n-i) - \hat{x}(n-i)] \tag{2.4}$$

Equation (2.4) shows that the detected symbol is affected both by the added noise and by errors in past decisions. To avoid this error propagation, precoding can be implemented at the transmitter side, as shown in Fig. 2.5.

Fig. 2.5: Symbol-by-symbol detection using precoder/slicer

To simplify the description of the precoding method, a duobinary channel is assumed. At the transmitter side, the precoder shown in Fig. 2.5 generates a new sequence defined by

$$p(n) = x(n) - p(n-1) \pmod 2 \tag{2.5}$$

Assuming a binary sequence $x(n)$ of 0s and 1s for transmission, the resulting precoded sequence $p(n)$ is also a sequence of 0s and 1s.
Based on $p(n)$, a stream $q(n)$ of $-1$s and $1$s is sent to the channel such that the channel input is set to $-1$ when $p(n)=0$ and to $1$ when $p(n)=1$, that is,

$$q(n) = 2p(n) - 1 \tag{2.6}$$

The samples at the input of the slicer, $r(n)$, are given by

$$r(n) = q(n) + q(n-1) = 2(p(n) + p(n-1) - 1) \tag{2.7}$$

Consequently,

$$\hat{x}(n) = p(n) + p(n-1) = \frac{1}{2}r(n) + 1 \pmod 2 \tag{2.8}$$

Equation (2.8) implies that if $r(n) = \pm 2$ then $\hat{x}(n) = 0$, and if $r(n) = 0$ then $\hat{x}(n) = 1$. With precoding, the detected symbol at each time is independent of the previous decisions and hence error propagation is avoided.

For the symbol-by-symbol detection techniques presented here, a lower bound on the symbol error rate (SER), obtained by ignoring error propagation, is given by [9]

$$(SER)_{LB} = 2\left(1 - \frac{1}{M}\right)Q(1/\sigma) \tag{2.9}$$

where $M$ is the number of levels in the pulse amplitude modulation (PAM) scheme, $\sigma^2$ is the variance of the Gaussian noise $v(n)$ and $Q(x)$ is given by

$$Q(x) = \frac{1}{(2\pi)^{1/2}} \int_{x}^{\infty} \exp(-u^2/2) du \tag{2.10}$$

The upper bound, which includes error propagation, is given by [9]

$$(SER)_{UB} = \frac{M^{N-1}(SER)_{LB}}{\frac{M}{M-1}(M^{N-1}-1)(SER)_{LB}+1} \tag{2.11}$$

Equation (2.11) reveals that error propagation increases the error probability by at most a factor of $M^{N-1}$. Comparing DFE performance with that of an ISI-free channel shows a 3dB loss for DFE detection [11]. This loss is caused by the non-optimal nature of the DFE, which ignores the information contained in the ISI and makes each decision on the basis of a single received sample. However, almost all of the SNR loss in PRS symbol-by-symbol detection can be recovered by maximum-likelihood sequence detection [11].

#### B. Maximum-Likelihood Sequence Detection

For sequence detection in the presence of ISI, an optimum detector needs to take full advantage of the information embedded in each sample, because each sample combines information about the corresponding symbol with contributions from the adjacent symbols. Maximum-likelihood sequence detection (MLSD) is an optimum technique that selects, among all possible transmitted sequences, the one with the highest likelihood given the received sequence. Partial response signalling schemes fall in the category of signals with ISI that are optimally detected using MLSD.

Although MLSD is optimal, a brute-force implementation of an MLSD decoder can be complex and difficult. Moreover, ideally, the entire sequence should be received before detection begins, which complicates the process further. The Viterbi algorithm is a practical means of implementing optimum maximum-likelihood detection. This decoding technique was originally developed for convolutional codes and was subsequently applied to ISI channels. The number of computations in the Viterbi algorithm grows linearly with the length of the transmitted sequence, and the detection process can start immediately upon receipt of the first sample.

The idea behind the Viterbi detection technique is to set up a trellis diagram based on the number of coding states and to search for the path in this diagram whose coded sequence differs least from the received sequence. As an example, Fig. 2.6 represents a two-state trellis diagram in which $b_{ji}(k)$ denotes the branch metric from state $j$ to state $i$ at time $k$ and $m_i(k)$ denotes the state metric for state $i$ at time $k$. Branch metrics in this diagram measure the amount of error between the expected value and the received value at each transition.
The state metric for each state at time $k$ represents the smallest accumulated branch metric over all paths from the origin to that state. As illustrated in this figure, at time $k=5$, state 1 has the smallest accumulated metric, and the solid-line branches that trace this state back to the origin are used to recover the transmitted information. For an $M$-ary input signal and the partial response signalling scheme depicted in Fig. 2.3, the maximum number of states is $M^{N-1}$ and the number of branches leaving or entering each state can be up to $M$.

Fig. 2.6: A typical two-state trellis diagram

In practice, the transmission channel characteristics do not precisely match a PRS channel and performance degrades as a result of this mismatch. For example, in read channels, pick-up head positions and hence channel characteristics vary with assembly tolerances and wear. As a result, adaptive equalizers are usually placed ahead of the Viterbi (or other) detector at the receiver to compensate for these imperfections and shape the channel to the desired response (Fig. 2.7).

Fig. 2.7: A typical transceiver with adaptive equalizers at the front of the detector

#### 2.1.4 Analog Viterbi Detectors

Although digital Viterbi detectors are often employed as digital signal processing blocks in systems such as optical and magnetic recording channels, digital mobile radios and digital satellite TVs, it is of interest to exploit some advantages of analog circuits in these decoders. In modest-size process technologies, analog circuits offer better speed and power consumption, and applying them in applications such as recording channels and optical links can yield substantial power savings and speed improvements.
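As an illustration of the detection techniques of Section 2.1.3, which any such detector must realize in hardware, the following Python sketch models the precoded duobinary link of (2.5)-(2.8) and a two-state Viterbi detector for the same $1+D$ channel. It is a behavioural sketch only and not the detector implemented in this work: the function names, the known initial state $p(-1)=0$ (that is, $q(-1)=-1$), the squared-error branch metric and the full traceback at the end of the block (instead of a finite path memory) are all assumptions made for the example.

```python
def precode(bits, p_prev=0):
    """Precoder of Eq. (2.5): p(n) = x(n) - p(n-1) (mod 2)."""
    out = []
    for x in bits:
        p_prev = (x - p_prev) % 2
        out.append(p_prev)
    return out


def to_levels(p):
    """Level mapping of Eq. (2.6): q(n) = 2 p(n) - 1."""
    return [2 * b - 1 for b in p]


def duobinary(q, q_prev=-1):
    """1+D channel of Eq. (2.7): r(n) = q(n) + q(n-1)."""
    out = []
    for sym in q:
        out.append(sym + q_prev)
        q_prev = sym
    return out


def slicer(r):
    """Memoryless decision from Eq. (2.8): r near 0 -> 1, r near +/-2 -> 0."""
    return [1 if abs(sample) < 1.0 else 0 for sample in r]


def viterbi_1pd(r, q_init=-1):
    """Two-state Viterbi detector for the 1+D channel (state = previous q)."""
    levels = (-1, 1)
    inf = float("inf")
    # State metrics m_i(k); only the assumed initial state is allowed at start.
    metric = {s: (0.0 if s == q_init else inf) for s in levels}
    survivors = []  # per time step: best predecessor of each state
    for sample in r:
        new_metric, pred = {}, {}
        for cur in levels:
            # Add-compare-select over the branches entering state `cur`;
            # the branch metric is the squared error (assumed for this sketch).
            cands = [(metric[prev] + (sample - (cur + prev)) ** 2, prev)
                     for prev in levels]
            new_metric[cur], pred[cur] = min(cands)
        metric = new_metric
        survivors.append(pred)
    # Trace back from the state with the smallest final metric.
    state = min(levels, key=lambda s: metric[s])
    q_hat = []
    for pred in reversed(survivors):
        q_hat.append(state)
        state = pred[state]
    return q_hat[::-1]


def undo_precoding(q_hat, p_prev=0):
    """Recover x(n) = p(n) + p(n-1) (mod 2) from the detected levels."""
    out = []
    for sym in q_hat:
        p = (sym + 1) // 2
        out.append((p + p_prev) % 2)
        p_prev = p
    return out


x = [1, 0, 1, 1, 0, 0, 1, 0]
r = duobinary(to_levels(precode(x)))
# Hypothetical bounded disturbance standing in for the noise v(n).
noise = [0.3, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, 0.4]
r_noisy = [a + b for a, b in zip(r, noise)]

print("slicer recovers x: ", slicer(r_noisy) == x)
print("Viterbi recovers x:", undo_precoding(viterbi_1pd(r_noisy)) == x)
```

With a disturbance this small both detectors recover the data; the difference between them only appears at lower SNR, where the sequence detector can use the correlation between adjacent samples that the slicer ignores.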
Even with the shrinkage of process technologies toward deep sub-micron sizes and the unmatched improvement of speed and power consumption in digital circuitry, the A/D converters at the front end of digital Viterbi detectors, as shown in Fig. 2.8, remain a bottleneck in high-speed and low-power applications. Recent papers on the design of 6-bit A/Ds, [12] and [13], report power dissipations of about 400mW at 500MS/s and 225mW at 300MS/s with a 3.3V power supply, respectively. Moreover, the power required by digital Viterbi detectors, as reported in [14] and [15], is about 22mW/state at 110MS/s with a 3.0V power supply. Assuming that the detector power scales linearly with the sampling rate, a 4-state digital Viterbi detector would dissipate about 400mW at 500MS/s and 240mW at 300MS/s; adding the A/D power gives an estimated total of about 800mW at 500MS/s and 465mW at 300MS/s. In comparison, as will be shown in chapter 5, the power dissipation of an equivalent analog Viterbi detector is experimentally measured to be 55mW at 100MS/s and is expected from simulation to be about 112mW at 500MS/s. Even allowing 20% more power for the path memory, which was not included in the designed chip, the analog version offers at least a fivefold power reduction.

Fig. 2.8: The receiver architecture in the presence of digital Viterbi detector

Analog Viterbi detectors obtain their greatest benefit when used with other moderate-precision circuits such as equalizers and PLLs, as illustrated in Fig. 2.9, so that the A/D can be eliminated from the front end of the receiver.

Fig. 2.9: The receiver architecture in the presence of analog Viterbi detector

A design of an analog Viterbi detector for class-IV partial response signalling was presented in [16]. This detector operated at a speed of 40 Mb/s while consuming 50 mW/state. The design was improved with a modified architecture in [17] to achieve a speed of 200 Mb/s with a reduced power dissipation of 15 mW/state.
Other low-power designs [18] have reached a power consumption of about 4 mW/state at a speed of 100 Mb/s. In industry, a 72 Mb/s PRML disk-drive chip with an analog Viterbi detector was reported in [19], and a 100 Mb/s read channel chip by Texas Instruments using an analog Viterbi detector (SSI 32P4782A) [30] is presently available. One of the goals of this thesis is to demonstrate that, with the use of an analog Viterbi detector, it is possible to achieve digital transmission speeds as high as 1Gb/s in short-distance optical wireless communications.

#### 2.2 Optical Wireless Communications

Optical wireless communication systems have achieved considerable market acceptance in recent years. This is primarily due to the significant reduction in the price of optical components and to their ease of installation and reconfiguration. Other advantages, such as no requirement for frequency licensing, higher bandwidth, less vulnerability to interference and the possibility of reusing transceivers in the immediate vicinity, make optical wireless more appealing in some fields than its radio frequency (RF) counterpart. Applications of optical wireless links extend from personal appliances such as digital cameras, mobile phones and PDAs to PCs and printers, and ultimately to optical wireless LANs.

Indoor optical wireless systems are subject to stringent safety standards which bound their transmitted power within Class I eye safety under all conditions [20]. Indoor systems that use lasers therefore find it difficult to achieve a good power budget. However, by using LEDs instead of lasers, a much higher launch power can be used while still remaining Class I eye safe [21]. Furthermore, to relax the alignment requirements between optical transmitters and receivers, photodiodes with a large active area are needed, which in turn introduce large parasitic depletion capacitances at the receiver front-end.
Despite the vast capacity available in optical communications, the aforementioned issues, such as slow LEDs and photodiodes, limit operation to a few tens of MHz. While the emergence of Bluetooth and other radio systems in the past few years has been regarded as a prevailing alternative to optical links, it is more realistic to consider radio and optical links as two complementary media. Table 2.1 [21] summarizes some relevant radio and optical systems for comparison. As shown in this table, radio systems cover a larger distance whereas optical wireless links are more suitable for short distance communications. In other words, while RF signals can cover the longer ranges between buildings and penetrate some opaque obstacles, wireless optical signals can serve multiple users in close proximity within the same building at higher throughputs. This calls for more sophisticated modules in an optical wireless communication system, especially on the receiver side, that perform at higher bit rates using low cost technologies. The scope of this chapter is to introduce a typical optical channel and its elements and to put forward some existing challenges and their potential solutions to achieve higher speed.

Table 2.1: Specification for some optical wireless and RF systems

| Medium | System | Speed (Mb/s) | Range (m) |
|------------------|-----------|----|-----|
| Optical Wireless | JVC | 10 | 10 |
| | Spectrix | 4 | 10 |
| | IrDA | 4 | 1 |
| Radio | WaveLAN | 8 | 100 |
| | HiperLAN | 26 | 50 |
| | Bluetooth | 1 | 10 |

After introducing a typical optical link in this section, different modulation schemes and noise in optical communications will be investigated. Preamplifiers followed by equalizers and detectors are then discussed.

#### 2.2.1 Optical Link

Fig. 2.10 illustrates a typical infrared link.

Fig. 2.10: A typical optical wireless transceiver system

After the input data is coded and modulated at the transmitter, the LED and its driver translate the input signal into infrared light. The optical power generated in this way is proportional to the LED current and is measured in W/steradian. At the receiver, the photodiode generates a current proportional to the received optical power. The resulting signal-to-noise ratio is proportional to the square of the received optical power, which falls off with the square of the distance, and hence poses a significant limitation on the range of this type of link. The current generated in the receiver is then amplified by a preamplifier before being equalized to a desired spectral shape for detection. The detector recovers the received data using the equalized signal and the clock generated by the clock recovery circuit.

The opto-electronic components suitable for optical wireless data communications are the major limiting factors in achieving higher speed. Available LEDs of reasonable cost have a minimum rise and fall time of 1-1.5ns (MITEL 1A301 High-Performance LED), which limits the switching speed of the transmitter to about 300 MHz. At the receiver, to avoid the need for precise alignment of the photodiode (as is done in fiber optics using costly lenses), photodiodes with a wide field of view (FOV) are used, which introduce a large depletion capacitance at the input of the transimpedance amplifier. Typical photodiodes present a capacitance of about 5pF at $V_R = 0V$ (HEWLETT PACKARD HSDL-5400) which, along with the input impedance of the preamplifier, normally sets the dominant pole of the receiver.

#### 2.2.2 Modulation Schemes

Two popular intensity modulation schemes widely used in optical communications are L-PPM (L-level pulse-position modulation) and L-PAM (L-level pulse amplitude modulation).
In L-PPM, the input bits with rate $R_b$ are grouped into blocks of $\log_2 L$ bits to form a symbol, and one symbol is transmitted in each symbol period. Each symbol period $T$ is divided into $L$ sub-intervals, each of which is allocated to one of the $L$ different symbols. In L-PAM, the same number of symbols is mapped to $L$ different levels and, in each period, one of these levels is sent. Compared with L-PPM, L-PAM is more bandwidth efficient, since the number of conveyed bits increases with the number of levels at the same symbol rate. On the other hand, L-PPM is more power efficient, as it distinguishes the $L$ symbols by their position in time rather than by amplitude. Table 2.2 compares the power and bandwidth requirements of these two modulation schemes [22]. In this table, for a given noise power and $R_b$, the power requirement is expressed relative to that of on-off keying (2-PAM), denoted by $P_{ook}$. It is apparent from Table 2.2 that in bandwidth-limited channels with a high bit rate, multi-level PAM is the best choice. Moreover, as opposed to PPM, PAM is a linear modulation scheme and hence well-known classical equalization and detection techniques can be applied.

Table 2.2: Optical power and bandwidth requirements for some intensity modulation schemes

| Modulation Scheme | Average Optical Power Requirement | Bandwidth Requirement |
|-------------|------------------------------------------------|-------------------------|
| OOK (2-PAM) | $P_{ook}$ | $R_b$ |
| L-PAM | $\frac{L-1}{\sqrt{\log_2 L}} P_{ook}$ | $\frac{R_b}{\log_2 L}$ |
| L-PPM | $\frac{1}{\sqrt{0.5L \cdot \log_2 L}} P_{ook}$ | $\frac{LR_b}{\log_2 L}$ |

#### 2.2.3 Noise

Two primary sources of noise at the optical receiver front-end are thermal noise and shot noise. Thermal noise is due to resistive elements and transistors in the preamplifier.
Feedback resistors as well as other resistors, such as pull-up and pull-down resistors, are the main contributors to thermal noise. This noise is generated independently of the received signal and can be modeled as Gaussian. On the other hand, shot noise is the major source of noise in wireless optical links in the presence of ambient light. This noise has a white spectrum with a noise power density given by

$$\overline{I_{n,sh}^2} = 2qI_s \qquad A^2/Hz \tag{2.1}$$

where $I_s$ is the dc component of the input signal and $q$ is the electron charge, equal to $1.6 \times 10^{-19}$ C. Sunlight, incandescent light bulbs and fluorescent lamps are the principal sources of the dc component of the optical signal.

#### 2.2.4 Preamplifiers

High bandwidth, low input-referred noise, and wide dynamic range are the three key specifications for an optical preamplifier. Three principal configurations for optical preamplifiers are depicted in Fig. 2.11. In circuits a and b, the photodiode is placed in series with a load resistor. In the former circuit, $R_L$ is low and hence the overall bandwidth is high, as it is inversely proportional to the product of $R_L$ and $C_{pd}$. On the other hand, the input thermal noise current, mainly generated by the load resistor, is proportional to $1/R_L$ and hence this circuit suffers from high input-referred noise. The same circuit with a high load resistor, shown in Fig. 2.11.b, presents better noise performance at the expense of lower bandwidth. To compensate for their limited bandwidth, circuits of this type are followed by equalizers. When bandwidth and noise must be traded off, the circuit shown in Fig. 2.11.c, a transimpedance amplifier (TIA), becomes the best choice. In this circuit, $R_f$ provides shunt-shunt feedback and can be made as large as possible (as long as stability is maintained) while still presenting a small input impedance.

Fig. 2.11: Three optical preamplifier structures, a: low input impedance, b: high input impedance, c: transimpedance amplifier

Two major candidate topologies for TIAs are common-source and common-gate. The two common-source architectures shown in Fig. 2.12 differ in the type of feedback connection [23].

Fig. 2.12: Common-source transimpedance amplifiers

In both circuits, the input-referred noise is equal to

$$\overline{I^{2}_{n, in}} = \frac{4kT}{R_{f}} + \frac{\overline{V^{2}_{n, A}}}{R_{f}^{2}} \tag{2.2}$$

where $R_f$ is the feedback resistor and $V_{n,A}$ denotes the input-referred noise voltage of the open-loop amplifier. Also, the -3dB bandwidth of the above circuits is equal to $A/(2\pi R_f C_{pd})$, where $C_{pd}$ is the depletion capacitance of the photodiode and $A$ is the open-loop gain. The circuit of Fig. 2.12.a suffers from the small headroom left for $R_B$, which is limited by $V_{gs1}+V_{gs2}$ and makes $R_B$ unavoidably small. This impacts both noise performance and bandwidth, as it reduces the open-loop gain. Furthermore, the three poles at the input node, the drain of $M_1$, and the output node degrade the phase margin and hence the stability. In Fig. 2.12.b, the $R_f$ connection node is relocated to the drain of $M_1$, so the output capacitance is isolated from the feedback resistor. This modification leaves a greater voltage drop across $R_B$, and the circuit has better noise and bandwidth characteristics.

Two common-gate topologies are depicted in Fig. 2.13 [24]. Due to the lower input impedance of common-gate transistors, the depletion capacitance of the photodiode is prevented from determining the dominant pole. In both circuits, the input-referred noise at low frequencies can be approximated by

$$\overline{I^{2}_{n, in}} = \frac{4kT}{R_{s1}} + \frac{4kT}{R_{d1}} + \frac{4kT}{R_{f}} \tag{2.3}$$

and the dominant pole is determined by the input capacitance of $M_2$, the gate and drain capacitance of $M_1$ and the feedback resistor $R_f$. In the circuit shown in Fig. 2.13.b, the feedback resistor connects the output to the drain of $M_1$ rather than to its source as in Fig. 2.13.a. This completely isolates the non-dominant poles from the photodiode capacitance and hence gives a better frequency response. Furthermore, since shunt feedback is applied to the gate of $M_2$ in Fig. 2.13.b, the impedance at this node is low and the effect of $R_{d1}$ on $f_{-3dB}$ is negligible, which implies that this resistor can be made larger in this improved design for lower input-referred noise and higher open-loop gain.

Fig. 2.13: Common-gate transimpedance amplifiers

An improved version of the common-gate TIA (shown in Fig. 2.14) employs a regulated cascode (RGC) circuit at the input to lower the input impedance for better frequency and noise performance [25]. An extra shunt feedback at the input, provided by $M_B$, lowers the input impedance by a factor of $(1+g_{mB}R_B)$, so that the input impedance is equal to

$$R_{in} = \frac{1}{g_{m1}(1 + g_{mB}R_B)} \tag{2.4}$$

This configuration will be addressed in more detail in chapter 3.

Fig. 2.14: Common-gate transimpedance amplifiers with RGC circuit

Fully-differential circuits have proven their efficiency in reducing common-mode noise injection. Transimpedance amplifiers with fully differential architectures have also been proposed in the literature. A capacitively coupled TIA, illustrated in Fig. 2.15 and proposed in [26] and [27], employs two coupling capacitors to block dc current. The disadvantages of this type of topology are the layout complexity of the on-chip coupling capacitors and the fluctuation of the photodiode bias as the ambient light varies. Other fully-differential TIAs, such as those proposed in [28] and [29], sense the photodiode current single-endedly at one of the inputs and use a current source or a capacitor at the other input for matching purposes.
Some practical non-idealities, such as process variations and temperature, prevent these circuits from being well balanced, and their performance deteriorates with these imperfections. DC photocurrent induced by ambient light can also be problematic in such dc-coupled structures, as the intensity of this light can be orders of magnitude higher than that of the signal light. Including dc rejection circuits can prevent saturation of the preamplifier by background light.

Fig. 2.15: AC-coupled fully differential transimpedance amplifier

In chapter 3, a new fully differential dc-coupled transimpedance amplifier with a dc photocurrent rejection circuit will be discussed.

### 2.3 Application of Modulation and Detection Techniques in Optical Wireless Communications

With the advent of high-speed optical data communications, and despite the practically unlimited bandwidth of optical media, the band limitation of opto-electronic components and of storage devices such as CDs and DVDs calls for sophisticated channel equalization and detection methods. In short-distance wireless optical communications, the main goal is to design a low-cost transceiver capable of handling data rates as high as 1Gb/s. At this speed, even connections to monitors could be made optically, and multi-user networks could benefit as well. Based on experimental results using existing regular LEDs and photodiodes, a typical two-pole transmitter with poles at 250 MHz and 400 MHz and a transimpedance amplifier with two poles at 250 MHz and 400 MHz are readily achievable. A channel model for this type of link is shown in Fig. 2.16, in which $H_T(s)$ and $H_R(s)$ represent the transfer functions of the LED driver and the receiver preamplifier, respectively.

Fig. 2.16: A typical optical transceiver model

The aggregate frequency response of this typical channel is depicted in Fig. 2.17 and shows that the -3dB bandwidth is about 145MHz.

Fig. 2.17: Frequency response of the optical channel without equalization

Three methods for equalizing this low-pass channel are shown in Fig. 2.18. In the first method, a pulse slimming equalizer converts the existing channel to a raised-cosine equivalent with 100% excess bandwidth, followed by peak detection. In the second method, the channel is equalized to a 1+D partial response channel and a DFE is then used to detect the received data. In the last method, with the same equalization as in the second method, a Viterbi detector is used to extract the received information.

Fig. 2.18: Detection techniques, a: peak detection, b: DFE, c: Viterbi decoder

Using a 10-tap FIR filter as the equalizer and SIMULINK as the simulator, the filter coefficients of the desired equalizers for the existing channel were determined. After fixing the filter coefficients for each channel, the symbol error rate (SER) was evaluated for performance comparison. Fig. 2.19 shows the frequency response of the pulse slimming equalizer for a 2-PAM modulation scheme with a bit rate of 1Gb/s. The same characteristic for a 1+D equalizer is shown in Fig. 2.20. Since the channel bandwidth required with pulse slimming equalization is equal to the symbol rate (1GHz), the higher frequencies of this low-pass channel are boosted drastically. This type of equalization also enhances the in-band noise power, which consequently degrades SER performance. With 1+D equalization, the required bandwidth is half the symbol rate (500MHz) and hence the required high-frequency boost and the noise enhancement are smaller than those of the pulse slimming equalizer.

Fig. 2.19: Frequency response for pulse slimming equalizer (symbol-rate=1GHz)

Fig. 2.20: Frequency response for 1+D equalizer (symbol-rate=1GHz)

The SER performance of the three detection techniques is compared in Fig. 2.21.
This figure shows that Viterbi detection performs at least 4dB better than peak detection and at least 2dB better than DFE detection.

Fig. 2.21: BER versus SNR for different detection techniques in 2-PAM scheme

To avoid the high-boost equalizers of Figs. 2.19 and 2.20 and their significant noise enhancement, 4-PAM signalling was investigated as a way to reduce the required bandwidth and the equalization complexity. Each symbol in 4-PAM conveys two bits of information and hence, for the same bit rate, the symbol rate is halved. Figs. 2.22 and 2.23 illustrate the frequency characteristics of the pulse slimming and 1+D equalizers, respectively. While the pulse slimming equalizer still requires more than 8dB of gain variation, a simple low-pass filter appears adequate for 1+D equalization. Fig. 2.24 shows the SER performance of the three detection methods in the 4-PAM scheme. With nearly the same margins as in 2-PAM, Viterbi detection outperforms peak detection and DFE by 4dB and 2dB, respectively. This indicates that, to achieve 1Gb/s data transmission over wireless optical links, 4-PAM provides a viable setting in which simple equalization can be implemented and Viterbi detection can be employed for superior performance.

Fig. 2.22: Frequency response for pulse slimming equalizer (symbol-rate=500MHz)

Fig. 2.23: Frequency response for 1+D equalizer (symbol-rate=500MHz)

Fig. 2.24: SER versus SNR for different detection techniques using 4-PAM scheme

#### 2.4 Summary

Intersymbol interference is an unavoidable perturbation in bandwidth-constrained channels. Partial-response signalling systems intentionally introduce a controlled amount of ISI to push the signal spectrum toward the minimum bandwidth of the Nyquist rate. However, this requires more complexity in the receiver to extract the original data.
The task of decoding the received signal in a partial response signalling scheme is to remove the existing ISI from the original information. Symbol-by-symbol techniques such as the DFE and the precoder/slicer are straightforward. However, they ignore the information that the ISI carries about adjacent symbols. As such, their SNR performance falls about 3dB below that of optimum maximum-likelihood sequence detectors.

The Viterbi algorithm is a practical and efficient tool for realizing optimum MLSD decoders. Their optimal performance stems from the fact that their detection criterion is based on the sequence of information rather than on individual samples, and hence they make the most of the information embedded in the ISI.

Analog Viterbi detectors consume less power than their digital counterparts, as they eliminate the power-hungry A/D from the front-end of the receiver. Storage channels and short distance optical wireless systems are appropriate applications for analog Viterbi detectors, as they present fairly simple channels that can be modeled by PRS coding polynomials and their required equalizers can be implemented in analog.

An optical wireless channel is more suitable for short distance data communications, where multi-user LANs and computer-based systems can benefit from it without the problems of wiring and cabling. LEDs and photodiodes with a large field of view are the bandwidth bottlenecks in optical wireless links, as they impose significant delay in data exchange. Common-gate input transistors can be employed in transimpedance amplifiers to isolate the photodiode capacitance from the rest of the circuit and hence improve the bandwidth.

The 4-PAM modulation scheme can provide a practical means of establishing high-speed data communications, as it requires less channel bandwidth. Moreover, partial response signalling, with its lower bandwidth requirement, makes equalization more viable and efficient.
Finally, in the introduced optical channel it was shown that the Viterbi detector performs superior to the other detection techniques such as peak detection and DFE by at least 2dB. #### 2.5 References - [1]J. Taylor, DVD Demystified, McGraw Hill, 1998. - [2] R. K. Jurgen, Digital Consumer Electronics Handbook, McGraw Hill, 1996. - [3]J. G. Proakis and M. Salehi, "Communication Systems Engineering," *Prentice Hall*, 1994. - [4] A. Lender, "The duobinary technique for high-speed data transmission," *IEEE Trans. Commun. Electron.*, Vol. 82, May 1963, pp. 214-218. - [5]E. R. Kretzmer, "Generalization of a Technique for Binary Data Communication," *IEEE Trans. on commun. Technol.*, Vol. 14, No. 1, Feb. 1966. - [6]J. M. B. Correia and A. V. T. Cartaxo, "Duobinary coding for 20 Gbit/s intensity modulated direct detection optical fiber transmission," *SPIE Proc. All-Optical Commun. Systems*, Vol. 3230, Nov. 1997, pp. 92-103. - [7]C. H. Lee and Y. S. Cho, "A PRML Detector for a DVDR Systems," *IEEE Trans. Consumer Electronics*, Vol. 45, No. 2, May 1999, pp. 278-285. - [8]S. H. Choi, J. J. Kong, B. G. Chung, Y. H. Kim, "Viterbi Detector Architecture for High-Speed optical Storage," *IEEE TENCON Proc. Speech and Image Technologies for Computing and Telecommunications*, Vol. 1, 1997, pp. 89-92. - [9]P. Kabal and S. Pasupathy, "Partial-Response Signaling," *IEEE Trans. Commun.*, Vol. 23, No. 9, Sep. 1975, pp. 921-934. - [10]H. Kobayashi, D. T. Tang, "Application of partial-response channel coding to magnetic recording systems," *IBM Journal of Research and Development*, July 1970, pp. 368-375. - [11]J. G. Proakis, Digital Communications, McGraw Hill, 1995. - [12]Y. Tamba, K. Yamakido, "A CMOS 6b 500MSample/s ADC for a Hard Disk Drive Read Channel," *ISSCC Dig. Tech. Papers*, Feb. 1999, pp. 324-325. - [13]I. Mehr, D. Dalton, "A 500-MSample/s, 6-Bit Nyquist-Rate ADC for Disk-Drive Read-Channel Applications," *IEEE J. Solid-State Circuits*, Vol. 34, No. 7, July 1999, pp. 912-920. 
- [14] S. Sridharan and L. R. Carley, "A 110 MHz 350mW 0.6µ CMOS 16-State Generalized-Target Viterbi detector for Disk Drive Read Channels," *IEEE J. Solid-State Circuits*, Vol. 35, No. 3, March 2000.
- [15] L. R. Carley and S. Sridharan, "A pipelined 16-State Generalized Viterbi Detector," *IEEE Trans. Magnetics*, Vol. 34, No. 1, Jan. 1998, pp. 181-186.
- [16] T. W. Matthews, R. R. Spencer, "An Integrated Analog CMOS Viterbi Detector for Digital Magnetic Recording," *IEEE J. Solid-State Circuits*, Vol. 28, No. 12, Dec. 1993, pp. 1294-1302.
- [17] M. H. Shakiba, D. A. Johns and K. W. Martin, "An Integrated 200-MHz 3.3-V BICMOS Class-IV Partial-Response Analog Viterbi Decoder," *IEEE J. Solid-State Circuits*, Vol. 33, No. 1, Jan. 1998, pp. 61-75.
- [18] A. Demosthenous and J. Taylor, "Low-Power CMOS and BICMOS Circuits for Analog Convolutional Decoders," *IEEE Trans. Circuits and Systems-II*, Vol. 46, No. 8, Aug. 1999, pp. 1077-1080.
- [19] R. G. Yamasaki, T. Pan, M. Palmer, D. Browning, "A 72Mb/s PRML Disk-Drive Channel Chip with an analog Sampled-Data Signal Processor," *ISSCC Dig. Tech. Papers*, Feb. 1994, pp. 278-279.
- [20] IEC825:1993, "Safety of Laser Products. Equipment classification, requirements and users guide."
- [21] David J. Heatley and Ian Neild, "Optical Wireless the promise and the reality," *IEE Colloquium on Optical Wireless Communications*, pp. 1/1-1/6, 1999.
- [22] J.R. Barry, "Wireless Infrared Communications," Kluwer Academic Publishers, Boston, 1994.
- [23] B. Razavi, "Design of High-Speed Circuits for Optical Communication Systems," *IEEE Proc. CICC*, pp. 315-322, May 2001.
- [24] S.M. Park and C. Toumazou, "Gigahertz Low Noise CMOS Transimpedance Amplifier," *IEEE Proc. ISCAS*, pp. 209-212, June 1997.
- [25] S.M. Park and C. Toumazou, "Low Noise Current-Mode CMOS Transimpedance Amplifier for Giga-Bit Optical Communication," *IEEE Proc. ISCAS*, Vol. 1, pp. 293-296, June 1998.
- [26] T. Ruotsalainen, P. Palojarvi, and J.
Kostamovaara, "A Current-Mode Gain-Control Scheme with Constant Bandwidth Delay for a Transimpedance Preamplifier," *IEEE J. Solid-State Circuits*, Vol. 34, No. 2, pp. 253-258, February 1999.
- [27] M. B. Ritter, F. Gfeller, W. Hirt, D. Rogers, S. Gowda, "Circuit and System Challenges in IR Wireless Communication," *Proc. ISSCC*, pp. 398-399, February 1996.
- [28] K. Phang and D.A. Johns, "A CMOS Optical Preamplifier for Wireless Infrared Communications," *IEEE Trans. Circuits and Systems-II: Analog and Digital Signal Processing*, Vol. 46, No. 7, pp. 852-859, July 1999.
- [29] R. Coppoolse, J. Verbeke, P. Lambrecht, J. Codenie, and J. Vandewege, "Comparison of a Bipolar and a CMOS Front End in Broadband Optical Transimpedance Amplifiers," *Proc. 38th Midwest Symp. on Cir. and Sys.*, pp. 1026-1029, Aug. 1996.
- [30] www.ti.com

#### **CHAPTER**

## 3

# Fully Differential DC-Coupled Transimpedance Amplifier

High-speed free-space optical receivers require a transimpedance amplifier (TIA) at their front-end to accommodate photodiodes with a wide field of view and hence a large depletion capacitance (1-10pF). Despite this significant capacitive node at the input, the desired TIAs should still provide wide bandwidth, high gain, and good sensitivity. These receivers find application in laptop computers, personal digital assistants (PDAs), digital cameras and many other devices equipped with a short-distance infrared communication port.

CMOS is the preferred technology for its low cost and ease of integration. However, building single-chip optical receivers in CMOS technology is challenging because low supply voltages, small transistor transconductances, and large substrate noise make it difficult to achieve high bandwidth and good sensitivity.

In wireless optical communications, TIAs also encounter ambient light perturbation. Intense background light can reduce output swing or even saturate the front-end preamplifiers.
Different architectures have been proposed to improve the sensitivity at the input of transimpedance amplifiers. The designs reported in [1], [3] and [7] use a fully differential structure to reject common-mode substrate and power supply noise, but continue to sense the photocurrent in a single-ended fashion. To maintain symmetry at the input of these designs, an additional capacitor or current source is attached to the other input terminal. Since common-mode rejection relies principally on the symmetry of the signal path, such designs are prone to degraded performance as a result of poor matching and of variations in the photodiode characteristics with temperature and process. In comparison, the ac-coupled photodiode current sensing configuration in [4] and [8] suffers from the difficulty of realizing on-chip capacitors and from photodiode bias voltage fluctuations due to ambient light variation.

Bandwidth improvement has been achieved by exploiting common-gate transistors at the input of transimpedance amplifiers [1], [5] and [6]. Common-gate transistors with low input impedances isolate the photodiode capacitance from the rest of the circuit, so the dominant pole of the circuit is no longer a function of the input capacitance.

This chapter presents a new transimpedance amplifier configuration capable of differential photodiode current sensing without any need for ac coupling capacitors, by using common-gate transistors at the input. This structure has several advantages. It provides regulated photodiode biasing under ambient light intensity variations. It also boosts the signal power at the output by 6dB while increasing the noise power by 3dB, resulting in an overall sensitivity improvement of 3dB. As well, it provides higher bandwidth by removing the dominant pole from the input terminal. It also demonstrates better robustness against common-mode and substrate noise due to the balanced configuration.
Finally, this design includes a dc photocurrent rejection circuit at the front of the TIA to remove any low-frequency signal caused by background light.

#### 3.1 Circuit Implementation

Fig. 3.1 shows the simplified schematic of the proposed preamplifier structure.

Fig. 3.1: Basic structure of the proposed transimpedance amplifier

This circuit is composed of three sections: the photodiode bias input stage, the differential amplifier, A, and the dc photocurrent rejection feedback loop. Transistors $M_{b3}$ and $M_{b6}$ regulate the reverse bias voltage across the photodiode. A regulated cascode circuit (RGC) is employed to reduce the input impedance seen at the sources of $M_{b3-6}$. These transistors provide a common-gate input buffer stage to the transimpedance amplifier. Transistor $M_{b4}$ is added to improve the symmetry of the signal path.

Although the proposed receiver structure does not impose any constraint on the design of the amplifier, we present here a differential amplifier that uses local shunt feedback in the second stage. The amplifier A is similar to the two-stage design presented in [3], but uses an n-channel input differential pair to achieve higher bandwidth and reduced thermal noise.

The dc rejection feedback loop consists of an error amplifier and a differential pair made up of transistors $M_{b1}$ and $M_{b2}$. The error amplifier functions as an integrator and determines the average difference in the differential outputs. Current sources $I_1$ and $I_2$ bias the cascode transistors $M_{b3-6}$. In the absence of any dc photocurrent, all currents sourced by $I_1$ and $I_2$ will be drawn away from the amplifier by the differential pair $M_{b1}$ and $M_{b2}$. The presence of a dc photocurrent, $I_o$, results in a negative differential offset voltage at the output of the amplifier. This offset causes the error amplifier to change the differential voltage applied to the differential pair.
In steady state, the bias current increases by $I_o$ for $M_{b1}$ and decreases by $I_o$ for $M_{b2}$, as illustrated in Fig. 3.1. In essence, the dc photocurrent has been steered into the differential pair, away from the amplifier. In contrast, for the actual signal current $i_s$, the differential pair appears as a high impedance path. As a result, $i_s$ passes through to the amplifier where it is sensed differentially, resulting in a transimpedance gain of $2R_{f2}$.

#### 3.1.1 The Photodiode Biasing Circuit

The photodiode biasing circuit, including the common-gate and RGC transistors, is shown in Fig. 3.2. The reverse bias voltage on the photodiode is determined by the voltage difference between the sources of $M_{b3}$ and $M_{b6}$. For a 3V supply, the applied bias voltage ranges from a minimum of the threshold voltage of an NMOS device to a maximum of about 1V, which is required to keep all MOSFETs in the active region.

Fig. 3.2: Photodiode bias input stage

To explain the operation of the RGC blocks, we refer to the equivalent half-circuit schematic shown in Fig. 3.3.

Fig. 3.3: Equivalent half-circuit of RGC

The input impedance of this circuit is

$$R_{in} = \frac{1}{g_{mb6}(1 + g_{mc2}R_d)} \tag{3.1}$$

Basically, it is the input impedance of a simple common-gate stage reduced by the factor $1+g_{mc2}R_d$, which is one plus the loop gain of the feedback circuit made up of $M_{c2}$ and $R_d$. Design considerations that limit the minimum achievable input impedance include the allowable voltage drop across $R_d$, power consumption, and the frequency response of the feedback circuit. For the differential configuration, the impedance looking into the bias circuit will be twice that given in (3.1). The pole resulting from the depletion capacitance of the photodiode, $C_{pd}$, and the differential input impedance $2R_{in}$ is

$$P_1 = \frac{1}{2R_{in}C_{pd}} \tag{3.2}$$

#### 3.1.2 The Differential Amplifier

Fig.
3.4 shows the schematic diagram of the differential amplifier, A.

Fig. 3.4: Differential amplifier circuit

Diode-connected transistor $M_{13}$ is used to level-shift the output common-mode voltage to about 2.2V. The gain of the circuit is given by

$$\frac{2A_{vd}}{1+A_{vd}}R_{f2} \approx 2R_{f2} \qquad \text{for } A_{vd} \gg 1 \tag{3.3}$$

where $A_{vd}=g_{m1}R_{f1}$ is the open-loop voltage gain of the differential amplifier and $g_{m1}$ is the transconductance of the input differential pair [3].

The bandwidth of A defines the frequency performance of the whole circuit. Using the half-circuit model of this amplifier shown in Fig. 3.5, with $C_{f2}$ as the compensating capacitor and $C_{in}$ as the equivalent input capacitance, the transimpedance transfer function is readily derived as (3.4), where $A_v$ denotes single-ended transimpedance gain.

Fig. 3.5: Single-ended circuit for frequency response calculation

$$\frac{V_o(s)}{i_s(s)} = -\frac{R_{f2}}{R_{f2} \left(\frac{C_{in}}{A_v} + C_{f2}\right) s + 1} \tag{3.4}$$

Since the photodiode depletion capacitance has been isolated from the input, the $C_{in}/A_v$ term in (3.4) is not significant at mid-frequencies, and the dominant pole of the transimpedance is defined by the feedback time constant, $R_{f2}C_{f2}$. At higher frequencies, toward the unity-gain bandwidth of this amplifier, the frequency dependence of $A_v$ becomes more noticeable and the non-dominant poles appear in the transimpedance transfer function $V_o(s)/i_s(s)$.

A small-signal model for the second stage of the amplifier is shown in Fig. 3.6, where $C_{f1}$ is the total shunt feedback capacitance bridging the drain and gate of $M_{11}$ or $M_{12}$, while $C_i$ and $C_o$ represent the equivalent input and output capacitances, respectively. Because of the presence of two capacitors, $C_i$ and $C_o$, the second stage exhibits a second-order response.
Shunt feedback capacitor $C_{f1}$ introduces a zero at high frequencies but keeps the number of poles the same, and so provides a degree of freedom to tune the pole locations for frequency and transient response optimization.

Fig. 3.6: Simplified model for the second stage

#### 3.1.3 The DC Photocurrent Rejection Circuit

The error amplifier in Fig. 3.1 is in fact a fully differential amplifier. It controls the individual currents of $M_{b1}$ and $M_{b2}$ based on the average of the difference at the outputs of the internal transimpedance amplifier, so as to equalize these outputs. A common-mode feedback circuit is deployed to control the common-mode level of the error amplifier outputs. As shown in Fig. 3.7, transistors $M_{(11-14)}$ as well as transistors $M_{(21-24)}$ form the fully differential error amplifier. Common-mode feedback transistors $M_{(15-16)}$ and $M_{(25-26)}$ adjust the current flow in $M_{17}$ and $M_{27}$ so as to keep the common-mode level of the differential outputs $OUT_p$ and $OUT_n$ equal to $V_{ref}$. Off-chip capacitors $C_1$ and $C_2$ average the signal levels at the outputs of the error amplifier.

The dc photocurrent rejection circuit, including the error amplifier and the transistors $M_{b1}$ and $M_{b2}$, sets a lower bound on the circuit frequency response. Assuming a single-pole transfer function of

$$A_{err}(s) = \frac{A_0}{1 + s/\omega_{err}} \tag{3.5}$$

for the error amplifier with output impedance $R_o$, in which $\omega_{err}$ is the dominant-pole frequency, $1/(R_oC_1)$, and $A_0$ is its low-frequency gain, $g_{m13}R_o$, the lower bound will be located at

$$\omega_{low} = \omega_{err}\, g_{mb2}\, R_{f2}\, A_0 \tag{3.6}$$

This low-frequency pole (unlike the design in [3]) is constant and does not vary with the photodiode dc current.

The photodiode dc current rejection procedure is illustrated in Fig. 3.8. In Fig. 3.8.a, currents in $M_{b1}$ and $M_{b2}$ are about to be adjusted and in Fig.
3.8.b, this procedure is completed and the dc current is fully rejected by creating an offset in the bias currents of the aforementioned transistors.

Fig. 3.8: DC photocurrent rejection, a: dc rejection in progress, b: dc rejection completed

#### 3.2 Output buffer

A fully differential output buffer capable of driving $50\Omega$ loads has been provided in this design. As shown in Fig. 3.9, the bias current in this buffer is controlled by an off-chip resistor, $R_{bias}$. Two common-source transistors, $M_3$ and $M_4$, isolate the input signal from the large transistors $M_1$ and $M_2$, which carry a bias current of 12mA each and hence present relatively large capacitances at their gates.

Fig. 3.9: Output buffer

The simulated frequency response of the above output buffer is presented in Fig. 3.10. The voltage gain is 4.4dB (1.66) and the -3dB frequency is located at about 950 MHz. For precise extraction of the transimpedance amplifier performance, an extra isolated output buffer was also included in the chip for measurement purposes.

Fig. 3.10: Frequency response of the output buffer

#### 3.3 Noise Performance

For noise calculation, although shot noise can dominate under intense background light, we need to analyze the circuit's inherent thermal noise for a generalized application environment. To do so, the approach in [6] can be followed by considering the input half circuit of Fig. 3.11, where a single transistor, $M_{cs1}$, represents the current source $I_2$ (although $I_2$ was in fact realized as a cascode current source).

Fig.
3.11: Input half-circuit used for noise calculation

The input equivalent noise current can be approximated by

$$\overline{I_{n_{in}}^{2}} = \frac{4kT}{R_{f2}} + 4kT\,\frac{2}{3}\,g_{mcs1} + 4kT\,\frac{2}{3}\,g_{mb2} + \frac{8kT}{3g_{mc2}}\,\omega^{2}(2C_{pd})^{2} \tag{3.7}$$

In this equation, the noise for $M_{b4}$ is cancelled because it is placed in series with $M_{b2}$; also, junction capacitors at the input have been ignored in comparison with $C_{pd}$. By choosing an optimum size and bias current for the transistors involved in the above equation, it is shown in [6] that the input noise current spectral density in the RGC input stage is less than that of a simple common-gate input stage.

It is worth noting that in the single-ended circuit of Fig. 3.11, the voltage drop across $R_d$ is controlled by the sum of the gate-source voltages of $M_{c2}$ and $M_{b6}$. This poses another limitation on the $R_d$ resistance as well as on the bias currents and sizes of $M_{c2}$ and $M_{b6}$. In the differential RGC input, the source of $M_{c2}$ is virtually grounded and there is more flexibility for the circuit designer to improve noise and bandwidth performance.

#### 3.4 Experimental Results

Testing and characterization of the circuit was performed using an on-chip differential output buffer for driving $50\Omega$ loads, as well as $1k\Omega$ off-chip resistors in series with the source at the input to approximate a photodiode current source. The test set-up is shown in Fig. 3.12.

Fig. 3.12: Circuit test set-up

Fig. 3.13 depicts the TIA frequency response with $C_{pd}$ ranging from 0.5pF to 10pF. It can be seen that despite a 20-fold increase in $C_{pd}$, the bandwidth has only decreased by a factor of two. In Fig. 3.14, the output noise spectral density demonstrates a slight positive slope within the -3dB bandwidth, as predicted by (3.7).

Fig. 3.13: Measured preamplifier frequency response for different photodiode capacitances

Fig.
3.14: Measured output noise

Fig. 3.15 shows output eye diagrams for 200Mbps and 400Mbps PN-sequence inputs. Fig. 3.16 illustrates the chip layout. In addition to the preamplifier itself, an output buffer and also a test buffer are included on this chip. A summary of the measured performance is given in Table 3.1.

Fig. 3.15: Eye diagrams with $C_{pd}$=5pF: a) 200 Mbps, b) 400 Mbps

Fig. 3.16: Chip Micrograph

Table 3.1: Performance Summary

| Parameter | Value |
|---|---|
| Supply Voltage (Vdd) | 3.0V |
| Power Dissipation (excluding output buffer) | 30mW |
| -3dB Bandwidth, $C_{pd}$=0.5pF | 290MHz |
| -3dB Bandwidth, $C_{pd}$=2pF | 255MHz |
| -3dB Bandwidth, $C_{pd}$=5pF | 210MHz |
| -3dB Bandwidth, $C_{pd}$=10pF | 150MHz |
| Average Input Noise, $C_{pd}$=0.5pF | 6.0pA/$\sqrt{Hz}$ |
| Average Input Noise, $C_{pd}$=2pF | 6.8pA/$\sqrt{Hz}$ |
| Average Input Noise, $C_{pd}$=5pF | 8.7pA/$\sqrt{Hz}$ |
| Differential Transimpedance Gain | 90.4 dBΩ |
| Mid-band differential input impedance | 74Ω |
| Low-band frequency | 180KHz |
| Power Supply Rejection | 40dB @ 20MHz |
| Max. Differential Output Swing with 50Ω Load | 1V<sub>pp</sub> |
| Technology | 0.35μm CMOS |
| Active Area | 300μm x 155μm |

#### 3.5 Summary

The proposed design is the first dc-coupled fully differential photodiode current sensing preamplifier without any need for ac coupling capacitors. This structure, compared to a similar design in single-ended mode, improves SNR by 3dB and displays significant improvement in substrate noise and power supply rejection. In addition, by using common-gate transistors and an RGC block at the input, the bandwidth has been improved by isolating the photodiode capacitance. For better sensitivity, a fully differential dc current rejection circuit has also been provided to bypass photodiode dc current due to ambient light.
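The figures in Table 3.1 can be cross-checked with a few lines of arithmetic. The sketch below converts the measured differential gain from dBΩ to ohms and forms the gain-bandwidth product for each measured bandwidth. It assumes the usual 20·log10 convention for dBΩ; the function names are illustrative only and do not come from the thesis.

```python
# Measured values copied from Table 3.1.
GAIN_DBOHM = 90.4
BANDWIDTH_MHZ = {0.5: 290, 2: 255, 5: 210, 10: 150}  # C_pd in pF -> -3dB bandwidth


def dbohm_to_ohm(gain_dbohm):
    """Convert a transimpedance gain in dB-ohm to ohms (20*log10 convention assumed)."""
    return 10 ** (gain_dbohm / 20)


def gain_bandwidth_thz_ohm(gain_ohm, bandwidth_hz):
    """Gain-bandwidth product expressed in THz*ohm."""
    return gain_ohm * bandwidth_hz / 1e12


gain_ohm = dbohm_to_ohm(GAIN_DBOHM)
print(f"differential gain is about {gain_ohm / 1e3:.1f} kOhm")
for c_pd, bw_mhz in BANDWIDTH_MHZ.items():
    gbw = gain_bandwidth_thz_ohm(gain_ohm, bw_mhz * 1e6)
    print(f"C_pd = {c_pd} pF: gain-bandwidth is about {gbw:.2f} THz*Ohm")
```

The converted gain comes out close to the 33kΩ quoted for this work, so the products should agree with the gain-bandwidth column of Table 3.2 to within rounding.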
In Table 3.2, the performance of the proposed transimpedance amplifier is compared with that of some recent designs. As shown, even with $C_{pd}$=5pF, this design demonstrates the highest gain-bandwidth product. The input-referred current noise, as implied by (3.7), can be improved by decreasing $g_{mcs1}$ and $g_{mb2}$ and increasing $g_{mc2}$ (Fig. 3.11). This can be done without any significant bandwidth reduction, because the dominant pole is not a function of the input circuit. Simulation results confirm a 35% improvement in noise performance at the expense of a 10% bandwidth reduction when halving both the currents and the sizes of $M_{b2}$ and $M_{cs1}$.

Table 3.2: Comparison with previous work

| reference | input | output | $C_{pd}$ (pF) | Gain (KΩ) | Bandwidth (MHz) | Gain Bandwidth (THz-Ω) | noise (pA/$\sqrt{Hz}$) | power (mW) | supply (V) | process |
|---|---|---|---|---|---|---|---|---|---|---|
| [6]\* | single | single | <0.5 | 1.13 | 3500 | 3.96 | 4.2 | 135\*\* | NA | CMOS 0.6µm |
| [1] | single | differential | 0.6 | 1.6 | 1200 | 3.4 | >15 | 115 | NA | CMOS 0.5μm |
| [2] | single | single | NA | 8.7 | 550 | 4.8 | 4.5 | 30 | 3.0 | CMOS 0.6µm |
| This Work | differential | differential | 0.5 | 33 | 290 | 9.6 | 6 | 30 | 3.0 | CMOS 0.35μm |
| | | | 2 | | 255 | 8.4 | 6.8 | | | |
| | | | 5 | | 210 | 6.9 | 8.7 | | | |

\* simulation results

\*\* output buffer power dissipation is included

#### 3.6 References

- [1] S. S. Mohan, M. Mar Hershenson, S. P. Boyd, and T. H. Lee, "Bandwidth Extension in CMOS with Optimized On-Chip Inductors," *IEEE J. Solid-State Circuits*, Vol. 35, No. 3, pp. 346-355, March 2000.
- [2] B. Razavi, "A 622Mb/s 4.5pA/$\sqrt{Hz}$ CMOS Transimpedance Amplifier," *Proc. ISSCC*, pp. 162-163, February 2000.
- [3] K. Phang and D.A. Johns, "A CMOS Optical Preamplifier for Wireless Infrared Communications," *IEEE Trans. Circuits and Systems-II: Analog and Digital Signal Processing*, Vol. 46, No. 7, pp. 852-859, July 1999.
- [4] T. Ruotsalainen, P. Palojarvi, and J. Kostamovaara, "A Current-Mode Gain-Control Scheme with Constant Bandwidth Delay for a Transimpedance Preamplifier," *IEEE J. Solid-State Circuits*, Vol. 34, No. 2, pp. 253-258, February 1999.
- [5] S.M. Park and C. Toumazou, "Low Noise Current-Mode CMOS Transimpedance Amplifier for Giga-Bit Optical Communication," *IEEE Proc. ISCAS*, TAA8-4, June 1998.
- [6] S. M. Park and C. Toumazou, "Gigahertz Low Noise CMOS Transimpedance Amplifier," *IEEE Proc. ISCAS*, pp. 209-212, June 1997.
- [7] R. Coppoolse, J. Verbeke, P. Lambrecht, J. Codenie, and J. Vandewege, "Comparison of a Bipolar and a CMOS Front End in Broadband Optical Transimpedance Amplifiers," *Proc. 38th Midwest Symp. on Cir. and Sys.*, Brazil, pp. 1026-1029, Aug. 1996.
- [8] M. B. Ritter, F. Gfeller, W. Hirt, D. Rogers, S. Gowda, "Circuit and System Challenges in IR Wireless Communication," *Proc. ISSCC*, pp. 398-399, February 1996.

#### **CHAPTER**

## 4

# Analog Reduced-State Sequence Detection System and Circuit Design

The exponential growth of high-speed data communication transceivers is often hindered by the shortcomings of interface components and transmission links. For example, wired links have a limited bandwidth that depends on distance and cabling, while in line-of-sight free-space optical links a bandwidth limitation arises from LEDs and from photodiodes with large depletion capacitance. Two common techniques to combat bandwidth limitations are multi-level modulation and partial response signalling (PRS) [1]. Multi-level modulation schemes reduce the required channel bandwidth for a given bit rate and hence increase channel efficiency.
A simple multi-level transmission scheme is M-level pulse amplitude modulation (M-PAM), where each pulse conveys $\log_2(M)$ bits of information by mapping each combination of $\log_2(M)$ bits to one of the M specified levels. Partial response signalling also improves channel efficiency but in this case, the improvement occurs by allowing a controlled amount of intersymbol interference (ISI). Intersymbol interference is the major impediment in band-limited channels. This phenomenon is the interaction of adjacent symbols over the current passing symbol and will increase when pushing the channel toward higher throughputs. While ISI is nominally implied as an interfering signal, it is an informative signal in a sense that it carries part of the knowledge of other symbols. Partial response signalling uses the information embedded in the ISI part to reduce the required bandwidth to the Nyquist limit and thereby reducing noise enhancement as less equalization is required. Two major detection techniques can be used to decode a signal with ISI. Symbol-by-symbol detection technique throws away the information hidden in ISI and identifies the received data only based on the instantaneous quantization of the signal amplitude. On the other hand, maximum likelihood sequence detection (MLSD) is theoretically the optimum detection technique [2] which takes the full advantage of ISI information and extracts data after completing the detection process over the whole sequence of the received signal. Despite being an optimal detection technique, a brute-force approach to implement MLSD is rather impractical due to its complexity. The Viterbi algorithm is a computationally efficient means of establishing a MLSD detection [3] but still its complexity in most practical channels and modulation schemes is significant [8]. 
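To make the relation between brute-force MLSD and the Viterbi algorithm concrete, the following sketch compares the two on a toy 2-PAM duobinary channel, y(k) = x(k) + x(k-1). It is only an illustration under stated assumptions: the two-level alphabet, the known starting symbol x(-1) = -1, the squared-error metric, and all function names are chosen here for the example and are not taken from the detector described in this chapter.

```python
import itertools

LEVELS = (-1.0, 1.0)   # 2-PAM symbol alphabet (assumed for this toy example)
X_INIT = -1.0          # assumed known symbol preceding the first transmitted one


def duobinary(symbols, prev=X_INIT):
    """Noise-free duobinary channel output y(k) = x(k) + x(k-1)."""
    out = []
    for x in symbols:
        out.append(x + prev)
        prev = x
    return out


def sequence_cost(symbols, received):
    """Squared Euclidean distance between the received samples and a candidate."""
    return sum((r - y) ** 2 for r, y in zip(received, duobinary(symbols)))


def mlsd_brute_force(received):
    """Exhaustive MLSD: evaluate every one of the 2**n candidate sequences."""
    best = min(itertools.product(LEVELS, repeat=len(received)),
               key=lambda cand: sequence_cost(cand, received))
    return list(best), sequence_cost(best, received)


def mlsd_viterbi(received):
    """Viterbi search: the state is the previous symbol, one survivor per state."""
    survivors = {X_INIT: (0.0, [])}          # state -> (metric, path)
    for r in received:
        updated = {}
        for state, (metric, path) in survivors.items():
            for x in LEVELS:
                candidate = metric + (r - (x + state)) ** 2
                if x not in updated or candidate < updated[x][0]:
                    updated[x] = (candidate, path + [x])
        survivors = updated
    metric, path = min(survivors.values(), key=lambda item: item[0])
    return path, metric


if __name__ == "__main__":
    rx = [0.3, -0.2, -1.6, 0.4, 1.5]   # hypothetical noisy samples
    print("brute force:", mlsd_brute_force(rx))
    print("viterbi:    ", mlsd_viterbi(rx))
```

Both searches minimize the same accumulated squared error, so they reach the same minimum metric. The exhaustive search visits a number of candidates that doubles with every added symbol, whereas the trellis search keeps one survivor per state and its work grows linearly with the sequence length.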
The trade-off between design simplicity and performance has always posed a dilemma: adopt a symbol-by-symbol detection technique such as DFE [4] [5] for simplicity, or the Viterbi algorithm for better performance [3]. The implementation complexity of the Viterbi decoder for an N'th-order PRS scheme with an M-ary input signal is roughly $M^{N+1}$ times that of a DFE [6]. For multi-level signalling the Viterbi algorithm becomes more complex and its performance can be degraded by circuit nonidealities.

Reduced-state sequence detection (RSSD) is a solution for maintaining almost the same performance as full-state Viterbi detection but with less realization complexity [7][9-12]. RSSD can be viewed as an intermediate detection technique between the two extremes of full-state sequence detection and DFE. In general, RSSD reduces the number of states by grouping them into a smaller number of hyper-states. If these groupings are chosen so that the minimum distance of the error events is maximized, it is shown in [23] that the performance degradation is negligible. DFE can be shown to be a special case of RSSD in which all of the states are combined into one single hyper-state [14].

A digital realization of a reduced-state Viterbi detector for 125Mb/s transmission over unshielded twisted-pair (UTP) cables is reported in [13]. However, a digital implementation of RSSD requires a front-end A/D converter, which is power hungry at high speeds. A 100Mb/s analog design of a Viterbi detector for 2-PAM dicode partial response signalling was realized in BICMOS [14].

The purpose of this chapter is to present an architecture that extends the approach in [14] so that an RSSD Viterbi detector can be realized for 4-PAM partial response signalling. To demonstrate the approach, a chip was designed and tested using a 0.25µm CMOS process.
While post-layout simulations confirm its operation up to 1Gb/s, due to test limitations this chip was tested only up to 200Mb/s. Power consumption is measured to be 55mW from a 2.5V supply at the operating speed of 200Mb/s. Although the duobinary scheme has been the focus of this work because of its application in optical links, this design can be easily modified or extended to other PRS schemes such as dicode and PR4.

#### 4.1 4-PAM Signalling

With the tremendous demand for higher speed in data communications and the lack of sufficient bandwidth in existing channels, multiple-level modulation schemes can offer higher bit rates at lower clock frequencies. However, with additional signal levels, reduced noise margins and circuit complexity can impact system performance. On the other hand, the noise margins for 2-PAM signals also worsen at higher clock frequencies, where two-level coding faces more attenuation than L-PAM signals at the same bit rate [15][16]. 4-PAM signalling is a viable modulation for high-speed data communications, as it requires half the clock rate needed for 2-PAM at equal bit rates. This four-level scheme can be incorporated in many applications such as point-to-point links [15], LANs, multi-drop buses [16] and wireless optical communications.

#### 4.2 Reduced-State Viterbi Detector

#### 4.2.1 System Approach

The Viterbi algorithm is a practical technique for realizing a maximum-likelihood sequence detector. By measuring the difference between the actual value of the received signal and its expected value, one can assign metrics for each branch and state. Final detection is based on detecting the sequence with the least accumulated branch metrics. These states and branches are stretched in time and are shown in trellis diagrams. For a two-state trellis diagram, we follow the results in [17] to calculate branch metrics, $b_{ji}(k)$, and state metrics, $m_i(k)$, as shown in Fig.
4.1 where $m_i(k)$ denotes the metric of state i at time step k and $b_{ji}(k)$ is the metric of the branch connecting state j at time step k-1 to state i at time step k.

Fig. 4.1: A two-state trellis diagram

State metrics $m_0(k)$ and $m_1(k)$ for time k can be evaluated based on the previous state metrics and the branch metrics as follows:

$$\begin{cases} m_0(k) = \min\{m_0(k-1) + b_{00}(k),\ m_1(k-1) + b_{10}(k)\} \\ m_1(k) = \min\{m_0(k-1) + b_{01}(k),\ m_1(k-1) + b_{11}(k)\} \end{cases} \tag{4.1}$$

To reduce realization complexity and avoid saturation, difference metrics can be defined as [21]

$$\Delta m(k) = m_0(k) - m_1(k) \tag{4.2}$$

which suggests storing only the difference in the state metrics rather than the absolute state metrics. One can determine this difference metric and the branch extensions from the following conditions on the branch metrics:

$$\Delta m(k) = b_{00}(k) - b_{01}(k) \qquad \text{if } \begin{cases} \Delta m(k-1) < b_{10}(k) - b_{00}(k) \\ \Delta m(k-1) < b_{11}(k) - b_{01}(k) \end{cases} \tag{4.3.1}$$

$$\Delta m(k) = b_{00}(k) - b_{11}(k) + \Delta m(k-1) \qquad \text{if } \begin{cases} \Delta m(k-1) < b_{10}(k) - b_{00}(k) \\ \Delta m(k-1) > b_{11}(k) - b_{01}(k) \end{cases} \tag{4.3.2}$$

$$\Delta m(k) = b_{10}(k) - b_{01}(k) - \Delta m(k-1) \qquad \text{if } \begin{cases} \Delta m(k-1) > b_{10}(k) - b_{00}(k) \\ \Delta m(k-1) < b_{11}(k) - b_{01}(k) \end{cases} \tag{4.3.3}$$

$$\Delta m(k) = b_{10}(k) - b_{11}(k) \qquad \text{if } \begin{cases} \Delta m(k-1) > b_{10}(k) - b_{00}(k) \\ \Delta m(k-1) > b_{11}(k) - b_{01}(k) \end{cases} \tag{4.3.4}$$

Extending this discussion to multilevel schemes, the full-state trellis diagram for a 4-PAM modulation with the levels of -1, -1/3, +1/3 and +1V, encoded with a duobinary scheme, is shown in Fig. 4.2, where each branch is labeled by a pair of *input data/duobinary coded data*.
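Two points of the discussion above can be checked numerically. The sketch below first compares the difference-metric update of (4.3.1)-(4.3.4) with the direct state-metric update of (4.1), and then enumerates the branch labels of a 4-PAM duobinary trellis. It assumes that state s corresponds to the most recent 4-PAM symbol with level (2s-3)/3 and that the coded value is the sum of the current and previous levels; these conventions and all function names are illustrative assumptions rather than a description of the implemented circuit.

```python
import itertools
from fractions import Fraction


def state_metric_update(m0, m1, b):
    """Direct update of (4.1); b[j][i] is the metric of the branch j -> i."""
    new_m0 = min(m0 + b[0][0], m1 + b[1][0])
    new_m1 = min(m0 + b[0][1], m1 + b[1][1])
    return new_m0, new_m1


def difference_metric_update(dm, b):
    """Update of dm = m0 - m1 following (4.3.1)-(4.3.4)."""
    from_0_into_0 = dm < b[1][0] - b[0][0]   # survivor into state 0 leaves state 0
    from_0_into_1 = dm < b[1][1] - b[0][1]   # survivor into state 1 leaves state 0
    if from_0_into_0 and from_0_into_1:
        return b[0][0] - b[0][1]              # (4.3.1)
    if from_0_into_0:
        return b[0][0] - b[1][1] + dm         # (4.3.2)
    if from_0_into_1:
        return b[1][0] - b[0][1] - dm         # (4.3.3)
    return b[1][0] - b[1][1]                  # (4.3.4)


def updates_agree(values=range(-2, 3)):
    """Exhaustively compare both updates on a small integer grid."""
    for m0, m1, b00, b01, b10, b11 in itertools.product(values, repeat=6):
        b = [[b00, b01], [b10, b11]]
        new_m0, new_m1 = state_metric_update(m0, m1, b)
        if difference_metric_update(m0 - m1, b) != new_m0 - new_m1:
            return False
    return True


# 4-PAM levels -1, -1/3, +1/3, +1 indexed by state 0..3 (kept as exact fractions).
LEVELS = [Fraction(2 * s - 3, 3) for s in range(4)]


def duobinary_label(prev_state, next_state):
    """Branch label (input data, duobinary coded data) under the stated assumptions."""
    data = LEVELS[next_state]
    return data, data + LEVELS[prev_state]


if __name__ == "__main__":
    print("difference-metric update matches (4.1):", updates_agree())
    for j, i in itertools.product(range(4), repeat=2):
        data, coded = duobinary_label(j, i)
        print(f"state {j} -> state {i}: {data} / {coded}")
```

Integer metrics are used in the comparison so that the equality test is exact; ties between competing paths do not affect the result because both candidates then carry the same metric.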
For example, the branch connecting state 0 to the same state in the next sampling time is labeled by -1,-2, where -1 is the input data, which is added to the previous data (also -1) to give -2 in duobinary coding.

For a 4-PAM modulation, although the full-state Viterbi algorithm works well, its circuit implementation is quite complex. To lower this complexity, reduced-state sequence detection is a solution for maintaining almost the same performance as full-state Viterbi detection but with less complexity. For a two-state RSSD, the idea is to retain the two most probable states at each time and ignore the other states. According to the adjacency relation [18]<sup>1</sup>, these two states will always be two neighboring states.

<sup>1.</sup> Note that a dicode sequence was examined in [18] rather than the duobinary one in our case. See the complete proof for the duobinary coding in the Appendix.

Fig. 4.2: Full state trellis diagram for a 4-PAM duobinary PRS scheme. Branch labels represent the pairs of *uncoded* and *encoded* signals

As a result, for the diagram in Fig. 4.2, the remaining states at each time can be (0,1) or (2,1) or (2,3). As an example, depending on the level of the received sample, possible branch extensions initiating from the states (0,1) are shown in Fig. 4.3.

Fig. 4.3: Typical possible survivors in duobinary 4-PAM RSSD starting from the states (0,1)

A few facts need to be clarified in Fig. 4.3. First, category pairs a-b and also c-d each have three branches in common and can only be distinguished by the fourth branch. To do so, a threshold value can be set by averaging the y(k) values of these non-common branches. Second, other possible categories will not occur in duobinary coding [18]. Third, the next states in the categories a and d will always be (0,1) and (2,3), respectively, while in the categories b and c, the next states will be (0,1) or (2,1) for b and (2,1) or (2,3) for c, depending on the Viterbi algorithm results.
Fourth, the highest and the lowest quantization thresholds for this example are 0 and -4/3V, respectively; these thresholds are 2/3 and -2/3V for the starting states (2,1) and 4/3 and 0V for the starting states (2,3).

The preceding discussion suggests that by grouping odd and even states into two hyper-states, we can represent any of the categories in Fig. 4.3 by a trellis diagram as shown in Fig. 4.1, with the difference that the branch metrics are a function of their originating states [18]. Following this idea, the full-state trellis diagram in Fig. 4.2 reduces to the two hyper-state diagram in Fig. 4.4.

Fig. 4.4: Two-state reduced trellis diagram

As an example, the categories in Fig. 4.3, which are subsets of the above diagram, are shown as two-state trellis diagrams in Fig. 4.5.

Fig. 4.5: Two-state representation of the categories in Fig. 4.3

With this state reduction in place, we can proceed to the next stage, which is basically the same as two-state Viterbi detection. Denoting any starting state by j and ending state by i, the branch metrics are

$$b_{ji}(k) = \left[y(k) - \frac{2}{3}(j+i-3)\right]^2, \qquad j = 0, 1, 2, 3, \quad i = 0, 1, 2, 3 \tag{4.4}$$

By removing the common term $y^2(k)$ and applying a factor of 1/4, the branch metrics reduce to

$$b_{ji}(k) = \frac{3 - (j+i)}{3} \left[ y(k) + \frac{3 - (j+i)}{3} \right], \qquad j = 0, 1, 2, 3, \quad i = 0, 1, 2, 3 \tag{4.5}$$

Using the above equation, the branch metrics for the example categories (c) and (d) are shown in Fig. 4.6.

Fig. 4.6: Typical branch metrics for the example categories (c) and (d)

The complete set of possible branch extensions and their metrics starting from the adjacent states I (0,1), III (2,1) and V (2,3) is presented in Table 4.1.

By applying Equations 4.3.1-4.3.4 to the branch metrics in Table 4.1, the number of extended branches in each category is reduced to two under the conditions specified in Tables 4.2-4.4. Note that the branch metrics a,b,c and also a!,b!,c!
can be either positive or negative depending on the value of y(k). For this reason, the extra thresholds 1, +1/3, -1/3 and -1 are introduced in the following tables to differentiate between the distinct signs of these metrics. Qu and Qd in Tables 4.2-4 are the outputs of the difference metric update comparators, which will be elaborated later in this chapter.

Table 4.1: Branch extension and their metrics (the branch-extension sketches of the original table are not reproduced; only the next states are listed)

| y(k) | Next State, Present State = I (0, 1) | Next State, Present State = III (2, 1) | Next State, Present State = V (2, 3) |
|---|---|---|---|
| 4/3 < y(k) | V | V | V |
| 2/3 < y(k) < 4/3 | V | V | III, V |
| 0 < y(k) < 2/3 | V | III | I, III |
| -2/3 < y(k) < 0 | III, V | I | I |
| -4/3 < y(k) < -2/3 | I, III | I | I |
| y(k) < -4/3 | I | I | I |

Branch metric labels: a = y(k)+1, b = 2/3(y(k)+2/3), c = 1/3(y(k)+1/3), a! = -y(k)+1, b! = 2/3(-y(k)+2/3), c! = 1/3(-y(k)+1/3).

Table 4.2: Branch extension and difference metric update of state I (branch-extension sketches not reproduced)

Present State = I (0, 1)

| q | y(k) | Δm(k) | Condition | (Qu, Qd) | Next State |
|---|---|---|---|---|---|
| 1 | 0 < y(k) | 1/3(y(k)+1/3) | $\Delta m(k-1) < -1/3(y(k)+1/3)$ | (0, 1) | V |
| | | -Δm(k-1) | $-1/3(y(k)+1/3) < \Delta m(k-1) < 1/3(-y(k)+1/3)$ | (0, 0) | V |
| | | -1/3(-y(k)+1/3) | $\Delta m(k-1) > 1/3(-y(k)+1/3)$ | (1, 0) | V |
| 2 | -1/3 < y(k) < 0 | 1/3(y(k)+1/3) | $\Delta m(k-1) < -1/3(y(k)+1/3)$ | (0, 1) | V |
| | | -Δm(k-1) | $-1/3(y(k)+1/3) < \Delta m(k-1) < 1/3(y(k)+1/3)$ | (0, 0) | V |
| | | -1/3(y(k)+1/3) | $\Delta m(k-1) > 1/3(y(k)+1/3)$ | (1, 0) | III |
| 3 | -2/3 < y(k) < -1/3 | 1/3(y(k)+1/3) | $\Delta m(k-1) < 1/3(y(k)+1/3)$ | (0, 1) | V |
| | | Δm(k-1) | $1/3(y(k)+1/3) < \Delta m(k-1) < -1/3(y(k)+1/3)$ | (0, 0) | III |
| | | -1/3(y(k)+1/3) | $\Delta m(k-1) > -1/3(y(k)+1/3)$ | (1, 0) | III |
| 4 | -1 < y(k) < -2/3 | -1/3(y(k)+1) | $\Delta m(k-1) < -1/3(y(k)+1)$ | (0, 1) | III |
| | | Δm(k-1) | $-1/3(y(k)+1) < \Delta m(k-1) < 1/3(y(k)+1)$ | (0, 0) | III |
| | | 1/3(y(k)+1) | $\Delta m(k-1) > 1/3(y(k)+1)$ | (1, 0) | I |
| 5 | -4/3 < y(k) < -1 | -1/3(y(k)+1) | $\Delta m(k-1) < 1/3(y(k)+1)$ | (0, 1) | III |
| | | -Δm(k-1) | $1/3(y(k)+1) < \Delta m(k-1) < -1/3(y(k)+1)$ | (0, 0) | I |
| | | 1/3(y(k)+1) | $\Delta m(k-1) > -1/3(y(k)+1)$ | (1, 0) | I |
| 6 | y(k) < -4/3 | 1/3(y(k)+5/3) | $\Delta m(k-1) < -1/3(y(k)+5/3)$ | (0, 1) | I |
| | | -Δm(k-1) | $-1/3(y(k)+5/3) < \Delta m(k-1) < -1/3(y(k)+1)$ | (0, 0) | I |
| | | 1/3(y(k)+1) | $\Delta m(k-1) > -1/3(y(k)+1)$ | (1, 0) | I |

Table 4.3: Branch extension and difference metric update of state III (branch-extension sketches not reproduced)

Present State = III (2, 1)

| q | y(k) | Δm(k) | Condition | (Qu, Qd) | Next State |
|---|---|---|---|---|---|
| 1 | 2/3 < y(k) | -1/3(-y(k)+1) | $\Delta m(k-1) < -1/3(-y(k)+1)$ | (0, 1) | V |
| | | Δm(k-1) | $-1/3(-y(k)+1) < \Delta m(k-1) < -1/3(-y(k)+1/3)$ | (0, 0) | V |
| | | -1/3(-y(k)+1/3) | $\Delta m(k-1) > -1/3(-y(k)+1/3)$ | (1, 0) | V |
| 2 | 1/3 < y(k) < 2/3 | 1/3(-y(k)+1/3) | $\Delta m(k-1) < 1/3(-y(k)+1/3)$ | (0, 1) | III |
| | | Δm(k-1) | $1/3(-y(k)+1/3) < \Delta m(k-1) < -1/3(-y(k)+1/3)$ | (0, 0) | V |
| | | -1/3(-y(k)+1/3) | $\Delta m(k-1) > -1/3(-y(k)+1/3)$ | (1, 0) | V |
| 3 | 0 < y(k) < 1/3 | 1/3(-y(k)+1/3) | $\Delta m(k-1) < -1/3(-y(k)+1/3)$ | (0, 1) | III |
| | | -Δm(k-1) | $-1/3(-y(k)+1/3) < \Delta m(k-1) < 1/3(-y(k)+1/3)$ | (0, 0) | |
| | | -1/3(-y(k)+1/3) | $\Delta m(k-1) > 1/3(-y(k)+1/3)$ | (1, 0) | V |
| 4 | -1/3 < y(k) < 0 | 1/3(y(k)+1/3) | $\Delta m(k-1) < -1/3(y(k)+1/3)$ | (0, 1) | I |
| | | -Δm(k-1) | $-1/3(y(k)+1/3) < \Delta m(k-1) < 1/3(y(k)+1/3)$ | (0, 0) | III |
| | | -1/3(y(k)+1/3) | $\Delta m(k-1) > 1/3(y(k)+1/3)$ | (1, 0) | III |
| 5 | -2/3 < y(k) < -1/3 | 1/3(y(k)+1/3) | $\Delta m(k-1) < 1/3(y(k)+1/3)$ | (0, 1) | I |
| | | Δm(k-1) | $1/3(y(k)+1/3) < \Delta m(k-1) < -1/3(y(k)+1/3)$ | (0, 0) | I |
| | | -1/3(y(k)+1/3) | $\Delta m(k-1) > -1/3(y(k)+1/3)$ | (1, 0) | III |
| 6 | y(k) < -2/3 | 1/3(y(k)+1/3) | $\Delta m(k-1) < 1/3(y(k)+1/3)$ | (0, 1) | I |
| | | Δm(k-1) | $1/3(y(k)+1/3) < \Delta m(k-1) < 1/3(y(k)+1)$ | (0, 0) | I |
| | | 1/3(y(k)+1) | $\Delta m(k-1) > 1/3(y(k)+1)$ | (1, 0) | |

Table 4.4: Branch extension and difference metric update of state V (branch-extension sketches not reproduced)

Present State = V (2, 3)

| q | y(k) | Δm(k) | Condition | (Qu, Qd) | Next State |
|---|---|---|---|---|---|
| 1 | 4/3 < y(k) | -1/3(-y(k)+1) | $\Delta m(k-1) < 1/3(-y(k)+1)$ | (0, 1) | V |
| | | -Δm(k-1) | $1/3(-y(k)+1) < \Delta m(k-1) < 1/3(-y(k)+5/3)$ | (0, 0) | V |
| | | -1/3(-y(k)+5/3) | $\Delta m(k-1) > 1/3(-y(k)+5/3)$ | (1, 0) | V |
| 2 | 1 < y(k) < 4/3 | -1/3(-y(k)+1) | $\Delta m(k-1) < 1/3(-y(k)+1)$ | (0, 1) | V |
| | | -Δm(k-1) | $1/3(-y(k)+1) < \Delta m(k-1) < -1/3(-y(k)+1)$ | (0, 0) | V |
| | | 1/3(-y(k)+1) | $\Delta m(k-1) > -1/3(-y(k)+1)$ | (1, 0) | III |
| 3 | 2/3 < y(k) < 1 | -1/3(-y(k)+1) | $\Delta m(k-1) < -1/3(-y(k)+1)$ | (0, 1) | V |
| | | Δm(k-1) | $-1/3(-y(k)+1) < \Delta m(k-1) < 1/3(-y(k)+1)$ | (0, 0) | III |
| | | 1/3(-y(k)+1) | $\Delta m(k-1) > 1/3(-y(k)+1)$ | (1, 0) | III |
| 4 | 1/3 < y(k) < 2/3 | 1/3(-y(k)+1/3) | $\Delta m(k-1) < 1/3(-y(k)+1/3)$ | (0, 1) | III |
| | | Δm(k-1) | $1/3(-y(k)+1/3) < \Delta m(k-1) < -1/3(-y(k)+1/3)$ | (0, 0) | III |
| | | -1/3(-y(k)+1/3) | $\Delta m(k-1) > -1/3(-y(k)+1/3)$ | (1, 0) | I |
| 5 | 0 < y(k) < 1/3 | 1/3(-y(k)+1/3) | $\Delta m(k-1) < -1/3(-y(k)+1/3)$ | (0, 1) | III |
| | | -Δm(k-1) | $-1/3(-y(k)+1/3) < \Delta m(k-1) < 1/3(-y(k)+1/3)$ | (0, 0) | I |
| | | -1/3(-y(k)+1/3) | $\Delta m(k-1) > 1/3(-y(k)+1/3)$ | (1, 0) | I |
| 6 | y(k) < 0 | 1/3(y(k)+1/3) | $\Delta m(k-1) < -1/3(y(k)+1/3)$ | (0, 1) | I |
| | | -Δm(k-1) | $-1/3(y(k)+1/3) < \Delta m(k-1) < 1/3(-y(k)+1/3)$ | (0, 0) | I |
| | | -1/3(-y(k)+1/3) | $\Delta m(k-1) > 1/3(-y(k)+1/3)$ | (1, 0) | I |

As seen in Tables 4.2-4, with knowledge of the previous state and the level of the present input signal, the threshold levels for the final two comparators can be set, and the difference metrics are updated as the result of this final comparison. Finally, the received data can be identified by keeping track of the surviving branch transitions in a path memory. In the next sections, the circuit implementation of this type of detection will be elaborated.

#### **4.2.2** Performance Evaluation

Erroneous detection of the transmitted symbols can happen when an error event occurs, that is, when the sequence detection algorithm deviates from the correct path. The probability of a particular error event $Pr_{ee}$ on a minimum-distance $(d_{min})$ path has the general form

$$Pr_{ee} = C \cdot Q\left(\frac{d_{min}}{2\sigma}\right) \tag{4.6}$$

in which $\sigma$ is the noise standard deviation, C is a constant, and Q(.) represents the complementary Gaussian distribution function [19]

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} \exp(-\tau^{2}/2)\, d\tau. \tag{4.7}$$

Fig. 4.7 illustrates minimum-distance error events for a full-state and a reduced-state detector in 4-PAM signalling. For simplicity, dicode PRS (see next section) is employed in this figure, but the results apply to the duobinary scheme as well.

Fig.
4.7: Minimum-distance error event paths for a full-state and reduced-state detector in 4-PAM signalling

Bold lines represent the correct sequence path and the other lines form minimum-distance error paths. Routes $a_1$ and $a_2$ are the same for full-state and reduced-state detection, while path b is an extra minimum-distance error path for reduced-state detection that results from the combination of states 1 and 3. Assuming a normalized swing of [1, -1] for the 1+D symbols, the minimum distance for both full-state and reduced-state detection is $d_{min} = \sqrt{(1/3)^2 + (1/3)^2} = \sqrt{2}/3$. With the pyramid-shaped (triangular) probability distribution of the symbols in a duobinary scheme, the signal power under the above swing constraint is 5/18. For a sufficiently large decoding delay, the error event probabilities of full-state and reduced-state detection are bounded by [18]:

$$Pr_{ee(full-state)} \le \left[2\sum_{n=1}^{\infty} (3/4)^n\right] \cdot Q\left(\frac{1}{3\sqrt{2}\sigma}\right) \tag{4.8}$$

$$Pr_{ee(reduced-state)} \le \left[3\sum_{n=1}^{\infty} (3/4)^n\right] \cdot Q\left(\frac{1}{3\sqrt{2}\sigma}\right) \tag{4.9}$$

where n denotes the length of the error sequence in a possible minimum-distance error event. Furthermore, the symbol error rate (SER) can be computed by weighting each error event by the number of symbol errors that it entails. With the SNR expressed in dB, this yields [17]

$$SER_{(full-state)} = 24Q\left(\frac{1}{3\sqrt{2}\sigma}\right) = 24Q\left(\frac{1}{\sqrt{5}}10^{\frac{SNR}{20}}\right) \tag{4.10}$$

$$SER_{(reduced-state)} = 39Q\left(\frac{1}{3\sqrt{2}\sigma}\right) = 39Q\left(\frac{1}{\sqrt{5}}10^{\frac{SNR}{20}}\right) \tag{4.11}$$

As is clear from Equations 4.10 and 4.11, the Q-function, which is normally the dominant factor in error probability, has the same argument in both equations, which explains the similar performance of the two detection techniques.
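The quantities in Equations 4.10 and 4.11 can be evaluated with a few lines of code. This is a minimal sketch under stated assumptions: the triangular weights 1,2,3,4,3,2,1 over 16 presume independent, equiprobable 4-PAM inputs, and all function names are illustrative. It reproduces the signal power of 5/18 and shows that $1/(3\sqrt{2}\sigma)$ equals $10^{SNR/20}/\sqrt{5}$.

```python
import math
from fractions import Fraction


def q_function(x):
    """Gaussian tail probability Q(x), written with the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def duobinary_signal_power():
    """Mean-square value of the seven 1+D levels normalized to [-1, 1]."""
    levels = [Fraction(n, 3) for n in range(-3, 4)]   # -1, -2/3, ..., +1
    weights = [1, 2, 3, 4, 3, 2, 1]                   # assumed triangular, out of 16
    return sum(w * v * v for w, v in zip(weights, levels)) / 16


def sigma_from_snr_db(snr_db):
    """Noise standard deviation for a given SNR in dB and the power above."""
    return math.sqrt(float(duobinary_signal_power()) / 10 ** (snr_db / 10))


def ser(snr_db, coefficient):
    """Equations 4.10 (coefficient 24) and 4.11 (coefficient 39)."""
    return coefficient * q_function(10 ** (snr_db / 20) / math.sqrt(5))


if __name__ == "__main__":
    for snr_db in (16, 18, 20, 22):
        print(snr_db, ser(snr_db, 24), ser(snr_db, 39))
```

Because both expressions share the same Q-function argument, their ratio is fixed at 39/24 for every SNR, which corresponds to only a small horizontal shift on an SER-versus-SNR plot.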
Since the parallel branches in RSSD are not involved in minimum-distance error events, the performance impact is negligible. In Fig. 4.8, the symbol error rate of a full-state ideal MLSD (4.10) is compared with that of a 2-state RSSD, both in the ideal case (4.11) and by simulation using the model to be presented in Fig. 4.13. The inaccuracy of (4.11) at low SNR is the reason for the non-matching points in this figure.

Fig. 4.8: SER performance comparison of MLSD and RSSD

#### **4.2.3** RSSD for other PRS Schemes

Having introduced a complete systematic RSSD design procedure for duobinary (1+D) PRS in the previous sections, this section briefly reviews RSSD design for other PRS schemes such as dicode (1-D) and PR4 (1-D<sup>2</sup>). A full-state trellis diagram for 4-PAM modulation in the 1-D PRS scheme is shown in Fig. 4.9.

Fig. 4.9: Full-state trellis diagram for a 4-PAM dicode PRS scheme. Branch labels represent the pairs of *uncoded* and *encoded* signals

Comparing this figure with Fig. 4.2, it can readily be seen that reversing the order of the initial states in Fig. 4.2 yields the trellis diagram in Fig. 4.9. Table 4.5 presents the conditions and updates for state I as an example. This table is a dual of Table 4.2 as it presents the same parameters for the state V in a duobinary scheme.

The class-IV (PR4) signalling scheme (1-D<sup>2</sup>) is actually made up of two interleaved dicode detectors [14], so the combined detector can operate at twice the sampling frequency of a single dicode detector.
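The interleaving property behind the PR4 remark can be illustrated on the coding side. The sketch below is only an illustration with hypothetical function names and zero initial conditions assumed: applying 1-D<sup>2</sup> to a sequence gives the same output as applying 1-D separately to its even-indexed and odd-indexed samples and interleaving the results. The 4-PAM levels are scaled by 3 so that they are integers.

```python
def dicode(x, initial=0):
    """1 - D coding: y[k] = x[k] - x[k-1]."""
    prev = initial
    out = []
    for v in x:
        out.append(v - prev)
        prev = v
    return out


def pr4(x, initial=(0, 0)):
    """1 - D^2 coding: y[k] = x[k] - x[k-2]."""
    p2, p1 = initial          # x[-2], x[-1]
    out = []
    for v in x:
        out.append(v - p2)
        p2, p1 = p1, v
    return out


def pr4_by_interleaving(x):
    """Two dicode coders, one per sample phase, with interleaved outputs."""
    out = [0] * len(x)
    out[0::2] = dicode(x[0::2])   # even-indexed samples
    out[1::2] = dicode(x[1::2])   # odd-indexed samples
    return out


if __name__ == "__main__":
    data = [3, -1, 1, -3, 3, 3, -1]   # 4-PAM levels scaled by 3
    print(pr4(data))
    print(pr4_by_interleaving(data))
```

Each of the two dicode streams runs at half the symbol rate, which is why a pair of dicode detectors can serve a PR4 channel at twice the rate of a single one.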
Table 4.5: Branch extension and difference metric update of state I in dicode PRS (branch-extension sketches not reproduced)

Present State = I (0, 1)

| q | y(k) | Δm(k) | Condition | (Qu, Qd) | Next State |
|---|---|---|---|---|---|
| 1 | 4/3 < y(k) | -1/3(-y(k)+5/3) | $\Delta m(k-1) < -1/3(-y(k)+5/3)$ | (0, 1) | V |
| | | Δm(k-1) | $-1/3(-y(k)+5/3) < \Delta m(k-1) < -1/3(-y(k)+1)$ | (0, 0) | V |
| | | -1/3(-y(k)+1) | $\Delta m(k-1) > -1/3(-y(k)+1)$ | (1, 0) | V |
| 2 | 1 < y(k) < 4/3 | 1/3(-y(k)+1) | $\Delta m(k-1) < 1/3(-y(k)+1)$ | (0, 1) | V |
| | | Δm(k-1) | $1/3(-y(k)-1) < \Delta m(k-1) < -1/3(-y(k)+1)$ | (0, 0) | V |
| | | -1/3(-y(k)+1) | $\Delta m(k-1) > -1/3(-y(k)+1)$ | (1, 0) | III |
| 3 | 2/3 < y(k) < 1 | -1/3(-y(k)+1) | $\Delta m(k-1) < 1/3(-y(k)+1)$ | (0, 1) | V |
| | | -Δm(k-1) | $1/3(-y(k)+1) < \Delta m(k-1) < -1/3(-y(k)+1)$ | (0, 0) | III |
| | | 1/3(-y(k)+1) | $\Delta m(k-1) > -1/3(-y(k)+1)$ | (1, 0) | III |
| 4 | 1/3 < y(k) < 2/3 | -1/3(-y(k)+1/3) | $\Delta m(k-1) < 1/3(-y(k)+1/3)$ | (0, 1) | III |
| | | -Δm(k-1) | $1/3(-y(k)+1/3) < \Delta m(k-1) < -1/3(-y(k)+1/3)$ | (0, 0) | III |
| | | 1/3(-y(k)+1/3) | $\Delta m(k-1) > -1/3(-y(k)+1/3)$ | (1, 0) | I |
| 5 | 0 < y(k) < 1/3 | -1/3(-y(k)+1/3) | $\Delta m(k-1) < -1/3(-y(k)+1/3)$ | (0, 1) | III |
| | | Δm(k-1) | $-1/3(-y(k)+1/3) < \Delta m(k-1) < 1/3(-y(k)+1/3)$ | (0, 0) | I |
| | | 1/3(-y(k)+1/3) | $\Delta m(k-1) > 1/3(-y(k)+1/3)$ | (1, 0) | I |
| 6 | y(k) < 0 | -1/3(-y(k)+1/3) | $\Delta m(k-1) < -1/3(-y(k)+1/3)$ | (0, 1) | I |
| | | Δm(k-1) | $-1/3(-y(k)+1/3) < \Delta m(k-1) < 1/3(y(k)+1/3)$ | (0, 0) | I |
| | | 1/3(y(k)+1/3) | $\Delta m(k-1) > 1/3(y(k)+1/3)$ | (1, 0) | I |

#### 4.3 Analog RSSD

In this section, the circuit design of an analog RSSD that is based on the system approach discussed earlier will be addressed. This will be followed by an investigation on the effect of circuit nonidealities on the detector performance.

#### 4.3.1 Circuit Design

Tables 4.2-4 give the main information for circuit implementation of the multi-level reduced-state Viterbi detector. Two comparator stages at the front and back end of the circuit, as well as the offset combiners in the middle, form the analog core of this circuit (Fig. 4.11). This analog core is supported by digital circuitry which sets the dc offset value and sign for the input signal, y(k), as a function of present state and input level. This digital circuit also controls the path memory and defines the next state based on the outputs from the back-end comparators and the existing state.

The front-end circuit is composed of nine comparators which quantize the sampled input signal with steps of 1/3V starting from +4/3V and ending at -4/3V (Fig. 4.10). Ten outputs p1-10 of these comparators along with the current state information are inputs to the digital part to select the desired offset and polarity for y(k). As shown in Fig. 4.11, two combinations of y(k) each with appropriate polarity and offset, form threshold levels for the two comparators at the back-end.
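The behaviour of the nine-comparator front end can be sketched in software as follows. This is only a behavioural illustration under stated assumptions: the comparator polarity, the ordering of the outputs and the mapping of the ten regions onto p1-10 are hypothetical choices, since the text above does not fix them.

```python
from fractions import Fraction

# Nine comparator thresholds from +4/3 V down to -4/3 V in steps of 1/3 V.
THRESHOLDS = [Fraction(n, 3) for n in range(4, -5, -1)]


def thermometer(y):
    """Raw comparator outputs: 1 where the sample exceeds the threshold."""
    return [1 if y > t else 0 for t in THRESHOLDS]


def region_one_hot(y):
    """Ten region flags (assumed stand-ins for p1..p10), exactly one is set.

    Index 0 means y is above +4/3 V and index 9 means y is below -4/3 V.
    """
    code = thermometer(y)
    index = code.index(1) if 1 in code else len(code)
    return [1 if i == index else 0 for i in range(len(code) + 1)]


if __name__ == "__main__":
    for sample in (Fraction(3, 2), Fraction(1, 2), Fraction(-1, 6), Fraction(-3, 2)):
        print(sample, region_one_hot(sample))
```

Nine thresholds split the input range into ten regions, which is consistent with ten quantization outputs being passed on to the digital part.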
Difference metrics will be updated and surviving branches will be identified once this final comparison completes. As depicted in Fig. 4.11, a few digital signals control the offset and polarity of the input signal. As implied by Tables 4.2-4, there are only three distinct absolute offset values: 5/3V, 1V and 1/3V, which are selectable by the digital signals C53, C10 and C13, respectively. The difference metric, which is taken from either the upper or the lower threshold level, is selected and stored by the Mux-S/H for the succeeding comparison based on the following three possible conditions of the comparator outputs Qu and Qd. When Qu=1 and Qd=0, the upper threshold voltage is chosen, whereas when Qu=0 and Qd=1, the lower threshold level is adopted. In the last possible case, Qu=Qd=0, the former difference metric is not replaced and only its polarity may change, depending on the current state and the quantized level.

Fig. 4.10: Front-end quantizer circuit

Fig. 4.11: Analog core of the processing circuit

Although the structure in Fig. 4.11 is complete and applicable, it suffers from having two S/Hs in the signal path, whose settling time limits the update speed. To improve speed, we notice that in Tables 4.2-4, $\Delta m(k)$ is always a function of either y(k) or $\Delta m(k-1)$, depending on the outputs of the two final comparators, which also implies that $\Delta m(k-1)$ is a function of y(j), j < k [17]. This suggests that the circuit in Fig. 4.11 can be upgraded to the circuit shown in Fig. 4.12. Two ping-pong sample and holds at the input store y(k) and y(j). The conditions of Qu and Qd, as addressed before, determine whether the input sampling switch in this structure is toggled or remains unchanged. The new configuration operates at higher frequencies because one S/H is removed from the signal path.

Fig.
4.12: Improved structure for the analog core

Realization of the circuit in Fig. 4.12 can be simplified if all additions and subtractions are performed in current mode. Fig. 4.13 represents the practical structure of Fig. 4.12. A detailed explanation of these blocks is provided in the later sections.

Fig. 4.13: Practical structure for circuit realization of Fig. 4.11

A fully differential structure in this design ensures significant suppression of common-mode noise and interference in the circuit. The select switches pick one of the distinct offset levels of 1/3, 1 and 5/3V under the control of the digital inputs. Input transconductors (V/I) convert the input signals and selected offsets to currents before they are combined via pull-up resistors. The polarity switches simply interchange the input and output connections, based on the control inputs, to change the polarity of the corresponding signals. Since the arithmetic operations in Fig. 4.13 are in current mode (and also to reuse circuit blocks), the quantizing structure in Fig. 4.10 is modified to the one shown in Fig. 4.14.

Fig. 4.14: Current-mode realization of the front-end quantizer

Using this configuration, the transconductors employed in Fig. 4.13 can also be reused for quantization by adding extra output currents.

As is apparent from Fig. 4.13, digital circuits are also a major part of this system. They are mostly combinational logic that provides the analog core with the new state information as well as the choice of specific offsets and polarities. These circuits are illustrated in block form in Fig. 4.15, and a brief functional explanation of these digital blocks, as extracted from Tables 4.2-4, follows.

Fig. 4.15: Digital processing blocks

The combinational logic circuit in Fig. 4.15.a takes the current state information s(n), which is one of I, III and V, together with the ten quantization outputs (p) obtained from the circuit in Fig. 4.10 to generate the offset and polarity control signals for y(k).
Depending on which of the final comparators Qu and Qd outputs a "1", and also on the knowledge of the state and p, one of the polarity signals Pu or Pd is chosen by the circuit in Fig. 4.15.b as the polarity of y(j) in the next comparison. When both Qu and Qd are "0", the last polarity Pp(n-1) is restored as Pp(n) for the next cycle. The functions of circuits 4.15.c and 4.15.d can be explained in much the same way as 4.15.b. Pcp(n) defines the sign of the offset for y(j), while the three outputs Cxxp(n), namely C13(n), C10(n) and C53(n), select one of the offset levels 1/3V, 1V and 5/3V, respectively, as the offset of y(j). The next state information is generated by the block shown in Fig. 4.15.e using the current state s(n) as well as p(n), Qu(n) and Qd(n).

Unfortunately, the structure in Fig. 4.12 still suffers from significant delays within one sample period. Operations such as sampling, quantization, digital circuit delay, voltage-to-current conversion and the last-stage comparison create a delay of more than 8ns, which is too long to achieve the desired speed. These delays can be mitigated by splitting the above tasks across different cycles in a pipelined structure, which is addressed in the next section.

#### 4.3.2 Pipelining Structure

Because of the long processing time needed for the complete computation within one sample period, the whole operation for one sample is divided into four consecutive cycles, which start with sampling and continue with quantization, digital assessment and finally back-end comparison and difference metric update. As depicted in Fig. 4.16, five sample and holds store five samples of the incoming signal. These samples are saved on the capacitors through the transistor switches controlled by $S_{1-5}(1)$ before being converted to current by the corresponding transconductors.
These currents, which are proportional to the samples at each S/H, are steered to the different stages of the pipelining structure for subsequent analysis. The switches controlled by $S_{1-5}(2)$ deliver the desired current to the quantizer, while the switches controlled by $S_{1-5}(4)$ and $S_{1-5}(5)$ take two other currents for the difference metric update process. Upon completion of this process for a sample, that sample is replaced by a new input sample at the same S/H. This implies that one sample and hold and one transconductor are devoted to each sample for its complete processing.

Fig. 4.16: The circuit processing stages in pipelining structure

To elaborate on the preceding discussion, suppose a new round of processing begins with S/H(1) sampling y(m) at time 0. Denoting each sampling period by T, sample y(m) should have been stored and settled by S/H(1) before time T. At the start of the second period, T, S/H(2) starts sampling y(m+1) while y(m), held in S/H(1), undergoes quantization. At time T+1, the quantizer outputs produced by y(m) are used as part of the inputs for digital assessment. At the same time, y(m+2) is sampled by S/H(3) and y(m+1), held in S/H(2), undergoes quantization. At time T+2, as presented in Fig. 4.17, this rotation continues by saving y(m+3) in S/H(4) and quantizing y(m+2) while y(m+1) waits for its digital assessment. Meanwhile, sample y(m-1), which we assume had already been stored in S/H(5), together with y(m) stored in S/H(1), jointly proceed to the final comparison and the difference metric update operation.

Fig. 4.17: Selective switches and connections in the circuit pipelining configuration

In effect, the function of y(m) and y(m-1) in Fig. 4.17 is analogous to the characteristics of y(k) and y(j) in Fig.
4.12, respectively, and as discussed earlier, the update of the sample and holds containing y(k) and y(j) depends on the results of Qu and Qd. This means that for the period starting at T+3, if either Qu or Qd is 1, the next sample, y(m+4), will be stored in S/H(5) (Fig. 4.18.a). Otherwise, if Qu=Qd=0, S/H(5) will retain its sample and S/H(1) will hold y(m+4) (Fig. 4.18.b).

Fig. 4.18: Typical rotation of S/Hs when a: Qu or Qd =1 and b: Qu=Qd=0

As shown in Fig. 4.18, one selective switch (SW\_(1-5)) is assigned to each of the five sample and holds; based on the existing conditions, it controls the flow of the sampled signals to the different processing stages. The digital controller for each switch is made up of seven D-Flip-Flops and two multiplexers, which are shown in more detail in Fig. 4.19. Each switching controller has four outputs, and in each period only one of the outputs is active while the others remain inactive. This is consistent with the fact that the samples should be in different positions during the detection process. In this circuit, the flow of the active state from $S_n(1)$ to $S_n(4)$ is unconditional, while the state transitions from $S_n(4)$ to $S_n(5)$ and from $S_n(5)$ to $S_n(1)$ are conditional and depend on Qu and Qd. For Qu or Qd=1, there is a routine flow of states from $S_n(4)$ to $S_n(5)$ and from $S_n(5)$ to $S_n(1)$, whereas for Qu=Qd=0, the state in $S_n(4)$ moves to $S_n(1)$ rather than $S_n(5)$, and $S_n(5)$ preserves its own state.

Fig. 4.19: Rotation management digital circuit

In the pipelining configuration described above there is still one unresolved problem. Recall from Tables 4.2-4 that the selections of dc offset and signal polarity, which are generated by the digital assessment circuitry, all depend on knowledge of the present state, which in turn is based on the Qu and Qd information acquired in the last period.
However, in the pipelining structure, the digital assessment and the difference metric update execute simultaneously, which means that the comparator outputs are not known until the end of the cycle. To avoid most of this delay, the digital assessment block is triplicated and each of the blocks pre-evaluates its outputs for one of the three possible cases of (Qu, Qd) = (1,0), (0,1) and (0,0). At the end of the cycle, once the final comparator outputs are established, one of these three sets of results is chosen as the correct output set.

#### 4.3.3 Path Memory

The final step in detecting the received data is to keep track of the past states in the path memory. Since at the end of each period only the two most probable states are retained and each of these states represents 2 bits of data, the three possible path memory inputs based on the conducting states (I, III and V) are of the forms illustrated in Fig. 4.20.

Fig. 4.20: Three possible path memory inputs

Propagation of these inputs within the path memory (Fig. 4.21.a) is controlled by the (S/P)u and (S/P)d signals shown in Fig. 4.21.b. The whole structure of the path memory is based on Equations 4.3.1-4, which implies that when Qu=1 and Qd=0, the information in the upper latches is copied to both the succeeding upper and lower latches, while when Qu=0 and Qd=1, the lower latches propagate their data to the next upper and lower latches. For Qu=Qd=0, there is a parallel or cross propagation<sup>1</sup> depending on the state and quantization information. For example, in Table 4.2, the two middle rows have parallel transitions while the others have cross transitions. As a general rule, when Qu=Qd=0, the transition is parallel or crossed depending on whether $\Delta m(k)$ is updated to $\Delta m(k-1)$ or $-\Delta m(k-1)$, respectively. With a sufficient number of latches in the path memory, the information in the lower and upper latches will converge to the same data at the output.

<sup>1.</sup> Referring to Fig. 4.4, the branch extensions between the same hyper-states are called parallel propagation, while the other branch extensions, from 0 to 1 and from 1 to 0, are named cross propagation.

Fig. 4.21: Path memory a: circuit, b: digital controls

#### 4.3.4 Comparator Offset Effects

Practical nonidealities such as mismatches and dc offsets can impair the performance of the circuit. In this section, the effect of comparator dc offset on the symbol error rate of this RSSD Viterbi detector is investigated. Other nonidealities, such as mismatch in the gain of the transconductors and in the reference levels, can also be translated into an offset in the comparators. Although these imperfections in different parts of the circuit may be treated as another source of noise added to the input signal, their correlated nature makes simulation more appealing than an analytical approach.

As mentioned in the earlier sections, two sets of comparators are involved in this detection: nine front-end quantizing comparators and two back-end difference metric updating comparators. The first nine comparators, in contrast with a flash A/D, need about 3.3 bits of accuracy, which implies that fairly simple comparators without any offset cancellation provisions can act properly in this segment. For the two back-end comparators, Tables 4.2-4 show that the threshold difference between the two comparators can vary from 0 to 2/9V. For example, in the first and last rows of these tables, the difference between the threshold levels is independent of y(k) and equal to 2/9V. In the other rows, this difference depends on y(k), and the absolute maximum and minimum differences in each row are 2/9V and 0V, respectively.
For the worst case, the accuracy of these comparators should be in the range of the whole system, or about 6 bits, which calls for more sophisticated comparators than those of the first stage. This has been verified by simulating the SER performance of the detector under two different comparator mismatch conditions. Given a random dc offset equal to +/-5% of the voltage between two consecutive input quantizing levels applied to the nine front-end comparators, Fig. 4.22.a shows that the SER remains almost intact, while in Fig. 4.22.b the same amount of offset in the two back-end comparators causes an approximate degradation of 0.4dB in SER performance.

Fig. 4.22: Simulated SER performance comparison, a: 5% dc offset in the front-end comparators, b: the same dc offset in the back-end comparators.

#### 4.4 Building Blocks

The functional blocks in the circuit realization of RSSD have been introduced in the previous sections. This section presents a detailed description of each of those blocks along with their circuit issues.

#### 4.4.1 Voltage-to-Current Converter

Voltage-to-current converters (transconductors) play a critical role in this design, as all mathematical operations are carried out in current mode. The transconductor with p-channel inputs [19] depicted in Fig. 4.23 can accommodate low bias level inputs and performs with high linearity if R is kept constant. Wide poly resistors laid out close to each other can provide good linearity and matching with the other transconductors.

Fig. 4.23: Transconductor circuit

To investigate the frequency response of this transconductor, we derive the open-loop transfer function, Vo(s)/Vi(s), of the simplified circuit in Fig. 4.24.a, whose open-loop half-circuit ac model is shown in Fig. 4.24.b.

Fig.
4.24: Transconductor a: simplified circuit, b: half-circuit open-loop circuit

Taking $r_{cs1}$ and $r_{cs2}$ as the output impedances of $I_1$ and $I_2$, respectively, and denoting the total parasitic capacitance at node 1 by $C_1$ and the compensating capacitor at node 2 by $C_c$ in Fig. 4.24.b, Vo(s)/Vi(s) can be approximated as follows:

$$\frac{V_o(s)}{V_i(s)} = -\frac{g_{m3}r_{cs2}}{1 + r_{cs2}C_c s + \frac{C_1C_cr_{cs2}}{g_{m1}}s^2} \tag{4.12}$$

As expected, the dominant pole is located at $P_1 = \frac{1}{r_{cs2}C_c}$ and the second pole is located at $P_2 = \frac{g_{m1}}{C_1}$ . $C_c$ acts as a compensating capacitor and is set to 80 fF for a reliable phase margin and a -3dB bandwidth of 1.2GHz when pulling up the output differential currents with 300 $\Omega$ resistors. Also, to reduce the effect of the output pull-up resistors on the frequency performance, the open-drain transistors that conduct the output currents are isolated from those resistors by means of common-gate transistors, as shown in Fig. 4.24.a. The circuit transconductance can be approximated by:

$$\frac{i_o}{v_{id}} \approx \frac{1}{R + \frac{2}{g_{m1}g_{m3}r_{o1}}} \tag{4.13}$$

where $r_{o1}$ denotes the output impedance of transistor $Q_1$ . The denominator in Equation 4.13 is dominated by R, so for a linear transconductance gain, R needs to be constant over the whole range of the input signal. Moreover, it is also critical that the resistors R of the different transconductors in the circuit are well matched. The main advantage of this transconductor configuration is its capability to provide multiple outputs.

#### 4.4.2 Comparators

Nine comparators at the front-end and two comparators at the back-end are the key parts of this detector. When dealing with CMOS comparators, their input offset can be significant in a precise design and the need for offset cancellation is unavoidable. The comparator employed in this design incorporates two cascaded preamplifiers (Fig.
4.25) which are coupled to the input signal by $C_1$ and $C_2$ . Offset cancellation and bias adjustment are performed by the MOS switches, which short the output to the input and connect the other side of the coupling capacitors to the reference voltages [20].

Fig. 4.25: Front-end preamplifiers of the comparator and their connections

The complete comparator circuit is shown in Fig. 4.26 and the corresponding clock waveforms are presented later in Fig. 4.32. Two latches, I and II, activated consecutively ensure enough speed for the output to settle. The Latch signal, which is the inversion of Lin, fires Latch I after the input signal has been sufficiently amplified at the output of Preamp. II. At the far end of the latching period, the regen signal fires Latch II for further amplification of the Latch I output. Meanwhile, the reset signal prepares Latch I and Preamp. I for the next comparison cycle by equalizing the voltage levels at the outputs of Preamps. I and II. The R/S F-F at the last stage of this comparator generates a full digital level swing at the output.

Transistor sizes (W/L) in Fig. 4.26:

- M1=M2=M21=M22=M26=M27=M28=M29=16/0.25
- M3=M4=M8=M9=M10=M11=4/0.25
- M6=M7=M12=M13=M15=M16=M17=M18=M19=M20=M24=M25=8/0.25
- M5=M14=6/0.25
- M23=3/0.25
- M30=M31=24/1

Fig. 4.26: The comparator circuit

The two preamplifiers with diode-connected loads present an aggregate gain of

$$A_{v} = \frac{g_{m1}}{g_{m3}} \cdot \frac{g_{m6}}{g_{m8}}. \tag{4.14}$$

Two major poles are also created at the outputs of these two preamplifiers, which are

$$P_1 = \frac{g_{m3}}{C_{out1}} \tag{4.15}$$

and

$$P_2 = \frac{g_{m8}}{C_{out2}} \tag{4.16}$$

With the transistors sized as shown in Fig. 4.26, the overall gain of the preamplifiers is about 9 and the two dominant poles $P_1$ and $P_2$ are located at about 0.9GHz and 1.3GHz, respectively, which along with the other non-dominant poles results in a simulated -3dB bandwidth of 580 MHz.
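As a rough cross-check of the bandwidth figure above, the -3dB frequency of two cascaded real poles can be computed in closed form. The sketch below is illustrative only: the function name is hypothetical, the pole positions are taken as the quoted 0.9GHz and 1.3GHz (assumed to be in Hz rather than rad/s), and the non-dominant poles are ignored, so the result is an upper bound on the simulated value rather than a reproduction of it.

```python
import math

def two_pole_bandwidth_hz(p1_hz, p2_hz):
    """-3 dB frequency of a cascade of two real poles (hypothetical helper).

    |H(f)|^2 = 1 / ((1 + (f/p1)^2) * (1 + (f/p2)^2)) is set to 1/2 and
    solved for x = f^2, which gives a quadratic a*b*x^2 + (a+b)*x - 1 = 0.
    """
    a = 1.0 / p1_hz ** 2
    b = 1.0 / p2_hz ** 2
    x = (-(a + b) + math.sqrt((a + b) ** 2 + 4.0 * a * b)) / (2.0 * a * b)
    return math.sqrt(x)

# Pole positions quoted in the text (assumed to be in Hz).
bw = two_pole_bandwidth_hz(0.9e9, 1.3e9)
print(f"two-pole estimate: {bw / 1e6:.0f} MHz")
```

The two dominant poles alone give an estimate somewhat above the 580 MHz obtained by simulation, which is consistent with the statement that the remaining non-dominant poles reduce the bandwidth further.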
The last stage R/S Flip-Flop is placed to maintain the output levels fixed while the Latch II outputs are pushed to about Vgs in reset mode. While Latch II outputs are both about Vgs, stacked R/S F-F transistors M26-M28 and M27-M29 will not be in active region and hence no transitions will occur during reset period. #### 4.4.3 Input Quantizing Circuit Fig. 4.27 is the circuit realization of V/I-9 in Figs. 4.13-14. Nine differential outputs enable the sampled signal in position y(m-1) to be compared with nine reference levels as shown in Fig. 4.14 in current mode. Nine reference levels are generated using differential resistive ladders [22] and five two-differential-output transconductors (V/I-2) of which four of them introduce two symmetric levels of (+4/3, -4/3), (+1, -1), (2/3, -2/3) and (1/3, -1/3) just by exchanging one of the outputs connections and the last V/I-2 presents 0V level (refer to Figs. 4.14 and 4.28). These reference levels as well as the input signal level have been down scaled by the ratio of 3/10 in practice because of circuit swing limitations. Fig. 4.27: Nine-differential-output transconductor (V/I-9) Once in quantizing mode, output transistors controlled by s2 will be turned on for the process. In digital assessment mode, none of the output transistors are on because at this cycle only digital operations are carried out based on the previous quantization results. In y(k) and y(j) positions, transistors controlled by s4 and s5, respectively will be turned on for the operation depicted in Fig. 4.12. Also, although in the sampling mode there is no computational process to keep output transistors in the corresponding V/I-9 on, but as explained in Fig. 4.18, since y(j) position is uncertain until the far end of the previous cycle, s1 will keep the output transistors to be placed in sample mode on for a probable switch to y(j) if both Qu and Qd are zero. 
To clarify, in order to reduce power consumption, only the transistors engaged in the computational operations (quantization and difference metric update) are on in a particular cycle and the rest are kept off. The latter transistors are turned on slightly before any operational cycle to avoid the delay caused by activating an off transistor. In all modes except quantization, all transistors controlled by s2 are off and hence the enable signal disconnects the gates of these transistors to reduce the capacitive load at the gates of the active transistors. In addition, as illustrated in Fig. 4.28, switch transistors with their gates always grounded are also included in the V/I-2 circuit for matching purposes.

Fig. 4.28: Reference generating circuit a: differential ladder resistors, b: two-differential-output transconductor (V/I-2)

#### 4.4.4 Input Sample and Holds

In section 4.3.2, it was pointed out that a pipelined structure was deployed in this design to achieve higher speed. For this purpose, five differential sample and holds are required to store the sampled input signals consecutively. The order of sampling, as already mentioned, is controlled by the digital circuit shown in Fig. 4.19. These five sampled data, as shown in Fig. 4.29, are input to the five nine-differential-output transconductors (V/I-9) explained in the previous section. Charge injection and clock feed-through are sources of imperfection in sample and hold circuits. The amount of charge injected into the storage capacitor is proportional to $V_{eff}$ [19] and hence is a function of the input signal. Due to the small variation of the input signal (0.2V to 0.8V) compared to Vdd, as well as the fully differential structure of the circuit, the resulting impairments are negligible for about 6 bits of accuracy.

Fig.
4.29: Input sample-and-holds circuit #### 4.4.5 Offset generators Three different dc differential offset values in this design are generated using two sets of ladder resistors. These reference levels are selected by the control signals C53, C10 and C13, using M(1-6) transistors in Fig. 4.30. The selected offset voltage will be converted to current by a transconductor (V/I) before being added to other current signals. Fig. 4.30: Offset generating circuit #### 4.4.6 Clock Generator As explained in the previous sections, many blocks in the circuit are controlled by the digital signals and clocks. Fig. 4.31 presents the clock generator circuit which provides outputs to control comparators, Flip-Flops and switches. Fig. 4.31: The clock generating circuit Two inverted versions of the input clock, *CLK* and *CLKd*, control flow of offset and sign information for the y(j) and y(k). *Lin1*, *Latch1* and *reset1* control nine quantizing comparators whereas *Lin2*, *Latch2* and *reset2* take over the control of the two last stage comparators. Both of the comparator groups use *regb* for their LatchII activation. Wave forms for these control signals and clocks are depicted in Fig. 4.32. These signals are all followed by appropriate buffers to drive subsequent blocks. Fig. 4.32: Output waveforms of the clock generator #### 4.5 Summary Analog integrated Viterbi detectors have already demonstrated their advantageous performance over their digital counterpart. Due to the elimination of an ADC at the front end, an analog design performs at high speeds with low power consumption. With ever-increasing demand on higher data rates and the limitation of existing channels, multi-level schemes have drawn attention for their lower bandwidth requirement. In this chapter a complete design procedure of a 500MS/s (1Gb/s) analog Viterbi detector for 4-PAM, duobinary partial signalling was elaborated. 
Due to the significant delays in a sample process imposed by analog and digital modules, pipelining and parallel processing were employed to improve the speed by tolerating a little more latency and complexity in the circuit. Building blocks and practical imperfections were discussed and the required accuracy of the comparators was investigated. This design approach can also be extended to other partial response signalling schemes such as dicode and class-IV systems where a high degree of detection reliability and low power consumption are of concern.

#### 4.6 References

- [1] P. Kabal and S. Pasupathy, "Partial-Response Signaling," *IEEE Trans. Commun.*, Vol. 23, No. 9, Sep. 1975, pp. 921-934.
- [2] E. A. Lee and D. G. Messerschmitt, *Digital Communication*, Kluwer Academic Publishers, 1994.
- [3] G. D. Forney, Jr., "Maximum-Likelihood Sequence Estimation of Digital Sequences in the Presence of Intersymbol Interference," *IEEE Trans. Inform. Theory*, Vol. 18, No. 3, May 1972, pp. 363-378.
- [4] M. E. Austin, "Decision-feedback equalization for digital communication over dispersive channels," *M.I.T. Lincoln Lab., Lexington, Mass.*, Tech. Rep. 437, Aug. 1967.
- [5] D. A. George, R. R. Bowen, and J. R. Storey, "An adaptive decision feedback equalizer," *IEEE Trans. Commun. Technol.*, Vol. 19, June 1971, pp. 281-293.
- [6] M. V. Eyuboglu and S. U. Qureshi, "Reduced-State Sequence Estimation with Set Partitioning and Decision Feedback," *Proc. IEEE Globecom Conf.*, Vol. 2, 1986, pp. 1023-1028.
- [7] S. Olcer, "Reduced-State Sequence Detection of Multilevel Partial-Response Signals," *IEEE Trans. Commun.*, Vol. 40, No. 1, Jan. 1992, pp. 3-6.
- [8] G. D. Forney, Jr., "The Viterbi algorithm," *Proc. IEEE*, Vol. 61, Mar. 1973, pp. 268-278.
- [9] A. Duel-Hallen and C. Heegard, "Delayed decision-feedback sequence estimation," *IEEE Trans. Commun.*, Vol. 37, May 1989, pp. 428-436.
- [10] P. R. Chevillat and E. Eleftheriou, "Decoding of trellis-encoded signals in the presence of intersymbol interference and noise," *IEEE Trans. Commun.*, Vol. 37, Jul. 1989, pp. 669-676.
- [11] F. L. Vermeulen and M. E. Hellman, "Reduced-State Viterbi Decoding for Channels with Intersymbol Interference," *Proc. IEEE Int'l Conf. on Commun.*, 1974, pp. 37.B.1-37.B.4.
- [12] M. V. Eyuboglu and S. U. Qureshi, "Reduced-state sequence estimation for coded modulation on intersymbol interference channels," *IEEE J. Sel. Areas in Commun.*, Vol. 7, Aug. 1989, pp. 989-995.
- [13] G. Cherubini, S. Olcer, and G. Ungerboeck, "A Quaternary Partial-Response Class-IV Transceiver for 125 Mbit/s Data Transmission over Unshielded Twisted-Pair Cables: Principles of Operation and VLSI Realization," *IEEE J. Sel. Areas in Commun.*, Vol. 13, No. 9, Dec. 1995, pp. 1656-1669.
- [14] M. H. Shakiba, D. A. Johns and K. W. Martin, "An Integrated 200-MHz 3.3-V BiCMOS Class-IV Partial-Response Analog Viterbi Decoder," *IEEE J. Solid-State Circuits*, Vol. 33, No. 1, Jan. 1998, pp. 61-75.
- [15] R. Farjad-Rad, C. K. Yang, M. Horowitz and T. Lee, "A 0.4-µm CMOS 10-Gb/s 4-PAM pre-emphasis serial link transmitter," *IEEE J. Solid-State Circuits*, Vol. 34, May 1999, pp. 580-585.
- [16] J. L. Zerbe, P. S. Chau, C. W. Werner, T. P. Thrush, H. J. Liaw, B. W. Garlepp and K. S. Donnelly, "1.6 Gb/s/pin 4-PAM Signaling and Circuits for a Multidrop Bus," *IEEE J. Solid-State Circuits*, Vol. 36, May 2001, pp. 752-760.
- [17] M. H. Shakiba, "Analog Viterbi Detection for Partial-Response Signaling," *Ph.D. Dissertation*, Univ. Toronto, 1997.
- [18] S. Olcer and G. Ungerboeck, "Difference-Metric Viterbi Decoding of Multilevel Class-IV Partial-Response Signals," *IEEE Trans. Commun.*, Vol. 42, No. 2/3/4, Feb./Mar./Apr. 1994, pp. 1558-1570.
- [19] D. A. Johns, K. Martin, *Analog Integrated Circuit Design*, Wiley, New York, 1997.
- [20] I. Mehr and D.
Dalton, "A 500-MSample/s, 6-Bit Nyquist-Rate ADC for Disk-Drive Read-Channel Applications," *IEEE J. Solid-State Circuits*, Vol. 34, No. 7, Jul. 1999, pp. 912-920.
- [21] M. J. Ferguson, "Optimal reception for binary partial response channels," *Bell Syst. Tech. J.*, Vol. 51, No. 2, Feb. 1972, pp. 493-505.
- [22] Y. Tamba, K. Yamakido, "A CMOS 6b 500MSample/s ADC for a Hard Disk Drive Read Channel," *ISSCC Digest of Technical Papers*, pp. 324-325, Feb. 1999.
- [23] G. Ungerboeck, "Channel Coding with Multilevel/Phase Signals," *IEEE Trans. Inform. Theory*, Vol. 28, No. 1, Jan. 1982, pp. 55-67.

## CHAPTER 5

## **Analog Reduced-State Sequence Detection: Experimental Results**

An experimental prototype of the proposed reduced-state Viterbi detector was fabricated in a 0.25µm single-poly, five-metal-layer CMOS process. Section 5.1 discusses the layout considerations and the techniques employed to guarantee a high level of performance. The test set-up for performance evaluation is discussed in section 5.2. Section 5.3 presents measurement results, followed by conclusions and a summary in section 5.4.

#### 5.1 Layout

The die photograph of the chip is shown in Fig. 5.1. In a mixed-signal circuit, chip layout issues become much more important, as sensitive analog circuits are subject to digital perturbations conducted mostly through the substrate. Two differential pairs for the input clock (clk\_p(in), clk\_n(in)) and the input signal (in\_p, in\_n), as well as five pairs of differential outputs, form the major I/O pins. The output signals comprise four differential digital signals controlling path memory data propagation (SP\_p(u), SP\_n(u), SP\_p(d), SP\_n(d), I\_p, I\_n, V\_p, V\_n) and one differential output clock for synchronization and testing purposes (clk\_p(out), clk\_n(out)).

Fig. 5.1: Chip photograph

Wide layers of metal were used for routing Vdd and Gnd to minimize voltage drop across these power lines.
These lines were drawn beside each other all over the chip area and were decoupled using CMOS capacitors of which their total capacitance is estimated to be about 1nF. Several pins were allocated to serve power to the different parts of the chip. Different power lines for analog (vdd\_a, vss\_a) and digital (vdd\_d, vss\_d) sections as well as separate power lines for guard rings (vdd\_r, vss\_r) and output buffers (vdd\_o, vss\_o) were precautions to reduce the penetration of digital noise to the analog part and separate high-current paths from sensitive signals. For testability purposes and also to provide control on current sources, the gates of all transistors performing as current sources for comparators and output buffers were pulled out of the chip (g-bias). In addition to the external biasings for the above modules, an internal constant-gm, wide-swing biasing circuit [1] was implemented to support bias levels for transconductors. Bias levels as well as current flow are controlled by an internal fixed resistor in series with an off-chip resistor connecting rbias pin to ground. Bias levels were also monitored by their corresponding pins vbn, vcn, vcp and vbp, externally. All the bias points within the chip were decoupled to Gnd or Vdd by CMOS capacitors. For matching purposes, all building blocks were laid out in symmetric configuration. Common-centroid layout [2][3] was incorporated for common source transistors. Dummy comparators and transconductors as well as dummy transistors, capacitors and resistors were placed on both sides of critical components and cells for matching enhancement. Clock and digital I/O lines play a critical role in the performance of the circuit and can face significant loading inside or outside the chip. Special buffers for these lines are explained later in this section. The analog differential input in this layout was shielded by two layers of metal at top and bottom for better noise protection. 
For generating dc offsets for combiners and quantization levels for input comparators, two separate pairs of input reference levels (ref\_max and ref\_min) have been provided to establish maximum and minimum levels for the corresponding resistor ladders. Reference levels for these two different pairs should ideally be the same but to account for nonidealities in the circuit as well as some testability purposes, separate I/O pins have been allocated for them. To focus on the analog core of the circuit, the path memory part was not included in the layout. However, the path memory consists of approximately 120 D-Flip-Flops which would increase the power consumption by 25% and increase the area by 10%. Active area measures $0.78 \text{mm}^2$ in $0.25 \mu \text{m}$ CMOS technology, of which 75% of it is occupied by analog portion. ### 5.1.1 Clock Buffers Clock generators face significant capacitive load at their outputs as a result of an abundance of digital modules. To alleviate skewing problems, a digital buffer comprised of two cascaded inverters was provided for each clock line (Fig. 5.2). The first-stage inverters which are made up of smaller transistors, were placed close to the clock generator module and the second-stage inverters with larger transistors were located close to the module. Fig. 5.2: Digital two-stage buffer ## 5.1.2 Digital I/O Line Translators Bonding pads with their relatively large capacitance to the substrate are the critical parts in a chip as they can pass sharp transitions at the I/O pads to the substrate and hence affect the analog circuit performance. To alleviate this phenomenon, all digital inputs/outputs in this chip are realized as differential low swing signals before being applied to the pads. A differential to single-ended circuit shown in Fig. 5.3 converts a differential input signal pair of 200mV swing to a full scale digital swing of 2.5V. 
While only one of the outputs is used as a clock, the other complementary output follows the main clock in the layout for substrate noise reduction purposes. Fig. 5.3: Differential low-swing input to full-swing output translator circuit In a reverse order, all digital outputs need to be translated to low swing differential signals before being brought outside through bonding pads. The open-drain differential pair shown in Fig. 5.4 is incorporated for this conversion. The buffer and the inverter in this circuit employ different transistor sizes to provide an equal delay complementary signals at the gates of M1 and M2. M3 is biased for a constant current of 1mA and hence the output swing can be defined by the product of this current and off-chip pull-up resistors. Moreover, the use of this circuit results in a constant power supply current and as a result, switching noise on the power lines is greatly reduced. Fig. 5.4: Full-swing input to low-swing differential output translator circuit ## 5.2 Test Set-Up Test set-up for the chip performance evaluation is shown in Fig. 5.5. The input data is a 4-PAM modulated signal under a 1+D partial response signalling environment and delivers seven equi-distance levels within the range of 0.2-0.8V. This signal is generated by a D/A converter with three digital inputs. These 3-digit code inputs to the D/A come from the three channels of a four-channel SONY/Tektronix DG2030 data generator. The fourth channel is left for clock generation. Two-bit random numbers were generated in MATLAB and then added with the previous number to form a 1+D coding. The resulting 3-bit codes are saved and fed to the data generator. One unused 3-bit code was interposed within the other codes periodically for offset cancellation as mentioned below. Fig. 5.5: Test set-up To make BER measurements, a controlled amount of noise was added to the differential input (Fig. 5.6). 
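The 1+D test pattern described in this section can be sketched in a few lines. The source generated the data in MATLAB; the Python version below is only an illustration under stated assumptions: the function names are hypothetical, the symbol preceding the first one is assumed to be 0, and the seven levels are assumed to be spread linearly over the quoted 0.2-0.8V range. The periodically interposed unused 3-bit code for offset cancellation is not modelled.

```python
def duobinary_codes(symbols):
    """Form c[n] = a[n] + a[n-1] for 4-PAM symbols a[n] in {0, 1, 2, 3}.

    The result takes one of seven values (0..6), i.e. it fits in a 3-bit code.
    The symbol before the first one is assumed to be 0.
    """
    prev = 0
    codes = []
    for a in symbols:
        if a not in (0, 1, 2, 3):
            raise ValueError("4-PAM symbols must be in 0..3")
        codes.append(a + prev)
        prev = a
    return codes


def code_to_voltage(code, v_min=0.2, v_max=0.8, levels=7):
    """Map a code in 0..levels-1 to one of the equidistant levels (assumed linear)."""
    return v_min + (v_max - v_min) * code / (levels - 1)


symbols = [3, 0, 2, 2, 1]          # example two-bit data
codes = duobinary_codes(symbols)   # seven-level 1+D sequence
volts = [round(code_to_voltage(c), 3) for c in codes]
print(codes, volts)
```

Under these assumptions adjacent levels are 0.1V apart, which is the spacing that the added noise in the BER measurement competes against.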
The chip outputs were stored in a logic analyzer and then were sent to a computer by a LAN interface for further analysis. After passing the received data through a path memory, bit error rate evaluation was carried out by comparing original bits with the detected data. Fig. 5.6: Input seven-level encoded signal eye diagram a: high SNR, b: low SNR Fig. 5.7 illustrates the detailed test set-up hardware. Splitters and bias-Ts are incorporated to provide a differential input clock with specific bias level for the chip. A low-pass filter at the output of the noise generator limits the noise bandwidth to half the symbol-rate. Three output channels of the data generator feed the D/A as well as a multiplexer. For a special case that all three channels outputs are 0, the multiplexer activates an offset cancelling pulse and hence, the detection process will be halted for this period. A frequency of 50 KHz was shown to be satisfactory for this pulse repetition. Fig. 5.7: Input test signal and clock generator hardware ## **5.3** Experimental Results Due to speed limitations of the logic analyzer (Hewlett Packard *HP1663A* with a maximum state speed of 100MHz) and the D/A (ANALOG DEVICES *AD9708AR* with a maximum sampling rate of 125 MS/s), measurements were carried out up to 100 MS/s. The results shown in Fig. 5.8 indicate a close agreement between the experimental and simulation results. Minor deviation of the experimental results from the ideal case in low SNR is due to model inaccuracies. At high SNR, unmeasured internal perturbation such as substrate noise and power supply ripples can account for this difference. A summary of the chip measured results and specifications is shown in Table 5.1. Fig. 
5.8: Measured BER performance

| Parameter | Value |
| --- | --- |
| Chip | Analog RSSD |
| Modulation | 4-PAM |
| Coding | (1+D) partial response |
| Symbol rate | 500 MS/s, 1 Gbit/s (simulation); 100 MS/s, 200 Mbit/s (experimental, due to equipment limitations) |
| Power consumption | 55 mW at 100 MS/s (experimental); 112 mW at 500 MS/s (simulation) |
| Power supply | 2.5 V |
| Process | 0.25 µm CMOS |
| Active area | 0.78 mm² |

Table 5.1: Performance Summary

## 5.4 Summary

A reduced-state analog Viterbi detector for 4-PAM duobinary partial-response signalling was fabricated in a 0.25 µm CMOS process. Input/output translators for digital signals were implemented to provide differential low-swing signals at the pads for substrate noise reduction. Due to the limitations of the testing equipment, testing was conducted at 100 MS/s (200 Mb/s), while simulations indicate that the circuit should operate at 500 MS/s. The experimental results confirm the expected performance of the designed circuit at the tested rate. The power consumption of the chip was measured to be 55 mW from a 2.5 V supply when working at 100 MS/s (200 Mb/s). Simulations show that the chip would consume 112 mW at 500 MS/s.

## 5.5 References

- [1] D. A. Johns, K. Martin, *Analog Integrated Circuit Design*, Wiley, New York, 1997.
- [2] F. Maloberti, "Layout of Analog and Mixed Analog-Digital circuits," *Design of Analog-Digital VLSI Circuits for Telecommunication and Signal Processing*, Prentice Hall, Englewood Cliffs, New Jersey, 1994.
- [3] P. O'Leary, "Practical Aspects of Mixed Analogue and Digital Design," *Analogue-Digital Asics, Circuit Techniques, Design Tools, and Applications*, Peter Peregrinus, Stevenage, England, 1991.
- [4] A. B. Dowlatabadi, "Challenges in CMOS Mixed-Signal Designs for Analog Circuit Designers," *Proc. 40th Midwest Symposium on Circuits and Systems*, 1997, Vol. 1, pp. 47-50, 1998.
## CHAPTER 6

## **Conclusions and Future Directions**

## 6.1 Summary and Conclusions

This research has focused on the issues involved in the design of high-speed optical wireless transceivers targeted at short-distance applications. The motivation for this work is to prepare a low-cost, high-speed wireless system for indoor multi-user networks and to develop a cable-free connection from portable devices such as notebook computers, PDAs, and cellular phones to a mainframe, network, or dumb terminal. Since the receiver side was assumed to be the most critical part of this system, more attention was paid to this portion.

A new fully differential transimpedance amplifier was proposed and implemented for the receiver front-end. In addition to its balanced structure, a fully differential regulated cascode circuit at the input of this transimpedance amplifier isolates the large input photodiode capacitance from the rest of the circuit and thus improves the gain-bandwidth by removing the dominant pole from the input. Designed in a $0.35~\mu m$ CMOS technology, the amplifier demonstrated an experimental gain of $90.4~dB\Omega$ and a bandwidth of 255 MHz with a 2pF capacitance at the input. It was shown that even with a 20-fold increase in the input capacitance, the -3dB bandwidth of this transimpedance amplifier decreased only by a factor of 2, which demonstrates its insensitivity to the photodiode capacitance. The power supply rejection ratio was measured to be about 40dB at 20MHz and the consumed power was 30mW from a 3.0V power supply. An active dc photocurrent rejection circuit was also included in this preamplifier to prevent the output from saturating when exposed to intense ambient light.

Equalizers and detectors at the back-end of the proposed receiver shape the channel for the desired partial-response model and recover the transmitted information based on this coding scheme.
To achieve a speed of as high as 1Gb/s, 4-PAM modulation scheme with duobinary partial response signalling is a desirable type of coding which requires a channel bandwidth of about one fourth of the bit-rate or 250 MHz. It was shown by simulation that according to the specification of typical available low-price opto-electric components in the market, a trivial low-pass filter satisfies the required equalizer performance when using the above band-efficient coding. For detection of this type of signalling, maximum-likelihood detection technique and hence the Viterbi detectors accomplish a better performance compared to the symbol-by-symbol detection techniques. Based on this fact which was also verified in this thesis for the introduced channel, an analog sequence detector was designed and realized. To simplify the required circuit, a full four-state Viterbi detector was simplified to a two-state Viterbi detector using state reduction criterion with little compromise in performance. Analog Viterbi detectors outperform their digital counterpart with lower power and higher speed. The experimental results extracted from the implemented analog reduced-state detector proved a good matching of its SER performance with those of the simulated circuit. The power consumption in the designed analog Viterbi detector was also shown to be at least one fifth of the same circuit in digital according to the recent published papers on A/Ds and digital Viterbi detectors. Although implemented for duobinary coding and optical communications, the proposed design methodology can be extended to other PRS schemes such as dicode and PR4. ## **6.2 Future Directions** Toward realization of a low-cost high-speed optical wireless link, any further research needs to be concentrated on system optimization, simplicity and compatibility with the new technologies. To fulfill these objectives some suggestions are addressed below. 
In recently introduced technologies, low-voltage design is a major concern. While digital circuits lend themselves easily to this trend, analog circuits call for some revision of design and configuration strategies. In the context of this thesis, low-voltage design of the transimpedance amplifier, as well as of the required transconductors and comparators, is of particular interest.

In addition, the circuit realization of the analog reduced-state sequence detector can still be improved by applying current-mode comparators. This eliminates the need for pull-up resistors at the input of the comparators and hence avoids the significant delays associated with their time constants.

Furthermore, an efficient and systematic algorithm for designing RSSD for other coding polynomials would help in creating a CAD tool for higher-order channels. Such an algorithm becomes even more significant when RSSD is combined with the required equalizer, which calls for models with a higher number of states or variable coefficients.

## **APPENDIX**

## **Study of Adjacency in Branch Extension of the Reduced-State Detectors**

In two-state reduced-state sequence detection, the idea is to retain, at any time $n$, only the two states with the smallest state metrics and their associated surviving branches, while ignoring the other states. It can be proved that these two retained states in dicode and duobinary PRS schemes, as well as other combinations such as PRIV, are always two adjacent states. In the following, the adjacency criterion for $L$-PAM modulation in duobinary partial-response signalling is investigated.

To prove the adjacency statement, we assume $L$ equally spaced levels $-(L-1), -(L-3), \ldots, (L-3), (L-1)$ for transmission, which can also be written in the form $2(0, 1, 2, \ldots, L-1) - (L-1) = 2b_k - (L-1)$, in which $b_k \in B = \{0, 1, 2, \ldots, L-1\}$.
Starting with two random initial states, and with enough randomness in the transmitted data, at some point the two resulting new states will be neighboring states, and from then on the state propagation will only be to and from two adjacent states. With this in mind, we take two initial states $b_0(n-1)$ and $b_1(n-1)$ such that

$$0 \le b_0(n-1) < b_1(n-1) \le L-1. \tag{A.1}$$

Upon receipt of the input signal $y(n)$, the branch metrics between the existing surviving states and the succeeding states can be formulated, after omitting constant parameters, as

$$B_{ij}(n) = \left[y(n) - 2\left(b_{ij}(n) + b_i(n-1)\right)\right]^2 = 4\left[\left(\frac{y(n)}{2} - b_i(n-1)\right) - b_{ij}(n)\right]^2 \tag{A.2}$$

where $b_{ij}(n)$ denotes the succeeding state and $i, j = 0, 1$. This equation shows that, for each $b_i(n-1)$, the next two surviving states should provide the smallest and the second-smallest difference from $\left(\frac{y(n)}{2} - b_i(n-1)\right)$. By writing $y(n)$ in the form

$$y(n) = 2k_n + r_n \tag{A.3}$$

where $k_n \in \{\ldots, -1, 0, +1, \ldots\}$ and $0 \le r_n < 2$, the next two symbols for each $b_i(n-1)$, $i = 0, 1$, are given by

$$b_{i0}(n) = \min\left[\max\left(0,\; k_n - b_i(n-1)\right),\; L-2\right] \tag{A.4}$$

and

$$b_{i1}(n) = b_{i0}(n) + 1. \tag{A.5}$$

The results of the above equations for different values of $k_n - b_0(n-1)$ are presented in Table A.1 and depicted in Fig. A.1.
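As a rough illustration of (A.3) to (A.5), the following Python sketch computes the two candidate symbols for each retained state and evaluates rows of Table A.1 for a chosen $L$. The function names, the default choice $b_0(n-1) = 0$, the arbitrary value $r_n = 0.5$, and the reading of $\rho$ as $b_{00}(n) - b_{10}(n)$ are assumptions made for this example only.

```python
import math


def candidate_states(y, b_prev, L):
    """Two candidate successor symbols for one retained state, per (A.3)-(A.5)."""
    k = math.floor(y / 2)                    # (A.3): y = 2*k + r with 0 <= r < 2
    b_low = min(max(0, k - b_prev), L - 2)   # (A.4): clamp to the valid symbol range
    return b_low, b_low + 1                  # (A.5): the neighbouring symbol


def table_row(d, L, b0_prev=0):
    """Row of Table A.1 for d = k_n - b_0(n-1), assuming b_1(n-1) = b_0(n-1) + 1."""
    y = 2 * (d + b0_prev) + 0.5              # assumed r_n = 0.5; any r_n in [0, 2) gives the same k_n
    b00, b01 = candidate_states(y, b0_prev, L)
    b10, b11 = candidate_states(y, b0_prev + 1, L)
    rho = b00 - b10                          # assumed meaning of rho (0 or 1)
    return rho, b00, b01, b10, b11


if __name__ == "__main__":
    L = 4
    for d in range(L, -2, -1):
        print(d, table_row(d, L))
```

With $L = 4$, the rows for $d = L-2$ and $d = 0$ can be compared directly against the corresponding rows of Table A.1.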
Table A.1: Survived metrics for the adjacent states $b_0(n-1)$ and $b_1(n-1)$

| $k_n - b_0(n-1)$ | $\rho$ | $b_{00}(n)$ | $b_{01}(n)$ | $b_{10}(n)$ | $b_{11}(n)$ |
|------------------|--------|-------------|-------------|-------------|-------------|
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ |
| $L$ | 0 | $L-2$ | $L-1$ | $L-2$ | $L-1$ |
| $L-1$ | 0 | $L-2$ | $L-1$ | $L-2$ | $L-1$ |
| $L-2$ | 1 | $L-2$ | $L-1$ | $L-3$ | $L-2$ |
| $L-3$ | 1 | $L-3$ | $L-2$ | $L-4$ | $L-3$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ |
| 2 | 1 | 2 | 3 | 1 | 2 |
| 1 | 1 | 1 | 2 | 0 | 1 |
| 0 | 0 | 0 | 1 | 0 | 1 |
| $-1$ | 0 | 0 | 1 | 0 | 1 |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ |

Fig. A.1: Possible branch extensions from the adjacent states $b_0(n-1)$ and $b_1(n-1)$

It is apparent from Fig. A.1 that for the case $\rho = 0$, the survived states are reduced to two adjacent states and the proof is complete. For the case $\rho = 1$, however, three states are left, and it remains to be shown that only the two sets $(b_{01}(n),\, b_{00}(n) = b_{11}(n))$ or $(b_{00}(n) = b_{11}(n),\, b_{10}(n))$ can be accepted as the survived states, and that the choice $(b_{01}(n),\, b_{10}(n))$ is impossible. To prove this, we notice in Fig. A.1 that the branch metrics of $b_0(n-1) \to b_{01}(n)$ and $b_1(n-1) \to b_{11}(n)$ are equal, as are those of $b_0(n-1) \to b_{00}(n)$ and $b_1(n-1) \to b_{10}(n)$; we denote these two values by $\beta_1$ and $\beta_2$, respectively. Let us also take $\alpha_0$ and $\alpha_1$ as the state metrics of the states $b_0(n-1)$ and $b_1(n-1)$, respectively.
With these assumptions in place, the following state metrics for the succeeding states are obtained:

$$M_{00} = \alpha_0 + B_{(b_0(n-1) \to b_{00}(n))} = \alpha_0 + \beta_2 \tag{A.6}$$

$$M_{01} = \alpha_0 + B_{(b_0(n-1) \to b_{01}(n))} = \alpha_0 + \beta_1 \tag{A.7}$$

$$M_{10} = \alpha_1 + B_{(b_1(n-1) \to b_{10}(n))} = \alpha_1 + \beta_2 \tag{A.8}$$

$$M_{11} = \alpha_1 + B_{(b_1(n-1) \to b_{11}(n))} = \alpha_1 + \beta_1. \tag{A.9}$$

From the above equations it is readily found that $M_{01} - M_{00} = M_{11} - M_{10} = \beta_1 - \beta_2$, which implies that the difference between the metrics of two adjacent succeeding states is independent of the starting state. The merged state $b_{00} = b_{11}$ keeps the smaller of $M_{00}$ and $M_{11}$ as its state metric. Assuming $\beta_2 < \beta_1$, it follows that $M_{10} < M_{11}$ and $M_{00} < M_{01}$; the merged state then has a metric no larger than $M_{00}$, which is smaller than $M_{01}$, so $b_{01}$ cannot be retained unless $b_{00} = b_{11}$ is retained as well. In the opposite case, $\beta_2 > \beta_1$, we have $M_{11} < M_{10}$, so the merged state has a smaller metric than $b_{10}$, and $b_{10}$ cannot be retained unless $b_{00} = b_{11}$ is retained as well. In both cases the merged state $b_{00} = b_{11}$ is one of the two retained states, so the pair $(b_{01},\, b_{10})$ can never be selected and the two survived states are always adjacent.
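The adjacency property can also be checked numerically. The sketch below is a minimal model under stated assumptions, not the detector circuit of the thesis: it applies one two-state reduced-state update using the branch metric of (A.2) and the candidate rule of (A.4) and (A.5), keeps the two successors with the smallest metrics, and verifies over many random inputs that they remain adjacent. The names `rssd_step` and `check_adjacency`, the uniform random input range, the metric normalisation, and the tie-breaking rule are all choices made for this illustration.

```python
import math
import random


def rssd_step(states, metrics, y, L):
    """One two-state reduced-state update for L-PAM duobinary signalling.

    states  : the two retained symbol indices (b0, b1) at time n-1
    metrics : their state metrics (alpha0, alpha1)
    Returns the two successor states with the smallest metrics and those metrics.
    """
    k = math.floor(y / 2)                               # (A.3)
    best = {}
    for b_prev, alpha in zip(states, metrics):
        b_low = min(max(0, k - b_prev), L - 2)          # (A.4)
        for b_next in (b_low, b_low + 1):               # (A.5)
            m = alpha + (y - 2 * (b_next + b_prev)) ** 2    # state metric using (A.2)
            if b_next not in best or m < best[b_next]:
                best[b_next] = m                        # merged states keep the smaller metric
    # Keep the two smallest metrics; ties are broken by symbol index (an assumption).
    kept = sorted(sorted(best, key=lambda b: (best[b], b))[:2])
    return tuple(kept), tuple(best[b] for b in kept)


def check_adjacency(L=4, steps=10000, seed=0):
    """Return True if the retained states stay adjacent for every random input."""
    rng = random.Random(seed)
    states, metrics = (0, 1), (0.0, 0.0)
    for _ in range(steps):
        y = rng.uniform(-2.0, 4.0 * (L - 1) + 2.0)      # assumed input range, slightly beyond the noise-free one
        states, metrics = rssd_step(states, metrics, y, L)
        if states[1] - states[0] != 1:
            return False
        low = min(metrics)                              # normalise to keep metrics bounded
        metrics = tuple(m - low for m in metrics)
    return True


if __name__ == "__main__":
    print(check_adjacency(L=4), check_adjacency(L=8, seed=1))
```

Starting from the adjacent pair $(0, 1)$ matches the assumption of the proof; starting from non-adjacent states would instead exercise the initial convergence phase, which this sketch does not model.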