# A 112-Gb/s -8.2-dBm Sensitivity 4-PAM Linear TIA in 16-nm CMOS With Co-Packaged Photodiodes

Dhruv Patel, *Graduate Student Member, IEEE*, Alireza Sharif-Bakhtiar, *Member, IEEE*, and Tony Chan Carusone, *Fellow, IEEE*

Abstract—A flip-chip co-packaged linear transimpedance amplifier (TIA) in 16-nm fin field effect transistor (FinFET) CMOS demonstrating 112-Gb/s four-level pulse-amplitude modulation (4-PAM) with -8.2-dBm sensitivity is presented in support for optical receivers required in the next-generation intradata center links. A proposed three-stage TIA is comprised of a shunt-feedback stage followed by digitally programmable continuous-time linear equalizers (CTLEs) and a variable gain amplifier (VGA). Broadband low-noise design is achieved by having the first stage with much lower bandwidth (BW) followed by the proposed BW recovering CTLEs. A low-power design is supported by the inverter-based single-ended architecture with a single-ended-to-pseudo-differential conversion in the last stage. TIA's BW extension is further supported by optimizing the photodiode-to-receiver (PD-to-RX) interconnect and utilizing several inductive peaking techniques. It achieves 63-dB<sub>Ω</sub> gain, 32-GHz BW, and an average input-referred current noise density of 16.9 pA/ $\sqrt{\text{Hz}}$  while operating at 0.9-V supply and consuming 47-mW power. Opto-electrical measurements are performed on a co-packaged prototype comprised of identical proposed TIAs in CMOS with combinations of various commercial PDs and PD-to-RX interconnect lengths confirming 112-Gb/s 4-PAM reception meeting pre-forward error correction (FEC) symbol error rate (SER) of 4.8  $\times$  10<sup>-4</sup> without any post-equalization.

*Index Terms*—100 Gb/s, 400 GbE, CMOS, co-packaged optical receiver front end, continuous-time linear equalizer (CTLE), fin field effect transistor (FinFET), gigabit Ethernet, inverter, low-noise broadband amplifier, optical communications, PAM-4, transimpedance amplifier (TIA).

#### I. INTRODUCTION

The Big Bang of the Internet powering 5G, artificial intelligence (AI), machine learning (ML), video conferencing, the Internet of Things (IoT), and cloud storage applications has continuously increased the demand on the

Manuscript received 2 July 2022; revised 21 September 2022; accepted 25 October 2022. This article was approved by Associate Editor Farhana Sheikh. This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) under Grant 537348-18. (*Corresponding author: Dhruv Patel.*)

Dhruv Patel is with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada (e-mail: dhruv.patel@isl.utoronto.ca).

Alireza Sharif-Bakhtiar was with Huawei Technologies Canada, Markham, ON L3R 5A4, Canada. He is now with Alphawave IP, Toronto, ON M5J 2M4, Canada.

Tony Chan Carusone is with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada, and also with Alphawave IP, Toronto, ON M5J 2M4, Canada.

Color versions of one or more figures in this article are available at https://doi.org/10.1109/JSSC.2022.3218558.

Digital Object Identifier 10.1109/JSSC.2022.3218558

data centers with faster, lower cost, and energy-efficient solutions [1]. The majority of this demand burden is borne by the intra-data center links, as they carry a much larger share of the overall data center traffic. Standardization bodies are working toward taking the throughput capacity of such links beyond 100 GBd/s for inter-rack and intra-rack links covering distances from 1 m up to 2 km [2], [3], [4]. Optical links have been the most favorable domain of communication for this range of distances, as optical channels have negligible frequency-dependent loss [5] compared to electrical links, which suffer from frequency-dependent loss beyond 20 GHz [6].

Considering the simplicity of signal modulation, energy efficiency, and cost efficiency, intensity-modulation and direct-detection (IM/DD) systems are being pushed for emerging 400G-DR4/FR4/LR4 and 800G-DR8 Ethernet standards where +100 Gb/s/ $\lambda$  is targeted [3], [7]. Although optical channels have a negligible loss, the opto-electrical components in the signal path are typically bandwidth (BW) limited. Therefore, given the significant expenditure in optical components and packaging, these standards are adopting four-level pulse-amplitude-modulation (4-PAM) signaling instead of conventional binary-coded non-return-to-zero (NRZ) signaling to double the data rate for a given system BW. However, adopting 4-PAM signaling comes at the cost of a 9.5-dB reduction in signal level spacing and linearity constraints imposed on both optical and electrical components [8]. Moreover, this also entails the extensive use of digital signal processing (DSP), including forward error correction (FEC), adding link latency and power [9], [10], [11].
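As a quick numerical check of the figures quoted above, the following minimal sketch derives the symbol rate, unit interval (UI), Nyquist frequency, and level-spacing penalty for 112-Gb/s 4-PAM. It assumes equally spaced levels and compares against NRZ with the same peak-to-peak swing; the function name `pam_link_numbers` is illustrative only.

```python
import math

def pam_link_numbers(bit_rate_bps: float, levels: int) -> dict:
    """Basic PAM-M link figures, assuming equally spaced levels."""
    bits_per_symbol = math.log2(levels)
    baud = bit_rate_bps / bits_per_symbol          # symbol rate
    return {
        "baud": baud,
        "ui_s": 1.0 / baud,                        # unit interval
        "nyquist_hz": baud / 2.0,                  # Nyquist frequency
        # Level spacing shrinks by (levels - 1) relative to NRZ of equal swing.
        "spacing_penalty_db": 20.0 * math.log10(levels - 1),
    }

nums = pam_link_numbers(112e9, 4)
print(f"{nums['baud'] / 1e9:.0f} GBd, UI = {nums['ui_s'] * 1e12:.2f} ps, "
      f"Nyquist = {nums['nyquist_hz'] / 1e9:.0f} GHz, "
      f"penalty = {nums['spacing_penalty_db']:.2f} dB")
# prints: 56 GBd, UI = 17.86 ps, Nyquist = 28 GHz, penalty = 9.54 dB
```

The 9.54-dB result matches the roughly 9.5-dB spacing reduction quoted in the text.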

Intra-data center links have been heavily reliant on pluggable optical transceiver modules connecting at the edge of the switch board, which is several tens of centimeters away from the switch application-specific integrated circuit (ASIC). Thus far, these pluggable modules have been scaling up with the data rate and channel count to meet the throughput demand. However, they are fast becoming a bottleneck [12], [13] due to the heavy cost and power associated with frequency-dependent losses in printed circuit board (PCB) traces and multiple discrete components in re-timer and buffer circuitry [14], [15], [16]. To combat this, several efforts aim to reduce the number of components while keeping the integration dense, reliable, and cost effective, aided by advances in packaging technologies that allow the fiber-optic cables to come as close as possible to the switch

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/



Fig. 1. Architectures of optical receivers with TIAs integrated in: (a) Bi-CMOS IC and (b) CMOS IC.

ASIC reducing the length of electrical channels [17], [18]. This has opened the doors for co-packaged optics (CPO) or first-level package integration [11], [15], [16], [19], [20], [21], [22], [23], [24], [25] as well as integration of silicon photonics (SiP) along with CMOS switch ASIC [26], [27], [28], [29], [30], [31], [32], [33], [34].

The optical receiver is a crucial determinant of the overall optical link performance. The very front-end block of the optical receiver is a transimpedance amplifier (TIA) whose gain, BW, and noise performance largely determine the overall receiver's sensitivity and power. Having inherently lower noise with superior  $f_T/f_{MAX}$  performance, the SiGe Bi-CMOS process has been the powerhouse for TIAs [35], [36], [37], [38], [39], [40], [41]. Nevertheless, recent advancements in CMOS processes have opened up opportunities for hosting TIAs while providing the luxury of integration with DSP (i.e., adaptation, equalization, and FEC) [29], [32], [33], [36], [42], [43], [44], [45], [46], [47], [48], [49], [50], [51], [52]. Importantly, integrating TIAs in CMOS helps reduce the total component count, as shown in Fig. 1, hence reducing cost and power. However, there are several design challenges with the CMOS process due to the limited supply range and higher thermal noise.

This work, focusing on the optical receiver front end, attempts to fill in the following research gaps. First, it proposes a CMOS-suitable single-ended inverter-based multi-stage linear TIA comprising continuous-time linear equalizers (CTLEs) and a variable gain amplifier (VGA). The TIA is carefully designed with several combinations of low-power, low-noise, and BW extension design choices. Second, it addresses the co-optimization of the photodiode-to-receiver (PD-to-RX) interconnect between the flip-attached PD and the CMOS integrated circuit (IC) hosting the TIA (see Fig. 2). Finally, it provides comprehensive prototype measurement results with multiple PD-to-RX interconnect lengths and multiple commercial PDs achieving 112-Gb/s 4-PAM.



Fig. 2. Co-packaged optical RX front-end architecture targeted in this work with TIA in CMOS and commercial PD flip-attached to a package substrate.

This article, which is an extension of our recent work [53], is organized as follows. Section II provides the TIA architectural design considerations. PD-to-RX co-packaged interconnect optimization is described in Section III. Section IV presents the proposed TIA circuits. Section V describes the co-packaged prototype followed by detailed measurements and comparison with the state of the art in Section VI. Finally, the work is summarized in Section VII.

#### II. ARCHITECTURAL DESIGN CONSIDERATIONS

# A. Co-Packaged Flip-Chip Integration

TIAs to be hosted in CMOS along with other DSP blocks require widely accepted flip-chip packaging rather than traditional wire-bonding solutions to support high I/O density, solid power/ground integrity, and low parasitics. This work focuses on the receiver-side integration of a flip-chip co-packaged TIA in fin field effect transistor (FinFET) CMOS with a discrete commercial PD, as shown in Fig. 2. This heterogeneous integration offers much more flexibility in choosing the most suitable technology for both the TIA and the PD to achieve superior overall performance. For the best performance, the PD-to-RX interconnect is optimized in this work, with more details described in Section III.

## B. Low-Power Design Choices

1) Inverter as a Fundamental Block: To achieve a low-power design, circuits supporting low-voltage operation must be selected. A CMOS inverter is considered a foundational block for the proposed multi-stage TIA design. An inverter is an excellent power-efficient analog amplifier providing  $2 \times g_m$  for the same drain current [54], [55], [56]. It supports low-voltage operation and provides a higher linear swing for a given supply, supporting 4-PAM signaling. In advanced FinFET CMOS nodes, the equal drive strength of equally sized PMOS and NMOS transistors self-biases the inverter at mid-rail when in the shunt-feedback configuration. Hence, no separate biasing or tail current circuitry is required. Furthermore, this also allows the layout to be symmetric about the horizontal axis, and having no other internal parasitic poles makes the layout iteration much easier for maximizing the BW. Importantly, it is compatible with the Cherry-Hooper style configuration supporting multi-stage energy-efficient high-BW broadband amplification [29], [32], [52], [57].

2) Single-Ended TIA With Single-to-Differential Block at the End: Since the PD output in IM/DD systems is single-ended, TIAs from prior work utilize replica-based single-ended-to-pseudo-differential (S2D) block from the first TIA stage [58]

PATEL et al.: 112-Gb/s -8.2-dBm SENSITIVITY 4-PAM LINEAR TIA IN 16-nm CMOS



Fig. 3. Single-ended TIA with S2D block in the last stage.

or have an inverter-based self-referenced S2D block in the first stage [45] or in the second stage [32]. In this work, a single-ended TIA architecture with an inverter-based S2D block in the last stage is chosen, as shown in Fig. 3. This has numerous benefits and some disadvantages. The single-ended architecture reduces power consumption by up to half and thermal noise by up to a factor of  $\sqrt{2}$  compared to replica-based TIA architectures. Compared to other architecture variants, it saves a significant active silicon area. It also removes the significant design overhead of dealing with amplitude and phase mismatches between differential paths, especially when the targeted symbol rate has a UI < 20 ps [32]. However, a single-ended inverter-based architecture is sensitive to power supply noise, which can elevate the power supply induced jitter (PSIJ) [59], [60]. Although this is not required in this work, it can be mitigated by either giving the TIA circuits their own isolated supply voltage or using a dedicated on-chip low dropout regulator (LDO) [48], [61], [62]. For example, on-chip LDOs can offer a power supply rejection ratio (PSRR) of >20 dB up to several hundreds of megahertz [63], [64], [65].

#### C. Broadband Low-Noise Design

A low-noise design approach from [36] and [58] is considered, as shown in Fig. 4. It consists of an inverter-based TIA having the first transimpedance stage (TIS) with significantly low-BW followed by BW recovering CTLEs targeting overall post-layout transimpedance gain of >60 dB $\Omega$  and BW of >35 GHz. To achieve such high-BW performance, the use of passive elements, such as inductors and/or T-coils, is almost always required occupying a significant silicon area. Reducing the counts of such passive elements with the help of active circuit design techniques can further support the compact integration, which can enable multi-channel integration on the same die [45]. With the selected low-noise broadband approach, this work also attempts to minimize the number of passive inductors for compact design.

The first TIS stage having an inverter-based single-stage core amplifier with a fixed voltage supply (constrained by CMOS process) exhibits nearly the second-order system and offers slower roll-off [66]. This makes the design of following CTLEs much easier to recover the desired BW. To keep the in-band ringing and group delay variations well under control, two cascaded CTLEs are considered dividing the BW recovering task among them. Note that a single-stage core amplifier with a shunt-feedback architecture is chosen for a low-noise design instead of a multi-stage core amplifier with a shunt-feedback architecture implemented in [34] and [67]. This is because the latter approach is only efficient when the



Fig. 4. Low-noise inverter-based TIA design approach with low-BW first stage followed by BW recovering CTLE stages.

ratio of  $f_T$  to the desired TIA BW is much higher (i.e., >10), which cannot be satisfied in this case [66]. The latter approach also entails significant design effort associated with meeting sufficient phase margin while dealing with multiple complex poles [66], [68]. The noise reduction insight of the considered approach can be explained as follows. An inverter with resistive shunt feedback ( $R_F$ ) typically results in a second-order system. Assuming the first TIS has maximally flat (Butterworth) second-order characteristics with its core amplifier's (an inverter) dc gain  $A_0 \gg 1$ , the input-referred current noise spectrum of the TIA can be given by the following equation [36], [58]:

$$I_{n}^{2}(f) = \underbrace{\frac{4kT}{R_{F}} + \frac{4kT\gamma}{g_{m}R_{F}^{2}} + \frac{4kT\gamma(2\pi fC_{T})^{2}}{g_{m}}}_{I_{n-\text{TIS}}^{2}(f)} + \underbrace{\frac{V_{n-\text{CTLEs}}^{2}(f)}{R_{F}^{2}} + \frac{V_{n-\text{CTLEs}}^{2}(f)}{R_{F}^{2}} \left(\frac{f}{f_{-3 \text{ dB, TIS}}}\right)^{4}}_{I_{n-\text{CTLEs}}^{2}(f)}$$
(1)

where  $g_m$  is the transconductance of an inverter in the TIS, k is Boltzmann's constant, T is the absolute temperature,  $\gamma$  is the MOSFET thermal noise factor,  $C_T$  is the total input capacitance,  $V_{n-\text{CTLEs}}^2$  is the thermal noise of the CTLEs referred to their input, and  $f_{-3 \text{ dB,TIS}}$  is the -3 dB BW of the TIS stage. Noise terms are grouped by the input-referred current noise contribution from the TIS as  $I_{n-\text{TIS}}^2(f)$  and the CTLEs as  $I_{n-\text{CTLEs}}^2(f)$ .

In an attempt to reduce  $I_n^2(f)$ , a fixed power expenditure and supply voltage imply that  $g_m$  and  $C_T$  remain constant, whereas the selection of  $R_F$  and  $f_{-3 \text{ dB,TIS}}$  becomes a critical design choice. Importantly,  $R_F$  and  $f_{-3 \text{ dB,TIS}}$  in second-order transimpedance systems have an upper bound analyzed in [68], which is given in the following equation:

$$R_F \leq \frac{A_0 f_{-3 \text{ dB,AMP}}}{2\pi \, C_T \, f_{-3 \text{ dB,TIS}}^2} \tag{2}$$

where  $A_0 f_{-3 \text{ dB,AMP}}$  is the gain–BW product of an inverter reflecting the technology parameter, which generally remains



Fig. 5. Illustration of input-referred noise reduction.

constant for a fixed supply voltage. Equation (2) signifies that  $R_F$  and  $f_{-3 \text{ dB,TIS}}$  can be traded with an inverse-quadratic relationship. If  $R_F$  is increased, then all the white noise terms in (1) (terms independent of f) are reduced. The  $f^2$  colored noise term remains unchanged as it only contains constant parameters. The  $f^4$  colored noise term also remains unchanged because the product  $R_F^2 \cdot f_{-3 \text{ dB,TIS}}^4$  in the denominator remains unchanged, since  $R_F$  and  $f_{-3 \text{ dB,TIS}}$  are traded with the inverse-quadratic relationship given by (2). Increasing  $R_F$ , which reduces  $f_{-3 \text{ dB,TIS}}$ , implies that higher peaking from the CTLEs is required to recover the overall targeted TIA BW. This can elevate the CTLE noise  $(V_{n-\text{CTLEs}}^2)$ . However, its impact on  $I_n^2(f)$  is not significant as the CTLEs' noise gets suppressed by  $R_F^2$  as dictated by (1). An illustration of noise reduction is shown in Fig. 5 when  $f_{-3 \text{ dB,TIS}}$  is scaled down by a factor of n, causing  $R_F$  to scale up by a factor of  $n^2$ . Nevertheless, design iterations are required to find the right balance between the choice of  $R_F$  and the colored noise contribution while considering the extent to which the CTLEs are capable of recovering the desired BW. Further details on the transfer characteristics and the input-referred noise contribution of individual blocks in the proposed TIA are reported in Section IV.
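The scaling argument can be checked numerically with a minimal sketch that evaluates the three frequency groupings of (1) before and after trading BW for  $R_F$  along the bound in (2). All device values below ( $g_m$ ,  $C_T$ ,  $\gamma$ , the CTLE noise level, and the starting  $R_F$  and BW) are placeholder assumptions chosen only for illustration, not values from this design, and the CTLE input noise is held constant even though stronger peaking would raise it somewhat.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant (J/K)

def noise_terms(f, r_f, f3db_tis, g_m, c_t, gamma, vn2_ctle, temp=300.0):
    """Split I_n^2(f) from (1) into white, f^2, and f^4 parts (A^2/Hz)."""
    four_kt = 4.0 * K_B * temp
    white = (four_kt / r_f
             + four_kt * gamma / (g_m * r_f**2)
             + vn2_ctle / r_f**2)
    f2 = four_kt * gamma * (2.0 * math.pi * f * c_t) ** 2 / g_m
    f4 = vn2_ctle / r_f**2 * (f / f3db_tis) ** 4
    return white, f2, f4

# Placeholder device values (assumptions, not taken from the design).
params = dict(g_m=0.1, c_t=300e-15, gamma=1.0, vn2_ctle=(1e-9) ** 2)
n = 2.0                       # BW scale-down factor
r_f0, f3db0 = 100.0, 20e9     # hypothetical starting point
f_eval = 28e9                 # Nyquist frequency of a 56-GBd signal

before = noise_terms(f_eval, r_f0, f3db0, **params)
after = noise_terms(f_eval, r_f0 * n**2, f3db0 / n, **params)

for name, b, a in zip(("white", "f^2", "f^4"), before, after):
    print(f"{name:>5}: {b:.3e} -> {a:.3e} A^2/Hz")
```

Under these assumptions, only the white terms drop when  $R_F$  is scaled by  $n^2$  and the TIS BW by  $1/n$ ; the  $f^2$  and  $f^4$  terms are unchanged, in line with the discussion of Fig. 5.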

## III. CO-PACKAGED OPTIMIZATION OF PD-TO-RX INTERCONNECT

The PD-to-RX interconnect (shown in Fig. 2) is in the high-speed signal path, and its design impacts the overall TIA BW. The PD output and the TIA input typically have large capacitances ranging from a few tens to a couple of hundred femtofarads. Extending the BW by inserting an appropriate passive network between the PD and the TIA is a well-recognized technique: on-chip inductors are often inserted in between [69], or the inductive property of a well-modeled bond wire is exploited during co-optimization with the TIA [40], [47], [58], [70], [71], [72]. In this work, having both the PD and the CMOS TIA flip-chip mounted to a common package substrate affords the opportunity for an optimized micro-strip interconnect to extend the BW.

Assuming an interconnect with an ideal transmission line for simplicity, the characteristic impedance,  $Z_0$ , in terms of inductance per unit length, L', and capacitance per unit length, C',



Fig. 6. Test bench model for passive front-end optimization with PD-to-RX interconnect.

is given by the following equation [73]:

$$Z_0 = \sqrt{\frac{L'}{C'}}.$$
(3)

It can be seen that simply increasing  $Z_0$  of the interconnect (i.e., reducing the micro-strip width) decreases its C' and increases its L'. Hence, exploiting the inductive property of PD-to-RX interconnect, optimum  $Z_0$  is selected to achieve passive front-end BW extension in this work.
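The trend described by (3) can be illustrated with a minimal sketch for an ideal lossless line, where  $Z_0 = \sqrt{L'/C'}$  and the phase velocity is  $v = 1/\sqrt{L'C'}$ , so that  $L' = Z_0/v$  and  $C' = 1/(Z_0 v)$ . The effective permittivity `eps_eff` is a placeholder assumption, not a property of the package substrate used in this work, and the function name is illustrative.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum (m/s)

def line_lc(z0_ohm: float, length_m: float, eps_eff: float):
    """Total series L (H) and shunt C (F) of an ideal lossless line section."""
    v = C0 / math.sqrt(eps_eff)       # phase velocity
    l_per_m = z0_ohm / v              # from Z0 = sqrt(L'/C') and v = 1/sqrt(L'C')
    c_per_m = 1.0 / (z0_ohm * v)
    return l_per_m * length_m, c_per_m * length_m

EPS_EFF = 3.0       # assumed effective permittivity (placeholder)
LENGTH = 250e-6     # 250-um interconnect

for z0 in (30, 40, 50, 60, 70, 80):
    l_tot, c_tot = line_lc(z0, LENGTH, EPS_EFF)
    print(f"Z0 = {z0:2d} ohm: L = {l_tot * 1e12:6.1f} pH, C = {c_tot * 1e15:5.1f} fF")
```

For a fixed length and permittivity, raising  $Z_0$  increases the series inductance and lowers the shunt capacitance in proportion, which is the property exploited for the passive BW extension.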

Fig. 6 shows the test bench model used for optimizing the PD-to-RX interconnect. A simplified first-order TIA input impedance model with  $R_{in} = 26 \ \Omega$  and  $C_{in} = 100 \ \text{fF}$  is extracted from the proposed TIA. An electrostatic discharge (ESD) diode with a post-layout extracted capacitance of 80 fF is placed at the input for higher reliability and protection of >1 kV human body model (HBM) and >250 V charged-device model (CDM). The total bump pad capacitance at the RX input is extracted to be 100 fF, which includes the metal-pad-to-substrate capacitance of 70 fF and the pad-to-pad capacitance with neighboring supply/ground pads of 30 fF. To reduce the effective capacitance imposed by the ESD diode and the bump pad, a multi-layer T-coil occupying a  $20 \ \mu\text{m} \times 20 \ \mu\text{m}$  area is inserted, which helps increase the BW by  $2\times$ , as shown in Fig. 6 [29], [74], [75], [76], [77], [78].

The T-coil lump model is extracted using the EMX tool. The bump model with lumped elements is obtained from [79] with the values of physical parameters provided by the fabrication and assembly vendors. Proprietary PD model parameters are obtained from the PD vendor. The PD-to-RX micro-strip interconnect with the desired length and  $Z_0$  is modeled using the ADS EM tool providing layout extracted *S*-parameters.

Optimization of the PD-to-RX interconnect is performed for two different interconnect lengths: 250 and 500  $\mu$m. The optimum  $Z_0$  for a given interconnect length is the one that provides the flattest possible gain with the highest BW. The transfer characteristic from the optical input to the TIA input is simulated across various  $Z_0$ 's, as shown in Fig. 7. It shows that  $Z_0 = 80 \ \Omega$  is the optimum choice for  $L = 250 \ \mu$m [Fig. 7(a)], whereas  $Z_0 = 50 \ \Omega$  is the optimum choice for  $L = 500 \ \mu$m [Fig. 7(b)]. In both cases, the selected  $Z_0$  results in passive front-end BW extension up to 60 GHz. Due to manufacturing limitations,  $Z_0 = 75 \ \Omega$  for  $L = 250 \ \mu$m is fabricated in the prototype presented in Section V. It is also verified that having no interconnect between PD and RX (i.e.,



Fig. 7. Passive front-end optimization for (a) 250- and (b) 500-$\mu$m-long microstrip interconnect from PD-to-RX.



Fig. 8. Simulated 112-Gb/s PAM-4 eye diagrams at the TIA input for PD-to-RX interconnect length of 250  $\mu$ m with (a)  $Z_0 = 30 \Omega$ , (b)  $Z_0 = 40 \Omega$ , (c)  $Z_0 = 50 \Omega$ , (d)  $Z_0 = 60 \Omega$ , (e)  $Z_0 = 70 \Omega$ , and (f)  $Z_0 = 80 \Omega$ .

$L = 0 \ \mu$m) results in the lowest BW simply because there is then no inductive element in between. The choice of optimum  $Z_0$  is further confirmed by the simulated eye diagrams at the TIA input at 112-Gb/s PAM-4 shown in Fig. 8 (for  $L = 250 \ \mu$m) and Fig. 9 (for  $L = 500 \ \mu$m). The selected  $Z_0$  in both cases results in maximum eye opening



Fig. 9. Simulated 112-Gb/s PAM-4 eye diagrams at the TIA input for PD-to-RX interconnect length of 500  $\mu$ m with (a)  $Z_0 = 30 \Omega$ , (b)  $Z_0 = 40 \Omega$ , (c)  $Z_0 = 50 \Omega$ , (d)  $Z_0 = 60 \Omega$ , (e)  $Z_0 = 70 \Omega$ , and (f)  $Z_0 = 80 \Omega$ .

also assuring insignificant group delay variations. Importantly, this passive input network BW extension due to optimized PD-to-RX interconnect is achieved without any additional cost of noise or power. Since the TIA input impedance model and the selected T-coil depend on the TIA design, the overall co-design entails iterative optimization.

#### IV. PROPOSED TIA CIRCUIT IMPLEMENTATION

Fig. 10 shows the proposed three-stage inverter-based TIA operating at 0.9-V supply. Stage-1 is comprised of a shunt-feedback inverter designed with 10-GHz BW (roughly  $1/4^{th}$  of the overall BW) allowing a higher value for  $R_F$  to maximize the dc gain and lower the input-referred current noise, as discussed in Section II-C. Although a higher  $R_F$  value is favorable for noise reduction, its value is constrained by linearity. For example, in the pursuit of reducing overall noise, a much higher  $R_F$  value such as 650  $\Omega$  results in a TIS1 BW of 7 GHz, which can still be recovered by the CTLEs. However, such high gain in TIS1 could induce non-linearity even before the signal BW is recovered by the subsequent CTLEs. Hence, the final value of  $R_F = 324 \Omega$  with 10-GHz BW in TIS1 is carefully selected based on the trade-off between noise and linearity.
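The two  $R_F$ /BW design points quoted above can be cross-checked against the inverse-quadratic trade-off of (2). The sketch below assumes both designs sit on the bound in (2) with the same gain–BW product and  $C_T$ , which is a simplification; the helper name `scaled_rf` is illustrative.

```python
def scaled_rf(rf_ref_ohm: float, bw_ref_hz: float, bw_new_hz: float) -> float:
    """Scale R_F along the bound in (2), where R_F * f_-3dB^2 is constant."""
    return rf_ref_ohm * (bw_ref_hz / bw_new_hz) ** 2

rf_at_7ghz = scaled_rf(324.0, 10e9, 7e9)
print(f"R_F at 7-GHz BW: about {rf_at_7ghz:.0f} ohm")
# prints: R_F at 7-GHz BW: about 661 ohm
```

The result is within about 2% of the 650- $\Omega$  example, suggesting that the two quoted design points are consistent with the scaling in (2).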

Stage-2 is a Cherry–Hooper style stage forming a digitally programmable CTLE. The CTLE is comprised of a transconductor formed by an inverter for low-frequency gain in parallel with a CR-based high-pass filtered inverter. Resistors  $R_{E1}$  and  $R'_{E1}$  are digitally tuned to adjust the high-pass filter cutoff frequency. The PMOS and NMOS transconductances, biased ( $V_B$ ) with a diode-connected inverter, are separately high-pass filtered to provide more programmability. This also helps tune out any ringing in the frequency response arising from process variation and packaging-related parasitics. The CTLE transconductors' output current is converted back to a voltage by another



Fig. 10. Proposed TIA schematic.



Fig. 11. CTLEs and VGA response (stage 2 + stage 3) with min., mid. and max. settings.

TIS stage. Overall, Stage-2 is capable of providing a maximum of 4 dB boost at 25 GHz.

Stage-3 is comprised of a digitally programmable inverter-based VGA, with another CTLE similar to the one in Stage-2 in parallel for further equalization. The last portion of Stage-3 has a large TIS with  $L_1$  in series and  $L_2$  in shunt feedback providing further BW extension [80]. Overall, Stage-3 provides around 7 dB boost at 30 GHz.

The sizing of each TIA stage is performed as follows. The inverter in TIS1 is sized up as much as possible to increase its  $g_m$ , lowering its device noise. However, its sizing is limited by the dominance of self-loading, which increases the capacitance at the TIA input ( $C_T$ ). The sizing of the subsequent transconductors (i.e., CTLE1) in Stage-2 is kept relatively low compared to



Fig. 12. S2D block with pulse response simulations at 56 GBd/s shown.

TIS1 to avoid further loading on TIS1. This allows the inverters in TIS1 to operate at the maximum possible gain–BW product  $(g_m/2\pi C_L)$  supported by the technology. The TIS2 stage, with  $R_F$  of 47  $\Omega$  allowing much higher transimpedance BW, is sized much larger to drive the input capacitance of CTLE2 + VGA. Finally, the TIS3 (S2D) is sized the largest to sufficiently drive the subsequent buffers without impacting the BW performance.

Post-layout simulations of the CTLEs + VGA (Stage-2 + Stage-3 combined) response across maximum-to-minimum code settings are shown in Fig. 11. They highlight that the dc gain and the CTLE peaking frequency can be changed independently. The transient pulse response simulation (post-layout) of the S2D block implemented in this work at 56 GBd/s is shown in Fig. 12. It indicates that the resulting pseudo-differential signal D(s) has roughly 15% increased swing compared to




Fig. 13. Schematic of CML output buffers.

its single-ended output  $D_P(s)$ . It can be easily proven by formulating

$$D_P(s) = -I(s)Z_f(s)\frac{A(s)}{1+A(s)}$$
(4)

and

$$D(s) = D_P(s) - D_N(s) = -I(s)Z_f(s)$$
(5)

where I(s) is the input current to the S2D block from the previous TIA stage and  $Z_f(s)$  is the equivalent impedance of the shunt feedback. It indicates that D(s) has a slight gain of  $D(s)/D_P(s) = (A(s) + 1)/A(s)$  compared to  $D_P(s)$ .
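The ratio of (5) to (4) can be turned into a rough estimate of the amplifier gain implied by the simulated swing increase. The sketch below treats A as a real, positive, frequency-independent number, which is a simplifying assumption (A(s) is complex and frequency dependent in practice); both helper names are illustrative.

```python
def s2d_swing_ratio(a: complex) -> float:
    """|D/D_P| = |(A + 1)/A|, obtained by dividing (5) by (4)."""
    return abs((a + 1) / a)

def gain_for_ratio(ratio: float) -> float:
    """Real, positive A that yields a given D/D_P swing ratio."""
    return 1.0 / (ratio - 1.0)

a_est = gain_for_ratio(1.15)   # roughly 15% extra swing, per the Fig. 12 discussion
print(f"Implied |A| ~ {a_est:.1f}, check ratio = {s2d_swing_ratio(a_est):.2f}")
```

Under this assumption, a 15% larger pseudo-differential swing corresponds to an effective gain magnitude of roughly 6.7.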

To subtract the dc current from the PD, a dc offset compensation (DCOC) feedback loop is formed with a 1.3-M $\Omega$  resistor in series with an inverter-based Miller capacitor (an inverter with a 9.3-pF capacitor in shunt). The DCOC low-pass filter in closed loop provides a lower cutoff frequency of around 1 kHz.

PD cathode is biased at 4 V through on-chip *RC* filter shown in Fig. 10 to decouple noise to the chip ground and to dampen any series resonance due to packaging inductance. The *RC* filter is formed with a metal resistor of 40  $\Omega$  and an MOM capacitor of 80 pF.
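For orientation, the corner frequency of the cathode bias filter can be estimated from its component values. This is a first-order sketch that ignores the PD, bump, and package parasitics; the function name is illustrative.

```python
import math

def rc_corner_hz(r_ohm: float, c_farad: float) -> float:
    """-3 dB corner of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)

f_c = rc_corner_hz(40.0, 80e-12)   # 40-ohm metal resistor, 80-pF MOM capacitor
print(f"RC corner: {f_c / 1e6:.1f} MHz")
# prints: RC corner: 49.7 MHz
```

With these values, supply noise on the 4-V bias is attenuated above roughly 50 MHz in this idealized model.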

Current-mode logic (CML) buffers in this work are chosen specifically for testing the TIA circuits. Although CML buffers require higher static power and supply voltage compared to the CMOS inverter-based buffers used in [32], they offer inherent differential operation with a higher common-mode rejection ratio (CMRR), making them best suited for driving high-speed signals off-chip while coping with ground/supply noise [29], [44], [45], [46], [81]. Three cascaded linear CML buffers equipped with shunt-inductive peaking and operating at 1.2 V (see Fig. 13) are followed by the required T-coil and ESD diodes to drive the 50- $\Omega$  load of the test equipment. They are designed to achieve 0-dB gain and 45-GHz BW. The tail current devices in the CML buffers are designed with a 96-nm gate length  ($6 \times$  the minimum gate length of 16 nm), increasing their output resistance to help improve the CMRR. They provide (simulated) >30 dB CMRR up to the -3-dB TIA BW, converting the pseudo-differential output of the TIS3 to fully differential. All inductors and T-coils are designed with an extracted self-resonant frequency of >80 GHz and a low quality factor of around 3.5 to support broadband operation.

To achieve the maximum possible performance, a careful device layout is considered as follows. Double-sided gate contacts are used to reduce the gate resistance, hence minimizing the noise [82]. A maximum of four fins per finger is used to minimize the self-heating effect affecting the transistor



Fig. 14. Input-referred noise contribution of each block in the proposed TIA chain.



Fig. 15. Simulated post-layout power breakdown of each TIA stage.

performance [83]. Gate-to-drain capacitance  $(C_{gd})$  and drain-to-source capacitance  $(C_{ds})$  are minimized by bringing up the gate, drain, and source connections to higher metals in a staggered and staircase pattern. A minimum of three dummy fingers on both sides of the device is used to minimize the impact of process variations.

The simulated input-referred mean-square current noise contribution of various blocks in the TIA is shown in Fig. 14. It highlights that Stage-1 (TIS1) contributes the majority (55.2%) of the noise. The total integrated input-referred current noise from  $R_F$  is 0.9  $\mu$A<sub>rms</sub>, whereas it is 2.5  $\mu$A<sub>rms</sub> from the device thermal noise of TIS1. Stage-2 and Stage-3 make up 25.5% and 6.6% of the total noise, respectively. The TIS3 (S2D) block in Stage-3 is responsible for only 1.2% of the total noise. The DCOC circuits account for 2.2% of the total noise. Note that the input T-coils also contribute 6.8% due to their parasitic resistance. The CML buffers, being last in the signal path, contribute only 3.7% of the total noise. Fig. 15 shows the simulated power breakdown of each TIA stage with the total TIA power of 51 mW.
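The quoted noise budget can be tallied with a short sketch. It assumes the 1.2% attributed to TIS3 is already included in the 6.6% of Stage-3, and that the  $R_F$  and device thermal noise contributions in TIS1 are uncorrelated so that they add in the mean-square sense; the dictionary keys are labels chosen here for illustration.

```python
import math

# Mean-square noise shares quoted in the text (percent of total).
budget_pct = {
    "Stage-1 (TIS1)": 55.2,
    "Stage-2": 25.5,
    "Stage-3 (assumed to include TIS3's 1.2%)": 6.6,
    "DCOC": 2.2,
    "Input T-coils": 6.8,
    "CML buffers": 3.7,
}
total_pct = sum(budget_pct.values())
post_tis1_pct = budget_pct["Stage-2"] + budget_pct["Stage-3 (assumed to include TIS3's 1.2%)"]

# Uncorrelated rms currents add as root-sum-of-squares (assumption).
tis1_rms_ua = math.hypot(0.9, 2.5)   # R_F noise and TIS1 device thermal noise

print(f"Budget total: {total_pct:.1f}%")
print(f"Stage-2 + Stage-3: {post_tis1_pct:.1f}%")
print(f"Combined R_F + device noise in TIS1: {tis1_rms_ua:.2f} uA rms")
```

Under these assumptions, the listed shares account for the full budget, and the two TIS1 contributions combine to roughly 2.7  $\mu$A<sub>rms</sub>.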

Considering the optimized PD-to-RX response, the total transimpedance response  $(Z_T)$  in the post-layout simulation at TT corner and 25 °C is shown in Fig. 16. Stage-1 response (from PD output to Stage-1 output) achieves a low-BW of 10 GHz, while the following CTLEs extend the total transimpedance BW up to 39 GHz. Note that although Stage-1 has much lower BW, having the BW extension support up to 60 GHz from the input network shown in Figs. 6 and 7 (i.e., optimized T-coil and PD-to-RX interconnect) provides



Fig. 16. TIA response in post-layout simulation with optimized PD-to-RX interconnect. This includes the optimized PD-to-RX interconnect, CML buffers, and  $50-\Omega$  loads.



Fig. 17. Post-extracted simulated 112-Gb/s PAM-4 eye diagrams at the output of (a) PD, (b) low-BW TIS1, (c) CTLE1 transconductor, (d) VGA/CTLE2 transconductor, and (e) TIS3.

relatively much slower roll-off compared to the second- or third-order response. As a result of the slower roll-off from Stage-1, equalizing the gain at the frequencies between 10 and 39 GHz makes the task of two CTLEs easier requiring <7-dB boost at Nyquist per CTLE. This relaxed requirement in CTLEs ultimately helps keeping the group delay variation under control preserving the eye quality.

Also, note that the noise contribution of Stage-1 alone accounts for 55.2% of the total noise while having only 1/4th of the overall BW. On the other hand, Stage-2 + Stage-3, accounting for only  $\sim$32% of the overall noise, recover the targeted BW, owing to the broadband low-noise design approach. Fig. 17 shows the resulting 112-Gb/s PAM-4 eye diagrams at the output of the PD, low-BW TIS1, CTLE1 transconductor, VGA/CTLE2 transconductor, and, finally, TIS3.

#### V. PROTOTYPE

A co-packaged prototype housing four identical proposed TIAs in 16-nm FinFET CMOS, exercised with multiple commercial PDs labeled PD-[A/B/C] and two PD-to-RX interconnect lengths (250  $\mu$m with Z<sub>0</sub> = 75  $\Omega$  and 500  $\mu$m with Z<sub>0</sub> = 50  $\Omega$ ), is assembled. An overview of the prototype with the PD and PD-to-RX interconnect specifications is shown in Fig. 18.

Fig. 19(a) shows the assembled unit comprised of two commercial back-illuminated PD ICs flip-attached onto a package



|                                          | RX1  | RX2  | RX3  | RX4   |
|------------------------------------------|------|------|------|-------|
|                                          | PD-A | PD-B | PD-C | Elec. |
| PD responsivity (A/W)                    | 0.6  | 0.6  | 0.7  | N/A   |
| PD capacitance (fF)                      | 60   | 60   | 70   | N/A   |
| PD O-E BW (GHz)                          | 40   | 40   | 35   | N/A   |
| PD-to-RX trace length (µm)               | 250  | 500  | 250  | 250   |
| PD-to-RX interconnect Z <sub>0</sub> (Ω) | 75   | 50   | 75   | 75    |

Fig. 18. Co-packaged optical RX front-end prototype overview with four identical TIAs in CMOS exercised with various commercial PDs and PD-to-RX interconnect lengths.





Fig. 19. (a) Assembled prototype of 16-nm FinFET CMOS chip and commercial PDs. (b) RX slice with area breakdown.

substrate alongside the CMOS IC (2 mm  $\times$  2 mm). On-package probing is used to capture output signals. The entire assembly




Fig. 20. Electrical measurements: (a) transimpedance, (b) group delay, (c) output THD, and (d) 1-dB compression point.



Fig. 21. Single-ended output voltage noise distribution measurements: (a) with RX OFF (i.e., inherent noise of the scope) and (b) with RX ON.

is mounted on a PCB that gives access to dc supplies and digital control signals. PD-C is a singlet, whereas PD-A and PD-B are two from an array of four PDs. The RX slice with dimensions is shown in Fig. 19(b), where the TIA (including ESD diodes and input T-coil) and DCOC blocks occupy an area of 0.0165 mm<sup>2</sup>.

#### VI. MEASUREMENTS

## A. Electrical Measurements

Electrical characterization is performed on RX4, where *S*-parameter measurements are made using a Keysight N5227B PNA. The *S*-parameter inferred transimpedance for the low, mid, and maximum gain settings is shown in Fig. 20(a), revealing a maximum dc transimpedance gain of 63 dB$\Omega$  and a 3-dB BW of 32 GHz. Digital tuning of the VGA between its maximum and minimum codes reveals a TIA dynamic range of 9 dB. The group delay measurements in Fig. 20(b) confirm a group delay variation of <±5 ps up to 32 GHz.

Single-ended total harmonic distortion (THD) measurements, shown in Fig. 20(c) and (d), are obtained with a Rohde & Schwarz FSW-26 spectrum analyzer using a 1-GHz tone. They demonstrate that up to 670  $\mu A_{pp}$  of input PD current can be handled with 8% THD, and that the 1-dB compression point at maximum gain occurs at 320  $\mu A_{pp}$ .

Noise measurements are performed on RX1 with PD-A attached and with the laser source turned off, i.e., no input optical signal is applied. The single-ended output voltage noise distribution is measured using a Keysight DCA-X




Fig. 22. Optical measurements setup: (a) test bench schematic, (b) lab setup with assembled prototype.

sampling scope with an 86118A module. Results shown in Fig. 21(a) and (b) are taken with RX OFF and RX ON, respectively, to de-embed the scope noise. The integrated input-referred current noise of the TIA is then

$$i_{n,in \text{ (rms)}} = \frac{2 \times \sqrt{(2.23 \ \text{mV})^2 - (0.63 \ \text{mV})^2}}{10^{\left(\frac{63 \ \text{dB}}{20}\right)}} = 3.0 \ \mu\text{A}_{\text{rms}}$$
(6)

or, equivalently, the average input-referred current noise density of 3.0  $\mu$ A<sub>rms</sub>/(32 GHz)<sup>1/2</sup> = 16.9 pA/(Hz)<sup>1/2</sup>.
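The arithmetic of (6) and of the resulting noise density can be reproduced as follows. This is a minimal sketch using only the measured values quoted in the text; the factor of 2 is taken directly from (6), and reading it as a single-ended-to-differential conversion is an assumption. Variable names are hypothetical.

```python
import math

V_RX_ON_MV = 2.23     # single-ended output noise with RX ON, Fig. 21(b)
V_RX_OFF_MV = 0.63    # scope noise alone (RX OFF), Fig. 21(a)
GAIN_DB_OHM = 63.0    # measured transimpedance gain
BW_GHZ = 32.0         # measured 3-dB BW

# De-embed the scope noise: uncorrelated noise powers subtract.
v_tia_se_mv = math.sqrt(V_RX_ON_MV ** 2 - V_RX_OFF_MV ** 2)

# Factor of 2 as in (6); assumed to convert single-ended to differential.
v_tia_mv = 2.0 * v_tia_se_mv

# Convert the gain from dB-ohm to ohm and refer the noise to the input.
z_t_ohm = 10.0 ** (GAIN_DB_OHM / 20.0)
i_n_ua_rms = v_tia_mv * 1e-3 / z_t_ohm * 1e6

# Average noise density over the 3-dB BW, in pA/sqrt(Hz).
density_pa = i_n_ua_rms * 1e-6 / math.sqrt(BW_GHZ * 1e9) * 1e12

print(f"input-referred noise : {i_n_ua_rms:.2f} uA rms")
print(f"average noise density: {density_pa:.1f} pA/sqrt(Hz)")
```

Note that 16.9 pA/$\sqrt{\text{Hz}}$ follows from the unrounded current of about 3.03 $\mu$A<sub>rms</sub>; starting from the rounded 3.0 $\mu$A<sub>rms</sub> would give roughly 16.8 pA/$\sqrt{\text{Hz}}$.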

## B. Optical Measurements

Fig. 22(a) shows the optical measurement test bench schematic. Optical measurements are performed at  $\lambda =$ 1310 nm (*O*-band) generated by a continuous-wave distributed-feedback (DFB) laser source. The laser source feeds a commercial Mach–Zehnder modulator (MZM) through a single-mode fiber. The RF input of the MZM (EO BW of 35 GHz) is provided by a discrete driver amplifying the output of a Keysight M8194A arbitrary waveform generator (AWG). A 112-Gb/s 4-PAM QPRBS13 pattern generated from



Fig. 23. 112-Gb/s 4-PAM RX differential output eye diagrams with -6.1-dBm input OMA. Top: without any on-scope equalization on RX1, RX2, and RX3. Bottom left: with four-tap FFE on-scope equalization on RX1. Bottom right: with four-tap FFE + four-tap DFE on-scope equalization on RX1.

the AWG results in 4-PAM eyes with RLM > 0.95 and outer ER > 3 dB at the MZM output. An optical probe with a lensed fiber tip is used to free-space couple light onto a PD. Differential outputs are probed through on-package pads and measured using the Keysight 86118A module. The co-packaged prototype mounted on the probe station with the test equipment is shown in Fig. 22(b).

The 112-Gb/s 4-PAM optical measurements are performed individually on all three receivers, RX[1:3], with -6.1-dBm optical modulation amplitude (OMA). Differential output eye diagrams satisfying the pre-FEC symbol error rate (SER) limit of  $4.8 \times 10^{-4}$  (indicated by the eye contours) without any on-scope equalization are shown at the top of Fig. 23. RX1 and RX2, whose PDs have the same specifications but whose PD-to-RX interconnect lengths differ by  $2 \times$ , achieve similar eye quality owing to the optimized  $Z_0$  chosen for each interconnect length. RX3, whose PD-C has about 17% higher responsivity than PD-A/B of RX1/2 (0.7 versus 0.6 A/W), achieves a slightly larger eye opening than RX1/2. Eye quality improves further after applying an on-scope four-tap feed-forward equalizer (FFE) and a four-tap FFE + four-tap decision feedback equalizer (DFE), as shown at the bottom of Fig. 23. Operating at the maximum gain setting, the TIA consumes 47 mW, while the CML buffers consume 30 mW.

The 100-Gb/s 4-PAM output eye diagram on RX1 without on-scope equalization at -4.1-dBm input OMA is shown in Fig. 24(a). To further show the potential of the TIA, 150-Gb/s 4-PAM at -3.6-dBm input OMA is measured after on-scope 16-tap FFE + 2-tap DFE equalization and shown in Fig. 24(b).

Fig. 25(a) shows the 112-Gb/s 4-PAM SER across the input OMA achieving the sensitivity of -8.2-dBm OMA at the pre-FEC SER limit of  $4.8 \times 10^{-4}$  without on-scope equalization. Enabling on-scope four-tap FFE, four-tap DFE and

TABLE I Sensitivity Summary

IEEE JOURNAL OF SOLID-STATE CIRCUITS

|                          | Sensitivity at pre-FEC<br>BER of 1x10<sup>-12</sup> (dBm) |       |       |      | Sensitivity at pre-FEC<br>SER limit of 4.8x10<sup>-4</sup> (dBm) |         |        |
|--------------------------|------------------------------------------------------------|-------|-------|------|-------------------------------------------------------------------|---------|--------|
| Datarate (Gb/s)          | 50                                                         | 56    | 64    | 72   | 100                                                               | 106.25  | 112    |
| Modulation               | NRZ                                                        | NRZ   | NRZ   | NRZ  | 4-PAM                                                             | 4-PAM   | 4-PAM  |
| Without EQ               | -13.9                                                      | -11.7 | -7.6  | -5.6 | -12.5                                                             | -10.6   | -8.2   |
| 4-tap FFE                | < -15.1                                                    | -12.7 | -10.4 | -7.5 | -13.5                                                             |         | < -9.6 |
| 4-tap DFE                | -                                                          | -     | -     | -    | -14.5                                                             | -12.2   |        |
| 4-tap FFE +<br>4-tap DFE | -                                                          | -     | -     | -    | -15                                                               | < -12.5 |        |



Fig. 24. RX differential output eye diagrams of (a) 100-Gb/s 4-PAM at -4.1-dBm input OMA without on-scope equalization and (b) 150-Gb/s 4-PAM at -3.6-dBm input OMA after 16-tap FFE + 2-tap DFE on-scope equalization.

a combination of both further reduces the SER, demonstrating the suitability of the proposed TIA for the front end of DSP-based optical receivers. Similar results are obtained for 100-Gb/s 4-PAM [Fig. 25(b)], achieving a sensitivity of -12.5-dBm OMA without on-scope equalization. Note that, due to linearity limitations, the SER does not reduce further beyond -3.3- and -4-dBm input OMA at 112- and 100-Gb/s 4-PAM, respectively. Considering the input

|                                                                  | JSSC'19 [44]                     | ESSCIRC'18 [46]                             | JSSC'19 [32]                                         | ISSCC'21 [43]                                                | This Work                                                                                          |
|------------------------------------------------------------------|----------------------------------|---------------------------------------------|------------------------------------------------------|--------------------------------------------------------------|----------------------------------------------------------------------------------------------------|
| CMOS technology                                                  | 28nm Bulk                        | 28nm Bulk                                   | 16nm FinFET                                          | 28nm Bulk                                                    | 16nm FinFET                                                                                        |
| Supply (V)                                                       | 1.2                              | 1.2, 2.5 (Vreg.)                            | 1.8 (Vreg.)                                          | 1.5 (Vreg.)                                                  | 0.9                                                                                                |
| Datarate (Gb/s)                                                  | 53                               | 112                                         | 106.25                                               | 100                                                          | 112                                                                                                |
| Signaling                                                        | NRZ                              | 4-PAM                                       | 4-PAM                                                | 4-PAM                                                        | 4-PAM                                                                                              |
| TIA gain (dBΩ)                                                   | 74                               | 65                                          | 78                                                   | 66*                                                          | 63                                                                                                 |
| TIA 3dB-BW (GHz)                                                 | 27                               | 60                                          | 27                                                   | 20*                                                          | 32                                                                                                 |
| Max 4-PAM output swing<br>w/ 50 Ω term. (mV <sub>P-P</sub> diff) | N/A                              | 300                                         | 300                                                  | 620                                                          | 450                                                                                                |
| Input ref. noise (pA/√Hz)                                        | 14*                              | 19.3                                        | 16.7                                                 | 21.2*                                                        | 16.9                                                                                               |
| Power (mW)                                                       | 34.6                             | 107**                                       | 60.8**                                               | 117**                                                        | 77 (47 TIA + 30 CML BUF)                                                                           |
| Input ESD (Yes/No)                                               | No                               | No                                          | No                                                   | No                                                           | Yes (80 fF)                                                                                        |
| PD capacitance (fF)                                              | 80                               | N/A<br>(Electrical<br>measurements<br>only) | 10                                                   | 70                                                           | 60                                                                                                 |
| PD responsivity (A/W)                                            | 0.55                             |                                             | 0.96                                                 | 1                                                            | 0.6                                                                                                |
| NRZ sensitivity at<br>< 1x10 <sup>-12</sup> BER (dBm)            | -6 dBm @53 Gb/s<br>w/o scope eq. |                                             | N/A                                                  | N/A                                                          | –11.7 @56 Gb/s<br>–5.6 @72 Gb/s<br>w/o scope eq.                                                   |
| 4-PAM sensitivity at < 4.8x10 <sup>-4</sup> SER (dBm)            | N/A                              |                                             | -11 dBm @<br>106.25 Gb/s<br>w/ 5-tap FFE<br>scope eq. | –8.3 dBm @100Gb/s<br>w/ 2-tap FFE + 2-tap<br>DFE on-chip eq. | -15 @100 Gb/s<br>< -12.3 @106.25 Gb/s<br>< -9.6 @112 Gb/s<br>w/ 4-tap FFE + 4-tap DFE<br>scope eq. |

TABLE II Comparison With State-of-the-Art Works

\*Simulated

\*\*w/ output buffers and voltage regulators (approx. 20 - 40% of power consumed in voltage regulation)







Fig. 26. (a) Measured NRZ BER across input OMA at 72, 64, and 56 Gb/s. 72-Gb/s NRZ RX differential output eye diagrams at -5.6-dBm OMA: (b) without on-scope equalization and (c) with on-scope four-tap FFE equalization.

Fig. 25. Measured 4-PAM SER across input OMA: (a) 112 and (b) 100 Gb/s.

sensitivity and the maximum allowable input power before the sensitivity begins to degrade, the RX input dynamic range is measured to be -3.3 dBm - (-8.2 dBm) = 4.9 dB at 112-Gb/s 4-PAM and -4 dBm - (-12.5 dBm) = 8.5 dB at 100-Gb/s 4-PAM.
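For orientation, the 112-Gb/s OMA limits can be translated into peak-to-peak photocurrents and an input dynamic range. The sketch below assumes the measurement is taken on RX1 with PD-A (0.6 A/W, Fig. 18) and ignores coupling loss between the fiber probe and the PD; both are assumptions made for illustration, and the function names are hypothetical.

```python
def dbm_to_mw(p_dbm):
    """Convert optical power from dBm to mW."""
    return 10.0 ** (p_dbm / 10.0)


def oma_to_current_ua_pp(oma_dbm, responsivity_a_per_w):
    """Peak-to-peak photocurrent in uA for a given OMA (mW * A/W = mA)."""
    return responsivity_a_per_w * dbm_to_mw(oma_dbm) * 1e3


RESPONSIVITY = 0.6          # PD-A responsivity in A/W (assumed RX1)
SENSITIVITY_DBM = -8.2      # 112-Gb/s 4-PAM sensitivity without EQ
OVERLOAD_DBM = -3.3         # OMA beyond which the SER stops improving

i_sens_ua = oma_to_current_ua_pp(SENSITIVITY_DBM, RESPONSIVITY)
i_overload_ua = oma_to_current_ua_pp(OVERLOAD_DBM, RESPONSIVITY)
dynamic_range_db = OVERLOAD_DBM - SENSITIVITY_DBM

print(f"current at sensitivity: {i_sens_ua:.1f} uA pp")
print(f"current at overload   : {i_overload_ua:.1f} uA pp")
print(f"input dynamic range   : {dynamic_range_db:.1f} dB")
```

Under these assumptions the upper OMA limit corresponds to roughly 280 $\mu A_{pp}$, which is of the same order as the 320-$\mu A_{pp}$ 1-dB compression point reported in the electrical measurements.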

To demonstrate support for low-latency links, optical measurements with an NRZ PRBS13 test pattern are performed at 72/64/56 Gb/s, and bit error rate (BER) measurements across input OMA are shown in Fig. 26(a). Eye diagrams of 72-Gb/s NRZ achieving a BER below  $1 \times 10^{-12}$  without and with


(four-tap FFE) on-scope equalization at -5.6-dBm OMA are shown in Fig. 26(b) and (c), respectively. The sensitivity measurements are summarized in Table I across different datarates.

Table II compares this work with state-of-the-art CMOS implementations. Compared to [44], this work has 19% higher BW and supports NRZ data rates up to 72 Gb/s at a sensitivity similar to that of [44], but at the cost of 36% higher TIA power consumption (47 versus 34.6 mW). The work of [46] (electrical measurements only) offers an impressive BW of 60 GHz with a gain of 65 dB$\Omega$ , but at the cost of higher power and noise than this work. The inverter-based TIA of [32], also implemented in 16-nm FinFET, achieves a superior gain of 78 dB$\Omega$ , but with 16% lower BW (27 versus 32 GHz) and similar noise performance. The work of [43], capable of 100 Gb/s, offers 3 dB higher gain but 37% lower BW than this work. To the best of the authors' knowledge, even with approximately 40% lower PD responsivity and higher PD+ESD capacitance, this work offers the highest opto-electrically measured data rate and the best sensitivity at equivalent data rates compared with [32] and [43].
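One way to put the power numbers of Table II on a common footing is energy per bit, i.e., power divided by data rate. The sketch below uses only the power and data rate rows of Table II; it is an illustrative derived metric rather than a figure reported in this work, and the entries are not strictly like-for-like because some include output buffers and voltage regulators (see the table footnotes).

```python
# (power in mW, data rate in Gb/s) taken from Table II.
table_ii = {
    "[44]": (34.6, 53.0),
    "[46]": (107.0, 112.0),
    "[32]": (60.8, 106.25),
    "[43]": (117.0, 100.0),
    "This work (TIA only)": (47.0, 112.0),
    "This work (TIA + CML buffers)": (77.0, 112.0),
}

# mW divided by Gb/s gives pJ/bit directly (1e-3 W / 1e9 b/s = 1e-12 J/b).
pj_per_bit = {
    name: power_mw / rate_gbps
    for name, (power_mw, rate_gbps) in table_ii.items()
}

for name, energy in sorted(pj_per_bit.items(), key=lambda kv: kv[1]):
    print(f"{name:32s} {energy:.2f} pJ/bit")
```

The values obtained for [44] and [46] agree with the 0.65- and 0.96-pJ/bit figures quoted in the titles of those references, which suggests the table entries are being read consistently.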

#### VII. CONCLUSION

A 112-Gb/s 4-PAM linear TIA in 16-nm FinFET CMOS, co-packaged with various PDs and optimized PD-to-RX interconnect lengths, is presented. The inverter-based single-ended TIA operating at 0.9 V achieves 63-dB$\Omega$  gain, 32-GHz BW, and an average input-referred current noise density of 16.9 pA/$\sqrt{\text{Hz}}$  while consuming 47 mW. The PD-to-RX interconnect is co-optimized to maximize the passive front-end BW. Optical measurements at 112-Gb/s 4-PAM reveal a sensitivity of -8.2 dBm without any on-scope equalization while meeting the pre-FEC SER limit of  $4.8 \times 10^{-4}$ . The presented TIA with the considered co-packaged architecture demonstrates strong potential for future high-density, low-energy, and low-cost 100+ Gb/s class optical receivers required by the emerging 400-G/800-G/1.6-T Ethernet standards.

#### ACKNOWLEDGMENT

The authors would like to thank Huawei Technologies, Canada, for technical discussions (especially Dr. D. Dunwell and Dr. H. Shakiba) and fabrication support; Keysight Technologies for lending the test equipment; Microteck Inc. for assembly support; and Prof. Nijwm Wary (now at IIT Bhubaneswar, Bhubaneswar, India) for tape-out support.

#### References

- [1] Cisco Annual Internet Report (2018–2023) White Paper. Accessed: Jun. 20, 2022. [Online]. Available: https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html
- [2] 100G Lambda Multi-Source Agreement. Accessed: Jun. 20, 2022. [Online]. Available: https://100glambda.com/specifications
- [3] IEEE P802.3df 200 Gb/s, 400 Gb/s, 800 Gb/s, and 1.6 Tb/s Ethernet Task Force. Accessed: May 6, 2022. [Online]. Available: https://www.ieee802.org/3/df/
- [4] Common Electrical I/O (CEI)-224G. Accessed: May 6, 2022. [Online]. Available: https://www.oiforum.com/technical-work/hot-topics/common-electrical-i-o-cei-224g/

- [5] P. Yang et al., "Inter/intra-chip optical interconnection network: Opportunities, challenges, and implementations," in *Proc. 10th IEEE/ACM Int. Symp. Netw.-on-Chip (NOCS)*, Sep. 2016, pp. 1–8.
- [6] W. W. Beyene, Y.-C. Hahm, D. Secker, J. Ren, D. Mullen, and Y. Shlepnev, "Lessons learned: How to make predictable PCB interconnects for data rates of 50 Gbps and beyond," in *Proc. DesignCon*, Mar. 2014.
- [7] T. Wettlin, S. Calabro, T. Rahman, J. Wei, N. Stojanovic, and S. Pachnicke, "DSP for high-speed short-reach IM/DD systems using PAM," *J. Lightw. Technol.*, vol. 38, no. 24, pp. 6771–6778, Dec. 15, 2020.
- [8] B. Moeneclaey et al., "Design and experimental verification of a transimpedance amplifier for 64-Gb/s PAM-4 optical links," J. Lightw. Technol., vol. 36, no. 2, pp. 195–203, Jan. 15, 2018.
- [9] J. Bailey et al., "A 112-Gb/s PAM-4 low-power nine-tap sliding-block DFE in a 7-nm FinFET wireline receiver," *IEEE J. Solid-State Circuits*, vol. 57, no. 1, pp. 32–43, Jan. 2022.
- [10] M.-A. LaCroix et al., "A 116 Gb/s DSP-based wireline transceiver in 7 nm CMOS achieving 6 pJ/b at 45 dB loss in PAM-4/duo-PAM-4 and 52 dB in PAM-2," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2021, pp. 132–134.
- [11] D. M. Kuchta, "High capacity VCSEL-based links," in Proc. Opt. Fiber Commun. Conf. Exhib. (OFC), 2017, pp. 1–94.
- [12] A. Ghiasi, "Large data centers interconnect bottlenecks," Opt. Exp., vol. 23, no. 3, pp. 2085–2090, Feb. 2015. [Online]. Available: http://opg.optica.org/oe/abstract.cfm?URI=oe-23-3-2085
- [13] H. J. S. Dorren, E. H. M. Wittebol, R. D. Kluijver, G. G. D. Villota, P. Duan, and O. Raz, "Challenges for optically enabled high-radix switches for data center networks," *J. Lightw. Technol.*, vol. 33, no. 5, pp. 1117–1125, Mar. 1, 2015.
- [14] B. Buscaino, E. Chen, J. W. Stewart, T. Pham, and J. M. Kahn, "External vs. integrated light sources for intra-data center co-packaged optical interfaces," *J. Lightw. Technol.*, vol. 39, no. 7, pp. 1984–1996, Apr. 1, 2021.
- [15] S. Fathololoumi et al., "1.6 Tbps silicon photonics integrated circuit and 800 Gbps photonic engine for switch co-packaging demonstration," *J. Lightw. Technol.*, vol. 39, no. 4, pp. 1155–1161, Feb. 15, 2021.
- [16] B. Buscaino, B. D. Taylor, and J. M. Kahn, "Multi-Tb/s-per-fiber coherent co-packaged optical interfaces for data center switches," *J. Lightw. Technol.*, vol. 37, no. 13, pp. 3401–3412, Jul. 1, 2019.
- [17] A. V. Krishnamoorthy et al., "From chip to cloud: Optical interconnects in engineered systems," *J. Lightw. Technol.*, vol. 35, no. 15, pp. 3103–3115, Aug. 1, 2017.
- [18] C. Li, T. Li, G. Guelbenzu, B. Smalbrugge, P. Stabile, and O. Raz, "Chip scale 12-channel 10 Gb/s optical transmitter and receiver subassemblies based on wet etched silicon interposer," *J. Lightw. Technol.*, vol. 35, no. 15, pp. 3229–3236, Aug. 1, 2017.
- [19] R. Stone et al., "Co-packaged optics for data center switching," in Proc. Eur. Conf. Opt. Commun. (ECOC), Dec. 2020, pp. 1–3.
- [20] K. Hosseini et al., "8 Tbps co-packaged FPGA and silicon photonics optical IO," in Proc. Opt. Fiber Commun. Conf. (OFC), 2021, pp. 1–3.
- [21] L. Brusberg et al., "Fiber-to-waveguide connector for co-packaged optics," in Proc. Eur. Conf. Opt. Commun. (ECOC), Sep. 2017, pp. 1–3.
- [22] B. Wang, W. V. Sorin, P. Rosenberg, L. Kiyama, S. Mathai, and M. R. T. Tan, "4 × 112 Gbps/fiber CWDM VCSEL arrays for co-packaged interconnects," in *Proc. Opt. Fiber Commun. Conf. (OFC)*. Optica Publishing Group, 2020, pp. 1–3. [Online]. Available: http://opg.optica.org/abstract.cfm?URI=OFC-2020-M2A.4, doi: 10.1364/OFC.2020.M2A.4.
- [23] Q. Hao et al., "A chip-level optical interconnect for CPU," *IEEE Photon. Technol. Lett.*, vol. 33, no. 16, pp. 852–855, Aug. 15, 2021.
- [24] S. Choi, Y. Bae, S. Oh, S. Han, D. D. Park, and Y. J. Park, "A new FOWLP platform for hybrid optical packaging—Demonstration on 100 Gbps transceiver," in *Proc. Opt. Fiber Commun. Conf. (OFC)*, 2021, pp. 1–3.
- [25] J. Sharma et al., "Silicon photonic microring-based 4 × 112 Gb/s WDM transmitter with photocurrent-based thermal control in 28-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 57, no. 4, pp. 1187–1198, Apr. 2022.
- [26] H. Li et al., "A 3-D-integrated silicon photonic microring-based 112-Gb/s PAM-4 transmitter with nonlinear equalization and thermal control," *IEEE J. Solid-State Circuits*, vol. 56, no. 1, pp. 19–29, Jan. 2021.
- [27] K. Yu et al., "A 25 Gb/s hybrid-integrated silicon photonic source-synchronous receiver with microring wavelength stabilization," *IEEE J. Solid-State Circuits*, vol. 51, no. 9, pp. 2129–2141, Sep. 2016.
- [28] A. H. Talkhooncheh et al., "A 2.4 pJ/b 100 Gb/s 3D-integrated PAM-4 optical transmitter with segmented SiP MOSCAP modulators and a 2-channel 28 nm CMOS driver," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2022, pp. 284–286.

- [29] M. Raj et al., "Design of a 50-Gb/s hybrid integrated Si-photonic optical link in 16-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 55, no. 4, pp. 1086–1095, Apr. 2020.
- [30] M. Rakowski et al., "A 4 × 20 Gb/s WDM ring-based hybrid CMOS silicon photonics transceiver," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 1–3.
- [31] D. F. Logan et al., "800 Gb/s silicon photonic transmitter for CoPackaged optics," in *Proc. IEEE Photon. Conf. (IPC)*, Sep. 2020, pp. 1–2.
- [32] K. R. Lakshmikumar et al., "A process and temperature insensitive CMOS linear TIA for 100 Gb/s/λ PAM-4 optical links," *IEEE J. Solid-State Circuits*, vol. 54, no. 11, pp. 3180–3190, Nov. 2019.
- [33] S. Saeedi, S. Menezo, G. Pares, and A. Emami, "A 25 Gb/s 3D-integrated CMOS/silicon-photonic receiver for low-power high-sensitivity optical communication," *J. Lightw. Technol.*, vol. 34, no. 12, pp. 2924–2933, Jun. 15, 2016.
- [34] M. G. Ahmed et al., "A 12-Gb/s –16.8-dBm OMA sensitivity 23-mW optical receiver in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 53, no. 2, pp. 445–457, Feb. 2018.
- [35] K. Vasilakopoulos, S. P. Voinigescu, P. Schvan, P. Chevalier, and A. Cathelin, "A 92 GHz bandwidth SiGe BiCMOS HBT TIA with less than 6 dB noise figure," in *Proc. IEEE Bipolar/BiCMOS Circuits Technol. Meeting (BCTM)*, Oct. 2015, pp. 168–171.
- [36] I. García-López, A. Awny, P. Rito, M. Ko, A. C. Ulusoy, and D. Kissinger, "100 Gb/s differential linear TIAs with less than 10 pA/√Hz in 130-nm SiGe:C BiCMOS," *IEEE J. Solid-State Circuits*, vol. 53, no. 2, pp. 458–469, Feb. 2018.
- [37] A. Awny et al., "A dual 64 Gbaud 10 k $\Omega$  5% THD linear differential transimpedance amplifier with automatic gain control in 0.13  $\mu$ m BiCMOS technology for optical fiber coherent receivers," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Jan. 2016, pp. 406–407.
- [38] T. Takemoto et al., "A 50-Gb/s high-sensitivity (-9.2 dBm) low-power (7.9 pJ/bit) optical receiver based on 0.18-μm SiGe BiCMOS technology," *IEEE J. Solid-State Circuits*, vol. 53, no. 5, pp. 1518–1538, May 2018.
- [39] M. M. Khafaji, G. Belfiore, and F. Ellinger, "A linear 65-GHz bandwidth and 71-dBΩ gain TIA with 7.2 pA/√Hz in 130-nm SiGe BiCMOS," *IEEE Solid-State Circuits Lett.*, vol. 4, pp. 76–79, 2021.
- [40] C. Li and S. Palermo, "A low-power 26-GHz transformer-based regulated cascode SiGe BiCMOS transimpedance amplifier," *IEEE J. Solid-State Circuits*, vol. 48, no. 5, pp. 1264–1275, May 2013.
- [41] M. G. Ahmed, T. N. Huynh, C. Williams, Y. Wang, P. K. Hanumolu, and A. Rylyakov, "34-GBd linear transimpedance amplifier for 200-Gb/s DP-16-QAM optical coherent receivers," *IEEE J. Solid-State Circuits*, vol. 54, no. 3, pp. 834–844, Mar. 2019.
- [42] Q. Pan et al., "A 30-Gb/s 1.37-pJ/b CMOS Receiver for optical interconnects," J. Lightw. Technol., vol. 33, no. 4, pp. 778–786, Feb. 15, 2015.
- [43] H. Li, J. Sharma, C.-M. Hsu, G. Balamurugan, and J. Jaussi, "A 100 Gb/s -8.3 dBm-sensitivity PAM-4 optical receiver with integrated TIA, FFE and direct-feedback DFE in 28 nm CMOS," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2021, pp. 190–192.
- [44] L. Szilagyi, J. Pliva, R. Henker, D. Schoeniger, J. P. Turkiewicz, and F. Ellinger, "A 53-Gbit/s optical receiver frontend with 0.65 pJ/bit in 28-nm bulk-CMOS," *IEEE J. Solid-State Circuits*, vol. 54, no. 3, pp. 845–855, Mar. 2019.
- [45] I. Ozkaya et al., "A 64-Gb/s 1.4-pJ/b NRZ optical receiver data-path in 14-nm CMOS FinFET," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3458–3473, Dec. 2017.
- [46] H. Li, G. Balamurugan, J. Jaussi, and B. Casper, "A 112 Gb/s PAM4 linear TIA with 0.96 pJ/bit energy efficiency in 28 nm CMOS," in *Proc. IEEE 44th Eur. Solid State Circuits Conf. (ESSCIRC)*, Sep. 2018, pp. 238–241.
- [47] J. E. Proesel et al., "A 32 Gb/s, 4.7 pJ/bit optical link with -11.7 dBm sensitivity in 14-nm FinFET CMOS," *IEEE J. Solid-State Circuits*, vol. 53, no. 4, pp. 1214–1226, Apr. 2018.
- [48] T. Takemoto, H. Yamashita, T. Yazaki, N. Chujo, Y. Lee, and Y. Matsuoka, "A 25-to-28 Gb/s high-sensitivity (-9.7 dBm) 65 nm CMOS optical receiver for board-to-board interconnects," *IEEE J. Solid-State Circuits*, vol. 49, no. 10, pp. 2259–2276, Oct. 2014.
- [49] A. Sharif-Bakhtiar and A. C. Carusone, "A 20 Gb/s CMOS optical receiver with limited-bandwidth front end and local feedback IIR-DFE," *IEEE J. Solid-State Circuits*, vol. 51, no. 11, pp. 2679–2689, Nov. 2016.
- [50] S. G. Kim, C. Hong, Y. S. Eo, J. Kim, and S. M. Park, "A 40-GHz mirrored-cascode differential transimpedance amplifier in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 54, no. 5, pp. 1468–1474, May 2019.

- [51] M. H. Nazari and A. Emami-Neyestanak, "A 24-Gb/s double-sampling receiver for ultra-low-power optical communication," *IEEE J. Solid-State Circuits*, vol. 48, no. 2, pp. 344–357, Feb. 2013.
- [52] S. Daneshgar, H. Li, T. Kim, and G. Balamurugan, "A 128 Gb/s, 11.2 mW single-ended PAM4 linear TIA with 2.7 μA<sub>rms</sub> input noise in 22 nm FinFET CMOS," *IEEE J. Solid-State Circuits*, vol. 57, no. 5, pp. 1397–1408, May 2022.
- [53] D. Patel, A. Sharif-Bakhtiar, and A. C. Carusone, "A 112 Gb/s -8.2 dBm sensitivity 4-PAM linear TIA in 16 nm CMOS with co-packaged photodiodes," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Apr. 2022, pp. 1–2.
- [54] W. Bae, "CMOS inverter as analog circuit: An overview," J. Low Power Electron. Appl., vol. 9, no. 3, p. 26, Aug. 2019. [Online]. Available: https://www.mdpi.com/2079-9268/9/3/26
- [55] K. Zheng et al., "An inverter-based analog front-end for a 56-Gb/s PAM-4 wireline transceiver in 16-nm CMOS," *IEEE Solid-State Circuits Lett.*, vol. 1, no. 12, pp. 249–252, Dec. 2018.
- [56] K. Zheng, Y. Frans, K. Chang, and B. Murmann, "A 56 Gb/s 6 mW 300 μm<sup>2</sup> inverter-based CTLE for short-reach PAM2 applications in 16 nm CMOS," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Apr. 2018, pp. 1–4.
- [57] M. M. P. Fard, O. Liboiron-Ladouceur, and G. E. R. Cowan, "1.23-pJ/bit 25-Gb/s inductor-less optical receiver with low-voltage silicon photodetector," *IEEE J. Solid-State Circuits*, vol. 53, no. 6, pp. 1793–1805, Jun. 2018.
- [58] D. Li et al., "A low-noise design technique for high-speed CMOS optical receivers," *IEEE J. Solid-State Circuits*, vol. 49, no. 6, pp. 1437–1447, Jun. 2014.
- [59] J. Zerbe et al., "A 5 Gb/s link with matched source synchronous and common-mode clocking techniques," *IEEE J. Solid-State Circuits*, vol. 46, no. 4, pp. 974–985, Apr. 2011.
- [60] J.-H. Kang et al., "A 24-Gb/s/pin 8-Gb GDDR6 with a half-rate daisy-chain-based clocking architecture and I/O circuitry for low-noise operation," *IEEE J. Solid-State Circuits*, vol. 57, no. 1, pp. 212–223, Jan. 2022.
- [61] T. Takemoto et al., "A 25-Gb/s 2.2-W optical transceiver using an analog FE tolerant to power supply noise and redundant data format conversion in 65-nm CMOS," in *Proc. Symp. VLSI Circuits (VLSIC)*, Jun. 2012, pp. 106–107.
- [62] Y. Lu, Y. Wang, Q. Pan, W.-H. Ki, and C. P. Yue, "A fully-integrated low-dropout regulator with full-spectrum power supply rejection," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 62, no. 3, pp. 707–716, Mar. 2015.
- [63] T. Toifl, C. Menolfi, P. Buchmann, M. Kossel, T. Morf, and M. L. Schmatz, "A 1.25–5 GHz clock generator with high-bandwidth supply-rejection using a regulated-replica regulator in 45-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 11, pp. 2901–2910, Nov. 2009.
- [64] Y. Lim, J. Lee, S. Park, Y. Jo, and J. Choi, "An external capacitorless low-dropout regulator with high PSR at all frequencies from 10 kHz to 1 GHz using an adaptive supply-ripple cancellation technique," *IEEE J. Solid-State Circuits*, vol. 53, no. 9, pp. 2675–2685, Sep. 2018.
- [65] J.-H. Jang, H.-D. Gwon, T.-H. Kong, J.-H. Yang, and B.-D. Choi, "A 0.5–1 V, –68 dB power supply rejection capacitorless analog LDO using voltage-to-time conversion in 28-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 57, no. 8, pp. 2462–2473, Aug. 2022.
- [66] D. Li, L. Geng, F. Maloberti, and F. Svelto, "Overcoming the transimpedance limit: A tutorial on design of low-noise TIA," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 69, no. 6, pp. 2648–2653, Jun. 2022.
- [67] D. Li et al., "Low-noise broadband CMOS TIA based on multi-stage stagger-tuned amplifier for high-speed high-sensitivity optical communication," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 66, no. 10, pp. 3676–3689, Oct. 2019.
- [68] E. Säckinger, "The transimpedance limit," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 57, no. 8, pp. 1848–1856, Aug. 2010.
- [69] B. Analui and A. Hajimiri, "Bandwidth enhancement for transimpedance amplifiers," *IEEE J. Solid-State Circuits*, vol. 39, no. 8, pp. 1263–1270, Aug. 2004.
- [70] M. Neuhäuser, H.-M. Rein, and H. Wernz, "Design and realization of low-noise, high-gain Si-bipolar transimpedance preamplifiers for 10 Gb/s optical-fiber links," in *Proc. IEEE Bipolar/BiCMOS Circuits Technol. Meeting*, Oct. 1994, pp. 163–166.
- [71] M. Neuhäuser, H.-M. Rein, H. Wernz, and A. Felder, "13 Gbit/s Si bipolar preamplifier for optical front ends," *Electron. Lett.*, vol. 29, pp. 492–493, Mar. 1993. [Online]. Available: https://digitallibrary.theiet.org/content/journals/10.1049/el\_19930329

- [72] W. Li et al., "100 Gbit/s co-designed optical receiver with hybrid integration," *Opt. Exp.*, vol. 29, no. 10, pp. 14304–14313, May 2021. [Online]. Available: http://www.osapublishing.org/oe/abstract.cfm?URI=oe-29-10-14304
- [73] D. M. Pozar, *Microwave Engineering*. Hoboken, NJ, USA: Wiley, Nov. 2011. [Online]. Available: https://www.xarg.org/ref/a/0470631554/
- [74] L. Selmi, D. B. Estreich, and B. Ricco, "Small-signal MMIC amplifiers with bridged T-coil matching networks," *IEEE J. Solid-State Circuits*, vol. 27, no. 7, pp. 1093–1096, Jul. 1992, doi: 10.1109/4.142607.
- [75] J. Paramesh and D. J. Allstot, "Analysis of the bridged T-coil circuit using the extra-element theorem," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 53, no. 12, pp. 1408–1412, Dec. 2006, doi: 10.1109/TCSII.2006.885971.
- [76] S. Shekhar, J. S. Walling, and D. Allstot, "Bandwidth extension techniques for CMOS amplifiers," *IEEE J. Solid-State Circuits*, vol. 41, no. 11, pp. 2424–2439, Nov. 2006, doi: 10.1109/JSSC.2006.883336.
- [77] S. C. D. Roy, "Comments on 'analysis of the bridged T-coil circuit using the extra-element theorem'," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 54, no. 8, pp. 673–674, Aug. 2007.
- [78] M. Kossel et al., "A T-coil-enhanced 8.5 Gb/s high-swing SST transmitter in 65 nm bulk CMOS with < -16 dB return loss over 10 GHz bandwidth," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 2905–2920, Dec. 2008, doi: 10.1109/JSSC.2008.2006230.
- [79] N. Pham, B. Mutnury, E. Matoglu, M. Cases, and D. De Araujo, "Package model for efficient simulation, design, and characterization of high performance electronic systems," in *Proc. IEEE Workshop Signal Propag. Interconnects*, May 2006, pp. 39–42.
- [80] C.-H. Wu, C.-H. Lee, W.-S. Chen, and S.-I. Liu, "CMOS wideband amplifiers using multiple inductive-series peaking technique," *IEEE J. Solid-State Circuits*, vol. 40, no. 2, pp. 548–552, Feb. 2005.
- [81] P. Heydari and R. Mohanavelu, "Design of ultrahigh-speed low-voltage CMOS CML buffers and latches," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 12, no. 10, pp. 1081–1093, Oct. 2004.
- [82] V. Subramanian et al., "Identifying the bottlenecks to the RF performance of FinFETs," in *Proc. 23rd Int. Conf. VLSI Design*, Jan. 2010, pp. 111–116.
- [83] S. Makovejev, S. Olsen, and J. Raskin, "RF extraction of self-heating effects in FinFETs," *IEEE Trans. Electron Devices*, vol. 58, no. 10, pp. 3335–3341, Oct. 2011.



**Dhruv Patel** (Graduate Student Member, IEEE) received the B.A.Sc. degree in electrical engineering from the University of Waterloo, Waterloo, ON, Canada, in 2016, and the M.A.Sc. degree in electrical engineering from the University of Toronto, Toronto, ON, Canada, in 2020, where he is currently pursuing the Ph.D. degree with the Integrated Systems Laboratory, with a focus on optical communication links in CMOS.

He was involved in research on variation-tolerant sub-threshold SRAM circuits during his undergraduate studies.

Mr. Patel was a recipient of the Outstanding Student Paper Award at the Custom Integrated Circuits Conference 2022. He received the Ontario Graduate Scholarship and the NSERC Scholarship for his doctoral studies.



**Alireza Sharif-Bakhtiar** (Member, IEEE) received the B.S. degree in electrical engineering from the Sharif University of Technology, Tehran, Iran, in 2008, the M.S. degree from The University of British Columbia, Vancouver, BC, Canada, in 2011, and the Ph.D. degree in electrical engineering from the University of Toronto, Toronto, ON, Canada, in 2017.


He is currently working as a Principal Engineer at Alphawave IP, Toronto. His current research interests include high-speed optical interconnects.



**Tony Chan Carusone** (Fellow, IEEE) received the Ph.D. degree from the University of Toronto, Toronto, ON, Canada, in 2002.

He has been a Professor with the Department of Electrical and Computer Engineering, University of Toronto. He has also been a consultant to industry in the areas of integrated circuit design and digital communication since 1997. He is currently the Chief Technology Officer of Alphawave IP Group, Toronto. He has coauthored the popular textbooks *Analog Integrated Circuit Design* (along with D. Johns and K. Martin) and *Microelectronic Circuits* (along with A. Sedra, K. C. Smith, and V. Gaudet, Eighth Edition).

Prof. Chan Carusone coauthored the Best Student Papers at the 2007, 2008, 2011, and 2022 Custom Integrated Circuits Conferences, the Best Invited Paper at the 2010 Custom Integrated Circuits Conference, the Best Paper at the 2005 Compound Semiconductor Integrated Circuits Symposium, the Best Young Scientist Paper at the 2014 European Solid-State Circuits Conference, and the Best Paper at DesignCon 2021. He was the Editor-in-Chief of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS in 2009 and an Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS from 2010 to 2017. He was a Distinguished Lecturer of the IEEE Solid-State Circuits Society from 2015 to 2017. He has served on the Technical Program Committee of several IEEE conferences, including the International Solid-State Circuits Conference from 2016 to 2021. He is also the Editor-in-Chief of the IEEE SOLID-STATE CIRCUITS LETTERS.