Low-Power CMOS Receivers For Short Reach Optical Communication

Alireza Sharif-Bakhtiar\textsuperscript{1}, Michael G. Lee\textsuperscript{2}, Anthony Chan Carusone\textsuperscript{1}

\textsuperscript{1}Department of Electrical and Computer Engineering, University of Toronto, e-mail: alireza.sharif-bakhtiar@isl.utoronto.ca
\textsuperscript{2}Fujitsu Labs of America

Abstract—Emerging applications for short-reach optical communication require low-power receiver circuits in nanoscale CMOS technologies. An analysis of optical receivers with broadband input transimpedance reveals that their power consumption increases rapidly as bit-rate increases. This has motivated work on bandwidth-limited optical receiver front-ends. For example, receivers employing decision feedback equalization (DFE) and correlated-double sampling (CDS) are analyzed, showing that they significantly relax the bandwidth requirements of the analog front-ends, permitting their low-power implementation in CMOS. Finally the design of an optical receiver utilizing an integrate-and-dump (ID) front-end is described. The receiver is implemented in 28nm CMOS and achieves -8.3dBm sensitivity at 20Gbps consuming 0.7pJ/b.

I. INTRODUCTION

The rapid increase in the speed demanded of wireline links within data centers and high performance computing has increased the size and weight of copper cabling and the power consumption of the associated transceiver circuits, making them increasingly prohibitive. Hence, optical links utilizing multi-mode fiber (MMF) are increasingly seen as preferable for link reaches up to 300m, particularly for links of 3-50m where the optical dispersion of the fiber is negligible [1]. Over such short distances, large numbers of transceivers must operate in parallel with high port density. This makes it important to reduce the power consumption per link. Ultimately it is desirable to integrate the optical transceiver circuits on the same die as large CMOS ASICs that direct and process data traffic. Such integration would eliminate the need for very-short reach (VSR) wireline transceivers communicating over PCB traces between the ASIC and off-chip optical transceiver circuits. It requires, however, that the optical receiver front-end be implemented in nanoscale CMOS technologies.

Implementation of the optical receiver front-end in nanoscale CMOS presents both opportunities and challenges. Nanoscale CMOS is notorious for its relatively low intrinsic transistor gain, making it difficult to realize a high-gain low noise front-end. However, CMOS affords a designer very high-speed switches and low-power high-speed latches and digital logic. This paper will illustrate design techniques developed for optical receivers that exploit the benefits of nanoscale CMOS to obviate its challenges.

This paper is organized as follows: Section II analyzes optical receivers with conventional wideband transimpedance front-ends, which typically have a bandwidth of 70% of the link symbol rate or more. It explains how such front-ends become increasingly less power efficient as the data rate increases. This has motivated the development of optical receiver front-ends with bandwidths far below the symbol rate. Section III explains the benefits and the trade-offs in designing a receiver that combines a limited bandwidth front-end with a decision feedback equalizer (DFE). It will be seen that reducing the front-end bandwidth and removing the resulting intersymbol interference (ISI) using a DFE not only results in higher vertical eye-opening for a given power consumption, it also improves the receiver sensitivity. Section IV explains the operation of correlated double sampling (CDS) receivers. Due to their feedforward structure, these receivers do not suffer from a critical timing path in a feedback loop, which can limit the maximum operating speed of DFE-based receivers. However, their sensitivity is limited below what is achievable with DFE-based receivers. Section V explains the operation and design of an integrate-and-dump (ID) receiver. It will be seen that an ID stage can provide a large gain and also filter out the high frequency noise of the stages preceding it. Unlike DFE-based receivers, ID receivers lack a feedback loop and hence can be operated at higher speed. A prototype is designed and fabricated in 28nm CMOS. The receiver reaches -8.3dBm sensitivity at 20Gbps with 0.7pJ/b power efficiency.

II. WIDEBAND FRONT-ENDS

The discrete reverse-biased photodiodes typical for short-reach optical communication can be modeled with a current source in parallel with a parasitic capacitance $C_{PD}$. The photocurrent ($I_{PD}$) is linearly proportional to the power of incoming light with the conversion gain referred to as the photodiode’s “responsivity” (A/W). Typical photodiode responsivities for this application are in the range of 0.5-1 A/W, resulting in input signal currents in the range of 100-200 μA.

The photodiode is connected to the receiver chip, which has a front-end input resistance of R_{IN} and generates a voltage V_{IN} (Fig. 2a). Larger R_{IN} is desirable to maximize the voltage swing at the input of the receiver. However, C_{PD} and other parasitic capacitances at the input of the receiver form a pole with R_{IN} at \( f_{IN} = \frac{1}{2\pi R_{IN}(C_{IN}+C_{PD})} \). Fig. 3a plots the amplifier output voltage for different R_{IN} values, assuming an ideal amplifier gain A and a rectangular input current pulse. It can be seen that a small value of R_{IN} maximizes the bandwidth and reduces post-cursor ISI, indicated by the samples \( V_{A,k},\ k \geq 1 \) in Fig. 3a. However, it also reduces the amplitude of the main sample \( (V_{A,0}) \). On the other hand, a larger value of R_{IN} makes the main sample bigger but also increases the post-cursor ISI. A worst case data pattern will cause all post-cursor ISI to add up constructively and reduce the vertical eye-opening at V_{A}. The amplitude of the main cursor \( V_{A,0} \), the sum of all post-cursor ISI samples \( \sum_{k=1}^{\infty} |V_{A,k}| \), and their difference, which indicates the worst case vertical eye-opening, are shown in Fig. 3b. All values are normalized to \( \frac{A I_{PD}}{f_{bit}(C_{PD}+C_{IN})} \), which is the voltage swing at V_{A} in one bit period when R_{IN} \( \rightarrow \infty \) and the input signal is integrated on C_{PD} + C_{IN}. It can be seen that, as a tradeoff between gain and ISI, \( f_{IN} = 0.3 f_{bit} \) results in the largest normalized eye-opening of 0.37.
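The gain/ISI tradeoff described above can be reproduced numerically with a short sketch. It is not taken from the original analysis: it assumes a single-pole input node, an ideal rectangular current pulse lasting one bit period, and a sampling instant at the end of that pulse. The function names are ours. Under these assumptions the main cursor is \( (1-a)/x \) and the summed post-cursor ISI is \( a/x \), with \( x = 2\pi f_{IN}/f_{bit} \) and \( a = e^{-x} \), both normalized as in the text.

```python
import math

def normalized_eye(f_in_over_fbit):
    """ISI-only eye metrics of a single-pole input node for a 1-UI current pulse.

    Voltages are normalized to A*I_PD / (f_bit*(C_PD + C_IN)).
    Assumption: the sample is taken at the end of the pulse.
    """
    x = 2.0 * math.pi * f_in_over_fbit   # x = T_bit / (R_IN * C_total)
    a = math.exp(-x)                     # decay of the pulse response over one UI
    main = (1.0 - a) / x                 # main cursor V_A,0
    isi = a / x                          # sum of all post-cursor samples
    return {"main": main, "isi": isi, "eye": main - isi}

def best_bandwidth(step=0.001):
    """Brute-force search for the f_IN/f_bit that maximizes the worst-case eye."""
    candidates = [k * step for k in range(1, int(round(1.0 / step)) + 1)]
    best = max(candidates, key=lambda f: normalized_eye(f)["eye"])
    return best, normalized_eye(best)["eye"]

if __name__ == "__main__":
    f_opt, eye_opt = best_bandwidth()
    print(f"optimum f_IN/f_bit = {f_opt:.3f}, normalized eye = {eye_opt:.3f}")
```

With this simplified model the optimum lands at roughly 0.27 of the bit rate with a normalized eye of about 0.37, in line with the rounded values quoted above.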

Since C_{IN} + C_{PD} can be large, R_{IN} (for \( f_{IN} = 0.3 f_{bit} \)) is limited to a small value, thus providing a small voltage swing at V_{IN}. The way to get around this limit is to use a transimpedance amplifier (TIA) at the input of the chip. Fig. 4 shows three popular examples of TIAs. Fig. 4a is a common gate stage with two poles, input resistance R_{IN} \( \approx 1/g_{m} \), and transimpedance gain R_{A}. The first pole is at the input node with \( f_{IN} = \frac{g_{m}}{2\pi(C_{IN}+C_{PD})} \). The second pole is at the output node at \( f_{A} = \frac{1}{2\pi R_{A} C_{A}} \). For large values of g_{m}, f_{IN} becomes large enough to have negligible effect on the overall bandwidth. Therefore, the overall transimpedance gain and the bandwidth are set at the output node and the value of R_{A} that maximizes the eye opening at V_{A} in the presence of ISI is the one for which \( f_{A} = 0.3 f_{bit} \). Since typically \( C_{A} \ll (C_{IN} + C_{PD}) \), a much higher transimpedance gain can be reached by utilizing a TIA.

At high bit rates the input transistor of the common-gate stage has to be biased at high bias currents for a sufficiently high g_{m}. The high bias currents increase the voltage drop across R_{A} and make it difficult to combine high gain and high bandwidth under the low supply voltages of nanoscale CMOS technologies. Therefore, in Fig. 4b an auxiliary amplifier is added to reduce the input resistance to \( R_{IN} = \frac{1}{g_{m}(A+1)} \). The auxiliary amplifier helps reduce the bias current \( I_{D1} \) by reducing the required g_{m} by a factor \( (A+1) \).

Another approach is to use a shunt-feedback amplifier as the receiver front-end TIA (Fig. 4c). Assuming an ideal amplifier, the input resistance of the feedback TIA is \( R_{IN} = \frac{R_{F}}{A+1} \). This results in a transimpedance gain of \( R_{T} \approx R_{F} \) and a bandwidth of \( f_{IN} = \frac{A+1}{2\pi R_{F}(C_{IN}+C_{PD})} \).

When including the finite bandwidth and self-loading of the amplifier, the maximum achievable transimpedance gain of a TIA drops with the square of the TIA bandwidth, hence \( A_{TIA} \propto 1/f_{bit}^{2} \) [2]. In SiGe BiCMOS technologies, where intrinsic transistor gain and bandwidth are very high, the tradeoff can still result in an acceptable combination of gain and bandwidth [3], [4]. However, in CMOS the tradeoff results in low transimpedance gain. To compensate for the low gain of the wideband TIA, additional wideband voltage amplifiers should be added following the TIA to reach the minimum required gain of the front-end (A_{T}). This is shown in Fig. 2b where N identical voltage amplifiers are connected in series to achieve an overall gain of \( A_{PA} = A_{T} / A_{TIA} \propto f_{bit}^{2} \). To avoid introducing additional ISI due to the post amplifier, its bandwidth (f_{PA}) needs to be approximately equal to f_{bit}. Assuming each voltage amplifier stage has a first-order response with a dc gain of A_{s} and bandwidth of f_{s}, the overall gain and bandwidth of the post amplifier become [5],

\[
A_{PA} = A_{s}^{N} \tag{1}
\]

\[
f_{PA} = f_{s} \sqrt{2^{1/N} - 1} \tag{2}
\]

Noting that the power of the post amplifier is roughly proportional to the square of its gain-bandwidth product, using (1) and (2) we have,

\[
P_{PA} \propto \left( \sqrt[N]{A_{PA}} \, \frac{f_{PA}}{\sqrt{2^{1/N} - 1}} \right)^{2} \tag{3}
\]

As f_{bit} increases, A_{TIA} drops as \( 1/f_{bit}^{2} \). As a result \( A_{PA} \) has to increase as \( f_{bit}^{2} \) for a given \( A_{T} \). Moreover, each voltage amplifier stage needs to have a higher bandwidth, so a larger number of stages, N, is likely needed to reach the target gain. As a result several terms in (3) increase with f_{bit}, meaning that the power consumption of the post amplifier increases proportionally to $f_{bit}^{x}$ with $x > 2$.
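As a rough illustration of this scaling, the sketch below evaluates the per-stage gain and bandwidth implied by (1) and the standard cascade relation \( f_{s} = f_{PA}/\sqrt{2^{1/N}-1} \), and takes the power as proportional to the square of the per-stage gain-bandwidth product \( A_{s} f_{s} \), which is our reading of the proportionality stated above. The number of stages is held fixed and the reference gain and frequency values are arbitrary illustrative numbers, not data from this work.

```python
import math

def stage_requirements(a_pa, f_pa, n):
    """Per-stage dc gain and bandwidth for N identical first-order stages."""
    a_s = a_pa ** (1.0 / n)                          # from A_PA = A_s^N
    f_s = f_pa / math.sqrt(2.0 ** (1.0 / n) - 1.0)   # cascade bandwidth shrinkage
    return a_s, f_s

def relative_power(a_pa, f_pa, n):
    """Quantity assumed proportional to post-amplifier power: (A_s * f_s)^2."""
    a_s, f_s = stage_requirements(a_pa, f_pa, n)
    return (a_s * f_s) ** 2

def power_exponent(n, a_pa_ref=10.0, f_ref=1.0, scale=2.0):
    """Estimate x in P ~ f_bit^x for fixed N, with A_PA ~ f_bit^2 and f_PA ~ f_bit."""
    p_ref = relative_power(a_pa_ref, f_ref, n)
    p_scaled = relative_power(a_pa_ref * scale ** 2, f_ref * scale, n)
    return math.log(p_scaled / p_ref) / math.log(scale)

if __name__ == "__main__":
    for n in (2, 3, 4, 6):
        print(f"N = {n}: power grows as f_bit^{power_exponent(n):.2f}")
```

In this simplified model the exponent is \( 2 + 4/N \) for a fixed N, which is always above 2; allowing N to grow with the bit rate, as the text anticipates, changes the exact value but not the conclusion.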

Several techniques have addressed the challenge of developing a low-power optical receiver front-end at high speed. For example, bandwidth extension techniques for optical front-ends have been demonstrated [6] - [8], as has the use of more advanced CMOS [9] and BiCMOS technologies [3], [4] where higher gain-bandwidths are possible. Sections III, IV, and V instead discuss alternative techniques that can potentially lower the power consumption of optical receivers and allow them to be more easily integrated into CMOS ASICs.

III. BANDWIDTH-LIMITED FRONT-END WITH DFE

Equalizing a bandwidth limited signal with a continuous-time linear equalizer (CTLE) and/or feedforward equalizer (FFE) amplifies the high-frequency input-referred noise of the front-end and hence degrades the signal to noise ratio of the front-end output, reducing the sensitivity of the receiver. However, a DFE filters the “noiseless” digital signal at the output of the slicer to predict and remove post-cursor ISI as shown in Fig. 5 without boosting the high frequency noise. Using Fig. 3b it can be seen that if a DFE removes all post-cursor ISI, the vertical eye-opening becomes simply the amplitude of the main cursor. As a result, reducing the front-end bandwidth to near zero (i.e. an integrating front-end) and utilizing a DFE to cancel the resulting ISI results in 2.7 times larger eye-opening compared to the case with a first-order front-end and no DFE. However, this analysis considers only ISI and neglects noise. In fact, reducing the front-end bandwidth too far integrates and excessively amplifies the low frequency noise (without much increase in the main cursor’s amplitude) and actually degrades the sensitivity of the receiver. Fig. 6 plots the ratio of vertical eye-opening to rms noise, assuming a first-order front-end with bandwidth $f_A$, for receivers without a DFE and with 1-tap, 2-tap, and ideal DFEs, in two cases: (a) white input-referred noise; (b) input-referred noise with a zero at $0.3 f_{bit}$.
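The ISI-only part of this argument, including the factor of 2.7, can be checked with a small sketch. It assumes a single-pole front-end, a one-UI rectangular input pulse sampled at the end of the pulse, and a DFE that perfectly cancels its first few post-cursors; noise is ignored, exactly as in the ISI-only analysis above. The function names and the `dfe_taps` convention (`None` meaning an ideal DFE) are ours.

```python
import math
from typing import Optional

def eye_opening(f_a_over_fbit: float, dfe_taps: Optional[int] = 0) -> float:
    """Normalized worst-case eye of a single-pole front-end followed by a DFE.

    dfe_taps = 0 means no DFE, k > 0 cancels the first k post-cursors,
    and None models an ideal DFE that removes all post-cursor ISI.
    """
    x = 2.0 * math.pi * f_a_over_fbit
    a = math.exp(-x)
    main = -math.expm1(-x) / x                  # (1 - a) / x, main cursor
    if dfe_taps is None:
        residual = 0.0
    else:
        residual = a ** (dfe_taps + 1) / x      # post-cursors left after k taps
    return main - residual

def best_eye_without_dfe(step: float = 0.001) -> float:
    """Largest eye achievable with no DFE over a grid of bandwidths."""
    return max(eye_opening(k * step, 0) for k in range(1, 1001))

if __name__ == "__main__":
    ratio = eye_opening(1e-3, None) / best_eye_without_dfe()
    print(f"integrating front-end + ideal DFE vs. best no-DFE eye: {ratio:.2f}x")
```

The ratio evaluates to slightly under 2.7 in this model, and adding taps at a fixed bandwidth monotonically increases the eye. Since noise is not modeled, the sketch says nothing about the sensitivity optimum shown in Fig. 6.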

This property is used in [11] to maximize the gain of the TIA and improve the input referred noise of the receiver. The receiver uses a low-bandwidth TIA at the front-end. Due to the low bandwidth requirement the TIA can provide a large DC gain without consuming very large power. The low bandwidth of the TIA also filters high frequency noise that limits the sensitivity of the receiver. The ISI introduced by the low-bandwidth TIA is then removed by a 2-tap DFE providing adequate signal integrity. The work in [11] achieves an excellent sensitivity of -22dBm at 4Gbps.

In wideband front-ends that do not rely on equalization (Section II), other poles and variations in the front-end parameters mean that, as a rule of thumb, the TIA bandwidth is typically chosen as $f_A \approx 0.7 f_{bit}$.
The work in [11] was followed by [12], where the TIA is replaced by a simple resistor to form a low-bandwidth node at the input. The value of the resistor is chosen so the input bandwidth is 0.12f_{bit}, providing large gain but with significant post-cursor ISI. Because the input node forms an RC-filter, it results in predictable, exponentially decaying ISI. An infinite impulse response DFE (IIR-DFE) [15] can remove such exponentially decaying ISI, and is incorporated into the optical front-end of [12] as shown in Fig. 7a. Input signal current pulses I_{PD}, after passing through the filter formed by R_{IN}(C_{IN}+C_{PD}), become V_{IN,PD}. Feedback current pulses, I_{DFE}, are passed through the same RC-filter, becoming V_{IN,DFE}. The superposition of V_{IN,PD} and V_{IN,DFE} is V_{IN}, which ideally has no residual ISI. Because both I_{PD} and I_{DFE} pass through the same RC-filter there is no need to adjust the DFE-IIR time constant. The only value to be set is the DFE tail current. However, due to the delay of the DFE feedback, in practice the DFE is unable to remove all the post-cursor ISI. Fig. 7b plots the receiver signals in the case with no delay in the feedback loop, and the more realistic case with finite delay \( \Delta T \) (in this example \( \Delta T = 0.5 \text{UI} \)). It can be seen that in the presence of \( \Delta T \) the IIR DFE does not fully remove the post-cursor ISI. An extra (FIR) tap can alleviate this problem but it is missing from the presented work [13].
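The zero-delay case can be illustrated with a sample-rate sketch of the shared-RC idea. This is an idealized model of our own, not the circuit of [12]: symbols are taken as ±1 (differential signalling), currents are piecewise constant over each UI, the feedback acts with no loop delay, and the node is observed once per UI. In this model the tail current that nulls the ISI works out to \( I_{DFE} = a\,I_{sig} \) with \( a = e^{-2\pi f_{IN}/f_{bit}} \); the `tail_scale` parameter is a hypothetical knob for detuning it.

```python
import math
import random

def simulate_iir_dfe(bits, f_in_over_fbit=0.12, i_sig=1.0, r_in=1.0, tail_scale=1.0):
    """Sample-rate model of an IIR DFE that shares the input RC filter.

    bits: sequence of +1/-1 symbols. Returns (V_IN samples, decisions).
    tail_scale = 1 uses the ISI-nulling tail current; 0 disables the DFE.
    """
    a = math.exp(-2.0 * math.pi * f_in_over_fbit)   # RC decay over one UI
    i_dfe = tail_scale * a * i_sig                   # DFE tail current
    v = 0.0
    prev_decision = 0
    samples, decisions = [], []
    for d in bits:
        # Signal and feedback currents sum at the input and see the same RC.
        i_total = i_sig * d - i_dfe * prev_decision
        v = a * v + (1.0 - a) * r_in * i_total
        decision = 1 if v >= 0 else -1
        samples.append(v)
        decisions.append(decision)
        prev_decision = decision
    return samples, decisions

if __name__ == "__main__":
    random.seed(0)
    data = [random.choice((-1, 1)) for _ in range(32)]
    with_dfe, _ = simulate_iir_dfe(data)
    without_dfe, _ = simulate_iir_dfe(data, tail_scale=0.0)
    print("min |V_IN| with DFE   :", min(abs(v) for v in with_dfe))
    print("min |V_IN| without DFE:", min(abs(v) for v in without_dfe))
```

With the nulling tail current every sample has the same magnitude, i.e. no residual ISI, while disabling the feedback leaves a visibly smaller worst-case sample. Modeling the finite delay \( \Delta T \) would require a continuous-time simulation and is outside this sketch.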

Another challenge with this receiver is that to maintain the input bandwidth even at the modest level of 0.12f_{bit}, the input resistor R_{IN} has to remain relatively small (600\( \Omega \) in a CMOS implementation and 750\( \Omega \) in a BiCMOS implementation), which makes the voltage swing at the input of the latch relatively small. Thus this work was limited by the latch sensitivity, achieving -7dBm at 8Gbps for the CMOS implementation and -10.6dBm at 10Gbps for the BiCMOS implementation.

To achieve a high front-end gain despite the large capacitance at the input of the receiver, [14] utilizes a current buffer to isolate the input node from the low-frequency, high-gain node. The current buffer provides a low input impedance to create a wideband node at the input and guarantees that the majority of the input current (I_{PD}) enters the receiver. This buffered current is then delivered to a high-gain, low-bandwidth node to generate a voltage. The current buffer is realized by a regulated-cascode (RGC) stage (Fig. 4b).

In order to operate a DFE-based receiver at high data rates, the critical feedback timing path must be addressed. For a DFE to function properly the feedback signal must already have settled at the summer node before the flip-flop makes the next decision. Equivalently, the sum of the delay through the flip-flop, the feedback path, the settling of the summer node, and the flip-flop setup time needs to be smaller than 1 UI. This condition becomes difficult to meet at higher speeds and limits the maximum speed of operation in receivers utilizing a DFE. All prior art IIR-DFEs employ an explicit feedback, either using the output of a full-rate retimer [12] or incorporating a full-rate multiplexer into the feedback [15] - [17], in order to apply the full-rate recovered data pattern to an IIR analog filter. This feedback loop consumes additional power and adds delay to the DFE feedback path and has limited their operating speed to 16Gbps [17]. An alternative architecture was presented in [14] wherein no full-rate data signal is reproduced. Instead, the passive IIR filter is multiplexed between half-rate signal paths.
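The timing condition can be written as a one-line budget check. The sketch below is only an illustration; the delay values in the usage example are hypothetical placeholders chosen to show the calculation, not numbers from any of the cited receivers.

```python
def dfe_timing_margin_ps(bit_rate_gbps, t_clk_to_q_ps, t_feedback_ps,
                         t_settle_ps, t_setup_ps):
    """Slack of the first-tap DFE loop in picoseconds (negative means it fails).

    The loop closes if flip-flop delay + feedback path delay + summer settling
    + flip-flop setup time is smaller than one unit interval (UI).
    """
    ui_ps = 1000.0 / bit_rate_gbps
    loop_ps = t_clk_to_q_ps + t_feedback_ps + t_settle_ps + t_setup_ps
    return ui_ps - loop_ps

if __name__ == "__main__":
    # Hypothetical delays (ps), for illustration only.
    delays = dict(t_clk_to_q_ps=15.0, t_feedback_ps=10.0,
                  t_settle_ps=20.0, t_setup_ps=10.0)
    for rate in (10.0, 16.0, 20.0):
        margin = dfe_timing_margin_ps(rate, **delays)
        print(f"{rate:4.0f} Gbps: margin = {margin:+.1f} ps")
```

Because the loop delay is roughly fixed by the technology while the UI shrinks with bit rate, the margin eventually turns negative, which is the speed limit described above.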

The IIR-DFE schematic is shown in Fig. 8. A single differential IIR filter, R_{F} and C_{F}, degenerates two half-rate latches. Transistors M_{1} are the input transistors, serving as the DFE summer. They act upon their gate-source voltage: the difference between the front-end output V_{A}, and IIR feedback voltage V_{F}. Transistors (M_{2}) are clocked to alternately connect each of the half-rate latches (M_{1-4}) to the input transistors, effectively multiplexing the IIR filter between latches. When the clock is low, M_{2} disconnects the latch from the input and feedback, and precharges the output nodes to V_{DD}. When the clock goes high, M_{3} injects a differential current proportional to V_{A} and \( -V_{F} \), thus performing the DFE subtraction and tripping the latch. At the same time, the result of the comparison deposits charge onto either \( V_{F}^{+} \) or \( V_{F}^{-} \) depending on the polarity of the received bit, thus providing decision feedback for subsequent bits.

IV. BANDWIDTH-LIMITED FRONT-END WITH CDS

CDS optical receivers are another type of proposed low-bandwidth front-end receiver [18] - [21]. In [18], shown in Fig. 9, the photodiode’s current (I_{PD}) was integrated on the photodiode’s parasitic capacitance (C_{PD}) and the receiver’s input parasitic capacitance (C_{IN}). Acting as an integrator, the front-end bandwidth tends toward zero. The CDS receiver then samples the voltage at the input (V_{IN}) every UI. If sample $V_{IN}[N]$ is greater than its previous sample $V_{IN}[N-1]$ it decides the incoming bit is a “1”, otherwise a “0”. CDS is effectively a 1-tap FFE which subtracts the previous sample from the current sample, $(1 - z^{-1})$.
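A behavioral sketch of this decision rule is given below. It assumes an ideal integrator and, as a simplification of our own, a signal current measured relative to its average, so that a "0" discharges the node by as much as a "1" charges it. Parameter names and units are arbitrary.

```python
def cds_receive(bits, i_sig=1.0, c_total=1.0, t_bit=1.0, v0=0.0):
    """Integrate the input on C_PD + C_IN and slice the sample difference.

    bits: iterable of 0/1. Returns (decisions, V_IN samples).
    Assumption: the current is +i_sig for a 1 and -i_sig for a 0.
    """
    v_prev = v0
    decisions, samples = [], []
    for b in bits:
        i_in = i_sig if b else -i_sig
        v_now = v_prev + i_in * t_bit / c_total          # ideal integration over one UI
        decisions.append(1 if v_now > v_prev else 0)     # (1 - z^-1), then a slicer
        samples.append(v_now)
        v_prev = v_now
    return decisions, samples

if __name__ == "__main__":
    pattern = [1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
    decided, v_in = cds_receive(pattern)
    print("decisions:", decided)
    print("V_IN     :", v_in)
```

The decisions match the transmitted pattern, and the `V_IN` trace also shows the input voltage walking away from its starting point during a run of identical bits.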

A challenge of continuously integrating the input signal arises when long sequences of consecutive identical digits (CID) occur. In the presence of long CID sequences, $V_{IN}$ will become very close to the front-end supply voltage or ground, which can disturb the front-end dc biasing. This problem was addressed in [19] with a 2.2-kΩ resistor connected between the input and a dc bias voltage. This resistor limits the dc gain and, hence, the input voltage swing. Doing so maintains linear operation of the front-end. However, as shown in Fig. 10, in the event of long CID sequences $V_{IN}$ saturates and the difference between consecutive samples ($\Delta V_{IN}$) becomes very small. To address this problem a “dynamic offset modulation” (DOM) is introduced, which is effectively a second FFE tap. The DOM compares the input dc voltage with a reference voltage and adds a correction signal proportional to this difference. This second tap compensates for the saturation of the input RC-circuit and maintains a constant input to the comparator, $V_C$, during long CID sequences.

Both structures discussed, [18] and [19], integrate the signal onto $C_{PD} + C_{IN}$. The charge on these capacitors is then shared with the sampling capacitors every time a sampling switch turns on. To keep the change in $V_{IN}$ due to the charge sharing minimal, the capacitance $C_{PD} + C_{IN}$ must be much greater than the sampling capacitances ($2C_S$). This condition creates two difficulties for the CDS receivers: a) It sets a minimum value for $C_{PD} + C_{IN}$ and therefore the receiver cannot straightforwardly benefit from faster photodiodes with small $C_{PD}$. Provided that $\Delta V_{IN,max} < I_s/(f_{bit}(C_{PD} + C_{IN}))$, with $C_{PD} + C_{IN}$ given, $I_s$ has to increase linearly with $f_{bit}$ to maintain the same swing at the input and thus the receiver’s sensitivity drops linearly with bit-rate. b) It limits how large $C_S$ can become, which makes it difficult to reduce the $kT/C_S$ noise of the samplers. To get around this limit, [20] buffers the input current $I_{PD}$ before applying it to the samplers. The buffer (a feedback amplifier and a Cherry-Hooper style amplifier stage) isolates the sampling switches from the input capacitance, and therefore removes the need for a large $C_{PD} + C_{IN}$. The TIA also provides some gain (3-kΩ) which reduces the noise contribution of the samplers and the comparator.

The combination of low power circuit structures, advanced 28nm CMOS technology, and the ultra low capacitance of the silicon-photonic photodiode used in that work results in an excellent power efficiency of 170fJ/b (excluding clock buffers).

Unlike the DFE-based receivers, the lack of a critical timing path in a feedback loop in the FFE-based equalizer allows faster operation. However, FFE-based receivers do suffer from noise boosting which can degrade their sensitivity.

V. INTEGRATE-AND-DUMP (ID) RECEIVERS

DFE-based receivers estimate the post-cursor ISI introduced by the low bandwidth of the front-end and subtract it from the signal before making a decision. Another approach is to remove the post-cursor ISI by resetting the low-bandwidth node. An ID receiver in combination with a DFE has been reported in [22]. This work achieves a high sensitivity of -10dBm (OMA) at 25Gbps. However, the power efficiency remains at 1.1pJ/b (excluding the clock buffers) due to the power hungry TIA in the front-end, the wideband pre-amplifier before the comparators, and the current-mode logic comparators necessary for the utilized DFE structure. This section explains the design of a quarter-rate ID receiver in 28nm CMOS technology with lower power consumption because a) a lower power TIA structure is utilized, b) the amplifiers are reset to achieve high gain and low ISI with higher power efficiency (Section V-C), and c) CMOS dynamic comparators are used.

A. ID receiver prototype

To maximize the voltage signal swing while also filtering high frequency noise, the signal has to be integrated over the course of every UI and reset before the next integration begins to avoid ISI due to integration. This means that a half-rate (or even lower sub-rate) architecture is necessary so when one branch is in the integration phase, the other can reset.

In this work, a pseudo-differential structure is employed with one of the inputs connected to a dummy photodiode. The block diagram of the receiver is shown in Fig. 11. A current buffer, comprising a feedback TIA stage and two transconductance stages, provides a low input impedance and, thus, a wideband front-end. It generates two copies of the input current at its outputs. An offset cancellation loop is incorporated into the current buffer to remove offset between its pseudo-differential outputs. The receiver utilizes quarter-rate ID stages. The outputs of the ID stages are sampled by StrongARM comparators, converted into a non-return-to-zero (NRZ) pattern with RS latches, and delivered to the output driver.

B. Receiver front-end current buffer

The current buffer schematic is shown in Fig. 12a. The first feedback amplifier provides the low input impedance and hence a wideband input node. A Cherry-Hooper style amplifier then provides some amplification to reduce the noise contribution of the following stages. The output of this amplifier goes to two transconductance stages to generate two amplified copies of the input current.

Assuming the receiver front-end has a first-order response with constant gain-bandwidth product $f_0$, Fig. 12b plots the output signal amplitude of an ideal ID stage at the end of a 1-UI integration, normalized to the output for a dc input $(V_{OUT,dc})$, as a function of the amplifier’s -3dB bandwidth, $f_0/f_{bit}$. It can be seen that for a current buffer bandwidth lower than $0.25f_{bit}$ the bandwidth limitation severely reduces the output signal amplitude. On the other hand, very wide bandwidth results in low gain, which also reduces the output swing. A front-end bandwidth around $0.4f_{bit}$ maximizes the ID output signal amplitude. Note that reducing the current buffer’s bandwidth causes some increase in the low frequency noise at the output of the current buffer. It also reduces the effect of the noise peaking due to the zero in the noise transfer function as mentioned in Section III. As a result, the input referred noise of the receiver does not vary significantly when the bandwidth changes from $0.4f_{bit}$ to $f_{bit}$.

C. Integrate-and-Dump (ID) circuit

Fig. 13 shows two ID circuits (ID$_1$ and ID$_2$) connected to the transconductance stage of the current buffer. The ID is clocked by four phases of a quarter-rate clock $(f_{clk} = f_{bit}/4)$, $\phi_1$, $\phi_2$, $\overline{\phi_1}$, and $\overline{\phi_2}$, each with 90° phase shift and with 50% duty cycle. These clock signals are generated by dividing a half-rate clock by two with an on-chip frequency divider. As a result, each ID circuit has four phases of operation. For example, ID$_1$ in Fig. 13 consists of a sampling switch (driven by $\phi_2$) and an amplifier which can be reset by a switch (driven by $\phi_1$). When $\phi_1 = 1$ and $\phi_2 = 0$, ID$_1$ goes into “Internal reset”. In this phase the feedback switch forces the input and the output of the amplifier to go to the same voltage. In the next phase ($\phi_1 = 1$ and $\phi_2 = 1$) the ID goes into the “Reset” phase. In this phase both switches are closed, creating a low impedance node at the output of the “gm” stage. This resets the output node of the “gm” stage. Next $\phi_1 = 0$ and the “Integration” phase begins. In this phase the current from the “gm” stage is integrated on the input capacitance of the amplifier ($C_{IN}$) and the result of the integration is amplified at “OUT1”. In the last phase $\phi_2$ goes to zero. With both switches open the ID goes into the “Hold” phase and the result of the integration is held and amplified at OUT1 for one UI. The “Hold” phase becomes more important at high bit-rates to maintain the input signal during comparator regeneration and thereby ensure proper functionality of the comparators.
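The four-phase sequence of one ID slice can be summarized as a small lookup. The sketch below is a behavioral illustration only; it assumes that the second clock lags the first by 90° (one UI at quarter rate), which is the ordering implied by the sequence described above, and the function names are ours.

```python
def quarter_rate_clocks(ui_index):
    """Logic levels of phi1 and phi2 during a given UI.

    Assumption: 50% duty cycle, 4-UI period, phi2 lagging phi1 by one UI.
    """
    slot = ui_index % 4
    phi1 = 1 if slot in (0, 1) else 0
    phi2 = 1 if slot in (1, 2) else 0
    return phi1, phi2

def id_phase(phi1, phi2):
    """Operating phase of ID1 for a given (phi1, phi2) combination."""
    phases = {
        (1, 0): "internal reset",   # feedback switch closed, sampling switch open
        (1, 1): "reset",            # both switches closed
        (0, 1): "integration",      # only the sampling switch closed
        (0, 0): "hold",             # both switches open
    }
    return phases[(phi1, phi2)]

if __name__ == "__main__":
    for ui in range(8):
        p1, p2 = quarter_rate_clocks(ui)
        print(f"UI {ui}: phi1={p1} phi2={p2} -> {id_phase(p1, p2)}")
```

Each phase lasts one UI, so a slice integrates during one UI out of every four, which is why four interleaved slices are needed to cover every bit.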

Due to mismatch in the transistors in the “gm” circuit and the amplifiers in the ID stages, the common-mode level at the outputs of different ID slices can vary. Two common-mode feedback (CMFB) circuits are utilized to set the common-mode level at the output of ID slices. Each CMFB measures the common-mode level at the output of two slices (OUT1 and OUT2 in Fig. 13) and compares them to the reference voltage VREF. If they are both higher or lower than VREF it applies a current to the output of the “gm” stage to correct this deviation. If only one of the outputs is too high or too low, the CMFB applies a correction current to the output of that particular ID slice.

D. Measurement Results

A prototype receiver was fabricated in 28nm CMOS and the photodiodes placed alongside the receiver in an open cavity QFN package (die photo shown in Fig. 14). The input optical signal was generated by directly modulating an 850nm wavelength VCSEL. The output of the VCSEL was coupled through a multimode fiber pigtail, which was connected to an optical probe over the discrete photodiode. An optical attenuator was placed between the VCSEL and the probe to adjust the optical power. A 20Gbps PRBS7 pattern was applied while varying the receiver sampling phase and input optical modulation amplitude (OMA). The bit error rates (BER) at each of the 4 quarter-rate outputs are plotted in Fig. 15. The bathtub curves are shown in Fig. 15a at -7dBm OMA; all four channels show an eye opening better than 0.17UI. Waterfall curves are plotted in Fig. 15b, showing that the receiver achieves a sensitivity of better than -8.3dBm on all four channels. The TIA consumes 7mW; the gm-stage, ID stages, comparators, and RZ-to-NRZ blocks consume 3.6mW; and the clock divider and clock buffers consume 3.1mW, all operating under a 0.95V supply. This translates to an overall power efficiency of 0.7 pJ/b. Table I compares different state-of-the-art CMOS optical receivers.
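The quoted efficiency follows directly from the power breakdown, as the short calculation below shows. The conversion of the OMA sensitivity into a photocurrent amplitude is an added illustration that assumes the 0.5 A/W responsivity listed for this work in the comparison table; the function names are ours.

```python
def energy_per_bit_pj(block_powers_mw, bit_rate_gbps):
    """Energy per bit in pJ/b (1 mW / 1 Gbps = 1 pJ/b)."""
    return sum(block_powers_mw) / bit_rate_gbps

def oma_to_current_ua(oma_dbm, responsivity_a_per_w):
    """Photocurrent amplitude in microamps for a given OMA in dBm."""
    oma_mw = 10.0 ** (oma_dbm / 10.0)
    return oma_mw * responsivity_a_per_w * 1000.0   # mW * A/W = mA, then to uA

if __name__ == "__main__":
    # TIA; gm-stage + ID + comparators + RZ-to-NRZ; clock divider + buffers
    blocks_mw = [7.0, 3.6, 3.1]
    print(f"total power   : {sum(blocks_mw):.1f} mW")
    print(f"energy per bit: {energy_per_bit_pj(blocks_mw, 20.0):.3f} pJ/b")
    print(f"input current : {oma_to_current_ua(-8.3, 0.5):.0f} uA at -8.3 dBm OMA")
```

The three blocks sum to 13.7mW, i.e. about 0.69pJ/b at 20Gbps, which rounds to the reported 0.7pJ/b.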
Table I: Comparison of state-of-the-art CMOS optical receivers

<table>
<thead>
<tr>
<th></th>
<th>[9]</th>
<th>[21]</th>
<th>[14]</th>
<th>[22]</th>
<th>This work</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology</td>
<td>32nm SOI</td>
<td>28nm CMOS</td>
<td>65nm CMOS</td>
<td>45nm CMOS</td>
<td>28nm CMOS</td>
</tr>
<tr>
<td>Data Rate (Gbps)</td>
<td>28</td>
<td>32</td>
<td>20</td>
<td>25</td>
<td>20</td>
</tr>
<tr>
<td>C<sub>IN</sub> (fF)</td>
<td>N/A</td>
<td>120</td>
<td>300</td>
<td>N/A</td>
<td>200</td>
</tr>
<tr>
<td>C<sub>PD</sub> (fF)</td>
<td>85</td>
<td>0.9</td>
<td>0.5</td>
<td>0.5</td>
<td>0.5</td>
</tr>
<tr>
<td>PD Responsivity (A/W)</td>
<td>0.55</td>
<td>0.5</td>
<td>0.5</td>
<td>0.5</td>
<td>0.5</td>
</tr>
<tr>
<td>Sensitivity (dBm)*</td>
<td>-7.8</td>
<td>-5.7**</td>
<td>-7.5***</td>
<td>-10.8</td>
<td>-8.6</td>
</tr>
<tr>
<td>Power Efficiency (pJ/b)</td>
<td>2</td>
<td>0.17</td>
<td>0.75</td>
<td>1.1****</td>
<td>0.7</td>
</tr>
<tr>
<td>Area (mm²)</td>
<td>N/A</td>
<td>0.0045</td>
<td>0.027</td>
<td>0.007</td>
<td>0.005</td>
</tr>
</tbody>
</table>

* BER = 1e-12
** Estimated based on average power
*** The difference with [14] is due to a mistake in reporting OMA in the original work
**** Excluding clocking

VI. CONCLUSION

It was shown that the power of wideband optical receivers increases very rapidly with the bit rate. Alternatively, the front-end bandwidth can be reduced and the resulting ISI removed by a DFE. This results in power savings and up to 2.8dB sensitivity improvement; however, the speed is limited by the DFE’s critical timing path. On the other hand, CDS receivers can provide faster operation due to their feedforward structure, but their sensitivity has remained below that of DFE receivers. ID receivers were shown to offer high sensitivity while also being capable of high speed operation due to their feedforward structure. A prototype fabricated in 28nm CMOS demonstrated -8.6dBm sensitivity at 20Gbps while consuming 0.7pJ/b.

VII. ACKNOWLEDGEMENT

We would like to thank Fujitsu Labs of America for their support for this project and Finisar Corp. for photodiode donation.

REFERENCES


Fig. 14: Die photo of the ID receiver chip. The receiver occupies 70μm × 70μm.

Fig. 15: (a) Bathtub curves at 20Gbps with OMA = -7dBm (b) Waterfall curves for all four channels at 20Gbps.