

# In the clouds:

# Towards 1 Tb/s per carrier

S.P. Voinigescu

University of Toronto



University of Southern California, October 12, 2012

Sorin Voinigescu, October 12, 2012



Graduate students

- Yannis Sarkas
- Andreea Balteanu
- Alex Tomkins
- Eric Dacquay
- Katya Laskin

Collaborators

- Juergen Hasch
- Pascal Chevalier
- Peter Asbeck
- Gabriel Rebeiz
- Jim Buckwalter
- Larry Larson

- NSERC, OCE
- Robert Bosch GmbH, DARPA, Ciena, Gennum for funding
- STMicroelectronics, DARPA, Ciena for chip donations





- Why?
- How?
  - System
  - Antenna
  - Baseband
  - Radio transceiver
- When?





#### We are addicted ...








# What's in a cloud?



- Wireless links
- Optical fiber links
- Data centers



#### What's in a data center?



- Optical fiber links
- Coaxial cable links
- Routers
- Boards
- Backplanes









#### Facebook pictures...



- 40 million pictures uploaded to Facebook per day
  - over 10<sup>15</sup> bits/day => ~15 Gb/s on average
- Worldwide: 2049 data centers
  - consume 30 billion watts (30 GW) = 30 nuclear power stations
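The arithmetic behind these figures can be checked with a few lines. This is a sketch: it assumes an average picture size implied by the totals and roughly 1 GW of output per nuclear station (our assumption, not stated on the slide). At exactly 10<sup>15</sup> bits/day the average rate is about 11.6 Gb/s; the quoted 15 Gb/s corresponds to roughly 1.3×10<sup>15</sup> bits/day, consistent with the "over" bound.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 s
GW_PER_NUCLEAR_STATION = 1.0     # assumption: ~1 GW of output per station

def average_rate_gbps(bits_per_day: float) -> float:
    """Average rate (Gb/s) needed to move a daily volume of bits."""
    return bits_per_day / SECONDS_PER_DAY / 1e9

def bits_per_picture(bits_per_day: float, pictures_per_day: float) -> float:
    """Average picture size (bits) implied by the daily totals."""
    return bits_per_day / pictures_per_day

def nuclear_stations(power_w: float) -> float:
    """Number of ~1 GW stations needed to supply power_w (assumed rating)."""
    return power_w / (GW_PER_NUCLEAR_STATION * 1e9)

print(f"{average_rate_gbps(1e15):.1f} Gb/s")           # 11.6 Gb/s
print(f"{bits_per_picture(1e15, 40e6) / 8e6:.1f} MB")  # 3.1 MB (10^6 bytes) per picture
print(f"{nuclear_stations(30e9):.0f} stations")        # 30 stations
```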





## **Evolution of CMOS since 2000**

- 130 nm, f<sub>T</sub> = 80 GHz, V<sub>DD</sub> = 1.2 V
  - Strained channel, SiGe S/D
- 2004 => 65 nm, f<sub>T</sub> = 180 GHz, V<sub>DD</sub> = 1.1-1.2 V
  - More strain
- 2006 => 45 nm, f<sub>T</sub> = 240 GHz, V<sub>DD</sub> = 1.0-1.2 V
  - High-K MG, more strain
- 2008 => 32 nm, f<sub>T</sub> = 360? GHz, V<sub>DD</sub> = 0.9-1.2 V
  - High-K MG, more strain
- 2011 => 22 nm, f<sub>T</sub> = 500?? GHz, V<sub>DD</sub> = 0.9 V
  - Tri-gate, High-K MG, more strain




 $E \propto f \cdot C \cdot V_{DD}^2$ 

- Moore's so-called law is alive!
  - More transistors per area => C decreases
- Transistor f<sub>T</sub> (intrinsic speed) continues to improve as 1/L
  - Clock frequency should improve
- But Dennard's constant-field scaling law (physics) is dead!
  - V<sub>DD</sub> has not scaled
- Moore's law without Dennard's?
  - A nuclear power station for the DIGITAL die!
- Millimeter- and sub-millimeter-wave circuits are OK!
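A first-order sketch of why V<sub>DD</sub> matters, reading E as the dynamic switching power per transistor and applying an idealized shrink factor k > 1: f scales up by k, C down by k, and transistor density up by k². Leakage, wiring capacitance, and the exact f scaling are ignored; these are simplifying assumptions, not measured data.

```python
def scaled_power_density(k: float, scale_vdd: bool) -> float:
    """Relative dynamic power density after shrinking dimensions by k (> 1).

    Per transistor P = f * C * V_DD**2 with (idealized):
      f    -> f * k       (shorter, faster transistors)
      C    -> C / k       (smaller transistors)
      V_DD -> V_DD / k    only under Dennard constant-field scaling
    Transistor density rises by k**2.
    """
    f = k
    c = 1.0 / k
    v = 1.0 / k if scale_vdd else 1.0
    power_per_transistor = f * c * v ** 2
    density = k ** 2
    return power_per_transistor * density

k = 2 ** 0.5  # one node: linear dimensions shrink by ~0.7x
print(f"Dennard (V_DD scaled):  {scaled_power_density(k, True):.2f}x")   # 1.00x
print(f"Fixed V_DD:             {scaled_power_density(k, False):.2f}x")  # 2.00x
```

With V<sub>DD</sub> held fixed, power density doubles every node in this model, which is the "nuclear power station for the digital die" problem.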



# Why can't we reduce V<sub>DD</sub>?

- Because the subthreshold slope, *S*, does not scale
- *S* is determined by the Fermi-Dirac distribution function

$$S[V/decade] = \frac{kT}{q} \cdot \ln(10)$$

Valid in:

- 3-D (Fin)FETs, bipolar transistors
- 2-D crystal FETs (graphene, MoS<sub>2</sub>)
- 1-D FETs (nanowire, carbon nanotube)
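Evaluating the formula shows the floor it puts on the supply. This sketch computes the ideal S at 300 K and the minimum gate swing needed to change the subthreshold current by a given number of decades; the 6-decade on/off ratio is an illustrative assumption, not a figure from the talk.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
Q = 1.602176634e-19  # elementary charge, C

def subthreshold_slope(temp_k: float) -> float:
    """Ideal subthreshold slope S = (kT/q) * ln(10), in V/decade."""
    return K_B * temp_k / Q * math.log(10)

def min_gate_swing(on_off_decades: float, temp_k: float = 300.0) -> float:
    """Gate swing (V) for the given current ratio at the ideal slope."""
    return on_off_decades * subthreshold_slope(temp_k)

print(f"S(300 K) = {subthreshold_slope(300) * 1e3:.1f} mV/decade")  # 59.5 mV/decade
print(f"6 decades need >= {min_gate_swing(6):.2f} V")               # 0.36 V
```

Because S is fixed by temperature, the swing (and hence V<sub>DD</sub>) cannot shrink with the transistor dimensions.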





#### So what are we to do?

$$S[V/decade] = \frac{kT}{q} \cdot \ln(10)$$

k and q are constants; T is the only variable.

Solutions:

- Refrigeration: 77 K (liquid nitrogen), 4 K (space station?)
  - Not in your hand!
  - Possible in the data center
- New physics:
  - Tunnel FETs? Maybe, but *S* is $V_{GS}$-dependent.
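To see what refrigeration buys, the same ideal formula can be evaluated at the temperatures listed above. This is the Boltzmann-limited ideal only; real devices at cryogenic temperatures typically fall short of it, so treat the numbers as upper bounds on the benefit.

```python
import math

K_B_OVER_Q = 8.617333262e-5  # Boltzmann constant / elementary charge, V/K

def subthreshold_slope_mv(temp_k: float) -> float:
    """Ideal subthreshold slope (kT/q) * ln(10), in mV/decade."""
    return K_B_OVER_Q * temp_k * math.log(10) * 1e3

def swing_shrink_factor(temp_k: float, ref_temp_k: float = 300.0) -> float:
    """Ideal reduction of the gate swing for a fixed on/off ratio vs. ref_temp_k."""
    return ref_temp_k / temp_k

for label, t in [("room temperature", 300.0), ("liquid nitrogen", 77.0), ("4 K", 4.0)]:
    s = subthreshold_slope_mv(t)
    print(f"{label:>16}: S = {s:5.1f} mV/decade, swing / {swing_shrink_factor(t):5.1f}")
```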



# More immediate solutions in...

- Wireless, wireline, and fiber-optic system architectures that
  - increase the data rate to >1 Tb/s per carrier (imperative in fiber links)
  - increase efficiency per bit
- Faster, more efficient circuit topologies
  - CMOS logic at 50-100 Gb/s to save power?
  - Stacked CMOS logic for large-swing drivers?
- Can we push the carrier frequency to 300 GHz?



|              | 4G WiMAX                      | 60 GHz LOS<br>Radio    | Wireline<br>IEEE 802.3an | Fiber<br>SerDes VCSEL      | Fiber<br>DP-QPSK/BPSK     |
|--------------|-------------------------------|------------------------|--------------------------|----------------------------|---------------------------|
| Data rate    | ≤1 Gb/s                       | 5.3 Gb/s               | 10 Gb/s                  | 10 Gb/s                    | 50 Gb/s                   |
| Power        | 1.76 W                        | 350 mW                 | 2 W                      | 2.5 W                      | 25 W                      |
| Distance     |                               | 2 m                    | 100 m                    | 20 km                      | 3500 km                   |
| Energy/bit   | 1.6 nJ/b                      | 66 pJ/b                | 200 pJ/b                 | 250 pJ/b                   | 500 pJ/b                  |
| Energy/bit/m |                               | 33 pJ/b/m              | 2 pJ/b/m                 | 12.5 fJ/b/m                | 0.14 fJ/b/m               |
| Reference    | [Krishnamurthy,<br>RFIC 2010] | [Laskin,<br>RFIC 2011] | [Gupta,<br>ISSCC 2012]   | [Voinigescu,<br>CICC 2001] | [Crivelli,<br>ISSCC 2012] |
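The derived rows of the table follow from energy/bit = power / data rate and energy/bit/m = energy/bit / distance. A short sketch recomputing them from the table's own power, rate, and distance entries (the WiMAX column is left out because no distance is given):

```python
# (power in W, data rate in b/s, distance in m), copied from the table above
LINKS = {
    "60 GHz LOS radio": (0.350, 5.3e9, 2.0),
    "Wireline IEEE 802.3an": (2.0, 10e9, 100.0),
    "Fiber SerDes VCSEL": (2.5, 10e9, 20e3),
    "Fiber DP-QPSK/BPSK": (25.0, 50e9, 3500e3),
}

def energy_per_bit(power_w: float, rate_bps: float) -> float:
    """Energy per bit in joules."""
    return power_w / rate_bps

def energy_per_bit_per_meter(power_w: float, rate_bps: float, distance_m: float) -> float:
    """Energy per bit per meter of link, in J/b/m."""
    return energy_per_bit(power_w, rate_bps) / distance_m

for name, (p, r, d) in LINKS.items():
    e_pj = energy_per_bit(p, r) * 1e12
    e_fj_m = energy_per_bit_per_meter(p, r, d) * 1e15
    print(f"{name:22s} {e_pj:6.1f} pJ/b {e_fj_m:10.3f} fJ/b/m")
```

Normalizing by distance is what makes fiber look five orders of magnitude better than the 60 GHz radio, even though its energy per bit is higher.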







#### Optical: 10 fJ/b/m

5000 km

#### Wireline: 2 pJ/b/m

100 m

#### Wireless: > 30 pJ/b/m

BlackBerry

Source: Belden Inc.

Wireless is the most inefficient, yet most popular!






# Why Tb/s wireless?



- Short-range, reconfigurable 1 Tb/s wireless data transmission in the data center
- Board-to-board





# Why 200-300 GHz?

- Silicon transistors with  $f_{MAX}$  >400 GHz
- 100 GHz of bandwidth with no absorption
- Small antenna size with good gain
- Lower-power LNA, mixer, receiver
- But...
  - higher-power PLL
  - reduced P<sub>out</sub>
  - shorter range ~1/f<sup>2</sup>





Source: G. Rebeiz, UCSD








### **Scalable Digital Radio Transmitters**

- Can we improve efficiency by increasing the modulation rate per carrier at fixed  $P_{out}$ ?
- Example: 0.3 Tb/s with a 1 W PA => 3.3 pJ/b
  - But 0.3 Tb/s with 64-QAM modulation requires 50-Gb/s serial baseband lanes,
  - which are difficult to realize efficiently with an up-conversion transmitter architecture
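The numbers in this example can be reproduced directly. This sketch counts only the PA power in the energy per bit, and assumes each of the 6 bits of a 64-QAM symbol (3 for I, 3 for Q) is carried on its own serial lane at the symbol rate; the lane mapping is our assumption.

```python
import math

def energy_per_bit_pj(pa_power_w: float, bit_rate_bps: float) -> float:
    """PA power divided by bit rate, in pJ/b (PA power only)."""
    return pa_power_w / bit_rate_bps * 1e12

def qam_serial_lanes(bit_rate_bps: float, qam_order: int) -> tuple[int, float]:
    """(lanes, per-lane rate in b/s) if each bit of an M-QAM symbol has its own lane."""
    bits_per_symbol = int(math.log2(qam_order))
    symbol_rate = bit_rate_bps / bits_per_symbol  # baud = per-lane bit rate
    return bits_per_symbol, symbol_rate

lanes, lane_rate = qam_serial_lanes(0.3e12, 64)
print(f"{energy_per_bit_pj(1.0, 0.3e12):.1f} pJ/b")          # 3.3 pJ/b
print(f"{lanes} lanes at {lane_rate / 1e9:.0f} Gb/s each")  # 6 lanes at 50 Gb/s each
```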





## **Potential Solution: Direct Modulation TX Radio**



# Like Coherent Fiber-Optic Links: 110 Gb/s TX-RX



#### 200+ Gb/s Dual-Polarization TX/RX (ii)



#### How can we get to 1 Tb/s per carrier?

- Fiber: Dual-polarization, 16 QAM at 125 Gbaud
  - 8 baseband lanes at 125 Gb/s
  - Power consumption is not that critical here
  - Need phase equalization in the receiver
  - Need large-swing (>5 V), 6-bit, 125-GS/s DACs
- Wireless: 256 QAM at 125 Gbaud
  - Power consumption is critical
  - Need amplitude and phase equalization in receiver
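Both routes reach 1 Tb/s from the same product: symbol rate × bits per symbol × polarizations. A minimal check, assuming one binary baseband lane per bit of each symbol per polarization (which reproduces the 8 lanes quoted for fiber):

```python
import math

def line_rate_bps(symbol_rate_baud: float, qam_order: int, polarizations: int = 1) -> float:
    """Raw line rate for square M-QAM on one carrier."""
    return symbol_rate_baud * math.log2(qam_order) * polarizations

def binary_lanes(qam_order: int, polarizations: int = 1) -> int:
    """Baseband lanes if each symbol bit per polarization gets its own lane."""
    return int(math.log2(qam_order)) * polarizations

fiber = line_rate_bps(125e9, 16, polarizations=2)  # dual-pol 16-QAM
wireless = line_rate_bps(125e9, 256)               # single-pol 256-QAM
print(fiber / 1e12, binary_lanes(16, 2))   # 1.0 8
print(wireless / 1e12, binary_lanes(256))  # 1.0 8
```

Under this mapping the wireless link also needs 8 lanes at 125 Gb/s; the difference is that 256-QAM packs all of them into one polarization, which is why amplitude as well as phase equalization is required.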





# **Direct Modulation TX Radio**

- 2-bit polar-modulated, binary-weighted PA cells driven in quadrature
- No back-off needed for linearity
- Phase/amplitude bits @ 1-100 Gb/s
- On-chip free-space power combiner



[A. Balteanu et al. IMS 2012]

# IQ DAC TX with Antenna Level Segmentation



Reconfigurable modulation format





# IQ DAC TX Constellation








#### Full 8I + 8Q Constellation
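One way binary-weighted cells can build an 8-level-per-axis (64-point) constellation is sketched below. It assumes, for illustration only, three BPSK cells per axis with hypothetical relative weights 1, 2, 4, each adding +w or -w to the combined output; the actual cell count and weighting of the chip may differ.

```python
from itertools import product

WEIGHTS = (1, 2, 4)  # hypothetical binary-weighted cell sizes, LSB to MSB

def axis_level(bits: tuple[int, ...]) -> int:
    """Sum of BPSK cell outputs on one axis: bit 1 -> +w, bit 0 -> -w."""
    return sum(w if b else -w for w, b in zip(WEIGHTS, bits))

def axis_levels() -> list[int]:
    """All distinct amplitudes reachable on one axis (I or Q)."""
    return sorted({axis_level(bits) for bits in product((0, 1), repeat=len(WEIGHTS))})

def constellation() -> list[complex]:
    """Every I/Q combination, with I and Q driven in quadrature."""
    levels = axis_levels()
    return [complex(i, q) for i in levels for q in levels]

print(axis_levels())         # [-7, -5, -3, -1, 1, 3, 5, 7]
print(len(constellation()))  # 64
```

The binary weighting is what makes the levels uniformly spaced, so the combined output lands on a regular 64-QAM grid without a separate DAC in front of the PA.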



# Wish list for sub-millimeter-wave radio

- 100 Gb/s standard CMOS baseband lanes
  - Efficiency scalable with data rate
- P<sub>TX</sub> = 10 dBm
- PLL with PN < -90 dBc/Hz in band at 300 GHz
- NF < 12 dB
- P<sub>DC</sub> < 1 W
- BW = 25-30%
- Antenna gain > 20 dB (lens)
- Distance: tens of cm
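A rough link budget shows how these targets fit together. This sketch assumes a 0.3 m link at 300 GHz, 20 dBi at each end, a noise bandwidth of 25% of the carrier (75 GHz), and a 290 K reference temperature; it ignores phase noise, implementation loss, and lens efficiency, so it is an optimistic first cut rather than a design.

```python
import math

C0 = 299_792_458.0   # speed of light, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K

def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss 20*log10(4*pi*d*f/c), in dB."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / C0)

def noise_floor_dbm(bandwidth_hz: float, nf_db: float, temp_k: float = 290.0) -> float:
    """Input-referred noise: kTB in dBm plus the receiver noise figure."""
    return 10 * math.log10(K_B * temp_k * bandwidth_hz * 1e3) + nf_db

def link_snr_db(p_tx_dbm, g_tx_dbi, g_rx_dbi, distance_m, freq_hz, bandwidth_hz, nf_db):
    """Received SNR (dB) for an ideal line-of-sight link."""
    p_rx_dbm = p_tx_dbm + g_tx_dbi + g_rx_dbi - fspl_db(distance_m, freq_hz)
    return p_rx_dbm - noise_floor_dbm(bandwidth_hz, nf_db)

# Wish-list values; distance and bandwidth are illustrative assumptions
snr = link_snr_db(10, 20, 20, 0.3, 300e9, 0.25 * 300e9, 12)
print(f"FSPL = {fspl_db(0.3, 300e9):.1f} dB, SNR = {snr:.1f} dB")
```

Under these assumptions the 20 dB antennas on each end recover most of the roughly 72 dB of path loss, which is why lens antennas appear on the wish list.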













## **Antenna Integration**



**On-chip** 

Above-IC [J. Hasch et al., March 2010]




#### 120/160 GHz Transceiver Packaging



Chip: 2.2 mm × 2.6 mm

Package: 7 mm × 7 mm

[I. Sarkas, Trans. MTT, March 2012]






# 142-152 GHz Antenna and die in QFN package



Package: 7 mm × 7 mm

EU SUCCESS Project

- Antenna design by Stefan Beer, Karlsruhe Institute of Technology
- Packaging by Robert Bosch GmbH
- Fundamental-frequency transceiver with self-test
  - [I. Sarkas, CSICS 2012]














#### Rise/fall time, efficiency/bit in 45-nm SOI



#### 40+ Gb/s inductively-peaked CMOS logic



# Broadband, large swing stacked CMOS LOGIC

[I. Sarkas, ISSCC 2012]








# Eye diagrams at 12 Gb/s
















# **1.5-bit DAC Cell with Stacked-CMOS Inverter**



- Input balun for single-ended-to-differential conversion
  - the only tuned component in the chain
  - needed for testing
- Input CMOS TIAs for broadband matching
- CMOS-inverter-based class-D driver chain



## Power-DAC Cell with N-MOS output stage



- DC-50 GHz in 45-nm SOI
- CMOS-inverter based, purely digital
- Scalable to 240 GHz using a tuned LO path





#### TIA, **BPSK Modulator**



## Differential output stage with On-Off switch







#### 4-Stacked n-MOS Cascode








#### 4-Stacked n-MOS Cascode (ii)





## 4-Stacked n-MOS Cascode (iii)





# 4-Stacked n-MOS Cascode (iv)







#### DAC Cell: 28 Gb/s Eyes






## DAC Cell: 36 Gb/s Eyes





#### 45-GHz IQ DAC Cell: Eyes, P<sub>SAT</sub>



#### Pout of 45-GHz DAC cell vs. time





## 2-Gb/s ASK + 2-Gb/s BPSK Modulation of a 45-GHz Carrier








## 45-GHz 8-bit IQ DAC chiplet






# **Die photo**







#### 45-GHz 32-bit IQ-DAC board



\> 34 dBm, to be designed and packaged by UCSD



#### **Dual Receive Channel Transceiver**



#### Push-Push 148-170 GHz VCO



 $P_{DC} = 360 \text{ mW}, P_{OUT} = -10 \text{ dBm}, PN = -82 \text{ dBc/Hz at 1 MHz offset}$

## 148-170 GHz LO Tree and TX Amplifiers








 $P_{D} = 126 \text{ mW}$ 



## 148-170 GHz Low Noise Amplifier



 $P_{D} = 67 \text{ mW}$, gain = 20 dB, NF < 12 dB.







# Die photograph



Chip: 2.1 mm × 2.9 mm

130-nm BiCMOS9MW: SiGe HBT f<sub>T</sub>= 230 GHz, f<sub>MAX</sub> = 280 GHz

#### **RX Breakout Gain and Noise Figure**







## **Transceiver PLL Phase Noise**



## **On-die Doppler Test**









# Packaging



#### Dr. J. Hasch






#### **In-package Antennas Simulation**



## 240-GHz Transceiver Blocks


### **240-GHz Amplifier**






#### 150-GHz VCO-prescaler



#### Measurements



#### **300-GHz VCO-Doubler**



A. Tomkins et al., BCTM 2012















#### **300-GHz Signal Source Comparison**



## **300-GHz Signal Source Comparison (ii)**



#### Phase Noise of VCO-doubler at 309 GHz





#### Measured Pout and Phase Noise 300-GHz VCO+buffer+doubler



#### **300-GHz vs. 150-GHz Phase Noise**
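A comparison like this rests on the standard rule that ideal frequency multiplication by N raises phase noise by 20·log10(N) dB, so a doubler costs about 6 dB. As an illustration only, the sketch applies this rule to the -82 dBc/Hz (1 MHz offset) quoted for the 148-170 GHz push-push VCO; it is not the measured result of the 300-GHz VCO-doubler.

```python
import math

def multiplied_phase_noise(pn_dbc_hz: float, factor: float) -> float:
    """Phase noise after an ideal (noiseless) multiplier by `factor`."""
    return pn_dbc_hz + 20 * math.log10(factor)

print(f"{20 * math.log10(2):.2f} dB penalty for a doubler")   # 6.02 dB penalty for a doubler
print(f"{multiplied_phase_noise(-82.0, 2):.1f} dBc/Hz")      # -76.0 dBc/Hz
```

A real doubler adds its own noise on top of this ideal penalty, so the 6 dB figure is a lower bound on the degradation.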



## Conclusions

#### Why?

- Because we can!
- The "cloud" is unsustainable without a 10x speed and 100x efficiency improvement
- Need 1 Tb/s for near-field and intra-data-center communications

#### How?

- 50-100 Gb/s inductively peaked CMOS logic
- Mm-wave power-DAC transmitter
- H-band SoCs with on-die antennas
- Low-cost QFN package





## Antenna Efficiency & Bandwidth








#### Si Transistor Performance at H-Band



#### SiGe vs. Alumina µstrip-lines: H-Band





## 50-GS/s 6-bit Fully Segmented RZ-DAC, 3 V<sub>pp</sub> Swing per Side

[A. Balteanu et al., IMS 2012]







## **Block Diagram**



- Distributed Segmentation: 7 MSBs and 7 LSBs in 8:1 size ratio
- Each bit retimed at up to 50 GHz
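The 7 + 7 segmentation with an 8:1 size ratio is exactly what a 6-bit code needs: the upper 3 bits select 0-7 MSB cells of weight 8 and the lower 3 bits select 0-7 LSB cells of weight 1. A minimal sketch of that mapping, assuming thermometer decoding within each segment (a fully segmented DAC) and a unipolar output; the differential drive and RZ timing are not modeled.

```python
def thermometer(value: int, n_cells: int) -> list[int]:
    """Turn on the first `value` of `n_cells` unit cells."""
    if not 0 <= value <= n_cells:
        raise ValueError("value out of range")
    return [1] * value + [0] * (n_cells - value)

def segment_6bit(code: int) -> tuple[list[int], list[int]]:
    """Split a 6-bit code into 7 MSB cells (weight 8) and 7 LSB cells (weight 1)."""
    if not 0 <= code < 64:
        raise ValueError("6-bit code expected")
    msb_cells = thermometer(code >> 3, 7)      # upper 3 bits -> 0..7 MSB cells
    lsb_cells = thermometer(code & 0b111, 7)   # lower 3 bits -> 0..7 LSB cells
    return msb_cells, lsb_cells

def output_level(msb_cells: list[int], lsb_cells: list[int]) -> int:
    """Combined output in LSB units, with MSB cells 8x the size of LSB cells."""
    return 8 * sum(msb_cells) + sum(lsb_cells)

# Every code maps back to itself, so the segmentation covers 0..63 with no gaps
assert all(output_level(*segment_6bit(c)) == c for c in range(64))
```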



#### **BPSK Cell Schematics**









## **Distributed Power DAC Simulations (V2)**





Simulated S<sub>21</sub> (dB)



ST's 130-nm SiGe BiCMOS Production Process  $f_T/f_{MAX} = 230/280 \text{ GHz}$ 


## Measured S-parameters (V1)








## **Dynamic Range from S-parameters (V1)**







# 4 GHz large signal: one MSB at a time (V2)



2.8 V<sub>pp</sub> per side, no de-embedding





## 2.5 GHz large signal swing (V2)



#### 5 GHz large signal swing, spectra (V2)



# 10 GHz large signal swing, spectra (V2)



# 10 GHz large signal patterns (V2)



Patterns: On-Off, Sine





# **20 GHz large signal swing, spectra (V2)**

7 MSBs + 5 LSBs switching at 2.5 Gb/s each




