# 16-Bit 1GHz Adder Design in 180 nm Technology

by Dhruv Patel

4B Electrical Engineering, University of Waterloo, Waterloo, ON

dr3patel@uwaterloo.ca

Abstract: This paper presents the design of a 16-bit adder in 180 nm CMOS technology that drives a 10 fF load with an output rise/fall time of 100 ps at a maximum input frequency of 1 GHz. The core topology of the proposed adder is the square-root (non-linear) carry-select adder. Although the project imposed no power-consumption criterion, the circuit was designed to have a delay of almost exactly 1 ns under worst-case conditions (ss125 corner), and the remaining design effort went into reducing power consumption and transistor count. The propagation delay through the critical path with the critical bit pattern was 697 ps under typical conditions (tt27) and 1004 ps under worst-case conditions. Additionally, the power-delay product (PDP) was analyzed as a function of supply voltage, and the minimum PDP for the proposed 16-bit adder was found at Vdd = 1.4 V.

## I. INTRODUCTION

THE two-input 16-bit adder is a key block in microcontrollers and microprocessors. It adds two 16-bit operands and outputs a 16-bit *sum* and a 1-bit *carry*. The objective of this project is to design a 16-bit adder in 180 nm technology using Cadence, with a 10 fF load on each output bit, a maximum propagation delay of 1 ns and an output rise/fall time of 100 ps.

## II. ADDER TYPE SELECTION

Several design metrics, such as power, delay, power-delay product (PDP) and transistor count, were considered in selecting the type of 16-bit adder to meet the required specifications. As a start, carry-lookahead, ripple-carry, carry-skip, Kogge-Stone and Brent-Kung adders were compared theoretically across each metric. Carry-lookahead is a poor choice beyond about 3 bits, because its design complexity and transistor count grow rapidly with the bit width [1]; it was therefore discarded. The carry-skip adder was not chosen because it offers a delay benefit only when its sub-adder blocks are of unequal size [2]; on average, its worst-case path delay is almost equal to that of a full 16-bit ripple-carry adder [3]. The high-performance tree adders, such as Kogge-Stone and Brent-Kung, were rejected because, although they minimize delay, they cost significant power and transistor count [1]. Only the square-root carry-select adder offered a reasonable balance between adequate speed, owing to its parallel computation, and low power for a 16-bit configuration. The square-root carry-select adder was therefore selected for the 1 GHz 16-bit adder.

## III. SQUARE-ROOT CARRY SELECT - TOP LEVEL DESIGN

The top-level design of the 16-bit square-root carry-select adder consists of three main stages, shown in Fig. 1: setup, sub-adders and muxes. Each sub-adder stage consists of two copies: one assumes a carry-in of 1 and the other a carry-in of 0, and each computes the resulting sum and carry-out bits for its block. As the actual carry propagates through the mux chain, it selects the result of one of the two sub-adder copies, which is passed through the mux as the final sum outputs and as the carry into the next stage. The carry-select adder computes its outputs in each stage from the following Boolean equations:

$$p_{i} = a_{i} \oplus b_{i}$$

$$c_{i+1} = \overline{\overline{a_{i}}\,\overline{p_{i}} + \overline{c_{i}}\,p_{i}}$$

$$s_{i} = p_{i} \oplus c_{i}$$

where $c_{i}$ is the carry into bit $i$ and $c_{i+1}$ is the carry out of bit $i$ [3].
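The three equations can be checked with a small bit-level model. This is a behavioural sketch, not the circuit itself: it assumes LSB-first bit ordering and uses integers for signals, and `ripple_add` is a hypothetical helper name. It shows that the inverted carry form passes $a_i$ when $p_i = 0$ and passes $c_i$ when $p_i = 1$, which is why only $p$, $\overline{p}$ and $\overline{a}$ are needed.

```python
# Behavioural sketch of the p / carry / sum equations (not a netlist).
# Bits are processed LSB-first; signals are plain 0/1 integers.

def ripple_add(a: int, b: int, width: int, cin: int = 0) -> tuple[int, int]:
    """Return (sum, carry_out) using p_i, a_i-bar and the inverted carry form."""
    carry = cin
    total = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        p_i = a_i ^ b_i                     # p_i = a_i XOR b_i
        a_bar, p_bar, c_bar = 1 - a_i, 1 - p_i, 1 - carry
        # c_{i+1} = NOT(a_i' p_i' + c_i' p_i): passes a_i if p_i = 0, c_i if p_i = 1
        carry_next = 1 - ((a_bar & p_bar) | (c_bar & p_i))
        s_i = p_i ^ carry                   # s_i = p_i XOR c_i
        total |= s_i << i
        carry = carry_next
    return total, carry

# Exhaustive check against integer addition for a 4-bit word.
for x in range(16):
    for y in range(16):
        for c in (0, 1):
            s, cout = ripple_add(x, y, 4, c)
            assert s + (cout << 4) == x + y + c
```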

Because a carry-select architecture was chosen, the arrangement and size of the individual sub-adders strongly affect the critical-path delay and the power consumption. Many block-size combinations are possible, such as [2, 2, 3, 4, 5], [4, 4, 4, 4] and [3, 3, 5, 5]; the [3, 4, 4, 5] arrangement shown in Fig. 1 was chosen to reduce the number of multiplexing stages, since each mux stage adds significant delay due to the large fan-out of the propagated carry in the later stages. To take full advantage of the parallelism of the carry-select structure, the sub-adder block sizes increase progressively, so that the path delay grows roughly with the square root of the word length.
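The block-level behaviour can be sketched as follows, assuming block sizes [3, 4, 4, 5] listed LSB first. Integer addition stands in for the pass-transistor chains, and `block_add` and `carry_select_add` are hypothetical names. Each block is evaluated for both carry-in values and the incoming carry selects one result, mirroring the 2:1 muxes; any partition that sums to 16 gives the same function, so the choice among them is purely a delay and power question.

```python
# Behavioural sketch of a carry-select adder (functional only, no timing).
BLOCKS = [3, 4, 4, 5]   # LSB block first; sizes sum to 16

def block_add(a: int, b: int, width: int, cin: int) -> tuple[int, int]:
    """Add two width-bit slices; return (sum_slice, carry_out)."""
    raw = a + b + cin
    return raw & ((1 << width) - 1), raw >> width

def carry_select_add(a: int, b: int, blocks=BLOCKS, cin: int = 0) -> tuple[int, int]:
    carry, offset, total = cin, 0, 0
    for width in blocks:
        mask = (1 << width) - 1
        a_blk = (a >> offset) & mask
        b_blk = (b >> offset) & mask
        # In hardware both candidates are computed in parallel.
        candidates = (block_add(a_blk, b_blk, width, 0),
                      block_add(a_blk, b_blk, width, 1))
        s_blk, carry = candidates[carry]    # 2:1 mux driven by the incoming carry
        total |= s_blk << offset
        offset += width
    return total, carry
```

For example, `carry_select_add(0xFFFF, 0x0001)` returns `(0, 1)`: the carry from the first block selects the carry-in-1 result of every later block.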

For the intermediate 5-bit, 4-bit and 3-bit sub-adders, two main circuit topologies were considered: a pass-transistor (transmission-gate) carry chain and a Manchester carry chain, each in both static and dynamic logic styles. According to the equations above, the pass-transistor topology does not require a separate generate signal for either sum or carry generation: unlike the Manchester carry chain, the transmission-gate adder needs only the p and $\overline{a}$ signals. The Manchester chain, in contrast, requires an additional generate signal (a AND b). Although the transmission-gate adder is somewhat slower than the Manchester carry chain, it was chosen mainly to reduce power, and it still met the delay specification under worst-case simulation conditions. No dynamic logic was used anywhere in the adder, because dynamic circuits would consume more power than static ones.

Additionally, the inversion property of the adder chain was exploited in the mux stage of the design to avoid three inverter delays (about 250 ps). The inversion at the mux is shown in Fig. 1.

Fig. 1: 16-Bit Adder High-Level Topology and Sizing Multipliers

### A. Carry Select: Critical Path Delay

To first order, the propagation delay of the critical path in the proposed square-root carry-select adder is:

$$t_{add} = t_{setup} + 3\,t_{carry} + t_{sum} + \sqrt{2N}\,t_{mux}$$

where 3 is the size of the first sub-adder block and $N = 16$ is the word length [4].

The critical path of the proposed carry-select design is: setup, 3-bit sub-adder, 4-bit $Mux_1$, 4-bit $Mux_2$ and 5-bit $Mux_1$.
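The delay formula can be evaluated directly; the sketch below uses hypothetical unit delays rather than values from this design. It also assumes, as the $\sqrt{2N}$ term does, that block sizes grow by one bit per stage starting from the first block, and counts how many such blocks are needed to cover N bits. With a first block of 3, four blocks are needed (3 + 4 + 5 + 6 ≥ 16), the same block count as the chosen [3, 4, 4, 5] arrangement, even though $\sqrt{32} \approx 5.66$ overestimates the number of mux stages for so small a word.

```python
import math

def t_add_estimate(t_setup, t_carry, t_sum, t_mux, n_bits=16, first_block=3):
    """First-order square-root carry-select delay from the formula above."""
    return t_setup + first_block * t_carry + t_sum + math.sqrt(2 * n_bits) * t_mux

def blocks_needed(n_bits=16, first_block=3):
    """Smallest M with first_block + (first_block + 1) + ... + (first_block + M - 1) >= n_bits."""
    m, covered = 0, 0
    while covered < n_bits:
        covered += first_block + m
        m += 1
    return m

# Hypothetical unit delays: 1 + 3 + 1 + sqrt(32)
print(round(t_add_estimate(1, 1, 1, 1), 2))  # 10.66
print(blocks_needed(16, 3))                   # 4
```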

## IV. DETAIL CIRCUIT DESIGN OF CARRY SELECT BLOCKS

Careful circuit selection and sizing were performed to meet the 1 ns delay specification. After the top-level adder architecture was chosen, detailed lower-level design was necessary to ensure correct functionality under all input conditions. For all circuits in this 180 nm technology, an nMOS-to-pMOS width ratio of 1:3.6 was used, with a minimum nMOS width of 220 nm. In all figures, relative sizes are annotated in units, where 1 unit = 220 nm for nMOS = 792 nm for pMOS.
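The unit convention converts to physical widths with simple arithmetic. The helper below is a hypothetical illustration that assumes a gate annotated as k units uses k times the unit nMOS and pMOS widths; for instance, the 4x carry inverters in the sub-adders would come to about 880 nm (nMOS) and 3168 nm (pMOS).

```python
# Hypothetical helper for the unit-size annotation used in the figures.
WN_UNIT_NM = 220.0                  # minimum nMOS width = 1 unit
PN_RATIO = 3.6                      # pMOS/nMOS width ratio
WP_UNIT_NM = WN_UNIT_NM * PN_RATIO  # about 792 nm

def unit_to_widths(units: float) -> tuple[float, float]:
    """Return (nMOS width, pMOS width) in nm for a device sized `units` times the unit size."""
    return units * WN_UNIT_NM, units * WP_UNIT_NM

wn, wp = unit_to_widths(4)
print(f"4x: Wn = {wn:.0f} nm, Wp = {wp:.0f} nm")
```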

### A. Setup Block

The setup block in the proposed adder generates only the propagate signals (p and $\overline{p}$) and $\overline{a}$. The propagate circuits were designed to provide enough drive strength for the subsequent pass-transistor sub-adders, and, as shown in Fig. 3, they were sized progressively from bit 0 to bit 15.

The main design concern was generating fast, skew-free setup outputs with adequate drive strength. Although $\overline{p}$ could be generated by an XOR followed by an inverter, separate XOR and XNOR gates operating in parallel were used to produce p and $\overline{p}$ without skew. This also ensures that p and $\overline{p}$ have nearly equal drive strength. The XOR and XNOR gates for p and $\overline{p}$ were implemented in complementary static CMOS, similar to the sum circuit shown in Fig. 4.

### B. Sub-Adders and Sum Generation

Each sub-adder consists of a pass-transistor chain that propagates the carry within the sub-adder, together with the sum generation for each propagated carry under the assumed carry-in. The pass-transistor sub-adders were also upsized progressively, as shown in Fig. 2, so that the drive strength increases from the lower to the higher bits. The last carry bit of each sub-adder has a 4x upsized inverter, because it must later drive a fan-out of 4 to 6 muxes.



Fig. 2: 5-Bit, 4-Bit and 3-Bit Sub-Adder Circuits and Sizing



Fig. 3: Propagation XOR gates progressive sizing

The sum-generation XOR gate was carefully chosen and sized to meet the rise/fall-time and delay requirements. As shown in Fig. 4, the transistors connected directly to the supply rails are sized 4 times larger than those connected to the output. In addition, the rail-side transistors are driven by p and $\bar{p}$, which arrive much earlier than the carry signals, while the carry signals drive the output-side transistors. This lets the internal nodes charge or discharge before the carry arrives, improving the delay by about 50 ps.

The output-side transistors in Fig. 4 were also kept small to avoid heavily loading the carry signals; a heavier load would require significant upsizing of the preceding stages to compensate, resulting in much higher power consumption.

### C. 2:1 Mux

The mux was probably the most critical part of the circuit design, because the propagated carry signals, acting as the select inputs of the 2:1 muxes, have large fan-outs: 4 in the first stage, 5 in the second and third stages, and 6 in the last stage. As shown in Fig. 1, the muxes were sized up progressively from the lower to the higher stages so that the input signals pass through them with sufficient strength, but this made the loading on the select inputs even worse and led to significant delay. As a result, both the setup and the sub-adder blocks were sized up progressively, as shown in Figs. 2 and 3, to provide adequate drive strength for the downstream fan-out stages. In earlier trials, a mux designed with pass transistors did not restore the propagated signal adequately, so the complementary static CMOS style shown in Fig. 5 was used for the 2:1 mux.

Fig. 4: Sum-generation XOR gate with deliberate pin assignment and sizing
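The fan-out figures quoted above can be related to the block sizes with a simple count. Assuming (this is an interpretation, not stated in the design) that the carry entering each block selects one 2:1 mux per sum bit of that block plus one mux for the block's carry-out, the select fan-out is the block width plus one, which reproduces 4, 5, 5 and 6 for blocks [3, 4, 4, 5]. `select_fanout` is a hypothetical helper.

```python
# Hypothetical fan-out count: one mux per sum bit plus one for the block carry-out.
BLOCKS = [3, 4, 4, 5]   # LSB block first

def select_fanout(blocks):
    """Number of 2:1 mux select inputs driven by the carry entering each block."""
    return [width + 1 for width in blocks]

print(select_fanout(BLOCKS))  # [4, 5, 5, 6]
```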

## V. SIMULATION RESULTS

The proposed 16-bit carry-select adder was tested with three input vectors under two simulation conditions: typical-typical at 27 °C (tt27) and slow-slow at 125 °C (ss125).



Fig. 5: mux design using static complementary logic style




Fig. 6: Simulation result of Vector 1 with typical-typical 27 C process corner

### A. Test Vector 1

This test vector requires the carry to propagate through all bits. Its power consumption is relatively high compared with an average input pattern, because switching activity increases as the carry propagates further. The simulated waveforms in Fig. 6 show that under tt27 all outputs reach the expected logic levels well before 1 ns. The output rise/fall-time requirement is met thanks to the buffers added at the outputs. Because of the inversion property, sum<15:11> were already at logic 0, so sum<10> was the slowest-transitioning output for this test vector.

The waveforms in Fig. 7 show that, even under worst-case conditions, all outputs reach their expected logic levels in time. As expected, the delay at ss125 is higher than at tt27, by about 48 percent for this vector (883.2 ps versus 596 ps, Table I). A similar trend appears in the later simulations of test vectors 2 and 3.



Fig. 7: Simulation result of Vector 1 with Slow-Slow 125 C process corner



Fig. 8: Power delay product vs. Vdd characteristics of 16-Bit adder. Min PDP at VDD = 1.4 V

1) PDP vs. Vdd Supply Sweep: A circuit must be optimized for both delay and power, and a useful metric that combines the two is the power-delay product (PDP). The simulation results in Fig. 8 follow the expected trend. As the supply voltage is scaled up, the power increases. Because the power is proportional to $V_{dd}^2$ while the delay is roughly inversely proportional to Vdd, the PDP should rise as Vdd is scaled up: the power increases faster than the delay decreases. Fig. 8 shows this relationship. At low supply voltages, however, the delay increases sharply as Vdd approaches the transistor threshold voltage, so the PDP rises again, which produces the minimum in Fig. 8. Also note that the curve drops after about Vdd = 3.3 V because the transistors are driven beyond their voltage limits, so those measurements are not valid. The supply voltage with the lowest PDP that still meets all design specifications under worst-case simulation conditions should be chosen for the best efficiency and performance; in this case, Vdd = 1.4 V gives the minimum PDP.
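The shape of this curve can be reproduced qualitatively with a first-order model. The sketch assumes power proportional to $V_{dd}^2$ and an alpha-power-law delay proportional to $V_{dd}/(V_{dd}-V_t)^{\alpha}$; the values $V_t = 0.5$ V and $\alpha = 2$ are hypothetical and are not fitted to Fig. 8. Setting the derivative of $V^3/(V-V_t)^{\alpha}$ to zero gives a minimum at $V^* = 3V_t/(3-\alpha)$, and a numeric sweep lands on the same point, showing why the PDP rises both at low Vdd (delay blows up) and at high Vdd (power dominates).

```python
# First-order PDP model (illustrative parameters only, not fitted to Fig. 8).

def pdp_model(vdd: float, vt: float = 0.5, alpha: float = 2.0) -> float:
    """Relative PDP: power ~ Vdd^2, delay ~ Vdd / (Vdd - Vt)^alpha (alpha-power law)."""
    if vdd <= vt:
        raise ValueError("Vdd must exceed Vt in this model")
    power = vdd ** 2
    delay = vdd / (vdd - vt) ** alpha
    return power * delay

def sweep_min(vt=0.5, alpha=2.0, v_lo=0.6, v_hi=3.3, step=0.01):
    """Return the grid voltage with the lowest modelled PDP."""
    n = int(round((v_hi - v_lo) / step))
    best_v, best = v_lo, pdp_model(v_lo, vt, alpha)
    for k in range(n + 1):
        v = v_lo + k * step
        val = pdp_model(v, vt, alpha)
        if val < best:
            best_v, best = v, val
    return best_v

def analytic_min(vt=0.5, alpha=2.0):
    # d/dV [V^3 / (V - Vt)^alpha] = 0  ->  V* = 3 Vt / (3 - alpha), valid for alpha < 3
    return 3 * vt / (3 - alpha)

print(round(sweep_min(), 2), analytic_min())
```

Changing `vt` or `alpha` moves the minimum, so the 1.4 V optimum found in simulation reflects this process and circuit rather than a universal value.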

### B. Test Vector 2



Fig. 9: Simulation result of test Vector 2 with typical-typical 27 C process corner



Fig. 10: Simulation result of Vector 2 with Slow-Slow 125 C process corner

Outputs: sum = 1111111111111110, cout = 1

Test vector 2 also requires the carry to propagate through all bits, so its power consumption is comparable to that of test vector 1 (Table I). Because nearly all sum bits must charge up to logic 1 while the carry propagates the full length, vector 2 has the highest delay of the three vectors at both corners and the highest power at tt27, as shown in Figs. 9 and 10. The comparison is tabulated in Table I.

### C. Test Vector 3

Inputs: a = 1010101010101010, b = 1010101010101010; outputs: sum = 0101010101010100, cout = 1

In this test vector, each carry propagates only into the adjacent higher bit, at every other bit position, so its delay would be expected to be lower than that of test vectors 1 and 2. Because it has only about half the switching activity of the other two vectors, its power consumption would also be expected to be lower. Table I shows, however, that vector 3 has both a higher delay and a higher power consumption than vector 1 at both corners. The outputs of this test vector are as expected under tt27 and ss125, as shown in Figs. 11 and 12.

Fig. 11: Simulation result of Vector 3 with typical-typical 27 C process corner

Fig. 12: Simulation result of Vector 3 with Slow-Slow 125 C process corner
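The expected outputs of test vector 3 can be confirmed with plain integer arithmetic. The sketch also computes the propagate bits $p_i = a_i \oplus b_i$: because a = b, every $p_i$ is 0, so each carry generated at a 1 bit is absorbed by the next bit instead of rippling further.

```python
# Check of test vector 3: a = b = 1010101010101010.
a = 0b1010101010101010
b = 0b1010101010101010

raw = a + b
sum_bits = raw & 0xFFFF   # 16-bit sum
cout = raw >> 16          # carry out of bit 15
propagate = a ^ b         # p_i for all 16 bits

print(f"{sum_bits:016b}", cout)  # 0101010101010100 1
print(f"{propagate:016b}")       # 0000000000000000
```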

## VI. TEST SUMMARY

Table I summarizes the simulation results, comparing the three input vectors across the two process corners.

| Metric     | Vector 1, TT27 | Vector 1, SS125 | Vector 2, TT27 | Vector 2, SS125 | Vector 3, TT27 | Vector 3, SS125 |
|------------|----------------|-----------------|----------------|-----------------|----------------|-----------------|
| Delay (ps) | 596            | 883.2           | 697.2          | 1005            | 674.6          | 984             |
| Power (mW) | 23.9           | 22.8            | 29.3           | 26              | 27.9           | 26.94           |
| PDP (pJ)   | 14.24          | 20.1            | 20.4           | 26.13           | 18.8           | 26.5            |

TABLE I: Delay (ps), power (mW) and PDP (pJ) of each of the three test vectors at the TT27 and SS125 corners
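The PDP row can be cross-checked from the delay and power rows, since ps × mW = 10⁻¹⁵ J, so dividing by 1000 gives pJ. The sketch below recomputes each PDP and the ss125/tt27 delay ratio from the tabulated values; the recomputed PDPs agree with the table to within rounding, and the delay ratios fall between about 1.44 and 1.48.

```python
# Recompute PDP = delay x power and the ss125/tt27 delay ratio from Table I.
TABLE = {
    "Vector 1": {"TT27": (596.0, 23.9), "SS125": (883.2, 22.8)},
    "Vector 2": {"TT27": (697.2, 29.3), "SS125": (1005.0, 26.0)},
    "Vector 3": {"TT27": (674.6, 27.9), "SS125": (984.0, 26.94)},
}

def pdp_pj(delay_ps: float, power_mw: float) -> float:
    # ps * mW = 1e-12 s * 1e-3 W = 1e-15 J; divide by 1000 to express in pJ
    return delay_ps * power_mw / 1000.0

for name, corners in TABLE.items():
    tt_delay, _ = corners["TT27"]
    ss_delay, _ = corners["SS125"]
    pdps = {corner: round(pdp_pj(*vals), 2) for corner, vals in corners.items()}
    print(name, pdps, f"ss/tt delay = {ss_delay / tt_delay:.2f}")
```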

## VII. PROCESS AND TEMPERATURE VARIATION IMPACT

Circuit performance varies with process and temperature. At higher temperature, carrier mobility decreases, which increases the channel resistance and reduces the drive current, so the circuit slows down. The slow-slow process corner models both nMOS and pMOS devices at their slowest, with reduced drive current, which slows the circuit further; the ss125 condition combines this corner with a 125 °C temperature. However, because the switching activity is the same at every corner, the power consumption at ss125 is only slightly lower than under typical conditions (Table I), whereas the delay increases significantly.

## VIII. CONCLUSION

To conclude, the adder was designed to meet the 1 ns delay specification closely rather than exceed it by a wide margin, which would cost significant power. In addition, the square-root carry-select design should be operated at Vdd = 1.4 V to minimize its power-delay product.

## REFERENCES

- [1] A. Al-Khalili. Class Lecture, Topic: "Parallel Adders", Department of Electrical and Computer Engineering, Concordia University, 1998. Internet: http://users.encs.concordia.ca/ asim/COEN\_6501/Lecture\_Notes/L2\_Notes.pdf
- [2] J. Abraham. VLSI Design. Class Lecture, Topic: "Implementing Logic in CMOS", Department of Electrical and Computer Engineering, The University of Texas at Austin, Sept. 2, 2015. Internet: http://www.cerc.utexas.edu/ jaa/vlsi/lectures/3-1.pdf
- [3] N. H. E. Weste and D. M. Harris, *CMOS VLSI design: a circuits and systems perspective*, 4th ed. Boston: Pearson/Addison-Wesley, 2005.
- [4] J. M. Rabaey, A. P. Chandrakasan and B. Nikolic, *Digital integrated circuits: a design perspective*, 2nd ed. Upper Saddle River, N.J.: Pearson Education, 2003.