# THE EFFECT OF LOGIC BLOCK GRANULARITY ON DEEP-SUBMICRON FPGA PERFORMANCE AND DENSITY

by

Elias Ahmed

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science, Graduate Department of Electrical and Computer Engineering, University of Toronto

Copyright © 2001 by Elias Ahmed

### Abstract


The architecture of an FPGA has a significant effect on area and delay. In deep-submicron designs, the interconnect resistance and capacitance account for the majority of the circuit delay. In the first part of this thesis, we perform a detailed study of the FPGA logic block architecture to determine the impact of logic block functionality on performance and density. In particular, in the context of lookup table (LUT)-based, clustered, island-style FPGAs, we look at the effect of LUT size and cluster size (number of LUTs per cluster) on the speed and logic density of an FPGA. The second part of this thesis explores the area and delay properties of a hardwired logic block architecture, which requires a new packing algorithm.

### Acknowledgements

I would like to thank my supervisor Professor Jonathan Rose for his technical guidance and moral support. Our weekly discussions were always interesting and very educational.

I would also like to thank Vaughn Betz and Alexander Marquardt for all their help with VPR, the CAD flow and SPICE modeling. Thanks to Steve Wilton for his advice concerning the 0.18 $\mu$m SPICE FPGA routing models.

I'd also like to thank Guy Lemieux and all the students in Jonathan's research group, Rob, Andy, Ajay, Paul and William. Thanks to Vincent, Warren, Ted, Scott, Marcus, Brent, Jorge, Humberto, Kostas, Derek and all the rest of the students in the Computer and Electronics Group for making this a wonderful research environment.

Last but not least, I'm grateful to my family for their support and encouragement throughout the years and especially to my late father, Noor, for always believing in me.

# Contents

- 1 Introduction
  - 1.1 Motivation
  - 1.2 FPGA Logic Block Architecture
  - 1.3 Hardwired Logic Blocks
  - 1.4 Thesis Organization
- 2 Background
  - 2.1 CAD Flow
    - 2.1.1 Area Model
  - 2.2 FPGA Packing Algorithms
    - 2.2.1 RASP
    - 2.2.2 VPACK
    - 2.2.3 Timing-Driven Packing (T-VPACK)
  - 2.3 FPGA Logic Block Architecture
    - 2.3.1 LUT Size
    - 2.3.2 Cluster Size
  - 2.4 Summary
- 3 FPGA Logic Block Architecture
  - 3.1 FPGA Architecture Modeling
    - 3.1.1 Logic Circuit Design and Delay Model
    - 3.1.2 Routing Architecture
  - 3.2 Experimental Results
    - 3.2.1 Cluster Inputs Required vs. LUT and Cluster Size
    - 3.2.2 Area as a Function of N and K
    - 3.2.3 Performance as a Function of N and K
    - 3.2.4 Area-Delay Product
    - 3.2.5 Summary
- 4 Hardwired Logic Blocks
  - 4.1 Hardwired Architecture
    - 4.1.1 Cluster Inputs (I)
    - 4.1.2 Tapping Buffers
    - 4.1.3 Logical Equivalence of Cluster Outputs
  - 4.2 HLB Packing
    - 4.2.1 HLB Packing Algorithm
    - 4.2.2 HLB Packing with Tapping Buffers
  - 4.3 Experimental Results
    - 4.3.1 Area Results
    - 4.3.2 Delay Results
  - 4.4 Area-Delay Results
  - 4.5 Summary
- 5 Conclusions and Future Work
  - 5.1 Summary and Contributions
  - 5.2 Future Work
- A Total Area
- B Intra-Cluster (Logic) Area
- C Inter-Cluster (Routing) Area
- D FPGA Channel Width
- E Total Critical Path Delay
- F Intra-Cluster (Logic) Delay
- G Inter-Cluster (Routing) Delay
- H Number of BLE Levels on Critical Path
- I Number of Cluster Levels on Critical Path
- Bibliography

# List of Tables

| Table | Title |
|-------|-------|
| 3.1 | Logic Cluster Delays for 4-input LUT Using 0.18 $\mu$m CMOS process |
| 3.2 | LUT Delays Using 0.18 $\mu$m CMOS process |
| 3.3 | MCNC Benchmark Circuit Descriptions |
| 3.4 | Channel Width vs. LUT and Cluster Size (1 to 5) |
| 3.5 | Channel Width vs. LUT and Cluster Size (6 to 10) |
| 3.6 | Critical Path Delay Comparison for K=4 |
| 3.7 | Summary of Best Area, Delay, and Area-Delay Results |
| 4.1 | Percentage of Logic Block Area that is Occupied by Output Routing Crossbar |
| 4.2 | HLB Cluster Utilization (with and without tapping buffers) |
| 4.3 | Number of Clusters with and without Tapping Buffers |
| 4.4 | Comparison of number of 4-LUT to 7-LUT blocks after technology mapping |
| 4.5 | Area-Delay Product Comparison Between Cascaded 4-LUTs and Non-hardwired Architectures |
| A.1 | Total Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 1) |
| A.2 | Total Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 2) |
| A.3 | Total Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 3) |
| A.4 | Total Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 4) |
| A.5 | Total Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 5) |
| A.6 | Total Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 6) |
| A.7 | Total Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 7) |
| A.8 | Total Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 8) |
| A.9 | Total Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 9) |
| A.10 | Total Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 10) |
| B.1 | Intra-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 1) |
| B.2 | Intra-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 2) |
| B.3 | Intra-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 3) |
| B.4 | Intra-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 4) |
| B.5 | Intra-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 5) |
| B.6 | Intra-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 6) |
| B.7 | Intra-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 7) |
| B.8 | Intra-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 8) |
| B.9 | Intra-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 9) |
| B.10 | Intra-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 10) |
| C.1 | Inter-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 1) |
| C.2 | Inter-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 2) |
| C.3 | Inter-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 3) |
| C.4 | Inter-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 4) |
| C.5 | Inter-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 5) |
| C.6 | Inter-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 6) |
| C.7 | Inter-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 7) |
| C.8 | Inter-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 8) |
| C.9 | Inter-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 9) |
| C.10 | Inter-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 10) |
| D.1 | Channel Width (Cluster Size = 1) |
| D.2 | Channel Width (Cluster Size = 2) |
| D.3 | Channel Width (Cluster Size = 3) |
| D.4 | Channel Width (Cluster Size = 4) |
| D.5 | Channel Width (Cluster Size = 5) |
| D.6 | Channel Width (Cluster Size = 6) |
| D.7 | Channel Width (Cluster Size = 7) |
| D.8 | Channel Width (Cluster Size = 8) |
| D.9 | Channel Width (Cluster Size = 9) |
| D.10 | Channel Width (Cluster Size = 10) |
| E.1 | Total Delay in nano-seconds (Cluster Size = 1) |
| E.2 | Total Delay in nano-seconds (Cluster Size = 2) |
| E.3 | Total Delay in nano-seconds (Cluster Size = 3) |
| E.4 | Total Delay in nano-seconds (Cluster Size = 4) |
| E.5 | Total Delay in nano-seconds (Cluster Size = 5) |
| E.6 | Total Delay in nano-seconds (Cluster Size = 6) |
| E.7 | Total Delay in nano-seconds (Cluster Size = 7) |
| E.8 | Total Delay in nano-seconds (Cluster Size = 8) |
| E.9 | Total Delay in nano-seconds (Cluster Size = 9) |
| E.10 | Total Delay in nano-seconds (Cluster Size = 10) |
| F.1 | Intra-Cluster Delay in nano-seconds (Cluster Size = 1) |
| F.2 | Intra-Cluster Delay in nano-seconds (Cluster Size = 2) |
| F.3 | Intra-Cluster Delay in nano-seconds (Cluster Size = 3) |
| F.4 | Intra-Cluster Delay in nano-seconds (Cluster Size = 4) |
| F.5 | Intra-Cluster Delay in nano-seconds (Cluster Size = 5) |
| F.6 | Intra-Cluster Delay in nano-seconds (Cluster Size = 6) |
| F.7 | Intra-Cluster Delay in nano-seconds (Cluster Size = 7) |
| F.8 | Intra-Cluster Delay in nano-seconds (Cluster Size = 8) |
| F.9 | Intra-Cluster Delay in nano-seconds (Cluster Size = 9) |
| F.10 | Intra-Cluster Delay in nano-seconds (Cluster Size = 10) |
| G.1 | Inter-Cluster Delay in nano-seconds (Cluster Size = 1) |
| G.2 | Inter-Cluster Delay in nano-seconds (Cluster Size = 2) |
| G.3 | Inter-Cluster Delay in nano-seconds (Cluster Size = 3) |
| G.4 | Inter-Cluster Delay in nano-seconds (Cluster Size = 4) |
| G.5 | Inter-Cluster Delay in nano-seconds (Cluster Size = 5) |
| G.6 | Inter-Cluster Delay in nano-seconds (Cluster Size = 6) |
| G.7 | Inter-Cluster Delay in nano-seconds (Cluster Size = 7) |
| G.8 | Inter-Cluster Delay in nano-seconds (Cluster Size = 8) |
| G.9 | Inter-Cluster Delay in nano-seconds (Cluster Size = 9) |
| G.10 | Inter-Cluster Delay in nano-seconds (Cluster Size = 10) |
| H.1 | Number of BLEs on Critical Path (Cluster Size = 1) |
| H.2 | Number of BLEs on Critical Path (Cluster Size = 2) |
| H.3 | Number of BLEs on Critical Path (Cluster Size = 3) |
| H.4 | Number of BLEs on Critical Path (Cluster Size = 4) |
| H.5 | Number of BLEs on Critical Path (Cluster Size = 5) |
| H.6 | Number of BLEs on Critical Path (Cluster Size = 6) |
| H.7 | Number of BLEs on Critical Path (Cluster Size = 7) |
| H.8 | Number of BLEs on Critical Path (Cluster Size = 8) |
| H.9 | Number of BLEs on Critical Path (Cluster Size = 9) |
| H.10 | Number of BLEs on Critical Path (Cluster Size = 10) |
| I.1 | Number of Clusters on Critical Path (Cluster Size = 1) |
| I.2 | Number of Clusters on Critical Path (Cluster Size = 2) |
| I.3 | Number of Clusters on Critical Path (Cluster Size = 3) |
| I.4 | Number of Clusters on Critical Path (Cluster Size = 4) |
| I.5 | Number of Clusters on Critical Path (Cluster Size = 5) |
| I.6 | Number of Clusters on Critical Path (Cluster Size = 6) |
| I.7 | Number of Clusters on Critical Path (Cluster Size = 7) |
| I.8 | Number of Clusters on Critical Path (Cluster Size = 8) |
| I.9 | Number of Clusters on Critical Path (Cluster Size = 9) |
| I.10 | Number of Clusters on Critical Path (Cluster Size = 10) |

# List of Figures

| Figure | Title |
|--------|-------|
| 1.1 | Island-Style FPGA [BRM99] |
| 1.2 | FPGA Cluster-style Logic Block Contents |
| 1.3 | Examples of Hardwired Logic Blocks |
| 2.1 | Grouping of LUTs and Flip-Flops |
| 2.2 | Architecture Evaluation Flow |
| 2.3 | Definition of a Minimum-Width Transistor Area [BRM99] |
| 2.4 | Packing of BLEs to Form Clusters |
| 2.5 | Original VPACK Algorithm |
| 2.6 | T-VPACK Algorithm [MBR99] |
| 2.7 | Structure of (a) Basic Logic Element (BLE) and (b) Logic Cluster [BRM99] |
| 3.1 | Structure and Speed Paths of a Logic Cluster [BRM99] |
| 3.2 | Number of Inputs Required for 98% Logic Block Utilization |
| 3.3 | Total Area for Clusters of Size 1 to 5 |
| 3.4 | Total Area for Clusters of Size 6 to 10 |
| 3.5 | Total Logic Block Cluster Area |
| 3.6 | Number of Clusters and Cluster Area Versus K (for N=1) |
| 3.7 | Intra-cluster Multiplexer Area and LUT Size |
| 3.8 | Routing Area |
| 3.9 | Number of Clusters and Routing Area Per Cluster Versus K (for N=1) |
| 3.10 | Total Delay for Clusters of Size 1 to 10 |
| 3.11 | Total Intra-Cluster Delay for Clusters of Sizes 1 to 10 |
| 3.12 | Number of BLEs on Critical Path and BLE delay vs K (for N=1) |
| 3.13 | Total Inter-Cluster Delay for Clusters of Size 1 to 10 |
| 3.14 | Number of Cluster levels on Critical Path |
| 3.15 | Average BLE Fanout |
| 3.16 | Area-Delay Product for Clusters of Size 1 to 10 |
| 3.17 | Close-up View of Area-Delay Product for Clusters of Size 1 to 10 |
| 4.1 | Cascaded 4-LUTs |
| 4.2 | Cluster Description (with HLBs) |
| 4.3 | HLB Tapping Buffers |
| 4.4 | Comparison of HLBs with and without tapping buffers |
| 4.5 | HLB Cluster Contents with Full Routing Crossbar for Output Signals |
| 4.6 | HLB Packing Flow |
| 4.7 | Pseudo-code for HLB Timing Driven Packing |
| 4.8 | HLB with Cascaded 4-LUTs and Tapping Buffer |
| 4.9 | Total Area Comparisons for Hardwired Arch. vs. Non-Hardwired |
| 4.10 | Inter-Cluster Area Comparisons for Hardwired Arch. vs. Non-Hardwired |
| 4.11 | Intra-Cluster Area Comparisons for Hardwired Arch. vs. Non-Hardwired |
| 4.12 | Total Critical Path Delay Comparisons for Hardwired Arch. vs. Non-Hardwired |
| 4.13 | Number of BLEs on the Critical Path Comparisons for Hardwired Arch. vs. Non-Hardwired |
| 4.14 | Number of Clusters on the Critical Path Comparisons for Hardwired Arch. vs. Non-Hardwired |
| 5.1 | Various HLB architectures |

# CHAPTER 1

# Introduction

### 1.1 Motivation

Field-Programmable Gate Arrays (FPGAs) have experienced tremendous growth in recent years and have become a multi-billion dollar industry. Shrinking device geometries, and the larger gate capacity that results, have provided for greater functionality. The instant programmability gives systems built with these devices a significant time-to-market advantage. However, this programmability comes at a price: FPGAs are at least three times slower and demand more than ten times the silicon area of Standard Cells or Mask-Programmable Gate Arrays implementing the same function [BFRV92]. This happens because Standard Cells use simple wires to make interconnections between logic gates, whereas in FPGAs gates are connected with programmable switches. These switches have much larger resistance and capacitance, and hence are slower, than the wires in full-fabrication chips. Ideally, to improve the performance of an FPGA we would like to use as few switches as possible for any given circuit. In general, the three main factors affecting overall FPGA performance are the architecture of the FPGA, the quality of the CAD tools, and the electrical transistor-level design of the FPGA. While this thesis explores all three issues, we focus primarily on the FPGA logic block architecture.

### 1.2 FPGA Logic Block Architecture

This thesis examines several aspects of FPGA logic block architecture and its impact on area and performance. A generic FPGA consists of numerous programmable logic blocks, each of which can implement digital logic functions. In between these logic blocks are programmable routing switches which connect the input and output pins of each logic block. This basic FPGA architecture is illustrated in Figure 1.1 and is known as an "island-style" structure, in which a symmetric array of logic blocks is surrounded by routing channels (or tracks). The I/O pads are evenly distributed around the perimeter of the FPGA.



Figure 1.1: Island-Style FPGA [BRM99]

Figure 1.2 shows a typical logic cluster, which is a logic block that consists of one or more basic logic elements (BLEs) grouped together. BLEs are often composed of look-up tables (LUTs). The BLEs in the cluster are fully interconnected, meaning that a crossbar allows any BLE output to reach any BLE input and that all inputs to the cluster can reach any of the BLE inputs. The advantage of having a fully-connected internal routing crossbar is that physical routing becomes much easier, since the router (the CAD tool which determines the paths of the wires in an FPGA) simply has to connect to any one of the cluster input pins. This added flexibility for the router results in fewer tracks being used. However, there is a cost in terms of multiplexer area and delay to build the full crossbar. For large clusters, this area and delay can be quite significant.



Figure 1.2: FPGA Cluster-style Logic Block Contents

The focus of the first part of this thesis is to determine the effect of the number of inputs to the LUT (K), in a homogeneous architecture in which all LUTs are the same size, and the number of such LUTs in a cluster (N) on the performance and density of an FPGA. Increasing either LUT size (K) or cluster size (N) increases the functionality of the logic block, which has two positive effects: it decreases the total number of logic blocks needed to implement a given function, and it decreases the number of such blocks on the critical path, typically improving performance. Working against these positive effects is that the size of the logic block increases with both K and N. The size of the LUT is exponential in K [RFLC90] and the size of the cluster is quadratic in N [BR97]. Furthermore, the area devoted to routing outside the block will change as a function of K and N, and this effect (since routing area typically is a large percentage of total area) has a strong effect on the results. The choice of the logic block granularity which produces the best area-delay product lies in between these two extremes. In exploring these trade-offs we seek to answer the following questions:

- For a cluster-based logic block with N LUTs of size K and I inputs to the cluster, what should the value of I be so that 98% of the LUTs in the cluster can be fully utilized? Certainly setting $I = K \times N$ will do this, but a value less than this, which is cheaper, may also suffice.
- What is the effect of K and N on FPGA area?
- What is the effect of K and N on FPGA delay?
- Which values of K and N give the best area-delay product?
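The scaling claims above can be made concrete with a rough counting sketch. This is not the transistor-level area model used in this thesis. It only counts the $2^K$ configuration bits of a K-input LUT and the multiplexer inputs of a fully connected local crossbar, in which each of the $N \times K$ BLE input pins selects among the I cluster inputs and the N BLE outputs. The function names are illustrative assumptions, and I is set to its upper bound $K \times N$.

```python
def lut_config_bits(k: int) -> int:
    """A K-input LUT stores one configuration bit per input combination."""
    return 2 ** k


def crossbar_mux_inputs(k: int, n: int, i: int) -> int:
    """Total multiplexer inputs in a fully connected local crossbar.

    Simplified count (an assumption of this sketch): each of the n*k BLE
    input pins has a mux that can select any of the i cluster inputs or
    any of the n BLE outputs.
    """
    return n * k * (i + n)


if __name__ == "__main__":
    k = 4
    for n in (1, 2, 4, 8):
        i = k * n  # upper bound on cluster inputs; fewer may suffice
        print(n, n * lut_config_bits(k), crossbar_mux_inputs(k, n, i))
```

With $I = K \times N$ the crossbar count becomes $N^2 K (K+1)$, so doubling N quadruples it while the total number of LUT bits only doubles. Choosing a smaller I reduces the crossbar cost, which is why the first question above matters.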

Most importantly, we seek to clearly explain the results, which may in turn lead to better architectures. Even though some of these questions were addressed some time ago in [RFCL89] [RFLC90] [KG91] [KG92b] [HW91] and [SRCL92], several reasons compelled us to revisit the issue. First, prior work on the appropriate size of the LUT focused on non-clustered logic blocks, yet clustering is known to have a significant impact on area and delay [MBR99]. Second, most prior studies tended to look at area or delay, but not both as we will here. Third, prior results were based on IC process generations that are several factors larger than current process generations, and so do not take deep-submicron electrical effects into account. In the present work, we perform detailed SPICE-level simulations of circuits and perform appropriate buffer and transistor sizing for all the logic and routing elements, in the manner of [BRM99]. Fourth, the CAD tools available today for experimentation are significantly better than those available 10 years ago, when this question was first raised. This turned out to be significant because our new results show that the superior tools give rise to different trends in the explanation of the results.

### 1.3 Hardwired Logic Blocks

The second part of this thesis explores the use of hardwired logic blocks (HLBs) within the context of logic clusters [Chu94]. HLBs consist of two or more BLEs connected together by wires. These wires do not have switches and lack any form of programmability. The low resistance and capacitance of the metal wires make the connections between BLEs fast and cheap in terms of area. In a non-HLB architecture (shown in Figure 1.2), connections between BLEs must propagate through the local routing crossbar which connects all BLE outputs and inputs. The area requirements of the full routing crossbar can be quite significant, sometimes even larger than the BLE area. Also, there is a significant delay for a signal to travel from a BLE output to another BLE input. The use of HLBs alleviates the area and delay demands of local BLE connections and may improve FPGA density and performance. We explore the area and delay of one particular HLB-based FPGA architecture, again in the context of clustered architectures.



a) Cascaded Hardwired LUTs

b) Tree Topology Hardwired LUTs

Figure 1.3: Examples of Hardwired Logic Blocks

### 1.4 Thesis Organization

This thesis is organized as follows: Chapter 2 describes the background required to understand this thesis and previous work relating to FPGA logic block architecture. Chapter 3 presents the results of an extensive study of LUT and cluster size on FPGA area and delay. Chapter 4 presents the area and delay results for a cascaded 4-LUT hardwired logic block-based FPGA and compares it to the non-hardwired results. Finally, we provide the conclusion in Chapter 5 along with possibilities for future work.

# CHAPTER 2

# Background

This chapter discusses prior work in the area of FPGA architecture and CAD tools. There has been a significant amount of work in FPGA architecture over the last decade, and we will attempt to summarize some of the key related results. We will also review the CAD flow and the logic synthesis, placement and routing algorithms used to produce the results in the present work.

### 2.1 CAD Flow

The best-known and most credible method of determining the answers to the questions posed in Section 1.2 is to experimentally synthesize real circuits, using a CAD flow, into the different FPGA architectures of interest, and then measure the resulting area and delay [BFRV92] [BRM99] [KG91]. Figure 2.2 illustrates the CAD flow that was used in [BRM99] and [MBR99] to explore architectures, and this is the one that we employ in this research. First, each circuit passes through technology-independent logic optimization using the SIS program [ea90]. It is worth noting that, from this point on, the entire CAD flow is fully timing-driven. Technology mapping, which converts the logic expressions into a netlist of K-input LUTs, was performed using the FlowMap and FlowPack tools [CD94]. At this stage, there exists a netlist of logic blocks (LUTs and registers). T-VPACK [MBR99] takes this netlist of logic blocks and first groups them into basic logic elements (BLEs), each of which consists of a single K-LUT and a flip-flop. Any LUT with a fanout of one that feeds a flip-flop (as shown in Figure 2.1) can be collapsed with that flip-flop into a single BLE. Hence, a BLE can hold a LUT alone, a LUT and a LATCH, or a LATCH alone.



Figure 2.1: Grouping of LUTs and Flip-Flops
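The grouping rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the T-VPACK implementation: the netlist representation (dictionaries keyed by block name), the `extra_sinks` argument for nets that also drive primary outputs, and the function name are all hypothetical.

```python
from collections import Counter


def group_into_bles(luts, flip_flops, extra_sinks=()):
    """Group LUTs and flip-flops into BLEs.

    luts:        dict LUT name -> (list of input nets, output net)
    flip_flops:  dict flip-flop name -> (input net, output net)
    extra_sinks: nets that also drive something else, e.g. primary outputs
    Returns a list of (lut_name_or_None, ff_name_or_None) pairs.
    """
    # Count how many sinks read each net.
    fanout = Counter(extra_sinks)
    for inputs, _ in luts.values():
        fanout.update(inputs)
    for d_net, _ in flip_flops.values():
        fanout[d_net] += 1

    ff_by_input = {d_net: name for name, (d_net, _) in flip_flops.items()}

    bles, absorbed = [], set()
    for lut, (_, out) in luts.items():
        ff = ff_by_input.get(out)
        # Collapse only when the flip-flop is the LUT's single sink.
        if ff is not None and fanout[out] == 1:
            bles.append((lut, ff))
            absorbed.add(ff)
        else:
            bles.append((lut, None))
    # Flip-flops that were not absorbed occupy a BLE on their own.
    bles.extend((None, ff) for ff in flip_flops if ff not in absorbed)
    return bles
```

The three possible results, `(lut, ff)`, `(lut, None)` and `(None, ff)`, correspond to the three BLE contents listed above.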

These BLEs are then packed into clusters, optimizing for both logic density and speed. VPR [BRM99] is then used for timing-driven placement and routing. Placement is the process of determining the location of every cluster within the FPGA tile, and routing determines which programmable switches to turn on in order to connect the nets. Note that the VPACK and VPR toolset was designed to explore FPGA architectures, and each tool takes a number of parameters as input that describe an FPGA architecture. For example, the packer (VPACK) takes the LUT size (K), cluster size (N) and number of cluster inputs (I) as input parameters. The router takes a VPR ".arch" file [BRM99], which is a textual description of the routing architecture. VPR uses a transistor-based area model to calculate the total FPGA area, and the timing analyzer determines the critical path delay by extracting the Elmore delay of each net and performing a path-based timing analysis.
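As a reminder of what the Elmore delay measures, the following is a minimal sketch for an unbranched RC chain, where each series resistance is multiplied by all of the capacitance downstream of it. Routed nets generally branch, which this sketch does not handle, and the function name and the list-based representation are assumptions made for illustration rather than anything taken from VPR.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay from the driver to the far end of an RC chain.

    resistances[i] is the series resistance of segment i (ohms) and
    capacitances[i] is the capacitance at the node after it (farads).
    Each resistance charges all capacitance downstream of it.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    downstream_c = sum(capacitances)
    for r, c in zip(resistances, capacitances):
        delay += r * downstream_c
        downstream_c -= c  # this node is now upstream of later segments
    return delay
```

For two segments the result is $R_1 (C_1 + C_2) + R_2 C_2$, which shows why a programmable switch with large resistance near the driver is costly: it is multiplied by the capacitance of the whole net.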

Figure 2.2: Architecture Evaluation Flow

In our approach to modeling the area of an FPGA required by any given circuit, we determine the minimum number of tracks needed to successfully route each circuit, $W_{min}$. Clearly this isn't possible in real FPGAs, but we believe this is meaningful as part of a logic density metric for an architecture. The area model which makes use of this minimum track count is described more fully in Section 2.1.1. In order to determine the minimum number of tracks per channel needed to route each circuit, we repeatedly route the circuit, removing tracks from the architecture each time, until it fails to route. We call the situation where the FPGA has the minimum number of tracks needed to route a given circuit a "high stress" routing, since the circuit is barely routable. We believe that measuring the performance of a circuit under these high-stress conditions is unreasonable and atypical, because FPGA designers don't like working just on the edge of routability. They will typically change something to avoid it, such as using a larger device or removing part of the circuit.

For this reason, we add 30% more tracks to the minimum track count and then perform final "low stress" routing and use that to measure the critical path delay.
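The track-count procedure above can be sketched as follows. This is only an illustration under stated assumptions: `routes` is a hypothetical stand-in for a full place-and-route run that reports whether a circuit routes at a given channel width, the one-track-at-a-time search is a simplification, and rounding the 30% margin up to a whole track is our assumption rather than something stated in the text.

```python
from typing import Callable


def find_min_channel_width(routes: Callable[[int], bool], start_width: int) -> int:
    """Remove tracks one at a time until routing fails; return the last width that routed."""
    if not routes(start_width):
        raise ValueError("circuit does not route at the starting channel width")
    width = start_width
    while width > 1 and routes(width - 1):
        width -= 1
    return width  # W_min: the "high stress" channel width


def low_stress_width(w_min: int, margin_percent: int = 30) -> int:
    """Add margin_percent more tracks to W_min, rounding up (rounding is an assumption)."""
    return (w_min * (100 + margin_percent) + 99) // 100


def toy_router(width: int) -> bool:
    """Hypothetical router stand-in: pretend the circuit routes whenever width >= 23."""
    return width >= 23


w_min = find_min_channel_width(toy_router, start_width=40)
w_low = low_stress_width(w_min)
print(w_min, w_low)
```

With the toy router, the search settles on the smallest routable width, and the critical path delay would then be measured on a routing performed at the wider, low-stress width.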

From the output of the router, and using the area models described in the next section along with the circuit delay parameters, we can compare different architectures.

### 2.1.1 Area Model

Betz' area modeling procedure [BRM99] was to create the detailed, transistor-level circuit design of all of the logic and routing circuitry in the FPGA. This includes circuits for the LUTs, flip-flops, intra-cluster muxes, inter-cluster routing muxes and switches and all of the associated programming bits. His basic assumption was that the total area of the FPGA was active-area limited, which tends to be true when there are many layers of metal (according to [BRM99]). Two commercial PLD vendors have confirmed this assumption.

This design process includes proper sizing of all of the gates and buffers, including the pass-transistors in the routing. Betz uses the number of "minimum-width transistor areas" as his area metric. The definition of a minimum-width transistor area is the smallest possible layout area of a transistor that can be processed for a specific technology plus the minimum spacing surrounding the transistor as shown in Figure 2.3. The spacing is dictated by the design rules for that particular technology. Any transistors in the circuit design that are sized larger than minimum are counted as a greater number of minimum-width transistors, taking into account the fact that a double size transistor takes less than twice the layout area. One advantage of this metric is that it is a somewhat process-independent estimate of the FPGA area.



Figure 2.3: Definition of a Minimum-Width Transistor Area [BRM99]

### 2.2 FPGA Packing Algorithms

In order to effectively study the feasibility of HLB architectures we need to have a packing algorithm capable of targeting such structures. However, we should first provide a brief overview of some of the general packing algorithms relevant to our work. This section will discuss three such packing algorithms: RASP, VPACK and T-VPACK. The input to all these packing algorithms is a netlist of LUTs and flip-flops and the output is a clustered set of BLEs.

#### 2.2.1 RASP

RASP [CPD96] is a general synthesis system for SRAM-based FPGAs. It has the capability to map circuits into various types of logic blocks. RASP is composed of a core which includes synthesis and optimization algorithms targeting technology-independent logic synthesis and mapping for LUT-based FPGAs. The packing algorithm uses a "closeness" metric to determine which LUTs to group together in the same cluster. It first creates a compatibility graph where the vertices represent the LUTs that require grouping and edges are formed between vertices if they can be grouped together. There is no edge in the compatibility graph if two BLEs cannot be grouped together due to some hard constraint violation (for example, exceeding the maximum number of cluster inputs allowable). The next step is to assign weights to all the edges in the network. Weights are assigned depending on the design objective. If circuit performance is the main objective then a large weight is assigned to edges which produce a grouping that reduces the length of the critical path. Conversely, if FPGA area is the primary objective then large weights are given to those edges resulting in groupings that do not create complex interconnection patterns in the final mapping. The algorithmic complexity of the mapper is O(nm) where n is the number of LUTs and m is the number of edges. With the current benchmarks used in our experiments, the number of edges m in the compatibility graph is $O(n^2)$. This results in an overall algorithmic complexity of $O(n^3)$, which is excessive and somewhat impractical for modern day circuits which require in excess of 10,000 blocks.

#### 2.2.2 VPACK

VPACK [BRM99] takes a netlist of LUTs and registers as input and outputs a netlist of logic clusters as illustrated in Figure 2.4. It groups BLEs together in order to maximize input sharing. The number of BLEs per cluster (N), inputs per cluster (I), LUT size (K) and clocks per cluster ( $M_{clk}$ ) are all input parameters to the VPACK algorithm. VPACK accepts any combination of these parameters and creates optimized logic clusters. The complete pseudo-code for the VPACK algorithm is given in Figure 2.5.

VPACK attempts to pack as many BLEs as possible into a given cluster without violating the following constraints:

- There can be no more than N BLEs in any given cluster.
- Each cluster must use I inputs or fewer.
- Every BLE contained in the cluster must have K inputs per LUT (strictly homogeneous architecture).
- Each cluster can use at most $M_{clk}$ clocks, where $M_{clk}$ is the maximum number of clocks per cluster.

Figure 2.4: Packing of BLEs to Form Clusters (panels: Netlist of BLEs, Netlist of Clusters)

The basic algorithm consists of two stages. The first stage examines the input netlist and groups LUTs and registers together to form BLEs. Only LUTs with a fanout of one and whose output directly feeds a register are grouped together. All other structures such as LUTs with fanout greater than one and LUT to LUT connections require multiple BLEs.

The second stage of the VPACK algorithm groups BLEs together to form clusters. Clusters are created sequentially, one after the other. The algorithm first operates in a greedy mode, adding BLEs to the current cluster until the maximum cluster capacity has been reached. Once the maximum number of BLEs, N, has been packed, the current cluster is "closed", a new empty cluster is created and the packing process is repeated. Clustering starts by selecting a seed BLE to pack into the current open cluster; the unclustered BLE with the most used inputs is always selected as the next seed. Once a seed BLE has been selected, an attraction function is applied to determine the next BLE to add to the current cluster. The attraction between a BLE, B, and the current cluster, C, is the number of nets that they share. A net is defined as any electrically equivalent wire connecting one or more logic blocks together.

$$Attraction(B) = |Nets(B) \cap Nets(C)|$$

If the greedy phase of the packer does not completely fill the cluster to its maximum capacity then there is the option to invoke the hill-climbing phase. Hill-climbing continues to pack BLEs into the cluster even when the result is temporarily infeasible, that is, when the cluster uses more than the maximum of I inputs allowed. It must be remembered that adding a BLE whose inputs are all already present in the cluster, and whose output is used by another BLE in the cluster, reduces the number of cluster inputs. This is the key driving force behind hill-climbing. That is, even though infeasibilities may occur early on during packing, the cluster may become feasible at later stages, and the result is an improvement in logic block density. The algorithmic complexity of VPACK is $O(k_{max} \cdot K \cdot n)$, where $k_{max}$ is the maximum number of terminals on a net, K is the number of inputs to each LUT and n is the number of LUTs + registers in the circuit.
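The greedy stage described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the VPACK implementation: the `BLE` data class, the helper names and the tiny netlist are all hypothetical, the clock constraint and the hill-climbing phase are omitted, and ties are broken by list order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BLE:
    name: str
    inputs: frozenset  # nets read by the LUT
    output: str        # net driven by the BLE

    @property
    def nets(self):
        return self.inputs | {self.output}


def cluster_inputs(cluster):
    """Nets that must enter the cluster from outside (used but not produced inside)."""
    produced = {b.output for b in cluster}
    used = set().union(*(b.inputs for b in cluster))
    return used - produced


def attraction(ble, cluster):
    """Number of nets shared between the candidate BLE and the current cluster."""
    cluster_nets = set().union(*(b.nets for b in cluster))
    return len(ble.nets & cluster_nets)


def vpack_greedy(bles, N, I):
    unclustered = list(bles)
    clusters = []
    while unclustered:
        # Seed: the unclustered BLE with the most used inputs.
        seed = max(unclustered, key=lambda b: len(b.inputs))
        unclustered.remove(seed)
        cluster = [seed]
        while len(cluster) < N:
            # Only BLEs that keep the cluster within I external inputs are legal.
            legal = [b for b in unclustered
                     if len(cluster_inputs(cluster + [b])) <= I]
            if not legal:
                break
            best = max(legal, key=lambda b: attraction(b, cluster))
            unclustered.remove(best)
            cluster.append(best)
        clusters.append(cluster)
    return clusters


# Hypothetical four-BLE netlist, packed with N = 2 and I = 4.
netlist = [
    BLE("a", frozenset({"i1", "i2", "i3"}), "na"),
    BLE("b", frozenset({"na", "i1"}), "nb"),
    BLE("c", frozenset({"i7", "i8"}), "nc"),
    BLE("d", frozenset({"nc", "i9"}), "nd"),
]
packed = [[b.name for b in c] for c in vpack_greedy(netlist, N=2, I=4)]
print(packed)
```

In this toy netlist, BLE `b` joins the seed `a` because it shares two nets with it and, since `a`'s output is consumed inside the cluster, the pair needs only three external inputs.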

#### 2.2.3 Timing-Driven Packing (T-VPACK)

T-VPACK [MBR99] was an extension to VPACK that attempts to maximize logic cluster capacity while simultaneously working to reduce the critical path delay. The total delay is improved by reducing the number of inter-cluster connections on the critical path. Since the external cluster routing delay is much larger than the local routing inside the cluster this should have a positive impact on delay. The pseudo-code for the T-VPACK algorithm is shown in Figure 2.6.

Timing analysis is performed on the circuit before any packing begins. T-VPACK models three types of delays within an FPGA: the delay through a logic block, *Logic\_Block\_Delay*, the intra-cluster delay between logic blocks, *Intra\_Logic\_Delay*, and the external delay between cluster input/output pins, *Inter\_Block\_Delay*. The external routing delays cannot be determined before placement and routing and so these must be approximated. It was experimentally demonstrated in [MBR99] [BRM99] that setting *Logic\_Block\_Delay* and *Intra\_Logic\_Delay* to 0.1 and *Inter\_Block\_Delay* to 1 produced the most accurate post-routing results.

```
Let: UnclusteredBLEs be the set of BLEs not contained in any cluster
  C be the set of BLEs contained in the current cluster
  LogicClusters be the set of clusters (where each cluster is a set of BLEs)
UnclusteredBLEs = PatternMatchToBLEs (LUTs, Registers);
LogicClusters = NULL;
while (UnclusteredBLEs != NULL) { /* More BLEs to cluster */
  C = GetBLEwithMostUsedInputs (UnclusteredBLEs);
  while (|C| < N) { /* Cluster is not full */
    BestBLE = MaxAttractionLegalBLE (C, UnclusteredBLEs);
    if (BestBLE == NULL) /* No BLE can be added to cluster */
       break;
    UnclusteredBLEs = UnclusteredBLEs - BestBLE;
    C = C ∪ BestBLE;
  }
  LogicClusters = LogicClusters ∪ C;
}
```

Figure 2.5: Original VPACK Algorithm

All the net timing and slack information is stored after initial timing analysis has been completed. From this, the criticality of every connection can be calculated. Slack [HSC83] is defined as the amount of delay that may be added to a connection before it becomes critical. Slack(i) is the slack of a connection *i* and *MaxSlack* is the maximum possible slack for all connections in the circuit. The criticality of any connection, *i*, is expressed as:

$$ConnectionCriticality(i) = 1 - \frac{Slack(i)}{MaxSlack}$$
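As a small worked example of the criticality definition, the sketch below computes the criticality of each connection from its slack. The connection names and slack values are made up for illustration, and taking *MaxSlack* as the largest slack among the given connections is our reading of the definition above.

```python
def connection_criticalities(slacks):
    """Map each connection to 1 - slack / MaxSlack."""
    max_slack = max(slacks.values())
    if max_slack <= 0:
        raise ValueError("MaxSlack must be positive")
    return {conn: 1.0 - s / max_slack for conn, s in slacks.items()}


# Hypothetical slacks in arbitrary time units.
slacks = {"a->b": 0.0, "b->c": 2.0, "c->d": 4.0}
crit = connection_criticalities(slacks)
print(crit)
```

A connection with zero slack gets criticality 1, while the connection with the largest slack gets criticality 0.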

Once the connection criticalities have been determined, T-VPACK determines which BLE will be chosen as the seed for a new cluster by selecting the BLE attached to the net with the highest criticality. Once a seed BLE has been selected, an attraction function is applied to calculate the next BLE that would be the best candidate for inclusion in the current cluster. The attraction function for a BLE, B, towards a cluster, C, is given by:

$$Attraction(B) = \lambda \cdot Criticality(B) + (1 - \lambda) \cdot \frac{|Nets(B) \cap Nets(C)|}{MaxNets}$$

The BLE with the highest attraction will be the next candidate. The parameter $\lambda$ determines the balance between timing-driven packing and input sharing. If $\lambda$ is 1 then T-VPACK attempts to minimize delay without regard to input pin sharing, and if $\lambda$ is 0 then T-VPACK attempts only to minimize the number of used inputs. [MBR99] demonstrated that setting $\lambda$ to a value between 0.4 and 0.8 was best.
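The effect of $\lambda$ can be illustrated with a short sketch of the attraction function. The candidate names and numbers are hypothetical, and both the per-BLE criticality and the normalizing constant `max_nets` are passed in as given values, since how they are obtained is not detailed in this section.

```python
def tvpack_attraction(criticality_b, shared_nets, max_nets, lam):
    """lam * Criticality(B) + (1 - lam) * |Nets(B) ∩ Nets(C)| / MaxNets."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    return lam * criticality_b + (1.0 - lam) * shared_nets / max_nets


def best_candidate(candidates, max_nets, lam):
    """Pick the candidate BLE with the highest attraction."""
    return max(candidates,
               key=lambda name: tvpack_attraction(*candidates[name], max_nets, lam))


# Hypothetical candidates: name -> (criticality, nets shared with the cluster).
candidates = {"x": (0.9, 1), "y": (0.2, 4)}
timing_choice = best_candidate(candidates, max_nets=5, lam=1.0)
sharing_choice = best_candidate(candidates, max_nets=5, lam=0.0)
print(timing_choice, sharing_choice)
```

With $\lambda = 1$ the highly critical BLE `x` wins, whereas with $\lambda = 0$ the BLE `y`, which shares more nets with the cluster, is chosen instead.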

```
Let: UnclusteredBLEs be the set of BLEs not contained in any cluster
  C be the set of BLEs contained in the current cluster
  LogicClusters be the set of clusters (where each cluster is a set of BLEs)
UnclusteredBLEs = PatternMatchToBLEs (LUTs, Registers);
LogicClusters = NULL;
ComputeCriticalities();
BLEsSinceLastCriticalityRecompute = 0;
while (UnclusteredBLEs != NULL) { /* More BLEs to cluster */
  C = GetMostCriticalBLE (UnclusteredBLEs);
  BLEsSinceLastCriticalityRecompute ++;
  while (|C| < N) { /* Cluster is not full */
    if (BLEsSinceLastCriticalityRecompute >= RecomputeInterval) {
      ComputeCriticalities();
      BLEsSinceLastCriticalityRecompute = 0;
    }
    BestBLE = MaxAttractionLegalBLE (C, UnclusteredBLEs);
    if (BestBLE == NULL) /* No BLE can be added to cluster */
      break;
    UnclusteredBLEs = UnclusteredBLEs - BestBLE;
    C = C ∪ BestBLE;
    BLEsSinceLastCriticalityRecompute ++;
  }
  LogicClusters = LogicClusters ∪ C;
}
```

Figure 2.6: Pseudo-code for the T-VPACK Algorithm

## 2.3 FPGA Logic Block Architecture

Now that the FPGA experimental methodology and design flows have been described, we discuss earlier work which examined logic block architecture and its impact on FPGA density and performance. This research and much of the relevant prior work was based on a cluster-based island style FPGA. The structure of the cluster-based logic block is illustrated in Figure 2.7. Each cluster contains N basic logic elements (BLEs) fed by I cluster inputs. The BLE, illustrated in Figure 2.7(a), typically consists of a K-input lookup table (LUT) and register, which feed a two-input multiplexer that determines whether the registered or unregistered LUT output drives the BLE output. For clusters containing more than one BLE, "full connectivity" is assumed. This means that all I cluster inputs and N outputs can be programmably connected to each of the K inputs on every LUT. These connections are implemented using the multiplexers shown in the figure, which are unnecessary for a cluster of size 1. The Altera Flex 6K, 8K, 10K [Inc98a] and Xilinx 5200 [Inc97] and Virtex [Inc98b] are commercial examples of such clusters (although the Xilinx logic clusters are not fully connected).

#### 2.3.1 LUT Size

There have been several studies in the past which examined the effect of logic block functionality on the area and performance of FPGAs. The use of large LUTs generally produces good performance results since there are fewer BLEs on any given critical path. However, because the logic block area grows exponentially with the number of LUT inputs, these larger blocks are very expensive. With this in mind, the key to any FPGA logic block study is to balance these two opposing factors and determine an appropriate LUT size which gives both good performance and logic density. For example, the work in [RFLC90] and [KG92a] showed that a LUT size of 4 is the most area efficient in a non-clustered context. One of the key observations from this study was that the FPGA area was mainly dominated by the inter-cluster routing area, and even though increasing logic functionality resulted in fewer logic blocks, this could easily be offset by the increase in routing area due to the larger number of external pin-to-pin connections between blocks. Hence, it was concluded in [RFLC90] that logic blocks with high functionality per connected pin produced the best area results. FPGA performance, on the other hand, was evaluated in [Sin91] [SRCL92] and [KG91]. It was observed that using a LUT size of 5 to 6 gave the best performance. The FPGA performance studies tended to favour larger LUTs since they resulted in fewer levels of logic on the critical path. Since the routing delay was larger than the logic delay, this generally led to improved delay results. A recent publication [KBKC99] has suggested that using a heterogeneous mixture of LUT sizes of 2 and 3 was equivalent in area efficiency to a LUT size of 4. In addition, [ACSG<sup>+</sup>99] states that a logic structure using two 3-input LUTs was most beneficial in terms of area. However, it must be noted that neither of these last two papers performed a full area or delay study in which a range of LUT sizes was examined.

Figure 2.7: Structure of (a) Basic Logic Element (BLE) and (b) Logic Cluster [BRM99]

#### 2.3.2 Cluster Size

In general, clusters can be classified using four parameters [BRM99]: the number of inputs that each BLE has (K), the number of BLEs within each cluster (N), the number of distinct cluster input pins feeding each cluster (I) and the number of distinct clocks per cluster ($M_{clk}$). It was experimentally shown in [BRM99] that, given a cluster with a single clock and assuming a LUT size of 4, clusters of size 4 to 10 produced the best area-delay results using non-timing-driven packing (VPACK). Later on, a separate study [Mar99] [MBR99] based on a timing-driven packing algorithm (T-VPACK) found that clusters of sizes 7 to 10 were best in terms of area-delay product.

## 2.4 Summary

This chapter outlined some of the key results from the past in FPGA logic block architecture. Background information concerning logic synthesis, technology mapping and packing was also provided. The rest of the thesis will elaborate on the area and delay results for various logic block architectures.

# CHAPTER 3

## FPGA Logic Block Architecture

This chapter explores the effect of logic block functionality on FPGA performance and density. In particular, in the context of lookup table, cluster-based island-style FPGAs [BRM99] we look at the effect of lookup table (LUT) size and cluster size (number of LUTs per cluster) on the speed and logic density of an FPGA.

We use a fully timing-driven experimental flow as described in Section 2.1 [BRM99] [Mar99] in which a set of benchmark circuits is synthesized into different cluster-based [BR97] [BR98] [Mar99] logic block architectures, which contain groups of LUTs and flip-flops. We look across all architectures with LUT sizes in the range of 2 inputs to 7 inputs, and cluster sizes from 1 to 10 LUTs. In order to judge the quality of each architecture, we both perform detailed circuit-level design and measure the demand for routing resources for every circuit in each architecture.

## 3.1 FPGA Architecture Modeling

In this section we give a brief description of the FPGA architecture and delay modeling used in our work. The level of detail present in these models goes far beyond any modeling previously used in this kind of experimental analysis. All device parameters and circuits are modeled using SPICE simulations of a 0.18  $\mu$ m CMOS process.

We make the following assumptions about the basic island-style architecture:

- The number of routing tracks in each channel between logic blocks is uniform throughout the FPGA.
- All metal routing wires are placed on metal layer 3 with minimum width and spacing.
- Each circuit is mapped into the smallest square (MxM) grid possible given the number of logic clusters it requires.

However, it is important to note that the area metric we count is not the total area required by the square M x M block on the FPGA. Rather, we use the exact number of clusters required to implement the circuit. For example, a circuit which requires 800 logic blocks will be routed in a 29 x 29 FPGA grid, which contains 841 blocks. We use the area of the logic and routing surrounding 800 clusters as opposed to 841.
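The grid-sizing rule and the 800-cluster example can be checked with a few lines of Python. The function name is hypothetical; the rule itself (the smallest M with M x M at least the number of clusters) is the one stated in the assumptions above.

```python
import math


def smallest_square_grid(num_clusters: int) -> int:
    """Return the smallest M such that an M x M grid holds num_clusters clusters."""
    m = math.isqrt(num_clusters)  # floor of the square root
    if m * m < num_clusters:
        m += 1
    return m


clusters_needed = 800
m = smallest_square_grid(clusters_needed)
grid_blocks = m * m
unused_blocks = grid_blocks - clusters_needed
print(m, grid_blocks, unused_blocks)
```

The blocks left over in the grid are excluded from the area metric, which counts only the logic and routing surrounding the clusters actually used.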

#### 3.1.1 Logic Circuit Design and Delay Model

The circuit design process described above is also necessary to determine accurate delay measurements of the final placed and routed circuit. In deep-submicron IC design processes, the effect of wire resistance and capacitance becomes more prevalent. We account for these effects in this delay modeling. Figure 3.1 shows the detailed logic block circuit. All circuit design was done in TSMC's  $0.18 \,\mu$ m,  $1.8 \,V$  CMOS process. The paths have been simulated with their actual loads in place and the input driven by what would actually be driving it in a real FPGA. As the cluster size increases, the buffers shown in Figure 3.1 must be sized larger because of larger loading from the internal muxes, which results in an increase in the basic BLE delay. This is shown in Table 3.1 which gives the logic delays as the cluster size increases for the paths indicated in Figure 3.1 for a BLE based on a 4-input LUT.



Figure 3.1: Structure and Speed Paths of a Logic Cluster [BRM99]

| Cluster Size (N)   | A to B (ps) | B to C and D to C (ps) | C to D (ps) | B to D (ps) |
|--------------------|-------------|------------------------|-------------|-------------|
| 1 (No local muxes) | 377         | 180                    | 376         | 556         |
| 2                  | 377         | 221                    | 385         | 606         |
| 4                  | 377         | 301                    | 401         | 702         |
| 6                  | 377         | 332                    | 397         | 729         |
| 8                  | 377         | 331                    | 396         | 727         |
| 10                 | 377         | 337                    | 387         | 724         |

Table 3.1: Logic Cluster Delays for 4-input LUT Using 0.18 µm CMOS process

Similarly, the design of the larger LUTs must be done carefully, with proper buffer sizing and, in some cases, insertion of buffers within the tree of pass-transistors. Table 3.2 gives the LUT delay as a function of the LUT size.

| LUT Size (K) | C to D (ps) |
|--------------|-------------|
| 2            | 199         |
| 3            | 283         |
| 4            | 401         |
| 5            | 534         |
| 6            | 662         |
| 7            | 816         |

Table 3.2: LUT Delays Using 0.18  $\mu$ m CMOS process

#### 3.1.2 Routing Architecture

The target routing architecture of the CAD flow used in these experiments is one that Betz et al. [BRM99] indicate is a good choice. This architecture has the following parameters:

- Routing segments have a logical length of four (the logical length of a segment is defined as the number of logic block clusters that it spans)
- 50% of these segments use tri-state buffers as the programmable switch and 50% use pass transistors

The experiments conducted in [BRM99] were based on a LUT size of four and a cluster size of four. We will assume these results are valid for all the LUT sizes and cluster sizes that we are comparing.

However, the LUT and cluster sizes do affect the sizing of the buffers used to drive the programmable routing, both those driving out of the block itself and the tri-state buffers internal to the programmable routing. As the logic block cluster increases in size, each logic tile becomes larger, and therefore the length of the wires being driven by each buffer increases. Since this increases the capacitive loading of each wire, the buffers must be sized appropriately. Betz [BRM99] indicates that for a cluster size of four and a LUT size of four, the best routing pass transistor width was ten times the minimum width, while the best tri-state buffer size was only five times the minimum. We size our buffers in direct proportion to the length of the tile. That is, if the tile length doubles, then we double the size of the routing buffers.

## 3.2 Experimental Results

In this section we present the experimental results of synthesizing benchmark circuits through the CAD flow described in Section 2.1, with area and delay modeled as described in Section 3.1. The benchmark circuits used in these experiments were the 20 largest from MCNC [Yan91] along with 8 new benchmarks<sup>1</sup>. Table 3.3 gives a description of the circuits, including the name, number of 4-input LUTs and number of nets.

Each circuit was mapped, placed and routed with LUT size varying from 2 to 7 and cluster sizes from 1 to 10. With 6 different LUT sizes and 10 different cluster sizes this gives a total of 60 distinct architectures.

#### 3.2.1 Cluster Inputs Required vs. LUT and Cluster Size

Before answering the principal questions raised in the introduction, we need to determine an appropriate value for I, the number of logic block cluster inputs. The value of I should be a function of K (the LUT size) and N (the number of LUTs in a cluster). This is of concern since the larger the number of inputs, the larger and slower the multiplexers feeding the LUT inputs will be, and the more programmable switches will be needed to connect externally to the logic block. Indeed, one of the principal advantages of fully-connected clusters is that they require fewer than the full number of inputs (K $\times$ N) to achieve high logic utilization. There are several reasons for this:

- Some of the inputs are feedbacks from the outputs of LUTs within the same cluster, saving inputs.
- Some inputs are shared by multiple LUTs in the cluster.
- Some of the LUTs do not require all of their K inputs to be used. Indeed this is often the case, as pointed out in [KBKC99].

<sup>1</sup>The 8 new benchmarks are from two computer vision applications at the University of Toronto. The 8 benchmarks are {display\_chip, img\_calc, img\_interp, input\_chip, peak\_chip, scale125\_chip, scale2\_chip, warping}

| Circuit       | # of 4-Input BLEs | Number of Nets |
|---------------|-------------------|----------------|
| alu4          | 1522              | 1536           |
| apex2         | 1878              | 1916           |
| apex4         | 1262              | 1271           |
| bigkey        | 1707              | 1936           |
| clma          | 8383              | 8445           |
| des           | 1591              | 1847           |
| diffeq        | 1497              | 1561           |
| dsip          | 1370              | 1599           |
| elliptic      | 3604              | 3735           |
| ex1010        | 4598              | 4608           |
| ex5p          | 1064              | 1072           |
| frisc         | 3556              | 3576           |
| misex3        | 1397              | 1411           |
| pdc           | 4575              | 4591           |
| s298          | 1931              | 1935           |
| s38417        | 6406              | 6435           |
| s38584.1      | 6447              | 6485           |
| seq           | 1750              | 1791           |
| spla          | 3690              | 3706           |
| tseng         | 1047              | 1099           |
| display_chip  | 1794              | 2419           |
| img_calc      | 10141             | 10180          |
| img_interp    | 2727              | 2769           |
| input_chip    | 807               | 841            |
| peak_chip     | 809               | 840            |
| scale125_chip | 2632              | 2654           |
| scale2_chip   | 1189              | 1202           |
| warping       | 1353              | 1394           |

Table 3.3: MCNC Benchmark Circuit Descriptions

Betz and Rose [BR97] [BR98] showed that when K=4 and I is set to the value 2N+2, then 98% of all of the 4-LUTs in a cluster would typically be used. We would like to find a similar relation, but one that includes the variable K.

To determine this relation, we ran several experiments, using only the first three steps illustrated in Figure 2.2: logic synthesis, technology mapping and packing. For each possible value of N and K, we ran experiments varying the value of I (the maximum number of inputs to the cluster allowed by the packer) from 1 to K $\times$ N. Following [BR97] we chose the lowest value of I that provided 98% utilization of all of the BLEs present in the circuit. Figure 3.2 is a plot of the relationship between the number of inputs (I) required to achieve 98% utilization and the cluster size (N) and the LUT size (K). Typically, the value of I must be between 50% and 60% of the total number of BLE inputs, K $\times$ N.

By inspection we have generalized the relationship as:

$$I = \frac{K}{2} \times (N+1)$$

This equation provides a close fit to the results in Figure 3.2. The average percentage error across all possible data points is only 10.1% with a standard deviation of 7.6%.
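The fitted relationship is easy to tabulate across the 60 architectures studied. The sketch below keeps I as an exact fraction, because the formula yields half-integers for odd K and even N and how such values are rounded to a whole number of pins is not stated here; the function name is hypothetical.

```python
from fractions import Fraction


def cluster_inputs_needed(K: int, N: int) -> Fraction:
    """I = (K / 2) * (N + 1), kept exact (no rounding applied)."""
    return Fraction(K, 2) * (N + 1)


# All combinations studied: LUT sizes 2..7 and cluster sizes 1..10.
architectures = [(K, N) for K in range(2, 8) for N in range(1, 11)]

for K, N in [(4, 1), (4, 4), (4, 10), (7, 10)]:
    I = cluster_inputs_needed(K, N)
    share = I / (K * N)  # fraction of the K x N BLE inputs
    print(f"K={K} N={N}: I={I}, I/(K*N)={float(share):.2f}")
```

For K = 4 the formula reduces to the 2N+2 rule of [BR97] [BR98]. Under this formula the ratio I/(K $\times$ N) equals (N+1)/(2N) regardless of K, which is 0.60 at N = 5 and 0.55 at N = 10.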

#### 3.2.2 Area as a Function of N and K

In this section we present and discuss the experimental results that show the area of an FPGA as a function of N and K. Note that I was set to the value determined in the previous section.



Figure 3.2: Number of Inputs Required for 98% Logic Block Utilization

These results are for the 28 benchmark circuits. Area, as discussed above, is measured in terms of the total number of minimum-width transistors required to implement all of the logic and routing.

#### Total Area

Figures 3.3 and 3.4 give a plot of the geometric average (across all 28 circuits) of the total area required as a function of cluster size and LUT size. Several observations can be made from this data:

- LUT sizes of 4 and 5 are the most area-efficient for all cluster sizes.
- There is a reduction in total area when the cluster size is increased from 1 to 3 for all LUT sizes. However, as clusters are made larger (N > 4) there is very little impact on total FPGA area. Figure 3.4 demonstrates this behaviour very well. It is generally expected that as more BLEs are added to a cluster, connections that would normally have been routed externally are absorbed inside the cluster. This should reduce the inter-cluster area, which is usually much larger than the intra-cluster area, and thus have a positive impact on total area. However, the total FPGA area does not decrease, because the greater logic capacity (more input and output pins) results in an increase in track count. It must also be remembered that the logic block area is increasing as well, due to the LUT area and the multiplexer area. So the area savings are not as significant as they might appear, and Figures 3.3 and 3.4 confirm this.
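Since the area results are reported as a geometric average across the benchmark circuits, a minimal sketch of that averaging may be useful. The area values below are made up for illustration and are not data from these experiments.

```python
import math


def geometric_mean(values):
    """Geometric mean: exp of the arithmetic mean of the logarithms."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-circuit areas, in arbitrary units.
areas = [1.0, 4.0, 16.0]
geo = geometric_mean(areas)
arith = sum(areas) / len(areas)
print(round(geo, 6), arith)
```

Unlike the arithmetic mean, the geometric mean is not dominated by the largest circuits, so each benchmark contributes a relative rather than an absolute change to the average.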



Figure 3.3: Total Area for Clusters of Size 1 to 5

It is instructive to break out the components of the data in Figures 3.3 and 3.4 in order to achieve both insight and inspiration on how to make more area-efficient FPGAs. The total area can be broken into two parts: the logic block area (including the muxes inside the clusters) and the routing area, which is the programmable routing external to the clusters. Throughout the rest of this thesis, these will be referred to as the intra-cluster area and inter-cluster area respectively.



Figure 3.4: Total Area for Clusters of Size 6 to 10

We will first explore the intra-cluster area. Figure 3.5 shows the total intra-cluster area component of the total area (again, geometrically averaged over the 28 circuits) as a function of the LUT size. The data shows that the intra-cluster area increases as K increases. This area is the product of the total number of clusters times the area per cluster. A plot of these two components for a cluster size of 1 is given in Figure 3.6.

The logic block area grows exponentially with LUT size as there are $2^{K}$ bits in a K-input LUT. In addition, larger LUT sizes require larger intra-cluster multiplexers because each multiplexer has $I + N = \frac{K}{2}(N+1) + N$ inputs. As K increases, though, the number of clusters decreases (because each LUT can implement more of the logic function), as shown by the downward curve in Figure 3.6.
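The two growth terms in this paragraph can be evaluated directly. The sketch below is only arithmetic on the expressions given in the text (function names are hypothetical), with the multiplexer size kept as an exact fraction since the rounding of I is not specified; it does not model transistor area.

```python
from fractions import Fraction


def lut_config_bits(K: int) -> int:
    """Number of configuration bits in a K-input LUT: 2^K."""
    return 2 ** K


def intra_cluster_mux_inputs(K: int, N: int) -> Fraction:
    """Inputs per intra-cluster multiplexer: I + N, with I = (K/2)(N+1)."""
    return Fraction(K, 2) * (N + 1) + N


for K in range(2, 8):
    print(K, lut_config_bits(K), intra_cluster_mux_inputs(K, 4))
```

Going from K = 4 to K = 7 at a cluster size of 4, the LUT bits grow from 16 to 128 while the multiplexer inputs grow from 14 to 43/2, so the LUT term is the one that rises exponentially.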

However, the rate of decrease in the number of logic blocks is far outweighed by the increase in the size of the block as K increases, hence the upward trend in Figure 3.5. Figure 3.7 decomposes the logic block area into two parts: a) intra-cluster multiplexer area and b) LUT area. The results illustrate that the local intra-cluster routing area cannot be ignored and can be quite significant for larger clusters.

Figure 3.5: Total Logic Block Cluster Area

Figure 3.6: Number of Clusters and Cluster Area Versus K (for N=1)



Figure 3.7: Intra-cluster Multiplexer Area and LUT Size

Observing the absolute values in Figures 3.3 to 3.5, we see that the intra-cluster area typically takes up only about 25% to 35% of the total area, except when the LUT size reaches 6 and 7, at which point intra-cluster area becomes a dominant factor.

The key effect, as always in FPGAs, is with the routing area. Figure 3.8 is a plot of the total inter-cluster routing area as a function of the LUT size and cluster size. The figure shows that the routing area decreases in a linear fashion with increasing LUT size. This result is interesting because previous work [RFLC90] showed that the routing area achieved a minimum between K=3 and K=4, and increased for values of K beyond this.

To explain this observed behavior, observe Figure 3.9 which decomposes the total routing area into two separate components: the number of clusters and the routing area per cluster. These curves are given for a cluster size of 1, but are representative for all cluster sizes. The product of these two curves gives the total inter-cluster routing area. The reason why the routing area decreases linearly with LUT size is that as we increase the LUT size, the number of clusters decreases much faster than the rate at which the routing area per cluster increases.

The routing area per cluster grows only slightly with increasing LUT size because there is very little change in channel width as the LUT size is varied. This is shown in Tables 3.4 and 3.5, where it can also be seen that increasing cluster size leads to larger average channel widths. The channel width is the major determining factor in inter-cluster area. The difference between the results of [RFLC90] and our current results can be attributed to the fact that we are now using better CAD tools with more sophisticated algorithms; in particular, the placement and routing tools are of significantly higher quality and use significantly less wiring. In addition, for clustered logic blocks, more of the routing is implemented within the cluster itself.
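The channel width behaviour described above can be checked directly against the tabulated data. The following sketch copies the geometric-average channel widths for three cluster sizes from Tables 3.4 and 3.5; the dictionary layout and function name are hypothetical and chosen only for this illustration.

```python
# Geometric-average channel widths copied from Tables 3.4 and 3.5.
# Each list is ordered by LUT size K = 2, 3, 4, 5, 6, 7.
channel_width = {
    1: [15.9, 16.5, 17.6, 17.9, 18.5, 18.5],
    5: [37.4, 39.6, 40.1, 39.6, 38.3, 38.0],
    10: [50.6, 53.6, 54.7, 52.0, 50.5, 50.4],
}


def spread_over_k(widths):
    """Ratio of the widest to the narrowest channel across all LUT sizes."""
    return max(widths) / min(widths)


for n, widths in channel_width.items():
    print(f"N={n:2d}: max/min channel width over K = {spread_over_k(widths):.2f}")

# Growth with cluster size at a fixed LUT size (K=4 is index 2 in each list).
growth_with_n = channel_width[10][2] / channel_width[1][2]
print(f"K=4: N=10 uses {growth_with_n:.1f}x the tracks of N=1")
```

For these three cluster sizes the channel width varies by less than 20% over the full range of K, whereas moving from N=1 to N=10 at K=4 roughly triples it.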



Figure 3.8: Routing Area

#### **3.2.3** Performance as a Function of N and K

The second key metric for FPGAs is their speed measured by the critical path delay. The total critical path delay is defined as the total delay due to the logic cluster combined with the routing delay. Figure 3.10 shows the geometric average of the total critical path delay across



Figure 3.9: Number of Clusters and Routing Area Per Cluster Versus K (for N=1)

all 28 circuits as a function of the cluster size and LUT size. From the figure, it is clear that increasing N or K decreases the critical path delay. These decreases are significant: an architecture with N=1 and K=2 has an average delay of 45 ns while K=7 and N=10 has an average critical path delay of just 14 ns. There are two trends that explain this behavior. As the LUT and cluster sizes increase:

- the delay of the LUT and the delay through a cluster increase
- the number of LUTs and clusters in series on the critical path decreases

We will discuss these effects in more detail below.

It is instructive to break the total delay into two components: intra-cluster delay (which includes the delay of the muxes and LUTs), and inter-cluster delay.

Figure 3.11 shows the portion of the critical path delay that comes from the intra-cluster delay as a function of K and N. There are two key points to observe here. First, the intra-cluster delay decreases as the LUT size increases. This is due to the fact that there is a reduction in

| Cluster Size (N) | LUT Size (K) | Channel Width (geometric avg.) |
|------------------|--------------|--------------------------------|
| 1                | 2            | 15.9                           |
|                  | 3            | 16.5                           |
|                  | 4            | 17.6                           |
|                  | 5            | 17.9                           |
|                  | 6            | 18.5                           |
|                  | 7            | 18.5                           |
| 2                | 2            | 23.7                           |
|                  | 3            | 25.5                           |
|                  | 4            | 27.7                           |
|                  | 5            | 26.6                           |
|                  | 6            | 26.9                           |
|                  | 7            | 26.3                           |
| 3                | 2            | 29.9                           |
|                  | 3            | 31.7                           |
|                  | 4            | 32.3                           |
|                  | 5            | 32.1                           |
|                  | 6            | 30.9                           |
|                  | 7            | 30.9                           |
| 4                | 2            | 34.3                           |
|                  | 3            | 36.1                           |
|                  | 4            | 36.8                           |
|                  | 5            | 35.9                           |
|                  | 6            | 34.3                           |
|                  | 7            | 34.3                           |
| 5                | 2            | 37.4                           |
|                  | 3            | 39.6                           |
|                  | 4            | 40.1                           |
|                  | 5            | 39.6                           |
|                  | 6            | 38.3                           |
|                  | 7            | 38.0                           |

Table 3.4: Channel Width vs. LUT and Cluster Size (1 to 5)

the number of BLE levels on the critical path and hence there will be fewer logic levels to implement. This will translate into a reduction in intra-cluster delay. Figure 3.12 illustrates this concept more clearly at the BLE level: it is a plot of BLE delay and number of BLEs on the critical path versus LUT size for a cluster size of 1. The number of BLE levels decreases more quickly than the BLE delay increases, hence the decrease in logic delay. The second behaviour that should be noticed is that the intra-cluster delay increases for any given LUT size as the cluster size is increased. This is because the intra-cluster muxes get larger and therefore

| Cluster Size (N) | LUT Size (K) | Channel Width (geometric avg.) |
|------------------|--------------|--------------------------------|
| 6                | 2            | 40.9                           |
|                  | 3            | 43.8                           |
|                  | 4            | 43.8                           |
|                  | 5            | 42.9                           |
|                  | 6            | 41.0                           |
|                  | 7            | 40.6                           |
| 7                | 2            | 43.8                           |
|                  | 3            | 46.4                           |
|                  | 4            | 46.6                           |
|                  | 5            | 45.5                           |
|                  | 6            | 44.2                           |
|                  | 7            | 43.0                           |
| 8                | 2            | 46.3                           |
|                  | 3            | 49.9                           |
|                  | 4            | 49.5                           |
|                  | 5            | 48.6                           |
|                  | 6            | 45.3                           |
|                  | 7            | 45.8                           |
| 9                | 2            | 49.3                           |
|                  | 3            | 50.6                           |
|                  | 4            | 51.2                           |
|                  | 5            | 50.4                           |
|                  | 6            | 47.0                           |
|                  | 7            | 47.4                           |
| 10               | 2            | 50.6                           |
|                  | 3            | 53.6                           |
|                  | 4            | 54.7                           |
|                  | 5            | 52.0                           |
|                  | 6            | 50.5                           |
|                  | 7            | 50.4                           |

Table 3.5: Channel Width vs. LUT and Cluster Size (6 to 10)

slower. However, the delay through these muxes is still much smaller than the inter-cluster delay, as shown in Figure 3.11.

Figure 3.13 shows the portion of the critical path delay that comes from the inter-cluster routing delay as a function of K and N.

As K increases there are fewer LUTs on the critical path, and this translates into fewer inter-cluster routing links, thus decreasing the inter-cluster routing delay. Similarly, as N is increased, more connections are captured within a cluster, and again, the inter-cluster routing



Figure 3.10: Total Delay for Clusters of Size 1 to 10



Figure 3.11: Total Intra-Cluster Delay for Clusters of Sizes 1 to 10

In discussing these trade-offs, it's useful to follow an explicit example: Table 3.6 shows how the delay through one BLE and multiplexer stage (delay from B to D on Figure 3.1) rises from 0.556 ns to 0.702 ns when going from K=4 and N=1 to K=4 and N=4. Although the number of BLE levels on the critical path remains fairly constant since we have not modified K, the total logic delay increases from 5.58 ns to 6.65 ns due to the increase in the local cluster routing multiplexers. However, since there are now 4 BLEs in every cluster as opposed to a single BLE, more logic is implemented internally within the clusters. Nets that normally would have been routed externally are now internal to the clusters. This translates into a reduction in the inter-cluster routing delay from 19.48 ns when using K=4, N=1 to 11.72 ns for K=4 and N=4. The total critical path delay decreases from 25.91 ns to 19.35 ns as originally shown in Figure 3.10.

|                                | N=1      | N=4      |
|--------------------------------|----------|----------|
| BLE + Mux Delay                | 0.556 ns | 0.702 ns |
| Avg # of BLEs on Critical Path | 9.53     | 9.13     |
| Total Intra-Cluster Delay      | 5.58 ns  | 6.65 ns  |
| Total Inter-Cluster Delay      | 19.48 ns | 11.72 ns |
| Total Delay                    | 25.91 ns | 19.35 ns |

Table 3.6: Critical Path Delay Comparison for K=4
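To make the trade-off in Table 3.6 easier to compare, the relative change of each delay component can be computed from the tabulated values. The sketch below takes the reported numbers as given, including the reported totals, and only evaluates percentage changes; the dictionary keys and the helper name are hypothetical.

```python
# Delay values (in ns) as reported in Table 3.6 for K=4.
table_3_6 = {
    "ble_mux_delay": {"N=1": 0.556, "N=4": 0.702},
    "intra_cluster": {"N=1": 5.58, "N=4": 6.65},
    "inter_cluster": {"N=1": 19.48, "N=4": 11.72},
    "total": {"N=1": 25.91, "N=4": 19.35},
}


def percent_change(row):
    """Signed percentage change when going from N=1 to N=4."""
    return 100.0 * (row["N=4"] - row["N=1"]) / row["N=1"]


for name, row in table_3_6.items():
    print(f"{name:14s} {percent_change(row):+6.1f}%")
```

The intra-cluster delay rises by roughly a fifth, but the inter-cluster delay falls by about 40%, so the reported total delay still drops by about a quarter.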

In general, inter-cluster routing delay is much larger than the intra-cluster delay, and hence the value of increasing the cluster or LUT size. However, it is interesting that increasing cluster size has little impact after a certain point (for N > 3). Figure 3.10 shows this clearly where for any fixed LUT size, the majority of the improvement in critical path delay occurs as the cluster size is increased from 1 to 3. Any further increase in cluster size results in only a minimal delay improvement. This behaviour suggests that clustering has little effect after a certain point. This runs counter to what we expect. That is, employing larger clusters



Figure 3.12: Number of BLEs on Critical Path and BLE delay vs K (for N=1)



Figure 3.13: Total Inter-Cluster Delay for Clusters of Size 1 to 10

should always reduce the critical path. Although the total delay results from Figure 3.10 do not contradict this, what was surprising was how little improvement in total delay was achieved with larger clusters. To better understand this situation, it is sufficient to examine the number of logic block levels (i.e., cluster levels). Figure 3.14 shows the number of cluster levels as a function of cluster and LUT size. The results clearly show the number of levels decreasing with increasing cluster and LUT sizes. But, for any given LUT size it can be seen that most of the reduction in the number of levels occurs as the cluster size is increased from 1 to 3. Also, recall that the majority of the critical path delay was reduced in this range (as shown in Figure 3.10). The direct relationship between the number of cluster levels on the critical path and the final total delay is no coincidence. Fewer logic blocks on the critical path lead to improved performance. The main reason that the total delay did not improve significantly as we varied the cluster size from 3 to 10 was that there was no significant reduction in the number of logic block levels. Without a reduction in the number of inter-cluster levels on the critical path we cannot expect improvements in FPGA performance.



Figure 3.14: Number of Cluster levels on Critical Path

Another interesting trend to observe from Figure 3.14 is that increasing the cluster size has less of an effect for architectures composed of larger LUTs. For example, increasing the cluster size from 1 to 10 for a 2-input LUT architecture results in a 60% reduction in the number of cluster levels on the critical path. Conversely, employing BLEs with a 7-input LUT and varying the cluster size from 1 to 10 results in only a 22% reduction in logic levels. Hence, clustering proves to be more effective for smaller LUTs. To understand this more clearly, we should examine the average BLE fanout for every LUT size. Figure 3.15 shows this and, as we can see, larger LUTs correlate with larger average fanout. The reason smaller LUTs responded better to larger cluster sizes is that each LUT had a relatively small fanout and hence adding an extra BLE to a cluster usually guaranteed some reduction in the number of logic levels. The same cannot be said about larger LUTs since they have a much larger average block fanout and it becomes much more difficult to ensure that any subsequent BLE addition will result in fewer cluster levels on the critical path.



Figure 3.15: Average BLE Fanout

#### 3.2.4 Area-Delay Product

So far, we have examined the effect of K and N on area and performance of FPGAs. As area can often be traded for delay, it is instructive to look at the area-delay product. Figures 3.16 and 3.17 illustrate the area-delay product versus K and N. These plots show that using LUT sizes of 4 to 6 and cluster sizes of 3 to 10 appears to give the best area-delay results.

Notice that area-delay decreases significantly as the LUT size is increased from 2 to 4 for all cluster sizes. This is because the smallest LUTs have poor delay, due to the large number of BLE levels on the critical path, combined with total area requirements that are slightly larger (by about 20%), and so they are a bad choice.

The area-delay product jumps for K=7 on average by 10-15% principally because the huge area cost for 7-input LUT outweighs the modest performance gains it achieves.

This latter observation suggests that, if there were a way to achieve the depth properties of a 7-input LUT without paying the heavy area price, then such a 7-input function may well be a good choice.

We have also observed that, for large clusters, a large portion of the delay is taken up by the intra-cluster muxes. If this delay could be reduced somehow, then significant speed wins could be achieved.

#### 3.2.5 Summary

We have studied the effect that different logic block architectures have on FPGA area and performance. The main results are summarized in Table 3.7. In addition, we experimentally derived a relationship giving the number of cluster logic block inputs required to achieve 98% utilization as a function of the LUT size, K, and the cluster size, N. This is  $I = \frac{K}{2} \times (N+1)$ , where I is the number of distinct cluster inputs.

Secondly, we have shown that small LUT sizes (2-input and 3-input LUTs) are not as area efficient as the 4 and 5-input LUTs and their performance characteristics are very poor. If area-delay is the main criterion, then the use of cluster sizes between 3 and 10 and LUT sizes of 4 to 6 will produce the best overall results.

Figure 3.16: Area-Delay Product for Clusters of Size 1 to 10

Figure 3.17: Close-up View of Area-Delay Product for Clusters of Size 1 to 10

Finally, our work suggests two future directions: finding ways to reduce the number of levels of logic without the expense of large LUTs, and reducing the delay of intra-cluster multiplexers.

| Criteria   | LUT Size (K) | Cluster Size (N) |
|------------|--------------|------------------|
| Area       | 4 to 5       | 4 to 9           |
| Delay      | 7            | 3 to 10          |
| Area-Delay | 4 to 6       | 3 to 10          |

Table 3.7: Summary of Best Area, Delay, and Area-Delay Results

## CHAPTER 4

## Hardwired Logic Blocks

This chapter explores the hardwired logic block (HLB) architecture introduced in Chapter 1 with the expectation that it will lead to an improved area-delay product compared to the non-hardwired 2 to 7-input LUT clustered architectures. Recall that employing logic clusters consisting of 3-10 BLEs and LUT sizes of 4-6 produced the best area-delay results. However, we have shown that there was a 10-15% increase in the area-delay product when moving from a 6-input LUT to a 7-input LUT. The increase in area-delay was due to the fact that the LUT area grows exponentially with LUT size. Our main motivation for exploring HLB architectures was to find a coarse grain logic block which had almost the equivalent logic functionality and depth properties as a 7-input LUT but without the large area cost. The rest of this chapter will examine one such architecture, a cascade of two 4-input LUTs, as illustrated in Figure 4.1, and compare its area and performance to the non-hardwired architectures.

The HLB architecture in Figure 4.1 has several advantages over a 7-input LUT. First, there is a considerable logic area savings since the 7-LUT requires  $2^7 = 128$  SRAM configuration bits while the cascaded 4-LUTs demand only  $2 \times 2^4 = 32$  SRAM bits. Secondly, the LUT delay is slightly less since the output of the first LUT is fed directly into the fastest LUT input of the second stage LUT. This is possible because LUTs are implemented as a tree of transistors, and therefore the inputs have different delays. Our goal in this chapter is to determine the area and delay of the HLB structure shown in Figure 4.1 and compare it to the regular 4 to 7-input LUT-based logic clusters.

Figure 4.1: Cascaded 4-LUTs
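The configuration-bit comparison above generalizes to any chain of LUTs. The following sketch is illustrative only: the helper names are hypothetical, and it assumes a simple chain in which each LUT after the first gives up exactly one input to the hardwired link.

```python
def lut_bits(k: int) -> int:
    """SRAM configuration bits in one K-input LUT."""
    return 2 ** k


def cascade_bits(lut_sizes):
    """Total SRAM bits for a chain of hardwired LUTs."""
    return sum(lut_bits(k) for k in lut_sizes)


def cascade_external_inputs(lut_sizes):
    """Externally accessible inputs of the chain.

    Assumes each LUT after the first uses one input for the hardwired link.
    """
    return sum(lut_sizes) - (len(lut_sizes) - 1)


seven_lut_bits = lut_bits(7)
hlb_bits = cascade_bits([4, 4])
print(f"7-LUT: {seven_lut_bits} bits")
print(f"cascaded 4-LUTs: {hlb_bits} bits, "
      f"{cascade_external_inputs([4, 4])} external inputs")
print(f"SRAM bit ratio: {seven_lut_bits // hlb_bits}x")
```

The cascade keeps the seven external inputs of a 7-LUT while using a quarter of the configuration bits, at the cost of implementing only a subset of all 7-input functions.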

## 4.1 Hardwired Architecture

In general, HLB architectures consist of one or more fully-interconnected HLBs as illustrated in Figure 4.2. Each HLB is hardwired in a tree topology such that every LUT within the HLB can only fanout to at most one other LUT within the same HLB. The root BLE in the HLB is the only LUT that can implement sequential logic since its output can be registered by a D flip-flop. The finer points of an HLB architecture are presented below.

#### 4.1.1 Cluster Inputs (I)

The number of distinct cluster inputs is an important architectural issue that needs to be addressed for HLBs. As discussed in Chapter 3 for non-hardwired architectures, just enough inputs were used to produce 98% utilization of the LUTs. However, such high logic block density is difficult to achieve in an HLB due to the inflexibility imposed by the hardwired connections. For that reason it is assumed that the cascaded 4-LUT structure, which has 7 inputs, requires the same number of inputs as a 7-LUT. This way, the same relationship that



Figure 4.2: Cluster Description (with HLBs)

was determined earlier in Chapter 3 for 98% logic utilization can be used:

$$I = \frac{K}{2} \times (N+1)$$

Here, *N* represents the number of HLBs per cluster and *K* is the number of input pins accessible externally from the HLB. For the cascaded 4-LUT, K=7 and therefore,  $I = \frac{7}{2} \times (N+1).$ 

#### 4.1.2 **Tapping Buffers**

The HLB structure in Figure 4.2 has 7 inputs and one output which can be registered or unregistered. Notice that only LUT B can have its output accessed externally from the cluster output pins. Since the output of LUT A is hardwired into LUT B, this restricts how and where subsequent LUTs are placed within the HLB. In Figure 4.2, LUTs A and B may be adjacent to each other as long as LUT A has a fanout of one and LUT B uses the output of LUT A as one of its inputs. If this pattern doesn't appear frequently enough in the netlist, many LUTs within HLBs will be unused. The result is a reduction in logic block density for the HLB-based FPGA. The use of *tapping buffers* [Chu94] will help alleviate this problem and improve the density of HLB-based FPGAs. Tapping buffers allow the LUT outputs within an HLB to be externally accessed. For example, in Figure 4.3 the output of A can be accessed without propagating it through LUTs B and C.



Figure 4.3: HLB Tapping Buffers

The tapping buffer saves two LUT delays and also allows other logic functions to be implemented in LUTs B and C rather than simply programming them to propagate the output of A and essentially wasting area. The possible benefits of tapping buffers are increased logic density and improvements in speed. For these reasons, tapping buffers are included in the HLB architecture studied in this chapter. Section 4.2.2 provides experimental justification for this decision. We should keep in mind that tapping buffer-enabled HLB architectures lead to more cluster output pins compared to HLBs without tapping buffers as illustrated in Figure 4.4.



Figure 4.4: Comparison of HLBs with and without tapping buffers

#### 4.1.3 Logical Equivalence of Cluster Outputs

Logical equivalence on the cluster output pins means the router can connect to any one of the output pins and still reach any of the BLE or HLB outputs in the cluster. This added flexibility can reduce the track count by up to 50% compared to an architecture where the output pins are not logically equivalent. For clusters based on the non-tapping structure shown in Figure 4.4(a), the outputs *are* logically equivalent because they are identical and the full connectivity of the inputs allows them to be easily swapped. Figure 4.4(b) shows a tapping-buffer based HLB architecture and these outputs *are not* logically equivalent and the outputs of LUTs A and B cannot be swapped. Output pin logical equivalence for this structure can be achieved by creating a full-routing crossbar which connects all LUT and cluster outputs together as shown in Figure 4.5. The crossbar contains N N:1 multiplexers, where N is the number of LUTs in the cluster. The area cost of this output crossbar multiplexer is relatively small in comparison to the total logic block area.



Figure 4.5: HLB Cluster Contents with Full Routing Crossbar for Output Signals

Table 4.1 shows the percentage of the logic block area that this full crossbar routing multiplexer occupies for different cluster sizes. Even for large cluster sizes, the logic block area overhead is less than 8%.

| Cluster Size | Multiplexer Area | Logic Block Area | Multiplexer Share (%) |
|--------------|------------------|------------------|-----------------------|
| 1            | 16               | 766              | 2                     |
| 2            | 72               | 1723             | 4.1                   |
| 3            | 168              | 3089             | 5.4                   |
| 4            | 256              | 4504             | 5.6                   |
| 5            | 420              | 6274             | 6.6                   |
| 6            | 552              | 8363             | 6.6                   |
| 7            | 700              | 10441            | 6.7                   |
| 8            | 864              | 12511            | 6.9                   |
| 9            | 1152             | 15011            | 7.6                   |
| 10           | 1360             | 17419            | 7.8                   |

Table 4.1: Percentage of Logic Block Area that is Occupied by Output Routing Crossbar
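The overhead column of Table 4.1 is simply the ratio of the two area columns. As a quick check, the sketch below recomputes it from the tabulated areas; the variable and function names are hypothetical, and small last-digit differences from the table are to be expected from rounding.

```python
# Areas copied from Table 4.1, indexed by cluster size 1..10.
crossbar_area = [16, 72, 168, 256, 420, 552, 700, 864, 1152, 1360]
logic_block_area = [766, 1723, 3089, 4504, 6274, 8363, 10441, 12511, 15011, 17419]


def overhead_percent(mux_area, block_area):
    """Output crossbar area as a percentage of the logic block area."""
    return 100.0 * mux_area / block_area


overheads = [
    overhead_percent(m, b) for m, b in zip(crossbar_area, logic_block_area)
]
for n, pct in enumerate(overheads, start=1):
    print(f"N={n:2d}: {pct:.1f}%")

print(f"largest overhead: {max(overheads):.1f}%")
```

The largest overhead occurs at the largest cluster size and stays below 8%, consistent with the claim in the text.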

## 4.2 HLB Packing

In order to explore HLBs, we need to synthesize circuits into HLB structures. This section outlines a packing algorithm aimed at hardwired logic blocks. However, it should be noted that there are several different ways of synthesizing circuits into HLBs.

- The entire HLB can be considered as a coarse grain target for technology mapping during logic synthesis. The basic procedure here is to translate the set of boolean equations that represent the circuit into logic gates and then perform technology mapping directly into the HLB structure. This was the approach taken in [CD94] since it was believed that technology mapping to HLBs focused more attention on optimizing hardwired connections which could possibly lead to better results than the alternatives listed below.

- HLB packing could be performed during layout synthesis as a separate step before placement and routing as is done in VPACK [BRM99]. [Chu94] also took this approach to HLB packing where the circuit would first be mapped to a netlist of basic blocks composed of LUTs and flip-flops. Afterwards, these LUTs and flip-flops are packed into HLBs.
- Finally, packing could be done simultaneously with placement. The placer would have the ability to move BLEs from one cluster to another in order to improve FPGA area and/or delay.

This thesis does HLB packing prior to placement but after logic synthesis, as illustrated in the flow given in Figure 4.6. Although the direct logic synthesis of gates into HLBs holds the promise of the best results, this can be very complex and we chose the layout synthesis approach due to time constraints.



Figure 4.6: HLB Packing Flow

#### 4.2.1 HLB Packing Algorithm

The goal of the HLB packer is to minimize some combination of area and delay. The main priority is to pack highly critical LUTs into the same HLB to take advantage of the "zero-delay" hardwired links. If this is not possible then critically related LUTs should be packed into the same cluster. The HLB packing pseudo-code is illustrated in Figure 4.7. The first several steps in the HLB packing algorithm are identical to the T-VPACK [MBR99] algorithm described in Chapter 2. First, timing analysis is performed on the unclustered netlist of BLEs. After timing analysis, the BLE with the highest criticality <sup>1</sup> is chosen as the seed of a new cluster. This BLE is placed at the root of any of the HLBs; however, it may be shifted within the HLB later on. With the seed BLE now packed, the next best BLE to pack into the HLB and cluster is determined. To do this, the attraction function described in Chapter 2 and originally defined in [MBR99] is used to find the next candidate BLE:

$$Attraction(B) = \lambda \cdot Criticality(B) + (1 - \lambda) \cdot \frac{|Nets(B) \cap Nets(C)|}{MaxNets}$$

The attraction function determines which unclustered BLE *B* should be added to the current cluster *C*. The parameter which controls the tradeoff between area and delay is  $\lambda$ . Setting  $\lambda$  to 1 results in a packing algorithm that focuses only on criticality. Conversely, setting  $\lambda$  to 0 results in a packing algorithm which tries to maximize input sharing and hence FPGA logic density. Our work uses a  $\lambda$  value of 0.75 as suggested in [MBR99]. Once the next BLE (*BestBLE*) to be clustered has been determined we attempt to place it into one of the HLBs in the currently open cluster. At this stage, there are three choices which are evaluated in order:

- 1. Place BestBLE in one of the non-empty HLBs within the current cluster, or
- 2. Place BestBLE into the next empty HLB in the current cluster, or
- 3. Place BestBLE into the next empty cluster.
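The attraction function used to select *BestBLE* can be sketched as follows. This is a minimal illustration, not the T-VPACK implementation: the function names, the candidate data layout, and the example criticalities, net names and `max_nets` value are all hypothetical.

```python
def attraction(criticality, nets_b, nets_c, max_nets, lam=0.75):
    """Attraction(B) = lam * Criticality(B) + (1 - lam) * |Nets(B) & Nets(C)| / MaxNets."""
    shared = len(set(nets_b) & set(nets_c))
    return lam * criticality + (1.0 - lam) * shared / max_nets


def best_ble(candidates, cluster_nets, max_nets, lam=0.75):
    """Return the name of the candidate with the highest attraction, or None.

    `candidates` maps a BLE name to a (criticality, nets) pair.
    """
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda name: attraction(
            candidates[name][0], candidates[name][1], cluster_nets, max_nets, lam
        ),
    )


# Hypothetical example: X is more critical, Y shares more nets with the cluster.
cluster_nets = {"a", "b", "c"}
candidates = {"X": (0.9, {"d"}), "Y": (0.6, {"a", "b"})}

print(best_ble(candidates, cluster_nets, max_nets=5))           # lam = 0.75
print(best_ble(candidates, cluster_nets, max_nets=5, lam=0.0))  # sharing only
```

With the default weighting the more critical BLE is preferred, while setting the weight to 0 makes the choice depend purely on net sharing, mirroring the two extremes described above.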

<sup>1</sup>Refer to Chapter 2 for a definition of criticality.

```
Let: UnclusteredBLEs be the set of BLEs not contained in any cluster
    C be the set of HLBs contained in the current cluster
    HLBs[i] be the i-th HLB in the current cluster
    num_HLBs be the number of HLBs in the current cluster which are NOT empty
    N be the maximum number of HLBs in a cluster
    LogicClusters be the set of clusters (where each cluster is a set of HLBs)
UnclusteredBLEs = PatternMatchToBLEs (LUTs, Registers);
LogicClusters = NULL;
ComputeCriticalities();
while (UnclusteredBLEs != NULL) { /* More BLEs to cluster */
    C = GetMostCriticalBLE (UnclusteredBLEs);
    num_HLBs = 1; /* The seed BLE occupies the first HLB */
    while (|C| < maximum_LUT_capacity) { /* Cluster is not full */
        BestBLE = unclustered BLE with the highest Attraction to C;
        if (BestBLE == NULL) /* No BLE can be added to cluster */
            break;
        BLE_SUCCESSFULLY_ADDED = FALSE; /* initialization */
        for (i=0; i < num_HLBs; i++) {
            if (BestBLE_can_be_inserted_into_HLB( BestBLE, HLBs[i] )) {
                HLBs[i] = HLBs[i] ∪ BestBLE;
                BLE_SUCCESSFULLY_ADDED = TRUE;
                break;
            }
        }
        if (BLE_SUCCESSFULLY_ADDED == FALSE) {
            /* Try placing BestBLE into an empty HLB if one exists */
            num_HLBs ++;
            if (num_HLBs > N) {
                break; /* No more empty HLB slots */
            } else {
                HLBs[num_HLBs - 1] = insert_into_HLB( BestBLE );
            }
        }
        UnclusteredBLEs = UnclusteredBLEs - BestBLE;
        C = C ∪ HLBs;
    }
    LogicClusters = LogicClusters ∪ C;
}
```

This algorithm tries packing *BestBLE* into one of the non-empty HLBs to determine if it has a common net with any of the previously packed BLEs. The *BestBLE* can be added to one of the non-empty HLBs if a common net exists, the HLB is NOT full and only one of the LUTs makes use of the register. *BestBLE* can still be added to an HLB in which no common net exists as long as the logic block architecture contains tapping buffers. For example, Figure 4.8 shows a cascaded 4-LUT HLB where LUT F has been placed at the root. Due to the tapping buffer and the fact that F only used 3 of its 4 LUT inputs, it is now possible to add another LUT G which has no common connection with LUT F.
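The placement choices and legality conditions described above can be sketched in Python. This is a deliberately simplified illustration, not the packer used in this work: the class and function names are hypothetical, LUT input counts and the direction of the hardwired link are ignored, and a cluster is modelled as a list of its non-empty HLBs.

```python
from dataclasses import dataclass, field


@dataclass
class BLE:
    name: str
    nets: frozenset          # names of all nets this BLE connects to
    uses_register: bool = False


@dataclass
class HLB:
    capacity: int = 2        # two LUTs for the cascaded 4-LUT structure
    bles: list = field(default_factory=list)

    def can_accept(self, ble, tapping):
        if len(self.bles) >= self.capacity:
            return False     # the HLB is full
        if ble.uses_register and any(b.uses_register for b in self.bles):
            return False     # only one LUT may use the register
        shares_net = any(ble.nets & b.nets for b in self.bles)
        # Tapping buffers allow a BLE with no common net to be added.
        return shares_net or tapping


def place(ble, cluster, max_hlbs, tapping=True):
    """Try choices 1 and 2; return False when a new cluster is needed (choice 3)."""
    for hlb in cluster:                      # choice 1: a non-empty HLB
        if hlb.can_accept(ble, tapping):
            hlb.bles.append(ble)
            return True
    if len(cluster) < max_hlbs:              # choice 2: the next empty HLB
        cluster.append(HLB(bles=[ble]))
        return True
    return False                             # choice 3: caller opens a new cluster


# Hypothetical example: F and G share no net, as in the Figure 4.8 discussion.
f = BLE("F", frozenset({"n1", "n2", "n3"}))
g = BLE("G", frozenset({"n7", "n8"}))
cluster = []
place(f, cluster, max_hlbs=1)
print(place(g, cluster, max_hlbs=1, tapping=True))
```

Without tapping buffers the same call would fail for G, forcing it into another HLB or cluster, which is the density loss that the next section quantifies.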



Figure 4.8: HLB with Cascaded 4-LUTs and Tapping Buffer

#### 4.2.2 HLB Packing with Tapping Buffers

The example above illustrates how tapping buffers can improve FPGA density. We performed two separate studies which examined the effectiveness of tapping buffer-enabled HLBs for the cascaded 4-LUT structure shown in Figure 4.8. The first HLB architecture contained no tapping buffers and the second had tapping buffers between all the LUTs in the HLBs. The 28 benchmark circuits used in Chapter 3 were synthesized and packed into HLBs. The results in Table 4.2 show that tapping buffered architectures had an average logic block utilization<sup>2</sup> rate of 88% compared to only 66% for non-tapping buffered architectures. This increased logic density led to 26% fewer clusters on average as shown in Table 4.3.

| Cluster Size | % Utilization (no tapping) | % Utilization (with tapping) |
|--------------|----------------------------|------------------------------|
| 1            | 64                         | 89                           |
| 2            | 63                         | 87                           |
| 3            | 64                         | 88                           |
| 4            | 64                         | 87                           |
| 5            | 65                         | 88                           |
| 6            | 65                         | 87                           |
| 7            | 66                         | 87                           |
| 8            | 66                         | 87                           |
| 9            | 67                         | 88                           |
| 10           | 67                         | 88                           |

Table 4.2: HLB Cluster Utilization (with and without tapping buffers)

Table 4.3: Number of Clusters with and without Tapping Buffers

| Cluster Size | <pre># of Clusters (no tapping)</pre> | <pre># of Clusters (with tapping)</pre> | % savings |
|--------------|---------------------------------------|-----------------------------------------|-----------|
| 1            | 1729                                  | 1233                                    | -28.6%    |
| 2            | 869                                   | 631                                     | -27.3%    |
| 3            | 573                                   | 418                                     | -27.0%    |
| 4            | 428                                   | 315                                     | -26.4%    |
| 5            | 337                                   | 251                                     | -25.5%    |
| 6            | 280                                   | 210                                     | -25.0%    |
| 7            | 239                                   | 179                                     | -25.1%    |
| 8            | 207                                   | 157                                     | -27.1%    |
| 9            | 183                                   | 139                                     | -24.0%    |
| 10           | 165                                   | 125                                     | -24.2%    |
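
As a rough cross-check of Table 4.3, the savings can be recomputed directly from the tabulated cluster counts. The sketch below is illustrative only: the variable and function names are hypothetical, and it assumes the savings are simply (no tapping - with tapping) / no tapping. The tabulated percentages may have been averaged per circuit instead, so individual recomputed rows need not match the printed values exactly.

```python
# Cluster counts transcribed from Table 4.3: cluster size -> (no tapping, with tapping).
clusters = {
    1: (1729, 1233), 2: (869, 631), 3: (573, 418), 4: (428, 315), 5: (337, 251),
    6: (280, 210), 7: (239, 179), 8: (207, 157), 9: (183, 139), 10: (165, 125),
}

def savings_percent(no_tap, with_tap):
    """Percentage reduction in cluster count when tapping buffers are added.

    Assumed definition: (no_tap - with_tap) / no_tap, expressed in percent.
    """
    return 100.0 * (no_tap - with_tap) / no_tap

savings = {size: savings_percent(*counts) for size, counts in clusters.items()}
mean_savings = sum(savings.values()) / len(savings)

for size, value in savings.items():
    print(f"cluster size {size:2d}: {value:.1f}% fewer clusters")
print(f"mean over cluster sizes: {mean_savings:.1f}%")
```

Under this assumption the mean over the ten cluster sizes comes out close to the 26% quoted in the text.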

### 4.3 Experimental Results

This section presents the area and delay results for the cascaded 4-LUTs after HLB packing, placement and routing through the full timing-driven CAD flow given in Figure 2.2. This time, however, the new HLB packing algorithm is used. Although there are many different HLB structures, due to time constraints only the cascaded 4-LUT configuration was explored. The 20 largest MCNC [Yan91] circuits were used along with 8 new benchmarks as described in Chapter 3 and Table 3.3. Throughout this section, the results of the hardwired architecture in Figure 4.1 will be compared to those of the conventional non-hardwired architecture. All results are based on a 0.18  $\mu$ m CMOS design, including a detailed circuit-level design of the HLB-based logic clusters.

<sup>2</sup>Utilization refers to the percentage of LUTs within the cluster that are actually used.

#### 4.3.1 Area Results

This section presents the total FPGA area results (logic cluster + external routing) of the cascaded 4-LUT architecture in relation to the non-hardwired 4 to 7-LUTs shown in Figure 4.9.<sup>3</sup> From this, two key observations about the data can be made:



Figure 4.9: Total Area Comparisons for Hardwired Arch. vs. Non-Hardwired

<sup>3</sup>The results for the small grain 2 and 3-input LUTs are not included due to their poor area-delay product. It is more important to compare the hardwired results with the best non-hardwired architectures.

- Increasing the cluster size from 4 to 10 in the cascaded 4-LUT architecture had a greater negative effect on total area compared to a similar increase for the non-hardwired 4 to 7-LUTs.
- The cascaded 4-LUT requires less total area than the 7-input LUT, as anticipated, although it is not as area-efficient as the 4-LUT.

To further understand these observations, we have broken the total area results into two components: i) inter-cluster and ii) intra-cluster area. These components are shown in Figures 4.10 and 4.11. The inter-cluster area results show that the positive effects of clustering on reducing routing area are diminished after a certain point. More specifically, increasing the cluster size beyond 4 has very little impact on reducing inter-cluster area. The reason is that nets were no longer being absorbed within the cluster after this point. If nets are not being absorbed, then the BLEs attached to their terminals are in separate clusters and hence valuable routing resources must be used to make the connections. As a result, there was no reduction in inter-cluster area for any cluster size increase beyond 4.
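
The notion of net absorption used above can be made concrete with a small sketch. The data structures here (a `packing` map from BLE name to cluster id and a `nets` map from net name to its terminal BLEs) are hypothetical and are not taken from the thesis CAD flow; a net is counted as absorbed when all of its terminals fall in one cluster.

```python
def count_absorbed(nets, packing):
    """Return (absorbed, external) net counts for a given packing.

    nets:    dict mapping net name -> list of BLE names on that net (assumed format)
    packing: dict mapping BLE name -> cluster id (assumed format)
    """
    absorbed = 0
    for terminals in nets.values():
        clusters_touched = {packing[ble] for ble in terminals}
        if len(clusters_touched) == 1:
            absorbed += 1  # every terminal sits in the same cluster
    return absorbed, len(nets) - absorbed

# Toy example: four BLEs packed two per cluster.
nets = {"n1": ["a", "b"], "n2": ["b", "c"], "n3": ["c", "d"]}
packing = {"a": 0, "b": 0, "c": 1, "d": 1}
print(count_absorbed(nets, packing))
```

Only the external nets consume inter-cluster routing, which is consistent with the inter-cluster area no longer shrinking once larger clusters stop absorbing additional nets.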

The other component of the total area is the intra-cluster area shown in Figure 4.11. Here, the cascaded 4-LUT architecture demands more intra-cluster area than the 4 and 5-LUT non-hardwired architectures. The hardwired logic block architecture requires a larger intra-cluster area for several reasons:



Figure 4.10: Inter-Cluster Area Comparisons for Hardwired Arch. vs. Non-Hardwired

1. There are now twice as many LUTs per cluster for any given cluster size compared to the non-hardwired architectures.
2. The intra-cluster multiplexer area is larger since there are now more LUTs which translates into more inputs. Also, these extra LUTs result in more outputs which require a full routing crossbar on the outputs to maintain logical equivalence on the cluster output pins.
3. There is only 88% logic block utilization for the hardwired architecture compared to 98% for the non-hardwired architectures.

Figure 4.11: Intra-Cluster Area Comparisons for Hardwired Arch. vs. Non-Hardwired

All these factors contributed to the increase in total FPGA area of the cascaded 4-LUT architecture as the cluster size was increased from 4 to 10. Note that even though the hardwired architecture demanded slightly more total area than the 4 to 6-LUTs, there was still up to 15% in area savings compared to the 7-LUTs. Recall that the initial motivation for exploring these hardwired structures was to reduce the large area requirements of the 7-LUT while still maintaining its desirable performance characteristics. The results indicate that our initial predictions were correct as far as area is concerned. The 15% area reduction could have been better had it not been for the 5 circuits in the benchmark suite which required significantly more 4-LUTs than 7-LUTs after technology mapping.<sup>4</sup> A poor mapping would be any circuit that requires more than two 4-input LUTs for every 7-LUT after technology mapping. Table 4.4 presents the ratio of 4-LUTs to 7-LUTs after technology mapping for all the circuits with the inefficient ones marked in bold. The determination of why these circuits map inefficiently is left for future work.
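
The poor-mapping criterion (more than two 4-LUTs per 7-LUT) can be checked mechanically against the counts in Table 4.4. The following is a minimal sketch; the function name and the `threshold` parameter are hypothetical, and the counts are transcribed from the table.

```python
# (4-LUT count, 7-LUT count) after technology mapping, transcribed from Table 4.4.
lut_counts = {
    "alu4": (1522, 952), "apex2": (1878, 1351), "apex4": (1262, 847),
    "bigkey": (1707, 463), "clma": (8383, 5539), "des": (1591, 497),
    "diffeq": (1497, 800), "dsip": (1370, 1357), "elliptic": (3604, 2257),
    "ex1010": (4598, 1827), "ex5p": (1064, 559), "frisc": (3556, 2519),
    "misex3": (1397, 1039), "pdc": (4575, 3014), "s298": (1931, 1141),
    "s38417": (6406, 3099), "s38584": (6447, 3489), "seq": (1750, 1198),
    "spla": (3690, 2663), "tseng": (1047, 877), "display_chip": (1794, 1046),
    "img_calc": (10141, 3771), "img_interp": (2727, 1430),
    "input_chip": (807, 550), "peak_chip": (809, 493),
    "scale125_chip": (2632, 1808), "scale2_chip": (1189, 775),
    "warping": (1353, 859),
}

def poorly_mapped(counts, threshold=2.0):
    """Return circuits whose 4-LUT to 7-LUT ratio exceeds the threshold."""
    return [name for name, (n4, n7) in counts.items() if n4 / n7 > threshold]

print(poorly_mapped(lut_counts))
```

With the threshold of 2.0 this selects the same five circuits that are listed in the footnote.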

#### 4.3.2 Delay Results

The next step is to examine the hardwired logic block performance. Figure 4.12 compares the critical path delays of the cascaded 4-LUTs to the regular non-hardwired architectures. The results show that the cascaded 4-LUTs slightly out-perform the pure 4-LUTs but are worse than the 5 to 7-LUTs.

<sup>4</sup>The 5 circuits were bigkey, des, ex1010, s38417, img\_calc

| Circuit       | # of 4-LUTs | # of 7-LUTs | $\frac{\text{4-LUTs}}{\text{7-LUTs}}$ |
|---------------|-------------|-------------|-------------------------|
| alu4          | 1522        | 952         | 1.59                    |
| apex2         | 1878        | 1351        | 1.39                    |
| apex4         | 1262        | 847         | 1.48                    |
| bigkey        | 1707        | 463         | **3.6**                 |
| clma          | 8383        | 5539        | 1.51                    |
| des           | 1591        | 497         | **3.2**                 |
| diffeq        | 1497        | 800         | 1.87                    |
| dsip          | 1370        | 1357        | 1.01                    |
| elliptic      | 3604        | 2257        | 1.59                    |
| ex1010        | 4598        | 1827        | **2.51**                |
| ex5p          | 1064        | 559         | 1.9                     |
| frisc         | 3556        | 2519        | 1.41                    |
| misex3        | 1397        | 1039        | 1.34                    |
| pdc           | 4575        | 3014        | 1.51                    |
| s298          | 1931        | 1141        | 1.69                    |
| s38417        | 6406        | 3099        | **2.06**                |
| s38584        | 6447        | 3489        | 1.84                    |
| seq           | 1750        | 1198        | 1.46                    |
| spla          | 3690        | 2663        | 1.38                    |
| tseng         | 1047        | 877         | 1.19                    |
| display_chip  | 1794        | 1046        | 1.71                    |
| img_calc      | 10141       | 3771        | **2.68**                |
| img_interp    | 2727        | 1430        | 1.9                     |
| input_chip    | 807         | 550         | 1.46                    |
| peak_chip     | 809         | 493         | 1.64                    |
| scale125_chip | 2632        | 1808        | 1.45                    |
| scale2_chip   | 1189        | 775         | 1.53                    |
| warping       | 1353        | 859         | 1.57                    |

Table 4.4: Comparison of number of 4-LUT to 7-LUT blocks after technology mapping



Figure 4.12: Total Critical Path Delay Comparisons for Hardwired Arch. vs. Non-Hardwired

To understand this situation more clearly, we should examine the number of BLEs on the critical path and the number of cluster levels on the critical path. The number of BLEs is simply the number of BLEs in series traversed along the critical path. Similarly, the number of clusters represents how many inter-cluster levels exist on the critical path. There is a direct relationship between the number of cluster levels and the critical path delay: generally, the more levels there are, the larger the delay, since in modern FPGAs the majority of the delay is still due to the external cluster routing. Figures 4.13 and 4.14 illustrate the number of BLEs and cluster levels respectively as a function of LUT and cluster size. The trend shows that more coarse grain logic structures result in fewer BLE and cluster levels on the critical path. These larger BLEs tend to absorb more connections internally and produce fewer logic blocks. It can be seen that the cascaded 4-LUTs have more cluster levels than the 5 to 7-LUTs and this is the major reason for the increase in total delay.
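
The two critical-path metrics can be illustrated with a short sketch. The path representation (an ordered list of `(ble, cluster)` pairs) is an assumption made for illustration, as is the convention of counting a new cluster level each time the path moves into a different cluster; the exact counting convention used for Figures 4.13 and 4.14 is not spelled out here.

```python
def path_levels(path):
    """Count BLE levels and cluster levels along one path.

    path: ordered list of (ble_name, cluster_id) pairs (assumed format).
    A cluster level is counted each time the path enters a cluster that
    differs from the one it was just in.
    """
    ble_levels = len(path)
    cluster_levels = 0
    previous_cluster = None
    for _, cluster in path:
        if cluster != previous_cluster:
            cluster_levels += 1
            previous_cluster = cluster
    return ble_levels, cluster_levels

# Toy critical path: five BLEs in series spread over three clusters.
critical_path = [("b1", "c0"), ("b2", "c0"), ("b3", "c1"), ("b4", "c2"), ("b5", "c2")]
print(path_levels(critical_path))
```

In this toy case two of the four BLE-to-BLE connections stay inside a cluster, so only the remaining two use the slower external routing.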



Figure 4.13: Number of BLEs on the Critical Path Comparisons for Hardwired Arch. vs. Non-Hardwired



Figure 4.14: Number of Clusters on the Critical Path Comparisons for Hardwired Arch. vs. Non-Hardwired

### 4.4 Area-Delay Results

In FPGA studies, another metric for evaluating the quality of an architecture is its area-delay product. Since there is always a tradeoff between area and delay, a good architecture is one with a low area-delay product. This minimum represents the point where we are sacrificing the least amount of area for the most performance. Table 4.5 shows the area-delay product for the cascaded 4-LUTs and the non-hardwired architectures (4 to 7-LUTs). The 7-LUT architecture has the worst area-delay product but, interestingly, the hardwired architecture does not perform any better despite the reduction in total FPGA area. Unfortunately, the area reduction was not accompanied by a corresponding reduction in critical path delay. The improvement in total FPGA area was not enough to compensate for the slight increase in delay.

| Cluster Size | 4-LUT | 5-LUT | 6-LUT | 7-LUT | Hardwired 4-LUTs |
|--------------|-------|-------|-------|-------|------------------|
| 1            | 0.098 | 0.088 | 0.079 | 0.094 | 0.091            |
| 2            | 0.084 | 0.077 | 0.069 | 0.078 | 0.076            |
| 3            | 0.067 | 0.064 | 0.061 | 0.074 | 0.074            |
| 4            | 0.063 | 0.058 | 0.062 | 0.069 | 0.075            |
| 5            | 0.065 | 0.062 | 0.059 | 0.069 | 0.074            |
| 6            | 0.062 | 0.059 | 0.056 | 0.068 | 0.073            |
| 7            | 0.060 | 0.060 | 0.058 | 0.070 | 0.074            |
| 8            | 0.058 | 0.059 | 0.060 | 0.070 | 0.074            |
| 9            | 0.059 | 0.061 | 0.060 | 0.070 | 0.075            |
| 10           | 0.059 | 0.060 | 0.060 | 0.073 | 0.077            |

Table 4.5: Area-Delay Product Comparison Between Cascaded 4-LUTs and Non-hardwired Architectures
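
To make the comparison in Table 4.5 easier to query, the sketch below transcribes the table and looks up the lowest area-delay product, along with the cluster sizes at which the hardwired column is strictly below the 7-LUT column. The variable names are hypothetical; the numbers are those printed in the table.

```python
architectures = ["4-LUT", "5-LUT", "6-LUT", "7-LUT", "Hardwired 4-LUTs"]

# Area-delay product per cluster size, one entry per architecture (Table 4.5).
area_delay = {
    1: [0.098, 0.088, 0.079, 0.094, 0.091],
    2: [0.084, 0.077, 0.069, 0.078, 0.076],
    3: [0.067, 0.064, 0.061, 0.074, 0.074],
    4: [0.063, 0.058, 0.062, 0.069, 0.075],
    5: [0.065, 0.062, 0.059, 0.069, 0.074],
    6: [0.062, 0.059, 0.056, 0.068, 0.073],
    7: [0.060, 0.060, 0.058, 0.070, 0.074],
    8: [0.058, 0.059, 0.060, 0.070, 0.074],
    9: [0.059, 0.061, 0.060, 0.070, 0.075],
    10: [0.059, 0.060, 0.060, 0.073, 0.077],
}

# Lowest area-delay product over all architectures and cluster sizes.
best = min(
    (value, arch, size)
    for size, row in area_delay.items()
    for arch, value in zip(architectures, row)
)

# Cluster sizes at which the hardwired architecture is strictly better than the 7-LUT.
hardwired_wins = [size for size, row in area_delay.items() if row[4] < row[3]]

print(best)
print(hardwired_wins)
```

In this transcription the minimum lies with the 6-LUT at a cluster size of 6, and the hardwired architecture is below the 7-LUT only at the two smallest cluster sizes.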

### 4.5 Summary

This chapter examined the feasibility of employing hardwired logic blocks and how they compared to the non-hardwired architectures. We have found that the hardwired architecture required 10-15% less area than the 7-input LUT. However, there was a slight increase in critical path delay and this offset any positive gains we realized in area. The result was no improvement in the area-delay product.

## CHAPTER 5

#### **Conclusions and Future Work**

### 5.1 Summary and Contributions

The goal of this thesis was to study the impact of different logic block architectures on FPGA density and performance. In Chapter 3 we studied the effect of LUT and cluster size on FPGA area and speed. It was found that, in terms of area, LUT sizes of 4-5 and cluster sizes of 4-9 were best. Also, a LUT size of 7 and cluster sizes of 3-10 produced the lowest average critical path delay. However, for the area-delay product, LUT sizes of 4-6 and cluster sizes of 3-10 were best. Another important contribution from Chapter 3 was an expression for the number of distinct cluster inputs (I) as a function of both LUT size (K) and cluster size (N). It had been previously shown in [BR97] [BR98] that when K=4 and I is set to 2N + 2, 98% of all the 4-LUTs would be utilized. We found a more general expression for the 98% logic block utilization:

$$I = \frac{K}{2} \times (N+1)$$
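
As a quick check, the expression reduces to the earlier 2N + 2 rule when K = 4. The sketch below evaluates it for a few LUT and cluster sizes; the function names are hypothetical, and rounding non-integer results up to the next whole input is an assumption made here, not something stated in this section.

```python
import math

def cluster_inputs(k, n):
    """Distinct cluster inputs I = (K / 2) * (N + 1) for LUT size K and cluster size N."""
    return k * (n + 1) / 2

def cluster_inputs_rounded(k, n):
    """Same expression rounded up to a whole number of inputs (assumed convention)."""
    return math.ceil(cluster_inputs(k, n))

for k in (4, 5, 7):
    row = [cluster_inputs_rounded(k, n) for n in (1, 4, 10)]
    print(f"K={k}: I for N=1, 4, 10 -> {row}")
```

For odd K and even N the raw expression is not an integer (for example K = 5, N = 4 gives 12.5), which is why a rounding convention has to be chosen.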

Chapter 4 introduced a new HLB architecture and packing algorithm. We were specifically interested in the cascaded 4-LUT hardwired logic blocks. Our motivation for exploring such a hardwired architecture was that it may lead to an improvement in area-delay when compared to the non-hardwired logic blocks. However, we found that the use of the cascaded 4-LUT architecture did not improve our area-delay product. In fact, the hardwired area-delay results were quite similar to the 7-LUT logic block architectures.

### 5.2 Future Work

In the future it would be interesting to study the effect of larger LUT sizes (> 7) and larger cluster sizes (> 10) on FPGA area and performance. Also, our HLB study focused strictly on the hardwired 4-LUTs. However, the behaviour of different hardwired architectures such as those shown in Figure 5.1 would be an interesting topic.



Figure 5.1: Various HLB architectures

Finally, our packing algorithm was based on a layout synthesis approach where we first technology mapped to K-LUTs and then packed these logic elements to form HLBs. Another method that could possibly lead to better area and delay results would be to perform the HLB mapping at the logic synthesis level. That is, treat the entire HLB as a coarse grain target for technology mapping.

## APPENDIX A

### **Total Area**

| Circuit       | LUT Size |        |        |        |        |        |
|---------------|----------|--------|--------|--------|--------|--------|
|               | 2        | 3      | 4      | 5      | 6      | 7      |
| alu4          | 4.063  | 3.140    | 2.775  | 2.576  | 2.623  | 3.040  |  |
| apex2         | 4.695  | 3.867    | 3.697  | 4.019  | 4.333  | 5.160  |  |
| apex4         | 3.470  | 2.870    | 2.698  | 3.026  | 3.026  | 3.406  |  |
| bigkey        | 3.811  | 3.502    | 2.264  | 2.051  | 1.370  | 1.301  |  |
| clma          | 20.771 | 17.854   | 18.191 | 17.565 | 19.632 | 23.726 |  |
| des           | 3.695  | 2.761    | 2.341  | 2.254  | 1.222  | 1.514  |  |
| diffeq        | 2.702  | 2.282    | 2.236  | 2.361  | 2.367  | 2.651  |  |
| dsip          | 3.023  | 2.124    | 2.150  | 1.376  | 1.409  | 4.142  |  |
| elliptic      | 6.941  | 6.403    | 6.809  | 6.060  | 6.435  | 8.806  |  |
| ex1010        | 11.751 | 10.337   | 8.663  | 10.213 | 10.303 | 7.783  |  |
| ex5p          | 2.821  | 2.703    | 2.384  | 2.228  | 2.071  | 1.799  |  |
| frisc         | 9.364  | 8.026    | 7.466  | 8.055  | 9.797  | 11.564 |  |
| misex3        | 4.027  | 3.146    | 2.682  | 2.768  | 3.137  | 3.431  |  |
| pdc           | 15.333 | 12.170   | 12.048 | 11.871 | 12.065 | 12.771 |  |
| s298          | 4.479  | 3.230    | 3.050  | 3.367  | 3.518  | 4.167  |  |
| s38417        | 14.137 | 11.727   | 9.372  | 10.317 | 7.814  | 10.123 |  |
| s38584        | 12.941 | 11.136   | 11.514 | 9.764  | 10.391 | 10.997 |  |
| seq           | 4.365  | 3.775    | 3.717  | 3.686  | 4.024  | 4.583  |  |
| spla          | 11.930 | 9.993    | 8.298  | 9.677  | 9.520  | 11.297 |  |
| tseng         | 1.979  | 1.365    | 1.575  | 1.516  | 1.879  | 3.059  |  |
| display_chip  | 5.663  | 3.588    | 2.397  | 2.610  | 2.853  | 3.224  |  |
| img_calc      | 35.386 | 22.918   | 15.696 | 11.832 | 13.281 | 12.949 |  |
| img_interp    | 7.784  | 5.265    | 4.281  | 4.182  | 4.163  | 4.705  |  |
| input_chip    | 2.243  | 1.501    | 1.097  | 0.991  | 1.281  | 1.612  |  |
| peak_chip     | 2.570  | 1.960    | 1.050  | 1.042  | 1.092  | 1.411  |  |
| scale125_chip | 7.153  | 5.061    | 4.135  | 4.365  | 5.105  | 5.933  |  |
| scale2_chip   | 2.710  | 1.910    | 1.784  | 1.297  | 1.832  | 2.261  |  |
| warping       | 2.817  | 2.259    | 1.816  | 1.578  | 1.775  | 2.488  |  |
| Geom. Avg.    | 5.596  | 4.389    | 3.801  | 3.673  | 3.777  | 4.494  |  |

Table A.1: Total Area ( $\times 10^6$ ) in Min. Width Trans. Area (Cluster Size = 1)
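
The "Geom. Avg." row in these tables is presumably the geometric mean over the 28 circuits, which is less dominated by the largest circuits (such as clma and img_calc) than an arithmetic mean would be. A minimal sketch of that computation using Python's standard library follows; the three sample values are the first three 4-LUT entries of Table A.1 and merely stand in for a full column.

```python
from statistics import geometric_mean

# alu4, apex2, apex4 at LUT size 4, cluster size 1 (Table A.1), in units of 10^6
# minimum-width transistor areas. A full column would list all 28 circuits.
sample = [2.775, 3.697, 2.698]

print(round(geometric_mean(sample), 3))
```

`statistics.geometric_mean` requires Python 3.8 or later.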

| Circuit       | LUT Size |        |        |        |        |        |
|---------------|----------|--------|--------|--------|--------|--------|
|               | 2        | 3      | 4      | 5      | 6      | 7      |
| alu4          | 3.484  | 3.318  | 2.786  | 2.819  | 2.844  | 3.176  |
| apex2         | 3.680  | 3.929  | 3.910  | 4.050  | 4.204  | 5.042  |
| apex4         | 2.947  | 2.741  | 2.649  | 3.019  | 2.900  | 3.494  |
| bigkey        | 2.908  | 2.598  | 2.008  | 2.006  | 1.367  | 1.317  |
| clma          | 19.447 | 18.681 | 20.308 | 19.479 | 21.269 | 24.957 |
| des           | 2.795  | 2.581  | 2.287  | 2.412  | 1.286  | 1.521  |
| diffeq        | 2.437  | 2.113  | 2.335  | 2.356  | 2.163  | 2.668  |
| dsip          | 2.409  | 2.058  | 2.034  | 1.487  | 1.476  | 3.919  |
| elliptic      | 6.148  | 5.920  | 6.917  | 5.946  | 5.920  | 8.542  |
| ex1010        | 9.168  | 12.264 | 9.350  | 11.088 | 10.820 | 7.823  |
| ex5p          | 2.335  | 2.633  | 2.281  | 2.273  | 2.063  | 1.931  |
| frisc         | 7.657  | 6.988  | 7.939  | 8.433  | 9.435  | 10.959 |
| misex3        | 3.348  | 3.022  | 2.726  | 2.846  | 3.021  | 3.688  |
| pdc           | 13.294 | 12.313 | 12.658 | 12.366 | 13.300 | 13.923 |
| s298          | 4.075  | 3.725  | 3.242  | 3.234  | 3.273  | 4.213  |
| s38417        | 14.570 | 13.727 | 11.825 | 11.595 | 8.191  | 10.431 |
| s38584        | 13.766 | 12.029 | 10.798 | 10.789 | 11.440 | 12.216 |
| seq           | 3.770  | 3.689  | 3.713  | 3.765  | 4.014  | 4.621  |
| spla          | 10.629 | 10.621 | 9.243  | 10.085 | 10.594 | 11.159 |
| tseng         | 1.632  | 1.310  | 1.466  | 1.561  | 1.831  | 2.965  |
| display_chip  | 4.516  | 3.814  | 2.395  | 2.713  | 2.710  | 3.297  |
| img_calc      | 38.436 | 27.930 | 18.592 | 12.739 | 15.383 | 13.782 |
| img_interp    | 6.350  | 5.065  | 3.712  | 4.037  | 4.026  | 4.415  |
| input_chip    | 1.857  | 1.453  | 1.069  | 0.951  | 1.241  | 1.595  |
| peak_chip     | 2.229  | 1.889  | 0.948  | 1.026  | 1.121  | 1.462  |
| scale125_chip | 6.368  | 4.956  | 3.582  | 4.211  | 4.627  | 5.835  |
| scale2_chip   | 2.474  | 1.811  | 1.555  | 1.377  | 1.783  | 2.276  |
| warping       | 2.568  | 2.233  | 1.761  | 1.355  | 1.908  | 2.504  |
| Geom. Avg.    | 4.846  | 4.379  | 3.804  | 3.763  | 3.808  | 4.553  |

Table A.2: Total Area ( $\times 10^6$ ) in Min. Width Trans. Area (Cluster Size = 2)

| Circuit       | LUT Size |        |        |        |        |        |
|---------------|----------|--------|--------|--------|--------|--------|
|               | 2        | 3      | 4      | 5      | 6      | 7      |
| alu4          | 3.156  | 2.666  | 2.590  | 2.483  | 2.658  | 3.058  |
| apex2         | 3.780  | 3.432  | 3.357  | 3.650  | 4.007  | 4.942  |
| apex4         | 2.754  | 2.462  | 2.335  | 2.701  | 2.799  | 3.337  |
| bigkey        | 2.587  | 2.526  | 2.219  | 2.130  | 1.379  | 1.432  |
| clma          | 18.706 | 17.054 | 16.142 | 16.836 | 19.334 | 23.297 |
| des           | 2.671  | 2.353  | 2.215  | 2.247  | 1.164  | 1.548  |
| diffeq        | 2.244  | 1.936  | 2.176  | 2.139  | 2.063  | 2.673  |
| dsip          | 2.100  | 1.954  | 1.964  | 1.398  | 1.386  | 4.347  |
| elliptic      | 5.775  | 5.439  | 5.931  | 5.525  | 5.720  | 8.651  |
| ex1010        | 9.568  | 11.169 | 8.555  | 9.467  | 9.701  | 7.484  |
| ex5p          | 2.201  | 2.250  | 2.072  | 2.052  | 1.976  | 1.858  |
| frisc         | 6.933  | 6.656  | 6.766  | 7.423  | 9.059  | 11.142 |
| misex3        | 2.950  | 2.695  | 2.503  | 2.539  | 2.930  | 3.565  |
| pdc           | 13.022 | 10.701 | 10.982 | 10.590 | 10.965 | 12.469 |
| s298          | 3.475  | 3.044  | 2.741  | 2.860  | 3.118  | 3.971  |
| s38417        | 14.056 | 11.739 | 9.817  | 10.077 | 7.910  | 10.625 |
| s38584        | 12.169 | 11.546 | 9.863  | 10.022 | 10.682 | 12.131 |
| seq           | 3.710  | 3.223  | 3.114  | 3.202  | 3.713  | 4.394  |
| spla          | 10.457 | 9.492  | 8.319  | 8.597  | 8.887  | 10.321 |
| tseng         | 1.450  | 1.249  | 1.430  | 1.392  | 1.791  | 2.925  |
| display_chip  | 4.100  | 3.265  | 2.273  | 2.602  | 2.657  | 3.347  |
| img_calc      | 36.277 | 26.150 | 17.409 | 12.863 | 14.364 | 13.488 |
| img_interp    | 5.593  | 4.364  | 3.488  | 3.641  | 3.832  | 4.548  |
| input_chip    | 1.694  | 1.350  | 0.940  | 0.949  | 1.255  | 1.669  |
| peak_chip     | 2.024  | 1.647  | 0.913  | 0.964  | 1.089  | 1.433  |
| scale125_chip | 5.736  | 4.451  | 3.298  | 3.885  | 4.831  | 5.802  |
| scale2_chip   | 2.025  | 1.608  | 1.389  | 1.235  | 1.660  | 2.328  |
| warping       | 2.326  | 1.891  | 1.682  | 1.329  | 1.758  | 2.685  |
| Geom. Avg.    | 4.475  | 3.917  | 3.457  | 3.433  | 3.598  | 4.518  |

Table A.3: Total Area ( $\times 10^6$ ) in Min. Width Trans. Area (Cluster Size = 3)

| Circuit       | LUT Size |        |        |        |        |        |
|---------------|----------|--------|--------|--------|--------|--------|
|               | 2        | 3      | 4      | 5      | 6      | 7      |
| alu4          | 2.851  | 2.507    | 2.365  | 2.297  | 2.653  | 2.961  |  |
| apex2         | 3.544  | 3.416    | 3.263  | 3.605  | 3.881  | 4.911  |  |
| apex4         | 2.524  | 2.364    | 2.313  | 2.587  | 2.688  | 3.320  |  |
| bigkey        | 2.258  | 2.342    | 1.983  | 1.917  | 1.353  | 1.319  |  |
| clma          | 17.251 | 16.294   | 15.806 | 16.153 | 19.397 | 23.058 |  |
| des           | 2.195  | 1.926    | 2.042  | 2.075  | 1.122  | 1.473  |  |
| diffeq        | 2.076  | 1.881    | 2.035  | 2.082  | 2.078  | 2.564  |  |
| dsip          | 1.749  | 1.454    | 1.593  | 1.262  | 1.378  | 4.224  |  |
| elliptic      | 5.371  | 5.106    | 6.202  | 5.346  | 9.606  | 8.459  |  |
| ex1010        | 9.405  | 10.544   | 8.295  | 9.187  | 9.212  | 7.132  |  |
| ex5p          | 2.029  | 2.094    | 2.004  | 1.871  | 1.904  | 1.878  |  |
| frisc         | 6.619  | 6.310    | 6.301  | 7.177  | 8.728  | 10.885 |  |
| misex3        | 2.880  | 2.669    | 2.350  | 2.421  | 2.930  | 3.503  |  |
| pdc           | 11.499 | 9.828    | 10.717 | 10.175 | 10.799 | 11.870 |  |
| s298          | 3.192  | 2.886    | 2.518  | 2.751  | 3.069  | 3.913  |  |
| s38417        | 14.108 | 11.589   | 9.926  | 9.574  | 7.867  | 10.355 |  |
| s38584        | 12.153 | 10.045   | 9.333  | 9.511  | 10.839 | 11.859 |  |
| seq           | 3.345  | 3.200    | 3.025  | 3.139  | 3.601  | 4.325  |  |
| spla          | 9.529  | 8.814    | 7.954  | 8.290  | 8.983  | 10.137 |  |
| tseng         | 1.384  | 1.100    | 1.402  | 1.313  | 1.724  | 2.963  |  |
| display_chip  | 3.799  | 2.976    | 2.228  | 2.655  | 2.611  | 3.308  |  |
| img_calc      | 34.379 | 24.689   | 16.297 | 12.610 | 14.393 | 13.552 |  |
| img_interp    | 5.491  | 4.085    | 3.389  | 3.523  | 4.005  | 4.593  |  |
| input_chip    | 1.616  | 1.285    | 0.934  | 0.932  | 1.216  | 1.635  |  |
| peak_chip     | 1.751  | 1.585    | 0.855  | 0.922  | 1.098  | 1.498  |  |
| scale125_chip | 5.487  | 4.151    | 3.214  | 3.668  | 4.594  | 5.650  |  |
| scale2_chip   | 1.930  | 1.461    | 1.325  | 1.212  | 1.637  | 2.326  |  |
| warping       | 2.143  | 1.765    | 1.574  | 1.275  | 1.748  | 2.500  |  |
| Geom. Avg.    | 4.147  | 3.641    | 3.294  | 3.284  | 3.614  | 4.428  |  |

Table A.4: Total Area ( $\times 10^6$ ) in Min. Width Trans. Area (Cluster Size = 4)

| Circuit       | LUT Size |        |        |        |        |        |
|---------------|----------|--------|--------|--------|--------|--------|
|               | 2        | 3      | 4      | 5      | 6      | 7      |
| alu4          | 2.768    | 2.360  | 2.340  | 2.351  | 2.623  | 3.183  |
| apex2         | 3.417    | 3.186  | 3.377  | 3.529  | 3.918  | 5.028  |
| apex4         | 2.468    | 2.259  | 2.296  | 2.643  | 2.704  | 3.280  |
| bigkey        | 2.307    | 2.161  | 2.008  | 2.002  | 1.415  | 1.427  |
| clma          | 17.411   | 15.803 | 15.885 | 16.421 | 18.865 | 24.472 |
| des           | 2.244    | 1.996  | 2.035  | 2.171  | 1.164  | 1.507  |
| diffeq        | 2.005    | 1.891  | 2.122  | 2.195  | 2.095  | 2.790  |
| dsip          | 1.898    | 1.516  | 1.687  | 1.386  | 1.428  | 4.353  |
| elliptic      | 5.728    | 4.976  | 6.218  | 5.723  | 5.764  | 8.734  |
| ex1010        | 9.081    | 10.045 | 8.130  | 9.177  | 9.329  | 7.473  |
| ex5p          | 1.963    | 2.053  | 1.976  | 2.000  | 1.914  | 1.941  |
| frisc         | 6.104    | 5.861  | 6.293  | 6.920  | 8.916  | 11.220 |
| misex3        | 2.801    | 2.430  | 2.357  | 2.498  | 2.858  | 3.526  |
| pdc           | 11.243   | 9.486  | 10.202 | 9.916  | 11.043 | 12.069 |
| s298          | 2.808    | 2.519  | 2.516  | 2.818  | 3.024  | 3.982  |
| s38417        | 13.440   | 11.118 | 9.355  | 10.091 | 8.027  | 10.821 |
| s38584        | 11.764   | 10.138 | 9.802  | 9.834  | 10.631 | 12.237 |
| seq           | 3.116    | 2.956  | 3.034  | 3.214  | 3.497  | 4.472  |
| spla          | 9.377    | 8.156  | 7.310  | 7.860  | 9.074  | 9.998  |
| tseng         | 1.348    | 1.114  | 1.452  | 1.367  | 1.805  | 3.053  |
| display_chip  | 3.430    | 2.729  | 2.172  | 2.781  | 2.660  | 3.417  |
| img_calc      | 31.044   | 23.093 | 15.734 | 12.772 | 14.730 | 13.709 |
| img_interp    | 5.277    | 4.050  | 3.339  | 3.501  | 4.041  | 4.626  |
| input_chip    | 1.584    | 1.234  | 0.942  | 0.942  | 1.269  | 1.690  |
| peak_chip     | 1.606    | 1.534  | 0.909  | 0.993  | 1.121  | 1.558  |
| scale125_chip | 4.914    | 4.024  | 3.233  | 3.661  | 4.734  | 5.928  |
| scale2_chip   | 1.813    | 1.439  | 1.331  | 1.224  | 1.739  | 2.359  |
| warping       | 2.059    | 1.826  | 1.628  | 1.372  | 1.856  | 2.725  |
| Geom. Avg.    | 4.000    | 3.506  | 3.298  | 3.370  | 3.601  | 4.579  |

Table A.5: Total Area ( $\times 10^6$ ) in Min. Width Trans. Area (Cluster Size = 5)

| Circuit       | LUT Size |        |        |        |        |        |
|---------------|----------|--------|--------|--------|--------|--------|
|               | 2        | 3      | 4      | 5      | 6      | 7      |
| alu4          | 2.646  | 2.316  | 2.161  | 2.335  | 2.621  | 3.075  |
| apex2         | 3.297  | 3.259  | 3.323  | 3.603  | 4.023  | 5.014  |
| apex4         | 2.387  | 2.370  | 2.442  | 2.471  | 2.762  | 3.326  |
| bigkey        | 2.010  | 1.866  | 1.942  | 1.919  | 1.390  | 1.415  |
| clma          | 16.178 | 15.392 | 15.761 | 16.376 | 19.020 | 24.193 |
| des           | 2.110  | 2.003  | 1.950  | 2.064  | 1.156  | 1.542  |
| diffeq        | 1.954  | 1.734  | 2.064  | 2.127  | 2.086  | 2.784  |
| dsip          | 1.856  | 1.578  | 1.471  | 1.288  | 1.413  | 4.237  |
| elliptic      | 5.138  | 5.036  | 6.130  | 5.655  | 5.770  | 8.756  |
| ex1010        | 8.942  | 10.371 | 8.204  | 9.351  | 9.807  | 7.442  |
| ex5p          | 2.030  | 2.112  | 1.971  | 1.948  | 1.962  | 2.000  |
| frisc         | 6.275  | 5.974  | 6.203  | 6.903  | 8.661  | 11.220 |
| misex3        | 2.625  | 2.470  | 2.243  | 2.440  | 2.923  | 3.601  |
| pdc           | 10.730 | 9.462  | 10.434 | 10.178 | 10.894 | 12.019 |
| s298          | 2.611  | 2.530  | 2.517  | 2.853  | 3.021  | 3.931  |
| s38417        | 13.283 | 10.759 | 9.489  | 9.798  | 8.187  | 10.906 |
| s38584        | 11.833 | 9.928  | 9.733  | 9.974  | 10.630 | 12.317 |
| seq           | 3.132  | 2.946  | 2.886  | 3.173  | 3.540  | 4.513  |
| spla          | 9.024  | 8.303  | 7.218  | 7.995  | 8.703  | 10.008 |
| tseng         | 1.290  | 1.149  | 1.457  | 1.393  | 1.846  | 3.059  |
| display_chip  | 3.225  | 2.864  | 2.330  | 2.817  | 2.791  | 3.489  |
| img_calc      | 31.968 | 21.903 | 15.841 | 12.599 | 14.396 | 13.982 |
| img_interp    | 4.879  | 3.892  | 3.486  | 3.673  | 3.949  | 4.733  |
| input_chip    | 1.403  | 1.256  | 0.960  | 0.956  | 1.289  | 1.726  |
| peak_chip     | 1.563  | 1.497  | 0.891  | 1.001  | 1.145  | 1.557  |
| scale125_chip | 4.835  | 3.883  | 3.243  | 3.793  | 4.620  | 5.950  |
| scale2_chip   | 1.727  | 1.466  | 1.381  | 1.264  | 1.759  | 2.414  |
| warping       | 2.124  | 1.749  | 1.559  | 1.362  | 1.836  | 2.708  |
| Geom. Avg.    | 3.852  | 3.484  | 3.267  | 3.355  | 3.614  | 4.598  |

Table A.6: Total Area ( $\times 10^6$ ) in Min. Width Trans. Area (Cluster Size = 6)

| Circuit       | LUT Size |        |        |        |        |        |
|---------------|----------|--------|--------|--------|--------|--------|
|               | 2        | 3      | 4      | 5      | 6      | 7      |
| alu4          | 2.706    | 2.345  | 2.265  | 2.317  | 2.709  | 3.142  |
| apex2         | 3.261    | 3.103  | 3.283  | 3.526  | 4.102  | 5.132  |
| apex4         | 2.458    | 2.306  | 2.374  | 2.482  | 2.987  | 3.375  |
| bigkey        | 1.986    | 2.223  | 2.200  | 2.044  | 1.499  | 1.521  |
| clma          | 15.401   | 15.950 | 15.498 | 16.442 | 19.812 | 24.396 |
| des           | 2.291    | 2.139  | 1.990  | 2.172  | 1.234  | 1.603  |
| diffeq        | 1.847    | 1.793  | 1.986  | 2.159  | 2.185  | 2.870  |
| dsip          | 1.787    | 1.837  | 1.561  | 1.394  | 1.499  | 4.451  |
| elliptic      | 5.213    | 5.002  | 6.224  | 5.618  | 6.038  | 9.119  |
| ex1010        | 8.741    | 10.084 | 8.386  | 9.219  | 9.550  | 7.592  |
| ex5p          | 1.935    | 2.068  | 1.937  | 2.013  | 1.991  | 2.016  |
| frisc         | 6.203    | 5.746  | 6.294  | 6.885  | 9.148  | 11.563 |
| misex3        | 2.664    | 2.405  | 2.305  | 2.444  | 2.961  | 3.630  |
| pdc           | 10.312   | 9.311  | 9.774  | 10.062 | 10.967 | 12.169 |
| s298          | 2.723    | 2.497  | 2.373  | 2.751  | 3.164  | 4.021  |
| s38417        | 13.121   | 11.032 | 9.321  | 9.830  | 8.479  | 11.289 |
| s38584        | 11.168   | 10.013 | 9.457  | 10.252 | 11.118 | 12.691 |
| seq           | 3.103    | 3.035  | 2.993  | 3.145  | 3.676  | 4.593  |
| spla          | 8.624    | 8.234  | 7.134  | 7.619  | 9.012  | 10.229 |
| tseng         | 1.267    | 1.207  | 1.440  | 1.410  | 1.869  | 3.279  |
| display_chip  | 3.111    | 2.682  | 2.278  | 2.886  | 2.815  | 3.544  |
| img_calc      | 29.804   | 22.082 | 15.766 | 12.964 | 14.963 | 14.545 |
| img_interp    | 4.964    | 3.626  | 3.524  | 3.797  | 4.181  | 4.888  |
| input_chip    | 1.518    | 1.261  | 0.976  | 0.961  | 1.345  | 1.846  |
| peak_chip     | 1.486    | 1.585  | 0.928  | 1.013  | 1.212  | 1.603  |
| scale125_chip | 4.530    | 3.975  | 3.131  | 3.890  | 4.884  | 6.137  |
| scale2_chip   | 1.770    | 1.508  | 1.414  | 1.390  | 1.902  | 2.559  |
| warping       | 2.005    | 1.803  | 1.652  | 1.421  | 1.972  | 2.887  |
| Geom. Avg.    | 3.799    | 3.530  | 3.284  | 3.404  | 3.765  | 4.750  |

Table A.7: Total Area ( $\times 10^6$ ) in Min. Width Trans. Area (Cluster Size = 7)

| Circuit       | LUT Size |        |        |        |        |        |
|---------------|----------|--------|--------|--------|--------|--------|
|               | 2        | 3      | 4      | 5      | 6      | 7      |
| alu4          | 2.653  | 2.315    | 2.293  | 2.368  | 2.673  | 3.159  |  |
| apex2         | 3.222  | 3.106    | 3.322  | 3.555  | 4.015  | 5.147  |  |
| apex4         | 2.455  | 2.342    | 2.312  | 2.498  | 2.922  | 3.483  |  |
| bigkey        | 1.980  | 2.201    | 2.089  | 2.021  | 1.530  | 1.484  |  |
| clma          | 15.373 | 15.642   | 15.742 | 16.679 | 19.807 | 24.507 |  |
| des           | 2.190  | 2.003    | 2.048  | 2.184  | 1.252  | 1.613  |  |
| diffeq        | 1.938  | 1.822    | 2.066  | 2.169  | 2.191  | 2.894  |  |
| dsip          | 1.706  | 1.583    | 1.590  | 1.359  | 1.525  | 4.428  |  |
| elliptic      | 5.465  | 5.063    | 5.965  | 5.662  | 6.069  | 9.225  |  |
| ex1010        | 8.675  | 10.377   | 8.581  | 9.337  | 9.680  | 7.463  |  |
| ex5p          | 1.917  | 2.116    | 1.945  | 2.193  | 2.052  | 2.102  |  |
| frisc         | 5.931  | 6.003    | 6.054  | 7.069  | 9.285  | 11.781 |  |
| misex3        | 2.717  | 2.411    | 2.306  | 2.410  | 3.004  | 3.637  |  |
| pdc           | 9.978  | 9.241    | 9.765  | 9.997  | 11.104 | 11.975 |  |
| s298          | 2.695  | 2.450    | 2.483  | 2.760  | 3.230  | 4.195  |  |
| s38417        | 12.931 | 10.235   | 9.415  | 9.732  | 8.562  | 11.219 |  |
| s38584        | 11.127 | 9.822    | 9.462  | 10.105 | 11.414 | 13.038 |  |
| seq           | 3.053  | 2.908    | 2.993  | 3.230  | 3.694  | 4.565  |  |
| spla          | 8.704  | 8.299    | 6.956  | 7.848  | 8.983  | 10.235 |  |
| tseng         | 1.218  | 1.165    | 1.483  | 1.453  | 1.944  | 3.229  |  |
| display_chip  | 3.188  | 2.829    | 2.262  | 2.890  | 2.955  | 3.637  |  |
| img_calc      | 27.935 | 21.432   | 15.576 | 13.180 | 15.300 | 14.566 |  |
| img_interp    | 4.780  | 3.797    | 3.522  | 3.774  | 4.142  | 4.896  |  |
| input_chip    | 1.434  | 1.260    | 0.959  | 1.018  | 1.405  | 1.839  |  |
| peak_chip     | 1.551  | 1.593    | 0.942  | 1.044  | 1.252  | 1.655  |  |
| scale125_chip | 4.731  | 3.934    | 3.085  | 3.818  | 5.009  | 6.104  |  |
| scale2_chip   | 1.753  | 1.466    | 1.368  | 1.304  | 1.864  | 2.551  |  |
| warping       | 1.986  | 1.822    | 1.610  | 1.425  | 1.983  | 2.878  |  |
| Geom. Avg.    | 3.765  | 3.497    | 3.279  | 3.430  | 3.810  | 4.776  |  |

Table A.8: Total Area ( $\times 10^6$ ) in Min. Width Trans. Area (Cluster Size = 8)

| Circuit       | LUT Size |        |        |        |        |        |  |  |
|---------------|----------|--------|--------|--------|--------|--------|--|--|
|               | 2        | 3      | 4      | 5      | 6      | 7      |  |  |
| alu4          | 2.602    | 2.287  | 2.162  | 2.343  | 2.766  | 3.228  |  |  |
| apex2         | 3.211    | 2.995  | 3.298  | 3.672  | 4.117  | 5.178  |  |  |
| apex4         | 2.306    | 2.275  | 2.265  | 2.679  | 2.879  | 3.529  |  |  |
| bigkey        | 2.179    | 2.020  | 2.003  | 2.115  | 1.531  | 1.519  |  |  |
| clma          | 15.260   | 14.792 | 15.542 | 17.063 | 19.920 | 24.458 |  |  |
| des           | 2.155    | 1.964  | 2.009  | 2.177  | 1.261  | 1.646  |  |  |
| diffeq        | 2.013    | 1.729  | 2.021  | 2.228  | 2.272  | 2.881  |  |  |
| dsip          | 1.808    | 1.513  | 1.541  | 1.433  | 1.531  | 4.485  |  |  |
| elliptic      | 5.322    | 4.984  | 5.928  | 5.944  | 6.119  | 9.158  |  |  |
| ex1010        | 8.326    | 10.013 | 8.763  | 9.865  | 9.841  | 7.752  |  |  |
| ex5p          | 1.962    | 2.071  | 1.977  | 2.093  | 2.098  | 2.057  |  |  |
| frisc         | 6.025    | 5.934  | 6.093  | 7.226  | 9.091  | 11.580 |  |  |
| misex3        | 2.641    | 2.316  | 2.299  | 2.608  | 3.041  | 3.730  |  |  |
| pdc           | 10.340   | 8.934  | 9.594  | 10.399 | 10.913 | 12.134 |  |  |
| s298          | 2.713    | 2.438  | 2.431  | 2.904  | 3.320  | 4.237  |  |  |
| s38417        | 12.442   | 10.306 | 9.413  | 10.376 | 8.663  | 11.280 |  |  |
| s38584        | 10.868   | 10.158 | 9.778  | 10.744 | 11.709 | 13.297 |  |  |
| seq           | 2.986    | 2.986  | 2.912  | 3.273  | 3.671  | 4.644  |  |  |
| spla          | 8.481    | 7.813  | 6.821  | 8.226  | 8.931  | 10.301 |  |  |
| tseng         | 1.297    | 1.182  | 1.586  | 1.542  | 1.976  | 3.349  |  |  |
| display_chip  | 3.299    | 2.831  | 2.262  | 3.121  | 2.915  | 3.673  |  |  |
| img_calc      | 27.772   | 21.651 | 15.498 | 13.836 | 15.450 | 14.817 |  |  |
| img_interp    | 4.613    | 3.707  | 3.463  | 4.068  | 4.257  | 5.016  |  |  |
| input_chip    | 1.395    | 1.236  | 0.949  | 1.093  | 1.410  | 1.885  |  |  |
| peak_chip     | 1.541    | 1.549  | 0.949  | 1.130  | 1.221  | 1.661  |  |  |
| scale125_chip | 4.663    | 3.935  | 3.089  | 4.180  | 5.066  | 6.327  |  |  |
| scale2_chip   | 1.759    | 1.456  | 1.442  | 1.426  | 1.933  | 2.607  |  |  |
| warping       | 1.955    | 1.753  | 1.642  | 1.495  | 2.001  | 2.890  |  |  |
| Geom. Avg.    | 3.759    | 3.425  | 3.265  | 3.594  | 3.844  | 4.838  |  |  |

Table A.9: Total Area ( $\times 10^6$ ) in Min. Width Trans. Area (Cluster Size = 9)

| Circuit       | LUT Size |        |        |        |        |        |  |  |
|---------------|----------|--------|--------|--------|--------|--------|--|--|
|               | 2        | 3      | 4      | 5      | 6      | 7      |  |  |
| alu4          | 2.659    | 2.296  | 2.327  | 2.463  | 2.842  | 3.284  |  |  |
| apex2         | 3.325    | 3.124  | 3.238  | 3.863  | 4.182  | 5.309  |  |  |
| apex4         | 2.386    | 2.407  | 2.484  | 2.764  | 3.027  | 3.521  |  |  |
| bigkey        | 2.277    | 2.117  | 2.111  | 2.260  | 1.619  | 1.569  |  |  |
| clma          | 15.307   | 14.984 | 15.844 | 17.695 | 20.526 | 24.997 |  |  |
| des           | 2.196    | 2.068  | 2.066  | 2.301  | 1.314  | 1.696  |  |  |
| diffeq        | 1.927    | 1.842  | 2.075  | 2.243  | 2.268  | 2.953  |  |  |
| dsip          | 1.951    | 1.618  | 1.621  | 1.504  | 1.596  | 4.565  |  |  |
| elliptic      | 5.331    | 5.045  | 6.209  | 5.937  | 6.416  | 9.240  |  |  |
| ex1010        | 8.698    | 10.410 | 8.797  | 9.787  | 9.877  | 7.964  |  |  |
| ex5p          | 1.923    | 2.086  | 2.033  | 2.200  | 2.138  | 2.188  |  |  |
| frisc         | 5.978    | 5.971  | 6.271  | 7.406  | 9.415  | 11.638 |  |  |
| misex3        | 2.495    | 2.469  | 2.407  | 2.555  | 3.085  | 3.685  |  |  |
| pdc           | 9.859    | 8.808  | 9.765  | 10.399 | 11.451 | 12.319 |  |  |
| s298          | 2.569    | 2.440  | 2.547  | 2.929  | 3.320  | 4.354  |  |  |
| s38417        | 12.466   | 10.517 | 9.428  | 10.512 | 8.992  | 11.832 |  |  |
| s38584        | 10.524   | 10.147 | 10.128 | 11.025 | 12.012 | 13.610 |  |  |
| seq           | 3.025    | 2.980  | 2.961  | 3.476  | 3.849  | 4.695  |  |  |
| spla          | 8.284    | 8.175  | 7.019  | 8.042  | 9.097  | 10.463 |  |  |
| tseng         | 1.279    | 1.274  | 1.623  | 1.565  | 2.004  | 3.421  |  |  |
| display_chip  | 3.096    | 2.797  | 2.363  | 3.145  | 3.075  | 3.782  |  |  |
| img_calc      | 26.454   | 22.238 | 15.690 | 13.820 | 16.033 | 14.925 |  |  |
| img_interp    | 4.757    | 3.644  | 3.718  | 4.046  | 4.425  | 5.303  |  |  |
| input_chip    | 1.406    | 1.217  | 0.971  | 1.072  | 1.459  | 1.860  |  |  |
| peak_chip     | 1.589    | 1.562  | 1.018  | 1.122  | 1.318  | 1.691  |  |  |
| scale125_chip | 4.538    | 3.930  | 3.274  | 4.121  | 5.312  | 6.355  |  |  |
| scale2_chip   | 1.757    | 1.502  | 1.518  | 1.400  | 1.948  | 2.688  |  |  |
| warping       | 1.971    | 1.808  | 1.734  | 1.610  | 2.099  | 3.026  |  |  |
| Geom. Avg.    | 3.749    | 3.505  | 3.387  | 3.660  | 3.971  | 4.942  |  |  |

Table A.10: Total Area ( $\times 10^6$ ) in Min. Width Trans. Area (Cluster Size = 10)
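
The "Geom. Avg." row that closes each table can be recomputed from the per-circuit entries. The following is a minimal sketch, assuming that the row is the plain geometric mean of the 28 circuit values in a column; the function name `geometric_mean` and the three-value subset are illustrative only and are not part of the original CAD flow.

```python
import math


def geometric_mean(values):
    """Return the geometric mean of a non-empty list of positive numbers."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    # Average the logarithms and exponentiate, which avoids forming
    # a long product of all the entries.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Illustrative subset only: the LUT size 4 entries for alu4, apex2 and apex4
# from Table A.10. A full check would use all 28 circuits in the column.
subset = [2.327, 3.238, 2.484]
print(f"{geometric_mean(subset):.3f}")
```

Applied to a complete column, the result can be compared with the tabulated average; small differences are to be expected because the table entries are rounded to three decimal places.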

# Appendix B

## Intra-Cluster (Logic) Area

| Circuit       | LUT Size |       |       |       |       |       |  |  |
|---------------|----------|-------|-------|-------|-------|-------|--|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |  |
| alu4          | 0.287    | 0.279 | 0.391 | 0.576 | 0.992 | 1.520 |  |  |
| apex2         | 0.332    | 0.327 | 0.483 | 0.723 | 1.250 | 2.158 |  |  |
| apex4         | 0.231    | 0.233 | 0.324 | 0.507 | 0.821 | 1.353 |  |  |
| bigkey        | 0.313    | 0.345 | 0.439 | 0.590 | 0.585 | 0.739 |  |  |
| clma          | 1.497    | 1.490 | 2.154 | 3.019 | 5.281 | 8.846 |  |  |
| des           | 0.305    | 0.308 | 0.409 | 0.568 | 0.469 | 0.794 |  |  |
| diffeq        | 0.268    | 0.251 | 0.385 | 0.527 | 0.735 | 1.278 |  |  |
| dsip          | 0.266    | 0.265 | 0.352 | 0.396 | 0.582 | 2.167 |  |  |
| elliptic      | 0.575    | 0.576 | 0.926 | 1.176 | 1.805 | 3.604 |  |  |
| ex1010        | 0.842    | 0.858 | 1.182 | 1.862 | 2.617 | 2.918 |  |  |
| ex5p          | 0.187    | 0.193 | 0.273 | 0.380 | 0.626 | 0.893 |  |  |
| frisc         | 0.632    | 0.613 | 0.914 | 1.322 | 2.487 | 4.023 |  |  |
| misex3        | 0.268    | 0.265 | 0.359 | 0.530 | 0.980 | 1.659 |  |  |
| pdc           | 0.883    | 0.830 | 1.176 | 1.747 | 3.070 | 4.813 |  |  |
| s298          | 0.449    | 0.403 | 0.496 | 0.698 | 1.101 | 1.822 |  |  |
| s38417        | 1.434    | 1.319 | 1.646 | 2.350 | 2.865 | 4.949 |  |  |
| s38584        | 1.312    | 1.252 | 1.657 | 2.338 | 3.657 | 5.572 |  |  |
| seq           | 0.309    | 0.309 | 0.450 | 0.662 | 1.121 | 1.913 |  |  |
| spla          | 0.781    | 0.730 | 0.948 | 1.501 | 2.542 | 4.253 |  |  |
| tseng         | 0.195    | 0.167 | 0.269 | 0.373 | 0.676 | 1.401 |  |  |
| display_chip  | 0.502    | 0.448 | 0.461 | 0.715 | 1.033 | 1.670 |  |  |
| img_calc      | 2.973    | 2.592 | 2.606 | 2.699 | 4.684 | 6.022 |  |  |
| img_interp    | 0.692    | 0.587 | 0.701 | 0.991 | 1.451 | 2.284 |  |  |
| input_chip    | 0.234    | 0.212 | 0.207 | 0.278 | 0.518 | 0.878 |  |  |
| peak_chip     | 0.255    | 0.255 | 0.208 | 0.292 | 0.455 | 0.787 |  |  |
| scale125_chip | 0.721    | 0.636 | 0.676 | 1.035 | 1.783 | 2.887 |  |  |
| scale2_chip   | 0.284    | 0.248 | 0.306 | 0.366 | 0.689 | 1.238 |  |  |
| warping       | 0.295    | 0.295 | 0.348 | 0.392 | 0.728 | 1.372 |  |  |
| Geom. Avg.    | 0.464    | 0.443 | 0.571 | 0.792 | 1.245 | 2.085 |  |  |

Table B.1: Intra-Cluster Area ( $\times 10^6$ ) in Min. Width Trans. Area (Cluster Size = 1)

| Circuit       | LUT Size |       |       |       |       |        |  |
|---------------|----------|-------|-------|-------|-------|--------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7      |  |
| alu4          | 0.376    | 0.470 | 0.565 | 0.833 | 1.275 | 1.808  |  |
| apex2         | 0.439    | 0.560 | 0.697 | 1.079 | 1.605 | 2.572  |  |
| apex4         | 0.308    | 0.386 | 0.486 | 0.797 | 1.076 | 1.675  |  |
| bigkey        | 0.416    | 0.510 | 0.617 | 0.828 | 0.750 | 0.877  |  |
| clma          | 1.934    | 2.395 | 3.056 | 4.366 | 6.769 | 10.476 |  |
| des           | 0.400    | 0.492 | 0.610 | 0.796 | 0.601 | 0.942  |  |
| diffeq        | 0.346    | 0.383 | 0.541 | 0.739 | 0.944 | 1.513  |  |
| dsip          | 0.342    | 0.389 | 0.573 | 0.555 | 0.746 | 2.568  |  |
| elliptic      | 0.742    | 0.827 | 1.305 | 1.670 | 2.314 | 4.270  |  |
| ex1010        | 1.116    | 1.606 | 1.702 | 2.911 | 3.407 | 3.635  |  |
| ex5p          | 0.249    | 0.339 | 0.417 | 0.572 | 0.816 | 1.070  |  |
| frisc         | 0.821    | 0.927 | 1.289 | 1.862 | 3.188 | 4.765  |  |
| misex3        | 0.351    | 0.439 | 0.514 | 0.784 | 1.256 | 1.982  |  |
| pdc           | 1.161    | 1.390 | 1.739 | 2.702 | 4.149 | 5.934  |  |
| s298          | 0.587    | 0.615 | 0.698 | 0.979 | 1.412 | 2.205  |  |
| s38417        | 1.844    | 1.947 | 2.313 | 3.297 | 3.674 | 5.862  |  |
| s38584        | 1.686    | 1.803 | 2.328 | 3.280 | 4.689 | 6.600  |  |
| seq           | 0.408    | 0.524 | 0.647 | 0.965 | 1.453 | 2.284  |  |
| spla          | 1.020    | 1.220 | 1.397 | 2.342 | 3.388 | 5.219  |  |
| tseng         | 0.264    | 0.251 | 0.378 | 0.524 | 0.868 | 1.660  |  |
| display_chip  | 0.653    | 0.655 | 0.648 | 1.004 | 1.325 | 1.978  |  |
| img_calc      | 3.854    | 3.998 | 3.661 | 3.786 | 6.006 | 7.133  |  |
| img_interp    | 0.890    | 0.875 | 0.985 | 1.390 | 1.861 | 2.704  |  |
| input_chip    | 0.301    | 0.309 | 0.292 | 0.390 | 0.664 | 1.040  |  |
| peak_chip     | 0.328    | 0.367 | 0.292 | 0.410 | 0.583 | 0.934  |  |
| scale125_chip | 0.927    | 0.919 | 0.950 | 1.452 | 2.286 | 3.419  |  |
| scale2_chip   | 0.365    | 0.352 | 0.430 | 0.514 | 0.883 | 1.467  |  |
| warping       | 0.380    | 0.420 | 0.489 | 0.550 | 0.933 | 1.626  |  |
| Geom. Avg.    | 0.605    | 0.688 | 0.819 | 1.141 | 1.605 | 2.490  |  |

Table B.2: Intra-Cluster Area ( $\times 10^6$ ) in Min. Width Trans. Area (Cluster Size = 2)

| Circuit       | LUT Size |       |       |       |       |        |  |
|---------------|----------|-------|-------|-------|-------|--------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7      |  |
| alu4          | 0.379    | 0.464 | 0.628 | 0.859 | 1.365 | 1.949  |  |
| apex2         | 0.441    | 0.547 | 0.781 | 1.093 | 1.726 | 2.770  |  |
| apex4         | 0.307    | 0.391 | 0.535 | 0.786 | 1.139 | 1.734  |  |
| bigkey        | 0.412    | 0.559 | 0.691 | 0.872 | 0.802 | 0.950  |  |
| clma          | 1.952    | 2.452 | 3.415 | 4.515 | 7.255 | 11.325 |  |
| des           | 0.401    | 0.508 | 0.647 | 0.843 | 0.643 | 1.017  |  |
| diffeq        | 0.351    | 0.410 | 0.611 | 0.782 | 1.007 | 1.636  |  |
| dsip          | 0.346    | 0.430 | 0.568 | 0.587 | 0.799 | 2.776  |  |
| elliptic      | 0.751    | 0.937 | 1.452 | 1.743 | 2.487 | 4.614  |  |
| ex1010        | 1.117    | 1.515 | 1.932 | 2.829 | 3.650 | 3.818  |  |
| ex5p          | 0.250    | 0.328 | 0.452 | 0.581 | 0.861 | 1.152  |  |
| frisc         | 0.827    | 0.999 | 1.434 | 1.963 | 3.404 | 5.148  |  |
| misex3        | 0.354    | 0.442 | 0.580 | 0.801 | 1.358 | 2.145  |  |
| pdc           | 1.166    | 1.376 | 1.912 | 2.657 | 4.331 | 6.263  |  |
| s298          | 0.583    | 0.656 | 0.777 | 1.033 | 1.511 | 2.341  |  |
| s38417        | 1.869    | 2.144 | 2.573 | 3.477 | 3.921 | 6.330  |  |
| s38584        | 1.706    | 2.029 | 2.585 | 3.458 | 5.005 | 7.127  |  |
| seq           | 0.408    | 0.519 | 0.722 | 0.999 | 1.549 | 2.457  |  |
| spla          | 1.032    | 1.216 | 1.540 | 2.285 | 3.581 | 5.527  |  |
| tseng         | 0.259    | 0.274 | 0.421 | 0.552 | 0.927 | 1.796  |  |
| display_chip  | 0.656    | 0.729 | 0.719 | 1.058 | 1.414 | 2.139  |  |
| img_calc      | 3.896    | 4.218 | 4.073 | 3.993 | 6.411 | 7.703  |  |
| img_interp    | 0.903    | 0.951 | 1.095 | 1.467 | 1.987 | 2.923  |  |
| input_chip    | 0.305    | 0.343 | 0.324 | 0.412 | 0.708 | 1.128  |  |
| peak_chip     | 0.333    | 0.414 | 0.325 | 0.433 | 0.625 | 1.011  |  |
| scale125_chip | 0.938    | 1.030 | 1.056 | 1.532 | 2.442 | 3.695  |  |
| scale2_chip   | 0.368    | 0.403 | 0.478 | 0.543 | 0.945 | 1.587  |  |
| warping       | 0.385    | 0.478 | 0.543 | 0.581 | 0.997 | 1.759  |  |
| Geom. Avg.    | 0.609    | 0.728 | 0.906 | 1.184 | 1.713 | 2.678  |  |

Table B.3: Intra-Cluster Area ( $\times 10^6$ ) in Min. Width Trans. Area (Cluster Size = 3)

| Circuit       | LUT Size |       |       |       |       |        |
|---------------|-------|-------|-------|-------|-------|--------|
|               | 2     | 3     | 4     | 5     | 6     | 7      |
| alu4          | 0.421 | 0.496 | 0.683 | 0.942 | 1.491 | 2.034  |
| apex2         | 0.492 | 0.592 | 0.849 | 1.195 | 1.877 | 2.890  |
| apex4         | 0.344 | 0.415 | 0.587 | 0.870 | 1.240 | 1.813  |
| bigkey        | 0.455 | 0.577 | 0.748 | 0.950 | 0.868 | 0.983  |
| clma          | 2.180 | 2.603 | 3.735 | 4.947 | 7.875 | 11.813 |
| des           | 0.447 | 0.535 | 0.727 | 0.920 | 0.698 | 1.059  |
| diffeq        | 0.392 | 0.427 | 0.664 | 0.850 | 1.094 | 1.695  |
| dsip          | 0.387 | 0.444 | 0.601 | 0.636 | 0.863 | 2.881  |
| elliptic      | 0.843 | 0.977 | 1.586 | 1.895 | 4.522 | 4.788  |
| ex1010        | 1.256 | 1.666 | 2.103 | 3.096 | 3.965 | 3.983  |
| ex5p          | 0.280 | 0.355 | 0.487 | 0.636 | 0.939 | 1.212  |
| frisc         | 0.931 | 1.048 | 1.565 | 2.162 | 3.709 | 5.381  |
| misex3        | 0.396 | 0.472 | 0.632 | 0.878 | 1.471 | 2.237  |
| pdc           | 1.304 | 1.463 | 2.078 | 2.915 | 4.663 | 6.500  |
| s298          | 0.652 | 0.693 | 0.858 | 1.137 | 1.641 | 2.432  |
| s38417        | 2.091 | 2.245 | 2.817 | 3.785 | 4.251 | 6.567  |
| s38584        | 1.907 | 2.114 | 2.826 | 3.760 | 5.426 | 7.398  |
| seq           | 0.457 | 0.552 | 0.784 | 1.089 | 1.681 | 2.576  |
| spla          | 1.151 | 1.288 | 1.672 | 2.493 | 3.860 | 5.703  |
| tseng         | 0.290 | 0.291 | 0.466 | 0.600 | 1.004 | 1.864  |
| display_chip  | 0.735 | 0.754 | 0.790 | 1.151 | 1.536 | 2.220  |
| img_calc      | 4.351 | 4.382 | 4.444 | 4.344 | 6.951 | 7.991  |
| img_interp    | 1.014 | 0.995 | 1.194 | 1.595 | 2.153 | 3.034  |
| input_chip    | 0.342 | 0.355 | 0.354 | 0.447 | 0.768 | 1.169  |
| peak_chip     | 0.371 | 0.428 | 0.355 | 0.470 | 0.678 | 1.051  |
| scale125_chip | 1.047 | 1.065 | 1.152 | 1.665 | 2.645 | 3.830  |
| scale2_chip   | 0.412 | 0.416 | 0.522 | 0.589 | 1.024 | 1.644  |
| warping       | 0.431 | 0.494 | 0.594 | 0.631 | 1.079 | 1.822  |
| Geom. Avg.    | 0.680 | 0.766 | 0.988 | 1.292 | 1.893 | 2.785  |

Table B.4: Intra-Cluster Area ( $\times 10^6$ ) in Min. Width Trans. Area (Cluster Size = 4)

| Circuit       | LUT Size |       |       |       |       |        |  |
|---------------|----------|-------|-------|-------|-------|--------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7      |  |
| alu4          | 0.438    | 0.510 | 0.756 | 1.040 | 1.554 | 2.107  |  |
| apex2         | 0.508    | 0.605 | 0.943 | 1.318 | 1.962 | 3.001  |  |
| apex4         | 0.355    | 0.429 | 0.653 | 0.939 | 1.297 | 1.876  |  |
| bigkey        | 0.474    | 0.616 | 0.840 | 1.055 | 0.915 | 1.026  |  |
| clma          | 2.265    | 2.704 | 4.166 | 5.447 | 8.248 | 12.259 |  |
| des           | 0.463    | 0.557 | 0.791 | 1.021 | 0.731 | 1.103  |  |
| diffeq        | 0.408    | 0.450 | 0.741 | 0.943 | 1.145 | 1.765  |  |
| dsip          | 0.401    | 0.475 | 0.673 | 0.711 | 0.908 | 3.001  |  |
| elliptic      | 0.873    | 1.032 | 1.773 | 2.111 | 2.811 | 4.987  |  |
| ex1010        | 1.298    | 1.655 | 2.342 | 3.418 | 4.167 | 4.127  |  |
| ex5p          | 0.290    | 0.362 | 0.545 | 0.700 | 0.987 | 1.258  |  |
| frisc         | 0.959    | 1.100 | 1.758 | 2.393 | 3.891 | 5.572  |  |
| misex3        | 0.411    | 0.485 | 0.707 | 0.970 | 1.540 | 2.306  |  |
| pdc           | 1.351    | 1.517 | 2.318 | 3.209 | 4.852 | 6.720  |  |
| s298          | 0.677    | 0.725 | 0.957 | 1.253 | 1.718 | 2.527  |  |
| s38417        | 2.171    | 2.373 | 3.157 | 4.210 | 4.463 | 6.841  |  |
| s38584        | 1.981    | 2.241 | 3.167 | 4.187 | 5.694 | 7.702  |  |
| seq           | 0.474    | 0.568 | 0.874 | 1.195 | 1.758 | 2.659  |  |
| spla          | 1.194    | 1.335 | 1.866 | 2.733 | 4.016 | 5.903  |  |
| tseng         | 0.299    | 0.306 | 0.520 | 0.669 | 1.053 | 1.942  |  |
| display_chip  | 0.760    | 0.802 | 0.884 | 1.280 | 1.613 | 2.317  |  |
| img_calc      | 4.508    | 4.646 | 4.981 | 4.832 | 7.294 | 8.331  |  |
| img_interp    | 1.048    | 1.050 | 1.340 | 1.774 | 2.258 | 3.156  |  |
| input_chip    | 0.355    | 0.378 | 0.398 | 0.499 | 0.810 | 1.214  |  |
| peak_chip     | 0.385    | 0.457 | 0.398 | 0.526 | 0.711 | 1.092  |  |
| scale125_chip | 1.086    | 1.136 | 1.294 | 1.856 | 2.778 | 3.994  |  |
| scale2_chip   | 0.428    | 0.444 | 0.584 | 0.657 | 1.073 | 1.710  |  |
| warping       | 0.445    | 0.528 | 0.665 | 0.704 | 1.132 | 1.898  |  |
| Geom. Avg.    | 0.705    | 0.803 | 1.104 | 1.431 | 1.948 | 2.893  |  |

Table B.5: Intra-Cluster Area ( $\times 10^6$ ) in Min. Width Trans. Area (Cluster Size = 5)

| Circuit       | LUT Size |       |       |       |       |        |  |
|---------------|----------|-------|-------|-------|-------|--------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7      |  |
| alu4          | 0.454    | 0.563 | 0.803 | 1.093 | 1.630 | 2.192  |  |
| apex2         | 0.529    | 0.669 | 1.000 | 1.386 | 2.051 | 3.109  |  |
| apex4         | 0.369    | 0.477 | 0.694 | 0.986 | 1.357 | 1.959  |  |
| bigkey        | 0.491    | 0.674 | 0.891 | 1.113 | 0.960 | 1.068  |  |
| clma          | 2.353    | 2.976 | 4.397 | 5.740 | 8.628 | 12.683 |  |
| des           | 0.482    | 0.610 | 0.838 | 1.079 | 0.769 | 1.137  |  |
| diffeq        | 0.425    | 0.493 | 0.791 | 0.996 | 1.199 | 1.835  |  |
| dsip          | 0.417    | 0.519 | 0.716 | 0.747 | 0.951 | 3.109  |  |
| elliptic      | 0.904    | 1.134 | 1.884 | 2.226 | 2.945 | 5.164  |  |
| ex1010        | 1.350    | 1.856 | 2.487 | 3.631 | 4.359 | 4.273  |  |
| ex5p          | 0.302    | 0.404 | 0.575 | 0.747 | 1.042 | 1.301  |  |
| frisc         | 0.999    | 1.215 | 1.859 | 2.509 | 4.062 | 5.766  |  |
| misex3        | 0.426    | 0.534 | 0.744 | 1.020 | 1.613 | 2.397  |  |
| pdc           | 1.403    | 1.668 | 2.444 | 3.373 | 5.071 | 6.958  |  |
| s298          | 0.704    | 0.794 | 1.009 | 1.323 | 1.795 | 2.616  |  |
| s38417        | 2.256    | 2.598 | 3.344 | 4.432 | 4.674 | 7.081  |  |
| s38584        | 2.059    | 2.452 | 3.359 | 4.403 | 5.964 | 7.972  |  |
| seq           | 0.492    | 0.625 | 0.919 | 1.264 | 1.836 | 2.753  |  |
| spla          | 1.238    | 1.470 | 1.978 | 2.880 | 4.210 | 6.123  |  |
| tseng         | 0.311    | 0.336 | 0.550 | 0.703 | 1.108 | 2.013  |  |
| display_chip  | 0.790    | 0.879 | 0.934 | 1.347 | 1.687 | 2.397  |  |
| img_calc      | 4.687    | 5.089 | 5.284 | 5.091 | 7.635 | 8.615  |  |
| img_interp    | 1.088    | 1.153 | 1.422 | 1.869 | 2.366 | 3.274  |  |
| input_chip    | 0.368    | 0.414 | 0.422 | 0.527 | 0.844 | 1.260  |  |
| peak_chip     | 0.401    | 0.500 | 0.422 | 0.552 | 0.744 | 1.137  |  |
| scale125_chip | 1.130    | 1.244 | 1.372 | 1.952 | 2.912 | 4.136  |  |
| scale2_chip   | 0.446    | 0.487 | 0.622 | 0.693 | 1.125 | 1.781  |  |
| warping       | 0.464    | 0.578 | 0.706 | 0.742 | 1.191 | 1.972  |  |
| Geom. Avg.    | 0.733    | 0.883 | 1.170 | 1.508 | 2.040 | 3.000  |  |

Table B.6: Intra-Cluster Area ( $\times 10^6$ ) in Min. Width Trans. Area (Cluster Size = 6)

| Circuit       | LUT Size |       |       |       |       |        |  |
|---------------|----------|-------|-------|-------|-------|--------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7      |  |
| alu4          | 0.476    | 0.632 | 0.848 | 1.153 | 1.705 | 2.315  |  |
| apex2         | 0.552    | 0.744 | 1.053 | 1.461 | 2.148 | 3.278  |  |
| apex4         | 0.385    | 0.534 | 0.736 | 1.045 | 1.422 | 2.061  |  |
| bigkey        | 0.512    | 0.767 | 0.941 | 1.178 | 0.999 | 1.132  |  |
| clma          | 2.453    | 3.341 | 4.650 | 6.051 | 9.007 | 13.399 |  |
| des           | 0.501    | 0.688 | 0.883 | 1.141 | 0.807 | 1.200  |  |
| diffeq        | 0.442    | 0.558 | 0.833 | 1.057 | 1.261 | 1.943  |  |
| dsip          | 0.435    | 0.590 | 0.756 | 0.791 | 0.999 | 3.278  |  |
| elliptic      | 0.942    | 1.285 | 2.001 | 2.361 | 3.086 | 5.458  |  |
| ex1010        | 1.406    | 2.027 | 2.645 | 3.817 | 4.559 | 4.511  |  |
| ex5p          | 0.315    | 0.451 | 0.609 | 0.785 | 1.079 | 1.352  |  |
| frisc         | 1.037    | 1.370 | 1.967 | 2.651 | 4.246 | 6.100  |  |
| misex3        | 0.445    | 0.599 | 0.783 | 1.075 | 1.684 | 2.518  |  |
| pdc           | 1.460    | 1.870 | 2.580 | 3.557 | 5.285 | 7.350  |  |
| s298          | 0.734    | 0.899 | 1.068 | 1.401 | 1.876 | 2.771  |  |
| s38417        | 2.352    | 2.944 | 3.536 | 4.698 | 4.882 | 7.485  |  |
| s38584        | 2.147    | 2.787 | 3.551 | 4.668 | 6.233 | 8.432  |  |
| seq           | 0.514    | 0.700 | 0.979 | 1.335 | 1.926 | 2.923  |  |
| spla          | 1.290    | 1.648 | 2.075 | 3.032 | 4.387 | 6.472  |  |
| tseng         | 0.326    | 0.379 | 0.586 | 0.749 | 1.160 | 2.129  |  |
| display_chip  | 0.825    | 0.998 | 0.991 | 1.431 | 1.765 | 2.535  |  |
| img_calc      | 4.881    | 5.778 | 5.587 | 5.393 | 7.988 | 9.107  |  |
| img_interp    | 1.133    | 1.307 | 1.504 | 1.981 | 2.471 | 3.464  |  |
| input_chip    | 0.383    | 0.471 | 0.447 | 0.556 | 0.888 | 1.335  |  |
| peak_chip     | 0.418    | 0.569 | 0.447 | 0.586 | 0.777 | 1.200  |  |
| scale125_chip | 1.179    | 1.415 | 1.450 | 2.071 | 3.046 | 4.376  |  |
| scale2_chip   | 0.465    | 0.554 | 0.656 | 0.731 | 1.180 | 1.876  |  |
| warping       | 0.483    | 0.657 | 0.748 | 0.785 | 1.241 | 2.078  |  |
| Geom. Avg.    | 0.764    | 0.995 | 1.237 | 1.595 | 2.133 | 3.168  |  |

Table B.7: Intra-Cluster Area ( $\times 10^6$ ) in Min. Width Trans. Area (Cluster Size = 7)

| Circuit       | LUT Size |       |       |       |       |        |  |
|---------------|----------|-------|-------|-------|-------|--------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7      |  |
| alu4          | 0.548    | 0.664 | 0.893 | 1.214 | 1.822 | 2.372  |  |
| apex2         | 0.637    | 0.793 | 1.116 | 1.531 | 2.290 | 3.360  |  |
| apex4         | 0.446    | 0.562 | 0.767 | 1.091 | 1.527 | 2.135  |  |
| bigkey        | 0.592    | 0.806 | 0.995 | 1.235 | 1.071 | 1.146  |  |
| clma          | 2.836    | 3.520 | 4.900 | 6.335 | 9.616 | 13.699 |  |
| des           | 0.580    | 0.725 | 0.930 | 1.192 | 0.862 | 1.245  |  |
| diffeq        | 0.511    | 0.588 | 0.879 | 1.105 | 1.342 | 1.977  |  |
| dsip          | 0.503    | 0.621 | 0.800 | 0.831 | 1.059 | 3.360  |  |
| elliptic      | 1.091    | 1.349 | 2.111 | 2.471 | 3.288 | 5.594  |  |
| ex1010        | 1.625    | 2.174 | 2.785 | 3.995 | 4.839 | 4.586  |  |
| ex5p          | 0.364    | 0.478 | 0.646 | 0.831 | 1.157 | 1.384  |  |
| frisc         | 1.202    | 1.443 | 2.073 | 2.788 | 4.531 | 6.246  |  |
| misex3        | 0.515    | 0.637 | 0.828 | 1.127 | 1.798 | 2.589  |  |
| pdc           | 1.686    | 1.972 | 2.706 | 3.720 | 5.652 | 7.531  |  |
| s298          | 0.850    | 0.943 | 1.130 | 1.466 | 2.007 | 2.827  |  |
| s38417        | 2.715    | 3.085 | 3.728 | 4.920 | 5.221 | 7.670  |  |
| s38584        | 2.482    | 2.921 | 3.747 | 4.891 | 6.661 | 8.638  |  |
| seq           | 0.594    | 0.736 | 1.027 | 1.401 | 2.056 | 2.965  |  |
| spla          | 1.491    | 1.733 | 2.180 | 3.171 | 4.667 | 6.622  |  |
| tseng         | 0.373    | 0.403 | 0.618 | 0.780 | 1.231 | 2.174  |  |
| display_chip  | 0.950    | 1.048 | 1.046 | 1.503 | 1.884 | 2.589  |  |
| img_calc      | 5.641    | 6.056 | 5.895 | 5.649 | 8.533 | 9.330  |  |
| img_interp    | 1.310    | 1.370 | 1.585 | 2.073 | 2.647 | 3.538  |  |
| input_chip    | 0.443    | 0.494 | 0.470 | 0.585 | 0.948 | 1.364  |  |
| peak_chip     | 0.483    | 0.597 | 0.474 | 0.614 | 0.837 | 1.226  |  |
| scale125_chip | 1.363    | 1.483 | 1.530 | 2.167 | 3.251 | 4.467  |  |
| scale2_chip   | 0.538    | 0.580 | 0.693 | 0.766 | 1.256 | 1.917  |  |
| warping       | 0.559    | 0.688 | 0.790 | 0.824 | 1.330 | 2.135  |  |
| Geom. Avg.    | 0.883    | 1.048 | 1.305 | 1.671 | 2.278 | 3.241  |  |

Table B.8: Intra-Cluster Area ( $\times 10^6$ ) in Min. Width Trans. Area (Cluster Size = 8)

| Circuit       | LUT Size |       |       |       |       |        |  |  |
|---------------|----------|-------|-------|-------|-------|--------|--|--|
|               | 2        | 3     | 4     | 5     | 6     | 7      |  |  |
| alu4          | 0.574    | 0.704 | 0.941 | 1.318 | 1.890 | 2.447  |  |  |
| apex2         | 0.666    | 0.829 | 1.166 | 1.662 | 2.376 | 3.453  |  |  |
| apex4         | 0.466    | 0.594 | 0.798 | 1.185 | 1.575 | 2.195  |  |  |
| bigkey        | 0.621    | 0.855 | 1.045 | 1.344 | 1.102 | 1.189  |  |  |
| clma          | 2.966    | 3.717 | 5.144 | 6.889 | 9.949 | 14.087 |  |  |
| des           | 0.604    | 0.768 | 0.979 | 1.291 | 0.888 | 1.281  |  |  |
| diffeq        | 0.535    | 0.623 | 0.924 | 1.203 | 1.389 | 2.035  |  |  |
| dsip          | 0.527    | 0.659 | 0.842 | 0.902 | 1.102 | 3.453  |  |  |
| elliptic      | 1.143    | 1.433 | 2.212 | 2.688 | 3.407 | 5.740  |  |  |
| ex1010        | 1.692    | 2.249 | 2.938 | 4.333 | 4.967 | 4.734  |  |  |
| ex5p          | 0.380    | 0.495 | 0.677 | 0.902 | 1.188 | 1.441  |  |  |
| frisc         | 1.259    | 1.529 | 2.179 | 3.015 | 4.681 | 6.426  |  |  |
| misex3        | 0.539    | 0.665 | 0.875 | 1.220 | 1.861 | 2.653  |  |  |
| pdc           | 1.763    | 2.082 | 2.834 | 4.032 | 5.826 | 7.752  |  |  |
| s298          | 0.888    | 1.002 | 1.183 | 1.592 | 2.076 | 2.904  |  |  |
| s38417        | 2.842    | 3.274 | 3.923 | 5.350 | 5.397 | 7.889  |  |  |
| s38584        | 2.597    | 3.107 | 3.945 | 5.323 | 6.886 | 8.873  |  |  |
| seq           | 0.619    | 0.778 | 1.084 | 1.512 | 2.119 | 3.064  |  |  |
| spla          | 1.558    | 1.828 | 2.289 | 3.449 | 4.824 | 6.838  |  |  |
| tseng         | 0.391    | 0.427 | 0.649 | 0.849 | 1.274 | 2.241  |  |  |
| display_chip  | 0.995    | 1.115 | 1.100 | 1.636 | 1.947 | 2.676  |  |  |
| img_calc      | 5.902    | 6.432 | 6.201 | 6.146 | 8.818 | 9.582  |  |  |
| img_interp    | 1.371    | 1.455 | 1.667 | 2.255 | 2.734 | 3.636  |  |  |
| input_chip    | 0.464    | 0.527 | 0.495 | 0.637 | 0.973 | 1.418  |  |  |
| peak_chip     | 0.507    | 0.633 | 0.495 | 0.672 | 0.859 | 1.258  |  |  |
| scale125_chip | 1.427    | 1.578 | 1.612 | 2.361 | 3.364 | 4.596  |  |  |
| scale2_chip   | 0.563    | 0.617 | 0.732 | 0.840 | 1.303 | 1.990  |  |  |
| warping       | 0.585    | 0.733 | 0.831 | 0.893 | 1.374 | 2.195  |  |  |
| Geom. Avg.    | 0.923    | 1.109 | 1.371 | 1.816 | 2.353 | 3.340  |  |  |

Table B.9: Intra-Cluster Area ( $\times 10^6$ ) in Min. Width Trans. Area (Cluster Size = 9)

| Circuit       | LUT Size |       |       |       |        |        |  |  |
|---------------|----------|-------|-------|-------|--------|--------|--|--|
|               | 2        | 3     | 4     | 5     | 6      | 7      |  |  |
| alu4          | 0.599    | 0.734 | 0.982 | 1.371 | 1.948  | 2.489  |  |  |
| apex2         | 0.697    | 0.864 | 1.219 | 1.729 | 2.439  | 3.526  |  |  |
| apex4         | 0.484    | 0.622 | 0.847 | 1.238 | 1.620  | 2.230  |  |  |
| bigkey        | 0.647    | 0.902 | 1.097 | 1.401 | 1.146  | 1.219  |  |  |
| clma          | 3.100    | 3.890 | 5.402 | 7.171 | 10.230 | 14.391 |  |  |
| des           | 0.632    | 0.801 | 1.027 | 1.350 | 0.917  | 1.296  |  |  |
| diffeq        | 0.558    | 0.652 | 0.969 | 1.248 | 1.424  | 2.074  |  |  |
| dsip          | 0.551    | 0.689 | 0.879 | 0.941 | 1.129  | 3.526  |  |  |
| elliptic      | 1.201    | 1.498 | 2.329 | 2.793 | 3.503  | 5.860  |  |  |
| ex1010        | 1.772    | 2.388 | 3.080 | 4.521 | 5.123  | 4.823  |  |  |
| ex5p          | 0.397    | 0.522 | 0.699 | 0.941 | 1.228  | 1.478  |  |  |
| frisc         | 1.313    | 1.598 | 2.284 | 3.151 | 4.812  | 6.534  |  |  |
| misex3        | 0.562    | 0.697 | 0.911 | 1.279 | 1.915  | 2.723  |  |  |
| pdc           | 1.841    | 2.180 | 2.971 | 4.204 | 5.991  | 7.908  |  |  |
| s298          | 0.929    | 1.047 | 1.245 | 1.657 | 2.144  | 2.982  |  |  |
| s38417        | 2.970    | 3.420 | 4.119 | 5.565 | 5.549  | 8.038  |  |  |
| s38584        | 2.716    | 3.242 | 4.138 | 5.544 | 7.087  | 9.049  |  |  |
| seq           | 0.647    | 0.816 | 1.129 | 1.585 | 2.177  | 3.111  |  |  |
| spla          | 1.626    | 1.908 | 2.400 | 3.590 | 4.976  | 6.975  |  |  |
| tseng         | 0.410    | 0.443 | 0.687 | 0.890 | 1.309  | 2.282  |  |  |
| display_chip  | 1.040    | 1.163 | 1.155 | 1.698 | 2.013  | 2.723  |  |  |
| img_calc      | 6.163    | 6.722 | 6.512 | 6.393 | 9.068  | 9.801  |  |  |
| img_interp    | 1.433    | 1.520 | 1.752 | 2.353 | 2.815  | 3.708  |  |  |
| input_chip    | 0.486    | 0.548 | 0.520 | 0.665 | 1.015  | 1.426  |  |  |
| peak_chip     | 0.528    | 0.663 | 0.520 | 0.696 | 0.884  | 1.296  |  |  |
| scale125_chip | 1.491    | 1.647 | 1.694 | 2.455 | 3.454  | 4.693  |  |  |
| scale2_chip   | 0.588    | 0.645 | 0.764 | 0.869 | 1.342  | 2.022  |  |  |
| warping       | 0.612    | 0.764 | 0.873 | 0.931 | 1.408  | 2.256  |  |  |
| Geom. Avg.    | 0.965    | 1.160 | 1.437 | 1.893 | 2.424  | 3.407  |  |  |

Table B.10: Intra-Cluster Area ( $\times 10^6$ ) in Min. Width Trans. Area (Cluster Size = 10)

# Appendix C

## Inter-Cluster (Routing) Area

| Circuit       | LUT Size |        |        |        |        |        |  |  |  |
|---------------|----------|--------|--------|--------|--------|--------|--|--|--|
|               | 2        | 3      | 4      | 5      | 6      | 7      |  |  |  |
| alu4          | 3.776    | 2.861  | 2.383  | 2.001  | 1.631  | 1.520  |  |  |  |
| apex2         | 4.363    | 3.540  | 3.214  | 3.296  | 3.083  | 3.003  |  |  |  |
| apex4         | 3.239    | 2.637  | 2.374  | 2.518  | 2.206  | 2.053  |  |  |  |
| bigkey        | 3.498    | 3.157  | 1.825  | 1.462  | 0.785  | 0.562  |  |  |  |
| clma          | 19.275   | 16.364 | 16.036 | 14.546 | 14.351 | 14.880 |  |  |  |
| des           | 3.390    | 2.453  | 1.932  | 1.687  | 0.753  | 0.720  |  |  |  |
| diffeq        | 2.434    | 2.031  | 1.852  | 1.835  | 1.632  | 1.373  |  |  |  |
| dsip          | 2.758    | 1.859  | 1.798  | 0.981  | 0.827  | 1.975  |  |  |  |
| elliptic      | 6.366    | 5.826  | 5.883  | 4.884  | 4.630  | 5.201  |  |  |  |
| ex1010        | 10.909   | 9.479  | 7.481  | 8.351  | 7.686  | 4.866  |  |  |  |
| ex5p          | 2.635    | 2.510  | 2.110  | 1.848  | 1.445  | 0.907  |  |  |  |
| frisc         | 8.732    | 7.413  | 6.552  | 6.733  | 7.310  | 7.541  |  |  |  |
| misex3        | 3.759    | 2.881  | 2.323  | 2.238  | 2.157  | 1.771  |  |  |  |
| pdc           | 14.451   | 11.340 | 10.873 | 10.124 | 8.995  | 7.958  |  |  |  |
| s298          | 4.030    | 2.827  | 2.554  | 2.669  | 2.417  | 2.345  |  |  |  |
| s38417        | 12.703   | 10.408 | 7.725  | 7.967  | 4.948  | 5.173  |  |  |  |
| s38584        | 11.629   | 9.884  | 9.858  | 7.427  | 6.734  | 5.425  |  |  |  |
| seq           | 4.057    | 3.467  | 3.268  | 3.024  | 2.904  | 2.670  |  |  |  |
| spla          | 11.149   | 9.263  | 7.350  | 8.175  | 6.978  | 7.044  |  |  |  |
| tseng         | 1.784    | 1.198  | 1.306  | 1.143  | 1.203  | 1.658  |  |  |  |
| display_chip  | 5.161    | 3.140  | 1.936  | 1.895  | 1.820  | 1.553  |  |  |  |
| img_calc      | 32.413   | 20.326 | 13.090 | 9.133  | 8.597  | 6.927  |  |  |  |
| img_interp    | 7.091    | 4.678  | 3.580  | 3.191  | 2.712  | 2.422  |  |  |  |
| input_chip    | 2.009    | 1.289  | 0.889  | 0.714  | 0.763  | 0.733  |  |  |  |
| peak_chip     | 2.315    | 1.705  | 0.842  | 0.750  | 0.636  | 0.624  |  |  |  |
| scale125_chip | 6.432    | 4.425  | 3.458  | 3.330  | 3.322  | 3.046  |  |  |  |
| scale2_chip   | 2.426    | 1.661  | 1.479  | 0.932  | 1.143  | 1.023  |  |  |  |
| warping       | 2.522    | 1.964  | 1.469  | 1.187  | 1.048  | 1.116  |  |  |  |
| Geom. Avg.    | 5.124    | 3.935  | 3.219  | 2.860  | 2.506  | 2.372  |  |  |  |

Table C.1: Inter-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 1)
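The "Geom. Avg." row in each table summarizes a column across the 28 benchmark circuits. As a minimal sketch, assuming that row is the ordinary geometric mean of the per-circuit values, it could be computed as follows. The function name `geometric_mean` and the three-value sample are illustrative only and are not taken from the thesis CAD flow.

```python
import math


def geometric_mean(values):
    """Return the geometric mean of a sequence of positive numbers."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    # Average the logarithms and exponentiate, rather than multiplying
    # all values together, to avoid overflow or underflow on long columns.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Illustrative subset: the LUT-size-2 entries of Table C.1 for
# alu4, apex2 and apex4 (not the full 28-circuit column).
sample = [3.776, 4.363, 3.239]
print(round(geometric_mean(sample), 3))
```

Passing all 28 entries of a column in place of `sample` would give the value to compare against that column's "Geom. Avg." entry. Python 3.8 and later also provide `statistics.geometric_mean`, which serves the same purpose.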

| Circuit       | LUT Size |        |        |        |        |        |
|---------------|--------|--------|--------|--------|--------|--------|
|               | 2      | 3      | 4      | 5      | 6      | 7      |
| alu4          | 3.107  | 2.848  | 2.220  | 1.986  | 1.568  | 1.368  |
| apex2         | 3.242  | 3.369  | 3.213  | 2.971  | 2.599  | 2.470  |
| apex4         | 2.640  | 2.355  | 2.163  | 2.222  | 1.825  | 1.819  |
| bigkey        | 2.492  | 2.087  | 1.392  | 1.178  | 0.616  | 0.439  |
| clma          | 17.513 | 16.285 | 17.253 | 15.113 | 14.499 | 14.481 |
| des           | 2.394  | 2.089  | 1.677  | 1.616  | 0.685  | 0.580  |
| diffeq        | 2.091  | 1.730  | 1.794  | 1.617  | 1.219  | 1.155  |
| dsip          | 2.067  | 1.668  | 1.461  | 0.931  | 0.730  | 1.351  |
| elliptic      | 5.406  | 5.093  | 5.612  | 4.276  | 3.606  | 4.272  |
| ex1010        | 8.052  | 10.658 | 7.648  | 8.177  | 7.412  | 4.188  |
| ex5p          | 2.087  | 2.294  | 1.864  | 1.701  | 1.247  | 0.860  |
| frisc         | 6.837  | 6.061  | 6.651  | 6.572  | 6.246  | 6.194  |
| misex3        | 2.997  | 2.584  | 2.212  | 2.062  | 1.765  | 1.706  |
| pdc           | 12.133 | 10.923 | 10.919 | 9.665  | 9.151  | 7.989  |
| s298          | 3.488  | 3.110  | 2.544  | 2.254  | 1.861  | 2.008  |
| s38417        | 12.727 | 11.780 | 9.513  | 8.298  | 4.516  | 4.569  |
| s38584        | 12.079 | 10.227 | 8.470  | 7.510  | 6.751  | 5.616  |
| seq           | 3.362  | 3.164  | 3.066  | 2.800  | 2.560  | 2.337  |
| spla          | 9.609  | 9.400  | 7.846  | 7.744  | 7.206  | 5.940  |
| tseng         | 1.368  | 1.058  | 1.087  | 1.038  | 0.963  | 1.305  |
| display_chip  | 3.863  | 3.159  | 1.747  | 1.709  | 1.385  | 1.319  |
| img_calc      | 34.582 | 23.932 | 14.931 | 8.952  | 9.377  | 6.649  |
| img_interp    | 5.459  | 4.190  | 2.727  | 2.647  | 2.165  | 1.711  |
| input_chip    | 1.556  | 1.144  | 0.777  | 0.561  | 0.578  | 0.555  |
| peak_chip     | 1.901  | 1.522  | 0.655  | 0.616  | 0.538  | 0.528  |
| scale125_chip | 5.442  | 4.036  | 2.631  | 2.759  | 2.341  | 2.416  |
| scale2_chip   | 2.108  | 1.459  | 1.126  | 0.863  | 0.900  | 0.809  |
| warping       | 2.188  | 1.812  | 1.272  | 0.805  | 0.975  | 0.878  |
| Geom. Avg.    | 4.231  | 3.678  | 2.957  | 2.590  | 2.165  | 2.013  |

Table C.2: Inter-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 2)

| Circuit       | LUT Size |        |        |        |        |        |  |  |
|---------------|----------|--------|--------|--------|--------|--------|--|--|
|               | 2        | 3      | 4      | 5      | 6      | 7      |  |  |
| alu4          | 2.777    | 2.202  | 1.962  | 1.625  | 1.293  | 1.109  |  |  |
| apex2         | 3.340    | 2.885  | 2.577  | 2.557  | 2.281  | 2.172  |  |  |
| apex4         | 2.446    | 2.071  | 1.800  | 1.915  | 1.660  | 1.603  |  |  |
| bigkey        | 2.175    | 1.967  | 1.529  | 1.258  | 0.577  | 0.482  |  |  |
| clma          | 16.754   | 14.602 | 12.727 | 12.321 | 12.079 | 11.972 |  |  |
| des           | 2.269    | 1.845  | 1.567  | 1.403  | 0.521  | 0.531  |  |  |
| diffeq        | 1.893    | 1.526  | 1.565  | 1.357  | 1.056  | 1.037  |  |  |
| dsip          | 1.753    | 1.525  | 1.396  | 0.811  | 0.587  | 1.571  |  |  |
| elliptic      | 5.024    | 4.502  | 4.479  | 3.782  | 3.234  | 4.036  |  |  |
| ex1010        | 8.452    | 9.654  | 6.623  | 6.637  | 6.051  | 3.666  |  |  |
| ex5p          | 1.951    | 1.922  | 1.619  | 1.471  | 1.115  | 0.706  |  |  |
| frisc         | 6.106    | 5.657  | 5.332  | 5.460  | 5.656  | 5.994  |  |  |
| misex3        | 2.597    | 2.253  | 1.923  | 1.738  | 1.572  | 1.420  |  |  |
| pdc           | 11.856   | 9.325  | 9.070  | 7.933  | 6.634  | 6.206  |  |  |
| s298          | 2.892    | 2.388  | 1.964  | 1.827  | 1.607  | 1.630  |  |  |
| s38417        | 12.187   | 9.595  | 7.244  | 6.600  | 3.989  | 4.295  |  |  |
| s38584        | 10.463   | 9.517  | 7.278  | 6.563  | 5.678  | 5.004  |  |  |
| seq           | 3.301    | 2.703  | 2.392  | 2.203  | 2.164  | 1.937  |  |  |
| spla          | 9.425    | 8.276  | 6.779  | 6.312  | 5.306  | 4.794  |  |  |
| tseng         | 1.191    | 0.975  | 1.009  | 0.840  | 0.863  | 1.130  |  |  |
| display_chip  | 3.444    | 2.536  | 1.554  | 1.544  | 1.244  | 1.209  |  |  |
| img_calc      | 32.381   | 21.932 | 13.335 | 8.869  | 7.953  | 5.785  |  |  |
| img_interp    | 4.690    | 3.413  | 2.393  | 2.174  | 1.845  | 1.625  |  |  |
| input_chip    | 1.389    | 1.007  | 0.616  | 0.536  | 0.547  | 0.541  |  |  |
| peak_chip     | 1.691    | 1.233  | 0.589  | 0.531  | 0.464  | 0.422  |  |  |
| scale125_chip | 4.798    | 3.421  | 2.242  | 2.353  | 2.389  | 2.107  |  |  |
| scale2_chip   | 1.657    | 1.205  | 0.911  | 0.693  | 0.716  | 0.741  |  |  |
| warping       | 1.941    | 1.413  | 1.140  | 0.748  | 0.761  | 0.926  |  |  |
| Geom. Avg.    | 3.852    | 3.168  | 2.529  | 2.221  | 1.851  | 1.803  |  |  |

Table C.3: Inter-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 3)

| Circuit       | LUT Size |        |        |        |        |        |  |  |
|---------------|----------|--------|--------|--------|--------|--------|--|--|
|               | 2        | 3      | 4      | 5      | 6      | 7      |  |  |
| alu4          | 2.430    | 2.011  | 1.682  | 1.355  | 1.163  | 0.927  |  |  |
| apex2         | 3.052    | 2.825  | 2.413  | 2.410  | 2.004  | 2.022  |  |  |
| apex4         | 2.180    | 1.949  | 1.727  | 1.717  | 1.449  | 1.506  |  |  |
| bigkey        | 1.803    | 1.764  | 1.235  | 0.967  | 0.485  | 0.336  |  |  |
| clma          | 15.071   | 13.692 | 12.071 | 11.206 | 11.522 | 11.246 |  |  |
| des           | 1.748    | 1.391  | 1.316  | 1.155  | 0.424  | 0.413  |  |  |
| diffeq        | 1.684    | 1.454  | 1.372  | 1.231  | 0.983  | 0.869  |  |  |
| dsip          | 1.362    | 1.010  | 0.992  | 0.625  | 0.515  | 1.342  |  |  |
| elliptic      | 4.528    | 4.128  | 4.616  | 3.451  | 5.084  | 3.671  |  |  |
| ex1010        | 8.149    | 8.879  | 6.192  | 6.091  | 5.247  | 3.149  |  |  |
| ex5p          | 1.749    | 1.740  | 1.517  | 1.234  | 0.966  | 0.666  |  |  |
| frisc         | 5.689    | 5.263  | 4.735  | 5.015  | 5.019  | 5.504  |  |  |
| misex3        | 2.484    | 2.196  | 1.718  | 1.543  | 1.459  | 1.265  |  |  |
| pdc           | 10.194   | 8.364  | 8.638  | 7.260  | 6.136  | 5.370  |  |  |
| s298          | 2.541    | 2.193  | 1.660  | 1.614  | 1.428  | 1.481  |  |  |
| s38417        | 12.017   | 9.344  | 7.108  | 5.789  | 3.616  | 3.788  |  |  |
| s38584        | 10.245   | 7.931  | 6.507  | 5.751  | 5.414  | 4.461  |  |  |
| seq           | 2.888    | 2.648  | 2.241  | 2.049  | 1.920  | 1.749  |  |  |
| spla          | 8.378    | 7.526  | 6.281  | 5.798  | 5.123  | 4.434  |  |  |
| tseng         | 1.094    | 0.809  | 0.936  | 0.713  | 0.721  | 1.099  |  |  |
| display_chip  | 3.064    | 2.222  | 1.439  | 1.504  | 1.075  | 1.087  |  |  |
| img_calc      | 30.028   | 20.307 | 11.853 | 8.266  | 7.442  | 5.561  |  |  |
| img_interp    | 4.478    | 3.090  | 2.195  | 1.928  | 1.852  | 1.559  |  |  |
| input_chip    | 1.275    | 0.930  | 0.580  | 0.484  | 0.448  | 0.466  |  |  |
| peak_chip     | 1.379    | 1.157  | 0.499  | 0.452  | 0.421  | 0.447  |  |  |
| scale125_chip | 4.440    | 3.085  | 2.062  | 2.003  | 1.949  | 1.820  |  |  |
| scale2_chip   | 1.518    | 1.045  | 0.803  | 0.623  | 0.613  | 0.682  |  |  |
| warping       | 1.713    | 1.271  | 0.981  | 0.645  | 0.668  | 0.678  |  |  |
| Geom. Avg.    | 3.449    | 2.850  | 2.277  | 1.959  | 1.681  | 1.600  |  |  |

Table C.4: Inter-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 4)

| Circuit       | LUT Size |        |        |        |        |        |  |  |
|---------------|----------|--------|--------|--------|--------|--------|--|--|
|               | 2        | 3      | 4      | 5      | 6      | 7      |  |  |
| alu4          | 2.330    | 1.850  | 1.584  | 1.311  | 1.070  | 1.076  |  |  |
| apex2         | 2.909    | 2.581  | 2.434  | 2.211  | 1.957  | 2.026  |  |  |
| apex4         | 2.113    | 1.831  | 1.643  | 1.704  | 1.407  | 1.405  |  |  |
| bigkey        | 1.833    | 1.545  | 1.169  | 0.946  | 0.500  | 0.401  |  |  |
| clma          | 15.146   | 13.099 | 11.719 | 10.974 | 10.616 | 12.214 |  |  |
| des           | 1.781    | 1.439  | 1.244  | 1.151  | 0.433  | 0.404  |  |  |
| diffeq        | 1.597    | 1.441  | 1.381  | 1.252  | 0.949  | 1.024  |  |  |
| dsip          | 1.497    | 1.041  | 1.014  | 0.675  | 0.519  | 1.352  |  |  |
| elliptic      | 4.855    | 3.944  | 4.445  | 3.612  | 2.953  | 3.746  |  |  |
| ex1010        | 7.783    | 8.390  | 5.788  | 5.759  | 5.162  | 3.347  |  |  |
| ex5p          | 1.672    | 1.692  | 1.431  | 1.300  | 0.927  | 0.684  |  |  |
| frisc         | 5.146    | 4.760  | 4.535  | 4.527  | 5.025  | 5.648  |  |  |
| misex3        | 2.390    | 1.945  | 1.650  | 1.527  | 1.318  | 1.220  |  |  |
| pdc           | 9.892    | 7.968  | 7.885  | 6.707  | 6.191  | 5.349  |  |  |
| s298          | 2.131    | 1.795  | 1.558  | 1.565  | 1.306  | 1.456  |  |  |
| s38417        | 11.268   | 8.745  | 6.198  | 5.881  | 3.563  | 3.980  |  |  |
| s38584        | 9.783    | 7.898  | 6.635  | 5.647  | 4.937  | 4.535  |  |  |
| seq           | 2.642    | 2.388  | 2.160  | 2.020  | 1.739  | 1.812  |  |  |
| spla          | 8.184    | 6.821  | 5.444  | 5.127  | 5.058  | 4.094  |  |  |
| tseng         | 1.049    | 0.808  | 0.931  | 0.699  | 0.752  | 1.112  |  |  |
| display_chip  | 2.670    | 1.927  | 1.288  | 1.501  | 1.047  | 1.100  |  |  |
| img_calc      | 26.536   | 18.446 | 10.753 | 7.939  | 7.436  | 5.378  |  |  |
| img_interp    | 4.229    | 3.000  | 1.999  | 1.726  | 1.783  | 1.470  |  |  |
| input_chip    | 1.229    | 0.855  | 0.544  | 0.444  | 0.459  | 0.476  |  |  |
| peak_chip     | 1.221    | 1.078  | 0.511  | 0.468  | 0.410  | 0.466  |  |  |
| scale125_chip | 3.827    | 2.887  | 1.939  | 1.805  | 1.956  | 1.934  |  |  |
| scale2_chip   | 1.385    | 0.995  | 0.747  | 0.567  | 0.666  | 0.649  |  |  |
| warping       | 1.613    | 1.299  | 0.962  | 0.668  | 0.724  | 0.827  |  |  |
| Geom. Avg.    | 3.275    | 2.677  | 2.165  | 1.905  | 1.619  | 1.646  |  |  |

Table C.5: Inter-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 5)

| Circuit       | LUT Size |        |        |        |        |        |
|---------------|--------|--------|--------|--------|--------|--------|
|               | 2      | 3      | 4      | 5      | 6      | 7      |
| alu4          | 2.192  | 1.754  | 1.358  | 1.242  | 0.991  | 0.883  |
| apex2         | 2.769  | 2.590  | 2.323  | 2.217  | 1.972  | 1.905  |
| apex4         | 2.018  | 1.893  | 1.748  | 1.485  | 1.406  | 1.367  |
| bigkey        | 1.519  | 1.192  | 1.051  | 0.806  | 0.431  | 0.347  |
| clma          | 13.824 | 12.416 | 11.364 | 10.636 | 10.393 | 11.510 |
| des           | 1.628  | 1.393  | 1.113  | 0.985  | 0.387  | 0.405  |
| diffeq        | 1.529  | 1.240  | 1.273  | 1.131  | 0.887  | 0.949  |
| dsip          | 1.439  | 1.059  | 0.755  | 0.541  | 0.461  | 1.128  |
| elliptic      | 4.234  | 3.902  | 4.246  | 3.429  | 2.825  | 3.592  |
| ex1010        | 7.593  | 8.516  | 5.716  | 5.719  | 5.448  | 3.168  |
| ex5p          | 1.727  | 1.708  | 1.396  | 1.201  | 0.920  | 0.699  |
| frisc         | 5.277  | 4.759  | 4.344  | 4.394  | 4.599  | 5.453  |
| misex3        | 2.199  | 1.936  | 1.499  | 1.420  | 1.310  | 1.204  |
| pdc           | 9.327  | 7.793  | 7.990  | 6.805  | 5.823  | 5.061  |
| s298          | 1.906  | 1.736  | 1.508  | 1.530  | 1.226  | 1.314  |
| s38417        | 11.027 | 8.161  | 6.145  | 5.366  | 3.514  | 3.825  |
| s38584        | 9.774  | 7.476  | 6.374  | 5.571  | 4.666  | 4.345  |
| seq           | 2.640  | 2.321  | 1.967  | 1.909  | 1.704  | 1.760  |
| spla          | 7.786  | 6.832  | 5.240  | 5.115  | 4.492  | 3.885  |
| tseng         | 0.979  | 0.813  | 0.907  | 0.690  | 0.738  | 1.045  |
| display_chip  | 2.434  | 1.986  | 1.396  | 1.470  | 1.103  | 1.092  |
| img_calc      | 27.281 | 16.814 | 10.557 | 7.508  | 6.761  | 5.367  |
| img_interp    | 3.792  | 2.739  | 2.064  | 1.804  | 1.583  | 1.460  |
| input_chip    | 1.036  | 0.842  | 0.538  | 0.429  | 0.445  | 0.466  |
| peak_chip     | 1.161  | 0.997  | 0.470  | 0.449  | 0.401  | 0.420  |
| scale125_chip | 3.705  | 2.639  | 1.872  | 1.840  | 1.709  | 1.813  |
| scale2_chip   | 1.282  | 0.979  | 0.759  | 0.571  | 0.634  | 0.634  |
| warping       | 1.660  | 1.171  | 0.853  | 0.620  | 0.645  | 0.736  |
| Geom. Avg.    | 3.095  | 2.570  | 2.063  | 1.809  | 1.535  | 1.555  |

Table C.6: Inter-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 6)

| Circuit       | LUT Size |        |        |        |        |        |  |  |
|---------------|----------|--------|--------|--------|--------|--------|--|--|
|               | 2        | 3      | 4      | 5      | 6      | 7      |  |  |
| alu4          | 2.230    | 1.713  | 1.416  | 1.164  | 1.004  | 0.827  |  |  |
| apex2         | 2.709    | 2.359  | 2.231  | 2.065  | 1.954  | 1.854  |  |  |
| apex4         | 2.073    | 1.773  | 1.637  | 1.437  | 1.565  | 1.313  |  |  |
| bigkey        | 1.474    | 1.457  | 1.259  | 0.867  | 0.501  | 0.389  |  |  |
| clma          | 12.948   | 12.610 | 10.848 | 10.391 | 10.806 | 10.997 |  |  |
| des           | 1.790    | 1.451  | 1.107  | 1.031  | 0.427  | 0.403  |  |  |
| diffeq        | 1.405    | 1.235  | 1.153  | 1.102  | 0.924  | 0.927  |  |  |
| dsip          | 1.352    | 1.247  | 0.805  | 0.603  | 0.501  | 1.173  |  |  |
| elliptic      | 4.271    | 3.718  | 4.223  | 3.256  | 2.952  | 3.661  |  |  |
| ex1010        | 7.334    | 8.057  | 5.741  | 5.403  | 4.991  | 3.081  |  |  |
| ex5p          | 1.620    | 1.617  | 1.328  | 1.228  | 0.912  | 0.665  |  |  |
| frisc         | 5.165    | 4.376  | 4.327  | 4.234  | 4.902  | 5.464  |  |  |
| misex3        | 2.219    | 1.806  | 1.522  | 1.369  | 1.276  | 1.112  |  |  |
| pdc           | 8.852    | 7.441  | 7.194  | 6.505  | 5.682  | 4.819  |  |  |
| s298          | 1.988    | 1.598  | 1.305  | 1.350  | 1.288  | 1.250  |  |  |
| s38417        | 10.769   | 8.088  | 5.785  | 5.132  | 3.597  | 3.804  |  |  |
| s38584        | 9.021    | 7.226  | 5.906  | 5.584  | 4.885  | 4.260  |  |  |
| seq           | 2.588    | 2.335  | 2.014  | 1.810  | 1.750  | 1.669  |  |  |
| spla          | 7.334    | 6.586  | 5.060  | 4.588  | 4.624  | 3.757  |  |  |
| tseng         | 0.941    | 0.828  | 0.854  | 0.661  | 0.710  | 1.150  |  |  |
| display_chip  | 2.286    | 1.684  | 1.287  | 1.454  | 1.050  | 1.010  |  |  |
| img_calc      | 24.923   | 16.304 | 10.178 | 7.572  | 6.975  | 5.438  |  |  |
| img_interp    | 3.830    | 2.319  | 2.020  | 1.816  | 1.710  | 1.424  |  |  |
| input_chip    | 1.134    | 0.790  | 0.529  | 0.406  | 0.457  | 0.511  |  |  |
| peak_chip     | 1.068    | 1.016  | 0.481  | 0.428  | 0.435  | 0.403  |  |  |
| scale125_chip | 3.351    | 2.560  | 1.681  | 1.818  | 1.838  | 1.760  |  |  |
| scale2_chip   | 1.305    | 0.955  | 0.758  | 0.659  | 0.722  | 0.683  |  |  |
| warping       | 1.522    | 1.146  | 0.904  | 0.636  | 0.731  | 0.808  |  |  |
| Geom. Avg.    | 3.011    | 2.502  | 2.015  | 1.777  | 1.597  | 1.543  |  |  |

Table C.7: Inter-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 7)

| Circuit       | LUT Size |        |        |        |        |        |  |  |
|---------------|----------|--------|--------|--------|--------|--------|--|--|
|               | 2        | 3      | 4      | 5      | 6      | 7      |  |  |
| alu4          | 2.105    | 1.651  | 1.401  | 1.154  | 0.850  | 0.787  |  |  |
| apex2         | 2.585    | 2.313  | 2.206  | 2.024  | 1.725  | 1.787  |  |  |
| apex4         | 2.008    | 1.780  | 1.545  | 1.407  | 1.395  | 1.348  |  |  |
| bigkey        | 1.388    | 1.395  | 1.094  | 0.786  | 0.459  | 0.337  |  |  |
| clma          | 12.537   | 12.122 | 10.842 | 10.343 | 10.191 | 10.808 |  |  |
| des           | 1.610    | 1.277  | 1.118  | 0.992  | 0.390  | 0.368  |  |  |
| diffeq        | 1.427    | 1.234  | 1.187  | 1.064  | 0.849  | 0.917  |  |  |
| dsip          | 1.202    | 0.962  | 0.790  | 0.528  | 0.466  | 1.067  |  |  |
| elliptic      | 4.374    | 3.714  | 3.854  | 3.191  | 2.781  | 3.631  |  |  |
| ex1010        | 7.050    | 8.203  | 5.796  | 5.343  | 4.841  | 2.877  |  |  |
| ex5p          | 1.553    | 1.638  | 1.299  | 1.363  | 0.895  | 0.719  |  |  |
| frisc         | 4.729    | 4.560  | 3.980  | 4.280  | 4.753  | 5.535  |  |  |
| misex3        | 2.203    | 1.774  | 1.479  | 1.283  | 1.206  | 1.047  |  |  |
| pdc           | 8.291    | 7.269  | 7.059  | 6.277  | 5.453  | 4.444  |  |  |
| s298          | 1.845    | 1.507  | 1.353  | 1.293  | 1.222  | 1.368  |  |  |
| s38417        | 10.215   | 7.151  | 5.687  | 4.812  | 3.342  | 3.549  |  |  |
| s38584        | 8.645    | 6.901  | 5.715  | 5.214  | 4.753  | 4.400  |  |  |
| seq           | 2.459    | 2.172  | 1.966  | 1.829  | 1.638  | 1.600  |  |  |
| spla          | 7.213    | 6.566  | 4.776  | 4.676  | 4.316  | 3.613  |  |  |
| tseng         | 0.845    | 0.762  | 0.865  | 0.673  | 0.712  | 1.055  |  |  |
| display_chip  | 2.239    | 1.781  | 1.216  | 1.388  | 1.071  | 1.047  |  |  |
| img_calc      | 22.295   | 15.376 | 9.681  | 7.531  | 6.767  | 5.236  |  |  |
| img_interp    | 3.470    | 2.427  | 1.937  | 1.701  | 1.494  | 1.357  |  |  |
| input_chip    | 0.991    | 0.766  | 0.489  | 0.433  | 0.457  | 0.475  |  |  |
| peak_chip     | 1.068    | 0.997  | 0.468  | 0.430  | 0.415  | 0.430  |  |  |
| scale125_chip | 3.368    | 2.451  | 1.555  | 1.651  | 1.758  | 1.637  |  |  |
| scale2_chip   | 1.215    | 0.886  | 0.675  | 0.538  | 0.608  | 0.633  |  |  |
| warping       | 1.427    | 1.134  | 0.819  | 0.601  | 0.653  | 0.743  |  |  |
| Geom. Avg.    | 2.854    | 2.413  | 1.940  | 1.720  | 1.495  | 1.492  |  |  |

Table C.8: Inter-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 8)

| Circuit       | LUT Size |        |        |        |       |        |  |  |  |
|---------------|--------|----------|--------|--------|-------|--------|--|--|--|
|               | 2      | 3        | 4      | 5      | 6     | 7      |  |  |  |
| alu4          | 2.028  | 1.584    | 1.221  | 1.025  | 0.877 | 0.782  |  |  |  |
| apex2         | 2.545  | 2.167    | 2.132  | 2.010  | 1.741 | 1.725  |  |  |  |
| apex4         | 1.841  | 1.680    | 1.467  | 1.494  | 1.304 | 1.334  |  |  |  |
| bigkey        | 1.558  | 1.165    | 0.958  | 0.771  | 0.429 | 0.330  |  |  |  |
| clma          | 12.295 | 11.074   | 10.397 | 10.174 | 9.971 | 10.372 |  |  |  |
| des           | 1.551  | 1.196    | 1.029  | 0.886  | 0.374 | 0.365  |  |  |  |
| diffeq        | 1.478  | 1.106    | 1.097  | 1.025  | 0.884 | 0.846  |  |  |  |
| dsip          | 1.281  | 0.854    | 0.699  | 0.531  | 0.429 | 1.032  |  |  |  |
| elliptic      | 4.180  | 3.551    | 3.717  | 3.256  | 2.712 | 3.419  |  |  |  |
| ex1010        | 6.633  | 7.764    | 5.825  | 5.531  | 4.874 | 3.019  |  |  |  |
| ex5p          | 1.582  | 1.576    | 1.300  | 1.191  | 0.910 | 0.616  |  |  |  |
| frisc         | 4.766  | 4.404    | 3.914  | 4.210  | 4.410 | 5.154  |  |  |  |
| misex3        | 2.103  | 1.651    | 1.424  | 1.388  | 1.180 | 1.077  |  |  |  |
| pdc           | 8.577  | 6.852    | 6.761  | 6.367  | 5.087 | 4.381  |  |  |  |
| s298          | 1.824  | 1.436    | 1.248  | 1.313  | 1.244 | 1.333  |  |  |  |
| s38417        | 9.599  | 7.032    | 5.490  | 5.026  | 3.267 | 3.391  |  |  |  |
| s38584        | 8.271  | 7.052    | 5.833  | 5.420  | 4.824 | 4.425  |  |  |  |
| seq           | 2.367  | 2.209    | 1.828  | 1.761  | 1.552 | 1.579  |  |  |  |
| spla          | 6.923  | 5.985    | 4.533  | 4.777  | 4.107 | 3.464  |  |  |  |
| tseng         | 0.906  | 0.754    | 0.937  | 0.693  | 0.702 | 1.108  |  |  |  |
| display_chip  | 2.304  | 1.716    | 1.161  | 1.485  | 0.968 | 0.998  |  |  |  |
| img_calc      | 21.870 | 15.219   | 9.297  | 7.690  | 6.632 | 5.235  |  |  |  |
| img_interp    | 3.242  | 2.252    | 1.796  | 1.813  | 1.523 | 1.380  |  |  |  |
| input_chip    | 0.931  | 0.709    | 0.454  | 0.456  | 0.436 | 0.467  |  |  |  |
| peak_chip     | 1.035  | 0.916    | 0.454  | 0.458  | 0.362 | 0.404  |  |  |  |
| scale125_chip | 3.236  | 2.357    | 1.477  | 1.819  | 1.702 | 1.730  |  |  |  |
| scale2_chip   | 1.196  | 0.840    | 0.710  | 0.586  | 0.630 | 0.618  |  |  |  |
| warping       | 1.369  | 1.020    | 0.811  | 0.602  | 0.627 | 0.695  |  |  |  |
| Geom. Avg.    | 2.809  | 2.276    | 1.857  | 1.739  | 1.453 | 1.456  |  |  |  |

Table C.9: Inter-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 9)

| Circuit       | LUT Size |        |        |        |        |        |  |  |  |
|---------------|----------|--------|--------|--------|--------|--------|--|--|--|
|               | 2        | 3      | 4      | 5      | 6      | 7      |  |  |  |
| alu4          | 2.059    | 1.562  | 1.345  | 1.093  | 0.894  | 0.795  |  |  |  |
| apex2         | 2.628    | 2.259  | 2.019  | 2.135  | 1.743  | 1.783  |  |  |  |
| apex4         | 1.902    | 1.785  | 1.637  | 1.526  | 1.407  | 1.291  |  |  |  |
| bigkey        | 1.630    | 1.215  | 1.014  | 0.859  | 0.474  | 0.351  |  |  |  |
| clma          | 12.207   | 11.094 | 10.442 | 10.524 | 10.296 | 10.607 |  |  |  |
| des           | 1.564    | 1.267  | 1.039  | 0.951  | 0.397  | 0.400  |  |  |  |
| diffeq        | 1.369    | 1.190  | 1.106  | 0.995  | 0.844  | 0.879  |  |  |  |
| dsip          | 1.400    | 0.929  | 0.742  | 0.563  | 0.467  | 1.038  |  |  |  |
| elliptic      | 4.130    | 3.547  | 3.880  | 3.144  | 2.913  | 3.380  |  |  |  |
| ex1010        | 6.926    | 8.022  | 5.718  | 5.266  | 4.754  | 3.141  |  |  |  |
| ex5p          | 1.526    | 1.564  | 1.334  | 1.259  | 0.910  | 0.711  |  |  |  |
| frisc         | 4.664    | 4.372  | 3.987  | 4.256  | 4.603  | 5.103  |  |  |  |
| misex3        | 1.933    | 1.772  | 1.496  | 1.276  | 1.170  | 0.962  |  |  |  |
| pdc           | 8.018    | 6.629  | 6.794  | 6.195  | 5.460  | 4.411  |  |  |  |
| s298          | 1.639    | 1.393  | 1.302  | 1.272  | 1.176  | 1.372  |  |  |  |
| s38417        | 9.496    | 7.097  | 5.309  | 4.948  | 3.443  | 3.794  |  |  |  |
| s38584        | 7.808    | 6.906  | 5.990  | 5.481  | 4.925  | 4.561  |  |  |  |
| seq           | 2.378    | 2.164  | 1.831  | 1.890  | 1.672  | 1.584  |  |  |  |
| spla          | 6.658    | 6.267  | 4.619  | 4.451  | 4.121  | 3.488  |  |  |  |
| tseng         | 0.869    | 0.831  | 0.937  | 0.676  | 0.694  | 1.139  |  |  |  |
| display_chip  | 2.056    | 1.635  | 1.208  | 1.447  | 1.062  | 1.059  |  |  |  |
| img_calc      | 20.291   | 15.516 | 9.177  | 7.427  | 6.965  | 5.124  |  |  |  |
| img_interp    | 3.324    | 2.124  | 1.967  | 1.694  | 1.610  | 1.595  |  |  |  |
| input_chip    | 0.920    | 0.669  | 0.452  | 0.407  | 0.444  | 0.434  |  |  |  |
| peak_chip     | 1.061    | 0.899  | 0.498  | 0.426  | 0.434  | 0.394  |  |  |  |
| scale125_chip | 3.046    | 2.283  | 1.580  | 1.666  | 1.858  | 1.662  |  |  |  |
| scale2_chip   | 1.169    | 0.857  | 0.755  | 0.531  | 0.606  | 0.665  |  |  |  |
| warping       | 1.359    | 1.044  | 0.862  | 0.679  | 0.692  | 0.770  |  |  |  |
| Geom. Avg.    | 2.756    | 2.303  | 1.915  | 1.728  | 1.512  | 1.494  |  |  |  |

Table C.10: Inter-Cluster Area ($\times 10^6$) in Min. Width Trans. Area (Cluster Size = 10)

# APPENDIX D

## FPGA Channel Width

| Circuit       | LUT Size |      |      |      |      |      |  |  |
|---------------|----------|------|------|------|------|------|--|--|
|               | 2        | 3    | 4    | 5    | 6    | 7    |  |  |
| alu4          | 19       | 19   | 19   | 17   | 15   | 16   |  |  |
| apex2         | 19       | 20   | 21   | 23   | 23   | 23   |  |  |
| apex4         | 20       | 21   | 23   | 25   | 25   | 25   |  |  |
| bigkey        | 16       | 17   | 13   | 12   | 12   | 12   |  |  |
| clma          | 19       | 21   | 24   | 25   | 26   | 29   |  |  |
| des           | 16       | 15   | 15   | 15   | 15   | 15   |  |  |
| diffeq        | 13       | 15   | 15   | 17   | 20   | 17   |  |  |
| dsip          | 15       | 13   | 16   | 12   | 13   | 15   |  |  |
| elliptic      | 16       | 19   | 20   | 21   | 24   | 24   |  |  |
| ex1010        | 19       | 21   | 20   | 23   | 28   | 28   |  |  |
| ex5p          | 20       | 24   | 24   | 24   | 21   | 16   |  |  |
| frisc         | 20       | 23   | 23   | 26   | 28   | 32   |  |  |
| misex3        | 20       | 20   | 20   | 21   | 20   | 17   |  |  |
| pdc           | 24       | 26   | 30   | 30   | 28   | 28   |  |  |
| s298          | 13       | 13   | 16   | 19   | 20   | 21   |  |  |
| s38417        | 13       | 15   | 15   | 17   | 16   | 17   |  |  |
| s38584        | 13       | 15   | 19   | 16   | 17   | 16   |  |  |
| seq           | 19       | 21   | 23   | 23   | 24   | 23   |  |  |
| spla          | 21       | 24   | 25   | 28   | 26   | 28   |  |  |
| tseng         | 13       | 13   | 15   | 15   | 16   | 19   |  |  |
| display_chip  | 15       | 13   | 13   | 13   | 16   | 15   |  |  |
| img_calc      | 16       | 15   | 16   | 17   | 17   | 19   |  |  |
| img_interp    | 15       | 15   | 16   | 16   | 17   | 17   |  |  |
| input_chip    | 12       | 11   | 13   | 12   | 13   | 13   |  |  |
| peak_chip     | 13       | 12   | 12   | 12   | 12   | 12   |  |  |
| scale125_chip | 13       | 13   | 16   | 16   | 17   | 17   |  |  |
| scale2_chip   | 12       | 12   | 15   | 12   | 15   | 13   |  |  |
| warping       | 12       | 12   | 13   | 15   | 13   | 13   |  |  |
| Geom. Avg.    | 15.9     | 16.5 | 17.7 | 17.9 | 18.5 | 18.5 |  |  |

Table D.1: Channel Width (Cluster Size = 1)

| Circuit       | LUT Size |      |      |      |      |      |  |
|---------------|----------|------|------|------|------|------|--|
|               | 2        | 3    | 4    | 5    | 6    | 7    |  |
| alu4          | 28       | 29   | 30   | 29   | 24   | 24   |  |
| apex2         | 25       | 29   | 36   | 34   | 33   | 32   |  |
| apex4         | 29       | 29   | 34   | 34   | 34   | 36   |  |
| bigkey        | 20       | 19   | 17   | 17   | 16   | 16   |  |
| clma          | 32       | 34   | 46   | 45   | 46   | 49   |  |
| des           | 20       | 20   | 21   | 25   | 23   | 20   |  |
| diffeq        | 20       | 21   | 25   | 26   | 25   | 24   |  |
| dsip          | 20       | 20   | 19   | 20   | 19   | 17   |  |
| elliptic      | 25       | 30   | 34   | 32   | 32   | 34   |  |
| ex1010        | 25       | 33   | 36   | 36   | 46   | 39   |  |
| ex5p          | 28       | 32   | 34   | 36   | 30   | 25   |  |
| frisc         | 29       | 32   | 41   | 45   | 41   | 45   |  |
| misex3        | 29       | 28   | 33   | 32   | 28   | 28   |  |
| pdc           | 37       | 39   | 51   | 46   | 47   | 47   |  |
| s298          | 20       | 24   | 28   | 28   | 26   | 30   |  |
| s38417        | 24       | 30   | 33   | 32   | 25   | 26   |  |
| s38584        | 25       | 28   | 29   | 29   | 30   | 29   |  |
| seq           | 28       | 29   | 37   | 36   | 36   | 34   |  |
| spla          | 33       | 38   | 45   | 42   | 45   | 39   |  |
| tseng         | 17       | 19   | 21   | 23   | 21   | 25   |  |
| display_chip  | 20       | 23   | 20   | 20   | 20   | 21   |  |
| img_calc      | 32       | 30   | 33   | 30   | 33   | 32   |  |
| img_interp    | 21       | 23   | 21   | 23   | 23   | 20   |  |
| input_chip    | 17       | 17   | 19   | 16   | 16   | 16   |  |
| peak_chip     | 19       | 19   | 16   | 17   | 17   | 17   |  |
| scale125_chip | 20       | 21   | 21   | 23   | 20   | 23   |  |
| scale2_chip   | 19       | 19   | 19   | 19   | 19   | 17   |  |
| warping       | 19       | 20   | 19   | 17   | 20   | 17   |  |
| Geom. Avg.    | 23.8     | 25.5 | 27.7 | 27.6 | 26.9 | 26.4 |  |

Table D.2: Channel Width (Cluster Size = 2)

| Circuit       | LUT Size |      |      |      |      |      |
|---------------|----------|------|------|------|------|------|
|               | 2        | 3    | 4    | 5    | 6    | 7    |
| alu4          | 34       | 34   | 36   | 32   | 26   | 25   |
| apex2         | 36       | 39   | 39   | 41   | 39   | 37   |
| apex4         | 37       | 38   | 39   | 42   | 42   | 43   |
| bigkey        | 24       | 25   | 25   | 24   | 20   | 23   |
| clma          | 43       | 46   | 46   | 51   | 52   | 54   |
| des           | 26       | 26   | 28   | 29   | 23   | 24   |
| diffeq        | 25       | 26   | 29   | 29   | 29   | 28   |
| dsip          | 23       | 25   | 28   | 23   | 21   | 26   |
| elliptic      | 32       | 36   | 37   | 39   | 39   | 43   |
| ex1010        | 37       | 49   | 42   | 43   | 51   | 46   |
| ex5p          | 36       | 42   | 41   | 43   | 36   | 26   |
| frisc         | 36       | 43   | 45   | 50   | 51   | 58   |
| misex3        | 34       | 37   | 38   | 37   | 33   | 30   |
| pdc           | 50       | 52   | 59   | 55   | 47   | 49   |
| s298          | 23       | 26   | 29   | 30   | 30   | 32   |
| s38417        | 32       | 34   | 34   | 34   | 30   | 33   |
| s38584        | 30       | 36   | 34   | 34   | 34   | 34   |
| seq           | 38       | 38   | 39   | 38   | 41   | 37   |
| spla          | 45       | 52   | 54   | 50   | 45   | 43   |
| tseng         | 21       | 24   | 26   | 24   | 25   | 28   |
| display_chip  | 25       | 25   | 24   | 24   | 24   | 25   |
| img_calc      | 42       | 41   | 41   | 41   | 38   | 37   |
| img_interp    | 25       | 26   | 25   | 25   | 26   | 25   |
| input_chip    | 21       | 20   | 20   | 20   | 20   | 20   |
| peak_chip     | 23       | 21   | 19   | 19   | 19   | 17   |
| scale125_chip | 24       | 24   | 24   | 26   | 28   | 26   |
| scale2_chip   | 21       | 21   | 21   | 20   | 20   | 20   |
| warping       | 23       | 21   | 23   | 21   | 21   | 23   |
| Geom. Avg.    | 29.9     | 31.7 | 32.3 | 32.1 | 30.9 | 31.0 |

Table D.3: Channel Width (Cluster Size = 3)
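The "Geom. Avg." row in each table summarizes a column by its geometric mean over the 28 benchmark circuits. A minimal sketch of that calculation, assuming the standard definition (the n-th root of the product of n values); the function name `geometric_mean` and the three-value sample are illustrative and not part of the thesis CAD flow:

```python
import math


def geometric_mean(values):
    """Return the geometric mean of a list of positive numbers."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    # Average the logarithms, then exponentiate. This is equivalent to
    # taking the n-th root of the product but avoids a very large product.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Illustrative subset only: the alu4, apex2 and apex4 entries of the
# LUT size 2 column in Table D.3. A full column would list all 28 circuits.
sample = [34, 36, 37]
print(round(geometric_mean(sample), 1))
```

The geometric mean is less sensitive than the arithmetic mean to a few circuits with unusually large values, which makes it a common choice when averaging over benchmarks of very different sizes.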

| Circuit       | LUT Size |      |      |      |      |      |  |
|---------------|----------|------|------|------|------|------|--|
|               | 2        | 3    | 4    | 5    | 6    | 7    |  |
| alu4          | 39   | 39       | 39   | 33   | 29   | 26   |  |
| apex2         | 42   | 47       | 46   | 49   | 42   | 43   |  |
| apex4         | 42   | 45       | 47   | 47   | 45   | 50   |  |
| bigkey        | 26   | 29       | 26   | 24   | 21   | 20   |  |
| clma          | 50   | 55       | 56   | 59   | 63   | 64   |  |
| des           | 26   | 25       | 29   | 30   | 24   | 24   |  |
| diffeq        | 28   | 32       | 32   | 33   | 33   | 29   |  |
| dsip          | 23   | 21       | 26   | 23   | 23   | 28   |  |
| elliptic      | 37   | 42       | 49   | 45   | 47   | 49   |  |
| ex1010        | 46   | 55       | 50   | 50   | 55   | 50   |  |
| ex5p          | 41   | 47       | 49   | 45   | 38   | 30   |  |
| frisc         | 43   | 51       | 51   | 58   | 56   | 67   |  |
| misex3        | 42   | 45       | 43   | 41   | 38   | 33   |  |
| pdc           | 56   | 59       | 72   | 64   | 55   | 54   |  |
| s298          | 26   | 30       | 30   | 33   | 33   | 36   |  |
| s38417        | 41   | 43       | 43   | 38   | 34   | 36   |  |
| s38584        | 38   | 38       | 39   | 38   | 41   | 38   |  |
| seq           | 43   | 47       | 46   | 45   | 45   | 41   |  |
| spla          | 52   | 60       | 64   | 59   | 55   | 50   |  |
| tseng         | 24   | 25       | 30   | 26   | 26   | 34   |  |
| display_chip  | 28   | 28       | 28   | 30   | 26   | 28   |  |
| img_calc      | 51   | 49       | 46   | 49   | 45   | 45   |  |
| img_interp    | 30   | 30       | 29   | 28   | 33   | 30   |  |
| input_chip    | 24   | 24       | 24   | 23   | 20   | 21   |  |
| peak_chip     | 24   | 25       | 20   | 20   | 21   | 23   |  |
| scale125_chip | 29   | 28       | 28   | 28   | 28   | 28   |  |
| scale2_chip   | 24   | 23       | 23   | 23   | 21   | 23   |  |
| warping       | 26   | 24       | 25   | 23   | 23   | 21   |  |
| Geom. Avg.    | 34.4 | 36.2     | 36.8 | 35.9 | 34.3 | 34.4 |  |

Table D.4: Channel Width (Cluster Size = 4)

| Circuit       | LUT Size |      |      |      |      |      |
|---------------|----------|------|------|------|------|------|
|               | 2        | 3    | 4    | 5    | 6    | 7    |
| alu4          | 42       | 42   | 42   | 36   | 30   | 32   |
| apex2         | 46       | 51   | 54   | 51   | 47   | 46   |
| apex4         | 47       | 50   | 51   | 54   | 50   | 50   |
| bigkey        | 30       | 29   | 28   | 26   | 25   | 26   |
| clma          | 59       | 63   | 64   | 67   | 67   | 76   |
| des           | 30       | 30   | 32   | 34   | 28   | 25   |
| diffeq        | 30       | 36   | 37   | 38   | 36   | 37   |
| dsip          | 29       | 25   | 30   | 28   | 26   | 30   |
| elliptic      | 46       | 47   | 55   | 54   | 51   | 54   |
| ex1010        | 51       | 65   | 55   | 55   | 62   | 58   |
| ex5p          | 45       | 54   | 52   | 54   | 41   | 33   |
| frisc         | 45       | 54   | 56   | 60   | 65   | 75   |
| misex3        | 46       | 47   | 47   | 46   | 39   | 34   |
| pdc           | 63       | 67   | 76   | 68   | 65   | 59   |
| s298          | 25       | 29   | 32   | 36   | 34   | 38   |
| s38417        | 45       | 47   | 43   | 45   | 39   | 41   |
| s38584        | 42       | 45   | 46   | 43   | 43   | 42   |
| seq           | 45       | 50   | 51   | 51   | 46   | 46   |
| spla          | 59       | 65   | 64   | 60   | 63   | 50   |
| tseng         | 26       | 29   | 34   | 29   | 30   | 37   |
| display_chip  | 28       | 28   | 29   | 34   | 29   | 30   |
| img_calc      | 52       | 52   | 49   | 54   | 52   | 47   |
| img_interp    | 33       | 34   | 30   | 29   | 36   | 30   |
| input_chip    | 26       | 25   | 25   | 23   | 23   | 23   |
| peak_chip     | 24       | 26   | 23   | 23   | 23   | 25   |
| scale125_chip | 29       | 30   | 30   | 29   | 32   | 32   |
| scale2_chip   | 25       | 25   | 24   | 23   | 26   | 23   |
| warping       | 28       | 28   | 28   | 26   | 28   | 28   |
| Geom. Avg.    | 37.5     | 39.7 | 40.2 | 39.7 | 38.3 | 38.0 |

Table D.5: Channel Width (Cluster Size = 5)

| Circuit       | LUT Size |      |      |      |      |      |
|---------------|----------|------|------|------|------|------|
|               | 2        | 3    | 4    | 5    | 6    | 7    |
| alu4          | 46       | 46   | 41   | 39   | 32   | 30   |
| apex2         | 51       | 59   | 59   | 59   | 54   | 49   |
| apex4         | 52       | 59   | 62   | 54   | 56   | 54   |
| bigkey        | 29       | 25   | 29   | 25   | 24   | 25   |
| clma          | 62       | 69   | 72   | 75   | 75   | 82   |
| des           | 32       | 34   | 33   | 33   | 28   | 29   |
| diffeq        | 33       | 36   | 38   | 39   | 38   | 38   |
| dsip          | 32       | 30   | 25   | 25   | 26   | 29   |
| elliptic      | 47       | 54   | 60   | 59   | 55   | 59   |
| ex1010        | 58       | 75   | 62   | 62   | 75   | 62   |
| ex5p          | 54       | 62   | 59   | 56   | 46   | 38   |
| frisc         | 54       | 62   | 62   | 67   | 67   | 82   |
| misex3        | 49       | 54   | 49   | 49   | 43   | 38   |
| pdc           | 69       | 76   | 90   | 80   | 69   | 63   |
| s298          | 25       | 32   | 36   | 41   | 36   | 38   |
| s38417        | 51       | 51   | 49   | 47   | 43   | 45   |
| s38584        | 49       | 49   | 51   | 49   | 46   | 46   |
| seq           | 52       | 56   | 54   | 55   | 51   | 51   |
| spla          | 65       | 75   | 71   | 69   | 64   | 54   |
| tseng         | 28       | 33   | 38   | 32   | 33   | 39   |
| display_chip  | 30       | 33   | 36   | 38   | 34   | 34   |
| img_calc      | 63       | 55   | 55   | 59   | 54   | 54   |
| img_interp    | 34       | 36   | 36   | 34   | 36   | 34   |
| input_chip    | 25       | 28   | 28   | 25   | 25   | 25   |
| peak_chip     | 26       | 28   | 24   | 25   | 25   | 25   |
| scale125_chip | 32       | 32   | 33   | 33   | 32   | 34   |
| scale2_chip   | 26       | 28   | 28   | 26   | 28   | 25   |
| warping       | 33       | 29   | 28   | 28   | 28   | 28   |
| Geom. Avg.    | 40.9     | 43.9 | 43.9 | 43.0 | 41.0 | 40.6 |

Table D.6: Channel Width (Cluster Size = 6)

| Circuit       | LUT Size |      |      |      |      |      |
|---------------|----------|------|------|------|------|------|
|               | 2        | 3    | 4    | 5    | 6    | 7    |
| alu4          | 51       | 49   | 46   | 39   | 33   | 29   |
| apex2         | 55       | 59   | 62   | 59   | 55   | 50   |
| apex4         | 58       | 60   | 63   | 56   | 64   | 56   |
| bigkey        | 30       | 34   | 37   | 29   | 29   | 30   |
| clma          | 65       | 78   | 76   | 81   | 82   | 85   |
| des           | 38       | 38   | 36   | 37   | 32   | 30   |
| diffeq        | 34       | 38   | 37   | 41   | 41   | 39   |
| dsip          | 33       | 37   | 29   | 30   | 29   | 32   |
| elliptic      | 52       | 56   | 65   | 60   | 60   | 64   |
| ex1010        | 63       | 81   | 69   | 64   | 72   | 65   |
| ex5p          | 55       | 64   | 60   | 62   | 47   | 38   |
| frisc         | 58       | 63   | 68   | 71   | 75   | 88   |
| misex3        | 55       | 55   | 55   | 51   | 43   | 37   |
| pdc           | 73       | 80   | 89   | 84   | 71   | 64   |
| s298          | 29       | 32   | 34   | 38   | 39   | 38   |
| s38417        | 56       | 56   | 51   | 49   | 46   | 49   |
| s38584        | 50       | 52   | 52   | 54   | 50   | 49   |
| seq           | 56       | 62   | 59   | 56   | 54   | 51   |
| spla          | 68       | 80   | 76   | 68   | 69   | 56   |
| tseng         | 29       | 36   | 38   | 33   | 33   | 46   |
| display_chip  | 30       | 30   | 36   | 41   | 34   | 34   |
| img_calc      | 64       | 59   | 59   | 65   | 58   | 59   |
| img_interp    | 38       | 33   | 38   | 37   | 41   | 36   |
| input_chip    | 30       | 28   | 29   | 25   | 26   | 29   |
| peak_chip     | 26       | 30   | 26   | 25   | 28   | 25   |
| scale125_chip | 32       | 34   | 33   | 36   | 36   | 36   |
| scale2_chip   | 29       | 29   | 30   | 33   | 33   | 29   |
| warping       | 34       | 30   | 32   | 30   | 33   | 33   |
| Geom. Avg.    | 43.8     | 46.5 | 46.6 | 45.6 | 44.3 | 43.0 |

Table D.7: Channel Width (Cluster Size = 7)

| Circuit       | LUT Size |      |      |      |      |      |  |
|---------------|----------|------|------|------|------|------|--|
|               | 2        | 3    | 4    | 5    | 6    | 7    |  |
| alu4          | 55   | 52       | 50   | 42   | 30   | 30   |  |
| apex2         | 58   | 64       | 67   | 64   | 52   | 54   |  |
| apex4         | 63   | 67       | 65   | 60   | 63   | 62   |  |
| bigkey        | 32   | 36       | 36   | 29   | 29   | 29   |  |
| clma          | 71   | 84       | 84   | 89   | 85   | 93   |  |
| des           | 39   | 38       | 41   | 41   | 32   | 30   |  |
| diffeq        | 38   | 42       | 42   | 43   | 41   | 42   |  |
| dsip          | 32   | 32       | 32   | 29   | 30   | 32   |  |
| elliptic      | 60   | 63       | 65   | 65   | 62   | 71   |  |
| ex1010        | 67   | 90       | 77   | 71   | 77   | 67   |  |
| ex5p          | 58   | 72       | 64   | 75   | 49   | 47   |  |
| frisc         | 59   | 73       | 69   | 80   | 81   | 99   |  |
| misex3        | 60   | 59       | 58   | 52   | 45   | 39   |  |
| pdc           | 77   | 88       | 97   | 89   | 75   | 65   |  |
| s298          | 30   | 33       | 39   | 41   | 41   | 47   |  |
| s38417        | 59   | 55       | 56   | 50   | 47   | 49   |  |
| s38584        | 54   | 56       | 56   | 56   | 54   | 56   |  |
| seq           | 59   | 64       | 64   | 62   | 55   | 54   |  |
| spla          | 75   | 89       | 80   | 77   | 71   | 59   |  |
| tseng         | 29   | 37       | 42   | 37   | 36   | 46   |  |
| display_chip  | 33   | 36       | 37   | 42   | 38   | 39   |  |
| img_calc      | 65   | 63       | 62   | 72   | 62   | 63   |  |
| img_interp    | 39   | 39       | 41   | 39   | 39   | 38   |  |
| input_chip    | 29   | 30       | 30   | 29   | 28   | 30   |  |
| peak_chip     | 29   | 33       | 28   | 28   | 29   | 29   |  |
| scale125_chip | 36   | 36       | 33   | 36   | 38   | 37   |  |
| scale2_chip   | 30   | 30       | 29   | 29   | 30   | 29   |  |
| warping       | 34   | 33       | 32   | 32   | 32   | 33   |  |
| Geom. Avg.    | 46.3 | 49.9     | 49.5 | 48.7 | 45.4 | 45.9 |  |

Table D.8: Channel Width (Cluster Size = 8)

| Circuit       | LUT Size |      |      |      |      |      |
|---------------|----------|------|------|------|------|------|
|               | 2        | 3    | 4    | 5    | 6    | 7    |
| alu4          | 56       | 54   | 47   | 39   | 33   | 32   |
| apex2         | 62       | 65   | 71   | 64   | 56   | 55   |
| apex4         | 62       | 68   | 69   | 65   | 62   | 64   |
| bigkey        | 39       | 32   | 34   | 29   | 29   | 30   |
| clma          | 75       | 84   | 88   | 90   | 90   | 95   |
| des           | 41       | 38   | 41   | 37   | 33   | 32   |
| diffeq        | 43       | 41   | 43   | 43   | 45   | 43   |
| dsip          | 37       | 30   | 30   | 30   | 29   | 33   |
| elliptic      | 62       | 64   | 69   | 68   | 64   | 71   |
| ex1010        | 69       | 95   | 84   | 75   | 84   | 75   |
| ex5p          | 64       | 76   | 71   | 67   | 55   | 41   |
| frisc         | 64       | 76   | 73   | 81   | 81   | 98   |
| misex3        | 62       | 60   | 60   | 58   | 46   | 43   |
| pdc           | 86       | 90   | 101  | 93   | 75   | 69   |
| s298          | 32       | 34   | 38   | 43   | 45   | 49   |
| s38417        | 60       | 58   | 58   | 54   | 49   | 50   |
| s38584        | 56       | 62   | 62   | 59   | 59   | 60   |
| seq           | 62       | 71   | 64   | 62   | 56   | 56   |
| spla          | 78       | 88   | 82   | 81   | 72   | 60   |
| tseng         | 34       | 39   | 50   | 38   | 38   | 51   |
| display_chip  | 37       | 37   | 38   | 46   | 36   | 38   |
| img_calc      | 69       | 67   | 64   | 75   | 65   | 67   |
| img_interp    | 39       | 38   | 41   | 43   | 43   | 41   |
| input_chip    | 29       | 29   | 29   | 32   | 29   | 30   |
| peak_chip     | 30       | 33   | 29   | 30   | 26   | 29   |
| scale125_chip | 37       | 37   | 34   | 41   | 39   | 42   |
| scale2_chip   | 32       | 30   | 33   | 32   | 33   | 30   |
| warping       | 36       | 32   | 34   | 33   | 33   | 33   |
| Geom. Avg.    | 49.3     | 50.7 | 51.3 | 50.5 | 47.1 | 47.5 |

Table D.9: Channel Width (Cluster Size = 9)

| Circuit       | LUT Size |      |      |      |      |      |
|---------------|----------|------|------|------|------|------|
|               | 2        | 3    | 4    | 5    | 6    | 7    |
| alu4          | 60   | 56   | 54   | 43   | 34   | 33   |
| apex2         | 67   | 71   | 69   | 71   | 59   | 59   |
| apex4         | 67   | 75   | 78   | 68   | 68   | 65   |
| bigkey        | 43   | 34   | 37   | 34   | 33   | 33   |
| clma          | 80   | 89   | 93   | 99   | 97   | 102  |
| des           | 42   | 42   | 42   | 41   | 36   | 37   |
| diffeq        | 41   | 46   | 45   | 43   | 45   | 45   |
| dsip          | 42   | 34   | 33   | 33   | 33   | 34   |
| elliptic      | 64   | 68   | 75   | 69   | 72   | 73   |
| ex1010        | 76   | 102  | 86   | 75   | 85   | 81   |
| ex5p          | 64   | 77   | 75   | 72   | 55   | 49   |
| frisc         | 67   | 80   | 78   | 85   | 88   | 101  |
| misex3        | 60   | 67   | 65   | 55   | 47   | 39   |
| pdc           | 86   | 91   | 107  | 95   | 85   | 72   |
| s298          | 30   | 34   | 41   | 42   | 43   | 51   |
| s38417        | 63   | 62   | 59   | 56   | 54   | 60   |
| s38584        | 56   | 64   | 67   | 63   | 63   | 65   |
| seq           | 65   | 71   | 67   | 69   | 62   | 58   |
| spla          | 80   | 98   | 88   | 78   | 75   | 63   |
| tseng         | 33   | 45   | 51   | 39   | 38   | 55   |
| display_chip  | 34   | 37   | 41   | 47   | 41   | 43   |
| img_calc      | 68   | 73   | 67   | 76   | 72   | 69   |
| img_interp    | 42   | 38   | 46   | 41   | 46   | 49   |
| input_chip    | 30   | 29   | 29   | 29   | 29   | 29   |
| peak_chip     | 32   | 33   | 32   | 29   | 33   | 29   |
| scale125_chip | 37   | 38   | 38   | 39   | 45   | 41   |
| scale2_chip   | 32   | 32   | 36   | 30   | 32   | 33   |
| warping       | 37   | 34   | 37   | 38   | 37   | 37   |
| Geom. Avg.    | 50.6 | 53.6 | 54.8 | 52.0 | 50.5 | 50.5 |

Table D.10: Channel Width (Cluster Size = 10)

# APPENDIX E

#### Total Critical Path Delay

| Circuit       | LUT Size |       |       |       |       |       |
|---------------|----------|-------|-------|-------|-------|-------|
|               | 2        | 3     | 4     | 5     | 6     | 7     |
| alu4          | 29.21  | 29.29 | 24.72 | 20.19 | 20.84 | 29.75 |
| apex2         | 31.70  | 29.17 | 27.16 | 21.18 | 23.79 | 26.55 |
| apex4         | 31.30  | 22.51 | 19.78 | 18.57 | 21.68 | 20.71 |
| bigkey        | 20.66  | 14.84 | 13.77 | 25.20 | 14.90 | 11.53 |
| clma          | 67.85  | 50.70 | 51.68 | 49.81 | 43.05 | 35.88 |
| des           | 24.43  | 22.48 | 22.04 | 21.07 | 18.31 | 40.43 |
| diffeq        | 49.68  | 28.05 | 22.30 | 17.16 | 16.51 | 14.21 |
| dsip          | 20.81  | 15.57 | 12.73 | 13.39 | 12.92 | 29.67 |
| elliptic      | 64.18  | 34.16 | 30.79 | 29.60 | 28.04 | 31.88 |
| ex1010        | 46.61  | 54.44 | 40.75 | 75.00 | 49.52 | 25.53 |
| ex5p          | 25.11  | 23.81 | 18.16 | 17.21 | 16.91 | 12.99 |
| frisc         | 91.72  | 43.79 | 39.04 | 32.55 | 29.73 | 30.18 |
| misex3        | 26.43  | 24.49 | 20.98 | 22.63 | 16.87 | 23.00 |
| pdc           | 50.54  | 56.13 | 64.71 | 41.63 | 41.14 | 51.11 |
| s298          | 58.92  | 49.18 | 39.87 | 37.00 | 33.10 | 31.76 |
| s38417        | 45.75  | 27.67 | 29.16 | 20.98 | 16.63 | 15.22 |
| s38584        | 37.85  | 21.43 | 19.82 | 21.16 | 19.68 | 16.80 |
| seq           | 31.40  | 30.12 | 18.40 | 18.34 | 18.40 | 27.11 |
| spla          | 43.63  | 46.38 | 47.29 | 38.92 | 36.18 | 34.23 |
| tseng         | 51.32  | 28.25 | 20.31 | 18.25 | 16.21 | 14.27 |
| display_chip  | 63.46  | 36.49 | 22.00 | 16.77 | 15.04 | 11.88 |
| img_calc      | 143.91 | 91.46 | 49.79 | 37.72 | 34.71 | 28.57 |
| img_interp    | 69.02  | 42.56 | 25.14 | 21.58 | 19.20 | 14.30 |
| input_chip    | 54.20  | 34.91 | 19.12 | 18.52 | 14.16 | 12.49 |
| peak_chip     | 68.92  | 38.17 | 24.03 | 17.71 | 14.58 | 13.78 |
| scale125_chip | 88.88  | 45.98 | 28.27 | 26.67 | 19.75 | 18.17 |
| scale2_chip   | 57.26  | 32.82 | 22.65 | 26.73 | 15.50 | 15.09 |
| warping       | 40.30  | 24.21 | 16.05 | 13.01 | 10.42 | 10.97 |
| Geom. Avg.    | 45.84  | 32.89 | 25.91 | 24.11 | 20.97 | 21.08 |

Table E.1: Total Delay in nano-seconds (Cluster Size = 1)

| Circuit       | LUT Size |       |       |       |       |       |  |
|---------------|----------|-------|-------|-------|-------|-------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |
| alu4          | 24.72    | 20.19 | 22.14 | 18.77 | 17.75 | 15.69 |  |
| apex2         | 28.20    | 24.81 | 19.81 | 21.29 | 19.17 | 19.95 |  |
| apex4         | 24.33    | 20.12 | 20.47 | 21.35 | 17.40 | 18.52 |  |
| bigkey        | 14.81    | 12.42 | 12.32 | 13.05 | 8.47  | 10.67 |  |
| clma          | 59.52    | 45.28 | 40.56 | 37.07 | 37.02 | 32.60 |  |
| des           | 19.93    | 19.01 | 18.25 | 19.38 | 15.51 | 14.40 |  |
| diffeq        | 37.49    | 24.58 | 22.10 | 17.76 | 15.49 | 14.27 |  |
| dsip          | 14.58    | 12.27 | 10.55 | 9.37  | 10.54 | 9.35  |  |
| elliptic      | 48.08    | 38.04 | 28.68 | 28.76 | 27.40 | 31.44 |  |
| ex1010        | 39.30    | 47.16 | 30.64 | 38.01 | 28.84 | 22.61 |  |
| ex5p          | 22.47    | 18.81 | 21.16 | 15.82 | 14.66 | 12.61 |  |
| frisc         | 63.49    | 42.74 | 33.99 | 30.21 | 29.26 | 28.12 |  |
| misex3        | 22.43    | 21.62 | 18.61 | 18.39 | 18.51 | 15.16 |  |
| pdc           | 39.69    | 36.15 | 36.22 | 28.39 | 28.84 | 23.24 |  |
| s298          | 45.66    | 34.80 | 33.24 | 31.66 | 36.01 | 27.24 |  |
| s38417        | 35.63    | 25.54 | 21.59 | 20.12 | 15.67 | 14.58 |  |
| s38584        | 23.44    | 22.62 | 17.40 | 18.19 | 15.30 | 15.79 |  |
| seq           | 24.70    | 19.35 | 17.40 | 18.65 | 14.82 | 23.40 |  |
| spla          | 39.37    | 35.66 | 34.14 | 30.24 | 25.59 | 27.75 |  |
| tseng         | 34.92    | 26.11 | 20.32 | 17.73 | 15.48 | 15.05 |  |
| display_chip  | 45.40    | 31.21 | 18.69 | 16.11 | 13.87 | 12.22 |  |
| img_calc      | 106.03   | 74.97 | 46.08 | 34.86 | 30.05 | 28.63 |  |
| img_interp    | 49.52    | 35.33 | 24.32 | 19.92 | 18.06 | 15.31 |  |
| input_chip    | 42.33    | 30.90 | 15.97 | 15.94 | 13.83 | 12.16 |  |
| peak_chip     | 49.81    | 33.72 | 20.85 | 16.79 | 14.97 | 13.64 |  |
| scale125_chip | 59.95    | 42.87 | 24.06 | 24.41 | 18.51 | 19.62 |  |
| scale2_chip   | 41.84    | 29.32 | 20.19 | 17.63 | 14.81 | 13.98 |  |
| warping       | 25.69    | 19.19 | 13.62 | 11.76 | 10.26 | 9.42  |  |
| Geom. Avg.    | 34.94    | 27.82 | 22.30 | 20.63 | 18.19 | 17.33 |  |

Table E.2: Total Delay in nano-seconds (Cluster Size = 2)

| Circuit       | LUT Size |       |       |       |       |       |  |
|---------------|----------|-------|-------|-------|-------|-------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |
| alu4          | 22.08    | 23.25 | 14.48 | 15.12 | 14.49 | 15.51 |  |
| apex2         | 23.22    | 19.39 | 18.15 | 19.89 | 15.97 | 16.51 |  |
| apex4         | 22.16    | 19.29 | 16.69 | 20.40 | 16.52 | 18.33 |  |
| bigkey        | 13.53    | 10.81 | 8.25  | 8.97  | 7.82  | 8.36  |  |
| clma          | 55.09    | 39.16 | 37.01 | 33.95 | 32.24 | 34.24 |  |
| des           | 18.45    | 16.64 | 15.03 | 13.94 | 13.02 | 14.17 |  |
| diffeq        | 30.99    | 22.53 | 17.82 | 18.28 | 13.95 | 13.14 |  |
| dsip          | 13.26    | 10.98 | 8.44  | 8.90  | 8.55  | 9.23  |  |
| elliptic      | 39.00    | 32.93 | 26.94 | 28.11 | 23.24 | 25.23 |  |
| ex1010        | 35.02    | 38.75 | 28.10 | 24.86 | 32.63 | 19.38 |  |
| ex5p          | 20.55    | 18.02 | 15.95 | 14.94 | 13.19 | 13.18 |  |
| frisc         | 54.56    | 35.78 | 32.80 | 29.23 | 29.21 | 25.65 |  |
| misex3        | 19.50    | 19.21 | 15.07 | 14.46 | 13.90 | 13.87 |  |
| pdc           | 40.94    | 31.65 | 36.90 | 29.17 | 34.35 | 27.62 |  |
| s298          | 43.80    | 37.09 | 32.89 | 30.63 | 29.49 | 30.52 |  |
| s38417        | 32.49    | 28.50 | 19.43 | 20.44 | 15.51 | 13.88 |  |
| s38584        | 22.55    | 18.38 | 18.63 | 14.95 | 15.04 | 16.73 |  |
| seq           | 20.45    | 17.16 | 16.04 | 18.30 | 15.02 | 14.93 |  |
| spla          | 32.63    | 31.58 | 25.15 | 28.10 | 34.35 | 29.31 |  |
| tseng         | 32.29    | 22.19 | 17.24 | 16.54 | 15.67 | 14.74 |  |
| display_chip  | 38.61    | 28.75 | 18.03 | 14.57 | 12.99 | 11.66 |  |
| img_calc      | 91.80    | 71.02 | 40.62 | 31.73 | 27.73 | 27.15 |  |
| img_interp    | 45.72    | 32.93 | 23.79 | 22.82 | 16.15 | 13.66 |  |
| input_chip    | 35.22    | 22.97 | 15.78 | 13.96 | 12.24 | 12.71 |  |
| peak_chip     | 39.83    | 29.63 | 18.46 | 15.64 | 14.45 | 13.85 |  |
| scale125_chip | 49.64    | 37.53 | 24.54 | 23.13 | 17.37 | 17.01 |  |
| scale2_chip   | 37.96    | 25.49 | 19.03 | 16.63 | 14.29 | 14.39 |  |
| warping       | 21.64    | 15.42 | 11.72 | 9.98  | 9.00  | 9.14  |  |
| Geom. Avg.    | 30.86    | 24.87 | 19.60 | 18.66 | 16.96 | 16.46 |  |

Table E.3: Total Delay in nano-seconds (Cluster Size = 3)

| Circuit       | LUT Size |       |       |       |       |       |  |
|---------------|----------|-------|-------|-------|-------|-------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |
| alu4          | 35.20    | 19.83 | 15.96 | 14.32 | 13.85 | 14.22 |  |
| apex2         | 25.35    | 19.13 | 26.55 | 15.92 | 19.97 | 16.85 |  |
| apex4         | 22.22    | 22.53 | 17.85 | 22.10 | 19.00 | 14.66 |  |
| bigkey        | 13.97    | 12.67 | 8.50  | 8.06  | 8.03  | 8.75  |  |
| clma          | 50.94    | 37.58 | 34.78 | 35.10 | 33.90 | 33.73 |  |
| des           | 19.49    | 16.47 | 15.23 | 13.94 | 13.32 | 13.35 |  |
| diffeq        | 30.03    | 21.19 | 17.24 | 17.78 | 13.46 | 13.33 |  |
| dsip          | 12.92    | 12.40 | 11.10 | 7.53  | 10.97 | 8.08  |  |
| elliptic      | 39.47    | 33.25 | 25.72 | 28.85 | 36.48 | 23.64 |  |
| ex1010        | 34.49    | 40.17 | 24.44 | 24.14 | 28.74 | 21.24 |  |
| ex5p          | 20.10    | 18.87 | 15.26 | 16.87 | 15.11 | 12.13 |  |
| frisc         | 55.00    | 35.90 | 34.44 | 28.09 | 31.86 | 31.91 |  |
| misex3        | 21.42    | 17.71 | 15.65 | 15.22 | 13.16 | 14.74 |  |
| pdc           | 65.14    | 39.18 | 31.41 | 32.11 | 28.18 | 24.30 |  |
| s298          | 36.76    | 37.09 | 32.31 | 30.88 | 28.29 | 27.36 |  |
| s38417        | 30.19    | 23.70 | 19.31 | 18.62 | 14.20 | 15.89 |  |
| s38584        | 21.17    | 16.31 | 14.88 | 14.04 | 13.86 | 14.11 |  |
| seq           | 20.77    | 15.57 | 15.91 | 14.83 | 15.41 | 15.06 |  |
| spla          | 34.39    | 42.19 | 23.30 | 27.74 | 27.79 | 19.16 |  |
| tseng         | 35.42    | 21.27 | 17.29 | 16.17 | 15.04 | 14.60 |  |
| display_chip  | 36.25    | 27.39 | 15.81 | 14.24 | 12.22 | 10.88 |  |
| img_calc      | 90.66    | 68.51 | 38.11 | 30.56 | 27.61 | 26.77 |  |
| img_interp    | 41.77    | 30.22 | 22.83 | 17.28 | 18.15 | 14.25 |  |
| input_chip    | 33.93    | 25.18 | 14.36 | 13.23 | 12.26 | 12.50 |  |
| peak_chip     | 39.11    | 28.87 | 17.24 | 15.32 | 13.00 | 11.81 |  |
| scale125_chip | 50.73    | 35.89 | 23.74 | 21.87 | 17.51 | 17.59 |  |
| scale2_chip   | 37.35    | 24.07 | 18.54 | 17.14 | 15.23 | 12.65 |  |
| warping       | 20.90    | 16.26 | 11.25 | 9.94  | 9.28  | 8.76  |  |
| Geom. Avg.    | 31.58    | 24.93 | 19.35 | 17.95 | 17.29 | 15.73 |  |

Table E.4: Total Delay in nano-seconds (Cluster Size = 4)

| Circuit       | LUT Size |       |       |       |       |       |  |
|---------------|----------|-------|-------|-------|-------|-------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |
| alu4          | 25.77    | 16.69 | 14.68 | 14.12 | 13.33 | 13.56 |  |
| apex2         | 24.91    | 19.80 | 18.20 | 17.40 | 15.12 | 16.49 |  |
| apex4         | 22.52    | 16.86 | 16.42 | 15.74 | 16.31 | 15.88 |  |
| bigkey        | 12.57    | 10.03 | 8.29  | 7.94  | 8.18  | 6.33  |  |
| clma          | 49.83    | 38.27 | 32.06 | 37.76 | 33.01 | 28.24 |  |
| des           | 17.49    | 17.70 | 15.20 | 14.85 | 12.44 | 14.98 |  |
| diffeq        | 28.07    | 21.41 | 16.12 | 15.65 | 13.45 | 12.72 |  |
| dsip          | 13.28    | 9.66  | 9.98  | 8.20  | 7.99  | 7.24  |  |
| elliptic      | 35.90    | 28.88 | 25.66 | 24.48 | 20.04 | 22.26 |  |
| ex1010        | 50.02    | 58.34 | 41.92 | 35.00 | 32.30 | 19.89 |  |
| ex5p          | 19.19    | 20.42 | 15.67 | 17.21 | 14.93 | 11.75 |  |
| frisc         | 48.75    | 32.71 | 29.30 | 27.05 | 29.78 | 27.52 |  |
| misex3        | 18.46    | 28.36 | 16.48 | 20.34 | 13.95 | 13.75 |  |
| pdc           | 37.99    | 51.85 | 36.36 | 30.38 | 31.85 | 26.95 |  |
| s298          | 46.05    | 34.67 | 36.71 | 34.37 | 29.26 | 27.26 |  |
| s38417        | 39.37    | 21.37 | 21.52 | 20.42 | 14.42 | 13.64 |  |
| s38584        | 21.61    | 17.36 | 15.04 | 18.39 | 20.17 | 14.26 |  |
| seq           | 21.63    | 17.05 | 15.18 | 15.54 | 15.25 | 13.66 |  |
| spla          | 35.14    | 33.17 | 37.31 | 24.81 | 25.35 | 27.65 |  |
| tseng         | 32.18    | 22.45 | 17.39 | 17.05 | 14.29 | 14.29 |  |
| display_chip  | 37.06    | 25.33 | 16.53 | 13.91 | 12.30 | 10.62 |  |
| img_calc      | 93.33    | 68.72 | 38.88 | 30.77 | 26.09 | 25.49 |  |
| img_interp    | 41.51    | 28.46 | 24.08 | 20.39 | 15.72 | 13.30 |  |
| input_chip    | 32.40    | 25.10 | 14.69 | 13.57 | 11.79 | 13.01 |  |
| peak_chip     | 38.61    | 28.32 | 17.58 | 14.62 | 14.38 | 11.44 |  |
| scale125_chip | 49.00    | 35.51 | 23.83 | 20.49 | 16.26 | 17.74 |  |
| scale2_chip   | 33.79    | 25.08 | 18.93 | 17.06 | 14.71 | 13.35 |  |
| warping       | 19.87    | 14.73 | 11.17 | 10.43 | 8.93  | 7.68  |  |
| Geom. Avg.    | 30.59    | 24.69 | 19.79 | 18.47 | 16.54 | 15.23 |  |

Table E.5: Total Delay in nano-seconds (Cluster Size = 5)

| Circuit       | LUT Size |       |       |       |       |       |
|---------------|----------|-------|-------|-------|-------|-------|
|               | 2        | 3     | 4     | 5     | 6     | 7     |
| alu4          | 19.46 | 17.55 | 17.45 | 14.79 | 13.71 | 14.99 |
| apex2         | 24.17 | 19.96 | 16.93 | 17.40 | 14.40 | 16.53 |
| apex4         | 28.51 | 20.74 | 16.52 | 17.77 | 14.00 | 14.09 |
| bigkey        | 11.63 | 10.08 | 9.33  | 7.23  | 7.52  | 7.08  |
| clma          | 44.16 | 41.97 | 35.98 | 32.48 | 31.30 | 25.57 |
| des           | 16.72 | 15.65 | 15.52 | 13.76 | 12.03 | 13.58 |
| diffeq        | 27.52 | 20.23 | 16.32 | 17.46 | 13.22 | 13.32 |
| dsip          | 10.90 | 11.54 | 7.31  | 7.76  | 7.78  | 7.79  |
| elliptic      | 36.72 | 27.37 | 22.57 | 27.58 | 27.96 | 20.20 |
| ex1010        | 32.78 | 44.15 | 33.28 | 36.26 | 21.61 | 18.87 |
| ex5p          | 19.87 | 18.25 | 17.31 | 18.39 | 14.22 | 11.36 |
| frisc         | 52.84 | 33.57 | 30.86 | 29.48 | 26.31 | 26.21 |
| misex3        | 19.00 | 15.62 | 16.77 | 13.03 | 12.41 | 13.09 |
| pdc           | 39.09 | 38.42 | 31.67 | 24.83 | 35.91 | 27.12 |
| s298          | 43.55 | 40.20 | 29.78 | 30.55 | 25.61 | 24.83 |
| s38417        | 28.18 | 23.02 | 19.06 | 16.45 | 13.16 | 20.36 |
| s38584        | 19.31 | 16.77 | 15.26 | 13.92 | 14.09 | 14.64 |
| seq           | 21.09 | 17.84 | 15.86 | 13.86 | 12.82 | 13.45 |
| spla          | 34.56 | 28.05 | 40.44 | 26.70 | 22.52 | 18.03 |
| tseng         | 30.09 | 22.35 | 16.84 | 16.18 | 14.54 | 14.01 |
| display_chip  | 35.71 | 24.61 | 14.94 | 15.15 | 11.87 | 10.31 |
| img_calc      | 85.38 | 64.03 | 36.87 | 30.97 | 25.25 | 24.75 |
| img_interp    | 42.31 | 29.98 | 22.33 | 19.19 | 14.77 | 13.04 |
| input_chip    | 33.71 | 24.02 | 13.71 | 13.78 | 11.48 | 10.46 |
| peak_chip     | 38.25 | 27.16 | 18.29 | 14.98 | 14.70 | 11.76 |
| scale125_chip | 49.86 | 32.51 | 21.74 | 20.61 | 17.46 | 15.81 |
| scale2_chip   | 32.19 | 24.05 | 17.98 | 15.90 | 13.50 | 12.71 |
| warping       | 20.53 | 13.85 | 11.62 | 10.39 | 9.52  | 8.04  |
| Geom. Avg.    | 29.04 | 23.64 | 19.19 | 17.74 | 15.63 | 14.81 |

Table E.6: Total Delay in nano-seconds (Cluster Size = 6)

| Circuit       | LUT Size |       |       |       |       |       |  |
|---------------|----------|-------|-------|-------|-------|-------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |
| alu4          | 19.60    | 16.41 | 16.37 | 15.05 | 12.69 | 13.12 |  |
| apex2         | 28.65    | 17.72 | 17.39 | 17.32 | 17.03 | 15.04 |  |
| apex4         | 19.73    | 23.28 | 17.49 | 17.24 | 13.28 | 18.21 |  |
| bigkey        | 10.94    | 9.57  | 7.97  | 8.99  | 7.19  | 7.17  |  |
| clma          | 57.68    | 37.20 | 32.26 | 39.49 | 28.79 | 32.06 |  |
| des           | 16.52    | 15.77 | 14.97 | 12.96 | 10.62 | 12.46 |  |
| diffeq        | 27.04    | 21.07 | 16.02 | 14.15 | 12.77 | 11.92 |  |
| dsip          | 11.15    | 9.66  | 7.28  | 7.60  | 7.80  | 6.73  |  |
| elliptic      | 39.08    | 31.74 | 27.71 | 27.71 | 24.49 | 22.29 |  |
| ex1010        | 32.23    | 36.53 | 24.62 | 27.22 | 27.89 | 18.75 |  |
| ex5p          | 22.03    | 20.08 | 15.28 | 15.78 | 14.69 | 11.08 |  |
| frisc         | 49.02    | 35.79 | 30.09 | 29.91 | 28.02 | 25.61 |  |
| misex3        | 19.00    | 16.39 | 15.16 | 13.00 | 12.60 | 13.36 |  |
| pdc           | 34.41    | 38.14 | 31.53 | 38.90 | 31.39 | 28.13 |  |
| s298          | 39.61    | 36.92 | 33.20 | 29.17 | 25.49 | 22.13 |  |
| s38417        | 34.26    | 27.07 | 33.84 | 16.89 | 16.96 | 15.80 |  |
| s38584        | 19.67    | 18.62 | 14.45 | 16.17 | 14.13 | 16.88 |  |
| seq           | 23.29    | 15.33 | 14.25 | 16.37 | 13.52 | 13.12 |  |
| spla          | 33.97    | 30.37 | 26.85 | 30.05 | 19.99 | 19.59 |  |
| tseng         | 29.90    | 22.25 | 16.08 | 15.74 | 13.93 | 13.62 |  |
| display_chip  | 36.97    | 25.67 | 14.64 | 14.51 | 12.68 | 10.03 |  |
| img_calc      | 85.71    | 66.41 | 34.35 | 30.94 | 25.49 | 24.68 |  |
| img_interp    | 40.27    | 28.16 | 20.42 | 17.08 | 15.36 | 13.15 |  |
| input_chip    | 32.09    | 24.52 | 13.51 | 12.83 | 11.30 | 10.87 |  |
| peak_chip     | 36.55    | 26.86 | 17.11 | 15.31 | 13.61 | 13.31 |  |
| scale125_chip | 43.06    | 35.46 | 20.76 | 18.54 | 18.01 | 16.78 |  |
| scale2_chip   | 31.77    | 23.19 | 17.87 | 14.68 | 12.65 | 12.30 |  |
| warping       | 19.66    | 14.15 | 10.68 | 9.94  | 9.14  | 7.82  |  |
| Geom. Avg.    | 28.88    | 23.63 | 18.53 | 17.79 | 15.60 | 14.79 |  |

Table E.7: Total Delay in nano-seconds (Cluster Size = 7)

| Circuit       | LUT Size |       |       |       |       |       |
|---------------|----------|-------|-------|-------|-------|-------|
|               | 2        | 3     | 4     | 5     | 6     | 7     |
| alu4          | 19.85 | 16.27 | 14.29 | 13.07 | 15.25 | 14.30 |
| apex2         | 22.50 | 17.99 | 15.51 | 19.79 | 16.48 | 16.08 |
| apex4         | 20.29 | 23.68 | 15.23 | 19.34 | 16.21 | 16.63 |
| bigkey        | 11.40 | 9.09  | 8.05  | 8.34  | 8.82  | 7.40  |
| clma          | 50.14 | 37.42 | 35.71 | 30.06 | 30.56 | 31.20 |
| des           | 16.95 | 15.90 | 14.28 | 12.52 | 10.30 | 14.36 |
| diffeq        | 30.04 | 20.19 | 14.68 | 15.81 | 13.38 | 13.01 |
| dsip          | 11.63 | 11.21 | 8.06  | 7.05  | 7.44  | 7.10  |
| elliptic      | 39.81 | 27.58 | 26.66 | 26.58 | 24.47 | 21.15 |
| ex1010        | 33.65 | 39.82 | 24.45 | 24.55 | 23.43 | 19.53 |
| ex5p          | 19.91 | 17.86 | 15.13 | 15.16 | 15.54 | 11.87 |
| frisc         | 53.90 | 34.25 | 31.09 | 27.35 | 28.35 | 27.49 |
| misex3        | 18.82 | 15.28 | 15.65 | 14.87 | 14.29 | 13.24 |
| pdc           | 36.64 | 30.68 | 33.95 | 25.97 | 23.94 | 20.60 |
| s298          | 47.79 | 34.60 | 29.86 | 33.81 | 34.25 | 23.23 |
| s38417        | 31.24 | 24.21 | 18.20 | 15.62 | 13.48 | 13.57 |
| s38584        | 21.48 | 16.20 | 15.85 | 15.31 | 13.63 | 14.25 |
| seq           | 20.59 | 17.26 | 15.64 | 13.26 | 14.75 | 15.07 |
| spla          | 32.91 | 28.46 | 27.06 | 34.51 | 23.96 | 19.72 |
| tseng         | 32.38 | 22.44 | 15.61 | 16.18 | 14.21 | 14.72 |
| display_chip  | 39.26 | 25.76 | 14.79 | 13.71 | 11.65 | 10.02 |
| img_calc      | 93.20 | 65.57 | 33.97 | 30.89 | 24.81 | 25.07 |
| img_interp    | 45.55 | 27.33 | 19.27 | 17.06 | 16.71 | 12.69 |
| input_chip    | 37.74 | 21.93 | 12.50 | 12.40 | 11.51 | 11.19 |
| peak_chip     | 41.00 | 27.00 | 17.31 | 14.66 | 13.46 | 12.04 |
| scale125_chip | 51.65 | 33.75 | 22.38 | 19.52 | 15.76 | 15.79 |
| scale2_chip   | 36.47 | 24.00 | 17.08 | 14.46 | 14.28 | 11.96 |
| warping       | 22.25 | 13.72 | 10.38 | 9.71  | 8.68  | 7.71  |
| Geom. Avg.    | 30.01 | 22.92 | 17.93 | 17.20 | 15.92 | 14.75 |

Table E.8: Total Delay in nano-seconds (Cluster Size = 8)

| Circuit       | LUT Size |       |       |       |       |       |  |  |
|---------------|----------|-------|-------|-------|-------|-------|--|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |  |
| alu4          | 24.12    | 16.29 | 14.24 | 13.24 | 13.33 | 13.33 |  |  |
| apex2         | 23.22    | 21.56 | 16.24 | 15.71 | 16.13 | 16.78 |  |  |
| apex4         | 21.16    | 17.65 | 19.21 | 15.05 | 19.50 | 15.46 |  |  |
| bigkey        | 10.94    | 9.18  | 8.71  | 7.66  | 8.03  | 7.56  |  |  |
| clma          | 44.73    | 34.76 | 35.22 | 39.36 | 31.61 | 28.17 |  |  |
| des           | 18.98    | 17.56 | 14.44 | 13.71 | 12.03 | 12.02 |  |  |
| diffeq        | 30.63    | 20.46 | 14.49 | 13.58 | 13.06 | 12.12 |  |  |
| dsip          | 11.28    | 9.73  | 7.71  | 9.77  | 7.74  | 7.01  |  |  |
| elliptic      | 37.17    | 30.67 | 25.51 | 24.68 | 20.16 | 22.75 |  |  |
| ex1010        | 30.37    | 38.77 | 23.20 | 23.93 | 23.94 | 17.20 |  |  |
| ex5p          | 20.65    | 20.57 | 17.73 | 14.21 | 12.63 | 11.21 |  |  |
| frisc         | 52.77    | 36.11 | 28.82 | 35.73 | 30.63 | 25.53 |  |  |
| misex3        | 21.88    | 15.11 | 14.77 | 12.21 | 13.17 | 16.11 |  |  |
| pdc           | 40.07    | 34.87 | 29.52 | 28.62 | 28.69 | 21.28 |  |  |
| s298          | 39.24    | 34.09 | 35.10 | 24.61 | 25.02 | 24.22 |  |  |
| s38417        | 31.92    | 24.59 | 26.23 | 17.01 | 13.65 | 14.80 |  |  |
| s38584        | 19.88    | 16.01 | 16.57 | 13.63 | 14.43 | 14.60 |  |  |
| seq           | 18.26    | 16.89 | 16.88 | 14.17 | 13.34 | 15.08 |  |  |
| spla          | 30.53    | 31.40 | 28.79 | 34.56 | 29.55 | 20.57 |  |  |
| tseng         | 37.21    | 21.47 | 14.03 | 16.87 | 14.88 | 12.38 |  |  |
| display_chip  | 38.79    | 24.39 | 14.78 | 13.58 | 11.94 | 9.85  |  |  |
| img_calc      | 90.25    | 63.70 | 32.61 | 31.42 | 25.72 | 24.89 |  |  |
| img_interp    | 43.85    | 27.71 | 18.42 | 18.35 | 15.65 | 12.83 |  |  |
| input_chip    | 36.10    | 22.62 | 13.39 | 12.20 | 11.90 | 10.80 |  |  |
| peak_chip     | 38.16    | 25.65 | 16.81 | 14.22 | 13.30 | 12.12 |  |  |
| scale125_chip | 53.00    | 33.18 | 20.65 | 20.06 | 15.24 | 15.67 |  |  |
| scale2_chip   | 37.90    | 23.85 | 17.19 | 14.20 | 13.14 | 12.19 |  |  |
| warping       | 21.98    | 14.67 | 10.09 | 9.68  | 8.87  | 7.75  |  |  |
| Geom. Avg.    | 29.84    | 23.05 | 18.28 | 17.07 | 15.79 | 14.54 |  |  |

Table E.9: Total Delay in nano-seconds (Cluster Size = 9)

| Circuit       | LUT Size |       |       |       |       |       |
|---------------|-------|-------|-------|-------|-------|-------|
|               | 2     | 3     | 4     | 5     | 6     | 7     |
| alu4          | 26.30 | 14.93 | 16.17 | 13.21 | 12.95 | 12.96 |
| apex2         | 21.63 | 20.20 | 15.31 | 18.36 | 15.74 | 15.76 |
| apex4         | 25.39 | 17.16 | 16.41 | 15.24 | 14.92 | 17.19 |
| bigkey        | 11.56 | 10.02 | 8.50  | 7.61  | 8.01  | 6.71  |
| clma          | 42.64 | 35.37 | 29.38 | 30.24 | 28.21 | 27.68 |
| des           | 17.26 | 14.53 | 12.48 | 12.77 | 12.28 | 12.21 |
| diffeq        | 28.41 | 20.03 | 14.10 | 15.41 | 12.56 | 11.66 |
| dsip          | 10.98 | 10.07 | 7.28  | 7.72  | 7.40  | 6.73  |
| elliptic      | 36.80 | 31.86 | 25.17 | 26.50 | 24.50 | 20.11 |
| ex1010        | 31.77 | 41.99 | 29.44 | 22.96 | 22.83 | 18.98 |
| ex5p          | 21.30 | 19.90 | 14.90 | 15.78 | 12.81 | 11.74 |
| frisc         | 53.61 | 35.92 | 28.11 | 28.95 | 30.85 | 30.21 |
| misex3        | 21.18 | 15.64 | 13.11 | 13.98 | 12.79 | 13.41 |
| pdc           | 32.76 | 34.24 | 26.99 | 24.06 | 20.62 | 28.48 |
| s298          | 39.42 | 32.41 | 30.81 | 25.44 | 26.55 | 23.59 |
| s38417        | 36.65 | 25.42 | 18.34 | 19.38 | 14.44 | 14.98 |
| s38584        | 18.27 | 17.09 | 16.74 | 13.96 | 13.88 | 15.13 |
| seq           | 18.02 | 15.78 | 14.54 | 13.98 | 12.94 | 14.82 |
| spla          | 30.65 | 31.58 | 27.13 | 21.22 | 21.89 | 27.22 |
| tseng         | 30.33 | 21.46 | 15.08 | 15.88 | 14.11 | 12.46 |
| display_chip  | 37.73 | 23.57 | 14.52 | 14.01 | 11.49 | 10.51 |
| img_calc      | 85.29 | 59.98 | 34.58 | 30.60 | 25.27 | 24.65 |
| img_interp    | 42.65 | 27.30 | 19.25 | 16.76 | 15.87 | 12.59 |
| input_chip    | 34.65 | 22.43 | 13.54 | 12.29 | 12.23 | 11.31 |
| peak_chip     | 38.02 | 26.60 | 17.56 | 15.04 | 13.11 | 11.92 |
| scale125_chip | 48.04 | 31.95 | 19.38 | 19.54 | 16.46 | 14.58 |
| scale2_chip   | 34.84 | 23.33 | 16.66 | 15.10 | 12.97 | 13.72 |
| warping       | 21.42 | 13.64 | 9.99  | 9.99  | 8.93  | 7.95  |
| Geom. Avg.    | 29.14 | 22.73 | 17.46 | 16.63 | 15.26 | 14.82 |

Table E.10: Total Delay in nano-seconds (Cluster Size = 10)

# Appendix F

#### Intra-Cluster (Logic) Delay

| Circuit       | LUT Size |       |       |       |       |       |  |
|---------------|----------|-------|-------|-------|-------|-------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |
| alu4          | 5.35     | 3.34  | 4.18  | 4.33  | 3.56  | 4.17  |  |
| apex2         | 5.71     | 3.34  | 3.62  | 5.00  | 5.18  | 5.13  |  |
| apex4         | 4.26     | 3.34  | 3.62  | 4.33  | 4.37  | 5.13  |  |
| bigkey        | 3.88     | 2.43  | 1.87  | 1.52  | 2.59  | 2.05  |  |
| clma          | 13.29    | 5.49  | 5.21  | 4.89  | 7.47  | 8.80  |  |
| des           | 4.99     | 3.78  | 3.62  | 3.65  | 2.74  | 3.21  |  |
| diffeq        | 14.38    | 8.99  | 7.99  | 6.90  | 4.22  | 5.91  |  |
| dsip          | 3.88     | 2.43  | 1.87  | 2.20  | 2.59  | 2.05  |  |
| elliptic      | 19.09    | 9.86  | 8.54  | 8.25  | 8.29  | 8.80  |  |
| ex1010        | 6.07     | 5.09  | 4.18  | 3.66  | 5.18  | 5.13  |  |
| ex5p          | 5.35     | 4.22  | 3.62  | 3.65  | 3.56  | 4.17  |  |
| frisc         | 24.52    | 13.35 | 12.99 | 10.93 | 10.73 | 10.73 |  |
| misex3        | 4.99     | 3.34  | 3.62  | 2.98  | 4.37  | 4.17  |  |
| pdc           | 6.07     | 4.22  | 4.74  | 5.00  | 5.18  | 5.13  |  |
| s298          | 11.48    | 9.42  | 7.99  | 8.92  | 9.10  | 9.77  |  |
| s38417        | 6.78     | 7.24  | 5.21  | 5.56  | 5.85  | 5.91  |  |
| s38584        | 9.31     | 5.53  | 5.29  | 4.33  | 4.37  | 6.10  |  |
| seq           | 3.90     | 3.34  | 3.62  | 4.33  | 4.37  | 5.13  |  |
| spla          | 5.35     | 5.09  | 4.74  | 3.65  | 4.37  | 5.13  |  |
| tseng         | 15.83    | 9.42  | 7.43  | 6.90  | 6.66  | 6.87  |  |
| display_chip  | 19.09    | 11.61 | 7.99  | 7.57  | 6.66  | 5.91  |  |
| img_calc      | 37.19    | 24.28 | 15.22 | 14.29 | 12.36 | 13.63 |  |
| img_interp    | 19.81    | 12.48 | 7.43  | 6.90  | 5.85  | 5.91  |  |
| input_chip    | 17.28    | 10.30 | 7.99  | 8.25  | 6.66  | 5.91  |  |
| peak_chip     | 18.36    | 11.61 | 8.54  | 6.90  | 6.66  | 6.87  |  |
| scale125_chip | 21.62    | 14.67 | 11.33 | 10.93 | 7.48  | 8.80  |  |
| scale2_chip   | 16.55    | 10.73 | 8.54  | 4.21  | 6.66  | 6.87  |  |
| warping       | 11.12    | 6.36  | 5.76  | 4.89  | 4.22  | 2.24  |  |
| Geom. Avg.    | 9.65     | 6.43  | 5.58  | 5.26  | 5.34  | 5.51  |  |

Table F.1: Intra-Cluster Delay in nano-seconds (Cluster Size = 1)

| Circuit       | LUT Size |       |       |       |       |       |  |
|---------------|----------|-------|-------|-------|-------|-------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |
| alu4          | 5.22     | 3.83  | 4.53  | 4.86  | 4.90  | 5.70  |  |
| apex2         | 5.63     | 4.84  | 5.14  | 5.62  | 5.82  | 5.70  |  |
| apex4         | 5.22     | 3.83  | 3.92  | 4.10  | 4.90  | 4.62  |  |
| bigkey        | 4.38     | 2.78  | 2.03  | 2.47  | 1.99  | 2.27  |  |
| clma          | 15.48    | 6.32  | 8.70  | 10.09 | 9.36  | 9.81  |  |
| des           | 5.22     | 4.84  | 3.92  | 2.58  | 3.06  | 3.54  |  |
| diffeq        | 15.48    | 10.37 | 8.70  | 7.04  | 7.52  | 6.58  |  |
| dsip          | 3.97     | 3.29  | 2.03  | 2.47  | 1.99  | 2.27  |  |
| elliptic      | 21.23    | 11.38 | 9.30  | 9.33  | 9.36  | 9.81  |  |
| ex1010        | 6.45     | 5.35  | 4.53  | 4.86  | 5.82  | 6.78  |  |
| ex5p          | 5.22     | 4.34  | 4.53  | 4.86  | 4.90  | 4.62  |  |
| frisc         | 25.75    | 14.92 | 12.33 | 12.37 | 13.04 | 11.97 |  |
| misex3        | 4.81     | 3.83  | 3.32  | 3.34  | 3.98  | 5.70  |  |
| pdc           | 6.45     | 5.35  | 5.74  | 5.62  | 5.82  | 6.78  |  |
| s298          | 13.42    | 10.88 | 9.30  | 10.09 | 10.28 | 10.89 |  |
| s38417        | 10.13    | 7.84  | 4.46  | 4.75  | 6.59  | 6.58  |  |
| s38584        | 7.68     | 7.33  | 1.43  | 4.86  | 5.82  | 6.78  |  |
| seq           | 5.63     | 3.83  | 4.53  | 4.86  | 4.90  | 5.70  |  |
| spla          | 6.45     | 5.35  | 5.14  | 5.62  | 4.90  | 4.62  |  |
| tseng         | 17.53    | 10.37 | 6.88  | 7.80  | 7.52  | 7.65  |  |
| display_chip  | 19.18    | 13.41 | 8.70  | 8.56  | 7.52  | 6.58  |  |
| img_calc      | 42.60    | 27.07 | 16.58 | 15.42 | 13.96 | 15.20 |  |
| img_interp    | 20.41    | 14.42 | 8.09  | 7.04  | 7.52  | 7.65  |  |
| input_chip    | 17.94    | 11.89 | 8.70  | 9.33  | 7.52  | 6.58  |  |
| peak_chip     | 17.53    | 12.90 | 9.30  | 7.80  | 7.52  | 6.58  |  |
| scale125_chip | 25.75    | 16.95 | 12.33 | 12.37 | 9.36  | 9.81  |  |
| scale2_chip   | 17.94    | 11.89 | 9.30  | 8.56  | 7.52  | 7.65  |  |
| warping       | 11.78    | 6.83  | 5.67  | 5.52  | 4.75  | 2.46  |  |
| Geom. Avg.    | 10.47    | 7.43  | 5.84  | 6.16  | 6.11  | 6.19  |  |

Table F.2: Intra-Cluster Delay in nano-seconds (Cluster Size = 2)

| Circuit       | LUT Size |       |       |       |       |       |  |
|---------------|----------|-------|-------|-------|-------|-------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |
| alu4          | 6.18     | 3.64  | 4.79  | 5.09  | 6.05  | 6.10  |  |
| apex2         | 5.76     | 5.87  | 5.43  | 5.89  | 6.05  | 7.26  |  |
| apex4         | 4.49     | 4.20  | 4.15  | 5.09  | 5.09  | 6.10  |  |
| bigkey        | 4.06     | 3.05  | 2.15  | 2.58  | 2.87  | 2.23  |  |
| clma          | 15.00    | 10.30 | 7.29  | 9.78  | 7.66  | 9.18  |  |
| des           | 4.49     | 5.32  | 4.15  | 3.49  | 3.17  | 3.79  |  |
| diffeq        | 14.16    | 9.74  | 9.22  | 6.58  | 5.74  | 6.86  |  |
| dsip          | 3.63     | 3.60  | 2.15  | 2.58  | 2.87  | 2.23  |  |
| elliptic      | 18.79    | 10.86 | 9.86  | 8.98  | 9.57  | 9.18  |  |
| ex1010        | 6.18     | 5.87  | 4.79  | 5.89  | 6.05  | 6.10  |  |
| ex5p          | 5.76     | 4.76  | 4.79  | 4.29  | 5.09  | 4.94  |  |
| frisc         | 23.42    | 17.00 | 12.43 | 12.18 | 11.49 | 11.50 |  |
| misex3        | 4.91     | 4.20  | 4.15  | 5.09  | 5.09  | 6.10  |  |
| pdc           | 6.18     | 5.32  | 4.79  | 5.89  | 6.05  | 6.10  |  |
| s298          | 13.74    | 11.42 | 9.86  | 8.98  | 9.57  | 10.34 |  |
| s38417        | 8.69     | 8.07  | 6.65  | 7.38  | 6.70  | 6.86  |  |
| s38584        | 5.76     | 2.52  | 5.43  | 5.89  | 6.05  | 6.10  |  |
| seq           | 4.91     | 3.64  | 3.50  | 4.29  | 5.09  | 6.10  |  |
| spla          | 6.60     | 5.87  | 5.43  | 4.29  | 6.05  | 6.10  |  |
| tseng         | 15.84    | 10.86 | 8.58  | 8.18  | 7.66  | 8.02  |  |
| display_chip  | 18.37    | 14.21 | 9.22  | 8.98  | 7.66  | 6.86  |  |
| img_calc      | 35.63    | 27.04 | 17.58 | 19.38 | 15.32 | 14.98 |  |
| img_interp    | 21.32    | 15.88 | 7.93  | 7.38  | 7.66  | 8.02  |  |
| input_chip    | 18.79    | 13.09 | 7.29  | 9.78  | 7.66  | 6.86  |  |
| peak_chip     | 20.05    | 14.21 | 9.86  | 8.98  | 7.66  | 6.86  |  |
| scale125_chip | 24.68    | 18.11 | 13.08 | 12.98 | 9.57  | 10.34 |  |
| scale2_chip   | 17.95    | 13.09 | 9.22  | 8.98  | 7.66  | 8.02  |  |
| warping       | 11.21    | 8.07  | 6.00  | 4.98  | 3.83  | 4.55  |  |
| Geom. Avg.    | 9.99     | 7.72  | 6.35  | 6.55  | 6.35  | 6.55  |  |

Table F.3: Intra-Cluster Delay in nano-seconds (Cluster Size = 3)

| Circuit       | LUT Size |       |       |       |       |       |  |
|---------------|----------|-------|-------|-------|-------|-------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |
| alu4          | 4.90     | 4.87  | 5.20  | 5.31  | 5.26  | 6.09  |  |
| apex2         | 5.82     | 6.02  | 5.20  | 6.14  | 6.26  | 4.93  |  |
| apex4         | 5.36     | 4.30  | 4.50  | 5.31  | 5.26  | 4.93  |  |
| bigkey        | 4.42     | 3.69  | 2.24  | 2.59  | 2.96  | 2.21  |  |
| clma          | 16.41    | 9.99  | 9.96  | 8.45  | 9.92  | 10.31 |  |
| des           | 6.29     | 5.45  | 3.80  | 3.64  | 1.29  | 3.77  |  |
| diffeq        | 15.48    | 11.71 | 9.96  | 7.61  | 6.94  | 6.84  |  |
| dsip          | 4.88     | 3.69  | 2.24  | 2.59  | 2.96  | 1.46  |  |
| elliptic      | 23.78    | 11.14 | 10.67 | 10.12 | 12.90 | 10.31 |  |
| ex1010        | 6.29     | 6.02  | 5.20  | 5.31  | 6.26  | 6.09  |  |
| ex5p          | 4.90     | 4.87  | 4.50  | 4.47  | 5.26  | 3.77  |  |
| frisc         | 29.78    | 15.15 | 14.18 | 12.62 | 12.90 | 8.00  |  |
| misex3        | 3.98     | 3.15  | 3.80  | 3.64  | 4.27  | 4.93  |  |
| pdc           | 7.67     | 6.02  | 6.61  | 6.14  | 6.26  | 6.09  |  |
| s298          | 15.02    | 12.28 | 9.96  | 10.95 | 10.91 | 10.31 |  |
| s38417        | 11.34    | 5.98  | 7.86  | 7.61  | 6.94  | 5.68  |  |
| s38584        | 4.90     | 4.87  | 6.61  | 5.31  | 6.26  | 7.24  |  |
| seq           | 5.36     | 4.87  | 3.10  | 4.47  | 4.27  | 4.93  |  |
| spla          | 4.90     | 4.87  | 5.90  | 5.31  | 6.26  | 6.09  |  |
| tseng         | 17.79    | 12.28 | 9.26  | 8.45  | 7.93  | 8.00  |  |
| display_chip  | 21.94    | 15.15 | 9.96  | 9.28  | 7.93  | 6.84  |  |
| img_calc      | 41.76    | 32.34 | 16.98 | 20.15 | 14.88 | 14.93 |  |
| img_interp    | 22.86    | 15.15 | 4.35  | 8.45  | 7.93  | 8.00  |  |
| input_chip    | 19.63    | 12.86 | 8.56  | 10.12 | 7.93  | 6.84  |  |
| peak_chip     | 21.02    | 14.00 | 9.96  | 8.45  | 7.93  | 6.84  |  |
| scale125_chip | 27.47    | 19.16 | 12.77 | 13.46 | 9.92  | 9.15  |  |
| scale2_chip   | 20.09    | 13.43 | 10.67 | 9.28  | 7.93  | 8.00  |  |
| warping       | 13.18    | 5.41  | 7.16  | 5.10  | 4.95  | 4.53  |  |
| Geom. Avg.    | 10.80    | 8.01  | 6.65  | 6.71  | 6.44  | 6.09  |  |

Table F.4: Intra-Cluster Delay in nano-seconds (Cluster Size = 4)

| Circuit       | LUT Size |       |       |       |       |       |  |
|---------------|----------|-------|-------|-------|-------|-------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |
| alu4          | 5.93     | 5.12  | 4.72  | 4.60  | 6.34  | 6.03  |  |
| apex2         | 5.93     | 4.52  | 5.46  | 5.47  | 6.34  | 6.03  |  |
| apex4         | 4.99     | 3.92  | 4.72  | 5.47  | 5.33  | 4.89  |  |
| bigkey        | 4.03     | 3.87  | 2.37  | 2.69  | 3.01  | 2.20  |  |
| clma          | 15.31    | 11.71 | 10.50 | 9.59  | 10.06 | 10.21 |  |
| des           | 6.40     | 4.52  | 3.98  | 4.60  | 3.32  | 3.74  |  |
| diffeq        | 17.19    | 11.71 | 10.50 | 8.72  | 7.04  | 6.78  |  |
| dsip          | 4.03     | 2.66  | 2.37  | 2.69  | 3.01  | 2.20  |  |
| elliptic      | 24.24    | 12.31 | 11.23 | 10.45 | 10.06 | 10.21 |  |
| ex1010        | 5.93     | 5.72  | 6.20  | 5.47  | 5.33  | 6.03  |  |
| ex5p          | 6.40     | 5.12  | 4.72  | 4.60  | 5.33  | 4.89  |  |
| frisc         | 28.47    | 17.14 | 14.19 | 13.03 | 13.08 | 10.21 |  |
| misex3        | 5.93     | 3.91  | 3.98  | 4.60  | 5.33  | 4.89  |  |
| pdc           | 7.34     | 6.93  | 6.20  | 5.47  | 6.34  | 6.03  |  |
| s298          | 13.90    | 12.92 | 10.50 | 10.45 | 11.06 | 11.36 |  |
| s38417        | 3.09     | 9.30  | 7.54  | 7.86  | 7.04  | 6.78  |  |
| s38584        | 8.28     | 6.93  | 6.94  | 5.47  | 6.34  | 7.17  |  |
| seq           | 6.40     | 4.52  | 4.72  | 4.60  | 4.33  | 4.89  |  |
| spla          | 6.40     | 5.72  | 4.72  | 5.47  | 6.34  | 6.03  |  |
| tseng         | 19.07    | 11.11 | 9.76  | 8.72  | 8.04  | 7.92  |  |
| display_chip  | 21.89    | 14.72 | 9.76  | 9.58  | 8.04  | 6.78  |  |
| img_calc      | 44.45    | 27.99 | 18.62 | 20.79 | 14.09 | 14.79 |  |
| img_interp    | 22.83    | 15.33 | 8.28  | 7.86  | 8.04  | 7.92  |  |
| input_chip    | 20.01    | 13.52 | 7.54  | 9.58  | 8.04  | 6.78  |  |
| peak_chip     | 22.36    | 14.72 | 11.23 | 8.72  | 8.04  | 6.78  |  |
| scale125_chip | 27.53    | 20.15 | 14.19 | 13.89 | 10.06 | 10.21 |  |
| scale2_chip   | 20.95    | 14.12 | 10.50 | 8.72  | 8.04  | 7.92  |  |
| warping       | 12.49    | 9.30  | 7.54  | 5.28  | 5.02  | 4.49  |  |
| Geom. Avg.    | 10.89    | 8.41  | 7.06  | 6.87  | 6.73  | 6.38  |  |

Table F.5: Intra-Cluster Delay in nano-seconds (Cluster Size = 5)

| Circuit       | LUT Size |       |       |       |       |       |  |
|---------------|----------|-------|-------|-------|-------|-------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |
| alu4          | 6.65     | 3.43  | 5.39  | 5.56  | 6.27  | 6.05  |  |
| apex2         | 7.14     | 5.94  | 6.12  | 6.44  | 5.27  | 6.05  |  |
| apex4         | 6.16     | 4.68  | 3.93  | 5.56  | 5.27  | 6.05  |  |
| bigkey        | 5.16     | 4.01  | 2.48  | 2.74  | 2.98  | 2.20  |  |
| clma          | 17.39    | 12.16 | 9.63  | 11.51 | 6.95  | 10.25 |  |
| des           | 6.65     | 5.31  | 4.66  | 4.68  | 3.28  | 3.75  |  |
| diffeq        | 16.90    | 12.16 | 10.36 | 7.12  | 5.96  | 6.80  |  |
| dsip          | 4.67     | 3.38  | 2.34  | 2.74  | 2.98  | 2.20  |  |
| elliptic      | 24.23    | 12.79 | 11.09 | 8.88  | 7.95  | 10.25 |  |
| ex1010        | 7.63     | 6.56  | 4.66  | 5.56  | 6.27  | 6.05  |  |
| ex5p          | 5.67     | 5.31  | 4.66  | 4.68  | 5.27  | 4.90  |  |
| frisc         | 30.59    | 17.80 | 14.01 | 14.14 | 12.92 | 11.40 |  |
| misex3        | 6.16     | 4.68  | 3.93  | 5.56  | 5.27  | 4.90  |  |
| pdc           | 8.12     | 6.56  | 6.85  | 6.44  | 7.26  | 6.05  |  |
| s298          | 15.43    | 14.04 | 11.09 | 10.63 | 10.93 | 11.40 |  |
| s38417        | 11.03    | 9.65  | 8.18  | 7.12  | 6.95  | 5.65  |  |
| s38584        | 6.16     | 6.56  | 6.85  | 6.44  | 6.27  | 7.20  |  |
| seq           | 5.67     | 4.68  | 5.39  | 4.68  | 5.27  | 4.90  |  |
| spla          | 7.14     | 5.94  | 5.39  | 5.56  | 5.27  | 7.20  |  |
| tseng         | 19.34    | 12.79 | 9.63  | 8.88  | 7.95  | 7.95  |  |
| display_chip  | 22.27    | 14.04 | 10.36 | 9.75  | 7.95  | 6.80  |  |
| img_calc      | 45.75    | 34.73 | 19.84 | 17.65 | 13.91 | 14.84 |  |
| img_interp    | 24.72    | 15.92 | 8.90  | 8.00  | 7.95  | 5.65  |  |
| input_chip    | 19.34    | 14.67 | 8.18  | 10.63 | 7.95  | 6.80  |  |
| peak_chip     | 23.74    | 16.55 | 11.09 | 8.88  | 7.95  | 6.80  |  |
| scale125_chip | 26.68    | 19.69 | 14.74 | 14.14 | 8.94  | 10.25 |  |
| scale2_chip   | 21.79    | 14.04 | 10.36 | 9.75  | 7.95  | 7.95  |  |
| warping       | 13.96    | 10.28 | 7.45  | 5.37  | 4.97  | 4.50  |  |
| Geom. Avg.    | 12.01    | 8.93  | 7.21  | 7.11  | 6.49  | 6.39  |  |

Table F.6: Intra-Cluster Delay in nano-seconds (Cluster Size = 6)

| Circuit       | LUT Size |       |       |       |       |       |  |
|---------------|----------|-------|-------|-------|-------|-------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |
| alu4          | 6.24     | 5.71  | 5.42  | 5.46  | 6.37  | 6.23  |  |
| apex2         | 7.24     | 7.07  | 5.42  | 3.74  | 5.36  | 6.23  |  |
| apex4         | 6.24     | 4.36  | 3.95  | 4.60  | 5.36  | 5.04  |  |
| bigkey        | 4.73     | 4.31  | 2.35  | 2.69  | 3.02  | 2.30  |  |
| clma          | 13.66    | 11.08 | 9.68  | 9.57  | 8.08  | 10.59 |  |
| des           | 6.74     | 5.71  | 4.68  | 4.60  | 3.34  | 3.86  |  |
| diffeq        | 18.12    | 13.11 | 8.95  | 8.71  | 7.06  | 7.04  |  |
| dsip          | 3.74     | 4.31  | 2.35  | 2.69  | 3.02  | 2.30  |  |
| elliptic      | 18.62    | 11.76 | 8.21  | 6.99  | 7.06  | 10.59 |  |
| ex1010        | 7.73     | 7.07  | 5.42  | 6.32  | 5.36  | 6.23  |  |
| ex5p          | 6.74     | 5.71  | 4.68  | 3.74  | 5.36  | 5.04  |  |
| frisc         | 33.00    | 19.20 | 13.35 | 10.44 | 13.14 | 11.77 |  |
| misex3        | 5.75     | 5.04  | 3.95  | 4.60  | 5.36  | 5.04  |  |
| pdc           | 7.73     | 7.75  | 6.15  | 6.32  | 6.37  | 6.23  |  |
| s298          | 13.66    | 11.76 | 9.68  | 10.44 | 11.11 | 11.77 |  |
| s38417        | 10.68    | 9.73  | 8.21  | 6.99  | 5.04  | 7.04  |  |
| s38584        | 7.24     | 6.39  | 6.88  | 5.46  | 6.37  | 7.41  |  |
| seq           | 5.75     | 5.04  | 3.95  | 3.74  | 5.36  | 6.23  |  |
| spla          | 8.72     | 7.07  | 5.42  | 5.46  | 6.37  | 7.41  |  |
| tseng         | 20.11    | 14.47 | 9.68  | 8.71  | 8.08  | 7.04  |  |
| display_chip  | 22.59    | 17.17 | 8.95  | 9.57  | 5.04  | 7.04  |  |
| img_calc      | 46.40    | 38.16 | 19.21 | 20.77 | 15.16 | 15.32 |  |
| img_interp    | 25.07    | 19.20 | 8.21  | 8.71  | 8.08  | 8.22  |  |
| input_chip    | 19.12    | 15.82 | 8.21  | 10.44 | 8.08  | 7.04  |  |
| peak_chip     | 24.08    | 17.17 | 9.68  | 8.71  | 8.08  | 7.04  |  |
| scale125_chip | 30.03    | 21.91 | 14.81 | 13.88 | 9.09  | 10.59 |  |
| scale2_chip   | 22.09    | 15.82 | 11.15 | 9.57  | 8.08  | 8.22  |  |
| warping       | 9.20     | 10.40 | 7.48  | 5.27  | 5.04  | 4.67  |  |
| Geom. Avg.    | 11.76    | 9.72  | 6.87  | 6.63  | 6.43  | 6.69  |  |

Table F.7: Intra-Cluster Delay in nano-seconds (Cluster Size = 7)

| Circuit       | LUT Size |       |       |       |       |       |  |
|---------------|----------|-------|-------|-------|-------|-------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |
| alu4          | 7.40     | 5.63  | 5.38  | 4.58  | 5.47  | 6.32  |  |
| apex2         | 8.58     | 6.30  | 5.38  | 6.30  | 5.47  | 6.32  |  |
| apex4         | 6.81     | 4.96  | 4.65  | 4.58  | 5.47  | 6.32  |  |
| bigkey        | 3.82     | 3.58  | 2.34  | 2.69  | 3.11  | 2.34  |  |
| clma          | 15.66    | 12.92 | 11.06 | 10.41 | 8.28  | 10.76 |  |
| des           | 7.40     | 5.63  | 4.65  | 4.58  | 3.40  | 3.92  |  |
| diffeq        | 22.18    | 13.59 | 8.15  | 7.83  | 8.28  | 7.15  |  |
| dsip          | 5.01     | 4.25  | 2.34  | 2.69  | 3.11  | 2.34  |  |
| elliptic      | 30.46    | 14.25 | 11.06 | 9.55  | 9.32  | 10.76 |  |
| ex1010        | 7.99     | 6.96  | 5.38  | 5.44  | 6.50  | 6.32  |  |
| ex5p          | 6.80     | 5.63  | 4.65  | 3.73  | 5.47  | 5.12  |  |
| frisc         | 39.34    | 18.26 | 13.97 | 13.84 | 13.45 | 13.17 |  |
| misex3        | 5.62     | 5.63  | 4.65  | 4.58  | 5.47  | 5.12  |  |
| pdc           | 10.95    | 6.30  | 3.92  | 6.30  | 6.50  | 6.32  |  |
| s298          | 18.62    | 12.92 | 10.33 | 8.69  | 11.39 | 11.97 |  |
| s38417        | 13.89    | 9.59  | 7.43  | 7.83  | 7.25  | 7.15  |  |
| s38584        | 9.17     | 6.30  | 6.83  | 1.83  | 6.50  | 7.52  |  |
| seq           | 6.21     | 5.63  | 5.38  | 5.44  | 5.47  | 6.32  |  |
| spla          | 9.17     | 6.30  | 5.38  | 5.44  | 6.50  | 6.32  |  |
| tseng         | 22.77    | 12.92 | 9.61  | 8.69  | 8.28  | 5.95  |  |
| display_chip  | 28.10    | 16.26 | 9.61  | 8.69  | 7.25  | 7.15  |  |
| img_calc      | 58.29    | 37.60 | 18.33 | 19.85 | 14.49 | 15.57 |  |
| img_interp    | 30.46    | 18.92 | 8.88  | 8.69  | 7.25  | 8.36  |  |
| input_chip    | 23.95    | 15.59 | 10.33 | 10.41 | 8.28  | 7.15  |  |
| peak_chip     | 29.28    | 17.59 | 9.61  | 8.69  | 8.28  | 7.15  |  |
| scale125_chip | 32.83    | 21.59 | 14.70 | 13.84 | 9.32  | 10.76 |  |
| scale2_chip   | 25.73    | 14.92 | 10.33 | 9.55  | 6.21  | 8.36  |  |
| warping       | 16.85    | 10.92 | 3.79  | 5.26  | 5.18  | 3.54  |  |
| Geom. Avg.    | 14.03    | 9.66  | 6.88  | 6.59  | 6.72  | 6.72  |  |

Table F.8: Intra-Cluster Delay in nano-seconds (Cluster Size = 8)

| Circuit       | LUT Size |       |       |       |       |       |  |
|---------------|----------|-------|-------|-------|-------|-------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |
| alu4          | 7.82     | 5.15  | 5.35  | 4.82  | 5.60  | 6.41  |  |
| apex2         | 8.98     | 7.23  | 5.35  | 5.72  | 5.60  | 7.64  |  |
| apex4         | 6.08     | 5.15  | 4.63  | 5.72  | 5.60  | 5.19  |  |
| bigkey        | 3.75     | 3.71  | 2.34  | 2.83  | 3.18  | 2.37  |  |
| clma          | 18.22    | 13.41 | 10.29 | 10.07 | 8.48  | 10.93 |  |
| des           | 7.24     | 5.84  | 4.63  | 4.82  | 3.48  | 3.97  |  |
| diffeq        | 21.70    | 13.41 | 8.12  | 7.35  | 7.42  | 7.26  |  |
| dsip          | 4.91     | 3.71  | 2.34  | 2.83  | 3.18  | 2.37  |  |
| elliptic      | 29.80    | 15.49 | 8.85  | 10.97 | 8.48  | 10.93 |  |
| ex1010        | 8.40     | 6.53  | 5.35  | 4.82  | 5.60  | 6.41  |  |
| ex5p          | 7.82     | 5.84  | 4.63  | 4.82  | 5.60  | 5.19  |  |
| frisc         | 37.33    | 21.03 | 13.91 | 13.69 | 12.72 | 12.15 |  |
| misex3        | 7.82     | 4.46  | 3.91  | 4.82  | 5.60  | 6.41  |  |
| pdc           | 9.56     | 6.53  | 5.35  | 6.63  | 5.60  | 7.64  |  |
| s298          | 18.22    | 14.10 | 11.02 | 10.97 | 11.66 | 10.93 |  |
| s38417        | 11.85    | 10.64 | 6.68  | 7.35  | 7.42  | 7.26  |  |
| s38584        | 10.71    | 6.53  | 6.07  | 6.63  | 6.66  | 7.64  |  |
| seq           | 7.82     | 5.15  | 3.91  | 4.82  | 5.60  | 6.41  |  |
| spla          | 8.98     | 7.23  | 6.07  | 6.63  | 6.66  | 6.41  |  |
| tseng         | 25.17    | 14.80 | 9.57  | 9.16  | 8.48  | 7.26  |  |
| display_chip  | 26.91    | 16.88 | 9.57  | 9.16  | 5.30  | 7.26  |  |
| img_calc      | 57.02    | 39.05 | 20.41 | 20.93 | 15.90 | 17.04 |  |
| img_interp    | 28.64    | 19.65 | 8.85  | 9.16  | 8.48  | 8.48  |  |
| input_chip    | 24.01    | 16.18 | 10.29 | 10.97 | 8.48  | 7.26  |  |
| peak_chip     | 28.07    | 18.26 | 9.57  | 9.16  | 8.48  | 7.26  |  |
| scale125_chip | 32.70    | 22.42 | 13.91 | 14.59 | 9.54  | 10.93 |  |
| scale2_chip   | 24.01    | 14.80 | 10.29 | 10.07 | 8.48  | 8.48  |  |
| warping       | 16.48    | 11.33 | 5.23  | 5.54  | 5.30  | 4.81  |  |
| Geom. Avg.    | 14.22    | 9.91  | 6.83  | 7.23  | 6.75  | 6.99  |  |

Table F.9: Intra-Cluster Delay in nano-seconds (Cluster Size = 9)

| Circuit       | LUT Size |       |       |       |       |       |  |
|---------------|----------|-------|-------|-------|-------|-------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |
| alu4          | 6.03     | 5.73  | 4.63  | 5.73  | 5.59  | 6.48  |  |
| apex2         | 8.90     | 7.08  | 5.36  | 6.64  | 6.65  | 6.48  |  |
| apex4         | 6.61     | 5.05  | 4.63  | 4.83  | 5.59  | 6.48  |  |
| bigkey        | 5.44     | 4.32  | 2.34  | 2.84  | 3.19  | 2.41  |  |
| clma          | 16.92    | 13.14 | 6.68  | 11.00 | 10.60 | 9.83  |  |
| des           | 6.03     | 5.73  | 3.91  | 4.83  | 3.47  | 4.01  |  |
| diffeq        | 18.07    | 13.82 | 9.58  | 7.37  | 7.42  | 7.35  |  |
| dsip          | 4.87     | 3.64  | 2.34  | 2.84  | 3.19  | 2.41  |  |
| elliptic      | 29.55    | 11.79 | 11.02 | 11.00 | 4.25  | 11.06 |  |
| ex1010        | 7.75     | 7.08  | 6.08  | 5.73  | 5.59  | 7.72  |  |
| ex5p          | 7.18     | 5.05  | 4.63  | 5.73  | 4.53  | 5.25  |  |
| frisc         | 38.16    | 20.61 | 13.92 | 12.81 | 13.77 | 12.30 |  |
| misex3        | 6.61     | 5.05  | 4.63  | 5.73  | 4.53  | 5.25  |  |
| pdc           | 8.90     | 6.41  | 6.08  | 6.64  | 6.65  | 6.48  |  |
| s298          | 18.07    | 14.50 | 11.02 | 11.00 | 11.66 | 11.06 |  |
| s38417        | 13.47    | 8.39  | 7.41  | 7.37  | 7.42  | 6.12  |  |
| s38584        | 11.20    | 6.41  | 6.08  | 6.64  | 6.65  | 7.72  |  |
| seq           | 6.61     | 5.05  | 4.63  | 5.73  | 4.53  | 6.48  |  |
| spla          | 8.90     | 7.08  | 5.36  | 4.83  | 6.65  | 6.48  |  |
| tseng         | 24.38    | 13.82 | 8.85  | 9.18  | 8.48  | 8.59  |  |
| display_chip  | 28.97    | 17.22 | 9.58  | 10.09 | 7.42  | 6.12  |  |
| img_calc      | 55.95    | 38.27 | 18.26 | 20.07 | 15.89 | 16.01 |  |
| img_interp    | 28.97    | 19.25 | 8.85  | 11.00 | 8.48  | 8.59  |  |
| input_chip    | 23.23    | 15.86 | 10.30 | 11.00 | 7.42  | 7.35  |  |
| peak_chip     | 26.68    | 17.22 | 11.02 | 9.18  | 8.48  | 7.35  |  |
| scale125_chip | 34.71    | 21.97 | 13.92 | 14.63 | 9.54  | 11.06 |  |
| scale2_chip   | 25.53    | 14.50 | 10.30 | 10.09 | 8.48  | 8.59  |  |
| warping       | 15.77    | 11.11 | 5.23  | 5.56  | 5.31  | 4.88  |  |
| Geom. Avg.    | 13.90    | 9.71  | 6.87  | 7.44  | 6.63  | 6.93  |  |

Table F.10: Intra-Cluster Delay in nano-seconds (Cluster Size = 10)

# Appendix G

#### Inter-Cluster (Routing) Delay

| Circuit       | LUT Size |       |       |       |       |       |  |
|---------------|----------|-------|-------|-------|-------|-------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |
| alu4          | 23.87    | 25.94 | 20.54 | 15.86 | 17.29 | 25.58 |  |
| apex2         | 25.99    | 25.83 | 23.53 | 16.18 | 18.60 | 21.42 |  |
| apex4         | 27.04    | 19.17 | 16.16 | 14.24 | 17.31 | 15.57 |  |
| bigkey        | 16.77    | 12.41 | 11.89 | 23.67 | 12.31 | 9.48  |  |
| clma          | 54.56    | 45.21 | 46.47 | 44.92 | 35.57 | 27.08 |  |
| des           | 19.44    | 18.70 | 18.42 | 17.42 | 15.56 | 37.23 |  |
| diffeq        | 35.30    | 19.07 | 14.31 | 10.26 | 12.29 | 8.30  |  |
| dsip          | 16.93    | 13.14 | 10.85 | 11.19 | 10.33 | 27.62 |  |
| elliptic      | 45.09    | 24.30 | 22.24 | 21.36 | 19.76 | 23.08 |  |
| ex1010        | 40.54    | 49.35 | 36.57 | 71.35 | 44.33 | 20.40 |  |
| ex5p          | 19.76    | 19.59 | 14.54 | 13.55 | 13.36 | 8.82  |  |
| frisc         | 67.21    | 30.44 | 26.04 | 21.61 | 19.00 | 19.44 |  |
| misex3        | 21.44    | 21.15 | 17.35 | 19.65 | 12.50 | 18.83 |  |
| pdc           | 44.47    | 51.91 | 59.98 | 36.63 | 35.96 | 45.97 |  |
| s298          | 47.44    | 39.76 | 31.88 | 28.08 | 23.99 | 21.99 |  |
| s38417        | 38.98    | 20.43 | 23.95 | 15.42 | 10.78 | 9.31  |  |
| s38584        | 28.54    | 15.91 | 14.53 | 16.83 | 15.31 | 10.70 |  |
| seq           | 27.50    | 26.78 | 14.77 | 14.01 | 14.03 | 21.97 |  |
| spla          | 38.29    | 41.29 | 42.56 | 35.27 | 31.81 | 29.09 |  |
| tseng         | 35.49    | 18.83 | 12.87 | 11.35 | 9.55  | 7.40  |  |
| display_chip  | 44.37    | 24.88 | 14.01 | 9.19  | 8.38  | 5.98  |  |
| img_calc      | 106.72   | 67.18 | 34.57 | 23.43 | 22.35 | 14.94 |  |
| img_interp    | 49.21    | 30.08 | 17.71 | 14.68 | 13.36 | 8.39  |  |
| input_chip    | 36.93    | 24.61 | 11.13 | 10.27 | 7.50  | 6.58  |  |
| peak_chip     | 50.56    | 26.56 | 15.49 | 10.81 | 7.92  | 6.91  |  |
| scale125_chip | 67.26    | 31.31 | 16.94 | 15.74 | 12.27 | 9.36  |  |
| scale2_chip   | 40.71    | 22.09 | 14.11 | 22.52 | 8.84  | 8.22  |  |
| warping       | 29.18    | 17.85 | 10.28 | 8.12  | 6.20  | 8.73  |  |
| Geom. Avg.    | 35.63    | 25.66 | 19.48 | 17.80 | 14.97 | 14.47 |  |

Table G.1: Inter-Cluster Delay in nano-seconds (Cluster Size = 1)
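The "Geom. Avg." row in each of these tables summarizes a column across the 28 benchmark circuits. As a minimal sketch, assuming that row is the ordinary geometric mean of the per-circuit values (the function name `geometric_mean` and the three-value sample are illustrative, not taken from the thesis CAD flow), the calculation could look like this:

```python
import math


def geometric_mean(values):
    """Return the geometric mean of a sequence of positive numbers."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    # Average the logarithms and exponentiate, which is equivalent to the
    # n-th root of the product but avoids overflow for long columns.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Illustrative subset only: the LUT-size-2 delays (ns) of alu4, apex2 and
# apex4 from Table G.1. A full column would list all 28 circuits.
sample = [23.87, 25.99, 27.04]
print(f"{geometric_mean(sample):.2f} ns")
```

A geometric mean is less dominated by the largest circuits than an arithmetic mean, which is presumably why it is used to summarize benchmarks whose delays span an order of magnitude.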

| Circuit       | LUT Size |       |       |       |       |       |  |
|---------------|----------|-------|-------|-------|-------|-------|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |
| alu4          | 19.50    | 16.35 | 17.61 | 13.90 | 12.85 | 9.99  |  |
| apex2         | 22.57    | 19.97 | 14.67 | 15.66 | 13.34 | 14.25 |  |
| apex4         | 19.11    | 16.28 | 16.55 | 17.25 | 12.49 | 13.90 |  |
| bigkey        | 10.42    | 9.64  | 10.29 | 10.58 | 6.48  | 8.41  |  |
| clma          | 44.04    | 38.96 | 31.86 | 26.98 | 27.67 | 22.79 |  |
| des           | 14.71    | 14.17 | 14.32 | 16.81 | 12.45 | 10.86 |  |
| diffeq        | 22.01    | 14.21 | 13.41 | 10.72 | 7.97  | 7.69  |  |
| dsip          | 10.61    | 8.98  | 8.52  | 6.90  | 8.55  | 7.08  |  |
| elliptic      | 26.85    | 26.65 | 19.38 | 19.44 | 18.04 | 21.63 |  |
| ex1010        | 32.85    | 41.81 | 26.11 | 33.14 | 23.01 | 15.84 |  |
| ex5p          | 17.25    | 14.47 | 16.63 | 10.95 | 9.76  | 7.99  |  |
| frisc         | 37.73    | 27.81 | 21.66 | 17.83 | 16.22 | 16.15 |  |
| misex3        | 17.62    | 17.78 | 15.29 | 15.05 | 14.53 | 9.46  |  |
| pdc           | 33.23    | 30.80 | 30.48 | 22.76 | 23.02 | 16.47 |  |
| s298          | 32.24    | 23.92 | 23.94 | 21.57 | 25.74 | 16.35 |  |
| s38417        | 25.50    | 17.70 | 17.13 | 15.37 | 9.07  | 8.00  |  |
| s38584        | 15.75    | 15.29 | 15.97 | 13.33 | 9.48  | 9.02  |  |
| seq           | 19.07    | 15.52 | 12.87 | 13.79 | 9.91  | 17.70 |  |
| spla          | 32.92    | 30.31 | 29.00 | 24.62 | 20.68 | 23.13 |  |
| tseng         | 17.38    | 15.74 | 13.44 | 9.92  | 7.96  | 7.39  |  |
| display_chip  | 26.23    | 17.80 | 9.99  | 7.55  | 6.36  | 5.64  |  |
| img_calc      | 63.42    | 47.90 | 29.51 | 19.44 | 16.09 | 13.43 |  |
| img_interp    | 29.11    | 20.91 | 16.23 | 12.88 | 10.55 | 7.65  |  |
| input_chip    | 24.39    | 19.01 | 7.27  | 6.61  | 6.31  | 5.58  |  |
| peak_chip     | 32.28    | 20.82 | 11.55 | 8.99  | 7.45  | 7.06  |  |
| scale125_chip | 34.20    | 25.92 | 11.73 | 12.04 | 9.15  | 9.81  |  |
| scale2_chip   | 23.90    | 17.43 | 10.89 | 9.07  | 7.29  | 6.32  |  |
| warping       | 13.91    | 12.36 | 7.95  | 6.24  | 5.51  | 6.96  |  |
| Geom. Avg.    | 23.59    | 19.67 | 15.61 | 13.69 | 11.49 | 10.62 |  |

Table G.2: Inter-Cluster Delay in nano-seconds (Cluster Size = 2)

| Circuit       | LUT Size |       |       |       |       |       |
|---------------|----------|-------|-------|-------|-------|-------|
|               | 2        | 3     | 4     | 5     | 6     | 7     |
| alu4          | 15.91    | 19.61 | 9.69  | 10.03 | 8.44  | 9.41  |
| apex2         | 17.46    | 13.52 | 12.72 | 14.00 | 9.92  | 9.25  |
| apex4         | 17.67    | 15.09 | 12.54 | 15.31 | 11.43 | 12.22 |
| bigkey        | 9.48     | 7.77  | 6.10  | 6.39  | 4.95  | 6.13  |
| clma          | 40.09    | 28.86 | 29.72 | 24.17 | 24.59 | 25.05 |
| des           | 13.96    | 11.33 | 10.88 | 10.45 | 9.84  | 10.39 |
| diffeq        | 16.83    | 12.79 | 8.60  | 11.70 | 8.21  | 6.28  |
| dsip          | 9.62     | 7.37  | 6.29  | 6.32  | 5.69  | 7.00  |
| elliptic      | 20.21    | 22.07 | 17.08 | 19.13 | 13.66 | 16.05 |
| ex1010        | 28.84    | 32.88 | 23.31 | 18.97 | 26.58 | 13.28 |
| ex5p          | 14.80    | 13.26 | 11.16 | 10.65 | 8.10  | 8.23  |
| frisc         | 31.14    | 18.78 | 20.37 | 17.05 | 17.72 | 14.15 |
| misex3        | 14.58    | 15.01 | 10.93 | 9.37  | 8.81  | 7.77  |
| pdc           | 34.76    | 26.33 | 32.11 | 23.28 | 28.30 | 21.51 |
| s298          | 30.06    | 25.67 | 23.02 | 21.65 | 19.91 | 20.18 |
| s38417        | 23.81    | 20.43 | 12.78 | 13.06 | 8.81  | 7.01  |
| s38584        | 16.80    | 15.85 | 13.20 | 9.06  | 8.99  | 10.63 |
| seq           | 15.54    | 13.51 | 12.54 | 14.01 | 9.93  | 8.83  |
| spla          | 26.03    | 25.70 | 19.72 | 23.81 | 28.30 | 23.21 |
| tseng         | 16.44    | 11.33 | 8.66  | 8.36  | 8.02  | 6.72  |
| display_chip  | 20.24    | 14.54 | 8.82  | 5.59  | 5.33  | 4.80  |
| img_calc      | 56.17    | 43.98 | 23.04 | 12.35 | 12.41 | 12.17 |
| img_interp    | 24.41    | 17.05 | 15.86 | 15.44 | 8.49  | 5.63  |
| input_chip    | 16.42    | 9.88  | 8.49  | 4.18  | 4.59  | 5.84  |
| peak_chip     | 19.78    | 15.42 | 8.60  | 6.66  | 6.80  | 6.98  |
| scale125_chip | 24.95    | 19.42 | 11.47 | 10.15 | 7.79  | 6.67  |
| scale2_chip   | 20.02    | 12.40 | 9.81  | 7.65  | 6.63  | 6.37  |
| warping       | 10.43    | 7.35  | 5.71  | 5.00  | 5.17  | 4.60  |
| Geom. Avg.    | 19.84    | 16.14 | 12.61 | 11.30 | 10.05 | 9.38  |

Table G.3: Inter-Cluster Delay in nano-seconds (Cluster Size = 3)

| Circuit       | LUT Size |       |       |       |       |       |
|---------------|-------|-------|-------|-------|-------|-------|
|               | 2     | 3     | 4     | 5     | 6     | 7     |
| alu4          | 30.29 | 14.95 | 10.76 | 9.01  | 8.59  | 8.13  |
| apex2         | 19.53 | 13.11 | 21.35 | 9.77  | 13.71 | 11.92 |
| apex4         | 16.86 | 18.22 | 13.35 | 16.80 | 13.74 | 9.73  |
| bigkey        | 9.55  | 8.98  | 6.26  | 5.46  | 5.06  | 6.54  |
| clma          | 34.54 | 27.59 | 24.81 | 26.66 | 23.98 | 23.43 |
| des           | 13.20 | 11.02 | 11.44 | 10.31 | 12.03 | 9.57  |
| diffeq        | 14.55 | 9.48  | 7.28  | 10.17 | 6.53  | 6.49  |
| dsip          | 8.04  | 8.71  | 8.86  | 4.93  | 8.00  | 6.61  |
| elliptic      | 15.68 | 22.11 | 15.05 | 18.74 | 23.59 | 13.34 |
| ex1010        | 28.21 | 34.15 | 19.24 | 18.83 | 22.49 | 15.15 |
| ex5p          | 15.20 | 13.99 | 10.76 | 12.39 | 9.84  | 8.35  |
| frisc         | 25.22 | 20.75 | 20.27 | 15.46 | 18.96 | 23.92 |
| misex3        | 17.44 | 14.55 | 11.85 | 11.59 | 8.89  | 9.81  |
| pdc           | 57.48 | 33.16 | 24.81 | 25.97 | 21.93 | 18.21 |
| s298          | 21.73 | 24.81 | 22.35 | 19.93 | 17.38 | 17.05 |
| s38417        | 18.85 | 17.72 | 11.46 | 11.01 | 7.26  | 10.21 |
| s38584        | 16.26 | 11.43 | 8.27  | 8.73  | 7.61  | 6.87  |
| seq           | 15.41 | 10.69 | 12.82 | 10.36 | 11.14 | 10.13 |
| spla          | 29.49 | 37.32 | 17.40 | 22.43 | 21.53 | 13.07 |
| tseng         | 17.63 | 8.99  | 8.03  | 7.72  | 7.11  | 6.61  |
| display_chip  | 14.31 | 12.25 | 5.85  | 4.96  | 4.29  | 4.04  |
| img_calc      | 48.90 | 36.17 | 21.13 | 10.41 | 12.73 | 11.84 |
| img_interp    | 18.91 | 15.07 | 18.48 | 8.84  | 10.22 | 6.26  |
| input_chip    | 14.30 | 12.33 | 5.80  | 3.11  | 4.33  | 5.66  |
| peak_chip     | 18.09 | 14.86 | 7.28  | 6.88  | 5.07  | 4.97  |
| scale125_chip | 23.25 | 16.73 | 10.97 | 8.40  | 7.60  | 8.44  |
| scale2_chip   | 17.25 | 10.64 | 7.88  | 7.86  | 7.30  | 4.65  |
| warping       | 7.72  | 10.85 | 4.09  | 4.84  | 4.33  | 4.23  |
| Geom. Avg.    | 18.82 | 15.80 | 11.72 | 10.32 | 10.05 | 9.06  |

Table G.4: Inter-Cluster Delay in nano-seconds (Cluster Size = 4)

| Circuit       | LUT Size |       |       |       |       |       |  |  |
|---------------|----------|-------|-------|-------|-------|-------|--|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |  |
| alu4          | 19.84    | 11.57 | 9.96  | 9.52  | 6.99  | 7.53  |  |  |
| apex2         | 18.98    | 15.29 | 12.74 | 11.93 | 8.78  | 10.46 |  |  |
| apex4         | 17.53    | 12.95 | 11.70 | 10.28 | 10.98 | 10.99 |  |  |
| bigkey        | 8.53     | 6.16  | 5.93  | 5.25  | 5.17  | 4.13  |  |  |
| clma          | 34.52    | 26.56 | 21.57 | 28.18 | 22.95 | 18.02 |  |  |
| des           | 11.09    | 13.18 | 11.22 | 10.24 | 9.12  | 11.24 |  |  |
| diffeq        | 10.88    | 9.70  | 5.62  | 6.93  | 6.41  | 5.94  |  |  |
| dsip          | 9.24     | 7.00  | 7.62  | 5.51  | 4.99  | 5.04  |  |  |
| elliptic      | 11.66    | 16.56 | 14.43 | 14.03 | 9.98  | 12.05 |  |  |
| ex1010        | 44.08    | 52.62 | 35.73 | 29.53 | 26.97 | 13.86 |  |  |
| ex5p          | 12.79    | 15.30 | 10.95 | 12.60 | 9.59  | 6.86  |  |  |
| frisc         | 20.28    | 15.57 | 15.11 | 14.02 | 16.70 | 17.30 |  |  |
| misex3        | 12.52    | 24.44 | 12.50 | 15.74 | 8.62  | 8.87  |  |  |
| pdc           | 30.65    | 44.92 | 30.16 | 24.92 | 25.51 | 20.92 |  |  |
| s298          | 32.15    | 21.76 | 26.21 | 23.92 | 18.19 | 15.90 |  |  |
| s38417        | 36.28    | 12.07 | 13.98 | 12.55 | 7.38  | 6.86  |  |  |
| s38584        | 13.32    | 10.43 | 8.10  | 12.92 | 13.82 | 7.08  |  |  |
| seq           | 15.23    | 12.53 | 10.46 | 10.94 | 10.92 | 8.77  |  |  |
| spla          | 28.74    | 27.45 | 32.59 | 19.34 | 19.01 | 21.62 |  |  |
| tseng         | 13.11    | 11.34 | 7.64  | 8.33  | 6.25  | 6.37  |  |  |
| display_chip  | 15.16    | 10.60 | 6.77  | 4.33  | 4.26  | 3.84  |  |  |
| img_calc      | 48.88    | 40.73 | 20.25 | 9.98  | 12.00 | 10.70 |  |  |
| img_interp    | 18.68    | 13.13 | 15.80 | 12.53 | 7.67  | 5.38  |  |  |
| input_chip    | 12.38    | 11.59 | 7.15  | 3.98  | 3.74  | 6.23  |  |  |
| peak_chip     | 16.24    | 13.59 | 6.34  | 5.89  | 6.34  | 4.66  |  |  |
| scale125_chip | 21.47    | 15.36 | 9.64  | 6.60  | 6.20  | 7.52  |  |  |
| scale2_chip   | 12.84    | 10.96 | 8.44  | 8.33  | 6.67  | 5.42  |  |  |
| warping       | 7.38     | 5.43  | 3.63  | 5.15  | 3.91  | 3.19  |  |  |
| Geom. Avg.    | 17.42    | 14.87 | 11.61 | 10.57 | 9.14  | 8.34  |  |  |

Table G.5: Inter-Cluster Delay in nano-seconds (Cluster Size = 5)

| Circuit       | LUT Size |       |       |       |       |       |
|---------------|-------|-------|-------|-------|-------|-------|
|               | 2     | 3     | 4     | 5     | 6     | 7     |
| alu4          | 12.81 | 14.12 | 12.06 | 9.23  | 7.45  | 8.94  |
| apex2         | 17.03 | 14.02 | 10.81 | 10.96 | 9.12  | 10.48 |
| apex4         | 22.35 | 16.06 | 12.58 | 12.21 | 8.73  | 8.04  |
| bigkey        | 6.47  | 6.07  | 6.86  | 4.50  | 4.54  | 4.87  |
| clma          | 26.77 | 29.81 | 26.35 | 20.97 | 24.35 | 15.32 |
| des           | 10.07 | 10.34 | 10.86 | 9.08  | 8.74  | 9.82  |
| diffeq        | 10.62 | 8.07  | 5.96  | 10.34 | 7.26  | 6.52  |
| dsip          | 6.23  | 8.16  | 4.97  | 5.02  | 4.80  | 5.59  |
| elliptic      | 12.49 | 14.58 | 11.48 | 18.71 | 20.01 | 9.95  |
| ex1010        | 25.16 | 37.59 | 28.62 | 30.71 | 15.34 | 12.82 |
| ex5p          | 14.20 | 12.94 | 12.64 | 13.70 | 8.95  | 6.46  |
| frisc         | 22.26 | 15.77 | 16.85 | 15.34 | 13.39 | 14.82 |
| misex3        | 12.84 | 10.94 | 12.83 | 7.47  | 7.14  | 8.18  |
| pdc           | 30.98 | 31.85 | 24.82 | 18.39 | 28.65 | 21.07 |
| s298          | 28.12 | 26.15 | 18.69 | 19.92 | 14.68 | 13.44 |
| s38417        | 17.16 | 13.37 | 10.88 | 9.32  | 6.20  | 14.71 |
| s38584        | 13.15 | 10.21 | 8.41  | 7.48  | 7.82  | 7.44  |
| seq           | 15.42 | 13.15 | 10.47 | 9.18  | 7.55  | 8.55  |
| spla          | 27.42 | 22.12 | 35.04 | 21.14 | 17.24 | 10.83 |
| tseng         | 10.75 | 9.56  | 7.21  | 7.30  | 6.59  | 6.06  |
| display_chip  | 13.44 | 10.57 | 4.58  | 5.39  | 3.93  | 3.51  |
| img_calc      | 39.63 | 29.29 | 17.03 | 13.32 | 11.34 | 9.91  |
| img_interp    | 17.59 | 14.06 | 13.43 | 11.19 | 6.82  | 7.39  |
| input_chip    | 14.37 | 9.35  | 5.53  | 3.15  | 3.53  | 3.66  |
| peak_chip     | 14.51 | 10.61 | 7.20  | 6.10  | 6.75  | 4.96  |
| scale125_chip | 23.18 | 12.82 | 7.01  | 6.47  | 8.52  | 5.56  |
| scale2_chip   | 10.40 | 10.01 | 7.62  | 6.15  | 5.56  | 4.76  |
| warping       | 6.57  | 3.57  | 4.17  | 5.02  | 4.55  | 3.54  |
| Geom. Avg.    | 15.46 | 13.31 | 10.81 | 9.79  | 8.59  | 7.93  |

Table G.6: Inter-Cluster Delay in nano-seconds (Cluster Size = 6)

| Circuit       | LUT Size |       |       |       |       |       |  |  |
|---------------|----------|-------|-------|-------|-------|-------|--|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |  |
| alu4          | 13.35    | 10.70 | 10.96 | 9.59  | 6.32  | 6.89  |  |  |
| apex2         | 21.42    | 10.65 | 11.98 | 13.58 | 11.67 | 8.81  |  |  |
| apex4         | 13.49    | 18.92 | 13.54 | 12.64 | 7.92  | 13.17 |  |  |
| bigkey        | 6.21     | 5.26  | 5.62  | 6.31  | 4.17  | 4.87  |  |  |
| clma          | 44.02    | 26.12 | 22.58 | 29.92 | 20.71 | 21.47 |  |  |
| des           | 9.78     | 10.06 | 10.29 | 8.36  | 7.29  | 8.60  |  |  |
| diffeq        | 8.91     | 7.96  | 7.08  | 5.44  | 5.71  | 4.89  |  |  |
| dsip          | 7.41     | 5.35  | 4.92  | 4.91  | 4.79  | 4.43  |  |  |
| elliptic      | 20.46    | 19.98 | 19.49 | 20.71 | 17.42 | 11.70 |  |  |
| ex1010        | 24.50    | 29.46 | 19.20 | 20.90 | 22.53 | 12.52 |  |  |
| ex5p          | 15.29    | 14.37 | 10.60 | 12.04 | 9.33  | 6.04  |  |  |
| frisc         | 16.02    | 16.59 | 16.74 | 19.48 | 14.88 | 13.84 |  |  |
| misex3        | 13.26    | 11.35 | 11.21 | 8.40  | 7.24  | 8.32  |  |  |
| pdc           | 26.68    | 30.39 | 25.38 | 32.58 | 25.02 | 21.91 |  |  |
| s298          | 25.95    | 25.16 | 23.52 | 18.73 | 14.38 | 10.36 |  |  |
| s38417        | 23.57    | 17.35 | 25.63 | 9.90  | 11.92 | 8.76  |  |  |
| s38584        | 12.44    | 12.23 | 7.57  | 10.72 | 7.76  | 9.47  |  |  |
| seq           | 17.54    | 10.29 | 10.30 | 12.64 | 8.16  | 6.89  |  |  |
| spla          | 25.24    | 23.30 | 21.43 | 24.59 | 13.61 | 12.17 |  |  |
| tseng         | 9.79     | 7.78  | 6.39  | 7.03  | 5.85  | 6.58  |  |  |
| display_chip  | 14.38    | 8.50  | 5.69  | 4.93  | 7.63  | 2.99  |  |  |
| img_calc      | 39.32    | 28.25 | 15.14 | 10.17 | 10.33 | 9.35  |  |  |
| img_interp    | 15.20    | 8.96  | 12.20 | 8.37  | 7.28  | 4.93  |  |  |
| input_chip    | 12.97    | 8.70  | 5.29  | 2.40  | 3.22  | 3.83  |  |  |
| peak_chip     | 12.47    | 9.68  | 7.43  | 6.59  | 5.53  | 6.27  |  |  |
| scale125_chip | 13.03    | 13.55 | 5.95  | 4.66  | 8.92  | 6.19  |  |  |
| scale2_chip   | 9.68     | 7.37  | 6.72  | 5.11  | 4.58  | 4.08  |  |  |
| warping       | 10.46    | 3.75  | 3.20  | 4.67  | 4.10  | 3.15  |  |  |
| Geom. Avg.    | 15.40    | 12.40 | 10.61 | 9.83  | 8.60  | 7.58  |  |  |

Table G.7: Inter-Cluster Delay in nano-seconds (Cluster Size = 7)

| Circuit       | LUT Size |       |       |       |       |       |
|---------------|-------|-------|-------|-------|-------|-------|
|               | 2     | 3     | 4     | 5     | 6     | 7     |
| alu4          | 12.46 | 10.63 | 8.91  | 8.48  | 9.78  | 7.97  |
| apex2         | 13.92 | 11.69 | 10.13 | 13.49 | 11.01 | 9.76  |
| apex4         | 13.49 | 18.72 | 10.58 | 14.76 | 10.74 | 10.31 |
| bigkey        | 7.58  | 5.51  | 5.71  | 5.65  | 5.71  | 5.05  |
| clma          | 34.48 | 24.50 | 24.65 | 19.65 | 22.27 | 20.44 |
| des           | 9.55  | 10.27 | 9.62  | 7.93  | 6.90  | 10.45 |
| diffeq        | 7.87  | 6.60  | 6.52  | 7.97  | 5.09  | 5.86  |
| dsip          | 6.62  | 6.96  | 5.72  | 4.36  | 4.33  | 4.75  |
| elliptic      | 9.35  | 13.32 | 15.60 | 17.03 | 15.16 | 10.39 |
| ex1010        | 25.66 | 32.85 | 19.07 | 19.11 | 16.92 | 13.21 |
| ex5p          | 13.10 | 12.23 | 10.48 | 11.43 | 10.07 | 6.75  |
| frisc         | 14.56 | 15.99 | 17.12 | 13.51 | 14.90 | 14.32 |
| misex3        | 13.20 | 9.65  | 11.00 | 10.28 | 8.82  | 8.12  |
| pdc           | 25.69 | 24.38 | 30.03 | 19.67 | 17.43 | 14.28 |
| s298          | 29.17 | 21.68 | 19.53 | 25.12 | 22.87 | 11.26 |
| s38417        | 17.36 | 14.62 | 10.77 | 7.78  | 6.24  | 6.41  |
| s38584        | 12.30 | 9.90  | 9.02  | 13.48 | 7.12  | 6.73  |
| seq           | 14.37 | 11.63 | 10.26 | 7.81  | 9.28  | 8.75  |
| spla          | 23.74 | 22.16 | 21.68 | 29.07 | 17.45 | 13.40 |
| tseng         | 9.61  | 9.52  | 6.01  | 7.49  | 5.92  | 8.77  |
| display_chip  | 11.16 | 9.50  | 5.19  | 5.02  | 4.40  | 2.86  |
| img_calc      | 34.91 | 27.97 | 15.64 | 11.04 | 10.33 | 9.49  |
| img_interp    | 15.08 | 8.40  | 10.39 | 8.37  | 9.46  | 4.34  |
| input_chip    | 13.79 | 6.34  | 2.17  | 1.99  | 3.23  | 4.04  |
| peak_chip     | 11.72 | 9.41  | 7.70  | 5.97  | 5.18  | 4.88  |
| scale125_chip | 18.82 | 12.16 | 7.68  | 5.68  | 6.45  | 5.02  |
| scale2_chip   | 10.74 | 9.08  | 6.75  | 4.91  | 8.07  | 3.61  |
| warping       | 5.40  | 2.80  | 6.59  | 4.45  | 3.50  | 4.16  |
| Geom. Avg.    | 13.89 | 11.76 | 10.01 | 9.37  | 8.64  | 7.51  |

Table G.8: Inter-Cluster Delay in nano-seconds (Cluster Size = 8)

| Circuit       | LUT Size |       |       |       |       |       |  |  |
|---------------|----------|-------|-------|-------|-------|-------|--|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |  |
| alu4          | 16.30    | 11.15 | 8.89  | 8.42  | 7.73  | 6.92  |  |  |
| apex2         | 14.25    | 14.34 | 10.89 | 9.98  | 10.53 | 9.14  |  |  |
| apex4         | 15.07    | 12.50 | 14.58 | 9.33  | 13.90 | 10.26 |  |  |
| bigkey        | 7.20     | 5.47  | 6.37  | 4.83  | 4.85  | 5.19  |  |  |
| clma          | 26.51    | 21.35 | 24.93 | 29.29 | 23.13 | 17.25 |  |  |
| des           | 11.74    | 11.72 | 9.81  | 8.90  | 8.55  | 8.05  |  |  |
| diffeq        | 8.93     | 7.05  | 6.37  | 6.22  | 5.64  | 4.87  |  |  |
| dsip          | 6.38     | 6.02  | 5.37  | 6.94  | 4.56  | 4.64  |  |  |
| elliptic      | 7.36     | 15.18 | 16.66 | 13.70 | 11.68 | 11.82 |  |  |
| ex1010        | 21.97    | 32.24 | 17.85 | 19.12 | 18.34 | 10.78 |  |  |
| ex5p          | 12.83    | 14.73 | 13.11 | 9.40  | 7.03  | 6.01  |  |  |
| frisc         | 15.44    | 15.08 | 14.91 | 22.05 | 17.91 | 13.38 |  |  |
| misex3        | 14.06    | 10.66 | 10.86 | 7.39  | 7.57  | 9.70  |  |  |
| pdc           | 30.51    | 28.33 | 24.17 | 22.00 | 23.09 | 13.65 |  |  |
| s298          | 21.02    | 19.99 | 24.09 | 13.64 | 13.36 | 13.30 |  |  |
| s38417        | 20.07    | 13.96 | 19.55 | 9.65  | 6.23  | 7.54  |  |  |
| s38584        | 9.16     | 9.48  | 10.49 | 7.00  | 7.77  | 6.96  |  |  |
| seq           | 10.44    | 11.74 | 12.97 | 9.35  | 7.74  | 8.66  |  |  |
| spla          | 21.55    | 24.17 | 22.72 | 27.93 | 22.89 | 14.16 |  |  |
| tseng         | 12.04    | 6.67  | 4.46  | 7.71  | 6.40  | 5.12  |  |  |
| display_chip  | 11.88    | 7.52  | 5.21  | 4.42  | 6.64  | 2.59  |  |  |
| img_calc      | 33.23    | 24.65 | 12.19 | 10.49 | 9.82  | 7.85  |  |  |
| img_interp    | 15.21    | 8.07  | 9.58  | 9.18  | 7.17  | 4.35  |  |  |
| input_chip    | 12.08    | 6.43  | 3.10  | 1.23  | 3.42  | 3.54  |  |  |
| peak_chip     | 10.09    | 7.39  | 7.25  | 5.06  | 4.82  | 4.86  |  |  |
| scale125_chip | 20.31    | 10.76 | 6.74  | 5.47  | 5.70  | 4.74  |  |  |
| scale2_chip   | 13.89    | 9.05  | 6.90  | 4.13  | 4.66  | 3.71  |  |  |
| warping       | 5.50     | 3.34  | 4.86  | 4.13  | 3.57  | 2.94  |  |  |
| Geom. Avg.    | 13.74    | 11.45 | 10.30 | 8.65  | 8.38  | 7.02  |  |  |

Table G.9: Inter-Cluster Delay in nano-seconds (Cluster Size = 9)

| Circuit       | LUT Size |       |       |       |       |       |  |  |
|---------------|----------|-------|-------|-------|-------|-------|--|--|
|               | 2        | 3     | 4     | 5     | 6     | 7     |  |  |
| alu4          | 20.26    | 9.20  | 11.54 | 7.48  | 7.36  | 6.47  |  |  |
| apex2         | 12.72    | 13.11 | 9.95  | 11.72 | 9.10  | 9.27  |  |  |
| apex4         | 18.79    | 12.12 | 11.78 | 10.41 | 9.34  | 10.71 |  |  |
| bigkey        | 6.12     | 5.70  | 6.17  | 4.78  | 4.81  | 4.30  |  |  |
| clma          | 25.72    | 22.23 | 22.70 | 19.24 | 17.61 | 17.85 |  |  |
| des           | 11.23    | 8.80  | 8.57  | 7.94  | 8.81  | 8.20  |  |  |
| diffeq        | 10.34    | 6.20  | 4.52  | 8.04  | 5.14  | 4.31  |  |  |
| dsip          | 6.11     | 6.43  | 4.94  | 4.88  | 4.21  | 4.32  |  |  |
| elliptic      | 7.25     | 20.07 | 14.14 | 15.50 | 20.25 | 9.05  |  |  |
| ex1010        | 24.02    | 34.91 | 23.36 | 17.23 | 17.24 | 11.26 |  |  |
| ex5p          | 14.12    | 14.86 | 10.27 | 10.05 | 8.28  | 6.50  |  |  |
| frisc         | 15.45    | 15.31 | 14.19 | 16.14 | 17.08 | 17.91 |  |  |
| misex3        | 14.57    | 10.60 | 8.47  | 8.24  | 8.25  | 8.16  |  |  |
| pdc           | 23.85    | 27.83 | 20.91 | 17.42 | 13.97 | 22.00 |  |  |
| s298          | 21.35    | 17.91 | 19.79 | 14.44 | 14.89 | 12.52 |  |  |
| s38417        | 23.18    | 17.03 | 10.94 | 12.01 | 7.01  | 8.86  |  |  |
| s38584        | 7.07     | 10.68 | 10.66 | 7.32  | 7.24  | 7.41  |  |  |
| seq           | 11.41    | 10.73 | 9.91  | 8.24  | 8.41  | 8.34  |  |  |
| spla          | 21.74    | 24.50 | 21.77 | 16.40 | 15.24 | 20.73 |  |  |
| tseng         | 5.95     | 7.63  | 6.23  | 6.70  | 5.63  | 3.87  |  |  |
| display_chip  | 8.75     | 6.35  | 4.95  | 3.92  | 4.06  | 4.40  |  |  |
| img_calc      | 29.34    | 21.71 | 16.31 | 10.53 | 9.38  | 8.64  |  |  |
| img_interp    | 13.67    | 8.05  | 10.40 | 5.76  | 7.39  | 4.00  |  |  |
| input_chip    | 11.42    | 6.57  | 3.24  | 1.29  | 4.81  | 3.96  |  |  |
| peak_chip     | 11.34    | 9.38  | 6.53  | 5.86  | 4.63  | 4.56  |  |  |
| scale125_chip | 13.33    | 9.98  | 5.46  | 4.92  | 6.92  | 3.51  |  |  |
| scale2_chip   | 9.32     | 8.83  | 6.36  | 5.00  | 4.49  | 5.13  |  |  |
| warping       | 5.65     | 2.53  | 4.76  | 4.44  | 3.62  | 3.07  |  |  |
| Geom. Avg.    | 12.87    | 11.30 | 9.56  | 8.18  | 8.04  | 7.27  |  |  |

Table G.10: Inter-Cluster Delay in nano-seconds (Cluster Size = 10)

# Appendix H

#### Number of BLE Levels on Critical Path

| Circuit       | LUT Size |       |      |      |      |      |  |
|---------------|----------|-------|------|------|------|------|--|
|               | 2        | 3     | 4    | 5    | 6    | 7    |  |
| alu4          | 14       | 7     | 7    | 6    | 4    | 4    |  |
| apex2         | 15       | 7     | 6    | 7    | 6    | 5    |  |
| apex4         | 11       | 7     | 6    | 6    | 5    | 5    |  |
| bigkey        | 10       | 5     | 3    | 2    | 3    | 2    |  |
| clma          | 36       | 12    | 9    | 7    | 9    | 9    |  |
| des           | 13       | 8     | 6    | 5    | 3    | 3    |  |
| diffeq        | 39       | 20    | 14   | 10   | 5    | 6    |  |
| dsip          | 10       | 5     | 3    | 3    | 3    | 2    |  |
| elliptic      | 52       | 22    | 15   | 12   | 10   | 9    |  |
| ex1010        | 16       | 11    | 7    | 5    | 6    | 5    |  |
| ex5p          | 14       | 9     | 6    | 5    | 4    | 4    |  |
| frisc         | 67       | 30    | 23   | 16   | 13   | 11   |  |
| misex3        | 13       | 7     | 6    | 4    | 5    | 4    |  |
| pdc           | 16       | 9     | 8    | 7    | 6    | 5    |  |
| s298          | 31       | 21    | 14   | 13   | 11   | 10   |  |
| s38417        | 18       | 16    | 9    | 8    | 7    | 6    |  |
| s38584        | 25       | 12    | 9    | 6    | 5    | 6    |  |
| seq           | 10       | 7     | 6    | 6    | 5    | 5    |  |
| spla          | 14       | 11    | 8    | 5    | 5    | 5    |  |
| tseng         | 43       | 21    | 13   | 10   | 8    | 7    |  |
| display_chip  | 52       | 26    | 14   | 11   | 8    | 6    |  |
| img_calc      | 102      | 55    | 27   | 21   | 15   | 14   |  |
| img_interp    | 54       | 28    | 13   | 10   | 7    | 6    |  |
| input_chip    | 47       | 23    | 14   | 12   | 8    | 6    |  |
| peak_chip     | 50       | 26    | 15   | 10   | 8    | 7    |  |
| scale125_chip | 59       | 33    | 20   | 16   | 9    | 9    |  |
| scale2_chip   | 45       | 24    | 15   | 6    | 8    | 7    |  |
| warping       | 30       | 14    | 10   | 7    | 5    | 2    |  |
| Geom. Avg.    | 25.72    | 13.97 | 9.53 | 7.43 | 6.26 | 5.46 |  |

Table H.1: Number of BLEs on Critical Path (Cluster Size = 1)

| Circuit       | LUT Size |       |      |      |      |      |  |
|---------------|----------|-------|------|------|------|------|--|
|               | 2        | 3     | 4    | 5    | 6    | 7    |  |
| alu4          | 12       | 7     | 7    | 6    | 5    | 5    |  |
| apex2         | 13       | 9     | 8    | 7    | 6    | 5    |  |
| apex4         | 12       | 7     | 6    | 5    | 5    | 4    |  |
| bigkey        | 10       | 5     | 3    | 3    | 2    | 2    |  |
| clma          | 37       | 12    | 14   | 13   | 10   | 9    |  |
| des           | 12       | 9     | 6    | 3    | 3    | 3    |  |
| diffeq        | 37       | 20    | 14   | 9    | 8    | 6    |  |
| dsip          | 9        | 6     | 3    | 3    | 2    | 2    |  |
| elliptic      | 51       | 22    | 15   | 12   | 10   | 9    |  |
| ex1010        | 15       | 10    | 7    | 6    | 6    | 6    |  |
| ex5p          | 12       | 8     | 7    | 6    | 5    | 4    |  |
| frisc         | 62       | 29    | 20   | 16   | 14   | 11   |  |
| misex3        | 11       | 7     | 5    | 4    | 4    | 5    |  |
| pdc           | 15       | 10    | 9    | 7    | 6    | 6    |  |
| s298          | 32       | 21    | 15   | 13   | 11   | 10   |  |
| s38417        | 24       | 15    | 7    | 6    | 7    | 6    |  |
| s38584        | 18       | 14    | 2    | 6    | 6    | 6    |  |
| seq           | 13       | 7     | 7    | 6    | 5    | 5    |  |
| spla          | 15       | 10    | 8    | 7    | 5    | 4    |  |
| tseng         | 42       | 20    | 11   | 10   | 8    | 7    |  |
| display_chip  | 46       | 26    | 14   | 11   | 8    | 6    |  |
| img_calc      | 103      | 53    | 27   | 20   | 15   | 14   |  |
| img_interp    | 49       | 28    | 13   | 9    | 8    | 7    |  |
| input_chip    | 43       | 23    | 14   | 12   | 8    | 6    |  |
| peak_chip     | 42       | 25    | 15   | 10   | 8    | 6    |  |
| scale125_chip | 62       | 33    | 20   | 16   | 10   | 9    |  |
| scale2_chip   | 43       | 23    | 15   | 11   | 8    | 7    |  |
| warping       | 28       | 13    | 9    | 7    | 5    | 2    |  |
| Geom. Avg.    | 24.64    | 14.06 | 9.16 | 7.73 | 6.37 | 5.53 |  |

Table H.2: Number of BLEs on Critical Path (Cluster Size = 2)

| Circuit       | LUT Size |       |      |      |      |      |  |
|---------------|----------|-------|------|------|------|------|--|
|               | 2        | 3     | 4    | 5    | 6    | 7    |  |
| alu4          | 14       | 6     | 7    | 6    | 6    | 5    |  |
| apex2         | 13       | 10    | 8    | 7    | 6    | 6    |  |
| apex4         | 10       | 7     | 6    | 6    | 5    | 5    |  |
| bigkey        | 9        | 5     | 3    | 3    | 3    | 2    |  |
| clma          | 35       | 18    | 11   | 12   | 8    | 8    |  |
| des           | 10       | 9     | 6    | 4    | 3    | 3    |  |
| diffeq        | 33       | 17    | 14   | 8    | 6    | 6    |  |
| dsip          | 8        | 6     | 3    | 3    | 3    | 2    |  |
| elliptic      | 44       | 19    | 15   | 11   | 10   | 8    |  |
| ex1010        | 14       | 10    | 7    | 7    | 6    | 5    |  |
| ex5p          | 13       | 8     | 7    | 5    | 5    | 4    |  |
| frisc         | 55       | 30    | 19   | 15   | 12   | 10   |  |
| misex3        | 11       | 7     | 6    | 6    | 5    | 5    |  |
| pdc           | 14       | 9     | 7    | 7    | 6    | 5    |  |
| s298          | 32       | 20    | 15   | 11   | 10   | 9    |  |
| s38417        | 20       | 14    | 10   | 9    | 7    | 6    |  |
| s38584        | 13       | 4     | 8    | 7    | 6    | 5    |  |
| seq           | 11       | 6     | 5    | 5    | 5    | 5    |  |
| spla          | 15       | 10    | 8    | 5    | 6    | 5    |  |
| tseng         | 37       | 19    | 13   | 10   | 8    | 7    |  |
| display_chip  | 43       | 25    | 14   | 11   | 8    | 6    |  |
| img_calc      | 84       | 48    | 27   | 24   | 16   | 13   |  |
| img_interp    | 50       | 28    | 12   | 9    | 8    | 7    |  |
| input_chip    | 44       | 23    | 11   | 12   | 8    | 6    |  |
| peak_chip     | 47       | 25    | 15   | 11   | 8    | 6    |  |
| scale125_chip | 58       | 32    | 20   | 16   | 10   | 9    |  |
| scale2_chip   | 42       | 23    | 14   | 11   | 8    | 7    |  |
| warping       | 26       | 14    | 9    | 6    | 4    | 4    |  |
| Geom. Avg.    | 22.93    | 13.23 | 9.43 | 7.87 | 6.48 | 5.58 |  |

Table H.3: Number of BLEs on Critical Path (Cluster Size = 3)

| Circuit       | LUT Size |       |      |      |      |      |
|---------------|-------|-------|------|------|------|------|
|               | 2     | 3     | 4    | 5    | 6    | 7    |
| alu4          | 10    | 8     | 7    | 6    | 5    | 5    |
| apex2         | 12    | 10    | 7    | 7    | 6    | 4    |
| apex4         | 11    | 7     | 6    | 6    | 5    | 4    |
| bigkey        | 9     | 6     | 3    | 3    | 3    | 2    |
| clma          | 35    | 17    | 14   | 10   | 10   | 9    |
| des           | 13    | 9     | 5    | 4    | 1    | 3    |
| diffeq        | 33    | 20    | 14   | 9    | 7    | 6    |
| dsip          | 10    | 6     | 3    | 3    | 3    | 1    |
| elliptic      | 51    | 19    | 15   | 12   | 13   | 9    |
| ex1010        | 13    | 10    | 7    | 6    | 6    | 5    |
| ex5p          | 10    | 8     | 6    | 5    | 5    | 3    |
| frisc         | 64    | 26    | 20   | 15   | 13   | 7    |
| misex3        | 8     | 5     | 5    | 4    | 4    | 4    |
| pdc           | 16    | 10    | 9    | 7    | 6    | 5    |
| s298          | 32    | 21    | 14   | 13   | 11   | 9    |
| s38417        | 24    | 10    | 11   | 9    | 7    | 5    |
| s38584        | 10    | 8     | 9    | 6    | 6    | 6    |
| seq           | 11    | 8     | 4    | 5    | 4    | 4    |
| spla          | 10    | 8     | 8    | 6    | 6    | 5    |
| tseng         | 38    | 21    | 13   | 10   | 8    | 7    |
| display_chip  | 47    | 26    | 14   | 11   | 8    | 6    |
| img_calc      | 90    | 56    | 24   | 24   | 15   | 13   |
| img_interp    | 49    | 26    | 6    | 10   | 8    | 7    |
| input_chip    | 42    | 22    | 12   | 12   | 8    | 6    |
| peak_chip     | 45    | 24    | 14   | 10   | 8    | 6    |
| scale125_chip | 59    | 33    | 18   | 16   | 10   | 8    |
| scale2_chip   | 43    | 23    | 15   | 11   | 8    | 7    |
| warping       | 28    | 9     | 10   | 6    | 5    | 4    |
| Geom. Avg.    | 22.64 | 13.42 | 9.13 | 7.77 | 6.31 | 5.14 |

Table H.4: Number of BLEs on Critical Path (Cluster Size = 4)

| Circuit       | LUT Size |       |      |      |      |      |  |
|---------------|----------|-------|------|------|------|------|--|
|               | 2        | 3     | 4    | 5    | 6    | 7    |  |
| alu4          | 12       | 8     | 6    | 5    | 6    | 5    |  |
| apex2         | 12       | 7     | 7    | 6    | 6    | 5    |  |
| apex4         | 10       | 6     | 6    | 6    | 5    | 4    |  |
| bigkey        | 8        | 6     | 3    | 3    | 3    | 2    |  |
| clma          | 32       | 19    | 14   | 11   | 10   | 9    |  |
| des           | 13       | 7     | 5    | 5    | 3    | 3    |  |
| diffeq        | 36       | 19    | 14   | 10   | 7    | 6    |  |
| dsip          | 8        | 4     | 3    | 3    | 3    | 2    |  |
| elliptic      | 51       | 20    | 15   | 12   | 10   | 9    |  |
| ex1010        | 12       | 9     | 8    | 6    | 5    | 5    |  |
| ex5p          | 13       | 8     | 6    | 5    | 5    | 4    |  |
| frisc         | 60       | 28    | 19   | 15   | 13   | 9    |  |
| misex3        | 12       | 6     | 5    | 5    | 5    | 4    |  |
| pdc           | 15       | 11    | 8    | 6    | 6    | 5    |  |
| s298          | 29       | 21    | 14   | 12   | 11   | 10   |  |
| s38417        | 6        | 15    | 10   | 9    | 7    | 6    |  |
| s38584        | 17       | 11    | 9    | 6    | 6    | 6    |  |
| seq           | 13       | 7     | 6    | 5    | 4    | 4    |  |
| spla          | 13       | 9     | 6    | 6    | 6    | 5    |  |
| tseng         | 40       | 18    | 13   | 10   | 8    | 7    |  |
| display_chip  | 46       | 24    | 13   | 11   | 8    | 6    |  |
| img_calc      | 94       | 46    | 25   | 24   | 14   | 13   |  |
| img_interp    | 48       | 25    | 11   | 9    | 8    | 7    |  |
| input_chip    | 42       | 22    | 10   | 11   | 8    | 6    |  |
| peak_chip     | 47       | 24    | 15   | 10   | 8    | 6    |  |
| scale125_chip | 58       | 33    | 19   | 16   | 10   | 9    |  |
| scale2_chip   | 44       | 23    | 14   | 10   | 8    | 7    |  |
| warping       | 26       | 15    | 10   | 6    | 5    | 4    |  |
| Geom. Avg.    | 22.40    | 13.39 | 9.22 | 7.72 | 6.54 | 5.50 |  |

Table H.5: Number of BLEs on Critical Path (Cluster Size = 5)

| Circuit       | LUT Size |       |      |      |      |      |  |
|---------------|----------|-------|------|------|------|------|--|
|               | 2        | 3     | 4    | 5    | 6    | 7    |  |
| alu4          | 13       | 5     | 7    | 6    | 6    | 5    |  |
| apex2         | 14       | 9     | 8    | 7    | 5    | 5    |  |
| apex4         | 12       | 7     | 5    | 6    | 5    | 5    |  |
| bigkey        | 10       | 6     | 3    | 3    | 3    | 2    |  |
| clma          | 35       | 19    | 13   | 13   | 7    | 9    |  |
| des           | 13       | 8     | 6    | 5    | 3    | 3    |  |
| diffeq        | 34       | 19    | 14   | 8    | 6    | 6    |  |
| dsip          | 9        | 5     | 3    | 3    | 3    | 2    |  |
| elliptic      | 49       | 20    | 15   | 10   | 8    | 9    |  |
| ex1010        | 15       | 10    | 6    | 6    | 6    | 5    |  |
| ex5p          | 11       | 8     | 6    | 5    | 5    | 4    |  |
| frisc         | 62       | 28    | 19   | 16   | 13   | 10   |  |
| misex3        | 12       | 7     | 5    | 6    | 5    | 4    |  |
| pdc           | 16       | 10    | 9    | 7    | 7    | 5    |  |
| s298          | 31       | 22    | 15   | 12   | 11   | 10   |  |
| s38417        | 22       | 15    | 11   | 8    | 7    | 5    |  |
| s38584        | 12       | 10    | 9    | 7    | 6    | 6    |  |
| seq           | 11       | 7     | 7    | 5    | 5    | 4    |  |
| spla          | 14       | 9     | 7    | 6    | 5    | 6    |  |
| tseng         | 39       | 20    | 13   | 10   | 8    | 7    |  |
| display_chip  | 45       | 22    | 14   | 11   | 8    | 6    |  |
| img_calc      | 93       | 55    | 27   | 20   | 14   | 13   |  |
| img_interp    | 50       | 25    | 12   | 9    | 8    | 5    |  |
| input_chip    | 39       | 23    | 11   | 12   | 8    | 6    |  |
| peak_chip     | 48       | 26    | 15   | 10   | 8    | 6    |  |
| scale125_chip | 54       | 31    | 20   | 16   | 9    | 9    |  |
| scale2_chip   | 44       | 22    | 14   | 11   | 8    | 7    |  |
| warping       | 28       | 16    | 10   | 6    | 5    | 4    |  |
| Geom. Avg.    | 23.86    | 13.72 | 9.52 | 7.87 | 6.39 | 5.49 |  |

Table H.6: Number of BLEs on Critical Path (Cluster Size = 6)

| Circuit       | LUT Size |    |       |      |          |      |  |  |  |  |
|---------------|-------|-------|-------|------|----------|------|--|--|--|--|
|               | 2     | 3     | 4     | 5    | 6        | 7    |  |  |  |  |
| alu4          | 12    | 8     | 7     | 6    | 6        | 5    |  |  |  |  |
| apex2         | 14    | 10    | 7     | 4    | 5        | 5    |  |  |  |  |
| apex4         | 12    | 6     | 5     | 5    | 5        | 4    |  |  |  |  |
| bigkey        | 9     | 6     | 3     | 3    | 3        | 2    |  |  |  |  |
| clma          | 27    | 16    | 13    | 11   | 8        | 9    |  |  |  |  |
| des           | 13    | 8     | 6     | 5    | 3        | 3    |  |  |  |  |
| diffeq        | 36    | 19    | 12    | 10   | 7        | 6    |  |  |  |  |
| dsip          | 7     | 6     | 3     | 3    | 3        | 2    |  |  |  |  |
| elliptic      | 37    | 17    | 11    | 8    | 7        | 9    |  |  |  |  |
| ex1010        | 15    | 10    | 7     | 7    | 5        | 5    |  |  |  |  |
| ex5p          | 13    | 8     | 6     | 4    | 5        | 4    |  |  |  |  |
| frisc         | 66    | 28    | 18    | 12   | 13       | 10   |  |  |  |  |
| misex3        | 11    | 7     | 5     | 5    | 5        | 4    |  |  |  |  |
| pdc           | 15    | 11    | 8     | 7    | 6        | 5    |  |  |  |  |
| s298          | 27    | 17    | 13    | 12   | 11       | 10   |  |  |  |  |
| s38417        | 21    | 14    | 11    | 8    | 5        | 6    |  |  |  |  |
| s38584        | 14    | 9     | 9     | 6    | 6        | 6    |  |  |  |  |
| seq           | 11    | 7     | 5     | 4    | 5        | 5    |  |  |  |  |
| spla          | 17    | 10    | 7     | 6    | 6        | 6    |  |  |  |  |
| tseng         | 40    | 21    | 13    | 10   | 8        | 6    |  |  |  |  |
| display_chip  | 45    | 25    | 12    | 11   | 5        | 6    |  |  |  |  |
| img_calc      | 93    | 56    | 26    | 24   | 15       | 13   |  |  |  |  |
| img_interp    | 50    | 28    | 11    | 10   | 8        | 7    |  |  |  |  |
| input_chip    | 38    | 23    | 11    | 12   | 8        | 6    |  |  |  |  |
| peak_chip     | 48    | 25    | 13    | 10   | 8        | 6    |  |  |  |  |
| scale125_chip | 60    | 32    | 20    | 16   | 9        | 9    |  |  |  |  |
| scale2_chip   | 44    | 23    | 15    | 11   | 8        | 7    |  |  |  |  |
| warping       | 18    | 15    | 10    | 6    | 5        | 4    |  |  |  |  |
| Geom. Avg.    | 23.01 | 13.88 | 9.04  | 7.45 | 6.22     | 5.57 |  |  |  |  |

Table H.7: Number of BLEs on Critical Path (Cluster Size = 7)

| Circuit       | LUT Size |       |      |      |      |      |  |  |
|---------------|-------|----------|------|------|------|------|--|--|
|               | 2     | 3        | 4    | 5    | 6    | 7    |  |  |
| alu4          | 12    | 8        | 7    | 5    | 5    | 5    |  |  |
| apex2         | 14    | 9        | 7    | 7    | 5    | 5    |  |  |
| apex4         | 11    | 7        | 6    | 5    | 5    | 5    |  |  |
| bigkey        | 6     | 5        | 3    | 3    | 3    | 2    |  |  |
| clma          | 26    | 19       | 15   | 12   | 8    | 9    |  |  |
| des           | 12    | 8        | 6    | 5    | 3    | 3    |  |  |
| diffeq        | 37    | 20       | 11   | 9    | 8    | 6    |  |  |
| dsip          | 8     | 6        | 3    | 3    | 3    | 2    |  |  |
| elliptic      | 51    | 21       | 15   | 11   | 9    | 9    |  |  |
| ex1010        | 13    | 10       | 7    | 6    | 6    | 5    |  |  |
| ex5p          | 11    | 8        | 6    | 4    | 5    | 4    |  |  |
| frisc         | 66    | 27       | 19   | 16   | 13   | 11   |  |  |
| misex3        | 9     | 8        | 6    | 5    | 5    | 4    |  |  |
| pdc           | 18    | 9        | 5    | 7    | 6    | 5    |  |  |
| s298          | 31    | 19       | 14   | 10   | 11   | 10   |  |  |
| s38417        | 23    | 14       | 10   | 9    | 7    | 6    |  |  |
| s38584        | 15    | 9        | 9    | 2    | 6    | 6    |  |  |
| seq           | 10    | 8        | 7    | 6    | 5    | 5    |  |  |
| spla          | 15    | 9        | 7    | 6    | 6    | 5    |  |  |
| tseng         | 38    | 19       | 13   | 10   | 8    | 5    |  |  |
| display_chip  | 47    | 24       | 13   | 10   | 7    | 6    |  |  |
| img_calc      | 98    | 56       | 25   | 23   | 14   | 13   |  |  |
| img_interp    | 51    | 28       | 12   | 10   | 7    | 7    |  |  |
| input_chip    | 40    | 23       | 14   | 12   | 8    | 6    |  |  |
| peak_chip     | 49    | 26       | 13   | 10   | 8    | 6    |  |  |
| scale125_chip | 55    | 32       | 20   | 16   | 9    | 9    |  |  |
| scale2_chip   | 43    | 22       | 14   | 11   | 6    | 7    |  |  |
| warping       | 28    | 16       | 5    | 6    | 5    | 3    |  |  |
| Geom. Avg.    | 23.08 | 14.00    | 9.12 | 7.42 | 6.35 | 5.50 |  |  |

Table H.8: Number of BLEs on Critical Path (Cluster Size = 8)

| Circuit       | LUT Size |       |      |      |      |      |
|---------------|----------|-------|------|------|------|------|
|               | 2        | 3     | 4    | 5    | 6    | 7    |
| alu4          | 13       | 7     | 7    | 5    | 5    | 5    |
| apex2         | 15       | 10    | 7    | 6    | 5    | 6    |
| apex4         | 10       | 7     | 6    | 6    | 5    | 4    |
| bigkey        | 6        | 5     | 3    | 3    | 3    | 2    |
| clma          | 31       | 19    | 14   | 11   | 8    | 9    |
| des           | 12       | 8     | 6    | 5    | 3    | 3    |
| diffeq        | 37       | 19    | 11   | 8    | 7    | 6    |
| dsip          | 8        | 5     | 3    | 3    | 3    | 2    |
| elliptic      | 51       | 22    | 12   | 12   | 8    | 9    |
| ex1010        | 14       | 9     | 7    | 5    | 5    | 5    |
| ex5p          | 13       | 8     | 6    | 5    | 5    | 4    |
| frisc         | 64       | 30    | 19   | 15   | 12   | 10   |
| misex3        | 13       | 6     | 5    | 5    | 5    | 5    |
| pdc           | 16       | 9     | 7    | 7    | 5    | 6    |
| s298          | 31       | 20    | 15   | 12   | 11   | 9    |
| s38417        | 20       | 15    | 9    | 8    | 7    | 6    |
| s38584        | 18       | 9     | 8    | 7    | 6    | 6    |
| seq           | 13       | 7     | 5    | 5    | 5    | 5    |
| spla          | 15       | 10    | 8    | 7    | 6    | 5    |
| tseng         | 43       | 21    | 13   | 10   | 8    | 6    |
| display_chip  | 46       | 24    | 13   | 10   | 5    | 6    |
| img_calc      | 98       | 56    | 28   | 23   | 15   | 14   |
| img_interp    | 49       | 28    | 12   | 10   | 8    | 7    |
| input_chip    | 41       | 23    | 14   | 12   | 8    | 6    |
| peak_chip     | 48       | 26    | 13   | 10   | 8    | 6    |
| scale125_chip | 56       | 32    | 19   | 16   | 9    | 9    |
| scale2_chip   | 41       | 21    | 14   | 11   | 8    | 7    |
| warping       | 28       | 16    | 7    | 6    | 5    | 4    |
| Geom. Avg.    | 23.95    | 13.83 | 9.09 | 7.74 | 6.23 | 5.64 |

Table H.9: Number of BLEs on Critical Path (Cluster Size = 9)

| Circuit       | LUT Size |    |      |      |      |      |
|---------------|-------|-------|------|------|------|------|
|               | 2     | 3     | 4    | 5    | 6    | 7    |
| alu4          | 10    | 8     | 6    | 6    | 5    | 5    |
| apex2         | 15    | 10    | 7    | 7    | 6    | 5    |
| apex4         | 11    | 7     | 6    | 5    | 5    | 5    |
| bigkey        | 9     | 6     | 3    | 3    | 3    | 2    |
| clma          | 29    | 19    | 9    | 12   | 10   | 8    |
| des           | 10    | 8     | 5    | 5    | 3    | 3    |
| diffeq        | 31    | 20    | 13   | 8    | 7    | 6    |
| dsip          | 8     | 5     | 3    | 3    | 3    | 2    |
| elliptic      | 51    | 17    | 15   | 12   | 4    | 9    |
| ex1010        | 13    | 10    | 8    | 6    | 5    | 6    |
| ex5p          | 12    | 7     | 6    | 6    | 4    | 4    |
| frisc         | 66    | 30    | 19   | 14   | 13   | 10   |
| misex3        | 11    | 7     | 6    | 6    | 4    | 4    |
| pdc           | 15    | 9     | 8    | 7    | 6    | 5    |
| s298          | 31    | 21    | 15   | 12   | 11   | 9    |
| s38417        | 23    | 12    | 10   | 8    | 7    | 5    |
| s38584        | 19    | 9     | 8    | 7    | 6    | 6    |
| seq           | 11    | 7     | 6    | 6    | 4    | 5    |
| spla          | 15    | 10    | 7    | 5    | 6    | 5    |
| tseng         | 42    | 20    | 12   | 10   | 8    | 7    |
| display_chip  | 50    | 25    | 13   | 11   | 7    | 5    |
| img_calc      | 97    | 56    | 25   | 22   | 15   | 13   |
| img_interp    | 50    | 28    | 12   | 12   | 8    | 7    |
| input_chip    | 40    | 23    | 14   | 12   | 7    | 6    |
| peak_chip     | 46    | 25    | 15   | 10   | 8    | 6    |
| scale125_chip | 60    | 32    | 19   | 16   | 9    | 9    |
| scale2_chip   | 44    | 21    | 14   | 11   | 8    | 7    |
| warping       | 27    | 16    | 7    | 6    | 5    | 4    |
| Geom. Avg.    | 23.60 | 13.83 | 9.14 | 7.96 | 6.12 | 5.52 |

Table H.10: Number of BLEs on Critical Path (Cluster Size = 10)

## APPENDIX I

#### Number of Cluster Levels on Critical Path

| Circuit       | LUT Size |       |      |      |      |      |  |
|---------------|----------|-------|------|------|------|------|--|
|               | 2        | 3     | 4    | 5    | 6    | 7    |  |
| alu4          | 14       | 7     | 7    | 6    | 4    | 4    |  |
| apex2         | 15       | 7     | 6    | 7    | 6    | 5    |  |
| apex4         | 11       | 7     | 6    | 6    | 5    | 5    |  |
| bigkey        | 10       | 5     | 3    | 2    | 3    | 2    |  |
| clma          | 37       | 12    | 9    | 7    | 10   | 9    |  |
| des           | 13       | 8     | 6    | 5    | 3    | 3    |  |
| diffeq        | 40       | 21    | 15   | 11   | 6    | 7    |  |
| dsip          | 10       | 5     | 3    | 3    | 3    | 2    |  |
| elliptic      | 53       | 23    | 16   | 13   | 11   | 10   |  |
| ex1010        | 16       | 11    | 7    | 5    | 6    | 5    |  |
| ex5p          | 14       | 9     | 6    | 5    | 4    | 4    |  |
| frisc         | 68       | 31    | 24   | 17   | 14   | 12   |  |
| misex3        | 13       | 7     | 6    | 4    | 5    | 4    |  |
| pdc           | 16       | 9     | 8    | 7    | 6    | 5    |  |
| s298          | 32       | 22    | 15   | 14   | 12   | 11   |  |
| s38417        | 19       | 17    | 10   | 9    | 8    | 7    |  |
| s38584        | 26       | 13    | 10   | 6    | 5    | 7    |  |
| seq           | 10       | 7     | 6    | 6    | 5    | 5    |  |
| spla          | 14       | 11    | 8    | 5    | 5    | 5    |  |
| tseng         | 44       | 22    | 14   | 11   | 9    | 8    |  |
| display_chip  | 53       | 27    | 15   | 12   | 8    | 7    |  |
| img_calc      | 102      | 55    | 27   | 22   | 15   | 14   |  |
| img_interp    | 54       | 28    | 14   | 10   | 7    | 6    |  |
| input_chip    | 48       | 24    | 15   | 13   | 9    | 6    |  |
| peak_chip     | 51       | 26    | 16   | 11   | 9    | 7    |  |
| scale125_chip | 59       | 34    | 21   | 17   | 9    | 9    |  |
| scale2_chip   | 46       | 25    | 16   | 6    | 8    | 8    |  |
| warping       | 31       | 14    | 10   | 7    | 5    | 2    |  |
| Geom. Avg.    | 26.05    | 14.23 | 9.85 | 7.67 | 6.50 | 5.69 |  |

Table I.1: Number of Clusters on Critical Path (Cluster Size = 1)

| Circuit       | LUT Size |       |      |      |      |      |  |
|---------------|-------|----------|------|------|------|------|--|
|               | 2     | 3        | 4    | 5    | 6    | 7    |  |
| alu4          | 8     | 7        | 7    | 6    | 5    | 5    |  |
| apex2         | 10    | 9        | 8    | 6    | 6    | 5    |  |
| apex4         | 9     | 7        | 6    | 5    | 4    | 4    |  |
| bigkey        | 6     | 4        | 3    | 3    | 2    | 2    |  |
| clma          | 28    | 12       | 15   | 14   | 10   | 8    |  |
| des           | 9     | 9        | 6    | 3    | 3    | 3    |  |
| diffeq        | 26    | 20       | 15   | 10   | 8    | 7    |  |
| dsip          | 7     | 5        | 3    | 2    | 2    | 2    |  |
| elliptic      | 34    | 21       | 16   | 12   | 11   | 9    |  |
| ex1010        | 10    | 10       | 7    | 6    | 5    | 6    |  |
| ex5p          | 9     | 8        | 6    | 6    | 5    | 4    |  |
| frisc         | 45    | 29       | 20   | 17   | 13   | 12   |  |
| misex3        | 9     | 7        | 3    | 3    | 4    | 5    |  |
| pdc           | 12    | 9        | 7    | 7    | 5    | 5    |  |
| s298          | 24    | 21       | 16   | 13   | 12   | 11   |  |
| s38417        | 20    | 14       | 8    | 7    | 8    | 7    |  |
| s38584        | 14    | 14       | 3    | 7    | 6    | 7    |  |
| seq           | 11    | 6        | 5    | 6    | 5    | 5    |  |
| spla          | 13    | 9        | 7    | 7    | 5    | 4    |  |
| tseng         | 24    | 19       | 11   | 11   | 9    | 8    |  |
| display_chip  | 33    | 24       | 13   | 10   | 8    | 7    |  |
| img_calc      | 81    | 53       | 27   | 19   | 13   | 12   |  |
| img_interp    | 42    | 27       | 13   | 9    | 8    | 8    |  |
| input_chip    | 32    | 24       | 11   | 9    | 8    | 6    |  |
| peak_chip     | 33    | 21       | 14   | 9    | 9    | 6    |  |
| scale125_chip | 48    | 32       | 17   | 14   | 11   | 9    |  |
| scale2_chip   | 33    | 23       | 14   | 11   | 9    | 7    |  |
| warping       | 17    | 12       | 8    | 6    | 6    | 2    |  |
| Geom. Avg.    | 18.28 | 13.42    | 8.74 | 7.45 | 6.40 | 5.65 |  |

Table I.2: Number of Clusters on Critical Path (Cluster Size = 2)

| Circuit       | LUT Size |       |      |      |      |      |
|---------------|----------|-------|------|------|------|------|
|               | 2        | 3     | 4    | 5    | 6    | 7    |
| alu4          | 8        | 5     | 5    | 5    | 5    | 4    |
| apex2         | 9        | 6     | 6    | 5    | 4    | 5    |
| apex4         | 7        | 6     | 5    | 4    | 5    | 4    |
| bigkey        | 6        | 4     | 3    | 2    | 2    | 2    |
| clma          | 26       | 15    | 12   | 11   | 7    | 7    |
| des           | 8        | 7     | 5    | 4    | 3    | 3    |
| diffeq        | 23       | 16    | 13   | 9    | 7    | 7    |
| dsip          | 6        | 4     | 3    | 2    | 2    | 2    |
| elliptic      | 28       | 14    | 10   | 9    | 9    | 7    |
| ex1010        | 10       | 9     | 6    | 6    | 6    | 4    |
| ex5p          | 9        | 7     | 7    | 4    | 4    | 4    |
| frisc         | 41       | 20    | 17   | 12   | 12   | 8    |
| misex3        | 9        | 6     | 6    | 5    | 5    | 4    |
| pdc           | 12       | 7     | 5    | 6    | 5    | 5    |
| s298          | 21       | 17    | 13   | 12   | 11   | 10   |
| s38417        | 15       | 13    | 10   | 10   | 8    | 6    |
| s38584        | 11       | 4     | 9    | 6    | 6    | 4    |
| seq           | 8        | 6     | 5    | 4    | 4    | 4    |
| spla          | 9        | 8     | 6    | 4    | 4    | 4    |
| tseng         | 20       | 14    | 12   | 10   | 8    | 7    |
| display_chip  | 30       | 19    | 11   | 8    | 7    | 6    |
| img_calc      | 59       | 40    | 25   | 19   | 14   | 12   |
| img_interp    | 34       | 20    | 11   | 8    | 6    | 8    |
| input_chip    | 27       | 15    | 9    | 7    | 6    | 6    |
| peak_chip     | 27       | 17    | 13   | 8    | 7    | 6    |
| scale125_chip | 35       | 26    | 14   | 12   | 10   | 7    |
| scale2_chip   | 28       | 16    | 14   | 9    | 8    | 6    |
| warping       | 16       | 12    | 6    | 5    | 4    | 3    |
| Geom. Avg.    | 15.75    | 10.52 | 8.21 | 6.48 | 5.77 | 5.08 |

Table I.3: Number of Clusters on Critical Path (Cluster Size = 3)

| Circuit       | LUT Size |       |      |      |      |      |  |  |
|---------------|-------|----------|------|------|------|------|--|--|
|               | 2     | 3        | 4    | 5    | 6    | 7    |  |  |
| alu4          | 7     | 4        | 6    | 4    | 4    | 4    |  |  |
| apex2         | 9     | 7        | 6    | 5    | 4    | 4    |  |  |
| apex4         | 6     | 7        | 4    | 4    | 4    | 4    |  |  |
| bigkey        | 6     | 3        | 2    | 3    | 3    | 2    |  |  |
| clma          | 22    | 10       | 13   | 11   | 8    | 8    |  |  |
| des           | 8     | 6        | 5    | 4    | 1    | 3    |  |  |
| diffeq        | 20    | 15       | 12   | 9    | 7    | 7    |  |  |
| dsip          | 6     | 5        | 2    | 2    | 2    | 1    |  |  |
| elliptic      | 20    | 11       | 10   | 8    | 8    | 7    |  |  |
| ex1010        | 8     | 8        | 7    | 6    | 4    | 3    |  |  |
| ex5p          | 7     | 6        | 6    | 5    | 5    | 2    |  |  |
| frisc         | 30    | 24       | 16   | 15   | 14   | 7    |  |  |
| misex3        | 7     | 5        | 5    | 4    | 4    | 3    |  |  |
| pdc           | 7     | 7        | 6    | 6    | 5    | 4    |  |  |
| s298          | 20    | 17       | 15   | 11   | 12   | 9    |  |  |
| s38417        | 16    | 10       | 11   | 9    | 8    | 5    |  |  |
| s38584        | 10    | 8        | 7    | 5    | 6    | 5    |  |  |
| seq           | 8     | 6        | 4    | 4    | 3    | 3    |  |  |
| spla          | 8     | 5        | 5    | 5    | 4    | 5    |  |  |
| tseng         | 19    | 14       | 11   | 10   | 9    | 7    |  |  |
| display_chip  | 23    | 16       | 8    | 7    | 5    | 6    |  |  |
| img_calc      | 55    | 46       | 22   | 18   | 11   | 11   |  |  |
| img_interp    | 29    | 19       | 5    | 8    | 6    | 8    |  |  |
| input_chip    | 21    | 18       | 8    | 6    | 6    | 6    |  |  |
| peak_chip     | 22    | 19       | 9    | 7    | 7    | 5    |  |  |
| scale125_chip | 33    | 24       | 12   | 11   | 8    | 9    |  |  |
| scale2_chip   | 23    | 15       | 10   | 12   | 7    | 6    |  |  |
| warping       | 13    | 9        | 4    | 7    | 6    | 5    |  |  |
| Geom. Avg.    | 13.57 | 10.01    | 7.11 | 6.52 | 5.37 | 4.73 |  |  |

Table I.4: Number of Clusters on Critical Path (Cluster Size = 4)

| Circuit       | LUT Size |       |      |      |      |      |  |
|---------------|-------|----------|------|------|------|------|--|
|               | 2     | 3        | 4    | 5    | 6    | 7    |  |
| alu4          | 8     | 6        | 6    | 5    | 3    | 4    |  |
| apex2         | 6     | 4        | 7    | 3    | 5    | 4    |  |
| apex4         | 8     | 5        | 4    | 5    | 3    | 4    |  |
| bigkey        | 8     | 3        | 2    | 3    | 3    | 2    |  |
| clma          | 20    | 16       | 10   | 10   | 8    | 8    |  |
| des           | 7     | 5        | 5    | 5    | 3    | 3    |  |
| diffeq        | 16    | 14       | 9    | 9    | 6    | 6    |  |
| dsip          | 6     | 3        | 2    | 3    | 2    | 2    |  |
| elliptic      | 16    | 9        | 10   | 8    | 8    | 7    |  |
| ex1010        | 6     | 5        | 5    | 6    | 4    | 4    |  |
| ex5p          | 11    | 7        | 6    | 4    | 4    | 4    |  |
| frisc         | 27    | 21       | 15   | 12   | 12   | 10   |  |
| misex3        | 8     | 5        | 3    | 4    | 3    | 3    |  |
| pdc           | 11    | 6        | 6    | 5    | 5    | 5    |  |
| s298          | 20    | 14       | 13   | 10   | 10   | 9    |  |
| s38417        | 7     | 11       | 11   | 9    | 7    | 7    |  |
| s38584        | 10    | 9        | 6    | 7    | 5    | 5    |  |
| seq           | 6     | 5        | 6    | 4    | 3    | 4    |  |
| spla          | 7     | 8        | 6    | 5    | 4    | 3    |  |
| tseng         | 18    | 15       | 9    | 10   | 7    | 6    |  |
| display_chip  | 22    | 14       | 9    | 5    | 6    | 5    |  |
| img_calc      | 53    | 30       | 22   | 16   | 13   | 10   |  |
| img_interp    | 29    | 18       | 9    | 8    | 6    | 7    |  |
| input_chip    | 20    | 18       | 9    | 6    | 5    | 7    |  |
| peak_chip     | 22    | 15       | 9    | 6    | 8    | 5    |  |
| scale125_chip | 30    | 22       | 12   | 10   | 8    | 7    |  |
| scale2_chip   | 20    | 15       | 10   | 10   | 8    | 6    |  |
| warping       | 13    | 10       | 3    | 6    | 5    | 4    |  |
| Geom. Avg.    | 12.91 | 9.31     | 6.93 | 6.30 | 5.26 | 4.95 |  |

Table I.5: Number of Clusters on Critical Path (Cluster Size = 5)

| Circuit       | LUT Size |   |      |      |      |      |
|---------------|-------|------|------|------|------|------|
|               | 2     | 3    | 4    | 5    | 6    | 7    |
| alu4          | 8     | 4    | 4    | 3    | 3    | 3    |
| apex2         | 7     | 6    | 5    | 5    | 5    | 4    |
| apex4         | 7     | 6    | 5    | 4    | 4    | 4    |
| bigkey        | 3     | 2    | 3    | 2    | 2    | 2    |
| clma          | 18    | 18   | 11   | 11   | 5    | 8    |
| des           | 8     | 7    | 6    | 5    | 3    | 3    |
| diffeq        | 17    | 13   | 10   | 8    | 6    | 6    |
| dsip          | 4     | 4    | 2    | 3    | 2    | 2    |
| elliptic      | 17    | 10   | 9    | 7    | 6    | 6    |
| ex1010        | 8     | 8    | 5    | 5    | 5    | 3    |
| ex5p          | 7     | 7    | 5    | 3    | 4    | 4    |
| frisc         | 29    | 22   | 15   | 12   | 12   | 8    |
| misex3        | 7     | 5    | 4    | 4    | 4    | 4    |
| pdc           | 7     | 6    | 5    | 6    | 4    | 4    |
| s298          | 16    | 14   | 13   | 12   | 11   | 8    |
| s38417        | 14    | 13   | 11   | 8    | 7    | 5    |
| s38584        | 9     | 7    | 6    | 5    | 7    | 6    |
| seq           | 9     | 6    | 4    | 5    | 4    | 4    |
| spla          | 10    | 8    | 6    | 5    | 4    | 4    |
| tseng         | 16    | 15   | 11   | 10   | 7    | 6    |
| display_chip  | 20    | 14   | 7    | 7    | 6    | 4    |
| img_calc      | 48    | 42   | 20   | 16   | 11   | 10   |
| img_interp    | 28    | 17   | 12   | 9    | 7    | 4    |
| input_chip    | 23    | 14   | 6    | 5    | 6    | 4    |
| peak_chip     | 22    | 13   | 10   | 8    | 7    | 5    |
| scale125_chip | 35    | 17   | 10   | 9    | 9    | 6    |
| scale2_chip   | 16    | 14   | 11   | 10   | 8    | 7    |
| warping       | 7     | 7    | 3    | 6    | 5    | 4    |
| Geom. Avg.    | 12.16 | 9.43 | 6.80 | 6.15 | 5.32 | 4.58 |

Table I.6: Number of Clusters on Critical Path (Cluster Size = 6)

| Circuit       | LUT Size |      |      |      |      |      |  |
|---------------|----------|------|------|------|------|------|--|
|               | 2        | 3    | 4    | 5    | 6    | 7    |  |
| alu4          | 10       | 6    | 4    | 4    | 3    | 3    |  |
| apex2         | 7        | 5    | 7    | 3    | 5    | 4    |  |
| apex4         | 7        | 6    | 5    | 5    | 4    | 3    |  |
| bigkey        | 4        | 2    | 3    | 3    | 2    | 2    |  |
| clma          | 15       | 11   | 12   | 9    | 5    | 9    |  |
| des           | 6        | 7    | 6    | 4    | 3    | 3    |  |
| diffeq        | 14       | 14   | 7    | 9    | 6    | 6    |  |
| dsip          | 5        | 3    | 3    | 2    | 3    | 2    |  |
| elliptic      | 11       | 9    | 7    | 7    | 6    | 7    |  |
| ex1010        | 8        | 5    | 5    | 5    | 4    | 4    |  |
| ex5p          | 8        | 5    | 6    | 3    | 5    | 4    |  |
| frisc         | 22       | 21   | 14   | 10   | 12   | 11   |  |
| misex3        | 7        | 6    | 4    | 4    | 3    | 3    |  |
| pdc           | 8        | 6    | 5    | 4    | 5    | 5    |  |
| s298          | 15       | 10   | 11   | 9    | 9    | 8    |  |
| s38417        | 13       | 13   | 8    | 6    | 6    | 7    |  |
| s38584        | 10       | 7    | 7    | 7    | 6    | 6    |  |
| seq           | 8        | 6    | 5    | 3    | 3    | 3    |  |
| spla          | 7        | 7    | 5    | 6    | 4    | 4    |  |
| tseng         | 16       | 11   | 9    | 10   | 7    | 6    |  |
| display_chip  | 21       | 13   | 7    | 8    | 5    | 4    |  |
| img_calc      | 45       | 41   | 18   | 18   | 10   | 10   |  |
| img_interp    | 24       | 13   | 9    | 9    | 6    | 5    |  |
| input_chip    | 23       | 13   | 7    | 4    | 4    | 5    |  |
| peak_chip     | 20       | 13   | 11   | 7    | 8    | 5    |  |
| scale125_chip | 20       | 18   | 11   | 8    | 10   | 6    |  |
| scale2_chip   | 15       | 12   | 8    | 8    | 8    | 5    |  |
| warping       | 12       | 6    | 3    | 6    | 5    | 3    |  |
| Geom. Avg.    | 11.64    | 8.55 | 6.67 | 5.75 | 5.12 | 4.64 |  |

Table I.7: Number of Clusters on Critical Path (Cluster Size = 7)

| Circuit       | LUT Size |      |      |      |      |      |  |
|---------------|----------|------|------|------|------|------|--|
|               | 2        | 3    | 4    | 5    | 6    | 7    |  |
| alu4          | 8        | 6    | 4    | 5    | 3    | 3    |  |
| apex2         | 9        | 8    | 7    | 4    | 4    | 3    |  |
| apex4         | 7        | 5    | 4    | 4    | 3    | 3    |  |
| bigkey        | 4        | 2    | 3    | 3    | 3    | 2    |  |
| clma          | 16       | 12   | 12   | 11   | 8    | 7    |  |
| des           | 7        | 7    | 6    | 5    | 3    | 3    |  |
| diffeq        | 11       | 12   | 5    | 5    | 5    | 5    |  |
| dsip          | 4        | 3    | 3    | 2    | 2    | 2    |  |
| elliptic      | 13       | 9    | 8    | 8    | 5    | 7    |  |
| ex1010        | 10       | 7    | 5    | 5    | 4    | 4    |  |
| ex5p          | 10       | 7    | 6    | 4    | 4    | 3    |  |
| frisc         | 20       | 21   | 16   | 12   | 14   | 11   |  |
| misex3        | 6        | 6    | 4    | 4    | 3    | 3    |  |
| pdc           | 7        | 8    | 4    | 4    | 3    | 4    |  |
| s298          | 14       | 15   | 14   | 10   | 11   | 7    |  |
| s38417        | 13       | 12   | 8    | 9    | 7    | 7    |  |
| s38584        | 10       | 9    | 7    | 3    | 5    | 5    |  |
| seq           | 7        | 7    | 4    | 4    | 3    | 3    |  |
| spla          | 9        | 9    | 5    | 5    | 4    | 4    |  |
| tseng         | 16       | 14   | 9    | 10   | 7    | 6    |  |
| display_chip  | 16       | 14   | 8    | 7    | 7    | 4    |  |
| img_calc      | 41       | 42   | 12   | 13   | 10   | 10   |  |
| img_interp    | 23       | 13   | 10   | 9    | 5    | 5    |  |
| input_chip    | 23       | 10   | 4    | 4    | 3    | 4    |  |
| peak_chip     | 19       | 14   | 10   | 9    | 7    | 6    |  |
| scale125_chip | 29       | 17   | 12   | 7    | 9    | 7    |  |
| scale2_chip   | 17       | 12   | 10   | 8    | 6    | 4    |  |
| warping       | 7        | 4    | 5    | 6    | 5    | 3    |  |
| Geom. Avg.    | 11.46    | 9.14 | 6.55 | 5.78 | 4.86 | 4.38 |  |

Table I.8: Number of Clusters on Critical Path (Cluster Size = 8)

| Circuit       | LUT Size |      |      |      |      |      |
|---------------|----------|------|------|------|------|------|
|               | 2        | 3    | 4    | 5    | 6    | 7    |
| alu4          | 5        | 6    | 4    | 4    | 5    | 2    |
| apex2         | 5        | 4    | 5    | 5    | 4    | 4    |
| apex4         | 7        | 4    | 4    | 3    | 4    | 3    |
| bigkey        | 3        | 3    | 3    | 3    | 2    | 2    |
| clma          | 17       | 14   | 11   | 10   | 6    | 9    |
| des           | 7        | 8    | 6    | 5    | 3    | 2    |
| diffeq        | 12       | 12   | 9    | 7    | 6    | 5    |
| dsip          | 4        | 5    | 3    | 3    | 3    | 2    |
| elliptic      | 11       | 7    | 8    | 7    | 7    | 7    |
| ex1010        | 7        | 9    | 7    | 3    | 5    | 3    |
| ex5p          | 8        | 5    | 6    | 4    | 3    | 3    |
| frisc         | 20       | 16   | 15   | 11   | 11   | 10   |
| misex3        | 5        | 5    | 5    | 5    | 2    | 3    |
| pdc           | 8        | 7    | 5    | 4    | 5    | 4    |
| s298          | 18       | 17   | 11   | 12   | 10   | 10   |
| s38417        | 11       | 12   | 9    | 9    | 6    | 7    |
| s38584        | 9        | 6    | 7    | 5    | 5    | 5    |
| seq           | 6        | 4    | 4    | 4    | 3    | 3    |
| spla          | 7        | 7    | 4    | 5    | 4    | 3    |
| tseng         | 19       | 10   | 8    | 10   | 8    | 6    |
| display_chip  | 18       | 11   | 9    | 7    | 5    | 3    |
| img_calc      | 34       | 39   | 14   | 15   | 10   | 9    |
| img_interp    | 24       | 14   | 9    | 9    | 5    | 5    |
| input_chip    | 21       | 10   | 6    | 3    | 4    | 4    |
| peak_chip     | 16       | 11   | 9    | 6    | 7    | 5    |
| scale125_chip | 32       | 16   | 10   | 8    | 8    | 5    |
| scale2_chip   | 22       | 13   | 9    | 6    | 8    | 6    |
| warping       | 6        | 4    | 6    | 5    | 4    | 5    |
| Geom. Avg.    | 10.54    | 8.35 | 6.74 | 5.71 | 4.97 | 4.29 |

Table I.9: Number of Clusters on Critical Path (Cluster Size = 9)

| Circuit       | LUT Size |      |      |      |      |      |
|---------------|----------|------|------|------|------|------|
|               | 2        | 3    | 4    | 5    | 6    | 7    |
| alu4          | 5        | 4    | 4    | 4    | 4    | 2    |
| apex2         | 8        | 5    | 6    | 4    | 4    | 4    |
| apex4         | 7        | 5    | 4    | 5    | 4    | 3    |
| bigkey        | 4        | 2    | 2    | 3    | 3    | 2    |
| clma          | 16       | 15   | 6    | 7    | 8    | 8    |
| des           | 7        | 5    | 5    | 5    | 2    | 3    |
| diffeq        | 17       | 10   | 5    | 9    | 5    | 6    |
| dsip          | 4        | 5    | 3    | 3    | 3    | 2    |
| elliptic      | 11       | 7    | 9    | 8    | 5    | 7    |
| ex1010        | 8        | 5    | 5    | 6    | 5    | 4    |
| ex5p          | 8        | 5    | 5    | 3    | 4    | 3    |
| frisc         | 20       | 17   | 16   | 14   | 11   | 10   |
| misex3        | 5        | 6    | 4    | 3    | 3    | 2    |
| pdc           | 9        | 7    | 7    | 5    | 4    | 5    |
| s298          | 17       | 15   | 14   | 12   | 10   | 10   |
| s38417        | 10       | 9    | 10   | 8    | 7    | 6    |
| s38584        | 9        | 7    | 8    | 5    | 5    | 5    |
| seq           | 9        | 6    | 4    | 4    | 4    | 3    |
| spla          | 7        | 8    | 5    | 5    | 3    | 5    |
| tseng         | 10       | 12   | 9    | 9    | 7    | 5    |
| display_chip  | 14       | 11   | 8    | 6    | 7    | 6    |
| img_calc      | 43       | 35   | 11   | 12   | 10   | 9    |
| img_interp    | 22       | 12   | 9    | 6    | 7    | 6    |
| input_chip    | 20       | 10   | 6    | 3    | 4    | 4    |
| peak_chip     | 15       | 15   | 9    | 7    | 6    | 4    |
| scale125_chip | 21       | 15   | 6    | 8    | 7    | 4    |
| scale2_chip   | 15       | 13   | 9    | 6    | 7    | 6    |
| warping       | 11       | 4    | 6    | 5    | 4    | 5    |
| Geom. Avg.    | 10.73    | 8.10 | 6.30 | 5.67 | 5.02 | 4.47 |

Table I.10: Number of Clusters on Critical Path (Cluster Size = 10)
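
The "Geom. Avg." row in these tables can be checked with a short calculation. The sketch below assumes the row is the ordinary geometric mean over the 28 circuits (the n-th root of the product of the entries), and applies it to the LUT size 7 column of Table I.10. The function and variable names are illustrative only and are not taken from the thesis CAD flow.

```python
import math

# LUT size 7 column of Table I.10 (cluster size = 10),
# listed in row order from alu4 down to warping.
clusters_on_critical_path = [
    2, 4, 3, 2, 8, 3, 6, 2, 7, 4, 3, 10, 2, 5,
    10, 6, 5, 3, 5, 5, 6, 9, 6, 4, 4, 4, 6, 5,
]


def geometric_mean(values):
    """Return the geometric mean of positive numbers.

    Computed as exp(mean(log(v))) to avoid overflow from a long product.
    """
    return math.exp(sum(math.log(v) for v in values) / len(values))


geom_avg = geometric_mean(clusters_on_critical_path)
print(f"{geom_avg:.2f}")
```

Under that assumption the result rounds to 4.47, which agrees with the tabulated entry for LUT size 7.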

#### Bibliography

- [ACSG<sup>+</sup>99] O. Agrawal, H. Chang, B. Sharpe-Geisler, N. Schmitz, B. Nguyen, J. Wong, G. Tran, F. Fontana, and B. Harding. "An Innovative, Segmented High Performance FPGA Family with Variable-Grain-Architecture and Wide-gating Functions". In ACM Symp. on FPGAs, Monterey, CA, USA, 1999.
- [AR00] E. Ahmed and J. Rose. "The Effect of LUT and Cluster Size on Deep-Submicron FPGA Performance and Density". In ACM Symp. on FPGAs, pages 3–12, 2000.
- [BFRV92] S. Brown, R. Francis, J. Rose, and Z. Vranesic. "Field-Programmable Gate Arrays". Kluwer Academic Publishers, 1992.
- [BR96] S. Brown and J. Rose. "FPGA and CPLD Architectures: A Tutorial". In IEEE Design and Test of Computers, pages 42–57, Summer 1996.
- [BR97] V. Betz and J. Rose. "Cluster-Based Logic Blocks for FPGAs: Area-Efficiency vs Input Sharing and Size". In IEEE Custom Integrated Circuits Conference, pages 551–554, Santa Clara, CA, 1997.
- [BR98] V. Betz and J. Rose. "How Much Logic Should Go in an FPGA Logic Block?". In IEEE Design and Test Magazine, pages 10–15, 1998.
- [BRM99] V. Betz, J. Rose, and A. Marquardt. "Architecture and CAD for Deep-Submicron FPGAs". Kluwer Academic Publishers, New York, 1999.
- [CD94] J. Cong and Y. Ding. "FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs". In IEEE Trans. on CAD, pages 1–12, Jan 1994.

- [CH98] J. Cong and Y. Hwang. "Boolean Matching for Complex PLBs in LUT-based FPGAs with Application to Architecture Evaluation". In ACM Symp. on FPGAs, Monterey, CA, 1998.
- [Chu94] K. Chung. "Architecture and Synthesis of Field-Programmable Gate Arrays with Hardwired Connections". PhD thesis, University of Toronto, 1994.
- [CPD96] J. Cong, J. Peck, and Y. Ding. "RASP: A General Logic Synthesis System for SRAM-based FPGAs". In ACM Symp. on FPGAs, pages 137–143, 1996.
- [ea90] E.M. Sentovich et al. "SIS: A System for Sequential Circuit Analysis". Technical report, University of California, Berkeley, 1990.
- [HSC83] R. Hitchcock, G. Smith, and D. Cheng. "Timing Analysis of Computer-Hardware". Technical report, IBM Journal of Research and Development, Jan. 1983.
- [HW91] D. Hill and N-S Woo. "The Benefits of Flexibility in Look-up Table FPGAs". In International Workshop on FPGAs, 1991.
- [Inc97] Xilinx Inc. "XC5200 Series of FPGAs". 1997.
- [Inc98a] Altera Inc. "Data Book". 1998.
- [Inc98b] Xilinx Inc. "Virtex 2.5 V Field Programmable Gate Arrays". 1998.
- [KBKC99] S. Kaptanoglu, G. Bakker, A. Kundu, and I. Corneillet. "A new high density and very low cost reprogrammable FPGA architecture". In ACM Symp. on FPGAs, Monterey, CA, 1999.
- [KG91] J. Kouloheris and A. El Gamal. "FPGA Performance vs. Cell Granularity". In Proc. of Custom Integrated Circuits Conference, pages 6.2.1 – 6.2.4, May 1991.
- [KG92a] J. Kouloheris and A. El Gamal. "FPGA Area vs. Cell Granularity - PLA Cells". In Proc. of Custom Integrated Circuits Conference, 1992.
- [KG92b] J. Kouloheris and A. El Gamal. "FPGA Area vs. Cell Granularity - Lookup tables and PLA Cells". In First ACM Workshop on FPGAs, Berkeley, CA, 1992.
- [Mar99] A. Marquardt. "Cluster-Based Architecture, Timing-Driven Packing, and Timing-Driven Placement for FPGAs". Master's thesis, University of Toronto, 1999.
- [MBR99] A. Marquardt, V. Betz, and J. Rose. "Using Cluster-Based Logic Blocks and Timing-Driven Packing to Improve FPGA Speed and Density". In ACM/SIGDA FPGA, 1999.
- [RFCL89] J. Rose, R.J. Francis, P. Chow, and D. Lewis. "The Effect of Logic Block Complexity on Area of Programmable Arrays". In Proc. of Custom Integrated Circuits Conference, pages 5.3.1 – 5.3.5, 1989.
- [RFLC90] J. Rose, R.J. Francis, D. Lewis, and P. Chow. "Architecture of Field-Programmable Gate Arrays: The Effect of Logic Functionality on Area Efficiency". In IEEE Journal of Solid-State Circuits, 1990.
- [Sin91] S. Singh. "The Effect of Logic Block Architecture on FPGA Performance". Master's thesis, University of Toronto, 1991.
- [SRCL92] S. Singh, J. Rose, P. Chow, and D. Lewis. "The Effect of Logic Block Architecture on FPGA Performance". In IEEE Journal of Solid-State Circuits, 1992.
- [SS91] A. Sedra and K. Smith. "Microelectronic Circuits: Third Edition". Oxford University Press, 1991.
- [WE93] N. Weste and K. Eshraghian. "Principles of CMOS VLSI Design; A System Perspective; Second Edition". Addison Wesley, 1993.
- [Yan91] S. Yang. "Logic Synthesis and Optimization Benchmarks, Version 3.0". Technical report, Microelectronics Centre of North Carolina, 1991.