# Efficient Simulation of Electromigration Damage in Large Chip Power Grids Using Accurate Physical Models

(Invited Paper)

Farid N. Najm ECE Department University of Toronto f.najm@utoronto.ca Valeriy Sukharev Mentor, A Siemens Business Fremont, CA 94538, USA valeriy\_sukharev@mentor.com

Abstract—Due to continued technology scaling, electromigration (EM) signoff has become increasingly difficult, mainly due to the use of inaccurate methods for EM assessment, such as the empirical Black's model. In this paper, we review the state of the art for EM verification in on-die power/ground grids, with emphasis on a recent finite-difference based approach for power grid EM checking using physics-based models. The resulting model allows the EM damage across the power grid to be simulated based on a Linear Time Invariant (LTI) system formulation. The model also handles early failures and accounts for their impact on the grid lifetime. Our results, for a number of IBM power grid benchmarks, confirm that existing approaches for EM checking can be highly inaccurate. The lifetimes found using our physics-based approach are on average about 2X, or more, those based on the existing approaches, showing that existing EM design guidelines are overly pessimistic. The method is also quite fast, with a runtime of about 8 minutes for a 4M node grid, and so it is suitable for large circuits.

Keywords — Electromigration, hydrostatic stress, verification, power grid, reliability, linear time invariant system.

#### I. INTRODUCTION

With continued technology scaling, electromigration (EM) in metal lines has become a major reliability concern for integrated circuits [1]. It is well-known that EM-induced voiding causes increased resistance in metal lines, while hillock formation can lead to short-circuit failures. We will focus on voids because they are found to occur much more frequently than hillocks in practice. On-chip metal lines (interconnect) are either signal lines, including intra- and inter-cell connectivity, or power supply and ground lines, whose purpose is to deliver a well-regulated supply voltage across the whole die. We will use the term *power grid* to refer to either the power supply network, the ground network, or both.

While both signal and power lines are susceptible to EM degradation, they have different rates of EM damage under typical chip operation, because of the different types of current that they carry. The majority of signal lines carry bidirectional currents that lead to a repetitive increase and decrease of the mechanical stress in metal lines, which results in very long EM lifetimes. In contrast, power lines carry mostly unidirectional currents and so they have much shorter lifetimes due to the

This work was supported in part by the Semiconductor Research Corporation (SRC) and by Mentor, A Siemens Business.

negligible stress relaxation that they experience. Thus, in the majority of cases, EM-induced chip failure is due to the failure of the power network to deliver acceptable voltages to some cell in the underlying circuit [2]. Hence the focus of this paper is on EM in power grids.

Today, it is becoming harder to sign off on chip designs using state of the art EM checking tools, as there is very little margin left between the predicted EM stress and that allowed by EM design rules [3]. This loss of safety margin can be traced back to the inaccurate and oversimplified nature of EM models used by existing tools.

## A. Existing approach

The standard method for EM checking in the industry is an extension of the traditional approach for EM assessment (for single-link test-structures) to the domain of the power grid. A standard single-link test-structure is typically a set of Copper lines, as in Fig. 1a, in which the electric current flows from the wide metal supply lines through the vias, into the narrower test lines. The EM test is applied to a variety of such test structures designed for different metal layers and different current directions including upstream and downstream tests, as in Fig. 1b. Changes in voltage and resistance over time are measured and recorded. An increase of resistance of individual lines above some threshold value is considered to be a failure. Details describing this methodology characterized by a variety of measurement techniques can be found elsewhere; see for example [4]. Here, we will only give a brief summary of the approach.

The time-to-failure (TTF) of each metal line, stressed by direct current (DC) of density j at the temperature T, is recorded for hundreds of identical lines that may be in the test structures. Due to random manufacturing variations, these TTF values are random samples, and they are known to follow a lognormal distribution [5]. The average of the observed TTFs, denoted as the mean-time-to-failure (MTF), is extracted from the measured TTF ensembles, and it is known to follow Black's model [6]

$$MTF = \frac{A}{j^n} e^{-E_A/kT},$$
(1)

where k is the Boltzmann constant and A is a proportionality coefficient, which depends on line geometry, residual stress



Fig. 1. Multi-link EM test structure (a); upstream and downstream test structure (b), where the arrows indicate the direction of electron flow.

and temperature. Two critical parameters, the current density exponent n and activation energy  $E_A$  are extracted by regression from the measured TTFs using lognormal plots. In order for failures to happen in reasonable time-intervals of several hours, these measurements are carried out at so-called stressed conditions, in a process that is called accelerated testing, characterized by elevated temperatures  $T_{st}$  of  $200-400^{\circ}$ C and high current densities  $j_{st}$  of  $3 \times 10^9 - 5 \times 10^{10}$  A/m<sup>2</sup>. Translation of the MTF<sub>st</sub> obtained at the stress conditions to the operating conditions MTF<sub>op</sub>, characterized by lower temperatures and current densities, is performed based on the equation

$$\mathrm{MTF}_{op} = \mathrm{MTF}_{st} \left(\frac{j_{st}}{j_{op}}\right)^n \exp\left[\frac{E_A}{k} \left(\frac{1}{T_{op}} - \frac{1}{T_{st}}\right)\right]. \quad (2)$$

This is then used to figure out the maximum allowable current density  $j_{max}$  corresponding to a specified target MTF<sub>sp</sub> using

$$j_{max} = j_{st} \left(\frac{\text{MTF}_{st}}{\text{MTF}_{sp}}\right)^{1/n} \exp\left[\frac{E_A}{nk} \left(\frac{1}{T_{op}} - \frac{1}{T_{st}}\right)\right].$$
 (3)

This maximum current density  $j_{max}$  is then imposed as a design guideline for any metal line with this geometry and temperature. This is all fine and good, but it is a big "leap of faith" to extend this sound analysis of single-link test structures and apply it with little modification to a completely different context, such as the analysis of a large power grid. Unfortunately, this is exactly what has been done in industrial tools over the last few decades. Existing methods for power grid EM verification break up a power grid into its millions of constitutive metal segments, as in Fig. 2, then independently apply the single-link analysis to each segment and check if the current density exceeds the  $j_{max}$  guideline for that segment. A chip is deemed to have failed if any of its metal lines is found to carry a current density higher than  $j_{max}$ . This method of doing EM verification by means of a simple current density check in isolated lines is prevalent across the industry.

There are many problems with this existing approach; and this is not a new observation. Indeed, as early as 2001, SP Hau-Riege and CV Thompson [7] wrote "It is demonstrated that the reliability of a segment can not be predicted without



Fig. 2. An interconnect tree is decomposed into its many segments.



Fig. 3. A typical cross-sectional schematic of Cu dual damascene interconnect.

knowledge of the conditions for stress migration and electromigration in connecting segments," and found that "Accurate circuit-level reliability analyses can not be based on segmentlevel models, but require instead, tree-based models such as the one developed here."

# B. Power grid structure

Modern power grids are made of Copper (Cu) and are fabricated using a dual damascene process, in which the metal line and via are formed simultaneously using Copper. A barrier metal liner (usually Tantalum) completely surrounds all Cu interconnects to prevent the Copper from diffusing into the surrounding dielectric. The cross-section of a typical metal via structure in a Cu dual damascene process is shown in Fig. 3. Due to the presence of the barrier metal liner around the vias, Cu atoms from one layer cannot diffuse to another layer. The on-die power grid spans all metal layers, from M1 and M2 which connect to transistors in the underlying circuit, to the top-level metal pads that connect via C4 bumps to the external supply/ground. Every metal layer of the grid consists mostly of parallel stripes that are connected by vias to other metal layers right above and below, as shown in Fig. 4. On every layer, the power and ground stripes are interleaved, as shown in the figure. This resulting mesh pattern does not simply repeat as is across the whole die: there are breaks, and other variations, in order to allow for signal and other types of routing. In any case, the point is that the grid metal structures within any given layer are mostly trees, i.e., they contain no loops or cycles. Thus, all previous work in this area assumes that the grid is made up of interconnect trees. An interconnect tree is a continuously connected acyclic structure of straight metal lines within one layer of metalization such that atomic flux can flow freely within it, as in the simple example in Fig. 5.



Fig. 4. A multi-layer power grid with interspersed power and ground lines.



Fig. 5. A typical interconnect tree structure.

#### C. Problems with the existing approach

We will outline three reasons why the existing approach is problematic. First, the fitting parameters obtained for Black's model under accelerated testing conditions are not valid at actual operating conditions, and this can lead to significant errors in lifetime extrapolation [8], [9].

Second, the straightforward extension of the single-link analysis based on Black's model to power grids, by a simple decomposition, ignores the material flow between branches. In today's mesh structured power grids, many branches within the same layer are connected as part of what is called an interconnect tree and atomic flux can flow freely between the branches of these trees [7]. As a result, if the individual branches happen to be short so that they are deemed immortal due to the Blech effect [10], then the tree would appear to be immortal, which is highly *optimistic* and can be entirely misleading for design. In fact, due to material flow across the tree, failures can and do happen even if the branches are short. On the other hand, because the assumption of no material flow between branches effectively means that the reliability of nearby metal lines are independent of each other, then the traditional approach can also be highly *pessimistic*, as we have reported in earlier work [11]. Indeed, two identical connected lines with the same current density can in practice have quite different MTF values [12], due to the differences in material flow into these lines from the rest of the tree.

Finally, the third problem lies with the series model assumption. A series model is the case where a power grid is deemed to have failed as soon as the first of its branches has failed,



Fig. 6. Circuit corresponding to a simple two-layer grid.

typically due to an open circuit. However, modern power grids use a mesh structure [13]–[16], as in Fig. 4; a simple two layer grid translates to a simple mesh circuit structure as in Fig. 6. As such, there are many paths for the current to flow from the C4 bumps to the underlying logic, a characteristic that we refer to as *redundancy*. Mesh power grids are in fact closer to (but not quite) a parallel system. As such, it is highly pessimistic to assume that a single branch failure will always cause the whole grid to fail.

Additional discussion of the above shortcomings of existing methods is given in [17]. In short, there is a need to reconsider the traditional approaches and develop a new EM checking method that accurately models EM degradation using physicsbased models, combined with a mesh model to account for grid redundancy, while being fast enough to be practically useful.

## D. Emerging solutions

Over the last few years, many approaches have been proposed which overcome, to some extent, the aforementioned shortcomings. Chatterjee et al. [13] proposed the mesh model as an alternative to the series model. In the mesh model, a grid is deemed to have failed, not when the first line fails, but when enough lines have failed so that the voltage drop at some grid nodes(s) have exceeded some pre-defined threshold (which would cause errors in the underlying logic). However, [13] still used Black's model to compute the reliability of individual branches. Huang et. al. [14] proposed an adaptation of Korhonen's physical EM model [18] for interconnect trees. Hau-Riege et al. [7] used Korhonen's model to develop a closedform solution for stress evolution at a junction (a point where multiple branches meet) by replacing its connected branches with semi-infinite limbs, which was later used by Li et al. [19] in their EM verification tool. Both [14] and [19] were later extended to account for temperature variation as well [20], [21]. However, the approach in [14] is slow, requiring up to 32 hours to estimate the failure time of a 400K node grid. In [19], since the connected branches are replaced by semi-infinite limbs, atomic flow across the whole tree is not accounted for.

# E. EM Simulation

In [11], [22], [23], we proposed a new EM checking approach for power grids that accurately models EM degradation using physics-based models, combined with a mesh model to account for redundancy and using a voltage-drop failure criterion, while being fast enough to be practically useful. This technique is based on a finite-difference based approach, and has the unique feature that the grid is deemed to have failed, not when a void has formed, but when the voltage drop somewhere (anywhere) on the grid has exceeded user specifications; this is how redundancy in the grid is taken into account. The method starts with Korhonen's one-dimensional (1D) physical model [18], and augments it by i) introducing boundary laws at junctions to track the material flow and stress evolution in multi-branch interconnect trees (for arbitrary complex geometries) and *ii*) accounting for thermal stresses generated by non-uniform temperature distribution across the grid. We refer to this as the Extended Korhonen's Model (EKM). For each tree, the extended model starts out as a system of Partial Differential Equations (PDE) coupled by boundary laws which are then discretized and scaled to reduce it to a homogeneous Linear Time Invariant (LTI) system, where each state represents the stress at a discretized point in the tree. We numerically solve this system to track the stress evolution over time and find the corresponding time of void nucleations, some of which might cause early failures, which we account for by disconnecting a via. Once a void has nucleated, we update the system and recompute the currents and voltages, and restart the simulation for the next nucleation phase. This is repeated until enough voids have formed that a voltage drop violation has occurred; the time of the violation is declared as the TTF and the simulation is terminated.

The random nature of EM degradation, caused by process variation, is taken care of by using a Monte Carlo method. Successive instances of the same grid are generated by randomly sampling from a distribution of the diffusivities in the branches. This gives a population of grids with identical geometry and topology, but with randomly varying diffusivities, which we then simulate to get successive samples of the grid time to failure (TTF), until the estimate of the overall MTF has converged. We improve our runtime by using a filtering scheme that estimates up-front the active set of trees that are most-likely to impact the MTF assessment of the grid, a scheme which we have found has minimal impact on accuracy. We also propose a predictive scheme that allows for fast MTF estimation based on extrapolation of the solution (stress curve) obtained from a few time-points. Testing this approach on the IBM grid benchmarks [24], with the largest grid up to 720K nodes, shows that the MTF estimated using our physics-based approach are on average 2.75X longer than those based on a (calibrated) Black's model. This justifies the claim that Black's model can be overly inaccurate for modern power grids and confirms the need for physical models. With a run-time of 3.6 minutes for the largest (720K node) grid, this approach appears to be promising for large circuits.

# II. BACKGROUND

For any given interconnect branch, we will refer to the time duration before any voids have nucleated as the nucleation phase. In this phase, the resistance of a line remains roughly the same as that of a fresh or undamaged line. Once the hydrostatic stress (tensile, positive) at some point in the line has exceeded some critical value, a void is said to nucleate at that point. At this point, the void growth phase begins and the void starts to grow. In some cases, depending on the geometry and the location of the void, nucleation by itself may be enough to cause failure due to an open circuit (by disconnecting a via) [25]. These failures are typically called early failures. In other cases, again depending on geometry, a line may continue to conduct current after void nucleation. With time, the void continues to grow in the direction of the electron flow and the line resistance increases towards some steady-state value. In order to track the evolution of the line (and tree) towards voiding and beyond, we need to simulate and track the values of stress in the lines over time.

#### A. Korhonen's model

Fully accurate EM analysis would require 3D simulation of metal structures, and with a very fine mesh in order to capture enough details, which would be prohibitively expensive for practical work with integrated circuits. Fortunately, Korhonen's 1D model [18] offers a reasonable trade-off between accuracy and model efficiency. The model captures the evolution over time of the *hydrostatic stress*  $\sigma$  under EM, at any point along the length of a metal line. Hydrostatic stress is the average of all normal components of the full stress tensor. Consider a uniform metal line embedded in a rigid dielectric. We are interested in the time-varying stress  $\sigma(x, t)$  at location x along the length of the line, relative to some given reference point, and at time t. Following Korhonen's formulation,  $\sigma$  is positive for tensile stress and negative for compressive stress, and its time evolution is governed by the PDE

$$\frac{\partial \sigma}{\partial t} = \frac{B\Omega}{k_b T_m} \frac{\partial}{\partial x} \left\{ D_a \left( \frac{\partial \sigma}{\partial x} - \frac{q^* \rho}{\Omega} j \right) \right\},\tag{4}$$

where j is the current density in the line at that point and time,  $D_a$  is the coefficient of atomic diffusion,  $T_m$  is the temperature in Kelvin, B is the effective bulk modulus,  $\Omega$  is the atomic volume,  $q^*$  is the absolute value of the effective charge of the conductor,  $k_b$  is the Boltzmann constant, and  $\rho$  is the resistivity of the conductor. The corresponding atomic flux  $J_a(x,t)$  in the line can be written as [18], [26]

$$J_a = \frac{D_a C\Omega}{k_b T_m} \left( \frac{\partial \sigma}{\partial x} - \frac{q^* \rho}{\Omega} j \right), \tag{5}$$

where C is the atomic concentration in the line, which in an ideal lattice with zero stress is equal to  $1/\Omega$ . Note that  $J_a$  can be positive or negative, depending on the reference direction chosen and the actual direction of the electric current. The randomness in the TTF due to EM is primarily accounted for by the corresponding randomness in  $D_a$ , which is lognormally distributed [5] with mean  $D_{avg}$ . Strictly speaking,  $D_a$  also

depends on the value of stress. However, it has been reported that the numerical results with stress dependent  $D_a$  are "not too different" from constant  $D_a$  [18]. Hence, as in many previous works [14], [19]–[21], we will assume that  $D_a$  is stress-independent. Finally, a void is said to nucleate in the line once the stress exceeds a predefined threshold value  $\sigma_{th} > 0$ .

#### **III. PROBLEM FORMULATION**

On the face of it, simulating the stress at every point in the power grid seems to require solving a very large number of PDEs, which is (computationally) prohibitively expensive. Instead, the work in [11], [22], [23] applies a number of steps in the numerical formulation of the problem so as to convert the system of PDEs into a system of ODEs (Ordinary Differential Equations). Even better, the resulting system of ODEs represents a linear time-invariant (LTI) system, which allows a very efficient solution. We will review and summarize this method, without too much detail; the interested reader is referred to the original sources.

#### A. Preliminaries

The work starts with the Method of Lines (MoL), which is a special *finite-difference* technique for solving PDEs [27]. The basic idea is to discretize the PDE in *all but one* independent variable, so that we are left with a set of ODEs that approximate the PDE. One can then use well-established methods to numerically solve the ODE. Discretizing the PDE is achieved by the standard method of approximating the derivative by a central difference formula [27].

Next, as in most recent works on EM, it is assumed that diffusivity  $D_a$  is the same throughout a branch. As a result, the atomic flux divergence is higher at branch ends, i.e. junctions, as compared to branch interior. Thus, voids will nucleate only at the junctions in a tree. This is a very mild assumption [19], [25] because it is much more common in the field to find voids at the end-points of branches.

EM is a long-term failure mechanism. As such, short-term transients in workload typically experienced in chips do not play a significant role in EM degradation. Thus, standard practice in the field has been to use an effective-current model [28] to estimate EM degradation, so that the lifetime of a metal line when carrying the constant (DC) effective current and the time-varying transient current is the same. We also adopt this approach in our work, so that the circuit currents loading the grid are assumed to be fixed (DC) currents. Hence, between any two successive void nucleations, the power grid has constant (effective) currents, voltages and conductances, and can be modelled as a DC linear system. So, the effective branch currents can be obtained directly from effective source currents by doing a simple DC analysis.

As voids nucleate due to EM, branch resistances change fairly quickly. Correspondingly, the currents also change quickly to their new effective values and the voltage drops change in turn. Thus, the power grid model must include the fact that the branch conductances change (occasionally) over time, and so can be expressed as

$$G(t)v(t) = i_s, (6)$$

where G(t) is the piecewise-constant conductance matrix, v(t) is the corresponding time-varying but piecewise-constant node voltage drop vector and  $i_s$  is the vector of effective DC source current values that model the underlying logic blocks.

Before proceeding with any analysis, one must assign reference directions to all tree branches, in order to consistently track the signs of branch currents and atomic flux. Formally, an interconnect tree is a acyclic graph  $\mathcal{T} = (\mathcal{N}, \mathcal{B})$ , where  $\mathcal{N}$ is a set of tree *junctions* and  $\mathcal{B}$  is a set of resistive *branches*. Fig. 5 shows a typical interconnect tree structure. A branch is a continuous straight metal line of uniform width. A junction is any point on the interconnect tree where a branch ends or where a via is located. Junctions are classified based on their *degree*, which is defined to be the number of branches connected to it, as shown in Fig. 5. Note that vias do not contribute to the degree of a junction. Reference directions are created by using Breadth First Search (BFS) to assign directions to all branches in the tree (shown by dashed arrow lines in Fig. 5). The current density  $j_k$  and the atomic flux  $J_{a,k}$ for branch  $b_k$  is positive if it flows in the reference direction, otherwise it is negative. The initial stress at t = 0 in branch  $b_k$  is assumed equal to its residual thermal stress  $\sigma_{T,k}$ , which can be computed as shown in [29].

#### B. Extending Korhonen's model to trees

In order to find the stress across the whole interconnect tree, Korhonen's model is extended to account for the coupling between the tree branches. For any point  $x_k$  within branch  $b_k$ , the stress  $\sigma_k(x_k, t)$  is found using the original Korhonen's model (4). In order to couple the equations corresponding to the various branches, a pair of *boundary laws* have been formulated to capture the behaviour of stress at branch ends, i.e. at junctions. The first of these, which follows from basic physical laws of mass conservation and continuity, governs behaviour during the nucleation phase, and is given below [22].

**Law 1.** Before a void nucleates at a junction, stress is continuous across a junction and the number of metal atoms flowing into a junction per unit time is equal to the number of number of metal atoms flowing out from it.

The second law governs behaviour during the void growth phase. The recent work in [29] provides an extension of the Korhonen 1D model to describe behaviour of stress around a void. From this, stress falls to zero at the void surface but remains at its original value a very short distance  $\delta \approx 1$ nm from the void surface, called the *thickness of the void interface*. Using [29], this leads to the second law, given below [22].

**Law 2.** After a void nucleates at a junction, there is no flow of atomic flux between the connected branches. The stress gradient at junction n in branch k is

$$\partial \sigma_{jn} / \partial x_k = \pm \sigma_{jn} / \delta,$$
 (7)



Fig. 7. A simple 3-terminal tree  $T_d$ .

where  $\sigma_{jn}$  is the stress value at the junction and  $\delta$  is the thickness of the void interface. The sign is positive for branches with reference directions going out of the junction and negative for branches with reference directions going into the junction.

For better understanding, we illustrate our approach with a simple example. Consider a simple tree  $\mathcal{T}_d = (\mathcal{N}, \mathcal{B})$ , with  $\mathcal{N} = \{n_1, n_2, n_3\}$  and  $\mathcal{B} = \{b_1, b_2\}$ , with reference directions as shown by the dashed arrows in Fig. 7. Branch  $b_k$  has dimensions  $L_k \times w_k \times h_k$  (length  $\times$  width  $\times$  height), carries a current density  $j_k$ , has an atomic diffusivity of  $D_{a,k}$  and temperature  $T_{m,k}$ , where k is 1 or 2 in this case. Note that  $x_1 = L_1$  and  $x_2 = 0$  denote the same point: the location of  $n_2$ . We are interested in the stress as a function of position and time, i.e.  $\sigma_1(x_1, t)$  and  $\sigma_2(x_2, t)$  for branches  $b_1$  and  $b_2$ , respectively. Once these are known, we can easily determine the EM degradation in the branches.

Korhonen's model (4) gives the time rate of change of stress for a point *within* a branch k, as follows:

$$\frac{\partial \sigma_k}{\partial t} = \frac{B\Omega D_{a,k}}{k_b T_{m,k}} \frac{\partial}{\partial x_k} \left( \frac{\partial \sigma_k}{\partial x_k} - \frac{q^* \rho}{\Omega} j_k \right), \ x_k \in (0, L_k)$$
(8)

However, in order to solve the PDE for the whole tree, we need to also state the boundary conditions at all end-points of branches. The boundary conditions describe the behaviour of stress and atomic flux at the junctions. For the example in Fig. 7, we will discuss the two cases of a diffusion barrier and a dotted-I junction, based on the above two laws.

1) Diffusion Barrier: Junctions  $n_1$  and  $n_3$  are diffusion barriers, where the atomic flux is blocked. Considering the nucleation phase first, by Law 1,  $J_a$  is zero at the barrier so that from (5):

$$J_{a,1}(0,t) = 0 \implies \frac{\partial \sigma_1(0,t)}{\partial x_1} = \frac{q^* \rho}{\Omega} j_1 \tag{9a}$$

$$J_{a,2}(L_2,t) = 0 \implies \frac{\partial \sigma_2(L_2,t)}{\partial x_2} = \frac{q^* \rho}{\Omega} j_2 \qquad (9b)$$

We next move to the void growth phase. For a void to nucleate at  $n_1$  ( $n_3$ ), we must have  $j_1 < 0$  ( $j_2 > 0$ ) so that the electron flow pushes the metal atoms away from  $n_1$  ( $n_3$ ). Using Law 2, the stress gradients at junctions  $n_1$  and  $n_3$  throughout the void growth phase are:

$$\frac{\partial \sigma_1(0,t)}{\partial x_1} = \frac{\sigma_1(0,t)}{\delta}, \quad \frac{\partial \sigma_2(L_2,t)}{\partial x_2} = -\frac{\sigma_2(L_2,t)}{\delta} \quad (10)$$

where  $\sigma_1(0,t) = \sigma_2(L_2,t) = \sigma_{th}$  at the time of void nucleation.



Fig. 8. Early failures and Conventional Failures.

2) Dotted-I Junction: The atomic flux interaction at dotted-I junction  $n_2$  is the key to describing the coupling of stresses in branches  $b_1$  and  $b_2$ . Considering the nucleation phase first, by Law 1 the stress is continuous across  $n_2$ , which is the same physical point of both  $b_1$  and  $b_2$ , so that:

$$\sigma_1(L_1, t) = \sigma_2(0, t) \tag{11}$$

and atomic flux can flow freely between  $b_1$  and  $b_2$ . Because the material flow across an infinitesimal boundary at  $n_2$  has to be continuous, by Law 1, we have:

$$w_1 h_1 J_{a,1}(L_1, t) = w_2 h_2 J_{a,2}(0, t)$$
(12)

Next, considering the void growth phase, once a void nucleates at  $n_2$ , it is shared by both branches  $b_1$  and  $b_2$ . By Law 2, we effectively treat  $n_2$  as a diffusion barrier for both branches  $b_1$ and  $b_2$ , with:

$$\frac{\partial \sigma_1(L_1,t)}{\partial x_1} = -\frac{\sigma_1(L_1,t)}{\delta}, \quad \frac{\partial \sigma_2(0,t)}{\partial x_2} = \frac{\sigma_2(0,t)}{\delta} \quad (13)$$

Combining the boundary conditions obtained from (9)-(13), and the initial condition as stated earlier, with (8), we can formulate a Linear Time Invariant (LTI) system that completely determines  $\sigma_1(x,t)$  and  $\sigma_2(x,t)$ .

#### C. Void size and early failures

Tracking void growth is useful in order to determine the change in branch resistances and in the corresponding current densities. We assume that once a void nucleates at a junction, it is shared by all the branches connected to that junction. Recent work [29] shows that the initial void growth rate is very high, and gives the steady state void volume. Hence, as a conservative approximation, we assume that once a void nucleates at any junction  $n_p$ , the void lengths for all branches  $b_k$  connected to  $n_p$  reach their steady state values in a very short period of time. As a result, the line resistance rises immediately to its steady state value for all connected branches. Based on this, we iteratively find  $j_k$  and the corresponding steady-state void volume using a modified Richardson iteration [11]. We ignore void healing and void migration.

In addition, we also check for *early failures* based on the location of the void. Depending on its location and size, a void might lead to an early failure. Specifically, if a large enough void forms *below a via*, it might in some cases cause an open circuit failure by disconnecting the via, as in Fig. 8. This happens because the capping layer is *not conductive*; hence

if the void covers the entire cross-section of a via, there is no conductive path left between the via and the tree below and the current in the via completely falls to 0, as shown in Fig. 8. On the other hand, voids that form above the via generally happen at the top of the line away from the via, and so take a long time to completely fill the cross-section, and even then do not translate to an open circuit because the current can continue to flow through the metal liner. The appearance of an open circuit right under a via, as happens during the early failures, can have a significant impact on the voltage drop and grid reliability and thus should be accounted for. In our model, once we have determined the steady state void volume, we check for *i*) whether the void is located below a via (this is determined based on the given geometry of the grid) and *ii*) whether the void is large enough to disconnect the via. If both conditions are met, this void leads to an early failure, so we remove the via from the power grid and update the voltage drops and current density values.

## D. Discretization, ODEs and the LTI system

Korhonen's model (4) is often scaled by introducing dimensionless variants of stress, length and time [26]. This leads to stable PDEs that are easier to solve numerically. We define the following scaling factors for any branch  $b_k \in \mathcal{B}$ :

$$\tau \stackrel{\scriptscriptstyle \Delta}{=} \frac{B\Omega}{k_b T_m^{\star}} \frac{D_a^{\star} t}{L_c^2}, \quad \eta_k \stackrel{\scriptscriptstyle \Delta}{=} \frac{\Omega \sigma_k}{k_b T_m^{\star}}, \quad \xi_k \stackrel{\scriptscriptstyle \Delta}{=} \frac{x_k}{L_k}$$
(14)

where  $D_a^{\star}$  is the atomic diffusivity at some chosen nominal temperature  $T_m^{\star}$ ,  $L_c$  is some chosen characteristic length and  $0 \leq x_k \leq L_k$ . The new variables  $\tau$ ,  $\eta$  and  $\xi$  are referred to as reduced time, stress and distance, respectively. After some manipulation [11], this leads to

$$\frac{\partial \eta_k}{\partial \tau} = \theta_k \frac{\partial^2 \eta_k}{\partial \xi_k^2} \tag{15}$$

where  $\theta_k = L_c^2 D_{a,k} T_m^* / (L_k^2 D_a^* T_{m,k})$ . Also, the atomic flux in  $b_k$  can be restated in terms of reduced variables:

$$J_{a,k} = \frac{D_{a,k}CT_m^*}{L_k T_{m,k}} \left(\frac{\partial \eta_k}{\partial \xi_k} - \alpha_k\right)$$
(16)

where  $\alpha_k = q^* \rho j_k L_k / (k_b T_m^*)$ ,  $j_k$  is the current density,  $T_{m,k}$  is the temperature and  $D_{a,k}$  is the diffusivity for  $b_k$ . We uniformly discretize branch  $b_k$  into N segments, as in the Method of Lines, where N is the same for all branches (because we have scaled all branch lengths to 1 as in (14)). The reduced stress at each of the N + 1 discrete spatial points  $\{0, \ldots N\}$  is denoted by  $\eta_{k,i}$  and the time rate of change of  $\eta_{k,i}$  is (from (15)):

$$\frac{\partial \eta_{k,i}}{\partial \tau} = \theta_k \frac{\partial^2 \eta_{k,i}}{\partial \xi_k^2} \quad \text{for } i = 0, 1, \dots N$$
 (17)

Further, we approximate the partial derivatives with respect to  $\xi$  using a central difference approximation, so that (17) gives:

$$\frac{\mathrm{d}\eta_{k,i}}{\mathrm{d}\tau} = \gamma_k \eta_{k,i+1} - 2\gamma_k \eta_{k,i} + \gamma_k \eta_{k,i-1} \tag{18}$$

where  $\gamma_k = \theta_k/(\Delta\xi)^2$  and  $\Delta\xi = \Delta\xi_k = 1/N$ ,  $\forall k$ . This (18) is basically the *governing linear ODE equation* for every discretized point in the tree, and leads directly to the matrix equation that forms the linear system, once combined with the boundary conditions. As for these boundary conditions, note that the corresponding atomic flux  $J_{a,k,i}$  at the  $i^{th}$  point is:

$$J_{a,k,i} = \frac{D_{a,k}CT_m^*}{L_k T_{m,k}} \left(\frac{\eta_{k,i+1} - \eta_{k,i-1}}{2\Delta\xi} - \alpha_k\right)$$
(19)

This, combined with the two boundary laws introduced earlier, gives additional terms that enter into the ODE system formulation. Note that for each branch, the ODEs at the two junctions  $(i = \{0, N\})$  require the values for  $\eta_{k,-1}$  and  $\eta_{k,N+1}$ , which are *not* part of the  $\xi_k$  domain. The values at these *ghost points* are obtained by solving for the respective boundary condition(s). After much manipulation [11], these equations combine to provide a non-singular matrix A that forms the homogeneous LTI system

$$\dot{z}(t) = Az(t), \tag{20}$$

which incorporates the system inputs (fixed (DC) branch currents) by a simple variable transformation.

## IV. NUMERICAL SOLUTION

This above system (20) is easily solved using existing techniques for solving large ODE systems, such as used in standard circuit simulation [30]. One discretizes the derivative, leading to an algebraic system, then steps forward in time, solving a linear algebraic system at every time point. We have found [23] that this is a *stiff* system, again similar to circuit simulation, so that numerical formulas like the 2nd order Gear (BDF2) are best suited to the task. This provides, at every time point, the stress values at every discretized point along the tree.

We have found that numerical solutions of the above 1D problem formulation produce excellent results, in comparison with analytical solutions, with 3D simulation, as well as with experimental measurements. For the 3-terminal tree  $T_d$  with a dotted-I junction in Fig. 7, the kinetics of stress evolution of all three junctions and the evolution of the stress distribution across the 3-terminal tree are shown in Fig. 9. Comparisons of our solution for a 4-terminal T-structure with the analytical solution of [31] (CTHKS) for the same structure are shown in Fig. 10 and show excellent agreement. Finally, a comparison with a detailed 3D solver was shown in [32] and again shows excellent agreement.

#### A. Solving the full grid

As mentioned earlier, the power grid TTF is estimated using the *mesh model*, in which a grid is deemed to have failed when enough voids have nucleated so that the voltage drop at a node exceeds the user-provided voltage drop specification. The temperature distribution of the grid is determined at t = 0using compact thermal models, which gives the initial stress profile for the trees. A subset of trees, called the *active set*, is chosen such that the first void nucleation time for each tree in the active set is less than some time threshold  $t = t_m$ . All



Fig. 9. For  $\mathcal{T}_d$ , (a) evolution of stress at junctions with time and (b) stress profile with time. Here,  $L_1 = L_2 = 50 \mu m$ , and  $j_1 = -j_2 = 6e9 A/m^2$ .



Fig. 10. Comparing our numerical solution for stress evolution in 4-terminal T-structure with an analytical solution using [31].

trees in the active set are then numerically integrated using the LTI formulation to obtain their stress as a function of position and time. Every time a void nucleates at a junction, the steady state void volume is calculated for all connected branches and the corresponding resistances are updated. If a void leads to an *early failure* by virtue of its location, the corresponding via is removed from the grid. Next, the node voltage drops and the temperature distribution are updated. Finally, the equations for all trees in the active set are reformulated using the updated boundary conditions and the simulation is carried on until the next void failure. Due to increasing branch resistances, the conductivity of the grid degrades over time and the voltage drops eventually exceed the specification. The earliest time when a voltage drop violation occurs is the TTF of the grid.

In order to account for the randomness in EM degradation, we perform *Monte Carlo random sampling* to estimate the Mean Time to Failure (MTF) to within a user-specified error tolerance. The time threshold  $t_m$  for selecting trees in the

| TABLE I             |
|---------------------|
| PERFORMANCE RESULTS |

| Power Grids |         |          |         | Performance Metrics |         |                 |  |
|-------------|---------|----------|---------|---------------------|---------|-----------------|--|
| Grid        | num. of | num. of  | num. of | MTF <sub>s</sub>    | MTFm    | $t_{	ext{CPU}}$ |  |
| Name        | nodes   | branches | trees   | (years)             | (years) | (mins)          |  |
| ibmpg1      | 6K      | 11K      | 709     | 3.3                 | 7.1     | 0.6             |  |
| ibmpg2      | 62K     | 61K      | 462     | 6.7                 | 12.0    | 1.2             |  |
| ibmpg3      | 410K    | 401K     | 8.1K    | 4.5                 | 6.8     | 4.2             |  |
| ibmpg4      | 475K    | 465K     | 9.6K    | 9.0                 | 16.8    | 6.6             |  |
| ibmpg5      | 249K    | 496K     | 2K      | 4.5                 | 6.4     | 1.8             |  |
| ibmpg6      | 404K    | 798K     | 10.2K   | 5.6                 | 11.2    | 14.0            |  |
| ibmpgnew1   | 316K    | 698K     | 19.5K   | 3.8                 | 13.3    | 4.8             |  |
| ibmpgnew2   | 718K    | 698K     | 19.5K   | 4.7                 | 7.4     | 3.6             |  |
| PG1         | 560K    | 558K     | 2.6K    | 4.4                 | 17.2    | 2.4             |  |
| PG2         | 1.2M    | 1.2M     | 5.6K    | 3.6                 | 10.4    | 7.2             |  |
| PG3         | 1.6M    | 1.6M     | 6.9K    | 3.9                 | 8.5     | 3.6             |  |
| PG4         | 2.6M    | 2.6M     | 12.2K   | 3.3                 | 14.8    | 9.0             |  |
| PG5         | 4.1M    | 4.1M     | 12.7K   | 4.4                 | 9.2     | 8.4             |  |

active set is part of the Monte-Carlo (MC) process. It is initially set to a sufficiently high value and is updated as more mesh TTF samples are obtained form the subsequent MC iterations. The overall flow chart is shown in Fig. 12. Finally, Table I shows performance results on a number of test grids, including IBM benchmarks [24] and specially synthesized grids [23]. The column for MTF<sub>s</sub> shows the MTF when, having simulated the stress using our EKM engine, we declare grid to have failed when the very first void occurs; i.e., using a series model approach, and this is shown only for comparison. The table also shows the MTF using our mesh model  $(MTF_m)$ and the total wall-clock computational (CPU) time  $(t_{CPU})$ that our approach took to generate the  $\mathrm{MTF}_{\mathrm{m}}.$  Clearly, the series model is highly pessimistic compared to the voltagedrop based mesh model, and the compute time is very good and promising for practical work.



Fig. 11. Impact of early failures (EF) on (a) the maximum voltage drop (shown for one sample grid) and (b) estimated mesh MTF for ibmpg2.



Fig. 12. Flow chart showing the Monte Carlo loop around the numerical solver, including the termination check and the check for early failures.

In order to assess the impact of early failures on the grid lifetime, we present a case study using the ibmpg2 grid; we estimate its mesh MTF under two settings, one where early failure detection is on and the other where early failure detection is turned off. As can be seen from Fig. 11b, turning off early failures gives an optimistic MTF estimate which is 34% longer than the actual MTF. Thus, if the target product lifetime is set as 15 yrs, this grid will fail EM sign off due to the impact of early failures, but would erroneously succeed if early failures are ignored. The difference in MTFs stems from the influence of early failures on node voltage drops. In Fig. 11a, we show how the maximum node voltage drop changes with time (for one sample grid) as the voids nucleate due to EM. Since early failures lead to removal of a via, their impact on voltage drops is more severe, which ultimately leads to shorter lifetimes. In general, the effect of early failures gets more pronounced as the difference between the maximum initial voltage drop and  $v_{th}$  increases.

## V. CONCLUSION

For many decades, EM checking for complex on-die metal structures under realistic (complex and time-varying) operating conditions has been virtually impossible. The only available option for users had been a current density check that is mapped empirically to an MTF value, via the venerable but limited Black's model. The dynamics of the failure over time, how it is impacted by the stress in the metal around it, how it depends on the currents as they evolve in time, all that has been simply unavailable for tool developers and chip design teams. Now, things are changing. With the availability of an accurate 1D physical model and its extension to large metal structures, we can finally deal with EM damage as a well-understood physical concept, much like familiar circuit concepts like voltage and current. Even though we have assumed acyclic metal structures and DC currents, the method is very easily extended to eliminate both these limitations. The task now is to integrate this new knowledge into design tools, so as to enable not only reliable design but better design overall. In this work, we have taken a first step in that direction, and we look forward to seeing this approach adopted and extended.

#### REFERENCES

- J. Warnock, "Circuit design challenges at the 14nm technology node," in ACM/IEEE 48th Design Automation Conference (DAC-2011), San Diego, CA, June 5-9 2011, pp. 464–467.
- [2] V. Sukharev, "Beyond black's equation: Full-chip EM/SM assessment in 3D IC stack," *Microelectronic Engineering*, vol. 120, pp. 99–105, May 25 2014.
- [3] A. S. Oates, "Interconnect reliability challenges for technology scaling: A circuit focus," in *IEEE International Interconnect Technology Conference / Advanced Metallization Conference (IITC/AMC)*, San Jose, CA, May 23-26 2016, pp. 59–59.
- [4] M. Ohring and L. Kasprzak, Reliability and Failure of Electronic Materials and Devices, 2nd ed. Academic Press, 2014.
- [5] J. R. Lloyd and J. Kitchin, "The electromigration failure distribution: The fine-line case," *Journal of Applied Physics*, vol. 69, no. 4, pp. 2117–2127, Feb. 1991.
- [6] J. R. Black, "Electromigration failure modes in aluminum metallization for semiconductor devices," in *Proceedings of the IEEE*, Sep. 1969, pp. 1587–1594.
- [7] S. P. Hau-Riege and C. V. Thompson, "Experimental characterization and modeling of the reliability of interconnect trees," *Journal of Applied Physics*, vol. 89, no. 1, pp. 601–609, January 1 2001.
- [8] M. Hauschildt, M. Gall, S. Thrasher, P. Justison, R. Hernandez, H. Kawasaki, and P. S. Ho, "Statistical analysis of electromigration lifetimes and void evolution," *Journal of Applied Physics*, vol. 101, no. 4, p. 043523, 2007.
- [9] J. R. Lloyd, "Black's law revisited Nucleation and growth in electromigration failure," *Microelectronics Reliability*, vol. 47, pp. 1468–1472, Sep. 2007.
- [10] I. A. Blech, "Electromigration in thin aluminum films on titanium nitride," *Journal of Applied Physics*, vol. 47, no. 4, pp. 1203–1208, Apr. 1976.
- [11] S. Chatterjee, V. Sukharev, and F. N. Najm, "Power grid electromigration checking using physics-based models," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 37, no. 7, pp. 1317–1330, Jul. 2018.
- [12] C. L. Gan, C. V. Thompson, K. L. Pey, and W. K. Choi, "Experimental characterization and modeling of the reliability of three-terminal dualdamascene Cu interconnect trees," *Journal of Applied Physics*, vol. 94, no. 2, pp. 1222–1228, July 15 2003.
- [13] S. Chatterjee, M. Fawaz, and F. N. Najm, "Redundancy-aware electromigration checking for mesh power grids," in *IEEE/ACM International Conference on Computer-Aided Design (ICCAD)*, San Jose, CA, November 18-21 2013, pp. 540–547.
- [14] X. Huang, T. Yu, V. Sukharev, and S.-X.-D. Tan, "Physics-based electromigration assessment for power grid networks," in ACM/IEEE 50th Design Automation Conference (DAC-2014), San Francisco, CA, June 1-5 2014.
- [15] B. Ouattara, L. Doyen, D. Ney, H. Mehrez, and P. Bazargan-Sabet, "Power grid redundant path contribution in system on chip (soc) robustness against electromigration," *Microelectronics Reliability*, vol. 54, no. 9-10, pp. 1702–1706, September-October 2014.
- [16] B. Li, A. Kim, P. McLaughlin, B. Linder, and C. Christiansen, "Electromigration characteristics of power grid like structures," in *IEEE International Reliability Physics Symposium (IRPS)*, Burlingame, CA, March 11-15 2018, pp. 4F.3.1–5.

- [17] V. Sukharev and F. N. Najm, "Electromigration check: where the design and reliability methodologies meet," *IEEE Transactions on Device and Materials Reliability (TDMR)*, vol. 18, no. 4, pp. 498–507, Dec. 2018.
- [18] M. A. Korhonen, P. Borgesen, K. N. Tu, and C.-Y. Li, "Stress evolution due to electromigration in confined metal lines," *Journal of Applied Physics*, vol. 73, no. 8, pp. 3790–3799, April 15 1993.
- [19] D.-A. Li, M. Marek-Sadowska, and S. R. Nassif, "A method for improving power grid resilience to electromigration-caused via failures," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 23, no. 1, pp. 118–130, Jan. 2015.
- [20] X. Huang, V. Sukharev, J.-H. Choy, M. Chew, T. Kim, and S. X.-D.Tan, "Electromigration assessment for power grid networks considering temperature and thermal stress effects," *Integration, the VLSI Journal*, vol. 55, pp. 307–315, Sep. 2016.
- [21] D.-A. Li, M. Marek-Sadowska, and S. R. Nassif, "T-VEMA: A temperature- and variation-aware electromigration power grid analysis tool," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 23, no. 10, pp. 2327–2331 K electromigration (EM), jL product, process variation, thermal analysis, Oct. 2015.
- [22] S. Chatterjee, V. Sukharev, and F. N. Najm, "Fast physics-based electromigration checking for on-die power grids," in *IEEE/ACM International Conference on Computer-Aided Design (ICCAD)*, Austin, TX, November 7-10 2016.
- [23] —, "Fast physics-based electromigration assessment by efficient solution of linear time-invariant (LTI) systems," in *IEEE/ACM International Conference on Computer-Aided Design (ICCAD)*, Irvine, CA, November 13-16 2017, pp. 659–666.
- [24] S. R. Nassif, "Power grid analysis benchmarks," in 13th Asia and South Pacific Design Automation Conference (ASPDAC-08), Seoul, Korea, January 21-24 2008, pp. 376–381.
- [25] B. Li, J. Gill, C. J. Christiansen, T. D. Sullivan, and P. S. McLaughlin, "Impact of via-line contact on cu interconnect electromigration performance," in *IEEE International Reliability Physics Symposium*, April 17-21 2005, pp. 24–30.
- [26] J. Clement, "Electromigration modeling for integrated circuit interconnect reliability analysis," *IEEE Transactions on Device and Materials Reliability*, vol. 1, no. 1, pp. 33–42, Mar. 2001.
- [27] W. E. Schiesser, Computational Mathematics in Engineering and Applied Science: ODEs, DAEs, and PDEs. Taylor & Francis, 1993.
- [28] L. M. Ting, J. S. May, W. R. Hunter, and J. W. McPherson, "AC electromigration characterization and modeling of multilayered interconnects," in *International Reliability Physics Symposium (IRPS)*, Mar. 1993, pp. 311–316.
- [29] V. Sukharev, A. Kteyan, and X. Huang, "Postvoiding stress evolution in confined metal lines," *IEEE Transactions on Device and Materials Reliability (TDMR)*, vol. 16, no. 1, pp. 50–60, Mar. 2016.
- [30] F. N. Najm, Circuit Simulation. Hoboken, NJ: John Wiley & Sons, Inc., 2010.
- [31] H.-B. Chen, S.-X.-D. Tan, X. Huang, T. Kim, and V. Sukharev, "Analytical modeling and characterization of electromigration effects for multibranch interconnect trees," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 35, no. 11, pp. 1811– 1824, Nov. 2016.
- [32] J.-H. Choy, V. Sukharev, S. Chatterjee, F. N. Najm, A. Kteyan, and S. Moreau, "Finite-difference methodology for full-chip electromigration analysis applied to 3D IC test structure: Simulation vs. experiment," in *IEEE International Conference on Simulation of Semiconductor Processes and Devices (SISPAD-17)*, Kamakura, Japan, September 7-9 2017, pp. 41–44.