Representative Path Selection for Post-Silicon Timing Prediction Under Variability


Lin Xie and Azadeh Davoodi
Department of Electrical & Computer Engineering, University of Wisconsin - Madison
{lxie2, adavoodi}@wisc.edu

ABSTRACT

The identification of speedpaths is required for post-silicon (PS) timing validation, and it is becoming increasingly time-consuming due to manufacturing variations. In this paper we propose a method to find a small set of representative paths that can help monitor a large pool of target paths which are more prone to fail timing at the PS stage, in order to reduce the validation effort. We first introduce the concept of effective rank to select a small set of representative paths that predict the target paths with high accuracy. To handle the large dimension and degree of independent random parameter variations, we then allow modeling target path delays using segment delays and formulate the selection as a convex problem. The identified segments can be incorporated in the design of custom test structures to monitor PS circuit timing behavior. Simulations show that we can use the actual timing information of fewer than 100 paths or segments to accurately predict up to 3,500 target paths (statistically critical ones) with more than 1,000 process variables.

Categories and Subject Descriptors: B.7.2 [Integrated Circuits]: Design Aids
General Terms: Algorithms, Design
Keywords: Post-Silicon Validation, Process Variations

1. INTRODUCTION

In the presence of deep-submicron electrical issues and process variations, post-silicon timing validation is becoming significantly more expensive and time-consuming [6]. Among the related literature, [1] proposes a statistical learning approach to predict timing failures that might occur on target speedpaths. This prediction relies on measuring the delays of a small set of representative paths. However, [1] does not discuss the selection of these representative paths.

To help identify representative paths, [3] proposes a technique which relies on defining a set of basic features (e.g., types of logic gates) to rank and cluster the target speedpaths. This helps to define a smaller subset of representative ones to be used for timing failure prediction. However, it is not clear to what extent these features can bind the paths to their representative ones in the presence of variations. Another related work is [7], which synthesizes a representative speedpath so that its delay highly correlates with the circuit delay. By directly measuring the delay of this representative path at post-silicon, the chip frequency can be predicted. However, this approach cannot localize the timing failure.

In this paper, we propose to monitor a large pool of target paths, which are more likely to fail timing at the post-silicon stage. We study the identification of a set of representative paths from this set of target paths at the design stage, such that special-purpose test structures or flip-flops can be embedded in the circuit to allow their post-silicon measurement. These measured delays are used to predict the post-silicon delays of a large pool of target speedpaths. The goal is to select a minimum number of representative paths such that their delays highly correlate with the delays of the target speedpaths. This helps to better localize those speedpaths that may fail their timing requirements.

We assume the source of delay uncertainty at the post-silicon stage is parameter variations (not other electrical issues). The challenge is that the actual parameter variation values are unknown. To further reduce the post-silicon validation effort, we allow selection of path segments. Segment selection can be useful in the presence of a large number of independent random variations to reduce the overall number of post-silicon measurements. Even if segment delays may not be directly measured, their identification can be incorporated in the design of custom test structures to predict the post-silicon behavior.

Our contributions in this paper are enumerated below:
1. Given a high degree of independent random variations and a large number of target paths, we discuss a technique which can decrease the number of representative paths. This is based on the idea of the effective rank of the transformation matrix between the process parameter variations and the target path delays.
2. We discuss a convex formulation to select a minimum number of segments to predict the target paths.

Simulations show that for over 1,000 random variations and up to several thousand target paths, at most 100 paths or segments are needed for delay prediction. Even though a prediction error remains, we show that the guard-band required for post-silicon timing analysis is very small, and therefore our framework can be used to accelerate timing validation.

This research is supported by National Science Foundation under award CCF. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC 2010, June 13-18, 2010, Anaheim, California, USA. Copyright 2010 ACM.

Figure 1: Only three of the designated paths (merging at G5) are sufficient to predict the delay of the fourth one with zero error.

2. PRELIMINARIES AND MOTIVATION

Preliminaries: Given a set of target paths $P_{tar} = \{p_1, p_2, \ldots, p_n\}$, we aim to derive their delays at the post-silicon stage, which are denoted by $d_{P_{tar}} = [d_{p_1}, \ldots, d_{p_n}]$. Let us define process variations by $x = [x_1, \ldots, x_m]$, and assume that the actual values of $x$ for a fabricated chip are unavailable. Each entry $x_i$ falls into one of three variation categories: die-to-die, within-die, and random. A die-to-die variation is common to all the paths. A within-die variation is shared among a group of interconnects or gates that are in the same physical proximity on the chip. Note that we can decorrelate the within-die variations using existing approaches such as the hierarchical spatial correlation model in [2]. A random variation is specific to a gate or interconnect that belongs to at least one of the $n$ target paths. Therefore, the number of random components can be very large in general. All entries $x_i$ are independent of each other and are typically Gaussian distributed with mean 0 and variance 1. We model the path delays as a linear function of the parameter variations, similar to [2]:

$$d_{P_{tar}} = \mu_{P_{tar}} + A x, \tag{1}$$

where $\mu_{P_{tar}}$ is a vector representing the nominal delays of all paths in the set $P_{tar}$ and the matrix $A$ captures the linear transformation of process parameter variables to the delays of the target paths. For the $i$-th path delay, we have $d_{p_i} = \mu_{p_i} + \sum_{j=1}^{m} a_{ij} x_j$, where $\mu_{p_i}$ is the nominal path delay and $a_{ij} = 0$ if the $j$-th parameter variation is not related to any gate or interconnect on path $i$. Otherwise, $a_{ij}$ is the sensitivity of $d_{p_i}$ with respect to $x_j$.

Next, we introduce the notation for the delay of a segment of a path, as it will be needed in later sections. Given the graph representation of the set of target paths, a segment is the union of those consecutive edges in the paths that do not have any incoming or outgoing edges in between. For a set of $n_S$ segments $S = \{s_1, s_2, \ldots, s_{n_S}\}$, we express its delay vector $d_S$ and Eqn (1) as:

$$d_S = \mu_S + \Sigma x, \qquad d_{P_{tar}} = G d_S = G \mu_S + G \Sigma x, \tag{2}$$

which indicates $A = G\Sigma$ and $\mu_{P_{tar}} = G\mu_S$. Here $G$ maps segment delays to path delays: its entry $g_{ij}$ indicates whether segment $s_j$ lies on path $p_i$.

Motivation: Given a set of target paths $P_{tar}$, we aim to select a minimum-sized subset of paths from $P_{tar}$ to form a set of representative paths, which we denote by $P_r = \{p_1, p_2, \ldots, p_r\}$. The delays of these representative paths will be measured at the post-silicon stage and used to predict the delays of the remaining paths in $P_{tar}$, reducing the post-silicon validation effort. Take Figure 1 as an example. Here, we consider four paths:

p1: G1 → G3 → G5 → G7 → G9
p2: G1 → G3 → G5 → G6 → G8
p3: G2 → G4 → G5 → G6 → G8
p4: G2 → G4 → G5 → G7 → G9

The graph-based representation of the subcircuit containing these four paths is given on the right of Figure 1. We can write an exact expression for the delay of each path as a linear combination of the remaining three paths, due to the common segments shared among these paths. For example, $d_{p_1} = d_{p_2} - d_{p_3} + d_{p_4}$. This indicates that we can form a set of representative paths $P_r = \{p_2, p_3, p_4\}$ to model the delays of the remaining paths with zero error. However, the size of $P_r$ can still be very large (3 out of 4). As we will show in this paper, by allowing a small prediction error tolerance, it is possible to significantly reduce the number of representative paths. This number can be reduced further if accurate path-segment delay information is available at the post-silicon stage.

3. PROBLEM DEFINITION

Problem Definition: Given a set of target paths denoted by $P_{tar}$ of size $n$, we aim to select a set of representative paths $P_r \subseteq P_{tar}$ of size $r$. Here, we denote the set of remaining $n - r$ paths in $P_{tar}$ as $P_m$ and their delays as $d_{P_m}$. Assuming the actual delays of the representative paths, denoted by $d_{P_r}$, are available at the post-silicon stage, we would like to build a prediction model that maps $d_{P_r}$ to $d_{P_m}$, the actual delays of paths in $P_m$ at the post-silicon stage. Our objective is to minimize $r$ and identify the representative paths $P_r$ such that the error of the delay prediction model is upper bounded by a given, sufficiently small tolerance $\epsilon$. In this way, we expect to correctly predict the timing of the paths that are most likely to fail, which helps towards more effective and faster debug and diagnosis.

To solve this problem, in Section 4, we show that a recently introduced idea known as the effective rank [4] of the transformation matrix between parameter variations and target path delays ($A$ in Eqn (1)) can be applied. While the rank of $A$ identifies representative paths which exactly predict the delays of the target paths, the effective rank of $A$ is calculated for a given error tolerance and can reduce the number of representative paths.

To further reduce the number of selections, in Section 5, we allow selecting path segments to predict the delays of target paths. We show a significant reduction in the total number of selections even when the dimension of the independent random variations, $x$, is very high. Other related research acknowledges that dealing with a large number of parameter variations remains a challenge even for the simpler problem of chip delay prediction [7].

Assumptions on Accurate Delay Measurement

In our problem definition, we assume that accurate post-silicon delay information on a small set of paths/segments is available. To accurately measure path delays, we can insert special-type scan flip-flops (e.g., as proposed in [10]). However, to the best of our knowledge, no existing literature has proposed methods for measuring a segment delay. This may be because the benefits of such measurement are unknown. Segment measurement might nevertheless be possible through manual effort (e.g., probing) or by inserting scan flip-flops, such as in [10], around the desired segment and handling the loading effects. Regardless, we believe that custom test structures can be designed so that their delays are highly correlated with the delays of designated segments, similar to the approach in [7]. The results of this paper thus encourage the research community to look into custom test structure designs for segment measurement, so that the overhead associated with the post-silicon validation effort can be reduced.
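The Figure 1 example from Section 2 can be checked numerically. The sketch below is illustrative only: the split into four segments, the nominal segment delays, the number of variation sources, and the sensitivity values are all made-up assumptions, not data from the paper. It builds the matrix $G$ for the four paths, forms $A = G\Sigma$ as in Eqn (2), and confirms that three paths determine the fourth.

```python
import numpy as np

# Hypothetical segment split of the Figure 1 subcircuit (an assumption):
#   s1: G1->G3->G5, s2: G2->G4->G5, s3: G5->G7->G9, s4: G5->G6->G8
# Rows of G are paths p1..p4, columns are segments s1..s4.
G = np.array([
    [1, 0, 1, 0],  # p1: G1 G3 G5 G7 G9
    [1, 0, 0, 1],  # p2: G1 G3 G5 G6 G8
    [0, 1, 0, 1],  # p3: G2 G4 G5 G6 G8
    [0, 1, 1, 0],  # p4: G2 G4 G5 G7 G9
], dtype=float)

rng = np.random.default_rng(0)
mu_S = np.array([30.0, 32.0, 25.0, 27.0])    # made-up nominal segment delays
Sigma = rng.normal(scale=0.5, size=(4, 6))   # made-up sensitivities, 6 variations

A = G @ Sigma                                # Eqn (2): A = G * Sigma
rank_G = int(np.linalg.matrix_rank(G))
rank_A = int(np.linalg.matrix_rank(A))

# One "fabricated chip": draw x, then evaluate segment and path delays.
x = rng.standard_normal(6)
d_S = mu_S + Sigma @ x
d_P = G @ d_S

# d_p1 = d_p2 - d_p3 + d_p4 holds for any sample of x.
identity_holds = bool(np.isclose(d_P[0], d_P[1] - d_P[2] + d_P[3]))
print(rank_G, rank_A, identity_holds)
```

Both ranks come out as 3 rather than 4, which is the numerical counterpart of the statement that three of the four paths suffice for exact prediction.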

4. REPRESENTATIVE PATH SELECTION

In this section, we consider representative path selection for a given error tolerance $\epsilon$.

4.1 Exact Selection

We first consider exact selection with $\epsilon = 0$. Let $A_r$ be the rows of $A$ corresponding to the $r$ paths we select, and $A_m$ be the remaining rows. We can then rewrite Eqn (1) as

$$d_{P_{tar}} = \begin{bmatrix} d_{P_r} \\ d_{P_m} \end{bmatrix} = \begin{bmatrix} \mu_r \\ \mu_m \end{bmatrix} + \begin{bmatrix} A_r \\ A_m \end{bmatrix} x. \tag{3}$$

We first introduce the following Theorem and Lemma:

Theorem 1. The smallest $r$ for which we can exactly express $d_{P_m}$ as a linear combination of $d_{P_r}$ is $r = \mathrm{rank}(A)$.

Proof. From Eqn (2), we have $d_{P_{tar}} = G d_S$. Therefore, $d_{P_m}$ can be exactly written as a linear combination of representative paths $d_{P_{r_1}}$ with $r_1 = \mathrm{rank}(G)$, given the definition of matrix rank [5]. Similarly, from Eqn (1), we have $d_{P_{tar}} - \mu_{P_{tar}} = Ax$, so $d_{P_m}$ can also be exactly written as a linear combination of $d_{P_{r_2}}$, with $r_2 = \mathrm{rank}(A)$. Therefore, the smallest $r$ needed to write $d_{P_m}$ in terms of $d_{P_r}$ is:

$$r = \min(r_1, r_2) = \min(\mathrm{rank}(A), \mathrm{rank}(G)). \tag{4}$$

Since $A = G\Sigma$, $\mathrm{rank}(A) \le \min(\mathrm{rank}(G), \mathrm{rank}(\Sigma))$ always holds [5], so the minimum in Eqn (4) is $\mathrm{rank}(A)$. This completes the proof.

Theorem 1 shows that for $r = \mathrm{rank}(A)$, any set of paths corresponding to any $r$ linearly independent rows of $A$ will suffice as the representative paths. This follows from the definition of matrix rank, since the $r$ rows span all the remaining rows of $A$, and consequently the entire vector of path delays $d_{P_{tar}}$ can be exactly recovered. To select these $r = \mathrm{rank}(A)$ representative paths, we can utilize Algo. 2, which we will illustrate in Section 4.3.

Lemma 1. For the smallest number of representative paths, we have $r = \mathrm{rank}(A) \le n_S$, where $n_S = |S|$.

Proof. Since $\mathrm{rank}(A_{n \times |x|}) \le \mathrm{rank}(G_{n \times n_S})$, we have $r = \mathrm{rank}(A) \le n_S$.

The above lemma states that to write an exact linear expression that maps $d_{P_r}$ to $d_{P_m}$ we need at most $n_S$ representative paths. Note that $n_S$ (the number of segments) is at most equal to the number of edges in the timing graph, since the segments are a lumped representation of the edges. The number of edges in turn can be much smaller than the number of target paths.

To illustrate the above Theorem and Lemma, we consider circuit S1423. In this circuit, we extract 644 statistically critical paths, so $n = |P_{tar}| = 644$. These paths cover 415 gates and 255 segments. Since $\mathrm{rank}(A) = 122$, we only need 122 paths to exactly recover the delays of all remaining paths.

4.2 Approximate Selection with Effective Rank

As shown in the previous section, for exact path delay prediction, the minimum number of required paths is $\mathrm{rank}(A)$, where $A$ is the transformation matrix between $d_{P_{tar}}$ and $x$. Here, we show that by allowing a small error tolerance $\epsilon$, we can greatly reduce the number of required paths by using the idea of the effective rank of $A$ proposed in [4].

We first explain the idea intuitively. Consider the example of S1423 again. Since the extracted 644 paths cover only 255 segments, many paths are forced to share segments. This indicates that many of the rows in $G$ are similar. That is, for many pairs of paths such as $p_i$ and $p_k$, only a few entries such as $g_{ij}$ and $g_{kj}$ in $G$ might be different, indicating that the two paths differ in only a few segments (i.e., segment $s_j$). Correspondingly, this results in similarity between the $i$-th and $k$-th rows of matrix $A$, since $\Sigma$ is a constant sensitivity matrix. Therefore, intuitively, it seems that we may need far fewer than $\mathrm{rank}(A) = 122$ paths to predict the remaining paths with high accuracy.

Figure 2: The normalized singular values of transformation matrix A under two configurations. Panels (a) and (b); x-axis: index; y-axis: normalized singular values of A.

Formally, we can perform singular value decomposition (SVD) on $A \in \mathbb{R}^{n \times |x|}$ and obtain $A = U \Lambda V^T$, where $U$ and $V$ are $n \times n$ and $|x| \times |x|$ orthogonal matrices, respectively, and $\Lambda$ is an $n \times |x|$ diagonal matrix. The diagonal elements $\lambda_i$ in $\Lambda$ are the singular values and satisfy $\lambda_i \ge \lambda_{i+1}$. The rank of $A$ is equal to the largest $i$ such that $\lambda_i > 0$. These $\lambda_i$'s can also provide other insights into the structure of $A$. Let us denote the energy as $E = \sum_{i=1}^{n} \lambda_i$. We define the effective rank of $A$ to be

$$\arg\min_k \left( \sum_{i=1}^{k} \lambda_i \ge (1 - \eta) E \right),$$

where $\eta$ is a specified threshold, for example 5%. It is thus the index of the singular value at which the cumulative sum first reaches the fraction $(1 - \eta)$ of the total energy. This effective rank is shown to be closely related to the prediction error $\epsilon$ in [4].

For matrix $A$, the effective rank can be much smaller than the rank. If the singular values $\lambda_i$ drop at a fast rate, only a few singular values are dominant and the effective rank of $A$ is small. This further indicates that fewer representative paths are required for prediction under a given error tolerance.

Figure 2 (a) plots the normalized singular values of $A$, equal to $\lambda_i / \sum_i \lambda_i$, on the y-axis using a log-linear scale for S1423. The simulation configuration is given in Section 6. In this figure, we sort the singular values of $A$ in non-increasing order and plot only the first 30 singular values. From the large gap that separates the singular values into subsets of large and small values, we can conclude that we may need only 30 paths to predict the remaining paths with very high accuracy.

However, with further scaling of submicron technology, both the dimension and the extent of random variations with respect to the total variation (including die-to-die, within-die, and random variations) will greatly increase. In this case, the number of representative paths grows dramatically. As an example, we increase only the sensitivity of the independent random variations in $A$ by 3X and plot its normalized singular values in Figure 2 (b). As we can see in this figure, the singular values of matrix $A$ drop much more slowly than in Figure 2 (a), which indicates that more representative paths are required to predict $d_{P_{tar}}$. Similar plots are obtained if we increase the number of random variations.
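The effective-rank definition above can be turned into a few lines of NumPy. This is a minimal sketch under stated assumptions: the function name `effective_rank`, the default `eta=0.05`, and the toy diagonal matrix are illustrative choices and do not come from the paper, which reports using Matlab's `svd()`.

```python
import numpy as np

def effective_rank(A, eta=0.05):
    """Smallest k such that sum_{i<=k} lambda_i >= (1 - eta) * sum_i lambda_i."""
    lam = np.linalg.svd(A, compute_uv=False)   # singular values, non-increasing
    energy = np.cumsum(lam)                    # running sum of singular values
    # First position whose cumulative energy reaches the threshold (1-based).
    return int(np.searchsorted(energy, (1.0 - eta) * energy[-1]) + 1)

# Toy matrix with known singular values 10, 5, 1, 0.5 (total energy 16.5).
A_toy = np.diag([10.0, 5.0, 1.0, 0.5])
k_95 = effective_rank(A_toy, eta=0.05)   # threshold 15.675, reached at k = 3
k_60 = effective_rank(A_toy, eta=0.40)   # threshold 9.9, reached at k = 1
print(np.linalg.matrix_rank(A_toy), k_95, k_60)
```

In the toy case the rank is 4 while the effective rank at $\eta = 5\%$ is 3, mirroring the gap between rank and effective rank discussed for S1423.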

Next, we discuss our proposed approximate path selection procedure assuming $\epsilon$ is provided. Specifically, we use the effective rank to select representative paths so that the prediction errors for the remaining paths are bounded by $\epsilon$.

4.3 Path Selection Procedure with $\epsilon$

The high-level algorithm for representative path selection with error tolerance $\epsilon$ is given below. We start by exactly selecting $r = \mathrm{rank}(A)$ paths as explained in Section 4.1. The initial error in this case is $\epsilon_r = 0$. We decrement the number of representative paths by one and select $r - 1$ representative paths in Step 2.2. This introduces a new error, which we compute in Step 2.3 to update $\epsilon_r$. If the new error is still smaller than the given $\epsilon$, we repeat another iteration and further decrement $r$ until the error tolerance is reached.

Algorithm 1: Representative Path Selection
Input: error tolerance $\epsilon$, $d_{P_{tar}} - \mu_{P_{tar}} = Ax$.
1. Select $r = \mathrm{rank}(A)$ representative paths exactly and set error $\epsilon_r = 0$
2. While ($\epsilon_r \le \epsilon$)
   2.1 $r \leftarrow r - 1$
   2.2 Select $r$ representative paths from $P_{tar}$
   2.3 Update error $\epsilon_r$ for the newly selected paths

Next, we explain Step 2.2 of Algo. 1 to select $r$ representative paths. We also discuss building a model between $d_{P_r}$ and $d_{P_m}$ and the computation of the error $\epsilon_r$ (Step 2.3).

Step 2.2: Selection of r Representative Paths

The selection of representative paths is a combinatorial optimization problem which is NP-complete [5]. From an algorithmic perspective, it is equivalent to the subset selection problem in computational linear algebra. One procedure to solve this problem approximately is QR decomposition with column pivoting, which we discuss below:

Algorithm 2: Selection of r Representative Paths
Input: Matrix $A$ and $r \le \mathrm{rank}(A)$
1. Perform SVD on $A = U \Lambda V^T$.
2. Perform QR with column pivoting on matrix $U_r$, composed of the first $r$ columns of $U$, and get $U_r^T P_r = QR$, where $P_r$ is an $n \times n$ permutation matrix.
3. Take $A_r$ to be the sub-matrix formed by the first $r$ rows of $P_r^T A$.

We first perform SVD on $A$ to obtain matrix $U$. Then we apply QR decomposition with column pivoting on $U$ [5]. The input to the procedure is $U_r$, a sub-matrix formed by the first $r$ columns of $U$. The matrices $Q$ and $R$ are found during the procedure and help identify the output permutation matrix $P_r$. After obtaining $P_r$, to identify the $r$ representative paths, we compute $P_r^T A$ and take the sub-matrix formed by the first $r$ rows, which in turn corresponds to $r$ path delays from vector $d_{P_{tar}}$.

Note that in Algo. 1, as we decrement $r$ at each iteration of the while loop, we apply the column-pivoting QR decomposition again. This procedure can also be implemented incrementally based on the result of the previous iteration. For more details, we refer the reader to [4].

Step 2.3: $d_{P_r} \rightarrow d_{P_m}$ Model and Error Computation

After selecting the $r$ representative paths, we use the following Theorem to build a model from the delays of the representative paths $d_{P_r}$ to the delays of the remaining paths $d_{P_m}$. We assume all entries in $x$ are independent and have a standard Gaussian distribution, as in [7].

Theorem 2. The optimal linear predictor of $d_{P_m}$ from $d_{P_r}$ is

$$d_{P_m} = \mu_m + A_m A_r^T (A_r A_r^T)^{-1} (d_{P_r} - \mu_r), \tag{5}$$

where $(\cdot)^{-1}$ denotes the pseudo-inverse operator, and $A_r$, $A_m$, $\mu_r$, and $\mu_m$ are defined in Eqn (3). The prediction error for $d_{P_m}$ is then

$$\Delta_r = A_m A_r^T (A_r A_r^T)^{-1} A_r x - A_m x = \Omega_r x, \tag{6}$$

where $\Omega_r \triangleq A_m A_r^T (A_r A_r^T)^{-1} A_r - A_m$ is constant once the selection is performed. It also shows that $\Delta_r$ is multivariate Gaussian distributed.

Error Definition: Eqn (6) shows that we can compute the deviation of the predicted path delays from the exact path delays once we determine the representative paths $P_r$. In this paper, we define the error $\epsilon_r$ used in Algo. 1 as

$$\epsilon_r = \max_{i = 1, 2, \ldots, n - r} WC(\Delta_r^{(i)}) / T_{cons}, \tag{7}$$

where $T_{cons}$ denotes the circuit timing constraint, and $\Delta_r^{(i)}$ denotes the $i$-th entry of $\Delta_r$.
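Algorithms 1 and 2 together with Eqns (5) to (7) can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, a hand-written column-pivoted Gram-Schmidt stands in for a library QR with column pivoting, the worst case $WC(\cdot)$ is taken as a 3-sigma value (an assumption of this sketch), the loop returns the last selection that still meets the tolerance, and the 4-path toy data reuse the Figure 1 topology with made-up sensitivities.

```python
import numpy as np

def select_paths(A, r):
    """Algorithm 2 sketch: choose r rows of A (requires r <= rank(A))."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    M = U[:, :r].T.copy()                     # U_r^T: one column per path
    picked = np.zeros(M.shape[1], dtype=bool)
    order = []
    for _ in range(r):
        norms = np.where(picked, -1.0, np.linalg.norm(M, axis=0))
        j = int(np.argmax(norms))             # pivot: largest remaining column
        picked[j] = True
        order.append(j)
        q = M[:, j] / norms[j]
        M -= np.outer(q, q @ M)               # remove the pivot direction
    return order

def prediction_error(A, rep, t_cons, n_sigma=3.0):
    """Eqns (5)-(7): weights W of the predictor, and the error eps_r."""
    rep_set = set(rep)
    rest = np.array([i for i in range(A.shape[0]) if i not in rep_set], dtype=int)
    A_r, A_m = A[rep], A[rest]
    W = A_m @ A_r.T @ np.linalg.pinv(A_r @ A_r.T)   # d_Pm ~ mu_m + W (d_Pr - mu_r)
    omega = W @ A_r - A_m                           # Delta_r = omega @ x
    sigma = np.linalg.norm(omega, axis=1)           # std of Delta_r^(i), x ~ N(0, I)
    eps_r = n_sigma * float(sigma.max()) / t_cons if rest.size else 0.0
    return rest, W, eps_r

def representative_paths(A, eps, t_cons):
    """Algorithm 1 sketch: shrink r while the error stays within eps."""
    r = int(np.linalg.matrix_rank(A))
    rep = select_paths(A, r)
    while r > 1:
        cand = select_paths(A, r - 1)
        if prediction_error(A, cand, t_cons)[2] > eps:
            break
        r, rep = r - 1, cand
    return rep

# Toy data: Figure 1 topology, made-up sensitivities and timing constraint.
G = np.array([[1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 0, 1], [0, 1, 1, 0]], dtype=float)
A = G @ np.random.default_rng(1).normal(scale=0.5, size=(4, 6))
exact = select_paths(A, 3)
_, _, eps_exact = prediction_error(A, exact, t_cons=100.0)
approx = representative_paths(A, eps=0.05, t_cons=100.0)
```

With `r = rank(A)` the error `eps_exact` is zero up to rounding, as Theorem 1 predicts; `approx` holds the smallest selection found by the loop whose worst-case error stays within the 5% tolerance.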
The function $WC(y)$ denotes the worst-case value of random variable $y$. Therefore, Eqn (7) indicates that the worst-case prediction error (deviation from the actual path delay) cannot be larger than $\epsilon_r T_{cons}$ for all paths in $P_m$ when $|P_r|$ is equal to $r$. Since $\Delta_r$ follows a multivariate Gaussian distribution with known mean and variance, we can compute $\max(\Delta_r^{(i)})$ analytically.

More importantly, the definition in Eqn (7) indicates that the maximum deviation in predicted path delays can be set to $\epsilon T_{cons}$, which can be used as a guard-band in post-silicon analysis to determine with full confidence whether a path will fail the timing constraint. This upper bound may still be too pessimistic. In fact, for each path $i$ in $P_m$, a separate error of $\epsilon_i T_{cons}$ can be defined from Algo. 1 and used as a guard-band for more accurate analysis. We further discuss guard-band analysis and demonstrate it in our simulation results.

4.4 Complexity Analysis

In Algo. 1, we call Algo. 2 at most $r = \mathrm{rank}(A)$ times. Each call requires one SVD and one QR decomposition as its dominant computing requirements. Sophisticated algorithms exist to solve these procedures. We use the svd() and qr() functions from Matlab in our simulations. Generally, if the number of target paths is very large, we can apply a clustering procedure to form smaller clusters of paths for speedup. Furthermore, building the model between $d_{P_r}$ and $d_{P_m}$ and evaluating the error (using Theorem 2) are both done analytically and very efficiently.

5. HYBRID PATH/SEGMENT SELECTION

As Figure 2 suggests, when the dimension and range of random variations increase, we need to select more paths. In this section, we allow modeling path delays using the delays of a set of representative segments, with the expectation of further reducing the post-silicon validation effort. As already mentioned in Section 3, there are currently no techniques for measuring segment delay. Segment selection can guide the design of custom test structures which can be measured at the post-silicon stage and further reduce the post-silicon effort. Our goal is to show the benefits of knowing segment delays at the post-silicon stage; our simulations therefore assume that accurate post-silicon delays of segments are available.

We outline our proposed hybrid path/segment selection algorithm in Algo. 3, which takes the same $\epsilon$ as Algo. 1. Specifically, we can use Algo. 1 to solve Step 1 and Step 4. Step 3 is rather straightforward and can be done using the standard least-squares method [8]; its details are omitted due to lack of space. Here, $\epsilon'$ in Step 2 is smaller than $\epsilon$ since there exists additional error in going from the delays of $P_{r_1}$ to those of $P_{tar}$. We discuss the selection of $\epsilon'$ in the simulation results. In the remainder of this section, we discuss formulating Step 2 as a convex optimization problem.

Algorithm 3: Hybrid Path/Segment Selection
Input: error tolerance $\epsilon$, $d_{P_{tar}} = G d_S = \mu_{P_{tar}} + Ax$
1. Select a set of representative paths $P_{r_1}$ to model $d_{P_{tar}}$ with zero error tolerance
2. Select a set of representative segments $S_{r_1}$ to model $d_{P_{r_1}}$ from Step 1 with error tolerance $\epsilon' < \epsilon$
3. Use delays of $S_{r_1}$ from Step 2 to model $d_{P_{tar}}$; detect the set of paths $P_{r_2}$ with prediction error larger than $\epsilon$
4. Select paths/segments from $P_{r_2}$ and $S_{r_1}$ to form $P_r$ and $S_r$ with zero error tolerance

Step 2: Representative Segment Selection

In this step, we select $S_{r_1}$ from $S$ and build a model to predict $d_{P_{r_1}} \approx B d_S$, where $d_{P_{r_1}}$ and $d_S$ denote the delays of paths $P_{r_1}$ and segments $S$, respectively. From $d_{P_{tar}} = G d_S$ in Eqn (2), we can obtain $d_{P_{r_1}} = G_{r_1} d_S$, where $G_{r_1}$ can be derived from $G$. Thus, we can express the prediction error of this approximation as

$$\Delta_{r_1} = d_{P_{r_1}} - B d_S = (G_{r_1} - B) d_S, \tag{8}$$

which indicates that $\Delta_{r_1}$ follows a multivariate Gaussian distribution. Once $B$ is determined, we can analytically obtain the mean and covariance of $\Delta_{r_1}$. In order to obtain the optimal $B$, following procedures similar to [9], we can write the mathematical formulation

$$\min_B \; \| B^T \|_{l_0 / l_q} \quad \text{s.t.} \quad WC(\Delta_{r_1}^{(i)}) \le \epsilon' T_{cons}, \; \text{for } i = 1, 2, \ldots, r_1, \tag{9}$$

where $\Delta_{r_1}^{(i)}$ denotes the $i$-th entry of $\Delta_{r_1}$, and $WC(y)$ denotes the worst-case value of random variable $y$, the same as in Eqn (7). We also denote $l_i$ as the $i$-th norm. Based on the definition of the $l_0$ norm, $\| B^T \|_{l_0 / l_q}$ counts the number of columns in $B$ that have a non-zero $l_q$ norm. Therefore, the $l_0 / l_\infty$ norm represents the number of non-zero columns in $B$, which is equal to the number of selected segments $|S_{r_1}|$. Since the $l_0$ component in Eqn (9) yields a non-convex and computationally intractable problem, [9] proposed a relaxation that turns Eqn (9) into an easier problem based on the $l_1 / l_\infty$ norm:

$$\min_B \; \sum_{i=1}^{n_S} \max(|b_{1i}|, |b_{2i}|, \ldots, |b_{ni}|) \quad \text{s.t.} \quad WC(\Delta_{r_1}^{(i)}) \le \epsilon' T_{cons}, \; \text{for } i = 1, 2, \ldots, r_1, \tag{10}$$

where $b_{ij}$ is the $ij$-th element of $B$. Finally, we can observe that the formulation in Eqn (10) is convex because: 1) the objective function is convex, which can be seen by defining auxiliary variables $z_i = \max(|b_{1i}|, |b_{2i}|, \ldots, |b_{ni}|)$ and translating each max into $n$ linear constraints; 2) the constraint is quadratic with respect to $B$ after squaring both sides. Since the procedure is straightforward, we refer to [9] for further details on solving the above convex optimization efficiently.

6. SIMULATION RESULTS

We synthesized ISCAS'89 benchmarks using a 90nm TSMC library and Synopsys Design Compiler for minimum area under a stringent timing constraint to ensure that the circuits are optimized. We assume parameter variations in effective channel length $L_{eff}$ and zero-bias threshold voltage $V_t$, which are Gaussian distributed with standard deviation equal to 10% of their mean. To capture the spatial correlation between these parameter variations, we use the hierarchical model in [2], which defines rectangular regions on the chip. Column 3 in Table 1 gives the total number of regions ($|R|$) of each benchmark. As shown in this table, for smaller benchmarks, we use a 3-level model resulting in 21 regions. For larger ones, we use a 5-level model resulting in 341 regions. This assumption is consistent with [7]. In addition, each gate has a random variation term, which is 6% of the total variations. Note that this cannot be captured by [7].

Table 1: Results for Approximate Path Selection. Columns: BENCH, $|G|$, $|R|$, $|P_{tar}|$ (Configurations); $|P_r|$ (Exact); $|P_r|$, $e_1$ (%), $e_2$ (%) (Approximate). Rows: ten ISCAS'89 benchmarks and their average (numeric entries not preserved).

In our simulation, we adopt the algorithm in [11] to extract $P_{tar}$. Specifically, we extract all paths with path yield smaller than a given threshold, since they are more likely to fail timing at the post-silicon stage. Note that our approaches can incorporate other path extraction algorithms. Finally, to evaluate our approaches, we generate $N = 10{,}000$ samples of the process variations and compute the delays of the representative components (segments or paths, depending on the approach). Then we predict the delays of the remaining paths using our representative components and compare them with their actual delay values provided by the samples. We evaluate the following metrics:

Metric: $\varepsilon_i$ and $\hat{\varepsilon}_i$ indicate the maximum and average relative prediction errors for the $i$-th remaining path, respectively; $e_1$ and $e_2$ indicate the averages of $\varepsilon_i$ and $\hat{\varepsilon}_i$ over all remaining paths ($P_{tar} - P_r$), respectively:

$$\varepsilon_i = \max_{k = 1, \ldots, N} \frac{\left| d_{pred}^{(k)}(i) - d_{true}^{(k)}(i) \right|}{d_{true}^{(k)}(i)}, \qquad e_1 = \frac{\sum_{i=1}^{n-r} \varepsilon_i}{n - r},$$

$$\hat{\varepsilon}_i = \frac{1}{N} \sum_{k=1}^{N} \frac{\left| d_{pred}^{(k)}(i) - d_{true}^{(k)}(i) \right|}{d_{true}^{(k)}(i)}, \qquad e_2 = \frac{\sum_{i=1}^{n-r} \hat{\varepsilon}_i}{n - r},$$

where $d_{pred}^{(k)}(i)$ and $d_{true}^{(k)}(i)$ are the predicted and actual delays of the $i$-th remaining path for sample $k$.

6.1 Approximate Path Selection

We first evaluate the approximate and exact path selection approaches. We set the timing constraint to be the nominal circuit delay (without variation) and select all paths with a timing yield-loss greater than $0.01(1 - Y)$, where $Y$ is the circuit timing yield. We set $\epsilon = 5\%$ in Algorithm 1. In Table 1, Columns 2 and 3 give the total number of gates and regions in each circuit ($|G|$ and $|R|$). The number of extracted paths ($|P_{tar}|$) is given in Column 4. For some benchmarks, we cannot extract many paths, since these circuits are intrinsically unbalanced [7]. Column 5 shows the number of representative paths of the exact approach ($|P_r|$).
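The error metrics $\varepsilon_i$, $\hat{\varepsilon}_i$, $e_1$, and $e_2$ defined earlier in this section can be computed from Monte Carlo samples as sketched below. The function name `prediction_metrics`, the array layout (one row per sample, one column per remaining path), and the two-sample toy delays are assumptions made for illustration only.

```python
import numpy as np

def prediction_metrics(d_pred, d_true):
    """Return (eps_max, eps_avg, e1, e2) for arrays of shape (N, n - r)."""
    rel = np.abs(d_pred - d_true) / d_true   # relative error per sample and path
    eps_max = rel.max(axis=0)                # epsilon_i: worst sample for path i
    eps_avg = rel.mean(axis=0)               # hat-epsilon_i: mean over samples
    return eps_max, eps_avg, float(eps_max.mean()), float(eps_avg.mean())

# Toy data: N = 2 samples, 2 remaining paths (made-up delays).
d_true = np.array([[100.0, 200.0], [100.0, 200.0]])
d_pred = np.array([[101.0, 198.0], [97.0, 200.0]])
eps_max, eps_avg, e1, e2 = prediction_metrics(d_pred, d_true)
print(eps_max, eps_avg, e1, e2)
```

In this toy case the per-path maximum errors are 3% and 1%, so $e_1$ is 2%, while the per-path average errors are 2% and 0.5%, so $e_2$ is 1.25%.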

6 Table 2: Results for Evaluating Hybrid Path/Segment Selection. Configurations Approx. Path Selection Hybrid Path/Segment Selection BENCH G R G C R C P tar P r e 1% e 2% P r S r P r + S r e 1% e 2% S S S S S S S S S S Ave This number is much smaller than the number of target paths P tar in Column 4. The prediction error in this case is zero. Column 6 shows the number of representative paths using the approximate approach when error tolerance ǫ in Algo 1 is set to 5%. We can observe significant reduction in the number of representative paths. Also, the number of representative paths is scalable to the size of P tar and relevant to the circuit topology. Generally, when the paths in P tar are more correlated, we can achieve higher reduction after Algo. 1. Specifically, for circuit S38417, we extracted 692 paths and 190 paths are required for exact selection with zero prediction error. With error tolerance ǫ = 5%, we can reduce P r to 44. We also report e 1 and e 2 in Columns 7-8, which are rather small. The indications of these errors will be discussed in Section 6.3 for guard-band analysis. 6.2 Hybrid Path/Segment Selection We evaluate our hybrid path/segment selection approach. To make the hybrid approach meaningful, we relax the timing constraint to extract more critical paths (Note P tar in Table 1 is too small). We still use the same path yield threshold to extract P tar. We first implement approximate path selection approach with ǫ = 8% (Algo. 1). We then implement our hybrid approach in Algo. 3 with the same ǫ = 8%. In terms of ǫ used in Algo. 3, since hybrid path/segment selection is performed at design stage and can be parallelized, in our simulation, we try different ǫ < ǫ and use the one with minimum P r + S r. Note that in this paper, our goal is to show the benefit of knowing actual segment delay, and we leave the selection on theoretical optimal ǫ for future. Table 2 reports the results of this experiment. 
Columns 2-6 report the total number of gates and regions of the circuit ( G and R ), the number of gates and regions covered by P tar ( G C and R C ), and the number of extracted paths P tar, respectively. Take S38417 as an example. We extract 3,507 target paths which cover 1,386 gates and 157 regions. Thus, for S38417, we have 1,700(= 1, ) independent variations, where 1,386 denotes the random variation for each gate, and 157 denotes the global/local variations, and 2 denotes the parameters of L eff and V t. For approximate path selection, we report the number of selected paths P r and errors e 1, e 2 in Columns 7-9, respectively. For the hybrid approach, we report P r (the number of selected paths) and S r (the number of selected segments), and their summations (total post-silicon delay information) in Columns 10-12, respectively. The e 1 and e 2 are reported in Columns 13 and 14. These are combined errors of the path and segment modeling procedures. Compare Column 7 with 12, and we can see in the majority of the cases, the hybrid approach can reduce the number of post-silicon delay measurements to less than 100. When the extracted paths in P tar are less correlated (illustrated by Column 7), our reduction is significant, while the errors do not change much. In addition, for nearly all benchmarks, the number of selected segments is less than Guard-band Analysis Section 4.3 gives the specific definition to error ǫ and relate it to guard-band analysis at post-silicon stage. We let the guard-band to be the gap that we need to consider after we make path delay prediction at post-silicon, and denote it as φ. As defined in error definition in Section 4.3 and 5, we have φ upper-bounded by ǫt cons, where ǫ = 5% in Table 1 and ǫ = 8% in Table 2. As seen in these two tables, we have e 1 to be 2.29%, 3.05% and 3.54% on average for different approaches. This is the average φ for all predicted paths as illustrated by the definitions of ε i and e 1. 
With this guard-band, a path $p_i$ is flagged as failing the timing if its predicted delay $d^{(k)}_{pred}(i)$, divided by $1 - \varepsilon_i$, is larger than $T_{cons}$. The average guard-band $e_1$ is smaller than our pre-specified error tolerance $\epsilon$. In addition, the average prediction error over all predicted paths in our MC simulation is very small (as illustrated by $e_2$). This indicates that our defined guard-band is useful for facilitating post-silicon failure detection.

7. CONCLUSIONS AND FUTURE WORK

We present a framework for post-silicon timing prediction under variability. Knowing the actual delays of a small set of carefully selected paths/segments, we can predict the timing of a large pool of target paths at the post-silicon stage. Simulations show that a small number of selections (fewer than 100 in the majority of cases) suffices to predict up to 3,500 target paths with high accuracy, even when the number of process variables exceeds 1,000. In addition, our framework can help guide the design of custom test structures. We plan to incorporate our framework into post-silicon diagnosis in the future.

8. REFERENCES

[1] Bastani, P., et al. Speedpath prediction based on learning from a small set of examples. In DAC (2008).
[2] Blaauw, D., et al. Statistical timing analysis: From basic principles to state of the art. TCAD 27, 4 (2008).
[3] Callegari, N., et al. Path selection for monitoring unexpected systematic timing effects. In ASPDAC (2009).
[4] Chua, D., et al. Network kriging. JSAC 24, 12 (2006).
[5] Golub, G., and Van Loan, C. Matrix Computations, 2nd ed. The Johns Hopkins University Press, London.
[6] Josephson, D. The good, the bad, and the ugly of silicon debug. In DAC (2006).
[7] Liu, Q., and Sapatnekar, S. Synthesizing a representative critical path for post-silicon delay prediction. In ISPD (2009).
[8] Nocedal, J., and Wright, S. Numerical Optimization, 2nd ed. Springer, New York.
[9] Turlach, B., et al. Simultaneous variable selection. Technometrics 47 (2005).
[10] Wang, X., et al. Path-RO: a novel on-chip critical path delay measurement under process variations. In ICCAD (2008).
[11] Xie, L., and Davoodi, A. Bound-based identification of timing-violating paths under variability. In ASPDAC (2009).


More information

7 Principal Component Analysis

7 Principal Component Analysis 7 Principal Component Analysis This topic will build a series of techniques to deal with high-dimensional data. Unlike regression problems, our goal is not to predict a value (the y-coordinate), it is

More information

7. Symmetric Matrices and Quadratic Forms

7. Symmetric Matrices and Quadratic Forms Linear Algebra 7. Symmetric Matrices and Quadratic Forms CSIE NCU 1 7. Symmetric Matrices and Quadratic Forms 7.1 Diagonalization of symmetric matrices 2 7.2 Quadratic forms.. 9 7.4 The singular value

More information

Nonlinear Optimization Methods for Machine Learning

Nonlinear Optimization Methods for Machine Learning Nonlinear Optimization Methods for Machine Learning Jorge Nocedal Northwestern University University of California, Davis, Sept 2018 1 Introduction We don t really know, do we? a) Deep neural networks

More information

22.4. Numerical Determination of Eigenvalues and Eigenvectors. Introduction. Prerequisites. Learning Outcomes

22.4. Numerical Determination of Eigenvalues and Eigenvectors. Introduction. Prerequisites. Learning Outcomes Numerical Determination of Eigenvalues and Eigenvectors 22.4 Introduction In Section 22. it was shown how to obtain eigenvalues and eigenvectors for low order matrices, 2 2 and. This involved firstly solving

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior

More information

3 (Maths) Linear Algebra

3 (Maths) Linear Algebra 3 (Maths) Linear Algebra References: Simon and Blume, chapters 6 to 11, 16 and 23; Pemberton and Rau, chapters 11 to 13 and 25; Sundaram, sections 1.3 and 1.5. The methods and concepts of linear algebra

More information

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725 Consider Last time: proximal Newton method min x g(x) + h(x) where g, h convex, g twice differentiable, and h simple. Proximal

More information

The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization

The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization Jia Wang, Shiyan Hu Department of Electrical and Computer Engineering Michigan Technological University Houghton, Michigan

More information

MEASUREMENTS that are telemetered to the control

MEASUREMENTS that are telemetered to the control 2006 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 19, NO. 4, NOVEMBER 2004 Auto Tuning of Measurement Weights in WLS State Estimation Shan Zhong, Student Member, IEEE, and Ali Abur, Fellow, IEEE Abstract This

More information

Scientific Computing

Scientific Computing Scientific Computing Direct solution methods Martin van Gijzen Delft University of Technology October 3, 2018 1 Program October 3 Matrix norms LU decomposition Basic algorithm Cost Stability Pivoting Pivoting

More information

Micro-architecture Pipelining Optimization with Throughput- Aware Floorplanning

Micro-architecture Pipelining Optimization with Throughput- Aware Floorplanning Micro-architecture Pipelining Optimization with Throughput- Aware Floorplanning Yuchun Ma* Zhuoyuan Li* Jason Cong Xianlong Hong Glenn Reinman Sheqin Dong* Qiang Zhou *Department of Computer Science &

More information