Microprocessor Floorplanning with Power Load Aware Temporal Temperature Variation
Chun-Ta Chu, Xinyi Zhang, Lei He, and Tom Tong Jing
Department of Electrical Engineering, University of California at Los Angeles, CA
{chunta, zxy, lhe,

ABSTRACT

Traditional microprocessor floorplanning only considers area and wire length. Recent research has started to consider performance optimization under interconnect pipelining and thermal hotspots caused by increased power density. However, there is no efficient approach to microprocessor floorplanning that considers application dependent power load for performance and thermal optimization. This paper studies microprocessor floorplanning considering thermal and throughput optimization. We first develop a stochastic heat diffusion model that takes the application dependent power load into account for thermal analysis. Then, we design the floorplanning algorithm based on this model. Experimental results show that, compared with the deterministic heat diffusion model, our model obtains up to 3.2 °C reduction of the on-chip peak temperature, 1.29% reduction of the area, and 1.13x better CPI (cycles per instruction) performance. Compared with temperature aware floorplanning in the HOTSPOT tool set, which ignores interconnect pipelining, our algorithm is up to 27x faster, reduces the peak temperature by up to 3 °C, and also reduces CPI significantly with a negligible area overhead. We also use our thermal model to analyze floorplanning for a dual core microarchitecture. The results show that optimizing the floorplan over two cores simultaneously can further reduce temperature by 3.36 °C with slightly better CPI performance.

1. INTRODUCTION

Traditional microprocessor floorplanning only considers area and wire length, under the assumption that micro-architecture optimization decides the average cycles per instruction (CPI), and therefore system throughput, while floorplanning determines wire length and clock rate.
Due to the increasing clock rate, the interconnect delay may become longer than the clock period. As a result, interconnect pipelining becomes a must and affects CPI as well [1] [2]. Several recent studies [1] [3] [4] optimized microprocessor floorplanning for area and CPI considering interconnect pipelining. As devices keep shrinking, the reduction in chip power consumption cannot keep up with the shrinking chip size. As a result, the power density within a chip keeps increasing, which leads to higher temperature and a higher risk of degraded performance and chip reliability [5]. Floorplanning is the design stage with the largest impact on chip temperature. Therefore, the impact of thermal effects should be considered in the floorplanning phase, which has not been done in the above studies.

This paper is partially supported by NSF CAREER award CCR / and a UC MICRO grant sponsored by Altera and Intel. Address comments to lhe@ee.ucla.edu.

Thermal aware placement has been studied for several years [6] [7] [8] [9]. Different methods were used to smooth the thermal distribution of standard cells, which makes the temperature distribution quasi-uniform within one module. However, even though the thermal distribution across one module is smooth, without a thermal aware floorplan the temperature gradient between modules can still be quite large, since different modules may have different power densities and thus different temperature distributions. Putting modules with high power density together may lead to higher temperature in those modules, since the lateral heat diffusion across them is smaller. As a result, an improper floorplan may lead to a large temperature gradient across the modules and thus cause hotspots within a chip. Several recent developments in microprocessor floorplanning have started to consider thermal modeling and thermal aware optimization.
The most straightforward way to estimate the thermal impact of a floorplan is to calculate the temperature directly. Given a microarchitecture, [10] first obtained the average power over all benchmarks for each block. For each new floorplan, it used HOTSPOT [11] to model the whole package as a thermal RC network, with power as branch current and temperature as node voltage, and calculated the peak steady-state temperature with average power as input. The objective then consists of the area, CPI, and the peak steady-state temperature. [12] further considered transient power and the dependency between power and CPI, taking interconnect pipelining into account during floorplanning. For each new floorplan, the input is not just average power but transient power over time. In this case, the objective consists of not only the area and CPI but also the peak temperature and the average temperature over time. One major drawback of the aforementioned work is that the temperature is calculated for every new floorplan, which is time-consuming. [12] had an even longer running time than [10] since [12] applied transient power instead of average power. Considering this, [13] proposed a simple deterministic heat diffusion model for the lateral heat diffusion between modules to avoid directly calculating temperature. However, this heat diffusion model is too simplified to guarantee a good solution. Also, [13] used total wire length instead of CPI in the objective. More importantly, none of the above studies consider the application dependent power load and its induced temperature variation. As shown in Fig. 1 [14], given a micro-architecture with different applications, the temperature distribution across the chip could be totally different. It is not clear how to extend the above thermal aware microprocessor floorplanning to handle application dependent power load.

Figure 1: Temperature distribution for different applications

In this paper, we develop an accurate yet efficient thermal-aware floorplanning method that considers the correlation between power consumptions for different micro-architecture modules and for different microprocessor applications. Instead of calculating temperature directly during floorplanning, we develop a stochastic heat diffusion model that accounts for block geometry and the above power correlation. We apply this model to iterative-improvement-based floorplanning for thermal optimization, and simultaneously optimize throughput using the trajectory piecewise linear (TPWL) model for pipelined interconnects developed in [3].

Multi-core micro-architecture has been studied for several years because of its smaller area and higher efficiency compared with single-core design. However, the thermal effect on floorplanning has only been studied for one core [15] [16]. In this work, we compare the thermal effect and performance on a dual-core floorplan, and show the difference when the design considers the thermal effect.

The rest of this paper is organized as follows. Section 2 introduces the background of floorplanning, microprocessor performance, and the relation between power and temperature. Section 3 reviews the deterministic heat diffusion model, points out its shortcomings, and then presents our stochastic heat diffusion model. Section 4 summarizes the flow of floorplanning, experiment settings, and experimental results. We conclude this work in Section 5.

2. BACKGROUNDS AND PROBLEM FORMULATION

2.1 Floorplanning

Floorplanning determines the locations and the shapes of the modules on chip subject to the minimization of a cost function.
Traditionally, the cost function considers the chip area and total wire length. However, floorplanning at the micro-architecture level has to consider not only the area and the total wire length but also CPI. Moreover, in order to account for the thermal interaction between any two modules in different floorplans, we have to include a thermal estimate. Therefore, the objective function in our floorplanning algorithm consists of area, CPI, and thermal effect as follows.

$$W_{area}\,\frac{Area}{Area_{norm}} + W_{CPI}\,\frac{CPI}{CPI_{norm}} + W_{thermal}\,\frac{thermal}{thermal_{norm}} \tag{1}$$

where $W_{area}$, $W_{CPI}$, and $W_{thermal}$ are the weights of the corresponding terms and can be adjusted for different goals, and $Area_{norm}$, $CPI_{norm}$, and $thermal_{norm}$ are normalization terms. Most current floorplanning solvers are based on the simulated annealing (SA) algorithm [17] [18], which is also used in this paper. SA starts with an arbitrary initial floorplan and, in each iteration, derives a new floorplan by changing either the positions or the shapes of a small set of modules. During each iteration, the cost is calculated using the cost function. The new floorplan is accepted either if its cost is smaller than the current best case or with a probability based on the annealing temperature. If the number of iterations is large enough, SA is able to find a high-quality solution.

2.2 Microprocessor Performance

Microprocessor performance is measured by CPI (for a given clock rate) or millions of instructions per second (MIPS). In this paper, we use CPI under a fixed clock rate as the metric, which means the lower the CPI, the better the performance. The traditional objective of floorplanning is to minimize interconnect delay and obtain the targeted clock rate. Due to the ever increasing clock rate, interconnect delay may become longer than one clock cycle. In this case, interconnect pipelining is a must and it affects CPI.
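As a rough illustration of Section 2.1, the weighted cost of Equation (1) and an annealing acceptance test can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the acceptance rule assumes the common Metropolis form, exp(-delta / temperature), compared against the current cost, which the text does not spell out.

```python
import math
import random


def floorplan_cost(area, cpi, thermal, weights, norms):
    """Weighted, normalized cost in the shape of Equation (1)."""
    w_area, w_cpi, w_thermal = weights
    area_norm, cpi_norm, thermal_norm = norms
    return (w_area * area / area_norm
            + w_cpi * cpi / cpi_norm
            + w_thermal * thermal / thermal_norm)


def accept_move(new_cost, current_cost, anneal_temp, rng=random):
    """Assumed Metropolis rule: always accept an improvement, otherwise
    accept with probability exp(-(new - current) / annealing temperature)."""
    if new_cost <= current_cost:
        return True
    return rng.random() < math.exp(-(new_cost - current_cost) / anneal_temp)


if __name__ == "__main__":
    # Hypothetical numbers chosen only to exercise the functions.
    cost = floorplan_cost(2.0, 1.0, 3.0, (0.5, 0.25, 0.25), (2.0, 1.0, 3.0))
    print(cost, accept_move(0.9, 1.0, 0.1))
```

Passing the random source as a parameter keeps the acceptance test reproducible when a seeded or stubbed generator is supplied.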
Note that floorplanning is the primary design stage that decides interconnect length and the need for interconnect pipelining. Obtaining CPI with interconnect pipelining by micro-architecture level simulation is time-consuming. In this paper, we apply the efficient TPWL model from [3]. It starts with a traditional floorplan with objectives of area and wire length to sample the bus latency vector, that is, the latencies between modules. A CPI table is then generated using the cycle-accurate simulation tool SimpleScalar [19]. CPI under a new floorplan is approximated from the CPI values in the CPI table and TPWL.

2.3 Power and Temperature

To simulate the power consumption for a given micro-architecture, we use PTscalar [20], a cycle-accurate micro-architecture-level performance and power simulator for superscalar architectures. We obtain the power consumption of all modules by simulating SPEC2000 [21] benchmarks. Given a micro-architecture floorplan and power consumption input, HOTSPOT [11] models the whole package shown in Fig. 2 as a thermal RC network [22] with power as a branch current and temperature as a node voltage. In the flip-chip package design shown in Fig. 2, the power (heat) generated by modules flows both horizontally, to the adjacent modules or to the adjacent heat spreader, and vertically, down to the PCB and up to the heat spreader and then to the heat sink. Without the horizontal flow, the vertical flow directly dissipates heat from the modules and thus lowers their temperature. However, because of the horizontal flow between modules, the relative locations of modules may affect the final temperature of each module and thus generate a temperature gradient. Also, different applications with the same floorplan may have completely different power consumptions and thus different temperature gradients. Fig. 1 shows such an example. Therefore, because of lateral heat flow, we can arrange the locations of different modules to make the temperature distribution across the chip smooth, which means the variance of temperature across modules becomes smaller.

With each new floorplan, the transient power of each module may also change, since the change of wire length between modules affects CPI and CPI affects transient power. Also, a different floorplan leads to a different temperature, which affects leakage power. In this work, we assume the power of each module is invariant over different floorplans and different temperatures, but our work can be easily extended to consider the interdependency of leakage, temperature, and floorplanning [23].

Figure 2: Thermal RC model of flip-chip package

In order to make the temperature distribution across the die smooth, previous work [10] [12] used HOTSPOT [11] to directly calculate temperature during each SA iteration and to evaluate the cost function, since the goal is to minimize the peak temperature and mean temperature across the die. The objective function with calculated temperature then becomes

$$W_{area}\,\frac{Area}{Area_{norm}} + W_{CPI}\,\frac{CPI}{CPI_{norm}} + W_{Temp}\,\frac{T_{avg} + T_{max}}{T_{norm}} \tag{2}$$

The major drawback is the long computation time in the SA flow. For example, when the temperature effect was considered, it took around forty minutes to complete the whole flow [10], while it took less than five minutes without considering the temperature effect. Therefore, directly calculating temperature is not efficient for thermal-aware micro-architecture floorplanning.
We describe how to model the thermal effect accurately and efficiently in the next section.

3. STOCHASTIC THERMAL-AWARE FLOORPLANNING

In this section, we first describe the deterministic heat diffusion model and point out why it is inaccurate based on four observations. We then show how to build an accurate stochastic heat diffusion model based on those observations.

3.1 Deterministic Heat Diffusion Modeling

Deterministic model definition

[13] improved the runtime of thermal-aware floorplanning by using a heat diffusion model to avoid directly calculating temperature. The concept is that the temperature of an isolated module depends linearly on its power density, so modules with high power density tend to have high temperature. Therefore, in order to lower the temperature of modules with high power density, it is better to place modules with low power density adjacent to them to help dissipate the heat. The temperature of the high power density modules then becomes lower because of the large heat diffusion to the low power density modules. As a result, the more low power density modules are adjacent to the high power density modules, the smoother the temperature distribution is across the chip. The heat diffusion between two adjacent modules $M_i$ and $M_j$ can be represented as

$$h(M_i, M_j) = (\overline{PD}_i - \overline{PD}_j)\cdot \mathrm{shared\_length}_{ij} \tag{3}$$

where $\overline{PD}_i$ and $\overline{PD}_j$ are the average power densities over time for modules $M_i$ and $M_j$, and $\mathrm{shared\_length}_{ij}$ is the shared length between modules $M_i$ and $M_j$. The total heat diffusion for module $i$ is

$$H_i = \sum_{j \text{ adjacent to } i} h(M_i, M_j) \tag{4}$$

Since the potentially hottest modules are limited in number, considering the total heat diffusion for only those potentially hottest modules can be a good estimation.
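The deterministic measure of Equations (3) and (4) can be sketched in Python as follows. This is only an illustrative sketch: the function names, the dictionary-based adjacency layout, and the numbers in the example are assumptions made for the illustration, not data or code from [13].

```python
def pair_heat_diffusion(pd_i, pd_j, shared_length):
    """Equation (3): difference of average power densities times the
    length of the boundary shared by the two modules."""
    return (pd_i - pd_j) * shared_length


def module_heat_diffusion(module, power_density, adjacency):
    """Equation (4): sum of pairwise heat diffusion over all neighbours.

    power_density maps a module name to its average power density;
    adjacency maps a module name to a list of (neighbour, shared_length).
    """
    return sum(
        pair_heat_diffusion(power_density[module], power_density[nbr], length)
        for nbr, length in adjacency[module]
    )


if __name__ == "__main__":
    # Hypothetical three-module example.
    power_density = {"M1": 2.0, "M2": 0.5, "M3": 1.0}
    adjacency = {"M1": [("M2", 3.0), ("M3", 1.0)]}
    print(module_heat_diffusion("M1", power_density, adjacency))
```

A positive value means the module has a net outward diffusion toward cooler neighbours; swapping a neighbour for one with higher power density reduces the value.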
The total heat diffusion becomes

$$HeatDiff = \sum_i H_i \tag{5}$$

The objective function with heat diffusion becomes

$$W_{area}\,\frac{Area}{Area_{norm}} + W_{CPI}\,\frac{CPI}{CPI_{norm}} + W_{thermal}\,\frac{HeatDiff}{HeatDiff_{norm}} \tag{6}$$

Although heat diffusion is a good representation of the lateral heat flow, this model is too simplified to guarantee finding a good thermal-aware floorplan, since it ignores the factors described below.

Shortcomings

First of all, given a micro-architecture and a series of applications, the transient power of one module may fluctuate a lot over time, since different applications put different work loads on that module and thus its transient power over time is quite nonuniform. Also, the power vectors over time for two modules may be either positively or negatively correlated. Using average power may underestimate the peak temperature for positively correlated modules, since the transient temperature is higher when both modules have high transient power consumption, as shown in Fig. 3. Fig. 3(a) and Fig. 3(b) have the same average power, but the peak temperatures and the lowest temperatures of the two cases each differ by around 5 °C, which cannot be ignored. A real case from our simulation is shown in Fig. 4. Fig. 4(a) shows that the correlation between LSQ and IALU4 is almost perfect, while in Fig. 4(b) RUU and FPAdd have no correlation. As a result, without considering the correlation of power consumptions between modules, the maximal temperature is underestimated.

Figure 3: Temporal correlation between M1 and M2, (a) positively correlated and (b) negatively correlated. M1 has a higher transient temperature in (a) than in (b), although the average power is the same.

Figure 4: Power over time (power in watts versus cycle), with different line styles for different modules: (a) LSQ and IALU4 and (b) RUU and FPAdd

Second, a module next to the border of the die has extra heat flow to the heat spreader or the ambient. Fig. 5 shows that when three modules have similar power consumptions, the temperature of the two modules on the sides is lower than that of the module between them. The temperature difference is around 0.5 °C in this case, and it may vary according to the design of the package.

Figure 5: The temperature of M2 is higher than that of M1 and M3

Third, since dead space consumes no power, the heat diffusion from a module to dead space (shaded in Fig. 6) is much larger than that from one module to another. As shown in Fig. 6, M1 has the highest power density, and its peak temperature is lower in Fig. 6(b) (92.96 °C) than in Fig. 6(a) (94.2 °C), since the dead space can remove more heat than other modules do. In floorplanning, the thermal term may try to move modules as far apart as possible, while the area term attempts to minimize the total dead space.
As a result, the weights of the area and temperature terms have to be adjusted for balance.

Figure 6: Dead space effect: M1 has lower temperature in (b) than in (a)

Fourth, the heat diffusion between two modules is proportional to their shared length, since the lateral thermal conductance is proportional to the width of the module. However, the area of each adjacent module should also be considered. Given four modules M1, M2, M3, and M4 with power densities $PD_1 > PD_4 > PD_2 = PD_3$ in Fig. 7, M1 may have a lower temperature when adjacent to M3, since M1 can diffuse more heat to M3 than to M2. This suggests that the heat diffusion should consider not only the shared length but also the depth of the adjacent module. As shown in Fig. 7, the difference in peak temperature is over 1 °C, and the temperature gradient in Fig. 7(a) is sharper than that in Fig. 7(b). However, if the depth of one adjacent module is too large, this may lead to a wrong estimation. The reason is that another floorplan candidate, in which the adjacent module has lower power density, may still be rejected because the depth of that adjacent module is much smaller than the current one. Considering this, we predefine a penetration window that encloses the target module. If an adjacent module extends across the window, we only consider the part of the module inside the window. On the other hand, if the adjacent module is entirely inside the window, we take into account the modules adjacent to it as well. For example, in Fig. 7(a), M2 is inside the red window (dashed line), so we have to consider the area of M4 inside the red window as well. Similarly, since M3 crosses the window, we only consider the area of M3 inside the window. Details are described in the next section.

Finally, when several hottest modules are considered, simply summing their heat diffusion may not guarantee a good solution. For example, suppose module A and module B are the two hottest modules, and $PD_A > PD_B$.
After running the SA algorithm, we have $H_A = H_B = C$. This is not the optimal solution, since $H_A = C + D$ and $H_B = C - D$, where $C > D > 0$, can give a better solution because of more heat flow for module A, as shown in Fig. 8. Considering this effect, we should use the weighted sum of the heat flow for those hottest modules to reduce peak temperature more effectively.

Figure 7: M1 has lower temperature in (b) since M3 and M2 have the same power density but M3 is larger than M2

Figure 8: Module deserves more heat flow due to higher power density

3.2 Stochastic Modeling

Stochastic heat diffusion model

Since directly calculating temperature is time-consuming and the deterministic heat diffusion model is not accurate enough, we propose an accurate and efficient stochastic heat diffusion model based on the observations in Section 3.1. We are given a micro-architecture floorplan with $m$ modules, $n$ dead spaces, and a power vector $P_i = [p_{i1}, \ldots, p_{iT}]$ over $T$ time steps for each module $M_i$, $1 \le i \le m$. The mean power density $\overline{PD}_i$ for module $M_i$ is

$$\overline{PD}_i = E(PD_i) = \frac{1}{A_i}\,\frac{1}{T}\sum_{j=1}^{T} p_{ij} \tag{7}$$

where $A_i$ is the area of module $M_i$ and $PD_i$ is the transient power density vector, which equals $P_i / A_i$. In this paper, $E(X)$ is the expectation value of vector $X$. The power density covariance between any two modules $M_i$ and $M_j$ is

$$cov(PD_i, PD_j) = E(PD_i\,PD_j) - \overline{PD}_i\,\overline{PD}_j \tag{8}$$

Given $x$ adjacent modules, $y$ adjacent dead spaces, and a penetration window size $W_L$, the heat diffusion vectors to the adjacent modules, $H_i^{adj}$, and to the adjacent dead spaces, $H_i^{dead}$, for module $M_i$ are defined as follows, respectively.

$$H_i^{adj} = \sum_{j=1}^{x} (PD_i - PD_j)\,L_{ij} \tag{9}$$

$$H_i^{dead} = \sum_{j=1}^{y} PD_i\,C_{ij} \tag{10}$$

where $L_{ij}$ is the shared length between modules $M_i$ and $M_j$, and $C_{ij}$ is the shared length between module $M_i$ and dead space $N_j$. The heat diffusion vector to the border is

$$f(B_i) = PD_i\,B_i\,\frac{Con_{lateral}}{Con_{adjacent}} \tag{11}$$

where $B_i$ is the shared length between module $M_i$ and the border of the die, and $Con_{lateral}$ and $Con_{adjacent}$ are the unit lateral conductances between the heat spreader and module $M_i$ and between two adjacent modules, respectively, both of which can be calculated according to [11]. The standard deviation of the total heat diffusion for module $M_i$ is

$$\sigma_i = \sqrt{E\big((H_i^{adj} + H_i^{dead} + f(B_i))^2\big) - \big(E(H_i^{adj} + H_i^{dead} + f(B_i))\big)^2} \tag{12}$$

The stochastic heat diffusion model for module $M_i$ is

$$H_i = E(H_i^{adj}) + E(H_i^{dead}) + E(f(B_i)) + 3\sigma_i \tag{13}$$

where the first two terms are the mean heat diffusion to the adjacent modules and dead space, respectively, the third term is the mean heat diffusion to the lateral heat spreader, and $3\sigma_i$ is the term for the correlation impact approximated by Equation (12). The larger the standard deviation between modules is, the smaller the correlation is.

If dead space $N_j$ or module $M_j$ is totally inside the penetration window, we have to consider other modules that are partially inside the window. Then $PD_j$ is modified to

$$PD_j = \frac{\sum_{k=1}^{K} \overline{PD}_k\,D_k\,(K - k + 1)}{\sum_{k=1}^{K} D_k\,(K - k + 1)} \tag{14}$$

where $K$ is the number of levels between the target block and the window; the level contacting $M_i$ is level 1 and the level contacting the window is level $K$. $\overline{PD}_k$ is the average power density in level $k$, and $D_k$ is the depth of level $k$. In Fig. 9, the red window (dashed line) defines the modules involved, and the blue (hatched) one defines the modules used to calculate the modified $PD_1$. The modified $PD_1$ is derived from modules $M_1$, $M_2$, and $M_3$; the first belongs to level 1 and the second and the third belong to level 2. Also, the power density of level 1 is just $PD_1$, and the power density of level 2 is composed of $PD_2$ and $PD_3$.

Figure 9: Illustration of calculation of modified $PD_j$

Note that Equation (12) can be easily calculated once Equations (7) and (8) are available, and these are pre-calculated during the one-time cycle-accurate micro-architecture simulation for performance and power before entering the SA algorithm. Therefore, there is no significant runtime overhead.
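The mean-plus-three-sigma measure of Equations (9) to (13) can be illustrated with a short Python sketch that evaluates the heat diffusion per time step and then takes its mean and population standard deviation. This is a sketch under stated assumptions rather than the authors' code: the function and parameter names are hypothetical, `con_factor` is a single scalar standing in for the conductance terms of Equation (11), the penetration-window correction of Equation (14) is omitted, and the traces are made-up numbers.

```python
from statistics import fmean, pstdev


def stochastic_heat_diffusion(pd_i, neighbours, dead_lengths=(),
                              border_length=0.0, con_factor=1.0):
    """Mean + 3 * standard deviation of the per-time-step heat diffusion.

    pd_i          -- transient power density trace of the target module
    neighbours    -- list of (trace_j, shared_length) for adjacent modules
    dead_lengths  -- shared lengths with adjacent dead spaces
    border_length -- shared length with the die border
    con_factor    -- assumed scalar for the conductance terms of Eq. (11)
    """
    total = []
    for t, p in enumerate(pd_i):
        h_adj = sum((p - trace[t]) * length for trace, length in neighbours)
        h_dead = sum(p * c for c in dead_lengths)
        h_border = p * border_length * con_factor
        total.append(h_adj + h_dead + h_border)
    # pstdev is sqrt(E(X^2) - E(X)^2), the form used in Equation (12).
    return fmean(total) + 3.0 * pstdev(total)


if __name__ == "__main__":
    target = [1.0, 3.0, 1.0, 3.0]
    in_phase = [0.5, 1.5, 0.5, 1.5]       # positively correlated neighbour
    out_of_phase = [1.5, 0.5, 1.5, 0.5]   # negatively correlated neighbour
    print(stochastic_heat_diffusion(target, [(in_phase, 2.0)]))
    print(stochastic_heat_diffusion(target, [(out_of_phase, 2.0)]))
```

Both neighbours in the example have the same mean power density, so a deterministic average-based measure would score them identically; the sigma term is what separates the two cases.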
Considering $Z$ potential hottest modules, the total stochastic heat flow then becomes

$$Stochastic\,HeatDiff = \sum_{i=1}^{Z} W_i\,H_i \tag{15}$$

where $W_i$ is a weight proportional to $\overline{PD}_i$. If the heat diffusion for module $M_i$ is positive, the total net heat diffusion flows outward. Similarly, a negative value means the total net heat diffusion flows inward to module $M_i$. As a result, the larger the heat diffusion, the more heat can be diffused to the adjacent modules with lower power densities and thus the more the temperature can be lowered.

Hierarchical clustering

We only have to consider the heat diffusion for several potential hottest modules, and we use the K-means clustering algorithm to find the right number of potential hottest modules. In this paper, the objective is to minimize the variance

$$Var = \sum_{i=1}^{k} \sum_{\overline{PD}_j \in S_i} (\overline{PD}_j - \mu_i)^2 \tag{16}$$

where $\overline{PD}_j$ is the power density of module $M_j$ and $\mu_i$ is the average power density within cluster $S_i$. We use hierarchical K-means clustering to find the potential hottest modules. First we set a threshold, such as 30% of the total number of modules, as the maximum number we have to consider. Then we run K-means to cluster the modules into two clusters and apply the same procedure to the cluster with the higher power density. This recursive procedure stops when the number of modules in the recursively refined cluster with the higher power density is less than the threshold. Using this hierarchical method, we can find the optimal number of modules to be considered in the calculation of total heat diffusion.

3.3 Floorplanning with Stochastic Thermal Model

In Equation (6), the thermal term is deterministic and does not consider correlations of power densities between modules and other effects.
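The recursive selection of potential hottest modules described under hierarchical clustering can be sketched in Python as follows. This is a hedged illustration, not the authors' procedure: instead of iterative K-means it uses an exhaustive threshold split of the sorted power densities, which for one-dimensional data and two clusters finds the split minimizing the within-cluster variance of Equation (16). The function names, the default 30% fraction handling, and the module names and densities in the example are all hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the cluster mean (one term of Eq. 16)."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values)


def best_split(sorted_values):
    """Index that splits sorted 1-D values into two clusters with minimum
    total within-cluster variance (stand-in for a two-cluster K-means)."""
    best_cut, best_cost = 1, float("inf")
    for cut in range(1, len(sorted_values)):
        cost = sse(sorted_values[:cut]) + sse(sorted_values[cut:])
        if cost < best_cost:
            best_cut, best_cost = cut, cost
    return best_cut


def hottest_modules(power_density, max_fraction=0.3):
    """Repeatedly keep the higher-density cluster until it holds fewer
    modules than max_fraction of all modules."""
    threshold = max_fraction * len(power_density)
    hot = sorted(power_density, key=power_density.get)
    while len(hot) >= threshold and len(hot) > 1:
        cut = best_split([power_density[m] for m in hot])
        hot = hot[cut:]
    return hot


if __name__ == "__main__":
    # Hypothetical power densities for ten modules.
    densities = {"m1": 0.1, "m2": 0.1, "m3": 0.2, "m4": 0.2, "m5": 0.3,
                 "m6": 0.3, "m7": 1.0, "m8": 1.1, "m9": 2.0, "m10": 2.1}
    print(hottest_modules(densities))
```

In the example the first split keeps the four densest modules, which is still above the threshold of three, so a second split is applied to that cluster.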
We propose the new objective function as

$$W_{area}\,\frac{Area}{Area_{norm}} + W_{CPI}\,\frac{CPI}{CPI_{norm}} + W_{thermal}\,\frac{Stochastic\,HeatDiff}{HeatDiff_{norm}} \tag{17}$$

where we replace $HeatDiff$ by $Stochastic\,HeatDiff$, which is calculated as described in Section 3.2.

4. ALGORITHM FLOW AND EXPERIMENTS

In this section, we compare the floorplanning based on our proposed model with [13] and [10]. Moreover, we optimize the overall floorplan on the dual core micro-architecture as well as on only one core to study the impact. The experiment setting is described below.

4.1 Flow and Experiment Setting

Similar to [3], we assume two superscalar processors for 90nm and 65nm technologies. The settings are shown in Table 1. We treat the blocks as soft with an aspect ratio between 0.33 and 3, and L2 is partitioned into three modules.

Table 1: Settings in 90nm and 65nm technology.

Parameter               90nm                 65nm
Issue Width             4                    8
Die Area (mm²)
Die Thickness (mm)      0.5
Heat Spreader (mm²)
Heat Sink (mm²)
Functional Units        3 Integer ALU        6 Integer ALU
                        1 Integer Mult       2 Integer Mult
                        1 FP Adder           2 FP Adder
                        1 FP Mult            2 FP Mult
Register Update Unit    64 Instructions      128 Instructions
Load Store Queue        32 Instructions      32 Instructions
Fetch Queue             8 Instructions       16 Instructions
Clock Frequency         3GHz                 6GHz
FF Insertion Length     2000 um              707 um

Blocks composed of multiple modules are the RUU block, including the Register Update Unit and Load Store Queue; the Decode block, including the Fetch Queue and the Decoder; the Branch block, including the Fetch Unit and Branch Predictor; the DL1 block, including the Level 1 Data Cache and the DTLB; and the IL1 block, including the Level 1 Instruction Cache and the ITLB. The L2 unified cache and all functional units are treated as independent blocks. The block areas are summarized in Table 2.

Table 2: Area of logical block in 90nm and 65nm technology.
Block      Area (mm²) 90nm    Area (mm²) 65nm
IALU
IMULT
FPADD
FPMULT
RUU
Decode
Branch
L2
IL1
DL1

The length of the interconnect between two blocks in Table 2 is computed as the Manhattan distance between the centers of the two blocks. We view the latency of each such interconnect as an independent variable. Changing the latency of one of these interconnects is an effective change in the micro-architecture and may impact performance. In Table 3, we specify these interconnects with respect to their terminal blocks. We apply the TPWL model [3] as our CPI model.

Table 3: Buses that potentially affect CPI

Bus ID   Terminal blocks    Bus ID   Terminal blocks
1        IALU,RUU           6        IL1,IL2
2        IMULT,RUU          7        DL1,L2
3        FPAdd,RUU          8        Branch,IL1
4        FPMul,RUU          9        Decode,Branch
5        LSQ,DL1            10       Decode,RUU

Given the architecture configuration in Table 1, we use PTscalar [20] to simulate the power consumption for four integer applications (bzip2, gcc, gzip, and mcf) and three floating-point applications (art, equake, and mesa) in SPEC2000 [21]. With these power vectors, we calculate the mean power density (W/mm²) and standard deviation for each module. Fig. 10 shows the correlation matrix for 90nm technology, which is obtained from the sampled power vectors of all modules.

Number   Block Name   Description
1        decode       Instruction Decoder
2        branch       Branch Predictor
3        RAT          Rename Table
4        RUU          Register Update Unit
5        LSQ          Load Store Queue
6        IALU1        Integer Adder 1
7        IALU2        Integer Adder 2
8        IALU3        Integer Adder 3
9        IntReg       Integer Register
10       IL1          Level 1 Instruction Cache and ITLB
11       DL1          Level 1 Data Cache and DTLB
12       IALU4        Integer Multiplier
13       FPAdd        Floating-point Adder
14       FPMul        Floating-point Multiplier
15       FPReg        Floating-point Register
16       L2 left      Part of Level 2 Cache
17       L2 right     Part of Level 2 Cache
18       L2 bottom    Part of Level 2 Cache

Figure 10: Temporal correlation matrix of power consumption

In order to get the real correlation, we only count sampling points in active mode, since in sleep mode every module consumes only static power. If the sleep-mode sampling points are included, there is a large error in the estimated correlation. As shown in Fig. 10, we can roughly partition all modules into three groups: the first group is from Decode (1) to DL1 (11), the second group is IALU4 (12), which does not have a strong correlation with any other module, and the last one is from FPAdd (13) to L2 bottom (18). Modules in the same group are highly positively correlated, and modules in different groups are either uncorrelated or negatively correlated.

We use SA-based PARQUET [18] as our base floorplan solver, combined with the CPI model [3] and our stochastic heat diffusion model. We run the experiments on a Linux workstation. After completing the whole flow with different objectives, HOTSPOT [11] is used to calculate the temperature for verification purposes only.

Figure 11: Whole flow of thermal-aware floorplanning using TPWL to calculate CPI
Also, for each objective, we run ten iterations to obtain the best case and the average case, since the SA algorithm is randomized. The whole flow of our work is summarized in Fig. 11.

4.2 Comparison between Thermal Models

We compare our stochastic heat diffusion model (SHDM) with [10], which calculates the maximum temperature in every SA iteration to estimate the cost of each new floorplan. We run the seven benchmarks described in Section 4.1. The objective function consists of area and thermal effect with weights 0.6 and 0.3, respectively. Table 4 summarizes the final result. WS means the white space (dead space) percentage of the total area. For example, an entry ending in (4.3%) gives the total area followed by a white space percentage of 4.3%. From the table, SHDM can reduce $T_{max}$ by up to 3 °C (from 93.0 °C, a 3.2% reduction) with a 1.34% increase in area. These results show that our model is quite accurate while being 27x faster for the 90nm processor and 19x faster for the 65nm processor.

Table 4: Comparison between stochastic heat diffusion model and [10]

          90nm                                          65nm
          Tmax (°C)   Area (mm²) (WS)   Runtime (s)    Tmax (°C)   Area (mm²) (WS)   Runtime (s)
[10]                  (4.7%)                                       (4.3%)            2980
SHDM                  (5.6%)                                       (5.8%)            155
impact    -3.2%       +1.34%            1/27x          -0.2%       +1.03%            1/19x

4.3 Comparison between Different Objectives

In this section, we compare the thermal impact on the floorplan under different objectives, and we also compare the results with [13]. We use the same setting as in Section 4.2 with a different objective function. We summarize the results in Table 5. Since our model is stochastic, we denote our model with objectives area, CPI, and heat diffusion as ACHs, and [13] with the same objectives as ACWHd. The weights for area, CPI, and heat diffusion are 0.6, 0.3, and 0.2, respectively, in the objective function.

We first compare the results with and without the thermal effect based on our stochastic model. As shown in Table 5, compared with objective AC in the best case, considering the thermal effect reduces the maximal temperature by 9.1%, from 97.7 °C to 88.8 °C, for 90nm and by 5.4%, to 97.2 °C, for 65nm, with negligible area overhead and a CPI increase of up to 7.3%. Clearly, there is a trade-off between lowering the temperature and reducing CPI: to lower the temperature, two hot modules connected by a critical bus or wire have to be separated as far as possible, which also means CPI or wire length increases. However, a large reduction in temperature for a small decrease in performance is worthwhile, since lower temperature implies better clock rate and clock reliability. The work in [13] used a simple heat diffusion model without considering correlation, dead space, border effect, and the geometry of the modules, which produced up to 2.9% area overhead and up to 21.3% increase of CPI with similar or less temperature reduction compared with our model. This shows that our model is more accurate and robust. Although our runtime is longer than that in [13], as shown in Table 6, the runtime is no more than a few minutes, so it does not have much practical impact. AR (aspect ratio) in Table 6 means the ratio of length to width of the final floorplan.

4.4 Dual Core Micro-architecture

In this section, we compare the thermal impacts on the dual-core floorplan for 90nm technology. We assume both cores run the same application and both cores have the same micro-architecture as described in Section 4.1.
When optimizing only one core, we set its aspect ratio to two so that the aspect ratio becomes one after combining the two cores. In this experiment, the weights for area, CPI, and thermal effect are 0.5, 0.6, and 0.3, respectively. In Table 7 and Table 8, AC Separate and ACHs Separate mean that the floorplan is optimized on one core only, while AC Mixed and ACHs Mixed mean that the floorplan is optimized over both cores. Figs. 12, 13, 14, and 15 show the final floorplans for the different objectives. Modules whose names end in 1, shown in blue (dark), belong to core one. Modules whose names end in 2, shown in yellow (grey), belong to core two. White space denotes dead space. As can be seen from the figures, both AC Separate and AC Mixed tend to cluster the core modules together to reduce CPI, but AC Mixed has a smaller CPI than AC Separate, as shown in Table 7. Theoretically, AC Mixed should achieve the minimal CPI, or at least a result close to that of AC Separate, and the experiments agree. Note also that both the average peak temperature and the best-case peak temperature for AC Mixed are lower than those for AC Separate, as shown in Table 7. The main reason is that modules with low power densities are more likely to be adjacent to the potentially hottest modules in the mixed floorplan. In brief, without considering thermal effect, AC Mixed generally has a lower peak temperature and a slightly smaller CPI than AC Separate. When thermal effect is considered, ACHs Separate still has a well-clustered core with a lower peak temperature than AC Separate, while the core of ACHs Mixed is spread out and reaches the lowest peak temperature, below that of AC Mixed. Moreover, ACHs Mixed reduces the peak temperature by a further 3.36 °C (starting from 98.57 °C) and even achieves 1.02x (0.926/0.910) better CPI performance compared with ACHs Separate.
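The weighted objectives used in these experiments can be illustrated with a minimal sketch. The weights (0.5, 0.6, 0.3) are the ones quoted for the dual-core experiment; everything else is an assumption made for illustration: the function names, the normalization of each term by a baseline value, and the sample numbers are hypothetical and are not taken from the paper's implementation.

```python
from statistics import mean

def weighted_cost(area, cpi, thermal, weights, baseline):
    """Weighted sum of the three objective terms.

    Each term is divided by a baseline value so that the terms are
    dimensionless before weighting (an assumed normalization).
    """
    w_area, w_cpi, w_thermal = weights
    return (w_area * area / baseline["area"]
            + w_cpi * cpi / baseline["cpi"]
            + w_thermal * thermal / baseline["thermal"])

def best_and_average(values):
    """Summarize repeated SA runs as (best, average); lower is better."""
    return min(values), mean(values)

# Weights for area, CPI, and thermal effect in the dual-core experiment.
DUAL_CORE_WEIGHTS = (0.5, 0.6, 0.3)

# Hypothetical baseline and candidate floorplan metrics.
baseline = {"area": 240.0, "cpi": 0.9, "thermal": 95.0}
candidate_cost = weighted_cost(236.0, 0.92, 93.0, DUAL_CORE_WEIGHTS, baseline)

# Hypothetical final costs from repeated annealing runs.
run_costs = [1.41, 1.38, 1.40]
best, average = best_and_average(run_costs)
print(candidate_cost, best, average)
```

Because each run of a randomized annealer can end in a different floorplan, reporting both the minimum and the mean of the final costs mirrors the best-case and average-case columns used in the tables.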
This thermal benefit has a cost: compared with the floorplan that does not consider thermal effect, CPI increases by 7.5% in the best case (relative to 0.846) and by 6.9% on average (relative to 0.896), and area increases by 1.7% in the best case (relative to 240.3) and by 1.0% on average (relative to 246.8), which shows a trade-off between thermal and performance, as shown in Table 7. In conclusion, a floorplan optimized over two cores can slightly improve CPI performance compared with a floorplan optimized separately. With consideration of thermal effect, the mixed floorplan can further reduce the peak temperature compared with the floorplan optimized separately.

5. CONCLUSIONS AND DISCUSSIONS
We have proposed stochastic thermal-aware floorplanning with consideration of micro-architecture level throughput optimization. First, we have shown that there are correlations between the power of modules for different microprocessor applications. Second, considering dead space, border effect, and the geometry of modules, we have developed a stochastic heat diffusion model and applied this model to microprocessor floorplanning. Compared with existing floorplanning using a deterministic heat diffusion model, our model obtains up to 3.2 °C (from 92.0 °C to 88.8 °C) reduction of the on-chip peak temperature, 1.29% (relative to 224.1) reduction of the area, and 1.13x (0.995/0.880) better CPI performance, respectively. Moreover, compared with temperature aware floorplanning in the HOTSPOT toolset, which ignores interconnect pipelining, our algorithm is up to 27x faster and reduces the peak temperature by up to 3 °C with a negligible area overhead. We also study the dual core micro-architecture to see the effect of optimizing the floorplan over two cores jointly instead of on each core separately. The results show that optimizing over two cores can further reduce the temperature by 3.36 °C with slightly better CPI performance compared with optimizing only one core, which offers a different perspective on dual core micro-architecture design.
This concept may also be applied to multi-core designs to obtain a better balance between performance and thermal issues. In the future, we will further apply our stochastic thermal model to rectilinear blocks, 3D stacking, multi-core micro-architectures, and other floorplanning methods.
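The headline figures quoted in the text are simple differences and ratios of the reported values, and they can be re-derived with a few lines of arithmetic. The sketch below is only a sanity check; the helper names are hypothetical, and the inputs are the numbers stated in Sections 4.3, 4.4, and 5.

```python
def absolute_reduction(before, after):
    """Reduction in absolute terms, e.g. degrees Celsius."""
    return before - after

def percent_change(before, after):
    """Signed relative change in percent; negative means a reduction."""
    return 100.0 * (after - before) / before

def improvement_factor(baseline, improved):
    """Ratio baseline/improved for a lower-is-better metric such as CPI."""
    return baseline / improved

# Peak temperature, deterministic vs. stochastic model (Section 5).
print(round(absolute_reduction(92.0, 88.8), 2))   # 3.2
# CPI, deterministic vs. stochastic model (Section 5).
print(round(improvement_factor(0.995, 0.880), 2))  # 1.13
# Peak temperature, AC vs. ACHs at 90nm, best case (Section 4.3).
print(round(percent_change(97.7, 88.8), 1))        # -9.1
# CPI, ACHs Separate vs. ACHs Mixed (Section 4.4).
print(round(improvement_factor(0.926, 0.910), 2))  # 1.02
```

Keeping the sign convention explicit (negative percent change for a reduction, factor above one for an improvement) makes it easier to compare these figures with the signed entries in the tables.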
Table 5: Comparison of stochastic and deterministic heat diffusion model with different objectives
Columns: Obj., CPI (Best, Avg), Tmax (°C) (Best, Avg), Area (mm²) (Best, Avg), WS (%) (Best, Avg); rows: AC, ACHd, ACHs
90nm:
  ACHd: CPI +12.4% (Avg); Tmax -5.8% (Best), -4.7% (Avg); Area +2.9% (Best), +2.3% (Avg)
  ACHs: CPI +7.2% (Avg); Tmax -9.1% (Best), -8.1% (Avg); Area +2.2% (Best), +0.6% (Avg)
65nm:
  ACHd: CPI +8.9% (Avg); Tmax -5.0% (Best), -4.6% (Avg); Area +2.89% (Best), -1.00% (Avg)
  ACHs: CPI +1.8% (Avg); Tmax -5.4% (Best), -7.5% (Avg); Area +1.56% (Best), -0.27% (Avg)

Table 6: Comparison of runtime and aspect ratio (AR) under different objectives
Columns: Obj., Runtime (s), AR (Best, Avg); rows: AC, ACHd, ACHs, for 90nm and 65nm

Table 7: Comparison between mixed and separate dual core floorplan
Columns: Obj., CPI (Best, Avg), Tmax (°C) (Best, Avg), Area (mm²) (Best, Avg), WS (%) (Best, Avg); rows: AC Separate, AC Mixed, ACHs Separate, ACHs Mixed
  AC Mixed: CPI -4.57% (Avg); Tmax -6.43% (Best), -2.44% (Avg); Area +0.00% (Best), -1.32% (Avg)
  ACHs Mixed: CPI -0.72% (Avg); Tmax -3.41% (Best), -1.84% (Avg); Area -0.77% (Best), +0.20% (Avg)

Table 8: Comparison of runtime and aspect ratio (AR) under different objectives for dual core microarchitecture
Columns: Obj., Runtime (s), AR (Best, Avg); rows: AC Separate, AC Mixed, ACHs Separate, ACHs Mixed

Figure 12: Floorplan: AC Separate (half)
Figure 13: Floorplan: AC Mixed
Figure 14: Floorplan: ACHs Separate (half)
Figure 15: Floorplan: ACHs Mixed

REFERENCES
[1] M. Ekpanyapong, J. R. Minz, T. Watewai, H. S. Lee, and S. K. Lim, Profile-guided microarchitecture floorplanning for deep submicron processor design, in IEEE/ACM Design Automation Conf.
[2] V. Nookala, Y. Chen, D. Lilja, and S. Sapatnekar, Microarchitecture-aware floorplanning using a statistical design of experiments approach, in IEEE/ACM Design Automation Conf.
[3] C. Long, L. Simonson, W. Liao, and L. He, Floorplanning optimization with trajectory piecewise-linear model for pipelined interconnects, in IEEE/ACM Design Automation Conf.
[4] A. Jagannathan, H. H. Yang, K. Konigsfeld, D. Milliron, M. Mohan, M. Romesis, G. Reinman, and J. Cong, Microarchitecture evaluation with floorplanning and interconnect pipelining, in Asia South Pacific Design Automation Conf.
[5] J. Srinivasan, S. Adve, P. Bose, and J. Rivers, The impact of technology scaling on lifetime reliability, in Proc. Dependable Systems and Networks.
[6] C. Tsai and S. Kang, Cell-level placement for improving substrate thermal distribution, in IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 19, no. 2.
[7] C. C. Chu and D. Wong, A matrix synthesis approach to thermal placement, in IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 17, no. 11, pp. 1166-1174.
[8] B. Obermeier and F. Johannes, Temperature-aware global placement, in Asia South Pacific Design Automation Conf.
[9] G. Chen and S. Sapatnekar, Partition-driven standard cell thermal placement, in Int. Symp. on Physical Design, vol. 1, pp. 75-80, 2003.
[10] K. Sankaranarayanan, S. Velusamy, M. R. Stan, and K. Skadron, A case for thermal-aware floorplanning at the microarchitectural level, in Journal of Instruction-Level Parallelism.
[11] K. Skadron, M. R. Stan, K. Sankaranarayanan, W. Huang, S. Velusamy, and D. Tarjan, Temperature-aware microarchitecture, in Proc. IEEE Int. Symp. on Circuits and Systems.
[12] V. Nookala, D. J. Lilja, and S. S. Sapatnekar, Temperature-aware floorplanning of microarchitecture blocks with IPC-power dependence modeling and transient analysis, in Int. Symp. on Low Power Electronics and Design.
[13] Y. Han, I. Koren, and C. A. Moritz, Temperature aware floorplanning, in Temperature aware Computing Systems.
[14] H. Yu, Y. Hu, C. C. Liu, and L. He, Minimal skew clock embedding considering time variant temperature variation with automatic correlation extraction, in Int. Symp. on Physical Design.
[15] M. Monchiero, R. Canal, and A. Gonzalez, Design space exploration for multicore architectures: a power/performance/thermal view, in Proc. of Int. Conf. on Supercomputing.
[16] E. Humenay, D. Tarjan, and K. Skadron, Impact of process variations on multicore performance symmetry, in Proc. Design Automation and Test in Europe, 2007.
[17] N. Sherwani, Algorithms for VLSI design automation, Kluwer, 3rd ed.
[18] S. N. Adya and I. L. Markov, Fixed-outline floorplanning through better local search, in IEEE/ACM Int. Conf. on Computer Aided Design.
[19] D. Burger and T. Austin, The SimpleScalar tool set version 2.0, University of Wisconsin-Madison.
[20] W. Liao, L. He, and K. Lepak, PTscalar version 1.0, University of California-Los Angeles.
[21] J. L. Henning, SPEC CPU 2000: Measuring CPU performance in the new millennium, IEEE Trans. on Computers, vol. 33.
[22] A. Krum, Thermal management, in F. Kreith, ed., The CRC Handbook of Thermal Engineering, CRC Press, Boca Raton, FL.
[23] W. Liao, L. He, and K. Lepak, Temperature and supply voltage aware performance and power modeling at microarchitecture level, IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 2005.
More informationCMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues
CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan,
More informationVLSI Signal Processing
VLSI Signal Processing Lecture 1 Pipelining & Retiming ADSP Lecture1 - Pipelining & Retiming (cwliu@twins.ee.nctu.edu.tw) 1-1 Introduction DSP System Real time requirement Data driven synchronized by data
More informationDesign for Testability
Design for Testability Outline Ad Hoc Design for Testability Techniques Method of test points Multiplexing and demultiplexing of test points Time sharing of I/O for normal working and testing modes Partitioning
More informationCMP 338: Third Class
CMP 338: Third Class HW 2 solution Conversion between bases The TINY processor Abstraction and separation of concerns Circuit design big picture Moore s law and chip fabrication cost Performance What does
More informationLecture 25. Dealing with Interconnect and Timing. Digital Integrated Circuits Interconnect
Lecture 25 Dealing with Interconnect and Timing Administrivia Projects will be graded by next week Project phase 3 will be announced next Tu.» Will be homework-like» Report will be combined poster Today
More informationThermal-reliable 3D Clock-tree Synthesis Considering Nonlinear Electrical-thermal-coupled TSV Model
Thermal-reliable 3D Clock-tree Synthesis Considering Nonlinear Electrical-thermal-coupled TSV Model Yang Shang, Chun Zhang, Hao Yu, Chuan Seng Tan Xin Zhao, Sung Kyu Lim School of Electrical and Electronic
More informationImplementation of Clock Network Based on Clock Mesh
International Conference on Information Technology and Management Innovation (ICITMI 2015) Implementation of Clock Network Based on Clock Mesh He Xin 1, a *, Huang Xu 2,b and Li Yujing 3,c 1 Sichuan Institute
More informationInterconnect s Role in Deep Submicron. Second class to first class
Interconnect s Role in Deep Submicron Dennis Sylvester EE 219 November 3, 1998 Second class to first class Interconnect effects are no longer secondary # of wires # of devices More metal levels RC delay
More informationPerformance Metrics & Architectural Adaptivity. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So
Performance Metrics & Architectural Adaptivity ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So What are the Options? Power Consumption Activity factor (amount of circuit switching) Load Capacitance (size
More informationRobust Optimization of a Chip Multiprocessor s Performance under Power and Thermal Constraints
Robust Optimization of a Chip Multiprocessor s Performance under Power and Thermal Constraints Mohammad Ghasemazar, Hadi Goudarzi and Massoud Pedram University of Southern California Department of Electrical
More informationAN ANALYTICAL THERMAL MODEL FOR THREE-DIMENSIONAL INTEGRATED CIRCUITS WITH INTEGRATED MICRO-CHANNEL COOLING
THERMAL SCIENCE, Year 2017, Vol. 21, No. 4, pp. 1601-1606 1601 AN ANALYTICAL THERMAL MODEL FOR THREE-DIMENSIONAL INTEGRATED CIRCUITS WITH INTEGRATED MICRO-CHANNEL COOLING by Kang-Jia WANG a,b, Hong-Chang
More informationECE 3060 VLSI and Advanced Digital Design. Testing
ECE 3060 VLSI and Advanced Digital Design Testing Outline Definitions Faults and Errors Fault models and definitions Fault Detection Undetectable Faults can be used in synthesis Fault Simulation Observability
More informationCMP 334: Seventh Class
CMP 334: Seventh Class Performance HW 5 solution Averages and weighted averages (review) Amdahl's law Ripple-carry adder circuits Binary addition Half-adder circuits Full-adder circuits Subtraction, negative
More informationiretilp : An efficient incremental algorithm for min-period retiming under general delay model
iretilp : An efficient incremental algorithm for min-period retiming under general delay model Debasish Das, Jia Wang and Hai Zhou EECS, Northwestern University, Evanston, IL 60201 Place and Route Group,
More informationOverlay Aware Interconnect and Timing Variation Modeling for Double Patterning Technology
Overlay Aware Interconnect and Timing Variation Modeling for Double Patterning Technology Jae-Seok Yang, David Z. Pan Dept. of ECE, The University of Texas at Austin, Austin, Tx 78712 jsyang@cerc.utexas.edu,
More information