Blind Identification of Thermal Models and Power Sources from Thermal Measurements


in IEEE Sensors Journal

Sherief Reda, Senior Member, IEEE, Kapil Dev, Member, IEEE, and Adel Belouchrani, Senior Member, IEEE

Abstract: The ability to sense the temperatures and power consumption of various key components of a chip is central to the operation of modern integrated circuits such as processors. While modern chips often include a number of embedded thermal sensors, they lack the ability to sense power at fine granularity. This paper proposes a new direction to simultaneously identify the thermal models and the fine-grain power consumption of a chip from just the measurements of the thermal sensors and the total power consumption. Our identification technique is blind, as it does not require design knowledge of the thermal-power model to identify the power sources. We investigate the main challenges in blind identification, which are the permutation and scaling ambiguities, and propose novel techniques to resolve these ambiguities. We implement our technique and apply it in three contexts. First, we implement it within a controlled simulation environment, which enables us to verify its accuracy and analyze its sensitivity to relevant issues, such as measurement noise and the number of available training samples. Second, we apply it to a real multi-core CPU+GPU processor-based system, where we show the ability to identify the runtime power consumption of the individual cores using just the total power measurement and the measurements of the embedded thermal sensors under different workloads. Third, we apply it to non-invasive power sensing of chips by inverting the temperatures measured using an external infrared imaging camera. We show that our technique consistently improves the modeling and sensing accuracy of integrated circuits.

I. INTRODUCTION

To enable correct thermal and power management, it is necessary to sense key physical metrics such as the power consumption and temperatures of the various components that make up a chip. For modern chips, one can obtain a few thermal measurements using embedded thermal sensors inside the chip, or get fine-grain thermal maps using external thermal imaging systems that capture infrared emissions. Direct power sensing is more restricted, as its granularity is limited by the number of power domains inside the chip. For instance, modern processors use the running average power limit (RAPL) interface to enable applications to measure the power consumption [9]; however, these measurements are typically coarse-grain, only giving the power consumption of all cores, uncore units and total package power. Further, one can invert the thermal measurements, either from internal thermal sensors or from an external infrared camera, to identify the power consumption of the various components that make up a chip [7].

Footnote: S. Reda and K. Dev are with the School of Engineering, Brown University, 184 Hope St, Providence, RI 02912, USA. sherief_reda@brown.edu. A. Belouchrani is with the Electrical Engineering Department/LDCCP, Ecole Nationale Polytechnique, Algiers, ALGERIA. adel.belouchrani@enp.edu.dz. An earlier version of this paper appeared at DATE 2017 [16]. This submission contains numerous novel materials, including (1) an entire new application and results for the proposed method for infrared imaging; (2) improvements to the original blind identification algorithm using a new initialization method that leads to much better results; (3) characterization of noise in modern processors using infrared imaging and analysis of the impact of noise on power estimation; and (4) analysis of the impact of training data size on model accuracy.
The goal of this paper is to blindly estimate both the thermal models of the chip and the power consumption of individual chip components from the total power consumption of the chip and the temperature measurements obtained through either internal thermal sensors or an external infrared camera. In contrast to previous work, our blind identification method (1) makes no assumption or need for any prior design-based models for power or temperature; (2) does not require any special conditions or calibrations during runtime; and (3) does not need additional measurements like performance counters [1], [13], [22]. Our methodology only uses thermal sensor measurements and total power consumption under regular operating conditions to simultaneously identify the thermal model and a fine-grain map of power consumption. The contributions of this paper are as follows.

We formulate the blind power identification (BPI) problem to estimate the power consumption of individual units in a chip together with the chip's thermal model using only the total power measurements and the thermal measurements obtained through either internal means (e.g., thermal sensors) or external means (e.g., infrared camera).

Existing general blind identification methods cannot pinpoint the exact location of the power sources, and their power estimates can be off by a constant factor [2]. To eliminate these ambiguities, a novel methodology for BPI is devised that exploits the physical characteristics of thermal transfer to provide a unique thermal model that is consistent with the measurements. Our method handles steady-state and transient operation seamlessly, enabling users to track power consumption during runtime.

We verify the accuracy of our method within a controlled simulation environment, and show that the proposed BPI method is able to resolve the power consumption of various multi-core processor configurations accurately.
Further, we characterize the noise in thermal sensor measurements, and use the simulation environment to analyze the impact of thermal sensor noise and the number of training samples on the accuracy of our proposed method.

We implement our methodology on a real quad-core CPU+GPU processor and apply it to estimate the power consumption of its cores under various standard benchmarks from the measurements of the internal thermal sensors and the RAPL interface. We also apply our method to obtain fine-grain power maps of a test chip from detailed thermal measurements, which are obtained using an infrared imaging camera that captures the chip's thermal radiation.

The organization of this paper is as follows. In Section II, we provide background on the thermal-power physical model and review the related work. The problem formulation for blind estimation of thermal models and power sources is provided in Section III. The proposed methodology for blind estimation is described in Section IV. The experimental results are provided in Section V. Lastly, the main conclusions of this work are summarized in Section VI.

II. BACKGROUND AND RELATED WORK

A. Thermal-Power Physical Model

The physical relationship between the power and temperature of a chip is governed by the heat diffusion equation [21]:

$$\rho(x, y, z)\, c_p(x, y, z)\, \frac{\partial t(x, y, z, \tau)}{\partial \tau} = \nabla \cdot [\kappa(x, y, z)\, \nabla t(x, y, z, \tau)] + p(x, y, z, \tau), \tag{1}$$

where $(x, y, z)$ denotes the location (e.g., the location of a core) and $\tau$ denotes the time instance; $\rho(x, y, z)$ and $c_p(x, y, z)$ are the density and specific heat of the material, respectively; $\kappa(x, y, z)$ is the thermal conductivity of the chip at the specific location; $t(x, y, z, \tau)$ and $p(x, y, z, \tau)$ are the temperature and power dissipation at the specific location of the chip and time, respectively.

In a practical implementation, the heat diffusion equation has to be discretized in both the space and time domains. Discretization is required because any computer has finite memory. Moreover, the spatial resolution of the thermal imaging equipment sets a limit on the discretization granularity in the space domain. Similarly, the sampling rates of the thermal camera or internal thermal sensors determine the discretization granularity in the time domain. The discretized model is commonly termed a lumped model.
Further, using the duality between thermal and electric models, we can express the lumped model as a resistance-capacitance (RC) network, where thermal resistance, capacitance, temperature, and power values in a thermal model are analogous to electrical resistance, capacitance, voltage, and current, respectively.

Spatial Discretization: In order to discretize the heat diffusion equation in the space domain, the chip and the cooling system are assumed to be made of smaller blocks. For example, it could be assumed that the chip is made up of different building blocks, such as cores, caches, etc. Similarly, the entire geometry could be composed of a heat sink in a forced-air ambient environment, heat spreader, bulk silicon, active layer, and packaging material, or any other geometry and combination of materials. For numerical thermal analysis in the spatial domain, a seven-point finite difference discretization method can be applied [23]; the seven points/blocks are shown in Figure 1. Each block is assumed to be an independent entity, so that the entire block represents a single power source and has a uniform temperature.

Fig. 1. Illustration of the discretized model in the spatial domain. The center point/block in the grid is located at $(i, j, l)$, with neighbors at $(i-1, j, l)$, $(i+1, j, l)$, $(i, j-1, l)$, $(i, j+1, l)$, $(i, j, l-1)$ and $(i, j, l+1)$, where the $i$, $j$, and $l$ offsets represent discrete blocks along the x, y, and z axes, respectively.
The discretized heat diffusion equation at an interior point $(i, j, l)$ of the discretized grid can be written as:

$$\rho c_p V \frac{d t_{i,j,l}(\tau)}{d\tau} = G_x [t_{i-1,j,l}(\tau) - 2 t_{i,j,l}(\tau) + t_{i+1,j,l}(\tau)] + G_y [t_{i,j-1,l}(\tau) - 2 t_{i,j,l}(\tau) + t_{i,j+1,l}(\tau)] + G_z [t_{i,j,l-1}(\tau) - 2 t_{i,j,l}(\tau) + t_{i,j,l+1}(\tau)] + V p_{i,j,l}(\tau), \tag{2}$$

where $i$, $j$, and $l$ represent discrete blocks along the x, y, and z axes, respectively; $V$ denotes the block volume, with $V = \Delta x \Delta y \Delta z$, where $\Delta x$, $\Delta y$, and $\Delta z$ are the discretization steps along the x, y, and z axes; $t_{i,j,l}(\tau)$ denotes the temperature at location $(i, j, l)$ at time $\tau$; $G_x$, $G_y$, and $G_z$ are the thermal conductances between adjacent blocks, defined as $G_x = \kappa \Delta y \Delta z / \Delta x$, $G_y = \kappa \Delta x \Delta z / \Delta y$, and $G_z = \kappa \Delta x \Delta y / \Delta z$. If we divide the entire chip package into $N$ discrete elements, Equation 2 translates into the following thermal modeling equation:

$$C_c \frac{d t(\tau)}{d\tau} = A_c t(\tau) + p(\tau), \tag{3}$$

where the thermal capacitance matrix $C_c \in \mathbb{R}^{N \times N}$ is a diagonal matrix; $A_c \in \mathbb{R}^{N \times N}$ is the thermal conductance matrix; $t(\tau) \in \mathbb{R}^{N \times 1}$ and $p(\tau) \in \mathbb{R}^{N \times 1}$ are the temperature and power vectors, respectively.

Temporal Discretization: We now discretize Equation 3 in the time domain using the (backward) Euler method. The following discrete-time state-space model is obtained:

$$t(k) = t(k-1) + \Delta\tau\, C_c^{-1} [A_c t(k) + p(k)], \tag{4}$$

where $\Delta\tau$ is the sampling interval and $k$ is the sampling instance (i.e., $\tau = k \Delta\tau$). Rearranging the above equation, we obtain:

$$t(k) = A t(k-1) + B p(k), \tag{5}$$

where $A = (I - \Delta\tau\, C_c^{-1} A_c)^{-1}$ and $B = \Delta\tau\, A C_c^{-1}$, with $I$ being the identity matrix.
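The step from Equation 3 to Equation 5 can be checked numerically. The following is a minimal sketch, assuming a hypothetical two-block RC network whose conductance and capacitance values are illustrative and not taken from the paper; temperatures are measured relative to ambient. It builds $A$ and $B$ with the backward Euler formulas above and confirms that the discrete model settles at the steady state of the continuous one.

```python
import numpy as np

def discretize_backward_euler(C_c, A_c, dt):
    """Return (A, B) such that t(k) = A t(k-1) + B p(k), following Eq. (5)."""
    n = C_c.shape[0]
    C_inv = np.linalg.inv(C_c)
    A = np.linalg.inv(np.eye(n) - dt * C_inv @ A_c)
    B = dt * A @ C_inv
    return A, B

# Hypothetical two-block network (illustrative values, not from the paper).
g12, g_amb = 0.5, 1.0                       # block-to-block and block-to-ambient conductances (W/K)
A_c = np.array([[-(g12 + g_amb), g12],
                [g12, -(g12 + g_amb)]])     # thermal conductance matrix of Eq. (3)
C_c = np.diag([2.0, 2.0])                   # thermal capacitances (J/K)

A, B = discretize_backward_euler(C_c, A_c, dt=0.1)

# Apply a constant power vector and iterate Eq. (5) until the transient dies out.
p = np.array([3.0, 0.0])
t = np.zeros(2)
for _ in range(2000):
    t = A @ t + B @ p

# Steady-state thermal transfer matrix implied by the discrete model.
R = np.linalg.inv(np.eye(2) - A) @ B
```

For this network the discrete steady-state matrix `R` coincides with `-inv(A_c)`, the steady state of Equation 3, regardless of the chosen `dt`.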

Typically, all practical measurements contain noise. To account for the measurement noise, we add an additive noise term $\epsilon(k)$ to the above equation. Hence, the final thermal and power model of a many-core processor chip in discretized state-space form is given by [1], [6], [17], [18], [22]:

$$t(k) = A t(k-1) + B p(k) + \epsilon(k), \tag{6}$$

where $t(k)$ and $p(k)$ are vectors that denote the temperature and power consumption measurements of the cores at time $k$, respectively; $A$ and $B$ are the two modeling matrices that capture the physical relationship between power and temperature; and $\epsilon(k)$ is a vector that represents the noise in the measurement process at time $k$. Note that if the matrices $A$ and $B$ are known, then the individual powers of the cores over time (i.e., $p(k)$) can be recovered relatively easily by just using the measurements of the thermal sensors and applying inversion techniques [7].

B. Model Identification

To identify the state-space model that links temperatures and power, there are two general approaches: a design-time approach and a runtime approach. The design-time approach requires extensive information about the layout of the chip and its package characteristics [12]. For instance, the user would require knowledge of the layout of the chip, its materials, and its heat sink configuration to generate the appropriate entries in the model matrices. Thus, the design-time approach requires the transfer of proprietary, processor-specific information, i.e., the state-space models, to the users for deployment during runtime. This approach is also prone to errors due to variabilities arising from manufacturing or ambient conditions. The runtime approach identifies the state-space models from physical measurements during runtime. The processor is treated as a gray or black box, and machine learning or system identification techniques are used to identify the state-space models from the thermal sensor measurements [1], [8], [15], [18], [22].
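The remark that $p(k)$ is easy to recover once $A$ and $B$ are known can be illustrated as follows. This is a minimal sketch, assuming a noise-free trace ($\epsilon(k) = 0$) and a square, invertible $B$; the matrices and the helper name `recover_power` are hypothetical, and the inversion techniques of [7] are more elaborate than this direct solve.

```python
import numpy as np

def recover_power(A, B, temps):
    """Invert Eq. (6) with the noise term ignored.

    temps has shape (K, N) with rows t(0), ..., t(K-1).
    Returns an array of shape (K-1, N) with rows p(1), ..., p(K-1).
    """
    residual = temps[1:] - temps[:-1] @ A.T      # rows hold t(k) - A t(k-1)
    return np.linalg.solve(B, residual.T).T      # solve B p(k) = residual for every k

# Hypothetical two-core model matrices (illustrative values only).
A = np.array([[0.8, 0.1],
              [0.1, 0.8]])
B = np.array([[0.20, 0.02],
              [0.02, 0.20]])

# Simulate a thermal trace from a known power trace.
rng = np.random.default_rng(0)
p_true = rng.uniform(0.0, 5.0, size=(50, 2))
temps = np.zeros((51, 2))
for k in range(1, 51):
    temps[k] = A @ temps[k - 1] + B @ p_true[k - 1]

p_est = recover_power(A, B, temps)
```

With measurement noise, a direct solve amplifies the noise, which is one reason the online stage of the paper uses a constrained formulation instead.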
A key assumption in all previous runtime modeling approaches is that there are sensors for the power sources [1], [15], [18], [22]. However, modern processors lack fine-grain power sensors. For instance, the RAPL interface provides the total power consumption for individual domains (e.g., all cores and uncore units), but it does not provide power measurements for the individual cores [9]. Beneventi et al. developed a regression-based model to estimate the power consumption of individual cores, assuming that when a core is active, it is fully busy running at the maximum instructions per cycle [3]. This method does not work well in practice because (1) workloads have a large impact on power consumption, (2) modern processors automatically adjust the voltage and frequency depending on the number of active cores, and (3) per-core leakage power increases when more cores are activated because of thermal coupling.

To summarize, in previous work researchers assumed either (1) the availability of $p(k)$ and sought to identify $A$ and $B$ [6], [17], [22], or (2) the availability of $A$ and $B$ and sought to identify $p(k)$ [7]. In contrast to previous work, we seek blind identification of $A$, $B$ and $p(k)$, with no assumption or need for any prior design-based models for any of them. That is, our methodology only uses runtime measurements (i.e., the total power and thermal sensor measurements) to simultaneously identify $A$, $B$ and $p(k)$. We make no assumptions about the availability of power sensors or prior power models, and our method works seamlessly under various frequency and voltage settings. Our technique enables designers to reduce the number of sensors by eliminating the need for physical power sensors, and to instead use the measurements of the thermal sensors and total power to derive per-core power consumption.
Modern processors rely on internal micro-controllers to collect the measurements of the sensors and to orchestrate thermal and power management decisions [19], [2]. Our technique can be implemented to run on internal micro-controllers or as a software thread on the main processor.

III. PROBLEM FORMULATION

The blind estimation problem seeks to identify the matrices $A$ and $B$, together with the power profiles $p(k)$. It is well known that the steady-state thermal model can be derived from Equation 6. If a stable set of power sources, denoted by the vector $p_s$, is applied, then after the transient response, the steady-state temperatures, denoted by the vector $t_s$, will be measured; i.e., $t_s = t(k \to \infty)$. By re-arranging Equation 6, one gets

$$t_s \approx A t_s + B p_s, \tag{7}$$

$$(I - A) t_s \approx B p_s, \qquad t_s \approx (I - A)^{-1} B p_s, \qquad t_s \approx R p_s, \tag{8}$$

where $R = (I - A)^{-1} B$ is the steady-state thermal transfer matrix and $I$ is the identity matrix. If one obtains $m$ thermal steady-state measurements, $[t_{s1}\ t_{s2}\ \ldots\ t_{sm}]$, from different experiments using $m$ different sets of power signals, $[p_{s1}\ p_{s2}\ \ldots\ p_{sm}]$, then we can summarize the results using

$$[t_{s1}\ t_{s2}\ \ldots\ t_{sm}] = R\, [p_{s1}\ p_{s2}\ \ldots\ p_{sm}], \tag{9}$$

$$T = R P. \tag{10}$$

If $R$ and $A$ are identified, matrix $B$ can be calculated by

$$B = (I - A) R. \tag{11}$$

Note that the model of Equation (8) is similar to the one commonly used in array signal processing, particularly in blind source separation [2]. The latter consists of blindly identifying the matrix $R$, i.e., by resorting only to the information carried by the measured signals. Before proceeding, it is important to specify the notion of blind identification.

Challenges in Blind Identification. In the blind context, a full identification of the matrix $R$ from model (8) is impossible, because the exchange of a fixed scalar factor between a given source signal (a power source) and the corresponding column of $R$ does not affect the observations (i.e., the thermal measurements). That is, if we divide column $i$ of $R$ by an arbitrary factor $\alpha_i$ and multiply the $i$-th row of $P$ by $\alpha_i$, then $T$ does not change. For example, the temperature vectors in the numerical example of Equation (12) remain identical if we divide the third column of $R$ by 0.2 and multiply the third row of $P$ by 0.2.

[Equation (12): numerical example; the matrices are missing from this transcription.]

Note also that the labeling in Equation 9 is arbitrary. For example, the temperature vectors in the numerical example of Equation (13) remain identical if we permute $R$ by exchanging the first and third columns, and correspondingly permute $P$ by exchanging the first and third rows.

[Equation (13): numerical example; the matrices are missing from this transcription.]

Hence, the blind identification of $R$ can be performed up to a permutation and scaling factor of its columns using blind source separation algorithms, but this blind identification is not sufficient for our needs, since we would like to resolve the powers of the individual units of the chip (e.g., cores) and map them to the exact units. In the sequel, we propose to (1) take advantage of the particular physical characteristics of thermal transfer to solve the permutation ambiguity, and (2) solve the scaling ambiguity by using the total power measurements.

IV. PROPOSED METHODOLOGY

In this section, we describe the proposed methodology to accurately estimate the thermal modeling matrices and power sources of processors.

A. Estimating Natural Response Matrix

First, we start by estimating the natural response matrix $A$. By forcing $p(k) = 0$ in Equation (6), we get

$$t(k) = A t(k-1) + \epsilon(k). \tag{14}$$

Thus, an estimate of $A$ is obtained by least-squares minimization. If we collect $K$ consecutive transient thermal traces, then we can construct two matrices $T_1 = [t(1)\ \ldots\ t(K-1)]$ and $T_2 = [t(2)\ \ldots\ t(K)]$, and solve the following quadratic programming formulation:

$$\min_A \| T_2 - A T_1 \|^2 \quad \text{under the constraint } A \geq 0, \tag{15}$$

where $A \geq 0$ denotes that all entries of $A$ are non-negative.

Fig. 2. Thermal map in Kelvin for a 3 × 3 unit chip, where the bottom-left unit is activated.
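The estimation of $A$ in Equation (15) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' solver: it solves the unconstrained least-squares problem and then clips negative entries to zero, which coincides with the constrained optimum only when the unconstrained solution is already non-negative (as in this noise-free example). The matrix `A_true` and the starting temperatures are hypothetical, and temperatures are relative to ambient so that the zero-power trace decays to zero.

```python
import numpy as np

def estimate_natural_response(trace):
    """Estimate A from a zero-power cooling trace.

    trace has shape (K, N) with rows t(1), ..., t(K) recorded while p(k) = 0.
    Simplification: ordinary least squares followed by clipping, used here
    in place of the non-negativity-constrained program of Eq. (15).
    """
    T1 = trace[:-1].T                    # N x (K-1), columns t(1) ... t(K-1)
    T2 = trace[1:].T                     # N x (K-1), columns t(2) ... t(K)
    # T2 = A T1  is equivalent to  T1^T A^T = T2^T, which lstsq solves for A^T.
    A_ls = np.linalg.lstsq(T1.T, T2.T, rcond=None)[0].T
    return np.clip(A_ls, 0.0, None)

# Hypothetical 3-unit natural response matrix (illustrative values only).
A_true = np.array([[0.8, 0.1, 0.0],
                   [0.1, 0.7, 0.1],
                   [0.0, 0.1, 0.8]])

# Generate a cooling trace: start hot, apply no power, let Eq. (14) run.
K = 20
trace = np.zeros((K, 3))
trace[0] = [10.0, 4.0, 1.0]
for k in range(1, K):
    trace[k] = A_true @ trace[k - 1]

A_est = estimate_natural_response(trace)
```

With noisy traces, a proper non-negative least-squares or quadratic programming solver would be the more faithful choice.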
A 3 × 3 chip leads to a 9 × 9 $R$ matrix. In this case, the impact of an activated 1 W unit in the lower-left corner (i.e., block 7) on the temperatures of all units is given by the 7th column of $R$, where the element on the diagonal has the highest value.

B. Estimating the Steady-State and Forced Response Matrices

Second, we describe the process of estimating the steady-state thermal transfer matrix ($R$) and the forced response matrix ($B$) given a matrix $T$ of measured steady-state temperatures. The first step is to apply the NMF (Non-Negative Matrix Factorization [14]) algorithm to $T$. The NMF algorithm is considered because it inherently copes with the positivity constraint on the matrix $R$ and the power profiles $P$. However, the solution provided by the NMF algorithm is not unique due to the ambiguities highlighted in Section III. Let us define $\tilde{R}$ and $\tilde{P}$ to be the estimates, up to permutation and scaling, of the $R$ and $P$ matrices, respectively. We introduce techniques to resolve these ambiguities in this section.

Resolving Permutation Ambiguity: To solve this ambiguity, we resort to the physical characteristics of the thermal transfer matrix. The thermal impact of a power source is highest at the source location and smaller at the neighboring locations. This physical phenomenon is illustrated in Figure 2, which gives the thermal map of a chip with 3 × 3 units in which the power source at the bottom-left unit is activated. A 3 × 3 chip leads to a 9 × 9 $R$ matrix. If we activate 1 W of power at the unit in the lower-left corner (i.e., block 7), then the thermal impact on all units is given by the 7th column of $R$, where the element on the diagonal has the highest value and the other elements have lower values because of heat diffusion. Thus, the largest value of each column of the thermal transfer matrix should be on the diagonal.
Hence, the correct position for each column in $\tilde{R}$ and row in $\tilde{P}$ is recovered by identifying the position of the maximum value of each column in $\tilde{R}$ and moving the column to the corresponding column position in the matrix. For example, if

$$T = \tilde{R} \tilde{P} = [\text{numerical 4-unit example; the matrices are missing from this transcription}] \tag{16}$$

then we reorganize the columns of $\tilde{R}$ and the rows of $\tilde{P}$ correspondingly, so that the maximum elements of the columns of $\tilde{R}$ line up on the diagonal. That is, we swap column 1 with 4 and column 2 with 3 in $\tilde{R}$, and correspondingly swap row 1 with 4 and row 2 with 3 in $\tilde{P}$, to get

$$T = [\text{reordered matrices; missing from this transcription}] \tag{17}$$

where the permutation ambiguities in $\tilde{R}$ and $\tilde{P}$ are resolved.

Resolving Scaling Ambiguity: Let $\alpha_1, \ldots, \alpha_N$ be the correct scaling factors. Dividing the $N$ columns of $\tilde{R}$ and multiplying the $N$ rows of $\tilde{P}$ by these factors leaves $T$ unchanged, which is equivalent to

$$T = \tilde{R}\, \mathrm{diag}(1/\alpha_1, \ldots, 1/\alpha_N)\, \mathrm{diag}(\alpha_1, \ldots, \alpha_N)\, \tilde{P}. \tag{18}$$

We observe that the scaling factors can be resolved by measuring the total power, which is an easy measurement. If $[c_1\ c_2\ \ldots\ c_m]$ denote the total power measurements corresponding to the different sets of power sources $P = [p_{s1}\ p_{s2}\ \ldots\ p_{sm}]$, then we get

$$[1\ 1\ \ldots\ 1]\, \mathrm{diag}(\alpha_1, \ldots, \alpha_N)\, \tilde{P} = [c_1\ c_2\ \ldots\ c_m], \tag{19}$$

which can be simplified to

$$[\alpha_1\ \alpha_2\ \ldots\ \alpha_N]\, \tilde{P} = [c_1\ c_2\ \ldots\ c_m]. \tag{20}$$

The solution to Equation 20, and hence to the scaling ambiguity problem, is then given by:

$$[\alpha_1\ \alpha_2\ \ldots\ \alpha_N] = [c_1\ c_2\ \ldots\ c_m]\, \tilde{P}^{\dagger}, \tag{21}$$

where $\dagger$ denotes the pseudo-inverse operator. The thermal transfer matrix $R$ is finally obtained by sorting and re-scaling the columns of $\tilde{R}$ using the scaling factors from Equation 21.

Getting Forced Response Matrix: Once the matrices $R$ and $A$ (from Subsection IV-A) are identified, the forced response matrix $B$ is estimated through Equation (11): $B = (I - A) R$.

Initialization: One important consideration during the blind identification process is the initialization of the NMF algorithm. We found that the quality of the solution is particularly sensitive to the initialization of the algorithm. In our earlier work [16], we initialized the NMF algorithm with the fast ICA algorithm [11]. In this paper, we argue for a better initialization that respects the self-coupling of the $R$ matrix.
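The two ambiguity-resolution steps can be sketched together. The example below is a minimal illustration under stated assumptions: the "true" $R$ and $P$ are hypothetical, and the NMF output is emulated by deliberately permuting and scaling them rather than by running an NMF solver. It assumes every column of the estimate peaks at a distinct row, which is the physical property the text relies on.

```python
import numpy as np

def resolve_ambiguities(R_tilde, P_tilde, total_power):
    """Undo the column permutation and scaling of an NMF-style factorization.

    R_tilde: (N, N) estimate of R up to permutation/scaling of its columns.
    P_tilde: (N, m) matching estimate of P (rows permuted/scaled accordingly).
    total_power: (m,) measured total power c_1 ... c_m.
    """
    n = R_tilde.shape[0]
    # Permutation: the column peaking at row i belongs at column position i.
    peak_rows = np.argmax(R_tilde, axis=0)
    if len(set(peak_rows.tolist())) != n:
        raise ValueError("column maxima do not identify a unique ordering")
    order = np.argsort(peak_rows)
    R_sorted = R_tilde[:, order]
    P_sorted = P_tilde[order, :]
    # Scaling, Eq. (21): alpha = c * pinv(P_sorted).
    alpha = total_power @ np.linalg.pinv(P_sorted)
    R = R_sorted / alpha                 # divide column i by alpha_i
    P = P_sorted * alpha[:, None]        # multiply row i by alpha_i
    return R, P, alpha

# Hypothetical ground truth (illustrative values only).
R_true = np.array([[2.0, 0.5, 0.2],
                   [0.5, 2.0, 0.5],
                   [0.2, 0.5, 2.0]])
rng = np.random.default_rng(1)
P_true = rng.uniform(1.0, 10.0, size=(3, 6))
c = P_true.sum(axis=0)                   # total power per experiment
T = R_true @ P_true

# Emulate an ambiguous factorization: scale, then permute.
scale = np.array([0.5, 2.0, 4.0])
perm = [2, 0, 1]
R_tilde = (R_true * scale)[:, perm]
P_tilde = (P_true / scale[:, None])[perm, :]

R_rec, P_rec, alpha = resolve_ambiguities(R_tilde, P_tilde, c)
```

The scrambled pair reproduces the same temperatures `T`, which is the ambiguity of Section III; the total power is what makes the scaling recoverable.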
Thus, we initialize the NMF estimation of $R$ with the identity matrix $I$, which also has the property that the largest element in every column is on the diagonal. We initialize the NMF estimation of the $P$ matrix such that each block in the circuit has the same power and each column sums to the total measured power. That is,

$$\text{initial } R = I \tag{22}$$

$$\text{initial } P = \begin{bmatrix} c_1/N & c_2/N & \cdots & c_m/N \\ c_1/N & c_2/N & \cdots & c_m/N \\ \vdots & \vdots & & \vdots \\ c_1/N & c_2/N & \cdots & c_m/N \end{bmatrix} \tag{23}$$

Thus, our initial conditions for both $R$ and $P$ ensure that these two matrices are physically in the right form. The offline analysis procedure is summarized in Figure 3.

Procedure: Blind Identification of Power Profiles
Input: $K$ transient thermal traces $t(k)$, steady-state thermal measurements $T$ and corresponding total power measurements $[c_1\ \ldots\ c_m]$
Output: Natural Response Matrix $A$, Thermal Transfer Matrix $R$, Forced Response Matrix $B$.
1) Let $T_1 = [t(1)\ \ldots\ t(K-1)]$ and $T_2 = [t(2)\ \ldots\ t(K)]$.
2) Find the Natural Response Matrix $A$ using least-squares minimization: $\min_A \| T_2 - A T_1 \|^2$ under the constraint $A \geq 0$.
3) Find the thermal transfer matrix $\tilde{R}$ and the power profiles $\tilde{P}$, up to permutation and scaling, using the NMF algorithm [14] with the proposed initialization method.
4) Solve the Permutation Ambiguity: Identify the correct location for each column in $\tilde{R}$ by finding the position of the maximum value of each column in $\tilde{R}$. Use the identified positions to sort the columns of $\tilde{R}$ and the rows of $\tilde{P}$.
5) Solve the Scaling Ambiguity: Find the scaling factors from the sorted $\tilde{P}$: $\alpha = [\alpha_1\ \ldots\ \alpha_N] = [c_1\ \ldots\ c_m]\, \tilde{P}^{\dagger}$.
6) Find the thermal transfer matrix $R$ by dividing column $i$ of the sorted $\tilde{R}$ by $\alpha_i$: $R = \tilde{R}\, \mathrm{Diag}[\alpha]^{-1}$.
7) Find the forced response matrix $B$: $B = (I - A) R$.
Fig. 3. Offline Power Identification Algorithm.

C. Estimating Runtime Power Consumption

During online tracking, the core power profiles at every time instance are obtained by periodically solving the following quadratic program:

$$\min_{p(k)} \| B p(k) - (t(k) - A t(k-1)) \|_2^2 \quad \text{such that } p(k) \geq 0 \text{ and } \| p(k) \|_1 = \text{total measured power}.$$
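The online quadratic program can be sketched with projected gradient descent. This is one possible solver under stated assumptions, not necessarily the one used by the authors: since $p(k) \geq 0$, the constraint $\|p(k)\|_1 = c$ reduces to the entries summing to $c$, so each iterate is projected onto the scaled simplex using the standard sort-based projection. The matrices and function names are hypothetical.

```python
import numpy as np

def project_to_simplex(v, total):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = total}, total > 0."""
    u = np.sort(v)[::-1]                          # sort in descending order
    css = np.cumsum(u) - total
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]    # last index kept positive
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def estimate_power(A, B, t_now, t_prev, total, iters=500):
    """Minimize ||B p - (t_now - A t_prev)||^2 over the scaled simplex."""
    y = t_now - A @ t_prev
    # Step 1/L, where L = ||B||_2^2 is the Lipschitz constant of the
    # gradient of 0.5 * ||B p - y||^2 (same minimizer as the squared norm).
    step = 1.0 / np.linalg.norm(B, 2) ** 2
    p = np.full(B.shape[1], total / B.shape[1])   # start from equal shares
    for _ in range(iters):
        grad = B.T @ (B @ p - y)
        p = project_to_simplex(p - step * grad, total)
    return p

# Hypothetical 3-core model (illustrative values only).
A = np.array([[0.8, 0.1, 0.0],
              [0.1, 0.7, 0.1],
              [0.0, 0.1, 0.8]])
B = np.array([[0.5, 0.1, 0.0],
              [0.1, 0.5, 0.1],
              [0.0, 0.1, 0.5]])
p_true = np.array([4.0, 0.0, 2.0])                # one idle core
t_prev = np.array([30.0, 32.0, 31.0])
t_now = A @ t_prev + B @ p_true

p_est = estimate_power(A, B, t_now, t_prev, total=p_true.sum())
```

The fixed iteration count is a simplification; a deployed version would more likely stop on a tolerance or call a dedicated QP solver.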

Procedure: Blind Identification of Power Profiles
Input: Temperatures $t(k)$, total power $c(k)$, matrices $A$ and $B$ from the offline BPI analysis
Output: Power Profiles $p(k)$
1) Solve the quadratic program: $\min_{p(k)} \| B p(k) - (t(k) - A t(k-1)) \|_2^2$, such that $p(k) \geq 0$ and $\| p(k) \|_1 = $ total measured power.
2) Return the solution $p(k)$ of the quadratic program.
Fig. 4. Online Blind Power Identification Algorithm.

The online power estimation method is summarized in Figure 4.

V. EXPERIMENTAL RESULTS

To evaluate the proposed BPI algorithm, we consider three sets of experiments.

In Subsection V-A, we first verify the accuracy of the proposed BPI algorithm by using it to derive the per-core power estimates of a variety of multi-core processor configurations that are simulated in HotSpot, and compare the estimates against the known per-core power consumptions. We consider both steady-state and transient power estimation. The sampling interval here is 1 ms.

In Subsection V-B, we apply the BPI algorithm to estimate the thermal models and per-core power of a real quad-core CPU+GPU processor (Intel Haswell processor Core i7-4790K) using the measurements of the internal thermal sensors and the total power consumption as measured by the RAPL interface. We consider both steady-state and transient power estimation. The sampling interval here is 1 second. The online part of our algorithm (Figure 4) takes only 4.75 ms per run, which is shorter than the sampling interval; as a result, BPI can track the power estimates at the same granularity as our thermal sensor measurements.

In Subsection V-C, we apply the BPI algorithm to estimate the power consumption of various blocks on a test chip, where the thermal measurements are obtained using an external SC56 FLIR infrared camera. In this application, we only consider steady-state power estimation, since transient response measurements using infrared imaging require petabytes of storage.

The source code and sample data from our experiments are available at github.com/scale-lab/bpi.

A. Verification and Analysis of BPI Using Simulators

BPI Accuracy. The accuracy of the proposed BPI algorithm is analyzed by comparing our per-core power estimates with the actual per-core power consumption. Given that modern processors lack such power sensors, we resort to simulation for verification. We use HotSpot, which is a popular thermal simulator for chip designs [12]. We create multiple multi-core layout configurations, with 4-core (2 × 2), 9-core (3 × 3) and 16-core (4 × 4) layouts. All chip configurations are assumed to be 1 cm × 1 cm in dimension with a maximum power budget of 6 W. We use scaled power traces from our Core i7 processor as input to HotSpot; otherwise, we use the default thermal settings in the simulator. The simulated thermal traces from HotSpot, together with the total power consumption, are given as inputs to our BPI algorithm. The proposed BPI algorithm is used to identify the state-space matrix models and the per-core power estimates. The power estimates produced by the BPI algorithm are then compared against the actual power consumptions of the individual cores that were used as inputs to HotSpot to verify its estimation accuracy.

Fig. 5. Verification of the proposed BPI algorithm using the HotSpot simulator. Subfigure (a) gives the thermal simulation results (temperature in °C) of the four cores from HotSpot; subfigures (b)-(e) give the estimated and actual power consumption (W) of cores 1 to 4, i.e., the per-core power estimates from BPI and the input power of each core to HotSpot. Dashed red lines give the input power, while blue lines give the estimated power, which tracks the input power quite accurately.
Figure 5 illustrates the results of our experiment for different thermal traces applied over time. In particular, Figure 5.a gives the thermal simulation output from HotSpot for the four cores, while Figures 5.b-e give the actual per-core power traces given as inputs to HotSpot (dashed red line) and the per-core power estimates computed from BPI (solid blue lines) for the four cores. The results in Figure 5 demonstrate that BPI tracks the power accurately. Further, the high accuracy of estimated power from the simulated temperature traces demonstrates that the proposed BPI algorithm is able to estimate the thermal to power model for the given chip quite accurately.

TABLE I. Summary of BPI accuracy. The average absolute error in per-core power estimates is reported as a function of the number of cores. HotSpot is used for validation of per-core estimates against the actual power consumption of each core. Columns: number of cores; our earlier work [16], avg. abs. error (W) and avg. abs. error (%); this work, avg. abs. error (W) and avg. abs. error (%). [Most table entries are missing from this transcription; the only surviving values are ".9" and "4.5%" in the "This work" columns.]

Fig. 6. Average absolute error in per-core power estimates (%) as a function of the number of thermal traces (training samples from thermal simulations) used to train the BPI algorithm, for 4, 9 and 16 cores. The HotSpot tool was used to generate the thermal traces for different power traces. About 5 thermal traces are enough to obtain reasonably good power estimates.

Fig. 7. An example IR thermal image of the Core i7 processor die. Such IR thermal images are used to obtain detailed and accurate temperature maps of the processor cores for characterizing the internal thermal sensor noise of the processor.

To understand the scalability of our algorithm as a function of the number of power sources, i.e., the number of cores, we repeat our verification experiment for multi-core processors with 9-core and 16-core configurations. We define the average absolute error in Watts and in percent as

$$\text{avg. abs. error (W)} = \frac{1}{N} \sum_{n=1}^{N} \left| \text{estimated power} - \text{actual power} \right|, \tag{24}$$

and

$$\text{avg. abs. error (\%)} = \frac{1}{N} \sum_{n=1}^{N} \frac{\left| \text{estimated power} - \text{actual power} \right|}{\text{actual power}}, \tag{25}$$

respectively. The average absolute errors are summarized in Table I. We also provide the results from our earlier method [16], where we used the ICA algorithm to initialize the NMF, and contrast them with the new initialization method proposed in this paper. The results show that the proposed initialization method delivers consistently better results. In particular, we observe that the new initialization method provides up to 7.2% better accuracy for the studied chip configurations.
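The two error metrics of Equations (24) and (25) translate directly into code. The sketch below uses hypothetical function names and made-up sample values, and multiplies the relative error by 100 so that the result reads as a percentage, which is an assumption about how the reported numbers are scaled.

```python
import numpy as np

def avg_abs_error_watts(estimated, actual):
    """Eq. (24): mean absolute difference between estimated and actual power."""
    estimated, actual = np.asarray(estimated, float), np.asarray(actual, float)
    return float(np.mean(np.abs(estimated - actual)))

def avg_abs_error_percent(estimated, actual):
    """Eq. (25), scaled by 100: mean absolute error relative to actual power."""
    estimated, actual = np.asarray(estimated, float), np.asarray(actual, float)
    return float(100.0 * np.mean(np.abs(estimated - actual) / actual))

# Made-up per-core values in Watts (not results from the paper).
estimated = [9.0, 11.0, 20.0, 5.0]
actual = [10.0, 10.0, 20.0, 4.0]

err_w = avg_abs_error_watts(estimated, actual)      # (1 + 1 + 0 + 1) / 4 = 0.75
err_pct = avg_abs_error_percent(estimated, actual)  # (10 + 10 + 0 + 25) / 4 = 11.25
```

Note that the relative metric is undefined for a core whose actual power is zero, so idle entries would need to be excluded or handled separately.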
Impact of Number of Samples. To understand the sensitivity of the BPI results to the number of training samples, we re-evaluate the accuracy of BPI as a function of the number of thermal measurements. For $N$ sources, there are $N^2$ unknown values in the $R$ matrix, which is a lower bound on the number of training samples. Each steady-state thermal trace gives $N$ thermal measurements, one per core. Figure 6 gives the average per-core absolute error in power estimation (%) as a function of the number of traces for the 4-core, 9-core and 16-core cases. The results show that increasing the number of training traces generally improves the accuracy of the algorithm; however, even with 2-5 thermal traces, one can get reasonably good accuracy. One also observes that for the same number of samples, increasing the number of cores improves the accuracy, which is an important consideration given that future many-core processors will incorporate tens of cores.

Noise Characterization and Impact. Another influence on the performance of BPI comes from the noise in the measurements, which is typical for internal thermal sensors in processors. There are mainly two types of noise: (1) noise arising from discretization, since internal sensor measurements are provided as discrete integer values, and (2) inherent sensor noise, where the same temperature can lead to different internal measurements. To understand the magnitude of the noise in sensor measurements, we characterize the temperatures of our Core i7 processor using an external infrared camera that has a noise figure of 15 mK. Figure 7 shows an example thermal map of the i7 processor. We then compare the core temperatures reported by the internal thermal sensors against the temperatures measured by the far more accurate camera. The differences between the two temperatures for different cores are plotted as histograms in Figure 8.
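The two noise types described above can be emulated on simulated temperatures. The following is a minimal sketch with a hypothetical helper, `emulate_sensor`: it optionally adds zero-mean Gaussian noise and then rounds to the nearest integer, mimicking integer-valued internal sensors. The standard deviation of 2/3 °C is the value quoted for the "integer with noise" case in Figure 9; the temperature range is made up.

```python
import numpy as np

def emulate_sensor(temps_c, noise_std=0.0, rng=None):
    """Emulate an integer-valued thermal sensor reading in degrees Celsius.

    Adds zero-mean Gaussian noise with standard deviation noise_std
    (inherent sensor noise), then rounds to the nearest integer
    (discretization noise).
    """
    rng = np.random.default_rng() if rng is None else rng
    temps_c = np.asarray(temps_c, dtype=float)
    noisy = temps_c + rng.normal(0.0, noise_std, size=temps_c.shape)
    return np.rint(noisy).astype(int)

# Made-up floating-point temperatures standing in for simulator output.
true_temps = np.linspace(40.0, 80.0, 10_000)

integer_only = emulate_sensor(true_temps)
integer_noisy = emulate_sensor(true_temps, noise_std=2.0 / 3.0,
                               rng=np.random.default_rng(0))

# Error introduced by each sensor model relative to the true temperature.
err_integer = integer_only - true_temps
err_noisy = integer_noisy - true_temps
```

Feeding such emulated readings into the identification pipeline, instead of the floating-point values, is one way to study how sensor format affects the power estimates.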
In Figure 8, the x-axis denotes the error in the thermal sensor measurements, while the y-axis denotes the probability of that error across different experiments. The plot shows that the internal sensor noise falls within a [-2, 2] °C window and that it is Gaussian in nature, as verified by the Kolmogorov-Smirnov normality test.

Fig. 8. Histograms of thermal sensor measurement errors for different cores of the Core i7 processor. The x-axes denote the error in the thermal sensor measurements (°C), while the y-axes denote the error probabilities, p(error). The overlaid blue plots are the fitted Gaussian distributions for each core.

To analyze the impact of noise on the accuracy of the BPI algorithm, we use the HotSpot tool again, since it allows us to control the per-core power consumption easily. The thermal measurements from HotSpot are naturally given as floating-point numbers. We quantify the performance of BPI when the HotSpot measurements are (1) discretized, and (2) discretized with additional noise from the [-2, 2] °C window. We plot in Figure 9 the average absolute per-core power estimation error (%) as a function of the sensor noise mode for the case of the 9-core chip. As expected, discretization and thermal sensor noise slightly reduce the accuracy of the algorithm; this degradation is graceful and does not lead to large inversion errors during power estimation.

Fig. 9. Impact of internal thermal sensor measurement accuracy on the accuracy of the power estimated by the proposed BPI method (x-axis: sensor output format; y-axis: average absolute per-core power estimation error, in %). "Floating point" refers to the estimation accuracy when the simulation results of HotSpot, which are given in floating point, are used. In "Integer", we discretize the simulation results by rounding them to the nearest integer, which emulates the measurements from the internal thermal sensors in real processors. In "Integer with noise", we round the measurements after adding noise from a Gaussian source with a standard deviation of 2/3 °C.

B. Blind Modeling and Estimation of a Real Quad-Core Processor Using Internal Thermal Sensor Measurements

In the second set of experiments, we apply the BPI algorithm to estimate the thermal models and the per-core power of a real quad-core CPU+GPU processor. We use a Linux-based system with an Intel Haswell Core i7-4790K processor (Devil's Canyon), which features four cores, an integrated GPU, and an L3 cache of 8 MB. The RAPL interface enables us to read the total power consumption of all the cores. Further, the lm-sensors module v3.3.4 is used to read the thermal measurements of the four cores. The sampling period of the RAPL power and thermal sensor measurements is 1 second. The frequencies and voltages of the cores are automatically controlled by Intel's speed driver, which adjusts the frequencies of the cores depending on the load and the available thermal/power envelope, up to a maximum of 4.2 GHz. Thus, the frequency varies during our experiments. While we have fixed the fan speed in our experiments, a variable fan speed can be incorporated in our technique by repeating our modeling approach under various fan speed settings, and then looking up the correct model during power tracking depending on the actual fan speed.

We execute a broad collection of workloads to collect traces that include the internal measurements from the thermal sensors and the total power from RAPL. The initial phase of our BPI algorithm is then executed on the data to blindly estimate the state-space model matrices for the processor. During runtime, our light-weight power estimation (the algorithm of Figure 4) takes about 4.75 ms per sample to compute the per-core power estimates, which is well within the 1-second sampling period. Hence, the proposed BPI algorithm could easily be used to make run-time decisions to control the chip temperature.

Controlled Stress MicroBenchmarks. We first demonstrate that our BPI technique produces correct results on the real system.
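The trace collection described above (total power from RAPL plus core temperatures, one sample per second) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' tooling: it reads the Linux powercap sysfs files for RAPL package domain 0 and the generic hwmon temperature files instead of calling lm-sensors, and the function names are hypothetical. Reading `energy_uj` may require elevated privileges on some kernels, and the package domain may cover more than the cores alone.

```python
import glob
import time

RAPL_DIR = "/sys/class/powercap/intel-rapl:0"  # package 0; assumed to exist

def read_int(path):
    """Read a single integer from a sysfs file."""
    with open(path) as f:
        return int(f.read().strip())

def average_power_w(e_start_uj, e_end_uj, dt_s, max_range_uj):
    """Average power over dt_s from two energy-counter readings (microjoules).

    The counter wraps around at max_range_uj, so a single wrap is corrected.
    """
    delta = e_end_uj - e_start_uj
    if delta < 0:
        delta += max_range_uj
    return delta * 1e-6 / dt_s

def read_temps_c():
    """Read every hwmon temperature input (millidegrees C) as degrees C.

    This includes non-core sensors; filtering to the CPU device is left out.
    """
    paths = sorted(glob.glob("/sys/class/hwmon/hwmon*/temp*_input"))
    return [read_int(p) / 1000.0 for p in paths]

def collect(num_samples, period_s=1.0):
    """Return a list of (total_power_w, [temps_c]) tuples, one per period."""
    max_range = read_int(f"{RAPL_DIR}/max_energy_range_uj")
    trace = []
    e_prev, t_prev = read_int(f"{RAPL_DIR}/energy_uj"), time.monotonic()
    for _ in range(num_samples):
        time.sleep(period_s)
        e_now, t_now = read_int(f"{RAPL_DIR}/energy_uj"), time.monotonic()
        power = average_power_w(e_prev, e_now, t_now - t_prev, max_range)
        trace.append((power, read_temps_c()))
        e_prev, t_prev = e_now, t_now
    return trace
```

Calling `collect(60)` on a suitable Linux machine would gather one minute of samples. Power is derived from differences of the cumulative energy counter, which is why the wraparound correction is kept in a separate, easily checked function.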
We design a multi-threaded stress generation application that enables us to control the number of threads and the exact cores that are stressed when the application is executed. For space considerations, we demonstrate three of the 16 possible cases: (a) one core is stressed (core 1), (b) two cores are stressed (cores 2 and 3), and (c) all cores are stressed. The results are given in Figure 10, where we report the total power, the measurements from the thermal sensors, and the per-core power estimates from our BPI technique for the three cases. We observe from the plots that our technique is able to break down the power consumption and map it to the four cores correctly, as known from the controlled scheduling. The inactive cores are estimated to consume a small amount of power; this power is mainly attributed to leakage, since our technique identifies the total power (dynamic and leakage). Furthermore, we can see that the cores do not consume exactly the same power when they are all active (case c). This can be attributed to leakage power, which depends on the thermal profile and process variability [5]. We also implemented the regression-based approach reported in [3], which only works in steady state and assumes that active cores are fully busy. We found that it can lead to up to 11.4% deviation in power estimation compared to the actual steady-state total power. This deviation arises because the method does not account for the automatic changes in operational voltage-frequency and leakage power when more cores are activated.

Fig. 10. Illustration of successful operation of BPI in which various cores are stressed and the power dissipation of the cores is blindly identified. Each column shows the thermal sensor measurements (°C), the measured total power, and the estimated power (W) of cores 1 to 4. In the subplots of column (a), only one core is stressed (core 1); in (b), two cores are stressed (cores 2 and 3); and in (c), all cores (1, 2, 3, and 4) are stressed.

Fig. 11. Demonstration of BPI using a mix of SPEC CPU 2006 benchmarks, showing the thermal sensor measurements (°C), the measured total power, and the estimated power (W) of cores 1 to 4. Four benchmarks (hmmer, mcf, milc, and povray) are launched with one benchmark per core. A wait time of about 1 seconds is used between launching two consecutive benchmarks. For example, the spike in the power of core 4 corresponds to the launch time of the povray benchmark on that core.

Multiple SPEC CPU 2006 Benchmarks. We conduct additional experiments using SPEC CPU 2006 and PARSEC benchmarks to demonstrate the ability to track the power consumption of general benchmarks. In the second experiment, we launch four SPEC CPU 2006 benchmarks, hmmer, mcf, milc, and povray, on cores 1, 2, 3 and 4, respectively.
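A staggered, one-benchmark-per-core launch of this kind can be sketched as below. This is a hypothetical illustration for Linux: the helper names `launch_plan` and `run_plan`, the wrapper script paths, and the `WAIT_S` value are assumptions, since the actual SPEC invocations are not given in the text. Linux numbers cores from 0, so index 0 here plays the role of core 1 in the paper.

```python
import os
import subprocess
import time

def launch_plan(commands, wait_s):
    """Return (start_offset_s, core_index, command) tuples, one per core,
    with consecutive launches separated by wait_s seconds."""
    return [(i * wait_s, i, cmd) for i, cmd in enumerate(commands)]

def run_plan(plan):
    """Start each command at its offset and pin it to its core (Linux only)."""
    t0 = time.monotonic()
    procs = []
    for offset_s, core, cmd in plan:
        # Sleep until this command's scheduled start time.
        time.sleep(max(0.0, offset_s - (time.monotonic() - t0)))
        proc = subprocess.Popen(cmd)
        os.sched_setaffinity(proc.pid, {core})  # restrict the child to one core
        procs.append(proc)
    for proc in procs:
        proc.wait()

# Hypothetical wrapper scripts, one per benchmark named in the text.
WAIT_S = 1.0  # placeholder; set to the wait time used in the experiment
commands = [["./run_hmmer.sh"], ["./run_mcf.sh"],
            ["./run_milc.sh"], ["./run_povray.sh"]]
plan = launch_plan(commands, wait_s=WAIT_S)
# run_plan(plan)  # left commented out: requires the scripts to exist
```

Pinning each process to a single core keeps the mapping between benchmark and core known, which is what allows the per-core estimates to be checked against the launch schedule.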
We wait for about 1 seconds between launching two consecutive benchmarks. Figure 11 gives the per-core estimates from our BPI algorithm. The plot correctly shows that the estimated power of core 4 spikes at about 4 seconds, when povray was launched on it. Similarly, the power estimate of every core drops to a very low value, as expected, right after the completion of the SPEC benchmark running on that core. Furthermore, core 1 displays the highest power consumption among all cores, as it executes hmmer, which is the most CPU-intensive application among the selected benchmarks.

Multi-threaded PARSEC Benchmarks. In the third experiment, we use bodytrack from the PARSEC multi-threaded benchmarks. We limit it to a maximum of two threads, and use our BPI algorithm to estimate the per-core power. The results are given in Figure 12. Interestingly, the plot shows activation of all cores; however, not all cores appear active simultaneously. This result matches the characteristics of bodytrack, which launches one thread per image to analyze 26 image frames. Since we limited bodytrack to two threads, the Linux scheduler automatically seeks to balance the launched threads among the cores, and as a result all the cores are used over the course of execution, but no more than two cores are active at a time. Our BPI algorithm correctly tracks this behavior during runtime, where the power spikes correspond to the launching and termination of threads on the various cores.

Fig. 12. Demonstration of BPI using bodytrack from the PARSEC benchmark suite configured to run using two threads, showing the thermal sensor measurements (°C), the measured total power, and the estimated power (W) of cores 1 to 4. As observed from the reconstructed power for cores 1-4, no more than two cores are active at any time. However, over time, all four cores are used by the Linux scheduler to balance the load across the cores.

C. Blind Modeling and Estimation Using Infrared Imaging

In the third set of experiments, we demonstrate the applicability of our method to thermal and power modeling from infrared imaging. To this end, we design a test chip that is composed of 10 × 10 = 100 microheaters embedded in a 90 nm Altera Stratix II FPGA of 7.2 mm × 7.9 mm. Each microheater corresponds to a fine-grained block on the chip, and when enabled, it consumes 2 mW of power. Figure 13.a shows the silicon die area, also called the test area. To capture the thermal emissions, we use a FLIR SC5600 infrared camera, and to measure the total power, we intercept the power supply lines using a shunt resistor and measure the shunt's voltage using an Agilent 3441 multimeter. Figure 13.b gives an example of the captured thermal emissions in steady state.

Fig. 13. (a) Layout of the 7.2 mm × 7.9 mm test chip on an Altera FPGA with 100 (10 × 10) custom microheaters; (b) an example of thermal traces from a high-end infrared (IR) camera. Temperatures are in Celsius above room temperature.
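Because each microheater corresponds to one block of the die, a captured infrared frame can be reduced to one temperature per block. The sketch below shows one plausible way to do this by block averaging. It is an assumption for illustration rather than the authors' procedure: the function name `block_temperatures` is hypothetical, and the frame is assumed to be already cropped and aligned to the die area.

```python
import numpy as np

GRID = 10                      # 10 x 10 microheater blocks (from the text)
DIE_W_MM, DIE_H_MM = 7.2, 7.9  # die dimensions in mm (from the text)

def block_temperatures(frame, grid=GRID):
    """Average a 2-D IR frame over a grid x grid array of equal blocks.

    Trailing pixels that do not fill a whole block are cropped.
    """
    frame = np.asarray(frame, dtype=float)
    rows, cols = frame.shape
    bh, bw = rows // grid, cols // grid  # pixels per block
    cropped = frame[:bh * grid, :bw * grid]
    # Split each axis into (block index, pixel within block) and average
    # over the within-block axes.
    return cropped.reshape(grid, bh, grid, bw).mean(axis=(1, 3))

# Block pitch implied by the die size and the grid.
pitch_x_mm = DIE_W_MM / GRID  # 0.72 mm
pitch_y_mm = DIE_H_MM / GRID  # 0.79 mm

# Small worked example: a 4 x 4 frame reduced to 2 x 2 blocks.
demo = block_temperatures(np.arange(16).reshape(4, 4), grid=2)
print(demo)  # [[ 2.5  4.5] [10.5 12.5]]
```

The resulting grid of block temperatures has the same shape as the microheater array, so each entry can serve as the thermal measurement for one power source.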
For infrared (IR) imaging based experiments, the resolution of the BPI algorithm is dictated by the resolution of the IR camera. In our lab, we have an IR camera from FLIR (FLIR SC5600). The camera has a spatial resolution of 5 µm. However, in practice, the temperature and power profiles are studied at a coarser granularity, dictated by the size of the functional blocks. In this experiment, we have 100 blocks in 7.2 mm × 7.9 mm, as illustrated in Figure 13, so our horizontal pitch is 0.72 mm and our vertical pitch is 0.79 mm. In principle, the spatial resolution of the estimated power maps could match the resolution of the IR camera. However, in this work, the die area is divided into fine-grained blocks to create microheaters; hence, the resolution of the temperature and power maps is decided by the number of microheaters.

To record thermal emissions for the purpose of blind identification, we enable various microheaters in various pattern configurations on the chip and record the resultant infrared emissions and the total power consumption. We repeat this procedure 2 times with different power patterns to create a sufficient number of thermal traces that can be used as inputs to our BPI method to identify the steady-state thermal model and the power consumption of the individual blocks. Figure 13.b gives an example of such a thermal trace. In Figure 14, we plot the histogram of the number of incorrectly estimated microheaters out of the 100 units in our design for our 2 power patterns. Overall, the average power estimation error for the 2 power patterns is about 11.5%. In other words, on average 88.5% of the 100 microheater blocks have their power estimated correctly by the proposed blind identification method. Figure 15 provides two examples of such power maps.

Fig. 14. Histogram showing the number of inaccurately estimated microheaters out of the 100 microheaters in the chip (x-axis: number of incorrectly estimated microheaters, out of 100; y-axis: number of power patterns, out of 2). Overall, 2 different random power patterns were generated using the microheater grid. On average, the BPI algorithm estimates the power dissipation of the microheaters with reasonably high accuracy (88.5%).

Fig. 15. Comparison of two blindly estimated power maps against the reference (actual) power maps.

VI. CONCLUSIONS

We proposed a new technique for blind identification of the power consumption of individual cores in multicore processors with no need for any a priori thermal-power models. Our BPI technique simultaneously identifies the state-space thermal model and the power consumption of the cores from just the measurements of total power and thermal sensors during runtime. To overcome the challenges in general blind source separation techniques, we proposed methods that exploit the nature of thermal characteristics and the total power measurements to construct the state-space model correctly with appropriate permutation and scaling factors. We verified the accuracy of our method and demonstrated its resilience to scaling and sensor noise. We have proposed and implemented two applications for our technique: (1) a real multi-core processor, where we used it to track the power consumption of its cores under different workloads using the internal thermal sensors, and (2) a test chip, where we used the measurements from an infrared camera as input to our BPI method, which in turn provided fine-grain power maps for the chip. Further, in this paper, we evaluated the proposed blind identification method on regular 2D ICs. As future work, we plan to extend our technique to 3D ICs with arbitrary locations for embedded thermal sensors and power sources.

Acknowledgments: This work is partially supported by NSF grants # , and by an Arab-American Frontiers Fellowship of the U.S. National Academy of Sciences, Engineering and Medicine.

REFERENCES

[1] A. Bartolini, M. Cacciari, A. Tilli and L. Benini, "A distributed and self-calibrating model-predictive controller for energy and thermal management of high-performance multicores," IEEE DATE, pp. 1-6, 2011.
[2] A. Belouchrani, K. Abed-Meraim, J.F. Cardoso and E. Moulines, "A blind source separation technique using second order statistics," IEEE Trans. on Signal Processing, vol. 45, no. 2, February.
[3] F. Beneventi, A. Bartolini, and L. Benini, "Static thermal model learning for high-performance multicore servers," in ICCCN, pp. 1-6, 2011.
[4] F. Beneventi, A. Bartolini, A. Tilli, and L. Benini, "An Effective Gray-Box Identification Procedure for Multicore Thermal Modeling," in IEEE Transactions on Computers, vol. 63(5), 2014.
[5] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi and V. De, "Parameter variations and impact on circuits and microarchitecture," in ACM/IEEE DAC, 2003.
[6] R. Cochran and S. Reda, "Consistent Runtime Thermal Prediction and Control Through Workload Phase Detection," DAC, 2010.
[7] R. Cochran, A. N. Nowroz and S. Reda, "Post-Silicon Power Characterization Using Thermal Infrared Emissions," ISLPED, 2010.
[8] A. Coskun, T. Rosing and K. C. Gross, "Utilizing Predictors for Efficient Thermal Management in Multiprocessor SoCs," IEEE Transactions on CAD of Integrated Circuits and Systems, vol. 28(1), 2009.
[9] H. David et al., "RAPL: Memory Power Estimation and Capping," in ISLPED, 2010.
[10] K. Dev, A. N. Nowroz and S. Reda, "Power Mapping and Modeling of Multi-core Processors," in IEEE International Symposium on Low-Power Electronics and Design, 2013.
[11] A. Hyvarinen, "Fast and Robust Fixed-Point Algorithms for Independent Component Analysis," IEEE Trans. on Neural Networks, vol. 10(3).
[12] W. Huang et al., "HotSpot: a compact thermal modeling methodology for early-stage VLSI design," IEEE Transactions on VLSI Systems, vol. 14(5), 2006.
[13] C. Isci, G. Contreras and M. Martonosi, "Live, Runtime Phase Monitoring and Prediction on Real Systems with Application to Dynamic Power Management," in ISCA, 2006.
[14] D. D. Lee and H. S. Seung, "Learning the parts of objects by nonnegative matrix factorization," Nature, vol. 401, 1999.
[15] D. Li, S. X.-D. Tan, E. Pacheco and M. Tirumala, "Parameterized Architecture-level Dynamic Thermal Models for Multicore Microprocessors," ACM TODAES, vol. 15(2), pp. 16:1-16:22, 2010.
[16] S. Reda and A. Belouchrani, "Blind Identification of Power Sources in Processors," IEEE/ACM Design, Automation & Test in Europe, 2017.
[17] S. Sharifi, C.-C. Liu and T. Rosing, "Accurate Temperature Estimation for Efficient Thermal Management," ISQED, 2008.
[18] Y. Wang, K. Ma, and X. Wang, "Temperature-Constrained Power Control for Chip Multiprocessors with Online Model Estimation," ISCA, 2009.
[19] K. Dev and S. Reda, "Scheduling Challenges and Opportunities in Integrated CPU+GPU Processors," in ACM/IEEE Symposium on Embedded Systems for Real-time Media, 2016.
[20] M. Yuffe et al., "A Fully Integrated Multi-CPU, Processor Graphics, and Memory Controller 32-nm Processor," IEEE JSSC, vol. 47(1), 2012.
[21] M. N. Ozisik, Heat Conduction, Natick, MA: Wiley/IEEE, Mar.
[22] F. Beneventi, A. Bartolini, A. Tilli, and L. Benini, "An Effective Gray-Box Identification Procedure for Multicore Thermal Modeling," IEEE Trans. Comp., vol. 63(5), 2014.
[23] Y. Yang, Z. Gu, C. Zhu, R.P. Dick, and L. Shang, "ISAC: Integrated Space-and-Time-Adaptive Chip-Package Thermal Analysis," IEEE Trans. CAD of Integ. Circ. and Sys., vol. 26(1), 2007.


More information
