ABSTRACT ELECTRO-THERMAL CODESIGN IN LIQUID COOLED 3D ICS: PUSHING THE POWER- PERFORMANCE LIMITS. Bing Shi, Doctor of Philosophy, 2013

ABSTRACT

Title of dissertation: ELECTRO-THERMAL CODESIGN IN LIQUID COOLED 3D ICS: PUSHING THE POWER-PERFORMANCE LIMITS
Bing Shi, Doctor of Philosophy, 2013
Dissertation directed by: Professor Ankur Srivastava, Department of Electrical and Computer Engineering

The performance improvement of today's computer systems is usually accompanied by increased chip power consumption and system temperature. Modern CPUs dissipate an average of W power while spatial and temporal power variations result in hotspots with even higher power density (up to 300 W/cm²). The coming years will continue to witness a significant increase in CPU power dissipation due to advanced multi-core architectures and 3D integration technologies. The problems of increased chip power density, leakage power and system temperature have become major obstacles to further improvement in chip performance. The conventional air-cooled heat sink has been shown to be insufficient for three dimensional integrated circuits (3D-ICs). Hence better cooling solutions are necessary. Micro-fluidic cooling, which integrates micro-channel heat sinks into the silicon substrates of the chip and uses liquid flow to remove heat from inside the chip, is an effective active cooling scheme for 3D-ICs. While micro-fluidic cooling provides excellent cooling for 3D-ICs, the associated overhead (cooling power

consumed by the pump to inject the coolant through micro-channels) is significant. Moreover, the 3D-IC structure also imposes constraints on micro-channel locations (essentially resource conflicts with through-silicon-vias (TSVs) or other structures). In this work, we investigate optimized micro-channel configurations that address these considerations. We develop three micro-channel structures (hotspot optimized cooling configuration, bended micro-channel and hybrid cooling network) that provide sufficient cooling to the 3D-IC with minimum cooling power overhead while remaining compatible with existing electrical structures such as TSVs. These configurations achieve up to 70% cooling power savings compared with a configuration without any optimization. Based on these configurations, we then develop a micro-fluidic cooling based dynamic thermal management approach that maintains the chip temperature by controlling the fluid flow rate (pressure drop) through the micro-channels. These cooling configurations are designed after the electrical parts and are therefore compatible with the current standard IC design flow.

However, the electrical, thermal, cooling and mechanical aspects of a 3D-IC are interdependent. Hence the conventional design flow, which designs the cooling configuration after the electrical design is finished, results in inefficiencies. To overcome this problem, we then investigate an electrical-thermal co-design methodology for 3D-ICs. Two co-design problems are explored: TSV assignment and micro-channel placement co-design, and gate sizing and fluidic cooling co-design. The experimental results show that co-design enables a fundamental power-performance improvement over the conventional design flow which separates the

electrical and cooling design. For example, the gate sizing and fluidic cooling co-design achieves 12% power savings under the same circuit timing constraint and 16% circuit speedup under the same power budget.

ELECTRO-THERMAL CODESIGN IN LIQUID COOLED 3D ICS: PUSHING THE POWER-PERFORMANCE LIMITS

by Bing Shi

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2013

Advisory Committee:
Professor Ankur Srivastava, Chair/Advisor
Professor Joseph JaJa
Professor Shuvra Bhattacharyya
Professor Donald Yeung
Professor Doron Levy

Copyright by Bing Shi 2013

ACKNOWLEDGEMENT

I would like to thank my advisor, Professor Ankur Srivastava, for the support and guidance he has provided throughout my time in the Ph.D. program. Thank you for introducing me to the world of Electronic Design Automation, for giving me so many opportunities, for helping me every step of the way, and for encouraging me in hard times.

In addition, I would like to thank Professor Joseph JaJa, who helped me a great deal with my Ph.D. oral qualifying exam, research proposal and dissertation. I would like to thank Professor Shuvra Bhattacharyya for his support in my competition for the ECE dissertation fellowship.

I would also like to thank my committee members, Professor Joseph JaJa, Professor Shuvra Bhattacharyya, Professor Donald Yeung and Professor Doron Levy, for their time, comments and feedback.

I also thank all past and present members of our lab, Domenic Forte, Yufu Zhang, Caleb Serafy, Tiantao Lu and Chongxi Bao, for their help, friendship, and support. I am grateful for all the fun times we have shared throughout the years.

Finally, I would like to thank my parents and my family for their ongoing encouragement. Thank you for all of your love and support over the course of my long journey as an academic.

Table of Contents

List of Figures
List of Tables

1 Introduction
  Thermal Issues in 3D-ICs
  Conventional Dynamic Thermal Management
  Interlayer Micro-fluidic Cooling
  Interdependency between Electrical, Thermal, Reliability and Cooling
  Advantage of Electrical and Cooling System Co-Design
  Thesis Outline

2 Background
  Basics of Three Dimensional Integrated Circuit
  Fundamental Characteristics of Fluids in Micro-channels
  Conservation Law of Fluid Dynamics
  Dimensionless Numbers in Fluid Mechanics
  Single and Two Phase Flow
  Laminar and Turbulent Flow
  Thermal Modeling of 3D-IC with Micro-fluidic Cooling
  Distributed RC Thermal Model
  Cooling Performance of Micro-channels
  Overall Thermal Model of 3D-IC with Micro-channels
  Thermal Impact of TSVs
  Modeling of Power Consumption
  Dynamic Power Consumption
  Leakage Power Consumption
  Micro-channel Cooling Power
  Straight Micro-channels
  Micro-channels with Bends

3 Design of Micro-fluidic Cooling Configurations for 3D-ICs
  Motivation of Micro-Fluidic Cooling
  Micro-channel Design Considerations/Constraints
  Cooling Power Consumption
  Non-uniform Power Profile
  TSV Constraint
  Thermal Stress
  Hotspot Optimized Non-Uniform Micro-channel
  Problem Formulation
  Heuristic for Micro-channel Placement
  Workload-balanced Initial Micro-channel Distribution
  Micro-channel Cost Assignment
  3.4 TSV Constrained Bended Micro-channel
  Motivation of Using Bended Micro-channel
  Problem Formulation
  Overall Micro-channel Design Flow
  Mincost Flow Based Micro-channel Design
  Initialization of Minimum Cost Flow Network
  Cost Assignment
  Micro-channel Refinement
  Temperature and Pumping Power Analysis
  Iterative Micro-channel Optimization
  Hybrid Cooling Network
  Motivation of Hybrid Cooling Network
  Algorithm for Hybrid Cooling Network Design
  Micro-channel Priority Assignment/Update
  Thermal TSV Allocation and Sizing
  Basic Thermal TSV Placement Approach
  Modified Thermal TSV Allocation and Sizing Approach
  Finding Maximum Independent Set E Considering Thermal Variations
  Cooling Performance of Micro-channel Designs
  Runtime Thermal Management Using Micro-channels
  Algorithm for Micro-fluidic Based DTM
  Performance of Micro-channel Based DTM
  Summary

4 Co-design of Electrical and Fluidic Cooling Systems
  Motivation for Co-Design
  Co-optimization of TSV Assignment and Micro-Channel Placement
  Problem Formulation
  Algorithm for TSV Assignment and Micro-channel Placement Co-optimization
  Overall Design Flow
  Multi-commodity Minimum Cost Flow Formulation
  Iterative Optimization
  Computational Simplifications
  Multi Layer Case
  Two Layer Case
  Performance of TSV Assignment and Micro-channel Placement Co-design
  Comparison of Wirelength and Pumping Power
  Tradeoff Between Wirelength and Pumping Power
  Co-optimization of Gate Sizing and Micro-Fluidic Cooling
  Motivation of Simultaneous Gate Sizing and Micro-channel Distribution
  4.3.2 Modeling of Gate Delay
  Problem Formulation
  Algorithm for Gate Sizing and Micro-channel Placement Co-optimization
  Step 1: Ideal Heat Sink and Gate Size Co-optimization
  Step 2: Micro-channel Distribution for Ideal Case
  Step 3: Gate Size and Grid Temperature Refinement
  Step 4: Micro-channel Distribution Refinement
  Step 5: Re-iteration and Stopping Criteria
  Performance of Gate Sizing and Micro-channel Placement Co-design
  Comparison of Power Consumption
  Comparison of Circuit Delay
  Power-Performance Tradeoff
  Summary

5 Conclusion and Discussion
  Conclusion
  Future Work

Bibliography

List of Figures

1.1 Interdependency between Electrical, Thermal, Reliability and Cooling
Stacked 3D-IC with micro-channel cooling system
Control volume of fluid
(a)-(f) Two phase flow patterns, (g) Evaporation process in a channel
Comparison of single and two phase flow
(a) Laminar flow pattern, (b) Turbulent flow pattern, (c) Transitional flow pattern
Fluid in micro-channel with bends
RC network for 3D-IC thermal modeling
Micro-channel thermal model
Thermal resistive network of one 3D-IC layer with micro-channels
A 3D-IC grid with thermal TSV
Exponential leakage model versus quadratic leakage model
Micro-channel and TSV configuration
Pumping power versus chip power consumption
Thermal stress inside and surrounding TSV (a) when chip temperature is 100 °C, (b) when chip temperature is 50 °C (assuming stress free temperature is 250 °C)
Potential locations of micro-channels: (a) uniform spreading of micro-channels, (b) workload-balanced micro-channel spreading
Example of formulating mincost flow network, (a) 3D-IC structure, (b) abstract grid graph, (c) minimum cost flow network
(a) Cost initialization, (b) Cost update
Example of silicon layer thermal profile with TSV and (a) straight, (b) bended micro-channels
Example of micro-channel infrastructure design using minimum cost flow
Micro-channel infrastructure design flow
Cost assignment
Examples of (a) unbalanced cooling demand, (b) different number of bends
Example of pairwise cooling workload balance
Examples of bend elimination
Overall design flow of micro-channel and thermal TSV co-optimization
Change in interdependence region of a grid (a) after allocating or enlarging a thermal TSV, (b) after shrinking a thermal TSV
Flow chart of micro-channel placement
Comparison of Pumping Power
Runtime pressure drop control versus fixed pressure drop for (a) group L, (b) group M, (c) group H
Conventional chip design flow
Thermal profile of one 3D-IC layer, and an example of TSV and micro-channel allocation where TSVs constrain us from allocating micro-channels at hotspots
Overall design flow of MCMCF based algorithm
4.4 3D-IC with potential TSV and micro-channel locations
Multi-commodity min-cost flow formulation
Computationally simplifying transformation for multi-layer case
Computationally simplifying transformation for two-layer case
Tradeoff between wirelength and pumping power
Overall design flow
Delay versus power tradeoff for benchmark

List of Tables

3.1 Comparison of pumping power
Problem formulation
Benchmark Information
Comparison between our approach, TSV first and channel first approach (P_pump: W, WL: m, temperature: °C)
Comparison of total power consumption (power: W, t_cons: ns)
Comparison of circuit performance (power: W, t_cons: ns)

Chapter 1 Introduction

Moore's law has predicted a spectacular exponential growth in chip performance. In recent years, however, such performance improvements have been slowing down, leading the research community to investigate alternative technologies that can restore the expected Moore's law rhythm in the functionality and cost of electronic products. The three dimensional integrated circuit (3D-IC), which contains two or more vertically stacked layers of active electronic components, has become a significant technology for achieving continued performance improvements. Compared with equivalent 2D circuits, the 3D-IC allows a significant increase in device density as well as faster on-chip communication, due to shorter interconnects and increased bandwidth [30][88]. Besides the performance improvement, the 3D-IC can also yield overall system energy savings and allows co-integration of heterogeneous components [22][49]. Despite these advantages, the stacked structure of the 3D-IC also brings new challenges to chip thermal management.

1.1 Thermal Issues in 3D-ICs

Modern CPUs dissipate an average of W power while spatial and temporal power variations result in hotspots with even higher power density (up to 300 W/cm²). The coming years will continue to witness a significant increase in CPU power dissipation due to advanced multi-core architectures and 3D integration technologies. An increase in CPU power density is usually accompanied by a drastic increase in chip temperature. The problems of increased chip power density, leakage power and system temperature have become major obstacles to further improvements in chip performance.

The advent of 3D integration technology exacerbates the on-chip thermal problems, since the power density increases dramatically when several microprocessor layers are stacked, and the intermediate layers constrain the heat flow paths. Recent data shows that more than 50% of all IC failures are related to thermal issues [58]. For instance, excessive temperature reduces the electron and hole mobilities, which increases circuit propagation delay [44][83]; thermal variations and hotspots on chip cause reliability problems such as circuit mismatch and reduced chip lifetime (due to the cumulative damage caused by excessive temperature) [29][10][50]. Hence, loss of performance and reliability due to unpredictable thermal hotspots has become a major limiting factor for further performance improvement in modern computer systems.

Furthermore, with continued scaling, the impact of leakage power is growing as well. Today, up to 50% (or even more) of the total power consumption is leakage

power [38]. It has been shown that leakage and temperature are highly interdependent: higher temperature increases the leakage power, which in turn further increases the temperature [47][80][27]. This interdependency increases both the importance and the difficulty of chip thermal management. The interdependence between temperature and leakage has been known for years, and several attempts have been made at design time to better estimate and control leakage and temperature through various design decisions [66][81]. For example, [66] estimates the chip thermal and leakage profiles while accounting for their interdependence, and [81] estimates the chip leakage profile while accounting for thermal variability.

In conventional computer systems, the thermal issues within the chip are handled at the package level by attaching a large heat sink to the top of the chip, which dissipates heat into the surrounding air, together with air-based cooling devices such as fans and air conditioners. Such remote cooling approaches are limited in the following ways:

1. Failure to account for temporal variations: processor operation varies greatly during runtime due to the nature of different applications and data. The demand for resources by different applications also varies. Processor operation and resource demand influence the on-chip power and thermal states, so the chip power and thermal profiles change during runtime. Conventional air cooling, which ignores the real-time chip operation and cooling demand, is therefore inefficient.

2. Failure to account for spatial variations: the chip power and thermal profiles also

exhibit variations spatially, since different parts of the chip exhibit different switching activities. Such variations result in thermal hotspots, which are important issues in electronic systems. Conventional heat sinks usually provide uniform cooling, which is very inefficient when there are hotspots.

3. Insufficient cooling capability: conventional heat sinks are usually attached to the top of the chip, which makes them ineffective at removing heat from inside the chip. For 3D-ICs in particular, air-based cooling has already been shown to be insufficient. As illustrated in [8], if two 100 W/cm² microprocessors are stacked on top of each other, the power density becomes 200 W/cm², which is beyond the heat removal limits of air-cooled heat sinks.

Many efforts have been made to mitigate the thermal issues in CPU chips. These efforts can be classified into three categories: CPU thermal management schemes [11][16][20][21][53][64][63], materials with better thermal properties [67][79] and advanced cooling schemes [43][84][46][9]. In this work, we focus on new cooling technology and dynamic thermal management for 3D-ICs.

1.2 Conventional Dynamic Thermal Management

Chip performance and temperature are usually closely related. To improve the performance delivered by the microprocessor, we could increase the transistor integration density of the chip, or increase the supply voltage and clock frequency, either of which leads to increased chip power consumption and temperature. Dynamic thermal management (DTM), where the chip operation is controlled during

runtime for curtailing thermal emergencies, can better address the temporal and spatial variations of the on-chip power and thermal profiles (in addition to the conventional package level cooling scheme). In conventional DTM schemes, thermal management is achieved by controlling processor knobs such as core frequency and supply voltage [64][25][13][41] or the scheduling of tasks [93], which in effect control the power dissipation in different parts of the chip. These schemes manage the chip temperature by controlling the rate and pattern of heat dissipation. For example, in dynamic voltage and frequency scaling (DVFS), the supply voltage and operating frequency of microprocessors are dynamically controlled to reduce the chip power consumption, thereby reducing the temperature as well. However, decreasing the supply voltage or operating frequency causes a potential performance reduction. Hence, in conventional DTM schemes, constraining the chip temperature is usually accompanied by reductions in performance.

With the continued application of conventional thermal management techniques, many of today's electronic systems underperform their inherent physical limits while operating at the highest power dissipation allowed by the available thermal management technology. CMOS, telecommunications, active sensing and imaging have undergone tremendous technological innovation over the last 40 years. However, despite the need and the potential for enhanced thermal management, electronic cooling technologies have changed very little in the past two or three decades, continuing instead to implement a remote cooling paradigm with only incremental improvements in performance.
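The DVFS trade-off described in this section can be illustrated with a minimal sketch. It is not taken from the dissertation: the switching-power model P = a·C·V²·f, the operating-point table, the temperature threshold, the hysteresis value and all function names are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """One voltage/frequency pair the core can run at (hypothetical values)."""
    voltage_v: float
    frequency_hz: float


def dynamic_power_w(op, activity, capacitance_f):
    """Standard switching-power model P = a * C * V^2 * f (an assumption here)."""
    return activity * capacitance_f * op.voltage_v ** 2 * op.frequency_hz


def dvfs_step(level, temperature_c, t_max_c, n_levels, hysteresis_c=5.0):
    """Pick the next index into the operating-point table (0 is the slowest).

    Step down when the chip is above t_max_c, step up again only once it has
    cooled below t_max_c - hysteresis_c, otherwise hold the current level.
    """
    if temperature_c > t_max_c and level > 0:
        return level - 1
    if temperature_c < t_max_c - hysteresis_c and level < n_levels - 1:
        return level + 1
    return level


# Hypothetical operating points, ordered from slowest to fastest.
POINTS = [
    OperatingPoint(0.8, 1.0e9),
    OperatingPoint(1.0, 2.0e9),
    OperatingPoint(1.2, 3.0e9),
]

if __name__ == "__main__":
    level = len(POINTS) - 1
    for temp_c in (80.0, 90.0, 92.0, 78.0):  # made-up sensor readings
        level = dvfs_step(level, temp_c, t_max_c=85.0, n_levels=len(POINTS))
        power = dynamic_power_w(POINTS[level], activity=0.2, capacitance_f=1e-8)
        print(f"T={temp_c:.0f} C -> level {level}, "
              f"f={POINTS[level].frequency_hz / 1e9:.1f} GHz, P={power:.2f} W")
```

Every step down lowers both power and clock frequency, which is the performance penalty of conventional DTM that the text points out.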

1.3 Interlayer Micro-fluidic Cooling

Relying on the conventional air-cooled heat sink for the thermal management of 3D-ICs could have catastrophic consequences. On one hand, due to the strong thermal-performance interdependency, designers will resort to aggressive shutdown or slowdown in order to limit on-chip temperatures, resulting in significant underutilization of the available devices, hurting overall performance and leading 3D-ICs to experience greater fractions of dark silicon than 2D-ICs. On the other hand, the heat removal challenge could limit the number of 3D layers or the physical design optimization space. Consequently, if the performance and energy efficiency promised by 3D integration are to be realized, the thermal challenge needs to be actively addressed.

Micro-fluidic cooling, which integrates micro-channel heat sinks into the silicon substrates of the chip and uses liquid flow to remove heat from inside the chip, can overcome this limitation. It has been reported to support heat dissipation higher than 700 W/cm² [84].

Despite the excellent cooling capability, micro-channel based heat removal has an overhead: the cooling system consumes extra energy to pump the coolant through the channels. This has motivated a body of work that attempts to improve the micro-channel cooling effectiveness (thereby reducing the cooling energy consumption) by: a) controlling dimensional parameters such as channel width, height and aspect ratio [42][84], b) investigating more sophisticated micro-channel infrastructures such as cross-linked micro-channels [32], micro-pin-fins [52][59], tree- or serpentine-shaped

micro-channels [68][23], and c) using hotspot optimized micro-channel structures [12][76], etc. Recently, micro-channel cooling has also been adopted in dynamic thermal management to control the runtime CPU performance and chip temperature by tuning the fluid flow rate through the channels [19][18].

1.4 Interdependency between Electrical, Thermal, Reliability and Cooling

The electrical, thermal, reliability and cooling aspects of 3D-ICs are all interdependent. As Figure 1.1 shows, higher performance usually leads to greater chip power consumption and generates heat. An increase in chip temperature has many detrimental effects:

1. It results in higher circuit delay or delay uncertainty, which in turn limits the performance improvement.

2. Due to the interdependency between temperature and leakage power, an increase in chip temperature further increases the power consumption.

3. High chip temperature also exacerbates electro-migration, which causes reliability loss.

On the other hand, the heat level inside the chip also determines the micro-fluidic cooling system configuration, which in turn changes the temperature and power distribution (due to the thermal-power interdependence), thereby changing the circuit delay and chip lifetime. Furthermore, the existence of micro-fluidic cooling also causes

greater thermal gradients. Such thermal gradients and reduced chip temperature will cause greater thermal stress, which on one hand might result in mechanical reliability issues such as crack formation, and on the other hand will change the transistor delay.

Figure 1.1: Interdependency between Electrical, Thermal, Reliability and Cooling

1.5 Advantage of Electrical and Cooling System Co-Design

In the conventional IC design flow, the electrical parts of the chip are designed first. The cooling system is then designed based on the electrical system already in place. However, due to the aforementioned interdependency, a design methodology that separates electrical and cooling system design is inefficient. Co-design of the electrical and fluidic cooling systems is necessary. It has the following advantages:

1. Higher cooling in timing critical areas results in better performing designs, since transistor delay increases with temperature.

2. Higher cooling in timing critical areas enables us to aggressively pursue high power dissipating performance enhancements such as increasing the supply voltage. This results in higher performance without impacting temperature, since the extra heat can be managed by micro-fluidics.

3. Design optimization (placement, floorplanning, etc.) can be more aggressive, since temperature issues can be addressed by aggressive cooling.

4. Increasing the cooling levels in high leakage areas helps reduce the overall power, since leakage is a highly non-linear function of temperature. The reduction in leakage may be significant enough to outweigh the increase in pumping power.

5. Micro-fluidics may impact silicon thickness, causing TSV performance degradation. With smart electrical design, this degradation could potentially be removed. For example, degradation in TSV performance could be overcome by stronger drivers.

1.6 Thesis Outline

In this work, we investigate the optimization of micro-fluidic cooling systems that provide sufficient cooling to the 3D-IC with minimum overhead while addressing the design constraints imposed by the 3D-IC structure. Three micro-fluidic cooling configurations are proposed: hotspot-optimized non-uniform micro-channel, bended micro-channel and hybrid cooling network.

In order to fully explore the interdependency among the electrical, thermal, reliability and cooling aspects of 3D-ICs, we also investigate electrical and micro-fluidic

co-design methodologies. With co-design, fundamental power-performance improvements can be achieved.

This dissertation is organized in five chapters. Following this introduction, Chapter 2 gives the background on 3D-ICs and micro-fluidic cooling. In that chapter, we briefly introduce the fundamentals of micro-fluidic cooling, as well as the thermal and power modeling of 3D-ICs with micro-fluidic cooling. Chapter 3 discusses the design considerations of micro-fluidic cooling in 3D-ICs and presents three micro-channel heat sink configurations that address these considerations. A micro-fluidic cooling based dynamic thermal management (DTM) scheme is also proposed. In Chapter 4, we investigate the electrical and cooling system co-design methodology. In that chapter, we focus on two aspects of the co-design: a) TSV assignment and micro-channel placement co-optimization, and b) gate sizing and micro-channel co-optimization. Finally, we conclude in Chapter 5 with a summary of the main findings of this work, and consider further prospects of this research field.
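The temperature-leakage feedback discussed in Sections 1.1, 1.4 and 1.5 can be made concrete with a small fixed-point sketch. Everything in it is an illustrative assumption rather than the dissertation's model: a single lumped thermal resistance `r_th_k_per_w` stands in for the cooling system, leakage is taken to grow exponentially with temperature, and all parameter values are made up.

```python
import math


def leakage_w(t_c, p_leak0_w=10.0, t_ref_c=45.0, beta_per_k=0.02):
    """Assumed exponential leakage model: p_leak0_w at t_ref_c, growing with T."""
    return p_leak0_w * math.exp(beta_per_k * (t_c - t_ref_c))


def steady_state(p_dyn_w, r_th_k_per_w, t_amb_c=45.0,
                 tol=1e-9, max_iter=1000, t_limit_c=1000.0):
    """Iterate T = T_amb + R_th * (P_dyn + P_leak(T)) until it stops changing.

    Returns (temperature in C, leakage power in W). Raises RuntimeError when
    the iteration runs away or does not settle within max_iter steps.
    """
    t = t_amb_c
    for _ in range(max_iter):
        t_new = t_amb_c + r_th_k_per_w * (p_dyn_w + leakage_w(t))
        if t_new > t_limit_c:
            raise RuntimeError("thermal runaway for these parameters")
        if abs(t_new - t) < tol:
            return t_new, leakage_w(t_new)
        t = t_new
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    # A lower thermal resistance models stronger cooling (hypothetical numbers).
    for r_th in (0.5, 0.2):
        t, p_leak = steady_state(p_dyn_w=50.0, r_th_k_per_w=r_th)
        print(f"R_th={r_th} K/W: T={t:.2f} C, leakage={p_leak:.2f} W")
```

Under these assumptions, stronger cooling lowers not only the temperature but also the leakage power itself, which is the effect that advantage 4 in Section 1.5 relies on.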

Chapter 2 Background

2.1 Basics of Three Dimensional Integrated Circuit

The 3D-IC contains two or more layers of active electronic components which are stacked vertically. Figure 2.1 shows a three-tiered stacked 3D-IC. In the 3D-IC, each active layer contains functional units such as cores and caches. The metal layer contains wires that enable communication among different components. There is also a metal-oxide layer above each metal layer. Through-silicon-vias (TSVs) are inserted in the 3D-IC to deliver signal, power and ground among different tiers.

Since several layers of power-dissipating electronic components are stacked vertically in a 3D-IC, its power density is usually higher than that of 2D-ICs, leading to potential thermal issues. Moreover, the thermal conductivity of the oxide layer is low and hence reduces the heat transfer towards the ambient. This exacerbates the thermal problems in 3D-ICs. Hence an important issue with 3D-ICs is the removal of the high density heat resulting from several stacked microprocessor layers. Although current 3D-IC designs are limited to partitioning of memory and datapath across layers, future 3D-IC designs are expected to have significantly more complex architectures and integration levels, associated with very high power dissipation and heat density.

In order to alleviate the thermal issues, micro-channel based liquid cooling and

thermal TSVs have been adopted. As shown in Figure 2.1, micro-channel heat sinks are embedded below the active layers. Liquid is pumped through each channel and takes away the heat generated in the active layers [39][43]. The heated coolant is then cooled down in the heat exchanger and recirculated into the fluid pump for the next cooling cycle. In addition, TSVs, which are usually made of copper and have better thermal conductivity than silicon or metal-oxide, can help improve the conduction of heat between different layers. When the number of signal TSVs is not sufficient, dummy thermal TSVs are inserted to further mitigate the thermal issues.

Figure 2.1: Stacked 3D-IC with micro-channel cooling system

2.2 Fundamental Characteristics of Fluids in Micro-channels

Conservation Law of Fluid Dynamics

The behavior of fluid inside the micro-channels is governed by the conservation laws of fluid dynamics. Considering the control volume of fluid U and its surface S

(as shown in Figure 2.2), the fluid flow in the control volume is governed by the following mass, momentum and energy conservation equations [87][62][85][37][78]:

\[
\begin{aligned}
\text{Mass conservation:}\quad & \frac{\partial \rho}{\partial t} + \nabla\cdot(\rho\,\vec{v}) = 0 \\
\text{Momentum conservation:}\quad & \rho\left(\frac{\partial \vec{v}}{\partial t} + \vec{v}\cdot\nabla\vec{v}\right) = -\nabla p + \mu\nabla^{2}\vec{v} \\
\text{Energy conservation:}\quad & C_v\frac{dT}{dt} + \nabla\cdot(-k_f\nabla T) + C_v\,\vec{v}\cdot\nabla T = P
\end{aligned}
\tag{2.1}
\]

Here $\vec{v}$ is the flow velocity vector, $T$ is the fluid temperature, $P$ is the volumetric heat generation rate, and $p$ is the pressure inside the fluid. Also, $\rho$, $\mu$, $C_v$ and $k_f$ are the density, viscosity, volumetric specific heat and thermal conductivity of the fluid, respectively.

Figure 2.2: Control volume of fluid

Dimensionless Numbers in Fluid Mechanics

The governing equations above are complex partial differential equations (PDEs). Researchers in fluid mechanics have introduced a set of dimensionless numbers that help simplify the problem and clarify the relative importance of forces, energies, or time scales [87][55]. These include the Reynolds number (Re), the Prandtl number (Pr) and the Nusselt number (Nu).
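As a hedged illustration of how such a number arises, the momentum equation in (2.1) can be rescaled with a characteristic length $L_c$ and a mean velocity $V$. This is the standard textbook step, not a derivation reproduced from the dissertation, and it assumes constant density and viscosity; the starred symbols and the pressure scale $\rho V^2$ are notation introduced here. $V$ plays the role of the mean fluid velocity written $v$ in the definition of Re.

```latex
\vec{v}^{*} = \frac{\vec{v}}{V}, \qquad
\nabla^{*} = L_c \nabla, \qquad
t^{*} = \frac{t V}{L_c}, \qquad
p^{*} = \frac{p}{\rho V^{2}}

\rho\left(\frac{\partial \vec{v}}{\partial t} + \vec{v}\cdot\nabla\vec{v}\right)
  = -\nabla p + \mu \nabla^{2}\vec{v}
\;\Longrightarrow\;
\frac{\partial \vec{v}^{*}}{\partial t^{*}} + \vec{v}^{*}\cdot\nabla^{*}\vec{v}^{*}
  = -\nabla^{*} p^{*} + \frac{1}{Re}\,\nabla^{*2}\vec{v}^{*},
\qquad Re = \frac{\rho V L_c}{\mu}
```

After rescaling, the only remaining parameter is Re: a large Re makes the viscous term small relative to the inertial terms, while a small Re makes viscosity dominate.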

Reynolds number Re: The Reynolds number measures the ratio of inertial forces to viscous forces, and is defined as:

\[
Re = \frac{\rho v L_c}{\mu} \tag{2.2}
\]

where $v$ is the mean fluid velocity and $L_c$ is the characteristic length. In straight micro-channels, the characteristic length is usually given by the hydraulic diameter $D_h$. When the cross section of the channel is circular, $D_h$ is the diameter of the cross section, while in rectangular channels $D_h$ is defined as $D_h = 4 \times \text{cross sectional area}/\text{perimeter} = 4xz/(2x + 2z)$, where $x$ and $z$ are the width and height of the micro-channel. The Reynolds number is usually used to distinguish between laminar and turbulent flow, as explained later.

Prandtl number Pr: The Prandtl number is the ratio of momentum diffusivity (kinematic viscosity) to thermal diffusivity:

\[
Pr = \frac{\text{kinematic viscosity}}{\text{thermal diffusivity}} = \frac{\mu/\rho}{k_f/(\rho c_v)} = \frac{c_v \mu}{k_f} \tag{2.3}
\]

where $c_v = C_v/\rho$ is the specific heat per unit mass.

Nusselt number Nu: The Nusselt number is the ratio of convective to conductive heat transfer across the boundary between the fluid and the solid. It is defined as:

\[
Nu = \frac{h L_c}{k_f} \tag{2.4}
\]

where $h$, $L_c$ and $k_f$ are the convective heat transfer coefficient, channel characteristic

length and fluid thermal conductivity. Usually, Nu is used to calculate the convective heat transfer coefficient $h$. Many works have characterized the Nusselt number in micro-channels and expressed it as a function of the Reynolds number and the Prandtl number [15][5][89][94].

Single and Two Phase Flow

The working fluid in the micro-channel can be either single phase or two phase. Single phase flow consists exclusively of liquid coolant, while two phase flow consists of both liquid and vapor. When the power density is so high that the liquid absorbs a large amount of heat and its temperature rises dramatically, part of the liquid becomes vapor and two phase flow is formed.

Two phase flow exhibits different patterns. Figure 2.3(a)-(f) shows the two phase flow patterns in horizontal channels. When the flow rate is low, the flow usually exhibits a bubbly (Figure 2.3(a)) or plug pattern (Figure 2.3(b)). As the flow rate increases, the pattern becomes stratified (Figure 2.3(c)) and wavy (Figure 2.3(d)), and finally slug (Figure 2.3(e)) and annular (Figure 2.3(f)) [82][51].

The evaporation process in a channel is shown in Figure 2.3(g). As the single phase liquid absorbs heat and its temperature rises to the evaporation point, small bubbles appear. As the fluid continues to absorb heat along the channel, plug and slug flows appear. The flow becomes wavy and annular in the end.

Figure 2.4 compares the cooling effectiveness of single and two phase flows.

It plots the solid temperature at the micro-channel outlet location $T_w$ versus the footprint power density $P_a$ for both single and two phase flows at the same pumping power [6]. Two phase flow achieves a lower solid temperature than single phase flow, which indicates that two phase flow has higher cooling effectiveness than single phase flow.

Figure 2.3: (a)-(f) Two phase flow patterns: (a) bubbly, (b) plug, (c) stratified, (d) wave, (e) slug, (f) annular; (g) Evaporation process in a channel

Figure 2.4: Comparison of single and two phase flow (solid temperature $T_w$ in °C versus footprint power density $P_a$ in W/cm², one curve each for single phase and two phase)

Laminar and Turbulent Flow

The flow inside micro-channels can be laminar, turbulent, or transitional [55]. Figures 2.5(a), 2.5(b) and 2.5(c) show these three types of patterns. Laminar flow (Figure 2.5(a)) occurs when fluid flows in parallel layers, with no disruption between the layers. That is, the pathlines of different particles are parallel. It generally occurs in small channels and at low flow velocities. In turbulent flow (Figure 2.5(b)), vortices and eddies appear and make the flow unpredictable. Turbulent flow generally occurs at high flow rates and in larger channels. Transitional flow (Figure 2.5(c)) is a mixture of laminar and turbulent flow, with turbulence in the center of the channel and laminar flow near the edges.

Usually, the Reynolds number is used to predict the type of flow (laminar, turbulent or transitional) in straight channels. For example, as [55] shows: when $Re < 2100$, the flow is laminar; when $2100 < Re < 4000$, it is transitional; when $Re > 4000$, it is turbulent.

When the channel involves more complex structure, the fluid exhibits more

complicated behavior. Figure 2.6 shows how flow that would be laminar in a straight channel behaves in a micro-channel with bends. When fluid enters a channel, it first undergoes a flow development process and, after traveling some distance downstream, becomes fully developed laminar flow. Then, when the flow comes across a bend, it becomes turbulent/developing around the corner and, after traveling some distance downstream, settles into fully developed laminar flow again [68].

Figure 2.5: (a) Laminar flow pattern, (b) Turbulent flow pattern, (c) Transitional flow pattern

Figure 2.6: Fluid in micro-channel with bends
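The hydraulic diameter, the Reynolds number of Equation 2.2 and the straight-channel regime thresholds quoted from [55] can be sketched as follows. This is only an illustration: the function names, the water properties and the channel dimensions are assumptions, and placing the exact boundary values 2100 and 4000 in the transitional band is a choice made here, since the text leaves them unassigned.

```python
def hydraulic_diameter(width, height):
    """D_h = 4 * cross-sectional area / perimeter for a rectangular channel."""
    return 4.0 * width * height / (2.0 * width + 2.0 * height)


def reynolds_number(density, velocity, char_length, viscosity):
    """Re = rho * v * L_c / mu (Equation 2.2)."""
    return density * velocity * char_length / viscosity


def flow_regime(re):
    """Classify straight-channel flow with the thresholds quoted from [55].

    The boundary values themselves are treated as transitional (assumption).
    """
    if re < 2100:
        return "laminar"
    if re <= 4000:
        return "transitional"
    return "turbulent"


# Assumed illustrative values: water near room temperature flowing at 1 m/s
# through a 100 um x 200 um rectangular channel.
RHO = 998.0    # fluid density, kg/m^3
MU = 1.0e-3    # fluid viscosity, Pa*s
d_h = hydraulic_diameter(100e-6, 200e-6)
re = reynolds_number(RHO, 1.0, d_h, MU)
print(d_h, re, flow_regime(re))
```

With channel dimensions of this order the Reynolds number stays far below the laminar threshold, which is consistent with the fully developed laminar assumption used for straight micro-channels later in this chapter.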

2.3 Thermal Modeling of 3D-IC with Micro-fluidic Cooling

Distributed RC Thermal Model

The chip thermal behavior can be modeled by a distributed RC network obtained by partitioning the chip into fine grids. In this network, each grid is represented by a node. The voltage at each node represents the temperature at that grid. The current source at each grid represents the power dissipated at that location, so the chip power profile determines the current injected at each grid. Each resistance represents a heat transfer path between grids, while each capacitor represents the ability to store heat [77]. Figure 2.7 shows an example of the RC network for one 3D-IC layer. In this network, $R_{i,j}$ ($i, j = 1 \ldots 6$) is the thermal resistance of the heat path between grids $i$ and $j$, and $C_i$ is the thermal capacitance of grid $i$. According to this model, the thermal dynamics of each grid $i$ are governed by the following equation:

$$\frac{dT_i}{dt} = -\sum_{j \in N(i)} \frac{T_i - T_j}{R_{i,j} C_i} + \frac{P_i}{C_i}, \quad \forall \text{ grids } i \tag{2.5}$$

Here $T_i$ is the temperature of grid $i$, $P_i$ is the power consumption at this grid, and $N(i)$ is the set of grids adjacent to grid $i$.

Some works are more interested in the steady state thermal behavior. In this case, the thermal model can be simplified to a resistive network that represents the steady state chip thermal behavior. Hence the governing equation in Equation 2.5 can be simplified to a set of linear equations in temperature and power, as Equation

2.6 shows. Given a chip thermal resistive network and power profile, the temperature profile can be estimated by solving the following system of linear equations:

$$G \cdot T = P \tag{2.6}$$

Here $G$ is the thermal conductance matrix determined by the thermal resistance network, and $P = \{P_i, \forall \text{ grid } i\}$ and $T = \{T_i, \forall \text{ grid } i\}$ represent the power and temperature profiles.

Figure 2.7: RC network for 3D-IC thermal modeling

Cooling Performance of Micro-channels

Heat removal through micro-channels is an intricate combination of heat conduction, convection and coolant flow. Consider the micro-channel in Figure 2.8. Heat dissipated in the surrounding regions (basically the active layers) first conducts to the micro-channel sidewalls. The heat is then absorbed by the fluid through convection. The heated fluid is then carried away by the moving flow. These three aspects can be captured by three types of thermal resistances: $R_{cond}$ for conduction, $R_{conv}$ for convection and $R_{heat}$ for fluid flow (as shown

in Figure 2.8).

Figure 2.8: Micro-channel thermal model

Conductive resistance $R_{cond}$: It is determined by the thermal characteristics of the silicon that conducts heat dissipated in the surrounding region to the micro-channel sidewalls. It can be calculated using the model in [77].

Convective resistance $R_{conv}$: It results from the convection of the fluid, which moves heat from the micro-channel sidewalls into the coolant. The convective resistance depends on the fluid properties and on the area available for heat transfer between the micro-channel sidewalls and the fluid. Assume the micro-channel has been discretized into grids along the fluid direction $z$, and that the size of each grid is $x \times y \times z$, as Figure 2.8 shows. Let $R_{conv}$ be the convective resistance between the micro-channel and the sidewalls in each grid. As shown in [84], $R_{conv} = 1/(h A_h)$, where $A_h$ is the surface area for heat transfer in each grid. If we assume that heat can be transferred from all four sidewalls, the surface area of each grid is $A_h = 2z(x + y)$. The parameter $h$ is the convective heat transfer coefficient introduced with the Nusselt number. Given the Nusselt number $Nu$ and the micro-channel dimensions, it is calculated as $h = Nu \cdot k_f / D_h$, where $k_f$ is the fluid thermal conductivity and $D_h$ is the hydraulic diameter. So the convective resistance can be expressed as:

$$R_{conv} = \frac{1}{h A_h} = \frac{D_h}{2\, Nu\, k_f\, z (x + y)} \tag{2.7}$$

Heat resistance $R_{heat}$: The heat resistance represents the heat carried downstream by the moving fluid:

$$R_{heat} = \frac{1}{C_v \rho f} \tag{2.8}$$

Here $f$ is the volumetric flow rate in each channel. It depends on the fluid velocity $v$ and the micro-channel cross-sectional area: $f = \text{velocity} \times \text{cross-sectional area} = v \cdot x \cdot y$. $C_v$ is the fluid specific heat, and $\rho$ is the fluid density.

Overall Thermal Model of 3D-IC with Micro-channels

As indicated earlier, the thermal behavior of micro-channels can also be modeled by a thermal resistance network (Figure 2.8). The parameters of this resistive network can be computed using the equations described above or obtained with experiment-based approaches [54]. The 3D-IC resistive network and the micro-channel network can be combined into a unified model that captures the steady state thermal behavior of 3D-ICs with liquid cooling (Figure 2.9). Other aspects of 3D-ICs, such as the thermal impact of TSVs and the thermal wake effect [54], can also be incorporated in this resistive network.
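As a rough illustration of Equations 2.7 and 2.8, the two fluid-side resistances of one channel grid can be evaluated as below. The function names and all numerical values (Nusselt number, water properties, grid dimensions, velocity) are assumptions chosen only to give plausible magnitudes; they are not taken from the dissertation.

```python
def convective_resistance(nusselt, k_f, x, y, z):
    """R_conv = 1 / (h * A_h) for one grid of size x * y * z (Equation 2.7).

    x, y: channel width and height; z: grid length along the flow direction.
    """
    d_h = 4.0 * x * y / (2.0 * x + 2.0 * y)   # hydraulic diameter
    h = nusselt * k_f / d_h                   # convective heat transfer coeff.
    a_h = 2.0 * z * (x + y)                   # all four sidewalls
    return 1.0 / (h * a_h)


def heat_resistance(c_v, density, velocity, x, y):
    """R_heat = 1 / (C_v * rho * f), with f = v * x * y (Equation 2.8)."""
    flow_rate = velocity * x * y              # volumetric flow rate, m^3/s
    return 1.0 / (c_v * density * flow_rate)


# Assumed illustrative values (SI units), not from the source.
NU = 4.0                 # Nusselt number
K_F = 0.6                # fluid thermal conductivity, W/(m*K)
C_V = 4180.0             # fluid specific heat, J/(kg*K)
RHO = 998.0              # fluid density, kg/m^3
X, Y, Z = 100e-6, 200e-6, 100e-6

r_conv = convective_resistance(NU, K_F, X, Y, Z)
r_heat = heat_resistance(C_V, RHO, 1.0, X, Y)
print(r_conv, r_heat)    # both in K/W
```

Doubling the fluid velocity halves `r_heat` in this model, which is the lever that the pressure drop (and hence pumping power) acts on.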

Figure 2.9: Thermal resistive network of one 3D-IC layer with micro-channels

Thermal Impact of TSVs

Besides micro-fluidic cooling, some works have also proposed using dummy thermal TSVs for 3D-IC temperature reduction [24][17][90]. Because the oxide layer thermally separates the different tiers of a 3D-IC, heat cannot be dissipated effectively between tiers. Dummy thermal TSVs were first proposed by [14] as additional heat dissipation paths to alleviate on-chip temperature issues, and have since been adopted in 3D-ICs [24][17][90]. Since TSV fill materials such as copper usually have much higher thermal conductivity than silicon and oxide, thermal TSVs can enhance the vertical heat transfer between different 3D-IC tiers and to the heat sinks by reducing the effective thermal resistances.

To quantify the thermal effect of a TSV on a 3D-IC, assume a thermal TSV is inserted in a 3D-IC grid as Figure 2.10 shows. The dimension of the grid is $x \times y \times z$, and its original vertical thermal conductivity is $k_y^{old}$. Assuming the cross-sectional area of the thermal TSV is $A_{tsv}$, the vertical thermal conductivity of this grid after inserting the thermal TSV becomes:

$$k_y^{new} = k_{tsv} \frac{A_{tsv}}{x z} + k_y^{old} \left(1 - \frac{A_{tsv}}{x z}\right) = k_y^{old} + (k_{tsv} - k_y^{old}) \frac{A_{tsv}}{x z} \tag{2.9}$$

Since the thermal conductivity of the TSV fill material $k_{tsv}$ is usually larger than the original thermal conductivity $k_y^{old}$ (which is generally the thermal conductivity of silicon and metal-oxide), the thermal conductivity of this grid increases after inserting the thermal TSV. This can result in better heat transfer between different tiers, and thus a more uniform thermal profile.

Figure 2.10: A 3D-IC grid with thermal TSV

2.4 Modeling of Power Consumption

The chip power consumption has two major components: dynamic power and leakage power [86]. Dynamic power results from charging transistor load capacitances when they are switched, while leakage power is the power consumed by transistors when they are idle. At the system level, there are generally three power states. A) Active mode, where the system is performing some operation. In this mode, the chip dissipates both dynamic power and leakage power. B) Standby mode, where the system is idle but ready to execute an operation. In this mode, the circuit dissipates only

leakage power. C) Inactive mode, where the power supply to the circuits is shut down by power gating or other leakage reduction techniques. Only a very small amount of power is dissipated in this mode.

In addition to the chip power consumption, the micro-fluidic heat sink consumes extra power for cooling the chip. This power is consumed by the pump that injects the coolant through the micro-channels, and is called pumping power.

Dynamic Power Consumption

The dynamic power depends on the transistor load capacitances being charged, the rate of switching, the supply voltage, etc. [86]. For each gate $g_i$, the dynamic power can be calculated as $P_{d,i} = \alpha_i C_{d,i} V_{dd}^2 F$, where $C_{d,i}$ is the load capacitance of gate $g_i$, $\alpha_i$ is its average switching activity in each cycle and $F$ is the clock frequency. The output capacitance $C_{d,i}$ is proportional to the gate width $s_i$, hence the dynamic power can also be represented as a function of gate size and clock frequency, as Equation 2.10 shows. In the equation, $\beta_{d,i}$ depends on the switching activity $\alpha_i$, the supply voltage $V_{dd}$, etc.

$$P_{d,i} = \alpha_i C_{d,i}(s_i) V_{dd}^2 F = \beta_{d,i}\, s_i\, F \tag{2.10}$$
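The two forms of Equation 2.10 can be checked against each other with a short sketch. It assumes, purely for illustration, that the load capacitance is a per-unit-width capacitance times the gate size; the names `CAP_PER_WIDTH` and `beta_d` and all numbers are hypothetical.

```python
def dynamic_power(alpha, load_cap, vdd, freq):
    """P_d = alpha * C_d * Vdd^2 * F for one gate, in watts."""
    return alpha * load_cap * vdd ** 2 * freq


def beta_d(alpha, cap_per_width, vdd):
    """Lump switching activity, unit capacitance and supply voltage
    into the single coefficient beta_d of Equation 2.10."""
    return alpha * cap_per_width * vdd ** 2


# Assumed illustrative values, not from the source.
ALPHA = 0.1              # average switching activity per cycle
CAP_PER_WIDTH = 1.0e-15  # load capacitance per unit gate width, F
VDD = 1.0                # supply voltage, V
FREQ = 1.0e9             # clock frequency, Hz
SIZE = 4.0               # gate width s_i, in unit widths

p_direct = dynamic_power(ALPHA, CAP_PER_WIDTH * SIZE, VDD, FREQ)
p_sized = beta_d(ALPHA, CAP_PER_WIDTH, VDD) * SIZE * FREQ
print(p_direct, p_sized)
```

Both expressions give the same power, and either doubling the gate size or doubling the clock frequency doubles it.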

2.4.2 Leakage Power Consumption

Current leaks through transistors even when they are turned off, resulting in leakage power consumption. There are three main components of leakage power: reverse biased junction leakage, sub-threshold leakage and gate oxide tunneling leakage [86]. The junction leakage and sub-threshold leakage increase with temperature, while the gate leakage is rather insensitive to temperature. [47] models the leakage-temperature dependency as:

$$P_{l,i} = \beta_{l,1} T_i^2 e^{\beta_{l,2}/T_i} + \beta_{l,3} \tag{2.11}$$

As shown in [91], the variation of $e^{1/T_i}$ is very small in the normal range of chip operating temperature. Hence, some works approximate the leakage model as a quadratic function of temperature, as Equation 2.12 shows [91]. The quadratic fitting parameters $\varepsilon_1$, $\varepsilon_2$ and $\varepsilon_3$ are obtained from the underlying model in [47]. We tested the accuracy of this quadratic model. Figure 2.11 shows that the quadratic model is very close to the exponential model given in [47].

$$P_{l,i} = \varepsilon_1 T_i^2 + \varepsilon_2 T_i + \varepsilon_3 \tag{2.12}$$

The leakage power is also a linear function of gate width $s_i$ [86]. Hence the overall

leakage power can be modeled as (here $\varphi$ is a constant):

$$P_{l,i} = \varphi\, s_i \left(\beta_{l,1} T_i^2 e^{\beta_{l,2}/T_i} + \beta_{l,3}\right) \approx \varphi\, s_i \left(\varepsilon_1 T_i^2 + \varepsilon_2 T_i + \varepsilon_3\right) \tag{2.13}$$

Figure 2.11: Exponential leakage model versus quadratic leakage model (transistor leakage power in W versus temperature in °C)

According to these power models, a larger gate size results in higher dynamic and leakage power, which leads to a temperature increase. The temperature increase in turn leads to a further increase in leakage power.

Micro-channel Cooling Power

Straight Micro-channels

The power used by micro-channels for chip cooling comes from the work done by the fluid pump to push the coolant into the micro-channels. It is a strong function of the level of heat removal desired. To maintain acceptable thermal levels, an increase in chip power dissipation requires an increase in pumping power $P_{pump}$, which is determined by the pressure drop and the coolant flow rate.

$$P_{pump} = \sum_{n=1}^{N} f_n\, p_n \tag{2.14}$$

Here $N$ is the number of micro-channels, and $p_n$ and $f_n$ are the pressure drop and fluid flow rate of the $n$-th micro-channel. We assume the flow is fully developed laminar flow. The pressure drop in a micro-channel is given by:

$$p = \frac{2 \gamma \mu L v}{D_h^2} \tag{2.15}$$

where $L$ is the channel length, $D_h$ is the hydraulic diameter, $v$ is the fluid velocity, $\mu$ is the fluid viscosity and $\gamma$ is a function of the micro-channel aspect ratio ($y/x$) [42][39].

In this work, we assume that all straight micro-channels have the same width and height. Fluid pumps are usually designed so that all the micro-channels experience the same pressure drop $p$. For a given pressure drop that the pump delivers across all channels, the fluid velocity $v$ can be estimated from the pressure drop relation above. The fluid flow rate $f = v \cdot x \cdot y$ is therefore also a function of the pressure drop $p$ and can be estimated in the same way. Since the pressure drop is the same across all channels and all channels have the same dimensions, the velocity and fluid flow rate are also the same in all channels. Given this, the pumping power can be rewritten as:

$$P_{pump} = N f p = \frac{N\, x\, y\, D_h^2\, p^2}{2 \gamma \mu L} \tag{2.16}$$
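The chain from pressure drop to velocity, flow rate and pumping power in Equations 2.15 and 2.16 can be traced numerically with the sketch below. The value of `GAMMA` is a hypothetical placeholder (in the text it is a function of the aspect ratio taken from [42][39]), and the other numbers are assumed for illustration.

```python
def channel_velocity(p, gamma, mu, length, d_h):
    """Invert Equation 2.15: v = p * D_h^2 / (2 * gamma * mu * L)."""
    return p * d_h ** 2 / (2.0 * gamma * mu * length)


def pumping_power(n_channels, p, gamma, mu, length, x, y):
    """P_pump = N * f * p for N identical straight channels (Equation 2.16)."""
    d_h = 4.0 * x * y / (2.0 * x + 2.0 * y)
    v = channel_velocity(p, gamma, mu, length, d_h)
    f = v * x * y                      # volumetric flow rate per channel
    return n_channels * f * p


# Assumed illustrative values (SI units), not from the source.
GAMMA = 20.0             # hypothetical aspect-ratio factor
MU = 1.0e-3              # fluid viscosity, Pa*s
LENGTH = 0.01            # channel length, m
X, Y = 100e-6, 200e-6    # channel width and height, m

power_1 = pumping_power(10, 1.0e4, GAMMA, MU, LENGTH, X, Y)
power_2 = pumping_power(10, 2.0e4, GAMMA, MU, LENGTH, X, Y)
print(power_1, power_2, power_2 / power_1)
```

Because the flow rate itself grows linearly with the pressure drop, the pumping power grows with its square, while it grows only linearly with the number of channels.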

Micro-channels with Bends

Consider the micro-channel structure shown in Figure 2.6. A bend changes the flow properties, which affects both the cooling effectiveness and the pressure drop. Fully developed laminar flow in the straight part of the channel becomes turbulent/developing around the corner when it meets a 90° bend, and settles back into fully developed laminar flow after traveling some distance downstream (see Figure 2.6). So a channel with bends has three distinct regions: 1) the fully developed laminar flow region, 2) the bend corner, and 3) the developing/turbulent region after the bend [33][68]. The length of the flow developing region is [69]:

$$L_d = \left( \ldots\, \frac{y}{x} \,\ldots\, 0.04\, \frac{y^2}{x^2} \right) Re \cdot D_h \tag{2.17}$$

where $Re$ is the Reynolds number, and $x$, $y$ and $D_h$ are the micro-channel width, height and hydraulic diameter.

The rectangular bend affects the pressure drop. Due to the presence of bends, the pressure drop in the channel is greater than in an equivalent straight channel with exactly the same dimensions. The total pressure drop in a channel with bends is the sum of the pressure drops in the three regions described above (which ultimately depend on how many bends the channel has). Assume $L$ is the total channel length and $m$ is the bend count. Then $m L_d$ is the total length that has developing/turbulent flow and $m x$ is the total length attributed to corners (see Figure 2.6). Hence the effective channel length attributed to fully developed laminar flow is $L - m L_d - m x$. The pressure drop in the channel is the sum of the pressure

drop in each of these regions.

Pressure drop in fully developed laminar region: The total pressure drop in the fully developed laminar region is [42]:

$$p_f = \frac{2 \gamma \mu (L - m L_d - m x) v}{D_h^2} = \frac{2 \gamma \mu L_f v}{D_h^2} \tag{2.18}$$

Here $L_f = L - m L_d - m x$ is the total length of the fully developed laminar region explained above; the other parameters are the same as for straight micro-channels.

Pressure drop in flow developing region: The pressure drop in each flow developing region is $\delta p_d = \frac{2 \mu v}{D_h^2} \int_0^{L_d} \psi(z)\, dz$ [56]. Here $\psi(z)$ is given by $\psi(z) = 3.44 \sqrt{(Re \cdot D_h)/z}$, where $z$ is the distance from the entrance of the developing region in the flow direction. Assuming there are a total of $m$ corners in a given micro-channel, there are $m$ developing regions with the same length $L_d$ in this channel. Substituting the expressions for $\psi(z)$ and $L_d$ into the equation for $\delta p_d$ and evaluating the integral gives the total pressure drop of the developing regions in this micro-channel:

$$p_d = m\, \delta p_d = m K_d \rho v^2 \tag{2.19}$$

where $K_d = 13.76 \left( \ldots\, \frac{y}{x} \,\ldots\, 0.04\, \frac{y^2}{x^2} \right)^{1/2}$ is a constant associated with the aspect ratio $y/x$. Please refer to [33][56] for details.
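The integration step behind Equation 2.19 can be written out as a short worked derivation. It assumes $\psi(z) = 3.44\sqrt{Re \cdot D_h / z}$, which is the form that reproduces the factor 13.76, and uses $\kappa$ as shorthand (introduced here, not in the original) for the aspect-ratio factor in parentheses in Equation 2.17, so that $L_d = \kappa\, Re\, D_h$. The Reynolds number is taken as $Re = \rho v D_h / \mu$, i.e. Equation 2.2 with $L_c = D_h$.

```latex
\begin{aligned}
\delta p_d
  &= \frac{2\mu v}{D_h^2} \int_0^{L_d} 3.44 \sqrt{\frac{Re\, D_h}{z}}\; dz
   = \frac{2\mu v}{D_h^2} \cdot 3.44 \cdot 2 \sqrt{Re\, D_h\, L_d} \\
  &= \frac{13.76\, \mu v}{D_h^2} \sqrt{\kappa}\; Re\, D_h
   = \frac{13.76\, \mu v}{D_h} \sqrt{\kappa}\; \frac{\rho v D_h}{\mu}
   = 13.76 \sqrt{\kappa}\; \rho v^2 .
\end{aligned}
```

Comparing with $\delta p_d = K_d \rho v^2$ gives $K_d = 13.76\, \kappa^{1/2}$, so the viscosity and the hydraulic diameter cancel and the developing-region loss depends only on the aspect ratio, the fluid density and the velocity.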

Pressure drop in corner region: The total pressure drop at all the 90° bends in a micro-channel is given by:

$$p_{90} = m\, \delta p_{90} = m \frac{\rho}{2} K_{90} v^2 \tag{2.20}$$

where $m$ is the number of corners in the channel, $\delta p_{90}$ is the pressure drop at each bend corner and $K_{90}$ is the pressure loss coefficient for a 90° bend, whose value can be found in [33].

Total pumping power: The total pressure drop in a micro-channel with bends is the sum of the pressure drops in the three regions discussed above:

$$p = p_d + p_f + p_{90} = \frac{2 \gamma \mu L_f}{D_h^2} v + m \left(K_d + \frac{K_{90}}{2}\right) \rho v^2 \tag{2.21}$$

From Equation 2.21, the total pressure drop of a micro-channel is a quadratic function of the fluid velocity $v$. For a given pressure difference applied to a micro-channel, we can calculate the associated fluid velocity by solving this quadratic equation. With the fluid velocity, we can then estimate the fluid flow rate $f$, and thus the thermal resistance and pumping power of this channel. Hence the pumping power as well as the cooling effectiveness of micro-channels with bends is a function of 1) the number of bends, 2) the location of channels, and 3) the pressure drop across the channel.

Comparing Equations 2.15 and 2.21, due to the presence of bends, if the same

pressure drop is applied to a straight and a bended micro-channel of the same length, the bended channel will have a lower fluid velocity, which leads to a lower cooling capability. Therefore, to provide the same amount of cooling, we need to increase the overall pressure drop that the pump delivers, which increases the pumping power. On the other hand, bends allow for better coverage in the presence of TSVs.
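Solving Equation 2.21 for the velocity at a given pressure drop amounts to taking the positive root of a quadratic, as in the sketch below. It makes one simplifying assumption that the text does not: the fully developed length $L_f$ is treated as a fixed input, although it depends on $L_d$ and therefore on the velocity itself. The loss coefficients and all other numbers are hypothetical placeholders.

```python
import math


def bend_channel_velocity(p, gamma, mu, rho, l_f, d_h, m, k_d, k_90):
    """Positive root of a*v^2 + b*v - p = 0 from Equation 2.21.

    Simplification (assumption): l_f is held fixed instead of being
    recomputed from the velocity-dependent developing length L_d.
    """
    a = m * (k_d + k_90 / 2.0) * rho          # developing + corner losses
    b = 2.0 * gamma * mu * l_f / d_h ** 2     # fully developed laminar loss
    if a == 0.0:                              # no bends: Equation 2.15
        return p / b
    return (-b + math.sqrt(b * b + 4.0 * a * p)) / (2.0 * a)


# Assumed illustrative values (SI units), not from the source.
GAMMA, MU, RHO = 20.0, 1.0e-3, 998.0
L_F = 0.01                                    # fully developed length, m
D_H = 4.0 * 100e-6 * 200e-6 / (2 * 100e-6 + 2 * 200e-6)
K_D, K_90 = 1.0, 1.2                          # hypothetical loss coefficients
P = 1.0e4                                     # applied pressure drop, Pa

v_straight = bend_channel_velocity(P, GAMMA, MU, RHO, L_F, D_H, 0, K_D, K_90)
v_bended = bend_channel_velocity(P, GAMMA, MU, RHO, L_F, D_H, 4, K_D, K_90)
print(v_straight, v_bended)
```

With the same laminar term, every added bend contributes a positive quadratic loss, so the velocity at a fixed pressure drop can only go down, which is the effect described above.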

Chapter 3 Design of Micro-fluidic Cooling Configurations for 3D-ICs

3.1 Motivation of Micro-Fluidic Cooling

The coming years will witness a significant increase in CPU power dissipation due to advanced multi-core architectures and 3D integration technologies. The thermal problem in 3D-ICs is even more severe than in 2D circuits, because the stacked architecture usually leads to a higher power density. Moreover, the thermal conductivity of the oxide layer is low, which reduces heat conduction towards the ambient. Conventional air cooling has been shown to be insufficient for future high performance 3D-ICs, even with sophisticated DTM schemes [8]. As a result, more effective active cooling schemes are being investigated for high performance 3D-ICs [39][43]. Micro-channel cooling, which integrates micro-channel heat sinks into each tier of the 3D-IC and uses liquid flow to remove heat from within the 3D chip, is an effective active cooling scheme for 3D-ICs. It has been reported to support heat dissipation higher than 700 W/cm² with single phase flow [84]. When the working fluid is two phase flow, the heat removal rate is even higher.

Figure 3.1: Micro-channel and TSV configuration

3.2 Micro-channel Design Considerations/Constraints

As shown in Figure 2.1, each tier of a 3D-IC contains an active silicon layer and a silicon substrate. The micro-channels are placed horizontally in the silicon substrate. TSVs such as power/ground TSVs and signal TSVs are incorporated for communication between layers and for delivery of power and ground. Figure 3.1 shows a possible configuration of micro-channels and TSVs in the silicon substrate of a 3D-IC [40][45]. In each 3D-IC tier, micro-channels are etched in the inter-layer region (silicon substrate). Fluidic channels (fluidic TSVs) go through all the tiers and deliver coolant to the micro-channels. TSVs also go through the silicon substrate vertically to deliver signal, power and ground.

Although the micro-channel heat sink is capable of achieving good cooling performance, many problems need to be addressed when designing the micro-channel infrastructure for cooling a 3D-IC, so as to ensure the reliability of the chip and to improve the effectiveness of the micro-channels [72].

Figure 3.2: Pumping power versus chip power consumption (pumping power $P_{pump}$ in W versus total 3D-IC chip power in W)

Cooling Power Consumption

Micro-fluidic cooling is active by nature. That is, the fluid pump consumes extra energy to push the coolant through the micro-channels (we call this pumping power consumption). The pumping power can be quite significant. Figure 3.2 shows the pumping power required to keep the 3D chip below the temperature constraint (85 °C) for different chip power profiles, using the conventional approach of spreading straight micro-channels all over each tier. For each power profile, we find the minimum pressure drop required to keep the chip temperature within the constraint and then estimate the pumping power under this pressure drop using the pumping power model for straight micro-channels. As the figure shows, the pumping power needed to keep the chip temperature within acceptable levels increases very fast as the total chip power increases. Therefore controlling the micro-channel pumping power is very important.

Non-uniform Power Profile

The underlying heat dissipated in each active silicon layer exhibits great non-uniformity [39][60]. Such non-uniformity in the power profile results in hotspots in

thermal profiles. Therefore, when designing the micro-channel heat sink infrastructure, one should account for this non-uniformity in the thermal and power profiles. Simply minimizing the total equivalent thermal resistance of the micro-channels without considering the non-uniformity of the power profile leads to a suboptimal design. For example, conventional approaches to micro-channel design cover the entire surface to be cooled with channels, and find the width and height of the micro-channels that minimize the overall thermal resistance [84][42]. This approach helps reduce the peak temperature around the hotspot region, but it overcools areas that are already sufficiently cool. This is wasteful from the point of view of pumping power.

TSV Constraint

3D-ICs impose significant constraints on how and where the micro-channels can be located, due to the presence of TSVs, which allow different layers to communicate. As illustrated in Figure 3.1, micro-channels are allocated in the inter-layer bulk silicon regions. TSVs also exist in this region, causing a resource conflict. A 3D-IC usually contains thousands of TSVs, which are incorporated with clustered or distributed topologies [26][57]. These TSVs form obstacles to the micro-channels, since micro-channels cannot be placed at the locations of TSVs. The presence of TSVs therefore limits the space available for micro-channels, and the design of the micro-channel infrastructure should take this fact into consideration.

3.2.4 Thermal stress

The TSV fill materials are usually different from silicon. For example, copper has low resistivity and is therefore widely used as the TSV fill material. Because the annealing temperature is usually much higher than the operating temperature, thermal stress appears in the silicon substrate and the TSV after cooling down to room temperature, due to the thermal expansion mismatch between copper and silicon [92][7]. This thermal stress may cause reliability problems such as cracking. Moreover, as shown in [92][28], thermal stress also significantly influences electron/hole mobilities, and hence changes the gate delay. Therefore, if gates on critical paths are placed near TSVs (basically regions with high thermal stress), timing violations may occur.

Micro-channels influence the temperature around TSVs and therefore the thermal stress, thereby changing the mechanical reliability analysis and timing analysis of a 3D-IC with TSVs. For example, Figure 3.3 shows the thermal stress inside and surrounding a TSV under different thermal conditions. Figure 3.3(a) depicts the thermal stress when the chip temperature is 100 °C and the annealing temperature (which is basically the stress free reference temperature) is 250 °C. The figure shows that large thermal stress (up to 490 MPa) appears around the TSV. Figure 3.3(b) depicts the thermal stress when the chip temperature is 50 °C. In this case, the overall thermal stress is higher than in the previous case (where the chip temperature is 100 °C), and the maximum thermal stress reaches 670 MPa. This indicates

that a reduction in chip temperature results in an increase in thermal stress. Hence the presence of micro-channels, which generally reduces the chip temperature, may increase the TSV-induced thermal stress. This effect should be considered when designing the micro-channel infrastructure.

Figure 3.3: Thermal stress inside and surrounding TSV (a) when chip temperature is 100 °C, (b) when chip temperature is 50 °C (assuming stress free temperature is 250 °C)

Moreover, if micro-channels are placed too close to the TSVs, the silicon walls between the TSVs and micro-channels are more likely to crack because the walls are thin. These facts further limit the locations of micro-channels.

In this chapter, we propose three micro-channel structures (cooling configurations) to improve the cooling effectiveness while still satisfying the design constraints imposed on micro-channels. These three structures are: non-uniform (hotspot optimized) micro-channels [76], bended (TSV-constrained) micro-channels [73] and a hybrid cooling network [75]. We also investigate a micro-channel based dynamic thermal management scheme that controls the runtime chip temperature by tuning the pressure drop (fluid flow rate) through the micro-channels [71].

3.3 Hotspot Optimized Non-Uniform Micro-channel

The first configuration is hotspot-optimized, non-uniformly distributed micro-channels [76]. In this work, we start from regular straight micro-channels. According to the micro-channel thermal model in Section 2.3.2, the cooling effectiveness of micro-channels depends on the dimensions and distribution of the micro-channels, as well as on the fluid flow rate through them. The pumping power required by the micro-channels also depends on these parameters, as Equation 2.16 shows. Here we assume the micro-channel width and height are fixed; the optimal micro-channel width and height were investigated in [84][42]. In this case, designing the optimal micro-channel structure amounts to deciding the count and distribution of micro-channels. For a given pressure drop, increasing the number of micro-channels increases the coverage of the cooling system and thereby improves the heat removal rate. But it also leads to a linear increase in total pumping power.

3.3.1 Problem Formulation

Given a 3D-IC design, its power distribution is a function of the architecture and application. Assume the power profile is given (this assumption will be generalized later), and that we know a set of potential target locations for micro-channels (see Figure 3.4; all locations containing TSVs have been removed from this set for the sake of illustration). We want to find the number and locations of channels such that the temperature all over the chip is within acceptable limits while minimizing the number of channels (assuming the pressure drop $p$ is fixed). With the placement vector $B$ as the unknown, the problem is formulated as follows:

$$\min_{B} \; N = \mathrm{sum}(B) \quad \text{s.t.} \quad T(B) \le T_{max} \tag{3.1}$$

Here, $B = \{B_1, B_2, \ldots, B_N\}$ is a vector representing the locations of all micro-channels. Assuming we know the set of potential micro-channel locations, each element $B_n$ ($n = 1 \ldots N$) in $B$ corresponds to one of these locations and its value is assigned as:

$$B_n = \begin{cases} 1, & \text{micro-channel exists in this location} \\ 0, & \text{otherwise} \end{cases} \tag{3.2}$$

$N$ is the total number of micro-channels placed. When the pressure drop $p$ is given, the pumping power depends only on the total number of micro-channels $N$, so the objective in Equation 3.1 effectively minimizes the pumping power. For a given

allocation of micro-channels, the thermal resistive network can be used to estimate the temperature profile by dividing the 3D-IC into grids (using the approach illustrated in Section 2.3.3). Finding the optimal locations of micro-channels is a complex discrete problem. We now describe an iterative heuristic that finds a good solution.

Heuristic for Micro-channel Placement

Algorithm 1 gives the basic framework of our heuristic, which is based on iterative improvement. We start by finding a set of potential locations for micro-channels, as Figure 3.4 shows. Note that all the locations containing TSVs and other structures are removed from this set for the sake of illustration. In reality the potential locations would be limited by TSV locations, etc. The detailed approach for finding the potential micro-channel locations is given in the subsection on the workload-balanced initial micro-channel distribution. In the initial micro-channel design, micro-channels are placed at all the potential locations. We assign each micro-channel a cost which represents the impact of removing that micro-channel on the thermal profile. Given the initial design and the micro-channel costs, the algorithm iteratively removes micro-channels until further removal results in a thermal violation. In each iteration, the micro-channel with the smallest cost is removed. After each micro-channel removal, the costs of the remaining micro-channels need to be updated. This is because the impact of removing a micro-channel on the thermal profile is a function of both the power profile and which micro-channels have been removed so far. A micro-channel that had little impact on the thermal profile while many micro-channels were present in its neighborhood might have a much

higher impact when its neighboring micro-channels have been removed. Since the fluid viscosity $\mu$ is a function of fluid temperature, we also update the value of the fluid viscosity after each iteration. To estimate the new viscosity, we calculate the average fluid temperature among all channels and look up the associated viscosity in the table in [34].

Figure 3.4: Potential locations of micro-channels: (a) uniform spreading of micro-channels, (b) workload-balanced micro-channel spreading

Algorithm 1 Heuristic for micro-channel placement
Starting from micro-channels placed at all potential locations:
1. Initialize the cost (defined below) for each micro-channel;
2. Set viscosity $\mu = \mu(T_{in})$, where $T_{in}$ is the coolant inlet temperature;
3. Repeat:
4.   Remove the micro-channel with the lowest cost;
5.   Generate the new resistive thermal model;
6.   Estimate the temperature profile $T$;
7.   If $T \le T_{max}$, update cost and viscosity, and go to step 2;
8.   Else stop.

The complexity in this optimization problem comes from the fact that as we change the location of channels, the underlying thermal resistive network changes.
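The removal loop of Algorithm 1 can be sketched as below. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the thermal evaluation is passed in as a callback, the cost of a channel is taken here to be the rise in peak temperature caused by removing it (the actual cost definition lies outside this excerpt), the viscosity update is omitted, and `toy_peak_temperature` is a made-up stand-in for solving the resistive network of Equation 2.6.

```python
def place_channels(candidates, peak_temperature, t_max):
    """Greedy removal: start with a channel at every candidate location and
    keep removing the channel with the lowest cost while the peak
    temperature stays within t_max. Returns the surviving locations."""
    active = set(candidates)
    if peak_temperature(active) > t_max:
        return None                      # infeasible even with all channels
    while len(active) > 1:
        base = peak_temperature(active)
        # Hypothetical cost: peak temperature rise if this channel is removed.
        # Costs are recomputed every iteration because they depend on which
        # channels are still present.
        costs = {c: peak_temperature(active - {c}) - base
                 for c in sorted(active)}
        victim = min(costs, key=costs.get)
        trial = active - {victim}
        if peak_temperature(trial) > t_max:
            break                        # this removal would violate T_max
        active = trial
    return active


# Toy stand-in thermal model (assumption): five cells in a row, a hotspot in
# cell 2, and each channel cools a cell with strength 1 / (1 + distance).
POWER = {0: 1.0, 1: 1.0, 2: 8.0, 3: 1.0, 4: 1.0}


def toy_peak_temperature(active):
    if not active:
        return float("inf")
    worst = 0.0
    for cell, power in POWER.items():
        coverage = sum(1.0 / (1 + abs(cell - c)) for c in active)
        worst = max(worst, 30.0 + 10.0 * power / coverage)
    return worst


chosen = place_channels(range(5), toy_peak_temperature, t_max=75.0)
print(sorted(chosen), toy_peak_temperature(chosen))
```

In this toy case the channels farthest from the hotspot are removed first and the three channels around the hotspot survive, which mirrors the intent of hotspot-optimized placement. Each cost evaluation calls the thermal model once, which is why the number of iterations matters so much when the real model is a fine-grained resistive network.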

In order to estimate the thermal impact, we need to solve Equation 2.6 every time we have a new resistive network. Although estimating the thermal profile has linear complexity, it can still be expensive because of the fine granularity of the grid. The success (both performance and runtime) of this algorithm critically depends on how the potential micro-channel locations are distributed (which basically decides the initial micro-channel distribution) and on how the micro-channel cost is assigned and updated. In the next three subsections, we discuss these aspects and investigate ways to improve the efficiency of the algorithm (basically by reducing the number of iterations it requires).

Workload-balanced Initial Micro-channel Distribution

The micro-channel placement heuristic starts with an initial distribution in which micro-channels are placed at all the potential locations, and then iteratively removes micro-channels. The complexity of the algorithm mostly comes from the thermal estimation in each iteration. Hence we should reduce the number of iterations required by the algorithm (while still maintaining its performance), which critically depends on how the potential micro-channel locations are distributed. In this section, we therefore investigate how to find a good initial micro-channel distribution, which amounts to finding a set of potential micro-channel locations.

As shown in [39] and [60], the underlying heat dissipated in each active silicon layer exhibits great non-uniformity. For example, typical CPU designs are generally very hot in areas surrounding the ALU and cooler around caches. Therefore, spreading

micro-channels all over the 3D chip or using an arbitrary initial micro-channel distribution may result in an imbalance in micro-channel cooling workloads and waste pumping power. For example, for the 3D-IC shown in Figure 3.4, the height of each arrow in the active silicon layer (which dissipates power) indicates the power density. If the potential micro-channel locations are spread over the entire chip as Figure 3.4(a) shows, the regions with higher power density are covered by a similar number of micro-channels as the lower power regions. Since all channels have the same pressure drop and dimensions (and therefore provide the same cooling capability), cooling the higher power density region requires increasing the pressure drop or the dimensions of all channels, which is unnecessary for the low power regions and wastes pumping power.

Therefore, we consider spreading the potential micro-channel locations according to the spatial variations in the power/thermal profiles on chip. Intuitively, in those locations where the potential cooling workload is high, we try to place more micro-channels, as Figure 3.4(b) shows. In other words, each micro-channel should absorb the same or a similar amount of heat. This initial distribution can then be further optimized by the iterative approach described earlier. The problem of finding the initial micro-channel distribution is formally stated as follows:

Problem Statement: Given a 3D-IC and a power profile, we would like to find $N$ potential micro-channel locations in the micro-channel layers such that all the channels absorb the same amount of heat.

The amount of heat each micro-channel absorbs can be estimated as follows. Assume the 3D-IC is divided into grids and modeled as a thermal resistive network (as in Figure 2.9). The heat absorbed by micro-channel $(i, j)$ is:

$$P_{heat,i,j} = \sum_{(i_2,j_2,k_2) \in G(i,j)} \; \sum_{(i_1,j_1,k_1) \notin G(i,j)} I(i_1,j_1,k_1;\, i_2,j_2,k_2) \tag{3.3}$$

Here $I(i_1,j_1,k_1;\, i_2,j_2,k_2)$ is the heat (current) flowing from grid $(i_1,j_1,k_1)$ to grid $(i_2,j_2,k_2)$ (note that $(i_1,j_1,k_1)$ and $(i_2,j_2,k_2)$ must be neighboring grids), and $G(i,j)$ is the set of grids covered by micro-channel $(i,j)$ (the micro-channel located at the i-th/j-th grid in x/y direction). Therefore $(i_2,j_2,k_2)$ is a grid inside micro-channel $(i,j)$ while $(i_1,j_1,k_1)$ is outside it, and $I(i_1,j_1,k_1;\, i_2,j_2,k_2)$ is the heat flowing into micro-channel $(i,j)$ across the boundary between these two grids.

$I(i_1,j_1,k_1;\, i_2,j_2,k_2)$ can be estimated by thermal analysis. For example, if the temperatures at grids $(i_1,j_1,k_1)$ and $(i_2,j_2,k_2)$ are $T_{i_1,j_1,k_1}$ and $T_{i_2,j_2,k_2}$, then $I(i_1,j_1,k_1;\, i_2,j_2,k_2) = (T_{i_1,j_1,k_1} - T_{i_2,j_2,k_2}) / R_{i_1,j_1,k_1}^{i_2,j_2,k_2}$, where $R_{i_1,j_1,k_1}^{i_2,j_2,k_2}$ is the thermal resistance between grids $(i_1,j_1,k_1)$ and $(i_2,j_2,k_2)$ (usually a combination of convective and conductive resistances). Therefore $I(i_1,j_1,k_1;\, i_2,j_2,k_2)$ depends on the micro-channel structure (location and size).

Assuming the total number of potential micro-channel locations $N$ is fixed, we would like to allocate these $N$ micro-channels so that the heat absorbed by each micro-channel ($P_{heat,i,j}$) is the same. The difficulty is that the amount of heat each micro-channel absorbs is hard to determine before micro-channel placement, since the locations of the micro-channels and the pressure drop strongly influence the direction of heat flow and thereby the heat each micro-channel absorbs. Therefore, we use a minimum cost flow based heuristic to find a good initial micro-channel

density distribution.

Formulation of minimum cost flow problem: To form the minimum cost flow problem, we first divide the 3D-IC into coarse grids, each of which can contain several micro-channels. We would like to decide the density distribution of the potential micro-channel locations among the grids, that is, the number of micro-channels in each grid. Note that, since a micro-channel spans the whole chip in z direction, the number and locations of micro-channels in grids at the same (x, y) position are the same. So we use $N_{i,j}$ to denote the number of channels in the i-th/j-th grids in x/y direction (note that the grid network is coarse).

The density of micro-channels should be proportional to the potential cooling workload for the micro-channels in the region. After dividing the 3D-IC into grids, we perform a thermal analysis on this grid division assuming there are no micro-channels, and estimate the temperature at each grid. Meanwhile, we abstract the 3D-IC structure as an undirected graph. Figure 3.5 gives an example of how we form the minimum cost flow problem from the given 3D-IC structure and thermal profile. Figure 3.5(a) shows a 3D-IC with two active silicon layers and a micro-channel layer in between. This 3D-IC is divided into coarse grids; the associated graph, which captures the 3D-IC structure, is shown in Figure 3.5(b), and the corresponding minimum cost flow problem is given in Figure 3.5(c). As we can see from Figure 3.5(b), each grid is represented by a node, and each pair of neighboring grids (nodes) is connected by an undirected

edge. Based on this graph and the temperature profile, the minimum cost flow problem is formed as follows:

Figure 3.5: Example of formulating the mincost flow network: (a) 3D-IC structure, (b) abstract grid graph, (c) minimum cost flow network

Nodes:
a) Each node $(i,j,k)$ in the active silicon layer forms a source node with $a_{i,j,k} = \max\{0,\, T_{i,j,k} - T_{in}\}$ units of flow available, where $T_{i,j,k}$ is the temperature at grid $(i,j,k)$ and $T_{in}$ is the constant fluid inlet temperature. As shown in Figure 3.5(b), the active layer nodes are represented by black dots; they become source nodes in the minimum cost flow problem in Figure 3.5(c).
b) There is a single sink node with demand $\sum_{(i,j,k) \in \text{active layer}} a_{i,j,k}$. This node is represented by a black square in the minimum cost flow network in Figure 3.5(c).
c) Each of the other grids/nodes is represented by an intermediate node (gray dots in Figure 3.5(c)).

Edges:
a) As in the graph in Figure 3.5(b), each pair of neighboring nodes in the minimum cost flow network in Figure 3.5(c) is connected by an edge, and the edges are bi-directional (they can take heat flow in either direction). Each edge has unlimited capacity and a cost which is assigned as:

$$cost(i_1,j_1,k_1;\, i_2,j_2,k_2) = r_1 \, (T_{i_1,j_1,k_1} + T_{i_2,j_2,k_2})/2 \tag{3.4}$$

Here $cost(i_1,j_1,k_1;\, i_2,j_2,k_2)$ denotes the cost of the edge connecting nodes $(i_1,j_1,k_1)$ and $(i_2,j_2,k_2)$. The cost is decided by the average temperature of the two neighboring nodes, and $r_1$ is a constant scaling factor.

b) All the nodes in the micro-channel layers are connected to the sink node with the capacity and cost defined as follows:

$$\begin{aligned} \text{capacity:}\quad & cap(i,j,k) = r_2 V - r_3 \, n^{TSV}_{i,j,k} \\ \text{cost:}\quad & cost(i,j,k;\, sink) = r_4 \, T_{i,j,k} \end{aligned} \tag{3.5}$$

Here $V$ is a constant representing the maximum number of micro-channels each grid can contain, $n^{TSV}_{i,j,k}$ is the number of TSVs in grid $(i,j,k)$, $r_2$ and $r_3$ are constant scaling factors, and $r_4$ is a small constant. The edge capacity is decided by the maximum number of micro-channels the grid can contain, which depends on the number of TSVs in the grid. TSVs reduce the capacity of a grid since micro-channels cannot be placed where there are TSVs.

The minimum cost flow problem sends the flows from the source nodes to the sink node through some of the edges so that the total cost of the selected edges is minimized. The solution of the minimum cost flow gives the amount of flow ($e_{i,j,k}$) that passes through each micro-channel layer node $(i,j,k)$. Assuming $N$ is the total number of potential micro-channel locations that we would like to find, the

number of micro-channels in grids $(i,j,k)$ is assigned as follows:

$$N_{i,j} = \mathrm{round}\left( \frac{\sum_{k} e_{i,j,k}}{\sum_{i,j,k} e_{i,j,k}} \, N \right) \tag{3.6}$$

The round() function rounds a fractional number to the nearest integer. After obtaining the number of micro-channels in each grid, we place that many micro-channels uniformly in the grid. That is, $N_{i,j}$ micro-channels are uniformly distributed in grids $(i,j,k)$ (recall that we use a coarse grained grid structure). Figure 3.4(b) shows such a workload-balanced micro-channel distribution. The grids with higher power density are allocated more channels, and within each grid $(i,j,k)$ the $N_{i,j}$ micro-channels are spread uniformly as long as TSVs do not block their placement. To account for the presence of TSVs during micro-channel placement, no micro-channel is allocated at a location if there are TSVs anywhere along it.

Micro-channel Cost Assignment

Given the initial micro-channel distribution, we iteratively remove micro-channels to save pumping power, as Algorithm 1 shows. To determine the order in which micro-channels are removed, we assign a cost to each micro-channel, which indicates the cost of removing it. In each iteration, the micro-channel with the smallest cost is removed. After each removal, the cost of the remaining micro-channels is updated. In this subsection, we discuss how the micro-channel cost is assigned and updated.

Defining Micro-channel Cost: The temperature at an on-chip location largely depends on the power dissipated in that region and in its neighboring regions. Thus, we use a weighted power based approach for micro-channel cost assignment. Each micro-channel should absorb the heat generated in the regions right below and above itself in the active layers, and also the heat generated in their near neighbors. To assign the cost of micro-channels, we define a region of influence (ROI) for each potential micro-channel. The ROI of a micro-channel is the region to which this channel provides cooling (that is, the regions right below and above this channel in the active layers, and their near neighbors). The dark region in Figure 3.6(a) shows the ROI of micro-channel 3. We divide the 3D-IC into fine grained grids, each of which contains at most one micro-channel. Let $W_{i,j}$ denote the cost of the micro-channel located at position $(i,j)$ (the i-th grid in x direction in micro-channel layer $j$). It is assigned as the weighted sum of the power dissipated in its ROI:

$$\begin{aligned} W_{i,j} = {} & u_1 \Big( w_0 P_{i,j+1} + \sum_{b=1}^{b_{max}} w_b \,(P_{i+b,j+1} + P_{i-b,j+1}) \Big) \\ & + u_2 \Big( w_0 P_{i,j-1} + \sum_{b=1}^{b_{max}} w_b \,(P_{i+b,j-1} + P_{i-b,j-1}) \Big) \end{aligned} \tag{3.7}$$

Here $P_{i,j} = \sum_{k} P_{i,j,k}$, where $P_{i,j,k}$ is the power dissipated at grid $(i,j,k)$ (the i-th/j-th/k-th grid in x/y/z direction). In z direction the channel covers the whole chip, so we sum up the power in all grids in z direction (denoted by $P_{i,j}$), and the channel cost is a weighted sum of $P_{i,j}$.

The weight is decided by the distance from the heat source to the micro-channel. In Equation 3.7, $u_1$ and $u_2$ are the vertical weight factors. We assume micro-channels absorb heat from the active layers right above and below them. As Figure 3.6(a) shows, $u_1$ is the vertical weight factor for the power from the active layer above the micro-channel; it is inversely proportional to the vertical distance between the micro-channel and this active layer. Similarly, $u_2$ is the vertical weight factor for the power from the active layer below the micro-channel, and its value is decided in the same way.

Here $w_b$ is the horizontal weight factor. We assume that each channel has a horizontal coverage of $b_{max}$ in x direction, that is, each channel absorbs the heat in the region within a distance of $b_{max}$ from it in x direction. Note that the horizontal distance is measured in x direction since in z direction the channel covers the whole chip. The horizontal weight factor $w_b$ ($b = 1 \ldots b_{max}$) is decided by the distance from the channel to the heat source in x direction (measured by $b$). We set $w_0 = 1$, and $w_b$ decreases monotonically with distance $b$.

Updating Micro-channel Cost: After removing a micro-channel, we update the cost of the remaining channels. After a channel is removed, its neighboring channels have to take care of the region it covered (Figure 3.6(b)), and thus the cost of these neighboring channels should increase. Assuming $(i_0, j)$ is the micro-channel we have just removed (the channel located at the $i_0$-th grid in x direction in layer $j$), we update the cost of the remaining micro-channels in layers $j-2$, $j$ and $j+2$ as Figure 3.6(b) shows (note that layers $j \pm 1$ are active layers). The update function is as follows:

$$\begin{aligned} W_{i,j} &= W_{i,j} + w_{|i_0 - i|} \, W_{i_0,j} \\ W_{i,j\pm 2} &= W_{i,j\pm 2} + u_3 \, w_{|i_0 - i|} \, W_{i_0,j} \end{aligned} \qquad \forall i \ \text{s.t.}\ |i_0 - i| \le b_{max} \tag{3.8}$$

Here $w_b$ is the horizontal weight factor, and $u_3$ is the vertical weight factor decided by the vertical distance between two micro-channel layers.

Figure 3.6: (a) Cost initialization, (b) Cost update

The algorithm iterates until further removal of micro-channels results in a thermal violation. The remaining micro-channels form the final cooling system. The cooling effectiveness of the resulting micro-channel design is given in Section 3.7, which shows that the non-uniform micro-channel design can achieve more than 50% pumping power savings compared with the conventional design. Although significant power savings are achieved, this non-uniform micro-channel structure is still inefficient in dealing with the spatial constraints imposed by TSVs. In the next section, we investigate a TSV constrained micro-channel design that addresses this problem better and saves further pumping power.
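To make the cost bookkeeping of Equations 3.7 and 3.8 concrete, the following Python sketch computes an initial cost and applies the update after a removal. It is a minimal sketch under assumptions that are not from the text: the function names are hypothetical, power rows are plain lists indexed by the x grid, grids outside the chip contribute zero power, and the weight list `w` holds $w_0 \ldots w_{b_{max}}$.

```python
def initial_cost(p_above, p_below, i, w, u1, u2):
    """Eq. 3.7 for the channel at x-index i.

    p_above / p_below: z-summed power rows of the active layers above/below.
    w: horizontal weights [w_0, ..., w_bmax]; u1, u2: vertical weights.
    """
    def weighted(row):
        total = w[0] * row[i]
        for b in range(1, len(w)):
            for ii in (i + b, i - b):
                if 0 <= ii < len(row):      # assumption: off-chip power is 0
                    total += w[b] * row[ii]
        return total

    return u1 * weighted(p_above) + u2 * weighted(p_below)


def add_removed_share(layer_costs, i0, removed_cost, w, scale=1.0):
    """Eq. 3.8: channels within b_max of x-index i0 inherit a weighted share.

    Use scale=1.0 for the layer of the removed channel and scale=u3 for the
    channel layers two levels above or below it.
    """
    for i in layer_costs:
        d = abs(i0 - i)
        if d < len(w):                      # equivalent to d <= b_max
            layer_costs[i] += scale * w[d] * removed_cost


# Toy usage with b_max = 2.
w = [1.0, 0.5, 0.25]
p_above = [0.0, 0.0, 4.0, 0.0, 0.0]
p_below = [1.0, 1.0, 1.0, 1.0, 1.0]
cost_center = initial_cost(p_above, p_below, 2, w, u1=1.0, u2=0.5)
cost_side = initial_cost(p_above, p_below, 1, w, u1=1.0, u2=0.5)

layer_j = {0: 1.0, 1: 2.0, 2: 4.0, 5: 8.0}   # costs keyed by x-index
layer_j2 = {1: 1.0, 4: 1.0}                  # a neighboring channel layer
removed = layer_j.pop(1)                     # remove channel (1, j)
add_removed_share(layer_j, 1, removed, w)
add_removed_share(layer_j2, 1, removed, w, scale=0.5)  # u3 = 0.5 (assumed)
```

Keeping the same-layer and cross-layer updates in one helper mirrors the two lines of Equation 3.8, which differ only by the factor $u_3$.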

3.4 TSV Constrained Bended Micro-channel

Motivation of Using Bended Micro-channel

The previous configuration uses straight channels that are spread over the areas that demand high cooling. If the spatial distribution of micro-channels is unconstrained, such an approach results in the best cooling efficiency with the minimum cooling energy. However, 3D-ICs impose significant constraints on how and where the micro-channels can be located due to the presence of TSVs, which allow different layers to communicate. A 3D-IC usually contains thousands of TSVs, which are incorporated with clustered or distributed topologies [57]. These TSVs form obstacles to the micro-channels since the channels cannot be placed at the locations of TSVs. Therefore the presence of TSVs restricts the distribution of straight micro-channels. This results in the following problems.

1. As illustrated in Figure 3.7(a), micro-channels may fail to reach thermally critical areas, thereby resulting in thermal violations and hotspots.
2. To fix the thermal hotspots in areas that micro-channels cannot reach, we need to increase the fluid flow rate, resulting in a significant increase in cooling energy.

To address this problem, we investigate micro-channels with bends, as illustrated in Figure 3.7(b). With the bended structure, the micro-channels can reach those TSV-blocked hotspot regions that straight micro-channels cannot reach. This results in better coverage of hotspots and therefore better cooling efficiency and reduced


cooling energy. While micro-channels with bends (or a serpentine organization of micro-channels) have been investigated in the past [68][23], our work is the first to investigate this structure in the context of 3D-ICs and, more specifically, to address the constraint that TSVs impose on the spreading of straight micro-channels [73].

Figure 3.7: Example of silicon layer thermal profile with TSVs and (a) straight, (b) bended micro-channels

Problem Formulation

In this work, we would like to decide the locations and geometry of micro-channels with bended structure so that their cooling effectiveness is maximized. Designing the 3D-IC micro-channel infrastructure is a very complex problem. For example, there are exponentially many ways to incorporate micro-channels with bends, and evaluating their impact on the silicon temperature requires solving a complex system of thermal equations. The specific problem formulation is as follows.

$$\begin{aligned} \min \quad & P_{pump}(e^l_{i,j}, \Delta p) \\ \text{s.t.} \quad & \sum_{j \in N(i)} e^l_{i,j} = 1, && \forall\, \text{grid } i \in \{CI, CO\},\ \forall\, \text{channel layer } l \\ & \sum_{j \in N(i)} e^l_{i,j} = k \in \{0, 2\}, && \forall\, \text{grid } i \notin \{CI, CO, TSV\},\ \forall\, \text{channel layer } l \\ & e^l_{i,j} = 0, && \text{if grid } i \text{ or } j \in \{TSV\},\ \forall\, \text{channel layer } l \\ & T^l_i(e^l_{i,j}, \Delta p) \le T_{max}, && \forall\, \text{grid } i,\ \forall\, \text{channel layer } l \\ & e^l_{i,j} \in \{0, 1\}, && \forall\, \text{grids } i, j,\ \forall\, \text{channel layer } l \\ & e^l_{i,j} = e^l_{j,i}, && \forall\, \text{grids } i, j,\ \forall\, \text{channel layer } l \end{aligned} \tag{3.9}$$

Figure 3.8: Example of micro-channel infrastructure design using minimum cost flow

Figure 3.8 represents the problem formulation graphically. Given a set of stacked silicon layers, some of the intermediate layers between silicon layers contain micro-channels (in Figure 3.8(a), two intermediate layers contain micro-channels). The locations of the input and output orifices for the micro-channels are assumed to be known. We would like to find micro-channel routes from one side to

the other such that the routes do not intersect, avoid TSVs and provide sufficient cooling at minimum pumping energy. We impose a graph on each micro-channel layer as indicated in Figure 3.8(b). In the graph, each grid is represented by a node, and the edges define the immediate neighbors of a node. The micro-channel routing is performed on this graph. If there is a TSV located on a grid, its corresponding neighborhood edges are removed since micro-channels cannot be routed through TSVs.

Let $e^l_{i,j} = 1$ represent the fact that there is a channel connecting grids $i$ and $j$ in the $l$-th micro-channel layer of the 3D-IC (so $i$ and $j$ must be neighboring nodes in the grid graph, and $e^l_{i,j} = e^l_{j,i}$). Neither $i$ nor $j$ may contain a TSV (because TSVs do not allow channels to go through them). In the first constraint, $\{CI, CO\}$ represents the set of input and output orifice nodes, and $N(i)$ represents the set of neighboring nodes of $i$. So the first constraint imposes that each input and output orifice node must be connected to exactly one neighboring grid, so that its incoming/outgoing fluid can be pushed into/out of the micro-channel layer. The next constraint imposes that, for each other grid, either there is a channel going through this grid (and therefore $\sum_{j \in N(i)} e^l_{i,j} = 2$), or no micro-channel goes through it (and therefore $\sum_{j \in N(i)} e^l_{i,j} = 0$). In the third constraint, $\{TSV\}$ represents the set of grids containing TSVs, so micro-channels cannot be routed through these nodes. The following constraint imposes that the temperature is within acceptable limits, and the objective minimizes the pumping power.
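The combinatorial constraints of Equation 3.9 are easy to check for a candidate routing in one channel layer. The sketch below is illustrative only: the function name and data layout (nodes as `(row, col)` tuples, each undirected edge listed once) are assumptions, and the thermal constraint, which needs a thermal solver, is not checked.

```python
from collections import Counter


def satisfies_routing_constraints(edges, orifices, tsv):
    """Check the degree and TSV constraints of Eq. 3.9 for one channel layer.

    edges: list of (node_a, node_b) pairs with e = 1, each listed once.
    orifices: set of inlet/outlet nodes {CI, CO}; tsv: set of TSV nodes.
    """
    degree = Counter()
    for a, b in edges:
        if abs(a[0] - b[0]) + abs(a[1] - b[1]) != 1:
            return False            # edges may only join neighboring grids
        if a in tsv or b in tsv:
            return False            # no channel segment may touch a TSV grid
        degree[a] += 1
        degree[b] += 1
    # Orifice nodes connect to exactly one neighbor.
    if any(degree[n] != 1 for n in orifices):
        return False
    # Every other used grid has a channel passing through it (degree 2);
    # unused grids have degree 0 and never appear in the counter.
    return all(d == 2 for n, d in degree.items() if n not in orifices)


# Toy usage: a bended route that goes around a TSV at (0, 2).
orifices = {(0, 0), (1, 2)}
tsv = {(0, 2)}
bended = [((0, 0), (0, 1)), ((0, 1), (1, 1)), ((1, 1), (1, 2))]
blocked = [((0, 0), (0, 1)), ((0, 1), (0, 2)), ((0, 2), (1, 2))]
ok_bended = satisfies_routing_constraints(bended, orifices, tsv)
ok_blocked = satisfies_routing_constraints(blocked, orifices, tsv)
```

The first route is accepted, while the second is rejected because it passes through the TSV grid.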

Figure 3.9: Micro-channel infrastructure design flow

Overall Micro-channel Design Flow

This is a very complex problem since: 1) the variables need to be discrete, and 2) the thermal and pumping power models are highly nonlinear. In this section we investigate a design methodology for this problem, as illustrated in Figure 3.9. Our methodology follows a sequence of logical steps. First, the severity of the thermal problem and the need for micro-channels are evaluated by performing a full scale thermal analysis. Based on the severity of the thermal problem (location and intensity of hotspots), an initial micro-channel design is developed. This design is then iteratively refined to reduce the cooling power footprint and improve the thermal effectiveness. We now go into the details of these individual steps.

Mincost Flow Based Micro-channel Design

The full scale 3D thermal analysis identifies the locations of hotspots in the different layers which cannot be removed by conventional package/air cooling based approaches. These are the areas which require sufficient proximity to the micro-channels. Since solving the formulation in Equation 3.9 is intractable, we use simple models to come up with a sufficiently good initial micro-channel infrastructure, which is subsequently improved iteratively. To develop this initial solution we use a minimum cost flow formulation.

Initialization of Minimum Cost Flow Network

Consider the 3D-IC and the corresponding grid graph of each micro-channel layer as illustrated in Figure 3.8(a)(b). For each micro-channel layer, we instantiate a minimum cost flow problem as follows (see Figure 3.8(c) for an illustration). The nodes corresponding to the input/output orifices of the given micro-channel layer are assigned a supply/demand of one flow unit. All nodes in the grid graph have a capacity of one. The edges have unlimited capacity and are bi-directional (they can take fluid flow in either direction). As indicated earlier, an edge between two neighboring nodes exists only if neither of the nodes has a TSV. This enforces the routing constraint imposed by TSVs. Figure 3.8(c) shows the flow network for the two micro-channel layers. Each node has a cost, whose assignment is discussed subsequently.

We would like to send flow from the inlet nodes to the outlet nodes such that the capacity constraints are not violated and the cost is minimum. Assigning a node capacity of 1 ensures that all the flow from inlet to outlet follows simple paths (non-intersecting and non-cyclic). A minimum cost flow formulation with a well defined node capacity can be solved using very similar methods as a formulation with edge

capacity alone [65]. It is noteworthy that because there is an edge between each pair of neighboring nodes, the flow path can take several bends if necessary.

Cost Assignment

The cost assignment should be such that the minimum cost flow formulation develops an initial infrastructure that distributes the micro-channels with higher density in areas that demand more cooling. The chip scale thermal analysis identifies the grids in the silicon layers that are in dire need of cooling (see Figure 3.8(a)). A silicon layer is cooled by the micro-channels both above and below it (unless the silicon layer is at the very top or very bottom of the stack). For example, the middle silicon layer in Figure 3.8(a) can be cooled by two micro-channel layers, unlike the top and bottom silicon layers. As illustrated in Figure 3.8(b), each micro-channel layer is represented as a grid graph. The amount of cooling required at a certain node in this graph is a function of how hot the grids above and below it in the silicon layers are. It also depends on how we choose to distribute the cooling demand at a certain location in the silicon layer between the micro-channel layers just above and just below.

Suppose a certain location in the silicon layer has temperature $T \ge T_{max}$ and requires cooling (as estimated by the full scale thermal analysis). Let $uT$ (with $0 \le u \le 1$) represent the fraction of this cooling demand assigned to the micro-channel grid right above, and $(1-u)T$ the cooling demand assigned to the micro-channel grid just below. If $u$ is set very low then most of the cooling will be done by the channel layer below

and vice versa for large $u$. Let $u^l_i$ be the heat load partitioning factor of grid $i$ in silicon layer $l$. It is assigned as follows.

Case 1: If $l$ is the topmost (bottommost) layer, then $u^l_i = 0$ ($u^l_i = 1$), so that all the cooling demand goes to the micro-channel layer right below (above) $l$, which is layer $l-1$ ($l+1$).

Case 2: If $l$ is neither the top nor the bottom layer, then $0 \le u^l_i \le 1$, implying that the heat generated in grid $i$ of silicon layer $l$ needs to be distributed between the two micro-channel layers right above and below. If the channel layers above and below (layers $l+1$ and $l-1$) have the same number of TSVs then $u^l_i = 1/2$; otherwise it is scaled linearly such that more cooling demand is assigned to the micro-channel layer with fewer TSVs.

Given the partitioning factor $u^l_i$, the cost is assigned as follows (see Figure 3.10 for an illustration). Let $cost(i, l)$ denote the cost of node $i$ in micro-channel layer $l$ (hence layers $l-1$ and $l+1$ are the silicon layers just below and above micro-channel layer $l$). Three cases are considered, depending on whether there are hotspots below and above this node in silicon layers $l-1$ and $l+1$.

Case 1: Hotspots on both sides. When grid $i$ in both silicon layers $l-1$ and $l+1$ is in a hotspot region ($T^{l-1}_i > T_{max}$ and $T^{l+1}_i > T_{max}$), the micro-channel should provide cooling to both sides (above and below), so the cost is:

$$cost(i, l) = -\left[ (1 - u^{l+1}_i)\, T^{l+1}_i + u^{l-1}_i \, T^{l-1}_i \right] \tag{3.10}$$

Here the first component inside the square bracket indicates the cooling demand from the silicon grid above and the second component corresponds to the cooling

demand from the silicon grid just below. Higher demand leads to lower cost since we would like micro-channels to pass through regions with high cooling demand. See Figure 3.10 for an illustration.

Figure 3.10: Cost assignment

Case 2: Hotspot on one side. When silicon grid $i$ is in a hotspot region on only one side ($l-1$ or $l+1$, but not both), the cost is assigned as

$$cost(i, l) = \begin{cases} -(1 - u^{l+1}_i)\, T^{l+1}_i, & \text{if } T^{l+1}_i \ge T_{max} \\ -u^{l-1}_i \, T^{l-1}_i, & \text{if } T^{l-1}_i \ge T_{max} \end{cases} \tag{3.11}$$

Case 3: No hotspot on either side. When there is no hotspot on either side, the node cost is assigned a small positive value, $cost(i, l) = \epsilon > 0$.

The minimum cost flow formulation therefore routes flows such that the channels touch as many grids with high cooling demand as possible. The small positive cost assigned to non-hotspot regions makes the minimum cost flow formulation avoid areas that do not demand high cooling.
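The three cost cases can be collected into one small function. The following Python sketch is a hedged illustration: the function and parameter names are hypothetical, `u_above` and `u_below` stand for $u^{l+1}_i$ and $u^{l-1}_i$, and treating $T \ge T_{max}$ as the hotspot condition in all cases is an assumption made here for uniformity.

```python
def node_cost(t_above, t_below, u_above, u_below, t_max, eps=1e-3):
    """Node cost of a micro-channel grid following Eqs. 3.10 and 3.11.

    t_above / t_below: temperatures of the silicon grids above and below.
    u_above / u_below: partitioning factors of those silicon grids.
    eps: small positive cost for grids with no hotspot on either side.
    """
    hot_above = t_above >= t_max    # assumed hotspot test (>= in all cases)
    hot_below = t_below >= t_max
    if not (hot_above or hot_below):
        return eps                  # Case 3: nothing to cool here
    cost = 0.0
    if hot_above:
        # The silicon grid above hands (1 - u) of its demand downward.
        cost -= (1.0 - u_above) * t_above
    if hot_below:
        # The silicon grid below hands u of its demand upward.
        cost -= u_below * t_below
    return cost


# Toy usage with T_max = 85 and an even split (u = 0.5).
both_sides = node_cost(100.0, 90.0, 0.5, 0.5, t_max=85.0)   # Case 1
one_side = node_cost(100.0, 60.0, 0.5, 0.5, t_max=85.0)     # Case 2
no_hotspot = node_cost(60.0, 60.0, 0.5, 0.5, t_max=85.0)    # Case 3
```

A grid with hotspots on both sides gets the most negative cost, so a minimum cost route is drawn toward it first.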

3.4.5 Micro-channel Refinement

The primary objective of the minimum cost flow formulation is to come up with an initial micro-channel design that brings cooling sufficiently close to the hot areas. This is not enough to guarantee effective cooling. For example, some channels may have several bends and/or may be routed over a disproportionately large number of hotspots. Both of these situations degrade the overall cooling quality. In this section we present approaches for iteratively refining the design for improved cooling effectiveness. The micro-channel infrastructure refinement process works as illustrated in Figure

Temperature and Pumping Power Analysis

The impact of micro-channels on the 3D-IC thermal profile is a function of how the micro-channels are routed and how much fluid flow they carry. The initial design generated using the minimum cost flow technique does not prescribe the pressure drop and the fluid flow rate that the channels need to work at. Hence, given the micro-channel design, we need to estimate the smallest pressure drop that the pump needs to work at such that the thermal constraints are satisfied. For a given micro-channel design, the smallest pressure drop results in the smallest pumping energy. As indicated earlier, we assume that all channels are subjected to the same pressure drop by the pump. Hence the minimum pressure drop can be determined by increasing the pressure drop ($\Delta p$) in fixed steps and calculating the thermal profile for each value until the thermal constraints are met. For a given pressure

drop across the pump and a given micro-channel design, Equation 2.21 can be used to determine the velocity (fluid flow rate) in each channel. Note that because each channel has a different number of bends and total length, the flow rates differ as well. Based on the flow rates computed for a given pressure drop, the associated thermal conductance matrix $G$ can be computed. This information can be used to estimate the thermal profile of the 3D-IC for the given pressure drop. After finding the minimum required pressure drop ($\Delta p$), we can calculate the required pumping power. This technique is summarized in Algorithm 2.

Algorithm 2 Finding the minimum required pumping power
1. $\Delta p = \Delta p_{min}$, and repeat steps 2-6:
2. Calculate the fluid velocity using Equation 2.21;
3. Calculate the thermal conductance matrix $G$;
4. Estimate the temperature profile;
5. If a thermal violation occurs, $\Delta p = \Delta p + \delta p$;
6. Else break;
7. Calculate the pumping power.

Iterative Micro-channel Optimization

The objective of the minimum cost flow formulation does not capture the cooling energy or the number of bends in the channels. Figure 3.11 illustrates typical situations that can occur. In Figure 3.11, the two micro-channels have significantly different cooling demands (Figure 3.11(a)) and numbers of bends (Figure 3.11(b)). Such imbalance (in cooling demand and bend count) increases the required pressure drop and thereby the pumping energy. The basic idea is that all the channels should have similar levels of heat load, length and number of bends.
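The pressure sweep of Algorithm 2, on which every refinement iteration relies, can be sketched as follows. This is a minimal sketch under assumptions: `peak_temperature` is a hypothetical callback that wraps steps 2 to 4 (velocity from Equation 2.21, conductance matrix, thermal solve), and the upper bound `dp_limit` is a safeguard added here so that the loop terminates when no pressure drop is sufficient.

```python
def min_pressure_drop(peak_temperature, t_max, dp_min, dp_step, dp_limit):
    """Smallest pressure drop on the sweep grid that meets the thermal limit.

    peak_temperature(dp): assumed thermal model returning the hottest grid
    temperature for pressure drop dp. Returns None if no value up to
    dp_limit satisfies the constraint.
    """
    dp = dp_min
    while dp <= dp_limit:
        if peak_temperature(dp) <= t_max:
            return dp          # first feasible value is the minimum on the grid
        dp += dp_step          # thermal violation: raise the pressure drop
    return None


# Toy usage: peak temperature falls as the pressure drop (in kPa) rises.
def toy_model(dp):
    return 50.0 + 2000.0 / dp


dp_needed = min_pressure_drop(toy_model, t_max=85.0,
                              dp_min=10, dp_step=10, dp_limit=200)
dp_infeasible = min_pressure_drop(toy_model, t_max=40.0,
                                  dp_min=10, dp_step=10, dp_limit=200)
```

The pumping power of step 7 would then be evaluated at the returned pressure drop using the pumping power model of the thesis, which is not reproduced here.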

Hence if a channel has too many bends or goes through many hotspots while others are shorter, then the other channels can be made longer, thereby distributing the heat load more uniformly and also reducing the number of bends in the most critical micro-channel. Based on these considerations, we refine the initial design by 1) balancing the heat loads among micro-channels and 2) reducing unnecessary bends.

Figure 3.11: Examples of (a) unbalanced cooling demand, (b) different number of bends

Micro-channel heat load balancing: Starting from the initial design, we identify the micro-channels which have a disproportionately high heat removal load and spread their heat load onto neighboring channels. Algorithm 3 summarizes the iterative pairwise micro-channel cooling load balancing process.

In the first iteration of pairwise cooling workload balancing, we start from the channel with the highest cooling workload. Here the cooling workload is measured by the total heat absorbed by the micro-channel, which can be calculated as $P = (T_{out} - T_{in})/R_{io}$. Here $T_{in}$ is the fluid supply temperature at the micro-channel inlet, $T_{out}$ is the fluid temperature at the micro-channel outlet, and $R_{io}$ is the total thermal resistance between the fluid inlet and outlet of that specific channel. Given the pressure drop, the power profile of the 3D-IC and the locations and dimensions of the micro-channels, these parameters can be easily calculated (see the discussion in Sections 2.3 and 2.4, as well as reference [76]). Assuming $i$ is the channel with the highest cooling workload, we then pick one of the neighbors of $i$ (either left or right) with a lower cooling workload, say channel $k$, and balance the workload between channels $i$ and $k$.

Algorithm 3 Pairwise micro-channel cooling load balance
Repeat:
1. Pick the micro-channel $i$ with the highest cooling load;
2. Pick a micro-channel $k$ among the neighbors of $i$ with smaller cooling load, that is, $k = \arg\min_{k \in \{i-1,\, i+1\}} load(k)$;
3. Equally divide the hotspot region covered by channels $i$ and $k$, and assign one of the subregions to channel $i$ and the other to channel $k$;
4. Remove some edges on the boundary between these two subregions from the grid graph;
5. Re-solve the minimum cost flow on the new graph;
6. Perform temperature analysis and calculate the minimum required pumping power using Algorithm 2;
7. If no further pumping power saving can be achieved, stop.

To balance the workload of channels $i$ and $k$, we first partition the hotspot region covered by channels $i$ and $k$. This region is bounded by channels $\min(i, k) - 1$ and $\max(i, k) + 1$. For instance, suppose we would like to balance the workload between channels 2 and 3 in Figure 3.12. Then the hotspot region covered by channels 2 and 3 is bounded by channels 1 and 4 (the region marked by the dotted line in Figure 3.12). To partition this region equally, we would like the two resulting parts to have a similar total heat load (cooling demand). As indicated earlier, the cost of a node $i$ in the $l$-th micro-channel layer signifies the degree of cooling desired there. The total cooling needed in the region covered by channels $i$ and $k$ is simply the sum of the costs of all the associated grids. We


would like each channel to be assigned about half of this total cooling load in that region. Hence we would like to partition this region into two subregions with the same total cooling load. Starting from the top left grid of the region covered by i and k, we traverse the grid network in row-major order (left to right, then downward to the next row). As soon as we have collected grids whose total cooling load is 1/2 of that of the region, we stop. The boundary between the two subregions is defined in this fashion. A row-major traversal ensures that each channel will be somewhat uniformly loaded with heat from a spatial perspective.

Now one region is assigned to i and the other is assigned to k. In order to find the exact route of the micro-channels, we remove the edges connecting the two regions and solve the minimum cost flow formulation once again (see Figure 3.12). This ensures that channels i and k do not encroach on each other's regions. In the case where the minimum cost flow cannot return a feasible solution because too many edges were removed, we add some removed edges back until a feasible solution is returned.

Figure 3.12: Example of pairwise cooling workload balance

The minimum cost flow gives a refined micro-channel structure design. We then redo the temperature analysis and find the minimum pumping power for the new design using Algorithm 2.
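The row-major partition described above can be sketched in a few lines. This is only an illustration under stated assumptions: the region is taken to be a rectangular array of node costs, and the names `channel_load` and `split_region_row_major` are hypothetical, not taken from the original work.

```python
from typing import List, Tuple

Cell = Tuple[int, int]


def channel_load(t_out: float, t_in: float, r_io: float) -> float:
    """Heat absorbed by one channel, P = (T_out - T_in) / R_io."""
    return (t_out - t_in) / r_io


def split_region_row_major(cost: List[List[float]]) -> Tuple[List[Cell], List[Cell]]:
    """Split a region into two sub-regions of roughly equal cooling load.

    cost[r][c] is the cooling demand (node cost) of grid cell (r, c).
    Cells are visited left to right, row by row; the first sub-region is
    closed as soon as it holds at least half of the total load.
    """
    total = sum(sum(row) for row in cost)
    first: List[Cell] = []
    second: List[Cell] = []
    running = 0.0
    for r, row in enumerate(cost):
        for c, value in enumerate(row):
            if running < total / 2:
                first.append((r, c))
                running += value
            else:
                second.append((r, c))
    return first, second


if __name__ == "__main__":
    region = [[4, 1], [1, 2]]  # toy costs, total load 8
    print(split_region_row_major(region))
```

In the flow above, the edges of the grid graph that cross between the two returned cell sets are the ones removed before the minimum cost flow is solved again.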

In the next iteration of optimization, we find the micro-channel that currently has the highest workload in the new design and perform pairwise load balancing on this channel using the graph updated in the previous iteration. We repeat this process iteratively until no further pumping power saving can be achieved.

Bend Elimination

As shown in section , the corners/bends in a micro-channel introduce considerable pressure drop, which increases the pumping power. Bends in micro-channels allow us to reach areas which cannot be directly connected due to the presence of TSV obstacles. But unnecessary bends, which have been incorporated due to the heuristic nature of our algorithm, provide little benefit while degrading the cooling quality. As a final refinement step, we develop a pattern matching based scheme for removing unnecessary and redundant bends in the channel network. We first generate a library of patterns of unnecessary corners and use pattern matching to find those unnecessary corners in our design. Then we replace those corner patterns with equivalent patterns that have fewer corners. Figure 3.13 highlights a few patterns and their replacement patterns.

This step should be performed judiciously. Removing corners in a hotspot region might reduce the micro-channel cooling performance, since it reduces the level of coverage. Hence we only remove corners in the non-hotspot regions, which can easily be identified by the thermal analysis. The algorithms used for pattern matching are similar to those used in technology mapping. The exact details of how pattern matching is done have been omitted here.
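Since the pattern library itself is omitted, the following sketch only illustrates the preliminary step of locating bends and keeping those outside hotspot regions as removal candidates. It assumes a channel route is stored as an ordered list of grid cells; the function names and the data layout are assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple

Cell = Tuple[int, int]


def find_bends(route: List[Cell]) -> List[Cell]:
    """Return the cells of a route at which the channel changes direction."""
    bends = []
    for prev, cur, nxt in zip(route, route[1:], route[2:]):
        d_in = (cur[0] - prev[0], cur[1] - prev[1])
        d_out = (nxt[0] - cur[0], nxt[1] - cur[1])
        if d_in != d_out:
            bends.append(cur)
    return bends


def removable_bend_candidates(route: List[Cell], hotspot_cells: Iterable[Cell]) -> List[Cell]:
    """Bends lying outside hotspot cells; only these are considered for removal."""
    hot: Set[Cell] = set(hotspot_cells)
    return [cell for cell in find_bends(route) if cell not in hot]


if __name__ == "__main__":
    route = [(0, 0), (0, 1), (1, 1), (1, 2), (1, 3)]
    print(find_bends(route))
    print(removable_bend_candidates(route, hotspot_cells=[(1, 1)]))
```

Whether a candidate bend can actually be replaced still depends on the pattern library and on the TSV obstacles, which this sketch does not model.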

Figure 3.13: Examples of bend elimination

3.5 Hybrid Cooling Network

Motivation of Hybrid Cooling Network

Besides micro-channels, TSVs are also considered as an alternative solution for cooling 3D-ICs. TSVs are usually made of copper, which has better thermal conductivity than silicon or metal-oxide, and hence enable better vertical heat conduction between different layers. When the number of signal TSVs is not enough, dummy thermal TSVs are inserted to further mitigate the thermal issues. Both micro-channels and thermal TSVs have advantages and drawbacks in 3D-IC cooling.

Micro-channel liquid cooling: The cooling effectiveness of micro-channels is quite high, and they have been reported to support heat densities as high as 700 W/cm$^2$ [84]. However, as illustrated earlier, the drawback of micro-channel based heat removal is that the cooling system consumes extra energy for pumping the coolant through the channels. In addition, the presence of TSVs that connect signals and power between layers constrains the locations where channels can be placed, since micro-channels cannot be placed where these TSVs are allocated (as shown in Figure 3.1). This constraint limits the heat removal

capability of micro-channels.

Thermal TSV: Thermal TSVs help alleviate the 3D-IC thermal issues by establishing heat transfer paths from heat source to heat sink using high thermal conductivity materials, so that heat can be more effectively absorbed by the heat sinks. They also move heat from hot to cool areas (without consuming extra cooling power), which balances the heat between layers and makes the thermal profile more uniform. However, thermal TSVs only redistribute heat; they do not remove it. Moreover, since TSVs can only be placed in the whitespace of the layout, the number and locations of thermal TSVs are limited by the chip floorplan. As a result, their cooling capability is limited. Also, a large number of TSVs will increase the fabrication cost, degrade the chip yield and exacerbate the thermal stress problem in 3D-ICs.

Based on these considerations, in this section we propose a hybrid 3D-IC cooling scheme: a cooling network which uses micro-channel based liquid cooling together with thermal TSVs [75]. In this hybrid cooling network, micro-channels and thermal TSVs work in a mutually complementary way. Thermal TSVs redistribute heat and establish heat dissipation paths that deliver heat to the micro-channels, and the heat is then removed by the micro-channels. This hybrid cooling scheme provides a sufficient level of cooling to the 3D-IC using less cooling power and fewer thermal TSVs. To extract maximum cooling effectiveness, we co-optimize the allocation of micro-channels and thermal TSVs.

3.5.2 Algorithm for Hybrid Cooling Network Design

Our algorithm for micro-channel and thermal TSV co-optimization is based on iterative improvement. The overall iterative design flow is similar to the algorithm in Section 3.3. But instead of iteratively removing micro-channels, we use a constructive approach. That is, we start from the 3D-IC structure without any micro-channel or thermal TSV, and iteratively add micro-channels and size thermal TSVs until they provide sufficient cooling. The overall constructive design approach is illustrated in Algorithm 4 and Figure .

Algorithm 4 Heuristic for micro-channel and thermal TSV co-optimization
Starting from the 3D-IC structure without micro-channels or thermal TSVs:
1. Assuming we are given a set of potential micro-channel locations, initialize the priority level of each potential micro-channel;
2. Repeat until the thermal constraint is satisfied:
3. Add the micro-channel with the highest priority;
4. Decide the locations and sizes of thermal TSVs;
5. Set up the thermal resistive network and estimate the thermal profile;
6. If the thermal constraint is satisfied, stop;
7. Else update the priority of the un-added channels and go to step 2.

The algorithm starts by finding a set of potential locations for micro-channels. We use the algorithm proposed in Section to find the set of potential micro-channel locations. Based on the potential micro-channel locations, we assign a priority to each potential micro-channel. The priority reflects the significance of the micro-channel in removing heat. In each iteration, we first add the micro-channel with the highest priority (that is, the most important micro-channel). Then we insert or size thermal TSVs based on the current micro-channel allocation. After the micro-channel and thermal TSV placement in each iteration, we check if the current cooling

system design could provide enough cooling to the 3D-IC. If not, we continue to add more micro-channels and resize thermal TSVs. Once we have added a micro-channel, we need to update the priority of the remaining un-added micro-channels before adding another one. We repeat this iterative process until the thermal constraint is satisfied.

Figure 3.14: Overall design flow of micro-channel and thermal TSV co-optimization

The success of this approach depends on how micro-channel priority is assigned and how thermal TSVs are allocated and sized. The next three subsections explain them in detail.

Micro-channel Priority Assignment/Update

The micro-channel priority assignment and update is similar to the micro-channel cost assignment/update approach presented in Section 3.3.4, with slight modifications to the updating formulation, as Equation 3.12 shows.

$$W_{i,j} = W_{i,j} - w_{|i_0 - i|}\, W_{i_0,j}, \quad \forall i \ \text{s.t.}\ |i_0 - i| \le b_{max}$$
$$W_{i,j\pm 2} = W_{i,j\pm 2} - u_3\, w_{|i_0 - i|}\, W_{i_0,j} \qquad (3.12)$$

Basically, when we add a micro-channel, this micro-channel absorbs heat from the regions surrounding it, so the cooling workload of its potential neighboring micro-channels is reduced. Hence the priority of the potential neighboring micro-channels should decrease, as Equation 3.12 shows.

Thermal TSV Allocation and Sizing

After inserting a micro-channel in each iteration, we place thermal TSVs in the remaining available area to further reduce the chip temperature. For thermal TSV allocation and sizing, we use the basic idea of iterative thermal conductivity updating proposed in [24], but improve it for a better rate of convergence.

Basic Thermal TSV Placement Approach

In the approach proposed in [24], the 3D-IC is divided into fine grids. The approach finds the distribution of thermal TSVs by calculating the desired vertical thermal conductivity of each grid that could eliminate or mitigate the thermal problem. It is based on iterative improvement. To update the thermal conductivity in each iteration, the vertical thermal gradient $q_z$ between two vertically neighboring grids is calculated, and the vertical thermal conductivity $k_z$ in each grid is updated using the following equation:

$$k_z^{new} = \frac{q_z^{old}}{q_z^{new}}\, k_z^{old} \qquad (3.13)$$

where $q_z^{old}$ is the current vertical thermal gradient, and the new thermal gradient $q_z^{new}$ (which is the desired thermal gradient after this iteration) is chosen as some value closer to the ideal thermal gradient $q_{ideal}$ than $q_z^{old}$:

$$q_z^{new} = q_{ideal} \left( \frac{q_z^{old}}{q_{ideal}} \right)^{\theta} \qquad (3.14)$$

Here $\theta$ is a user defined parameter between 0 and 1, which is used to control the rate of convergence. In each iteration, the thermal conductivity of all grids is updated simultaneously. Once the algorithm converges, the number/size of thermal TSVs in each grid that results in the desired thermal conductivity is calculated using Equation 2.9.

Adding a thermal TSV changes the thermal conductivity matrix G (given in Section 2.3.1) and hence changes the thermal gradient $q_z$ across the chip. So, strictly speaking, every time we place or size a thermal TSV, we need to recompute the thermal profile and obtain the updated thermal gradient $q_z$ before updating the thermal conductivity of the next grid. Nevertheless, in [24] the thermal conductivities of all grids are updated simultaneously in each iteration. In order to update the thermal conductivity of all grids simultaneously without recalculating the thermal profile, the parameter $\theta$ should be close to 1, so that the change in thermal conductivity in each step is very small and therefore has little influence on the thermal gradient of other grids. However, using such a $\theta$ value leads to a slower convergence rate.
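The effect of $\theta$ in Equations 3.13 and 3.14 can be seen in a small numerical sketch. It assumes positive gradients stored in NumPy arrays, and the function name `update_conductivity` is hypothetical; the thermal re-analysis between iterations is not modeled.

```python
import numpy as np


def update_conductivity(k_old, q_old, q_ideal, theta):
    """One simultaneous update of the vertical conductivity of a set of grids.

    Eq. 3.14: q_new = q_ideal * (q_old / q_ideal) ** theta
    Eq. 3.13: k_new = (q_old / q_new) * k_old
    Assumes all gradients are positive (an assumption of this sketch).
    """
    k_old = np.asarray(k_old, dtype=float)
    q_old = np.asarray(q_old, dtype=float)
    q_new = q_ideal * (q_old / q_ideal) ** theta
    k_new = (q_old / q_new) * k_old
    return k_new, q_new


if __name__ == "__main__":
    k, q = [1.0], [4.0]
    for theta in (1.0, 0.5, 0.0):
        k_new, q_new = update_conductivity(k, q, q_ideal=1.0, theta=theta)
        print(theta, k_new, q_new)
```

With $\theta = 1$ nothing changes, while $\theta = 0$ jumps directly to the ideal gradient; intermediate values trade step size against the risk of disturbing neighboring grids.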

Modified Thermal TSV Allocation and Sizing Approach

In our modified thermal TSV planning approach, we still use the basic iterative updating framework given in [24]. However, as explained earlier, the approach proposed in [24] needs to use a large $\theta$, which implies a slower rate of convergence. So in our modified approach, instead of modifying the thermal conductivity of all grids in each iteration, we only update a subset E of the grids. The grids in this subset E should satisfy the following two conditions: a) all the grids in this set have very small interdependence with each other, and b) they have a large influence on the hotspot regions.

The first condition ensures that only grids that are independent of each other are updated. So when we change the thermal conductivity of a grid in set E, the thermal gradient of the other grids in this set barely changes. Hence we can update all the grids in this set simultaneously using a small $\theta$, which gives a faster rate of convergence. The second condition ensures that we focus on updating those grids that are most likely to reduce the hotspot temperature. This helps us reduce the number and size of thermal TSVs used. We call this subset the maximum independent set E.

The success of this approach depends on how many independent grids we can find and update simultaneously without recomputing the thermal profile in each iteration. The micro-channel heat sinks basically behave as heat isolators (since they carry heat away) and therefore reduce the interdependence between grids. Hence the existence of micro-channels leads to more independent grids that can be updated simultaneously.
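The two conditions on E can be illustrated with a plain greedy selection. This is a simplified stand-in rather than the adaptive randomized heuristic used in this work; it assumes a 0/1 interdependency matrix and a per-grid hotspot weight are already available, and the name `greedy_independent_set` is hypothetical.

```python
import numpy as np


def greedy_independent_set(interdep, weights):
    """Pick mutually independent grids, preferring high hotspot weight.

    interdep[i][j] == 1 means grids i and j are interdependent.
    weights[i] measures how strongly grid i influences hotspot grids.
    Grids with non-positive weight are skipped (an assumption of this sketch).
    """
    interdep = np.asarray(interdep)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(-weights, kind="stable")  # heaviest grid first
    blocked = np.zeros(len(weights), dtype=bool)
    chosen = []
    for i in order:
        if blocked[i] or weights[i] <= 0:
            continue
        chosen.append(int(i))
        blocked |= interdep[i].astype(bool)  # neighbors can no longer be picked
        blocked[i] = True
    return chosen


if __name__ == "__main__":
    interdep = [[1, 1, 0],
                [1, 1, 1],
                [0, 1, 1]]
    print(greedy_independent_set(interdep, weights=[3, 2, 3]))
```

A greedy pass like this gives no optimality guarantee; it only shows how condition a) acts as a constraint and condition b) as the objective.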

Based on these two conditions, our modified thermal TSV placement and sizing algorithm works as follows:

Algorithm 5 Algorithm of thermal TSV placement and sizing
1. Estimate the interdependency of each pair of grids;
2. Repeat steps 3-6 until the stop condition is satisfied:
3. Assign a weight to each grid according to its interdependency with hotspot grids;
4. Find the maximum independent set E;
5. Update the thermal conductance of the grids in set E using the approach given in Section ;
6. Update the thermal gradient and grid interdependency, and repeat;
7. Calculate the thermal TSV size/density in each grid based on the achieved thermal conductivity.

In the next subsection, we explain how to find the maximum independent set E in detail.

Finding Maximum Independent Set E

For a given 3D-IC structure, to estimate the interdependency between grids, we first calculate the inverse of the thermal conductance matrix G. This inverse matrix $H = G^{-1}$ satisfies $T = H \cdot Q$. Here $H(i, j)$ indicates how much temperature increase in grid i is caused by the power dissipation in grid j. If $H(i, j) > 0$, a change in the thermal conductivity at grid j will affect the temperature at grid i. The interdependency of each pair of grids depends on how many power sources they share. We use an interdependency matrix INT to indicate the interdependency between each pair of grids. The interdependency matrix is defined as:

$$INT(i, j) = INT(j, i) = \begin{cases} 1, & \text{if } H(i) \cdot H^T(j) > \zeta \\ 0, & \text{otherwise} \end{cases} \qquad (3.15)$$
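Equation 3.15 translates almost directly into matrix operations. The sketch below assumes a small, dense, invertible conductance matrix and uses a toy threshold; the function name `interdependency_matrix` is hypothetical.

```python
import numpy as np


def interdependency_matrix(G, zeta):
    """Build INT from the thermal conductance matrix G (Eq. 3.15).

    H = G^-1, and INT(i, j) = 1 when the inner product of rows i and j
    of H exceeds the threshold zeta, otherwise 0.
    """
    H = np.linalg.inv(np.asarray(G, dtype=float))
    correlation = H @ H.T  # entry (i, j) is H(i) . H(j)^T
    return (correlation > zeta).astype(int)


if __name__ == "__main__":
    coupled = [[2.0, -1.0], [-1.0, 2.0]]   # two grids sharing a thermal path
    isolated = [[2.0, 0.0], [0.0, 2.0]]    # two thermally isolated grids
    print(interdependency_matrix(coupled, zeta=0.3))
    print(interdependency_matrix(isolated, zeta=0.1))
```

For a realistic grid count, explicitly inverting G is expensive, which is the cost the incremental update scheme later in this section tries to avoid.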

Here INT is symmetric. $INT(i, j)$ indicates whether grids i and j are interdependent (1 indicates that the two grids are interdependent and 0 indicates that they are independent). $INT(i, j)$ is decided by the correlation between grids i and j, which is measured by $H(i) \cdot H^T(j)$, where $H(i)$ represents the i-th row of matrix H. When the correlation is very small (less than $\zeta$), we assume the two grids are independent and set $INT(i, j)$ to 0; otherwise, we set it to 1, which indicates that the two grids are dependent.

Once we have the interdependency matrix, we would like to find the set of grids that a) are independent of each other and b) have maximum dependency with hotspot grids. To achieve this, we assign a weight to each grid which indicates its interdependency with all hotspot grids, and then find the set of independent grids with the maximum total weight.

Grid weight assignment

The weight $c_i$ of each grid, which represents its interdependency with hotspot regions, is assigned as follows:

$$c_i = INT(i) \cdot E^T, \quad \text{for each grid } i \qquad (3.16)$$

where $INT(i)$ is the i-th row of the interdependency matrix INT, and $E = \{E(j), \forall \text{ grid } j\}$ is a vector indicating whether each grid is a hotspot:

$$E(j) = \begin{cases} 1, & \text{if } T_j > T_{max}, \text{ where } T_j \text{ is the temperature of grid } j \\ 0, & \text{otherwise} \end{cases} \qquad (3.17)$$

Here $T_{max}$ is the thermal constraint. If the weight of a grid is high, the grid has higher interdependency with hotspots. We would like to focus on updating those grids that have higher interdependency with hotspots, since inserting thermal TSVs in these grids can better reduce the hotspot temperature.

Finding independent grids with maximum total weight

Given the weight of each grid and the interdependence between them, we would like to find the set of grids which are independent and have the maximum total weight. This is the maximum weight independent set problem, which is equivalent to the maximum weight clique problem on the complement graph and is NP-hard. Many existing works have proposed heuristics to find a good solution. Here we use the adaptive, randomized greedy approach in [31].

Once we have this maximum independent set E, we simultaneously update the thermal conductivity of the grids in this set using the approach illustrated in Section . Since the grids in this set have very small interdependence with each other, we can use a $\theta$ close to 0, thereby achieving a faster convergence rate. Moreover, because we only update grids that are highly interdependent with hotspots, we can use fewer thermal TSVs.

Updating interdependence matrix

A change in the thermal TSVs changes the thermal resistive network, thereby changing the grid interdependency. So after we update the thermal TSVs in each iteration, we should update the interdependence matrix based on the new thermal resistive network. A simple approach is to regenerate the thermal conductance matrix G and then recalculate its inverse matrix H as well as the interdependency matrix INT after every iteration. However, calculating the inverse matrix H is time consuming. To save time in computing the interdependence matrix, we only calculate (initialize) matrices H and INT once at the beginning of the algorithm, before allocating or sizing any thermal TSV. Every time we update a thermal TSV, we only update some elements of matrix INT instead of recalculating the whole matrix.

By exploring the interdependency matrix, we found that the interdependency between two grids largely depends on the distance between them. We define an interdependence region for each grid, which includes all the grids that are interdependent with this grid. We found that each grid usually has higher interdependence with the grids close to it and smaller or no interdependence with grids far away. So the interdependence region of a grid is usually a region surrounding that grid, as Figure 3.15 shows.

When we add or enlarge a thermal TSV, the interdependency between grids generally increases because the thermal conductivity has increased. So the interdependence region of each grid is enlarged (as Figure 3.15(a) shows). On the other hand, when we reduce the size of a thermal TSV, the interdependency between grids decreases and the interdependence region of each grid shrinks (as Figure 3.15(b)

shows). Usually a change in a thermal TSV only affects the interdependence regions of the grids close to this TSV. The amount by which we enlarge/shrink the interdependence region of each grid depends on the distance between this grid and the newly allocated/sized thermal TSV, and also on the amount by which we have sized the thermal TSV. Once we have updated the interdependence regions of the grids, we modify the interdependence matrix INT based on the new interdependence region of each grid.

Figure 3.15: Change in interdependence region of a grid (a) after allocating or enlarging a thermal TSV, (b) after shrinking a thermal TSV

Stop condition: We keep updating the thermal conductivities iteratively until one of the following situations occurs:
a) The thermal constraint is satisfied. In this case, no more thermal TSVs are needed.
b) The thermal TSV capacity is reached. In this case, no more thermal TSVs can be added.
c) The peak temperature cannot be further reduced. In this case, the algorithm has converged, so adding more thermal TSVs will not help reduce the chip temperature.

After the thermal TSV allocation/sizing, we perform thermal analysis and

check if the resulting micro-channel and thermal TSV allocation provides enough cooling to the 3D-IC. If the resulting maximum temperature is within the thermal constraint, then the current design is our final design. Otherwise, we continue to add micro-channels and size thermal TSVs until the thermal constraint is satisfied.

3.6 Considering Thermal Variations

The previous approaches in Sections assume that the power profile is fixed and known, and design the cooling structure based on the given power profile. In reality, CPU power profiles are a strong function of the application and vary with the workload the CPU is experiencing at a given time. We address this problem by using multiple training power profiles. Given a set of training power profiles (that represent different classes of applications and workload levels), we design the cooling structure (non-uniform, bended micro-channel or hybrid cooling system) which provides enough cooling to all the power profiles using the minimum amount of pumping power. Conventionally, such problems are addressed by choosing the profile with the highest total dissipated power (TDP) and designing the cooling system based on it. But such an approach fails to account for the fact that a power profile with a smaller TDP might end up with thermal violations due to the nature of its hotspots, even if the profile with the higher TDP does not. The advantage of using multiple training power profiles is that the resulting micro-channel network can adapt to various power profiles.

Figure 3.16: Flow chart of micro-channel placement

Our approach that accounts for multiple power profiles is illustrated in Figure 4.3. We start with the power profile with the highest TDP and design the cooling structure for this power profile using the heuristics given in Sections . We call this the pilot power profile. Then we test if all the power profiles meet the thermal constraint. If a set of power profiles violates the thermal constraint, then the pilot power profile is refined using Algorithm 6 and the cooling structure is re-designed based on the new pilot power profile.

Algorithm 6 Pilot power profile refinement
Assuming temperature constraint violations occur in power profiles $P_1, P_2, \ldots, P_M$:
1. For m = 1 to M
2. Increase the power density of the pilot power profile in the region where thermal violations occur in power profile $P_m$.

The refining process in step 2 of Algorithm 6 increases the power density of the pilot power profile in the regions where thermal violations occur in the other power profiles. This enables the micro-channel placement heuristic to allocate more cooling (either micro-channels or thermal TSVs) in that region. For example, if the violation occurs at grid (i, j, k) of power profile $P_m$, we increase the power consumption at grid (i, j, k) and all grids surrounding (i, j, k) in the pilot power profile. The level of increase depends on the degree of thermal violation and

the distance from (i, j, k). The performance of this heuristic depends on the range over which we choose to increase the power in the pilot profile. If this range is large, the algorithm will converge faster but might use more channels and therefore higher pumping power.

3.7 Cooling Performance of Micro-channel Designs

Now we compare the cooling effectiveness of the three micro-channel designs. In our experiment, we use a three-tier stacked 3D structure. In the 3D-IC, three active layers are vertically stacked and the micro-channel layers are below each active layer. There is also an air-cooled heat sink at the top of the 3D-IC. We use the ITC'99 circuits, which are typical synthesized circuits consisting of AND, OR, NOT, NAND and NOR gates, to generate the 3D-IC benchmarks [4]. Each 3D-IC layer contains several arbitrarily chosen ITC'99 circuits. We use the Capo placer to place the gates in each layer [1]. To obtain the power profile of each layer, we randomly assign a switching activity factor (in [0, 1]) to each gate and use the power models in [47][86] to estimate the power consumption. Based on the placement information, we also find the whitespace in the layout and randomly allocate 1000 signal TSVs in the whitespace. This forms our testing benchmarks.

The chip dimension is W = L = 9 mm. We set up the resistive network using a HotSpot-like model in three dimensions [77]. The micro-channel width and height are x = 100 µm and z = 200 µm, and the diameter of a TSV is 10 µm. The overall thermal resistance of the heat sink for air cooling is 0.5 °C/W. The inlet

coolant temperature is 10 °C and the maximum temperature constraint $T_{max}$ is 85 °C.

We compare the pumping power of the three micro-channel designs proposed in this chapter. The comparison is given in Table 3.1 and Figure . For all the power profiles, air cooling alone cannot provide sufficient cooling to reduce the chip temperature below the thermal constraint. Here, the All channels design indicates the conventional micro-channel design that spreads straight micro-channels all over the interlayer regions, and save indicates the pumping power saving of each design over the All channels design. As we can see from the table, the Non-uniform micro-channel design saves about 57% pumping power compared with the All channels design. Using bended micro-channels saves another 11% pumping power. Among the three approaches, the Hybrid cooling network saves the most pumping power (78% pumping power savings compared with the conventional All channels design).

Figure 3.17: Comparison of Pumping Power ($P_{pump}$ (W) versus $P_{chip}$ (W) for the All channels, Non-uniform, Bended and Hybrid designs)

3.8 Runtime Thermal Management Using Micro-channels

Recently, micro-fluidic cooling has also been adopted in dynamic thermal management (DTM) to control the runtime CPU performance and chip temperature

by tuning the fluid flow rate through micro-channels [19][18][61]. In this section, we investigate a micro-channel based DTM scheme that provides sufficient cooling to the 3D-IC using a minimal amount of cooling energy [71]. In this DTM scheme, assuming the micro-channel structure has already been decided using any of the aforementioned structures (Sections ), the pressure drop across the micro-channels is controlled dynamically based on the runtime cooling demand. Now we explain our micro-channel based DTM scheme in detail.

Table 3.1: Comparison of pumping power
Columns: $P_{chip}$ | All channels: N, $P_{pump}$ | Non-uniform: N, $P_{pump}$, save | Bended: N, $P_{pump}$, save | Hybrid: N, $P_{pump}$, save (final row: Average)

Algorithm for Micro-fluidic Based DTM

The on-chip temperature profile is a strong function of the power dissipated, while the power dissipation depends on the applications, which change at runtime.

In order to track the runtime thermal and power state, thermal sensors are placed at various chip locations. Our micro-channel based DTM keeps track of power profiles at runtime using the information obtained from the thermal sensors and an adaptive Kalman filter based estimation approach (proposed in [96]), and then decides the micro-channel pressure drop based on it.

To estimate the power profile, [96] assumes there are M different power states (power profiles), each of which essentially represents a certain class of applications. The Kalman filter holds a belief of what the current power profile is and predicts the temperature profile based on this belief. Meanwhile, the thermal sensors keep measuring the temperature. The power estimation method in [96] iteratively compares the temperature predicted by the Kalman filter with the sensor observations. If the error between them is close to zero, the belief about the current power state is correct. Otherwise, the belief might be wrong, which means the power state has changed. Once a change in power state is detected, the method decides the new power state, which is the one most likely to result in the current sensor readings. Interested readers are referred to [96] for the details of this adaptive power estimation approach.

Once the power profile is obtained, we select the best pressure drop, which provides enough cooling for this power profile using minimum pumping power. Hence the micro-channel based DTM problem is formally stated as follows. Given: a 3D-IC design whose power distribution is a function of the architecture and application. Assuming the power profiles are given (or estimated using appropriate sensors) and the micro-channel structure is also fixed, we would like to find

the pressure drop for each power profile such that the temperature across the chip is within acceptable limits while minimizing pumping power:

$$\begin{aligned} \min_{\Delta p} \quad & P_{pump}(\Delta p) \\ \text{s.t.} \quad & G(\Delta p) \cdot T = P \\ & T \le T_{max} \\ & \Delta p_{min} \le \Delta p \le \Delta p_{max} \end{aligned} \qquad (3.18)$$

The objective minimizes the pumping power used by the micro-channels. When regular straight micro-channels are used, the pumping power can be calculated using Equation . If bended micro-channels are used, the pumping power is calculated using Equations 2.14 and . The first constraint is the resistive thermal model, where P is a 3D-IC power profile, T is the corresponding thermal profile, and G is the thermal conductivity matrix, which depends on the pressure drop $\Delta p$. The second constraint indicates that the peak temperature should not exceed the thermal constraint $T_{max}$. The last constraint gives the feasible range of the pressure drop.

This optimization problem is difficult to solve directly because of the complexity of the thermal model and the impact of the micro-channels on temperature. Therefore we use a linear search based approach to find the best pressure drop. Since the micro-channel structure is already decided, the pumping power $P_{pump}$ is only a function of the pressure drop in this problem. It can be proved that the pumping power for both straight and bended micro-channels is a monotonically increasing function of the pressure drop $\Delta p$. Hence minimizing the pressure drop minimizes the pumping power, and the problem is simplified to finding the minimum pressure drop that provides enough cooling.

The pressure drop $\Delta p$ influences the heat resistance $R_{heat}$, thereby changing the cooling performance. An increase in pressure drop results in increased fluid velocity v and flow rate f, while a higher flow rate results in a smaller heat resistance $R_{heat}$ and hence better cooling performance. In summary, a larger pressure drop results in better cooling at the cost of higher pumping power. Hence cooling effectiveness is a monotonic function of pressure drop, and the linear search approach can find the best pressure drop. Specifically, this is done by starting from the minimum pressure drop $\Delta p = \Delta p_{min}$ and increasing it step by step until the thermal constraint is satisfied. Due to the monotonic impact of pressure drop on micro-channel cooling effectiveness, this linear search yields the optimal pressure drop for a given micro-channel configuration, up to the resolution of the search step.

Performance of Micro-channel Based DTM

We then implemented the runtime thermal management by micro-channel pressure drop control. Here we assume the underlying micro-channel design is the non-uniform straight micro-channel configuration proposed in Section 3.3. We use the same 3D structure as in Section 3.7 and tested three groups of benchmarks with different power profiles. In the first group (group L), we generate 6 different 3D-IC

power profiles whose total dissipated power (TDP) ranges from W. Based on the non-uniform micro-channel design, we select the best pressure drop for each power profile and calculate the associated pumping power. The second (group M) and third (group H) groups are generated in a similar way, but with higher total dissipated power.

Figure 3.18 shows the required pumping power for each group of benchmarks using runtime DTM and the fixed pressure drop approach. In the fixed pressure drop approach, we use the lowest pressure drop that provides enough cooling to all benchmarks in the group. $P_{chip}$ is the TDP of each benchmark. The runtime pressure drop control approach achieves an average of 39%, 43% and 46% pumping power saving for benchmark groups L, M and H, respectively. The pressure drop calculation can be done offline and stored in a table. Once we detect that a specific power profile occurs, we simply look up the best pressure drop for this power profile.

3.9 Summary

This chapter investigated optimized micro-fluidic cooling configurations. The first configuration (hotspot-optimized non-uniform micro-channel design) allocates micro-channels only in hotspot regions so that fewer channels are used, thereby saving pumping power. In this configuration, straight micro-channels are used. Straight micro-channels are easy to manufacture and more power efficient than bended micro-channels of the same length. However, straight micro-channels are inefficient in addressing the spatial constraints imposed by TSVs. Hence in the

103 1 0.8 Ppump(W) fixed pressure dynamic pressure Pchip(W) (a) Ppump(W) fixed pressure dynamic pressure Pchip(W) (b) 10 8 Ppump(W) fixed pressure dynamic pressure Pchip(W) (c) Figure 3.18: Runtime pressure drop control versus fixed pressure drop for (a) group L, (b) group M, (c) group H 89

104 second configuration, we proposed the usage of bended micro-channel, which can be flexibly routed to hotspot regions while avoiding TSVs. In order to further reduce the pumping power overhead, we also proposed a hybrid cooling network which utilizes dummy thermal TSVs (that reinforce vertical heat transfer) and micro-channels together. Compared with the conventional micro-channel design that spreads straight micro-channels all over the interlayer region, the optimized configurations can result in 57%, 68% and 78% pumping power savings respectively. In these designs, microchannel structures are designed after the electrical part of the chip, hence they are compatible with the standard IC design flow. We also proposed a micro-channel based dynamic thermal management method that controls the pressure drop at runtime to allow real time thermal control. Through runtime pressure drop tuning, we can further save about 43% pumping power compared with using fixed pressure drop. However, as illustrated in Section 1.4, the electrical, thermal, reliability and cooling aspects are all interdependent. Hence, separating the design of electrical and cooling system will lead to sub-optimal designs. In the next chapter, we will investigate the electrical and cooling system co-design to achieve further powerperformance improvement. 90
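The linear pressure-drop search and the off-line lookup table used for runtime thermal management in this chapter can be sketched roughly as follows. This is a minimal illustration, not the implementation used in the experiments: the thermal model `toy_peak_temp`, its coefficients, the profile names, the temperature limit and the pressure units (taken here as kPa) are all hypothetical placeholders. The only property carried over from the text is that peak temperature does not increase as the pressure drop grows.

```python
from typing import Callable, Dict, Optional


def min_pressure_drop(peak_temp: Callable[[float], float],
                      t_max: float,
                      p_min: float,
                      p_max: float,
                      step: float) -> Optional[float]:
    """Return the smallest pressure drop on the search grid that meets t_max.

    Relies on peak temperature being non-increasing in pressure drop, so the
    first feasible grid point is also the minimum feasible one.
    """
    n_steps = int(round((p_max - p_min) / step))
    for i in range(n_steps + 1):
        p = p_min + i * step
        if peak_temp(p) <= t_max:
            return p
    return None  # even p_max cannot satisfy the thermal constraint


def toy_peak_temp(chip_power: float) -> Callable[[float], float]:
    """Hypothetical thermal model (assumption): R_heat shrinks as p grows."""
    def model(p: float) -> float:
        r_heat = 0.2 + 20.0 / p  # K/W, made-up coefficients
        return 300.0 + chip_power * r_heat  # assumed 300 K coolant inlet
    return model


# Off-line phase: one table entry per anticipated power profile.
profiles = {"low": 40.0, "mid": 60.0, "high": 80.0}  # W, illustrative only
table: Dict[str, Optional[float]] = {
    name: min_pressure_drop(toy_peak_temp(power), t_max=358.0,
                            p_min=10.0, p_max=200.0, step=5.0)
    for name, power in profiles.items()
}


def runtime_pressure_drop(detected_profile: str) -> Optional[float]:
    """Runtime phase: a detected profile maps to its stored pressure drop."""
    return table[detected_profile]
```

With a real thermal simulator in place of `toy_peak_temp`, the search step size bounds how far the selected pressure drop can sit above the true minimum.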

Chapter 4

Co-design of Electrical and Fluidic Cooling Systems

4.1 Motivation for Co-Design

In the conventional chip design flow, cooling considerations are put in place after the entire system has been designed (as Figure 4.1 shows). Such a post-fix approach can lead to sub-optimality, such as significant pumping power, competition with TSVs, thickening of the silicon substrate and impact on reliability.

Figure 4.1: Conventional chip design flow

As illustrated in Section 1.4, the electrical, thermal, reliability and cooling aspects are all interdependent. It is important to investigate the interplay between the electrical and fluidic aspects, and to develop avenues for co-design. Such co-design can offer the following advantages:

1. Higher cooling in timing critical areas results in better performing designs since transistor delay increases with temperature.

2. Higher cooling in timing critical areas enables us to aggressively pursue high power dissipating performance enhancements such as increasing the supply voltage. This results in higher performance without impacting temperature since the extra heat can be managed by micro-fluidics.

3. Design optimization (placement, floorplanning, etc.) could be more aggressive since temperature issues can be addressed by aggressive cooling.

4. Increasing the cooling levels in high leakage areas helps reduce the overall power since leakage is a highly non-linear function of temperature. The reduction in leakage may be significant enough to make the increase in pumping power irrelevant.

5. Micro-fluidics may increase the silicon thickness, degrading TSV performance. Smart electrical design could potentially remove this degradation; for example, it could be overcome by using stronger drivers.

In this chapter, we investigate two electrical and cooling co-design problems. Section 4.2 investigates the co-design of TSV allocation/assignment and micro-channel placement [70], and Section 4.3 investigates a gate sizing and micro-fluidic co-design problem [71].

4.2 Co-optimization of TSV Assignment and Micro-Channel Placement

In 3D-ICs, the interlayer nets use TSVs to deliver signals and power among different layers. Recently, significant attention has been paid to the problem of allocating interlayer nets to TSVs in a way that allows their successful routing. Existing work mostly addresses this problem with the objective of minimizing total wirelength. Two general approaches have been investigated: Post-Placement [48][90][95] and In-Placement [36]. In Post-Placement approaches, cells are first placed in the 3D-IC. This determines the whitespace distribution capable of supporting TSVs. These potential TSV locations are then allocated to the interlayer nets such that the total wirelength is minimized [48][90][95]. In-Placement approaches perform simultaneous optimization of cell placement, TSV placement and interlayer net to TSV assignment during the 3D-IC placement process itself. While both approaches have their advantages, in our work we assume that placement is already done before TSVs are assigned to the interlayer nets (Post-Placement paradigm), though our work could also be extended to the In-Placement approach.

Conventional Post-Placement approaches for interlayer net to TSV assignment do not consider the possibility of adding micro-channels in the interlayer regions. TSVs impose significant constraints on how and where the micro-channels can be located, and form obstacles to micro-channel placement since micro-channels cannot be placed at the locations of TSVs. The location of TSVs is essentially decided by the allocation of interlayer nets to TSVs. The existing works on Post-Placement TSV allocation (which ignore the possibility of allocating channels) and micro-channel placement as proposed in the previous chapter (which assumes the TSV locations to be fixed) do not consider the possibility of combining these steps to obtain better results.

Figure 4.2: Thermal profile of one 3D-IC layer, and an example of TSV and micro-channel allocation where TSVs constrain us from allocating micro-channels at hotspots

Two trivial approaches for allocating TSVs to nets and micro-channels to interlayer regions together can be conceived: a TSV-first approach and a micro-channel-first approach. If micro-channels are allocated before TSVs, the wirelength may increase, since the whitespace available for TSVs shrinks wherever micro-channels are present. A TSV-first approach also has disadvantages. For instance, if TSVs are placed at or near hotspot regions and prevent the allocation of micro-channels at those hotspots, the cooling effectiveness of the micro-channels will suffer.

In this section, we investigate the co-optimization of TSV assignment and micro-channel allocation such that the total wirelength is minimized and the micro-channel cooling effectiveness is maximized [70]. As stated earlier, we assume a Post-Placement paradigm.
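The trade-off between the two sequential orderings can be seen on a toy instance. The sketch below is only an illustration under stated assumptions and is not the co-optimization method of [70]. The grid coordinates, nets, candidate TSV sites and the single hotspot row are invented. A straight micro-channel is assumed to occupy a whole row. Wirelength is approximated as the Manhattan detour from the upper-layer pin to the TSV site to the lower-layer pin, and the assignment is found by exhaustive search, which is only feasible for very small cases.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (x, y) grid coordinates, hypothetical units


def manhattan(a: Point, b: Point) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def best_assignment(nets: Sequence[Tuple[Point, Point]],
                    sites: Sequence[Point]) -> Tuple[int, List[Point]]:
    """Exhaustively give each net a distinct TSV site, minimizing wirelength.

    Each net is (pin on upper layer, pin on lower layer). Its wirelength
    through a site is the Manhattan distance pin -> site -> pin.
    """
    if len(sites) < len(nets):
        raise ValueError("not enough TSV sites for the interlayer nets")
    best_cost, best_sites = -1, []
    for chosen in permutations(sites, len(nets)):
        cost = sum(manhattan(top, s) + manhattan(s, bottom)
                   for (top, bottom), s in zip(nets, chosen))
        if best_cost < 0 or cost < best_cost:
            best_cost, best_sites = cost, list(chosen)
    return best_cost, best_sites


# Invented example: the hotspot lies along row y = 2.
hotspot_row = 2
nets = [((1, 2), (2, 2)), ((4, 1), (4, 3))]
sites = [(2, 2), (4, 2), (2, 0), (4, 4)]  # whitespace able to host a TSV

# TSV-first: use every site, then check whether the hotspot row is still
# free for a straight micro-channel.
tsv_first_cost, tsv_first_sites = best_assignment(nets, sites)
hotspot_row_free = all(y != hotspot_row for _, y in tsv_first_sites)

# Micro-channel-first: reserve the hotspot row for a channel, then assign
# TSVs using only the remaining sites.
free_sites = [s for s in sites if s[1] != hotspot_row]
channel_first_cost, channel_first_sites = best_assignment(nets, free_sites)
```

In this instance the TSV-first ordering reaches a wirelength of 3 but places both TSVs in the hotspot row, while the micro-channel-first ordering keeps that row free at a wirelength of 9.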
