Stochastic modelling for energy efficient supercomputing clusters

TUAN V. DINH

PhD 2015

Abstract

Energy efficiency has become a priority in supercomputing clusters because of rising energy consumption and growing computing demand for data analyses, scientific simulations, and emerging technologies such as Cloud Computing and Big Data. Power proportional computing, a goal in which power consumption is proportional to the serving load, is a promising approach to improving energy efficiency. In this thesis, we investigate provisioning control algorithms that apply to the entire cluster and speed scaling techniques that apply to single processors in order to realise that goal.

At a macro level (entire cluster), power proportionality can be achieved by switching off idle computing servers. It is however important that this is done with minimum impact on performance, which requires effective dynamic capacity provisioning. We study the stochastic behaviour of the future job occupancy in a supercomputer via the $M_t^X/G/\infty$ queue, with the aim of assisting the provisioning decisions of an energy saving control framework that employs Model Predictive Control. Our simulation results show that recognising and incorporating the dependency between the number of jobs arriving as a batch and the runtime of each of those jobs can improve the prediction ability. In simulations driven by real data, our framework shows an encouraging reduction in energy consumption compared to a scheme that switches servers off after they have been idle for a preset time. This is particularly marked when the dependence is large. We also investigate a workload prediction alternative that uses individual user information. This model shows improved results over auto-regressive models that have been found effective in predicting resource utilisation in grid computing.

At a micro level, we use speed scaling techniques to exploit the flexibility of modern computing servers, whose processors can run at different speeds with correspondingly different energy consumption. After developing numerical algorithms for finding the optimal speed-load mapping policy, we examine the ability of the system to operate in a near-optimal manner. When the error in the load estimate is low, the performance does not monotonically improve as the number of speed levels increases. This suggests that it may be possible to define a meaningful measure of local robustness which could be used to determine the number of speeds required at the design stage.

Dedication

To my Mom and Dad.

Acknowledgement

There are many people who have helped me during my candidature. First and foremost, I would like to express my deep gratitude to A/Prof. Lachlan Andrew for his mentorship and supervision. Despite moving to a new university, he has been with me every step of the way, from preparing my project proposal at the very beginning to the very last sentences of this thesis. It is my privilege to be his student and I would never have gone this far without his support, guidance and patience. I would also like to thank Dr. Philip Branch and Dr. Yoni Nazarathy for co-supervising me. Besides my research, they have also given me guidance and advice on many aspects of academic life. I thank Prof. Jarrod Hurley for his generosity in granting me access to the log files of the Swinburne Supercomputer, which greatly facilitated my research, and Gin Tan for her time and collaboration. I also thank Prof. Michel Mandjes for his kindness in agreeing to meet me in Amsterdam and for his comments on my work, which enlightened me and helped me to improve my model.

Many thanks to all the members of CAIA. In the last four years, I have had many great memories with the people I met and worked with at this place. I have even played futsal with some of them, for years. I would like to thank all of them, Tung, Hien, Lawrence, Nigel, Suong, Imrul and Redika, for all the chats, fun and stories we had and shared. Those were an important part of this academic life, which is sometimes tough. When doing a summer internship at CAIA back in 2008, I knew I wanted to come back to this EN605 lab to pursue this degree. The experiences and memories I had at this place will always be a high point of my life. Finally, I would like to thank all the FSET, Swinburne Research and administrative staff for doing an excellent job in providing me with great assistance over the years.

Declaration

I declare that the thesis hereby submitted to Swinburne University of Technology for the degree of Doctor of Philosophy in Information and Communication Technology has not previously been submitted by me elsewhere, that it is my original work, and that to the best of my knowledge it contains no material previously published or written by another person except where due reference is made.

Signed by TUAN VAN DINH

Date

Contents

Dedication
Acknowledgement
Declaration
List of Figures
List of Tables
List of Abbreviations
List of Notations

1 Introduction
    The green challenge in high performance computing
    Proposed approaches
    Hardware and architecture replacement
    Efficient cooling
    Power proportional computing
    The power proportional wall
    Dynamic capacity provisioning - a macro approach
    Speed scaling for computing components - a micro approach
    Challenges & constraints
    Summary
    Overview of Thesis
    Thesis statement
    Thesis contribution
    Thesis roadmap
    List of publications

2 Background
    Power management in HPC facilities
    Power distribution in a High Performance Computing facility
    2.1.2 Cooling systems
    Resource management in HPC facilities
    A case study of Swinburne supercomputers
    Monitoring tools
    Power proportionality for storage systems
    Cluster supercomputers vs. Data Centres
    Supercomputing in academic and research
    Measures of energy efficiency
    Other perspectives on energy efficiency
    Summary

3 Techniques for power proportionality
    Overview
    Dynamic capacity provisioning
    Enhancing existing infrastructure approaches
    Generic & theoretical approaches
    Models based on operator staffing in call centres
    Power proportionality in a distributed storage systems
    Power efficiency of computing components
    Speed scaling for energy efficient cluster computing
    Optimality and robustness of speed scaling
    Supercomputing computing workload characterisation
    Modelling workload characteristics for prediction
    Workload consolidation and coordinated scheduling
    Server virtualisation
    Coordinated scheduling
    Summary & Remarks

4 A control framework with Model Predictive Control
    Overview
    Framework
    Control objective
    Definitions of the cost objectives
    Formulation I
    Formulation II
    $M_t^X/G/\infty$ model of future workload
    4.2.6 Assumptions and simplifications
    Numerical solution techniques
    Existing results on $M_t^X/G/\infty$
    Recognising the dependency between job size and batch size
    Gaussian approximation
    Linear Programming for solving Formulation I
    Dynamic Programming for solving Formulation II
    Summary

5 Evaluation of the control framework via simulation
    Discrete event simulation
    Simulator: implementation and features
    Events
    Simplification of job scheduling
    Simulator validity
    Performance metrics for evaluation
    Impact of the dependence between job sizes and batch sizes
    Generating synthetic traces for simulations
    Benefits of incorporating job size and batch size dependence
    Online parameter estimation
    Estimation of the rate function $\lambda_t(s)$
    Estimation for job size and batch size distributions
    Performance using real trace inputs
    Improvement over passive estimations
    Performance with respect to other schemes
    Summary and remarks

6 Estimating arrival workload using individual user information
    Motivation and related work
    Roles of workload prediction in cluster computing
    Related work
    Properties of the workload traces
    Overview of the Swinburne Supercomputer
    Aggregate and individual user behaviour
    Prediction based on hazard functions of job inter-arrival time distributions
    Estimating hazard rate function from the arrival records
    6.4 HR-based prediction schemes
    Scheme definition
    Clustering small users
    Numerical evaluation
    Objectives
    Benchmark models
    Estimating rates rather than arrivals
    Performance evaluation
    Remarks
    Summary of contribution
    Concluding remarks

7 Energy efficiency in a single processor
    Overview
    Model and Control Schemes
    Model and notations
    Cost objective
    Control schemes
    Optimal design of an architecture
    A = < A < A =, continuum speed
    Global and local robustness of a design
    Global Robustness in Fixed Allocation and Adaptive Allocation Designs
    Quantitative measure of Local Robustness
    Related work and practical implications
    Related work
    Practical implications
    Summary and remarks

8 Conclusion and future work
    Conclusion
    Future work
    Final remarks

Appendices
    A Proof of Proposition
    B Implementation of the discrete event simulator
    B.1 Overview
    B.2 Extended features
    B.2.1 Server states
    B.2.2 Type of Events
    B.2.3 Extra features of the QueueSystem
    B.2.4 Modifications to the BIRTH&DEATH module
    B.2.5 Modification to the MPC MODULE
    B.3 A simple model of the rate limitation
    B.3.1 Remarks
    C Cost objective of speed scaling model

List of Figures

1.1 Causes and consequences of rising energy consumption in High Performance Computing (HPC) facilities
1.2 Illustration of the power proportionality wall in a server
An illustration of the relationship between Moab and Torque
Job arrival characteristics of user A on a random day. The horizontal axis indicates arrival times during that 24-hour span while the vertical axis shows the number of requested CPUs
Job arrival characteristics of user B on a random day. The horizontal axis indicates arrival times during that 24-hour span while the vertical axis shows the number of requested CPUs
Seasonal characteristics of the Swinburne Supercomputer workload: load is generally higher in office hours than in late evening and early morning, although it is more obvious in 2010 than 2011 and [year missing]. On weekends, workload is also generally less than that on weekdays
Building blocks of the simulator
Comparison of the cluster utilisation level between simulation and Ganglia's data for a 15-day period (November, 2011)
The solid line is the mean job size of batches of the size indicated by the horizontal axis, whilst the dashed line marks the one-hour threshold
The daily average batch arrival of a few variates of the simulated arrival process (the length of each variate is 60 days) along with the simulated rate calculated by the supercomputing traces
Batch size distributions of a few synthetic traces with the empirical distribution
5.6 The job size c.d.f. of synthetic workload with a reference to the empirical distribution obtained from traces
Job size and batch size correlation
Time-average prediction error of remaining jobs among existing jobs ($E[U_t(v)] - \tilde{U}_t(v)$) at slot $v$ ($v = 0, \ldots, W-1$)
Time-average prediction error of remaining jobs among new arrivals ($E[X_t(v)] - \tilde{X}_t(v)$) at slot $v$ ($v = 0, \ldots, W-1$)
Statistics of reduction in total cost of DEP over IND with different types of workload ($\nu_H = 100 = \nu_T = 100$, $\epsilon = 0.05$)
DEP vs. IND with cost breakdown
Performance of the Fixed-Idle scheme with different values of the idle threshold
This presents the cost (normalised) against the specified $\epsilon$ for the IND and DEP schemes. The scattered points are the costs of using the FI scheme, in which the horizontal axis presents the true shortfall rate
Empirical complementary cumulative distributions of the batch size
Empirical and Gamma fitted job size distributions
SE (left column) vs. AE (right column) in the breakdown of energy costs and switching costs for several values of $\epsilon$. Simulation time: T = 360 days, starting on 1st Jan [year missing]
An illustration of $a(t)$
From left to right: (i) FI ($t_w$ = 10 hours), (ii) the MPC-based scheme (AE) with optimisation Formulation I, (iii) the offline optimal cost. $\epsilon$ = [value missing]
Workload contribution by users in 2010 and [year missing]. Most workload was generated by a few users; approximately 15% of users generated 85% of the total workload
Total number of submissions in 2011 of the 10 biggest users (measured in total CPU hours of all submitted jobs in that year). Note that the big users do not necessarily submit many jobs
Curve fittings for six top users
Curve fittings for 3 clusters
AR coefficients for different values of $p$, calculated using the Yule-Walker method and the workload traces of the Swinburne Supercomputer in 2010 as the sample workload
6.6 Arrival workload of the Swinburne Supercomputer for a five-hour interval. Solid lines are the total requested CPUs in slots (width = [symbol missing]$_p$), while dashed lines are the smoothed version using (6.13). The arrivals appear to be very bursty but the overall characteristics are typical for such supercomputer systems ([symbol missing]$_p$ = 5 minutes, $W_p$ = 6)
Fixed Allocation (FA): the error metric, or the distance to the optimal cost using a policy optimised for load $\lambda_d$ with the reality of load $\lambda_a$ and fixed thresholds
System occupancies and general costs when the actual load is very close to $\mu_{max}$ ($\lambda_a = 0.9\mu_{max}$). This figure compares costs for A = 0 and A = 3 with low design loads
Adaptive Allocation (AA): distance to the optimal cost using a policy optimised for load $\lambda_d$ with the reality of load $\lambda_a$ and adjustable thresholds
Benefits in cost reduction of having adjustable thresholds at runtime
Local robustness of a fixed allocation (FA) as a function of the number of speeds: R(.)
B.1 Building blocks of the extended simulator

List of Tables

1.1 Progress of energy efficiency in data centres
Advantages and drawbacks of system approach for power-aware provisioning
Advantages and drawbacks of power-aware provisioning as a control optimisation
Main properties of the generated workloads. Note that only the generation of job size distinguishes True-correlation workload from High-correlation workload
The mean prediction error (proportion of capacity). Input workload-type: High-correlation, $\nu_H$ = [value missing]
The mean prediction error (proportion of capacity). Input workload-type: True-correlation, $\nu_T$ = [value missing]
User clusters with WEKA using k-means algorithm
Mean squared error and mean absolute error for different prediction schemes

List of Abbreviations

AA: Adaptive Allocation.
AE: Adaptive estimation.
AR: Auto-regressive.
ARFIMA: Auto-regressive Fractional Integrated Moving Average.
ARIMA: Auto-regressive Integrated Moving Average.
ARMA: Auto-regressive Moving Average.
BSD: Berkeley Software Distribution.
CMOS: Complementary metal-oxide-semiconductor.
CPCI: Computer Power Consumption Index.
CRAC: Computer Room Air Conditioning.
CRS: Capacity Right Sizing.
DCiE: Data Centre Infrastructure Efficiency.
DCP: Dynamic Capacity Provisioning.
DCPI: Data Centre Physical Infrastructure.
DFS: Dynamic Frequency Scaling.
DFVS: Dynamic Voltage Frequency Scaling.
DP: Dynamic Programming.
DR: Dynamic Range.
DRS: Dynamic Right Sizing.
DVS: Dynamic Voltage Scaling.
EPEAT: Electronic Product Environment Assessment Tool.
FA: Fixed Allocation.
FI: Fixed Idle.
FLOP: Floating-point Operations Per Second.
GPU: Graphics Processing Unit.
HPC: High Performance Computing.
IC: Integrated circuit.
ICT: Information and Communication Technology.
IEEE: Institute of Electrical and Electronics Engineers.
IoTs: Internet of Things.
IPMI: Intelligent Platform Management Interface.
ISA: Instruction Set Architecture.
IT: Information Technology.
LCP: Lazy Capacity Provisioning.
LLC: Limited Look-ahead Control.
LP: Linear Programming.
MAE: Mean Absolute Error.
MPC: Model Predictive Control.
MSE: Mean Squared Error.
NHPP: Non Homogeneous Poisson Process.
PDU: Power Distribution Unit.
PMF: Probability Mass Function.
PS: Processor Sharing.
PSA: Pointwise Stationary Approximation.
PUE: Power Usage Effectiveness.
RHC: Receding Horizon Control.
SE: Static estimation.
SLA: Service Level Agreement.
SPUE: Server Power Usage Effectiveness.
SSA: Simple Stationary Approximation.
TCP: Transmission Control Protocol.
TORQUE: Terascale Open-source Resource and Queue Manager.
TPUE: Total Power Usage Effectiveness.
UPS: Uninterrupted Power System.
VM: Virtual Machine.

List of Notations (Chapter 4 & 5)

[symbol missing]: length of an MPC slot.
$k$: slot number.
$\kappa$: total number of slots of an entire horizon.
$N_0$: total number of servers (capacity).
$W$: limited look-ahead horizon.
$n(k)$: number of powered-on servers during slot $k$.
$u(k)$: number of servers being switched at the start of slot $k$.
$Z(k)$: total number of jobs running at slot $k$.
$\beta_1$: weight of the energy costs.
$\beta_2$: weight of the switching costs.
$\beta_3$: weight of the performance penalty.
$C(k)$: cost of slot $k$ in Formulation I.
$\epsilon$: shortfall threshold.
$\hat{C}(k)$: cost of slot $k$ in Formulation II.
$X_t(s)$: predicted number of jobs that arrive after $t$ and are still in the system at $t+s$.
$U_t(s)$: predicted number of jobs that arrived before $t$ and are still in the system at $t+s$.
$Z_t(s)$: predicted number of jobs in the system at $t+s$.
$\lambda(t)$: rate function of the non-homogeneous batch Poisson process.
$\mu_G$: average service rate.
$p_b$: probability mass function of batch size.
$F_G$: cumulative distribution function of job size.
$\bar{F}_G$: complementary cumulative distribution function of job size.
$F_{G_l}$: cumulative distribution function of job size in a batch of size $l$.
$\bar{F}_{G_l}$: complementary cumulative distribution function of job size in a batch of size $l$.
$M_t$: number of batches running in the system at $t$.
$\Lambda_t(s)$: (compound) rate of the compound Poisson random variable $X_t(s)$, see $X_t(s)$.
$\Omega_k$: defined as $\int_0^t F_{G_k}(t-s)\lambda(s)\,ds$, see $F_{G_l}$ and $\lambda(t)$.
$U_t(0)$: number of jobs running in the system at $t$.
$\theta_\epsilon$: minimum number of servers to meet optimisation constraints with parameter $\epsilon$.
$\hat{X}_t(s)$: Normal approximation of $X_t(s)$.
$\hat{U}_t(s)$: Normal approximation of $U_t(s)$.
$\hat{Z}_t(s)$: Normal approximation of $Z_t(s)$.
$\xi^W_X$: limited-horizon average prediction error of incoming jobs.
$\xi^W_U$: limited-horizon average prediction error of remaining jobs.
$\xi^W_X$: overall average prediction error in limited horizon.
$\xi^\kappa_X$: average prediction error of incoming jobs in entire control horizon.
$\xi^\kappa_U$: average prediction error of remaining jobs in entire control horizon.
$\xi^\kappa_X$: overall average prediction error in entire control horizon.

List of Notations (Chapter 6)

$z_j(.)$: probability mass function of the inter-arrival times between batches of user $j$.
$h_j(.)$: hazard rate function of $z_j(.)$.
$\hat{h}_j(.)$: estimated hazard rate function of $z_j(.)$ from data.
$\mu_j$: mean number of required CPUs per request of user $j$.
$s_j(t)$: the last time before $t$ user $j$ submitted a job.
$\delta$: maximum gap between consecutive jobs in a batch.
$N_u$: total number of users.
$T_{max}$: maximum inter-arrival time in empirical data (for a user or a cluster of users).
$\hat{L}$: number of big users.
$C$: number of clusters of small users.
$\phi_i$: $i$-th ($i \le p$) coefficient of the autoregressive model AR($p$).
$p$: order of the AR (Autoregressive) model.
$M$: number of timeslots in evaluation of HR workload prediction schemes.
$W_p$: average window for smoothing.
[symbol missing]$_p$: length of a slot in evaluation of HR workload prediction schemes.

List of Notations (Chapter 7)

$\lambda$: Poisson rate of M/G/1 PS queue.
$\lambda_d$: design load.
$\lambda_a$: actual load in real time.
$Q(t)$: number of jobs in the M/G/1 PS queue at time $t$.
$s_n$: service rate (speed) when there are $n$ jobs in the system.
$A$: architecture parameter specifying number of unique speeds.
$M$: set of available speeds.
$E$: energy cost to process a job.
$N$: number of jobs in the system.
$T$: waiting service time of a job in the system.
$P_n$: power consumption when occupancy of the M/G/1 PS is $n$.
$\beta$: weighted cost of delay to energy cost in speed scaling model.

Chapter 1

Introduction

1.1 The green challenge in high performance computing

Increased energy consumption - a rising concern

The rising energy consumption of Information and Communication Technology (ICT) infrastructure, together with the increase in electricity prices, has become a major management and operational concern among computing and networking facilities. In 2006, ICT infrastructure was estimated to account for nearly 3% of global electricity consumption and the same proportion of greenhouse gas production. More importantly, this was expected to increase rapidly [42]. Such an increase not only affects electricity bills but also raises other problems such as cooling costs and hardware failure. Meanwhile, the fast growth of Internet infrastructure and applications, together with the nonstop production of new technology and hardware, requires even more energy. Within this ICT complex, nearly half of the energy consumption and greenhouse emissions are contributed by servers in data centres, where tens of thousands of machines are networked and placed in relatively small areas.

From the High Performance Computing (HPC) operation management point of view, besides electricity costs, increasing power consumption means more investment in cooling systems and shorter lives for servers and supporting devices, such as Power Distribution Units (PDUs) and cables, due to heat damage; hence extra care and costs are required. In addition, the availability of electrical power, especially at peak utilisation, has recently become a significant problem [150].

Environmental impacts

There are also considerable environmental concerns behind rising energy consumption. Even though the energy usage and CO2 emissions of ICT equipment are well below those of the agriculture or transportation industries, this is not simply a 3% problem [163]. In fact, there are further economic and management consequences. For example, heat dissipation can degrade hardware and increase the failure rate, which in turn leads to more IT waste (computer hardware, electronic devices, etc.), which is known to be very environmentally unfriendly in both its production and disposal [96].

In recent years, the density of clustered racks placed in a physical room has increased due to the constantly rising demand for computing resources [166]. In data centres, thermal management is an important operational requirement to keep the temperature and humidity in the room at acceptable levels for the deployed hardware. As a result, cooling is a major cost in data centres, and as power usage increases, so do the cooling cost, the system failure rate, the wear-and-tear cost and so forth. The annual budget for power and cooling in cluster computing is approaching the annual budget for deploying new servers [97].

Computing demand on the rise

On the other hand, the consumption of power in data centres and supercomputing facilities is expected to grow due to the increasing demand for computing power and emerging technologies. For example, scientific research demands supercomputer facilities that are many times more powerful. Engineers are working to build exa-scale supercomputers that can achieve more than $10^{18}$ FLOPs. In addition, high performance supercomputers are an important tool for Big Data, which increasingly attracts attention in both academic and industrial disciplines.
The proliferation of cloud computing also greatly contributes to this trend, as cloud computing relies largely on the computing capacity of these supercomputing facilities. The huge potential in terms of cost effectiveness and convenience will likely make cloud computing services increasingly popular. For example, ventures can save on the capital cost of infrastructure investment and maintenance, and data can be accessed from everywhere. As the world has become more connected than ever and half of the world population will probably live in cities in the near future, the Internet of Things (IoTs) will increase the use of ICT technologies to a whole new level [87]. The IoTs phenomenon is exciting, but it does add more computation burden to the cloud, as end-user devices will only act as terminals. That computation burden is then redirected to the data centres and supercomputing facilities, making the task of green computing even more urgent and challenging [23].

Energy efficiency as a primary design objective

Governments have encouraged consumers to favour energy efficient products by promoting benchmarks and tools to evaluate the effect of the products on the environment. For instance, the Electronic Product Environment Assessment Tool (EPEAT) is a rating system that helps consumers choose environmentally friendly products. EPEAT was initiated by the Western Electronic Product Stewardship Initiative and endorsed by international communities, including the Institute of Electrical and Electronics Engineers (IEEE) [4]. In the United States, the Energy Star program promotes energy efficiency standards for a wide range of electronic devices (including computers and servers) and helps guide businesses and consumers towards greener computing [5].

Figure 1.1: Causes and consequences of rising energy consumption in High Performance Computing (HPC) facilities.

In data centres and computing clusters, energy efficiency has become a mainstream concern in green management. In fact, power and cooling have been recognised as primary design concerns in supercomputers for many years. Experts now view energy efficiency as an important metric when considering the performance of cluster servers [56].

1.2 Proposed approaches

Hardware and architecture replacement

Since attention was brought to the rising energy consumption in data centres, much research and engineering effort has been devoted to many aspects of this problem, ranging from software and hardware to cross-platform redesign and architectural considerations. In the long run, replacing existing energy-inefficient hardware with components that are application oriented and energy efficient is a desirable approach. This approach, however, would require the collaboration of many parts of this ecosystem because of their complexities and interdependencies. The interdependence can be viewed as the tight relation between the business model and technical implementation, end-user impacts and impacts on services, as suggested in [187]. For example, as many data centres have begun to use Graphics Processing Units (GPUs) (which are capable of more computation per unit of power [98]), new programming models and languages must be developed to use the new hardware efficiently.

Efficient cooling

Attention has shifted towards making existing infrastructure more energy efficient. First, IT power is not the only major energy consumer. The cooling infrastructure is also responsible for a large amount of energy consumption. However, advanced cooling techniques developed in recent years have greatly reduced the power consumption of cooling systems. They include effective cooling designs [30, 54, 154] and capitalising on free cooling, especially liquid cooling, which is widely regarded as a success [86]. These practices will be discussed in more detail later, where a proper context is introduced.

In [86], Greenberg et al. present their investigation of the Power Usage Effectiveness (PUE, the ratio of the total power consumed by the entire facility to the power consumed by the IT equipment alone) of 22 data centres, which found that the PUE ranges from 1.33 to 3 (equivalently, the Computer Power Consumption Index (CPCI), the inverse of PUE, ranges from 0.75 to 0.33). Recently, Google has claimed that its data centres operated at a PUE of 1.11 in the last quarter of 2013 [3]. Their average PUE in mid-2008 was already around [value missing]. Table 1.1 summarises the progress in improving PUE (largely due to smart cooling practices).

Table 1.1: Progress of energy efficiency in data centres

Time | Report | PUE
[year missing] | A study by Greenberg et al. [86] | PUE varied from 1.33 to 3 across 22 investigated data centres, 1.83 on average.
[year missing] | Belady et al. studied three data centres [32] | A carefully designed data centre can achieve a PUE of 2.
[year missing] | US Environmental Protection Agency's report, published in [year missing] [42] | A PUE score of 1.4 can be achieved by [year missing].
Last quarter of 2013 | Google's data centres [3] | Achieved a PUE of 1.11.

As the power consumption of cooling systems has been greatly reduced, further savings can only be achieved by reducing the actual power consumption of the infrastructure that is directly involved in computation, namely computing servers, storage systems, etc. In the following section, we shall discuss power proportional computing, one of the approaches to realise that goal.

1.3 Power proportional computing

The power proportional wall

Power proportionality is a goal in which power consumption is proportional to the utilisation of the computing device. Utilisation can be understood as the CPU's load. To realise power proportionality, the power consumption must be proportional to the utilisation, as illustrated by the unmarked solid line (for the case of linear proportionality) in Figure 1.2. Unfortunately, this is far from the case in practice, as a server consumes a significant amount of power even when idle. This non-trivial power is consumed by components such as disks, memory and the power supply unit. One measure of the power proportionality of a server is the dynamic range, $DR = 1 - P_{idle}/P_{peak}$ (DR = 1 in the ideal case) [189]. Note that for the same value of DR, there are several possible curves of power consumption as a function of the CPU's utilisation (Figure 1.2).

For a typical computing cluster server, the power usage of an idle server (near zero utilisation) often exceeds 60% of its peak power [51] (DR = 0.4). In fact, according to [27], even an energy efficient server still consumes around 50% of its peak power when doing no useful work. More importantly (still from [27]), the energy efficiency of a server at 20% to 30% utilisation, the most frequent occurrence, is only half that at peak performance. Here, energy efficiency refers to the ratio of the utilisation level to the power usage. This is clearly a poor score, but it also implies an opportunity for improvement.
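The metrics used so far (PUE, CPCI, DR and utilisation-based energy efficiency) can be made concrete with a small numerical sketch. It is only an illustration under stated assumptions: the function names are hypothetical, power is normalised so that peak power is 1, and the power curve is assumed to be linear in utilisation, which is just one of the several curves compatible with a given DR.

```python
def cpci_from_pue(pue: float) -> float:
    """Computer Power Consumption Index, i.e. the inverse of PUE."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1: IT power is part of total power")
    return 1.0 / pue


def dynamic_range(p_idle: float, p_peak: float) -> float:
    """DR = 1 - P_idle / P_peak (DR = 1 for an ideal power proportional server)."""
    return 1.0 - p_idle / p_peak


def linear_power(utilisation: float, p_idle: float, p_peak: float) -> float:
    """Assumed (hypothetical) linear power curve between idle and peak power."""
    return p_idle + (p_peak - p_idle) * utilisation


def relative_efficiency(utilisation: float, p_idle: float, p_peak: float) -> float:
    """Utilisation per unit of power, relative to the same ratio at full load."""
    efficiency = utilisation / linear_power(utilisation, p_idle, p_peak)
    efficiency_at_peak = 1.0 / p_peak
    return efficiency / efficiency_at_peak


if __name__ == "__main__":
    # PUE range reported for the 22 data centres and the corresponding CPCI
    for pue in (1.33, 3.0):
        print(f"PUE {pue:.2f} -> CPCI {cpci_from_pue(pue):.2f}")
    # Idle power at 60% of peak gives DR = 0.4
    print(f"DR = {dynamic_range(0.6, 1.0):.2f}")
    # Idle power at 50% of peak: efficiency at 20% and 30% utilisation
    for u in (0.2, 0.3):
        print(f"u = {u:.1f}: relative efficiency {relative_efficiency(u, 0.5, 1.0):.2f}")
```

Under the linear assumption with idle power at 50% of peak, the relative efficiency at 30% utilisation is a little under one half, which is in line with the observation quoted from [27].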

27 power consumption (peak normalised) linear proportional DR=0.4 (linear) 0.2 DR=0.4 (sublinear) DR=0.4 (superlinear) Efficiency, DR=0.4 (linear) (%) CPU utilisation Figure 1.2: Illustration of the power proportionality wall in a server.. Much effort has been made to improve the power proportionality in computing servers. In fact, greater energy efficient servers can achieve higher DR scores using more energy efficient hardware components or innovated server architectures [60, 144, 145, 189]. For example, Wong et al. [189] recorded a DR score of 0.8 in their examination of server power proportionality trend in nearly 300 servers, although the average is still less than 0.7. However, these methodologies requires highly customised servers or significant redesign and it raises the concerns whether the costs can justify the benefits. Nevertheless, achieving DR=1 or close to 1 is still considered nearly impossible. Note that power proportionality is not the only approach in green computing. For example, reducing power consumption at peak or any level of a CPU s utilisation is among the alternatives, although it requires innovation and modification in hardware. More importantly, as the power gap between being idle and completely switched off reduces (DR approaches 1), the need to completely switch off idles servers becomes less Dynamic capacity provisioning - a macro approach Because of the significant gap of power utilisation of idle servers in practice (close to 50% peak power) and the case of power proportionality (zero), switching off idle servers can be considered as the most efficient methodology to save energy. For clusters where the overall utilisation is less than the full capacity (a common occurrence as clusters rarely operate at full capacity [160]), meaning that there are many servers sitting idle, this method can result in great energy savings. By reducing the power usage from at least 50% peak to nearly zero, server usage becomes more power proportional. 
As a result, the whole cluster achieves a greater efficiency in terms of spending energy for an amount of work performed. Such an approach that provisions just enough number of servers 25

while maintaining server satisfaction is called Dynamic Capacity Provisioning (DCP) or Dynamic Right-sizing (DRS). In addition, workload consolidation techniques (such as virtualisation) can help concentrate the workload onto a smaller number of servers. For example, applications that require a small amount of memory for a short period of time can be run on a virtual machine (VM), and that VM can then be squeezed into a running (physical) machine instead of being allocated a new machine, especially when that machine would have to be powered on, not to mention the time it takes to boot. Job scheduling (while still maintaining acceptable service with respect to the Service Level Agreement (SLA)) can also play a role in improving energy efficiency; for instance, switching a server on can be avoided by delaying the execution of a job until a soon-to-be-available server is free. Therefore, DCP can work hand in hand with workload consolidation techniques to achieve an even greater energy saving.

Note that the DCP approach only deals with the overall capacity, without regard to how the load is actually distributed among the servers or to scheduling policies. It can be considered a method to achieve power proportionality from a macro perspective, although there are proposals to combine DCP with detailed power saving techniques for individual servers [69]. The load that is allocated to a particular server can vary, and speed adaptation techniques can be used to further improve the effectiveness and efficiency of energy use, as shall be discussed in the following section.

Speed scaling for computing components - a micro approach

Speed scaling is an approach that scales an individual server's speed to adapt to the variation of the workload assigned or distributed to that server, with the aim of improving energy efficiency.
For example, it is in some cases wasteful of energy for CPUs to run at the maximum speed, especially for tasks whose CPU time is relatively small compared with their I/O (Input/Output) time. In CPUs, energy consumption grows with the clock frequency. Specifically, it is known that energy consumption is approximately proportional to the cube of the frequency in most CMOS ICs [43]. Therefore, potential energy savings can be realised by scaling down the clock frequency in periods that are not CPU intensive, or by applying speed scaling techniques in general.

To better understand how speed scaling fits into the big picture of power proportionality in cluster computing, consider a useful breakdown of power usage in an HPC facility proposed by [28] through the metric of energy efficiency, defined as:

\[
\text{Energy Efficiency} = \frac{1}{\text{PUE}} \times \frac{1}{\text{SPUE}} \times \frac{\text{Energy covered by computation}}{\text{Energy consumed by computing components}} \tag{1.1}
\]

Multiplying the total energy consumption by the first term (1/PUE) isolates the energy consumed by the IT equipment. The SPUE (Server Power Usage Effectiveness), according to Barroso et al. [28], is the server-level PUE: it relates the total energy drawn by a server to the energy consumed by the electronic components that are directly involved in computation, such as CPUs, DRAM, disks and so on. The remaining energy goes to other components, such as power supplies and fans. In equation (1.1), the total (or true) power usage effectiveness (TPUE = PUE × SPUE) tells us what portion of the total power draw of the entire cluster is consumed by the actual computing components (namely 1/TPUE). So far, how the energy consumption of these components varies with the amount of computation (workload) has not been considered as an opportunity for improving energy efficiency. One may consider the Dynamic Capacity Provisioning approach as improving the SPUE, while speed scaling techniques try to reduce the denominator of the third term, thus improving the overall energy efficiency. Moreover, speed scaling techniques can be particularly useful in systems where switching off may be prohibitive because of strict requirements on access to data (note that this problem is less significant in the presence of a separate storage system, which is very common in modern data centres) or because of the time needed to wake up the machine [113, 174, 189].

There have been many studies on improving the energy efficiency of CPUs using speed scaling techniques, such as Dynamic Voltage and Frequency Scaling (DVFS), Dynamic Voltage Scaling (DVS) or Dynamic Frequency Scaling (DFS) [19, 21, 33, 43, 69, 72, 81, 186]. The goal is to reduce the overall energy consumption by appropriately reducing the CPU's speed. One of the constraints in this problem is the task's execution time, which is obviously affected by the speed scaling.
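The tension between energy and execution time can be illustrated with a minimal sketch. It assumes that power draw is proportional to the cube of the speed (an idealisation of the cubic relationship cited above), that only a finite set of speeds is available, and that the task must finish before a deadline. The function names, the constant `k`, and the numbers are hypothetical and do not come from the thesis; static (idle) power is ignored.

```python
from typing import Optional, Sequence


def energy(work: float, speed: float, k: float = 1.0) -> float:
    """Energy to finish `work` units at constant `speed`.

    Assumption: power = k * speed**3, so energy = power * runtime.
    """
    runtime = work / speed
    return k * speed ** 3 * runtime


def best_speed(work: float, deadline: float,
               speeds: Sequence[float], k: float = 1.0) -> Optional[float]:
    """Return the lowest-energy speed that still meets the deadline.

    Returns None when no available speed is fast enough.
    """
    feasible = [s for s in speeds if s > 0 and work / s <= deadline]
    if not feasible:
        return None
    return min(feasible, key=lambda s: energy(work, s, k))


# Hypothetical example: four available speeds, 6 units of work, deadline of 4.
speeds = [1.0, 1.5, 2.0, 3.0]
chosen = best_speed(work=6.0, deadline=4.0, speeds=speeds)
```

Under these assumptions the slowest speed that still meets the deadline is selected, since energy grows with the square of the speed for a fixed amount of work. A model that also accounts for static power could favour a faster speed, which is one reason the policy is usually obtained by optimisation rather than by a fixed rule.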
The speed scaling problem is therefore often formulated as an optimisation problem: find the best policy that matches the CPU's speed to a particular load level while still satisfying given performance constraints.

Challenges & constraints

Dynamic capacity provisioning

The realisation of the DCP approach in practice faces several challenges. First of all, it requires prediction of the future workload, and obtaining reliable and accurate workload predictions is non-trivial. Secondly, the switching costs are non-trivial and may introduce additional overheads.

Booting or shutting down a server consumes a considerable amount of energy, because system states must be saved to and loaded from memory or drives [120]. Other related services must also be re-initiated in order to fully re-attach the server to the cluster; this is known as the setup cost. The process takes non-trivial time to complete, on the order of minutes (4 minutes is reported in [79]), and can impose significant delays on some applications as well as increase management overheads. Furthermore, switching activities can impose wear-and-tear costs and increase the server failure rate.

The third concern of the DCP approach is the impact on performance. It is undesirable to save energy at the expense of performance; at the very least, a bound on the performance degradation caused by provisioning activities must be known. In some cases, a delay of a few seconds (or less) could have a serious impact on service. For example, a study shows that for every 100 ms of delay, Amazon loses 1% in sales [2]. As a result, provisioning decisions must be made in a balanced way that considers all these factors.

Finally, there are non-trivial technical obstacles that must be overcome to provide effective provisioning with a minimal impact on performance. The technical overhead does not stop at implementing a mechanism to switch servers on and off. There are issues related to routine administrative responsibilities. For example, software is constantly updated and it is important that all servers run the newest version. Turning servers on and off interrupts this process, and such an intervention creates difficulty from a management point of view. However, some system updates can only be tested when a server is booting, so more frequent switching may actually be beneficial, as errors can be detected at an early stage of the deployment.
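The balance between switching costs and energy savings can be sketched with a simple break-even calculation. This is an illustrative model only: the function names are hypothetical, the energy of a shutdown and boot cycle is treated as a single lump sum, and the 150 W idle power and 60 kJ cycle energy are made-up figures. Only the 4 minute setup time is taken from the text.

```python
def break_even_idle_time(idle_power_w: float, off_power_w: float,
                         cycle_energy_j: float) -> float:
    """Idle duration (seconds) beyond which switching off saves energy.

    Assumption: staying idle costs idle_power_w per second, while switching
    off costs a one-off cycle_energy_j plus off_power_w per second.
    """
    saved_per_second = idle_power_w - off_power_w
    if saved_per_second <= 0:
        raise ValueError("switching off must draw less power than idling")
    return cycle_energy_j / saved_per_second


def should_switch_off(predicted_idle_s: float, idle_power_w: float,
                      off_power_w: float, cycle_energy_j: float,
                      setup_time_s: float) -> bool:
    """Switch off only if the predicted idle gap exceeds both the energy
    break-even time and the time needed to bring the server back."""
    threshold = max(
        break_even_idle_time(idle_power_w, off_power_w, cycle_energy_j),
        setup_time_s,
    )
    return predicted_idle_s > threshold


# Hypothetical server: 150 W when idle, 0 W when off, 60 kJ per off/on cycle,
# and a 240 s (4 minute) setup time.
threshold_s = break_even_idle_time(150.0, 0.0, 60_000.0)
long_gap = should_switch_off(600.0, 150.0, 0.0, 60_000.0, 240.0)
short_gap = should_switch_off(300.0, 150.0, 0.0, 60_000.0, 240.0)
```

The decision depends on a predicted idle duration, which is why the quality of workload prediction matters so much for provisioning. The sketch does not model wear-and-tear or the performance impact of jobs that arrive while a server is still booting.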
Speed scaling

When applying speed scaling techniques, the first question is essentially to find the scaling policy that maps the load to a scaled speed. The actual scaling can be achieved by different means such as DVFS, DFS or DVS. Although a slower speed reduces power consumption, it increases the execution time, which can be subject to performance or application constraints. Therefore, the speed should be chosen so that the energy saving is maximised while the service constraints are met. As a result, it is often regarded as an optimisation problem whose solution is a scaling policy that optimises an objective involving both energy consumption and service delay. There have been many published studies on this optimisation problem with variations

in formulations and constraints (which shall be discussed in more detail in Chapter 3, where a literature review on this subject is presented). However, there are several remaining issues that the previously proposed models have not yet covered, or at least the models are not entirely applicable under some fundamental practical constraints. For example, the speed is always upper-bounded, as there is a limit to the system's capacity. Moreover, although modern servers support processors with multiple speeds via different clock rates, the number of available clock rates, and hence the number of speeds, is finite. Therefore, these constraints should be taken into account when considering the feasibility of the scaling policies.

Another important issue is the robustness of the system with regard to a scaling policy. Usually, an optimal scaling policy is determined for a given workload, e.g. a homogeneous Poisson process with a particular rate. The robustness of the system relates to its ability to operate in a near-optimal manner when the estimates of the workload parameter values (the Poisson rates) are not precise, or even grossly incorrect. Given that workload mis-estimation is common in practice, the investigation of adaptive schemes with the given set of speeds may improve the robustness of the system.

1.4 Summary

The rising energy consumption in data centres and supercomputing clusters has become a mainstream concern, not only because of the electricity bills but also because of many associated costs such as cooling costs, hardware failures and environmental impacts. At the same time, the rising supercomputing demands for simulations, data analyses, and emerging technologies like Big Data, Cloud computing or the Internet of Things will increase the computing demand very quickly. It is thus important and urgent to make these facilities more energy efficient. In a supercomputer cluster, there are IT equipment and supporting infrastructure.
IT equipment includes servers (where computation takes place), storage, and the networking devices connecting them. The supporting infrastructure includes the cooling systems and the power delivery system. The power usage effectiveness (PUE), the ratio of the total power entering the facility to the power consumed by the IT equipment, measures how effectively power is used within a facility. A stricter energy efficiency measure would be the ratio of the power consumed by the components that carry out the computation to the total power. Nevertheless, PUE has been widely used. Advanced cooling techniques, such as liquid cooling and capitalising on free cooling, have greatly brought down cooling costs. In order to further reduce energy consumption, the IT equipment must become more energy efficient. Energy conservation approaches

such as workload consolidation techniques (virtualisation) and smart resource allocation have been used in that regard. Making the computing components of a server more energy efficient by using speed scaling techniques can also contribute to that effort.

Power proportionality is a goal in which the power consumption is proportional to the workload. Supercomputing facilities are far from power proportional, since an idle server's power usage is at least 50% of the peak power and idleness is common in these facilities. However, power proportionality can be approached by maintaining only an appropriate number of active servers while idle servers are switched off to save energy. Such an approach is known as Dynamic Capacity Provisioning. This approach, however, faces several challenges. First of all, there are costs associated with switching, not to mention the infrastructure overheads of implementing the power saving mechanism. Secondly, provisioning requires workload prediction, which is not trivial. Finally, some performance degradation is inevitable with any provisioning scheme, and it is undesirable to save energy by sacrificing performance.

Even if idle servers are switched off with an acceptable impact on performance, power proportionality is not yet perfectly achieved. By dynamically adjusting the number of active servers in the cluster, we only save the energy that should not have been wasted (improving SPUE in equation (1.1)). After the workload is allocated to the available (powered on) servers, applying speed scaling techniques guides each individual server to consume power optimally with respect to its incoming load, and thus helps bring the cluster's performance closer to the goal of power proportionality. In a sense, Dynamic Capacity Provisioning addresses the problem in a macro manner, while speed scaling deals with it at a micro level (single processor).
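The macro effect summarised above can be illustrated with a small calculation. The sketch assumes a linear server power model in which an idle server draws 50% of peak power, that a switched-off server draws nothing, and that the load can be consolidated freely onto the active servers. Function names and the 100-server, 30% utilisation scenario are hypothetical; switching costs and performance effects are ignored.

```python
import math


def server_power(utilisation: float, idle_fraction: float = 0.5) -> float:
    """Normalised power draw (peak = 1) under an assumed linear model."""
    return idle_fraction + (1.0 - idle_fraction) * utilisation


def cluster_power_all_on(n_servers: int, load: float,
                         idle_fraction: float = 0.5) -> float:
    """Power when every server stays on and the load is spread evenly.

    `load` is measured in units of fully utilised servers.
    """
    return n_servers * server_power(load / n_servers, idle_fraction)


def cluster_power_provisioned(load: float, idle_fraction: float = 0.5) -> float:
    """Power when only just enough servers are kept on (off servers draw 0)."""
    active = math.ceil(load)
    if active == 0:
        return 0.0
    return active * server_power(load / active, idle_fraction)


# Hypothetical cluster: 100 servers running at 30% overall utilisation.
all_on = cluster_power_all_on(100, 30.0)
provisioned = cluster_power_provisioned(30.0)
```

In this idealised setting the always-on cluster draws 65 units of peak server power while the provisioned cluster draws 30, which equals what a perfectly power proportional cluster would draw for the same load. Real savings would be smaller once switching costs, prediction errors and performance constraints are taken into account.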
1.5 Overview of Thesis

Thesis statement

A theoretical control framework is developed to apply the Dynamic Capacity Provisioning approach to improve the energy efficiency of supercomputing clusters with consideration of system performance. This framework exploits the stochastic behaviour of the system's future occupation, which is predicted via a queueing model. A new perspective on workload estimation is proposed that shifts towards exploiting individual user information rather than aggregated user data. Numerical algorithms are developed for finding the optimal scaling policy when applying speed scaling techniques to single servers. Factors at the design state that impacts on

Tuan V. Dinh, Lachlan Andrew and Philip Branch

Predicting supercomputing workload using per user information. Tuan V. Dinh, Lachlan Andrew and Philip Branch. 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. Delft, 14th-16