eraid: A Queueing Model Based Energy Saving Policy


Dong Li
University of Nebraska-Lincoln, Lincoln, NE
li@cse.unl.edu

Jun Wang
University of Central Florida, Orlando, Florida
wangjunwang@gmail.com

Abstract

Energy consumption has recently become an ever more critical concern for both low-end and high-end storage servers. In this paper, we propose an energy saving policy, eraid (energy-efficient RAID), for mirrored redundant disk array systems. eraid saves energy by spinning down part or all of the mirror disk group with controllable performance degradation. We first develop an energy-saving model for multi-disk environments that takes both disk characteristics and workload features into account. We then develop a queueing-model-based performance (response time and throughput) control scheme for eraid. Experimental results show that eraid can save up to 32% energy without violating predefined performance degradation constraints.

(This work is supported in part by the US National Science Foundation under grants CNS-61263, CNS-948 and CCF-42999, and by the US Department of Energy Early Career Principal Investigator Award DE-FG2-ER2687.)

1 Introduction

In today's high-performance data-intensive computing systems, the energy consumed by the disk-based storage subsystem may easily surpass the energy consumed by the rest of the computing system [1, 2]. A majority of existing energy conservation solutions are designed for multi-speed disks [9, 11, 13, 19, 2]. Since current server systems are still built with conventional disks, it is important to provide new solutions for conventional disks. The most fundamental way to save energy on a conventional disk is to switch it to the standby state when no requests are arriving. Since server-side workloads are usually too intensive to supply idle periods long enough for disks to spin down and save energy, such long idle periods have to be created artificially. Without interfering with the execution of applications, a possible way to create long idle periods for disks is to unbalance the disk workloads. Currently, there are two approaches to unbalancing workloads for conventional disks toward energy saving: relocating data and redirecting requests.

The representative works for the first approach are MAID [] and PDC [1]. The common feature of both policies is to intentionally unbalance disk loads by migrating data across disks according to changes in the data access patterns. However, Pinheiro et al. [1] find that MAID and PDC can conserve energy for conventional disks only when the load on the server is extremely low. The basic idea of the second approach, request redirecting, is to intentionally bypass a data target (e.g., a disk) by redirecting requests to other data target(s) that provide the same information to users. We explored how to use this approach in RAID systems in [11, 12, 18]. Recently, Pinheiro et al. [6] introduced Diverted Accesses, which leverages redundancy for energy saving. In all of the above works, the performance impact of redirecting requests is not properly controlled. The work in [13] is among the first to develop control algorithms that ensure performance guarantees for a multi-speed-disk-based storage system, but its performance prediction scheme is still at an early stage and does not take into account many critical factors, such as workload dynamics, time criticality, disk queueing delay and disk array organization.

In this paper, we develop an energy saving policy called eraid that deploys request redirecting for conventional-disk-based RAID-1 systems. Experimental results show that eraid can save up to 32% energy without violating predefined performance degradation constraints.

The rest of this paper is organized as follows. Section 2 describes related work and motivation. The design of eraid is introduced in Section 3, and Section 4 presents the experimental evaluation. Finally, we give concluding remarks in Section 5.

2 Related work and motivation

There are three major problems with current server-side disk energy
management policies. (1) System reconfiguration: most current energy saving policies for server-side storage systems require some system reconfiguration, such as deploying multi-speed disks [9, 11, 13, 19, 2], adding cache disks [, 1], or resetting the system redundancy configuration according to energy saving expectations [6]. (2) Single performance measurement: the traditional performance metrics of I/O systems are throughput (bandwidth) and response time (latency); to our knowledge, all current disk energy management algorithms take only one of these two measures into consideration for the tradeoff. (3) No differentiation of workload time criticality: synchronous and asynchronous loads have different time criticalities, so their reactions to storage system performance degradation also differ.

Three observations motivate us to develop an energy saving policy for conventional-disk-based mirrored-redundant RAID systems. (1) With the help of redundant information, it is possible to generate long idle periods for disks even in data-intensive environments. (2) Server disks usually have spare service capacity [9, 11], and system loads are roughly predictable [19]. (3) Queueing models are a widely adopted method for modeling RAID systems for performance analysis and prediction [3, 14, 16], and they provide performance measures for both throughput and response time.

3 eraid

The main idea of eraid is to spin down part or all of the mirror disk group to standby mode to save energy. While these disks are in the standby state, read requests are served by the data copies on the primary disks; write requests destined for the standby disks are deferred in the controller cache or on active disks, and flushed to the standby disks after they are spun up. In the current implementation, we only study deferring writes in NV-RAM. Since the NV-RAM is battery-backed, eraid does not impact the reliability of the RAID system. Note that using redundancy information to spin down disks can also be applied to other RAID organizations, such as RAID-5; because of space limitations, we focus on RAID-1 in this paper.

To provide enough active disks to meet performance-based service quality, we deploy a time-window-based performance control
scheme (details in Section 3.3) in eraid to control the tradeoff between energy savings and performance (response time and throughput) degradation. To predict the performance of the storage system, we first forecast the load features of the next time window, and then feed these features into the performance prediction model.¹ Different queueing network models are used to precisely model RAID-1 storage systems with specific characteristics. At the beginning of each time window, eraid performs performance/energy predictions to find out how many disks can be spun down to save energy, or how many disks should be spun up so as not to violate the predefined constraints. This problem can be formalized as follows.

Objective: maximize S_E = (E_base − E_eraid) / E_base

Subject to:
  Response time constraint: D_T = (T_eraid − T_base) / T_base ≤ Limit_T
  Throughput constraint: D_X = (X_base − X_eraid) / X_base ≤ Limit_X

Here E, T and X denote energy consumption, mean response time and mean throughput; the subscript "base" represents the baseline system that employs no energy-efficiency policy. Limit_T and Limit_X are user-defined performance degradation parameters (e.g., 5%) for mean response time and mean throughput, respectively. Next, we show how to address this problem.

¹ There are many techniques for capturing the temporal behavior of workloads; finding the best one is beyond the scope of this paper. In our current implementation, we use an ARMA (Autoregressive Moving Average) model for workload prediction.

3.1 Solving for the energy saving S_E

We develop a power model for multi-disk systems to estimate the energy saving S_E. Since we mainly focus on disk energy consumption, other components (e.g., CPU and cache) of the RAID system are not examined here. Several energy consumption models have been proposed for multi-speed disks [2, 9, 19]. Recently, Pinheiro et al. modeled and evaluated several energy conservation techniques for conventional disks [6]; however, their study is in a limited context, and some workload features, such as time criticality, are not considered. In the following discussion we will see that energy saving results may differ for synchronous and asynchronous loads even under the same request arrival intensity.

Given a time window of length T and an N-disk RAID-1 system, assume that R requests are served in T with average service time t. The system utilization is then ρ1 = R·t / (N·T). Let P_a, P_i, P_s and P_w denote the power of a disk that is active, idle, standby and switching states, respectively; let T_s denote the time a disk spends in the standby state, and T_w the time it spends switching states. For write loads, the energy consumed for coherence depends on the write operations of the workload and is hard to estimate; to obtain an upper bound on the energy saving, we do not consider coherence energy here. Thus the energy consumption of the original disk array is

  E_base = E_active + E_idle = N·P_a·T·ρ1 + N·P_i·T·(1 − ρ1)

For asynchronous loads, the request arrival rate is independent of the number of active disks. For synchronous loads, the request arrival rate can drop because fewer active disks provide lower system throughput; that is, under synchronous loads the number of requests served in T may be less than R. We use ρ2 (≤ ρ1) to denote the new utilization. Suppose i disks are spun down to standby at the beginning of T and spun up at the end of T. The energy consumption of these i disks is i·(P_s·T_s + P_w·T_w), and the energy consumption of the remaining (N − i) disks is N·P_a·T·ρ2 + P_i·[(N − i)·T − N·T·ρ2]. The new energy consumption can therefore be described as E_eraid =
i·(P_s·T_s + P_w·T_w) + N·P_a·T·ρ2 + P_i·[(N − i)·T − N·T·ρ2]

The fraction of energy saved, S_E, over the period T can then be expressed as:

  S_E = (E_base − E_eraid) / E_base
      = [(P_a − P_i)·(ρ1 − ρ2) + (i/N)·(P_i − (P_s·T_s + P_w·T_w)/T)] / [P_i + (P_a − P_i)·ρ1]

Here ρ1 and ρ2 can be resolved using the performance prediction technique of Section 3.2. From this expression we can see that the larger the value of (ρ1 − ρ2), the more energy is saved. For asynchronous loads, (ρ1 − ρ2) is zero; with the same ρ1, the energy saving of an asynchronous load can thus be considered the lower bound for a synchronous load. In this section, we analyze only the energy saving of asynchronous loads.

We take the IBM Ultrastar 36Z15 as the representative server SCSI disk in this analysis. Since disk utilization in server-side RAID systems is usually below 30% [9, 11], P_i ≫ (P_a − P_i)·ρ1 holds in most practical cases, and the energy saving equation simplifies to

  S_E ≈ (i/N)·(1 − (P_s·T_s + P_w·T_w)/(P_i·T))

We analyze the energy savings in two steps. First, we assume T ≫ T_w to obtain the upper bound of the energy savings. Under this assumption T_s = T − T_w ≈ T, and the energy saving further simplifies to S_E ≈ (i/N)·(1 − P_s/P_i). Defining R_si = P_s/P_i, the left diagram of Figure 1 shows the impact of i and R_si on the energy savings: the number of spun-down disks is the dominant factor.

Second, we remove the assumption T ≫ T_w. To study the impact of the length of T on the energy saving, we set T = k·T_w with k > 1, which gives

  S_E ≈ (i/N)·(1 − (P_s·(k − 1) + P_w)/(k·P_i))

The right diagram of Figure 1 shows the impact of T: as k increases, the energy saving approaches that of the ideal condition (T ≫ T_w).

[Figure 1: The impact of i, R_si = P_s/P_i, and k on the energy saving. Left: energy saving vs. number of spun-down disks for several R_si values (including 1/8, 1/6 and 1/4) in an 8-disk array. Right: energy saving vs. k with 1 to 4 disks spun down.]

3.2 Solving for the performance degradations D_T and D_X

We use queueing models to calculate the performance degradation factors D_T and D_X. Our method is as follows: (1) use queueing models of RAID-1 to obtain performance predictions; (2) examine how the input parameters change after spinning down some mirror disks; (3) calculate new performance predictions; (4) compare the original and new predictions for decision making.

Read and write operations in RAID-1 systems have different features, and synchronous and asynchronous loads have different time criticalities. To isolate these differences, we examine the performance impact of spinning down mirror disks under synchronous read (SR), asynchronous read (AR), synchronous write (SW) and asynchronous write (AW) loads individually. We extend two queueing models introduced by Varki et al. [16] to model RAID-1 systems under synchronous read and synchronous write loads, and based on them we develop two further queueing models for asynchronous read and asynchronous write loads. The accuracy of these queueing network models is shown in Section 4.3. The models are built on the main features of a real RAID system, the HP SureStore E Disk Array FC-60 [].

Read load. The left diagram of Figure 2 shows the model of a RAID-1 system under a synchronous read load. It is assumed that there are M processes in the system, each generating an I/O stream of array read requests. Array read requests are first submitted to the controller cache; on a cache miss they are directed to the disks. After a request finishes, it returns to its process, and only after the previous request returns does a process issue another request, following a further process-delay time. All the processes are modeled as one delay server. The average performance measures of this closed queueing network can be computed by the Mean Value Analysis
(MVA) technique [1].

For asynchronous loads, the RAID system is modeled as an open queueing network, shown in the right diagram of Figure 2. Since the load balancing policy is not considered here, the disk array is treated as a collection of M/M/1 queue nodes. The advantage of assuming exponential distributions is the efficiency and simplicity of the corresponding performance technique [16]; our experiments show that the error caused by this assumption is at an affordable level (details in Section 4.3).

[Figure 2: Queueing network models of RAID-1 with read workloads. Left: closed network for a synchronous read load (M processes with a process delay, a cache with cache_hit_rate and cache_service_rate, and disks 1..N with disk_access_prob and disk_service_rate). Right: open network for an asynchronous read load (request_arrival_rate feeding the same cache and disk subsystems).]

Table 1 shows the model input parameters. The disks are named subsystem 1 to subsystem N, and the RAID controller cache is named subsystem 0. Note that P_0 = 1 and that the sum of P_1 through P_N equals 1 − cache hit rate.

Table 1: Read load model input parameters
  Common parameters:
    N     number of disks in the RAID-1 array
    µ_i   service rate of subsystem i
    P_i   access probability of subsystem i
  Synchronous read model:
    M     number of processes
    O_p   mean process delay
  Asynchronous read model:
    λ     mean request arrival rate

After some mirror disks are spun down to save energy, two input parameters of the queueing models can be affected: the disk access probabilities and the disk service times. When i mirror disks are spun down, their load is directed to their primary disks; the access probabilities of the other disks are not affected. Disk service time depends on a large number of factors, including the disk specification, the disk scheduling policy and the workload features. To find the relationship between disk service time and disk queue length, we directly measured the service time of three physical disks under a wide spectrum of workloads. We find that when the disk queue length is less than three,³ the change in disk service time (for both read and write operations) has only a minor impact on the calculated performance degradation. Therefore, we do not consider changes in disk service time in the prediction.

³ We only spin down mirror disks when the system utilization is low. Even at a disk utilization of 30%, the queue length is ρ/(1 − ρ) = 30%/(1 − 30%) ≈ 0.43; according to queueing theory, doubling the disk load would increase the queue length to 70%/(1 − 70%) ≈ 2.33.

Based on the above discussion, we can compute the performance measures for both the baseline system and eraid. For ease of presentation, in an N-disk RAID-1 system we define disks 1 to N/2 to be the mirror disks and disks (N/2 + 1) to N to be the primary disks, and let mirror disk n and primary disk v = N − n + 1 form a pair.

For a synchronous read load, we use T_n(M) and T'_n(M) to denote the mean response time of subsystem n with M processes in the baseline system and in eraid, respectively; both can be computed with the MVA technique [1]. The mean response time of the disk array and the mean throughput of the baseline system are:

  T_base(SR) = Σ_{n=0..N} T_n(M)·P_n,    X_base(SR) = M / (T_base(SR) + O_p)

The corresponding measures for eraid after spinning down mirror disks 1 to i are:

  T_eraid(SR) = T'_0(M) + Σ_{n=i+1..N−i} T'_n(M)·P_n + Σ_{n=N−i+1..N} T'_n(M)·(P_n + P_v)
  X_eraid(SR) = M / (T_eraid(SR) + O_p)

where the last sum runs over the primary disks whose mirror partners (v = N − n + 1) are in standby.

For asynchronous read loads, as discussed in Section 2, the system throughput is not affected as long as no disks are in an unstable state after spinning down the mirror disks. The mean response time is computed as the mean cache response time plus the mean disk response time:

  T_base(AR) = 1/(µ_0 − λ) + Σ_{n=1..N} P_n / (µ_n − λ·P_n)

After spinning down mirror disks 1 to i, the mean response time of eraid is

  T_eraid(AR) = 1/(µ_0 − λ) + Σ_{n=i+1..N−i} P_n / (µ_n − λ·P_n) + Σ_{n=N−i+1..N} (P_n + P_v) / (µ_n − λ·(P_n + P_v))

Write load. Unlike reads, a write request completes once its data is written to the cache. The FC-60 uses a two-threshold write-back policy [, 16] with a destage threshold (e.g., 20% of the cache size) and a max dirty blocks limit (e.g., the cache size): destaging starts once the number of dirty blocks exceeds the destage threshold, and max dirty blocks bounds the number of dirty blocks the cache may hold. For write workloads, the RAID system can therefore be modeled as a Markov birth-death M/M/1/K process [16], with

  K = (max dirty blocks − destage threshold) / (average request size) + 1

Practically, K is the maximum number of requests that can be held in the cache, so the disk array can be modeled by the cache alone, with the input parameters listed in Table 2. Since all data must be written to both a mirror and a primary disk, an N-disk RAID-1 can serve N/2 requests simultaneously; thus the service rate µ_d is (N/2)·µ [16]. In the synchronous write model, χ denotes the maximum array throughput, i.e., the throughput the array achieves when it has
an infinite cache; χ can be computed using the MVA technique.

Table 2: Write load model input parameters
  P_miss   array cache write miss probability
  K        maximum queue length
  λ_d      arrival rate of dirty blocks at the cache
  µ_d      rate at which dirty blocks are written from cache to disks
  µ        mean disk service rate
  Synchronous write model:
  χ        maximum array throughput

The arrival rate of dirty blocks under a synchronous write load differs from that under an asynchronous write load: for a synchronous write load, λ_d can be approximated by χ·P_miss, while for asynchronous loads λ_d = λ·P_miss, where λ (defined in Table 1) can be measured directly. In the current design of eraid, we use NV-RAM to hold the inconsistent data. When i disks are spun down, the service rate µ_d becomes ((N − 2i)/2)·µ. Two other input parameters may be affected when mirror disks are spun down: K and P_miss. Deferring the standby disks' dirty blocks in the cache for a longer period effectively increases the destage threshold, which decreases the queue length K. Through experiments we find that eraid has little impact on the other parameter, P_miss, for two reasons: first, data access locality is relatively low in the controller cache compared to higher-level caches such as the file system cache; second, eraid restricts the accumulated dirty blocks of standby disks to the second half of the LRU table in order to limit cache pollution.

For synchronous write loads, the mean throughput of the disk array is computed as χ·(1 − P_max_dirty), where P_max_dirty is the steady-state probability that the cache holds the maximum number of dirty blocks. According to the M/M/1/K queueing model [1], P_max_dirty = (1 − a)·a^K / (1 − a^(K+1)) with a = λ_d/µ_d. That is,

  X_base(SW) = χ·(1 − (1 − a)·a^K / (1 − a^(K+1))),    a = 2·χ·P_miss / (N·µ)

The mean response time follows from applying Little's Law to the entire system:

  T_base(SW) = M / X_base(SW) − O_p

X_eraid(SW) and T_eraid(SW) can be computed in the same way by replacing K with K' and a = 2·χ·P_miss / ((N − 2i)·µ), where

  K' = (max dirty blocks − new destage threshold) / (average request size) + 1

For asynchronous loads, the mean response time is k̄/λ_d according to the M/M/1/K model, where k̄ is the average queue length, k̄ = a/(1 − a) − (K + 1)·a^(K+1)/(1 − a^(K+1)), with a = λ_d/µ_d = 2·λ·P_miss/(N·µ) < 1. That is,

  T_base(AW) = a / (λ_d·(1 − a)) − (K + 1)·a^(K+1) / (λ_d·(1 − a^(K+1))),    a = 2·λ·P_miss / (N·µ)

In the same way, T_eraid(AW) can be computed by replacing K with K' and a = 2·λ·P_miss / ((N − 2i)·µ).

3.3 Control algorithm

Since queueing-model-based control is more viable at a large time granularity [4], we set one hour as the default time-window length in the current eraid implementation; a sensitivity study of the time-window length appears in Section 4.3. Another design consideration is disk start/stop cycles: as noted by Zhu et al. [19], frequently starting and stopping disks affects drive longevity. Since disks are designed to sustain only a certain number of start/stop cycles, we set a maximum start/stop frequency for each disk.⁴

The control algorithm needs to solve the multi-constraint (S_E, D_T and D_X) problem, which can be formulated as a multidimensional 0-1 knapsack problem with dimension two (the response time limit and the throughput limit). The multidimensional 0-1 knapsack problem is a classic NP-hard optimization problem [8], and solving it exactly involves considerable computing overhead. Fortunately, we can simplify the solution based on two observations: (1) as shown in Section 3.1, the energy saving S_E is dominated by the number of spun-down disks rather than by which disks are spun down; (2) according to queueing theory, the disk with the lower utilization tends to suffer less performance degradation when its access probability is (almost) doubled. That is, given the performance degradation constraints, we may achieve the maximum number of spun-down disks by spinning down the most lightly loaded mirror disks. Our solution is described as Algorithm 1.
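As a concrete illustration, the greedy selection of Algorithm 1, combined with the error-scaled limits of the conservative control (Algorithm 2 below), can be sketched in a few lines of Python. This sketch is ours, not eraid's implementation: the `predict` callback stands in for the queueing-model predictions of Section 3.2, and the disk names and numbers are toy values.

```python
# Sketch of Algorithm 1 (greedy spin-down selection) together with the
# limit scaling used by the conservative control (Algorithm 2).
# `predict` stands in for the queueing-model predictions of Section 3.2;
# it maps "spin down the i least-loaded mirror disks" to the predicted
# degradations (D_T, D_X). All names and numbers here are illustrative.
from typing import Callable, List, Tuple

def choose_spin_down(
    mirror_disks: List[str],
    load: Callable[[str], float],                    # per-disk utilization
    predict: Callable[[int], Tuple[float, float]],   # i -> (D_T, D_X)
    limit_t: float,
    limit_x: float,
    sigma: float,                                    # avg. prediction error
) -> List[str]:
    if sigma >= 1.0:
        return []                     # prediction too unreliable: do nothing
    # Conservative control: shrink both limits by the prediction error.
    limit_t *= 1.0 - sigma
    limit_x *= 1.0 - sigma
    # Greedy selection: try prefixes of the least-loaded mirror disks.
    ranked = sorted(mirror_disks, key=load)
    chosen: List[str] = []
    for i in range(1, len(ranked) + 1):
        d_t, d_x = predict(i)
        if d_t > limit_t or d_x > limit_x:
            break                     # first violation: keep previous prefix
        chosen = ranked[:i]
    return chosen

# Toy usage: degradations grow linearly with the number of spun-down disks.
disks = ["m1", "m2", "m3", "m4"]
util = {"m1": 0.05, "m2": 0.10, "m3": 0.20, "m4": 0.15}
picked = choose_spin_down(disks, util.get,
                          lambda i: (0.05 * i, 0.02 * i),
                          limit_t=0.20, limit_x=0.10, sigma=0.10)
print(picked)  # ['m1', 'm2', 'm4']
```

With the toy linear predictor, the scaled limits become 0.18 and 0.09, so the three least-loaded mirror disks are chosen; a fourth would push the predicted response time degradation past the scaled limit.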
Our experiments show that this algorithm gives the optimal solution in most cases. Compared with using close-to-optimal algorithms [8] to solve the multidimensional knapsack problem, Algorithm 1 needs much less computing overhead. However, in real-world systems prediction error in the performance degradation is usually unavoidable. We define the prediction error (a ratio) as

  σ = |real degradation − predicted degradation| / real degradation

This error comes from the load predictor and the queueing network models. Since their errors can be measured or estimated beforehand in most cases, we assume the average error σ is a

⁴ For example, we set 4 times per day for an IBM Ultrastar 36Z15 disk, according to its minimum of 50,000 start/stop cycles and its three-year warranted service time.

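As a numerical aside, the M/M/1/K quantities that the write-load model of Section 3.2 relies on (the blocking probability P_max_dirty and the mean queue length) are easy to check with a few lines of Python. This is our own sketch of the standard formulas; the function names and example values are illustrative and not taken from eraid.

```python
# Sketch of the standard M/M/1/K quantities quoted in Section 3.2:
# the probability that the cache already holds the maximum number of
# dirty blocks (blocking probability P_K) and the mean queue length,
# from which X_base(SW) and T_base(AW) follow. Names are illustrative.

def mm1k_blocking_prob(a: float, K: int) -> float:
    """P_K = (1 - a) * a**K / (1 - a**(K + 1)), for a != 1."""
    if a == 1.0:
        return 1.0 / (K + 1)
    return (1.0 - a) * a**K / (1.0 - a**(K + 1))

def mm1k_mean_queue(a: float, K: int) -> float:
    """k_bar = a/(1 - a) - (K + 1) * a**(K + 1) / (1 - a**(K + 1))."""
    if a == 1.0:
        return K / 2.0
    return a / (1.0 - a) - (K + 1) * a**(K + 1) / (1.0 - a**(K + 1))

def x_base_sw(chi: float, a: float, K: int) -> float:
    """Synchronous-write throughput: chi * (1 - P_K)."""
    return chi * (1.0 - mm1k_blocking_prob(a, K))

def t_base_aw(lambda_d: float, a: float, K: int) -> float:
    """Asynchronous-write mean response time: k_bar / lambda_d."""
    return mm1k_mean_queue(a, K) / lambda_d

# Example: a lightly loaded destage path (a = 0.5) with room for K = 10
# requests blocks almost never, so X_base(SW) stays close to chi.
print(mm1k_blocking_prob(0.5, 10))   # ~0.000489 (= 1/2047)
```

This matches the intuition in the text: at the low utilizations where eraid spins disks down, the cache almost never fills, so the synchronous-write throughput penalty comes mainly from the reduced destage rate rather than from blocking.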
known parameter that is updated periodically (e.g., every day). In order not to violate the limits, we reset Limit_T to Limit_T·(1 − σ) and Limit_X to Limit_X·(1 − σ). Obviously, if σ is larger than one, eraid cannot spin down any disk to save energy. Experimental results show that σ is less than 1% in the current design; more details about the prediction error are given in Section 4.3.

Algorithm 1: find mirror disks to spin down
  1: Put the active mirror disks in list S;
  2: Sort the mirror disks from least loaded to most heavily loaded;
  3: for (i = 1; i < N/2; i++) {
  4:   Predict the performance degradations of spinning down the first i disks of S;
  5:   If any one of the constraints is violated, then the first (i − 1) disks in S are the disks to spin down;
  6: }

Algorithm 2: conservative control algorithm
  1: Limit_T = Limit_T·(1 − σ);
  2: Limit_X = Limit_X·(1 − σ);
  3: Put the disks found by Algorithm 1 into set T;
  4: Spin down the disks in T;
  5: Spin up standby disks that are not in T;

The conservative control algorithm is summarized as Algorithm 2, which is called at the beginning of each time window. It is called conservative because eraid always tries to keep the performance degradation below the constraints in each time window. For write loads, eraid keeps replacing dirty blocks belonging to active disks until the accumulated dirty data of the standby disks exceeds the new destage threshold; after all the dirty data are flushed back to these disks, eraid spins them down again if they have not yet reached their maximum start/stop frequency.

4 Evaluation

4.1 Experiment setup

We evaluate eraid using trace-driven simulations, with the IBM Ultrastar 36Z15 as the disk power model [2]; this disk is commonly used in data-intensive environments and in related research [19]. Eight traces are used in the evaluation. For asynchronous loads, we select three real-world traces that replay the I/O traffic of large storage server systems. The first, SPC-SE (Storage Performance Council Search Engine I/O), serves as an AR load because it contains almost no write requests. The second trace is TPC-C2; we take the pure write requests of TPC-C2 from the backup device and use them as an AW load. The last trace is Cello99. To make the Cello99 trace intensive enough for current hardware, we merge the loads of three days into one day; the merged Cello99 has a mean inter-arrival time of 6.7 ms, a read ratio of 62% and an average request size of 12.73 KB. The mean inter-arrival times of SPC-SE and TPC-C2 are 1.1 ms and 2.1 ms, and their average request sizes are 16.8 KB and 4 KB, respectively. By separating the read and write requests of Cello99, we obtain one more asynchronous read load, AR(Cello99), and one more write load, AW(Cello99).

Real-world block-level traces usually do not provide enough information about the synchronous relationships among requests, so we generate four synthetic synchronous loads based on the load features of Cello99. Following [7], we assume that 30% of all Cello99 requests are synchronous, which yields 14 and 23 as representative process counts for Cello99 in the morning and the afternoon, respectively; the 14-process and 23-process traces are named S14 and S23. With pure read and pure write requests we generate four traces: SR(S14), SR(S23), SW(S14) and SW(S23). To give S14 and S23 the same workload intensity, we set their average process delays to 4 ms and 8 ms, respectively. The access sequentiality and locality are 0.22 and 0.36, the average request size is the same as that of Cello99, and the request inter-arrival times follow an exponential distribution.

We configure a RAID-1 system based on the Hewlett-Packard SureStore E FC-60 disk array [] in Disksim, a popular storage system simulator. The RAID-1 system has eight disks and two mirrored array controllers, each with 12MB of NVRAM. We augment Disksim with a disk power model and our energy-efficiency policy. LRU is the default controller cache replacement algorithm, and eraid defers the data updates of spun-down disks only in the second half of the controller cache, as suggested in [17].

4.2 Evaluation results

Figure 3 shows the overall experimental results for the three scenarios; the top row shows the synchronous results and the bottom row the asynchronous results. From Figure 3 we can see that our performance control algorithm is effective for all four kinds of workloads: the performance degradation constraints Limit_T and Limit_X are not violated in any scenario. For CASE I, which imposes tight constraints, the maximum energy saving, 22%, is achieved on AR(Cello99); because SPC-SE is more intensive than the Cello99 loads, eraid saves 1% energy on SPC-SE. Because of the constraints on both response time and throughput, eraid saves less energy with synchronous loads than with asynchronous loads. For all traces, more energy is conserved under read loads than under write loads, because when mirror disks are spun down in RAID-1, the system service rate degrades more for writes than for reads. CASE II increases the value of
the throughput degradation limit relative to CASE I. This does not affect the energy savings of the asynchronous loads, but eraid achieves more (2.3–9%) energy savings with the synchronous loads. CASE III relaxes the constraints a step further by raising the response time degradation limit as well. This change lets eraid conserve more energy with both synchronous and asynchronous loads: compared with CASE II, the energy savings of the synchronous loads in CASE III are 4–9% higher, while the savings increase by 11.4% for the asynchronous loads. Finally, the maximum energy saving over the three cases is 32.4%, achieved on the Cello99 AR load.

[Figure 3: Overall results for CASE I, CASE II and CASE III. Top row: synchronous loads SR(S14), SW(S14), SR(S23), SW(S23); bottom row: asynchronous loads AR(Cello99), AW(Cello99), SPC-SE, TPC-C2. CASE I uses the tightest limits; CASE II raises Limit_X; CASE III raises both Limit_X and Limit_T.]

For all traces, the throughput degradation is less than the response time degradation. In all cases, eraid incurs less throughput degradation with S23 than with S14; that is, for the same I/O intensity, eraid introduces less throughput degradation for loads with more processes (and longer process delays). Asynchronous loads, which show no throughput degradation, can be viewed as extreme cases with a very large number of processes and very long process delays.

4.3 Sensitivity study

In this section, we study sensitivity to the performance prediction errors, the RAID size and the time-window length. By comparing the measured performance degradations with eraid's predictions in each time window, we obtain the average prediction errors shown in Figure 4. From the left diagram of Figure 4, we can see that the prediction errors for response time degradation are always larger than those for throughput degradation. The right diagram shows two kinds of results: a perfect traffic prediction obtained by off-line trace analysis, and the ARMA-based prediction, whose workload forecasting contributes 1–3% to the overall prediction errors. The workloads of TPC-C2 fluctuate less than those of SPC-SE and Cello99, so a smaller prediction error is introduced for TPC-C2.

[Figure 4: Prediction errors of the performance degradation, defined as |real degradation − predicted degradation| / real degradation. Left: response time and throughput errors for SR(S14), SW(S14), SR(S23), SW(S23). Right: response time errors with ARMA vs. perfect traffic prediction for AR(Cello99), AW(Cello99), SPC-SE, TPC-C2.]

To examine the effect of the RAID size, we redo the CASE III experiments with a 6-disk and a larger RAID-1 configuration, taking S14, SPC-SE and TPC-C2 as the workloads. For a fair comparison, we scale the workload intensities and the RAID cache size with the RAID size. The new results show that both the energy savings and the performance degradations increase with the RAID size: a larger array gives eraid more options to fine-tune the number of spun-down disks, so eraid saves more energy than in the 8-disk RAID-1.

We also redo the CASE III experiments with time-window lengths ranging from a few minutes up to 120 minutes (including 30, 60, 90 and 120 minutes). We find that, as the time-window length increases, the energy savings of all loads remain relatively consistent. The energy savings of asynchronous loads, however, drop
8 quickly as time-window length decreases This is because both ARMA and queueing models make bigger prediction errors with fine granularity (smaller time-window length) than they do for large time-windows In order to guarantee performance, conservative control lowers the values of limits according to σ: Limit T <= Limit T (1 σ), Limit X <= Limit X (1 σ) Therefore, the faster σ increases, the quicker energy saving decreases Compared with asynchronous loads, synchronous loads are more insensitive to time-window length This is because their request inter-arrival time is strictly an exponential distribution The prediction error does not change much with the variation of time-window length Conclusions and future work Compared with previous server side energy saving policies, eraid has the following advantages: (1) it is a soft solution and does not require hardware updates (eg adding cache disks or deploying multi-speed disks) for current storage systems; (2) it is independent of host systems and can be easily deployed to standard RAID systems with little change of existing disk array configurations; (3) it does tradeoff between energy and performance with the consideration of both performance metrics, average response time and throughput In the future, we will extend the idea of eraid to other RAID organizations (eg RAID-) with the incorporation of advanced controller features such as access coalescing, load balancing and prefetching Acknowledgements We would like to thank the anonymous reviewers for their thoughtful suggestions We would also like to thank Hewlett-Packard Labs Storage Systems Department for providing us storage traces References [1] G Bolch, S Greiner, H de Meer, and K S Trivedi Queueing etworks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications John Wiley & Sons, ew York, Aug 1998 [2] E V Carrera, E Pinheiro, and R Bianchini Conservingdiskenergyinnetworkservers InProceedings of the 23 International Conference on 
Supercomputing (ICS-03), pages 86–97, New York, June 2003. ACM Press.
[3] S. Chen and D. Towsley. A performance evaluation of RAID architectures. IEEE Transactions on Computers, Oct. 1996.
[4] Y. Chen, A. Das, W. Qin, A. Sivasubramaniam, Q. Wang, and N. Gautam. Managing server energy and operational costs in hosting centers. In SIGMETRICS, 2005.
[5] D. Colarelli and D. Grunwald. Massive arrays of idle disks for storage archives. In Proceedings of the Supercomputing 2002 Conference CD, Baltimore, MD, Nov. 2002. IEEE/ACM SIGARCH.
[6] E. Pinheiro, R. Bianchini, and C. Dubnicki. Exploiting redundancy to conserve energy in storage systems. In Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), Saint-Malo, France, June 2006.
[7] G. R. Ganger and Y. N. Patt. The process-flow model: Examining I/O performance from the system's point of view. In SIGMETRICS, pages 86–97, 1993.
[8] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.
[9] S. Gurumurthi, A. Sivasubramaniam, M. Kandemir, and H. Franke. DRPM: Dynamic speed control for power management in server class disks. In Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA-30), New York, June 2003.
[10] Hewlett-Packard Company. HP SureStore E Disk Array FC60 User's Guide, Pub. A277-91, Dec. 2000.
[11] D. Li and J. Wang. EERAID: Energy-efficient redundant and inexpensive disk array. In Proceedings of the 11th ACM SIGOPS European Workshop, Leuven, Belgium, Sept. 2004.
[12] D. Li, J. Wang, and P. Varman. Conserving energy in RAID systems with conventional disks. In International Workshop on Storage Network Architecture and Parallel I/Os, Saint Louis, Missouri, Sept. 2005.
[13] X. Li, Z. Li, F. David, P. Zhou, Y. Zhou, S. Adve, and S. Kumar. Performance-directed energy management for main memory and disks. In Proceedings of the Eleventh International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'04), Oct. 2004.
[14] A. Merchant and P. S. Yu. Analytic modeling of clustered RAID with mapping based on nearly random permutation. IEEE Transactions on Computers, 45(3), 1996.
[15] E. Pinheiro and R. Bianchini. Energy conservation techniques for disk array-based servers. In Proceedings of the 18th International Conference on Supercomputing, pages 68–78, June 2004.
[16] E. Varki, A. Merchant, J. Z. Xu, and X. Z. Qiu. Issues and challenges in the performance analysis of real disk arrays. IEEE Transactions on Parallel and Distributed Systems, 15(6):559–574, June 2004.
[17] A. Varma and Q. Jacobson. Destage algorithms for disk arrays with non-volatile caches. In H. Jin, T. Cortes, and R. Buyya, editors, High Performance Mass Storage and Parallel I/O: Technologies and Applications. IEEE/Wiley Press, New York, 2001.
[18] X. Yao and J. Wang. RIMAC: A redundancy-based, two-level collaborative I/O cache architecture for energy-efficient storage systems. In Proceedings of the 1st ACM EuroSys Conference, Leuven, Belgium, April 2006.
[19] Q. Zhu, Z. Chen, L. Tan, Y. Zhou, K. Keeton, and J. Wilkes. Hibernator: Helping disk arrays sleep through the winter. In Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP'05), Brighton, United Kingdom, Oct. 2005.
[20] Q. Zhu and Y. Zhou. Power-aware storage cache management. IEEE Transactions on Computers, 54(5):587–602, May 2005.
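The two quantitative rules of Section 4.3 — the degradation-prediction error of Figure 4, (real − predicted) / real, and the conservative control Limit_T ← Limit_T (1 − σ), Limit_X ← Limit_X (1 − σ) — can be sketched in a few lines. This is a minimal illustration with hypothetical function names and example numbers, not code from the eRAID implementation:

```python
def prediction_error(real, predicted):
    # Relative prediction error, as defined for Figure 4:
    # (real degradation - predicted degradation) / real degradation
    return (real - predicted) / real

def conservative_limits(limit_t, limit_x, sigma):
    # Tighten the user-specified response-time and throughput degradation
    # limits by the prediction-error estimate sigma: Limit <- Limit * (1 - sigma)
    return limit_t * (1.0 - sigma), limit_x * (1.0 - sigma)

def within_limits(pred_t_deg, pred_x_deg, limit_t, limit_x, sigma):
    # A mirror disk group may be spun down only if the *predicted*
    # degradations stay under the tightened (conservative) limits.
    t_lim, x_lim = conservative_limits(limit_t, limit_x, sigma)
    return pred_t_deg <= t_lim and pred_x_deg <= x_lim

err = prediction_error(real=0.25, predicted=0.20)   # ~0.2 (20% relative error)
ok = within_limits(0.15, 0.05, limit_t=0.20, limit_x=0.10, sigma=0.10)
```

Under this rule a larger σ leaves less headroom for spinning disks down, which matches the observation above that energy savings shrink as the prediction error grows.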


More information

The Applicability of Adaptive Control Theory to QoS Design: Limitations and Solutions

The Applicability of Adaptive Control Theory to QoS Design: Limitations and Solutions The Applicability of Adaptive Control Theory to QoS Design: Limitations and Solutions Keqiang Wu David J. Lilja Haowei Bai Electrical and Computer Engineering University of Minnesota Minneapolis, MN 55455,

More information

On Two Class-Constrained Versions of the Multiple Knapsack Problem

On Two Class-Constrained Versions of the Multiple Knapsack Problem On Two Class-Constrained Versions of the Multiple Knapsack Problem Hadas Shachnai Tami Tamir Department of Computer Science The Technion, Haifa 32000, Israel Abstract We study two variants of the classic

More information

Parallel Performance Evaluation through Critical Path Analysis

Parallel Performance Evaluation through Critical Path Analysis Parallel Performance Evaluation through Critical Path Analysis Benno J. Overeinder and Peter M. A. Sloot University of Amsterdam, Parallel Scientific Computing & Simulation Group Kruislaan 403, NL-1098

More information

A New Technique for Link Utilization Estimation

A New Technique for Link Utilization Estimation A New Technique for Link Utilization Estimation in Packet Data Networks using SNMP Variables S. Amarnath and Anurag Kumar* Dept. of Electrical Communication Engineering Indian Institute of Science, Bangalore

More information

Believe it Today or Tomorrow? Detecting Untrustworthy Information from Dynamic Multi-Source Data

Believe it Today or Tomorrow? Detecting Untrustworthy Information from Dynamic Multi-Source Data SDM 15 Vancouver, CAN Believe it Today or Tomorrow? Detecting Untrustworthy Information from Dynamic Multi-Source Data Houping Xiao 1, Yaliang Li 1, Jing Gao 1, Fei Wang 2, Liang Ge 3, Wei Fan 4, Long

More information

6 Solving Queueing Models

6 Solving Queueing Models 6 Solving Queueing Models 6.1 Introduction In this note we look at the solution of systems of queues, starting with simple isolated queues. The benefits of using predefined, easily classified queues will

More information

BIRTH DEATH PROCESSES AND QUEUEING SYSTEMS

BIRTH DEATH PROCESSES AND QUEUEING SYSTEMS BIRTH DEATH PROCESSES AND QUEUEING SYSTEMS Andrea Bobbio Anno Accademico 999-2000 Queueing Systems 2 Notation for Queueing Systems /λ mean time between arrivals S = /µ ρ = λ/µ N mean service time traffic

More information

When MTTDLs Are Not Good Enough: Providing Better Estimates of Disk Array Reliability

When MTTDLs Are Not Good Enough: Providing Better Estimates of Disk Array Reliability When MTTDLs Are Not Good Enough: Providing Better Estimates of Disk Array Reliability Jehan-François Pâris Dept. of Computer cience University of Houston Houston, TX 770-00 paris@cs.uh.edu Thomas J. E.

More information

Efficient Nonlinear Optimizations of Queuing Systems

Efficient Nonlinear Optimizations of Queuing Systems Efficient Nonlinear Optimizations of Queuing Systems Mung Chiang, Arak Sutivong, and Stephen Boyd Electrical Engineering Department, Stanford University, CA 9435 Abstract We present a systematic treatment

More information

Introduction to Queueing Theory

Introduction to Queueing Theory Introduction to Queueing Theory Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu Audio/Video recordings of this lecture are available at: http://www.cse.wustl.edu/~jain/cse567-11/

More information

I N T R O D U C T I O N : G R O W I N G I T C O M P L E X I T Y

I N T R O D U C T I O N : G R O W I N G I T C O M P L E X I T Y Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.872.8200 F.508.935.4015 www.idc.com W H I T E P A P E R I n v a r i a n t A n a l y z e r : A n A u t o m a t e d A p p r o a c h t o

More information

Revenue Maximization in a Cloud Federation

Revenue Maximization in a Cloud Federation Revenue Maximization in a Cloud Federation Makhlouf Hadji and Djamal Zeghlache September 14th, 2015 IRT SystemX/ Telecom SudParis Makhlouf Hadji Outline of the presentation 01 Introduction 02 03 04 05

More information

These are special traffic patterns that create more stress on a switch

These are special traffic patterns that create more stress on a switch Myths about Microbursts What are Microbursts? Microbursts are traffic patterns where traffic arrives in small bursts. While almost all network traffic is bursty to some extent, storage traffic usually

More information

Exploring Human Mobility with Multi-Source Data at Extremely Large Metropolitan Scales. ACM MobiCom 2014, Maui, HI

Exploring Human Mobility with Multi-Source Data at Extremely Large Metropolitan Scales. ACM MobiCom 2014, Maui, HI Exploring Human Mobility with Multi-Source Data at Extremely Large Metropolitan Scales Desheng Zhang & Tian He University of Minnesota, USA Jun Huang, Ye Li, Fan Zhang, Chengzhong Xu Shenzhen Institute

More information

COMP9334: Capacity Planning of Computer Systems and Networks

COMP9334: Capacity Planning of Computer Systems and Networks COMP9334: Capacity Planning of Computer Systems and Networks Week 2: Operational analysis Lecturer: Prof. Sanjay Jha NETWORKS RESEARCH GROUP, CSE, UNSW Operational analysis Operational: Collect performance

More information

Introduction to Queueing Theory

Introduction to Queueing Theory Introduction to Queueing Theory Raj Jain Washington University in Saint Louis Jain@eecs.berkeley.edu or Jain@wustl.edu A Mini-Course offered at UC Berkeley, Sept-Oct 2012 These slides and audio/video recordings

More information

CPU scheduling. CPU Scheduling

CPU scheduling. CPU Scheduling EECS 3221 Operating System Fundamentals No.4 CPU scheduling Prof. Hui Jiang Dept of Electrical Engineering and Computer Science, York University CPU Scheduling CPU scheduling is the basis of multiprogramming

More information

Queueing systems. Renato Lo Cigno. Simulation and Performance Evaluation Queueing systems - Renato Lo Cigno 1

Queueing systems. Renato Lo Cigno. Simulation and Performance Evaluation Queueing systems - Renato Lo Cigno 1 Queueing systems Renato Lo Cigno Simulation and Performance Evaluation 2014-15 Queueing systems - Renato Lo Cigno 1 Queues A Birth-Death process is well modeled by a queue Indeed queues can be used to

More information

Reliability and Power-Efficiency in Erasure-Coded Storage Systems

Reliability and Power-Efficiency in Erasure-Coded Storage Systems Reliability and Power-Efficiency in Erasure-Coded Storage Systems Technical Report UCSC-SSRC-09-08 December 2009 Kevin M. Greenan kmgreen@cs.ucsc.edu Storage Systems Research Center Baskin School of Engineering

More information

High Performance Computing

High Performance Computing Master Degree Program in Computer Science and Networking, 2014-15 High Performance Computing 2 nd appello February 11, 2015 Write your name, surname, student identification number (numero di matricola),

More information

MONTHLY OPERATIONS REPORT

MONTHLY OPERATIONS REPORT LY OPERATIONS REPORT MOR#023 Reporting period from 6-Oct-205 to 5-Nov-205 Reference: PROBA-V_D5_MOR-023_205-_v.0 Author(s): Erwin Wolters, Dennis Clarijs, Sindy Sterckx, Alex Geboers Version:.0 Date: 8//205

More information

Weather Research and Forecasting (WRF) Performance Benchmark and Profiling. July 2012

Weather Research and Forecasting (WRF) Performance Benchmark and Profiling. July 2012 Weather Research and Forecasting (WRF) Performance Benchmark and Profiling July 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell,

More information