Comparing Variance Reduction Techniques for Monte Carlo Simulation of Trading and Security in a Three-Area Power System

Magnus Perninge (magnus.perninge@ee.kth.se), Mikael Amelin (mikael.amelin@ee.kth.se), Valerijs Knazkins (valerijs.knazkins@ee.kth.se)

Abstract — A variance reduction technique is a method used to reduce the variance of a Monte Carlo simulation. In this paper four of the most commonly used variance reduction techniques are tested on the problem of estimating the trade-off between trading and security in a three-area electric power system. The comparison regards both the variance reduction achieved and the bias induced by each method.

I. INTRODUCTION

When performing a Monte Carlo simulation the outcome will be a stochastic variable, say X, on some given probability space (Ω, I, P). The variance of X on Ω is given by

Var(X) = ∫_Ω (X(ω) − m_X)² dP(ω),  (1)

where

m_X = ∫_Ω X dP  (2)

is the expectation of X over Ω. If the method used is unbiased, then m_X as defined above is the true sought value. The probability of obtaining exactly m_X as the result of a simulation is P(X⁻¹(m_X)), which will often be zero. The variance tells us how broad the distribution of X will be. As we increase the number of samples n in our simulation, the variance of the estimate decreases as 1/n, giving a sequence of estimates that converges almost surely [1]. Variance reduction techniques decrease the variance contributed by each sample, yielding a larger probability of obtaining a good estimate from a simulation of a given size. In this paper four methods for reducing the variance of Monte Carlo simulations are tested and compared to one another and to standard Simple Sampling. The comparison is made on the problem of finding the trade-off between trading and security in a benchmark three-area power system.

II.
VARIANCE REDUCTION TECHNIQUES

The variance reduction techniques tested on our problem are: Dagger Sampling, Importance Sampling, Stratified Sampling and the Control Variates method [2]. These are arguably the most commonly used techniques for increasing the probability of obtaining good estimates from Monte Carlo simulations.

Dagger Sampling

Dagger sampling is commonly used when the sought expected value is a function of a number of two-state random variables (e.g. the reliability of a large system of components). In dagger sampling a single random number is used to generate a large number of outcomes. Assume that we have a system of two components in series, each with a probability q_i of failure. If these probabilities are small, say 0.001 each, the risk of not getting a single system failure in 1000 scenarios of a Monte Carlo simulation using Simple Sampling is more than 10%. If we instead use dagger sampling to generate the outcomes, we start by partitioning the interval [0, 1] into ⌊1/q_i⌋ segments {α_k}, each of length q_i, and possibly a remainder β of length 1 − q_i⌊1/q_i⌋, for i = 1, 2. We then generate two random numbers x_1 and x_2 from a U(0,1)-distributed stochastic variable. To generate outcomes for the Monte Carlo simulation we find the k such that x_i ∈ α_k and let outcome number k be a failure of component i; if x_i ∈ β, no outcome is a failure. Hence there is no risk of not getting a single system failure when simulating the system described above.

Importance Sampling

When using importance sampling for variance reduction, the probability measure of the random variables is changed in order to reduce the variance of the Monte Carlo simulation. To get an unbiased estimate the resulting randomized cases must then be weighted before taking the mean. Suppose that we change the probability of failure from 0.001 to 0.1 for each component in the two-component system described above.
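As a minimal sketch (in Python, with function names of our own choosing), the dagger-sampling scheme just described for the two-component series system could look like this:

```python
import random

def dagger_outcomes(q, n_scenarios, seed=None):
    """Dagger sampling for one two-state component with failure
    probability q: each U(0,1) number decides a whole batch of
    floor(1/q) scenarios, at most one of which becomes a failure."""
    rng = random.Random(seed)
    batch = int(1 / q)                 # number of segments of length q
    failed = [False] * n_scenarios
    for start in range(0, n_scenarios, batch):
        x = rng.random()
        k = int(x / q)                 # index of the segment hit by x
        if k < batch and start + k < n_scenarios:
            failed[start + k] = True   # scenario start + k is a failure
        # otherwise x fell in the remainder: no failure in this batch
    return failed

# Two components in series, q = 0.001 each, 1000 scenarios: one random
# number per component covers all 1000 scenarios, so each component
# fails in exactly one scenario and a system failure is guaranteed.
fail1 = dagger_outcomes(0.001, 1000, seed=1)
fail2 = dagger_outcomes(0.001, 1000, seed=2)
system_failures = sum(a or b for a, b in zip(fail1, fail2))
```

With Simple Sampling the same 1000 scenarios would, with probability above 10%, contain no system failure at all.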
Raising each failure probability to 0.1 yields a probability of 1 − (1 − 0.1)² = 0.19 of system failure, instead of the probability 1.999·10⁻³ of failure in the original system. If we carry out our simulations on the altered system and then weight the results with
w(s) = P(state s in the original system) / P(state s in the altered system),  (3)

the expected value of the estimate will still be the same as in the original case. However, the risk of getting too few cases from what might very well be the most important outcomes will be much smaller.

Stratified Sampling

Suppose there is a natural way to partition the sample space Ω of our probability space (Ω, I, P) into m subsets α_1, α_2, ..., α_m, called strata, such that α_i ∩ α_j = ∅ for all i ≠ j, ∪_{i=1}^m α_i = Ω and α_i ∈ I for all i, together with a partition n_1, n_2, ..., n_m of n such that

Σ_{i=1}^m P(α_i)² (1/n_i) Var(g(X) | X ∈ α_i) < (1/n) Var(g(X)).  (4)

Then the variance is reduced if we divide the sampling procedure into m different Monte Carlo simulations, in which we constrain the random variable X to live on α_1, α_2, ..., α_m respectively, with the probability measures induced by the conditional probabilities P(· | α_i), and then weight the resulting estimates together using the weights P(α_i). Once a partition is chosen we can use Neyman allocation to find the optimal choice of n_1, n_2, ..., n_m. Let σ_h be the standard deviation of X on α_h and ω_h = P(α_h), for h = 1, ..., m. Then

n_h = n ω_h σ_h / Σ_{k=1}^m ω_k σ_k  (5)

is the partition of n that minimizes the variance of our estimate. This principle is called Neyman allocation.

Control Variates

With the control variates method our knowledge of the system is used to get a better estimate. Assume that the objective of our Monte Carlo simulation is to estimate the mean of some random variable Y = g(X), where g : X(Ω) → S and X is a random variable on some given probability space (Ω, I, P); here, of course, E(g(X)) has to be well defined. By simplifying the model g(·) to get a new random variable Z = ĝ(X), ĝ : X(Ω) → S, with a mean m_Z that can be calculated analytically, we can, instead of sampling Y, sample the random variable Y − Z and then add m_Z to the estimate of E(Y − Z).
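A minimal sketch of the control-variates estimator just described, on a toy problem of our own choosing (Y = e^X for X uniform on [0, 1], with the control variate Z = ĝ(X) = 1 + X, whose mean m_Z = 1.5 is known analytically):

```python
import math
import random
import statistics

def control_variate_estimate(n, seed=0):
    """Estimate E[g(X)] for g(x) = exp(x), X ~ U(0,1), by sampling
    Y - Z with Z = 1 + X (analytic mean m_Z = 1.5) and adding m_Z
    back, as the control variates method prescribes."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    y = [math.exp(x) for x in xs]                   # plain Simple Sampling
    y_minus_z = [math.exp(x) - (1 + x) for x in xs]  # sample Y - Z ...
    m_z = 1.5                                        # ... then add E[Z]
    return (statistics.mean(y), statistics.variance(y),
            statistics.mean(y_minus_z) + m_z, statistics.variance(y_minus_z))

m_ss, v_ss, m_cv, v_cv = control_variate_estimate(10_000)
# Both m_ss and m_cv estimate e - 1; the per-sample variance v_cv is
# much smaller than v_ss because Y and Z are strongly correlated.
```

The same pattern carries over to the power-system problem, with ĝ the simplified model of Section IV.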
Now, since Var(Y − Z) = Var(Y) + Var(Z) − 2 Cov(Y, Z), a good choice of ĝ will result in a dramatically reduced variance when estimating m_Y. A warning to keep in mind is that an error in computing m_Z will lead to a bias in the estimate of E(Y).

III. PROBLEM DESCRIPTION

The problem on which the variance reduction techniques are tested and compared is finding the trade-off between trading and security in a benchmark three-area electric power system. The system, shown in Figure 1, consists of three areas, each connected to the other two by l_ij lines, (i, j) ∈ S, where S = {(i, j) ∈ {1, 2, 3}² : i ≠ j}. All lines have a transfer capacity of κ MW each and an availability probability p_l. If we try to transfer more than the capacity of a line, the connection is turned off. Area i contains kv_i power plants. Let J_i = {1, 2, ..., kv_i}, giving a total of Σ_i kv_i power plants to support the loads placed in the areas. Power plant j in area i has production cost C_ij €/MWh, production capacity Ḡ_ij and availability probability p_ij, for (i, j) ∈ V, where V = {(i, j) : i ∈ I, j ∈ J_i}. The loads D_i, i ∈ I = {1, 2, 3}, have a known joint probability distribution. We assume that all actors on the electricity market have perfect information, and thus that the cheapest power plants are always used. The problem to be analyzed is what happens to the expected benefit of trading and to the security of the system when a set of transfer limits K_ij, (i, j) ∈ S, is used. All frequency regulation is placed in Area 1, and the transfer capacity not used for trading can be used to regulate the frequency; i.e., if there is a failure in plant 1 in Area 3, the production in Area 1 will increase to cover the new demand.

Fig. 1.
The three-area system with 2 power plants in each area

The Value of Trading

From a publicly economic point of view, maximizing the value of trading is the same as minimizing the expected production cost plus the expected cost originating from having to disconnect load [3]. The cost of disconnecting load is C_DC €/MWh and is the same for all loads.

The Security of the System

As a measure of the security of the system we use the probability of a major disturbance causing loads to be disconnected. Real-time regulation is assumed; this means that if we lose a transmission line, for example, the power plants will (if possible) immediately change their production to fit the new need. If the transfer limit is set too high, however, the failure of one line will lead to cascading failures. The optimization problem of covering the demand of the market looks as follows:
minimize over G_ij:  Σ_{(i,j)∈V} C_ij G_ij + Σ_{i∈I} C_DC D_DCi  (6)

subject to:

Σ_{j∈J_i} G_ij − D_i + D_DCi = Σ_{j∈I∖{i}} P_ij,  ∀ i ∈ I,  (7)

P_ij ≤ K_ij,  ∀ (i, j) ∈ S,  (8)

0 ≤ G_ij ≤ Ḡ_ij,  ∀ (i, j) ∈ V,  (9)

P_ij = (U_i² R)/Z² + (U_i U_j / Z²){X sin(θ_i − θ_j + x_ij) − R cos(θ_i − θ_j + x_ij)},  ∀ (i, j) ∈ S.  (10)

Here P_ij is the active power flow from area i towards area j and is constrained by the load flow equation (10). In (10), U_i and U_j are the per-unit values of the voltages in areas i and j respectively, θ_i and θ_j are the phase angles in areas i and j respectively, and x_ij is the angular change induced by the phase shifters in area i used to shift the phase on the line towards area j. A physical bound might have to be added here as one of the constraints of the problem. Z = R + iX is the impedance of the lines used in the system.

IV. SIMPLIFIED MODEL

For our simplified model, which will be used with the control variates method and importance sampling to get the value of trading, we assume that there are no losses in the system, i.e. P_ij = P_ji for all (i, j) ∈ S, and that all transfers for which P_ij ≤ K_ij are allowed. We also assume that no failures of either the power plants or the lines ever occur. With these assumptions it is actually possible to find an analytic solution to the problem of finding the value of trading for a given load vector D = [D_1, D_2, D_3]^T and constraints K_ij, (i, j) ∈ S. The solution is based on finding the production in every power plant as a function G_ij : R³₊ → R₊,

G_ij(D) = production in power plant j of area i when the load is D = [D_1, D_2, D_3]^T,

for (i, j) ∈ V. Let us start with the cheapest power plant, which is situated in, say, Area 1. We want to use this power plant as much as possible since it is the cheapest. We cannot, however, export more than K_12 MWh/h to Area 2 and K_13 MWh/h to Area 3.
Therefore, writing a ∧ b = min(a, b) and a ∨ b = max(a, b),

G_11 = Ḡ_11 ∧ (D_1 + K_12 ∧ (D_2 + P⁺_23) + K_13 ∧ (D_3 + P⁺_32)),  (11)

where

P⁺_23 = K_23 ∧ ((D_3 − K_13) ∨ 0)  (12)

is what could be sent on to Area 3 via Area 2, and

P⁺_32 = ((K_13 − D_3) ∨ 0) ∧ (K_32 ∧ ((D_2 − K_23) ∨ 0))  (13)

is what could be sent to Area 2 via Area 3. We then move to the next power plant in order of production cost, and so on, until the functions G_ij are found for all (i, j) ∈ V. This method can be generalized to cover power plants with nonlinear but finitely piecewise linear production costs, by splitting one power plant into several plants, one for each linear part of the production cost.

V. CASE STUDY

The case study on which our variance reduction techniques are tested is the following. In Area 1 there are 2 hydro power plants, with production costs 10 €/MWh and 15 €/MWh and production capacities 100 MW and 100 MW respectively. In Area 2 there are 2 nuclear power plants, with production costs 20 €/MWh and 25 €/MWh and production capacities 40 MW and 100 MW respectively. In Area 3 there are 2 thermal power plants, with production costs 30 €/MWh and 40 €/MWh and production capacities 30 MW and 100 MW respectively. All power plants have an availability probability of 99%, as have all lines. The load D = [D_1, D_2, D_3]^T is N(m, Σ)-distributed with mean m = [50, 50, 50]^T MW and covariance matrix

Σ = [ 10  9  9
       9 10  9
       9  9 10 ].

The cost of disconnecting load is 100 €/MWh. For the power system itself quite ordinary values are used, and the phase shifters can shift the angles to fit all purposes. The transfer limits are

K = [ 50 50
      50 25
      50 25 ],

where row i lists the limits from area i towards the other two areas, i.e. K_12 = K_13 = K_21 = K_31 = 50 MW and K_23 = K_32 = 25 MW.

How does the comparison work?

To compare different variance reduction techniques to one another we use the measure

(time taken by the simulation) × (variance of the resulting estimate).

This measure is reasonable since the variance decays as 1/n and t ∝ n, where n is the number of samples and t is the time taken by the simulation.
In this paper all the methods tested take approximately the same time for the same n. While Importance Sampling, Stratified Sampling and Dagger Sampling affect the way we generate random variables, the Control Variates method randomizes the state variables in the same way as Simple Sampling. The Control Variates method can therefore be combined with the other methods to reduce the variance even further. We came to the conclusion that Stratified Sampling and Dagger Sampling should be used only when generating the system state (Dagger Sampling cannot be used in any other way on this problem). Importance Sampling, on the other hand, will be used when randomizing both the loads and the system state. All in all, six sets of methods were tested, as depicted in the table below:
Setting   Loads       System state
   1      SS          SS
   2      IS          IS
   3      SS + CVM    SS
   4      SS + CVM    IS
   5      SS + CVM    DS
   6      SS + CVM    Str

where IS denotes Importance Sampling, SS Simple Sampling, DS Dagger Sampling, Str Stratified Sampling and CVM the Control Variates method.

Simplified model

With the values given above there is practically no risk that any area runs out of production capacity, and the power plants are ordered so that Area 1 contains the two cheapest plants, Area 2 the next two in order of production cost, and Area 3 the two most expensive plants. Therefore we can model the two power plants in area i as one power plant with production G_i and then let G_i1 = H(Ḡ_i1 − G_i)G_i and G_i2 = H(G_i − Ḡ_i1)(G_i − Ḡ_i1), for all i ∈ I, where H is the Heaviside function. The G_i's are then (again with a ∧ b = min(a, b) and a ∨ b = max(a, b)):

G_1 = D_1 + K_12 ∧ (D_2 + P⁺_23) + K_13 ∧ (D_3 + P⁺_32),  (14)

G_2 = 0 ∨ (D_2 + P⁺_23 − K_12 − P⁺_32),  (15)

G_3 = 0 ∨ (D_3 − P⁺_23 − K_13).  (16)

Setting 1

This method is straightforward and needs no further explanation. It gave an expected production cost of 1795 €/period and a variance of 103, with n = 500 samples. However, we never get any disconnection of loads, which implies that this method commits a so-called cardinal error (when all power plants are down, load will undeniably have to be disconnected), i.e. we miss some important cases.

Setting 2

When using Importance Sampling there is a certain strategy that actually yields a variance of 0. If we want to estimate the mean m_X of the function X = g(Y) of a random variable Y with known density f_Y, and instead take our samples from a random variable Z with density

f_Z(ψ) = g(ψ) f_Y(ψ) / m_X,  (17)

weighting the resulting samples with w = f_Y / f_Z, we get an estimate of m_X with variance 0. However, this requires knowing

m_X = ∫_{R³₊} g(y) f_Y(y) dy,  (18)

which is what we wanted in the first place. Using instead our approximation ĝ of g, whose mean value is analytically computable, in place of g gives us a feasible method for reducing the variance of our estimate.
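For concreteness, importance sampling with the state weights of (3) on the two-component example from Section II might be sketched as follows (a toy sketch; function names are ours):

```python
import random

def is_failure_prob(n, q=0.001, q_alt=0.1, seed=0):
    """Estimate P(series system of two components fails) by drawing
    component failures with the inflated probability q_alt and
    weighting every sampled state s with
    w(s) = P_original(s) / P_altered(s), as in eq. (3)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        state = [rng.random() < q_alt for _ in range(2)]
        w = 1.0
        for comp_failed in state:
            w *= (q / q_alt) if comp_failed else ((1 - q) / (1 - q_alt))
        if any(state):            # series system: any failure trips it
            acc += w
    return acc / n

p_hat = is_failure_prob(100_000)
# The exact probability is 1 - (1 - 0.001)**2 ≈ 1.999e-3; under the
# altered measure about 19% of the samples are (weighted) failures,
# so the rare event is observed far more often than in Simple Sampling.
```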
The time needed to compute the integral (18) is neglected, since this computation only has to be done once. Unfortunately, this method did not work well on this system.

Setting 3

We now turn our attention to the Control Variates method, to see if it can do what Importance Sampling could not. Here, as with Importance Sampling, we have to calculate the expected value (18), and for the same reasons as before we disregard the time taken by this procedure. This method turns out to be remarkably good: the variance of the estimate is only 0.2375. To get that small a variance with Simple Sampling we would have to take approximately n = 200 000 samples.

Setting 4

Having seen what the Control Variates method can do to the variance, we keep using it and let the loads be randomized as with Simple Sampling, as in Setting 3. For the system state we use Importance Sampling: we change the probabilities of component failure from 1% to 20% and then weight the resulting scenarios. This was quite effective, and the probability of system failure was estimated to 3.8731·10⁻⁷, but this still seems somewhat small considering that the probability of three or more components not working is 2.0562·10⁻⁴.

Setting 5

This method did not work well at all. Even using it together with Importance Sampling (raising the probabilities of failure and then applying Dagger Sampling) did not give a much better result than Setting 3.

Setting 6

Partitioning the sample space into 13 strata α_0, ..., α_12, such that all states with i component failures belong to stratum α_i, and using the Control Variates method to get the value of trading, we obtained the best result: a variance of 2.3469 when estimating the expected cost of supporting the loads, and an estimated probability of failure of 2.2271·10⁻⁴ with variance 1.7450·10⁻⁸. By the same argument as for Setting 4, this seems to be a reasonable result.
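For reference, the Neyman allocation of eq. (5) and the stratified weighting are easy to sketch (a toy example of ours, with X uniform on [0, 1) split into two equal strata):

```python
import random
import statistics

def neyman_allocation(n, omega, sigma):
    """n_h = n * omega_h * sigma_h / sum_k omega_k * sigma_k, eq. (5)."""
    total = sum(w * s for w, s in zip(omega, sigma))
    return [round(n * w * s / total) for w, s in zip(omega, sigma)]

def stratified_estimate(samplers, omega, alloc, seed=0):
    """Run one simulation per stratum and weight the per-stratum
    sample means together with the weights P(alpha_h)."""
    rng = random.Random(seed)
    return sum(w * statistics.mean(draw(rng) for _ in range(n_h))
               for draw, w, n_h in zip(samplers, omega, alloc))

# Strata [0, 0.5) and [0.5, 1) for X ~ U(0,1).
omega = [0.5, 0.5]                  # P(alpha_h)
sigma = [0.5 / 12 ** 0.5] * 2       # conditional standard deviations
alloc = neyman_allocation(1000, omega, sigma)    # equal strata -> [500, 500]
samplers = [lambda r: 0.5 * r.random(),          # X | X in [0, 0.5)
            lambda r: 0.5 + 0.5 * r.random()]    # X | X in [0.5, 1)
estimate = stratified_estimate(samplers, omega, alloc)  # close to E[X] = 0.5
```

In the case study the strata are instead the 13 failure-count classes above, and the conditional sampling of the system state is correspondingly more involved.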
The partition of n was, however, not made using Neyman allocation as described in Section II. This is because we take two estimates from every simulation: both the cost of supporting the loads and the risk of having to disconnect loads. When having to weigh these two variances against one another, it seemed easier and more efficient to try a few partitions and see which gave the best results. The most efficient method was thus to use Simple Sampling for randomizing the loads and Stratified Sampling for randomizing the system state. Using this technique, we made the plot (Fig. 2) showing the cost of supporting the
loads as a function of x when

K = x · [ 1  1
          1  0.5
          1  0.5 ],

i.e. K_12 = K_13 = K_21 = K_31 = x and K_23 = K_32 = 0.5x.

Fig. 2. Cost of supporting the loads as a function of the allowed maximal trading x.

In this figure the dashed line is the mean cost of supporting the loads in the simplified system. If we look closer at the figure, we see that for x > 50 the two lines become separated. This follows from the larger probability of the failure of a line causing the system to trip and loads to be disconnected.

Fig. 3. Close-up of the cost of supporting the loads versus the allowed maximal trading.

As for the probability of system failure, Figure 4 shows the relation between x and the probability of having to disconnect loads.

Fig. 4. Risk of a major disturbance (system failure) as a function of x.

VI. CONCLUSIONS

When using Monte Carlo methods to simulate the trade-off between trading and security in a three-area power system, a good choice of variance reduction technique can dramatically increase the efficiency of the simulation. A poor choice might, however, not give much of an improvement. Some techniques, like Simple Sampling, give a result that seems very good but actually miss some very important scenarios, yielding a cardinal error. For our case study, Stratified Sampling together with the Control Variates method gave the best results. These techniques can, of course, be used on many other problems in power system analysis.

REFERENCES

[1] K. L. Chung, A Course in Probability Theory, 3rd ed. San Diego, CA: Academic Press, 2001.
[2] R. Y. Rubinstein, Simulation and the Monte Carlo Method. New York: John Wiley & Sons, 1981.
[3] M. Amelin, On Monte Carlo Simulation and Analysis of Electricity Markets, Ph.D. thesis, Stockholm: Universitetsservice US-AB, 2004. (Available at www.eps.ee.kth.se.)