Appendix A Random Variables and Stochastic Processes

In view of the modeling, forecasting, and forecast verification aspects covered throughout the various chapters of this book, it is useful to recall some basics on random variables and stochastic processes here. It is assumed, however, that the readers of this book have some general understanding of probability theory and statistics. Focus is therefore placed on the basics and terminology needed when considering renewable energy generation and market quantities. Specifically, we shall focus on concepts like random variables, stochastic processes, and time series, and on how randomness is described by important concepts like probability densities, moments (mean and variance), and correlation structures, with a special focus on the autocorrelation.

The first section centers on scalar (or univariate) random variables, starting with some important examples related to renewables and power systems. Section A.2 provides an overview of some of the most relevant probability densities for random variables related to this book. Section A.3 considers multivariate random variables, which gives us the possibility to introduce concepts like correlations and conditional expectations. Conditional expectations are important for forecasting, since it can be shown (see, e.g., [3]) that, assuming symmetric cost functions and densities, the optimal prediction is the conditional expectation. This section also introduces the multivariate normal density. Later on, in Sect. A.4, discrete-time indexed random variables (stochastic processes) are introduced. The outcome of such a stochastic process is called a time series. In general, it will be clear that stochastic processes can be handled like multivariate random variables. In fact, any stochastic process is completely characterized by the so-called family of multivariate densities, all related to vectorizations of the scalar random variables associated with the stochastic process at various fixed time points. Finally, correlation methods for measuring the serial dependence in time series are described in Sect. A.5.

A.1 Random Variables

Random variables are a basic quantity used in all branches of probability and statistics. They are the basic description of events that entail a certain level of uncertainty. The outcome or value of a random variable can be a real number (scalar), but one can consider arbitrary types such as Boolean, categorical, complex, vector, positive definite matrix, and time series, just to mention some of the most relevant ones. After an introduction to random variables, this section discusses how the likelihood of outcomes of random variables is described by probability density functions and their cumulative counterpart, namely, the cumulative distribution function. Later on, the moments of random variables are presented with a special focus on the expectation (mean) and variance. Finally, the concept of quantiles, which are often used for, e.g., probabilistic wind power forecasting, is briefly introduced, and ultimately, some concepts related to descriptive statistics are provided.

Definition A.1 (Random Variable). A random variable $X$ is a variable whose value is subject to variations due to randomness. A random variable does not have a single, fixed value; rather, it can take on a set of possible different values, each associated with a probability.

A random variable is said to be discrete if it can only take a countable number of values, i.e., if there exists a countable set $S \subset \mathbb{R}$ such that $P\{X \in S\} = 1$. In contrast, it is said to be continuous if it can take an uncountable number of values, that is, if the set $S$ such that $P\{X \in S\} = 1$ is uncountable. $S$ is called the sample space. Each value in the sample space is associated with a probability, which in the discrete case is given directly by the probability mass function (PMF). In the continuous case, the corresponding function is called the probability density function (PDF); however, the values of a PDF are not probabilities as such; the PDF must be integrated over an interval to yield a probability. The cumulated variant is in both cases called the cumulative distribution function (CDF). Let us illustrate these definitions with examples inspired by renewable energy generation.

Example A.2 (Cutoff Event for a Wind Turbine). Wind turbines generate electricity for wind speeds up to the so-called cutoff speed. For higher wind speeds, turbines are shut down for security reasons in order to avoid structural damage. The cutoff speed commonly ranges between 25 and 30 m s$^{-1}$ nowadays, depending on the turbine manufacturers and models. Consequently, when not having perfect knowledge of the wind speed seen by a wind turbine, one can define a discrete random variable $X$ such that
\[
X = \begin{cases} 0, & \text{if the turbine is shut down} \\ 1, & \text{if the turbine is operating.} \end{cases}
\]
In that case, $X$ would actually be referred to as a binary random variable, since it can take two values only.
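
To make Example A.2 concrete, the short sketch below simulates such a binary random variable in Python. The Rayleigh wind-speed model, its 8 m s$^{-1}$ scale, and the 25 m s$^{-1}$ cutoff are assumptions chosen purely for illustration, not values taken from this book.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative assumptions: wind speed drawn from a Rayleigh distribution
# (scale 8 m/s) and a cutoff speed of 25 m/s.
cutoff = 25.0
wind_speed = rng.rayleigh(scale=8.0, size=100_000)

# Binary random variable: X = 0 if the turbine is shut down, 1 if operating
# (the cut-in speed is ignored here for simplicity).
X = (wind_speed <= cutoff).astype(int)

# P{X = 1} estimated as the empirical frequency of operating periods.
print("P{X = 1} ~", X.mean())
```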

Example A.3 (Power Generation of a Wind Turbine). For wind speeds below the cutoff speed, a wind turbine can generate electric power between 0 and its nominal capacity $P_n$ (typically between 2 and 5 MW for today's turbines) following its power curve. Having limited knowledge of the wind conditions at the level of the turbine, one can then define a continuous random variable $X$ such that $X$ may take any value in the interval $[0, P_n]$.

From now on, emphasis will be placed on continuous random variables, since they are the most relevant ones for the description of the types of uncertain processes considered in this book.

A.1.1 Probability Densities and Cumulative Distribution Functions

Describing discrete random variables is easier than describing continuous ones. For instance, for the case of cutoff events in Example A.2 above, the binary random variable $X$ is fully characterized by the probabilities $P\{X = 0\}$ and $P\{X = 1\}$. This cannot be done for continuous random variables, because their support consists of an uncountable set. Actually, for continuous random variables, $P\{X = x\} = 0$, $\forall x \in S$. Therefore, in the continuous case, random variables are characterized instead by $P\{X \le x\}$, which leads to the definition of the cumulative distribution function.

Definition A.2 (Cumulative Distribution Function). The cumulative distribution function (abbreviated cdf or CDF) for a continuous random variable $X$ is the function $F(x)$ such that
\[
F(x) = P\{X \le x\}, \quad x \in S. \tag{A.1}
\]
The CDF necessarily has the following properties:
(i) it is always positive, $F(x) \ge 0$, $\forall x \in S$,
(ii) its left and right limits are 0 and 1, respectively, $\lim_{x \to S^-} F(x) = 0$ and $\lim_{x \to S^+} F(x) = 1$, where $S^-$ and $S^+$ are the left and right bounds of the support $S$ of $X$,
(iii) it is a nondecreasing function, hence for all $x_1, x_2 \in S$, $x_2 \ge x_1$ implies that $F(x_2) \ge F(x_1)$.

In direct relation with the CDF of a random variable, one may define its probability density function.

Definition A.3 (Probability Density Function). The probability density function $f(x)$ (abbreviated pdf or PDF) for a continuous random variable $X$ corresponds to the derivative of its cumulative distribution function, i.e.,
\[
f(x) = \frac{dF(x)}{dx}, \quad x \in S. \tag{A.2}
\]
The PDF necessarily has the following properties:
(i) it is always positive, $f(x) \ge 0$, $\forall x \in S$, since $F$ is a nondecreasing function,
(ii) its left and right limits are 0, that is, $\lim_{x \to S^-} f(x) = \lim_{x \to S^+} f(x) = 0$ ($S^-$ and $S^+$ are again the left and right bounds of $S$, the support of $X$),
(iii) it integrates to 1, $\int_{x \in S} f(x)\,dx = 1$, given that $P[x \in S] = 1$.

A consequence of these definitions is that
\[
P[x_1 \le X \le x_2] = F(x_2) - F(x_1) = \int_{x_1}^{x_2} f(x)\,dx. \tag{A.3}
\]

A.1.2 Expectations and Moments

For a discrete random variable $X$, the expectation (or the mean) is defined as $E[X] = \sum_x x\,P\{X = x\}$, i.e., an average of the possible values of $X$, each value being weighted by its probability. For continuous variables, expectations are defined as integrals.

Definition A.4 (Expectation). The expectation (or mean) of a continuous variable $X$ with probability density function $f_X$ is
\[
E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx, \tag{A.4}
\]
whenever this integral exists. Note that we will write $f_X(x)$ and $F_X(x)$ instead of $f(x)$ and $F(x)$, respectively, whenever it is necessary to emphasize to which random variable the function belongs.

The expectation is also referred to as the first moment, because it is indeed the first moment of the area under $f_X$ with respect to the line $x = 0$. It is, furthermore, a linear operator, i.e.,
\[
E[aX_1 + bX_2] = aE[X_1] + bE[X_2]. \tag{A.5}
\]
If $X$ is a random variable and $g$ is a function, then $Y = g(X)$ is also a random variable, with the expectation
\[
E[Y] = E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx. \tag{A.6}
\]
This allows us to define higher order moments.
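
As a minimal sketch, (A.4) and (A.6) can be evaluated by direct numerical integration; the $N(2, 1.5^2)$ density used here is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Compute E[X] and E[g(X)] by numerically integrating x * f_X(x), as in
# (A.4) and (A.6), for an illustrative N(2, 1.5^2) density.
mu, sigma = 2.0, 1.5
f = lambda x: norm.pdf(x, loc=mu, scale=sigma)

mean, _ = quad(lambda x: x * f(x), -np.inf, np.inf)              # (A.4)
second_moment, _ = quad(lambda x: x**2 * f(x), -np.inf, np.inf)  # (A.6), g(x) = x^2

print(mean)                     # ~ 2.0
print(second_moment - mean**2)  # ~ 1.5^2, anticipating (A.9)
```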

Definition A.5 (Moments). The $n$th moment of $X$ is
\[
E[X^n] = \int_{-\infty}^{\infty} x^n f_X(x)\,dx, \tag{A.7}
\]
and the $n$th central moment is defined as
\[
E[(X - E[X])^n] = \int_{-\infty}^{\infty} (x - E[X])^n f_X(x)\,dx. \tag{A.8}
\]
The second central moment is also called the variance of $X$. It is given by
\[
\mathrm{Var}[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2. \tag{A.9}
\]
For two random variables $X_i$ and $X_j$, the central second-order moment is given by the covariance between $X_i$ and $X_j$:
\[
\mathrm{Cov}[X_i, X_j] = E[(X_i - E[X_i])(X_j - E[X_j])] = E[X_i X_j] - E[X_i]E[X_j]. \tag{A.10}
\]
The covariance gives information about the simultaneous variation of two random variables and is useful for identifying interdependencies. Besides, this definition will subsequently be generalized to describe interdependencies in time-indexed sequences of random variables, i.e., in a stochastic process.

By using (A.10) and the fact that the expectation operator is linear, we obtain the following important calculation rule for the covariance:
\[
\mathrm{Cov}[aX_1 + bX_2, cX_3 + dX_4] = ac\,\mathrm{Cov}[X_1, X_3] + ad\,\mathrm{Cov}[X_1, X_4] + bc\,\mathrm{Cov}[X_2, X_3] + bd\,\mathrm{Cov}[X_2, X_4], \tag{A.11}
\]
where $X_1, \ldots, X_4$ are random variables, and $a, \ldots, d$ are constants.

The correlation between two random variables, $X_i$ and $X_j$, is a normalization of the covariance and can be written as
\[
\rho_{ij} = \frac{\mathrm{Cov}[X_i, X_j]}{\sqrt{\mathrm{Var}[X_i]\,\mathrm{Var}[X_j]}} = \frac{\sigma_{ij}}{\sigma_i \sigma_j}, \quad \text{where } \sigma_{ij} = \mathrm{Cov}[X_i, X_j] \text{ and } \sigma_i = \sqrt{\mathrm{Var}[X_i]}. \tag{A.12}
\]

Example A.4 (Spatial Correlation of Wind Speed Data). Let us consider data for wind speed during the period from 1970 to 1999, measured every 6 h at a number of synoptic stations in Europe. Using these data, we have estimated that the correlation between the wind speed in the northern part of Germany and the wind speed in Denmark East is 0.64 for the same hour. This rather high correlation tells us that if the wind power generation in Denmark is high, then it is likely that the wind power generation in the northern part of Germany is also high at the same time. On the contrary, the correlation between the wind speed in the northern and southern parts of Spain for the same period in time is found to be much lower. This indicates that, for Spain, the weather phenomena driving the wind speed are very different from the south to the north; maybe, in the southern part of Spain, wind speed is mainly caused by thermal differences, whereas for the northern part the main source of wind is frontal passages.
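
Sample statistics behind quantities such as the 0.64 of Example A.4 can be computed along the following lines; the two synthetic series and their shared "weather" signal are illustrative stand-ins, not the actual measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of (A.10)-(A.12): two correlated series standing in for wind
# speeds at two nearby sites (synthetic data only).
common = rng.normal(size=10_000)            # shared weather signal
x = common + 0.8 * rng.normal(size=10_000)  # site 1
y = common + 0.8 * rng.normal(size=10_000)  # site 2

cov_xy = np.cov(x, y)[0, 1]      # sigma_ij
rho = np.corrcoef(x, y)[0, 1]    # rho_ij = sigma_ij / (sigma_i * sigma_j)
print(cov_xy, rho)               # rho ~ 1 / (1 + 0.8^2) ~ 0.61
```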

[Fig. A.1 Probabilistic forecasts of future wind power generation using quantiles (deciles); the y-axis gives the forecast in % of installed capacity, the x-axis the lead time in hours.]

A.1.3 Description by Quantiles

This section briefly introduces the concept of quantile, which is of particular importance in the field of probabilistic forecasting.

Definition A.6. Provided that $F_t$ is a strictly increasing function, the quantile $q_t^{(\alpha)}$ with proportion $\alpha \in [0, 1]$ of the random variable $Y_t$ is uniquely defined as the value of $y$ such that
\[
P\{Y_t \le y\} = \alpha, \tag{A.13}
\]
or equivalently as
\[
q_t^{(\alpha)} = F_t^{-1}(\alpha). \tag{A.14}
\]
Note that there exists a generalization of the definition of a quantile for the case where $F_t$ is not a strictly increasing function. It is, though, not discussed here, for simplicity.

Quantiles are often used to describe the uncertainty of, e.g., a wind power forecast. This is illustrated in Fig. A.1, which shows a probabilistic forecast of the future wind power production by plotting the quantiles (deciles) of the future wind power generation up to 144 h ahead. The forecasts in Fig. A.1 are generated by the WPPT approach as described in [5] and [6].
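
A probabilistic forecast like the deciles of Fig. A.1 can be extracted from any predictive sample via (A.14); in the sketch below, the Beta-distributed sample of wind power production is a purely illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

# Empirical deciles q^(alpha) = F^{-1}(alpha), computed from a synthetic
# predictive sample of wind power production (in % of installed capacity;
# the Beta parameters are illustrative only).
sample = 100 * rng.beta(2.0, 5.0, size=10_000)

alphas = np.arange(0.1, 1.0, 0.1)          # deciles, as in Fig. A.1
deciles = np.quantile(sample, alphas)
for a, q in zip(alphas, deciles):
    print(f"q({a:.1f}) = {q:5.1f} % of installed capacity")
```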

[Fig. A.2 Some descriptive statistical analysis of the variations of wind speed in the eastern part of Denmark for the period from 1970 to 1979; the four panels show a histogram, a box plot, a non-parametric density estimate, and a normal Q-Q plot.]

A.1.4 Descriptive Statistics

So far, the discussion has focused on probability theory. We have, for example, assumed that the probability density function of a certain random variable is known. This is, however, seldom the case in practice, where such a function has to be inferred from a set of data. Descriptive statistics is the discipline of quantitatively describing the most important features of the data. In this context, the sample size is important, since obviously more precise inference about, e.g., the mean value can be obtained from large sample sizes.

Univariate analysis involves methods for describing the distribution or variation of a single variable. This can be done graphically by means of, e.g., the histogram and the box plot. It can also be done quantitatively by a measure of the central tendency of the variable (including the mean, median, and mode), the dispersion of the data (through the range and quantiles, for example), and measures of spread (including the variance and interquartile range, among others).

Bivariate analysis is relevant for samples with two or more variables and is used to describe the relationship between these variables. Important tools of bivariate analysis comprise scatter plots and quantitative measures of dependence like the empirical correlation, Pearson's r, rank correlation, and empirical covariance.

Example A.5 (Descriptive Statistics). Figure A.2 illustrates some of the possibilities for analyzing the wind speed data measured in the eastern part of Denmark from 1970 to 1979 (both years included) using descriptive statistics. The upper left plot shows the histogram of wind speed for this period. A box plot of the same data is shown in the upper right part of the figure. The horizontal bar in the middle indicates the median of the wind speed data, and the bottom and the top of the box show the 25th and 75th percentiles, respectively, also referred to as the first and third quartiles and denoted by $Q_1$ and $Q_3$, in that order. The difference between these percentiles is known as the interquartile range (IQR). The lowest and highest whiskers of the box plot are then computed as $Q_1 - 1.5\,\mathrm{IQR}$ and $Q_3 + 1.5\,\mathrm{IQR}$ from the values of wind speed for the considered period, i.e., from 1970 to 1979. Finally, the points in the box plot are called outliers.

The lower left plot of Fig. A.2 presents a non-parametric estimate of the probability density function using a kernel method (see [2]). Very often it is difficult to identify a known density function for the data and, in these cases, such non-parametric methods prove to be very useful. A simple test for normality (and, in many ways, the best) is the quantile-quantile plot, which displays ranked values from the sample against a similar number of random numbers simulated using a normal distribution. If the sample is actually normally distributed, then the line of points will be straight. Departures from a normal distribution, as seen here, show up as various sorts of non-linear relations. Therefore, this plot highlights that the wind speed data under consideration are not normally distributed. This is not surprising, as it is well known that wind speed is better described by a Weibull distribution [8].
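
The four panels of Fig. A.2 can be reproduced along the following lines; the Weibull shape and scale parameters below are illustrative, not fitted to the Danish measurements.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)

# Synthetic, Weibull-like wind speeds (illustrative parameters only).
wind = stats.weibull_min.rvs(c=2.0, scale=8.0, size=5_000, random_state=rng)

fig, ax = plt.subplots(2, 2, figsize=(8, 6))
ax[0, 0].hist(wind, bins=40)                      # histogram
ax[0, 0].set_title("Histogram")
ax[0, 1].boxplot(wind)                            # box plot (median, Q1, Q3, whiskers)
ax[0, 1].set_title("Box plot")
kde = stats.gaussian_kde(wind)                    # non-parametric (kernel) density
xs = np.linspace(0, wind.max(), 200)
ax[1, 0].plot(xs, kde(xs))
ax[1, 0].set_title("Kernel density")
stats.probplot(wind, dist="norm", plot=ax[1, 1])  # normal Q-Q plot
ax[1, 1].set_title("Normal Q-Q plot")
plt.tight_layout()
plt.show()
```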

Prior to investing in a new wind farm, the potential for wind power production is most often assessed using a statistical analysis of existing data from the candidate site, or by correlating wind speed measurements from the actual site for a shorter period with wind speed data from a site which has a longer record of measured wind speed data. This analysis most often assumes stationarity, a concept which we shall study in detail in the section on stochastic processes later on. Let us now, as a simplistic approach, just compare the statistics of wind speed measurements for two distinct periods for the eastern part of Denmark using the box plot. The results are shown in Fig. A.3 and seem to indicate that the wind speed in the eastern part of Denmark has increased from the earlier period to the later one.

[Fig. A.3 A comparison of wind speed measurements (box plots and data) for the two periods considered, for the eastern part of Denmark.]

A.2 Some Relevant Probability Distributions

There exists a large number of probability distributions for continuous variables, which are more or less relevant depending upon the considered application. A gentle introduction to probability distributions for general applications is, for instance, given in [1] and [4]. Let us provide here a brief overview of a set of probability distributions of particular relevance when it comes to the modeling of renewable energy generation and of market quantities (e.g., prices and energy quantities).

Uniform Distribution. The Uniform distribution, denoted $U[a, b]$, is one of the simplest probability distributions. It is fully characterized by the bounds $a$ and $b$ of its support. Stating that $X$ is distributed $U[a, b]$, i.e., $X \sim U[a, b]$, means that $X$ has the same likelihood of taking any value in the interval $[a, b]$. Its PDF is then given as
\[
f(x) = \begin{cases} (b - a)^{-1}, & x \in [a, b] \\ 0, & \text{otherwise,} \end{cases} \tag{A.15}
\]
while the corresponding CDF writes
\[
F(x) = \begin{cases} 0, & x < a \\ (x - a)(b - a)^{-1}, & x \in [a, b) \\ 1, & x \ge b. \end{cases} \tag{A.16}
\]
The PDF and CDF of the Uniform distribution are depicted in Fig. A.4.

The Uniform distribution has a number of interesting applications in the modeling of renewable energy generation and electricity markets. For instance, when issuing a probabilistic forecast for renewable energy generation, the most simple and naive prediction takes the form of a uniform distribution with its left bound $a$ being 0 and its right bound $b$ being the nominal capacity of the renewable energy portfolio.
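
A minimal sketch of such a naive uniform forecast, assuming a hypothetical nominal capacity $P_n$ of 3 MW:

```python
from scipy import stats

# The naive "climatological" probabilistic forecast U[0, P_n], with a
# nominal capacity P_n of 3 MW assumed purely for illustration.
P_n = 3.0
naive = stats.uniform(loc=0.0, scale=P_n)   # U[0, P_n]

print(naive.pdf(1.0))   # (b - a)^{-1} = 1/3 inside [0, P_n], as in (A.15)
print(naive.cdf(1.0))   # (x - a)/(b - a) = 1/3, as in (A.16)
print(naive.ppf(0.5))   # median forecast: P_n / 2
```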

[Fig. A.4 Probability density (bottom) and cumulative distribution (top) functions for the Uniform distribution $U[0, 1]$.]

Gaussian Distribution. The Gaussian (also called Normal) distribution is the most commonly used probability distribution owing to its nice analytical properties, among other reasons. Furthermore, the Central Limit Theorem (CLT) states that the mean of a large number of independent random variables, given some conditions, will be approximately normally distributed. The Gaussian density is fully and uniquely characterized by its mean $\mu$ and its variance $\sigma^2$. Stating that $X$ is distributed $N(\mu, \sigma^2)$, i.e., $X \sim N(\mu, \sigma^2)$, implies that $X$ is most likely to take values around its mean $\mu$ (which, besides, coincides with its mode and median).

More specifically, the likelihood decreases following a bell shape, as illustrated in Fig. A.5. Its PDF is given by
\[
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}, \tag{A.17}
\]
which is naturally symmetric around $\mu$. Moreover, the corresponding CDF writes
\[
F(x) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right], \tag{A.18}
\]
where erf is the error function, defined as
\[
\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x \exp(-t^2)\,dt. \tag{A.19}
\]
The PDF and CDF of various Gaussian distributions are depicted in Fig. A.5.

[Fig. A.5 Probability density (bottom) and cumulative distribution (top) functions for the Gaussian distribution and for various values of its mean $\mu$ and standard deviation $\sigma$: $\mu = 0, \sigma = 1$; $\mu = 0, \sigma = 1.5$; $\mu = 1, \sigma = 1$.]
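
The closed form (A.18) can be checked against a library implementation as follows (the parameter values are arbitrary illustrative choices):

```python
import numpy as np
from scipy import stats
from scipy.special import erf

# Check (A.17)-(A.19): the CDF via the error function against scipy's
# implementation, for an illustrative N(mu, sigma^2).
mu, sigma = 1.0, 1.5
x = np.linspace(-4, 6, 5)

cdf_erf = 0.5 * (1 + erf((x - mu) / (sigma * np.sqrt(2))))   # (A.18)
cdf_ref = stats.norm.cdf(x, loc=mu, scale=sigma)

print(np.allclose(cdf_erf, cdf_ref))   # True
```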

Beta Distribution. The Beta distribution is attractive for the modeling of bounded and nonlinear variables. This is so because this distribution is, by definition, bounded between 0 and 1, or any values $a$ and $b$ after appropriate scaling. It is fully characterized by its two shape parameters $\alpha$ and $\beta$, which are directly related to the mean $\mu$ and standard deviation $\sigma$ of the distribution as follows:
\[
\mu = \alpha(\alpha + \beta)^{-1}, \tag{A.20}
\]
\[
\sigma = \sqrt{\alpha\beta(\alpha + \beta + 1)^{-1}}\,(\alpha + \beta)^{-1}. \tag{A.21}
\]
Consider that a random variable $X$ is distributed Beta$(\alpha, \beta)$, i.e., $X \sim \mathrm{Beta}(\alpha, \beta)$. The PDF of $X$ then has the following form:
\[
f(x) = \frac{1}{B(\alpha, \beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}, \tag{A.22}
\]
where $B(\alpha, \beta)$ is the Beta function, which serves as a normalizing constant for the PDF to integrate to 1. The Beta function is defined as
\[
B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1}\,dt, \tag{A.23}
\]
while its incomplete counterpart is given by
\[
B(x; \alpha, \beta) = \int_0^x t^{\alpha - 1} (1 - t)^{\beta - 1}\,dt. \tag{A.24}
\]
The CDF of the Beta distribution is consequently obtained as
\[
F(x) = \frac{B(x; \alpha, \beta)}{B(\alpha, \beta)}. \tag{A.25}
\]
The PDF and CDF of the Beta distribution are depicted in Fig. A.6 for different values of $\alpha$ and $\beta$. Notice that different combinations of these two parameters may result in very different shapes, each characterized by a particular degree of skewness. For this reason, the Beta distribution is considered to be a quite flexible probability distribution for modeling data.

[Fig. A.6 Probability density (bottom) and cumulative distribution (top) functions for the Beta distribution and for various values of $\alpha$ and $\beta$ (the legend includes $\alpha = 0.5, \beta = 0.5$ and $\alpha = 1, \beta = 3$).]
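
The moment formulas (A.20)-(A.21) can be verified numerically as below; since the third parameter pair of Fig. A.6 is truncated in the source, $\alpha = 2$, $\beta = 2$ is assumed here purely for illustration.

```python
import numpy as np
from scipy import stats

# Verify (A.20)-(A.21) for a few (alpha, beta) pairs; the pair (2, 2) is
# an assumption, not taken from the figure.
for a, b in [(0.5, 0.5), (1.0, 3.0), (2.0, 2.0)]:
    dist = stats.beta(a, b)
    mean = a / (a + b)                              # (A.20)
    std = np.sqrt(a * b / (a + b + 1)) / (a + b)    # (A.21)
    assert np.isclose(dist.mean(), mean)
    assert np.isclose(dist.std(), std)
    print(f"alpha={a}, beta={b}: mean={mean:.3f}, std={std:.3f}")
```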

A.3 Multivariate Random Variables

The theory behind multivariate random variables is useful both for studying the relation between two (or more) random variables and for analyzing stochastic processes and time series. This is due to the fact that the theory for describing stochastic processes is basically the same as the theory for describing multivariate random variables.

An $n$-dimensional random variable (random vector) is a vector of $n$ scalar random variables, written as
\[
X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}. \tag{A.26}
\]
Random vectors will also be referred to as multivariate random variables.

A.3.1 Joint Densities

The $n$-dimensional random variable $X$ has the joint distribution function
\[
F(x_1, \ldots, x_n) = P\{X_1 \le x_1, \ldots, X_n \le x_n\}. \tag{A.27}
\]
If $X$ takes values on a continuous sample space, the joint (probability) density function is defined as
\[
f(x_1, \ldots, x_n) = \frac{\partial^n F(x_1, \ldots, x_n)}{\partial x_1 \cdots \partial x_n}, \tag{A.28}
\]
and the relation between the distribution and density function is given by
\[
F(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(t_1, \ldots, t_n)\,dt_1 \cdots dt_n. \tag{A.29}
\]
A multivariate random variable $X$ is called discrete if it takes values on a discrete (countable) sample space. In this case, the joint density function (or mass function) is defined as
\[
f(x_1, \ldots, x_n) = P\{X_1 = x_1, \ldots, X_n = x_n\}, \tag{A.30}
\]
and the joint distribution and mass functions are related by
\[
F(x_1, \ldots, x_n) = \sum_{t_1 \le x_1} \cdots \sum_{t_n \le x_n} f(t_1, \ldots, t_n). \tag{A.31}
\]

A.3.2 Marginal Densities

For a sub-vector $S = (X_1, \ldots, X_k)^T$ ($k < n$) of the $n$-dimensional random vector $X$, the marginal density function is
\[
f_S(x_1, \ldots, x_k) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\,dx_{k+1} \cdots dx_n \tag{A.32}
\]
in the continuous case, and
\[
f_S(x_1, \ldots, x_k) = \sum_{x_{k+1}} \cdots \sum_{x_n} f(x_1, \ldots, x_n) \tag{A.33}
\]
if $X$ is discrete. In both cases, the marginal distribution function is
\[
F_S(x_1, \ldots, x_k) = F(x_1, \ldots, x_k, \infty, \ldots, \infty). \tag{A.34}
\]

A.3.3 Conditional Densities and Independence

The concept of forecasting is very important in power system operations and planning, and in electricity markets. When forecasting, we want to make a statement about future values of a certain time series given past observations. The full probabilistic information (full probabilistic forecast) about the value of a time series at a single future time instant is provided by the conditional density function, whereas a point forecast refers only to the conditional mean of the future value (see, for example, [3]).

Suppose that the continuous random variables $X$ and $Y$ have joint density $f_{X,Y}$. Based on the fact that $f_X(x)$ is positive for some values of $x$, we introduce next the concept of conditional density function.

Definition A.7 (Conditional Density). The conditional density (function) of the random variable $Y$ given $X = x$ is defined as
\[
f_{Y|X=x}(y) = \frac{f_{X,Y}(x, y)}{f_X(x)}, \quad f_X(x) > 0, \tag{A.35}
\]
where $f_{X,Y}$ is the joint density function of $X$ and $Y$. It should be noticed that both $X$ and $Y$ may be multivariate random variables. The conditional distribution function is then found by integration or summation, as previously described in (A.29) and (A.31) for the continuous and discrete cases, respectively.

It follows from (A.35) that
\[
f_{X,Y}(x, y) = f_{Y|X=x}(y)\,f_X(x), \tag{A.36}
\]
and by interchanging $X$ and $Y$ on the right-hand side of (A.36), we obtain Bayes' rule:
\[
f_{Y|X=x}(y) = \frac{f_{X|Y=y}(x)\,f_Y(y)}{f_X(x)}. \tag{A.37}
\]
We are now in a position to introduce the concept of independence.

Definition A.8 (Independence). $X$ and $Y$ are independent if
\[
f_{X,Y}(x, y) = f_X(x)\,f_Y(y), \tag{A.38}
\]
which corresponds to
\[
F_{X,Y}(x, y) = F_X(x)\,F_Y(y). \tag{A.39}
\]
If $X$ and $Y$ are independent, it clearly holds that
\[
f_{Y|X=x}(y) = f_Y(y). \tag{A.40}
\]
Notice that if two random variables are independent, then they are also uncorrelated, while uncorrelated variables are not necessarily independent.

A.3.4 Conditional Expectations

The conditional expectation is the expected value of a random variable with respect to a conditional probability distribution, that is, given (or conditional on) values of another random variable. Under a symmetric cost function or the minimum variance criterion, the optimal forecast is the conditional mean, which is hence often used in optimal forecasts of load or wind power production.

Definition A.9 (Conditional Expectation). The conditional expectation of the random variable $Y$ given $X = x$ is
\[
E[Y \mid X = x] = \int_{-\infty}^{\infty} y f_{Y|X=x}(y)\,dy. \tag{A.41}
\]

Remark A.1. It is important to underline that $E[Y \mid X]$ is actually a random variable, with its own distribution and, for instance, a mean value $E[E[Y \mid X]]$. However, for a given value of $X$, say $X = x$, $E[Y \mid X = x]$ is a number (for example, the point forecast of the future power output of a wind farm given a forecast of the wind speed).
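
For a bivariate normal, the conditional expectation of Definition A.9 has the closed form $E[Y \mid X = x] = \mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(x - \mu_X)$; the sketch below checks this by crude Monte Carlo conditioning (all parameter values are illustrative).

```python
import numpy as np

rng = np.random.default_rng(3)

# Standard bivariate normal with correlation rho; condition on X ~ x0 by
# averaging Y over samples whose X falls in a narrow band around x0.
rho = 0.6
samples = rng.multivariate_normal([0.0, 0.0],
                                  [[1.0, rho], [rho, 1.0]],
                                  size=500_000)
X, Y = samples[:, 0], samples[:, 1]

x0 = 1.0
near = np.abs(X - x0) < 0.05     # crude conditioning on X ~ x0
print(Y[near].mean())            # Monte Carlo estimate of E[Y | X = x0]
print(rho * x0)                  # closed form for standardized margins: 0.6
```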

A.3.5 Moments for Multivariate Random Variables

Often only the first moment (the mean value) and the second central moment (the variance) of a random vector are considered instead of the entire probability density function. This leads us to extend the concepts of expectation and covariance to multivariate random variables.

Definition A.10 (Expectation of the Random Vector). The expectation (or mean value) $\mu$ of the random vector $X$ is
\[
\mu = E[X] = \begin{pmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{pmatrix}. \tag{A.42}
\]

Definition A.11 (Covariance). The covariance (matrix) $\Sigma_X$ of $X$ is given by
\[
\Sigma_X = \mathrm{Var}[X] = E[(X - \mu)(X - \mu)^\top] = \begin{pmatrix} \mathrm{Var}[X_1] & \mathrm{Cov}[X_1, X_2] & \cdots & \mathrm{Cov}[X_1, X_n] \\ \mathrm{Cov}[X_2, X_1] & \mathrm{Var}[X_2] & \cdots & \mathrm{Cov}[X_2, X_n] \\ \vdots & & \ddots & \vdots \\ \mathrm{Cov}[X_n, X_1] & \mathrm{Cov}[X_n, X_2] & \cdots & \mathrm{Var}[X_n] \end{pmatrix}. \tag{A.43}
\]

In some cases, the following notation is used instead:
\[
\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\ \vdots & & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2 \end{pmatrix}. \tag{A.44}
\]
Furthermore, for the variance $\sigma_i^2$, the alternative notation $\sigma_{ii}$ is sometimes employed.

We can now use the definition of the correlation coefficient (A.12) to introduce the correlation matrix of a random vector.

Definition A.12 (Correlation Matrix). The correlation matrix for $X$ is
\[
R = \rho = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{21} & 1 & \cdots & \rho_{2n} \\ \vdots & & \ddots & \vdots \\ \rho_{n1} & \rho_{n2} & \cdots & 1 \end{pmatrix}. \tag{A.45}
\]
It can easily be shown that the covariance matrix $\Sigma$ and the correlation matrix $R$ are both (a) symmetric and (b) positive semi-definite (see, e.g., [3] and [7]).

The covariance between two multivariate random variables is also of particular interest and is therefore presented below.

Definition A.13. The covariance (matrix) $\Sigma_{XY}$ between a random vector $X$ with dimension $p$ and mean $\mu$, and another random vector $Y$ with dimension $q$ and mean $\nu$, is computed as
\[
\Sigma_{XY} = C[X, Y] = E[(X - \mu)(Y - \nu)^\top] = \begin{pmatrix} \mathrm{Cov}[X_1, Y_1] & \cdots & \mathrm{Cov}[X_1, Y_q] \\ \vdots & \ddots & \vdots \\ \mathrm{Cov}[X_p, Y_1] & \cdots & \mathrm{Cov}[X_p, Y_q] \end{pmatrix}. \tag{A.46, A.47}
\]
Clearly, we have that $C[X, X] = \mathrm{Var}[X]$.

Let us now consider a generalization of the covariance calculation rule (A.11) to the multivariate case. Let $X$ and $Y$ be defined as in Definition A.13, and let $A$ and $B$ be real matrices. Let $U$ and $V$ be random vectors of appropriate dimensions. Then
\[
C[A(X + U), B(Y + V)] = A\,C[X, Y]\,B^\top + A\,C[X, V]\,B^\top + A\,C[U, Y]\,B^\top + A\,C[U, V]\,B^\top. \tag{A.48, A.49}
\]
Important special cases are
\[
\mathrm{Var}[AX] = A\,\mathrm{Var}[X]\,A^\top, \tag{A.50}
\]
\[
C[X + U, Y] = C[X, Y] + C[U, Y]. \tag{A.51}
\]

A.3.6 The Multivariate Normal Distribution

In power engineering, the normal distribution, and the distributions derived from it, are of great use. Let us therefore briefly introduce the multivariate normal probability density function (PDF). We assume that $X_1, X_2, \ldots, X_n$ are independent random variables with means $\mu_1, \mu_2, \ldots, \mu_n$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$. We write $X_i \sim N(\mu_i, \sigma_i^2)$. Now, define the random vector $X = (X_1, X_2, \ldots, X_n)^\top$. As the random variables are independent, it follows from (A.38) that
\[
f_X(x_1, \ldots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n) \tag{A.52}
\]
\[
= \prod_{i=1}^{n} \frac{1}{\sigma_i \sqrt{2\pi}} \exp\left[-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right] \tag{A.53}
\]
\[
= \frac{1}{(2\pi)^{n/2} \prod_{i=1}^{n} \sigma_i} \exp\left[-\frac{1}{2} \sum_{i=1}^{n} \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2\right]. \tag{A.54}
\]
By introducing the mean vector $\mu = (\mu_1, \ldots, \mu_n)^\top$ and the covariance matrix $\Sigma_X = \mathrm{Var}[X] = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)$, this can be written as
\[
f_X(x) = \frac{1}{(2\pi)^{n/2} \sqrt{\det \Sigma_X}} \exp\left[-\frac{1}{2} (x - \mu)^\top \Sigma_X^{-1} (x - \mu)\right]. \tag{A.55}
\]
Note that the operator $\mathrm{diag}(\cdot)$ produces a square matrix with the elements of the specified vector on the main diagonal, and $\det \Sigma_X$ denotes the determinant of the covariance matrix $\Sigma_X$. The following definition generalizes (A.55) to the case where the covariance matrix is a full matrix.

Definition A.14 (The Multivariate Normal Distribution). The joint density function for the $n$-dimensional random variable $X$ with mean $\mu$ and covariance $\Sigma$ is
\[
f_X(x) = \frac{1}{(2\pi)^{n/2} \sqrt{\det \Sigma}} \exp\left[-\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu)\right], \tag{A.56}
\]
where $\Sigma > 0$. We write $X \sim N(\mu, \Sigma)$. If $X \sim N(0, I)$, we say that $X$ is standardized normally distributed.
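
A short sketch of Definition A.14: sampling from $N(\mu, \Sigma)$ with a full covariance matrix and recovering $\mu$ and $\Sigma$ empirically (the particular $\mu$ and $\Sigma$ are arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative mean vector and full (non-diagonal) covariance matrix;
# Sigma is symmetric and positive definite.
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 0.5]])

X = rng.multivariate_normal(mu, Sigma, size=200_000)

print(X.mean(axis=0))            # ~ mu
print(np.cov(X, rowvar=False))   # ~ Sigma
```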

19 A.4 Stochastic Processes 349 X(t,.) X(., w 1 ) X(., w 2 ) X(., w 3 ) Time Fig. A.7 Realizations of a stochastic process. Each realization is a time series this process permits to perfectly predict its future states. In contrast, for a stochastic one, even if the initial state of the system and its governing equations are known, several evolution paths are possible. Throughout the book, it will be assumed that the process of renewable energy generation, being from the wind, sun, or waves, is stochastic. Hence, it may not be possible to perfectly determine future values of that process; forecasts will always have a share of uncertainty. In the following, we will thus focus on stochastic processes. A.4.1 Time Series and Stochastic Processes A time series is a realization of a stochastic process. That is, the historical record (given as a time series) {x t, t = 0, ±1,...} is assumed to be one realization out of an infinite number of possible realizations of the stochastic process {X t, t = 0, ±1,...}. A stochastic process is defined as a family of random variables, {X(t)} or {X t }, where t belongs to a totally ordered index set T. In the first case, we are considering a process in continuous time, while in the latter, a process in discrete time. In this section, only processes in continuous time will be described for ease of notation, although the results and definitions presented next can be easily extended to the discrete case as well. In essence, a stochastic process is a function having two arguments, {X(t, ω), t T, ω Ω}, where Ω is the sample space or the ensemble of the process. The stochastic process takes values on the state space S. This implies that Ω is the set of all the possible time series that can be generated by the process. For a fixed t, we say that X(t, ) isarandom variable, and for fixed ω Ω, we call it a realization of the process, i.e., a time series X(, ω); see Fig. A.7.

Table A.1 Classification of some stochastic processes

  Time (index set T)   State space (S): Discrete   State space (S): Continuous
  Discrete             Markov chain processes      Time series processes
  Continuous           Point processes             Stochastic differential equations

Both the index set (usually interpreted as time) $T$ and the state space $S$ can be either discrete or continuous, leading to the four different categories of stochastic processes included in Table A.1.

Due to the fact that a stochastic process is a family of random variables, its distributional specifications are almost equivalent to what we have seen for multidimensional random variables. The function $f_{X(t_1), \ldots, X(t_n)}(x_1, \ldots, x_n)$ is called the $n$-dimensional probability distribution function for the process $\{X(t)\}$. The family of all these probability distribution functions, for all values of $n$, i.e., for $n = 1, 2, \ldots$, and for all values of $t$, forms the family of finite-dimensional probability distribution functions, which fully characterizes the stochastic process $\{X(t)\}$.

In this appendix, we will limit ourselves to temporal processes. However, in some applications, such as those involving variations in time and space of renewable power production, spatio-temporal processes are needed. These spatio-temporal stochastic processes can equivalently be seen, though, as a family of random variables indexed by time $t$ and space $z$.

Rather than using the family of probability distribution functions, stochastic processes are typically described by their mean value and the so-called covariance functions, which are introduced in Sect. A.4.2. Subsequently, Sect. A.4.3 describes some characteristic types of processes, including the important family of stationary stochastic processes, which will be analyzed in more detail in Sect. A.5.

A.4.2 Mean Value and Covariance Functions

The concept of moment, first introduced in Sect. A.1.2 for a single random variable and subsequently extended in Sect. A.3.5 to random vectors, is now applied to the case of stochastic processes. We begin with the simplest moment, i.e., the mean value, which, for a stochastic process $\{X(t)\}$, is given by
\[
\mu(t) = E[X(t)] = \int_{-\infty}^{\infty} x f_{X(t)}(x)\,dx. \tag{A.57}
\]
Notice that, for a stochastic process, the mean value, also known as the first moment, is actually a function of $t$; accordingly, $\mu(t)$ in (A.57) is often referred to as the mean value function.

Similarly, we can define the variance (or the variance function) of the process, i.e.,
\[
\sigma^2(t) = \mathrm{Var}[X(t)] = E\left[(X(t) - \mu(t))^2\right]. \tag{A.58}
\]
When it comes to stochastic processes, the concept of autocovariance function deserves special mention.

Definition A.15 (Autocovariance Function). The autocovariance function $\gamma_{XX}$ of a stochastic process $\{X(t)\}$ is defined as
\[
\gamma_{XX}(t_1, t_2) = \gamma(t_1, t_2) = \mathrm{Cov}[X(t_1), X(t_2)] = E[(X(t_1) - \mu(t_1))(X(t_2) - \mu(t_2))]. \tag{A.59}
\]
Note that the variance of the process can be obtained as a special case of (A.59). Indeed, it holds that $\sigma^2(t) = \gamma(t, t)$. The mean value and autocovariance functions constitute the second-order moment representation of the process. Lastly, the autocorrelation function (ACF) is obtained by normalizing the autocovariance function with the variance function, i.e.,
\[
\rho_{XX}(t_1, t_2) = \rho(t_1, t_2) = \frac{\gamma_{XX}(t_1, t_2)}{\sqrt{\sigma^2(t_1)\,\sigma^2(t_2)}}. \tag{A.60}
\]
In a similar manner, we can define moments of higher order.

A.4.3 Characteristics of Stochastic Processes

In this section, we introduce a number of useful characteristics of stochastic processes. This will pave the way for a more detailed study of stationary processes later on.

Stationary Processes

Definition A.16 (Strong Stationarity). A process $\{X(t)\}$ is said to be strongly stationary if all finite-dimensional distributions are invariant in time, i.e., for every $n$, any set $(t_1, t_2, \ldots, t_n)$, and for any $h$, it holds that
\[
f_{X(t_1), \ldots, X(t_n)}(x_1, \ldots, x_n) = f_{X(t_1+h), \ldots, X(t_n+h)}(x_1, \ldots, x_n). \tag{A.61}
\]

Definition A.17 (Weak Stationarity). A process $\{X(t)\}$ is said to be weakly stationary of order $k$ if all its first $k$ moments are invariant in time. A weakly stationary process of order 2 is simply called weakly stationary.

Theorem A.1. A weakly stationary process is characterized by the fact that both the mean value and the variance are constant, while the autocovariance function depends only on the time difference $(t_1 - t_2)$, which is called the time lag.

Proof. It follows directly from Definition A.17.

In the following, we will use the term stationary to denote a weakly stationary process.

Normal Processes

Definition A.18 (Normal Process). A process $\{X(t)\}$ is said to be a normal process (or, alternatively, a Gaussian process) if all the $n$-dimensional distribution functions $f_{X(t_1), \ldots, X(t_n)}(x_1, \ldots, x_n)$, for any $n$, are (multidimensional) normal distributions.

A normal process is completely specified by its mean value function
\[
\mu(t_i) = E[X(t_i)] \tag{A.62}
\]
and autocovariance function
\[
\gamma(t_i, t_j) = \mathrm{Cov}[X(t_i), X(t_j)]. \tag{A.63}
\]
By introducing the vector $\mu = (\mu(t_1), \mu(t_2), \ldots, \mu(t_n))^T$ and the variance matrix $\Sigma = \{\gamma(t_i, t_j)\}$, the joint distribution for $X = (X(t_1), X(t_2), \ldots, X(t_n))$ is given by
\[
f_X(x) = \frac{1}{(2\pi)^{n/2} \sqrt{\det \Sigma}} \exp\left(-\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu)\right), \tag{A.64}
\]
where $\Sigma$ is assumed to be regular. Notice the similarity with the multivariate normal density in (A.56). For normal processes, weak stationarity and strong stationarity are equivalent, since a normal process is completely characterized by its first two moments.

Markov Processes

Definition A.19 (Markov Process). A process $\{X(t)\}$ is called a Markov process if, for $t_1 < t_2 < \cdots < t_n$, the distribution of $X(t_n)$ given $(X(t_1), \ldots, X(t_{n-1}))$ is the same as the distribution of $X(t_n)$ given $X(t_{n-1})$. This implies that
\[
P\{X(t_n) \le x \mid X(t_{n-1}), \ldots, X(t_1)\} = P\{X(t_n) \le x \mid X(t_{n-1})\}. \tag{A.65}
\]
A Markov process is thus characterized by the fact that all the information about $X(t_n)$ contained in past observations of $\{X(t)\}$ is contained in the immediately preceding observation $X(t_{n-1})$. Because only the most recent observation is needed, the process is also called a first order Markov process.

The White Noise Process

Definition A.20 (White Noise Process). A sequence $\{\varepsilon_t\}$ of uncorrelated, identically distributed variables with $E[\varepsilon_t] = 0$ and $\mathrm{Var}[\varepsilon_t] = \sigma^2$ is called a white noise process.

Example A.6 (AR(1) Process, Part I). Let $\{\varepsilon_t\}$ be a normally distributed white noise with $E[\varepsilon_t] = 0$ and $\mathrm{Var}[\varepsilon_t] = \sigma^2$, and let $\{\varepsilon_t\}$ be the input to a dynamical system defined by the difference equation
\[
Y_t = \phi Y_{t-1} + \varepsilon_t, \tag{A.66}
\]
which defines a stochastic process $\{Y_t\}$. This process is called an autoregressive process of first order, in short, AR(1). The name indicates that new values of $Y$ are obtained by regression on old values of $Y$, and the order is given by the order of the difference equation. By successively substituting $Y_{t-1} = \phi Y_{t-2} + \varepsilon_{t-1}$, $Y_{t-2} = \phi Y_{t-3} + \varepsilon_{t-2}, \ldots$ on the right-hand side of (A.66), it is seen that $Y_t$ can be written as
\[
Y_t = \varepsilon_t + \phi \varepsilon_{t-1} + \phi^2 \varepsilon_{t-2} + \cdots + \phi^i \varepsilon_{t-i} + \cdots \tag{A.67}
\]
Moreover, by using the fact that the expectation operator is linear, one finds that
\[
\mu_Y = E[Y_t] = 0, \tag{A.68}
\]
and
\[
\sigma_Y^2 = \mathrm{Var}[Y_t] = (1 + \phi^2 + \phi^4 + \cdots + \phi^{2i} + \cdots)\,\sigma^2 = \frac{\sigma^2}{1 - \phi^2}, \tag{A.69}
\]
provided that $|\phi| < 1$. For $|\phi| \ge 1$, the variance is unbounded. Finally, we have the covariance function, for $t_1 > t_2$,
\[
\gamma(t_1, t_2) = \mathrm{Cov}[Y_{t_1}, Y_{t_2}] = \mathrm{Cov}[\varepsilon_{t_1} + \phi \varepsilon_{t_1-1} + \cdots + \phi^{t_1 - t_2} \varepsilon_{t_2} + \cdots,\ \varepsilon_{t_2} + \phi \varepsilon_{t_2-1} + \cdots] = \phi^{t_1 - t_2} (1 + \phi^2 + \phi^4 + \cdots)\,\sigma^2 = \phi^{t_1 - t_2} \sigma_Y^2.
\]
Notice that $\mathrm{Cov}[\varepsilon_{t_i},\ \varepsilon_{t_2} + \phi \varepsilon_{t_2-1} + \cdots] = 0$ for all $t_i > t_2$. Similarly, for $t_1 < t_2$, we obtain $\gamma(t_1, t_2) = \phi^{t_2 - t_1} \sigma_Y^2$. Observe that $\gamma(t_1, t_2)$ depends only on the time lag $t_1 - t_2$, i.e.,
\[
\gamma(t_1, t_2) = \gamma(t_1 - t_2) = \phi^{|t_1 - t_2|} \sigma_Y^2. \tag{A.70}
\]
Since the mean value and the variance are constant for $|\phi| < 1$, and the autocovariance function depends only on the time difference, $\{Y_t\}$ is a weakly stationary process for $|\phi| < 1$. In addition, it is well known that a sum of normally distributed random variables is also normally distributed. Consequently, since $\varepsilon_t$ is normally distributed, $\{Y_t\}$ is a normal process. The process (A.66) is thus strongly stationary for $|\phi| < 1$. Last, from (A.66), it can be seen that
\[
P\{Y_t \le y \mid Y_{t-1}, Y_{t-2}, \ldots\} = P\{Y_t \le y \mid Y_{t-1}\}, \tag{A.71}
\]
and therefore $\{Y_t\}$ is a (first order) Markov process.
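
Example A.6 can be checked by simulation; the sketch below verifies the stationary variance (A.69) for the illustrative choice $\phi = 0.8$, $\sigma = 1$.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulate the AR(1) process Y_t = phi * Y_{t-1} + eps_t of (A.66) and
# compare the empirical variance with sigma^2 / (1 - phi^2) from (A.69).
phi, sigma = 0.8, 1.0
n = 200_000
eps = rng.normal(0.0, sigma, size=n)

Y = np.zeros(n)
for t in range(1, n):
    Y[t] = phi * Y[t - 1] + eps[t]   # (A.66)

print(Y.var())                       # empirical variance
print(sigma**2 / (1 - phi**2))       # theoretical: ~2.78
```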

Deterministic Processes

Definition A.21 (Deterministic Process). A process is said to be deterministic if it can be predicted without uncertainty from past observations.

Example A.7. The process $\{Y_t\}$ defined by
\[
Y_t = \phi_1 Y_{t-1} - \phi_2 Y_{t-2} \quad (Y_0 = A_1;\ Y_{-1} = A_2), \tag{A.72}
\]
where $\phi_1$ and $\phi_2$ are constants, is a deterministic process. Here, $A_1$ and $A_2$ are random variables.

Purely Stochastic Processes

Definition A.22 (Purely Stochastic Process). A process is said to be (purely) stochastic if it can be written in the form
\[
X_t = \varepsilon_t + \psi_1 \varepsilon_{t-1} + \psi_2 \varepsilon_{t-2} + \cdots,
\]
where $\{\varepsilon_t\}$ is a sequence of uncorrelated stochastic variables with $E[\varepsilon_t] = 0$ and $\mathrm{Var}[\varepsilon_t] = \sigma^2$, and $\sum_{i=0}^{\infty} \psi_i^2 < \infty$.

A.5 Serial Dependence

We begin this section with the so-called autocovariance and autocorrelation functions, which are used to describe the serial dependence within a single time series, and conclude with the cross-correlation function, which serves to characterize the serial dependence between two time series.

A.5.1 Autocovariance and Autocorrelation Functions

For a stochastic process $\{X(t)\}$, the autocovariance function is given by
\[
\gamma_{XX}(t_1, t_2) = \gamma(t_1, t_2) = \mathrm{Cov}[X(t_1), X(t_2)], \tag{A.73}
\]
where the variance of the process is $\sigma^2(t) = \gamma_{XX}(t, t)$. Similarly, the autocorrelation function (ACF) is defined as
\[
\rho_{XX}(t_1, t_2) = \rho(t_1, t_2) = \frac{\gamma_{XX}(t_1, t_2)}{\sqrt{\sigma^2(t_1)\,\sigma^2(t_2)}}. \tag{A.74}
\]

For Stationary Processes

If the process is stationary, then (A.74) is only a function of the time lag $t_2 - t_1$. Denoting the time difference by $\tau$, the autocovariance and autocorrelation functions for a stationary process $\{X(t)\}$ boil down to
\[
\gamma_{XX}(\tau) = \mathrm{Cov}[X(t), X(t + \tau)] \tag{A.75}
\]
and
\[
\rho_{XX}(\tau) = \frac{\gamma_{XX}(\tau)}{\gamma_{XX}(0)} = \frac{\gamma_{XX}(\tau)}{\sigma_X^2}, \tag{A.76}
\]
respectively, where $\sigma_X^2$ is the variance of the process. Note that $\rho_{XX}(0) = 1$. The following example illustrates how to compute the autocorrelation function for an AR(1) process.

Example A.8 (AR(1) Process, Part II). Consider the same stochastic process as in (A.66). This process is stationary for $|\phi| < 1$. For $k > 0$, it holds that
\[
\gamma(k) = \gamma(-k) = \mathrm{Cov}[Y_t, Y_{t-k}] = \mathrm{Cov}[\phi Y_{t-1} + \varepsilon_t, Y_{t-k}] = \phi\,\mathrm{Cov}[Y_{t-1}, Y_{t-k}] = \phi\,\gamma(k - 1) = \phi^2 \gamma(k - 2) = \cdots
\]
Therefore, we obtain that $\gamma(k) = \phi^k \gamma(0)$. Since $\gamma(k)$ is an even function, it follows that $\gamma(k) = \phi^{|k|} \gamma(0)$, and thus the autocorrelation function becomes
\[
\rho(k) = \phi^{|k|}, \quad |\phi| < 1. \tag{A.77}
\]
The argument $k$ is often referred to as the lag, i.e., the time distance. The coefficient $\phi$ determines the memory of the process. For $\phi$ close to 1, there is a long memory, while the memory is short for small values of $\phi$. Finally, $\rho(k)$ will oscillate for $\phi < 0$.
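
Likewise, (A.77) can be verified empirically; the hand-rolled estimator below implements (A.76) for an illustrative simulated AR(1) path.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulate an AR(1) path with phi = 0.8 (illustrative choice).
phi, n = 0.8, 200_000
eps = rng.normal(size=n)
Y = np.zeros(n)
for t in range(1, n):
    Y[t] = phi * Y[t - 1] + eps[t]

def acf(x, k):
    """Empirical autocorrelation at lag k, as in (A.76)."""
    x = x - x.mean()
    return np.dot(x[:-k], x[k:]) / np.dot(x, x)

for k in (1, 2, 5, 10):
    print(k, acf(Y, k), phi**k)   # empirical vs. theoretical rho(k) = phi^k
```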

[Fig. A.8 Autocorrelation function for wind speed in Denmark (East). Note that the x-axis represents hours.]

Example A.9 (Autocorrelation Functions for Wind Speed in Denmark and Spain). Figures A.8 and A.9 show the autocorrelation function for wind speed in Denmark East and in the central part of Spain, respectively. In both cases, the data correspond to wind speed observations collected every 6 h during the period from 1970 to 1999. The autocorrelation functions turn out to be rather different. For Spain, a clear diurnal pattern is observed, whereas this is not the case for Denmark. On the other hand, we see that the hourly persistence is higher for the wind speed in Denmark than for the wind speed in Spain. A similar analysis focused only on the northern region of Spain reveals a much lower diurnal variation of the wind speed in this part of the country, but a higher persistence. Indeed, for the wind speed in the northern part of Spain, the 6-h correlation is almost 0.8, whereas this value decreases to around 0.5 for the middle part of the country.

[Fig. A.9 Autocorrelation function for wind speed in the central part of Spain. Note that the x-axis represents hours.]

A.5.2 Cross-Covariance and Cross-Correlation Functions

Let us now consider the stochastic processes $\{X(t)\}$ and $\{Y(t)\}$. The dependence structure between these two stochastic processes is described by the cross-covariance function, which is determined as
\[
\gamma_{XY}(t_1, t_2) = \mathrm{Cov}[X(t_1), Y(t_2)] = E[(X(t_1) - \mu_X(t_1))(Y(t_2) - \mu_Y(t_2))], \tag{A.78}
\]
where $\mu_X(t) = E[X(t)]$ and $\mu_Y(t) = E[Y(t)]$. Similarly to (A.74), we define the cross-correlation function as follows:
\[
\rho_{XY}(t_1, t_2) = \frac{\gamma_{XY}(t_1, t_2)}{\sqrt{\sigma_X^2(t_1)\,\sigma_Y^2(t_2)}}, \tag{A.79}
\]
where $\sigma_X^2(t) = \mathrm{Var}[X(t)]$ and $\sigma_Y^2(t) = \mathrm{Var}[Y(t)]$.

For Stationary Processes

If the stochastic processes $\{X(t)\}$ and $\{Y(t)\}$ are stationary¹, the cross-covariance and cross-correlation functions, (A.78) and (A.79), become
\[
\gamma_{XY}(\tau) = \mathrm{Cov}[X(t), Y(t + \tau)] \tag{A.80}
\]
and
\[
\rho_{XY}(\tau) = \frac{\gamma_{XY}(\tau)}{\sqrt{\gamma_{XX}(0)\,\gamma_{YY}(0)}} = \frac{\gamma_{XY}(\tau)}{\sigma_X \sigma_Y}, \tag{A.81}
\]
respectively.

¹ Actually, the bivariate process $(X(t), Y(t))^T$ must be stationary.
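
An empirical counterpart of (A.80)-(A.81) can be computed as below; the two synthetic series, with $y$ depending on $x$ three steps earlier, are illustrative stand-ins for the wind speed records of Example A.10.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic series where y_t depends on x_{t-3}.
n, delay = 100_000, 3
x_full = rng.normal(size=n + delay)
y = 0.7 * x_full[:-delay] + 0.3 * rng.normal(size=n)
x = x_full[delay:]                  # align so that y_t = 0.7 * x_{t-3} + noise

def ccf(x, y, tau):
    """Empirical rho_XY(tau) = Cov[X_t, Y_{t+tau}] / (sigma_X sigma_Y)."""
    x, y = x - x.mean(), y - y.mean()
    if tau >= 0:
        c = np.dot(x[: len(x) - tau], y[tau:]) / len(x)
    else:
        c = np.dot(x[-tau:], y[: len(y) + tau]) / len(x)
    return c / (x.std() * y.std())

for tau in (-3, 0, 3):
    print(tau, ccf(x, y, tau))      # peak at tau = +3: Y follows X by 3 steps
```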

[Fig. A.10 Autocorrelation functions for UK and eastern Denmark (on the diagonal) and the cross-correlation function between UK and eastern Denmark (off the diagonal); lags in hours.]

Example A.10 (Auto-Correlation and Cross-Correlation Functions for Wind Speed in UK and Eastern Denmark). Figure A.10 shows the auto-correlation and cross-correlation functions for wind speed in UK and Denmark (East) for the period from 1970 to 1999. It is seen that the autocorrelation functions for UK and Denmark are very similar and rather different from the autocorrelation function shown previously in Fig. A.9 for the wind speed in Spain. The figure also shows the cross-correlation function between the wind speed in UK and Denmark (East). Since the maximum of the cross-correlation function is apparently around the lags corresponding to 12 and 18 h, it is concluded that the phase shift between UK and Denmark for the wind speed is on average between 12 and 18 h. The cross correlation is rather high, indicating that it would probably be beneficial to exploit it in any tool for wind power forecasting in Denmark. Similar analyses using cross-correlation functions have shown that the cross correlation between Germany and Greece is negative, which indicates that if the wind speed is high in Germany, then the wind speed is low in Greece.

References

1. Bury, K.: Statistical Distributions in Engineering. Cambridge University Press, New York (1999)
2. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, New York (2001)

29 References Madsen, H.: Time Series Analysis. Chapman & Hall/CRC, Boca Raton (2008) 4. Madsen, H., Thyregod, P.: Introduction to General and Generalized Linear Models. Chapman & Hall/CRC, Boca Raton (2011) 5. Nielsen, H.A., Madsen, H., Nielsen, T.S.: Using quantile regression to extend an existing wind power forecasting system with probabilistic forecasts. Wind Energy 9(1-2), (2006) 6. Pinson, P., Nielsen, H.A., Møller, J.K., Madsen, H., Kariniotakis, G.N.: Non-parametric probabilistic forecasts of wind power: Required properties and evaluation. Wind Energy 10(6), (2007) 7. Rao, C.R.: Linear Statistical Inference and its Applications. Wiley, New York (1973) 8. Zaharim, A., Najid, S., Razali, A., Sopian, K.: The suitability of statistical distribution in fitting wind speed data. WSEAS Trans. Math. 7(12), (2008)

Appendix B Basics of Optimization

In this appendix, we review some basics of optimization. In Sect. B.1, we introduce the mathematical formulation of both general and linear optimization problems. Duality theory in linear programming is briefly presented in Sect. B.2. The Karush-Kuhn-Tucker (KKT) optimality conditions are presented in Sect. B.3. Finally, Mathematical Programs with Equilibrium Constraints (MPECs) are introduced in Sect. B.4.

B.1 Formulation of an Optimization Problem

The general mathematical formulation of an optimization problem is:
\[
\begin{aligned}
\min_x \quad & f(x) & \text{(B.1a)} \\
\text{s.t.} \quad & h(x) = 0, & \text{(B.1b)} \\
& g(x) \le 0. & \text{(B.1c)}
\end{aligned}
\]
Problem (B.1) includes the following elements:

- $x \in \mathbb{R}^n$ is a vector including the $n$ decision variables.
- $f(\cdot): \mathbb{R}^n \to \mathbb{R}$ is the objective function of the optimization problem. It maps values of the decision vector $x$ to a real value representing the desirability of this solution to the decision-maker. Typically, the objective function represents a cost in minimization problems or a benefit in maximization ones.
- $h(\cdot): \mathbb{R}^n \to \mathbb{R}^m$ and $g(\cdot): \mathbb{R}^n \to \mathbb{R}^l$ are vector-valued functions of the decision vector $x$. They define $m$ equality and $l$ inequality constraints through (B.1b) and (B.1c), respectively. Note that we assume that the zero-valued vectors on the right-hand side of (B.1b) and (B.1c) are properly sized to match the dimension of the vectors on the left-hand side.

The joint enforcement of equalities (B.1b) and inequalities (B.1c) defines the feasibility region of the optimization problem. A decision $x$ is called feasible if it satisfies (B.1b)-(B.1c).
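
As an illustration of how a problem of the form (B.1) is passed to an off-the-shelf solver, the sketch below uses scipy; the quadratic objective and the two constraints are arbitrary examples, not taken from the book. Note that scipy's inequality convention is $g(x) \ge 0$, opposite in sign to (B.1c).

```python
import numpy as np
from scipy.optimize import minimize

# Minimize f(x) subject to h(x) = 0 and g(x) <= 0, as in (B.1);
# the specific functions below are illustrative only.
f = lambda x: (x[0] - 1) ** 2 + (x[1] + 0.5) ** 2

constraints = [
    {"type": "eq",   "fun": lambda x: x[0] + x[1] - 1.0},   # h(x) = 0
    # scipy expects g(x) >= 0, so the book's g(x) <= 0 is passed negated:
    {"type": "ineq", "fun": lambda x: -(x[0] - 2.0)},       # g(x) = x0 - 2 <= 0
]

res = minimize(f, x0=np.zeros(2), constraints=constraints)
print(res.x, res.fun)   # feasible minimizer and objective value
```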


More information

ENSC327 Communications Systems 19: Random Processes. Jie Liang School of Engineering Science Simon Fraser University

ENSC327 Communications Systems 19: Random Processes. Jie Liang School of Engineering Science Simon Fraser University ENSC327 Communications Systems 19: Random Processes Jie Liang School of Engineering Science Simon Fraser University 1 Outline Random processes Stationary random processes Autocorrelation of random processes

More information

Chapter 4. Chapter 4 sections

Chapter 4. Chapter 4 sections Chapter 4 sections 4.1 Expectation 4.2 Properties of Expectations 4.3 Variance 4.4 Moments 4.5 The Mean and the Median 4.6 Covariance and Correlation 4.7 Conditional Expectation SKIP: 4.8 Utility Expectation

More information

Chapter 5 continued. Chapter 5 sections

Chapter 5 continued. Chapter 5 sections Chapter 5 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

Chapter 2. Some Basic Probability Concepts. 2.1 Experiments, Outcomes and Random Variables

Chapter 2. Some Basic Probability Concepts. 2.1 Experiments, Outcomes and Random Variables Chapter 2 Some Basic Probability Concepts 2.1 Experiments, Outcomes and Random Variables A random variable is a variable whose value is unknown until it is observed. The value of a random variable results

More information

Stochastic process for macro

Stochastic process for macro Stochastic process for macro Tianxiao Zheng SAIF 1. Stochastic process The state of a system {X t } evolves probabilistically in time. The joint probability distribution is given by Pr(X t1, t 1 ; X t2,

More information

Econometría 2: Análisis de series de Tiempo

Econometría 2: Análisis de series de Tiempo Econometría 2: Análisis de series de Tiempo Karoll GOMEZ kgomezp@unal.edu.co http://karollgomez.wordpress.com Segundo semestre 2016 II. Basic definitions A time series is a set of observations X t, each

More information

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows. Chapter 5 Two Random Variables In a practical engineering problem, there is almost always causal relationship between different events. Some relationships are determined by physical laws, e.g., voltage

More information

STAT2201. Analysis of Engineering & Scientific Data. Unit 3

STAT2201. Analysis of Engineering & Scientific Data. Unit 3 STAT2201 Analysis of Engineering & Scientific Data Unit 3 Slava Vaisman The University of Queensland School of Mathematics and Physics What we learned in Unit 2 (1) We defined a sample space of a random

More information

Multivariate Distribution Models

Multivariate Distribution Models Multivariate Distribution Models Model Description While the probability distribution for an individual random variable is called marginal, the probability distribution for multiple random variables is

More information

Stochastic Processes. M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno

Stochastic Processes. M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno Stochastic Processes M. Sami Fadali Professor of Electrical Engineering University of Nevada, Reno 1 Outline Stochastic (random) processes. Autocorrelation. Crosscorrelation. Spectral density function.

More information

ECE 636: Systems identification

ECE 636: Systems identification ECE 636: Systems identification Lectures 3 4 Random variables/signals (continued) Random/stochastic vectors Random signals and linear systems Random signals in the frequency domain υ ε x S z + y Experimental

More information

x. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ).

x. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ). .8.6 µ =, σ = 1 µ = 1, σ = 1 / µ =, σ =.. 3 1 1 3 x Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ ). The Gaussian distribution Probably the most-important distribution in all of statistics

More information

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 8. For any two events E and F, P (E) = P (E F ) + P (E F c ). Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 Sample space. A sample space consists of a underlying

More information

Covariance. Lecture 20: Covariance / Correlation & General Bivariate Normal. Covariance, cont. Properties of Covariance

Covariance. Lecture 20: Covariance / Correlation & General Bivariate Normal. Covariance, cont. Properties of Covariance Covariance Lecture 0: Covariance / Correlation & General Bivariate Normal Sta30 / Mth 30 We have previously discussed Covariance in relation to the variance of the sum of two random variables Review Lecture

More information

01 Probability Theory and Statistics Review

01 Probability Theory and Statistics Review NAVARCH/EECS 568, ROB 530 - Winter 2018 01 Probability Theory and Statistics Review Maani Ghaffari January 08, 2018 Last Time: Bayes Filters Given: Stream of observations z 1:t and action data u 1:t Sensor/measurement

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

Multiple Random Variables

Multiple Random Variables Multiple Random Variables This Version: July 30, 2015 Multiple Random Variables 2 Now we consider models with more than one r.v. These are called multivariate models For instance: height and weight An

More information

Fundamentals of Digital Commun. Ch. 4: Random Variables and Random Processes

Fundamentals of Digital Commun. Ch. 4: Random Variables and Random Processes Fundamentals of Digital Commun. Ch. 4: Random Variables and Random Processes Klaus Witrisal witrisal@tugraz.at Signal Processing and Speech Communication Laboratory www.spsc.tugraz.at Graz University of

More information

2 Functions of random variables

2 Functions of random variables 2 Functions of random variables A basic statistical model for sample data is a collection of random variables X 1,..., X n. The data are summarised in terms of certain sample statistics, calculated as

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

Introduction to Probability and Stocastic Processes - Part I

Introduction to Probability and Stocastic Processes - Part I Introduction to Probability and Stocastic Processes - Part I Lecture 2 Henrik Vie Christensen vie@control.auc.dk Department of Control Engineering Institute of Electronic Systems Aalborg University Denmark

More information

conditional cdf, conditional pdf, total probability theorem?

conditional cdf, conditional pdf, total probability theorem? 6 Multiple Random Variables 6.0 INTRODUCTION scalar vs. random variable cdf, pdf transformation of a random variable conditional cdf, conditional pdf, total probability theorem expectation of a random

More information

EEL 5544 Noise in Linear Systems Lecture 30. X (s) = E [ e sx] f X (x)e sx dx. Moments can be found from the Laplace transform as

EEL 5544 Noise in Linear Systems Lecture 30. X (s) = E [ e sx] f X (x)e sx dx. Moments can be found from the Laplace transform as L30-1 EEL 5544 Noise in Linear Systems Lecture 30 OTHER TRANSFORMS For a continuous, nonnegative RV X, the Laplace transform of X is X (s) = E [ e sx] = 0 f X (x)e sx dx. For a nonnegative RV, the Laplace

More information

3 Continuous Random Variables

3 Continuous Random Variables Jinguo Lian Math437 Notes January 15, 016 3 Continuous Random Variables Remember that discrete random variables can take only a countable number of possible values. On the other hand, a continuous random

More information

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities PCMI 207 - Introduction to Random Matrix Theory Handout #2 06.27.207 REVIEW OF PROBABILITY THEORY Chapter - Events and Their Probabilities.. Events as Sets Definition (σ-field). A collection F of subsets

More information

Notes on Random Vectors and Multivariate Normal

Notes on Random Vectors and Multivariate Normal MATH 590 Spring 06 Notes on Random Vectors and Multivariate Normal Properties of Random Vectors If X,, X n are random variables, then X = X,, X n ) is a random vector, with the cumulative distribution

More information

Class 1: Stationary Time Series Analysis

Class 1: Stationary Time Series Analysis Class 1: Stationary Time Series Analysis Macroeconometrics - Fall 2009 Jacek Suda, BdF and PSE February 28, 2011 Outline Outline: 1 Covariance-Stationary Processes 2 Wold Decomposition Theorem 3 ARMA Models

More information

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators Estimation theory Parametric estimation Properties of estimators Minimum variance estimator Cramer-Rao bound Maximum likelihood estimators Confidence intervals Bayesian estimation 1 Random Variables Let

More information

Probability Theory and Statistics. Peter Jochumzen

Probability Theory and Statistics. Peter Jochumzen Probability Theory and Statistics Peter Jochumzen April 18, 2016 Contents 1 Probability Theory And Statistics 3 1.1 Experiment, Outcome and Event................................ 3 1.2 Probability............................................

More information

Summarizing Measured Data

Summarizing Measured Data Performance Evaluation: Summarizing Measured Data Hongwei Zhang http://www.cs.wayne.edu/~hzhang The object of statistics is to discover methods of condensing information concerning large groups of allied

More information

Chapter 2. Continuous random variables

Chapter 2. Continuous random variables Chapter 2 Continuous random variables Outline Review of probability: events and probability Random variable Probability and Cumulative distribution function Review of discrete random variable Introduction

More information

Stochastic Processes

Stochastic Processes Stochastic Processes Stochastic Process Non Formal Definition: Non formal: A stochastic process (random process) is the opposite of a deterministic process such as one defined by a differential equation.

More information

Brief Review of Probability

Brief Review of Probability Maura Department of Economics and Finance Università Tor Vergata Outline 1 Distribution Functions Quantiles and Modes of a Distribution 2 Example 3 Example 4 Distributions Outline Distribution Functions

More information

LECTURES 2-3 : Stochastic Processes, Autocorrelation function. Stationarity.

LECTURES 2-3 : Stochastic Processes, Autocorrelation function. Stationarity. LECTURES 2-3 : Stochastic Processes, Autocorrelation function. Stationarity. Important points of Lecture 1: A time series {X t } is a series of observations taken sequentially over time: x t is an observation

More information

Lecture 2: Review of Probability

Lecture 2: Review of Probability Lecture 2: Review of Probability Zheng Tian Contents 1 Random Variables and Probability Distributions 2 1.1 Defining probabilities and random variables..................... 2 1.2 Probability distributions................................

More information

white noise Time moving average

white noise Time moving average 1.3 Time Series Statistical Models 13 white noise w 3 1 0 1 0 100 00 300 400 500 Time moving average v 1.5 0.5 0.5 1.5 0 100 00 300 400 500 Fig. 1.8. Gaussian white noise series (top) and three-point moving

More information

The Multivariate Normal Distribution. In this case according to our theorem

The Multivariate Normal Distribution. In this case according to our theorem The Multivariate Normal Distribution Defn: Z R 1 N(0, 1) iff f Z (z) = 1 2π e z2 /2. Defn: Z R p MV N p (0, I) if and only if Z = (Z 1,..., Z p ) T with the Z i independent and each Z i N(0, 1). In this

More information

Gaussian random variables inr n

Gaussian random variables inr n Gaussian vectors Lecture 5 Gaussian random variables inr n One-dimensional case One-dimensional Gaussian density with mean and standard deviation (called N, ): fx x exp. Proposition If X N,, then ax b

More information

Probability. Paul Schrimpf. January 23, UBC Economics 326. Probability. Paul Schrimpf. Definitions. Properties. Random variables.

Probability. Paul Schrimpf. January 23, UBC Economics 326. Probability. Paul Schrimpf. Definitions. Properties. Random variables. Probability UBC Economics 326 January 23, 2018 1 2 3 Wooldridge (2013) appendix B Stock and Watson (2009) chapter 2 Linton (2017) chapters 1-5 Abbring (2001) sections 2.1-2.3 Diez, Barr, and Cetinkaya-Rundel

More information

Northwestern University Department of Electrical Engineering and Computer Science

Northwestern University Department of Electrical Engineering and Computer Science Northwestern University Department of Electrical Engineering and Computer Science EECS 454: Modeling and Analysis of Communication Networks Spring 2008 Probability Review As discussed in Lecture 1, probability

More information

Chapter 3 - Temporal processes

Chapter 3 - Temporal processes STK4150 - Intro 1 Chapter 3 - Temporal processes Odd Kolbjørnsen and Geir Storvik January 23 2017 STK4150 - Intro 2 Temporal processes Data collected over time Past, present, future, change Temporal aspect

More information

MULTIVARIATE PROBABILITY DISTRIBUTIONS

MULTIVARIATE PROBABILITY DISTRIBUTIONS MULTIVARIATE PROBABILITY DISTRIBUTIONS. PRELIMINARIES.. Example. Consider an experiment that consists of tossing a die and a coin at the same time. We can consider a number of random variables defined

More information

Chapter 5. Chapter 5 sections

Chapter 5. Chapter 5 sections 1 / 43 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Data from one or a series of random experiments are collected. Planning experiments and collecting data (not discussed here). Analysis:

More information

A Probability Review

A Probability Review A Probability Review Outline: A probability review Shorthand notation: RV stands for random variable EE 527, Detection and Estimation Theory, # 0b 1 A Probability Review Reading: Go over handouts 2 5 in

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB

More information

Stat 248 Lab 2: Stationarity, More EDA, Basic TS Models

Stat 248 Lab 2: Stationarity, More EDA, Basic TS Models Stat 248 Lab 2: Stationarity, More EDA, Basic TS Models Tessa L. Childers-Day February 8, 2013 1 Introduction Today s section will deal with topics such as: the mean function, the auto- and cross-covariance

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Bivariate distributions

Bivariate distributions Bivariate distributions 3 th October 017 lecture based on Hogg Tanis Zimmerman: Probability and Statistical Inference (9th ed.) Bivariate Distributions of the Discrete Type The Correlation Coefficient

More information

P (x). all other X j =x j. If X is a continuous random vector (see p.172), then the marginal distributions of X i are: f(x)dx 1 dx n

P (x). all other X j =x j. If X is a continuous random vector (see p.172), then the marginal distributions of X i are: f(x)dx 1 dx n JOINT DENSITIES - RANDOM VECTORS - REVIEW Joint densities describe probability distributions of a random vector X: an n-dimensional vector of random variables, ie, X = (X 1,, X n ), where all X is are

More information

UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables

UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables To be provided to students with STAT2201 or CIVIL-2530 (Probability and Statistics) Exam Main exam date: Tuesday, 20 June 1

More information

ECE 541 Stochastic Signals and Systems Problem Set 9 Solutions

ECE 541 Stochastic Signals and Systems Problem Set 9 Solutions ECE 541 Stochastic Signals and Systems Problem Set 9 Solutions Problem Solutions : Yates and Goodman, 9.5.3 9.1.4 9.2.2 9.2.6 9.3.2 9.4.2 9.4.6 9.4.7 and Problem 9.1.4 Solution The joint PDF of X and Y

More information

Introduction to Machine Learning

Introduction to Machine Learning What does this mean? Outline Contents Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola December 26, 2017 1 Introduction to Probability 1 2 Random Variables 3 3 Bayes

More information

Introduction to Computational Finance and Financial Econometrics Matrix Algebra Review

Introduction to Computational Finance and Financial Econometrics Matrix Algebra Review You can t see this text! Introduction to Computational Finance and Financial Econometrics Matrix Algebra Review Eric Zivot Spring 2015 Eric Zivot (Copyright 2015) Matrix Algebra Review 1 / 54 Outline 1

More information

Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis

Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis Introduction to Time Series Analysis 1 Contents: I. Basics of Time Series Analysis... 4 I.1 Stationarity... 5 I.2 Autocorrelation Function... 9 I.3 Partial Autocorrelation Function (PACF)... 14 I.4 Transformation

More information

16.584: Random (Stochastic) Processes

16.584: Random (Stochastic) Processes 1 16.584: Random (Stochastic) Processes X(t): X : RV : Continuous function of the independent variable t (time, space etc.) Random process : Collection of X(t, ζ) : Indexed on another independent variable

More information

Algorithms for Uncertainty Quantification

Algorithms for Uncertainty Quantification Algorithms for Uncertainty Quantification Tobias Neckel, Ionuț-Gabriel Farcaș Lehrstuhl Informatik V Summer Semester 2017 Lecture 2: Repetition of probability theory and statistics Example: coin flip Example

More information

Elements of Probability Theory

Elements of Probability Theory Short Guides to Microeconometrics Fall 2016 Kurt Schmidheiny Unversität Basel Elements of Probability Theory Contents 1 Random Variables and Distributions 2 1.1 Univariate Random Variables and Distributions......

More information

General Random Variables

General Random Variables 1/65 Chia-Ping Chen Professor Department of Computer Science and Engineering National Sun Yat-sen University Probability A general random variable is discrete, continuous, or mixed. A discrete random variable

More information

E X A M. Probability Theory and Stochastic Processes Date: December 13, 2016 Duration: 4 hours. Number of pages incl.

E X A M. Probability Theory and Stochastic Processes Date: December 13, 2016 Duration: 4 hours. Number of pages incl. E X A M Course code: Course name: Number of pages incl. front page: 6 MA430-G Probability Theory and Stochastic Processes Date: December 13, 2016 Duration: 4 hours Resources allowed: Notes: Pocket calculator,

More information

Review (Probability & Linear Algebra)

Review (Probability & Linear Algebra) Review (Probability & Linear Algebra) CE-725 : Statistical Pattern Recognition Sharif University of Technology Spring 2013 M. Soleymani Outline Axioms of probability theory Conditional probability, Joint

More information

1 Probability and Random Variables

1 Probability and Random Variables 1 Probability and Random Variables The models that you have seen thus far are deterministic models. For any time t, there is a unique solution X(t). On the other hand, stochastic models will result in

More information

Probability and Distributions

Probability and Distributions Probability and Distributions What is a statistical model? A statistical model is a set of assumptions by which the hypothetical population distribution of data is inferred. It is typically postulated

More information

Lecture Note 1: Probability Theory and Statistics

Lecture Note 1: Probability Theory and Statistics Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 1: Probability Theory and Statistics Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 For this and all future notes, if you would

More information

Fundamentals of Applied Probability and Random Processes

Fundamentals of Applied Probability and Random Processes Fundamentals of Applied Probability and Random Processes,nd 2 na Edition Oliver C. Ibe University of Massachusetts, LoweLL, Massachusetts ip^ W >!^ AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS

More information

Discrete Probability Refresher

Discrete Probability Refresher ECE 1502 Information Theory Discrete Probability Refresher F. R. Kschischang Dept. of Electrical and Computer Engineering University of Toronto January 13, 1999 revised January 11, 2006 Probability theory

More information

Probability and statistics; Rehearsal for pattern recognition

Probability and statistics; Rehearsal for pattern recognition Probability and statistics; Rehearsal for pattern recognition Václav Hlaváč Czech Technical University in Prague Czech Institute of Informatics, Robotics and Cybernetics 166 36 Prague 6, Jugoslávských

More information

Basic Probability space, sample space concepts and order of a Stochastic Process

Basic Probability space, sample space concepts and order of a Stochastic Process The Lecture Contains: Basic Introduction Basic Probability space, sample space concepts and order of a Stochastic Process Examples Definition of Stochastic Process Marginal Distributions Moments Gaussian

More information

System Identification

System Identification System Identification Arun K. Tangirala Department of Chemical Engineering IIT Madras July 27, 2013 Module 3 Lecture 1 Arun K. Tangirala System Identification July 27, 2013 1 Objectives of this Module

More information