Brooklyn College, CUNY. Lecture Notes. Christian Beneš


Brooklyn College, CUNY
Math 4506: Time Series
Lecture Notes
Spring 2015
Christian Beneš

Math 4506 (Spring 2015), January 28, 2015. Prof. Christian Beneš

Lecture #1: Introduction; Probability Review

1.1 What this course is about: time series and time series models

"Essentially, all models are wrong, but some are useful." (George E. P. Box)

Probabilists study time series models. These are abstract random objects which are completely well-defined and can generate sets of data (using random number generators). Statisticians study time series (which are data sets) and try to find the right model for them, that is, the time series model from which the data could have been generated. In that sense, probabilists and statisticians do opposite jobs, the first being (arguably) more elegant, the second being (definitely) more practical.

Below are some examples of time series. The first three are real-world data. The following six are computer-generated. Our goal in this course will be to find ways to construct models from which these data could have arisen.

[Figure: Baltimore city annual water use, liters per capita per day.]

[Figure: an index series (caption truncated in the transcription).]

[Figure: Daily value of one $US in Euros, May 6 to May 6 (dates truncated in the transcription).]

[Figure: Closing value of NASDAQ 100 index, July to January 23 (dates truncated in the transcription).]

[Figure: Ten random data points.] What can we say about the underlying distribution?

[Figure: Another ten data points.] What about these 10 data points?

[Figure: A third set of ten data points.] Same question.

Scale is important when visualizing data. Here are the same data sets as on the previous page, shown all three at the same scale:

It turns out that these data are drawn from the (multivariate) normal distributions N(0, Σ_1), N(0, Σ_2), N(0, Σ_3), respectively, where Σ_1 is the 10 × 10 identity matrix, Σ_2 is the 10 × 10 matrix with 1 in every diagonal entry and 4/5 in every off-diagonal entry,

Σ_2 = \begin{pmatrix} 1 & 4/5 & \cdots & 4/5 \\ 4/5 & 1 & \cdots & 4/5 \\ \vdots & \vdots & \ddots & \vdots \\ 4/5 & 4/5 & \cdots & 1 \end{pmatrix},

and Σ_3 is the 10 × 10 matrix with 1 in every diagonal entry and 24/25 in every off-diagonal entry.

If you're not sure what this means, don't worry. Details are coming up. In a nutshell, the samples of the first data set are drawn from independent normal random variables, while those of the other two sets are drawn from a family of pairwise positively correlated random variables (with covariances 4/5 in the first case and 24/25 in the second).

The main purpose of time series modeling is to come up (as one would expect) with the stochastic process (time series model) from which the observed data (time series) is a realization. This is an impossible task, as suggested by the quote at the beginning of this lecture. Randomness in the real world is simply too complex to grasp completely. However, there are ways to determine, according to some (sometimes subjective) criteria, which models work better and which models don't work as well in a given setting.
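As a preview of the R session that appears in Lecture 2, here is a minimal sketch of how data sets like the second one above could be generated. The function mvrnorm comes from the MASS package; the variable names are mine and are not part of the original notes.

library(MASS)                                     # for mvrnorm
n <- 10
Sigma2 <- matrix(4/5, n, n); diag(Sigma2) <- 1    # 1 on the diagonal, 4/5 off the diagonal
x <- mvrnorm(1, mu = rep(0, n), Sigma = Sigma2)   # 10 pairwise positively correlated normals
plot(x)

Replacing 4/5 by 24/25 (or by 0) gives the other two data sets.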

Here's where finding a model for data is tricky: there are many choices of model which at first (and even second) glance seem reasonable for a given data set. I am sure none of you would have been shocked if I had told you that the second-to-last data set above was drawn from independent normal random variables with mean 0 and standard deviation 1/2. Nor would you have been very troubled if I'd suggested that they were generated using independent exponential random variables with mean 1. This illustrates the fact that in time series modeling, one often has a choice between a number of models (in the case I just mentioned, types of random variables) and, within these, a number of parameters (means, variances, covariances, etc.).

In this course, you will be exposed to a number of models which all depend on a number of parameters. There usually isn't a systematic way to choose a model (and the corresponding parameters), so modeling usually requires a fair dose of theoretical understanding (to determine if a model is even acceptable in a given setting) and flair (since all models are wrong, experience comes in handy when trying to find one that is better than others).

Since the title of this course is Time Series, it might be useful if we know what a time series is!

Definition 1.1. A time series is simply a set of observations {x_t}, with each data point being observed at a specific time t. A time series model is a set of random variables {X_t}, each of which corresponds to a specific time t.

Notation. The symbol A := B means "A is defined to equal B", whereas C = D by itself means simply that C and D are equal. This is an important distinction because if you write A := B, then there is no need to verify the equality of A and B. They are equal by definition. However, if C = D, then there IS something that needs to be proved, namely the equality of C and D (which might not be obvious). For example, you may recall that for a random variable X,

Var(X) := E[(X − E[X])^2]   and   Var(X) = E[X^2] − E[X]^2.

1.2 Introduction to Random Variables

"While writing my book [Stochastic Processes] I had an argument with Feller. He asserted that everyone said 'random variable' and I asserted that everyone said 'chance variable'. We obviously had to use the same name in our books, so we decided the issue by a stochastic procedure. That is, we tossed for it and he won." (Joe Doob)

In probability, Ω is used to denote the sample space of outcomes of an experiment.

Example 1.1. Toss a die once: Ω = {1, 2, 3, 4, 5, 6}.

Example 1.2. Toss two dice: Ω = {(i, j) : 1 ≤ i ≤ 6, 1 ≤ j ≤ 6}.

Note that in each case Ω is a finite set. (That is, the cardinality of Ω, written |Ω|, is finite.)

Example 1.3. Consider a needle attached to a spinning wheel centred at the origin. When the wheel is spun, the angle ω made by the tip of the needle with the positive x-axis is measured. The possible values of ω are Ω = [0, 2π). In this case, Ω is an uncountably infinite set. (That is, Ω is uncountable, with |Ω| = ∞.)

Definition 1.2. A random variable X is a function from the sample space Ω to the real numbers R = (−∞, ∞). Symbolically,

X : Ω → R,   ω ↦ X(ω).

Example 1.4 (1.1 continued). Let X denote the upmost face when a die is tossed. Then X(i) = i, i = 1, ..., 6.

Example 1.5 (1.2 continued). Let X denote the sum of the upmost faces when two dice are tossed. Then X((i, j)) = i + j, i = 1, ..., 6, j = 1, ..., 6. Note that the elements of Ω are ordered pairs, so that the function X(·) acts on (i, j), giving X((i, j)). We will often omit the inner parentheses and simply write X(i, j).

Example 1.6 (1.3 continued). Let X denote the cosine of the angle made by the needle on the spinning wheel and the positive x-axis. Then X(ω) = cos(ω), so that X(ω) ∈ [−1, 1].

Remark. As mentioned in the definition, a random variable is really a function whose input variable is random, that is, determined by chance (or God, or destiny, or karma, or whatever you think decides how our world works). The use of the notation X and X(ω) is EXACTLY the same as the use of f and f(x) in elementary calculus. For example, f(x) = x^2, f(t) = t^2, f(ω) = ω^2, and X(ω) = ω^2 all describe EXACTLY the same function, namely the function which takes a number and squares it. What makes random variables slightly more complicated than functions is that, unlike the variable x from calculus, the variable ω is random and therefore comes from a distribution.

1.3 Discrete and Continuous Random Variables

Definition 1.3. Suppose that X is a random variable. Suppose that there exists a function f : R → R with the properties that f(x) ≥ 0 for all x, ∫_{−∞}^{∞} f(x) dx = 1, and

P({ω ∈ Ω : X(ω) ≤ a}) =: P(X ≤ a) = ∫_{−∞}^{a} f(x) dx.

We call f the (probability) density (function) of X and say that X is a continuous random variable. Furthermore, the function F defined by F(a) := P(X ≤ a) is called the (probability) distribution (function) of X.

Note 1.1. By the Fundamental Theorem of Calculus, F'(x) = f(x).

Remark. There exist continuous random variables which do not have densities. Although it's good to know that the definition of continuous random variables is slightly more general than what is suggested above, you won't need to worry about it in this course.

Example 1.7. A random variable X is said to be normally distributed with parameters µ, σ^2, if the density of X is

f(x) = (1/(σ√(2π))) exp(−(x − µ)^2 / (2σ^2)),   −∞ < µ < ∞, 0 < σ < ∞.

This is sometimes written X ~ N(µ, σ^2). In Exercise 1.2, you will show that the mean of X is µ and the variance of X is σ^2.

Definition 1.4. Suppose that X is a random variable. Suppose that there exists a function p : Z → R with the properties that p(k) ≥ 0 for all k, Σ_{k=−∞}^{∞} p(k) = 1, and

P({ω ∈ Ω : X(ω) ≤ N}) =: P(X ≤ N) = Σ_{k=−∞}^{N} p(k).

We call p the (probability mass function or) density of X and say that X is a discrete random variable. Furthermore, the function F defined by F(N) := P(X ≤ N) is called the (probability) distribution (function) of X.

Example (continued). If X is defined to be the sum of the upmost faces when two dice are tossed, then the density of X, written p(k) := P(X = k), is given by

k      2     3     4     5     6     7     8     9     10    11    12
p(k)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

and p(k) = 0 for any other k ∈ Z.

Remark. There do exist random variables which are neither discrete nor continuous; however, such random variables will not concern us.

1.4 Expectation and Variance

Suppose that X : Ω → R is a random variable (either discrete or continuous), and that g : R → R is a (piecewise) continuous function. Then Y := g ∘ X : Ω → R defined by Y(ω) = g(X(ω)) is also a random variable. We usually write Y = g(X). We now define the expectation of the random variable Y, distinguishing the discrete and continuous cases.

Definition 1.5. If X is a discrete random variable and g is as above, then the expectation of g ∘ X is given by

E[g(X)] := Σ_k g(k) p(k),

where p is the probability mass function of X.

Definition 1.6. If X is a continuous random variable and g is as above, then the expectation of g ∘ X is given by

E[g(X)] := ∫_{−∞}^{∞} g(x) f(x) dx,

where f is the probability density function of X.

Notice that if g is the identity function (that is, g(x) = x for all x), we get the expectation of X itself:

E[X] := Σ_k k p(k) if X is discrete,   and   E[X] := ∫_{−∞}^{∞} x f(x) dx if X is continuous.

µ := E[X] is also called the mean of X. Note that −∞ ≤ µ ≤ ∞. If −∞ < µ < ∞, then we say that X has a finite mean, or that X is an integrable random variable, and we write X ∈ L^1.

Exercise 1.1. Suppose that X is a Cauchy random variable. That is, X is a continuous random variable with density function

f(x) = 1 / (π(1 + x^2)).

Carefully show that X ∉ L^1 (that is, X doesn't have a finite mean).

Theorem 1.1 (Linearity of Expectation). Suppose that X : Ω → R and Y : Ω → R are (discrete or continuous) random variables with X ∈ L^1 and Y ∈ L^1. Suppose also that f : R → R and g : R → R are both (piecewise) continuous and such that f(X) ∈ L^1 and g(Y) ∈ L^1. Then, for any a, b ∈ R, af(X) + bg(Y) ∈ L^1 and, furthermore,

E[af(X) + bg(Y)] = aE[f(X)] + bE[g(Y)].

Using Definitions 1.5 and 1.6, we can compute the kth moments E[X^k] of a random variable X. One frequent assumption about a random variable is that it has a finite second moment. This is to ensure that the Central Limit Theorem can be used.

Definition 1.7. If X is a random variable with E[X^2] < ∞, then we say that X has a finite second moment and write X ∈ L^2. If X ∈ L^2, then we define the variance of X to be the number σ^2 := E[(X − µ)^2]. The standard deviation of X is the number σ := √(σ^2). (As usual, this is the positive square root.)
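As a quick illustration of Definition 1.5, the pmf table from the previous page and the expectation of the sum of two dice can be checked with a few lines of R by enumerating the 36 equally likely outcomes. This is a sketch added for illustration; it is not part of the original notes.

sums <- outer(1:6, 1:6, "+")      # all 36 possible values of the sum
p <- table(sums) / 36             # the pmf p(2), ..., p(12) from the table above
sum(as.numeric(names(p)) * p)     # E[X] = sum over k of k * p(k), which equals 7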

Remark. It is an important fact that if X ∈ L^2, then it must be the case that X ∈ L^1.

The following is a useful formula when computing variances (people sometimes confuse it with the definition of variance, which it's not; for the definition, see above).

Theorem 1.2. Suppose X ∈ L^2. Then

Var(X) = E[X^2] − E[X]^2.

Proof. By linearity of expectation,

Var(X) = E[(X − µ)^2] = E[X^2 − 2µX + µ^2] = E[X^2] − E[2µX] + E[µ^2] = E[X^2] − 2µE[X] + µ^2 = E[X^2] − 2µ^2 + µ^2 = E[X^2] − µ^2 = E[X^2] − E[X]^2.

The following exercise is a little bit tedious, but you should make sure you know how to do it. If you remember doing it and remember well how it works, feel free to skip it. Since this lecture and the next are mostly review, I am including several exercises which are meant to refresh your memory on some basic ideas from probability but which you may know very well how to do already. That's why I'm including the comment "(optional)" next to them. I will not include these problems on the homework assignments.

Exercise 1.2. (optional) The purpose of this exercise is to make sure you can compute some straightforward (but messy) integrals. [Hint: A change of variables will make them easier to handle.] Suppose that X ~ N(µ, σ^2); that is, X is a normally distributed random variable with parameters µ, σ^2. (See Example 1.7 for the density of X.) Show directly (without using any unstated properties of expectations or distributions) that

E[X] = µ,   E[X^2] = σ^2 + µ^2,   E[e^{θX}] = exp(θµ + σ^2 θ^2 / 2) for 0 ≤ θ < ∞,   and   Var(X) = σ^2.

[Note that the last identity follows from the first two parts and Theorem 1.2.]

This is the reason that if X ~ N(µ, σ^2), we say that X is normally distributed with mean µ and variance σ^2 (not just with parameters µ and σ^2).
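Exercise 1.2 can be sanity-checked numerically before doing the integrals by hand. The sketch below (not part of the original notes) uses R's integrate() with one arbitrary choice of parameters, µ = 1, σ = 2, θ = 0.3.

mu <- 1; s <- 2; theta <- 0.3
f <- function(x) dnorm(x, mean = mu, sd = s)                   # the N(mu, sigma^2) density
integrate(function(x) x   * f(x), -Inf, Inf)$value             # close to mu = 1
integrate(function(x) x^2 * f(x), -Inf, Inf)$value             # close to sigma^2 + mu^2 = 5
integrate(function(x) exp(theta * x) * f(x), -Inf, Inf)$value  # numerical E[exp(theta X)]
exp(theta * mu + s^2 * theta^2 / 2)                            # the claimed closed form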

Math 4506 (Spring 2015), February 2, 2015. Prof. Christian Beneš

Lecture #2: Multivariate Random Variables

2.1 Bivariate Random Variables

Theorem 2.1. If X and Y are random variables with X ∈ L^2 and Y ∈ L^2, then the product XY is a random variable with XY ∈ L^1.

Definition 2.1. If X and Y are both random variables in L^2, then the covariance of X and Y, written Cov(X, Y), is defined to be

Cov(X, Y) := E[(X − µ_X)(Y − µ_Y)],

where µ_X := E[X], µ_Y := E[Y]. Whenever the covariance of X and Y exists, we define the correlation of X and Y to be

Corr(X, Y) := Cov(X, Y) / (σ_X σ_Y),

where σ_X is the standard deviation of X, and σ_Y is the standard deviation of Y.

Remark. By convention, 0/0 := 0 in the definition of correlation. This arbitrary choice is designed to simplify some formulas and means that if Var(X) = 0 or Var(Y) = 0, then Corr(X, Y) = 0 (this follows from the fact that if Var(X) = 0 or Var(Y) = 0, then Cov(X, Y) = 0). Since if Var(X) = 0, X is constant (in which case we call X degenerate, which in this context just means non-random), this means that the correlation of two random variables is always 0 if one of them is degenerate.

Definition 2.2. We say that X and Y are uncorrelated if Cov(X, Y) = 0 (or, equivalently, if Corr(X, Y) = 0).

Fact 2.1. If X ∈ L^2 and Y ∈ L^2, then the following computational formulas hold:

Cov(X, Y) = E[XY] − E[X]E[Y];
Var(X) = Cov(X, X).

Exercise 2.3. Verify the two computational formulas above. [Note that the formulas don't necessarily hold without the assumption that X ∈ L^2 and Y ∈ L^2, so make sure you explain why these assumptions are needed in general.]

Definition 2.3. Two random variables X and Y are said to be independent if f(x, y), the joint density of (X, Y), can be expressed as

f(x, y) = f_X(x) f_Y(y),

where f_X is the (marginal) density of X and f_Y is the (marginal) density of Y.

Remark. Notice that we have combined the cases of a discrete and a continuous random variable into one definition. You can substitute the phrases "probability mass function" or "probability density function" as appropriate.

The following result is often needed and at first glance is not completely obvious.

Theorem 2.2. If X and Y are independent random variables with X ∈ L^1 and Y ∈ L^1, then the product XY is a random variable with XY ∈ L^1, and

E[XY] = E[X] E[Y].

Exercise 2.4. (optional) Using this theorem, quickly prove that if X and Y are independent random variables, then they are necessarily uncorrelated. (As the next exercise shows, the converse, however, is not true: there do exist uncorrelated, dependent random variables.)

Exercise 2.5. (optional) Consider the random variable X defined by P(X = −1) = 1/4, P(X = 0) = 1/2, P(X = 1) = 1/4. Let the random variable Y be defined as Y := X^2. Hence,

P(Y = 0 | X = 0) = 1,   P(Y = 1 | X = −1) = 1,   P(Y = 1 | X = 1) = 1.

Show that the density of Y is P(Y = 0) = 1/2, P(Y = 1) = 1/2. Find the joint density of (X, Y), and show that X and Y are not independent. Find the density of XY, compute E[XY], and show that X and Y are uncorrelated.

The following result allows us to get a grip on the variance in algebraic manipulations when the random variables involved are independent:

Theorem 2.3 (Linearity of Variance in the Case of Independence). Suppose that X : Ω → R and Y : Ω → R are (discrete or continuous) random variables with X ∈ L^2 and Y ∈ L^2. If X and Y are independent, then X + Y ∈ L^2 and

Var(X + Y) = Var(X) + Var(Y).

2.2 Multivariate Random Variables

We just saw that pairs of random variables can be more complicated than what one might like to think. It is not enough to know the distributions of the random variables X and Y to know how they behave together. Think of the following example: You may know the distribution of the heights (X) and weights (Y) of people in a certain population. However, this by itself will not tell you how height affects weight and vice versa. The information on how the random variables are related is not contained in the distributions of X and Y (that is, the marginals). To have an idea of the relative behavior of random variables, one needs the correlation coefficient. Recall:

If we want to describe a single random variable (also called a univariate random variable), we need a density f(x), which graphically can be described as a curve (or a set of points in the discrete case) in the plane. If we want to describe a pair of random variables (also called bivariate random variables), we need a joint density f(x, y), which graphically can be described as a surface (or a set of points in the discrete case) in space.

This extends easily to higher dimensions: If we want to describe a family of n random variables, we need a joint density f(x_1, ..., x_n), which graphically can be described as a hyper-surface (or a set of points in the discrete case) in (n + 1)-dimensional space. We are usually comfortable with drawing or imagining objects in 1, 2, or 3 dimensions. In higher dimensions, we tend to get a headache before we can make sense of what we are trying to represent, so we will limit ourselves to depicting densities of univariate and bivariate random variables and will deal with the rest algebraically (and refer to pictures in dimensions ≤ 3 when we get confused and need a picture to help us out).

We will write x = (x_1, ..., x_n)' and will think of random vectors as being column vectors. Therefore, the random vector X = (X_1, ..., X_n)' has joint distribution (we will often just say distribution)

F(x_1, ..., x_n) = P(X_1 ≤ x_1, ..., X_n ≤ x_n).

An equivalent way of writing this is

F(x) = P(X ≤ x).

Recall that if F(x, y) is a bivariate distribution (say for jointly continuous r.v.'s), then

F(x) = P(X ≤ x) = P(X ≤ x, Y < ∞) = ∫_{−∞}^{x} ∫_{−∞}^{∞} f(a, b) db da = F(x, ∞).

The distributions of subsets of random variables are obtained in the same way as in 2 dimensions: If F(x_1, ..., x_n) is a multivariate distribution, then, for instance,

F(x_1, x_2, x_n) := P(X_1 ≤ x_1, X_2 ≤ x_2, X_3 < ∞, ..., X_{n−1} < ∞, X_n ≤ x_n) = F(x_1, x_2, ∞, ..., ∞, x_n).

For univariate random variables, you know that the p.d.f. is the derivative of the distribution function. In higher dimensions, this is true as well, but since we are dealing with functions of several variables, we have to talk about partial derivatives:

f(x_1, ..., x_n) = ∂^n F(x_1, ..., x_n) / (∂x_1 ··· ∂x_n).

The random variables X_1, ..., X_n are independent if

F(x_1, ..., x_n) = F_{X_1}(x_1) ··· F_{X_n}(x_n)

or, alternatively, if the joint p.d.f. (p.m.f.) is the product of the marginal p.d.f.'s (p.m.f.'s).

Since the random vector X = (X_1, ..., X_n)' is a vector, so is its mean E[X] = (E[X_1], ..., E[X_n])'. Since there is a covariance between any two of the X_i, there is a total of n^2 covariances, which compose the covariance matrix

Σ_X = \begin{pmatrix} Cov(X_1, X_1) & Cov(X_1, X_2) & \cdots & Cov(X_1, X_n) \\ Cov(X_2, X_1) & Cov(X_2, X_2) & \cdots & Cov(X_2, X_n) \\ \vdots & & \ddots & \vdots \\ Cov(X_n, X_1) & Cov(X_n, X_2) & \cdots & Cov(X_n, X_n) \end{pmatrix}.

Note that

Σ_X = \begin{pmatrix} Var(X_1) & Cov(X_1, X_2) & \cdots & Cov(X_1, X_n) \\ Cov(X_2, X_1) & Var(X_2) & \cdots & Cov(X_2, X_n) \\ \vdots & & \ddots & \vdots \\ Cov(X_n, X_1) & Cov(X_n, X_2) & \cdots & Var(X_n) \end{pmatrix}.

Since for any i, j ∈ {1, ..., n}, Cov(X_i, X_j) = Cov(X_j, X_i), the covariance matrix is symmetric.

The following result tells us how to deal with the covariance of linear combinations of random variables.

Theorem 2.4. If X, Y, Z ∈ L^2 and a, b, c ∈ R, then

Cov(aX + bY + c, Z) = a Cov(X, Z) + b Cov(Y, Z).

Exercise 2.6. (optional) Prove Theorem 2.4.

Note 2.1. From this theorem follows another result which you already know: Var(aX) = a^2 Var(X).
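In R, the sample analogue of the covariance matrix above is computed with cov(). The short sketch below (not from the original notes) builds a data matrix with three columns of simulated observations; cov(X) is then 3-by-3, with sample variances on the diagonal and sample covariances off the diagonal, and it is symmetric, as noted above.

set.seed(1)
X <- cbind(rnorm(200), rnorm(200), rnorm(200))   # 200 observations of 3 variables
S <- cov(X)                                       # 3-by-3 sample covariance matrix
S
all.equal(S, t(S))                                # TRUE: the matrix is symmetric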

2.3 Some Basic Linear Algebra

Caveat 2.1. I may not be entirely consistent with notation in what follows. Sometimes, vectors will be represented by boldfaced symbols (x) and sometimes with an arrow, like this: x⃗. On rare occasions, I may use the same notation as for scalars, since that notation is common as well. If that's the case, you should be able to figure out from context whether you're dealing with a vector or not.

For matrices and a vector we have the following definitions:

A = [a_{ij}]_{1≤i≤k, 1≤j≤l} = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,l-1} & a_{1,l} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,l-1} & a_{2,l} \\ \vdots & & & & \vdots \\ a_{k,1} & a_{k,2} & \cdots & a_{k,l-1} & a_{k,l} \end{pmatrix},

B = [b_{ij}]_{1≤i≤l, 1≤j≤n} = \begin{pmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,n-1} & b_{1,n} \\ b_{2,1} & b_{2,2} & \cdots & b_{2,n-1} & b_{2,n} \\ \vdots & & & & \vdots \\ b_{l,1} & b_{l,2} & \cdots & b_{l,n-1} & b_{l,n} \end{pmatrix},

v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_{l-1} \\ v_l \end{pmatrix}.

The product of two matrices is AB = [c_{i,j}]_{1≤i≤k, 1≤j≤n}, where

c_{i,j} = Σ_{m=1}^{l} a_{i,m} b_{m,j}.

In particular, the product of a matrix and a vector is

A v = \begin{pmatrix} Σ_{i=1}^{l} a_{1,i} v_i \\ Σ_{i=1}^{l} a_{2,i} v_i \\ \vdots \\ Σ_{i=1}^{l} a_{k-1,i} v_i \\ Σ_{i=1}^{l} a_{k,i} v_i \end{pmatrix}.

The transpose of the matrix A is A' = [c_{i,j}]_{1≤i≤l, 1≤j≤k}, where c_{i,j} = a_{j,i}.

The determinant of a matrix A, written det(A), is something fairly easy to compute, but its definition isn't exactly short, so those who can't remember it should look it up in a book on linear algebra. Wikipedia also has a definition and some examples. Note that the determinant is defined only for square matrices (with the same number of rows and columns). We say that A is singular if det(A) = 0. Otherwise, A is nonsingular.

The following definitions are for the case k = l (that is, A is a square matrix):

If A is nonsingular, the inverse of A, denoted by A^{−1}, is the unique matrix such that

A A^{−1} = A^{−1} A = 1_k := \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.

If it is clear from context what the dimensions of the matrix are, we write 1 = 1_k.

A is called orthogonal if A' = A^{−1}. In that case, AA' = A'A = 1.

A is symmetric if for all 1 ≤ i, j ≤ k, a_{i,j} = a_{j,i}.

A is positive definite if for all vectors v = (v_1, ..., v_k)',

v' A v ≥ 0.

Theorem 2.5. If an n × n matrix A is symmetric and positive definite, it can be written as

A = P Λ P',

where

Λ = \begin{pmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & λ_n \end{pmatrix}

and P is orthogonal. Here, λ_1, ..., λ_n are the eigenvalues of A.

Theorem 2.6. The covariance matrix of a random vector X is symmetric and positive definite.
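Theorems 2.5 and 2.6 can be illustrated numerically with R's eigen() function, which returns the eigenvalues and an orthogonal matrix of eigenvectors for a symmetric matrix. The example matrix below is my own; this is only a sketch, not part of the original notes. The same decomposition is what Note 2.2 on the next page uses to build the matrix square root Σ^{1/2}.

A <- matrix(c(1, 4/5, 4/5, 1), 2, 2)   # a small covariance-type matrix
e <- eigen(A)
P <- e$vectors                          # plays the role of P (orthogonal)
Lambda <- diag(e$values)                # diagonal matrix of eigenvalues
P %*% Lambda %*% t(P)                   # recovers A, as in Theorem 2.5
e$values                                # 1.8 and 0.2, both nonnegative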

Corollary 2.1. The covariance matrix Σ of a random vector X can be written in the form Σ = P Λ P', where

Λ = \begin{pmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & λ_n \end{pmatrix}

and P is orthogonal.

Note 2.2. If we define

Λ^{1/2} := \begin{pmatrix} λ_1^{1/2} & 0 & \cdots & 0 \\ 0 & λ_2^{1/2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & λ_n^{1/2} \end{pmatrix}

and B = P Λ^{1/2} P', then, since P'P = PP' = 1,

B^2 = BB = P Λ^{1/2} P' P Λ^{1/2} P' = P Λ P' = Σ.

Since B^2 = Σ, it makes perfect sense to define

Σ^{1/2} := P Λ^{1/2} P' = B.   (1)

Since we will often deal with linear transformations of random variables, the following proposition will be useful:

Proposition 2.1. If X is a random vector, a is a (nonrandom) vector, B is a matrix, and Y = BX + a, then

E[Y] = a + B E[X],   Σ_Y = B Σ_X B'.

Proof. See first homework assignment.

2.4 Multivariate Normal Random Variables

You already know that the normal distribution is the most important of them all, since the central limit theorem tells us that as soon as we start adding up random variables, a normal pops up. Recall from Lecture 1 that a normal random variable X with parameters µ, σ^2 has density

f(x) = (1/(σ√(2π))) exp(−(x − µ)^2 / (2σ^2)),   −∞ < µ < ∞, 0 < σ < ∞.

You should verify that this is the one-dimensional particular case of the multivariate normal density with mean µ and nonsingular covariance matrix Σ (written X ~ N(µ, Σ)):

f_X(x) = (1 / ((2π)^n det(Σ))^{1/2}) exp{−(1/2) (x − µ)' Σ^{−1} (x − µ)}.

Note 2.3. Make sure you understand why one needs Σ to be nonsingular in order for the definition of the multivariate normal density to make sense.

Exercise 2.7. Suppose X ~ N(0, 1), Y ~ N(0, 2) are bivariate normal with correlation coefficient ρ(X, Y) = 1/2. Find the joint density of X and Y. Let S_1 be the square with vertices (0,0), (1,0), (0,1), and (1,1), and let S_2 be the square with vertices (0,0), (1,0), (0,−1), and (1,−1). Without doing any computations, explain which of P((X, Y) ∈ S_1) and P((X, Y) ∈ S_2) should be greater.

You probably recall that if X ~ N(µ, σ^2), you can apply a linear transformation to change X into a standard normal:

Z = (X − µ)/σ ~ N(0, 1).

The same works for the multivariate normal:

Exercise 2.8. Prove that if X ~ N(µ, Σ), then Z := Σ^{−1/2}(X − µ) ~ N(0, 1). In particular (prove this only in the bivariate case), the components of Z are independent. Hint: Use Proposition 2.1.

Note 2.4. This last exercise shows how to obtain a standard normal vector from any multivariate normal distribution. On the homework, you will also show how to do the converse, that is, obtain any multivariate normal distribution from the standard multivariate normal.

You can generate multivariate normal random variables in R using the following commands (note that comments about what a line does will follow the symbol %; these comments are not part of what you should include in your input line):

> library(MASS) % this loads the library in which the multivariate normal generator is
> S=c(1,0,0,1) % this generates the vector (1, 0, 0, 1)
> dim(S)=c(2,2) % this transforms the vector into a 2-by-2 matrix
> S % this allows you to check what S is
     [,1] [,2]
[1,]    1    0
[2,]    0    1
> mu=c(0,0) % this is the mean (row) vector
> mu
[1] 0 0
> dim(mu)=c(2,1) % this makes the mean vector into a column vector
> mu
     [,1]
[1,]    0
[2,]    0
> N=mvrnorm(100,mu,S) % this generates 100 samples from the multivariate normal distribution with mean mu and covariance matrix S
> plot(N)

[Scatter plot of the 100 samples: N[,2] against N[,1].]

> S2=c(1,1,1,1)
> dim(S2)=c(2,2)
> N2=mvrnorm(100,mu,S2)
> plot(N2)

[Scatter plot of the 100 samples: N2[,2] against N2[,1].]

> S3=c(1,-0.8,-0.8,1)
> dim(S3)=c(2,2)
> N3=mvrnorm(100,mu,S3)
> plot(N3)

[Scatter plot of the 100 samples: N3[,2] against N3[,1].]

The following are the graphs of 3 multivariate normal densities (any two pictures on the same line are of the same pdf, but seen from different angles). Try to say as much as you can about their means and covariance matrices.

[Six surface plots: three bivariate normal densities, each shown from two viewing angles.]

[Surface plots.] The joint pdf of two independent standard normal random variables.

[Surface plots.] The joint pdf of two normal random variables with mean 0 and covariance matrix

Σ = \begin{pmatrix} 1 & 1/2 \\ 1/2 & 1 \end{pmatrix}.

[Surface plots.] The joint pdf of two normal random variables with mean 0 and covariance matrix

Σ = \begin{pmatrix} 1 & -1/2 \\ -1/2 & 1 \end{pmatrix}.

When pictures of surfaces don't make as much sense as we'd like, we can always look at level curves. Here are the same graphs as above with level curves:

[Plots with level curves.] The joint pdf of two independent standard normal random variables.

[Plots with level curves.] The joint pdf of two normal random variables with mean 0 and covariance matrix

Σ = \begin{pmatrix} 1 & 1/2 \\ 1/2 & 1 \end{pmatrix}.

[Plots with level curves.] The joint pdf of two normal random variables with mean 0 and covariance matrix

Σ = \begin{pmatrix} 1 & -1/2 \\ -1/2 & 1 \end{pmatrix}.

When you draw samples from a distribution, you should see most of your data points accumulate in areas of high probability. The shapes of these areas are precisely given by the level curves:

[Scatter plot.] 50 samples from a bivariate normal random variable with mean 0 and covariance matrix

Σ = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.

[Scatter plot.] 50 samples from a bivariate normal random variable with mean 0 and covariance matrix

Σ = \begin{pmatrix} 1 & 1/2 \\ 1/2 & 1 \end{pmatrix}.

[Scatter plot.] 50 samples from a bivariate normal random variable with mean 0 and covariance matrix

Σ = \begin{pmatrix} 1 & -1/2 \\ -1/2 & 1 \end{pmatrix}.

The connection between the data and the distribution becomes more obvious as the data set increases in size:

[Scatter plot.] 500 samples from a bivariate normal random variable with mean 0 and covariance matrix

Σ = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.

[Scatter plot.] 500 samples from a bivariate normal random variable with mean 0 and covariance matrix

Σ = \begin{pmatrix} 1 & 1/2 \\ 1/2 & 1 \end{pmatrix}.

[Scatter plot.] 500 samples from a bivariate normal random variable with mean 0 and covariance matrix

Σ = \begin{pmatrix} 1 & -1/2 \\ -1/2 & 1 \end{pmatrix}.

Math 4506 (Spring 2015), February 4, 2015. Prof. Christian Beneš

Lecture #3: Decomposing Time Series; Stationarity

Reference. The material in this section is an introduction to time series and is meant to complement Chapter 1 in the textbook. Make sure you read that chapter in its entirety and work in parallel with R to reproduce what is being done in the textbook. This lecture also covers most of the topics from Chapter 2, which we will re-visit in more detail in the next lecture.

3.1 Basic decomposition

The following graph represents the number of monthly aircraft miles (in millions) flown by U.S. airlines between 1963 and 1970:

[Figure: time plot of the series Air.ts against Time.]

Given a data set such as the one above, how can we construct a model for it? The idea will be to decompose random data into three distinct components:

A trend component m_t (increase of populations, increase in global temperature, etc.)

A seasonal component s_t (describing cyclical phenomena such as annual temperature patterns, etc.)

A random noise component Y_t describing the non-deterministic aspect of the time series. Note that the book uses z_t for this component. In the notes, I'll write Y_t, as the letter z usually suggests a normal distribution, which may not be the actual underlying distribution of the random noise component.

A common model is the so-called additive model, that is, one where we try to find m_t, s_t, Y_t such that a given time series can be expressed as

X_t = m_t + s_t + Y_t.

We will never know what m_t, s_t, and Y_t actually are, but we can estimate them. The estimates will be called m̂_t, ŝ_t, and y_t. Note that we'll use the same notation for estimates and estimators in this case. Once we see the data, our estimates have to satisfy

x_t = m̂_t + ŝ_t + y_t,

where m̂_t is an estimate for m_t, ŝ_t is an estimate for s_t, and y_t is an estimate for Y_t.

The corresponding data set can be found online and looks like this:

[Table: monthly aircraft miles flown, one column per month, Jan through Dec; the numerical values are not reproduced in this transcription.]

In fact, this is not exactly the form in which the data set is found on that website. There, it doesn't have any labels. As it turns out, it is quite straightforward to include those labels with R.

Let's look at the graph above. Two patterns are striking:

There appears to be an increasing pattern.

There is a clear cyclical pattern with some apparently fixed period.

Some questions we'll try to answer throughout the course are:

How can we extract these patterns?

Once we've extracted the patterns, are we left with pure randomness or does the randomness have a structure?

Can we use these patterns to make predictions for future values of this time series?
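Since the aircraft-miles file itself is not reproduced in this transcription, here is a small sketch (on an artificial monthly series of my own, with a built-in trend and seasonal pattern) of how R's ts() and decompose() carry out exactly this kind of additive decomposition. The numbers are made up and only serve to illustrate the mechanics.

set.seed(1)
t <- 1:96
x <- 50 + 0.5 * t + 10 * sin(2 * pi * t / 12) + rnorm(96, sd = 3)   # trend + season + noise
X.ts <- ts(x, start = c(1963, 1), frequency = 12)                    # monthly time series
plot(X.ts)
plot(decompose(X.ts))   # estimated trend, seasonal component, and random remainder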

3.2 Stationary Time Series

We will eventually return to a more careful analysis of the trend and seasonal components of a time series, but focus for now on Y_t, the random component of a time series after extraction of a trend and cyclical component. Multidimensional distributions are very complicated objects and involve more parameters than we would like to deal with. We will focus on two essential quantities giving information about a time series: the means and the covariances.

Definition 3.1. If {X_t} is a time series with X_t ∈ L^1 for each t, then the mean function (or trend) of {X_t} is the non-random function

µ(t) := E[X_t].

Definition 3.2. If {X_t} is a time series with X_t ∈ L^2 for each t, then the autocovariance function of {X_t} is the non-random function

γ(t, s) := Cov(X_t, X_s) = E[(X_t − µ(t))(X_s − µ(s))].

The autocorrelation function of {X_t} is

ρ(t, s) = γ(t, s) / √(Var(X_t) Var(X_s)) = Corr(X_t, X_s).

Definition 3.3. We call the time series {X_t} second-order (or weakly) stationary if there is a constant µ such that µ(t) = µ for all t, and γ(t + h, t) only depends on h; that is, if γ(t + h, t) = γ(h, 0) =: γ(h) for all t and for all h.

Exercise 3.9. For a second-order stationary process, show that Var(X_t) = γ(0) for each t.

Via the last exercise, the second condition for second-order stationarity allows us to rephrase the definition above:

Definition 3.4. Suppose that {X_t} is a second-order stationary process. The autocovariance function (ACVF) at lag h of {X_t} is

γ(h) := Cov(X_{t+h}, X_t).

The autocorrelation function (ACF) at lag h of {X_t} is

ρ(h) := Corr(X_{t+h}, X_t).

Note 3.1. By Exercise 3.9,

ρ(h) = Cov(X_{t+h}, X_t) / √(Var(X_{t+h}) Var(X_t)) = γ(h) / γ(0).
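In R, the sample versions of γ(h) and ρ(h) for an observed series are computed with acf(). The short sketch below (added here for illustration, not from the original notes) applies it to simulated noise, for which the sample autocorrelations at lags h ≥ 1 should be close to zero.

set.seed(1)
z <- rnorm(200)
acf(z, type = "covariance")   # sample autocovariance function
acf(z)                        # sample autocorrelation function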

3.3 Some simple time series models

All the time series below are discrete-time; that is, the time set is a subset of the integers.

Example 3.1. (White Noise.) Often when taking measurements, little imprecisions (in the measuring device and on the part of the measurer) will yield measurements that are a little off. It is often assumed that these errors are uncorrelated and that they all come from a same distribution with zero mean. A sequence of random variables {X_n}_{n≥1} with E[X_n] = 0 and

E[X_k X_m] = σ^2 δ(k − m)

is called white noise. (The name comes from the spectrum of a stationary process, which we may discuss at the end of the semester. There are also noises that are pink, red, blue, purple, etc.) Here δ(k − m) is the Kronecker delta function, defined by

δ(x) = 1 if x = 0,   δ(x) = 0 if x ∈ R \ {0}.

Two important particular cases of white noise are:

The distribution of X_i is binary: P(X_i = a) = 1 − P(X_i = −a) = 1/2 for some a ∈ R.

X_i ~ N(0, σ^2). In this case, we talk about Gaussian white noise.

Example 3.2. (IID Noise.) A sequence of independent, identically distributed random variables {X_n}_{n≥1} with E[X_n] = 0 is called i.i.d. noise.

Example 3.3. (Random walk.) If {X_i}_{i≥1} is i.i.d. noise,

S_n = Σ_{i=1}^{n} X_i

is a random walk. In particular, if P(X_i = 1) = 1 − P(X_i = −1) = 1/2, we have a symmetric simple random walk. Random walks have been a (very crude) choice of model for the stock market for a long time.

[Figure: two independent realizations of a simple random walk of 100 time steps, plotting the position of the walker against the number of steps.]
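Pictures like the ones above can be reproduced (up to the randomness) with a few lines of R: cumulative sums of i.i.d. +1/−1 steps. This is a sketch added for illustration, not code from the original notes.

set.seed(1)
steps <- sample(c(-1, 1), 100, replace = TRUE)   # i.i.d. symmetric +1/-1 steps
S <- cumsum(steps)                                # the walk S_n = X_1 + ... + X_n
plot(S, type = "l", xlab = "Number of Steps", ylab = "Position of Walker")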

Example 3.4. (Gaussian time series.) {X_n}_{n≥1} is a Gaussian time series if for every collection of integers {i_k}_{1≤k≤n}, the vector

(X_{i_1}, ..., X_{i_n})

is multivariate Gaussian. Since many natural quantities have a normal distribution, this is a natural model in many settings. It also has the advantage of allowing many kinds of dependence between the data.

3.4 Autocovariance function: some examples

We saw that for stationary time series, covariance depends only on one parameter (the time between two given random variables), allowing us to define an autocovariance function at lag h. In the examples below, we compute the autocovariance function of the simple time series which we defined during the last lecture and use it to determine which of them are stationary and which are not.

Example 3.5 (White Noise). Suppose that {X_t} is white noise. We now verify that {X_t} is second-order stationary. First, it is obvious that µ(t) = 0 for all t. Second, if s ≠ t, then the assumption that the collection is uncorrelated implies that γ(t, s) = 0, s ≠ t. On the other hand, if s = t, then γ(t, t) = Var(X_t) = σ^2. Thus, µ(t) = 0 for all t, and

γ(h) = γ(t + h, t) = σ^2 if h = 0, and 0 if h ≠ 0.

This shows that {X_t} is indeed second-order stationary, since γ depends only on h. We write {X_t} ~ WN(0, σ^2) to indicate that {X_t} is white noise with Var(X_t) = σ^2 for each t.

Example 3.6 (IID Noise). Suppose instead that {X_t} is a collection of independent random variables, each with mean 0 and variance σ^2. We say that {X_t} is iid noise. As with white noise, we easily see that iid noise is stationary with trend µ(t) = 0 and

γ(h) = γ(t + h, t) = σ^2 if h = 0, and 0 if h ≠ 0.

We write {X_t} ~ IID(0, σ^2) to indicate that {X_t} is iid noise with Var(X_t) = σ^2 for each t.

Remark. With these two examples, we see that two different processes may both have the same trend and autocovariance function. Thus, µ(t) and γ(t + h, t) are NOT always enough to distinguish stationary processes. (However, for stationary Gaussian processes they are enough.)

Example 3.7. If S_t = Σ_{i=1}^{t} X_i (where {X_i} is a sequence of independent random variables with P(X_i = 1) = 1 − P(X_i = −1) = 1/2 and therefore Var(X_i) = 1) is a symmetric simple random walk, we find that if s > t,

γ(s, t) = Cov(S_s, S_t) = Cov(S_t + X_{t+1} + ... + X_s, S_t) = Cov(S_t, S_t) = Var(S_t) = Var(Σ_{i=1}^{t} X_i) = t.

In particular, γ(t + h, t) = t, which implies that simple random walk is not a stationary time series (since stationary time series have a constant variance).

Math 4506 (Spring 2015), February 9, 2015. Prof. Christian Beneš

Lecture #4: More stationary time series; Autocovariance; Linear Processes; MA processes

Reference. Chapter 2 and Sections 4.1 and 4.2 from the textbook.

4.1 Inequalities

Many probabilists are enthralled by inequalities (upper/lower bounds). One of the many purposes for finding upper bounds is to check that quantities are finite, by checking it for a more tractable but larger quantity. (This is something you've seen in the comparison test for integrals: Though it's not straightforward to check that

∫_{1000}^{∞} e^{−x^2} / (log log log x + 1) dx < ∞,

the fact that for x ≥ 1000,

0 ≤ e^{−x^2} / (log log log x + 1) ≤ e^{−x}

implies that

∫_{1000}^{∞} e^{−x^2} / (log log log x + 1) dx ≤ ∫_{1000}^{∞} e^{−x} dx < ∞.)

A very common inequality in analysis and probability is Jensen's inequality.

Definition 4.1. A function φ : R → R is called convex if for x, y ∈ R and 0 ≤ p ≤ 1,

φ(px + (1 − p)y) ≤ pφ(x) + (1 − p)φ(y).

Theorem 4.1 (Jensen's inequality). Suppose φ : R → R is convex. Suppose X is a random variable satisfying E[|X|] < ∞ and E[|φ(X)|] < ∞. Then

φ(E[X]) ≤ E[φ(X)].

Proof. If φ is convex, then for every x_0 ∈ R there is a c(x_0) such that

φ(x) − φ(x_0) ≥ c(x_0)(x − x_0) for all x.

Choosing x_0 = E[X] and letting x = X, we get

φ(X) ≥ c(E[X])(X − E[X]) + φ(E[X]).

Taking expectations on both sides concludes the proof.

Example 4.1. Two straightforward consequences of Jensen's inequality are:

|E[X]| ≤ E[|X|];

E[X]^2 ≤ E[X^2].

In particular, applying the second inequality to the random variable |X|, we get E[|X|]^2 ≤ E[|X|^2] = E[X^2], so that if E[X] = 0,

E[|X|] ≤ σ.   (2)

Two other very commonly useful inequalities are

Theorem 4.2 (Cauchy-Schwarz inequality). If X, Y ∈ L^2,

E[|XY|]^2 ≤ E[X^2] E[Y^2].

Note 4.1. This last inequality is the probabilistic version of the C-S inequality and should be compared with the C-S inequality in its most standard form:

(Σ_{i=1}^{n} x_i y_i)^2 ≤ (Σ_{i=1}^{n} x_i^2)(Σ_{i=1}^{n} y_i^2).   (3)

Theorem 4.3 (Triangle inequality). If x, y ∈ R,

|x + y| ≤ |x| + |y|.

By induction, if x_1, ..., x_n ∈ R,

|Σ_{i=1}^{n} x_i| ≤ Σ_{i=1}^{n} |x_i|.

4.2 Linear Processes

Definition 4.2. We define the backwards shift operator B by

B X_t = X_{t−1}.

For j ≥ 2, we define B^j X_t = B B^{j−1} X_t. In other words,

B^j X_t = X_{t−j}.

Definition 4.3. A time series {X_t}_{t∈Z} is a linear process if for every t ∈ Z, we can write

X_t = Σ_{i=−∞}^{∞} ψ_i Z_{t−i},   (4)

where Z_t ~ WN(0, σ^2) and the scalar sequence {ψ_i}_{i∈Z} satisfies Σ_{i∈Z} |ψ_i| < ∞. Using the shortcut ψ(B) = Σ_{i=−∞}^{∞} ψ_i B^i, we can write

X_t = ψ(B) Z_t.

If ψ_i = 0 for all i < 0, we call X a moving average or MA(∞) process.

Note 4.2. Infinite sums of random variables are somewhat delicate. You know what it means for an infinite sum of real numbers to converge, but for random variables, it isn't clear at first what the corresponding meaning would be. In fact, there are a number of different ways to give a meaning to the notion of convergence of random variables. For technical reasons, convergence of a sum of random variables is often taken in the mean square sense: the partial sums Σ_{k=1}^{n} Y_k converge in the mean square sense if there exists a random variable Y such that

E[(Σ_{k=1}^{n} Y_k − Y)^2] → 0 as n → ∞.

In any case, it should be intuitively clear that some requirement on the ψ_i is necessary, since if all the ψ_i were equal to 1, X_t would be an infinite sum of i.i.d. random variables, which does not converge (since we're always adding more random variables that don't shrink, the sum would not stabilize). The requirement Σ_{i∈Z} |ψ_i| < ∞ ensures that the random series Σ_{i∈Z} ψ_i Z_{t−i} has a limit. I won't expect you to completely understand what this means, but if you care about it, here's the argument:

Σ_i |ψ_i| < ∞  ⇒  Σ_i ψ_i^2 < ∞  ⇒  Σ_i ψ_i^2 E[Z_{t−i}^2] < ∞  ⇒  Σ_{i=m}^{n} ψ_i^2 E[Z_{t−i}^2] → 0 as m, n → ∞.

(The last implication is the Cauchy criterion for convergence of series.) Now, expanding the square and using the fact that the Z_{t−i} are uncorrelated with mean 0 (compare with the Cauchy-Schwarz inequality (3)),

E[(Σ_{i=m}^{n} ψ_i Z_{t−i})^2] = Σ_{i=m}^{n} ψ_i^2 E[Z_{t−i}^2].

Therefore,

Σ_{i=m}^{n} ψ_i^2 E[Z_{t−i}^2] → 0 as m, n → ∞  ⇒  E[(Σ_{i=m}^{n} ψ_i Z_{t−i})^2] → 0 as m, n → ∞  ⇒  Σ_{i=m}^{n} ψ_i Z_{t−i} converges as n, m → ∞  ⇒  Σ_i ψ_i Z_{t−i} converges.

The last implication is the Cauchy criterion for convergence of sequences of random variables.

Now that we know that the process defined in (4) exists, let's also show that for any t ∈ Z, X_t ∈ L^1: If Σ_{i∈Z} |ψ_i| < ∞, using the triangle inequality (for the first inequality; note that since it's an infinite sum, we have to take limits) and Jensen's inequality (for the last), we get

E[|X_t|] ≤ Σ_{i∈Z} E[|ψ_i Z_{t−i}|] = Σ_{i∈Z} |ψ_i| E[|Z_{t−i}|] ≤ σ Σ_{i∈Z} |ψ_i|.

4.3 Moving Average Processes

We will now construct stationary time series that have a non-zero autocovariance up to a certain lag q but have zero autocovariance at all later lags. One simple and natural way is to start with white noise Z_t (denoted Z_t ~ WN(0, σ^2)) and to construct a new sequence of random variables which depend on an overlapping subset of the Z_t.

Definition 4.4. A moving-average process of order q is defined for t ∈ Z by the equation

X_t = Z_t + θ_1 Z_{t−1} + ... + θ_q Z_{t−q} = Z_t + Σ_{i=1}^{q} θ_i Z_{t−i} = Σ_{i=0}^{q} θ_i Z_{t−i} = Θ(B) Z_t,

where {Z_t} ~ WN(0, σ^2), θ_0 = 1, θ_1, ..., θ_q are constants, and Θ(z) = 1 + Σ_{i=1}^{q} θ_i z^i.

We now check that X_t is a stationary sequence:

E[X_t] = E[Z_t] + Σ_{i=1}^{q} θ_i E[Z_{t−i}] = 0.

If h > q,

Cov(X_t, X_{t+h}) = Cov(Σ_{i=0}^{q} θ_i Z_{t−i}, Σ_{j=0}^{q} θ_j Z_{t+h−j}) = Σ_{i,j=0}^{q} θ_i θ_j Cov(Z_{t−i}, Z_{t+h−j}) = 0,

since if h > q and j ≤ q, then t + h − j > t, so that t + h − j > t − i, so that Z_{t−i} and Z_{t+h−j} are uncorrelated.

If 0 ≤ h ≤ q, the random variables X_t and X_{t+h} contain some of the same Z_i:

Cov(X_t, X_{t+h}) = Cov(θ_q Z_{t−q} + ... + θ_{q−h+1} Z_{t+h−q−1} + Σ_{i=h}^{q} θ_{q−i} Z_{t−q+i}, Σ_{i=h}^{q} θ_{q+h−i} Z_{t−q+i} + θ_{h−1} Z_{t+1} + ... + θ_0 Z_{t+h})
= σ^2 Σ_{i=h}^{q} θ_{q−i} θ_{q−i+h} = σ^2 Σ_{i=0}^{q−h} θ_{q−i−h} θ_{q−i}.

Since this covariance does not depend on t, we see that the moving-average process of order q is weakly stationary. To find the autocorrelation function, we just need to compute

E[X_t^2] = Cov(X_t, X_t) = Cov(Σ_{i=0}^{q} θ_i Z_{t−i}, Σ_{i=0}^{q} θ_i Z_{t−i}) = σ^2 Σ_{i=0}^{q} θ_i^2.

Combining all our computations above, we get

γ_X(h) = σ^2 Σ_{i=0}^{q−h} θ_{q−i−h} θ_{q−i} for 0 ≤ h ≤ q,   γ_X(h) = 0 for h > q,   (5)

and

ρ_X(h) = (Σ_{i=0}^{q−h} θ_{q−i−h} θ_{q−i}) / (Σ_{i=0}^{q} θ_i^2) for 0 ≤ h ≤ q,   ρ_X(h) = 0 for h > q.   (6)
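Formulas (5) and (6) are easy to turn into a short R function, which can then be cross-checked against R's built-in ARMAacf(). The sketch below is an illustration I have added, not part of the original notes; the sum is written with the equivalent indexing γ(h) = σ^2 Σ_{i=0}^{q−h} θ_i θ_{i+h}.

ma.acvf <- function(h, theta, sigma2 = 1) {
  th <- c(1, theta)                 # theta_0 = 1, then theta_1, ..., theta_q
  q <- length(theta)
  if (h > q) return(0)              # formula (5), second case
  sigma2 * sum(th[1:(q - h + 1)] * th[(h + 1):(q + 1)])
}
ma.acvf(0, theta = c(0.5, 0.4))     # 1 + 0.5^2 + 0.4^2 = 1.41
ma.acvf(1, theta = c(0.5, 0.4))     # 0.5 + 0.5*0.4 = 0.70
ma.acvf(3, theta = c(0.5, 0.4))     # 0, since h > q
# The ratio below is rho(1) from (6); ARMAacf() gives the same theoretical ACF:
ma.acvf(1, c(0.5, 0.4)) / ma.acvf(0, c(0.5, 0.4))
ARMAacf(ma = c(0.5, 0.4), lag.max = 3)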

Math 4506 (Spring 2015), February 11, 2015. Prof. Christian Beneš

Lecture #5: MA processes - Autocovariance; AR processes

Reference. Section 4.2 from the textbook.

5.1 ACF of MA Processes

Example 5.1. (MA(1) process) Let's examine the ACF of a MA(1) process: If X_t = Z_t + θ_1 Z_{t−1}, we have θ_0 = 1, θ_1 ≠ 0, and θ_i = 0 for all i > 1. Therefore, using (5) and (6), we get

γ_X(0) = σ^2 Σ_{i=0}^{1} θ_{1−i} θ_{1−i} = σ^2 (1 + θ_1^2),
γ_X(1) = σ^2 θ_0 θ_1 = σ^2 θ_1,
γ_X(h) = 0, h ≥ 2,

and

ρ_X(0) = 1,   ρ_X(1) = σ^2 θ_1 / (σ^2 (1 + θ_1^2)) = θ_1 / (1 + θ_1^2),   ρ_X(h) = 0, h ≥ 2.

Example 5.2. (MA(2) process) We'll now compute the ACF of a MA(2) process. Again, this is straightforward with the help of (5) and (6):

γ_X(0) = σ^2 Σ_{i=0}^{2} θ_{2−i} θ_{2−i} = σ^2 (1 + θ_1^2 + θ_2^2),
γ_X(1) = σ^2 Σ_{i=0}^{1} θ_{1−i} θ_{2−i} = σ^2 (θ_1 θ_2 + θ_1),
γ_X(2) = σ^2 θ_0 θ_2 = σ^2 θ_2,
γ_X(h) = 0, h ≥ 3.

Therefore, the ACF is

ρ_X(0) = 1,   ρ_X(1) = (θ_1 θ_2 + θ_1) / (1 + θ_1^2 + θ_2^2),   ρ_X(2) = θ_2 / (1 + θ_1^2 + θ_2^2),   ρ_X(h) = 0, h ≥ 3.

Example 5.3. Let us now simulate two MA(2) processes. First, consider the process

X_t = Z_t + Z_{t−1} − Z_{t−2}.

We can simulate it as follows:

> Z=rnorm(500)
> X=Z
> for (i in 3:500) X[i]=Z[i]+Z[i-1]-Z[i-2]
> plot(X,type="l")

[Figure: time plot of the simulated series X.]

Let's now change the signs of the coefficients in the time series above to see what the process

X_t = Z_t − Z_{t−1} + Z_{t−2}

looks like.

> Z=rnorm(500)
> X=Z
> for (i in 3:500) X[i]=Z[i]-Z[i-1]+Z[i-2]
> plot(X,type="l")

[Figure: time plot of the simulated series X.]

5.2 Autoregressive Processes

Recall the following definition:

Definition 5.1. We define the backwards shift operator B by

B X_t = X_{t−1}.

For j ≥ 2, we define B^j X_t = B B^{j−1} X_t. In other words,

B^j X_t = X_{t−j}.

Example 5.4. Recall that for n ≥ 1, we defined random walks S_n as follows: If {X_i}_{i≥1} is i.i.d. noise,

S_n = Σ_{i=1}^{n} X_i.

Another way of defining random walk is by setting S_1 = X_1 and, for n ≥ 2,

S_n = S_{n−1} + X_n,

or, with the backward shift notation,

S_n − B S_n = X_n.

We can use the factorization that we use for real numbers in this case as well, but have to be careful and realize that the symbolic factorization is for operators (in particular, 1 represents the identity operator, not the number one). This gives

(1 − B) S_n = X_n.

One natural way of introducing correlation into a time series model is by defining the time series recursively.

Definition 5.2. We define an autoregressive process of order p to be a process X satisfying, for all t ∈ Z,

X_t − φ_1 X_{t−1} − ... − φ_p X_{t−p} = Z_t   (7)

⇔ (1 − φ_1 B − φ_2 B^2 − ... − φ_p B^p) X_t = Z_t
⇔ Φ_p(B) X_t = Z_t,

where Z_t ~ WN(0, σ^2), Z_t is independent of X_s, s < t, and Φ_p(z) = 1 − Σ_{i=1}^{p} φ_i z^i.
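For comparison with the MA simulations above, here is a simulated path from a stationary AR(1) model with φ_1 = 0.7, generated with R's arima.sim(), which simulates from an ARMA model driven by Gaussian white noise. This sketch is added for illustration and is not part of the original notes.

set.seed(1)
X <- arima.sim(model = list(ar = 0.7), n = 500)   # AR(1) with phi_1 = 0.7
plot(X)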

Math 4506 (Spring 2015), February 18, 2015. Prof. Christian Beneš

Lecture #6: AR processes

Reference. Section 4.3 from the textbook.

6.1 AR processes

Definition 6.1. We define an autoregressive process of order p to be a process X satisfying, for all t ∈ Z,

X_t − φ_1 X_{t−1} − ... − φ_p X_{t−p} = Z_t   (8)

⇔ (1 − φ_1 B − φ_2 B^2 − ... − φ_p B^p) X_t = Z_t
⇔ Φ_p(B) X_t = Z_t,

where Z_t ~ WN(0, σ^2), Z_t is independent of X_s, s < t, and Φ_p(z) = 1 − Σ_{i=1}^{p} φ_i z^i.

Note that random walk S_n is defined by the equation (1 − B) S_n = X_n, so random walk is a particular case of an AR(1) process. We already saw that random walk is not stationary, so we see that there are processes satisfying the AR equation that aren't stationary. Note that this is different from MA processes, which are always stationary.

6.2 Stationarity of AR processes

It turns out that for any set of parameters {φ_i}_{1≤i≤p}, this process exists. However, it isn't always stationary. The criterion for stationarity is quite simple: An AR(p) process is stationary if and only if all roots of the characteristic equation Φ_p(z) = 0 have modulus greater than 1. In that case, the process is uniquely defined by equation (8).

In other words, if z_1, ..., z_p are the roots of the characteristic equation, we need |z_i| > 1 for all i ∈ {1, ..., p}. Note that the z_i have to be thought of as complex numbers. Let's see what might go wrong when φ = 1 by looking at simple random walk:

Example 6.1. Is the AR(3) process defined by

X_t = X_{t−2} + X_{t−3} + Z_t

stationary? We can rewrite the equation above as Φ_3(B) X_t = Z_t, where Φ_3(z) = 1 − z^2 − z^3. Therefore, we need to find the roots of the characteristic polynomial Φ_3(z) = 1 − z^2 − z^3. This is best done with the help of R. First define the vector of coefficients of the polynomial:

> a=c(1,0,-1,-1)
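The transcription of the notes stops here. A natural continuation of this R session (my guess at the next lines, not part of the original) uses polyroot(), which returns the complex roots of the polynomial whose coefficients are listed in increasing order of degree:

> polyroot(a)       % the (complex) roots of 1 - z^2 - z^3
> Mod(polyroot(a))  % their moduli, to be compared with 1

Running this shows that one of the roots (the real one, roughly 0.75) has modulus smaller than 1, so by the criterion above this AR(3) process is not stationary.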


More information

Lecture 1: August 28

Lecture 1: August 28 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 1: August 28 Our broad goal for the first few lectures is to try to understand the behaviour of sums of independent random

More information

conditional cdf, conditional pdf, total probability theorem?

conditional cdf, conditional pdf, total probability theorem? 6 Multiple Random Variables 6.0 INTRODUCTION scalar vs. random variable cdf, pdf transformation of a random variable conditional cdf, conditional pdf, total probability theorem expectation of a random

More information

STAT 248: EDA & Stationarity Handout 3

STAT 248: EDA & Stationarity Handout 3 STAT 248: EDA & Stationarity Handout 3 GSI: Gido van de Ven September 17th, 2010 1 Introduction Today s section we will deal with the following topics: the mean function, the auto- and crosscovariance

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

MATHEMATICS 154, SPRING 2009 PROBABILITY THEORY Outline #11 (Tail-Sum Theorem, Conditional distribution and expectation)

MATHEMATICS 154, SPRING 2009 PROBABILITY THEORY Outline #11 (Tail-Sum Theorem, Conditional distribution and expectation) MATHEMATICS 154, SPRING 2009 PROBABILITY THEORY Outline #11 (Tail-Sum Theorem, Conditional distribution and expectation) Last modified: March 7, 2009 Reference: PRP, Sections 3.6 and 3.7. 1. Tail-Sum Theorem

More information

Notes on Random Processes

Notes on Random Processes otes on Random Processes Brian Borchers and Rick Aster October 27, 2008 A Brief Review of Probability In this section of the course, we will work with random variables which are denoted by capital letters,

More information

Lecture 2: Review of Probability

Lecture 2: Review of Probability Lecture 2: Review of Probability Zheng Tian Contents 1 Random Variables and Probability Distributions 2 1.1 Defining probabilities and random variables..................... 2 1.2 Probability distributions................................

More information

On 1.9, you will need to use the facts that, for any x and y, sin(x+y) = sin(x) cos(y) + cos(x) sin(y). cos(x+y) = cos(x) cos(y) - sin(x) sin(y).

On 1.9, you will need to use the facts that, for any x and y, sin(x+y) = sin(x) cos(y) + cos(x) sin(y). cos(x+y) = cos(x) cos(y) - sin(x) sin(y). On 1.9, you will need to use the facts that, for any x and y, sin(x+y) = sin(x) cos(y) + cos(x) sin(y). cos(x+y) = cos(x) cos(y) - sin(x) sin(y). (sin(x)) 2 + (cos(x)) 2 = 1. 28 1 Characteristics of Time

More information

STOR 356: Summary Course Notes

STOR 356: Summary Course Notes STOR 356: Summary Course Notes Richard L. Smith Department of Statistics and Operations Research University of North Carolina Chapel Hill, NC 7599-360 rls@email.unc.edu February 19, 008 Course text: Introduction

More information

STAT 443 Final Exam Review. 1 Basic Definitions. 2 Statistical Tests. L A TEXer: W. Kong

STAT 443 Final Exam Review. 1 Basic Definitions. 2 Statistical Tests. L A TEXer: W. Kong STAT 443 Final Exam Review L A TEXer: W Kong 1 Basic Definitions Definition 11 The time series {X t } with E[X 2 t ] < is said to be weakly stationary if: 1 µ X (t) = E[X t ] is independent of t 2 γ X

More information

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Theorems Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

Random Variables. Cumulative Distribution Function (CDF) Amappingthattransformstheeventstotherealline.

Random Variables. Cumulative Distribution Function (CDF) Amappingthattransformstheeventstotherealline. Random Variables Amappingthattransformstheeventstotherealline. Example 1. Toss a fair coin. Define a random variable X where X is 1 if head appears and X is if tail appears. P (X =)=1/2 P (X =1)=1/2 Example

More information

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R In probabilistic models, a random variable is a variable whose possible values are numerical outcomes of a random phenomenon. As a function or a map, it maps from an element (or an outcome) of a sample

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Lecture 3 Stationary Processes and the Ergodic LLN (Reference Section 2.2, Hayashi)

Lecture 3 Stationary Processes and the Ergodic LLN (Reference Section 2.2, Hayashi) Lecture 3 Stationary Processes and the Ergodic LLN (Reference Section 2.2, Hayashi) Our immediate goal is to formulate an LLN and a CLT which can be applied to establish sufficient conditions for the consistency

More information

Multiple Random Variables

Multiple Random Variables Multiple Random Variables This Version: July 30, 2015 Multiple Random Variables 2 Now we consider models with more than one r.v. These are called multivariate models For instance: height and weight An

More information

Lecture 2: Review of Basic Probability Theory

Lecture 2: Review of Basic Probability Theory ECE 830 Fall 2010 Statistical Signal Processing instructor: R. Nowak, scribe: R. Nowak Lecture 2: Review of Basic Probability Theory Probabilistic models will be used throughout the course to represent

More information

Some Time-Series Models

Some Time-Series Models Some Time-Series Models Outline 1. Stochastic processes and their properties 2. Stationary processes 3. Some properties of the autocorrelation function 4. Some useful models Purely random processes, random

More information

1 Presessional Probability

1 Presessional Probability 1 Presessional Probability Probability theory is essential for the development of mathematical models in finance, because of the randomness nature of price fluctuations in the markets. This presessional

More information

Week 12-13: Discrete Probability

Week 12-13: Discrete Probability Week 12-13: Discrete Probability November 21, 2018 1 Probability Space There are many problems about chances or possibilities, called probability in mathematics. When we roll two dice there are possible

More information

Probability Theory and Statistics. Peter Jochumzen

Probability Theory and Statistics. Peter Jochumzen Probability Theory and Statistics Peter Jochumzen April 18, 2016 Contents 1 Probability Theory And Statistics 3 1.1 Experiment, Outcome and Event................................ 3 1.2 Probability............................................

More information

Getting Started with Communications Engineering. Rows first, columns second. Remember that. R then C. 1

Getting Started with Communications Engineering. Rows first, columns second. Remember that. R then C. 1 1 Rows first, columns second. Remember that. R then C. 1 A matrix is a set of real or complex numbers arranged in a rectangular array. They can be any size and shape (provided they are rectangular). A

More information

STAT 520: Forecasting and Time Series. David B. Hitchcock University of South Carolina Department of Statistics

STAT 520: Forecasting and Time Series. David B. Hitchcock University of South Carolina Department of Statistics David B. University of South Carolina Department of Statistics What are Time Series Data? Time series data are collected sequentially over time. Some common examples include: 1. Meteorological data (temperatures,

More information

STA205 Probability: Week 8 R. Wolpert

STA205 Probability: Week 8 R. Wolpert INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and

More information

Basic Probability. Introduction

Basic Probability. Introduction Basic Probability Introduction The world is an uncertain place. Making predictions about something as seemingly mundane as tomorrow s weather, for example, is actually quite a difficult task. Even with

More information

(, ) : R n R n R. 1. It is bilinear, meaning it s linear in each argument: that is

(, ) : R n R n R. 1. It is bilinear, meaning it s linear in each argument: that is 17 Inner products Up until now, we have only examined the properties of vectors and matrices in R n. But normally, when we think of R n, we re really thinking of n-dimensional Euclidean space - that is,

More information

Probability. Paul Schrimpf. January 23, Definitions 2. 2 Properties 3

Probability. Paul Schrimpf. January 23, Definitions 2. 2 Properties 3 Probability Paul Schrimpf January 23, 2018 Contents 1 Definitions 2 2 Properties 3 3 Random variables 4 3.1 Discrete........................................... 4 3.2 Continuous.........................................

More information

Final Review: Problem Solving Strategies for Stat 430

Final Review: Problem Solving Strategies for Stat 430 Final Review: Problem Solving Strategies for Stat 430 Hyunseung Kang December 14, 011 This document covers the material from the last 1/3 of the class. It s not comprehensive nor is it complete (because

More information

Notes on Mathematics Groups

Notes on Mathematics Groups EPGY Singapore Quantum Mechanics: 2007 Notes on Mathematics Groups A group, G, is defined is a set of elements G and a binary operation on G; one of the elements of G has particularly special properties

More information

where r n = dn+1 x(t)

where r n = dn+1 x(t) Random Variables Overview Probability Random variables Transforms of pdfs Moments and cumulants Useful distributions Random vectors Linear transformations of random vectors The multivariate normal distribution

More information

Math 5a Reading Assignments for Sections

Math 5a Reading Assignments for Sections Math 5a Reading Assignments for Sections 4.1 4.5 Due Dates for Reading Assignments Note: There will be a very short online reading quiz (WebWork) on each reading assignment due one hour before class on

More information

The Growth of Functions. A Practical Introduction with as Little Theory as possible

The Growth of Functions. A Practical Introduction with as Little Theory as possible The Growth of Functions A Practical Introduction with as Little Theory as possible Complexity of Algorithms (1) Before we talk about the growth of functions and the concept of order, let s discuss why

More information

If g is also continuous and strictly increasing on J, we may apply the strictly increasing inverse function g 1 to this inequality to get

If g is also continuous and strictly increasing on J, we may apply the strictly increasing inverse function g 1 to this inequality to get 18:2 1/24/2 TOPIC. Inequalities; measures of spread. This lecture explores the implications of Jensen s inequality for g-means in general, and for harmonic, geometric, arithmetic, and related means in

More information

LECTURES 2-3 : Stochastic Processes, Autocorrelation function. Stationarity.

LECTURES 2-3 : Stochastic Processes, Autocorrelation function. Stationarity. LECTURES 2-3 : Stochastic Processes, Autocorrelation function. Stationarity. Important points of Lecture 1: A time series {X t } is a series of observations taken sequentially over time: x t is an observation

More information

Lecture 2: Univariate Time Series

Lecture 2: Univariate Time Series Lecture 2: Univariate Time Series Analysis: Conditional and Unconditional Densities, Stationarity, ARMA Processes Prof. Massimo Guidolin 20192 Financial Econometrics Spring/Winter 2017 Overview Motivation:

More information

18.440: Lecture 28 Lectures Review

18.440: Lecture 28 Lectures Review 18.440: Lecture 28 Lectures 18-27 Review Scott Sheffield MIT Outline Outline It s the coins, stupid Much of what we have done in this course can be motivated by the i.i.d. sequence X i where each X i is

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

Time Series 2. Robert Almgren. Sept. 21, 2009

Time Series 2. Robert Almgren. Sept. 21, 2009 Time Series 2 Robert Almgren Sept. 21, 2009 This week we will talk about linear time series models: AR, MA, ARMA, ARIMA, etc. First we will talk about theory and after we will talk about fitting the models

More information

The Hilbert Space of Random Variables

The Hilbert Space of Random Variables The Hilbert Space of Random Variables Electrical Engineering 126 (UC Berkeley) Spring 2018 1 Outline Fix a probability space and consider the set H := {X : X is a real-valued random variable with E[X 2

More information

Lesson 4: Stationary stochastic processes

Lesson 4: Stationary stochastic processes Dipartimento di Ingegneria e Scienze dell Informazione e Matematica Università dell Aquila, umberto.triacca@univaq.it Stationary stochastic processes Stationarity is a rather intuitive concept, it means

More information

01 Probability Theory and Statistics Review

01 Probability Theory and Statistics Review NAVARCH/EECS 568, ROB 530 - Winter 2018 01 Probability Theory and Statistics Review Maani Ghaffari January 08, 2018 Last Time: Bayes Filters Given: Stream of observations z 1:t and action data u 1:t Sensor/measurement

More information

Lecture 6 Basic Probability

Lecture 6 Basic Probability Lecture 6: Basic Probability 1 of 17 Course: Theory of Probability I Term: Fall 2013 Instructor: Gordan Zitkovic Lecture 6 Basic Probability Probability spaces A mathematical setup behind a probabilistic

More information

Random Variables and Expectations

Random Variables and Expectations Inside ECOOMICS Random Variables Introduction to Econometrics Random Variables and Expectations A random variable has an outcome that is determined by an experiment and takes on a numerical value. A procedure

More information

Math 381 Midterm Practice Problem Solutions

Math 381 Midterm Practice Problem Solutions Math 381 Midterm Practice Problem Solutions Notes: -Many of the exercises below are adapted from Operations Research: Applications and Algorithms by Winston. -I have included a list of topics covered on

More information

Lecture 2: Repetition of probability theory and statistics

Lecture 2: Repetition of probability theory and statistics Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:

More information

Lecture Note 1: Probability Theory and Statistics

Lecture Note 1: Probability Theory and Statistics Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 1: Probability Theory and Statistics Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 For this and all future notes, if you would

More information

A Probability Review

A Probability Review A Probability Review Outline: A probability review Shorthand notation: RV stands for random variable EE 527, Detection and Estimation Theory, # 0b 1 A Probability Review Reading: Go over handouts 2 5 in

More information

Joint Distribution of Two or More Random Variables

Joint Distribution of Two or More Random Variables Joint Distribution of Two or More Random Variables Sometimes more than one measurement in the form of random variable is taken on each member of the sample space. In cases like this there will be a few

More information

Sample Spaces, Random Variables

Sample Spaces, Random Variables Sample Spaces, Random Variables Moulinath Banerjee University of Michigan August 3, 22 Probabilities In talking about probabilities, the fundamental object is Ω, the sample space. (elements) in Ω are denoted

More information

Chapter 2. Some Basic Probability Concepts. 2.1 Experiments, Outcomes and Random Variables

Chapter 2. Some Basic Probability Concepts. 2.1 Experiments, Outcomes and Random Variables Chapter 2 Some Basic Probability Concepts 2.1 Experiments, Outcomes and Random Variables A random variable is a variable whose value is unknown until it is observed. The value of a random variable results

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

1 Basic continuous random variable problems

1 Basic continuous random variable problems Name M362K Final Here are problems concerning material from Chapters 5 and 6. To review the other chapters, look over previous practice sheets for the two exams, previous quizzes, previous homeworks and

More information

Lecture 10: Powers of Matrices, Difference Equations

Lecture 10: Powers of Matrices, Difference Equations Lecture 10: Powers of Matrices, Difference Equations Difference Equations A difference equation, also sometimes called a recurrence equation is an equation that defines a sequence recursively, i.e. each

More information

The Multivariate Normal Distribution 1

The Multivariate Normal Distribution 1 The Multivariate Normal Distribution 1 STA 302 Fall 2017 1 See last slide for copyright information. 1 / 40 Overview 1 Moment-generating Functions 2 Definition 3 Properties 4 χ 2 and t distributions 2

More information

Probability on a Riemannian Manifold

Probability on a Riemannian Manifold Probability on a Riemannian Manifold Jennifer Pajda-De La O December 2, 2015 1 Introduction We discuss how we can construct probability theory on a Riemannian manifold. We make comparisons to this and

More information

Jointly Distributed Random Variables

Jointly Distributed Random Variables Jointly Distributed Random Variables CE 311S What if there is more than one random variable we are interested in? How should you invest the extra money from your summer internship? To simplify matters,

More information

STAT 414: Introduction to Probability Theory

STAT 414: Introduction to Probability Theory STAT 414: Introduction to Probability Theory Spring 2016; Homework Assignments Latest updated on April 29, 2016 HW1 (Due on Jan. 21) Chapter 1 Problems 1, 8, 9, 10, 11, 18, 19, 26, 28, 30 Theoretical Exercises

More information

Lecture 13: Simple Linear Regression in Matrix Format. 1 Expectations and Variances with Vectors and Matrices

Lecture 13: Simple Linear Regression in Matrix Format. 1 Expectations and Variances with Vectors and Matrices Lecture 3: Simple Linear Regression in Matrix Format To move beyond simple regression we need to use matrix algebra We ll start by re-expressing simple linear regression in matrix form Linear algebra is

More information

Discrete Mathematics for CS Spring 2006 Vazirani Lecture 22

Discrete Mathematics for CS Spring 2006 Vazirani Lecture 22 CS 70 Discrete Mathematics for CS Spring 2006 Vazirani Lecture 22 Random Variables and Expectation Question: The homeworks of 20 students are collected in, randomly shuffled and returned to the students.

More information

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adopted from Prof. H.R. Rabiee s and also Prof. R. Gutierrez-Osuna

More information

Math 300: Final Exam Practice Solutions

Math 300: Final Exam Practice Solutions Math 300: Final Exam Practice Solutions 1 Let A be the set of all real numbers which are zeros of polynomials with integer coefficients: A := {α R there exists p(x) = a n x n + + a 1 x + a 0 with all a

More information

Brief Review of Probability

Brief Review of Probability Brief Review of Probability Nuno Vasconcelos (Ken Kreutz-Delgado) ECE Department, UCSD Probability Probability theory is a mathematical language to deal with processes or experiments that are non-deterministic

More information

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities PCMI 207 - Introduction to Random Matrix Theory Handout #2 06.27.207 REVIEW OF PROBABILITY THEORY Chapter - Events and Their Probabilities.. Events as Sets Definition (σ-field). A collection F of subsets

More information

Notes for Math 324, Part 19

Notes for Math 324, Part 19 48 Notes for Math 324, Part 9 Chapter 9 Multivariate distributions, covariance Often, we need to consider several random variables at the same time. We have a sample space S and r.v. s X, Y,..., which

More information