Stochastic Processes (Master degree in Engineering) Franco Flandoli


Contents

Preface

Chapter 1. Preliminaries of Probability
1. Transformation of densities
2. About covariance matrices
3. Gaussian vectors

Chapter 2. Stochastic processes. Generalities
1. Discrete time stochastic process
2. Stationary processes
3. Time series and empirical quantities
4. Gaussian processes
5. Discrete time Fourier transform
6. Power spectral density
7. Fundamental theorem on PSD
8. Signal to noise ratio
9. An ergodic theorem

Chapter 3. ARIMA models
1. Definitions
2. Stationarity, ARMA and ARIMA processes
3. Correlation function
4. Power spectral density


Preface

These notes are planned to be the last part of a course of Probability and Stochastic Processes. The first part is devoted to the introduction to the following topics, taken for instance from the book of Baldi (Italian language) or Billingsley (in English):

- Probability space $(\Omega, \mathcal{F}, P)$
- Conditional probability and independence of events
- Factorization formula and Bayes formula
- Concept of random variable, random vector $X = (X_1, \dots, X_n)$
- Law of a r.v., probability density (discrete and continuous)
- Distribution function and quantiles
- Joint law of a vector and marginal laws, relations
- (Transformation of densities and moments) (see complements below)
- Expectation, properties
- Moments, variance, standard deviation, properties
- Covariance and correlation coefficient, covariance matrix
- Generating function and characteristic function
- (Discrete r.v.: Bernoulli, binomial, Poisson, geometric)
- Continuous r.v.: uniform, exponential, Gaussian, Weibull, Gamma
- Notions of convergence of r.v.
- (Limit theorems: LLN, CLT; Chebyshev inequality.)

Since we need some more specialized material, Chapter 1 is a complement to this list of items.


CHAPTER 1

Preliminaries of Probability

1. Transformation of densities

Exercise 1. If $X$ has cdf $F_X(x)$ and $g$ is increasing and continuous, then $Y = g(X)$ has cdf $F_Y(y) = F_X(g^{-1}(y))$ for all $y$ in the image of $g$. If $g$ is decreasing and continuous, the formula is $F_Y(y) = 1 - F_X(g^{-1}(y))$.

Exercise 2. If $X$ has continuous pdf $f_X(x)$ and $g$ is increasing and differentiable, then $Y = g(X)$ has pdf
\[ f_Y(y) = \frac{f_X(g^{-1}(y))}{g'(g^{-1}(y))} = \left.\frac{f_X(x)}{g'(x)}\right|_{y = g(x)} \]
for all $y$ in the image of $g$. If $g$ is decreasing and differentiable, the formula is
\[ f_Y(y) = -\left.\frac{f_X(x)}{g'(x)}\right|_{y = g(x)}. \]
Thus, in general, we have the following result.

Proposition 1. If $g$ is monotone and differentiable, the transformation of densities is given by
\[ f_Y(y) = \left.\frac{f_X(x)}{|g'(x)|}\right|_{y = g(x)}. \]

Remark 1. Under proper assumptions, when $g$ is not injective the formula generalizes to
\[ f_Y(y) = \sum_{x:\, y = g(x)} \frac{f_X(x)}{|g'(x)|}. \]

Remark 2. A second proof of the previous formula comes from the following characterization of the density: $f_X$ is the density of $X$ if and only if
\[ E[h(X)] = \int_{\mathbb{R}} h(x)\, f_X(x)\, dx \]
for all continuous bounded functions $h$. Let us use this fact to prove that $f_Y(y) = \left.\frac{f_X(x)}{|g'(x)|}\right|_{y = g(x)}$ is the density of $Y = g(X)$. Let us compute $E[h(Y)]$ for a generic continuous bounded function $h$.

We have, from the definition of $Y$ and from the characterization applied to $X$,
\[ E[h(Y)] = E[h(g(X))] = \int_{\mathbb{R}} h(g(x))\, f_X(x)\, dx. \]
Let us change variable, $y = g(x)$, under the assumption that $g$ is monotone, bijective and differentiable. We have $x = g^{-1}(y)$ and $dx = \frac{dy}{|g'(g^{-1}(y))|}$ (we put the absolute value since we do not change the extremes of integration, but just rewrite $\int_{\mathbb{R}}$), so that
\[ \int_{\mathbb{R}} h(g(x))\, f_X(x)\, dx = \int_{\mathbb{R}} h(y)\, \frac{f_X(g^{-1}(y))}{|g'(g^{-1}(y))|}\, dy. \]
If we set $f_Y(y) := \left.\frac{f_X(x)}{|g'(x)|}\right|_{y = g(x)}$, we have proved that
\[ E[h(Y)] = \int_{\mathbb{R}} h(y)\, f_Y(y)\, dy \]
for every continuous bounded function $h$. By the characterization, this implies that $f_Y(y)$ is the density of $Y$. This proof is thus based on the change of variable formula.

Remark 3. The same proof works in the multidimensional case, using the change of variable formula for multiple integrals. Recall that in place of $dy = g'(x)\, dx$ one has to use $dy = |\det Dg(x)|\, dx$, where $Dg$ is the Jacobian (the matrix of first derivatives) of the transformation $g : \mathbb{R}^n \to \mathbb{R}^n$. In fact we need the inverse transformation, so we use the corresponding formula
\[ dx = |\det Dg^{-1}(y)|\, dy = \frac{dy}{|\det Dg(g^{-1}(y))|}. \]
With the same passages performed above, one gets the following result.

Proposition 2. If $g$ is a differentiable bijection and $Y = g(X)$, then
\[ f_Y(y) = \left.\frac{f_X(x)}{|\det Dg(x)|}\right|_{y = g(x)}. \]

Exercise 3. If $X$ (in $\mathbb{R}^n$) has density $f_X(x)$ and $Y = UX$, where $U$ is an orthogonal linear transformation of $\mathbb{R}^n$ (it means that $U^{-1} = U^T$), then $Y$ has density
\[ f_Y(y) = f_X(U^T y). \]

1.1. Linear transformation of moments. The solution of the following exercises is based on the linearity of the expected value (and thus of the covariance in each argument).

Exercise 4. Let $X = (X_1, \dots, X_n)$ be a random vector, $A$ a $d \times n$ matrix, and $Y = AX$. Let $\mu = (\mu_1, \dots, \mu_n)$ be the vector of mean values of $X$, namely $\mu_i = E[X_i]$. Then $\mu_Y := A\mu$ is the vector of mean values of $Y$, namely $(\mu_Y)_i = E[Y_i]$.

Exercise 5. Under the same assumptions, if $Q_X$ and $Q_Y$ are the covariance matrices of $X$ and $Y$, then
\[ Q_Y = A\, Q_X\, A^T. \]
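The following short R sketch (an addition, not part of the original notes; the matrix $A$, the law of $X$ and the sample size are arbitrary choices) checks Exercises 4 and 5 numerically.

# Added sketch: numerical check of Exercises 4 and 5.
set.seed(1)
A <- matrix(c(1, 2, 0, -1, 3, 1), nrow = 3, ncol = 2)   # a 3 x 2 matrix, so Y = A X has dimension 3
X <- rbind(rnorm(50000, mean = 1), rexp(50000) - 1)     # X_1 ~ N(1,1), X_2 ~ Exp(1) - 1, independent
Y <- A %*% X
rowMeans(Y);  A %*% rowMeans(X)                         # mu_Y approx equal to A mu_X
cov(t(Y));    A %*% cov(t(X)) %*% t(A)                  # Q_Y  approx equal to A Q_X A^T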

2. About covariance matrices

The covariance matrix $Q$ of a vector $X = (X_1, \dots, X_n)$, defined as $Q_{ij} = \mathrm{Cov}(X_i, X_j)$, is symmetric:
\[ Q_{ij} = \mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i) = Q_{ji}, \]
and non-negative definite:
\[ x^T Q x = \sum_{i,j=1}^n Q_{ij}\, x_i x_j = \sum_{i,j=1}^n \mathrm{Cov}(x_i X_i, x_j X_j) = \mathrm{Cov}\Big(\sum_{i=1}^n x_i X_i,\; \sum_{j=1}^n x_j X_j\Big) = \mathrm{Var}[W] \ge 0, \]
where $W = \sum_{i=1}^n x_i X_i$.

The spectral theorem states that any symmetric matrix $Q$ can be diagonalized, namely there exists an orthonormal basis $e_1, \dots, e_n$ of $\mathbb{R}^n$ in which $Q$ takes the form
\[ Q_e = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix}. \]
Moreover, the numbers $\lambda_i$ are eigenvalues of $Q$, and the vectors $e_i$ are corresponding eigenvectors. Since the covariance matrix $Q$ is also non-negative definite, we have
\[ \lambda_i \ge 0, \qquad i = 1, \dots, n. \]

Remark 4. To understand this theorem better, recall a few facts of linear algebra. $\mathbb{R}^n$ is a vector space with a scalar product $\langle\cdot,\cdot\rangle$, namely a set of elements (called vectors) with certain operations (sum of vectors, multiplication by real numbers, scalar product between vectors) and properties. We may call intrinsic the objects defined in these terms, as opposed to the objects defined by means of numbers, with respect to a given basis. A vector $x \in \mathbb{R}^n$ is an intrinsic object; but we can write it as a sequence of numbers $(x_1, \dots, x_n)$ in infinitely many ways, depending on the basis we choose. Given an orthonormal basis $u_1, \dots, u_n$, the components of a vector $x \in \mathbb{R}^n$ in this basis are the numbers $\langle x, u_j\rangle$, $j = 1, \dots, n$. A linear map $L$ in $\mathbb{R}^n$, given the basis $u_1, \dots, u_n$, can be represented by the matrix of components $\langle L u_i, u_j\rangle$. We shall write $y^T x$ for $\langle x, y\rangle$ (or $\langle y, x\rangle$).

Remark 5. After these general comments, we see that a matrix represents a linear transformation, given a basis. Thus, given the canonical basis of $\mathbb{R}^n$, which we shall denote by $u_1, \dots, u_n$, the matrix $Q$ defines a linear transformation $L$ from $\mathbb{R}^n$ to $\mathbb{R}^n$. The spectral theorem states that there is a new orthonormal basis $e_1, \dots, e_n$ of $\mathbb{R}^n$ such that, if $Q_e$ represents the linear transformation $L$ in this new basis, then $Q_e$ is diagonal.

Remark 6. Let us recall more facts about linear algebra. Start with an orthonormal basis $u_1, \dots, u_n$, which we call the canonical or original basis. Let $e_1, \dots, e_n$ be another orthonormal basis.

The vector $u_1$, in the canonical basis, has components $u_1 = (1, 0, \dots, 0)^T$, and so on for the other vectors. Each vector $e_j$ has certain components in the canonical basis. Denote by $U$ the matrix whose first column has the same components as $e_1$ (in the canonical basis), and so on for the other columns. We could write $U = (e_1, \dots, e_n)$. Also, $U_{ij} = e_j^T u_i$. Then
\[ U \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = e_1 \]
and so on, namely $U$ represents the linear map which maps the canonical (original) basis of $\mathbb{R}^n$ into $e_1, \dots, e_n$. This is an orthogonal transformation: $U^{-1} = U^T$. Indeed, $U^{-1}$ maps $e_1, \dots, e_n$ into the canonical basis (by the above property of $U$), and $U^T$ does the same:
\[ U^T e_1 = \begin{pmatrix} e_1^T \\ \vdots \\ e_n^T \end{pmatrix} e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \]
and so on.

Remark 7. Let us now go back to the covariance matrix $Q$ and the matrix $Q_e$ given by the spectral theorem: $Q_e$ is a diagonal matrix which represents the same linear transformation $L$ in the new basis $e_1, \dots, e_n$. Assume we do not know anything else, except that they describe the same map $L$ and that $Q_e$ is diagonal, namely of the form
\[ Q_e = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix}. \]
Let us deduce a number of facts:

i) $Q_e = U^T Q\, U$ (equivalently, $Q = U Q_e\, U^T$);

ii) the diagonal elements $\lambda_j$ are eigenvalues of $L$, with eigenvectors $e_j$;

iii) $\lambda_j \ge 0$, $j = 1, \dots, n$.

To prove (i), recall from above that
\[ (Q_e)_{ij} = e_j^T L\, e_i \qquad \text{and} \qquad Q_{ij} = u_j^T L\, u_i. \]

Moreover, $U_{ij} = e_j^T u_i$, hence $e_j = \sum_{k=1}^n U_{kj}\, u_k$, and thus
\[ (Q_e)_{ij} = e_j^T L\, e_i = \sum_{k, k'=1}^n U_{k'j}\, U_{ki}\, u_{k'}^T L\, u_k = \sum_{k, k'=1}^n U_{k'j}\, Q_{k'k}\, U_{ki} = \big(U^T Q\, U\big)_{ij}. \]
To prove (ii), let us write the vector $L e_1$ in the basis $e_1, \dots, e_n$: in this basis $e_1$ is represented by $(1, 0, \dots, 0)^T$ and the map $L$ is represented by $Q_e$, hence $L e_1$ is represented by
\[ \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} \lambda_1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \lambda_1 \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \]
which is $\lambda_1 e_1$ in the basis $e_1, \dots, e_n$. We have checked that $L e_1 = \lambda_1 e_1$, namely that $\lambda_1$ is an eigenvalue and $e_1$ a corresponding eigenvector. The proof for $\lambda_2$, etc., is the same. To prove (iii), just note that
\[ \lambda_j = (Q_e)_{jj} = e_j^T Q\, e_j \ge 0, \]
having used the property that $Q$ is non-negative definite.

3. Gaussian vectors

Recall that a Gaussian, or Normal, r.v. $N(\mu, \sigma^2)$ is a r.v. with probability density
\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{|x-\mu|^2}{2\sigma^2} \right). \]
We have shown that $\mu$ is the mean value and $\sigma^2$ the variance. The standard Normal is the case $\mu = 0$, $\sigma^2 = 1$. If $Z$ is a standard normal r.v., then $\mu + \sigma Z$ is $N(\mu, \sigma^2)$. We may give the definition of Gaussian vector in two ways, generalizing either the expression of the density or the property that $\mu + \sigma Z$ is $N(\mu, \sigma^2)$. Let us start with a lemma.

Lemma 1. Given a vector $\mu = (\mu_1, \dots, \mu_n)$ and a symmetric positive definite $n \times n$ matrix $Q$ (namely $v^T Q v > 0$ for all $v \ne 0$), consider the function
\[ f(x) = \frac{1}{\sqrt{(2\pi)^n \det(Q)}} \exp\left( -\frac12 (x-\mu)^T Q^{-1} (x-\mu) \right), \]
where $x = (x_1, \dots, x_n) \in \mathbb{R}^n$. Notice that the inverse $Q^{-1}$ is well defined for positive definite matrices, $(x-\mu)^T Q^{-1}(x-\mu)$ is a positive quantity for $x \ne \mu$, and $\det(Q)$ is a positive number. Then:

i) $f(x)$ is a probability density;

ii) if $X = (X_1, \dots, X_n)$ is a random vector with such a joint probability density, then $\mu$ is the vector of mean values, namely $\mu_i = E[X_i]$, and $Q$ is the covariance matrix: $Q_{ij} = \mathrm{Cov}(X_i, X_j)$.

Proof. Step 1. In this step we explain the meaning of the expression $f(x)$. We have recalled above that any symmetric matrix $Q$ can be diagonalized, namely there exists an orthonormal basis $e_1, \dots, e_n$ of $\mathbb{R}^n$ in which $Q$ takes the form
\[ Q_e = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix}. \]
Moreover, the numbers $\lambda_i$ are eigenvalues of $Q$, and the vectors $e_i$ are corresponding eigenvectors. See above for more details. Let $U$ be the matrix introduced there, such that $U^{-1} = U^T$, and recall the relation $Q_e = U^T Q\, U$. Since $v^T Q v > 0$ for all $v \ne 0$, we deduce
\[ \lambda_i = e_i^T Q\, e_i > 0, \qquad i = 1, \dots, n. \]
Therefore the matrix $Q_e$ is invertible, with inverse
\[ Q_e^{-1} = \begin{pmatrix} \lambda_1^{-1} & & \\ & \ddots & \\ & & \lambda_n^{-1} \end{pmatrix}. \]
It follows that $Q$, being equal to $U Q_e\, U^T$ (this relation comes from $Q_e = U^T Q\, U$), is also invertible, with inverse $Q^{-1} = U Q_e^{-1} U^T$. Easily one gets $(x-\mu)^T Q^{-1}(x-\mu) > 0$ for $x \ne \mu$. Moreover,
\[ \det(Q) = \det(U)\det(Q_e)\det(U^T) = \det(Q_e) = \lambda_1 \cdots \lambda_n > 0, \]
because $\det(U)^2 = \det(U^T U) = \det(I) = 1$, i.e. $|\det(U)| = 1$ (a property to be used also in Exercise 3). Therefore $\det(Q) > 0$. The formula for $f(x)$ is meaningful and defines a positive function.

Step 2. Let us prove that $f(x)$ is a density. By the theorem of change of variables in multidimensional integrals, with the change of variable $x = U y$ (whose Jacobian is the linear map $U$ itself, with $|\det U| = 1$),
\[ \int_{\mathbb{R}^n} f(x)\, dx = \int_{\mathbb{R}^n} f(Uy)\, dy. \]

Now, since $U^T Q^{-1} U = Q_e^{-1}$, $f(Uy)$ is equal to the following function:
\[ f_e(y) = \frac{1}{\sqrt{(2\pi)^n \det(Q_e)}} \exp\left( -\frac12 (y-\mu_e)^T Q_e^{-1} (y-\mu_e) \right), \qquad \text{where } \mu_e = U^T\mu. \]
Since
\[ (y-\mu_e)^T Q_e^{-1} (y-\mu_e) = \sum_{i=1}^n \frac{(y_i - (\mu_e)_i)^2}{\lambda_i} \]
and $\det(Q_e) = \lambda_1 \cdots \lambda_n$, we get
\[ f_e(y) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left( -\frac{(y_i - (\mu_e)_i)^2}{2\lambda_i} \right). \]
Namely, $f_e(y)$ is the product of $n$ Gaussian densities $N((\mu_e)_i, \lambda_i)$. We know from the theory of joint probability densities that the product of densities is the joint density of a vector with independent components. Hence $f_e(y)$ is a probability density, and therefore $\int_{\mathbb{R}^n} f_e(y)\, dy = 1$. This proves $\int_{\mathbb{R}^n} f(x)\, dx = 1$, so that $f$ is a probability density.

Step 3. Let $X = (X_1, \dots, X_n)$ be a random vector with joint probability density $f$, written in the original basis. Let $Y = U^T X$. Then (Exercise 3) $Y$ has density $f_Y(y) = f(Uy)$. Thus
\[ f_Y(y) = f_e(y) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left( -\frac{(y_i - (\mu_e)_i)^2}{2\lambda_i} \right). \]
Thus $(Y_1, \dots, Y_n)$ are independent $N((\mu_e)_i, \lambda_i)$ r.v.'s and therefore
\[ E[Y_i] = (\mu_e)_i, \qquad \mathrm{Cov}(Y_i, Y_j) = \delta_{ij}\,\lambda_i. \]
From Exercises 4 and 5 we deduce that $X = UY$ has mean and covariance
\[ \mu_X = U \mu_Y, \qquad Q_X = U Q_Y\, U^T. \]
Since $\mu_Y = \mu_e$ and $\mu_e = U^T\mu$, we readily deduce $\mu_X = U U^T \mu = \mu$. Since $Q_Y = Q_e$ and $Q = U Q_e\, U^T$, we get $Q_X = Q$. The proof is complete.

Definition 1. Given a vector $\mu = (\mu_1, \dots, \mu_n)$ and a symmetric positive definite $n \times n$ matrix $Q$, we call Gaussian vector of mean $\mu$ and covariance $Q$ a random vector $X = (X_1, \dots, X_n)$ having joint probability density function
\[ f(x) = \frac{1}{\sqrt{(2\pi)^n \det(Q)}} \exp\left( -\frac12 (x-\mu)^T Q^{-1} (x-\mu) \right), \]
where $x = (x_1, \dots, x_n) \in \mathbb{R}^n$. We write $X \sim N(\mu, Q)$.
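A short R sketch (added for illustration; the matrix $Q$, the mean $\mu$ and the evaluation point are arbitrary choices) mirrors Step 2 of the proof of Lemma 1: the density of Definition 1, computed from the formula, coincides with the product of one-dimensional Gaussian densities $N((\mu_e)_i, \lambda_i)$ evaluated in the eigenbasis of $Q$.

# Added sketch: f(x) from the defining formula equals the product of 1D densities in the eigenbasis.
Q  <- matrix(c(2, 0.8, 0.8, 1), 2, 2)      # a symmetric positive definite covariance matrix
mu <- c(1, -1)
x  <- c(0.3, 0.5)                          # point at which the density is evaluated
f  <- exp(-0.5 * t(x - mu) %*% solve(Q) %*% (x - mu)) / sqrt((2*pi)^2 * det(Q))
e  <- eigen(Q)                             # columns of e$vectors are the eigenvectors e_1, e_2
y    <- t(e$vectors) %*% x                 # y = U^T x
mu_e <- t(e$vectors) %*% mu                # mu_e = U^T mu
f_e  <- prod(dnorm(y, mean = mu_e, sd = sqrt(e$values)))
c(direct = f, eigenbasis = f_e)            # the two numbers coincide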

The only drawback of this definition is the restriction to strictly positive definite matrices $Q$. It is sometimes useful to have the notion of Gaussian vector also in the case when $Q$ is only non-negative definite (sometimes called the degenerate case). For instance, we shall see that any linear transformation of a Gaussian vector is a Gaussian vector, but in order to state this theorem in full generality we need to consider also the degenerate case. In order to give a more general definition, let us take the idea recalled above for the one-dimensional case: affine transformations of Gaussian r.v.'s are Gaussian.

Definition 2. i) The standard $d$-dimensional Gaussian vector is the random vector $Z = (Z_1, \dots, Z_d)$ with joint probability density
\[ f(z_1, \dots, z_d) = \prod_{i=1}^d p(z_i), \qquad \text{where } p(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}. \]
ii) All other Gaussian vectors $X = (X_1, \dots, X_n)$ (in any dimension $n$) are obtained from standard ones by affine transformations: $X = AZ + b$, where $A$ is a matrix and $b$ a vector. If $X$ has dimension $n$, we require $A$ to be $n \times d$ and $b$ to have dimension $n$ (but $n$ can be different from $d$).

[Figure: graph of the density of a standard 2-dimensional Gaussian vector.]

The graph of the density of the other Gaussian vectors can be guessed by linear deformations of the base plane $xy$ (deformations defined by $A$) and a shift (by $b$). For instance, if $A$ is the matrix which enlarges the $x$ axis by a factor 2, we get a correspondingly stretched graph.

[Figure: graph of the density of the deformed Gaussian vector.]

First, let us compute the mean and covariance matrix of a vector of the form $X = AZ + b$, with $Z$ of standard type. From Exercises 4 and 5 we readily have:

Proposition 3. The mean $\mu$ and covariance matrix $Q$ of a vector of the previous form are given by
\[ \mu = b, \qquad Q = AA^T. \]

When two different definitions are given for the same object, one has to prove their equivalence. If $Q$ is positive definite, the two definitions aim to describe the same object; but for $Q$ non-negative definite and not strictly positive definite we have only the second definition, so there is no compatibility to check.

Proposition 4. If $Q$ is positive definite, then Definitions 1 and 2 are equivalent. More precisely, if $X = (X_1, \dots, X_n)$ is a Gaussian random vector with mean $\mu$ and covariance $Q$ in the sense of Definition 1, then there exists a standard Gaussian random vector $Z = (Z_1, \dots, Z_n)$ and an $n \times n$ matrix $A$ such that
\[ X = AZ + \mu. \]
One can take $A = \sqrt{Q}$, as described in the proof. Vice versa, if $X = (X_1, \dots, X_n)$ is a Gaussian random vector in the sense of Definition 2, of the form $X = AZ + b$, then $X$ is Gaussian in the sense of Definition 1, with mean $\mu$ and covariance $Q$ given by the previous proposition.

Proof. Let us prove the first claim. Define $\sqrt{Q} = U \sqrt{Q_e}\, U^T$, where $\sqrt{Q_e}$ is simply defined as
\[ \sqrt{Q_e} = \begin{pmatrix} \sqrt{\lambda_1} & & \\ & \ddots & \\ & & \sqrt{\lambda_n} \end{pmatrix}. \]
We have
\[ \sqrt{Q}^{\,T} = U \sqrt{Q_e}^{\,T} U^T = U \sqrt{Q_e}\, U^T = \sqrt{Q}. \]

Moreover,
\[ \sqrt{Q}\,\sqrt{Q} = U \sqrt{Q_e}\, U^T U \sqrt{Q_e}\, U^T = U \sqrt{Q_e}\,\sqrt{Q_e}\, U^T = U Q_e\, U^T = Q, \]
because $\sqrt{Q_e}\,\sqrt{Q_e} = Q_e$. Set
\[ Z = \sqrt{Q}^{\,-1}(X - \mu), \]
where we notice that $\sqrt{Q}$ is invertible, from its definition and the strict positivity of the $\lambda_i$. Then $Z$ is Gaussian. Indeed, from the formula for the transformation of densities,
\[ f_Z(z) = \left.\frac{f(x)}{|\det Dg(x)|}\right|_{z = g(x)}, \]
where $g(x) = \sqrt{Q}^{\,-1}(x - \mu)$; hence $\det Dg(x) = \det \sqrt{Q}^{\,-1} = \frac{1}{\sqrt{\lambda_1 \cdots \lambda_n}}$; therefore
\[ f_Z(z) = \frac{\sqrt{\lambda_1\cdots\lambda_n}}{\sqrt{(2\pi)^n \det(Q)}} \exp\left( -\frac12 \big(\sqrt{Q}\, z\big)^T Q^{-1} \big(\sqrt{Q}\, z\big) \right) = \frac{1}{(2\pi)^{n/2}} \exp\left( -\frac{z^T z}{2} \right), \]
which is the density of a standard Gaussian vector. From the definition of $Z$ we get $X = \sqrt{Q}\, Z + \mu$, so the first claim is proved. The proof of the second claim is a particular case of the next exercise, which we leave to the reader.

Exercise 6. Let $X = (X_1, \dots, X_n)$ be a Gaussian random vector, $B$ an $m \times n$ matrix, $c$ a vector of $\mathbb{R}^m$. Then $Y = BX + c$ is a Gaussian random vector of dimension $m$. The relations between means and covariances are
\[ \mu_Y = B\mu_X + c, \qquad Q_Y = B\, Q_X\, B^T. \]

Remark 8. We see from the exercise that we may start with a non-degenerate vector $X$ and get a degenerate one $Y$, if $B$ is not a bijection. This always happens when $m > n$.

Remark 9. The law of a Gaussian vector is determined by the mean vector and the covariance matrix. This fundamental fact will be used below when we study stochastic processes.

Remark 10. Some of the previous results are very useful if we want to generate random vectors according to a prescribed Gaussian law. Assume we have a prescribed mean $\mu$ and covariance $Q$, $n$-dimensional, and want to generate a random sample $(x_1, \dots, x_n)$ from such an $N(\mu, Q)$. Then we may generate $n$ independent samples $z_1, \dots, z_n$ from the standard one-dimensional Gaussian law and compute $\sqrt{Q}\, z + \mu$, where $z = (z_1, \dots, z_n)$.

In order to have the entries of the matrix $\sqrt{Q}$, if the software does not provide them directly (certain software packages do), we may use the formula $\sqrt{Q} = U \sqrt{Q_e}\, U^T$. The matrix $\sqrt{Q_e}$ is obvious. In order to get the matrix $U$, recall that its columns are the vectors $e_1, \dots, e_n$ written in the original basis; and such vectors are an orthonormal basis of eigenvectors of $Q$. Thus one has to use at least a software that computes the spectral decomposition of a matrix, to get $e_1, \dots, e_n$.
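A minimal R sketch of this recipe (an addition to the notes; the particular $Q$, $\mu$, seed and sample size are arbitrary): eigen() provides the spectral decomposition, from which $\sqrt{Q}$ is assembled and samples from $N(\mu, Q)$ are produced.

# Added sketch: generate samples from N(mu, Q) via the square root sqrt(Q) = U sqrt(Q_e) U^T.
set.seed(2)
Q  <- matrix(c(2, 0.8, 0.8, 1), 2, 2)
mu <- c(1, -1)
e  <- eigen(Q)                              # spectral decomposition: columns of e$vectors are e_1, ..., e_n
sqrtQ <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)
N  <- 10000
Z  <- matrix(rnorm(2 * N), nrow = 2)        # independent standard Gaussian samples
X  <- sqrtQ %*% Z + mu                      # each column is a sample from N(mu, Q)
rowMeans(X)                                 # approximately mu
cov(t(X))                                   # approximately Q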


CHAPTER 2

Stochastic processes. Generalities

1. Discrete time stochastic process

We call discrete time stochastic process any sequence $X_0, X_1, X_2, \dots, X_n, \dots$ of random variables defined on a probability space $(\Omega, \mathcal{F}, P)$, taking values in $\mathbb{R}$. This definition is not so rigid with respect to small details: the same name is given to sequences $X_1, X_2, \dots, X_n, \dots$, or to the case when the r.v.'s $X_n$ take values in a space different from $\mathbb{R}$. We shall also describe below the case when the time index takes negative values.

The main objects attached to a r.v. are its law, its first and second moments (and possibly higher order moments and the characteristic or generating function, and the distribution function). We do the same for a process $(X_n)_{n \ge 0}$: the probability density of the r.v. $X_n$, when it exists, will be denoted by $f_n(x)$, the mean by $\mu_n$, the standard deviation by $\sigma_n$. Often we shall write $t$ in place of $n$, but nevertheless here $t$ will always be a non-negative integer. So, our first concepts are:

i) the mean function and the variance function:
\[ \mu_t = E[X_t], \qquad \sigma_t^2 = \mathrm{Var}[X_t], \qquad t = 0, 1, 2, \dots \]

In addition, the time-correlation is very important. We introduce three functions:

ii) the autocovariance function $C(t,s)$, $t, s = 0, 1, 2, \dots$:
\[ C(t,s) = E[(X_t - \mu_t)(X_s - \mu_s)] \]
and the function
\[ R(t,s) = E[X_t X_s] \]
(the name will be discussed below). They are symmetric ($R(t,s) = R(s,t)$, and the same for $C(t,s)$), so it is sufficient to know them for $t \ge s$. We have
\[ C(t,s) = R(t,s) - \mu_t\mu_s, \qquad C(t,t) = \sigma_t^2. \]
In particular, when $\mu_t \equiv 0$ (which is often the case), $C(t,s) = R(t,s)$. Most of the importance will be given to $\mu_t$ and $R(t,s)$. In addition, let us introduce:

iii) the autocorrelation function
\[ \rho(t,s) = \frac{C(t,s)}{\sigma_t\,\sigma_s}. \]
We have $\rho(t,t) = 1$ and $|\rho(t,s)| \le 1$.

The functions $C(t,s)$, $R(t,s)$, $\rho(t,s)$ are used to detect repetitions in the process, self-similarities under time shift. For instance, if $(X_n)_{n\ge 0}$ is roughly periodic of period $P$, $\rho(t+P, t)$ will be significantly higher than the other values of $\rho(t,s)$ (except $\rho(t,t)$, which is always equal to 1).

Also a trend is a form of repetition, of self-similarity under time shift; indeed, when there is a trend, all values of $\rho(t,s)$ are quite high compared to the cases without trend. See the numerical example below.

Other objects (when defined) related to the time structure are:

iv) the joint probability density of the vector $(X_{t_1}, \dots, X_{t_n})$:
\[ f_{t_1, \dots, t_n}(x_1, \dots, x_n), \qquad t_1 > t_2 > \dots > t_n \ge 0, \]
and

v) the conditional density
\[ f_{t|s}(x|y) = \frac{f_{t,s}(x,y)}{f_s(y)}, \qquad t > s. \]

Now, a remark about the name of $R(t,s)$. In Statistics and Time Series Analysis, the name autocorrelation function is given to $\rho(t,s)$, as we said above. But in certain disciplines related to signal processing, $R(t,s)$ is called the autocorrelation function. There is no special reason, except the fact that $R(t,s)$ is the fundamental quantity to be understood and investigated, the others ($C(t,s)$ and $\rho(t,s)$) being simple transformations of $R(t,s)$. Thus $R(t,s)$ is given the name which most reminds one of the concept of self-relation between values of the process at different times. In the sequel we shall use both languages, and sometimes we shall call $\rho(t,s)$ the autocorrelation coefficient.

The last object we introduce is concerned with two processes simultaneously, $(X_n)_{n\ge 0}$ and $(Y_n)_{n\ge 0}$. It is called:

vi) the cross-correlation function
\[ C_{X,Y}(t,s) = E\big[(X_t - E[X_t])(Y_s - E[Y_s])\big]. \]
This function is a measure of the similarity between two processes, shifted in time. For instance, it can be used for the following purpose: one of the two processes, say $Y$, is known and has a known shape of interest for us; the other process, $X$, is the process under investigation, and we would like to detect portions of $X$ which have a shape similar to $Y$. Hence we shift $X$ in all possible ways and compute the correlation with $Y$. When more than one process is investigated, it may be better to write $R_X(t,s)$, $C_X(t,s)$ and so on for the quantities associated to the process $X$.

1.1. Example: white noise. The white noise with intensity $\sigma^2$ is the process $(X_n)_{n\ge 0}$ with the following properties:

i) $X_0, X_1, X_2, \dots, X_n, \dots$ are independent r.v.'s;

ii) $X_n \sim N(0, \sigma^2)$.

It is a very elementary process, with a trivial time-structure, but it will be used as a building block for other classes of processes, or as a comparison object to understand the features of more complex cases. The following picture has been obtained with the R software by the commands x <- rnorm(1000); ts.plot(x).

[Figure: a sample path of white noise.]
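The cross-correlation of point vi) can be illustrated with the same R tools used for the picture above. In the following added sketch (the signal shape, the delay of 20 steps and the noise level are arbitrary choices, not from the notes), a copy of a known signal, delayed in time and corrupted by white noise, is recovered by ccf().

# Added sketch: detecting a shifted known shape inside a noisy series with the cross-correlation.
set.seed(3)
y <- sin((1:200) / 5)                      # the known shape Y
x <- c(rnorm(20), y) + rnorm(220, sd = 0.5)
x <- x[1:200]                              # X contains Y delayed by 20 steps, buried in noise
ccf(x, y, lag.max = 40)                    # a peak near lag 20 reveals the shifted similarity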

Let us compute all the relevant quantities of the white noise (the check is left as an exercise):
\[ \mu_t = 0, \qquad \sigma_t = \sigma, \qquad R(t,s) = C(t,s) = \sigma^2\,\delta(t-s), \qquad \rho(t,s) = \delta(t-s), \]
where the symbol $\delta(t-s)$ denotes $0$ for $t \ne s$ and $1$ for $t = s$;
\[ f_{t_1,\dots,t_n}(x_1,\dots,x_n) = \prod_{i=1}^n p_\sigma(x_i), \qquad \text{where } p_\sigma(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/(2\sigma^2)}, \]
\[ f_{t|s}(x|y) = p_\sigma(x). \]

1.2. Example: random walk. Let $(W_n)_{n\ge 0}$ be a white noise (or, more generally, a process with independent identically distributed $W_0, W_1, W_2, \dots$). Set
\[ X_0 = 0, \qquad X_{n+1} = X_n + W_n, \quad n \ge 0. \]
This is a random walk. White noise has been used as a building block: $(X_n)_{n\ge 0}$ is the solution of a recursive linear equation, driven by white noise (we shall see more general examples later on). The following picture has been obtained with the R software by the commands x <- rnorm(1000); y <- cumsum(x); ts.plot(y).

[Figure: a sample path of the random walk.]

The random variables $X_n$ are not independent ($X_{n+1}$ obviously depends on $X_n$). One has
\[ X_{n+1} = \sum_{i=0}^{n} W_i. \]
We have the following facts: $\mu_n = 0$ and $\sigma_n = \sigma\sqrt{n}$. We prove them by means of the iterative relation (this generalizes better to more complex discrete linear equations). First, $\mu_0 = 0$ and $\mu_{n+1} = \mu_n$, hence $\mu_n = 0$ for every $n$. By induction, $X_n$ and $W_n$ are independent for every $n$, hence the variance can be computed recursively, as in the following exercise.

Exercise 7. Denote by $\sigma^2$ the intensity of the white noise; find a relation between $\sigma_{n+1}^2$ and $\sigma_n^2$ and prove that
\[ \sigma_n = \sigma\sqrt{n}, \qquad n \ge 0. \]

An intuitive interpretation of the result of the exercise is that $X_n$ behaves as $\sigma\sqrt{n}$, in a very rough way. As to the time-dependent structure, $C(t,s) = R(t,s)$, and:

Exercise 8. Prove that $R(m,n) = \sigma^2 n$ for all $m \ge n$ (prove it for $m = n$, $m = n+1$, $m = n+2$ and extend). Then prove that
\[ \rho(m,n) = \sqrt{\frac{n}{m}}. \]

The result of this exercise implies that
\[ \rho(m, n_0) \to 0 \qquad \text{as } m \to \infty. \]
We may interpret this result by saying that the random walk loses memory of the initial positions.

2. Stationary processes

A process is called wide-sense stationary if $\mu_t$ and $R(t+n, t)$ are independent of $t$ (for every $n \ge 0$). It follows that also $\sigma_t$, $C(t+n, t)$ and $\rho(t+n, t)$ are independent of $t$. Thus we speak of:

i) the mean $\mu$;

ii) the standard deviation $\sigma$;

iii) the covariance function $C(n) := C(n, 0)$;

iv) the autocorrelation function (in the improper sense described above) $R(n) := R(n, 0)$;

v) the autocorrelation coefficient (or also autocorrelation function, in the language of Statistics) $\rho(n) := \rho(n, 0)$.

A process is called strongly stationary if the law of the generic vector $(X_{n_1+t}, \dots, X_{n_k+t})$ is independent of $t$. This implies wide-sense stationarity. The converse is not true in general, but it is true for Gaussian processes (see below).

2.1. Example: white noise. We have $R(t,s) = \sigma^2\delta(t-s)$, hence
\[ R(n) = \sigma^2\delta(n). \]

2.2. Example: linear equation with damping. Consider the recurrence relation
\[ X_{n+1} = \alpha X_n + W_n, \qquad n \ge 0, \]
where $(W_n)_{n\ge 0}$ is a white noise with intensity $\sigma^2$ and $\alpha \in (-1, 1)$. The following picture has been obtained with the R software by the commands ($\alpha = 0.9$, $\sigma = 1$):

w <- rnorm(1000)
x <- rnorm(1000)
x[1] <- 0
for (i in 1:999) { x[i+1] <- 0.9*x[i] + w[i] }
ts.plot(x)

[Figure: a sample path of the solution of the damped linear equation.]

It has some features similar to white noise, but it is less random, more persistent in the direction where it moves. Let $X_0$ be a r.v. independent of the white noise, with zero average and variance $\widetilde\sigma^2$. Let us show that $(X_n)_{n\ge 0}$ is stationary (in the wide sense) if $\widetilde\sigma^2$ is properly chosen with respect to $\sigma^2$.

First we have $\mu_0 = 0$ and $\mu_{n+1} = \alpha\mu_n$, hence $\mu_n = 0$ for every $n$: the mean function is constant. As a preliminary computation, let us impose that the variance function is constant. By induction, $X_n$ and $W_n$ are independent for every $n$, hence
\[ \sigma_{n+1}^2 = \alpha^2\sigma_n^2 + \sigma^2, \qquad n \ge 0. \]
If we want $\sigma_{n+1}^2 = \sigma_n^2$ for every $n$, we need
\[ \sigma_n^2 = \alpha^2\sigma_n^2 + \sigma^2, \qquad \text{namely} \qquad \sigma_n^2 = \frac{\sigma^2}{1-\alpha^2}, \qquad n \ge 0. \]
In particular, this implies the relation
\[ \widetilde\sigma^2 = \frac{\sigma^2}{1-\alpha^2}. \]
It is here that we first see the importance of the condition $|\alpha| < 1$. If we assume this condition on the law of $X_0$, then we find
\[ \sigma_1^2 = \alpha^2\widetilde\sigma^2 + \sigma^2 = \frac{\sigma^2}{1-\alpha^2} = \widetilde\sigma^2 \]
and so on, $\sigma_{n+1}^2 = \sigma_n^2$ for every $n$. Thus the variance function is constant. Finally, we have to show that $R(t+n, t)$ is independent of $t$. We have
\[ R(t+1, t) = E[(\alpha X_t + W_t)\, X_t] = \alpha\widetilde\sigma^2, \]
which is independent of $t$; and so on,
\[ R(t+2, t) = E[(\alpha X_{t+1} + W_{t+1})\, X_t] = \alpha R(t+1, t) = \alpha^2\widetilde\sigma^2, \]
\[ R(t+n, t) = E[(\alpha X_{t+n-1} + W_{t+n-1})\, X_t] = \alpha R(t+n-1, t) = \dots = \alpha^n R(t, t) = \alpha^n\widetilde\sigma^2, \]
which is independent of $t$. The process is stationary. We have
\[ R(n) = \alpha^n\,\frac{\sigma^2}{1-\alpha^2}. \]
It also follows that
\[ \rho(n) = \alpha^n. \]
The autocorrelation coefficient (as well as the autocovariance function) decays exponentially in time.
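An added numerical check in R (the values $\alpha = 0.9$, $\sigma = 1$, the seed and the sample size are arbitrary): simulating the recursion started from the stationary initial condition, the empirical autocorrelation coefficient reproduces $\rho(n) = \alpha^n$.

# Added sketch: empirical check that rho(n) = alpha^n for the damped linear equation.
set.seed(4)
alpha <- 0.9; sigma <- 1
n <- 5000
w <- rnorm(n, sd = sigma)
x <- numeric(n)
x[1] <- rnorm(1, sd = sigma / sqrt(1 - alpha^2))   # X_0 with the stationary variance
for (i in 1:(n - 1)) x[i + 1] <- alpha * x[i] + w[i]
r <- acf(x, lag.max = 20, plot = FALSE)$acf[, 1, 1]
cbind(empirical = r[1:10], theoretical = alpha^(0:9))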

2.3. Processes defined also for negative times. We may extend a little bit the previous definitions and call discrete time stochastic process also the two-sided sequences $(X_n)_{n\in\mathbb{Z}}$ of random variables. Such processes are thus defined also for negative times. The idea is that the physical process they represent started in the far past and continues in the future. This notion is particularly natural in the case of stationary processes. The function $R(n)$ (and similarly $C(n)$ and $\rho(n)$) is thus defined also for negative $n$:
\[ R(n) = E[X_n X_0], \qquad n \in \mathbb{Z}. \]
By stationarity, $R(-n) = R(n)$, because
\[ R(-n) = E[X_{-n} X_0] = E[X_{-n+n} X_{0+n}] = E[X_0 X_n] = R(n). \]
Therefore we see that this extension does not contain much new information; however, it is useful, or at least it simplifies some computations.

3. Time series and empirical quantities

A time series is a sequence of real numbers $x_1, \dots, x_n$. Empirical samples also have the same form. The name time series is appropriate when the index $i$ of $x_i$ has the meaning of time. A finite realization of a stochastic process is a time series. Ideally, when we have an experimental time series, we think that there is a stochastic process behind it; thus we try to apply the theory of stochastic processes.

Recall from elementary statistics that empirical estimates of mean values of a single r.v. $X$ are computed from an empirical sample $x_1, \dots, x_n$ of that r.v.; the higher $n$ is, the better is the estimate. A single sample $x_1$ is not sufficient to estimate moments of $X$. Similarly, we may hope to compute empirical estimates of $R(t,s)$ etc. from time series. But here, when the stochastic process has special properties (stationary and ergodic; see below the concept of ergodicity), one sample is sufficient! By one sample we mean one time series (which is one realization of the process, as the single $x_1$ is one realization of the r.v. $X$). Again, the higher $n$ is, the better is the estimate, but here $n$ refers to the length of the time series.

Consider a time series $x_1, \dots, x_n$. In the sequel, $t$ and $n_t$ are such that
\[ t + n_t = n. \]
Let us define
\[ \bar x_t = \frac{1}{n_t}\sum_{i=1}^{n_t} x_{i+t}, \qquad \widehat\sigma_t^2 = \frac{1}{n_t}\sum_{i=1}^{n_t}(x_{i+t} - \bar x_t)^2, \]
\[ \widehat R(t) = \frac{1}{n_t}\sum_{i=1}^{n_t} x_i\, x_{i+t}, \qquad \widehat C(t) = \frac{1}{n_t}\sum_{i=1}^{n_t}(x_i - \bar x_0)(x_{i+t} - \bar x_t), \]
\[ \widehat\rho(t) = \frac{\widehat C(t)}{\widehat\sigma_0\,\widehat\sigma_t} = \frac{\sum_{i=1}^{n_t}(x_i - \bar x_0)(x_{i+t} - \bar x_t)}{\sqrt{\sum_{i=1}^{n_t}(x_i - \bar x_0)^2\;\sum_{i=1}^{n_t}(x_{i+t} - \bar x_t)^2}}. \]

These quantities are taken as approximations of
\[ \mu_t, \quad \sigma_t, \quad R(t,0), \quad C(t,0), \quad \rho(t,0), \]
respectively. In the case of stationary processes, they are approximations of
\[ \mu, \quad \sigma, \quad R(t), \quad C(t), \quad \rho(t). \]
In the section on ergodic theorems we shall see rigorous relations between these empirical and theoretical functions.

The empirical correlation coefficient
\[ \widehat\rho_{X,Y} = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum_{i=1}^n (x_i - \bar x)^2\;\sum_{i=1}^n (y_i - \bar y)^2}} \]
between two sequences $x_1, \dots, x_n$ and $y_1, \dots, y_n$ is a measure of their linear similarity. If there are coefficients $a$ and $b$ such that the residuals $\varepsilon_i = y_i - (a x_i + b)$ are small, then $|\widehat\rho_{X,Y}|$ is close to 1; precisely, $\widehat\rho_{X,Y}$ is close to $1$ if $a > 0$, close to $-1$ if $a < 0$. A value of $\widehat\rho_{X,Y}$ close to $0$ means that no such linear relation is really good (in the sense of small residuals). Precisely, the smallness of the residuals must be understood in comparison with the empirical variance $\widehat\sigma_Y^2$ of $y_1, \dots, y_n$: one can prove that
\[ \widehat\rho_{X,Y}^{\,2} = 1 - \frac{\widehat\sigma_\varepsilon^2}{\widehat\sigma_Y^2} \]
(the so-called explained variance, the proportion of variance which has been explained by the linear model). After these remarks, the intuitive meaning of $\widehat R(t)$, $\widehat C(t)$ and $\widehat\rho(t)$ should be clear: they measure the linear similarity between the time series and its $t$-translation. This is useful to detect repetitions, periodicity, trend.

Example 1. Consider the following time series, taken from the EUROSTAT database. It collects export data concerning motor vehicle accessories, from January 1995 to December 2008.

[Figure: the export time series.]

Its autocorrelation function $\widehat\rho(t)$ is given by

[Figure: empirical autocorrelation function of the full series.]

We see high values (the values of $\widehat\rho(t)$ are always smaller than 1 in absolute value) for all time lags $t$. The reason is the trend of the original time series (highly non-stationary).

Example 2. If we consider only the last few years of the same time series, precisely January 2005 - December 2008, the data are much more stationary, the trend is less strong. The autocorrelation function $\widehat\rho(t)$ is now given by

[Figure: empirical autocorrelation function of the restricted series.]

where we notice a moderate annual periodicity.

4. Gaussian processes

If the generic vector $(X_{t_1}, \dots, X_{t_n})$ is jointly Gaussian, we say that the process is Gaussian. The law of a Gaussian vector is determined by the mean vector and the covariance matrix. Hence the laws of the marginals of a Gaussian process are determined by the mean function $\mu_t$ and the autocorrelation function $R(t,s)$.

Proposition 5. For Gaussian processes, stationarity in the wide and in the strong sense are equivalent.

Proof. Given a Gaussian process $(X_n)_{n\in\mathbb{N}}$, the generic vector $(X_{t_1+s}, \dots, X_{t_n+s})$ is Gaussian, hence with law determined by the mean vector of components
\[ E[X_{t_i+s}] = \mu_{t_i+s} \]
and the covariance matrix of components
\[ \mathrm{Cov}(X_{t_i+s}, X_{t_j+s}) = R(t_i+s, t_j+s) - \mu_{t_i+s}\,\mu_{t_j+s}. \]
If the process is stationary in the wide sense, then $\mu_{t_i+s} = \mu$ and
\[ R(t_i+s, t_j+s) - \mu_{t_i+s}\,\mu_{t_j+s} = R(t_i - t_j) - \mu^2 \]
do not depend on $s$. Then the law of $(X_{t_1+s}, \dots, X_{t_n+s})$ does not depend on $s$. This means that the process is stationary in the strict sense. The converse is a general fact. The proof is complete.

Most of the models in these notes are obtained by linear transformations of white noise. White noise is a Gaussian process. Linear transformations preserve Gaussianity. Hence the resulting processes are Gaussian.

Since we deal very often with processes that are stationary in the wide sense, being Gaussian they are also strictly stationary.

5. Discrete time Fourier transform

Given a sequence $(x_n)_{n\in\mathbb{Z}}$ of real or complex numbers such that $\sum_{n\in\mathbb{Z}} |x_n|^2 < \infty$, we denote by $\widehat x(\omega)$, or by $\mathcal{F}[x](\omega)$, the discrete time Fourier transform (DTFT), defined as
\[ \widehat x(\omega) = \mathcal{F}[x](\omega) = \frac{1}{\sqrt{2\pi}}\sum_{n\in\mathbb{Z}} e^{-i\omega n}\, x_n, \qquad \omega \in [-\pi, \pi]. \]
The function can be considered for all $\omega \in \mathbb{R}$, but it is $2\pi$-periodic. Sometimes the factor $\frac{1}{\sqrt{2\pi}}$ is not included in the definition; sometimes it is preferable to use the variant
\[ \widehat x(f) = \frac{1}{\sqrt{2\pi}}\sum_{n\in\mathbb{Z}} e^{-2\pi i f n}\, x_n, \qquad f \in \left[-\tfrac12, \tfrac12\right]. \]
We make the choice above, independently of the fact that in certain applications it is customary or convenient to make others. The factor $\frac{1}{\sqrt{2\pi}}$ is included for symmetry with the inverse transform and the Plancherel formula (without $\frac{1}{\sqrt{2\pi}}$, a factor $\frac{1}{2\pi}$ appears in one of them).

The $L^2$-theory of Fourier series guarantees that the series $\sum_{n\in\mathbb{Z}} e^{-i\omega n} x_n$ converges in mean square with respect to $\omega$, namely there exists a square integrable function $\widehat x(\omega)$ such that
\[ \lim_{N\to\infty} \int_{-\pi}^{\pi} \Big| \frac{1}{\sqrt{2\pi}}\sum_{|n|\le N} e^{-i\omega n}\, x_n - \widehat x(\omega) \Big|^2 d\omega = 0. \]
The sequence $x_n$ can be reconstructed from its Fourier transform by means of the inverse Fourier transform
\[ x_n = \frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi} e^{i\omega n}\,\widehat x(\omega)\, d\omega. \]
Among other properties, let us mention the Plancherel formula
\[ \sum_{n\in\mathbb{Z}} |x_n|^2 = \int_{-\pi}^{\pi} |\widehat x(\omega)|^2\, d\omega \]
and the fact that under the Fourier transform the convolution corresponds to the product:
\[ \mathcal{F}\Big[\sum_{n\in\mathbb{Z}} f(\cdot - n)\, g(n)\Big](\omega) = \widehat f(\omega)\,\widehat g(\omega). \]

When $\sum_{n\in\mathbb{Z}} |x_n| < \infty$, the series $\sum_{n\in\mathbb{Z}} e^{-i\omega n} x_n$ is absolutely convergent, uniformly in $\omega \in [-\pi, \pi]$, simply because
\[ \sup_{\omega\in[-\pi,\pi]} \sum_{n\in\mathbb{Z}} \big| e^{-i\omega n}\, x_n \big| = \sum_{n\in\mathbb{Z}} |x_n| < \infty. \]

In this case, we may also say that $\widehat x(\omega)$ is a bounded continuous function, not only square integrable. Notice that the assumption $\sum_{n\in\mathbb{Z}} |x_n| < \infty$ implies $\sum_{n\in\mathbb{Z}} |x_n|^2 < \infty$, because
\[ \sum_{n\in\mathbb{Z}} |x_n|^2 \le \Big(\sup_{n\in\mathbb{Z}} |x_n|\Big) \sum_{n\in\mathbb{Z}} |x_n| \]
and $\sup_{n\in\mathbb{Z}} |x_n|$ is finite when $\sum_{n\in\mathbb{Z}} |x_n|$ converges.

One can define the DTFT also for sequences which do not satisfy the assumption $\sum_{n\in\mathbb{Z}} |x_n|^2 < \infty$, in special cases. Consider for instance the sequence
\[ x_n = a\sin(\omega_0 n). \]
Compute the truncation
\[ \widehat x_N(\omega) = \frac{1}{\sqrt{2\pi}}\sum_{|n|\le N} e^{-i\omega n}\, a\sin(\omega_0 n). \]
Recall that
\[ \sin t = \frac{e^{it} - e^{-it}}{2i}. \]
Hence
\[ \widehat x_N(\omega) = \frac{a}{2i\sqrt{2\pi}}\left( \sum_{|n|\le N} e^{-i(\omega-\omega_0)n} - \sum_{|n|\le N} e^{-i(\omega+\omega_0)n} \right). \]
The next lemma makes use of the concept of generalized function, or distribution, which is outside the scope of these notes. We still give the result, to be understood in an intuitive sense. We use the generalized function $\delta(t)$, called the Dirac delta, which is characterized by the property
\[ \int \delta(t - t_0)\, f(t)\, dt = f(t_0) \tag{5.1} \]
for all continuous compactly supported functions $f$. No usual function has this property. A way to get intuition is the following one. Consider the function $\delta_n(t)$ which is equal to zero for $t$ outside $[-\frac{1}{2n}, \frac{1}{2n}]$, an interval of length $\frac1n$ around the origin, and equal to $n$ on $[-\frac{1}{2n}, \frac{1}{2n}]$. Hence $\delta_n(t - t_0)$ is equal to zero for $t$ outside $[t_0 - \frac{1}{2n}, t_0 + \frac{1}{2n}]$ and equal to $n$ on $[t_0 - \frac{1}{2n}, t_0 + \frac{1}{2n}]$. We have
\[ \int \delta_n(t)\, dt = 1. \]
Now,
\[ \int \delta_n(t - t_0)\, f(t)\, dt = n \int_{t_0 - \frac{1}{2n}}^{t_0 + \frac{1}{2n}} f(t)\, dt, \]
which is the average of $f$ around $t_0$. As $n \to \infty$, this average converges to $f(t_0)$ when $f$ is continuous. Namely, we have
\[ \lim_{n\to\infty} \int \delta_n(t - t_0)\, f(t)\, dt = f(t_0), \]
which is the analog of identity (5.1), but expressed by means of traditional concepts. In a sense, thus, the generalized function $\delta(t)$ is the limit of the traditional functions $\delta_n(t)$. But we see that $\delta_n(t)$ converges to zero for all $t \ne 0$, and to $+\infty$ for $t = 0$. So, in a sense, $\delta(t)$ is equal to zero for $t \ne 0$ and to $+\infty$ for $t = 0$; but this is very poor information, because it does not allow one to deduce identity (5.1) (the way $\delta_n(t)$ goes to infinity is essential, not only the fact that $\delta(t)$ is $+\infty$ at $t = 0$).
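The concentration described by the lemma and corollary that follow can be previewed numerically. The following R sketch (an addition; the amplitude, the frequency $\omega_0 = \pi/4$ and the truncation levels are arbitrary choices) plots $|\widehat x_N(\omega)|$ for two values of $N$: the peaks at $\pm\omega_0$ grow with $N$ while the rest stays bounded.

# Added sketch: the truncated DTFT of x_n = a*sin(omega0*n) concentrates at +/- omega0.
a <- 1; omega0 <- pi / 4
omega <- seq(-pi, pi, length.out = 1000)
dtft_trunc <- function(N) {
  n <- -N:N
  sapply(omega, function(w) abs(sum(exp(-1i * w * n) * a * sin(omega0 * n))) / sqrt(2 * pi))
}
plot(omega, dtft_trunc(200), type = "l", xlab = "omega", ylab = "|x_N hat|")
lines(omega, dtft_trunc(50), col = "red")   # smaller N: lower and wider peaks at +/- pi/4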

Lemma 2. Denote by $\delta(t)$ the generalized function such that
\[ \int \delta(t - t_0)\, f(t)\, dt = f(t_0) \]
for all continuous compactly supported functions $f$ (it is called the Dirac delta distribution). Then, on $[-\pi, \pi]$,
\[ \lim_{N\to\infty} \frac{1}{2\pi}\sum_{|n|\le N} e^{-itn} = \delta(t). \]

From this lemma it follows that
\[ \lim_{N\to\infty} \frac{1}{\sqrt{2\pi}}\sum_{|n|\le N} e^{-i\omega n}\, a\sin(\omega_0 n) = \frac{a\sqrt{2\pi}}{2i}\big( \delta(\omega - \omega_0) - \delta(\omega + \omega_0) \big). \]
In other words:

Corollary 1. The sequence $x_n = a\sin(\omega_0 n)$ has a generalized DTFT
\[ \widehat x(\omega) = \lim_{N\to\infty} \widehat x_N(\omega) = \frac{a\sqrt{2\pi}}{2i}\big( \delta(\omega - \omega_0) - \delta(\omega + \omega_0) \big). \]

This is only one example of the possibility to extend the definition and meaning of the DTFT beyond the assumption $\sum_{n\in\mathbb{Z}} |x_n|^2 < \infty$. It is also very interesting for the interpretation of the concept of DTFT. If the signal $x_n$ has a periodic component (notice that the DTFT is linear) with angular frequency $\omega_0$, then its DTFT has two symmetric peaks (Dirac delta components) at $\pm\omega_0$. This way, the DTFT reveals the periodic components of the signal.

Exercise 9. Prove that the sequence $x_n = a\cos(\omega_0 n)$ has a generalized DTFT
\[ \widehat x(\omega) = \lim_{N\to\infty} \widehat x_N(\omega) = \frac{a\sqrt{2\pi}}{2}\big( \delta(\omega - \omega_0) + \delta(\omega + \omega_0) \big). \]

6. Power spectral density

Given a stationary process $(X_n)_{n\in\mathbb{Z}}$ with correlation function $R(n) = E[X_n X_0]$, $n \in \mathbb{Z}$, we call power spectral density (PSD) the function
\[ S(\omega) = \frac{1}{\sqrt{2\pi}}\sum_{n\in\mathbb{Z}} e^{-i\omega n}\, R(n), \qquad \omega \in [-\pi, \pi]. \]
Alternatively, one can use the expression
\[ S(f) = \frac{1}{\sqrt{2\pi}}\sum_{n\in\mathbb{Z}} e^{-2\pi i f n}\, R(n), \qquad f \in \left[-\tfrac12, \tfrac12\right], \]
which produces easier visualizations, because we catch more easily the fractions of the unit frequency interval.

Remark 11. In principle, to be defined, this series requires $\sum_{n\in\mathbb{Z}} |R(n)| < \infty$, or at least $\sum_{n\in\mathbb{Z}} |R(n)|^2 < \infty$. In practice, on one side the convergence may happen also in unexpected cases, due to cancellations; on the other side it may be acceptable to use a finite-time variant, something like $\frac{1}{\sqrt{2\pi}}\sum_{|n|\le N} e^{-i\omega n} R(n)$, for practical purposes or from the computational viewpoint.

A priori, one may think that $S(f)$ may not be real valued. However, the function $R(n)$ is non-negative definite (this means $\sum_{i,j=1}^n R(t_i - t_j)\, a_i a_j \ge 0$ for all $t_1, \dots, t_n$ and $a_1, \dots, a_n$), and a theorem states that the Fourier transform of a non-negative definite function is a non-negative function. Thus, in the end, it turns out that $S(f)$ is real and also non-negative. We do not give the details of this fact here because it will be a consequence of the fundamental theorem below.

6.1. Example: white noise. We have $R(n) = \sigma^2\delta(n)$, hence
\[ S(\omega) = \frac{\sigma^2}{\sqrt{2\pi}}, \qquad \omega \in \mathbb{R}. \]
The spectral density is constant. This is the origin of the name white noise.

6.2. Example: perturbed periodic time series. This example is numeric only. Produce with the R software the following time series:

t <- 1:1000
y <- sin(t/3) + 0.3*rnorm(1000)
ts.plot(y)

[Figure: the perturbed periodic time series.]

The empirical autocorrelation function, obtained by acf(y), is

[Figure: empirical autocorrelation function of y.]

and the power spectral density, suitably smoothed, obtained by spectrum(y,span=c(,3)), is

[Figure: smoothed power spectral density of y.]

6.3. Pink, Brown, Blue, Violet noise. In certain applications one meets PSDs of special types, which have been given names similarly to white noise. Recall that white noise has a constant PSD. Pink noise has a PSD of the form
\[ S(f) \sim \frac{1}{f}. \]
Brown noise:
\[ S(f) \sim \frac{1}{f^2}. \]
Blue noise:
\[ S(f) \sim f. \]
Violet noise:
\[ S(f) \sim f^2. \]

7. Fundamental theorem on PSD

The following theorem is often stated without assumptions in the applied literature. One of the reasons is that it can be proved under various levels of generality, with different meanings of the limit operation (it is a limit of functions). We shall give a rigorous statement under a very precise assumption on the autocorrelation function $R(n)$; the convergence we prove is rather strong. The assumption is a little bit strange, but it is satisfied in all our examples. The assumption is that there exists a sequence $(\varepsilon_n)_{n\in\mathbb{N}}$ of positive numbers such that
\[ \lim_{n\to\infty} \varepsilon_n = 0, \qquad \sum_{n\in\mathbb{N}} \frac{|R(n)|}{\varepsilon_n} < \infty. \tag{7.1} \]
This is just a little bit more restrictive than the condition $\sum_{n\in\mathbb{N}} |R(n)| < \infty$, which is natural to impose if we want uniform convergence of $\frac{1}{\sqrt{2\pi}}\sum_{n\in\mathbb{Z}} e^{-i\omega n} R(n)$ to $S(\omega)$. Any example of $R(n)$ satisfying $\sum_{n\in\mathbb{N}} |R(n)| < \infty$ that the reader may have in mind presumably satisfies assumption (7.1) as well, in an easy way.

Theorem 1 (Wiener-Khinchin). If $(X(n))_{n\in\mathbb{Z}}$ is a wide-sense stationary process satisfying assumption (7.1), then
\[ S(\omega) = \lim_{N\to\infty} \frac{1}{2N+1}\, E\big[|\widehat X_N(\omega)|^2\big]. \]
The limit is uniform in $\omega \in [-\pi, \pi]$. Here $X_N$ is the truncated process $X\, 1_{[-N,N]}$. In particular, it follows that $S(\omega)$ is real and non-negative.

Proof. Step 1. Let us prove the following main identity:
\[ S(\omega) = \frac{1}{2N+1}\, E\big[|\widehat X_N(\omega)|^2\big] + r_N(\omega), \tag{7.2} \]
where the remainder $r_N$ is given by
\[ r_N(\omega) = \frac{1}{2N+1}\, \mathcal{F}\Big[\, t \mapsto \sum_{n\in\Lambda(N,t)} E[X(t+n)X(n)] \Big](\omega), \]
with
\[ \Lambda(N,t) = \big\{\, n \in [-N, N] : t+n \notin [-N, N] \,\big\}, \]
so that $|\Lambda(N,t)| \le \min(|t|, 2N+1)$ (for $t \ge 0$, $\Lambda(N,t) = (N-t, N]\cap\mathbb{Z}$, and symmetrically for $t < 0$).

Since $R(t) = E[X(t+n)X(n)]$ for all $n$, we obviously have, for every $N$,
\[ R(t) = \frac{1}{2N+1} \sum_{n=-N}^{N} E[X(t+n)X(n)]. \]
Thus
\[ S(\omega) = \widehat R(\omega) = \frac{1}{2N+1}\, \mathcal{F}\Big[\, t \mapsto \sum_{n=-N}^{N} E[X(t+n)X(n)] \Big](\omega). \tag{7.3} \]
Then recall that
\[ \mathcal{F}\Big[\sum_{n\in\mathbb{Z}} f(\cdot - n)\, g(n)\Big](\omega) = \widehat f(\omega)\,\widehat g(\omega), \]
hence
\[ \mathcal{F}\Big[\sum_{n\in\mathbb{Z}} f(\cdot + n)\, g(n)\Big](\omega) = \mathcal{F}\Big[\sum_{n\in\mathbb{Z}} f(\cdot - n)\, g(-n)\Big](\omega) = \widehat f(\omega)\, \mathcal{F}[g(-\cdot)](\omega), \qquad \mathcal{F}[g(-\cdot)](\omega) = \widehat g(-\omega). \]

Moreover, if the input function $g$ is real, then $\widehat g(-\omega) = \overline{\widehat g(\omega)}$, so we get
\[ \mathcal{F}\Big[\sum_{n\in\mathbb{Z}} f(\cdot + n)\, g(n)\Big](\omega) = \widehat f(\omega)\,\overline{\widehat g(\omega)}. \]
If $f(n) = g(n) = X(n)\, 1_{[-N,N]}(n) = X_N(n)$, then
\[ \sum_{n\in\mathbb{Z}} X_N(t+n)\, X_N(n) = \sum_{\substack{n\in[-N,N]\\ t+n\in[-N,N]}} X(t+n)\, X(n). \]
Therefore
\[ \mathcal{F}\Big[\, t \mapsto \sum_{\substack{n\in[-N,N]\\ t+n\in[-N,N]}} E[X(t+n)X(n)] \Big](\omega) = E\Big[\widehat X_N(\omega)\,\overline{\widehat X_N(\omega)}\Big] = E\big[|\widehat X_N(\omega)|^2\big]. \]
From (7.3) we now get (7.2), because for every $t$ the difference between the sum over all $n \in [-N,N]$ and the sum over those $n$ with both $n$ and $t+n$ in $[-N,N]$ is exactly the sum over $\Lambda(N,t)$.

Step 2. The proof is complete if we show that $\lim_{N\to\infty} r_N(\omega) = 0$ uniformly in $\omega \in [-\pi, \pi]$. But
\[ \Big| \sum_{n\in\Lambda(N,t)} E[X(t+n)X(n)] \Big| = |\Lambda(N,t)|\, |R(t)| \le |\Lambda(N,t)|\,\varepsilon_{|t|}\,\frac{|R(t)|}{\varepsilon_{|t|}}, \]
where $|\Lambda(N,t)|$ denotes the cardinality of $\Lambda(N,t)$. Since $|\Lambda(N,t)| \le \min(|t|, 2N+1)$, we have
\[ \frac{1}{2N+1}\Big| \sum_{n\in\Lambda(N,t)} E[X(t+n)X(n)] \Big| \le \frac{\min(|t|, 2N+1)}{2N+1}\,\varepsilon_{|t|}\,\frac{|R(t)|}{\varepsilon_{|t|}}. \]
Given $\delta > 0$, let $t_0$ be such that $\varepsilon_t \le \delta$ for all $t \ge t_0$; then take $N_0 \ge t_0$ such that $t_0/(2N_0+1) \le \delta$. It is not restrictive to assume $\varepsilon_t \le 1$ for all $t$.

Then, for $N \ge N_0$: if $|t| \le t_0$ we have
\[ \frac{\min(|t|, 2N+1)}{2N+1}\,\varepsilon_{|t|} \le \frac{t_0}{2N+1} \le \delta, \]
and if $|t| \ge t_0$ we have
\[ \frac{\min(|t|, 2N+1)}{2N+1}\,\varepsilon_{|t|} \le \varepsilon_{|t|} \le \delta. \]
We have proved the following statement: for every $\delta > 0$ there exists $N_0$ such that, for all $N \ge N_0$,
\[ \frac{\min(|t|, 2N+1)}{2N+1}\,\varepsilon_{|t|} \le \delta \]
uniformly in $t$. Then also
\[ \frac{1}{2N+1}\Big| \sum_{n\in\Lambda(N,t)} E[X(t+n)X(n)] \Big| \le \delta\,\frac{|R(t)|}{\varepsilon_{|t|}} \]
for all $N \ge N_0$, uniformly in $t$. Therefore
\[ |r_N(\omega)| = \frac{1}{\sqrt{2\pi}}\Big| \sum_{t\in\mathbb{Z}} e^{-i\omega t}\, \frac{1}{2N+1}\sum_{n\in\Lambda(N,t)} E[X(t+n)X(n)] \Big| \le \frac{\delta}{\sqrt{2\pi}}\sum_{t\in\mathbb{Z}} \frac{|R(t)|}{\varepsilon_{|t|}} = \frac{\delta\, C}{\sqrt{2\pi}}, \]
where $C = \sum_{t\in\mathbb{Z}} |R(t)|/\varepsilon_{|t|} < \infty$. This is the definition of $\lim_{N\to\infty} r_N(\omega) = 0$ uniformly in $\omega \in [-\pi, \pi]$. The proof is complete.

This theorem gives us the interpretation of the PSD. The Fourier transform $\widehat X_T(\omega)$ identifies the frequency structure of the signal. The square $|\widehat X_T(\omega)|^2$ drops the information about the phase and keeps the information about the amplitude, but in the sense of energy (a square). It gives us, in a sense, the energy spectrum. So the PSD is the average amplitude of the oscillatory component at frequency $f = \frac{\omega}{2\pi}$. Thus the PSD is a very useful tool if you want to identify oscillatory signals in your time series data and want to know their amplitude. By the PSD, one can get a "feel" of the data at an early stage of a time series analysis. The PSD tells us at which frequency ranges the variations are strong.

Remark 12. A priori, one could think that it would be more natural to compute the Fourier transform $\widehat X(\omega) = \sum_{n\in\mathbb{Z}} e^{-i\omega n} X_n$ without a cut-off of size $T$. But the process $(X_n)$ is stationary. Therefore, it does not satisfy the assumption $\sum_{n\in\mathbb{Z}} X_n^2 < \infty$ or similar ones which require a decay at infinity. Stationarity is in contradiction with a decay at infinity (it can be proved, but we leave it at the obvious intuitive level).

Remark 13. Under more assumptions (in particular a strong ergodicity one) it is possible to prove that
\[ S(\omega) = \lim_{T\to\infty} \frac{1}{2T+1}\, |\widehat X_T(\omega)|^2 \]
without expectation. Notice that $\frac{1}{2T+1}|\widehat X_T(\omega)|^2$ is a random quantity, but the limit is deterministic.
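An added Monte Carlo illustration of the theorem for the damped linear equation of Section 2 (the parameters, seed and sample sizes are arbitrary choices). Note that the sketch uses the $\frac{1}{2\pi}\sum_t R(t)e^{-i\omega t}$ normalization of the spectral density; with the $\frac{1}{\sqrt{2\pi}}$ convention of these notes the same comparison holds up to a constant factor.

# Added sketch: averaged periodogram vs. theoretical spectral density for X_{n+1} = alpha X_n + W_n.
set.seed(5)
alpha <- 0.9; sigma <- 1
L <- 512; M <- 500                          # window length and number of independent paths
omega <- 2 * pi * (0:(L - 1)) / L
periodogram <- function() {
  x <- as.numeric(arima.sim(list(ar = alpha), n = L, sd = sigma))
  Mod(fft(x))^2 / (2 * pi * L)
}
S_emp <- rowMeans(replicate(M, periodogram()))                       # averaged periodogram
S_th  <- sigma^2 / (2 * pi * (1 - 2 * alpha * cos(omega) + alpha^2)) # theoretical density
plot(omega, S_th, type = "l", xlab = "omega", ylab = "S(omega)")
points(omega, S_emp, col = "red", cex = 0.3)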

8. Signal to noise ratio

Assume that the process $(X_n)_{n\in\mathbb{Z}}$ we observe is the superposition of a white noise $(W_n)_{n\in\mathbb{Z}}$ and a signal $(f_n)_{n\in\mathbb{Z}}$, namely a process (maybe deterministic) which contains information and which we would like to detect in spite of the noise corruption. The final problem is noise filtering, namely the reconstruction of a signal $(\widetilde f_n)_{n\in\mathbb{Z}}$ as close as possible to $(f_n)_{n\in\mathbb{Z}}$ (the meaning of closeness may differ; for instance, we could be interested only in distinguishing between two a priori known signals). Let us make only preliminary comments on the size of the signal inside the noise. Assume
\[ X_n = W_n + f_n, \]
with $W$ and $f$ independent of each other and, for the sake of simplicity, assume $f$ stationary with zero mean. Then
\[ R_X(n) = E[X_n X_0] = E[W_n W_0] + E[W_n f_0] + E[f_n W_0] + E[f_n f_0] = \sigma^2\delta(n) + R_f(n). \]
So
\[ \rho_X(n) = \frac{R_X(n)}{R_X(0)} = \frac{\sigma^2\delta(n) + R_f(n)}{\sigma^2 + R_f(0)} = \frac{\delta(n)}{1 + \mathrm{SNR}} + \frac{\mathrm{SNR}}{1 + \mathrm{SNR}}\,\rho_f(n), \]
where
\[ \mathrm{SNR} := \frac{R_f(0)}{\sigma^2} = \frac{\sigma_f^2}{\sigma_W^2} \]
is the so-called signal-to-noise ratio. We see that we appreciate the shape of $\rho_f(n)$ in $\rho_X(n)$ only if the SNR is sufficiently large. [One should be more precise. Indeed, theoretically, since $\delta(n)$ is equal to zero for $n \ne 0$, we always see $\frac{\mathrm{SNR}}{1+\mathrm{SNR}}\rho_f(n)$ with infinite precision. The problem is that the measured $\widehat\rho_W(n)$ is not $\delta(n)$, but something close to 1 at $n = 0$ and close to zero, yet not equal to zero, for $n \ne 0$. However, the closeness to zero of $\widehat\rho_W(n)$ is not just measured by $\sigma^2$: it depends on the number of observed points, the whiteness of the noise, and so on, so we cannot write a simple formula.]

Second,
\[ S_X(\omega) = \frac{\sigma^2}{\sqrt{2\pi}} + S_f(\omega), \]
where
\[ S_f(\omega) = \frac{1}{\sqrt{2\pi}}\sum_{n\in\mathbb{Z}} e^{-i\omega n}\, R_f(n) = \frac{R_f(0)}{\sqrt{2\pi}}\sum_{n\in\mathbb{Z}} e^{-i\omega n}\, \rho_f(n). \]
Thus again we see
\[ \frac{\sqrt{2\pi}}{\sigma^2}\, S_X(\omega) = 1 + \mathrm{SNR}\,\sum_{n\in\mathbb{Z}} e^{-i\omega n}\, \rho_f(n). \]
The contribution of the signal, $\sum_{n\in\mathbb{Z}} e^{-i\omega n}\rho_f(n)$, is visible only if the SNR is not too small. [Here also we could say that we may always reconstruct $\sum_{n\in\mathbb{Z}} e^{-i\omega n}\rho_f(n)$ exactly, just by subtracting the constant 1 from $\frac{\sqrt{2\pi}}{\sigma^2} S_X(\omega)$; however, the constant term 1 is only theoretical: in practice it is a moderately flat function, with fluctuations, and usually with a cut-off at large distances, again all facts depending on the size of the sample and the whiteness of the noise.]
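An added R sketch of these remarks (the signal, the noise levels and the sample size are arbitrary choices): for a large SNR the shape of $\rho_f$ is clearly visible in the empirical autocorrelation, while for a small SNR it is buried under the peak at $n = 0$.

# Added sketch: the empirical autocorrelation of X_n = W_n + f_n for two signal-to-noise ratios.
set.seed(6)
n <- 2000
f <- sin((1:n) / 5)                         # a periodic "signal", R_f(0) = 1/2
acf(f + rnorm(n, sd = 0.5), lag.max = 60)   # SNR = 2: the oscillating shape of rho_f is visible
acf(f + rnorm(n, sd = 5),   lag.max = 60)   # SNR = 0.02: the acf is dominated by the peak at 0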

9. An ergodic theorem

There exist several versions of ergodic theorems. The simplest one is the Law of Large Numbers. Let us recall it in its simplest version, with convergence in mean square.

Proposition 6. If $(X_n)_{n\ge 1}$ is a sequence of uncorrelated r.v.'s ($\mathrm{Cov}(X_i, X_j) = 0$ for all $i \ne j$), with finite and equal means $\mu$ and variances $\sigma^2$, then $\frac1n\sum_{i=1}^n X_i$ converges to $\mu$ in mean square:
\[ \lim_{n\to\infty} E\left[ \Big( \frac1n\sum_{i=1}^n X_i - \mu \Big)^2 \right] = 0. \]
It also converges in probability.

Proof.
\[ E\left[ \Big( \frac1n\sum_{i=1}^n X_i - \mu \Big)^2 \right] = E\left[ \Big( \frac1n\sum_{i=1}^n (X_i - \mu) \Big)^2 \right] = \frac1{n^2}\sum_{i,j=1}^n E[(X_i - \mu)(X_j - \mu)] = \frac1{n^2}\sum_{i,j=1}^n \mathrm{Cov}(X_i, X_j) = \frac1{n^2}\sum_{i=1}^n \sigma^2 = \frac{\sigma^2}{n} \to 0. \]

Recall that the Chebyshev inequality states (in this particular case)
\[ P\left( \Big| \frac1n\sum_{i=1}^n X_i - \mu \Big| > \varepsilon \right) \le \frac{1}{\varepsilon^2}\, E\left[ \Big( \frac1n\sum_{i=1}^n X_i - \mu \Big)^2 \right] \]
for every $\varepsilon > 0$. Hence, from the computation of the previous proof we deduce
\[ P\left( \Big| \frac1n\sum_{i=1}^n X_i - \mu \Big| > \varepsilon \right) \le \frac{\sigma^2}{\varepsilon^2 n}. \]
In itself, this is an interesting estimate on the probability that the sample average $\frac1n\sum_{i=1}^n X_i$ differs from $\mu$ by more than $\varepsilon$. It follows that
\[ \lim_{n\to\infty} P\left( \Big| \frac1n\sum_{i=1}^n X_i - \mu \Big| > \varepsilon \right) = 0 \]
for every $\varepsilon > 0$. This is the convergence in probability of $\frac1n\sum_{i=1}^n X_i$ to $\mu$.
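A short added simulation of Proposition 6 (the mean, variance and sample sizes are arbitrary choices): the mean square error of the sample average is close to $\sigma^2/n$.

# Added sketch: Monte Carlo check that E[(sample mean - mu)^2] = sigma^2 / n for iid samples.
set.seed(8)
n <- 100; M <- 20000
xbar <- replicate(M, mean(rnorm(n, mean = 2, sd = 3)))
c(simulated = mean((xbar - 2)^2), theoretical = 9 / n)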

Remark 14. Often this theorem is stated only in the particular case when the r.v.'s $X_i$ are independent and identically distributed, with finite second moment. We see that the proof is very easy under much more general assumptions. We have written the proof, very classical, so that the proof of the following lemma is obvious.

Lemma 3. Let $(X_n)_{n\ge 1}$ be a sequence of r.v.'s with finite second moments and equal mean $\mu$. Assume that
\[ \lim_{n\to\infty} \frac1{n^2}\sum_{i,j=1}^n \mathrm{Cov}(X_i, X_j) = 0. \tag{9.1} \]
Then $\frac1n\sum_{i=1}^n X_i$ converges to $\mu$ in mean square and in probability.

The lemma will be useful if we detect interesting sufficient conditions for (9.1). Here is our main ergodic theorem. Usually, by the name ergodic theorem one means a theorem which states that the time averages of a process converge to a deterministic value (the mean of the process, in the stationary case).

Theorem 2. Assume that $(X_n)_{n\ge 1}$ is a wide-sense stationary process (this ensures in particular that $(X_n)_{n\ge 1}$ is a sequence of r.v.'s with finite second moments and equal mean $\mu$). If
\[ \lim_{n\to\infty} C(n) = 0 \]
then $\frac1n\sum_{i=1}^n X_i$ converges to $\mu$ in mean square and in probability.

Proof. Since $\mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i)$, we have
\[ \sum_{i,j=1}^n |\mathrm{Cov}(X_i, X_j)| \le 2\sum_{i=1}^n\sum_{j=1}^{i} |\mathrm{Cov}(X_i, X_j)|, \]
so it is sufficient to prove that
\[ \lim_{n\to\infty} \frac1{n^2}\sum_{i=1}^n\sum_{j=1}^{i} |\mathrm{Cov}(X_i, X_j)| = 0. \]
Since the process is stationary, $\mathrm{Cov}(X_i, X_j) = C(i-j)$, so we have to prove $\lim_{n\to\infty} \frac1{n^2}\sum_{i=1}^n\sum_{j=1}^{i} |C(i-j)| = 0$. But
\[ \sum_{i=1}^n\sum_{j=1}^{i} |C(i-j)| = \sum_{i=1}^n\sum_{k=0}^{i-1} |C(k)| = |C(0)| + \big(|C(0)| + |C(1)|\big) + \dots + \big(|C(0)| + \dots + |C(n-1)|\big) = \sum_{k=0}^{n-1} (n-k)\,|C(k)| \le n\sum_{k=0}^{n-1} |C(k)|. \]

Therefore it is sufficient to prove $\lim_{n\to\infty} \frac1n\sum_{k=0}^{n-1} |C(k)| = 0$. If $\lim_{n\to\infty} C(n) = 0$, then for every $\varepsilon > 0$ there is $n_0$ such that $|C(n)| \le \varepsilon$ for all $n \ge n_0$. Hence, for $n \ge n_0$,
\[ \frac1n\sum_{k=0}^{n-1} |C(k)| \le \frac1n\sum_{k=0}^{n_0-1} |C(k)| + \frac1n\sum_{k=n_0}^{n-1} \varepsilon \le \frac1n\sum_{k=0}^{n_0-1} |C(k)| + \varepsilon. \]
Since $\sum_{k=0}^{n_0-1} |C(k)|$ is independent of $n$, there is $n_1 \ge n_0$ such that for all $n \ge n_1$
\[ \frac1n\sum_{k=0}^{n_0-1} |C(k)| \le \varepsilon. \]
Therefore, for all $n \ge n_1$,
\[ \frac1n\sum_{k=0}^{n-1} |C(k)| \le 2\varepsilon. \]
This means that $\lim_{n\to\infty} \frac1n\sum_{k=0}^{n-1} |C(k)| = 0$. The proof is complete.

9.1. Rate of convergence. Concerning the rate of convergence, recall from the proof of the LLN that
\[ E\left[ \Big( \frac1n\sum_{i=1}^n X_i - \mu \Big)^2 \right] = \frac{\sigma^2}{n}. \]
We can reach the same rate in the case of the ergodic theorem, under a suitable assumption.

Proposition 7. If $(X_n)_{n\ge 1}$ is a wide-sense stationary process such that
\[ \overline\sigma := \sum_{k=0}^\infty |C(k)| < \infty \]
(this implies $\lim_{n\to\infty} C(n) = 0$), then
\[ E\left[ \Big( \frac1n\sum_{i=1}^n X_i - \mu \Big)^2 \right] \le \frac{2\overline\sigma}{n}. \]

Proof. It is sufficient to put together several steps of the previous proofs:
\[ E\left[ \Big( \frac1n\sum_{i=1}^n X_i - \mu \Big)^2 \right] = \frac1{n^2}\sum_{i,j=1}^n \mathrm{Cov}(X_i, X_j) \le \frac2{n^2}\sum_{i=1}^n\sum_{j=1}^{i} |\mathrm{Cov}(X_i, X_j)| \le \frac2n\sum_{k=0}^{n-1} |C(k)| \le \frac{2\overline\sigma}{n}. \]
The proof is complete.

Notice that the assumptions of these two ergodic results (especially of the ergodic theorem) are very general and always satisfied in our examples.
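An added single-path illustration of the ergodic theorem for the damped linear equation (parameters, seed and sample sizes are arbitrary choices; here $\mu = 0$ and $C(n) = \alpha^n\sigma^2/(1-\alpha^2)$, so $\sum_k |C(k)| < \infty$).

# Added sketch: time averages of one realization approach the mean 0 as n grows.
set.seed(7)
alpha <- 0.9; sigma <- 1
x <- as.numeric(arima.sim(list(ar = alpha), n = 100000, sd = sigma))
n <- 10^(2:5)
sapply(n, function(k) mean(x[1:k]))   # the time averages get closer and closer to 0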

9.2. Empirical autocorrelation function. Very often we need the convergence of time averages of certain functions of the process: we would like to have
\[ \frac1n\sum_{i=1}^n g(X_i) \to E[g(X_0)] \]
in mean square, for certain functions $g$. We need to check the assumptions of the ergodic theorem for the sequence $(g(X_n))_{n\ge 0}$. Here is a simple example.

Proposition 8. Let $(X_n)_{n\ge 0}$ be a wide-sense stationary process, with finite fourth moments, such that $E[X_n^2 X_{n+k}^2]$ is independent of $n$ and
\[ \lim_{k\to\infty} \mathrm{Cov}(X_0^2, X_k^2) = 0. \]
Then $\frac1n\sum_{i=1}^n X_i^2$ converges to $E[X_0^2]$ in mean square and in probability.

Proof. Consider the process $Y_n = X_n^2$. The mean function of $(Y_n)$ is $E[X_n^2]$, which is independent of $n$ by the wide-sense stationarity of $(X_n)$. For the autocorrelation function
\[ R_Y(n, n+k) = E[Y_n Y_{n+k}] = E[X_n^2 X_{n+k}^2] \]
we need the new assumption of the proposition. Thus $(Y_n)$ is wide-sense stationary. Finally, the assumption $\lim_{k\to\infty} \mathrm{Cov}(X_0^2, X_k^2) = 0$ means $\lim_{k\to\infty} C_Y(k) = 0$, where $C_Y$ is the autocovariance function of $(Y_n)$, so we can apply the ergodic theorem. The proof is complete.

More remarkable is the following result, related to the estimation of $R(k)$ by the sample path autocorrelation function. Given a process $(X_n)_{n\ge 0}$, call sample path (or empirical) autocorrelation function the process
\[ \frac1n\sum_{i=1}^n X_i X_{i+k}. \]

Theorem 3. Let $(X_n)_{n\ge 0}$ be a wide-sense stationary process, with finite fourth moments, such that $E[X_n X_{n+k} X_{n+j} X_{n+j+k}]$ is independent of $n$ and
\[ \lim_{j\to\infty} E[X_0 X_k X_j X_{j+k}] = R(k)^2. \]
Then the sample path autocorrelation function $\frac1n\sum_{i=1}^n X_i X_{i+k}$ converges to $R(k)$ as $n \to \infty$, in mean square and in probability. Precisely, for every $k \in \mathbb{N}$ we have
\[ \lim_{n\to\infty} E\left[ \Big( \frac1n\sum_{i=1}^n X_i X_{i+k} - R(k) \Big)^2 \right] = 0, \]
and similarly for the convergence in probability.

Proof. Given $k \in \mathbb{N}$, consider the new process $Y_n = X_n X_{n+k}$. Its mean function is constant in $n$ (equal to $R(k)$) because of the wide-sense stationarity of $(X_n)$. For the autocorrelation function,
\[ R_Y(n, n+j) = E[Y_n Y_{n+j}] = E[X_n X_{n+k} X_{n+j} X_{n+j+k}], \]
which is independent of $n$ by assumption.


Stochastic Processes

Stochastic Processes Introduction and Techniques Lecture 4 in Financial Mathematics UiO-STK4510 Autumn 2015 Teacher: S. Ortiz-Latorre Stochastic Processes 1 Stochastic Processes De nition 1 Let (E; E) be a measurable space

More information

Probability Space. J. McNames Portland State University ECE 538/638 Stochastic Signals Ver

Probability Space. J. McNames Portland State University ECE 538/638 Stochastic Signals Ver Stochastic Signals Overview Definitions Second order statistics Stationarity and ergodicity Random signal variability Power spectral density Linear systems with stationary inputs Random signal memory Correlation

More information

Universal examples. Chapter The Bernoulli process

Universal examples. Chapter The Bernoulli process Chapter 1 Universal examples 1.1 The Bernoulli process First description: Bernoulli random variables Y i for i = 1, 2, 3,... independent with P [Y i = 1] = p and P [Y i = ] = 1 p. Second description: Binomial

More information

On 1.9, you will need to use the facts that, for any x and y, sin(x+y) = sin(x) cos(y) + cos(x) sin(y). cos(x+y) = cos(x) cos(y) - sin(x) sin(y).

On 1.9, you will need to use the facts that, for any x and y, sin(x+y) = sin(x) cos(y) + cos(x) sin(y). cos(x+y) = cos(x) cos(y) - sin(x) sin(y). On 1.9, you will need to use the facts that, for any x and y, sin(x+y) = sin(x) cos(y) + cos(x) sin(y). cos(x+y) = cos(x) cos(y) - sin(x) sin(y). (sin(x)) 2 + (cos(x)) 2 = 1. 28 1 Characteristics of Time

More information

conditional cdf, conditional pdf, total probability theorem?

conditional cdf, conditional pdf, total probability theorem? 6 Multiple Random Variables 6.0 INTRODUCTION scalar vs. random variable cdf, pdf transformation of a random variable conditional cdf, conditional pdf, total probability theorem expectation of a random

More information

9 Brownian Motion: Construction

9 Brownian Motion: Construction 9 Brownian Motion: Construction 9.1 Definition and Heuristics The central limit theorem states that the standard Gaussian distribution arises as the weak limit of the rescaled partial sums S n / p n of

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

MATH 117 LECTURE NOTES

MATH 117 LECTURE NOTES MATH 117 LECTURE NOTES XIN ZHOU Abstract. This is the set of lecture notes for Math 117 during Fall quarter of 2017 at UC Santa Barbara. The lectures follow closely the textbook [1]. Contents 1. The set

More information

Multivariate Time Series

Multivariate Time Series Multivariate Time Series Notation: I do not use boldface (or anything else) to distinguish vectors from scalars. Tsay (and many other writers) do. I denote a multivariate stochastic process in the form

More information

Linear algebra. S. Richard

Linear algebra. S. Richard Linear algebra S. Richard Fall Semester 2014 and Spring Semester 2015 2 Contents Introduction 5 0.1 Motivation.................................. 5 1 Geometric setting 7 1.1 The Euclidean space R n..........................

More information

Problem set 1 - Solutions

Problem set 1 - Solutions EMPIRICAL FINANCE AND FINANCIAL ECONOMETRICS - MODULE (8448) Problem set 1 - Solutions Exercise 1 -Solutions 1. The correct answer is (a). In fact, the process generating daily prices is usually assumed

More information

LECTURE 12 UNIT ROOT, WEAK CONVERGENCE, FUNCTIONAL CLT

LECTURE 12 UNIT ROOT, WEAK CONVERGENCE, FUNCTIONAL CLT MARCH 29, 26 LECTURE 2 UNIT ROOT, WEAK CONVERGENCE, FUNCTIONAL CLT (Davidson (2), Chapter 4; Phillips Lectures on Unit Roots, Cointegration and Nonstationarity; White (999), Chapter 7) Unit root processes

More information

Time Series 2. Robert Almgren. Sept. 21, 2009

Time Series 2. Robert Almgren. Sept. 21, 2009 Time Series 2 Robert Almgren Sept. 21, 2009 This week we will talk about linear time series models: AR, MA, ARMA, ARIMA, etc. First we will talk about theory and after we will talk about fitting the models

More information

Let (Ω, F) be a measureable space. A filtration in discrete time is a sequence of. F s F t

Let (Ω, F) be a measureable space. A filtration in discrete time is a sequence of. F s F t 2.2 Filtrations Let (Ω, F) be a measureable space. A filtration in discrete time is a sequence of σ algebras {F t } such that F t F and F t F t+1 for all t = 0, 1,.... In continuous time, the second condition

More information

IEOR 4701: Stochastic Models in Financial Engineering. Summer 2007, Professor Whitt. SOLUTIONS to Homework Assignment 9: Brownian motion

IEOR 4701: Stochastic Models in Financial Engineering. Summer 2007, Professor Whitt. SOLUTIONS to Homework Assignment 9: Brownian motion IEOR 471: Stochastic Models in Financial Engineering Summer 27, Professor Whitt SOLUTIONS to Homework Assignment 9: Brownian motion In Ross, read Sections 1.1-1.3 and 1.6. (The total required reading there

More information

1 Review of di erential calculus

1 Review of di erential calculus Review of di erential calculus This chapter presents the main elements of di erential calculus needed in probability theory. Often, students taking a course on probability theory have problems with concepts

More information

Gaussian, Markov and stationary processes

Gaussian, Markov and stationary processes Gaussian, Markov and stationary processes Gonzalo Mateos Dept. of ECE and Goergen Institute for Data Science University of Rochester gmateosb@ece.rochester.edu http://www.ece.rochester.edu/~gmateosb/ November

More information

Measure-theoretic probability

Measure-theoretic probability Measure-theoretic probability Koltay L. VEGTMAM144B November 28, 2012 (VEGTMAM144B) Measure-theoretic probability November 28, 2012 1 / 27 The probability space De nition The (Ω, A, P) measure space is

More information

at time t, in dimension d. The index i varies in a countable set I. We call configuration the family, denoted generically by Φ: U (x i (t) x j (t))

at time t, in dimension d. The index i varies in a countable set I. We call configuration the family, denoted generically by Φ: U (x i (t) x j (t)) Notations In this chapter we investigate infinite systems of interacting particles subject to Newtonian dynamics Each particle is characterized by its position an velocity x i t, v i t R d R d at time

More information

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R. Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions

More information

ELEMENTARY LINEAR ALGEBRA

ELEMENTARY LINEAR ALGEBRA ELEMENTARY LINEAR ALGEBRA K R MATTHEWS DEPARTMENT OF MATHEMATICS UNIVERSITY OF QUEENSLAND First Printing, 99 Chapter LINEAR EQUATIONS Introduction to linear equations A linear equation in n unknowns x,

More information

Lecture 9. d N(0, 1). Now we fix n and think of a SRW on [0,1]. We take the k th step at time k n. and our increments are ± 1

Lecture 9. d N(0, 1). Now we fix n and think of a SRW on [0,1]. We take the k th step at time k n. and our increments are ± 1 Random Walks and Brownian Motion Tel Aviv University Spring 011 Lecture date: May 0, 011 Lecture 9 Instructor: Ron Peled Scribe: Jonathan Hermon In today s lecture we present the Brownian motion (BM).

More information

Linear Algebra March 16, 2019

Linear Algebra March 16, 2019 Linear Algebra March 16, 2019 2 Contents 0.1 Notation................................ 4 1 Systems of linear equations, and matrices 5 1.1 Systems of linear equations..................... 5 1.2 Augmented

More information

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory Part V 7 Introduction: What are measures and why measurable sets Lebesgue Integration Theory Definition 7. (Preliminary). A measure on a set is a function :2 [ ] such that. () = 2. If { } = is a finite

More information

October 7, :8 WSPC/WS-IJWMIP paper. Polynomial functions are renable

October 7, :8 WSPC/WS-IJWMIP paper. Polynomial functions are renable International Journal of Wavelets, Multiresolution and Information Processing c World Scientic Publishing Company Polynomial functions are renable Henning Thielemann Institut für Informatik Martin-Luther-Universität

More information

Introduction to Linear Algebra. Tyrone L. Vincent

Introduction to Linear Algebra. Tyrone L. Vincent Introduction to Linear Algebra Tyrone L. Vincent Engineering Division, Colorado School of Mines, Golden, CO E-mail address: tvincent@mines.edu URL: http://egweb.mines.edu/~tvincent Contents Chapter. Revew

More information

Parametric Signal Modeling and Linear Prediction Theory 1. Discrete-time Stochastic Processes

Parametric Signal Modeling and Linear Prediction Theory 1. Discrete-time Stochastic Processes Parametric Signal Modeling and Linear Prediction Theory 1. Discrete-time Stochastic Processes Electrical & Computer Engineering North Carolina State University Acknowledgment: ECE792-41 slides were adapted

More information

A time series is called strictly stationary if the joint distribution of every collection (Y t

A time series is called strictly stationary if the joint distribution of every collection (Y t 5 Time series A time series is a set of observations recorded over time. You can think for example at the GDP of a country over the years (or quarters) or the hourly measurements of temperature over a

More information

Estimates for probabilities of independent events and infinite series

Estimates for probabilities of independent events and infinite series Estimates for probabilities of independent events and infinite series Jürgen Grahl and Shahar evo September 9, 06 arxiv:609.0894v [math.pr] 8 Sep 06 Abstract This paper deals with finite or infinite sequences

More information

Notes on Asymptotic Theory: Convergence in Probability and Distribution Introduction to Econometric Theory Econ. 770

Notes on Asymptotic Theory: Convergence in Probability and Distribution Introduction to Econometric Theory Econ. 770 Notes on Asymptotic Theory: Convergence in Probability and Distribution Introduction to Econometric Theory Econ. 770 Jonathan B. Hill Dept. of Economics University of North Carolina - Chapel Hill November

More information

Stochastic Processes

Stochastic Processes Stochastic Processes A very simple introduction Péter Medvegyev 2009, January Medvegyev (CEU) Stochastic Processes 2009, January 1 / 54 Summary from measure theory De nition (X, A) is a measurable space

More information

Midterm for Introduction to Numerical Analysis I, AMSC/CMSC 466, on 10/29/2015

Midterm for Introduction to Numerical Analysis I, AMSC/CMSC 466, on 10/29/2015 Midterm for Introduction to Numerical Analysis I, AMSC/CMSC 466, on 10/29/2015 The test lasts 1 hour and 15 minutes. No documents are allowed. The use of a calculator, cell phone or other equivalent electronic

More information

1 Linear Difference Equations

1 Linear Difference Equations ARMA Handout Jialin Yu 1 Linear Difference Equations First order systems Let {ε t } t=1 denote an input sequence and {y t} t=1 sequence generated by denote an output y t = φy t 1 + ε t t = 1, 2,... with

More information

Econ 424 Time Series Concepts

Econ 424 Time Series Concepts Econ 424 Time Series Concepts Eric Zivot January 20 2015 Time Series Processes Stochastic (Random) Process { 1 2 +1 } = { } = sequence of random variables indexed by time Observed time series of length

More information

Applied Probability and Stochastic Processes

Applied Probability and Stochastic Processes Applied Probability and Stochastic Processes In Engineering and Physical Sciences MICHEL K. OCHI University of Florida A Wiley-Interscience Publication JOHN WILEY & SONS New York - Chichester Brisbane

More information

Analysis-3 lecture schemes

Analysis-3 lecture schemes Analysis-3 lecture schemes (with Homeworks) 1 Csörgő István November, 2015 1 A jegyzet az ELTE Informatikai Kar 2015. évi Jegyzetpályázatának támogatásával készült Contents 1. Lesson 1 4 1.1. The Space

More information

Probability and Statistics

Probability and Statistics Probability and Statistics 1 Contents some stochastic processes Stationary Stochastic Processes 2 4. Some Stochastic Processes 4.1 Bernoulli process 4.2 Binomial process 4.3 Sine wave process 4.4 Random-telegraph

More information

The Multivariate Gaussian Distribution [DRAFT]

The Multivariate Gaussian Distribution [DRAFT] The Multivariate Gaussian Distribution DRAFT David S. Rosenberg Abstract This is a collection of a few key and standard results about multivariate Gaussian distributions. I have not included many proofs,

More information

Math 443/543 Graph Theory Notes 5: Graphs as matrices, spectral graph theory, and PageRank

Math 443/543 Graph Theory Notes 5: Graphs as matrices, spectral graph theory, and PageRank Math 443/543 Graph Theory Notes 5: Graphs as matrices, spectral graph theory, and PageRank David Glickenstein November 3, 4 Representing graphs as matrices It will sometimes be useful to represent graphs

More information

Lecture - 30 Stationary Processes

Lecture - 30 Stationary Processes Probability and Random Variables Prof. M. Chakraborty Department of Electronics and Electrical Communication Engineering Indian Institute of Technology, Kharagpur Lecture - 30 Stationary Processes So,

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

5.4 Continuity: Preliminary Notions

5.4 Continuity: Preliminary Notions 5.4. CONTINUITY: PRELIMINARY NOTIONS 181 5.4 Continuity: Preliminary Notions 5.4.1 Definitions The American Heritage Dictionary of the English Language defines continuity as an uninterrupted succession,

More information

Prof. Dr.-Ing. Armin Dekorsy Department of Communications Engineering. Stochastic Processes and Linear Algebra Recap Slides

Prof. Dr.-Ing. Armin Dekorsy Department of Communications Engineering. Stochastic Processes and Linear Algebra Recap Slides Prof. Dr.-Ing. Armin Dekorsy Department of Communications Engineering Stochastic Processes and Linear Algebra Recap Slides Stochastic processes and variables XX tt 0 = XX xx nn (tt) xx 2 (tt) XX tt XX

More information

7: FOURIER SERIES STEVEN HEILMAN

7: FOURIER SERIES STEVEN HEILMAN 7: FOURIER SERIES STEVE HEILMA Contents 1. Review 1 2. Introduction 1 3. Periodic Functions 2 4. Inner Products on Periodic Functions 3 5. Trigonometric Polynomials 5 6. Periodic Convolutions 7 7. Fourier

More information

Gaussian vectors and central limit theorem

Gaussian vectors and central limit theorem Gaussian vectors and central limit theorem Samy Tindel Purdue University Probability Theory 2 - MA 539 Samy T. Gaussian vectors & CLT Probability Theory 1 / 86 Outline 1 Real Gaussian random variables

More information

Chapter 6. Order Statistics and Quantiles. 6.1 Extreme Order Statistics

Chapter 6. Order Statistics and Quantiles. 6.1 Extreme Order Statistics Chapter 6 Order Statistics and Quantiles 61 Extreme Order Statistics Suppose we have a finite sample X 1,, X n Conditional on this sample, we define the values X 1),, X n) to be a permutation of X 1,,

More information

1 Euclidean geometry. 1.1 The metric on R n

1 Euclidean geometry. 1.1 The metric on R n 1 Euclidean geometry This chapter discusses the geometry of n-dimensional Euclidean space E n, together with its distance function. The distance gives rise to other notions such as angles and congruent

More information

white noise Time moving average

white noise Time moving average 1.3 Time Series Statistical Models 13 white noise w 3 1 0 1 0 100 00 300 400 500 Time moving average v 1.5 0.5 0.5 1.5 0 100 00 300 400 500 Fig. 1.8. Gaussian white noise series (top) and three-point moving

More information

Lecture 3 Stationary Processes and the Ergodic LLN (Reference Section 2.2, Hayashi)

Lecture 3 Stationary Processes and the Ergodic LLN (Reference Section 2.2, Hayashi) Lecture 3 Stationary Processes and the Ergodic LLN (Reference Section 2.2, Hayashi) Our immediate goal is to formulate an LLN and a CLT which can be applied to establish sufficient conditions for the consistency

More information

We simply compute: for v = x i e i, bilinearity of B implies that Q B (v) = B(v, v) is given by xi x j B(e i, e j ) =

We simply compute: for v = x i e i, bilinearity of B implies that Q B (v) = B(v, v) is given by xi x j B(e i, e j ) = Math 395. Quadratic spaces over R 1. Algebraic preliminaries Let V be a vector space over a field F. Recall that a quadratic form on V is a map Q : V F such that Q(cv) = c 2 Q(v) for all v V and c F, and

More information

ENSC327 Communications Systems 19: Random Processes. Jie Liang School of Engineering Science Simon Fraser University

ENSC327 Communications Systems 19: Random Processes. Jie Liang School of Engineering Science Simon Fraser University ENSC327 Communications Systems 19: Random Processes Jie Liang School of Engineering Science Simon Fraser University 1 Outline Random processes Stationary random processes Autocorrelation of random processes

More information

Probability Models in Electrical and Computer Engineering Mathematical models as tools in analysis and design Deterministic models Probability models

Probability Models in Electrical and Computer Engineering Mathematical models as tools in analysis and design Deterministic models Probability models Probability Models in Electrical and Computer Engineering Mathematical models as tools in analysis and design Deterministic models Probability models Statistical regularity Properties of relative frequency

More information

4 Sums of Independent Random Variables

4 Sums of Independent Random Variables 4 Sums of Independent Random Variables Standing Assumptions: Assume throughout this section that (,F,P) is a fixed probability space and that X 1, X 2, X 3,... are independent real-valued random variables

More information

a 11 x 1 + a 12 x a 1n x n = b 1 a 21 x 1 + a 22 x a 2n x n = b 2.

a 11 x 1 + a 12 x a 1n x n = b 1 a 21 x 1 + a 22 x a 2n x n = b 2. Chapter 1 LINEAR EQUATIONS 11 Introduction to linear equations A linear equation in n unknowns x 1, x,, x n is an equation of the form a 1 x 1 + a x + + a n x n = b, where a 1, a,, a n, b are given real

More information

2. Prime and Maximal Ideals

2. Prime and Maximal Ideals 18 Andreas Gathmann 2. Prime and Maximal Ideals There are two special kinds of ideals that are of particular importance, both algebraically and geometrically: the so-called prime and maximal ideals. Let

More information

3. ESTIMATION OF SIGNALS USING A LEAST SQUARES TECHNIQUE

3. ESTIMATION OF SIGNALS USING A LEAST SQUARES TECHNIQUE 3. ESTIMATION OF SIGNALS USING A LEAST SQUARES TECHNIQUE 3.0 INTRODUCTION The purpose of this chapter is to introduce estimators shortly. More elaborated courses on System Identification, which are given

More information

Markov Chains and Stochastic Sampling

Markov Chains and Stochastic Sampling Part I Markov Chains and Stochastic Sampling 1 Markov Chains and Random Walks on Graphs 1.1 Structure of Finite Markov Chains We shall only consider Markov chains with a finite, but usually very large,

More information

5 Birkhoff s Ergodic Theorem

5 Birkhoff s Ergodic Theorem 5 Birkhoff s Ergodic Theorem Birkhoff s Ergodic Theorem extends the validity of Kolmogorov s strong law to the class of stationary sequences of random variables. Stationary sequences occur naturally even

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

Stochastic Processes. Theory for Applications. Robert G. Gallager CAMBRIDGE UNIVERSITY PRESS

Stochastic Processes. Theory for Applications. Robert G. Gallager CAMBRIDGE UNIVERSITY PRESS Stochastic Processes Theory for Applications Robert G. Gallager CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv Swgg&sfzoMj ybr zmjfr%cforj owf fmdy xix Acknowledgements xxi 1 Introduction and review

More information

Some Notes on Linear Algebra

Some Notes on Linear Algebra Some Notes on Linear Algebra prepared for a first course in differential equations Thomas L Scofield Department of Mathematics and Statistics Calvin College 1998 1 The purpose of these notes is to present

More information

11. Further Issues in Using OLS with TS Data

11. Further Issues in Using OLS with TS Data 11. Further Issues in Using OLS with TS Data With TS, including lags of the dependent variable often allow us to fit much better the variation in y Exact distribution theory is rarely available in TS applications,

More information

Gaussian processes. Basic Properties VAG002-

Gaussian processes. Basic Properties VAG002- Gaussian processes The class of Gaussian processes is one of the most widely used families of stochastic processes for modeling dependent data observed over time, or space, or time and space. The popularity

More information

ECON 616: Lecture 1: Time Series Basics

ECON 616: Lecture 1: Time Series Basics ECON 616: Lecture 1: Time Series Basics ED HERBST August 30, 2017 References Overview: Chapters 1-3 from Hamilton (1994). Technical Details: Chapters 2-3 from Brockwell and Davis (1987). Intuition: Chapters

More information

UC Berkeley Department of Electrical Engineering and Computer Sciences. EECS 126: Probability and Random Processes

UC Berkeley Department of Electrical Engineering and Computer Sciences. EECS 126: Probability and Random Processes UC Berkeley Department of Electrical Engineering and Computer Sciences EECS 6: Probability and Random Processes Problem Set 3 Spring 9 Self-Graded Scores Due: February 8, 9 Submit your self-graded scores

More information

Stochastic Processes. Monday, November 14, 11

Stochastic Processes. Monday, November 14, 11 Stochastic Processes 1 Definition and Classification X(, t): stochastic process: X : T! R (, t) X(, t) where is a sample space and T is time. {X(, t) is a family of r.v. defined on {, A, P and indexed

More information

Statistics 992 Continuous-time Markov Chains Spring 2004

Statistics 992 Continuous-time Markov Chains Spring 2004 Summary Continuous-time finite-state-space Markov chains are stochastic processes that are widely used to model the process of nucleotide substitution. This chapter aims to present much of the mathematics

More information

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adopted from Prof. H.R. Rabiee s and also Prof. R. Gutierrez-Osuna

More information

16 1 Basic Facts from Functional Analysis and Banach Lattices

16 1 Basic Facts from Functional Analysis and Banach Lattices 16 1 Basic Facts from Functional Analysis and Banach Lattices 1.2.3 Banach Steinhaus Theorem Another fundamental theorem of functional analysis is the Banach Steinhaus theorem, or the Uniform Boundedness

More information

Elementary linear algebra

Elementary linear algebra Chapter 1 Elementary linear algebra 1.1 Vector spaces Vector spaces owe their importance to the fact that so many models arising in the solutions of specific problems turn out to be vector spaces. The

More information

THE CORONA FACTORIZATION PROPERTY AND REFINEMENT MONOIDS

THE CORONA FACTORIZATION PROPERTY AND REFINEMENT MONOIDS THE CORONA FACTORIZATION PROPERTY AND REFINEMENT MONOIDS EDUARD ORTEGA, FRANCESC PERERA, AND MIKAEL RØRDAM ABSTRACT. The Corona Factorization Property of a C -algebra, originally defined to study extensions

More information

Connectedness. Proposition 2.2. The following are equivalent for a topological space (X, T ).

Connectedness. Proposition 2.2. The following are equivalent for a topological space (X, T ). Connectedness 1 Motivation Connectedness is the sort of topological property that students love. Its definition is intuitive and easy to understand, and it is a powerful tool in proofs of well-known results.

More information

Notes on Random Vectors and Multivariate Normal

Notes on Random Vectors and Multivariate Normal MATH 590 Spring 06 Notes on Random Vectors and Multivariate Normal Properties of Random Vectors If X,, X n are random variables, then X = X,, X n ) is a random vector, with the cumulative distribution

More information

ECE 650 Lecture 4. Intro to Estimation Theory Random Vectors. ECE 650 D. Van Alphen 1

ECE 650 Lecture 4. Intro to Estimation Theory Random Vectors. ECE 650 D. Van Alphen 1 EE 650 Lecture 4 Intro to Estimation Theory Random Vectors EE 650 D. Van Alphen 1 Lecture Overview: Random Variables & Estimation Theory Functions of RV s (5.9) Introduction to Estimation Theory MMSE Estimation

More information