SGN Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection
1 SGN Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection
Ioan Tabus, Department of Signal Processing, Tampere University of Technology, Finland
2 Overview
- Basic notions of estimation theory
- Statistically efficient estimates
- ARMA, AR and MA models
- Estimation of AR and ARMA parameters
- Selection of the model order for AR models
3 Basic notions of estimation theory
Assume that a set of data x_1, x_2, ..., x_N is given, and we postulate a model in the form of the pdf p(x|a) as a function of a parameter a. An estimate of a is denoted â and is a function of the data x_1, x_2, ..., x_N.
Example 1: Consider that the data x_1, x_2, ..., x_N are independent and identically distributed, i.e., were generated as realizations of the random variable X = a + ε. We assume that ε_i and ε_j are independent for all pairs i, j with i ≠ j. Additionally we assume that ε has a Gaussian distribution N(0, σ²). The intuitive significance of a is that it is the mean of the variables X_i. Having available a realization x_1, x_2, ..., x_N, an estimate of the mean is
â = (1/N) Σ_{i=1}^N x_i
The estimate is a function of the data. The particular function used to compute the estimate is called an estimator. Many estimators can be proposed. If the realization changes, the value of â also changes. Hence â changes with the realizations of X_i and is itself a random variable.
By the independence assumption the joint pdf is
p(x_1, x_2, ..., x_N | a) = (1 / ((2π)^{N/2} σ^N)) exp(−(1/(2σ²)) Σ_{i=1}^N (x_i − a)²)
4 The likelihood function
The likelihood function is denoted L(a | x_1, x_2, ..., x_N) and is seen as a function of the variable a, given the data x_1, x_2, ..., x_N. It is defined formally as
L(a | x_1, x_2, ..., x_N) = p(x_1, x_2, ..., x_N | a)
although the pdf is defined for a fixed a and has the variables x_1, x_2, ..., x_N.
In many cases, when the pdfs contain an exponential function, it is more convenient to operate with the logarithm of the likelihood, named the log-likelihood, log L(a | x_1, x_2, ..., x_N).
For Example 1, the likelihood is
L(a | x_1, x_2, ..., x_N) = (1 / ((2π)^{N/2} σ^N)) exp(−(1/(2σ²)) Σ_{i=1}^N (x_i − a)²)
and the log-likelihood is
log L(a | x_1, x_2, ..., x_N) = −(1/(2σ²)) Σ_{i=1}^N (x_i − a)² − (N/2) log(2π) − N log σ
5 The maximum likelihood (ML) estimate
The maximum likelihood (ML) estimate is the estimate that maximizes the likelihood function. One may want to estimate the unknown parameter by the value having the highest likelihood. This is just one possible estimate of the many existing ones (moment fitting estimators are also quite popular). However, in general, when ML is easy to compute, it is likely to have very useful properties. The formal definition is
â_ML = arg max_a L(a | x_1, x_2, ..., x_N)
â_ML = arg max_a log L(a | x_1, x_2, ..., x_N)
The two definitions are equivalent: since the log function is monotone increasing, the maxima of the function and of its logarithm are achieved at the same value of a.
For Example 1, to find the maximizing a take the derivative of the log-likelihood with respect to a:
∂/∂a log L(a | x_1, x_2, ..., x_N) = ∂/∂a [−(1/(2σ²)) Σ_{i=1}^N (x_i − a)² − (N/2) log(2π) − N log σ] = (1/σ²) Σ_{i=1}^N (x_i − a)
To find the extremum point, equate the derivative to 0:
(1/σ²) Σ_{i=1}^N (x_i − â_ML) = 0  ⇒  â_ML = (1/N) Σ_{i=1}^N x_i
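As a quick numerical illustration (a sketch in Python with NumPy; the data sizes and seed are arbitrary choices), the closed-form ML estimate, the sample mean, can be compared against a brute-force maximization of the log-likelihood over a grid of candidate values of a:

```python
import numpy as np

rng = np.random.default_rng(0)
a_true, sigma, N = 2.0, 1.0, 1000
x = a_true + sigma * rng.standard_normal(N)   # x_i = a + eps_i

# Closed-form ML estimate: the sample mean
a_ml = x.mean()

# Brute-force check: evaluate the log-likelihood on a grid of candidate a
grid = np.linspace(0.0, 4.0, 4001)
loglik = np.array([-np.sum((x - a) ** 2) / (2 * sigma**2) for a in grid])
a_grid = grid[np.argmax(loglik)]

print(a_ml, a_grid)   # the two values agree up to the grid resolution
```

Dropping the constant terms of the log-likelihood does not change the maximizer, which is why only the sum of squares is evaluated on the grid.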
6 Properties of a good estimator
The most interesting properties of an estimate are the following:
Small bias: Bias(â) = E[â] − a. When Bias(â) = 0 the estimate is said to be unbiased.
Small variance: Var(â) = E[(â − E[â])²], which for an unbiased estimate equals E[(â − a)²]. To quantify how small a variance can be, several bounds are available, the most important being the Cramér-Rao bound.
Consistency: lim_{N→∞} â = a. When the number of available data points is very large one wishes to have a very precise estimate, very close to the true a; the more data is provided, the closer the estimate becomes to the true value.
7 ML estimate of a in Example 1
In the case of Example 1, the bias in the ML estimate of a can be evaluated easily:
E[â_ML] = E[(1/N) Σ X_i] = (1/N) Σ E[X_i] = (1/N) Σ E[a + ε_i] = a + (1/N) Σ E[ε_i] = a
so the bias B(â_ML) = E[â_ML] − a = 0 and the estimator is unbiased.
The variance of the estimate â_ML results from
Var[â_ML] = E[(â_ML − a)²] = E[((1/N) Σ X_i − a)²] = (1/N²) Σ Var(ε_i) = (1/N²) N σ² = σ²/N
where we have used the property that the variance of the sum of the independent random variables ε_1, ..., ε_N is the sum of the variances of the variables, each variance being σ².
When the number of measurements grows to infinity, the variance of the ML estimate tends to 0:
lim_{N→∞} Var[â_ML] = lim_{N→∞} σ²/N = 0
This ensures that the fluctuations of â_ML about its mean become extremely small, and is one form of convergence for random variables (â_ML is said to converge in mean square to a).
8 ML estimate of σ in Example 1
When computing the ML estimate of a we assumed σ is known. In fact the estimate of a does not depend on σ. If we want to find an estimate of σ in addition to the estimate of a, we have to solve the system
∂/∂a log L(a, σ | x_1, x_2, ..., x_N) = 0
∂/∂σ log L(a, σ | x_1, x_2, ..., x_N) = 0
and denote the solution (â_ML, σ̂_ML). The maximizing â_ML does not depend on σ and is as before â_ML = (1/N) Σ x_i.
To find the maximizing σ take the derivative of the log-likelihood with respect to σ:
∂/∂σ log L(a, σ | x_1, x_2, ..., x_N) = ∂/∂σ [−(1/(2σ²)) Σ (x_i − a)² − (N/2) log(2π) − N log σ] = (1/σ³) Σ (x_i − a)² − N/σ
Equating this to 0, and taking now also a to be the ML estimate,
(1/σ̂_ML³) Σ (x_i − â_ML)² − N/σ̂_ML = 0  ⇒  σ̂²_ML = (1/N) Σ (x_i − â_ML)²
9 ML estimate of σ in Example 1
The bias in the ML estimate of σ² can be evaluated as
E[σ̂²_ML] = (1/N) Σ E[(x_i − â_ML)²] = (1/N) Σ E[(a + ε_i − â_ML)²]
and using â_ML = a + (1/N) Σ_j ε_j:
E[σ̂²_ML] = (1/N) Σ_i E[(ε_i − (1/N) Σ_j ε_j)²] = (1/N) Σ_i E[(((N−1)/N) ε_i − (1/N) Σ_{j≠i} ε_j)²] = (1/N) Σ_i ((N−1)²/N² + (N−1)/N²) E[ε_i²] = ((N−1)/N) σ²
so the bias is B(σ̂²_ML) = E[σ̂²_ML] − σ² = −σ²/N and the estimator is biased.
An unbiased estimator of σ² can be found simply by scaling the ML estimator:
σ̂²_unbiased = (N/(N−1)) σ̂²_ML = (1/(N−1)) Σ (x_i − â_ML)²
The ML estimate, although biased, is preferred in some applications (e.g., in spectrum estimation).
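The bias factor (N−1)/N can be checked by simulation; a Python sketch (sample sizes, σ² and seed are arbitrary choices), averaging the ML variance estimate over many independent realizations:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, N, trials = 4.0, 10, 200_000
x = np.sqrt(sigma2) * rng.standard_normal((trials, N))

m = x.mean(axis=1, keepdims=True)        # ML estimate of a, per trial
s2_ml = np.mean((x - m) ** 2, axis=1)    # ML estimate of sigma^2, per trial

print(s2_ml.mean())                  # close to (N-1)/N * sigma2 = 3.6
print(s2_ml.mean() * N / (N - 1))    # close to sigma2 = 4.0
```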
10 Cramér-Rao bound
Consider a set of data x = [x_1 ... x_N] generated by a distribution p(x|a) and take any estimate â(x). Denote the bias B(a) = E[â(x) − a | a]. The MSE can be decomposed as
MSE = E[(â(x) − a)² | a] = E[â(x)² | a] − 2a E[â(x) | a] + a²
Var = E[(â(x) − E[â(x)|a])² | a] = E[â(x)² | a] − (E[â(x)|a])²
⇒ MSE = Var + (E[â(x)|a])² − 2a E[â(x)|a] + a² = Var + (Bias)²
The MSE of the estimate can then be bounded using the Cramér-Rao bound (CRB) inequality:
MSE = E[(â(x) − a)² | a] ≥ [1 + ∂B(a)/∂a]² / E[−∂²/∂a² log p(x|a) | a] = CRB
The denominator in the CRB is known as the Fisher information, I(a) = E[−∂²/∂a² log p(x|a) | a].
If the estimator is unbiased, Bias = B(a) = 0, then
Var(â(x)) = E[(â(x) − a)² | a] ≥ 1 / E[−∂²/∂a² log p(x|a) | a] = 1/I(a) = CRB
11 Efficient estimate
An estimator is efficient if:
- it is unbiased: Bias = B(a) = E[â(x) − a | a] = 0
- its variance achieves the CRB: Var(â(x)) = E[(â(x) − a)² | a] = 1/I(a) = CRB
When an estimator is efficient, there is no other unbiased estimator that can have smaller variance.
12 Asymptotic properties of the ML estimator
- the estimator is consistent: â(x) → a as N → ∞
- the estimator is asymptotically Gaussian, with mean a and variance the inverse of the Fisher information I(a)
- the estimator is asymptotically efficient
13 ML estimates of uniform distribution parameters
Consider a uniform distribution U(0, a). Here the parameter is a, the upper bound of the interval on which the distribution is defined. The pdf is p(x|a) = 1/a if 0 ≤ x ≤ a and p(x|a) = 0 if x > a.
Assume that a single realization x_1 is available, so N = 1. What is the ML estimate of a? We need to maximize with respect to a the likelihood function L(a | x_1). If a < x_1, L(a | x_1) = 0. If a ≥ x_1, then L(a | x_1) = 1/a. The maximum L(a | x_1) = 1/a is obtained for the smallest admissible a, i.e., â_ML = x_1.
When N = 1 the ML estimate is very biased. We have E[x_1] = a/2 and since â_ML = x_1, we have E[â_ML] = a/2, and the bias is Bias(â_ML) = −a/2.
Assume two realizations x_1, x_2 are available. Repeating the reasoning above for the largest of them, it results that â_ML = max{x_1, x_2}.
Assume that the number of realizations goes to infinity. Then the value â_ML = max{x_1, ..., x_N} tends to a and the ML estimate is consistent.
The ML estimate for the uniform distribution shows that partial derivatives are not always needed to find ML estimates.
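A short simulation (Python sketch; a, N and the seed are arbitrary choices) makes the bias of the max-based ML estimate visible: for N i.i.d. samples from U(0, a), the expected value of the maximum is aN/(N+1), so the estimate is biased low.

```python
import numpy as np

rng = np.random.default_rng(2)
a_true, N, trials = 5.0, 10, 100_000
x = rng.uniform(0.0, a_true, size=(trials, N))

a_ml = x.max(axis=1)       # ML estimate: the largest observation

print(a_ml.mean())         # close to N/(N+1) * a = 4.545..., below a = 5
```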
14 ML estimates for Gaussian distributed model errors
Most often we assume an independent and identically distributed Gaussian model for the errors e_t in models involving a vector of parameters θ (see the AR models later in the lecture):
y_t = u_t^T θ + e_t
Then maximizing the likelihood
max_θ log L(θ | x_1, x_2, ..., x_N) = max_θ [−(1/(2σ²)) Σ_{i=1}^N (y_i − u_i^T θ)² − (N/2) log(2π) − N log σ]
becomes equivalent to minimizing the sum of squares
min_θ Σ_{i=1}^N (y_i − u_i^T θ)²
which explains why the LS estimator has very good statistical properties.
15 Simplest example: the AR(1) model
Consider a first order autoregressive model
y_n = a y_{n−1} + e_n
where e_n is a white noise sequence of zero mean and variance σ_e², initialized at n = 0 with y_0 = 0.
Using the time delay operator q^{−1} we can obtain the dependence of y_n on the inputs e_1, ..., e_n:
y_n = a q^{−1} y_n + e_n  ⇒  y_n = e_n / (1 − a q^{−1}) = (1 + a q^{−1} + a² q^{−2} + a³ q^{−3} + ...) e_n
y_n = e_n + a e_{n−1} + a² e_{n−2} + a³ e_{n−3} + ... + a^{n−1} e_1
The cross-correlation E[y_n e_{n+k}] is zero for all k > 0:
E[y_n e_{n+k}] = E[e_{n+k} e_n] + a E[e_{n+k} e_{n−1}] + a² E[e_{n+k} e_{n−2}] + ... + a^{n−1} E[e_{n+k} e_1] = 0
since e_{n+k} is not correlated with any e_j with j ≤ n.
Given y_0 = 0 and e_1, ..., e_N, the values of y_1, ..., y_N can be computed. Given the same y_0 = 0 and another sequence e_1, ..., e_N, another sequence y_1, ..., y_N results. Hence the sequence y_1, ..., y_N is random and depends on the realization of e_1, ..., e_N.
16 AR(1) model
[Figure: input white noise e_t versus time t; output AR(1) signal y_t versus time t; autocorrelation function r(τ) versus lag τ, estimated r and true r.]
17 AR(1) model
[Figure: input white noise e_t versus time t; output AR(1) signal y_t versus time t, for a = 0.98; autocorrelation function r(τ) versus lag τ, estimated r and true r.]
18 Stability and stationarity of the AR(1) model
Property: the mean of y_n is zero, for all n. We have y_1 = e_1 and the mean E[y_1] = 0. Then E[y_2] = a E[y_1] + E[e_2] = 0 and by induction we see that E[y_n] = 0.
We want to evaluate the variance of y_n, i.e., the autocorrelation function r_yy(n, n) = E[y_n²]. We have
r_yy(n, n) = E[y_n²] = E[(a y_{n−1} + e_n)²] = a² E[y_{n−1}²] + 2a E[y_{n−1} e_n] + E[e_n²] = a² r_yy(n−1, n−1) + σ_e²
Iterating the equation r_yy(n, n) = a² r_yy(n−1, n−1) + σ_e²:
r_yy(n, n) = a⁴ r_yy(n−2, n−2) + (1 + a²) σ_e² = ... = a^{2n} r_yy(0, 0) + (1 + a² + a⁴ + ... + a^{2n−2}) σ_e² = ((1 − a^{2n})/(1 − a²)) σ_e²
using r_yy(0, 0) = 0.
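The variance recursion can be verified by Monte Carlo simulation; a Python sketch (a, σ_e, the horizon and the seed are arbitrary choices), propagating many independent realizations from y_0 = 0 and comparing the empirical variance with ((1 − a^{2n})/(1 − a²)) σ_e²:

```python
import numpy as np

rng = np.random.default_rng(3)
a, sigma_e, n_steps, trials = 0.9, 1.0, 50, 200_000

y = np.zeros(trials)                 # y_0 = 0 in every realization
for n in range(1, n_steps + 1):
    y = a * y + sigma_e * rng.standard_normal(trials)

var_sim = y.var()
var_theory = (1 - a ** (2 * n_steps)) / (1 - a ** 2) * sigma_e ** 2
print(var_sim, var_theory)     # both close to 1/(1 - a^2) = 5.26...
```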
19 Stability and stationarity of the AR(1) model
In general the autocorrelations depend on n,
r_yy(n, n) = ((1 − a^{2n})/(1 − a²)) σ_e²
and hence the AR(1) process is not stationary.
If the parameter a is smaller than 1 in modulus, after an initial transient regime the autocorrelation values converge to the value
r_yy(n, n) → σ_e²/(1 − a²)
and the process is asymptotically stationary (in practice the transient regime can be short).
If the parameter a is larger than 1 in modulus, the autocorrelations diverge and the AR(1) process is non-stationary.
In general, for an AR(k) process y_n = a_1 y_{n−1} + ... + a_k y_{n−k} + e_n, the condition of stability is that all the roots of the polynomial A(z) = 1 − a_1 z^{−1} − a_2 z^{−2} − ... − a_k z^{−k} are inside the unit circle. For the AR(1) this requires that the root of A(z) = 1 − a_1 z^{−1}, which is z = a_1 = a, be smaller than 1 in modulus.
Exercise: if one initializes the AR(1) randomly, with y_0 of zero mean and variance σ_e²/(1 − a²), then the AR(1) model is wide sense stationary, i.e., all the autocorrelation values r_yy(i, j) = E[y_i y_j] depend only on the difference i − j.
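The stability check on the roots of A(z) is a one-liner; a Python sketch for a hypothetical AR(2) model (the coefficient values are illustrative assumptions). With A(z) = 1 − a_1 z^{−1} − a_2 z^{−2}, the roots of A(z) = 0 in the variable z are the roots of z² − a_1 z − a_2.

```python
import numpy as np

a1, a2 = 1.2, -0.5      # hypothetical AR(2) coefficients
# roots of z^2 - a1 z - a2 = 0, i.e. the roots of A(z) in the variable z
roots = np.roots([1.0, -a1, -a2])

print(np.abs(roots))    # both moduli below 1: the AR(2) model is stable
```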
20 AR, MA and ARMA models
The general ARMA(n,m) model is described by the difference equation
y_t = a_1 y_{t−1} + a_2 y_{t−2} + ... + a_n y_{t−n} + e_t + b_1 e_{t−1} + ... + b_m e_{t−m}
Applying the Z transform to both sides and assuming zero initial conditions:
Y(z) = a_1 z^{−1} Y(z) + a_2 z^{−2} Y(z) + ... + a_n z^{−n} Y(z) + E(z) + b_1 z^{−1} E(z) + ... + b_m z^{−m} E(z)
(1 − a_1 z^{−1} − a_2 z^{−2} − ... − a_n z^{−n}) Y(z) = (1 + b_1 z^{−1} + ... + b_m z^{−m}) E(z)
Y(z) = ((1 + b_1 z^{−1} + ... + b_m z^{−m}) / (1 − a_1 z^{−1} − a_2 z^{−2} − ... − a_n z^{−n})) E(z) = (B(z)/A(z)) E(z)
The model written in the time domain using the time shift operator is
(1 − a_1 q^{−1} − a_2 q^{−2} − ... − a_n q^{−n}) y_t = (1 + b_1 q^{−1} + ... + b_m q^{−m}) e_t
y_t = ((1 + b_1 q^{−1} + ... + b_m q^{−m}) / (1 − a_1 q^{−1} − a_2 q^{−2} − ... − a_n q^{−n})) e_t
Block diagram: E(z) → H(z) = B(z)/A(z) → Y(z)
21 AR, MA models
The autoregressive AR(n) model is described by the difference equation
y_t = a_1 y_{t−1} + a_2 y_{t−2} + ... + a_n y_{t−n} + e_t
or the transfer function H(z) = 1/A(z).
The moving average MA(m) model is described by the difference equation
y_t = e_t + b_1 e_{t−1} + ... + b_m e_{t−m}
or the transfer function H(z) = B(z).
Equivalence between models:
- An MA(m) model is equivalent to an AR(∞) model
- An AR(n) model is equivalent to an MA(∞) model
- An ARMA(n,m) model has an equivalent AR(∞) model and an equivalent MA(∞) model
22 Equivalence of an ARMA model to an AR(∞) model
Example: consider an ARMA(1,1) model with B(z) = 1 − 0.8 z^{−1},
H(z) = B(z)/A(z) = (1 − 0.8 z^{−1})/A(z)
1/H(z) = A(z) (1 − 0.8 z^{−1})^{−1} = A(z)(1 + 0.8 z^{−1} + 0.8² z^{−2} + 0.8³ z^{−3} + ...) = C(z)
so C(q^{−1}) y_t = e_t is an AR(∞) model.
The infinitely long AR(∞) model can be truncated to a finite number of coefficients, and then we have an approximate equivalence between an ARMA(n,m) model and a very long AR(L) model. The long model has L coefficients, many more than the n + m coefficients of the ARMA model. If two models are equivalent, in principle we prefer the model having fewer parameters. However, the parameters of the AR model are easier to estimate (we already discussed the Levinson-Durbin and LS methods), while for the ARMA model parameter estimation is a nonlinear optimization problem, difficult to solve.
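The AR(∞) coefficients can be computed by polynomial long division of A(z) by B(z); below is a Python sketch. The coefficient values are illustrative assumptions (the AR part A(z) = 1 − 0.5 z^{−1} is a made-up choice; the MA part B(z) = 1 − 0.8 z^{−1} matches the slide's example).

```python
import numpy as np

def arma_to_ar(a_coefs, b_coefs, L):
    """First L+1 coefficients of C(z) = A(z)/B(z) (truncated AR model).

    Conventions: A(z) = 1 - sum_i a_i z^-i, B(z) = 1 + sum_i b_i z^-i.
    """
    A = np.r_[1.0, -np.asarray(a_coefs, dtype=float)]
    B = np.r_[1.0, np.asarray(b_coefs, dtype=float)]
    c = np.zeros(L + 1)
    for k in range(L + 1):
        s = A[k] if k < len(A) else 0.0
        # long division: c_k = A_k - sum_{j=1..k} B_j c_{k-j}
        for j in range(1, min(k, len(B) - 1) + 1):
            s -= B[j] * c[k - j]
        c[k] = s
    return c

# hypothetical ARMA(1,1): A(z) = 1 - 0.5 z^-1, B(z) = 1 - 0.8 z^-1
c = arma_to_ar([0.5], [-0.8], L=30)

# sanity check: B(z) C(z) must reproduce A(z) up to the truncation order
recon = np.convolve(np.r_[1.0, -0.8], c)[:31]
err = np.max(np.abs(recon - np.r_[1.0, -0.5, np.zeros(29)]))
print(c[:4], err)
```

The truncation error is controlled by the decay of the c_k, here geometric with ratio 0.8, which is why L must be chosen large when the MA zeros are close to the unit circle.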
23 Estimation of AR models
The autoregressive AR(n) model is described by the difference equation
y_t = a_1 y_{t−1} + a_2 y_{t−2} + ... + a_n y_{t−n} + e_t
which is a prediction model of order n. The Levinson-Durbin algorithm can be used to estimate its parameters, for all possible orders n = 1, ..., n_max.
The LS method can also be applied to estimate the AR parameters. Denote the unknowns θ = [a_1 a_2 ... a_n]^T. Denoting the data vector u_t = [y_{t−1} y_{t−2} ... y_{t−n}]^T, and since the desired values are d_t = y_t, the following system of equations results:
y_1 = u_1^T θ + e_1
...
y_N = u_N^T θ + e_N
The system can be solved in the LS sense, and has the solution
θ̂ = (Σ_{i=1}^N u_i u_i^T)^{−1} Σ_{i=1}^N u_i d_i   (1)
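The LS solution (1) amounts to one call to a linear solver once the regressor matrix is assembled; a Python sketch (the function name, the simulated AR(2) coefficients and the sizes are illustrative choices):

```python
import numpy as np

def fit_ar_ls(y, n):
    """Least-squares estimate of the AR(n) parameters from data y."""
    N = len(y)
    # regressor rows u_t = [y_{t-1}, ..., y_{t-n}] for t = n, ..., N-1
    U = np.column_stack([y[n - k - 1 : N - k - 1] for k in range(n)])
    d = y[n:]                              # desired values d_t = y_t
    theta, *_ = np.linalg.lstsq(U, d, rcond=None)
    return theta

# simulate an AR(2) process and recover its parameters
rng = np.random.default_rng(4)
a = np.array([1.2, -0.5])
N = 20_000
y = np.zeros(N)
for t in range(2, N):
    y[t] = a[0] * y[t - 1] + a[1] * y[t - 2] + rng.standard_normal()

theta_hat = fit_ar_ls(y, 2)
print(theta_hat)      # close to [1.2, -0.5]
```

`np.linalg.lstsq` solves the overdetermined system directly; forming (Σ u_i u_i^T)^{−1} Σ u_i d_i explicitly gives the same result but is numerically less robust.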
24 Estimation of MA models
The moving average MA(m) model is described by the difference equation
y_t = e_t + b_1 e_{t−1} + ... + b_m e_{t−m},  H(z) = B(z)
Its autocorrelation function is
r_y(k) = E[y_t y_{t−k}] = E[(e_t + b_1 e_{t−1} + ... + b_m e_{t−m})(e_{t−k} + b_1 e_{t−k−1} + ... + b_m e_{t−k−m})]
If |k| > m then r_y(k) = 0. If 0 ≤ |k| ≤ m then, with b_0 = 1,
r_y(k) = σ_e² Σ_{i=0}^{m−|k|} b_i b_{i+|k|}
Collecting all equations together:
r_y(0) = σ_e² (1 + b_1² + b_2² + ... + b_m²)
r_y(1) = σ_e² (b_1 + b_1 b_2 + b_2 b_3 + ... + b_{m−1} b_m)
...
r_y(m−1) = σ_e² (b_{m−1} + b_1 b_m)
r_y(m) = σ_e² b_m
which is a nonlinear system of m + 1 equations with the m + 1 unknowns σ_e², b_1, ..., b_m. Nonlinear solvers may have trouble due to local minima in the criterion. More often used is the two-stage least squares method, applied also to the more general ARMA model, described next.
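The autocorrelation formulas can be checked on a simulated MA(2) process; a Python sketch with illustrative coefficients b_1 = 0.6, b_2 = −0.3 and σ_e² = 1 (all assumed values):

```python
import numpy as np

rng = np.random.default_rng(7)
b = np.array([0.6, -0.3])       # hypothetical MA(2) coefficients
N = 500_000
e = rng.standard_normal(N)      # white noise, sigma_e^2 = 1

y = e.copy()
y[1:] += b[0] * e[:-1]
y[2:] += b[1] * e[:-2]

# theoretical r_y(k) = sigma_e^2 * sum_i b_i b_{i+k}, with b_0 = 1
bb = np.r_[1.0, b]
r_theory = [float(bb[: 3 - k] @ bb[k:]) for k in range(3)]
r_hat = [float(np.mean(y[k:] * y[: N - k])) for k in range(3)]
r3_hat = float(np.mean(y[3:] * y[:-3]))   # vanishes for lags k > m = 2

print(r_theory, r_hat, r3_hat)
```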
25 Estimation of ARMA models: Two-stage LS method
The general ARMA(n,m) model is described by the difference equation
y_t = a_1 y_{t−1} + a_2 y_{t−2} + ... + a_n y_{t−n} + e_t + b_1 e_{t−1} + ... + b_m e_{t−m}
An ARMA(n,m) model has an equivalent AR(∞) model, B(z)/A(z) = 1/C(z), where C(z) = 1 + c_1 z^{−1} + c_2 z^{−2} + ... has an infinite number of coefficients.
We first estimate a truncated form of C(z), where only the coefficients c_1, c_2, c_3, ..., c_L are retained, with L very large but L < N, e.g., L = N/4. We use the LS method for estimation (this is the first LS stage) in the model
y_t = c_1 y_{t−1} + c_2 y_{t−2} + ... + c_L y_{t−L} + ε_t
using the values y_1, ..., y_N, and we obtain the estimated coefficients ĉ_1, ĉ_2, ĉ_3, ..., ĉ_L.
From the estimated coefficients ĉ_1, ĉ_2, ĉ_3, ..., ĉ_L we compute the estimated residuals of the model, ε̂_1, ε̂_2, ..., ε̂_N:
ε̂_t = y_t − ĉ_1 y_{t−1} − ĉ_2 y_{t−2} − ... − ĉ_L y_{t−L}
26 Estimation of ARMA models: Two-stage LS method
We return to estimating the unknown coefficients of the ARMA(n,m) model, using now the equations, for all t = 1, ..., N,
y_t = a_1 y_{t−1} + a_2 y_{t−2} + ... + a_n y_{t−n} + ε̂_t + b_1 ε̂_{t−1} + ... + b_m ε̂_{t−m}
from which we obtain the system of equations, for all t = 1, ..., N,
y_t − ε̂_t = a_1 y_{t−1} + a_2 y_{t−2} + ... + a_n y_{t−n} + b_1 ε̂_{t−1} + ... + b_m ε̂_{t−m}
with the unknowns θ = [a_1 a_2 ... a_n b_1 b_2 ... b_m]^T. Denoting the data vector u_t = [y_{t−1} y_{t−2} ... y_{t−n} ε̂_{t−1} ε̂_{t−2} ... ε̂_{t−m}]^T and the desired values d_t = y_t − ε̂_t, the overdetermined system of equations becomes
d_1 = u_1^T θ
...
d_N = u_N^T θ
having the LS solution
θ̂ = (Σ_{i=1}^N u_i u_i^T)^{−1} Σ_{i=1}^N u_i d_i   (2)
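The two stages can be sketched in a few lines of Python; the function name, the simulated ARMA(1,1) coefficients (0.7 and 0.4) and the truncation length L are illustrative assumptions:

```python
import numpy as np

def arma_two_stage_ls(y, n, m, L):
    """Two-stage LS estimate of ARMA(n,m) parameters (sketch)."""
    N = len(y)
    # stage 1: long AR(L) model y_t = c_1 y_{t-1} + ... + c_L y_{t-L} + eps_t
    U1 = np.column_stack([y[L - k - 1 : N - k - 1] for k in range(L)])
    c, *_ = np.linalg.lstsq(U1, y[L:], rcond=None)
    eps = np.zeros(N)
    eps[L:] = y[L:] - U1 @ c            # estimated residuals eps_t
    # stage 2: regress d_t = y_t - eps_t on past outputs and past residuals
    start = L + max(n, m)
    Uy = np.column_stack([y[start - k - 1 : N - k - 1] for k in range(n)])
    Ue = np.column_stack([eps[start - k - 1 : N - k - 1] for k in range(m)])
    theta, *_ = np.linalg.lstsq(np.hstack([Uy, Ue]),
                                y[start:] - eps[start:], rcond=None)
    return theta[:n], theta[n:]

# simulate a hypothetical ARMA(1,1): y_t = 0.7 y_{t-1} + e_t + 0.4 e_{t-1}
rng = np.random.default_rng(5)
N = 50_000
e = rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.7 * y[t - 1] + e[t] + 0.4 * e[t - 1]

a_hat, b_hat = arma_two_stage_ls(y, 1, 1, L=50)
print(a_hat, b_hat)      # close to [0.7] and [0.4]
```

Both stages are ordinary linear LS problems, which is exactly what makes this approach preferable to a general nonlinear solver.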
27 Selecting the order of the model
A set of candidate orders is given, most usually k ∈ {1, 2, ..., K}. An estimation method is used to obtain, for each order k, the estimates of the parameters θ̂_k and the power σ̂_k² of the residuals computed with the estimated vector θ̂_k.
For each candidate order k a certain criterion is computed, and the selected order is the order k minimizing the criterion.
There are many criteria in use, derived from various considerations, most often information-theoretic ones. None of them is good in all situations. A good policy is to check the optimal value given by each criterion and take a voting decision, taking into account the dangers of overfitting or underfitting in the current application.
All criteria penalize a too large number of parameters and contain a term which rewards very good fits. The selection is a trade-off between overfitting and underfitting. Overfitting occurs when we have a very good fit to the data, obtained for very large k, but in which case the noise in the data is modeled as well; such a model will not perform well on data different from that to which it was fitted. Underfitting occurs when we have too few parameters, and the fit to the data is poor.
28 Criteria for order selection
Final prediction error (FPE) criterion:
FPE(k) = ((N + k)/(N − k)) σ̂_k²
FPE tends to underestimate the order.
Akaike Information Criterion (AIC):
AIC(k) = N ln(σ̂_k²) + 2k
For large N, the AIC tends to overestimate the order (it does not penalize enough). The order selected by FPE never exceeds that selected by AIC, hence FPE is better for large N.
Minimum Description Length (MDL) criterion, also known as the Bayesian Information Criterion (BIC):
MDL(k) = N ln(σ̂_k²) + k ln N
For large N, MDL estimates the order correctly (it penalizes the number of parameters correctly). However, for small N the performance of AIC and MDL is similar.
There is no definitive ranking of the three methods.
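A Python sketch computing the three criteria for a range of candidate AR orders, using the LS fit for each order (the simulated AR(2) process, the sizes and the seed are illustrative choices):

```python
import numpy as np

def order_criteria(y, k_max):
    """FPE, AIC and MDL for AR model orders k = 1, ..., k_max."""
    N = len(y)
    crit = {}
    for k in range(1, k_max + 1):
        U = np.column_stack([y[k - j - 1 : N - j - 1] for j in range(k)])
        d = y[k:]
        theta, *_ = np.linalg.lstsq(U, d, rcond=None)
        s2 = float(np.mean((d - U @ theta) ** 2))   # residual power
        crit[k] = ((N + k) / (N - k) * s2,          # FPE(k)
                   N * np.log(s2) + 2 * k,          # AIC(k)
                   N * np.log(s2) + k * np.log(N))  # MDL(k)
    return crit

# simulate an AR(2) process; the criteria typically select k = 2
rng = np.random.default_rng(6)
N = 5_000
y = np.zeros(N)
for t in range(2, N):
    y[t] = 1.2 * y[t - 1] - 0.5 * y[t - 2] + rng.standard_normal()

crit = order_criteria(y, 6)
for idx, name in enumerate(["FPE", "AIC", "MDL"]):
    print(name, "selects order", min(crit, key=lambda k: crit[k][idx]))
```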
More informationDiscrete time processes
Discrete time processes Predictions are difficult. Especially about the future Mark Twain. Florian Herzog 2013 Modeling observed data When we model observed (realized) data, we encounter usually the following
More informationMATH4427 Notebook 2 Fall Semester 2017/2018
MATH4427 Notebook 2 Fall Semester 2017/2018 prepared by Professor Jenny Baglivo c Copyright 2009-2018 by Jenny A. Baglivo. All Rights Reserved. 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................
More informationHT Introduction. P(X i = x i ) = e λ λ x i
MODS STATISTICS Introduction. HT 2012 Simon Myers, Department of Statistics (and The Wellcome Trust Centre for Human Genetics) myers@stats.ox.ac.uk We will be concerned with the mathematical framework
More information1 Class Organization. 2 Introduction
Time Series Analysis, Lecture 1, 2018 1 1 Class Organization Course Description Prerequisite Homework and Grading Readings and Lecture Notes Course Website: http://www.nanlifinance.org/teaching.html wechat
More informationVector Auto-Regressive Models
Vector Auto-Regressive Models Laurent Ferrara 1 1 University of Paris Nanterre M2 Oct. 2018 Overview of the presentation 1. Vector Auto-Regressions Definition Estimation Testing 2. Impulse responses functions
More informationOn Moving Average Parameter Estimation
On Moving Average Parameter Estimation Niclas Sandgren and Petre Stoica Contact information: niclas.sandgren@it.uu.se, tel: +46 8 473392 Abstract Estimation of the autoregressive moving average (ARMA)
More informationVAR Models and Applications
VAR Models and Applications Laurent Ferrara 1 1 University of Paris West M2 EIPMC Oct. 2016 Overview of the presentation 1. Vector Auto-Regressions Definition Estimation Testing 2. Impulse responses functions
More informationRegression Estimation Least Squares and Maximum Likelihood
Regression Estimation Least Squares and Maximum Likelihood Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 3, Slide 1 Least Squares Max(min)imization Function to minimize
More informationAutoregressive and Moving-Average Models
Chapter 3 Autoregressive and Moving-Average Models 3.1 Introduction Let y be a random variable. We consider the elements of an observed time series {y 0,y 1,y2,...,y t } as being realizations of this randoms
More informationLecture 7: Model Building Bus 41910, Time Series Analysis, Mr. R. Tsay
Lecture 7: Model Building Bus 41910, Time Series Analysis, Mr R Tsay An effective procedure for building empirical time series models is the Box-Jenkins approach, which consists of three stages: model
More information6.1 Variational representation of f-divergences
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 6: Variational representation, HCR and CR lower bounds Lecturer: Yihong Wu Scribe: Georgios Rovatsos, Feb 11, 2016
More informationRegression I: Mean Squared Error and Measuring Quality of Fit
Regression I: Mean Squared Error and Measuring Quality of Fit -Applied Multivariate Analysis- Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving
More informationReview. December 4 th, Review
December 4 th, 2017 Att. Final exam: Course evaluation Friday, 12/14/2018, 10:30am 12:30pm Gore Hall 115 Overview Week 2 Week 4 Week 7 Week 10 Week 12 Chapter 6: Statistics and Sampling Distributions Chapter
More informationSelection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models
Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models Junfeng Shang Bowling Green State University, USA Abstract In the mixed modeling framework, Monte Carlo simulation
More informationCOS513 LECTURE 8 STATISTICAL CONCEPTS
COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions
More information6.3 Forecasting ARMA processes
6.3. FORECASTING ARMA PROCESSES 123 6.3 Forecasting ARMA processes The purpose of forecasting is to predict future values of a TS based on the data collected to the present. In this section we will discuss
More informationRegression, Ridge Regression, Lasso
Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.
More informationMax. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes
Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter
More informationAutomatic Autocorrelation and Spectral Analysis
Piet M.T. Broersen Automatic Autocorrelation and Spectral Analysis With 104 Figures Sprin ger 1 Introduction 1 1.1 Time Series Problems 1 2 Basic Concepts 11 2.1 Random Variables 11 2.2 Normal Distribution
More informationCovariance function estimation in Gaussian process regression
Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian
More informationReview Session: Econometrics - CLEFIN (20192)
Review Session: Econometrics - CLEFIN (20192) Part II: Univariate time series analysis Daniele Bianchi March 20, 2013 Fundamentals Stationarity A time series is a sequence of random variables x t, t =
More informationA SARIMAX coupled modelling applied to individual load curves intraday forecasting
A SARIMAX coupled modelling applied to individual load curves intraday forecasting Frédéric Proïa Workshop EDF Institut Henri Poincaré - Paris 05 avril 2012 INRIA Bordeaux Sud-Ouest Institut de Mathématiques
More informationMultivariate Regression
Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the
More information7. MULTIVARATE STATIONARY PROCESSES
7. MULTIVARATE STATIONARY PROCESSES 1 1 Some Preliminary Definitions and Concepts Random Vector: A vector X = (X 1,..., X n ) whose components are scalar-valued random variables on the same probability
More informationTutorial lecture 2: System identification
Tutorial lecture 2: System identification Data driven modeling: Find a good model from noisy data. Model class: Set of all a priori feasible candidate systems Identification procedure: Attach a system
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7
MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is
More informationCircle a single answer for each multiple choice question. Your choice should be made clearly.
TEST #1 STA 4853 March 4, 215 Name: Please read the following directions. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directions This exam is closed book and closed notes. There are 31 questions. Circle
More informationTesting Restrictions and Comparing Models
Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by
More informationStatistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation
Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence
More informationMachine Learning Basics: Maximum Likelihood Estimation
Machine Learning Basics: Maximum Likelihood Estimation Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics 1. Learning
More informationVariations. ECE 6540, Lecture 10 Maximum Likelihood Estimation
Variations ECE 6540, Lecture 10 Last Time BLUE (Best Linear Unbiased Estimator) Formulation Advantages Disadvantages 2 The BLUE A simplification Assume the estimator is a linear system For a single parameter
More informationMaximum Likelihood, Logistic Regression, and Stochastic Gradient Training
Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions
More information10. Time series regression and forecasting
10. Time series regression and forecasting Key feature of this section: Analysis of data on a single entity observed at multiple points in time (time series data) Typical research questions: What is the
More informationARIMA Modelling and Forecasting
ARIMA Modelling and Forecasting Economic time series often appear nonstationary, because of trends, seasonal patterns, cycles, etc. However, the differences may appear stationary. Δx t x t x t 1 (first
More informationTopic 4 Unit Roots. Gerald P. Dwyer. February Clemson University
Topic 4 Unit Roots Gerald P. Dwyer Clemson University February 2016 Outline 1 Unit Roots Introduction Trend and Difference Stationary Autocorrelations of Series That Have Deterministic or Stochastic Trends
More informationLinear models. Chapter Overview. Linear process: A process {X n } is a linear process if it has the representation.
Chapter 2 Linear models 2.1 Overview Linear process: A process {X n } is a linear process if it has the representation X n = b j ɛ n j j=0 for all n, where ɛ n N(0, σ 2 ) (Gaussian distributed with zero
More informationProblem Set 2: Box-Jenkins methodology
Problem Set : Box-Jenkins methodology 1) For an AR1) process we have: γ0) = σ ε 1 φ σ ε γ0) = 1 φ Hence, For a MA1) process, p lim R = φ γ0) = 1 + θ )σ ε σ ε 1 = γ0) 1 + θ Therefore, p lim R = 1 1 1 +
More informationModel selection using penalty function criteria
Model selection using penalty function criteria Laimonis Kavalieris University of Otago Dunedin, New Zealand Econometrics, Time Series Analysis, and Systems Theory Wien, June 18 20 Outline Classes of models.
More informationAlfredo A. Romero * College of William and Mary
A Note on the Use of in Model Selection Alfredo A. Romero * College of William and Mary College of William and Mary Department of Economics Working Paper Number 6 October 007 * Alfredo A. Romero is a Visiting
More informationUnit Root and Cointegration
Unit Root and Cointegration Carlos Hurtado Department of Economics University of Illinois at Urbana-Champaign hrtdmrt@illinois.edu Oct 7th, 016 C. Hurtado (UIUC - Economics) Applied Econometrics On the
More informationEstimation and Model Selection in Mixed Effects Models Part I. Adeline Samson 1
Estimation and Model Selection in Mixed Effects Models Part I Adeline Samson 1 1 University Paris Descartes Summer school 2009 - Lipari, Italy These slides are based on Marc Lavielle s slides Outline 1
More information5: MULTIVARATE STATIONARY PROCESSES
5: MULTIVARATE STATIONARY PROCESSES 1 1 Some Preliminary Definitions and Concepts Random Vector: A vector X = (X 1,..., X n ) whose components are scalarvalued random variables on the same probability
More informationDS-GA 1002 Lecture notes 11 Fall Bayesian statistics
DS-GA 100 Lecture notes 11 Fall 016 Bayesian statistics In the frequentist paradigm we model the data as realizations from a distribution that depends on deterministic parameters. In contrast, in Bayesian
More informationStatistics and Econometrics I
Statistics and Econometrics I Point Estimation Shiu-Sheng Chen Department of Economics National Taiwan University September 13, 2016 Shiu-Sheng Chen (NTU Econ) Statistics and Econometrics I September 13,
More informationCh. 14 Stationary ARMA Process
Ch. 14 Stationary ARMA Process A general linear stochastic model is described that suppose a time series to be generated by a linear aggregation of random shock. For practical representation it is desirable
More informationSystem Identification, Lecture 4
System Identification, Lecture 4 Kristiaan Pelckmans (IT/UU, 2338) Course code: 1RT880, Report code: 61800 - Spring 2012 F, FRI Uppsala University, Information Technology 30 Januari 2012 SI-2012 K. Pelckmans
More information