Improved Methods for Monte Carlo Estimation of the Fisher Information Matrix


2008 American Control Conference, Westin Seattle Hotel, Seattle, Washington, USA, June 11–13, 2008

Improved Methods for Monte Carlo Estimation of the Fisher Information Matrix

James C. Spall
The Johns Hopkins University Applied Physics Laboratory, Laurel, Maryland, U.S.A.

Summary: The Fisher information matrix summarizes the amount of information in a set of data relative to the quantities of interest and forms the basis for the Cramér-Rao (lower) bound on the uncertainty in an estimate. There are many applications of the information matrix in modeling, systems analysis, and estimation. This paper presents a resampling-based method for computing the information matrix together with some new theory related to efficient implementation. We show how certain properties associated with the likelihood function and the error in the estimates of the Hessian matrix can be exploited to improve the accuracy of the Monte Carlo-based estimate of the information matrix.

Key words: System identification; Monte Carlo simulation; Cramér-Rao bound; simultaneous perturbation (SPSA); likelihood function.

1. Problem Setting

The Fisher information matrix has long played an important role in parameter estimation and system identification (e.g., Ljung, 1999). In many practical problems, however, the information matrix is difficult or impossible to obtain analytically. This includes nonlinear and/or non-Gaussian models as well as some linear problems (e.g., Segal and Weinstein, 1988; Levy, 1995). In previous work (Spall, 2005), the author has presented a relatively simple Monte Carlo means of obtaining the Fisher information matrix for use in complex estimation settings. In contrast to the conventional approach, there is no need to analytically compute the expected value of either the Hessian matrix of the log-likelihood function or the outer product of the gradient of the log-likelihood function. The Monte Carlo approach can work with either evaluations of the log-likelihood function or evaluations of the gradient of the log-likelihood function, depending on what information is available. The required expected value in the definition of the information matrix is estimated via Monte Carlo averaging combined with simulation-based generation of artificial data. An extension to this basic Monte Carlo method is given in Das et al. (2007), where it is shown that prior knowledge of some of the elements in the information matrix can lead to improved estimates for all elements.

This paper introduces two simple modifications to the approach of Spall (2005) that improve the accuracy of the Monte Carlo estimate. These modifications may be implemented with little more difficulty than the original approach, and they may be used with or without the approach of Das et al. (2007) for handling prior information.

2. The Fisher Information Matrix and Associated Approximations

Consider a collection of $n$ random vectors $Z \equiv [z_1, z_2, \ldots, z_n]$; these vectors are not necessarily i.i.d. Let us assume that the general form of the joint probability density or probability mass (or hybrid density/mass) function for the random data matrix $Z$ is known, but that this function depends on an unknown vector $\theta$. Let the probability density/mass function for $Z$ be $p_Z(\zeta \mid \theta)$, where $\zeta$ ("zeta") is a dummy matrix representing the possible outcomes for the elements of $Z$. The corresponding likelihood function, say $\ell(\theta \mid \zeta)$, satisfies
$$\ell(\theta \mid \zeta) = p_Z(\zeta \mid \theta). \qquad (2.1)$$
With the definition of the likelihood function in (2.1), we are now in a position to present the Fisher information matrix.
The expectations below are with respect to the data set $Z$. Let $L(\theta) \equiv \log \ell(\theta \mid Z)$ (so we are suppressing the data dependence in $L$). The $p \times p$ information matrix $F(\theta)$ for a differentiable log-likelihood function is given by
$$F(\theta) \equiv E\!\left(\frac{\partial L}{\partial \theta} \cdot \frac{\partial L}{\partial \theta^T} \,\Big|\, \theta\right). \qquad (2.2)$$
In the case where the underlying data $\{z_1, z_2, \ldots, z_n\}$ are independent, the magnitude of $F(\theta)$ will grow at a rate proportional to $n$, since $L$ represents a sum of $n$ random terms. The bounded quantity $F(\theta)/n$ is employed as an average information matrix over all measurements. Note also that when the data depend on some analyst-specified inputs, then $F(\theta)$ also depends on these inputs. For notational convenience, and since many applications depend on cases (such as i.i.d. data) where there are no inputs, we suppress this dependence and write $F(\theta)$ for the information matrix.
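As a familiar illustration of (2.2) and of the growth with $n$: if the data are $n$ i.i.d. scalar observations $z_j \sim N(\mu, \sigma^2)$ with $\theta = [\mu, \sigma^2]^T$, then
$$F(\theta) = n\begin{bmatrix} 1/\sigma^{2} & 0 \\ 0 & 1/(2\sigma^{4}) \end{bmatrix}, \qquad \frac{F(\theta)}{n} = \begin{bmatrix} 1/\sigma^{2} & 0 \\ 0 & 1/(2\sigma^{4}) \end{bmatrix},$$
so $F(\theta)$ grows linearly in $n$ while the average information matrix $F(\theta)/n$ remains bounded.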

Except for relatively simple problems, however, the form in (2.2) is generally not useful in the practical calculation of the information matrix: computing the expectation of a product of multivariate nonlinear functions is usually a hopeless task. A well-known equivalent form follows by assuming that $L$ is twice continuously differentiable in $\theta$ (a.s. in $Z$). That is, the Hessian matrix
$$H(\theta) \equiv \frac{\partial^2 L(\theta)}{\partial \theta\, \partial \theta^T}$$
is assumed to exist. Further, assume that the likelihood function is regular in the sense that standard conditions such as those in Wilks (1962) or Bickel and Doksum (1977) hold. One of these conditions is that the set $\{\zeta: \ell(\theta \mid \zeta) > 0\}$ does not depend on $\theta$. A fundamental implication of the regularity of the likelihood is that the necessary interchanges of differentiation and integration are valid. Then, the information matrix is related to the Hessian matrix of $L$ through
$$F(\theta) = -E\!\left(\frac{\partial^2 L}{\partial \theta\, \partial \theta^T} \,\Big|\, \theta\right) = -E\big(H(\theta)\big). \qquad (2.3)$$
The form in (2.3) is usually more amenable to calculation than the product-based form in (2.2). Note that in some applications the observed information matrix at a particular data set $Z$ (i.e., $-H(\theta)$ evaluated at the observed data) may be easier to compute and/or preferred from an inference point of view relative to the actual information matrix $F(\theta)$ in (2.3) (e.g., Efron and Hinkley, 1978). Although the method in this paper is described for the determination of $F(\theta)$, the efficient Hessian estimation described in Section 4 may also be used directly for the determination of the observed information when it is not easy to calculate the Hessian directly.

Expression (2.3) directly motivates a Monte Carlo simulation-based approach, as given in Spall (2005). Let $Z_{\text{pseudo}}(i)$ be a Monte Carlo-generated random matrix from the assumed distribution for the actual data, based on the parameters $\theta$ taking on some specified value (typically an estimated value). Note that $Z_{\text{pseudo}}(i)$ represents a sample of size $n$, analogous to the real data $Z$, and that $\dim(Z_{\text{pseudo}}(i)) = \dim(Z)$. Further, let $\hat{H}_k^{(i)}$ represent the $k$th estimate of $H(\theta)$ at the data matrix $Z_{\text{pseudo}}(i)$; $\hat{H}_k^{(i)}$ is to be used in an averaging process as described below. As described below, the estimate $\hat{H}_k^{(i)}$ is generated via efficient simultaneous perturbation (SPSA) principles (Spall, 1992) using either log-likelihood values $L(\theta)$ (alone) or the gradient (score vector) $g(\theta) \equiv \partial L(\theta)/\partial \theta$ if that is available. The former usually corresponds to cases where the likelihood function and associated nonlinear process are so complex that no gradients are available.

To highlight the fundamental commonality of approach, let $G_k^{(i)}(\theta)$ represent either a gradient approximation (based on $L(\theta)$ values) or the exact gradient $g(\theta)$, as used in the $k$th Hessian estimate at the given $Z_{\text{pseudo}}(i)$. Let $\Delta_k^{(i)} \equiv [\Delta_{k1}^{(i)}, \Delta_{k2}^{(i)}, \ldots, \Delta_{kp}^{(i)}]^T$ be a mean-zero random vector such that the scalar elements $\{\Delta_{kj}^{(i)}\}$ are independent, identically distributed, symmetrically distributed random variables that are uniformly bounded in magnitude and satisfy $E\big(|\Delta_{kj}^{(i)}|^{-1}\big) < \infty$. The latter condition excludes such commonly used Monte Carlo distributions as the uniform and the Gaussian. Note that the user has full control over the choice of the $\Delta_{kj}^{(i)}$ distribution. A valid (and simple) choice is the Bernoulli $\pm 1$ distribution (it is not known at this time if this is the best distribution to choose for this application).
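For concreteness, the minimal Python/NumPy sketch below generates one pseudo data set and one Bernoulli $\pm 1$ perturbation vector. The model shown (i.i.d. $N(\mu, \sigma^2)$ data with $\theta = [\mu, \log\sigma]^T$), the function names, and the numerical values are illustrative assumptions only; any model with a known data-generating form can be substituted.

```python
import numpy as np

rng = np.random.default_rng(1)

def generate_pseudo_data(theta, n):
    """Draw one pseudo data set Z_pseudo(i) of size n from the assumed model.

    Hypothetical model for illustration: z_j i.i.d. N(mu, sigma^2) with
    theta = [mu, log sigma].
    """
    mu, log_sigma = theta
    return rng.normal(mu, np.exp(log_sigma), size=n)

def bernoulli_perturbation(p):
    """Mean-zero Bernoulli +/-1 perturbation vector, a valid choice for Delta."""
    return rng.choice([-1.0, 1.0], size=p)

theta_hat = np.array([0.5, 0.0])                  # nominal theta (e.g., an estimate)
Z_pseudo = generate_pseudo_data(theta_hat, n=30)  # one artificial data set
Delta = bernoulli_perturbation(p=theta_hat.size)  # one perturbation vector
```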
The formula for estimating the Hessian at the point $\theta$ is
$$\hat{H}_k^{(i)} = \frac{1}{2}\left[ \frac{\delta G_k^{(i)}}{2c}\,\big(\Delta_k^{(i)\,-1}\big)^T + \left( \frac{\delta G_k^{(i)}}{2c}\,\big(\Delta_k^{(i)\,-1}\big)^T \right)^{T} \right], \qquad (2.4)$$
where $\delta G_k^{(i)} \equiv G_k^{(i)}\big(\theta + c\Delta_k^{(i)}\big) - G_k^{(i)}\big(\theta - c\Delta_k^{(i)}\big)$, $\Delta_k^{(i)\,-1}$ denotes the vector of inverses of the $p$ individual elements of $\Delta_k^{(i)}$, and $c > 0$ is a small constant. The prime rationale for (2.4) is that $\hat{H}_k^{(i)}$ is a nearly unbiased estimator of the unknown $H(\theta)$. Spall (2000) gives conditions such that the Hessian estimate has an $O(c^2)$ bias (the main such condition is smoothness of $L(\theta)$, as reflected in the assumption that $g(\theta)$ is thrice continuously differentiable in $\theta$ near the nominal value of interest). The symmetrizing operation in (2.4) (the multiplier $1/2$ and the indicated sum) is convenient for maintaining a symmetric Hessian estimate.

To illustrate how the individual Hessian estimates may be quite poor, note that $\hat{H}_k^{(i)}$ in (2.4) has (at most) rank two (and may not even be positive semi-definite). This low quality, however, does not prevent the information matrix estimate of interest from being accurate, since it is not the Hessian per se that is of interest. The averaging process eliminates the inadequacies of the individual Hessian estimates. The main source of efficiency for (2.4) is the fact that the estimate requires only a small (fixed) number of gradient or log-likelihood values for any dimension $p$. When gradient estimates are available, only two evaluations are needed (i.e., the two values $G_k^{(i)}(\theta \pm c\Delta_k^{(i)}) = g(\theta \pm c\Delta_k^{(i)})$, evaluated at $Z_{\text{pseudo}}(i)$, are used to form the Hessian estimate). When only log-likelihood values are available, each of the gradient approximations $G_k^{(i)}(\theta \pm c\Delta_k^{(i)})$ requires two evaluations of $L(\cdot)$. Hence, one approximation $\hat{H}_k^{(i)}$ uses four log-likelihood values.
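A minimal sketch of (2.4) for the case where the exact gradient is available is given below; `grad_log_lik(theta, Z)` is a hypothetical user-supplied routine returning $g(\theta)$ at the (pseudo) data set $Z$, and the value of $c$ is a placeholder.

```python
import numpy as np

def hessian_estimate_g(theta, Z, grad_log_lik, Delta, c=1e-4):
    """One SPSA-style Hessian estimate, eq. (2.4), using exact gradients.

    grad_log_lik(theta, Z) is assumed to return the gradient g(theta) of the
    log-likelihood at the (pseudo) data Z; Delta is a valid perturbation
    vector (e.g., Bernoulli +/-1).
    """
    dG = grad_log_lik(theta + c * Delta, Z) - grad_log_lik(theta - c * Delta, Z)
    # Outer product of dG/(2c) with the elementwise inverses of Delta, then symmetrize
    M = np.outer(dG / (2.0 * c), 1.0 / Delta)
    return 0.5 * (M + M.T)
```

With Bernoulli $\pm 1$ perturbations, `1.0 / Delta` equals `Delta`, but the elementwise inverse is kept so the fragment matches (2.4) for other valid perturbation distributions.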

The gradient approximations at the two design levels are
$$G_k^{(i)}\big(\theta \pm c\Delta_k^{(i)}\big) = \frac{L\big(\theta \pm c\Delta_k^{(i)} + \tilde{c}\,\tilde{\Delta}_k^{(i)}\big) - L\big(\theta \pm c\Delta_k^{(i)} - \tilde{c}\,\tilde{\Delta}_k^{(i)}\big)}{2\tilde{c}}\;\tilde{\Delta}_k^{(i)\,-1}, \qquad (2.5)$$
where $\tilde{\Delta}_k^{(i)} = [\tilde{\Delta}_{k1}^{(i)}, \tilde{\Delta}_{k2}^{(i)}, \ldots, \tilde{\Delta}_{kp}^{(i)}]^T$ is generated in the same statistical manner as $\Delta_k^{(i)}$, but independently of $\Delta_k^{(i)}$ (in particular, choosing the $\tilde{\Delta}_{kj}^{(i)}$ as independent Bernoulli $\pm 1$ random variables is a valid but not necessary choice), $\tilde{\Delta}_k^{(i)\,-1}$ denotes the vector of inverses of the $p$ elements of $\tilde{\Delta}_k^{(i)}$, and $\tilde{c} > 0$ (like $c$) is a small constant.

The Monte Carlo approach of Spall (2005) is based on a double averaging scheme. The first (inner) average forms Hessian estimates at a given $Z_{\text{pseudo}}(i)$ ($i = 1, 2, \ldots, N$) from the $k = 1, 2, \ldots, M$ values of $\hat{H}_k^{(i)}$, and the second (outer) average combines these sample-mean Hessian estimates across the $N$ values of pseudo data. Therefore, the Monte Carlo-based estimate of $F(\theta)$ in Spall (2005), denoted $\hat{F}_{M,N}(\theta)$, is
$$\hat{F}_{M,N}(\theta) \equiv -\frac{1}{N}\sum_{i=1}^{N}\frac{1}{M}\sum_{k=1}^{M} \hat{H}_k^{(i)}, \qquad (2.6)$$
or, in equivalent recursive (in $i = 1, 2, \ldots, N$) form,
$$\hat{F}_{M,i}(\theta) = \frac{i-1}{i}\,\hat{F}_{M,i-1}(\theta) - \frac{1}{iM}\sum_{k=1}^{M} \hat{H}_k^{(i)} \qquad (2.7)$$
(with $\hat{F}_{M,0} = 0$). The aim of this paper is to introduce two modifications to the basic form in (2.6) and (2.7).
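Pulling (2.4)–(2.6) together, the sketch below organizes the basic double-averaging estimate using log-likelihood values only, so each inner Hessian estimate uses the gradient approximation (2.5) with its own independent perturbation pair. The routines `log_lik(theta, Z)` and `generate_pseudo_data(theta, n)` are hypothetical user-supplied functions for the model at hand, and the constants are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

def fim_estimate(theta, log_lik, generate_pseudo_data, N=2000, M=1, n=30,
                 c=1e-4, c_tilde=1e-4):
    """Basic Monte Carlo estimate F_hat_{M,N}(theta) of eq. (2.6), using
    log-likelihood values only (four L evaluations per Hessian estimate)."""
    p = theta.size
    F_hat = np.zeros((p, p))
    for i in range(N):
        Z = generate_pseudo_data(theta, n)             # pseudo data set i
        H_bar = np.zeros((p, p))
        for k in range(M):
            Delta = rng.choice([-1.0, 1.0], size=p)    # Delta_k^(i)
            Delta_t = rng.choice([-1.0, 1.0], size=p)  # independent Delta-tilde_k^(i)

            def G(t):  # gradient approximation (2.5) at the design level t
                dL = log_lik(t + c_tilde * Delta_t, Z) - log_lik(t - c_tilde * Delta_t, Z)
                return dL / (2.0 * c_tilde) * (1.0 / Delta_t)

            dG = G(theta + c * Delta) - G(theta - c * Delta)
            M_mat = np.outer(dG / (2.0 * c), 1.0 / Delta)
            H_bar += 0.5 * (M_mat + M_mat.T) / M       # eq. (2.4), inner average over k
        F_hat += -H_bar / N                            # minus sign and outer average, eq. (2.6)
    return F_hat
```

For a model whose information matrix is known in closed form (such as the Gaussian illustration in Section 2), the output can be checked directly; accuracy improves as $N$ grows and as $c$ and $\tilde{c}$ shrink.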
3. Characterization of Error in Hessian Estimate

This section provides a few key facts about $\hat{H}_k^{(i)}$, as used in the averaging process for the estimation of the information matrix (eqn. (2.6)). We will use these facts in Sections 4 and 5 to show how the accuracy can be improved relative to the basic averaging process in (2.6) and (2.7). The probabilistic big-$O$ terms appearing below are to be interpreted in the almost surely (a.s.) sense (e.g., $O(c^2)$ implies a function that is a.s. bounded when divided by $c^2$ as $c \to 0$); all associated equalities hold a.s.

One of the key ways in which the error in the Fisher estimate will be reduced is through the use of feedback. We now present an expression for the error in the Hessian estimates that is useful in creating the feedback term. From Spall (2006), it is known that $\hat{H}_k^{(i)}$ in (2.4) can be decomposed into three parts:
$$\hat{H}_k^{(i)} = H(\theta) + \Psi_k^{(i)} + O(c^2), \qquad (3.1)$$
where $\Psi_k^{(i)}$ is a $p \times p$ matrix of terms dependent on $H(\theta)$, $\Delta_k^{(i)}$, and, when only $L$ values are available, the additional perturbation vector $\tilde{\Delta}_k^{(i)}$ (see Spall, 2006). Note that $\Psi_k^{(i)}$ represents the error due to the simultaneous perturbations ($\Delta_k^{(i)}$ and, if relevant, $\tilde{\Delta}_k^{(i)}$). The specific form and notation for the $\Psi_k^{(i)}$ term depends on whether $L(\theta)$ values or $g(\theta)$ values are available, represented as $\Psi_k^{(i)} = \Psi_k^{(i),L}$ or $\Psi_k^{(i)} = \Psi_k^{(i),g}$, respectively, in the notation below. Finally, the big-$O$ error is a reflection of the bias in the Hessian estimate; in the case where only $L$ values are available, the $O(c^2)$ bias assumes that $\tilde{c}/c$ is $O(1)$ as $c \to 0$.

Let us define $D_k^{(i)} \equiv \Delta_k^{(i)}\big(\Delta_k^{(i)\,-1}\big)^T - I_p$ and $\tilde{D}_k^{(i)} \equiv \tilde{\Delta}_k^{(i)}\big(\tilde{\Delta}_k^{(i)\,-1}\big)^T - I_p$, where $I_p$ is the $p \times p$ identity matrix (note that $D_k^{(i)}$ and $\tilde{D}_k^{(i)}$ are symmetric when the perturbations are i.i.d. Bernoulli $\pm 1$ distributed). Given $D_k^{(i)}$ and $\tilde{D}_k^{(i)}$ above, it is shown in Spall (2006) that with the $G_k^{(i)}(\theta \pm c\Delta_k^{(i)})$ formed from $L$ measurements only (see (2.5)),
$$\Psi_k^{(i),L}(H) = \tfrac{1}{2}\Big( \tilde{D}_k^{(i)} H D_k^{(i)} + \tilde{D}_k^{(i)} H + H D_k^{(i)} + D_k^{(i)} H \tilde{D}_k^{(i)} + D_k^{(i)} H + H \tilde{D}_k^{(i)} \Big), \qquad (3.2)$$
while with the $G_k^{(i)}(\theta \pm c\Delta_k^{(i)})$ formed from direct $g$ values,
$$\Psi_k^{(i),g}(H) = \tfrac{1}{2}\Big( H D_k^{(i)} + D_k^{(i)} H \Big). \qquad (3.3)$$

4. Implementation with Independent Perturbation per Measurement

Let us assume in this section that the $n$ data vectors entering each $Z_{\text{pseudo}}(i)$ are mutually independent (analogous to the real data $z_1, z_2, \ldots, z_n$ being mutually independent). The basic structure in Section 2 can be improved by exploiting this independence. In particular, the variance of the elements of the individual Hessian estimates $\hat{H}_k^{(i)}$ can be reduced by decomposing $\hat{H}_k^{(i)}$ into a sum of $n$ independent estimates, each corresponding to one of the data vectors. A separate perturbation vector can then be applied to each of the independent estimates, which produces variance reduction in the resulting estimate $\hat{F}_{M,N}$. The independent perturbations above reduce the variance of the elements in the estimate of $F(\theta)$ from $O(1/N)$ in Spall (2005) to $O(1/(nN))$. The complete version of the paper, available upon request, includes these results.
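The full construction for this section is only summarized above. As one plausible reading of the per-measurement decomposition, the sketch below assumes the log-likelihood separates as $L(\theta) = \sum_{j} L_j(\theta)$ over the $n$ independent measurements, with a hypothetical per-measurement routine `log_lik_one(theta, z_j)`, and gives each single-measurement Hessian estimate its own perturbation pair.

```python
import numpy as np

rng = np.random.default_rng(3)

def hessian_estimate_per_measurement(theta, Z, log_lik_one, c=1e-4, c_tilde=1e-4):
    """Sum of n per-measurement SPSA Hessian estimates, each with its own
    independent perturbation pair (one reading of the Section 4 idea).

    log_lik_one(theta, z_j) is assumed to return the log-likelihood
    contribution of the single measurement z_j, so that L = sum_j L_j.
    """
    p = theta.size
    H_hat = np.zeros((p, p))
    for z_j in Z:
        Delta = rng.choice([-1.0, 1.0], size=p)    # perturbations for this measurement only
        Delta_t = rng.choice([-1.0, 1.0], size=p)

        def G(t):  # gradient approximation (2.5) applied to the single term L_j
            dL = log_lik_one(t + c_tilde * Delta_t, z_j) - log_lik_one(t - c_tilde * Delta_t, z_j)
            return dL / (2.0 * c_tilde) * (1.0 / Delta_t)

        M_mat = np.outer((G(theta + c * Delta) - G(theta - c * Delta)) / (2.0 * c), 1.0 / Delta)
        H_hat += 0.5 * (M_mat + M_mat.T)           # per-measurement estimate, summed over j
    return H_hat
```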

5. Feedback-based Method for $F(\theta)$ Estimation

Aside from the independent perturbation idea of Section 4, variance reduction is possible by using the current estimate of $F(\theta)$ to reduce the variance of the next Hessian estimate. This, in turn, reduces the variance of the next estimate for $F(\theta)$. This feedback idea applies whether or not the independent perturbations of Section 4 are used. The idea here is in the spirit of Spall (2006), but the context is different in that the primary quantity of interest is $F(\theta)$, not the Hessian matrix. In essence, the variance reduction below follows from the fundamental property that $\mathrm{var}(X) \le E(X^2)$ for any random variable $X$ having a finite second moment.

Let $\bar{F}_{M,N}$ denote the feedback-based form of this section. So $\bar{F}_{M,N}$ is a direct replacement for $\hat{F}_{M,N}$ in Sections 2 and 4. Using the basic recursive form in (2.7) as the starting point, the feedback-based form for the estimate of the information matrix in recursive (in $i$) form is
$$\bar{F}_{M,i}(\theta) = \frac{i-1}{i}\,\bar{F}_{M,i-1}(\theta) - \frac{1}{iM}\sum_{k=1}^{M}\Big[ \hat{H}_k^{(i)} + \hat{\Psi}_k^{(i)}\big(\bar{F}_{M,i-1}(\theta)\big) \Big], \qquad (5.1)$$
where $\hat{\Psi}_k^{(i)} = \Psi_k^{(i),L}$ or $\Psi_k^{(i),g}$ from (3.2) or (3.3), as appropriate, and $\bar{F}_{M,0} = 0$. Note that the current best estimate of $F(\theta)$ (i.e., $\bar{F}_{M,i-1}$) is used when evaluating $\hat{\Psi}_k^{(i)}(\cdot)$; because $\Psi_k^{(i)}(\cdot)$ in (3.2) and (3.3) is linear in its matrix argument, this amounts to subtracting from each $\hat{H}_k^{(i)}$ the estimated perturbation error $\hat{\Psi}_k^{(i)}\big({-\bar{F}_{M,i-1}}\big)$ associated with the Hessian estimate.
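A minimal sketch of the recursion (5.1) for the gradient-based case (3.3), with $M = 1$ and Bernoulli $\pm 1$ perturbations, is given below; `grad_log_lik(theta, Z)` and `generate_pseudo_data(theta, n)` are hypothetical user-supplied routines for the model at hand.

```python
import numpy as np

rng = np.random.default_rng(4)

def psi_g(A, Delta):
    """Perturbation-error term (3.3) evaluated at the matrix argument A:
    Psi(A) = (A D + D A)/2 with D = Delta (1/Delta)^T - I
    (equal to Delta Delta^T - I for Bernoulli +/-1 perturbations)."""
    D = np.outer(Delta, 1.0 / Delta) - np.eye(Delta.size)
    return 0.5 * (A @ D + D @ A)

def fim_estimate_feedback(theta, grad_log_lik, generate_pseudo_data,
                          N=2000, n=30, c=1e-4):
    """Feedback-based recursion (5.1), gradient case, with M = 1."""
    p = theta.size
    F_bar = np.zeros((p, p))
    for i in range(1, N + 1):
        Z = generate_pseudo_data(theta, n)
        Delta = rng.choice([-1.0, 1.0], size=p)
        dG = grad_log_lik(theta + c * Delta, Z) - grad_log_lik(theta - c * Delta, Z)
        M_mat = np.outer(dG / (2.0 * c), 1.0 / Delta)
        H_hat = 0.5 * (M_mat + M_mat.T)                         # eq. (2.4)
        # Correct each raw Hessian estimate using the current information-matrix estimate
        F_bar = ((i - 1) / i) * F_bar - (H_hat + psi_g(F_bar, Delta)) / i
    return F_bar
```

The same structure applies in the $L$-values-only case by forming the Hessian estimate from (2.5) and using the error form (3.2) for the feedback term.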
The main result related to improved accuracy is Theorem 1 (and its Corollary 1) below. Because of the frequent need to refer to the specific $\theta$ of interest, as well as expansions of functions around that point of interest, we let $\theta^*$ represent the particular $\theta$ at which we wish to determine $F(\theta)$; also, for convenience, let $F^* \equiv F(\theta^*)$. As a first step towards analyzing the variance reduction due to feedback, we show in the Lemma below that $\bar{F}_{M,N} \to F^* + O(c^2)$ in mean square as $N \to \infty$ (fixed $M$). The results below apply when $\hat{H}_k^{(i)}$ is formed from either $L(\theta)$ values (alone) or from gradient (score) values $g(\theta)$, and/or when the independent perturbation idea of Section 4 is used. Following the notational convention established previously for the third derivatives of $L$, let $L^{(4)}(\theta)$ denote the $1 \times p^4$ row vector of all possible fourth derivatives of $L$. Vector and matrix norms $\|\cdot\|$ are the standard Euclidean norm; in the matrix case, this corresponds to the Frobenius norm, $\|A\| = \big(\sum_i \sum_j a_{ij}^2\big)^{1/2}$, where the $a_{ij}$ are the components of $A$.

Lemma. For some open neighborhood of $\theta^*$, suppose $L^{(4)}(\theta)$ exists continuously (a.s. in $Z$) and is bounded in magnitude. Further, let $E\big(\|\hat{H}_k^{(i)}\|^2\big) < \infty$ (recall that the $\hat{H}_k^{(i)}$ are identically distributed for all $k$ and $i$). Then, given the basic conditions on $\Delta_k^{(i)}$ in Section 2 (applying also to $\tilde{\Delta}_k^{(i)}$ when only $L$ values are used for $\hat{H}_k^{(i)}$), $E\big(\|\bar{F}_{M,N} - F^* - B(\theta^*)\|^2\big) \to 0$ as $N \to \infty$ for any fixed $M$ and all $c$ sufficiently small, where $B(\theta^*)$ is a bias matrix satisfying $B(\theta^*) = O(c^2)$.

Proof. Due to space limitations, we do not include the proof of the Lemma here; the proof is available upon request. A similar proof is in Spall (2006).

We are now in a position to establish that feedback reduces the asymptotic mean-squared error of the estimate for the information matrix. In the proofs of Theorem 1 and Corollary 1, we use the mean-squared convergence result of the Lemma to establish the main result on improved accuracy. Note that Theorem 1 and Corollary 1 only consider the $p \ge 2$ case because at $p = 1$, $\bar{F}_{M,N} = \hat{F}_{M,N}$ due to the errors in (3.2) and (3.3) being identically zero.

Theorem 1. Suppose that the conditions of the Lemma hold, $p \ge 2$, $E\big(\|H(\theta^*)\|^2\big) < \infty$, and $F^* \neq 0$. Further, suppose that for some $\delta_1 > 0$ and $\delta_2 > 0$: $E\big(\|L^{(4)}(\theta)\|^{2+\delta_1}\big)$ is uniformly bounded in magnitude for all $\theta$ in an open neighborhood of $\theta^*$; $E\big(|\Delta_{kj}^{(i)}|^{-(2+\delta_2)}\big) < \infty$; and, when only $L$ values are used, $E\big(|\tilde{\Delta}_{kj}^{(i)}|^{-(2+\delta_2)}\big) < \infty$ (arbitrary $i$, $j$, and $k$ by the i.i.d. assumption for the perturbation elements). Then the accuracy of $\bar{F}_{M,N}$ is greater than the accuracy of $\hat{F}_{M,N}$ in the sense that
$$\lim_{N\to\infty} \frac{E\big(\|\bar{F}_{M,N} - F^*\|^2\big)}{E\big(\|\hat{F}_{M,N} - F^*\|^2\big)} \le 1 + O(c^2). \qquad (5.2)$$

Proof. Let $f$, $\tilde{f}$, and $f^*$ denote corresponding (arbitrary) scalar elements of $\hat{F}_{M,N}$ (the no-feedback estimate), $\bar{F}_{M,N}$, and $F^*$, respectively. That is, $f$ and $\tilde{f}$ are estimates for one of the $p(p+1)/2$ unique elements in the information matrix. Further, let $\tilde{\psi}_i$, $\psi_i^*$, and $\bar{\psi}_i$ be the corresponding scalar elements of the error matrices $\Psi_i\big({-\bar{F}_{1,i-1}}\big)$, $\Psi_i\big(H(\theta^*)\big)$, and $\Psi_i\big(\bar{H}\big)$, where $\bar{H} \equiv E\big(H(\theta^*)\big) = -F^*$ (corresponding to the component of the information matrix being estimated by $f$ and $\tilde{f}$). Only the first of these errors is actually computed in the algorithm; however, the other two errors, $\Psi_i\big(H(\theta^*)\big)$ and $\Psi_i\big(\bar{H}\big)$, play a key role in the proof below (recall that $H(\theta^*)$, in general, depends on $Z_{\text{pseudo}}(i)$). Also, let $\hat{h}_i$ be the scalar element of $\hat{H}^{(i)}$ corresponding to the component of $\hat{F}_{M,N}$ and $\bar{F}_{M,N}$ being estimated by $f$ and $\tilde{f}$, respectively, and let $h_i$ denote the corresponding element of $H(\theta^*)$ evaluated at $Z_{\text{pseudo}}(i)$. Because (5.2) is based on the Frobenius norm, it is sufficient to show that $\lim_{N\to\infty} E\big[(\tilde{f} - f^*)^2\big]\big/E\big[(f - f^*)^2\big] \le 1 + O(c^2)$ for all of the $p(p+1)/2$ unique elements. As in the proof of the Lemma, without loss of generality, take $M = 1$. From (5.1),
$$\tilde{f} = -\frac{1}{N}\sum_{i=1}^{N}\big(\hat{h}_i - \tilde{\psi}_i\big).$$
Then, from the fact that $E\big(\hat{h}_i^2\big) < \infty$ (with $E\big(\hat{h}_i^2\big) = E\big(\hat{h}_j^2\big)$ for all $i$ and $j$) and the fact that $\bar{F}_{1,i}$ converges in mean square to $F^* + B(\theta^*)$ (the Lemma), it is known that the mean-squared error $E\big[(\tilde{f} - f^*)^2\big]$ exists for all $N$.

Thus, by the standard bias-variance decomposition of mean-squared error (e.g., Spall, 2003, p. 333),
$$E\big[(\tilde{f} - f^*)^2\big] = \frac{1}{N^2}\sum_{i=1}^{N} \mathrm{var}\big(\hat{h}_i - \tilde{\psi}_i\big) + O(c^4), \qquad (5.3)$$
where the summation follows from the uncorrelatedness of the summands $\hat{h}_i - \tilde{\psi}_i$, which follows by the independence across $i$ of the $\Delta_i$ (and $\tilde{\Delta}_i$ when the estimates are based only on measurements of $L$), and the $O(c^4)$ term follows from the fact that the elements of the bias matrix $B(\theta^*)$ are $O(c^2)$ (the Lemma). Analogously, in the no-feedback case (eqns. (2.6) and (2.7)),
$$E\big[(f - f^*)^2\big] = \frac{1}{N^2}\sum_{i=1}^{N} \mathrm{var}\big(\hat{h}_i\big) + O(c^4) \qquad (5.4)$$
(the specific analytical form of the squared-bias contribution $O(c^4)$ is identical in both the feedback and non-feedback cases). From expressions (5.3) and (5.4) for the mean-squared errors, it is clear that the relative errors in the feedback and non-feedback cases are determined by analyzing the relative behavior of $\mathrm{var}\big(\hat{h}_i - \tilde{\psi}_i\big)$ and $\mathrm{var}\big(\hat{h}_i\big)$. It can then be shown that
$$\mathrm{var}\big(\hat{h}_i - \tilde{\psi}_i\big) = \mathrm{var}\big(h_i + \psi_i^* - \tilde{\psi}_i\big) + O(c^2), \qquad (5.5)$$
$$\mathrm{var}\big(\hat{h}_i\big) = \mathrm{var}\big(h_i + \psi_i^*\big) + O(c^2), \qquad (5.6)$$
where the $O(c^2)$ terms absorb the remainder $e_i$ in the expansion $\hat{h}_i = h_i + \psi_i^* + e_i$ (see (3.1)). In the case of having direct $g$ values, $e_i$ is simpler: $e_i$ is a numerator of sums of elements in the third-derivative array of $L$ times terms within $\Delta_i$ over a denominator of one term from $\Delta_i$. The arguments above then follow analogously, leading to the relationships in (5.5) and (5.6). Note that $h_i$ and $\psi_i^* - \tilde{\psi}_i$ are uncorrelated and $h_i$ and $\psi_i^*$ are uncorrelated (the uncorrelatedness at each $i$ follows by the form for $\Psi_i$ and the independence of the $\Delta_i$, $Z_{\text{pseudo}}(i)$, and, when the estimates are based only on measurements of $L$, $\tilde{\Delta}_i$). Then, the variance terms on the right-hand side of (5.5) and (5.6) satisfy
$$\mathrm{var}\big(h_i + \psi_i^* - \tilde{\psi}_i\big) = \mathrm{var}(h_i) + \mathrm{var}\big(\psi_i^* - \tilde{\psi}_i\big), \qquad (5.7)$$
$$\mathrm{var}\big(h_i + \psi_i^*\big) = \mathrm{var}(h_i) + \mathrm{var}\big(\psi_i^*\big). \qquad (5.8)$$
As a vehicle towards characterizing the relative values of $\mathrm{var}\big(\hat{h}_i - \tilde{\psi}_i\big)$ and $\mathrm{var}\big(\hat{h}_i\big)$ via (5.5)–(5.8), we now show that the second term on the right-hand side of (5.7) satisfies $\mathrm{var}\big(\psi_i^* - \tilde{\psi}_i\big) \to \mathrm{var}\big(\psi_i^* - \bar{\psi}_i\big) + O(c^2)$ as $i \to \infty$. Note that
$$\mathrm{var}\big(\psi_i^* - \bar{\psi}_i\big) = E\big[(\psi_i^* - \bar{\psi}_i)^2\big] = E\big[(\psi_i^*)^2\big] - E\big[(\bar{\psi}_i)^2\big], \qquad (5.9)$$
where the first equality follows from $E(\psi_i^*) = E(\bar{\psi}_i) = 0$ and the second equality follows by the independence of the $\Delta_i$, $Z_{\text{pseudo}}(i)$, and (when the estimates are based only on measurements of $L$) $\tilde{\Delta}_i$ at each $i$. Hence, from the Lemma,
$$\lim_{i\to\infty}\Big[\mathrm{var}\big(\psi_i^* - \tilde{\psi}_i\big) - \mathrm{var}\big(\psi_i^* - \bar{\psi}_i\big)\Big] = O(c^2), \qquad (5.10)$$
as desired. Relative to the right-hand side of (5.8),
$$\mathrm{var}\big(\psi_i^*\big) = E\big[(\psi_i^*)^2\big], \qquad (5.11)$$
implying from (5.9) that $\mathrm{var}\big(\psi_i^* - \bar{\psi}_i\big) \le \mathrm{var}\big(\psi_i^*\big)$ (the inequality is strict when $E\big[(\bar{\psi}_i)^2\big] \neq 0$; see Corollary 1). Therefore, by the principle of Cesàro summability (e.g., Apostol, 1974, Theorem 8.48), it is known from (5.5)–(5.11) that
$$\lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N}\Big[\mathrm{var}\big(\hat{h}_i - \tilde{\psi}_i\big) - \mathrm{var}\big(\hat{h}_i\big)\Big] = \mathrm{var}\big(\psi_1^* - \bar{\psi}_1\big) - \mathrm{var}\big(\psi_1^*\big) + O(c^2) \le O(c^2) \qquad (5.12)$$
(it is sufficient to consider only the first values, $\psi_1^*$ and $\bar{\psi}_1$, in the limiting variance due to the identical distribution of the $\psi_i^*$ and the $\bar{\psi}_i$). To establish the result to be proved, we need to ensure that the indicated limit in (5.2) exists. From (5.3), (5.4), and (5.12) (which show that the numerator and denominator have the same $O(1/N)$ convergence rate to unique limits), the limit in (5.2) exists if $\mathrm{var}\big(\hat{h}_1\big) > 0$ for at least one of the $p(p+1)/2$ elements being estimated (for all sufficiently small $c$). From (5.6) and (5.8), $\mathrm{var}\big(\hat{h}_1\big) = \mathrm{var}(h_1) + \mathrm{var}\big(\psi_1^*\big) + O(c^2)$. In the case where $H(\theta^*)$ is deterministic, we know from the error decompositions in (3.2) and (3.3) that $\mathrm{var}\big(\psi_1^*\big) > 0$ for at least one element, because $H(\theta^*) = -F^*$ and it is assumed that $F^* \neq 0$.
In the case where $H(\theta^*)$ is random (i.e., at least one element is non-degenerately random), $\mathrm{var}(h_1)$ for the non-degenerate elements exists (by the assumption $E\big(\|H(\theta^*)\|^2\big) < \infty$) and satisfies $\mathrm{var}(h_1) > 0$. Hence, by (5.6) and (5.8), $\mathrm{var}\big(\hat{h}_1\big) > 0$ for at least one element. Finally, for any element such that $\mathrm{var}\big(\hat{h}_1\big) > 0$, we know by (5.3), (5.4), (5.5), (5.6), and (5.12) that
$$\lim_{N\to\infty} \frac{E\big[(\tilde{f} - f^*)^2\big]}{E\big[(f - f^*)^2\big]} = \frac{\mathrm{var}(h_1) + \mathrm{var}\big(\psi_1^* - \bar{\psi}_1\big) + O(c^2)}{\mathrm{var}(h_1) + \mathrm{var}\big(\psi_1^*\big) + O(c^2)} \le 1 + O(c^2). \qquad (5.13)$$

The result to be proved in (5.2) then follows from the fact that the results in (5.5)–(5.12) show that any elements with $\mathrm{var}\big(\hat{h}_1\big) = 0$ do not contribute to the norms in either the numerator or denominator. Q.E.D.

Corollary 1 establishes conditions for strict inequality in (5.2). The proof rests on the following fact: the solution matrix $X$ to the equation $AX + XB = 0$ is unique ($X = 0$) if and only if the square matrices $A$ and $-B$ have no eigenvalues in common (Lancaster and Tismenetsky, 1985).

Corollary 1 to Theorem 1. Suppose that the conditions of Theorem 1 hold, $\mathrm{rank}(F^*) \ge 2$, and that the elements of $\Delta_k^{(i)}$ and $\tilde{\Delta}_k^{(i)}$ are generated according to the Bernoulli $\pm 1$ distribution. Then the inequality in (5.2) is strict (i.e., $<$ instead of $\le$).

Proof. From (5.2) and the definition of the Frobenius norm, it is sufficient to show that the inequality in (5.13) is strict for at least one scalar element. From (5.12), this strict inequality follows if $\mathrm{var}\big(\psi_1^* - \bar{\psi}_1\big) < \mathrm{var}\big(\psi_1^*\big)$, which, as noted below (5.11), holds when $E\big[(\bar{\psi}_1)^2\big] \neq 0$, requiring that $\bar{\psi}_1$ cannot be 0 a.s. For both the case of $L$ measurements and $g$ measurements, we now establish that $\bar{\psi}_1$ cannot be 0 a.s. for at least one scalar element.

Let $\mathrm{rank}(\bar{H}) = r \le p$, and without loss of generality assume that the elements of $\theta$ are ordered such that the upper-left $r \times r$ block of $\bar{H}$ is full rank; in the arguments below, a subscript $r \times r$ denotes the upper-left $r \times r$ block of the indicated matrix. For the case of $L$ measurements, we know from (3.2) that
$$E\big(\Psi_1(\bar{H}) \,\big|\, \Delta_{11}, \Delta_{21}, \ldots, \Delta_{r1}\big)_{r\times r} = \tfrac{1}{2}\,E\big( \bar{H} D_1 + D_1 \bar{H} \,\big|\, \Delta_{11}, \Delta_{21}, \ldots, \Delta_{r1}\big)_{r\times r} = \tfrac{1}{2}\big( \bar{H}_{r\times r} D_{1;\,r\times r} + D_{1;\,r\times r} \bar{H}_{r\times r} \big),$$
where the first equality follows by the symmetry of $\bar{H}$ and $\tilde{D}_1$ (the latter due to the assumed Bernoulli distribution for the perturbations) and the fact that $\tilde{D}_1$ has mean 0, while the second equality follows from $D_1$ having mean 0 (the $D_{1;\,r\times r}$ submatrix that remains is from the indicated conditioning). Because $\bar{H}_{r\times r}$ and $-\bar{H}_{r\times r}$ have no eigenvalues in common, the above-mentioned result in Lancaster and Tismenetsky (1985) indicates that $E\big(\Psi_1(\bar{H}) \mid \Delta_{11}, \Delta_{21}, \ldots, \Delta_{r1}\big)_{r\times r} = 0$ (a necessary condition for $\Psi_1(\bar{H}) = 0$ a.s.) if and only if $D_{1;\,r\times r} = 0$. This cannot happen, since $D_{1;\,r\times r}$ takes on $2^{r-1}$ unique values ($r \ge 2$), none of which are 0. Likewise, for the case of $g$ measurements, we know from (3.3) that $E\big(\Psi_1(\bar{H}) \mid \Delta_{11}, \Delta_{21}, \ldots, \Delta_{r1}\big)_{r\times r}$ is equal to the right-most expression above for the $L$-measurement case. Hence, the same reasoning applies using the result in Lancaster and Tismenetsky (1985). We thus know that $\bar{\psi}_1$ cannot be 0 a.s. for at least one element of $\Psi_1^{(L)}(\bar{H})$ or $\Psi_1^{(g)}(\bar{H})$ (as relevant), completing the proof. Q.E.D.

Final remarks and acknowledgment: A numerical study has been performed on the feedback idea in Section 5, showing strong improvement in the quality of the estimate on the problem setting considered in Spall (2005); details are available upon request. This work was partially supported by a U.S. Navy contract.
References

Apostol, T. M. (1974), Mathematical Analysis (2nd ed.), Addison-Wesley, Reading, MA.
Bickel, P. J. and Doksum, K. (1977), Mathematical Statistics: Basic Ideas and Selected Topics, Holden-Day, San Francisco.
Das, S., Spall, J. C., and Ghanem, R. (2007), "An Efficient Calculation of Fisher Information Matrix: Monte Carlo Approach Using Prior Information," Proceedings of the IEEE Conference on Decision and Control, 12–14 December 2007, New Orleans, LA, USA.
Efron, B. and Hinkley, D. V. (1978), "Assessing the Accuracy of the Maximum Likelihood Estimator: Observed versus Expected Fisher Information" (with discussion), Biometrika, vol. 65.
Lancaster, P. and Tismenetsky, M. (1985), The Theory of Matrices (2nd ed.), Academic Press, New York.
Levy, L. J. (1995), "Generic Maximum Likelihood Identification Algorithms for Linear State Space Models," Proceedings of the Conference on Information Sciences and Systems, March 1995, Baltimore, MD.
Ljung, L. (1999), System Identification—Theory for the User (2nd ed.), Prentice Hall PTR, Upper Saddle River, NJ.
Segal, M. and Weinstein, E. (1988), "A New Method for Evaluating the Log-Likelihood Gradient (Score) of Linear Dynamic Systems," IEEE Transactions on Automatic Control, vol. 33.
Spall, J. C. (1992), "Multivariate Stochastic Approximation Using a Simultaneous Perturbation Gradient Approximation," IEEE Transactions on Automatic Control, vol. 37.
Spall, J. C. (2000), "Adaptive Stochastic Approximation by the Simultaneous Perturbation Method," IEEE Transactions on Automatic Control, vol. 45.
Spall, J. C. (2003), Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control, Wiley, Hoboken, NJ.
Spall, J. C. (2005), "Monte Carlo Computation of the Fisher Information Matrix in Nonstandard Settings," Journal of Computational and Graphical Statistics, vol. 14.
Spall, J. C. (2006), "Convergence Analysis for Feedback- and Weighting-Based Jacobian Estimates in the Adaptive Simultaneous Perturbation Algorithm," Proceedings of the IEEE Conference on Decision and Control, 13–15 December 2006, San Diego, CA.
Wilks, S. S. (1962), Mathematical Statistics, Wiley, New York.
