Testing Estimator's Credibility Part I: Tests for MSE


X. Rong Li and Zhanlue Zhao
Department of Electrical Engineering, University of New Orleans, New Orleans, LA 70148, USA
xli@uno.edu, zhanluezhao@gmail.com

Abstract - Most estimators and filters provide assessments of their own estimation error, often in the form of mean-square error. Are these self-assessments trustable? What is the degree to which they are trustable? This is Part I of a two-part series that provides answers to some of these questions, referred to as the credibility of the estimators. It formulates the concept of credibility, proposes tests for MSE-credibility, and discusses their superiority to the existing test. Numerical examples are provided to illustrate the utility and effectiveness of the proposed tests and the drawbacks of the existing test.

Keywords: Performance evaluation, estimation, credibility, statistical test.

(Research supported in part by ARO grant W9NF-…, NASA/LEQSF grant (2-4)-…, and Navy through Planning Systems Contract # N…C-382.)

1 Introduction

Algorithms for parameter, signal, and state estimation are widely used in science and engineering. No matter how solid such an estimation algorithm, or estimator for short, is in theory, its performance and characteristics must be evaluated in practice to serve a number of purposes, such as verification of its validity, demonstration of its performance, and comparison with other estimators. More specifically, estimators are almost always derived on the basis of more or less restrictive assumptions. These assumptions are often not transparent to practitioners, and it is not easy to verify their validity in many practical situations. For a practitioner, the validity of these assumptions per se is of little concern; what is important is whether the estimator works well for the application under consideration. This can be evaluated by stochastic simulation using a number of measures, such as those proposed or discussed in [4].

This paper deals with a closely related issue: the credibility of an estimator. Many estimators provide self-assessments of their estimation errors based on some simplifying assumptions. These self-assessments carry useful information about the estimation errors and the capability of the estimators, and it would be a shame to waste such information. Compared with the assumptions used to derive the estimator itself, however, these assumptions are usually even less transparent to practitioners and harder to verify. Even worse, these self-assessments are not always reliable, and they may even be misleading when the underlying assumptions are not accurate enough. Important questions for practitioners then include: Can we trust these self-assessments? If not, are the estimators too optimistic or pessimistic? By how much? In other words, an important issue in practice is how credible an estimator's self-assessment is and how to determine this credibility both qualitatively and quantitatively. We refer to this issue as the credibility issue of an estimator. Evidently, it amounts to evaluating the self-assessments. Since an estimator is data-driven and its self-assessment is data-dependent in general, the evaluation of its self-assessment should be done only in a statistical sense. In practice, such a task is almost always done by the Monte Carlo method via computer simulations. Albeit very important in practice, work on this issue has been scarce. Only limited treatments of this topic can be found in publications, e.g., [3, 4, 6, 5].
In our opinion, it has received far less attention than it deserves. As a result, it is rarely the case that a practitioner can answer the above important questions satisfactorily.

The purpose of this series is three-fold. First, it provides a formal definition of the credibility of an estimator to facilitate further studies of this topic. Second, it discusses how the credibility of an estimator can be tested properly, along with relevant theoretical results; that is, how to properly answer such questions as "Is the self-assessment of the estimator credible?" Finally, it intends to stimulate further studies of this important topic.

2 The Notion of Credibility

Clearly, the credibility issue has two related sides. On the qualitative side, it addresses whether an estimator's self-assessment is credible; the answer should be yes or no. Unfortunately, like many other decision problems, there is not a clear line that separates the two answers in many cases. An estimator accepted as credible by one user may be rejected by another user for the same case because, for example, the required levels of credibility may differ. Similarly, two noncredible estimators may have vastly different levels of noncredibility, which may lead to completely different actions. Therefore, it would be desirable if we could also quantify the amount by which an estimator is credible or noncredible. This would enable us to compare the credibility levels of estimators quantitatively. In this two-part series, however, we deal only with the qualitative side of the credibility issue, by statistical hypothesis testing. The quantitative side, what we call credibility measures, is the topic of a companion paper [15] as well as previous publications [12, 13].

Estimation performance is usually described in terms of the estimation error. For simplicity, throughout the paper we consider only the first two moments (mean and covariance) of the estimation error, since most estimators are not able to provide any other information about their performance. With the above considerations, we define the credibility of an estimator as follows.

Definitions. An estimator is said to be credible at a level α (0 ≤ α ≤ 1) if the difference between its actual bias and/or mean-square error (MSE) matrix and its self-assessment (i.e., calculated bias and/or MSE matrix) is statistically insignificant at level α, in the sense that the actual and the calculated values can be treated as statistically equal. The maximum level α at which an estimator is rejected as being noncredible is referred to as the noncredibility level of the estimator. An estimator is said to be optimistic (or pessimistic) in MSE at a level α if its computed MSE matrix is statistically smaller (or larger) than the actual MSE matrix at the α level.

We emphasize that the word "difference" above should not be interpreted literally. In fact, we mean both (or either) additive difference and/or multiplicative difference. For example, a small difference between A and B could mean that A/B is close to 1, as well as that A − B is small. In fact, a matrix version of the ratio is used in both credibility tests and measures.

The self-assessment of a noncredible estimator is not reliable/credible at the level considered. This is not to be confused with the reliability of the estimates per se; one type of reliability does not imply the other.

The above definition makes it explicit that, rigorously speaking, when we speak of the noncredibility of an estimator, the corresponding level should also be specified. However, it is not appropriate to define the (minimum) level at which the credibility of an estimator is not rejected as the credibility level of the estimator. In fact, we do not propose any definition of the credibility level, due to a number of difficulties explained later.

To our knowledge, the most notable prior publications that provide a considerable amount of treatment of the credibility issue are [3, 4], where it is referred to as (finite-sample) consistency. They address the issue and present a chi-square significance test for determining whether a filter should be accepted as credible, although without a formal definition of the (finite-sample) consistency. The term credibility is recommended here because consistency is an extremely well-established concept widely used in statistics and its application areas, which differs very much from the concept of credibility; furthermore, credible/credibility has been used in statistics as a technical term, such as the credible region and the degree of credibility, in reference to the commensurability of a hypothesis relative to data or evidence [24].

Terminology and notation. The quantity to be estimated, called the estimatee, can be a time-invariant (or slowly varying) parameter, a (deterministic or random) process or signal, or in particular the state of a (deterministic or random) system. We use the term estimator to mean both a parameter estimator and a filter (in particular, a state estimator). Let the n-dimensional estimatee, estimate, and estimation error be denoted by x, x̂, and x̃, respectively. Note that n is reserved throughout the paper for the dimension of the estimate.
We denote the actual MSE matrix of x̂ by P̄ and the estimator-provided MSE matrix by P. Note that P̄ and P are MSE matrices, not error covariance matrices. Each test always uses an independent and identically distributed sample of size N. All default vectors are column vectors. The determinant of A is denoted by |A|. The B-norm of a vector a is defined as ‖a‖_B = (a′Ba)^(1/2), where a′ stands for the transpose of the column vector a. By y ∼ N(µ, Σ) we mean that y is Gaussian distributed with mean µ and covariance matrix Σ; the corresponding density is denoted as N(y; µ, Σ). The chi-square distribution with n degrees of freedom is denoted as χ²_n. We use y ∼ (µ, Σ) to denote that y has mean µ and covariance Σ. Note that since µ* = E[x̃] and P̄ = MSE(x̂), the actual covariance matrix of the estimation error x̃ is Σ* = P̄ − µ*(µ*)′, and so we always have x̃ ∼ (µ*, P̄ − µ*(µ*)′).

3 NEES-Based Credibility Test

3.1 ANEES and Its Mean and Variance

We first describe the existing method for credibility evaluation. It is based on the so-called normalized estimation error squared (NEES), defined by

    ε_i = (x_i − x̂_i)′ P_i⁻¹ (x_i − x̂_i) = ‖x̃_i‖²_{P_i⁻¹}    (1)

where x_i, x̂_i, and P_i are the estimatee, the estimate, and the MSE matrix provided by the estimator on the ith run. The NEES can be interpreted as the squared distance between x and x̂ in terms of the P⁻¹-norm. It depends on the dimension n of x. To remove this dependence, we define the average normalized estimation error squared (ANEES) by

    ε̄ = (1/(nN)) ∑_{i=1}^N ε_i = (1/N) ∑_{i=1}^N ‖x̃_i‖²_{P_i⁻¹} / n

which we believe appeared first in [6]. In this sense, the ANEES is a one-dimensional-equivalent average distance between x and x̂ in terms of the P⁻¹-norm.

Assume x̃ ∼ N(0, P̄). It can be shown that if the estimator-computed MSE P is not random, then

    E[ε̄] = tr(P⁻¹P̄)/n,    var(ε̄) = 2 tr(P⁻¹P̄P⁻¹P̄)/(n²N)

These results hold true even if P is very different from the actual MSE P̄.

3.2 The Test

The above results show that if P = P̄ then E[ε̄] = 1 regardless of the dimension n, and var(ε̄) = 2/(nN), which decreases towards zero as the sample size N increases. It follows that, loosely speaking, the ANEES has a mean larger (or smaller) than 1 if the actual MSE matrix P̄ is larger (or smaller) than the estimator-provided MSE P. The closer to 1 the ANEES is, the closer to P̄ the estimator-provided P is. In view of this, a simple conclusion is that an estimator is optimistic (or pessimistic) if its ANEES is significantly larger (or smaller) than 1.
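To make these quantities concrete, the following Python sketch (an illustration, not the authors' code; the array shapes and the helper name anees are our own) computes the NEES and ANEES from a Monte Carlo sample of estimation errors and estimator-provided MSE matrices, and checks the mean formula above for a non-random P.

```python
import numpy as np

def anees(err, P):
    """ANEES from Monte Carlo data.

    err : (N, n) array of estimation errors x_i - xhat_i, one row per run.
    P   : (N, n, n) array of estimator-provided MSE matrices, one per run.
    Returns (anees_value, nees_per_run).
    """
    N, n = err.shape
    # NEES on run i: eps_i = err_i' P_i^{-1} err_i
    nees = np.array([e @ np.linalg.solve(Pi, e) for e, Pi in zip(err, P)])
    return nees.mean() / n, nees          # ANEES = (1/(nN)) * sum of eps_i

# Quick check of the mean formula when P is not random:
# with err ~ N(0, Pbar), E[ANEES] = tr(P^{-1} Pbar)/n.
rng = np.random.default_rng(0)
n, N = 2, 10_000
Pbar = np.array([[2.0, 0.3], [0.3, 1.0]])    # actual MSE (toy values)
P = np.array([[1.5, 0.0], [0.0, 1.2]])       # estimator-provided MSE (toy values)
err = rng.multivariate_normal(np.zeros(n), Pbar, size=N)
abar, _ = anees(err, np.broadcast_to(P, (N, n, n)))
print(abar, np.trace(np.linalg.solve(P, Pbar)) / n)   # the two values should be close
```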

However, how much larger (or smaller) is significant? This is usually answered by the chi-square significance test, described next.

Credibility testing based on the ANEES relies on the following fact: when x̃ ∼ N(0, P̄) and P ≠ P̄, the NEES and thus the ANEES are not chi-square distributed, because by the Ogasawara-Takahashi theorem (see, e.g., [22]) x̃′P⁻¹x̃ is chi-square if and only if P̄P⁻¹P̄P⁻¹P̄ = P̄P⁻¹P̄ (which is equivalent to P̄ = P if P̄ is nonsingular), in which case the number of degrees of freedom is rank(P⁻¹P̄). If the assumptions, models, and approximations based on which the MSE P is computed are valid, P should be close to the actual MSE matrix P̄. As such, assuming x̃ ∼ N(0, P̄), the NEES ε_i, as a quadratic form in x̃, should have approximately a standard χ²_n distribution. Then nNε̄ = ∑_{i=1}^N ε_i should be (approximately) the sum of N independent χ²_n random variables, that is, chi-square distributed with nN degrees of freedom (χ²_{nN}), which has mean nN and variance 2nN. As such, the MSE-credibility can be evaluated based on the chi-square significance test: reject P = P̄ if ε̄ is outside of the interval [a, b] such that

    P{ε̄ ∈ [a, b] | nNε̄ ∼ χ²_{nN}} = 1 − α

for a < 1 < b and some very small α. In addition, the estimator can be deemed optimistic (or pessimistic) if ε̄ > b (or ε̄ < a).

The rejection of P = P̄ is based on the following principle of small-probability events, sometimes referred to as Cournot's principle: the occurrence of an event of an extremely small probability on a single trial has a profound implication that the model (or assumptions) based on which the probability is calculated is incorrect and should be abandoned. It is this principle that underlies the acceptance of a theory as scientific. Specifically, a theory, no matter how exotic or bizarre, is confirmed to be correct and generally accepted if its unexpected predictions are verified by experiments or observations. Good examples include Einstein's general theory of relativity and the Big-Bang theory in cosmology, which are much more bizarre than most science fiction. What is particularly amazing is that these theories were accepted by most physicists in the same field right after only a few bold predictions were verified. Why was this the case? The answer lies in Cournot's principle: assuming these theories were not correct, a verification of any of their bold predictions would be highly improbable; given such verifications, the underlying assumption that they are incorrect must be abandoned.

According to this principle, P = P̄ should be rejected only if a very small-probability event occurred on a single trial, that is, when the ANEES is outside an interval of a very high probability (say, 99%). The higher the probability, the more confidently we can reject P = P̄. When the ANEES is outside the 99% probability interval, we can say P ≠ P̄ with at least 99% confidence, or the MSE-noncredibility level is at least 99%. The ANEES is used to reject P = P̄ in the sense that P is not statistically acceptable as the actual MSE P̄. This is usually a good indication that the assumptions and approximations used by the estimator to compute P are not sufficiently accurate. If the ANEES is much larger (or smaller) than 1, the actual estimation error is much larger (or smaller) than what the estimator believes; that is, the estimator is too optimistic (or pessimistic). To avoid unnecessary confusion, care should be exercised when using noncredibility levels.
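A minimal sketch of the significance test just described, assuming nNε̄ ∼ χ²_{nN} under the null hypothesis; the equal-tail split of α and the function name are our own choices.

```python
import numpy as np
from scipy.stats import chi2

def anees_test(anees_value, n, N, alpha=0.01):
    """ANEES-based chi-square significance test for MSE-credibility.

    Under H0 (P = Pbar), nN * ANEES ~ chi-square with nN degrees of freedom.
    Returns 'optimistic', 'pessimistic', or 'not rejected'.
    """
    dof = n * N
    a = chi2.ppf(alpha / 2, dof) / dof        # lower endpoint of [a, b]
    b = chi2.ppf(1 - alpha / 2, dof) / dof    # upper endpoint of [a, b]
    if anees_value > b:
        return "optimistic"       # actual errors larger than the estimator believes
    if anees_value < a:
        return "pessimistic"      # actual errors smaller than the estimator believes
    return "not rejected"         # no conclusion can be drawn (see Sec. 3.3)

# Example: for n = 2, N = 100, an ANEES of 1.35 lies outside the 99% interval.
print(anees_test(1.35, n=2, N=100, alpha=0.01))
```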
For instance, if the ANEES of an estimator is outside the 95% interval but inside the 99% interval, then the estimator may be deemed not credible with a ("confidence") level between 95% and 99%; that is, the noncredibility level is between 95% and 99%. However, such noncredibility levels should be used only when we have rejected the estimator as being noncredible and the level is very high (say, at or above 95%). In other words, we should use such noncredibility levels only for noncredible estimators, because a credible estimator may have a seemingly high noncredibility level, which can be confusing to most practitioners. When the ANEES is outside an interval of a not-so-high probability (say, 85%), rejecting the credibility of an estimator and treating it as having a noncredibility level of 85% lacks solid ground and is in fact quite questionable, because the underlying small-probability-event principle is not applicable here. In short, a noncredibility level is meaningful only if it is very high for the significance test.

On the other hand, in the case when the ANEES falls inside its 99% probability interval, it is not appropriate to think that the estimator has a 99% credibility level; otherwise all estimators would be credible at the 100% level, because every ANEES falls inside the 100% probability interval, which is [0, ∞)! Given the above definition of noncredibility levels, if we require that the credibility and noncredibility levels sum up to unity, then the credibility level of many credible estimators is very low (e.g., 5% or lower). This is highly undesirable. To define the credibility level as the minimum level at which an estimator's self-assessment is accepted (i.e., not rejected) as credible is seriously flawed. For instance, suppose that the ANEES is at an end point of its 95% probability interval. This definition states that the estimator has a 95% credibility level. However, from our definition of noncredibility level, it also has a 95% noncredibility level. Clearly, the estimator should be defined as having a 95% noncredibility level, because the ANEES is supposed to be in the interval with 95% probability. Otherwise the credibility level would be even higher (say, 99%) if the ANEES were further away from 1! These examples illustrate that it is not easy to define a simple, quantitative credibility level for an estimator properly.

The above difficulty stems from the inherent difficulty in interpreting the confidence level of a decision for hypothesis testing based on probability (or confidence) intervals, i.e., significance tests. No simple, general, and good solution is available within this framework. A better solution is to use credibility measures, the topic of [15, 13, 12], rather than test-based confidence levels.

The credibility test discussed here is essentially the same as that presented in [3] and discussed in [4, 5]. The difference is the introduction of the 1/n factor in the ANEES, which makes the mean of the ANEES invariant w.r.t. the dimension of x and thus more convenient to use. Also, we believe that the underlying principle is elucidated better here, which helps avoid pitfalls. Note that the tests based on the measurement residual proposed in [3] actually deal with a filter's optimality, not credibility.
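For concreteness, the noncredibility level discussed above can be computed from an observed ANEES as the largest confidence for which the ANEES still falls outside the corresponding probability interval. The helper below is our own sketch and uses the equal-tail convention.

```python
from scipy.stats import chi2

def noncredibility_level(anees_value, n, N):
    """Largest confidence level at which the ANEES falls outside the equal-tail
    chi-square interval, i.e., 1 - (two-sided p-value). Only meaningful when it
    is very high (say, 95% or more), as discussed above."""
    dof = n * N
    cdf = chi2.cdf(dof * anees_value, dof)
    return 1.0 - 2.0 * min(cdf, 1.0 - cdf)

print(noncredibility_level(1.35, n=2, N=100))   # about 0.999: rejection is well grounded
print(noncredibility_level(1.05, n=2, N=100))   # about 0.4: too low to justify rejection
```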

3.3 Drawbacks of NEES-Based Test

The above NEES-based chi-square test has some obvious and subtle drawbacks, as explained next.

First, and perhaps most importantly, as explained before, the above chi-square significance test is based on the small-probability-event principle and is justifiable only when the ANEES ε̄ is outside an interval of a very large probability, say, 0.95. If ε̄ is inside the interval, no conclusion can be drawn in principle, although a wide-spread misconception and malpractice is to conclude that P = P̄. Were this practice correct, we could always conclude that P = P̄ by using a large enough probability (arbitrarily close to 1), so that the interval [a, b] becomes large enough to include ε̄. Put differently, the chi-square significance test should only be used to reject the hypothesis P = P̄. If ε̄ is inside the interval, we do not even know whether P = P̄ is more likely than P ≠ P̄, let alone conclude that P = P̄. For instance, the data may even suggest that P = P̄ is false but not be strong enough to make a convincing case (beyond a reasonable doubt) that it is false. This subtle drawback is inherent in all significance tests for a single hypothesis in general, not just the above ANEES chi-square test in particular. Strangely enough, this drawback is not well known in the engineering community. This lack of knowledge and warning, along with the deceptively simple structure of the significance test, gives rise to the wide-spread misconception and malpractice mentioned above.

Second, as explained before, the use of this test can be justified only when the noncredibility level is very high (say, not lower than 95%).

Third, under the assumption x̃ ∼ (0, P̄), we have E[ε̄] = tr(P⁻¹P̄)/n = λ̄(P^(−1/2)P̄P^(−1/2)), where λ̄(A) is the arithmetic average of the eigenvalues of A. Consequently, given any estimator-computed MSE matrix P, the ANEES will be around 1 provided the average eigenvalue of P_y = P^(−1/2)P̄P^(−1/2) is approximately 1. Obviously P̄ can be far from P even if λ̄(P_y) = 1. On the other hand, the acceptance interval [a, b] of the above ANEES-based chi-square test must include 1. So there is no reason to expect that the above chi-square test can tell P̄ from P, even if they are very different, provided λ̄(P_y) ≈ 1, although the ANEES is not chi-square distributed in this case. This drawback has been verified repeatedly by simulation results, including those of Sec. 5. Also, from the Rayleigh-Ritz theorem, λ_min(A) ≤ a′Aa/(a′a) ≤ λ_max(A), it follows immediately that x̃′x̃/λ_max(P) ≤ x̃′P⁻¹x̃ ≤ x̃′x̃/λ_min(P). So ε̄ can be close to 1 whenever P is such that λ_max(P) ≥ E[x̃′x̃]/n ≥ λ_min(P), although such a P can be far from P̄. These observations all support the above statement that the ANEES-based chi-square test can only be used to reject, rather than verify, the credibility.

Fourth, the NEES, and thus the ANEES, do not account for the bias at all. As such, they evaluate only the MSE credibility; they cannot be used for bias-credibility or bias-MSE joint credibility evaluation of any sort.

In a nutshell, except for a rejection of the MSE credibility at a very high confidence, any other conclusion of the NEES-based significance test in fact lacks solid scientific ground and can be very misleading (except that we cannot conclude with 95% or higher confidence that the estimator is MSE-noncredible); the test should only be used to reject the claim of MSE-credibility very roughly; it is incapable of judging whether the estimator is credible in terms of MSE and/or bias.
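The third drawback can be illustrated numerically; the toy matrices below are our own and were chosen so that the average eigenvalue of P^(−1/2)P̄P^(−1/2) is exactly 1 even though P is clearly wrong.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
n, N = 2, 100
Pbar = np.diag([1.6, 0.4])      # actual MSE (toy values)
P = np.eye(n)                   # estimator-provided MSE, clearly different from Pbar
# Average eigenvalue of P^{-1/2} Pbar P^{-1/2} is (1.6 + 0.4)/2 = 1,
# so E[ANEES] = 1 even though P is far from Pbar.
err = rng.multivariate_normal(np.zeros(n), Pbar, size=N)
nees = np.einsum('ij,jk,ik->i', err, np.linalg.inv(P), err)
anees = nees.mean() / n
a, b = chi2.ppf([0.025, 0.975], n * N) / (n * N)
print(anees, (a, b))            # the ANEES typically lands inside [a, b]: no rejection
```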
While the NEES-based chi-square significance test is seriously flawed, its introduction in [3] was an important, pioneering contribution. Fortunately, much better tests are available that can be used to judge the credibility of an estimator in terms of MSE and/or bias. Tests for the MSE are presented in the next section, and those for the bias alone and for the bias and MSE jointly are presented in Part II [16].

4 New Tests for MSE-Credibility

4.1 MSE-Credibility as Covariance Testing

In this case, although the actual bias µ* is unknown, we are only concerned with the MSE matrix. This is a common situation in practice, where only the MSE matrix P is provided by the estimator since it assumes zero bias. So P is actually also the estimator-computed error covariance matrix C. In this case, the problem of testing the MSE matrix

    H₀: P̄ = P  vs.  H₁: P̄ ≠ P

is equivalent to testing the error covariance

    H₀: Σ* = P  vs.  H₁: Σ* ≠ P

Assume that x̃ ∼ (µ*, Σ*) but µ* and Σ* are unknown. Let y = P^(−1/2) x̃. Then

    µ = E[y] = P^(−1/2) µ*  and  Σ = cov(y) = P^(−1/2) Σ* P^(−1/2)

Clearly, Σ* = P if and only if Σ = I. As such, the above problem is equivalent to testing

    H₀: Σ = I  vs.  H₁: Σ ≠ I

We refer to this problem as Problem P_Σ. It is a standard hypothesis testing problem in multivariate statistical analysis, where generally accepted, highly effective solutions are available, and not surprisingly the chi-square significance test is not one of them. The credibility tests we propose below amount to adapting these solutions to the credibility problem.

As with the ANEES-based chi-square test, we assume that the data set {x̃₁, x̃₂, ..., x̃_N} is available for credibility evaluation. This is a fundamental assumption that requires precise knowledge of the estimatee x, which is hardly available except in computer simulations. Further, for notational brevity, let

    ȳ = (1/N) ∑_{i=1}^N y_i,   V = ∑_{i=1}^N (y_i − ȳ)(y_i − ȳ)′,   S = V/(N − 1)    (2)

Note that ȳ is the sample mean and V is a scaled version of the sample covariance S.
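A minimal sketch of this reduction, assuming the error sample {x̃_i} and a single estimator-provided P. It uses a Cholesky factor P = LL′ in place of the symmetric square root; this is an implementation choice, and the test statistics below depend on the data only through the eigenvalues of V, which are the same for either choice. The function name is ours.

```python
import numpy as np

def transformed_stats(err, P):
    """Form y_i = L^{-1} err_i with P = L L', then ybar, V, S of Eq. (2).

    err : (N, n) array of estimation errors; P : (n, n) estimator-provided MSE.
    Under H0 (actual error covariance equal to P), cov(y_i) = I.
    """
    N = err.shape[0]
    L = np.linalg.cholesky(P)                 # lower-triangular factor, P = L L'
    y = np.linalg.solve(L, err.T).T           # y_i = L^{-1} x_i
    ybar = y.mean(axis=0)                     # sample mean
    centered = y - ybar
    V = centered.T @ centered                 # V = sum_i (y_i - ybar)(y_i - ybar)'
    S = V / (N - 1)                           # sample covariance
    return ybar, V, S
```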

4.2 Generalized Likelihood Ratio Test

Assume y ∼ N(µ, Σ) with unknown µ and Σ. As shown in [11], the generalized likelihood ratio test (GLRT) for Problem P_Σ is: reject H₁ if Λ > t, or reject H₀ if Λ < t (note that P{Λ = t} = 0). Here the generalized likelihood ratio Λ is given by

    Λ = (e/N)^(nN/2) |V|^(N/2) exp[−tr(V)/2]

where V was defined in (2), e is the base of the natural logarithm ln, and tr(V) and |V| are the trace and determinant of V. This test is biased. So, we consider the following modified generalized likelihood ratio test [11]: reject H₀ if φ ≥ c, or reject H₁ if φ < c, where

    φ = −2 ln Λ₁ = tr(V) + nL(ln L − 1) − L ln|V|

L = N − 1 is the number of degrees of freedom of V, and Λ₁ = Λ|_{N=L}. This modified GLRT is unbiased, meaning that the probability of correctly rejecting H₀ (i.e., the detection probability) is never smaller than that of incorrectly rejecting H₀ (i.e., the false alarm probability). While this looks like an easy requirement, some good tests for the multivariate case do not satisfy it.

The distribution of φ under H₀ is needed to determine the threshold c. Under H₀, the statistic φ has a complex exact distribution when the sample size N is small [19], but asymptotically (as N becomes large) it tends to a χ²_{n(n+1)/2} distribution. More precisely, the statistic ρφ has approximately a χ²_{n(n+1)/2} distribution for moderately large N, where ρ = 1 − (2n² + 3n − 1)/(6(N − 1)(n + 1)) [10] (see also, e.g., [1, 18, 2]). Tables of exact 5% and 1% significance points of φ were presented in [19] and included in [11]. The corresponding tables of approximate points can be found in [1, 18]. It is sensible that the power of these GLRTs increases monotonically as each eigenvalue λ_i(Σ) moves away from 1 [7], [23].

4.3 Union-Intersection Test

As shown in [2], [23], [7], by the union-intersection principle for Problem P_Σ we have

    H₀: Σ = I  ⟺  ∩_a H₀a, where H₀a: a′Σa = a′a
    H₁: Σ ≠ I  ⟺  ∪_a H₁a, where H₁a: a′Σa ≠ a′a

where a is any nonzero vector; that is, H₀ is true if and only if all H₀a are true, and H₁ is true if and only if any of the H₁a is true. Recall that y_i = P^(−1/2)x̃_i and V was defined in (2). Given a, the test for H₀a vs. H₁a is: reject H₀a if a′Va/a′a ∉ [c₁, c₂], where the thresholds c₁ and c₂ can be determined from the fact that a′Va/a′a ∼ χ²_{N−1} under H₀a [23]. Hence, the test for H₀ vs. H₁ is: reject H₀ if max_a a′Va/a′a ≥ c₂ or min_a a′Va/a′a ≤ c₁. It is well known that

    λ_min(V) ≤ a′Va/(a′a) ≤ λ_max(V)

where the left and right inequalities become equalities if and only if a is the eigenvector associated with the smallest eigenvalue λ_min and the largest eigenvalue λ_max of V, respectively. Finally, the test for Problem P_Σ is: reject H₀ if λ_max(V) ≥ c₂ or λ_min(V) ≤ c₁. Tables of the distribution of c₁ and c₂ under H₀ are given by Table 5 of [21], where it appears that the interpolation formulas for two of the significance points have the signs of their βd and γd correction terms reversed.

The UIT does not rely on the small-probability-event principle although its component tests are chi-square tests: since Σ = I if and only if λ_max(Σ) = λ_min(Σ) = 1, the fact that [λ_min(V), λ_max(V)] belongs to a small interval [c₁, c₂] is compelling evidence for Σ = I. This is not the case for ε̄ ∈ [a, b] in the ANEES-based test. In general, the UIT is consistent² if the univariate component tests are so, unbiased under certain conditions, and admissible if the univariate component tests are admissible³ [20] (see also [7]).
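The two tests can be sketched as follows, under the assumptions above (Gaussian y and the large-N chi-square approximation for the GLRT threshold). The UIT thresholds c₁ and c₂ are left as inputs because they come from tables of the extreme-eigenvalue distribution; the function names are ours.

```python
import numpy as np
from scipy.stats import chi2

def modified_glrt(V, N, alpha=0.05):
    """Modified GLRT for H0: Sigma = I (Problem P_Sigma).

    Returns (phi, reject). phi = tr(V) + nL(ln L - 1) - L ln|V| with L = N - 1;
    rho*phi is compared with a chi-square(n(n+1)/2) quantile (large-N approximation).
    """
    n = V.shape[0]
    L = N - 1
    _, logdetV = np.linalg.slogdet(V)
    phi = np.trace(V) + n * L * (np.log(L) - 1) - L * logdetV
    rho = 1 - (2 * n**2 + 3 * n - 1) / (6 * (N - 1) * (n + 1))
    threshold = chi2.ppf(1 - alpha, n * (n + 1) / 2)
    return phi, rho * phi >= threshold

def uit(V, c1, c2):
    """Union-intersection test: reject H0 if lambda_max(V) >= c2 or lambda_min(V) <= c1.
    The thresholds c1, c2 come from tables of the extreme-eigenvalue distribution."""
    lam = np.linalg.eigvalsh(V)               # ascending eigenvalues of V
    return lam[-1] >= c2 or lam[0] <= c1
```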
4.4 Discussions

Invariance. Problem P_Σ is clearly invariant under orthogonal and additive transformations u = Ay + b, where A is an orthogonal matrix. As a result, our tests should also be invariant under these transformations. It can be shown (see, e.g., [2]) that a test for Problem P_Σ is invariant under these transformations if and only if the test statistic depends on the data only through the eigenvalues of V. Unfortunately, there is no uniformly most powerful invariant test for this problem [2].

Eigenanalysis. For the null hypothesis Σ = I, it is sensible that the test can be done by checking whether the eigenvalues of the sample covariance V/N, as an estimate of Σ, are all within a small interval centered at 1. It turns out that the GLRT, the UIT, and the ANEES-based test correspond to three different ways of checking this. From the identities tr(V) = n λ̄(V) and |V| = [λ̃(V)]ⁿ, where λ̄(V) and λ̃(V) are the arithmetic and geometric averages of the eigenvalues of V, respectively, it follows that the test statistic for the modified GLRT, φ = n[λ̄(V) + L(ln L − 1) − L ln λ̃(V)], depends on the data only through λ̄(V) and λ̃(V). The UIT also depends on the data only through the maximum and minimum eigenvalues of V. Therefore, both tests are invariant under orthogonal and additive transformations. From this angle, it appears that the UIT is the most natural, simple, and intuitively appealing.

Note that if the estimator-computed MSE is not data-dependent, then it can be easily shown that ε̄ = λ̄(P^(−1/2)P̂_x̃P^(−1/2)) and V = N C^(−1/2)Ĉ_x̃C^(−1/2), where C = P is the estimator-computed error covariance (since the estimator assumes unbiased estimates, i.e., µ = 0), Ĉ_x̃ = (1/N) ∑_{i=1}^N (x̃_i − x̃̄)(x̃_i − x̃̄)′ is its sample version with x̃̄ = (1/N) ∑_{i=1}^N x̃_i, and P̂_x̃ = (1/N) ∑_{i=1}^N x̃_i x̃_i′.

Consider now the test of [9]: reject H₀ if tr(V) ≥ t₂ or tr(V) ≤ t₁. In other words, replace the criterion [λ_min(V), λ_max(V)] ⊆ [c₁, c₂] of the UIT by λ̄(V) ∈ [c₁, c₂]. Clearly, this test is essentially equivalent to the ANEES-based chi-square test if the estimates are indeed unbiased, in which case C = P, Ĉ_x̃ ≈ P̂_x̃ and thus ε̄ = λ̄(P^(−1/2)P̂_x̃P^(−1/2)) ≈ λ̄(V/N); otherwise it is superior. It is, however, inferior to the UIT in terms of thoroughness: if λ̄(V/N) ≈ 1 and λ_max(V) ≫ λ_min(V), its decisions are most likely to be incorrect, while the UIT is immune from such mistakes. In other words, the UIT does not suffer from the third drawback of the ANEES-based test explained in Sec. 3.3.

² A test for H₀ against any fixed alternative H₁ is consistent if its power converges to 1 as the sample size increases.
³ A test is admissible if there is no other test that is never worse and sometimes better than it.
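A small sketch of this eigenvalue view: the three tests reduce to different summaries of the eigenvalues of V/N (the function name is ours).

```python
import numpy as np

def eigen_summaries(V, N):
    """Eigenvalue summaries of V/N used by the three tests (cf. Sec. 4.4):
    ANEES/trace test: arithmetic average; GLRT: arithmetic and geometric averages;
    UIT: extreme eigenvalues."""
    lam = np.linalg.eigvalsh(V / N)
    return {
        "arith_avg": lam.mean(),                 # ~ ANEES when the estimates are unbiased
        "geom_avg": np.exp(np.log(lam).mean()),  # enters the GLRT via ln|V|
        "min": lam[0],
        "max": lam[-1],                          # both extremes enter the UIT
    }
```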

6 thoroughness if λ(v/n) and λ max (V ) λ min (V ), its decisions are most likely to be incorrect, while the UIT is immune from such mistakes. In other words, the UIT does not suffer from the third drawback of the ANEES-based test explained in Sec GLRT vs. UIT. An important advantage of the GLRT is that 2 lnλ (Λ is the likelihood ratio) is chi-square distributed asymptotically under mild regularity conditions and thus it is relatively easy to determined the test threshold. No such general results are available for the UIT. As a consequence, the UIT is generally harder to develop and apply than the GLRT, but it carries more information than the GLRT since it can pinpoint the univariate components that reject H. The modified GLRT is unbiased. In fact, it is not only unbiased but also uniformly most powerful of all unbiased tests [23] for Problem P Σ. This is a very strong justification for the GLRT. Whether the above UIT is unbiased or not depends on the choice of c and c 2 [23]. Clearly the GLRT statistics rely more critically on the assumption of the data than the UIT statistics, which are valid provided the component statistics are valid. As a result, the UIT statistics are in general less sensitive to the assumption of the data than the GLRT statistics [7]. In addition, the UIT is less sensitive than the GLRT to the actual distribution of the alternative hypothesis. In view of this, it appears that the UIT is preferable to the GLRT when y is far from Gaussian distributed or far from the null hypothesis; the GLRT is probably the better choice when y is (approximately) Gaussian distributed. 5 Illustrative Examples In this section, we provide some simple examples to demonstrate the tests proposed as well as a comparison with the one based on the ANEES. All results are based on Monte Carlo runs. Unless otherwise stated, the vertical axis is always the probability of rejecting the null hypothesis (P = P ) P { H } = P { H µ, Σ} = P { H µ, Σ}, where µ, Σ are the true mean and covariance of the (transformed) estimation error. Example. It follows from the relationship y = ỹ µ that all cases with some µ and Σ are equivalent to the case with µ = and Σ = Σ. 4 So, for performance evaluation we only consider µ =. Specifically, we consider two cases of the truth y N(, Σ) with (a) Σ = σ 2 I and (b) Σ = σ [ 2 3], where σ 2 2. Note that the null hypothesis P = P is true when and only when σ 2 = in case (a); so a large P { H } is desirable in case (a) when σ 2 is quite different from or in case (b). For this problem x P x = y y since y = P /2 x. Thus the ANEESbased significance test checks whether y y is outside the chi-square probability interval. Fig. (a) shows the probabilities of rejecting H vs. σ 2 for case (a) using our proposed GLRT and UIT as well as the existing ANEES-based test, each using data of size in each run with two different test thresholds corresponding to type I error probabilities of α =. and.5, respectively. Fig. (b) shows the 4 Of course, they are different if P is actually the MSE matrix rather than error covariance α GLRT =5% α GLRT =% α UIT =5% α UIT =% α χ 2=5% α χ 2=% σ 2 (a) Σ = σ 2 I α GLRT =5% α GLRT =% α UIT =5% α UIT =% α χ 2=5% α χ 2=% σ 2 (b) Σ = σ 2[ 3 Figure : Probability of rejecting Σ = I vs. parameter σ 2. rejection probability for case (b), where our GLRT and UIT always have (correctly) a unity rejection probability while the ANEES-based test does not. 
As can be seen, while the differences between our tests and the existing one are minor in case (a), in case (b), no matter what value σ² takes, the hypothesis P = P̄ is never true and our GLRT and UIT always reject it correctly, but the ANEES-based test could not reject it for σ² close to 0.5. This is because the arithmetic average of the eigenvalues of Σ = 0.5 Σ₀ is 1 (since tr(Σ₀) = 4). This demonstrates the third drawback of the ANEES-based test explained in Sec. 3.3. Such failures of the ANEES-based test occur often in our simulations.

Example 2. The above example uses Gaussian distributed data, as assumed in the tests. Here we consider testing Problem P_Σ (H₀: Σ = I against H₁: Σ ≠ I) with non-Gaussian data y having a Gaussian mixture distribution, given by

    f(y) = p N(y; µ₁, Σ₁) + (1 − p) N(y; µ₂, Σ₂),   0 ≤ p ≤ 1

It can be easily shown that

    µ = E[y] = pµ₁ + (1 − p)µ₂
    Σ = cov(y) = p[Σ₁ + (µ₁ − µ)(µ₁ − µ)′] + (1 − p)[Σ₂ + (µ₂ − µ)(µ₂ − µ)′]

For simplicity, we consider two cases: (a) µ₁ = [1, 1]′, µ₂ = 0, and Σ₁ = Σ₂ = σ²I, and (b) µ₁ = µ₂ = [1, 1]′ and Σ₁ = Σ₂ = σ²Σ₀. Figs. 2(a) and 2(b) show the probabilities of rejecting H₀ vs. p and σ² using our proposed GLRT and the existing ANEES-based test, respectively, each using a data set of size N in each run, with the thresholds corresponding to α = 0.05.
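The mixture moment formulas above can be checked numerically. The particular values of p, the means, and the component covariance below are illustrative only, chosen so that the arithmetic average of the eigenvalues of Σ is exactly 1 even though Σ ≠ I, which is the situation where the ANEES-based test loses power.

```python
import numpy as np

def mixture_moments(p, mu1, mu2, S1, S2):
    """Mean and covariance of f(y) = p N(y; mu1, S1) + (1 - p) N(y; mu2, S2)."""
    mu = p * mu1 + (1 - p) * mu2
    Sigma = (p * (S1 + np.outer(mu1 - mu, mu1 - mu))
             + (1 - p) * (S2 + np.outer(mu2 - mu, mu2 - mu)))
    return mu, Sigma

# Illustrative non-Gaussian case: Sigma != I, yet its average eigenvalue is 1,
# so the ANEES-based test sits in its "valley" and cannot reject.
mu, Sigma = mixture_moments(0.3, np.array([1.0, 1.0]), np.zeros(2),
                            0.79 * np.eye(2), 0.79 * np.eye(2))
print(Sigma, np.linalg.eigvalsh(Sigma).mean())   # average eigenvalue equals 1.0
```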

[Figure 2: Probability of rejecting Σ = I vs. parameters p and σ² in case (a): (a) GLRT, (b) χ² test.]

Note that the distribution becomes more and more non-Gaussian as p increases from 0 to 0.5. The null hypothesis Σ = I (i.e., P = P̄) is true in case (a) when and only when (p, σ²) = (0, 1); so a large rejection probability is desirable in all other situations. It is never true in case (b). It turned out that all three tests always correctly reject the null hypothesis in case (b) (results not shown); for case (a), the rejection-probability surface of our GLRT is flat with a height of 1 except for the cone around (p, σ²) = (0, 1), meaning that our proposed test always rejects the null hypothesis correctly except when it should not be rejected [i.e., when (p, σ²) is around (0, 1) in case (a)]. This indicates that our GLRT is not sensitive to the Gaussianity assumption. The results of the UIT are similar to those of the GLRT and are not shown. However, the rejection-probability surface of the ANEES-based test has an undesirable valley, indicating that it cannot reject some clearly noncredible cases when the Gaussianity assumption is invalid. It has been verified that the arithmetic averages of the eigenvalues of Σ corresponding to the bottom of this valley are indeed around 1. This once again demonstrates the third drawback of the ANEES-based test explained in Sec. 3.3.

Example 3. Consider now a simple filtering example: tracking a target in nearly constant-velocity one-dimensional motion using position-only measurements under the linear-Gaussian assumptions of the Kalman filter. The system is given by

    x_{k+1} = [1 T; 0 1] x_k + [T²/2; T] w_k,   x_k = [x_k, ẋ_k]′
    z_k = [1, 0] x_k + v_k

where w_k and v_k are zero-mean white noises with cov(w_k) = σ² and a fixed cov(v_k), T = 1, and the true value of σ is always 3. Consider three cases where the Kalman filter assumes σ = 2, 3, 4, respectively. The filter with σ = 3 is matched exactly, but with σ = 2 or 4 it is mismatched: it is optimistic for σ = 2 and pessimistic for σ = 4. The filter was initialized by a weighted least-squares fit to the first two measurements, known as two-point differencing in the target tracking community.

Fig. 3 shows the average GLRT statistic over the Monte Carlo runs, along with the threshold (the horizontal line) corresponding to a 5% type I error (false alarm) probability.

[Figure 3: Generalized likelihood ratio test statistic for MSE-credibility over time, for σ = 2, 3, 4.]

We should not pay much attention to the first several time steps because of the transient due to the non-ideal initialization. It can be seen that the matched filter's MSE matrix is deemed credible most of the time (only a few time points are slightly above the threshold, consistent with the 5% type I error probability), while the mismatched filter with σ = 2 is almost always rejected and the one with σ = 4 is rejected except over a small portion of time. The difference between the cases with σ = 2 and with σ = 4 makes good sense: it is better to be more conservative in choosing the process noise covariance (see, e.g., [8]). Also, the ratio of σ = 3 to σ = 2 is larger than that of σ = 4 to σ = 3 (the ratios that matter most are those between the MSE matrices, not the process noise covariances). Similar results hold for the chi-square significance test based on the ANEES, except that the matched filter is rejected at only a handful of time steps, and for the rest the test can neither reject nor accept the credibility of the filter-computed MSE matrix.
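The following sketch sets up the nearly-constant-velocity model of Example 3. The measurement-noise standard deviation r, the diffuse initialization (used here in place of two-point differencing), and any run counts are our own placeholders, since those details are not fully specified above.

```python
import numpy as np

# Nearly-constant-velocity model of Example 3 (r and the initialization are placeholders).
T, sigma_true, r = 1.0, 3.0, 1.0
F = np.array([[1.0, T], [0.0, 1.0]])
G = np.array([[T**2 / 2], [T]])
H = np.array([[1.0, 0.0]])

def kf_run(steps, sigma_filter, rng):
    """One Monte Carlo run: returns the per-step errors x_k - xhat_k and the
    filter-computed covariances P_k, which feed the credibility tests."""
    x = np.zeros(2)                     # true state
    xhat = np.zeros(2)                  # filter estimate
    P = 100.0 * np.eye(2)               # diffuse start instead of two-point differencing
    errs, covs = [], []
    for _ in range(steps):
        x = F @ x + G[:, 0] * rng.normal(0.0, sigma_true)      # truth uses sigma = 3
        z = (H @ x)[0] + rng.normal(0.0, r)
        xhat = F @ xhat                                        # predict
        P = F @ P @ F.T + (G @ G.T) * sigma_filter**2
        S = (H @ P @ H.T)[0, 0] + r**2                         # innovation variance
        K = (P @ H.T)[:, 0] / S                                # Kalman gain, shape (2,)
        xhat = xhat + K * (z - (H @ xhat)[0])                  # update
        P = P - np.outer(K, H @ P)
        errs.append(x - xhat)
        covs.append(P.copy())
    return np.array(errs), np.array(covs)

# Errors collected over many runs at a fixed time step k, together with covs[k],
# can be fed to the GLRT/UIT sketches above to reproduce a Fig. 3-style check.
```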

6 Summary

The problem of the MSE-credibility of an estimator, that is, whether and how much an estimator-computed MSE matrix can be trusted, has been formulated. A number of tests for MSE-credibility have been proposed. An in-depth discussion of the existing NEES-based significance test for MSE-credibility has been presented, which elucidates the underlying principle and reveals some fundamental drawbacks of the existing test. Not only are the newly developed tests immune from these drawbacks, but they also enjoy many desirable properties, including some optimality properties. Simple, illustrative examples have been given, which demonstrate the performance of the proposed tests as well as their superiority to the existing one. Some of the drawbacks of the NEES-based test have been demonstrated clearly via simulation results. These results question the wisdom of continuing to use the existing solution.

References

[1] T. W. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley, New York, 2nd edition, 1984.
[2] S. F. Arnold. The Theory of Linear Models and Multivariate Analysis. Wiley, New York, 1981.
[3] Y. Bar-Shalom and K. Birmiwal. Consistency and Robustness of PDAF for Target Tracking in Cluttered Environments. Automatica, 19:431-437, July 1983.
[4] Y. Bar-Shalom and X. R. Li. Estimation and Tracking: Principles, Techniques, and Software. Artech House, Boston, MA, 1993. (Reprinted by YBS Publishing, 1998.)
[5] Y. Bar-Shalom, X. R. Li, and T. Kirubarajan. Estimation with Applications to Tracking and Navigation: Theory, Algorithms, and Software. Wiley, New York, 2001.
[6] O. E. Drummond, X. R. Li, and C. He. Comparison of Various Static Multiple-Model Estimation Algorithms. In Proc. 1998 SPIE Conf. on Signal and Data Processing of Small Targets, vol. 3373, Apr. 1998.
[7] N. C. Giri. Multivariate Statistical Inference. Academic Press, New York, 1977.
[8] A. H. Jazwinski. Stochastic Processes and Filtering Theory. Academic Press, New York, 1970.
[9] J. Kiefer and R. Schwartz. Admissible Bayes Character of T²- and R²- and Other Fully Invariant Tests for Classical Normal Problems. Ann. Math. Statist., 36, 1965.
[10] B. P. Korin. On the Distribution of a Statistic Used for Testing a Covariance Matrix. Biometrika, 55:171-178, 1968.
[11] P. K. Krishnaiah and J. K. Lee. Likelihood Ratio Tests for Mean Vectors and Covariance Matrices. In P. K. Krishnaiah, editor, Handbook of Statistics, volume 1. North-Holland, 1980.
[12] X. R. Li, Z. Zhao, and V. P. Jilkov. Practical Measures and Test for Credibility of an Estimator. In Proc. Workshop on Estimation, Tracking, and Fusion: A Tribute to Yaakov Bar-Shalom, Monterey, CA, May 2001.
[13] X. R. Li, Z. Zhao, and V. P. Jilkov. Estimator's Credibility and Its Measures. In Proc. IFAC 15th World Congress, Barcelona, Spain, July 2002.
[14] X. R. Li and Z.-L. Zhao. Evaluation of Estimation Algorithms Part I: Incomprehensive Performance Measures. IEEE Trans. Aerospace and Electronic Systems, AES-42(3), July 2006.
[15] X. R. Li and Z.-L. Zhao. Measuring Estimator's Credibility: Noncredibility Index. In Proc. 2006 International Conf. on Information Fusion, Florence, Italy, July 2006.
[16] X. R. Li and Z.-L. Zhao. Testing Estimator's Credibility Part II: Other Tests. In Proc. 2006 International Conf. on Information Fusion, Florence, Italy, July 2006.
[17] K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis. Academic Press, London, 1979.
[18] R. J. Muirhead. Aspects of Multivariate Statistical Theory. Wiley, New York, 1982.
[19] B. N. Nagarsenker and K. C. S. Pillai. Distribution of the Likelihood Ratio Criterion for Testing a Hypothesis Specifying a Covariance Matrix. Biometrika, 60, 1973.
[20] H. K. Nandi. On Some Properties of Roy's Union-Intersection Tests. Calcutta Statist. Assoc. Bull., 14:9-13, 1965.
[21] E. S. Pearson and H. O. Hartley. Biometrika Tables for Statisticians, volume 2. Cambridge University Press for the Biometrika Trustees, Cambridge, England, 1972.
[22] C. R. Rao. Linear Statistical Inference and Its Applications. Wiley, New York, 2nd edition, 1973.
[23] M. S. Srivastava and C. G. Khatri. An Introduction to Multivariate Statistics. North-Holland, New York, 1979.
[24] A. Stuart, K. Ord, and S. Arnold. Kendall's Advanced Theory of Statistics, Vol. 2A: Classical Inference and the Linear Model. Arnold, London, 1999.


More information

Large Sample Properties of Estimators in the Classical Linear Regression Model

Large Sample Properties of Estimators in the Classical Linear Regression Model Large Sample Properties of Estimators in the Classical Linear Regression Model 7 October 004 A. Statement of the classical linear regression model The classical linear regression model can be written in

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

Fisher Information Matrix-based Nonlinear System Conversion for State Estimation

Fisher Information Matrix-based Nonlinear System Conversion for State Estimation Fisher Information Matrix-based Nonlinear System Conversion for State Estimation Ming Lei Christophe Baehr and Pierre Del Moral Abstract In practical target tracing a number of improved measurement conversion

More information

Time Series Prediction by Kalman Smoother with Cross-Validated Noise Density

Time Series Prediction by Kalman Smoother with Cross-Validated Noise Density Time Series Prediction by Kalman Smoother with Cross-Validated Noise Density Simo Särkkä E-mail: simo.sarkka@hut.fi Aki Vehtari E-mail: aki.vehtari@hut.fi Jouko Lampinen E-mail: jouko.lampinen@hut.fi Abstract

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

y ˆ i = ˆ " T u i ( i th fitted value or i th fit)

y ˆ i = ˆ  T u i ( i th fitted value or i th fit) 1 2 INFERENCE FOR MULTIPLE LINEAR REGRESSION Recall Terminology: p predictors x 1, x 2,, x p Some might be indicator variables for categorical variables) k-1 non-constant terms u 1, u 2,, u k-1 Each u

More information

Inferences about a Mean Vector

Inferences about a Mean Vector Inferences about a Mean Vector Edps/Soc 584, Psych 594 Carolyn J. Anderson Department of Educational Psychology I L L I N O I S university of illinois at urbana-champaign c Board of Trustees, University

More information

Sliding Window Test vs. Single Time Test for Track-to-Track Association

Sliding Window Test vs. Single Time Test for Track-to-Track Association Sliding Window Test vs. Single Time Test for Track-to-Track Association Xin Tian Dept. of Electrical and Computer Engineering University of Connecticut Storrs, CT 06269-257, U.S.A. Email: xin.tian@engr.uconn.edu

More information

Some Approximations of the Logistic Distribution with Application to the Covariance Matrix of Logistic Regression

Some Approximations of the Logistic Distribution with Application to the Covariance Matrix of Logistic Regression Working Paper 2013:9 Department of Statistics Some Approximations of the Logistic Distribution with Application to the Covariance Matrix of Logistic Regression Ronnie Pingel Working Paper 2013:9 June

More information

A NEW FORMULATION OF IPDAF FOR TRACKING IN CLUTTER

A NEW FORMULATION OF IPDAF FOR TRACKING IN CLUTTER A NEW FRMULATIN F IPDAF FR TRACKING IN CLUTTER Jean Dezert NERA, 29 Av. Division Leclerc 92320 Châtillon, France fax:+33146734167 dezert@onera.fr Ning Li, X. Rong Li University of New rleans New rleans,

More information

MULTIVARIATE ANALYSIS OF VARIANCE UNDER MULTIPLICITY José A. Díaz-García. Comunicación Técnica No I-07-13/ (PE/CIMAT)

MULTIVARIATE ANALYSIS OF VARIANCE UNDER MULTIPLICITY José A. Díaz-García. Comunicación Técnica No I-07-13/ (PE/CIMAT) MULTIVARIATE ANALYSIS OF VARIANCE UNDER MULTIPLICITY José A. Díaz-García Comunicación Técnica No I-07-13/11-09-2007 (PE/CIMAT) Multivariate analysis of variance under multiplicity José A. Díaz-García Universidad

More information

Asymptotic Analysis of the Generalized Coherence Estimate

Asymptotic Analysis of the Generalized Coherence Estimate IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 49, NO. 1, JANUARY 2001 45 Asymptotic Analysis of the Generalized Coherence Estimate Axel Clausen, Member, IEEE, and Douglas Cochran, Senior Member, IEEE Abstract

More information

V. Properties of estimators {Parts C, D & E in this file}

V. Properties of estimators {Parts C, D & E in this file} A. Definitions & Desiderata. model. estimator V. Properties of estimators {Parts C, D & E in this file}. sampling errors and sampling distribution 4. unbiasedness 5. low sampling variance 6. low mean squared

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn!

Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn! Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn! Questions?! C. Porciani! Estimation & forecasting! 2! Cosmological parameters! A branch of modern cosmological research focuses

More information

Online tests of Kalman filter consistency

Online tests of Kalman filter consistency Tampere University of Technology Online tests of Kalman filter consistency Citation Piché, R. (216). Online tests of Kalman filter consistency. International Journal of Adaptive Control and Signal Processing,

More information

F & B Approaches to a simple model

F & B Approaches to a simple model A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 215 http://www.astro.cornell.edu/~cordes/a6523 Lecture 11 Applications: Model comparison Challenges in large-scale surveys

More information

ECON 4160, Autumn term Lecture 1

ECON 4160, Autumn term Lecture 1 ECON 4160, Autumn term 2017. Lecture 1 a) Maximum Likelihood based inference. b) The bivariate normal model Ragnar Nymoen University of Oslo 24 August 2017 1 / 54 Principles of inference I Ordinary least

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

Statistical Methods in Particle Physics

Statistical Methods in Particle Physics Statistical Methods in Particle Physics Lecture 11 January 7, 2013 Silvia Masciocchi, GSI Darmstadt s.masciocchi@gsi.de Winter Semester 2012 / 13 Outline How to communicate the statistical uncertainty

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Advanced Signal Processing Introduction to Estimation Theory

Advanced Signal Processing Introduction to Estimation Theory Advanced Signal Processing Introduction to Estimation Theory Danilo Mandic, room 813, ext: 46271 Department of Electrical and Electronic Engineering Imperial College London, UK d.mandic@imperial.ac.uk,

More information

Stephen Scott.

Stephen Scott. 1 / 35 (Adapted from Ethem Alpaydin and Tom Mitchell) sscott@cse.unl.edu In Homework 1, you are (supposedly) 1 Choosing a data set 2 Extracting a test set of size > 30 3 Building a tree on the training

More information

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics The candidates for the research course in Statistics will have to take two shortanswer type tests

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

Space Telescope Science Institute statistics mini-course. October Inference I: Estimation, Confidence Intervals, and Tests of Hypotheses

Space Telescope Science Institute statistics mini-course. October Inference I: Estimation, Confidence Intervals, and Tests of Hypotheses Space Telescope Science Institute statistics mini-course October 2011 Inference I: Estimation, Confidence Intervals, and Tests of Hypotheses James L Rosenberger Acknowledgements: Donald Richards, William

More information

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics)

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Probability quantifies randomness and uncertainty How do I estimate the normalization and logarithmic slope of a X ray continuum, assuming

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

2 Statistical Estimation: Basic Concepts

2 Statistical Estimation: Basic Concepts Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 2 Statistical Estimation:

More information

A COMPARISON OF TWO METHODS FOR STOCHASTIC FAULT DETECTION: THE PARITY SPACE APPROACH AND PRINCIPAL COMPONENTS ANALYSIS

A COMPARISON OF TWO METHODS FOR STOCHASTIC FAULT DETECTION: THE PARITY SPACE APPROACH AND PRINCIPAL COMPONENTS ANALYSIS A COMPARISON OF TWO METHODS FOR STOCHASTIC FAULT DETECTION: THE PARITY SPACE APPROACH AND PRINCIPAL COMPONENTS ANALYSIS Anna Hagenblad, Fredrik Gustafsson, Inger Klein Department of Electrical Engineering,

More information

AR-order estimation by testing sets using the Modified Information Criterion

AR-order estimation by testing sets using the Modified Information Criterion AR-order estimation by testing sets using the Modified Information Criterion Rudy Moddemeijer 14th March 2006 Abstract The Modified Information Criterion (MIC) is an Akaike-like criterion which allows

More information

STA121: Applied Regression Analysis

STA121: Applied Regression Analysis STA121: Applied Regression Analysis Linear Regression Analysis - Chapters 3 and 4 in Dielman Artin Department of Statistical Science September 15, 2009 Outline 1 Simple Linear Regression Analysis 2 Using

More information

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Applied Mathematical Sciences, Vol. 4, 2010, no. 62, 3083-3093 Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Julia Bondarenko Helmut-Schmidt University Hamburg University

More information

Unified approach to the classical statistical analysis of small signals

Unified approach to the classical statistical analysis of small signals PHYSICAL REVIEW D VOLUME 57, NUMBER 7 1 APRIL 1998 Unified approach to the classical statistical analysis of small signals Gary J. Feldman * Department of Physics, Harvard University, Cambridge, Massachusetts

More information

z = β βσβ Statistical Analysis of MV Data Example : µ=0 (Σ known) consider Y = β X~ N 1 (β µ, β Σβ) test statistic for H 0β is

z = β βσβ Statistical Analysis of MV Data Example : µ=0 (Σ known) consider Y = β X~ N 1 (β µ, β Σβ) test statistic for H 0β is Example X~N p (µ,σ); H 0 : µ=0 (Σ known) consider Y = β X~ N 1 (β µ, β Σβ) H 0β : β µ = 0 test statistic for H 0β is y z = β βσβ /n And reject H 0β if z β > c [suitable critical value] 301 Reject H 0 if

More information

Statistical techniques for data analysis in Cosmology

Statistical techniques for data analysis in Cosmology Statistical techniques for data analysis in Cosmology arxiv:0712.3028; arxiv:0911.3105 Numerical recipes (the bible ) Licia Verde ICREA & ICC UB-IEEC http://icc.ub.edu/~liciaverde outline Lecture 1: Introduction

More information

UNIFORMLY MOST POWERFUL CYCLIC PERMUTATION INVARIANT DETECTION FOR DISCRETE-TIME SIGNALS

UNIFORMLY MOST POWERFUL CYCLIC PERMUTATION INVARIANT DETECTION FOR DISCRETE-TIME SIGNALS UNIFORMLY MOST POWERFUL CYCLIC PERMUTATION INVARIANT DETECTION FOR DISCRETE-TIME SIGNALS F. C. Nicolls and G. de Jager Department of Electrical Engineering, University of Cape Town Rondebosch 77, South

More information

An improved procedure for combining Type A and Type B components of measurement uncertainty

An improved procedure for combining Type A and Type B components of measurement uncertainty Int. J. Metrol. Qual. Eng. 4, 55 62 (2013) c EDP Sciences 2013 DOI: 10.1051/ijmqe/2012038 An improved procedure for combining Type A and Type B components of measurement uncertainty R. Willink Received:

More information

Recall the Basics of Hypothesis Testing

Recall the Basics of Hypothesis Testing Recall the Basics of Hypothesis Testing The level of significance α, (size of test) is defined as the probability of X falling in w (rejecting H 0 ) when H 0 is true: P(X w H 0 ) = α. H 0 TRUE H 1 TRUE

More information

1/24/2008. Review of Statistical Inference. C.1 A Sample of Data. C.2 An Econometric Model. C.4 Estimating the Population Variance and Other Moments

1/24/2008. Review of Statistical Inference. C.1 A Sample of Data. C.2 An Econometric Model. C.4 Estimating the Population Variance and Other Moments /4/008 Review of Statistical Inference Prepared by Vera Tabakova, East Carolina University C. A Sample of Data C. An Econometric Model C.3 Estimating the Mean of a Population C.4 Estimating the Population

More information

Statistical Methods in Particle Physics Lecture 1: Bayesian methods

Statistical Methods in Particle Physics Lecture 1: Bayesian methods Statistical Methods in Particle Physics Lecture 1: Bayesian methods SUSSP65 St Andrews 16 29 August 2009 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan

More information

Test Volume 11, Number 1. June 2002

Test Volume 11, Number 1. June 2002 Sociedad Española de Estadística e Investigación Operativa Test Volume 11, Number 1. June 2002 Optimal confidence sets for testing average bioequivalence Yu-Ling Tseng Department of Applied Math Dong Hwa

More information

Problem Set 2. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 30

Problem Set 2. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 30 Problem Set 2 MAS 622J/1.126J: Pattern Recognition and Analysis Due: 5:00 p.m. on September 30 [Note: All instructions to plot data or write a program should be carried out using Matlab. In order to maintain

More information

Chapter 7, continued: MANOVA

Chapter 7, continued: MANOVA Chapter 7, continued: MANOVA The Multivariate Analysis of Variance (MANOVA) technique extends Hotelling T 2 test that compares two mean vectors to the setting in which there are m 2 groups. We wish to

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

More information

Bayesian decision theory Introduction to Pattern Recognition. Lectures 4 and 5: Bayesian decision theory

Bayesian decision theory Introduction to Pattern Recognition. Lectures 4 and 5: Bayesian decision theory Bayesian decision theory 8001652 Introduction to Pattern Recognition. Lectures 4 and 5: Bayesian decision theory Jussi Tohka jussi.tohka@tut.fi Institute of Signal Processing Tampere University of Technology

More information

Subject CS1 Actuarial Statistics 1 Core Principles

Subject CS1 Actuarial Statistics 1 Core Principles Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and

More information

2. What are the tradeoffs among different measures of error (e.g. probability of false alarm, probability of miss, etc.)?

2. What are the tradeoffs among different measures of error (e.g. probability of false alarm, probability of miss, etc.)? ECE 830 / CS 76 Spring 06 Instructors: R. Willett & R. Nowak Lecture 3: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics Executive summary In the last lecture we

More information

Interpreting Regression Results

Interpreting Regression Results Interpreting Regression Results Carlo Favero Favero () Interpreting Regression Results 1 / 42 Interpreting Regression Results Interpreting regression results is not a simple exercise. We propose to split

More information