Testing Distributed Parameter Hypotheses for the Detection of Climate Change


HAROON S. KHESHGI AND BENJAMIN S. WHITE

Corporate Strategic Research, ExxonMobil Research and Engineering Company, Annandale, New Jersey

(Manuscript received 12 April 2000, in final form 1 February 2001)

Corresponding author address: Dr. Haroon S. Kheshgi, ExxonMobil Research and Engineering Co., Route 22E, Annandale, NJ. E-mail: hskhesh@erenj.com

ABSTRACT

A general statistical methodology, based on testing alternative distributed parameter hypotheses, is proposed as a method for deciding whether or not anthropogenic influences are causing climate change. This methodology provides a framework for including known uncertainties in the definition of the hypotheses by allowing model parameters to be specified by probability distributions, thereby allowing the definition of more realistic hypotheses. The method can be used to derive the unique statistical test that minimizes errors in test conclusions. The method is applied to illustrative detection problems by first defining alternative hypotheses for global mean temperature; second, deriving the most powerful test and calculating its statistics; third, applying the test to observed temperature records; and finally, illustrating the test statistics and results on a receiver or relative operating characteristic (ROC) curve showing the relation between false positive and false negative test errors. It is demonstrated, with an illustrative example, that proper accounting for the uncertainty in all the parameters can produce very different statistical conclusions than would be obtained by simply fixing some parameters at nominal values.

1. Introduction

A central problem in climate change research is how to derive rigorous statistical procedures to answer climate change questions. This is a recurring problem because typical climate change questions, such as "should we trust climate projections?", "how will climate change in the future?", or "have we damaged the climate system?", are not posed in a form that is directly amenable to statistical testing. Instead, comparisons of data to idealized statistical models are generally used to address related climate change questions. To do this, first a statistical model is hypothesized to simulate the data. However, the relation between the real data and the statistical model is often incomplete: in particular, model parameters are usually known only within some band of uncertainty, and this uncertainty is not usually captured realistically in the model statistics. Next, with the model framework now given, a statistical problem is posed and solved. However, the relation between a statistical problem and the real climate change question is often imperfect, in that the statistical problem does not address the question precisely. In this paper we address these difficulties by proposing a new statistical methodology based on the theory of hypothesis testing. This method can test the truth or falsity of any statement about the probability space, and therefore potentially more precise formulations of climate change questions are possible. Furthermore, the method can incorporate realistic estimates of parameter uncertainty, characterized by probability distributions.

Three types of problems are commonly posed to answer climate change questions: consistency tests, parameter estimation, and hypothesis tests. These problems are not, in general, statistically equivalent.
(i) Consistency tests are generally used to test for model validity. For example, one can test whether the recent climate record, or the paleoclimate record, is consistent with models of natural variability. However, such a test might be pressed into further use and interpreted as answering the related question of whether or not anthropogenic influences have caused climate change; that is, one might rule out natural causes if the data failed the consistency test (e.g., Tsonis and Elsner 1989). In general, there is not a unique or optimum consistency test, so there is some degree of arbitrariness about which test statistic is used. Thus, for example, two researchers analyzing exactly the same data and using exactly the same model can reach different conclusions merely because of their arbitrary choices of a test statistic. This dilemma is exemplified by Mann et al. (1998), who introduce the verification resolved variance statistic in place of the more usual r^2 statistic and find that highly significant values of the resolved variance statistic are possible even when r^2 is only marginally significant.

(ii) Parameter estimation techniques are used to calibrate models. For linear models and known noise covariance, the Gauss-Markov theorem (Beyer 1966) gives optimal estimates, in the sense that these estimates minimize the mean square error between the true parameter values and their estimates. More generally, for given prior estimates of the parameters, a Bayesian framework can provide optimal parameter estimates. However, a parameter estimate might be pressed into further use by posing a hypothesis about a parameter that is related to a climate change question. For instance, within a model framework, there may be a parameter that measures climate sensitivity to greenhouse gases. One may test the hypothesis that this parameter is nonzero and interpret the result as answering the question of whether greenhouse gases are causing climate change. Clearly, the basic climate change question can be posed statistically in this way only in models that have been simplified enough to have a parameter or parameters directly related to the question posed. Furthermore, there is no reason to believe that the test, which may have incorporated optimal parameter estimation, is still optimal for answering the climate change question. Parameter estimation techniques are currently the most popular method for addressing climate change questions. Uncertainty is represented by free model parameters (Wigley and Raper 1990; Kheshgi and White 1993a; Hasselmann 1997) or prior probability distributions of model parameters (Hasselmann 1998; Leroy 1998; Tol and De Vos 1998). Detection and attribution tests are then postulated to be criteria on estimated climate model parameters; for example, is there 95% confidence that the amplitude of a greenhouse signal pattern is greater than zero?

(iii) Hypothesis testing is the basis of detection theory, since detection of an event is simply the hypothesis that the event has occurred. The event to be detected may be any statement that can be made about the model or models being used. Note that when parameter estimation is used, the hypotheses to be tested cannot be stated with complete generality, but must be framed to test whether the parameters lie in some specified region of parameter space. Furthermore, a method that optimally estimates the parameters need not give the optimal test for a hypothesis about them. In contrast, the general theory of hypothesis testing, which we utilize below, allows the hypotheses to be any statements about the probability space, and the optimal test, when it exists, is the optimal test for the truth or falsity of the hypotheses. For two alternative hypotheses with parameter distributions completely specified, the Neyman-Pearson lemma (Neyman and Pearson 1933) shows how to construct the most powerful, or optimal, test. For example, given two alternative hypotheses for the cause of climate change, the test that best distinguishes between them is the one derived by application of the Neyman-Pearson lemma. We will illustrate this process in detail below. Many detection problems have been addressed with the theory of hypothesis tests (Swets 1988; Swets et al. 2000). While hypotheses for climate change are often defined by models with certain model parameters, this is not necessary, as we show below. Hypotheses for climate change can be represented by models with parameters distributed according to a specified probability distribution.
In this way our limited understanding of these parameters is accounted for, making the modeled hypotheses more realistic representations of the alternatives. Models for climate data that are realistic depictions of our understanding of the climate system will include estimates of uncertainty. Therefore, each of the types of problems listed above requires a characterization of variability and uncertainty that can be specified as a set of information prior to application of the method to the data. A common set of information may be used for all three problem types, so that the different results may be compared.

In this manuscript, we present a new, general statistical methodology for deciding whether or not anthropogenic influences are causing climate change, based on testing alternative distributed parameter hypotheses (cf. Levine and Berliner 1999). The affirmative of this statement, that is, that climate change has been detected and can be attributed to anthropogenic influences, we shall refer to as the hypothesis H_1. The alternative null hypothesis, that anthropogenic influences are not causing climate change, will be referred to as H_0. We suppose that statistical models are given, as discussed further below, for the two alternatives H_0 and H_1. Then, for analysis of a given set of data, our methodology has the following six properties.

(i) It provides a unified statistical framework for detection and attribution and provides the theoretically optimal method, within this framework, for testing the hypotheses.

(ii) Its relevant statistical properties, at all significance levels, can be derived from first principles and displayed graphically in a simple way. Furthermore, the statistical properties do not depend on the data, but only on the models for H_0 and H_1.

(iii) The methodology does not rely on specialized assumptions about the models (Hasselmann 1997), such as linearity with respect to parameters, linear superposition of the effects of different forcings, Gaussian distributions of the parameters, or the existence of fixed fingerprint patterns for a suite of signals.

(iv) All unknown model parameters are assigned (joint) a priori probability distributions, so that known parameter uncertainty enters the definitions of the hypotheses directly.

(v) The methodology does not require, as does a fully Bayesian approach (Hasselmann 1998), that prior probabilities of the truth or falsity of H_0 and H_1 be provided. Note that in some problems (e.g., medical testing) such prior probabilities can be based on the results of past tests; however, for climate change there are no past realizations as a basis for these priors. Fortunately, in some cases of climate change study, these priors do not greatly influence the final results (Berliner et al. 2000). Our proposed methodology, however, does not require a personal, subjective probability, obtained without data, that anthropogenic activity is causing climate change.

(vi) As in any statistical method, a decision may be made at any level of statistical significance that is specified by the decision maker. However, if the costs of a wrong decision and prior probabilities of the truth or falsity of H_0 and H_1 are specified, then this methodology will provide the unique significance level that minimizes the expected cost of the decision.

The sole requirement of the method is that the hypotheses H_0 and H_1 each be completely characterized statistically, with no free parameters. The key to satisfying this requirement is property (iv) above: all unknown parameters must be provided with (joint) a priori probability distributions; these probability distributions may, of course, be different depending on whether hypothesis H_0 or H_1 is true. Note that (iv) may be considered a weakness of the method if, for instance, one cannot formulate a hypothesis for the behavior of the anthropogenic signal; in this case a consistency test of the null hypothesis may be the only method available. However, a formulation using distributed parameters solves a major difficulty, identified by Levine and Berliner (1999), with all other methods in current use: no other method can handle a test of a fuzzy null hypothesis, that is, a null hypothesis where some parameters are not known precisely, but rather have probability distributions. Note also that although the method we propose always leads to the most powerful statistical test, implementation of the test may require the solution of challenging mathematical and computational problems.

We illustrate the application of distributed parameter hypothesis testing to the detection and attribution of climate change with modeled hypotheses for changes in globally aggregated mean-annual temperature. This application demonstrates that detection/attribution conclusions could be drawn from results of distributed parameter hypothesis tests based on the computed probabilities of test errors (false positive and false negative). We find, in these illustrative applications, that detection of climate change is contingent on the uncertainty specified in the definitions of the hypotheses.

2. Hypothesis testing, the Neyman-Pearson lemma, and the ROC curve

Given two fully defined (i.e., simple) alternative hypotheses H_0 and H_1, the Neyman-Pearson lemma (Neyman and Pearson 1933) gives the most powerful (or optimal) test to decide between them.
This most powerful test is to decide H_1 if the likelihood ratio L(D) is greater than a selected threshold η:

  L(D) = P_1(D) / P_0(D) > η,    (2.1)

where D is the data and P_0 and P_1 are the probability densities of the data under the respective hypotheses H_0 and H_1. If (2.1) is not satisfied, then we are to decide in favor of H_0. In particular, if the model for H_i, i = 0, 1, depends on a vector θ_i of parameters, then probability distributions P_i(θ_i) must be provided for them. Typically, these probabilities will be inferred from studies with a basis that may be either subjective or objective, and they incorporate both estimated values for the parameters and their estimated ranges of uncertainty. We note that these probabilities can be equivalent to the prior probability distributions used in Bayesian parameter estimation. Then the full probabilities P_i(D) necessary for the likelihood ratio test are calculated as multidimensional integrals,

  P_i(D) = ∫ P_i(D | θ_i) P_i(θ_i) dθ_i,   i = 0, 1,    (2.2)

where P_i(D | θ_i) are the respective conditional probabilities of the data, given that the parameters are known.

The four possible outcomes of a hypothesis test are illustrated in the 2 × 2 contingency table shown in Fig. 1. The probability that the test asserts that H_i is true when H_j is actually true is given by the conditional probability P(H_i | H_j) for i = 0, 1 and j = 0, 1. Thus the diagonal elements of the table in Fig. 1 give the probabilities of the two possible correct decisions: P(H_1 | H_1) is the probability of correctly detecting global warming due to anthropogenic effects, and P(H_0 | H_0) is the probability of correctly rejecting anthropogenically caused global warming when indeed there is no such effect. The off-diagonal elements in Fig. 1 represent the two types of errors that are possible: 1) P(H_0 | H_1), the probability of falsely rejecting anthropogenically caused global warming when it exists, is called the false negative probability, denoted by the symbol P_FN. 2) Similarly, P(H_1 | H_0), the probability of incorrectly asserting that human activity has caused global warming, is called the false positive probability, denoted by P_FP.

FIG. 1. The results of tests to choose between two hypotheses H_1 and H_0 have four possible outcomes, represented by the 2 × 2 contingency table. The probability of each outcome for a given threshold η is calculated from the probability distribution of the likelihood ratio L(D), where D obeys either H_1 or H_0.

FIG. 2. The ROC curve.

Clearly, we would like a statistical test that minimizes the probabilities of both types of errors, P_FN and P_FP. However this is not, in general, possible, since the two error probabilities must be traded off against each other. As an extreme example, we can easily construct a statistical test for which P_FN is zero: for this test, simply always assert, regardless of the evidence, that human activity is causing global warming. Such a test is unacceptable, of course, since the price paid for P_FN = 0 is that P_FP = 1; that is, false positive errors are certain.

A receiver or relative operating characteristic (ROC) curve can be used to show the relationship between the two types of test errors (see Fig. 2). Following standard practice for presentation of test statistics (Swets 1988; Swets et al. 2000), we plot the probability of successful detection, 1 - P_FN, versus the probability of erroneous detection, P_FP. If a choice between hypotheses is made without any information, then P_FP = 1 - P_FN, which gives the diagonal ROC curve shown in Fig. 2. The test (2.1) is optimal in the sense that it minimizes P_FN for each given P_FP, leading to the ROC curve that is furthest above the diagonal. Suboptimal tests lead to ROC curves below that derived from Eq. (2.1). The area under an entire ROC curve has been interpreted as a measure of the accuracy of a hypothesis test (Swets 1988) or the probability of a correct decision (Egan 1975).

The first step of the method, then, is to derive the form of the likelihood ratio function L(·), based on the models for H_0 and H_1. The next step is to calculate the statistical properties of the test, that is, the relation between the threshold η, the probability of false positive test results P_FP, and the probability of false negative test results P_FN. Next, a specific threshold is chosen. Then, finally, the test is applied to the data. Each of these steps will be illustrated in detail below for several specific examples.

The first step, derivation of the likelihood function, is essentially mathematical, since it depends on a probabilistic analysis of the models that are proposed. One must first derive a mathematical expression for P_0(D), the probability density of the data D given hypothesis H_0; next, one must similarly derive a mathematical expression for P_1(D), the probability density of the data given hypothesis H_1. The likelihood ratio L(D) is then the ratio of these two expressions, Eq. (2.1). For some special cases, for example, the case of Gaussian probability distributions, one can make use of general formulas for L(D), which involve the means and covariances of the random data vector D, conditional on each of the two alternative hypotheses. But however it is obtained, L(D) must be known and computable for all possible values of the data vector D, not just the values that are actually observed.
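To make Eq. (2.1) concrete, the following sketch evaluates the log-likelihood ratio for the Gaussian special case mentioned above. It is illustrative only, not the paper's implementation: S0, C0, S1, and C1 are assumed stand-ins for the hypothesis-specific signal means and noise covariances, which in the paper come from climate models.

    # Sketch: log-likelihood ratio for two simple Gaussian hypotheses, Eq. (2.1).
    # S0, C0, S1, C1 are assumed inputs (signal means and noise covariances).
    import numpy as np
    from scipy.stats import multivariate_normal

    def log_likelihood_ratio(D, S0, C0, S1, C1):
        """ln L(D) = ln P_1(D) - ln P_0(D) for Gaussian data models."""
        lp1 = multivariate_normal.logpdf(D, mean=S1, cov=C1)
        lp0 = multivariate_normal.logpdf(D, mean=S0, cov=C0)
        return lp1 - lp0

    def decide(D, S0, C0, S1, C1, eta=1.0):
        """Choose H_1 when L(D) > eta, i.e., when ln L(D) > ln(eta)."""
        return log_likelihood_ratio(D, S0, C0, S1, C1) > np.log(eta)

Working with ln L rather than L avoids numerical under- and overflow in the exponentials; the decision rule L(D) > η is equivalent to ln L(D) > ln η.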
Note that L(D) is a scalar transformation of the data vector D, and that the most powerful test, Eq. (2.1), depends only on the value of this scalar random variable, not on the entire random data vector. Therefore, all of the statistical properties of the most powerful test can be calculated from just two one-dimensional probability density functions: the probability densities of L, conditional on H_i for i = 0, 1. These probability densities may be derived mathematically or, alternatively, can be computed by Monte Carlo simulations as follows: for each hypothesis H_i, i = 0, 1, one generates a random data vector D using the model definition and a random number generator. The corresponding value of L is then computed from the known function L(D). Repeating this process many times, one can generate a histogram of the values of L that approximates the probability density function of L conditional on the hypothesis H_i.

The statistical properties of the most powerful test are then completely specified by the cumulative probability distributions of L conditional on the two hypotheses. That is, we have

  P_FP = P{L > η | H_0},   P_FN = P{L ≤ η | H_1}.    (2.3)

Using this equation, one obtains the false positive and false negative error rates for all possible values of the threshold η. This equation is illustrated graphically in the second diagram in Fig. 1. The ROC curve is then obtained by recording the changing relation between the error rates as the threshold η is varied. Note that by this method, a complete specification of the statistical test, and a complete analysis of its statistical properties, that is, the error rates, is obtained before any real data are analyzed. Only model properties and model data are used to derive the test and its statistics.

The choice of threshold depends on the significance level desired for the test. More specifically, a given choice of threshold will correspond to a point on the ROC curve where the error rates are acceptably small. Alternatively, if action will be taken based on the test results, the threshold may be chosen from economic considerations. For example, a threshold can be chosen to minimize the impact of test errors for a given decision, for example, cost ∝ P_FP + κ P_FN, where

  κ = [C_FN P(H_1)] / [C_FP P(H_0)],    (2.4)

C_FP and C_FN are the costs associated with wrong test results, and P(H_0) and P(H_1) are the prior probabilities that H_0 and H_1 are true. Minimization of cost leads to a choice of threshold that depends on the trade-off between false positive and false negative test results, and so depends on the shape of the ROC curve, as illustrated in Fig. 3a. For different shaped ROC curves the values of P_FP and P_FN at the threshold will vary, as illustrated by the cost minimization criteria applied to a family of ROC curves in Fig. 3b. For this family of ROC curves, a region is defined by the boundary set by the cost minimization thresholds. If the likelihood ratio calculated when the test is applied to the data lies on the corresponding ROC curve in this region, the choice is made for H_1. Given, for example, a choice of κ for a specific detection of climate change decision, Fig. 3c illustrates the region (for a family of ROC curves) in which the test would result in the choice of H_1, the detection of an anthropogenic effect on climate change. Outside of this region, the result would be nondetection of an anthropogenic effect (H_0). Given, for example, a different choice of κ for a specific rejection of climate change decision, Fig. 3c illustrates a different region in which the test would result in the choice of H_0, the rejection of an anthropogenic effect on climate change. Outside of this region, the result would be nonrejection of an anthropogenic effect (i.e., the test would not choose H_0). These illustrations are provided to help interpret the results shown in section 5.

3. Application to climate change detection

The data D in climate change detection studies commonly consist of discrete sets of temperature records aggregated over spatial and temporal scales to a level appropriate for representation by climate models. These models are then used to build, for example, alternative hypotheses for temperature change.
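The Monte Carlo procedure just described can be sketched as follows. This is illustrative, not the paper's code: simulate(i) is a hypothetical stand-in for drawing a synthetic data vector under hypothesis H_i from the model definition, and lnL for the log-likelihood ratio function derived above.

    # Sketch: Monte Carlo estimate of the test statistics, Eq. (2.3), and the
    # ROC curve, using only model-generated data (no observations needed).
    import numpy as np

    def roc_curve(simulate, lnL, n_realizations=1000):
        # Values of ln L under each hypothesis, from synthetic data.
        l0 = np.sort([lnL(simulate(0)) for _ in range(n_realizations)])  # under H_0
        l1 = np.sort([lnL(simulate(1)) for _ in range(n_realizations)])  # under H_1
        # Sweep the threshold over all observed values of ln L.
        thresholds = np.unique(np.concatenate([l0, l1]))
        p_fp = np.array([(l0 > t).mean() for t in thresholds])   # P{L > eta | H_0}
        p_fn = np.array([(l1 <= t).mean() for t in thresholds])  # P{L <= eta | H_1}
        return p_fp, p_fn, thresholds

Consistent with the text above, the routine uses only the models for H_0 and H_1; the resulting (P_FP, P_FN) pairs trace out the ROC curve before any real data are analyzed.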
While we consider only globally aggregated data in the examples shown in this study, the methods could be applied to data disaggregated in both space and time. In detection and attribution of climate change, hypotheses are often chosen to include or exclude mechanisms for climate change by various forcing mechanisms such as greenhouse gases, anthropogenic sulfate aerosols, volcanoes, and variations in solar insolation (Hegerl et al. 1997; Tett et al. 1999). For the detection of anthropogenic effects on climate, one hypothesis is chosen to be a so-called null hypothesis representing only nonanthropogenic effects, while the other hypothesis includes anthropogenic effects; this is illustrated in section 5. More generally, alternative hypothesis pairs including different combinations of anthropogenic and natural effects can be designed to address attribution decisions.

FIG. 3. The threshold of a hypothesis test can be chosen to minimize the cost of test errors. (a) Shown graphically on the ROC curve: if the cost is a linear function of P_FP and P_FN, then the threshold that yields minimum cost is at a location on the ROC curve where a line of constant cost is tangent to the ROC curve. (b) For a one-parameter family of ROC curves, the region giving a positive test result for H_1 is bounded by the family of least-cost thresholds. (c) Hypothesis tests for the detection or attribution of climate change can be interpreted as regions constructed by least cost. Different costs can be assigned to give a region for rejection of climate change hypotheses.

Consider, for example, two alternative hypotheses, in each of which the data D have a Gaussian distribution. For i = 0, 1 the data have a mean (or signal) S_i(θ_i) and an additive noise N_i(θ_i). That is, these hypotheses depend on model parameters θ_i, which can be used to parameterize uncertainty in models for the signal and noise. The data would be given, then, under hypothesis H_i by

  D = S_i(θ_i) + N_i(θ_i),    (3.1)

where the noise N_i has zero mean and covariance matrix C_i(θ_i). The conditional probabilities, given θ_i, are then

  P_i(D | θ_i) = det[2π C_i(θ_i)]^{-1/2} exp{ -(1/2) [D - S_i(θ_i)]^T C_i(θ_i)^{-1} [D - S_i(θ_i)] }.    (3.2)

Given probability distributions of the parameters θ_i, the unconditional probabilities are then calculated from the multidimensional integral, Eq. (2.2). The alternatives are then simple hypotheses, and the most powerful test to differentiate between the two alternatives is the Neyman-Pearson likelihood ratio test, Eq. (2.1). Note that a common model framework can exist between this hypothesis test and Bayesian parameter estimation if a common set of parameter probability distributions (priors in Bayesian estimation), P_i(θ_i), i = 0, 1, is chosen for the two different statistical procedures. Interpretation of hypothesis test results and of posterior Bayesian estimates, however, differ.
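When the parameters are distributed, the integral in Eq. (2.2) generally must be evaluated numerically. A minimal Monte Carlo sketch follows; draw_params, signal, and covariance are hypothetical stand-ins for the parameter prior P_i(θ_i) and the model functions S_i(θ_i) and C_i(θ_i), which appendixes A and B of the paper define for the illustrative application.

    # Sketch: approximating the unconditional density P_i(D) of Eq. (2.2)
    # by averaging the conditional Gaussian density (3.2) over parameter draws.
    import numpy as np
    from scipy.stats import multivariate_normal

    def marginal_density(D, draw_params, signal, covariance, i, n_samples=2000):
        """P_i(D) ~ (1/n) * sum_k P_i(D | theta_k), with theta_k ~ P_i(theta_i)."""
        vals = []
        for _ in range(n_samples):
            theta = draw_params(i)  # sample from the parameter distribution
            vals.append(multivariate_normal.pdf(D, mean=signal(i, theta),
                                                cov=covariance(i, theta)))
        return np.mean(vals)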

4. Statistical formulation of hypothesis tests

a. Hypotheses with parameters precisely specified

We will consider further the Gaussian example of signal plus noise, as introduced in section 3 above. Consider first the case when the parameters are specified precisely, so that the θ_i are just fixed numbers; that is, the probability distributions of the parameters of the modeled hypotheses are delta functions. Then the signals and the noise covariances may be denoted simply S_i and C_i, i = 0, 1, respectively. From Eqs. (2.1) and (3.2), the likelihood ratio test is

  L(D) = P_1(D) / P_0(D)
       = [det(C_0) / det(C_1)]^{1/2} exp{ -(1/2) [D - S_1]^T C_1^{-1} [D - S_1] } / exp{ -(1/2) [D - S_0]^T C_0^{-1} [D - S_0] } > η.    (4.1)

By taking logarithms in Eq. (4.1), it is apparent that the optimal test is to compare a quadratic function of the data D to a threshold:

  [D - S_0]^T C_0^{-1} [D - S_0] - [D - S_1]^T C_1^{-1} [D - S_1] > γ,    (4.2)

where

  γ = 2 ln η + ln[det(C_1)] - ln[det(C_0)].    (4.3)

For the special case when the covariances of the two hypothesized noises are identical,

  C_0 = C_1 ≡ C,    (4.4)

the quadratic terms in Eq. (4.2) cancel, and the test becomes linear. It is only in this much simplified case that the optimal test corresponds to a test that is linear in the data and has a Gaussian distribution under either hypothesis. Tests with these properties are in common use for climate change models (Hasselmann 1997). Note, however, that even the full quadratic form in Eq. (4.2) is a very much simplified case, relying on the assumptions of (i) additive signal and noise, (ii) Gaussian statistics, and (iii) precisely specified parameters. A fourth assumption, Eq. (4.4), is necessary to give a linear Gaussian test. This fact was missed, for example, by Bell (1986), who, while not referring to the Neyman-Pearson lemma, derived the correct result for the special case of Gaussians with equal covariances; his solution for unequal covariances is, however, suboptimal, since he assumes that the test is linear.

In the remainder of this section, we will adopt the above four assumptions to completely elucidate the simplified situation of the linear test, for which there is an analytical solution. This test is given by the linear matched detector:

  V^T D > η*,    (4.5)

where

  V = C^{-1} (S_1 - S_0)    (4.6)

and

  η* = ln η + (1/2) [S_1^T C^{-1} S_1 - S_0^T C^{-1} S_0].    (4.7)

For this test the probabilities of test errors are

  P_FP = P{V^T D > η* | H_0},    (4.8)
  P_FN = P{V^T D ≤ η* | H_1},    (4.9)

which have the analytical solution

  P_FP = 1 - F( (η* - V^T S_0) / (V^T C V)^{1/2} ),    (4.10)
  P_FN = F( (η* - V^T S_1) / (V^T C V)^{1/2} ),    (4.11)

where

  F(x) = (2π)^{-1/2} ∫_{-∞}^{x} e^{-ξ²/2} dξ.    (4.12)

The ROC curves possible for this set of hypotheses constitute a one-parameter family of curves. This can be seen more easily by defining an alternative threshold

  ζ = η* - V^T S_0.    (4.13)

For this definition of threshold the test statistics are given by

  P_FP(ζ) = (1/2) [1 - erf( ζ / (2 V^T C V)^{1/2} )],    (4.14)
  P_FN(ζ) = (1/2) [1 + erf( (ζ - V^T C V) / (2 V^T C V)^{1/2} )],    (4.15)

for the test

  V^T (D - S_0) > ζ.    (4.16)

Therefore, ROC curves for this test are a one-parameter family of functions with parameter Δ given by

  Δ² = V^T C V = (S_1 - S_0)^T C^{-1} (S_1 - S_0),    (4.17)

which depends on the noise covariance and the signal. As Δ approaches zero the data give little useful information, and so the ROC curve approaches the diagonal P_FP = 1 - P_FN. When Δ becomes large, both error rates approach zero.
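The linear matched detector of Eqs. (4.5)-(4.12) is simple enough to sketch directly. This is an illustrative implementation under the four assumptions above; S0, S1, and the common covariance C are assumed inputs.

    # Sketch: linear matched detector, Eqs. (4.5)-(4.7), with its analytical
    # error rates, Eqs. (4.10)-(4.11), for equal-covariance Gaussian hypotheses.
    import numpy as np
    from scipy.stats import norm

    def matched_detector(S0, S1, C, log_eta=0.0):
        Cinv = np.linalg.inv(C)
        V = Cinv @ (S1 - S0)                                        # Eq. (4.6)
        eta_star = log_eta + 0.5 * (S1 @ Cinv @ S1 - S0 @ Cinv @ S0)  # Eq. (4.7)
        sigma = np.sqrt(V @ C @ V)            # std deviation of V^T D
        p_fp = 1.0 - norm.cdf((eta_star - V @ S0) / sigma)          # Eq. (4.10)
        p_fn = norm.cdf((eta_star - V @ S1) / sigma)                # Eq. (4.11)
        return V, eta_star, p_fp, p_fn

    def test(D, V, eta_star):
        """Decide H_1 when V^T D > eta*, Eq. (4.5)."""
        return V @ D > eta_star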
b. Hypotheses with distributed parameters

For the case where the covariances of the two hypotheses are not identical, or the model parameters are not known precisely, the test statistics, ROC curve, and test result do not generally have an analytical solution. For this case, the probability distribution of the test statistic is not necessarily Gaussian, even if the underlying conditional probabilities are given by the Gaussian form, Eq. (3.2). For this more general case, the unconditional probability densities are obtained by numerical integration, via Eq. (2.2). For this test the probabilities of test errors are given by

  P_FP = P{L(D) > η | H_0},   P_FN = P{L(D) ≤ η | H_1}.    (4.18)

These probabilities can be approximated by generating Monte Carlo realizations of D, first under the hypothesis H_0 to find P_FP, and then under the hypothesis H_1 to find P_FN. For each case, the likelihood ratio L(D) is calculated for each realization of D, and then the probabilities (4.18) are obtained from the fraction of the total number of realizations for which L falls below or above a given η. Note that calculation of a single realization of L(D), for a given hypothesis H_i, requires two numerical integrations, as prescribed in Eq. (2.2) for i = 0, 1, to obtain the unconditional probability densities P_0 and P_1 evaluated at the corresponding realization of D.

5. Illustrative application: Global, mean-annual temperature

In this section we present illustrative applications of distributed parameter hypothesis testing to the detection of climate change in the global mean-annual records of temperature. There are significant uncertainties in the modeling of, for example, climate variability, temperature record errors, and climate change forced by anthropogenic activities that are combined to form alternative hypotheses. These uncertainties can, in theory, be included in the definition of the hypotheses using distributed parameters. There are difficulties in doing this: models for uncertainties are incomplete, and evaluation of test statistics and test results can be computationally challenging. There has been characterization of uncertainties for some aspects of modeled climate change: for example, uncertainty analysis of radiative forcing (Schwartz and Andreae 1996; Harvey et al. 1997), subjective characterization of uncertainty in climate model response to forcing (Morgan and Keith 1995), and the use of long preinstrumental records to constrain models for natural variability (Kheshgi and White 1993b; Mann et al. 1998). Nevertheless, the distributions combining all expected contributors to uncertainty have yet to be assembled and confirmed. In this section the hypotheses constructed use very simple, and incomplete, distributions of parameters in hypotheses for global mean-annual temperature to illustrate the application of distributed parameter hypothesis testing. In addition, results are also shown using hypotheses with precisely specified parameters for comparison.

The hypothesis for the deviation ΔT of global near-surface temperature T from its modeled equilibrium value T_0 is taken to be the sum of a signal S plus noise N:

  ΔT ≡ T(t) - T_0 = S(t; θ) + N(t; θ).    (5.1)

The signal and noise models used are described in appendixes A and B, respectively. The signal and noise models depend on parameters that are not known precisely. In this section we examine the effect of uncertainty in the definition of hypotheses (characterized by uncertainty in these parameters) on the accuracy and results of hypothesis tests. Equation (5.1), including the signal S of human activity, constitutes the hypothesis H_1. The null hypothesis H_0, that climate is not forced by human activities, corresponds to the lack of a signal in (5.1), that is,

  H_0:  ΔT = N(t; θ).    (5.2)

The noise model represents random processes due to climate variability and uncertainty in the instrumental record of global temperature. Although not included in these illustrative examples, the hypotheses could also include natural signals from, for example, volcanoes and changes in solar forcing.

Since the energy balance model (appendix A) used in this study does not provide an expectation of the base temperature T_0, we will work only with the mean-subtracted data or, equivalently, with first differences of the data, which supply the equivalent information for the hypothesis test provided the covariance matrix is also transformed (see below). Therefore, the data vector (real and synthetic) is transformed by multiplying by the differencing matrix

      [ -1   1   0  ...   0 ]
  M = [  0  -1   1  ...   0 ],    (5.3)
      [  .   .   .   .    . ]
      [  0  ...   0  -1   1 ]

which has dimensions (n - 1) × n, where n is the dimension (number of years) of the data vector D. This transformation produces data that do not depend on T_0. The signal vector is then also multiplied by M, and the covariance matrices C_i(θ_i) are transformed by the operation

  C̃_i(θ_i) = M C_i(θ_i) M^T.    (5.4)
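A sketch of the transformation in Eqs. (5.3)-(5.4) follows; this is an assumed NumPy construction for illustration, not code from the paper.

    # Sketch: first-differencing transformation, Eqs. (5.3)-(5.4), which
    # removes the dependence of the data on the unknown base temperature T_0.
    import numpy as np

    def differencing_matrix(n):
        """(n-1) x n matrix with -1 on the diagonal, +1 on the superdiagonal."""
        M = np.zeros((n - 1, n))
        idx = np.arange(n - 1)
        M[idx, idx] = -1.0
        M[idx, idx + 1] = 1.0
        return M

    def transform(D, S, C):
        """Difference the data and signal; transform the covariance, Eq. (5.4)."""
        M = differencing_matrix(len(D))
        return M @ D, M @ S, M @ C @ M.T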
In the three subsections that follow, we test illustrative null (5.2) and anthropogenic (5.1) hypotheses for temperature against global mean-annual records of temperature. In section 5a we show the sensitivity of results to the time extent of the temperature record tested, using hypotheses with parameters precisely specified at their base case values. These results show the improvement of test accuracy with future acquisition of data. In section 5b we examine the effect on test statistics and test results of uncertainty in noise model parameter values. The generic noise model described in detail in appendix B is the sum of two noise processes: a white noise process plus an autoregressive process with a correlation function that decays exponentially to zero for large time lags. The timescale for this autoregressive process is the exponential decay rate, called the correlation time, and is denoted by τ. The amplitude σ_0 of the white noise process, the amplitude of the autoregressive process, and the timescale τ of the autoregressive process are the parameters examined in this noise model. Two different base case sets of noise model parameters are considered in the examples shown: one with a timescale of 10 yr and the other with a timescale of 100 yr, both with amplitudes adjusted to be consistent with records of instrumental and preinstrumental temperature (see appendix B). The effects of uncertainty in noise model parameters are considered using distributed parameters, and the results are compared to those using precisely defined parameters.
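A noise realization of the generic type just described can be sketched as white noise plus an AR(1) process. The parameter names below are illustrative assumptions; appendix B of the paper defines the actual model and its amplitudes.

    # Sketch: white noise (amplitude sigma0) plus an AR(1) process whose
    # correlation decays as exp(-lag/tau), i.e., with correlation time tau.
    import numpy as np

    def noise_realization(n_years, sigma0, sigma_ar, tau, rng=None):
        rng = rng or np.random.default_rng()
        phi = np.exp(-1.0 / tau)      # lag-1 autocorrelation for timescale tau
        ar = np.zeros(n_years)
        ar[0] = rng.normal(scale=sigma_ar)   # start from stationary distribution
        innov_sd = sigma_ar * np.sqrt(1.0 - phi**2)
        for t in range(1, n_years):
            ar[t] = phi * ar[t - 1] + rng.normal(scale=innov_sd)
        return ar + rng.normal(scale=sigma0, size=n_years)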

In section 5c we examine the effect on test statistics and test results of uncertainty in signal model parameter values. The model for the anthropogenic climate signal described in detail in appendix A includes a history of radiative forcing and an energy balance model for the temperature response to this forcing; these include a large set of input parameters. We consider the effect of two signal model parameters: the equilibrium climate sensitivity ΔT_2x and the amplitude of the aerosol forcing history. The effects of uncertainty in signal model parameters are considered using distributed parameters, and the results are compared to those using precisely defined parameters, for these two parameters for which there is an expected range of uncertainty (Morgan and Keith 1995; Schwartz and Andreae 1996).
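As an illustration of how such a signal model can depend on an uncertain climate sensitivity, the sketch below integrates a minimal one-box energy balance model. It is not the appendix A model; the heat capacity and the forcing for doubled CO2 are assumed nominal values for illustration only.

    # Sketch: minimal one-box energy balance model for the temperature
    # response T(t) to an annual radiative forcing history R(t), with the
    # equilibrium climate sensitivity dT2x as an uncertain input parameter.
    import numpy as np

    F2X = 3.7   # W m^-2, nominal forcing for doubled CO2 (assumed value)

    def ebm_signal(forcing, dT2x, heat_capacity=8.0, dt=1.0):
        """Euler-integrate C dT/dt = R(t) - (F2X/dT2x) T.

        heat_capacity is in W yr m^-2 K^-1; the value is illustrative."""
        lam = F2X / dT2x   # climate feedback parameter, W m^-2 K^-1
        T = np.zeros(len(forcing))
        for t in range(1, len(forcing)):
            T[t] = T[t - 1] + dt * (forcing[t - 1] - lam * T[t - 1]) / heat_capacity
        return T

Under a distributed-parameter hypothesis, dT2x would be drawn from its prior distribution for each Monte Carlo realization of the signal.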
FIG. 4. ROC curves and test results for different time extents of data. Hypotheses are defined with precise noise and signal model parameters. The base case parameterization of the signal is used (see appendix A). Circular symbols indicate the location on the ROC curves of L(D) for the Jones et al. (1999) record of global mean temperature. Square symbols indicate the location on the ROC curves of L(D) for the MSU global lower-troposphere data (Spencer and Christy 1999). ROC curves and test results are given for two noise models: (a) century timescale noise and (b) decade timescale noise (see Table 1 and appendix B).

a. Data time extent

In this section we calculate the effect of the length of the tested temperature data record on test statistics and, over time periods for which data exist, test results. Test statistics, the relation between P_FP and P_FN, can be calculated from (4.18), or the more specific case (4.8)-(4.9), knowing only the definition of the hypotheses. Given definitions of the hypotheses, the improvement of test accuracy (reduction in P_FP for a given P_FN) can be estimated for cases where the time range extends into the future. Improvement in test accuracy with future acquisition of data quantifies the value of the information gained from that data. In parameter estimation by generalized linear least squares, for example, the uncertainty of estimated parameters can also be calculated over time ranges prior to application of the data, given the parametric model (e.g., Kheshgi and White 1993a,b); however, calculation of parameter estimates does require the data. Similarly, to calculate test results the test must be applied to the data.

The record of annual, global near-surface temperature given by Jones et al. (1999) extends from 1856 to 1998, and the global record of lower-tropospheric temperature derived from satellite microwave sounding unit (MSU LT) data (Spencer and Christy 1999) extends from 1979 to 1998. In this section, these temperature records are tested against illustrative hypotheses defined with precise choices of parameters. Recall that if the hypotheses use precisely defined parameters, then the ROC curve can be characterized by Δ; the P_FP and P_FN of a test where the data lie exactly at the threshold are represented by V^T D; and L(D) can be calculated from Δ and V^T D. For the base case choice of signal parameters (see appendix A) and two alternative sets of noise parameters (see appendix B), ROC curves are shown in Fig. 4 for different time ranges of data; the associated test statistics and results are given in Table 1. Three past time ranges are considered: the full time range for the surface data (1856-1998), all but the last 10 yr (1856-1988), and the last 20 yr (1979-1998). The full time range leads to a more accurate test.

We note here that to produce these ROC curves, no data are needed; the ROC curves are found by applying the Neyman-Pearson lemma to the choice of modeled hypotheses. When the century timescale noise model is assumed (Fig. 4a), the 1856-1988 time range leads to a less accurate test than when only the last 20 yr of the data are used. However, if the decade timescale noise model is assumed (Fig. 4b), the 1856-1988 time range leads to a more accurate test than when the last 20 yr of the data are used.

TABLE 1. Test statistics and results for different time extents of data. Hypotheses are defined with precise parameter values (i.e., without uncertainty). The base case parameterization of the signal is used (see appendix A). The test is applied to the Jones et al. (1999) record of global mean temperature [and, for the period 1979-1998, to the MSU global lower-troposphere data (Spencer and Christy 1999), with results shown in parentheses]. Columns list, for each noise model (century and decade timescale) and time range: Δ, V^T D, P_FP and 1 - P_FN evaluated at the threshold η* = V^T D, and L(D). [Numerical entries not legible in this transcription.]

These tests are applied to the record of global near-surface temperature. Calculated likelihood ratios are given in Table 1. If the test threshold were chosen to be just satisfied by the data, that is, η* = V^T D, the probabilities P_FP and P_FN are as given in Table 1 and plotted on the ROC curves in Fig. 4. When the hypotheses are specified to contain the decade, as opposed to century, timescale noise model, the tested data give a larger likelihood ratio along with a more accurate test (larger Δ) and, therefore, a more conclusive test in favor of detection.

The test statistics can also be generated for the future, before the acquisition of data, given definitions of the hypotheses. Table 1 also gives the test statistics, and Fig. 4 the ROC curves, for the time periods 1856-2008 and 1856-2018 (extending 10 and 20 yr beyond currently available surface data). These results show that an additional decade of data will lead to a much more accurate test when either decade or century timescale noise is assumed. While it is often assumed that a long time record of temperature is essential to detect climate change or to estimate climate parameters, these results show that data from the past two decades provide the bulk of the information, although this conclusion does depend on the assumed noise model. This was also found in estimation of climate sensitivity (Kheshgi and White 1993b). Over the past 20 yr, satellite data have been available to estimate both solar intensity and tropospheric temperature. Prior to that period, variations in solar intensity could contribute significantly to the uncertainty in hypotheses for climate change, which would give further emphasis to recent data if this uncertainty were included in the modeled hypotheses.

Detection methods have been applied to the satellite-based estimates of atmospheric temperature (e.g., Santer et al. 1996a). As an example, we apply the same hypothesis tests to the MSU LT global record of lower-tropospheric temperature (Spencer and Christy 1999) as are applied to the surface temperature record over the last 20 yr (1979-1998); results are shown in Table 1 and Fig. 4. Since the same pairs of hypotheses are tested, the test statistics, illustrated by the ROC curves in Fig. 4, are identical to those for the surface data (1979-1998). When the tests are applied to the MSU LT data, however, the likelihood ratios of the tests (see Table 1) are lower than when applied to the surface data, and this difference is larger if the century timescale noise model is specified in the hypotheses rather than the decade timescale noise model.
Therefore, this example test of the MSU LT global record of lower-tropospheric temperature gives a less conclusive test in favor of detection than if applied to the surface temperature record over the same time period, and a much less conclusive test compared to that of the surface temperature record over its full time extent (1856-1998). Of course, simultaneous testing of both tropospheric and surface temperature records would lead to a more accurate test, since the combined temperature records contain more information. To go beyond the illustrative test, however, requires that the modeled hypotheses be realistic representations of the real alternatives. There is some basis from climate models to use similar signal and noise models in hypotheses for the global lower-troposphere and surface temperature records: anthropogenic warming is modeled to be at least as great in the lower troposphere as at the surface (Hansen et al. 1998), and global temperature variability from model control runs is comparable for both the lower troposphere and the surface (Santer et al. 2000). Furthermore, errors are not thought to be the primary cause of temperature anomalies in either record (NAS 2000). However, the difference between the global surface and lower-troposphere temperatures has been shown to be inconsistent with climate model results (Santer et al. 2000). Moreover, the sharp spike in global lower-troposphere temperature in 1998, attributed to ENSO (Spencer and Christy 1999), is extremely rare in either climate model realizations or autoregressive noise model realizations of the type used in the illustrative tests and summarized in appendix B. Given these inconsistencies, we conclude that there currently are not adequate models to define the hypotheses for the real alternatives for the detection of climate change from the global lower-troposphere temperature record.

To model realistic hypotheses for temperature, uncertainties must be accounted for in their definition. This is done using distributed parameters in the definition of illustrative hypotheses, including uncertainty in the amplitude and timescale of noise (section 5b) and uncertainty in climate sensitivity and forcing (section 5c).

b. Noise model uncertainty

Both hypothesis test statistics and test results depend on the noise defined in the alternative hypotheses. In this section test statistics and results are given for the illustrative null (5.2) and greenhouse (5.1) hypotheses, which have a common definition of noise statistics, that is, N_0 = N_1. For the base case choice of signal parameters (see appendix A) and the full (1856-1998) record of global mean temperature (GMT), the sensitivity of test statistics and results to precisely defined noise parameters is given in Table 2.

1) TIMESCALE

In the noise model of appendix B, the correlation function for natural variability decays exponentially to zero for large time lags. The timescale for this model is the exponential decay rate, called the correlation time, and is denoted by τ. A model with τ = 20 yr leads to a less accurate test (lower value of Δ) than one with τ = 10 or 30 yr. Therefore, noise of this timescale masks the climate signal more than noise of shorter or longer timescales. There is, however, a trade-off between timescale and amplitude. This trade-off occurs because, in general, good statistical estimation of the parameters of a time series requires observation of the data record for a length of time much longer than the intrinsic timescale, that is, the correlation length, of the underlying random process. Thus it has been shown that the observed climate record does bound the estimated amplitude of climate variability with a timescale shorter than the record, but is an ineffective bound on longer timescale variability (cf. Wigley and Raper 1990; Kheshgi and White 1993a). The possibility of noise with a longer timescale and a larger amplitude (compared with the amplitude of short timescale noise) might not be excluded from realistic hypotheses of climate change.

In Fig. 5, hypotheses with noise defined by two different sets of noise parameters are considered. Century timescale noise is defined to have a larger amplitude for its century timescale component than the decade timescale noise has. The precise parameter case using this decade timescale noise leads to a more accurate test than that with century timescale noise. When the test is applied to the data, the likelihood ratio is lower with the century timescale noise. In the distributed parameter case shown in Fig. 5, the hypotheses are defined to have an equal probability that either noise model is true; that is, there is a 0.5 probability that the noise model parameters equal those of the decade or the century timescale model: P_i(θ_noise,i = θ_decade) = 0.5 and P_i(θ_noise,i = θ_century) = 0.5 for hypotheses i = 0, 1. In all distributed parameter cases shown in this study, results are approximated by generating 1000 Monte Carlo realizations of D, first under the hypothesis H_0 to find P_FP, and then under the hypothesis H_1 to find P_FN. Since there are a finite number (two in this case) of possible parameter values defined, the integral in Eq. (2.2) reduces to a finite sum.
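For a two-point parameter distribution such as this, the marginalization of Eq. (2.2) is just an equally weighted sum of the two conditional densities. In the sketch below, density is a hypothetical stand-in for the conditional Gaussian form (3.2).

    # Sketch: Eq. (2.2) for a two-point parameter distribution, where the
    # noise parameters equal the decade or century timescale set with
    # probability 0.5 each under either hypothesis.
    def mixture_density(D, density, params_decade, params_century):
        return 0.5 * density(D, params_decade) + 0.5 * density(D, params_century)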
Note that the resulting ROC curve happens to lie between those of the two precise parameter cases, but does not share the shape of the single-parameter family of ROC curves illustrated in Fig. 3 and defined by (4.14)-(4.15). The test results in the distributed parameter case are similar to those with the century timescale noise model.

2) AMPLITUDE

The accuracy of the test decreases as the amplitude of the long timescale component of the noise increases (see Table 2). When these tests are applied to the data, the probability of both types of test errors increases with this amplitude. The corresponding ROC curves for cases with differing amplitudes (including 0.33° and 0.66°C) of the century timescale noise are given in Fig. 6a. In the distributed parameter case shown in Fig. 6a, both hypotheses assume an equal probability that any of the three century timescale noise amplitudes is the true one. Note that this ROC curve lies close to that of the middle (base case) precise parameter case; however, the test results are somewhat different from those of the middle case. To show better why the ROC curve of the distributed parameter case in Fig. 6a is so similar to that of the middle case, we consider this distributed parameter case in more detail. First, examination of the calculations for each individual realization used to calculate L(D) in (4.18) shows that the conditional probability P_i(D | θ_i) in (2.2) is virtually equal to the component of the sum in (2.2) with parameters θ_i equal to those used to generate D for that realization; that is, calculation of this probability can easily detect the difference between these three possible noise models. So the calculated probabilities P_FP and P_FN are virtually equal to the average of those of the precise parameter cases (see Fig. 7). This average happens to be close to that of the middle case over much of the ROC curve.

In the precise parameter cases, the accuracy of the test is relatively insensitive to the amplitude σ_0 of the white noise component (see Table 2). When these tests are applied to the data, the probabilities of both types of test errors are not as sensitive to this amplitude as the test is to, for example, the long timescale amplitude. This comparison, however, hides some important effects. First, if short timescale noise statistics differ between hypotheses, then the test will easily be able to detect the difference between hypotheses.


More information

Detection and attribution, forced changes, natural variability, signal and noise, ensembles

Detection and attribution, forced changes, natural variability, signal and noise, ensembles ETH Zurich Reto Knutti Detection and attribution, forced changes, natural variability, signal and noise, ensembles Reto Knutti, IAC ETH What s wrong with this presentation? For the next two decades, a

More information

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis Lecture 3 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

More information

Statistical Data Analysis Stat 3: p-values, parameter estimation

Statistical Data Analysis Stat 3: p-values, parameter estimation Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,

More information

Evaluation. Andrea Passerini Machine Learning. Evaluation

Evaluation. Andrea Passerini Machine Learning. Evaluation Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain

More information

Obtaining Critical Values for Test of Markov Regime Switching

Obtaining Critical Values for Test of Markov Regime Switching University of California, Santa Barbara From the SelectedWorks of Douglas G. Steigerwald November 1, 01 Obtaining Critical Values for Test of Markov Regime Switching Douglas G Steigerwald, University of

More information

Primer on statistics:

Primer on statistics: Primer on statistics: MLE, Confidence Intervals, and Hypothesis Testing ryan.reece@gmail.com http://rreece.github.io/ Insight Data Science - AI Fellows Workshop Feb 16, 018 Outline 1. Maximum likelihood

More information

ROBUST MEASUREMENT OF THE DURATION OF

ROBUST MEASUREMENT OF THE DURATION OF ROBUST MEASUREMENT OF THE DURATION OF THE GLOBAL WARMING HIATUS Ross McKitrick Department of Economics University of Guelph Revised version, July 3, 2014 Abstract: The IPCC has drawn attention to an apparent

More information

EECS564 Estimation, Filtering, and Detection Exam 2 Week of April 20, 2015

EECS564 Estimation, Filtering, and Detection Exam 2 Week of April 20, 2015 EECS564 Estimation, Filtering, and Detection Exam Week of April 0, 015 This is an open book takehome exam. You have 48 hours to complete the exam. All work on the exam should be your own. problems have

More information

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn Parameter estimation and forecasting Cristiano Porciani AIfA, Uni-Bonn Questions? C. Porciani Estimation & forecasting 2 Temperature fluctuations Variance at multipole l (angle ~180o/l) C. Porciani Estimation

More information

Fundamental Probability and Statistics

Fundamental Probability and Statistics Fundamental Probability and Statistics "There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are

More information

LECTURE NOTE #3 PROF. ALAN YUILLE

LECTURE NOTE #3 PROF. ALAN YUILLE LECTURE NOTE #3 PROF. ALAN YUILLE 1. Three Topics (1) Precision and Recall Curves. Receiver Operating Characteristic Curves (ROC). What to do if we do not fix the loss function? (2) The Curse of Dimensionality.

More information

CH.9 Tests of Hypotheses for a Single Sample

CH.9 Tests of Hypotheses for a Single Sample CH.9 Tests of Hypotheses for a Single Sample Hypotheses testing Tests on the mean of a normal distributionvariance known Tests on the mean of a normal distributionvariance unknown Tests on the variance

More information

Lecture 3. STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher

Lecture 3. STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher Lecture 3 STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher Previous lectures What is machine learning? Objectives of machine learning Supervised and

More information

Constructing Ensembles of Pseudo-Experiments

Constructing Ensembles of Pseudo-Experiments Constructing Ensembles of Pseudo-Experiments Luc Demortier The Rockefeller University, New York, NY 10021, USA The frequentist interpretation of measurement results requires the specification of an ensemble

More information

Chapter 2 Signal Processing at Receivers: Detection Theory

Chapter 2 Signal Processing at Receivers: Detection Theory Chapter Signal Processing at Receivers: Detection Theory As an application of the statistical hypothesis testing, signal detection plays a key role in signal processing at receivers of wireless communication

More information

INTRODUCTION TO INTERSECTION-UNION TESTS

INTRODUCTION TO INTERSECTION-UNION TESTS INTRODUCTION TO INTERSECTION-UNION TESTS Jimmy A. Doi, Cal Poly State University San Luis Obispo Department of Statistics (jdoi@calpoly.edu Key Words: Intersection-Union Tests; Multiple Comparisons; Acceptance

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

DETECTION theory deals primarily with techniques for

DETECTION theory deals primarily with techniques for ADVANCED SIGNAL PROCESSING SE Optimum Detection of Deterministic and Random Signals Stefan Tertinek Graz University of Technology turtle@sbox.tugraz.at Abstract This paper introduces various methods for

More information

Predicting uncertainty in forecasts of weather and climate (Also published as ECMWF Technical Memorandum No. 294)

Predicting uncertainty in forecasts of weather and climate (Also published as ECMWF Technical Memorandum No. 294) Predicting uncertainty in forecasts of weather and climate (Also published as ECMWF Technical Memorandum No. 294) By T.N. Palmer Research Department November 999 Abstract The predictability of weather

More information

A HIERARCHICAL MODEL FOR REGRESSION-BASED CLIMATE CHANGE DETECTION AND ATTRIBUTION

A HIERARCHICAL MODEL FOR REGRESSION-BASED CLIMATE CHANGE DETECTION AND ATTRIBUTION A HIERARCHICAL MODEL FOR REGRESSION-BASED CLIMATE CHANGE DETECTION AND ATTRIBUTION Richard L Smith University of North Carolina and SAMSI Joint Statistical Meetings, Montreal, August 7, 2013 www.unc.edu/~rls

More information

Detection and Estimation Theory

Detection and Estimation Theory ESE 524 Detection and Estimation Theory Joseph A. O Sullivan Samuel C. Sachs Professor Electronic Systems and Signals Research Laboratory Electrical and Systems Engineering Washington University 2 Urbauer

More information

Bayesian Climate Change Assessment

Bayesian Climate Change Assessment 1NOVEMBER 000 BERLINER ET L. 3805 Bayesian Climate Change ssessment L. MRK BERLINER The Ohio State University, Columbus, Ohio RICHRD. LEVINE University of California, Davis, Davis, California DENNIS J.

More information

Chapter Three. Hypothesis Testing

Chapter Three. Hypothesis Testing 3.1 Introduction The final phase of analyzing data is to make a decision concerning a set of choices or options. Should I invest in stocks or bonds? Should a new product be marketed? Are my products being

More information

Bayesian Decision Theory

Bayesian Decision Theory Bayesian Decision Theory Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2017 CS 551, Fall 2017 c 2017, Selim Aksoy (Bilkent University) 1 / 46 Bayesian

More information

The Analysis of Power for Some Chosen VaR Backtesting Procedures - Simulation Approach

The Analysis of Power for Some Chosen VaR Backtesting Procedures - Simulation Approach The Analysis of Power for Some Chosen VaR Backtesting Procedures - Simulation Approach Krzysztof Piontek Department of Financial Investments and Risk Management Wroclaw University of Economics ul. Komandorska

More information

Observed Global Warming and Climate Change

Observed Global Warming and Climate Change Observed Global Warming and Climate Change First Dice Activity natural state human activity measured global warming of Earth s surface primarily caused by anthropogenic increase in greenhouse gases Ozone,

More information

Statistics for the LHC Lecture 2: Discovery

Statistics for the LHC Lecture 2: Discovery Statistics for the LHC Lecture 2: Discovery Academic Training Lectures CERN, 14 17 June, 2010 indico.cern.ch/conferencedisplay.py?confid=77830 Glen Cowan Physics Department Royal Holloway, University of

More information

Probability and Statistics. Joyeeta Dutta-Moscato June 29, 2015

Probability and Statistics. Joyeeta Dutta-Moscato June 29, 2015 Probability and Statistics Joyeeta Dutta-Moscato June 29, 2015 Terms and concepts Sample vs population Central tendency: Mean, median, mode Variance, standard deviation Normal distribution Cumulative distribution

More information

INTRODUCTION TO PATTERN RECOGNITION

INTRODUCTION TO PATTERN RECOGNITION INTRODUCTION TO PATTERN RECOGNITION INSTRUCTOR: WEI DING 1 Pattern Recognition Automatic discovery of regularities in data through the use of computer algorithms With the use of these regularities to take

More information

Advanced statistical methods for data analysis Lecture 1

Advanced statistical methods for data analysis Lecture 1 Advanced statistical methods for data analysis Lecture 1 RHUL Physics www.pp.rhul.ac.uk/~cowan Universität Mainz Klausurtagung des GK Eichtheorien exp. Tests... Bullay/Mosel 15 17 September, 2008 1 Outline

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

Hypothesis testing. Chapter Formulating a hypothesis. 7.2 Testing if the hypothesis agrees with data

Hypothesis testing. Chapter Formulating a hypothesis. 7.2 Testing if the hypothesis agrees with data Chapter 7 Hypothesis testing 7.1 Formulating a hypothesis Up until now we have discussed how to define a measurement in terms of a central value, uncertainties, and units, as well as how to extend these

More information

Bayesian System Identification based on Hierarchical Sparse Bayesian Learning and Gibbs Sampling with Application to Structural Damage Assessment

Bayesian System Identification based on Hierarchical Sparse Bayesian Learning and Gibbs Sampling with Application to Structural Damage Assessment Bayesian System Identification based on Hierarchical Sparse Bayesian Learning and Gibbs Sampling with Application to Structural Damage Assessment Yong Huang a,b, James L. Beck b,* and Hui Li a a Key Lab

More information

Partitioning the Parameter Space. Topic 18 Composite Hypotheses

Partitioning the Parameter Space. Topic 18 Composite Hypotheses Topic 18 Composite Hypotheses Partitioning the Parameter Space 1 / 10 Outline Partitioning the Parameter Space 2 / 10 Partitioning the Parameter Space Simple hypotheses limit us to a decision between one

More information

Bayesian vs frequentist techniques for the analysis of binary outcome data

Bayesian vs frequentist techniques for the analysis of binary outcome data 1 Bayesian vs frequentist techniques for the analysis of binary outcome data By M. Stapleton Abstract We compare Bayesian and frequentist techniques for analysing binary outcome data. Such data are commonly

More information

exp{ (x i) 2 i=1 n i=1 (x i a) 2 (x i ) 2 = exp{ i=1 n i=1 n 2ax i a 2 i=1

exp{ (x i) 2 i=1 n i=1 (x i a) 2 (x i ) 2 = exp{ i=1 n i=1 n 2ax i a 2 i=1 4 Hypothesis testing 4. Simple hypotheses A computer tries to distinguish between two sources of signals. Both sources emit independent signals with normally distributed intensity, the signals of the first

More information

PART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics

PART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics Table of Preface page xi PART I INTRODUCTION 1 1 The meaning of probability 3 1.1 Classical definition of probability 3 1.2 Statistical definition of probability 9 1.3 Bayesian understanding of probability

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

Probability and Statistics. Terms and concepts

Probability and Statistics. Terms and concepts Probability and Statistics Joyeeta Dutta Moscato June 30, 2014 Terms and concepts Sample vs population Central tendency: Mean, median, mode Variance, standard deviation Normal distribution Cumulative distribution

More information

Lectures 5 & 6: Hypothesis Testing

Lectures 5 & 6: Hypothesis Testing Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across

More information

Miscellany : Long Run Behavior of Bayesian Methods; Bayesian Experimental Design (Lecture 4)

Miscellany : Long Run Behavior of Bayesian Methods; Bayesian Experimental Design (Lecture 4) Miscellany : Long Run Behavior of Bayesian Methods; Bayesian Experimental Design (Lecture 4) Tom Loredo Dept. of Astronomy, Cornell University http://www.astro.cornell.edu/staff/loredo/bayes/ Bayesian

More information

Uncertainty and Rules

Uncertainty and Rules Uncertainty and Rules We have already seen that expert systems can operate within the realm of uncertainty. There are several sources of uncertainty in rules: Uncertainty related to individual rules Uncertainty

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

Probabilities & Statistics Revision

Probabilities & Statistics Revision Probabilities & Statistics Revision Christopher Ting Christopher Ting http://www.mysmu.edu/faculty/christophert/ : christopherting@smu.edu.sg : 6828 0364 : LKCSB 5036 January 6, 2017 Christopher Ting QF

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

ME3620. Theory of Engineering Experimentation. Spring Chapter IV. Decision Making for a Single Sample. Chapter IV

ME3620. Theory of Engineering Experimentation. Spring Chapter IV. Decision Making for a Single Sample. Chapter IV Theory of Engineering Experimentation Chapter IV. Decision Making for a Single Sample Chapter IV 1 4 1 Statistical Inference The field of statistical inference consists of those methods used to make decisions

More information

(1) Why do we need statistics?

(1) Why do we need statistics? (1) Why do we need statistics? Statistical methods are required to ensure that data are interpreted correctly and that apparent relationships are meaningful (or significant ) and not simply chance occurrences.

More information

Statistical Methods in Particle Physics

Statistical Methods in Particle Physics Statistical Methods in Particle Physics Lecture 11 January 7, 2013 Silvia Masciocchi, GSI Darmstadt s.masciocchi@gsi.de Winter Semester 2012 / 13 Outline How to communicate the statistical uncertainty

More information

Climate sensitivity of Earth to solar irradiance: update

Climate sensitivity of Earth to solar irradiance: update Paper presented at 2004 Solar Radiation and Climate (SORCE) meeting on Decade Variability in the Sun and the Climate, Meredith, New Hampshire, October 27-29, 2004 Climate sensitivity of Earth to solar

More information

Systematic uncertainties in statistical data analysis for particle physics. DESY Seminar Hamburg, 31 March, 2009

Systematic uncertainties in statistical data analysis for particle physics. DESY Seminar Hamburg, 31 March, 2009 Systematic uncertainties in statistical data analysis for particle physics DESY Seminar Hamburg, 31 March, 2009 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan

More information

Effects of Black Carbon on Temperature Lapse Rates

Effects of Black Carbon on Temperature Lapse Rates Effects of Black Carbon on Temperature Lapse Rates Joyce E. Penner 1 Minghuai Wang 1, Akshay Kumar 1, Leon Rotstayn 2, Ben Santer 1 University of Michigan, 2 CSIRO, 3 LLNL Thanks to Warren Washington and

More information

Probability and Statistics

Probability and Statistics Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT

More information

* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course.

* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course. Name of the course Statistical methods and data analysis Audience The course is intended for students of the first or second year of the Graduate School in Materials Engineering. The aim of the course

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA Contents in latter part Linear Dynamical Systems What is different from HMM? Kalman filter Its strength and limitation Particle Filter

More information

http://www.math.uah.edu/stat/hypothesis/.xhtml 1 of 5 7/29/2009 3:14 PM Virtual Laboratories > 9. Hy pothesis Testing > 1 2 3 4 5 6 7 1. The Basic Statistical Model As usual, our starting point is a random

More information

Statistics for the LHC Lecture 1: Introduction

Statistics for the LHC Lecture 1: Introduction Statistics for the LHC Lecture 1: Introduction Academic Training Lectures CERN, 14 17 June, 2010 indico.cern.ch/conferencedisplay.py?confid=77830 Glen Cowan Physics Department Royal Holloway, University

More information

Learning About Climate Sensitivity. From the Instrumental Temperature Record +

Learning About Climate Sensitivity. From the Instrumental Temperature Record + Learning About Climate Sensitivity From the Instrumental Temperature Record + David L. Kelly, * Charles D. Kolstad, ** Michael E. Schlesinger *** and Natalia G. Andronova *** The debate over the magnitude

More information

Feature selection and classifier performance in computer-aided diagnosis: The effect of finite sample size

Feature selection and classifier performance in computer-aided diagnosis: The effect of finite sample size Feature selection and classifier performance in computer-aided diagnosis: The effect of finite sample size Berkman Sahiner, a) Heang-Ping Chan, Nicholas Petrick, Robert F. Wagner, b) and Lubomir Hadjiiski

More information

Statistics for Particle Physics. Kyle Cranmer. New York University. Kyle Cranmer (NYU) CERN Academic Training, Feb 2-5, 2009

Statistics for Particle Physics. Kyle Cranmer. New York University. Kyle Cranmer (NYU) CERN Academic Training, Feb 2-5, 2009 Statistics for Particle Physics Kyle Cranmer New York University 1 Hypothesis Testing 55 Hypothesis testing One of the most common uses of statistics in particle physics is Hypothesis Testing! assume one

More information

g(.) 1/ N 1/ N Decision Decision Device u u u u CP

g(.) 1/ N 1/ N Decision Decision Device u u u u CP Distributed Weak Signal Detection and Asymptotic Relative Eciency in Dependent Noise Hakan Delic Signal and Image Processing Laboratory (BUSI) Department of Electrical and Electronics Engineering Bogazici

More information

A Statistical Input Pruning Method for Artificial Neural Networks Used in Environmental Modelling

A Statistical Input Pruning Method for Artificial Neural Networks Used in Environmental Modelling A Statistical Input Pruning Method for Artificial Neural Networks Used in Environmental Modelling G. B. Kingston, H. R. Maier and M. F. Lambert Centre for Applied Modelling in Water Engineering, School

More information

Topic 3: Hypothesis Testing

Topic 3: Hypothesis Testing CS 8850: Advanced Machine Learning Fall 07 Topic 3: Hypothesis Testing Instructor: Daniel L. Pimentel-Alarcón c Copyright 07 3. Introduction One of the simplest inference problems is that of deciding between

More information

ECE531 Lecture 6: Detection of Discrete-Time Signals with Random Parameters

ECE531 Lecture 6: Detection of Discrete-Time Signals with Random Parameters ECE531 Lecture 6: Detection of Discrete-Time Signals with Random Parameters D. Richard Brown III Worcester Polytechnic Institute 26-February-2009 Worcester Polytechnic Institute D. Richard Brown III 26-February-2009

More information

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006 Hypothesis Testing Part I James J. Heckman University of Chicago Econ 312 This draft, April 20, 2006 1 1 A Brief Review of Hypothesis Testing and Its Uses values and pure significance tests (R.A. Fisher)

More information

Physics 509: Bootstrap and Robust Parameter Estimation

Physics 509: Bootstrap and Robust Parameter Estimation Physics 509: Bootstrap and Robust Parameter Estimation Scott Oser Lecture #20 Physics 509 1 Nonparametric parameter estimation Question: what error estimate should you assign to the slope and intercept

More information

Statistical Methods for Astronomy

Statistical Methods for Astronomy Statistical Methods for Astronomy If your experiment needs statistics, you ought to have done a better experiment. -Ernest Rutherford Lecture 1 Lecture 2 Why do we need statistics? Definitions Statistical

More information

Evaluating the value of structural heath monitoring with longitudinal performance indicators and hazard functions using Bayesian dynamic predictions

Evaluating the value of structural heath monitoring with longitudinal performance indicators and hazard functions using Bayesian dynamic predictions Evaluating the value of structural heath monitoring with longitudinal performance indicators and hazard functions using Bayesian dynamic predictions C. Xing, R. Caspeele, L. Taerwe Ghent University, Department

More information

Data Privacy in Biomedicine. Lecture 11b: Performance Measures for System Evaluation

Data Privacy in Biomedicine. Lecture 11b: Performance Measures for System Evaluation Data Privacy in Biomedicine Lecture 11b: Performance Measures for System Evaluation Bradley Malin, PhD (b.malin@vanderbilt.edu) Professor of Biomedical Informatics, Biostatistics, & Computer Science Vanderbilt

More information

Detection and Attribution of Climate Change

Detection and Attribution of Climate Change Detection and Attribution of Climate Change What is D&A? Global Mean Temperature Extreme Event Attribution Geert Jan van Oldenborgh, Sjoukje Philip (KNMI) Definitions Detection: demonstrating that climate

More information

Detection Theory. Chapter 3. Statistical Decision Theory I. Isael Diaz Oct 26th 2010

Detection Theory. Chapter 3. Statistical Decision Theory I. Isael Diaz Oct 26th 2010 Detection Theory Chapter 3. Statistical Decision Theory I. Isael Diaz Oct 26th 2010 Outline Neyman-Pearson Theorem Detector Performance Irrelevant Data Minimum Probability of Error Bayes Risk Multiple

More information

Basic Probabilistic Reasoning SEG

Basic Probabilistic Reasoning SEG Basic Probabilistic Reasoning SEG 7450 1 Introduction Reasoning under uncertainty using probability theory Dealing with uncertainty is one of the main advantages of an expert system over a simple decision

More information