ON USING BOOTSTRAP APPROACH FOR UNCERTAINTY ESTIMATION
|
|
- Tyler Richard
- 5 years ago
- Views:
Transcription
1 The First International Proficiency Testing Conference Sinaia, România 11 th 13 th October, 2007 ON USING BOOTSTRAP APPROACH FOR UNCERTAINTY ESTIMATION Grigore Albeanu University of Oradea, UNESCO Chair in Information Technologies, Universitatii Str., No. 1, , Oradea, Romania Abstract Computer-Intensive methods for estimation assessment provide valuable information concerning the adequacy of applied probabilistic models. The bootstrap method is an extensive computational approach to uncertainty estimation based on resampling and statistical estimation. It is a powerful tool, especially when only a small data set is used to predict the behaviour of systems or processes. This paper provides a methodology to investigate uncertainty evaluation by bootstrap and a procedure to obtain confidence bands for linear and nonlinear models used in data analysis and design measurements. Also statistical considerations for proficiency testing are given. Numerical examples and graphical visualizations will be shown for a case study. Key words Bootstrap method, Resampling methods, Data modelling, Sampling variance, Median, Z-scores, Confidence Intervals, Confidence bands, Proficiency Testing 1 INTRODUCTION Many modelling applications in social and engineering sciences give rise to an estimation process (parameters or functional relations). The numerical estimation process always depends on the previously data collection. Even data filtering and outlier s elimination procedures were applied, the ill-conditioned effect could appear. To increase the confidence on the estimations or to improve the estimates, some computer intensive methods were recommended. This paper describes the usage of the bootstrap approach for assessing the estimation process and to provide simultaneous confidence bands for the studied model. 221
2 2 THEORETICAL BACKGROUND Bootstrap is a simple but powerful Monte Carlo method to assess statistical accuracy or to estimate a distribution from sample's statistics. The name may come from phrase Pull yourself up by your bootstraps having the following meaning Improve your situation by your own efforts, as Phrase Finder [13] is able to find. The methods are suitable for any level of modelling being useful for fully parametric, semiparametric, and completely nonparametric analysis. These approaches are not only in use by statisticians, but also are applied anywhere statistics can be used: life sciences, social sciences, econometrics, reliability etc. For the aim of this paper we outline the basic bootstrap principle, the application of bootstrap sampling for accuracy estimation and the method of simultaneous confidence bands for uncertainty management. 2.1 The bootstrap method principle Let X be a random variable and F the cumulative distribution function of the variable X. The bootstrap method was proposed by Efron [5, 6, 7] and it can be used on the estimation of: the distribution function of a random variable R(X, F) ; a functional relation V(F), or the accuracy of a statistics s obtained from a sample (X 1, X 2,..., X n ) of size n from X. In the framework of this paper, the accuracy, describes the variability of s when independent estimations s(1), s(2),..., of the statistics s, are obtained by resampling. The bootstrap technique uses the sample (X 1, X 2,..., X n ) to obtain the sampling cumulative distribution function F n (x) in order to replace the true cumulative distribution function F: F n (x) = (1/n) cardinal { x i x; 1 i n }. To repeatedly simulate bootstrap samples X*:=(X 1 *, X 2 *,..., X n *) from F n, random number generators should be used according to the Monte Carlo approaches. Then, for each bootstrap sample, it is recalculated: the distribution function of the random variable R(X*, F n ) ; the functional relation V(F n ) or V(F n *); the statistics s*( ). The accuracy of the statistics s can be derived under an appropriate statistical inference study on the sequence s*(). The bootstrap resampling can be realised in various ways. Uniform resampling and the importance resampling are the mostly used. We will present, in detail, these resampling algorithms. Uniform resampling assumes that the measurement (observed) values are uniformly sampled from some process. The algorithm for uniform resampling consists in the following steps: 1. Initialise a uniform random number generator over (0,1), say random() ; 2. For k from 1 to n do {u := random(); j := [n u] + 1; X k * := X j }. 222
3 The importance resampling algorithm assumes the generation of sampling values according to a probability distribution {(x i, p i ): i = 1, 2,..., n} such as every p i is a nonnegative real number, and p 1 + p p n = 1. Using a method to generate random samples according to a given discrete distribution function (for more about random variable generation see [14]), the bootstrap sample is obtained in the following steps [1, 2]: 1. Initialise a uniform random number generator over (0,1); 2. For k from 1 to n do steps Let j := 0, and u := random(); 2.2. j := j + 1; 2.3. If u > p 1 + p p j then goto step 2.2; 2.4. X k * := X j. As a common example of the usage of the uniform resampling, we refer to the bootstrap algorithm for estimating standard errors. However, when some observations are more important then others, the importance resampling can provide close to real conclusions. If resampling is based on importance resampling weights, then the bootstrap estimates are reweighted as if uniform resampling is done. 2.2 Uncertainty estimation and bootstrap Terminology For a set (sample) of uncorrelated N random variables, x i, i = 1, 2,, N, let us denote by Mean[X] the sample mean given by: N 1 Mean[X] =. x i N i= 1 The median is the middle value of the group for a particular sample, i.e. half of the results for the sample are higher than it and half are lower (Q2). It is calculated from the sorted values (from lowest to highest). If N is even, the median is the average of the two central values. Let us denote this value by Median[X]. Interquartile range (IQR[X]) is the difference between the lower and upper quartiles = Q3 - Q1. The lower quartile is the value below a quarter of the results lie. Similarly, the upper quartile is the value above a quarter of the results lie. Normalised IQR (NIQR[X]) is a measure of the variability of the results which basically is a robust standard deviation. It is equal to the IQR[X] multiplied by the factor Robust CV (coefficient of variation) is equal to the NIQR[X] divided by the median, expressed as a percentage (i.e. multiplied by 100), it allows for the variability in different tests to be compared. An accepted statistical method for analysis test results in proficiency testing is to calculate a Z-score for each laboratory s result. The standard form for the calculation of Z-scores is 223
4 X i A[ X ] Z i =, B[ X ] where A[X] is the assigned value (sample mean), and B[X] is an estimate of the spread of all results (standard deviation). The classical approach based on mean and standard deviation is signifiquantly influenced by the presence of extreme values (outliers). Therefore, a robust approach based on median and interquantile range is better to be used [11, 16]. Robust Z-scores are calculated by replacing A[X] and B[X] in the classical Z-score by the median and NIQR, respectively. For proficiency testing both betweenlaboratory and within laboratory Z-scores can be used. The standardised sum (S) and the standardised difference (D) for the pair of results are: S = (A+B)/ 2 and D = (A-B)/ 2 (median of A is less than median of B) or D = (B-A)/ 2, otherwise. The between-laboratory Z-score (ZB) is the robust Z-score of S and the withinlaboratory Z-score (ZW) is the robust Z-score of D: and ZB = (S-Median[S])/NIQR(S), ZW = (D - Median(D))/NIQR(D). A methodology based on bootstrap approach can be used to study the robust Z- scores ZB and ZW Simultaneous bootstraping the MEAN and STDEV A Z-score can be computed repeatedly by simultaneous resampling in order to obtain a resample mean and standard deviance. The process is repeated by a number of steps and the final results can be obtained considering the best performance. An 80% approach can be used when a strong acceptance is required. Generally speaking, the best test could be based on a 51% approach Simultaneous bootstraping the sample MEDIAN and NIQR A robust Z-score can be computed repeatedly by simultaneous resampling in order to obtain a resample median and normalised interquantile range. The process is repeated by a number of steps and the final results can be obtained considering the best performance. Also, an 80% approach can be used when a strong acceptance is required. Generally speaking, the best test could be based on a 51% approach. 2.3 Simultaneous confidence bands The general framework for obtaining simultaneous confidence bands for non-linear explicit regression models using resampling techniques is presented in [1, 2, 3]. Let g+, g- be two known nonnegative models (real functions which may depend on the input data), and u+, u- be nonnegative scale factors. The confidence band for the model h (used to explain a relation such as y i = h(x i, θ) + ε i (i = 1, 2, ) is the region: R = {(x, y): x Input domain, h(x, est(θ)) - est(v)u - g - (x) y h(x, est(θ)) + est(v)u + g + (x) }, 224
5 where (x i, y i ) i 1 are observations, θ is a parameter vector, est(θ) is a least squares estimate of θ, (ε i ) i 1 are random variables distributed independently and identically with an unknown cumulative distribution function (cdf) F, but with mean zero and unknown variance σ 2, and est(v) is an estimate of σ. The reference [11] describes the application of this technique for both a parametric log-linear and a kernel estimation model in the normalised input domain (0,1). However, any input domain and any estimation model could be used to obtain an estimation h(x,.). Obtaining the region R asks for solving the following problem: given the models g-, g+, and a (0, 1), search for the non-negative real numbers u- and u+ such that P{h(x, est(θ)) - est(v)u - g - (x) h(x, θ) h(x, est(θ)) + est(v)u + g + (x); x in Input domain} = 1 - a. Various alternatives for choosing g - and g + there exist [1]. A common approach considers the models g - (x) = g + (x) = 1 for all x Input domain. Two approaches related to the scale factors u - and u + are important enough: u + = u - (symmetrically simultaneous confidence bands); or the sum u + + u - must have a minimal value. 2.4 Bootstrap for nonlinear regression and neural network modelling Regression analysis is one of the statistical tools that is frequently used in data analysis [1, 9]. Also designing neural networks for data analysis is a common activity among researchers. Not only nonlinear regression, also neural networks were successfully applied for the prediction of the most important cement quality characteristics, as Guslikov et al. [10] reports. Other successful stories are reported for case-control studies missing data [12] or for environmental risk assessment [15]. The combination of nonlinear regression / neural networks and bootstrap can be used to extract information from cement database. Applying the bootstrap approach a sequence of parameter s estimations and a classical and bootstrap confidence uncertainty analysis can be realised. Mean analysis Bootstrap Mean Figure 1 - Mean analysis by uniform resampling 225
6 Median analysis Bootstrap median Figure 2 - Median analysis by uniform resampling 3 EXPERIMENTAL RESULTS When considering the initial sample of a measurement task and apply a bootstrap resampling for a large moments of time, the bootstrap characteristics of the sample will oscillate between some values to be used in an the uncertainty analysis based on bootstrap intervals. The images shown by figures 1-5 illustrate the bootstrap characteristics of the above sample obtained in a uniform resampling approach. The following characteristics are illustrated: bootstrap mean, bootstrap median, bootstrap IQR, bootstrap N-IQR and bootstrap standard deviation. Figure 6 describes a bootstrap confidence obtained in a nonlinear regression model applied in software testing and reliability prediction according to data given in Table 1. It is easy to see that all 10 bootstrap functional belong to a band easily obtainable considering the domain range values (considering min - max values for all moments and functions). This will permit a bootstrap uncertainty analysis of model s parameters. 226
7 IQR analysis Bootstrap IQR Figure 3 - IQR analysis by uniform resampling N- IQR analysis Bootstrap N-IQR Figure 4 - N-IQR analysis by uniform resampling STDEV analysis Bootstrap STDEV Figure 5 - IQR analysis by uniform resampling 227
8 Software reliability is defined as the probability of failure-free software operation for a specified period of time in a specified environment. Many software reliability growth models are available and most of them assume that detected faults are immediately corrected [3]. Actually, this assumption may not be realistic in practice, and an uncertainty analysis has to be done based on different bootstrap variants as mentioned in [4, 8]. Functional analysis F(t), Fi(t), i=1,..., t F(t) F1(t) F2(t) F3(t) F4(t) F5(t) F6(t) F7(t) F8(t) F9(t) F10(t) Figure 6 - Software reliability growth modelling Table 1 - Collected data and Bootstrap Uniform Resampling Analysis t F(t) F 1 (t) F 2 (t) F 3 (t) F 4 (t) F 5 (t) F 6 (t) F 7 (t) F 8 (t) F 9 (t) F 10 (t)
9 4 CONCLUSIONS The paper provides a methodology to investigate uncertainty evaluation by bootstrap and describes a procedure to obtain confidence bands for linear and nonlinear models used in data analysis and design measurements. The approach can be used both for theoretical applications, and it may addresses practical applications for accuracy assessment, uncertainty analysis for all kind of models, including those based on neural networks. REFERENCES [1] Albeanu G.: The statistical and numerical analysis of some nonlinear regression models, PhD Disertation (in Romanian), Bucharest University, [2] Albeanu G.: Resampling Simultaneous Confidence Bands for Nonlinear Explicit Regression Models, Mathematical Reports, 50(5-6), , (1998). [3] Albeanu G. and Popentiu F.: On the Bootstrap Method: Software Reliability Assessment and Simultaneous Confidence Bands, Annals of Oradea University, Energetics Series, 7(1), , (2001). [4] Cowling A.; Hall P. and Phillips M.J.: Bootstrap Confidence Region for the Intensity of a Poisson Point Process, JASA, 91, , (1996). [5] Efron B.: Bootstrap Methods: Another Look at the Jacknife, Ann. Stat., 7, 1-27, (1979). [6] Efron B.; Tibshirani R.J.: An introduction to the bootstrap, Chapman and Hall, London, [7] Efron B., Computer-Intensive Methods in Statistical Regression, Siam Review, 30, 3, , (1988). [8] Frey H. C.; Burmaster D. E.: Methods for Characterizing Variability and Uncertainty: Comparison of Bootstrap Simulation and Likelihood-Based Approaches, Risk Analysis, 19, 1, , (1999). [9] Gonçalves S.; White H.: Bootstrap Standard Error Estimates for Linear Regression, JASA, 100, 471, , (2005). [10] Guslicov G.; Coarnă M.; Şerbănescu L.; Sorescu V.; Ispeciuc M.: Standard Strength of Cements Predicted via Computer Modeling, Key Engineering Materials, 264(-268), , (2004). [11] Jolliffe I.: Estimation of uncertainty in verification measures, International Verification Methods Workshop, The McGill Faculty Club, Montreal, Quebec, Canada, (2004). [12] Siersma V.; Johansen C.: The use of the bootstrap in the analysis of casecontrol studies missing data, Research Report - Institute of Public Health, University of Copenhagen, (2004). [13] The Phrase Finder: 2007, Meanings and Origins of Phrases, Sayings and Idioms, [14] Vaduva I.: Simulation models by computer, Technical Publishing House (in Romanian), Bucharest, (1977). [15] Verdonck F.; Jaworska J.; Thas O. and Vanrolleghem P.A.: Uncertainty Techniques in Environmental Risk Assessment, Med. Fac. Landbouw. Univ. Gent, 65/4, , (2000). [16] Web1: Testing Interlaboratory comparisons, Asia Pacific Laboratory Acreditation Cooperation, 229
Determining the Spread of a Distribution Variance & Standard Deviation
Determining the Spread of a Distribution Variance & Standard Deviation 1.3 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3 Lecture 3 1 / 32 Outline 1 Describing
More informationThe exact bootstrap method shown on the example of the mean and variance estimation
Comput Stat (2013) 28:1061 1077 DOI 10.1007/s00180-012-0350-0 ORIGINAL PAPER The exact bootstrap method shown on the example of the mean and variance estimation Joanna Kisielinska Received: 21 May 2011
More informationMidwest Big Data Summer School: Introduction to Statistics. Kris De Brabanter
Midwest Big Data Summer School: Introduction to Statistics Kris De Brabanter kbrabant@iastate.edu Iowa State University Department of Statistics Department of Computer Science June 20, 2016 1/27 Outline
More informationPrerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3
University of California, Irvine 2017-2018 1 Statistics (STATS) Courses STATS 5. Seminar in Data Science. 1 Unit. An introduction to the field of Data Science; intended for entering freshman and transfers.
More informationChapter 3. Measuring data
Chapter 3 Measuring data 1 Measuring data versus presenting data We present data to help us draw meaning from it But pictures of data are subjective They re also not susceptible to rigorous inference Measuring
More informationA COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky
A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky Empirical likelihood with right censored data were studied by Thomas and Grunkmier (1975), Li (1995),
More information11. Bootstrap Methods
11. Bootstrap Methods c A. Colin Cameron & Pravin K. Trivedi 2006 These transparencies were prepared in 20043. They can be used as an adjunct to Chapter 11 of our subsequent book Microeconometrics: Methods
More informationGraphical Presentation of a Nonparametric Regression with Bootstrapped Confidence Intervals
Graphical Presentation of a Nonparametric Regression with Bootstrapped Confidence Intervals Mark Nicolich & Gail Jorgensen Exxon Biomedical Science, Inc., East Millstone, NJ INTRODUCTION Parametric regression
More informationBootstrap Simulation Procedure Applied to the Selection of the Multiple Linear Regressions
JKAU: Sci., Vol. 21 No. 2, pp: 197-212 (2009 A.D. / 1430 A.H.); DOI: 10.4197 / Sci. 21-2.2 Bootstrap Simulation Procedure Applied to the Selection of the Multiple Linear Regressions Ali Hussein Al-Marshadi
More informationP8130: Biostatistical Methods I
P8130: Biostatistical Methods I Lecture 2: Descriptive Statistics Cody Chiuzan, PhD Department of Biostatistics Mailman School of Public Health (MSPH) Lecture 1: Recap Intro to Biostatistics Types of Data
More informationStatistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018
Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical
More informationSUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION. University of Minnesota
Submitted to the Annals of Statistics arxiv: math.pr/0000000 SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION By Wei Liu and Yuhong Yang University of Minnesota In
More informationMaster of Science in Statistics A Proposal
1 Master of Science in Statistics A Proposal Rationale of the Program In order to cope up with the emerging complexity on the solutions of realistic problems involving several phenomena of nature it is
More informationPractical Statistics for the Analytical Scientist Table of Contents
Practical Statistics for the Analytical Scientist Table of Contents Chapter 1 Introduction - Choosing the Correct Statistics 1.1 Introduction 1.2 Choosing the Right Statistical Procedures 1.2.1 Planning
More informationDetermining the Spread of a Distribution
Determining the Spread of a Distribution 1.3-1.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3-2311 Lecture 3-2311 1 / 58 Outline 1 Describing Quantitative
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Bootstrap for Regression Week 9, Lecture 1
MA 575 Linear Models: Cedric E. Ginestet, Boston University Bootstrap for Regression Week 9, Lecture 1 1 The General Bootstrap This is a computer-intensive resampling algorithm for estimating the empirical
More informationDetermining the Spread of a Distribution
Determining the Spread of a Distribution 1.3-1.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3-2311 Lecture 3-2311 1 / 58 Outline 1 Describing Quantitative
More informationBetter Bootstrap Confidence Intervals
by Bradley Efron University of Washington, Department of Statistics April 12, 2012 An example Suppose we wish to make inference on some parameter θ T (F ) (e.g. θ = E F X ), based on data We might suppose
More informationIndependent and conditionally independent counterfactual distributions
Independent and conditionally independent counterfactual distributions Marcin Wolski European Investment Bank M.Wolski@eib.org Society for Nonlinear Dynamics and Econometrics Tokyo March 19, 2018 Views
More informationContinuous random variables
Continuous random variables A continuous random variable X takes all values in an interval of numbers. The probability distribution of X is described by a density curve. The total area under a density
More information2011 Pearson Education, Inc
Statistics for Business and Economics Chapter 2 Methods for Describing Sets of Data Summary of Central Tendency Measures Measure Formula Description Mean x i / n Balance Point Median ( n +1) Middle Value
More informationDo Markov-Switching Models Capture Nonlinearities in the Data? Tests using Nonparametric Methods
Do Markov-Switching Models Capture Nonlinearities in the Data? Tests using Nonparametric Methods Robert V. Breunig Centre for Economic Policy Research, Research School of Social Sciences and School of
More informationPreliminary Statistics course. Lecture 1: Descriptive Statistics
Preliminary Statistics course Lecture 1: Descriptive Statistics Rory Macqueen (rm43@soas.ac.uk), September 2015 Organisational Sessions: 16-21 Sep. 10.00-13.00, V111 22-23 Sep. 15.00-18.00, V111 24 Sep.
More informationWeb-based Supplementary Material for. Dependence Calibration in Conditional Copulas: A Nonparametric Approach
1 Web-based Supplementary Material for Dependence Calibration in Conditional Copulas: A Nonparametric Approach Elif F. Acar, Radu V. Craiu, and Fang Yao Web Appendix A: Technical Details The score and
More informationZ-score Summary - Plasticity Proficiency Testing Program (79) Z-SCORES SUMMARY. Plasticity May 2018(79)
www.labsmartservices.com.au Z-SCORES SUMMARY Plasticity May 2018(79) The proficiency program was conducted in May 2018 with participants throughout Australia. AS 1289 test methods were preferred but other
More informationThe Nonparametric Bootstrap
The Nonparametric Bootstrap The nonparametric bootstrap may involve inferences about a parameter, but we use a nonparametric procedure in approximating the parametric distribution using the ECDF. We use
More informationUnit 2. Describing Data: Numerical
Unit 2 Describing Data: Numerical Describing Data Numerically Describing Data Numerically Central Tendency Arithmetic Mean Median Mode Variation Range Interquartile Range Variance Standard Deviation Coefficient
More informationStatistics Toolbox 6. Apply statistical algorithms and probability models
Statistics Toolbox 6 Apply statistical algorithms and probability models Statistics Toolbox provides engineers, scientists, researchers, financial analysts, and statisticians with a comprehensive set of
More informationSensitivity Analysis with Correlated Variables
Sensitivity Analysis with Correlated Variables st Workshop on Nonlinear Analysis of Shell Structures INTALES GmbH Engineering Solutions University of Innsbruck, Faculty of Civil Engineering University
More informationBootstrapping the Confidence Intervals of R 2 MAD for Samples from Contaminated Standard Logistic Distribution
Pertanika J. Sci. & Technol. 18 (1): 209 221 (2010) ISSN: 0128-7680 Universiti Putra Malaysia Press Bootstrapping the Confidence Intervals of R 2 MAD for Samples from Contaminated Standard Logistic Distribution
More informationBagging During Markov Chain Monte Carlo for Smoother Predictions
Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods
More informationMultilevel Statistical Models: 3 rd edition, 2003 Contents
Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction
More information1. Exploratory Data Analysis
1. Exploratory Data Analysis 1.1 Methods of Displaying Data A visual display aids understanding and can highlight features which may be worth exploring more formally. Displays should have impact and be
More informationMIT Spring 2015
MIT 18.443 Dr. Kempthorne Spring 2015 MIT 18.443 1 Outline 1 MIT 18.443 2 Batches of data: single or multiple x 1, x 2,..., x n y 1, y 2,..., y m w 1, w 2,..., w l etc. Graphical displays Summary statistics:
More informationStatistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation
Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence
More informationVerification of Continuous Forecasts
Verification of Continuous Forecasts Presented by Barbara Brown Including contributions by Tressa Fowler, Barbara Casati, Laurence Wilson, and others Exploratory methods Scatter plots Discrimination plots
More informationinterval forecasting
Interval Forecasting Based on Chapter 7 of the Time Series Forecasting by Chatfield Econometric Forecasting, January 2008 Outline 1 2 3 4 5 Terminology Interval Forecasts Density Forecast Fan Chart Most
More informationA better way to bootstrap pairs
A better way to bootstrap pairs Emmanuel Flachaire GREQAM - Université de la Méditerranée CORE - Université Catholique de Louvain April 999 Abstract In this paper we are interested in heteroskedastic regression
More informationCOLLABORATION OF STATISTICAL METHODS IN SELECTING THE CORRECT MULTIPLE LINEAR REGRESSIONS
American Journal of Biostatistics 4 (2): 29-33, 2014 ISSN: 1948-9889 2014 A.H. Al-Marshadi, This open access article is distributed under a Creative Commons Attribution (CC-BY) 3.0 license doi:10.3844/ajbssp.2014.29.33
More informationAdditional Problems Additional Problem 1 Like the http://www.stat.umn.edu/geyer/5102/examp/rlike.html#lmax example of maximum likelihood done by computer except instead of the gamma shape model, we will
More informationMSc / PhD Course Advanced Biostatistics. dr. P. Nazarov
MSc / PhD Course Advanced Biostatistics dr. P. Nazarov petr.nazarov@crp-sante.lu 2-12-2012 1. Descriptive Statistics edu.sablab.net/abs2013 1 Outline Lecture 0. Introduction to R - continuation Data import
More informationChapter 2 Class Notes Sample & Population Descriptions Classifying variables
Chapter 2 Class Notes Sample & Population Descriptions Classifying variables Random Variables (RVs) are discrete quantitative continuous nominal qualitative ordinal Notation and Definitions: a Sample is
More informationLecture 2 and Lecture 3
Lecture 2 and Lecture 3 1 Lecture 2 and Lecture 3 We can describe distributions using 3 characteristics: shape, center and spread. These characteristics have been discussed since the foundation of statistics.
More informationREFERENCES AND FURTHER STUDIES
REFERENCES AND FURTHER STUDIES by..0. on /0/. For personal use only.. Afifi, A. A., and Azen, S. P. (), Statistical Analysis A Computer Oriented Approach, Academic Press, New York.. Alvarez, A. R., Welter,
More informationPreliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference
1 / 171 Bootstrap inference Francisco Cribari-Neto Departamento de Estatística Universidade Federal de Pernambuco Recife / PE, Brazil email: cribari@gmail.com October 2013 2 / 171 Unpaid advertisement
More informationConfidence Estimation Methods for Neural Networks: A Practical Comparison
, 6-8 000, Confidence Estimation Methods for : A Practical Comparison G. Papadopoulos, P.J. Edwards, A.F. Murray Department of Electronics and Electrical Engineering, University of Edinburgh Abstract.
More informationSubject CS1 Actuarial Statistics 1 Core Principles
Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and
More informationDescriptive Univariate Statistics and Bivariate Correlation
ESC 100 Exploring Engineering Descriptive Univariate Statistics and Bivariate Correlation Instructor: Sudhir Khetan, Ph.D. Wednesday/Friday, October 17/19, 2012 The Central Dogma of Statistics used to
More informationStatistics I Chapter 2: Univariate data analysis
Statistics I Chapter 2: Univariate data analysis Chapter 2: Univariate data analysis Contents Graphical displays for categorical data (barchart, piechart) Graphical displays for numerical data data (histogram,
More informationLast Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics
Last Lecture Distinguish Populations from Samples Importance of identifying a population and well chosen sample Knowing different Sampling Techniques Distinguish Parameters from Statistics Knowing different
More informationstatistical methods for tailoring seasonal climate forecasts Andrew W. Robertson, IRI
statistical methods for tailoring seasonal climate forecasts Andrew W. Robertson, IRI tailored seasonal forecasts why do we make probabilistic forecasts? to reduce our uncertainty about the (unknown) future
More informationEstimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful?
Journal of Modern Applied Statistical Methods Volume 10 Issue Article 13 11-1-011 Estimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful?
More informationChapter 4. Displaying and Summarizing. Quantitative Data
STAT 141 Introduction to Statistics Chapter 4 Displaying and Summarizing Quantitative Data Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 31 4.1 Histograms 1 We divide the range
More informationParticle Filters. Outline
Particle Filters M. Sami Fadali Professor of EE University of Nevada Outline Monte Carlo integration. Particle filter. Importance sampling. Degeneracy Resampling Example. 1 2 Monte Carlo Integration Numerical
More information2.1 Measures of Location (P.9-11)
MATH1015 Biostatistics Week.1 Measures of Location (P.9-11).1.1 Summation Notation Suppose that we observe n values from an experiment. This collection (or set) of n values is called a sample. Let x 1
More informationA Note on Bootstraps and Robustness. Tony Lancaster, Brown University, December 2003.
A Note on Bootstraps and Robustness Tony Lancaster, Brown University, December 2003. In this note we consider several versions of the bootstrap and argue that it is helpful in explaining and thinking about
More informationStatistics I Chapter 2: Univariate data analysis
Statistics I Chapter 2: Univariate data analysis Chapter 2: Univariate data analysis Contents Graphical displays for categorical data (barchart, piechart) Graphical displays for numerical data data (histogram,
More informationA Note on Bayesian Inference After Multiple Imputation
A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in
More informationOne-Sample Numerical Data
One-Sample Numerical Data quantiles, boxplot, histogram, bootstrap confidence intervals, goodness-of-fit tests University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html
More informationExact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring
Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring A. Ganguly, S. Mitra, D. Samanta, D. Kundu,2 Abstract Epstein [9] introduced the Type-I hybrid censoring scheme
More informationThe Model Building Process Part I: Checking Model Assumptions Best Practice
The Model Building Process Part I: Checking Model Assumptions Best Practice Authored by: Sarah Burke, PhD 31 July 2017 The goal of the STAT T&E COE is to assist in developing rigorous, defensible test
More informationFundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur
Fundamentals to Biostatistics Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Statistics collection, analysis, interpretation of data development of new
More informationFRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE
FRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE Course Title: Probability and Statistics (MATH 80) Recommended Textbook(s): Number & Type of Questions: Probability and Statistics for Engineers
More informationLeast Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions
Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error
More informationAnalytical Bootstrap Methods for Censored Data
JOURNAL OF APPLIED MATHEMATICS AND DECISION SCIENCES, 6(2, 129 141 Copyright c 2002, Lawrence Erlbaum Associates, Inc. Analytical Bootstrap Methods for Censored Data ALAN D. HUTSON Division of Biostatistics,
More informationMidrange: mean of highest and lowest scores. easy to compute, rough estimate, rarely used
Measures of Central Tendency Mode: most frequent score. best average for nominal data sometimes none or multiple modes in a sample bimodal or multimodal distributions indicate several groups included in
More informationLecture 11. Data Description Estimation
Lecture 11 Data Description Estimation Measures of Central Tendency (continued, see last lecture) Sample mean, population mean Sample mean for frequency distributions The median The mode The midrange 3-22
More informationPost-exam 2 practice questions 18.05, Spring 2014
Post-exam 2 practice questions 18.05, Spring 2014 Note: This is a set of practice problems for the material that came after exam 2. In preparing for the final you should use the previous review materials,
More informationStatistical simulation and prediction in software reliability
An. Şt. Univ. Ovidius Constanţa Vol. 16(1), 2008, 81 90 Statistical simulation and prediction in software reliability Alexei Leahu and Carmen Elena Lupu Abstract On the base of statistical simulation (Monte
More informationFinite Population Correction Methods
Finite Population Correction Methods Moses Obiri May 5, 2017 Contents 1 Introduction 1 2 Normal-based Confidence Interval 2 3 Bootstrap Confidence Interval 3 4 Finite Population Bootstrap Sampling 5 4.1
More informationStatistical Practice
Statistical Practice A Note on Bayesian Inference After Multiple Imputation Xiang ZHOU and Jerome P. REITER This article is aimed at practitioners who plan to use Bayesian inference on multiply-imputed
More informationCHAPTER 1. Introduction
CHAPTER 1 Introduction Engineers and scientists are constantly exposed to collections of facts, or data. The discipline of statistics provides methods for organizing and summarizing data, and for drawing
More informationCoping with uncertainty in the economical optimization of a dike design
Van Gelder, et.al. - 1 Coping with uncertainty in the economical optimization of a dike design P.H.A.J.M. VAN GELDER, J.K. VRIJLING, K.A.H. SLIJKHUIS Delft University of Technology, Faculty of Civil Engineering
More informationMATH 117 Statistical Methods for Management I Chapter Three
Jubail University College MATH 117 Statistical Methods for Management I Chapter Three This chapter covers the following topics: I. Measures of Center Tendency. 1. Mean for Ungrouped Data (Raw Data) 2.
More informationWeek Topics of study Home/Independent Learning Assessment (If in addition to homework) 7 th September 2015
Week Topics of study Home/Independent Learning Assessment (If in addition to homework) 7 th September Functions: define the terms range and domain (PLC 1A) and identify the range and domain of given functions
More informationMartin L. Lesser, PhD Biostatistics Unit Feinstein Institute for Medical Research North Shore-LIJ Health System
PREP Course #10: Introduction to Exploratory Data Analysis and Data Transformations (Part 1) Martin L. Lesser, PhD Biostatistics Unit Feinstein Institute for Medical Research North Shore-LIJ Health System
More informationPreliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference
1 / 172 Bootstrap inference Francisco Cribari-Neto Departamento de Estatística Universidade Federal de Pernambuco Recife / PE, Brazil email: cribari@gmail.com October 2014 2 / 172 Unpaid advertisement
More informationData splitting. INSERM Workshop: Evaluation of predictive models: goodness-of-fit and predictive power #+TITLE:
#+TITLE: Data splitting INSERM Workshop: Evaluation of predictive models: goodness-of-fit and predictive power #+AUTHOR: Thomas Alexander Gerds #+INSTITUTE: Department of Biostatistics, University of Copenhagen
More informationSection 3. Measures of Variation
Section 3 Measures of Variation Range Range = (maximum value) (minimum value) It is very sensitive to extreme values; therefore not as useful as other measures of variation. Sample Standard Deviation The
More informationResampling and the Bootstrap
Resampling and the Bootstrap Axel Benner Biostatistics, German Cancer Research Center INF 280, D-69120 Heidelberg benner@dkfz.de Resampling and the Bootstrap 2 Topics Estimation and Statistical Testing
More informationThe Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1)
The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) Authored by: Sarah Burke, PhD Version 1: 31 July 2017 Version 1.1: 24 October 2017 The goal of the STAT T&E COE
More informationSequential Implementation of Monte Carlo Tests with Uniformly Bounded Resampling Risk
Sequential Implementation of Monte Carlo Tests with Uniformly Bounded Resampling Risk Axel Gandy Department of Mathematics Imperial College London a.gandy@imperial.ac.uk user! 2009, Rennes July 8-10, 2009
More informationLearning Gaussian Process Models from Uncertain Data
Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada
More informationThe assumptions are needed to give us... valid standard errors valid confidence intervals valid hypothesis tests and p-values
Statistical Consulting Topics The Bootstrap... The bootstrap is a computer-based method for assigning measures of accuracy to statistical estimates. (Efron and Tibshrani, 1998.) What do we do when our
More informationReview. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda
Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with
More informationBootstrapping, Randomization, 2B-PLS
Bootstrapping, Randomization, 2B-PLS Statistics, Tests, and Bootstrapping Statistic a measure that summarizes some feature of a set of data (e.g., mean, standard deviation, skew, coefficient of variation,
More informationBootstrap Approach to Comparison of Alternative Methods of Parameter Estimation of a Simultaneous Equation Model
Bootstrap Approach to Comparison of Alternative Methods of Parameter Estimation of a Simultaneous Equation Model Olubusoye, O. E., J. O. Olaomi, and O. O. Odetunde Abstract A bootstrap simulation approach
More informationAnalysis of Fast Input Selection: Application in Time Series Prediction
Analysis of Fast Input Selection: Application in Time Series Prediction Jarkko Tikka, Amaury Lendasse, and Jaakko Hollmén Helsinki University of Technology, Laboratory of Computer and Information Science,
More informationStep-Stress Models and Associated Inference
Department of Mathematics & Statistics Indian Institute of Technology Kanpur August 19, 2014 Outline Accelerated Life Test 1 Accelerated Life Test 2 3 4 5 6 7 Outline Accelerated Life Test 1 Accelerated
More information3 Lecture 3 Notes: Measures of Variation. The Boxplot. Definition of Probability
3 Lecture 3 Notes: Measures of Variation. The Boxplot. Definition of Probability 3.1 Week 1 Review Creativity is more than just being different. Anybody can plan weird; that s easy. What s hard is to be
More informationBasics of Uncertainty Analysis
Basics of Uncertainty Analysis Chapter Six Basics of Uncertainty Analysis 6.1 Introduction As shown in Fig. 6.1, analysis models are used to predict the performances or behaviors of a product under design.
More informationEconometric Analysis of Cross Section and Panel Data
Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND
More informationThe empirical ( ) rule
The empirical (68-95-99.7) rule With a bell shaped distribution, about 68% of the data fall within a distance of 1 standard deviation from the mean. 95% fall within 2 standard deviations of the mean. 99.7%
More informationAn application of the GAM-PCA-VAR model to respiratory disease and air pollution data
An application of the GAM-PCA-VAR model to respiratory disease and air pollution data Márton Ispány 1 Faculty of Informatics, University of Debrecen Hungary Joint work with Juliana Bottoni de Souza, Valdério
More informationConfidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods
Chapter 4 Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods 4.1 Introduction It is now explicable that ridge regression estimator (here we take ordinary ridge estimator (ORE)
More informationDescribing Distributions with Numbers
Topic 2 We next look at quantitative data. Recall that in this case, these data can be subject to the operations of arithmetic. In particular, we can add or subtract observation values, we can sort them
More informationSTAT440/840: Statistical Computing
First Prev Next Last STAT440/840: Statistical Computing Paul Marriott pmarriott@math.uwaterloo.ca MC 6096 February 2, 2005 Page 1 of 41 First Prev Next Last Page 2 of 41 Chapter 3: Data resampling: the
More informationAdvanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland
EnviroInfo 2004 (Geneva) Sh@ring EnviroInfo 2004 Advanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland Mikhail Kanevski 1, Michel Maignan 1
More informationIn-Flight Engine Diagnostics and Prognostics Using A Stochastic-Neuro-Fuzzy Inference System
In-Flight Engine Diagnostics and Prognostics Using A Stochastic-Neuro-Fuzzy Inference System Dan M. Ghiocel & Joshua Altmann STI Technologies, Rochester, New York, USA Keywords: reliability, stochastic
More information