ON USING BOOTSTRAP APPROACH FOR UNCERTAINTY ESTIMATION

The First International Proficiency Testing Conference
Sinaia, România, 11th-13th October, 2007

Grigore Albeanu
University of Oradea, UNESCO Chair in Information Technologies, Universitatii Str., No. 1, Oradea, Romania

Abstract

Computer-intensive methods for estimation assessment provide valuable information about the adequacy of the applied probabilistic models. The bootstrap method is a computationally intensive approach to uncertainty estimation based on resampling and statistical estimation. It is a powerful tool, especially when only a small data set is available to predict the behaviour of systems or processes. This paper provides a methodology for investigating uncertainty evaluation by bootstrap and a procedure for obtaining confidence bands for linear and nonlinear models used in data analysis and design measurements. Statistical considerations for proficiency testing are also given. Numerical examples and graphical visualisations are shown for a case study.

Key words

Bootstrap method, Resampling methods, Data modelling, Sampling variance, Median, Z-scores, Confidence Intervals, Confidence bands, Proficiency Testing

1 INTRODUCTION

Many modelling applications in the social and engineering sciences give rise to an estimation process (of parameters or functional relations). The numerical estimation process always depends on the previously collected data. Even when data filtering and outlier elimination procedures have been applied, ill-conditioning effects can appear. To increase confidence in the estimates, or to improve them, computer-intensive methods have been recommended. This paper describes the use of the bootstrap approach for assessing the estimation process and for providing simultaneous confidence bands for the studied model.

2 THEORETICAL BACKGROUND

The bootstrap is a simple but powerful Monte Carlo method for assessing statistical accuracy or for estimating a distribution from sample statistics. The name may come from the phrase "pull yourself up by your bootstraps", meaning "improve your situation by your own efforts", as reported by the Phrase Finder [13]. The methods are suitable for any level of modelling and are useful for fully parametric, semiparametric, and completely nonparametric analysis. These approaches are used not only by statisticians but wherever statistics can be applied: life sciences, social sciences, econometrics, reliability, etc. For the purposes of this paper we outline the basic bootstrap principle, the application of bootstrap sampling to accuracy estimation, and the method of simultaneous confidence bands for uncertainty management.

2.1 The bootstrap method principle

Let $X$ be a random variable and $F$ its cumulative distribution function. The bootstrap method was proposed by Efron [5, 6, 7] and can be used to estimate: the distribution function of a random variable $R(X, F)$; a functional relation $V(F)$; or the accuracy of a statistic $s$ obtained from a sample $(X_1, X_2, \ldots, X_n)$ of size $n$ from $X$. In the framework of this paper, accuracy describes the variability of $s$ when independent estimates $s(1), s(2), \ldots$ of the statistic $s$ are obtained by resampling.

The bootstrap technique uses the sample $(X_1, X_2, \ldots, X_n)$ to obtain the empirical cumulative distribution function $F_n(x)$, which replaces the true cumulative distribution function $F$:

$$F_n(x) = \frac{1}{n}\,\mathrm{card}\{\, x_i \le x;\ 1 \le i \le n \,\}.$$

To repeatedly simulate bootstrap samples $X^* := (X_1^*, X_2^*, \ldots, X_n^*)$ from $F_n$, random number generators are used, following the Monte Carlo approach.
Then, for each bootstrap sample, the following are recalculated: the distribution function of the random variable $R(X^*, F_n)$; the functional relation $V(F_n)$ or $V(F_n^*)$; and the statistic $s^*(\cdot)$. The accuracy of the statistic $s$ can then be derived from an appropriate statistical inference study of the sequence of values $s^*(\cdot)$.

Bootstrap resampling can be carried out in various ways. Uniform resampling and importance resampling are the most widely used, and both algorithms are presented in detail here.

Uniform resampling assumes that the measured (observed) values are uniformly sampled from some process. The algorithm for uniform resampling consists of the following steps:

1. Initialise a uniform random number generator over (0, 1), say random();
2. For k from 1 to n do { u := random(); j := [n u] + 1; X_k* := X_j }, where [.] denotes the integer part.
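The resampling steps can be sketched in Python as follows. This is a minimal illustration rather than the author's implementation: the function names, the fixed seed, the number of replicates and the measurement values are all assumptions made for the example. The second function draws index j with probability p_j by walking the cumulative sums, which is the importance resampling scheme that this section goes on to describe; the third uses uniform resampling to estimate a standard error.

```python
import math
import random
import statistics


def uniform_resample(sample, rng):
    """Draw n values from `sample` with replacement, each index equally likely."""
    n = len(sample)
    # floor(n * u) is the 0-based form of j := [n u] + 1 in step 2.
    return [sample[math.floor(n * rng.random())] for _ in range(n)]


def importance_resample(sample, weights, rng):
    """Draw n values where sample[j] is picked with probability weights[j]."""
    n = len(sample)
    resample = []
    for _ in range(n):
        u = rng.random()
        j, cumulative = 0, weights[0]
        # Advance j while u exceeds the running sum p_1 + ... + p_j.
        # The bound on j guards against rounding in the cumulative sum.
        while u > cumulative and j < n - 1:
            j += 1
            cumulative += weights[j]
        resample.append(sample[j])
    return resample


def bootstrap_standard_error(sample, statistic, n_boot, rng):
    """Standard deviation of `statistic` over n_boot uniform resamples."""
    replicates = [statistic(uniform_resample(sample, rng)) for _ in range(n_boot)]
    return statistics.stdev(replicates)


rng = random.Random(2007)  # fixed seed, chosen arbitrarily for repeatability
data = [9.8, 10.1, 10.4, 9.9, 10.0, 10.6, 9.7, 10.2]  # made-up measurements
se_mean = bootstrap_standard_error(data, statistics.mean, 500, rng)
equal_weights = [1 / len(data)] * len(data)
weighted = importance_resample(data, equal_weights, rng)
```

With equal weights the importance scheme reduces to uniform resampling, which gives a simple consistency check between the two functions.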

The importance resampling algorithm generates sampling values according to a probability distribution $\{(x_i, p_i): i = 1, 2, \ldots, n\}$, where every $p_i$ is a nonnegative real number and $p_1 + p_2 + \cdots + p_n = 1$. Using a method for generating random samples from a given discrete distribution function (for more about random variable generation see [14]), the bootstrap sample is obtained in the following steps [1, 2]:

1. Initialise a uniform random number generator over (0, 1);
2. For k from 1 to n do steps 2.1-2.4:
2.1. Let j := 0, and u := random();
2.2. j := j + 1;
2.3. If u > p_1 + p_2 + ... + p_j then go to step 2.2;
2.4. X_k* := X_j.

As a common example of the use of uniform resampling, we refer to the bootstrap algorithm for estimating standard errors. However, when some observations are more important than others, importance resampling can provide conclusions that are closer to reality. If resampling is based on importance resampling weights, the bootstrap estimates are reweighted as if uniform resampling had been done.

2.2 Uncertainty estimation and bootstrap

Terminology

For a set (sample) of $N$ uncorrelated random variables $x_i$, $i = 1, 2, \ldots, N$, let Mean[X] denote the sample mean:

$$\mathrm{Mean}[X] = \frac{1}{N}\sum_{i=1}^{N} x_i.$$

The median is the middle value of the group for a particular sample, i.e. half of the results for the sample are higher than it and half are lower (Q2). It is calculated from the sorted values (from lowest to highest). If N is even, the median is the average of the two central values. Let us denote this value by Median[X].

The interquartile range is the difference between the upper and lower quartiles, IQR[X] = Q3 - Q1. The lower quartile is the value below which a quarter of the results lie. Similarly, the upper quartile is the value above which a quarter of the results lie.

The normalised IQR (NIQR[X]) is a measure of the variability of the results which is essentially a robust standard deviation. It is equal to IQR[X] multiplied by a fixed normalising factor. The robust CV (coefficient of variation) is equal to NIQR[X] divided by the median, expressed as a percentage (i.e. multiplied by 100); it allows the variability in different tests to be compared.

An accepted statistical method for analysing test results in proficiency testing is to calculate a Z-score for each laboratory's result. The standard form for the calculation of Z-scores is

$$Z_i = \frac{X_i - A[X]}{B[X]},$$

where A[X] is the assigned value (sample mean), and B[X] is an estimate of the spread of all results (standard deviation). The classical approach based on the mean and standard deviation is significantly influenced by the presence of extreme values (outliers). Therefore, a robust approach based on the median and interquartile range is preferable [11, 16]. Robust Z-scores are calculated by replacing A[X] and B[X] in the classical Z-score by the median and NIQR, respectively.

For proficiency testing, both between-laboratory and within-laboratory Z-scores can be used. The standardised sum (S) and the standardised difference (D) for a pair of results A and B are

$$S = \frac{A + B}{\sqrt{2}}, \qquad D = \frac{A - B}{\sqrt{2}}$$

when the median of A is less than the median of B, and $D = (B - A)/\sqrt{2}$ otherwise. The between-laboratory Z-score (ZB) is the robust Z-score of S and the within-laboratory Z-score (ZW) is the robust Z-score of D:

$$ZB = \frac{S - \mathrm{Median}[S]}{\mathrm{NIQR}[S]}, \qquad ZW = \frac{D - \mathrm{Median}[D]}{\mathrm{NIQR}[D]}.$$

A methodology based on the bootstrap approach can be used to study the robust Z-scores ZB and ZW.

Simultaneous bootstrapping of the MEAN and STDEV

A Z-score can be computed repeatedly by simultaneous resampling in order to obtain a resample mean and standard deviation. The process is repeated for a number of steps, and the final result is obtained by considering the best performance. An 80% approach can be used when strong acceptance is required. Generally speaking, the test could be based on a 51% approach.

Simultaneous bootstrapping of the sample MEDIAN and NIQR

A robust Z-score can be computed repeatedly by simultaneous resampling in order to obtain a resample median and normalised interquartile range. The process is repeated for a number of steps, and the final result is obtained by considering the best performance. Here too, an 80% approach can be used when strong acceptance is required. Generally speaking, the test could be based on a 51% approach.
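A minimal Python sketch of the robust Z-score and of its repeated evaluation on resamples is given here. Several details are assumptions made for the illustration and are not stated in the text: the normalising factor 0.7413 (the value commonly used to make the NIQR comparable to a standard deviation for normal data), the "inclusive" quartile convention, the limit |Z| <= 2 for an acceptable result, and the reading of the 80% and 51% approaches as the required share of resamples in which the result is acceptable. The laboratory results are made up.

```python
import random
import statistics

NIQR_FACTOR = 0.7413  # assumption: common normal-consistency constant, about 1/1.349


def niqr(values):
    """Normalised interquartile range: factor * (Q3 - Q1)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return NIQR_FACTOR * (q3 - q1)


def robust_z(x, reference):
    """Robust Z-score of x against the median and NIQR of `reference`."""
    return (x - statistics.median(reference)) / niqr(reference)


def acceptance_fraction(x, results, n_boot, rng, limit=2.0):
    """Share of uniform resamples in which |robust Z| of x is within `limit`."""
    accepted = used = 0
    for _ in range(n_boot):
        # Median and NIQR are recomputed together on the same resample.
        resample = rng.choices(results, k=len(results))
        if niqr(resample) == 0:  # degenerate resample: Z-score undefined
            continue
        used += 1
        if abs(robust_z(x, resample)) <= limit:
            accepted += 1
    return accepted / used if used else float("nan")


rng = random.Random(11)  # fixed seed, chosen arbitrarily
results = [10.1, 9.9, 10.0, 10.3, 9.8, 10.2, 10.0, 12.5, 9.7, 10.1]  # made-up
z_outlier = robust_z(12.5, results)
frac_typical = acceptance_fraction(10.0, results, 400, rng)
frac_outlier = acceptance_fraction(12.5, results, 400, rng)
```

Under this reading, a result would pass the 80% approach when its acceptance fraction is at least 0.8 and the 51% approach when it is at least 0.51. Replacing the median and NIQR by the mean and standard deviation gives the classical variant in the same way.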
2.3 Simultaneous confidence bands

The general framework for obtaining simultaneous confidence bands for nonlinear explicit regression models using resampling techniques is presented in [1, 2, 3]. Let $g^{+}$, $g^{-}$ be two known nonnegative models (real functions which may depend on the input data), and let $u^{+}$, $u^{-}$ be nonnegative scale factors. The confidence band for the model $h$, used to explain a relation such as $y_i = h(x_i, \theta) + \varepsilon_i$ ($i = 1, 2, \ldots$), is the region

$$R = \{(x, y) : x \in \text{Input domain},\ h(x, \mathrm{est}(\theta)) - \mathrm{est}(v)\,u^{-} g^{-}(x) \le y \le h(x, \mathrm{est}(\theta)) + \mathrm{est}(v)\,u^{+} g^{+}(x)\},$$

where $(x_i, y_i)_{i \ge 1}$ are observations, $\theta$ is a parameter vector, $\mathrm{est}(\theta)$ is a least squares estimate of $\theta$, $(\varepsilon_i)_{i \ge 1}$ are independent and identically distributed random variables with an unknown cumulative distribution function (cdf) $F$, mean zero and unknown variance $\sigma^2$, and $\mathrm{est}(v)$ is an estimate of $\sigma$.

Reference [11] describes the application of this technique to both a parametric log-linear model and a kernel estimation model in the normalised input domain (0, 1). However, any input domain and any estimation model could be used to obtain an estimate $h(x, \cdot)$.

Obtaining the region $R$ requires solving the following problem: given the models $g^{-}$, $g^{+}$ and $a \in (0, 1)$, find the nonnegative real numbers $u^{-}$ and $u^{+}$ such that

$$P\{\, h(x, \mathrm{est}(\theta)) - \mathrm{est}(v)\,u^{-} g^{-}(x) \le h(x, \theta) \le h(x, \mathrm{est}(\theta)) + \mathrm{est}(v)\,u^{+} g^{+}(x)\ \text{for all } x \in \text{Input domain} \,\} = 1 - a.$$

Various alternatives for choosing $g^{-}$ and $g^{+}$ exist [1]. A common approach takes $g^{-}(x) = g^{+}(x) = 1$ for all $x$ in the input domain. Two choices of the scale factors $u^{-}$ and $u^{+}$ are of particular interest: $u^{+} = u^{-}$ (symmetric simultaneous confidence bands), or $u^{+}$ and $u^{-}$ chosen so that the sum $u^{+} + u^{-}$ is minimal.

2.4 Bootstrap for nonlinear regression and neural network modelling

Regression analysis is one of the statistical tools most frequently used in data analysis [1, 9]. Designing neural networks for data analysis is also a common activity among researchers. Both nonlinear regression and neural networks have been successfully applied to the prediction of the most important cement quality characteristics, as Guslicov et al. [10] report. Other success stories are reported for case-control studies with missing data [12] and for environmental risk assessment [15]. The combination of nonlinear regression or neural networks with the bootstrap can be used to extract information from a cement database.
By applying the bootstrap approach, a sequence of parameter estimates can be obtained, and both classical and bootstrap confidence (uncertainty) analyses can be carried out.

Figure 1 - Mean analysis by uniform resampling
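The symmetric case of Section 2.3 (equal scale factors, and both g models identically 1) can be illustrated with a short Python sketch. It is one plausible way to calibrate the scale factor and not necessarily the procedure of [1, 2, 3]: it assumes a straight-line model for h, resamples the residuals of the least squares fit, and takes u as the empirical (1 - a) quantile of the largest studentised deviation over a grid. The function names, the grid, the seed and the data are all invented for the example.

```python
import math
import random


def fit_line(xs, ys):
    """Least squares estimate (intercept, slope) of y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def residual_scale(xs, ys, theta):
    """est(v): root mean square residual with n - 2 degrees of freedom."""
    a, b = theta
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(rss / (len(xs) - 2))


def symmetric_band_factor(xs, ys, grid, level, n_boot, rng):
    """Return (est(theta), est(v), u) for a symmetric band with g = 1."""
    theta = fit_line(xs, ys)
    a, b = theta
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sup_ratios = []
    for _ in range(n_boot):
        # Residual resampling: rebuild the responses around the fitted line.
        ys_star = [a + b * x + rng.choice(residuals) for x in xs]
        theta_star = fit_line(xs, ys_star)
        v_star = residual_scale(xs, ys_star, theta_star)
        if v_star == 0:
            continue
        a_s, b_s = theta_star
        sup_dev = max(abs((a_s + b_s * x) - (a + b * x)) for x in grid)
        sup_ratios.append(sup_dev / v_star)
    sup_ratios.sort()
    # u is the empirical `level` quantile of the studentised sup deviation.
    index = min(len(sup_ratios) - 1, math.ceil(level * len(sup_ratios)) - 1)
    return theta, residual_scale(xs, ys, theta), sup_ratios[index]


rng = random.Random(3)  # fixed seed, chosen arbitrarily
xs = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
ys = [1.02, 1.18, 1.43, 1.58, 1.85, 1.97, 2.24, 2.37, 2.63, 2.78, 3.04]  # made-up
theta, v, u = symmetric_band_factor(xs, ys, xs, 0.95, 300, rng)
centre = theta[0] + theta[1] * 0.5
band_at_half = (centre - v * u, centre + v * u)
```

The band at any x in the input domain is then the fitted value plus or minus est(v) times u. A nonlinear h would only change the fitting and evaluation steps; the calibration of u stays the same.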

Figure 2 - Median analysis by uniform resampling

3 EXPERIMENTAL RESULTS

When bootstrap resampling is applied repeatedly, over a large number of steps, to the initial sample of a measurement task, the bootstrap characteristics of the sample oscillate between values that can be used in an uncertainty analysis based on bootstrap intervals. Figures 1-5 illustrate the bootstrap characteristics of the sample obtained by uniform resampling: bootstrap mean, bootstrap median, bootstrap IQR, bootstrap N-IQR and bootstrap standard deviation.

Figure 6 shows a bootstrap confidence band obtained for a nonlinear regression model applied in software testing and reliability prediction, according to the data given in Table 1. All 10 bootstrap functions lie within a band that is easily obtained from the range of values (the minimum and maximum over all functions at each moment of time). This permits a bootstrap uncertainty analysis of the model's parameters.
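Both ideas in this section, the oscillation of a bootstrap characteristic over repeated resampling and the min-max band around a family of bootstrap functions, can be sketched in a few lines of Python. The measurements, the seed and the curve family below are invented for illustration; in particular, the exponential growth form is only a stand-in and is not the model fitted in the paper.

```python
import math
import random
import statistics


def bootstrap_trace(sample, statistic, n_steps, rng):
    """Value of `statistic` on each of n_steps uniform resamples."""
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(n_steps)]


def envelope(curves):
    """Pointwise minimum and maximum of curves sampled on a common grid."""
    lower = [min(column) for column in zip(*curves)]
    upper = [max(column) for column in zip(*curves)]
    return lower, upper


rng = random.Random(1)  # fixed seed, chosen arbitrarily
sample = [4.9, 5.1, 5.0, 5.3, 4.8, 5.2, 5.0, 5.4]  # made-up measurements

# Range within which the bootstrap mean oscillates over 200 resamples.
means = bootstrap_trace(sample, statistics.mean, 200, rng)
mean_range = (min(means), max(means))

grid = [1, 2, 3, 4, 5]
# Hypothetical growth curves a * (1 - exp(-b * t)); the parameters are invented.
params = [(100, 0.30), (105, 0.28), (98, 0.33)]
curves = [[a * (1 - math.exp(-b * t)) for t in grid] for a, b in params]
lower, upper = envelope(curves)
```

The same `bootstrap_trace` call works for the median, IQR, N-IQR and standard deviation by passing a different statistic.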

Figure 3 - IQR analysis by uniform resampling

Figure 4 - N-IQR analysis by uniform resampling

Figure 5 - STDEV analysis by uniform resampling

Software reliability is defined as the probability of failure-free software operation for a specified period of time in a specified environment. Many software reliability growth models are available, and most of them assume that detected faults are corrected immediately [3]. This assumption may not be realistic in practice, so an uncertainty analysis has to be carried out based on different bootstrap variants, as mentioned in [4, 8].

Figure 6 - Software reliability growth modelling (functional analysis: F(t) and F_i(t), i = 1, ..., 10, plotted against t)

Table 1 - Collected data and Bootstrap Uniform Resampling Analysis (columns: t, F(t), F_1(t), F_2(t), ..., F_10(t))

4 CONCLUSIONS

The paper provides a methodology for investigating uncertainty evaluation by bootstrap and describes a procedure for obtaining confidence bands for linear and nonlinear models used in data analysis and design measurements. The approach can be used for theoretical applications, and it also addresses practical applications in accuracy assessment and uncertainty analysis for all kinds of models, including those based on neural networks.

REFERENCES

[1] Albeanu G.: The statistical and numerical analysis of some nonlinear regression models, PhD Dissertation (in Romanian), Bucharest University.
[2] Albeanu G.: Resampling Simultaneous Confidence Bands for Nonlinear Explicit Regression Models, Mathematical Reports, 50(5-6), (1998).
[3] Albeanu G. and Popentiu F.: On the Bootstrap Method: Software Reliability Assessment and Simultaneous Confidence Bands, Annals of Oradea University, Energetics Series, 7(1), (2001).
[4] Cowling A.; Hall P. and Phillips M.J.: Bootstrap Confidence Region for the Intensity of a Poisson Point Process, JASA, 91, (1996).
[5] Efron B.: Bootstrap Methods: Another Look at the Jackknife, Ann. Stat., 7, 1-27, (1979).
[6] Efron B.; Tibshirani R.J.: An introduction to the bootstrap, Chapman and Hall, London.
[7] Efron B.: Computer-Intensive Methods in Statistical Regression, SIAM Review, 30, 3, (1988).
[8] Frey H. C.; Burmaster D. E.: Methods for Characterizing Variability and Uncertainty: Comparison of Bootstrap Simulation and Likelihood-Based Approaches, Risk Analysis, 19, 1, (1999).
[9] Gonçalves S.; White H.: Bootstrap Standard Error Estimates for Linear Regression, JASA, 100, 471, (2005).
[10] Guslicov G.; Coarnă M.; Şerbănescu L.; Sorescu V.; Ispeciuc M.: Standard Strength of Cements Predicted via Computer Modeling, Key Engineering Materials, 264-268, (2004).
[11] Jolliffe I.: Estimation of uncertainty in verification measures, International Verification Methods Workshop, The McGill Faculty Club, Montreal, Quebec, Canada, (2004).
[12] Siersma V.; Johansen C.: The use of the bootstrap in the analysis of case-control studies with missing data, Research Report - Institute of Public Health, University of Copenhagen, (2004).
[13] The Phrase Finder: Meanings and Origins of Phrases, Sayings and Idioms, (2007).
[14] Vaduva I.: Simulation models by computer, Technical Publishing House (in Romanian), Bucharest, (1977).
[15] Verdonck F.; Jaworska J.; Thas O. and Vanrolleghem P.A.: Uncertainty Techniques in Environmental Risk Assessment, Med. Fac. Landbouw. Univ. Gent, 65/4, (2000).
[16] Web1: Testing Interlaboratory comparisons, Asia Pacific Laboratory Accreditation Cooperation.


More information

Estimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful?

Estimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful? Journal of Modern Applied Statistical Methods Volume 10 Issue Article 13 11-1-011 Estimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful?

More information

Chapter 4. Displaying and Summarizing. Quantitative Data

Chapter 4. Displaying and Summarizing. Quantitative Data STAT 141 Introduction to Statistics Chapter 4 Displaying and Summarizing Quantitative Data Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 31 4.1 Histograms 1 We divide the range

More information

Particle Filters. Outline

Particle Filters. Outline Particle Filters M. Sami Fadali Professor of EE University of Nevada Outline Monte Carlo integration. Particle filter. Importance sampling. Degeneracy Resampling Example. 1 2 Monte Carlo Integration Numerical

More information

2.1 Measures of Location (P.9-11)

2.1 Measures of Location (P.9-11) MATH1015 Biostatistics Week.1 Measures of Location (P.9-11).1.1 Summation Notation Suppose that we observe n values from an experiment. This collection (or set) of n values is called a sample. Let x 1

More information

A Note on Bootstraps and Robustness. Tony Lancaster, Brown University, December 2003.

A Note on Bootstraps and Robustness. Tony Lancaster, Brown University, December 2003. A Note on Bootstraps and Robustness Tony Lancaster, Brown University, December 2003. In this note we consider several versions of the bootstrap and argue that it is helpful in explaining and thinking about

More information

Statistics I Chapter 2: Univariate data analysis

Statistics I Chapter 2: Univariate data analysis Statistics I Chapter 2: Univariate data analysis Chapter 2: Univariate data analysis Contents Graphical displays for categorical data (barchart, piechart) Graphical displays for numerical data data (histogram,

More information

A Note on Bayesian Inference After Multiple Imputation

A Note on Bayesian Inference After Multiple Imputation A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in

More information

One-Sample Numerical Data

One-Sample Numerical Data One-Sample Numerical Data quantiles, boxplot, histogram, bootstrap confidence intervals, goodness-of-fit tests University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html

More information

Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring

Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring A. Ganguly, S. Mitra, D. Samanta, D. Kundu,2 Abstract Epstein [9] introduced the Type-I hybrid censoring scheme

More information

The Model Building Process Part I: Checking Model Assumptions Best Practice

The Model Building Process Part I: Checking Model Assumptions Best Practice The Model Building Process Part I: Checking Model Assumptions Best Practice Authored by: Sarah Burke, PhD 31 July 2017 The goal of the STAT T&E COE is to assist in developing rigorous, defensible test

More information

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Fundamentals to Biostatistics Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Statistics collection, analysis, interpretation of data development of new

More information

FRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE

FRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE FRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE Course Title: Probability and Statistics (MATH 80) Recommended Textbook(s): Number & Type of Questions: Probability and Statistics for Engineers

More information

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error

More information

Analytical Bootstrap Methods for Censored Data

Analytical Bootstrap Methods for Censored Data JOURNAL OF APPLIED MATHEMATICS AND DECISION SCIENCES, 6(2, 129 141 Copyright c 2002, Lawrence Erlbaum Associates, Inc. Analytical Bootstrap Methods for Censored Data ALAN D. HUTSON Division of Biostatistics,

More information

Midrange: mean of highest and lowest scores. easy to compute, rough estimate, rarely used

Midrange: mean of highest and lowest scores. easy to compute, rough estimate, rarely used Measures of Central Tendency Mode: most frequent score. best average for nominal data sometimes none or multiple modes in a sample bimodal or multimodal distributions indicate several groups included in

More information

Lecture 11. Data Description Estimation

Lecture 11. Data Description Estimation Lecture 11 Data Description Estimation Measures of Central Tendency (continued, see last lecture) Sample mean, population mean Sample mean for frequency distributions The median The mode The midrange 3-22

More information

Post-exam 2 practice questions 18.05, Spring 2014

Post-exam 2 practice questions 18.05, Spring 2014 Post-exam 2 practice questions 18.05, Spring 2014 Note: This is a set of practice problems for the material that came after exam 2. In preparing for the final you should use the previous review materials,

More information

Statistical simulation and prediction in software reliability

Statistical simulation and prediction in software reliability An. Şt. Univ. Ovidius Constanţa Vol. 16(1), 2008, 81 90 Statistical simulation and prediction in software reliability Alexei Leahu and Carmen Elena Lupu Abstract On the base of statistical simulation (Monte

More information

Finite Population Correction Methods

Finite Population Correction Methods Finite Population Correction Methods Moses Obiri May 5, 2017 Contents 1 Introduction 1 2 Normal-based Confidence Interval 2 3 Bootstrap Confidence Interval 3 4 Finite Population Bootstrap Sampling 5 4.1

More information

Statistical Practice

Statistical Practice Statistical Practice A Note on Bayesian Inference After Multiple Imputation Xiang ZHOU and Jerome P. REITER This article is aimed at practitioners who plan to use Bayesian inference on multiply-imputed

More information

CHAPTER 1. Introduction

CHAPTER 1. Introduction CHAPTER 1 Introduction Engineers and scientists are constantly exposed to collections of facts, or data. The discipline of statistics provides methods for organizing and summarizing data, and for drawing

More information

Coping with uncertainty in the economical optimization of a dike design

Coping with uncertainty in the economical optimization of a dike design Van Gelder, et.al. - 1 Coping with uncertainty in the economical optimization of a dike design P.H.A.J.M. VAN GELDER, J.K. VRIJLING, K.A.H. SLIJKHUIS Delft University of Technology, Faculty of Civil Engineering

More information

MATH 117 Statistical Methods for Management I Chapter Three

MATH 117 Statistical Methods for Management I Chapter Three Jubail University College MATH 117 Statistical Methods for Management I Chapter Three This chapter covers the following topics: I. Measures of Center Tendency. 1. Mean for Ungrouped Data (Raw Data) 2.

More information

Week Topics of study Home/Independent Learning Assessment (If in addition to homework) 7 th September 2015

Week Topics of study Home/Independent Learning Assessment (If in addition to homework) 7 th September 2015 Week Topics of study Home/Independent Learning Assessment (If in addition to homework) 7 th September Functions: define the terms range and domain (PLC 1A) and identify the range and domain of given functions

More information

Martin L. Lesser, PhD Biostatistics Unit Feinstein Institute for Medical Research North Shore-LIJ Health System

Martin L. Lesser, PhD Biostatistics Unit Feinstein Institute for Medical Research North Shore-LIJ Health System PREP Course #10: Introduction to Exploratory Data Analysis and Data Transformations (Part 1) Martin L. Lesser, PhD Biostatistics Unit Feinstein Institute for Medical Research North Shore-LIJ Health System

More information

Preliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference

Preliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference 1 / 172 Bootstrap inference Francisco Cribari-Neto Departamento de Estatística Universidade Federal de Pernambuco Recife / PE, Brazil email: cribari@gmail.com October 2014 2 / 172 Unpaid advertisement

More information

Data splitting. INSERM Workshop: Evaluation of predictive models: goodness-of-fit and predictive power #+TITLE:

Data splitting. INSERM Workshop: Evaluation of predictive models: goodness-of-fit and predictive power #+TITLE: #+TITLE: Data splitting INSERM Workshop: Evaluation of predictive models: goodness-of-fit and predictive power #+AUTHOR: Thomas Alexander Gerds #+INSTITUTE: Department of Biostatistics, University of Copenhagen

More information

Section 3. Measures of Variation

Section 3. Measures of Variation Section 3 Measures of Variation Range Range = (maximum value) (minimum value) It is very sensitive to extreme values; therefore not as useful as other measures of variation. Sample Standard Deviation The

More information

Resampling and the Bootstrap

Resampling and the Bootstrap Resampling and the Bootstrap Axel Benner Biostatistics, German Cancer Research Center INF 280, D-69120 Heidelberg benner@dkfz.de Resampling and the Bootstrap 2 Topics Estimation and Statistical Testing

More information

The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1)

The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) Authored by: Sarah Burke, PhD Version 1: 31 July 2017 Version 1.1: 24 October 2017 The goal of the STAT T&E COE

More information

Sequential Implementation of Monte Carlo Tests with Uniformly Bounded Resampling Risk

Sequential Implementation of Monte Carlo Tests with Uniformly Bounded Resampling Risk Sequential Implementation of Monte Carlo Tests with Uniformly Bounded Resampling Risk Axel Gandy Department of Mathematics Imperial College London a.gandy@imperial.ac.uk user! 2009, Rennes July 8-10, 2009

More information

Learning Gaussian Process Models from Uncertain Data

Learning Gaussian Process Models from Uncertain Data Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada

More information

The assumptions are needed to give us... valid standard errors valid confidence intervals valid hypothesis tests and p-values

The assumptions are needed to give us... valid standard errors valid confidence intervals valid hypothesis tests and p-values Statistical Consulting Topics The Bootstrap... The bootstrap is a computer-based method for assigning measures of accuracy to statistical estimates. (Efron and Tibshrani, 1998.) What do we do when our

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

Bootstrapping, Randomization, 2B-PLS

Bootstrapping, Randomization, 2B-PLS Bootstrapping, Randomization, 2B-PLS Statistics, Tests, and Bootstrapping Statistic a measure that summarizes some feature of a set of data (e.g., mean, standard deviation, skew, coefficient of variation,

More information

Bootstrap Approach to Comparison of Alternative Methods of Parameter Estimation of a Simultaneous Equation Model

Bootstrap Approach to Comparison of Alternative Methods of Parameter Estimation of a Simultaneous Equation Model Bootstrap Approach to Comparison of Alternative Methods of Parameter Estimation of a Simultaneous Equation Model Olubusoye, O. E., J. O. Olaomi, and O. O. Odetunde Abstract A bootstrap simulation approach

More information

Analysis of Fast Input Selection: Application in Time Series Prediction

Analysis of Fast Input Selection: Application in Time Series Prediction Analysis of Fast Input Selection: Application in Time Series Prediction Jarkko Tikka, Amaury Lendasse, and Jaakko Hollmén Helsinki University of Technology, Laboratory of Computer and Information Science,

More information

Step-Stress Models and Associated Inference

Step-Stress Models and Associated Inference Department of Mathematics & Statistics Indian Institute of Technology Kanpur August 19, 2014 Outline Accelerated Life Test 1 Accelerated Life Test 2 3 4 5 6 7 Outline Accelerated Life Test 1 Accelerated

More information

3 Lecture 3 Notes: Measures of Variation. The Boxplot. Definition of Probability

3 Lecture 3 Notes: Measures of Variation. The Boxplot. Definition of Probability 3 Lecture 3 Notes: Measures of Variation. The Boxplot. Definition of Probability 3.1 Week 1 Review Creativity is more than just being different. Anybody can plan weird; that s easy. What s hard is to be

More information

Basics of Uncertainty Analysis

Basics of Uncertainty Analysis Basics of Uncertainty Analysis Chapter Six Basics of Uncertainty Analysis 6.1 Introduction As shown in Fig. 6.1, analysis models are used to predict the performances or behaviors of a product under design.

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

The empirical ( ) rule

The empirical ( ) rule The empirical (68-95-99.7) rule With a bell shaped distribution, about 68% of the data fall within a distance of 1 standard deviation from the mean. 95% fall within 2 standard deviations of the mean. 99.7%

More information

An application of the GAM-PCA-VAR model to respiratory disease and air pollution data

An application of the GAM-PCA-VAR model to respiratory disease and air pollution data An application of the GAM-PCA-VAR model to respiratory disease and air pollution data Márton Ispány 1 Faculty of Informatics, University of Debrecen Hungary Joint work with Juliana Bottoni de Souza, Valdério

More information

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods Chapter 4 Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods 4.1 Introduction It is now explicable that ridge regression estimator (here we take ordinary ridge estimator (ORE)

More information

Describing Distributions with Numbers

Describing Distributions with Numbers Topic 2 We next look at quantitative data. Recall that in this case, these data can be subject to the operations of arithmetic. In particular, we can add or subtract observation values, we can sort them

More information

STAT440/840: Statistical Computing

STAT440/840: Statistical Computing First Prev Next Last STAT440/840: Statistical Computing Paul Marriott pmarriott@math.uwaterloo.ca MC 6096 February 2, 2005 Page 1 of 41 First Prev Next Last Page 2 of 41 Chapter 3: Data resampling: the

More information

Advanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland

Advanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland EnviroInfo 2004 (Geneva) Sh@ring EnviroInfo 2004 Advanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland Mikhail Kanevski 1, Michel Maignan 1

More information

In-Flight Engine Diagnostics and Prognostics Using A Stochastic-Neuro-Fuzzy Inference System

In-Flight Engine Diagnostics and Prognostics Using A Stochastic-Neuro-Fuzzy Inference System In-Flight Engine Diagnostics and Prognostics Using A Stochastic-Neuro-Fuzzy Inference System Dan M. Ghiocel & Joshua Altmann STI Technologies, Rochester, New York, USA Keywords: reliability, stochastic

More information