GEOS 36501/EVOL January 2012 Page 1 of 23
III. Sampling

1 Overview of Sampling, Error, Bias
1.1 Biased vs. random sampling
1.2 Biased vs. unbiased statistic (or estimator)
1.3 Precision vs. accuracy

2 Error Estimates With Assumed Sampling Distribution
2.1 Standard Error: Standard deviation of the distribution of sample statistics that would result from an infinite number of trials of drawing a sample from the underlying probability distribution and calculating the sample statistic.
2.2 In practice we generally do not estimate error by repeated sampling from the underlying distribution (expensive and time-consuming), although there are exceptions.
2.3 Approximations based on sample distribution (from Sokal and Rohlf):
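As an aside on 2.1 and 2.2, the definition of the standard error can be illustrated by brute-force simulation in a case where the underlying distribution is known. The sketch below is only an illustration: it assumes a standard normal population, a sample size of 25, and 10000 trials (all hypothetical choices), and compares the simulated standard error of the mean with the familiar approximation sigma/sqrt(n).

```r
# Sketch: standard error of the mean by repeated sampling from a KNOWN distribution.
# Assumptions (not from the notes): normal population, mean 0, sd 1, n = 25.
set.seed(1)
n <- 25          # size of each sample
ntrials <- 10000 # number of repeated samples (stand-in for "infinite")
means <- replicate(ntrials, mean(rnorm(n, mean = 0, sd = 1)))
SE.sim <- sd(means)       # SD of the distribution of sample means
SE.theory <- 1 / sqrt(n)  # sigma / sqrt(n)
print(c(simulated = SE.sim, theoretical = SE.theory))
```

The two values should agree closely; in real problems the population is unknown, which is what motivates the approximations and the bootstrap.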
2.4 Limitations
    Many approximation formulae make assumptions about the shape of the distribution and the sample size.
    We may be interested in a novel statistic or one whose sampling distribution is not well characterized.

3 Bootstrap Error Estimates
3.1 Estimate standard error by resampling from the single sample we have.
3.2 This approach uses sampling with replacement from the observed sample to simulate sampling without replacement from the underlying distribution.
3.3 Procedure
    Start with the observed sample of size $n$ and the observed sample statistic; call it $Z$.
    Randomly pick a sample of size $n$, with replacement, from the observed sample.
    Calculate the sample statistic of interest on this random sample; call it $Z_{boot}$.
    Repeat many times (generally hundreds to thousands, ideally until the estimate of SE stabilizes).
    Calculate the standard deviation of the $Z_{boot}$ values. This is an estimate of the standard error of the observed sample statistic $Z$: $\mathrm{SD}(Z_{boot}) \approx \mathrm{SE}(Z)$.
3.4 Simple (but not necessarily most useful) example: trimmed mean
    Define the p% trimmed mean as the mean of the sample with the p% lowest and p% highest observations discarded. (The idea is to try to reduce the effect of outliers.)
    Suppose the data consist of 10 (ordered) observations: 1, 2, 3, 4, 8, 10, 12, 15, 20, 30.
    Let the 20% trimmed mean (two observations discarded from each end) be denoted $Z$. Then $Z = (3 + 4 + 8 + 10 + 12 + 15)/6 = 8.67$.
    R code to estimate SE(Z)

        #define function: mean of x after discarding the ntrim lowest and ntrim highest values
        trim.mean<-function(x,ntrim){
          n<-length(x)
          xtmp<-sort(x)
          return(mean(xtmp[(ntrim+1):(n-ntrim)]))
        }
        data<-c(1,2,3,4,8,10,12,15,20,30)  #specify data
        n<-length(data)
        ntrim<-2                            #specify number to trim from each side
        Zobs<-trim.mean(data,ntrim)         #get observed value
        nrep<-1000                          #specify number of bootstrap replicates (value missing in source; 1000 is an assumed value)
        Zboot<-rep(NA,nrep)                 #assign memory
        for (i in 1:nrep)                   #get bootstrap replicates
          Zboot[i]<-trim.mean(sample(data,n,replace=TRUE),ntrim)
        SE<-sd(Zboot)                       #calculate bootstrap std. error
        hist(Zboot,breaks=50)               #plot histogram of results

        #alternative code, without loops
        DATA<-matrix(sample(data,nrep*n,replace=TRUE),n,nrep) #each column is a bootstrap replicate
        Zboot<-apply(DATA,2,trim.mean,ntrim)
        SE<-sd(Zboot)

    This yields $Z_{obs} = 8.67$ and $\mathrm{SE}(Z) \approx 3.1$.

    [Figure: histogram of Zboot; x-axis Zboot, y-axis Frequency]
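The procedure says to repeat "ideally until estimate of SE stabilizes". One way to check this, sketched below, is to compute the bootstrap SE for increasing numbers of replicates and see when it stops changing appreciably. The helper boot.se and the particular replicate counts are hypothetical choices for illustration; the data and the trimming are those of the trimmed-mean example.

```r
# Sketch: watch the bootstrap SE of the trimmed mean settle down as nrep grows.
set.seed(42)
trim.mean <- function(x, ntrim) {
  xs <- sort(x)
  mean(xs[(ntrim + 1):(length(xs) - ntrim)])
}
data <- c(1, 2, 3, 4, 8, 10, 12, 15, 20, 30)

# boot.se: hypothetical helper returning the bootstrap SE for a given nrep
boot.se <- function(x, nrep, ntrim = 2) {
  Zboot <- replicate(nrep, trim.mean(sample(x, length(x), replace = TRUE), ntrim))
  sd(Zboot)
}

nreps <- c(50, 200, 1000, 5000)  # assumed replicate counts to compare
SEs <- sapply(nreps, function(k) boot.se(data, k))
print(data.frame(nrep = nreps, SE = round(SEs, 2)))
```

With few replicates the estimate is noticeably noisy; by a few thousand it typically changes only in the second decimal place.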
3.5 Useful R function: sample(x,n,replace=TRUE [or FALSE]) returns a random sample of size n from the vector x with or without replacement.
3.6 To sample from array X so that the variables (columns) stay together:
        nr<-dim(X)[1]                   #get number of rows
        i<-sample(1:nr,n,replace=TRUE)  #or replace=FALSE; returns vector of n integers sampled on [1,nr]
        XSAMP<-X[i,]

4 Parametric bootstrap
4.1 Take the observed sample and estimate the relevant parameter from it.
4.2 Resample from a parametric distribution with parameter equal to the sample estimate (rather than resampling from the observed distribution).
4.3 This approach can also be applied to more complicated situations: for example, simulating a process with parameters estimated from data.
    We'll do lots of this later...
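A minimal sketch of the parametric bootstrap follows. Everything specific in it is an assumption made for illustration and is not from the notes: the ten observations are invented, the data are assumed to be exponentially distributed, and the statistic of interest is taken to be the sample median.

```r
# Sketch of a parametric bootstrap for the SE of the sample median.
# Assumptions: exponential model; rate estimated as 1/mean; data are hypothetical.
set.seed(7)
x <- c(0.3, 0.9, 1.4, 0.2, 2.8, 0.6, 1.1, 4.0, 0.5, 1.7)  # hypothetical observations
n <- length(x)
rate.hat <- 1 / mean(x)  # step 4.1: estimate the parameter from the observed sample
nrep <- 2000             # assumed number of bootstrap replicates
# step 4.2: resample from the fitted distribution, not from the observed values
Zboot <- replicate(nrep, median(rexp(n, rate = rate.hat)))
SE.param <- sd(Zboot)    # parametric bootstrap standard error
print(c(rate.hat = rate.hat, median.obs = median(x), SE = SE.param))
```

The only change from the nonparametric bootstrap is the resampling line: rexp() with the fitted rate replaces sample() on the observed data.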
5 Examples of Finite-sample Bias (sample-size bias)
5.1 Sample variance
    $\sum (x - \bar{x})^2 / n$ is biased. This is systematically too low, which makes sense since it is based on squared deviations from the sample mean, and the sample mean is the value that minimizes that sum.
    $\sum (x - \bar{x})^2 / (n - 1)$ is unbiased.
5.2 Number of taxa
    Rarefaction method (from Raup 1975)
        Abundance of species $i$ is $N_i$; $N = \sum N_i$.
        Consider a particular species, $i$.
        $\binom{N - N_i}{n}$ is the number of ways of drawing the non-$i$ individuals in a sample of $n$.
        $\binom{N}{n}$ is the number of ways of drawing a sample of $n$ from all individuals.
        Therefore, the ratio of these two is the probability of not drawing any individuals of species $i$.
        Therefore 1 minus this ratio is the probability of drawing at least one individual of species $i$.
        So the expected number of species is just the sum of this probability, calculated for each species in turn: $\sum_i \left[ 1 - \binom{N - N_i}{n} / \binom{N}{n} \right]$.
    Caveats
        Rarefaction is for interpolation rather than extrapolation.
        Collecting curves vs. rarefaction curves.
        Apparent leveling off of curves does not imply that nearly everything has been found (only that you're unlikely to find it with modest effort).
        Curves are affected by factors other than sample size (sampling method, taxonomic treatment, size of geographic area, etc.).
        Crossing of rarefaction curves can make interpretation difficult.
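The rarefaction expectation described above can be sketched in a few lines of R. The function name rarefy.expected and the abundance vector are hypothetical; the calculation is the sum over species of 1 minus the ratio of the two binomial coefficients, done on the log scale with lchoose() so that large N does not overflow.

```r
# Sketch: expected number of species in a random subsample of n individuals.
rarefy.expected <- function(Ni, n) {
  N <- sum(Ni)                 # total number of individuals
  stopifnot(n >= 0, n <= N)
  # probability that species i is absent from a sample of n
  p.absent <- exp(lchoose(N - Ni, n) - lchoose(N, n))
  sum(1 - p.absent)            # sum of probabilities of being present
}

abund <- c(50, 30, 10, 5, 3, 1, 1)      # hypothetical abundances N_i (N = 100)
curve.n <- c(1, 5, 10, 25, 50, 100)     # subsample sizes at which to evaluate
ES <- sapply(curve.n, function(n) rarefy.expected(abund, n))
print(data.frame(n = curve.n, expected.species = round(ES, 2)))
```

Two sanity checks follow from the formula: a sample of one individual contains exactly one species, and a sample of all N individuals contains every species.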
    Examples of application of taxonomic rarefaction (Raup 1975; Raup and Schopf 1978)
        This example suggests that the increase in observed family diversity in post-Paleozoic echinoids cannot be accounted for by an increase in the number of species sampled.
        This example suggests that much of the variation in the number of observed echinoid orders is consistent with differences in the number of sampled species. (But does this mean that's really all that is going on?!)
    Interpretation of taxonomic rarefaction curves is not entirely straightforward. Sampling standardization to be treated in more detail later.
5.3 Range
    Example: Range of samples from normal distribution
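The dependence of the range on sample size can be seen in a short simulation. This is a sketch under assumed settings (standard normal population, the listed sample sizes, 2000 replicates per size), not a reproduction of the figures in the notes.

```r
# Sketch: mean range of samples from a standard normal, as a function of sample size.
set.seed(3)
ns <- c(5, 10, 50, 100, 500)  # assumed sample sizes
nrep <- 2000                  # assumed number of simulated samples per size
mean.range <- sapply(ns, function(n)
  mean(replicate(nrep, diff(range(rnorm(n))))))
print(data.frame(n = ns, mean.range = round(mean.range, 2)))
```

The mean range keeps growing with n even though the underlying variance is fixed, which is the sample-size bias at issue here.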
    Example: Test for nonrandomness of sampling with respect to morphology (Foote 1997, Paleobiology 23:181)
    Correction in general case via rarefaction (random subsampling at controlled sample size) (Foote 1992, Paleobiology 18:1)
        Caveat: Range at standardized sample size may not convey any information that isn't conveyed by sample variance.
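A minimal sketch of rarefying the range: subsample the larger sample down to the size of the smaller one, many times, and compare ranges at that common size. The two samples, the helper rarefied.range, and the replicate count are hypothetical; this illustrates the general idea of random subsampling at a controlled sample size and is not the specific procedure of the cited papers.

```r
# Sketch: compare ranges of two samples at a standardized sample size.
set.seed(11)
big <- rnorm(200)    # hypothetical large sample
small <- rnorm(20)   # hypothetical small sample from the same distribution

# rarefied.range: mean and SD of the range over random subsamples of size n
rarefied.range <- function(x, n, nrep = 1000) {
  r <- replicate(nrep, diff(range(sample(x, n, replace = FALSE))))
  c(mean = mean(r), sd = sd(r))
}

raw <- c(big = diff(range(big)), small = diff(range(small)))
std <- rarefied.range(big, length(small))  # big sample rarefied to n = 20
print(round(raw, 2))
print(round(std, 2))
```

The raw range of the large sample exceeds that of the small one mainly because of sample size; the rarefied range is the fairer comparison.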
6 Extreme value statistics
6.1 Introduction to problem
    Previous look at standard errors considered the sampling distribution of quantities such as the mean.
    We may also be interested in the distribution of extremes: for example, how is the largest of $n$ observations distributed, or the second smallest, etc.?
    Applications: earthquakes, floods, etc.; evolutionary constraints
6.2 Probability of number of observations exceeding some value, if distribution known
    $\Pr(X > x) = 1 - F(x)$, where $F(x)$ is the cumulative distribution.
    If there are $N$ observations, then the probability that exactly $k$ of them exceed some value $x$ is given by a simple binomial: $\binom{N}{k} [1 - F(x)]^k F(x)^{N-k}$
    Example: normal with $N = 10$, $x = 0.67$, and $k = 3$: $F(0.67) = 0.75$, so the probability $= \binom{10}{3} (0.25)^3 (0.75)^7 \approx 0.25$.
    Future observations
        Suppose we have $n_1$ past observations ranked from $m = 1$ (largest) to $m = n_1$ (smallest), and we take $n_2$ future observations. What is the probability that exactly $k$ of the $n_2$ observations will exceed the $m$th value from the first set of $n_1$ observations?
        Simply find $F(x)$ corresponding to the $m$th value and plug into the previous binomial equation.
        Clearly this works only if we know the distribution.
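The binomial calculation in 6.2 can be checked directly in R with pnorm() and dbinom(). The sketch below uses the values of the example (standard normal, N = 10, x = 0.67, k = 3) and also evaluates the binomial expression by hand for comparison.

```r
# Sketch: probability that exactly k of N normal observations exceed x.
N <- 10; x <- 0.67; k <- 3
p.exceed <- 1 - pnorm(x)                      # 1 - F(x), about 0.25
prob <- dbinom(k, size = N, prob = p.exceed)  # built-in binomial probability
prob.by.hand <- choose(N, k) * p.exceed^k * (1 - p.exceed)^(N - k)
print(c(p.exceed = p.exceed, prob = prob, by.hand = prob.by.hand))
```

For the future-observations case, the same call applies with size set to the number of future observations and prob set to 1 - F(x) evaluated at the $m$th past value.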
6.3 Probability of number of observations exceeding some value, even if distribution is not known
    General expressions:
    Derivation: See Gumbel pp
    Intuitive explanation for insensitivity to distribution: A given number of points should cover a given proportion of the cumulative distribution, regardless of the shape of the distribution (provided that it is continuous).
    Example (table from Gumbel): Note symmetry in table. The probability of x exceedances above the largest is the same as the probability of x exceedances below the lowest, etc.
    Application to crinoid evolution (Foote 1994)
    Relationship to theory of records
        Let there be $n_1$ past trials and $n_2$ future trials. What is the probability that the record set ($m = 1$) by the first set of trials will stand through the second set (i.e. $x = 0$)? This is $w(0)$.
        Now, suppose we let $n_1 = n_2$; then we have:
        $w(x) = \frac{\binom{n_1}{m} \, m \, \binom{n_2}{x}}{(n_1 + n_2) \binom{n_1 + n_2 - 1}{x + m - 1}}$,
        which, for $n_1 = n_2$, $m = 1$, and $x = 0$, gives
        $w(0) = \frac{\binom{n_1}{1} \binom{n_1}{0}}{(2 n_1) \binom{2 n_1 - 1}{0}}$,
        which is equal to $\frac{1}{2}$.
        What is the expected number of exceedances above the past record? $E(x) = \frac{m n_2}{n_1 + 1} = \frac{n_1}{n_1 + 1}$ ($\approx 1$ for large $n_1$).
        Thus, for athletic contests, if all trials reflect the same underlying pool of talent, equipment, etc., the waiting time between successive records should progressively double.
        Likewise for discoveries of the largest dinosaur, oldest primate, etc.
        Deviations suggest a change in rules or nonrandom searching.
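The record probabilities can be checked numerically. The sketch below assumes that the distribution-free exceedance probability has the form w(x) = m C(n1,m) C(n2,x) / [(n1+n2) C(n1+n2-1, x+m-1)], where C(a,b) is a binomial coefficient; the function name w and the choice n1 = n2 = 20 are hypothetical.

```r
# Sketch: w(x) = probability that exactly x of n2 future observations exceed
# the m-th largest of n1 past observations (formula assumed as stated above).
w <- function(x, n1, n2, m = 1) {
  m * choose(n1, m) * choose(n2, x) /
    ((n1 + n2) * choose(n1 + n2 - 1, x + m - 1))
}

n1 <- 20; n2 <- 20                 # hypothetical numbers of past and future trials
probs <- w(0:n2, n1, n2, m = 1)    # distribution of the number of exceedances
total <- sum(probs)                # should be 1 for a proper distribution
w0 <- probs[1]                     # probability that the old record stands
Ex <- sum((0:n2) * probs)          # expected number of exceedances
print(c(total = total, w0 = w0, Ex = Ex, m.n2.over.n1.plus.1 = n2 / (n1 + 1)))
```

With equal numbers of past and future trials, w0 comes out as one half regardless of n1, which is the basis for the doubling of waiting times between records.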
More informationAssessing Congruence Among Ultrametric Distance Matrices
Journal of Classification 26:103-117 (2009) DOI: 10.1007/s00357-009-9028-x Assessing Congruence Among Ultrametric Distance Matrices Véronique Campbell Université de Montréal, Canada Pierre Legendre Université
More informationA Little Stats Won t Hurt You
A Little Stats Won t Hurt You Nate Derby Statis Pro Data Analytics Seattle, WA, USA Edmonton SAS Users Group, 11/13/09 Nate Derby A Little Stats Won t Hurt You 1 / 71 Outline Introduction 1 Introduction
More informationBayesian Classifiers and Probability Estimation. Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington
Bayesian Classifiers and Probability Estimation Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington 1 Data Space Suppose that we have a classification problem The
More informationMachine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring /
Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1 / Agenda Combining Classifiers Empirical view Theoretical
More information1 Probability Distributions
1 Probability Distributions In the chapter about descriptive statistics sample data were discussed, and tools introduced for describing the samples with numbers as well as with graphs. In this chapter
More informationOPIM 303, Managerial Statistics H Guy Williams, 2006
OPIM 303 Lecture 6 Page 1 The height of the uniform distribution is given by 1 b a Being a Continuous distribution the probability of an exact event is zero: 2 0 There is an infinite number of points in
More informationMath 308 Discussion Problems #4 Chapter 4 (after 4.3)
Math 38 Discussion Problems #4 Chapter 4 (after 4.3) () (after 4.) Let S be a plane in R 3 passing through the origin, so that S is a two-dimensional subspace of R 3. Say that a linear transformation T
More informationEmpirical Evaluation (Ch 5)
Empirical Evaluation (Ch 5) how accurate is a hypothesis/model/dec.tree? given 2 hypotheses, which is better? accuracy on training set is biased error: error train (h) = #misclassifications/ S train error
More informationWooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics
Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).
More informationRobert Collins CSE586, PSU Intro to Sampling Methods
Intro to Sampling Methods CSE586 Computer Vision II Penn State Univ Topics to be Covered Monte Carlo Integration Sampling and Expected Values Inverse Transform Sampling (CDF) Ancestral Sampling Rejection
More informationSociology 6Z03 Review II
Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability
More informationThree Monte Carlo Models. of Faunal Evolution PUBLISHED BY NATURAL HISTORY THE AMERICAN MUSEUM SYDNEY ANDERSON AND CHARLES S.
AMERICAN MUSEUM Notltates PUBLISHED BY THE AMERICAN MUSEUM NATURAL HISTORY OF CENTRAL PARK WEST AT 79TH STREET NEW YORK, N.Y. 10024 U.S.A. NUMBER 2563 JANUARY 29, 1975 SYDNEY ANDERSON AND CHARLES S. ANDERSON
More information