STAT Section 2.1: Basic Inference. Basic Definitions

Similar documents
Random variables, Expectation, Mean and Variance. Slides are adapted from STAT414 course at PennState

The bootstrap. Patrick Breheny. December 6. The empirical distribution function The bootstrap

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

Chapter 2: Resampling Maarten Jansen

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

ST745: Survival Analysis: Nonparametric methods

Chapter 1 - Lecture 3 Measures of Location

Bootstrap tests. Patrick Breheny. October 11. Bootstrap vs. permutation tests Testing for equality of location

1 Probability Distributions

Introduction to Statistical Analysis

Permutation Tests. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods

The Nonparametric Bootstrap

STAT Chapter 9: Two-Sample Problems. Paired Differences (Section 9.3)

Probability measures A probability measure, P, is a real valued function from the collection of possible events so that the following

Nonparametric Inference via Bootstrapping the Debiased Estimator

Finite Population Correction Methods

11 Survival Analysis and Empirical Likelihood

Math438 Actuarial Probability

Lecture 2. October 21, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

A general mixed model approach for spatio-temporal regression data

Unit 14: Nonparametric Statistical Methods

Non-parametric methods

What to do today (Nov 22, 2018)?

Midterm 1 and 2 results

Random Numbers and Simulation

9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career.

The Chi-Square Distributions

Review of Statistics 101

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Non-parametric Inference and Resampling

Lecture 2: CDF and EDF

Exam details. Final Review Session. Things to Review

Hypothesis Testing with the Bootstrap. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods

STAT100 Elementary Statistics and Probability

The Chi-Square Distributions

Frequency Distribution Cross-Tabulation

NAG Library Chapter Introduction. G08 Nonparametric Statistics

STAT 6385 Survey of Nonparametric Statistics. Order Statistics, EDF and Censoring

Estimation of Quantiles

Modeling Uncertainty in the Earth Sciences Jef Caers Stanford University

Continuous RVs. 1. Suppose a random variable X has the following probability density function: π, zero otherwise. f ( x ) = sin x, 0 < x < 2

Political Science 236 Hypothesis Testing: Review and Bootstrapping

The assumptions are needed to give us... valid standard errors valid confidence intervals valid hypothesis tests and p-values

Chapter 8 of Devore , H 1 :

Analytical Bootstrap Methods for Censored Data

Semester , Example Exam 1

Interval estimation. October 3, Basic ideas CLT and CI CI for a population mean CI for a population proportion CI for a Normal mean

ADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes

Lecture Slides. Elementary Statistics. by Mario F. Triola. and the Triola Statistics Series

Last Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics

Answers and expectations

Lecture Slides. Section 13-1 Overview. Elementary Statistics Tenth Edition. Chapter 13 Nonparametric Statistics. by Mario F.

Units. Exploratory Data Analysis. Variables. Student Data

18.05 Practice Final Exam

Introduction to Statistics

Advanced Statistics II: Non Parametric Tests

The exact bootstrap method shown on the example of the mean and variance estimation

Intro to Parametric & Nonparametric Statistics

Nonparametric tests, Bootstrapping

There are two basic kinds of random variables continuous and discrete.

Bootstrap confidence levels for phylogenetic trees B. Efron, E. Halloran, and S. Holmes, 1996

Chapter 6. Estimation of Confidence Intervals for Nodal Maximum Power Consumption per Customer

STAT 479: Short Term Actuarial Models

Empirical Likelihood

Estimating a population mean

Hypothesis testing, part 2. With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal

Distribution Fitting (Censored Data)

STAT331. Cox s Proportional Hazards Model

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

Chapter 2 Solutions Page 15 of 28

Lecture 1: Random number generation, permutation test, and the bootstrap. August 25, 2016

Lecture 30. DATA 8 Summer Regression Inference

L09. PARTICLE FILTERING. NA568 Mobile Robotics: Methods & Algorithms

Post-exam 2 practice questions 18.05, Spring 2014

Supplementary Materials for Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach

Web-based Supplementary Material for. Dependence Calibration in Conditional Copulas: A Nonparametric Approach

Introduction to Mobile Robotics Bayes Filter Particle Filter and Monte Carlo Localization

Stat 704 Data Analysis I Probability Review

Smooth nonparametric estimation of a quantile function under right censoring using beta kernels

Diagnostics. Gad Kimmel

NAG Library Chapter Introduction. g08 Nonparametric Statistics

Bootstrap Confidence Intervals

18.05 Final Exam. Good luck! Name. No calculators. Number of problems 16 concept questions, 16 problems, 21 pages

Analysing geoadditive regression data: a mixed model approach

Review for Exam #1. Chapter 1. The Nature of Data. Definitions. Population. Sample. Quantitative data. Qualitative (attribute) data

Identify the scale of measurement most appropriate for each of the following variables. (Use A = nominal, B = ordinal, C = interval, D = ratio.

11. Bootstrap Methods

Z score indicates how far a raw score deviates from the sample mean in SD units. score Mean % Lower Bound

Bootstrap & Confidence/Prediction intervals

Introduction to Statistical Inference Lecture 8: Linear regression, Tests and confidence intervals

Unit 2. Describing Data: Numerical

Introduction to Basic Statistics Version 2

STAT Chapter 8: Hypothesis Tests

Comparing latent inequality with ordinal health data

University of California San Diego and Stanford University and

Variable inspection plans for continuous populations with unknown short tail distributions

1 Glivenko-Cantelli type theorems

Definition 1.1 (Parametric family of distributions) A parametric distribution is a set of distribution functions, each of which is determined by speci

Continuous RVs. 1. Suppose a random variable X has the following probability density function: π, zero otherwise. f ( x ) = sin x, 0 < x < 2

Transcription:

STAT 518 --- Section 2.1: Basic Inference Basic Definitions Population: The collection of all the individuals of interest. This collection may be or even. Sample: A collection of elements of the population. Suppose our population consists of a finite number (say, N) of elements. Random Sample: A sample of size n from a finite population such that each of the possible samples of size n was Another definition: Random Sample: A sample of size n forming a sequence of Note these definitions are equivalent only if the elements are drawn from the population. If the population size is very large, whether the sampling was done with or without replacement makes little practical difference.

Multivariate Data Sometimes each individual may have more than one variable measured on it. Each observation is then a multivariate random variable (or ) Example: If the weight and height of a sample of 8 people are measured, our multivariate data are: If the sample is random, then the components Y i1 and Y i2 might not be independent, but the vectors X 1, X 2,, X 8 will still be independent and identically distributed. That is, knowledge of the value of X 1, say, does not alter the probability distribution of X 2.

Measurement Scales If a variable simply places an individual into one of several (unordered) categories, the variable is measured on a scale. If the variable is categorical but the categories have a meaningful ordering, the variable is on the scale. If the variable is numerical and the value of zero is arbitrary rather than meaningful, then the variable is on the scale. For interval data, the interval (difference) between two values is meaningful, but ratios between two values are not meaningful. If the variable is numerical and there is a meaningful zero, the variable is on the scale.

With ratio measurements, the ratio between two values has meaning. Weaker ------------------------------------ Stronger Most classical parametric methods require the scale of measurement of the data to be interval (or stronger). Some nonparametric methods require ordinal (or stronger) data; others can work for data on any scale. A parameter is a characteristic of a population. Typically a parameter cannot be calculated from sample data. A statistic is a function of random variables. Given the data, we can calculate the value of a statistic. Examples of statistics:

Order Statistics The k-th order statistic for a sample X 1, X 2,, X n is denoted X (k) and is the k-th smallest value in the sample. The values X (1) X (2) X (n) are called the ordered random sample. Example: If our sample is: 14, 7, 9, 2, 16, 18 then X (3) = Section 2.2: Estimation Often we use a statistic to estimate some aspect of a population of interest. A statistic used to estimate is called an estimator. Familiar The sample mean: The sample variance: The sample standard deviation:

These are point estimates (single numbers). An interval estimate (confidence interval) is an interval of numbers that is designed to contain the parameter value. A 95% confidence interval is constructed via a formula that has 0.95 probability (over repeated samples) of containing the true parameter value. Familiar large-sample formula for CI for : Some Less Familiar Estimators The cumulative distribution function (c.d.f.) of a random variable is denoted by F(x): F(x) = P(X < x) x This is f ( t) dt when X is a continuous r.v. Example: If X is a normal variable with mean 100, its c.d.f. F(x) should look like:

Sometimes we do not know the distribution of our variable of interest. The empirical distribution function (e.d.f.) is an estimator of the true c.d.f. it can be calculated from the sample data. Example: Suppose heights of adult females have normal distribution with mean 65 inches and standard deviation 2.5 inches. The c.d.f. of this distribution is: F(x) 0.0 0.2 0.4 0.6 0.8 1.0 60 65 70 Now suppose we do NOT know the true height distribution. We randomly sample 5 females and measure their heights as: 69.3, 66.3, 62.6, 62.9, 67.4 x e.d.f.:

The survival function is defined as 1 F(x), which is the probability that the random variable takes a value greater than x. This is useful in reliability/survival analysis, when it is the probability of the item surviving past time x. The Kaplan-Meier estimator (p. 89-91) is a way to estimate the survival function when the survival time is observed for only some of the data values. The Bootstrap The nonparametric bootstrap is a method of estimating characteristics (like expected values and standard errors) of summary statistics. This is especially useful when the true population distribution is unknown. The nonparametric bootstrap is based on the e.d.f. rather than the true (and perhaps unknown) c.d.f. Method: Resample data (randomly select n values from the original sample, with replacement) m times. These bootstrap samples together mimic the population. For each of the m bootstrap samples, calculate the statistic of interest.

These m values will approximate the sampling distribution. From these bootstrap samples, we can estimate the: (1) expected value of the statistic (2) standard error of the statistic (3) confidence interval of a corresponding parameter Example: We wish to estimate the 85 th percentile of the population of BMI measurements of SC high schoolers. We take a random sample of 20 SC high school students and measure their BMI. See code on course web page for bootstrap computations: