Understanding Samples

Size: px
Start display at page:

Download "Understanding Samples"

Transcription

1 1 Will Moroe CS 109 Samplig ad Bootstrappig Lecture Notes #17 August 2, 2017 Based o a hadout by Chris Piech I this chapter we are goig to talk about statistics calculated o samples from a populatio. We are the goig to talk about probability claims that we ca make with respect to the origial populatio a cetral requiremet for most scietific disciplies. Let s say you are the kig of Bhuta, ad you wat to kow the average happiess of the people i your coutry. You ca t ask every sigle perso, but you could ask a radom subsample. I this ext sectio we will cosider pricipled claims that you ca make based o a subsample. Assume we radomly sample 200 Bhutaese people ad ask them about their happiess, o a scale of 1 to 100 (happiesses? smiles?). Our data looks like this: 72, 85,..., 71. You ca also thik of it as a collectio of = 200 I.I.D. (idepedet, idetically distributed) radom variables X 1, X 2,..., X. Uderstadig Samples The idea behid samplig is simple, but the details ad the mathematical otatio ca be complicated. Here is a picture to show you all of the ideas ivolved: The theory is that there is some large populatio (such as the 774,000 people who live i Bhuta). We collect a sample of people at radom, where each perso i the populatio is equally likely to be i our sample. From each perso we record oe umber (e.g., their reported happiess). We are goig to call the umber from the i-th perso we sampled. Oe way to visualize your samples X 1, X 2,..., X is to make a histogram of their values. We make the assumptio that all of our s are idetically distributed. That meas that we are assumig there is a sigle uderlyig distributio F that we drew our samples from. Recall that a distributio for discrete radom variables should defie a probability mass fuctio.

2 2 Estimatig Mea ad Variace from Samples We assume that the data we look at are I.I.D. from the same uderlyig distributio (F) with a true mea (µ) ad a true variace (σ 2 ). Sice we ca t talk to everyoe i Bhuta, we have to rely o our sample to estimate the mea ad variace. From our sample we ca calculate a sample mea ( X) ad a sample variace (S 2 ). These are the best guesses that we ca make about the true mea ad true variace. X = S 2 = ( X) 2 1 The first thig to kow about these estimates is that they are ubiased. Havig a ubiased estimate meas that if we were to repeat this samplig process may times, the expected value each of the estimate should be equal to the true value we are tryig to estimate. We will first prove that that is the case for X. E[ X] = 1 E = 1 E[ ] = 1 µ = 1 µ = µ The equatio for the sample mea seems related to our uderstadig of expectatio. The same could be said about sample variace except for the surprisig ( 1) i the deomiator of the equatio. Why ( 1)? That deomiator is ecessary to make sure that E[S 2 ] = σ 2. The proof for S 2 is a bit more ivolved; you do t have to remember this, but some people may be iterested i kowig it: E[S 2 ] ( X) 2 1 ( 1)E[S 2 ] ( X) 2 (( µ) + (µ X)) 2 ( µ) 2 + (µ X) ( µ)(µ X) ( µ) 2 + (µ X) 2 + 2(µ X) ( µ) ( µ) 2 + (µ X) 2 + 2(µ X)( X µ) ( µ) 2 (µ X) 2 [ ( µ) 2] E [ (µ X) 2] = σ 2 Var( X) = σ 2 σ2 = σ2 σ 2 = ( 1)σ 2

3 3 So E[S 2 ] = σ 2. The ituitio behid the proof is that sample variace calculates the distace of each sample to the sample mea, ot the true mea. The sample mea itself varies, ad we ca show that its variace is also related to the true variace. Variace of the Sample Mea We ow have estimates for mea ad variace that are ot biased that is, they are correct o average. However, the estimates chage depedig o the samples. How stable are they? The sample mea is computed as a average of radom variables. It takes o values probabilistically, which makes it a radom variable itself. We ca compute its variace: Var( X) = Var ( ) 2 1 = Var ( ) 2 ( ) 2 ( ) = Var( ) = σ 2 = σ 2 = σ2 This tells us that the variace of the sample mea is proportioal to the variace of the uderlyig distributio, but goes dow with the umber of samples. Stadard Error Kowig that the variace of the sample mea is small if the umber of samples is large is reassurig, but the expressio for the variace of the sample mea depeds o the true variace of the uderlyig distributio. What if we do t kow that true variace? What ca we say about the stability of our estimate of the mea, give oly the sample we took? We kow that S 2 is a ubiased estimator for the true variace. So oe reasoable thig to try is to substitute S 2 for σ 2 : Var( X) = σ2 S2 SD( X) = σ S sice S 2 is a ubiased estimate sice SD is the square root of Var That SD( X) formula has a special ame: it is called the stadard error, ad it is a commo way of reportig ucertaity of estimates of meas ( error bars ) i scietific papers. Let s say our sample of happiess has = 200 people, the sample mea is X = 83, ad the sample variace is S 2 = 450. We ca calculate the stadard error of our estimate of the mea to be S 1.5. Whe we report our results, we will say that the average happiess score i Bhuta is 83 ± 1.5, with variace 450. ( If you re woderig, S 2 has a variace too; it turs out it s equal to 1 E[(X µ) 4 ] 3 1 (σ2 ) 2). We wo t use that oe i CS 109.

4 4 Bootstrap The bootstrap is a statistical techique for uderstadig distributios of statistics. It was iveted here at Staford i 1979 whe mathematicias were just startig to uderstad how computers, ad computer simulatios, could be used to better uderstad probabilities. The first key isight is that if we had access to the uderlyig distributio (F), the aswerig almost ay questio we might have about how accurate our statistics are would become straightforward. For example, i the previous sectio we gave a formula for how you could calculate the sample variace from a sample of size. We kow that i expectatio our sample variace is equal to the true variace. But what if we wat to kow the probability that the true variace is withi a certai rage of the umber we calculated? That questio might soud dry, but it is critical to evaluatig scietific claims. If you kew the uderlyig distributio, F, you could simply repeat the experimet of drawig a sample of size from F, calculate the sample variace from our ew sample ad test what portio fell withi a certai rage. The ext isight behid bootstrappig is that the best estimate that we ca get for F is from our sample itself. The geeral algorithm looks like this: def bootstrap(sample): N = umber of elemets i sample pmf = estimate the uderlyig pmf from the sample stats = [] repeat 10,000 times: resample = draw N ew samples from the pmf stat = calculate your stat o the resample stats.apped(stat) stats ca ow be used to estimate the distributio of the stat Next week we will talk i much more detail about estimatig distributios from samples. For ow, the simplest way to estimate F (ad the oe we will use i this class) is to assume that P(X = k) is simply the fractio of times that k showed up i the sample. This set of probabilities defies a probability mass fuctio for a discrete radom variable, which we ll call ˆF, the hat idicatig that ˆF is a estimate of the probability distributio of F. This estimated distributio, formed from couts of samples, is sometimes called this empirical distributio. Bootstrappig is a reasoable thig to do because the sample you have is the best ad oly iformatio you have about what the uderlyig populatio distributio actually looks like. May samples will look quite like the populatio they came from. With this approach, we ca compute probabilities ad estimates ot just for the mea, but for ay statistic we wat. To calculate Var(S 2 ), for example, we could calculate S i 2 for each resample i, ad after 10,000 iteratios, we could calculate the sample variace of all the S i 2 s.

5 5 You might be woderig why the resample is the same size as the origial sample (). The aswer is that the variatio of the variatio of stat that you are calculatig could deped o the size of the sample (or the resample). To accurately estimate the distributio of the stat, we must use resamples of the same size. The bootstrap has strog theoretical guaratees, ad it is accepted by the scietific commuity. It breaks dow whe the uderlyig distributio has a log tail or if the samples are ot I.I.D. Example of p-value calculatio We are tryig to figure out if people are happier i Bhuta or i Nepal. We sample 1 = 200 idividuals i Bhuta ad 2 = 300 idividuals i Nepal ad ask them to rate their happiess o a scale from 1 to 10. We measure the sample meas for the two samples ad observe that people i our Nepal sample are slightly happier the differece betwee the Nepal sample mea ad the Bhuta sample mea is 0.5 poits o the happiess scale. Have we really show that people i Nepal are happier? Sample meas ca fluctuate. How do we kow that we did t just get that differece because of the radom differeces amog samples? There is t a rigorous, objective way to prove that the differece you discovered was t due to chace, or eve to give a probability that the differece was due to chace. It is possible, however, to give a probability for the reverse statemet: if the oly differece i the samples was due to chace, what would be the probability that we get a result just as extreme? The assumptio that the differece betwee the samples was due to chace is a example of a ull hypothesis. A ull hypothesis says that there is o relatioship betwee two measured pheomea or o differece betwee two groups. The probability we gave is kow as a p-value. So a p-value is the probability that, whe the ull hypothesis is true, the statistic measured would be equal to, or more extreme tha, tha the value you are reportig. I the case of comparig Nepal to Bhuta, the ull hypothesis is that there is o differece betwee the distributio of happiess i Bhuta ad Nepal. Whe you drew samples, Nepal had a mea that 0.5 poits larger tha Bhuta by chace. We ca use bootstrappig to calculate the p-value. First, we estimate the uderlyig distributio of the ull hypothesis uderlyig distributio, by makig a probability mass fuctio from all of our samples from Nepal ad all of our samples from Bhuta. def pvaluebootstrap(bhutasample, epalsample): N = size of bhutasample M = size of epalsample uiversalsample = combie bhutasamples ad epalsamples uiversalpmf = estimate the uderlyig pmf of uiversalsample cout = 0 repeat 10,000 times: bhutaresample = draw N ew samples from uiversalpmf

6 6 epalresample = draw M ew samples from uiversalpmf mubhuta = sample mea of bhutaresample munepal = sample mea of epalresample meadifferece = munepal - mubhuta if meadifferece > observeddifferece: cout += 1 pvalue = cout / 10,000 This is particularly ice because we ever had to assume the distributio that our samples came from had a particular form (e.g., we ever had to claim that happiess is ormally distributed). You might have heard of a t-test. That is aother way of calculatig p-values, but it makes the assumptios that samples are ormally distributed ad have the same variace. Nowadays, whe we have reasoable computer power, bootstrappig is a more versatile ad accurate tool.

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10 DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc. Chapter 22 Comparig Two Proportios Copyright 2010, 2007, 2004 Pearso Educatio, Ic. Comparig Two Proportios Read the first two paragraphs of pg 504. Comparisos betwee two percetages are much more commo

More information

Understanding Dissimilarity Among Samples

Understanding Dissimilarity Among Samples Aoucemets: Midterm is Wed. Review sheet is o class webpage (i the list of lectures) ad will be covered i discussio o Moday. Two sheets of otes are allowed, same rules as for the oe sheet last time. Office

More information

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc.

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc. Chapter 22 Comparig Two Proportios Copyright 2010 Pearso Educatio, Ic. Comparig Two Proportios Comparisos betwee two percetages are much more commo tha questios about isolated percetages. Ad they are more

More information

A quick activity - Central Limit Theorem and Proportions. Lecture 21: Testing Proportions. Results from the GSS. Statistics and the General Population

A quick activity - Central Limit Theorem and Proportions. Lecture 21: Testing Proportions. Results from the GSS. Statistics and the General Population A quick activity - Cetral Limit Theorem ad Proportios Lecture 21: Testig Proportios Statistics 10 Coli Rudel Flip a coi 30 times this is goig to get loud! Record the umber of heads you obtaied ad calculate

More information

Stat 421-SP2012 Interval Estimation Section

Stat 421-SP2012 Interval Estimation Section Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible

More information

STA Learning Objectives. Population Proportions. Module 10 Comparing Two Proportions. Upon completing this module, you should be able to:

STA Learning Objectives. Population Proportions. Module 10 Comparing Two Proportions. Upon completing this module, you should be able to: STA 2023 Module 10 Comparig Two Proportios Learig Objectives Upo completig this module, you should be able to: 1. Perform large-sample ifereces (hypothesis test ad cofidece itervals) to compare two populatio

More information

Agreement of CI and HT. Lecture 13 - Tests of Proportions. Example - Waiting Times

Agreement of CI and HT. Lecture 13 - Tests of Proportions. Example - Waiting Times Sigificace level vs. cofidece level Agreemet of CI ad HT Lecture 13 - Tests of Proportios Sta102 / BME102 Coli Rudel October 15, 2014 Cofidece itervals ad hypothesis tests (almost) always agree, as log

More information

Statistics 511 Additional Materials

Statistics 511 Additional Materials Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability

More information

STATS 200: Introduction to Statistical Inference. Lecture 1: Course introduction and polling

STATS 200: Introduction to Statistical Inference. Lecture 1: Course introduction and polling STATS 200: Itroductio to Statistical Iferece Lecture 1: Course itroductio ad pollig U.S. presidetial electio projectios by state (Source: fivethirtyeight.com, 25 September 2016) Pollig Let s try to uderstad

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Frequentist Inference

Frequentist Inference Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

Comparing your lab results with the others by one-way ANOVA

Comparing your lab results with the others by one-way ANOVA Comparig your lab results with the others by oe-way ANOVA You may have developed a ew test method ad i your method validatio process you would like to check the method s ruggedess by coductig a simple

More information

Confidence Intervals for the Population Proportion p

Confidence Intervals for the Population Proportion p Cofidece Itervals for the Populatio Proportio p The cocept of cofidece itervals for the populatio proportio p is the same as the oe for, the samplig distributio of the mea, x. The structure is idetical:

More information

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?

More information

STAT 350 Handout 19 Sampling Distribution, Central Limit Theorem (6.6)

STAT 350 Handout 19 Sampling Distribution, Central Limit Theorem (6.6) STAT 350 Hadout 9 Samplig Distributio, Cetral Limit Theorem (6.6) A radom sample is a sequece of radom variables X, X 2,, X that are idepedet ad idetically distributed. o This property is ofte abbreviated

More information

Median and IQR The median is the value which divides the ordered data values in half.

Median and IQR The median is the value which divides the ordered data values in half. STA 666 Fall 2007 Web-based Course Notes 4: Describig Distributios Numerically Numerical summaries for quatitative variables media ad iterquartile rage (IQR) 5-umber summary mea ad stadard deviatio Media

More information

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n. Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator

More information

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates.

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates. 5. Data, Estimates, ad Models: quatifyig the accuracy of estimates. 5. Estimatig a Normal Mea 5.2 The Distributio of the Normal Sample Mea 5.3 Normal data, cofidece iterval for, kow 5.4 Normal data, cofidece

More information

This is an introductory course in Analysis of Variance and Design of Experiments.

This is an introductory course in Analysis of Variance and Design of Experiments. 1 Notes for M 384E, Wedesday, Jauary 21, 2009 (Please ote: I will ot pass out hard-copy class otes i future classes. If there are writte class otes, they will be posted o the web by the ight before class

More information

Homework 5 Solutions

Homework 5 Solutions Homework 5 Solutios p329 # 12 No. To estimate the chace you eed the expected value ad stadard error. To do get the expected value you eed the average of the box ad to get the stadard error you eed the

More information

Simulation. Two Rule For Inverting A Distribution Function

Simulation. Two Rule For Inverting A Distribution Function Simulatio Two Rule For Ivertig A Distributio Fuctio Rule 1. If F(x) = u is costat o a iterval [x 1, x 2 ), the the uiform value u is mapped oto x 2 through the iversio process. Rule 2. If there is a jump

More information

The standard deviation of the mean

The standard deviation of the mean Physics 6C Fall 20 The stadard deviatio of the mea These otes provide some clarificatio o the distictio betwee the stadard deviatio ad the stadard deviatio of the mea.. The sample mea ad variace Cosider

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

Distribution of Random Samples & Limit theorems

Distribution of Random Samples & Limit theorems STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to

More information

Lecture 3. Properties of Summary Statistics: Sampling Distribution

Lecture 3. Properties of Summary Statistics: Sampling Distribution Lecture 3 Properties of Summary Statistics: Samplig Distributio Mai Theme How ca we use math to justify that our umerical summaries from the sample are good summaries of the populatio? Lecture Summary

More information

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. Comments:

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. Comments: Recall: STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS Commets:. So far we have estimates of the parameters! 0 ad!, but have o idea how good these estimates are. Assumptio: E(Y x)! 0 +! x (liear coditioal

More information

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22 CS 70 Discrete Mathematics for CS Sprig 2007 Luca Trevisa Lecture 22 Aother Importat Distributio The Geometric Distributio Questio: A biased coi with Heads probability p is tossed repeatedly util the first

More information

Parameter, Statistic and Random Samples

Parameter, Statistic and Random Samples Parameter, Statistic ad Radom Samples A parameter is a umber that describes the populatio. It is a fixed umber, but i practice we do ot kow its value. A statistic is a fuctio of the sample data, i.e.,

More information

Chapter 6 Sampling Distributions

Chapter 6 Sampling Distributions Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to

More information

Chapter 23: Inferences About Means

Chapter 23: Inferences About Means Chapter 23: Ifereces About Meas Eough Proportios! We ve spet the last two uits workig with proportios (or qualitative variables, at least) ow it s time to tur our attetios to quatitative variables. For

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

n outcome is (+1,+1, 1,..., 1). Let the r.v. X denote our position (relative to our starting point 0) after n moves. Thus X = X 1 + X 2 + +X n,

n outcome is (+1,+1, 1,..., 1). Let the r.v. X denote our position (relative to our starting point 0) after n moves. Thus X = X 1 + X 2 + +X n, CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 9 Variace Questio: At each time step, I flip a fair coi. If it comes up Heads, I walk oe step to the right; if it comes up Tails, I walk oe

More information

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1. Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio

More information

PRACTICE PROBLEMS FOR THE FINAL

PRACTICE PROBLEMS FOR THE FINAL PRACTICE PROBLEMS FOR THE FINAL Math 36Q Fall 25 Professor Hoh Below is a list of practice questios for the Fial Exam. I would suggest also goig over the practice problems ad exams for Exam ad Exam 2 to

More information

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen) Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................

More information

Chapter 8: STATISTICAL INTERVALS FOR A SINGLE SAMPLE. Part 3: Summary of CI for µ Confidence Interval for a Population Proportion p

Chapter 8: STATISTICAL INTERVALS FOR A SINGLE SAMPLE. Part 3: Summary of CI for µ Confidence Interval for a Population Proportion p Chapter 8: STATISTICAL INTERVALS FOR A SINGLE SAMPLE Part 3: Summary of CI for µ Cofidece Iterval for a Populatio Proportio p Sectio 8-4 Summary for creatig a 100(1-α)% CI for µ: Whe σ 2 is kow ad paret

More information

Parameter, Statistic and Random Samples

Parameter, Statistic and Random Samples Parameter, Statistic ad Radom Samples A parameter is a umber that describes the populatio. It is a fixed umber, but i practice we do ot kow its value. A statistic is a fuctio of the sample data, i.e.,

More information

Lecture 1 Probability and Statistics

Lecture 1 Probability and Statistics Wikipedia: Lecture 1 Probability ad Statistics Bejami Disraeli, British statesma ad literary figure (1804 1881): There are three kids of lies: lies, damed lies, ad statistics. popularized i US by Mark

More information

ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER / Statistics

ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER / Statistics ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER 1 018/019 DR. ANTHONY BROWN 8. Statistics 8.1. Measures of Cetre: Mea, Media ad Mode. If we have a series of umbers the

More information

Introduction There are two really interesting things to do in statistics.

Introduction There are two really interesting things to do in statistics. ECON 497 Lecture Notes E Page 1 of 1 Metropolita State Uiversity ECON 497: Research ad Forecastig Lecture Notes E: Samplig Distributios Itroductio There are two really iterestig thigs to do i statistics.

More information

Sampling Distributions, Z-Tests, Power

Sampling Distributions, Z-Tests, Power Samplig Distributios, Z-Tests, Power We draw ifereces about populatio parameters from sample statistics Sample proportio approximates populatio proportio Sample mea approximates populatio mea Sample variace

More information

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals 7-1 Chapter 4 Part I. Samplig Distributios ad Cofidece Itervals 1 7- Sectio 1. Samplig Distributio 7-3 Usig Statistics Statistical Iferece: Predict ad forecast values of populatio parameters... Test hypotheses

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

Chapter 8: Estimating with Confidence

Chapter 8: Estimating with Confidence Chapter 8: Estimatig with Cofidece Sectio 8.2 The Practice of Statistics, 4 th editio For AP* STARNES, YATES, MOORE Chapter 8 Estimatig with Cofidece 8.1 Cofidece Itervals: The Basics 8.2 8.3 Estimatig

More information

Lecture 1 Probability and Statistics

Lecture 1 Probability and Statistics Wikipedia: Lecture 1 Probability ad Statistics Bejami Disraeli, British statesma ad literary figure (1804 1881): There are three kids of lies: lies, damed lies, ad statistics. popularized i US by Mark

More information

Statisticians use the word population to refer the total number of (potential) observations under consideration

Statisticians use the word population to refer the total number of (potential) observations under consideration 6 Samplig Distributios Statisticias use the word populatio to refer the total umber of (potetial) observatios uder cosideratio The populatio is just the set of all possible outcomes i our sample space

More information

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight)

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight) Tests of Hypotheses Based o a Sigle Sample Devore Chapter Eight MATH-252-01: Probability ad Statistics II Sprig 2018 Cotets 1 Hypothesis Tests illustrated with z-tests 1 1.1 Overview of Hypothesis Testig..........

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

Economics Spring 2015

Economics Spring 2015 1 Ecoomics 400 -- Sprig 015 /17/015 pp. 30-38; Ch. 7.1.4-7. New Stata Assigmet ad ew MyStatlab assigmet, both due Feb 4th Midterm Exam Thursday Feb 6th, Chapters 1-7 of Groeber text ad all relevat lectures

More information

Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.

More information

II. Descriptive Statistics D. Linear Correlation and Regression. 1. Linear Correlation

II. Descriptive Statistics D. Linear Correlation and Regression. 1. Linear Correlation II. Descriptive Statistics D. Liear Correlatio ad Regressio I this sectio Liear Correlatio Cause ad Effect Liear Regressio 1. Liear Correlatio Quatifyig Liear Correlatio The Pearso product-momet correlatio

More information

Simple Random Sampling!

Simple Random Sampling! Simple Radom Samplig! Professor Ro Fricker! Naval Postgraduate School! Moterey, Califoria! Readig:! 3/26/13 Scheaffer et al. chapter 4! 1 Goals for this Lecture! Defie simple radom samplig (SRS) ad discuss

More information

(7 One- and Two-Sample Estimation Problem )

(7 One- and Two-Sample Estimation Problem ) 34 Stat Lecture Notes (7 Oe- ad Two-Sample Estimatio Problem ) ( Book*: Chapter 8,pg65) Probability& Statistics for Egieers & Scietists By Walpole, Myers, Myers, Ye Estimatio 1 ) ( ˆ S P i i Poit estimate:

More information

Section 9.2. Tests About a Population Proportion 12/17/2014. Carrying Out a Significance Test H A N T. Parameters & Hypothesis

Section 9.2. Tests About a Population Proportion 12/17/2014. Carrying Out a Significance Test H A N T. Parameters & Hypothesis Sectio 9.2 Tests About a Populatio Proportio P H A N T O M S Parameters Hypothesis Assess Coditios Name the Test Test Statistic (Calculate) Obtai P value Make a decisio State coclusio Sectio 9.2 Tests

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

AMS570 Lecture Notes #2

AMS570 Lecture Notes #2 AMS570 Lecture Notes # Review of Probability (cotiued) Probability distributios. () Biomial distributio Biomial Experimet: ) It cosists of trials ) Each trial results i of possible outcomes, S or F 3)

More information

SDS 321: Introduction to Probability and Statistics

SDS 321: Introduction to Probability and Statistics SDS 321: Itroductio to Probability ad Statistics Lecture 23: Cotiuous radom variables- Iequalities, CLT Puramrita Sarkar Departmet of Statistics ad Data Sciece The Uiversity of Texas at Austi www.cs.cmu.edu/

More information

Comparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading

Comparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading Topic 15 - Two Sample Iferece I STAT 511 Professor Bruce Craig Comparig Two Populatios Research ofte ivolves the compariso of two or more samples from differet populatios Graphical summaries provide visual

More information

1 Introduction to reducing variance in Monte Carlo simulations

1 Introduction to reducing variance in Monte Carlo simulations Copyright c 010 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a ukow mea µ = E(X) of a distributio by

More information

Discrete Mathematics for CS Spring 2005 Clancy/Wagner Notes 21. Some Important Distributions

Discrete Mathematics for CS Spring 2005 Clancy/Wagner Notes 21. Some Important Distributions CS 70 Discrete Mathematics for CS Sprig 2005 Clacy/Wager Notes 21 Some Importat Distributios Questio: A biased coi with Heads probability p is tossed repeatedly util the first Head appears. What is the

More information

BIOSTATS 640 Intermediate Biostatistics Frequently Asked Questions Topic 1 FAQ 1 Review of BIOSTATS 540 Introductory Biostatistics

BIOSTATS 640 Intermediate Biostatistics Frequently Asked Questions Topic 1 FAQ 1 Review of BIOSTATS 540 Introductory Biostatistics BIOTAT 640 Itermediate Biostatistics Frequetly Asked Questios Topic FAQ Review of BIOTAT 540 Itroductory Biostatistics. I m cofused about the jargo ad otatio, especially populatio versus sample. Could

More information

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Lecture 16

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Lecture 16 EECS 70 Discrete Mathematics ad Probability Theory Sprig 2014 Aat Sahai Lecture 16 Variace Questio: Let us retur oce agai to the questio of how may heads i a typical sequece of coi flips. Recall that we

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Aalysis ad Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasii/teachig.html Suhasii Subba Rao Review of testig: Example The admistrator of a ursig home wats to do a time ad motio

More information

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would

More information

MA131 - Analysis 1. Workbook 3 Sequences II

MA131 - Analysis 1. Workbook 3 Sequences II MA3 - Aalysis Workbook 3 Sequeces II Autum 2004 Cotets 2.8 Coverget Sequeces........................ 2.9 Algebra of Limits......................... 2 2.0 Further Useful Results........................

More information

Mathematical Notation Math Introduction to Applied Statistics

Mathematical Notation Math Introduction to Applied Statistics Mathematical Notatio Math 113 - Itroductio to Applied Statistics Name : Use Word or WordPerfect to recreate the followig documets. Each article is worth 10 poits ad ca be prited ad give to the istructor

More information

Variance of Discrete Random Variables Class 5, Jeremy Orloff and Jonathan Bloom

Variance of Discrete Random Variables Class 5, Jeremy Orloff and Jonathan Bloom Variace of Discrete Radom Variables Class 5, 18.05 Jeremy Orloff ad Joatha Bloom 1 Learig Goals 1. Be able to compute the variace ad stadard deviatio of a radom variable.. Uderstad that stadard deviatio

More information

Lecture 2: Concentration Bounds

Lecture 2: Concentration Bounds CSE 52: Desig ad Aalysis of Algorithms I Sprig 206 Lecture 2: Cocetratio Bouds Lecturer: Shaya Oveis Ghara March 30th Scribe: Syuzaa Sargsya Disclaimer: These otes have ot bee subjected to the usual scrutiy

More information

Sample Size Determination (Two or More Samples)

Sample Size Determination (Two or More Samples) Sample Sie Determiatio (Two or More Samples) STATGRAPHICS Rev. 963 Summary... Data Iput... Aalysis Summary... 5 Power Curve... 5 Calculatios... 6 Summary This procedure determies a suitable sample sie

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

A statistical method to determine sample size to estimate characteristic value of soil parameters

A statistical method to determine sample size to estimate characteristic value of soil parameters A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig

More information

Read through these prior to coming to the test and follow them when you take your test.

Read through these prior to coming to the test and follow them when you take your test. Math 143 Sprig 2012 Test 2 Iformatio 1 Test 2 will be give i class o Thursday April 5. Material Covered The test is cummulative, but will emphasize the recet material (Chapters 6 8, 10 11, ad Sectios 12.1

More information

Lecture 24 Floods and flood frequency

Lecture 24 Floods and flood frequency Lecture 4 Floods ad flood frequecy Oe of the thigs we wat to kow most about rivers is what s the probability that a flood of size will happe this year? I 100 years? There are two ways to do this empirically,

More information

Eco411 Lab: Central Limit Theorem, Normal Distribution, and Journey to Girl State

Eco411 Lab: Central Limit Theorem, Normal Distribution, and Journey to Girl State Eco411 Lab: Cetral Limit Theorem, Normal Distributio, ad Jourey to Girl State 1. Some studets may woder why the magic umber 1.96 or 2 (called critical values) is so importat i statistics. Where do they

More information

October 25, 2018 BIM 105 Probability and Statistics for Biomedical Engineers 1

October 25, 2018 BIM 105 Probability and Statistics for Biomedical Engineers 1 October 25, 2018 BIM 105 Probability ad Statistics for Biomedical Egieers 1 Populatio parameters ad Sample Statistics October 25, 2018 BIM 105 Probability ad Statistics for Biomedical Egieers 2 Ifereces

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5 CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would

More information

STAT 203 Chapter 18 Sampling Distribution Models

STAT 203 Chapter 18 Sampling Distribution Models STAT 203 Chapter 18 Samplig Distributio Models Populatio vs. sample, parameter vs. statistic Recall that a populatio cotais the etire collectio of idividuals that oe wats to study, ad a sample is a subset

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

Chapter 18 Summary Sampling Distribution Models

Chapter 18 Summary Sampling Distribution Models Uit 5 Itroductio to Iferece Chapter 18 Summary Samplig Distributio Models What have we leared? Sample proportios ad meas will vary from sample to sample that s samplig error (samplig variability). Samplig

More information

Power and Type II Error

Power and Type II Error Statistical Methods I (EXST 7005) Page 57 Power ad Type II Error Sice we do't actually kow the value of the true mea (or we would't be hypothesizig somethig else), we caot kow i practice the type II error

More information

STAT Homework 2 - Solutions

STAT Homework 2 - Solutions STAT-36700 Homework - Solutios Fall 08 September 4, 08 This cotais solutios for Homework. Please ote that we have icluded several additioal commets ad approaches to the problems to give you better isight.

More information

Discrete Mathematics and Probability Theory Summer 2014 James Cook Note 15

Discrete Mathematics and Probability Theory Summer 2014 James Cook Note 15 CS 70 Discrete Mathematics ad Probability Theory Summer 2014 James Cook Note 15 Some Importat Distributios I this ote we will itroduce three importat probability distributios that are widely used to model

More information

PRACTICE PROBLEMS FOR THE FINAL

PRACTICE PROBLEMS FOR THE FINAL PRACTICE PROBLEMS FOR THE FINAL Math 36Q Sprig 25 Professor Hoh Below is a list of practice questios for the Fial Exam. I would suggest also goig over the practice problems ad exams for Exam ad Exam 2

More information

WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? ABSTRACT

WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? ABSTRACT WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? Harold G. Loomis Hoolulu, HI ABSTRACT Most coastal locatios have few if ay records of tsuami wave heights obtaied over various time periods. Still

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Introducing Sample Proportions

Introducing Sample Proportions Itroducig Sample Proportios Probability ad statistics Aswers & Notes TI-Nspire Ivestigatio Studet 60 mi 7 8 9 0 Itroductio A 00 survey of attitudes to climate chage, coducted i Australia by the CSIRO,

More information

Econ 371 Exam #1. Multiple Choice (5 points each): For each of the following, select the single most appropriate option to complete the statement.

Econ 371 Exam #1. Multiple Choice (5 points each): For each of the following, select the single most appropriate option to complete the statement. Eco 371 Exam #1 Multiple Choice (5 poits each): For each of the followig, select the sigle most appropriate optio to complete the statemet 1) The probability of a outcome a) is the umber of times that

More information

Module 1 Fundamentals in statistics

Module 1 Fundamentals in statistics Normal Distributio Repeated observatios that differ because of experimetal error ofte vary about some cetral value i a roughly symmetrical distributio i which small deviatios occur much more frequetly

More information

CONFIDENCE INTERVALS STUDY GUIDE

CONFIDENCE INTERVALS STUDY GUIDE CONFIDENCE INTERVALS STUDY UIDE Last uit, we discussed how sample statistics vary. Uder the right coditios, sample statistics like meas ad proportios follow a Normal distributio, which allows us to calculate

More information

Recall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y.

Recall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y. Testig Statistical Hypotheses Recall the study where we estimated the differece betwee mea systolic blood pressure levels of users of oral cotraceptives ad o-users, x - y. Such studies are sometimes viewed

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2.

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2. SAMPLE STATISTICS A radom sample x 1,x,,x from a distributio f(x) is a set of idepedetly ad idetically variables with x i f(x) for all i Their joit pdf is f(x 1,x,,x )=f(x 1 )f(x ) f(x )= f(x i ) The sample

More information

If, for instance, we were required to test whether the population mean μ could be equal to a certain value μ

If, for instance, we were required to test whether the population mean μ could be equal to a certain value μ STATISTICAL INFERENCE INTRODUCTION Statistical iferece is that brach of Statistics i which oe typically makes a statemet about a populatio based upo the results of a sample. I oesample testig, we essetially

More information