Understanding Samples
|
|
- Elaine O’Neal’
- 5 years ago
- Views:
Transcription
1 1 Will Moroe CS 109 Samplig ad Bootstrappig Lecture Notes #17 August 2, 2017 Based o a hadout by Chris Piech I this chapter we are goig to talk about statistics calculated o samples from a populatio. We are the goig to talk about probability claims that we ca make with respect to the origial populatio a cetral requiremet for most scietific disciplies. Let s say you are the kig of Bhuta, ad you wat to kow the average happiess of the people i your coutry. You ca t ask every sigle perso, but you could ask a radom subsample. I this ext sectio we will cosider pricipled claims that you ca make based o a subsample. Assume we radomly sample 200 Bhutaese people ad ask them about their happiess, o a scale of 1 to 100 (happiesses? smiles?). Our data looks like this: 72, 85,..., 71. You ca also thik of it as a collectio of = 200 I.I.D. (idepedet, idetically distributed) radom variables X 1, X 2,..., X. Uderstadig Samples The idea behid samplig is simple, but the details ad the mathematical otatio ca be complicated. Here is a picture to show you all of the ideas ivolved: The theory is that there is some large populatio (such as the 774,000 people who live i Bhuta). We collect a sample of people at radom, where each perso i the populatio is equally likely to be i our sample. From each perso we record oe umber (e.g., their reported happiess). We are goig to call the umber from the i-th perso we sampled. Oe way to visualize your samples X 1, X 2,..., X is to make a histogram of their values. We make the assumptio that all of our s are idetically distributed. That meas that we are assumig there is a sigle uderlyig distributio F that we drew our samples from. Recall that a distributio for discrete radom variables should defie a probability mass fuctio.
2 2 Estimatig Mea ad Variace from Samples We assume that the data we look at are I.I.D. from the same uderlyig distributio (F) with a true mea (µ) ad a true variace (σ 2 ). Sice we ca t talk to everyoe i Bhuta, we have to rely o our sample to estimate the mea ad variace. From our sample we ca calculate a sample mea ( X) ad a sample variace (S 2 ). These are the best guesses that we ca make about the true mea ad true variace. X = S 2 = ( X) 2 1 The first thig to kow about these estimates is that they are ubiased. Havig a ubiased estimate meas that if we were to repeat this samplig process may times, the expected value each of the estimate should be equal to the true value we are tryig to estimate. We will first prove that that is the case for X. E[ X] = 1 E = 1 E[ ] = 1 µ = 1 µ = µ The equatio for the sample mea seems related to our uderstadig of expectatio. The same could be said about sample variace except for the surprisig ( 1) i the deomiator of the equatio. Why ( 1)? That deomiator is ecessary to make sure that E[S 2 ] = σ 2. The proof for S 2 is a bit more ivolved; you do t have to remember this, but some people may be iterested i kowig it: E[S 2 ] ( X) 2 1 ( 1)E[S 2 ] ( X) 2 (( µ) + (µ X)) 2 ( µ) 2 + (µ X) ( µ)(µ X) ( µ) 2 + (µ X) 2 + 2(µ X) ( µ) ( µ) 2 + (µ X) 2 + 2(µ X)( X µ) ( µ) 2 (µ X) 2 [ ( µ) 2] E [ (µ X) 2] = σ 2 Var( X) = σ 2 σ2 = σ2 σ 2 = ( 1)σ 2
3 3 So E[S 2 ] = σ 2. The ituitio behid the proof is that sample variace calculates the distace of each sample to the sample mea, ot the true mea. The sample mea itself varies, ad we ca show that its variace is also related to the true variace. Variace of the Sample Mea We ow have estimates for mea ad variace that are ot biased that is, they are correct o average. However, the estimates chage depedig o the samples. How stable are they? The sample mea is computed as a average of radom variables. It takes o values probabilistically, which makes it a radom variable itself. We ca compute its variace: Var( X) = Var ( ) 2 1 = Var ( ) 2 ( ) 2 ( ) = Var( ) = σ 2 = σ 2 = σ2 This tells us that the variace of the sample mea is proportioal to the variace of the uderlyig distributio, but goes dow with the umber of samples. Stadard Error Kowig that the variace of the sample mea is small if the umber of samples is large is reassurig, but the expressio for the variace of the sample mea depeds o the true variace of the uderlyig distributio. What if we do t kow that true variace? What ca we say about the stability of our estimate of the mea, give oly the sample we took? We kow that S 2 is a ubiased estimator for the true variace. So oe reasoable thig to try is to substitute S 2 for σ 2 : Var( X) = σ2 S2 SD( X) = σ S sice S 2 is a ubiased estimate sice SD is the square root of Var That SD( X) formula has a special ame: it is called the stadard error, ad it is a commo way of reportig ucertaity of estimates of meas ( error bars ) i scietific papers. Let s say our sample of happiess has = 200 people, the sample mea is X = 83, ad the sample variace is S 2 = 450. We ca calculate the stadard error of our estimate of the mea to be S 1.5. Whe we report our results, we will say that the average happiess score i Bhuta is 83 ± 1.5, with variace 450. ( If you re woderig, S 2 has a variace too; it turs out it s equal to 1 E[(X µ) 4 ] 3 1 (σ2 ) 2). We wo t use that oe i CS 109.
4 4 Bootstrap The bootstrap is a statistical techique for uderstadig distributios of statistics. It was iveted here at Staford i 1979 whe mathematicias were just startig to uderstad how computers, ad computer simulatios, could be used to better uderstad probabilities. The first key isight is that if we had access to the uderlyig distributio (F), the aswerig almost ay questio we might have about how accurate our statistics are would become straightforward. For example, i the previous sectio we gave a formula for how you could calculate the sample variace from a sample of size. We kow that i expectatio our sample variace is equal to the true variace. But what if we wat to kow the probability that the true variace is withi a certai rage of the umber we calculated? That questio might soud dry, but it is critical to evaluatig scietific claims. If you kew the uderlyig distributio, F, you could simply repeat the experimet of drawig a sample of size from F, calculate the sample variace from our ew sample ad test what portio fell withi a certai rage. The ext isight behid bootstrappig is that the best estimate that we ca get for F is from our sample itself. The geeral algorithm looks like this: def bootstrap(sample): N = umber of elemets i sample pmf = estimate the uderlyig pmf from the sample stats = [] repeat 10,000 times: resample = draw N ew samples from the pmf stat = calculate your stat o the resample stats.apped(stat) stats ca ow be used to estimate the distributio of the stat Next week we will talk i much more detail about estimatig distributios from samples. For ow, the simplest way to estimate F (ad the oe we will use i this class) is to assume that P(X = k) is simply the fractio of times that k showed up i the sample. This set of probabilities defies a probability mass fuctio for a discrete radom variable, which we ll call ˆF, the hat idicatig that ˆF is a estimate of the probability distributio of F. This estimated distributio, formed from couts of samples, is sometimes called this empirical distributio. Bootstrappig is a reasoable thig to do because the sample you have is the best ad oly iformatio you have about what the uderlyig populatio distributio actually looks like. May samples will look quite like the populatio they came from. With this approach, we ca compute probabilities ad estimates ot just for the mea, but for ay statistic we wat. To calculate Var(S 2 ), for example, we could calculate S i 2 for each resample i, ad after 10,000 iteratios, we could calculate the sample variace of all the S i 2 s.
5 5 You might be woderig why the resample is the same size as the origial sample (). The aswer is that the variatio of the variatio of stat that you are calculatig could deped o the size of the sample (or the resample). To accurately estimate the distributio of the stat, we must use resamples of the same size. The bootstrap has strog theoretical guaratees, ad it is accepted by the scietific commuity. It breaks dow whe the uderlyig distributio has a log tail or if the samples are ot I.I.D. Example of p-value calculatio We are tryig to figure out if people are happier i Bhuta or i Nepal. We sample 1 = 200 idividuals i Bhuta ad 2 = 300 idividuals i Nepal ad ask them to rate their happiess o a scale from 1 to 10. We measure the sample meas for the two samples ad observe that people i our Nepal sample are slightly happier the differece betwee the Nepal sample mea ad the Bhuta sample mea is 0.5 poits o the happiess scale. Have we really show that people i Nepal are happier? Sample meas ca fluctuate. How do we kow that we did t just get that differece because of the radom differeces amog samples? There is t a rigorous, objective way to prove that the differece you discovered was t due to chace, or eve to give a probability that the differece was due to chace. It is possible, however, to give a probability for the reverse statemet: if the oly differece i the samples was due to chace, what would be the probability that we get a result just as extreme? The assumptio that the differece betwee the samples was due to chace is a example of a ull hypothesis. A ull hypothesis says that there is o relatioship betwee two measured pheomea or o differece betwee two groups. The probability we gave is kow as a p-value. So a p-value is the probability that, whe the ull hypothesis is true, the statistic measured would be equal to, or more extreme tha, tha the value you are reportig. I the case of comparig Nepal to Bhuta, the ull hypothesis is that there is o differece betwee the distributio of happiess i Bhuta ad Nepal. Whe you drew samples, Nepal had a mea that 0.5 poits larger tha Bhuta by chace. We ca use bootstrappig to calculate the p-value. First, we estimate the uderlyig distributio of the ull hypothesis uderlyig distributio, by makig a probability mass fuctio from all of our samples from Nepal ad all of our samples from Bhuta. def pvaluebootstrap(bhutasample, epalsample): N = size of bhutasample M = size of epalsample uiversalsample = combie bhutasamples ad epalsamples uiversalpmf = estimate the uderlyig pmf of uiversalsample cout = 0 repeat 10,000 times: bhutaresample = draw N ew samples from uiversalpmf
6 6 epalresample = draw M ew samples from uiversalpmf mubhuta = sample mea of bhutaresample munepal = sample mea of epalresample meadifferece = munepal - mubhuta if meadifferece > observeddifferece: cout += 1 pvalue = cout / 10,000 This is particularly ice because we ever had to assume the distributio that our samples came from had a particular form (e.g., we ever had to claim that happiess is ormally distributed). You might have heard of a t-test. That is aother way of calculatig p-values, but it makes the assumptios that samples are ormally distributed ad have the same variace. Nowadays, whe we have reasoable computer power, bootstrappig is a more versatile ad accurate tool.
DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10
DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set
More informationStatistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.
Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized
More informationChapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.
Chapter 22 Comparig Two Proportios Copyright 2010, 2007, 2004 Pearso Educatio, Ic. Comparig Two Proportios Read the first two paragraphs of pg 504. Comparisos betwee two percetages are much more commo
More informationUnderstanding Dissimilarity Among Samples
Aoucemets: Midterm is Wed. Review sheet is o class webpage (i the list of lectures) ad will be covered i discussio o Moday. Two sheets of otes are allowed, same rules as for the oe sheet last time. Office
More informationChapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc.
Chapter 22 Comparig Two Proportios Copyright 2010 Pearso Educatio, Ic. Comparig Two Proportios Comparisos betwee two percetages are much more commo tha questios about isolated percetages. Ad they are more
More informationA quick activity - Central Limit Theorem and Proportions. Lecture 21: Testing Proportions. Results from the GSS. Statistics and the General Population
A quick activity - Cetral Limit Theorem ad Proportios Lecture 21: Testig Proportios Statistics 10 Coli Rudel Flip a coi 30 times this is goig to get loud! Record the umber of heads you obtaied ad calculate
More informationStat 421-SP2012 Interval Estimation Section
Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible
More informationSTA Learning Objectives. Population Proportions. Module 10 Comparing Two Proportions. Upon completing this module, you should be able to:
STA 2023 Module 10 Comparig Two Proportios Learig Objectives Upo completig this module, you should be able to: 1. Perform large-sample ifereces (hypothesis test ad cofidece itervals) to compare two populatio
More informationAgreement of CI and HT. Lecture 13 - Tests of Proportions. Example - Waiting Times
Sigificace level vs. cofidece level Agreemet of CI ad HT Lecture 13 - Tests of Proportios Sta102 / BME102 Coli Rudel October 15, 2014 Cofidece itervals ad hypothesis tests (almost) always agree, as log
More informationStatistics 511 Additional Materials
Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability
More informationSTATS 200: Introduction to Statistical Inference. Lecture 1: Course introduction and polling
STATS 200: Itroductio to Statistical Iferece Lecture 1: Course itroductio ad pollig U.S. presidetial electio projectios by state (Source: fivethirtyeight.com, 25 September 2016) Pollig Let s try to uderstad
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationFrequentist Inference
Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for
More informationFACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures
FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals
More informationDiscrete Mathematics for CS Spring 2008 David Wagner Note 22
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig
More information1 Inferential Methods for Correlation and Regression Analysis
1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet
More informationComparing your lab results with the others by one-way ANOVA
Comparig your lab results with the others by oe-way ANOVA You may have developed a ew test method ad i your method validatio process you would like to check the method s ruggedess by coductig a simple
More informationConfidence Intervals for the Population Proportion p
Cofidece Itervals for the Populatio Proportio p The cocept of cofidece itervals for the populatio proportio p is the same as the oe for, the samplig distributio of the mea, x. The structure is idetical:
More informationHypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance
Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?
More informationSTAT 350 Handout 19 Sampling Distribution, Central Limit Theorem (6.6)
STAT 350 Hadout 9 Samplig Distributio, Cetral Limit Theorem (6.6) A radom sample is a sequece of radom variables X, X 2,, X that are idepedet ad idetically distributed. o This property is ofte abbreviated
More informationMedian and IQR The median is the value which divides the ordered data values in half.
STA 666 Fall 2007 Web-based Course Notes 4: Describig Distributios Numerically Numerical summaries for quatitative variables media ad iterquartile rage (IQR) 5-umber summary mea ad stadard deviatio Media
More informationResampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.
Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator
More informationBig Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates.
5. Data, Estimates, ad Models: quatifyig the accuracy of estimates. 5. Estimatig a Normal Mea 5.2 The Distributio of the Normal Sample Mea 5.3 Normal data, cofidece iterval for, kow 5.4 Normal data, cofidece
More informationThis is an introductory course in Analysis of Variance and Design of Experiments.
1 Notes for M 384E, Wedesday, Jauary 21, 2009 (Please ote: I will ot pass out hard-copy class otes i future classes. If there are writte class otes, they will be posted o the web by the ight before class
More informationHomework 5 Solutions
Homework 5 Solutios p329 # 12 No. To estimate the chace you eed the expected value ad stadard error. To do get the expected value you eed the average of the box ad to get the stadard error you eed the
More informationSimulation. Two Rule For Inverting A Distribution Function
Simulatio Two Rule For Ivertig A Distributio Fuctio Rule 1. If F(x) = u is costat o a iterval [x 1, x 2 ), the the uiform value u is mapped oto x 2 through the iversio process. Rule 2. If there is a jump
More informationThe standard deviation of the mean
Physics 6C Fall 20 The stadard deviatio of the mea These otes provide some clarificatio o the distictio betwee the stadard deviatio ad the stadard deviatio of the mea.. The sample mea ad variace Cosider
More informationLecture 2: Monte Carlo Simulation
STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?
More informationDistribution of Random Samples & Limit theorems
STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to
More informationLecture 3. Properties of Summary Statistics: Sampling Distribution
Lecture 3 Properties of Summary Statistics: Samplig Distributio Mai Theme How ca we use math to justify that our umerical summaries from the sample are good summaries of the populatio? Lecture Summary
More informationSTATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. Comments:
Recall: STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS Commets:. So far we have estimates of the parameters! 0 ad!, but have o idea how good these estimates are. Assumptio: E(Y x)! 0 +! x (liear coditioal
More informationDiscrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22
CS 70 Discrete Mathematics for CS Sprig 2007 Luca Trevisa Lecture 22 Aother Importat Distributio The Geometric Distributio Questio: A biased coi with Heads probability p is tossed repeatedly util the first
More informationParameter, Statistic and Random Samples
Parameter, Statistic ad Radom Samples A parameter is a umber that describes the populatio. It is a fixed umber, but i practice we do ot kow its value. A statistic is a fuctio of the sample data, i.e.,
More informationChapter 6 Sampling Distributions
Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to
More informationChapter 23: Inferences About Means
Chapter 23: Ifereces About Meas Eough Proportios! We ve spet the last two uits workig with proportios (or qualitative variables, at least) ow it s time to tur our attetios to quatitative variables. For
More informationInfinite Sequences and Series
Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet
More informationn outcome is (+1,+1, 1,..., 1). Let the r.v. X denote our position (relative to our starting point 0) after n moves. Thus X = X 1 + X 2 + +X n,
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 9 Variace Questio: At each time step, I flip a fair coi. If it comes up Heads, I walk oe step to the right; if it comes up Tails, I walk oe
More informationEcon 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.
Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio
More informationPRACTICE PROBLEMS FOR THE FINAL
PRACTICE PROBLEMS FOR THE FINAL Math 36Q Fall 25 Professor Hoh Below is a list of practice questios for the Fial Exam. I would suggest also goig over the practice problems ad exams for Exam ad Exam 2 to
More informationGoodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)
Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................
More informationChapter 8: STATISTICAL INTERVALS FOR A SINGLE SAMPLE. Part 3: Summary of CI for µ Confidence Interval for a Population Proportion p
Chapter 8: STATISTICAL INTERVALS FOR A SINGLE SAMPLE Part 3: Summary of CI for µ Cofidece Iterval for a Populatio Proportio p Sectio 8-4 Summary for creatig a 100(1-α)% CI for µ: Whe σ 2 is kow ad paret
More informationParameter, Statistic and Random Samples
Parameter, Statistic ad Radom Samples A parameter is a umber that describes the populatio. It is a fixed umber, but i practice we do ot kow its value. A statistic is a fuctio of the sample data, i.e.,
More informationLecture 1 Probability and Statistics
Wikipedia: Lecture 1 Probability ad Statistics Bejami Disraeli, British statesma ad literary figure (1804 1881): There are three kids of lies: lies, damed lies, ad statistics. popularized i US by Mark
More informationACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER / Statistics
ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER 1 018/019 DR. ANTHONY BROWN 8. Statistics 8.1. Measures of Cetre: Mea, Media ad Mode. If we have a series of umbers the
More informationIntroduction There are two really interesting things to do in statistics.
ECON 497 Lecture Notes E Page 1 of 1 Metropolita State Uiversity ECON 497: Research ad Forecastig Lecture Notes E: Samplig Distributios Itroductio There are two really iterestig thigs to do i statistics.
More informationSampling Distributions, Z-Tests, Power
Samplig Distributios, Z-Tests, Power We draw ifereces about populatio parameters from sample statistics Sample proportio approximates populatio proportio Sample mea approximates populatio mea Sample variace
More information7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals
7-1 Chapter 4 Part I. Samplig Distributios ad Cofidece Itervals 1 7- Sectio 1. Samplig Distributio 7-3 Usig Statistics Statistical Iferece: Predict ad forecast values of populatio parameters... Test hypotheses
More informationRandom Variables, Sampling and Estimation
Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig
More informationChapter 8: Estimating with Confidence
Chapter 8: Estimatig with Cofidece Sectio 8.2 The Practice of Statistics, 4 th editio For AP* STARNES, YATES, MOORE Chapter 8 Estimatig with Cofidece 8.1 Cofidece Itervals: The Basics 8.2 8.3 Estimatig
More informationLecture 1 Probability and Statistics
Wikipedia: Lecture 1 Probability ad Statistics Bejami Disraeli, British statesma ad literary figure (1804 1881): There are three kids of lies: lies, damed lies, ad statistics. popularized i US by Mark
More informationStatisticians use the word population to refer the total number of (potential) observations under consideration
6 Samplig Distributios Statisticias use the word populatio to refer the total umber of (potetial) observatios uder cosideratio The populatio is just the set of all possible outcomes i our sample space
More informationTests of Hypotheses Based on a Single Sample (Devore Chapter Eight)
Tests of Hypotheses Based o a Sigle Sample Devore Chapter Eight MATH-252-01: Probability ad Statistics II Sprig 2018 Cotets 1 Hypothesis Tests illustrated with z-tests 1 1.1 Overview of Hypothesis Testig..........
More informationCEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering
CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio
More informationEconomics Spring 2015
1 Ecoomics 400 -- Sprig 015 /17/015 pp. 30-38; Ch. 7.1.4-7. New Stata Assigmet ad ew MyStatlab assigmet, both due Feb 4th Midterm Exam Thursday Feb 6th, Chapters 1-7 of Groeber text ad all relevat lectures
More informationProperties and Hypothesis Testing
Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.
More informationII. Descriptive Statistics D. Linear Correlation and Regression. 1. Linear Correlation
II. Descriptive Statistics D. Liear Correlatio ad Regressio I this sectio Liear Correlatio Cause ad Effect Liear Regressio 1. Liear Correlatio Quatifyig Liear Correlatio The Pearso product-momet correlatio
More informationSimple Random Sampling!
Simple Radom Samplig! Professor Ro Fricker! Naval Postgraduate School! Moterey, Califoria! Readig:! 3/26/13 Scheaffer et al. chapter 4! 1 Goals for this Lecture! Defie simple radom samplig (SRS) ad discuss
More information(7 One- and Two-Sample Estimation Problem )
34 Stat Lecture Notes (7 Oe- ad Two-Sample Estimatio Problem ) ( Book*: Chapter 8,pg65) Probability& Statistics for Egieers & Scietists By Walpole, Myers, Myers, Ye Estimatio 1 ) ( ˆ S P i i Poit estimate:
More informationSection 9.2. Tests About a Population Proportion 12/17/2014. Carrying Out a Significance Test H A N T. Parameters & Hypothesis
Sectio 9.2 Tests About a Populatio Proportio P H A N T O M S Parameters Hypothesis Assess Coditios Name the Test Test Statistic (Calculate) Obtai P value Make a decisio State coclusio Sectio 9.2 Tests
More informationMATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4
MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.
More informationAMS570 Lecture Notes #2
AMS570 Lecture Notes # Review of Probability (cotiued) Probability distributios. () Biomial distributio Biomial Experimet: ) It cosists of trials ) Each trial results i of possible outcomes, S or F 3)
More informationSDS 321: Introduction to Probability and Statistics
SDS 321: Itroductio to Probability ad Statistics Lecture 23: Cotiuous radom variables- Iequalities, CLT Puramrita Sarkar Departmet of Statistics ad Data Sciece The Uiversity of Texas at Austi www.cs.cmu.edu/
More informationComparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading
Topic 15 - Two Sample Iferece I STAT 511 Professor Bruce Craig Comparig Two Populatios Research ofte ivolves the compariso of two or more samples from differet populatios Graphical summaries provide visual
More information1 Introduction to reducing variance in Monte Carlo simulations
Copyright c 010 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a ukow mea µ = E(X) of a distributio by
More informationDiscrete Mathematics for CS Spring 2005 Clancy/Wagner Notes 21. Some Important Distributions
CS 70 Discrete Mathematics for CS Sprig 2005 Clacy/Wager Notes 21 Some Importat Distributios Questio: A biased coi with Heads probability p is tossed repeatedly util the first Head appears. What is the
More informationBIOSTATS 640 Intermediate Biostatistics Frequently Asked Questions Topic 1 FAQ 1 Review of BIOSTATS 540 Introductory Biostatistics
BIOTAT 640 Itermediate Biostatistics Frequetly Asked Questios Topic FAQ Review of BIOTAT 540 Itroductory Biostatistics. I m cofused about the jargo ad otatio, especially populatio versus sample. Could
More informationDiscrete Mathematics and Probability Theory Spring 2014 Anant Sahai Lecture 16
EECS 70 Discrete Mathematics ad Probability Theory Sprig 2014 Aat Sahai Lecture 16 Variace Questio: Let us retur oce agai to the questio of how may heads i a typical sequece of coi flips. Recall that we
More informationData Analysis and Statistical Methods Statistics 651
Data Aalysis ad Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasii/teachig.html Suhasii Subba Rao Review of testig: Example The admistrator of a ursig home wats to do a time ad motio
More informationLecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting
Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would
More informationMA131 - Analysis 1. Workbook 3 Sequences II
MA3 - Aalysis Workbook 3 Sequeces II Autum 2004 Cotets 2.8 Coverget Sequeces........................ 2.9 Algebra of Limits......................... 2 2.0 Further Useful Results........................
More informationMathematical Notation Math Introduction to Applied Statistics
Mathematical Notatio Math 113 - Itroductio to Applied Statistics Name : Use Word or WordPerfect to recreate the followig documets. Each article is worth 10 poits ad ca be prited ad give to the istructor
More informationVariance of Discrete Random Variables Class 5, Jeremy Orloff and Jonathan Bloom
Variace of Discrete Radom Variables Class 5, 18.05 Jeremy Orloff ad Joatha Bloom 1 Learig Goals 1. Be able to compute the variace ad stadard deviatio of a radom variable.. Uderstad that stadard deviatio
More informationLecture 2: Concentration Bounds
CSE 52: Desig ad Aalysis of Algorithms I Sprig 206 Lecture 2: Cocetratio Bouds Lecturer: Shaya Oveis Ghara March 30th Scribe: Syuzaa Sargsya Disclaimer: These otes have ot bee subjected to the usual scrutiy
More informationSample Size Determination (Two or More Samples)
Sample Sie Determiatio (Two or More Samples) STATGRAPHICS Rev. 963 Summary... Data Iput... Aalysis Summary... 5 Power Curve... 5 Calculatios... 6 Summary This procedure determies a suitable sample sie
More information6.3 Testing Series With Positive Terms
6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial
More informationA statistical method to determine sample size to estimate characteristic value of soil parameters
A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig
More informationRead through these prior to coming to the test and follow them when you take your test.
Math 143 Sprig 2012 Test 2 Iformatio 1 Test 2 will be give i class o Thursday April 5. Material Covered The test is cummulative, but will emphasize the recet material (Chapters 6 8, 10 11, ad Sectios 12.1
More informationLecture 24 Floods and flood frequency
Lecture 4 Floods ad flood frequecy Oe of the thigs we wat to kow most about rivers is what s the probability that a flood of size will happe this year? I 100 years? There are two ways to do this empirically,
More informationEco411 Lab: Central Limit Theorem, Normal Distribution, and Journey to Girl State
Eco411 Lab: Cetral Limit Theorem, Normal Distributio, ad Jourey to Girl State 1. Some studets may woder why the magic umber 1.96 or 2 (called critical values) is so importat i statistics. Where do they
More informationOctober 25, 2018 BIM 105 Probability and Statistics for Biomedical Engineers 1
October 25, 2018 BIM 105 Probability ad Statistics for Biomedical Egieers 1 Populatio parameters ad Sample Statistics October 25, 2018 BIM 105 Probability ad Statistics for Biomedical Egieers 2 Ifereces
More informationCS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5
CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More informationLecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting
Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would
More informationSTAT 203 Chapter 18 Sampling Distribution Models
STAT 203 Chapter 18 Samplig Distributio Models Populatio vs. sample, parameter vs. statistic Recall that a populatio cotais the etire collectio of idividuals that oe wats to study, ad a sample is a subset
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More informationChapter 18 Summary Sampling Distribution Models
Uit 5 Itroductio to Iferece Chapter 18 Summary Samplig Distributio Models What have we leared? Sample proportios ad meas will vary from sample to sample that s samplig error (samplig variability). Samplig
More informationPower and Type II Error
Statistical Methods I (EXST 7005) Page 57 Power ad Type II Error Sice we do't actually kow the value of the true mea (or we would't be hypothesizig somethig else), we caot kow i practice the type II error
More informationSTAT Homework 2 - Solutions
STAT-36700 Homework - Solutios Fall 08 September 4, 08 This cotais solutios for Homework. Please ote that we have icluded several additioal commets ad approaches to the problems to give you better isight.
More informationDiscrete Mathematics and Probability Theory Summer 2014 James Cook Note 15
CS 70 Discrete Mathematics ad Probability Theory Summer 2014 James Cook Note 15 Some Importat Distributios I this ote we will itroduce three importat probability distributios that are widely used to model
More informationPRACTICE PROBLEMS FOR THE FINAL
PRACTICE PROBLEMS FOR THE FINAL Math 36Q Sprig 25 Professor Hoh Below is a list of practice questios for the Fial Exam. I would suggest also goig over the practice problems ad exams for Exam ad Exam 2
More informationWHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? ABSTRACT
WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? Harold G. Loomis Hoolulu, HI ABSTRACT Most coastal locatios have few if ay records of tsuami wave heights obtaied over various time periods. Still
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationIntroducing Sample Proportions
Itroducig Sample Proportios Probability ad statistics Aswers & Notes TI-Nspire Ivestigatio Studet 60 mi 7 8 9 0 Itroductio A 00 survey of attitudes to climate chage, coducted i Australia by the CSIRO,
More informationEcon 371 Exam #1. Multiple Choice (5 points each): For each of the following, select the single most appropriate option to complete the statement.
Eco 371 Exam #1 Multiple Choice (5 poits each): For each of the followig, select the sigle most appropriate optio to complete the statemet 1) The probability of a outcome a) is the umber of times that
More informationModule 1 Fundamentals in statistics
Normal Distributio Repeated observatios that differ because of experimetal error ofte vary about some cetral value i a roughly symmetrical distributio i which small deviatios occur much more frequetly
More informationCONFIDENCE INTERVALS STUDY GUIDE
CONFIDENCE INTERVALS STUDY UIDE Last uit, we discussed how sample statistics vary. Uder the right coditios, sample statistics like meas ad proportios follow a Normal distributio, which allows us to calculate
More informationRecall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y.
Testig Statistical Hypotheses Recall the study where we estimated the differece betwee mea systolic blood pressure levels of users of oral cotraceptives ad o-users, x - y. Such studies are sometimes viewed
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationThe variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2.
SAMPLE STATISTICS A radom sample x 1,x,,x from a distributio f(x) is a set of idepedetly ad idetically variables with x i f(x) for all i Their joit pdf is f(x 1,x,,x )=f(x 1 )f(x ) f(x )= f(x i ) The sample
More informationIf, for instance, we were required to test whether the population mean μ could be equal to a certain value μ
STATISTICAL INFERENCE INTRODUCTION Statistical iferece is that brach of Statistics i which oe typically makes a statemet about a populatio based upo the results of a sample. I oesample testig, we essetially
More information