Module 1 Fundamentals in statistics


Normal Distribution

Repeated observations that differ because of experimental error often vary about some central value in a roughly symmetrical distribution in which small deviations occur much more frequently than large ones. A continuous probability distribution that is valuable for representing this situation is the Gaussian or normal distribution. It is given by

$$p(y) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right], \qquad -\infty < y < \infty,$$

where $\mu$ is the mean and $\sigma^2$ is the variance. The distribution is denoted by $N(\mu, \sigma^2)$.

[Figure: Normal distribution with different means and variances]

Characterizing a Normal Distribution

Once the mean and the variance of a normal distribution are given, the entire distribution is characterized. The area under a normal distribution equals 1.

[Figure: Tail areas of the normal distribution]

Standard Normal Distribution

If $y$ is a normally distributed variable, then it can be expressed in terms of a standardized normal deviate, or unit normal deviate,

$$z = \frac{y - \mu}{\sigma}.$$

The distribution of $z$ is $N(0, 1)$. Tables for the unit normal deviate are given in books (refer to the table in the handout).

Using Tables of the Normal Distribution

Example 1

Suppose the daily level of an impurity in a reactor feed is known to be approximately normally distributed with a mean of $\mu = 4.0$ and a standard deviation of $\sigma = 0.3$. What is the probability that the impurity level on a randomly chosen day will exceed 4.4?

$$\Pr(y > 4.4) = \Pr\left(\frac{y - 4.0}{0.3} > \frac{4.4 - 4.0}{0.3}\right) = \Pr(z > 1.33)$$

From the table of the unit normal distribution we find that $\Pr(z > 1.33) = 0.0918$, i.e. there is about a 9% probability that the impurity level on a randomly chosen day will exceed 4.4. Equivalently, out of 100 randomly chosen days, there will be about 9 days when the impurity level exceeds 4.4.

Student's t distribution

If the mean and the standard deviation are known, then one can compute the unit normal deviate and use the unit normal distribution tables. If the mean and the standard deviation are unknown, what can we say about the occurrence of an event?

t distribution or Student's distribution (contd.)

Example 2

Suppose that $s = 0.3$ and $\bar{y} = 4.0$. What can one say about the random occurrence of an impurity level of 4.4? Since $\sigma$ is unknown we cannot use the unit normal tables. Instead define the term

$$t = \frac{y - \bar{y}}{s},$$

which has a distribution known as the t distribution. This distribution is similar to the normal distribution for large sample sizes (i.e. $n > 30$). Then $\Pr(t > 1.33) \approx 0.12$, where there are 6 dof associated with $s$ (i.e. there are only 7 observations from which $s$ is calculated); see the table in the handout.

[Figure: t distribution displayed using disttool in Matlab]
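As a rough numerical check of the two impurity examples above (mean 4.0, standard deviation 0.3, threshold 4.4, and 6 dof for the t case), the following MATLAB sketch computes the deviate and both tail probabilities. The variable names are illustrative, and the t-based line assumes the Statistics Toolbox `cdf` function that these notes use elsewhere.

```matlab
% Tail probability of the impurity level, first with known sigma (normal),
% then with sigma replaced by an estimate s based on 6 dof (Student's t).
mu    = 4.0;    % mean from the example
sigma = 0.3;    % standard deviation from the example
y0    = 4.4;    % threshold of interest
dof   = 6;      % degrees of freedom associated with s in the t example

z = (y0 - mu) / sigma;               % deviate, about 1.33

% Upper-tail area of N(0,1), written with erfc so that only base MATLAB is needed
pNormal = 0.5 * erfc(z / sqrt(2));

% Upper-tail area of the t distribution (assumes the Statistics Toolbox)
pT = 1 - cdf('T', z, dof);

fprintf('deviate = %.3f\n', z);
fprintf('Pr(z > deviate) = %.4f\n', pNormal);
fprintf('Pr(t > deviate) = %.4f with %d dof\n', pT, dof);
```

Because the deviate is not rounded to 1.33 here, the normal tail area comes out near 0.091 rather than the tabulated 0.0918. The t tail area is noticeably larger, which reflects the extra uncertainty from estimating the standard deviation with only 7 observations.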

[Figure: t distribution for various degrees of freedom]

William Gosset lived from 1876 to 1937. Gosset invented the t-test to handle small samples for quality control in brewing. He wrote under the name "Student". Find out more at: http://www-history.mcs.st-andrews.ac.uk/history/Mathematicians/Gosset.html

For the distribution of averages it can be shown that the quantity

$$t = \frac{\bar{y} - \mu}{s/\sqrt{n}}$$

has a t-distribution with $n-1$ DOF. Note that the shape of the distribution depends on the DOF $n-1$. Specification of the DOF is therefore important.

The PDF of the t-distribution is symmetric and bell-shaped like the normal distribution. However, its spread is greater than that of the standard normal distribution.

Area under the t-density function from $-\infty$ to 1 with 10 DOF: 0.8296

>> cdf('T',1,10)
ans = 0.8296

Learn to get this from t-tables (interpolation may be needed).

The PDF of the t-distribution tends to that of the $N(0,1)$ distribution as the sample size increases.

>> cdf('t',1:2,30)
ans = 0.8373 0.9727
>> cdf('normal',1:2,0,1)
ans = 0.8413 0.9772
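The convergence of the t-distribution to N(0,1) can be tabulated for several degrees of freedom at once. The sketch below is only an illustration: the list of DOF values is an arbitrary choice, and it assumes the Statistics Toolbox `cdf` function already used in these notes.

```matlab
% Compare t CDF values with the standard normal as the DOF grows.
x    = 1:2;                  % evaluation points, as in the commands above
dofs = [3 10 30 100];        % illustrative degrees of freedom

pT = zeros(numel(dofs), numel(x));
for k = 1:numel(dofs)
    pT(k, :) = cdf('T', x, dofs(k));   % one row of CDF values per DOF
end
pN = cdf('Normal', x, 0, 1);           % limiting N(0,1) values

for k = 1:numel(dofs)
    fprintf('t, %3d dof : %.4f  %.4f\n', dofs(k), pT(k, 1), pT(k, 2));
end
fprintf('N(0,1)     : %.4f  %.4f\n', pN(1), pN(2));
```

Each t row lies below the normal row, because the heavier tails leave less area to the left of a positive point, and the gap shrinks as the DOF increases.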

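Example 3

Based on the data collected from $n$ families residing in an area of a city in the last year, test the hypothesis that the average income of a family per year is 75,000 $.

$H_0: \mu = 75{,}000$
$H_1: \mu \neq 75{,}000$

(i) Simulate data samples by X = 5… + 5…*rand(…,…); Xm = mean(X); S = std(X);

(ii) With 95% level of confidence, if $\left| X_m - 75{,}000 \right| / (S/\sqrt{n}) \le t_{0.025}(n-1)$, where $t_{0.025}(n-1)$ is the value exceeded with probability 0.025 by a t variable with $n-1$ dof, then we accept $H_0$. Otherwise, we reject $H_0$.

$\chi^2$ Distribution

Q: What is the distribution of $s^2$ when observations are drawn randomly from a normal distribution?

The $\chi^2$ distribution provides a distribution of the sample variance, i.e. it is the distribution of the scaled quantity

$$\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sigma^2} \sim \chi^2(n-1), \qquad \text{or equivalently} \qquad s^2 \sim \frac{\sigma^2}{n-1}\,\chi^2(n-1).$$

Read the last term as: $s^2$ is distributed as $\chi^2$ with $(n-1)$ dof and a scale factor of $\sigma^2/(n-1)$.

Chi-squared Distribution

(i) Let the $x_i$ be $n$ independent samples drawn from $N(0,1)$. Then the sum

$$\sum_{i=1}^{n} x_i^2$$

is distributed as $\chi^2$ with $n$ degrees of freedom. Mean (chi-sq) = $n$; Variance (chi-sq) = $2n$.

(ii) Let the $x_i$ be $n$ independent samples drawn from $N(\mu, \sigma^2)$. Then

$$\sum_{i=1}^{n} \left(\frac{x_i - \mu}{\sigma}\right)^2 \sim \chi^2(n).$$

Useful for variance-related statistical analysis.

[Figure: $\chi^2$ density for several degrees of freedom]

Unlike the normal and t-distributions, the $\chi^2$-distribution is not symmetric. As $n$ increases, the distribution tends to be more symmetric. For $n \ge 30$, the $\chi^2$-distribution tends to a normal distribution with mean $n$ and variance $2n$.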
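The two chi-squared facts above (mean n and variance 2n for a sum of n squared N(0,1) draws, and the scaled sample variance following a chi-squared law with n-1 dof) can be checked by simulation. This is a minimal sketch in base MATLAB; the choices n = 5, sigma = 2, the seed and the number of repetitions are all illustrative assumptions, not values from the notes.

```matlab
rng(0);                       % fixed seed so the run is repeatable
n    = 5;                     % degrees of freedom (illustrative)
nRep = 1e5;                   % number of simulated samples (illustrative)

% (i) Sum of n squared N(0,1) draws: one chi-squared(n) value per row
x = randn(nRep, n);
q = sum(x.^2, 2);
fprintf('sum of squares : mean %.3f (theory %d), variance %.3f (theory %d)\n', ...
        mean(q), n, var(q), 2*n);

% (ii) Scaled sample variance of N(0, sigma^2) data: (n-1)*s^2/sigma^2
sigma = 2;                    % illustrative population standard deviation
y  = sigma * randn(nRep, n);
s2 = var(y, 0, 2);            % row-wise sample variance with divisor n-1
w  = (n - 1) * s2 / sigma^2;  % should behave like chi-squared with n-1 dof
fprintf('scaled s^2     : mean %.3f (theory %d), variance %.3f (theory %d)\n', ...
        mean(w), n-1, var(w), 2*(n-1));
```

The second block loses one degree of freedom relative to the first because the sample mean is estimated from the same data.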

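Area under the density curve between 0 and 10.645 with 6 DOF = 0.9

For what value of x is the area under the density curve between 0 and x equal to 0.9? Let DOF = 6.

>> cdf('chi2',10.645,6)
ans = 0.9
>> chi2inv(0.9,6)
ans = 10.6446

Learn to obtain these values from the distribution tables. Also understand the upper and lower limits, $\chi^2_{\alpha/2}(n-1)$ and $\chi^2_{1-\alpha/2}(n-1)$, respectively, where $\chi^2_{p}(n-1)$ denotes the value exceeded with probability $p$.

[Figure: upper and lower limits on the $\chi^2$ density]

Q: What is the distribution of $s^2$ when observations are drawn randomly from a normal distribution?

The $\chi^2$ distribution provides a distribution of the sample variance, i.e. it is the distribution of the quantity

$$\frac{(n-1)\,s^2}{\sigma^2} \sim \chi^2(n-1), \qquad \text{or equivalently} \qquad s^2 \sim \frac{\sigma^2}{n-1}\,\chi^2(n-1),$$

where $s^2$ is the sample variance from a random sample of $n$ observations from a normal distribution with unknown variance $\sigma^2$.

Within a $(1-\alpha)100\%$ confidence interval,

$$\chi^2_{1-\alpha/2}(n-1) \le \frac{(n-1)\,s^2}{\sigma^2} \le \chi^2_{\alpha/2}(n-1),$$

which gives the confidence interval for $\sigma^2$ as follows:

$$\frac{(n-1)\,s^2}{\chi^2_{\alpha/2}(n-1)} \le \sigma^2 \le \frac{(n-1)\,s^2}{\chi^2_{1-\alpha/2}(n-1)}.$$

Example 4

Assume that we have estimated the variance of a certain product characteristic to be 13 (i.e. $s^2 = 13$) from $n = 36$ observations. The samples are assumed to come from a normal population with unknown $\sigma^2$. The task is to compute a 95% confidence interval for $\sigma^2$.

$$P\left(A \le \frac{(n-1)\,s^2}{\sigma^2} \le B\right) = 0.95 \quad\Longrightarrow\quad P\left(\frac{(n-1)\,s^2}{B} \le \sigma^2 \le \frac{(n-1)\,s^2}{A}\right) = 0.95$$

With $A = 20.57$ and $B = 53.20$ (35 dof), the required confidence limits are $35 \times 13 / 53.20$ and $35 \times 13 / 20.57$, i.e. $8.55 \le \sigma^2 \le 22.12$.

Example 5

The data in Example 4 come from a normal distribution with variance $\sigma^2$. Test the hypothesis that

$H_0: \sigma^2 = 15$
$H_1: \sigma^2 \neq 15$

Two-tailed test: look for 2.5% area on both tails of the $\chi^2$ distribution. The 2.5% area on the left ends at 20.57, and the 2.5% area on the right begins at 53.20. Therefore, $A = 20.57$ and $B = 53.20$.

>> chi2inv(0.025,35)
ans = 20.57
>> chi2inv(0.975,35)
ans = 53.20

Since $(n-1)s^2/\sigma_0^2 = 35 \times 13 / 15 = 30.33$, which is less than 53.20 but larger than 20.57, we accept the null hypothesis.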
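The confidence interval and two-tailed test for a variance described above can be sketched in a few lines of MATLAB. The inputs are assumptions: n = 36 is stated in the notes, while s^2 = 13 and the hypothesised variance of 15 are readings inferred from the surviving digits of the examples. The code relies on the Statistics Toolbox function `chi2inv`, which takes a cumulative (lower-tail) probability.

```matlab
s2       = 13;     % sample variance (assumed reading of the example)
n        = 36;     % number of observations
alpha    = 0.05;   % 95% confidence level
sigma0sq = 15;     % hypothesised variance under H0 (assumed reading)

% Lower and upper 2.5% points of chi-squared with n-1 dof
A = chi2inv(alpha/2,     n - 1);   % about 20.57
B = chi2inv(1 - alpha/2, n - 1);   % about 53.20

% Confidence interval for sigma^2: note that the larger quantile B
% produces the lower limit and the smaller quantile A the upper limit
ciLow  = (n - 1) * s2 / B;
ciHigh = (n - 1) * s2 / A;

% Two-tailed test: accept H0 when the statistic lies between A and B
stat     = (n - 1) * s2 / sigma0sq;
acceptH0 = (stat > A) && (stat < B);

fprintf('A = %.2f, B = %.2f\n', A, B);
fprintf('95%% CI for sigma^2: [%.2f, %.2f]\n', ciLow, ciHigh);
fprintf('test statistic = %.2f, accept H0: %d\n', stat, acceptH0);
```

Accepting H0 here is equivalent to the hypothesised variance falling inside the confidence interval, which is a quick way to cross-check the two calculations.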

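F Distribution

Suppose that a sample of $n_1$ observations is randomly drawn from a normal distribution having variance $\sigma_1^2$, and a second sample of $n_2$ observations is drawn from a second normal distribution having variance $\sigma_2^2$. Then what can we say about $s_1^2 / s_2^2$?

Ans:

$$\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F(\nu_1, \nu_2),$$

where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ are the degrees of freedom.

F Distribution

Let $\chi^2(m)$ and $\chi^2(n)$ be independent variables distributed as chi-squared with $m$ DOF and $n$ DOF. The ratio

$$\frac{\chi^2(n)/n}{\chi^2(m)/m}$$

is distributed as an F-distribution over the domain $[0, \infty)$ with $[n, m]$ DOF.

>> finv(0.95,4,12)
ans = 3.259
>> cdf('f',3.26,4,12)
ans = 0.95
>> finv(0.95,…,…)
ans = ….967
>> cdf('f',….93,…,…)
ans = 0.95

Useful for comparing variances of two sets of RVs.

If $x \sim N(\mu_1, \sigma_1^2)$ and $y \sim N(\mu_2, \sigma_2^2)$, and samples of $n$ and $m$ observations give the sample variances $s_1^2$ and $s_2^2$, then

$$\frac{(n-1)\,s_1^2}{\sigma_1^2} \sim \chi^2(n-1), \qquad \frac{(m-1)\,s_2^2}{\sigma_2^2} \sim \chi^2(m-1).$$

As a result,

$$\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F(n-1, m-1).$$

The $(1-\alpha)$ confidence interval for $\sigma_1^2/\sigma_2^2$ is

$$\frac{s_1^2/s_2^2}{F_{\alpha/2}(n-1, m-1)} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2/s_2^2}{F_{1-\alpha/2}(n-1, m-1)},$$

where $F_{p}(n-1, m-1)$ denotes the value exceeded with probability $p$.

Example 6

A chemical engineer is investigating the inherent variability of two types of equipment. He suspects that the old equipment, Type 1, has a different variance from the new one. Hence, he wishes to test the hypothesis:

$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$

Two random samples of $n$ = … and $m$ = … observations are taken, and the sample variances are $s_1^2$ = …4.5 and $s_2^2$ = ….8. Under $H_0$, the test statistic is

$$F = \frac{s_1^2}{s_2^2} \sim F(n-1, m-1).$$

Since the computed $F$ is less than the upper critical value $F_{\alpha/2}(n-1, m-1)$ and larger than the lower critical value $F_{1-\alpha/2}(n-1, m-1)$, the null hypothesis can be accepted.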
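The two-sided variance-ratio test and the matching confidence interval can be sketched as follows. All numbers in this block are hypothetical stand-ins, since the sample sizes and variances of the equipment example did not survive in full; only the procedure is meant to carry over. The code assumes the Statistics Toolbox function `finv`, which takes a cumulative (lower-tail) probability.

```matlab
% Hypothetical inputs, chosen only to illustrate the procedure
s1sq  = 14.5;    % sample variance of equipment Type 1 (hypothetical)
s2sq  = 10.8;    % sample variance of equipment Type 2 (hypothetical)
n     = 12;      % observations on Type 1 (hypothetical)
m     = 10;      % observations on Type 2 (hypothetical)
alpha = 0.05;    % two-sided significance level

F = s1sq / s2sq;                               % test statistic under H0

% Critical values of F(n-1, m-1) cutting off alpha/2 in each tail
Flo = finv(alpha/2,     n - 1, m - 1);
Fhi = finv(1 - alpha/2, n - 1, m - 1);

acceptH0 = (F > Flo) && (F < Fhi);             % two-tailed decision

% (1 - alpha) confidence interval for sigma1^2 / sigma2^2
ciLow  = F / Fhi;
ciHigh = F / Flo;

fprintf('F = %.3f, critical values [%.3f, %.3f], accept H0: %d\n', ...
        F, Flo, Fhi, acceptH0);
fprintf('CI for variance ratio: [%.3f, %.3f]\n', ciLow, ciHigh);
```

When H0 is accepted, the interval for the variance ratio contains 1. The lower critical value can also be obtained from tables through the reciprocal relation Flo = 1/finv(1 - alpha/2, m - 1, n - 1), which is useful because printed F tables usually list upper-tail points only.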

Central Limit Theorem

Overall error is usually an aggregate of a number of component errors. The overall error

$$\varepsilon = a_1 \varepsilon_1 + a_2 \varepsilon_2 + \dots + a_n \varepsilon_n$$

follows a normal distribution as the number of components becomes large, irrespective of the individual distributions of the components. An important provision here is that several of the sources of error must make important contributions to the overall error and, in particular, that no single source of error dominates all the rest.

Central Limit Tendency for Averages

Take $n$ random samples. Find the average. Repeat this procedure many times and make a distribution of these averages.

[Figure: Parent Distribution and Sampled Distribution]

The sampled distribution is more nearly normal than the parent distribution. The variance of the randomly sampled distribution is smaller than that of the parent distribution.

Central Limit Theorem (cont.)

If the sampling is random (so that the errors are independent and uncorrelated), then we have the simple rule that $\bar{y}$ varies about the population mean $\mu$ with variance $\sigma^2/n$. Thus

$$E(\bar{y}) = \mu, \qquad V(\bar{y}) = \frac{\sigma^2}{n},$$

where $n$ is the number of random samples collected at each time.

Central Limit Tendency for Averages: Dice Example
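The rule for the mean and variance of an average follows in two steps from independence. As a short worked derivation, assume $y_1, \dots, y_n$ are independent, each with mean $\mu$ and variance $\sigma^2$:

$$E(\bar{y}) = E\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(y_i) = \frac{n\mu}{n} = \mu,$$

$$V(\bar{y}) = V\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V(y_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$

The second line uses the fact that the variance of a sum of independent variables is the sum of their variances. For a single fair die, $\sigma^2 = 35/12 \approx 2.92$, so the average of five dice has variance $35/60 \approx 0.58$.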

Example: Central Limit Tendency for Averages

In the commands below, nThrows is the number of throws.

One die is thrown. Plot the histogram.

>> b = ceil(rand(nThrows,1)*6);
>> hist(b,1:1:6)

Two dice are thrown. The average is computed. Plot the histogram.

>> b = ceil(rand(nThrows,2)*6);
>> c = mean(b');
>> n = unique(c);
>> hist(c,n)

Three dice are thrown. The average is computed. Plot the histogram.

>> b = ceil(rand(nThrows,3)*6);
>> c = mean(b');
>> n = unique(c);
>> hist(c,n)

Dice Average (nDice dice are thrown)

>> b = ceil(rand(nThrows,nDice)*6);
>> c = mean(b');
>> n = unique(c);
>> hist(c,n)

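With nThrows throws of nDice dice, the relative frequency of each average is plotted as follows:

>> b = ceil(rand(nThrows,nDice)*6);
>> c = mean(b');
>> n = unique(c);
>> N = histc(c,n);
>> N = N/nThrows;
>> bar(n,N);

[Figure: relative frequency of the dice average, horizontal axis from 1 to 6]

Even though the individual observations do not come from a normal distribution, the distribution of the average value tends to be normally distributed (even if averages of only five individual observations are considered).

The Central Limit Theorem

The central limit theorem explains that no matter what distribution a RV follows, the sum or mean of this RV tends to be close to the normal distribution.

Let $X_1, X_2, \dots, X_n$ be independently, identically distributed (i.i.d.) RVs having mean $\mu$ and finite variance $\sigma^2$. Then, as $n \to \infty$,

$$\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \to N(0,1), \qquad \text{or} \qquad \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \to N(0,1), \qquad \text{where} \qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$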
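The dice experiment can also be summarised numerically rather than by histogram. The sketch below is an illustration in base MATLAB under assumed settings (the seed, the number of throws and the list of dice counts are not from the notes). It compares the variance of the average with the rule sigma^2/n, using 35/12 for a single fair die, and tracks the excess kurtosis, which is 0 for a normal distribution.

```matlab
rng(1);                          % fixed seed so the run is repeatable
nThrows  = 1e5;                  % number of repeated throws (illustrative)
diceList = [1 2 5 10];           % numbers of dice averaged (illustrative)
sigma2   = 35/12;                % variance of a single fair die

meanC  = zeros(size(diceList));
varC   = zeros(size(diceList));
exKurt = zeros(size(diceList));

for k = 1:numel(diceList)
    nDice = diceList(k);
    b = randi(6, nThrows, nDice);        % each row is one throw of nDice dice
    c = mean(b, 2);                      % average of each throw
    meanC(k)  = mean(c);
    varC(k)   = var(c);
    d         = c - meanC(k);
    exKurt(k) = mean(d.^4) / mean(d.^2)^2 - 3;   % excess kurtosis of the averages
    fprintf('%2d dice: mean %.3f, var %.4f (rule %.4f), excess kurtosis %+.3f\n', ...
            nDice, meanC(k), varC(k), sigma2/nDice, exKurt(k));
end
```

The mean stays near 3.5, the variance shrinks in proportion to 1/nDice, and the excess kurtosis moves from about -1.27 for a single die toward 0, which is one simple way to see the averages becoming more nearly normal.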