Parameter, Statistic and Random Samples

Parameter, Statistic and Random Samples

A parameter is a number that describes the population. It is a fixed number, but in practice we do not know its value.

A statistic is a function of the sample data, i.e., a quantity whose value can be calculated from the sample data. A statistic is a random variable with its own distribution function. Statistics are used to make inferences about unknown population parameters.

The random variables $X_1, X_2, \ldots, X_n$ are said to form a (simple) random sample of size $n$ if the $X_i$'s are independent random variables and each $X_i$ has the same probability distribution. We say that the $X_i$'s are iid (independent and identically distributed).

STA286 week 8

Example: Sample Mean and Variance

Suppose $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population with mean $\mu$ and variance $\sigma^2$.

The sample mean is defined as $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.

The sample variance is defined as $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$.

The sample standard deviation, $S$, is the square root of the sample variance.
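As a quick illustration of the two definitions, here is a minimal Python sketch. The function name `sample_summary` and the eight data values are hypothetical, chosen only for the example; note the $n-1$ divisor in $S^2$.

```python
import math


def sample_summary(xs):
    """Return (sample mean, sample variance S^2, sample standard deviation S).

    S^2 uses the n - 1 divisor from the definition above.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("S^2 needs at least two observations")
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return xbar, s2, math.sqrt(s2)


# Hypothetical data, for illustration only
xbar, s2, s = sample_summary([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(xbar, round(s2, 4), round(s, 4))
```

Python's standard `statistics.variance` and `statistics.stdev` use the same $n-1$ divisor, while `statistics.pvariance` divides by $n$.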

Quantiles

A quantile of a sample, $x_p$, is the value for which a specified fraction $p$ of the data values is less than or equal to it and the fraction $1-p$ is greater than it. The best-known quantile is the median, which is the 0.5 quantile (the 50th percentile).

Quantiles are often described as percentiles, and they represent estimates of characteristics of the theoretical distribution. If a data set contains $n$ observations, then the $p$th percentile is the $\frac{p}{100}(n+1)$th value in the ordered data set (when this position is not a whole number, one interpolates between the two neighbouring ordered values).

We can describe the spread or variability of a distribution by giving several percentiles.
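The $\frac{p}{100}(n+1)$ rule can be sketched in a few lines of Python. This is a minimal sketch assuming linear interpolation for fractional positions and clamping to the minimum or maximum when the position falls outside $[1, n]$; the function name `percentile` and the data are hypothetical.

```python
import math


def percentile(data, p):
    """p-th percentile (0 < p < 100) via the (p/100)(n + 1)th ordered value.

    A fractional position is linearly interpolated between the two
    neighbouring ordered values; positions below 1 or above n are clamped
    to the smallest or largest observation.
    """
    xs = sorted(data)
    n = len(xs)
    pos = p / 100 * (n + 1)   # 1-based position in the ordered data
    if pos <= 1:
        return xs[0]
    if pos >= n:
        return xs[-1]
    k = math.floor(pos)       # ordered value just below the position
    frac = pos - k
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])


data = [7, 1, 9, 3, 5, 2, 8, 4, 6]   # hypothetical values 1..9, unsorted
print(percentile(data, 25), percentile(data, 50), percentile(data, 90))
```

With $n = 9$, the 25th percentile sits at position 2.5, halfway between the 2nd and 3rd ordered values.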

Quartiles

The 25th percentile is called the first quartile ($Q_1$). The 75th percentile is called the third quartile ($Q_3$). Note that the median is the second quartile, $Q_2$.

The distance between the first and third quartiles is called the interquartile range (IQR), i.e., $\mathrm{IQR} = Q_3 - Q_1$. The IQR is another measure of spread, one that is less sensitive to the influence of extreme values.
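To see the claim about extreme values in numbers, the sketch below compares the IQR and the sample standard deviation on a hypothetical data set before and after its largest value is replaced by an outlier. It relies on Python's standard `statistics.quantiles` with `method="exclusive"`, which places the quartiles at positions $(n+1)p$ in the ordered data.

```python
from statistics import quantiles, stdev


def iqr(data):
    # method="exclusive" places the quartiles at positions (n + 1)p
    q1, _, q3 = quantiles(data, n=4, method="exclusive")
    return q3 - q1


clean = [10, 12, 13, 14, 15, 16, 17, 18, 19, 20]   # hypothetical data
with_outlier = clean[:-1] + [200]                   # largest value replaced by 200

for label, xs in (("clean", clean), ("with outlier", with_outlier)):
    print(f"{label:>12}: IQR = {iqr(xs):.2f}, S = {stdev(xs):.2f}")
```

The IQR is unchanged because the outlier lies beyond $Q_3$, while $S$ grows many times larger.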

The Five-Number Summary

The five-number summary of a set of observations consists of the smallest observation, the first quartile, the median, the third quartile and the largest observation. These five numbers give a reasonably complete description of both the center and the spread of the distribution.

MINITAB commands: Stat > Basic Statistics > Display Descriptive Statistics

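Example

The highway mileages of 20 cars, arranged in increasing order, are:
13 15 16 16 17 19 20 22 23 23 23 24 25 25 26 28 28 28 29 32
Give the five-number summary.

Answer: We have min = 13, Q1 = 18, median = 23, Q3 = 27, max = 32. Here Q1 and Q3 were found as the medians of the lower and upper halves of the data.

The MINITAB output using the above commands is as follows:

Variable   N   Minimum     Q1   Median     Q3   Maximum
mileage   20     13.00  17.50    23.00  27.50     32.00

MINITAB's quartiles differ slightly from the hand calculation because MINITAB interpolates at position $\frac{p}{100}(n+1)$ in the ordered data (positions 5.25 and 15.75 here) instead of taking medians of the two halves.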
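Both conventions can be reproduced on the mileage data. This is a sketch; the function names are hypothetical, and the "halves" convention used here (leave the overall median out of both halves when $n$ is odd) is one common textbook choice, assumed rather than taken from these notes. The interpolating version uses Python's `statistics.quantiles` with `method="exclusive"`.

```python
from statistics import median, quantiles

# Highway mileages of the 20 cars from the example above
mileage = [13, 15, 16, 16, 17, 19, 20, 22, 23, 23,
           23, 24, 25, 25, 26, 28, 28, 28, 29, 32]


def five_number_by_halves(data):
    """Q1 and Q3 as medians of the lower and upper halves (hand method).

    For odd n the overall median is left out of both halves.
    """
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[len(xs) - half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def five_number_minitab_style(data):
    """Quartiles interpolated at positions (n + 1)p, as in the MINITAB output."""
    q1, q2, q3 = quantiles(data, n=4, method="exclusive")
    return min(data), q1, q2, q3, max(data)


print(five_number_by_halves(mileage))       # hand-calculation convention
print(five_number_minitab_style(mileage))   # interpolation convention
```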

Box-plot

A box-plot is a graph of the five-number summary.

Example: Make a box-plot for the data in the above example.

[Figure: Boxplot of Mileages; vertical axis "Mileages"]

MINITAB commands: Graph > Boxplot

Quantile Plots

A quantile plot is a plot of the data values on the vertical axis against an empirical assessment of the fraction of observations exceeded by the data value.

A very useful quantile plot is the normal quantile-quantile (NQQ) plot, which analysts often use to determine whether a data set came from a normal distribution. A normal quantile-quantile plot is a plot of the empirical (data) quantiles against the corresponding quantiles of the normal distribution.

Interpreting Normal Quantile Plots

If the data come from any normal distribution, the NQQ plot produces a straight line. If the points on a normal quantile plot lie close to a straight line, the plot indicates that the data are approximately normal. Systematic deviations from a straight line indicate a nonnormal distribution. Outliers appear as points that are far away from the overall pattern of the plot.
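The points of an NQQ plot can be computed without any plotting library. The sketch below assumes the plotting positions $(i - 0.5)/n$, which are one common choice (statistical software, MINITAB included, may use a slightly different formula). Instead of drawing the plot, it summarizes "how straight" the points are by their correlation, the idea behind probability-plot correlation checks. The function names and simulated data are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def normal_qq_points(data):
    """Return (theoretical N(0,1) quantiles, ordered data) for an NQQ plot."""
    xs = sorted(data)
    n = len(xs)
    z = NormalDist()  # standard normal N(0, 1)
    theo = [z.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theo, xs


def straightness(theo, xs):
    # Pearson correlation: values near 1 mean the points hug a straight line
    mt, mx = fmean(theo), fmean(xs)
    sxy = sum((t - mt) * (x - mx) for t, x in zip(theo, xs))
    stt = sum((t - mt) ** 2 for t in theo)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / (stt * sxx) ** 0.5


rng = random.Random(1)
normal_data = [rng.gauss(500, 20) for _ in range(200)]    # simulated N(500, 20)
skewed_data = [rng.expovariate(1.0) for _ in range(200)]  # simulated right-skewed data
r_normal = straightness(*normal_qq_points(normal_data))
r_skewed = straightness(*normal_qq_points(skewed_data))
print(round(r_normal, 3), round(r_skewed, 3))
```

The right-skewed sample bends away from the line, which shows up as a noticeably smaller correlation.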

Histogram, normal scores plot and normal quantile plot for data generated from a normal distribution (N(500, 20)).

[Figure: histogram of value (Frequency vs value); value vs normal scores; MINITAB normal probability plot for value (ML estimates; Percent vs Data)]

Histogram, normal scores plot and normal quantile plot for data generated from a right-skewed distribution.

[Figure: histogram of value (Frequency vs value); value vs normal scores; MINITAB normal probability plot for value (ML estimates; Percent vs Data)]

Histogram, normal scores plot and normal quantile plot for data generated from a left-skewed distribution.

[Figure: histogram of value (Frequency vs value); value vs normal scores; MINITAB normal probability plot for value (ML estimates; Percent vs Data)]

Histogram, normal scores plot and normal quantile plot for data generated from a uniform (0, 5) distribution.

[Figure: histogram of value (Frequency vs value); value vs normal scores; MINITAB normal probability plot for value (ML estimates; Percent vs Data)]

Sampling Distribution of a Statistic

The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

The distribution function of a statistic is NOT the same as the distribution of the original population that generated the original sample. The form of the theoretical sampling distribution of a statistic will depend upon the distribution of the observable random variables in the sample.
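A simulation makes the distinction concrete. The sketch below repeatedly draws samples of size 25 from a hypothetical, strongly right-skewed population (exponential with mean 1 and standard deviation 1) and records each sample mean. The collection of sample means approximates the sampling distribution of $\bar{X}$, whose spread is much smaller than the population's.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(286)          # fixed seed so the run is reproducible
n, reps = 25, 5000                # sample size and number of repeated samples

# Hypothetical population: exponential with mean 1 and standard deviation 1
# (strongly right-skewed).
sample_means = [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print("mean of the sample means:", round(fmean(sample_means), 3))   # close to 1
print("SD of the sample means:  ", round(pstdev(sample_means), 3))  # close to 1/sqrt(25) = 0.2
```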

Sampling from a Normal Population

Often we assume the random sample $X_1, X_2, \ldots, X_n$ is from a normal population with unknown mean $\mu$ and variance $\sigma^2$. Suppose we are interested in estimating $\mu$ and testing whether it is equal to a certain value. For this we need to know the probability distribution of the estimator of $\mu$.

Sampling Distribution of the Sample Mean

Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. normal random variables with unknown mean $\mu$ and variance $\sigma^2$. Then
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$
Proof:
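One standard way to carry out this proof is with moment generating functions (a sketch; the slide's own proof is not reproduced here). It uses the normal MGF $M_{X_i}(s) = \exp(\mu s + \frac{1}{2}\sigma^2 s^2)$ evaluated at $s = t/n$:

```latex
\begin{aligned}
M_{\bar{X}}(t) = E\left(e^{t\bar{X}}\right)
  &= \prod_{i=1}^{n} E\left(e^{(t/n)X_i}\right) && \text{(independence)}\\
  &= \left[\exp\left(\frac{\mu t}{n} + \frac{\sigma^2 t^2}{2n^2}\right)\right]^{n} && \text{(normal MGF at } s = t/n\text{)}\\
  &= \exp\left(\mu t + \frac{1}{2}\cdot\frac{\sigma^2}{n}\,t^2\right).
\end{aligned}
```

The last line is the MGF of $N(\mu, \sigma^2/n)$, and since an MGF determines the distribution, $\bar{X} \sim N(\mu, \sigma^2/n)$.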

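The Central Limit Theorem

Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with $E(X_i) = \mu < \infty$ and $Var(X_i) = \sigma^2 < \infty$. Let $S_n = \sum_{i=1}^{n} X_i$. Then
$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$$
converges in distribution to $Z \sim N(0, 1)$. Equivalently, since $\bar{X}_n = S_n/n$,
$$Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$$
converges in distribution to $Z \sim N(0, 1)$.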
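A small simulation can illustrate the theorem. This sketch assumes a Uniform(0, 1) population (so $\mu = 1/2$, $\sigma^2 = 1/12$), standardizes $S_n$ for $n = 30$ exactly as in $Z_n$ above, and compares the empirical probabilities $P(Z_n \le z)$ with the standard normal CDF $\Phi(z)$. The population, $n$, number of repetitions and seed are all illustrative choices.

```python
import math
import random
from statistics import NormalDist


def standardized_sums(n, reps, rng):
    # Population: Uniform(0, 1), so mu = 1/2 and sigma^2 = 1/12
    mu, sigma = 0.5, math.sqrt(1 / 12)
    out = []
    for _ in range(reps):
        s_n = sum(rng.random() for _ in range(n))
        out.append((s_n - n * mu) / (sigma * math.sqrt(n)))   # Z_n
    return out


rng = random.Random(2024)
zs = standardized_sums(n=30, reps=5000, rng=rng)
phi = NormalDist().cdf   # standard normal CDF
for z in (-1.0, 0.0, 1.0):
    empirical = sum(v <= z for v in zs) / len(zs)
    print(z, round(empirical, 3), round(phi(z), 3))
```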

Example

Suppose that the weights of airline passengers are known to have a distribution with a mean of 75 kg and a standard deviation of 10 kg. A certain plane has a passenger weight capacity of 7700 kg. What is the probability that a flight of 100 passengers will exceed the capacity?
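One way to answer this with the Central Limit Theorem, assuming the 100 passengers' weights are independent: the total weight $S_{100}$ has mean $100 \times 75 = 7500$ kg and standard deviation $10\sqrt{100} = 100$ kg, so $P(S_{100} > 7700) \approx P(Z > 2)$. A short Python sketch of the arithmetic:

```python
import math
from statistics import NormalDist

mu, sigma = 75.0, 10.0        # passenger weight mean and SD (kg)
n, capacity = 100, 7700.0     # passengers on the flight, weight capacity (kg)

mean_total = n * mu                   # E(S_n) = n * mu = 7500 kg
sd_total = sigma * math.sqrt(n)       # SD(S_n) = sigma * sqrt(n) = 100 kg
z = (capacity - mean_total) / sd_total
p_exceed = 1 - NormalDist().cdf(z)    # CLT normal approximation to P(S_n > 7700)

print(z, round(p_exceed, 4))   # 2.0 0.0228
```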

Question

State whether the following statements are true or false.

(i) As the sample size increases, the mean of the sampling distribution of the sample mean $\bar{X}$ decreases.
(ii) As the sample size increases, the standard deviation of the sampling distribution of the sample mean $\bar{X}$ decreases.
(iii) The mean $\bar{X}$ of a random sample of size 4 from a negatively skewed distribution is approximately normally distributed.
(iv) The distribution of the proportion of successes $\hat{p}$ in a sufficiently large sample is approximately normal with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$, where $p$ is the population proportion and $n$ is the sample size.
(v) If $\bar{X}$ is the mean of a simple random sample of size 9 from a N(500, 18) distribution, then $\bar{X}$ has a normal distribution with mean 500 and variance 36.

Question

State whether the following statements are true or false.

- A large sample from a skewed population will have an approximately normal-shaped histogram.
- The mean of a population will be normally distributed if the population is quite large.
- The average blood cholesterol level recorded in an SRS of 100 students from a large population will be approximately normally distributed.
- The proportion of people with incomes over $200 000 in an SRS of 10 people, selected from all Canadian income tax filers, will be approximately normal.

Exercise

A parking lot is patrolled twice a day (morning and afternoon). In the morning, the chance that any particular spot has an illegally parked car is 0.02. If the spot contained a car that was ticketed in the morning, the probability that the spot is also ticketed in the afternoon is 0.1. If the spot was not ticketed in the morning, there is a 0.005 chance that the spot is ticketed in the afternoon.

a) Suppose tickets cost $10. What is the expected value of the tickets for a single spot in the parking lot?
b) Suppose the lot contains 400 spots. What is the distribution of the value of the tickets for a day?
c) What is the probability that more than $200 worth of tickets are written in a day?
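A sketch of one way to work the exercise, assuming the 400 spots behave independently and that a morning ticket happens exactly when the spot holds an illegally parked car. The dollar value $V$ of one spot is 0, 10 or 20; part (a) is $E(V)$, and for (b) and (c) the Central Limit Theorem gives an approximately normal daily total with mean $400E(V)$ and variance $400\,Var(V)$. No continuity correction is applied here, which is itself a modelling choice.

```python
import math
from statistics import NormalDist

price = 10.0                     # $ per ticket
p_am = 0.02                      # P(ticket in the morning)
p_pm_given_am = 0.1              # P(afternoon ticket | morning ticket)
p_pm_given_no_am = 0.005         # P(afternoon ticket | no morning ticket)

# Distribution of the dollar value V for one spot
p_both = p_am * p_pm_given_am                    # V = $20
p_am_only = p_am * (1 - p_pm_given_am)           # V = $10
p_pm_only = (1 - p_am) * p_pm_given_no_am        # V = $10
dist = {0.0: 1 - p_both - p_am_only - p_pm_only,
        price: p_am_only + p_pm_only,
        2 * price: p_both}

ev = sum(v * p for v, p in dist.items())                  # (a) E(V)
var = sum(v * v * p for v, p in dist.items()) - ev ** 2   # Var(V)

spots = 400
mean_day = spots * ev                  # (b) approximate normal mean of the daily total
sd_day = math.sqrt(spots * var)        #     and its standard deviation
p_over_200 = 1 - NormalDist(mean_day, sd_day).cdf(200)   # (c) CLT approximation
print(round(ev, 3), round(mean_day, 1), round(sd_day, 2), round(p_over_200, 4))
```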

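Law of Large Numbers - Example

Toss a coin $n$ times. Suppose
$$X_i = \begin{cases} 1 & \text{if the } i\text{th toss came up H,} \\ 0 & \text{if the } i\text{th toss came up T.} \end{cases}$$
The $X_i$'s are Bernoulli random variables with $p = 1/2$ and $E(X_i) = 1/2$. The proportion of heads is
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
Intuitively, $\bar{X}_n$ approaches 1/2 as $n \to \infty$.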
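A simulated fair coin shows this behaviour. The sketch tosses a pseudo-random coin 100,000 times and records the running proportion of heads $\bar{X}_n$ at a few checkpoints (the seed and checkpoints are arbitrary choices).

```python
import random

rng = random.Random(2008)   # fixed seed for reproducibility
heads = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}
running = {}
for n in range(1, 100_001):
    heads += rng.random() < 0.5     # X_n = 1 for heads, 0 for tails
    if n in checkpoints:
        running[n] = heads / n      # proportion of heads after n tosses
for n in sorted(running):
    print(f"n = {n:>6}: proportion of heads = {running[n]:.4f}")
```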

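Law of Large Numbers

We are interested in a sequence of random variables $X_1, X_2, X_3, \ldots$ that are independent and identically distributed (i.i.d.). Let
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
Suppose $E(X_i) = \mu$ and $V(X_i) = \sigma^2$. Then
$$E(\bar{X}_n) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu$$
and, by independence,
$$V(\bar{X}_n) = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i) = \frac{\sigma^2}{n}.$$
Intuitively, as $n \to \infty$, $V(\bar{X}_n) \to 0$, so $\bar{X}_n \to E(\bar{X}_n) = \mu$.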

Formally, the Weak Law of Large Numbers (WLLN) states the following: Suppose $X_1, X_2, X_3, \ldots$ are i.i.d. with $E(X_i) = \mu < \infty$ and $V(X_i) = \sigma^2 < \infty$. Then for any positive number $a$,
$$P\left(|\bar{X}_n - \mu| \ge a\right) \to 0 \quad \text{as } n \to \infty.$$
This is called convergence in probability.
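A common route to the WLLN (one possible proof, not necessarily the one intended in the course) combines Chebyshev's inequality with $V(\bar{X}_n) = \sigma^2/n$ from the previous slide:

```latex
\begin{aligned}
P\left(|\bar{X}_n - \mu| \ge a\right)
  &\le \frac{V(\bar{X}_n)}{a^2} && \text{(Chebyshev's inequality, since } E(\bar{X}_n) = \mu\text{)}\\
  &= \frac{\sigma^2}{n a^2} \longrightarrow 0 && \text{as } n \to \infty.
\end{aligned}
```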

Recall - The Chi-Square Distribution

If $Z \sim N(0, 1)$, then $X = Z^2$ has a chi-square distribution with parameter 1, i.e., $X \sim \chi^2(1)$. One can prove this using the change-of-variable theorem for univariate random variables.

The moment generating function of $X$ is
$$m_X(t) = (1 - 2t)^{-1/2}, \quad t < \tfrac{1}{2}.$$

If $X_1 \sim \chi^2(v_1), X_2 \sim \chi^2(v_2), \ldots, X_k \sim \chi^2(v_k)$, all independent, then
$$T = \sum_{i=1}^{k} X_i \sim \chi^2\left(\sum_{i=1}^{k} v_i\right).$$
Proof:
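A sketch of the additivity proof via MGFs, using the standard fact that the MGF of a $\chi^2(v)$ variable is $(1-2t)^{-v/2}$ for $t < 1/2$ (the case $v = 1$ is the formula above):

```latex
\begin{aligned}
m_T(t) = E\left(e^{tT}\right)
  &= \prod_{i=1}^{k} m_{X_i}(t) && \text{(independence)}\\
  &= \prod_{i=1}^{k} (1-2t)^{-v_i/2} && \left(t < \tfrac{1}{2}\right)\\
  &= (1-2t)^{-\left(\sum_{i=1}^{k} v_i\right)/2},
\end{aligned}
```

which is the MGF of $\chi^2\left(\sum_{i=1}^{k} v_i\right)$.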

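Claim

Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. normal random variables with mean $\mu$ and variance $\sigma^2$. Then
$$Z_i = \frac{X_i - \mu}{\sigma}, \quad i = 1, 2, \ldots, n,$$
are independent standard normal random variables, and
$$\sum_{i=1}^{n} Z_i^2 = \sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2} \sim \chi^2(n).$$
Proof: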
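The proof can be sketched by chaining the two chi-square facts recalled above:

```latex
\begin{aligned}
&Z_i = \frac{X_i - \mu}{\sigma} \sim N(0,1), \text{ independent, since each } Z_i \text{ is a function of a different } X_i;\\
&Z_i^2 \sim \chi^2(1) \quad \text{(square of a standard normal)};\\
&\sum_{i=1}^{n} Z_i^2 \sim \chi^2\big(\underbrace{1 + \cdots + 1}_{n}\big) = \chi^2(n) \quad \text{(additivity of independent chi-squares)}.
\end{aligned}
```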

Sampling Distribution of $S^2$

Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. normal random variables with mean $\mu$ and variance $\sigma^2$. Then
$$\frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n} (X_i - \bar{X})^2 \sim \chi^2(n-1).$$
Further, it can be shown that $\bar{X}$ and $S^2$ are independent.
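A Monte Carlo check of both statements, as a sketch: for a hypothetical normal population ($\mu = 10$, $\sigma = 2$) and samples of size $n = 5$, the simulated values of $(n-1)S^2/\sigma^2$ should have mean close to $n - 1 = 4$ and variance close to $2(n-1) = 8$, the mean and variance of $\chi^2(n-1)$. The sample correlation between $\bar{X}$ and $S^2$ should be near 0; note that zero correlation is only consistent with independence, not a proof of it.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(8)
mu, sigma, n, reps = 10.0, 2.0, 5, 20000   # hypothetical normal population and sample size

scaled_s2, xbars = [], []
for _ in range(reps):
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    xb = sum(xs) / n
    s2 = sum((x - xb) ** 2 for x in xs) / (n - 1)   # sample variance S^2
    xbars.append(xb)
    scaled_s2.append((n - 1) * s2 / sigma ** 2)     # (n-1)S^2 / sigma^2

# A chi-square(n-1) variable has mean n-1 and variance 2(n-1)
print("mean:", round(fmean(scaled_s2), 2), "vs", n - 1)
print("variance:", round(pvariance(scaled_s2), 2), "vs", 2 * (n - 1))

# Sample correlation between X-bar and S^2 (independence implies 0)
mx, ms = fmean(xbars), fmean(scaled_s2)
cov = fmean([(a - mx) * (b - ms) for a, b in zip(xbars, scaled_s2)])
corr = cov / (pvariance(xbars) * pvariance(scaled_s2)) ** 0.5
print("corr(X-bar, S^2):", round(corr, 3))
```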

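t Distribution

Suppose $Z \sim N(0, 1)$ is independent of $X \sim \chi^2(v)$. Then
$$T = \frac{Z}{\sqrt{X/v}} \sim t(v).$$
Proof: using the one-dimensional change-of-variables theorem.

The density function of the t-distribution is given by
$$f(t) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\left(\frac{v}{2}\right)}\left(1 + \frac{t^2}{v}\right)^{-(v+1)/2}, \quad -\infty < t < \infty.$$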

Claim

Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. normal random variables with mean $\mu$ and variance $\sigma^2$. Then
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1).$$
Proof:
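A sketch of the proof: divide numerator and denominator by $\sigma/\sqrt{n}$ and recognize the pieces from the earlier slides. $Z$ is standard normal by the sampling distribution of $\bar{X}$, $V$ is $\chi^2(n-1)$ by the sampling distribution of $S^2$, and they are independent because $\bar{X}$ and $S^2$ are.

```latex
\frac{\bar{X} - \mu}{S/\sqrt{n}}
  = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2/\sigma^2}{n-1}}}
  = \frac{Z}{\sqrt{V/(n-1)}},
\qquad Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1),
\quad V = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1).
```

By the definition of the t distribution with $v = n - 1$, the ratio is $t(n-1)$.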

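F Distribution

Suppose $X \sim \chi^2(n)$ is independent of $Y \sim \chi^2(m)$. Then
$$\frac{X/n}{Y/m} \sim F(n, m).$$
The density function of the F distribution is given by
$$f(x) = \frac{\Gamma\left(\frac{n+m}{2}\right)}{\Gamma\left(\frac{n}{2}\right)\Gamma\left(\frac{m}{2}\right)}\left(\frac{n}{m}\right)^{n/2} x^{n/2 - 1}\left(1 + \frac{n}{m}x\right)^{-(n+m)/2}, \quad x > 0.$$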

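Properties of the F Distribution

The F-distribution is a right-skewed distribution.

If $F \sim F(n, m)$, then $\frac{1}{F} \sim F(m, n)$, i.e., for any $a > 0$,
$$P\left(F(n, m) < a\right) = P\left(\frac{1}{F(n, m)} > \frac{1}{a}\right) = P\left(F(m, n) > \frac{1}{a}\right).$$
We can use Table A.6 in the appendix to find percentiles of the F-distribution.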
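The reciprocal property can be checked by simulation. This sketch builds F variables directly from the definition, drawing each chi-square as a gamma variable ($\chi^2(v)$ is Gamma with shape $v/2$ and scale 2, via Python's `random.gammavariate(alpha, beta)`, where `beta` is the scale). It then estimates both sides of the identity for hypothetical choices $n = 5$, $m = 10$, $a = 0.5$.

```python
import random

rng = random.Random(34)


def chi2(df):
    # A chi-square(df) variable is Gamma(shape = df/2, scale = 2)
    return rng.gammavariate(df / 2, 2.0)


def f_draw(df1, df2):
    # (X/df1) / (Y/df2) with X ~ chi2(df1) independent of Y ~ chi2(df2)
    return (chi2(df1) / df1) / (chi2(df2) / df2)


n, m, a, reps = 5, 10, 0.5, 40000   # hypothetical degrees of freedom and cut-off
lhs = sum(f_draw(n, m) < a for _ in range(reps)) / reps        # P(F(n, m) < a)
rhs = sum(f_draw(m, n) > 1 / a for _ in range(reps)) / reps    # P(F(m, n) > 1/a)
print(round(lhs, 3), round(rhs, 3))   # the two estimates should nearly agree
```

In practice this identity is what lets a table that lists only upper-tail F percentiles also supply lower-tail ones.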