Parameter, Statistic and Random Samples
A parameter is a number that describes the population. It is a fixed number, but in practice we do not know its value. A statistic is a function of the sample data, i.e., it is a quantity whose value can be calculated from the sample data. It is a random variable with a distribution function. Statistics are used to make inferences about unknown population parameters. The random variables $X_1, X_2, \ldots, X_n$ are said to form a (simple) random sample of size $n$ if the $X_i$'s are independent random variables and each $X_i$ has the same probability distribution. We say that the $X_i$'s are i.i.d.
STA286 week 8
Example: Sample Mean and Variance
Suppose $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population with mean $\mu$ and variance $\sigma^2$. The sample mean is defined as $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. The sample variance is defined as $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$. The sample standard deviation, $S$, is the square root of the sample variance.
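As a quick numerical check of these definitions, here is a minimal Python sketch. The five data values are hypothetical and chosen only so that the arithmetic is easy to follow; the comparison with the `statistics` module assumes its documented $n-1$ convention for `variance`.

```python
import math
import statistics

# Hypothetical sample of n = 5 observations (illustrative values only).
sample = [2.0, 4.0, 4.0, 5.0, 10.0]
n = len(sample)

# Sample mean: (1/n) * sum of the observations.
x_bar = sum(sample) / n

# Sample variance: squared deviations from the mean, divided by n - 1.
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# Sample standard deviation: square root of the sample variance.
s = math.sqrt(s2)

# The standard library follows the same n - 1 convention.
assert math.isclose(x_bar, statistics.mean(sample))
assert math.isclose(s2, statistics.variance(sample))

print(x_bar, s2, s)
```

Here the deviations from the mean are -3, -1, -1, 0 and 5, so the sum of squares is 36 and dividing by $n - 1 = 4$ gives a sample variance of 9.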
Quantiles
A quantile of a sample, $x_p$, is the value for which a specific fraction, $p$, of the data values is less than or equal to it, and the fraction $(1-p)$ is greater than it. The best-known quantile is the median, which is the 50th percentile (the quantile with $p = 0.5$). Quantiles are often described as percentiles, and they represent estimates of characteristics of the theoretical distribution. If a data set contains $n$ observations, then the $p$th percentile is the $\frac{p}{100}(n+1)$th value in the ordered data set. We can describe the spread or variability of a distribution by giving several percentiles.
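The position rule can be sketched in Python as follows. The function name `percentile_by_position` is made up for this illustration, and the linear interpolation used when the position $\frac{p}{100}(n+1)$ is not a whole number is an assumption; the slide does not say how fractional positions are handled.

```python
def percentile_by_position(sorted_data, p):
    """Return the p-th percentile using the 1-based position p(n + 1)/100.

    Assumption: fractional positions are resolved by linear interpolation
    between the two neighbouring ordered values.
    """
    n = len(sorted_data)
    pos = p * (n + 1) / 100
    # Positions outside 1..n are clamped to the smallest or largest value.
    if pos <= 1:
        return sorted_data[0]
    if pos >= n:
        return sorted_data[-1]
    lower = int(pos)      # 1-based index of the ordered value just below
    frac = pos - lower    # fractional part of the position
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    return below + frac * (above - below)


# Hypothetical ordered data set with n = 9 observations.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
median = percentile_by_position(data, 50)   # position 5
first_q = percentile_by_position(data, 25)  # position 2.5
print(median, first_q)
```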
Quartiles
The 25th percentile is called the first quartile ($Q_1$). The 75th percentile is called the third quartile ($Q_3$). Note that the median is the second quartile, $Q_2$. The distance between the first and third quartiles is called the interquartile range (IQR), i.e., $IQR = Q_3 - Q_1$. The IQR is another measure of spread, one that is less sensitive to the influence of extreme values.
The five-number summary
The five-number summary of a set of observations consists of the smallest observation, the first quartile, the median, the third quartile and the largest observation. These five numbers give a reasonably complete description of both the center and the spread of the distribution.
MINITAB commands: Stat > Basic Statistics > Display Descriptive Statistics
Example
The highway mileages of 20 cars, arranged in increasing order, are: 13 15 16 16 17 19 20 22 23 23 23 24 25 25 26 28 28 28 29 32. Give the five-number summary.
Answer
We have min = 13, $Q_1$ = 18, median = 23, $Q_3$ = 27, max = 32.
The MINITAB output using the above commands is as follows:

Variable   N    Minimum   Q1      Median   Q3      Maximum
mileage    20   13.00     17.50   23.00    27.50   32.00
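The hand answer and the MINITAB output differ in the quartiles, and a short Python sketch may help show why. It assumes the 20 mileages read 13, 15, ..., 32 (leading digits restored from the MINITAB minimum and quartiles), that the hand answer takes each quartile as the median of one half of the ordered data, and that the software interpolates at positions $(n+1)/4$ and $3(n+1)/4$. The helper name `quartiles_by_halves` is made up for this illustration.

```python
import statistics

# Highway mileages of the 20 cars, in increasing order (assumed reading).
mileage = [13, 15, 16, 16, 17, 19, 20, 22, 23, 23,
           23, 24, 25, 25, 26, 28, 28, 28, 29, 32]


def quartiles_by_halves(data):
    """Quartiles as medians of the lower and upper halves of the ordered data."""
    data = sorted(data)
    half = len(data) // 2
    lower, upper = data[:half], data[-half:]
    return (statistics.median(lower),
            statistics.median(data),
            statistics.median(upper))


# Hand method: medians of the halves.
q1_h, med_h, q3_h = quartiles_by_halves(mileage)

# Interpolation at positions p(n + 1): the "exclusive" method of the
# standard library uses this rule.
q1_i, med_i, q3_i = statistics.quantiles(mileage, n=4, method="exclusive")

five_number_hand = (min(mileage), q1_h, med_h, q3_h, max(mileage))
five_number_interp = (min(mileage), q1_i, med_i, q3_i, max(mileage))
iqr_hand = q3_h - q1_h

print(five_number_hand)
print(five_number_interp)
print(iqr_hand)
```

Both conventions are in common use; they agree on the minimum, median and maximum and differ only in how the quartiles are located between ordered values.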
Box-plot
A box-plot is a graph of the five-number summary. Example: Make a box-plot for the data in the above example.
[Figure: Boxplot of Mileages; vertical axis: Mileages]
MINITAB commands: Graph > Boxplot
Quantile Plots
A quantile plot is a plot of the data values on the vertical axis against an empirical assessment of the fraction of observations exceeded by the data value. A very useful quantile plot is the Normal Quantile-Quantile plot. It is often used by analysts to determine whether a data set came from a normal distribution. A Normal Quantile-Quantile plot is a plot of the empirical (data) quantiles against the corresponding quantiles of the normal distribution.
Interpreting Normal Quantile Plots
If the data come from any normal distribution, the normal quantile-quantile (NQQ) plot produces a straight line. If the points on a normal quantile plot lie close to a straight line, the plot indicates that the data are normal. Systematic deviations from a straight line indicate a nonnormal distribution. Outliers appear as points that are far away from the overall pattern of the plot.
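The "close to a straight line" criterion can be illustrated numerically. The sketch below pairs the ordered data with standard normal quantiles and measures straightness by the correlation of the plotted points. The plotting positions $(i - 0.5)/n$, the use of correlation as a straightness measure, the sample sizes and the seed are all assumptions made for this illustration; MINITAB may use a different plotting position.

```python
import random
from statistics import NormalDist, mean


def normal_qq_points(data):
    """Return (theoretical normal quantiles, ordered data) for a normal Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    # Assumed plotting positions (i - 0.5)/n for i = 1, ..., n.
    theo = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theo, xs


def correlation(u, v):
    """Pearson correlation of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


rng = random.Random(0)
normal_sample = [rng.gauss(500, 20) for _ in range(200)]   # normal data
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]  # right-skewed data

r_normal = correlation(*normal_qq_points(normal_sample))
r_skewed = correlation(*normal_qq_points(skewed_sample))
print(round(r_normal, 3), round(r_skewed, 3))
```

A correlation very near 1 corresponds to points hugging a straight line; the right-skewed sample bends away from the line and gives a visibly lower value.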
Histogram, the nscores plot and the normal quantile plot for data generated from a normal distribution (N(500, 20)).
[Figure: histogram (Frequency vs. value); value vs. nscores plot; Normal Probability Plot for value (Percent vs. Data), with ML estimates of the mean and standard deviation]
Histogram, the nscores plots and the normal quantile plot for data generated from a right-skewed distribution.
[Figure: histogram (Frequency vs. value); value vs. nscores plot; nscores vs. value plot; Normal Probability Plot for value (Percent vs. Data), with ML estimates of the mean and standard deviation]
Histogram, the nscores plots and the normal quantile plot for data generated from a left-skewed distribution.
[Figure: histogram (Frequency vs. value); value vs. nscores plot; nscores vs. value plot; Normal Probability Plot for value (Percent vs. Data), with ML estimates of the mean and standard deviation]
Histogram, the nscores plots and the normal quantile plot for data generated from a uniform distribution on (0, 5).
[Figure: histogram (Frequency vs. value); value vs. nscores plot; nscores vs. value plot; Normal Probability Plot for value (Percent vs. Data), with ML estimates of the mean and standard deviation]
Sampling Distribution of a Statistic
The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population. The distribution function of a statistic is NOT the same as the distribution of the original population that generated the original sample. The form of the theoretical sampling distribution of a statistic will depend upon the distribution of the observable random variables in the sample.
Sampling from a Normal population
Often we assume the random sample $X_1, X_2, \ldots, X_n$ is from a normal population with unknown mean $\mu$ and variance $\sigma^2$. Suppose we are interested in estimating $\mu$ and testing whether it is equal to a certain value. For this we need to know the probability distribution of the estimator of $\mu$.
Sampling Distribution of the Sample Mean
Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. normal random variables with unknown mean $\mu$ and variance $\sigma^2$. Then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
Proof:
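One standard route to this result is sketched here. It assumes the moment generating function of a normal variable, $m_X(t) = \exp(\mu t + \sigma^2 t^2/2)$, and the fact that an MGF determines the distribution; the proof given in lecture may take a different route.

```latex
\begin{align*}
m_{\bar{X}}(t)
  &= E\left[e^{t\bar{X}}\right]
   = E\left[\exp\left(\frac{t}{n}\sum_{i=1}^{n} X_i\right)\right]
   = \prod_{i=1}^{n} m_{X_i}\left(\frac{t}{n}\right)
   && \text{(independence)} \\
  &= \left[\exp\left(\mu\,\frac{t}{n} + \frac{\sigma^2}{2}\,\frac{t^2}{n^2}\right)\right]^{n}
   = \exp\left(\mu t + \frac{\sigma^2}{n}\cdot\frac{t^2}{2}\right)
   && \text{(identical distribution)}
\end{align*}
```

The last expression is the MGF of a normal distribution with mean $\mu$ and variance $\sigma^2/n$.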
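The Central Limit Theorem
Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with mean $E(X_i) = \mu < \infty$ and $Var(X_i) = \sigma^2 < \infty$. Let $S_n = \sum_{i=1}^{n} X_i$. Then $Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}$ converges in distribution to $Z \sim N(0,1)$. Also, $Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$, where $\bar{X}_n = S_n/n$, converges in distribution to $Z \sim N(0,1)$.
Example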
Example
Suppose that the weights of airline passengers are known to have a distribution with a mean of 75 kg and a standard deviation of 10 kg. A certain plane has a passenger weight capacity of 7700 kg. What is the probability that a flight of 100 passengers will exceed the capacity?
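A possible worked solution using the Central Limit Theorem, sketched in Python. It assumes the passenger weights are i.i.d. with mean 75 kg and standard deviation 10 kg, and that $n = 100$ is large enough for the normal approximation to the total weight to be reasonable.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 75.0, 10.0   # mean and standard deviation of one passenger (kg)
n = 100                  # number of passengers
capacity = 7700.0        # weight capacity of the plane (kg)

# By the CLT the total weight S_n is approximately N(n*mu, n*sigma^2).
total_mean = n * mu           # 7500 kg
total_sd = sigma * sqrt(n)    # 100 kg

# Standardize the capacity and take the upper tail probability.
z = (capacity - total_mean) / total_sd
p_exceed = 1 - NormalDist().cdf(z)

print(z, round(p_exceed, 4))
```

The capacity sits two standard deviations above the expected total weight, so the approximate probability of exceeding it is $P(Z > 2) \approx 0.0228$.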
Question
State whether the following statements are true or false.
(i) As the sample size increases, the mean of the sampling distribution of the sample mean $\bar{X}$ decreases.
(ii) As the sample size increases, the standard deviation of the sampling distribution of the sample mean $\bar{X}$ decreases.
(iii) The mean $\bar{X}$ of a random sample of size 4 from a negatively skewed distribution is approximately normally distributed.
(iv) The distribution of the proportion of successes $\bar{X}$ in a sufficiently large sample is approximately normal with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$, where $p$ is the population proportion and $n$ is the sample size.
(v) If $\bar{X}$ is the mean of a simple random sample of size 9 from a N(500, 8) distribution, then $\bar{X}$ has a normal distribution with mean 500 and variance 36.
Question
State whether the following statements are true or false.
- A large sample from a skewed population will have an approximately normal-shaped histogram.
- The mean of a population will be normally distributed if the population is quite large.
- The average blood cholesterol level recorded in an SRS of 100 students from a large population will be approximately normally distributed.
- The proportion of people with incomes over $200 000, in an SRS of 10 people selected from all Canadian income tax filers, will be approximately normal.
Exercise
A parking lot is patrolled twice a day (morning and afternoon). In the morning, the chance that any particular spot has an illegally parked car is 0.02. If the spot contained a car that was ticketed in the morning, the probability that the spot is also ticketed in the afternoon is 0.1. If the spot was not ticketed in the morning, there is a 0.005 chance that the spot is ticketed in the afternoon.
a) Suppose tickets cost $10. What is the expected value of the tickets for a single spot in the parking lot?
b) Suppose the lot contains 400 spots. What is the distribution of the value of the tickets for a day?
c) What is the probability that more than $200 worth of tickets are written in a day?
Law of Large Numbers - Example
Toss a coin $n$ times. Suppose $X_i = 1$ if the $i$th toss came up H and $X_i = 0$ if the $i$th toss came up T. The $X_i$'s are Bernoulli random variables with $p = 1/2$ and $E(X_i) = 1/2$. The proportion of heads is $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Intuitively, $\bar{X}_n$ approaches $1/2$ as $n \to \infty$.
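Law of Large Numbers
We are interested in a sequence of random variables $X_1, X_2, X_3, \ldots$ such that the random variables are independent and identically distributed (i.i.d.). Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Suppose $E(X_i) = \mu$ and $V(X_i) = \sigma^2$. Then
$E(\bar{X}_n) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu$
and
$V(\bar{X}_n) = V\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i) = \frac{\sigma^2}{n}$.
Intuitively, as $n \to \infty$, $V(\bar{X}_n) \to 0$, so $\bar{X}_n \to E(\bar{X}_n) = \mu$.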
Formally, the Weak Law of Large Numbers (WLLN) states the following: Suppose $X_1, X_2, X_3, \ldots$ are i.i.d. with $E(X_i) = \mu < \infty$ and $V(X_i) = \sigma^2 < \infty$. Then for any positive number $a$, $P\left(|\bar{X}_n - \mu| > a\right) \to 0$ as $n \to \infty$. This is called Convergence in Probability.
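The coin-tossing example can be simulated to watch the proportion of heads settle near $1/2$. The sketch below also prints the Chebyshev bound $P(|\bar{X}_n - \mu| > a) \le \sigma^2/(n a^2)$, a standard inequality that is not stated on the slides and is included only to show one way the probability can be forced to 0. The seed, the sample sizes and the tolerance $a = 0.01$ are arbitrary choices.

```python
import random

rng = random.Random(8)  # arbitrary seed so the run is reproducible


def proportion_of_heads(n):
    """Toss a fair coin n times and return the proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n


sizes = [10, 100, 1000, 10000, 100000]
props = {n: proportion_of_heads(n) for n in sizes}

# Chebyshev bound on P(|X_bar_n - 1/2| > a); for a fair coin sigma^2 = 1/4.
a = 0.01
bounds = {n: min(1.0, 0.25 / (n * a * a)) for n in sizes}

for n in sizes:
    print(n, props[n], bounds[n])
```

The bound is uninformative (equal to 1) for small $n$ but shrinks like $1/n$, which matches the statement that the probability of a deviation larger than $a$ tends to 0.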
Recall - The Chi-Square distribution
If $Z \sim N(0,1)$ then $X = Z^2$ has a Chi-Square distribution with parameter 1, i.e., $X \sim \chi^2_{(1)}$. This can be proved using the change of variable theorem for univariate random variables. The moment generating function of $X$ is $m_X(t) = \left(\frac{1}{1-2t}\right)^{1/2}$ for $t < 1/2$. If $X_1 \sim \chi^2_{(v_1)}, X_2 \sim \chi^2_{(v_2)}, \ldots, X_k \sim \chi^2_{(v_k)}$, all independent, then $T = \sum_{i=1}^{k} X_i \sim \chi^2_{\left(\sum_{i=1}^{k} v_i\right)}$.
Proof:
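Claim
Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. normal random variables with mean $\mu$ and variance $\sigma^2$. Then $Z_i = \frac{X_i - \mu}{\sigma}$, $i = 1, 2, \ldots, n$, are independent standard normal variables, and $\sum_{i=1}^{n} Z_i^2 = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_{(n)}$.
Proof: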
Sampling Distribution of $S^2$
Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. normal random variables with mean $\mu$ and variance $\sigma^2$. Then $\frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n} (X_i - \bar{X})^2 \sim \chi^2_{(n-1)}$. Further, it can be shown that $\bar{X}$ and $S^2$ are independent.
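A simulation can make this result plausible without proving it. The sketch below repeatedly draws normal samples, computes $(n-1)S^2/\sigma^2$, and compares the simulated mean and variance with $n-1$ and $2(n-1)$, the mean and variance of a $\chi^2_{(n-1)}$ distribution (a standard fact not stated on the slide). The values of $\mu$, $\sigma$, $n$, the number of repetitions and the seed are arbitrary.

```python
import random
import statistics

rng = random.Random(30)          # arbitrary seed for reproducibility
mu, sigma, n = 0.0, 2.0, 6       # hypothetical population and sample size
reps = 20000                     # number of simulated samples

scaled = []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)            # sample variance, n - 1 divisor
    scaled.append((n - 1) * s2 / sigma ** 2)    # (n - 1) S^2 / sigma^2

sim_mean = statistics.fmean(scaled)
sim_var = statistics.variance(scaled)

# Chi-square with n - 1 degrees of freedom has mean n - 1 and variance 2(n - 1).
print(sim_mean, n - 1)
print(sim_var, 2 * (n - 1))
```

Agreement of the first two moments does not establish the full distribution, but a large mismatch would signal an error in the degrees of freedom, for example using $n$ instead of $n-1$.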
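t distribution
Suppose $Z \sim N(0,1)$ is independent of $X \sim \chi^2_{(v)}$. Then $T = \frac{Z}{\sqrt{X/v}} \sim t_{(v)}$.
Proof: use the one-dimensional change of variables theorem.
The density function of the t-distribution is given by $f(t) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\left(\frac{v}{2}\right)}\left(1 + \frac{t^2}{v}\right)^{-(v+1)/2}$, $-\infty < t < \infty$.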
Claim
Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. normal random variables with mean $\mu$ and variance $\sigma^2$. Then $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{(n-1)}$.
Proof:
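A sketch of how the proof can go, combining the earlier results: the sampling distribution of $\bar{X}$, the sampling distribution of $S^2$, their independence, and the definition of the t distribution. The symbols $Z$ and $V$ are introduced here for convenience, and the argument given in lecture may be organized differently.

```latex
\begin{align*}
Z &= \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1), \qquad
V = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{(n-1)}, \qquad
Z \text{ and } V \text{ independent}, \\
T &= \frac{Z}{\sqrt{V/(n-1)}}
   = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{S^2/\sigma^2}}
   = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{(n-1)}.
\end{align*}
```

The unknown $\sigma$ cancels in the ratio, which is why this statistic can be used for inference about $\mu$ when $\sigma$ is not known.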
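F distribution
Suppose $X \sim \chi^2_{(n)}$ is independent of $Y \sim \chi^2_{(m)}$. Then $\frac{X/n}{Y/m} \sim F_{(n,m)}$.
The density function of the F distribution is given by $f(x) = \frac{\Gamma\left(\frac{n+m}{2}\right)}{\Gamma\left(\frac{n}{2}\right)\Gamma\left(\frac{m}{2}\right)}\left(\frac{n}{m}\right)^{n/2} x^{n/2 - 1}\left(1 + \frac{n}{m}x\right)^{-(n+m)/2}$, $x > 0$.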
Properties of the F distribution
The F-distribution is a right-skewed distribution.
$F_{(n,m)} = \frac{1}{F_{(m,n)}}$, i.e., $P\left(F_{(n,m)} < a\right) = P\left(\frac{1}{F_{(n,m)}} > \frac{1}{a}\right) = P\left(F_{(m,n)} > \frac{1}{a}\right)$.
Table A.6 in the appendix can be used to find percentiles of the F-distribution.
Example