Chapter 5 Elementary Statistics, Empirical Probability Distributions, and More on Simulation

Similar documents
CHAPTER VI Statistical Analysis of Experimental Data

Econometric Methods. Review of Estimation

Lecture Notes Types of economic variables

Summary of the lecture in Biostatistics

best estimate (mean) for X uncertainty or error in the measurement (systematic, random or statistical) best

12.2 Estimating Model parameters Assumptions: ox and y are related according to the simple linear regression model

Chapter 5 Properties of a Random Sample

MEASURES OF DISPERSION

Mean is only appropriate for interval or ratio scales, not ordinal or nominal.

Chapter 8: Statistical Analysis of Simulated Data

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

ESS Line Fitting

Lecture 3. Sampling, sampling distributions, and parameter estimation

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ " 1

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #1

( ) = ( ) ( ) Chapter 13 Asymptotic Theory and Stochastic Regressors. Stochastic regressors model

ENGI 3423 Simple Linear Regression Page 12-01

Median as a Weighted Arithmetic Mean of All Sample Observations

Module 7: Probability and Statistics

Functions of Random Variables

hp calculators HP 30S Statistics Averages and Standard Deviations Average and Standard Deviation Practice Finding Averages and Standard Deviations

Objectives of Multiple Regression

Module 7. Lecture 7: Statistical parameter estimation

STA302/1001-Fall 2008 Midterm Test October 21, 2008

Multiple Choice Test. Chapter Adequacy of Models for Regression

Lecture 3 Probability review (cont d)

Chapter 8. Inferences about More Than Two Population Central Values

{ }{ ( )} (, ) = ( ) ( ) ( ) Chapter 14 Exercises in Sampling Theory. Exercise 1 (Simple random sampling): Solution:

Simple Linear Regression

f f... f 1 n n (ii) Median : It is the value of the middle-most observation(s).

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

Class 13,14 June 17, 19, 2015


Random Variate Generation ENM 307 SIMULATION. Anadolu Üniversitesi, Endüstri Mühendisliği Bölümü. Yrd. Doç. Dr. Gürkan ÖZTÜRK.

Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions.

Discrete Mathematics and Probability Theory Fall 2016 Seshia and Walrand DIS 10b

Chapter Business Statistics: A First Course Fifth Edition. Learning Objectives. Correlation vs. Regression. In this chapter, you learn:

X ε ) = 0, or equivalently, lim

Parameter, Statistic and Random Samples

Simulation Output Analysis

Chapter 14 Logistic Regression Models

Lecture 1: Introduction to Regression

A Study of the Reproducibility of Measurements with HUR Leg Extension/Curl Research Line

Lecture 1 Review of Fundamental Statistical Concepts

Descriptive Statistics

Multiple Linear Regression Analysis

PROPERTIES OF GOOD ESTIMATORS

Lecture 7. Confidence Intervals and Hypothesis Tests in the Simple CLR Model

Measures of Dispersion

Bounds on the expected entropy and KL-divergence of sampled multinomial distributions. Brandon C. Roy

Simple Linear Regression

ENGI 4421 Joint Probability Distributions Page Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections

Linear Regression. Hsiao-Lung Chan Dept Electrical Engineering Chang Gung University, Taiwan

Special Instructions / Useful Data

Lecture 8: Linear Regression

Third handout: On the Gini Index

22 Nonparametric Methods.

Correlation and Simple Linear Regression

STATISTICAL INFERENCE

Lecture 1: Introduction to Regression

ENGI 4421 Propagation of Error Page 8-01

Midterm Exam 1, section 1 (Solution) Thursday, February hour, 15 minutes

Introduction to local (nonparametric) density estimation. methods

Feature Selection: Part 2. 1 Greedy Algorithms (continued from the last lecture)

Chapter 13, Part A Analysis of Variance and Experimental Design. Introduction to Analysis of Variance. Introduction to Analysis of Variance

Arithmetic Mean Suppose there is only a finite number N of items in the system of interest. Then the population arithmetic mean is

Linear Regression with One Regressor

Statistics Descriptive and Inferential Statistics. Instructor: Daisuke Nagakura

Comparison of Dual to Ratio-Cum-Product Estimators of Population Mean

1. A real number x is represented approximately by , and we are told that the relative error is 0.1 %. What is x? Note: There are two answers.

Can we take the Mysticism Out of the Pearson Coefficient of Linear Correlation?

Estimation of Stress- Strength Reliability model using finite mixture of exponential distributions

GOALS The Samples Why Sample the Population? What is a Probability Sample? Four Most Commonly Used Probability Sampling Methods

Bayes (Naïve or not) Classifiers: Generative Approach

ε. Therefore, the estimate

STA 105-M BASIC STATISTICS (This is a multiple choice paper.)

Linear Regression. Hsiao-Lung Chan Dept Electrical Engineering Chang Gung University, Taiwan

Line Fitting and Regression

STK4011 and STK9011 Autumn 2016

2.28 The Wall Street Journal is probably referring to the average number of cubes used per glass measured for some population that they have chosen.

Chapter Statistics Background of Regression Analysis

CS286.2 Lecture 4: Dinur s Proof of the PCP Theorem

Chapter 13 Student Lecture Notes 13-1

Continuous Distributions

Multivariate Transformation of Variables and Maximum Likelihood Estimation

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE

Lecture 9: Tolerant Testing

1 Onto functions and bijections Applications to Counting

Section l h l Stem=Tens. 8l Leaf=Ones. 8h l 03. 9h 58

The equation is sometimes presented in form Y = a + b x. This is reasonable, but it s not the notation we use.

Lecture 5: Interpolation. Polynomial interpolation Rational approximation

BIOREPS Problem Set #11 The Evolution of DNA Strands

Statistics MINITAB - Lab 5

CHAPTER 4 RADICAL EXPRESSIONS

Quantitative analysis requires : sound knowledge of chemistry : possibility of interferences WHY do we need to use STATISTICS in Anal. Chem.?

The expected value of a sum of random variables,, is the sum of the expected values:

Random Variables and Probability Distributions

5.1 Properties of Random Numbers

For combinatorial problems we might need to generate all permutations, combinations, or subsets of a set.

Chapter 11 Systematic Sampling

Transcription:

Chapter 5 Elemetary Statstcs, Emprcal Probablty Dstrbutos, ad More o Smulato

Cotets Coectg Probablty wth Observatos of Data Sample Mea ad Sample Varace Regresso Techques Emprcal Dstrbuto Fuctos More o Mote Carlo Smulato Statstcal Process Cotrol Covergece of the Sample Mea to the Mea

Coectg Probablty wth Observatos of Data Scope of the chapter I chapter to chapter4: aoms of probablty, varous probablty relatoshps world of mathematcal models we ow deal wth problems of real world sample mea, varace, stadard devato, emprcal dstrbuto of radom data etc.. realm of statstcs

Coectg Probablty wth Observatos of Data (Cot.) Remark:. We wll also cosder estmato theory, ad decso makg based o probablstc cocepts later chapters.. Computer smulato of radom pheomea wll be revsted as well.

Sample Mea ad Sample Varace Suppose we take a sample of maufactured tems {,,, }, e.g. resstors Populato: totalty of tems {,,, } Sample: dvdual 's

Sample Mea ad Sample Varace (Cot.) Defto 5. Sample mea The sample mea of a populato s smply the arthmetc average of all the sample values,.e. Remarks Notce that the sample mea s a radom varable, depedg o the sample values selected from the populato. Hope that t s close to the actual mea: to be dscussed later...

Sample Mea ad Sample Varace (Cot.) Defto 5. Sample varace The sample varace of a populato s defed a smlar maer as the varace of a radom varable dscussed chapter,.e. s ( ) Where s the sample mea defed above.

Sample Mea ad Sample Varace (Cot.) Remarks Notce that s s a radom varable as sample mea s a radom varable It represets the measure of the spread of populato aroud the sample mea The square root of sample varace s called the stadard devato s ( )

Sample Mea ad Sample Varace (Cot.) The reaso for dvso by - rather tha wll become apparet subsequet chapter whe we cosder estmato. Note, for a momet, that as becomes large, the dfferece s small

Sample Mea ad Sample Varace (Cot.) Eample 5. Fd the sample mea of the followg resstace values Ω; 900, 03, 939, 06, 07, 996, 970, 079, 065, ad 049. Soluto Applyg the defto of the sample mea above, we easly get: 009

Sample Mea ad Sample Varace (Cot.) Eample 5. Obta the sample stadard devato of the resstace samples gve the above eample. Soluto Usg the defto of sample stadard devato, we obta s 58. 7

Sample Mea ad Sample Varace (Cot.) Note As the # of populato gets large, computatos volved the sample varace becomes tedous, due to the subtracto procedure... Frst, you calculate the sample mea, ad the subtract t from each sample before squarg ad summg aga

Sample Mea ad Sample Varace (Cot.) Computatoally effcet way of obtag Epadg the RHS of the sample varace, we get applyg the defto of the sample mea above s s ) ( ) ( s

Sample Mea ad Sample Varace (Cot.) Remark: We oly eed the sums of the sample mea ad the squares of the sample mea, wthout the tedous step of subtracto...

Sample Mea ad Sample Varace (Cot.) Eample 5.3 Redo the eample 5- followg the procedure metoed above. Soluto: We get s ad s 0,,806, 58,7, whch s 0,090, ad thus obta the same result obtaed prevously.

Sample Mea ad Sample Varace (Cot.) Eample 5.4 The resduals are defed as: Show that the sum of all resduals s equal to zero. Soluto: From the defto of the sample mea, we have: whch meas d 0 or d 0 ( ) 0

Regresso Techques Suppose we are gve measuremets of a par of data some relatoshp b/w the data 3 matchg the data relato to a smple straght le Objectve: Gve the measuremets of data pars, {(, y ), =,,, }, fd a straght le whch s best ft to the data: 3 Precse determato of the relatoshp s ot possble due to measuremet errors y

Regresso Techques (Cot.) Crtero: Choose the costats α ad β : the straght le ft to the data s take the sese of mmum squared error: (, ) argm o o (, ) where subscrpt o stads for optmum, ad the ε s the average squared error defed as follows: ( y )

Regresso Techques (Cot.) Procedure: (regresso techque) To fd (α o, β o ) makg ε as small as possble, we dfferetate ε wth respect to α ad β, ad put to zero: 0 ) ( 0 ) ( o o o o y y

Regresso Techques (Cot.) We ca re-arrage the above equatos to the followgs: o o o o y y

Regresso Techques (Cot.) Solvg for (α o, β o ) yelds: where represets the sample varace of the values. y s y y o o o ) ( s

Regresso Techques (Cot.) For further smplfcato, defe the sample covarace c y ad the sample correlato coeffcet r y respectvely as: c y ( )( y y) ad where s ad s y represet the sample stadard devato of the data sets { } ad {y }, respectvely. r y s c y s y

Regresso Techques (Cot.) By epadg the product c y, we have: Ad the costat α o ca smply be epressed as: The regresso le whch s best ft to the data becomes: ) ( y y c y y o s c s c y s c y y y o o

Regresso Techques (Cot.) The regresso le ca be arraged to: cy y y ( ) s Or, the regresso le ca be epressed as follows: 4 y y s y 4 Note that ths s a somewhat easer epresso to remember... r y s

Regresso Techques (Cot.) Remarks:. If r y = 0 the regresso le vashes, ad the data set are called ucorrelated.. If r y = ±, the data are learly related as follows: 3. The goodess of the ft s determed by the squared error ε: eve though t has bee mmzed, the regresso le mght stll ot well ft to data, especally whe the correlato b/w/ data s small. y m b

Regresso Techques (Cot.) Eample 5.5 Fd the correlato coeffcet ad the regresso le for the followg data set: Soluto: 0.68 0.7.7.0.63 3.06 3.5 4.00 4.03 4.50 y.45 9.93 6.64 0.4 8.93 3.34.56 6.7 9.6 5.03 From the data set, we ca get: r y = 0.7, =.6, y =.44, ad s = :39, s y = 3:88. Thus, the regresso le becomes: y.44.6 0.7 3.88.39 whch s Fgure5-

Regresso Techques (Cot.) Fgure 5.: Data pars ad regresso le for the radom data set of Eample5-5

Emprcal Dstrbuto Fuctos F Cosder a r.v. X wth dstrbuto fucto whch s ukow. 5 We defe the emprcal cumulatve dstrbuto fucto 6 as follows: Defto 5.3 Emprcal Dstrbuto Fucto: We have a umber of depedet samples of a r.v. X, deoted {, =,,, }. The the emprcal dstrbuto fucto of X s defed as: umber of samples,,, ~ X (,,, ) F X () o greater tha

Emprcal Dstrbuto Fuctos (Cot.) 5 I chapter 3 whe we theoretcally dscussed r.v.'s, we assumed a certa type of dstrbuto fuctos such as Gaussa, epoetal etc., eve though are are ot absolutely sure of t... 6 Or smply emprcal dstrbuto.

Emprcal Dstrbuto Fuctos (Cot.) Eample 5.6 Obta the emprcal dstrbuto of the resstace samples gve Eample5-. Soluto Arragg the samples ascedg order, whch are 900, 939, 970, 996, 03, 07, 049, 06, 065 ad 079, the emprcal dstrbuto ca easly be obtaed va the above defto. The result s plotted Fgure5-.

Emprcal Dstrbuto Fuctos (Cot.) Fgure 5.: The emprcal dstrbuto fucto for resstace samples Eample5-.

Emprcal Dstrbuto Fuctos (Cot.) Note:. The emprcal dstrbuto fucto has the same propertes of a cdf. (check!). The emprcal probablty mass fucto, whch correspods to the pdf, ca easly be obtaed from the dstrbuto fucto. (how?)

Emprcal Dstrbuto Fuctos (Cot.) Smplfed way of computg emprcal dstrbuto: Whe the # of datum s large, we follow the followg steps: () We dvde the data rage to a coveet # of tervals of equal legth. () WE plot a hstogram coutg the umber of data wth each cell (or terval). (3) The # of data wth each cell s cumulatvely summed to get emprcal dstrbuto.

Emprcal Dstrbuto Fuctos (Cot.) Eample 5.7 The tervals b/w telephoe calls arrvg at a certa swtchg offce are recorded mutes, whch are: 0.06, 0.977, 0.05, 0.83, 0.597, 0.46,.37, 0.07, 0.9, 0.938, 0.065, 0.098, 0.7, 0.87, 0.863, 0.0, 0.37, 0.93, 0.343, 0.56, 0.45, 0.637, 0.8, 0.9, 0.4, 0.63, 0.37,.048, 0.5, 0.09,.675, 0.33, 0.06, 0.46,.8, 0.06, 0.04, 0.99, 0.53, 0.376, 0.49, 0.083, 0.575, 0.393, 0.65, 0.009, 0.606, 0.5, 0.83 ad 0.85. Fd the hstogram ad the emprcal cdf of these tme tervals. Soluto: Followg the steps descrbed above, the result are plotted Fgure5-3. Note that the hstogram appears to be roughly epoetally decreasg

Emprcal Dstrbuto Fuctos (Cot.) Fgure 5.3: Emprcal dstrbutos for Eample5-7 (a) hstogram (b) emprcal cdf.

Emprcal Dstrbuto Fuctos (Cot.) Remarks:. Recall the defto of the pdf chapter 3: f X ( ) P( X ) umber of data values (, ) totalumber of data values,

Emprcal Dstrbuto Fuctos (Cot.) Based o ths, we ca fgure out that the hstogram obtaed above eample does ot match the pdf for two reasos: (a) The hstogram s ot ormalzed by the total # of data. (b) As s obvous the above equato,,.e. to get a appromato to the pdf, we must dvde through by Δ Wth these two correctos, replotted hstogram s Fgure5-4, alog wth the plot of the followg fucto: ~ f ( ) e X u( )

Emprcal Dstrbuto Fuctos (Cot.) (cf) The agreemet s surprsgly good 7... 7 The data values Eample5-7 have bee geerated va a radom umber geerator producg epoetally dstrbuted radom varables.

Emprcal Dstrbuto Fuctos (Cot.) Fgure 5.4: Normalzed hstogram to appromate the theoretcal pdf.

Emprcal Dstrbuto Fuctos (Cot.). Notce that there 9 a trade-off b/w # of bs versus the appearace of the hstogram: (a) Too may bs, due to too few samples per b, causes statstcal rregularty, ad thus make the hstogram ragged-lookg. (b) Too few bs causes a low resoluto o the abscssa of the hstogram. (cf) Usually rely o tral ad error process to select the sutable # of bs 8 8 Especally the case of relatvely few data samples, as the case of these eamples...

More o Mote Carlo Smulato Mote Carlo Smulato Comple computer smulatos of systems udergog radom perturbatos, usg radom umber geerator... Eample 5.8 Cosder a RC crcut wth radom resstace ad capactace 9, of whch the 3-dB cutoff frequecy of a RC flter s defed as: f 3 RC Aalyze the possble values of f 3.

More o Mote Carlo Smulato (Cot.) Soluto: Suppose the omal value of R s 000(Ω) w/ ±0% tolerace,.e. R ~ U[900; 00], whereas the capactace has a omal value of (μf) w/ ±5% tolerace,.e. C ~ U[0:95; :05]. 9 Ths s due to the ucerta maufacturg process...

More o Mote Carlo Smulato (Cot.) () Covetoal method: Wthout ay computer smulatos, all we ca deduce from the gve data are the omal value, mamum value, ad the mmum value of f 3,.e.: () omal f 3 : f, om 3 6 0 0 () mamum f 3 : f 500 3, ma 6 9000.950 3 59(Hz) 86(Hz)

More o Mote Carlo Smulato (Cot.) () mmum f 3 : f, m 6 00.050 3 37(Hz) Therefore, all we ca say about f3 s that t would have values of somewhere betwee 7(Hz) ad 86(Hz), wth ts omal value of 59(Hz).

More o Mote Carlo Smulato (Cot.) () Mote Carlo smulato: We geerate, say, 5000 values of R ad C whch are uformly dstrbuted wth ther allowed rage about the omal values, ad compute f 3 for each par of R ad C (usually usg hgh level programmg laguages C++, FORTRAN etc., or utlzg mathematcs package : MATLAB), ad the plot the hstogram of the cutoff frequecy... (fgure5-5)

More o Mote Carlo Smulato (Cot.) Fgure 5.5: Hstograms for (a) resstace, (b) capactace, ad (c) cutoff frequecy.

More o Mote Carlo Smulato (Cot.) Table5. MATLAB program for geeratg hstogram

More o Mote Carlo Smulato (Cot.) Wth ths method, we ca geerate a hstogram for f 3 to determe the fluece of compoet varato o cutoff frequecy Ths gves more formato that the above covetoal method 0 (e.g.) estmate of the most lkely value of f3 at whch the hstogram s ts peak. 0 We call ths covetoal method a etreme value aalyss

More o Mote Carlo Smulato (Cot.) Addtoal use of data from MC smulatos: (a) If the RC crcut s part of a larger system, we ca fd the mamum or mmum values of the cutoff frequecy, ad carry out a worst-case desg. (b) Fd the appromate probablty that f 3 les a specfed rego, for eample, from fgure5-5(c): P(40 whereas P(50 f f 3 3 50) 60) 900 5000 600 5000 0.8 0.3

More o Mote Carlo Smulato (Cot.) Addtoal use of data from MC smulatos: (a) If the RC crcut s part of a larger system, we ca fd the mamum or mmum values of the cutoff frequecy, ad carry out a worst-case desg. (b) Fd the appromate probablty that f 3 les a specfed rego, for eample, from fgure5-5(c): Ths s the case wth more comple fuctoal relatoshps or probabltc models for the compoet values... Note that, a smple model as the above eample, we ca easly fgure out the etreme values w/o MC smulato. Note that, for ths purpose, the emprcal cdf would be more accurate..

Statstcal Process Cotrol I a maufacturg process, there several steps that take place Motor ad determe whe thgs have goe wrog... Close dow the process ad look for the offedg steps : Cotrol Chart Illustrato Suppose we are maufacturg trasstors, the ga of whch s of our cocer measure the ga, dvde the easuremets to lots

Statstcal Process Cotrol (Cot.) compute the sample mea of each lot: m, =,,,N compute the sample mea of the sample meas 3 : m set the upper ad lower lmts as ±3 sample std of the m 's from m check f ay m falls outsde the cotrol lmt... decde whether the maufacturg process has a problem or ot 3.e. sample mea of etre samples!

Statstcal Process Cotrol (Cot.) Eample 5.9 I a trasstor maufacturg process, for 5 lots of sze 5, the sample meas of the curret gas are measured ad plotted fgure5-6 by the 's. The sample mea of the etre 5 5 = 5 trasstor gas s 98.8, ad the sample stadard devato of the lot's sample mea s 6.95. We, the, set the upper ad lower cotrol lmts as: UCL = 98:8 + 3 6:95 = 9:65 LCL = 98:8-3 6:95 = 77:96

Statstcal Process Cotrol (Cot.) Soluto: The correspodg cotrol chart,.e. the sample meas of the 5 lots alog w/ the UCL ad LCL, s fgure 5-6. Note that for ths partcular process, we are well wth the cotrol lmts. 4 The MATLAB program producg the cotrol chart s Table5-. 4 If the sample mea of ay lot eceeds or goes below the UCL or LCL, respectvely, the process s sad to be out of cotrol, ad we seek for the cause of the ecurso, ad correct t!

Statstcal Process Cotrol (Cot.) Fgure 5.6: Cotrol chart for trasstor gas: 5 lots of 5 trasstor each

Statstcal Process Cotrol (Cot.) Table5. MATLAB program for geeratg a cotrol chart.

Covergece of the Sample Mea to the Mea Recall: The sample mea of a populato, gve below, s a estmate of the true mea μ X : Questo: How good ths estmate s? (r.v.) Aswer: We get a boud by applyg the Chebyshev's equalty 5... 5 We replace X wth.

Covergece of the Sample Mea to the Mea (Cot.) To apply the Chebyshev's equalty, we eed the () mea ad () varace of the sample mea : () mea 6 : E( ) X 6 Ths s show problem 4-4a.

Covergece of the Sample Mea to the Mea (Cot.) () varace: each sample from the populato of the varace s where ) Var( ) Var( ] ) [( )] )( [( ) )( ( ) ( } )] ( {[ ) Var( X X X j X j X j X j X X X X X X E X X E X X E X X E E

Covergece of the Sample Mea to the Mea (Cot.) Thus, the Chebyshev's equalty 7 becomes: P X k X k 7 Ufortuately, t s usually true that we also do ot kow the varace X of the samples, so we caot apply ths Chebyshev's equalty real estmato problems. We wll dscuss ths subject Chapter 6 aga.

Covergece of the Sample Mea to the Mea (Cot.) Eample 5.0 Samples draw from a populato are kow to have stadard devato of. We wat the probablty that the absolute value of the dfferece b/w ther sample mea ad the true mea s greater tha 0.5 to be less tha %. How may samples should be draw 8? 8 So that the sample mea, as a estmate of the true mea, matas the precso gve the problem.

Covergece of the Sample Mea to the Mea (Cot.) Soluto: From the Chebyshev's equalty, we wat: 0.0 k ad or k 00 or k 0 kσ X 0 0.5 or 0 0.5 40 or 600 Note that ths s a farly large # of samples.

Covergece of the Sample Mea to the Mea (Cot.) Remark If we kow somethg about the dstrbuto of the sample mea, the requred umber of samples ca be reduced...

Covergece of the Sample Mea to the Mea (Cot.) Recall 9 the cetral lmt theorem whch says that the sample mea s appromately Gaussa for a large umber of samples, ad the the LHS of Chebyshev's equalty becomes: e v /( X / ) dv Q( k) 0.0 k X / X / where the substtuto u = / v/σ X has bee used the tegrad to get the Q-fucto epresso. 9 I Chapter 4.

Covergece of the Sample Mea to the Mea (Cot.) Usg the table of Q-fucto, we fd that k = :57, ad sce k must be a teger, we roud t up to 3. Therefore, we have: kσ X 3 3 0.5 or 0.5 or or 44 0.5 Notce that the # of samples s cosderably lass tha the prevous case...