BASICS ON DISTRIBUTIONS


Histograms
Consider an experiment in which different outcomes are possible (e.g. dice tossing). The probability of all the outcomes can be represented in a histogram.

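Distributions
Probabilities are described with distributions.
Discrete variables: data are valued on a discrete set: $I = \{x_1, \dots, x_n\}$
Probability distribution: $f(x_i)$ = probability for the value $x_i$
Cumulative distribution: $F(x_i)$ = probability for a value $\le x_i$
Normalization: $\sum_{x_i \in I} f(x_i) = 1$, $F(x_{\mathrm{HIGHEST}}) = 1$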

Examples
Consider a coin with a tail probability equal to p.
Flip the coin three times. Plot the probability distribution for the number of tails.
Flip the coin until the first tail appears. Plot the probability distribution for the number of flips.
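
The two exercises above can be explored numerically. The following is a minimal sketch, assuming independent flips and a hypothetical tail probability p = 0.5. The function names are illustrative, and the second distribution is truncated at 8 flips only so that it can be printed.

```python
from math import comb

def tails_in_three_flips(p):
    """P(k tails in 3 independent flips) for k = 0..3, with tail probability p."""
    n = 3
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def flips_until_first_tail(p, max_flips):
    """P(first tail appears on flip m) for m = 1..max_flips (truncated list)."""
    return [(1 - p)**(m - 1) * p for m in range(1, max_flips + 1)]

def text_histogram(labels, probs, width=40):
    """Print one bar of '#' characters per outcome, proportional to its probability."""
    for label, prob in zip(labels, probs):
        print(f"{label:>3} {'#' * round(prob * width)} {prob:.3f}")

p = 0.5  # hypothetical tail probability, not taken from the slides
three = tails_in_three_flips(p)
until = flips_until_first_tail(p, 8)
text_histogram(range(4), three)
text_histogram(range(1, 9), until)
```

Changing p skews the first histogram and changes how fast the second one decays.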

Empirical Distributions
Experimental data are represented with normalized histograms.
Discrete variables: data are valued on a discrete set: $I = \{x_1, \dots, x_n\}$
Probability distribution: $f(x_i) = N(x = x_i)/N_{tot}$
Cumulative distribution: $F(x_i) = N(x \le x_i)/N_{tot}$
Normalization: $\sum_{x_i \in I} f(x_i) = 1$

Characterization: Mode
The mode is the value that occurs the most frequently in a probability distribution. For empirical distributions, the mode is the value that most frequently occurs in a data set. More than one mode can be present.

Characterization: Median
A median is described as the numeric value separating the higher half of a sample, a population, or a probability distribution, from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one. If there is an even number of observations, then there is no single middle value; the median is then usually defined to be the mean of the two middle values.

Averages (The Median)
The median is the middle value of a set of data once the data has been ordered.
Example 1. Robert hit 11 balls at Grimsby driving range. The recorded distances of his drives, measured in yards, are given below. Find the median distance for his drives.
85, 125, 130, 65, 100, 70, 75, 50, 140, 95, 70
Ordered data: 50, 65, 70, 70, 75, 85, 95, 100, 125, 130, 140
Single middle value: median drive = 85 yards

Averages (The Median): the median algorithm
Sort the data.
If the number of data is odd: take the middle value.
Else: take the average between the two central values.
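
The algorithm above can be sketched in a few lines. This is an illustrative implementation, not part of the original slides, and the two input lists are hypothetical.

```python
def median(values):
    """Median by sorting: middle value if the count is odd, else mean of the two central values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = median([3, 1, 2])       # sorted: 1, 2, 3
even_case = median([4, 1, 3, 2])   # sorted: 1, 2, 3, 4
print(odd_case, even_case)
```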

Measuring the Spread with Median

Finding the median, quartiles and inter-quartile range
Example 1: Find the median and quartiles for the data below.
12, 6, 4, 9, 8, 4, 9, 8, 5, 9, 8, 10
Order the data: 4, 4, 5, 6, 8, 8, 8, 9, 9, 9, 10, 12
Lower quartile (Q1) = 5½
Median (Q2) = 8
Upper quartile (Q3) = 9
Inter-quartile range = 9 - 5½ = 3½
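
One common convention takes each quartile as the median of the lower or upper half of the ordered data, and it gives the values quoted in this example. The sketch below assumes that convention (other quartile definitions exist and can give slightly different numbers). The data list is an assumption chosen to match the twelve values of the example.

```python
def median_of_sorted(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + (n % 2):]  # skip the middle value when n is odd
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)

data = [12, 6, 4, 9, 8, 4, 9, 8, 5, 9, 8, 10]  # assumed reading of the example data
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```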

Expected Value
For a given function g(x) and a probability distribution f(x), the expected value of a discrete variable is computed as:
$E[g(x)] = \sum_{x_i \in I} g(x_i)\, f(x_i)$

Properties of Expected Values
1) E[X+Y] = E[X] + E[Y]
2) E[X+c] = E[X] + c
3) E[aX+bY] = aE[X] + bE[Y]
For example, if E[X] = 5 and E[Y] = 6, then
E[X+5] = 5+5 = 10
E[2X+5] = 2*5+5 = 15
E[3X+2Y] = 3*5+2*6 = 27
Prove 1), 2) and 3)

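Characterization: Mean
Mean: $\mu = E[x] = \sum_{x_i \in I} x_i\, f(x_i)$
For empirical distributions: $E[x] = \sum_{x_i \in I} x_i \frac{N(x_i)}{N_{TOT}} = \frac{1}{N_{TOT}} \sum_{j \in \mathrm{data}} x_j$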

Mean and Variance
Variance: $\sigma^2 = \mathrm{Var}(x) = E[(x - E[x])^2] = E[x^2] - 2E[x]E[x] + E[x]^2 = E[x^2] - E[x]^2$

Properties of Expected Values and Variances
1. Var(X+Y) = Var(X) + Var(Y), if X and Y are independent.
2. Var(aX) = (a^2)Var(X)
3. Var(X+c) = Var(X)
4. Var(aX+bY) = (a^2)Var(X) + (b^2)Var(Y), if X and Y are independent.
Prove 4

Variance and Moments of a Random Variable
The first moment of a random variable X is E[X].
Definition: The k-th moment of a random variable X is $E[X^k]$.
Definition: The variance of a random variable X is defined as $\mathrm{Var}[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2$.
The standard deviation of a random variable X is $\sigma[X] = \sqrt{\mathrm{Var}[X]}$.

Variance and Moments of a Random Variable
Definition: The covariance of two random variables X and Y is $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$.
Theorem: For any two random variables X and Y, $\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}(X, Y)$.

Variance and Moments of a Random Variable
Proof: directly from $\mathrm{Var}[X] = E[(X - E[X])^2]$:
$\mathrm{Var}[X + Y] = E[(X + Y - E[X + Y])^2]$
$= E[(X + Y - E[X] - E[Y])^2]$
$= E[(X - E[X])^2 + (Y - E[Y])^2 + 2(X - E[X])(Y - E[Y])]$
$= E[(X - E[X])^2] + E[(Y - E[Y])^2] + 2E[(X - E[X])(Y - E[Y])]$
$= \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}(X, Y)$

Variance and Moments of a Random Variable
Theorem: If X and Y are two independent random variables, then $E[X \cdot Y] = E[X] \cdot E[Y]$.

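Variance and Moments of a Random Variable
Proof: directly from $E[X] = \sum_i i \Pr(X = i)$:
$E[X \cdot Y] = \sum_i \sum_j (i \cdot j) \Pr((X = i) \cap (Y = j))$
$= \sum_i \sum_j (i \cdot j) \Pr(X = i) \Pr(Y = j)$
$= \left(\sum_i i \Pr(X = i)\right)\left(\sum_j j \Pr(Y = j)\right)$
$= E[X] \cdot E[Y]$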

Variance and Moments of a Random Variable
Corollary: If X and Y are independent random variables, then $\mathrm{Cov}(X, Y) = 0$ and $\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]$.

Variance and Moments of a Random Variable
Theorem: Let $X_1, X_2, \dots, X_n$ be mutually independent random variables, then $\mathrm{Var}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathrm{Var}[X_i]$.

Binomial distribution
Compute the probability of obtaining exactly 3 tails tossing 5 coins. These are all the possibilities:
TTTTT
HTTTT THTTT TTHTT TTTHT TTTTH
HHTTT HTHTT HTTHT HTTTH THHTT THTHT THTTH TTHHT TTHTH TTTHH
TTHHH THTHH THHTH THHHT HTTHH HTHTH HTHHT HHTTH HHTHT HHHTT
THHHH HTHHH HHTHH HHHTH HHHHT
HHHHH
f(3) = 10/32
And what if the probabilities of head and tail are different?

Binomial distribution
How can we count without enumerating?
Total number of configurations: 5 events, with two possibilities each: 2^5 = 32
Number of configurations with two heads (three tails):
1) choose 3 of the 5 positions: 5 · (5-1) · (5-2) possibilities
2) the order of choice is irrelevant: there are 6 possible permutations of 3 objects (3!)
5 · 4 · 3 / 6 = 5!/(3! × 2!) = 10
And what if the probabilities of head and tail are different?
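
The counting argument can be checked against brute-force enumeration. The sketch below is illustrative: it lists all outcomes of 5 tosses with `itertools.product` and compares the number of outcomes with exactly 3 tails to the factorial formula.

```python
from itertools import product
from math import comb, factorial

n_coins, n_tails = 5, 3

# Enumerate every head/tail sequence of length 5.
outcomes = list(product("HT", repeat=n_coins))
favourable = [o for o in outcomes if o.count("T") == n_tails]

# Count without enumerating: 5! / (3! * 2!).
by_formula = factorial(n_coins) // (factorial(n_tails) * factorial(n_coins - n_tails))

print(len(outcomes), len(favourable), by_formula, comb(n_coins, n_tails))
print(len(favourable) / len(outcomes))  # probability with a fair coin
```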

Binomial distribution
Probability of having k positive events out of n independent experiments. The probability of a positive event is p.
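
For reference, the standard binomial probability is f(k; n, p) = C(n, k) p^k (1 - p)^(n - k), where C(n, k) = n!/(k!(n - k)!). The sketch below evaluates it for the fair 5-coin case and for a hypothetical biased coin with p = 0.7, which addresses the question of unequal head and tail probabilities. The function name and the value 0.7 are illustrative assumptions.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k positive events in n independent trials of probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

fair = binomial_pmf(3, 5, 0.5)                        # 10 configurations out of 32
biased = [binomial_pmf(k, 5, 0.7) for k in range(6)]  # hypothetical p = 0.7
biased_mean = sum(k * prob for k, prob in enumerate(biased))

print(fair)
print([round(prob, 4) for prob in biased], round(biased_mean, 4))
```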

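Binomial distribution (mean)
$E[k] = \sum_{k=0}^{n} k\, f(k) = \sum_{k=0}^{n} k \frac{n!}{k!(n-k)!} p^k (1-p)^{n-k}$
$= \sum_{k=1}^{n} \frac{n!}{(k-1)!(n-k)!} p^k (1-p)^{n-k}$
$= np \sum_{k=1}^{n} \frac{(n-1)!}{(k-1)!(n-k)!} p^{k-1} (1-p)^{n-k}$
Define j = k-1:
$= np \sum_{j=0}^{n-1} \frac{(n-1)!}{j!(n-1-j)!} p^{j} (1-p)^{n-1-j} = np$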

Binomial distribution (variance)
$\sigma^2 = \mathrm{Var}(k) = E[(k - np)^2] = np(1-p)$

Variance and Moments of a Random Variable
Example: Variance of a Binomial Random Variable
The variance of a binomial random variable X with parameters n and p can be determined directly by computing $E[X^2]$.
$E[X^2] = \sum_{j=0}^{n} \binom{n}{j} p^j (1-p)^{n-j} j^2 = n(n-1)p^2 + np$
$\mathrm{Var}[X] = E[X^2] - (E[X])^2 = n(n-1)p^2 + np - n^2p^2 = np - np^2 = np(1-p)$

Evaluation of parameters from empirical data
Generally speaking, a parametric model M aims to reproduce a set of known data.
(Diagram: real data D; model M with parameters T; modelled data.)
How to find the best parameters?

Maximum likelihood
$T^* = \arg\max_T P(D \mid T, M) = \arg\max_T \log P(D \mid T, M)$
D = data, M = model, T = model parameters

Evaluation of p
Given n independent experiments, with k positive events, how to evaluate the best p?
MAXIMUM LIKELIHOOD definition: best p = the p that maximises the total probability of the experiment results.
Example: Which is the p that maximises the outcome of 00000000000? Which is the statistical error?

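Evaluation of p
$f(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}$
$\left.\frac{df}{dp}\right|_{p_{ML}} = \binom{n}{k}\left[k\, p_{ML}^{k-1}(1-p_{ML})^{n-k} - (n-k)\, p_{ML}^{k}(1-p_{ML})^{n-k-1}\right] = 0$
$k(1-p_{ML}) - (n-k)\,p_{ML} = 0 \;\Rightarrow\; k - n\,p_{ML} = 0 \;\Rightarrow\; p_{ML} = \frac{k}{n}$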
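
The closed-form result p_ML = k/n can be cross-checked numerically. The sketch below is an illustration under stated assumptions: it maximises the binomial log-likelihood over a grid of candidate values of p, and the counts k = 3, n = 12 are hypothetical, not taken from the slides.

```python
from math import comb, log

def log_likelihood(p, k, n):
    """Log of the binomial probability of k positive events in n trials."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)

def ml_estimate_on_grid(k, n, steps=1000):
    """Return the grid value of p in (0, 1) with the highest log-likelihood."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: log_likelihood(p, k, n))

k, n = 3, 12  # hypothetical counts
p_hat = ml_estimate_on_grid(k, n)
print(p_hat, k / n)
```

The grid search is deliberately naive. It only serves to confirm that the maximum sits at k/n.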

Poisson distribution
The Poisson distribution is a discrete probability distribution that expresses the probability of a number of events occurring in a fixed period of time if these events occur with a known average rate (λ) and independently of the time since the last event. (The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.)

Poisson distribution
$f(k) = \frac{\lambda^k}{k!} e^{-\lambda}$
$E[k] = \lambda$
$\mathrm{Var}(k) = \lambda$
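
The equality of mean and variance can be verified numerically. In this sketch the rate λ = 3 is a hypothetical value, and the infinite sums are truncated at k = 59, where the remaining probability mass is negligible for this rate.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the average rate is lam."""
    return lam**k * exp(-lam) / factorial(k)

lam = 3.0  # hypothetical average rate
ks = range(60)  # truncation of the infinite support
probs = [poisson_pmf(k, lam) for k in ks]

total = sum(probs)
mean = sum(k * prob for k, prob in zip(ks, probs))
variance = sum((k - mean) ** 2 * prob for k, prob in zip(ks, probs))
print(round(total, 6), round(mean, 6), round(variance, 6))
```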

Continuous Distributions
Data are described with distributions.
Continuous variables: data are valued on a continuous interval: I
Cumulative probability: $F(x) = N(X \le x)/N_{tot}$
$\mathrm{Prob}(A < x \le B) = F(B) - F(A) = \int_A^B f(x)\, dx$
Probability density function (limit): $f(x) = \frac{dF(x)}{dx}$
Normalization: $\int_I f(x)\, dx = 1$

Mean and Variance
Mean: $\mu = E[x] = \int_I x\, f(x)\, dx$
Variance: $\sigma^2 = \mathrm{Var}(x) = E[(x - \mu)^2] = \int_I (x - \mu)^2 f(x)\, dx$

The Uniform Probability Distribution
Uniform probability density function:
f(x) = 1/(b - a) for a < x < b
     = 0 elsewhere
where a = smallest value the variable can assume, b = largest value the variable can assume.
The probability of the continuous random variable assuming a specific value is 0: $P(x = x_1) = 0$

Example: Buffet
Customers are charged for the amount of salad they take. Sampling suggests that the amount of salad taken is uniformly distributed between 5 ounces and 15 ounces.
Probability density function:
f(x) = 1/10 for 5 < x < 15
     = 0 elsewhere
where x = salad plate filling weight

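Example: Buffet
What is the probability that a customer will take between 12 and 15 ounces of salad?
P(12 < x < 15) = 1/10 (3) = .3
(Figure: f(x) = 1/10 plotted against salad weight (oz.), from 5 to 15, with the area between 12 and 15 shaded.)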

The Uniform Probability Distribution
P(8 < x < 12) = ?
P(8 < x < 12) = (1/10)(12 - 8) = .4
(Figure: f(x) = 1/10 between 5 and 15, with the area between 8 and 12 shaded.)

The Uniform Probability Distribution
P(0 < x < 12) = ?
P(0 < x < 12) = P(5 < x < 12) = (1/10)(12 - 5) = .7
(Figure: f(x) = 1/10 between 5 and 15, with the area between 5 and 12 shaded.)
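
The three buffet probabilities follow one rule: clip the requested interval to the support [a, b], then multiply its length by the density 1/(b - a). The sketch below assumes the buffet range is 5 to 15 ounces. The function name is illustrative.

```python
def uniform_interval_probability(lo, hi, a, b):
    """P(lo < x < hi) for x uniformly distributed on (a, b)."""
    if b <= a:
        raise ValueError("the support requires a < b")
    left = max(lo, a)    # nothing below a carries probability
    right = min(hi, b)   # nothing above b carries probability
    return max(0.0, right - left) / (b - a)

a, b = 5, 15  # assumed salad weight range in ounces
p_12_15 = uniform_interval_probability(12, 15, a, b)
p_8_12 = uniform_interval_probability(8, 12, a, b)
p_0_12 = uniform_interval_probability(0, 12, a, b)
print(p_12_15, p_8_12, p_0_12)
```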

The Uniform Probability Distribution
Uniform probability density function:
f(x) = 1/(b - a) for a < x < b
     = 0 elsewhere
Expected value of x: E(x) = (a + b)/2
Variance of x: Var(x) = (b - a)^2/12
where a = smallest value the variable can assume, b = largest value the variable can assume.
Prove it!!

Normal distribution
The normal distribution is considered the most basic continuous probability distribution. Specifically, by the central limit theorem, under certain conditions the sum of a number of random variables with finite means and variances approaches a normal distribution as the number of variables increases. For this reason, the normal distribution is commonly encountered in practice, and is used throughout statistics, natural sciences, and social sciences as a simple model for complex phenomena. For example, the observational error in an experiment is usually assumed to follow a normal distribution, and the propagation of uncertainty is computed using this assumption.

Normal distribution
$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
μ: mean
σ²: variance

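Normal distribution: continuous
$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
μ: mean
σ²: variance
$E[x] = \int_{-\infty}^{+\infty} x\, f(x; \mu, \sigma)\, dx = \mu$
$E[(x-\mu)^2] = \int_{-\infty}^{+\infty} (x-\mu)^2 f(x; \mu, \sigma)\, dx = \sigma^2$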

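Gaussian distribution in a D-dimensional space
$N(x; \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}\, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$
where μ is a D-valued vector (means) and Σ is a D×D symmetric matrix (covariance matrix), with determinant |Σ|.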

Estimating Mean and Variance from sampled data
Consider a set of independent and identically distributed data, following a normal distribution with mean μ and variance σ²: $x_1, \dots, x_n = X$
How can we estimate mean and variance?
Example: X=0.;0.8;0.4;0.4;0.05;0.6;0.5;0.9,0.49

Estimating Mean and Variance from sampled data
MAXIMUM LIKELIHOOD: estimating the parameters so as to maximize the probability for the sampled values
$\mu_{ML}, \sigma_{ML} = \arg\max_{\mu, \sigma} f(x_1, \dots, x_n \mid \mu, \sigma)$

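ML Mean and Variance
The probability for the sampled data is
$f(X \mid \mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$
It has to be maximised over the variables μ and σ.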

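ML Mean and Variance
Since the logarithm is a monotonic function:
$\mu_{ML}, \sigma_{ML} = \arg\max_{\mu,\sigma} f(X \mid \mu, \sigma) = \arg\max_{\mu,\sigma} \ln f(X \mid \mu, \sigma)$
$= \arg\max_{\mu,\sigma} \left[-n \ln \sigma - \frac{n}{2}\ln 2\pi - \sum_{i=1}^{n} \frac{(x_i-\mu)^2}{2\sigma^2}\right]$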

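ML Mean and Variance
$\left.\frac{\partial \ln f}{\partial \mu}\right|_{\mu_{ML}, \sigma_{ML}} = 0, \qquad \left.\frac{\partial \ln f}{\partial \sigma}\right|_{\mu_{ML}, \sigma_{ML}} = 0$
$\sum_{i=1}^{n} \frac{x_i - \mu_{ML}}{\sigma_{ML}^2} = 0, \qquad -\frac{n}{\sigma_{ML}} + \sum_{i=1}^{n} \frac{(x_i - \mu_{ML})^2}{\sigma_{ML}^3} = 0$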

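ML Mean and Variance
$\mu_{ML} = \frac{1}{n}\sum_{i=1}^{n} x_i$
$\sigma^2_{ML} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu_{ML})^2$
Are these the expressions that you usually apply?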
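
The maximum likelihood estimators are the sample mean and the variance computed with a divisor n, rather than the n - 1 many textbooks use. The sketch below computes both on a small hypothetical sample (not the slide data) and compares them with `statistics.pvariance` (divisor n) and `statistics.variance` (divisor n - 1) from the Python standard library.

```python
import statistics

def ml_mean_and_variance(data):
    """Maximum likelihood estimates of mean and variance for normally distributed data."""
    n = len(data)
    mu_ml = sum(data) / n
    var_ml = sum((x - mu_ml) ** 2 for x in data) / n  # divisor n, not n - 1
    return mu_ml, var_ml

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
mu_ml, var_ml = ml_mean_and_variance(sample)

print(mu_ml, var_ml)
print(statistics.pvariance(sample))  # population variance, same divisor n
print(statistics.variance(sample))   # sample variance, divisor n - 1
```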

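The μ_ML is an unbiased estimation
Consider a set of data generated starting from a normal distribution N(μ, σ²): the estimation
$\mu_{ML} = \frac{1}{n}\sum_{i=1}^{n} x_i$
is an unbiased estimate of μ since
$E[\mu_{ML}] = \frac{1}{n}\sum_{i=1}^{n} E[x_i] = \frac{1}{n}\sum_{i=1}^{n} \int_{-\infty}^{+\infty} x_i \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}\, dx_i = \mu$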

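The σ_ML is a biased estimation
$\sigma^2_{ML} = \frac{1}{n}\sum_{i=1}^{n} (x_i - M)^2$, with $M = \mu_{ML}$,
is a biased estimate of σ² since
$E[\sigma^2_{ML}] = \frac{1}{n}\sum_{i=1}^{n} E[x_i^2] - E[M^2] = (\mu^2 + \sigma^2) - \frac{1}{n^2}\sum_{j=1}^{n}\sum_{k=1}^{n} E[x_j x_k]$
$= (\mu^2 + \sigma^2) - \left(\mu^2 + \frac{\sigma^2}{n}\right) = \frac{n-1}{n}\,\sigma^2$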

σ_ML is underestimating σ
(Figure legend. Green: real distribution. Red: sampled distributions.)
σ_ML is evaluated with respect to the sample mean.

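Unbiased estimation of μ and σ
$M = \mu_{ML} = \frac{1}{n}\sum_{i=1}^{n} x_i$
$S^2 = \frac{n}{n-1}\,\sigma^2_{ML} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - M)^2$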
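
The bias of the maximum likelihood variance can be seen in a small simulation. The sketch below draws many samples of size n = 5 from a standard normal distribution and averages both estimators. The sample size, number of trials and seed are arbitrary choices made for illustration.

```python
import random

def simulate_variance_estimators(n, trials, mu=0.0, sigma=1.0, seed=0):
    """Average the divisor-n and divisor-(n-1) variance estimates over many samples."""
    rng = random.Random(seed)
    ml_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(xs) / n
        squared_dev = sum((x - m) ** 2 for x in xs)
        ml_total += squared_dev / n              # maximum likelihood estimate
        unbiased_total += squared_dev / (n - 1)  # unbiased estimate S^2
    return ml_total / trials, unbiased_total / trials

ml_avg, unbiased_avg = simulate_variance_estimators(n=5, trials=20000)
print(ml_avg, unbiased_avg)
```

With a true variance of 1, the first average should fall near (n - 1)/n = 0.8 and the second near 1, up to simulation noise.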

Distribution of the sample mean
Consider different sets of sampled data $X_k$ (e.g. different levels of expression of genes in different populations of individuals).
A sample mean for each set can be defined: $M_k = \frac{1}{n_k}\sum_{i=1}^{n_k} x_i^{(k)}$
How are they distributed?

Distribution of the sample means
$M = \frac{1}{n}\sum_{i=1}^{n} x_i$
M is a linear combination of independent identically distributed normal variables: it is normally distributed.

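Distribution of the sample means
$E[M] = \frac{1}{n}\sum_{j=1}^{n} E[x_j] = \mu$
$\mathrm{var}(M) = E[(M - E[M])^2] = E[M^2] - E[M]^2 = \frac{1}{n^2}\sum_{j=1}^{n}\sum_{k=1}^{n} E[x_j x_k] - \mu^2 = \frac{\sigma^2}{n}$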
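
The result var(M) = σ²/n can be illustrated by simulation. This sketch generates many samples of size n = 10 from a normal distribution with hypothetical parameters μ = 5 and σ = 2, then measures the mean and variance of the resulting sample means. All numerical settings are assumptions made for illustration.

```python
import random

def sample_means(n, trials, mu, sigma, seed=0):
    """Return the sample mean of each of `trials` normal samples of size n."""
    rng = random.Random(seed)
    return [sum(rng.gauss(mu, sigma) for _ in range(n)) / n for _ in range(trials)]

n, mu, sigma = 10, 5.0, 2.0  # hypothetical settings
means = sample_means(n, 20000, mu, sigma)

mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)
print(mean_of_means, var_of_means, sigma**2 / n)
```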

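Testing the mean and the variance of a sample against a known distribution
Given a random variable X that you suppose normally distributed with mean μ and variance σ² known a priori.
Suppose to have a sample $x_1, x_2, \dots, x_n$.
Compute the sample mean $M = \frac{1}{n}\sum_{i=1}^{n} x_i$ and the sample (unbiased) variance $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - M)^2$.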

Does M significantly differ from μ?
Null hypothesis H0: M = μ
Alternative hypothesis Ha: M ≠ μ
If H0 is true, then M (the sample mean) is normally distributed with mean μ and variance σ²/n. Then:
$Z = \frac{M - \mu}{\sigma/\sqrt{n}}$
is normally distributed, with mean 0 and variance 1.

Two-tailed Z-test
Given the sampled absolute value of Z we can compute the probability of obtaining a value equal or higher, on the normal distribution. If that probability (P-value) is lower than the desired significance (e.g. 0.05) we can reject H0.
Significance: MAX allowed probability of rejecting true null hypotheses.
(Figure: normal curve with two tails, each of area P-value/2, below -Z-score and above +Z-score.)

Is M significantly higher than μ?
Null hypothesis H0: M ≤ μ
Alternative hypothesis Ha: M > μ
If H0 is true, M ≤ μ. The extreme case for our testing is M = μ: in that case, again,
$Z = \frac{M - \mu}{\sigma/\sqrt{n}}$
is normally distributed, with mean 0 and variance 1.

One-tailed Z-test
Given the Z-score computed from the sample, we consider only the positive values of the Z-score and compute the probability of obtaining a value equal or higher, on the normal distribution. If that probability is lower than the desired significance (e.g. 0.05) we can reject H0.
(Figure: normal curve with a single tail of area P-value above the Z-score.)
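
Both tests can be sketched with the standard library alone, using the complementary error function for the tail area of the standard normal distribution. The sample mean, μ, σ and n below are hypothetical numbers chosen so that Z comes out at 2. They are not taken from the slides.

```python
from math import erfc, sqrt

def z_score(sample_mean, mu, sigma, n):
    """Standardised distance of the sample mean from mu, with sigma known."""
    return (sample_mean - mu) / (sigma / sqrt(n))

def upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * erfc(z / sqrt(2))

def two_tailed_p(z):
    """Probability of a value at least as extreme as |z| in either direction."""
    return 2 * upper_tail(abs(z))

def one_tailed_p(z):
    """Probability of a value equal to or higher than z."""
    return upper_tail(z)

z = z_score(sample_mean=10.5, mu=10.0, sigma=1.5, n=36)  # hypothetical sample
p_two = two_tailed_p(z)
p_one = one_tailed_p(z)
print(z, p_two, p_one)
print("reject H0 (two-tailed, 5%):", p_two < 0.05)
print("reject H0 (two-tailed, 1%):", p_two < 0.01)
```

With these numbers the two-tailed test rejects H0 at the 5% level but not at the 1% level.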

(Table: standard normal table giving the AREA between 0 and Z. Rows: first decimal digit of Z. Columns: second decimal digit of Z.)

Example
If Z is equal to , can we reject the null hypothesis at significance 5%? And at significance %?

Distribution of the sample deviations
If $X_i$ are n independent, normally distributed random variables with mean μ and variance σ², then the random variable
$Q = \sum_{i=1}^{n} \frac{(X_i - M)^2}{\sigma^2}$
is distributed according to the chi-square distribution with n-1 degrees of freedom. This is usually written as:
$Q \sim \chi^2_{n-1}$
The chi-square distribution has one parameter: (n-1), a positive integer that specifies the number of degrees of freedom (i.e. the number of independent deviations $X_i - M$).

Chi-square distribution
Mean = k
Variance = 2k

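Distribution of the sample deviations
So, being
$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - M)^2$
the unbiased sample variance, $(n-1)\,S^2/\sigma^2$ is distributed according to the chi-square distribution with k = n-1 degrees of freedom.
It follows that
$E\left[\frac{(n-1)\,S^2}{\sigma^2}\right] = n-1 \;\Rightarrow\; E[S^2] = \sigma^2$ (unbiased)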

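Distribution of the sample deviations: proof for 1 degree of freedom
Let random variable Y be defined as $Y = X^2$, where X has normal distribution with mean 0 and variance 1.
We can compute the cumulative function. For y > 0:
$F_Y(y) = P(X^2 \le y) = P(-\sqrt{y} \le X \le \sqrt{y}) = 2\,\Phi(\sqrt{y}) - 1$
where Φ is the cumulative function of the standard normal distribution. Differentiating with respect to y gives the density
$f_Y(y) = \frac{1}{\sqrt{2\pi y}}\, e^{-y/2}$
which is the chi-square density with 1 degree of freedom.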

Student's t-distribution
ν: degrees of freedom

Normal vs t-distributions
(Figure: the normal density compared with t-distributions for two values of K.)

Extreme value distribution
The Gumbel distribution is used to model the distribution of the maximum (or the minimum) of a number of samples of various distributions. For example, we would use it to represent the distribution of the maximum level of a river in a particular year if we had the list of maximum values for the past ten years. It is useful in predicting the chance that an extreme earthquake, flood or other natural disaster will occur.

BLAST
In accordance with the Gumbel EVD, the probability p of observing a score S equal to or greater than x is given by the equation

BLAST: E-value
The expect score E of a database match is the number of times that an unrelated database sequence would obtain a score S higher than x by chance. The expectation E obtained in a search for a database of D sequences is given by