Statistical analysis using matlab. HY 439 Presented by: George Fortetsanakis


1 Statistical analysis using matlab HY 439 Presented by: George Fortetsanakis

2 Roadmap: Probability distributions; Statistical estimation; Fitting data to probability distributions

3 Continuous distributions
A continuous random variable $X$ takes values in a subset of the real numbers, $D \subseteq \mathbb{R}$.
$X$ corresponds to the measurement of some property, e.g., length or weight.
It is not possible to talk about the probability of $X$ taking a specific value: $P(X = x) = 0$.
Instead, talk about the probability of $X$ lying in a given interval:
$P(x_1 \le X \le x_2) = P(X \in [x_1, x_2])$
$P(X \le x) = P(X \in (-\infty, x])$

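4 Probability density function (pdf)
Continuous function $p(x)$ defined for each $x \in D$.
The probability of $X$ lying in an interval $I \subseteq D$ is computed by the integral: $P(X \in I) = \int_I p(x)\,dx$
Examples:
$P(x_1 \le X \le x_2) = P(X \in [x_1, x_2]) = \int_{x_1}^{x_2} p(x)\,dx$
$P(X \le x) = P(X \in (-\infty, x]) = \int_{-\infty}^{x} p(t)\,dt$
Important property: $P(X \in D) = \int_D p(x)\,dx = 1$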
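As a rough numerical check of the integral definition, the sketch below approximates an interval probability with trapz. The standard normal density and the interval [-1, 1] are illustrative assumptions, not taken from the slides.

```matlab
% Assumed example density: standard normal pdf (not from the slides).
p = @(x) exp(-x.^2 / 2) / sqrt(2*pi);

% P(-1 <= X <= 1) as the integral of p(x) over [-1, 1].
x = linspace(-1, 1, 2001);
prob_interval = trapz(x, p(x));

% Total probability over a range wide enough to stand in for D = R.
xw = linspace(-10, 10, 20001);
prob_total = trapz(xw, p(xw));

fprintf('P(-1 <= X <= 1) ~ %.4f, total ~ %.4f\n', prob_interval, prob_total);
```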

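5 Cumulative distribution function (cdf)
For each $x \in D$ defines the probability: $F(x) = P(X \le x) = P(X \in (-\infty, x]) = \int_{-\infty}^{x} p(t)\,dt$
Important properties:
$F(-\infty) = 0$
$F(\infty) = 1$
$P(x_1 \le X \le x_2) = F(x_2) - F(x_1)$
Complementary cumulative distribution function (ccdf): $G(x) = P(X > x) = 1 - P(X \le x) = 1 - F(x)$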
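The cdf properties can be checked numerically by accumulating a pdf with cumtrapz. This is a minimal sketch; the standard normal density, the grid limits, and the interval [-1, 1] are assumptions chosen for illustration.

```matlab
% Assumed example density: standard normal pdf (not from the slides).
p = @(x) exp(-x.^2 / 2) / sqrt(2*pi);

x = linspace(-8, 8, 16001);      % grid wide enough to approximate (-inf, inf)
F = cumtrapz(x, p(x));           % F(x): running integral of the pdf
G = 1 - F;                       % ccdf G(x) = 1 - F(x)

% P(x1 <= X <= x2) = F(x2) - F(x1), here with x1 = -1 and x2 = 1.
Fx = interp1(x, F, [-1 1]);
prob = Fx(2) - Fx(1);

fprintf('F(end) = %.6f, G(end) = %.2e, P(-1 <= X <= 1) ~ %.4f\n', F(end), G(end), prob);
```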

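6 Exponential distribution
Probability density function: $p(x) = \lambda e^{-\lambda x}$, $x \ge 0$
Cumulative distribution function: $F(x) = 1 - e^{-\lambda x}$, $x \ge 0$
Memoryless property: $P(T > \tau + t \mid T > \tau) = P(T > t)$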
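The memoryless property follows directly from the closed-form ccdf of the exponential distribution, $P(T > t) = e^{-\lambda t}$. The sketch below evaluates both sides; the rate and the two time values are arbitrary assumptions.

```matlab
lambda = 0.5;                    % assumed rate, for illustration only
G = @(t) exp(-lambda * t);       % ccdf of the exponential: P(T > t)

tau = 3;                         % assumed time already waited
t = 2;                           % assumed additional waiting time

lhs = G(tau + t) / G(tau);       % P(T > tau + t | T > tau)
rhs = G(t);                      % P(T > t)

fprintf('conditional: %.6f, unconditional: %.6f\n', lhs, rhs);
```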

7 Poisson process
Random process that describes the timestamps of various events, e.g., telephone call arrivals or packet arrivals on a router.
The time between two consecutive arrivals follows an exponential distribution.
[Figure: timeline of arrivals 1 to 7, separated by the time intervals t1, t2, ..., t6]
The time intervals t1, t2, t3, ... are drawn from an exponential distribution.

8 Roadmap: Probability distributions; Statistical estimation; Fitting data to probability distributions
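A Poisson process can be simulated by accumulating exponential inter-arrival times. The sketch below is one possible way to do it, using inverse-transform sampling so that only base functions are needed; the rate and the number of arrivals are assumptions.

```matlab
lambda = 2;                          % assumed arrival rate (events per time unit)
n = 100000;                          % assumed number of arrivals to simulate

% Inverse-transform sampling: if U ~ Uniform(0,1), then -log(U)/lambda
% follows an exponential distribution with rate lambda.
dt = -log(rand(n, 1)) / lambda;      % inter-arrival times t1, t2, ...
arrivals = cumsum(dt);               % arrival timestamps

mean_gap = mean(dt);                 % should be close to 1/lambda
rate_est = n / arrivals(end);        % empirical arrival rate

fprintf('mean gap ~ %.4f, estimated rate ~ %.4f\n', mean_gap, rate_est);
```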

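9 Basic statistics
Suppose a set of measurements $x = [x_1\ x_2\ \dots\ x_n]$.
Estimation of mean value: $\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i$ (matlab: m = mean(x);)
Estimation of standard deviation: $\hat{\sigma} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \hat{\mu})^2}$ (matlab: s = std(x);)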
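The two estimators can be written out by hand and compared with the built-in functions. The measurement values below are made up for illustration.

```matlab
x = [2.1 3.4 1.9 5.0 4.2];                          % assumed sample measurements
n = numel(x);

mu_hat = sum(x) / n;                                % sample mean
sigma_hat = sqrt(sum((x - mu_hat).^2) / (n - 1));   % sample standard deviation (n-1 normalization)

m = mean(x);                                        % built-in mean
s = std(x);                                         % built-in std, also normalized by n-1 by default

fprintf('mean: %.4f vs %.4f, std: %.4f vs %.4f\n', mu_hat, m, sigma_hat, s);
```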

10 Estimate pdf
Suppose a dataset $x = [x_1\ x_2\ \dots\ x_k]$. Can we estimate the pdf that the values in $x$ follow?

11 Estimate pdf
Suppose a dataset $x = [x_1\ x_2\ \dots\ x_k]$. Can we estimate the pdf that the values in $x$ follow? Produce a histogram.

12 Step 1
Divide the sampling space into a number of bins.
Measure the number of samples in each bin.
[Figure: four bins containing 3, 5, 6 and 2 samples]

13 Step 2
E = total area under the histogram plot = 2*3 + 2*5 + 2*6 + 2*2 = 32
Normalize the y axis by dividing by E.
[Figure: frequency histogram and normalized histogram p(x) with bar heights 3/32, 5/32, 6/32 and 2/32]
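The arithmetic of this normalization can be reproduced in a few lines. The bin width of 2 is read off the area computation (each count is multiplied by 2); the variable names are illustrative.

```matlab
counts = [3 5 6 2];            % samples per bin, as in the example
width = 2;                     % bin width implied by the area computation
E = sum(width * counts);       % total area under the histogram
p_est = counts / E;            % normalized bar heights
area = sum(width * p_est);     % area under the normalized histogram

fprintf('E = %d, normalized area = %.4f\n', E, area);
```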

14 Matlab code
function produce_histogram(x, bins)
% Input parameters
% x = [x1; x2; ...; xn]: a column vector containing the data x1, x2, ..., xn.
% bins = [b1; b2; ...; bk]: a vector that divides the sampling space in bins
% centered around the points b1, b2, ..., bk.
figure;                        % Create a new figure
[f, y] = hist(x, bins);        % Assign the data points to the corresponding bins
bar(y, f/trapz(y, f), 1);      % Plot the histogram, normalized to unit area
xlabel('x');                   % Name axis x
ylabel('p(x)');                % Name axis y
end

15 Histogram examples
[Figure: histograms for 1000 and 10000 samples, with bin spacing 0.1 and 0.05]
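The same normalization can be tried without plotting, to confirm that dividing the counts by trapz(y, f) yields an estimate with unit area. The synthetic normal data and the bin centers are assumptions for this sketch.

```matlab
x = randn(10000, 1);           % assumed synthetic data from a standard normal
bins = (-4:0.1:4)';            % assumed bin centers, spacing 0.1

[f, y] = hist(x, bins);        % counts f at bin centers y
f = f(:);                      % force column orientation for trapz
y = y(:);

p_est = f / trapz(y, f);       % normalized histogram (pdf estimate)
area = trapz(y, p_est);        % should be 1 up to rounding
peak = max(p_est);             % roughly 1/sqrt(2*pi) ~ 0.4 for this data

fprintf('area = %.6f, peak ~ %.3f\n', area, peak);
```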

16 Empirical cdf
How can we estimate the cdf that the values in x follow? Use the matlab function ecdf(x).
[Figure: empirical cdf estimated with 300 samples from a normal distribution]

17 Percentiles
Values of a variable below which a certain percentage of observations fall.
The 80th percentile is the value below which 80% of observations fall.
[Figure: the 80th percentile]
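What an empirical cdf computes can be sketched with sort alone: after sorting, the i-th smallest sample has a fraction i/n of the data at or below it. This is an illustration of the idea, not a reimplementation of ecdf; the sample size mirrors the figure and the data are synthetic.

```matlab
x = randn(300, 1);                 % 300 synthetic samples from a normal distribution
xs = sort(x);                      % sorted samples
n = numel(xs);
F_emp = (1:n)' / n;                % F_emp(i): fraction of samples <= xs(i)

% Fraction of samples at or below 0; close to 0.5 for a standard normal.
frac_below_zero = mean(x <= 0);

fprintf('F_emp at largest sample = %.2f, P(X <= 0) ~ %.3f\n', F_emp(end), frac_below_zero);
% stairs(xs, F_emp) would draw the step curve.
```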

18 Estimate percentiles
Percentiles in matlab: p = prctile(x, y); where y takes values in the interval [0 100].
80th percentile: p = prctile(x, 80);
Median: the 50th percentile: med = prctile(x, 50); or med = median(x);
Why is the median different from the mean? Suppose a dataset x of three values with sum 201 and middle value 100: mean = 201/3 = 67, median = 100.

19 Roadmap: Elements of probability theory; Probability distributions; Statistical estimation; Fitting data to probability distributions
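A small example makes the gap between mean and median concrete. The three values below are hypothetical, chosen only so that they sum to 201 and have a middle value of 100; a single small value pulls the mean down while the median is unaffected.

```matlab
x = [1 100 100];        % hypothetical dataset with sum 201
m = mean(x);            % arithmetic mean: 201/3
med = median(x);        % middle value of the sorted data

fprintf('mean = %g, median = %g\n', m, med);
```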

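20 Problem definition
Dataset $D = \{x_1, x_2, \dots, x_k\}$ collected from an experiment.
Families of distributions, each with its own parameter vector $\theta$ (Gaussian, exponential, generalized Pareto, ...):
$S = \{P_1(x \mid \theta_1), P_2(x \mid \theta_2), \dots, P_N(x \mid \theta_N)\}$
Which family of distributions better describes the dataset $D$?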

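21 Step 1: Maximum likelihood estimation
For each family, determine the parameter $\theta^*$ that best fits the data.
Maximize the likelihood of obtaining the data with respect to $\theta$:
$\theta^* = \arg\max_{\theta} p(D \mid \theta) = \arg\max_{\theta} p(x_1, x_2, \dots, x_k \mid \theta) = \arg\max_{\theta} \prod_{j=1}^{k} p(x_j \mid \theta) = \arg\max_{\theta} \sum_{j=1}^{k} \ln p(x_j \mid \theta)$
$p(D \mid \theta)$ is the likelihood function; the product form is due to the independence of the samples.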

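22 Example: exponential distribution
Probability density function: $p(x \mid \lambda) = \lambda e^{-\lambda x}$
Define the log-likelihood function: $l(\lambda) = \sum_{j=1}^{k} \ln(\lambda e^{-\lambda x_j}) = \sum_{j=1}^{k} (\ln \lambda - \lambda x_j) = k \ln \lambda - \lambda \sum_{j=1}^{k} x_j$
Set the derivative equal to 0 to find the maximum: $\frac{dl}{d\lambda} = \frac{k}{\lambda} - \sum_{j=1}^{k} x_j = 0 \Rightarrow \lambda^* = \frac{k}{\sum_{j=1}^{k} x_j}$

23 Reformulated question
After MLE, instead of families we have specific distributions: $P_1(x \mid \theta_1^*), P_2(x \mid \theta_2^*), \dots, P_N(x \mid \theta_N^*)$
Which distribution better describes the data?
Choose the most appropriate distribution based on: Q-Q plots; Kullback-Leibler divergence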
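The closed-form estimate $\lambda^* = k / \sum_j x_j$ can be cross-checked against a numerical maximization of the log-likelihood. The sketch below uses synthetic data and fminbnd on the negative log-likelihood; the true rate, the sample size, and the search interval are assumptions.

```matlab
lambda_true = 1.5;                           % assumed rate used to generate data
k = 5000;                                    % assumed sample size
x = -log(rand(k, 1)) / lambda_true;          % synthetic exponential samples

% Closed-form maximum likelihood estimate.
lambda_closed = k / sum(x);

% Numerical check: minimize the negative log-likelihood
% -l(lambda) = -(k*ln(lambda) - lambda*sum(x)) over an assumed interval.
negloglik = @(lam) -(k * log(lam) - lam * sum(x));
lambda_num = fminbnd(negloglik, 1e-6, 100);

fprintf('closed form: %.4f, numerical: %.4f\n', lambda_closed, lambda_num);
```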

24 Method of Q-Q plots
Checks how well a probability distribution $P(x \mid \theta^*)$ describes the data.
Algorithm
1. Draw random datasets $Y_0, Y_1, Y_2, \dots, Y_M$ from the distribution $P(x \mid \theta^*)$
2. Compute percentiles of these datasets at a predefined set of points
3. Compute percentiles of the experimental dataset D at the same points
4. Plot percentiles of $Y_0$ against percentiles of each of $Y_1, Y_2, \dots, Y_M$
5. Plot percentiles of $Y_0$ against percentiles of dataset D
If the plot of step 5 lies in the area defined by the plots of step 4, the distribution describes the data well.
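The steps above can be sketched numerically as follows. This is only one possible reading of the algorithm: the dataset D is synthetic, the candidate distribution is an exponential fitted by maximum likelihood, and the envelope is taken as the per-point minimum and maximum over the reference curves. All names and constants are assumptions; prctile requires the Statistics Toolbox in MATLAB.

```matlab
k = 500;
D = -log(rand(k, 1)) / 2;                     % assumed stand-in for the experimental dataset
lambda_star = k / sum(D);                     % fitted exponential rate (MLE)

M = 100;                                      % number of reference datasets Y1..YM
q = 5:5:95;                                   % predefined percentile points
draw = @() -log(rand(k, 1)) / lambda_star;    % sampler for the fitted distribution
pct = @(v) reshape(prctile(v, q), 1, []);     % percentiles as a row vector

% Steps 1-2: draw Y0..YM and compute their percentiles.
p0 = pct(draw());
pY = zeros(M, numel(q));
for j = 1:M
    pY(j, :) = pct(draw());
end

% Step 3: percentiles of the experimental dataset.
pD = pct(D);

% Steps 4-5: plot(p0, pY') draws the reference curves and plot(p0, pD) the data curve.
% Here the envelope is summarized as the range of the reference curves at each point.
lo = min(pY, [], 1);
hi = max(pY, [], 1);
inside = mean(pD >= lo & pD <= hi);           % fraction of percentile points inside the envelope

fprintf('fraction of points inside the envelope: %.2f\n', inside);
```

A fraction close to 1 corresponds to the good-fit case; a curve that leaves the envelope at many points corresponds to the bad-fit case.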

25 Plot percentiles of Y0 vs. percentiles of Y1

26 Plot percentiles of Y0 vs. percentiles of Y2

27 Plot percentiles of Y0 vs. percentiles of Y100

28 Construct envelope

29 Plot percentiles of Y0 vs. percentiles of D
Good fitting: the blue curve of original percentiles lies in the envelope.

30 Plot percentiles of Y0 vs. percentiles of D
Bad fitting: the blue curve of original percentiles lies outside the envelope.

31 Method of Kullback-Leibler divergence
Non-symmetric metric of the difference between distributions P and Q.
Discrete distributions: $D_{KL}(P \| Q) = \sum_{i=1}^{N} p(x_i) \log \frac{p(x_i)}{q(x_i)}$
Continuous distributions: $D_{KL}(P \| Q) = \int p(x) \log \frac{p(x)}{q(x)}\,dx$
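The discrete formula and its lack of symmetry can be seen on a small example. The distribution p uses bin probabilities of the form counts/16; the uniform q is an assumption added for comparison.

```matlab
p = [3 5 6 2] / 16;                     % discrete distribution P
q = [0.25 0.25 0.25 0.25];              % assumed comparison distribution Q (uniform)

kl = @(a, b) sum(a .* log(a ./ b));     % D_KL(A || B) for strictly positive a, b

d_pq = kl(p, q);                        % D_KL(P || Q)
d_qp = kl(q, p);                        % D_KL(Q || P), generally a different value
d_pp = kl(p, p);                        % divergence of a distribution from itself

fprintf('D(P||Q) = %.4f, D(Q||P) = %.4f, D(P||P) = %.4f\n', d_pq, d_qp, d_pp);
```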

32 Algorithm
1. Discretize the empirical pdf of the dataset D
[Figure: bins with 3, 5, 6 and 2 samples become probabilities 3/16, 5/16, 6/16 and 2/16]
2. Discretize all distributions $P_1(x \mid \theta_1^*), P_2(x \mid \theta_2^*), \dots, P_N(x \mid \theta_N^*)$
3. Compute the KL divergence of the theoretical distributions with dataset D
4. Choose the distribution with the lowest KL divergence

33 Online material Tutorials Statistics

34 Cross correlation
xcorr(x, y): estimates the cross correlation between two time series x and y:
$R_{xy}(m) = E[x_{n+m}\, y_n] = E[x_n\, y_{n-m}]$
The larger the absolute value of the cross correlation, the stronger the correlation between the two variables.
[Figure: white noise (no correlation) and output of an IIR filter (some correlation)]
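The four steps can be sketched end to end as below. Several choices here are assumptions rather than part of the slides: the dataset is synthetic, the candidates are an exponential and a Gaussian fitted by maximum likelihood, 20 equal-width bins are used, the divergence is computed as D_KL(empirical || theoretical), and empty empirical bins are skipped.

```matlab
k = 2000;
D = -log(rand(k, 1)) / 2;                 % assumed synthetic dataset (exponential, rate 2)

% Fitted candidates (maximum likelihood estimates).
lambda_star = k / sum(D);                 % exponential rate
mu_star = mean(D);                        % Gaussian mean
sigma_star = std(D, 1);                   % Gaussian standard deviation (1/k normalization)

% ccdfs of the fitted distributions (erfc keeps the upper tail accurate).
G_exp = @(x) exp(-lambda_star * x);
G_gauss = @(x) 0.5 * erfc((x - mu_star) / (sigma_star * sqrt(2)));

% Step 1: discretize the empirical pdf.
edges = linspace(0, max(D), 21);
nb = numel(edges) - 1;
counts = zeros(1, nb);
for i = 1:nb
    counts(i) = sum(D >= edges(i) & D < edges(i + 1));
end
counts(nb) = counts(nb) + sum(D == edges(end));   % put the maximum in the last bin
p_emp = counts / sum(counts);

% Step 2: discretize the fitted distributions over the same bins and renormalize.
q_exp = -diff(G_exp(edges));
q_exp = q_exp / sum(q_exp);
q_gauss = -diff(G_gauss(edges));
q_gauss = q_gauss / sum(q_gauss);

% Step 3: KL divergence, treating 0*log(0) as 0 by skipping empty bins.
nz = p_emp > 0;
kl = @(p, q) sum(p(nz) .* log(p(nz) ./ q(nz)));
kl_exp = kl(p_emp, q_exp);
kl_gauss = kl(p_emp, q_gauss);

% Step 4: the candidate with the lowest divergence is preferred.
fprintf('KL exponential: %.4f, KL Gaussian: %.4f\n', kl_exp, kl_gauss);
```

Because the data are generated from an exponential distribution, the exponential candidate is expected to give the smaller divergence.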
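The contrast between white noise and the output of an IIR filter can be illustrated with a direct sample estimate of the correlation at lag 1, normalized by the lag-0 value. This is a sketch with base functions only; the first-order filter coefficient 0.9 and the series length are assumptions.

```matlab
n = 20000;
w = randn(n, 1);                     % white noise
y = filter(1, [1 -0.9], w);          % assumed IIR filter: y(n) = 0.9*y(n-1) + w(n)

% Sample estimate of R_xx(m) = E[x(n+m) x(n)], divided by the lag-0 value.
acorr = @(x, m) sum(x(1 + m:end) .* x(1:end - m)) / sum(x .^ 2);

r_white = acorr(w, 1);               % close to 0: consecutive samples are uncorrelated
r_filt = acorr(y, 1);                % close to 0.9: the filter introduces correlation

fprintf('lag-1 correlation, white noise: %.3f, filtered: %.3f\n', r_white, r_filt);
```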
