BASIC PRINCIPLES OF STATISTICS

PROBABILITY DENSITY DISTRIBUTIONS: DISCRETE VARIABLES

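BINOMIAL DISTRIBUTION
$y \mid n, p \sim B(n, p)$; $0 \le p \le 1$; $y = 0, 1, \ldots, n$ (number of successes in $n$ trials)
$\Pr(y \mid n, p) = \binom{n}{y} p^{y} (1 - p)^{n - y}$
$E[y \mid n, p] = np$; $\mathrm{Var}[y \mid n, p] = np(1 - p)$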
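As a quick numerical check of the binomial mean and variance, the sketch below sums over the support y = 0, ..., n. The function names `binom_pmf` and `binom_moments` and the setting n = 30, p = 0.3 are illustrative choices, not part of the slides.

```python
from math import comb


def binom_pmf(y, n, p):
    """Pr(y | n, p) = C(n, y) * p^y * (1 - p)^(n - y)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def binom_moments(n, p):
    """Mean and variance obtained by direct summation over y = 0..n."""
    mean = sum(y * binom_pmf(y, n, p) for y in range(n + 1))
    var = sum((y - mean) ** 2 * binom_pmf(y, n, p) for y in range(n + 1))
    return mean, var


# Illustrative setting (assumed): n = 30 trials, success probability 0.3
mean, var = binom_moments(30, 0.3)
```

The two returned values should agree with np and np(1 - p) up to floating-point error.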

[Figure: binomial probability functions for three (n, p) settings; legible panel labels include B(30, 0.3) and B(50, 0.5)]

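MULTINOMIAL DISTRIBUTION
$y = (y_1, \ldots, y_k) \mid n, p \sim Mult(n; p_1, \ldots, p_k)$; $0 \le p_j \le 1$; $\sum_{j=1}^{k} p_j = 1$; $y_j \ge 0$; $\sum_{j=1}^{k} y_j = n$
$\Pr(y \mid n, p) = \frac{n!}{y_1! \, y_2! \cdots y_k!} \, p_1^{y_1} p_2^{y_2} \cdots p_k^{y_k}$
$E[y_j] = n p_j$; $\mathrm{Var}[y_j] = n p_j (1 - p_j)$; $\mathrm{Cov}[y_j, y_{j'}] = -n p_j p_{j'}$ for $j \ne j'$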

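POISSON DISTRIBUTION
$y \mid \lambda \sim Poisson(\lambda)$; $\lambda > 0$; $y = 0, 1, 2, \ldots$
$\Pr(y \mid \lambda) = \frac{\lambda^{y} e^{-\lambda}}{y!}$
$E[y \mid \lambda] = \lambda$; $\mathrm{Var}[y \mid \lambda] = \lambda$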

[Figure: Poisson probability functions for three values of λ]

PROBABILITY DENSITY DISTRIBUTIONS: CONTINUOUS VARIABLES

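UNIFORM DISTRIBUTION
$y \mid \alpha, \beta \sim Uniform(\alpha, \beta)$; $-\infty < \alpha < \beta < \infty$; $\alpha \le y \le \beta$
$p(y \mid \alpha, \beta) = \frac{1}{\beta - \alpha}$
$E[y \mid \alpha, \beta] = \frac{\alpha + \beta}{2}$; $\mathrm{Var}[y \mid \alpha, \beta] = \frac{(\beta - \alpha)^2}{12}$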

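BETA DISTRIBUTION
$y \mid \alpha, \beta \sim Beta(\alpha, \beta)$; $\alpha > 0$; $\beta > 0$; $0 \le y \le 1$
$p(y \mid \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \, y^{\alpha - 1} (1 - y)^{\beta - 1}$
$E[y \mid \alpha, \beta] = \frac{\alpha}{\alpha + \beta}$; $\mathrm{Var}[y \mid \alpha, \beta] = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$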

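EXPONENTIAL DISTRIBUTION
$y \mid \lambda \sim Exp(\lambda)$; $\lambda > 0$; $y \ge 0$
$p(y \mid \lambda) = \lambda e^{-\lambda y}$ for $y \ge 0$, and $0$ otherwise
$E[y \mid \lambda] = \frac{1}{\lambda}$; $\mathrm{Var}[y \mid \lambda] = \frac{1}{\lambda^2}$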

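GAMMA DISTRIBUTION
$y \mid \alpha, \beta \sim Gamma(\alpha, \beta)$; $\alpha > 0$ and $\beta > 0$; $y > 0$
$p(y \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \, y^{\alpha - 1} \exp\{-\beta y\}$
$E[y \mid \alpha, \beta] = \frac{\alpha}{\beta}$; $\mathrm{Var}[y \mid \alpha, \beta] = \frac{\alpha}{\beta^2}$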

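CHI-SQUARE DISTRIBUTION
$y \mid \phi \sim \chi^2_{\phi}$; $\phi > 0$; $y > 0$ [same as $Gamma(\phi/2, 1/2)$]
$p(y \mid \phi) = \frac{2^{-\phi/2}}{\Gamma(\phi/2)} \, y^{\phi/2 - 1} \exp\{-y/2\}$
$E[y \mid \phi] = \phi$; $\mathrm{Var}[y \mid \phi] = 2\phi$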

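NORMAL (GAUSSIAN) DISTRIBUTION
$y \mid \mu, \sigma^2 \sim N(\mu, \sigma^2)$; $-\infty < \mu < \infty$; $\sigma^2 > 0$; $-\infty < y < \infty$
$p(y \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(y - \mu)^2}{2\sigma^2} \right\}$
$E[y \mid \mu, \sigma^2] = \mu$; $\mathrm{Var}[y \mid \mu, \sigma^2] = \sigma^2$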

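NORMAL (GAUSSIAN) DISTRIBUTION
Mean and variance define the distribution, e.g. three curves A, B and C with $\mu_A = \mu_B < \mu_C$ and $\sigma_A = \sigma_C > \sigma_B$.
But proportions, i.e. the bell shape, are always the same: about 68.3%, 95.4% and 99.7% of the distribution lies within 1, 2 and 3 standard deviations of the mean.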
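The fixed proportions of the bell shape can be checked with the error function, since Pr(|y - µ| < kσ) = erf(k/√2) for any normal distribution. The helper name `coverage` below is an illustrative choice.

```python
from math import erf, sqrt


def coverage(k):
    """Pr(|y - mu| < k * sigma) for a normal variable; free of mu and sigma."""
    return erf(k / sqrt(2))


# Probability mass within 1, 2 and 3 standard deviations of the mean
within = [coverage(k) for k in (1, 2, 3)]
```

The three values are approximately 0.683, 0.954 and 0.997, whatever the mean and variance.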

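NORMAL (GAUSSIAN) DISTRIBUTION
$\bar{x}$ is approximately Normal (Central Limit Theorem)
$z \sim N(0, 1) \Rightarrow \mu + \sigma z \sim N(\mu, \sigma^2)$
$w > 0$ and $\log(w) \sim$ Normal $\Rightarrow$ $w$: lognormal variable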

[Figure: Relationships among common distributions. Solid lines: transformations and special cases. Dashed lines: limits. (Leemis, 1986)]

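MULTIVARIATE NORMAL DISTRIBUTION
$y \mid \mu, \Sigma \sim N_P(\mu, \Sigma)$; $-\infty < \mu_j < \infty$; $\Sigma$: positive definite; $-\infty < y_j < \infty$
$p(y \mid \mu, \Sigma) = (2\pi)^{-P/2} |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2} (y - \mu)' \Sigma^{-1} (y - \mu) \right\}$
$E[y \mid \mu, \Sigma] = \mu$; $\mathrm{Var}[y \mid \mu, \Sigma] = \Sigma$
$z \sim N_P(0, I) \Rightarrow \mu + Az \sim N_P(\mu, \Sigma)$, where $\Sigma = AA'$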
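The transformation µ + Az with Σ = AA' gives a direct way to simulate multivariate normal vectors. The sketch below covers only the bivariate case and takes A to be the lower-triangular Cholesky factor, which is one possible choice of A; the function names, seed and parameter values are assumptions made for illustration.

```python
import random
from math import sqrt


def cholesky_2x2(s11, s12, s22):
    """Lower-triangular A = [[a11, 0], [a21, a22]] with A A' = [[s11, s12], [s12, s22]]."""
    a11 = sqrt(s11)
    a21 = s12 / a11
    a22 = sqrt(s22 - a21**2)
    return a11, a21, a22


def sample_bvn(mu, sigma, n, seed=0):
    """Draw n pairs from N_2(mu, sigma) as mu + A z, with z ~ N_2(0, I)."""
    rng = random.Random(seed)
    a11, a21, a22 = cholesky_2x2(sigma[0][0], sigma[0][1], sigma[1][1])
    draws = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        draws.append((mu[0] + a11 * z1, mu[1] + a21 * z1 + a22 * z2))
    return draws


# Illustrative parameters (assumed, not from the slides)
draws = sample_bvn((1.0, -2.0), [[4.0, 1.2], [1.2, 1.0]], n=20000)
```

With a large number of draws, the sample means and the sample covariance should be close to µ and Σ.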

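MULTIVARIATE NORMAL: MARGINAL DISTRIBUTIONS
$y = (y_1', y_2')'$, $\mu = (\mu_1', \mu_2')'$ and $\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$
$y_1$ and $y_2$: $p_1$- and $p_2$-dimensional vectors; $p_1 + p_2 = P$
$p(y_1 \mid \mu_1, \Sigma_{11}) = (2\pi)^{-p_1/2} |\Sigma_{11}|^{-1/2} \exp\left\{ -\tfrac{1}{2} (y_1 - \mu_1)' \Sigma_{11}^{-1} (y_1 - \mu_1) \right\}$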

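MULTIVARIATE NORMAL: CONDITIONAL DISTRIBUTIONS
$y = (y_1', y_2')'$, $\mu = (\mu_1', \mu_2')'$ and $\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$
$y_1$ and $y_2$: $p_1$- and $p_2$-dimensional vectors; $p_1 + p_2 = P$
$p(y_1 \mid y_2) = (2\pi)^{-p_1/2} |\mathrm{Var}(y_1 \mid y_2)|^{-1/2} \exp\left\{ -\tfrac{1}{2} [y_1 - E(y_1 \mid y_2)]' [\mathrm{Var}(y_1 \mid y_2)]^{-1} [y_1 - E(y_1 \mid y_2)] \right\}$
$E[y_1 \mid y_2] = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (y_2 - \mu_2)$; $\mathrm{Var}[y_1 \mid y_2] = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$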
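For a bivariate normal the blocks Σ11, Σ12 and Σ22 are scalars, so the conditional mean and variance reduce to simple arithmetic. The sketch below assumes that scalar case; the function name and the numbers in the example are illustrative.

```python
def conditional_normal(mu1, mu2, s11, s12, s22, y2):
    """Mean and variance of y1 | y2 when both blocks are one-dimensional.

    mean = mu1 + s12 / s22 * (y2 - mu2)
    var  = s11 - s12 * s12 / s22
    """
    mean = mu1 + s12 / s22 * (y2 - mu2)
    var = s11 - s12 * s12 / s22
    return mean, var


# Illustrative case (assumed): zero means, unit variances, covariance 0.5, observed y2 = 2
cond_mean, cond_var = conditional_normal(0.0, 0.0, 1.0, 0.5, 1.0, y2=2.0)
```

Note that the conditional variance does not depend on the observed value of y2, and it is never larger than the marginal variance Σ11.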

POINT ESTIMATION: METHODS OF FINDING ESTIMATORS
Method of Moments
Least Squares
Maximum Likelihood
Bayesian Estimators

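METHOD OF MOMENTS
$y_i \overset{iid}{\sim} p(y \mid \theta_1, \ldots, \theta_k)$
Equate the first $k$ sample moments to the corresponding $k$ population moments and solve the system of simultaneous equations.
Sample moments: $m_1 = \frac{1}{n}\sum_{i=1}^{n} y_i$; $m_2 = \frac{1}{n}\sum_{i=1}^{n} y_i^2$; $\ldots$; $m_k = \frac{1}{n}\sum_{i=1}^{n} y_i^k$
Population moments: $\mu_1 = E[y]$; $\mu_2 = E[y^2]$; $\ldots$; $\mu_k = E[y^k]$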

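EXAMPLE
$y_i \overset{iid}{\sim} N(\mu, \sigma^2)$; $k = 2$
Sample moments: $m_1 = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}$; $m_2 = \frac{1}{n}\sum_{i=1}^{n} y_i^2$
Population moments: $\mu_1 = E[y] = \mu$; $\mu_2 = E[y^2] = \mu^2 + \sigma^2$
Solving $\hat{\mu} = m_1$ and $\hat{\mu}^2 + \hat{\sigma}^2 = m_2$ gives $\hat{\mu} = \bar{y}$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} y_i^2 - \bar{y}^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2$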
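The normal example of the method of moments can be sketched in a few lines: compute the first two sample moments and solve for the mean and variance. The function name `mom_normal` and the data are made up for illustration.

```python
def mom_normal(y):
    """Method-of-moments estimates (mu_hat, sigma2_hat) for a normal sample.

    Solves mu = m1 and mu^2 + sigma^2 = m2 for the two parameters.
    """
    n = len(y)
    m1 = sum(y) / n                    # first sample moment
    m2 = sum(v * v for v in y) / n     # second sample moment
    return m1, m2 - m1**2


# Made-up data, for illustration only
mu_hat, sigma2_hat = mom_normal([1.0, 2.0, 3.0, 4.0])
```

The variance estimate uses the divisor n, so it differs from the usual unbiased estimator, which divides by n - 1.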

LEAST SQUARES: A MATHEMATICAL SOLUTION
Model: $y_i = \alpha + \beta x_i + \varepsilon_i$; fitted line: $\hat{y} = a + bx$
Residual Sum of Squares: $RSS = \sum_{i=1}^{n} [y_i - (a + b x_i)]^2$

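LEAST SQUARES
$RSS = \sum_{i=1}^{n} [y_i - (a + b x_i)]^2$
$\frac{\partial RSS}{\partial a} = -2 \sum_{i=1}^{n} [y_i - (a + b x_i)] = 0$
$\frac{\partial RSS}{\partial b} = -2 \sum_{i=1}^{n} [y_i - (a + b x_i)] \, x_i = 0$
Solution: $a = \bar{y} - b\bar{x}$; $b = \frac{S_{xy}}{S_{xx}}$, where $S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ and $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$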
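The closed-form least-squares solution translates directly into code. This is a minimal sketch for simple linear regression; the function name and the data are illustrative assumptions.

```python
def least_squares(x, y):
    """Intercept a, slope b and residual sum of squares for the line y = a + b x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx              # slope: S_xy / S_xx
    a = ybar - b * xbar        # intercept: ybar - b * xbar
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, rss


# Made-up data lying exactly on y = 1 + 2x, so the RSS should be zero
a, b, rss = least_squares([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

The sketch requires at least two distinct x values; otherwise S_xx is zero and the slope is undefined.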

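MAXIMUM LIKELIHOOD
$y_i \overset{iid}{\sim} p(y \mid \theta_1, \ldots, \theta_k)$
Likelihood Function: $L(\theta \mid y) = \prod_{i=1}^{n} p(y_i \mid \theta_1, \ldots, \theta_k)$
Log-Likelihood Function: $l(\theta \mid y) = \log L(\theta \mid y) = \sum_{i=1}^{n} \log p(y_i \mid \theta_1, \ldots, \theta_k)$
MLE: $\hat{\theta} \in \Theta$ such that $L(\hat{\theta} \mid y) \ge L(\theta \mid y)$ for any $\theta \in \Theta$ ($\Theta$: parameter space)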

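MAXIMUM LIKELIHOOD
Finding the maximum of $L(\theta)$:
$\frac{\partial L}{\partial \theta_j} = 0$, $j = 1, \ldots, k$: solutions are possible candidates
$\left. \frac{\partial^2 L}{\partial \theta^2} \right|_{\hat{\theta}} < 0$: maximum (with several parameters, the matrix of second derivatives must be negative definite)
Check also the boundaries of the parameter space!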

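Example 1: $y_i \overset{iid}{\sim} B(1, p)$ (Bernoulli)
$L(p \mid y) = \prod_{i=1}^{n} \Pr(y_i \mid p) = p^{\sum y_i} (1 - p)^{n - \sum y_i}$
$l(p \mid y) = \sum y_i \log p + \left( n - \sum y_i \right) \log(1 - p)$
$\frac{dl}{dp} = \frac{\sum y_i}{p} - \frac{n - \sum y_i}{1 - p} = 0 \Rightarrow \hat{p} = \frac{1}{n} \sum_{i=1}^{n} y_i$
Check: $\left. \frac{d^2 l}{dp^2} \right|_{\hat{p}} < 0$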
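A numerical check of Example 1, assuming Bernoulli observations: the closed-form estimate (the sample proportion) should coincide with the maximiser of the log-likelihood over a fine grid. The data and function names below are made up for illustration.

```python
from math import log


def bernoulli_loglik(p, y):
    """l(p | y) = sum(y) * log(p) + (n - sum(y)) * log(1 - p), for 0 < p < 1."""
    s, n = sum(y), len(y)
    return s * log(p) + (n - s) * log(1 - p)


def bernoulli_mle(y):
    """Closed-form MLE: the sample proportion of successes."""
    return sum(y) / len(y)


# Made-up data: 6 successes in 10 trials
y = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]
p_hat = bernoulli_mle(y)

# Brute-force maximisation over the interior grid 0.001, 0.002, ..., 0.999
grid = [k / 1000 for k in range(1, 1000)]
p_grid = max(grid, key=lambda p: bernoulli_loglik(p, y))
```

The grid search is only a sanity check; it excludes the boundaries p = 0 and p = 1, where the log-likelihood is not finite for these data.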

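Example 2: $y_i \overset{iid}{\sim} N(\mu, \sigma^2)$
$L(\mu, \sigma^2 \mid y) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2 \right\}$
$l(\mu, \sigma^2 \mid y) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2$
$\frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (y_i - \mu)$; $\frac{\partial l}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (y_i - \mu)^2$
$\hat{\mu} = \bar{y}$; $\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\mu})^2$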

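Example 3: $(y_{i1}, y_{i2})' \overset{iid}{\sim} N_2\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \right)$
$p(y_{i1}, y_{i2}) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left( \frac{y_{i1} - \mu_1}{\sigma_1} \right)^2 - 2\rho \left( \frac{y_{i1} - \mu_1}{\sigma_1} \right) \left( \frac{y_{i2} - \mu_2}{\sigma_2} \right) + \left( \frac{y_{i2} - \mu_2}{\sigma_2} \right)^2 \right] \right\}$
$\hat{\mu}_j = \frac{1}{n} \sum_{i=1}^{n} y_{ij}$; $\hat{\sigma}_j^2 = \frac{1}{n} \sum_{i=1}^{n} (y_{ij} - \hat{\mu}_j)^2$; $\hat{\rho} = \frac{\sum_{i=1}^{n} (y_{i1} - \hat{\mu}_1)(y_{i2} - \hat{\mu}_2)}{n \hat{\sigma}_1 \hat{\sigma}_2}$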

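Example 4: Standard Bivariate Normal Distribution
$(y_{i1}, y_{i2})' \overset{iid}{\sim} N_2\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)$
$p(y_{i1}, y_{i2} \mid \rho) = \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left\{ -\frac{y_{i1}^2 - 2\rho y_{i1} y_{i2} + y_{i2}^2}{2(1 - \rho^2)} \right\}$
MLE of $\rho$? Candidates: the sample correlation $\hat{\rho} = \frac{\sum_i (y_{i1} - \hat{\mu}_1)(y_{i2} - \hat{\mu}_2)}{n \hat{\sigma}_1 \hat{\sigma}_2}$, or the same expression with the known values plugged in ($\mu_j = 0$, or $\mu_j = 0$ and $\sigma_j = 1$).
None of these (Rosa and Gianola, 2001).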
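For the standard bivariate normal of Example 4 (zero means, unit variances, correlation ρ), differentiating the log-likelihood and clearing denominators gives a cubic likelihood equation, ρ(1 - ρ²) + b(1 + ρ²) - aρ = 0, with a the mean of y1² + y2² and b the mean of y1·y2. This derivation is not shown on the slide; the sketch below, with assumed function names and simulated data, locates the maximiser with a grid search followed by bisection on the cubic.

```python
import random
from math import log, sqrt


def loglik(rho, a, b, n):
    """Log-likelihood up to a constant; a = mean(y1^2 + y2^2), b = mean(y1 * y2)."""
    return -0.5 * n * log(1 - rho**2) - n * (a - 2 * rho * b) / (2 * (1 - rho**2))


def score_cubic(rho, a, b):
    """Cubic with the same sign as dl/d(rho) on (-1, 1)."""
    return rho * (1 - rho**2) + b * (1 + rho**2) - a * rho


def mle_rho(pairs, grid_size=2000, tol=1e-10):
    n = len(pairs)
    a = sum(y1 * y1 + y2 * y2 for y1, y2 in pairs) / n
    b = sum(y1 * y2 for y1, y2 in pairs) / n
    h = 2 / grid_size
    grid = [-1 + h * k for k in range(1, grid_size)]
    best = max(grid, key=lambda r: loglik(r, a, b, n))   # coarse maximiser
    lo, hi = max(best - h, -1 + 1e-12), min(best + h, 1 - 1e-12)
    if not (score_cubic(lo, a, b) > 0 > score_cubic(hi, a, b)):
        return best, a, b                                # no bracket: keep grid value
    while hi - lo > tol:                                 # bisection on the cubic
        mid = (lo + hi) / 2
        if score_cubic(mid, a, b) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2, a, b


# Simulated data (assumed true rho = 0.5, n = 500), for illustration only
rng = random.Random(4)
pairs = []
for _ in range(500):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    pairs.append((z1, 0.5 * z1 + sqrt(1 - 0.5**2) * z2))
rho_mle, a_stat, b_stat = mle_rho(pairs)
```

In general the resulting estimate differs from both the ordinary sample correlation and the plain average of the cross-products, which is consistent with the slide's answer "none of these".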

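BAYESIAN ESTIMATORS
$y$: observed data
$\theta$: parameters (all unobserved quantities)
$p(\theta \mid y) \propto p(\theta) \, p(y \mid \theta)$, i.e. posterior distribution $\propto$ prior distribution $\times$ sampling distribution
More on Bayesian Inference later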

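CONFIDENCE INTERVAL
$\hat{\theta}$: estimator of $\theta$
$\Pr[LL < \hat{\theta} < UL] = 1 - \alpha$, where LL is the lower limit, UL is the upper limit and $1 - \alpha$ is the confidence (credibility)
If $\Pr[\hat{\theta} < LL] = \Pr[\hat{\theta} > UL] = \alpha/2$: symmetrical interval
If $\Pr[\hat{\theta} < LL] \ne \Pr[\hat{\theta} > UL]$: non-symmetrical interval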

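CONFIDENCE INTERVAL
Normal Approximations for Obtaining Confidence Intervals (Central Limit Theorem)
$\frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} \sim N(0, 1)$, approximately
$\Pr\left[ -z_{\alpha/2} < \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} < z_{\alpha/2} \right] = 1 - \alpha$
$\Pr\left[ \hat{\theta} - z_{\alpha/2} \, \sigma_{\hat{\theta}} < \theta < \hat{\theta} + z_{\alpha/2} \, \sigma_{\hat{\theta}} \right] = 1 - \alpha$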

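APPROXIMATE CONFIDENCE INTERVAL
$CI[\theta; 1 - \alpha]$: $\hat{\theta} \pm z_{\alpha/2} \, \sigma_{\hat{\theta}}$
Example 1: $y_i \overset{iid}{\sim} N(\mu, \sigma^2)$
$CI[\mu; 95\%]$: $\hat{\mu} \pm 1.96 \, \frac{\sigma}{\sqrt{n}}$
If $\sigma$ is unknown, use an estimate instead. The Student t distribution is more appropriate though.
$CI[\mu; 95\%]$: $\hat{\mu} \pm t_{n-1; \alpha/2} \, \frac{s}{\sqrt{n}}$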

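APPROXIMATE CONFIDENCE INTERVAL
Example 2: $y_i \overset{iid}{\sim} B(1, p)$ (Bernoulli)
Approximate: $CI[p; 90\%]$: $\hat{p} \pm 1.65 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
More conservative ($p = 0.5$): $CI[p; 90\%]$: $\hat{p} \pm 1.65 \, \frac{0.5}{\sqrt{n}}$
Exact, with $F[a; b; \alpha/2]$ the upper $\alpha/2$ critical value of the F distribution with $a$ and $b$ degrees of freedom:
$LL = \frac{n\hat{p}}{n\hat{p} + (n - n\hat{p} + 1) \, F[2(n - n\hat{p} + 1); \, 2n\hat{p}; \, \alpha/2]}$
$UL = \frac{(n\hat{p} + 1) \, F[2(n\hat{p} + 1); \, 2(n - n\hat{p}); \, \alpha/2]}{n - n\hat{p} + (n\hat{p} + 1) \, F[2(n\hat{p} + 1); \, 2(n - n\hat{p}); \, \alpha/2]}$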
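The approximate and the more conservative intervals for a Bernoulli proportion can be sketched with the standard library alone. The function name `wald_ci`, its arguments and the counts in the example are assumptions; the exact F-based interval is not implemented here because F quantiles are not available in the standard library.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(successes, n, level=0.90, conservative=False):
    """Normal-approximation interval p_hat +/- z * sqrt(p(1 - p) / n).

    With conservative=True the standard error uses p = 0.5, its largest value.
    """
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.645 for a 90% interval
    p_for_se = 0.5 if conservative else p_hat
    half_width = z * sqrt(p_for_se * (1 - p_for_se) / n)
    return p_hat - half_width, p_hat + half_width


# Made-up counts: 40 successes in 100 trials
approx = wald_ci(40, 100)
conserv = wald_ci(40, 100, conservative=True)
```

The conservative interval is never narrower than the approximate one, because p(1 - p) is largest at p = 0.5.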

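HYPOTHESIS TESTING
Likelihood Ratio Test (LRT)
$y_i \overset{iid}{\sim} p(y \mid \theta)$; $L(\theta \mid y) = \prod_{i=1}^{n} p(y_i \mid \theta)$
Suppose: $H_0: \theta \in \Theta_0$ vs. $H_1: \theta \notin \Theta_0$
$LRT = \frac{\max_{\theta \in \Theta_0} L(\theta \mid y)}{\max_{\theta \in \Theta} L(\theta \mid y)}$ (numerator: restricted maximization; denominator: unrestricted maximization)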

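HYPOTHESIS TESTING
Let: $H_0: \theta = \theta_0$ vs. $H_1: \theta \ne \theta_0$
So, $\Theta_0$ represents a unique value ($\theta_0$)
$LRT = \frac{L(\theta_0 \mid y)}{L(\hat{\theta} \mid y)}$
Critical Region: $LRT < c$
How to choose the cutoff value $c$?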

HYPOTHESIS TESTING
                 Accept H0                 Reject H0
H0 is true       1 - α                     α (Type I Error; Significance Level)
H0 is false      β (Type II Error)         1 - β (Power)

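HYPOTHESIS TESTING
Log-Likelihood Ratio Test
$-2 \log LRT = -2 \log \frac{L(\theta_0 \mid y)}{L(\hat{\theta} \mid y)}$
$-2 \log \frac{L(\theta_0 \mid y)}{L(\hat{\theta} \mid y)} \sim \chi^2_{\phi}$, approximately
$\phi$: degrees of freedom; difference in dimension of the parameter spaces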
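As a worked instance of the log-likelihood ratio test, the sketch below tests H0: p = p0 for Bernoulli data, an example chosen here for illustration rather than taken from the slide. There is one free parameter, so the reference distribution is chi-square with 1 degree of freedom, whose upper tail probability can be written as erfc(√(x/2)).

```python
from math import erfc, log, sqrt


def bernoulli_loglik(p, s, n):
    """Log-likelihood for s successes in n Bernoulli trials (requires 0 < p < 1)."""
    return s * log(p) + (n - s) * log(1 - p)


def lrt_bernoulli(s, n, p0):
    """-2 log LRT for H0: p = p0 and its chi-square(1) P-value.

    Assumes 0 < s < n so that the MLE lies strictly inside (0, 1).
    """
    p_hat = s / n
    stat = 2 * (bernoulli_loglik(p_hat, s, n) - bernoulli_loglik(p0, s, n))
    stat = max(stat, 0.0)              # guard against tiny negative rounding error
    p_value = erfc(sqrt(stat / 2))     # Pr(chi-square with 1 df > stat)
    return stat, p_value


# Made-up counts: 60 successes in 100 trials, testing p0 = 0.5
stat, p_value = lrt_bernoulli(60, 100, 0.5)
```

Rejecting H0 when the P-value is below α corresponds to choosing the cutoff c from the chi-square approximation.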

MONTE CARLO METHODS AND RESAMPLING TECHNIQUES
Bootstrap: estimation of precision of sample statistics by sampling with replacement from the original sample
Jackknife: estimation of precision of sample statistics by using a leave-one-out approach
Permutation (Randomization) Test: significance test approach performed by exchanging labels on data points
Cross-validation: k-fold and leave-one-out techniques; partitioning of the sample into training and validation (or testing) sets
Markov Chain Monte Carlo (MCMC), e.g. Gibbs Sampling

THE BOOTSTRAP
Extremely useful for computing standard errors and confidence intervals
Data Set: $n$ pairs $(y_{i1}, y_{i2})$, summarized as $\bar{y}_1 \pm s_1$ and $\bar{y}_2 \pm s_2$
Example: interest in the correlation between $Y_1$ and $Y_2$

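THE NON-PARAMETRIC APPROACH
Define the statistic, e.g. $r = \frac{\sum_{i} (y_{i1} - \bar{y}_1)(y_{i2} - \bar{y}_2)}{\sqrt{\sum_{i} (y_{i1} - \bar{y}_1)^2 \sum_{i} (y_{i2} - \bar{y}_2)^2}}$, and calculate its value for the data set (call it $r^*$)
Draw a sample of $n$ pairs with replacement
Compute the value of $r$ (call it $r_1$) and repeat the process a large number ($B$) of times
From the Bootstrap estimates $[r_1, r_2, \ldots, r_B]$ compute standard error, percentiles, confidence interval, etc.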
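The non-parametric bootstrap steps can be sketched as follows for the correlation coefficient. The data are simulated here only to have something to resample; the function names, B = 2000, the seeds and the percentile-based 95% interval are illustrative assumptions.

```python
import random
from math import sqrt


def pearson_r(pairs):
    """Sample correlation coefficient of a list of (y1, y2) pairs."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    s12 = sum((a - m1) * (b - m2) for a, b in pairs)
    s11 = sum((a - m1) ** 2 for a, _ in pairs)
    s22 = sum((b - m2) ** 2 for _, b in pairs)
    return s12 / sqrt(s11 * s22)


def bootstrap_r(pairs, B=2000, seed=1):
    """Bootstrap standard error and 95% percentile interval for r."""
    rng = random.Random(seed)
    n = len(pairs)
    reps = []
    for _ in range(B):
        # Resample n pairs with replacement, keeping each pair intact
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        reps.append(pearson_r(resample))
    reps.sort()
    mean = sum(reps) / B
    se = sqrt(sum((r - mean) ** 2 for r in reps) / (B - 1))
    ci = (reps[int(0.025 * B)], reps[int(0.975 * B) - 1])
    return se, ci


# Simulated data set of 30 pairs (illustration only)
rng = random.Random(0)
pairs = []
for _ in range(30):
    y1 = rng.gauss(0, 1)
    pairs.append((y1, 0.6 * y1 + 0.8 * rng.gauss(0, 1)))
r_star = pearson_r(pairs)
se, ci = bootstrap_r(pairs)
```

Pairs are resampled as units so that the dependence between the two variables is preserved in every bootstrap sample.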

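THE PARAMETRIC APPROACH
Define a distribution (sampling model), e.g. $(y_{i1}, y_{i2})' \overset{iid}{\sim} N_2(\mu, \Sigma)$
Estimate its parameters and calculate $r = \frac{\hat{\sigma}_{12}}{\hat{\sigma}_1 \hat{\sigma}_2}$ (call it $r^*$)
Draw a sample of $n$ pairs from $N_2(\hat{\mu}, \hat{\Sigma})$
Compute the value of $r$ (call it $r_1$) and repeat the process a large number ($B$) of times
From the Bootstrap estimates $[r_1, r_2, \ldots, r_B]$ compute standard error, percentiles, confidence interval, etc.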
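A minimal sketch of the parametric bootstrap, assuming a bivariate normal sampling model: fit the model, simulate new data sets from the fitted distribution, and recompute the correlation each time. Function names, B = 1000, the seeds and the simulated data are illustrative assumptions.

```python
import random
from math import sqrt


def fit_bvn(pairs):
    """ML estimates (divisor n) of the means, variances and covariance."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    v1 = sum((a - m1) ** 2 for a, _ in pairs) / n
    v2 = sum((b - m2) ** 2 for _, b in pairs) / n
    c12 = sum((a - m1) * (b - m2) for a, b in pairs) / n
    return m1, m2, v1, v2, c12


def corr(v1, v2, c12):
    return c12 / sqrt(v1 * v2)


def parametric_bootstrap_r(pairs, B=1000, seed=2):
    """Bootstrap replicates of r, simulating from the fitted bivariate normal."""
    rng = random.Random(seed)
    n = len(pairs)
    m1, m2, v1, v2, c12 = fit_bvn(pairs)
    # Cholesky factor of the fitted covariance matrix
    a11 = sqrt(v1)
    a21 = c12 / a11
    a22 = sqrt(v2 - a21**2)
    reps = []
    for _ in range(B):
        sim = []
        for _ in range(n):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            sim.append((m1 + a11 * z1, m2 + a21 * z1 + a22 * z2))
        reps.append(corr(*fit_bvn(sim)[2:]))
    return reps


# Simulated data set of 30 pairs (illustration only)
rng = random.Random(0)
pairs = []
for _ in range(30):
    y1 = rng.gauss(0, 1)
    pairs.append((y1, 0.6 * y1 + 0.8 * rng.gauss(0, 1)))
r_star = corr(*fit_bvn(pairs)[2:])
reps = parametric_bootstrap_r(pairs)
mean_r = sum(reps) / len(reps)
se = sqrt(sum((r - mean_r) ** 2 for r in reps) / (len(reps) - 1))
```

Unlike the non-parametric version, the quality of these replicates depends on how well the assumed normal model describes the data.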

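THE RANDOMIZATION TEST
The basic idea is attractively simple and free of mathematical assumptions
Suppose: an experiment with two treatments
Trt 1: $n_1$ observations from distribution $F$, summarized as $\bar{y}_1 \pm s_1$
Trt 2: $n_2$ observations from distribution $G$, summarized as $\bar{y}_2 \pm s_2$
$H_0: F = G$ vs. $H_1: F \ne G$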

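THE RANDOMIZATION TEST
Combine the $n_1 + n_2$ observations
Take a sample of size $n_1$ without replacement to represent the Group C
The remaining $n_2$ observations constitute the Group T
Compute the value of $t$ (call it $t_1$) and repeat the process a large number ($B$) of times
P-value: $\sum_{b=1}^{B} I(t_b \ge t^*) / B$, where $t^*$ is the value of $t$ for the original data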
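The randomization procedure can be sketched as below. The test statistic is taken to be the difference in group means, which is an assumption since the slide does not specify t; the data, B = 5000 and the seed are also made up. The P-value is one-sided, counting permutations with a statistic at least as large as the observed one.

```python
import random


def mean_diff(group_c, group_t):
    """Test statistic: mean of Group T minus mean of Group C."""
    return sum(group_t) / len(group_t) - sum(group_c) / len(group_c)


def randomization_test(group_c, group_t, B=5000, seed=3):
    """Observed statistic t* and one-sided P-value sum(I(t_b >= t*)) / B."""
    rng = random.Random(seed)
    t_star = mean_diff(group_c, group_t)
    pooled = list(group_c) + list(group_t)
    n_c = len(group_c)
    count = 0
    for _ in range(B):
        # Shuffling and splitting is equivalent to sampling n_c labels without replacement
        rng.shuffle(pooled)
        t_b = mean_diff(pooled[:n_c], pooled[n_c:])
        if t_b >= t_star - 1e-12:      # small tolerance for floating-point ties
            count += 1
    return t_star, count / B


# Made-up measurements for the two groups
group_c = [10.1, 9.8, 10.4, 9.9, 10.0, 10.3]
group_t = [11.2, 10.9, 11.5, 10.8, 11.1, 11.4]
t_star, p_value = randomization_test(group_c, group_t)
```

For a two-sided test, one option is to compare absolute values of the statistic instead.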

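THE RANDOMIZATION TEST
[Table: the original experiment next to Permutation 1, Permutation 2, ..., Permutation B. Each column lists the observations assigned to the two treatment groups, the group summaries (mean ± s) and the resulting test statistic: $t^*$ for the experiment and $t_1, t_2, \ldots, t_B$ for the permutations.]
The permutation values are ordered, $t_{(1)} < t_{(2)} < \cdots < t_{(B)}$, and compared with $t^*$
P-value: $\sum_{b=1}^{B} I(t_b \ge t^*) / B$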