BASIC PRINCIPLES OF STATISTICS


PROBABILITY DENSITY DISTRIBUTIONS: DISCRETE VARIABLES

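BINOMIAL DISTRIBUTION
$y \mid n, \theta \sim B(n, \theta)$, with $0 \le \theta \le 1$, where $y = 0, 1, \dots, n$ is the number of successes in $n$ trials.
$\Pr(y \mid n, \theta) = \binom{n}{y} \theta^{y} (1 - \theta)^{n - y}$
$E[y \mid n, \theta] = n\theta$; $\mathrm{Var}[y \mid n, \theta] = n\theta(1 - \theta)$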
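The binomial moments can be checked numerically by summing the probability function directly. The sketch below is only an illustration: the function name `binomial_pmf` and the values `n = 7`, `theta = 0.3` are assumptions chosen for the example, not values taken from the slides.

```python
from math import comb

def binomial_pmf(y, n, theta):
    """Pr(y | n, theta): probability of y successes in n independent trials."""
    return comb(n, y) * theta**y * (1 - theta) ** (n - y)

n, theta = 7, 0.3  # illustrative values only
probs = [binomial_pmf(y, n, theta) for y in range(n + 1)]

total = sum(probs)                                   # should be 1
mean = sum(y * p for y, p in enumerate(probs))       # should match n * theta
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))  # n * theta * (1 - theta)

print(total, mean, var)
```

The same enumeration works for any discrete distribution with finite support; for the Poisson the sum would have to be truncated.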

[Figure: three binomial probability functions; the surviving panel labels read B(7, 0.…), B(30, 0.3) and B(50, 0.5).]

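MULTINOMIAL DISTRIBUTION
$y \mid n, \theta \sim \mathrm{Mult}_k(n; \theta_1, \dots, \theta_k)$, with $\theta_j \ge 0$; $\sum_{j=1}^{k} \theta_j = 1$; $y_j \ge 0$; $\sum_{j=1}^{k} y_j = n$
$\Pr(y \mid n, \theta) = \frac{n!}{y_1! \, y_2! \cdots y_k!} \, \theta_1^{y_1} \theta_2^{y_2} \cdots \theta_k^{y_k}$
$E[y_j] = n\theta_j$; $\mathrm{Var}[y_j] = n\theta_j(1 - \theta_j)$; $\mathrm{Cov}[y_i, y_j] = -n\theta_i\theta_j$ for $i \ne j$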

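POISSON DISTRIBUTION
$y \mid \lambda \sim \mathrm{Poisson}(\lambda)$, with $\lambda > 0$ and $y = 0, 1, 2, \dots$
$\Pr(y \mid \lambda) = \frac{\lambda^{y} e^{-\lambda}}{y!}$
$E[y \mid \lambda] = \lambda$; $\mathrm{Var}[y \mid \lambda] = \lambda$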

[Figure: three Poisson probability functions; the panel labels are only partly legible.]

PROBABILITY DENSITY DISTRIBUTIONS: CONTINUOUS VARIABLES

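UNIFORM DISTRIBUTION
$y \mid \alpha, \beta \sim \mathrm{Uniform}(\alpha, \beta)$, with $\alpha < \beta$ and $\alpha < y < \beta$
$p(y \mid \alpha, \beta) = \frac{1}{\beta - \alpha}$
$E[y \mid \alpha, \beta] = \frac{\alpha + \beta}{2}$; $\mathrm{Var}[y \mid \alpha, \beta] = \frac{(\beta - \alpha)^2}{12}$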

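BETA DISTRIBUTION
$y \mid \alpha, \beta \sim \mathrm{Beta}(\alpha, \beta)$, with $\alpha > 0$; $\beta > 0$ and $0 \le y \le 1$
$p(y \mid \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \, y^{\alpha - 1} (1 - y)^{\beta - 1}$
$E[y \mid \alpha, \beta] = \frac{\alpha}{\alpha + \beta}$; $\mathrm{Var}[y \mid \alpha, \beta] = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$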

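EXPONENTIAL DISTRIBUTION
$y \mid \lambda \sim \mathrm{Exp}(\lambda)$, with $\lambda > 0$ and $y \ge 0$
$p(y \mid \lambda) = \lambda e^{-\lambda y}$
$E[y \mid \lambda] = \frac{1}{\lambda}$; $\mathrm{Var}[y \mid \lambda] = \frac{1}{\lambda^2}$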

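GAMMA DISTRIBUTION
$y \mid \alpha, \beta \sim \mathrm{Gamma}(\alpha, \beta)$, with $\alpha > 0$ and $\beta > 0$, and $y > 0$
$p(y \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \, y^{\alpha - 1} \exp\{-\beta y\}$
$E[y \mid \alpha, \beta] = \frac{\alpha}{\beta}$; $\mathrm{Var}[y \mid \alpha, \beta] = \frac{\alpha}{\beta^2}$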

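CHI-SQUARE DISTRIBUTION
$y \mid \phi \sim \chi^2_{\phi}$, with $\phi > 0$ and $y > 0$ [same as $\mathrm{Gamma}(\phi/2, 1/2)$]
$p(y \mid \phi) = \frac{2^{-\phi/2}}{\Gamma(\phi/2)} \, y^{\phi/2 - 1} \exp\{-y/2\}$
$E[y \mid \phi] = \phi$; $\mathrm{Var}[y \mid \phi] = 2\phi$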

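NORMAL (GAUSSIAN) DISTRIBUTION
$y \mid \mu, \sigma^2 \sim N(\mu, \sigma^2)$, with $-\infty < \mu < \infty$; $\sigma^2 > 0$ and $-\infty < y < \infty$
$p(y \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(y - \mu)^2}{2\sigma^2}\right\}$
$E[y \mid \mu, \sigma^2] = \mu$; $\mathrm{Var}[y \mid \mu, \sigma^2] = \sigma^2$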

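NORMAL (GAUSSIAN) DISTRIBUTION
Mean and variance define the distribution.
[Figure: three normal densities A, B and C, with $\mu_A = \mu_B < \mu_C$ and $\sigma^2_A = \sigma^2_C > \sigma^2_B$.]
But the proportions, i.e. the bell shape, are always the same: about 68.3%, 95.4% and 99.7% of the distribution lie within one, two and three standard deviations of the mean.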
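The fixed proportions under the bell shape follow from the standard normal distribution function and do not depend on the mean or the variance. A minimal sketch using only the error function from the standard library (the helper name `prob_within` is an assumption for this example):

```python
from math import erf, sqrt

def prob_within(k):
    """Pr(|y - mu| < k * sigma) for any normal distribution N(mu, sigma^2)."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    # Percentage of the distribution within k standard deviations of the mean
    print(k, round(100 * prob_within(k), 1))
```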

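NORMAL (GAUSSIAN) DISTRIBUTION
$\bar{x} \sim$ Normal, approximately (Central Limit Theorem)
$z \sim N(0, 1) \Rightarrow \mu + \sigma z \sim N(\mu, \sigma^2)$
$w > 0$ and $\log w \sim$ Normal $\Rightarrow$ $w$ is a lognormal variable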

[Figure: relationships among common distributions. Solid lines: transformations and special cases; dashed lines: limits (Leemis, 1986).]

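MULTIVARIATE NORMAL DISTRIBUTION
$y \mid \mu, \Sigma \sim N_P(\mu, \Sigma)$, with $-\infty < \mu_j < \infty$; $\Sigma$ positive definite; $-\infty < y_j < \infty$
$p(y \mid \mu, \Sigma) = (2\pi)^{-P/2} |\Sigma|^{-1/2} \exp\left\{-\tfrac{1}{2} (y - \mu)' \Sigma^{-1} (y - \mu)\right\}$
$E[y \mid \mu, \Sigma] = \mu$; $\mathrm{Var}[y \mid \mu, \Sigma] = \Sigma$
$z \sim N(0, I) \Rightarrow \mu + Az \sim N(\mu, \Sigma)$, where $\Sigma = AA'$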
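The transformation of independent standard normal values into correlated ones can be sketched for the bivariate case, taking A as the lower-triangular (Cholesky) factor of Σ. The helper names, the seed and the illustrative Σ are assumptions made for this example; any A with AA' = Σ would serve.

```python
import random
from math import sqrt

def cholesky_2x2(s11, s12, s22):
    """Lower-triangular A with A A' = [[s11, s12], [s12, s22]]."""
    a11 = sqrt(s11)
    a21 = s12 / a11
    a22 = sqrt(s22 - a21 ** 2)
    return a11, a21, a22

def draw_pair(mu1, mu2, chol, rng):
    """One draw of mu + A z, with z a pair of independent N(0, 1) values."""
    a11, a21, a22 = chol
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mu1 + a11 * z1, mu2 + a21 * z1 + a22 * z2

rng = random.Random(1)                 # fixed seed so the sketch is repeatable
chol = cholesky_2x2(2.0, 0.8, 1.0)     # illustrative Sigma = [[2.0, 0.8], [0.8, 1.0]]
draws = [draw_pair(0.0, 0.0, chol, rng) for _ in range(20000)]

n = len(draws)
m1 = sum(d[0] for d in draws) / n
m2 = sum(d[1] for d in draws) / n
cov12 = sum((d[0] - m1) * (d[1] - m2) for d in draws) / n
print(cov12)  # sample covariance, to be compared with the 0.8 used in Sigma
```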

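MULTIVARIATE NORMAL: MARGINAL DISTRIBUTIONS
$y = (y_1', y_2')'$, $\mu = (\mu_1', \mu_2')'$ and $\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$
$y_1$ and $y_2$: $p_1$- and $p_2$-dimensional vectors; $p_1 + p_2 = P$
$p(y_1 \mid \mu_1, \Sigma_{11}) = (2\pi)^{-p_1/2} |\Sigma_{11}|^{-1/2} \exp\left\{-\tfrac{1}{2} (y_1 - \mu_1)' \Sigma_{11}^{-1} (y_1 - \mu_1)\right\}$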

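MULTIVARIATE NORMAL: CONDITIONAL DISTRIBUTIONS
$y = (y_1', y_2')'$, $\mu = (\mu_1', \mu_2')'$ and $\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$
$y_1$ and $y_2$: $p_1$- and $p_2$-dimensional vectors; $p_1 + p_2 = P$
$p(y_1 \mid y_2) = (2\pi)^{-p_1/2} \, |\mathrm{Var}[y_1 \mid y_2]|^{-1/2} \exp\left\{-\tfrac{1}{2} (y_1 - E[y_1 \mid y_2])' \, \mathrm{Var}[y_1 \mid y_2]^{-1} (y_1 - E[y_1 \mid y_2])\right\}$
$E[y_1 \mid y_2] = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (y_2 - \mu_2)$; $\mathrm{Var}[y_1 \mid y_2] = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$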
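In the bivariate case the conditional mean and variance reduce to scalar arithmetic, which makes the formulas easy to verify by hand. The function name and the numerical values below are illustrative assumptions only.

```python
def conditional_normal(mu1, mu2, s11, s12, s22, y2):
    """Mean and variance of y1 given y2 for a bivariate normal.

    Scalar version of E[y1 | y2] = mu1 + S12 S22^{-1} (y2 - mu2)
    and Var[y1 | y2] = S11 - S12 S22^{-1} S21.
    """
    cond_mean = mu1 + s12 / s22 * (y2 - mu2)
    cond_var = s11 - s12 ** 2 / s22
    return cond_mean, cond_var

# Illustrative values: mu = (1, 2), Sigma = [[4, 1.2], [1.2, 1]], observed y2 = 3
cond_mean, cond_var = conditional_normal(1.0, 2.0, 4.0, 1.2, 1.0, y2=3.0)
print(cond_mean, cond_var)
```

Note that the conditional variance does not depend on the observed value of y2, and it is never larger than the marginal variance of y1.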

POINT ESTIMATION
METHODS OF FINDING ESTIMATORS
Method of Moments
Least Squares
Maximum Likelihood
Bayesian Estimators

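METHOD OF MOMENTS
$y_1, \dots, y_n$ iid $\sim p(y \mid \theta_1, \dots, \theta_k)$
Equate the first $k$ sample moments to the corresponding $k$ population moments and solve the system of simultaneous equations.
Sample moments: $m_1 = \frac{1}{n} \sum_{i=1}^{n} y_i$; $m_2 = \frac{1}{n} \sum_{i=1}^{n} y_i^2$; $\dots$; $m_k = \frac{1}{n} \sum_{i=1}^{n} y_i^k$
Population moments: $\mu_1 = E[y]$; $\mu_2 = E[y^2]$; $\dots$; $\mu_k = E[y^k]$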

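EXAMPLE
$y_1, \dots, y_n$ iid $\sim N(\mu, \sigma^2)$, so $k = 2$
Sample moments: $m_1 = \frac{1}{n} \sum_i y_i = \bar{y}$; $m_2 = \frac{1}{n} \sum_i y_i^2$
Population moments: $\mu_1 = E[y] = \mu$; $\mu_2 = E[y^2] = \mu^2 + \sigma^2$
Solving: $\hat{\mu} = \bar{y}$ and $\hat{\mu}^2 + \hat{\sigma}^2 = m_2$, so $\hat{\sigma}^2 = \frac{1}{n} \sum_i y_i^2 - \bar{y}^2 = \frac{1}{n} \sum_i (y_i - \bar{y})^2$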
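The normal example can be carried out in a few lines: compute the first two sample moments and solve the two moment equations. The function name and the data values are assumptions made for illustration.

```python
def method_of_moments_normal(y):
    """Method-of-moments estimates of (mu, sigma^2) for an iid normal sample."""
    n = len(y)
    m1 = sum(y) / n                    # first sample moment
    m2 = sum(v * v for v in y) / n     # second sample moment
    mu_hat = m1                        # from mu_1 = mu
    sigma2_hat = m2 - m1 ** 2          # from mu_2 = mu^2 + sigma^2
    return mu_hat, sigma2_hat

y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative data
mu_hat, sigma2_hat = method_of_moments_normal(y)
print(mu_hat, sigma2_hat)
```

The variance estimate has divisor n rather than n - 1, so it differs from the usual unbiased sample variance.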

LEAST SQUARES: A MATHEMATICAL SOLUTION
Model: $y_i = \alpha + \beta x_i + \varepsilon_i$
Fitted line: $\hat{y} = a + bx$
Residual Sum of Squares: $RSS(a, b) = \sum_i [y_i - (a + b x_i)]^2$

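LEAST SQUARES
$RSS(a, b) = \sum_i [y_i - (a + b x_i)]^2$
$\frac{\partial RSS}{\partial a} = -2 \sum_i [y_i - (a + b x_i)] = 0$
$\frac{\partial RSS}{\partial b} = -2 \sum_i x_i [y_i - (a + b x_i)] = 0$
$a = \bar{y} - b\bar{x}$; $b = \frac{S_{xy}}{S_{xx}}$
$S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$; $S_{xx} = \sum_i (x_i - \bar{x})^2$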
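The closed-form solution can be sketched directly from the two sums of squares and products. The function name and the data are illustrative assumptions; the final lines check that the fitted line satisfies both normal equations.

```python
def least_squares_line(x, y):
    """Intercept a and slope b minimizing RSS = sum (y_i - (a + b x_i))^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

x = [1.0, 2.0, 3.0, 4.0, 5.0]        # illustrative data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares_line(x, y)

# Both normal equations should hold (up to rounding) at the solution
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sum_res = sum(residuals)
sum_x_res = sum(xi * r for xi, r in zip(x, residuals))
print(a, b, sum_res, sum_x_res)
```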

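MAXIMUM LIKELIHOOD
$y_1, \dots, y_n$ iid $\sim p(y \mid \theta_1, \dots, \theta_k)$
Likelihood Function: $L(\theta \mid y) = \prod_{i=1}^{n} p(y_i \mid \theta_1, \dots, \theta_k)$
Log-Likelihood Function: $l(\theta \mid y) = \log L(\theta \mid y) = \sum_{i=1}^{n} \log p(y_i \mid \theta_1, \dots, \theta_k)$
MLE: $\hat{\theta} \in \Theta$ such that $L(\hat{\theta}) = \max_{\theta \in \Theta} L(\theta)$, where $\Theta$ is the parameter space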

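MAXIMUM LIKELIHOOD
Finding the maximum of $L$:
$\frac{\partial L}{\partial \theta} = 0$: the solutions are possible candidates
$\frac{\partial^2 L}{\partial \theta^2} \Big|_{\theta = \hat{\theta}} < 0$: maximum
Check also the boundaries of the parameter space!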

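Example 1: $y_1, \dots, y_n$ iid $\sim B(1, \theta)$
$L(\theta \mid y) = \prod_i \Pr(y_i \mid \theta) = \theta^{\sum_i y_i} (1 - \theta)^{n - \sum_i y_i}$
$l(\theta \mid y) = \sum_i y_i \log \theta + \left(n - \sum_i y_i\right) \log(1 - \theta)$
$\frac{dl}{d\theta} = \frac{\sum_i y_i}{\theta} - \frac{n - \sum_i y_i}{1 - \theta} = 0 \Rightarrow \hat{\theta} = \frac{1}{n} \sum_i y_i$
Check: $\frac{d^2 l}{d\theta^2} \Big|_{\theta = \hat{\theta}} = -\frac{\sum_i y_i}{\hat{\theta}^2} - \frac{n - \sum_i y_i}{(1 - \hat{\theta})^2} < 0$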

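Example 2: $y_1, \dots, y_n$ iid $\sim N(\mu, \sigma^2)$
$L(\mu, \sigma^2 \mid y) = (2\pi\sigma^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma^2} \sum_i (y_i - \mu)^2\right\}$
$l(\mu, \sigma^2 \mid y) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_i (y_i - \mu)^2$
$\frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2} \sum_i (y_i - \mu)$; $\frac{\partial l}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_i (y_i - \mu)^2$
$\hat{\mu} = \bar{y}$; $\hat{\sigma}^2 = \frac{1}{n} \sum_i (y_i - \hat{\mu})^2$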
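For the normal sample, the closed-form estimates can be compared against the log-likelihood evaluated at nearby parameter values, which is a simple way to confirm that a stationary point is a maximum. The function names, the data and the perturbation size are assumptions for this sketch.

```python
from math import log, pi

def normal_log_lik(y, mu, sigma2):
    """Log-likelihood of an iid N(mu, sigma2) sample."""
    n = len(y)
    ss = sum((v - mu) ** 2 for v in y)
    return -0.5 * n * log(2 * pi * sigma2) - ss / (2 * sigma2)

def normal_mle(y):
    """Closed-form maximum likelihood estimates of (mu, sigma2)."""
    n = len(y)
    mu_hat = sum(y) / n
    sigma2_hat = sum((v - mu_hat) ** 2 for v in y) / n
    return mu_hat, sigma2_hat

y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative data
mu_hat, sigma2_hat = normal_mle(y)
l_max = normal_log_lik(y, mu_hat, sigma2_hat)

# The log-likelihood should be lower at perturbed parameter values
step = 0.1
neighbours = [
    normal_log_lik(y, mu_hat + dm, sigma2_hat + ds)
    for dm in (-step, 0.0, step)
    for ds in (-step, 0.0, step)
    if (dm, ds) != (0.0, 0.0)
]
print(mu_hat, sigma2_hat, l_max, max(neighbours))
```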

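Example 3: $(y_{1i}, y_{2i})$ iid $\sim N_2(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$, $i = 1, \dots, n$
$p(y_1, y_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)} \left[\left(\frac{y_1 - \mu_1}{\sigma_1}\right)^2 - 2\rho \frac{(y_1 - \mu_1)(y_2 - \mu_2)}{\sigma_1\sigma_2} + \left(\frac{y_2 - \mu_2}{\sigma_2}\right)^2\right]\right\}$
$\hat{\mu}_j = \frac{1}{n} \sum_i y_{ji}$; $\hat{\sigma}_j^2 = \frac{1}{n} \sum_i (y_{ji} - \hat{\mu}_j)^2$, for $j = 1, 2$
$\hat{\rho} = \frac{\sum_i (y_{1i} - \hat{\mu}_1)(y_{2i} - \hat{\mu}_2)}{n \hat{\sigma}_1 \hat{\sigma}_2}$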

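Example 4: $(y_{1i}, y_{2i})$ iid $\sim N_2(0, 0, 1, 1, \rho)$ (Standard Bivariate Normal Distribution)
$p(y_1, y_2) = \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left\{-\frac{y_1^2 - 2\rho y_1 y_2 + y_2^2}{2(1 - \rho^2)}\right\}$
MLE of $\rho$?
$\hat{\rho} = \frac{\sum_i (y_{1i} - \hat{\mu}_1)(y_{2i} - \hat{\mu}_2)}{n \hat{\sigma}_1 \hat{\sigma}_2}$, as in Example 3?
$\tilde{\rho}$: the same expression with $\mu_j = 0$ and $\sigma_j = 1$?
None of these (Rosa and Gianola).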

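BAYESIAN ESTIMATORS
$y$: observed data
$\theta$: parameters (all unobserved quantities)
$p(\theta \mid y) \propto p(\theta) \, p(y \mid \theta)$
$p(\theta \mid y)$: posterior distribution; $p(\theta)$: prior distribution; $p(y \mid \theta)$: sampling distribution
More on Bayesian Inference later.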

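CONFIDENCE INTERVAL
$\hat{\theta}$: estimator of $\theta$
$\Pr[LL < \theta < UL] = 1 - \alpha$, where $LL$ is the lower limit, $UL$ is the upper limit, and $1 - \alpha$ is the confidence (credibility) level.
If $\Pr[\theta < LL] = \Pr[\theta > UL] = \alpha/2$: symmetrical interval.
If $\Pr[\theta < LL] \ne \Pr[\theta > UL]$: non-symmetrical interval.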

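CONFIDENCE INTERVAL
Normal Approximations for Obtaining Confidence Intervals (Central Limit Theorem)
$\frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} \sim N(0, 1)$, approximately
$\Pr\left[-z_{\alpha/2} < \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} < z_{\alpha/2}\right] = 1 - \alpha$
$\Pr\left[\hat{\theta} - z_{\alpha/2}\sigma_{\hat{\theta}} < \theta < \hat{\theta} + z_{\alpha/2}\sigma_{\hat{\theta}}\right] = 1 - \alpha$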

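APPROXIMATE CONFIDENCE INTERVAL
$CI[\theta; 1 - \alpha]$: $\hat{\theta} \pm z_{\alpha/2} \, \sigma_{\hat{\theta}}$
Example 1: $y_1, \dots, y_n$ iid $\sim N(\mu, \sigma^2)$
$CI[\mu; 95\%]$: $\hat{\mu} \pm 1.96 \, \frac{\sigma}{\sqrt{n}}$
If $\sigma$ is unknown, use an estimate instead. The Student t distribution is more appropriate though.
$CI[\mu; 95\%]$: $\hat{\mu} \pm t_{n-1; \alpha/2} \, \frac{s}{\sqrt{n}}$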
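A minimal sketch of the normal-approximation interval for a mean, with the unknown standard deviation replaced by the sample estimate. It uses the standard normal quantile from the Python standard library, which has no Student t quantile, so the interval is slightly too narrow for small samples. The function name and the data are assumptions for this example.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def normal_ci(y, level=0.95):
    """Approximate CI for the mean: y_bar +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    centre = mean(y)
    half_width = z * stdev(y) / sqrt(len(y))        # stdev uses divisor n - 1
    return centre - half_width, centre + half_width

y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative data
lower, upper = normal_ci(y)
print(lower, upper)
```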

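APPROXIMATE CONFIDENCE INTERVAL
Example 2: $y_1, \dots, y_n$ iid $\sim B(1, \theta)$ (Bernoulli)
Approximate: $CI[\theta; 90\%]$: $\hat{\theta} \pm 1.65 \sqrt{\frac{\hat{\theta}(1 - \hat{\theta})}{n}}$
More conservative ($\theta = 0.5$): $CI[\theta; 90\%]$: $\hat{\theta} \pm 1.65 \, \frac{0.5}{\sqrt{n}}$
Exact: $CI[\theta; 1 - \alpha]$ with
$LL = \frac{n\hat{\theta}}{n\hat{\theta} + (n - n\hat{\theta} + 1) \, F[2(n - n\hat{\theta} + 1); 2n\hat{\theta}; \alpha/2]}$
$UL = \frac{n\hat{\theta} + 1}{n\hat{\theta} + 1 + (n - n\hat{\theta}) / F[2(n\hat{\theta} + 1); 2(n - n\hat{\theta}); \alpha/2]}$
where $F[\nu_1; \nu_2; \alpha/2]$ is the upper $\alpha/2$ critical value of the F distribution with $\nu_1$ and $\nu_2$ degrees of freedom.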
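The approximate and the more conservative intervals for a proportion differ only in the value plugged into the standard error. The sketch below covers those two cases; the exact interval is left out because the standard library has no F quantile. The function name, the counts and the `conservative` flag are assumptions for this example, and the exact normal quantile (about 1.645) is used in place of the rounded 1.65.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, level=0.90, conservative=False):
    """Normal-approximation CI for a Bernoulli success probability."""
    theta_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # The conservative version uses 0.5, which maximizes theta * (1 - theta)
    plug_in = 0.5 if conservative else theta_hat
    half_width = z * sqrt(plug_in * (1 - plug_in) / n)
    return theta_hat - half_width, theta_hat + half_width

approx = proportion_ci(30, 100)                       # illustrative counts
wide = proportion_ci(30, 100, conservative=True)
print(approx, wide)
```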

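HYPOTHESIS TESTING
Likelihood Ratio Test (LRT)
$y_1, \dots, y_n$ iid $\sim p(y \mid \theta)$, with likelihood $L(\theta \mid y)$
Suppose: $H_0: \theta \in \Theta_0$ vs. $H_1: \theta \in \Theta_0^c$
$LRT = \frac{\max_{\theta \in \Theta_0} L(\theta \mid y)}{\max_{\theta \in \Theta} L(\theta \mid y)}$
Numerator: restricted maximization. Denominator: unrestricted maximization.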

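HYPOTHESIS TESTING
Let: $H_0: \theta = \theta_0$ vs. $H_1: \theta \ne \theta_0$
So $\Theta_0$ represents a unique value $\theta_0$:
$LRT = \frac{L(\theta_0)}{L(\hat{\theta})}$
Critical Region: $LRT < c$
How to choose the cutoff value $c$?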

HYPOTHESIS TESTING

                 Accept H0               Reject H0
H0 is true       1 - α                   α: Type I Error (Significance Level)
H0 is false      β: Type II Error        1 - β: Power

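HYPOTHESIS TESTING
Log-Likelihood Ratio Test
$-2 \log LRT = -2 \left[\log L(\theta_0) - \log L(\hat{\theta})\right]$
$-2 \left[\log L(\theta_0) - \log L(\hat{\theta})\right] \sim \chi^2_{\phi}$, asymptotically
$\phi$: degrees of freedom; difference in dimension of the spaces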
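As a worked instance of the log-likelihood ratio test, consider a Bernoulli sample with a simple null hypothesis, so that the reference distribution is chi-square with one degree of freedom. For one degree of freedom the upper tail probability equals `erfc(sqrt(x / 2))`, which avoids any dependency outside the standard library. The function names and the counts are assumptions for this sketch, and it assumes 0 < successes < n so that every logarithm is defined.

```python
from math import erfc, log, sqrt

def bernoulli_log_lik(theta, successes, n):
    """Log-likelihood of theta given the number of successes in n trials."""
    return successes * log(theta) + (n - successes) * log(1 - theta)

def lrt_bernoulli(successes, n, theta0):
    """-2 log LRT for H0: theta = theta0, with its asymptotic p-value."""
    theta_hat = successes / n  # unrestricted MLE
    stat = -2 * (bernoulli_log_lik(theta0, successes, n)
                 - bernoulli_log_lik(theta_hat, successes, n))
    p_value = erfc(sqrt(stat / 2))  # Pr(chi-square with 1 df > stat)
    return stat, p_value

stat, p_value = lrt_bernoulli(successes=14, n=20, theta0=0.5)  # illustrative counts
print(stat, p_value)
```

Choosing the cutoff from the chi-square distribution in this way relies on the large-sample approximation; with n = 20 it is only a rough guide.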

MONTE CARLO METHODS AND RESAMPLING TECHNIQUES
Bootstrap: estimation of precision of sample statistics by sampling with replacement from the original sample
Jackknife: estimation of precision of sample statistics by using a leave-one-out approach
Permutation (Randomization) Test: significance tests performed by exchanging labels on data points
Cross-validation: k-fold and leave-one-out techniques; partitioning of the sample into training and validation (or testing) sets
Markov Chain Monte Carlo (MCMC): e.g. Gibbs Sampling

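THE BOOTSTRAP
Extremely useful for computing standard errors and confidence intervals.
Data set: $n$ pairs $(y_{1j}, y_{2j})$
Example: two variables $Y_1$ and $Y_2$, each summarized as mean ± s. Interest is in the correlation between $Y_1$ and $Y_2$.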

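THE NON-PARAMETRIC APPROACH
Define the statistic, e.g. $r = \frac{\sum_j (y_{1j} - \bar{y}_1)(y_{2j} - \bar{y}_2)}{\sqrt{\sum_j (y_{1j} - \bar{y}_1)^2 \sum_j (y_{2j} - \bar{y}_2)^2}}$, and calculate its value for the data set (call it $r^*$).
Draw a sample of $n$ pairs with replacement.
Compute the value of $r$ (call it $r_1$) and repeat the process a large number ($B$) of times.
From the Bootstrap estimates $[r_1, r_2, \dots, r_B]$ compute the standard error, percentiles, confidence interval, etc.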
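The non-parametric steps can be sketched as follows. The data pairs, the seed, `B = 2000` and the percentile positions are illustrative assumptions; resamples with no variation in one of the variables (for which r is undefined) are simply redrawn, which is a choice made for this sketch.

```python
import random
from math import sqrt
from statistics import stdev

def pearson_r(pairs):
    """Sample correlation coefficient of a list of (y1, y2) pairs."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    s12 = sum((a - m1) * (b - m2) for a, b in pairs)
    s11 = sum((a - m1) ** 2 for a, _ in pairs)
    s22 = sum((b - m2) ** 2 for _, b in pairs)
    return s12 / sqrt(s11 * s22)

def bootstrap_r(pairs, B, rng):
    """B bootstrap replicates of r, resampling whole pairs with replacement."""
    reps = []
    while len(reps) < B:
        sample = rng.choices(pairs, k=len(pairs))
        try:
            reps.append(pearson_r(sample))
        except ZeroDivisionError:
            continue  # degenerate resample without variation; draw again
    return reps

pairs = [(1.2, 2.3), (2.1, 2.9), (2.8, 4.1), (3.5, 3.8), (4.1, 5.2),
         (4.9, 5.0), (5.6, 6.4), (6.3, 6.1), (7.0, 7.7), (8.2, 8.0)]
B = 2000
rng = random.Random(42)          # fixed seed so the sketch is repeatable

r_star = pearson_r(pairs)        # value for the original data set
reps = bootstrap_r(pairs, B, rng)
se = stdev(reps)                 # bootstrap standard error of r
ordered = sorted(reps)
lo, hi = ordered[int(0.025 * B)], ordered[int(0.975 * B) - 1]  # percentile interval
print(r_star, se, lo, hi)
```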

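THE PARAMETRIC APPROACH
Define a distribution (sampling model), e.g. $(y_{1j}, y_{2j})$ iid $\sim N_2(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$.
Estimate its parameters and calculate $\hat{\rho}$ (call it $r^*$).
Draw a sample of $n$ pairs from $N_2(\hat{\mu}_1, \hat{\mu}_2, \hat{\sigma}_1^2, \hat{\sigma}_2^2, \hat{\rho})$.
Compute the value of $r$ (call it $r_1$) and repeat the process a large number ($B$) of times.
From the Bootstrap estimates $[r_1, r_2, \dots, r_B]$ compute the standard error, percentiles, confidence interval, etc.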
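A sketch of the parametric version, assuming a bivariate normal sampling model fitted by maximum likelihood. New pairs are generated from the fitted model rather than resampled from the data. The data, the seed and `B = 2000` are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import stdev

def fit_bivariate_normal(pairs):
    """ML estimates (mu1, mu2, var1, var2, rho) of a bivariate normal."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    v1 = sum((a - m1) ** 2 for a, _ in pairs) / n
    v2 = sum((b - m2) ** 2 for _, b in pairs) / n
    c12 = sum((a - m1) * (b - m2) for a, b in pairs) / n
    return m1, m2, v1, v2, c12 / sqrt(v1 * v2)

def draw_pairs(params, n, rng):
    """n pairs simulated from the bivariate normal with the given parameters."""
    m1, m2, v1, v2, rho = params
    tail = sqrt(max(0.0, 1 - rho ** 2))
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        out.append((m1 + sqrt(v1) * z1,
                    m2 + sqrt(v2) * (rho * z1 + tail * z2)))
    return out

pairs = [(1.2, 2.3), (2.1, 2.9), (2.8, 4.1), (3.5, 3.8), (4.1, 5.2),
         (4.9, 5.0), (5.6, 6.4), (6.3, 6.1), (7.0, 7.7), (8.2, 8.0)]
B = 2000
rng = random.Random(7)                 # fixed seed so the sketch is repeatable

params = fit_bivariate_normal(pairs)   # params[4] is the estimate of rho (r*)
reps = [fit_bivariate_normal(draw_pairs(params, len(pairs), rng))[4]
        for _ in range(B)]
se = stdev(reps)                       # parametric bootstrap standard error
print(params[4], se)
```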

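THE RANDOMIZATION TEST
The basic idea is attractively simple and free of mathematical assumptions.
Suppose an experiment with two treatments, each summarized as mean ± s:
Trt 1: $n_1$ observations from distribution $F$
Trt 2: $n_2$ observations from distribution $G$
$H_0: F = G$ vs. $H_1: F \ne G$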

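THE RANDOMIZATION TEST
Combine the $n_1 + n_2$ observations.
Take a sample of size $n_1$ without replacement to represent Group C.
The remaining $n_2$ observations constitute Group T.
Compute the value of $t$ (call it $t_1$) and repeat the process a large number ($B$) of times.
P-value: $\sum_{i=1}^{B} I(t_i \ge t^*) / B$, where $t^*$ is the value of $t$ for the original experiment.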
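The procedure can be sketched with the absolute difference in group means standing in for the statistic t. That choice of statistic, the data, the seed and `B = 5000` are all assumptions for this example. Shuffling the pooled observations and cutting at the first group's size is equivalent to sampling that group without replacement.

```python
import random

def mean_difference(group_c, group_t):
    """Absolute difference between the two group means (the statistic t here)."""
    return abs(sum(group_t) / len(group_t) - sum(group_c) / len(group_c))

def randomization_test(group_c, group_t, B, rng):
    """Return (t_star, p_value) with p_value = sum I(t_i >= t_star) / B."""
    t_star = mean_difference(group_c, group_t)
    pooled = list(group_c) + list(group_t)
    n_c = len(group_c)
    count = 0
    for _ in range(B):
        rng.shuffle(pooled)  # random relabelling of the observations
        t_i = mean_difference(pooled[:n_c], pooled[n_c:])
        if t_i >= t_star - 1e-12:  # small tolerance for floating-point ties
            count += 1
    return t_star, count / B

group_c = [10.1, 9.8, 10.4, 9.9, 10.2, 10.0]   # illustrative data
group_t = [11.0, 11.3, 10.9, 11.4, 11.1, 11.2]
rng = random.Random(3)                          # fixed seed so the sketch is repeatable
t_star, p_value = randomization_test(group_c, group_t, B=5000, rng=rng)
print(t_star, p_value)
```

With small groups the full set of relabellings could be enumerated instead of sampled, which gives the exact permutation P-value.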

THE RANDOMIZATION TEST
[Schematic: the experiment and Permutations 1, 2, ..., B, each showing Trt 1 and Trt 2 summarized as mean ± s; the experiment gives $t^*$ and the permutations give $t_1, t_2, \dots, t_B$.]
P-value: $\sum_{i=1}^{B} I(t_i \ge t^*) / B$