Goodness of fit and Wilks' theorem


DRAFT 0.0. Glen Cowan, 3 June 2013

Suppose we model data $y$ with a likelihood $L(\mu)$ that depends on a set of $N$ parameters $\mu = (\mu_1, \ldots, \mu_N)$. Define the statistic

$$ t_\mu = -2 \ln \frac{L(\mu)}{L(\hat{\mu})} , \tag{1} $$

where $\hat{\mu}$ are the ML estimators for $\mu$. The value of $t_\mu$ is a measure of how well the hypothesized set of parameters $\mu$ stands in agreement with the data. If the agreement is poor, then $\hat{\mu}$ will be far from $\mu$, the ratio of likelihoods will be low and $t_\mu$ will be large. Larger values of $t_\mu$ thus indicate increasing incompatibility between the data and the hypothesized $\mu$.

According to Wilks' theorem, if the parameter values $\mu$ are true, then in the asymptotic limit of a large data sample, the pdf of $t_\mu$ is a chi-square distribution for $N$ degrees of freedom. We will write this as

$$ f(t_\mu \mid \mu) \sim \chi^2_N . \tag{2} $$

Suppose we have a data set that gives us an observed value of the statistic, $t_{\mu,\mathrm{obs}}$. We can quantify the level of compatibility between $\mu$ and the observed data by computing the p-value

$$ p_\mu = \int_{t_{\mu,\mathrm{obs}}}^{\infty} f_{\chi^2_N}(t_\mu \mid \mu) \, dt_\mu . \tag{3} $$

Now suppose that the set of parameters $\mu$ can be expressed as $\mu(\theta)$, where $\theta = (\theta_1, \ldots, \theta_M)$ is a set of $M$ parameters with $M < N$. Now define

$$ q_\mu = -2 \ln \frac{L(\mu(\hat{\theta}))}{L(\hat{\mu})} . \tag{4} $$

That is, in the numerator we adjust $M$ parameters and in the denominator $N$. In this case, Wilks' theorem states

$$ f(q_\mu \mid \mu(\theta)) \sim \chi^2_{N-M} . \tag{5} $$

Provided certain regularity conditions are satisfied, this holds regardless of the value of $\theta$. This is a very useful property that allows one to compute p-values without needing to assume particular values for the parameters $\theta$. In this case the p-value reflects the compatibility of the assumed functional form $\mu(\theta)$.
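The p-value integral of Eq. (3) is the upper-tail probability of a chi-square distribution, so it can be evaluated numerically once the observed statistic and the number of degrees of freedom are known. The following is a minimal sketch using only the Python standard library; the function name `chi2_sf` and the numbers in the usage lines are illustrative assumptions, not part of the original note. It relies on the standard recurrence for the regularized upper incomplete gamma function, which for integer degrees of freedom terminates in either `exp` (even case) or `erfc` (odd case).

```python
import math

def chi2_sf(t_obs, ndof):
    """Upper-tail probability P(t >= t_obs) for a chi-square variable
    with an integer number of degrees of freedom ndof >= 1.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1),
    with a = ndof / 2 and y = t_obs / 2.
    """
    if ndof < 1 or int(ndof) != ndof:
        raise ValueError("ndof must be a positive integer")
    if t_obs <= 0.0:
        return 1.0
    y = 0.5 * t_obs
    if ndof % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    while a < 0.5 * ndof:
        # Add y**a * exp(-y) / Gamma(a + 1), computed in log space.
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

# Hypothetical example: observed statistic 15.2 with N = 10 parameters.
p_mu = chi2_sf(15.2, 10)

# Same observed value, but M = 3 fitted parameters: N - M = 7 degrees of freedom.
p_fit = chi2_sf(15.2, 10 - 3)
```

For the same observed value, using fewer degrees of freedom gives a smaller p-value, which is why the count $N - M$ in Eq. (5) matters when judging a fit. Where SciPy is available, `scipy.stats.chi2.sf` provides the same quantity, including non-integer degrees of freedom.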

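1 Gaussian data

Suppose that the data are a set of $N$ independent Gaussian distributed values,

$$ y_i \sim \mathrm{Gauss}(\mu_i, \sigma_i), \quad i = 1, \ldots, N, \tag{6} $$

where the standard deviations $\sigma_i$ are known but the $\mu_i$ must be determined from the data. The likelihood is

$$ L(\mu) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma_i} \, e^{-(y_i - \mu_i)^2 / 2\sigma_i^2} , \tag{7} $$

so that the log-likelihood is

$$ \ln L(\mu) = -\frac{1}{2} \sum_{i=1}^{N} \frac{(y_i - \mu_i)^2}{\sigma_i^2} + C , \tag{8} $$

where $C$ does not depend on $\mu$. By setting the derivatives of $\ln L(\mu)$ with respect to the $\mu_i$ to zero we find the ML estimators to be

$$ \hat{\mu}_i = y_i , \tag{9} $$

and from this we find

$$ t_\mu = -2 \ln \frac{L(\mu)}{L(\hat{\mu})} = \sum_{i=1}^{N} \frac{(y_i - \mu_i)^2}{\sigma_i^2} . \tag{10} $$

In the case where $M$ parameters $\theta_1, \ldots, \theta_M$ are fitted, the statistic $q_\mu$ is

$$ q_\mu = -2 \ln \frac{L(\mu(\hat{\theta}))}{L(\hat{\mu})} = \sum_{i=1}^{N} \frac{(y_i - \mu_i(\hat{\theta}))^2}{\sigma_i^2} . \tag{11} $$

Thus we can use the minimized value of the sum of squares from an LS fit to test the goodness of fit. In such a case the values of $\mu_i$ are obtained by assuming a functional relation between $\mu$ and a control variable $x$, whose value is fixed for each measurement of $y$. That is,

$$ \mu_i(\theta) = \mu(x_i; \theta), \quad i = 1, \ldots, N. \tag{12} $$

The p-value therefore reflects the degree of compatibility between the data and the functional form $\mu(x; \theta)$.

2 Histogram of Poisson or multinomial data

Consider now a set of data values $n = (n_1, \ldots, n_N)$, which we may think of as a histogram with $N$ bins. Suppose the values are independent and Poisson distributed with mean values $\nu_i$, so that the joint probability for the vector $n$ is

$$ P(n; \nu) = \prod_{i=1}^{N} \frac{\nu_i^{n_i}}{n_i!} \, e^{-\nu_i} . \tag{13} $$

The log-likelihood is therefore

$$ \ln L(\nu) = \sum_{i=1}^{N} \left[ n_i \ln \nu_i - \nu_i \right] + C , \tag{14} $$

where $C$ represents terms that do not depend on $\nu$. If we regard each of the $\nu_i$ as adjustable, then by setting the derivatives of $\ln L(\nu)$ with respect to all of the $\nu_i$ to zero we find the ML estimators

$$ \hat{\nu}_i = n_i , \quad i = 1, \ldots, N. \tag{15} $$

Using this we can write down the statistic analogous to Eq. (1),

$$ t_\nu = -2 \ln \frac{L(\nu)}{L(\hat{\nu})} \tag{16} $$

$$ \phantom{t_\nu} = -2 \sum_{i=1}^{N} \left[ n_i \ln \frac{\nu_i}{\hat{\nu}_i} - \nu_i + \hat{\nu}_i \right] \tag{17} $$

$$ \phantom{t_\nu} = -2 \sum_{i=1}^{N} \left[ n_i \ln \frac{\nu_i}{n_i} - \nu_i + n_i \right] , \tag{18} $$

where in the final line we used $\hat{\nu}_i = n_i$. By going back to the original Poisson probabilities one can see that if $n_i = 0$, then the logarithmic term in Eq. (16) is in fact absent.

As with the statistic $t_\mu$ from above, Wilks' theorem says that the distribution of $t_\nu$ approaches a chi-square distribution for $N$ degrees of freedom in the limit of a large data sample. Here one can see the role of the large sample limit, since then the estimators $\hat{\nu}_i = n_i$ become approximately Gaussian distributed.

Now suppose that the set of $N$ mean values $\nu$ can be determined through a set of $M$ parameters $\theta = (\theta_1, \ldots, \theta_M)$. We can then define the statistic

$$ q_\nu = -2 \ln \frac{L(\nu(\hat{\theta}))}{L(\hat{\nu})} = -2 \sum_{i=1}^{N} \left[ n_i \ln \frac{\nu_i(\hat{\theta})}{n_i} - \nu_i(\hat{\theta}) + n_i \right] . \tag{19} $$

As with the statistic $q_\mu$ above, this will follow a chi-square distribution for $N - M$ degrees of freedom.

In some problems one may want to model a histogram of values $n = (n_1, \ldots, n_N)$ as following a multinomial distribution. This is similar to the Poisson case above except that the total number of entries,

$$ n_{\mathrm{tot}} = \sum_{i=1}^{N} n_i , \tag{20} $$

is regarded as constant. There are in effect $N - 1$ free parameters in the problem, which can be taken as all but one of the probabilities $p = (p_1, \ldots, p_N)$ for an event to be in one of the $N$ bins. One of the $p_i$ is fixed from the constraint

$$ \sum_{i=1}^{N} p_i = 1 . \tag{21} $$

The multinomial distribution for $n$ is

$$ P(n \mid p, n_{\mathrm{tot}}) = \frac{n_{\mathrm{tot}}!}{n_1! \, n_2! \cdots n_N!} \, p_1^{n_1} p_2^{n_2} \cdots p_N^{n_N} . \tag{22} $$

Since $n_{\mathrm{tot}}$ is fixed, we can regard the parameters to be $\nu_i = p_i n_{\mathrm{tot}}$. The log-likelihood function is then

$$ \ln L(\nu) = \sum_{i=1}^{N} n_i \ln \frac{\nu_i}{n_{\mathrm{tot}}} + C . \tag{23} $$

As in the Poisson case the ML estimators for the $\nu_i$ are found to be $\hat{\nu}_i = n_i$, so the statistic $t_\nu$ then becomes

$$ t_\nu = -2 \sum_{i=1}^{N} n_i \ln \frac{\nu_i}{n_i} . \tag{24} $$

That is, it is the same as in the Poisson case but without the terms $-\nu_i + n_i$. Because here there are only $N - 1$ fitted parameters (one of the $\hat{\nu}_i$ can be determined from $n_{\mathrm{tot}}$ minus the sum of the rest), Wilks' theorem says that $t_\nu$ follows a chi-square distribution for $N - 1$ degrees of freedom. If the $N$ mean values $\nu$ are determined from $M$ parameters $\theta = (\theta_1, \ldots, \theta_M)$, then the distribution of the corresponding $q_\nu$,

$$ q_\nu = -2 \sum_{i=1}^{N} n_i \ln \frac{\nu_i(\hat{\theta})}{n_i} , \tag{25} $$

is a chi-square distribution for $N - M - 1$ degrees of freedom.

Now suppose instead of evaluating the $\nu_i$ terms in Eqs. (19) and (25) with the ML estimators for $\theta$, we write the corresponding quantities as a function of $\theta$, i.e.,

$$ \chi^2_{\mathrm{M}}(\theta) = -2 \sum_{i=1}^{N} n_i \ln \frac{\nu_i(\theta)}{n_i} , \tag{26} $$

$$ \chi^2_{\mathrm{P}}(\theta) = -2 \sum_{i=1}^{N} \left[ n_i \ln \frac{\nu_i(\theta)}{n_i} - \nu_i(\theta) + n_i \right] , \tag{27} $$

where the subscripts M and P refer to the multinomial or Poisson cases, respectively. These expressions are equal to the corresponding values of $-2 \ln L(\theta)$, up to additive terms that do not depend on $\theta$. So to maximize the likelihood one can simply minimize $\chi^2_{\mathrm{P}}(\theta)$ or $\chi^2_{\mathrm{M}}(\theta)$, and the same ML estimators $\hat{\theta}$ will result.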
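The two functions of Eqs. (26) and (27) are simple to evaluate for a binned data set. The sketch below is an illustration under stated assumptions, not code from the original note: the function names, the example histogram `counts`, and the one-parameter model $\nu_i(\theta) = \theta s_i$ with a fixed shape `shape` are all hypothetical. For such a pure-normalization model, setting the derivative of Eq. (14) to zero gives the closed-form ML estimate $\hat{\theta} = \sum_i n_i / \sum_i s_i$, so no numerical minimizer is needed.

```python
import math

def chi2_poisson(n, nu):
    """Eq. (27): -2 * sum[n_i ln(nu_i / n_i) - nu_i + n_i].
    The logarithmic term is dropped for bins with n_i = 0.
    Assumes nu_i > 0 in every bin with n_i > 0."""
    total = 0.0
    for n_i, nu_i in zip(n, nu):
        total += nu_i - n_i
        if n_i > 0:
            total += n_i * math.log(n_i / nu_i)
    return 2.0 * total

def chi2_multinomial(n, nu):
    """Eq. (26): -2 * sum n_i ln(nu_i / n_i).
    Meaningful only when sum(nu) equals sum(n)."""
    return 2.0 * sum(n_i * math.log(n_i / nu_i)
                     for n_i, nu_i in zip(n, nu) if n_i > 0)

# Hypothetical histogram with N = 4 bins (one of them empty).
counts = [12, 7, 0, 3]
# Hypothetical fixed shape; the model is nu_i(theta) = theta * shape_i.
shape = [0.5, 0.3, 0.1, 0.1]

# Closed-form ML estimate of the normalization (M = 1 parameter).
theta_hat = sum(counts) / sum(shape)
nu_hat = [theta_hat * s for s in shape]

q_poisson = chi2_poisson(counts, nu_hat)   # compare to chi-square, N - M dof
ndof_poisson = len(counts) - 1
```

Because the fitted normalization makes $\sum_i \nu_i(\hat{\theta}) = n_{\mathrm{tot}}$, the terms $-\nu_i + n_i$ cancel in the sum and `chi2_poisson(counts, nu_hat)` coincides with `chi2_multinomial(counts, nu_hat)`. The degrees of freedom agree as well in this example: $N - M = 3$ for the Poisson fit with one parameter, and $N - M - 1 = 3$ for the multinomial case with the shape fixed and no free parameters.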

As an added bonus, however, the value of the minimized function can be used directly for a test of the goodness of fit, and to the extent that Wilks' theorem is satisfied, its sampling distribution is a chi-square distribution for $N - M$ (Poisson) or $N - M - 1$ (multinomial) degrees of freedom.

References

[1] S.S. Wilks, The large-sample distribution of the likelihood ratio for testing composite hypotheses, Ann. Math. Statist. 9 (1938) 60-62.
[2] G. Cowan, Statistical Data Analysis, Oxford University Press, 1998.
[3] Steve Baker and Robert D. Cousins, Clarification of the use of the chi-square and likelihood functions in fits to histograms, NIM 221 (1984) 437.