Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square


1 Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square. Yuxin Chen, Electrical Engineering, Princeton University

2 Coauthors: Pragya Sur (Stanford Statistics), Emmanuel Candès (Stanford Statistics & Math)

3 In memory of Tom Cover, Stanford EE. "We all know the feeling that follows when one investigates a problem, goes through a large amount of algebra, and finally investigates the answer to find that the entire problem is illuminated not by the analysis but by the inspection of the answer."

4 Inference in regression problems. Example: logistic regression, $y_i \sim \text{logistic-model}(x_i^\top \beta)$, $1 \le i \le n$.

5 Inference in regression problems. Example: logistic regression, $y_i \sim \text{logistic-model}(x_i^\top \beta)$, $1 \le i \le n$. One wishes to determine which covariates are of importance, i.e. test $\beta_j = 0$ vs. $\beta_j \neq 0$ $(1 \le j \le p)$.

6 Classical tests: $\beta_j = 0$ vs. $\beta_j \neq 0$ $(1 \le j \le p)$. Standard approaches (widely used in R, Matlab, etc.): use asymptotic distributions of certain statistics.

7 Classical tests: $\beta_j = 0$ vs. $\beta_j \neq 0$ $(1 \le j \le p)$. Standard approaches (widely used in R, Matlab, etc.): use asymptotic distributions of certain statistics. Wald test: Wald statistic $\approx \chi^2$. Likelihood ratio test: log-likelihood ratio statistic $\approx \chi^2$. Score test: score $\approx \mathcal{N}(0, \text{Fisher Info})$.

8 Example: logistic regression in R (n = 100, p = 30)
> fit = glm(y ~ X, family = binomial)
> summary(fit)
Call: glm(formula = y ~ X, family = binomial)
[summary output: deviance residuals; coefficient table with Estimate, Std. Error, z value, Pr(>|z|) for the intercept and X1-X30; significance codes]
Can these inference calculations (e.g. p-values) be trusted?

9 This talk: likelihood ratio test (LRT). $\beta_j = 0$ vs. $\beta_j \neq 0$ $(1 \le j \le p)$. Log-likelihood ratio (LLR) statistic: $\mathrm{LLR}_j := \ell(\hat\beta) - \ell(\hat\beta^{(-j)})$, where $\ell(\cdot)$ is the log-likelihood and $\hat\beta = \arg\max_\beta \ell(\beta)$ is the unconstrained MLE.

10 This talk: likelihood ratio test (LRT). $\beta_j = 0$ vs. $\beta_j \neq 0$ $(1 \le j \le p)$. Log-likelihood ratio (LLR) statistic: $\mathrm{LLR}_j := \ell(\hat\beta) - \ell(\hat\beta^{(-j)})$, where $\ell(\cdot)$ is the log-likelihood, $\hat\beta = \arg\max_\beta \ell(\beta)$ is the unconstrained MLE, and $\hat\beta^{(-j)} = \arg\max_{\beta:\, \beta_j = 0} \ell(\beta)$ is the constrained MLE.

11 Wilks phenomenon (1938). Samuel Wilks, Princeton. $\beta_j = 0$ vs. $\beta_j \neq 0$ $(1 \le j \le p)$. The LRT asymptotically follows a chi-square distribution (under the null): $2\,\mathrm{LLR}_j \overset{d}{\to} \chi^2_1$ ($p$ fixed, $n \to \infty$).

12 Wilks phenomenon. Samuel Wilks, Princeton. [Figure: histogram of p-values computed from the classical $\chi^2_1$ approximation, used to assess significance of coefficients.] The LRT asymptotically follows a chi-square distribution (under the null): $2\,\mathrm{LLR}_j \overset{d}{\to} \chi^2_1$ ($p$ fixed, $n \to \infty$).

13 Classical LRT in high dimensions ($p/n \to$ constant). Linear regression: $y = X\beta + \eta$ with i.i.d. Gaussian noise $\eta$. [Figure: histogram of classical p-values, which are uniform.] For linear regression (with Gaussian noise) in high dimensions, $2\,\mathrm{LLR}_j \approx \chi^2_1$: the classical test always works.

14 Classical LRT in high dimensions ($p = 1200$, $n = 4000$). Logistic regression: $y \sim \text{logistic-model}(X\beta)$. [Figure: histogram of classical p-values, which are highly non-uniform.]

15 Classical LRT in high dimensions ($p = 1200$, $n = 4000$). Logistic regression: $y \sim \text{logistic-model}(X\beta)$. [Figure: histogram of classical p-values, which are highly non-uniform.] Wilks' theorem seems inadequate in accommodating logistic regression in high dimensions.
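
The non-uniformity is easy to reproduce. The following minimal simulation sketch is not from the original slides: it keeps the same aspect ratio $p/n = 0.3$ but shrinks $n$ and $p$ so it runs in seconds, and the choice of testing one coordinate per replication is purely for convenience.

set.seed(1)
n <- 400; p <- 120; nreps <- 500           # same aspect ratio p/n = 0.3 as the slide's example
pvals <- replicate(nreps, {
  X <- matrix(rnorm(n * p), n, p)          # i.i.d. Gaussian design
  y <- rbinom(n, 1, 0.5)                   # global null: beta = 0
  full <- glm(y ~ X - 1, family = binomial)
  restricted <- glm(y ~ X[, -1] - 1, family = binomial)    # drop the tested coordinate
  llr <- as.numeric(logLik(full) - logLik(restricted))
  pchisq(2 * llr, df = 1, lower.tail = FALSE)              # classical Wilks p-value
})
hist(pvals, breaks = 20)   # visibly skewed toward small p-values rather than uniform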

16 Bartlett correction? ($n = 4000$, $p = 1200$). [Figure: histogram of classical Wilks p-values.] Bartlett correction (finite-sample effect): $\frac{2\,\mathrm{LLR}_j}{1 + \alpha_n/n} \approx \chi^2_1$.

17 Bartlett correction? ($n = 4000$, $p = 1200$). [Figures: histograms of classical Wilks and Bartlett-corrected p-values.] Bartlett correction (finite-sample effect): $\frac{2\,\mathrm{LLR}_j}{1 + \alpha_n/n} \approx \chi^2_1$. The p-values are still non-uniform: this is NOT a finite-sample effect.

18 Bartlett correction? ($n = 4000$, $p = 1200$). [Figures: histograms of classical Wilks and Bartlett-corrected p-values.] Bartlett correction (finite-sample effect): $\frac{2\,\mathrm{LLR}_j}{1 + \alpha_n/n} \approx \chi^2_1$. The p-values are still non-uniform: this is NOT a finite-sample effect. What happens in high dimensions?

19 Our findings. [Figures: histograms of p-values under three calibrations: classical Wilks, Bartlett-corrected, and rescaled $\chi^2$.] A glimpse of our theory: the LRT follows a rescaled $\chi^2$ distribution.

20 Problem formulation (formal). [Setup diagram: unknown signal, observation, likelihood $f_{Y|X}(y\,|\,x)$.] Observation $y \in \mathbb{R}^n$; design matrix $X \in \mathbb{R}^{n \times p}$ with i.i.d. Gaussian rows $X_i \sim \mathcal{N}(0, \Sigma)$. Logistic model: $y_i = 1$ with prob. $\frac{1}{1+\exp(-X_i^\top\beta)}$ and $y_i = -1$ with prob. $\frac{1}{1+\exp(X_i^\top\beta)}$, $1 \le i \le n$. Proportional growth: $p/n \to$ constant. Global null: $\beta = 0$.

21 When does the MLE exist? (MLE) $\text{maximize}_\beta\ \ell(\beta) = -\sum_{i=1}^n \log\{1 + \exp(-y_i X_i^\top \beta)\}$, with class labels $y_i \in \{0,1\}$ or $y_i \in \{-1,1\}$. [Figure: two labeled point clouds, $y_i = 1$ vs. $y_i = -1$.] The MLE is unbounded if there exists a perfect separating hyperplane; for $p/n > 1/2$ the MLE does not exist.

22 When does the MLE exist? (MLE) $\text{maximize}_\beta\ \ell(\beta) = -\sum_{i=1}^n \log\{1 + \exp(-y_i X_i^\top \beta)\} \le 0$. If there exists a hyperplane that perfectly separates $\{y_i\}$, i.e. $\exists\,\hat b$ s.t. $y_i X_i^\top \hat b > 0$ for all $i$, then the MLE is unbounded: $\lim_{a \to \infty} \ell(a \hat b) = 0$.
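
A quick way to see this failure mode in practice (a sketch, not from the slides; the problem sizes are arbitrary choices): when $p/n$ is well above $1/2$ a separating hyperplane typically exists, R's glm() warns that fitted probabilities are numerically 0 or 1, and the fitted coefficients blow up.

set.seed(2)
n <- 100; p <- 80                        # p/n = 0.8 > 1/2, so separation is very likely
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, 0.5)                   # global null
fit <- glm(y ~ X - 1, family = binomial) # expect a "fitted probabilities numerically 0 or 1" warning
max(abs(coef(fit)))                      # very large: the likelihood has no finite maximizer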

23 When does the MLE exist? Separating capacity (Tom Cover, Ph.D. thesis 1965). [Figure: labeled points $y_i = 1$ / $y_i = -1$ for $n = 2$, $n = 4$, $n = 12$.] As the number of samples $n$ increases, it becomes more difficult to find a separating hyperplane.

24 When does the MLE exist? Separating capacity (Tom Cover, Ph.D. thesis 1965). Theorem 1 (Cover 1965): Under i.i.d. Gaussian design, a separating hyperplane exists with high probability iff $n/p < 2$ (asymptotically).
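
Cover's counting argument also gives an exact finite-sample expression for the separation probability when the $\pm 1$ labels are symmetric coin flips independent of a design in general position, namely $2^{1-n}\sum_{k=0}^{p-1}\binom{n-1}{k}$. The short sketch below (not from the slides; the grid and plotting choices are mine) evaluates it and shows the sharp transition at $n/p = 2$.

# probability that a homogeneous separating hyperplane exists (Cover's function counting theorem)
sep_prob <- function(n, p) {
  k <- 0:(p - 1)
  sum(exp(lchoose(n - 1, k) - (n - 1) * log(2)))   # 2^(1-n) * sum_{k < p} choose(n-1, k)
}
p <- 400
ns <- seq(400, 1600, by = 20)
probs <- sapply(ns, sep_prob, p = p)
plot(ns / p, probs, type = "l", xlab = "n/p", ylab = "P(separable)")
abline(v = 2, lty = 2)    # transition at n/p = 2, matching Cover's theorem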

25 Main result: asymptotic distribution of the LRT. Theorem 2 (Sur, Chen, Candès 2017): Suppose $n/p > 2$. Under i.i.d. Gaussian design and the global null, $2\,\mathrm{LLR}_j \overset{d}{\to} \alpha(p/n)\,\chi^2_1$ (a rescaled $\chi^2$).

26 Main result: asymptotic distribution of the LRT. Theorem 2 (Sur, Chen, Candès 2017): Suppose $n/p > 2$. Under i.i.d. Gaussian design and the global null, $2\,\mathrm{LLR}_j \overset{d}{\to} \alpha(p/n)\,\chi^2_1$ (a rescaled $\chi^2$). The constant $\alpha(p/n)$ can be determined by solving a system of 2 nonlinear equations in 2 unknowns $(\tau, b)$: $\tau^2 = \frac{n}{p}\,\mathbb{E}\big[(\Psi(\tau Z; b))^2\big]$ and $\frac{p}{n} = \mathbb{E}\big[\Psi'(\tau Z; b)\big]$, where $Z \sim \mathcal{N}(0, 1)$, $\Psi$ is some operator, and $\alpha(p/n) = \tau^2/b$.
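
The slide leaves $\Psi$ unspecified; the sketch below (not from the slides) assumes the proximal-operator form used in the accompanying paper, $\Psi(z; b) = b\,\rho'(\mathrm{prox}_{b\rho}(z))$ with $\rho(t) = \log(1 + e^t)$, and solves the two-equation system numerically. The Newton iteration, integration range, and Nelder-Mead search are illustrative choices, not the paper's algorithm.

sigmoid <- function(x) 1 / (1 + exp(-x))

# prox of b*rho at z, where rho(t) = log(1 + exp(t)): solves x + b*sigmoid(x) = z by Newton's method
prox <- function(z, b) {
  x <- z
  for (k in 1:50) {
    x <- x - (x + b * sigmoid(x) - z) / (1 + b * sigmoid(x) * (1 - sigmoid(x)))
  }
  x
}

# assumed form of the operator Psi and its derivative in z
Psi  <- function(z, b) b * sigmoid(prox(z, b))
dPsi <- function(z, b) { s <- sigmoid(prox(z, b)); b * s * (1 - s) / (1 + b * s * (1 - s)) }

# E[g(Z)] for Z ~ N(0,1) by numerical integration
Egauss <- function(g) integrate(function(z) g(z) * dnorm(z), -8, 8)$value

# squared residuals of the two equations; parameters on the log scale so tau, b stay positive
resid2 <- function(par, kappa) {
  tau <- exp(par[1]); b <- exp(par[2])
  r1 <- tau^2 - (1 / kappa) * Egauss(function(z) Psi(tau * z, b)^2)
  r2 <- kappa - Egauss(function(z) dPsi(tau * z, b))
  r1^2 + r2^2
}

# rescaling constant alpha(p/n) = tau^2 / b
alpha <- function(kappa) {
  fit <- optim(c(0, 0), resid2, kappa = kappa, control = list(maxit = 2000, reltol = 1e-12))
  exp(2 * fit$par[1] - fit$par[2])
}

alpha(0.1)   # e.g. kappa = p/n = 0.1; should exceed 1 and increase with kappa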

27 Main result: asymptotic distribution of the LRT. Theorem 2 (Sur, Chen, Candès 2017): Suppose $n/p > 2$. Under i.i.d. Gaussian design and the global null, $2\,\mathrm{LLR}_j \overset{d}{\to} \alpha(p/n)\,\chi^2_1$ (a rescaled $\chi^2$). The constant $\alpha(p/n)$ can be determined by solving a system of 2 nonlinear equations in 2 unknowns. $\alpha(\cdot)$ depends only on the aspect ratio $p/n$; it is not a finite-sample effect. $\alpha(0) = 1$: matches classical theory.

28 Our adjusted LRT theory in practice. [Figures: the rescaling constant $\alpha(p/n)$ plotted against $\kappa = p/n$ for the logistic model; histogram of empirical p-values, which look Unif(0, 1).] Empirically, the LRT follows the rescaled $\chi^2_1$ (as predicted).
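
As a usage note (again a sketch rather than code from the talk): the adjustment simply recalibrates the LLR by $\alpha(p/n)$ before comparing to $\chi^2_1$. Here alpha() refers to the hypothetical solver sketched above, and llr to a log-likelihood ratio computed from two glm fits as in the earlier simulation.

adjusted_pvalue <- function(llr, n, p) pchisq(2 * llr / alpha(p / n), df = 1, lower.tail = FALSE)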

29 Validity of the tail approximation. [Figure: empirical CDF of adjusted p-values ($n = 4000$, $p = 1200$) against the diagonal.] The empirical CDF is in near-perfect agreement with the diagonal, suggesting that our theory is remarkably accurate even when we zoom in on the tail.

30 Efficacy under moderate sample sizes. [Figure: empirical CDF of adjusted p-values ($n = 200$, $p = 60$).] Our theory seems adequate for moderately large samples.

31 Universality: non-Gaussian covariates. [Figures: histograms of p-values under the classical Wilks, Bartlett-corrected, and rescaled $\chi^2$ calibrations.] i.i.d. Bernoulli design, $n = 4000$.

32 Connection to convex geometry.

33 Connection to convex geometry. Perfect separation: $\exists\, b$ such that $\mathrm{sign}(y_i X_i^\top b) = 1$ for all $i$. WLOG, if $y_1 = \dots = y_n = 1$, then separability $\iff \exists\, b$ with $Xb > 0$, i.e. $\mathrm{range}(X) \cap \mathbb{R}^n_+ \neq \{0\}$. So the probability that a separating hyperplane exists is the probability that the random subspace $\{Xb : b \in \mathbb{R}^p\}$ intersects the positive orthant nontrivially, which can be analyzed via convex geometry (e.g. Amelunxen et al.).

34 Connection to robust M-estimation. Since $y_i = \pm 1$ and $X_i \overset{\text{ind.}}{\sim} \mathcal{N}(0, \Sigma)$, $\ \text{maximize}_\beta\ \ell(\beta) = -\sum_{i=1}^n \log\{1 + \exp(-y_i X_i^\top \beta)\} \overset{d}{=} -\sum_{i=1}^n \log\{1 + \exp(-X_i^\top \beta)\} =: -\sum_{i=1}^n \rho(-X_i^\top \beta)$.

35 Connection to robust M-estimation. Since $y_i = \pm 1$ and $X_i \overset{\text{ind.}}{\sim} \mathcal{N}(0, \Sigma)$, $\ \text{maximize}_\beta\ \ell(\beta) = -\sum_{i=1}^n \log\{1 + \exp(-y_i X_i^\top \beta)\} \overset{d}{=} -\sum_{i=1}^n \rho(-X_i^\top \beta)$. Hence the MLE satisfies $\hat\beta = \arg\min_\beta \sum_{i=1}^n \rho(y_i - X_i^\top \beta)$: robust M-estimation with $y = 0$.

36 Connection to robust M-estimation. MLE: $\hat\beta = \arg\min_\beta \sum_{i=1}^n \rho(y_i - X_i^\top \beta)$ (robust M-estimation). Variance inflation as $p/n$ grows (El Karoui et al. '13, Donoho and Montanari '13).

37 Connection to robust M-estimation. MLE: $\hat\beta = \arg\min_\beta \sum_{i=1}^n \rho(y_i - X_i^\top \beta)$ (robust M-estimation). Variance inflation as $p/n$ grows (El Karoui et al. '13, Donoho and Montanari '13). [Figure: rescaling constant $\alpha(p/n)$ vs. $\kappa = p/n$.] Variance inflation $\iff$ increasing rescaling factor.

38 This is not just about logistic regression. Our theory is applicable to: the logistic model; the probit model; the linear model under Gaussian noise, where the rescaling constant $\alpha(p/n) \equiv 1$ (consistent with classical theory); and the linear model under non-Gaussian noise.

39 Key ingredients in establishing our theory. The key step is to realize that $2\,\mathrm{LLR}_j \approx \frac{p}{b(p/n)}\,\hat\beta_j^2$ (a rescaled $\chi^2$), where $b(\cdot)$ depends only on $p/n$ and $\hat\beta_j \approx \mathcal{N}\!\big(0,\ \alpha(p/n)\,b(p/n)/p\big)$; note that these two displays are consistent, since $\frac{p}{b}\hat\beta_j^2 \approx \alpha\,\chi^2_1$ when $\hat\beta_j \sim \mathcal{N}(0, \alpha b/p)$.

40 Key ingredients in establishing our theory. The key step is to realize that $2\,\mathrm{LLR}_j \approx \frac{p}{b(p/n)}\,\hat\beta_j^2$ (a rescaled $\chi^2$), where $b(\cdot)$ depends only on $p/n$ and $\hat\beta_j \approx \mathcal{N}\!\big(0,\ \alpha(p/n)\,b(p/n)/p\big)$. 1. Convex geometry: show $\|\hat\beta\| = O(1)$. 2. Approximate message passing: determine the asymptotic distribution of $\hat\beta$. 3. Leave-one-out analysis: characterize the marginal distribution of $\hat\beta_j$ (rescaled Gaussian) and ensure that higher-order terms vanish.

41 A small sample of prior work. Candès, Fan, Janson, Lv '16: observed empirically the non-uniformity of p-values in logistic regression. Fan, Demirkaya, Lv '17: the classical asymptotic normality of the MLE (basis of the Wald test) fails to hold in logistic regression when $p \gtrsim n^{2/3}$. El Karoui, Bean, Bickel, Lim, Yu '13; El Karoui '13; Donoho, Montanari '13; El Karoui '15: robust M-estimation for linear models (assuming strong convexity).

42 Summary. Caution needs to be exercised when applying classical statistical procedures in a high-dimensional regime, a regime of growing interest and relevance. What shall we do under the non-null ($\beta \neq 0$)? Paper: "The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled chi-square," Pragya Sur, Yuxin Chen, Emmanuel Candès.
