Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square
|
|
- Ralph Parker
- 6 years ago
- Views:
Transcription
1 Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square Yuxin Chen Electrical Engineering, Princeton University
2 Coauthors Pragya Sur Stanford Statistics Emmanuel Candès Stanford Statistics & Math 2/ 26
3 In memory of Tom Cover ( ) Stanford EE We all know the feeling that follows when one investigates a problem, goes through a large amount of algebra, and finally investigates the answer to find that the entire problem is illuminated not by the analysis but by the inspection of the answer 3/ 26
4 Inference in regression problems Example: logistic regression y i logistic-model ( x i β ), 1 i n 4/ 26
5 Inference in regression problems Example: logistic regression y i logistic-model ( x i β ), 1 i n One wishes to determine which covariate is of importance, i.e. β j = 0 vs. β j 0 (1 j p) 4/ 26
6 Classical tests β j = 0 vs. β j 0 (1 j p) Standard approaches (widely used in R, Matlab, etc): use asymptotic distributions of certain statistics 5/ 26
7 Classical tests β j = 0 vs. β j 0 (1 j p) Standard approaches (widely used in R, Matlab, etc): use asymptotic distributions of certain statistics Wald test: Wald statistic χ 2 Likelihood ratio test: log-likelihood ratio statistic χ 2 Score test:... score N (0, Fisher Info) 5/ 26
8 Example: logistic regression in R (n = 100, p = 30) > fit = glm(y ~ X, family = binomial) > summary(fit) Call: glm(formula = y ~ X, family = binomial) Deviance Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error z value Pr(> z ) (Intercept) X X X X X X ** Signif. codes: 0 *** ** 0.01 * Can these inference calculations (e.g. p-values) be trusted? 6/ 26
9 This talk: likelihood ratio test (LRT) β j = 0 vs. β j 0 (1 j p) Log-likelihood ratio (LLR) statistic LLR j := l ( β ) l ( β ( j) ) l( ): log-likelihood β = arg max β l(β): unconstrained MLE 7/ 26
10 This talk: likelihood ratio test (LRT) β j = 0 vs. β j 0 (1 j p) Log-likelihood ratio (LLR) statistic LLR j := l ( β ) l ( β ( j) ) l( ): log-likelihood β = arg max β l(β): unconstrained MLE β ( j) = arg max β:βj =0 l(β): constrained MLE 7/ 26
11 Wilks phenomenon 1938 Samuel Wilks, Princeton βj = 0 vs. βj 6= 0 (1 j p) LRT asymptotically follows chi-square distribution (under null) d 2 LLRj χ21 (p fixed, n ) 8/ 26
12 30000 Wilks phenomenon Counts Counts P Values classical Wilks j = 0 Samuel Wilks P Values Bartlett-corrected vs. j = 0 (1 Æ j Æ p p-value LRT asymptotically follows chi-square distribution Bartlett correction (finite sample effect): 2 LLRj æ 2LLRj 1+ n /n 21 (p fixed, n æ Œ) 1 p-values are still non-uniform æ this is NOT finite sa Samuel Wilks, Princeton βj = 0 assess significance of coefficients vs. βj 6= 0 (1 j p) LRT asymptotically follows chi-square distribution (under null) d 2 LLRj χ21 (p fixed, n ) 8/ 26
13 Classical LRT in high dimensions p/n (1, ) Linear regression y = Xβ + η }{{} i.i.d. Gaussian classical p-values arenuniform For linear regression (with Gaussian noise) in high dimensions, 2LLR j χ 2 1 (classical test always works) 9/ 26
14 Classical LRT in high dimensions p = 1200, n = 4000 Logistic regression y logistic-model(xβ) classical p-values are highly nonuniform 10/ 26
15 Classical LRT in high dimensions p = 1200, n = 4000 Logistic regression y logistic-model(xβ) classical p-values are highly nonuniform Wilks theorem seems inadequate in accommodating logistic regression in high dimensions 10/ 26
16 Bartlett correction? (n = 4000, p = 1200) Counts P Values classical Wilks Bartlett correction (finite sample effect): 2LLR j 1+α n/n χ2 1 11/ 26
17 Bartlett correction? (n = 4000, p = 1200) Counts Counts P Values P Values classical Wilks Bartlett-corrected 2LLR Bartlett correction (finite sample effect): j 1+α n/n χ2 1 p-values are still non-uniform this is NOT finite sample effect 11/ 26
18 Bartlett correction? (n = 4000, p = 1200) Counts Counts P Values P Values classical Wilks Bartlett-corrected 2LLR Bartlett correction (finite sample effect): j 1+α n/n χ2 1 p-values are still non-uniform this is NOT finite sample effect What happens in high dimensions? 11/ 26
19 Our findings Counts Counts Counts P Values P Values P Values classical Wilks Bartlett-corrected rescaled χ 2 2LLR Bartlett correction (finite sample effect): j 1+α n/n χ2 1 p-values are still non-uniform this is NOT finite sample effect A glimpse of our theory: LRT follows a rescaled χ 2 distribution 11/ 26
20 ood: fy X (y x} Y unknown signal: observation: Y likelihood: (y x} X observation: Y fy X funknown unknown signal: X Yob Xlikelihood: Y X (y x}signal: Problem formulation (formal) X fy X (y x} y Y X likelihood: fy X (y x} unknown signal: X yunknown X y X signal: observation: Y X X y n observation y n p p ind. Gaussian design: Xi N (0, Σ) Logistic model: yi = 1, 1, with prob. with prob. 1 1+exp( Xi> β) 1 1+exp(Xi> β) 1 i n Proportional growth: p/n constant Global null: β = 0 12/ 26
21 For p/n > 1/2 yimle does 2 {0, 1} not or y exist i 2 { 1, 1} When does MLE exist? n X Class labels (MLE) maximizeβ `(β) = Class labels n o log 1 + exp( yi Xi> β) i=1 {z yi 2 {0, 1} or y 0 i yi 2 {0, 1} or y i 2 { 1, 1} 2 { 1, 1}} p-values are uniform p-values are highly non-uniform p-values are uniform p-values are highly non-uniform yi = 1 yi = 1 yi = 1 yi = 1 MLE is unbounded if perfect separating hyperplane 13/ 26
22 When does MLE exist? (MLE) maximizeβ `(β) = n X n o log 1 + exp( yi Xi> β) i=1 {z 0 } If a hyperplane that perfectly separates {yi }, i.e. βb then MLE is unbounded s.t. yi Xi> βb > 0 for all i lim `( a aβb {z} )=0 unbounded 13/ 26
23 When does MLE exist? Separating capacity (Tom Cover, Ph. D. thesis 1965) y i =1 y i = 1 n =2 n =4 n = 12 number of samples n increases = more difficult to find separating hyperplane 14/ 26
24 When does MLE exist? Separating capacity (Tom Cover, Ph. D. thesis 1965) y i =1 y i = 1 n =2 n =4 n = 12 Theorem 1 (Cover 1965) Under i.i.d. Gaussian design, a separating hyperplane exists with high prob. iff n/p < 2 (asymptotically) 14/ 26
25 Main result: asymptotic distribution of LRT Theorem 2 (Sur, Chen, Candès 2017) Suppose n/p > 2. Under i.i.d. Gaussian design and global null, ( d p 2 LLR j α χ n) 2 1 }{{} rescaled χ 2 15/ 26
26 Main result: asymptotic distribution of LRT Theorem 2 (Sur, Chen, Candès 2017) Suppose n/p > 2. Under i.i.d. Gaussian design and global null, ( d p 2 LLR j α χ n) 2 1 }{{} rescaled χ 2 α(p/n) can be determined by solving a system of 2 nonlinear equations and 2 unknowns τ 2 = n p E [ (Ψ(τZ; b)) 2] p n = E [ Ψ (τz; b) ] where Z N (0, 1), Ψ is some operator, and α(p/n) = τ 2 /b 15/ 26
27 Main result: asymptotic distribution of LRT Theorem 2 (Sur, Chen, Candès 2017) Suppose n/p > 2. Under i.i.d. Gaussian design and global null, ( d p 2 LLR j α χ n) 2 1 }{{} rescaled χ 2 α(p/n) can be determined by solving a system of 2 nonlinear equations and 2 unknowns α( ) depends only on aspect ratio p/n It is not a finite sample effect α(0) = 1: matches classical theory 15/ 26
28 Our adjusted LRT theory in practice rescaling constant (p/n) Rescaling Constant κ p/n Counts P Values rescaling constant for logistic model empirical p-values Unif(0, 1) Empirically, LRT rescaled χ 2 1 (as predicted) 16/ 26
29 Validity of tail approximation Empirical cdf t Empirical CDF of adjusted pvalues (n = 4000, p = 1200) Empirical CDF is in near-perfect aggreement with diagonal, suggesting that our theory is remarkably accurate even when we zoom in 17/ 26
30 Efficacy under moderate sample sizes Empirical cdf t Empirical CDF of adjusted pvalues (n = 200, p = 60) Our theory seems adequete for moderately large samples 18/ 26
31 Universality: non-gaussian covariates Counts Counts 5000 Counts P Values P Values P Values classical Wilks Bartlett-corrected rescaled χ 2 i.i.d. Bernoulli design, n = 4000, p = / 26
32 Connection to convex geometry 20/ 26
33 For p/n > 1/2 MLE does not exist sign(xi b) = 1 Class labels sign( yi Xi b) = 1 8i i Xb > 0 Connection yi 2 {0, 1} or y i 2 to convex geometry Perfect separation: b Class labels yi {0, 1} or y i { 1, 1} d Prob. rand. subspace pos. orth. > sign( yi Xi> b)intersects = sign(x i b) sign( yi Xino b) separating =1 i { 1, 1} p P {Xb b R } Rn+ = {0}? hyperplane d sign( yi Xi b) = sign(xi b) sign(xi> b) = 1 8i () sign(xi b) = 1 i Xb > 0 Xb > 0 Prob. rand. subspace intersects pos. orth. = {0}? norn+separating P {Xb b 2 Rp } \ Rn+ 6= {0}? separating hyperplane Prob. rand. subspace intersects pos. orth. P {Xb b Rp } hyperplane exists WLOG, if y1 = = yn = 1, then separability ) = range(x) fl Rn+ = {0} separating hyperplane exists WLOG, if y1 = = yn = 1, then separability = n * (1) 20/ 24 o range(x) Rn+ 6= {0} {z } can be analyzed via convex geometry (e.g. Amelunxen et al.) 20/ 26
34 Connection to robust M-estimation Since y i = ±1 and X i ind. N (0, Σ), maximize β l(β) = d = n i=1 n { } log 1 + exp( y i Xi β) log { } 1 + exp( Xi β) i=1 } {{} n := i=1 ρ( X iβ) 21/ 26
35 Connection to robust M-estimation Since y i = ±1 and X i ind. N (0, Σ), maximize β l(β) = d = n i=1 n { } log 1 + exp( y i Xi β) log { } 1 + exp( Xi β) i=1 } {{} n := i=1 ρ( X iβ) = MLE β n = arg min ρ(y i X i β) β i=1 }{{} robust M-estimation with y = 0 21/ 26
36 Connection to robust M-estimation MLE β n = arg min ρ(y i X i β) β i=1 }{{} robust M-estimation Variance inflation as p/n (El Karoui et al. 13, Donoho, Montanari 13) 22/ 26
37 Connection to robust M-estimation MLE β n = arg min ρ(y i X i β) β i=1 }{{} robust M-estimation Variance inflation as p/n (El Karoui et al. 13, Donoho, Montanari 13) rescaling constant (p/n) Rescaling Constant κ p/n variance inflation increasing rescaling factor 22/ 26
38 This is not just about logistic regression Our theory is applicable to logistic model probit model linear model (under Gaussian noise) rescaling const α(p/n) 1 (consistent with classical theory) linear model (under non-gaussian noise) 23/ 26
39 Key ingredients in establishing our theory Key step is to realize that 2 LLR j d p b(p/n) β 2 j }{{} rescaled χ 2 where b( ) depends only on p n, β ( α( j N 0, n)b( p n) p ) p 24/ 26
40 Key ingredients in establishing our theory Key step is to realize that 2 LLR j d p b(p/n) β 2 j }{{} rescaled χ 2 where b( ) depends only on p n, β ( α( j N 0, n)b( p n) p ) p 1. Convex geometry: show β = O(1) 2. Approximate message passing: determine asymptotic distribution of β 3. Leave-one-out analysis: characterize marginal distribution of β j (rescaled Gaussian) and ensure higher-order terms vanish 24/ 26
41 A small sample of prior work Candès, Fan, Janson, Lv 16: observed empirically nonuniformity of p-values in logistic regression Fan, Demirkaya, Lv 17: classical asymptotic normality of MLE (basis of Wald test) fails to hold in logistic regression when p n 2/3 El Karoui, Beana, Bickel, Limb, Yu 13, El Karoui 13, Donoho, Montanari 13, El Karoui 15: robust M-estimation for linear models (assuming strong convexity) 25/ 26
42 Summary Caution needs to be exercised when applying classical statistical procedures in a high-dimensional regime a regime of growing interest and relevance What shall we do under non-null (β 0)? Paper: The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled chi-square, Pragya Sur, Yuxin Chen, Emmanuel Candès, / 26
A Modern Maximum-Likelihood Theory for High-dimensional Logistic Regression
A Modern Maximum-Likelihood Theory for High-dimensional Logistic Regression Pragya Sur Emmanuel J. Candès April 2018 Abstract Every student in statistics or data science learns early on that when the sample
More informationA Modern Maximum-Likelihood Theory for High-dimensional Logistic Regression
A Modern Maximum-Likelihood Theory for High-dimensional Logistic Regression Pragya Sur Emmanuel J. Candès April 25, 2018 Abstract Every student in statistics or data science learns early on that when the
More informationInference For High Dimensional M-estimates: Fixed Design Results
Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49
More informationA Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46
A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response
More informationParametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1
Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson
More informationExercises Chapter 4 Statistical Hypothesis Testing
Exercises Chapter 4 Statistical Hypothesis Testing Advanced Econometrics - HEC Lausanne Christophe Hurlin University of Orléans December 5, 013 Christophe Hurlin (University of Orléans) Advanced Econometrics
More informationInference For High Dimensional M-estimates. Fixed Design Results
: Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and
More informationClassification. Chapter Introduction. 6.2 The Bayes classifier
Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode
More informationLinear Regression Models P8111
Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started
More informationGeneralized linear models
Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science
UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator
More informationLinear Regression. Data Model. β, σ 2. Process Model. ,V β. ,s 2. s 1. Parameter Model
Regression: Part II Linear Regression y~n X, 2 X Y Data Model β, σ 2 Process Model Β 0,V β s 1,s 2 Parameter Model Assumptions of Linear Model Homoskedasticity No error in X variables Error in Y variables
More informationLogistic Regression 21/05
Logistic Regression 21/05 Recall that we are trying to solve a classification problem in which features x i can be continuous or discrete (coded as 0/1) and the response y is discrete (0/1). Logistic regression
More informationOutline of GLMs. Definitions
Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density
More informationThe Phase Transition for the Existence of the Maximum Likelihood Estimate in High-dimensional Logistic Regression
The Phase Transition for the Existence of the Maximum Likelihood Estimate in High-dimensional Logistic Regression Emmanuel J. Candès Pragya Sur April 25, 2018 Abstract This paper rigorously establishes
More informationInference for High Dimensional Robust Regression
Department of Statistics UC Berkeley Stanford-Berkeley Joint Colloquium, 2015 Table of Contents 1 Background 2 Main Results 3 OLS: A Motivating Example Table of Contents 1 Background 2 Main Results 3 OLS:
More informationPoisson regression 1/15
Poisson regression 1/15 2/15 Counts data Examples of counts data: Number of hospitalizations over a period of time Number of passengers in a bus station Blood cells number in a blood sample Number of typos
More informationGeneralized Linear Models. Kurt Hornik
Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationhigh-dimensional inference robust to the lack of model sparsity
high-dimensional inference robust to the lack of model sparsity Jelena Bradic (joint with a PhD student Yinchu Zhu) www.jelenabradic.net Assistant Professor Department of Mathematics University of California,
More informationNATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours
NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3 4 5 6 Full marks
More informationLinear Methods for Prediction
Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we
More information1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches
Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model
More informationStatistics 203: Introduction to Regression and Analysis of Variance Course review
Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models Generalized Linear Models - part III Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.
More informationTwo Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00
Two Hours MATH38052 Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER GENERALISED LINEAR MODELS 26 May 2016 14:00 16:00 Answer ALL TWO questions in Section
More informationBayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples
Bayesian inference for sample surveys Roderick Little Module : Bayesian models for simple random samples Superpopulation Modeling: Estimating parameters Various principles: least squares, method of moments,
More informationGeneralized Linear Models Introduction
Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,
More informationModeling Overdispersion
James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 1 Introduction 2 Introduction In this lecture we discuss the problem of overdispersion in
More informationGeneralized Linear Models. Last time: Background & motivation for moving beyond linear
Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered
More informationNormal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,
Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability
More informationIntroduction to the Generalized Linear Model: Logistic regression and Poisson regression
Introduction to the Generalized Linear Model: Logistic regression and Poisson regression Statistical modelling: Theory and practice Gilles Guillot gigu@dtu.dk November 4, 2013 Gilles Guillot (gigu@dtu.dk)
More information2.1 Linear regression with matrices
21 Linear regression with matrices The values of the independent variables are united into the matrix X (design matrix), the values of the outcome and the coefficient are represented by the vectors Y and
More informationApplied Asymptotics Case studies in higher order inference
Applied Asymptotics Case studies in higher order inference Nancy Reid May 18, 2006 A.C. Davison, A. R. Brazzale, A. M. Staicu Introduction likelihood-based inference in parametric models higher order approximations
More information8. Hypothesis Testing
FE661 - Statistical Methods for Financial Engineering 8. Hypothesis Testing Jitkomut Songsiri introduction Wald test likelihood-based tests significance test for linear regression 8-1 Introduction elements
More informationLogistic Regressions. Stat 430
Logistic Regressions Stat 430 Final Project Final Project is, again, team based You will decide on a project - only constraint is: you are supposed to use techniques for a solution that are related to
More informationAFT Models and Empirical Likelihood
AFT Models and Empirical Likelihood Mai Zhou Department of Statistics, University of Kentucky Collaborators: Gang Li (UCLA); A. Bathke; M. Kim (Kentucky) Accelerated Failure Time (AFT) models: Y = log(t
More informationGeneralized Linear Models for Non-Normal Data
Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture
More informationLecture 5: LDA and Logistic Regression
Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant
More informationAdministration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books
STA 44/04 Jan 6, 00 / 5 Administration Homework on web page, due Feb NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00... administration / 5 STA 44/04 Jan 6,
More informationSTAT 6350 Analysis of Lifetime Data. Failure-time Regression Analysis
STAT 6350 Analysis of Lifetime Data Failure-time Regression Analysis Explanatory Variables for Failure Times Usually explanatory variables explain/predict why some units fail quickly and some units survive
More information1/15. Over or under dispersion Problem
1/15 Over or under dispersion Problem 2/15 Example 1: dogs and owners data set In the dogs and owners example, we had some concerns about the dependence among the measurements from each individual. Let
More informationSTA 450/4000 S: January
STA 450/4000 S: January 6 005 Notes Friday tutorial on R programming reminder office hours on - F; -4 R The book Modern Applied Statistics with S by Venables and Ripley is very useful. Make sure you have
More information2 Nonlinear least squares algorithms
1 Introduction Notes for 2017-05-01 We briefly discussed nonlinear least squares problems in a previous lecture, when we described the historical path leading to trust region methods starting from the
More informationVarious types of likelihood
Various types of likelihood 1. likelihood, marginal likelihood, conditional likelihood, profile likelihood, adjusted profile likelihood 2. semi-parametric likelihood, partial likelihood 3. empirical likelihood,
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationLecture 3: Multiple Regression
Lecture 3: Multiple Regression R.G. Pierse 1 The General Linear Model Suppose that we have k explanatory variables Y i = β 1 + β X i + β 3 X 3i + + β k X ki + u i, i = 1,, n (1.1) or Y i = β j X ji + u
More informationLecture 32: Asymptotic confidence sets and likelihoods
Lecture 32: Asymptotic confidence sets and likelihoods Asymptotic criterion In some problems, especially in nonparametric problems, it is difficult to find a reasonable confidence set with a given confidence
More informationMaximum Likelihood Estimation; Robust Maximum Likelihood; Missing Data with Maximum Likelihood
Maximum Likelihood Estimation; Robust Maximum Likelihood; Missing Data with Maximum Likelihood PRE 906: Structural Equation Modeling Lecture #3 February 4, 2015 PRE 906, SEM: Estimation Today s Class An
More informationComposite Hypotheses and Generalized Likelihood Ratio Tests
Composite Hypotheses and Generalized Likelihood Ratio Tests Rebecca Willett, 06 In many real world problems, it is difficult to precisely specify probability distributions. Our models for data may involve
More informationSTA216: Generalized Linear Models. Lecture 1. Review and Introduction
STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general
More informationSTAC51: Categorical data Analysis
STAC51: Categorical data Analysis Mahinda Samarakoon April 6, 2016 Mahinda Samarakoon STAC51: Categorical data Analysis 1 / 25 Table of contents 1 Building and applying logistic regression models (Chap
More informationHigh-dimensional regression:
High-dimensional regression: How to pick the objective function in high-dimension UC Berkeley March 11, 2013 Joint work with Noureddine El Karoui, Peter Bickel, Chingwhay Lim, and Bin Yu 1 / 12 Notation.
More informationModeling Real Estate Data using Quantile Regression
Modeling Real Estate Data using Semiparametric Quantile Regression Department of Statistics University of Innsbruck September 9th, 2011 Overview 1 Application: 2 3 4 Hedonic regression data for house prices
More informationStat 5102 Final Exam May 14, 2015
Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions
More informationExercise 5.4 Solution
Exercise 5.4 Solution Niels Richard Hansen University of Copenhagen May 7, 2010 1 5.4(a) > leukemia
More informationMSH3 Generalized linear model
Contents MSH3 Generalized linear model 5 Logit Models for Binary Data 173 5.1 The Bernoulli and binomial distributions......... 173 5.1.1 Mean, variance and higher order moments.... 173 5.1.2 Normal limit....................
More informationM-Estimation under High-Dimensional Asymptotics
M-Estimation under High-Dimensional Asymptotics 2014-05-01 Classical M-estimation Big Data M-estimation An out-of-the-park grand-slam home run Annals of Mathematical Statistics 1964 Richard Olshen Classical
More informationChapter 22: Log-linear regression for Poisson counts
Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure
More informationPart 6: Multivariate Normal and Linear Models
Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of
More informationLogistic Regression. Some slides from Craig Burkett. STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy
Logistic Regression Some slides from Craig Burkett STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy Titanic Survival Case Study The RMS Titanic A British passenger liner Collided
More information1 Exercises for lecture 1
1 Exercises for lecture 1 Exercise 1 a) Show that if F is symmetric with respect to µ, and E( X )
More informationGauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA
JAPANESE BEETLE DATA 6 MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA Gauge Plots TuscaroraLisa Central Madsen Fairways, 996 January 9, 7 Grubs Adult Activity Grub Counts 6 8 Organic Matter
More informationNATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )
NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3
More informationSCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models
SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION
More information2. We care about proportion for categorical variable, but average for numerical one.
Probit Model 1. We apply Probit model to Bank data. The dependent variable is deny, a dummy variable equaling one if a mortgage application is denied, and equaling zero if accepted. The key regressor is
More informationBIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation
BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)
More informationLecture 12: Effect modification, and confounding in logistic regression
Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression
More informationLOGISTIC REGRESSION Joseph M. Hilbe
LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of
More informationMath 423/533: The Main Theoretical Topics
Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)
More information9 Generalized Linear Models
9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models
More informationMatched Pair Data. Stat 557 Heike Hofmann
Matched Pair Data Stat 557 Heike Hofmann Outline Marginal Homogeneity - review Binary Response with covariates Ordinal response Symmetric Models Subject-specific vs Marginal Model conditional logistic
More informationGeneral Regression Model
Scott S. Emerson, M.D., Ph.D. Department of Biostatistics, University of Washington, Seattle, WA 98195, USA January 5, 2015 Abstract Regression analysis can be viewed as an extension of two sample statistical
More informationMSH3 Generalized linear model Ch. 6 Count data models
Contents MSH3 Generalized linear model Ch. 6 Count data models 6 Count data model 208 6.1 Introduction: The Children Ever Born Data....... 208 6.2 The Poisson Distribution................. 210 6.3 Log-Linear
More informationSolving Corrupted Quadratic Equations, Provably
Solving Corrupted Quadratic Equations, Provably Yuejie Chi London Workshop on Sparse Signal Processing September 206 Acknowledgement Joint work with Yuanxin Li (OSU), Huishuai Zhuang (Syracuse) and Yingbin
More informationFigure 36: Respiratory infection versus time for the first 49 children.
y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects
More informationThe Projected Power Method: An Efficient Algorithm for Joint Alignment from Pairwise Differences
The Projected Power Method: An Efficient Algorithm for Joint Alignment from Pairwise Differences Yuxin Chen Emmanuel Candès Department of Statistics, Stanford University, Sep. 2016 Nonconvex optimization
More informationSTAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song
STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song Presenter: Jiwei Zhao Department of Statistics University of Wisconsin Madison April
More informationIntroduction to Error Analysis
Introduction to Error Analysis Part 1: the Basics Andrei Gritsan based on lectures by Petar Maksimović February 1, 2010 Overview Definitions Reporting results and rounding Accuracy vs precision systematic
More informationLinear Methods for Prediction
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationLogistic Regression - problem 6.14
Logistic Regression - problem 6.14 Let x 1, x 2,, x m be given values of an input variable x and let Y 1,, Y m be independent binomial random variables whose distributions depend on the corresponding values
More informationSpectral Method and Regularized MLE Are Both Optimal for Top-K Ranking
Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking Yuxin Chen Electrical Engineering, Princeton University Joint work with Jianqing Fan, Cong Ma and Kaizheng Wang Ranking A fundamental
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationLecture 3. Inference about multivariate normal distribution
Lecture 3. Inference about multivariate normal distribution 3.1 Point and Interval Estimation Let X 1,..., X n be i.i.d. N p (µ, Σ). We are interested in evaluation of the maximum likelihood estimates
More informationAge 55 (x = 1) Age < 55 (x = 0)
Logistic Regression with a Single Dichotomous Predictor EXAMPLE: Consider the data in the file CHDcsv Instead of examining the relationship between the continuous variable age and the presence or absence
More informationRegression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples.
Regression models Generalized linear models in R Dr Peter K Dunn http://www.usq.edu.au Department of Mathematics and Computing University of Southern Queensland ASC, July 00 The usual linear regression
More informationRisk and Noise Estimation in High Dimensional Statistics via State Evolution
Risk and Noise Estimation in High Dimensional Statistics via State Evolution Mohsen Bayati Stanford University Joint work with Jose Bento, Murat Erdogdu, Marc Lelarge, and Andrea Montanari Statistical
More informationNaive Bayes and Gaussian Bayes Classifier
Naive Bayes and Gaussian Bayes Classifier Mengye Ren mren@cs.toronto.edu October 18, 2015 Mengye Ren Naive Bayes and Gaussian Bayes Classifier October 18, 2015 1 / 21 Naive Bayes Bayes Rules: Naive Bayes
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION
COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:
More informationGeneralized Linear Models
York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear
More informationChapter 11. Regression with a Binary Dependent Variable
Chapter 11 Regression with a Binary Dependent Variable 2 Regression with a Binary Dependent Variable (SW Chapter 11) So far the dependent variable (Y) has been continuous: district-wide average test score
More informationMIT Spring 2016
Generalized Linear Models MIT 18.655 Dr. Kempthorne Spring 2016 1 Outline Generalized Linear Models 1 Generalized Linear Models 2 Generalized Linear Model Data: (y i, x i ), i = 1,..., n where y i : response
More informationNaive Bayes and Gaussian Bayes Classifier
Naive Bayes and Gaussian Bayes Classifier Ladislav Rampasek slides by Mengye Ren and others February 22, 2016 Naive Bayes and Gaussian Bayes Classifier February 22, 2016 1 / 21 Naive Bayes Bayes Rule:
More informationApplied Economics. Regression with a Binary Dependent Variable. Department of Economics Universidad Carlos III de Madrid
Applied Economics Regression with a Binary Dependent Variable Department of Economics Universidad Carlos III de Madrid See Stock and Watson (chapter 11) 1 / 28 Binary Dependent Variables: What is Different?
More informationNaive Bayes and Gaussian Bayes Classifier
Naive Bayes and Gaussian Bayes Classifier Elias Tragas tragas@cs.toronto.edu October 3, 2016 Elias Tragas Naive Bayes and Gaussian Bayes Classifier October 3, 2016 1 / 23 Naive Bayes Bayes Rules: Naive
More informationMaster s Written Examination
Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth
More informationHypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3
Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest
More informationSupport Vector Machines
Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot, we get creative in two
More informationGeneralized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence
Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Sunil Kumar Dhar Center for Applied Mathematics and Statistics, Department of Mathematical Sciences, New Jersey
More information12 Modelling Binomial Response Data
c 2005, Anthony C. Brooms Statistical Modelling and Data Analysis 12 Modelling Binomial Response Data 12.1 Examples of Binary Response Data Binary response data arise when an observation on an individual
More information