Model selection in penalized Gaussian graphical models
|
|
- Georgia Conley
- 5 years ago
- Views:
Transcription
1 University of Groningen ernst January 2014
2 Penalized likelihood generates a PATH of solutions Consider an experiment: Γ genes measured across T time points. Assume n iid samples y (1),..., y (n), where y (i) = (y (i) 1,..., y (i) ΓT ). Assume Y (i) N(0, K 1 ), then Likelihood: l(k y) n {log( K ) tr(sk)}. 2 AIM: Optimization of penalized likelihood: subject to K 0; ˆK λ := argmax K {l(k y)} K 1 1/λ... for λ in some range!!; some factorial colouring F.
3 Between proximity and truth True process for data Y : Y Q. A statistical model is a collection of measures: M i = {P θ θ Θ i }... and typically we consider several: M = {M 1,..., M k }. What is the best model? Proximity : the model that is closest to the truth: min i k KL(Pˆθi ; Q). Truth : the model that is most likely to be the truth: max i k P(M i Y ).
4 Truth With flat prior on Θ M and M, model probability for M M is: p(m y) p(y M)/p(y) e n l(θ) θ Θ M e l(ˆθ) Θ M e 1 2 (θ ˆθ) t n 2 θ 2 l(ˆθ)(θ ˆθ) θ e l(ˆθ) (2π) p/2 n p/2 2 θ l(ˆθ) 2 where p = dim(θ M ) and l(ˆθ) = l(ˆθ)/n. Schwarz (1978) ignored terms not depending on n: p(m y) e l(ˆθ) n p/2, for large n. 1/2,
5 BIC: Bayesian information criterion By applying the log and ignoring constant terms c: BIC(M) = 2l(ˆθ) + p log(n). Minimizing BIC corresponds to maximizing posterior model probability. Define model weights W (M), W (M) = e BIC(M)/2, which rescaled correspond to posterior model probabilities, p(m i y) = W (M i ) M M W (M).
6 BIC for Gaussian graphical models The BIC for an estimated Gaussian graphical model K: BIC( K) = n( log K + Tr(S K)) + log(n), p K where p K = {unique non-zeroes in K}, which is less than p(p 1)/2.
7 BIC for Gaussian graphical models The BIC for an estimated Gaussian graphical model K: BIC( K) = n( log K + Tr(S K)) + p K log(n), where p K = {unique non-zeroes in K}, which is less than p(p 1)/2. But... BIC is an asymptotic approximation. What is n?? For penalized estimate K are number of parameters not smaller??
8 ebic: extended Bayesian information criterion The BIC for an estimated Gaussian graphical model K: ebic γ ( K) = n( log K + Tr(S K)) + log(n) + log(p), p K 4γp K where p K = {unique non-zeroes in K} p = number of nodes γ = tuning paramter(0 γ 1) If γ = 0 ordinary BIC γ = 1 additional sparsity γ = 0.5 good trade-off (Foygel & Drton, 2010)
9 Proximity Let the true density of the data Y be: Y q = dq. The Kullback-Leibler (KL) divergence between fitted and true model KL(ˆθ i ) = q(y) log q(y)dy q(y) log p(y; ˆθ i )dy
10 Proximity Let the true density of the data Y be: Y q = dq. The Kullback-Leibler (KL) divergence between fitted and true model KL(ˆθ i ) = q(y) log q(y)dy q(y) log p(y; ˆθ i )dy = C E q (l(ˆθ i )
11 Proximity Let the true density of the data Y be: Y q = dq. The Kullback-Leibler (KL) divergence between fitted and true model KL(ˆθ i ) = q(y) log q(y)dy q(y) log p(y; ˆθ i )dy = C E q (l(ˆθ i ) C l(ˆθ i ) + p i where p(.; θ) is density ( associated ) with P θ. pi = Trace Ji 1 K i dim(θ i ), where J i and K i are Fisher informations using model M i : ( 2 log f (Y, J i = E ˆθ ) ( ) i ) log(f (Y, ˆθi ) g θ θ t, K i = V g. θ
12 AIC: Akaike s information criterion By applying the log and ignoring constant terms c: AIC(M) = 2l(ˆθ) + 2p. Minimizing AIC corresponds to maximizing posterior model probability. Define Akaike weights W (M), W (M) = e AIC(M)/2, which rescaled correspond to probability weights that add up to one, p(m i ) = W (M i ) M M W (M).
13 AIC for Gaussian graphical models The AIC for an estimated Gaussian graphical model K: AIC( K) = n( log K + Tr(S K)) +, 2p K where p K = {unique non-zeroes in K}, which is less than p(p 1)/2.
14 AIC for Gaussian graphical models The AIC for an estimated Gaussian graphical model K: AIC( K) = n( log K + Tr(S K)) + 2p K, where p K = {unique non-zeroes in K}, which is less than p(p 1)/2. But... AIC is an asymptotic approximation. What is n?? For penalized estimate K are number of parameters not smaller?? In the next slides, we propose three alternatives.
15 1. Exact AIC KL(K 0 K) = 1 2 {Tr( KΣ 0 ) log KΣ 0 p} Scaling this by 2 and ignoring a constant 2KL(K 0 K) = {log K Tr( KS)} {Tr( K(Σ 0 S))}. We can write this as 2KL(K 0 K) = 2l( K) {Tr( K(Σ 0 S))}. Definition (Degrees of freedom in Gaussian graphical model) Let Y N(0, K 1 0 ) and K an estimate of K 0 : df K = 1 2 {Tr( K(Σ 0 S))}.
16 Approximate Exact AIC We obviously don t know Σ 0, but we can estimate it: for some tuning paramter π. Σ 0 = πs (1 π)diag{σ 2 11,..., σ 2 pp}, Definition (Approximate Exact AIC) Let Y N(0, K 1 0 ) and K an estimate of K 0 : AIC( K) = 2l( K) + 2 df, where df = 1 2 {Tr( K( Σ 0 S))}.
17 2. Exact GIC Problem of AIC: ˆK λ is not a MLE. GIC for M-estimator ˆK derived by Konishi & Kitagawa (1996): n GIC = 2 l k ( ˆK; x k ) + 2tr{R 1 Q}, (1) k=1 where R and Q are square matrices of order p 2 given by R = 1 n {Dψ(x k, K)} n K= ˆK, k=1 Q = 1 n ψ(x k, K)Dl k (K) n K= ˆK. k=1 Problem: Bias term in (1) requires inversion of d 2 d 2 matrix R.
18 Approximate GIC (Abbruzzo, Vujacic, Wit) We derived an explicit estimator of KL that avoids matrix inversion: where ĜIC(λ) = 2l( ˆK λ ) + 2 df GIC, df GIC = nk=1 c(s k I λ ) c( ˆK λ (S k I λ ) ˆK λ ) 2n where c() is the vectorize operator, S k = x k x T k, I λ = 1 * ( ˆK λ!= 0), an indicator matrix. c(s I λ) c( ˆK λ (S I λ ) ˆK λ ), 2
19 Approximate GIC: Simulations df_kl df_gic df_aic KL GIC AIC λ λ Bias term (left) and KL divergence (right) estimates for n = 100, d = 50.
20 3. Exact Cross-validation (CV) Ignoring an additive constant, the KL divergence is: KL(K 0 K) = 1 2 log K + E Σ0 1 2 {Tr( KXX t )} The idea of cross-validation is to replace the final expectation by KL(K 0 K) 1 2n n {log K (k) + Tr( K (k) x k xk)} t k=1 Clearly, this would require recalculating the estimate K (k) many times.
21 Approximate Cross-validation (Vujacic, Abbruzzo, Wit) We write the KL divergence as follows, KL(K 0 ˆK λ ) = 1 n l( ˆK λ ) + bias, where l(k) = n{log K tr(ks)}/2 and bias = tr( ˆK λ (K 1 0 S))/2. Definition LOOCV-inspired estimate of Kullback-Leibler divergence KLCV (λ) = 1 ni=1 n l( ˆK 1 c[( ˆK λ S i ) I λ ] ( λ )+ ˆK λ ˆK λ )c[(s S i ) I λ ], n(n 1) where c() is the vectorize operator, S k = x k x T k, I λ = 1 * ( ˆK λ!= 0), an indicator matrix.
22 Proof LOOCV = 1 n f (S i, ˆK ( i) ) 2n i=1 = 1 2 f (S, ˆK) 1 n [f (S i, ˆK ( i) ) f (S i, ˆK)] 2n i=1 1 n l( ˆK) 1 n [ df (Si, ˆK) ] c( ˆK ( i) ˆK). 2n dω i=1 Matrix differential calculus: df (S i, ˆK)/dΩ = c( ˆK 1 S i ). The term c( ˆK ( i) ˆK) is obtained via the Taylor expansion 0 df (S, ˆK) dω + d 2 f (S, ˆK) dω 2 c( ˆK ( i) ˆK) + d 2 f (S, ˆK) dωds c(s( i) S). Inserting: df (S, ˆK)/dΩ = c( ˆK 1 S), d 2 f (S, ˆK)/dΩdS = I p 2, d 2 f (S, ˆK)/dΩ 2 = ˆK 1 ˆK 1 and consequently c( ˆK ( i) ˆK) = ( ˆK ˆK)c(S ( i) S).
23 KLCV simulation results d=100 KL ORACLE KLCV AIC GACV n= (0.37) (0.45) (0.28) (19.94) n= (0.34) (0.39) (0.41) (2.77) n= (0.27) (0.33) (0.81) (1.40) n= (0.19) (0.23) (0.48) (0.52) n= (0.07) (0.08) (0.08) (0.07 )
24 Conclusions BIC aims to find true model AIC aims to come closest to the truth BIC gives sparser model than AIC (typically) AIC/BIC have asymptotic issues. What are number of parameters for penalized inference? Improved versions are available!
Lecture 6: Model Checking and Selection
Lecture 6: Model Checking and Selection Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de May 27, 2014 Model selection We often have multiple modeling choices that are equally sensible: M 1,, M T. Which
More informationMISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30
MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD Copyright c 2012 (Iowa State University) Statistics 511 1 / 30 INFORMATION CRITERIA Akaike s Information criterion is given by AIC = 2l(ˆθ) + 2k, where l(ˆθ)
More informationModel comparison and selection
BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)
More informationStatistics 202: Data Mining. c Jonathan Taylor. Model-based clustering Based in part on slides from textbook, slides of Susan Holmes.
Model-based clustering Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Model-based clustering General approach Choose a type of mixture model (e.g. multivariate Normal)
More informationSTAT 100C: Linear models
STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 21 Model selection Choosing the best model among a collection of models {M 1, M 2..., M N }. What is a good model? 1. fits the data well (model
More informationInference of Gaussian graphical models and ordinary differential equations Vujacic, Ivan
University of Groningen Inference of Gaussian graphical models and ordinary differential equations Vujacic, Ivan IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if
More informationBias Correction of Cross-Validation Criterion Based on Kullback-Leibler Information under a General Condition
Bias Correction of Cross-Validation Criterion Based on Kullback-Leibler Information under a General Condition Hirokazu Yanagihara 1, Tetsuji Tonda 2 and Chieko Matsumoto 3 1 Department of Social Systems
More information9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures
FE661 - Statistical Methods for Financial Engineering 9. Model Selection Jitkomut Songsiri statistical models overview of model selection information criteria goodness-of-fit measures 9-1 Statistical models
More informationInference in non-linear time series
Intro LS MLE Other Erik Lindström Centre for Mathematical Sciences Lund University LU/LTH & DTU Intro LS MLE Other General Properties Popular estimatiors Overview Introduction General Properties Estimators
More informationHigh-dimensional Covariance Estimation Based On Gaussian Graphical Models
High-dimensional Covariance Estimation Based On Gaussian Graphical Models Shuheng Zhou, Philipp Rutimann, Min Xu and Peter Buhlmann February 3, 2012 Problem definition Want to estimate the covariance matrix
More informationFoundations of Statistical Inference
Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 32 Lecture 14 : Variational Bayes
More informationFeature selection with high-dimensional data: criteria and Proc. Procedures
Feature selection with high-dimensional data: criteria and Procedures Zehua Chen Department of Statistics & Applied Probability National University of Singapore Conference in Honour of Grace Wahba, June
More informationBayesian Asymptotics
BS2 Statistical Inference, Lecture 8, Hilary Term 2008 May 7, 2008 The univariate case The multivariate case For large λ we have the approximation I = b a e λg(y) h(y) dy = e λg(y ) h(y ) 2π λg (y ) {
More informationStatistics 203: Introduction to Regression and Analysis of Variance Course review
Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying
More informationAkaike Information Criterion
Akaike Information Criterion Shuhua Hu Center for Research in Scientific Computation North Carolina State University Raleigh, NC February 7, 2012-1- background Background Model statistical model: Y j =
More informationEstimation and Model Selection in Mixed Effects Models Part I. Adeline Samson 1
Estimation and Model Selection in Mixed Effects Models Part I Adeline Samson 1 1 University Paris Descartes Summer school 2009 - Lipari, Italy These slides are based on Marc Lavielle s slides Outline 1
More informationNonconcave Penalized Likelihood with A Diverging Number of Parameters
Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized
More informationAll models are wrong but some are useful. George Box (1979)
All models are wrong but some are useful. George Box (1979) The problem of model selection is overrun by a serious difficulty: even if a criterion could be settled on to determine optimality, it is hard
More informationChapter 7: Model Assessment and Selection
Chapter 7: Model Assessment and Selection DD3364 April 20, 2012 Introduction Regression: Review of our problem Have target variable Y to estimate from a vector of inputs X. A prediction model ˆf(X) has
More informationThe Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models
The Behaviour of the Akaike Information Criterion when Applied to Non-nested Sequences of Models Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population Health
More informationModel Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model
Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population
More informationMinimum Description Length (MDL)
Minimum Description Length (MDL) Lyle Ungar AIC Akaike Information Criterion BIC Bayesian Information Criterion RIC Risk Inflation Criterion MDL u Sender and receiver both know X u Want to send y using
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More informationModel selection using penalty function criteria
Model selection using penalty function criteria Laimonis Kavalieris University of Otago Dunedin, New Zealand Econometrics, Time Series Analysis, and Systems Theory Wien, June 18 20 Outline Classes of models.
More informationLasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices
Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,
More informationExtended Bayesian Information Criteria for Gaussian Graphical Models
Extended Bayesian Information Criteria for Gaussian Graphical Models Rina Foygel University of Chicago rina@uchicago.edu Mathias Drton University of Chicago drton@uchicago.edu Abstract Gaussian graphical
More informationShrinkage Tuning Parameter Selection in Precision Matrices Estimation
arxiv:0909.1123v1 [stat.me] 7 Sep 2009 Shrinkage Tuning Parameter Selection in Precision Matrices Estimation Heng Lian Division of Mathematical Sciences School of Physical and Mathematical Sciences Nanyang
More informationGenerative and Discriminative Approaches to Graphical Models CMSC Topics in AI
Generative and Discriminative Approaches to Graphical Models CMSC 35900 Topics in AI Lecture 2 Yasemin Altun January 26, 2007 Review of Inference on Graphical Models Elimination algorithm finds single
More informationMemoirs of the Faculty of Engineering, Okayama University, Vol. 42, pp , January Geometric BIC. Kenichi KANATANI
Memoirs of the Faculty of Engineering, Okayama University, Vol. 4, pp. 10-17, January 008 Geometric BIC Kenichi KANATANI Department of Computer Science, Okayama University Okayama 700-8530 Japan (Received
More informationBayesian methods in economics and finance
1/26 Bayesian methods in economics and finance Linear regression: Bayesian model selection and sparsity priors Linear Regression 2/26 Linear regression Model for relationship between (several) independent
More informationVariable selection for model-based clustering
Variable selection for model-based clustering Matthieu Marbac (Ensai - Crest) Joint works with: M. Sedki (Univ. Paris-sud) and V. Vandewalle (Univ. Lille 2) The problem Objective: Estimation of a partition
More informationAn Extended BIC for Model Selection
An Extended BIC for Model Selection at the JSM meeting 2007 - Salt Lake City Surajit Ray Boston University (Dept of Mathematics and Statistics) Joint work with James Berger, Duke University; Susie Bayarri,
More informationCovariance function estimation in Gaussian process regression
Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian
More informationBayesian Model Comparison
BS2 Statistical Inference, Lecture 11, Hilary Term 2009 February 26, 2009 Basic result An accurate approximation Asymptotic posterior distribution An integral of form I = b a e λg(y) h(y) dy where h(y)
More informationGraduate Econometrics I: Maximum Likelihood I
Graduate Econometrics I: Maximum Likelihood I Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Maximum Likelihood
More informationMASM22/FMSN30: Linear and Logistic Regression, 7.5 hp FMSN40:... with Data Gathering, 9 hp
Selection criteria Example Methods MASM22/FMSN30: Linear and Logistic Regression, 7.5 hp FMSN40:... with Data Gathering, 9 hp Lecture 5, spring 2018 Model selection tools Mathematical Statistics / Centre
More informationModel Comparison. Course on Bayesian Inference, WTCN, UCL, February Model Comparison. Bayes rule for models. Linear Models. AIC and BIC.
Course on Bayesian Inference, WTCN, UCL, February 2013 A prior distribution over model space p(m) (or hypothesis space ) can be updated to a posterior distribution after observing data y. This is implemented
More informationStructure estimation for Gaussian graphical models
Faculty of Science Structure estimation for Gaussian graphical models Steffen Lauritzen, University of Copenhagen Department of Mathematical Sciences Minikurs TUM 2016 Lecture 3 Slide 1/48 Overview of
More informationICES REPORT Model Misspecification and Plausibility
ICES REPORT 14-21 August 2014 Model Misspecification and Plausibility by Kathryn Farrell and J. Tinsley Odena The Institute for Computational Engineering and Sciences The University of Texas at Austin
More informationModel Fitting and Model Selection. Model Fitting in Astronomy. Model Selection in Astronomy.
Model Fitting and Model Selection G. Jogesh Babu Penn State University http://www.stat.psu.edu/ babu http://astrostatistics.psu.edu Model Fitting Non-linear regression Density (shape) estimation Parameter
More informationLeast Squares Model Averaging. Bruce E. Hansen University of Wisconsin. January 2006 Revised: August 2006
Least Squares Model Averaging Bruce E. Hansen University of Wisconsin January 2006 Revised: August 2006 Introduction This paper developes a model averaging estimator for linear regression. Model averaging
More informationSTAT 740: Testing & Model Selection
STAT 740: Testing & Model Selection Timothy Hanson Department of Statistics, University of South Carolina Stat 740: Statistical Computing 1 / 34 Testing & model choice, likelihood-based A common way to
More informationRegression, Ridge Regression, Lasso
Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.
More informationSTAT Financial Time Series
STAT 6104 - Financial Time Series Chapter 4 - Estimation in the time Domain Chun Yip Yau (CUHK) STAT 6104:Financial Time Series 1 / 46 Agenda 1 Introduction 2 Moment Estimates 3 Autoregressive Models (AR
More informationIEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm
IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.
More informationTheory of Maximum Likelihood Estimation. Konstantin Kashin
Gov 2001 Section 5: Theory of Maximum Likelihood Estimation Konstantin Kashin February 28, 2013 Outline Introduction Likelihood Examples of MLE Variance of MLE Asymptotic Properties What is Statistical
More informationA Very Brief Summary of Statistical Inference, and Examples
A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random
More informationTesting Restrictions and Comparing Models
Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by
More informationStatistics 203: Introduction to Regression and Analysis of Variance Penalized models
Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Jonathan Taylor - p. 1/15 Today s class Bias-Variance tradeoff. Penalized regression. Cross-validation. - p. 2/15 Bias-variance
More informationBIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation
BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)
More informationEstimating prediction error in mixed models
Estimating prediction error in mixed models benjamin saefken, thomas kneib georg-august university goettingen sonja greven ludwig-maximilians-university munich 1 / 12 GLMM - Generalized linear mixed models
More informationarxiv: v6 [math.st] 3 Feb 2018
Submitted to the Annals of Statistics HIGH-DIMENSIONAL CONSISTENCY IN SCORE-BASED AND HYBRID STRUCTURE LEARNING arxiv:1507.02608v6 [math.st] 3 Feb 2018 By Preetam Nandy,, Alain Hauser and Marloes H. Maathuis,
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationIntroduction to graphical models: Lecture III
Introduction to graphical models: Lecture III Martin Wainwright UC Berkeley Departments of Statistics, and EECS Martin Wainwright (UC Berkeley) Some introductory lectures January 2013 1 / 25 Introduction
More informationy Xw 2 2 y Xw λ w 2 2
CS 189 Introduction to Machine Learning Spring 2018 Note 4 1 MLE and MAP for Regression (Part I) So far, we ve explored two approaches of the regression framework, Ordinary Least Squares and Ridge Regression:
More informationI. Bayesian econometrics
I. Bayesian econometrics A. Introduction B. Bayesian inference in the univariate regression model C. Statistical decision theory Question: once we ve calculated the posterior distribution, what do we do
More informationApproximate Likelihoods
Approximate Likelihoods Nancy Reid July 28, 2015 Why likelihood? makes probability modelling central l(θ; y) = log f (y; θ) emphasizes the inverse problem of reasoning y θ converts a prior probability
More informationMinimum Message Length Analysis of the Behrens Fisher Problem
Analysis of the Behrens Fisher Problem Enes Makalic and Daniel F Schmidt Centre for MEGA Epidemiology The University of Melbourne Solomonoff 85th Memorial Conference, 2011 Outline Introduction 1 Introduction
More informationCentral Limit Theorem ( 5.3)
Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately
More informationLearning Binary Classifiers for Multi-Class Problem
Research Memorandum No. 1010 September 28, 2006 Learning Binary Classifiers for Multi-Class Problem Shiro Ikeda The Institute of Statistical Mathematics 4-6-7 Minami-Azabu, Minato-ku, Tokyo, 106-8569,
More informationHigh-dimensional covariance estimation based on Gaussian graphical models
High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Empirical Bayes, Hierarchical Bayes Mark Schmidt University of British Columbia Winter 2017 Admin Assignment 5: Due April 10. Project description on Piazza. Final details coming
More informationRegression I: Mean Squared Error and Measuring Quality of Fit
Regression I: Mean Squared Error and Measuring Quality of Fit -Applied Multivariate Analysis- Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving
More informationPosterior Regularization
Posterior Regularization 1 Introduction One of the key challenges in probabilistic structured learning, is the intractability of the posterior distribution, for fast inference. There are numerous methods
More informationMath 533 Extra Hour Material
Math 533 Extra Hour Material A Justification for Regression The Justification for Regression It is well-known that if we want to predict a random quantity Y using some quantity m according to a mean-squared
More informationA Confidence Region Approach to Tuning for Variable Selection
A Confidence Region Approach to Tuning for Variable Selection Funda Gunes and Howard D. Bondell Department of Statistics North Carolina State University Abstract We develop an approach to tuning of penalized
More informationAn Information Criteria for Order-restricted Inference
An Information Criteria for Order-restricted Inference Nan Lin a, Tianqing Liu 2,b, and Baoxue Zhang,2,b a Department of Mathematics, Washington University in Saint Louis, Saint Louis, MO 633, U.S.A. b
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 6: Model complexity scores (v3) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 34 Estimating prediction error 2 / 34 Estimating prediction error We saw how we can estimate
More information14 : Mean Field Assumption
10-708: Probabilistic Graphical Models 10-708, Spring 2018 14 : Mean Field Assumption Lecturer: Kayhan Batmanghelich Scribes: Yao-Hung Hubert Tsai 1 Inferential Problems Can be categorized into three aspects:
More informationPATTERN RECOGNITION AND MACHINE LEARNING
PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality
More informationModel selection criteria in Classification contexts. Gilles Celeux INRIA Futurs (orsay)
Model selection criteria in Classification contexts Gilles Celeux INRIA Futurs (orsay) Cluster analysis Exploratory data analysis tools which aim is to find clusters in a large set of data (many observations
More informationComposite Likelihood Estimation
Composite Likelihood Estimation With application to spatial clustered data Cristiano Varin Wirtschaftsuniversität Wien April 29, 2016 Credits CV, Nancy M Reid and David Firth (2011). An overview of composite
More informationLabel Switching and Its Simple Solutions for Frequentist Mixture Models
Label Switching and Its Simple Solutions for Frequentist Mixture Models Weixin Yao Department of Statistics, Kansas State University, Manhattan, Kansas 66506, U.S.A. wxyao@ksu.edu Abstract The label switching
More informationModel comparison: Deviance-based approaches
Model comparison: Deviance-based approaches Patrick Breheny February 19 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/23 Model comparison Thus far, we have looked at residuals in a fairly
More informationNew Bayesian methods for model comparison
Back to the future New Bayesian methods for model comparison Murray Aitkin murray.aitkin@unimelb.edu.au Department of Mathematics and Statistics The University of Melbourne Australia Bayesian Model Comparison
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationGaussian Graphical Models and Graphical Lasso
ELE 538B: Sparsity, Structure and Inference Gaussian Graphical Models and Graphical Lasso Yuxin Chen Princeton University, Spring 2017 Multivariate Gaussians Consider a random vector x N (0, Σ) with pdf
More informationVariable Selection in Multivariate Linear Regression Models with Fewer Observations than the Dimension
Variable Selection in Multivariate Linear Regression Models with Fewer Observations than the Dimension (Last Modified: November 3 2008) Mariko Yamamura 1, Hirokazu Yanagihara 2 and Muni S. Srivastava 3
More informationApplications of Information Geometry to Hypothesis Testing and Signal Detection
CMCAA 2016 Applications of Information Geometry to Hypothesis Testing and Signal Detection Yongqiang Cheng National University of Defense Technology July 2016 Outline 1. Principles of Information Geometry
More informationLabel switching and its solutions for frequentist mixture models
Journal of Statistical Computation and Simulation ISSN: 0094-9655 (Print) 1563-5163 (Online) Journal homepage: http://www.tandfonline.com/loi/gscs20 Label switching and its solutions for frequentist mixture
More informationA Bayesian view of model complexity
A Bayesian view of model complexity Angelika van der Linde University of Bremen, Germany 1. Intuitions 2. Measures of dependence between observables and parameters 3. Occurrence in predictive model comparison
More informationDIC, AIC, BIC, PPL, MSPE Residuals Predictive residuals
DIC, AIC, BIC, PPL, MSPE Residuals Predictive residuals Overall Measures of GOF Deviance: this measures the overall likelihood of the model given a parameter vector D( θ) = 2 log L( θ) This can be averaged
More informationCS 540: Machine Learning Lecture 2: Review of Probability & Statistics
CS 540: Machine Learning Lecture 2: Review of Probability & Statistics AD January 2008 AD () January 2008 1 / 35 Outline Probability theory (PRML, Section 1.2) Statistics (PRML, Sections 2.1-2.4) AD ()
More informationStatistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach
Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationMinimax Redundancy for Large Alphabets by Analytic Methods
Minimax Redundancy for Large Alphabets by Analytic Methods Wojciech Szpankowski Purdue University W. Lafayette, IN 47907 March 15, 2012 NSF STC Center for Science of Information CISS, Princeton 2012 Joint
More informationTP, TN, FP and FN Tables for dierent methods in dierent parameters:
TP, TN, FP and FN Tables for dierent methods in dierent parameters: Zhu,Yunan January 11, 2015 Notes: The mean of TP/TN/FP/FN shows NaN means the original matrix is a zero matrix True P/N the bigger the
More informationMixtures and Hidden Markov Models for analyzing genomic data
Mixtures and Hidden Markov Models for analyzing genomic data Marie-Laure Martin-Magniette UMR AgroParisTech/INRA Mathématique et Informatique Appliquées, Paris UMR INRA/UEVE ERL CNRS Unité de Recherche
More informationLikelihood-Based Methods
Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)
More informationInformation Theory and Model Selection
Information Theory and Model Selection Dean Foster & Robert Stine Dept of Statistics, Univ of Pennsylvania May 11, 1998 Data compression and coding Duality of code lengths and probabilities Model selection
More informationPaper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001)
Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Presented by Yang Zhao March 5, 2010 1 / 36 Outlines 2 / 36 Motivation
More informationHigh dimensional ising model selection using l 1 -regularized logistic regression
High dimensional ising model selection using l 1 -regularized logistic regression 1 Department of Statistics Pennsylvania State University 597 Presentation 2016 1/29 Outline Introduction 1 Introduction
More informationParametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory
Statistical Inference Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory IP, José Bioucas Dias, IST, 2007
More informationPractical Econometrics. for. Finance and Economics. (Econometrics 2)
Practical Econometrics for Finance and Economics (Econometrics 2) Seppo Pynnönen and Bernd Pape Department of Mathematics and Statistics, University of Vaasa 1. Introduction 1.1 Econometrics Econometrics
More informationCOS513 LECTURE 8 STATISTICAL CONCEPTS
COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationComputer Intensive Methods in Mathematical Statistics
Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of
More informationParametric Techniques Lecture 3
Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to
More informationMidterm Examination. STA 215: Statistical Inference. Due Wednesday, 2006 Mar 8, 1:15 pm
Midterm Examination STA 215: Statistical Inference Due Wednesday, 2006 Mar 8, 1:15 pm This is an open-book take-home examination. You may work on it during any consecutive 24-hour period you like; please
More informationStatistics 262: Intermediate Biostatistics Model selection
Statistics 262: Intermediate Biostatistics Model selection Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Today s class Model selection. Strategies for model selection.
More information