Unsupervised Regressive Learning in High-dimensional Space


1 Unsupervised Regressive Learning in High-dimensional Space. University of Kent. ATRC, Leicester, 31st July 2018.

2 Outline
- Data linkage analysis
- High dimensionality and variable screening
- Variable screening and mixture models
- EPD mixture regression models
- EPD mixture-based variable selection
- Simulation studies
- Conclusion

3 Data linkage analysis
Data: telematic devices (time-dependent measurements), credit reports (discrete), satellite data (time series), genetic data, and historical records from policy administration systems. Insurers are showing increasing interest in using data linkage to improve their pricing accuracy and to facilitate more effective loss prevention; see the Policy Briefing from the IFoA (2017). However, the utility of data linkage can be compromised by the high dimensionality, heterogeneity, and heavy distribution tails of these data.

4-6 High dimensionality and variable screening
Consider
$y_i = \sum_{j=1}^{p} x_{ij}\beta_j + \varepsilon_i, \quad 1 \le i \le n,$
where the $\varepsilon_i$ are i.i.d. $N(0, 1)$ and there are many more variables than the sample size ($p \gg n$).

7-8 High dimensionality and variable screening
LASSO: coefficients estimated by $L_1$-penalised least squares.
Correlation screening: to screen variables, for each $j$ we single out the $j$-th covariate and rewrite the above equation as
$y_i = x_{ij}\beta_j + \tilde{\varepsilon}_i, \quad \text{with } \tilde{\varepsilon}_i = \sum_{t \ne j} x_{it}\beta_t + \varepsilon_i, \quad 1 \le i \le n.$
This gives rise to what is called correlation variable screening.
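
A minimal sketch of the correlation screening step in Python (the function name is mine, not from the slides): rank covariates by absolute marginal correlation with the response and keep the top $d$.

```python
import numpy as np

def correlation_screen(X, y, d):
    """Rank covariates by |marginal correlation| with y; keep the top d."""
    Xc = X - X.mean(axis=0)                     # centre each covariate
    yc = y - y.mean()
    corr = Xc.T @ yc / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr))[:d]        # indices of retained covariates
```

A common choice in the screening literature is to retain on the order of $n/\log n$ variables.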

9-12 Variable screening and mixture models
In general, LASSO is not efficient when the $y_i$ are heterogeneous. Correlation variable screening is also not efficient
- if the covariate observations have a group structure in which the $\varepsilon_i$ are heterogeneously distributed, or
- if $\{x_{it} : 1 \le t \le p\}$ are heavy-tailed.
To address these issues, we first consider a family of distributions for the $y_i$: the exponential power distribution (EPD).

13 EPD
$\phi(y \mid \mu, \sigma, \alpha) = \frac{\alpha}{2\sigma\Gamma(1/\alpha)} \exp\!\left(-\frac{|y - \mu|^{\alpha}}{\sigma^{\alpha}}\right),$
where $\mu \in (-\infty, \infty)$, $\alpha > 0$ and $\sigma > 0$. The EPD reduces to the normal distribution when $\alpha = 2$ and to the Laplace distribution when $\alpha = 1$.
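
The density on this slide translates directly into code; a short sketch using SciPy's gamma function:

```python
import numpy as np
from scipy.special import gamma

def epd_pdf(y, mu, sigma, alpha):
    """EPD density: alpha = 2 recovers a normal density, alpha = 1 a Laplace."""
    z = np.abs(y - mu) / sigma
    return alpha / (2.0 * sigma * gamma(1.0 / alpha)) * np.exp(-z ** alpha)
```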

14-17 EPD mixture regression models
Let $(y_i, x_i)$, $i = 1, \ldots, n$, be independent observations on a response $y$ and a $p$-dimensional covariate $x$. We then consider
$f(y_i \mid x_i, \Theta_K) = \sum_{k=1}^{K} \pi_k\, \phi(y_i \mid x_i^T \beta_k, \sigma_k^2, \alpha_k),$
where $\Theta_K$ denotes the set of all the parameters, $\phi(y_i \mid x_i^T \beta_k, \sigma_k^2, \alpha_k)$ is the $k$-th component density with regression coefficients $\beta_k = (\beta_{k1}, \ldots, \beta_{kp})^T \in \mathbb{R}^p$, scale $\sigma_k^2 \in (0, \infty)$, shape $\alpha_k \in (0, \infty)$, mixing proportion $\pi_k \ge 0$, and $\sum_{k=1}^{K} \pi_k = 1$.
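
A sketch of the mixture density, reusing epd_pdf from the sketch above. The slides parametrise each component by $\sigma_k^2$, so passing $\sqrt{\sigma_k^2}$ as the scale is an assumed convention here:

```python
import numpy as np

def epd_mixture_pdf(y, X, pi, beta, sigma2, alpha):
    """f(y_i | x_i, Theta_K); shapes: X (n, p), beta (K, p), pi/sigma2/alpha (K,)."""
    dens = np.zeros(len(y))
    for k in range(len(pi)):
        dens += pi[k] * epd_pdf(y, X @ beta[k], np.sqrt(sigma2[k]), alpha[k])
    return dens
```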

18-19 EPD mixture-based variable selection
The new proposal: for each observation, we first construct the penalized likelihood and then combine these likelihoods by a component-wise weighting:
$\mathrm{pl}_n(\Theta_K \mid (y_i, x_i)) = \sum_{k=1}^{K} \pi_k\, \phi(y_i \mid x_i^T \beta_k, \sigma_k^2, \alpha_k)\, \exp\!\left\{-\left(\lambda \|\beta_k\|_1 + \kappa_0\, \frac{|1 - \sigma_k|}{\sigma_k^{2/n}}\right)\right\}.$
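
A sketch of the per-observation penalized likelihood. The exact penalty is reconstructed from a garbled formula: the $|1-\sigma_k|/\sigma_k^{2/n}$ term, which shrinks each scale towards 1, is my best reading, so treat the form as an assumption:

```python
import numpy as np

def pl_obs(y_i, x_i, pi, beta, sigma, alpha, lam, kappa0, n):
    """pl_n(Theta_K | (y_i, x_i)): component densities damped by an L1
    penalty on beta_k and a penalty shrinking sigma_k towards 1."""
    val = 0.0
    for k in range(len(pi)):
        dens = epd_pdf(y_i, x_i @ beta[k], sigma[k], alpha[k])
        pen = np.exp(-(lam * np.abs(beta[k]).sum()
                       + kappa0 * abs(1.0 - sigma[k]) / sigma[k] ** (2.0 / n)))
        val += pi[k] * dens * pen
    return val
```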

20-21 EPD mixture-based variable selection
The proposed penalized likelihood:
$\mathrm{pl}_n(\Theta_K \mid (Y, X)) = \prod_{i=1}^{n} \mathrm{pl}_n(\Theta_K \mid (y_i, x_i)) \prod_{k=1}^{K} \pi_k^{\delta_k},$
where $\delta_k$, $k = 1, \ldots, K$, are pre-specified constants with default $\delta_k = 1/K$. The number of components $K$ is chosen by minimising a BIC with respect to $1 \le K \le K_n$ and $\lambda_0 \le \lambda \le \lambda_1$. The advantage of the new proposal over the existing one lies in its computation and in the convergence of the generalized EM (GEM) algorithm.
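
The BIC-based choice of $(K, \lambda)$ amounts to a grid search; a sketch, where fit_gem is a hypothetical GEM fitter (not from the slides) returning the maximised log-likelihood and the number of free parameters:

```python
import numpy as np

def select_by_bic(y, X, K_max, lambdas, fit_gem):
    """Minimise BIC over 1 <= K <= K_max and lambda in `lambdas`."""
    best = (np.inf, None, None)
    for K in range(1, K_max + 1):
        for lam in lambdas:
            loglik, n_par = fit_gem(y, X, K, lam)
            bic = -2.0 * loglik + n_par * np.log(len(y))
            if bic < best[0]:
                best = (bic, K, lam)
    return best   # (BIC value, chosen K, chosen lambda)
```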

22-26 Simulation studies
We considered the following four screening procedures in the simulation studies ($\lambda = 0$):
- Correlation learning, i.e. simple Gaussian linear regression (GAU1)
- Simple EPD linear regression (EPD1)
- Simple Gaussian mixture regression (GAUMIX, BIC-based)
- Simple EPD mixture regression (EPDMIX, BIC-based)
We compare how well GAU1, EPD1, GAUMIX, and EPDMIX screen out non-active covariates in the model, in terms of specificity and sensitivity.
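
The two performance measures reduce to simple set arithmetic; a sketch (function and argument names are mine):

```python
def screening_metrics(selected, active, p):
    """Sensitivity: proportion of active covariates retained.
    Specificity: proportion of the p - |active| non-active covariates screened out."""
    selected, active = set(selected), set(active)
    sens = len(selected & active) / len(active)
    spec = (p - len(active) - len(selected - active)) / (p - len(active))
    return sens, spec
```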

27-28 Simulation studies
Setting 4.1.1 (multiple linear regression): we generated 40 datasets with sample size $n$ and dimension $p$. Each dataset contained observations $(y_i, x_{ij})$, $1 \le j \le p$, $1 \le i \le n$, satisfying
$y_i = \sum_{j=1}^{p} x_{ij}\beta_{0j} + \varepsilon_i,$
where the $\varepsilon_i$, $1 \le i \le n$, were i.i.d. $N(0, 1)$ and the regression coefficients were $\beta_0 = (2 + \eta_1, \eta_2, \eta_3, \eta_4, \eta_5, \mathbf{0}_{p-5}^T)^T$, with the $\eta_j$, $1 \le j \le 5$, i.i.d. $N(0, \cdot)$ and $\mathbf{0}_{p-5}$ a $(p-5)$-vector of zeros. There were five active covariates in the model.
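
A sketch of the data generator for Setting 4.1.1. The $N(0, \cdot)$ variance of the $\eta_j$ was lost in transcription, so eta_sd is a placeholder, and the standard-normal covariate design is an assumption:

```python
import numpy as np

def simulate_setting_411(n, p, eta_sd=1.0, rng=None):
    """One dataset: y_i = x_i' beta_0 + eps_i with 5 active covariates."""
    rng = np.random.default_rng() if rng is None else rng
    eta = rng.normal(0.0, eta_sd, size=5)
    beta0 = np.concatenate(([2.0 + eta[0]], eta[1:], np.zeros(p - 5)))
    X = rng.normal(size=(n, p))                 # assumed covariate design
    y = X @ beta0 + rng.normal(size=n)          # eps_i i.i.d. N(0, 1)
    return X, y, beta0
```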

29-30 Simulations
Setting 4.1.2 (Gaussian mixture regression): we generated 40 datasets with sample size $n$, dimension $p$ and $K_0$ components. Given $x_i = (x_{i1}, \ldots, x_{ip})^T$, the $y_i$, $1 \le i \le n$, were independently sampled from
$f(y_i) = \sum_{k=1}^{K_0} \pi_k\, \phi(y_i - x_i^T \beta_k),$
where $\phi(\cdot)$ is the density of the standard normal distribution.

31-33 Simulations
We considered two cases of $K_0$:
(1) $K_0 = 2$, with two components
$\beta_1 = (2 + v_{11}, v_{12}, v_{13}, v_{14}, v_{15}, \mathbf{0}_{p-5}^T)^T$,
$\beta_2 = (0, 0, 0, 4 + v_{21}, 4 + v_{22}, 4 + v_{23}, 4 + v_{24}, 4 + v_{25}, \mathbf{0}_{p-8}^T)^T$;
(2) $K_0 = 3$, with three components
$\beta_1 = (2 + v_{11}, v_{12}, v_{13}, v_{14}, v_{15}, \mathbf{0}_{p-5}^T)^T$,
$\beta_2 = (0, 0, 0, 4 + v_{21}, 4 + v_{22}, 4 + v_{23}, 4 + v_{24}, 4 + v_{25}, \mathbf{0}_{p-8}^T)^T$,
$\beta_3 = (0, 0, 0, 0, 0, 0, 4 + v_{31}, 4 + v_{32}, \mathbf{0}_{p-8}^T)^T$.
Here the $v_{kj}$ are i.i.d. $N(0, \cdot)$ and $\mathbf{0}_m$ denotes an $m$-vector of zeros. For each case of $K_0$, we considered $(n, p) = (300, 400)$ and $(500, 600)$.
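
A sketch of the $K_0 = 2$ generator. The mixing proportions, the $v_{kj}$ variance, and the covariate distribution are not given on the slides, so pi, v_sd, and the standard-normal design are placeholders:

```python
import numpy as np

def simulate_setting_412(n, p, pi=(0.5, 0.5), v_sd=1.0, rng=None):
    """One dataset from the two-component Gaussian mixture regression."""
    rng = np.random.default_rng() if rng is None else rng
    v1 = rng.normal(0.0, v_sd, size=5)
    v2 = rng.normal(0.0, v_sd, size=5)
    beta1 = np.concatenate(([2.0 + v1[0]], v1[1:], np.zeros(p - 5)))
    beta2 = np.concatenate((np.zeros(3), 4.0 + v2, np.zeros(p - 8)))
    X = rng.normal(size=(n, p))
    z = rng.choice(2, size=n, p=list(pi))       # latent component labels
    y = np.where(z == 0, X @ beta1, X @ beta2) + rng.normal(size=n)
    return X, y, z
```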

34 Table: percentage increase of average specificity relative to GAU1 in variable screening, Setting 4.1.1 (single component).
Rows: GAU1, EPD1, EPDMIX, GAUMIX; columns: sensitivity levels 5/5, 4/5, 3/5, 2/5, 1/5; panels: $(n, p) = (500, 600)$ and $(n, p) = (100, 2000)$. (The numerical entries were lost in transcription.)

35 Table: percentage increase of average specificity relative to GAU1 in variable screening, Setting 4.1.2 (multiple components).
Rows: GAU1, EPD1, EPDMIX, GAUMIX; columns: sensitivity levels 8/8, 7/8, ..., 1/8; panels: two components with $(n, p) = (300, 400)$ and $(n, p) = (500, 600)$. (The numerical entries were lost in transcription.)

36 Table: percentage increase of average specificity relative to GAU1 in variable screening, Setting 4.1.2 (multiple components).
Rows: GAU1, EPD1, EPDMIX, GAUMIX; columns: sensitivity levels 8/8, 7/8, ..., 1/8; panels: three components with $(n, p) = (300, 400)$ and $(n, p) = (500, 600)$. (The numerical entries were lost in transcription.)

37-39 Conclusion
We have proposed a new approach to unsupervised regressive learning. Simulation studies show that the proposal outperforms the existing procedures, and a comparison with the LASSO also favoured our approach.

40 Thank you!
