Unsupervised Regressive Learning in High-dimensional Space
- Claude Heath
Unsupervised Regressive Learning in High-dimensional Space. University of Kent, ATRC Leicester, 31 July 2018.
Outline
- Data linkage analysis
- High dimensionality and variable screening
- Variable screening and mixture models
- EPD mixture regression models
- EPD mixture-based variable selection
- Simulation studies
- Conclusion
Data linkage analysis
Data: telematic devices (time-dependent measurements), credit reports (discrete), satellite data (time series), genetic data, and historical records from policy administration systems. Insurers are showing increasing interest in using data linkage to improve pricing accuracy and to facilitate more effective loss prevention; see the Policy Briefing from the IFoA (2017). However, the utility of data linkage can be compromised by the high dimensionality, heterogeneity, and heavy distribution tails of these data.
High dimensionality and variable screening
Consider
$$y_i = \sum_{j=1}^{p} x_{ij}\beta_j + \varepsilon_i, \quad 1 \le i \le n,$$
where the $\varepsilon_i$ are i.i.d. $N(0, 1)$ and there are many more variables than the sample size.
High dimensionality and variable screening
LASSO: coefficients estimated by $L_1$-penalised least squares.
Correlation screening: to screen variables, for each $j$ we single out the $j$-th covariate and rewrite the above equation as
$$y_i = x_{ij}\beta_j + \varepsilon_i^{*}, \quad \text{with } \varepsilon_i^{*} = \sum_{t \ne j} x_{it}\beta_t + \varepsilon_i, \quad 1 \le i \le n.$$
This gives rise to what is called correlation variable screening.
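Correlation screening as described above can be sketched in a few lines: rank the covariates by their marginal sample correlation with the response and keep the top $d$. This is an illustrative sketch, not the talk's implementation; the function name, the cutoff $d$, and the simulated data are my own choices.

```python
import numpy as np

def correlation_screen(X, y, d):
    """Keep the indices of the d covariates most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = Xc.T @ yc / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr))[:d]

rng = np.random.default_rng(0)
n, p = 100, 500                      # many more variables than observations
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 3.0                       # five active covariates
y = X @ beta + rng.normal(size=n)
kept = correlation_screen(X, y, d=20)
print(sorted(kept.tolist()))
```

With a strong signal the five active covariates survive the screen comfortably; the interesting regimes on the later slides are those where they do not.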
Variable screening and mixture models
In general, LASSO is not efficient when $y_i$ is heterogeneous. Correlation variable screening is also not efficient
- if covariate observations have a group structure, where the $\varepsilon_i$ are heterogeneously distributed, or
- if $\{x_{it} : 1 \le t \le p\}$ are heavy-tailed.
To address these issues, we first consider a family of distributions for the $y_i$ called the EPD (exponential power distribution).
EPD
$$\varphi(y \mid \mu, \sigma, \alpha) = \frac{\alpha}{2\sigma\,\Gamma(1/\alpha)} \exp\!\left(-\frac{|y-\mu|^{\alpha}}{\sigma^{\alpha}}\right),$$
where $\mu \in (-\infty, \infty)$, $\alpha > 0$ and $\sigma > 0$. It is normal if $\alpha = 2$ and Laplace if $\alpha = 1$.
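This density has the same parametrization as SciPy's generalized normal distribution, so the two special cases can be checked numerically; the use of `scipy.stats.gennorm` here is my choice for illustration, not something the talk references.

```python
# The EPD above matches scipy's gennorm:
#   pdf(x; alpha, mu, sigma) = alpha / (2*sigma*Gamma(1/alpha)) * exp(-|x-mu|**alpha / sigma**alpha)
import numpy as np
from scipy.stats import gennorm, laplace, norm

x = np.linspace(-3.0, 3.0, 7)

# alpha = 1: exactly the Laplace density with the same location and scale.
assert np.allclose(gennorm.pdf(x, 1.0), laplace.pdf(x))

# alpha = 2: a normal density up to rescaling (here variance 1/2).
assert np.allclose(gennorm.pdf(x, 2.0), norm.pdf(x, scale=np.sqrt(0.5)))

print("EPD special cases verified")
```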
EPD mixture regression models
Let $(y_i, x_i)$, $i = 1, \ldots, n$, be independent observations on a response $y$ and a $p$-dimensional covariate $x$. We then consider
$$f(y_i \mid x_i, \Theta_K) = \sum_{k=1}^{K} \pi_k\, \varphi(y_i \mid x_i^{\mathrm T}\beta_k, \sigma_k^2, \alpha_k),$$
where $\Theta_K$ denotes the set of all the parameters, $\varphi(y_i \mid x_i^{\mathrm T}\beta_k, \sigma_k^2, \alpha_k)$ is the $k$-th component density with regression coefficients $\beta_k = (\beta_{k1}, \ldots, \beta_{kp})^{\mathrm T} \in \mathbb{R}^p$, $\sigma_k^2 \in (0, \infty)$, $\alpha_k \in (0, \infty)$, proportions $\pi_k \ge 0$, and $\sum_{k=1}^{K} \pi_k = 1$.
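As a small sketch (not the authors' code), the mixture density can be evaluated directly with SciPy's `gennorm` as the EPD component density, here treating $\sigma_k$ as the EPD scale; all parameter values below are illustrative choices of mine.

```python
# Evaluate f(y | x) = sum_k pi_k * phi(y | x^T beta_k, sigma_k, alpha_k)
# with phi the EPD density (scipy's gennorm). Parameters are illustrative.
import numpy as np
from scipy.stats import gennorm

def epd_mixture_density(y, x, pis, betas, sigmas, alphas):
    """Density of y given covariate vector x under the EPD mixture."""
    return sum(
        pi * gennorm.pdf(y, a, loc=x @ b, scale=s)
        for pi, b, s, a in zip(pis, betas, sigmas, alphas)
    )

x = np.array([1.0, -0.5])
pis = [0.6, 0.4]
betas = [np.array([2.0, 0.0]), np.array([0.0, 4.0])]
sigmas = [1.0, 1.5]
alphas = [2.0, 1.0]   # one Gaussian-type and one Laplace component
f = epd_mixture_density(0.0, x, pis, betas, sigmas, alphas)
print(round(float(f), 4))
```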
EPD mixture-based variable selection
The new proposal: for each observation, we first construct the penalized likelihood and then combine these likelihoods together by a component-wise weighting:
$$\mathrm{pl}_n(\Theta_K \mid (y_i, x_i)) = \sum_{k=1}^{K} \pi_k\, \varphi(y_i \mid x_i^{\mathrm T}\beta_k, \sigma_k^2, \alpha_k)\, \exp\!\left(-\lambda \|\beta_k\|_1 - \kappa_0\, \frac{|1 - \sigma_k|}{\sigma_k^{2/n}}\right).$$
EPD mixture-based variable selection
The proposed penalized likelihood:
$$\mathrm{pl}_n(\Theta_K \mid (Y, X)) = \prod_{i=1}^{n} \mathrm{pl}_n(\Theta_K \mid (y_i, x_i)) \prod_{k=1}^{K} \pi_k^{\delta_k},$$
where $\delta_k$, $k = 1, \ldots, K$, are pre-specified constants with default $\delta_k = 1/K$. The number of components, $K$, is chosen by minimizing a BIC with respect to $1 \le K \le K_n$ and $\lambda_0 \le \lambda \le \lambda_1$. The advantage of the new proposal over the existing one lies in computation and in the convergence of the GEM algorithm.
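The slides reference a GEM algorithm but do not spell out its updates, so the following is only a rough sketch of a plain EM for the unpenalized ($\lambda = 0$) Gaussian special case of the mixture regression model; the function name, initialization, and toy data are my own assumptions, not the talk's procedure.

```python
# Sketch: EM for a Gaussian mixture of linear regressions (the alpha_k = 2,
# lambda = 0 special case of the model above). Not the authors' GEM.
import numpy as np

def em_mixreg(X, y, K, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pis = np.full(K, 1.0 / K)
    betas = rng.normal(size=(K, p))
    sig2 = np.ones(K)
    for _ in range(n_iter):
        # E-step: posterior responsibilities of each component for each y_i.
        resid = y[:, None] - X @ betas.T                        # (n, K)
        logphi = -0.5 * (np.log(2 * np.pi * sig2) + resid**2 / sig2)
        logw = np.log(pis) + logphi
        logw -= logw.max(axis=1, keepdims=True)
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: proportions, then weighted least squares per component.
        pis = np.clip(w.mean(axis=0), 1e-8, None)
        pis /= pis.sum()
        for k in range(K):
            Wk = w[:, k]
            A = (X.T * Wk) @ X + 1e-8 * np.eye(p)
            betas[k] = np.linalg.solve(A, X.T @ (Wk * y))
            sig2[k] = max((Wk * (y - X @ betas[k])**2).sum() / Wk.sum(), 1e-6)
    return pis, betas, sig2

# Toy data: two regression lines mixed 50/50.
rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
z = rng.random(n) < 0.5
y = np.where(z, 2 + 3 * X[:, 1], -2 - 3 * X[:, 1]) + 0.3 * rng.normal(size=n)
pis, betas, sig2 = em_mixreg(X, y, K=2)
print(np.round(np.sort(betas[:, 1]), 1))
```

A GEM variant only needs the M-step to increase (rather than maximize) the penalized objective, which is what makes the EPD case with the extra penalty terms tractable.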
Simulation studies
We considered the following four screening procedures in the simulation studies ($\lambda = 0$):
- Correlation learning, i.e. simple Gaussian linear regression (GAU1)
- Simple EPD linear regression (EPD1)
- Simple Gaussian mixture regression (GAUMIX, BIC-based)
- Simple EPD mixture regression (EPDMIX, BIC-based)
We compare the performance of GAU1, EPD1, GAUMIX, and EPDMIX in screening out non-active covariates in the model in terms of specificity and sensitivity.
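The comparison criterion is the usual one: sensitivity is the fraction of active covariates a screen retains, and specificity is the fraction of non-active covariates it removes. A small helper makes the definitions concrete (the function and argument names are my own):

```python
import numpy as np

def screening_metrics(selected, active, p):
    """Sensitivity/specificity of a variable screen over covariates 0..p-1."""
    selected, active = set(selected), set(active)
    inactive = set(range(p)) - active
    sensitivity = len(selected & active) / len(active)
    specificity = len(inactive - selected) / len(inactive)
    return sensitivity, specificity

# Example: a screen keeps 5 covariates, 3 of which are truly active.
sens, spec = screening_metrics(selected=[0, 1, 2, 7, 9],
                               active=[0, 1, 2, 3, 4], p=100)
print(sens, spec)   # 3/5 of actives kept; 93/95 of non-actives removed
```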
Simulation studies
Setting (multiple linear regression): we generated 40 datasets with sample size $n$ and dimension $p$. Each dataset contained observations $(y_i, x_{ij})$, $1 \le j \le p$, $1 \le i \le n$, satisfying
$$y_i = \sum_{j=1}^{p} x_{ij}\beta_{0j} + \varepsilon_i,$$
where the $\varepsilon_i$, $1 \le i \le n$, were i.i.d. $N(0, 1)$, and the regression coefficients were $\beta_0 = (2+\eta_1, \eta_2, \eta_3, \eta_4, \eta_5, \mathbf{0}_{p-5}^{\mathrm T})^{\mathrm T}$, where the $\eta_j$, $1 \le j \le 5$, were i.i.d. $N(0, \cdot)$ and $\mathbf{0}_{p-5}$ is a $(p-5)$-vector of zeros. There were five active covariates in the model.
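A data-generating sketch for this single-component setting follows. The variance of the $\eta_j$ perturbations did not survive the transcription, so `ETA_SD` below is an assumed placeholder value, and the covariate distribution (standard normal) is likewise my assumption.

```python
import numpy as np

ETA_SD = 0.1          # assumption: the N(0, .) variance is elided on the slide
rng = np.random.default_rng(0)
n, p = 300, 400

beta0 = np.zeros(p)
beta0[0] = 2.0
beta0[:5] += rng.normal(scale=ETA_SD, size=5)   # five active covariates
X = rng.normal(size=(n, p))                     # assumed covariate law
y = X @ beta0 + rng.normal(size=n)              # epsilon_i ~ N(0, 1)

print(X.shape, y.shape, int((beta0 != 0).sum()))
```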
Simulations
Setting (Gaussian mixture regression): we generated 40 datasets with sample size $n$, dimension $p$ and $K_0$ components. Given $x_i = (x_{i1}, \ldots, x_{ip})^{\mathrm T}$, the $y_i$, $1 \le i \le n$, were independently sampled from
$$f(y_i) = \sum_{k=1}^{K_0} \pi_k\, \phi(y_i - x_i^{\mathrm T}\beta_k),$$
where $\phi(\cdot)$ is the density of the standard normal distribution.
Simulations
We considered two cases of $K_0$:
(1) $K_0 = 2$, where there are two components with
$$\beta_1 = (2+v_1, v_2, v_3, v_4, v_5, \mathbf{0}_{p-5}^{\mathrm T})^{\mathrm T},$$
$$\beta_2 = (0, 0, 0, 4+v_{21}, 4+v_{22}, 4+v_{23}, 4+v_{24}, 4+v_{25}, \mathbf{0}_{p-8}^{\mathrm T})^{\mathrm T},$$
where the $v_j$, $1 \le j \le 5$, are i.i.d. $N(0, \cdot)$ and $\mathbf{0}_{p-5}$ is a $(p-5)$-vector of zeros.
(2) $K_0 = 3$, where there are three components with
$$\beta_1 = (2+v_{11}, v_{12}, v_{13}, v_{14}, v_{15}, \mathbf{0}_{p-5}^{\mathrm T})^{\mathrm T},$$
$$\beta_2 = (0, 0, 0, 4+v_{21}, 4+v_{22}, 4+v_{23}, 4+v_{24}, 4+v_{25}, \mathbf{0}_{p-8}^{\mathrm T})^{\mathrm T},$$
$$\beta_3 = (0, 0, 0, 0, 0, 0, 4+v_{31}, 4+v_{32}, \mathbf{0}_{p-8}^{\mathrm T})^{\mathrm T},$$
where the $v_{kj}$, $1 \le j \le 5$, $k = 1, 2$, and $v_{31}, v_{32}$ are i.i.d. $N(0, \cdot)$, and $\mathbf{0}_{p-8}$ is a $(p-8)$-vector of zeros.
For each case of $K_0$, we considered $(n, p) = (300, 400)$ and $(500, 600)$.
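A sketch of the two-component ($K_0 = 2$) design: each $y_i$ is drawn from one of two regression lines with the coefficient supports given above. The perturbation variance is again elided in the transcription, so `V_SD` is an assumed placeholder, and the equal mixing proportions and standard-normal covariates are my assumptions as well.

```python
import numpy as np

V_SD = 0.1            # assumption: the N(0, .) variance is elided on the slide
rng = np.random.default_rng(0)
n, p = 300, 400

beta1 = np.zeros(p)
beta1[0] = 2.0
beta1[:5] += rng.normal(scale=V_SD, size=5)      # (2+v1, v2, ..., v5, 0, ...)
beta2 = np.zeros(p)
beta2[3:8] = 4.0 + rng.normal(scale=V_SD, size=5)  # (0, 0, 0, 4+v21, ..., 4+v25, 0, ...)

X = rng.normal(size=(n, p))                      # assumed covariate law
z = rng.random(n) < 0.5                          # assumed equal proportions
y = np.where(z, X @ beta1, X @ beta2) + rng.normal(size=n)

active = np.union1d(np.flatnonzero(beta1), np.flatnonzero(beta2))
print(active.tolist())
```

The union of the two supports has eight covariates, which is why the sensitivity levels in the tables below run over eighths.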
Table: Percentage increase of average specificity compared with GAU1 in variable screening, Setting 4.1.1 (single component). Rows: GAU1, EPD1, EPDMIX, GAUMIX for $(n, p) = (500, 600)$ and $(100, 2000)$; columns: sensitivity levels $5/5, 4/5, 3/5, 2/5, 1/5$. [Numeric entries not preserved in the transcription.]
Table: Percentage increase of average specificity compared with GAU1 in variable screening, Setting 4.1.2 (multiple components), two components. Rows: GAU1, EPD1, EPDMIX, GAUMIX for $(n, p) = (300, 400)$ and $(500, 600)$; columns: sensitivity levels $8/8$ down to $1/8$. [Numeric entries not preserved in the transcription.]
Table: Percentage increase of average specificity compared with GAU1 in variable screening, Setting 4.1.2 (multiple components), three components. Rows: GAU1, EPD1, EPDMIX, GAUMIX for $(n, p) = (300, 400)$ and $(500, 600)$; columns: sensitivity levels $8/8$ down to $1/8$. [Numeric entries not preserved in the transcription.]
Conclusion
We have proposed a new approach for unsupervised regressive learning. The proposal has been shown by simulation studies to outperform the existing procedures. A comparison with the LASSO also favored our approach.
Thank you!
More informationNeural Networks - II
Neural Networks - II Henrik I Christensen Robotics & Intelligent Machines @ GT Georgia Institute of Technology, Atlanta, GA 30332-0280 hic@cc.gatech.edu Henrik I Christensen (RIM@GT) Neural Networks 1
More informationGroup exponential penalties for bi-level variable selection
for bi-level variable selection Department of Biostatistics Department of Statistics University of Kentucky July 31, 2011 Introduction In regression, variables can often be thought of as grouped: Indicator
More informationMachine Learning. 7. Logistic and Linear Regression
Sapienza University of Rome, Italy - Machine Learning (27/28) University of Rome La Sapienza Master in Artificial Intelligence and Robotics Machine Learning 7. Logistic and Linear Regression Luca Iocchi,
More informationModeling Data with Linear Combinations of Basis Functions. Read Chapter 3 in the text by Bishop
Modeling Data with Linear Combinations of Basis Functions Read Chapter 3 in the text by Bishop A Type of Supervised Learning Problem We want to model data (x 1, t 1 ),..., (x N, t N ), where x i is a vector
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationRegularization in Cox Frailty Models
Regularization in Cox Frailty Models Andreas Groll 1, Trevor Hastie 2, Gerhard Tutz 3 1 Ludwig-Maximilians-Universität Munich, Department of Mathematics, Theresienstraße 39, 80333 Munich, Germany 2 University
More informationBayesian Dropout. Tue Herlau, Morten Morup and Mikkel N. Schmidt. Feb 20, Discussed by: Yizhe Zhang
Bayesian Dropout Tue Herlau, Morten Morup and Mikkel N. Schmidt Discussed by: Yizhe Zhang Feb 20, 2016 Outline 1 Introduction 2 Model 3 Inference 4 Experiments Dropout Training stage: A unit is present
More informationPCA and admixture models
PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57 Announcements HW1
More informationCellwise robust regularized discriminant analysis
Cellwise robust regularized discriminant analysis JSM 2017 Stéphanie Aerts University of Liège, Belgium Ines Wilms KU Leuven, Belgium Cellwise robust regularized discriminant analysis 1 Discriminant analysis
More informationStepwise Searching for Feature Variables in High-Dimensional Linear Regression
Stepwise Searching for Feature Variables in High-Dimensional Linear Regression Qiwei Yao Department of Statistics, London School of Economics q.yao@lse.ac.uk Joint work with: Hongzhi An, Chinese Academy
More informationECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4
ECE52 Tutorial Topic Review ECE52 Winter 206 Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides ECE52 Tutorial ECE52 Winter 206 Credits to Alireza / 4 Outline K-means, PCA 2 Bayesian
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Expectation Maximization Mark Schmidt University of British Columbia Winter 2018 Last Time: Learning with MAR Values We discussed learning with missing at random values in data:
More informationMultivariate Normal Models
Case Study 3: fmri Prediction Graphical LASSO Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Emily Fox February 26 th, 2013 Emily Fox 2013 1 Multivariate Normal Models
More informationSemiparametric Mixed Effects Models with Flexible Random Effects Distribution
Semiparametric Mixed Effects Models with Flexible Random Effects Distribution Marie Davidian North Carolina State University davidian@stat.ncsu.edu www.stat.ncsu.edu/ davidian Joint work with A. Tsiatis,
More informationA Selective Review of Sufficient Dimension Reduction
A Selective Review of Sufficient Dimension Reduction Lexin Li Department of Statistics North Carolina State University Lexin Li (NCSU) Sufficient Dimension Reduction 1 / 19 Outline 1 General Framework
More informationNovember 2002 STA Random Effects Selection in Linear Mixed Models
November 2002 STA216 1 Random Effects Selection in Linear Mixed Models November 2002 STA216 2 Introduction It is common practice in many applications to collect multiple measurements on a subject. Linear
More informationAn Imputation-Consistency Algorithm for Biomedical Complex Data Analysis
An Imputation-Consistency Algorithm for Biomedical Complex Data Analysis Faming Liang Purdue University January 11, 2018 Outline Introduction to biomedical complex data An IC algorithm for high-dimensional
More informationPenalized Squared Error and Likelihood: Risk Bounds and Fast Algorithms
university-logo Penalized Squared Error and Likelihood: Risk Bounds and Fast Algorithms Andrew Barron Cong Huang Xi Luo Department of Statistics Yale University 2008 Workshop on Sparsity in High Dimensional
More informationGibbs Sampling in Linear Models #2
Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More information