Lecture 2, Part 1: Optimization
Slide 1. Lecture 2, Part 1: Optimization (January 16, 2015). Mu Zhu, University of Waterloo.
Slide 2. Need for Optimization
- $E(y\,|\,x)$, $P(y\,|\,x)$: these are what we want to go after
- first, model (some examples last week); then, estimate (didn't discuss last week)
- estimation often requires solving an optimization problem
Slide 3. Ex I: Linear Regression

$$\min_\beta \sum_{i=1}^n \left(y_i - x_i^T\beta\right)^2 = \|y - X\beta\|^2, \qquad y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix}$$

$$\hat\beta = (X^TX)^{-1}X^Ty$$
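To make the closed form concrete, here is a minimal NumPy sketch; the data are simulated placeholders (not from the lecture), and the normal equations are solved with a linear solve rather than an explicit matrix inverse, which is the numerically preferred route:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.normal(size=(n, d))                 # design matrix with rows x_i^T
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# beta_hat = (X^T X)^{-1} X^T y, computed via a linear solve for stability
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                             # close to beta_true
```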
Slide 4. Ex II: Logistic Regression

$y_i \sim \mathrm{Bernoulli}(p_i)$, where $p_i \equiv P(y_i = 1\,|\,x_i)$ and

$$\log\frac{p_i}{1-p_i} = x_i^T\beta \quad\text{or}\quad p_i = \frac{\exp(x_i^T\beta)}{1+\exp(x_i^T\beta)}$$

$$L(p_i; y_i) = \prod_{i=1}^n p_i^{y_i}(1-p_i)^{1-y_i}$$
Slide 5. Ex II: Logistic Regression
Maximize the log-likelihood by Newton-Raphson:

$$\ell(p_i; y_i) = \log L(p_i; y_i) = \sum_{i=1}^n \left[y_i\log(p_i) + (1-y_i)\log(1-p_i)\right] = \sum_{i=1}^n \left[y_i\log\frac{p_i}{1-p_i} + \log(1-p_i)\right]$$

$$= \ell(\beta; y_i, x_i) = \sum_{i=1}^n \left[y_i x_i^T\beta - \log\left(1+e^{x_i^T\beta}\right)\right]$$

To estimate $\beta$, solve $\max_\beta\ \ell(\beta; y_i, x_i)$.
Slide 6. Ex II: Logistic Regression

$$\ell'(\beta) = \sum_{i=1}^n \left[y_i - \frac{e^{x_i^T\beta}}{1+e^{x_i^T\beta}}\right]x_i = \sum_{i=1}^n (y_i - p_i)\,x_i = \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix}\begin{pmatrix} y_1 - p_1 \\ \vdots \\ y_n - p_n \end{pmatrix} = X^T(y - p)$$
Slide 7. Ex II: Logistic Regression

$$-\ell''(\beta) = \sum_{i=1}^n \left[\frac{(e^{x_i^T\beta})(1+e^{x_i^T\beta}) - (e^{x_i^T\beta})(e^{x_i^T\beta})}{(1+e^{x_i^T\beta})^2}\right]x_i x_i^T = \sum_{i=1}^n x_i\left[p_i - p_i^2\right]x_i^T$$

$$= \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix}\begin{pmatrix} p_1(1-p_1) & & \\ & \ddots & \\ & & p_n(1-p_n) \end{pmatrix}\begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix} = X^TWX$$
Slide 8. Ex II: Logistic Regression

$$\beta_{\text{new}} = \beta_{\text{old}} - \left[\ell''(\beta_{\text{old}})\right]^{-1}\left[\ell'(\beta_{\text{old}})\right] = \beta_{\text{old}} + \left[X^TWX\right]^{-1}\left[X^T(y-p)\right] = \underbrace{\left[X^TWX\right]^{-1}X^TW}_{\text{weighted least squares}}\left[X\beta_{\text{old}} + W^{-1}(y-p)\right]$$

(both $W$ and $p$ depend on $\beta_{\text{old}}$)

- $w_{ii} \equiv p_i(1-p_i)$: max at $p_i = 1/2$, min at $p_i = 0$ or $p_i = 1$
- estimates influenced mostly by points near the decision boundary
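A minimal sketch of the Newton-Raphson / iteratively reweighted least squares update above, on simulated data; a fixed iteration count stands in for a proper convergence check, which a real implementation would add:

```python
import numpy as np

def logistic_irls(X, y, n_iter=25):
    """Newton-Raphson for logistic regression, written as IRLS."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # p_i = exp(x_i'b)/(1+exp(x_i'b))
        w = p * (1.0 - p)                        # diagonal of W
        # beta_new = beta_old + (X'WX)^{-1} X'(y - p)
        beta = beta + np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
    return beta

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
p_true = 1 / (1 + np.exp(-X @ np.array([1.0, -1.0, 0.5])))
y = (rng.random(200) < p_true).astype(float)
print(logistic_irls(X, y))
```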
Slide 9. Enter Big Data
- if all this sounds easy, think about $x \in \mathbb{R}^d$ where $d$ is large relative to $n$
- we end up estimating too much with too little
- the resulting estimate cannot be very good; in statistical terms, $\mathrm{Var}(\hat\beta)$ is inflated
- to reduce variance, introduce bias
Slide 10. Penalized Regression
- let $x_1,\ldots,x_d$ denote the columns of $X$
- suppose $y, x_1,\ldots,x_d$ are all standardized ($1^Ty = 0$, $\|y\| = 1$, etc.)
- bias each $\beta_j$ by introducing a penalty $J(\beta_j)$:

$$\min_\beta\ \frac{1}{2}\left\|y - (\beta_1 x_1 + \cdots + \beta_d x_d)\right\|^2 + \sum_{j=1}^d J(\beta_j)$$
Slide 11. A Class of Penalty Functions

$$J(\beta_j) = \lambda|\beta_j|^\alpha \quad\Rightarrow\quad \sum_{j=1}^d J(\beta_j) = \lambda\sum_{j=1}^d |\beta_j|^\alpha \equiv \lambda\|\beta\|_\alpha^\alpha$$

                      alpha   norm   difficulty   algorithm
  ridge regression    2       l2     convex       analytic (exercise)
  LASSO               1       l1     convex       coordinate descent
  subset selection    0       l0     NP-hard      heuristic

Here $|\beta_j|^0 = 0$ if $\beta_j = 0$ and $1$ if $\beta_j \neq 0$, i.e., $|\beta_j|^0 = I(\beta_j \neq 0)$.
Slide 12. Bias-Variance Trade-off Exercise
Exercise. Show that, under typical model assumptions, $y = X\beta + \varepsilon$ with $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2 I$,
(i) the usual estimate, $\hat\beta = (X^TX)^{-1}X^Ty$, is unbiased;
(ii) the ridge regression estimate, call it $\hat\beta_\lambda$, is biased, but there exists an orthonormal matrix $V$ such that $\mathrm{Var}(\hat\beta_\lambda) = \sigma^2 V D_\lambda V^T$, $\mathrm{Var}(\hat\beta) = \sigma^2 V D V^T$, and $D_\lambda(j,j) \leq D(j,j)$ for all $j$.
(Hint: use the singular value decomposition of $X$.)
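A quick numerical check of part (ii), not the requested proof: following the SVD hint, with $X = UDV^T$ the diagonal of $D$ in $\mathrm{Var}(\hat\beta)$ works out to $1/d_j^2$ while ridge gives $d_j^2/(d_j^2+\lambda)^2$, which is smaller for every $\lambda \geq 0$. The sketch below verifies that ordering on a random design (the derivation of those two expressions is my own working of the hint, not stated on the slide):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
lam, sigma2 = 3.0, 1.0

d = np.linalg.svd(X, compute_uv=False)        # singular values of X
D_ols   = sigma2 / d**2                        # eigenvalues of Var(beta_hat)
D_ridge = sigma2 * d**2 / (d**2 + lam)**2      # eigenvalues of Var(beta_hat_lambda)
print(np.all(D_ridge <= D_ols))                # True: ridge shrinks every direction
```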
Slide 13. Ex III: LASSO
Coordinate descent: at each iteration, solve a univariate problem,

$$\min_{\beta_j}\ L(\beta_j) = \frac{1}{2}\|z - \beta_j x_j\|^2 + \lambda|\beta_j| + c_j,$$

where $z \equiv y - \sum_{k\neq j}\beta_k x_k$ and $c_j = \lambda\sum_{k\neq j}|\beta_k|$, while fixing all $\beta_k$, $k \neq j$; cycle through $j = 1,2,\ldots,d,1,2,\ldots,d,\ldots$
Slide 14. Ex III: LASSO
Setting $\frac{d}{d\beta_j}L(\beta_j) = 0$ gives

$$\beta_j\underbrace{(x_j^Tx_j)}_{\|x_j\|^2 = 1} + \lambda\,\mathrm{sgn}(\beta_j) = x_j^Tz \quad\Rightarrow\quad \hat\beta_j = \begin{cases} x_j^Tz - \lambda, & \beta_j > 0; \\ x_j^Tz + \lambda, & \beta_j < 0, \end{cases}$$

and $\hat\beta_j = 0$ when $|x_j^Tz| \leq \lambda$ (soft-thresholding).

[Figure: $\hat\beta_j$ as a function of $x_j^Tz$; a 45-degree line soft-thresholded to zero between $-\lambda$ and $\lambda$.]

- the solution will typically contain many zeros, i.e., be sparse
- this is the selection effect of the LASSO
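Putting slides 13-14 together, a minimal coordinate-descent sketch for the LASSO; it assumes the columns have been standardized so $\|x_j\| = 1$, making the univariate update exactly the soft-thresholding rule above:

```python
import numpy as np

def soft_threshold(a, lam):
    """Solution of the univariate LASSO problem: sgn(a) * (|a| - lam)_+."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def lasso_cd(X, y, lam, n_sweeps=100):
    n, d = X.shape
    beta = np.zeros(d)
    r = y - X @ beta                           # full residual, kept up to date
    for _ in range(n_sweeps):
        for j in range(d):
            z = r + X[:, j] * beta[j]          # z = y - sum_{k != j} beta_k x_k
            beta_new = soft_threshold(X[:, j] @ z, lam)
            r = z - X[:, j] * beta_new
            beta[j] = beta_new
    return beta

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 10))
X /= np.linalg.norm(X, axis=0)                 # standardize so ||x_j|| = 1
y = X[:, 0] - 2 * X[:, 1] + 0.05 * rng.normal(size=100)
print(lasso_cd(X, y, lam=0.1))                 # typically sparse: many exact zeros
```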
Slide 15. The $\ell_1$ Penalty
- D. L. Donoho (2006), "For most large underdetermined systems of linear equations the minimal $\ell_1$-norm solution is also the sparsest solution," Communications on Pure and Applied Mathematics 59.
- the $\ell_1$-problem as a convex relaxation of the $\ell_0$-problem
Slide 16. Ex IV: Graphical LASSO
- from $x_i \overset{iid}{\sim} N(\mu, \Sigma)$, estimate $\Omega \equiv \Sigma^{-1}$ (hard for $d$ large)
- recall that linear discriminant analysis requires $\Sigma^{-1}$ (last week)
- for $x = (x_1,\ldots,x_d)^T \sim N(\mu,\Sigma)$, variables $x_j$ and $x_k$ are conditionally independent given all other variables if and only if $\Omega_{jk} = 0$ (Dempster, 1972; Biometrics)
- for Gaussian graphical models, there is an edge between nodes $j$ and $k$ if and only if $\Omega_{jk} \neq 0$
Slide 17. Ex IV: Graphical LASSO

$$\hat\mu = \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i, \qquad S = \frac{1}{n}\sum_{i=1}^n (x_i - \hat\mu)(x_i - \hat\mu)^T$$

$$L(\Sigma) = \prod_{i=1}^n \frac{1}{\sqrt{(2\pi)^d|\Sigma|}}\exp\left[-\frac{1}{2}(x_i-\mu)^T\Sigma^{-1}(x_i-\mu)\right]$$

$$\ell(\Omega) = \text{const} - \frac{n}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^n \mathrm{tr}\left[(x_i-\mu)^T\Sigma^{-1}(x_i-\mu)\right] = \text{const} + \frac{n}{2}\log|\Omega| - \frac{n}{2}\mathrm{tr}(\Omega S)$$
Slide 18. Ex IV: Graphical LASSO
Maximize the $\ell_1$-penalized likelihood (Friedman, Hastie & Tibshirani, 2008; Biostatistics):

$$\max_{\Omega \succ 0}\ \underbrace{\log|\Omega| - \mathrm{tr}(\Omega S)}_{\propto\,\ell(\Omega)} - \lambda\|\Omega\|_1, \qquad \|\Omega\|_1 = \sum_{j,k}|\Omega_{jk}|$$

- solved by coordinate descent (one row/column at a time)
- the resulting graphical model is sparse in the number of edges
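In practice one rarely hand-codes this; a sketch using scikit-learn's GraphicalLasso estimator (assuming scikit-learn 0.20 or later, where it carries this name) on data simulated from a sparse precision matrix:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(4)
# simulate from a sparse (tridiagonal) precision matrix Omega
d = 5
Omega = np.eye(d) + np.diag(0.4 * np.ones(d - 1), 1) + np.diag(0.4 * np.ones(d - 1), -1)
Sigma = np.linalg.inv(Omega)
X = rng.multivariate_normal(np.zeros(d), Sigma, size=500)

model = GraphicalLasso(alpha=0.1).fit(X)    # alpha plays the role of lambda
print(np.round(model.precision_, 2))         # estimated Omega; off-diagonals sparse
```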
Slide 19. Ex IV: Graphical LASSO

[Figure] H-Ras. Left: inactive form (4q21) & active form (6q21); the regions labelled Switch I & Switch II are known to undergo major conformational changes between 4q21 and 6q21. Right: estimated graphical model showing conditional dependence structures, obtained by analysing 4q21 alone. (L. Soltan-Ghoraie, F. Burkowski & M. Zhu)
Slide 20. Ex V: The Netflix Problem (Part 1)
- the high-profile, million-dollar Netflix contest (2006-2009)
- rating matrix $R$, where $r_{ui}$ = rating of item $i$ by user $u$
- set $T = \{(u,i) : r_{ui} \text{ observed}\}$
- want to predict the missing entries of $R$
Slide 21. Ex V: The Netflix Problem (Part 1)

[Figure: illustration of the rating matrix $R$.]
Slide 22. Ex V: The Netflix Problem (Part 1)
Want to solve

$$\min_{\tilde R}\ \mathrm{rank}(\tilde R) \quad\text{s.t.}\quad \tilde r_{ui} = r_{ui} \text{ for } (u,i) \in T$$

- philosophy: only a few factors affect user preferences, so the rank of the rating matrix must be low
- but the problem is NP-hard
Slide 23. Ex V: The Netflix Problem (Part 1)
Instead, solve the convex relaxation,

$$\min_{\tilde R}\ \|\tilde R\|_* \quad\text{s.t.}\quad \tilde r_{ui} = r_{ui} \text{ for } (u,i) \in T,$$

where $\|\cdot\|_*$ denotes the nuclear norm of a matrix. Let $\sigma_1, \sigma_2, \ldots$ be the singular values of $A$; then

$$\|A\|_* = \sum_j \sigma_j^1 \quad\text{whereas}\quad \mathrm{rank}(A) = \sum_j I(\sigma_j \neq 0) = \sum_j \sigma_j^0$$

- a matter of $\ell_1$ vs $\ell_0$
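One simple way to attack the nuclear-norm relaxation is iterative SVD soft-thresholding; the sketch below is the Soft-Impute idea of Mazumder, Hastie & Tibshirani (2010), a regularized variant rather than the exact equality-constrained program on the slide, shown here because it is short and runs on plain NumPy:

```python
import numpy as np

def soft_impute(R, mask, lam, n_iter=200):
    """Nuclear-norm-regularized matrix completion via iterative SVD
    soft-thresholding; mask is True where r_ui is observed."""
    Z = np.zeros_like(R)
    for _ in range(n_iter):
        filled = np.where(mask, R, Z)            # keep observed entries, impute the rest
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)             # soft-threshold the singular values
        Z = (U * s) @ Vt
    return Z

rng = np.random.default_rng(5)
true = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))   # rank-2 "ratings"
mask = rng.random(true.shape) < 0.5                           # half the entries observed
R_hat = soft_impute(true * mask, mask, lam=0.5)
print(np.linalg.norm((R_hat - true)[~mask]) / np.linalg.norm(true[~mask]))
```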
Slide 24. Ex V: The Netflix Problem (Part 1)
- E. J. Candès & B. Recht (2009), "Exact matrix completion via convex optimization," Foundations of Computational Mathematics 9.
- B. Recht, M. Fazel & P. A. Parrilo (2010), "Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization," SIAM Review 52.
- under certain conditions, $\min\|\cdot\|_*$ recovers the solution of $\min\mathrm{rank}(\cdot)$
- $\min\|\cdot\|_*$ is a semi-definite program
Slide 25. Ex VI: The Netflix Problem (Part 2)
Explicit parameterization:

$$R \approx PQ^T = \begin{pmatrix} p_1^T \\ \vdots \\ p_n^T \end{pmatrix}\begin{pmatrix} q_1 & \cdots & q_m \end{pmatrix}$$

- $p_u, q_i \in \mathbb{R}^K$ ($K \ll n, m$) are latent coordinates
- just estimate $p_u, q_i$ and predict missing entries with $\hat r_{ui} = p_u^Tq_i$
- in reality, user- and item-effects are removed prior to doing this
Slide 26. Ex VI: The Netflix Problem (Part 2)

$$\min_{p_u,\,q_i}\ \sum_{(u,i)\in T}\left(r_{ui} - p_u^Tq_i\right)^2 + \lambda\left[\sum_u \|p_u\|^2 + \sum_i \|q_i\|^2\right]$$

- coordinate descent still applies (over $p_1,\ldots,p_n,q_1,\ldots,q_m,\ldots$); strictly speaking, blockwise coordinate descent (see the sketch below)
- each step is convex, but the overall problem is nonconvex (both $p_u$ and $q_i$ unknown)
- many traps (e.g., local solutions, saddle points)
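A minimal alternating least squares sketch of the blockwise coordinate descent above, assuming a boolean observation mask; with the other block held fixed, each $p_u$ (or $q_i$) update is a small ridge-regression solve:

```python
import numpy as np

def als(R, mask, K=2, lam=0.1, n_iter=50):
    """Alternating least squares for
    min sum_{(u,i) in T} (r_ui - p_u'q_i)^2 + lam*(sum ||p_u||^2 + sum ||q_i||^2)."""
    n, m = R.shape
    rng = np.random.default_rng(6)
    P, Q = rng.normal(size=(n, K)), rng.normal(size=(m, K))
    for _ in range(n_iter):
        for u in range(n):                       # update p_u with Q fixed: a ridge solve
            Qu = Q[mask[u]]                       # latent vectors of items rated by u
            P[u] = np.linalg.solve(Qu.T @ Qu + lam * np.eye(K), Qu.T @ R[u, mask[u]])
        for i in range(m):                       # update q_i with P fixed
            Pi = P[mask[:, i]]
            Q[i] = np.linalg.solve(Pi.T @ Pi + lam * np.eye(K), Pi.T @ R[mask[:, i], i])
    return P, Q

rng = np.random.default_rng(7)
true = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))
mask = rng.random(true.shape) < 0.6
P, Q = als(true * mask, mask)
print(np.abs((P @ Q.T - true)[~mask]).mean())    # error on the held-out entries
```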
Slide 27. Ex VII: MCP
For the penalized regression problem, Zhang (2010; Ann. Stat.) proposed the so-called minimax concave penalty (MCP):

$$J(\beta_j) = \lambda\int_0^{|\beta_j|}\left(1 - \frac{x}{\gamma\lambda}\right)_+ dx = \begin{cases} \lambda|\beta_j| - \dfrac{\beta_j^2}{2\gamma}, & |\beta_j| \leq \gamma\lambda; \\[4pt] \dfrac{\gamma\lambda^2}{2}, & |\beta_j| > \gamma\lambda, \end{cases}$$

instead of $J(\beta_j) = \lambda|\beta_j|^\alpha$.
Slide 28.

[Figure: $J(\beta_j)$ plotted against $\beta_j$. (a) MCP ($\lambda = 1$, $\gamma = 10$); (b) LASSO ($\lambda = 1$).]
Slide 29. Ex VII: MCP
- for $|\beta_j|$ large, $J(\cdot)$ is constant beyond a certain point: reduces bias
- for $|\beta_j|$ small, $J(\cdot)$ still behaves like the LASSO: keeps sparsity
- in between, a smooth interpolation that minimizes the maximal concavity
- coordinate descent applies (see the sketch below), but many traps (e.g., local solutions, saddle points)
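For completeness, a sketch of the MCP penalty and the corresponding univariate coordinate-descent update; the closed-form "firm thresholding" rule is from Breheny & Huang (2011), not from these slides, and it assumes a standardized column ($\|x_j\| = 1$) and $\gamma > 1$:

```python
import numpy as np

def mcp_penalty(beta, lam, gamma):
    """Minimax concave penalty J(beta) from Zhang (2010)."""
    b = np.abs(beta)
    return np.where(b <= gamma * lam,
                    lam * b - b**2 / (2 * gamma),
                    gamma * lam**2 / 2)

def mcp_update(z, lam, gamma):
    """Univariate MCP solution ('firm thresholding'); z = x_j'(partial residual)."""
    soft = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
    return np.where(np.abs(z) <= gamma * lam,
                    soft / (1.0 - 1.0 / gamma),  # rescaled soft-threshold
                    z)                            # no shrinkage for large |z|

print(mcp_penalty(np.array([0.5, 5.0, 20.0]), lam=1.0, gamma=10.0))
print(mcp_update(np.array([0.5, 5.0, 20.0]), lam=1.0, gamma=10.0))
```

Note how the update interpolates between the LASSO's soft-thresholding near zero and no shrinkage at all beyond $\gamma\lambda$, matching the bias-reduction and sparsity properties listed on slide 29.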
Slide 30. Summary
- key ideas: introduce bias to reduce variance; penalty functions ($\ell_1$ norm; nuclear norm; nonconvex penalties)
- specific methods: ridge; LASSO; MCP; graphical LASSO; matrix completion; matrix factorization; Newton-Raphson; coordinate descent
- application areas: proteins; recommender systems
Slide 31. Next...
- 2 pm: Professor S. Vavasis on optimization
- a short, 10-minute break
- research with my students (mostly, the Netflix Problem)