An Efficient Algorithm For Weak Hierarchical Lasso. Yashu Liu, Jie Wang, Jieping Ye Arizona State University

1 An Efficient Algorithm For Weak Hierarchical Lasso Yashu Liu, Jie Wang, Jieping Ye Arizona State University

2 Outline
- Regression with Interactions
- Problems and Challenges
- Weak Hierarchical Lasso
- The Proposed Method
- Experimental Results
- Conclusion
- Future Work

3 Linear Regression
Linear regression is a widely used tool in data mining and machine learning. A linear regression model is $y = x^T w + \varepsilon$, where $y$ is the outcome, $x$ the feature vector, $w$ the coefficients, and $\varepsilon$ the noise.

4 Regression with Interactions
Regression with only linear effects may not be sufficient. One effective way to capture nonlinearity is to include pairwise interactions:
$y = x^T w + x^T Q x + \varepsilon$,
where $w$ holds the main-effect coefficients and $Q$ the interaction coefficients.
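For concreteness, here is a minimal NumPy sketch of one sample drawn from the interaction model above; all names and values are illustrative, not from the slides.

```python
# One sample from y = x^T w + x^T Q x + eps (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
d = 5                        # number of main-effect features
x = rng.normal(size=d)       # one feature vector
w = rng.normal(size=d)       # main-effect coefficients
Q = rng.normal(size=(d, d))  # interaction coefficients: d*d terms,
                             # so dimensionality grows quadratically in d
y = x @ w + x @ Q @ x + 0.01 * rng.normal()  # outcome with noise
```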

5 Issues of Including Interactions
ISSUE 1: Including interactions leads to ultra-high dimensionality; e.g., 100 features produce 10,000 pairwise interaction terms.
ISSUE 2: Estimators with hierarchical structure are typically difficult to obtain.

6 Related Work
- Ling Yan et al. used a low-rank interaction model for click-through rate prediction.
- Peter Radchenko et al. proposed a Lasso-type penalty (VANISH) to achieve a strong hierarchical structure.
- Ming Yuan et al. proposed a nonnegative garrote method to achieve hierarchical structures.
- Peng Zhao et al. proposed Composite Absolute Penalties for hierarchical interaction models.
- Nam Hee Choi et al. formulated the interaction coefficients as products of main-effect coefficients to achieve strong hierarchy.

7 Weak Hierarchical Lasso (Bien et al., Annals of Statistics)
Weak hierarchy: an interaction is selected only if at least one of its main effects is included in the model, i.e., $Q_{ij} \neq 0$ only if $w_i \neq 0$ or $w_j \neq 0$.
(Illustration: main effects Age, Gender, MMSE and their pairwise interactions Age-Gender, Age-MMSE, Gender-MMSE.)
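A small sketch of what the weak-hierarchy property means operationally; the function name and tolerance are assumptions for illustration, not code from the paper.

```python
# Check weak hierarchy: Q[i, j] != 0 implies w[i] != 0 or w[j] != 0.
import numpy as np

def satisfies_weak_hierarchy(w, Q, tol=1e-12):
    active = np.abs(w) > tol                  # selected main effects
    rows, cols = np.nonzero(np.abs(Q) > tol)  # selected interactions
    return bool(np.all(active[rows] | active[cols]))
```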

8 Weak Hierarchical Lasso
Weak hierarchical constraints: $\|Q_{\cdot j}\|_1 \leq |w_j|$ for $j = 1, \dots, d$.
(Illustration with main effects M1-M5: for $j = 1$, the constraint couples $w_1$ with the column of interactions M1-M1, M1-M2, M1-M3, M1-M4, M1-M5.)

9 Weak Hierarchical Lasso
(Continued illustration: for $j = 2$, the constraint couples $w_2$ with the column of interactions M1-M2, M2-M2, M2-M3, M2-M4, M2-M5.)

10 Weak Hierarchical Lasso
Let $L(w, Q)$ be the loss function for interaction regression. The Weak Hierarchical Lasso formulation is
$\min_{w,Q} L(w, Q) + \lambda \|w\|_1 + \lambda_2 \|Q\|_1$ s.t. $\|Q_{\cdot j}\|_1 \leq |w_j|,\ j = 1, \dots, d$.
The constraint $\|Q_{\cdot j}\|_1 \leq |w_j|$ makes the optimization problem nonconvex.
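As an illustration of the formulation, the sketch below evaluates the objective and the constraint violation; the slides leave the loss $L(w, Q)$ generic, so a squared loss over data $(X, y)$ is assumed here.

```python
# Objective and feasibility check for the Weak Hierarchical Lasso,
# assuming a squared loss (an illustrative choice, not the paper's).
import numpy as np

def whl_objective(X, y, w, Q, lam, lam2):
    # per-sample prediction x^T w + x^T Q x
    pred = X @ w + np.einsum('ni,ij,nj->n', X, Q, X)
    loss = 0.5 * np.sum((y - pred) ** 2)
    return loss + lam * np.abs(w).sum() + lam2 * np.abs(Q).sum()

def max_constraint_violation(w, Q):
    # max_j ( ||Q[:, j]||_1 - |w_j| ); nonpositive iff (w, Q) is feasible
    return float((np.abs(Q).sum(axis=0) - np.abs(w)).max())
```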

11 Weak Hierarchical Lasso
Bien et al. propose to solve a relaxed version:
$\min_{w^+, w^-, Q} L(w^+ - w^-, Q) + \lambda 1^T (w^+ + w^-) + \lambda_2 \|Q\|_1$
s.t. $\|Q_{\cdot j}\|_1 \leq w_j^+ + w_j^-$, $w_j^+ \geq 0$, $w_j^- \geq 0$, $j = 1, \dots, d$.
Here $\|w\|_1$ is relaxed to $1^T(w^+ + w^-)$ so that the optimization problem is convex. The relaxed formulation guarantees a weak hierarchical solution only if a ridge penalty is added.

12 Main Contributions
- We propose to directly solve the nonconvex Weak Hierarchical Lasso with a proximal algorithm.
- We show that the associated proximal operator admits a closed-form solution.
- We accelerate the computation of each proximal operator subproblem from quadratic to linearithmic time.

13 Weak Hierarchical Lasso: Proximal Operator
$\min_{w,Q} L(w, Q) + \lambda \|w\|_1 + \lambda_2 \|Q\|_1$ s.t. $\|Q_{\cdot j}\|_1 \leq |w_j|,\ j = 1, \dots, d$.
Proximal operator:
$\arg\min_{w,Q} \frac{1}{2}\|w - v\|_2^2 + \frac{1}{2}\|Q - U\|_F^2 + \frac{\lambda}{t}\|w\|_1 + \frac{\lambda_2}{t}\|Q\|_1$ s.t. $\|Q_{\cdot j}\|_1 \leq |w_j|,\ j = 1, \dots, d$.
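The proximal operator is used inside an outer proximal-gradient loop. A hedged sketch of that loop follows; `grad_loss` and `prox_whl` are assumed callables standing for the loss gradient and the operator defined above, and the fixed step size $1/t$ and iteration count are illustrative choices, not the paper's exact scheme.

```python
# Outer proximal-gradient iteration (illustrative sketch).
def proximal_gradient(w, Q, grad_loss, prox_whl, t, n_iters=100):
    for _ in range(n_iters):
        gw, gQ = grad_loss(w, Q)        # gradient of the smooth loss
        v, U = w - gw / t, Q - gQ / t   # forward (gradient) step
        w, Q = prox_whl(v, U, t)        # backward (proximal) step
    return w, Q
```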

14 Factorizing Unknown Variables
The KEY IDEA is to factorize the unknown coefficients $w$ and $Q$ into their signs $S_w$, $S_Q$ and magnitudes $|w|$, $|Q|$:
$w = S_w \circ |w|$, $Q = S_Q \circ |Q|$.

15 Proximal Operator Subproblem
The proximal operator problem can be decoupled. Writing $w_j = S_{w_j} \bar{w}_j$ and $Q_{\cdot j} = S_{Q_j} \circ \bar{Q}_{\cdot j}$ with $\bar{w}_j \geq 0$ and $\bar{Q}_{\cdot j} \geq 0$, the subproblem for each $j$ is
$\min_{S_{w_j}, S_{Q_j}, \bar{w}_j, \bar{Q}_{\cdot j}} \frac{1}{2}(S_{w_j}\bar{w}_j - v_j)^2 + \frac{1}{2}\|S_{Q_j} \circ \bar{Q}_{\cdot j} - U_{\cdot j}\|_2^2 + \frac{\lambda}{t}\bar{w}_j + \frac{\lambda_2}{t} 1^T \bar{Q}_{\cdot j}$
s.t. $1^T \bar{Q}_{\cdot j} \leq \bar{w}_j$, $\bar{Q}_{\cdot j} \geq 0$, $j = 1, \dots, d$.

16 Solving Sign Variables
We solve the sign variables $S_w$, $S_Q$ and the magnitude variables $\bar{w}$, $\bar{Q}$ together. Since
$\frac{1}{2}(S_{w_j}\bar{w}_j - v_j)^2 = \frac{1}{2}(S_{w_j}(\bar{w}_j - S_{w_j} v_j))^2 = \frac{1}{2}(\bar{w}_j - S_{w_j} v_j)^2$,
the objective is minimized by $S_w = \mathrm{sign}(v)$. Likewise, $S_{Q_j}$ must be the sign of $U_{\cdot j}$, i.e., $S_{Q_j} = \mathrm{sign}(U_{\cdot j})$.

17 Solving Magnitude Variables
The magnitude variables can then be obtained by solving
$\min_{\bar{w}_j, \bar{Q}_{\cdot j}} \frac{1}{2}(\bar{w}_j - \tilde{v}_j)^2 + \frac{1}{2}\|\bar{Q}_{\cdot j} - \tilde{U}_{\cdot j}\|_2^2$
s.t. $1^T \bar{Q}_{\cdot j} \leq \bar{w}_j$, $\bar{Q}_{\cdot j} \geq 0$,
where $\tilde{v}_j = S_{w_j} v_j - \frac{\lambda}{t}$ and $\tilde{U}_{\cdot j} = S_{Q_j} \circ U_{\cdot j} - \frac{\lambda_2}{t} 1$.
The above subproblem has a closed-form solution.
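The paper derives a closed-form solution for this subproblem; as a hedged illustration only, the sketch below recovers the same KKT point numerically by bisection on the multiplier $\mu$ of the coupling constraint $1^T \bar{Q}_{\cdot j} \leq \bar{w}_j$, using the stationarity conditions $\bar{w}_j = \max(\tilde{v}_j + \mu, 0)$ and $\bar{Q}_{\cdot j} = \max(\tilde{U}_{\cdot j} - \mu, 0)$.

```python
# Hedged numeric sketch, not the paper's closed form: solve one
# magnitude subproblem by bisection on the coupling multiplier mu.
# v_t and U_t stand for the shifted inputs v~_j and U~_.j above.
import numpy as np

def solve_magnitudes(v_t, U_t, iters=60):
    def gap(mu):  # feasibility gap 1^T Qbar - wbar as a function of mu
        return np.maximum(U_t - mu, 0).sum() - max(v_t + mu, 0.0)

    if gap(0.0) <= 0:                   # coupling constraint inactive
        mu = 0.0
    else:                               # gap is nonincreasing: bisect
        lo, hi = 0.0, float(U_t.max())
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if gap(mid) > 0 else (lo, mid)
        mu = hi
    return max(v_t + mu, 0.0), np.maximum(U_t - mu, 0)
```

The signed solution is then recovered as $w_j = S_{w_j}\bar{w}_j$ and $Q_{\cdot j} = S_{Q_j} \circ \bar{Q}_{\cdot j}$, per the factorization on slide 14.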

18 Accelerating Computation
A straightforward implementation of solving the magnitude variables takes quadratic time, which is not scalable to large data. We further accelerate each proximal operator subproblem to linearithmic time by exploiting the special structure of the subproblem.
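One plausible way such a linearithmic bound can arise (a sketch consistent with the slide's claim, not necessarily the paper's exact derivation): after sorting $\tilde{U}_{\cdot j}$ once, the function $g(\mu) = \sum_i \max(\tilde{U}_{ij} - \mu, 0) - \max(\tilde{v}_j + \mu, 0)$ from the previous subproblem is piecewise linear and nonincreasing, so its root can be found exactly with a prefix-sum scan instead of the bisection used in the earlier sketch.

```python
# Exact coupling multiplier via one sort plus an O(d) scan
# (O(d log d) total); an illustrative sketch under the assumptions
# stated in the lead-in.
import numpy as np

def solve_mu_exact(v_t, U_t):
    u = np.sort(U_t)                              # ascending, O(d log d)
    suffix = np.concatenate((np.cumsum(u[::-1])[::-1], [0.0]))

    def gap(mu):
        k = np.searchsorted(u, mu, side='right')  # entries u[k:] exceed mu
        return suffix[k] - (u.size - k) * mu - max(v_t + mu, 0.0)

    if gap(0.0) <= 0:
        return 0.0                                # constraint inactive
    # breakpoints where gap's slope can change: positive entries of u
    # and the kink of max(v_t + mu, 0)
    bps = np.unique(np.concatenate((u[u > 0], [max(-v_t, 0.0)])))
    prev, g_prev = 0.0, gap(0.0)
    for b in bps:
        g_b = gap(b)
        if g_b <= 0:      # root lies in (prev, b]: the piece is linear
            return prev + g_prev * (b - prev) / (g_prev - g_b)
        prev, g_prev = b, g_b
    return prev
```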

19 Comparisons of Efficiency
We compare the efficiency of our algorithm with the convex Weak Hierarchical Lasso in terms of running time and number of iterations.

20 Comparisons of Recovery
We compare the recovery performance of the two algorithms.

21 Performance on Real Data
We compare the original Weak Hierarchical Lasso with the convex version and other baseline methods on ADNI (Alzheimer's Disease Neuroimaging Initiative) data.

22 Conclusion
- We study the Weak Hierarchical Lasso, which is popular for building nonlinear regression models with interactions but is nonconvex and challenging to solve.
- We propose a proximal algorithm to directly solve the Weak Hierarchical Lasso.
- We show that the associated proximal operator admits a closed-form solution.
- We further accelerate the computation of each proximal operator subproblem from quadratic to linearithmic time.
- Empirical studies demonstrate the superior efficiency and effectiveness of the proposed algorithm.

23 Future Work
- Apply the nonconvex Weak Hierarchical Lasso to other challenging applications such as depression studies.
- Extend the proposed algorithm to solve Strong Hierarchical Lasso problems.

24 Thank You!
