An Efficient Algorithm For Weak Hierarchical Lasso. Yashu Liu, Jie Wang, Jieping Ye Arizona State University
|
|
- Anthony Ball
- 5 years ago
- Views:
Transcription
1 An Efficient Algorithm For Weak Hierarchical Lasso Yashu Liu, Jie Wang, Jieping Ye Arizona State University
2 Outline Regression with Interactions Problems and Challenges Weak Hierarchical Lasso The Proposed Method Experimental Results Conclusion Future Work
3 Linear Regression Linear Regression is a widely used tool in data mining and machine learning areas A linear regression model is yy = xx TT ww + εε Outcome Feature Coefficients Noise yy xx TT ww εε = +
4 Regression with Interactions Regression with only linear effects may not be sufficient One effective way to capture the nonlinearity is to include interactions yy = xx TT ww xxtt QQQQ + εε Feature Coefficients Coefficients xx TT ww xx TT QQ xx Main Effects Interactions
5 Issues of Including Interactions ISSUE 1: Including interactions leads to ultra high dimensionality e.g., 100 features will result in 10,000 interactions ISSUE 2: Hierarchical structural estimator is typically difficult to obtain
6 Related Work Ling Yan et al used lowrank interaction model for click through rate prediction Peter Radchenko et al proposed a Lasso Type penalty VANISH to achieve strong hierarchical structure Ming Yuan et al proposed a type of nonnegative garrote method to achieve hierarchical structures Peng Zhao et al proposed Composite Absolute Penalties for hierarchical interaction models Nam Hee Choi et al formulated the interaction coefficients as the product of main effects coefficients to achieve strong hierarchy
7 Weak Hierarchical Lasso (Jacob et al Annals of Statistics) Weak Hierarchy An interaction is selected only if at least one of the main effects is included in the model QQ iiii 00 only if ww ii 00 OR ww jj 00. Age Gender MMSE Age Gender Age MMSE Gender MMSE
8 Weak Hierarchical Lasso Weak Hierarchical Constraints QQ jj 11 ww jj for jj = 11,, dd M1 M2 M3 M4 M5 When jj = 1, ww M1 M2 M3 M4 M5 QQ M1 M1 M1 M2 M1 M3 M1 M4 M1 M5
9 Weak Hierarchical Lasso Weak Hierarchical Constraints QQ jj 11 ww jj for jj = 11,, dd M1 M2 M3 M4 M5 When jj = 2, ww M1 M2 M3 M4 M5 QQ M1 M2 M2 M2 M2 M3 M2 M4 M2 M5
10 Weak Hierarchical Lasso Let LL(ww, QQ) be the loss function for interaction regression The Weak Hierarchical Lasso formulation: min ww,qq LL ww, QQ + λλ ww 1 + λλ 2 QQ 1 ss. tt. QQ,jj 1 ww jj, jj = 1,, dd. Constraint QQ,jj 1 ww jj makes the optimization problem nonconvex
11 Weak Hierarchical Lasso Jacob et al propose to solve a relaxed version: min ww,qq LL ww+ ww, QQ + λλ1 TT (ww + + ww ) + λλ 2 QQ 1 ss. tt. QQ,jj ww + 1 jj + ww jj ww + jj 0, jj = 1,, dd ww jj 0 ww 1 is relaxed to ww + + ww such that the optimization problem is convex The relaxed formulation guarantees a weak hierarchical solution only if a ridge penalty exists
12 Main Contributions We propose to directly solve the nonconvex Weak Hierarchical Lasso by a proximal algorithm We show that the associated proximal operator admits a closedform solution We accelerate the computation of each subproblem of the proximal operator from quadratic to linearithmic
13 Weak Hierarchical Lasso : Proximal Operator min ww,qq LL ww, QQ + λλ ww 1 + λλ 2 QQ 1 ss. tt. QQ,jj 1 ww jj, jj = 1,, dd Proximal Operator: arg min ww,qq 1 2 ww vv QQ UU FF 2 + λλ tt ww 1 + λλ 2tt QQ 1 ss. tt. QQ,jj 1 ww jj, jj = 1,, dd
14 Factorizing Unknown Variables The KEY IDEA is to factorize unknown coefficients ww and QQ by their signs SS ww, SS QQ and magnitudes ww and QQ Unknown coefficients ww and QQ are factorized as = + + = ww SS ww ww QQ = + + = + = + + = + = + + = SS QQ QQ
15 Proximal Operator Subproblem The proximal operator problem can be decoupled ww jj = SS ww jj ww jj By factorization QQ,jj = SS QQjj QQ,jj ww jj 0 QQ,jj 0, the subproblem is: 1 min SS ww jj,ss QQ, ww jj jj, QQ,jj 2 SS wwjj ww 2 1 jj vv jj SS 2 λλ QQ jj QQ jj UU,jj + FF tt ww jj + λλ 2tt 1TT QQ jj ss. tt. 1TT QQ jj ww jj QQ,jj 0, jj = 1,, dd
16 Solving Sign Variables We solve sign variables SS ww, SS QQ and magnitude variables ww, QQ together Since 1 2 ww 2 1 jj vv jj = 2 SS wwjj ww jj vv jj = 1 2 SS wwjj (SS wwjj ww jj vv jj ) 2 2 = 1 2 ww jj SS ww jj vv jj 2, then SS ww = sign(vv) SS QQjj must be the sign of UU,jj, i.e., SS QQjj = sign(uu,jj )
17 Solving Magnitude Variables The magnitude variables can thus be obtained by solving 1 min QQ,jj, ww jj 2 ww 2 1 jj vv jj + QQ λλ jj UU,jj + 2 tt ww jj + λλ 2tt 1TT QQ jj ss. tt. 1TT QQ jj ww jj QQ,jj 0, where vv jj = SS ww jj vv jj λλ tt 11 and UU,jj = SS QQjj UU,jj λλ 2tt 11 The above subproblem has closed form solutions
18 Accelerating Computation A straightforward implementation of solving the magnitude variables will take quadratic time Not scalable for large scale data We further accelerate the time complexity of computing each proximal operator subproblem to linearithmic Exploit special structure of the subproblem
19 Comparisons of Efficiency We compare the efficiency of our algorithm with the convex Weak Hierarchical Lasso in terms of running time and numbers of iterations
20 Comparisons of Recovery We compare the recovery performances of the two algorithms
21 Performance on Real Data We compare the original Weak Hierarchical Lasso with the convex version and other baseline methods on ADNI data
22 Conclusion We study Weak Hierarchical Lasso which is popular for building nonlinear regression models with interactions nonconvex, challenging to solve We propose a proximal algorithm to directly solve the Weak Hierarchical Lasso We show that the associated proximal operator admits a closed form solution We further accelerate the computation of each proximal operator subproblem from quadratic to linearithmic Empirical study demonstrated superior efficiency and effectiveness of the proposed algorithm
23 Future Work Apply the nonconvex Weak Hierarchical Lasso to other challenging applications such as depression study Extend our proposed algorithm to solve Strong Hierarchical Lasso problems
24 Thank You!
Approximate Second Order Algorithms. Seo Taek Kong, Nithin Tangellamudi, Zhikai Guo
Approximate Second Order Algorithms Seo Taek Kong, Nithin Tangellamudi, Zhikai Guo Why Second Order Algorithms? Invariant under affine transformations e.g. stretching a function preserves the convergence
More informationLecture 11. Kernel Methods
Lecture 11. Kernel Methods COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Andrey Kan Copyright: University of Melbourne This lecture The kernel trick Efficient computation of a dot product
More informationLecture 6. Notes on Linear Algebra. Perceptron
Lecture 6. Notes on Linear Algebra. Perceptron COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Andrey Kan Copyright: University of Melbourne This lecture Notes on linear algebra Vectors
More informationWorksheets for GCSE Mathematics. Quadratics. mr-mathematics.com Maths Resources for Teachers. Algebra
Worksheets for GCSE Mathematics Quadratics mr-mathematics.com Maths Resources for Teachers Algebra Quadratics Worksheets Contents Differentiated Independent Learning Worksheets Solving x + bx + c by factorisation
More informationChapter 22 : Electric potential
Chapter 22 : Electric potential What is electric potential? How does it relate to potential energy? How does it relate to electric field? Some simple applications What does it mean when it says 1.5 Volts
More informationSupport Vector Machines. CSE 4309 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Support Vector Machines CSE 4309 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 A Linearly Separable Problem Consider the binary classification
More informationWorksheets for GCSE Mathematics. Algebraic Expressions. Mr Black 's Maths Resources for Teachers GCSE 1-9. Algebra
Worksheets for GCSE Mathematics Algebraic Expressions Mr Black 's Maths Resources for Teachers GCSE 1-9 Algebra Algebraic Expressions Worksheets Contents Differentiated Independent Learning Worksheets
More informationAn Efficient Algorithm For Weak Hierarchical Lasso
An Efficient Algorithm For Weak Hierarchical Lasso Yashu Liu Arizona State University Tempe, AZ 8587 Yashu.Liu@asu.edu Jie Wang Arizona State University Tempe, AZ 8587 jie.wang.ustc@asu.edu Jieping Ye
More informationOptimization methods
Optimization methods Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda /8/016 Introduction Aim: Overview of optimization methods that Tend to
More informationLecture 3. STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher
Lecture 3 STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher Previous lectures What is machine learning? Objectives of machine learning Supervised and
More informationA Least Squares Formulation for Canonical Correlation Analysis
A Least Squares Formulation for Canonical Correlation Analysis Liang Sun, Shuiwang Ji, and Jieping Ye Department of Computer Science and Engineering Arizona State University Motivation Canonical Correlation
More informationLecture 3. Linear Regression
Lecture 3. Linear Regression COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Andrey Kan Copyright: University of Melbourne Weeks 2 to 8 inclusive Lecturer: Andrey Kan MS: Moscow, PhD:
More informationVariations. ECE 6540, Lecture 02 Multivariate Random Variables & Linear Algebra
Variations ECE 6540, Lecture 02 Multivariate Random Variables & Linear Algebra Last Time Probability Density Functions Normal Distribution Expectation / Expectation of a function Independence Uncorrelated
More informationConvex relaxation for Combinatorial Penalties
Convex relaxation for Combinatorial Penalties Guillaume Obozinski Equipe Imagine Laboratoire d Informatique Gaspard Monge Ecole des Ponts - ParisTech Joint work with Francis Bach Fête Parisienne in Computation,
More informationA Posteriori Error Estimates For Discontinuous Galerkin Methods Using Non-polynomial Basis Functions
Lin Lin A Posteriori DG using Non-Polynomial Basis 1 A Posteriori Error Estimates For Discontinuous Galerkin Methods Using Non-polynomial Basis Functions Lin Lin Department of Mathematics, UC Berkeley;
More informationLecture 10: Logistic Regression
BIOINF 585: Machine Learning for Systems Biology & Clinical Informatics Lecture 10: Logistic Regression Jie Wang Department of Computational Medicine & Bioinformatics University of Michigan 1 Outline An
More information7.3 The Jacobi and Gauss-Seidel Iterative Methods
7.3 The Jacobi and Gauss-Seidel Iterative Methods 1 The Jacobi Method Two assumptions made on Jacobi Method: 1.The system given by aa 11 xx 1 + aa 12 xx 2 + aa 1nn xx nn = bb 1 aa 21 xx 1 + aa 22 xx 2
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationA semi-device-independent framework based on natural physical assumptions
AQIS 2017 4-8 September 2017 A semi-device-independent framework based on natural physical assumptions and its application to random number generation T. Van Himbeeck, E. Woodhead, N. Cerf, R. García-Patrón,
More informationCOMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017
COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University UNDERDETERMINED LINEAR EQUATIONS We
More informationStatistical Learning with the Lasso, spring The Lasso
Statistical Learning with the Lasso, spring 2017 1 Yeast: understanding basic life functions p=11,904 gene values n number of experiments ~ 10 Blomberg et al. 2003, 2010 The Lasso fmri brain scans function
More informationNumber Representations
Computer Arithmetic Algorithms Prof. Dae Hyun Kim School of Electrical Engineering and Computer Science Washington State University Number Representations Information Textbook Israel Koren, Computer Arithmetic
More informationInferring the origin of an epidemic with a dynamic message-passing algorithm
Inferring the origin of an epidemic with a dynamic message-passing algorithm HARSH GUPTA (Based on the original work done by Andrey Y. Lokhov, Marc Mézard, Hiroki Ohta, and Lenka Zdeborová) Paper Andrey
More informationConvex Optimization Algorithms for Machine Learning in 10 Slides
Convex Optimization Algorithms for Machine Learning in 10 Slides Presenter: Jul. 15. 2015 Outline 1 Quadratic Problem Linear System 2 Smooth Problem Newton-CG 3 Composite Problem Proximal-Newton-CD 4 Non-smooth,
More informationMachine Learning And Applications: Supervised Learning-SVM
Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization
More informationDan Roth 461C, 3401 Walnut
CIS 519/419 Applied Machine Learning www.seas.upenn.edu/~cis519 Dan Roth danroth@seas.upenn.edu http://www.cis.upenn.edu/~danroth/ 461C, 3401 Walnut Slides were created by Dan Roth (for CIS519/419 at Penn
More informationGrover s algorithm. We want to find aa. Search in an unordered database. QC oracle (as usual) Usual trick
Grover s algorithm Search in an unordered database Example: phonebook, need to find a person from a phone number Actually, something else, like hard (e.g., NP-complete) problem 0, xx aa Black box ff xx
More informationCSC 576: Variants of Sparse Learning
CSC 576: Variants of Sparse Learning Ji Liu Department of Computer Science, University of Rochester October 27, 205 Introduction Our previous note basically suggests using l norm to enforce sparsity in
More informationVariable Selection for Highly Correlated Predictors
Variable Selection for Highly Correlated Predictors Fei Xue and Annie Qu Department of Statistics, University of Illinois at Urbana-Champaign WHOA-PSI, Aug, 2017 St. Louis, Missouri 1 / 30 Background Variable
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 8: Optimization Cho-Jui Hsieh UC Davis May 9, 2017 Optimization Numerical Optimization Numerical Optimization: min X f (X ) Can be applied
More informationSparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28
Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:
More informationAdaptive Primal Dual Optimization for Image Processing and Learning
Adaptive Primal Dual Optimization for Image Processing and Learning Tom Goldstein Rice University tag7@rice.edu Ernie Esser University of British Columbia eesser@eos.ubc.ca Richard Baraniuk Rice University
More informationGeneral Strong Polarization
General Strong Polarization Madhu Sudan Harvard University Joint work with Jaroslaw Blasiok (Harvard), Venkatesan Gurswami (CMU), Preetum Nakkiran (Harvard) and Atri Rudra (Buffalo) December 4, 2017 IAS:
More informationCSC 578 Neural Networks and Deep Learning
CSC 578 Neural Networks and Deep Learning Fall 2018/19 3. Improving Neural Networks (Some figures adapted from NNDL book) 1 Various Approaches to Improve Neural Networks 1. Cost functions Quadratic Cross
More informationBayesian Grouped Horseshoe Regression with Application to Additive Models
Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu, Daniel F. Schmidt, Enes Makalic, Guoqi Qian, and John L. Hopper Centre for Epidemiology and Biostatistics, Melbourne
More informationQuantum Mechanics. An essential theory to understand properties of matter and light. Chemical Electronic Magnetic Thermal Optical Etc.
Quantum Mechanics An essential theory to understand properties of matter and light. Chemical Electronic Magnetic Thermal Optical Etc. Fall 2018 Prof. Sergio B. Mendes 1 CHAPTER 3 Experimental Basis of
More informationReview for Exam Hyunse Yoon, Ph.D. Assistant Research Scientist IIHR-Hydroscience & Engineering University of Iowa
57:020 Fluids Mechanics Fall2013 1 Review for Exam3 12. 11. 2013 Hyunse Yoon, Ph.D. Assistant Research Scientist IIHR-Hydroscience & Engineering University of Iowa 57:020 Fluids Mechanics Fall2013 2 Chapter
More informationAccelerated primal-dual methods for linearly constrained convex problems
Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize
More informationMinimizing the Difference of L 1 and L 2 Norms with Applications
1/36 Minimizing the Difference of L 1 and L 2 Norms with Department of Mathematical Sciences University of Texas Dallas May 31, 2017 Partially supported by NSF DMS 1522786 2/36 Outline 1 A nonconvex approach:
More informationSolving DC Programs that Promote Group 1-Sparsity
Solving DC Programs that Promote Group 1-Sparsity Ernie Esser Contains joint work with Xiaoqun Zhang, Yifei Lou and Jack Xin SIAM Conference on Imaging Science Hong Kong Baptist University May 14 2014
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 2: Vector Data: Prediction Instructor: Yizhou Sun yzsun@cs.ucla.edu October 8, 2018 TA Office Hour Time Change Junheng Hao: Tuesday 1-3pm Yunsheng Bai: Thursday 1-3pm
More informationRadial Basis Function (RBF) Networks
CSE 5526: Introduction to Neural Networks Radial Basis Function (RBF) Networks 1 Function approximation We have been using MLPs as pattern classifiers But in general, they are function approximators Depending
More informationSECTION 5: POWER FLOW. ESE 470 Energy Distribution Systems
SECTION 5: POWER FLOW ESE 470 Energy Distribution Systems 2 Introduction Nodal Analysis 3 Consider the following circuit Three voltage sources VV sss, VV sss, VV sss Generic branch impedances Could be
More information1. The graph of a function f is given above. Answer the question: a. Find the value(s) of x where f is not differentiable. Ans: x = 4, x = 3, x = 2,
1. The graph of a function f is given above. Answer the question: a. Find the value(s) of x where f is not differentiable. x = 4, x = 3, x = 2, x = 1, x = 1, x = 2, x = 3, x = 4, x = 5 b. Find the value(s)
More informationAn Efficient Graph Sparsification Approach to Scalable Harmonic Balance (HB) Analysis of Strongly Nonlinear RF Circuits
Design Automation Group An Efficient Graph Sparsification Approach to Scalable Harmonic Balance (HB) Analysis of Strongly Nonlinear RF Circuits Authors : Lengfei Han (Speaker) Xueqian Zhao Dr. Zhuo Feng
More informationCS Lecture 8 & 9. Lagrange Multipliers & Varitional Bounds
CS 6347 Lecture 8 & 9 Lagrange Multipliers & Varitional Bounds General Optimization subject to: min ff 0() R nn ff ii 0, h ii = 0, ii = 1,, mm ii = 1,, pp 2 General Optimization subject to: min ff 0()
More informationSVRG++ with Non-uniform Sampling
SVRG++ with Non-uniform Sampling Tamás Kern András György Department of Electrical and Electronic Engineering Imperial College London, London, UK, SW7 2BT {tamas.kern15,a.gyorgy}@imperial.ac.uk Abstract
More informationOnline Mode Shape Estimation using Complex Principal Component Analysis and Clustering, Hallvar Haugdal (NTMU Norway)
Online Mode Shape Estimation using Complex Principal Component Analysis and Clustering, Hallvar Haugdal (NTMU Norway) Mr. Hallvar Haugdal, finished his MSc. in Electrical Engineering at NTNU, Norway in
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationSOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING. Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu
SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu LITIS - EA 48 - INSA/Universite de Rouen Avenue de l Université - 768 Saint-Etienne du Rouvray
More informationLearning with L q<1 vs L 1 -norm regularisation with exponentially many irrelevant features
Learning with L q
More informationRotational Motion. Chapter 10 of Essential University Physics, Richard Wolfson, 3 rd Edition
Rotational Motion Chapter 10 of Essential University Physics, Richard Wolfson, 3 rd Edition 1 We ll look for a way to describe the combined (rotational) motion 2 Angle Measurements θθ ss rr rrrrrrrrrrrrrr
More informationExploratory quantile regression with many covariates: An application to adverse birth outcomes
Exploratory quantile regression with many covariates: An application to adverse birth outcomes June 3, 2011 eappendix 30 Percent of Total 20 10 0 0 1000 2000 3000 4000 5000 Birth weights efigure 1: Histogram
More informationIntroduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin
1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)
More informationStatistical Machine Learning for Structured and High Dimensional Data
Statistical Machine Learning for Structured and High Dimensional Data (FA9550-09- 1-0373) PI: Larry Wasserman (CMU) Co- PI: John Lafferty (UChicago and CMU) AFOSR Program Review (Jan 28-31, 2013, Washington,
More informationLecture No. 1 Introduction to Method of Weighted Residuals. Solve the differential equation L (u) = p(x) in V where L is a differential operator
Lecture No. 1 Introduction to Method of Weighted Residuals Solve the differential equation L (u) = p(x) in V where L is a differential operator with boundary conditions S(u) = g(x) on Γ where S is a differential
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationSimultaneous state and input estimation of non-linear process with unknown inputs using particle swarm optimization particle filter (PSO-PF) algorithm
Simultaneous state and input estimation of non-linear process with unknown inputs using particle swarm optimization particle filter (PSO-PF) algorithm Mohammad A. Khan, CSChe 2016 Outlines Motivations
More informationKinetic Model Parameter Estimation for Product Stability: Non-uniform Finite Elements and Convexity Analysis
Kinetic Model Parameter Estimation for Product Stability: Non-uniform Finite Elements and Convexity Analysis Mark Daichendt, Lorenz Biegler Carnegie Mellon University Pittsburgh, PA 15217 Ben Weinstein,
More informationVariable Selection in Predictive Regressions
Variable Selection in Predictive Regressions Alessandro Stringhi Advanced Financial Econometrics III Winter/Spring 2018 Overview This chapter considers linear models for explaining a scalar variable when
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 11, 2016 Paper presentations and final project proposal Send me the names of your group member (2 or 3 students) before October 15 (this Friday)
More information" = Y(#,$) % R(r) = 1 4& % " = Y(#,$) % R(r) = Recitation Problems: Week 4. a. 5 B, b. 6. , Ne Mg + 15 P 2+ c. 23 V,
Recitation Problems: Week 4 1. Which of the following combinations of quantum numbers are allowed for an electron in a one-electron atom: n l m l m s 2 2 1! 3 1 0 -! 5 1 2! 4-1 0! 3 2 1 0 2 0 0 -! 7 2-2!
More informationA New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables
A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of
More informationOn one Application of Newton s Method to Stability Problem
Journal of Multidciplinary Engineering Science Technology (JMEST) ISSN: 359-0040 Vol. Issue 5, December - 204 On one Application of Newton s Method to Stability Problem Şerife Yılmaz Department of Mathematics,
More informationTerms of Use. Copyright Embark on the Journey
Terms of Use All rights reserved. No part of this packet may be reproduced, stored in a retrieval system, or transmitted in any form by any means - electronic, mechanical, photo-copies, recording, or otherwise
More informationStingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent
: Safely Avoiding Wasteful Updates in Coordinate Descent Tyler B. Johnson 1 Carlos Guestrin 1 Abstract Coordinate descent (CD) is a scalable and simple algorithm for solving many optimization problems
More informationPredicting Winners of Competitive Events with Topological Data Analysis
Predicting Winners of Competitive Events with Topological Data Analysis Conrad D Souza Ruben Sanchez-Garcia R.Sanchez-Garcia@soton.ac.uk Tiejun Ma tiejun.ma@soton.ac.uk Johnnie Johnson J.E.Johnson@soton.ac.uk
More informationSparse Sum-of-Squares Optimization for Model Updating through. Minimization of Modal Dynamic Residuals
Sparse Sum-of-Squares Optimization for Model Updating through Minimization of Modal Dynamic Residuals 1 Dan Li, 1, 2 * Yang Wang 1 School of Civil and Environmental Engineering, Georgia Institute of Technology,
More informationBAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage
BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement
More informationDetecting Statistical Interactions from Neural Network Weights
Detecting Statistical Interactions from Neural Network Weights Michael Tsang Joint work with Dehua Cheng, Yan Liu 1/17 Motivation: We seek assurance that a neural network learned the longitude x latitude
More informationControl of Mobile Robots
Control of Mobile Robots Regulation and trajectory tracking Prof. Luca Bascetta (luca.bascetta@polimi.it) Politecnico di Milano Dipartimento di Elettronica, Informazione e Bioingegneria Organization and
More informationCS249: ADVANCED DATA MINING
CS249: ADVANCED DATA MINING Vector Data: Clustering: Part II Instructor: Yizhou Sun yzsun@cs.ucla.edu May 3, 2017 Methods to Learn: Last Lecture Classification Clustering Vector Data Text Data Recommender
More informationMLCC 2018 Variable Selection and Sparsity. Lorenzo Rosasco UNIGE-MIT-IIT
MLCC 2018 Variable Selection and Sparsity Lorenzo Rosasco UNIGE-MIT-IIT Outline Variable Selection Subset Selection Greedy Methods: (Orthogonal) Matching Pursuit Convex Relaxation: LASSO & Elastic Net
More informationUncertain Compression & Graph Coloring. Madhu Sudan Harvard
Uncertain Compression & Graph Coloring Madhu Sudan Harvard Based on joint works with: (1) Adam Kalai (MSR), Sanjeev Khanna (U.Penn), Brendan Juba (WUStL) (2) Elad Haramaty (Harvard) (3) Badih Ghazi (MIT),
More informationCoordinate descent. Geoff Gordon & Ryan Tibshirani Optimization /
Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods
More informationA A A A A A A A A A A A. a a a a a a a a a a a a a a a. Apples taste amazingly good.
Victorian Handwriting Sheet Aa A A A A A A A A A A A A Aa Aa Aa Aa Aa Aa Aa a a a a a a a a a a a a a a a Apples taste amazingly good. Apples taste amazingly good. Now make up a sentence of your own using
More informationLOWELL WEEKLY JOURNAL. ^Jberxy and (Jmott Oao M d Ccmsparftble. %m >ai ruv GEEAT INDUSTRIES
? (») /»» 9 F ( ) / ) /»F»»»»»# F??»»» Q ( ( »»» < 3»» /» > > } > Q ( Q > Z F 5
More informationAlgorithms for Nonsmooth Optimization
Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization
More informationJasmin Smajic1, Christian Hafner2, Jürg Leuthold2, March 23, 2015
Jasmin Smajic, Christian Hafner 2, Jürg Leuthold 2, March 23, 205 Time Domain Finite Element Method (TD FEM): Continuous and Discontinuous Galerkin (DG-FEM) HSR - University of Applied Sciences of Eastern
More informationLasso: Algorithms and Extensions
ELE 538B: Sparsity, Structure and Inference Lasso: Algorithms and Extensions Yuxin Chen Princeton University, Spring 2017 Outline Proximal operators Proximal gradient methods for lasso and its extensions
More informationA Precise Model of TSV Parasitic Capacitance Considering Temperature for 3D IC DENG Quan ZHANG Min-Xuan ZHAO Zhen-Yu LI Peng
International Conference on Automation, Mechanical Control and Computational Engineering (AMCCE 2015) A Precise Model of TSV Parasitic Capacitance Considering Temperature for 3D IC DENG Quan ZHANG Min-Xuan
More information25 : Graphical induced structured input/output models
10-708: Probabilistic Graphical Models 10-708, Spring 2016 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Raied Aljadaany, Shi Zong, Chenchen Zhu Disclaimer: A large
More informationEstimation of cumulative distribution function with spline functions
INTERNATIONAL JOURNAL OF ECONOMICS AND STATISTICS Volume 5, 017 Estimation of cumulative distribution function with functions Akhlitdin Nizamitdinov, Aladdin Shamilov Abstract The estimation of the cumulative
More informationDistributed Inexact Newton-type Pursuit for Non-convex Sparse Learning
Distributed Inexact Newton-type Pursuit for Non-convex Sparse Learning Bo Liu Department of Computer Science, Rutgers Univeristy Xiao-Tong Yuan BDAT Lab, Nanjing University of Information Science and Technology
More informationHigh-dimensional Joint Sparsity Random Effects Model for Multi-task Learning
High-dimensional Joint Sparsity Random Effects Model for Multi-task Learning Krishnakumar Balasubramanian Georgia Institute of Technology krishnakumar3@gatech.edu Kai Yu Baidu Inc. yukai@baidu.com Tong
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationGROUP-SPARSE SUBSPACE CLUSTERING WITH MISSING DATA
GROUP-SPARSE SUBSPACE CLUSTERING WITH MISSING DATA D Pimentel-Alarcón 1, L Balzano 2, R Marcia 3, R Nowak 1, R Willett 1 1 University of Wisconsin - Madison, 2 University of Michigan - Ann Arbor, 3 University
More informationRobust Inverse Covariance Estimation under Noisy Measurements
.. Robust Inverse Covariance Estimation under Noisy Measurements Jun-Kun Wang, Shou-De Lin Intel-NTU, National Taiwan University ICML 2014 1 / 30 . Table of contents Introduction.1 Introduction.2 Related
More informationHeteroskedasticity ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD
Heteroskedasticity ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD Introduction For pedagogical reasons, OLS is presented initially under strong simplifying assumptions. One of these is homoskedastic errors,
More informationDISTURBANCE OBSERVER BASED CONTROL: CONCEPTS, METHODS AND CHALLENGES
DISTURBANCE OBSERVER BASED CONTROL: CONCEPTS, METHODS AND CHALLENGES Wen-Hua Chen Professor in Autonomous Vehicles Department of Aeronautical and Automotive Engineering Loughborough University 1 Outline
More informationAn Efficient Proximal Gradient Method for General Structured Sparse Learning
Journal of Machine Learning Research 11 (2010) Submitted 11/2010; Published An Efficient Proximal Gradient Method for General Structured Sparse Learning Xi Chen Qihang Lin Seyoung Kim Jaime G. Carbonell
More informationBig Bang Planck Era. This theory: cosmological model of the universe that is best supported by several aspects of scientific evidence and observation
Big Bang Planck Era Source: http://www.crystalinks.com/bigbang.html Source: http://www.odec.ca/index.htm This theory: cosmological model of the universe that is best supported by several aspects of scientific
More informationOn The Cauchy Problem For Some Parabolic Fractional Partial Differential Equations With Time Delays
Journal of Mathematics and System Science 6 (216) 194-199 doi: 1.17265/2159-5291/216.5.3 D DAVID PUBLISHING On The Cauchy Problem For Some Parabolic Fractional Partial Differential Equations With Time
More informationComputational and Statistical Aspects of Statistical Machine Learning. John Lafferty Department of Statistics Retreat Gleacher Center
Computational and Statistical Aspects of Statistical Machine Learning John Lafferty Department of Statistics Retreat Gleacher Center Outline Modern nonparametric inference for high dimensional data Nonparametric
More informationWritten Examination
Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes
More informationRegularized Least Squares Temporal Difference learning with nested l 2 and l 1 penalization
Regularized Least Squares Temporal Difference learning with nested l 2 and l 1 penalization Matthew W. Hoffman 1, Alessandro Lazaric 2, Mohammad Ghavamzadeh 2, and Rémi Munos 2 1 University of British
More informationSection 2: Equations and Inequalities
Topic 1: Equations: True or False?... 29 Topic 2: Identifying Properties When Solving Equations... 31 Topic 3: Solving Equations... 34 Topic 4: Solving Equations Using the Zero Product Property... 36 Topic
More informationAccelerate Subgradient Methods
Accelerate Subgradient Methods Tianbao Yang Department of Computer Science The University of Iowa Contributors: students Yi Xu, Yan Yan and colleague Qihang Lin Yang (CS@Uiowa) Accelerate Subgradient Methods
More informationThe Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA
The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:
More information