Is the Whole Greater Than the Sum of Its Parts?

Size: px
Start display at page:

Download "Is the Whole Greater Than the Sum of Its Parts?"

Transcription

1 Is the Whole Greater Than the Sum of Its Parts? Presenter: Liangyue Li Joint work with Liangyue Li (ASU) Hanghang Tong (ASU) Yong Wang (HKUST) Conglei Shi (IBM->Airbnb) Nan Cao (Tongji) Norbou Buchler (US ARL) - 1 -

2 From the Ancient Philosophy The whole is greater than the sum of its parts. -- Aristotle Whole: a collection of parts Parts: individual elements Aristotle s hypothesis: whole > sum of parts - 2 -

3 Part-Whole in Team Science Research Team Sports Team Film Crew Whole Team Parts Team members Sales Team - 3 -

4 Part-Whole Beyond Teams Autonomous System Whole: system Parts: individual drones Stock Market Whole: DJIA Parts: individual stock Community Question Answering Whole: question Parts: individual answers System Reliability Whole: system Parts: individual component - 4 -

5 Outcome of Part-Whole Whole: Team Part: Members Whole outcome: Team s performance Part outcome: each member s performance Whole: Researcher Part: Publications Whole outcome: h-index Part outcome: #citations of publications Question: how can we predict the outcome of whole/parts? - 5 -

6 Predict the Part-Whole Outcomes Existing Algorithmic Work Separate models for parts and whole Joint linear models Aristotle s hypothesis: whole>sum(parts) Question: how to jointly predict part/whole by leveraging the part-whole relationship beyond the linear models? - 6 -

7 Avg. Answer Impact Challenges -- Modeling Nonlinear Part-whole Relationship Max: impact of a question is strongly correlated with that of the best answer Min: classic Wooden Bucket Theory Sparsity: team performance often dominated by a few top-performing team members - 7 -

8 Challenges Modeling (con t) Part-part Interdependency Parts are connected via underlying network Impact of such interdependency on outcomes Hypothesis-1: similar parts -> similar contribution to whole Hypothesis-2: similar parts -> similar parts outcome whole parts Question: how can we utilize 1. nonlinear part-whole relationship 2. part-part interdependency

9 Challenges -- Algorithm Non-linearity + Interdependency high complexity Question: how to scale up the computation? - 9 -

10 Part-Whole Outcome Prediction Movies (Whole) Actors/Actress (Parts) F p F o y p y o G p Given: 1. feature matrix for whole/part F o /F p 2. outcome vector for whole/part y o /y p 3. whole to part mapping φ 4. parts network G p (optional) Predict: outcome of new whole/parts

11 Roadmap Motivations PAROLE -- Modeling Generic Framework Modeling Part-Whole Relationship Modeling Part-Part Interdependency PAROLE -- Optimization Empirical Evaluations Conclusions

12 A Generic Joint Prediction Framework -- PAROLE Formulation min J = J o + J p + J po + J pp + J r J o Movie (Whole) F o (1, : ) Actor/Actress (Part) F p (5, : ) J po F o (2, : ) F p (4, : ) = 1 n o n o i=1 L[f F o i, :, w o, y o i )] + 1 n p n p L[f F p i, :, w p, y p i )] i=1 + α n o n o i=1 h(f F o i, :, w o, Agg(φ(o i ))) J p F p (1, : ) J pp F p (2, : ) F p (3, : ) J o : Predictive Model for Whole J p : Predictive Model for Part J po : Part-whole Relationship n p + β n p i=1 n p G p ij g(f F p i, :, w p, f(f p j, :, w p )) j=1 J pp : Part-part Interdependency γ(ω w o + Ω w p ) J r : parameter regularizer

13 Roadmap Motivations PAROLE -- Modeling Generic Framework Modeling Part-Whole Relationship Modeling Part-Part Interdependency PAROLE -- Optimization Empirical Evaluations Conclusions J o Movie (Whole) F o (1, : ) Actor/Actress (Part) F p (5, : ) J p J pp F p (1, : ) F o (2, : ) J po F p (4, : ) F p (3, : ) F p (2, : )

14 Modeling Part-Whole Relationship Agg(o i ) Overview: for each whole entity o i, define e i = F o i, w o Agg o i e i : Measure the difference between predicted whole outcome using whole feature predicted whole outcome using aggregated parts outcome Key idea: model part-whole relations by Different loss functions on e i Different aggregation functions Agg( )

15 Overview Intuition: whole (weighted) sum of parts Details: a j i : weight of part j s contribution to the whole o i s outcome Remark: e i = F o i, w o Agg o i Agg o i = a i j F p j, : w p j φ(o i ) Characterize part-whole relationships Agg(o i ) i a 1 i a i 2 a 3 a 4 i Use different loss functions on e i Use different norms on a i

16 Linear Part-Whole Relation linear a 1 i a 2 i a 3 i a 4 i Intuition: Whole linear combination of parts some parts play more important roles than the others in contributing to the whole outcome Details: J po = α 2n o n o 2 i=1 e i Remark: a j i = 1: the whole is the sum of its parts a i j = 1 : average coupling o i

17 Sparse Part-Whole Relation Intuition: Whole a few parts sparse a 1 i a 2 i = 0 a 3 i a 4 i some parts have little or no effect on the whole outcome Details: J po = α n o n o i=1( 1 e 2 i 2 +λ a i 1 ) Remark: The l 1 norm can shrink some part contributions a i j to exactly zero Nonlinear part-whole relation

18 Ordered Sparse Part-Whole Relation Intuition: Whole a few top parts team performance is determined by not only a few key members, but also the structural hierarchy between them Details: J po = α n o n o i=1( 1 e 2 i 2 +λω w (a i )) n Ω w x = i=1 x [i] w i = w T x : ordered weighted l 1 norm w K m+ : vector of non-increasing nonnegative weights

19 Robust Part-Whole Relation Intuition: Whole parts that are not outliers squared loss is sensitive to outliers. Solution: robust regression model Details: J po = α n o n o i=1 ρ( ) is robust estimator ρ e i

20 Maximum Part-Whole Relation Intuition: Whole max(parts) team performance dominated by the best team member/leader Details: Agg o i = max(parts outcome) [not differentiable] Soft max function: max x 1, x 2,, x n ln(exp x 1 + exp x exp(x n )) Aggregation: Agg o i = ln( j φ(oi ) exp(fp j, : w p )) J po = α 2n o n o 2 ei i=1 max

21 Summarize Part-Whole Relations Name Agg(o i ) Aggregation of parts Maximum ln( exp(f p j, : w p )) J po Sub-objective Remark α 2n o e i 2 Nonlinear Whole max(parts) Linear a j i F p j, : w p α 2n o e i 2 Linear Whole linear combination of parts Sparse a j i F p j, : w p α n o ( 1 2 e i 2 + λ a i 1 ) Nonlinear Whole a few parts Ordered Sparse a j i F p j, : w p α n o ( 1 2 e i 2 + λω w a i ) Nonlinear Whole a few top parts Robust a j i F p j, : w p α n o ρ(e i ) Nonlinear Whole parts that are not outliers

22 Roadmap Motivations PAROLE -- Modeling Generic Framework Modeling Part-Whole Relationship Modeling Part-Part Interdependency PAROLE -- Optimization Empirical Evaluations Conclusions J o Movie (Whole) F o (1, : ) Actor/Actress (Part) F p (5, : ) J p J pp F p (1, : ) F o (2, : ) J po F p (4, : ) F p (3, : ) F p (2, : )

23 Modeling Part-Part Interdependency Effect on the whole outcome Intuition: closely connected parts might have similar contribution to the whole outcome Details: a 1 i a 2 i a 3 i a 4 i Similar parts (large G p kl ) similar contributions (a k i a l i ) p G 12 p G

24 Modeling Part-Part Interdependency Effect on the parts outcome Intuition: closely connected parts might share similar outcomes themselves Details: Fp 4, wp F p 1, w p p G 12 p G 14 Similar parts (large G ij p ) similar predicted outcomes (F p i, w p F p j, : w p )

25 Roadmap Motivations PAROLE -- Modeling PAROLE -- Optimization Empirical Evaluations Conclusions

26 Optimization Solution Formulation: J = J o w o + J p w p + J po w o, w p, a j i J pp w p Observation: + J r w o, w p Movie (Whole) Actor/Actress (Part) F p (5, : ) not jointly convex w.r.t. w o, w p, a i j J p F p (1, : ) J o F o (1, : ) J pp + F p (2, : ) F o (2, : ) J po F p (4, : ) F p (3, : ) Convex w.r.t. to one block while fixing others Solution: block coordinate descent

27 Block Coordinate Descent Three coordinate blocks: w o, w p i, a j Update one block while fixing others Update each block (proximal) gradient descent Maximum Agg Linear Agg Sparse Agg Order Sparse Agg α n o i=1 J po w o n o ei (F o i, : ) α n o i=1 J po w p n o ei j φ o i (F p (j, : )) y i p j φ o i y i p J po a i or proximal gradient update α (F o ) (F o w o MF p w p ) α F p M (F o w o MF p w p ) e i F p φ o i, w p + L p i a i n o n o α (F o ) (F o w o MF p w p ) α F p M (F o w o MF p w p ) z = a i τ e i F p φ o i, : w p + L i n o n o a i prox λτl1 (z) α (F o ) (F o w o MF p w p ) α F p M (F o w o MF p w p ) z = a i τ e i F p φ o i, : w p + L i n o n o a i prox λτωw (z) N/A p a i p a i Robust Agg α n o n o i=1 ρ e i e i F o i, : α n o n o i=1 ρ e i e i ( j φ o i a j F p (j, : ) ) α n o ρ e i F p φ o e i, : w p + L p i a i i

28 Optimization Properties Convergence and Optimality Under mild conditions, the optimization alg converges to a coordinate-wise minimum point Complexity The alg scales linearly w.r.t. the size of partwhole graph in both time and space Whole Parts Complexity: O(n o d o + n p d p + m po + m pp ) n o : #whole entities n p : #part entities m po : #links from whole to parts m pp : #links in part-part network d o, d p : feature dimension of whole, parts

29 Roadmap Motivations PAROLE -- Modeling PAROLE -- Optimization Empirical Evaluations Conclusions

30 Datasets Data Whole Part #Whole #Part Math SO DBLP Movie Question (#votes) Question (#votes) Author (h-index) Movie (# ) Answer (#votes) Answer (#votes) Paper (#citation) Actors/Actress (# ) 16,638 32,876 1,966,272 4,282, , ,756 5,043 37,365 Setup: sort whole in chronological order, gather first x percent and corresponding parts as training, test on last 10% Metric: root mean squared error (RMSE)

31 Joint Nonlinear Outcome Prediction Performance Math Sparse Observations 1. Joint prediction models > separate models 2. Non-linear part-whole relationships (max, Huber, Bisquare, Lasso, OWL) > linear relationships (Sum, Linear) 3. Lasso and OWL are the best methods in most cases

32 Outcome Prediction Performance Whole Parts Overall SO DBLP

33 Effect of part-part interdependency Movie PAROLE-Basic no network information Part-part interdependency on whole outcome and parts outcome both boost the performance

34 Convergence Analysis SO PAROLE converges fast (25-30 iterations)

35 Parameter Sensitivity Movie α controls importance of part-whole relation β controls importance of part-part interdependency Stable in a relatively large parameter space

36 Scalability of PAROLE SO part-whole graph size: PAROLE scales linearly w.r.t. part-whole graph size

37 Roadmap Motivations PAROLE -- Modeling PAROLE -- Optimization Empirical Evaluations Conclusions

38 Conclusions -- PAROLE Goals: leverage potentially non-linear partwhole relationships for outcome prediction Solutions: PAROLE Modeling Part-whole relationship Part-part interdependency Optimization Block coordinate descent Converges to a coordinate-wise minimum point Scales linearly w.r.t. the part-whole graph size

Is the Whole Greater Than the Sum of Its Parts?

Is the Whole Greater Than the Sum of Its Parts? ABSTRACT Is the Whole Greater Than the of Its Parts? Liangyue Li Arizona State University liangyue@asu.edu Conglei Shi IBM Research shiconglei@gmail.com The part-whole relationship routinely finds itself

More information

Linear Models for Regression CS534

Linear Models for Regression CS534 Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict

More information

Linear Models for Regression CS534

Linear Models for Regression CS534 Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict

More information

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28 Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:

More information

Optimization methods

Optimization methods Optimization methods Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda /8/016 Introduction Aim: Overview of optimization methods that Tend to

More information

Proximal Methods for Optimization with Spasity-inducing Norms

Proximal Methods for Optimization with Spasity-inducing Norms Proximal Methods for Optimization with Spasity-inducing Norms Group Learning Presentation Xiaowei Zhou Department of Electronic and Computer Engineering The Hong Kong University of Science and Technology

More information

Convex Optimization Algorithms for Machine Learning in 10 Slides

Convex Optimization Algorithms for Machine Learning in 10 Slides Convex Optimization Algorithms for Machine Learning in 10 Slides Presenter: Jul. 15. 2015 Outline 1 Quadratic Problem Linear System 2 Smooth Problem Newton-CG 3 Composite Problem Proximal-Newton-CD 4 Non-smooth,

More information

Learning Task Grouping and Overlap in Multi-Task Learning

Learning Task Grouping and Overlap in Multi-Task Learning Learning Task Grouping and Overlap in Multi-Task Learning Abhishek Kumar Hal Daumé III Department of Computer Science University of Mayland, College Park 20 May 2013 Proceedings of the 29 th International

More information

Logistic Regression with the Nonnegative Garrote

Logistic Regression with the Nonnegative Garrote Logistic Regression with the Nonnegative Garrote Enes Makalic Daniel F. Schmidt Centre for MEGA Epidemiology The University of Melbourne 24th Australasian Joint Conference on Artificial Intelligence 2011

More information

https://goo.gl/kfxweg KYOTO UNIVERSITY Statistical Machine Learning Theory Sparsity Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT OF INTELLIGENCE SCIENCE AND TECHNOLOGY 1 KYOTO UNIVERSITY Topics:

More information

A Randomized Approach for Crowdsourcing in the Presence of Multiple Views

A Randomized Approach for Crowdsourcing in the Presence of Multiple Views A Randomized Approach for Crowdsourcing in the Presence of Multiple Views Presenter: Yao Zhou joint work with: Jingrui He - 1 - Roadmap Motivation Proposed framework: M2VW Experimental results Conclusion

More information

Robust high-dimensional linear regression: A statistical perspective

Robust high-dimensional linear regression: A statistical perspective Robust high-dimensional linear regression: A statistical perspective Po-Ling Loh University of Wisconsin - Madison Departments of ECE & Statistics STOC workshop on robustness and nonconvexity Montreal,

More information

Metric Embedding of Task-Specific Similarity. joint work with Trevor Darrell (MIT)

Metric Embedding of Task-Specific Similarity. joint work with Trevor Darrell (MIT) Metric Embedding of Task-Specific Similarity Greg Shakhnarovich Brown University joint work with Trevor Darrell (MIT) August 9, 2006 Task-specific similarity A toy example: Task-specific similarity A toy

More information

OWL to the rescue of LASSO

OWL to the rescue of LASSO OWL to the rescue of LASSO IISc IBM day 2018 Joint Work R. Sankaran and Francis Bach AISTATS 17 Chiranjib Bhattacharyya Professor, Department of Computer Science and Automation Indian Institute of Science,

More information

Linear Regression. Aarti Singh. Machine Learning / Sept 27, 2010

Linear Regression. Aarti Singh. Machine Learning / Sept 27, 2010 Linear Regression Aarti Singh Machine Learning 10-701/15-781 Sept 27, 2010 Discrete to Continuous Labels Classification Sports Science News Anemic cell Healthy cell Regression X = Document Y = Topic X

More information

18.6 Regression and Classification with Linear Models

18.6 Regression and Classification with Linear Models 18.6 Regression and Classification with Linear Models 352 The hypothesis space of linear functions of continuous-valued inputs has been used for hundreds of years A univariate linear function (a straight

More information

Linear and Logistic Regression. Dr. Xiaowei Huang

Linear and Logistic Regression. Dr. Xiaowei Huang Linear and Logistic Regression Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Two Classical Machine Learning Algorithms Decision tree learning K-nearest neighbor Model Evaluation Metrics

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

Lecture Notes 10: Matrix Factorization

Lecture Notes 10: Matrix Factorization Optimization-based data analysis Fall 207 Lecture Notes 0: Matrix Factorization Low-rank models. Rank- model Consider the problem of modeling a quantity y[i, j] that depends on two indices i and j. To

More information

DATA MINING AND MACHINE LEARNING

DATA MINING AND MACHINE LEARNING DATA MINING AND MACHINE LEARNING Lecture 5: Regularization and loss functions Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Loss functions Loss functions for regression problems

More information

Learning the Number of Neurons in Deep Networks

Learning the Number of Neurons in Deep Networks Learning the Number of Jose M. Alvarez 1 Mathieu Salzmanno 2 1 Data61 @ CSIRO,Canberra, ACT 2601, Australia 2 CVLab, EPFL,CH-1015 Lausanne, Switzerland NIPS,2016 Presenter: Arshdeep Sekhon NIPS,2016 Presenter:

More information

The Frank-Wolfe Algorithm:

The Frank-Wolfe Algorithm: The Frank-Wolfe Algorithm: New Results, and Connections to Statistical Boosting Paul Grigas, Robert Freund, and Rahul Mazumder http://web.mit.edu/rfreund/www/talks.html Massachusetts Institute of Technology

More information

Solving Corrupted Quadratic Equations, Provably

Solving Corrupted Quadratic Equations, Provably Solving Corrupted Quadratic Equations, Provably Yuejie Chi London Workshop on Sparse Signal Processing September 206 Acknowledgement Joint work with Yuanxin Li (OSU), Huishuai Zhuang (Syracuse) and Yingbin

More information

sparse and low-rank tensor recovery Cubic-Sketching

sparse and low-rank tensor recovery Cubic-Sketching Sparse and Low-Ran Tensor Recovery via Cubic-Setching Guang Cheng Department of Statistics Purdue University www.science.purdue.edu/bigdata CCAM@Purdue Math Oct. 27, 2017 Joint wor with Botao Hao and Anru

More information

1 Sparsity and l 1 relaxation

1 Sparsity and l 1 relaxation 6.883 Learning with Combinatorial Structure Note for Lecture 2 Author: Chiyuan Zhang Sparsity and l relaxation Last time we talked about sparsity and characterized when an l relaxation could recover the

More information

25 : Graphical induced structured input/output models

25 : Graphical induced structured input/output models 10-708: Probabilistic Graphical Models 10-708, Spring 2016 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Raied Aljadaany, Shi Zong, Chenchen Zhu Disclaimer: A large

More information

A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation

A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation A Gradient-based Adaptive Learning Framework for Efficient Personal Recommendation Yue Ning 1 Yue Shi 2 Liangjie Hong 2 Huzefa Rangwala 3 Naren Ramakrishnan 1 1 Virginia Tech 2 Yahoo Research. Yue Shi

More information

Linear Models for Regression CS534

Linear Models for Regression CS534 Linear Models for Regression CS534 Prediction Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict the

More information

An Extended Frank-Wolfe Method, with Application to Low-Rank Matrix Completion

An Extended Frank-Wolfe Method, with Application to Low-Rank Matrix Completion An Extended Frank-Wolfe Method, with Application to Low-Rank Matrix Completion Robert M. Freund, MIT joint with Paul Grigas (UC Berkeley) and Rahul Mazumder (MIT) CDC, December 2016 1 Outline of Topics

More information

CSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18

CSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18 CSE 417T: Introduction to Machine Learning Lecture 11: Review Henry Chai 10/02/18 Unknown Target Function!: # % Training data Formal Setup & = ( ), + ),, ( -, + - Learning Algorithm 2 Hypothesis Set H

More information

l 1 and l 2 Regularization

l 1 and l 2 Regularization David Rosenberg New York University February 5, 2015 David Rosenberg (New York University) DS-GA 1003 February 5, 2015 1 / 32 Tikhonov and Ivanov Regularization Hypothesis Spaces We ve spoken vaguely about

More information

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725 Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h

More information

Lasso Regression: Regularization for feature selection

Lasso Regression: Regularization for feature selection Lasso Regression: Regularization for feature selection Emily Fox University of Washington January 18, 2017 1 Feature selection task 2 1 Why might you want to perform feature selection? Efficiency: - If

More information

Recommendation Systems

Recommendation Systems Recommendation Systems Popularity Recommendation Systems Predicting user responses to options Offering news articles based on users interests Offering suggestions on what the user might like to buy/consume

More information

ARock: an algorithmic framework for asynchronous parallel coordinate updates

ARock: an algorithmic framework for asynchronous parallel coordinate updates ARock: an algorithmic framework for asynchronous parallel coordinate updates Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin ( UCLA Math, U.Waterloo DCO) UCLA CAM Report 15-37 ShanghaiTech SSDS 15 June 25,

More information

MLCC 2018 Variable Selection and Sparsity. Lorenzo Rosasco UNIGE-MIT-IIT

MLCC 2018 Variable Selection and Sparsity. Lorenzo Rosasco UNIGE-MIT-IIT MLCC 2018 Variable Selection and Sparsity Lorenzo Rosasco UNIGE-MIT-IIT Outline Variable Selection Subset Selection Greedy Methods: (Orthogonal) Matching Pursuit Convex Relaxation: LASSO & Elastic Net

More information

Coordinate Descent and Ascent Methods

Coordinate Descent and Ascent Methods Coordinate Descent and Ascent Methods Julie Nutini Machine Learning Reading Group November 3 rd, 2015 1 / 22 Projected-Gradient Methods Motivation Rewrite non-smooth problem as smooth constrained problem:

More information

SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels

SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels Karl Stratos June 21, 2018 1 / 33 Tangent: Some Loose Ends in Logistic Regression Polynomial feature expansion in logistic

More information

Smoothing Proximal Gradient Method. General Structured Sparse Regression

Smoothing Proximal Gradient Method. General Structured Sparse Regression for General Structured Sparse Regression Xi Chen, Qihang Lin, Seyoung Kim, Jaime G. Carbonell, Eric P. Xing (Annals of Applied Statistics, 2012) Gatsby Unit, Tea Talk October 25, 2013 Outline Motivation:

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Scalable robust hypothesis tests using graphical models

Scalable robust hypothesis tests using graphical models Scalable robust hypothesis tests using graphical models Umamahesh Srinivas ipal Group Meeting October 22, 2010 Binary hypothesis testing problem Random vector x = (x 1,...,x n ) R n generated from either

More information

Using SVD to Recommend Movies

Using SVD to Recommend Movies Michael Percy University of California, Santa Cruz Last update: December 12, 2009 Last update: December 12, 2009 1 / Outline 1 Introduction 2 Singular Value Decomposition 3 Experiments 4 Conclusion Last

More information

Machine Learning. Regularization and Feature Selection. Fabio Vandin November 13, 2017

Machine Learning. Regularization and Feature Selection. Fabio Vandin November 13, 2017 Machine Learning Regularization and Feature Selection Fabio Vandin November 13, 2017 1 Learning Model A: learning algorithm for a machine learning task S: m i.i.d. pairs z i = (x i, y i ), i = 1,..., m,

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

LASSO Review, Fused LASSO, Parallel LASSO Solvers

LASSO Review, Fused LASSO, Parallel LASSO Solvers Case Study 3: fmri Prediction LASSO Review, Fused LASSO, Parallel LASSO Solvers Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade May 3, 2016 Sham Kakade 2016 1 Variable

More information

MATH 680 Fall November 27, Homework 3

MATH 680 Fall November 27, Homework 3 MATH 680 Fall 208 November 27, 208 Homework 3 This homework is due on December 9 at :59pm. Provide both pdf, R files. Make an individual R file with proper comments for each sub-problem. Subgradients and

More information

Homework 6. Due: 10am Thursday 11/30/17

Homework 6. Due: 10am Thursday 11/30/17 Homework 6 Due: 10am Thursday 11/30/17 1. Hinge loss vs. logistic loss. In class we defined hinge loss l hinge (x, y; w) = (1 yw T x) + and logistic loss l logistic (x, y; w) = log(1 + exp ( yw T x ) ).

More information

Linear Regression. CSL603 - Fall 2017 Narayanan C Krishnan

Linear Regression. CSL603 - Fall 2017 Narayanan C Krishnan Linear Regression CSL603 - Fall 2017 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Univariate regression Multivariate regression Probabilistic view of regression Loss functions Bias-Variance analysis Regularization

More information

Primal-dual coordinate descent

Primal-dual coordinate descent Primal-dual coordinate descent Olivier Fercoq Joint work with P. Bianchi & W. Hachem 15 July 2015 1/28 Minimize the convex function f, g, h convex f is differentiable Problem min f (x) + g(x) + h(mx) x

More information

Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering

Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering Shuyang Ling Courant Institute of Mathematical Sciences, NYU Aug 13, 2018 Joint

More information

Linear Regression. CSL465/603 - Fall 2016 Narayanan C Krishnan

Linear Regression. CSL465/603 - Fall 2016 Narayanan C Krishnan Linear Regression CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Univariate regression Multivariate regression Probabilistic view of regression Loss functions Bias-Variance analysis

More information

9 Classification. 9.1 Linear Classifiers

9 Classification. 9.1 Linear Classifiers 9 Classification This topic returns to prediction. Unlike linear regression where we were predicting a numeric value, in this case we are predicting a class: winner or loser, yes or no, rich or poor, positive

More information

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R

More information

Behavioral Data Mining. Lecture 7 Linear and Logistic Regression

Behavioral Data Mining. Lecture 7 Linear and Logistic Regression Behavioral Data Mining Lecture 7 Linear and Logistic Regression Outline Linear Regression Regularization Logistic Regression Stochastic Gradient Fast Stochastic Methods Performance tips Linear Regression

More information

Lecture 9: September 28

Lecture 9: September 28 0-725/36-725: Convex Optimization Fall 206 Lecturer: Ryan Tibshirani Lecture 9: September 28 Scribes: Yiming Wu, Ye Yuan, Zhihao Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These

More information

Online Nonnegative Matrix Factorization with General Divergences

Online Nonnegative Matrix Factorization with General Divergences Online Nonnegative Matrix Factorization with General Divergences Vincent Y. F. Tan (ECE, Mathematics, NUS) Joint work with Renbo Zhao (NUS) and Huan Xu (GeorgiaTech) IWCT, Shanghai Jiaotong University

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization

More information

ML4NLP Multiclass Classification

ML4NLP Multiclass Classification ML4NLP Multiclass Classification CS 590NLP Dan Goldwasser Purdue University dgoldwas@purdue.edu Social NLP Last week we discussed the speed-dates paper. Interesting perspective on NLP problems- Can we

More information

Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February 4 th, Emily Fox 2014

Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February 4 th, Emily Fox 2014 Case Study 3: fmri Prediction Fused LASSO LARS Parallel LASSO Solvers Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February 4 th, 2014 Emily Fox 2014 1 LASSO Regression

More information

Lecture 25: November 27

Lecture 25: November 27 10-725: Optimization Fall 2012 Lecture 25: November 27 Lecturer: Ryan Tibshirani Scribes: Matt Wytock, Supreeth Achar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have

More information

CPSC 340: Machine Learning and Data Mining

CPSC 340: Machine Learning and Data Mining CPSC 340: Machine Learning and Data Mining Linear Classifiers: predictions Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. 1 Admin Assignment 4: Due Friday of next

More information

Manfred K. Warmuth - UCSC S.V.N. Vishwanathan - Purdue & Microsoft Research. Updated: March 23, Warmuth (UCSC) ICML 09 Boosting Tutorial 1 / 62

Manfred K. Warmuth - UCSC S.V.N. Vishwanathan - Purdue & Microsoft Research. Updated: March 23, Warmuth (UCSC) ICML 09 Boosting Tutorial 1 / 62 Updated: March 23, 2010 Warmuth (UCSC) ICML 09 Boosting Tutorial 1 / 62 ICML 2009 Tutorial Survey of Boosting from an Optimization Perspective Part I: Entropy Regularized LPBoost Part II: Boosting from

More information

Journal Club. A Universal Catalyst for First-Order Optimization (H. Lin, J. Mairal and Z. Harchaoui) March 8th, CMAP, Ecole Polytechnique 1/19

Journal Club. A Universal Catalyst for First-Order Optimization (H. Lin, J. Mairal and Z. Harchaoui) March 8th, CMAP, Ecole Polytechnique 1/19 Journal Club A Universal Catalyst for First-Order Optimization (H. Lin, J. Mairal and Z. Harchaoui) CMAP, Ecole Polytechnique March 8th, 2018 1/19 Plan 1 Motivations 2 Existing Acceleration Methods 3 Universal

More information

Anytime Planning for Decentralized Multi-Robot Active Information Gathering

Anytime Planning for Decentralized Multi-Robot Active Information Gathering Anytime Planning for Decentralized Multi-Robot Active Information Gathering Brent Schlotfeldt 1 Dinesh Thakur 1 Nikolay Atanasov 2 Vijay Kumar 1 George Pappas 1 1 GRASP Laboratory University of Pennsylvania

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information

SCMA292 Mathematical Modeling : Machine Learning. Krikamol Muandet. Department of Mathematics Faculty of Science, Mahidol University.

SCMA292 Mathematical Modeling : Machine Learning. Krikamol Muandet. Department of Mathematics Faculty of Science, Mahidol University. SCMA292 Mathematical Modeling : Machine Learning Krikamol Muandet Department of Mathematics Faculty of Science, Mahidol University February 9, 2016 Outline Quick Recap of Least Square Ridge Regression

More information

Lasso Regression: Regularization for feature selection

Lasso Regression: Regularization for feature selection Lasso Regression: Regularization for feature selection Emily Fox University of Washington January 18, 2017 Feature selection task 1 Why might you want to perform feature selection? Efficiency: - If size(w)

More information

Error Functions & Linear Regression (2)

Error Functions & Linear Regression (2) Error Functions & Linear Regression (2) John Kelleher & Brian Mac Namee Machine Learning @ DIT Overview 1 Introduction Overview 2 Linear Classifiers Threshold Function Perceptron Learning Rule Training/Learning

More information

RaRE: Social Rank Regulated Large-scale Network Embedding

RaRE: Social Rank Regulated Large-scale Network Embedding RaRE: Social Rank Regulated Large-scale Network Embedding Authors: Yupeng Gu 1, Yizhou Sun 1, Yanen Li 2, Yang Yang 3 04/26/2018 The Web Conference, 2018 1 University of California, Los Angeles 2 Snapchat

More information

Laconic: Label Consistency for Image Categorization

Laconic: Label Consistency for Image Categorization 1 Laconic: Label Consistency for Image Categorization Samy Bengio, Google with Jeff Dean, Eugene Ie, Dumitru Erhan, Quoc Le, Andrew Rabinovich, Jon Shlens, and Yoram Singer 2 Motivation WHAT IS THE OCCLUDED

More information

Accelerate Subgradient Methods

Accelerate Subgradient Methods Accelerate Subgradient Methods Tianbao Yang Department of Computer Science The University of Iowa Contributors: students Yi Xu, Yan Yan and colleague Qihang Lin Yang (CS@Uiowa) Accelerate Subgradient Methods

More information

Coordinate Update Algorithm Short Course The Package TMAC

Coordinate Update Algorithm Short Course The Package TMAC Coordinate Update Algorithm Short Course The Package TMAC Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 16 TMAC: A Toolbox of Async-Parallel, Coordinate, Splitting, and Stochastic Methods C++11 multi-threading

More information

Importance Sampling for Minibatches

Importance Sampling for Minibatches Importance Sampling for Minibatches Dominik Csiba School of Mathematics University of Edinburgh 07.09.2016, Birmingham Dominik Csiba (University of Edinburgh) Importance Sampling for Minibatches 07.09.2016,

More information

Likelihood-Based Methods

Likelihood-Based Methods Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)

More information

Discriminative Models

Discriminative Models No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models

More information

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Qiang Liu and Dilin Wang NIPS 2016 Discussion by Yunchen Pu March 17, 2017 March 17, 2017 1 / 8 Introduction Let x R d

More information

Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57

More information

Binary matrix completion

Binary matrix completion Binary matrix completion Yaniv Plan University of Michigan SAMSI, LDHD workshop, 2013 Joint work with (a) Mark Davenport (b) Ewout van den Berg (c) Mary Wootters Yaniv Plan (U. Mich.) Binary matrix completion

More information

EE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015

EE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015 EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,

More information

Discriminative Models

Discriminative Models No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models

More information

Last updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

Last updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship

More information

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features

More information

CS7267 MACHINE LEARNING

CS7267 MACHINE LEARNING CS7267 MACHINE LEARNING ENSEMBLE LEARNING Ref: Dr. Ricardo Gutierrez-Osuna at TAMU, and Aarti Singh at CMU Mingon Kang, Ph.D. Computer Science, Kennesaw State University Definition of Ensemble Learning

More information

EE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6)

EE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6) EE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement to the material discussed in

More information

Multivariate Calibration with Robust Signal Regression

Multivariate Calibration with Robust Signal Regression Multivariate Calibration with Robust Signal Regression Bin Li and Brian Marx from Louisiana State University Somsubhra Chakraborty from Indian Institute of Technology Kharagpur David C Weindorf from Texas

More information

L 2,1 Norm and its Applications

L 2,1 Norm and its Applications L 2, Norm and its Applications Yale Chang Introduction According to the structure of the constraints, the sparsity can be obtained from three types of regularizers for different purposes.. Flat Sparsity.

More information

Composite nonlinear models at scale

Composite nonlinear models at scale Composite nonlinear models at scale Dmitriy Drusvyatskiy Mathematics, University of Washington Joint work with D. Davis (Cornell), M. Fazel (UW), A.S. Lewis (Cornell) C. Paquette (Lehigh), and S. Roy (UW)

More information

CPSC 340: Machine Learning and Data Mining. Gradient Descent Fall 2016

CPSC 340: Machine Learning and Data Mining. Gradient Descent Fall 2016 CPSC 340: Machine Learning and Data Mining Gradient Descent Fall 2016 Admin Assignment 1: Marks up this weekend on UBC Connect. Assignment 2: 3 late days to hand it in Monday. Assignment 3: Due Wednesday

More information

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features

More information

Overview. Optimization-Based Data Analysis. Carlos Fernandez-Granda

Overview. Optimization-Based Data Analysis.   Carlos Fernandez-Granda Overview Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 1/25/2016 Sparsity Denoising Regression Inverse problems Low-rank models Matrix completion

More information

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Final Overview. Introduction to ML. Marek Petrik 4/25/2017 Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,

More information

Homework 4. Convex Optimization /36-725

Homework 4. Convex Optimization /36-725 Homework 4 Convex Optimization 10-725/36-725 Due Friday November 4 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

Oslo Class 6 Sparsity based regularization

Oslo Class 6 Sparsity based regularization RegML2017@SIMULA Oslo Class 6 Sparsity based regularization Lorenzo Rosasco UNIGE-MIT-IIT May 4, 2017 Learning from data Possible only under assumptions regularization min Ê(w) + λr(w) w Smoothness Sparsity

More information

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss arxiv:1811.04545v1 [stat.co] 12 Nov 2018 Cheng Wang School of Mathematical Sciences, Shanghai Jiao

More information

Hierarchical Boosting and Filter Generation

Hierarchical Boosting and Filter Generation January 29, 2007 Plan Combining Classifiers Boosting Neural Network Structure of AdaBoost Image processing Hierarchical Boosting Hierarchical Structure Filters Combining Classifiers Combining Classifiers

More information

Graphical Models for Collaborative Filtering

Graphical Models for Collaborative Filtering Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,

More information

Kernel Machines. Pradeep Ravikumar Co-instructor: Manuela Veloso. Machine Learning

Kernel Machines. Pradeep Ravikumar Co-instructor: Manuela Veloso. Machine Learning Kernel Machines Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 SVM linearly separable case n training points (x 1,, x n ) d features x j is a d-dimensional vector Primal problem:

More information

Low Rank Matrix Completion Formulation and Algorithm

Low Rank Matrix Completion Formulation and Algorithm 1 2 Low Rank Matrix Completion and Algorithm Jian Zhang Department of Computer Science, ETH Zurich zhangjianthu@gmail.com March 25, 2014 Movie Rating 1 2 Critic A 5 5 Critic B 6 5 Jian 9 8 Kind Guy B 9

More information

Machine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang

Machine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang Machine Learning Basics Lecture 2: Linear Classification Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d.

More information