Is the Whole Greater Than the Sum of Its Parts?
1 Is the Whole Greater Than the Sum of Its Parts? Presenter: Liangyue Li. Joint work with: Liangyue Li (ASU), Hanghang Tong (ASU), Yong Wang (HKUST), Conglei Shi (IBM -> Airbnb), Nan Cao (Tongji), Norbou Buchler (US ARL)
2 From the Ancient Philosophy. "The whole is greater than the sum of its parts." -- Aristotle. Whole: a collection of parts. Parts: individual elements. Aristotle's hypothesis: whole > sum of parts.
3 Part-Whole in Teams. Examples: science research team, sports team, sales team, film crew. Whole: the team. Parts: the team members.
4 Part-Whole Beyond Teams. Autonomous system -- whole: the system; parts: individual drones. Stock market -- whole: DJIA; parts: individual stocks. Community question answering -- whole: a question; parts: its individual answers. System reliability -- whole: the system; parts: individual components.
5 Outcome of Part-Whole. Whole: team; parts: members. Whole outcome: the team's performance; part outcome: each member's performance. Whole: researcher; parts: publications. Whole outcome: h-index; part outcome: #citations of each publication. Question: how can we predict the outcomes of wholes and parts?
6 Predicting the Part-Whole Outcomes. Existing algorithmic work: separate models for parts and whole; joint linear models. Aristotle's hypothesis: whole > sum(parts). Question: how can we jointly predict part/whole outcomes by leveraging the part-whole relationship beyond linear models?
7 Challenges -- Modeling the Nonlinear Part-Whole Relationship. [Figure: average answer impact vs. question impact.] Max: the impact of a question is strongly correlated with that of its best answer. Min: the classic Wooden Bucket Theory (the shortest stave limits the bucket's capacity). Sparsity: team performance is often dominated by a few top-performing team members.
8 Challenges -- Modeling (cont'd): Part-Part Interdependency. Parts are connected via an underlying network, and such interdependency affects outcomes. Hypothesis 1: similar parts -> similar contributions to the whole. Hypothesis 2: similar parts -> similar part outcomes. Question: how can we utilize (1) the nonlinear part-whole relationship and (2) the part-part interdependency?
9 Challenges -- Algorithm. Non-linearity + interdependency -> high computational complexity. Question: how to scale up the computation?
10 Part-Whole Outcome Prediction. Example: movies (wholes) and actors/actresses (parts). Given: (1) feature matrices for whole/parts F^o / F^p; (2) outcome vectors for whole/parts y^o / y^p; (3) a whole-to-part mapping φ; (4) a parts network G^p (optional). Predict: the outcomes of new wholes/parts.
11 Roadmap Motivations PAROLE -- Modeling Generic Framework Modeling Part-Whole Relationship Modeling Part-Part Interdependency PAROLE -- Optimization Empirical Evaluations Conclusions
12 A Generic Joint Prediction Framework -- PAROLE. [Figure: bipartite graph between movies (wholes) and actors/actresses (parts).] Formulation: min J = J_o + J_p + J_po + J_pp + J_r, where
J_o = (1/n_o) \sum_{i=1}^{n_o} L[f(F^o(i,:), w^o), y^o_i] -- predictive model for the whole
J_p = (1/n_p) \sum_{i=1}^{n_p} L[f(F^p(i,:), w^p), y^p_i] -- predictive model for the parts
J_po = (α/n_o) \sum_{i=1}^{n_o} h(f(F^o(i,:), w^o), Agg(φ(o_i))) -- part-whole relationship
J_pp = (β/n_p) \sum_{i=1}^{n_p} \sum_{j=1}^{n_p} G^p_{ij} g(f(F^p(i,:), w^p), f(F^p(j,:), w^p)) -- part-part interdependency
J_r = γ(Ω(w^o) + Ω(w^p)) -- parameter regularizer
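The objective above can be made concrete with one common instantiation. The sketch below is illustrative only: it assumes squared losses for L, h, and g, linear models f, the linear (weighted-sum) aggregation, and l_2 regularization; the function name `parole_objective` and the data layout (`phi` as a list of part-index lists, `A` as a list of per-whole weight vectors a^i) are this example's own conventions, not the paper's code.

```python
import numpy as np

def parole_objective(Fo, yo, Fp, yp, wo, wp, phi, A, Gp, alpha, beta, gamma):
    """Sketch of J = J_o + J_p + J_po + J_pp + J_r with squared losses,
    linear predictors, and linear (weighted-sum) aggregation."""
    no, n_parts = len(yo), len(yp)
    pred_o = Fo @ wo                        # whole predictions f(F^o(i,:), w^o)
    pred_p = Fp @ wp                        # part predictions  f(F^p(i,:), w^p)
    J_o = np.mean((pred_o - yo) ** 2)       # predictive loss on wholes
    J_p = np.mean((pred_p - yp) ** 2)       # predictive loss on parts
    # J_po: residual e_i between the whole prediction and the aggregated parts
    e = np.array([pred_o[i] - A[i] @ pred_p[phi[i]] for i in range(no)])
    J_po = alpha / (2 * no) * np.sum(e ** 2)
    # J_pp: smoothness of part predictions over the parts network G^p
    diff = pred_p[:, None] - pred_p[None, :]
    J_pp = beta / (2 * n_parts) * np.sum(Gp * diff ** 2)
    J_r = gamma * (wo @ wo + wp @ wp)       # parameter regularizer
    return J_o + J_p + J_po + J_pp + J_r
```

With zero weight vectors, every prediction is 0, the coupling and smoothness terms vanish, and J reduces to the mean squared outcomes.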
13 Roadmap. Motivations; PAROLE -- Modeling (Generic Framework, Modeling Part-Whole Relationship, Modeling Part-Part Interdependency); PAROLE -- Optimization; Empirical Evaluations; Conclusions.
14 Modeling the Part-Whole Relationship. Overview: for each whole entity o_i, define e_i = f(F^o(i,:), w^o) - Agg(o_i), which measures the difference between the whole outcome predicted from the whole's own features and the whole outcome predicted by aggregating the parts' outcomes. Key idea: model part-whole relations by choosing different loss functions on e_i and different aggregation functions Agg(·).
15 Overview. Intuition: whole ≈ (weighted) sum of parts. Details: Agg(o_i) = \sum_{j ∈ φ(o_i)} a^i_j F^p(j,:) w^p, where a^i_j is the weight of part j's contribution to whole o_i's outcome, and e_i = f(F^o(i,:), w^o) - Agg(o_i). Remark: different part-whole relationships are characterized by using different loss functions on e_i and different norms on a^i.
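The residual e_i under the weighted-sum aggregation is a one-liner. A minimal sketch, assuming linear predictors on both sides; the helper name `linear_agg_residual` is this example's own:

```python
import numpy as np

def linear_agg_residual(Fo_i, wo, Fp_parts, wp, a_i):
    """e_i = f(F^o(i,:), w^o) - sum_j a^i_j f(F^p(j,:), w^p)."""
    parts_pred = Fp_parts @ wp            # predicted part outcomes F^p(j,:) w^p
    return Fo_i @ wo - a_i @ parts_pred   # whole prediction minus weighted sum
```

When the whole prediction exactly equals the weighted sum of its parts' predictions, e_i is zero; the loss placed on e_i then decides how deviations are penalized.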
16 Linear Part-Whole Relation. Intuition: whole ≈ linear combination of parts; some parts play more important roles than others in contributing to the whole outcome. Details: J_po = (α/(2n_o)) \sum_{i=1}^{n_o} e_i^2. Remark: a^i_j = 1 recovers "the whole is the sum of its parts"; a^i_j = 1/|φ(o_i)| gives average coupling.
17 Sparse Part-Whole Relation. Intuition: whole ≈ a few parts; some parts have little or no effect on the whole outcome. Details: J_po = (α/n_o) \sum_{i=1}^{n_o} (½ e_i^2 + λ ||a^i||_1). Remark: the l_1 norm can shrink some part contributions a^i_j to exactly zero, yielding a nonlinear part-whole relation.
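The l_1 penalty on a^i is what produces exact zeros: its proximal operator is soft-thresholding, which the optimization section applies to the a^i block. A standard sketch:

```python
import numpy as np

def prox_l1(z, t):
    """Proximal operator of t*||.||_1 (soft-thresholding): entries with
    magnitude below t are shrunk exactly to zero; larger entries move
    toward zero by t while keeping their sign."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
```

This is why small part contributions vanish entirely rather than merely becoming small, as a ridge-style penalty would make them.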
18 Ordered Sparse Part-Whole Relation. Intuition: whole ≈ a few top parts; team performance is determined not only by a few key members but also by the structural hierarchy among them. Details: J_po = (α/n_o) \sum_{i=1}^{n_o} (½ e_i^2 + λ Ω_w(a^i)), where Ω_w(x) = \sum_{i=1}^{n} w_i |x|_[i] = w^T |x|_↓ is the ordered weighted l_1 (OWL) norm, |x|_[i] denotes the i-th largest magnitude, and w is a vector of non-increasing nonnegative weights.
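Evaluating the OWL norm only requires sorting the magnitudes so the largest |x| is paired with the largest weight. A small sketch (the function name is this example's own):

```python
import numpy as np

def owl_norm(x, w):
    """Ordered weighted l1 norm: Omega_w(x) = sum_i w_i * |x|_[i], where
    |x|_[i] is the i-th largest magnitude and w is non-increasing, >= 0."""
    return np.sort(np.abs(x))[::-1] @ np.asarray(w)
```

With all weights equal to 1 the OWL norm reduces to the plain l_1 norm; decaying weights penalize the dominant contributions more, encoding the "few top parts" hierarchy.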
19 Robust Part-Whole Relation. Intuition: whole ≈ the parts that are not outliers; the squared loss is sensitive to outliers, so a robust regression model is used instead. Details: J_po = (α/n_o) \sum_{i=1}^{n_o} ρ(e_i), where ρ(·) is a robust estimator (e.g., Huber or bisquare).
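A concrete choice of robust estimator ρ is the Huber function, shown here as a sketch (the threshold `delta` is a free parameter of the estimator, not a value fixed by the slides):

```python
def huber(e, delta=1.0):
    """Huber estimator rho(e): quadratic for |e| <= delta (like the squared
    loss), linear beyond delta, so large residuals (outliers) contribute
    far less than they would under a squared loss."""
    a = abs(e)
    return 0.5 * e * e if a <= delta else delta * (a - 0.5 * delta)
```

Compared with ½e², the Huber value at e = 3 with delta = 1 is 2.5 rather than 4.5, which is exactly the down-weighting of outlying residuals that the robust relation relies on.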
20 Maximum Part-Whole Relation. Intuition: whole ≈ max(parts); team performance can be dominated by the best team member/leader. Details: Agg(o_i) = max(parts' outcomes) is not differentiable, so use the soft max (log-sum-exp) function: max(x_1, x_2, ..., x_n) ≈ ln(exp(x_1) + exp(x_2) + ... + exp(x_n)). Aggregation: Agg(o_i) = ln(\sum_{j ∈ φ(o_i)} exp(F^p(j,:) w^p)), and J_po = (α/(2n_o)) \sum_{i=1}^{n_o} (e_i^max)^2.
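The log-sum-exp aggregation can be computed stably by factoring out the largest score first; this sketch adds that standard stabilization (not shown on the slide):

```python
import numpy as np

def softmax_agg(part_scores):
    """Smooth max via log-sum-exp: differentiable, always >= the true max,
    and within log(n) of it for n parts."""
    m = np.max(part_scores)                 # subtract the max for stability
    return m + np.log(np.sum(np.exp(part_scores - m)))
```

For two equal scores the value exceeds the max by exactly ln 2, which illustrates the log(n) over-approximation bound.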
21 Summary of Part-Whole Relations.
Name | Agg(o_i) | J_po sub-objective (per whole) | Remark
Maximum | ln(\sum_{j ∈ φ(o_i)} exp(F^p(j,:) w^p)) | (α/(2n_o)) e_i^2 | Nonlinear; whole ≈ max(parts)
Linear | \sum_j a^i_j F^p(j,:) w^p | (α/(2n_o)) e_i^2 | Linear; whole ≈ linear combination of parts
Sparse | \sum_j a^i_j F^p(j,:) w^p | (α/n_o)(½ e_i^2 + λ ||a^i||_1) | Nonlinear; whole ≈ a few parts
Ordered Sparse | \sum_j a^i_j F^p(j,:) w^p | (α/n_o)(½ e_i^2 + λ Ω_w(a^i)) | Nonlinear; whole ≈ a few top parts
Robust | \sum_j a^i_j F^p(j,:) w^p | (α/n_o) ρ(e_i) | Nonlinear; whole ≈ parts that are not outliers
22 Roadmap. Motivations; PAROLE -- Modeling (Generic Framework, Modeling Part-Whole Relationship, Modeling Part-Part Interdependency); PAROLE -- Optimization; Empirical Evaluations; Conclusions.
23 Modeling Part-Part Interdependency -- Effect on the Whole Outcome. Intuition: closely connected parts are likely to make similar contributions to the whole outcome. Details: similar parts (large G^p_kl) -> similar contributions (a^i_k ≈ a^i_l).
24 Modeling Part-Part Interdependency -- Effect on the Parts' Outcomes. Intuition: closely connected parts are likely to share similar outcomes themselves. Details: similar parts (large G^p_ij) -> similar predicted outcomes (F^p(i,:) w^p ≈ F^p(j,:) w^p).
25 Roadmap Motivations PAROLE -- Modeling PAROLE -- Optimization Empirical Evaluations Conclusions
26 Optimization Solution. Formulation: J = J_o(w^o) + J_p(w^p) + J_po(w^o, w^p, a^i) + J_pp(w^p) + J_r(w^o, w^p). Observation: J is not jointly convex w.r.t. (w^o, w^p, a^i), but it is convex w.r.t. each block while fixing the others. Solution: block coordinate descent.
27 Block Coordinate Descent. Three coordinate blocks: w^o, w^p, and the contribution weights a^i. Update one block while fixing the others, using (proximal) gradient descent for each block. The gradient of J_po w.r.t. w^o involves terms of the form (α/n_o) \sum_i e_i F^o(i,:) (with e_i reweighted by ρ'(e_i) for the robust aggregation), and the gradient w.r.t. w^p aggregates the corresponding part features. The updates for the a^i depend on the aggregation: the linear aggregation uses a plain gradient update; the sparse and ordered sparse aggregations take a gradient step z = a^i - τ ∇_{a^i} J and then apply a proximal update, a^i ← prox_{λτ||·||_1}(z) (soft-thresholding) and a^i ← prox_{λτΩ_w}(z), respectively; the maximum aggregation has no a^i (N/A).
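One pass of the block scheme can be sketched as follows. This is an illustrative simplification, not the paper's implementation: it assumes squared losses, linear predictors, the sparse (l_1) aggregation, omits the part-part terms, and computes the coupling residuals e_i once per pass for clarity; the function names and default step sizes are this example's own.

```python
import numpy as np

def prox_l1(z, t):
    """Soft-thresholding, the proximal operator of t*||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def bcd_pass(Fo, yo, Fp, yp, phi, A, wo, wp, alpha=0.1, lam=0.1, step=0.1):
    """One block-coordinate-descent pass: gradient steps on w^o and w^p,
    then a proximal (soft-thresholding) step on each a^i."""
    no = len(yo)
    # part-whole coupling residuals e_i = f_o(i) - sum_j a^i_j f_p(j)
    e = np.array([Fo[i] @ wo - A[i] @ (Fp[phi[i]] @ wp) for i in range(no)])
    # block 1: w^o -- regression loss on wholes plus the coupling term
    grad_o = Fo.T @ (Fo @ wo - yo) / no + (alpha / no) * (Fo.T @ e)
    wo = wo - step * grad_o
    # block 2: w^p -- regression loss on parts plus the coupling term
    grad_p = Fp.T @ (Fp @ wp - yp) / len(yp)
    for i in range(no):
        grad_p -= (alpha / no) * e[i] * (Fp[phi[i]].T @ A[i])
    wp = wp - step * grad_p
    # block 3: each a^i -- proximal gradient step, zeroing small weights
    for i in range(no):
        parts_pred = Fp[phi[i]] @ wp
        z = A[i] - step * (-(alpha / no) * e[i] * parts_pred)
        A[i] = prox_l1(z, step * lam)
    return wo, wp, A
```

Each block update solves (or steps toward) a convex subproblem, which is exactly why the overall non-convex objective still admits convergence to a coordinate-wise minimum.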
28 Optimization Properties. Convergence and optimality: under mild conditions, the optimization algorithm converges to a coordinate-wise minimum point. Complexity: the algorithm scales linearly w.r.t. the size of the part-whole graph in both time and space: O(n_o d_o + n_p d_p + m_po + m_pp), where n_o / n_p are the numbers of whole/part entities, m_po is the number of links from wholes to parts, m_pp is the number of links in the part-part network, and d_o / d_p are the feature dimensions of wholes/parts.
29 Roadmap Motivations PAROLE -- Modeling PAROLE -- Optimization Empirical Evaluations Conclusions
30 Datasets.
Data | Whole | Part | #Whole | #Part
Math | Question (#votes) | Answer (#votes) | 16,638 | 32,876
SO | Question (#votes) | Answer (#votes) | 1,966,272 | 4,282,…
DBLP | Author (h-index) | Paper (#citations) | … | …,756
Movie | Movie (# ) | Actors/Actresses (# ) | 5,043 | 37,365
Setup: sort wholes in chronological order, use the first x percent (and their corresponding parts) for training, and test on the last 10%. Metric: root mean squared error (RMSE).
31 Joint Nonlinear Outcome Prediction Performance. [Figure: RMSE on the Math dataset.] Observations: (1) joint prediction models > separate models; (2) nonlinear part-whole relationships (Max, Huber, Bisquare, Lasso, OWL) > linear relationships (Sum, Linear); (3) Lasso and OWL are the best methods in most cases.
32 Outcome Prediction Performance. [Figures: whole, parts, and overall RMSE on the SO and DBLP datasets.]
33 Effect of Part-Part Interdependency. [Figure: results on the Movie dataset; PAROLE-Basic uses no network information.] Modeling part-part interdependency on both the whole outcome and the parts' outcomes boosts performance.
34 Convergence Analysis. [Figure: objective value vs. iterations on the SO dataset.] PAROLE converges fast (25-30 iterations).
35 Parameter Sensitivity. [Figure: RMSE vs. α and β on the Movie dataset.] α controls the importance of the part-whole relationship; β controls the importance of the part-part interdependency. Performance is stable over a relatively large parameter space.
36 Scalability of PAROLE. [Figure: running time vs. part-whole graph size on the SO dataset.] PAROLE scales linearly w.r.t. the part-whole graph size.
37 Roadmap Motivations PAROLE -- Modeling PAROLE -- Optimization Empirical Evaluations Conclusions
38 Conclusions -- PAROLE. Goal: leverage potentially non-linear part-whole relationships for outcome prediction. Solution: PAROLE. Modeling: the part-whole relationship and the part-part interdependency. Optimization: block coordinate descent, which converges to a coordinate-wise minimum point and scales linearly w.r.t. the part-whole graph size.
More informationEE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6)
EE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement to the material discussed in
More informationMultivariate Calibration with Robust Signal Regression
Multivariate Calibration with Robust Signal Regression Bin Li and Brian Marx from Louisiana State University Somsubhra Chakraborty from Indian Institute of Technology Kharagpur David C Weindorf from Texas
More informationL 2,1 Norm and its Applications
L 2, Norm and its Applications Yale Chang Introduction According to the structure of the constraints, the sparsity can be obtained from three types of regularizers for different purposes.. Flat Sparsity.
More informationComposite nonlinear models at scale
Composite nonlinear models at scale Dmitriy Drusvyatskiy Mathematics, University of Washington Joint work with D. Davis (Cornell), M. Fazel (UW), A.S. Lewis (Cornell) C. Paquette (Lehigh), and S. Roy (UW)
More informationCPSC 340: Machine Learning and Data Mining. Gradient Descent Fall 2016
CPSC 340: Machine Learning and Data Mining Gradient Descent Fall 2016 Admin Assignment 1: Marks up this weekend on UBC Connect. Assignment 2: 3 late days to hand it in Monday. Assignment 3: Due Wednesday
More informationRegression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)
Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features
More informationOverview. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Overview Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 1/25/2016 Sparsity Denoising Regression Inverse problems Low-rank models Matrix completion
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationHomework 4. Convex Optimization /36-725
Homework 4 Convex Optimization 10-725/36-725 Due Friday November 4 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationOslo Class 6 Sparsity based regularization
RegML2017@SIMULA Oslo Class 6 Sparsity based regularization Lorenzo Rosasco UNIGE-MIT-IIT May 4, 2017 Learning from data Possible only under assumptions regularization min Ê(w) + λr(w) w Smoothness Sparsity
More informationAn efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss
An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss arxiv:1811.04545v1 [stat.co] 12 Nov 2018 Cheng Wang School of Mathematical Sciences, Shanghai Jiao
More informationHierarchical Boosting and Filter Generation
January 29, 2007 Plan Combining Classifiers Boosting Neural Network Structure of AdaBoost Image processing Hierarchical Boosting Hierarchical Structure Filters Combining Classifiers Combining Classifiers
More informationGraphical Models for Collaborative Filtering
Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,
More informationKernel Machines. Pradeep Ravikumar Co-instructor: Manuela Veloso. Machine Learning
Kernel Machines Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 SVM linearly separable case n training points (x 1,, x n ) d features x j is a d-dimensional vector Primal problem:
More informationLow Rank Matrix Completion Formulation and Algorithm
1 2 Low Rank Matrix Completion and Algorithm Jian Zhang Department of Computer Science, ETH Zurich zhangjianthu@gmail.com March 25, 2014 Movie Rating 1 2 Critic A 5 5 Critic B 6 5 Jian 9 8 Kind Guy B 9
More informationMachine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 2: Linear Classification Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d.
More information