Is the Whole Greater Than the Sum of Its Parts?
1 Is the Whole Greater Than the Sum of Its Parts? Presenter: Liangyue Li. Joint work with: Liangyue Li (ASU), Hanghang Tong (ASU), Yong Wang (HKUST), Conglei Shi (IBM -> Airbnb), Nan Cao (Tongji), Norbou Buchler (US ARL)
2 From the Ancient Philosophy. "The whole is greater than the sum of its parts." -- Aristotle. Whole: a collection of parts. Parts: individual elements. Aristotle's hypothesis: whole > sum of parts.
3 Part-Whole in Teams. Examples: science research team, sports team, sales team, film crew. Whole: the team. Parts: the team members.
4 Part-Whole Beyond Teams. Autonomous system -- whole: the system; parts: individual drones. Stock market -- whole: DJIA; parts: individual stocks. Community question answering -- whole: a question; parts: its individual answers. System reliability -- whole: the system; parts: individual components.
5 Outcome of Part-Whole. Whole: team; parts: members. Whole outcome: the team's performance; part outcome: each member's performance. Whole: researcher; parts: publications. Whole outcome: h-index; part outcome: #citations of each publication. Question: how can we predict the outcomes of wholes and parts?
6 Predicting the Part-Whole Outcomes. Existing algorithmic work: separate models for parts and whole; joint linear models. Aristotle's hypothesis: whole > sum(parts). Question: how can we jointly predict part/whole outcomes by leveraging the part-whole relationship beyond linear models?
7 Challenges -- Modeling the Nonlinear Part-Whole Relationship. [Figure: average answer impact vs. question impact.] Max: the impact of a question is strongly correlated with that of its best answer. Min: the classic Wooden Bucket Theory (the shortest stave limits the bucket's capacity). Sparsity: team performance is often dominated by a few top-performing team members.
8 Challenges -- Modeling (cont'd): Part-Part Interdependency. Parts are connected via an underlying network, and such interdependency affects outcomes. Hypothesis 1: similar parts -> similar contributions to the whole. Hypothesis 2: similar parts -> similar part outcomes. Question: how can we utilize (1) the nonlinear part-whole relationship and (2) the part-part interdependency?
9 Challenges -- Algorithm. Non-linearity + interdependency -> high computational complexity. Question: how to scale up the computation?
10 Part-Whole Outcome Prediction. Example: movies (wholes) and actors/actresses (parts). Given: (1) feature matrices for whole/parts F^o / F^p; (2) outcome vectors for whole/parts y^o / y^p; (3) a whole-to-part mapping φ; (4) a parts network G^p (optional). Predict: the outcomes of new wholes/parts.
11 Roadmap Motivations PAROLE -- Modeling Generic Framework Modeling Part-Whole Relationship Modeling Part-Part Interdependency PAROLE -- Optimization Empirical Evaluations Conclusions
12 A Generic Joint Prediction Framework -- PAROLE. [Figure: bipartite graph between movies (wholes) and actors/actresses (parts).] Formulation: min J = J_o + J_p + J_po + J_pp + J_r, where
J_o = (1/n_o) \sum_{i=1}^{n_o} L[f(F^o(i,:), w^o), y^o_i] -- predictive model for the whole
J_p = (1/n_p) \sum_{i=1}^{n_p} L[f(F^p(i,:), w^p), y^p_i] -- predictive model for the parts
J_po = (α/n_o) \sum_{i=1}^{n_o} h(f(F^o(i,:), w^o), Agg(φ(o_i))) -- part-whole relationship
J_pp = (β/n_p) \sum_{i=1}^{n_p} \sum_{j=1}^{n_p} G^p_{ij} g(f(F^p(i,:), w^p), f(F^p(j,:), w^p)) -- part-part interdependency
J_r = γ(Ω(w^o) + Ω(w^p)) -- parameter regularizer
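The objective above can be made concrete with one common instantiation. The sketch below is illustrative only: it assumes squared losses for L, h, and g, linear models f, the linear (weighted-sum) aggregation, and l_2 regularization; the function name `parole_objective` and the data layout (`phi` as a list of part-index lists, `A` as a list of per-whole weight vectors a^i) are this example's own conventions, not the paper's code.

```python
import numpy as np

def parole_objective(Fo, yo, Fp, yp, wo, wp, phi, A, Gp, alpha, beta, gamma):
    """Sketch of J = J_o + J_p + J_po + J_pp + J_r with squared losses,
    linear predictors, and linear (weighted-sum) aggregation."""
    no, n_parts = len(yo), len(yp)
    pred_o = Fo @ wo                        # whole predictions f(F^o(i,:), w^o)
    pred_p = Fp @ wp                        # part predictions  f(F^p(i,:), w^p)
    J_o = np.mean((pred_o - yo) ** 2)       # predictive loss on wholes
    J_p = np.mean((pred_p - yp) ** 2)       # predictive loss on parts
    # J_po: residual e_i between the whole prediction and the aggregated parts
    e = np.array([pred_o[i] - A[i] @ pred_p[phi[i]] for i in range(no)])
    J_po = alpha / (2 * no) * np.sum(e ** 2)
    # J_pp: smoothness of part predictions over the parts network G^p
    diff = pred_p[:, None] - pred_p[None, :]
    J_pp = beta / (2 * n_parts) * np.sum(Gp * diff ** 2)
    J_r = gamma * (wo @ wo + wp @ wp)       # parameter regularizer
    return J_o + J_p + J_po + J_pp + J_r
```

With zero weight vectors, every prediction is 0, the coupling and smoothness terms vanish, and J reduces to the mean squared outcomes.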
13 Roadmap. Motivations; PAROLE -- Modeling (Generic Framework, Modeling Part-Whole Relationship, Modeling Part-Part Interdependency); PAROLE -- Optimization; Empirical Evaluations; Conclusions.
14 Modeling the Part-Whole Relationship. Overview: for each whole entity o_i, define e_i = f(F^o(i,:), w^o) - Agg(o_i), which measures the difference between the whole outcome predicted from the whole's own features and the whole outcome predicted by aggregating the parts' outcomes. Key idea: model part-whole relations by choosing different loss functions on e_i and different aggregation functions Agg(·).
15 Overview. Intuition: whole ≈ (weighted) sum of parts. Details: Agg(o_i) = \sum_{j ∈ φ(o_i)} a^i_j F^p(j,:) w^p, where a^i_j is the weight of part j's contribution to whole o_i's outcome, and e_i = f(F^o(i,:), w^o) - Agg(o_i). Remark: different part-whole relationships are characterized by using different loss functions on e_i and different norms on a^i.
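The residual e_i under the weighted-sum aggregation is a one-liner. A minimal sketch, assuming linear predictors on both sides; the helper name `linear_agg_residual` is this example's own:

```python
import numpy as np

def linear_agg_residual(Fo_i, wo, Fp_parts, wp, a_i):
    """e_i = f(F^o(i,:), w^o) - sum_j a^i_j f(F^p(j,:), w^p)."""
    parts_pred = Fp_parts @ wp            # predicted part outcomes F^p(j,:) w^p
    return Fo_i @ wo - a_i @ parts_pred   # whole prediction minus weighted sum
```

When the whole prediction exactly equals the weighted sum of its parts' predictions, e_i is zero; the loss placed on e_i then decides how deviations are penalized.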
16 Linear Part-Whole Relation. Intuition: whole ≈ linear combination of parts; some parts play more important roles than others in contributing to the whole outcome. Details: J_po = (α/(2n_o)) \sum_{i=1}^{n_o} e_i^2. Remark: a^i_j = 1 recovers "the whole is the sum of its parts"; a^i_j = 1/|φ(o_i)| gives average coupling.
17 Sparse Part-Whole Relation. Intuition: whole ≈ a few parts; some parts have little or no effect on the whole outcome. Details: J_po = (α/n_o) \sum_{i=1}^{n_o} (½ e_i^2 + λ ||a^i||_1). Remark: the l_1 norm can shrink some part contributions a^i_j to exactly zero, yielding a nonlinear part-whole relation.
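The l_1 penalty on a^i is what produces exact zeros: its proximal operator is soft-thresholding, which the optimization section applies to the a^i block. A standard sketch:

```python
import numpy as np

def prox_l1(z, t):
    """Proximal operator of t*||.||_1 (soft-thresholding): entries with
    magnitude below t are shrunk exactly to zero; larger entries move
    toward zero by t while keeping their sign."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
```

This is why small part contributions vanish entirely rather than merely becoming small, as a ridge-style penalty would make them.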
18 Ordered Sparse Part-Whole Relation. Intuition: whole ≈ a few top parts; team performance is determined not only by a few key members but also by the structural hierarchy among them. Details: J_po = (α/n_o) \sum_{i=1}^{n_o} (½ e_i^2 + λ Ω_w(a^i)), where Ω_w(x) = \sum_{i=1}^{n} w_i |x|_[i] = w^T |x|_↓ is the ordered weighted l_1 (OWL) norm, |x|_[i] denotes the i-th largest magnitude, and w is a vector of non-increasing nonnegative weights.
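Evaluating the OWL norm only requires sorting the magnitudes so the largest |x| is paired with the largest weight. A small sketch (the function name is this example's own):

```python
import numpy as np

def owl_norm(x, w):
    """Ordered weighted l1 norm: Omega_w(x) = sum_i w_i * |x|_[i], where
    |x|_[i] is the i-th largest magnitude and w is non-increasing, >= 0."""
    return np.sort(np.abs(x))[::-1] @ np.asarray(w)
```

With all weights equal to 1 the OWL norm reduces to the plain l_1 norm; decaying weights penalize the dominant contributions more, encoding the "few top parts" hierarchy.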
19 Robust Part-Whole Relation. Intuition: whole ≈ the parts that are not outliers; the squared loss is sensitive to outliers, so a robust regression model is used instead. Details: J_po = (α/n_o) \sum_{i=1}^{n_o} ρ(e_i), where ρ(·) is a robust estimator (e.g., Huber or bisquare).
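A concrete choice of robust estimator ρ is the Huber function, shown here as a sketch (the threshold `delta` is a free parameter of the estimator, not a value fixed by the slides):

```python
def huber(e, delta=1.0):
    """Huber estimator rho(e): quadratic for |e| <= delta (like the squared
    loss), linear beyond delta, so large residuals (outliers) contribute
    far less than they would under a squared loss."""
    a = abs(e)
    return 0.5 * e * e if a <= delta else delta * (a - 0.5 * delta)
```

Compared with ½e², the Huber value at e = 3 with delta = 1 is 2.5 rather than 4.5, which is exactly the down-weighting of outlying residuals that the robust relation relies on.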
20 Maximum Part-Whole Relation. Intuition: whole ≈ max(parts); team performance can be dominated by the best team member/leader. Details: Agg(o_i) = max(parts' outcomes) is not differentiable, so use the soft max (log-sum-exp) function: max(x_1, x_2, ..., x_n) ≈ ln(exp(x_1) + exp(x_2) + ... + exp(x_n)). Aggregation: Agg(o_i) = ln(\sum_{j ∈ φ(o_i)} exp(F^p(j,:) w^p)), and J_po = (α/(2n_o)) \sum_{i=1}^{n_o} (e_i^max)^2.
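The log-sum-exp aggregation can be computed stably by factoring out the largest score first; this sketch adds that standard stabilization (not shown on the slide):

```python
import numpy as np

def softmax_agg(part_scores):
    """Smooth max via log-sum-exp: differentiable, always >= the true max,
    and within log(n) of it for n parts."""
    m = np.max(part_scores)                 # subtract the max for stability
    return m + np.log(np.sum(np.exp(part_scores - m)))
```

For two equal scores the value exceeds the max by exactly ln 2, which illustrates the log(n) over-approximation bound.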
21 Summary of Part-Whole Relations.
Name | Agg(o_i) | J_po sub-objective (per whole) | Remark
Maximum | ln(\sum_{j ∈ φ(o_i)} exp(F^p(j,:) w^p)) | (α/(2n_o)) e_i^2 | Nonlinear; whole ≈ max(parts)
Linear | \sum_j a^i_j F^p(j,:) w^p | (α/(2n_o)) e_i^2 | Linear; whole ≈ linear combination of parts
Sparse | \sum_j a^i_j F^p(j,:) w^p | (α/n_o)(½ e_i^2 + λ ||a^i||_1) | Nonlinear; whole ≈ a few parts
Ordered Sparse | \sum_j a^i_j F^p(j,:) w^p | (α/n_o)(½ e_i^2 + λ Ω_w(a^i)) | Nonlinear; whole ≈ a few top parts
Robust | \sum_j a^i_j F^p(j,:) w^p | (α/n_o) ρ(e_i) | Nonlinear; whole ≈ parts that are not outliers
22 Roadmap. Motivations; PAROLE -- Modeling (Generic Framework, Modeling Part-Whole Relationship, Modeling Part-Part Interdependency); PAROLE -- Optimization; Empirical Evaluations; Conclusions.
23 Modeling Part-Part Interdependency -- Effect on the Whole Outcome. Intuition: closely connected parts are likely to make similar contributions to the whole outcome. Details: similar parts (large G^p_kl) -> similar contributions (a^i_k ≈ a^i_l).
24 Modeling Part-Part Interdependency -- Effect on the Parts' Outcomes. Intuition: closely connected parts are likely to share similar outcomes themselves. Details: similar parts (large G^p_ij) -> similar predicted outcomes (F^p(i,:) w^p ≈ F^p(j,:) w^p).
25 Roadmap Motivations PAROLE -- Modeling PAROLE -- Optimization Empirical Evaluations Conclusions
26 Optimization Solution. Formulation: J = J_o(w^o) + J_p(w^p) + J_po(w^o, w^p, a^i) + J_pp(w^p) + J_r(w^o, w^p). Observation: J is not jointly convex w.r.t. (w^o, w^p, a^i), but it is convex w.r.t. each block while fixing the others. Solution: block coordinate descent.
27 Block Coordinate Descent. Three coordinate blocks: w^o, w^p, and the contribution weights a^i. Update one block while fixing the others, using (proximal) gradient descent for each block. The gradient of J_po w.r.t. w^o involves terms of the form (α/n_o) \sum_i e_i F^o(i,:) (with e_i reweighted by ρ'(e_i) for the robust aggregation), and the gradient w.r.t. w^p aggregates the corresponding part features. The updates for the a^i depend on the aggregation: the linear aggregation uses a plain gradient update; the sparse and ordered sparse aggregations take a gradient step z = a^i - τ ∇_{a^i} J and then apply a proximal update, a^i ← prox_{λτ||·||_1}(z) (soft-thresholding) and a^i ← prox_{λτΩ_w}(z), respectively; the maximum aggregation has no a^i (N/A).
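One pass of the block scheme can be sketched as follows. This is an illustrative simplification, not the paper's implementation: it assumes squared losses, linear predictors, the sparse (l_1) aggregation, omits the part-part terms, and computes the coupling residuals e_i once per pass for clarity; the function names and default step sizes are this example's own.

```python
import numpy as np

def prox_l1(z, t):
    """Soft-thresholding, the proximal operator of t*||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def bcd_pass(Fo, yo, Fp, yp, phi, A, wo, wp, alpha=0.1, lam=0.1, step=0.1):
    """One block-coordinate-descent pass: gradient steps on w^o and w^p,
    then a proximal (soft-thresholding) step on each a^i."""
    no = len(yo)
    # part-whole coupling residuals e_i = f_o(i) - sum_j a^i_j f_p(j)
    e = np.array([Fo[i] @ wo - A[i] @ (Fp[phi[i]] @ wp) for i in range(no)])
    # block 1: w^o -- regression loss on wholes plus the coupling term
    grad_o = Fo.T @ (Fo @ wo - yo) / no + (alpha / no) * (Fo.T @ e)
    wo = wo - step * grad_o
    # block 2: w^p -- regression loss on parts plus the coupling term
    grad_p = Fp.T @ (Fp @ wp - yp) / len(yp)
    for i in range(no):
        grad_p -= (alpha / no) * e[i] * (Fp[phi[i]].T @ A[i])
    wp = wp - step * grad_p
    # block 3: each a^i -- proximal gradient step, zeroing small weights
    for i in range(no):
        parts_pred = Fp[phi[i]] @ wp
        z = A[i] - step * (-(alpha / no) * e[i] * parts_pred)
        A[i] = prox_l1(z, step * lam)
    return wo, wp, A
```

Each block update solves (or steps toward) a convex subproblem, which is exactly why the overall non-convex objective still admits convergence to a coordinate-wise minimum.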
28 Optimization Properties. Convergence and optimality: under mild conditions, the optimization algorithm converges to a coordinate-wise minimum point. Complexity: the algorithm scales linearly w.r.t. the size of the part-whole graph in both time and space: O(n_o d_o + n_p d_p + m_po + m_pp), where n_o / n_p are the numbers of whole/part entities, m_po is the number of links from wholes to parts, m_pp is the number of links in the part-part network, and d_o / d_p are the feature dimensions of wholes/parts.
29 Roadmap Motivations PAROLE -- Modeling PAROLE -- Optimization Empirical Evaluations Conclusions
30 Datasets.
Data | Whole | Part | #Whole | #Part
Math | Question (#votes) | Answer (#votes) | 16,638 | 32,876
SO | Question (#votes) | Answer (#votes) | 1,966,272 | 4,282,…
DBLP | Author (h-index) | Paper (#citations) | … | …,756
Movie | Movie (# ) | Actors/Actresses (# ) | 5,043 | 37,365
Setup: sort wholes in chronological order, use the first x percent (and their corresponding parts) for training, and test on the last 10%. Metric: root mean squared error (RMSE).
31 Joint Nonlinear Outcome Prediction Performance. [Figure: RMSE on the Math dataset.] Observations: (1) joint prediction models > separate models; (2) nonlinear part-whole relationships (Max, Huber, Bisquare, Lasso, OWL) > linear relationships (Sum, Linear); (3) Lasso and OWL are the best methods in most cases.
32 Outcome Prediction Performance. [Figures: whole, parts, and overall RMSE on the SO and DBLP datasets.]
33 Effect of Part-Part Interdependency. [Figure: results on the Movie dataset; PAROLE-Basic uses no network information.] Modeling part-part interdependency on both the whole outcome and the parts' outcomes boosts performance.
34 Convergence Analysis. [Figure: objective value vs. iterations on the SO dataset.] PAROLE converges fast (25-30 iterations).
35 Parameter Sensitivity. [Figure: RMSE vs. α and β on the Movie dataset.] α controls the importance of the part-whole relationship; β controls the importance of the part-part interdependency. Performance is stable over a relatively large parameter space.
36 Scalability of PAROLE. [Figure: running time vs. part-whole graph size on the SO dataset.] PAROLE scales linearly w.r.t. the part-whole graph size.
37 Roadmap Motivations PAROLE -- Modeling PAROLE -- Optimization Empirical Evaluations Conclusions
38 Conclusions -- PAROLE. Goal: leverage potentially non-linear part-whole relationships for outcome prediction. Solution: PAROLE. Modeling: the part-whole relationship and the part-part interdependency. Optimization: block coordinate descent, which converges to a coordinate-wise minimum point and scales linearly w.r.t. the part-whole graph size.
More informationEE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6)
EE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement to the material discussed in
More informationMultivariate Calibration with Robust Signal Regression
Multivariate Calibration with Robust Signal Regression Bin Li and Brian Marx from Louisiana State University Somsubhra Chakraborty from Indian Institute of Technology Kharagpur David C Weindorf from Texas
More informationL 2,1 Norm and its Applications
L 2, Norm and its Applications Yale Chang Introduction According to the structure of the constraints, the sparsity can be obtained from three types of regularizers for different purposes.. Flat Sparsity.
More informationComposite nonlinear models at scale
Composite nonlinear models at scale Dmitriy Drusvyatskiy Mathematics, University of Washington Joint work with D. Davis (Cornell), M. Fazel (UW), A.S. Lewis (Cornell) C. Paquette (Lehigh), and S. Roy (UW)
More informationCPSC 340: Machine Learning and Data Mining. Gradient Descent Fall 2016
CPSC 340: Machine Learning and Data Mining Gradient Descent Fall 2016 Admin Assignment 1: Marks up this weekend on UBC Connect. Assignment 2: 3 late days to hand it in Monday. Assignment 3: Due Wednesday
More informationRegression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)
Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features
More informationOverview. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Overview Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 1/25/2016 Sparsity Denoising Regression Inverse problems Low-rank models Matrix completion
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationHomework 4. Convex Optimization /36-725
Homework 4 Convex Optimization 10-725/36-725 Due Friday November 4 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationOslo Class 6 Sparsity based regularization
RegML2017@SIMULA Oslo Class 6 Sparsity based regularization Lorenzo Rosasco UNIGE-MIT-IIT May 4, 2017 Learning from data Possible only under assumptions regularization min Ê(w) + λr(w) w Smoothness Sparsity
More informationAn efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss
An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss arxiv:1811.04545v1 [stat.co] 12 Nov 2018 Cheng Wang School of Mathematical Sciences, Shanghai Jiao
More informationHierarchical Boosting and Filter Generation
January 29, 2007 Plan Combining Classifiers Boosting Neural Network Structure of AdaBoost Image processing Hierarchical Boosting Hierarchical Structure Filters Combining Classifiers Combining Classifiers
More informationGraphical Models for Collaborative Filtering
Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,
More informationKernel Machines. Pradeep Ravikumar Co-instructor: Manuela Veloso. Machine Learning
Kernel Machines Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 SVM linearly separable case n training points (x 1,, x n ) d features x j is a d-dimensional vector Primal problem:
More informationLow Rank Matrix Completion Formulation and Algorithm
1 2 Low Rank Matrix Completion and Algorithm Jian Zhang Department of Computer Science, ETH Zurich zhangjianthu@gmail.com March 25, 2014 Movie Rating 1 2 Critic A 5 5 Critic B 6 5 Jian 9 8 Kind Guy B 9
More informationMachine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 2: Linear Classification Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d.
More information