Online Passive-Aggressive Algorithms Tirgul 11
Multi-Label Classification
Multilabel Problem: Example. Mapping apps to smart folders: assign an installed app (e.g., Candy Crush Saga) to one or more folders.
Goal. Given an installed app A (e.g., Farm Story), assign it to y ⊆ Y, its set of relevant folders (for Farm Story: {Games, Social}), where Y is the set of all folders: {Social, Music, Games, Photography, Shopping}.
Multilabel Classification. A variant of the classification problem: multiple target labels must be assigned to each instance. Setting: there are k different possible labels, Y = {1, …, k}. Every instance x_i is associated with a set of relevant labels y_i ⊆ Y. Special case: when there is a single relevant label per instance, this reduces to multiclass (single-label) classification.
Multilabel Classification. Another use case, text categorization: x_i represents a document, and y_i is the set of topics relevant to the document, chosen from a predefined collection of topics. E.g., a text might be about any of religion, politics, finance, or education at the same time, or about none of these.
Task. Automatically assign a user's installed applications to one or more smart folders.
Task. Training set of examples: each example (A_i, y_i) contains an application and a set of one or more smart folders, e.g., (Farm Story, {Games, Social}).
Model. Find a function y = f_w(A) with parameters w that assigns a set of folders y to an installed app A.
Model and Inference. ŷ = f_w(A) = argsort_{y ∈ Y} w · φ(A, y), where w is the weight vector and φ(A, y) is the feature map.
Feature maps: TF-IDF of the title, Google Play's category, TF-IDF of the description, and a representation of related apps.
Inference. ŷ_i = argsort_{y ∈ Y} w · φ(x_i, y). The algorithm's output upon receiving an instance x_i is a score for each of the k labels in Y. E.g., for x_i = Farm Story and Y = {Social, Music, Games, Photography, Shopping}, the scores might be (0.6, 0.3, 0.7, 0.1, 0.01).
Inference. ŷ = argsort_{y ∈ Y} w · φ(A, y). The algorithm's prediction is a vector in R^k where each element corresponds to the score assigned to the respective label. This form of prediction is often referred to as label ranking. E.g., for Y = {Social, Music, Games, Photography, Shopping}: (0.6, 0.3, 0.7, 0.1, 0.01) ∈ R^k.
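As a sketch, the scoring and argsort inference can be written as follows. Here the joint score w · φ(x, y) is simplified (my assumption, not the slides' exact feature map) to a per-label weight matrix W, so the score of label y is W[y] · x:

```python
import numpy as np

def scores(W, x):
    """Score every label for instance x.

    W : (k, d) array, one weight vector per label (plays the role of
        w . phi(x, y) with a per-label feature map).
    x : (d,) feature vector.
    Returns a length-k vector with one score per label.
    """
    return W @ x

def label_ranking(W, x):
    """Return label indices sorted from highest to lowest score."""
    return np.argsort(-scores(W, x))

# Toy example: 5 folders, 3 features, chosen to reproduce the
# slide's scores (0.6, 0.3, 0.7, 0.1, 0.01).
W = np.array([[0.5, 0.1, 0.0],    # Social
              [0.2, 0.1, 0.0],    # Music
              [0.6, 0.1, 0.0],    # Games
              [0.0, 0.1, 0.0],    # Photography
              [0.0, 0.0, 0.01]])  # Shopping
x = np.array([1.0, 1.0, 1.0])
print(scores(W, x))         # [0.6  0.3  0.7  0.1  0.01]
print(label_ranking(W, x))  # [2 0 1 3 4]: Games ranked first
```

The negation inside `argsort` gives a descending sort, i.e., the label ranking from most to least likely.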
Inference. For a pair of labels r, s ∈ Y: if score(r) > score(s), label r is ranked higher than label s. Goal of the algorithm: rank every relevant label above every irrelevant label.
The Margin. Example: (A_i, y_i) = (Farm Story, {Social, Games}). After making the prediction ŷ_i, the algorithm receives the correct set y_i. 1. Find the lowest-scoring relevant folder: min over {Social, Games}. 2. Find the highest-scoring irrelevant folder: max over {Music, Photography, Shopping}.
Iterate over examples. Example: (A_i, y_i) = (Farm Story, {Social, Games}). After making the prediction ŷ_i, the algorithm receives the correct set y_i, identifies the lowest-scoring relevant folder and the highest-scoring irrelevant folder, and performs an update.
The Margin. We define the margin attained by the algorithm on round i for example (x_i, y_i) as γ(w_i; (x_i, y_i)) = min_{r ∈ y_i} w_i · φ(x_i, r) − max_{s ∉ y_i} w_i · φ(x_i, s).
The Margin. The margin is positive if and only if all relevant labels are ranked higher than all irrelevant labels. We are not satisfied with a merely positive margin; we require the margin of every prediction to be at least 1.
The Loss. Define a hinge loss: ℓ(w_i; (x_i, y_i)) = 0 if γ(w_i; (x_i, y_i)) ≥ 1, and 1 − γ(w_i; (x_i, y_i)) otherwise.
The Loss. This can also be written as ℓ(w_i; (x_i, y_i)) = [1 − γ(w_i; (x_i, y_i))]_+, where [a]_+ = max(0, a).
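Under the same per-label-weight simplification as before (an assumption for illustration), the margin and hinge loss can be computed as:

```python
import numpy as np

def margin(W, x, relevant):
    """gamma = (min score over relevant labels) - (max over irrelevant)."""
    s = W @ x
    rel = np.zeros(len(s), dtype=bool)
    rel[list(relevant)] = True
    return s[rel].min() - s[~rel].max()

def hinge_loss(W, x, relevant):
    """l = [1 - gamma]_+ : zero once the margin reaches 1."""
    return max(0.0, 1.0 - margin(W, x, relevant))

# Running example: scores (0.6, 0.3, 0.7, 0.1, 0.01),
# relevant folders {Social, Games} = indices {0, 2}.
W = np.array([[0.5, 0.1, 0.0], [0.2, 0.1, 0.0], [0.6, 0.1, 0.0],
              [0.0, 0.1, 0.0], [0.0, 0.0, 0.01]])
x = np.array([1.0, 1.0, 1.0])
y = {0, 2}
print(margin(W, x, y))      # 0.6 - 0.3 = 0.3: positive, but below 1
print(hinge_loss(W, x, y))  # 1 - 0.3 = 0.7
```

Note that the ranking here is already correct (all relevant labels beat all irrelevant ones), yet the loss is positive because the margin falls short of 1.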
Learning. First approach: find parameters w that attain a small cumulative hinge loss over the sequence of examples.
Multilabel PA Optimization Problem. An alternative approach: let r_i = argmin_{r ∈ y_i} w_i · φ(x_i, r) and s_i = argmax_{s ∉ y_i} w_i · φ(x_i, s), so that γ(w_i; (x_i, y_i)) = w_i · φ(x_i, r_i) − w_i · φ(x_i, s_i). Set w_{i+1} = argmin_w (1/2) ‖w − w_i‖^2 s.t. ℓ(w; (x_i, y_i)) = [1 − γ(w; (x_i, y_i))]_+ = 0, i.e., the margin is at least 1.
Passive Aggressive (PA). The algorithm is passive whenever the hinge loss is zero: ℓ_i = 0 implies w_{i+1} = w_i. When the loss is positive, the algorithm aggressively forces w_{i+1} to satisfy the constraint ℓ(w_{i+1}; (x_i, y_i)) = 0, regardless of the step size required.
Passive Aggressive. Solving with Lagrange multipliers, we get the following update rule: w_{i+1} = w_i + τ_i (φ(x_i, r_i) − φ(x_i, s_i)), where τ_i = ℓ_i / ‖φ(x_i, r_i) − φ(x_i, s_i)‖^2.
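A sketch of this update, again assuming a block feature map φ(x, y) that places x in label y's slot of a long vector (so the blocks for r and s are disjoint and ‖φ(x, r) − φ(x, s)‖^2 = 2‖x‖^2, and the update touches only rows r and s of a per-label weight matrix W):

```python
import numpy as np

def pa_update(W, x, relevant):
    """One PA step with a per-label weight matrix W of shape (k, d).

    With the block feature map, w_{i+1} = w_i + tau*(phi(x,r) - phi(x,s))
    becomes W[r] += tau*x and W[s] -= tau*x, with
    tau = loss / (2 * ||x||^2).
    """
    s_vec = W @ x
    rel = np.zeros(len(s_vec), dtype=bool)
    rel[list(relevant)] = True
    r = np.flatnonzero(rel)[np.argmin(s_vec[rel])]    # lowest-scoring relevant
    s = np.flatnonzero(~rel)[np.argmax(s_vec[~rel])]  # highest-scoring irrelevant
    gamma = s_vec[r] - s_vec[s]
    loss = max(0.0, 1.0 - gamma)
    if loss > 0.0:
        tau = loss / (2.0 * (x @ x))
        W = W.copy()
        W[r] += tau * x
        W[s] -= tau * x
    return W, r, s, loss

# Running example: relevant folders {Social, Games} = {0, 2}.
W = np.array([[0.5, 0.1, 0.0], [0.2, 0.1, 0.0], [0.6, 0.1, 0.0],
              [0.0, 0.1, 0.0], [0.0, 0.0, 0.01]])
x = np.array([1.0, 1.0, 1.0])
W2, r, s, loss = pa_update(W, x, {0, 2})
print(loss)                 # 0.7, as computed before
print((W2[r] - W2[s]) @ x)  # 1.0: the (r, s) pair now has margin exactly 1
```

The final print illustrates the "aggressive" part: after the step, the score gap on the violating pair (r_i, s_i) is exactly 1, so the loss on that pair drops to zero.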
Passive Aggressive. The updated vector w_{i+1} attains zero loss on example x_i: ℓ(w_{i+1}; (x_i, y_i)) = 0.
Ranking
Ranking Problem: Example. A prediction bar: predict the apps the user is most likely to use at any given time and location.
Assume the user is in a given context: it is 12:47, at the office, on a Wednesday. The prediction bar ranks apps from the most likely down to the 4th most likely.
Goal. Find a function f that takes as input the current context x and predicts the apps the user is likely to click in that context.
Features: time of day, day of week, location.
Discriminative Model. Parameters: a weight vector w_A ∈ R^d for each app A. Context: x ∈ R^d. Prediction: score each app by f_w(x, A) = w_A · x.
New Evaluation. The performance of the prediction system is measured by the Receiver Operating Characteristic (ROC) curve. true positive rate = (# contexts where the app was in the prediction bar and was clicked) / (# contexts where the app was clicked). false positive rate = (# contexts where the app was in the prediction bar and wasn't clicked) / (# contexts where the app wasn't clicked).
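As a sketch, the two rates can be computed from per-context records of whether the app was shown in the bar and whether it was clicked (this record format is my assumption, not specified in the slides):

```python
def roc_rates(records):
    """Compute (TPR, FPR) for one app.

    records: iterable of (shown, clicked) booleans, one per context.
    TPR = shown-and-clicked / clicked; FPR = shown-and-not-clicked / not-clicked.
    """
    clicked = [shown for shown, c in records if c]
    not_clicked = [shown for shown, c in records if not c]
    tpr = sum(clicked) / len(clicked) if clicked else 0.0
    fpr = sum(not_clicked) / len(not_clicked) if not_clicked else 0.0
    return tpr, fpr

# 4 contexts: hit, missed click, false alarm, correct rejection.
print(roc_rates([(True, True), (False, True), (True, False), (False, False)]))
# -> (0.5, 0.5)
```

Sweeping a threshold on the app's score (and recomputing `shown` at each threshold) traces out the full ROC curve.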
Maximizing AUC. By definition of the AUC (Bamber, 1975; Hanley and McNeil, 1982), the AUC is the probability that a randomly chosen context in which the app was clicked is scored higher than a randomly chosen context in which it was not.
Pairwise Dataset. For an application A (e.g., Waze), define two sets of contexts: X^+, the contexts in which A was clicked, and X^−, the contexts in which it was not.
Maximizing AUC. By the definition above, AUC = (1 / (|X^+| |X^−|)) Σ_{x^+ ∈ X^+} Σ_{x^− ∈ X^−} 1[f_w(x^+, A) > f_w(x^−, A)]. The indicator makes this objective hard to maximize directly, so we replace it with a hinge loss over pairs (x^+, x^−) and optimize with PA.
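The pairwise sum translates directly to code; this sketch also counts ties as half a correct pair, as in the Hanley-McNeil formulation:

```python
import numpy as np

def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) context pairs ranked correctly.

    Broadcasting builds the full |X+| x |X-| comparison matrix;
    ties contribute 1/2 per pair.
    """
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())

print(auc([0.9, 0.4], [0.5, 0.1]))  # 3 of 4 pairs correct -> 0.75
```

The quadratic number of pairs is why the online, one-pair-at-a-time PA formulation on the next slides is attractive for large datasets.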
Solve using PA. Set the next w_{i+1} to be the minimizer of the following optimization problem: min_{w ∈ R^d, ξ ≥ 0} (1/2) ‖w − w_i‖^2 + Cξ s.t. f_w(x_i^+, A_i) − f_w(x_i^−, A_i) ≥ 1 − ξ.
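A sketch of one such step per (clicked, not-clicked) context pair, assuming the linear score f_w(x, A) = w_A · x and using the closed-form PA-I step size whose cap C comes from the slack term (these modeling choices are assumptions consistent with the slides, not taken from them verbatim):

```python
import numpy as np

def pa_rank_update(w, x_pos, x_neg, C=1.0):
    """One PA-I step enforcing w . x_pos - w . x_neg >= 1 (up to slack).

    w            : (d,) weight vector of the current app.
    x_pos, x_neg : contexts where the app was / wasn't clicked.
    """
    diff = x_pos - x_neg
    loss = max(0.0, 1.0 - w @ diff)
    if loss > 0.0:
        tau = min(C, loss / (diff @ diff))  # closed-form PA-I step size
        w = w + tau * diff
    return w

w = pa_rank_update(np.zeros(2), np.array([1.0, 0.0]), np.array([0.0, 1.0]))
print(w)  # [0.5 -0.5]; now w . (x_pos - x_neg) = 1
```

Iterating this update over sampled pairs from X^+ x X^- drives down the pairwise hinge loss, and hence pushes the AUC up, one pair at a time.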
Implementation. An online algorithm solves the optimization problem efficiently on huge datasets, with theoretical guarantees of convergence.