Inexact Search is Good Enough


Advanced Machine Learning for NLP
Jordan Boyd-Graber
MATHEMATICAL TREATMENT

Preliminaries: algorithm, separability

The structured perceptron maintains the set of wrong features, the difference between the features of the correct output $y$ and those of a hypothesis $z$:

$$\Delta\Phi(x,y,z) \equiv \Phi(x,y) - \Phi(x,z) \tag{1}$$

The structured perceptron updates its weights with

$$w \leftarrow w + \Delta\Phi(x,y,z) \tag{2}$$

A dataset $D$ is linearly separable under features $\Phi$ with margin $\delta$ if, given some oracle unit vector $u$,

$$u \cdot \Delta\Phi(x,y,z) \geq \delta \quad \forall\, x,y,z \in D \tag{3}$$
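As a minimal sketch of equations (1) and (2), the snippet below uses a toy multiclass setup. The block-style feature map `phi` and the helper names are assumptions made for illustration; the slides do not specify a particular feature map.

```python
def phi(x, label, num_labels):
    """Hypothetical joint feature map: copy x into the block for `label`."""
    vec = [0.0] * (len(x) * num_labels)
    start = label * len(x)
    vec[start:start + len(x)] = x
    return vec


def delta_phi(x, y, z, num_labels):
    """Delta Phi(x, y, z) = Phi(x, y) - Phi(x, z), as in equation (1)."""
    gold = phi(x, y, num_labels)
    hyp = phi(x, z, num_labels)
    return [g - h for g, h in zip(gold, hyp)]


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


def update(w, x, y, z, num_labels):
    """Perceptron update w <- w + Delta Phi(x, y, z), as in equation (2)."""
    return [wi + di for wi, di in zip(w, delta_phi(x, y, z, num_labels))]


w = [0.0] * 4          # two labels, two input features
x = [1.0, 2.0]
w = update(w, x, y=0, z=1, num_labels=2)
print(w)               # [1.0, 2.0, -1.0, -2.0]
```

After one update the gold label scores strictly higher than the wrong one on this example, since `dot(w, delta_phi(x, 0, 1, 2))` is positive.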

Violations vs. Errors

It may be difficult to find the highest-scoring hypothesis. That is okay as long as inference finds a violation:

$$w \cdot \Delta\Phi(x,y,z) \leq 0 \tag{4}$$

This means that $y$ might not be the answer the algorithm gives (i.e., the algorithm is wrong).
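The violation condition in equation (4) can be checked without finding the best-scoring hypothesis. The sketch below assumes the inexact search hands back some list of candidate feature vectors; the function names and the toy numbers are hypothetical.

```python
def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


def is_violation(w, phi_gold, phi_hyp):
    """True if w . (Phi(x, y) - Phi(x, z)) <= 0, the condition in equation (4)."""
    delta = [g - h for g, h in zip(phi_gold, phi_hyp)]
    return dot(w, delta) <= 0


def find_violation(w, phi_gold, candidate_features):
    """Return the first candidate that is a violation, or None.

    `candidate_features` stands in for whatever an inexact search
    (beam search, greedy decoding, ...) happens to explore. It should
    contain wrong hypotheses only, not the gold output itself.
    """
    for phi_hyp in candidate_features:
        if is_violation(w, phi_gold, phi_hyp):
            return phi_hyp
    return None


w = [1.0, -1.0]
phi_gold = [1.0, 1.0]      # model score 0.0
best_wrong = [2.0, 0.0]    # model score 2.0, the true argmax
other_wrong = [1.0, 0.0]   # model score 1.0, still outscores the gold output
low_wrong = [0.0, 2.0]     # model score -2.0, not a violation

# The search below never sees `best_wrong`, yet it still finds a violation.
found = find_violation(w, phi_gold, [low_wrong, other_wrong])
print(found)               # [1.0, 0.0]
```

The hypothesis returned is not the highest-scoring one, but it satisfies equation (4), which is all the update needs.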

Limited number of mistakes

Define the diameter $R$ as

$$R = \max_{(x,y,z)} \lVert \Delta\Phi(x,y,z) \rVert \tag{5}$$

The weight vector $w$ grows with each error, but we can prove that $\lVert w \rVert$ can't get too big. Thus the algorithm can only run for a limited number of iterations $k$ in which it updates the weights. Indeed, we'll bound it from two directions:

$$k^2\delta^2 \leq \lVert w^{(k+1)} \rVert^2 \leq kR^2 \tag{6}$$

Lower Bound

Goal:

$$k^2\delta^2 \leq \lVert w^{(k+1)} \rVert^2 \tag{7}$$

Update equation:

$$w^{(k+1)} = w^{(k)} + \Delta\Phi(x,y,z)$$

Multiply both sides by $u$:

$$u \cdot w^{(k+1)} = u \cdot w^{(k)} + u \cdot \Delta\Phi(x,y,z)$$

Definition of margin:

$$u \cdot w^{(k+1)} \geq u \cdot w^{(k)} + \delta$$

By induction, $u \cdot w^{(k+1)} \geq k\delta$. (Base case: the initial weight vector is $0$; here $w^{(k+1)}$ denotes the weights after $k$ updates.)

For any vectors, $\lVert a \rVert \lVert b \rVert \geq a \cdot b$, so

$$\lVert u \rVert \lVert w^{(k+1)} \rVert \geq u \cdot w^{(k+1)} \geq k\delta$$

$u$ is a unit vector:

$$\lVert w^{(k+1)} \rVert \geq k\delta$$

Square both sides, and we're done!

$$\lVert w^{(k+1)} \rVert^2 \geq k^2\delta^2 \tag{10}$$

Upper Bound

Goal:

$$\lVert w^{(k+1)} \rVert^2 \leq kR^2 \tag{11}$$

Update rule:

$$\lVert w^{(k+1)} \rVert^2 = \lVert w^{(k)} + \Delta\Phi(x,y,z) \rVert^2 \tag{12}$$

Law of cosines:

$$\lVert w^{(k+1)} \rVert^2 = \lVert w^{(k)} \rVert^2 + \lVert \Delta\Phi(x,y,z) \rVert^2 + 2\, w^{(k)} \cdot \Delta\Phi(x,y,z) \tag{13}$$

Definition of diameter:

$$\lVert w^{(k+1)} \rVert^2 \leq \lVert w^{(k)} \rVert^2 + R^2 + 2\, w^{(k)} \cdot \Delta\Phi(x,y,z) \tag{14}$$

If the update was triggered by a violation, then $w^{(k)} \cdot \Delta\Phi(x,y,z) \leq 0$ and the last term can be dropped:

$$\lVert w^{(k+1)} \rVert^2 \leq \lVert w^{(k)} \rVert^2 + R^2 \tag{15}$$

Induction!

$$\lVert w^{(k+1)} \rVert^2 \leq kR^2 \tag{16}$$
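A single update can be checked numerically against equations (13) and (15). The vectors below are made-up values chosen so that the update is a violation; they are not taken from the slides.

```python
def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


w_k = [1.0, -2.0]      # hypothetical current weights w^(k)
delta = [1.0, 1.0]     # hypothetical Delta Phi(x, y, z)

cross = dot(w_k, delta)                       # -1.0, so this is a violation
w_next = [a + b for a, b in zip(w_k, delta)]  # the perceptron update

# Equation (13): expand the squared norm of the sum.
lhs = dot(w_next, w_next)
rhs = dot(w_k, w_k) + dot(delta, delta) + 2 * cross
print(lhs, rhs)        # 5.0 5.0

# Equation (15): because cross <= 0, the squared norm grows by at most
# the squared norm of Delta Phi, which is itself at most R^2.
print(lhs <= dot(w_k, w_k) + dot(delta, delta))   # True
```

If `cross` were positive (no violation), the cross term could not be dropped, which is why the bound only needs a violation rather than the highest-scoring wrong answer.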

Putting it together

Sandwich:

$$k^2\delta^2 \leq \lVert w^{(k+1)} \rVert^2 \leq kR^2 \tag{17}$$

Solve for $k$:

$$k \leq \frac{R^2}{\delta^2} \tag{18}$$

What does this mean?

- Limited number of errors (updates)
- Larger diameter increases errors (worst possible mistake)
- Larger margin decreases errors (bigger separation from wrong answer)
- Finding the largest-violation wrong answer is best (but any violation is okay)
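The bound in equation (18) can be checked empirically on a toy problem. The sketch below is an illustration under stated assumptions, not the lecture's code: the dataset, the block feature map `phi`, and the oracle vector `u` are all invented, and the "inexact search" simply returns the first violating label rather than the worst one.

```python
import math

NUM_LABELS = 3

# Invented, linearly separable toy data: (input vector, gold label).
DATA = [
    ([1.0, 0.2, 0.0], 0), ([0.9, 0.0, 0.1], 0),
    ([0.1, 1.0, 0.0], 1), ([0.0, 0.8, 0.2], 1),
    ([0.2, 0.0, 1.0], 2), ([0.0, 0.1, 0.9], 2),
]


def phi(x, label):
    """Hypothetical joint feature map: copy x into the block for `label`."""
    vec = [0.0] * (len(x) * NUM_LABELS)
    vec[label * len(x):(label + 1) * len(x)] = x
    return vec


def delta_phi(x, y, z):
    """Delta Phi(x, y, z) = Phi(x, y) - Phi(x, z)."""
    return [g - h for g, h in zip(phi(x, y), phi(x, z))]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def first_violation(w, x, y):
    """Inexact search: return the first wrong label that violates, or None."""
    for z in range(NUM_LABELS):
        if z != y and dot(w, delta_phi(x, y, z)) <= 0:
            return z
    return None


def train(data, max_epochs=100):
    """Run perceptron passes until one full pass makes no update."""
    w = [0.0] * (len(data[0][0]) * NUM_LABELS)
    updates = 0
    for _ in range(max_epochs):
        changed = False
        for x, y in data:
            z = first_violation(w, x, y)
            if z is not None:
                w = [wi + di for wi, di in zip(w, delta_phi(x, y, z))]
                updates += 1
                changed = True
        if not changed:
            break
    return w, updates


# Assumed oracle unit vector u: weight 1 on feature i inside block i.
u_raw = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
u = [v / math.sqrt(dot(u_raw, u_raw)) for v in u_raw]

all_deltas = [delta_phi(x, y, z)
              for x, y in DATA for z in range(NUM_LABELS) if z != y]
margin = min(dot(u, d) for d in all_deltas)                 # delta in (3)
diameter = max(math.sqrt(dot(d, d)) for d in all_deltas)    # R in (5)

w, k = train(DATA)
bound = diameter ** 2 / margin ** 2
print(k, "updates; bound R^2 / delta^2 =", round(bound, 2))
```

The margin computed here is the one achieved by this particular `u`; a better oracle vector could give a larger margin and hence a tighter bound, but the update count must respect the bound for any separating unit vector.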

In Practice

The harder the search space, the more max violation helps.
