Recent Advances in Ranking: Adversarial Respondents and Lower Bounds on the Bayes Risk


1 Recent Advances in Ranking: Adversarial Respondents and Lower Bounds on the Bayes Risk
Vincent Y. F. Tan
Department of Electrical and Computer Engineering and Department of Mathematics, National University of Singapore
McMaster University, June 15, 2018

2 Outline
1 Introduction to Statistical Models for Ranking
2 Fundamental Limits of Top-K Ranking with Adversaries
3 Lower Bounds on the Bayes Risk of a Bayesian BTL Model

7 Ranking
A fundamental problem in a wide range of contexts.
Applications: web search, recommendation systems, social choice, sports competitions, voting, etc.
Many efforts have been devoted to developing ranking algorithms.
A variety of statistical models has been introduced for evaluating ranking algorithms.

9 Ranking: An Example and Difficulties
Example: Web search.
$n = 10^9$ websites $\Rightarrow \binom{n}{2} \approx n^2/2 = 5 \times 10^{17}$ comparisons.
Do we really need $\Theta(n^2)$ comparisons?

13 Large-Scale Ranking
Suppose that we want a total ordering and pairwise comparisons are provided at random (probabilistically). This indeed requires $\Theta(n^2)$ comparisons:
there is no way to identify the ordering between items 1 and 2 without a direct comparison, i.e., that comparison must be made w.p. 1.
The situation is worse with noisy data.
We adopt a Shannon-theoretic approach in our analyses.

15 Top-K Ranking Usually Suffices

17 Statistical Model for Top-K Ranking: Part I
Adopt the Bradley-Terry-Luce (BTL) model, in which there is an underlying unknown score vector $w = (w_1, \ldots, w_n) \in \mathbb{R}^n_{++}$, where $w_i$ is the likeability of movie $i$.
Decide which items to compare via a comparison graph.

21 Statistical Model for Top-K Ranking: Part II
The outcome of the comparison between items 1 and 2 is
$$Y_{12} = \mathbb{I}\{\text{item } 1 \succ \text{item } 2\} \sim \mathrm{Bern}\Big(\frac{w_1}{w_1 + w_2}\Big).$$
E.g., if $w_1 = 0.9$ and $w_2 = 0.1$, then item 1 beats item 2 w.p. 90%.
We have $L$ independent copies $Y_{ij}^{(1)}, \ldots, Y_{ij}^{(L)}$ for each observed edge $\{i, j\} \in \mathcal{E}$ of the observation graph.
Goal: determine fundamental limits on $L$ (as a function of $n$ and other parameters) so that recovery of the top-$K$ set is successful.
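The data-generating process just described is easy to simulate. The following minimal sketch (not from the talk; all names and parameter choices are illustrative) samples $L$ comparisons per edge of an Erdős-Rényi observation graph under the BTL model.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_btl(w, p, L, rng):
    """Sample L pairwise comparisons per edge of an Erdos-Renyi
    observation graph under the BTL model. Returns a dict mapping each
    observed edge (i, j), i < j, to L Bernoulli outcomes (1 = i beat j)."""
    n = len(w)
    data = {}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:            # edge {i, j} observed w.p. p
                q = w[i] / (w[i] + w[j])    # BTL win probability of i over j
                data[(i, j)] = (rng.random(L) < q).astype(int)
    return data

# Illustrative parameters: p >~ log(n)/n keeps the graph connected w.h.p.
n, L = 50, 20
w = 0.5 ** np.arange(n)                     # hypothetical score vector
data = simulate_btl(w, p=5 * np.log(n) / n, L=L, rng=rng)
print(len(data), "of", n * (n - 1) // 2, "pairs observed")
```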

22 Outline (Part 2: Fundamental Limits of Top-K Ranking with Adversaries)

24 Top-K Ranking with Adversaries
Joint work with Changho Suh (KAIST) and Renbo Zhao (NUS).
C. Suh, VYFT and R. Zhao, Adversarial Top-K Ranking, IEEE Trans. on Inf. Theory, Apr 2017

30 Ranking with Adversaries: Crowdsourced Setting
A faithful respondent reports
$$Y_{ij} \sim \mathrm{Bern}\Big(\frac{w_i}{w_i + w_j}\Big).$$
Spammers provide answers in an adversarial manner:
$$Y_{ij} \sim \mathrm{Bern}\Big(\frac{1/w_i}{1/w_i + 1/w_j}\Big) = \mathrm{Bern}\Big(\frac{w_j}{w_i + w_j}\Big).$$
Mixing the two populations, with η the fraction of non-adversaries:
$$Y_{ij} \sim \mathrm{Bern}\Big(\eta\, \frac{w_i}{w_i + w_j} + (1 - \eta)\, \frac{w_j}{w_i + w_j}\Big).$$
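This mixed response distribution is straightforward to simulate; a small illustrative sketch (not from the talk):

```python
import numpy as np

def sample_adversarial(w_i, w_j, eta, L, rng):
    """Sample L comparisons of items i and j in the adversarial BTL
    model: each response is faithful w.p. eta and adversarial w.p.
    1 - eta, giving the mixed win probability on the slide above."""
    q = w_i / (w_i + w_j)                    # faithful win probability
    q_mix = eta * q + (1 - eta) * (1 - q)    # 1 - q = w_j / (w_i + w_j)
    return (rng.random(L) < q_mix).astype(int)

rng = np.random.default_rng(1)
y = sample_adversarial(0.9, 0.1, eta=0.8, L=100_000, rng=rng)
print(y.mean())   # ~ 0.8 * 0.9 + 0.2 * 0.1 = 0.74
```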

34 Related Work: Crowdsourced BTL [Chen et al. 13]
Given an observed pair, each sample has a different distribution:
$$Y_{ij}^{(l)} \sim \mathrm{Bern}\Big(\eta_l\, \frac{w_i}{w_i + w_j} + (1 - \eta_l)\, \frac{1/w_i}{1/w_i + 1/w_j}\Big),$$
where $\eta_l$ is a quality parameter of measurement $l$.
This subsumes our adversarial BTL model as a special case, when all quality parameters are the same.
The authors developed a ranking algorithm, but without theoretical guarantees.
The model is more difficult to analyze as there are many more parameters.
1 X. Chen, P. N. Bennett, K. Collins-Thompson, and E. Horvitz, Pairwise ranking aggregation in a crowdsourced setting, in WSDM, 2013

36 Goal of Adversarial Top-K Ranking
Erdős-Rényi comparison graph: each pair of items is compared independently with probability $p$.

40 Contribution #1: 1/2 < η < 1 known
η = fraction of non-adversaries; $\Delta_K := (w_K - w_{K+1})/w_{\max}$ is the score separation.
[Plot: sample complexity as a function of $\Delta_K$.]
Our sample complexity characterization is minimax optimal.
It is achieved by a nearly-linear time algorithm.
The case η = 1 was studied by Chen and Suh (2015).
2 Y. Chen and C. Suh, Spectral MLE: Top-K rank aggregation from pairwise comparisons, in ICML 2015

41 Contribution #1: 1/2 < η < 1 known - Experimental Results for $n = 1000$ and $K = 10$
[Plot of the empirical sample complexity $\hat{S} = \binom{n}{2}\, \hat{p}\, L_{\mathrm{ave}}$ and its normalized version $\hat{S}_{\mathrm{norm}} = \hat{S} \big/ \frac{n \log n}{\Delta_K^2}$.]

45 Contribution #2: 1/2 < η < 1 unknown
η = fraction of non-adversaries.
[Plot: sample complexity as a function of $\Delta_K$, showing a regime handled by a polynomial time algorithm, an infeasible regime, and a gap marked ?? in between.]

49 Optimality
Minimax optimality: construct worst-case score vectors.
Translation to M-ary hypothesis testing: construction of multiple hypotheses.
Information-theoretic ideas applied to statistical learning.

53 Optimality: Tools
Construction of $M := \min\{K, n - K\} + 1 \lesssim n/2$ hypotheses:
$$\Pr(\sigma([K]) = S) = \frac{1}{M}, \quad \text{for } S = \{2, \ldots, K\} \cup \{i\},\ i = 1, K + 1, \ldots, n.$$
Bound the mutual information between the permutation $\sigma$ and $Z$, the erased version of the samples $Y_{ij}^{(l)}$:
$$I(\sigma; Z) \leq \frac{p}{M^2} \sum_{\sigma_1, \sigma_2 \in \mathcal{M}} \sum_{l=1}^{L} \sum_{i<j} D\Big(P_{Y_{ij}^{(l)} \mid \sigma_1} \,\Big\|\, P_{Y_{ij}^{(l)} \mid \sigma_2}\Big).$$
Bound the divergence using the reverse Pinsker inequality; here is where $\Delta_K$ comes in:
$$\sum_{i<j} D\Big(P_{Y_{ij}^{(l)} \mid \sigma_1} \,\Big\|\, P_{Y_{ij}^{(l)} \mid \sigma_2}\Big) \lesssim n\, (2\eta - 1)^2\, \Delta_K^2.$$
Finally, apply Fano's inequality.
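Not on the slide, but for intuition the three steps chain together as follows (constants suppressed; a sketch consistent with the threshold on the summary slide below). Fano's inequality gives $P_e \geq 1 - \frac{I(\sigma; Z) + \log 2}{\log M}$ with $M \asymp n$ hypotheses, and combining the two displayed bounds gives $I(\sigma; Z) \lesssim p L\, n (2\eta - 1)^2 \Delta_K^2$; hence reliable recovery forces
$$p L\, n\, (2\eta - 1)^2\, \Delta_K^2 \gtrsim \log n \quad\Longleftrightarrow\quad \binom{n}{2}\, p\, L \gtrsim \frac{n \log n}{(2\eta - 1)^2\, \Delta_K^2}.$$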

56 Ranking Algorithm for η Known: Part I
Scores determine the ranking.
Adopt a two-step approach.

58 Ranking Algorithm for η Known: Part II
Key Message: Small MSE $\Longrightarrow$ Small $\ell_\infty$ Error of $\hat{w}$ $\Longrightarrow$ High Ranking Accuracy

64 How to ensure small MSE for η = 1?
Recall η = 1 (no adversaries).
$L$ independent copies $Y_{ij}^{(1)}, Y_{ij}^{(2)}, \ldots, Y_{ij}^{(L)}$.
As $L$ grows, the empirical frequencies converge:
$$\frac{1}{L} \sum_{l=1}^{L} Y_{ij}^{(l)} \longrightarrow \frac{w_i}{w_i + w_j}.$$
Detailed balance equation:
$$\pi_i\, \frac{w_j}{w_i + w_j} = \pi_j\, \frac{w_i}{w_i + w_j},$$
where $\pi := [\pi_1, \pi_2, \ldots, \pi_n]$ is the stationary distribution of the chain.
The stationary distribution converges to $w$ (up to constant scaling), i.e., $\lim_{L \to \infty} \pi(L) = \alpha w$.
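A sketch of the resulting spectral step, in the spirit of Rank Centrality (not verbatim from the talk; the matrix Ybar of empirical win frequencies and the normalization d_max are illustrative choices):

```python
import numpy as np

def stationary_scores(Ybar, d_max, iters=10_000):
    """Build a Markov chain whose transition probability i -> j is
    Ybar[j, i] / d_max (the empirical frequency with which j beats i)
    and return its stationary distribution, which by the detailed
    balance equation on the slide is proportional to w.

    Ybar[i, j]: empirical fraction of comparisons i wins over j
    (0 for unobserved pairs); d_max: at least the max graph degree."""
    n = Ybar.shape[0]
    P = Ybar.T / d_max                         # off-diagonal transitions
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))   # lazy self-loops
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):                     # power iteration
        pi = pi @ P
    return pi / pi.sum()
```

Since the transition probability from $i$ to $j$ is proportional to the frequency with which $j$ beats $i$, detailed balance pins the stationary distribution to $\pi \propto w$, exactly as on the slide.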

69 How to ensure small MSE for η ∈ (1/2, 1]?
Arbitrary η ∈ (1/2, 1] (adversaries present).
$L$ independent copies $Y_{ij}^{(1)}, Y_{ij}^{(2)}, \ldots, Y_{ij}^{(L)}$.
Redefine the Markov chain: we instead have the convergence
$$\frac{1}{L} \sum_{l=1}^{L} Y_{ij}^{(l)} \longrightarrow \eta\, \frac{w_i}{w_i + w_j} + (1 - \eta)\, \frac{w_j}{w_i + w_j} = (2\eta - 1)\, \frac{w_i}{w_i + w_j} + (1 - \eta).$$
Redefine shifted samples with range scaled by $2\eta - 1$:
$$\tilde{Y}_{ij} = \frac{1}{2\eta - 1} \left[ \frac{1}{L} \sum_{l=1}^{L} Y_{ij}^{(l)} - (1 - \eta) \right] \longrightarrow \frac{w_i}{w_i + w_j}.$$
Construct the Markov chain with transition probabilities $\{\tilde{Y}_{ij}\}$.
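In code, the debiasing step is one line (a sketch under the same illustrative names as above):

```python
import numpy as np

def debias(Ybar, eta):
    """Shift and rescale empirical win frequencies so that, for a known
    fraction eta of faithful respondents, the result converges to
    w_i / (w_i + w_j) as L grows (the tilde-Y of the slide above)."""
    return (Ybar - (1.0 - eta)) / (2.0 * eta - 1.0)
```

The debiased matrix can then be fed into the stationary-distribution computation sketched after slide 64.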

71 Ranking Algorithm for η Known: Summary
Using several concentration inequalities (Hoeffding, Bernstein, Tropp, etc.), we can show that
$$\text{sample size} = \binom{n}{2}\, p\, L \gtrsim \frac{n \log n}{(2\eta - 1)^2\, \Delta_K^2} \;\Longrightarrow\; \text{Feasible Top-}K\text{ Ranking}.$$

75 What if η is unknown?
The adversarial BTL model is a mixture model.
Obtaining global optimality guarantees for mixture model problems is difficult in general.
Recent developments (tensor methods): Jain and Oh3 and Anandkumar et al.4 Key idea: exact 2nd and 3rd moments yield sufficient statistics.
Our setting: we can obtain estimates of the 2nd and 3rd moments $\Longrightarrow$ we can estimate η.
3 P. Jain and S. Oh, Learning mixtures of discrete product distributions using spectral decompositions, in COLT, 2014
4 A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky, Tensor decompositions for learning latent variable models, JMLR, 2014

79 Unknown η: High-Level Algorithm: Part I
1 Turn the weights into distribution vectors
$$\pi_0 = \Big[ \frac{w_i}{w_i + w_j}\;\; \frac{w_j}{w_i + w_j}\;\; \cdots \Big]^T,$$
with $\pi_1$ the analogous vector for adversarial answers (the roles of $w_i$ and $w_j$ swapped).
2 Estimate moments. The ground-truth moment matrix and tensor are
$$M_2 := \eta\, \pi_0 \otimes \pi_0 + (1 - \eta)\, \pi_1 \otimes \pi_1, \qquad M_3 := \eta\, \pi_0 \otimes \pi_0 \otimes \pi_0 + (1 - \eta)\, \pi_1 \otimes \pi_1 \otimes \pi_1.$$
3 Solve a least squares problem (a Frobenius-norm fit of a matrix $Z \in \mathbb{R}^{I_2 \times I_2}$ to the estimated moments $\hat{M}_2$ and the third-moment samples $Y^{(t)}$) to obtain $\hat{G}$.
4 Find the leading eigenvalue $\lambda_1(\hat{G})$ of $\hat{G}$, which is related to η as $\hat{\eta} = \lambda_1(\hat{G})^2$.
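The tensor-method idea is easiest to see on a clean synthetic instance. The sketch below is my illustration, not the paper's algorithm: it replaces the least-squares step with direct whitening and uses generic three-view data, recovering the mixing weight η from empirical second and third moments via the whitening-plus-tensor-power-method recipe of Anandkumar et al.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in: a two-component mixture of product distributions
# with three conditionally i.i.d. one-hot views per sample. All sizes,
# names and the mixing weight below are illustrative.
d, N, eta = 6, 200_000, 0.8
mu = np.stack([rng.dirichlet(np.ones(d)) for _ in range(2)])  # mu_0, mu_1
h = (rng.random(N) >= eta).astype(int)       # component 0 w.p. eta

def one_hot_rows(P):
    """One categorical draw per row of the probability matrix P."""
    u = rng.random(len(P))[:, None]
    c = (u > np.cumsum(P, axis=1)).sum(axis=1)
    X = np.zeros(P.shape)
    X[np.arange(len(P)), c] = 1.0
    return X

X1, X2, X3 = one_hot_rows(mu[h]), one_hot_rows(mu[h]), one_hot_rows(mu[h])

# Empirical 2nd/3rd moments: M2 ~ eta mu0 mu0^T + (1-eta) mu1 mu1^T, etc.
M2 = X1.T @ X2 / N
M3 = np.einsum('ni,nj,nk->ijk', X1, X2, X3, optimize=True) / N

# Whiten with the top-2 eigenspace of M2, then run the tensor power
# method with deflation on the 2x2x2 whitened tensor: its robust
# eigenvalues are lambda_i = weight_i^(-1/2).
vals, vecs = np.linalg.eigh((M2 + M2.T) / 2)
W = vecs[:, -2:] / np.sqrt(vals[-2:])        # W^T M2 W ~ I_2
T = np.einsum('ijk,ia,jb,kc->abc', M3, W, W, W, optimize=True)

def tensor_power(T, iters=500):
    v = rng.standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = np.einsum('abc,b,c->a', T, v, v)
        v /= np.linalg.norm(v)
    return np.einsum('abc,a,b,c->', T, v, v, v), v

lam1, v1 = tensor_power(T)
lam2, _ = tensor_power(T - lam1 * np.einsum('a,b,c->abc', v1, v1, v1))
weights = sorted([1 / lam1**2, 1 / lam2**2])
print("estimated eta ~", weights[-1])        # close to 0.8
```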

81 Unknown η: High-Level Algorithm: Part II
How does the quality of the estimation of η affect the overall sample complexity?

85 Tradeoff Between $|\hat{\eta} - \eta|$ and Sample Complexity
With a very careful analysis, we can derive a meta-lemma: to ensure $|\hat{\eta} - \eta| \leq \epsilon$, it suffices that
$$\text{sample size} = \binom{n}{2}\, p\, L \gtrsim \frac{n \log^2 n}{\epsilon^2}.$$
This implies a tradeoff:
$\hat{\eta} \approx \eta$ implies $\hat{w} \approx w$, but the sample size goes up;
a cruder $\hat{\eta}$ lets the sample size go down, but then $\hat{w}$ drifts away from $w$.
Find a sweet spot to show that
$$\text{sample size} \gtrsim \frac{n \log^2 n}{(2\eta - 1)^4\, \Delta_K^4} \;\Longrightarrow\; \text{Feasible Top-}K\text{ ranking}.$$

90 Conclusion for Adversarial Top-K Ranking
Explored a top-K ranking problem in an adversarial setting.
Characterized the exact order-wise optimal sample complexity for the η-known case.
Established an upper bound on the sample complexity for the η-unknown case.
Developed computationally efficient algorithms for both cases (using state-of-the-art tensor methods for the η-unknown case).
C. Suh, VYFT and R. Zhao, Adversarial Top-K Ranking, IEEE Trans. on Inf. Theory, Apr 2017

91 Outline (Part 3: Lower Bounds on the Bayes Risk of a Bayesian BTL Model)

93 Lower Bounds on the Risk of a Bayesian BTL Model
Joint work with Mine Alsan (NUS) and Ranjitha Prasad (TCS Innovation Labs, Delhi).
M. Alsan, R. Prasad and VYFT, Lower Bounds on the Bayes Risk of the Bayesian BTL Model with Applications to Comparison Graphs, IEEE J. on Sel. Topics of Sig. Proc., Oct 2018

96 Summary of Contributions
We study the fundamental performance limits of ranking algorithms in the Bradley-Terry-Luce model within a Bayesian framework:
1 Derive lower bounds on the Bayes risk of estimators:
- a family of information-theoretic lower bounds for norm-based distortion functions $\|\cdot\|_r^r$, for any $r \geq 1$;
- the Bayesian Cramér-Rao bound for the MSE, i.e., $r = 2$.
2 Explore optimal comparison graph structures to design experiments minimizing distortion.

100 Recall the BTL Model
BTL model: to each item $i \in [n]$, associate a skill parameter $w_i \in \mathbb{R}_{++}$ s.t.
$$P_{ij} := \Pr[\text{item } i \succ \text{item } j] = \frac{w_i}{w_i + w_j}.$$
Instead of top-$K$ ranking, we now want to estimate the vector $w := (w_1, \ldots, w_n) \in \mathbb{R}^n_{++}$.
Given $m = \sum_{(i,j): i \neq j} m_{ij} \in \mathbb{N}$ independent pairwise comparisons, we count:
1 $m_{ij}$: number of pairwise comparisons between items $i$ and $j$,
2 $b_{ij}$: number of comparisons in which $i$ is preferred over $j$.
We write $M := \{m_{ij}\} \in \mathbb{N}^{n \times n}$ and $B := \{b_{ij}\} \in \mathbb{N}^{n \times n}$.

102 Induced Probabilities by the BTL Model
We assume that the matrix $M = \{m_{ij}\}$ is fixed a priori. The BTL model induces the following distributions:
1 For fixed $m_{ij}$: $p(b_{ij} \mid w_i, w_j) = \mathrm{Bin}(b_{ij}; m_{ij}, P_{ij})$.
2 For fixed $M$: $p(B \mid w) = \prod_{(i,j): i < j} \mathrm{Bin}(b_{ij}; m_{ij}, P_{ij})$.

105 Bayesian BTL Model
Adopt the Bayesian BTL framework of Caron & Doucet5:
1 Prior distribution: assign $p(w_i) = \mathrm{Gam}(w_i; \alpha_i, \beta_i)$ to each item $i \in [n]$, where $\alpha := \{\alpha_i\}_{i=1}^n, \beta := \{\beta_i\}_{i=1}^n \in \mathbb{R}^n_{++}$.
2 Latent random variables: introduce $Z := \{Z_{ij}\} \in \mathbb{R}^{n \times n}$ with
$$Z_{ij} = Z_{ji} := \sum_{s=1}^{m_{ij}} \min\{Y_{si}, Y_{sj}\}, \quad \text{for } i, j \in [n] \text{ such that } i < j,$$
where $Y_{si} \sim \mathrm{Exp}(w_i)$ and $Y_{sj} \sim \mathrm{Exp}(w_j)$ are such that $P_{ij} = \Pr[Y_{si} < Y_{sj}]$.
This is known as the Thurstonian interpretation of the BTL model.
5 F. Caron and A. Doucet, Efficient Bayesian Inference for Generalized Bradley-Terry Models, in JCGS, 2012

109 Induced Probabilities by the Bayesian BTL Model
1 Prior:
$$p(w) = \prod_{i=1}^{n} p(w_i) = \prod_{i=1}^{n} \mathrm{Gam}(w_i; \alpha_i, \beta_i).$$
2 Prior $\times$ Likelihood:
$$p(w, B) = p(w)\, p(B \mid w) = \prod_{i=1}^{n} \mathrm{Gam}(w_i; \alpha_i, \beta_i) \prod_{i<j} \mathrm{Bin}\Big(b_{ij}; m_{ij}, \frac{w_i}{w_i + w_j}\Big).$$
3 Latent variable:
$$p(Z_{ij} \mid w_i, w_j) = \mathrm{Gam}(Z_{ij}; m_{ij}, w_i + w_j).$$
4 Posterior:
$$p(w \mid B, Z) = \prod_{i=1}^{n} \mathrm{Gam}(w_i; \alpha_i + b_i, \beta_i + Z_i), \quad \text{where } b_i := \sum_{j \neq i} b_{ij} \text{ and } Z_i := \sum_{j \neq i} Z_{ij}.$$
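The conjugacy in items 3 and 4 immediately yields a two-block Gibbs sampler of the kind Caron & Doucet exploit. A minimal sketch (illustrative code, not theirs):

```python
import numpy as np

rng = np.random.default_rng(3)

def gibbs_btl(Mcmp, B, alpha, beta, iters=2000, burn=500):
    """Gibbs sampler for the Bayesian BTL model, alternating the two
    conjugate conditionals on this slide:
        Z_ij | w     ~ Gam(m_ij, w_i + w_j)
        w_i  | B, Z  ~ Gam(alpha_i + b_i, beta_i + Z_i)
    Mcmp[i, j] = m_ij (symmetric), B[i, j] = b_ij. Returns the
    posterior mean of w. (numpy's gamma takes a scale parameter,
    hence the 1/rate arguments.)"""
    b = B.sum(axis=1)                        # b_i = sum_{j != i} b_ij
    w = alpha / beta                         # start at the prior mean
    samples = []
    for t in range(iters):
        rate = w[:, None] + w[None, :]       # w_i + w_j
        Z = np.where(Mcmp > 0,
                     rng.gamma(np.maximum(Mcmp, 1e-12), 1.0 / rate), 0.0)
        Z = np.triu(Z, 1)
        Z = Z + Z.T                          # enforce Z_ij = Z_ji
        w = rng.gamma(alpha + b, 1.0 / (beta + Z.sum(axis=1)))
        if t >= burn:
            samples.append(w)
    return np.mean(samples, axis=0)

# Tiny illustration: 3 items, 10 games per pair, true scores (3, 2, 1).
n = 3
Mcmp = 10.0 * (np.ones((n, n)) - np.eye(n))
w_true = np.array([3.0, 2.0, 1.0])
P = w_true[:, None] / (w_true[:, None] + w_true[None, :])
Bu = np.triu(rng.binomial(Mcmp.astype(int), P), 1)   # b_ij for i < j
B = Bu + np.tril(Mcmp - Bu.T, -1)                    # b_ji = m_ij - b_ij
print(gibbs_btl(Mcmp, B, alpha=np.ones(n), beta=np.ones(n)))
```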

113 Bayes Risk
For any $r \geq 1$, we define a family of Bayes risks for estimating $w$:
1 from only $B$:
$$R_B := \inf_{\phi} \mathbb{E}\big[ \| w - \phi(B) \|_r^r \big],$$
where $\phi(B)$ is an estimator of $w$;
2 from $B$ and the latent variable $Z$:
$$R_B^* := \inf_{\phi^*} \mathbb{E}\big[ \| w - \phi^*(B, Z) \|_r^r \big],$$
where $\phi^*(B, Z)$ is an estimator of $w$.
Clearly $R_B^* \leq R_B$.

119 Bayesian Network of All Variables
[Graphical model relating all of the variables:]
$w_i \sim \mathrm{Gam}(w_i; \alpha_i, \beta_i)$: prior on $w_i$.
$P_{ij} = \frac{w_i}{w_i + w_j}$: BTL model.
$Y_{si} \sim \mathrm{Exp}(w_i)$: latent arrival times.
$b_{ij} \sim \mathrm{Bin}(b_{ij}; m_{ij}, P_{ij})$: number of times $i$ beats $j$ out of $m_{ij}$ games.
$Z_{ij} = \sum_{s=1}^{m_{ij}} \min\{Y_{si}, Y_{sj}\}$: latent variables.
$\phi(B)$ and $\phi^*(B, Z)$: functions used to estimate $w$.

124 General Lower Bounds on the Bayes Risk
For $r = 2$, one can compute the Bayesian Cramér-Rao bound on $R_B^*$.
We compute a family of information-theoretic lower bounds:
1 Theorem 3 of Xu and Raginsky6 reads: for any $r \geq 1$,
$$R_B^* \geq \frac{n}{re} \Big( V_n\, \Gamma\Big(1 + \frac{n}{r}\Big) \Big)^{-r/n} \exp\Big[ -\frac{r}{n} \big( I(w; B, Z) - h(w) \big) \Big],$$
where $V_n$ is the volume of the unit ball in $(\mathbb{R}^n, \|\cdot\|_r)$.
2 Using Stirling's approximation, we upper bound $I(w; B, Z) - h(w) = \mathbb{E}[\log p(w \mid B, Z)]$.
6 A. Xu and M. Raginsky, Information-Theoretic Lower Bounds on Bayes Risk in Decentralized Estimation, in IEEE Trans. on Inf. Theory, 2017

127 Family of Information-Theoretic Lower Bounds
Theorem. For all $i \in [n]$, let
$$m_i := \frac{1}{2} \sum_{j \neq i} m_{ij} \qquad \text{(half the total number of games $i$ plays).}$$
Then the Bayes risk is asymptotically lower bounded by
$$R_B^* \geq \frac{n}{re} \Big( V_n\, \Gamma\Big(1 + \frac{n}{r}\Big) \Big)^{-r/n} \exp\big[ -\tfrac{r}{n}\, E(B, \alpha, \beta) \big],$$
where
$$E(B, \alpha, \beta) := \sum_{i=1}^{n} \Big( -\frac{1}{2} \log(2\pi) + \log \beta_i - \psi(\alpha_i) + \frac{1}{2} \log(\alpha_i + m_i) \Big).$$

130 Information-Theoretic Lower Bounds
Take $\alpha_i = \alpha$ and $\beta_i = \beta$ for each $i \in [n]$, and let $\bar{m} = m/n$ denote the per-item number of games.
For the $L_1$ norm ($r = 1$):
$$R_B \geq \sqrt{\frac{\pi}{2}}\, \exp\big[ -(\log \beta - \psi(\alpha) + 1) \big]\, \frac{n}{\sqrt{\alpha + \bar{m}}}.$$
For the squared $L_2$ norm ($r = 2$):
$$R_B \geq \exp\big[ -2(\log \beta - \psi(\alpha)) - 1 \big]\, \frac{n}{\alpha + \bar{m}}.$$
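These two closed forms are trivial to evaluate numerically. The snippet below (illustrative, and assuming the reconstructed forms above with equal per-item allocation $\bar{m} = m/n$) reproduces the qualitative decay seen in the figures that follow:

```python
import numpy as np
from scipy.special import digamma

def it_lower_bounds(n, alpha, beta, m):
    """Evaluate the two specialized lower bounds on this slide as a
    function of the total number of games m, assuming equal per-item
    allocation m_bar = m / n."""
    m_bar = m / n
    c = np.log(beta) - digamma(alpha)
    l1 = np.sqrt(np.pi / 2) * np.exp(-(c + 1)) * n / np.sqrt(alpha + m_bar)
    l2 = np.exp(-2 * c - 1) * n / (alpha + m_bar)
    return l1, l2

# Parameters matching the figures below: n = 100, alpha = 5, beta = alpha*n - 1.
for m in (10**3, 10**4, 10**5):
    print(m, it_lower_bounds(100, 5.0, 5.0 * 100 - 1, m))
```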

131 Performance of Lower Bounds: $L_1$ error
[Figure: $L_1$ error of the EM algorithm vs. the information-theoretic lower bound, as a function of sample size, for $n = 100$, $\alpha = 5$ and $\beta = \alpha n - 1$.]

132 Performance of Lower Bounds: MSE (squared $L_2$ error)
[Figure: $L_2$ error of the EM algorithm, the IT lower bound and the BCRB, as a function of sample size, for $n = 100$, $\alpha = 5$ and $\beta = \alpha n - 1$.]

134 Application to General Comparison Graphs
Given a fixed budget of $m = \sum_{i \neq j} m_{ij}$ games, how should we allocate the games among the $n$ players so as to minimize the bounds?
Corollary (Optimal Connected Graphs). Regular connected graphs are optimal!

137 Application to General Comparison Graphs
Proof: Minimizing the lower bound is equivalent to maximizing
$$f(\{m_i\}_{i \in [n]}) := \sum_{i=1}^{n} \frac{1}{2} \log(\alpha_i + m_i)$$
subject to $\sum_{i=1}^{n} m_i = m$ and $m_i \in \mathbb{N}$.
The solution is given by a water-filling formula:
$$m_i^* = [\mu - \alpha_i]_+, \quad i \in [n],$$
where $\mu > 0$ is chosen such that $\sum_{i=1}^{n} [\mu - \alpha_i]_+ = m$.
But when $\alpha_i = \alpha$ for all $i$, the $m_i^*$ are all equal.
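A small numeric illustration of the water-filling allocation (hypothetical values; bisection over the water level μ, ignoring the integrality constraint):

```python
import numpy as np

def water_fill(alpha, m, iters=100):
    """Continuous water-filling allocation m_i = [mu - alpha_i]_+ with
    sum_i m_i = m, found by bisection on the water level mu (round to
    integers as needed for the combinatorial constraint)."""
    alpha = np.asarray(alpha, dtype=float)
    lo, hi = 0.0, alpha.max() + m + 1.0
    for _ in range(iters):
        mu = (lo + hi) / 2
        if np.maximum(mu - alpha, 0.0).sum() < m:
            lo = mu
        else:
            hi = mu
    return np.maximum((lo + hi) / 2 - alpha, 0.0)

print(water_fill([1.0, 2.0, 3.0, 4.0, 5.0], m=20))  # -> [6, 5, 4, 3, 2]
```

As the example shows, items with larger prior shape $\alpha_i$ receive fewer games, consistent with the next slide.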

138 The Gamma Distribution with Fixed β = 1
Larger $\alpha_i$ $\Longrightarrow$ greater belief that $w_i$ is large $\Longrightarrow$ fewer games $m_i$ that $i$ plays with the others.
[Figure: Gamma densities $\mathrm{Gam}(w; \alpha, 1)$ for $\alpha = 1, 2, 3, 4, 5$.]

141 Application to Comparison Graphs: Restricted to Trees
Corollary (Optimal Tree Graphs).
1 Best: minimizes the (lower bound on the) Bayes risk.
2 Worst: maximizes the (lower bound on the) Bayes risk.
[The specific best and worst tree structures were shown pictorially on the slide.]

143 Application to Comparison Graphs: Restricted to Trees
Proof for the star: maximizing the lower bound on the Bayes risk is equivalent to
$$\min_{m:\, \sum_i m_i = m}\; g(m) := \frac{1}{2} \log\Big(\alpha + \frac{m}{2}\Big) + \frac{1}{2} \sum_{i \neq 1} \log(\alpha + m_i),$$
where $m = \{m_i\}_{i \in [n]}$ and $i = 1$ is the central node.
Shift part of the weight of an edge $m_{1j} > 0$, for $j \neq 1$, to create a new edge with weight $m_{ji}$ such that $i \neq 1$. One can show that
$$\frac{\partial g(m_1, \ldots, m_n)}{\partial m_i} > 0,$$
implying that $f$ is increased by the new configuration.

144 Effect of Tree Graph Structure on IT Bound
[Figure: IT lower bounds on the MSE vs. sample size for the fully connected graph, the star graph, the chain graph and a random tree, for $n = 100$, $\alpha = 5$, $\beta = \alpha n - 1$.]

145 Effect of Tree Graph Structure on BCRB
[Figure: BCRB vs. sample size for the fully connected graph, the star graph, the chain graph and a random tree, for $n = 100$, $\alpha = 5$, $\beta = \alpha n - 1$.]

150 Final Remarks
We also derived lower bounds for the home-field advantage scenario:
$$P_{ij} = \begin{cases} Q_{ij} := \dfrac{\theta \lambda_i}{\theta \lambda_i + \lambda_j}, & \text{if $i$ is home}, \\[2mm] \bar{Q}_{ij} := \dfrac{\lambda_i}{\lambda_i + \theta \lambda_j}, & \text{if $j$ is home}, \end{cases}$$
where $\theta \in \mathbb{R}_{++}$ models the strength of the advantage ($\theta > 1$).
Future work:
Matching information-theoretic upper bounds.
Other questions related to comparing graph structures, e.g., "Does the fully-connected graph outperform a simple cycle?"
M. Alsan, R. Prasad and VYFT, Lower Bounds on the Bayes Risk of the Bayesian BTL Model with Applications to Comparison Graphs, IEEE J. on Sel. Topics of Sig. Proc., Oct 2018


More information

Supplementary Material for: Spectral Unsupervised Parsing with Additive Tree Metrics

Supplementary Material for: Spectral Unsupervised Parsing with Additive Tree Metrics Supplementary Material for: Spectral Unsupervised Parsing with Additive Tree Metrics Ankur P. Parikh School of Computer Science Carnegie Mellon University apparikh@cs.cmu.edu Shay B. Cohen School of Informatics

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Matrix Data: Clustering: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu October 19, 2014 Methods to Learn Matrix Data Set Data Sequence Data Time Series Graph & Network

More information

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make

More information

Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization

Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization Jialin Dong ShanghaiTech University 1 Outline Introduction FourVignettes: System Model and Problem Formulation Problem Analysis

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

PMR Learning as Inference

PMR Learning as Inference Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning

More information

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS RETRIEVAL MODELS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Boolean model Vector space model Probabilistic

More information

Approximate Inference in Practice Microsoft s Xbox TrueSkill TM

Approximate Inference in Practice Microsoft s Xbox TrueSkill TM 1 Approximate Inference in Practice Microsoft s Xbox TrueSkill TM Daniel Hernández-Lobato Universidad Autónoma de Madrid 2014 2 Outline 1 Introduction 2 The Probabilistic Model 3 Approximate Inference

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Fall 2013 Full points may be obtained for correct answers to eight questions. Each numbered question (which may have several parts) is worth

More information

Directed Probabilistic Graphical Models CMSC 678 UMBC

Directed Probabilistic Graphical Models CMSC 678 UMBC Directed Probabilistic Graphical Models CMSC 678 UMBC Announcement 1: Assignment 3 Due Wednesday April 11 th, 11:59 AM Any questions? Announcement 2: Progress Report on Project Due Monday April 16 th,

More information

8.1 Concentration inequality for Gaussian random matrix (cont d)

8.1 Concentration inequality for Gaussian random matrix (cont d) MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration

More information

Econometrics I, Estimation

Econometrics I, Estimation Econometrics I, Estimation Department of Economics Stanford University September, 2008 Part I Parameter, Estimator, Estimate A parametric is a feature of the population. An estimator is a function of the

More information

Generalization theory

Generalization theory Generalization theory Daniel Hsu Columbia TRIPODS Bootcamp 1 Motivation 2 Support vector machines X = R d, Y = { 1, +1}. Return solution ŵ R d to following optimization problem: λ min w R d 2 w 2 2 + 1

More information

Bregman Divergences for Data Mining Meta-Algorithms

Bregman Divergences for Data Mining Meta-Algorithms p.1/?? Bregman Divergences for Data Mining Meta-Algorithms Joydeep Ghosh University of Texas at Austin ghosh@ece.utexas.edu Reflects joint work with Arindam Banerjee, Srujana Merugu, Inderjit Dhillon,

More information

Variational inference

Variational inference Simon Leglaive Télécom ParisTech, CNRS LTCI, Université Paris Saclay November 18, 2016, Télécom ParisTech, Paris, France. Outline Introduction Probabilistic model Problem Log-likelihood decomposition EM

More information

Machine Learning CSE546 Sham Kakade University of Washington. Oct 4, What about continuous variables?

Machine Learning CSE546 Sham Kakade University of Washington. Oct 4, What about continuous variables? Linear Regression Machine Learning CSE546 Sham Kakade University of Washington Oct 4, 2016 1 What about continuous variables? Billionaire says: If I am measuring a continuous variable, what can you do

More information

Online Forest Density Estimation

Online Forest Density Estimation Online Forest Density Estimation Frédéric Koriche CRIL - CNRS UMR 8188, Univ. Artois koriche@cril.fr UAI 16 1 Outline 1 Probabilistic Graphical Models 2 Online Density Estimation 3 Online Forest Density

More information

CS Lecture 18. Topic Models and LDA

CS Lecture 18. Topic Models and LDA CS 6347 Lecture 18 Topic Models and LDA (some slides by David Blei) Generative vs. Discriminative Models Recall that, in Bayesian networks, there could be many different, but equivalent models of the same

More information

Theory of Stochastic Processes 8. Markov chain Monte Carlo

Theory of Stochastic Processes 8. Markov chain Monte Carlo Theory of Stochastic Processes 8. Markov chain Monte Carlo Tomonari Sei sei@mist.i.u-tokyo.ac.jp Department of Mathematical Informatics, University of Tokyo June 8, 2017 http://www.stat.t.u-tokyo.ac.jp/~sei/lec.html

More information

Learning Mixtures of Random Utility Models

Learning Mixtures of Random Utility Models Learning Mixtures of Random Utility Models Zhibing Zhao and Tristan Villamil and Lirong Xia Rensselaer Polytechnic Institute 110 8th Street, Troy, NY, USA {zhaoz6, villat2}@rpi.edu, xial@cs.rpi.edu Abstract

More information

MODULE -4 BAYEIAN LEARNING

MODULE -4 BAYEIAN LEARNING MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities

More information

Feedback Capacity of a Class of Symmetric Finite-State Markov Channels

Feedback Capacity of a Class of Symmetric Finite-State Markov Channels Feedback Capacity of a Class of Symmetric Finite-State Markov Channels Nevroz Şen, Fady Alajaji and Serdar Yüksel Department of Mathematics and Statistics Queen s University Kingston, ON K7L 3N6, Canada

More information

Strong Converse Theorems for Classes of Multimessage Multicast Networks: A Rényi Divergence Approach

Strong Converse Theorems for Classes of Multimessage Multicast Networks: A Rényi Divergence Approach Strong Converse Theorems for Classes of Multimessage Multicast Networks: A Rényi Divergence Approach Silas Fong (Joint work with Vincent Tan) Department of Electrical & Computer Engineering National University

More information

Chapter 16. Structured Probabilistic Models for Deep Learning

Chapter 16. Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe

More information

The PAC Learning Framework -II

The PAC Learning Framework -II The PAC Learning Framework -II Prof. Dan A. Simovici UMB 1 / 1 Outline 1 Finite Hypothesis Space - The Inconsistent Case 2 Deterministic versus stochastic scenario 3 Bayes Error and Noise 2 / 1 Outline

More information

Lecture 15. Probabilistic Models on Graph

Lecture 15. Probabilistic Models on Graph Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how

More information

Graphical Models and Kernel Methods

Graphical Models and Kernel Methods Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.

More information

Graphical Models for Collaborative Filtering

Graphical Models for Collaborative Filtering Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,

More information

Information Complexity and Applications. Mark Braverman Princeton University and IAS FoCM 17 July 17, 2017

Information Complexity and Applications. Mark Braverman Princeton University and IAS FoCM 17 July 17, 2017 Information Complexity and Applications Mark Braverman Princeton University and IAS FoCM 17 July 17, 2017 Coding vs complexity: a tale of two theories Coding Goal: data transmission Different channels

More information

Bayesian Contextual Multi-armed Bandits

Bayesian Contextual Multi-armed Bandits Bayesian Contextual Multi-armed Bandits Xiaoting Zhao Joint Work with Peter I. Frazier School of Operations Research and Information Engineering Cornell University October 22, 2012 1 / 33 Outline 1 Motivating

More information

The Method of Types and Its Application to Information Hiding

The Method of Types and Its Application to Information Hiding The Method of Types and Its Application to Information Hiding Pierre Moulin University of Illinois at Urbana-Champaign www.ifp.uiuc.edu/ moulin/talks/eusipco05-slides.pdf EUSIPCO Antalya, September 7,

More information

Bayesian Machine Learning - Lecture 7

Bayesian Machine Learning - Lecture 7 Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1

More information

arxiv: v1 [stat.ml] 1 Jan 2019

arxiv: v1 [stat.ml] 1 Jan 2019 CONVERGENCE RATES OF GRADIENT DESCENT AND MM ALGORITHMS FOR GENERALIZED BRADLEY-TERRY MODELS arxiv:1901.00150v1 [stat.ml] 1 Jan 2019 By Milan Vojnovic, Seyoung Yun and Kaifang Zhou London School of Economics

More information

Volodymyr Kuleshov ú Arun Tejasvi Chaganty ú Percy Liang. May 11, 2015

Volodymyr Kuleshov ú Arun Tejasvi Chaganty ú Percy Liang. May 11, 2015 Tensor Factorization via Matrix Factorization Volodymyr Kuleshov ú Arun Tejasvi Chaganty ú Percy Liang Stanford University May 11, 2015 Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization

More information

Matrix estimation by Universal Singular Value Thresholding

Matrix estimation by Universal Singular Value Thresholding Matrix estimation by Universal Singular Value Thresholding Courant Institute, NYU Let us begin with an example: Suppose that we have an undirected random graph G on n vertices. Model: There is a real symmetric

More information

1-bit Matrix Completion. PAC-Bayes and Variational Approximation

1-bit Matrix Completion. PAC-Bayes and Variational Approximation : PAC-Bayes and Variational Approximation (with P. Alquier) PhD Supervisor: N. Chopin Bayes In Paris, 5 January 2017 (Happy New Year!) Various Topics covered Matrix Completion PAC-Bayesian Estimation Variational

More information

Stratégies bayésiennes et fréquentistes dans un modèle de bandit

Stratégies bayésiennes et fréquentistes dans un modèle de bandit Stratégies bayésiennes et fréquentistes dans un modèle de bandit thèse effectuée à Telecom ParisTech, co-dirigée par Olivier Cappé, Aurélien Garivier et Rémi Munos Journées MAS, Grenoble, 30 août 2016

More information

A physical model for efficient rankings in networks

A physical model for efficient rankings in networks A physical model for efficient rankings in networks Daniel Larremore Assistant Professor Dept. of Computer Science & BioFrontiers Institute March 5, 2018 CompleNet danlarremore.com @danlarremore The idea

More information

Permuation Models meet Dawid-Skene: A Generalised Model for Crowdsourcing

Permuation Models meet Dawid-Skene: A Generalised Model for Crowdsourcing Permuation Models meet Dawid-Skene: A Generalised Model for Crowdsourcing Ankur Mallick Electrical and Computer Engineering Carnegie Mellon University amallic@andrew.cmu.edu Abstract The advent of machine

More information

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that

More information

Econ 2148, spring 2019 Statistical decision theory

Econ 2148, spring 2019 Statistical decision theory Econ 2148, spring 2019 Statistical decision theory Maximilian Kasy Department of Economics, Harvard University 1 / 53 Takeaways for this part of class 1. A general framework to think about what makes a

More information

High-dimensional graphical model selection: Practical and information-theoretic limits

High-dimensional graphical model selection: Practical and information-theoretic limits 1 High-dimensional graphical model selection: Practical and information-theoretic limits Martin Wainwright Departments of Statistics, and EECS UC Berkeley, California, USA Based on joint work with: John

More information

Introduction to graphical models: Lecture III

Introduction to graphical models: Lecture III Introduction to graphical models: Lecture III Martin Wainwright UC Berkeley Departments of Statistics, and EECS Martin Wainwright (UC Berkeley) Some introductory lectures January 2013 1 / 25 Introduction

More information

Models of collective inference

Models of collective inference Models of collective inference Laurent Massoulié (Microsoft Research-Inria Joint Centre) Mesrob I. Ohannessian (University of California, San Diego) Alexandre Proutière (KTH Royal Institute of Technology)

More information

1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016

1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016 AM 1: Advanced Optimization Spring 016 Prof. Yaron Singer Lecture 11 March 3rd 1 Overview In this lecture we will introduce the notion of online convex optimization. This is an extremely useful framework

More information

Introduction to Bayesian Learning

Introduction to Bayesian Learning Course Information Introduction Introduction to Bayesian Learning Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Apprendimento Automatico: Fondamenti - A.A. 2016/2017 Outline

More information

Multi-armed bandit models: a tutorial

Multi-armed bandit models: a tutorial Multi-armed bandit models: a tutorial CERMICS seminar, March 30th, 2016 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions)

More information

Approximate Inference

Approximate Inference Approximate Inference Simulation has a name: sampling Sampling is a hot topic in machine learning, and it s really simple Basic idea: Draw N samples from a sampling distribution S Compute an approximate

More information

Bayes methods for categorical data. April 25, 2017

Bayes methods for categorical data. April 25, 2017 Bayes methods for categorical data April 25, 2017 Motivation for joint probability models Increasing interest in high-dimensional data in broad applications Focus may be on prediction, variable selection,

More information

Chapter 2. Binary and M-ary Hypothesis Testing 2.1 Introduction (Levy 2.1)

Chapter 2. Binary and M-ary Hypothesis Testing 2.1 Introduction (Levy 2.1) Chapter 2. Binary and M-ary Hypothesis Testing 2.1 Introduction (Levy 2.1) Detection problems can usually be casted as binary or M-ary hypothesis testing problems. Applications: This chapter: Simple hypothesis

More information