Recent Advances in Ranking: Adversarial Respondents and Lower Bounds on the Bayes Risk
Vincent Y. F. Tan
Department of Electrical and Computer Engineering and Department of Mathematics, National University of Singapore
Talk given at McMaster University, June 15, 2018
Outline
1 Introduction to Statistical Models for Ranking
2 Fundamental Limits of Top-K Ranking with Adversaries
3 Lower Bounds on the Bayes Risk of a Bayesian BTL Model
Ranking
A fundamental problem in a wide range of contexts
Applications: web search, recommendation systems, social choice, sports competitions, voting, etc.
Efforts in developing various ranking algorithms
A variety of statistical models have been introduced for evaluating ranking algorithms
Ranking: An Example and Difficulties
Example: Web search
$n = 10^9$ websites $\Rightarrow \binom{n}{2} \approx n^2/2$ comparisons
Do we really need $\Theta(n^2)$ comparisons?
Large-Scale Ranking
Suppose that we want a total ordering and that pairwise comparisons are given randomly (probabilistically). This indeed requires $\Theta(n^2)$ comparisons:
There is no way to identify the ordering between items 1 and 2 without a direct comparison, i.e., that comparison must be made w.p. 1
Worse with noisy data
We adopt a Shannon-theoretic approach in our analyses
Top-K Ranking Usually Suffices
Statistical Model for Top-K Ranking: Part I
Adopt the Bradley-Terry-Luce (BTL) model, in which there is an underlying unknown score vector $w = (w_1, \ldots, w_n) \in \mathbb{R}^n_{++}$, where $w_i$ is the likeability of movie $i$.
Decide which items to compare via a comparison graph
Statistical Model for Top-K Ranking: Part II
The outcome of the comparison between items 1 and 2 is
$$Y_{12} = \mathbb{I}\{\text{item } 1 \succ \text{item } 2\} \sim \mathrm{Bern}\Big(\frac{w_1}{w_1 + w_2}\Big)$$
E.g., if $w_1 = 0.9$ and $w_2 = 0.1$, then item 1 beats item 2 w.p. 90%.
We have $L$ independent copies $Y_{ij}^{(1)}, \ldots, Y_{ij}^{(L)}$ for each observed edge $\{i, j\} \in \mathcal{E}$ of the observation graph.
Determine fundamental limits on $L$ (as a function of $n$ and other parameters) so that recovery of the top-$K$ set is successful.
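To make the observation model concrete, here is a minimal simulation sketch (the function and parameter names are mine, not from the talk): comparisons are drawn on an Erdős–Rényi graph, with $L$ Bernoulli outcomes per observed edge.

```python
# Sketch: simulating the BTL observation model on an Erdos-Renyi graph.
import numpy as np

rng = np.random.default_rng(0)

def simulate_btl(w, p_edge, L):
    """Observe L i.i.d. comparisons Y_ij ~ Bern(w_i/(w_i+w_j)) on each
    edge of an Erdos-Renyi comparison graph G(n, p_edge)."""
    n = len(w)
    wins = {}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p_edge:                 # edge {i,j} is observed
                p_ij = w[i] / (w[i] + w[j])           # Pr[item i beats item j]
                wins[(i, j)] = rng.binomial(L, p_ij)  # number of wins of i over j
    return wins

w = np.sort(rng.gamma(2.0, size=20))[::-1]            # ground-truth scores
obs = simulate_btl(w, p_edge=0.5, L=30)
```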
Top-K Ranking with Adversaries
Joint work with Changho Suh (KAIST) and Renbo Zhao (NUS)
C. Suh, VYFT and R. Zhao, "Adversarial Top-K Ranking," IEEE Trans. on Inf. Theory, Apr 2017
Ranking with Adversaries: Crowdsourced Setting
Faithful respondents answer according to the BTL model:
$$Y_{ij} \sim \mathrm{Bern}\Big(\frac{w_i}{w_i + w_j}\Big)$$
Adversarial respondents answer according to the reversed model:
$$Y_{ij} \sim \mathrm{Bern}\Big(\frac{1/w_i}{1/w_i + 1/w_j}\Big) = \mathrm{Bern}\Big(\frac{w_j}{w_i + w_j}\Big)$$
Spammers provide answers in an adversarial manner, so overall
$$Y_{ij} \sim \mathrm{Bern}\Big(\eta\,\frac{w_i}{w_i + w_j} + (1 - \eta)\,\frac{w_j}{w_i + w_j}\Big)$$
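A one-line consequence of the mixture, sketched below: since the adversarial answer is $\mathrm{Bern}(w_j/(w_i+w_j)) = \mathrm{Bern}(1 - p_{ij})$, each comparison is marginally a Bernoulli with success probability $\eta\, p_{ij} + (1-\eta)(1-p_{ij})$. A hypothetical helper:

```python
# Sketch: one adversarial comparison. With prob. eta the respondent is
# faithful, otherwise it answers from the reversed model; marginally
# Y_ij ~ Bern(eta * p_ij + (1 - eta) * (1 - p_ij)).
import numpy as np

rng = np.random.default_rng(1)

def adversarial_comparison(w_i, w_j, eta):
    p_ij = w_i / (w_i + w_j)
    q_ij = eta * p_ij + (1.0 - eta) * (1.0 - p_ij)  # mixture success prob.
    return rng.binomial(1, q_ij)
```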
Related Work: Crowdsourced BTL [Chen et al. '13]¹
Given an observed pair, each sample has a different distribution
$$Y_{ij}^{(l)} \sim \mathrm{Bern}\Big(\eta_l\,\frac{w_i}{w_i + w_j} + (1 - \eta_l)\,\frac{1/w_i}{1/w_i + 1/w_j}\Big)$$
where $\eta_l$ is a quality parameter of measurement $l$
Subsumes as a special case our adversarial BTL model when all quality parameters are the same
The authors developed a ranking algorithm, but without theoretical guarantees
More difficult to analyze, as there are many more parameters
¹ X. Chen, P. N. Bennett, K. Collins-Thompson, and E. Horvitz, "Pairwise ranking aggregation in a crowdsourced setting," in WSDM, 2013
Goal of Adversarial Top-K Ranking
Erdős–Rényi comparison graph $\mathcal{G}(n, p)$: each pair of items is compared with probability $p$
Contribution #1: $1/2 < \eta < 1$ known
$\eta$ = fraction of non-adversaries; $\Delta_K = (w_K - w_{K+1})/w_{\max}$ = score separation
Characterized the sample complexity as a function of $\eta$ and $\Delta_K$:
minimax optimal
nearly-linear time algorithm
The case $\eta = 1$ was studied by Chen and Suh (2015)²
² Y. Chen and C. Suh, "Spectral MLE: Top-K rank aggregation from pairwise comparisons," in ICML 2015
Contribution #1: $1/2 < \eta < 1$ known — Experimental Results for $n = 1000$ and $K = 10$
Sample size $\hat{S} = \binom{n}{2} p\, \hat{L}_{\mathrm{ave}}$; normalized sample size $\hat{S}_{\mathrm{norm}} = \hat{S} \big/ \big((n \log n)/\Delta_K^2\big)$
Contribution #2: $1/2 < \eta < 1$ unknown
$\eta$ = fraction of non-adversaries; $\Delta_K = (w_K - w_{K+1})/w_{\max}$ = score separation
Established a feasible region (polynomial-time algorithm), an infeasible region, and a region (marked ??) where feasibility remains open
Optimality
Minimax optimality: construct worst-case score vectors
Translation to M-ary hypothesis testing: construction of multiple hypotheses
Information-theoretic ideas applied to statistical learning
Optimality: Tools
Construction of $M := \min\{K, n - K\} + 1$ hypotheses, uniformly distributed:
$$\Pr\big(\sigma([K]) = S\big) = \frac{1}{M}, \quad \text{for } S = \{2, \ldots, K\} \cup \{i\}, \; i = 1, K+1, \ldots, n$$
Bound the mutual information between the permutation $\sigma$ and the erased version $Z$ of the $Y_{ij}^{(l)}$:
$$I(\sigma; Z) \le \frac{p}{M^2} \sum_{\sigma_1, \sigma_2 \le M} \sum_{l=1}^{L} \sum_{i < j} D\Big(P_{Y_{ij}^{(l)} \mid \sigma_1} \,\Big\|\, P_{Y_{ij}^{(l)} \mid \sigma_2}\Big)$$
Bound the divergence using the reverse Pinsker inequality; this is where $K$ comes in:
$$\sum_{i < j} D\Big(P_{Y_{ij}^{(l)} \mid \sigma_1} \,\Big\|\, P_{Y_{ij}^{(l)} \mid \sigma_2}\Big) \lesssim \frac{n\, (2\eta - 1)^2\, \Delta_K^2}{K}$$
Fano's inequality
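As a hedged numeric illustration (toy numbers, not the paper's bound) of why the factor $(2\eta-1)^2$ appears: the KL divergence between the observed Bernoulli distributions under two hypotheses shrinks quadratically in $2\eta-1$, so roughly $(2\eta-1)^{-2}$ more samples are needed before Fano's inequality permits recovery.

```python
# Sketch: KL divergence between the mixture distributions under two
# hypotheses shrinks like (2*eta - 1)^2.
import numpy as np

def kl_bern(p, q):
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

p1, p2 = 0.6, 0.4            # BTL win probs. under hypotheses sigma_1, sigma_2
for eta in [1.0, 0.9, 0.75, 0.6]:
    q1 = eta * p1 + (1 - eta) * (1 - p1)   # observed prob. under sigma_1
    q2 = eta * p2 + (1 - eta) * (1 - p2)   # observed prob. under sigma_2
    print(eta, kl_bern(q1, q2) / (2 * eta - 1) ** 2)  # ratio roughly constant
```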
Ranking Algorithm for η Known: Part I
Scores determine the ranking
Adopt a two-step approach
Ranking Algorithm for η Known: Part II
Key Message: Small MSE ⟹ Small $\ell_\infty$ error of $\hat{w}$ ⟹ High ranking accuracy
How to ensure small MSE for η = 1?
Recall η = 1 (no adversaries)
$L$ independent copies $Y_{ij}^{(1)}, Y_{ij}^{(2)}, \ldots, Y_{ij}^{(L)}$
Convergence of the empirical means:
$$\frac{1}{L} \sum_{l=1}^{L} Y_{ij}^{(l)} \to \frac{w_i}{w_i + w_j}$$
Use these as transition probabilities of a Markov chain. Detailed balance equation:
$$\pi_i\, \frac{w_j}{w_i + w_j} = \pi_j\, \frac{w_i}{w_i + w_j}$$
where $\pi := [\pi_1, \pi_2, \ldots, \pi_n]$ is the stationary distribution of the chain.
The stationary distribution converges to $w$ (up to constant scaling), i.e., $\lim_{L \to \infty} \pi(L) = \alpha w$.
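A minimal sketch of this spectral step (in the spirit of Rank Centrality; the names and the normalization constant are my choices): build a row-stochastic matrix whose $i \to j$ entry is proportional to the empirical fraction of times $j$ beat $i$, then power-iterate to its stationary distribution.

```python
# Sketch: spectral scores from empirical win fractions on a complete graph.
import numpy as np

rng = np.random.default_rng(2)

def spectral_scores(Ybar, iters=1000):
    """Ybar[i, j] = empirical fraction of comparisons in which i beat j."""
    n = Ybar.shape[0]
    P = Ybar.T / n                                # move i -> j w.p. (frac. j beat i)/n
    np.fill_diagonal(P, 0.0)
    P[np.diag_indices(n)] = 1.0 - P.sum(axis=1)   # self-loops keep rows stochastic
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):                        # power iteration
        pi = pi @ P
    return pi / pi.sum()

n, L = 10, 2000
w = rng.gamma(2.0, size=n)
Pwin = w[:, None] / (w[:, None] + w[None, :])     # true BTL win probabilities
Ybar = rng.binomial(L, Pwin) / L
iu = np.triu_indices(n, 1)
Ybar[(iu[1], iu[0])] = 1.0 - Ybar[iu]             # enforce Ybar[j,i] = 1 - Ybar[i,j]
print(np.argsort(-spectral_scores(Ybar))[:3], np.argsort(-w)[:3])  # top-3 estimate vs truth
```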
How to ensure small MSE for η ∈ (1/2, 1]?
Arbitrary η ∈ (1/2, 1] (adversaries)
$L$ independent copies $Y_{ij}^{(1)}, Y_{ij}^{(2)}, \ldots, Y_{ij}^{(L)}$
Redefine the Markov chain. We instead have the following convergence:
$$\frac{1}{L} \sum_{l=1}^{L} Y_{ij}^{(l)} \to \eta\,\frac{w_i}{w_i + w_j} + (1 - \eta)\,\frac{w_j}{w_i + w_j} = (2\eta - 1)\,\frac{w_i}{w_i + w_j} + (1 - \eta)$$
Redefine shifted samples with range scaled by $2\eta - 1$:
$$\tilde{Y}_{ij} = \frac{1}{2\eta - 1} \Big[ \frac{1}{L} \sum_{l=1}^{L} Y_{ij}^{(l)} - (1 - \eta) \Big] \to \frac{w_i}{w_i + w_j}$$
Construct the Markov chain with transition probabilities $\{\tilde{Y}_{ij}\}$.
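The debiasing step is a two-liner; a sketch (the clipping is my addition, to keep sampling noise inside [0, 1]) whose output can feed the `spectral_scores` sketch above:

```python
# Sketch: with eta known, invert the mixture before the spectral step.
import numpy as np

def debias(Ybar_adv, eta):
    """Recover p_ij from empirical means converging to
    (2*eta-1)*p_ij + (1-eta)."""
    Ytil = (Ybar_adv - (1.0 - eta)) / (2.0 * eta - 1.0)
    return np.clip(Ytil, 0.0, 1.0)   # clip sampling noise back into [0, 1]
```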
Ranking Algorithm for η Known: Summary
Using several concentration inequalities (Hoeffding, Bernstein, Tropp, etc.), we can show that
$$\text{sample size} = L\, p \binom{n}{2} \gtrsim \frac{n \log n}{(2\eta - 1)^2\, \Delta_K^2} \;\Longrightarrow\; \text{Feasible Top-}K\text{ Ranking}$$
What if η is unknown?
The adversarial BTL model is a mixture model
Obtaining global optimality guarantees for mixture model problems is difficult in general
Recent developments — tensor methods: Jain and Oh³ and Anandkumar et al.⁴
Key idea: exact 2nd and 3rd moments yield sufficient statistics
Our setting: we can obtain estimates of the 2nd and 3rd moments ⟹ we can estimate η
³ P. Jain and S. Oh, "Learning mixtures of discrete product distributions using spectral decompositions," in COLT, 2014
⁴ A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky, "Tensor decompositions for learning latent variable models," JMLR, 2014
Unknown η: High-Level Algorithm: Part I
1 Turn weights into distribution vectors
$$\pi_0 = \Big[\frac{w_i}{w_i + w_j}, \; \frac{w_j}{w_i + w_j}\Big]^T, \qquad \pi_1 = \Big[\frac{w_j}{w_i + w_j}, \; \frac{w_i}{w_i + w_j}\Big]^T$$
2 Estimate moments. The ground-truth moment matrix and tensor are:
$$M_2 := \eta\, \pi_0 \otimes \pi_0 + (1 - \eta)\, \pi_1 \otimes \pi_1, \qquad M_3 := \eta\, \pi_0 \otimes \pi_0 \otimes \pi_0 + (1 - \eta)\, \pi_1 \otimes \pi_1 \otimes \pi_1.$$
3 Solve a least-squares problem fitting the empirical moments $\hat{M}_2$ and $\hat{M}_3$ to obtain a matrix $\hat{G}$
4 Find the leading eigenvalue $\lambda_1(\hat{G})$ of $\hat{G}$, which is related to η via $\hat{\eta} = \lambda_1(\hat{G})^{-2}$
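To see why low-order moments reveal η, here is a deliberately simplified toy (emphatically not the paper's tensor algorithm): if each respondent carries a latent faithful/adversarial label and answers two edges whose faithful probabilities $p_1, p_2$ are assumed known, the cross-moment $\mathbb{E}[Y_1 Y_2] = \eta\, p_1 p_2 + (1-\eta)(1-p_1)(1-p_2)$ identifies η.

```python
# Toy sketch of the key idea only: answers from one respondent on two
# edges are correlated through the latent faithful/adversarial label,
# so a second moment identifies eta.
import numpy as np

rng = np.random.default_rng(3)

eta, p1, p2, workers = 0.8, 0.7, 0.9, 200_000
faithful = rng.random(workers) < eta                 # latent label per respondent
y1 = rng.random(workers) < np.where(faithful, p1, 1 - p1)
y2 = rng.random(workers) < np.where(faithful, p2, 1 - p2)

m12 = np.mean(y1 & y2)                               # empirical cross-moment
eta_hat = (m12 - (1 - p1) * (1 - p2)) / (p1 * p2 - (1 - p1) * (1 - p2))
print(eta_hat)                                       # close to 0.8
```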
Unknown η: High-Level Algorithm: Part II
How does the quality of the estimation of η affect the overall sample complexity?
Tradeoff Between $|\hat{\eta} - \eta|$ and Sample Complexity
With very careful analysis, we can derive a meta-lemma:
$$|\hat{\eta} - \eta| \le \epsilon \implies \text{sample size} = L\, p \binom{n}{2} \gtrsim \frac{n \log^2 n}{\epsilon^2}$$
This implies that
$|\hat{\eta} - \eta|$ small ⟹ $\hat{w}$ close to $w$, but the required sample size grows
$|\hat{\eta} - \eta|$ large ⟹ smaller sample size, but $\hat{w}$ far from $w$
Find a sweet spot to show that
$$\text{sample size} \gtrsim \frac{n \log^2 n}{(2\eta - 1)^4\, \Delta_K^4} \implies \text{Feasible Top-}K\text{ ranking}$$
Conclusion for Adversarial Top-K Ranking
Explored a top-K ranking problem in an adversarial setting
Characterized the exact order-wise optimal sample complexity for the η-known case
Established an upper bound on the sample complexity for the η-unknown case
Developed computationally efficient algorithms for both cases (using state-of-the-art tensor methods for the η-unknown case)
C. Suh, VYFT and R. Zhao, "Adversarial Top-K Ranking," IEEE Trans. on Inf. Theory, Apr 2017
Lower Bounds on the Bayes Risk of a Bayesian BTL Model
Joint work with Mine Alsan (NUS) and Ranjitha Prasad (TCS Innovation Labs, Delhi)
M. Alsan, R. Prasad and VYFT, "Lower Bounds on the Bayes Risk of the Bayesian BTL Model with Applications to Comparison Graphs," IEEE J. on Sel. Topics of Sig. Proc., Oct 2018
Summary of Contributions
Study the fundamental performance limits of ranking algorithms in the Bradley-Terry-Luce model within a Bayesian framework:
1 Derive lower bounds on the Bayes risk of estimators:
- A family of information-theoretic lower bounds for norm-based distortion functions $\|\cdot\|_r^r$, for any $r \ge 1$.
- The Bayesian Cramér-Rao bound (BCRB) for the MSE, i.e., $r = 2$.
2 Explore optimal comparison graph structures to design experiments minimizing distortion.
Recall the BTL Model
BTL model: to each item $i \in [n]$, associate a skill parameter $w_i \in \mathbb{R}_{++}$ s.t.
$$P_{ij} := \Pr[\text{item } i \succ \text{item } j] = \frac{w_i}{w_i + w_j}.$$
Instead of top-K ranking, we now want to estimate the vector $w := (w_1, \ldots, w_n) \in \mathbb{R}^n_{++}$
Given $m = \sum_{(i,j): i \ne j} m_{ij} \in \mathbb{N}$ independent pairwise comparisons, we count:
1 $m_{ij}$: number of pairwise comparisons between items $i$ and $j$,
2 $b_{ij}$: number of comparisons in which $i$ is preferred over $j$.
$M := \{m_{ij}\} \in \mathbb{N}^{n \times n}$ and $B := \{b_{ij}\} \in \mathbb{N}^{n \times n}$.
Induced Probabilities by the BTL Model
We assume that the matrix $M = \{m_{ij}\}$ is fixed a priori.
The BTL model induces the following distributions:
1 For fixed $m_{ij}$: $p(b_{ij} \mid w_i, w_j) = \mathrm{Bin}(b_{ij};\, m_{ij},\, P_{ij})$.
2 For fixed $M$: $p(B \mid w) = \prod_{(i,j):\, i < j} \mathrm{Bin}(b_{ij};\, m_{ij},\, P_{ij})$.
Bayesian BTL Model
Adopt the Bayesian BTL framework of Caron & Doucet⁵:
1 Prior distribution: assign $p(w_i) = \mathrm{Gam}(w_i;\, \alpha_i, \beta_i)$ to each item $i \in [n]$, where $\alpha := \{\alpha_i\}_{i=1}^n,\ \beta := \{\beta_i\}_{i=1}^n \in \mathbb{R}^n_{++}$.
2 Latent random variables: introduce $Z := \{Z_{ij}\} \in \mathbb{R}^{n \times n}$ with
$$Z_{ij} = Z_{ji} := \sum_{s=1}^{m_{ij}} \min\{Y_{si}, Y_{sj}\}, \quad \text{for } i, j \in [n] \text{ such that } i < j,$$
where $Y_{si} \sim \mathrm{Exp}(w_i)$ and $Y_{sj} \sim \mathrm{Exp}(w_j)$, so that $P_{ij} = \Pr[Y_{si} < Y_{sj}]$.
Known as the Thurstonian interpretation of the BTL model.
⁵ F. Caron and A. Doucet, "Efficient Bayesian Inference for Generalized Bradley-Terry Models," in JCGS, 2012
Induced Probabilities by the Bayesian BTL Model
1 Prior:
$$p(w) = \prod_{i=1}^n p(w_i) = \prod_{i=1}^n \mathrm{Gam}(w_i;\, \alpha_i, \beta_i),$$
2 Prior × Likelihood:
$$p(w, B) = p(w)\, p(B \mid w) = \prod_{i=1}^n \mathrm{Gam}(w_i;\, \alpha_i, \beta_i) \prod_{i < j} \mathrm{Bin}\Big(b_{ij};\, m_{ij},\, \frac{w_i}{w_i + w_j}\Big).$$
3 Latent variable:
$$p(Z_{ij} \mid w_i, w_j) = \mathrm{Gam}(Z_{ij};\, m_{ij},\, w_i + w_j).$$
4 Posterior:
$$p(w \mid B, Z) = \prod_{i=1}^n \mathrm{Gam}(w_i;\, \alpha_i + b_i,\, \beta_i + Z_i),$$
where $b_i := \sum_{j \ne i} b_{ij}$ and $Z_i := \sum_{j \ne i} Z_{ij}$.
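The conditionals above suggest the Gibbs sampler of Caron & Doucet; a minimal sketch follows, with the caveat that numpy parameterizes the Gamma by shape and scale, so a rate λ enters as scale 1/λ:

```python
# Sketch: Gibbs sampler alternating Z_ij | w ~ Gam(m_ij, w_i + w_j) and
# w_i | B, Z ~ Gam(alpha_i + b_i, beta_i + Z_i).
import numpy as np

rng = np.random.default_rng(4)

def gibbs_btl(M, B, alpha, beta, iters=2000):
    """One posterior draw. M[i,j] = m_ij (assumed > 0 off-diagonal, since
    numpy's Gamma sampler needs a positive shape); B[i,j] = b_ij;
    alpha, beta are length-n arrays."""
    n = M.shape[0]
    iu, ju = np.triu_indices(n, 1)
    b = B.sum(axis=1)                                  # b_i = sum_{j != i} b_ij
    w = rng.gamma(alpha, 1.0 / beta)                   # initialize from the prior
    Z = np.zeros((n, n))
    for _ in range(iters):
        z = rng.gamma(M[iu, ju], 1.0 / (w[iu] + w[ju]))         # Z_ij | w
        Z[iu, ju] = z
        Z[ju, iu] = z                                           # Z_ij = Z_ji
        w = rng.gamma(alpha + b, 1.0 / (beta + Z.sum(axis=1)))  # w | B, Z
    return w
```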
Bayes Risk
For any $r \ge 1$, we define the family of Bayes risks for estimating $w$:
1 From only $B$:
$$R_B := \inf_{\varphi} \mathbb{E}\big[\|w - \varphi(B)\|_r^r\big],$$
where $\varphi(B)$ is an estimator of $w$.
2 From $B$ and the latent variable $Z$:
$$R_B^* := \inf_{\varphi^*} \mathbb{E}\big[\|w - \varphi^*(B, Z)\|_r^r\big],$$
where $\varphi^*(B, Z)$ is an estimator of $w$.
Since $Z$ only adds information, $R_B^* \le R_B$.
Bayesian Network of All Variables
$w_i \sim \mathrm{Gam}(w_i;\, \alpha_i, \beta_i)$: prior on $w_i$
$P_{ij} = \frac{w_i}{w_i + w_j}$: BTL model
$Y_{si} \sim \mathrm{Exp}(w_i)$: latent arrival times
$b_{ij} \sim \mathrm{Bin}(b_{ij};\, m_{ij},\, P_{ij})$: number of times $i$ beats $j$ out of $m_{ij}$ games
$Z_{ij} = \sum_{s=1}^{m_{ij}} \min\{Y_{si}, Y_{sj}\}$: latent variables
$\varphi(B)$ and $\varphi^*(B, Z)$: functions to estimate $w$
General Lower Bounds on the Bayes Risk
For $r = 2$, we can compute the Bayesian Cramér-Rao bound on $R_B^*$.
We also compute a family of information-theoretic lower bounds:
1 Theorem 3 of Xu and Raginsky⁶ reads: for any $r \ge 1$,
$$R_B^* \ge \frac{n}{re} \Big( V_n\, \Gamma\Big(1 + \frac{n}{r}\Big) \Big)^{-r/n} \exp\Big( -\frac{r}{n} \big( I(w; B, Z) - h(w) \big) \Big),$$
where $V_n$ is the volume of the unit ball in $(\mathbb{R}^n, \|\cdot\|_r)$.
2 Using Stirling's approximation, we upper bound
$$I(w; B, Z) - h(w) = \mathbb{E}\big[\log p(w \mid B, Z)\big].$$
⁶ A. Xu and M. Raginsky, "Information-Theoretic Lower Bounds on Bayes Risk in Decentralized Estimation," IEEE Trans. on Inf. Theory, 2017
Family of Information-Theoretic Lower Bounds
Theorem. For all $i \in [n]$, let
$$m_i := \frac{1}{2} \sum_{j \ne i} m_{ij} \qquad \text{(half the total number of games $i$ plays)}.$$
Then the Bayes risk is asymptotically lower bounded by
$$R_B^* \ge \frac{n}{re} \Big( V_n\, \Gamma\Big(1 + \frac{n}{r}\Big) \Big)^{-r/n} \exp\big( -r\, E(B, \alpha, \beta) \big),$$
where
$$E(B, \alpha, \beta) := \frac{1}{n} \sum_{i=1}^n \Big( -\frac{1}{2}\log(2\pi) + \log \beta_i - \psi(\alpha_i) + \frac{1}{2}\log(\alpha_i + m_i) \Big).$$
Information-Theoretic Lower Bounds
Take $\alpha_i = \alpha$ and $\beta_i = \beta$ for each $i \in [n]$.
For the $L_1$ norm ($r = 1$):
$$R_B \ge \sqrt{\frac{\pi}{2}}\, \exp\big[ -(\log \beta - \psi(\alpha) + 1) \big]\, \frac{n}{\sqrt{\alpha + m/n}},$$
For the squared $L_2$ norm ($r = 2$):
$$R_B \ge \exp\big[ -2(\log \beta - \psi(\alpha)) - 1 \big]\, \frac{n}{\alpha + m/n}.$$
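A sketch evaluating these closed forms numerically (treat the exact constants as my reconstruction of the garbled slide, so this is indicative rather than authoritative); `psi` is the digamma function:

```python
# Sketch: evaluating the specialized IT lower bounds for equal priors.
import numpy as np
from scipy.special import digamma

def it_lower_bound(n, m, alpha, beta, r):
    """Closed-form lower bound for alpha_i = alpha, beta_i = beta."""
    if r == 1:
        return (np.sqrt(np.pi / 2) * np.exp(-(np.log(beta) - digamma(alpha) + 1))
                * n / np.sqrt(alpha + m / n))
    if r == 2:
        return (np.exp(-2 * (np.log(beta) - digamma(alpha)) - 1)
                * n / (alpha + m / n))
    raise ValueError("only r in {1, 2} implemented here")

n, alpha = 100, 5.0
beta = alpha * n - 1          # prior scaling used in the talk's experiments
for m in [10**3, 10**4, 10**5]:
    print(m, it_lower_bound(n, m, alpha, beta, 2))  # bound decays as m grows
```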
Performance of Lower Bounds: $L_1$ Error
Figure: $L_1$ error of the EM algorithm and the information-theoretic lower bound versus sample size (for $n = 100$, $\alpha = 5$ and $\beta = \alpha n - 1$).
Performance of Lower Bounds: MSE (Squared $L_2$ Error)
Figure: $L_2$ error of the EM algorithm, the IT lower bound and the BCRB versus sample size (for $n = 100$, $\alpha = 5$, and $\beta = \alpha n - 1$).
Application to General Comparison Graphs
Given a fixed budget of $m = \sum_{i \ne j} m_{ij}$ games, how should we allocate games among $n$ players to minimize the bounds?
Corollary (Optimal Connected Graphs). Regular connected graphs are optimal!
Application to General Comparison Graphs: Proof
Minimizing the lower bound is equivalent to maximizing
$$f(\{m_i\}_{i \in [n]}) := \sum_{i=1}^n \frac{1}{2} \log(\alpha_i + m_i)$$
subject to $\sum_{i=1}^n m_i = m$ and $m_i \in \mathbb{N}$.
The solution is given by a water-filling formula (see the sketch below):
$$m_i^\star = (\mu - \alpha_i)_+, \quad i \in [n],$$
where $\mu > 0$ is chosen such that $\sum_{i=1}^n (\mu - \alpha_i)_+ = m$.
But when $\alpha_i = \alpha$ for all $i$, the $m_i^\star$ are all equal.
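The water level μ is monotone in the allocated budget, so bisection suffices; a minimal sketch (ignoring the integrality constraint):

```python
# Sketch: water-filling allocation, bisecting for the water level mu
# with sum_i (mu - alpha_i)_+ = m, then m_i = (mu - alpha_i)_+.
import numpy as np

def water_fill(alpha, m, tol=1e-9):
    lo, hi = alpha.min(), alpha.max() + m
    while hi - lo > tol:                             # bisection on the water level
        mu = 0.5 * (lo + hi)
        total = np.clip(mu - alpha, 0.0, None).sum()
        lo, hi = (mu, hi) if total < m else (lo, mu)
    return np.clip(0.5 * (lo + hi) - alpha, 0.0, None)

alpha = np.array([1.0, 2.0, 4.0, 8.0])
print(water_fill(alpha, m=20.0))                     # stronger prior => fewer games
```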
The Gamma Distribution with Fixed β = 1
Larger $\alpha_i$ ⟹ greater prior belief that $w_i$ is large ⟹ fewer games $m_i$ allocated to player $i$
Figure: Gamma densities $\mathrm{Gam}(\cdot\,; \alpha, 1)$ for $\alpha = 1, 2, 3, 4, 5$.
Application to Comparison Graphs: Restricted to Trees
Corollary (Optimal Tree Graphs)
1 Best: minimizes the (lower bound on the) Bayes risk
2 Worst: maximizes the (lower bound on the) Bayes risk
Application to Comparison Graphs: Restricted to Trees — Proof for the Star
Maximizing the lower bound on the Bayes risk is equivalent to
$$\min_{\mathbf{m}:\, \sum_i m_i = m}\ g(\mathbf{m}) := \frac{1}{2} \sum_{i=1}^n \log(\alpha + m_i),$$
where $\mathbf{m} = \{m_i\}_{i \in [n]}$ and $i = 1$ is the central node.
Shift part of the weight of an edge $m_{1j} > 0$, for $j \ne 1$, to create a new edge with weight $m_{ji}$ such that $i \ne 1$. One can show that
$$\frac{\partial g(m_1, \ldots, m_n)}{\partial m_i} > 0,$$
implying that $g$ will be increased by the new configuration.
Effect of Tree Graph Structure on the IT Bound
Figure: IT bounds (MSE vs. sample size) for different graph structures — fully connected graph, star graph, chain graph, random tree (for $n = 100$, $\alpha = 5$, $\beta = \alpha n - 1$).
Effect of Tree Graph Structure on the BCRB
Figure: BCRB (MSE vs. sample size) for different graph structures — fully connected graph, star graph, chain graph, random tree (for $n = 100$, $\alpha = 5$, $\beta = \alpha n - 1$).
Final Remarks
We also derived lower bounds for the home-field advantage scenario:
$$P_{ij} = \begin{cases} Q_{ij} := \dfrac{\theta \lambda_i}{\theta \lambda_i + \lambda_j}, & \text{if } i \text{ is home}, \\[2mm] Q'_{ij} := \dfrac{\lambda_i}{\lambda_i + \theta \lambda_j}, & \text{if } j \text{ is home}, \end{cases}$$
where $\theta \in \mathbb{R}_{++}$ models the strength of the advantage ($\theta > 1$).
Future works:
Matching information-theoretic upper bounds
Other questions related to comparing graph structures, e.g., "Does the fully-connected graph outperform a simple cycle?"
M. Alsan, R. Prasad and VYFT, "Lower Bounds on the Bayes Risk of the Bayesian BTL Model with Applications to Comparison Graphs," IEEE J. on Sel. Topics of Sig. Proc., Oct 2018
More informationGeneralization theory
Generalization theory Daniel Hsu Columbia TRIPODS Bootcamp 1 Motivation 2 Support vector machines X = R d, Y = { 1, +1}. Return solution ŵ R d to following optimization problem: λ min w R d 2 w 2 2 + 1
More informationBregman Divergences for Data Mining Meta-Algorithms
p.1/?? Bregman Divergences for Data Mining Meta-Algorithms Joydeep Ghosh University of Texas at Austin ghosh@ece.utexas.edu Reflects joint work with Arindam Banerjee, Srujana Merugu, Inderjit Dhillon,
More informationVariational inference
Simon Leglaive Télécom ParisTech, CNRS LTCI, Université Paris Saclay November 18, 2016, Télécom ParisTech, Paris, France. Outline Introduction Probabilistic model Problem Log-likelihood decomposition EM
More informationMachine Learning CSE546 Sham Kakade University of Washington. Oct 4, What about continuous variables?
Linear Regression Machine Learning CSE546 Sham Kakade University of Washington Oct 4, 2016 1 What about continuous variables? Billionaire says: If I am measuring a continuous variable, what can you do
More informationOnline Forest Density Estimation
Online Forest Density Estimation Frédéric Koriche CRIL - CNRS UMR 8188, Univ. Artois koriche@cril.fr UAI 16 1 Outline 1 Probabilistic Graphical Models 2 Online Density Estimation 3 Online Forest Density
More informationCS Lecture 18. Topic Models and LDA
CS 6347 Lecture 18 Topic Models and LDA (some slides by David Blei) Generative vs. Discriminative Models Recall that, in Bayesian networks, there could be many different, but equivalent models of the same
More informationTheory of Stochastic Processes 8. Markov chain Monte Carlo
Theory of Stochastic Processes 8. Markov chain Monte Carlo Tomonari Sei sei@mist.i.u-tokyo.ac.jp Department of Mathematical Informatics, University of Tokyo June 8, 2017 http://www.stat.t.u-tokyo.ac.jp/~sei/lec.html
More informationLearning Mixtures of Random Utility Models
Learning Mixtures of Random Utility Models Zhibing Zhao and Tristan Villamil and Lirong Xia Rensselaer Polytechnic Institute 110 8th Street, Troy, NY, USA {zhaoz6, villat2}@rpi.edu, xial@cs.rpi.edu Abstract
More informationMODULE -4 BAYEIAN LEARNING
MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities
More informationFeedback Capacity of a Class of Symmetric Finite-State Markov Channels
Feedback Capacity of a Class of Symmetric Finite-State Markov Channels Nevroz Şen, Fady Alajaji and Serdar Yüksel Department of Mathematics and Statistics Queen s University Kingston, ON K7L 3N6, Canada
More informationStrong Converse Theorems for Classes of Multimessage Multicast Networks: A Rényi Divergence Approach
Strong Converse Theorems for Classes of Multimessage Multicast Networks: A Rényi Divergence Approach Silas Fong (Joint work with Vincent Tan) Department of Electrical & Computer Engineering National University
More informationChapter 16. Structured Probabilistic Models for Deep Learning
Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe
More informationThe PAC Learning Framework -II
The PAC Learning Framework -II Prof. Dan A. Simovici UMB 1 / 1 Outline 1 Finite Hypothesis Space - The Inconsistent Case 2 Deterministic versus stochastic scenario 3 Bayes Error and Noise 2 / 1 Outline
More informationLecture 15. Probabilistic Models on Graph
Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationGraphical Models for Collaborative Filtering
Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,
More informationInformation Complexity and Applications. Mark Braverman Princeton University and IAS FoCM 17 July 17, 2017
Information Complexity and Applications Mark Braverman Princeton University and IAS FoCM 17 July 17, 2017 Coding vs complexity: a tale of two theories Coding Goal: data transmission Different channels
More informationBayesian Contextual Multi-armed Bandits
Bayesian Contextual Multi-armed Bandits Xiaoting Zhao Joint Work with Peter I. Frazier School of Operations Research and Information Engineering Cornell University October 22, 2012 1 / 33 Outline 1 Motivating
More informationThe Method of Types and Its Application to Information Hiding
The Method of Types and Its Application to Information Hiding Pierre Moulin University of Illinois at Urbana-Champaign www.ifp.uiuc.edu/ moulin/talks/eusipco05-slides.pdf EUSIPCO Antalya, September 7,
More informationBayesian Machine Learning - Lecture 7
Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1
More informationarxiv: v1 [stat.ml] 1 Jan 2019
CONVERGENCE RATES OF GRADIENT DESCENT AND MM ALGORITHMS FOR GENERALIZED BRADLEY-TERRY MODELS arxiv:1901.00150v1 [stat.ml] 1 Jan 2019 By Milan Vojnovic, Seyoung Yun and Kaifang Zhou London School of Economics
More informationVolodymyr Kuleshov ú Arun Tejasvi Chaganty ú Percy Liang. May 11, 2015
Tensor Factorization via Matrix Factorization Volodymyr Kuleshov ú Arun Tejasvi Chaganty ú Percy Liang Stanford University May 11, 2015 Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization
More informationMatrix estimation by Universal Singular Value Thresholding
Matrix estimation by Universal Singular Value Thresholding Courant Institute, NYU Let us begin with an example: Suppose that we have an undirected random graph G on n vertices. Model: There is a real symmetric
More information1-bit Matrix Completion. PAC-Bayes and Variational Approximation
: PAC-Bayes and Variational Approximation (with P. Alquier) PhD Supervisor: N. Chopin Bayes In Paris, 5 January 2017 (Happy New Year!) Various Topics covered Matrix Completion PAC-Bayesian Estimation Variational
More informationStratégies bayésiennes et fréquentistes dans un modèle de bandit
Stratégies bayésiennes et fréquentistes dans un modèle de bandit thèse effectuée à Telecom ParisTech, co-dirigée par Olivier Cappé, Aurélien Garivier et Rémi Munos Journées MAS, Grenoble, 30 août 2016
More informationA physical model for efficient rankings in networks
A physical model for efficient rankings in networks Daniel Larremore Assistant Professor Dept. of Computer Science & BioFrontiers Institute March 5, 2018 CompleNet danlarremore.com @danlarremore The idea
More informationPermuation Models meet Dawid-Skene: A Generalised Model for Crowdsourcing
Permuation Models meet Dawid-Skene: A Generalised Model for Crowdsourcing Ankur Mallick Electrical and Computer Engineering Carnegie Mellon University amallic@andrew.cmu.edu Abstract The advent of machine
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationEcon 2148, spring 2019 Statistical decision theory
Econ 2148, spring 2019 Statistical decision theory Maximilian Kasy Department of Economics, Harvard University 1 / 53 Takeaways for this part of class 1. A general framework to think about what makes a
More informationHigh-dimensional graphical model selection: Practical and information-theoretic limits
1 High-dimensional graphical model selection: Practical and information-theoretic limits Martin Wainwright Departments of Statistics, and EECS UC Berkeley, California, USA Based on joint work with: John
More informationIntroduction to graphical models: Lecture III
Introduction to graphical models: Lecture III Martin Wainwright UC Berkeley Departments of Statistics, and EECS Martin Wainwright (UC Berkeley) Some introductory lectures January 2013 1 / 25 Introduction
More informationModels of collective inference
Models of collective inference Laurent Massoulié (Microsoft Research-Inria Joint Centre) Mesrob I. Ohannessian (University of California, San Diego) Alexandre Proutière (KTH Royal Institute of Technology)
More information1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016
AM 1: Advanced Optimization Spring 016 Prof. Yaron Singer Lecture 11 March 3rd 1 Overview In this lecture we will introduce the notion of online convex optimization. This is an extremely useful framework
More informationIntroduction to Bayesian Learning
Course Information Introduction Introduction to Bayesian Learning Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Apprendimento Automatico: Fondamenti - A.A. 2016/2017 Outline
More informationMulti-armed bandit models: a tutorial
Multi-armed bandit models: a tutorial CERMICS seminar, March 30th, 2016 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions)
More informationApproximate Inference
Approximate Inference Simulation has a name: sampling Sampling is a hot topic in machine learning, and it s really simple Basic idea: Draw N samples from a sampling distribution S Compute an approximate
More informationBayes methods for categorical data. April 25, 2017
Bayes methods for categorical data April 25, 2017 Motivation for joint probability models Increasing interest in high-dimensional data in broad applications Focus may be on prediction, variable selection,
More informationChapter 2. Binary and M-ary Hypothesis Testing 2.1 Introduction (Levy 2.1)
Chapter 2. Binary and M-ary Hypothesis Testing 2.1 Introduction (Levy 2.1) Detection problems can usually be casted as binary or M-ary hypothesis testing problems. Applications: This chapter: Simple hypothesis
More information