Collective Intelligence
Collective Intelligence Prediction
A Tale of Two Models
Lu Hong and Scott Page, "Interpreted and Generated Signals," Journal of Economic Theory, 2009
Generated Signals Interpreted Signals
Generated Signal: disturbance or interference (social scientists/statisticians) Interpreted Signal: prediction from a model (computer scientists/psychologists)
Outline
Fundamental Question
Generated Signals
Collective Intelligence via Generated Signals
Interpreted Signals
Collective Intelligence via Interpreted Signals
The Netflix Prize
Democracy: information aggregation
Markets: prices are forecasts (rational expectations)
Generated Signals
Generated Signal: Outcome + noise → Signal
UCSC!
[Number line: signals fall in the interval [L − ε, L + ε] around the true value L]
Collective Intelligence 1.0
Outcome: θ ∈ Θ
Signal: s_i
Distribution: f(s_i | θ)
Error = (s_i − θ)²
Average Error = (1/n) Σ_{i=1}^n (s_i − θ)²
Crowd prediction: c = (1/n) Σ_{i=1}^n s_i
Crowd Error = (c − θ)²
Diversity = (1/n) Σ_{i=1}^n (s_i − c)²
Diversity Prediction Theorem
Crowd Error = Average Error − Diversity
(c − θ)² = (1/n) Σ_{i=1}^n (s_i − θ)² − (1/n) Σ_{i=1}^n (s_i − c)²
Crowd Error = Average Error − Diversity
Example: 0.6 = 2,956.0 − 2,955.4
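The Diversity Prediction Theorem is an algebraic identity, so it can be checked numerically for any guesses at all. A minimal sketch, using made-up illustrative signals (not the numbers from the slides):

```python
# Numerical check of the Diversity Prediction Theorem:
#   (c - theta)^2 = (1/n) sum (s_i - theta)^2 - (1/n) sum (s_i - c)^2
# The signals and outcome below are hypothetical, chosen only for illustration.

signals = [980.0, 1150.0, 1003.0, 1255.0]   # hypothetical crowd guesses
theta = 1100.0                               # hypothetical true outcome

n = len(signals)
c = sum(signals) / n                         # crowd's (average) prediction

crowd_error = (c - theta) ** 2
avg_error = sum((s - theta) ** 2 for s in signals) / n
diversity = sum((s - c) ** 2 for s in signals) / n

# The identity holds exactly, for any signals and any theta.
assert abs(crowd_error - (avg_error - diversity)) < 1e-9
print(crowd_error, avg_error, diversity)
```

Because the identity is exact, the check succeeds regardless of how the signals are chosen.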
Collective Intelligence 2.0
Signals as Random Variables
Mean of i's signal: µ_i(θ)
Bias of i's signal: b_i = µ_i(θ) − θ
Variance of i's signal: v_i = E[(s_i − µ_i(θ))²]
Average Bias: B = (1/n) Σ_{i=1}^n b_i
Average Variance: V = (1/n) Σ_{i=1}^n v_i
Average Covariance: C = (1/(n(n−1))) Σ_{i=1}^n Σ_{j≠i} E[(s_i − µ_i)(s_j − µ_j)]
Bias–Variance Decomposition
E[SqE(c)] = B² + (1/n) V + ((n−1)/n) C
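The decomposition can be verified by Monte Carlo simulation. The signal model below (a shared shock inducing positive covariance, plus per-agent biases) is an assumption for illustration, not the slides' setup:

```python
import random

# Monte Carlo check of the bias-variance decomposition for the crowd prediction:
#   E[SqE(c)] = B^2 + (1/n) V + ((n-1)/n) C
# Hypothetical signal model: s_i = theta + b_i + shared + eps_i, where the
# shared shock creates covariance across agents.

random.seed(0)
theta = 10.0
biases = [0.5, -0.3, 0.2]        # hypothetical per-agent biases b_i
n = len(biases)
trials = 100_000

sq_err = 0.0
for _ in range(trials):
    shared = random.gauss(0, 0.5)               # common shock -> positive covariance
    signals = [theta + b + shared + random.gauss(0, 1.0) for b in biases]
    c = sum(signals) / n
    sq_err += (c - theta) ** 2
emp = sq_err / trials                            # empirical E[SqE(c)]

B = sum(biases) / n                              # average bias
V = 0.5**2 + 1.0**2                              # per-signal variance (shared + own)
C = 0.5**2                                       # pairwise covariance from shared shock
pred = B**2 + V / n + (n - 1) / n * C
print(emp, pred)                                 # the two should be close
```

With 100,000 trials the empirical error agrees with the formula to roughly two decimal places.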
Resolving the Paradox
Diversity Prediction Theorem: Diversity is realized diversity, which improves accuracy.
Bias–Variance Decomposition: Higher variance corresponds to noisier signals, which reduces expected accuracy.
Negative covariance implies greater realized diversity and improves the expected accuracy of collective predictions.
Large Population Accuracy
If the signals are independent, unbiased, and have bounded variance, then as n approaches infinity the crowd's error goes to zero:
E[SqE(c)] = 0² + (1/n) V + ((n−1)/n) · 0 → 0
Ecologies of Models Suppose that there exist K possible models so that there exists a distribution across those models. p i = proportion of the population using model i
Collective Accuracy: Diverse Types
D = 1 / Σ_i p_i² (the effective number of model types)
E[SqE(c)] = B² + (1/D) V + ((D−1)/D) C
(Economo, Hong, Page)
Weighting
Weighting by Accuracy
Accuracy: A = 1/σ²
Weights: w_i = A_i / (A_1 + A_2 + … + A_M)
E[SqE] = Σ_{i=1}^M [A_i² / (Σ_{k=1}^M A_k)²] σ²(s_i) + Σ_{i=1}^M Σ_{j≠i} [A_i A_j / (Σ_{k=1}^M A_k)²] cov(s_i, s_j)
Example
Three predictors with variances 1, 2, and 4 (accuracies 1, 0.5, 0.25)
Equally weighted: E[SqE(c)] = 7/9
Accuracy weighted: E[SqE(c_a)] = 4/7
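The 7/9 and 4/7 figures follow directly from the error formula for independent, unbiased signals, E[SqE] = Σ_i w_i² v_i. A quick check:

```python
# Reproducing the slide's example: three independent predictors with
# variances 1, 2, 4; compare equal weights against accuracy (1/variance) weights.

variances = [1.0, 2.0, 4.0]
n = len(variances)

def sq_error(weights, variances):
    # For independent, unbiased signals the crowd's expected squared error
    # is sum_i w_i^2 * v_i (the covariance terms vanish).
    return sum(w * w * v for w, v in zip(weights, variances))

equal = [1.0 / n] * n
err_equal = sq_error(equal, variances)           # (1+2+4)/9 = 7/9

accuracies = [1.0 / v for v in variances]        # 1, 0.5, 0.25
total = sum(accuracies)
acc_weights = [a / total for a in accuracies]    # 4/7, 2/7, 1/7
err_acc = sq_error(acc_weights, variances)       # 28/49 = 4/7

print(err_equal, err_acc)
```

Accuracy weighting shifts weight toward the low-variance predictor and cuts the expected squared error from 7/9 to 4/7.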
Accuracy and Covariance
Σ = variance–covariance matrix, u = (1, 1, …, 1)
Weights: w = Σ⁻¹u / (uᵀ Σ⁻¹ u)
Error: (uᵀ Σ⁻¹ u)⁻¹
Example: Two Models
Weight on model a: (σ_b² − cov(a,b)) / (σ_a² + σ_b² − 2 cov(a,b))
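The two-model closed form can be checked against the general Σ⁻¹u weighting. The variances and covariance below are illustrative assumptions:

```python
# Check the two-model weight formula against the general Sigma^{-1} u weighting.
# The variance/covariance numbers are made up for illustration.

var_a, var_b, cov_ab = 1.0, 2.0, 0.5

# Closed form from the slide: weight on model a
w_a = (var_b - cov_ab) / (var_a + var_b - 2 * cov_ab)

# General form: w proportional to Sigma^{-1} u, normalized to sum to 1.
# Invert the 2x2 covariance matrix by hand.
det = var_a * var_b - cov_ab ** 2
inv = [[var_b / det, -cov_ab / det], [-cov_ab / det, var_a / det]]
raw = [inv[0][0] + inv[0][1], inv[1][0] + inv[1][1]]   # Sigma^{-1} u
w = [r / sum(raw) for r in raw]

print(w_a, w[0])   # both give the same weight on model a
```

Both routes put weight 0.75 on the lower-variance model a here; the closed form is just the 2×2 special case of w = Σ⁻¹u / (uᵀΣ⁻¹u).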
Forecast Standard Use equal weights unless you have strong evidence to support unequal weighting of forecasts (Armstrong 2001)
Interpreted Signals
Interpreted Signal: Attributes → Model → Prediction
Concepts and Categorization
"Categorization enables a wide variety of subordinate functions because classifying something as a category member allows people to bring their knowledge of the category to bear on the new instance. Once people categorize some novel entity, for example, they can use relevant knowledge for understanding and prediction." (Medin and Rips)
Interpretive Signal Example
[4×4 table: outcome (G or B) for each combination of Experience (H, MH, ML, L) and Charisma (H, MH, ML, L)]
Experience Interpretation: 75% Correct
[4×4 table: predictions based on Experience alone]
Interpreted Signals
Accuracy: number of boxes correct
Diversity: different boxes
Binary Interpreted Signals Model
Set of objects X, |X| = N
Set of outcomes S = {G, B}
Interpretation: I_j = {m_{j,1}, m_{j,2}, …, m_{j,n_j}} is a partition of X
P(m_{j,i}) = probability that m_{j,i} arises
Collective Intelligence 3.0
Interpretive Signals and Collective Accuracy
[4×4 table: outcome (G or B) by Experience × Charisma]
Experience Interpretation: 75% Correct
[4×4 table: predictions based on Experience alone]
Charisma Interpretation: 75% Correct
[4×4 table: predictions based on Charisma alone]
Balanced Interpretation: 75% Correct
Extreme on one measure, moderate on the other.
[4×4 table of predictions]
Voting Outcome
[4×4 table: majority vote of the three interpretations in each cell]
Reality
[4×4 table: true outcomes (G or B)]
Collective Measurability: The outcome function F is measurable with respect to σ({M_i}_{i∈N}), the smallest sigma-field with respect to which all M_i are measurable.
Proposition: F satisfies collective measurability if and only if F(x) = G(M_1(x), …, M_n(x)) for all x in X.
[Diagram: outcome function overlaid on Agent 1's and Agent 2's partitions]
Threshold Separable Additivity: Given F, {M_i}_{i∈N}, and G: {0,1}^N → {0,1}, there exist an integer k and a set of functions h_i such that G(M_1(x), …, M_n(x)) = 1 if and only if Σ_{i∈N} h_i(M_i(x)) > k.
This does not mean that the function is linear in the models, only that it can be expressed this way!
Theorem: A classification problem can be solved by a threshold voting mechanism if and only if it satisfies collective measurability and threshold separable additivity with respect to the agents models.
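As a toy illustration of threshold representability (not the paper's construction): the AND of two binary model outputs can be written in the theorem's form with h_i the identity and k = 1.

```python
from itertools import product

# Toy illustration of threshold separable additivity: G = AND of two binary
# model outputs is expressible as sum_i h_i(M_i(x)) > k with h_i = identity
# and k = 1. This is a made-up minimal example.

def G(m1, m2):
    return 1 if (m1 and m2) else 0

k = 1
h = lambda m: m   # each h_i is just the identity here

# Check the equivalence on all four combinations of model outputs.
for m1, m2 in product([0, 1], repeat=2):
    assert G(m1, m2) == (1 if h(m1) + h(m2) > k else 0)
print("AND is threshold-representable with k = 1")
```

Richer outcome functions require nontrivial h_i and k, and some (e.g. parity-like rules) admit no such representation, which is what gives the theorem its bite.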
Interpreted Signal Example
V(a,b,c,d) = a + b + c + d, with a, b, c, d independent N(0,1)
Model 1: a+b; Model 2: c; Model 3: d
Optimal statistical weighting: 3/5, 1/5, 1/5
Optimal weighting: 1, 1, 1
Interpreted Signal Example
V(a,b,c,d) = a + b + c + d, with a, b, c, d independent N(0,1)
Model 1: a+b; Model 2: c+d; Model 3: a+b+d
Optimal statistical weighting: 0, 1/3, 2/3
Optimal weighting: 1, 1, 0
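The point of the second example can be checked by simulation: weights (1, 1, 0) combine the first two models into a+b+c+d, recovering V exactly, so no accuracy-based weighting can do better.

```python
import random

# Simulation of the example above: V = a+b+c+d with models a+b, c+d, a+b+d.
# The weights (1, 1, 0) reproduce V exactly.

random.seed(1)
trials = 100_000
sq_err = 0.0
for _ in range(trials):
    a, b, c, d = (random.gauss(0, 1) for _ in range(4))
    V = a + b + c + d
    m1, m2, m3 = a + b, c + d, a + b + d
    combined = 1 * m1 + 1 * m2 + 0 * m3   # the slide's optimal weighting
    sq_err += (combined - V) ** 2
print(sq_err / trials)   # effectively zero: the combination is exact
```

This is the sense in which optimal weighting of interpreted signals is context dependent: it exploits which pieces of the outcome each model bundles, not just each model's standalone accuracy.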
Some Details
Netflix users rate movies from 1 to 5
Six years of data
Half a million users
17,700 movies
Data divided into (training, testing)
Testing data divided into (probe, quiz, test)
Singular Value Decomposition
Each movie represented by a vector: (p_1, p_2, p_3, p_4, …, p_n)
Each person represented by a vector: (q_1, q_2, q_3, q_4, …, q_n)
Rating: r_ij = m_i + a_j + p · q
Training: choose p, q to minimize Σ (actual_ij − r_ij)² + c(|p|² + |q|²)
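A minimal sketch of this factor model, trained by stochastic gradient descent on squared error with L2 regularization. The toy data and hyperparameters are illustrative assumptions, not the Netflix contestants' settings:

```python
import random

# Sketch of the SVD-style factor model from the slide: predicted rating
#   r_ij = m_i + a_j + p_i . q_j,
# fit by SGD on squared error with L2 regularization (the constant c).
# Toy data; all numbers below are hypothetical.

random.seed(0)
K = 2                     # latent dimensions
lr, reg = 0.01, 0.05      # learning rate and regularization constant c

# (movie, user, rating) triples -- made-up training data
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 1.0), (2, 1, 2.0)]
n_movies, n_users = 3, 2

m = [0.0] * n_movies                      # movie offsets m_i
a = [0.0] * n_users                       # user offsets a_j
p = [[random.gauss(0, 0.1) for _ in range(K)] for _ in range(n_movies)]
q = [[random.gauss(0, 0.1) for _ in range(K)] for _ in range(n_users)]

for _ in range(2000):
    for i, j, r in ratings:
        pred = m[i] + a[j] + sum(pi * qj for pi, qj in zip(p[i], q[j]))
        e = r - pred                       # prediction error on this rating
        m[i] += lr * e
        a[j] += lr * e
        for k in range(K):                 # update both factors from old values
            p[i][k], q[j][k] = (p[i][k] + lr * (e * q[j][k] - reg * p[i][k]),
                                q[j][k] + lr * (e * p[i][k] - reg * q[j][k]))

# After training, predictions approximate the observed ratings.
pred00 = m[0] + a[0] + sum(pi * qj for pi, qj in zip(p[0], q[0]))
print(round(pred00, 2))
```

The regularization term keeps the factor vectors small so the model generalizes rather than memorizing the training ratings exactly.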
BellKor
50 dimensions in each of 107 models
Best Model: 6.8% improvement
Combination of Models: 8.4% improvement
BellKor's Pragmatic Chaos
Best Model: 8.4%
Ensemble: 10.1%
Enter "The Ensemble"
23 Teams, 30 Countries
And the Winner Is…
RMSE for The Ensemble: 0.85671
RMSE for BellKor's Pragmatic Chaos: 0.85670
[Plot: weight assigned to each ensemble member vs. its RMSE]
Interpreted Signal Example
V(a,b,c,d) = a + b + c + d, with a, b, c, d independent N(0,1)
Model 1: a+b+c; Model 2: b+c+d; Model 3: b+c
Optimal weighting: 1, 1, −1
Weighting question: context dependent
Generated Signals: Errors Cancel
Interpreted Signals: Bundling
Ability: accuracy
Diversity: correlation or partitions?