Collective Intelligence
Collective Intelligence Prediction
A Tale of Two Models
Lu Hong and Scott Page, "Interpreted and Generated Signals," Journal of Economic Theory, 2009
Generated Signals Interpreted Signals
Generated Signal: disturbance or interference (social scientists/statisticians) Interpreted Signal: prediction from a model (computer scientists/psychologists)
Outline
Fundamental Question
Generated Signals
Collective Intelligence via Generated Signals
Interpreted Signals
Collective Intelligence via Interpreted Signals
The Netflix Prize
Democracy: information aggregation
Markets: prices are forecasts (rational expectations)
Generated Signals
Generated Signal: Outcome + noise → Signal
UCSC!
[Number line: signals fall in the interval [L − ε, L + ε] around the true value L]
Collective Intelligence 1.0
Outcome: θ ∈ Θ
Signal: s_i
Distribution: f(s_i | θ)
Error = (s_i − θ)²
Average Error = (1/n) Σ_{i=1}^n (s_i − θ)²
Crowd prediction: c = (1/n) Σ_{i=1}^n s_i
Crowd Error = (c − θ)²
Diversity = (1/n) Σ_{i=1}^n (s_i − c)²
Diversity Prediction Theorem
Crowd Error = Average Error − Diversity
(c − θ)² = (1/n) Σ_{i=1}^n (s_i − θ)² − (1/n) Σ_{i=1}^n (s_i − c)²
Crowd Error = Average Error − Diversity
Example: 0.6 = 2,956.0 − 2,955.4
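The Diversity Prediction Theorem is an algebraic identity, so it can be checked numerically for any guesses at all. A minimal sketch, using made-up illustrative signals (not the numbers from the slides):

```python
# Numerical check of the Diversity Prediction Theorem:
#   (c - theta)^2 = (1/n) sum (s_i - theta)^2 - (1/n) sum (s_i - c)^2
# The signals and outcome below are hypothetical, chosen only for illustration.

signals = [980.0, 1150.0, 1003.0, 1255.0]   # hypothetical crowd guesses
theta = 1100.0                               # hypothetical true outcome

n = len(signals)
c = sum(signals) / n                         # crowd's (average) prediction

crowd_error = (c - theta) ** 2
avg_error = sum((s - theta) ** 2 for s in signals) / n
diversity = sum((s - c) ** 2 for s in signals) / n

# The identity holds exactly, for any signals and any theta.
assert abs(crowd_error - (avg_error - diversity)) < 1e-9
print(crowd_error, avg_error, diversity)
```

Because the identity is exact, the check succeeds regardless of how the signals are chosen.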
Collective Intelligence 2.0
Signals as Random Variables
Mean of i's signal: µ_i(θ)
Bias of i's signal: b_i = µ_i(θ) − θ
Variance of i's signal: v_i = E[(s_i − µ_i(θ))²]
Average Bias: B = (1/n) Σ_{i=1}^n b_i
Average Variance: V = (1/n) Σ_{i=1}^n v_i
Average Covariance: C = (1/(n(n−1))) Σ_{i=1}^n Σ_{j≠i} E[(s_i − µ_i)(s_j − µ_j)]
Bias–Variance Decomposition
E[SqE(c)] = B² + (1/n) V + ((n−1)/n) C
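The decomposition can be verified by Monte Carlo simulation. The signal model below (a shared shock inducing positive covariance, plus per-agent biases) is an assumption for illustration, not the slides' setup:

```python
import random

# Monte Carlo check of the bias-variance decomposition for the crowd prediction:
#   E[SqE(c)] = B^2 + (1/n) V + ((n-1)/n) C
# Hypothetical signal model: s_i = theta + b_i + shared + eps_i, where the
# shared shock creates covariance across agents.

random.seed(0)
theta = 10.0
biases = [0.5, -0.3, 0.2]        # hypothetical per-agent biases b_i
n = len(biases)
trials = 100_000

sq_err = 0.0
for _ in range(trials):
    shared = random.gauss(0, 0.5)               # common shock -> positive covariance
    signals = [theta + b + shared + random.gauss(0, 1.0) for b in biases]
    c = sum(signals) / n
    sq_err += (c - theta) ** 2
emp = sq_err / trials                            # empirical E[SqE(c)]

B = sum(biases) / n                              # average bias
V = 0.5**2 + 1.0**2                              # per-signal variance (shared + own)
C = 0.5**2                                       # pairwise covariance from shared shock
pred = B**2 + V / n + (n - 1) / n * C
print(emp, pred)                                 # the two should be close
```

With 100,000 trials the empirical error agrees with the formula to roughly two decimal places.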
Resolving the Paradox
Diversity Prediction Theorem: Diversity is realized diversity, which improves accuracy.
Bias–Variance Decomposition: Higher variance corresponds to noisier signals, which reduces expected accuracy.
Negative covariance implies greater realized diversity and improves the expected accuracy of collective predictions.
Large Population Accuracy
If the signals are independent, unbiased, and have bounded variance, then as n approaches infinity the crowd's error goes to zero:
E[SqE(c)] = 0² + (1/n) V + ((n−1)/n) · 0 → 0
Ecologies of Models Suppose that there exist K possible models so that there exists a distribution across those models. p i = proportion of the population using model i
Collective Accuracy: Diverse Types
D = 1 / Σ_i p_i² (the effective number of model types)
E[SqE(c)] = B² + (1/D) V + ((D−1)/D) C
(Economo, Hong, Page)
Weighting
Weighting by Accuracy
Accuracy: A = 1/σ²
Weights: w_i = A_i / (A_1 + A_2 + … + A_M)
E[SqE] = Σ_{i=1}^M [A_i² / (Σ_{k=1}^M A_k)²] σ²(s_i) + Σ_{i=1}^M Σ_{j≠i} [A_i A_j / (Σ_{k=1}^M A_k)²] cov(s_i, s_j)
Example
Three predictors with variances 1, 2, and 4 (accuracies 1, 0.5, 0.25)
Equally weighted: E[SqE(c)] = 7/9
Accuracy weighted: E[SqE(c_a)] = 4/7
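The 7/9 and 4/7 figures follow directly from the error formula for independent, unbiased signals, E[SqE] = Σ_i w_i² v_i. A quick check:

```python
# Reproducing the slide's example: three independent predictors with
# variances 1, 2, 4; compare equal weights against accuracy (1/variance) weights.

variances = [1.0, 2.0, 4.0]
n = len(variances)

def sq_error(weights, variances):
    # For independent, unbiased signals the crowd's expected squared error
    # is sum_i w_i^2 * v_i (the covariance terms vanish).
    return sum(w * w * v for w, v in zip(weights, variances))

equal = [1.0 / n] * n
err_equal = sq_error(equal, variances)           # (1+2+4)/9 = 7/9

accuracies = [1.0 / v for v in variances]        # 1, 0.5, 0.25
total = sum(accuracies)
acc_weights = [a / total for a in accuracies]    # 4/7, 2/7, 1/7
err_acc = sq_error(acc_weights, variances)       # 28/49 = 4/7

print(err_equal, err_acc)
```

Accuracy weighting shifts weight toward the low-variance predictor and cuts the expected squared error from 7/9 to 4/7.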
Accuracy and Covariance
Σ = variance–covariance matrix, u = (1, 1, …, 1)
Weights: w = Σ⁻¹u / (uᵀ Σ⁻¹ u)
Error: (uᵀ Σ⁻¹ u)⁻¹
Example: Two Models
Weight on model a: (σ_b² − cov(a,b)) / (σ_a² + σ_b² − 2 cov(a,b))
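The two-model closed form can be checked against the general Σ⁻¹u weighting. The variances and covariance below are illustrative assumptions:

```python
# Check the two-model weight formula against the general Sigma^{-1} u weighting.
# The variance/covariance numbers are made up for illustration.

var_a, var_b, cov_ab = 1.0, 2.0, 0.5

# Closed form from the slide: weight on model a
w_a = (var_b - cov_ab) / (var_a + var_b - 2 * cov_ab)

# General form: w proportional to Sigma^{-1} u, normalized to sum to 1.
# Invert the 2x2 covariance matrix by hand.
det = var_a * var_b - cov_ab ** 2
inv = [[var_b / det, -cov_ab / det], [-cov_ab / det, var_a / det]]
raw = [inv[0][0] + inv[0][1], inv[1][0] + inv[1][1]]   # Sigma^{-1} u
w = [r / sum(raw) for r in raw]

print(w_a, w[0])   # both give the same weight on model a
```

Both routes put weight 0.75 on the lower-variance model a here; the closed form is just the 2×2 special case of w = Σ⁻¹u / (uᵀΣ⁻¹u).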
Forecast Standard Use equal weights unless you have strong evidence to support unequal weighting of forecasts (Armstrong 2001)
Interpreted Signals
Interpreted Signal: Attributes → Model → Prediction
Concepts and Categorization
"Categorization enables a wide variety of subordinate functions because classifying something as a category member allows people to bring their knowledge of the category to bear on the new instance. Once people categorize some novel entity, for example, they can use relevant knowledge for understanding and prediction." (Medin and Rips)
Interpretive Signal Example
[4×4 table: outcome (G or B) for each combination of Experience (H, MH, ML, L) and Charisma (H, MH, ML, L)]
Experience Interpretation: 75% Correct
[4×4 table: predictions based on Experience alone]
Interpreted Signals
Accuracy: number of boxes correct
Diversity: different boxes
Binary Interpreted Signals Model
Set of objects X, |X| = N
Set of outcomes S = {G, B}
Interpretation: I_j = {m_{j,1}, m_{j,2}, …, m_{j,n_j}} is a partition of X
P(m_{j,i}) = probability that m_{j,i} arises
Collective Intelligence 3.0
Interpretive Signals and Collective Accuracy
[4×4 table: outcome (G or B) by Experience × Charisma]
Experience Interpretation: 75% Correct
[4×4 table: predictions based on Experience alone]
Charisma Interpretation: 75% Correct
[4×4 table: predictions based on Charisma alone]
Balanced Interpretation: 75% Correct
Extreme on one measure, moderate on the other.
[4×4 table of predictions]
Voting Outcome
[4×4 table: majority vote of the three interpretations in each cell]
Reality
[4×4 table: true outcomes (G or B)]
Collective Measurability: The outcome function F is measurable with respect to σ({M_i}_{i∈N}), the smallest sigma-field with respect to which all M_i are measurable.
Proposition: F satisfies collective measurability if and only if F(x) = G(M_1(x), …, M_n(x)) for all x in X.
[Diagram: outcome function overlaid on Agent 1's and Agent 2's partitions]
Threshold Separable Additivity: Given F, {M_i}_{i∈N}, and G: {0,1}^N → {0,1}, there exist an integer k and a set of functions h_i such that G(M_1(x), …, M_n(x)) = 1 if and only if Σ_{i∈N} h_i(M_i(x)) > k.
This does not mean that the function is linear in the models, only that it can be expressed this way!
Theorem: A classification problem can be solved by a threshold voting mechanism if and only if it satisfies collective measurability and threshold separable additivity with respect to the agents models.
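As a toy illustration of threshold representability (not the paper's construction): the AND of two binary model outputs can be written in the theorem's form with h_i the identity and k = 1.

```python
from itertools import product

# Toy illustration of threshold separable additivity: G = AND of two binary
# model outputs is expressible as sum_i h_i(M_i(x)) > k with h_i = identity
# and k = 1. This is a made-up minimal example.

def G(m1, m2):
    return 1 if (m1 and m2) else 0

k = 1
h = lambda m: m   # each h_i is just the identity here

# Check the equivalence on all four combinations of model outputs.
for m1, m2 in product([0, 1], repeat=2):
    assert G(m1, m2) == (1 if h(m1) + h(m2) > k else 0)
print("AND is threshold-representable with k = 1")
```

Richer outcome functions require nontrivial h_i and k, and some (e.g. parity-like rules) admit no such representation, which is what gives the theorem its bite.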
Interpreted Signal Example
V(a,b,c,d) = a + b + c + d, with a, b, c, d independent N(0,1)
Model 1: a+b; Model 2: c; Model 3: d
Optimal statistical weighting: 3/5, 1/5, 1/5
Optimal weighting: 1, 1, 1
Interpreted Signal Example
V(a,b,c,d) = a + b + c + d, with a, b, c, d independent N(0,1)
Model 1: a+b; Model 2: c+d; Model 3: a+b+d
Optimal statistical weighting: 0, 1/3, 2/3
Optimal weighting: 1, 1, 0
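The point of the second example can be checked by simulation: weights (1, 1, 0) combine the first two models into a+b+c+d, recovering V exactly, so no accuracy-based weighting can do better.

```python
import random

# Simulation of the example above: V = a+b+c+d with models a+b, c+d, a+b+d.
# The weights (1, 1, 0) reproduce V exactly.

random.seed(1)
trials = 100_000
sq_err = 0.0
for _ in range(trials):
    a, b, c, d = (random.gauss(0, 1) for _ in range(4))
    V = a + b + c + d
    m1, m2, m3 = a + b, c + d, a + b + d
    combined = 1 * m1 + 1 * m2 + 0 * m3   # the slide's optimal weighting
    sq_err += (combined - V) ** 2
print(sq_err / trials)   # effectively zero: the combination is exact
```

This is the sense in which optimal weighting of interpreted signals is context dependent: it exploits which pieces of the outcome each model bundles, not just each model's standalone accuracy.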
Some Details
Netflix users rate movies from 1 to 5
Six years of data
Half a million users
17,700 movies
Data divided into (training, testing)
Testing data divided into (probe, quiz, test)
Singular Value Decomposition
Each movie represented by a vector: (p_1, p_2, p_3, p_4, …, p_n)
Each person represented by a vector: (q_1, q_2, q_3, q_4, …, q_n)
Rating: r_ij = m_i + a_j + p · q
Training: choose p, q to minimize Σ (actual_ij − r_ij)² + c(|p|² + |q|²)
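A minimal sketch of this factor model, trained by stochastic gradient descent on squared error with L2 regularization. The toy data and hyperparameters are illustrative assumptions, not the Netflix contestants' settings:

```python
import random

# Sketch of the SVD-style factor model from the slide: predicted rating
#   r_ij = m_i + a_j + p_i . q_j,
# fit by SGD on squared error with L2 regularization (the constant c).
# Toy data; all numbers below are hypothetical.

random.seed(0)
K = 2                     # latent dimensions
lr, reg = 0.01, 0.05      # learning rate and regularization constant c

# (movie, user, rating) triples -- made-up training data
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 1.0), (2, 1, 2.0)]
n_movies, n_users = 3, 2

m = [0.0] * n_movies                      # movie offsets m_i
a = [0.0] * n_users                       # user offsets a_j
p = [[random.gauss(0, 0.1) for _ in range(K)] for _ in range(n_movies)]
q = [[random.gauss(0, 0.1) for _ in range(K)] for _ in range(n_users)]

for _ in range(2000):
    for i, j, r in ratings:
        pred = m[i] + a[j] + sum(pi * qj for pi, qj in zip(p[i], q[j]))
        e = r - pred                       # prediction error on this rating
        m[i] += lr * e
        a[j] += lr * e
        for k in range(K):                 # update both factors from old values
            p[i][k], q[j][k] = (p[i][k] + lr * (e * q[j][k] - reg * p[i][k]),
                                q[j][k] + lr * (e * p[i][k] - reg * q[j][k]))

# After training, predictions approximate the observed ratings.
pred00 = m[0] + a[0] + sum(pi * qj for pi, qj in zip(p[0], q[0]))
print(round(pred00, 2))
```

The regularization term keeps the factor vectors small so the model generalizes rather than memorizing the training ratings exactly.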
BellKor
50 dimensions in each of 107 models
Best Model: 6.8% improvement
Combination of Models: 8.4% improvement
BellKor's Pragmatic Chaos
Best Model: 8.4%
Ensemble: 10.1%
Enter "The Ensemble"
23 Teams, 30 Countries
And the Winner Is…
RMSE for The Ensemble: 0.85671
RMSE for BellKor's Pragmatic Chaos: 0.85670
[Plot: weight assigned to each ensemble member vs. its RMSE]
Interpreted Signal Example
V(a,b,c,d) = a + b + c + d, with a, b, c, d independent N(0,1)
Model 1: a+b+c; Model 2: b+c+d; Model 3: b+c
Optimal weighting: 1, 1, −1
Weighting question: context dependent
Generated Signals: Errors Cancel
Interpreted Signals: Bundling
Ability: accuracy
Diversity: correlation or partitions?