Collective Intelligence

2 Collective Intelligence Prediction

4 A Tale of Two Models. Lu Hong and Scott Page, "Interpreted and Generated Signals," Journal of Economic Theory, 2009

5 Generated Signals Interpreted Signals

6 Generated Signal: disturbance or interference (social scientists/statisticians). Interpreted Signal: prediction from a model (computer scientists/psychologists).

7 Fundamental Question; Generated Signals; Collective Intelligence via Generated Signals; Interpreted Signals; Collective Intelligence via Interpreted Signals; The Netflix Prize

8 Democracy: information aggregation. Markets: prices are forecasts (rational expectations).

9 Generated Signals

10 Generated Signal: Outcome + noise → Signal

11 UCSC!

12 L L-ε L+ε

13 Collective Intelligence 1.0

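14 Outcome: $\theta \in \Theta$

15 Signal: $s_i$

16 Distribution: $f(s_i \mid \theta)$

17 Error = $(s_i - \theta)^2$

18 AveError = $\frac{1}{n}\sum_{i=1}^{n}(s_i - \theta)^2$

19 $c = \frac{1}{n}\sum_{i=1}^{n} s_i$

20 Crowd Error = $(c - \theta)^2$

21 Div = $\frac{1}{n}\sum_{i=1}^{n}(s_i - c)^2$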

22 Diversity Prediction Theorem Crowd Error = Average Error - Diversity

23 Diversity Prediction Theorem: Crowd Error = Average Error - Diversity
$(c - \theta)^2 = \frac{1}{n}\sum_{i=1}^{n}(s_i - \theta)^2 - \frac{1}{n}\sum_{i=1}^{n}(s_i - c)^2$
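The identity can be checked numerically. The sketch below is only an illustration: the three signals and the outcome are hypothetical values chosen to keep the arithmetic exact, and the function name is not from the slides.

```python
from fractions import Fraction


def diversity_prediction(signals, theta):
    """Return (crowd_error, average_error, diversity) for the given signals."""
    n = len(signals)
    c = sum(signals) / n  # crowd prediction: the mean signal
    crowd_error = (c - theta) ** 2
    average_error = sum((s - theta) ** 2 for s in signals) / n
    diversity = sum((s - c) ** 2 for s in signals) / n
    return crowd_error, average_error, diversity


# Hypothetical signals and outcome, chosen only for illustration.
signals = [Fraction(x) for x in (10, 16, 25)]
theta = Fraction(18)

crowd_error, average_error, diversity = diversity_prediction(signals, theta)
print(crowd_error, average_error, diversity)  # 1 39 38
```

Here the crowd error (1) equals the average error (39) minus the diversity (38), as the theorem requires for any set of signals.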

24 Crowd Error = Average Error - Diversity

25 Crowd Error

26 Crowd Error = Average Error

27 Crowd Error = Average Error - Diversity

29 0.6 = 2,

30 Collective Intelligence 2.0

31 Signals as Random Variables
Mean of i's signal: $\mu_i(\theta)$
Bias of i's signal: $b_i = \mu_i(\theta) - \theta$
Variance of i's signal: $v_i = E[(s_i - \mu_i(\theta))^2]$
Average Bias: $\bar{B} = \frac{1}{n}\sum_{i=1}^{n} b_i$
Average Variance: $\bar{V} = \frac{1}{n}\sum_{i=1}^{n} v_i$
Average Covariance: $\bar{C} = \frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{j \neq i} E[(s_i - \mu_i)(s_j - \mu_j)]$

32 Bias Variance Decomposition: $E[SqE(c)] = \bar{B}^2 + \frac{1}{n}\bar{V} + \frac{n-1}{n}\bar{C}$
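As a sanity check on the decomposition, the expected squared error of the crowd can be computed two ways: directly, as the squared mean bias plus the variance of the mean signal, and through the average bias, average variance, and average covariance. The biases and covariance matrix below are hypothetical, and the function names are illustrative.

```python
from fractions import Fraction


def expected_crowd_error_direct(biases, cov):
    """Squared mean bias plus the variance of the mean of the signals."""
    n = len(biases)
    mean_bias = Fraction(sum(biases), n)
    variance_of_mean = Fraction(sum(sum(row) for row in cov), n * n)
    return mean_bias ** 2 + variance_of_mean


def expected_crowd_error_decomposed(biases, cov):
    """Average bias squared + (1/n) avg variance + ((n-1)/n) avg covariance."""
    n = len(biases)
    avg_bias = Fraction(sum(biases), n)
    avg_var = Fraction(sum(cov[i][i] for i in range(n)), n)
    off_diagonal = sum(cov[i][j] for i in range(n) for j in range(n) if i != j)
    avg_cov = Fraction(off_diagonal, n * (n - 1))
    return avg_bias ** 2 + avg_var / n + avg_cov * (n - 1) / n


# Hypothetical integer biases and a symmetric positive definite covariance matrix.
biases = [1, -1, 3]
cov = [[4, 1, 1],
       [1, 2, -1],
       [1, -1, 3]]

direct = expected_crowd_error_direct(biases, cov)
decomposed = expected_crowd_error_decomposed(biases, cov)
print(direct, decomposed)  # 20/9 20/9
```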

33 Resolving the Paradox. Diversity Prediction Theorem: predictive diversity is realized diversity, which improves accuracy. Bias Variance Decomposition: variance corresponds to noisier signals, which reduces accuracy. Negative covariance implies realized diversity and improves the expected accuracy of collective predictions.

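34 Large Population Accuracy: If the signals are independent, unbiased, and of bounded variance, then as n approaches infinity the crowd's error goes to zero:
$E[SqE(c)] = \bar{B}^2 + \frac{1}{n}\bar{V} + \frac{n-1}{n}\bar{C} = \frac{1}{n}\bar{V} \to 0$, since $\bar{B} = 0$ (unbiased) and $\bar{C} = 0$ (independent).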

35 Ecologies of Models: Suppose that there exist K possible models and a distribution across those models. $p_i$ = proportion of the population using model i

36 Collective Accuracy: Diverse Types. $D = 1 / \sum_i p_i^2$; $E[SqE(c)] = \bar{B}^2 + \frac{1}{D}\bar{V} + \frac{D-1}{D}\bar{C}$ (Economo, Hong, Page)
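A small sketch of the diversity index, reading D as the inverse of the sum of squared proportions and the slide's error expression as the bias variance decomposition with D in place of n. Both readings are assumptions about the flattened slide; the proportions and the bias, variance, and covariance values are hypothetical.

```python
from fractions import Fraction


def effective_number_of_types(proportions):
    """D = 1 / sum_i p_i^2: equals K when K models are used equally often."""
    return 1 / sum(p * p for p in proportions)


def collective_error(d, avg_bias, avg_var, avg_cov):
    """Assumed form: B^2 + (1/D) V + ((D-1)/D) C."""
    return avg_bias ** 2 + avg_var / d + avg_cov * (d - 1) / d


uniform = [Fraction(1, 4)] * 4
skewed = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

d_uniform = effective_number_of_types(uniform)  # 4
d_skewed = effective_number_of_types(skewed)    # 8/3

# Hypothetical values: zero bias, average variance 3, average covariance 1.
error_uniform = collective_error(d_uniform, Fraction(0), Fraction(3), Fraction(1))
print(d_uniform, d_skewed, error_uniform)  # 4 8/3 3/2
```

A population concentrated on fewer models has a smaller D, so less of the variance term is averaged away.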

37 Weighting

38 Weighting by Accuracy. Accuracy: $A = 1/\sigma^2$

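39 Weighting by Accuracy
Accuracy: $A = 1/\sigma^2$
Weights: $w_i = A_i / (A_1 + A_2 + \dots + A_n)$
$E\left[\left(\sum_{i=1}^{M} \frac{A_i}{\sum_{j=1}^{M} A_j}\, s_i\right)^2\right] = \sum_{i=1}^{M} \frac{A_i^2}{\left(\sum_{k=1}^{M} A_k\right)^2}\,\mathrm{Var}(s_i) + \sum_{i=1}^{M}\sum_{j \neq i} \frac{A_i A_j}{\left(\sum_{k=1}^{M} A_k\right)^2}\,\mathrm{Cov}(s_i, s_j)$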

40 Example: Three predictors with variances 1, 2, and 4 (accuracies 1, 0.5, 0.25). Equally weighted: $E[SqE(c)] = 7/9$. Accuracy weighted: $E[SqE(c_a)] = 4/7$.
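The two figures in this example can be reproduced with exact arithmetic. The sketch assumes the three predictors are unbiased and independent, which is consistent with the slide's numbers; the function name is illustrative.

```python
from fractions import Fraction


def weighted_error(weights, variances):
    """Expected squared error of a weighted average of independent, unbiased predictors."""
    return sum(w * w * v for w, v in zip(weights, variances))


variances = [Fraction(1), Fraction(2), Fraction(4)]
n = len(variances)

equal_weights = [Fraction(1, n)] * n
accuracies = [1 / v for v in variances]             # A_i = 1 / sigma_i^2
accuracy_weights = [a / sum(accuracies) for a in accuracies]

equal_error = weighted_error(equal_weights, variances)
accuracy_error = weighted_error(accuracy_weights, variances)
print(accuracy_weights)             # weights 4/7, 2/7, 1/7
print(equal_error, accuracy_error)  # 7/9 4/7
```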

41 Accuracy and Covariance. $\Sigma$ = variance-covariance matrix; $u = (1,1,1,1,1,1)$. Weights: $w = \Sigma^{-1}u \,/\, (u^\top \Sigma^{-1} u)$. Error: $(u^\top \Sigma^{-1} u)^{-1}$

42 Example: Two Models. Weight on model a: $\frac{\sigma_b^2 - \mathrm{cov}(a,b)}{\sigma_a^2 + \sigma_b^2 - 2\,\mathrm{cov}(a,b)}$
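A minimal sketch of covariance-aware weighting, assuming the weights are the minimum-variance weights that sum to one (the inverse covariance matrix applied to the vector of ones, then normalized). The exact solver, the function names, and the two-model variances and covariance are illustrative choices, not taken from the slides.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve matrix * x = rhs exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def optimal_weights(cov):
    """Return (weights, error) with weights proportional to Sigma^{-1} u."""
    x = solve(cov, [1] * len(cov))  # x = Sigma^{-1} u
    total = sum(x)                  # u' Sigma^{-1} u
    return [xi / total for xi in x], 1 / total


def two_model_weight(var_a, var_b, cov_ab):
    """Closed-form weight on model a for the two-model case."""
    return Fraction(var_b - cov_ab, var_a + var_b - 2 * cov_ab)


# Hypothetical two-model case: variances 2 and 3, covariance 1.
weights, error = optimal_weights([[2, 1], [1, 3]])
closed_form = two_model_weight(2, 3, 1)
print(weights, error, closed_form)  # weights 2/3 and 1/3, error 5/3, closed form 2/3
```

For two models the matrix expression and the closed-form weight agree, which is a quick way to check either one.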

43 Forecast Standard Use equal weights unless you have strong evidence to support unequal weighting of forecasts (Armstrong 2001)

44 Interpreted Signals

45 Interpretive Signal: Attributes → model → Prediction

46 Concepts and Categorization: "Categorization enables a wide variety of subordinate functions because classifying something as a category member allows people to bring their knowledge of the category to bear on the new instance. Once people categorize some novel entity, for example, they can use relevant knowledge for understanding and prediction." (Medin and Rips)

47 Interpretive Signal Example [Figure: 4×4 grid of outcomes by Charisma (columns: H, MH, ML, L) and Experience (rows: H, MH, ML, L)]

48 Experience Interpretation: 75% Correct [Figure: outcome grid partitioned by Experience level (H, MH, ML, L)]

49 Interpreted Signals. Accuracy: number of boxes. Diversity: different boxes.

50 Binary Interpreted Signals Model. Set of objects: $|X| = N$. Set of outcomes: $S = \{G, B\}$. Interpretation: $I_j = \{m_{j,1}, m_{j,2}, \dots, m_{j,n_j}\}$ is a partition of X. $P(m_{j,i})$ = probability that $m_{j,i}$ arises.

51 Collective Intelligence 3.0

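52 Interpretive Signals and Collective Accuracy [Figure: 4×4 grid of outcomes by Charisma (columns: H, MH, ML, L) and Experience (rows: H, MH, ML, L)]

53 Experience Interpretation: 75% Correct [Figure: outcome grid partitioned by Experience level (H, MH, ML, L)]

54 Charisma Interpretation: 75% Correct [Figure: outcome grid partitioned by Charisma level (H, MH, ML, L)]

55 Balanced Interpretation: 75% Correct. Extreme on one measure, moderate on the other. [Figure: outcome grid]

56 Voting Outcome [Figure: grid of the interpretations' votes by Charisma (columns: H, MH, ML, L) and Experience (rows: H, MH, ML, L)]

57 Reality [Figure: 4×4 grid of outcomes by Charisma (columns: H, MH, ML, L) and Experience (rows: H, MH, ML, L)]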

58 Collective Measurability: The outcome function F is measurable with respect to $\sigma(\{M_i\}_{i \in N})$, the smallest sigma field such that all $M_i$ are measurable. Proposition: F satisfies collective measurability if and only if $F(x) = G(M_1(x), \dots, M_n(x))$ for all x in X.

59 Agent 1 Outcome Function Agent 2

60 Agent 1 Outcome Function Agent 2

61 Threshold Separable Additivity: Given F, $\{M_i\}_{i \in N}$, and $G: \{0,1\}^N \to \{0,1\}$, there exists an integer k and a set of functions $h_i: \{0,1\} \to \{0,1\}$ such that $G(M_1(x), \dots, M_n(x)) = 1$ if and only if $\sum_{i \in N} h_i(M_i(x)) > k$. This does not mean that the function is linear in the models, only that it can be expressed this way!

62 Theorem: A classification problem can be solved by a threshold voting mechanism if and only if it satisfies collective measurability and threshold separable additivity with respect to the agents' models.
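A threshold voting mechanism of this kind can be sketched in a few lines. The example assumes binary model outputs in {0, 1}; the names `threshold_vote`, `identity`, and `flip` are illustrative, and majority rule among three agents is only one instance (each h_i is the identity and k = 1).

```python
from itertools import product


def threshold_vote(signals, transforms, k):
    """Return 1 if the transformed signals sum to more than k, else 0."""
    return int(sum(h(s) for h, s in zip(transforms, signals)) > k)


def identity(vote):
    return vote


def flip(vote):
    """A transform for an agent whose model is informative but inverted."""
    return 1 - vote


# Majority rule over three agents: the outcome is 1 when at least two say 1.
majority = {
    signals: threshold_vote(signals, [identity] * 3, k=1)
    for signals in product((0, 1), repeat=3)
}

# Same rule when the third agent's signal has to be inverted before counting.
with_contrarian = threshold_vote((1, 0, 0), [identity, identity, flip], k=1)
print(majority[(1, 1, 0)], majority[(1, 0, 0)], with_contrarian)  # 1 0 1
```

The transforms h_i are what let the rule be expressed as a threshold sum even when the outcome is not a plain count of the raw signals.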

63 Interpreted Signal Example: $V(a,b,c,d) = a + b + c + d$, with a, b, c, d independent N(0,1). Model 1: a+b. Model 2: c. Model 3: d. Optimal statistical weighting: 3/5, 1/5, 1/5. Optimal weighting: 1, 1, 1.

64 Interpreted Signal Example: $V(a,b,c,d) = a + b + c + d$, with a, b, c, d independent N(0,1). Model 1: a+b. Model 2: c+d. Model 3: a+b+d. Optimal statistical weighting: 0, 1/3, 2/3. Optimal weighting: 1, 1, 0.
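The statistical weights on these two slides can be reproduced under one reading: treat each model's error as V minus the model, build the error covariance matrix from the independent N(0,1) components, and take the minimum-variance weights that sum to one. That interpretation is an assumption, but it matches both sets of numbers. Models are written as coefficient vectors on (a, b, c, d); the helper names are illustrative.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve matrix * x = rhs exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def statistical_weights(models):
    """Minimum-variance weights summing to one, from the models' error covariances."""
    errors = [[1 - coef for coef in model] for model in models]  # V - model
    cov = [[sum(x * y for x, y in zip(ei, ej)) for ej in errors] for ei in errors]
    x = solve(cov, [1] * len(models))
    return [xi / sum(x) for xi in x]


def residual_variance(weights, models):
    """Variance of V minus the weighted combination of the models."""
    combined = [sum(w * m[k] for w, m in zip(weights, models)) for k in range(4)]
    return sum((1 - coef) ** 2 for coef in combined)


slide_63 = [(1, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]  # a+b, c, d
slide_64 = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 0, 1)]  # a+b, c+d, a+b+d

w63 = statistical_weights(slide_63)  # 3/5, 1/5, 1/5
w64 = statistical_weights(slide_64)  # 0, 1/3, 2/3
print(residual_variance(w63, slide_63), residual_variance([1, 1, 1], slide_63))  # 8/5 0
print(residual_variance(w64, slide_64), residual_variance([1, 1, 0], slide_64))  # 2/3 0
```

In both cases the weights that exploit what each model actually contains drive the error to zero, while the statistically derived weights leave a positive residual.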

67 Some Details: Netflix users rate movies from 1 to 5. Six years of data. Half a million users. 17,700 movies. Data divided into (training, testing). Testing data divided into (probe, quiz, test).

68 Singular Value Decomposition. Each movie represented by a vector: $(p_1, p_2, p_3, p_4, \dots, p_n)$. Each person represented by a vector: $(q_1, q_2, q_3, q_4, \dots, q_n)$. Rating: $r_{ij} = m_i + a_j + p \cdot q$. Training: choose p, q to minimize $\sum (\mathrm{actual}_{ij} - r_{ij})^2 + c(\|p\|^2 + \|q\|^2)$
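The training objective can be illustrated with a toy stochastic gradient descent loop. This is a sketch under assumptions, not the method of any Netflix Prize team: the ratings are made up, and the learning rate, number of factors, and epoch count are arbitrary. Here m is the movie offset, a the person offset, and only p and q are penalized, as in the slide's objective.

```python
import random


def train(ratings, n_movies, n_users, k=2, c=0.05, lr=0.02, epochs=300, seed=0):
    """Fit r_ij = m_i + a_j + p_i . q_j by SGD on squared error + c(|p|^2 + |q|^2)."""
    rng = random.Random(seed)
    m = [0.0] * n_movies  # movie offsets
    a = [0.0] * n_users   # person offsets
    p = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_movies)]
    q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]

    def predict(i, j):
        return m[i] + a[j] + sum(pf * qf for pf, qf in zip(p[i], q[j]))

    def squared_error():
        return sum((r - predict(i, j)) ** 2 for i, j, r in ratings)

    initial = squared_error()
    for _ in range(epochs):
        for i, j, r in ratings:
            err = r - predict(i, j)
            m[i] += lr * err
            a[j] += lr * err
            for f in range(k):
                pf, qf = p[i][f], q[j][f]
                p[i][f] += lr * (err * qf - c * pf)  # gradient step with penalty
                q[j][f] += lr * (err * pf - c * qf)
    return predict, initial, squared_error()


# Hypothetical (movie, person, rating) triples on the 1 to 5 scale.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 2), (2, 2, 5), (3, 0, 4), (3, 2, 2)]

predict, initial_error, final_error = train(ratings, n_movies=4, n_users=3)
print(round(initial_error, 2), round(final_error, 2))
```

The penalty weight c trades training fit against the size of the factor vectors, which is what keeps the model from memorizing a sparse rating matrix.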

69 BellKor: 50 dimensions in each of 107 models. Best Model: 6.8% improvement. Combination of Models: 8.4% improvement.

70 BellKor's Pragmatic Chaos. Best Model: 8.4%. Ensemble: 10.1%.

71 Enter "The Ensemble": 23 Teams, 30 Countries

72 And The Winner Is: RMSE for The Ensemble: RMSE for BellKor's Pragmatic Chaos:

73 WEIGHT vs RMSE

74 Interpreted Signal Example V(a,b,c,d) = a + b + c + d a,b,c,d independent N(0,1) Model 1: a+b+c Model 2: b+c+d Model 3: b+c Optimal Weighting: 1,1,-1
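The negative weight can be verified directly by tracking the coefficient each weighted combination places on a, b, c, and d. The comparison weightings (equal weights, and dropping the third model) are illustrative additions; the helper name is hypothetical.

```python
from fractions import Fraction


def residual_variance(weights, models):
    """Variance of V = a+b+c+d minus the weighted combination of the models."""
    combined = [sum(w * m[k] for w, m in zip(weights, models)) for k in range(4)]
    # a, b, c, d are independent N(0,1), so the variance is a sum of squares.
    return sum((1 - coef) ** 2 for coef in combined)


# Coefficient vectors on (a, b, c, d) for the three models on the slide.
models = [(1, 1, 1, 0),   # a+b+c
          (0, 1, 1, 1),   # b+c+d
          (0, 1, 1, 0)]   # b+c

with_negative = residual_variance([1, 1, -1], models)
equal = residual_variance([Fraction(1, 3)] * 3, models)
drop_third = residual_variance([1, 1, 0], models)
print(with_negative, equal, drop_third)  # 0 8/9 2
```

The third model earns a negative weight because it captures exactly the part (b+c) that the first two models count twice.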

75 +

76 + +

77 + + -

78 Weighting question: context dependent

79 Generated Signals: Errors Cancel. Interpreted Signals: Bundling.

80 Ability: accuracy

81 Ability: accuracy. Diversity: correlation or partitions?
