Collective Intelligence


Collective Intelligence Prediction

A Tale of Two Models: Lu Hong and Scott Page, "Interpreted and Generated Signals," Journal of Economic Theory, 2009

Generated Signals Interpreted Signals

Generated Signal: disturbance or interference (social scientists/statisticians)

Interpreted Signal: prediction from a model (computer scientists/psychologists)

Outline: Fundamental Question; Generated Signals; Collective Intelligence via Generated Signals; Interpreted Signals; Collective Intelligence via Interpreted Signals; The Netflix Prize

Democracy: information aggregation. Markets: prices are forecasts (rational expectations).

Generated Signals

Generated Signal: Outcome + noise → Signal


[Figure: signals fall in the interval (L - ε, L + ε) around the true value L]

Collective Intelligence 1.0

Outcome: θ ∈ Θ

Signal: s_i

Distribution: f(s_i | θ)

Error = (s_i - θ)²

AveError = (1/n) Σ_{i=1}^n (s_i - θ)²

c = (1/n) Σ_{i=1}^n s_i

Crowd Error = (c - θ)²

Div = (1/n) Σ_{i=1}^n (s_i - c)²

Diversity Prediction Theorem: Crowd Error = Average Error - Diversity

(c - θ)² = (1/n) Σ_{i=1}^n (s_i - θ)² - (1/n) Σ_{i=1}^n (s_i - c)²

Crowd Error = Average Error - Diversity

Example: 0.6 = 2,956.0 - 2,955.4
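Because the theorem is an algebraic identity, it holds for any set of signals. A minimal sketch in Python (the true outcome and the noise level below are arbitrary assumptions, not values from the talk):

```python
# Verify the Diversity Prediction Theorem on random signals:
# crowd error = average error - diversity, exactly.
import random

theta = 10.0                                    # true outcome (assumed)
signals = [random.gauss(theta, 5.0) for _ in range(50)]
n = len(signals)

c = sum(signals) / n                            # crowd prediction: mean signal
crowd_error = (c - theta) ** 2
avg_error = sum((s - theta) ** 2 for s in signals) / n
diversity = sum((s - c) ** 2 for s in signals) / n

# The identity holds up to floating-point rounding.
assert abs(crowd_error - (avg_error - diversity)) < 1e-9
print(crowd_error, avg_error, diversity)
```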

Collective Intelligence 2.0

Signals as Random Variables

Mean of i's signal: μ_i(θ)

Bias of i's signal: b_i = μ_i(θ) - θ

Variance of i's signal: v_i = E[(s_i - μ_i(θ))²]

Average Bias: B̄ = (1/n) Σ_{i=1}^n b_i

Average Variance: V = (1/n) Σ_{i=1}^n v_i

Average Covariance: C = (1/(n(n-1))) Σ_{i=1}^n Σ_{j≠i} E[(s_i - μ_i)(s_j - μ_j)]

Bias-Variance Decomposition: E[SqE(c)] = B̄² + (1/n)V + ((n-1)/n)C
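A Monte Carlo sketch of the decomposition; the biases, noise scales, and the shared shock that induces covariance are toy assumptions, not numbers from the talk:

```python
# Check E[SqE(c)] = B̄² + V/n + ((n-1)/n)·C for the crowd mean c.
import numpy as np

rng = np.random.default_rng(0)
theta = 0.0
b = np.array([0.5, -0.3, 0.2, 0.1, -0.1])      # individual biases (assumed)
sigma = np.array([1.0, 0.8, 1.2, 0.9, 1.1])    # idiosyncratic noise scales
a = 0.6                                        # shared-shock loading -> covariance
n, trials = len(b), 200_000

z = rng.standard_normal(trials)                # common shock
e = rng.standard_normal((trials, n)) * sigma   # idiosyncratic shocks
s = theta + b + a * z[:, None] + e             # signals, shape (trials, n)

c = s.mean(axis=1)
empirical = np.mean((c - theta) ** 2)

B_bar = b.mean()                               # average bias
V = np.mean(a**2 + sigma**2)                   # average variance
C = a**2                                       # average pairwise covariance
predicted = B_bar**2 + V / n + (n - 1) / n * C
print(empirical, predicted)                    # agree to about two decimals
```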

Resolving the Paradox. Diversity Prediction Theorem: predictive diversity is realized diversity, which improves accuracy. Bias-Variance Decomposition: variance corresponds to noisier signals, which reduces accuracy. Negative covariance implies greater realized diversity and improves the expected accuracy of collective predictions.

Large Population Accuracy: if the signals are independent, unbiased, and have bounded variance, then as n approaches infinity the crowd's error goes to zero: E[SqE(c)] = 0² + (1/n)V + ((n-1)/n)·0 = V/n → 0.

Ecologies of Models: suppose there exist K possible models, with a distribution across those models. p_i = proportion of the population using model i.

Collective Accuracy with Diverse Types: D = 1/Σ_i p_i² (the effective number of model types), and E[SqE(c)] = B̄² + (1/D)V + ((D-1)/D)C. (Economo, Hong, Page)
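D here is the inverse Simpson index of the type distribution, an "effective number" of model types. A one-line sketch with assumed proportions:

```python
# Effective number of model types for assumed proportions p.
p = [0.5, 0.25, 0.25]                  # share of population using each model
D = 1 / sum(pi ** 2 for pi in p)
print(D)                               # 2.67 effective types
```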

Weighting

Weighting by Accuracy: Accuracy A = 1/σ²

Weighting by Accuracy: Accuracy A = 1/σ². Weights: w_i = A_i/(A_1 + A_2 + ... + A_n).

E[(Σ_{i=1}^M (A_i/Σ_{j=1}^M A_j) s_i)²] = Σ_{i=1}^M (A_i²/(Σ_{k=1}^M A_k)²) σ²(s_i) + Σ_{i=1}^M Σ_{j≠i} (A_i A_j/(Σ_{k=1}^M A_k)²) σ(s_i, s_j)

Example: three predictors with variances 1, 2, and 4, so accuracies 1, 0.5, 0.25. Equally weighted: E[SqE(c)] = 7/9. Accuracy weighted: E[SqE(c_a)] = 4/7.
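The arithmetic can be checked directly with exact fractions; this sketch assumes independent predictors, so the combined variance is the weighted sum of variances:

```python
# Equal vs. accuracy (inverse-variance) weighting of three predictors.
from fractions import Fraction as F

variances = [F(1), F(2), F(4)]

def combined_variance(weights, variances):
    # Independent signals: Var(Σ w_i s_i) = Σ w_i² v_i, weights normalized.
    total = sum(weights)
    return sum((w / total) ** 2 * v for w, v in zip(weights, variances))

equal = combined_variance([F(1)] * 3, variances)
accuracy = combined_variance([1 / v for v in variances], variances)
print(equal, accuracy)                 # 7/9 and 4/7, matching the slide
```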

Accuracy and Covariance: Σ = variance-covariance matrix, u = (1, 1, ..., 1). Weights: w = Σ⁻¹u / (u′Σ⁻¹u). Error: (u′Σ⁻¹u)⁻¹.

Example: Two Models. Weight on model a: (σ_b² - cov(a,b)) / (σ_a² + σ_b² - 2 cov(a,b)).
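A sketch computing the covariance-aware weights with NumPy on an assumed 2×2 covariance matrix, checked against the two-model closed form above:

```python
# Optimal weights w = Σ⁻¹u / (u'Σ⁻¹u) and error (u'Σ⁻¹u)⁻¹.
import numpy as np

Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])         # assumed variance-covariance matrix
u = np.ones(2)

w = np.linalg.solve(Sigma, u)
w /= u @ w                             # normalize so the weights sum to one
error = 1.0 / (u @ np.linalg.solve(Sigma, u))

# Two-model closed form from the slide:
var_a, var_b, cov = Sigma[0, 0], Sigma[1, 1], Sigma[0, 1]
w_a = (var_b - cov) / (var_a + var_b - 2 * cov)
print(w, error, w_a)                   # w[0] equals w_a
```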

Forecast Standard: use equal weights unless you have strong evidence to support unequal weighting of forecasts (Armstrong 2001).

Interpreted Signals

Interpreted Signal: Attributes → Model → Prediction

Concepts and Categorization: Categorization enables a wide variety of subordinate functions because classifying something as a category member allows people to bring their knowledge of the category to bear on the new instance. Once people categorize some novel entity, for example, they can then use relevant knowledge for understanding and prediction. (Medin and Rips)

Interpreted Signal Example: [4×4 table of G/B outcomes, rows = Experience (H, MH, ML, L), columns = Charisma (H, MH, ML, L); the B entries did not survive transcription]

Experience Interpretation, 75% correct: [prediction table based on Experience alone; the B entries did not survive transcription]

Interpreted Signals. Accuracy: number of boxes. Diversity: different boxes.

Binary Interpreted Signals Model: set of objects X with |X| = N; set of outcomes S = {G, B}; Interpretation: I_j = {m_{j,1}, m_{j,2}, ..., m_{j,n_j}} is a partition of X; P(m_{j,i}) = probability that m_{j,i} arises.

Collective Intelligence 3.0

Interpreted Signals and Collective Accuracy: [4×4 table of G/B outcomes, rows = Experience (H, MH, ML, L), columns = Charisma (H, MH, ML, L); the B entries did not survive transcription]

Experience Interpretation, 75% correct: [prediction table based on Experience alone; the B entries did not survive transcription]

Charisma Interpretation, 75% correct: [prediction table based on Charisma alone; the B entries did not survive transcription]

Balanced Interpretation, 75% correct: extreme on one measure, moderate on the other. [prediction table; the B entries did not survive transcription]

Voting Outcome: [4×4 table of the three interpretations' combined votes per cell; the B entries did not survive transcription]

Reality: [4×4 table of the true G/B outcomes; the B entries did not survive transcription]

Collective Measurability: the outcome function F is measurable with respect to σ({M_i}_{i∈N}), the smallest sigma field such that all M_i are measurable. Proposition: F satisfies collective measurability if and only if F(x) = G(M_1(x), ..., M_n(x)) for all x in X.

[Figure: Agent 1's and Agent 2's partitions overlaid on the outcome function]

Threshold Separable Additivity: given F, {M_i}_{i∈N}, and G: {0,1}^N → {0,1}, there exist an integer k and a set of functions h_i: {0,1} → {0,1} such that G(M_1(x), ..., M_n(x)) = 1 if and only if Σ_{i∈N} h_i(M_i(x)) > k. This does not mean that the function is linear in the models, only that it can be expressed this way!

Theorem: A classification problem can be solved by a threshold voting mechanism if and only if it satisfies collective measurability and threshold separable additivity with respect to the agents models.
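A hypothetical illustration of such a mechanism (the toy attributes, models, and threshold below are invented): each agent's model partitions the objects, each h_i turns the agent's category into a 0/1 vote, and G outputs 1 when the vote total clears the threshold k.

```python
# Threshold voting: G(M_1(x), ..., M_n(x)) = 1 iff Σ h_i(M_i(x)) > k.
def threshold_vote(x, models, hs, k):
    votes = sum(h(m(x)) for m, h in zip(models, hs))
    return 1 if votes > k else 0

# Toy objects: (experience, charisma) pairs with values in {"hi", "lo"}.
M1 = lambda x: x[0]                    # agent 1's model sees only experience
M2 = lambda x: x[1]                    # agent 2's model sees only charisma
h1 = lambda category: 1 if category == "hi" else 0
h2 = lambda category: 1 if category == "hi" else 0

# With k = 1, the mechanism classifies "both attributes high" (logical AND).
for x in [("hi", "hi"), ("hi", "lo"), ("lo", "hi"), ("lo", "lo")]:
    print(x, threshold_vote(x, [M1, M2], [h1, h2], k=1))
```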

Interpreted Signal Example: V(a,b,c,d) = a + b + c + d, with a, b, c, d independent N(0,1). Model 1: a+b. Model 2: c. Model 3: d. Optimal statistical weighting: 3/5, 1/5, 1/5. Optimal weighting: 1, 1, 1.

Interpreted Signal Example: V(a,b,c,d) = a + b + c + d, with a, b, c, d independent N(0,1). Model 1: a+b. Model 2: c+d. Model 3: a+b+d. Optimal statistical weighting: 0, 1/3, 2/3. Optimal weighting: 1, 1, 0.
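A quick simulation confirming that the stated optimal weighting (1, 1, 0) reconstructs V exactly, which is why the third model gets zero weight despite being individually informative:

```python
# Weights (1, 1, 0): Model 1 + Model 2 = (a+b) + (c+d) = V exactly.
import numpy as np

rng = np.random.default_rng(1)
a, b, c, d = rng.standard_normal((4, 100_000))
V = a + b + c + d

m1, m2, m3 = a + b, c + d, a + b + d   # the three models' predictions
combo = 1 * m1 + 1 * m2 + 0 * m3
print(np.max(np.abs(combo - V)))       # ~0, up to floating-point rounding
```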

Some Details: Netflix users rate movies from 1 to 5. Six years of data; half a million users; 17,700 movies. Data divided into (training, testing); testing data divided into (probe, quiz, test).

Singular Value Decomposition: each movie is represented by a vector (p_1, p_2, p_3, p_4, ..., p_n) and each person by a vector (q_1, q_2, q_3, q_4, ..., q_n). Rating: r_ij = m_i + a_j + p·q. Training: choose p, q to minimize Σ (actual_ij - r_ij)² + c(‖p‖² + ‖q‖²).
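A minimal stochastic-gradient sketch of this objective; the dimensions, learning rate, regularization weight c, and the synthetic ratings are assumptions for illustration, not the BellKor setup:

```python
# Fit r_ij = m_i + a_j + p_i·q_j by SGD on squared error + c·(‖p‖² + ‖q‖²).
import numpy as np

rng = np.random.default_rng(0)
n_movies, n_users, dim = 20, 30, 5
ratings = [(i, j, rng.integers(1, 6))          # synthetic 1-5 star ratings
           for i in range(n_movies) for j in range(n_users)]

m = np.zeros(n_movies)                         # movie effects
a = np.zeros(n_users)                          # user effects
p = 0.1 * rng.standard_normal((n_movies, dim)) # movie factor vectors
q = 0.1 * rng.standard_normal((n_users, dim))  # user factor vectors
lr, c = 0.01, 0.02                             # learning rate, regularization

for epoch in range(20):
    for i, j, actual in ratings:
        err = actual - (m[i] + a[j] + p[i] @ q[j])
        m[i] += lr * err
        a[j] += lr * err
        p[i], q[j] = (p[i] + lr * (err * q[j] - c * p[i]),
                      q[j] + lr * (err * p[i] - c * q[j]))

rmse = np.sqrt(np.mean([(act - (m[i] + a[j] + p[i] @ q[j])) ** 2
                        for i, j, act in ratings]))
print(rmse)                                    # training RMSE after fitting
```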

BellKor: 50 dimensions in each of 107 models. Best model: 6.8% improvement. Combination of models: 8.4% improvement.

BellKor's Pragmatic Chaos. Best model: 8.4%. Ensemble: 10.1%.

Enter "The Ensemble": 23 teams, 30 countries.

And the winner is... RMSE for The Ensemble: 0.85671. RMSE for BellKor's Pragmatic Chaos: 0.85670.

[Scatter plot: ensemble weight (y axis, -0.2 to 0.4) versus model RMSE (x axis, 0.855 to 0.88)]

Interpreted Signal Example: V(a,b,c,d) = a + b + c + d, with a, b, c, d independent N(0,1). Model 1: a+b+c. Model 2: b+c+d. Model 3: b+c. Optimal weighting: 1, 1, -1. (Model 1 + Model 2 - Model 3 = a + b + c + d exactly; the negative weight subtracts the shared b + c component.)


Weighting question: context dependent

Generated Signals: errors cancel. Interpreted Signals: bundling.

Ability: accuracy. Diversity: correlation or partitions?