Gambling and Information Theory

Gambling and Information Theory Giulio Bertoli UNIVERSITEIT VAN AMSTERDAM December 17, 2014

Overview: Introduction; Kelly Gambling; Horse Races and Mutual Information.

Some Facts. Shannon (1948): definitions and concepts based on coding. In the following years: can information rate be given a meaning without coding? J. L. Kelly (1956): the paper "A New Interpretation of Information Rate", Bell Syst. Tech. J., 35(4): 917-926, 1956.

John Larry Kelly. Born 1923 in Corsicana, TX. 1953: PhD in Physics, then Bell Labs. 1956: Kelly gambling. 1961: speech synthesis. Died 1965 in NY. A remarkable character: gunslinger, stuntman, pilot... He never profited from his findings on gambling (Shannon did!)

Kelly Gambling. Let's bet! Take a single horse race with three horses, with probabilities of winning 1/6, 1/2 and 1/3 respectively. You can bet any fraction of your capital on any horse and place simultaneous bets, but you must bet all of it. How would you bet?

Kelly Gambling. Now, let's take the case where every Saturday there is such a horse race. How does your betting strategy change?

Kelly Gambling. Now, let's take the case where every Saturday there is such a horse race. How does your betting strategy change? If you ALWAYS bet everything on horse 2, you'll go broke! The most intuitive way: bet according to the probabilities.
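A small simulation makes the point concrete. The sketch below is not from the slides: it is a minimal Python illustration, assuming the three-horse race above with fair odds o_x = 1/p(x) (the slides do not fix the odds). It compares always betting everything on horse 2 with betting in proportion to the probabilities.

    import random

    # Assumed setup: three horses with win probabilities 1/6, 1/2, 1/3
    # and fair odds o_x = 1/p(x) (an assumption; the slides do not fix the odds).
    probs = [1/6, 1/2, 1/3]
    odds  = [1/p for p in probs]

    def run(bets, n_races=1000, seed=0):
        """Return final wealth (starting from 1) when betting the fractions
        `bets` of current capital on each horse, every race."""
        rng = random.Random(seed)
        wealth = 1.0
        for _ in range(n_races):
            winner = rng.choices(range(3), weights=probs)[0]
            wealth *= bets[winner] * odds[winner]   # only the winning bet pays
            if wealth == 0.0:
                break
        return wealth

    all_on_favourite = [0.0, 1.0, 0.0]   # wiped out the first time horse 2 loses
    proportional     = probs             # Kelly: bet b(x) = p(x)

    print("all on horse 2 :", run(all_on_favourite))
    print("proportional   :", run(proportional))

With these fair odds the proportional bettor merely preserves capital (G = 0, as in the Cover & Thomas example discussed later), while the all-in bettor is wiped out as soon as horse 2 loses once.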

Kelly Gambling. Let's formalize this, following Kelly's article (1956). Gambler with a private wire: a channel transmits the results of a binary bet BEFORE they become public. Noiseless binary channel. Noisy binary channel. General case.

Gambler with private wire - Noiseless. A gambler sure of winning bets all his money. Consider a 2-for-1 bet. After N bets, he has V_N = 2^N times his initial money V_0. Define the exponential rate of growth (logs are base 2 throughout):

G = \lim_{N \to \infty} \frac{1}{N} \log \frac{V_N}{V_0}    (1)

In our case, G = 1.

Gambler with private wire - Noisy. This time there is a probability of error p (correct transmission with probability q = 1 - p). If the gambler bets all his money every time, he will be broke for N large enough! He should bet only a fraction f of his money. With W wins and L losses after N bets, we have:

V_N = (1+f)^W (1-f)^L V_0

Gambler with private wire - Noisy. Compute G using V_N = (1+f)^W (1-f)^L V_0:

G = \lim_{N \to \infty} \frac{1}{N} \log \frac{(1+f)^W (1-f)^L V_0}{V_0}
  = \lim_{N \to \infty} \left[ \frac{W}{N} \log(1+f) + \frac{L}{N} \log(1-f) \right]
  = q \log(1+f) + p \log(1-f)

Want money? Maximize G!

Gambler with private wire - Noisy. Maximize G with respect to f, using concavity of the log or Lagrange multipliers. You get the relations

1 + f = 2q,    1 - f = 2p,

which give you

G_{\max} = 1 + p \log p + q \log q
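The following is a small numerical check, not part of the slides: it evaluates G(f) = q log(1+f) + p log(1-f) in base-2 logs for an assumed error probability p = 0.1 and locates the maximum by grid search, to compare with f = q - p and G_max = 1 + p log p + q log q.

    import math

    # Assumed error probability for the illustration:
    p = 0.1
    q = 1 - p

    def G(f):
        """Doubling rate (base-2 logs) when betting a fraction f each time."""
        return q * math.log2(1 + f) + p * math.log2(1 - f)

    # Crude grid search over f in (0, 1); enough to see where the maximum sits.
    f_best = max((i / 10000 for i in range(1, 10000)), key=G)

    print("numerical argmax f    :", f_best)     # ~ q - p = 0.8
    print("G at that f           :", G(f_best))
    print("1 + p log p + q log q :", 1 + p * math.log2(p) + q * math.log2(q))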

General case - Notation. Consider the case where we bet on input symbols, representing the outcomes of chance events. Now the channel has several inputs x, with probability of transmission p(x), and several outputs y, with probability of reception q(y). The joint probability is p(x, y). We call b(x|y) the fraction of the gambler's capital that he decides to bet on x after he receives y; o_x are the odds paid to the gambler for a 1-dollar bet if x wins.

Horse Races with no channel. But first, let's consider no channel at all. Then we simply have a horse race of which we know nothing except the probabilities. What is G? Use now V_N = \prod_x [b(x) o(x)]^{W_x} V_0, where W_x is the number of races won by horse x:

G = \lim_{N \to \infty} \frac{1}{N} \log \frac{V_N}{V_0} = \sum_x p(x) \log[b(x) o(x)]

Horse Races. Again, we seek to maximize G. Does Kelly gambling work? YES! (Theorem 6.1.2 in CT: Kelly gambling is log-optimal.)

G = \sum_x p(x) \log[b(x) o_x]
  = \sum_x p(x) \log\Big[\frac{b(x)}{p(x)} \, p(x) \, o_x\Big]
  = \sum_x p(x) \log o_x - H(p) - D(p \| b)
  \le \sum_x p(x) \log o_x - H(p),

where equality holds iff p = b. QED
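A quick numerical sanity check of log-optimality (again an illustration, not from the slides): for an assumed race with probabilities 1/6, 1/2, 1/3 and arbitrarily chosen odds, proportional betting b = p yields a larger G than nearby or uniform betting distributions.

    import math

    # Assumed race for the illustration: probabilities 1/6, 1/2, 1/3 and
    # (superfair) odds of 8, 3 and 4 for 1 -- any odds would do.
    p = [1/6, 1/2, 1/3]
    o = [8.0, 3.0, 4.0]

    def G(b):
        """Doubling rate sum_x p(x) log2[b(x) o_x] for a betting distribution b."""
        return sum(px * math.log2(bx * ox) for px, bx, ox in zip(p, b, o))

    print("Kelly (b = p)      :", G(p))
    print("slightly off Kelly :", G([0.2, 0.5, 0.3]))
    print("uniform bets       :", G([1/3, 1/3, 1/3]))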

Interpretation of result. Take a fair horse race, where \sum_x 1/o_x = 1. The bookie's estimate is given by r(x) = 1/o_x, seen as a probability distribution. We note:

G = \sum_x p(x) \log[b(x) o_x]
  = \sum_x p(x) \log\Big[\frac{b(x)}{p(x)} \, \frac{p(x)}{r(x)}\Big]
  = D(p \| r) - D(p \| b)

This means that we can make money only if our estimate is better, i.e. closer to p in relative entropy, than the bookie's!
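The identity G = D(p||r) - D(p||b) is easy to verify numerically. The sketch below is a hypothetical example with assumed probabilities and bookie odds satisfying \sum_x 1/o_x = 1; with Kelly bets b = p the second divergence vanishes and G reduces to D(p||r).

    import math

    def D(p, q):
        """KL divergence D(p||q) in bits."""
        return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

    # Assumed fair race: sum_x 1/o_x = 1, so r(x) = 1/o_x is a distribution.
    p = [1/6, 1/2, 1/3]        # true win probabilities
    r = [0.25, 0.25, 0.50]     # bookie's implied estimate, odds o_x = 1/r(x)
    b = [1/6, 1/2, 1/3]        # our (Kelly) bets
    o = [1/rx for rx in r]

    G = sum(px * math.log2(bx * ox) for px, bx, ox in zip(p, b, o))
    print("G                :", G)
    print("D(p||r) - D(p||b):", D(p, r) - D(p, b))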

Horse Races - with channel. Back to the case with a channel. Consider the most general case, with odds \sum_x 1/o_x = 1. Now we have:

G_{\max} = \sum_{x,y} p(x, y) \log[b(x|y) o_x]
        = \sum_{x,y} p(x, y) \log b(x|y) + \sum_x p(x) \log o_x
        = \sum_x p(x) \log o_x - H(X|Y),

where in the last line we maximize by setting b(x|y) = p(x|y).

Mutual Information. Compare this to the case without a channel. There G = \sum_x p(x) \log o_x - H(X). This results in Theorem 6.2.1 of CT: The increase in G due to side information Y for a horse race X is given by the mutual information I(X;Y). Proof: just compare the previously obtained results!

\Delta G = G_{\text{with side info}} - G_{\text{without side info}}
         = \Big[\sum_x p(x) \log o_x - H(X|Y)\Big] - \Big[\sum_x p(x) \log o_x - H(X)\Big]
         = H(X) - H(X|Y) = I(X;Y)

QED
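As a check on Theorem 6.2.1, the sketch below builds an assumed joint distribution p(x, y) (the numbers are invented for the illustration), computes both growth rates for a fair race, and confirms that their difference equals I(X;Y).

    import math

    # Toy joint distribution p(x, y) for a 3-horse race with a binary signal Y
    # (the numbers are an assumption, chosen only so that Y carries information).
    pxy = {(0, 0): 0.10, (0, 1): 0.05,
           (1, 0): 0.05, (1, 1): 0.45,
           (2, 0): 0.25, (2, 1): 0.10}

    px = {x: sum(v for (xx, y), v in pxy.items() if xx == x) for x in range(3)}
    py = {y: sum(v for (x, yy), v in pxy.items() if yy == y) for y in range(2)}

    H_X   = -sum(p * math.log2(p) for p in px.values())
    H_XgY = -sum(v * math.log2(v / py[y]) for (x, y), v in pxy.items())
    I_XY  = H_X - H_XgY

    # Growth rates for a fair race (o_x = 1/p(x)), with and without the signal.
    log_o = sum(px[x] * math.log2(1 / px[x]) for x in range(3))   # = H(X) here
    G_with    = log_o - H_XgY
    G_without = log_o - H_X

    print("Delta G =", G_with - G_without, " I(X;Y) =", I_XY)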

Example: 6.15 of CT. Let X be the winner of a fair horse race (o_x = 1/p(x)). b(x) is the bet on horse x, as usual. What is the optimal growth rate G?

Example: 6.15 of CT. Let X be the winner of a fair horse race (o_x = 1/p(x)). b(x) is the bet on horse x, as usual. What is the optimal growth rate G? With the optimal bet b(x) = p(x):

G = \sum_x p(x) \log[b(x) o_x] = \sum_x p(x) \log 1 = 0

Example: 6.15 of CT. Suppose now we know that Y = 1 if X = 1 or 2, and Y = 0 otherwise. What is then G?

Example: 6.15 of CT. Suppose now we know that Y = 1 if X = 1 or 2, and Y = 0 otherwise. What is then G?

G = 0 + I(X;Y) = H(Y) - H(Y|X) = H(Y) = H(p(1) + p(2)),

where H(.) in the last expression is the binary entropy function.
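For completeness, a small script (with an assumed probability vector, since the problem leaves p general) computes G directly from the optimal conditional bets b(x|y) = p(x|y) and checks that it equals the binary entropy H(p(1) + p(2)).

    import math

    # Assumed win probabilities for four horses (an illustration only):
    p = [0.5, 0.25, 0.125, 0.125]
    o = [1 / px for px in p]               # fair odds o_x = 1/p(x)

    def H2(t):
        """Binary entropy in bits."""
        return -t * math.log2(t) - (1 - t) * math.log2(1 - t)

    # Y = 1 if X is horse 1 or 2 (indices 0, 1 here), Y = 0 otherwise.
    py1 = p[0] + p[1]
    G = 0.0
    for x, px in enumerate(p):
        y_prob = py1 if x < 2 else 1 - py1   # p(y) for the y determined by this x
        b_xy = px / y_prob                   # optimal bet b(x|y) = p(x|y)
        G += px * math.log2(b_xy * o[x])     # p(x, y) = p(x) since y is a function of x

    print("G directly     :", G)
    print("H(p(1) + p(2)) :", H2(py1))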

Summing up & Outlook. Gambling and information theory have a lot in common. If there is no track take, Kelly gambling is the way to bet. The maximum exponential rate of growth G is larger than it would have been with no channel by an amount equal to I(X;Y). This was a first glimpse of the subfield; nowadays it is also applied to the stock market.