Discrete random structures whose limits are described by a PDE: 3 open problems

David Aldous, April 3, 2014

Much of my research has involved the study of n → ∞ limits of size-n random structures. There are many techniques one can try. This talk is about models where it's easy to see (heuristically) that there should be some kind of limit process which is determined (somehow) by a specific PDE. In such cases I want to start with the explicit solution of the PDE; the hard work then comes when we try to formalize the heuristics. (In principle one does not need the explicit solution, but in practice......) This is a very old-fashioned attitude, to PDE theorists! The most familiar interface between Probability and PDEs is the theory of finite- or infinite-dimensional diffusions. My examples are (mostly) different.

The only example where I could carry this program through was in Aldous-Diaconis (1995), Hammersley's Interacting Particle Process and Longest Increasing Subsequences. The limit was determined by a function F(t, x) on 0 < t < ∞, 0 < x < ∞ satisfying (writing F_t for ∂F/∂t)

F_t = 1/F_x, F(t, 0) = F(0, x) = 0.

Here we knew the answer in advance (because we were re-proving a known result): F(t, x) = 2√(tx). The talk will describe 3 examples where I can't solve the PDE explicitly and so haven't got started......

1. How to make a spectator sport exciting. You are watching a match in real time. Let p(t) = chance the home team wins, given what has happened so far. There are several ways to think about the process p(t): (a) via real-time gambling odds

(b) via a math model for the point difference process X(t) = (points by home team) - (points by visiting team); e.g. for soccer: team i scores at the times of a Poisson (rate λ_i) process. Given any model for X(t), and taking the match duration as 1 time unit, we have the derived price process

p(t) = P(X(1) > 0 | F(t)). (1)

If designing a new sport, what would we want the process p(t) to be? Imagine equally good teams, so p(0) = 1/2 and p(1) = 1 or 0, equally likely. What would we like the distribution of p(1/2) to be? To a spectator: if p(1/2) is typically close to 1/2 then only the second half of the match is important; if p(1/2) is typically close to 1 or 0 then only the first half of the match is important. Note the paradox: an individual match is exciting if the result stays open until near the end, but we don't want that to happen in every match.

The rules of the sport determine the point difference process X(t) = (points by home team) - (points by visiting team), which then determines the price process p(t) = P(X(1) > 0 | F(t)). Clearly p(t) must be a martingale. Simplify by assuming p(t) is a continuous-path martingale and a (time-inhomogeneous) Markov process (roughly, this is saying X(t) is continuous and (time-inhomogeneous) Markov). So we have

dp(t) = σ(p(t), t) dB(t)

for some variance rate σ²(x, t). Question. We can in principle (cf. Jeopardy) design a game to have any chosen σ(x, t), up to two integrals being finite/infinite. How do we choose?

Under the simplest model (e.g. the soccer model) for (continuized) point difference,

p(1/2) has U(0, 1) distribution. (2)

We have a small data-set consistent with this theory; see the 2013 Monthly paper Using Prediction Market Data to Illustrate Undergraduate Probability. Is (2) desirable? From an entropy viewpoint??? From an analysis-of-variance viewpoint, 1/3 of the variance is resolved in the first half, 2/3 in the second half??? Return to the previous slide. We study the question there by maximizing entropy for the whole process (p(t), 0 ≤ t ≤ 1). It is not clear how to do this directly in continuous time/space; we first discretize and then pass to a limit.
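Claim (2) is easy to probe by simulation under the soccer model. A minimal sketch of my own, not from the talk: the scoring rate (2.5 goals per team per match, equal teams) and the tie-counts-as-half-a-win convention are my assumptions, and the discrete scores make the match only approximately "continuized".

```python
import math, random

random.seed(1)
LAM = 2.5        # assumed scoring rate: goals per team per match (my choice)
MU = LAM / 2     # mean goals per team in one half

def poisson_pmf(k, mu):
    return math.exp(-mu) * mu ** k / math.factorial(k)

def poisson_sample(mu):
    # Knuth's method: count uniforms until their product drops below exp(-mu)
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def win_prob(lead, mu, kmax=25):
    # P(home wins | half-time lead), counting a tie as half a win (assumption),
    # when each team scores Poisson(mu) goals in the second half
    p = 0.0
    for h in range(kmax):
        for a in range(kmax):
            d = lead + h - a
            w = 1.0 if d > 0 else (0.5 if d == 0 else 0.0)
            p += w * poisson_pmf(h, mu) * poisson_pmf(a, mu)
    return p

cache, samples = {}, []
for _ in range(20000):
    lead = poisson_sample(MU) - poisson_sample(MU)   # half-time score difference
    if lead not in cache:
        cache[lead] = win_prob(lead, MU)
    samples.append(cache[lead])                      # this match's p(1/2)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 3), round(var, 3))   # a U(0,1) limit would give 0.5 and 1/12 = 0.083
```

The empirical mean is forced to be near 1/2 by the martingale property; the interesting check is that the variance comes out near 1/12, as (2) predicts.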

Take integer parameters (T, N) and the discrete state space {-N, -N+1, ..., N-1, N}. We can construct the discrete-time process (X_s, s = 0, 1, 2, ..., T) which is the maximum-entropy process satisfying

X(0) = 0; X(T) = N or -N (3)

and the martingale property. The process is the time-inhomogeneous Markov chain whose transition probabilities p_s(i, j) = P(X_{s+1} = j | X_s = i) are defined by backwards induction as follows. Clearly for s = T-1 we must have

p_{T-1}(i, N) = (i+N)/(2N), p_{T-1}(i, -N) = (N-i)/(2N).

Define

e_{T-1}(i) = - (i+N)/(2N) log (i+N)/(2N) - (N-i)/(2N) log (N-i)/(2N),

that is, the entropy of the distribution p_{T-1}(i, ·).

Now inductively for s = T-2, T-3, ..., 0, for each i we define p_s(i, ·) as the distribution q(·) on [-N, N] which maximizes

- Σ_j q(j) log q(j) + Σ_j q(j) e_{s+1}(j) (4)

subject to having mean i, and we let e_s(i) be the corresponding maximized value of (4). So this dynamic programming construction inductively specifies the maximum-entropy process, starting at state i at time s, satisfying (3) and the martingale property.
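The maximizer in (4) is an exponential tilt: a Lagrange-multiplier calculation gives q(j) ∝ exp(e_{s+1}(j) + θj), with θ chosen so that the mean equals i, and then the maximized value is log Z(θ) - θi. A minimal numerical sketch of the backwards induction (my own code; the sizes T = N = 10 and the bisection bounds are arbitrary choices):

```python
import math

T, N = 10, 10                       # illustrative sizes (my choice)
states = list(range(-N, N + 1))

def tilt(e_next, theta):
    """Tilted distribution q(j) ∝ exp(e_next[j] + theta*j); return (mean, log Z)."""
    logw = [e + theta * j for e, j in zip(e_next, states)]
    m = max(logw)                   # subtract the max exponent to avoid overflow
    w = [math.exp(v - m) for v in logw]
    Z = sum(w)
    mean = sum(j * wi for j, wi in zip(states, w)) / Z
    return mean, m + math.log(Z)

# e[s][k] = entropy-to-go from state states[k] at time s
e = [[0.0] * len(states) for _ in range(T)]
# Last step: forced jump to +N or -N with the martingale probabilities.
for k, i in enumerate(states):
    for p in ((i + N) / (2 * N), (N - i) / (2 * N)):
        if p > 0:
            e[T - 1][k] -= p * math.log(p)
# Backwards induction: maximize (4) subject to mean = i.
for s in range(T - 2, -1, -1):
    for k, i in enumerate(states):
        if abs(i) == N:             # absorbed at the boundary: no entropy left
            e[s][k] = 0.0
            continue
        lo, hi = -50.0, 50.0        # bisect on theta (mean is increasing in theta)
        for _ in range(100):
            mid = (lo + hi) / 2
            mean, _ = tilt(e[s + 1], mid)
            if mean < i:
                lo = mid
            else:
                hi = mid
        theta = (lo + hi) / 2
        mean, logZ = tilt(e[s + 1], theta)
        e[s][k] = logZ - theta * i  # maximized value of (4)

e00 = e[0][N]    # total entropy of the max-entropy process started from state 0
print(round(e00, 4))
```

For large T and N one could also read off the step variances of the optimized q's and compare them with the limiting σ²(t, x) below.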

Now a back-of-an-envelope calculation shows what the continuous limit should be. Rescale (s, i) to (t, x), and rescale e_s(i) to e(t, x), for 0 < t < 1, -1 < x < 1. We can solve the maximization problem to get the PDE

e_t = (1/2) log(-e_xx) (5)

with boundary conditions

e(t, ±1) = 0, 0 ≤ t < 1; e(1, x) = 0, -1 < x < 1;

and we find that

σ²(t, x) = -1/e_xx(t, x). (6)

STOP.

2. The Lake Wobegon Publishing Group. Recall that in Lake Wobegon all the women are strong, all the men are good looking, and all the children are above average. In this spirit, the Lake Wobegon Math Society journal only publishes papers whose quality is above the average quality of its previously-published papers. We model the qualities U_1, U_2, ... of successive submissions as IID. Remark: this is analogous to the record process, but it is not distribution-free; use U(0, 1) or Exponential(1) distributions. Exercise (graduate stochastic processes course): study Z_n := number of accepted papers among the first n submissions. Now imagine a second journal that considers all papers rejected by the first journal, and uses the same better-than-average rule for acceptance. And so on......
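A quick simulation for the exercise (my own sketch, not from the talk; the convention that the very first submission is always accepted is an assumption):

```python
import random

random.seed(7)

def accepted_counts(n):
    """Run one journal on n IID U(0,1) submissions; record Z_k at checkpoints."""
    total, count = 0.0, 0      # running sum and number of accepted papers
    zs = {}
    for k in range(1, n + 1):
        u = random.random()
        # accept iff quality beats the current average of accepted papers
        # (first paper accepted by convention -- an assumption)
        if count == 0 or u > total / count:
            total += u
            count += 1
        if k in (100, 1000, 10000):
            zs[k] = count
    return zs

zs = accepted_counts(10000)
print(zs)    # Z_n grows, but much more slowly than n
```

A back-of-envelope argument suggests the acceptance probability after k acceptances decays like k^(-1/2), so Z_n should grow like a power of n strictly between log n (the record process) and n; the simulation is consistent with that.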

This is a "putting objects into piles" random model. There are two well-known such models that are corners of large topics: the Chinese Restaurant Process, and patience sorting. In patience sorting, cards with IID U(0,1) labels are put into piles so that we see the label on the top card of each pile. These visible labels are in increasing order, for instance

0.15, 0.33, 0.40, 0.54, 0.71, 0.83

If the next card has label 0.45 we put it on top of the 0.40 card (the largest label smaller than the card we are placing); if it had label 0.09 we would start a new pile on the left. Our Lake Wobegon Publishing Group model is the variant of patience sorting where we use the average of the labels in each pile (instead of the maximum) to determine where the next card is placed.

To study patience sorting informally, consider F(t, x) = E(number of piles with top card label ≤ x). At a given t, the process of top card labels is approximately a spatial Poisson process of rate λ(x) = F_x. The chance that the next card increases the number of piles with top card label ≤ x is the chance that the next label falls in the relevant gap of this Poisson process near x; this chance is 1/λ(x). So this gives the PDE F_t = 1/F_x stated at the start of the talk, whose solution is F(t, x) = 2√(tx). So the process has about 2n^{1/2} piles with order n^{1/2} cards in each. Background: the number of piles = the length of the longest increasing subsequence of a random permutation.

To study our Lake Wobegon Publishing Group model informally, consider F(t, x) = E(number of piles with average card label ≤ x). A slightly more elaborate calculation gives

1/F_x = ( 1/(2 F_t F_x) )_t

which can be rearranged to

F_tt F_x + F_xt F_t + 2 F_t² F_x = 0.

Boundary conditions: F(t, 0) = F(0, x) = 0. STOP.

3. Online ζ(3). The setting: take the complete graph on n vertices, assign IID Exponential(mean n) edge-lengths, and let L_n = length of the minimum spanning tree (MST). A remarkable offline result of Frieze (1985) shows n^{-1} E L_n → ζ(3). The formula can be seen as arising from two simple relations. First,

lim_{n→∞} n^{-1} E L_n = (1/2) ∫_0^∞ λ p(λ) dλ

where p(λ) is the (limit) probability that a length-λ edge is in the MST.
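Frieze's limit is easy to probe by simulation. A sketch of my own (n = 300 and 20 repetitions are arbitrary small choices): run Prim's algorithm on the complete graph, drawing each Exponential(mean n) edge-weight lazily the first time its vertex pair is scanned, which is equivalent to fixing all weights in advance since each pair is scanned exactly once.

```python
import random

random.seed(11)

def mst_length(n):
    """Prim's algorithm on the complete graph with Exp(mean n) edge weights."""
    INF = float("inf")
    in_tree = [False] * n
    dist = [INF] * n          # cheapest known edge from the tree to each vertex
    dist[0] = 0.0
    total = 0.0
    for _ in range(n):
        # bring in the cheapest reachable vertex
        u = min((v for v in range(n) if not in_tree[v]), key=dist.__getitem__)
        in_tree[u] = True
        total += dist[u]
        for v in range(n):
            if not in_tree[v]:
                # weight of edge (u, v), drawn lazily; expovariate takes the rate 1/n
                w = random.expovariate(1.0 / n)
                if w < dist[v]:
                    dist[v] = w
    return total

n, reps = 300, 20
avg = sum(mst_length(n) for _ in range(reps)) / (reps * n)
print(round(avg, 3))   # compare with zeta(3) = 1.2021...
```

Even at n = 300 the average of n^{-1} L_n is already close to ζ(3).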

[repeat] p(λ) is the (limit) probability that a length-λ edge is in the MST. Second,

1 - p(λ) = q²(λ) (7)

where q(λ) is the probability that a Galton-Watson branching process with Poisson(λ) offspring survives forever. Here (7) holds because (i) a length-λ edge is not in the MST iff there is an alternative path between its end-vertices using only edges of length < λ; (ii) relative to a given vertex, the process of edges of length < λ is (in the n → ∞ limit) a Galton-Watson BP with Poisson(λ) offspring; (iii) both BPs surviving forever (n = ∞) corresponds, for large finite n, to the two end-vertices lying in overlapping giant components.
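Combining the two relations: q(λ) = 0 for λ ≤ 1, and for λ > 1 it is the positive root of q = 1 - e^{-λq}, so ζ(3) should equal (1/2) ∫_0^∞ λ (1 - q²(λ)) dλ. A quick numerical check of my own (the grid step and truncation point are arbitrary):

```python
import math

def survival(lam):
    """Survival probability of a Galton-Watson process with Poisson(lam) offspring:
    the positive root of q = 1 - exp(-lam*q), or 0 for lam <= 1 (subcritical/critical)."""
    if lam <= 1.0:
        return 0.0
    lo, hi = 1e-12, 1.0            # f(q) = q - 1 + exp(-lam*q): f(lo) < 0 < f(hi)
    for _ in range(80):            # bisection
        mid = (lo + hi) / 2
        if mid - 1.0 + math.exp(-lam * mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Trapezoid rule for (1/2) * integral of lam * (1 - q(lam)^2), truncated at lam = 40
# (the tail integrand decays like 2*lam*exp(-lam), so the truncation error is negligible).
n_steps, top = 8000, 40.0
h = top / n_steps
xs = [k * h for k in range(n_steps + 1)]
ys = [x * (1.0 - survival(x) ** 2) for x in xs]
integral = 0.5 * h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
print(round(integral, 4))   # compare with zeta(3) = 1.2021...
```

The numerical integral matches ζ(3) = 1.2020569... to the accuracy of the grid, which is a satisfying consistency check on (7).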

In a project with Omer Angel and Nathanael Berestycki (moribund since 2008), we study the corresponding online spanning tree problem, where the edge-lengths are shown in random order and one must decide immediately whether to use the edge in the tree. Fix n. Any scheme for constructing a tree will build up a forest of tree-components. Represent the state as a point x = (x_1, ..., x_n) in the simplex, where x_i = proportion of vertices in size-i components. The initial state is (1, 0, 0, 0, ...), but we consider

F_n(x) = n^{-1} E(length of optimal online tree starting from x).

Use the classical method: get an equation for F_n(·) by conditioning on the first step.

Recall F_n(x) = n^{-1} E(length of optimal online tree starting from x). Write x^{ij} for the vector obtained from x if we accept an edge linking a size-i and a size-j component. Clearly the optimal rule is: accept such an edge iff its length is < F_n(x) - F_n(x^{ij}). Then, by considering the cost of the first accepted edge from x, we find

F_n(x) = (n²/4) Σ_{i,j} x_i x_j (F_n(x) - F_n(x^{ij}))² (8)

which is in fact exact for a Poissonized arrival process. We now just write down the heuristic n → ∞ limit of (8) as a PDE for a function G(y) on the infinite-dimensional simplex:

G = (1/4) Σ_{i,j} y_i y_j (i G_i + j G_j - (i+j) G_{i+j})², where G_i = ∂G/∂y_i.

So we have a PDE for a function G(y) on the infinite-dimensional simplex:

G = (1/4) Σ_{i,j} y_i y_j (i G_i + j G_j - (i+j) G_{i+j})².

A boundary condition is

G(y^n) → 0 whenever y_i^n → 0 for each i.

We also have, from the fact that F_n(x) - F_n(x^{ij}) > 0, that

i G_i + j G_j - (i+j) G_{i+j} > 0.

So we conjecture that there is a unique such solution to the PDE, and that the online ζ(3) constant is G(1, 0, 0, 0, ...). STOP.