
Exchangeable sequences and probabilities for probabilities

1996; modified 98-5-21 to add material on mutual information; modified 98-7-21 to add Heath-Sudderth proof of de Finetti representation; modified 99-11-24 to make the presentation clearer and more complete and 00-10-18 to include comments on the integration measure

Suppose one assigns a probability, $P(p_1,\ldots,p_L) = P(\mathbf{p})$, to the single-trial probabilities for $L$ alternatives. Then, in $N$ trials, the occurrence probability (i.e., the total probability that alternative $i$ occurs $n_i$ times, $i = 1,\ldots,L$) is given by

$$p(\mathbf{n}) = p(n_1,\ldots,n_L) = \int d\mathbf{p}\, p(n_1,\ldots,n_L\,|\,\mathbf{p})\,P(\mathbf{p}) = \int d\mathbf{p}\,\frac{N!}{n_1!\cdots n_L!}\,p_1^{n_1}\cdots p_L^{n_L}\,P(\mathbf{p}) = \frac{N!}{n_1!\cdots n_L!}\,\langle p_1^{n_1}\cdots p_L^{n_L}\rangle .$$

Here $N = \sum_{i=1}^{L} n_i$, $d\mathbf{p} = dp_1\cdots dp_L$, and the integral runs over positive values of the single-trial probabilities. The probability on probabilities, $P(\mathbf{p})$, is restricted to the simplex; i.e., as a function on positive values of the probabilities, it is proportional to a delta function $\delta\bigl(\sum_i p_i - 1\bigr)$. (Notice that, in contrast to other notes, we do not include the inverse direction cosine $\sqrt{L}$ in the integration measure on the simplex, and we put the $\delta$ function that restricts to the simplex in the distribution rather than in the integration measure.)

The moment of single-trial probabilities,

$$\langle p_1^{n_1}\cdots p_L^{n_L}\rangle = \int d\mathbf{p}\, p_1^{n_1}\cdots p_L^{n_L}\,P(\mathbf{p}),$$

is the probability for any sequence in which occurrence numbers are given by the vector $\mathbf{n} = (n_1,\ldots,n_L)$. The last form of $p(\mathbf{n})$ thus writes the occurrence probability in the form of a moment of the single-trial probabilities. Notice that the occurrence probabilities for $N$ trials are determined by the $N$th-order moments of $P(\mathbf{p})$. In particular, the marginal probabilities for a single trial,

$$\langle p_i\rangle = \int d\mathbf{p}\, p_i\,P(\mathbf{p}),$$

are the first moments of $P(\mathbf{p})$.

An exchangeable probability assignment (or an exchangeable sequence) is one such that the probability for a sequence does not change under reördering; in other words, all sequences with the same occurrence vector have the same probability. Any probability on probabilities leads to an exchangeable probability assignment on the multi-trial hypothesis space. This means that there is a map from probabilities on probabilities to exchangeable probability assignments. The de Finetti representation theorem asserts that any exchangeable probability assignment corresponds to a unique probability on probabilities. Another way of putting this is that the map from probabilities on probabilities to exchangeable probability assignments is one-to-one and onto.

We can get at the uniqueness (i.e., the map is one-to-one) easily. One way to proceed is to define a characteristic function

$$\Phi(\mathbf{k}) \equiv \langle e^{i\mathbf{k}\cdot\mathbf{p}}\rangle = \int d\mathbf{p}\, e^{i\mathbf{k}\cdot\mathbf{p}}\,P(\mathbf{p}) = \sum_{n_1,\ldots,n_L} \frac{i^N}{n_1!\cdots n_L!}\,k_1^{n_1}\cdots k_L^{n_L}\,\langle p_1^{n_1}\cdots p_L^{n_L}\rangle = \sum_{n_1,\ldots,n_L} \frac{i^N}{N!}\,k_1^{n_1}\cdots k_L^{n_L}\,p(\mathbf{n}) .$$

That $P(\mathbf{p})$ is restricted to the simplex means that for vectors of the form $\mathbf{k} = k(1,\ldots,1)$, the characteristic function becomes $\Phi(\mathbf{k}) = e^{ik}$. Now it is clear why two different probabilities on probabilities cannot lead to the same exchangeable probability assignment: if they did, they would have the same characteristic function and thus, under the inverse Fourier transform, they would be the same. Another way of putting this is that the polynomials $p_1^{n_1}\cdots p_L^{n_L}$ are linearly independent and complete (but not orthogonal). Thus two different probabilities on probabilities cannot lead to the same exchangeable sequence, for if they did, they would have the same overlap with this complete set of polynomials and thus would be the same.

Showing that every exchangeable assignment corresponds to a probability on probabilities (the map is onto) requires more work. Suppose, for example, that one uses the occurrence probabilities $p(\mathbf{n})$ to define a characteristic function and then inverts the Fourier transform to get a function $P(\mathbf{p})$. The normalization of the occurrence probabilities implies that $\Phi(\mathbf{k}) = e^{ik}$ for $\mathbf{k} = k(1,\ldots,1)$, which in turn implies that $P(\mathbf{p})$ is restricted to the surface $\sum_i p_i = 1$. The difficulty is that one can't tell from this procedure that $P(\mathbf{p})$ is restricted to positive values of the probabilities (i.e., restricted to the simplex) or, even worse, that it is positive. This difficulty has to be remedied by using some other method.

The simplest proof seems to be one due to David Heath and William Sudderth [The American Statistician 30(4), 188-189 (November 1976)], which I sketch here for the case of binary alternatives, the case considered in their paper. Let $X_1, X_2, \ldots, X_M$ denote the results of $M$ trials of a binary quantity taking on values 0 and 1, and let $p(n,K)$, $K \le M$, be the probability for $n$ 1s in $K$ trials. Exchangeability guarantees that

$$p(n,K) = \binom{K}{n}\, p(X_1 = 1,\ldots,X_n = 1, X_{n+1} = 0,\ldots,X_K = 0) .$$
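The occurrence probabilities and moments defined at the start of these notes can be evaluated in closed form for some choices of $P(\mathbf{p})$. The sketch below assumes a Dirichlet distribution with parameters `alpha` (an illustrative choice, not one made in the notes), for which the moment $\langle p_1^{n_1}\cdots p_L^{n_L}\rangle$ is a ratio of gamma functions. The function names are hypothetical.

```python
from math import lgamma, exp

def log_moment(n, alpha):
    """log <p_1^{n_1} ... p_L^{n_L}> for an assumed Dirichlet(alpha) P(p)."""
    A, N = sum(alpha), sum(n)
    out = lgamma(A) - lgamma(A + N)
    for a_i, n_i in zip(alpha, n):
        out += lgamma(a_i + n_i) - lgamma(a_i)
    return out

def occurrence_probability(n, alpha):
    """p(n) = N!/(n_1! ... n_L!) times the moment of the single-trial probabilities."""
    N = sum(n)
    log_multinomial = lgamma(N + 1) - sum(lgamma(n_i + 1) for n_i in n)
    return exp(log_multinomial + log_moment(n, alpha))

def occurrence_vectors(N, L):
    """Generate every occurrence vector (n_1, ..., n_L) with sum N."""
    if L == 1:
        yield (N,)
        return
    for first in range(N + 1):
        for rest in occurrence_vectors(N - first, L - 1):
            yield (first,) + rest
```

Summing `occurrence_probability` over `occurrence_vectors(N, L)` should give 1 for any `N`, and for `N = 1` the function returns the first moments $\langle p_i\rangle$, which for the assumed Dirichlet form are `alpha[i] / sum(alpha)`.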

We can condition the probability on the right on the occurrence of $m$ 1s in all $M$ trials:

$$p(n,K) = \binom{K}{n} \sum_{m=0}^{M} p(X_1 = 1,\ldots,X_n = 1, X_{n+1} = 0,\ldots,X_K = 0\,|\,m,M)\,p(m,M) .$$

Given $m$ 1s in $M$ trials, the $\binom{M}{m}$ sequences are equally likely. Thus the situation is identical to drawing without replacement from an urn that has $m$ 1s on $M$ balls, and we have that

$$p(X_1 = 1,\ldots,X_n = 1, X_{n+1} = 0,\ldots,X_K = 0\,|\,m,M) = \frac{m}{M}\,\frac{m-1}{M-1}\cdots\frac{m-n+1}{M-n+1}\;\frac{M-m}{M-n}\cdots\frac{M-m-(K-n)+1}{M-K+1} = \frac{(m)_n\,(M-m)_{K-n}}{(M)_K},$$

where

$$(r)_q \equiv \prod_{j=0}^{q-1}(r-j) = r(r-1)\cdots(r-q+1) = \frac{r!}{(r-q)!} .$$

Therefore, we have the main result that

$$p(n,K) = \binom{K}{n} \sum_{m=0}^{M} \frac{(m)_n\,(M-m)_{K-n}}{(M)_K}\,p(m,M) .$$

The de Finetti representation theorem fails for sequences that are exchangeable for a finite number of trials: for finite exchangeable sequences that can be derived from a probability on probabilities, the probability on probabilities is not unique, and there are finite exchangeable sequences (in particular, anticorrelated sequences such as drawing from an urn without replacement) that cannot be derived from a probability on probabilities. Yet the Heath-Sudderth proof establishes that all finite exchangeable sequences can be derived from mixtures of urn probabilities.

What remains is to take the limit $M \to \infty$. We can write $p(n,K)$ as an integral

$$p(n,K) = \binom{K}{n} \int_0^1 dz\,\frac{(zM)_n\,\bigl((1-z)M\bigr)_{K-n}}{(M)_K}\,P_M(z),$$

where

$$P_M(z) = \sum_{m=0}^{M} p(m,M)\,\delta(z - m/M)$$

is a distribution concentrated at the $M$-trial frequencies $m/M$. In the limit $M \to \infty$, $P_M(z)$ converges to a continuous distribution $P_\infty(z)$ on the simplex, and the other term in the integrand goes to $z^n(1-z)^{K-n}$, giving

$$p(n,K) = \binom{K}{n} \int_0^1 dz\, z^n (1-z)^{K-n}\,P_\infty(z) .$$
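The urn-mixture result can be checked numerically. The sketch below assumes, purely for illustration, that the multi-trial probabilities come from a Beta(a, b) probability on probabilities (so that both $p(n,K)$ and $p(m,M)$ are known in closed form), and then rebuilds $p(n,K)$ from $p(m,M)$ with the falling-factorial weights. The function names are hypothetical.

```python
from math import comb, lgamma, exp

def falling(r, q):
    """Falling factorial (r)_q = r (r-1) ... (r-q+1); equals 1 when q = 0."""
    out = 1
    for j in range(q):
        out *= r - j
    return out

def beta_binomial(n, K, a, b):
    """p(n, K) for an assumed Beta(a, b) probability on probabilities."""
    def log_beta(x, y):
        return lgamma(x) + lgamma(y) - lgamma(x + y)
    return comb(K, n) * exp(log_beta(a + n, b + K - n) - log_beta(a, b))

def urn_mixture(n, K, M, p_mM):
    """Mixture over urns holding m ones on M balls, weighted by p(m, M)."""
    total = 0.0
    for m in range(M + 1):
        weight = falling(m, n) * falling(M - m, K - n) / falling(M, K)
        total += weight * p_mM[m]
    return comb(K, n) * total

def urn_weight(z, n, K, M):
    """Urn term (zM)_n ((1-z)M)_{K-n} / (M)_K; assumes z*M is an integer."""
    m = round(z * M)
    return falling(m, n) * falling(M - m, K - n) / falling(M, K)
```

With `p_mM = [beta_binomial(m, M, a, b) for m in range(M + 1)]`, `urn_mixture(n, K, M, p_mM)` should reproduce `beta_binomial(n, K, a, b)` for every `K <= M`, and `urn_weight(z, n, K, M)` should approach `z**n * (1 - z)**(K - n)` as `M` grows.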

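What we have shown is that if $p(n,K)$ is derived from an infinite exchangeable sequence, then it has a de Finetti representation in terms of a probability distribution on the simplex. The result can readily be extended to nonbinary variables. The conclusion is that a probability on probabilities is just a convenient shorthand for specifying occurrence probabilities on a multi-trial hypothesis space.

The Heath-Sudderth proof is based on the fact that if the multi-trial probabilities are derived from a probability on probabilities $P(p)$, i.e.,

$$p(n,K) = \binom{K}{n} \int_0^1 dp\, p^n (1-p)^{K-n}\,P(p),$$

then in the limit of large $K$,

$$p(n,K) = \frac{1}{K}\,P(p = n/K);$$

i.e., the probability $p(m,M)$ that in the Heath-Sudderth proof becomes the probability on probabilities is just what it ought to be.

It is interesting to investigate how much information one gains from $N$ trials about the single-trial probabilities $\mathbf{p} = (p_1,\ldots,p_L)$. This information is quantified by the mutual information

$$H(D_N;\mathbf{p}) = H(D_N) - H(D_N\,|\,\mathbf{p}),$$

where

$$H(D_N) = -\sum_{\text{sequences}} \langle p_1^{n_1}\cdots p_L^{n_L}\rangle \log\langle p_1^{n_1}\cdots p_L^{n_L}\rangle = -\sum_{n_1,\ldots,n_L} p(\mathbf{n}) \log\langle p_1^{n_1}\cdots p_L^{n_L}\rangle$$

is the Shannon information of the data gathered in $N$ trials and

$$H(D_N\,|\,\mathbf{p}) = -N \int d\mathbf{p}\, P(\mathbf{p}) \sum_i p_i \log p_i = -N \Bigl\langle \sum_i p_i \log p_i \Bigr\rangle$$

is the conditional information in the $N$-trial data, given the single-trial probabilities $\mathbf{p}$. Notice that

$$-N \sum_i \langle p_i\rangle \log\langle p_i\rangle \ge H(D_N) \ge H(D_N\,|\,\mathbf{p}),$$

where the first term is the Shannon information for $N$ trials drawn from an i.i.d. distribution governed by the single-trial marginal probabilities $\langle p_i\rangle$. The first inequality is a consequence of the subadditivity of Shannon information.

When the number of trials is small, it is hard to make general statements about the mutual information. If $P(\mathbf{p})$ is concentrated at several widely separated single-trial probabilities $\mathbf{p}$, then it takes only a few trials to begin getting information about which of the widely separated probabilities is generating the data. In contrast, suppose $P(\mathbf{p})$ is concentrated at a particular $\mathbf{p}$ within a small range $\Delta$ for each alternative. In this case it takes many trials to begin getting much information about which single-trial probabilities within the range are generating the data.

We can estimate the number of trials required in the following way, where we consider only two alternatives ($L = 2$) for simplicity. After $N$ trials, the data is able to determine $p_1$ to within an uncertainty given roughly by $\sqrt{p_1 p_2/N}$. Thus one would expect to begin getting information about the value of $p_1$ when $\sqrt{p_1 p_2/N} \simeq \Delta$, i.e., when $N \simeq p_1 p_2/\Delta^2$. As $N$ becomes even bigger, i.e., $N \gg p_1 p_2/\Delta^2$, the data is able to distinguish roughly $\Delta/\sqrt{p_1 p_2/N} = \sqrt{N\Delta^2/p_1 p_2}$ values of $p_1$, and the mutual information should be roughly the logarithm of this number of values, i.e.,

$$H(D_N;\mathbf{p}) \simeq \log\sqrt{\frac{N\Delta^2}{p_1 p_2}} .$$

We can put these considerations on a firm footing by considering the Gaussian approximation to the binomial distribution $p(\mathbf{n}\,|\,\mathbf{p})$. The Gaussian approximation requires that for each alternative $i$, the number of trials is large enough that $\sqrt{p_i/N} \ll p_i$, i.e., $N p_i \gg 1$, for all probabilities $\mathbf{p}$ that have substantial support in $P(\mathbf{p})$. If we further assume that the number of trials is large enough that the data can distinguish all the features of $P(\mathbf{p})$ (i.e., for each alternative, $P(\mathbf{p})$ does not vary significantly on the scale $\sqrt{p_i/N}$), then it is a tedious, but straightforward computation to show that

$$\langle p_1^{n_1}\cdots p_L^{n_L}\rangle = \left(\frac{2\pi}{N}\right)^{(L-1)/2} \sqrt{p_1\cdots p_L}\;p_1^{n_1}\cdots p_L^{n_L}\,P(\mathbf{p})\,\Big|_{\mathbf{p} = \mathbf{n}/N}$$

and

$$p(\mathbf{n}) = \frac{N!}{n_1!\cdots n_L!}\,\langle p_1^{n_1}\cdots p_L^{n_L}\rangle = \frac{1}{N^{L-1}}\,P(\mathbf{p} = \mathbf{n}/N),$$

which leads to a mutual information

$$H(D_N;\mathbf{p}) = -\int d\mathbf{p}\, P(\mathbf{p}) \log\bigl(P(\mathbf{p})\,V(\mathbf{p})\bigr), \tag{1}$$

where

$$V(\mathbf{p}) = \left(\frac{2\pi}{N}\right)^{(L-1)/2} \sqrt{p_1\cdots p_L}$$

is a probability-dependent volume element on the probability simplex, which can be thought of as the distinguishability volume determined by $N$ trials.

The mutual information (1) has the following interpretation: bin the probabilities $\mathbf{p}$ according to the volume element $V(\mathbf{p})$; the mutual information is the Shannon information for the discrete distribution obtained by replacing the continuous distribution $P(\mathbf{p})$ by the distribution of probabilities for the bins. Another way of saying this is that the mutual information (1) is the entropy of $P(\mathbf{p})$ relative to a position-dependent measure $m(\mathbf{p}) = 1/V(\mathbf{p})$, which describes the position-dependent distinguishability of distributions $\mathbf{p}$.

In the aforementioned example, where $P(\mathbf{p})$ is concentrated at a particular $\mathbf{p}$, each probability having a small range $\Delta$ of possible values, the mutual information (1) becomes

$$H(D_N;\mathbf{p}) = \log\frac{\Delta^{L-1}}{V(\mathbf{p})} = \log\frac{(N\Delta^2/2\pi)^{(L-1)/2}}{\sqrt{p_1\cdots p_L}},$$

which simplifies to the estimate above for $L = 2$. Actually, this example is flawed because it requires one probability, say $p_L$, to vary over a range $(L-1)\Delta$. We can do a better job of taking into account the volume on the simplex by using a Gaussian

$$P(\mathbf{p}) = \frac{\sqrt{L}}{(2\pi\Delta^2)^{(L-1)/2}} \exp\left(-\sum_i \frac{(p_i - q_i)^2}{2\Delta^2}\right),$$

in which case the mutual information (1) becomes

$$H(D_N;\mathbf{p}) = \log\frac{(2\pi e\Delta^2)^{(L-1)/2}/\sqrt{L}}{V(\mathbf{q})} = \log\frac{(Ne\Delta^2)^{(L-1)/2}}{\sqrt{L\,q_1\cdots q_L}} .$$

In the first form here, the numerator within the logarithm can be thought of as the volume occupied by the Gaussian. The $1/\sqrt{L}$ is the correction to the volume that comes from projecting onto the simplex.

Notice that another neat way to write the mutual information (1) comes from introducing a Wootters distinguishability metric

$$ds^2 = 4\sum_i \bigl(d\sqrt{p_i}\bigr)^2 = \sum_i \frac{dp_i^2}{p_i} .$$

The volume element for the Wootters metric is

$$d_W\mathbf{p} = \delta\Bigl(\sum_i \bigl(\sqrt{p_i}\bigr)^2 - 1\Bigr)\, 2^L\, d\sqrt{p_1}\cdots d\sqrt{p_L} = \delta\Bigl(\sum_i p_i - 1\Bigr) \frac{dp_1\cdots dp_L}{\sqrt{p_1\cdots p_L}} = \frac{d\mathbf{p}}{\sqrt{p_1\cdots p_L}} .$$

Redefining the probability $P(\mathbf{p})$ in terms of the Wootters metric, $P_W(\mathbf{p})\,d_W\mathbf{p} = P(\mathbf{p})\,d\mathbf{p}$, gives

$$P_W(\mathbf{p}) = \sqrt{p_1\cdots p_L}\,P(\mathbf{p}) \quad\Longrightarrow\quad P(\mathbf{p})\,V(\mathbf{p}) = \left(\frac{2\pi}{N}\right)^{(L-1)/2} P_W(\mathbf{p}),$$

and the mutual information (1) becomes

$$H(D_N;\mathbf{p}) = -\int d_W\mathbf{p}\, P_W(\mathbf{p}) \log\left[\left(\frac{2\pi}{N}\right)^{(L-1)/2} P_W(\mathbf{p})\right] .$$

Since the Wootters metric is based on distinguishability from data in many trials, the mutual information becomes the information of $P_W(\mathbf{p})$ relative to an $N$-dependent, but position-independent measure $m_W(\mathbf{p}) = (N/2\pi)^{(L-1)/2}$.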
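Through the $N$ dependence of the volume element, the large-$N$ form of the mutual information implies that multiplying $N$ by 4 should add about $\frac{L-1}{2}\log 4$ to $H(D_N;\mathbf{p})$. The sketch below checks only this growth rate, not the constant term, for $L = 2$ and a uniform $P(p)$ on $[0,1]$. The uniform choice is an assumption made here because it gives $p(n) = 1/(N+1)$ and $H(D_N\,|\,p) = N/2$ in closed form; the function names are hypothetical.

```python
from math import lgamma, log

def log_binomial(N, n):
    """Natural log of the binomial coefficient C(N, n)."""
    return lgamma(N + 1) - lgamma(n + 1) - lgamma(N - n + 1)

def mutual_information_uniform(N):
    """Exact H(D_N; p) in nats for L = 2 and an assumed uniform P(p)."""
    p_n = 1.0 / (N + 1)  # occurrence probability, the same for every n
    # H(D_N) = -sum_n p(n) log <p^n (1-p)^(N-n)>, where <...> = p(n) / C(N, n)
    H_data = -sum(p_n * (log(p_n) - log_binomial(N, n)) for n in range(N + 1))
    # H(D_N | p) = -N <p log p + (1-p) log(1-p)> = N / 2 for the uniform P(p)
    H_conditional = N / 2.0
    return H_data - H_conditional

def growth(N, factor=4):
    """Increase in mutual information when N is multiplied by `factor`."""
    return mutual_information_uniform(factor * N) - mutual_information_uniform(N)
```

For large `N`, `growth(N)` should be close to `0.5 * log(4)`, i.e. `log(2)`, with a discrepancy that shrinks as `N` increases.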