Bernoulli trials with variable probabilities - an observation by Feller


Peter Haggstrom
mathsatbondibeach@gmail.com
October 15

1 Background

In his famous book, An Introduction to Probability Theory and its Applications, Third Edition, Volume 1 [3], William Feller develops an interesting and somewhat counterintuitive result for the sum of $n$ mutually independent random variables $X_k$ such that each one assumes the values 1 and 0 with probabilities $p_k$ and $q_k = 1 - p_k$ respectively (see [3], pages 230-231). He is interested in an interpretation of the variance of the sum $S_n = X_1 + X_2 + \dots + X_n$, so he proceeds as follows.

For each $k$ the expectation of $X_k$ is $E(X_k) = p_k$ and the variance is:

\[ \mathrm{Var}(X_k) = E(X_k^2) - (E(X_k))^2 = p_k - p_k^2 = p_k q_k \tag{1} \]

If you have trouble seeing where these results come from, recall that the Bernoulli variable $X_k$ takes the value 1 with probability $p_k$ and 0 with probability $q_k$, hence $E(X_k) = 1 \cdot p_k + 0 \cdot q_k = p_k$. The variance is defined as

\[ \mathrm{Var}(X_k) = E((X_k - \mu)^2) = E(X_k^2 - 2\mu X_k + \mu^2) = E(X_k^2) - 2\mu E(X_k) + E(\mu^2) = E(X_k^2) - 2\mu^2 + \mu^2 = E(X_k^2) - \mu^2. \]

Note that $E(X_k) = \mu$ and the linearity of the expectation operator have been used. Also, the expectation of a constant, eg $\mu^2$, is just the constant itself. Finally note that $E(X_k^2) = 1^2 \cdot p_k + 0^2 \cdot q_k = p_k$.

We know that the variance of the sum of $n$ mutually independent random variables $X_k$, $S_n = \sum_{k=1}^{n} X_k$, is:

\[ \mathrm{Var}(S_n) = \sum_{k=1}^{n} \sigma_k^2 \quad \text{where } \sigma_k^2 = \mathrm{Var}(X_k) \tag{2} \]
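As a sanity check on (1) and (2), the following minimal sketch computes the mean and variance of the sum exactly by enumerating every 0/1 outcome and compares the variance with the sum of the $p_k q_k$. The function name `exact_moments` and the probabilities in `ps` are illustrative assumptions, not taken from Feller or from the text.

```python
from itertools import product
from math import isclose, prod


def exact_moments(ps):
    """Return (mean, variance) of S = X_1 + ... + X_n for independent
    Bernoulli variables with success probabilities ps, by enumerating
    all 2^n outcomes."""
    mean = 0.0
    second_moment = 0.0
    for outcome in product((0, 1), repeat=len(ps)):
        # Independence: the probability of an outcome is the product
        # of p_k (for a 1) or q_k = 1 - p_k (for a 0).
        prob = prod(p if x == 1 else 1.0 - p for p, x in zip(ps, outcome))
        s = sum(outcome)
        mean += prob * s
        second_moment += prob * s * s
    return mean, second_moment - mean ** 2


ps = [0.1, 0.5, 0.9, 0.3]  # hypothetical probabilities, chosen for illustration
mean, var = exact_moments(ps)
formula = sum(p * (1.0 - p) for p in ps)  # sum of p_k * q_k

print("E(S)   =", mean)
print("Var(S) =", var)
print("sum p_k q_k =", formula)
```

Enumeration costs $2^n$ steps, so this is only practical for small $n$; it is meant as a check on the algebra rather than as a way to compute the variance.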

A proof of (2) can be found in the Appendix. Hence, using (1) we have:

\[ \mathrm{Var}(S_n) = \sum_{k=1}^{n} \mathrm{Var}(X_k) = \sum_{k=1}^{n} p_k q_k \tag{3} \]

2 Feller's description of the issue

I will simply quote Feller at [3], page 231:

"...the variable $S_n$ may be interpreted as the total number of successes in $n$ independent trials, each of which results in success or failure. Then $p = \frac{p_1 + \dots + p_n}{n}$ is the average probability of success, and it seems natural to compare the present situation to Bernoulli trials with the constant probability of success $p$. Such a comparison leads us to a striking result. We may rewrite (3) in the form:

\[ \mathrm{Var}(S_n) = np - \sum_{k=1}^{n} p_k^2 \tag{4} \]

Next, it is easily seen (by elementary calculus or induction) that among all combinations $\{p_k\}$ such that $\sum p_k = np$ the sum $\sum p_k^2$ assumes its minimum value when all the $p_k$ are equal. It follows that, if the average probability of success $p$ is kept constant, $\mathrm{Var}(S_n)$ assumes its maximum value when $p_1 = \dots = p_n = p$. We have thus the surprising result that the variability of $p_k$, or lack of uniformity, decreases the magnitude of chance fluctuations as measured by the variance. For example, the number of fires in a community may be treated as a random variable; for a given average number, the variability is maximal if all households have the same probability of fire. Given a certain average quality $p$ of $n$ machines, the output will be least uniform if all machines are equal. (An application to modern education is obvious but hopeless.)"

3 Discussion

The final italicized comment by Feller above is indeed surprising, even given the simplifying assumptions made. Notwithstanding Feller's stature as a probabilist (he was responsible for major analytic work on the Central Limit Theorem in the 1930s), let's nevertheless satisfy ourselves that he is right. The way the issue is set up, what we have is an optimisation problem:

Minimise $\sum_{k=1}^{n} p_k^2$ subject to the constraint that $p = \frac{p_1 + \dots + p_n}{n}$ is constant, ie $\sum_{k=1}^{n} p_k = np$.
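Feller's claim can be illustrated numerically before any optimisation is done. The sketch below evaluates the variance in both forms, (3) and (4), for a few sets of probabilities that share the same average. The function names and the four configurations are hypothetical, chosen only to show the trend.

```python
def bernoulli_sum_variance(ps):
    """Var(S_n) as the sum of p_k * q_k, as in equation (3)."""
    return sum(p * (1.0 - p) for p in ps)


def variance_via_mean(ps):
    """The same variance written as n*p - sum(p_k^2), as in equation (4)."""
    n = len(ps)
    p_bar = sum(ps) / n  # average probability of success
    return n * p_bar - sum(p * p for p in ps)


# Hypothetical configurations, all with average probability 0.5.
configs = {
    "uniform": [0.5, 0.5, 0.5, 0.5],
    "mild spread": [0.4, 0.6, 0.45, 0.55],
    "wide spread": [0.1, 0.9, 0.2, 0.8],
    "extreme": [0.0, 1.0, 0.0, 1.0],
}

for name, ps in configs.items():
    print(name, "mean =", sum(ps) / len(ps),
          "variance =", bernoulli_sum_variance(ps))
```

With these inputs the variance falls from 1 for the uniform case to approximately 0.975, 0.5 and 0 as the spread of the $p_k$ grows, which is the direction Feller describes.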

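In the language of Lagrange multipliers we thus have something of the form of minimising $F(p_1, p_2, \dots, p_n)$ subject to $\varphi(p_1, p_2, \dots, p_n) = 0$. Here $F(p_1, p_2, \dots, p_n) = \sum_{k=1}^{n} p_k^2$ and $\varphi(p_1, p_2, \dots, p_n) = \sum_{k=1}^{n} p_k - np = 0$. Thus we form the auxiliary function:

\[ G(p_1, p_2, \dots, p_n) = F(p_1, p_2, \dots, p_n) + \lambda \varphi(p_1, p_2, \dots, p_n) \tag{5} \]

To find an extremum we need:

\[ \frac{\partial G}{\partial p_k} = 0 \quad \forall k \tag{6} \]

Recall that this is a necessary condition, so further investigation is needed to establish that we actually have a minimum. Making the relevant substitutions in (5) we have:

\[ G(p_1, p_2, \dots, p_n) = \sum_{k=1}^{n} p_k^2 + \lambda \Big( \sum_{k=1}^{n} p_k - np \Big) \tag{7} \]

Differentiating we get:

\[ \frac{\partial G}{\partial p_k} = 2p_k + \lambda = 0 \implies \lambda = -2p_k \quad \forall k \tag{8} \]

Now (8) only makes sense when the $p_k$ are constant, ie $p_k = p^*$ for all $k$. The constraint $\sum_{k=1}^{n} p_k - np = 0$ then becomes $np^* - np = 0$. Therefore $p^* - p = 0$ and so $p^* = p = \frac{p_1 + \dots + p_n}{n}$.

To see that $p = \frac{p_1 + \dots + p_n}{n} = p_k \ \forall k$ actually minimises $\sum_{k=1}^{n} p_k^2$, perturb two of the $p_k$ as follows. Without loss of generality we can relabel the $p_k$ so that $p_1 \ge p_2 \ge \dots \ge p_n$. Let:

\[ p_1' = p_1 - \epsilon, \qquad p_2' = p_2 + \epsilon \tag{9} \]

where $\epsilon > 0$. The other values of $p_k$ remain the same, ie $p_k' = p_k$ for $k \ne 1, 2$. Hence $\sum_{k=1}^{n} p_k' = \sum_{k=1}^{n} p_k = np$, ie it is constant. Thus:

\[ \sum_{k=1}^{n} p_k'^2 - \sum_{k=1}^{n} p_k^2 = \big( (p_1 - \epsilon)^2 + (p_2 + \epsilon)^2 + p_3^2 + \dots + p_n^2 \big) - \sum_{k=1}^{n} p_k^2 = 2\epsilon^2 + 2\epsilon (p_2 - p_1) < 0 \tag{10} \]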

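Here we note the assumption that $p_1 \ge p_2$. Strictly, the inequality in (10) requires $p_1 > p_2$ and $0 < \epsilon < p_1 - p_2$; such a pair can be found whenever the $p_k$ are not all equal. Thus $\sum_{k=1}^{n} p_k'^2 < \sum_{k=1}^{n} p_k^2$, so any configuration with unequal $p_k$ can be improved upon and we do indeed have a minimum when all the $p_k$ are equal.

In two dimensions we can satisfy ourselves that $p = \frac{p_1 + p_2}{2}$ minimises $\sum p_k^2$ and hence maximises the variance defined in (4) as $\mathrm{Var}(S_n) = np - \sum_{k=1}^{n} p_k^2$. Let:

\[ f(p_1, p_2) = 2 \Big( \frac{p_1 + p_2}{2} \Big) - \sum_{k=1}^{2} p_k^2 = p_1 + p_2 - \sum_{k=1}^{2} p_k^2 \tag{11} \]

Recall that if $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable at $\mathbf{p}$ with continuous second partials at all points sufficiently close to $\mathbf{p}$ and $\nabla f(\mathbf{p}) = 0$, then, provided the determinant of the Hessian at $\mathbf{p}$ is strictly positive, $\frac{\partial^2 f}{\partial x^2} > 0$ at $\mathbf{p}$ implies a local minimum at $\mathbf{p}$, while $\frac{\partial^2 f}{\partial x^2} < 0$ implies a local maximum. The Hessian is defined as follows:

\[ H(p_1, p_2) = \begin{pmatrix} \dfrac{\partial^2 f}{\partial p_1^2} & \dfrac{\partial^2 f}{\partial p_1 \partial p_2} \\[2ex] \dfrac{\partial^2 f}{\partial p_2 \partial p_1} & \dfrac{\partial^2 f}{\partial p_2^2} \end{pmatrix} \tag{12} \]

Of course the assumption of continuous second partials means that the mixed partials are equal, and (12) gives the structure in the general case. Thus:

\[ \frac{\partial^2 f}{\partial p_2 \partial p_1} = \frac{\partial^2 f}{\partial p_1 \partial p_2} \quad \text{but} \quad \det H(p_1, p_2) > 0 \tag{13} \]

For the $f$ in (11) the mixed partials vanish and $\frac{\partial^2 f}{\partial p_1^2} = \frac{\partial^2 f}{\partial p_2^2} = -2$, so $\det H(p_1, p_2) = 4$. Note that the extremum is given by:

\[ \nabla f(p_1, p_2) = 0: \quad \frac{\partial f}{\partial p_1} = 1 - 2p_1 = 0, \quad \frac{\partial f}{\partial p_2} = 1 - 2p_2 = 0 \implies p_1 = p_2 = \frac{1}{2} \tag{14} \]

Since the second partials are negative, $f$ (the variance) has a local maximum at $p = \frac{p_1 + p_2}{2} = \frac{1}{2}$; equivalently, $\sum_{k=1}^{2} p_k^2 - (p_1 + p_2)$ has a local minimum there. In $n$ dimensions a local minimum exists where the determinant of the Hessian is non-zero and the eigenvalues of the Hessian at the extremum are all positive. See [2, pages 60-6] for a discussion.

An inductive proof (alluded to by Feller) is perhaps more satisfying in showing that $p = \frac{p_1 + \dots + p_n}{n}$ minimises $\sum_{k=1}^{n} p_k^2$. The base case of $n = 1$ is trivial (and bereft of useful insight) so we consider $n = 2$. Thus: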

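\[ \sum_{k=1}^{2} p_k^2 - 2p^2 = p_1^2 + p_2^2 - 2 \Big( \frac{p_1 + p_2}{2} \Big)^2 = \frac{p_1^2 - 2p_1 p_2 + p_2^2}{2} = \frac{(p_1 - p_2)^2}{2} \ge 0 \tag{15} \]

Thus $\sum_{k=1}^{2} p_k^2 \ge 2p^2$, with equality when $p_1 = p_2$, so $2p^2$ is minimal. Now for $n + 1$ we have, where $\hat{p} = \frac{p_1 + \dots + p_n + p_{n+1}}{n+1} = \frac{np + p_{n+1}}{n+1}$:

\[
\begin{aligned}
\sum_{k=1}^{n+1} p_k^2 - (n+1)\hat{p}^2
&= \sum_{k=1}^{n} p_k^2 + p_{n+1}^2 - (n+1) \Big( \frac{np + p_{n+1}}{n+1} \Big)^2 \\
&= \sum_{k=1}^{n} p_k^2 + p_{n+1}^2 - \frac{(np + p_{n+1})^2}{n+1} \\
&\ge np^2 + p_{n+1}^2 - \frac{n^2 p^2 + 2np\,p_{n+1} + p_{n+1}^2}{n+1} \\
&= \frac{n(n+1)p^2 + (n+1)p_{n+1}^2 - n^2 p^2 - 2np\,p_{n+1} - p_{n+1}^2}{n+1} \\
&= \frac{n^2 p^2 + np^2 + np_{n+1}^2 + p_{n+1}^2 - n^2 p^2 - 2np\,p_{n+1} - p_{n+1}^2}{n+1} \\
&= \frac{np^2 - 2np\,p_{n+1} + np_{n+1}^2}{n+1} \\
&= \frac{n (p - p_{n+1})^2}{n+1} \ge 0
\end{aligned}
\tag{16}
\]

The inequality in the third line uses the induction hypothesis that $\sum_{k=1}^{n} p_k^2 \ge np^2$, where $np = \sum_{k=1}^{n} p_k$ is constant. Thus $\sum_{k=1}^{n+1} p_k^2 \ge (n+1)\hat{p}^2$ and so the proposition is established by induction.

There is an analogy between this problem and that of finding the discrete probability distribution on the points $\{p_1, p_2, \dots, p_n\}$ with maximal information entropy. Thus we need to maximise the Shannon entropy defined by:

\[ f(p_1, \dots, p_n) = -\sum_{k=1}^{n} p_k \log_2 p_k \tag{17} \]

For there to be a legitimate probability distribution we need the following constraint:

\[ g(p_1, \dots, p_n) = \sum_{k=1}^{n} p_k = 1 \tag{18} \]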
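To make the entropy objective concrete, here is a small sketch that evaluates the Shannon entropy for a few candidate distributions on four points. It assumes base-2 logarithms and the usual convention that a zero probability contributes nothing to the sum; the function name and the candidate distributions are hypothetical.

```python
from math import isclose, log2


def shannon_entropy(ps):
    """Entropy in bits, -sum p_k * log2(p_k), for a discrete distribution.

    Terms with p_k = 0 are skipped (convention 0 * log2(0) = 0). The
    probabilities must satisfy the normalisation constraint sum p_k = 1.
    """
    if not isclose(sum(ps), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * log2(p) for p in ps if p > 0.0)


# Hypothetical candidate distributions on n = 4 points.
candidates = {
    "uniform": [0.25, 0.25, 0.25, 0.25],
    "skewed": [0.7, 0.1, 0.1, 0.1],
    "two-point": [0.5, 0.5, 0.0, 0.0],
    "degenerate": [1.0, 0.0, 0.0, 0.0],
}

for name, ps in candidates.items():
    print(name, shannon_entropy(ps))
```

Among these candidates the uniform distribution gives the largest entropy, $\log_2 4 = 2$ bits. That is only a spot check on a handful of points, not a proof of maximality.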

Using Lagrange multipliers we proceed by forming the auxiliary equation:

\[ f(p_1, \dots, p_n) + \lambda \big( g(p_1, \dots, p_n) - 1 \big) \tag{19} \]

and so we require:

\[ \frac{\partial}{\partial p_k} \Big\{ -\sum_{j=1}^{n} p_j \log_2 p_j + \lambda \Big( \sum_{j=1}^{n} p_j - 1 \Big) \Big\} = 0 \quad \text{evaluated at } p = p^* \tag{20} \]

There are $n$ separate equations in (20), and when we carry out the differentiation for $k = 1, 2, \dots, n$ we get:

\[
\begin{aligned}
0 &= -\log_2 p_k - p_k \frac{\partial}{\partial p_k} (\log_2 p_k) + \lambda \\
&= -\log_2 p_k - p_k \frac{\partial}{\partial p_k} \Big( \frac{\ln p_k}{\ln 2} \Big) + \lambda \\
&= -\Big( \log_2 p_k + \frac{1}{\ln 2} \Big) + \lambda \\
\implies \lambda &= \log_2 p_k + \frac{1}{\ln 2}
\end{aligned}
\tag{21}
\]

The last line of (21) implies that all the $p_k$ are equal because they depend solely on $\lambda$. Thus $p_k = p^*$ for all $k$, but the constraint $\sum_{k=1}^{n} p_k = 1$ means that $np^* = 1$. Thus the required distribution is uniform with probability $p_k = \frac{1}{n}$. More detail on optimisation theory applying to entropy maximisation can be found in [4, pages -8].

Feller also notes at [3, page 282] that $S_n$ has a Poisson distribution in the limit, ie where the $p_k$ depend on $n$ in such a way that the largest $p_k$ tends to zero, but the sum $p_1 + p_2 + \dots + p_n = \lambda$ remains constant. This result is derived using probability generating function techniques.

4 Application to investment and MOOCs

In the investment world each asset manager invests in a variety of assets, eg domestic shares, international shares, bonds etc, and their individual performance for each asset class can be measured. However, in so doing one has to be aware of the games that the asset managers play. For instance, they can pick benchmarks that are relatively easy to exceed. They can play with how they disclose performance, eg pre-tax and fees versus after tax and fees. Frequently performance is related to the performance of the median

manager for that asset class, ie 50% of the managers will have performance higher than that manager. Feller's comment that "given a certain average quality $p$ of $n$ machines, the output will be least uniform if all machines are equal" clearly has fundamental relevance to the performance of asset managers. If we found that the average probability of a universe of asset managers exceeding some asset class benchmark was 70%, say, the performance of this group of managers would be least uniform if all the managers beat the benchmark with this probability. Given the propensity for herd-like behaviour among asset managers in some contexts, this theoretical proposition could assume more practical significance. I am not aware of any empirical work on this specific issue. This principle also has important implications for massive on-line open courses (MOOCs).

5 Appendix

To prove equation (2) we suppose that $X_1, \dots, X_n$ are mutually independent random variables with finite variances $\sigma_1^2, \dots, \sigma_n^2$ and $S_n = X_1 + \dots + X_n$. We let $\mu_k = E[X_k]$ and $m_n = \mu_1 + \dots + \mu_n = E[S_n]$. Then $S_n - m_n = \sum_{k=1}^{n} (X_k - \mu_k)$. Then:

\[ (S_n - m_n)^2 = \Big( \sum_{j=1}^{n} (X_j - \mu_j) \Big) \Big( \sum_{k=1}^{n} (X_k - \mu_k) \Big) = \sum_{k=1}^{n} (X_k - \mu_k)^2 + 2 \sum_{j<k} (X_j - \mu_j)(X_k - \mu_k) \tag{22} \]

Taking expectations of both sides and using the linear properties of the expectation operator we get:

\[
\begin{aligned}
\mathrm{Var}(S_n) &= E\big[ (S_n - m_n)^2 \big] \\
&= E\Big[ \sum_{k=1}^{n} (X_k - \mu_k)^2 + 2 \sum_{j<k} (X_j - \mu_j)(X_k - \mu_k) \Big] \\
&= E\Big[ \sum_{k=1}^{n} (X_k - \mu_k)^2 \Big] + 2 E\Big[ \sum_{j<k} (X_j - \mu_j)(X_k - \mu_k) \Big] \\
&= \sum_{k=1}^{n} E\big[ (X_k - \mu_k)^2 \big] + 2 \sum_{j<k} E\big[ (X_j - \mu_j)(X_k - \mu_k) \big] \\
&= \sum_{k=1}^{n} \sigma_k^2 + 2 \sum_{j<k} \mathrm{Cov}(X_j, X_k)
\end{aligned}
\tag{23}
\]
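The decomposition of the variance into a sum of individual variances plus twice the sum of covariances holds for any joint distribution with finite variances, not only for independent variables. The sketch below checks it exactly on two small joint distributions of 0/1 variables: an independent pair and a fully dependent pair with $X_2 = X_1$. The function name and both distributions are hypothetical, chosen for illustration.

```python
from itertools import combinations
from math import isclose


def variance_decomposition(joint):
    """Given a joint pmf {(x_1, ..., x_n): probability}, return the triple
    (Var(S), sum of Var(X_k), 2 * sum over j<k of Cov(X_j, X_k))."""
    n = len(next(iter(joint)))

    def expect(g):
        # Expectation of g(X_1, ..., X_n) under the joint pmf.
        return sum(prob * g(x) for x, prob in joint.items())

    mu = [expect(lambda x, k=k: x[k]) for k in range(n)]
    m = sum(mu)  # E[S]
    var_s = expect(lambda x: (sum(x) - m) ** 2)
    var_sum = sum(
        expect(lambda x, k=k: (x[k] - mu[k]) ** 2) for k in range(n)
    )
    cov_sum = sum(
        expect(lambda x, j=j, k=k: (x[j] - mu[j]) * (x[k] - mu[k]))
        for j, k in combinations(range(n), 2)
    )
    return var_s, var_sum, 2.0 * cov_sum


# Independent pair with p_1 = 0.3 and p_2 = 0.6 (hypothetical values).
independent = {
    (0, 0): 0.7 * 0.4, (0, 1): 0.7 * 0.6,
    (1, 0): 0.3 * 0.4, (1, 1): 0.3 * 0.6,
}
# Fully dependent pair: X_2 = X_1 with success probability 0.3.
dependent = {(0, 0): 0.7, (1, 1): 0.3}

for name, joint in (("independent", independent), ("dependent", dependent)):
    var_s, var_sum, twice_cov = variance_decomposition(joint)
    print(name, var_s, var_sum, twice_cov)
```

In the independent case the covariance term is zero up to rounding and the variance reduces to the sum of the individual variances. In the dependent case the covariance term is as large as the variance term, which shows why independence is needed for (2).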

In (23) the term $\sum_{j<k} \mathrm{Cov}(X_j, X_k) = 0$ because the $X_j, X_k$ are mutually independent. This fundamental fact about the covariance is established as follows:

\[
\begin{aligned}
\mathrm{Cov}(X_j, X_k) &= E\big[ (X_j - \mu_j)(X_k - \mu_k) \big] \\
&= E\big[ X_j X_k - X_j \mu_k - X_k \mu_j + \mu_j \mu_k \big] \\
&= E[X_j X_k] - E[X_j \mu_k] - E[X_k \mu_j] + E[\mu_j \mu_k] \\
&= E[X_j] E[X_k] - \mu_k E[X_j] - \mu_j E[X_k] + \mu_j \mu_k \quad \text{using independence, } E[XY] = E[X] E[Y] \\
&= \mu_j \mu_k - \mu_k \mu_j - \mu_j \mu_k + \mu_j \mu_k = 0
\end{aligned}
\tag{24}
\]

6 References

1. Stephen Boyd and Lieven Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
2. David M Bressoud, Second Year Calculus: From Celestial Mechanics to Special Relativity, Springer.
3. William Feller, An Introduction to Probability Theory and its Applications, Third Edition, Volume 1, Wiley.
4. Claude E. Shannon (July, October 1948), A Mathematical Theory of Communication, Bell System Technical Journal 27 (3): bstj/vol7-1948/articles/bstj pdf
