Stanford Statistics 311/Electrical Engineering 377


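I. Universal prediction and coding

  a. Game: we observe a sequence $x_1^n$ of data and want to predict (or code) it as well as if we knew the distribution of the data.

  b. Two versions: probabilistic and adversarial. In either case, let $p$ and $q$ be densities or probability mass functions (codes).

    1. The regret of the sequence $x_1^n$ is
       \[
         \mathrm{Reg}(Q, P, x_1^n) := \log\frac{1}{q(x_1^n)} - \log\frac{1}{p(x_1^n)}
         = \sum_{i=1}^n \left[ \log\frac{1}{q(x_i \mid x_1^{i-1})} - \log\frac{1}{p(x_i \mid x_1^{i-1})} \right].
       \]
       The associated maximum regret (usually just "regret") with respect to a family $\mathcal{P}$ is
       \[
         R_n^{\mathcal{X}}(Q, \mathcal{P}) := \sup_{P \in \mathcal{P},\; x_1^n \in \mathcal{X}^n} \mathrm{Reg}(Q, P, x_1^n).
       \]
       (Note: in more generality, we can have loss functions other than the log-loss up there; we will discuss this more later.)

    2. Redundancy, which is related to coding, is the expected regret under the distribution $P$. That is,
       \[
         \mathrm{Red}_n(Q, P) := E_P\left[ \log\frac{1}{q(X_1^n)} - \log\frac{1}{p(X_1^n)} \right] = D_{\mathrm{kl}}(P \,\|\, Q).
       \]
       The worst-case redundancy with respect to a class $\mathcal{P}$ is
       \[
         R_n(Q, \mathcal{P}) := \sup_{P \in \mathcal{P}} \mathrm{Red}_n(Q, P).
       \]
       We saw last time that if $q(x) = 2^{-\ell_C(x)}$, then we had the coding game.

  c. Example: filtering problem. Suppose we believe $X_i \sim N(A X_{i-1} + g, \sigma^2 I)$, where we assume $\sigma^2$ is fixed and known. Then we might look at distributions $Q$ that at iteration $i$ predict $X_i \sim N(\mu_i, \sigma^2 I)$, in which case the regret is
     \[
       \mathrm{Reg}(Q, P, x_1^n) = \sum_{i=1}^n \left[ \frac{1}{2\sigma^2} \|\mu_i - x_i\|_2^2 - \frac{1}{2\sigma^2} \|A x_{i-1} + g - x_i\|_2^2 \right],
     \]
     and the redundancy is
     \[
       \mathrm{Red}_n(Q, P) = \frac{1}{2\sigma^2} \sum_{i=1}^n E\left[ \|A X_{i-1} + g - \mu_i(X_1^{i-1})\|_2^2 \right].
     \]

  d. Minimax strategies for regret:

    1. Complexity in the regret setting: assume we have a parametric set $\mathcal{P} = \{P_\theta\}_{\theta \in \Theta}$ defined on $\mathcal{X}^n$. The complexity of the set $\mathcal{P}$ is
       \[
         \mathrm{Comp}_n(\mathcal{P}) := \log \int_{\mathcal{X}^n} \sup_{\theta \in \Theta} p_\theta(x_1^n)\, dx_1^n,
         \quad\text{or generally}\quad
         \mathrm{Comp}_n(\mathcal{P}) := \log \int_{\mathcal{X}^n} \sup_{\theta \in \Theta} p_\theta(x_1^n)\, d\mu(x_1^n).
       \]

    2. Actual value of the minimax game if $\mathrm{Comp}_n(\mathcal{P}) < \infty$: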

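       Proposition 0.1. Assume that $\mathcal{P}$ has finite complexity $\mathrm{Comp}_n(\mathcal{P}) < \infty$. Then the minimax regret is
       \[
         \inf_Q R_n^{\mathcal{X}}(Q, \mathcal{P}) = \mathrm{Comp}_n(\mathcal{P}).
       \]

       Proof. Note that if we choose the normalized maximum likelihood distribution (or Shtarkov distribution) $\bar{Q}$ to have density
       \[
         \bar{q}(x_1^n) = \frac{\sup_\theta p_\theta(x_1^n)}{\int \sup_\theta p_\theta(y_1^n)\, dy_1^n},
       \]
       then $\bar{Q}$ has constant regret:
       \[
         R_n^{\mathcal{X}}(\bar{Q}, \mathcal{P})
         = \sup_{x_1^n \in \mathcal{X}^n} \left[ \log\frac{1}{\bar{q}(x_1^n)} - \log\frac{1}{\sup_\theta p_\theta(x_1^n)} \right]
         = \sup_{x_1^n} \left[ \log\frac{\int \sup_\theta p_\theta(y_1^n)\, dy_1^n}{\sup_\theta p_\theta(x_1^n)} - \log\frac{1}{\sup_\theta p_\theta(x_1^n)} \right]
         = \mathrm{Comp}_n(\mathcal{P}).
       \]
       Moreover, for any other distribution $Q \neq \bar{Q}$ there is some $z_1^n \in \mathcal{X}^n$ satisfying $q(z_1^n) < \bar{q}(z_1^n)$, so that
       \[
         R_n^{\mathcal{X}}(Q, \mathcal{P})
         \ge \log\frac{1}{q(z_1^n)} - \log\frac{1}{\sup_\theta p_\theta(z_1^n)}
         > \log\frac{1}{\bar{q}(z_1^n)} - \log\frac{1}{\sup_\theta p_\theta(z_1^n)}
         = \mathrm{Comp}_n(\mathcal{P}),
       \]
       because $\bar{Q}$ incurs the same regret on every sequence.

    3. Example: Complexity of the Bernoulli distribution. We may compute this nearly exactly. First, we parameterize via $\theta \in [0, 1]$, so that for a sequence $x_1^n \in \{0, 1\}^n$ with $m$ nonzeros, we have for $\hat{\theta} = m/n$ that
       \[
         P_{\hat{\theta}}(x_1^n) = \hat{\theta}^m (1 - \hat{\theta})^{n - m} = \exp(-n h_2(\hat{\theta})),
       \]
       where $h_2(p) = -p \log p - (1 - p) \log(1 - p)$ is the binary entropy. Moreover, we have $P_{\hat{\theta}}(x_1^n) = \sup_{\theta \in [0,1]} P_\theta(x_1^n)$. Thus we find that
       \[
         \mathrm{Comp}_n([0, 1]) = \log \sum_{m=0}^n \binom{n}{m} e^{-n h_2(m/n)}.
       \]
       Now I cheat by using Stirling's approximation without telling you: for any $p \in (0, 1)$ with $np \in \mathbb{N}$, we have
       \[
         \binom{n}{np} \in n^{-1/2} \left[ \frac{1}{\sqrt{8 p (1 - p)}},\; \frac{1}{\sqrt{\pi p (1 - p)}} \right] \exp(n h_2(p)).
       \]
       Dealing with $m = n$ and $m = 0$ explicitly, we then obtain
       \[
         \sum_{m=0}^n \binom{n}{m} \exp\left(-n h_2\left(\tfrac{m}{n}\right)\right)
         = 2 + \sum_{m=1}^{n-1} \binom{n}{m} \exp\left(-n h_2\left(\tfrac{m}{n}\right)\right)
         \in 2 + \left[ 8^{-1/2}, \pi^{-1/2} \right] n^{1/2}
           \underbrace{\frac{1}{n} \sum_{m=1}^{n-1} \left( \frac{m}{n} \left( 1 - \frac{m}{n} \right) \right)^{-1/2}}_{\to\, \int_0^1 (\theta (1 - \theta))^{-1/2}\, d\theta}.
       \]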
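As a numerical sanity check on the Bernoulli complexity sum above, the following minimal sketch evaluates $\mathrm{Comp}_n([0,1])$ exactly and compares it with the bracket obtained from Stirling's approximation. The function names (`binary_entropy`, `bernoulli_complexity`, `stirling_bracket`) are illustrative choices and do not come from the notes; logarithms are natural.

```python
import math


def binary_entropy(p):
    """Binary entropy h_2(p) in nats, with the convention 0 log 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def bernoulli_complexity(n):
    """Exact Comp_n([0,1]) = log sum_{m=0}^n C(n, m) exp(-n h_2(m/n))."""
    total = 0.0
    for m in range(n + 1):
        # log C(n, m) via log-gamma, so large n does not overflow.
        log_binom = math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
        total += math.exp(log_binom - n * binary_entropy(m / n))
    return math.log(total)


def stirling_bracket(n):
    """Lower and upper bounds on Comp_n([0,1]) from the Stirling bracket.

    The endpoints m = 0 and m = n each contribute exactly 1, giving the '2 +'.
    """
    s = sum(((m / n) * (1.0 - m / n)) ** -0.5 for m in range(1, n))
    lower = math.log(2.0 + s / math.sqrt(8.0 * n))
    upper = math.log(2.0 + s / math.sqrt(math.pi * n))
    return lower, upper


if __name__ == "__main__":
    for n in (10, 100, 1000):
        lower, upper = stirling_bracket(n)
        comp = bernoulli_complexity(n)
        # The last column should stay bounded as n grows.
        print(n, lower, comp, upper, comp - 0.5 * math.log(n))
```

The last printed column is the gap between the exact complexity and $\frac{1}{2}\log n$, which should remain bounded as $n$ grows if the computation above is right.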

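       In particular, we have that as $n \to \infty$,
       \[
         \inf_Q R_n^{\mathcal{X}}(Q, \mathcal{P}) = \mathrm{Comp}_n([0, 1])
         = \log\left( 2 + \left[ 8^{-1/2}, \pi^{-1/2} \right] n^{1/2} \left( \int_0^1 (\theta (1 - \theta))^{-1/2}\, d\theta + o(1) \right) \right)
         = \frac{1}{2} \log n + \log \int_0^1 (\theta (1 - \theta))^{-1/2}\, d\theta + O(1),
       \]
       where the integrand $(\theta(1 - \theta))^{-1/2}$ is the square root of the Fisher information of the Bernoulli family.

II. Fisher information

  a. Fisher information: let $\{P_\theta\}_{\theta \in \Theta}$ denote a parametric family with densities $p_\theta$, where $p_\theta$ is suitably smooth in $\theta$. Define
     \[
       I_\theta := E_\theta\left[ \nabla_\theta \log p_\theta(X)\, \nabla_\theta \log p_\theta(X)^\top \right] = E_\theta\left[ \dot{\ell}_\theta \dot{\ell}_\theta^\top \right],
       \quad\text{where } \dot{\ell}_\theta = \nabla_\theta \log p_\theta(x).
     \]
     Note that with $n$ i.i.d. variables $X_1^n$, the Fisher information is $n I_\theta$.

  b. Alternate definitions: Under suitable smoothness conditions, we have
     \[
       E_\theta[\dot{\ell}_\theta] = \int p_\theta \nabla_\theta \log p_\theta\, dx = \int p_\theta \frac{\nabla_\theta p_\theta}{p_\theta}\, dx = \int \nabla_\theta p_\theta\, dx = \nabla_\theta \int p_\theta\, dx = \nabla_\theta 1 = 0.
     \]
     Also, because
     \[
       \nabla^2 \log p_\theta = \frac{\nabla^2 p_\theta}{p_\theta} - \frac{\nabla p_\theta \nabla p_\theta^\top}{p_\theta^2} = \frac{\nabla^2 p_\theta}{p_\theta} - \dot{\ell}_\theta \dot{\ell}_\theta^\top,
     \]
     we have
     \[
       I_\theta = E_\theta\left[ \dot{\ell}_\theta \dot{\ell}_\theta^\top \right]
       = -\int p_\theta \nabla^2 \log p_\theta\, dx + \int \nabla^2 p_\theta\, dx
       = -E_\theta[\nabla^2 \log p_\theta] + \nabla^2 \underbrace{\int p_\theta\, dx}_{=1}
       = -E_\theta[\nabla^2 \log p_\theta].
     \]

  c. A few example Fisher informations:

    1. Example 0.1 (Canonical exponential family): We have $\log p_\theta(x) = \langle \theta, \phi(x) \rangle - A(\theta)$, and because $\dot{\ell}_\theta = \phi(x) - \nabla A(\theta)$ and $\nabla^2 \log p_\theta = -\nabla^2 A(\theta)$, we obtain $I_\theta = \nabla^2 A(\theta)$.

    2. Example 0.2 (Two parameterizations of Bernoulli): In the canonical parameterization from the exponential family, we saw $p_\theta(x) = \exp(\theta x - \log(1 + e^\theta))$, so the Fisher information is
       \[
         I_\theta = \frac{e^\theta}{1 + e^\theta} \cdot \frac{1}{1 + e^\theta} = p(1 - p)
       \]
       under the change of variables $p = e^\theta / (1 + e^\theta)$, or $\theta = \log\frac{p}{1 - p}$. On the other hand, if $P(X = x) = p^x (1 - p)^{1 - x}$, then $\nabla_p \log P(X = x) = \frac{x}{p} - \frac{1 - x}{1 - p}$, so that
       \[
         I_p = E_p\left[ \left( \frac{X}{p} - \frac{1 - X}{1 - p} \right)^2 \right] = \frac{1}{p} + \frac{1}{1 - p} = \frac{1}{p(1 - p)}.
       \]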
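The two Bernoulli parameterizations in Example 0.2 can be checked numerically. The sketch below computes $E[\dot{\ell}^2]$ directly from the definition, taking the exact expectation over $x \in \{0, 1\}$ and approximating the score by a central finite difference. The helper names and the step size `h` are assumptions made for illustration only.

```python
import math


def log_pmf_canonical(theta, x):
    """log p_theta(x) = theta * x - log(1 + e^theta), canonical parameter."""
    return theta * x - math.log1p(math.exp(theta))


def log_pmf_mean(p, x):
    """log P(X = x) = x log p + (1 - x) log(1 - p), mean parameter."""
    return x * math.log(p) + (1 - x) * math.log(1.0 - p)


def fisher_information(log_pmf, param, h=1e-5):
    """E[(d/dparam log p(X))^2] for X in {0, 1}.

    The score is approximated by a central difference with step h
    (an illustrative choice); the expectation over X is exact.
    """
    info = 0.0
    for x in (0, 1):
        score = (log_pmf(param + h, x) - log_pmf(param - h, x)) / (2.0 * h)
        info += math.exp(log_pmf(param, x)) * score ** 2
    return info


if __name__ == "__main__":
    p = 0.3
    theta = math.log(p / (1.0 - p))
    print(fisher_information(log_pmf_canonical, theta), p * (1.0 - p))
    print(fisher_information(log_pmf_mean, p), 1.0 / (p * (1.0 - p)))
```

The two values differ by the factor $(dp/d\theta)^2 = (p(1-p))^2$, which is the usual change-of-variables rule for Fisher information in one dimension.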

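III. Fisher information: Cramér-Rao bound

  Proposition 0.2 (Cramér-Rao bound). Let $\phi : \mathbb{R}^d \to \mathbb{R}$ be an arbitrary differentiable function and assume that $T$ is unbiased for $\phi(\theta)$ under $P_\theta$. Then
  \[
    \mathrm{Var}(T) \ge \nabla\phi(\theta)^\top I_\theta^{-1} \nabla\phi(\theta).
  \]
  Immediate corollary: take $\phi(\theta) = \langle \lambda, \theta \rangle$ and vary $\lambda$, and we obtain that for any unbiased estimator $T$ for $\theta$,
  \[
    \mathrm{Var}(\langle \lambda, T \rangle) \ge \lambda^\top I_\theta^{-1} \lambda, \quad\text{or}\quad \mathrm{Cov}(T) \succeq I_\theta^{-1}.
  \]

  Proof. First, if we can move derivatives in and out wishy-washy-like,
  \[
    \mathrm{Cov}(T - \phi(\theta), \dot{\ell}_{\theta,j}) = E_\theta[(T - \phi(\theta)) \dot{\ell}_{\theta,j}] = E_\theta[T \dot{\ell}_{\theta,j}]
    = \int T(x) \frac{\partial}{\partial\theta_j} p_\theta(x)\, dx = \frac{\partial}{\partial\theta_j} \int T(x) p_\theta(x)\, dx = \frac{\partial}{\partial\theta_j} \phi(\theta).
  \]
  Now, we note that $\mathrm{Var}(T - \langle \lambda, \dot{\ell}_\theta \rangle) \ge 0$ for all $\lambda$, and using the previous equalities,
  \[
    \mathrm{Var}(T - \langle \lambda, \dot{\ell}_\theta \rangle)
    = \mathrm{Var}(T) + \lambda^\top I_\theta \lambda - 2 E_\theta[(T - \phi(\theta)) \langle \lambda, \dot{\ell}_\theta \rangle]
    = \mathrm{Var}(T) + \lambda^\top I_\theta \lambda - 2 \langle \lambda, \nabla\phi(\theta) \rangle.
  \]
  Taking $\lambda = I_\theta^{-1} \nabla\phi(\theta)$ gives
  \[
    0 \le \mathrm{Var}(T - \langle \lambda, \dot{\ell}_\theta \rangle) = \mathrm{Var}(T) - \nabla\phi(\theta)^\top I_\theta^{-1} \nabla\phi(\theta),
  \]
  and rearranging gives the result.

IV. Connections of Fisher information to divergences

  a. Example 0.3 (Divergences in exponential families): Suppose $p_\theta(x) = h(x) \exp(\langle \theta, \phi(x) \rangle - A(\theta))$. Then
     \[
       D_{\mathrm{kl}}(P_{\theta_1} \,\|\, P_{\theta_2}) = A(\theta_2) - A(\theta_1) - \langle \nabla A(\theta_1), \theta_2 - \theta_1 \rangle.
     \]
     Note that for smooth $A$, which covers all exponential family models, we thus have
     \[
       D_{\mathrm{kl}}(P_{\theta_1} \,\|\, P_{\theta_2})
       = \frac{1}{2} \langle \theta_1 - \theta_2, \nabla^2 A(\theta_1)(\theta_1 - \theta_2) \rangle + O(\|\theta_1 - \theta_2\|^3)
       = \frac{1}{2} \langle \theta_1 - \theta_2, I_{\theta_1}(\theta_1 - \theta_2) \rangle + O(\|\theta_1 - \theta_2\|^3).
     \]
     Connect with the Bregman divergence via $B_f(x, y) = f(x) - f(y) - \langle \nabla f(y), x - y \rangle$.

  b. Other sketchy result: KL divergence and Fisher information. We claim the following.

     Proposition 0.3. For appropriately smooth families,
     \[
       D_{\mathrm{kl}}(P_{\theta_1} \,\|\, P_{\theta_2}) = \frac{1}{2} \langle \theta_1 - \theta_2, I_{\theta_1}(\theta_1 - \theta_2) \rangle + o(\|\theta_1 - \theta_2\|^2).
     \]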
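As a sanity check on Example 0.3, the following minimal sketch specializes to the Bernoulli family in its canonical parameterization, where $A(\theta) = \log(1 + e^\theta)$. It compares the KL divergence computed directly from the probabilities with the Bregman form $A(\theta_2) - A(\theta_1) - A'(\theta_1)(\theta_2 - \theta_1)$ and with the quadratic Fisher approximation. All function names are illustrative and not taken from the notes.

```python
import math


def log_partition(theta):
    """A(theta) = log(1 + e^theta) for the Bernoulli family."""
    return math.log1p(math.exp(theta))


def mean_param(theta):
    """A'(theta) = e^theta / (1 + e^theta), the success probability."""
    return 1.0 / (1.0 + math.exp(-theta))


def kl_direct(theta1, theta2):
    """KL divergence computed from the two success probabilities."""
    p, q = mean_param(theta1), mean_param(theta2)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def kl_bregman(theta1, theta2):
    """A(theta2) - A(theta1) - A'(theta1) * (theta2 - theta1)."""
    return (log_partition(theta2) - log_partition(theta1)
            - mean_param(theta1) * (theta2 - theta1))


def kl_quadratic(theta1, theta2):
    """(1/2) * (theta1 - theta2)^2 * I_{theta1}, with I_theta = A''(theta) = p(1 - p)."""
    p = mean_param(theta1)
    return 0.5 * p * (1.0 - p) * (theta1 - theta2) ** 2


if __name__ == "__main__":
    print(kl_direct(0.4, -1.1), kl_bregman(0.4, -1.1))    # should agree
    print(kl_direct(0.4, 0.41), kl_quadratic(0.4, 0.41))  # close for nearby parameters
```

The first pair should agree up to floating-point error, while the second pair agrees only to leading order, with a gap that is cubic in $\theta_1 - \theta_2$.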

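     Sketch of Proof. We have
     \[
       E_{\theta_1}[\log p_{\theta_2}(X)] = E_{\theta_1}[\log p_{\theta_1}(X)]
       + \langle E_{\theta_1}[\nabla \log p_{\theta_1}(X)], \theta_2 - \theta_1 \rangle
       + \frac{1}{2} E_{\theta_1}\left[ (\theta_2 - \theta_1)^\top \nabla^2 \log p_{\theta_1}(X) (\theta_2 - \theta_1) \right]
       + E_{\theta_1}[R(\theta_1, \theta_2, X)],
     \]
     where the remainder $R(\theta_1, \theta_2, X)$ is the third derivative (tensor) evaluated at some point $\tilde{\theta}(X) = \lambda_X \theta_1 + (1 - \lambda_X) \theta_2$, where $\lambda_X \in [0, 1]$ may depend on $X$; that is,
     \[
       R(\theta_1, \theta_2, X) = \frac{1}{6} \nabla^3 \log p_{\tilde{\theta}(X)}(X) [\theta_2 - \theta_1]^{\otimes 3} = O_X(\|\theta_1 - \theta_2\|^3),
     \]
     where $O_X$ denotes a big-oh term depending on $X$. Assuming that we can move derivatives outside of integrals appropriately, we obtain
     \[
       E_{\theta_1}[\log p_{\theta_2}(X)]
       = E_{\theta_1}[\log p_{\theta_1}(X)] + \left\langle \nabla \int p_{\theta_1}(x)\, dx,\; \theta_2 - \theta_1 \right\rangle
       + \frac{1}{2} (\theta_2 - \theta_1)^\top E_{\theta_1}[\nabla^2 \log p_{\theta_1}(X)] (\theta_2 - \theta_1) + o(\|\theta_1 - \theta_2\|^2)
     \]
     \[
       = E_{\theta_1}[\log p_{\theta_1}(X)] - \frac{1}{2} (\theta_2 - \theta_1)^\top I_{\theta_1} (\theta_2 - \theta_1) + o(\|\theta_1 - \theta_2\|^2),
     \]
     since $\nabla \int p_{\theta_1}(x)\, dx = 0$ and $E_{\theta_1}[\nabla^2 \log p_{\theta_1}(X)] = -I_{\theta_1}$. Rearranging and noting that $E_{\theta_1}[\log p_{\theta_1}(X)] - E_{\theta_1}[\log p_{\theta_2}(X)] = D_{\mathrm{kl}}(P_{\theta_1} \,\|\, P_{\theta_2})$ gives the result.

     Remark: Conditions to make all this go through are those sufficient to apply Lebesgue's dominated convergence theorem. So if, for the base measure $\mu$, there exists a function $g$ such that $g(x) \ge \|\nabla_\theta f(x, \theta)\|$ for all $\theta$, where $\int g\, d\mu < \infty$, then $\nabla_\theta \int f(x, \theta)\, d\mu = \int \nabla_\theta f(x, \theta)\, d\mu$ by the mean-value theorem (note that for all $\theta_0$ we have the upper bound $\sup_{v : \|v\|_2 \le \delta} \left\| \nabla_\theta f(x, \theta) \big|_{\theta = \theta_0 + v} \right\|_2 \le g(x)$). More generally, we can handle absolutely continuous functions, which are differentiable almost everywhere.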
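Proposition 0.3 can be illustrated outside the canonical parameterization. The sketch below uses the Bernoulli family in its mean parameterization, where $I_p = 1/(p(1 - p))$, and reports the gap between the KL divergence and the quadratic form, divided by $\delta^2$ with $\delta = p_2 - p_1$. If the $o(\|\theta_1 - \theta_2\|^2)$ claim holds, this scaled gap should shrink as $\delta \to 0$. The function names and the choice $p_1 = 0.3$ are assumptions made for illustration.

```python
import math


def kl_bernoulli(p1, p2):
    """D_kl(Bernoulli(p1) || Bernoulli(p2)) in nats."""
    return p1 * math.log(p1 / p2) + (1.0 - p1) * math.log((1.0 - p1) / (1.0 - p2))


def fisher_quadratic(p1, p2):
    """(1/2) * (p1 - p2)^2 * I_{p1}, with I_p = 1 / (p (1 - p))."""
    return 0.5 * (p1 - p2) ** 2 / (p1 * (1.0 - p1))


def scaled_gap(p1, delta):
    """|D_kl - quadratic form| / delta^2 for p2 = p1 + delta."""
    p2 = p1 + delta
    return abs(kl_bernoulli(p1, p2) - fisher_quadratic(p1, p2)) / delta ** 2


if __name__ == "__main__":
    for delta in (0.1, 0.01, 0.001):
        print(delta, scaled_gap(0.3, delta))
```

For this family the scaled gap decays roughly linearly in $\delta$, which is consistent with the cubic remainder term in the proof sketch.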
