Learning with Maximum Likelihood


Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the link to the source repository of Andrew's tutorials. Comments and corrections gratefully received.

Andrew W. Moore
Professor, School of Computer Science, Carnegie Mellon University
awm@cs.cmu.edu
Copyright 2001, 2004, Andrew W. Moore. Sep 6th, 2001. Maximum Likelihood: Slide 1

Maximum Likelihood learning of Gaussians for Data Mining
- Why we should care
- Learning Univariate Gaussians
- Learning Multivariate Gaussians
- What is a biased estimator?
- Bayesian Learning of Gaussians

Why we should care

Maximum Likelihood Estimation is a very, very, very, very fundamental part of data analysis.
MLE for Gaussians is training wheels for our future techniques.
Learning Gaussians is more useful than you might guess.

Learning Gaussians from Data

Suppose you have x_1, x_2, ..., x_R ~ (i.i.d.) N(μ, σ²).
But you don't know μ (you do know σ²).
MLE: For which μ is x_1, x_2, ..., x_R most likely?
MAP: Which μ maximizes p(μ | x_1, x_2, ..., x_R, σ²)?

Learning Gaussians from Data

Suppose you have x_1, x_2, ..., x_R ~ (i.i.d.) N(μ, σ²).
But you don't know μ (you do know σ²). <-- Sneer
MLE: For which μ is x_1, x_2, ..., x_R most likely?
MAP: Which μ maximizes p(μ | x_1, x_2, ..., x_R, σ²)?
Despite this, we'll spend 95% of our time on MLE. Why? Wait and see...

MLE for univariate Gaussian

Suppose you have x_1, x_2, ..., x_R ~ (i.i.d.) N(μ, σ²).
But you don't know μ (you do know σ²).
MLE: For which μ is x_1, x_2, ..., x_R most likely?

μ_mle = argmax_μ p(x_1, x_2, ..., x_R | μ, σ²)

Algebra Euphoria

μ_mle = argmax_μ p(x_1, x_2, ..., x_R | μ, σ²)
      = ?   (by i.i.d.)
      = ?   (monotonicity of log)
      = ?   (plug in formula for Gaussian)
      = ?   (after simplification)

Algebra Euphoria

μ_mle = argmax_μ p(x_1, x_2, ..., x_R | μ, σ²)
      = argmax_μ Π_i p(x_i | μ, σ²)                              (by i.i.d.)
      = argmax_μ Σ_i log p(x_i | μ, σ²)                          (monotonicity of log)
      = argmax_μ Σ_i [ -log(σ√(2π)) - (x_i - μ)²/(2σ²) ]         (plug in formula for Gaussian)
      = argmin_μ Σ_i (x_i - μ)²                                  (after simplification)

Intermission: A General Scalar MLE strategy

Task: Find MLE θ assuming known form for p(Data | θ, stuff).
1. Write LL = log P(Data | θ, stuff).
2. Work out ∂LL/∂θ using high-school calculus.
3. Set ∂LL/∂θ = 0 for a maximum, creating an equation in terms of θ.
4. Solve it.*
5. Check that you've found a maximum rather than a minimum or saddle-point, and be careful if θ is constrained.

*This is a perfect example of something that works perfectly in all textbook examples and usually involves surprising pain if you need it for something new.

The MLE μ

μ_mle = argmax_μ p(x_1, x_2, ..., x_R | μ, σ²)
      = argmin_μ Σ_i (x_i - μ)²
      = μ  s.t.  0 = ∂/∂μ Σ_i (x_i - μ)² = -2 Σ_i (x_i - μ)   (what?)

Thus Σ_i (x_i - μ) = 0, and so

μ_mle = (1/R) Σ_i x_i

Laws-a-lawdy!

The best estimate of the mean of a distribution is the mean of the sample!

At first sight: this kind of pedantic, algebra-filled and ultimately unsurprising fact is exactly the reason people throw down their Statistics book and pick up their "Agent-Based Evolutionary Data Mining Using The Neuro-Fuzzy Transform" book.

A General MLE strategy

Suppose θ = (θ_1, θ_2, ..., θ_n)^T is a vector of parameters.
Task: Find MLE θ assuming known form for p(Data | θ, stuff).
1. Write LL = log P(Data | θ, stuff).
2. Work out ∂LL/∂θ using high-school calculus:
   ∂LL/∂θ = ( ∂LL/∂θ_1, ∂LL/∂θ_2, ..., ∂LL/∂θ_n )^T

A General MLE strategy

Suppose θ = (θ_1, θ_2, ..., θ_n)^T is a vector of parameters.
Task: Find MLE θ assuming known form for p(Data | θ, stuff).
1. Write LL = log P(Data | θ, stuff).
2. Work out ∂LL/∂θ using high-school calculus.
3. Solve the set of simultaneous equations
   ∂LL/∂θ_1 = 0, ∂LL/∂θ_2 = 0, ..., ∂LL/∂θ_n = 0
4. Check that you're at a maximum.

A General MLE strategy

1. Write LL = log P(Data | θ, stuff).
2. Work out ∂LL/∂θ using high-school calculus.
3. Solve the set of simultaneous equations ∂LL/∂θ_1 = 0, ..., ∂LL/∂θ_n = 0.
   If you can't solve them, what should you do?
4. Check that you're at a maximum.

MLE for univariate Gaussian

Suppose you have x_1, x_2, ..., x_R ~ (i.i.d.) N(μ, σ²).
But you don't know μ or σ².
MLE: For which θ = (μ, σ²) is x_1, x_2, ..., x_R most likely?

log p(x_1, x_2, ..., x_R | μ, σ²) = -(R/2)(log 2π + log σ²) - (1/(2σ²)) Σ_i (x_i - μ)²

∂LL/∂μ  = (1/σ²) Σ_i (x_i - μ)
∂LL/∂σ² = -R/(2σ²) + (1/(2σ⁴)) Σ_i (x_i - μ)²
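When step 3's simultaneous equations can't be solved in closed form, the usual fallback is numerical optimization: climb the log-likelihood along its gradient. A minimal sketch (function names, start point and learning rate are illustrative choices, not from the slides) for the univariate Gaussian with known σ², where we can compare against the closed-form answer:

```python
import numpy as np

def gaussian_ll(x, mu, sigma2):
    # LL = -(R/2)(log 2*pi + log sigma^2) - (1/(2 sigma^2)) * sum_i (x_i - mu)^2
    R = len(x)
    return (-0.5 * R * (np.log(2 * np.pi) + np.log(sigma2))
            - np.sum((x - mu) ** 2) / (2 * sigma2))

def mle_mu_gradient_ascent(x, sigma2, lr=0.1, steps=200):
    # dLL/dmu = sum_i (x_i - mu) / sigma^2; ascend it from an arbitrary start.
    mu = 0.0
    for _ in range(steps):
        grad = np.sum(x - mu) / sigma2
        mu += lr * sigma2 * grad / len(x)  # step scaled by R for stable convergence
    return mu

rng = np.random.default_rng(0)
x = rng.normal(3.0, 1.0, size=100)
mu_hat = mle_mu_gradient_ascent(x, sigma2=1.0)
# mu_hat should agree with the closed-form MLE, the sample mean
```

The scaled step makes each update move a fixed fraction of the way toward the sample mean, so the iterates converge geometrically to the same answer the algebra gives.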

MLE for univariate Gaussian

MLE: For which θ = (μ, σ²) is x_1, x_2, ..., x_R most likely?

∂LL/∂μ  = (1/σ²) Σ_i (x_i - μ) = 0
∂LL/∂σ² = -R/(2σ²) + (1/(2σ⁴)) Σ_i (x_i - μ)² = 0

The first equation gives μ_mle = (1/R) Σ_i x_i. And σ²? (what?)

MLE for univariate Gaussian

Solving the two equations:

μ_mle  = (1/R) Σ_i x_i
σ²_mle = (1/R) Σ_i (x_i - μ_mle)²

Unbiased Estimators

An estimator of a parameter is unbiased if the expected value of the estimate is the same as the true value of the parameter.
If x_1, x_2, ..., x_R ~ (i.i.d.) N(μ, σ²) then E[μ_mle] = E[(1/R) Σ_i x_i] = μ.
μ_mle is unbiased.
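Both closed-form estimates are one-liners. A quick sketch (the sample size and distribution parameters are arbitrary illustrative choices) checking them against numpy's built-ins, which use the same divide-by-R convention by default:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(5.0, 2.0, size=1000)  # R = 1000 draws from N(mu=5, sigma^2=4)

R = len(x)
mu_mle = np.sum(x) / R                       # the mean of the sample
sigma2_mle = np.sum((x - mu_mle) ** 2) / R   # note: divide by R, not R-1

# numpy's defaults compute exactly these quantities:
assert np.isclose(mu_mle, x.mean())
assert np.isclose(sigma2_mle, x.var())       # np.var defaults to ddof=0, i.e. the MLE
```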

Biased Estimators

An estimator of a parameter is biased if the expected value of the estimate is different from the true value of the parameter.
If x_1, x_2, ..., x_R ~ (i.i.d.) N(μ, σ²) then
E[σ²_mle] = E[ (1/R) Σ_i (x_i - μ_mle)² ] ≠ σ².
σ²_mle is biased.

MLE Variance Bias

If x_1, x_2, ..., x_R ~ (i.i.d.) N(μ, σ²) then
E[σ²_mle] = (1 - 1/R) σ² ≠ σ².
Intuition check: consider the case of R = 1.
Why should our guts expect that σ²_mle would be an underestimate of true σ²?
How could you prove that?

Unbiased estimate of Variance

If x_1, x_2, ..., x_R ~ (i.i.d.) N(μ, σ²) then E[σ²_mle] = (1 - 1/R) σ².
So define σ²_unbiased = σ²_mle / (1 - 1/R). Then E[σ²_unbiased] = σ².

σ²_unbiased = 1/(R-1) Σ_i (x_i - μ_mle)²
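The bias is easy to see numerically. A small sketch, estimating E[σ²_mle] by simulation for R = 10 (the trial count is just an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)
R = 10  # small sample size, so the bias factor (1 - 1/R) = 0.9 is visible

# Average the MLE variance over many independent samples of size R
# drawn from N(0, 1), i.e. true sigma^2 = 1.
trials = np.array([rng.normal(0.0, 1.0, size=R).var(ddof=0)
                   for _ in range(20000)])

# The average is close to (1 - 1/R) * sigma^2 = 0.9, not 1.0;
# dividing by R-1 instead (numpy's ddof=1) removes the bias.
```

numpy exposes both estimators through the `ddof` argument: `var(ddof=0)` is σ²_mle and `var(ddof=1)` is σ²_unbiased.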

Unbiasedtude discussion

Which is best?
σ²_mle = (1/R) Σ_i (x_i - μ_mle)²
σ²_unbiased = 1/(R-1) Σ_i (x_i - μ_mle)²
Answer: It depends on the task. And it doesn't make much difference once R --> large.

Don't get too excited about being unbiased

Assume x_1, x_2, ..., x_R ~ (i.i.d.) N(μ, σ²).
Suppose we had these estimators for the mean:
μ_suboptimal = (1/(R+7)) Σ_i x_i
μ_crap = x_1
Are either of these unbiased?
Will either of them asymptote to the correct value as R gets large?
Which is more useful?

MLE for m-dimensional Gaussian

Suppose you have x_1, x_2, ..., x_R ~ (i.i.d.) N(μ, Σ).
But you don't know μ or Σ.
MLE: For which θ = (μ, Σ) is x_1, x_2, ..., x_R most likely?

μ_mle = (1/R) Σ_i x_i
Σ_mle = (1/R) Σ_i (x_i - μ_mle)(x_i - μ_mle)^T

Componentwise: μ_k^mle = (1/R) Σ_i x_ik, where 1 ≤ k ≤ m, x_ik is the value of the k-th component of x_i (the k-th attribute of the i-th record), and μ_k^mle is the k-th component of μ_mle.

MLE for m-dimensional Gaussian

σ_kj^mle = (1/R) Σ_i (x_ik - μ_k^mle)(x_ij - μ_j^mle)

where 1 ≤ k ≤ m, 1 ≤ j ≤ m, x_ik is the value of the k-th component of x_i (the k-th attribute of the i-th record), and σ_kj^mle is the (k,j)-th component of Σ_mle.

Σ_mle = (1/R) Σ_i (x_i - μ_mle)(x_i - μ_mle)^T

Note how Σ_mle is forced to be symmetric non-negative definite.
Q: How would you prove this? A: Just plug through the MLE recipe.
Note the unbiased case:
Σ_unbiased = 1/(R-1) Σ_i (x_i - μ_mle)(x_i - μ_mle)^T
How many datapoints would you need before the Gaussian has a chance of being non-degenerate?
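As a sketch (the dataset, dimensions, and parameter values are invented for illustration), the m-dimensional estimates in numpy, with the symmetry and non-negative-definiteness observations from the slide checked explicitly:

```python
import numpy as np

rng = np.random.default_rng(3)
true_mu = np.array([1.0, -2.0])
true_Sigma = np.array([[2.0, 0.6],
                       [0.6, 1.0]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=500)  # R x m data matrix

R = len(X)
mu_mle = X.mean(axis=0)               # componentwise sample means
centered = X - mu_mle
Sigma_mle = centered.T @ centered / R         # MLE: divide by R
Sigma_unb = centered.T @ centered / (R - 1)   # unbiased: divide by R-1

# Sigma_mle is symmetric non-negative definite by construction:
assert np.allclose(Sigma_mle, Sigma_mle.T)
assert np.all(np.linalg.eigvalsh(Sigma_mle) >= -1e-10)
```

The outer-product form `centered.T @ centered` computes Σ_i (x_i − μ)(x_i − μ)^T in one matrix multiply; `np.cov(X, rowvar=False, bias=True)` gives the same MLE answer.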

Confidence intervals

We need to talk. We need to discuss how accurate we expect μ_mle and Σ_mle to be as a function of R.
And we need to consider how to estimate these accuracies from data:
- Analytically*
- Non-parametrically (using randomization and bootstrapping)*
But we won't. Not yet.
*Will be discussed in future Andrew lectures, just before we need this technology.

Structural error

Actually, we need to talk about something else too. What if we do all this analysis when the true distribution is in fact not Gaussian?
How can we tell?*
How can we survive?*
*Will be discussed in future Andrew lectures, just before we need this technology.

Gaussian MLE in action

Using R = 392 cars from the "MPG" UCI dataset supplied by Ross Quinlan.

Data-starved Gaussian MLE

Using three subsets of MPG. Each subset has 6 randomly-chosen cars.

Bivariate MLE in action

Multivariate MLE

Covariance matrices are not exciting to look at.

Being Bayesian: MAP estimates for Gaussians

Suppose you have x_1, x_2, ..., x_R ~ (i.i.d.) N(μ, Σ).
But you don't know μ or Σ.
MAP: Which (μ, Σ) maximizes p(μ, Σ | x_1, x_2, ..., x_R)?

Step 1: Put a prior on (μ, Σ).
Step 1a: Put a prior on Σ:
(ν_0 - m - 1) Σ ~ IW(ν_0, (ν_0 - m - 1) Σ_0)
This thing is called the Inverse-Wishart distribution. A PDF over SPD (symmetric positive definite) matrices!

Being Bayesian: MAP estimates for Gaussians

ν_0 small: "I am not sure about my guess of Σ_0."
ν_0 large: "I'm pretty sure about my guess of Σ_0."
Σ_0: (roughly) my best guess of Σ. E[Σ] = Σ_0.

Step 1: Put a prior on (μ, Σ).
Step 1a: Put a prior on Σ: (ν_0 - m - 1) Σ ~ IW(ν_0, (ν_0 - m - 1) Σ_0)
Step 1b: Put a prior on μ | Σ: μ | Σ ~ N(μ_0, Σ / κ_0)
Together, Steps 1a and 1b define a joint distribution on (μ, Σ).

Being Bayesian: MAP estimates for Gaussians

κ_0 small: "I am not sure about my guess of μ_0."
κ_0 large: "I'm pretty sure about my guess of μ_0."
μ_0: my best guess of μ. E[μ] = μ_0.
Notice how we are forced to express our ignorance of μ proportionally to Σ.

Why do we use this form of prior?
Actually, we don't have to. But it is computationally and algebraically convenient: it is a conjugate prior.

Being Bayesian: MAP estimates for Gaussians

Suppose you have x_1, x_2, ..., x_R ~ (i.i.d.) N(μ, Σ).
MAP: Which (μ, Σ) maximizes p(μ, Σ | x_1, x_2, ..., x_R)?

Step 1: Prior: (ν_0 - m - 1) Σ ~ IW(ν_0, (ν_0 - m - 1) Σ_0), μ | Σ ~ N(μ_0, Σ / κ_0)

Step 2:
x̄ = (1/R) Σ_i x_i
μ_R = (κ_0 μ_0 + R x̄) / (κ_0 + R)
κ_R = κ_0 + R
ν_R = ν_0 + R
(ν_R - m - 1) Σ_R = (ν_0 - m - 1) Σ_0 + Σ_i (x_i - μ_R)(x_i - μ_R)^T + κ_0 (μ_0 - μ_R)(μ_0 - μ_R)^T

Step 3: Posterior: (ν_R - m - 1) Σ ~ IW(ν_R, (ν_R - m - 1) Σ_R), μ | Σ ~ N(μ_R, Σ / κ_R)

Result: μ_map = μ_R, E[Σ | x_1, x_2, ..., x_R] = Σ_R
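The posterior update above is mechanical to implement. A sketch under the same (ν − m − 1)Σ parameterization the slides use; the helper name is mine, and the cross-term is written in the standard conjugate-update form (κ_0 times the outer product of μ_0 − μ_R), which is the algebra that makes the prior-to-posterior step exact:

```python
import numpy as np

def niw_map_update(X, mu0, Sigma0, kappa0, nu0):
    # Posterior parameters for a Gaussian with the conjugate prior
    # (nu0 - m - 1) Sigma ~ IW(nu0, (nu0 - m - 1) Sigma0),
    # mu | Sigma ~ N(mu0, Sigma / kappa0).
    R, m = X.shape
    xbar = X.mean(axis=0)
    kappaR = kappa0 + R
    nuR = nu0 + R
    muR = (kappa0 * mu0 + R * xbar) / kappaR
    S = (X - muR).T @ (X - muR)          # scatter about the posterior mean
    d = (mu0 - muR)[:, None]
    SigmaR = ((nu0 - m - 1) * Sigma0 + S + kappa0 * (d @ d.T)) / (nuR - m - 1)
    return muR, SigmaR, kappaR, nuR

# With lots of data the prior washes out: muR -> sample mean,
# SigmaR -> the MLE covariance, even with a badly wrong mu0.
rng = np.random.default_rng(4)
X = rng.normal(0.0, 1.0, size=(5000, 2))
muR, SigmaR, _, _ = niw_map_update(X, mu0=np.array([5.0, 5.0]),
                                   Sigma0=np.eye(2), kappa0=1.0, nu0=5.0)
```

Conversely, with small R the estimates are pulled toward μ_0 and Σ_0, which is exactly what keeps the covariance non-degenerate when R < m + 1.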

Being Bayesian: MAP estimates for Gaussians

Look carefully at what these formulae are doing. It's all very sensible.
Conjugate priors mean the prior form and posterior form are the same, and are characterized by sufficient statistics of the data.
The marginal distribution on μ is a Student-t.
One point of view: it's pretty academic if R > 30.

Where we're at

                          Categorical        Real-valued     Mixed Real /
                          inputs only        inputs only     Cat okay
Classifier
 (predict category)       Joint BC, Naïve BC                 Dec Tree
Density Estimator
 (probability)            Joint DE, Naïve DE Gauss DE
Regressor
 (predict real no.)

What you should know

- The Recipe for MLE
- Why we sometimes prefer MLE to MAP
- Understand MLE estimation of Gaussian parameters
- Understand "biased estimator" versus "unbiased estimator"
- Appreciate the outline behind Bayesian estimation of Gaussian parameters

Useful exercise

We'd already done some MLE in this class without even telling you!
Suppose categorical arity-n inputs x_1, x_2, ..., x_R ~ (i.i.d.) from a multinomial M(p_1, p_2, ..., p_n), where P(x_i = j | p) = p_j.
What is the MLE p = (p_1, p_2, ..., p_n)?
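The same recipe applies: write LL = Σ_j count_j log p_j, add a Lagrange term for the constraint Σ_j p_j = 1, and solve; the MLE works out to the empirical frequencies p_j = (# records with x = j) / R. A quick numeric sketch (the arity, true probabilities, and sample size are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
p_true = np.array([0.5, 0.3, 0.2])        # arity-3 multinomial
x = rng.choice(3, size=10000, p=p_true)   # R = 10000 i.i.d. categorical draws

# MLE: p_j = (number of records with x = j) / R
counts = np.bincount(x, minlength=3)
p_mle = counts / len(x)

assert np.isclose(p_mle.sum(), 1.0)       # a valid distribution by construction
# p_mle should be close to p_true for large R
```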


More information

Issues To Consider when Estimating Health Care Costs with Generalized Linear Models (GLMs): To Gamma/Log Or Not To Gamma/Log? That Is The New Question

Issues To Consider when Estimating Health Care Costs with Generalized Linear Models (GLMs): To Gamma/Log Or Not To Gamma/Log? That Is The New Question Issues To Consder when Estmatng Health Care Costs wth Generalzed Lnear Models (GLMs): To Gamma/Log Or Not To Gamma/Log? That Is The New Queston ISPOR 20th Annual Internatonal Meetng May 19, 2015 Jalpa

More information

Image classification. Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing i them?

Image classification. Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing i them? Image classfcaton Gven te bag-of-features representatons of mages from dfferent classes ow do we learn a model for dstngusng tem? Classfers Learn a decson rule assgnng bag-offeatures representatons of

More information

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence

More information

Learning from Data 1 Naive Bayes

Learning from Data 1 Naive Bayes Learnng from Data 1 Nave Bayes Davd Barber dbarber@anc.ed.ac.uk course page : http://anc.ed.ac.uk/ dbarber/lfd1/lfd1.html c Davd Barber 2001, 2002 1 Learnng from Data 1 : c Davd Barber 2001,2002 2 1 Why

More information

Properties of Least Squares

Properties of Least Squares Week 3 3.1 Smple Lnear Regresson Model 3. Propertes of Least Squares Estmators Y Y β 1 + β X + u weekly famly expendtures X weekly famly ncome For a gven level of x, the expected level of food expendtures

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science. December 2005 Examinations STA437H1F/STA1005HF. Duration - 3 hours

UNIVERSITY OF TORONTO Faculty of Arts and Science. December 2005 Examinations STA437H1F/STA1005HF. Duration - 3 hours UNIVERSITY OF TORONTO Faculty of Arts and Scence December 005 Examnatons STA47HF/STA005HF Duraton - hours AIDS ALLOWED: (to be suppled by the student) Non-programmable calculator One handwrtten 8.5'' x

More information

Mixture o f of Gaussian Gaussian clustering Nov

Mixture o f of Gaussian Gaussian clustering Nov Mture of Gaussan clusterng Nov 11 2009 Soft vs hard lusterng Kmeans performs Hard clusterng: Data pont s determnstcally assgned to one and only one cluster But n realty clusters may overlap Soft-clusterng:

More information

β0 + β1xi and want to estimate the unknown

β0 + β1xi and want to estimate the unknown SLR Models Estmaton Those OLS Estmates Estmators (e ante) v. estmates (e post) The Smple Lnear Regresson (SLR) Condtons -4 An Asde: The Populaton Regresson Functon B and B are Lnear Estmators (condtonal

More information

Statistical Foundations of Pattern Recognition

Statistical Foundations of Pattern Recognition Statstcal Foundatons of Pattern Recognton Learnng Objectves Bayes Theorem Decson-mang Confdence factors Dscrmnants The connecton to neural nets Statstcal Foundatons of Pattern Recognton NDE measurement

More information

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6 Department of Quanttatve Methods & Informaton Systems Tme Seres and Ther Components QMIS 30 Chapter 6 Fall 00 Dr. Mohammad Zanal These sldes were modfed from ther orgnal source for educatonal purpose only.

More information

Outline. Multivariate Parametric Methods. Multivariate Data. Basic Multivariate Statistics. Steven J Zeil

Outline. Multivariate Parametric Methods. Multivariate Data. Basic Multivariate Statistics. Steven J Zeil Outlne Multvarate Parametrc Methods Steven J Zel Old Domnon Unv. Fall 2010 1 Multvarate Data 2 Multvarate ormal Dstrbuton 3 Multvarate Classfcaton Dscrmnants Tunng Complexty Dscrete Features 4 Multvarate

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 30 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 2 Remedes for multcollnearty Varous technques have

More information

Lecture 6: Introduction to Linear Regression

Lecture 6: Introduction to Linear Regression Lecture 6: Introducton to Lnear Regresson An Manchakul amancha@jhsph.edu 24 Aprl 27 Lnear regresson: man dea Lnear regresson can be used to study an outcome as a lnear functon of a predctor Example: 6

More information

MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2)

MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2) 1/16 MATH 829: Introducton to Data Mnng and Analyss The EM algorthm (part 2) Domnque Gullot Departments of Mathematcal Scences Unversty of Delaware Aprl 20, 2016 Recall 2/16 We are gven ndependent observatons

More information

Randomness and Computation

Randomness and Computation Randomness and Computaton or, Randomzed Algorthms Mary Cryan School of Informatcs Unversty of Ednburgh RC 208/9) Lecture 0 slde Balls n Bns m balls, n bns, and balls thrown unformly at random nto bns usually

More information

Lecture 12: Classification

Lecture 12: Classification Lecture : Classfcaton g Dscrmnant functons g The optmal Bayes classfer g Quadratc classfers g Eucldean and Mahalanobs metrcs g K Nearest Neghbor Classfers Intellgent Sensor Systems Rcardo Guterrez-Osuna

More information

Chapter 2 - The Simple Linear Regression Model S =0. e i is a random error. S β2 β. This is a minimization problem. Solution is a calculus exercise.

Chapter 2 - The Simple Linear Regression Model S =0. e i is a random error. S β2 β. This is a minimization problem. Solution is a calculus exercise. Chapter - The Smple Lnear Regresson Model The lnear regresson equaton s: where y + = β + β e for =,..., y and are observable varables e s a random error How can an estmaton rule be constructed for the

More information

e i is a random error

e i is a random error Chapter - The Smple Lnear Regresson Model The lnear regresson equaton s: where + β + β e for,..., and are observable varables e s a random error How can an estmaton rule be constructed for the unknown

More information

Basically, if you have a dummy dependent variable you will be estimating a probability.

Basically, if you have a dummy dependent variable you will be estimating a probability. ECON 497: Lecture Notes 13 Page 1 of 1 Metropoltan State Unversty ECON 497: Research and Forecastng Lecture Notes 13 Dummy Dependent Varable Technques Studenmund Chapter 13 Bascally, f you have a dummy

More information

An Experiment/Some Intuition (Fall 2006): Lecture 18 The EM Algorithm heads coin 1 tails coin 2 Overview Maximum Likelihood Estimation

An Experiment/Some Intuition (Fall 2006): Lecture 18 The EM Algorithm heads coin 1 tails coin 2 Overview Maximum Likelihood Estimation An Experment/Some Intuton I have three cons n my pocket, 6.864 (Fall 2006): Lecture 18 The EM Algorthm Con 0 has probablty λ of heads; Con 1 has probablty p 1 of heads; Con 2 has probablty p 2 of heads

More information

Linear regression. Regression Models. Chapter 11 Student Lecture Notes Regression Analysis is the

Linear regression. Regression Models. Chapter 11 Student Lecture Notes Regression Analysis is the Chapter 11 Student Lecture Notes 11-1 Lnear regresson Wenl lu Dept. Health statstcs School of publc health Tanjn medcal unversty 1 Regresson Models 1. Answer What Is the Relatonshp Between the Varables?.

More information

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1 Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons

More information

+, where 0 x N - n. k k

+, where 0 x N - n. k k CO 745, Mdterm Len Cabrera. A multle choce eam has questons, each of whch has ossble answers. A student nows the correct answer to n of these questons. For the remanng - n questons, he checs the answers

More information

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.

More information

CHAPTER 3: BAYESIAN DECISION THEORY

CHAPTER 3: BAYESIAN DECISION THEORY HATER 3: BAYESIAN DEISION THEORY Decson mang under uncertanty 3 Data comes from a process that s completely not nown The lac of nowledge can be compensated by modelng t as a random process May be the underlyng

More information

Integrals and Invariants of Euler-Lagrange Equations

Integrals and Invariants of Euler-Lagrange Equations Lecture 16 Integrals and Invarants of Euler-Lagrange Equatons ME 256 at the Indan Insttute of Scence, Bengaluru Varatonal Methods and Structural Optmzaton G. K. Ananthasuresh Professor, Mechancal Engneerng,

More information

Probability Density Function Estimation by different Methods

Probability Density Function Estimation by different Methods EEE 739Q SPRIG 00 COURSE ASSIGMET REPORT Probablty Densty Functon Estmaton by dfferent Methods Vas Chandraant Rayar Abstract The am of the assgnment was to estmate the probablty densty functon (PDF of

More information

SDMML HT MSc Problem Sheet 4

SDMML HT MSc Problem Sheet 4 SDMML HT 06 - MSc Problem Sheet 4. The recever operatng characterstc ROC curve plots the senstvty aganst the specfcty of a bnary classfer as the threshold for dscrmnaton s vared. Let the data space be

More information

Logistic Classifier CISC 5800 Professor Daniel Leeds

Logistic Classifier CISC 5800 Professor Daniel Leeds lon 9/7/8 Logstc Classfer CISC 58 Professor Danel Leeds Classfcaton strategy: generatve vs. dscrmnatve Generatve, e.g., Bayes/Naïve Bayes: 5 5 Identfy probablty dstrbuton for each class Determne class

More information

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding Recall: man dea of lnear regresson Lecture 9: Lnear regresson: centerng, hypothess testng, multple covarates, and confoundng Sandy Eckel seckel@jhsph.edu 6 May 8 Lnear regresson can be used to study an

More information

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding Lecture 9: Lnear regresson: centerng, hypothess testng, multple covarates, and confoundng Sandy Eckel seckel@jhsph.edu 6 May 008 Recall: man dea of lnear regresson Lnear regresson can be used to study

More information

Bayesian predictive Configural Frequency Analysis

Bayesian predictive Configural Frequency Analysis Psychologcal Test and Assessment Modelng, Volume 54, 2012 (3), 285-292 Bayesan predctve Confgural Frequency Analyss Eduardo Gutérrez-Peña 1 Abstract Confgural Frequency Analyss s a method for cell-wse

More information

Machine learning: Density estimation

Machine learning: Density estimation CS 70 Foundatons of AI Lecture 3 Machne learnng: ensty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square ata: ensty estmaton {.. n} x a vector of attrbute values Objectve: estmate the model of

More information

Statistics Chapter 4

Statistics Chapter 4 Statstcs Chapter 4 "There are three knds of les: les, damned les, and statstcs." Benjamn Dsrael, 1895 (Brtsh statesman) Gaussan Dstrbuton, 4-1 If a measurement s repeated many tmes a statstcal treatment

More information

x i1 =1 for all i (the constant ).

x i1 =1 for all i (the constant ). Chapter 5 The Multple Regresson Model Consder an economc model where the dependent varable s a functon of K explanatory varables. The economc model has the form: y = f ( x,x,..., ) xk Approxmate ths by

More information

Hidden Markov Models & The Multivariate Gaussian (10/26/04)

Hidden Markov Models & The Multivariate Gaussian (10/26/04) CS281A/Stat241A: Statstcal Learnng Theory Hdden Markov Models & The Multvarate Gaussan (10/26/04) Lecturer: Mchael I. Jordan Scrbes: Jonathan W. Hu 1 Hdden Markov Models As a bref revew, hdden Markov models

More information

Grover s Algorithm + Quantum Zeno Effect + Vaidman

Grover s Algorithm + Quantum Zeno Effect + Vaidman Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the

More information

Stat 642, Lecture notes for 01/27/ d i = 1 t. n i t nj. n j

Stat 642, Lecture notes for 01/27/ d i = 1 t. n i t nj. n j Stat 642, Lecture notes for 01/27/05 18 Rate Standardzaton Contnued: Note that f T n t where T s the cumulatve follow-up tme and n s the number of subjects at rsk at the mdpont or nterval, and d s the

More information

Statistics for Business and Economics

Statistics for Business and Economics Statstcs for Busness and Economcs Chapter 11 Smple Regresson Copyrght 010 Pearson Educaton, Inc. Publshng as Prentce Hall Ch. 11-1 11.1 Overvew of Lnear Models n An equaton can be ft to show the best lnear

More information

Statistics MINITAB - Lab 2

Statistics MINITAB - Lab 2 Statstcs 20080 MINITAB - Lab 2 1. Smple Lnear Regresson In smple lnear regresson we attempt to model a lnear relatonshp between two varables wth a straght lne and make statstcal nferences concernng that

More information

The EM Algorithm (Dempster, Laird, Rubin 1977) The missing data or incomplete data setting: ODL(φ;Y ) = [Y;φ] = [Y X,φ][X φ] = X

The EM Algorithm (Dempster, Laird, Rubin 1977) The missing data or incomplete data setting: ODL(φ;Y ) = [Y;φ] = [Y X,φ][X φ] = X The EM Algorthm (Dempster, Lard, Rubn 1977 The mssng data or ncomplete data settng: An Observed Data Lkelhood (ODL that s a mxture or ntegral of Complete Data Lkelhoods (CDL. (1a ODL(;Y = [Y;] = [Y,][

More information

Pattern Classification

Pattern Classification Pattern Classfcaton All materals n these sldes ere taken from Pattern Classfcaton (nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wley & Sons, 000 th the permsson of the authors and the publsher

More information

Lecture 10 Support Vector Machines II

Lecture 10 Support Vector Machines II Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed

More information

THE ROYAL STATISTICAL SOCIETY 2006 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY 2006 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 6 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER I STATISTICAL THEORY The Socety provdes these solutons to assst canddates preparng for the eamnatons n future years and for

More information

Limited Dependent Variables

Limited Dependent Variables Lmted Dependent Varables. What f the left-hand sde varable s not a contnuous thng spread from mnus nfnty to plus nfnty? That s, gven a model = f (, β, ε, where a. s bounded below at zero, such as wages

More information

Space of ML Problems. CSE 473: Artificial Intelligence. Parameter Estimation and Bayesian Networks. Learning Topics

Space of ML Problems. CSE 473: Artificial Intelligence. Parameter Estimation and Bayesian Networks. Learning Topics /7/7 CSE 73: Artfcal Intellgence Bayesan - Learnng Deter Fox Sldes adapted from Dan Weld, Jack Breese, Dan Klen, Daphne Koller, Stuart Russell, Andrew Moore & Luke Zettlemoyer What s Beng Learned? Space

More information