Parameter learning
Will Monroe, August 9, 2017, with materials by Mehran Sahami and Chris Piech. image: Arito


1 Will Monroe, August 9, 2017, with materials by Mehran Sahami and Chris Piech. image: Arito. Parameter learning

2 Announcement: Problem Set #6
Goes out tonight. Due the last day of class, Wednesday, August 16 (before class). Some serious coding: congressional voting, heart disease diagnosis. No late days!

3 Review: Parameter estimation
Sometimes we don't know things like the expectation and variance of a distribution; we have to estimate them from incomplete information.
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$   $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$   $\hat{\theta} = \arg\max_\theta LL(\theta)$

4 Review: Central limit theorem
Sums and averages of IID random variables are normally distributed.
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$
$Y = n\bar{X} = \sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2)$

5 Easily-confused principles
Constant multiple of a normal: $X \sim N(\mu, \sigma^2)$, so $nX \sim N(n\mu, n^2\sigma^2)$ (exactly).
Sum of identical normals: $X_i \sim N(\mu, \sigma^2)$ (independent & identical), so $\sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2)$ (exactly).
CLT: $X_i \sim$ ??? (independent & identical), so $\sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2)$ (approximately, for large $n$).

6 Central limit theorem demo
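Since the demo itself isn't reproduced here, the following is a minimal simulation sketch in the same spirit (the Uniform(0, 1) summands, the seed, and the sample sizes are illustrative assumptions): sums of IID uniforms, which have $\mu = \frac{1}{2}$ and $\sigma^2 = \frac{1}{12}$, should behave like $N(n\mu, n\sigma^2)$.

```python
import numpy as np

rng = np.random.default_rng(109)
n, trials = 50, 100_000

# Each row is one experiment: the sum of n IID Uniform(0, 1) draws.
sums = rng.uniform(size=(trials, n)).sum(axis=1)

# CLT prediction: sum of X_i ~ N(n*mu, n*sigma^2), approximately.
print(sums.mean(), n * 0.5)       # both close to 25
print(sums.var(), n * (1 / 12))   # both close to 4.17
```

A histogram of `sums` shows the familiar bell shape; the printed moments already match the CLT prediction.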

7 Review: Approximating a Poisson with a normal
$X \sim \text{Poi}(\lambda) \;\approx\; Y \sim N(\lambda, \lambda)$ (for large $\lambda$)

8 Parameters
$X \sim \text{Ber}(p)$: $\theta = p$
$X \sim \text{Poi}(\lambda)$: $\theta = \lambda$
$X \sim \text{Uni}(a, b)$: $\theta = [a, b]$
$X \sim N(\mu, \sigma^2)$: $\theta = [\mu, \sigma^2]$

9 Maximum likelihood estimation
Choose parameters that maximize the likelihood (joint probability given the parameters) of the example data.
$\hat{\theta} = \arg\max_\theta LL(\theta)$
[figure: likelihood curve over $\theta$, with data points $x_1, \ldots, x_4$ and maximizer $\theta^*$]

10 How to: MLE
1. Compute the likelihood: $L(\theta) = P(X_1, \ldots, X_n \mid \theta)$.
2. Take its log: $LL(\theta) = \log L(\theta)$.
3. Maximize this as a function of the parameters: $\frac{d}{d\theta} LL(\theta) = 0$.
[figure: likelihood curve over $\theta$, with data points $x_1, \ldots, x_4$ and maximizer $\theta^*$]
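As a concrete illustration of the recipe, here is a minimal sketch on hypothetical Bernoulli data, replacing the calculus of step 3 with a brute-force grid search:

```python
import numpy as np

X = np.array([1, 0, 1, 1, 0, 1, 1, 1])  # hypothetical IID Bernoulli data

def LL(p):
    # Steps 1-2: log of L(p) = prod_i p^X_i * (1 - p)^(1 - X_i)
    return np.sum(X * np.log(p) + (1 - X) * np.log(1 - p))

# Step 3: maximize, here by brute-force grid search instead of calculus.
grid = np.linspace(0.001, 0.999, 999)
p_hat = grid[np.argmax([LL(p) for p in grid])]

print(p_hat, X.mean())  # both 0.75
```

The grid maximizer lands exactly on the sample mean, which is the closed form derived on the next slides.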

11 Maximum likelihood for Bernoulli
The maximum likelihood $p$ for Bernoulli random variables is the sample mean:
$\hat{p} = \frac{1}{n} \sum_{i=1}^{n} X_i$

12 Derivation: MLE for Bernoulli
1. Compute the likelihood. $\theta = p$
$L(\theta) = P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i) = \prod_{i=1}^{n} \begin{cases} p & \text{if } X_i = 1 \\ 1 - p & \text{if } X_i = 0 \end{cases}$
(Don't forget: IID means independent!)

13 Derivation: MLE for Bernoulli
1. Compute the likelihood. $\theta = p$
$L(\theta) = P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i) = \prod_{i=1}^{n} \begin{cases} p & \text{if } X_i = 1 \\ 1 - p & \text{if } X_i = 0 \end{cases} = \prod_{i=1}^{n} p^{X_i} (1 - p)^{1 - X_i}$
(Don't forget: IID means independent!)

14 Derivation: MLE for Bernoulli
2. Take its log. $\theta = p$
$L(\theta) = \prod_{i=1}^{n} p^{X_i} (1 - p)^{1 - X_i}$
$LL(\theta) = \log \prod_{i=1}^{n} p^{X_i} (1 - p)^{1 - X_i} = \sum_{i=1}^{n} \log \left[ p^{X_i} (1 - p)^{1 - X_i} \right] = \sum_{i=1}^{n} \left[ X_i \log p + (1 - X_i) \log (1 - p) \right]$

15 Derivation: MLE for Bernoulli
3. Maximize this as a function of the parameters. $\theta = p$
$LL(\theta) = \sum_{i=1}^{n} \left[ X_i \log p + (1 - X_i) \log (1 - p) \right]$   $\hat{p} = \arg\max_p LL(\theta)$
$\frac{d}{dp} LL(\theta) = \sum_{i=1}^{n} \left[ \frac{X_i}{p} - \frac{1 - X_i}{1 - p} \right] = \frac{1}{p} \sum_{i=1}^{n} X_i - \frac{1}{1 - p} \sum_{i=1}^{n} (1 - X_i) = 0$
$(1 - p) \sum_{i=1}^{n} X_i = p \sum_{i=1}^{n} (1 - X_i)$, so $\sum_{i=1}^{n} X_i = pn$, giving $\hat{p} = \frac{1}{n} \sum_{i=1}^{n} X_i$

16 Maximum likelihood for normal
The maximum likelihood $\mu$ for normal random variables is the sample mean, and the maximum likelihood $\sigma^2$ is the uncorrected mean square deviation:
$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} X_i$   $\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \hat{\mu})^2$
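A quick sketch of these closed forms on simulated data (the true parameters are assumed for the simulation). Note that numpy's np.var divides by n by default (ddof=0), which is precisely this uncorrected maximum likelihood estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(3.0, 2.0, size=10_000)  # assumed true mu = 3, sigma = 2

mu_hat = X.mean()                # sample mean
sigma2_hat = np.var(X, ddof=0)   # (1/n) * sum (X_i - mu_hat)^2
print(mu_hat, sigma2_hat)        # close to 3 and 4
```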

17 Derivation: MLE for normal
1. Compute the likelihood. 2. Take its log. $\theta = [\mu, \sigma^2]$
$L(\theta) = \prod_{i=1}^{n} \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(X_i - \mu)^2}{2\sigma^2}}$
$LL(\theta) = \sum_{i=1}^{n} \log \left[ \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(X_i - \mu)^2}{2\sigma^2}} \right] = \sum_{i=1}^{n} \left[ -\log \sigma - \frac{1}{2} \log 2\pi - \frac{(X_i - \mu)^2}{2\sigma^2} \right]$

18 Derivation: MLE for normal
3. Maximize this as a function of the parameters. $\theta = [\mu, \sigma^2]$
$LL(\theta) = \sum_{i=1}^{n} \left[ -\log \sigma - \frac{1}{2} \log 2\pi - \frac{(X_i - \mu)^2}{2\sigma^2} \right]$   $[\hat{\mu}, \hat{\sigma}^2] = \hat{\theta} = \arg\max_\theta LL(\theta)$
$\frac{\partial}{\partial \mu} LL(\theta) = \sum_{i=1}^{n} \frac{X_i - \mu}{\sigma^2} = \frac{1}{\sigma^2} \left( \sum_{i=1}^{n} X_i - n\mu \right) = 0$
$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} X_i = \bar{X}$

19 Derivation: MLE for normal
3. Maximize this as a function of the parameters. $\theta = [\mu, \sigma^2]$
$\frac{\partial}{\partial \sigma} LL(\theta) = \sum_{i=1}^{n} \left[ -\frac{1}{\sigma} + \frac{(X_i - \mu)^2}{\sigma^3} \right] = 0$
$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \hat{\mu})^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$

20 Break time!

21 Maximum likelihood for uniform
The maximum likelihood $a$ and $b$ for uniform random variables are the minimum and maximum of the data:
$\hat{a} = \min_i X_i$   $\hat{b} = \max_i X_i$
[figure: uniform density between $a$ and $b$, with the data lying between $\hat{a}$ and $\hat{b}$]
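A minimal sketch, assuming a true interval of [2, 5]: the MLE is just the sample extremes, and both estimates always fall inside the true interval, so the fitted interval is systematically a little too narrow.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(2.0, 5.0, size=1_000)  # assumed true a = 2, b = 5

a_hat, b_hat = X.min(), X.max()
print(a_hat, b_hat)  # slightly inside the true interval [2, 5]
```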

22 Derivation: MLE for uniform
1. Compute the likelihood. $\theta = [a, b]$
$L(\theta) = \prod_{i=1}^{n} \begin{cases} \frac{1}{b - a} & \text{if } a \le X_i \le b \\ 0 & \text{otherwise} \end{cases}$
2. Take its log.
$LL(\theta) = \sum_{i=1}^{n} \begin{cases} -\log(b - a) & \text{if } a \le X_i \le b \\ -\infty & \text{otherwise} \end{cases}$
3. Maximize this as a function of the parameters.
$[\hat{a}, \hat{b}] = \hat{\theta} = \arg\max_\theta LL(\theta)$

23 Derivation: MLE for uniform
$\theta = [a, b]$   $L(\theta) = \prod_{i=1}^{n} \begin{cases} \frac{1}{b - a} & \text{if } a \le X_i \le b \\ 0 & \text{otherwise} \end{cases}$   $[\hat{a}, \hat{b}] = \hat{\theta} = \arg\max_\theta L(\theta)$
Shrinking the interval increases $\frac{1}{b - a}$, but the likelihood drops to zero as soon as any data point falls outside $[a, b]$. So:
$\hat{a} = \min_i X_i$   $\hat{b} = \max_i X_i$
[figure: likelihood $L(a, b)$ plotted as a function of $a$ and as a function of $b$, maximized at the sample minimum and maximum]

24 Maximum a posteriori estimation
Choose the most likely parameters given the example data. You'll need a prior probability over the parameters.
$\hat{\theta} = \arg\max_\theta P(\theta \mid X_1, \ldots, X_n) = \arg\max_\theta \left[ LL(\theta) + \log P(\theta) \right]$
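A minimal sketch of MAP for a Bernoulli parameter under a Beta(a, b) prior, with hypothetical numbers: the posterior is Beta(a + #heads, b + #tails), and its mode gives the closed-form estimate below.

```python
import numpy as np

X = np.array([1, 1, 1, 1, 1, 1])  # six flips, all heads (hypothetical)
a, b = 2, 2                       # Beta(2, 2) prior: gentle pull toward 1/2
n, heads = len(X), X.sum()

p_mle = heads / n                          # 1.0: says tails is impossible
p_map = (a + heads - 1) / (a + b + n - 2)  # (2+6-1)/(2+2+6-2) = 7/8
print(p_mle, p_map)
```

The prior keeps the estimate away from the extreme 1.0 that maximum likelihood commits to after an all-heads sample.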

25 Review: Multinomial random variable
A multinomial random variable records the number of times each outcome occurs when an experiment with multiple outcomes (e.g., a die roll) is run multiple times.
$(X_1, \ldots, X_m) \sim \text{MN}(n, (p_1, p_2, \ldots, p_m))$ (vector!)
$P(X_1 = c_1, X_2 = c_2, \ldots, X_m = c_m) = \binom{n}{c_1, c_2, \ldots, c_m} p_1^{c_1} p_2^{c_2} \cdots p_m^{c_m}$

26 Roll all of the dice!
A 6-sided die is rolled 7 times. What is the probability we get: 1 one, 1 two, 0 threes, 2 fours, 0 fives, 3 sixes?
$(X_1, \ldots, X_6) \sim \text{MN}\!\left(7, \left(\tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}\right)\right)$
$P(X_1 = 1, X_2 = 1, X_3 = 0, X_4 = 2, X_5 = 0, X_6 = 3) = \binom{7}{1, 1, 0, 2, 0, 3} \left(\tfrac{1}{6}\right)^1 \left(\tfrac{1}{6}\right)^1 \left(\tfrac{1}{6}\right)^0 \left(\tfrac{1}{6}\right)^2 \left(\tfrac{1}{6}\right)^0 \left(\tfrac{1}{6}\right)^3 = \frac{420}{6^7}$
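The same computation, sketched with the multinomial coefficient expanded into factorials (standard library only):

```python
from math import factorial

counts = [1, 1, 0, 2, 0, 3]  # ones, twos, ..., sixes (they sum to 7)
n = sum(counts)

coef = factorial(n)
for c in counts:
    coef //= factorial(c)    # multinomial coefficient 7!/(1!1!0!2!0!3!) = 420

prob = coef * (1 / 6) ** n   # each particular sequence has probability (1/6)^7
print(coef, prob)            # 420, 420 / 6^7, about 0.0015
```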

27 Maximum likelihood with multinomial
A 6-sided die is rolled 7 times. We get: 1 one, 1 two, 0 threes, 2 fours, 0 fives, 3 sixes. What is the MLE for $p_1, \ldots, p_6$?
$(X_1, \ldots, X_6) \sim \text{MN}\!\left(7, \left(\tfrac{1}{7}, \tfrac{1}{7}, 0, \tfrac{2}{7}, 0, \tfrac{3}{7}\right)\right)$
You'll never roll a 3! Not in a million years!

28 Are we doing this backwards?
MLE: $\hat{\theta} = \arg\max_\theta P(X_1, \ldots, X_n \mid \theta)$
MAP: $\hat{\theta} = \arg\max_\theta P(\theta \mid X_1, \ldots, X_n)$

29 Bayes to the rescue
$\hat{\theta} = \arg\max_\theta P(\theta \mid X_1, \ldots, X_n) = \arg\max_\theta \frac{P(X_1, \ldots, X_n \mid \theta) \, P(\theta)}{P(X_1, \ldots, X_n)}$
$= \arg\max_\theta P(X_1, \ldots, X_n \mid \theta) \, P(\theta)$ (the denominator doesn't depend on $\theta$)
$= \arg\max_\theta \left[ \log P(X_1, \ldots, X_n \mid \theta) + \log P(\theta) \right]$

30 Review: Beta random variable
A beta random variable models the probability of a trial's success, given previous trials. The PDF/CDF let you compute probabilities of probabilities!
$X \sim \text{Beta}(a, b)$
$f_X(x) = \begin{cases} C x^{a-1} (1 - x)^{b-1} & \text{if } 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$

31 Review: Dirichlet distribution
Beta is the distribution ("conjugate prior") for the $p$ in the Bernoulli and binomial. Dirichlet is the distribution for the $p_1, p_2, \ldots$ in the multinomial.
$(X_1, X_2, \ldots) \sim \text{Dir}(a_1, a_2, \ldots)$
$f_{X_1, X_2, \ldots}(x_1, x_2, \ldots) = \begin{cases} C x_1^{a_1 - 1} x_2^{a_2 - 1} \cdots & \text{if } 0 < x_1, x_2, \ldots < 1 \text{ and } x_1 + x_2 + \cdots = 1 \\ 0 & \text{otherwise} \end{cases}$

32 Laplace smoothing
Also known as add-one smoothing: assume you've seen one imaginary occurrence of each possible outcome. With $m$ possible outcomes, add-$k$ smoothing gives
$p_i = \frac{\#(X = i) + k}{n + km}$
and add-one smoothing gives
$p_i = \frac{\#(X = i) + 1}{n + m}$

33 Maximum likelihood with multinomial
A 6-sided die is rolled 7 times. We get: 1 one, 1 two, 0 threes, 2 fours, 0 fives, 3 sixes. What is the MLE for $p_1, \ldots, p_6$?
$(X_1, \ldots, X_6) \sim \text{MN}\!\left(7, \left(\tfrac{1}{7}, \tfrac{1}{7}, 0, \tfrac{2}{7}, 0, \tfrac{3}{7}\right)\right)$
You'll never roll a 3! Not in a million years!

34 Laplace with multinomial
A 6-sided die is rolled 7 times. We get: 1 one, 1 two, 0 threes, 2 fours, 0 fives, 3 sixes. What is the Laplace estimate for $p_1, \ldots, p_6$?
$(X_1, \ldots, X_6) \sim \text{MN}\!\left(7, \left(\tfrac{2}{13}, \tfrac{2}{13}, \tfrac{1}{13}, \tfrac{3}{13}, \tfrac{1}{13}, \tfrac{4}{13}\right)\right)$
Still a chance!
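A minimal sketch contrasting the raw MLE of slide 33 with the Laplace estimate on the same counts:

```python
import numpy as np

counts = np.array([1, 1, 0, 2, 0, 3])  # 7 rolls of a 6-sided die
n, m = counts.sum(), len(counts)

p_mle = counts / n                  # [1/7, 1/7, 0, 2/7, 0, 3/7]
p_laplace = (counts + 1) / (n + m)  # [2/13, 2/13, 1/13, 3/13, 1/13, 4/13]
print(p_mle)
print(p_laplace)  # every face keeps a nonzero probability
```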

35 Parameter priors
$X \sim \text{Ber}(p)$: $p \sim \text{Beta}(a, b)$
$X \sim \text{Bin}(n, p)$: $p \sim \text{Beta}(a, b)$
$X \sim \text{MN}(p)$: $p \sim \text{Dir}(a)$
$X \sim \text{Poi}(\lambda)$: $\lambda \sim \text{Gamma}(k, \theta)$
$X \sim \text{Exp}(\lambda)$: $\lambda \sim \text{Gamma}(k, \theta)$
$X \sim N(\mu, \sigma^2)$: $\mu \sim N(\mu_0, \sigma_0^2)$, $\sigma^2 \sim \text{InvGamma}(\alpha, \beta)$
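As one worked row of this table, here is a minimal sketch of the Poisson/Gamma pairing. One caveat: the code assumes the shape/rate parameterization $\text{Gamma}(\alpha, \beta)$ rather than the slide's shape/scale $\text{Gamma}(k, \theta)$; under shape/rate, the posterior after $n$ IID Poisson observations is $\text{Gamma}(\alpha + \sum_i X_i, \beta + n)$, and its mode gives a closed-form MAP estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(4.0, size=50)  # assumed true lambda = 4
n = len(X)

alpha, beta = 2.0, 1.0         # Gamma(alpha, beta) prior with mean alpha/beta = 2
lam_mle = X.mean()
lam_map = (alpha + X.sum() - 1) / (beta + n)  # mode of the Gamma posterior
print(lam_mle, lam_map)        # MAP is pulled slightly toward the prior mean
```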
