Discriminative Estimation (Maxent models and perceptron)


Lilian Scott

1 Discriminative Estimation: Maxent models and perceptron. Generative vs. discriminative models. Many slides are adapted from slides by Christopher Manning.

2 Introduction. So far we've looked at generative models: Naive Bayes. But there is now much use of conditional or discriminative probabilistic models in NLP, Speech, IR (and ML generally). Because: They give high accuracy performance. They make it easy to incorporate lots of linguistically important features.

3 Joint Models. We have some data $\{(d, c)\}$ of paired observations $d$ and hidden classes $c$. Joint (generative) models place probabilities $P(c, d)$ over both the observed data and the hidden stuff (they generate the observed data from the hidden stuff). All the classic StatNLP models are of this kind: n-gram models, Naive Bayes classifiers, hidden Markov models, probabilistic context-free grammars, IBM machine translation alignment models.

4 Conditional Models. Discriminative (conditional) models take the data as given and put a probability $P(c \mid d)$ over hidden structure given the data: logistic regression, conditional loglinear or maximum entropy models, conditional random fields. Also, SVMs, the (averaged) perceptron, etc. are discriminative classifiers (but not directly probabilistic).

5 Joint Likelihood vs. Conditional Likelihood. A joint model gives probabilities $P(d, c)$ and tries to maximize this joint likelihood. It turns out to be trivial to choose weights: just relative frequencies. A conditional model gives probabilities $P(c \mid d)$. It takes the data as given and only models the conditional probability of the class. Harder to do. More closely related to classification error.
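To make "just relative frequencies" concrete, here is a minimal Python sketch. The `(class, word)` pairs are hypothetical toy data, and the names `p_joint` and `p_class_given` are assumptions for illustration, not part of the lecture.

```python
from collections import Counter

# Hypothetical toy data: (class, observation) pairs.
pairs = [
    ("LOCATION", "Paris"),
    ("LOCATION", "Paris"),
    ("PERSON", "Paris"),
    ("PERSON", "Sue"),
]

# Maximum joint likelihood for an unconstrained joint model:
# P(c, d) is simply the relative frequency of the pair.
counts = Counter(pairs)
n = len(pairs)
p_joint = {pair: k / n for pair, k in counts.items()}

def p_class_given(c, d):
    """Conditional P(c | d) derived from the joint estimate."""
    marginal = sum(p for (_, d2), p in p_joint.items() if d2 == d)
    return p_joint.get((c, d), 0.0) / marginal

print(p_joint[("LOCATION", "Paris")])      # 0.5
print(p_class_given("LOCATION", "Paris"))  # 2/3
```

The joint estimate needs only counting; a conditional model has no such closed form in general and is fit by numerical optimization.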

6 Maxent Models and Discriminative Estimation. Generative vs. discriminative models

7 The Maxent Model

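8 Example features. $f_1(c, d) \equiv [c = \text{LOCATION} \wedge w_{-1} = \text{"in"} \wedge \text{isCapitalized}(w)]$, weight: 1.8. $f_2(c, d) \equiv [c = \text{LOCATION} \wedge \text{hasAccentedLatinChar}(w)]$, weight: -0.6. $f_3(c, d) \equiv [c = \text{DRUG} \wedge \text{ends}(w, \text{"c"})]$, weight: 0.3. Example data: LOCATION "in Arcadia"; LOCATION "in Québec"; DRUG "taking Zantac"; PERSON "saw Sue". Models will assign to each feature a weight: a positive weight votes that this configuration is likely correct; a negative weight votes that this configuration is likely incorrect.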

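9 The Maxent Model. Exponential (log-linear, maxent, logistic, Gibbs) models:
$$P(c \mid d, \lambda) = \frac{\exp \sum_i \lambda_i f_i(c, d)}{\sum_{c'} \exp \sum_i \lambda_i f_i(c', d)}$$
The numerator makes votes positive; the denominator normalizes votes. For the example features:
$$P(\text{LOCATION} \mid \text{in Québec}) = \frac{e^{1.8} e^{-0.6}}{e^{1.8} e^{-0.6} + e^{0.3} + e^{0}}$$
$$P(\text{DRUG} \mid \text{in Québec}) = \frac{e^{0.3}}{e^{1.8} e^{-0.6} + e^{0.3} + e^{0}}$$
$$P(\text{PERSON} \mid \text{in Québec}) = \frac{e^{0}}{e^{1.8} e^{-0.6} + e^{0.3} + e^{0}}$$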
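The worked example on this slide can be checked numerically. The following is a minimal Python sketch, not the lecture's own code: the datum encoding (a `(previous_word, word)` pair) and the names `f1`, `f2`, `f3`, `score`, and `maxent_probs` are assumptions made for illustration, and the second weight is taken to be -0.6.

```python
import math

# Feature functions f_i(c, d). A datum d is assumed to be a
# (previous_word, word) pair; this encoding is illustrative only.
def f1(c, d):
    prev, w = d
    return 1.0 if c == "LOCATION" and prev == "in" and w[:1].isupper() else 0.0

def f2(c, d):
    _, w = d
    has_accent = any(ch.isalpha() and ord(ch) > 127 for ch in w)
    return 1.0 if c == "LOCATION" and has_accent else 0.0

def f3(c, d):
    _, w = d
    return 1.0 if c == "DRUG" and w.endswith("c") else 0.0

FEATURES = [f1, f2, f3]
WEIGHTS = [1.8, -0.6, 0.3]  # sign of the second weight is an assumption
CLASSES = ["LOCATION", "DRUG", "PERSON"]

def score(c, d, weights=WEIGHTS):
    """Weighted vote total: sum_i lambda_i * f_i(c, d)."""
    return sum(lam * f(c, d) for lam, f in zip(weights, FEATURES))

def maxent_probs(d, weights=WEIGHTS):
    """Exponentiate the votes, then normalize over all classes."""
    exp_scores = {c: math.exp(score(c, d, weights)) for c in CLASSES}
    z = sum(exp_scores.values())
    return {c: v / z for c, v in exp_scores.items()}

probs = maxent_probs(("in", "Québec"))
for c in CLASSES:
    print(f"P({c} | in Québec) = {probs[c]:.3f}")
```

Under these assumptions the three probabilities come out at roughly 0.586, 0.238, and 0.176.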

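10 The Likelihood Value. The (log) conditional likelihood of iid data $(C, D)$ according to a maxent model is a function of the data and the parameters $\lambda$:
$$\log P(C \mid D, \lambda) = \log \prod_{(c,d) \in (C,D)} P(c \mid d, \lambda) = \sum_{(c,d) \in (C,D)} \log P(c \mid d, \lambda)$$
If there aren't many values of $c$, it's easy to calculate:
$$\log P(C \mid D, \lambda) = \sum_{(c,d) \in (C,D)} \log \frac{\exp \sum_i \lambda_i f_i(c, d)}{\sum_{c'} \exp \sum_i \lambda_i f_i(c', d)}$$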
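As a rough illustration of how this sum is evaluated when the class set is small, here is a Python sketch. The `features` function, the `DATA` list, and the two weight settings are hypothetical; the log-sum-exp shift by the maximum score is a standard numerical precaution and not something the slide prescribes.

```python
import math

CLASSES = ["LOCATION", "DRUG", "PERSON"]

def features(c, d):
    """Hypothetical feature vector f(c, d) for a datum d = (prev_word, word)."""
    prev, w = d
    return [
        1.0 if c == "LOCATION" and prev == "in" and w[:1].isupper() else 0.0,
        1.0 if c == "DRUG" and w.endswith("c") else 0.0,
        1.0 if c == "PERSON" and prev == "saw" else 0.0,
    ]

def log_prob(c, d, lam):
    """log P(c | d, lambda) = score(c, d) - log sum_{c'} exp score(c', d)."""
    scores = {k: sum(l * f for l, f in zip(lam, features(k, d))) for k in CLASSES}
    m = max(scores.values())  # shift for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return scores[c] - log_z

def log_likelihood(data, lam):
    """Sum of per-example log conditional probabilities."""
    return sum(log_prob(c, d, lam) for c, d in data)

# Toy labeled data in the spirit of the example slide (assumed encoding).
DATA = [
    ("LOCATION", ("in", "Arcadia")),
    ("LOCATION", ("in", "Québec")),
    ("DRUG", ("taking", "Zantac")),
    ("PERSON", ("saw", "Sue")),
]

ll_zero = log_likelihood(DATA, [0.0, 0.0, 0.0])  # uniform model
ll_ones = log_likelihood(DATA, [1.0, 1.0, 1.0])  # every feature votes +1
print(ll_zero, ll_ones)
```

With all weights at zero every class gets probability 1/3, so the log-likelihood is 4 log(1/3); giving the features positive weights raises it on this toy data.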

11 A likelihood surface

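12 The Likelihood Value. We can separate this into two components:
$$\log P(C \mid D, \lambda) = \sum_{(c,d) \in (C,D)} \log \exp \sum_i \lambda_i f_i(c, d) \; - \sum_{(c,d) \in (C,D)} \log \sum_{c'} \exp \sum_i \lambda_i f_i(c', d)$$
$$\log P(C \mid D, \lambda) = N(\lambda) - M(\lambda)$$
The derivative is the difference between the derivatives of each component.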

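13 The Derivative I: Numerator.
$$\frac{\partial N(\lambda)}{\partial \lambda_j} = \frac{\partial \sum_{(c,d) \in (C,D)} \log \exp \sum_i \lambda_i f_i(c, d)}{\partial \lambda_j} = \sum_{(c,d) \in (C,D)} \frac{\partial \sum_i \lambda_i f_i(c, d)}{\partial \lambda_j} = \sum_{(c,d) \in (C,D)} f_j(c, d)$$
The derivative of the numerator is the empirical count$(f_j, C)$.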

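14 The Derivative II: Denominator.
$$\frac{\partial M(\lambda)}{\partial \lambda_j} = \frac{\partial \sum_{(c,d) \in (C,D)} \log \sum_{c'} \exp \sum_i \lambda_i f_i(c', d)}{\partial \lambda_j}$$
$$= \sum_{(c,d) \in (C,D)} \frac{1}{\sum_{c''} \exp \sum_i \lambda_i f_i(c'', d)} \cdot \frac{\partial \sum_{c'} \exp \sum_i \lambda_i f_i(c', d)}{\partial \lambda_j}$$
$$= \sum_{(c,d) \in (C,D)} \frac{1}{\sum_{c''} \exp \sum_i \lambda_i f_i(c'', d)} \sum_{c'} \exp\Big(\sum_i \lambda_i f_i(c', d)\Big) \frac{\partial \sum_i \lambda_i f_i(c', d)}{\partial \lambda_j}$$
$$= \sum_{(c,d) \in (C,D)} \sum_{c'} \frac{\exp \sum_i \lambda_i f_i(c', d)}{\sum_{c''} \exp \sum_i \lambda_i f_i(c'', d)} f_j(c', d)$$
$$= \sum_{(c,d) \in (C,D)} \sum_{c'} P(c' \mid d, \lambda) f_j(c', d) = \text{predicted count}(f_j, \lambda)$$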

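15 The Derivative III.
$$\frac{\partial \log P(C \mid D, \lambda)}{\partial \lambda_j} = \text{actual count}(f_j, C) - \text{predicted count}(f_j, \lambda)$$
The optimum parameters are the ones for which each feature's predicted expectation equals its empirical expectation. The optimum distribution is: always unique (but the parameters may not be unique); always exists (if feature counts are from actual data). These models are also called maximum entropy models because we find the model having maximum entropy and satisfying the constraints:
$$E_p(f_j) = E_{\tilde{p}}(f_j), \quad \forall j$$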
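The "actual count minus predicted count" form of the gradient can be sanity-checked against finite differences. The Python sketch below does this under stated assumptions: the `features` function, the `DATA` list, and the test point `lam` are hypothetical, and the central-difference check is only a verification device, not part of the lecture.

```python
import math

CLASSES = ["LOCATION", "DRUG", "PERSON"]

def features(c, d):
    """Hypothetical feature vector f(c, d) for a datum d = (prev_word, word)."""
    prev, w = d
    return [
        1.0 if c == "LOCATION" and prev == "in" and w[:1].isupper() else 0.0,
        1.0 if c == "DRUG" and w.endswith("c") else 0.0,
        1.0 if c == "PERSON" and prev == "saw" else 0.0,
    ]

def class_probs(d, lam):
    """P(c | d, lambda) for every class c."""
    scores = {c: sum(l * f for l, f in zip(lam, features(c, d))) for c in CLASSES}
    m = max(scores.values())
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

def log_likelihood(data, lam):
    return sum(math.log(class_probs(d, lam)[c]) for c, d in data)

def gradient(data, lam):
    """Per-feature gradient: empirical count minus predicted count."""
    empirical = [0.0] * len(lam)
    predicted = [0.0] * len(lam)
    for c, d in data:
        for j, f in enumerate(features(c, d)):
            empirical[j] += f
        p = class_probs(d, lam)
        for c2 in CLASSES:
            for j, f in enumerate(features(c2, d)):
                predicted[j] += p[c2] * f
    return [e - q for e, q in zip(empirical, predicted)]

DATA = [
    ("LOCATION", ("in", "Arcadia")),
    ("LOCATION", ("in", "Québec")),
    ("DRUG", ("taking", "Zantac")),
    ("PERSON", ("saw", "Sue")),
]

lam = [0.5, -0.2, 0.1]  # arbitrary test point
analytic = gradient(DATA, lam)

# Central finite differences on the log-likelihood.
eps = 1e-6
numeric = []
for j in range(len(lam)):
    up, down = lam[:], lam[:]
    up[j] += eps
    down[j] -= eps
    numeric.append((log_likelihood(DATA, up) - log_likelihood(DATA, down)) / (2 * eps))

print(analytic)
print(numeric)
```

The two printed vectors should agree to several decimal places; a gradient-based optimizer would follow `analytic` uphill until the empirical and predicted counts match.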

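16 Feature Expectations. We will crucially make use of two expectations (actual or predicted counts of a feature firing). Empirical count (expectation) of a feature:
$$\text{empirical } E(f_j) = \sum_{(c,d) \in \text{observed}(C,D)} f_j(c, d)$$
Model expectation of a feature:
$$E(f_j) = \sum_{(c,d) \in (C,D)} \sum_{c'} P(c' \mid d, \lambda) f_j(c', d)$$
Goal: fit the data well.

17 The Maxent Model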

18 NB vs. Maxent

19 Naive Bayes vs. Maxent Models. Naive Bayes models multi-count correlated evidence: each feature is multiplied in, even when you have multiple features telling you the same thing. Maximum entropy models (pretty much) solve this problem. This is done by weighting features: the model avoids assigning equally high weights to correlated features.
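The multi-counting effect is easy to see in a two-class toy case. In the Python sketch below, all numbers are hypothetical: one binary feature fires with probability 0.8 under class A and 0.4 under class B, and the function `nb_posterior_a` (an illustrative name) treats `copies` identical copies of that feature as if they were independent evidence.

```python
def nb_posterior_a(p_x_given_a, p_x_given_b, copies, prior_a=0.5):
    """Naive Bayes posterior P(A | x) when the same firing feature
    is counted `copies` times as if each copy were independent."""
    joint_a = prior_a * p_x_given_a ** copies
    joint_b = (1.0 - prior_a) * p_x_given_b ** copies
    return joint_a / (joint_a + joint_b)

# The evidence is the same in every case; only the number of
# duplicated (perfectly correlated) features changes.
posteriors = [nb_posterior_a(0.8, 0.4, k) for k in (1, 2, 3)]
for k, p in zip((1, 2, 3), posteriors):
    print(f"{k} copies: P(A | x) = {p:.4f}")
```

The posterior climbs from about 0.667 to 0.8 to about 0.889 even though no new information was added. A maxent model trained on the same duplicated features can instead share the weight among the copies, so the total vote need not grow with the number of copies.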

20 Text classification: Asia or Europe. Training data, with class labels Europe and Asia: Monaco Monaco Monaco Monaco Monaco Monaco Hong Kong Hong Kong Monaco Monaco Hong Kong Hong Kong

21 The Maxent Model

22 Perceptron: another discriminative learning algorithm

23 Perceptron Algorithm

24 Regularization in the Perceptron Algorithm. Run different numbers of iterations. Use parameter averaging: for instance, average all parameters after seeing each data point.
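A minimal Python sketch of a multiclass perceptron with parameter averaging follows. It is one common formulation and not necessarily the exact algorithm shown in the lecture: the `features` function, the `DATA` list, the `epochs` parameter, and the tie-breaking rule (the first class in `CLASSES` wins ties) are all assumptions made for illustration.

```python
CLASSES = ["LOCATION", "DRUG", "PERSON"]

def features(c, d):
    """Hypothetical feature vector f(c, d) for a datum d = (prev_word, word)."""
    prev, _ = d
    return [
        1.0 if c == "LOCATION" and prev == "in" else 0.0,
        1.0 if c == "DRUG" and prev == "taking" else 0.0,
        1.0 if c == "PERSON" and prev == "saw" else 0.0,
    ]

def predict(d, lam):
    """Highest-scoring class; max() keeps the first class on ties."""
    return max(CLASSES, key=lambda c: sum(l * f for l, f in zip(lam, features(c, d))))

def train_averaged_perceptron(data, epochs=5):
    n = len(features(CLASSES[0], data[0][1]))
    lam = [0.0] * n    # current weights
    total = [0.0] * n  # running sum of the weights after every data point
    seen = 0
    for _ in range(epochs):  # the number of passes acts as a regularizer
        for c_true, d in data:
            c_pred = predict(d, lam)
            if c_pred != c_true:
                # Mistake-driven update: move toward the true class's
                # features and away from the wrongly predicted class's.
                f_true = features(c_true, d)
                f_pred = features(c_pred, d)
                lam = [l + ft - fp for l, ft, fp in zip(lam, f_true, f_pred)]
            total = [t + l for t, l in zip(total, lam)]
            seen += 1
    return [t / seen for t in total]  # averaged parameters

DATA = [
    ("LOCATION", ("in", "Arcadia")),
    ("LOCATION", ("in", "Québec")),
    ("DRUG", ("taking", "Zantac")),
    ("PERSON", ("saw", "Sue")),
]

avg_weights = train_averaged_perceptron(DATA, epochs=5)
print(avg_weights)
print([predict(d, avg_weights) for _, d in DATA])
```

Averaging damps the influence of late, noisy updates; varying `epochs` gives the "different numbers of iterations" control mentioned on the slide.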
