Speech and Language Processing

1 Speech and Language Processing. Lecture 3: Bayesian network and Bayesian inference. Information and Communications Engineering Course. Takahiro Shinozaki. 08//5

2 Lecture Plan (Shinozaki's part). I give the first 6 lectures about speech recognition. Through these lectures, the backbone of the latest speech recognition techniques is explained.
1. 0/9 (remote) Speech recognition based on GMM-HMM and N-gram
2. 0/6 (remote) Maximum likelihood estimation and EM algorithm
3. /5 (@TIST) Bayesian network and Bayesian inference
4. /5 (@TIST) Variational inference and sampling
5. /6 (@TIST) Neural network based acoustic and language models
6. /6 (@TIST) Weighted finite state transducer (WFST) and speech decoding

3 Today's Topic. Answers for the previous exercises. Bayesian network. Bayesian inference.

4 Answers for the Previous Exercises

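5 Exercise 2.1. Show the derivation process of obtaining $\mu_k = m_k / \sum_{k'=1}^{K} m_{k'}$ by maximizing $L = \sum_{k=1}^{K} m_k \log \mu_k + \lambda \left( \sum_{k=1}^{K} \mu_k - 1 \right)$, that is, by solving $\partial L / \partial \mu_k = 0$ and $\partial L / \partial \lambda = 0$.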

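6 Exercise 2.2. Derive the ML solution $\{\hat{\mu}, \hat{\sigma}^2\}$ of the Gaussian distribution. The derivation process must be described. Solving $\frac{\partial}{\partial \mu} \sum_{n=1}^{N} \log \mathcal{N}(x_n \mid \mu, \sigma^2) = 0$ and $\frac{\partial}{\partial \sigma^2} \sum_{n=1}^{N} \log \mathcal{N}(x_n \mid \mu, \sigma^2) = 0$ gives $\hat{\mu} = \frac{1}{N} \sum_{n=1}^{N} x_n$ and $\hat{\sigma}^2 = \frac{1}{N} \sum_{n=1}^{N} (x_n - \hat{\mu})^2$.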

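7 Exercise 2.3. Given training data D with N training samples, $D = \{x_n\}$, obtain the ML estimate for a GMM with M mixtures. You can assume the variance is 1 for simplicity. $\hat{\Theta}_{\mathrm{GMM}} = \arg\max_{\Theta} \prod_{n=1}^{N} \sum_{m=1}^{M} w_m \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{(x_n - \mu_m)^2}{2} \right) = \arg\max_{\Theta} \sum_{n=1}^{N} \log \sum_{m=1}^{M} w_m \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{(x_n - \mu_m)^2}{2} \right)$. There is no closed-form solution, and we need EM.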

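8 Exercise 2.4. Assume you have an initial model parameter $\Theta_0$. Prove that if you take $q_0(H) = P(H \mid X, \Theta_0)$, then the lower bound $J(q_0, \Theta_0)$ is equal to the log likelihood $\log P(X \mid \Theta_0)$, where $J(q, \Theta) = \sum_{H} q(H) \log \frac{P(X, H \mid \Theta)}{q(H)}$.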

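9 Exercise 2.5. Consider the GMM of the previous page, with initial parameter $\Theta_0$. Obtain the following maximizers of the Q function: $\hat{w} = \arg\max_{w} Q$, and likewise the $\arg\max$ of Q with respect to each of the other parameters.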

10 Bayesian Network

11 Graphs. Undirected graph: a graph defined by nodes and undirected arcs. Directed graph: a graph defined by nodes and directed arcs. Directed acyclic graph (DAG): a directed graph that does not contain a directed cycle. Examples: an undirected graph, a directed graph (which has a directed cycle), and a directed acyclic graph.

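12 Parent, Child, Ancestor, Descendant. When there is an arc from one node to another, the first node is a parent of the second, and the second is a child of the first. A node from which another node can be reached by following directed arcs is an ancestor of that node, and the node reached is its descendant.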

13 Bipartite. When the nodes of a graph are separated into two groups and there is no arc inside either group, the graph is called bipartite.

14 Directed Graph and Node Ordering. A directed graph is a DAG if and only if there is an ordering of the nodes where all arcs face the same direction (= there is a numbering of the nodes where all arcs go from a lower-numbered to a higher-numbered node).
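
The equivalence between being a DAG and having a consistent node ordering can be checked mechanically. The following is a minimal sketch, not taken from the lecture: it uses Kahn's algorithm (a standard technique swapped in here for illustration), and the function name `topological_order` and the two example graphs are hypothetical.

```python
from collections import deque


def topological_order(num_nodes, arcs):
    """Return an ordering in which every arc goes from an earlier node to a
    later node, or None if the directed graph contains a directed cycle."""
    children = [[] for _ in range(num_nodes)]
    indegree = [0] * num_nodes
    for src, dst in arcs:
        children[src].append(dst)
        indegree[dst] += 1
    # Nodes without incoming arcs can be numbered first.
    ready = deque(n for n in range(num_nodes) if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    # If some node was never released, it lies on or behind a directed cycle.
    return order if len(order) == num_nodes else None


# Hypothetical example graphs (not the graphs from the slides).
dag_arcs = [(0, 1), (0, 2), (1, 3), (2, 3)]
cyclic_arcs = [(0, 1), (1, 2), (2, 0)]
dag_order = topological_order(4, dag_arcs)
cyclic_order = topological_order(3, cyclic_arcs)
```

The algorithm repeatedly removes a node that has no incoming arc, so an ordering is returned exactly when the graph has no directed cycle.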

15 Outline of the Proof. Statement 1: There is an ordering of nodes where all the arcs face the same direction. Statement 2: The graph does not contain a directed cycle. Statement 1 implies Statement 2: easy. Statement 2 implies Statement 1: use Lemma 3.1 (see appendix).

16 Comment Field

17 Exercise 3.1. Is the directed graph a DAG? Graph 1. Graph 2.

18 Bayesian Network (BN). A BN is a graphical model that represents a set of random variables and their conditional independence by a DAG.

19 Decomposition of Joint Probability and BN. By the product rule, an arbitrary joint probability is decomposed into a product of conditional probabilities: $P(A, B, C, D) = P(A) P(B \mid A) P(C \mid A, B) P(D \mid A, B, C)$. A DAG is made by representing the variables as nodes and connecting the nodes by directed arcs according to the conditional probabilities.

20 Conditional Independence and Arcs. Conditional independence is represented by the absence of arcs.

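21 Joint Probability Defined by BN. A product of conditional probabilities associated with a DAG always satisfies the sum-to-one constraint. Proof: Since a Bayesian network is a DAG, with a proper ordering of the variables the product has the following form: $P(X_1, \dots, X_N) = \prod_{i=1}^{N} P(X_i \mid S_i)$ with $S_i \subseteq \{X_1, \dots, X_{i-1}\}$. That is, $X_N$ does not appear in the conditional part of any other factor. By taking the summations in the order $X_N, X_{N-1}, \dots, X_1$, we have $\sum_{X_1} \cdots \sum_{X_N} \prod_{i=1}^{N} P(X_i \mid S_i) = \sum_{X_1} \cdots \sum_{X_{N-1}} \prod_{i=1}^{N-1} P(X_i \mid S_i) \sum_{X_N} P(X_N \mid S_N) = \cdots = 1$.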
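
The sum-to-one property can be verified numerically on a small network. This is a sketch under stated assumptions: the binary network A -> B, A -> C, (B, C) -> D and all table values below are made up for illustration and do not come from the slides.

```python
from itertools import product

# Hypothetical conditional probability tables for binary variables.
p_a = {0: 0.3, 1: 0.7}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}  # p_b_given_a[a][b]
p_c_given_a = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.2, 1: 0.8}}  # p_c_given_a[a][c]
p_d_given_bc = {                                           # p_d_given_bc[(b, c)][d]
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.6, 1: 0.4},
    (1, 0): {0: 0.7, 1: 0.3},
    (1, 1): {0: 0.2, 1: 0.8},
}


def joint(a, b, c, d):
    """Product of the conditional probabilities attached to the DAG."""
    return (p_a[a] * p_b_given_a[a][b] * p_c_given_a[a][c]
            * p_d_given_bc[(b, c)][d])


# Summing over every assignment should give 1 up to rounding error.
total = sum(joint(a, b, c, d) for a, b, c, d in product((0, 1), repeat=4))
```

Because each table row sums to one, the innermost sum over D collapses first, then C and B, then A, which mirrors the ordering argument in the proof.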

22 Exercise 3.2. Represent the following joint probability by a BN: ) ( ) ( D E D E D

23 BN Representation of a Categorical Distribution. $X$ takes one of $K$ values, with $p(X = k) = \mu_k$ for $k = 1, \dots, K$.

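24 BN Representation of a Gaussian Distribution. $\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$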

25 BN Representation of a GMM. Mixture weight (probability of the index). Gaussian distribution conditioned by the index: Gauss 1 ($\mu_1, \sigma_1$), Gauss 2 ($\mu_2, \sigma_2$), Gauss 3 ($\mu_3, \sigma_3$).
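
Reading the GMM as a two-node network (index node, then observation node) suggests sampling the index first and the Gaussian second. The sketch below illustrates this ancestral sampling order. The weights, means, standard deviations and the helper name `sample_gmm` are assumptions for illustration only.

```python
import random


def sample_gmm(weights, means, stddevs, rng):
    """Sample the parent (mixture index) first, then the child (observation)."""
    index = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    value = rng.gauss(means[index], stddevs[index])
    return index, value


# Hypothetical three-component mixture.
weights = [0.5, 0.3, 0.2]
means = [-4.0, 0.0, 5.0]
stddevs = [1.0, 0.5, 2.0]

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_gmm(weights, means, stddevs, rng) for _ in range(10000)]

# Empirical frequency of each index; it should be close to the mixture weights.
index_freq = [sum(1 for i, _ in samples if i == k) / len(samples)
              for k in range(len(weights))]
```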

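26 Bayesian Network Representation of an HMM. The network has an unrolled structure, and its length depends on the input sequence. Transition probability: e.g. $P(S_t = a \mid S_{t-1} = b)$. Emission distribution conditioned by the state: e.g. $P(X_t \mid S_t = a)$. (Figure: an HMM with states a and b, and the corresponding Bayesian network with state nodes $S_1, \dots, S_T$ and observation nodes $X_1, \dots, X_T$ along the time axis.)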

27 Example of Alignment. Feature sequence: $X_1 = x_1, X_2 = x_2, \dots, X_7 = x_7$. State sequence: abaaabb, that is, $S_1 = a, S_2 = b, S_3 = a, S_4 = a, S_5 = a, S_6 = b, S_7 = b$.

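28 Representation of a Repeated Structure. (Figure: a node V connected to nodes indexed 1, 2, 3, ..., T, and the equivalent compact drawing with a single node indexed by t repeated T times.)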

29 Representation of Parameters. Small circles represent parameters. Example of GMM: (figure).

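30 Exercise 3.3. Fill in the blanks so that the following HMM and the BN become equivalent. Initial state probability: $P(S_1 = a) = (\ )$, $P(S_1 = b) = (\ )$. Transition probability $P(S_t \mid S_{t-1})$: a table with one entry for each pair $S_{t-1} \in \{a, b\}$, $S_t \in \{a, b\}$, most of them blank. (Figure: a two-state HMM with states a and b and emission distributions $N_a$ and $N_b$, and the BN with state nodes $S_1, \dots, S_T$ and observation nodes $X_1, \dots, X_T$. The values 0.6 and 0.4 appear among the given probabilities.)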

31 Factor Graph. A bipartite graph where one side of the nodes represents random variables and the other side represents functions. The arcs represent dependencies of the functions on the variables. A factor graph defines a joint probability $P(X_1, \dots, X_N) = \prod_{s} f_s(X_s)$, where each $X_s$ is a subset of the variables. Example: factor nodes $f_1, f_2, f_3$ and variable nodes $X_1, \dots, X_4$.

32 Factor Graph Representation of Bayesian Network. Each conditional probability can be regarded as a factor. Example: a Bayesian network over A, B, C and D, and the factor graph with one factor node per conditional probability.

33 Probabilistic Inference. Marginal and conditional probabilities are obtained from a joint probability by applying the sum and product rules.

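34 Distributive Property and Computational Cost. Product is distributive over addition. $\sum_{i=1}^{N} a f_i$: number of products N, number of summations N - 1. $a \sum_{i=1}^{N} f_i$: number of products 1, number of summations N - 1. The same property holds for sum and max, and for product and max (with $a \ge 0$): $\max_i (a + f_i) = a + \max_i f_i$ and $\max_i (a f_i) = a \max_i f_i$.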

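35 Computational Cost of Marginalization. Suppose A, B, C and D each take 1000 possible values. $P(D) = \sum_{A} \sum_{B} \sum_{C} P(A, B, C, D)$: # summations = $10^9$. If the joint probability is decomposed as $P(A, B, C, D) = P(A) P(B \mid A) P(C \mid B) P(D \mid C)$, then $P(D) = \sum_{C} P(D \mid C) \sum_{B} P(C \mid B) \sum_{A} P(B \mid A) P(A)$: # summations = $3 \times 10^3$. Independence structure is important.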
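
The saving from pushing sums inside the product can be seen on a small chain. This is a sketch with assumed tables: K = 4 stands in for the 1000 values on the slide, the table entries are arbitrary, and the term counts are for the whole table of P(D), so they are counted differently from the figures on the slide.

```python
from itertools import product

K = 4  # small stand-in for the number of values per variable


def normalized(values):
    total = sum(values)
    return [v / total for v in values]


def conditional(shift):
    """Arbitrary K x K table whose rows each sum to one (illustration only)."""
    return [normalized([(i * j + shift) % 5 + 1 for j in range(K)])
            for i in range(K)]


p_a = normalized([a + 1 for a in range(K)])
p_b_a = conditional(1)  # p_b_a[a][b] = P(B = b | A = a)
p_c_b = conditional(2)
p_d_c = conditional(3)

# Brute force: enumerate every joint assignment of (A, B, C, D).
brute = [0.0] * K
brute_terms = 0
for a, b, c, d in product(range(K), repeat=4):
    brute[d] += p_a[a] * p_b_a[a][b] * p_c_b[b][c] * p_d_c[c][d]
    brute_terms += 1


def push_forward(marginal, cond):
    """Sum out the parent variable; returns the child marginal and term count."""
    out = [sum(marginal[i] * cond[i][j] for i in range(K)) for j in range(K)]
    return out, K * K


# Factored: sum out A, then B, then C.
p_b, n1 = push_forward(p_a, p_b_a)
p_c, n2 = push_forward(p_b, p_c_b)
p_d, n3 = push_forward(p_c, p_d_c)
factored_terms = n1 + n2 + n3
```

With these counts the brute-force cost grows as K^4 while the factored cost grows as 3 K^2, which is the gap the slide points to.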

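36 When the Factor Graph Is Linear. Suppose we want $P(X_3)$ in a chain of variables connected by factors $f_1, \dots, f_5$. Because each factor depends only on neighboring variables, the sums over the other variables can be pushed inside the product and evaluated as nested sums, starting from both ends of the chain and moving toward $X_3$.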

37 Message Passing View of the Inference. (Figure: the nested sums for the linear factor graph with factors $f_1, \dots, f_5$, drawn as messages passed along the chain toward $X_3$.)
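
For a linear factor graph, the nested sums can be written as messages that flow toward the variable of interest. The sketch below assumes a hypothetical chain X1 - f1 - X2 - f2 - X3 - f3 - X4 with arbitrary nonnegative factor tables, which is smaller than the chain on the slide, and checks the message-passing marginal of X3 against brute-force enumeration.

```python
from itertools import product

K = 3  # number of values per variable (assumption)

# Hypothetical pairwise factors, indexed as f[left_value][right_value].
f1 = [[1.0 + i + 2 * j for j in range(K)] for i in range(K)]  # f1(X1, X2)
f2 = [[2.0 + i * j for j in range(K)] for i in range(K)]      # f2(X2, X3)
f3 = [[1.0 + abs(i - j) for j in range(K)] for i in range(K)] # f3(X3, X4)

# Messages from the left end toward X3.
msg_f1_to_x2 = [sum(f1[x1][x2] for x1 in range(K)) for x2 in range(K)]
msg_f2_to_x3 = [sum(f2[x2][x3] * msg_f1_to_x2[x2] for x2 in range(K))
                for x3 in range(K)]
# Message from the right end toward X3.
msg_f3_to_x3 = [sum(f3[x3][x4] for x4 in range(K)) for x3 in range(K)]

# The marginal is the normalized product of the incoming messages.
unnormalized = [left * right for left, right in zip(msg_f2_to_x3, msg_f3_to_x3)]
z = sum(unnormalized)
marginal_x3 = [u / z for u in unnormalized]

# Brute-force reference: sum the full product over all other variables.
brute = [0.0] * K
for x1, x2, x3, x4 in product(range(K), repeat=4):
    brute[x3] += f1[x1][x2] * f2[x2][x3] * f3[x3][x4]
brute_total = sum(brute)
brute_marginal = [b / brute_total for b in brute]
```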

38 Bayesian Inference

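39 Probabilistic Models and Their Parameters. $p(X \mid \Theta)$: Gaussian distribution, multinomial distribution, etc. Speech model consisting of language and acoustic models: $p(X, W \mid \Theta_{AM}, \Theta_{LM}) = p(X \mid W, \Theta_{AM}) \, p(W \mid \Theta_{LM})$.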

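40 ML Training and Prediction. Training set: $D = \{x_1, \dots, x_N\}$. Test sample: $x^*$. Maximum likelihood (ML) training: $\hat{\Theta} = \arg\max_{\Theta} p(D \mid \Theta) = \arg\max_{\Theta} \prod_{n=1}^{N} p(x_n \mid \Theta)$. Prediction: $p(x^* \mid \hat{\Theta})$.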

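41 Bayesian Approach. Treat parameters as random variables. Prediction of a new sample is formulated as an evaluation of the conditional probability given a training set D: $p(x \mid D) = \int p(x \mid \Theta) \, p(\Theta \mid D) \, d\Theta$, with $p(\Theta \mid D) = \frac{p(D \mid \Theta) \, p(\Theta)}{p(D)}$.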

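42 Definitions of Terms. Prior distribution of parameters: $p(\Theta)$. Probabilistic model: $p(x \mid \Theta)$. Posterior distribution of parameters: $p(\Theta \mid D) = \frac{p(D \mid \Theta) \, p(\Theta)}{p(D)}$. Predictive distribution: $p(x \mid D) = \int p(x \mid \Theta) \, p(\Theta \mid D) \, d\Theta$.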

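43 Evaluation of Posterior Distribution. Except for very simple models, how to evaluate the posterior distribution is a big issue, since it requires integration over many variables: $p(\Theta \mid D) = \frac{p(D \mid \Theta) \, p(\Theta)}{\int p(D \mid \Theta) \, p(\Theta) \, d\Theta}$.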

44 Approaches.
Analytical evaluation: ideal, but only applicable to very simple models. For practical models, a closed-form solution is usually not obtained. Numerical integration is also not feasible when there are many variables.
Variational Bayes: can be applied to large models if a proper analytical approximation is introduced.
Sampling: versatile, but requires a very large computational cost.

45 Conjugate Prior. For some combinations of prior and probabilistic model, the posterior takes the same functional form as the prior.
Probabilistic model: Conjugate prior
Binomial distribution: Beta distribution
Multinomial distribution: Dirichlet distribution
Gaussian distribution, mean: Gaussian distribution
Gaussian distribution, variance: Gamma distribution (placed on the precision, i.e. the inverse variance)
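
Conjugacy means the posterior can be written down by updating the prior's parameters. As a sketch for the binomial and Beta pair, the code below applies the standard update Beta(a, b) -> Beta(a + successes, b + failures) and checks on a grid that the result is proportional to prior times likelihood. The prior parameters, the counts and the helper names are assumptions for illustration.

```python
import math


def beta_pdf(theta, a, b):
    """Density of the Beta(a, b) distribution at theta in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta)
                    + (b - 1) * math.log(1 - theta))


def beta_binomial_update(a, b, successes, failures):
    """Conjugate update: the posterior is again a Beta distribution."""
    return a + successes, b + failures


prior_a, prior_b = 2.0, 2.0   # hypothetical prior
successes, failures = 7, 3    # hypothetical observed counts
post_a, post_b = beta_binomial_update(prior_a, prior_b, successes, failures)

# Posterior / (prior * likelihood) should be the same constant at every theta.
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
ratios = [
    beta_pdf(t, post_a, post_b)
    / (beta_pdf(t, prior_a, prior_b) * t ** successes * (1 - t) ** failures)
    for t in grid
]
```

The constant ratio is the inverse of the evidence term, which is why no integration is needed for this model.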

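46 Exercise 3.4. Assume a probabilistic model $P(x \mid \mu)$, a training sample $x$, and a prior distribution of the parameter $P(\mu)$ are given as follows. $P(x \mid \mu) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2} \right)$ (Gaussian distribution with mean $\mu$ and variance 1). $P(\mu) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{\mu^2}{2} \right)$ (Gaussian distribution with mean 0 and variance 1). 1) Estimate the posterior distribution. 2) Estimate the predictive distribution. Note: $\int \exp\left( -c (x - d)^2 \right) dx = \sqrt{\pi / c}$.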

47 Appendix

48 Lemma 3.1. If a graph does not contain a directed cycle, then there exists at least one node that has no incoming arc.

49 Notation for Conditional Independence. Let A, B and C be disjoint sets of random variables. When the following equation holds, we say that A is independent of B given C, and denote it as $A \perp B \mid C$: $P(A, B \mid C) = P(A \mid C) \, P(B \mid C)$.

50 Graph Structure and Conditional Independence. By investigating the graph structure, we can read relationships between random variables.

51 Tail-to-Tail. In general, $P(A, B)$ is not expressed as $P(A) P(B)$. Therefore $A \perp B \mid \Phi$ does not hold ($\Phi$ is an empty set). $P(A, B \mid C)$ is expressed as $P(A \mid C) P(B \mid C)$. Therefore $A \perp B \mid C$ holds.

52 Head-to-Tail. In general, $P(A, B)$ is not expressed as $P(A) P(B)$. Therefore $A \perp B \mid \Phi$ does not hold. $P(A, B \mid C)$ is expressed as $P(A \mid C) P(B \mid C)$. Therefore $A \perp B \mid C$ holds.

53 Head-to-Head. $P(A, B)$ is expressed as $P(A) P(B)$. Therefore $A \perp B \mid \Phi$ holds. In general, $P(A, B \mid C)$ is not expressed as $P(A \mid C) P(B \mid C)$. Therefore $A \perp B \mid C$ does not hold.
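
The head-to-head case can be checked numerically: two parents that are independent a priori become dependent once their common child is observed. The following sketch assumes a binary network A -> C <- B with made-up probabilities, and the helper `prob` is a hypothetical convenience function.

```python
from itertools import product

# Hypothetical tables for the head-to-head network A -> C <- B.
p_a = {0: 0.6, 1: 0.4}
p_b = {0: 0.7, 1: 0.3}
p_c1_given_ab = {(0, 0): 0.05, (0, 1): 0.8, (1, 0): 0.7, (1, 1): 0.95}


def joint(a, b, c):
    p_c1 = p_c1_given_ab[(a, b)]
    return p_a[a] * p_b[b] * (p_c1 if c == 1 else 1.0 - p_c1)


def prob(**fixed):
    """Sum the joint over all assignments consistent with the fixed values."""
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        values = {"a": a, "b": b, "c": c}
        if all(values[name] == v for name, v in fixed.items()):
            total += joint(a, b, c)
    return total


# Without conditioning: P(A, B) equals P(A) P(B).
marginal_gap = abs(prob(a=1, b=1) - prob(a=1) * prob(b=1))

# Conditioning on C = 1: P(A, B | C) differs from P(A | C) P(B | C).
p_c = prob(c=1)
conditional_gap = abs(
    prob(a=1, b=1, c=1) / p_c
    - (prob(a=1, c=1) / p_c) * (prob(b=1, c=1) / p_c)
)
```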

54 Blocking a Path. For a Bayesian network, let A and B be nodes and C be a set of nodes that does not include A and B. We say a path from A to B is blocked when either of the following holds.
On the path from A to B, there is a node in C, and the connection of the arcs at that node is tail-to-tail or head-to-tail.
At one of the nodes on the path from A to B, the connection of the arcs is head-to-head, and neither the node nor any of its descendants is included in C.

55 d-separation. For a Bayesian network, let A, B and C be exclusive sets of nodes. We say A is d-separated from B by C if all the paths starting from a node in A and ending at a node in B are blocked. When A is d-separated from B by C, $A \perp B \mid C$ holds for the joint probability defined by the Bayesian network (Pearl 1988).

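56 Maximization of Joint Probability. Obtained by replacing $\Sigma$ in the sum-product algorithm with max: the max-product algorithm. $\max_{X_1, \dots, X_N} P(X_1, \dots, X_N)$ and $\arg\max_{X_1, \dots, X_N} P(X_1, \dots, X_N)$.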
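
On a chain-structured model such as an HMM, replacing the sum by max gives the familiar Viterbi recursion. The sketch below is an illustration under stated assumptions: a two-state HMM with discrete observations and made-up probabilities, which is not the model used in the lecture. The result is checked against brute-force enumeration of all state sequences.

```python
from itertools import product

# Hypothetical two-state HMM with binary observations.
states = ("a", "b")
initial = {"a": 0.6, "b": 0.4}
transition = {"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.2, "b": 0.8}}
emission = {"a": {0: 0.9, 1: 0.1}, "b": {0: 0.3, 1: 0.7}}
observations = [0, 0, 1, 1, 0]


def joint_probability(path, obs):
    """Joint probability of a state path and the observation sequence."""
    prob = initial[path[0]] * emission[path[0]][obs[0]]
    for prev, cur, o in zip(path, path[1:], obs[1:]):
        prob *= transition[prev][cur] * emission[cur][o]
    return prob


def max_product(obs):
    """Max-product on the chain: best score per state, with back pointers."""
    best = {s: initial[s] * emission[s][obs[0]] for s in states}
    back = []
    for o in obs[1:]:
        pointers, new_best = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[r] * transition[r][s])
            pointers[s] = prev
            new_best[s] = best[prev] * transition[prev][s] * emission[s][o]
        back.append(pointers)
        best = new_best
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


best_path, best_prob = max_product(observations)
brute_prob = max(joint_probability(p, observations)
                 for p in product(states, repeat=len(observations)))
```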

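57 Sum-Product Algorithm (for Tree). Message passing. Leaf nodes: variable node $\mu_{x \to f}(x) = 1$, factor node $\mu_{f \to x}(x) = f(x)$. Variable node to factor node: $\mu_{x \to f}(x) = \prod_{f' \in \mathrm{ne}(x) \setminus f} \mu_{f' \to x}(x)$. Factor node to variable node: $\mu_{f \to x}(x) = \sum_{x_1, \dots, x_M} f(x, x_1, \dots, x_M) \prod_{m=1}^{M} \mu_{x_m \to f}(x_m)$. Marginal probability: $p(x) \propto \prod_{f \in \mathrm{ne}(x)} \mu_{f \to x}(x)$.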

58 EM for HMM and Efficiency. $Q(\Theta_{HMM} \mid \Theta^{0}_{HMM}) = \sum_{K} P(K \mid X, \Theta^{0}_{HMM}) \log P(X, K \mid \Theta_{HMM})$ (1). The summation is over the state sequence $K = (k_1, \dots, k_T)$. The number of sequences is exponential in the length of the input, and 1 sec of feature sequence is 100 frames, so directly enumerating all the paths is impossible. The Q function (1) can be efficiently evaluated if the posteriors $P(k_t = s \mid X, \Theta^{0})$ and $P(k_{t-1} = s', k_t = s \mid X, \Theta^{0})$ are obtained, where $s$ and $s'$ are HMM state IDs. Use the sum-product algorithm to efficiently obtain the posteriors.
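
On the chain of an HMM, the sum-product algorithm reduces to the forward-backward recursion. The sketch below computes the per-frame state posteriors this way and compares them with brute-force enumeration of all state sequences. The two-state HMM, its discrete emissions and all probability values are assumptions for illustration, and the pairwise posteriors are omitted for brevity.

```python
from itertools import product

# Hypothetical two-state HMM with binary observations.
states = ("a", "b")
initial = {"a": 0.6, "b": 0.4}
transition = {"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.2, "b": 0.8}}
emission = {"a": {0: 0.9, 1: 0.1}, "b": {0: 0.3, 1: 0.7}}
observations = [0, 1, 1, 0]


def forward_backward(obs):
    """Return per-frame state posteriors and the likelihood of obs."""
    T = len(obs)
    alpha = [{s: initial[s] * emission[s][obs[0]] for s in states}]
    for t in range(1, T):
        alpha.append({
            s: emission[s][obs[t]]
            * sum(alpha[t - 1][r] * transition[r][s] for r in states)
            for s in states
        })
    beta = [dict.fromkeys(states, 1.0) for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = {
            r: sum(transition[r][s] * emission[s][obs[t + 1]] * beta[t + 1][s]
                   for s in states)
            for r in states
        }
    likelihood = sum(alpha[T - 1][s] for s in states)
    gamma = [{s: alpha[t][s] * beta[t][s] / likelihood for s in states}
             for t in range(T)]
    return gamma, likelihood


def brute_force_posteriors(obs):
    """Reference: enumerate every state sequence (exponential in len(obs))."""
    T = len(obs)
    totals = [dict.fromkeys(states, 0.0) for _ in range(T)]
    evidence = 0.0
    for path in product(states, repeat=T):
        p = initial[path[0]] * emission[path[0]][obs[0]]
        for t in range(1, T):
            p *= transition[path[t - 1]][path[t]] * emission[path[t]][obs[t]]
        evidence += p
        for t, s in enumerate(path):
            totals[t][s] += p
    return [{s: totals[t][s] / evidence for s in states} for t in range(T)]


gamma, likelihood = forward_backward(observations)
reference = brute_force_posteriors(observations)
```

The recursion costs on the order of T times the squared number of states, while the reference enumerates every path, which is only feasible for very short sequences.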


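60 Exercise 3.3 (Answer). 1) $P(x \mid \mu) P(\mu) = \frac{1}{2\pi} \exp\left( -\frac{(x - \mu)^2}{2} - \frac{\mu^2}{2} \right) = \frac{1}{2\pi} \exp\left( -\frac{x^2}{4} \right) \exp\left( -\left( \mu - \frac{x}{2} \right)^2 \right)$, so $P(\mu \mid x) = \frac{P(x \mid \mu) P(\mu)}{\int P(x \mid \mu) P(\mu) \, d\mu} = \frac{1}{\sqrt{\pi}} \exp\left( -\left( \mu - \frac{x}{2} \right)^2 \right) = \mathcal{N}\left( \mu \mid \frac{x}{2}, \frac{1}{2} \right)$. 2) $P(x' \mid x) = \int P(x' \mid \mu) P(\mu \mid x) \, d\mu = \frac{1}{\sqrt{3\pi}} \exp\left( -\frac{1}{3} \left( x' - \frac{x}{2} \right)^2 \right) = \mathcal{N}\left( x' \mid \frac{x}{2}, \frac{3}{2} \right)$.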
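
For a unit-variance Gaussian model with a zero-mean, unit-variance Gaussian prior on the mean, the closed-form posterior and predictive distributions can be checked by numerical integration. The sketch below assumes a hypothetical observation `x_obs = 1.2` and test point `x_new = -0.4`, uses the standard conjugate Gaussian update, and approximates the integrals with a simple Riemann sum on a grid.

```python
import math


def gaussian_pdf(x, mean, variance):
    return (math.exp(-(x - mean) ** 2 / (2 * variance))
            / math.sqrt(2 * math.pi * variance))


def posterior_mean_var(x, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Conjugate update for the mean of a Gaussian with known variance."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    post_mean = post_var * (prior_mean / prior_var + x / noise_var)
    return post_mean, post_var


x_obs = 1.2   # hypothetical training sample
x_new = -0.4  # hypothetical test point

post_mean, post_var = posterior_mean_var(x_obs)      # x/2 and 1/2
pred_mean, pred_var = post_mean, post_var + 1.0      # x/2 and 3/2

# Riemann sum over mu on [-10, 10]; the integrands are negligible outside.
step = 0.001
grid = [-10.0 + step * i for i in range(20001)]
evidence = step * sum(gaussian_pdf(x_obs, mu, 1.0) * gaussian_pdf(mu, 0.0, 1.0)
                      for mu in grid)
numeric_posterior_at_0 = (gaussian_pdf(x_obs, 0.0, 1.0)
                          * gaussian_pdf(0.0, 0.0, 1.0) / evidence)
numeric_predictive = step * sum(
    gaussian_pdf(x_new, mu, 1.0) * gaussian_pdf(mu, post_mean, post_var)
    for mu in grid
)
```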
