Machine Learning 4771


Tony Jebara, Columbia University. Machine Learning 4771. Instructor: Tony Jebara.

Topic 20: HMMs with Evidence
HMM Collect, HMM Evaluate, HMM Distribute, HMM Decode, HMM Parameter Learning via JTA & EM

HMMs: JTA with Evidence
If the y sequence is observed (as it is in problems 1, 2 and 3) we get evidence, and the potentials turn into slices. The junction tree for the HMM has cliques ψ(x_t, x_{t+1}) along the backbone and ψ(x_t, y_t) hanging below, with separators φ(x_t) and ς(x_t).
Next, pick a root, for example the rightmost clique.
Collect all zeta separators bottom up:
ς*(x_t) = Σ_{y_t} ψ(x_t, y_t) δ(y_t, ȳ_t) = p(ȳ_t | x_t)
Collect the leftmost phi separator to the right:
φ*(x_0) = Σ_{y_0} ψ(x_0, y_0) δ(y_0, ȳ_0) = p(ȳ_0, x_0)
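
For a discrete HMM this slicing step is just indexing: ς*(x_t) = p(ȳ_t | x_t) is the column of the emission matrix selected by the observed symbol, and φ*(x_0) is the state prior times that column. A minimal NumPy sketch; the parameters and the names pi, A, E, obs are hypothetical, not from the slides:

import numpy as np

# Hypothetical 2-state HMM (rows of A and E sum to 1).
pi = np.array([0.5, 0.5])                      # p(x_0)
A  = np.array([[0.9, 0.1],
               [0.2, 0.8]])                    # A[i, j] = p(x_{t+1}=j | x_t=i)
E  = np.array([[0.7, 0.3],
               [0.4, 0.6]])                    # E[i, k] = p(y_t=k | x_t=i)
obs = [0, 1, 1]                                # observed symbol indices ybar_0..ybar_T

# Slicing the evidence: zeta-star separators, one emission-matrix column per observation.
zeta_star = [E[:, y] for y in obs]             # zeta*(x_t) = p(ybar_t | x_t)
phi_star_0 = pi * zeta_star[0]                 # phi*(x_0) = p(ybar_0, x_0)
print(zeta_star[0], phi_star_0)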

HMMs: Collect with Evidence
Now we collect (*) along the backbone, left to right. Update each clique with its left and bottom separators (the initial separators are all 1):
ψ*(x_t, x_{t+1}) = [φ*(x_t) / φ(x_t)] [ς*(x_{t+1}) / ς(x_{t+1})] ψ(x_t, x_{t+1})
φ*(x_{t+1}) = Σ_{x_t} ψ*(x_t, x_{t+1}) = Σ_{x_t} p(ȳ_{t+1} | x_{t+1}) p(x_{t+1} | x_t) φ*(x_t)   (the α recursion)
Keep going along the chain until the rightmost node. Note: the above formula for phi is recursive, so it could be used as is.
Property: recall we had
φ*(x_0) = p(ȳ_0, x_0)
φ*(x_1) = Σ_{x_0} p(ȳ_1 | x_1) p(x_1 | x_0) φ*(x_0) = p(ȳ_0, ȳ_1, x_1)
φ*(x_2) = Σ_{x_1} p(ȳ_2 | x_2) p(x_2 | x_1) φ*(x_1) = p(ȳ_0, ȳ_1, ȳ_2, x_2)
φ*(x_{t+1}) = Σ_{x_t} p(ȳ_{t+1} | x_{t+1}) p(x_{t+1} | x_t) φ*(x_t) = p(ȳ_0, ..., ȳ_{t+1}, x_{t+1})
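
A minimal sketch of this collect pass (the α recursion) in the same hypothetical NumPy setting; forward_collect is my own name:

import numpy as np

def forward_collect(pi, A, E, obs):
    # phi_star[t] holds phi*(x_t) = p(ybar_0..ybar_t, x_t) for every state value.
    phi_star = [pi * E[:, obs[0]]]
    for y in obs[1:]:
        # phi*(x_{t+1}) = p(ybar_{t+1} | x_{t+1}) * sum_{x_t} p(x_{t+1} | x_t) phi*(x_t)
        phi_star.append(E[:, y] * (phi_star[-1] @ A))
    return phi_star

pi = np.array([0.5, 0.5])
A  = np.array([[0.9, 0.1], [0.2, 0.8]])
E  = np.array([[0.7, 0.3], [0.4, 0.6]])
print(forward_collect(pi, A, E, obs=[0, 1, 1])[-1])   # p(ybar_0..ybar_T, x_T)

Each entry of phi_star[t] is a joint probability, so the vectors shrink toward zero as the sequence grows; the evaluation step on the next slide only needs the last one, summed.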

HMMs: Evaluate with Evidence
Say we are solving the first HMM problem:
1) Evaluate: given ȳ_0, ..., ȳ_T and θ, compute p(ȳ_0, ..., ȳ_T | θ).
If we want to compute the likelihood, we are already done! We really just need to do collect (not even distribute). From the previous slide we had
φ*(x_{t+1}) = Σ_{x_t} p(ȳ_{t+1} | x_{t+1}) p(x_{t+1} | x_t) φ*(x_t) = p(ȳ_0, ..., ȳ_{t+1}, x_{t+1})
Collect until the root (the rightmost node): its normalizer is p(evidence)!
ψ*(x_{T-1}, x_T) = p(ȳ_0, ..., ȳ_T, x_{T-1}, x_T)
Or use a hypothetical φ*(x_T); we can compute the likelihood just by marginalizing this phi:
p(ȳ_0, ..., ȳ_T) = Σ_{x_T} φ*(x_T)
So, adding up the entries in the last φ* gives us the likelihood.
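
Because the φ* entries underflow on long sequences, the same evaluation is usually carried out in log space. A sketch under the same hypothetical parameters; log_likelihood is my name and SciPy's logsumexp does the stable summation:

import numpy as np
from scipy.special import logsumexp

def log_likelihood(pi, A, E, obs):
    # log phi*(x_0), then the collect recursion entirely in log space.
    log_phi = np.log(pi) + np.log(E[:, obs[0]])
    for y in obs[1:]:
        log_phi = np.log(E[:, y]) + logsumexp(log_phi[:, None] + np.log(A), axis=0)
    return logsumexp(log_phi)          # log p(evidence): sum the entries of the last phi*

pi = np.array([0.5, 0.5])
A  = np.array([[0.9, 0.1], [0.2, 0.8]])
E  = np.array([[0.7, 0.3], [0.4, 0.6]])
print(np.exp(log_likelihood(pi, A, E, obs=[0, 1, 1])))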

HMMs: Distribute with Evidence
Back to collecting: say we just finished collecting to the root with our last update formula:
ψ*(x_{T-1}, x_T) = [φ*(x_{T-1}) ς*(x_T) / (φ(x_{T-1}) ς(x_T))] ψ(x_{T-1}, x_T) ∝ p(ȳ_0, ..., ȳ_T, x_{T-1}, x_T)
Now we distribute (**) along the backbone, right to left. First, the ** potential for the root stays the same:
ψ**(x_{T-1}, x_T) = ψ*(x_{T-1}, x_T)
Start going to the left from there, for t = T-1 down to 0:
(a) φ**(x_t) = Σ_{x_{t+1}} ψ**(x_t, x_{t+1})
(b) ς**(x_{t+1}) = Σ_{x_t} ψ**(x_t, x_{t+1})
(c) ψ**(x_{t-1}, x_t) = ψ*(x_{t-1}, x_t) φ**(x_t) / φ*(x_t)
(d) ψ**(x_{t+1}, y_{t+1}) = ψ(x_{t+1}, y_{t+1}) ς**(x_{t+1}) / ς*(x_{t+1})
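
A sketch of collect plus distribute in the same hypothetical setting, written with the equivalent forward-backward recursions rather than explicit clique updates; it returns the smoothed singleton marginals p(x_t | ȳ_0..ȳ_T), i.e. the normalized φ**(x_t):

import numpy as np

def smoothed_marginals(pi, A, E, obs):
    T = len(obs)
    # Collect: phi*(x_t) = p(ybar_0..ybar_t, x_t).
    phi = [pi * E[:, obs[0]]]
    for y in obs[1:]:
        phi.append(E[:, y] * (phi[-1] @ A))
    # Distribute: beta[t](x_t) = p(ybar_{t+1}..ybar_T | x_t), ones at the root.
    beta = [np.ones_like(pi) for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (E[:, obs[t + 1]] * beta[t + 1])
    gamma = [f * b for f, b in zip(phi, beta)]
    return [g / g.sum() for g in gamma]        # p(x_t | ybar_0..ybar_T) for each t

pi = np.array([0.5, 0.5])
A  = np.array([[0.9, 0.1], [0.2, 0.8]])
E  = np.array([[0.7, 0.3], [0.4, 0.6]])
print(smoothed_marginals(pi, A, E, obs=[0, 1, 1]))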

HMM Example
You are given the parameters of a 2-state HMM. You observed the input sequence AB (from a 2-symbol alphabet, A or B). In other words, you observe two symbols from your finite state machine, A and then B. Using the junction tree algorithm, evaluate the likelihood of this data, p(y), given your HMM and its parameters. Also compute (for decoding) the individual marginals of the states after the evidence from this sequence is observed: p(x_0 | y) and p(x_1 | y). The parameters for the HMM are provided below: the initial state prior p(x_0), the state transition matrix p(x_t | x_{t-1}), and the emission matrix p(y_t | x_t), respectively.

π = p(x_0):
  state 1: 1/3    state 2: 2/3

a = p(x_t | x_{t-1})   (rows: x_{t-1}, columns: x_t):
              x_t = 1   x_t = 2
  x_{t-1} = 1   3/4       1/4
  x_{t-1} = 2   1/2       1/2

η = p(y_t | x_t)   (rows: x_t, columns: y_t):
              y_t = A   y_t = B
  x_t = 1       1/2       1/2
  x_t = 2       1/3       2/3
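
A sketch of this exercise in NumPy, assuming the tables above are read with the conditioning variable indexing the rows (my reading of the garbled layout); under that reading the collect pass gives p(y = AB) = 95/432 ≈ 0.22, and the distribute pass gives the two state posteriors:

import numpy as np

# Parameters as read from the tables above (rows index the conditioning variable).
pi  = np.array([1/3, 2/3])                     # p(x_0)
a   = np.array([[3/4, 1/4],
                [1/2, 1/2]])                   # a[i, j] = p(x_t=j | x_{t-1}=i)
eta = np.array([[1/2, 1/2],
                [1/3, 2/3]])                   # eta[i, k] = p(y_t=k | x_t=i), columns A, B
obs = [0, 1]                                   # the observed sequence "AB"

# Collect (forward) pass.
phi0 = pi * eta[:, obs[0]]                     # phi*(x_0) = p(A, x_0)
phi1 = eta[:, obs[1]] * (phi0 @ a)             # phi*(x_1) = p(A, B, x_1)
likelihood = phi1.sum()                        # p(y = AB); 95/432 under this reading

# Distribute (backward) pass for the state marginals.
beta0 = a @ eta[:, obs[1]]                     # p(B | x_0)
print(likelihood)
print(phi0 * beta0 / likelihood)               # p(x_0 | y)
print(phi1 / likelihood)                       # p(x_1 | y)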

HMM Example (continued)

HMMs: Marginals & Max-decoding
Now that JTA is finished, we have the following:
φ**(x_t) ∝ p(x_t | ȳ_0, ..., ȳ_T)
ς**(x_{t+1}) ∝ p(x_{t+1} | ȳ_0, ..., ȳ_T)
ψ**(x_t, x_{t+1}) ∝ p(x_t, x_{t+1} | ȳ_0, ..., ȳ_T)
The separators define a distribution over the hidden states. In the gene-finding example, this gives the probability that the state behind DNA symbol y_t was {I, E, P} (e.g. the path I I E E P over the symbols G A C C ...).
We have done 2) Decode: given ȳ_0, ..., ȳ_T and θ, find p(x_0), ..., p(x_T).
We can also do 2') Decode: given ȳ_0, ..., ȳ_T and θ, find x̂_0, ..., x̂_T, the most likely path. Here, we use the Argmax JTA algorithm: run JTA but replace the sums with max. Then, find the biggest entry in the separators:
x̂_t = argmax_{x_t} φ**(x_t)
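
A minimal sketch of the Argmax JTA (max-product, i.e. Viterbi) decode, reusing the example parameters under my row-wise reading of the tables; the A=0, B=1 encoding is mine:

import numpy as np

def viterbi(pi, A, E, obs):
    # Max-product messages in log space, with backpointers for decoding the path.
    delta = np.log(pi) + np.log(E[:, obs[0]])
    back = []
    for y in obs[1:]:
        scores = delta[:, None] + np.log(A)    # scores[i, j]: best log-prob of a path ending i -> j
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + np.log(E[:, y])
    path = [int(delta.argmax())]
    for bp in reversed(back):                  # follow the backpointers right to left
        path.append(int(bp[path[-1]]))
    return path[::-1]

pi  = np.array([1/3, 2/3])
a   = np.array([[3/4, 1/4], [1/2, 1/2]])
eta = np.array([[1/2, 1/2], [1/3, 2/3]])
print(viterbi(pi, a, eta, obs=[0, 1]))         # most likely state path for "AB"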

HMMs: EM Learning
Finally, 3) Max Likelihood: given ȳ_0, ..., ȳ_T, learn the parameters θ.
Recall maximum likelihood: θ̂ = argmax_θ log p(y | θ).
If we observe x, it is easy to maximize the complete likelihood:
log p(x, y) = log [ p(x_0) Π_{t=1}^{T} p(x_t | x_{t-1}) Π_{t=0}^{T} p(y_t | x_t) ]
            = log p(x_0) + Σ_{t=1}^{T} log p(x_t | x_{t-1}) + Σ_{t=0}^{T} log p(y_t | x_t)
Writing the states and symbols as binary indicator vectors:
l(θ) = Σ_{i=1}^{M} x_0^i log π_i + Σ_{t=1}^{T} Σ_{i,j=1}^{M} x_{t-1}^i x_t^j log α_ij + Σ_{t=0}^{T} Σ_{i=1}^{M} Σ_{j=1}^{N} x_t^i y_t^j log η_ij
Introduce Lagrange multipliers for the constraints Σ_i π_i = 1, Σ_j α_ij = 1, Σ_j η_ij = 1 and take derivatives:
π̂_i = x_0^i
α̂_ij = Σ_{t=0}^{T-1} x_t^i x_{t+1}^j / Σ_{k=1}^{M} Σ_{t=0}^{T-1} x_t^i x_{t+1}^k
η̂_ij = Σ_{t=0}^{T} x_t^i y_t^j / Σ_{k=1}^{N} Σ_{t=0}^{T} x_t^i y_t^k
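
With a fully observed state sequence these estimates reduce to normalized counts. A small sketch; the function name, the toy sequences, and the count representation are mine:

import numpy as np

def complete_data_mle(states, symbols, M, N):
    # Normalized indicator counts; assumes every state appears at least once.
    pi = np.zeros(M); alpha = np.zeros((M, M)); eta = np.zeros((M, N))
    pi[states[0]] = 1.0                                 # pi_i = x_0^i
    for i, j in zip(states[:-1], states[1:]):
        alpha[i, j] += 1                                # transition counts x_t^i x_{t+1}^j
    for i, j in zip(states, symbols):
        eta[i, j] += 1                                  # emission counts x_t^i y_t^j
    return pi, alpha / alpha.sum(axis=1, keepdims=True), eta / eta.sum(axis=1, keepdims=True)

print(complete_data_mle(states=[0, 0, 1, 1, 0], symbols=[0, 1, 1, 0, 0], M=2, N=2))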

HMMs: EM Learning
But we don't observe the x's, so the likelihood is incomplete: p(x, y | θ) → p(y | θ) = Σ_x p(x, y | θ).
EM: maximize the expected complete likelihood given the current p(x | y):
E{l(θ)} = E_{p(x|y)}{ log p(x, y) } = Σ_i E{x_0^i} log π_i + Σ_t Σ_{i,j} E{x_{t-1}^i x_t^j} log α_ij + Σ_t Σ_{i,j} E{x_t^i} y_t^j log η_ij
The M-step is maximizing as before:
π̂_i = E{x_0^i}
α̂_ij = Σ_t E{x_t^i x_{t+1}^j} / Σ_k Σ_t E{x_t^i x_{t+1}^k}
η̂_ij = Σ_t E{x_t^i} y_t^j / Σ_k Σ_t E{x_t^i} y_t^k
What are the E{}'s? Since the x's are binary indicators, E{x^i} = Σ_x p(x) x^i = Σ_x p(x) δ(x, x^i) = p(x^i), so
E{x_t^i} = p(x_t^i | ȳ) = φ**_i / Σ_i φ**_i
E{x_t^i x_{t+1}^j} = p(x_t^i, x_{t+1}^j | ȳ) = ψ**_ij / Σ_{ij} ψ**_ij
These are our JTA ψ and φ marginals! (JTA is the E-step for a given θ.)
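
A sketch of one full EM iteration under the same assumptions as the earlier sketches: the E-step computes the φ** and ψ** marginals with a forward-backward pass, and the M-step renormalizes the expected counts. em_step and the toy observation sequence are my own, and a real implementation would rescale the messages to avoid underflow:

import numpy as np

def em_step(pi, A, E, obs):
    # One EM (Baum-Welch) iteration for a single observation sequence.
    T, M = len(obs), len(pi)
    # E-step: forward (phi*) and backward messages.
    fwd = np.zeros((T, M)); bwd = np.ones((T, M))
    fwd[0] = pi * E[:, obs[0]]
    for t in range(1, T):
        fwd[t] = E[:, obs[t]] * (fwd[t - 1] @ A)
    for t in range(T - 2, -1, -1):
        bwd[t] = A @ (E[:, obs[t + 1]] * bwd[t + 1])
    evidence = fwd[-1].sum()
    gamma = fwd * bwd / evidence                        # E{x_t^i}: the phi** marginals
    xi = np.zeros((M, M))                               # summed E{x_t^i x_{t+1}^j}: the psi** marginals
    for t in range(T - 1):
        xi += np.outer(fwd[t], E[:, obs[t + 1]] * bwd[t + 1]) * A / evidence
    # M-step: renormalize the expected counts.
    pi_new = gamma[0]
    A_new = xi / xi.sum(axis=1, keepdims=True)
    E_new = np.zeros_like(E)
    for t, y in enumerate(obs):
        E_new[:, y] += gamma[t]
    E_new /= E_new.sum(axis=1, keepdims=True)
    return pi_new, A_new, E_new

pi  = np.array([1/3, 2/3])
a   = np.array([[3/4, 1/4], [1/2, 1/2]])
eta = np.array([[1/2, 1/2], [1/3, 2/3]])
print(em_step(pi, a, eta, obs=[0, 1, 1, 0, 1]))         # the observation sequence here is mine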

Thank you!
So, to maximize the incomplete likelihood with EM: initialize the parameters randomly, run the Junction Tree Algorithm to get the marginals, use the marginals over the x's in the maximum likelihood step, and iterate.
Please complete the course evaluation on Courseworks.
Good luck with finals week and happy holidays!