Lecture 10: Expectation-Maximization Algorithm

ECE 645: Estimation Theory, Spring 2015
Instructor: Prof. Stanley H. Chan

Lecture 10: Expectation-Maximization Algorithm
(LaTeX prepared by Shaobo Fang)
May 4, 2015

This lecture note is based on ECE 645 (Spring 2015) by Prof. Stanley H. Chan in the School of Electrical and Computer Engineering at Purdue University.

1 Motivation

Consider a set of data points with their classes labeled, and assume that each class is a Gaussian, as shown in Figure 1(a). Given this set of data points, finding the mean of each class can be done easily by computing the sample mean, since the class labels are known. Now imagine that the classes are not labeled, as shown in Figure 1(b). How should we determine the mean of each class then?

In order to solve this problem, we could use an iterative approach: first make a guess of the class label for each data point, then compute the means, and then update the guess of the class labels again. We repeat until the means converge. A small numerical sketch of this idea is given after Figure 1.

The problem of estimating parameters in the absence of labels is known as unsupervised learning. There are many unsupervised learning methods; we will focus on the Expectation-Maximization (EM) algorithm.

[Figure 1: (a) classes labelled (Class 1, Class 2); (b) classes unlabelled. Estimation of the parameters becomes trivial given the labelled classes.]
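The following is a minimal MATLAB sketch, not part of the original notes, of the alternating "guess the labels, then update the means" idea for two one-dimensional Gaussian classes. The data and the initial guesses are assumptions made purely for illustration; it uses hard label assignments, whereas the EM algorithm developed below replaces them with soft (probabilistic) assignments.

% Assumed setup: two 1-D Gaussian classes with unknown means.
rng(1);
y  = [randn(100,1) - 2; randn(100,1) + 3];     % unlabeled samples from two classes
mu = [-1, 1];                                   % initial guess of the two means
for t = 1:50
    d = [abs(y - mu(1)), abs(y - mu(2))];       % distance of every point to each mean
    [~, label] = min(d, [], 2);                 % guess a label for every data point
    mu(1) = mean(y(label == 1));                % recompute the mean of each class
    mu(2) = mean(y(label == 2));
end
disp(mu)                                        % approaches the true means (-2, 3)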

2 The EM-algorithm

Notations

1. Y, y: observations. Y is a random variable; y is a realization of Y.
2. X, x: complete data.
3. Z, z: missing data. Note that X = (Y, Z).
4. θ: unknown deterministic parameter. θ^(t): the t-th estimate of θ in the EM iteration.
5. f(y | θ) is the distribution of Y given θ.
6. f(X | θ) is a random variable taking the value f(x | θ). (Remember: f(· | θ) is a function, and thus we can put any argument into f(· | θ) and evaluate its output.)
7. E_{X|y,θ}[g(X)] = ∫ g(x) f_{X|y,θ}(x | y, θ) dx is the conditional expectation of g(X) given Y = y and θ.
8. l(θ) = log f(y | θ) is the log-likelihood. Note that l(θ) depends on y.

EM Steps

The EM-algorithm consists of two steps:

1. E-step: Given y, and pretending for the moment that θ^(t) is correct, formulate the distribution of the complete data x: f(x | y, θ^(t)). Then calculate the Q-function

       Q(θ | θ^(t)) := E_{X|y,θ^(t)}[ log f(X | θ) ] = ∫ log f(x | θ) f(x | y, θ^(t)) dx.

2. M-step: Maximize Q(θ | θ^(t)) with respect to θ:

       θ^(t+1) = argmax_θ Q(θ | θ^(t)).

Properties of Q(θ | θ^(t))

1. Ideally, if we had the complete data x, then finding the parameter could be done by maximizing f(x | θ). However, the complete data is only a virtual object we created to solve the problem; in reality we never know x. All we know is its distribution, which depends on what we do know about x. One way to handle this uncertainty is to compute the average, and this average is the Q-function.

2. Another way of looking at Q(θ | θ^(t)): we can treat log f(X | θ) as a function of two variables, h(X, θ). Maximizing over θ is problematic because the function still depends on X. By taking the expectation E_X[h(X, θ)] we eliminate the dependency on X.

3. Q(θ | θ^(t)) can be thought of as a local approximation of the log-likelihood function l(θ). Here, by "local" we mean that Q(θ | θ^(t)) stays close to the previous estimate θ^(t). In fact, if Q(θ | θ^(t)) ≥ Q(θ^(t) | θ^(t)), then l(θ) ≥ l(θ^(t)) (this is proved in Section 7).

3 Estimating Mean with Partial Observation

Let us consider the first example of the EM algorithm. Suppose that we generate a sequence of n random variables Y_i ~ N(θ, σ²) for i = 1, ..., n. Imagine that we have only observed Y = [Y_1, Y_2, ..., Y_m], where m < n. How should we estimate θ based on Y? Intuitively, the estimate of θ should be the sample mean of the m observations,

       θ̂ = (1/m) ∑_{i=1}^m Y_i.

However, in this example we would like to derive the EM algorithm and see whether it matches this intuition.

Solution: To start the EM algorithm, we first need to specify the missing data and the complete data. In this problem, the missing data is Z = [Y_{m+1}, ..., Y_n], and the complete data is X = [Y, Z]. The distribution of X is given by

       log f(x | θ) = -(n/2) log(2πσ²) - ∑_{i=1}^n (Y_i - θ)² / (2σ²).    (1)

Therefore, the Q-function is

       Q(θ | θ^(t)) := E_{X|Y,θ^(t)}[ log f(X | θ) ]
                     = E_{X|Y,θ^(t)}[ -(n/2) log(2πσ²) - ∑_{i=1}^n (Y_i - θ)² / (2σ²) ]
                     = -(n/2) log(2πσ²) - ∑_{i=1}^m (y_i - θ)² / (2σ²) - ∑_{i=m+1}^n E_{Y_i|Y,θ^(t)}[ (Y_i - θ)² ] / (2σ²).

The last expectation can be evaluated as

       E_{Y_i|Y,θ^(t)}[ (Y_i - θ)² ] = E_{Y_i|Y,θ^(t)}[ Y_i² - 2Y_iθ + θ² ] = (θ^(t))² + σ² - 2θ^(t)θ + θ².

Therefore, the Q-function is

       Q(θ | θ^(t)) = -(n/2) log(2πσ²) - ∑_{i=1}^m (y_i - θ)² / (2σ²) - ((n - m)/(2σ²)) [ (θ^(t))² + σ² - 2θ^(t)θ + θ² ].

In the M-step, we need to maximize the Q-function. To this end, we set

       ∂Q(θ | θ^(t))/∂θ = 0,

which yields

       θ^(t+1) = ( ∑_{i=1}^m y_i + (n - m)θ^(t) ) / n.

It is not difficult to show that as t → ∞, θ^(t) → θ^(∞). Hence,

       θ^(∞) = (1/n) ∑_{i=1}^m y_i + (1 - m/n) θ^(∞),

which yields

       θ^(∞) = (1/m) ∑_{i=1}^m y_i.

This result says that as the EM algorithm converges, the estimated parameter converges to the sample mean of the m available samples, which is quite intuitive.
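The following is a minimal MATLAB sketch, not part of the original notes, that runs the update θ^(t+1) = ( ∑_{i=1}^m y_i + (n - m)θ^(t) ) / n on assumed synthetic data; the iterates converge to the sample mean of the m observed samples, exactly as derived above.

% Assumed synthetic setup: n samples in total, only the first m are observed.
rng(0);
n = 100;  m = 30;  sigma = 2;  theta_true = 5;
y = theta_true + sigma*randn(m,1);        % the m observed samples

theta = 0;                                % arbitrary initial guess theta^(0)
for t = 1:200
    theta = (sum(y) + (n - m)*theta)/n;   % EM update derived in Section 3
end
fprintf('EM estimate: %.4f, sample mean: %.4f\n', theta, mean(y));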

4 Gaussian Mixture with Known Mean and Variance

Our next example of the EM algorithm is to estimate the mixture weights of a Gaussian mixture with known means and variances. A Gaussian mixture is defined as

       f(y | θ) = ∑_{i=1}^k θ_i N(y | µ_i, σ_i²),    (2)

where θ = [θ_1, ..., θ_k] is called the mixture weight. The mixture weight satisfies the condition ∑_{i=1}^k θ_i = 1. Our goal is to derive the EM algorithm for θ.

Solution: We first need to define the missing data. For this problem, the observed data is Y = [y_1, y_2, ..., y_n]. The missing data can be defined as the label of each y_j, so that Z = [Z_1, Z_2, ..., Z_n], with Z_j ∈ {1, ..., k}. Consequently, the complete data is X = [X_1, X_2, ..., X_n], where X_j = (y_j, Z_j). The distribution of the complete data can be computed as

       f(x_j | θ) = f(y_j, z_j | θ) = θ_{z_j} N(y_j | µ_{z_j}, σ_{z_j}²).

Thus, the Q-function is

       Q(θ | θ^(t)) = E_{X|y,θ^(t)}{ log f(X | θ) } = E_{Z|y,θ^(t)}{ log f(Z, y | θ) }
                    = E_{Z|y,θ^(t)}{ log ∏_{j=1}^n θ_{Z_j} N(y_j | µ_{Z_j}, σ_{Z_j}²) }
                    = ∑_{j=1}^n E_{Z_j|y_j,θ^(t)}{ log θ_{Z_j} + log N(y_j | µ_{Z_j}, σ_{Z_j}²) }.

The expectation can be evaluated as

       E_{Z_j|y_j,θ^(t)}{ log θ_{Z_j} } = ∑_{z_j=1}^k log θ_{z_j} P(Z_j = z_j | y_j, θ^(t)) = ∑_{i=1}^k log θ_i · P(Z_j = i | y_j, θ^(t)),

where

       P(Z_j = i | y_j, θ^(t)) = θ_i^(t) N(y_j | µ_i, σ_i²) / ∑_{l=1}^k θ_l^(t) N(y_j | µ_l, σ_l²).

By summing over all j's, we can further define

       γ_i^(t) := ∑_{j=1}^n P(Z_j = i | y_j, θ^(t)).

Therefore, the Q-function becomes

       Q(θ | θ^(t)) = ∑_{i=1}^k γ_i^(t) log θ_i + C,

for some constant C independent of θ (the terms log N(y_j | µ_i, σ_i²) do not depend on θ because the means and variances are known). Maximizing over θ yields

       θ^(t+1) = argmax_θ ∑_{i=1}^k γ_i^(t) log θ_i,   i.e.,   θ_i^(t+1) = γ_i^(t) / ∑_{l=1}^k γ_l^(t) = γ_i^(t) / n,

where the last equality is due to the Gibbs inequality and the fact that ∑_{l=1}^k γ_l^(t) = n. To summarize, the EM algorithm is given in the algorithm below.

Data: Gaussian mixture with known means and variances
Result: Estimated θ
for t = 1, 2, ... do
       γ_i^(t) = ∑_{j=1}^n [ θ_i^(t) N(y_j | µ_i, σ_i²) / ∑_{l=1}^k θ_l^(t) N(y_j | µ_l, σ_l²) ]
       θ_i^(t+1) = γ_i^(t) / n
end

Remark: To solve argmax_θ ∑_{i=1}^k γ_i^(t) log θ_i, we use the Gibbs inequality. The Gibbs inequality states that for all α_i and β_i such that ∑_i α_i = 1, ∑_i β_i = 1, 0 ≤ α_i ≤ 1, and 0 ≤ β_i ≤ 1, it holds that

       ∑_i α_i log β_i ≤ ∑_i α_i log α_i,    (3)

with equality when α_i = β_i for all i. The proof of the Gibbs inequality follows from the non-negativity of the KL-divergence, which we skip. What we want to show is that if we let

       α_i = γ_i^(t) / ∑_l γ_l^(t),   β_i = θ_i,

then ∑_i γ_i^(t) log θ_i is maximized, with equality in (3), when

       θ_i = γ_i^(t) / ∑_l γ_l^(t),

which is the result we want.
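Below is a minimal MATLAB sketch, not part of the original notes, of the algorithm box above for k = 2 components with assumed (known) means and variances; only the mixture weights are estimated. The Gaussian density is written out explicitly so that no toolbox function is needed.

% Assumed setup: two components with known means/variances; estimate the weights.
rng(0);
mu = [0, 4];  sig = [1, 1];  theta_true = [0.3, 0.7];
n  = 2000;
n1 = round(n*theta_true(1));
y  = [mu(1) + sig(1)*randn(n1,1); mu(2) + sig(2)*randn(n - n1,1)];

theta = [0.5, 0.5];                               % initial mixture weights
for t = 1:100
    % responsibilities P(Z_j = i | y_j, theta^(t)); Gaussian pdfs written out
    N1 = exp(-(y - mu(1)).^2/(2*sig(1)^2))/sqrt(2*pi*sig(1)^2);
    N2 = exp(-(y - mu(2)).^2/(2*sig(2)^2))/sqrt(2*pi*sig(2)^2);
    W  = [theta(1)*N1, theta(2)*N2];
    W  = W ./ repmat(sum(W,2), 1, 2);             % normalize over the two components
    theta = sum(W,1)/n;                           % theta_i^(t+1) = gamma_i^(t)/n
end
disp(theta)                                       % close to the true weights [0.3, 0.7]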

5 Gaussian Mixture

Previously we worked on Gaussian mixtures with known means and variances. Most of the time, however, neither the means nor the variances are available to us. Thus, we are interested in deriving an EM algorithm that applies to a general Gaussian mixture model when only the observations are available. Recall that a Gaussian mixture is defined as

       f(y | θ) = ∑_{i=1}^k π_i N(y | µ_i, Σ_i),    (4)

where θ := {(π_i, µ_i, Σ_i)}_{i=1}^k is the parameter, with ∑_{i=1}^k π_i = 1. Our goal is to derive the EM algorithm for learning θ.

Solution: We first specify the following data:

Observed data: Y = [Y_1, ..., Y_n] with realizations y = [y_1, ..., y_n];
Missing data: Z = [Z_1, ..., Z_n] with realizations z = [z_1, ..., z_n], where z_j ∈ {1, ..., k};
Complete data: X = [X_1, ..., X_n] with realizations x = [x_1, ..., x_n] and x_j = (y_j, z_j).

Accordingly, the distribution of the complete data is

       f(y_j, z_j | θ) = π_{z_j} N(y_j | µ_{z_j}, Σ_{z_j}).

Therefore, we can show that

       γ_{ij}^(t) := P(Z_j = i | y_j, θ^(t)) = π_i^(t) N(y_j | µ_i^(t), Σ_i^(t)) / ∑_{l=1}^k π_l^(t) N(y_j | µ_l^(t), Σ_l^(t)).

The Q-function is

       Q(θ | θ^(t)) = E_{X|y,θ^(t)}{ log f(X | θ) } = E_{Z|y,θ^(t)}{ log f(Z, y | θ) }
                    = ∑_{j=1}^n E_{Z_j|y_j,θ^(t)}{ log( π_{Z_j} N(y_j | µ_{Z_j}, Σ_{Z_j}) ) }
                    = ∑_{j=1}^n E_{Z_j|y_j,θ^(t)}{ log π_{Z_j} - (1/2) log|Σ_{Z_j}| - (1/2)(y_j - µ_{Z_j})^T Σ_{Z_j}^{-1} (y_j - µ_{Z_j}) } + C
                    = ∑_{j=1}^n ∑_{i=1}^k P(Z_j = i | y_j, θ^(t)) { log π_i - (1/2) log|Σ_i| - (1/2)(y_j - µ_i)^T Σ_i^{-1} (y_j - µ_i) } + C
                    = ∑_{j=1}^n ∑_{i=1}^k γ_{ij}^(t) { log π_i - (1/2) log|Σ_i| - (1/2)(y_j - µ_i)^T Σ_i^{-1} (y_j - µ_i) } + C,

where C is a constant independent of θ. The maximization step is to solve the following optimization problem:

       maximize_θ   Q(θ | θ^(t))
       subject to   ∑_{i=1}^k π_i = 1,  π_i > 0,  Σ_i ≻ 0.    (5)

For π_i, the maximization is

       maximize_π   ∑_{j=1}^n ∑_{i=1}^k γ_{ij}^(t) log π_i
       subject to   ∑_{i=1}^k π_i = 1,  π_i > 0.    (6)

The solution of this problem is

       π_i^(t+1) = ∑_{j=1}^n γ_{ij}^(t) / ∑_{l=1}^k ∑_{j=1}^n γ_{lj}^(t) = ∑_{j=1}^n γ_{ij}^(t) / n.    (7)

For µ_i, the maximization can be reduced to solving the equation

       ∇_{µ_i} Q(θ | θ^(t)) = 0.    (8)

The left-hand side is

       ∇_{µ_i} Q(θ | θ^(t)) = ∇_{µ_i} { -(1/2) ∑_{j=1}^n γ_{ij}^(t) (y_j - µ_i)^T Σ_i^{-1} (y_j - µ_i) }
                            = Σ_i^{-1} ( ∑_{j=1}^n γ_{ij}^(t) y_j - ∑_{j=1}^n γ_{ij}^(t) µ_i ).

Therefore,

       µ_i^(t+1) = ∑_{j=1}^n γ_{ij}^(t) y_j / ∑_{j=1}^n γ_{ij}^(t).    (9)

For Σ_i, the maximization is equivalent to solving

       ∇_{Σ_i} Q(θ | θ^(t)) = 0.    (10)

The left-hand side is

       ∇_{Σ_i} Q(θ | θ^(t)) = -(1/2) ( ∑_{j=1}^n γ_{ij}^(t) ) ∇_{Σ_i} log|Σ_i| - (1/2) ∑_{j=1}^n γ_{ij}^(t) ∇_{Σ_i} { (y_j - µ_i)^T Σ_i^{-1} (y_j - µ_i) }
                            = -(1/2) ( ∑_{j=1}^n γ_{ij}^(t) ) Σ_i^{-1} + (1/2) ∑_{j=1}^n γ_{ij}^(t) Σ_i^{-1} (y_j - µ_i)(y_j - µ_i)^T Σ_i^{-1}.

Therefore,

       Σ_i^(t+1) = ∑_{j=1}^n γ_{ij}^(t) (y_j - µ_i^(t+1))(y_j - µ_i^(t+1))^T / ∑_{j=1}^n γ_{ij}^(t).    (11)
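The following is a minimal one-dimensional MATLAB sketch, not part of the original notes, that alternates the E-step with the updates (7), (9), and (11) on assumed synthetic data; scalar variances replace the covariance matrices Σ_i purely for brevity.

% Assumed 1-D data and initialization; k = 2 components.
rng(0);
y  = [randn(300,1) - 2; 1.5*randn(700,1) + 3];    % synthetic mixture data
n  = numel(y);  k = 2;
pi_w = ones(1,k)/k;  mu = [-1, 1];  s2 = [1, 1];  % initial parameters

for t = 1:100
    % E-step: responsibilities gamma_{ij}^(t)
    G = zeros(n,k);
    for i = 1:k
        G(:,i) = pi_w(i)*exp(-(y - mu(i)).^2/(2*s2(i)))/sqrt(2*pi*s2(i));
    end
    G = G ./ repmat(sum(G,2), 1, k);
    % M-step: updates (7), (9), (11)
    Nk   = sum(G,1);
    pi_w = Nk/n;
    for i = 1:k
        mu(i) = sum(G(:,i).*y)/Nk(i);
        s2(i) = sum(G(:,i).*(y - mu(i)).^2)/Nk(i);
    end
end
disp([pi_w; mu; s2])   % rows: estimated weights, means, variances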

6 Bernoulli Mixture

Our next example is the Bernoulli mixture model. To motivate this problem, imagine that we have a dataset of various items. Our goal is to see whether there is any relationship between the presence or absence of these items. For example, if object A (e.g., a tree) is present, there is some probability that object B (e.g., a flower) is also present. However, if a certain object C (e.g., a dinosaur) is present, it is unlikely that we also see object D (e.g., a car, unless you are in Jurassic Park!).

To set up the problem, let us first define some notation. We use Y^1, ..., Y^N to denote the N images we have observed. In each image there are at most M items, so that Y^n = [Y_1^n, ..., Y_M^n] for n = 1, ..., N. Each entry of this vector is a Bernoulli random variable. Moreover, we define

       P(Y_i^n = 1 | Y_k^n = 1) := θ_ik.    (12)

Therefore, the goal is to estimate the matrix

       Θ = [ θ_11  ⋯  θ_1M ]
           [  ⋮         ⋮  ]
           [ θ_M1  ⋯  θ_MM ]    (13)

from the observations Y^1, ..., Y^N.

The general problem of estimating Θ from Y^1, ..., Y^N is very difficult. Therefore, it is necessary to impose some assumptions on the problem. The assumption we make here is semi-valid from our daily experience: it is not completely true, but it is simple enough to provide a computational solution.

Assumption 1 (Conditional Independence). We assume that the observations follow the conditional independence structure

       P(Y_i^n = 1, Y_j^n = 1 | Y_k^n = 1) = P(Y_i^n = 1 | Y_k^n = 1) · P(Y_j^n = 1 | Y_k^n = 1).    (14)

Remark: Conditional independence is not the same as independence. For example, let A be the event that a puppy breaks a toy, B the event that a mother yells, and C the event that a child cries. Without knowing the relationship, it could be that the child cries because the mother yells. However, if we assume the conditional independence of B and C given A, then the crying of the child and the yelling of the mother are both triggered by the dog, but not by each other.

Individual Model

In order to understand the EM algorithm for the Bernoulli mixture, let us fix n. Consequently,

       P(Y^n = y^n) = ∑_{m=1}^M P(Y^n = y^n | item m is active) · P(item m is active),

where we define π_m := P(item m is active). Furthermore,

       P(Y^n = y^n | item m is active) = ∏_{i=1}^M θ_mi^{y_i^n} (1 - θ_mi)^{1 - y_i^n} =: f_m(y^n | θ_m),

where θ_m = [θ_m1, ..., θ_mM] is the m-th row of Θ. Therefore,

       P(Y^n = y^n) = ∑_{m=1}^M π_m f_m(y^n | θ_m).    (15)

EM Algorithm

Now we derive the EM algorithm to estimate {π_1, ..., π_M} and Θ. To start, let us define the following types of data:

Observed data: Y^1, ..., Y^N;
Missing data: Z^1, ..., Z^N with realizations z^1, ..., z^N, where z^n ∈ {1, ..., M} is the label of the active item in image n;
Complete data: X^1, ..., X^N with, accordingly, x^n = (y^n, z^n).

The distribution of the complete data is

       P(Y^n = y^n, Z^n = m | Θ) = π_m f_m(y^n | θ_m).

The distribution of the missing data conditioned on the observed data is

       γ_nm^(t) := P(Z^n = m | Y^n = y^n, Θ^(t)) = π_m^(t) f_m(y^n | θ_m^(t)) / ∑_{m'=1}^M π_{m'}^(t) f_{m'}(y^n | θ_{m'}^(t)).

The n-th Q-function is

       Q_n(Θ | Θ^(t)) := E_{Z^n|y^n,Θ^(t)}[ log f(x^n | Θ) ] = E_{Z^n|y^n,Θ^(t)}[ log f(Z^n, y^n | Θ) ]
                      = ∑_{m=1}^M γ_nm^(t) log( π_m f_m(y^n | θ_m) ),

where

       log( π_m f_m(y^n | θ_m) ) = log π_m + ∑_{i=1}^M [ y_i^n log θ_mi + (1 - y_i^n) log(1 - θ_mi) ].

Therefore, the overall Q-function is

       Q(Θ | Θ^(t)) = ∑_{n=1}^N ∑_{m=1}^M γ_nm^(t) [ log π_m + ∑_{i=1}^M ( y_i^n log θ_mi + (1 - y_i^n) log(1 - θ_mi) ) ].    (16)

To maximize the Q-function, we solve

       Θ^(t+1) = argmax_Θ Q(Θ | Θ^(t)).    (17)

For a fixed m and i, we have

       ∂Q(Θ | Θ^(t))/∂θ_mi = ∑_{n=1}^N γ_nm^(t) [ y_i^n / θ_mi - (1 - y_i^n) / (1 - θ_mi) ].

Setting this to zero yields

       ∑_{n=1}^N γ_nm^(t) y_i^n / θ_mi = ∑_{n=1}^N γ_nm^(t) (1 - y_i^n) / (1 - θ_mi),

which gives

       θ_mi^(t+1) = ∑_{n=1}^N γ_nm^(t) y_i^n / ∑_{n=1}^N γ_nm^(t).

The EM algorithm is summarized below.

Data: EM algorithm for the Bernoulli mixture model
Result: Estimated Θ and π_m
for t = 1, 2, ... do
       γ_nm^(t) = π_m^(t) f_m(y^n | θ_m^(t)) / ∑_{m'=1}^M π_{m'}^(t) f_{m'}(y^n | θ_{m'}^(t))
       θ_mi^(t+1) = ∑_{n=1}^N γ_nm^(t) y_i^n / ∑_{n=1}^N γ_nm^(t)
       π_m^(t+1) = ∑_{n=1}^N γ_nm^(t) / N    (18)
end
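Below is a compact MATLAB sketch, not part of the original notes, of these updates on assumed synthetic data with two components; Section 9 gives the notes' own, more detailed MATLAB demo of the same model.

% Assumed setup: N binary vectors of length M from a 2-component Bernoulli mixture.
rng(0);
N = 1000;  M = 4;
pi_true = [0.4, 0.6];
Th_true = [0.9 0.8 0.1 0.2;                  % m-th row: P(Y_i^n = 1 | component m)
           0.1 0.2 0.9 0.8];
z = 1 + (rand(N,1) > pi_true(1));            % hidden component labels
Y = double(rand(N,M) < Th_true(z,:));        % observed binary vectors

pim = [0.5, 0.5];  Th = 0.3 + 0.4*rand(2,M); % initial parameters
for t = 1:100
    % E-step: gamma_nm = P(Z^n = m | y^n, Theta^(t)), with f_m as in (15)
    G = zeros(N,2);
    for m = 1:2
        Tm = repmat(Th(m,:), N, 1);
        G(:,m) = pim(m) * prod(Tm.^Y .* (1-Tm).^(1-Y), 2);
    end
    G = G ./ repmat(sum(G,2), 1, 2);
    % M-step: the updates in (18)
    Nm  = sum(G,1);
    pim = Nm / N;
    Th  = (G' * Y) ./ repmat(Nm', 1, M);
end
disp(pim)                                    % recovered weights (up to relabeling)
disp(Th)                                     % recovered Theta   (up to relabeling)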

7 Convergence of EM

The convergence of the EM algorithm is known to be local. What this means is that, as the EM algorithm iterates, θ^(t+1) will never be less likely than θ^(t). This property is called the monotonicity of EM, and it is a consequence of the following theorem.

Theorem 1. Let X and Y be two random variables with parametric distributions controlled by a parameter θ ∈ Λ. Suppose that

1. the support X of X does not depend on θ;
2. there exists a Markov relationship θ → X → Y, i.e., f(y | x, θ) = f(y | x), for all θ ∈ Λ, x ∈ X, and y ∈ Y.

Then, for θ ∈ Λ and y ∈ Y such that X(y) (the set of complete-data values consistent with y) is nonempty, we have

       l(θ) ≥ l(θ^(t))   if   Q(θ | θ^(t)) ≥ Q(θ^(t) | θ^(t)).    (19)

Proof.

       l(θ) = log f(y | θ)                                                                        (by definition)
            = log ∫_{X(y)} f(x, y | θ) dx                                                         (marginalization, i.e., total probability)
            = log ∫_{X(y)} [ f(x, y | θ) / f(x | y, θ^(t)) ] f(x | y, θ^(t)) dx
            = log E_{X|y,θ^(t)}[ f(X, y | θ) / f(X | y, θ^(t)) ]
            ≥ E_{X|y,θ^(t)}[ log ( f(X, y | θ) / f(X | y, θ^(t)) ) ]                              (Jensen's inequality)
            = E_{X|y,θ^(t)}[ log ( f(y | X, θ) f(X | θ) f(y | θ^(t)) / ( f(y | X, θ^(t)) f(X | θ^(t)) ) ) ]   (Bayes' rule)
            = E_{X|y,θ^(t)}[ log ( f(y | X) f(X | θ) f(y | θ^(t)) / ( f(y | X) f(X | θ^(t)) ) ) ]             (assumption 2)
            = E_{X|y,θ^(t)}[ log ( f(X | θ) f(y | θ^(t)) / f(X | θ^(t)) ) ]
            = E_{X|y,θ^(t)}[ log f(X | θ) ] - E_{X|y,θ^(t)}[ log f(X | θ^(t)) ] + E_{X|y,θ^(t)}[ log f(y | θ^(t)) ]
            = Q(θ | θ^(t)) - Q(θ^(t) | θ^(t)) + l(θ^(t)).

Thus, l(θ) - l(θ^(t)) ≥ Q(θ | θ^(t)) - Q(θ^(t) | θ^(t)). Hence, if Q(θ | θ^(t)) ≥ Q(θ^(t) | θ^(t)), then l(θ) ≥ l(θ^(t)).

8 Using Prior with EM

The EM algorithm can fail due to singularity of the log-likelihood function. For example, when learning a GMM with 10 components, the algorithm may decide that the most likely solution is for one of the Gaussians to have only one data point assigned to it, which can yield the bad result of a zero covariance. To alleviate this problem, one can use prior information about θ. In this case, we modify the EM steps as

       E-step:  Q(θ | θ^(t)) = E_{X|y,θ^(t)}[ log f(X | θ) ];
       M-step:  θ^(t+1) = argmax_θ { Q(θ | θ^(t)) + log f(θ) },

where f(θ) is the prior.

Example. Assume that we have a GMM of k components:

       f(y_j | θ) = ∑_{i=1}^k w_i N(y_j | µ_i, σ²).    (20)

Let us consider a constraint on µ_i: µ_i = µ + (i - 1)∆µ for i = 1, ..., k, i.e., the means are equally spaced. (For details please refer to Section 3.3 of Gupta and Chen.)

Priors: We assume the following priors:

1. σ² ~ inverse-gamma(ν/2, ξ²/2). That is,

       f(σ²) = ( (ξ²/2)^{ν/2} / Γ(ν/2) ) (σ²)^{-ν/2 - 1} exp( -ξ²/(2σ²) ).

2. ∆µ | σ² ~ N(η, σ²/ρ). That is,

       f(∆µ | σ²) ∝ (σ²/ρ)^{-1/2} exp( -ρ(∆µ - η)² / (2σ²) ).

Therefore, the joint distribution of the prior is

       f(∆µ, σ²) ∝ (σ²)^{-(ν+3)/2} exp( -( ξ² + ρ(∆µ - η)² ) / (2σ²) ).    (21)

Parameters: θ = (w_1, ..., w_k, µ, ∆µ, σ²). Our goal is to estimate θ.

EM algorithm: First of all, we let

       γ_ij^(t) = w_i^(t) N(y_j | µ_i^(t), σ²^(t)) / ∑_{l=1}^k w_l^(t) N(y_j | µ_l^(t), σ²^(t)).    (22)

The EM steps can be derived as follows.

The Expectation Step:

       Q(θ | θ^(t)) = ∑_{j=1}^n ∑_{i=1}^k γ_ij^(t) log( w_i N(y_j | µ_i, σ²) )
                    = ∑_{j=1}^n ∑_{i=1}^k γ_ij^(t) log( w_i N(y_j | µ + (i-1)∆µ, σ²) )
                    = ∑_{j=1}^n ∑_{i=1}^k γ_ij^(t) log w_i - (n/2) log(2π) - (n/2) log(σ²)
                      - (1/(2σ²)) ∑_{j=1}^n ∑_{i=1}^k γ_ij^(t) ( y_j - µ - (i-1)∆µ )².

The Maximization Step:

       θ^(t+1) = argmax_θ  Q(θ | θ^(t)) + log f(θ)
               = argmax_θ  ∑_{j=1}^n ∑_{i=1}^k γ_ij^(t) log w_i - ((n + ν + 3)/2) log σ² - ( ξ² + ρ(∆µ - η)² ) / (2σ²)
                           - (1/(2σ²)) ∑_{j=1}^n ∑_{i=1}^k γ_ij^(t) ( y_j - µ - (i-1)∆µ )² + C.

Thus,

       w_i^(t+1) = ∑_{j=1}^n γ_ij^(t) / ∑_{l=1}^k ∑_{j=1}^n γ_lj^(t) = (1/n) ∑_{j=1}^n γ_ij^(t),

and setting

       ∂/∂µ [ Q(θ | θ^(t)) + log f(θ) ] = 0   and   ∂/∂∆µ [ Q(θ | θ^(t)) + log f(θ) ] = 0

gives the 2×2 linear system

       [ 1                              ∑_{i=1}^k (i-1) w_i^(t+1)         ] [ µ  ]   [ (1/n) ∑_{j=1}^n y_j                                     ]
       [ ∑_{i=1}^k (i-1) w_i^(t+1)      ∑_{i=1}^k (i-1)² w_i^(t+1) + ρ/n  ] [ ∆µ ] = [ (1/n) ( ρη + ∑_{j=1}^n ∑_{i=1}^k (i-1) γ_ij^(t) y_j )   ].

The solution for µ and ∆µ is obtained by solving this linear system. Finally, setting

       ∂/∂σ² [ Q(θ | θ^(t)) + log f(θ) ] = 0

yields

       σ²^(t+1) = [ ξ² + ρ( ∆µ^(t+1) - η )² + ∑_{j=1}^n ∑_{i=1}^k γ_ij^(t) ( y_j - µ^(t+1) - (i-1)∆µ^(t+1) )² ] / (n + ν + 3).
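The following is a minimal MATLAB sketch, not part of the original notes, of the MAP-EM updates just derived; the synthetic data and the hyperparameter values ν, ξ², ρ, η are assumptions made purely for illustration. The linear system is written in its un-normalized form (both rows multiplied by n), which is equivalent to the system above.

% Assumed data and hyperparameters: MAP-EM for mu_i = mu + (i-1)*dmu, shared sigma^2.
rng(0);
k = 3;  n = 600;
y = [randn(200,1); randn(200,1)+2; randn(200,1)+4];    % true mu = 0, dmu = 2
nu = 1;  xi2 = 1;  rho = 1;  eta = 1;                  % assumed hyperparameters

w = ones(1,k)/k;  mu = 0.5;  dmu = 1;  s2 = 4;         % initialization
for t = 1:100
    % E-step: responsibilities gamma_ij as in (22)
    G = zeros(n,k);
    for i = 1:k
        mui = mu + (i-1)*dmu;
        G(:,i) = w(i)*exp(-(y - mui).^2/(2*s2))/sqrt(2*pi*s2);
    end
    G = G ./ repmat(sum(G,2), 1, k);
    % M-step with the prior
    w  = sum(G,1)/n;
    c  = (0:k-1);                                      % the (i-1) values
    S1 = sum(G*c');                                    % sum_j sum_i (i-1)   gamma_ij
    S2 = sum(G*(c.^2)');                               % sum_j sum_i (i-1)^2 gamma_ij
    A  = [n, S1; S1, S2 + rho];
    b  = [sum(y); rho*eta + sum((G*c').*y)];
    sol = A\b;  mu = sol(1);  dmu = sol(2);
    R  = 0;
    for i = 1:k
        R = R + sum(G(:,i).*(y - mu - (i-1)*dmu).^2);
    end
    s2 = (xi2 + rho*(dmu - eta)^2 + R)/(n + nu + 3);
end
disp([mu, dmu, s2])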

9 MATLAB Demo: EM Algorithm for Bernoulli Mixture

9.1 Synthesize the Data

function [data_rand] = MakeData(DS, u_vec, p_mat)
% Generate DS samples from a Bernoulli mixture with weights u_vec and
% probability matrix p_mat, and return the rows in random order.

cnt = 0;
for i = 1:1:length(u_vec)
    N = DS*u_vec(i);               % number of samples from mixture component i
    p_vec = p_mat(i,:);
    %%
    for m = 1:1:length(p_vec)
        data_vec = randperm(N);
        th = N*p_mat(i,m);
        for n = 1:1:N
            if data_vec(n) > th
                data_vec(n) = 0;
            else
                data_vec(n) = 1;
            end
        end
        data(cnt+1:cnt+N, m) = data_vec';
    end
    cnt = cnt + N;
end

%% Now randomly permute the rows of the matrix
[row, column] = size(data);
row_vec = randperm(row);
for i = 1:1:row
    randtemp = row_vec(i);
    data_rand(i,:) = data(randtemp,:);
end

end

9.2 Estimate the Probability of a Vector Given a Bernoulli Distribution

function [p_b] = Bernoulli_vec(p_vec, y_vec)
%% Calculate the probability of y_vec under the current Bernoulli mixture component
p_b = 1;
for i = 1:1:length(p_vec)
    p_b = p_b*(p_vec(i)^(y_vec(i)))*((1-p_vec(i))^(1-y_vec(i)));
end

end

9.3 The Main Function for EM with Bernoulli Mixture

close all
clear all
clc
DS = input('Enter the synthesized data size:');
u_vec = [1/4, 1/2, 1/4]
p_mat = [1,   0.4,  0.05;
         0.2, 1,    0.8;
         0.3, 0.7,  1]
data_rand = MakeData(DS, u_vec, p_mat);
T = input('Enter the desired number of iterations:');

%% Pick initialization of parameters
u_initial = [1/4, 1/8, 5/8];
p_initial = [0.3, 0.2,  0.8;
             0.1, 0.8,  0.7;
             0.5, 0.15, 0.6];

M = length(u_initial);
N = size(data_rand, 1);
% Initialize the parameters
u = u_initial;
p = p_initial;

u_history = zeros(M,T);
p_history = zeros(M,M,T);

for t = 1:1:T
    % E-step: compute the hidden-variable posteriors lambda(m,n)
    for m = 1:1:M
        p_m = u(m);
        p_vec = p(m,:);
        for n = 1:1:N
            y_vec = data_rand(n,:);
            %% Find the hidden variable, lambda
            numerator = p_m*Bernoulli_vec(p_vec, y_vec);   % model the Bernoulli process
            denom = 0;
            for mm = 1:1:M
                p_vec_tmp = p(mm,:);
                denom = denom + u(mm)*Bernoulli_vec(p_vec_tmp, y_vec);
            end
            lambda(m,n) = numerator/denom;
        end
    end

    sum_lambda = sum(sum(lambda));

    %% M-step: update the mixture weights u
    for m = 1:1:M
        u(m) = sum(lambda(m,:))/sum_lambda;
    end

    %% M-step: update the P matrix
    for i = 1:1:M
        for m = 1:1:M
            p(m,i) = (sum(lambda(m,:).*data_rand(:,i)'))/(sum(lambda(m,:)));
        end
    end

    %% Save in history for each iteration to plot
    u_history(:,t) = u;
    p_history(:,:,t) = p;
end
disp('Updated p and u:')
p
u

figure
hold on
grid on
for m = 1:1:M
    plot(u_history(m,:));
end
ylabel('Estimated \mu value', 'FontSize', 20)
xlabel('Iterations', 'FontSize', 20)
title('Convergence of \mu estimated for Mixture Number 3', 'FontSize', 20)
for m = 1:1:M
    stem(T, u_vec(m));
end

figure
hold on
grid on
for ii = 1:1:M
    for jj = 1:1:M
        for t = 1:1:T
            tmp = p_history(ii,jj,t);
            plot_vec(t) = tmp;
        end
        plot(plot_vec)
    end
end
ylabel('Estimated P matrix values', 'FontSize', 20)
xlabel('Iterations', 'FontSize', 20)
title('Convergence of P matrix estimated for Mixture Number 3', 'FontSize', 20)
for m = 1:1:M
    for n = 1:1:M
        stem(T, p_mat(m,n));
    end
end

for m = 1:1:M
    one_loc = find(abs(p(m,:) - 1) == min(abs(p(m,:) - 1)));
    p_final(one_loc,:) = p(m,:);
    u_final(one_loc) = u(m);
end

disp('After Automatic Sorting Based on Diagonals:')
p_final
u_final
