The Expectation-Maximisation Algorithm

Chapter 4: The Expectation-Maximisation Algorithm

4.1 The EM algorithm - a method for maximising the likelihood

Let us suppose that we observe $Y = \{Y_i\}_{i=1}^{n}$. The joint density of $Y$ is $f(Y;\theta_0)$, where $\theta_0$ is an unknown parameter. Our objective is to estimate $\theta_0$. The log-likelihood of $Y$ is
$$L_n(Y;\theta) = \log f(Y;\theta).$$
Observe that we have not specified that the $\{Y_i\}$ are iid random variables. This is because the procedure described below is extremely general and the observations do not need to be either independent or identically distributed (indeed, a very interesting extension of this procedure is to time series with missing data, first proposed in Shumway and Stoffer (1982) and Engle and Watson (1982)). Our objective is to estimate $\theta_0$ in the situation where either evaluating the log-likelihood $L_n$ or maximising $L_n$ is difficult, hence an alternative means of maximising $L_n$ is required. Often there exist unobserved data $U = \{U_i\}_{i=1}^{m}$ for which the likelihood of $(Y,U)$ can be easily evaluated. It is through these unobserved data that we find an alternative method for maximising $L_n$.

Example 4.1.1 Let us suppose that $\{T_i\}_{i=1}^{n+m}$ are iid survival times with density $f(x;\theta_0)$. Some of these times are censored and we observe $\{Y_i\}_{i=1}^{n+m}$, where $Y_i = \min(T_i, c)$. To simplify notation we will suppose that $Y_i = T_i$ for $1 \le i \le n$, hence the survival time for $i \le n$ is observed,

but $Y_i = c$ for $n+1 \le i \le n+m$. Using the results from the earlier section on censoring, the log-likelihood of $Y$ is
$$L_n(Y;\theta) = \sum_{i=1}^{n} \log f(Y_i;\theta) + \sum_{i=n+1}^{n+m} \log \bar{F}(Y_i;\theta),$$
where $\bar{F} = 1-F$ denotes the survival function. The observations $\{Y_i\}_{i=n+1}^{n+m}$ can be treated as if they were missing. Define the complete observations $U = \{T_i\}_{i=n+1}^{n+m}$; hence $U$ contains the unobserved survival times. Then the likelihood of $(Y,U)$ is
$$L_n(Y,U;\theta) = \sum_{i=1}^{n+m} \log f(T_i;\theta).$$
Usually it is a lot easier to maximise $L_n(Y,U)$ than $L_n(Y)$.

We now formally describe the EM-algorithm. As mentioned in the discussion above, it is easier to deal with the joint likelihood of $(Y,U)$ than with the likelihood of $Y$ itself, so let us consider this likelihood in detail. Suppose that the joint log-likelihood of $(Y,U)$ is
$$L_n(Y,U;\theta) = \log f(Y,U;\theta).$$
This likelihood is often called the complete likelihood; we will assume that if $U$ were known, then this likelihood would be easy to obtain and differentiate. We will also assume that the density $f(U|Y;\theta)$ is known and easy to evaluate. By using Bayes' theorem it is straightforward to show that
$$\log f(Y,U;\theta) = \log f(Y;\theta) + \log f(U|Y;\theta), \qquad (4.1)$$
that is, $L_n(Y,U;\theta) = L_n(Y;\theta) + \log f(U|Y;\theta)$.

Of course, in reality $\log f(Y,U;\theta)$ is unknown, because $U$ is unobserved. However, let us consider the expected value of $\log f(Y,U;\theta)$ given what we observe, $Y$. That is,
$$Q(\theta_0,\theta) = \mathrm{E}\big[\log f(Y,U;\theta)\,\big|\,Y,\theta_0\big] = \int \log f(Y,u;\theta)\, f(u|Y,\theta_0)\, du, \qquad (4.2)$$
where $f(u|Y,\theta_0)$ is the conditional density of $U$ given $Y$ and the unknown parameter $\theta_0$. Hence if $f(u|Y,\theta_0)$ were known, then $Q(\theta_0,\theta)$ could be evaluated.

Remark 4.1.1 It is worth noting that $Q(\theta_0,\theta) = \mathrm{E}[\log f(Y,U;\theta)|Y,\theta_0]$ can be viewed as the best predictor of the complete likelihood (involving both the observed and unobserved data $(Y,U)$) given what is observed, $Y$. We recall that the conditional expectation is the best predictor of $U$ in terms of mean squared error, that is, the function of $Y$ which minimises the mean squared error: $\mathrm{E}(U|Y) = \arg\min_{g}\mathrm{E}\big(U - g(Y)\big)^2$.

The EM algorithm is based on iterating $Q(\cdot)$ in such a way that at each step we obtain a $\theta$ which gives a larger value of $Q(\cdot)$ (and, as we will show later, this gives a larger $L_n(Y;\theta)$). We describe the EM-algorithm below.

The EM-algorithm:

(i) Define an initial value $\theta_1 \in \Theta$. Let $\theta^* = \theta_1$.

(ii) The expectation step (the $(k+1)$th step): for a fixed $\theta^*$ evaluate
$$Q(\theta^*,\theta) = \mathrm{E}\big[\log f(Y,U;\theta)\,\big|\,Y,\theta^*\big] = \int \log f(Y,u;\theta)\, f(u|Y,\theta^*)\, du$$
for all $\theta \in \Theta$.

(iii) The maximisation step: evaluate $\theta_{k+1} = \arg\max_{\theta\in\Theta} Q(\theta^*,\theta)$. We note that the maximisation can be done by finding the solution of
$$\mathrm{E}\Big[\frac{\partial \log f(Y,U;\theta)}{\partial\theta}\,\Big|\,Y,\theta^*\Big] = 0.$$

(iv) If $\theta_k$ and $\theta_{k+1}$ are sufficiently close to each other, stop the algorithm and set $\hat{\theta}_n = \theta_{k+1}$. Else set $\theta^* = \theta_{k+1}$, go back and repeat steps (ii) and (iii) again.

We use $\hat{\theta}_n$ as an estimator of $\theta_0$.
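The following is a minimal sketch of this generic iteration in Python (it is not part of the notes; the functions `e_step_Q` and `maximise_Q`, which encode the model-specific expectation and maximisation steps, are placeholders that the user must supply):

```python
import numpy as np

def em(theta_init, e_step_Q, maximise_Q, tol=1e-8, max_iter=500):
    """Generic EM iteration.

    e_step_Q(theta_star) returns a function theta -> Q(theta_star, theta),
    the conditional expectation of the complete log-likelihood (step (ii)).
    maximise_Q(Q) returns argmax_theta Q(theta) (step (iii)).
    Both are model specific and must be supplied by the user.
    """
    theta = np.asarray(theta_init, dtype=float)
    for _ in range(max_iter):
        Q = e_step_Q(theta)            # expectation step: build Q(theta*, .)
        theta_new = maximise_Q(Q)      # maximisation step
        if np.max(np.abs(theta_new - theta)) < tol:   # step (iv): stop when close
            return theta_new
        theta = theta_new              # set theta* = theta_{k+1} and repeat
    return theta
```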

To understand why this iteration is connected to maximising $L_n(Y;\theta)$ and, under certain conditions, gives a good estimator of $\theta_0$ (in the sense that $\hat{\theta}_n$ is close to the parameter which maximises $L_n$), let us return to (4.1). Taking the expectation of $\log f(Y,U;\theta)$, conditioned on $Y$ under the parameter $\theta^*$, we have
$$Q(\theta^*,\theta) = \mathrm{E}\big[\log f(Y,U;\theta)|Y,\theta^*\big] = \log f(Y;\theta) + \mathrm{E}\big[\log f(U|Y;\theta)|Y,\theta^*\big]$$
(observe that this is like (4.2), but the distribution used in the expectation is $f(u|Y,\theta^*)$ instead of $f(u|Y,\theta_0)$). Define
$$D(\theta^*,\theta) = \mathrm{E}\big[\log f(U|Y;\theta)|Y,\theta^*\big] = \int \log f(u|Y;\theta)\, f(u|Y,\theta^*)\, du.$$
Hence we have
$$Q(\theta^*,\theta) = L_n(\theta) + D(\theta^*,\theta). \qquad (4.3)$$
Now recall that at the $(k+1)$th iteration of the EM-algorithm, $\theta_{k+1}$ maximises $Q(\theta_k,\theta)$ over all $\theta\in\Theta$, hence $Q(\theta_k,\theta_{k+1}) \ge Q(\theta_k,\theta_k)$. In the lemma below we show that $L_n(\theta_{k+1}) \ge L_n(\theta_k)$; hence at each iteration of the EM-algorithm we obtain a $\theta_{k+1}$ which increases the likelihood over the previous iteration.

Lemma 4.1.1 We have $L_n(\theta_{k+1}) \ge L_n(\theta_k)$. Moreover, under certain conditions $\theta_k$ converges to the maximum likelihood estimator $\arg\max_\theta L_n(Y;\theta)$ (we do not prove this part of the result here).

PROOF. From (4.3) it is clear that
$$Q(\theta_k,\theta_{k+1}) - Q(\theta_k,\theta_k) = \big[L_n(\theta_{k+1}) - L_n(\theta_k)\big] + \big[D(\theta_k,\theta_{k+1}) - D(\theta_k,\theta_k)\big]. \qquad (4.4)$$
We will now show that $D(\theta_k,\theta_{k+1}) - D(\theta_k,\theta_k) \le 0$; the result follows from this. We observe that
$$D(\theta_k,\theta_{k+1}) - D(\theta_k,\theta_k) = \int \log\frac{f(u|Y;\theta_{k+1})}{f(u|Y;\theta_k)}\, f(u|Y,\theta_k)\, du.$$
Now, by using Jensen's inequality (which we have used several times previously) we have
$$D(\theta_k,\theta_{k+1}) - D(\theta_k,\theta_k) \le \log\int f(u|Y;\theta_{k+1})\, du = \log 1 = 0.$$
Therefore $D(\theta_k,\theta_{k+1}) - D(\theta_k,\theta_k) \le 0$, and by (4.4)
$$L_n(\theta_{k+1}) - L_n(\theta_k) \ge Q(\theta_k,\theta_{k+1}) - Q(\theta_k,\theta_k) \ge 0,$$
and we obtain the desired result ($L_n(\theta_{k+1}) \ge L_n(\theta_k)$).

Remark 4.1.2 (The Fisher information) The Fisher information of the observed likelihood $L_n(Y;\theta)$ is
$$I_n(\theta_0) = -\mathrm{E}\Big[\frac{\partial^2 \log f(Y;\theta)}{\partial\theta^2}\Big]\Big|_{\theta=\theta_0}.$$
As before, $I_n(\theta_0)^{-1}$ is the asymptotic variance of the limiting distribution of $\hat{\theta}_n$. To understand how much is lost by not having a complete set of observations, we now rewrite the Fisher information in terms of the complete data and the missing data. By using (4.1), $I_n(\theta_0)$ can be rewritten as
$$I_n(\theta_0) = -\mathrm{E}\Big[\frac{\partial^2 \log f(Y,U;\theta)}{\partial\theta^2}\Big]\Big|_{\theta=\theta_0} + \mathrm{E}\Big[\frac{\partial^2 \log f(U|Y;\theta)}{\partial\theta^2}\Big]\Big|_{\theta=\theta_0} = I_n^{(C)}(\theta_0) - I_n^{(M)}(\theta_0).$$
In the case that $\theta$ is univariate, it is clear that $I_n^{(C)}(\theta_0) \ge I_n(\theta_0)$. Hence, as one would expect, the complete data set $(Y,U)$ contains more information about the unknown parameter than $Y$ alone.

If $U$ is fully determined by $Y$, then it can be shown that $I_n^{(M)}(\theta_0) = 0$, and no information has been lost.

From a practical point of view, one is interested in how many iterations of the EM-algorithm are required to obtain an estimator sufficiently close to the MLE. Let
$$J_n^{(C)}(\theta_0) = -\mathrm{E}\Big[\frac{\partial^2\log f(Y,U;\theta)}{\partial\theta^2}\Big|_{\theta=\theta_0}\,\Big|\,Y,\theta_0\Big], \qquad J_n^{(M)}(\theta_0) = -\mathrm{E}\Big[\frac{\partial^2\log f(U|Y;\theta)}{\partial\theta^2}\Big|_{\theta=\theta_0}\,\Big|\,Y,\theta_0\Big].$$
By differentiating (4.1) twice with respect to the parameter $\theta$ we have
$$J_n(\theta_0) = -\frac{\partial^2\log f(Y;\theta)}{\partial\theta^2}\Big|_{\theta=\theta_0} = J_n^{(C)}(\theta_0) - J_n^{(M)}(\theta_0).$$
Now it can be shown that the rate of convergence of the algorithm depends on the ratio $J_n^{(C)}(\theta_0)^{-1}J_n^{(M)}(\theta_0)$. The closer the largest eigenvalue of $J_n^{(C)}(\theta_0)^{-1}J_n^{(M)}(\theta_0)$ is to one, the slower the rate of convergence, and a large number of iterations is required. On the other hand, if the largest eigenvalue of $J_n^{(C)}(\theta_0)^{-1}J_n^{(M)}(\theta_0)$ is close to zero, then the rate of convergence is fast (a small number of iterations is required for convergence to the MLE).

4.1.1 Censored data

Let us return to the example at the start of this section and construct the EM-algorithm for censored data. We recall that the log-likelihoods for the censored data and the complete data are
$$L_n(Y;\theta) = \sum_{i=1}^{n}\log f(Y_i;\theta) + \sum_{i=n+1}^{n+m}\log\bar{F}(Y_i;\theta)$$
and
$$L_n(Y,U;\theta) = \sum_{i=1}^{n}\log f(Y_i;\theta) + \sum_{i=n+1}^{n+m}\log f(T_i;\theta).$$
To implement the EM-algorithm we need to evaluate the expectation step $Q(\theta^*,\theta)$. It is easy to see that
$$Q(\theta^*,\theta) = \mathrm{E}\big[L_n(Y,U;\theta)|Y,\theta^*\big] = \sum_{i=1}^{n}\log f(Y_i;\theta) + \sum_{i=n+1}^{n+m}\mathrm{E}\big[\log f(T_i;\theta)|Y_i,\theta^*\big].$$
To obtain $\mathrm{E}[\log f(T_i;\theta)|Y_i,\theta^*]$ (for $i \ge n+1$) we note that
$$\mathrm{E}\big[\log f(T_i;\theta)|Y_i,\theta^*\big] = \mathrm{E}\big[\log f(T_i;\theta)|T_i \ge c\big] = \frac{1}{\bar{F}(c;\theta^*)}\int_{c}^{\infty}\log f(u;\theta)\, f(u;\theta^*)\, du.$$

Therefore we have
$$Q(\theta^*,\theta) = \sum_{i=1}^{n}\log f(Y_i;\theta) + \frac{m}{\bar{F}(c;\theta^*)}\int_{c}^{\infty}\log f(u;\theta)\, f(u;\theta^*)\, du.$$
We also note that the derivative of $Q(\theta^*,\theta)$ with respect to $\theta$ is
$$\frac{\partial Q(\theta^*,\theta)}{\partial\theta} = \sum_{i=1}^{n}\frac{1}{f(Y_i;\theta)}\frac{\partial f(Y_i;\theta)}{\partial\theta} + \frac{m}{\bar{F}(c;\theta^*)}\int_{c}^{\infty}\frac{1}{f(u;\theta)}\frac{\partial f(u;\theta)}{\partial\theta}\, f(u;\theta^*)\, du.$$
Hence for this example the EM-algorithm is:

(i) Define an initial value $\theta_1 \in \Theta$. Let $\theta^* = \theta_1$.

(ii) The expectation step: for a fixed $\theta^*$ evaluate
$$\frac{\partial Q(\theta^*,\theta)}{\partial\theta} = \sum_{i=1}^{n}\frac{1}{f(Y_i;\theta)}\frac{\partial f(Y_i;\theta)}{\partial\theta} + \frac{m}{\bar{F}(c;\theta^*)}\int_{c}^{\infty}\frac{1}{f(u;\theta)}\frac{\partial f(u;\theta)}{\partial\theta}\, f(u;\theta^*)\, du.$$

(iii) The maximisation step: solve $\partial Q(\theta^*,\theta)/\partial\theta = 0$; let $\theta_{k+1}$ be such that $\partial Q(\theta^*,\theta)/\partial\theta|_{\theta=\theta_{k+1}} = 0$.

(iv) If $\theta_k$ and $\theta_{k+1}$ are sufficiently close to each other, stop the algorithm and set $\hat{\theta}_n = \theta_{k+1}$. Else set $\theta^* = \theta_{k+1}$, go back and repeat steps (ii) and (iii) again.
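As a concrete illustration, the sketch below (not part of the original notes) applies this scheme when $f(\cdot;\theta)$ is the exponential density $f(x;\theta) = \theta e^{-\theta x}$. In this case the E-step has a closed form, $\mathrm{E}[T_i \mid T_i \ge c, \theta^*] = c + 1/\theta^*$ (by the memoryless property), and since the complete log-likelihood is linear in the $T_i$, the M-step reduces to a standard exponential MLE on the "completed" data:

```python
import numpy as np

def em_censored_exponential(y_obs, c, m, theta0=1.0, tol=1e-10, max_iter=1000):
    """EM for iid Exponential(theta) survival times, m of which are censored at c.

    y_obs : array of the n fully observed survival times
    c     : common censoring time
    m     : number of censored observations (Y_i = c)
    """
    n = len(y_obs)
    s_obs = np.sum(y_obs)
    theta = theta0
    for _ in range(max_iter):
        # E-step: E[T_i | T_i >= c, theta*] = c + 1/theta*
        e_censored = c + 1.0 / theta
        # M-step: exponential MLE with censored times replaced by their
        # conditional expectations
        theta_new = (n + m) / (s_obs + m * e_censored)
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

# Example usage with simulated data (true theta_0 = 0.5, censoring at c = 3):
rng = np.random.default_rng(0)
t = rng.exponential(scale=2.0, size=500)
c = 3.0
y_obs, m = t[t < c], int(np.sum(t >= c))
print(em_censored_exponential(y_obs, c, m))
```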

4.1.2 Mixture distributions

We now consider a useful application of the EM-algorithm: the estimation of parameters in mixture distributions. Let us suppose that $\{Y_i\}_{i=1}^{n}$ are iid random variables with density
$$f(y;\theta) = p f_1(y;\theta_1) + (1-p) f_2(y;\theta_2),$$
where $\theta = (p,\theta_1,\theta_2)$ are unknown parameters. For the purpose of identifiability we will suppose that $\theta_1 \ne \theta_2$, $p \ne 1$ and $p \ne 0$. The log-likelihood of $\{Y_i\}$ is
$$L_n(Y;\theta) = \sum_{i=1}^{n}\log\big[p f_1(Y_i;\theta_1) + (1-p) f_2(Y_i;\theta_2)\big]. \qquad (4.5)$$
Maximising the above directly can be extremely difficult. As an illustration consider the example below.

Example 4.1.2 Let us suppose that $f_1(y;\theta_1)$ and $f_2(y;\theta_2)$ are normal densities; then the log-likelihood is
$$L_n(Y;\theta) = \sum_{i=1}^{n}\log\Big[p\frac{1}{\sqrt{2\pi\sigma_1^2}}\exp\Big(-\frac{(Y_i-\mu_1)^2}{2\sigma_1^2}\Big) + (1-p)\frac{1}{\sqrt{2\pi\sigma_2^2}}\exp\Big(-\frac{(Y_i-\mu_2)^2}{2\sigma_2^2}\Big)\Big].$$
We observe that this is extremely difficult to maximise. On the other hand, if the $Y_i$ were simply normally distributed, then the log-likelihood would be extremely simple:
$$L_n(Y;\theta) = -\sum_{i=1}^{n}\Big(\frac{1}{2}\log\sigma^2 + \frac{1}{2\sigma^2}(Y_i-\mu)^2\Big) + \text{const.} \qquad (4.6)$$
In other words, the simplicity of maximising the log-likelihood of the exponential family of distributions (see Section 3.1) is lost for mixtures of distributions.

We now use the EM-algorithm as an indirect but simple method of maximising (4.5). In this example it is not clear what observations are missing. However, let us consider one possible interpretation of the mixture distribution. Define the random variables $\delta_i$ and $Y_i$, where $\delta_i \in \{1,2\}$ with
$$P(\delta_i = 1) = p, \qquad P(\delta_i = 2) = 1-p,$$
and the conditional density of $Y_i$ given $\delta_i$ is
$$f(y|\delta_i = 1) = f_1(y;\theta_1), \qquad f(y|\delta_i = 2) = f_2(y;\theta_2).$$
It is clear from the above that the density of $Y_i$ is $f(y;\theta) = p f_1(y;\theta_1) + (1-p) f_2(y;\theta_2)$. Hence one interpretation of the mixture model is that there is a hidden, unobserved random variable which determines the state or distribution of $Y_i$. A simple example: $Y_i$ is the height of an individual and $\delta_i$ is the gender. However, $\delta_i$ is unobserved and only the height is observed. Often a mixture distribution has a physical interpretation, similar to the height example, but sometimes it can simply be used to parametrically model a wide class of densities.

Based on the discussion above, $U = \{\delta_i\}$ can be treated as the missing observations. The likelihood of $(Y_i, U_i)$ is
$$\big[p_1 f_1(Y_i;\theta_1)\big]^{I(\delta_i=1)}\big[p_2 f_2(Y_i;\theta_2)\big]^{I(\delta_i=2)} = p_{\delta_i} f_{\delta_i}(Y_i;\theta_{\delta_i}),$$
where we set $p_1 = p$ and $p_2 = 1-p$. Therefore the log-likelihood of $\{(Y_i,\delta_i)\}$ is
$$L_n(Y,U;\theta) = \sum_{i=1}^{n}\big[\log p_{\delta_i} + \log f_{\delta_i}(Y_i;\theta_{\delta_i})\big].$$

We now need to evaluate
$$Q(\theta^*,\theta) = \mathrm{E}\big[L_n(Y,U;\theta)|Y,\theta^*\big] = \sum_{i=1}^{n}\Big(\mathrm{E}\big[\log p_{\delta_i}|Y_i,\theta^*\big] + \mathrm{E}\big[\log f_{\delta_i}(Y_i;\theta_{\delta_i})|Y_i,\theta^*\big]\Big).$$
We see that the above expectation is taken with respect to the distribution of $\delta_i$ conditioned on $Y_i$ and the parameter $\theta^*$. By using conditioning arguments it is easy to see that
$$P(\delta_i=1|Y_i=y,\theta^*) = \frac{P(\delta_i=1,Y_i=y;\theta^*)}{P(Y_i=y;\theta^*)} = \frac{p^* f_1(y;\theta_1^*)}{p^* f_1(y;\theta_1^*) + (1-p^*) f_2(y;\theta_2^*)} =: w_1(\theta^*,y),$$
$$P(\delta_i=2|Y_i=y,\theta^*) = \frac{(1-p^*) f_2(y;\theta_2^*)}{p^* f_1(y;\theta_1^*) + (1-p^*) f_2(y;\theta_2^*)} =: w_2(\theta^*,y) = 1 - w_1(\theta^*,y).$$
Therefore
$$Q(\theta^*,\theta) = \sum_{i=1}^{n}\big[\log p + \log f_1(Y_i;\theta_1)\big] w_1(\theta^*,Y_i) + \sum_{i=1}^{n}\big[\log(1-p) + \log f_2(Y_i;\theta_2)\big] w_2(\theta^*,Y_i).$$
Maximising the above with respect to $p$, $\theta_1$ and $\theta_2$ will in general be much easier than maximising $L_n(Y;\theta)$. For this example the EM algorithm is:

(i) Define an initial value $\theta_1 \in \Theta$. Let $\theta^* = \theta_1$.

(ii) The expectation step: for a fixed $\theta^*$ evaluate
$$Q(\theta^*,\theta) = \sum_{i=1}^{n}\big[\log p + \log f_1(Y_i;\theta_1)\big] w_1(\theta^*,Y_i) + \sum_{i=1}^{n}\big[\log(1-p) + \log f_2(Y_i;\theta_2)\big] w_2(\theta^*,Y_i).$$

(iii) The maximisation step: evaluate $\theta_{k+1} = \arg\max_{\theta\in\Theta} Q(\theta^*,\theta)$ by differentiating $Q(\theta^*,\theta)$ with respect to $\theta$ and equating to zero. Since the parameter $p$ and the parameters $\theta_1,\theta_2$ appear in separate terms, they can be maximised separately.

(iv) If $\theta_k$ and $\theta_{k+1}$ are sufficiently close to each other, stop the algorithm and set $\hat{\theta}_n = \theta_{k+1}$. Else set $\theta^* = \theta_{k+1}$, go back and repeat steps (ii) and (iii) again.

Exercise: Derive the EM algorithm in the case that $f_1$ and $f_2$ are normal densities (a code sketch of this case is given below).

It is straightforward to see that the arguments above can be generalised to the case that the density of $Y_i$ is a mixture of $r$ different densities. However, we observe that the selection of $r$ can be quite ad hoc. There are methods for choosing $r$; these include the reversible jump MCMC methods proposed by Peter Green.
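For the normal case of the exercise above, the M-step has closed-form weighted-mean and weighted-variance updates. The following is a minimal sketch (not from the notes; two components are assumed and the stopping rule monitors the observed log-likelihood (4.5)):

```python
import numpy as np
from scipy.stats import norm

def em_two_normal_mixture(y, p=0.5, mu=(0.0, 1.0), sig=(1.0, 1.0),
                          tol=1e-8, max_iter=500):
    """EM for f(y) = p*N(mu1, sig1^2) + (1-p)*N(mu2, sig2^2)."""
    y = np.asarray(y, dtype=float)
    mu1, mu2 = mu
    s1, s2 = sig
    ll_old = -np.inf
    for _ in range(max_iter):
        # E-step: posterior weights w_1(theta*, Y_i) and w_2 = 1 - w_1
        d1 = p * norm.pdf(y, mu1, s1)
        d2 = (1 - p) * norm.pdf(y, mu2, s2)
        w1 = d1 / (d1 + d2)
        w2 = 1.0 - w1
        # M-step: weighted maximum likelihood updates
        p = np.mean(w1)
        mu1 = np.sum(w1 * y) / np.sum(w1)
        mu2 = np.sum(w2 * y) / np.sum(w2)
        s1 = np.sqrt(np.sum(w1 * (y - mu1) ** 2) / np.sum(w1))
        s2 = np.sqrt(np.sum(w2 * (y - mu2) ** 2) / np.sum(w2))
        # stop when the observed log-likelihood (4.5) no longer increases
        ll = np.sum(np.log(d1 + d2))
        if ll - ll_old < tol:
            break
        ll_old = ll
    return p, (mu1, mu2), (s1, s2)
```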

Example 4.1.3 Question: Suppose that the regressors $x_t$ are believed to influence the response variable $Y_t$. The distribution of $Y_t$ is
$$P(Y_t = y) = p\frac{\lambda_{t1}^{y}\exp(-\lambda_{t1})}{y!} + (1-p)\frac{\lambda_{t2}^{y}\exp(-\lambda_{t2})}{y!},$$
where $\lambda_{t1} = \exp(\beta_1' x_t)$ and $\lambda_{t2} = \exp(\beta_2' x_t)$.

(i) State minimum conditions on the parameters for the above model to be identifiable.

(ii) Carefully explain (giving details of $Q(\theta^*,\theta)$ and the EM stages) how the EM-algorithm can be used to obtain estimators of $\beta_1$, $\beta_2$ and $p$.

(iii) Derive the derivative of $Q(\theta^*,\theta)$, and explain how the derivative may be useful in the maximisation stage of the EM-algorithm.

(iv) Given an initial value, will the EM-algorithm always find the maximum of the likelihood? Explain how one can check whether the parameter to which the EM-algorithm converges maximises the likelihood.

Solution

(i) $0 < p < 1$ and $\beta_1 \ne \beta_2$ (these are minimum assumptions; there could be more, which are hard to account for given the regressors $x_t$).

(ii) We first observe that $P(Y_t = y)$ is a mixture of two Poisson regressions, each with the canonical link function. Define the unobserved variables $\{U_t\}$, which are iid with $P(U_t = 1) = p$ and $P(U_t = 2) = 1-p$, and
$$P(Y_t = y|U_t = 1) = \frac{\lambda_{t1}^{y}\exp(-\lambda_{t1})}{y!}, \qquad P(Y_t = y|U_t = 2) = \frac{\lambda_{t2}^{y}\exp(-\lambda_{t2})}{y!}.$$
Therefore we have
$$\log f(Y_t,U_t;\theta) = Y_t\beta_{U_t}'x_t - \exp(\beta_{U_t}'x_t) - \log Y_t! + \log p_{U_t},$$
where $\theta = (\beta_1,\beta_2,p)$ and $p_1 = p$, $p_2 = 1-p$. Thus $\mathrm{E}(\log f(Y_t,U_t;\theta)|Y_t,\theta^*)$ is
$$\mathrm{E}\big(\log f(Y_t,U_t;\theta)|Y_t,\theta^*\big) = \big[Y_t\beta_1'x_t - \exp(\beta_1'x_t) - \log Y_t! + \log p\big]\pi(\theta^*,Y_t) + \big[Y_t\beta_2'x_t - \exp(\beta_2'x_t) - \log Y_t! + \log(1-p)\big]\big(1-\pi(\theta^*,Y_t)\big),$$
where $P(U_t = 1|Y_t,\theta^*)$ is evaluated as
$$P(U_t=1|Y_t,\theta^*) = \pi(\theta^*,Y_t) = \frac{p^* f_1(Y_t;\theta^*)}{p^* f_1(Y_t;\theta^*) + (1-p^*) f_2(Y_t;\theta^*)},$$

with
$$f_1(Y_t;\theta^*) = \frac{\exp(\beta_1^{*\prime}x_t Y_t)\exp\big(-\exp(\beta_1^{*\prime}x_t)\big)}{Y_t!}, \qquad f_2(Y_t;\theta^*) = \frac{\exp(\beta_2^{*\prime}x_t Y_t)\exp\big(-\exp(\beta_2^{*\prime}x_t)\big)}{Y_t!}.$$
Thus $Q(\theta^*,\theta)$ is
$$Q(\theta^*,\theta) = \sum_{t=1}^{T}\Big\{\big[Y_t\beta_1'x_t - \exp(\beta_1'x_t) - \log Y_t! + \log p\big]\pi(\theta^*,Y_t) + \big[Y_t\beta_2'x_t - \exp(\beta_2'x_t) - \log Y_t! + \log(1-p)\big]\big(1-\pi(\theta^*,Y_t)\big)\Big\}.$$
Using the above, the EM algorithm is the following:

(a) Start with an initial value which is an estimator of $\beta_1$, $\beta_2$ and $p$; denote this as $\theta^*$.

(b) For every $\theta$ evaluate $Q(\theta^*,\theta)$.

(c) Evaluate $\arg\max_\theta Q(\theta^*,\theta)$. Denote the maximiser as the new $\theta^*$ and return to step (b).

(d) Keep iterating until successive maximisers are sufficiently close to each other.

(iii) The derivative of $Q(\theta^*,\theta)$ is
$$\frac{\partial Q(\theta^*,\theta)}{\partial\beta_1} = \sum_{t=1}^{T}\big[Y_t - \exp(\beta_1'x_t)\big]x_t\,\pi(\theta^*,Y_t),$$
$$\frac{\partial Q(\theta^*,\theta)}{\partial\beta_2} = \sum_{t=1}^{T}\big[Y_t - \exp(\beta_2'x_t)\big]x_t\,\big(1-\pi(\theta^*,Y_t)\big),$$
$$\frac{\partial Q(\theta^*,\theta)}{\partial p} = \sum_{t=1}^{T}\Big[\frac{\pi(\theta^*,Y_t)}{p} - \frac{1-\pi(\theta^*,Y_t)}{1-p}\Big].$$
Thus maximisation of $Q(\theta^*,\theta)$ can be achieved by solving the above equations using iteratively reweighted least squares (a code sketch of this is given below).

(iv) Depending on the initial value, the EM-algorithm may only locate a local maximum. To check whether we have found the global maximum, we can start the EM-algorithm with several different initial values and check where they converge.
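The following is a minimal sketch of this EM (not part of the notes). It assumes the weighted Poisson score equations in the M-step are solved by a few Newton-Raphson steps with the weights $\pi(\theta^*,Y_t)$ held fixed, which is the iterative reweighted least squares idea mentioned above:

```python
import numpy as np
from scipy.special import gammaln

def poisson_logpmf(y, lam):
    # log of the Poisson pmf; gammaln(y+1) = log(y!)
    return y * np.log(lam) - lam - gammaln(y + 1)

def weighted_poisson_newton(X, y, w, beta, n_steps=5):
    """A few Newton-Raphson steps for the weighted Poisson score
    sum_t w_t (y_t - exp(beta'x_t)) x_t = 0."""
    for _ in range(n_steps):
        lam = np.exp(X @ beta)
        score = X.T @ (w * (y - lam))
        hess = X.T @ (X * (w * lam)[:, None])
        beta = beta + np.linalg.solve(hess, score)
    return beta

def em_poisson_mixture(X, y, beta1, beta2, p=0.5, n_iter=200):
    """EM for a two-component Poisson regression mixture with log link."""
    for _ in range(n_iter):
        # E-step: posterior probability pi(theta*, Y_t) of component 1
        l1 = poisson_logpmf(y, np.exp(X @ beta1)) + np.log(p)
        l2 = poisson_logpmf(y, np.exp(X @ beta2)) + np.log(1 - p)
        m = np.maximum(l1, l2)
        pi = np.exp(l1 - m) / (np.exp(l1 - m) + np.exp(l2 - m))
        # M-step: p has a closed form; beta1, beta2 solve weighted Poisson scores
        p = np.mean(pi)
        beta1 = weighted_poisson_newton(X, y, pi, beta1)
        beta2 = weighted_poisson_newton(X, y, 1 - pi, beta2)
    return beta1, beta2, p
```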

Example 4.1.4 Question: Let us suppose that $\bar{F}_1(t)$ and $\bar{F}_2(t)$ are two survival functions. Let $x$ denote a univariate regressor.

(i) Show that $\bar{F}(t;x) = p\bar{F}_1(t)^{\exp(\beta_1 x)} + (1-p)\bar{F}_2(t)^{\exp(\beta_2 x)}$ is a valid survival function and obtain the corresponding density function.

(ii) Suppose that the $T_i$ are survival times and $x_i$ is a univariate regressor which exerts an influence on $T_i$. Let $Y_i = \min(T_i,c)$, where $c$ is a common censoring time. The $\{T_i\}$ are independent random variables with survival function
$$\bar{F}(t;x_i) = p\bar{F}_1(t)^{\exp(\beta_1 x_i)} + (1-p)\bar{F}_2(t)^{\exp(\beta_2 x_i)},$$
where both $\bar{F}_1$ and $\bar{F}_2$ are known, but $p$, $\beta_1$ and $\beta_2$ are unknown. State the censored likelihood and show that the EM-algorithm, together with iterative least squares in the maximisation step, can be used to maximise this likelihood (sufficient details need to be given such that your algorithm can be easily coded).

Solution

(i) Since $\bar{F}_1$ and $\bar{F}_2$ are monotonically decreasing positive functions with $\bar{F}_1(0) = \bar{F}_2(0) = 1$ and $\bar{F}_1(\infty) = \bar{F}_2(\infty) = 0$, the same is true of $\bar{F}(t;x) = p\bar{F}_1(t)^{e^{\beta_1 x}} + (1-p)\bar{F}_2(t)^{e^{\beta_2 x}}$; thus $\bar{F}(t;x)$ is a survival function. Using $d\bar{F}_j(t)/dt = -f_j(t)$, the corresponding density is
$$f(t;x) = -\frac{\partial\bar{F}(t;x)}{\partial t} = p e^{\beta_1 x} f_1(t)\bar{F}_1(t)^{e^{\beta_1 x}-1} + (1-p) e^{\beta_2 x} f_2(t)\bar{F}_2(t)^{e^{\beta_2 x}-1}.$$

(ii) The censored log-likelihood is
$$L_n(\beta_1,\beta_2,p) = \sum_{i=1}^{T}\big[\delta_i\log f(Y_i;\beta_1,\beta_2,p) + (1-\delta_i)\log\bar{F}(Y_i;\beta_1,\beta_2,p)\big],$$
where $\delta_i$ indicates that $T_i$ is uncensored. Clearly, directly maximising the above is extremely difficult, thus we look for an alternative method via the EM algorithm. Define the unobserved variable $I_i$ with $P(I_i = 1) = p_1 = p$ and $P(I_i = 2) = p_2 = 1-p$. Then the log of the joint density of $(Y_i,\delta_i,I_i)$ is
$$\delta_i\big[\log p_{I_i} + \beta_{I_i}x_i + \log f_{I_i}(Y_i) + (e^{\beta_{I_i}x_i}-1)\log\bar{F}_{I_i}(Y_i)\big] + (1-\delta_i)\big[\log p_{I_i} + e^{\beta_{I_i}x_i}\log\bar{F}_{I_i}(Y_i)\big].$$
Thus the complete log-likelihood is
$$L_T(Y,\delta,I;\beta_1,\beta_2,p) = \sum_{i=1}^{T}\Big\{\delta_i\big[\log p_{I_i} + \beta_{I_i}x_i + \log f_{I_i}(Y_i) + (e^{\beta_{I_i}x_i}-1)\log\bar{F}_{I_i}(Y_i)\big] + (1-\delta_i)\big[\log p_{I_i} + e^{\beta_{I_i}x_i}\log\bar{F}_{I_i}(Y_i)\big]\Big\}.$$

Now we need to calculate $P(I_i|Y_i,\delta_i)$. We have
$$\omega_{i,\delta_i=1} = P(I_i=1|Y_i,\delta_i=1,p^*,\beta_1^*,\beta_2^*) = \frac{p^* e^{\beta_1^* x_i} f_1(Y_i)\bar{F}_1(Y_i)^{e^{\beta_1^* x_i}-1}}{p^* e^{\beta_1^* x_i} f_1(Y_i)\bar{F}_1(Y_i)^{e^{\beta_1^* x_i}-1} + (1-p^*) e^{\beta_2^* x_i} f_2(Y_i)\bar{F}_2(Y_i)^{e^{\beta_2^* x_i}-1}},$$
$$\omega_{i,\delta_i=0} = P(I_i=1|Y_i,\delta_i=0,p^*,\beta_1^*,\beta_2^*) = \frac{p^*\bar{F}_1(Y_i)^{e^{\beta_1^* x_i}}}{p^*\bar{F}_1(Y_i)^{e^{\beta_1^* x_i}} + (1-p^*)\bar{F}_2(Y_i)^{e^{\beta_2^* x_i}}}.$$
Therefore the complete likelihood conditioned on what we observe is
$$Q(\theta,\theta^*) = \sum_{i=1}^{T}\Big\{\delta_i\omega_{i,\delta_i=1}\big[\log p + \beta_1 x_i + \log f_1(Y_i) + (e^{\beta_1 x_i}-1)\log\bar{F}_1(Y_i)\big] + (1-\delta_i)\omega_{i,\delta_i=0}\big[\log p + e^{\beta_1 x_i}\log\bar{F}_1(Y_i)\big]\Big\}$$
$$\quad + \sum_{i=1}^{T}\Big\{\delta_i(1-\omega_{i,\delta_i=1})\big[\log(1-p) + \beta_2 x_i + \log f_2(Y_i) + (e^{\beta_2 x_i}-1)\log\bar{F}_2(Y_i)\big] + (1-\delta_i)(1-\omega_{i,\delta_i=0})\big[\log(1-p) + e^{\beta_2 x_i}\log\bar{F}_2(Y_i)\big]\Big\}.$$
The conditional likelihood above looks unwieldy. However, the parameter estimates can be separated. First, differentiating with respect to $p$ gives
$$\frac{\partial Q}{\partial p} = \sum_{i=1}^{T}\frac{\delta_i\omega_{i,\delta_i=1}}{p} + \sum_{i=1}^{T}\frac{(1-\delta_i)\omega_{i,\delta_i=0}}{p} - \sum_{i=1}^{T}\frac{\delta_i(1-\omega_{i,\delta_i=1})}{1-p} - \sum_{i=1}^{T}\frac{(1-\delta_i)(1-\omega_{i,\delta_i=0})}{1-p}.$$
Equating the above to zero we obtain the estimator $\hat{p} = a/(a+b)$, where
$$a = \sum_{i=1}^{T}\delta_i\omega_{i,\delta_i=1} + \sum_{i=1}^{T}(1-\delta_i)\omega_{i,\delta_i=0}, \qquad b = \sum_{i=1}^{T}\delta_i(1-\omega_{i,\delta_i=1}) + \sum_{i=1}^{T}(1-\delta_i)(1-\omega_{i,\delta_i=0}).$$
Now we consider the estimates of $\beta_1$ and $\beta_2$ at this iteration step. Differentiating $Q$ with respect

to $\beta_1$ and $\beta_2$ gives
$$\frac{\partial Q}{\partial\beta_1} = \sum_{i=1}^{T}\Big\{\delta_i\omega_{i,\delta_i=1}\big[1 + e^{\beta_1 x_i}\log\bar{F}_1(Y_i)\big] + (1-\delta_i)\omega_{i,\delta_i=0}\,e^{\beta_1 x_i}\log\bar{F}_1(Y_i)\Big\}x_i = 0,$$
$$\frac{\partial Q}{\partial\beta_2} = \sum_{i=1}^{T}\Big\{\delta_i(1-\omega_{i,\delta_i=1})\big[1 + e^{\beta_2 x_i}\log\bar{F}_2(Y_i)\big] + (1-\delta_i)(1-\omega_{i,\delta_i=0})\,e^{\beta_2 x_i}\log\bar{F}_2(Y_i)\Big\}x_i = 0,$$
$$\frac{\partial^2 Q}{\partial\beta_1^2} = \sum_{i=1}^{T}\Big\{\delta_i\omega_{i,\delta_i=1}\,e^{\beta_1 x_i}\log\bar{F}_1(Y_i) + (1-\delta_i)\omega_{i,\delta_i=0}\,e^{\beta_1 x_i}\log\bar{F}_1(Y_i)\Big\}x_i^2,$$
$$\frac{\partial^2 Q}{\partial\beta_2^2} = \sum_{i=1}^{T}\Big\{\delta_i(1-\omega_{i,\delta_i=1})\,e^{\beta_2 x_i}\log\bar{F}_2(Y_i) + (1-\delta_i)(1-\omega_{i,\delta_i=0})\,e^{\beta_2 x_i}\log\bar{F}_2(Y_i)\Big\}x_i^2.$$
Thus to estimate $(\beta_1,\beta_2)$ at the $j$th iteration we use the Newton-Raphson step
$$\begin{pmatrix}\beta_1^{(j)}\\ \beta_2^{(j)}\end{pmatrix} = \begin{pmatrix}\beta_1^{(j-1)}\\ \beta_2^{(j-1)}\end{pmatrix} - \Big[\frac{\partial^2 Q}{\partial\beta^2}\Big]^{-1}\frac{\partial Q}{\partial\beta}\Big|_{\beta=\beta^{(j-1)}},$$
so that $\beta_1^{(j)} = \beta_1^{(j-1)} - \big(\partial^2 Q/\partial\beta_1^2\big)^{-1}\partial Q/\partial\beta_1\big|_{\beta_1^{(j-1)}}$, and similarly for $\beta_2^{(j)}$. Now we can rewrite $-\big(\partial^2 Q/\partial\beta_1^2\big)^{-1}\partial Q/\partial\beta_1\big|_{\beta_1^{(j-1)}}$ as $(X'W^{(j-1)}X)^{-1}X'S^{(j-1)}$, where
$$X = (x_1,x_2,\ldots,x_T)', \qquad W^{(j-1)} = \mathrm{diag}\big[\omega_1^{(j-1)},\ldots,\omega_T^{(j-1)}\big], \qquad S^{(j-1)} = (S_1^{(j-1)},\ldots,S_T^{(j-1)})',$$
with
$$\omega_i^{(j-1)} = -\Big[\delta_i\omega_{i,\delta_i=1}\,e^{\beta_1^{(j-1)}x_i}\log\bar{F}_1(Y_i) + (1-\delta_i)\omega_{i,\delta_i=0}\,e^{\beta_1^{(j-1)}x_i}\log\bar{F}_1(Y_i)\Big],$$
$$S_i^{(j-1)} = \delta_i\omega_{i,\delta_i=1}\big[1 + e^{\beta_1^{(j-1)}x_i}\log\bar{F}_1(Y_i)\big] + (1-\delta_i)\omega_{i,\delta_i=0}\,e^{\beta_1^{(j-1)}x_i}\log\bar{F}_1(Y_i).$$
Thus altogether in the EM-algorithm we have:

Start with initial values $\beta_1^{(0)}$, $\beta_2^{(0)}$, $p^{(0)}$.

Step 1: Set $(\beta_{1,r},\beta_{2,r},p_r) = (\beta_1^*,\beta_2^*,p^*)$. Evaluate $\omega_{i,\delta_i=1}$ and $\omega_{i,\delta_i=0}$ (these probabilities/weights stay the same throughout the iterative least squares).

Step 2: Maximise $Q(\theta^*,\theta)$ by using the updates
$$p_r = \frac{a_r}{a_r+b_r},$$
where $a_r$, $b_r$ are defined previously, together with
$$\beta_1^{(j)} = \beta_1^{(j-1)} + (X'W_1^{(j-1)}X)^{-1}X'S_1^{(j-1)},$$
and the same for $\beta_2^{(j)}$, i.e. $\beta_2^{(j)} = \beta_2^{(j-1)} + (X'W_2^{(j-1)}X)^{-1}X'S_2^{(j-1)}$; iterate until convergence.

Step 3: Let $(\beta_{1,r+1},\beta_{2,r+1},p_{r+1})$ be the limit of the iterative least squares; go back to Step 1 and repeat until convergence.

Example 4.1.5 Question: Let us suppose that $X$ and $Z$ are independent positive random variables with densities $f_X$ and $f_Z$ respectively.

(a) (i) Derive the density function of $1/X$.

(ii) Show that the density of $XZ$ is
$$f_{XZ}(y) = \int \frac{1}{x} f_Z\Big(\frac{y}{x}\Big) f_X(x)\, dx \qquad (4.7)$$
(or equivalently $\int c\, f_Z(cy)\, c^{-2} f_X(c^{-1})\, dc$).

(b) Consider the linear regression model
$$Y_i = \alpha'x_i + \sigma_i\varepsilon_i,$$
where $\varepsilon_i$ follows a standard normal distribution (mean zero and variance one) and $\sigma_i^2$ follows a Gamma distribution with density
$$f(\sigma^2;\lambda) = \frac{\sigma^{2(\kappa-1)}\lambda^{\kappa}\exp(-\lambda\sigma^2)}{\Gamma(\kappa)}, \qquad \sigma^2 \ge 0,$$
with $\kappa > 0$. Let us suppose that $\alpha$ and $\lambda$ are unknown parameters but $\kappa$ is a known parameter.

(i) Give an expression for the log-likelihood of $Y_i$ and explain why it is difficult to compute the maximum likelihood estimate.

(ii) As an alternative to directly maximising the likelihood, the EM algorithm can be used instead. Derive the EM-algorithm for this case. In your derivation explain which quantities will have to be evaluated numerically.

Solution

(a) (i) $P(1/X \le c) = P(X \ge 1/c) = 1 - F_X(1/c)$, where $F_X$ is the distribution function of $X$. Therefore the density of $1/X$ is $f_{1/X}(c) = c^{-2} f_X(1/c)$.

(ii) We first note that $P(XZ \le y|X = x) = P(Z \le y/x)$. Therefore the density of $XZ$ given $X = x$ is $f_{XZ|X}(y|x) = x^{-1} f_Z(y/x)$. Using this we obtain the density of $XZ$:
$$f_{XZ}(y) = \int f_{XZ|X}(y|x) f_X(x)\, dx = \int \frac{1}{x} f_Z\Big(\frac{y}{x}\Big) f_X(x)\, dx. \qquad (4.8)$$
Or, equivalently, we can condition on $1/X$ to obtain
$$f_{XZ}(y) = \int c\, f_Z(cy)\, f_{1/X}(c)\, dc = \int c\, f_Z(cy)\,\frac{1}{c^2} f_X\Big(\frac{1}{c}\Big)\, dc.$$
Note that with the change of variables $c = 1/x$ we can show that both integrals are equivalent.

(b) (i) We recall that $Y_i = \alpha'x_i + \sigma_i\varepsilon_i$. Therefore the log-likelihood of $Y$ is
$$L_n(\alpha,\lambda) = \sum_{i=1}^{n}\log f_{\sigma\varepsilon}(Y_i - \alpha'x_i) = \sum_{i=1}^{n}\log\int\frac{1}{x} f_\sigma\Big(\frac{Y_i-\alpha'x_i}{x};\lambda\Big) f_\varepsilon(x)\, dx,$$
where we use (4.8) to obtain the density of $\sigma_i\varepsilon_i$; here $f_\sigma(\cdot;\lambda)$ is the density of a square-root Gamma random variable and $f_\varepsilon$ is the density of a standard normal. It is clear that it is either very hard or impossible to obtain an explicit expression for $f_{\sigma\varepsilon}$, which makes direct maximum likelihood estimation difficult.

(ii) Let $U = (\sigma_1^2,\ldots,\sigma_n^2)$ denote the unobserved variances and $Y = (Y_1,\ldots,Y_n)$ the observations. The complete log-likelihood of $(Y,U)$ is
$$L_T(Y,U;\alpha,\lambda) = \sum_{i=1}^{n}\Big[-\frac{1}{2}\log\sigma_i^2 - \frac{1}{2\sigma_i^2}\big(Y_i-\alpha'x_i\big)^2 + (\kappa-1)\log\sigma_i^2 + \kappa\log\lambda - \lambda\sigma_i^2\Big]$$
(up to constants). Of course, the above cannot be evaluated, since $U$ is unobserved. Instead we evaluate the conditional expectation of the above with respect to what is observed.

Thus the conditional expectation of the complete log-likelihood, given $Y$ and the parameters $(\alpha^*,\lambda^*)$, is
$$Q(\alpha,\lambda) = \mathrm{E}\big[L_T(Y,U;\alpha,\lambda)\,\big|\,Y,\alpha^*,\lambda^*\big]$$
$$= \sum_{i=1}^{n}\Big\{-\frac{1}{2}\mathrm{E}\big[\log\sigma_i^2\,\big|\,\sigma_i^2\varepsilon_i^2 = (Y_i-\alpha^{*\prime}x_i)^2,\lambda^*\big] - \frac{1}{2}\mathrm{E}\big[\sigma_i^{-2}\,\big|\,\sigma_i^2\varepsilon_i^2 = (Y_i-\alpha^{*\prime}x_i)^2,\lambda^*\big]\big(Y_i-\alpha'x_i\big)^2$$
$$\qquad + (\kappa-1)\mathrm{E}\big[\log\sigma_i^2\,\big|\,\sigma_i^2\varepsilon_i^2 = (Y_i-\alpha^{*\prime}x_i)^2,\lambda^*\big] + \kappa\log\lambda - \lambda\,\mathrm{E}\big[\sigma_i^2\,\big|\,\sigma_i^2\varepsilon_i^2 = (Y_i-\alpha^{*\prime}x_i)^2,\lambda^*\big]\Big\}.$$
We note that the above is true because conditioning on $Y_i$ and $\alpha^*$ means that $\sigma_i^2\varepsilon_i^2 = (Y_i-\alpha^{*\prime}x_i)^2$ is observed. Thus by evaluating $Q$ at each stage we can implement the EM algorithm:

(i) Define an initial value $\theta_1 \in \Theta$. Let $\theta^* = \theta_1$.

(ii) The expectation step: for a fixed $\theta^*$ evaluate $Q(\theta^*,\theta) = \mathrm{E}[\log f(Y,U;\theta)|Y,\theta^*] = \int\log f(Y,u;\theta)\, f(u|Y,\theta^*)\, du$ for all $\theta\in\Theta$.

(iii) The maximisation step: evaluate $\theta_{k+1} = \arg\max_{\theta\in\Theta} Q(\theta^*,\theta)$, which can be done by finding the solution of $\mathrm{E}[\partial\log f(Y,U;\theta)/\partial\theta|Y,\theta^*] = 0$.

(iv) If $\theta_k$ and $\theta_{k+1}$ are sufficiently close to each other, stop the algorithm and set $\hat{\theta}_n = \theta_{k+1}$. Else set $\theta^* = \theta_{k+1}$, go back and repeat steps (ii) and (iii) again.

The useful feature of this EM-algorithm is that if the weights
$$\mathrm{E}\big[\sigma_i^{-2}\,\big|\,\sigma_i^2\varepsilon_i^2 = (Y_i-\alpha^{*\prime}x_i)^2,\lambda^*\big] \quad\text{and}\quad \mathrm{E}\big[\sigma_i^2\,\big|\,\sigma_i^2\varepsilon_i^2 = (Y_i-\alpha^{*\prime}x_i)^2,\lambda^*\big]$$
are known, then we do not need to numerically maximise $Q(\alpha,\lambda)$ at each stage.

This is because the derivative of $Q(\alpha,\lambda)$ leads to an explicit solution for $\alpha$ and $\lambda$:
$$\frac{\partial Q(\alpha,\lambda)}{\partial\alpha} = \sum_{i=1}^{n}\mathrm{E}\big[\sigma_i^{-2}\,\big|\,\sigma_i^2\varepsilon_i^2 = (Y_i-\alpha^{*\prime}x_i)^2,\lambda^*\big]\big(Y_i-\alpha'x_i\big)x_i = 0,$$
$$\frac{\partial Q(\alpha,\lambda)}{\partial\lambda} = \sum_{i=1}^{n}\Big[\frac{\kappa}{\lambda} - \mathrm{E}\big[\sigma_i^2\,\big|\,\sigma_i^2\varepsilon_i^2 = (Y_i-\alpha^{*\prime}x_i)^2,\lambda^*\big]\Big] = 0.$$
It is straightforward to see that the above can easily be solved for $\alpha$ (a weighted least squares problem) and $\lambda$. Of course, we need to evaluate the weights $\mathrm{E}[\sigma_i^{-2}|\sigma_i^2\varepsilon_i^2 = (Y_i-\alpha^{*\prime}x_i)^2,\lambda^*]$ and $\mathrm{E}[\sigma_i^2|\sigma_i^2\varepsilon_i^2 = (Y_i-\alpha^{*\prime}x_i)^2,\lambda^*]$. This is done numerically, by noting that for a general $g(\cdot)$ the conditional expectation is
$$\mathrm{E}\big(g(\sigma^2)\,\big|\,\sigma^2\varepsilon^2 = y\big) = \int g(\sigma^2)\, f_{\sigma^2|\sigma^2\varepsilon^2}(\sigma^2|y)\, d\sigma^2.$$
To obtain the density $f_{\sigma^2|\sigma^2\varepsilon^2}$ we note that $P(\sigma^2 \le s\,|\,\sigma^2\varepsilon^2 = y) = P(\varepsilon^2 \ge y/s) = 1 - P(\varepsilon^2 \le y/s)$. Hence the density of $\sigma^2$ given $\sigma^2\varepsilon^2 = y$ is
$$f_{\sigma^2|\sigma^2\varepsilon^2}(s|y) = \frac{y}{s^2} f_{\varepsilon^2}\Big(\frac{y}{s}\Big),$$
where $f_{\varepsilon^2}$ is the chi-squared density with one degree of freedom. Hence
$$\mathrm{E}\big(g(\sigma^2)\,\big|\,\sigma^2\varepsilon^2 = y\big) = \int g(\sigma^2)\,\frac{y}{\sigma^4} f_{\varepsilon^2}\Big(\frac{y}{\sigma^2}\Big)\, d\sigma^2.$$
Using the above we can numerically evaluate these conditional expectations and thus $Q(\alpha,\lambda)$. We keep iterating until we obtain convergence.
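The following is a rough numerical sketch of this scheme (not from the notes). It computes the two weights on a grid of $\sigma^2$ values; the conditional density of $\sigma_i^2$ given the residual is taken here to be proportional to the Gamma density of $\sigma^2$ multiplied by the normal likelihood of the residual, which is one way of carrying out the conditioning described above (this weighting, the grid and the iteration count are assumptions of the sketch):

```python
import numpy as np

def em_random_variance_regression(X, y, kappa, alpha=None, lam=1.0,
                                  n_iter=100, grid=None):
    """EM for Y_i = alpha'x_i + sigma_i*eps_i with sigma_i^2 ~ Gamma(kappa, lam).

    The E-step weights E[1/sigma_i^2 | .] and E[sigma_i^2 | .] are computed by
    numerical integration over a grid of sigma^2 values, weighting the Gamma
    density of sigma^2 by the normal likelihood of the residual (an assumption
    of this sketch; the notes only state that the weights are evaluated
    numerically).
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    if alpha is None:
        alpha = np.linalg.lstsq(X, y, rcond=None)[0]
    s2 = np.logspace(-4, 2, 2000) if grid is None else grid    # grid for sigma^2
    for _ in range(n_iter):
        r2 = (y - X @ alpha) ** 2                              # squared residuals
        # unnormalised conditional density of sigma_i^2 on the grid
        log_w = ((kappa - 1) * np.log(s2) - lam * s2)[None, :] \
                - 0.5 * np.log(s2)[None, :] - r2[:, None] / (2 * s2[None, :])
        w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
        w /= np.trapz(w, s2, axis=1)[:, None]
        e_inv = np.trapz(w / s2[None, :], s2, axis=1)          # E[1/sigma_i^2 | .]
        e_s2 = np.trapz(w * s2[None, :], s2, axis=1)           # E[sigma_i^2  | .]
        # M-step: weighted least squares for alpha, explicit update for lambda
        alpha = np.linalg.solve(X.T @ (X * e_inv[:, None]), X.T @ (e_inv * y))
        lam = kappa / np.mean(e_s2)      # lambda enters the next E-step above
    return alpha, lam
```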

4.1.3 Hidden Markov Models

Finally, we consider applications of the EM-algorithm to parameter estimation in Hidden Markov Models (HMMs). This is a setting where the EM-algorithm pretty much surpasses any other likelihood maximisation methodology. It is worth mentioning that the EM-algorithm in this setting is often called the Baum-Welch algorithm. Hidden Markov models are a generalisation of mixture distributions; however, unlike mixture distributions, it is difficult to derive an explicit expression for the likelihood of a Hidden Markov Model. HMMs are a general class of models which are widely used in several applications (including speech recognition), and can easily be generalised to the Bayesian set-up. A nice description of them can be found on Wikipedia.

In this section we will only briefly cover how the EM-algorithm can be used for HMMs. We do not attempt to address any of the issues surrounding how the maximisation is done; interested readers should refer to the extensive literature on the subject.

The general HMM is described as follows. Let us suppose that we observe $\{Y_t\}$, where the random variables $Y_t$ satisfy the Markov property $P(Y_t|Y_{t-1},Y_{t-2},\ldots) = P(Y_t|Y_{t-1})$. In addition to $\{Y_t\}$ there exists a hidden, unobserved discrete random variable $\{U_t\}$, where $\{U_t\}$ satisfies the Markov property $P(U_t|U_{t-1},U_{t-2},\ldots) = P(U_t|U_{t-1})$ and drives the dependence in $\{Y_t\}$. In other words, $P(Y_t|U_t,Y_{t-1},U_{t-1},\ldots) = P(Y_t|U_t)$. To summarise, the HMM is described by the following properties:

(i) We observe $\{Y_t\}$ (which can be either continuous or discrete random variables) but do not observe the hidden discrete random variables $\{U_t\}$.

(ii) Both $\{Y_t\}$ and $\{U_t\}$ are time-homogeneous Markov random variables, that is, $P(Y_t|Y_{t-1},Y_{t-2},\ldots) = P(Y_t|Y_{t-1})$ and $P(U_t|U_{t-1},U_{t-2},\ldots) = P(U_t|U_{t-1})$. The distributions $P(Y_t)$, $P(Y_t|Y_{t-1})$, $P(U_t)$ and $P(U_t|U_{t-1})$ do not depend on $t$.

(iii) The dependence between the $\{Y_t\}$ is driven by $\{U_t\}$, that is, $P(Y_t|U_t,Y_{t-1},U_{t-1},\ldots) = P(Y_t|U_t)$.

There are several examples of HMMs, but to keep the interpretation clear we shall only consider one classical example in this section. Let us suppose that the hidden random variable $U_t$ can take $N$ possible values $\{1,\ldots,N\}$ and let $p_i = P(U_t = i)$ and $p_{ij} = P(U_t = i|U_{t-1} = j)$. Moreover, let us suppose that the $Y_t$ are continuous random variables where $(Y_t|U_t = i) \sim \mathcal{N}(\mu_i,\sigma_i^2)$, and the conditional random variables $Y_t|U_t$ and $Y_\tau|U_\tau$ are independent of each other. Our objective is to estimate the parameters $\theta = \{p_i, p_{ij}, \mu_i, \sigma_i^2\}$ given $\{Y_t\}$. Let $f_i(\cdot;\theta)$ denote the density of $\mathcal{N}(\mu_i,\sigma_i^2)$.

Remark 4.1.3 (HMMs and mixture models) Mixture models (described in the section above) are a particular example of an HMM. In that case the unobserved variables $\{U_t\}$ are iid, with $p_i = P(U_t = i|U_{t-1} = j) = P(U_t = i)$ for all $i$ and $j$.
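To fix ideas, the short sketch below (not from the notes) simulates from this Gaussian-emission HMM; drawing the initial state from the stationary distribution of the hidden chain, which is determined by the transition probabilities $\{p_{ij}\}$, is an assumption of the sketch:

```python
import numpy as np

def simulate_gaussian_hmm(T, P, mu, sig, seed=0):
    """Simulate {U_t, Y_t} where P[i, j] = p_{ij} = P(U_t = i | U_{t-1} = j)
    (columns of P sum to one) and Y_t | U_t = i ~ N(mu[i], sig[i]^2)."""
    rng = np.random.default_rng(seed)
    mu, sig = np.asarray(mu, float), np.asarray(sig, float)
    N = len(mu)
    # stationary distribution: eigenvector of P for eigenvalue 1
    vals, vecs = np.linalg.eig(P)
    pi = np.abs(np.real(vecs[:, np.argmin(np.abs(vals - 1.0))]))
    pi = pi / pi.sum()
    u = np.zeros(T, dtype=int)
    u[0] = rng.choice(N, p=pi)
    for t in range(1, T):
        u[t] = rng.choice(N, p=P[:, u[t - 1]])   # column u[t-1] gives P(U_t = . | U_{t-1})
    y = rng.normal(mu[u], sig[u])                # Gaussian emission given the hidden state
    return u, y
```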

Let us denote the log-likelihood of $\{Y_t\}$ by $L_T(Y;\theta)$ (this is the observed likelihood). It is clear that constructing an explicit expression for $L_T$ is difficult, thus maximising the likelihood directly is near impossible. In the remark below we derive the observed likelihood.

Remark 4.1.4 The likelihood of $Y = (Y_1,\ldots,Y_T)$ is
$$L_T(Y;\theta) = f(Y_T|Y_{T-1},Y_{T-2},\ldots;\theta)\cdots f(Y_2|Y_1;\theta) f(Y_1;\theta) = f(Y_T|Y_{T-1};\theta)\cdots f(Y_2|Y_1;\theta) f(Y_1;\theta).$$
Thus the log-likelihood is
$$L_T(Y;\theta) = \sum_{t=2}^{T}\log f(Y_t|Y_{t-1};\theta) + \log f(Y_1;\theta).$$
The density $f(Y_1;\theta)$ is simply the mixture density
$$f(Y_1;\theta) = p_1 f_1(Y_1;\theta_1) + \cdots + p_N f_N(Y_1;\theta_N),$$
where $p_i = P(U_t = i)$. The conditional $f(Y_t|Y_{t-1})$ is more tricky. We start with
$$f(Y_t|Y_{t-1};\theta) = \frac{f(Y_t,Y_{t-1};\theta)}{f(Y_{t-1};\theta)}.$$
An expression for $f(Y_{t-1};\theta)$ is given above. To evaluate $f(Y_t,Y_{t-1};\theta)$ we condition on $U_t, U_{t-1}$ to give (using the Markov and conditional independence properties)
$$f(Y_t,Y_{t-1};\theta) = \sum_{i}\sum_{j} f(Y_t,Y_{t-1}|U_t=i,U_{t-1}=j)P(U_t=i,U_{t-1}=j)$$
$$= \sum_{i}\sum_{j} f(Y_t|U_t=i) f(Y_{t-1}|U_{t-1}=j) P(U_t=i|U_{t-1}=j) P(U_{t-1}=j) = \sum_{i}\sum_{j} f_i(Y_t;\theta_i) f_j(Y_{t-1};\theta_j) p_{ij} p_j.$$
Thus we have
$$f(Y_t|Y_{t-1};\theta) = \frac{\sum_{i}\sum_{j} f_i(Y_t;\theta_i) f_j(Y_{t-1};\theta_j) p_{ij} p_j}{f(Y_{t-1};\theta)}.$$
We substitute the above into $L_T(Y;\theta)$ to give the expression
$$L_T(Y;\theta) = \sum_{t=2}^{T}\log\frac{\sum_{i}\sum_{j} f_i(Y_t;\theta_i) f_j(Y_{t-1};\theta_j) p_{ij} p_j}{f(Y_{t-1};\theta)} + \log\Big(\sum_{i=1}^{N} p_i f_i(Y_1;\theta_i)\Big).$$
Now try to maximise this directly!

Instead we seek an indirect method for maximising the likelihood. By using the EM algorithm we can maximise a likelihood which is a lot easier to evaluate. Let us suppose that we observe $\{Y_t,U_t\}$. Since
$$P(Y|U) = P(Y_T|Y_{T-1},\ldots,Y_1,U) P(Y_{T-1}|Y_{T-2},\ldots,Y_1,U)\cdots P(Y_1|U) = \prod_{t=1}^{T} P(Y_t|U_t),$$
and the distribution of $Y_t|U_t = i$ is $\mathcal{N}(\mu_i,\sigma_i^2)$, the complete likelihood of $\{Y_t,U_t\}$ is
$$\prod_{t=1}^{T} f(Y_t|U_t;\theta)\times p_{U_1}\prod_{t=2}^{T} p_{U_t,U_{t-1}}.$$

Thus the log-likelihood of the complete observations $\{Y_t,U_t\}$ is
$$L_T(Y,U;\theta) = \sum_{t=1}^{T}\log f(Y_t|U_t;\theta) + \sum_{t=2}^{T}\log p_{U_t,U_{t-1}} + \log p_{U_1}.$$
Of course, we do not observe the complete likelihood, but the above can be used to define the function $Q(\theta^*,\theta)$ which is maximised in the EM-algorithm. It is worth mentioning that given the transition probabilities of a discrete Markov chain (that is, $\{p_{ij}\}_{ij}$) one can obtain the marginal probabilities $\{p_i\}$. Thus it is not necessary to estimate the marginal probabilities $\{p_i\}$ separately (note that excluding $\log p_{U_1}$ from the log-likelihood above gives the conditional complete log-likelihood).

We recall that maximising the observed likelihood $L_T(Y;\theta)$ using the EM algorithm involves evaluating $Q(\theta^*,\theta)$, where
$$Q(\theta^*,\theta) = \mathrm{E}\Big[\sum_{t=1}^{T}\log f(Y_t|U_t;\theta) + \sum_{t=2}^{T}\log p_{U_t,U_{t-1}} + \log p_{U_1}\,\Big|\,Y,\theta^*\Big]$$
$$= \sum_{U}\sum_{t=1}^{T}\big[\log f(Y_t|U_t;\theta)\big]P(U_t|Y,\theta^*) + \sum_{U}\sum_{t=2}^{T}\big[\log p_{U_t,U_{t-1}}\big]P(U_t,U_{t-1}|Y,\theta^*) + \sum_{U}\big[\log p_{U_1}\big]P(U_1|Y,\theta^*),$$
and $\sum_U$ denotes the sum over all combinations of $U$. Since $P(U_t|Y,\theta^*) = P(U_t,Y|\theta^*)/P(Y|\theta^*)$ and $P(U_t,U_{t-1}|Y,\theta^*) = P(U_t,U_{t-1},Y|\theta^*)/P(Y|\theta^*)$, and $P(Y|\theta^*)$ is common to all $U_t$ and does not depend on $\theta$, we can define
$$\tilde{Q}(\theta^*,\theta) = \sum_{U}\sum_{t=1}^{T}\big[\log f(Y_t|U_t;\theta)\big]P(U_t,Y|\theta^*) + \sum_{U}\sum_{t=2}^{T}\big[\log p_{U_t,U_{t-1}}\big]P(U_t,U_{t-1},Y|\theta^*) + \sum_{U}\big[\log p_{U_1}\big]P(U_1,Y|\theta^*), \qquad (4.9)$$
where $\tilde{Q}(\theta^*,\theta) \propto Q(\theta^*,\theta)$, so that the maximum of $\tilde{Q}(\theta^*,\theta)$ with respect to $\theta$ is the same as the maximum of $Q(\theta^*,\theta)$. Thus the quantity $\tilde{Q}(\theta^*,\theta)$ is evaluated and maximised with respect to $\theta$. For a given $\theta^*$ and $Y$, the conditional probabilities $P(U_t|Y,\theta^*)$ and $P(U_t,U_{t-1}|Y,\theta^*)$ can be evaluated through a series of iterative steps. For this example the EM algorithm is:

(i) Define an initial value $\theta_1 \in \Theta$. Let $\theta^* = \theta_1$.

(ii) The expectation step: for a fixed $\theta^*$ evaluate $P(U_t|Y,\theta^*)$, $P(U_t,U_{t-1}|Y,\theta^*)$ and $Q(\theta^*,\theta)$ (defined in (4.9)).

(iii) The maximisation step: evaluate $\theta_{k+1} = \arg\max_{\theta\in\Theta} Q(\theta^*,\theta)$ by differentiating $Q(\theta^*,\theta)$ with respect to $\theta$ and equating to zero.

(iv) If $\theta_k$ and $\theta_{k+1}$ are sufficiently close to each other, stop the algorithm and set $\hat{\theta}_n = \theta_{k+1}$. Else set $\theta^* = \theta_{k+1}$, go back and repeat steps (ii) and (iii) again.
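To make the expectation step concrete, the sketch below (not from the notes) computes $P(U_t|Y,\theta^*)$ and $P(U_t,U_{t-1}|Y,\theta^*)$ with the standard scaled forward-backward recursions and then performs the closed-form M-step for the Gaussian-emission HMM described above; the starting values, the scaling scheme and the re-estimation of the initial distribution are assumptions of the sketch:

```python
import numpy as np
from scipy.stats import norm

def baum_welch_gaussian(y, N, n_iter=100):
    """EM (Baum-Welch) for an HMM with N hidden states and N(mu_i, sig_i^2) emissions.

    A[i, j] = P(U_t = j | U_{t-1} = i)  (the transpose of p_{ij} in the notes).
    """
    y = np.asarray(y, float)
    T = len(y)
    pi = np.full(N, 1.0 / N)                        # initial state distribution
    A = np.full((N, N), 1.0 / N)
    mu = np.quantile(y, np.linspace(0.1, 0.9, N))   # crude starting values
    sig = np.full(N, np.std(y))
    for _ in range(n_iter):
        B = norm.pdf(y[:, None], mu[None, :], sig[None, :])   # B[t, i] = f_i(y_t)
        # scaled forward recursion
        alpha = np.zeros((T, N)); c = np.zeros(T)
        alpha[0] = pi * B[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[t]
            c[t] = alpha[t].sum(); alpha[t] /= c[t]
        # scaled backward recursion
        beta = np.zeros((T, N)); beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[t + 1] * beta[t + 1])) / c[t + 1]
        # E-step: gamma[t, i] = P(U_t=i | Y), xi[t, i, j] = P(U_t=i, U_{t+1}=j | Y)
        gamma = alpha * beta
        xi = (alpha[:-1, :, None] * A[None, :, :]
              * (B[1:] * beta[1:])[:, None, :] / c[1:, None, None])
        # M-step: closed-form updates for the transition, mean and variance parameters
        pi = gamma[0] / gamma[0].sum()
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        w = gamma / gamma.sum(axis=0, keepdims=True)
        mu = (w * y[:, None]).sum(axis=0)
        sig = np.sqrt((w * (y[:, None] - mu[None, :]) ** 2).sum(axis=0))
    return pi, A, mu, sig
```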


More information

Outline. Bayesian Networks: Maximum Likelihood Estimation and Tree Structure Learning. Our Model and Data. Outline

Outline. Bayesian Networks: Maximum Likelihood Estimation and Tree Structure Learning. Our Model and Data. Outline Outlne Bayesan Networks: Maxmum Lkelhood Estmaton and Tree Structure Learnng Huzhen Yu janey.yu@cs.helsnk.f Dept. Computer Scence, Unv. of Helsnk Probablstc Models, Sprng, 200 Notces: I corrected a number

More information

Rockefeller College University at Albany

Rockefeller College University at Albany Rockefeller College Unverst at Alban PAD 705 Handout: Maxmum Lkelhood Estmaton Orgnal b Davd A. Wse John F. Kenned School of Government, Harvard Unverst Modfcatons b R. Karl Rethemeer Up to ths pont n

More information

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011 Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected

More information

COS 521: Advanced Algorithms Game Theory and Linear Programming

COS 521: Advanced Algorithms Game Theory and Linear Programming COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton

More information

Using T.O.M to Estimate Parameter of distributions that have not Single Exponential Family

Using T.O.M to Estimate Parameter of distributions that have not Single Exponential Family IOSR Journal of Mathematcs IOSR-JM) ISSN: 2278-5728. Volume 3, Issue 3 Sep-Oct. 202), PP 44-48 www.osrjournals.org Usng T.O.M to Estmate Parameter of dstrbutons that have not Sngle Exponental Famly Jubran

More information

LECTURE 9 CANONICAL CORRELATION ANALYSIS

LECTURE 9 CANONICAL CORRELATION ANALYSIS LECURE 9 CANONICAL CORRELAION ANALYSIS Introducton he concept of canoncal correlaton arses when we want to quantfy the assocatons between two sets of varables. For example, suppose that the frst set of

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.65/15.070J Fall 013 Lecture 1 10/1/013 Martngale Concentraton Inequaltes and Applcatons Content. 1. Exponental concentraton for martngales wth bounded ncrements.

More information

Supplement to Clustering with Statistical Error Control

Supplement to Clustering with Statistical Error Control Supplement to Clusterng wth Statstcal Error Control Mchael Vogt Unversty of Bonn Matthas Schmd Unversty of Bonn In ths supplement, we provde the proofs that are omtted n the paper. In partcular, we derve

More information

9. Binary Dependent Variables

9. Binary Dependent Variables 9. Bnar Dependent Varables 9. Homogeneous models Log, prob models Inference Tax preparers 9.2 Random effects models 9.3 Fxed effects models 9.4 Margnal models and GEE Appendx 9A - Lkelhood calculatons

More information

Assortment Optimization under MNL

Assortment Optimization under MNL Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information