Chapter 7. The Expectation-Maximisation Algorithm

7.1 The EM algorithm - a method for maximising the likelihood

Let us suppose that we observe $Y = \{Y_i\}_{i=1}^n$. The joint density of $Y$ is $f(Y;\theta_0)$, where $\theta_0$ is an unknown parameter. Our objective is to estimate $\theta_0$. The log-likelihood of $Y$ is
$$\mathcal{L}_n(Y;\theta) = \log f(Y;\theta).$$
Observe that we have not specified that the $\{Y_i\}$ are iid random variables. This is because the procedure described below is very general and the observations need not be either independent or identically distributed (indeed, an interesting extension of this procedure is to time series with missing data, first proposed in Shumway and Stoffer (1982) and Engle and Watson (1981)). Our objective is to estimate $\theta_0$ in the situation where either evaluating the log-likelihood $\mathcal{L}_n$ or maximising $\mathcal{L}_n$ is difficult. Hence an alternative means of maximising $\mathcal{L}_n$ is required. Often there may exist unobserved data $U = \{U_i\}_{i=1}^m$ for which the likelihood of $(Y,U)$ can easily be evaluated. It is through these unobserved data that we find an alternative method for maximising $\mathcal{L}_n$. The EM algorithm was specified in its current form in Dempster, Laird and Rubin (1977), although it had previously been applied to several specific models.

Example 7.1.1

(i) Suppose that $\{f_j(\cdot;\theta)\}_{j=1}^m$ is a sequence of densities from $m$ exponential classes of densities. In earlier sections (on the exponential family) we showed that it is straightforward to maximise each of these densities separately. However, let us suppose that each $f_j(\cdot;\theta)$ corresponds to one subpopulation. All the populations are pooled together, and given an observation $X_i$ it is unknown which subpopulation it comes from. Let $\delta_i$ denote the subpopulation the individual $X_i$ comes from, i.e. $\delta_i \in \{1,\dots,m\}$ with $P(\delta_i = j) = p_j$. The density of this mixture of distributions is
$$f(x;\theta) = \sum_{j=1}^m f(x \mid \delta = j)\,P(\delta = j) = \sum_{j=1}^m p_j f_j(x;\theta),$$
where $\sum_{j=1}^m p_j = 1$. Thus the log-likelihood of $\{X_i\}$ is
$$\sum_{i=1}^n \log\Big(\sum_{j=1}^m p_j f_j(X_i;\theta)\Big).$$
Since we require that $\sum_{j=1}^m p_j = 1$, we include a Lagrange multiplier in the likelihood to ensure this holds:
$$\sum_{i=1}^n \log\Big(\sum_{j=1}^m p_j f_j(X_i;\theta)\Big) + \lambda\Big(\sum_{j=1}^m p_j - 1\Big).$$
It is straightforward to maximise the likelihood for each individual subpopulation; however, it is extremely difficult to maximise the likelihood of this mixture of distributions.

The data $\{X_i\}$ can be treated as incomplete, since the information $\{\delta_i\}$ about which subpopulation each individual belongs to is missing. If $\{X_i,\delta_i\}$ were known, the likelihood of $\{X_i,\delta_i\}$ would be
$$\prod_{i=1}^n\prod_{j=1}^m\big(p_j f_j(X_i;\theta)\big)^{I(\delta_i=j)} = \prod_{i=1}^n p_{\delta_i} f_{\delta_i}(X_i;\theta),$$
which leads to the log-likelihood of $\{X_i,\delta_i\}$,
$$\sum_{i=1}^n\log\big(p_{\delta_i} f_{\delta_i}(X_i;\theta)\big) = \sum_{i=1}^n\big(\log p_{\delta_i} + \log f_{\delta_i}(X_i;\theta)\big),$$

which is far easier to maximise. Again, to ensure that $\sum_{j=1}^m p_j = 1$, we include a Lagrange multiplier
$$\sum_{i=1}^n\big(\log p_{\delta_i} + \log f_{\delta_i}(X_i;\theta)\big) + \lambda\Big(\sum_{j=1}^m p_j - 1\Big).$$
It is easy to show that the resulting estimator of the mixture proportions is $\hat p_j = n^{-1}\sum_{i=1}^n I(\delta_i = j)$.

(ii) Let us suppose that $\{T_i\}_{i=1}^{n+m}$ are iid survival times with density $f(x;\theta_0)$. Some of these times are censored and we observe $\{Y_i\}_{i=1}^{n+m}$, where $Y_i = \min(T_i,c)$. To simplify notation we will suppose that $Y_i = T_i$ for $1\le i\le n$ (the survival time is observed), whereas $Y_i = c$ for $n+1\le i\le n+m$. Using the results on censored likelihoods from earlier in these notes, the log-likelihood of $Y$ is
$$\mathcal{L}_n(Y;\theta) = \sum_{i=1}^n\log f(Y_i;\theta) + \sum_{i=n+1}^{n+m}\log\mathcal{F}(Y_i;\theta),$$
where $\mathcal{F}(x;\theta) = P(T_i > x)$ is the survival function. The observations $\{Y_i\}_{i=n+1}^{n+m}$ can be treated as if they were missing. Define the complete observations $U = \{T_i\}_{i=n+1}^{n+m}$; hence $U$ contains the unobserved survival times. Then the log-likelihood of $(Y,U)$ is
$$\mathcal{L}_n(Y,U;\theta) = \sum_{i=1}^{n+m}\log f(T_i;\theta).$$
If no analytic expression exists for the survival function $\mathcal{F}$, it is easier to maximise $\mathcal{L}_n(Y,U)$ than $\mathcal{L}_n(Y)$.

We now formally describe the EM algorithm. As mentioned in the discussion above, it is often easier to maximise the joint likelihood of $(Y,U)$ than the likelihood of $Y$ itself. The EM algorithm is based on maximising an approximation of the complete likelihood of $(Y,U)$ using the data that is observed, $Y$. Let us suppose that the joint log-likelihood of $(Y,U)$ is
$$\mathcal{L}_n(Y,U;\theta) = \log f(Y,U;\theta).$$
This is often called the complete likelihood; we will assume that if $U$ were known, then this likelihood would be easy to obtain and differentiate.

We will also assume that the conditional density $f(u|Y;\theta)$ is known and easy to evaluate. By using Bayes' theorem it is straightforward to show that
$$\log f(Y,U;\theta) = \log f(Y;\theta) + \log f(U|Y;\theta),\qquad\text{hence}\qquad \mathcal{L}_n(Y,U;\theta) = \mathcal{L}_n(Y;\theta) + \log f(U|Y;\theta). \tag{7.1}$$
Of course, in reality $\log f(Y,U;\theta)$ is unknown, because $U$ is unobserved. However, let us consider the expected value of $\log f(Y,U;\theta)$ given what we observe, $Y$. That is,
$$Q(\theta',\theta) = \mathrm{E}\big[\log f(Y,U;\theta)\,\big|\,Y,\theta'\big] = \int\log f(Y,u;\theta)\,f(u|Y,\theta')\,du, \tag{7.2}$$
where $f(u|Y,\theta')$ is the conditional density of $U$ given $Y$ and the parameter $\theta'$. Hence if $f(u|Y,\theta')$ were known, then $Q(\theta',\theta)$ could be evaluated.

Remark 7.1.1 It is worth noting that $Q(\theta',\theta) = \mathrm{E}[\log f(Y,U;\theta)|Y,\theta']$ can be viewed as the best predictor of the complete likelihood (involving both the observed and unobserved data $(Y,U)$) given what is observed, $Y$. We recall that the conditional expectation is the best predictor of $U$ in terms of mean squared error, that is, it is the function of $Y$ which minimises the mean squared error: $\mathrm{E}(U|Y) = \arg\min_{g}\mathrm{E}\big(U - g(Y)\big)^2$.

The EM algorithm is based on iterating $Q(\cdot,\cdot)$ in such a way that at each step we obtain an estimator which gives a larger value of $Q(\cdot,\cdot)$ (and, as we will show later, a larger $\mathcal{L}_n(Y;\theta)$). We describe the EM algorithm below.

The EM algorithm:

(i) Define an initial value $\theta_1$. Let $\theta^* = \theta_1$.

(ii) The expectation step (E-step, at the $(k+1)$th iteration): for the fixed $\theta^* = \theta_k$ evaluate
$$Q(\theta^*,\theta) = \mathrm{E}\big[\log f(Y,U;\theta)\,\big|\,Y,\theta^*\big] = \int\log f(Y,u;\theta)\,f(u|Y,\theta^*)\,du$$
for all $\theta\in\Theta$.

(iii) The maximisation step (M-step): evaluate $\theta_{k+1} = \arg\max_{\theta\in\Theta}Q(\theta^*,\theta)$. We note that the maximisation can often be done by solving
$$\mathrm{E}\Big[\frac{\partial\log f(Y,U;\theta)}{\partial\theta}\,\Big|\,Y,\theta^*\Big] = 0.$$

(iv) If $\theta_k$ and $\theta_{k+1}$ are sufficiently close to each other, stop the algorithm and set $\hat\theta_n = \theta_{k+1}$. Else set $\theta^* = \theta_{k+1}$, go back and repeat steps (ii) and (iii).
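To make the iteration concrete, here is a minimal sketch of the generic EM loop in Python. It assumes the user supplies model-specific functions `e_step` and `m_step`; these names, and the whole interface, are illustrative rather than taken from any library.

```python
import numpy as np

def em(y, theta0, e_step, m_step, tol=1e-8, max_iter=500):
    """Generic EM iteration (illustrative sketch).

    y       : observed data
    theta0  : initial parameter value (array-like)
    e_step  : function (y, theta_star) -> statistics defining Q(theta_star, .)
    m_step  : function (y, stats) -> argmax_theta Q(theta_star, theta)
    """
    theta = np.asarray(theta0, dtype=float)
    for k in range(max_iter):
        stats = e_step(y, theta)        # E-step: expectations given Y and the current theta
        theta_new = m_step(y, stats)    # M-step: maximise Q(theta, .)
        if np.max(np.abs(theta_new - theta)) < tol:   # step (iv): stop when iterates are close
            return theta_new, k + 1
        theta = theta_new
    return theta, max_iter
```

Any of the specific models below (censored data, mixtures, HMMs) can be plugged into this loop by writing the corresponding `e_step` and `m_step`.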

We use $\hat\theta_n$ as an estimator of $\theta_0$. To understand why this iteration is connected to maximising $\mathcal{L}_n(Y;\theta)$ and, under certain conditions, gives a good estimator of $\theta_0$ (in the sense that $\hat\theta_n$ is close to the parameter which maximises $\mathcal{L}_n$), let us return to (7.1). Taking the expectation of $\log f(Y,U;\theta)$, conditioned on $Y$ and evaluated at $\theta'$, we have
$$Q(\theta',\theta) = \mathrm{E}\big[\log f(Y,U;\theta)\,|\,Y,\theta'\big] = \mathrm{E}\big[\log f(Y;\theta) + \log f(U|Y;\theta)\,|\,Y,\theta'\big] = \log f(Y;\theta) + \mathrm{E}\big[\log f(U|Y;\theta)\,|\,Y,\theta'\big]. \tag{7.3}$$
Define
$$D(\theta',\theta) = \mathrm{E}\big[\log f(U|Y;\theta)\,|\,Y,\theta'\big] = \int\log f(u|Y;\theta)\,f(u|Y,\theta')\,du.$$
Substituting $D(\theta',\theta)$ into (7.3) gives
$$Q(\theta',\theta) = \mathcal{L}_n(\theta) + D(\theta',\theta); \tag{7.4}$$
we use this expression in the proof below to show that $\mathcal{L}_n(\theta_{k+1}) \ge \mathcal{L}_n(\theta_k)$. First note that at the $(k+1)$th iteration of the EM algorithm $\theta_{k+1}$ maximises $Q(\theta_k,\theta)$ over all $\theta\in\Theta$, hence $Q(\theta_k,\theta_{k+1}) \ge Q(\theta_k,\theta_k)$ (which will also be used in the proof). In the lemma below we show that $\mathcal{L}_n(\theta_{k+1}) \ge \mathcal{L}_n(\theta_k)$; hence at each iteration of the EM algorithm we obtain a $\theta_{k+1}$ which increases the likelihood over the previous iteration.

Lemma 7.1.1 When running the EM algorithm the inequality $\mathcal{L}_n(\theta_{k+1}) \ge \mathcal{L}_n(\theta_k)$ always holds. Furthermore, if $\theta_k \to \hat\theta$ and at every iteration $\frac{\partial Q(\theta_1,\theta_2)}{\partial\theta_2}\big|_{(\theta_k,\theta_{k+1})} = 0$, then $\frac{\partial\mathcal{L}_n(\theta)}{\partial\theta}\big|_{\hat\theta} = 0$ (this point can be a saddle point, a local maximum or the sought-after global maximum).

PROOF. From (7.4) it is clear that
$$Q(\theta_k,\theta_{k+1}) - Q(\theta_k,\theta_k) = \big[\mathcal{L}_n(\theta_{k+1}) - \mathcal{L}_n(\theta_k)\big] + \big[D(\theta_k,\theta_{k+1}) - D(\theta_k,\theta_k)\big], \tag{7.5}$$
where we recall
$$D(\theta',\theta) = \mathrm{E}\big[\log f(U|Y;\theta)\,|\,Y,\theta'\big] = \int\log f(u|Y;\theta)\,f(u|Y,\theta')\,du.$$
We will show that $D(\theta_k,\theta_{k+1}) - D(\theta_k,\theta_k) \le 0$; the result follows from this. We observe that
$$D(\theta_k,\theta_{k+1}) - D(\theta_k,\theta_k) = \int\log\frac{f(u|Y,\theta_{k+1})}{f(u|Y,\theta_k)}\,f(u|Y,\theta_k)\,du.$$
By Jensen's inequality (which we have used several times previously),
$$D(\theta_k,\theta_{k+1}) - D(\theta_k,\theta_k) \le \log\int\frac{f(u|Y,\theta_{k+1})}{f(u|Y,\theta_k)}\,f(u|Y,\theta_k)\,du = \log\int f(u|Y,\theta_{k+1})\,du = 0.$$
Therefore $D(\theta_k,\theta_{k+1}) - D(\theta_k,\theta_k) \le 0$. Note that if $\theta$ uniquely identifies the distribution $f(u|Y;\theta)$, then equality only happens when $\theta_{k+1} = \theta_k$. Since $D(\theta_k,\theta_{k+1}) - D(\theta_k,\theta_k) \le 0$, by (7.5) we have
$$\mathcal{L}_n(\theta_{k+1}) - \mathcal{L}_n(\theta_k) \ge Q(\theta_k,\theta_{k+1}) - Q(\theta_k,\theta_k) \ge 0,$$
and we obtain the desired result ($\mathcal{L}_n(\theta_{k+1}) \ge \mathcal{L}_n(\theta_k)$).

To prove the second part of the result we will use the fact that for all $\theta'$
$$\frac{\partial D(\theta',\theta)}{\partial\theta}\Big|_{\theta=\theta'} = \int\frac{\partial\log f(u|Y;\theta)}{\partial\theta}\Big|_{\theta=\theta'}f(u|Y,\theta')\,du = \frac{\partial}{\partial\theta}\int f(u|Y,\theta)\,du\,\Big|_{\theta=\theta'} = 0. \tag{7.6}$$
We want to show that the derivative of the likelihood is zero at $\hat\theta$. To show this we use the identity
$$\mathcal{L}_n(\theta_{k+1}) = Q(\theta_k,\theta_{k+1}) - D(\theta_k,\theta_{k+1}).$$
Taking derivatives with respect to $\theta$ and evaluating at $\theta_{k+1}$ gives
$$\frac{\partial\mathcal{L}_n(\theta)}{\partial\theta}\Big|_{\theta_{k+1}} = \frac{\partial Q(\theta_k,\theta)}{\partial\theta}\Big|_{\theta_{k+1}} - \frac{\partial D(\theta_k,\theta)}{\partial\theta}\Big|_{\theta_{k+1}}.$$
By the definition of $\theta_{k+1}$ we have $\frac{\partial Q(\theta_k,\theta)}{\partial\theta}\big|_{\theta_{k+1}} = 0$, hence
$$\frac{\partial\mathcal{L}_n(\theta)}{\partial\theta}\Big|_{\theta_{k+1}} = -\frac{\partial D(\theta_k,\theta)}{\partial\theta}\Big|_{\theta_{k+1}}.$$

Furthermore, since by assumption $\theta_k \to \hat\theta$, this implies that as $k \to \infty$
$$\frac{\partial\mathcal{L}_n(\theta)}{\partial\theta}\Big|_{\hat\theta} = -\frac{\partial D(\theta_1,\theta_2)}{\partial\theta_2}\Big|_{(\hat\theta,\hat\theta)} = 0,$$
which follows from (7.6), thus giving the required result. Further information on convergence can be found in Boyles (1983) (jstor.org/stable/pdf/3456.pdf) and Wu (1983) (jstor.org/stable/pdf/40463.pdf).

Remark 7.1.2 Note that the EM algorithm will converge to a point $\hat\theta$ satisfying $\frac{\partial\mathcal{L}_n(\theta)}{\partial\theta}\big|_{\hat\theta} = 0$. The reason can be seen from the identity $Q(\theta',\theta) = \mathcal{L}_n(\theta) + D(\theta',\theta)$. The derivative of this identity with respect to $\theta$ is
$$\frac{\partial Q(\theta',\theta)}{\partial\theta} = \frac{\partial\mathcal{L}_n(\theta)}{\partial\theta} + \frac{\partial D(\theta',\theta)}{\partial\theta}. \tag{7.7}$$
Observe that $\theta\mapsto D(\theta',\theta)$ is maximised at $\theta = \theta'$ (for all $\theta'$; this is clear from the proof above), which for $\theta' = \hat\theta$ gives $\frac{\partial D(\hat\theta,\theta)}{\partial\theta}\big|_{\hat\theta} = 0$. Since at the limit $\hat\theta = \arg\max_\theta Q(\hat\theta,\theta)$, we also have $\frac{\partial Q(\hat\theta,\theta)}{\partial\theta}\big|_{\hat\theta} = 0$. Using (7.7), this gives $\frac{\partial\mathcal{L}_n(\theta)}{\partial\theta}\big|_{\hat\theta} = 0$.

In order to prove the results in the following section we use the following identities. Differentiating $Q(\theta',\theta) = \mathcal{L}_n(\theta) + D(\theta',\theta)$ twice with respect to $\theta$ and evaluating at $\theta = \theta'$ gives
$$-\frac{\partial^2\mathcal{L}_n(\theta)}{\partial\theta^2}\Big|_{\theta'} = -\frac{\partial^2 Q(\theta',\theta)}{\partial\theta^2}\Big|_{\theta=\theta'} + \frac{\partial^2 D(\theta',\theta)}{\partial\theta^2}\Big|_{\theta=\theta'}. \tag{7.8}$$
We observe that the left-hand side of the above is the observed Fisher information matrix $I(\theta|Y)$, that
$$I_C(\theta|Y) = -\frac{\partial^2 Q(\theta',\theta)}{\partial\theta^2}\Big|_{\theta=\theta'} = -\int\frac{\partial^2\log f(u,Y;\theta)}{\partial\theta^2}\,f(u|Y,\theta)\,du \tag{7.9}$$

is the complete Fisher information conditioned on what is observed, and that
$$I_M(\theta|Y) = -\frac{\partial^2 D(\theta',\theta)}{\partial\theta^2}\Big|_{\theta=\theta'} = -\int\frac{\partial^2\log f(u|Y;\theta)}{\partial\theta^2}\,f(u|Y,\theta)\,du \tag{7.10}$$
is the Fisher information matrix of the unobserved data conditioned on what is observed. Thus
$$I(\theta|Y) = I_C(\theta|Y) - I_M(\theta|Y).$$

7.1.1 Speed of convergence of $\theta_k$ to a stable point

When analysing an algorithm it is instructive to understand how long it takes to converge to the limiting point. In the case of the EM algorithm, this means asking what factors determine the rate at which $\theta_k$ converges to a stable point $\hat\theta$ (note that this has nothing to do with the rate of convergence of an estimator to the true parameter, and it is important to understand this distinction). The rate of convergence of an algorithm is usually measured by the ratio of the current iteration's error to the previous iteration's error:
$$R = \lim_{k\to\infty}\frac{\theta_{k+1}-\hat\theta}{\theta_k-\hat\theta};$$
if the algorithm converges to the limit in a finite number of iterations we set the above limit to zero. Thus the smaller $R$, the faster the rate of convergence (for example, if $\theta_k - \hat\theta = \rho^k$ with $|\rho| < 1$ then $R = \rho$, whereas if the error decays faster than geometrically then $R = 0$). Note that since $(\theta_{k+1}-\hat\theta) = \big[\prod_{j=1}^{k}\frac{\theta_{j+1}-\hat\theta}{\theta_j-\hat\theta}\big](\theta_1-\hat\theta)$, typically $R \le 1$.

To obtain an approximation of $R$ we make a Taylor expansion of $\frac{\partial Q(\theta_1,\theta_2)}{\partial\theta_2}$ around the limit $(\hat\theta,\hat\theta)$. To do this we recall that for a bivariate function $f:\mathbb{R}^2\to\mathbb{R}$ and $(x_0,y_0)$ close to $(x,y)$ we have the Taylor expansion
$$f(x,y) = f(x_0,y_0) + (x-x_0)\frac{\partial f}{\partial x}\Big|_{(x_0,y_0)} + (y-y_0)\frac{\partial f}{\partial y}\Big|_{(x_0,y_0)} + \text{lower order terms}.$$
Applying the above to $\frac{\partial Q(\theta_1,\theta_2)}{\partial\theta_2}$ gives
$$\frac{\partial Q(\theta_1,\theta_2)}{\partial\theta_2}\Big|_{(\theta_k,\theta_{k+1})} \approx \frac{\partial Q(\theta_1,\theta_2)}{\partial\theta_2}\Big|_{(\hat\theta,\hat\theta)} + (\theta_{k+1}-\hat\theta)\frac{\partial^2 Q(\theta_1,\theta_2)}{\partial\theta_2^2}\Big|_{(\hat\theta,\hat\theta)} + (\theta_k-\hat\theta)\frac{\partial^2 Q(\theta_1,\theta_2)}{\partial\theta_1\partial\theta_2}\Big|_{(\hat\theta,\hat\theta)}.$$

Since $\theta_{k+1}$ maximises $Q(\theta_k,\theta)$ and $\hat\theta$ maximises $Q(\hat\theta,\theta)$ within the interior of the parameter space, we have
$$\frac{\partial Q(\theta_1,\theta_2)}{\partial\theta_2}\Big|_{(\theta_k,\theta_{k+1})} = \frac{\partial Q(\theta_1,\theta_2)}{\partial\theta_2}\Big|_{(\hat\theta,\hat\theta)} = 0.$$
This implies that
$$(\theta_{k+1}-\hat\theta)\frac{\partial^2 Q(\theta_1,\theta_2)}{\partial\theta_2^2}\Big|_{(\hat\theta,\hat\theta)} + (\theta_k-\hat\theta)\frac{\partial^2 Q(\theta_1,\theta_2)}{\partial\theta_1\partial\theta_2}\Big|_{(\hat\theta,\hat\theta)} \approx 0.$$
Thus
$$\lim_{k\to\infty}\frac{\theta_{k+1}-\hat\theta}{\theta_k-\hat\theta} = -\Big[\frac{\partial^2 Q(\theta_1,\theta_2)}{\partial\theta_2^2}\Big|_{(\hat\theta,\hat\theta)}\Big]^{-1}\frac{\partial^2 Q(\theta_1,\theta_2)}{\partial\theta_1\partial\theta_2}\Big|_{(\hat\theta,\hat\theta)}. \tag{7.11}$$
This result shows that the rate of convergence depends on the ratio of curvatures of $Q(\theta_1,\theta_2)$ around $(\hat\theta,\hat\theta)$. Some further simplification can be made by noting that $Q(\theta_1,\theta_2) = \mathcal{L}_n(\theta_2) + D(\theta_1,\theta_2)$, hence
$$\frac{\partial^2 Q(\theta_1,\theta_2)}{\partial\theta_1\partial\theta_2} = \frac{\partial^2 D(\theta_1,\theta_2)}{\partial\theta_1\partial\theta_2}.$$
Substituting this into (7.11) gives
$$\lim_{k\to\infty}\frac{\theta_{k+1}-\hat\theta}{\theta_k-\hat\theta} = -\Big[\frac{\partial^2 Q(\theta_1,\theta_2)}{\partial\theta_2^2}\Big|_{(\hat\theta,\hat\theta)}\Big]^{-1}\frac{\partial^2 D(\theta_1,\theta_2)}{\partial\theta_1\partial\theta_2}\Big|_{(\hat\theta,\hat\theta)}. \tag{7.12}$$
To make one further simplification, we note that
$$\frac{\partial^2 D(\theta_1,\theta_2)}{\partial\theta_1\partial\theta_2}\Big|_{(\hat\theta,\hat\theta)} = \int\frac{\partial\log f(u|Y,\theta_2)}{\partial\theta_2}\Big|_{\hat\theta}\,\frac{\partial f(u|Y,\theta_1)}{\partial\theta_1}\Big|_{\hat\theta}\,du = -\int\frac{\partial^2\log f(u|Y,\theta)}{\partial\theta^2}\Big|_{\hat\theta}\,f(u|Y,\hat\theta)\,du = -\frac{\partial^2 D(\theta_1,\theta_2)}{\partial\theta_2^2}\Big|_{(\hat\theta,\hat\theta)}, \tag{7.13}$$
where the last equality follows from the information-matrix identity
$$\int\frac{\partial^2\log f(x;\theta)}{\partial\theta^2}f(x;\theta)\,dx + \int\Big(\frac{\partial\log f(x;\theta)}{\partial\theta}\Big)^2 f(x;\theta)\,dx = 0$$
(proved earlier in these notes). Substituting (7.13) into (7.12) gives
$$\lim_{k\to\infty}\frac{\theta_{k+1}-\hat\theta}{\theta_k-\hat\theta} = \Big[\frac{\partial^2 Q(\theta_1,\theta_2)}{\partial\theta_2^2}\Big|_{(\hat\theta,\hat\theta)}\Big]^{-1}\frac{\partial^2 D(\theta_1,\theta_2)}{\partial\theta_2^2}\Big|_{(\hat\theta,\hat\theta)}. \tag{7.14}$$

Substituting (7.9) and (7.10) into the above gives
$$\lim_{k\to\infty}\frac{\theta_{k+1}-\hat\theta}{\theta_k-\hat\theta} = I_C(\hat\theta|Y)^{-1}\,I_M(\hat\theta|Y). \tag{7.15}$$
Hence the rate of convergence of the algorithm depends on the ratio $I_C(\hat\theta|Y)^{-1}I_M(\hat\theta|Y)$. The closer the largest eigenvalue of $I_C(\hat\theta|Y)^{-1}I_M(\hat\theta|Y)$ is to one, the slower the rate of convergence, and the larger the number of iterations required. The heuristic behind this result is that if the missing information is a large proportion of the complete (total) information then this ratio will be close to one and the algorithm will converge slowly. Further details can be found in Dempster et al. (1977), pages 9-10, and Meng and Rubin (1994).

7.2 Applications of the EM algorithm

7.2.1 Censored data

Let us return to the example at the start of this chapter and construct the EM algorithm for censored data. We recall that the log-likelihoods for the censored data and the complete data are
$$\mathcal{L}_n(Y;\theta) = \sum_{i=1}^n\log f(Y_i;\theta) + \sum_{i=n+1}^{n+m}\log\mathcal{F}(Y_i;\theta)$$
and
$$\mathcal{L}_n(Y,U;\theta) = \sum_{i=1}^n\log f(Y_i;\theta) + \sum_{i=n+1}^{n+m}\log f(T_i;\theta).$$
To implement the EM algorithm we need to evaluate the expectation step $Q(\theta',\theta)$. It is easy to see that
$$Q(\theta',\theta) = \mathrm{E}\big[\mathcal{L}_n(Y,U;\theta)\,|\,Y,\theta'\big] = \sum_{i=1}^n\log f(Y_i;\theta) + \sum_{i=n+1}^{n+m}\mathrm{E}\big[\log f(T_i;\theta)\,|\,Y_i,\theta'\big].$$
To obtain $\mathrm{E}[\log f(T_i;\theta)|Y_i,\theta']$ (for $i\ge n+1$) we note that
$$\mathrm{E}\big[\log f(T_i;\theta)\,|\,Y_i,\theta'\big] = \mathrm{E}\big[\log f(T_i;\theta)\,|\,T_i > c,\theta'\big] = \frac{1}{\mathcal{F}(c;\theta')}\int_c^{\infty}\log f(u;\theta)\,f(u;\theta')\,du.$$

Therefore we have
$$Q(\theta',\theta) = \sum_{i=1}^n\log f(Y_i;\theta) + \frac{m}{\mathcal{F}(c;\theta')}\int_c^{\infty}\log f(u;\theta)\,f(u;\theta')\,du.$$
We also note that the derivative of $Q(\theta',\theta)$ with respect to $\theta$ is
$$\frac{\partial Q(\theta',\theta)}{\partial\theta} = \sum_{i=1}^n\frac{1}{f(Y_i;\theta)}\frac{\partial f(Y_i;\theta)}{\partial\theta} + \frac{m}{\mathcal{F}(c;\theta')}\int_c^{\infty}\frac{1}{f(u;\theta)}\frac{\partial f(u;\theta)}{\partial\theta}\,f(u;\theta')\,du.$$
Hence for this example the EM algorithm is:

(i) Define an initial value $\theta_1$. Let $\theta^* = \theta_1$.

(ii) The expectation step: for the fixed $\theta^*$ evaluate $Q(\theta^*,\theta)$ and $\frac{\partial Q(\theta^*,\theta)}{\partial\theta}$ as above.

(iii) The maximisation step: solve for $\theta_{k+1}$ such that $\frac{\partial Q(\theta^*,\theta)}{\partial\theta}\big|_{\theta_{k+1}} = 0$.

(iv) If $\theta_k$ and $\theta_{k+1}$ are sufficiently close to each other, stop the algorithm and set $\hat\theta_n = \theta_{k+1}$. Else set $\theta^* = \theta_{k+1}$, go back and repeat steps (ii) and (iii). A small numerical sketch of this scheme, for the special case of exponentially distributed survival times, is given below.
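As a concrete illustration (not given in the notes above), suppose the survival times are exponential with density $f(x;\theta) = \theta^{-1}e^{-x/\theta}$. By the memoryless property $\mathrm{E}[T_i\,|\,T_i > c,\theta^*] = c + \theta^*$, so the M-step has the closed form $\theta_{k+1} = \big(\sum_{i=1}^n Y_i + m(c+\theta_k)\big)/(n+m)$. The Python sketch below implements this special case; all function and variable names are ours.

```python
import numpy as np

def em_censored_exponential(y_obs, m, c, theta0=1.0, tol=1e-10, max_iter=1000):
    """EM for exponential survival times, with m observations censored at c.

    y_obs : fully observed survival times (length n)
    m     : number of censored observations (each observed as c)
    c     : common censoring time
    """
    n = len(y_obs)
    s_obs = np.sum(y_obs)
    theta = theta0
    for _ in range(max_iter):
        # E-step: E[T_i | T_i > c, theta] = c + theta (memoryless property)
        expected_censored = m * (c + theta)
        # M-step: closed-form maximiser of Q(theta_k, theta)
        theta_new = (s_obs + expected_censored) / (n + m)
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

# Example usage with simulated data (true theta = 2, censoring at c = 1.5)
rng = np.random.default_rng(0)
t = rng.exponential(scale=2.0, size=500)
c = 1.5
y_obs = t[t <= c]
m = int(np.sum(t > c))
print(em_censored_exponential(y_obs, m, c))
```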

7.2.2 Mixture distributions

We now consider a useful application of the EM algorithm, to the estimation of parameters in mixture distributions. Let us suppose that $\{Y_i\}_{i=1}^n$ are iid random variables with density
$$f(y;\theta) = p f_1(y;\theta_1) + (1-p) f_2(y;\theta_2),$$
where $\theta = (p,\theta_1,\theta_2)$ are unknown parameters. For the purpose of identifiability we will suppose that $\theta_1\ne\theta_2$, $p\ne 1$ and $p\ne 0$. The log-likelihood of $\{Y_i\}$ is
$$\mathcal{L}_n(Y;\theta) = \sum_{i=1}^n\log\big(p f_1(Y_i;\theta_1) + (1-p)f_2(Y_i;\theta_2)\big). \tag{7.16}$$
Maximising the above directly can be extremely difficult. As an illustration consider the example below.

Example 7.2.1 Let us suppose that $f_1(y;\theta_1)$ and $f_2(y;\theta_2)$ are normal densities; then the log-likelihood is
$$\mathcal{L}_n(Y;\theta) = \sum_{i=1}^n\log\bigg(p\frac{1}{\sqrt{2\pi\sigma_1^2}}\exp\Big(-\frac{(Y_i-\mu_1)^2}{2\sigma_1^2}\Big) + (1-p)\frac{1}{\sqrt{2\pi\sigma_2^2}}\exp\Big(-\frac{(Y_i-\mu_2)^2}{2\sigma_2^2}\Big)\bigg).$$
We observe this is extremely difficult to maximise. On the other hand, if the $Y_i$ were simply normally distributed then the log-likelihood would have the extremely simple form
$$\mathcal{L}_n(Y;\theta) \propto -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n(Y_i-\mu)^2. \tag{7.17}$$
In other words, the simplicity of maximising the log-likelihood of the exponential family of distributions (see the earlier sections on the exponential family) is lost for mixtures of distributions.

We use the EM algorithm as an indirect but simple method of maximising (7.16). In this example it is not immediately clear what observations are missing. However, let us consider one possible interpretation of the mixture distribution. Define the random variables $\delta_i$ and $Y_i$, where $\delta_i\in\{1,2\}$, $P(\delta_i=1) = p$ and $P(\delta_i=2) = 1-p$, and where the density of $Y_i$ given $\delta_i = 1$ is $f_1$ and the density of $Y_i$ given $\delta_i = 2$ is $f_2$. Based on this definition, it is clear that the density of $Y_i$ is
$$f(y;\theta) = f(y\,|\,\delta=1)P(\delta=1) + f(y\,|\,\delta=2)P(\delta=2) = p f_1(y;\theta_1) + (1-p)f_2(y;\theta_2).$$
Hence, one interpretation of the mixture model is that there is a hidden unobserved random variable which determines the state or distribution of $Y_i$. A simple example: $Y_i$ is the height of an individual and $\delta_i$ is the gender. However, $\delta_i$ is unobserved and only the height is observed. Often a mixture distribution has a physical interpretation, similar to the height example, but sometimes it can simply be used to parametrically model a wide class of densities.

Based on the discussion above, $U = \{\delta_i\}$ can be treated as the missing observations. The likelihood of $(Y_i,\delta_i)$ is
$$\big(p f_1(Y_i;\theta_1)\big)^{I(\delta_i=1)}\big((1-p)f_2(Y_i;\theta_2)\big)^{I(\delta_i=2)} = p_{\delta_i}f_{\delta_i}(Y_i;\theta_{\delta_i}),$$

where we set $p_1 = p$ and $p_2 = 1-p$. Therefore the log-likelihood of $\{(Y_i,\delta_i)\}$ is
$$\mathcal{L}_n(Y,U;\theta) = \sum_{i=1}^n\big(\log p_{\delta_i} + \log f_{\delta_i}(Y_i;\theta_{\delta_i})\big).$$
We now need to evaluate
$$Q(\theta',\theta) = \mathrm{E}\big[\mathcal{L}_n(Y,U;\theta)\,|\,Y,\theta'\big] = \sum_{i=1}^n\Big(\mathrm{E}\big[\log p_{\delta_i}\,|\,Y_i,\theta'\big] + \mathrm{E}\big[\log f_{\delta_i}(Y_i;\theta_{\delta_i})\,|\,Y_i,\theta'\big]\Big).$$
We see that the above expectation is taken with respect to the distribution of $\delta_i$ conditioned on $Y_i$ and the parameter $\theta'$. Thus, in general,
$$\mathrm{E}\big[A(Y,\delta)\,|\,Y,\theta'\big] = \sum_j A(Y,j)\,P(\delta=j\,|\,Y,\theta'),$$
which we apply to $Q(\theta',\theta)$ to give
$$Q(\theta',\theta) = \sum_{i=1}^n\sum_{j=1}^2\big[\log p_j + \log f_j(Y_i;\theta_j)\big]\,P(\delta_i = j\,|\,Y_i,\theta').$$
Therefore we need to obtain $P(\delta_i = j\,|\,Y_i,\theta')$. By using conditioning arguments it is easy to see that
$$P(\delta_i = 1\,|\,Y_i = y,\theta') = \frac{P(\delta_i = 1, Y_i = y;\theta')}{P(Y_i = y;\theta')} = \frac{p' f_1(y;\theta_1')}{p' f_1(y;\theta_1') + (1-p')f_2(y;\theta_2')} =: w_1(\theta',y),$$
$$P(\delta_i = 2\,|\,Y_i = y,\theta') = \frac{(1-p')f_2(y;\theta_2')}{p' f_1(y;\theta_1') + (1-p')f_2(y;\theta_2')} =: w_2(\theta',y) = 1 - w_1(\theta',y)$$
(to see why, note that $P(\delta_i = 1, Y_i\in[y-h/2,y+h/2]) \approx h\,p'f_1(y)$ while $P(Y_i\in[y-h/2,y+h/2]) \approx h\,(p'f_1(y)+(1-p')f_2(y))$). Therefore
$$Q(\theta',\theta) = \sum_{i=1}^n\big[\log p + \log f_1(Y_i;\theta_1)\big]w_1(\theta',Y_i) + \sum_{i=1}^n\big[\log(1-p) + \log f_2(Y_i;\theta_2)\big]w_2(\theta',Y_i).$$
Maximising the above with respect to $p$, $\theta_1$ and $\theta_2$ will in general be much easier than maximising $\mathcal{L}_n(Y;\theta)$. For this example the EM algorithm is:

(i) Define an initial value for $\theta = (p,\theta_1,\theta_2)$ and set it as $\theta^*$.

(ii) The expectation step: for the fixed $\theta^*$ evaluate
$$Q(\theta^*,\theta) = \sum_{i=1}^n\big[\log p + \log f_1(Y_i;\theta_1)\big]w_1(\theta^*,Y_i) + \sum_{i=1}^n\big[\log(1-p) + \log f_2(Y_i;\theta_2)\big]w_2(\theta^*,Y_i).$$

(iii) The maximisation step: evaluate $\theta_{k+1} = \arg\max_\theta Q(\theta^*,\theta)$ by differentiating $Q(\theta^*,\theta)$ with respect to $\theta$ and equating to zero. Since the parameter $p$ and the parameters $\theta_1,\theta_2$ appear in separate subfunctions, they can be maximised separately.

(iv) If $\theta_k$ and $\theta_{k+1}$ are sufficiently close to each other, stop the algorithm and set $\hat\theta_n = \theta_{k+1}$. Else set $\theta^* = \theta_{k+1}$, go back and repeat steps (ii) and (iii).

Example 7.2.2 (Normal mixtures and mixtures from the exponential family)

(i) We briefly outline the algorithm in the case of a mixture of two normal distributions. In this case
$$Q(\theta^*,\theta) = \sum_{i=1}^n\sum_{j=1}^2 w_j(\theta^*,Y_i)\Big(-\frac{(Y_i-\mu_j)^2}{2\sigma_j^2} - \frac{1}{2}\log\sigma_j^2\Big) + \sum_{i=1}^n\big(w_1(\theta^*,Y_i)\log p + w_2(\theta^*,Y_i)\log(1-p)\big).$$
By differentiating the above with respect to $\mu_j$, $\sigma_j^2$ (for $j = 1,2$) and $p$, it is straightforward to see that the maximisers are
$$\hat\mu_j = \frac{\sum_{i=1}^n w_j(\theta^*,Y_i)Y_i}{\sum_{i=1}^n w_j(\theta^*,Y_i)},\qquad \hat\sigma_j^2 = \frac{\sum_{i=1}^n w_j(\theta^*,Y_i)(Y_i-\hat\mu_j)^2}{\sum_{i=1}^n w_j(\theta^*,Y_i)},\qquad \hat p = \frac{1}{n}\sum_{i=1}^n w_1(\theta^*,Y_i).$$
Once these estimators are obtained we set $\theta^* = (\hat\mu_1,\hat\mu_2,\hat\sigma_1^2,\hat\sigma_2^2,\hat p)$, the quantities $w_j(\theta^*,Y_i)$ are re-evaluated and $Q(\theta^*,\theta)$ is maximised with respect to the new weights (a numerical sketch of this scheme is given below).

(ii) In general, if $Y$ is a mixture from the exponential family with density
$$f(y;\theta) = \sum_{j=1}^m p_j\exp\big(y\theta_j - \kappa_j(\theta_j) + c_j(y)\big),$$
the corresponding $Q(\theta^*,\theta)$ is
$$Q(\theta^*,\theta) = \sum_{i=1}^n\sum_{j=1}^m w_j(\theta^*,Y_i)\big[Y_i\theta_j - \kappa_j(\theta_j) + c_j(Y_i) + \log p_j\big],$$
where
$$w_j(\theta^*,Y_i) = \frac{p_j^*\exp\big(Y_i\theta_j^* - \kappa_j(\theta_j^*) + c_j(Y_i)\big)}{\sum_{k=1}^m p_k^*\exp\big(Y_i\theta_k^* - \kappa_k(\theta_k^*) + c_k(Y_i)\big)},$$
subject to the constraint that $\sum_{j=1}^m p_j = 1$. Thus for $1\le j\le m$, $Q(\theta^*,\theta)$ is maximised by
$$\hat\theta_j = \mu_j^{-1}\bigg(\frac{\sum_{i=1}^n w_j(\theta^*,Y_i)Y_i}{\sum_{i=1}^n w_j(\theta^*,Y_i)}\bigg),\qquad\text{where }\mu_j = \kappa_j'$$
(we assume the parameter space of each exponential mixture component is open), and
$$\hat p_j = \frac{1}{n}\sum_{i=1}^n w_j(\theta^*,Y_i).$$
Thus we set $\theta^* = \{\hat\theta_j,\hat p_j\}_{j=1}^m$ and re-evaluate the weights.

Remark 7.2.1 Once the algorithm has terminated, we can calculate the probability that any given observation $Y_i$ belongs to subpopulation $j$, since
$$\hat P(\delta_i = j\,|\,Y_i) = \frac{\hat p_j f_j(Y_i;\hat\theta_j)}{\sum_k\hat p_k f_k(Y_i;\hat\theta_k)}.$$
This allows us to obtain a classifier for each observation $Y_i$. It is straightforward to see that the arguments above generalise to the case where the density of $Y_i$ is a mixture of $m$ different densities. However, we observe that the selection of $m$ can be quite ad hoc. There are methods for choosing $m$; these include reversible jump MCMC methods.
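The following is a small numerical sketch of the two-component normal mixture EM of Example 7.2.2(i): the E-step computes the weights $w_j(\theta^*,Y_i)$ and the M-step applies the closed-form updates for $\hat\mu_j$, $\hat\sigma_j^2$ and $\hat p$. It is written directly from the formulas above, but the function names and the simulated data are ours.

```python
import numpy as np
from scipy.stats import norm

def em_two_normal_mixture(y, p, mu1, mu2, s1, s2, n_iter=200, tol=1e-8):
    """EM for f(y) = p N(mu1, s1^2) + (1-p) N(mu2, s2^2)."""
    y = np.asarray(y, dtype=float)
    for _ in range(n_iter):
        # E-step: posterior weights w_1(theta*, Y_i); w_2 = 1 - w_1
        d1 = p * norm.pdf(y, mu1, s1)
        d2 = (1 - p) * norm.pdf(y, mu2, s2)
        w1 = d1 / (d1 + d2)
        w2 = 1.0 - w1
        # M-step: weighted means, variances and mixing proportion
        mu1_new = np.sum(w1 * y) / np.sum(w1)
        mu2_new = np.sum(w2 * y) / np.sum(w2)
        s1_new = np.sqrt(np.sum(w1 * (y - mu1_new) ** 2) / np.sum(w1))
        s2_new = np.sqrt(np.sum(w2 * (y - mu2_new) ** 2) / np.sum(w2))
        p_new = np.mean(w1)
        change = max(abs(mu1_new - mu1), abs(mu2_new - mu2),
                     abs(s1_new - s1), abs(s2_new - s2), abs(p_new - p))
        p, mu1, mu2, s1, s2 = p_new, mu1_new, mu2_new, s1_new, s2_new
        if change < tol:
            break
    return p, mu1, mu2, s1, s2

# Example usage: heights drawn from two subpopulations
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(165, 6, 400), rng.normal(178, 7, 600)])
print(em_two_normal_mixture(y, 0.5, 160.0, 180.0, 5.0, 5.0))
```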

7.2.3 Problems

Example 7.2.3 Question: Suppose that the regressors $x_t$ are believed to influence the response variable $Y_t$. The distribution of $Y_t$ is a two-component Poisson mixture,
$$P(Y_t = y) = p\,\frac{\lambda_{1,t}^y e^{-\lambda_{1,t}}}{y!} + (1-p)\,\frac{\lambda_{2,t}^y e^{-\lambda_{2,t}}}{y!},$$
where $\lambda_{1,t} = \exp(\beta_1'x_t)$ and $\lambda_{2,t} = \exp(\beta_2'x_t)$.

(i) State minimum conditions on the parameters for the above model to be identifiable.

(ii) Carefully explain (giving details of $Q(\theta^*,\theta)$ and the EM stages) how the EM algorithm can be used to obtain estimators of $\beta_1$, $\beta_2$ and $p$.

(iii) Derive the derivative of $Q(\theta^*,\theta)$, and explain how the derivative may be useful in the maximisation stage of the EM algorithm.

(iv) Given an initial value, will the EM algorithm always find the maximum of the likelihood? Explain how one can check whether the parameter at which the EM algorithm terminates maximises the likelihood.

Solution

(i) $0 < p < 1$ and $\beta_1\ne\beta_2$ (these are minimum assumptions; there could be more, which are hard to account for given the regressors $x_t$).

(ii) We first observe that $P(Y_t = y)$ is a mixture of two Poisson distributions, each with the canonical link function. Define the unobserved variables $\{U_t\}$, which are iid with $P(U_t = 1) = p$ and $P(U_t = 2) = 1-p$, and where $P(Y_t = y\,|\,U_t = j) = \lambda_{j,t}^y e^{-\lambda_{j,t}}/y!$. Therefore the complete log-density of $(Y_t,U_t)$ is
$$\log f(Y_t,U_t;\theta) = Y_t\beta_{U_t}'x_t - \exp(\beta_{U_t}'x_t) - \log Y_t! + \log p_{U_t},$$
where $\theta = (\beta_1,\beta_2,p)$, $p_1 = p$ and $p_2 = 1-p$. Thus $\mathrm{E}[\log f(Y_t,U_t;\theta)\,|\,Y_t,\theta^*]$ is
$$\big[Y_t\beta_1'x_t - \exp(\beta_1'x_t) - \log Y_t! + \log p\big]\,w(\theta^*,Y_t) + \big[Y_t\beta_2'x_t - \exp(\beta_2'x_t) - \log Y_t! + \log(1-p)\big]\big(1 - w(\theta^*,Y_t)\big),$$
where $w(\theta^*,Y_t) = P(U_t = 1\,|\,Y_t,\theta^*)$ is evaluated as
$$w(\theta^*,Y_t) = \frac{p^* f_1(Y_t,\theta^*)}{p^* f_1(Y_t,\theta^*) + (1-p^*)f_2(Y_t,\theta^*)},$$
with
$$f_j(Y_t,\theta^*) = \frac{\exp\big(\beta_j^{*\prime}x_t\,Y_t\big)\exp\big(-\exp(\beta_j^{*\prime}x_t)\big)}{Y_t!},\qquad j = 1,2.$$

Thus $Q(\theta^*,\theta)$ is
$$Q(\theta^*,\theta) = \sum_{t=1}^T\Big\{\big[Y_t\beta_1'x_t - \exp(\beta_1'x_t) - \log Y_t! + \log p\big]w(\theta^*,Y_t) + \big[Y_t\beta_2'x_t - \exp(\beta_2'x_t) - \log Y_t! + \log(1-p)\big]\big(1-w(\theta^*,Y_t)\big)\Big\}.$$
Using the above, the EM algorithm is the following:

(a) Start with an initial value, which is an estimator of $\beta_1$, $\beta_2$ and $p$; denote this as $\theta^*$.

(b) For every $\theta$ evaluate $Q(\theta^*,\theta)$.

(c) Evaluate $\arg\max_\theta Q(\theta^*,\theta)$. Denote the maximiser as the new $\theta^*$ and return to step (b).

(d) Keep iterating until the maximisers are sufficiently close to each other.

(iii) The derivatives of $Q(\theta^*,\theta)$ are
$$\frac{\partial Q}{\partial\beta_1} = \sum_{t=1}^T\big[Y_t - \exp(\beta_1'x_t)\big]x_t\,w(\theta^*,Y_t),\qquad \frac{\partial Q}{\partial\beta_2} = \sum_{t=1}^T\big[Y_t - \exp(\beta_2'x_t)\big]x_t\,\big(1-w(\theta^*,Y_t)\big),$$
$$\frac{\partial Q}{\partial p} = \sum_{t=1}^T\Big[\frac{w(\theta^*,Y_t)}{p} - \frac{1-w(\theta^*,Y_t)}{1-p}\Big].$$
Thus maximisation of $Q(\theta^*,\theta)$ can be achieved by solving the above equations using iteratively reweighted least squares (a sketch of the weight calculation in the E-step is given below).

(iv) Depending on the initial value, the EM algorithm may only locate a local maximum. To check whether we have found the global maximum, we can start the EM algorithm with several different initial values and check where each of them converges.
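A minimal sketch of the E-step weights $w(\theta^*,Y_t) = P(U_t = 1\,|\,Y_t,\theta^*)$ for this Poisson regression mixture is given below; the M-step would then run iteratively reweighted least squares for each $\beta_j$ with these weights held fixed. The code assumes the standard Poisson mass function with means $\exp(\beta_j'x_t)$, and all names are illustrative.

```python
import numpy as np
from scipy.stats import poisson

def poisson_mixture_weights(y, X, beta1, beta2, p):
    """Posterior probabilities P(U_t = 1 | Y_t, theta*) for the Poisson regression mixture.

    y     : responses, shape (T,)
    X     : regressors, shape (T, d)
    beta1, beta2 : regression coefficients of the two components, shape (d,)
    p     : mixing probability of component 1
    """
    lam1 = np.exp(X @ beta1)          # component-1 means exp(beta1' x_t)
    lam2 = np.exp(X @ beta2)          # component-2 means exp(beta2' x_t)
    f1 = poisson.pmf(y, lam1)
    f2 = poisson.pmf(y, lam2)
    return p * f1 / (p * f1 + (1 - p) * f2)
```

These weights enter the score equations in part (iii) and are held fixed while $\beta_1$, $\beta_2$ and $p$ are updated in the M-step.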

Example 7.2.4 Question: Let us suppose that $\mathcal{F}_1(t)$ and $\mathcal{F}_2(t)$ are two survival functions. Let $x$ denote a univariate regressor.

(i) Show that $\mathcal{F}(t;x) = p\,\mathcal{F}_1(t)^{\exp(\beta_1 x)} + (1-p)\,\mathcal{F}_2(t)^{\exp(\beta_2 x)}$ is a valid survival function and obtain the corresponding density function.

(ii) Suppose that $T_i$ are survival times and $x_i$ is a univariate regressor which exerts an influence on $T_i$. Let $Y_i = \min(T_i,c)$, where $c$ is a common censoring time. The $\{T_i\}$ are independent random variables with survival function
$$\mathcal{F}(t;x_i) = p\,\mathcal{F}_1(t)^{\exp(\beta_1 x_i)} + (1-p)\,\mathcal{F}_2(t)^{\exp(\beta_2 x_i)},$$
where both $\mathcal{F}_1$ and $\mathcal{F}_2$ are known, but $p$, $\beta_1$ and $\beta_2$ are unknown. State the censored likelihood and show that the EM algorithm, together with iterative least squares in the maximisation step, can be used to maximise this likelihood (sufficient details need to be given so that your algorithm can easily be coded).

Solution

(i) Since $\mathcal{F}_1$ and $\mathcal{F}_2$ are monotonically decreasing positive functions with $\mathcal{F}_1(0) = \mathcal{F}_2(0) = 1$ and $\mathcal{F}_1(\infty) = \mathcal{F}_2(\infty) = 0$, it immediately follows that
$$\mathcal{F}(t;x) = p\,\mathcal{F}_1(t)^{e^{\beta_1 x}} + (1-p)\,\mathcal{F}_2(t)^{e^{\beta_2 x}}$$
satisfies the same conditions. To obtain the density we differentiate with respect to $t$:
$$\frac{\partial\mathcal{F}(t;x)}{\partial t} = -p\,e^{\beta_1 x}f_1(t)\mathcal{F}_1(t)^{e^{\beta_1 x}-1} - (1-p)\,e^{\beta_2 x}f_2(t)\mathcal{F}_2(t)^{e^{\beta_2 x}-1}$$
$$\Rightarrow\qquad f(t;x) = p\,e^{\beta_1 x}f_1(t)\mathcal{F}_1(t)^{e^{\beta_1 x}-1} + (1-p)\,e^{\beta_2 x}f_2(t)\mathcal{F}_2(t)^{e^{\beta_2 x}-1},$$
where we use that $\frac{d\mathcal{F}_j(t)}{dt} = -f_j(t)$.

(ii) The censored log-likelihood is
$$\mathcal{L}_n(\beta_1,\beta_2,p) = \sum_{i=1}^n\big[\delta_i\log f(Y_i;\beta_1,\beta_2,p) + (1-\delta_i)\log\mathcal{F}(Y_i;\beta_1,\beta_2,p)\big],$$
where $\delta_i$ indicates whether the survival time is observed. Clearly, directly maximising the above is extremely difficult. Thus we look for an alternative method via the EM algorithm.

We first define the indicator variable (which corresponds to the missing variable) which denotes the state:
$$I_i = \begin{cases}1 & \text{with } P(I_i = 1) = p = p_1,\\ 2 & \text{with } P(I_i = 2) = 1-p = p_2.\end{cases}$$
Then the joint density of $(Y_i,\delta_i,I_i)$ is
$$\Big(p_{I_i}\,e^{\beta_{I_i}x_i}f_{I_i}(Y_i)\,\mathcal{F}_{I_i}(Y_i)^{e^{\beta_{I_i}x_i}-1}\Big)^{\delta_i}\Big(p_{I_i}\,\mathcal{F}_{I_i}(Y_i)^{e^{\beta_{I_i}x_i}}\Big)^{1-\delta_i},$$
which gives the log-density
$$\delta_i\big[\log p_{I_i} + \beta_{I_i}x_i + \log f_{I_i}(Y_i) + (e^{\beta_{I_i}x_i}-1)\log\mathcal{F}_{I_i}(Y_i)\big] + (1-\delta_i)\big[\log p_{I_i} + e^{\beta_{I_i}x_i}\log\mathcal{F}_{I_i}(Y_i)\big].$$
Thus the complete log-likelihood of $\{(Y_i,\delta_i,I_i)\}$ is
$$\mathcal{L}_n(Y,\delta,I;\beta_1,\beta_2,p) = \sum_{i=1}^n\Big\{\delta_i\big[\log p_{I_i} + \beta_{I_i}x_i + \log f_{I_i}(Y_i) + (e^{\beta_{I_i}x_i}-1)\log\mathcal{F}_{I_i}(Y_i)\big] + (1-\delta_i)\big[\log p_{I_i} + e^{\beta_{I_i}x_i}\log\mathcal{F}_{I_i}(Y_i)\big]\Big\}.$$
Next we need to calculate $P(I_i = 1\,|\,Y_i,\delta_i = 1,\theta^*)$ and $P(I_i = 1\,|\,Y_i,\delta_i = 0,\theta^*)$:
$$\omega^{\delta=1}_{1,i}(\theta^*) = P(I_i = 1\,|\,Y_i,\delta_i = 1,\theta^*) = \frac{p^*e^{\beta_1^*x_i}f_1(Y_i)\mathcal{F}_1(Y_i)^{e^{\beta_1^*x_i}-1}}{p^*e^{\beta_1^*x_i}f_1(Y_i)\mathcal{F}_1(Y_i)^{e^{\beta_1^*x_i}-1} + (1-p^*)e^{\beta_2^*x_i}f_2(Y_i)\mathcal{F}_2(Y_i)^{e^{\beta_2^*x_i}-1}},$$
$$\omega^{\delta=0}_{1,i}(\theta^*) = P(I_i = 1\,|\,Y_i,\delta_i = 0,\theta^*) = \frac{p^*\mathcal{F}_1(Y_i)^{e^{\beta_1^*x_i}}}{p^*\mathcal{F}_1(Y_i)^{e^{\beta_1^*x_i}} + (1-p^*)\mathcal{F}_2(Y_i)^{e^{\beta_2^*x_i}}},$$
with $\omega^{\delta=1}_{2,i}(\theta^*) = 1 - \omega^{\delta=1}_{1,i}(\theta^*)$ and $\omega^{\delta=0}_{2,i}(\theta^*) = 1 - \omega^{\delta=0}_{1,i}(\theta^*)$.

Let $p_1 = p$ and $p_2 = 1-p$. Therefore the complete likelihood conditioned on what we observe is
$$Q(\theta^*,\theta) = \sum_{i=1}^n\sum_{s=1}^2\Big\{\delta_i\,\omega^{\delta=1}_{s,i}(\theta^*)\big[\log p_s + \beta_s x_i + \log f_s(Y_i) + (e^{\beta_s x_i}-1)\log\mathcal{F}_s(Y_i)\big] + (1-\delta_i)\,\omega^{\delta=0}_{s,i}(\theta^*)\big[\log p_s + e^{\beta_s x_i}\log\mathcal{F}_s(Y_i)\big]\Big\}$$
$$= Q_1(\theta^*,\beta_1) + Q_2(\theta^*,\beta_2) + Q_p(\theta^*,p_1,p_2),$$
say. The conditional likelihood above looks unwieldy; however, the parameter estimators separate. First, differentiating with respect to $p$ gives
$$\frac{\partial Q(\theta^*,\theta)}{\partial p} = \sum_{i=1}^n\bigg[\frac{\delta_i\,\omega^{\delta=1}_{1,i}(\theta^*) + (1-\delta_i)\,\omega^{\delta=0}_{1,i}(\theta^*)}{p} - \frac{\delta_i\,\omega^{\delta=1}_{2,i}(\theta^*) + (1-\delta_i)\,\omega^{\delta=0}_{2,i}(\theta^*)}{1-p}\bigg].$$
Equating the above to zero we obtain the estimator $\hat p = \frac{a}{a+b}$, where
$$a = \sum_{i=1}^n\big[\delta_i\,\omega^{\delta=1}_{1,i}(\theta^*) + (1-\delta_i)\,\omega^{\delta=0}_{1,i}(\theta^*)\big],\qquad b = \sum_{i=1}^n\big[\delta_i\,\omega^{\delta=1}_{2,i}(\theta^*) + (1-\delta_i)\,\omega^{\delta=0}_{2,i}(\theta^*)\big].$$
Next we consider the estimates of $\beta_1$ and $\beta_2$. Differentiating $Q$ with respect to $\beta_s$ gives, for $s = 1,2$,
$$\frac{\partial Q_s(\theta^*,\beta_s)}{\partial\beta_s} = \sum_{i=1}^n\Big\{\delta_i\,\omega^{\delta=1}_{s,i}(\theta^*)\big[1 + e^{\beta_s x_i}\log\mathcal{F}_s(Y_i)\big] + (1-\delta_i)\,\omega^{\delta=0}_{s,i}(\theta^*)\,e^{\beta_s x_i}\log\mathcal{F}_s(Y_i)\Big\}\,x_i.$$
Observe that on setting the first derivative to zero we cannot obtain an explicit expression for the estimators at each iteration. Thus we need to use the Newton-Raphson scheme, but in a very simple set-up. To estimate $(\beta_1,\beta_2)$ at the $(j+1)$th Newton-Raphson iteration within the maximisation step we use
$$\beta_s^{(j+1)} = \beta_s^{(j)} - \Big[\frac{\partial^2 Q_s}{\partial\beta_s^2}\Big|_{\beta_s^{(j)}}\Big]^{-1}\frac{\partial Q_s}{\partial\beta_s}\Big|_{\beta_s^{(j)}},\qquad s = 1,2.$$

We can rewrite this Newton-Raphson scheme as something that resembles weighted least squares. We recall that the weighted least squares estimator minimises the weighted least squares criterion $\sum_i W_{ii}(Y_i - x_i'\beta)^2$, and the minimiser is $\hat\beta = (X'WX)^{-1}X'WY$. The Newton-Raphson scheme can be written as
$$\beta_s^{(j+1)} = \beta_s^{(j)} + (X'W_sX)^{-1}X'S_s,$$
where $X' = (x_1,x_2,\dots,x_n)$, $W_s = \mathrm{diag}[\omega_{s,1},\dots,\omega_{s,n}]$ and $S_s = (S_{s,1},\dots,S_{s,n})'$, with
$$\omega_{s,i} = \big[\delta_i\,\omega^{\delta=1}_{s,i}(\theta^*) + (1-\delta_i)\,\omega^{\delta=0}_{s,i}(\theta^*)\big]\,e^{\beta_s x_i}\big(-\log\mathcal{F}_s(Y_i)\big),$$
$$S_{s,i} = \delta_i\,\omega^{\delta=1}_{s,i}(\theta^*)\big[1 + e^{\beta_s x_i}\log\mathcal{F}_s(Y_i)\big] + (1-\delta_i)\,\omega^{\delta=0}_{s,i}(\theta^*)\,e^{\beta_s x_i}\log\mathcal{F}_s(Y_i)$$
(note that $\log\mathcal{F}_s(Y_i)\le 0$, so the weights $\omega_{s,i}$ are non-negative). By using algebraic manipulations we can rewrite the iteration as an iterated weighted least squares step:
$$\beta_s^{(j+1)} = \beta_s^{(j)} + (X'W_sX)^{-1}X'S_s = (X'W_sX)^{-1}X'W_s\big[X\beta_s^{(j)} + W_s^{-1}S_s\big].$$

Now we rewrite the above in weighted least squares form. Define the pseudo $y$-variable $Z_s = X\beta_s^{(j)} + W_s^{-1}S_s$. Using this notation we have
$$\beta_s^{(j+1)} = (X'W_sX)^{-1}X'W_sZ_s.$$
Thus at each step of the Newton-Raphson iteration we minimise the weighted least squares criterion
$$\sum_{i=1}^n\omega_{s,i}\big(Z_{s,i} - x_i\beta_s\big)^2,\qquad s = 1,2.$$
Altogether, the EM algorithm is:

Start with initial values $\beta_{1,0}$, $\beta_{2,0}$, $p_0$.

Step 1. Set $(\beta_{1,r},\beta_{2,r},p_r) = (\beta_1^*,\beta_2^*,p^*)$. Evaluate the weights $\omega^{\delta=1}_{s,i}(\theta^*)$ and $\omega^{\delta=0}_{s,i}(\theta^*)$ (these probabilities/weights stay the same throughout the iterative least squares).

Step 2. Maximise $Q(\theta^*,\theta)$: set $p_{r+1} = a_r/(a_r + b_r)$, where $a_r$ and $b_r$ are defined previously, and, for $s = 1,2$, iterate
$$\beta_s^{(j+1)} = (X'W_sX)^{-1}X'W_sZ_s$$
until convergence of the parameters.

Step 3. Go back to Step 1 and continue until convergence of the EM algorithm.

7.2.4 Exercises

Exercise 7.1 Consider the linear regression model
$$Y_i = \alpha'x_i + \sigma_i\varepsilon_i,$$
where $\varepsilon_i$ follows a standard normal distribution (mean zero and variance one) and $\sigma_i^2$ follows a Gamma distribution with density
$$f(\sigma^2;\lambda) = \frac{\lambda^{\kappa}(\sigma^2)^{\kappa-1}\exp(-\lambda\sigma^2)}{\Gamma(\kappa)},\qquad \sigma^2\ge 0,$$

with $\kappa > 0$. Let us suppose that $\alpha$ and $\lambda$ are unknown parameters but $\kappa$ is a known parameter. We showed in an earlier exercise that directly maximising the log-likelihood is extremely difficult. Derive the EM algorithm for this model. In your derivation, explain which quantities will have to be evaluated numerically.

Exercise 7.2 Consider the following shifted exponential mixture distribution
$$f(x;\theta_1,\theta_2,p,a) = p\,\frac{1}{\theta_1}\exp(-x/\theta_1)I(x\ge 0) + (1-p)\,\frac{1}{\theta_2}\exp\big(-(x-a)/\theta_2\big)I(x\ge a),$$
where $p$, $\theta_1$, $\theta_2$ and $a$ are unknown.

(i) Make a plot of the above mixture density. Considering the cases $x\ge a$ and $x < a$ separately, calculate the probability of belonging to each of the mixture components given the observation $X_i$ (i.e. define the variable $\delta_i$, where $P(\delta_i = 1) = p$ and $f(x\,|\,\delta_i = 1) = \frac{1}{\theta_1}\exp(-x/\theta_1)$, and calculate $P(\delta_i = 1\,|\,X_i)$).

(ii) Show how the EM algorithm can be used to estimate $a$, $p$, $\theta_1$, $\theta_2$. At each iteration you should be able to obtain explicit solutions for most of the parameters; give as many details as you can. Hint: it may be beneficial for you to use profiling too.

(iii) From your knowledge of estimation of these parameters, what do you conjecture the rates of convergence to be? Will they all be the same, or possibly different?

Exercise 7.3 Suppose $\{Z_i\}_{i=1}^n$ are independent random variables, where $Z_i$ has the density
$$f_Z(z;\theta_0,\alpha,\mu,\phi,u_i) = p\,h(z;\theta_0,\alpha,u_i) + (1-p)\,g(z;\phi,\mu),$$
with
$$g(x;\phi,\mu) = \frac{\phi}{\mu}\Big(\frac{x}{\mu}\Big)^{\phi-1}\exp\big(-(x/\mu)^{\phi}\big)I_{(0,\infty)}(x)$$
(the Weibull distribution) and
$$h(x;\theta_0,\alpha,u_i) = \frac{1}{\lambda_i}\exp(-x/\lambda_i)I_{(0,\infty)}(x)$$
(the exponential distribution), where $\lambda_i = \theta_0\exp(\alpha u_i)$ and $\{u_i\}_{i=1}^n$ are observed regressors. The parameters $p$, $\theta_0$, $\alpha$, $\mu$ and $\phi$ are unknown and our objective in this question is to estimate them.

(a) What is the log-likelihood of $\{Z_i\}$? (Assume we also observe the deterministic regressors $\{u_i\}$.)

(b) By defining the correct dummy variable $\delta_i$, derive the steps of the EM algorithm for estimating the parameters $p$, $\theta_0$, $\alpha$, $\mu$, $\phi$ (using the method of profiling if necessary).

7.3 Hidden Markov Models

Finally, we consider applications of the EM algorithm to parameter estimation in Hidden Markov Models (HMMs). This is a setting where the EM algorithm pretty much surpasses any other likelihood maximisation methodology. It is worth mentioning that the EM algorithm in this setting is often called the Baum-Welch algorithm. Hidden Markov models are a generalisation of mixture distributions; however, unlike mixture distributions it is difficult to derive an explicit expression for the likelihood of a Hidden Markov Model. HMMs are a general class of models which are widely used in several applications (including speech recognition) and can easily be generalised to the Bayesian set-up. A nice description of them can be found on Wikipedia. In this section we will only briefly cover how the EM algorithm can be used for HMMs. We do not attempt to address any of the issues surrounding how the maximisation is done; interested readers should refer to the extensive literature on the subject.

The general HMM is described as follows. Let us suppose that we observe $\{Y_t\}$, where the random variables $Y_t$ satisfy the Markov property $P(Y_t|Y_{t-1},Y_{t-2},\dots) = P(Y_t|Y_{t-1})$. In addition to $\{Y_t\}$ there exists a hidden, unobserved discrete random variable $\{U_t\}$, where $\{U_t\}$ satisfies the Markov property $P(U_t|U_{t-1},U_{t-2},\dots) = P(U_t|U_{t-1})$ and drives the dependence in $\{Y_t\}$. In other words, $P(Y_t|U_t,Y_{t-1},U_{t-1},\dots) = P(Y_t|U_t)$. To summarise, the HMM is described by the following properties:

(i) We observe $\{Y_t\}$ (which can be either continuous or discrete random variables) but do not observe the hidden discrete random variables $\{U_t\}$.

(ii) Both $\{Y_t\}$ and $\{U_t\}$ are time-homogeneous Markov random variables, that is, $P(Y_t|Y_{t-1},Y_{t-2},\dots) = P(Y_t|Y_{t-1})$ and $P(U_t|U_{t-1},U_{t-2},\dots) = P(U_t|U_{t-1})$. The distributions $P(Y_t)$, $P(Y_t|Y_{t-1})$, $P(U_t)$ and $P(U_t|U_{t-1})$ do not depend on $t$.

(iii) The dependence between the $\{Y_t\}$ is driven by $\{U_t\}$, that is, $P(Y_t|U_t,Y_{t-1},U_{t-1},\dots) = P(Y_t|U_t)$.

There are several examples of HMMs, but to have a clear interpretation of them, in this section we shall only consider one classical example. Let us suppose that the hidden random variable $U_t$ can take $N$ possible values $\{1,\dots,N\}$, and let $p_i = P(U_t = i)$ and $p_{ij} = P(U_t = i\,|\,U_{t-1} = j)$. Moreover, let us suppose that the $Y_t$ are continuous random variables where $(Y_t|U_t = i) \sim N(\mu_i,\sigma_i^2)$ and the conditional random variables $Y_t|U_t$ and $Y_\tau|U_\tau$ ($t\ne\tau$) are independent of each other. Our objective is to estimate the parameters $\theta = \{p_i,p_{ij},\mu_i,\sigma_i^2\}$ given $\{Y_t\}$. Let $f_i(\cdot;\theta_i)$ denote the normal density $N(\mu_i,\sigma_i^2)$.

Remark 7.3.1 (HMMs and mixture models) Mixture models (described in the section above) are a particular example of an HMM. In that case the unobserved variables $\{U_t\}$ are iid, with $p_i = P(U_t = i\,|\,U_{t-1} = j) = P(U_t = i)$ for all $i$ and $j$.

Let us denote the log-likelihood of $\{Y_t\}$ by $\mathcal{L}_T(Y;\theta)$ (this is the observed likelihood). It is clear that constructing an explicit expression for $\mathcal{L}_T$ is difficult, thus maximising the likelihood is near impossible. In the remark below we derive the observed likelihood.

Remark 7.3.2 The likelihood of $Y = (Y_1,\dots,Y_T)$ is
$$L_T(Y;\theta) = f(Y_T|Y_{T-1},Y_{T-2},\dots;\theta)\cdots f(Y_2|Y_1;\theta)\,f(Y_1;\theta) = f(Y_T|Y_{T-1};\theta)\cdots f(Y_2|Y_1;\theta)\,f(Y_1;\theta).$$
Thus the log-likelihood is
$$\mathcal{L}_T(Y;\theta) = \sum_{t=2}^T\log f(Y_t|Y_{t-1};\theta) + \log f(Y_1;\theta).$$
The density $f(Y_1;\theta)$ is simply the mixture density
$$f(Y_1;\theta) = p_1 f_1(Y_1;\theta_1) + \dots + p_N f_N(Y_1;\theta_N),$$
where $p_i = P(U_t = i)$. The conditional density $f(Y_t|Y_{t-1};\theta)$ is more tricky. We start with
$$f(Y_t|Y_{t-1};\theta) = \frac{f(Y_t,Y_{t-1};\theta)}{f(Y_{t-1};\theta)}.$$
An expression for $f(Y_{t-1};\theta)$ is given above. To evaluate $f(Y_t,Y_{t-1};\theta)$ we condition on

$U_t, U_{t-1}$ to give (using the Markov and conditional independence properties)
$$f(Y_t,Y_{t-1};\theta) = \sum_{i,j}f(Y_t,Y_{t-1}\,|\,U_t = i,U_{t-1} = j)\,P(U_t = i,U_{t-1} = j)$$
$$= \sum_{i,j}f(Y_t|U_t = i)\,f(Y_{t-1}|U_{t-1} = j)\,P(U_t = i\,|\,U_{t-1} = j)\,P(U_{t-1} = j) = \sum_{i,j}f_i(Y_t;\theta_i)\,f_j(Y_{t-1};\theta_j)\,p_{ij}\,p_j.$$
Thus we have
$$f(Y_t|Y_{t-1};\theta) = \frac{\sum_{i,j}f_i(Y_t;\theta_i)f_j(Y_{t-1};\theta_j)p_{ij}p_j}{\sum_i p_i f_i(Y_{t-1};\theta_i)}.$$
We substitute the above into $\mathcal{L}_T(Y;\theta)$ to give the expression
$$\mathcal{L}_T(Y;\theta) = \sum_{t=2}^T\log\bigg(\frac{\sum_{i,j}f_i(Y_t;\theta_i)f_j(Y_{t-1};\theta_j)p_{ij}p_j}{\sum_i p_if_i(Y_{t-1};\theta_i)}\bigg) + \log\bigg(\sum_{i=1}^N p_i f_i(Y_1;\theta_i)\bigg).$$
Clearly, this is extremely difficult to maximise. Instead we seek an indirect method for maximising the likelihood. By using the EM algorithm we can maximise a likelihood which is a lot easier to evaluate.

Let us suppose that we observe $\{Y_t,U_t\}$. Since
$$P(Y|U) = P(Y_T|Y_{T-1},\dots,Y_1,U)\,P(Y_{T-1}|Y_{T-2},\dots,Y_1,U)\cdots P(Y_1|U) = \prod_{t=1}^T P(Y_t|U_t),$$
and the distribution of $Y_t|U_t$ is $N(\mu_{U_t},\sigma_{U_t}^2)$, the complete likelihood of $\{Y_t,U_t\}$ is
$$\prod_{t=1}^T f(Y_t|U_t;\theta)\;\times\;p_{U_1}\prod_{t=2}^T p_{U_tU_{t-1}}.$$
Thus the log-likelihood of the complete observations $\{Y_t,U_t\}$ is
$$\mathcal{L}_T(Y,U;\theta) = \sum_{t=1}^T\log f(Y_t|U_t;\theta) + \sum_{t=2}^T\log p_{U_tU_{t-1}} + \log p_{U_1}.$$
Of course, we do not observe the complete likelihood, but the above can be used in order to define the function $Q(\theta^*,\theta)$ which is maximised in the EM algorithm.

It is worth mentioning that, given the transition probabilities of a discrete Markov chain (that is $\{p_{ij}\}_{ij}$), the marginal/stationary probabilities $\{p_i\}$ can be obtained by solving $\pi = P\pi$, where $P$ is the transition matrix. Thus it is not necessary to estimate the marginal probabilities $\{p_i\}$ (note that excluding $\log p_{U_1}$ from the complete log-likelihood above gives the conditional complete log-likelihood).

We recall that maximising the observed likelihood $\mathcal{L}_T(Y;\theta)$ using the EM algorithm involves evaluating $Q(\theta^*,\theta)$, where
$$Q(\theta^*,\theta) = \mathrm{E}\Big[\sum_{t=1}^T\log f(Y_t|U_t;\theta) + \sum_{t=2}^T\log p_{U_tU_{t-1}} + \log p_{U_1}\,\Big|\,Y,\theta^*\Big]$$
$$= \sum_{U\in\{1,\dots,N\}^T}\Big[\sum_{t=1}^T\log f(Y_t|U_t;\theta) + \sum_{t=2}^T\log p_{U_tU_{t-1}} + \log p_{U_1}\Big]\,p(U|Y,\theta^*).$$
Note that at each step of the algorithm the probability $p(U|Y,\theta^*)$ needs to be evaluated. This is done by using conditioning:
$$p(U|Y,\theta^*) = \prod_{t=1}^T P(U_t|U_{t-1},\dots,U_1,Y;\theta^*) = \prod_{t=1}^T P(U_t|U_{t-1},Y;\theta^*),\quad\text{using the Markov property}.$$
Evaluation of the above is not simple (mainly because one is estimating the probability of being in state $U_t$ based on $U_{t-1}$ and the observed information $Y$ in the past, present and future). This is usually done using the so-called forward-backward algorithm (which is related to the idea of Kalman filtering). For this example the EM algorithm is:

(i) Define an initial value $\theta_1$. Let $\theta^* = \theta_1$.

(ii) The expectation step: for the fixed $\theta^*$ evaluate $P(U_t|Y,\theta^*)$, $P(U_t,U_{t-1}|Y,\theta^*)$ and $Q(\theta^*,\theta)$.

(iii) The maximisation step: evaluate $\theta_{k+1} = \arg\max_\theta Q(\theta^*,\theta)$ by differentiating $Q(\theta^*,\theta)$ with respect to $\theta$ and equating to zero.

(iv) If $\theta_k$ and $\theta_{k+1}$ are sufficiently close to each other, stop the algorithm and set $\hat\theta_n = \theta_{k+1}$. Else set $\theta^* = \theta_{k+1}$, go back and repeat steps (ii) and (iii).

Since $P(U|Y,\theta^*) = P(U,Y,\theta^*)/P(Y,\theta^*)$ and $P(U_t,U_{t-1}|Y,\theta^*) = P(U_t,U_{t-1},Y,\theta^*)/P(Y,\theta^*)$, and $P(Y,\theta^*)$ is common to all $U\in\{1,\dots,N\}^T$ and is independent of $\theta$, rather than maximising $Q(\theta^*,\theta)$ one can equivalently maximise
$$\widetilde Q(\theta^*,\theta) = \sum_{U\in\{1,\dots,N\}^T}\Big[\sum_{t=1}^T\log f(Y_t|U_t;\theta) + \sum_{t=2}^T\log p_{U_tU_{t-1}} + \log p_{U_1}\Big]\,p(U,Y,\theta^*),$$
noting that $\widetilde Q(\theta^*,\theta) \propto Q(\theta^*,\theta)$ and the maximum of $\widetilde Q(\theta^*,\theta)$ with respect to $\theta$ is the same as the maximum of $Q(\theta^*,\theta)$.
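To give a flavour of how the E-step quantities $P(U_t|Y,\theta^*)$ are obtained in practice, here is a minimal sketch of the scaled forward-backward recursions for the Gaussian-emission HMM above. It computes only the smoothed state probabilities; a full Baum-Welch implementation would also accumulate the pairwise probabilities $P(U_t,U_{t-1}|Y,\theta^*)$ and then perform the M-step updates. All function names are ours, and the transition convention follows the notes ($A[i,j] = P(U_t = i\,|\,U_{t-1} = j)$).

```python
import numpy as np
from scipy.stats import norm

def hmm_state_posteriors(y, A, pi, mu, sigma):
    """Scaled forward-backward: returns P(U_t = i | Y, theta*) for a Gaussian-emission HMM.

    y         : observations, shape (T,)
    A         : transition matrix, A[i, j] = P(U_t = i | U_{t-1} = j)
    pi        : initial/marginal state probabilities, shape (N,)
    mu, sigma : emission means and standard deviations, shape (N,)
    """
    T, N = len(y), len(pi)
    B = norm.pdf(y[:, None], mu[None, :], sigma[None, :])   # B[t, i] = f_i(y_t)

    alpha = np.zeros((T, N))
    c = np.zeros(T)                      # scaling constants to avoid numerical underflow
    alpha[0] = pi * B[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):                # forward recursion
        alpha[t] = B[t] * (A @ alpha[t - 1])
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]

    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):       # backward recursion (same scaling)
        beta[t] = A.T @ (B[t + 1] * beta[t + 1]) / c[t + 1]

    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)          # P(U_t = i | Y)
```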


More information

Sensitivity Analysis in Markov Networks

Sensitivity Analysis in Markov Networks Sensitivity Analysis in Markov Networks Hei Chan and Adnan Darwihe Computer Siene Department University of California, Los Angeles Los Angeles, CA 90095 {hei,darwihe}@s.ula.edu Abstrat This paper explores

More information

Subject: Introduction to Component Matching and Off-Design Operation % % ( (1) R T % (

Subject: Introduction to Component Matching and Off-Design Operation % % ( (1) R T % ( 16.50 Leture 0 Subjet: Introdution to Component Mathing and Off-Design Operation At this point it is well to reflet on whih of the many parameters we have introdued (like M, τ, τ t, ϑ t, f, et.) are free

More information

Symplectic Projector and Physical Degrees of Freedom of The Classical Particle

Symplectic Projector and Physical Degrees of Freedom of The Classical Particle Sympleti Projetor and Physial Degrees of Freedom of The Classial Partile M. A. De Andrade a, M. A. Santos b and I. V. Vanea arxiv:hep-th/0308169v3 7 Sep 2003 a Grupo de Físia Teória, Universidade Católia

More information

Supplementary Materials

Supplementary Materials Supplementary Materials Neural population partitioning and a onurrent brain-mahine interfae for sequential motor funtion Maryam M. Shanehi, Rollin C. Hu, Marissa Powers, Gregory W. Wornell, Emery N. Brown

More information

Sensitivity analysis for linear optimization problem with fuzzy data in the objective function

Sensitivity analysis for linear optimization problem with fuzzy data in the objective function Sensitivity analysis for linear optimization problem with fuzzy data in the objetive funtion Stephan Dempe, Tatiana Starostina May 5, 2004 Abstrat Linear programming problems with fuzzy oeffiients in the

More information

The shape of a hanging chain. a project in the calculus of variations

The shape of a hanging chain. a project in the calculus of variations The shape of a hanging hain a projet in the alulus of variations April 15, 218 2 Contents 1 Introdution 5 2 Analysis 7 2.1 Model............................... 7 2.2 Extremal graphs.........................

More information

On Component Order Edge Reliability and the Existence of Uniformly Most Reliable Unicycles

On Component Order Edge Reliability and the Existence of Uniformly Most Reliable Unicycles Daniel Gross, Lakshmi Iswara, L. William Kazmierzak, Kristi Luttrell, John T. Saoman, Charles Suffel On Component Order Edge Reliability and the Existene of Uniformly Most Reliable Uniyles DANIEL GROSS

More information

QUANTUM MECHANICS II PHYS 517. Solutions to Problem Set # 1

QUANTUM MECHANICS II PHYS 517. Solutions to Problem Set # 1 QUANTUM MECHANICS II PHYS 57 Solutions to Problem Set #. The hamiltonian for a lassial harmoni osillator an be written in many different forms, suh as use ω = k/m H = p m + kx H = P + Q hω a. Find a anonial

More information

36-720: Log-Linear Models: Three-Way Tables

36-720: Log-Linear Models: Three-Way Tables 36-720: Log-Linear Models: Three-Way Tables Brian Junker September 10, 2007 Hierarhial Speifiation of Log-Linear Models Higher-Dimensional Tables Digression: Why Poisson If We Believe Multinomial? Produt

More information

A NETWORK SIMPLEX ALGORITHM FOR THE MINIMUM COST-BENEFIT NETWORK FLOW PROBLEM

A NETWORK SIMPLEX ALGORITHM FOR THE MINIMUM COST-BENEFIT NETWORK FLOW PROBLEM NETWORK SIMPLEX LGORITHM FOR THE MINIMUM COST-BENEFIT NETWORK FLOW PROBLEM Cen Çalışan, Utah Valley University, 800 W. University Parway, Orem, UT 84058, 801-863-6487, en.alisan@uvu.edu BSTRCT The minimum

More information

Taste for variety and optimum product diversity in an open economy

Taste for variety and optimum product diversity in an open economy Taste for variety and optimum produt diversity in an open eonomy Javier Coto-Martínez City University Paul Levine University of Surrey Otober 0, 005 María D.C. Garía-Alonso University of Kent Abstrat We

More information

Parallel disrete-event simulation is an attempt to speed-up the simulation proess through the use of multiple proessors. In some sense parallel disret

Parallel disrete-event simulation is an attempt to speed-up the simulation proess through the use of multiple proessors. In some sense parallel disret Exploiting intra-objet dependenies in parallel simulation Franeso Quaglia a;1 Roberto Baldoni a;2 a Dipartimento di Informatia e Sistemistia Universita \La Sapienza" Via Salaria 113, 198 Roma, Italy Abstrat

More information

Estimation of Density Function Parameters with Censored Data from Product Life Tests

Estimation of Density Function Parameters with Censored Data from Product Life Tests VŠB TU Ostrava Faulty of Mehanial Engineering Department of Control Systems and Instrumentation Estimation of Density Funtion Parameters with Censored Data from Produt Life Tests JiříTůma 7. listopadu

More information

Modeling of discrete/continuous optimization problems: characterization and formulation of disjunctions and their relaxations

Modeling of discrete/continuous optimization problems: characterization and formulation of disjunctions and their relaxations Computers and Chemial Engineering (00) 4/448 www.elsevier.om/loate/omphemeng Modeling of disrete/ontinuous optimization problems: haraterization and formulation of disjuntions and their relaxations Aldo

More information

Wavetech, LLC. Ultrafast Pulses and GVD. John O Hara Created: Dec. 6, 2013

Wavetech, LLC. Ultrafast Pulses and GVD. John O Hara Created: Dec. 6, 2013 Ultrafast Pulses and GVD John O Hara Created: De. 6, 3 Introdution This doument overs the basi onepts of group veloity dispersion (GVD) and ultrafast pulse propagation in an optial fiber. Neessarily, it

More information

Ordered fields and the ultrafilter theorem

Ordered fields and the ultrafilter theorem F U N D A M E N T A MATHEMATICAE 59 (999) Ordered fields and the ultrafilter theorem by R. B e r r (Dortmund), F. D e l o n (Paris) and J. S h m i d (Dortmund) Abstrat. We prove that on the basis of ZF

More information

Dirac s equation We construct relativistically covariant equation that takes into account also the spin. The kinetic energy operator is

Dirac s equation We construct relativistically covariant equation that takes into account also the spin. The kinetic energy operator is Dira s equation We onstrut relativistially ovariant equation that takes into aount also the spin The kineti energy operator is H KE p Previously we derived for Pauli spin matries the relation so we an

More information

RIEMANN S FIRST PROOF OF THE ANALYTIC CONTINUATION OF ζ(s) AND L(s, χ)

RIEMANN S FIRST PROOF OF THE ANALYTIC CONTINUATION OF ζ(s) AND L(s, χ) RIEMANN S FIRST PROOF OF THE ANALYTIC CONTINUATION OF ζ(s AND L(s, χ FELIX RUBIN SEMINAR ON MODULAR FORMS, WINTER TERM 6 Abstrat. In this hapter, we will see a proof of the analyti ontinuation of the Riemann

More information

Indian Institute of Technology Bombay. Department of Electrical Engineering. EE 325 Probability and Random Processes Lecture Notes 3 July 28, 2014

Indian Institute of Technology Bombay. Department of Electrical Engineering. EE 325 Probability and Random Processes Lecture Notes 3 July 28, 2014 Indian Institute of Tehnology Bombay Department of Eletrial Engineering Handout 5 EE 325 Probability and Random Proesses Leture Notes 3 July 28, 2014 1 Axiomati Probability We have learned some paradoxes

More information

A simple expression for radial distribution functions of pure fluids and mixtures

A simple expression for radial distribution functions of pure fluids and mixtures A simple expression for radial distribution funtions of pure fluids and mixtures Enrio Matteoli a) Istituto di Chimia Quantistia ed Energetia Moleolare, CNR, Via Risorgimento, 35, 56126 Pisa, Italy G.

More information

The Laws of Acceleration

The Laws of Acceleration The Laws of Aeleration The Relationships between Time, Veloity, and Rate of Aeleration Copyright 2001 Joseph A. Rybzyk Abstrat Presented is a theory in fundamental theoretial physis that establishes the

More information

10.2 The Occurrence of Critical Flow; Controls

10.2 The Occurrence of Critical Flow; Controls 10. The Ourrene of Critial Flow; Controls In addition to the type of problem in whih both q and E are initially presribed; there is a problem whih is of pratial interest: Given a value of q, what fators

More information

Relativistic Addition of Velocities *

Relativistic Addition of Velocities * OpenStax-CNX module: m42540 1 Relativisti Addition of Veloities * OpenStax This work is produed by OpenStax-CNX and liensed under the Creative Commons Attribution Liense 3.0 Abstrat Calulate relativisti

More information

Bäcklund Transformations: Some Old and New Perspectives

Bäcklund Transformations: Some Old and New Perspectives Bäklund Transformations: Some Old and New Perspetives C. J. Papahristou *, A. N. Magoulas ** * Department of Physial Sienes, Helleni Naval Aademy, Piraeus 18539, Greee E-mail: papahristou@snd.edu.gr **

More information

Volunteering and the strategic value of ignorance

Volunteering and the strategic value of ignorance Volunteering and the strategi value of ignorane Florian Morath Max Plank Institute for Tax Law and Publi Finane November 10, 011 Abstrat Private provision of publi goods often takes plae as a war of attrition:

More information

Discrete Generalized Burr-Type XII Distribution

Discrete Generalized Burr-Type XII Distribution Journal of Modern Applied Statistial Methods Volume 13 Issue 2 Artile 13 11-2014 Disrete Generalized Burr-Type XII Distribution B. A. Para University of Kashmir, Srinagar, India, parabilal@gmail.om T.

More information