Probabilistic learning


Charles Elkan
November 8, 2012

Important: These lecture notes are based closely on notes written by Lawrence Saul. Text may be copied directly from his notes, or paraphrased. Also, these typeset notes lack illustrations. See the classroom lectures for figures and diagrams.

1 Learning in a Bayesian network

A Bayesian network is a directed graph with a CPT (conditional probability table) for each node. This section explains how to learn the CPTs from training data. As explained before, the training data are a matrix where each row is an instance and each column is a feature. Instances are also called examples, while features are also called nodes, random variables, and attributes. One entry in the matrix is one value of one feature, that is, one outcome of one random variable.

We consider first the scenario where each instance is complete, that is, the outcome of every node is observed for every instance. In this scenario, nothing is unknown, or in other words, there is no missing data. This scenario is also called fully visible, or no hidden nodes, or no latent variables. We also assume that the graph of the Bayesian network is known, that is, the nodes X_1 to X_n constitute a finite set, and that each node is a random variable with a discrete finite set of alternative values.

In this scenario, what we need to learn is the CPT of each node. A single entry in one CPT is p(X_i = x | pa(X_i) = π), where x is a specific outcome of X_i and π is a specific set of outcomes, also called a configuration, of the parent nodes of X_i.

The training data are T instances x_t, each of which is a complete configuration of X_1 to X_n. We write x_t = (x_{t1}, ..., x_{tn}). Remember the convention that the first subscript refers to rows while the second subscript refers to columns. To make learning feasible, we need a basic assumption about the training data.

Assumption. Each example is an independent and identically distributed (IID) random sample from the joint distribution defined by the Bayesian network.

This assumption has two parts. First, that each x_t is identically distributed means that each sample is generated using the same CPTs. Second, being independent means that probabilities can be multiplied: p(x_s, x_t) = p(x_s) p(x_t).

With the IID assumption, we are ready to begin to derive a learning procedure. The probability of the training data is

    P = \prod_{t=1}^{T} p(X_1 = x_{t1}, ..., X_n = x_{tn}).

The probability of example t is

    p(X_1 = x_{t1}, ..., X_n = x_{tn}) = \prod_{i=1}^{n} p(X_i = x_{ti} | X_1 = x_{t1}, ..., X_{i-1} = x_{t,i-1})
                                       = \prod_{i=1}^{n} p(X_i = x_{ti} | pa(X_i) = pa_{ti}).

The first equation above follows from the chain rule of probabilities, while the second follows from conditional independence in the Bayesian network.

Learning means choosing values, based on the available data, for the aspects of the model that are unknown. Here, the model is the probability distribution specified by the Bayesian network. Its graph is known but the parameters inside its CPTs are unknown. The principle of maximum likelihood says that we should choose values for unknown parameters in such a way that the overall probability of the training data is maximized. This principle is not a theorem that can be proved. It is simply a sensible guideline. One way to argue that the principle is sensible is to notice that, essentially, it says that we should assume that the training data are the most typical possible, that is, that the observed data are the mode of the distribution to be learned.

The principle of maximum likelihood says that we should choose values for the parameters of the CPTs that make P as large as possible. Let these parameters be called w. The principle says that we should choose w* = argmax_w P. Because the logarithm function is monotone strictly increasing, this is equivalent to choosing w* = argmax_w log P.

It is convenient to maximize the log because the log of a product is a sum, and dealing with sums is easier. So, the goal is to maximize

    L = \log \prod_{t=1}^{T} \prod_{i=1}^{n} p(X_i = x_{ti} | pa(X_i) = pa_{ti})
      = \sum_{t=1}^{T} \sum_{i=1}^{n} \log p(X_i = x_{ti} | pa(X_i) = pa_{ti}).

Swapping the order of the summations gives

    L = \sum_{i=1}^{n} \sum_{t=1}^{T} \log p(X_i = x_{ti} | pa(X_i) = pa_{ti}).        (1)

Now, notice that each inner sum over t involves a different CPT. These CPTs have parameters whose values can be chosen completely separately. Therefore, L can be maximized by maximizing each inner sum separately. We can decompose the task of maximizing L into n separate subtasks to maximize

    M_i = \sum_{t=1}^{T} \log p(X_i = x_{ti} | pa(X_i) = pa_{ti})

for i = 1 to i = n.

Consider one of these subtasks. The sum over t treats each training example separately. To make progress, we group the training examples into equivalence classes. Each class consists of all examples among the T that have the same outcome for X_i and the same outcome for the parents of X_i. Let x range over the outcomes of X_i and let π range over the outcomes of the parents of X_i. Let count(x, π) be how many of the T examples have the value x and the configuration π. Note that

    T = \sum_x \sum_π count(x, π).

We can write

    M_i = \sum_x \sum_π count(x, π) \log p(X_i = x | pa(X_i) = π).

We want to choose parameter values for the CPT for node i to maximize this expression. These parameter values are the probabilities p(X_i = x | pa(X_i) = π). These values are constrained by the fact that for each π

    \sum_x p(X_i = x | pa(X_i) = π) = 1.

However, there is no constraint connecting the values for different π. Therefore, we can swap the order of the summations inside the expression for M_i and obtain a separate subtask for each π. Write w_x = p(X_i = x | pa(X_i) = π) and c_x = count(x, π). The problem to solve is

    maximize \sum_x c_x \log w_x   subject to   w_x ≥ 0 and \sum_x w_x = 1.

This problem can be solved using Lagrange multipliers. The solution is

    w_x = \frac{c_x}{\sum_{x'} c_{x'}}.

In words, the maximum likelihood estimate of the probability that X_i = x, given that the parents of X_i are observed to be π, is

    p(X_i = x | pa(X_i) = π) = \frac{count(X_i = x, pa(X_i) = π)}{\sum_{x'} count(X_i = x', pa(X_i) = π)}
                             = \frac{count(X_i = x, pa(X_i) = π)}{count(pa(X_i) = π)}
                             = \frac{\sum_t I(x = x_{ti}, π = pa_{ti})}{\sum_t I(π = pa_{ti})}

where the counts are with respect to the training data.

These estimates make sense intuitively. Each estimated probability is proportional to the corresponding frequency observed in the training data. If the value x is never observed for some combination π, then its conditional probability is estimated to be zero. Although the estimates are intuitively sensible, only a formal derivation like the one above can show that they are correct (and unique). The derivation uses several mathematical manipulations that are common in similar arguments. These manipulations include changing products into sums, swapping the order of summations, and arguing that maximization subtasks are separate.

End of the lecture on Thursday October 25.
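The counting recipe above is straightforward to implement. Below is a minimal Python sketch, not taken from these notes: the function name, the dictionary-based data representation, and the toy Rain/WetGrass example are all illustrative assumptions. It estimates the CPT of one node by tallying count(x, π) and count(π) and dividing.

    from collections import defaultdict

    def estimate_cpt(data, child, parents):
        """Maximum-likelihood CPT for one node of a Bayesian network.
        data    : list of dicts mapping node name -> observed outcome
        child   : name of the node X_i whose CPT is estimated
        parents : list of names of pa(X_i)
        Returns a dict mapping (parent configuration, value) to count(x, pi) / count(pi)."""
        joint = defaultdict(int)      # count(x, pi)
        marginal = defaultdict(int)   # count(pi)
        for row in data:
            pi = tuple(row[p] for p in parents)
            joint[(pi, row[child])] += 1
            marginal[pi] += 1
        return {(pi, x): c / marginal[pi] for (pi, x), c in joint.items()}

    # Tiny invented example with one parent: Rain -> WetGrass.
    instances = [
        {"Rain": 1, "WetGrass": 1},
        {"Rain": 1, "WetGrass": 1},
        {"Rain": 1, "WetGrass": 0},
        {"Rain": 0, "WetGrass": 0},
    ]
    cpt = estimate_cpt(instances, "WetGrass", ["Rain"])
    print(cpt[((1,), 1)])   # estimate of p(WetGrass = 1 | Rain = 1), here 2/3

Note that, exactly as the derivation predicts, any (x, π) combination that never appears in the training data receives an estimated probability of zero.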

2 Markov models of language

Many applications involving natural language need a model that assigns probabilities to sentences. For example, the most successful translation systems nowadays for natural language are based on probabilistic models. Let F be a random variable whose values are sentences written in French, and let E be a similar random variable ranging over English sentences. Given a specific French sentence f, the machine translation task is to find

    e* = argmax_e p(E = e | F = f).

One way to decompose the task into subtasks is to use Bayes' rule and write

    e* = argmax_e \frac{p(F = f | E = e) p(E = e)}{p(F = f)} = argmax_e p(F = f | E = e) p(E = e).

The denominator p(F = f) can be ignored because it is the same for all e. Although creating a model of p(F = f | E = e) is presumably just as difficult as creating a model directly of p(E = e | F = f), the model of p(E = e) can overcome some errors in p(F = f | E = e). For example, regardless of the original sentence in the foreign language, the English sentence "Colorless green ideas sleep furiously" should not be a high-probability translation. This section explains how to learn basic models of p(E = e).

Clearly the probability of a sentence depends on the words in it, and also on the order of the words. Consider a sentence that consists of the words w_1 to w_L in order. Let these words be the outcomes of random variables W_1 to W_L. The chain rule of probabilities says that

    p(W_1, W_2, ..., W_L) = p(W_1) p(W_2 | W_1) ⋯ p(W_L | W_{L-1}, ..., W_1).

Words that occur a long way before w_l in the sentence presumably influence the probability of w_l less, so to simplify this expression it is reasonable to fix a number n of previous words and write

    \prod_{l=1}^{L} p(W_l | W_{l-n}, ..., W_{l-2}, W_{l-1})

with each word depending only on the previous n words. In the special case where n = 0, each word is independent of the previous words. A model of this type is called a Markov model of order n. A unigram model has order n = 0, a bigram model has order n = 1, and a trigram model has order n = 2.

A bigram model corresponds to a Bayesian network with nodes W_1 to W_L and an edge from each node W_l to W_{l+1}. Importantly, the same CPT p(W_{l+1} = j | W_l = i) is used at each node W_{l+1}. Fixing the entries in different CPTs to be the same is called tying. Notice that technically we have a different Bayesian network for each different length L, but tying CPTs lets us treat all these networks as the same.
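Because the CPT is tied, scoring a sentence needs only one table, applied at every position. The short Python sketch below is illustrative only: the CPT values, the distribution over the first word, and the example sentence are all invented.

    import math

    # Hypothetical tied CPT: p(next word | current word), shared by every position.
    bigram_cpt = {
        "the": {"cat": 0.6, "dog": 0.4},
        "cat": {"sat": 1.0},
        "dog": {"sat": 1.0},
        "sat": {"down": 1.0},
    }
    p_first = {"the": 1.0}   # distribution over the first word of a sentence

    def log_prob(sentence):
        """log p(w_1) plus the sum over l of log p(w_{l+1} | w_l), same CPT at every step."""
        lp = math.log(p_first[sentence[0]])
        for prev, cur in zip(sentence, sentence[1:]):
            lp += math.log(bigram_cpt[prev][cur])
        return lp

    print(log_prob(["the", "cat", "sat", "down"]))   # log(1.0 * 0.6 * 1.0 * 1.0)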

How can we learn the shared CPT? Each node W_l is a discrete random variable, but one with a very large set of values. The cardinality of this set is the size of the vocabulary, typically between 10^4 and 10^5 in applications. Since most words never follow each other, a document collection of size smaller than (10^5)^2 words can be adequate for training. Fortunately, nowadays it is easy to assemble and process collections of 10^8 and more words. The maximum likelihood estimate of the CPT parameters is

    p(W_l = j | W_{l-1} = i) = \frac{c_{ij}}{c_i}

where c_i is the number of times that word i occurs followed by any other word, and c_{ij} is the number of times that word i occurs followed by word j.

A note on notation: it is convenient to assume that each word is an integer between 1 and the vocabulary size. Notation such as w_i instead of i for the ith word causes two difficulties: it leads to double subscripts, and it suggests that strings are mathematical objects.

Some issues occur with n-gram models. The first issue is that they do not handle novel words in an intelligent way. Typically we convert each word not in the predefined vocabulary into a special fixed token such as UNK, and then treat this as an ordinary word. The second issue is that all sequences of words not seen in the training collection are assigned zero probability. For example, the bigram "pink flies" may be so uncommon that it occurs zero times in the training collection, but that does not mean it is impossible. Its probability should be small, but above zero. The higher the order of the n-gram model, the more important this second issue is.
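The counting estimate c_{ij} / c_i, together with the UNK convention, takes only a few lines to implement. The sketch below is an illustration, not code from the notes: the tokenized corpus, the vocabulary, and the function name are invented.

    from collections import Counter, defaultdict

    def train_bigram(tokens, vocab):
        """Maximum-likelihood bigram model p(j | i) = c_ij / c_i.
        Words outside vocab are first mapped to the token "UNK"."""
        tokens = [w if w in vocab else "UNK" for w in tokens]
        c_i = Counter(tokens[:-1])                      # times word i is followed by any word
        c_ij = Counter(zip(tokens[:-1], tokens[1:]))    # times word i is followed by word j
        model = defaultdict(dict)
        for (i, j), c in c_ij.items():
            model[i][j] = c / c_i[i]
        return model

    corpus = "the cat sat on the mat the cat ran".split()
    vocab = {"the", "cat", "sat", "on", "mat"}          # "ran" becomes UNK
    p = train_bigram(corpus, vocab)
    print(p["the"])   # for this toy corpus: {"cat": 2/3, "mat": 1/3}

Any bigram that never occurs, such as "pink flies", simply has no entry, which is exactly the zero-probability issue described above.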

3 Linear regression

Linear regression is perhaps the most widely used method of modeling data in classical statistics. Here we see how it fits into the paradigm of learning the parameters of a Bayesian network via the principle of maximum likelihood. We have independent nodes X_1 to X_d and a dependent node Y, with an edge X_i → Y for each i. Intuitively, the value of Y is a linear function of the values of X_1 to X_d, plus some random noise. Assuming that the noise has mean zero, we can write

    E[Y] = \sum_{i=1}^{d} w_i x_i = w · x

where w_1 to w_d are parameters describing the linear dependence. The standard choice to model the random noise is a Gaussian distribution with mean zero and variance σ^2. The probability density function of this distribution is

    p(z) = \frac{1}{\sqrt{2πσ^2}} \exp\left( -\frac{z^2}{2σ^2} \right).

Combining this with the expression for E[Y] gives

    p(Y = y | X = x) = \frac{1}{\sqrt{2πσ^2}} \exp\left( -\frac{1}{2σ^2} (y - w · x)^2 \right).

End of the lecture on Tuesday October 30.

To learn the parameters w_1 to w_d we have training examples (x_t, y_t) for t = 1 to t = T. Assume that each x_t is a column vector. Given that these examples are IID, the log likelihood is

    L = \sum_{t=1}^{T} \log p(y_t | x_t) = \sum_{t=1}^{T} \left[ -\frac{1}{2} \log(2πσ^2) - \frac{1}{2σ^2} (y_t - w · x_t)^2 \right].

We can maximize this expression in two stages: first find the optimal w_i values, and then find the optimal σ^2 value. The first subproblem is to minimize (not maximize)

    S = \sum_{t=1}^{T} (y_t - w · x_t)^2.

We can solve this by setting the partial derivatives of S to zero. We get the equations

    \frac{∂S}{∂w_i} = \sum_{t=1}^{T} -2 (y_t - w · x_t) x_{ti} = 0

for i = 1 to i = d, where we write x_{ti} for component i of the column vector x_t. These yield the system of d linear equations

    \sum_{t=1}^{T} y_t x_{ti} = \sum_{t=1}^{T} (w · x_t) x_{ti}.

Note that each of the d equations involves all of the unknowns w_1 to w_d. In matrix notation, the system of equations is b = A w.

Here, b is the column vector of length d whose ith entry is b_i = \sum_t y_t x_{ti}, that is, b = \sum_t y_t x_t. The right side is \sum_{t=1}^{T} x_{ti} (x_t^T w), where the superscript T means transpose and the dot product has been written as a matrix product. This yields

    b = \sum_{t=1}^{T} x_t (x_t^T w) = \left( \sum_{t=1}^{T} x_t x_t^T \right) w = A w

where the d × d square matrix A = \sum_t x_t x_t^T. The row i, column j entry of A is A_{ij} = \sum_t x_{ti} x_{tj}.

Mathematically, the solution to the system A w = b is w = A^{-1} b. Computationally, evaluating the inverse A^{-1} of A is more expensive than just solving the system of equations once for a specific vector b. In practice, in Matlab one uses the backslash operator, and other programming environments have a similar feature.

The inverse of A is not well-defined when A does not have full rank. Since A is the sum of T matrices of rank one, this happens when T < d, and can happen when the input vectors x_t are not linearly independent. One way of overcoming this issue is to choose the solution w with minimum norm such that A w = b. Such a w always exists and is unique. Concretely, this solution is w = A^+ b, where A^+ is the Moore-Penrose pseudoinverse of A, which always exists, and can be computed via the singular value decomposition (SVD) of A.

We said above that we can maximize the log likelihood in two stages, first finding the best w_i values, and then finding the best σ^2 value. The second stage is left as an exercise for the reader.
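The normal equations A w = b are easy to set up and solve with standard linear-algebra routines. The NumPy sketch below is illustrative: the synthetic data, the true weight vector, and the noise level are all invented, and it simply contrasts solving the system directly with using the pseudoinverse.

    import numpy as np

    rng = np.random.default_rng(0)
    T, d = 100, 3
    X = rng.normal(size=(T, d))                   # row t holds the example x_t
    w_true = np.array([1.0, -2.0, 0.5])
    y = X @ w_true + 0.1 * rng.normal(size=T)     # targets with Gaussian noise

    A = X.T @ X       # A = sum over t of x_t x_t^T, a d x d matrix
    b = X.T @ y       # b = sum over t of y_t x_t, a vector of length d

    w_solved = np.linalg.solve(A, b)    # cheaper than forming A^{-1} explicitly
    w_pinv = np.linalg.pinv(A) @ b      # minimum-norm solution; works even if A is singular
    print(w_solved, w_pinv)

With T = 100 examples and d = 3 the matrix A has full rank, so the two answers agree; the pseudoinverse version is the one to rely on when T < d or the inputs are linearly dependent.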

4 The general EM algorithm

Suppose that, in the data available for training, the outcomes of some random variables are unknown for some examples. These outcomes are called hidden or latent, and the examples are called incomplete or partial. Conceptually, it is not the case that the hidden outcomes do not exist. Rather, they do exist, but they have been concealed from the observer.

Let X be the set of all nodes of the Bayesian network. As before, suppose that there are T training examples, which are independent and identically distributed. For the t-th training example, let V_t be the set of visible nodes and let H_t be the set of hidden nodes, so X = V_t ∪ H_t. Note that different examples may have different hidden nodes.

As before, we want to maximize the log likelihood of the observed data:

    L = \sum_t \log p(V_t = v_t)
      = \sum_t \log \sum_{h_t} p(V_t = v_t, H_t = h_t)
      = \sum_t \log \sum_{h_t} \prod_{i=1}^{n} p(X_i = x_{ti} | pa(X_i) = pa_{ti}).

In the last expression above, each X_i belongs to either V_t or H_t. Because of the sum over h_t, we cannot move the logarithm inside the product and we do not get a separate optimization subproblem for each node X_i. Expectation-maximization (EM) is the name for an approach to solving the combined optimization problem.

To simplify notation, assume initially that there is just one training example, with one observed random variable X = x and one hidden random variable Z. Let θ be all the parameters of the joint model p(X = x, Z = z; θ). Following the principle of maximum likelihood, the goal is to choose θ to maximize the log likelihood function, which is L(θ; x) = log p(x; θ). As noted before, p(x; θ) = \sum_z p(x, z; θ).

Suppose we have a current estimate θ_t for the parameters. Multiplying inside this sum by p(z | x; θ_t) / p(z | x; θ_t) gives that the log likelihood is

    D = \log p(x; θ) = \log \sum_z p(x, z; θ) \frac{p(z | x; θ_t)}{p(z | x; θ_t)}.

Note that \sum_z p(z | x; θ_t) = 1 and p(z | x; θ_t) ≥ 0 for all z. Therefore D is the logarithm of a weighted sum, so we can apply Jensen's inequality [1], which says

    \log \sum_j w_j v_j ≥ \sum_j w_j \log v_j

given \sum_j w_j = 1 and each w_j ≥ 0.

[1] The mathematical fact on which the EM algorithm is based is known as Jensen's inequality. It is the following lemma. Lemma: Suppose the weights w_j are nonnegative and sum to one, and let each x_j be any real number for j = 1 to j = n. Let f : R → R be any concave function. Then f(\sum_{j=1}^{n} w_j x_j) ≥ \sum_{j=1}^{n} w_j f(x_j). Proof: The proof is by induction on n. For the base case n = 2, the definition of being concave says that f(wa + (1-w)b) ≥ w f(a) + (1-w) f(b). The logarithm function is concave, so Jensen's inequality applies to it.

Here, we let the sum range over the values z of Z, with the weight w_j being p(z | x; θ_t). We get

    D ≥ E = \sum_z p(z | x; θ_t) \log \frac{p(x, z; θ)}{p(z | x; θ_t)}.

Separating the fraction inside the logarithm to obtain two sums gives

    E = \left( \sum_z p(z | x; θ_t) \log p(x, z; θ) \right) - \left( \sum_z p(z | x; θ_t) \log p(z | x; θ_t) \right).

Since E ≤ D and we want to maximize D, consider maximizing E. The weights p(z | x; θ_t) do not depend on θ, so we only need to maximize the first sum, which is

    \sum_z p(z | x; θ_t) \log p(x, z; θ).

In general, the E step of an EM algorithm is to compute p(z | x; θ_t) for all z. The M step is then to find θ to maximize \sum_z p(z | x; θ_t) \log p(x, z; θ).

How do we know that maximizing E actually leads to an improvement in the likelihood? With θ = θ_t,

    E = \sum_z p(z | x; θ_t) \log \frac{p(x, z; θ_t)}{p(z | x; θ_t)} = \sum_z p(z | x; θ_t) \log p(x; θ_t) = \log p(x; θ_t)

which is the log likelihood at θ_t. So any θ that maximizes E must lead to a likelihood that is at least as good as the likelihood at θ_t.
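The two facts just used, that E equals the log likelihood when θ = θ_t and that E ≤ D everywhere, can be checked numerically on a tiny model. The Python sketch below is purely illustrative: the model (one binary X, one binary hidden Z) and all parameter values are invented.

    import math

    def joint(x, z, theta):
        """p(X = x, Z = z; theta) for a toy model with binary X and Z.
        theta = (p(Z=1), p(X=1|Z=0), p(X=1|Z=1))."""
        pz1, px1_z0, px1_z1 = theta
        pz = pz1 if z == 1 else 1 - pz1
        px = px1_z1 if z == 1 else px1_z0
        return pz * (px if x == 1 else 1 - px)

    def D(x, theta):
        """True log likelihood: log p(x; theta) = log of the sum over z of p(x, z; theta)."""
        return math.log(sum(joint(x, z, theta) for z in (0, 1)))

    def E(x, theta, theta_t):
        """Lower bound: sum over z of p(z|x; theta_t) * log [ p(x, z; theta) / p(z|x; theta_t) ]."""
        post = [joint(x, z, theta_t) for z in (0, 1)]
        total = sum(post)
        post = [p / total for p in post]          # p(z | x; theta_t)
        return sum(q * math.log(joint(x, z, theta) / q)
                   for z, q in zip((0, 1), post) if q > 0)

    x = 1
    theta_t = (0.5, 0.2, 0.9)
    theta_new = (0.3, 0.3, 0.8)
    print(D(x, theta_t), E(x, theta_t, theta_t))      # equal: the bound is tight at theta_t
    print(D(x, theta_new), E(x, theta_new, theta_t))  # E lies below D for any other theta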

5 EM with independent training examples

The EM algorithm derived above can be extended to the case where we have a training set {x_1, ..., x_n} such that each x_i is independent, and they all share the same parameters θ. In this case the log likelihood is D = \sum_i \log p(x_i; θ). Let the auxiliary random variables be a set {Z_1, ..., Z_n} such that the distribution of each Z_i is a function only of the corresponding x_i and θ. Note that Z_i may be different for each i. By an argument similar to the one above,

    D = \sum_i \log \sum_{z_i} p(x_i, z_i; θ) \frac{p(z_i | x_i; θ_t)}{p(z_i | x_i; θ_t)}.

Using Jensen's inequality separately for each i gives

    D ≥ E = \sum_i \sum_{z_i} p(z_i | x_i; θ_t) \log \frac{p(x_i, z_i; θ)}{p(z_i | x_i; θ_t)}.

As before, to maximize E we want to maximize the sum

    \sum_i \sum_{z_i} p(z_i | x_i; θ_t) \log p(x_i, z_i; θ).

The E step is to compute p(z_i | x_i; θ_t) for all z_i for each i. The M step is then to find

    θ_{t+1} = argmax_θ \sum_i \sum_{z_i} p(z_i | x_i; θ_t) \log p(x_i, z_i; θ).

End of the lecture on Thursday November 1.

6 EM for Bayesian networks

Let θ_0 be the current estimate of the parameters of a Bayesian network. For training example t, let v_t be the observed values of the visible nodes. The M step of EM is to choose new parameter values θ that maximize

    F = \sum_t \sum_{h_t} p(h_t | v_t; θ_0) \log p(h_t, v_t; θ)

where the inner sum is over all possible combinations h_t of outcomes of the nodes that are hidden in the t-th training example. We shall show that instead of summing explicitly over all possible combinations h_t, we can have a separate summation for each hidden node. The advantage of this is that separate summations are far more efficient computationally.

By the definition of a Bayesian network,

    F = \sum_t \sum_{h_t} p(h_t | v_t; θ_0) \log \prod_i p(X_i = x_{ti} | pa(X_i) = pa_{ti}; θ)

where each x_{ti} and each value in pa_{ti} is part of either v_t or h_t. Converting the log product into a sum of logs, then moving this sum to the outside, gives

    F = \sum_i \sum_t \sum_{h_t} p(h_t | v_t; θ_0) \log p(x_{ti} | pa_{ti}; θ).

For each i, the sum over h_t can be replaced by a sum over the alternative values x of X_i and π of the parents of X_i, yielding

    F = \sum_i \sum_t \sum_{x, π} p(X_i = x, pa(X_i) = π | v_t; θ_0) \log p(x | π; θ).

Note that summing over alternative values for X_i and its parents makes sense even if some of these random variables are observed. If X_i happens to be observed for training example t, let its observed value be x_{ti}. In this case, p(X_i = x, pa(X_i) = π | v_t; θ_0) = 0 for all values x ≠ x_{ti}. A similar observation is true for parents of X_i that are observed.

Changing the order of the sums again gives

    F = \sum_i \sum_{x, π} \left[ \sum_t p(x, π | v_t; θ_0) \right] \log p(x | π; θ).

For comparison, the log likelihood in Equation (1) in Section 1 for the fully observed case can be rewritten as

    \sum_i \sum_{x, π} \left[ \sum_t I(x = x_{ti}, π = pa_{ti}) \right] \log p(x | π; θ).

The argument following Equation (1) says that the solution that maximizes this expression is

    p(X_i = x | pa(X_i) = π) = \frac{\sum_t I(x = x_{ti}, π = pa_{ti})}{\sum_t I(π = pa_{ti})}.

A similar argument can be applied here to give that the solution for the new parameter values θ, in the partially observed case, is

    p(x | π; θ) = p(X_i = x | pa(X_i) = π) = \frac{\sum_t p(X_i = x, pa(X_i) = π | v_t; θ_0)}{\sum_t p(pa(X_i) = π | v_t; θ_0)}.

To appreciate the meaning of this result, remember that θ is shorthand for all the parameters of the Bayesian network, that is, all the CPTs of the network. A single one of these parameters is one number in one CPT, written p(x | π; θ). In the special case where X_i and its parents are fully observed, their values x_{ti} and pa_{ti} are part of v_t, and p(X_i = x, pa(X_i) = π | v_t; θ_0) = I(x = x_{ti}, π = pa_{ti}). The maximum likelihood estimation method for θ explained at the end of Section 1 above is a special case of the expectation-maximization method described here.
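To make the expected-count update concrete, here is a small Python sketch of EM for one particular network, Z → X1 and Z → X2, where the class Z is hidden and both children are observed. Everything in it is an invented illustration (the synthetic data, the initial CPT values, the number of iterations); the point is only that the M step divides one sum of posterior probabilities by another, exactly as in the formula above.

    import random

    random.seed(0)

    # Synthetic observed data: 200 pairs (x1, x2) of binary outcomes.
    data = [(int(random.random() < 0.7), int(random.random() < 0.3)) for _ in range(200)]

    # Parameters: p(Z=1), p(X1=1 | Z=z), p(X2=1 | Z=z), initialized arbitrarily.
    pz1 = 0.6
    px1 = {0: 0.3, 1: 0.7}
    px2 = {0: 0.4, 1: 0.6}

    def posterior_z1(x1, x2):
        """E step for one example: p(Z=1 | x1, x2), by enumerating both values of Z."""
        def joint(z):
            pz = pz1 if z == 1 else 1 - pz1
            p1 = px1[z] if x1 == 1 else 1 - px1[z]
            p2 = px2[z] if x2 == 1 else 1 - px2[z]
            return pz * p1 * p2
        j1 = joint(1)
        return j1 / (j1 + joint(0))

    for iteration in range(20):
        # E step: posterior probability of Z=1 for every training example.
        r = [posterior_z1(x1, x2) for x1, x2 in data]
        # M step: expected counts take the place of observed counts.
        T = len(data)
        pz1 = sum(r) / T                                       # Z has no parents
        for z in (0, 1):
            weight = [ri if z == 1 else 1 - ri for ri in r]    # p(Z=z | v_t; theta_0)
            denom = sum(weight)
            px1[z] = sum(w for w, (x1, _) in zip(weight, data) if x1 == 1) / denom
            px2[z] = sum(w for w, (_, x2) in zip(weight, data) if x2 == 1) / denom

    print(pz1, px1, px2)

Because X1 and X2 are observed, the numerator of each CPT update is a sum of posteriors restricted to the examples with the matching observed value, which follows from the observation above that the joint posterior is zero unless x matches the observed value.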

7 Applying EM to modeling language

Section 2 above described n-gram models of language. A major issue with these models is that unigram models underfit the available data, while higher-order models tend to overfit. This section shows how to use expectation-maximization to fit a model with intermediate complexity, that can trade off between underfitting and overfitting.

The central idea is to introduce a hidden random variable called Z between the random variable W for a word and the variable W' for the following word. Specifically, the Bayesian network has edges W → Z and Z → W'. The alternative values of the variable Z can be any discrete set. Intuitively, each of these values identifies a different possible linguistic context. Each context has a certain probability depending on the previous word, and each following word has a certain probability depending on the context. We can say that the previous word triggers each context with a word-specific probability, while each context suggests following words with word-specific probabilities.

Let the number of alternative contexts be c. Marginalizing out the variable Z gives

    p(w' | w) = \sum_{z=1}^{c} p(w' | z) p(z | w).

This context model has m(c-1) + c(m-1) parameters, where m is the size of the vocabulary. If c = 1, the model reduces to the unigram model, while if c = m, the model has a quadratic number of parameters, like the bigram model.

End of the lecture on Tuesday November 6.

The following M step derivation is the same as in the quiz. The goal for training is to maximize the log likelihood of the training data, which is \sum_t \log p(w_t, w'_t). (We ignore the complication that training examples are not independent, if they are taken from consecutive text.) Training the model means estimating p(z | w) and p(w' | z). Consider the former task first. The M step of EM is to perform the update

    p(Z = z | W = w) := \frac{\sum_t p(Z = z, W = w | W = w_t, W' = w'_t)}{\sum_t p(W = w | W = w_t, W' = w'_t)}
                      = \frac{\sum_t I(w_t = w) p(Z = z | W = w_t, W' = w'_t)}{\sum_t I(w_t = w)}
                      = \frac{\sum_{t : w_t = w} p(Z = z | W = w, W' = w'_t)}{count(W = w)}.

This M step is intuitively reasonable. First, the denominator says that the probability of context z given current word w depends only on training examples which have this word. Second, the numerator says that this probability should be high if z is compatible with the following word as well as with the current word.

The E step is to evaluate p(z | w, w') for all z, for each pair of consecutive words w and w'. By Bayes' rule this is

    p(z | w, w') = \frac{p(w' | z, w) p(z | w)}{\sum_{z'} p(w' | z', w) p(z' | w)} = \frac{p(w' | z) p(z | w)}{\sum_{z'} p(w' | z') p(z' | w)}.

This result is also intuitively reasonable. It says that the weight of a context z is proportional to its probability given w and to the probability of w' given it.

Finally, consider estimating p(w' | z). The M step for this is to perform the update

    p(W' = w' | Z = z) := \frac{\sum_t p(W' = w', Z = z | W = w_t, W' = w'_t)}{\sum_t p(Z = z | W = w_t, W' = w'_t)}
                        = \frac{\sum_t I(w'_t = w') p(Z = z | W = w_t, W' = w'_t)}{\sum_t p(Z = z | W = w_t, W' = w'_t)}
                        = \frac{\sum_{t : w'_t = w'} p(Z = z | W = w_t, W' = w')}{\sum_t p(Z = z | W = w_t, W' = w'_t)}.

The denominator here says that the update is based on all training examples, but each one is weighted according to the probability of the context z. The numerator selects, with the same weights, just those training examples for which the second word is w'. The E step is actually the same as above: to evaluate p(z | w, w') for all z, for each pair of consecutive words w and w'.
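The whole procedure, the E step and both M steps, fits in a short program. The sketch below is an illustration only: the four-word vocabulary, the randomly generated pairs, the choice c = 2, and the number of iterations are invented, and no attention is paid to efficiency.

    import random
    from collections import defaultdict

    random.seed(0)

    words = ["a", "b", "c", "d"]
    pairs = [(random.choice(words), random.choice(words)) for _ in range(500)]  # (w, w') pairs
    c = 2   # number of hidden contexts

    def random_dist(keys):
        r = [random.random() for _ in keys]
        s = sum(r)
        return {k: v / s for k, v in zip(keys, r)}

    p_z_given_w = {w: random_dist(range(c)) for w in words}      # p(z | w)
    p_w2_given_z = {z: random_dist(words) for z in range(c)}     # p(w' | z)

    for iteration in range(50):
        # E step: q[t][z] = p(z | w_t, w'_t), proportional to p(w'_t | z) p(z | w_t).
        q = []
        for w, w2 in pairs:
            scores = [p_w2_given_z[z][w2] * p_z_given_w[w][z] for z in range(c)]
            total = sum(scores)
            q.append([s / total for s in scores])
        # M step: re-estimate p(z | w) and p(w' | z) from posterior-weighted counts.
        num_zw = defaultdict(float)    # sum of q[t][z] over t with w_t = w
        cnt_w = defaultdict(float)     # count(W = w)
        num_w2z = defaultdict(float)   # sum of q[t][z] over t with w'_t = w2
        cnt_z = defaultdict(float)     # sum of q[t][z] over all t
        for (w, w2), qt in zip(pairs, q):
            cnt_w[w] += 1
            for z in range(c):
                num_zw[(z, w)] += qt[z]
                num_w2z[(w2, z)] += qt[z]
                cnt_z[z] += qt[z]
        p_z_given_w = {w: {z: num_zw[(z, w)] / cnt_w[w] for z in range(c)} for w in words}
        p_w2_given_z = {z: {w2: num_w2z[(w2, z)] / cnt_z[z] for w2 in words} for z in range(c)}

    print(p_z_given_w["a"])

Each iteration leaves the log likelihood \sum_t \log p(w'_t | w_t) unchanged or increases it, in line with the general EM guarantee.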

8 Mixture models

Suppose that we have alternative models p(x; θ_j) for j = 1 to j = k that are applicable to the same data points x. The linear combination

    p(x) = \sum_{j=1}^{k} λ_j p(x; θ_j)

is a valid probability distribution if λ_j ≥ 0 and \sum_{j=1}^{k} λ_j = 1. The combined model is interesting because it is more flexible than any individual model. It is often called a mixture model with k components, but it can also be called an interpolation model, or a cluster model with k clusters.

We can formulate the task of learning the coefficients from training examples using a Bayesian network that has an observed node X, an unobserved node Z, and one edge Z → X. The CPT for Z is simply p(Z = j) = λ_j, while the CPT for X is p(x | z) = p(x; θ_z). The goal is to maximize the log likelihood of training examples x_1 to x_T. Marginalizing over Z, then using the product rule, shows that

    p(x) = \sum_z p(x, z) = \sum_z p(z) p(x | z) = \sum_{j=1}^{k} λ_j p(x; θ_j)

which is the same mixture model.

The CPT of the node Z can be learned using EM. The E step is to compute p(Z = j | x_t) for all j, for each training example x_t. Using Bayes' rule, this is

    p(Z = j | x_t) = \frac{p(x_t | Z = j) p(Z = j)}{p(x_t)} = \frac{p(x_t; θ_j) λ_j}{\sum_{i=1}^{k} λ_i p(x_t; θ_i)}.

The general M step for Bayesian networks is

    p(X_i = x | pa_i = π) := \frac{\sum_t p(X_i = x, pa_i = π | v_t)}{\sum_{x'} \sum_t p(X_i = x', pa_i = π | v_t)}.

For the application here, X_i is Z and the parents of X_i are the empty set. We get the update

    p(Z = j) = λ_j := \frac{\sum_t p(Z = j | x_t)}{\sum_{i=1}^{k} \sum_t p(Z = i | x_t)} = \frac{\sum_t p(Z = j | x_t)}{T}

where T is the number of training examples.

End of the lecture on Thursday November 8.

9 Interpolating language models

As a special case of training a mixture model, consider a linear combination of language models of different orders:

    p(w_l | w_{l-1}, w_{l-2}) = λ_1 p_1(w_l) + λ_2 p_2(w_l | w_{l-1}) + λ_3 p_3(w_l | w_{l-1}, w_{l-2})

where all three component models are trained on the same corpus A. What is a principled way to estimate the interpolation weights λ_i? The first important point is that the weights should be trained using a different corpus, say C. Specifically, we can choose the weights to optimize the log likelihood of C. If the weights are estimated on A, the result will always be λ_n = 1 and λ_i = 0 for i < n, where n indicates the highest-order model, because this model fits the A corpus the most closely. When testing the final combined model, we must use a third corpus B, since the weights will overfit C, at least slightly.

We can formulate the task of learning the λ_i weights using a Bayesian network. The network has nodes W_{l-2}, W_{l-1}, W_l, and Z, with edges W_{l-2} → W_l, W_{l-1} → W_l, and Z → W_l. The CPT for Z is simply p(Z = i) = λ_i, while the CPT for W_l is

    p(w_l | w_{l-1}, w_{l-2}, z) = p_1(w_l)                     if z = 1
                                   p_2(w_l | w_{l-1})           if z = 2
                                   p_3(w_l | w_{l-1}, w_{l-2})  if z = 3.

The goal is to maximize the log likelihood of the tuning corpus C. Marginalizing over Z, then using the product rule and conditional independence, shows that

    p(w_l | w_{l-1}, w_{l-2}) = λ_1 p_1(w_l) + λ_2 p_2(w_l | w_{l-1}) + λ_3 p_3(w_l | w_{l-1}, w_{l-2})

as above. To learn values for the parameters λ_i = p(Z = i), the E step is to compute the posterior probability p(Z = i | w_l, w_{l-1}, w_{l-2}). Using Bayes' rule, this is

    p(Z = i | w_l, w_{l-1}, w_{l-2}) = ...

The M step is to update the λ_i values. The general M step for Bayesian networks is

    p(X_i = x | pa_i = π) := \frac{\sum_t p(X_i = x, pa_i = π | v_t)}{\sum_{x'} \sum_t p(X_i = x', pa_i = π | v_t)}.

For the application here, training example number t is the word triple ending in w_l, X_i is Z, and the parents of X_i are the empty set. We get the update

    p(Z = i) := \frac{\sum_l p(Z = i | w_l, w_{l-1}, w_{l-2})}{\sum_{j=1}^{3} \sum_l p(Z = j | w_l, w_{l-1}, w_{l-2})} = \frac{\sum_l p(Z = i | w_l, w_{l-1}, w_{l-2})}{L}

where L is the number of words in the corpus.
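Putting the E step and the M step together gives a short tuning loop. The Python sketch below is an illustration under stated assumptions: the three component models are stand-in constants rather than real models trained on corpus A, the triples representing corpus C are invented, and the E-step expression (the posterior proportional to λ_i times the i-th component probability) is the standard Bayes-rule computation corresponding to the ellipsis above.

    # Dummy component models standing in for p1, p2, p3 trained on corpus A.
    def p1(w):         return 0.10   # unigram, placeholder value
    def p2(w, w1):     return 0.20   # bigram, placeholder value
    def p3(w, w1, w2): return 0.05   # trigram, placeholder value

    # Tuning corpus C, represented as word triples (w_{l-2}, w_{l-1}, w_l).
    triples = [("a", "b", "c"), ("b", "c", "a"), ("c", "a", "b")]   # invented

    lam = [1 / 3, 1 / 3, 1 / 3]       # initial weights lambda_1, lambda_2, lambda_3
    for iteration in range(100):
        expected = [0.0, 0.0, 0.0]
        for w2, w1, w in triples:
            comps = [lam[0] * p1(w), lam[1] * p2(w, w1), lam[2] * p3(w, w1, w2)]
            total = sum(comps)
            # E step: posterior p(Z = i | w_l, w_{l-1}, w_{l-2}) for this position.
            for i in range(3):
                expected[i] += comps[i] / total
        # M step: each lambda_i becomes the average posterior over the L positions of C.
        L = len(triples)
        lam = [e / L for e in expected]

    print(lam)

With these constant placeholder components the weights simply drift toward the component with the largest probability; with real models trained on corpus A and evaluated on held-out triples from C, the loop converges to the maximum-likelihood interpolation weights for C.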
