Probabilistic learning
Charles Elkan
November 8, 2012

Important: These lecture notes are based closely on notes written by Lawrence Saul. Text may be copied directly from his notes, or paraphrased. Also, these typeset notes lack illustrations. See the classroom lectures for figures and diagrams.

1 Learning in a Bayesian network

A Bayesian network is a directed graph with a CPT (conditional probability table) for each node. This section explains how to learn the CPTs from training data. As explained before, the training data are a matrix where each row is an instance and each column is a feature. Instances are also called examples, while features are also called nodes, random variables, and attributes. One entry in the matrix is one value of one feature, that is, one outcome of one random variable.

We consider first the scenario where each instance is complete, that is, the outcome of every node is observed for every instance. In this scenario, nothing is unknown, or in other words, there is no missing data. This scenario is also called fully visible, or no hidden nodes, or no latent variables. We also assume that the graph of the Bayesian network is known, that is, nodes $X_1$ to $X_n$ constitute a finite set, and that each node is a random variable with a discrete finite set of alternative values.

In this scenario, what we need to learn is the CPT of each node. A single entry in one CPT is $p(X_i = x \mid \mathrm{pa}(X_i) = \pi)$ where $x$ is a specific outcome of $X_i$ and $\pi$ is a specific set of outcomes, also called a configuration, of the parent nodes of $X_i$. The training data are $T$ instances $\bar{x}_t$, each of which is a complete configuration of $X_1$ to $X_n$. We write $\bar{x}_t = (x_{t1}, \ldots, x_{tn})$. Remember the convention that the first subscript refers to rows while the second subscript refers to columns.

To make learning feasible, we need a basic assumption about the training data.
Assumption. Each example is an independent and identically distributed (IID) random sample from the joint distribution defined by the Bayesian network.

This assumption has two parts. First, each $\bar{x}_t$ being identically distributed means that each sample is generated using the same CPTs. Second, being independent means that probabilities can be multiplied: $p(\bar{x}_s, \bar{x}_t) = p(\bar{x}_s)\, p(\bar{x}_t)$.

With the IID assumption, we are ready to begin to derive a learning procedure. The probability of the training data is
$$P = \prod_{t=1}^{T} p(X_1 = x_{t1}, \ldots, X_n = x_{tn}).$$
The probability of example $t$ is
$$p(X_1 = x_{t1}, \ldots, X_n = x_{tn}) = \prod_{i=1}^{n} p(X_i = x_{ti} \mid X_1 = x_{t1}, \ldots, X_{i-1} = x_{t,i-1}) = \prod_{i=1}^{n} p(X_i = x_{ti} \mid \mathrm{pa}(X_i) = \mathrm{pa}_{ti}).$$
The first equation above follows from the chain rule of probabilities, while the second follows from conditional independence in the Bayesian network.

Learning means choosing values, based on the available data, for the aspects of the model that are unknown. Here, the model is the probability distribution specified by the Bayesian network. Its graph is known but the parameters inside its CPTs are unknown. The principle of maximum likelihood says that we should choose values for unknown parameters in such a way that the overall probability of the training data is maximized. This principle is not a theorem that can be proved. It is simply a sensible guideline. One way to argue that the principle is sensible is to notice that, essentially, it says that we should assume that the training data are the most typical possible, that is, that the observed data are the mode of the distribution to be learned.

The principle of maximum likelihood says that we should choose values for the parameters of the CPTs that make $P$ as large as possible. Let these parameters be called $w$. The principle says that we should choose $w = \operatorname{argmax}_w P$. Because the logarithm function is monotone strictly increasing, this is equivalent to choosing $w = \operatorname{argmax}_w \log P$. It is convenient to maximize the log because the log of a product is a sum, and dealing with sums is easier.
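To make the factorization concrete, here is a minimal Python sketch (not from the notes; the toy network and all names are hypothetical) that evaluates the log likelihood $\log P$ of complete training data as a sum of log CPT entries.

```python
import math

# A toy network X1 -> X2, so pa(X1) = () and pa(X2) = (X1,).
parents = {1: (), 2: (1,)}

# Each CPT maps a parent configuration (a tuple of outcomes) to a
# distribution over the node's own outcomes.
cpt = {
    1: {(): {0: 0.6, 1: 0.4}},
    2: {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}},
}

def log_prob_example(example):
    """log p(X1 = x1, ..., Xn = xn) as a sum of log CPT entries."""
    total = 0.0
    for i, pa in parents.items():
        config = tuple(example[j] for j in pa)
        total += math.log(cpt[i][config][example[i]])
    return total

# Training data: T complete instances, one dict {node: outcome} per instance.
data = [{1: 0, 2: 0}, {1: 1, 2: 1}, {1: 0, 2: 1}]
L = sum(log_prob_example(x) for x in data)
print(L)  # log likelihood of the training data
```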
So, the goal is to maximize
$$L = \log \prod_{t=1}^{T} \prod_{i=1}^{n} p(X_i = x_{ti} \mid \mathrm{pa}(X_i) = \mathrm{pa}_{ti}) = \sum_{t=1}^{T} \sum_{i=1}^{n} \log p(X_i = x_{ti} \mid \mathrm{pa}(X_i) = \mathrm{pa}_{ti}).$$
Swapping the order of the summations gives
$$L = \sum_{i=1}^{n} \sum_{t=1}^{T} \log p(X_i = x_{ti} \mid \mathrm{pa}(X_i) = \mathrm{pa}_{ti}). \tag{1}$$
Now, notice that each inner sum over $t$ involves a different CPT. These CPTs have parameters whose values can be chosen completely separately. Therefore, $L$ can be maximized by maximizing each inner sum separately. We can decompose the task of maximizing $L$ into $n$ separate subtasks to maximize
$$M_i = \sum_{t=1}^{T} \log p(X_i = x_{ti} \mid \mathrm{pa}(X_i) = \mathrm{pa}_{ti})$$
for $i = 1$ to $i = n$.

Consider one of these subtasks. The sum over $t$ treats each training example separately. To make progress, we group the training examples into equivalence classes. Each class consists of all examples among the $T$ that have the same outcome for $X_i$ and the same outcome for the parents of $X_i$. Let $x$ range over the outcomes of $X_i$ and let $\pi$ range over the outcomes of the parents of $X_i$. Let $\mathrm{count}(x, \pi)$ be how many of the $T$ examples have the value $x$ and the configuration $\pi$. Note that
$$T = \sum_x \sum_\pi \mathrm{count}(x, \pi).$$
We can write
$$M_i = \sum_x \sum_\pi \mathrm{count}(x, \pi) \log p(X_i = x \mid \mathrm{pa}(X_i) = \pi).$$
We want to choose parameter values for the CPT for node $i$ to maximize this expression. These parameter values are the probabilities $p(X_i = x \mid \mathrm{pa}(X_i) = \pi)$. These values are constrained by the fact that for each $\pi$
$$\sum_x p(X_i = x \mid \mathrm{pa}(X_i) = \pi) = 1.$$
However, there is no constraint connecting the values for different $\pi$. Therefore, we can swap the order of the summations inside the expression for $M_i$ and obtain a separate subtask for each $\pi$. Write $w_x = p(X_i = x \mid \mathrm{pa}(X_i) = \pi)$ and $c_x = \mathrm{count}(x, \pi)$. The problem to solve is to maximize
$$\sum_x c_x \log w_x \quad \text{subject to} \quad w_x \geq 0 \text{ and } \sum_x w_x = 1.$$
This problem can be solved using Lagrange multipliers: setting the derivative of $\sum_x c_x \log w_x + \lambda(1 - \sum_x w_x)$ with respect to each $w_x$ to zero gives $w_x = c_x/\lambda$, and the constraint then forces $\lambda = \sum_x c_x$. The solution is
$$w_x = \frac{c_x}{\sum_{x'} c_{x'}}.$$
In words, the maximum likelihood estimate of the probability that $X_i = x$, given that the parents of $X_i$ are observed to be $\pi$, is
$$p(X_i = x \mid \mathrm{pa}(X_i) = \pi) = \frac{\mathrm{count}(X_i = x, \mathrm{pa}(X_i) = \pi)}{\sum_{x'} \mathrm{count}(X_i = x', \mathrm{pa}(X_i) = \pi)} = \frac{\mathrm{count}(X_i = x, \mathrm{pa}(X_i) = \pi)}{\mathrm{count}(\mathrm{pa}(X_i) = \pi)} = \frac{\sum_t I(x = x_{ti}, \pi = \mathrm{pa}_{ti})}{\sum_t I(\pi = \mathrm{pa}_{ti})}$$
where the counts are with respect to the training data.

These estimates make sense intuitively. Each estimated probability is proportional to the corresponding frequency observed in the training data. If the value $x$ is never observed for some combination $\pi$, then its conditional probability is estimated to be zero. Although the estimates are intuitively sensible, only a formal derivation like the one above can show that they are correct (and unique). The derivation uses several mathematical manipulations that are common in similar arguments. These manipulations include changing products into sums, swapping the order of summations, and arguing that maximization subtasks are separate.

End of the lecture on Thursday October 25.
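The counting estimator just derived is short enough to state directly in code. A minimal sketch, reusing the hypothetical dict-of-dicts CPT representation from the earlier snippet:

```python
from collections import Counter

def estimate_cpt(i, parent_ids, data):
    """Maximum likelihood CPT for node i:
    p(Xi = x | pa(Xi) = pi) = count(x, pi) / count(pi)."""
    joint = Counter()     # count(x, pi)
    marginal = Counter()  # count(pi)
    for example in data:
        pi = tuple(example[j] for j in parent_ids)
        joint[(example[i], pi)] += 1
        marginal[pi] += 1
    cpt = {}
    for (x, pi), n in joint.items():
        cpt.setdefault(pi, {})[x] = n / marginal[pi]
    return cpt

data = [{1: 0, 2: 0}, {1: 1, 2: 1}, {1: 0, 2: 1}]
print(estimate_cpt(2, (1,), data))
# {(0,): {0: 0.5, 1: 0.5}, (1,): {1: 1.0}}
```

An outcome never seen with a given parent configuration simply does not appear in the result, which matches the zero estimates discussed above.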
2 Markov models of language

Many applications involving natural language need a model that assigns probabilities to sentences. For example, the most successful translation systems nowadays for natural language are based on probabilistic models. Let $F$ be a random variable whose values are sentences written in French, and let $E$ be a similar random variable ranging over English sentences. Given a specific French sentence $f$, the machine translation task is to find
$$e^* = \operatorname{argmax}_e p(E = e \mid F = f).$$
One way to decompose the task into subtasks is to use Bayes' rule and write
$$e^* = \operatorname{argmax}_e \frac{p(F = f \mid E = e)\, p(E = e)}{p(F = f)} = \operatorname{argmax}_e p(F = f \mid E = e)\, p(E = e).$$
The denominator $p(F = f)$ can be ignored because it is the same for all $e$. Although creating a model of $p(F = f \mid E = e)$ is presumably just as difficult as creating a model directly of $p(E = e \mid F = f)$, the model of $p(E = e)$ can overcome some errors in $p(F = f \mid E = e)$. For example, regardless of the original sentence in the foreign language, the English sentence "Colorless green ideas sleep furiously" should not be a high-probability translation.

This section explains how to learn basic models of $p(E = e)$. Clearly the probability of a sentence depends on the words in it, and also on the order of the words. Consider a sentence that consists of the words $w_1$ to $w_L$ in order. Let these words be the outcomes of random variables $W_1$ to $W_L$. The chain rule of probabilities says that
$$p(W_1, W_2, \ldots, W_L) = p(W_1)\, p(W_2 \mid W_1) \cdots p(W_L \mid W_{L-1}, \ldots, W_1).$$
Words that occur a long way before $w_l$ in the sentence presumably influence the probability of $w_l$ less, so to simplify this expression it is reasonable to fix a number $n$ of previous words and write
$$\prod_{l=1}^{L} p(W_l \mid W_{l-n}, \ldots, W_{l-2}, W_{l-1})$$
with each word depending only on the previous $n$ words. In the special case where $n = 0$, each word is independent of the previous words.

A model of this type is called a Markov model of order $n$. A unigram model has order $n = 0$, a bigram model has order $n = 1$, and a trigram model has order $n = 2$. A bigram model corresponds to a Bayesian network with nodes $W_1$ to $W_L$ and an edge from each node $W_l$ to $W_{l+1}$. Importantly, the same CPT $p(W_{l+1} = j \mid W_l = i)$ is used at each node $W_{l+1}$. Fixing the entries in different CPTs to be the same is called tying. Notice that technically we have a different Bayesian network for each different length $L$, but tying CPTs lets us treat all these networks as the same.
How can we learn the shared CPT? Each node $W_l$ is a discrete random variable, but one with a very large set of values. The cardinality of this set is the size of the vocabulary, typically between $10^4$ and $10^5$ in applications. Since most words never follow each other, a document collection of size smaller than $(10^5)^2$ words can be adequate for training. Fortunately, nowadays it is easy to assemble and process collections of $10^8$ and more words. The maximum likelihood estimate of the CPT parameters is
$$p(W_l = j \mid W_{l-1} = i) = \frac{c_{ij}}{c_i}$$
where $c_i$ is the number of times that word $i$ occurs followed by any other word, and $c_{ij}$ is the number of times that word $i$ occurs followed by word $j$. A note on notation: it is convenient to assume that each word is an integer between 1 and the vocabulary size. Notation such as $w_i$ instead of $i$ for the $i$th word causes two difficulties: it leads to double subscripts, and it suggests that strings are mathematical objects.

Some issues occur with $n$-gram models. The first issue is that they do not handle novel words in an intelligent way. Typically we convert each word not in the predefined vocabulary into a special fixed token such as UNK, and then treat this as an ordinary word. The second issue is that all sequences of words not seen in the training collection are assigned zero probability. For example, the bigram "pink flies" may be so uncommon that it occurs zero times in the training collection, but that does not mean it is impossible. Its probability should be small, but above zero. The higher the order of the $n$-gram model is, the more important this second issue is.
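A hedged sketch of bigram training that illustrates both issues (hypothetical names; the corpus and vocabulary are toy examples):

```python
from collections import Counter

def train_bigram(corpus, vocab):
    """Maximum likelihood bigram CPT: p(w' = j | w = i) = c_ij / c_i.
    Words outside the predefined vocabulary become the token UNK."""
    tokens = [w if w in vocab else "UNK" for w in corpus]
    c_i = Counter(tokens[:-1])                    # word i followed by any word
    c_ij = Counter(zip(tokens[:-1], tokens[1:]))  # word i followed by word j
    return {(i, j): n / c_i[i] for (i, j), n in c_ij.items()}

corpus = "the cat sat on the mat".split()
vocab = {"the", "cat", "sat", "on"}
p = train_bigram(corpus, vocab)
print(p[("the", "cat")])           # 0.5: "the" is followed by "cat" half the time
print(p.get(("cat", "on"), 0.0))   # 0.0: an unseen bigram gets zero probability
```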
3 Linear regression

Linear regression is perhaps the most widely used method of modeling data in classical statistics. Here we see how it fits into the paradigm of learning the parameters of a Bayesian network via the principle of maximum likelihood. We have independent nodes $X_1$ to $X_d$ and a dependent node $Y$, with an edge $X_i \to Y$ for each $i$. Intuitively, the value of $Y$ is a linear function of the values of $X_1$ to $X_d$, plus some random noise. Assuming that the noise has mean zero, we can write
$$E[Y] = \sum_{i=1}^{d} w_i x_i = \bar{w} \cdot \bar{x}$$
where $w_1$ to $w_d$ are parameters describing the linear dependence. The standard choice to model the random noise is a Gaussian distribution with mean zero and variance $\sigma^2$. The probability density function of this distribution is
$$p(z) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{z^2}{2\sigma^2}\right).$$
Combining this with the expression for $E[Y]$ gives
$$p(Y = y \mid X = \bar{x}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(y - \bar{w} \cdot \bar{x})^2\right).$$

End of the lecture on Tuesday October 30.

To learn the parameters $w_1$ to $w_d$ we have training examples $(\bar{x}_t, y_t)$ for $t = 1$ to $t = T$. Assume that each $\bar{x}_t$ is a column vector. Given that these examples are IID, the log likelihood is
$$L = \sum_{t=1}^{T} \log p(y_t \mid \bar{x}_t) = \sum_{t=1}^{T} \left[ -\frac{1}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y_t - \bar{w} \cdot \bar{x}_t)^2 \right].$$
We can maximize this expression in two stages: first find the optimal $w_i$ values, and then find the optimal $\sigma^2$ value. The first subproblem is to minimize (not maximize)
$$S = \sum_{t=1}^{T} (y_t - \bar{w} \cdot \bar{x}_t)^2.$$
We can solve this by setting the partial derivatives of $S$ to zero. We get the equations
$$\frac{\partial S}{\partial w_i} = \sum_{t=1}^{T} -2(y_t - \bar{w} \cdot \bar{x}_t)\, x_{ti} = 0$$
for $i = 1$ to $i = d$, where we write $x_{ti}$ because $\bar{x}_t$ is a column vector. These yield the system of $d$ linear equations
$$\sum_{t=1}^{T} y_t x_{ti} = \sum_{t=1}^{T} (\bar{w} \cdot \bar{x}_t)\, x_{ti}.$$
Note that each of the $d$ equations involves all of the unknowns $w_1$ to $w_d$. In matrix notation, the system of equations is $\bar{b} = A\bar{w}$.
Here, $\bar{b}$ is the column vector of length $d$ whose $i$th entry is $b_i = \sum_t y_t x_{ti}$, that is, $\bar{b} = \sum_t y_t \bar{x}_t$. The right side is $\sum_{t=1}^{T} \bar{x}_t (\bar{x}_t^T \bar{w})$ where the superscript $T$ means transpose and the dot product has been written as a matrix product. This yields
$$\bar{b} = \sum_{t=1}^{T} \bar{x}_t (\bar{x}_t^T \bar{w}) = \left( \sum_{t=1}^{T} \bar{x}_t \bar{x}_t^T \right) \bar{w} = A\bar{w}$$
where the $d \times d$ square matrix $A = \sum_t \bar{x}_t \bar{x}_t^T$. The row $i$, column $j$ entry of $A$ is $A_{ij} = \sum_t x_{ti} x_{tj}$.

Mathematically, the solution to the system $A\bar{w} = \bar{b}$ is $\bar{w} = A^{-1}\bar{b}$. Computationally, evaluating the inverse $A^{-1}$ of $A$ is more expensive than just solving the system of equations once for a specific vector $\bar{b}$. In practice, in Matlab one uses the backslash operator, and other programming environments have a similar feature.

The inverse of $A$ is not well-defined when $A$ does not have full rank. Since $A$ is the sum of $T$ matrices of rank one, this happens when $T < d$, and can happen when the input vectors $\bar{x}_t$ are not linearly independent. One way of overcoming this issue is to choose the solution $\bar{w}$ with minimum norm such that $A\bar{w} = \bar{b}$. Such a $\bar{w}$ always exists and is unique. Concretely, this solution is $\bar{w} = A^{+}\bar{b}$ where $A^{+}$ is the Moore-Penrose pseudoinverse of $A$, which always exists, and can be computed via the singular value decomposition (SVD) of $A$.

We said above that we can maximize the log likelihood in two stages, first finding the best $w_i$ values, and then finding the best $\sigma^2$ value. The second stage is left as an exercise for the reader.
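A minimal numpy sketch of the first stage (assumed names and toy data; `np.linalg.solve` plays the role of Matlab's backslash, and `np.linalg.pinv` computes the Moore-Penrose pseudoinverse via the SVD):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 100, 3
X = rng.normal(size=(T, d))               # row t of X is the vector x_t
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=T)

A = X.T @ X                               # A = sum_t x_t x_t^T
b = X.T @ y                               # b = sum_t y_t x_t
w = np.linalg.solve(A, b)                 # solve A w = b; like Matlab's A \ b
print(w)                                  # close to w_true

# When A is rank-deficient (for example T < d), the minimum-norm solution
# uses the Moore-Penrose pseudoinverse, computed internally via the SVD.
w_min_norm = np.linalg.pinv(A) @ b
```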
4 The general EM algorithm

Suppose that, in the data available for training, the outcomes of some random variables are unknown for some examples. These outcomes are called hidden or latent, and the examples are called incomplete or partial. Conceptually, it is not the case that the hidden outcomes do not exist. Rather, they do exist, but they have been concealed from the observer.

Let $X$ be the set of all nodes of the Bayesian network. As before, suppose that there are $T$ training examples, which are independent and identically distributed. For the $t$th training example, let $V_t$ be the set of visible nodes and let $H_t$ be the set of hidden nodes, so $X = V_t \cup H_t$. Note that different examples may have different hidden nodes.

As before, we want to maximize the log likelihood of the observed data:
$$L = \sum_t \log p(V_t = v_t) = \sum_t \log \sum_{h_t} p(V_t = v_t, H_t = h_t) = \sum_t \log \sum_{h_t} \prod_{i=1}^{n} p(X_i = x_{ti} \mid \mathrm{pa}(X_i) = \mathrm{pa}_{ti}).$$
In the last expression above, each $X_i$ belongs to either $V_t$ or $H_t$. Because of the sum over $h_t$, we cannot move the logarithm inside the product and we do not get a separate optimization subproblem for each node $X_i$. Expectation-maximization (EM) is the name for an approach to solving the combined optimization problem.

To simplify notation, assume initially that there is just one training example, with one observed random variable $X = x$ and one hidden random variable $Z$. Let $\theta$ be all the parameters of the joint model $p(X = x, Z = z; \theta)$. Following the principle of maximum likelihood, the goal is to choose $\theta$ to maximize the log likelihood function, which is $L(\theta; x) = \log p(x; \theta)$. As noted before, $p(x; \theta) = \sum_z p(x, z; \theta)$. Suppose we have a current estimate $\theta^t$ of the parameters. Multiplying inside this sum by $p(z \mid x; \theta^t)/p(z \mid x; \theta^t)$ gives that the log likelihood is
$$D = \log p(x; \theta) = \log \sum_z p(x, z; \theta)\, \frac{p(z \mid x; \theta^t)}{p(z \mid x; \theta^t)}.$$
Note that $\sum_z p(z \mid x; \theta^t) = 1$ and $p(z \mid x; \theta^t) \geq 0$ for all $z$. Therefore $D$ is the logarithm of a weighted sum, so we can apply Jensen's inequality,¹ which says that $\log \sum_j w_j v_j \geq \sum_j w_j \log v_j$, given $\sum_j w_j = 1$ and each $w_j \geq 0$.

¹ The mathematical fact on which the EM algorithm is based is known as Jensen's inequality. It is the following lemma. Lemma: Suppose the weights $w_j$ are nonnegative and sum to one, and let each $x_j$ be any real number for $j = 1$ to $j = n$. Let $f: \mathbb{R} \to \mathbb{R}$ be any concave function. Then $f(\sum_{j=1}^{n} w_j x_j) \geq \sum_{j=1}^{n} w_j f(x_j)$. Proof: The proof is by induction on $n$. For the base case $n = 2$, the definition of being concave says that $f(wa + (1-w)b) \geq wf(a) + (1-w)f(b)$. The logarithm function is concave, so Jensen's inequality applies to it.
Here, we let the sum range over the values $z$ of $Z$, with the weight $w_j$ being $p(z \mid x; \theta^t)$. We get
$$D \geq E = \sum_z p(z \mid x; \theta^t) \log \frac{p(x, z; \theta)}{p(z \mid x; \theta^t)}.$$
Separating the fraction inside the logarithm to obtain two sums gives
$$E = \left( \sum_z p(z \mid x; \theta^t) \log p(x, z; \theta) \right) - \left( \sum_z p(z \mid x; \theta^t) \log p(z \mid x; \theta^t) \right).$$
Since $E \leq D$ and we want to maximize $D$, consider maximizing $E$. The weights $p(z \mid x; \theta^t)$ do not depend on $\theta$, so we only need to maximize the first sum, which is
$$\sum_z p(z \mid x; \theta^t) \log p(x, z; \theta).$$
In general, the E step of an EM algorithm is to compute $p(z \mid x; \theta^t)$ for all $z$. The M step is then to find $\theta$ to maximize $\sum_z p(z \mid x; \theta^t) \log p(x, z; \theta)$.

How do we know that maximizing $E$ actually leads to an improvement in the likelihood? With $\theta = \theta^t$,
$$E = \sum_z p(z \mid x; \theta^t) \log \frac{p(x, z; \theta^t)}{p(z \mid x; \theta^t)} = \sum_z p(z \mid x; \theta^t) \log p(x; \theta^t) = \log p(x; \theta^t)$$
which is the log likelihood at $\theta^t$. So any $\theta$ that maximizes $E$ must lead to a likelihood that is better than the likelihood at $\theta^t$.
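The bound $E \leq D$, with equality at $\theta = \theta^t$, can be checked numerically. A sketch for a tiny hypothetical model with one binary hidden variable (all names and numbers are illustrative, not from the notes):

```python
import numpy as np

# Toy model: hidden Z in {0,1}, observed X in {0,1}.
# theta = (p(Z=1), p(X=1 | Z=0), p(X=1 | Z=1)).
def joint(x, z, theta):
    pz1, a0, a1 = theta
    pz = pz1 if z == 1 else 1.0 - pz1
    a = a1 if z == 1 else a0
    return pz * (a if x == 1 else 1.0 - a)

def D(x, theta):
    """Log likelihood log p(x; theta) = log sum_z p(x, z; theta)."""
    return np.log(sum(joint(x, z, theta) for z in (0, 1)))

def E(x, theta, theta_t):
    """Jensen lower bound: sum_z p(z|x;theta_t) log [p(x,z;theta) / p(z|x;theta_t)]."""
    post = np.array([joint(x, z, theta_t) for z in (0, 1)])
    post = post / post.sum()                    # p(z | x; theta_t)
    return sum(post[z] * np.log(joint(x, z, theta) / post[z]) for z in (0, 1))

x, theta_t = 1, (0.3, 0.2, 0.7)
for theta in [(0.3, 0.2, 0.7), (0.5, 0.1, 0.9), (0.7, 0.4, 0.6)]:
    print(theta, round(D(x, theta), 4), round(E(x, theta, theta_t), 4))
# E <= D everywhere, with equality at theta == theta_t.
```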
5 EM with independent training examples

The EM algorithm derived above can be extended to the case where we have a training set $\{x_1, \ldots, x_n\}$ such that each $x_i$ is independent, and they all share the same parameters $\theta$. In this case the log likelihood is $D = \sum_i \log p(x_i; \theta)$. Let the auxiliary random variables be a set $\{Z_1, \ldots, Z_n\}$ such that the distribution of each $Z_i$ is a function only of the corresponding $x_i$ and $\theta$. Note that $Z_i$ may be different for each $i$. By an argument similar to above,
$$D = \sum_i \log \sum_{z_i} p(x_i, z_i; \theta)\, \frac{p(z_i \mid x_i; \theta^t)}{p(z_i \mid x_i; \theta^t)}.$$
Using Jensen's inequality separately for each $i$ gives
$$D \geq E = \sum_i \sum_{z_i} p(z_i \mid x_i; \theta^t) \log \frac{p(x_i, z_i; \theta)}{p(z_i \mid x_i; \theta^t)}.$$
As before, to maximize $E$ we want to maximize the sum
$$\sum_i \sum_{z_i} p(z_i \mid x_i; \theta^t) \log p(x_i, z_i; \theta).$$
The E step is to compute $p(z_i \mid x_i; \theta^t)$ for all $z_i$ for each $i$. The M step is then to find
$$\theta^{t+1} = \operatorname{argmax}_\theta \sum_i \sum_{z_i} p(z_i \mid x_i; \theta^t) \log p(x_i, z_i; \theta).$$

End of the lecture on Thursday November 1.
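The E and M steps just derived translate directly into a generic EM loop. A sketch, instantiated with the toy model from Section 4's snippet, for which the M step has a closed form (a mixture of two Bernoulli distributions); everything here is an illustrative assumption, not code from the notes:

```python
import numpy as np

# Toy model, as in the Section 4 sketch: hidden Z in {0,1}, observed X in {0,1},
# theta = (p(Z=1), p(X=1 | Z=0), p(X=1 | Z=1)).
def joint(x, z, theta):
    pz1, a0, a1 = theta
    pz = pz1 if z == 1 else 1.0 - pz1
    a = a1 if z == 1 else a0
    return pz * (a if x == 1 else 1.0 - a)

def em(xs, m_step, theta, iters=100):
    """Generic EM for IID examples, each with one binary hidden variable."""
    for _ in range(iters):
        # E step: q_i(z) = p(z | x_i; theta) for every example i.
        Q = []
        for x in xs:
            q = np.array([joint(x, z, theta) for z in (0, 1)])
            Q.append(q / q.sum())
        # M step: theta := argmax_theta sum_i sum_z q_i(z) log p(x_i, z; theta).
        theta = m_step(xs, np.array(Q))
    return theta

def m_step(xs, Q):
    """Closed-form maximizer for the toy model above."""
    xs = np.array(xs)
    pz1 = Q[:, 1].mean()                          # new p(Z = 1)
    a0 = (Q[:, 0] * xs).sum() / Q[:, 0].sum()     # new p(X = 1 | Z = 0)
    a1 = (Q[:, 1] * xs).sum() / Q[:, 1].sum()     # new p(X = 1 | Z = 1)
    return (pz1, a0, a1)

xs = [1, 1, 0, 1, 0, 0, 0, 1, 1, 1]
print(em(xs, m_step, theta=(0.4, 0.3, 0.6)))
```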
6 EM for Bayesian networks

Let $\theta_0$ be the current estimate of the parameters of a Bayesian network. For training example $t$, let $v_t$ be the observed values of the visible nodes. The M step of EM is to choose new parameter values $\theta$ that maximize
$$F = \sum_t \sum_{h_t} p(h_t \mid v_t; \theta_0) \log p(h_t, v_t; \theta)$$
where the inner sum is over all possible combinations $h_t$ of outcomes of the nodes that are hidden in the $t$th training example. We shall show that instead of summing explicitly over all possible combinations $h_t$, we can have a separate summation for each hidden node. The advantage of this is that separate summations are far more efficient computationally.

By the definition of a Bayesian network,
$$F = \sum_t \sum_{h_t} p(h_t \mid v_t; \theta_0) \log \prod_i p(X_i = x_{ti} \mid \mathrm{pa}(X_i) = \mathrm{pa}_{ti}; \theta)$$
where each $x_{ti}$ and each value in $\mathrm{pa}_{ti}$ is part of either $v_t$ or $h_t$. Converting the log product into a sum of logs, then moving this sum to the outside, gives
$$F = \sum_i \sum_t \sum_{h_t} p(h_t \mid v_t; \theta_0) \log p(x_{ti} \mid \mathrm{pa}_{ti}; \theta).$$
For each $i$, the sum over $h_t$ can be replaced by a sum over the alternative values $x$ of $X_i$ and $\pi$ of the parents of $X_i$, yielding
$$F = \sum_i \sum_t \sum_{x, \pi} p(X_i = x, \mathrm{pa}(X_i) = \pi \mid v_t; \theta_0) \log p(x \mid \pi; \theta).$$
Note that summing over alternative values for $X_i$ and its parents makes sense even if some of these random variables are observed. If $X_i$ happens to be observed for training example $t$, let its observed value be $x_{ti}$. In this case, $p(X_i = x, \mathrm{pa}(X_i) = \pi \mid v_t; \theta_0) = 0$ for all values $x \neq x_{ti}$. A similar observation is true for parents of $X_i$ that are observed.

Changing the order of the sums again gives
$$F = \sum_i \sum_{x, \pi} \left[ \sum_t p(x, \pi \mid v_t; \theta_0) \right] \log p(x \mid \pi; \theta).$$
For comparison, the log likelihood in Equation (1) for the fully observed case can be rewritten as
$$\sum_i \sum_{x, \pi} \left[ \sum_t I(x = x_{ti}, \pi = \mathrm{pa}_{ti}) \right] \log p(x \mid \pi; \theta).$$
The argument following Equation (1) says that the solution that maximizes this expression is
$$p(X_i = x \mid \mathrm{pa}(X_i) = \pi) = \frac{\sum_t I(x = x_{ti}, \pi = \mathrm{pa}_{ti})}{\sum_t I(\pi = \mathrm{pa}_{ti})}.$$
A similar argument can be applied here to give that the solution for the new parameter values $\theta$, in the partially observed case, is
$$p(x \mid \pi; \theta) = p(X_i = x \mid \mathrm{pa}(X_i) = \pi) = \frac{\sum_t p(X_i = x, \mathrm{pa}(X_i) = \pi \mid v_t; \theta_0)}{\sum_t p(\mathrm{pa}(X_i) = \pi \mid v_t; \theta_0)}.$$
To appreciate the meaning of this result, remember that $\theta$ is shorthand for all the parameters of the Bayesian network, that is, all the CPTs of the network. A single one of these parameters is one number in one CPT, written $p(x \mid \pi; \theta)$. In the special case where $X_i$ and its parents are fully observed, their values $x_{ti}$ and $\mathrm{pa}_{ti}$ are part of $v_t$, and $p(X_i = x, \mathrm{pa}(X_i) = \pi \mid v_t; \theta_0) = I(x = x_{ti}, \pi = \mathrm{pa}_{ti})$. The maximum likelihood estimation method for $\theta$ explained at the end of Section 1 above is a special case of the expectation-maximization method described here.
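In code, the update is a ratio of summed posterior marginals. A minimal sketch, assuming the E step (inference in the network under $\theta_0$) has already produced these marginals; the representation is hypothetical:

```python
from collections import defaultdict

def m_step_cpt(posteriors):
    """EM update for one CPT. `posteriors` holds, for each training example t,
    a dict mapping (x, pi) to p(Xi = x, pa(Xi) = pi | v_t; theta_0), as produced
    by inference in the E step. Returns p(x | pi; theta) as a nested dict."""
    num = defaultdict(float)   # sum_t p(Xi = x, pa = pi | v_t)
    den = defaultdict(float)   # sum_t p(pa = pi | v_t)
    for marg in posteriors:
        for (x, pi), p in marg.items():
            num[(x, pi)] += p
            den[pi] += p
    cpt = {}
    for (x, pi), s in num.items():
        cpt.setdefault(pi, {})[x] = s / den[pi]
    return cpt

# Two examples; in the first, Xi is observed, so its marginal is an indicator.
posteriors = [
    {(1, (0,)): 1.0},
    {(0, (0,)): 0.3, (1, (0,)): 0.2, (0, (1,)): 0.1, (1, (1,)): 0.4},
]
print(m_step_cpt(posteriors))
```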
7 Applying EM to modeling language

Section 2 above described $n$-gram models of language. A major issue with these models is that unigram models underfit the available data, while higher-order models tend to overfit. This section shows how to use expectation-maximization to fit a model with intermediate complexity, one that can trade off between underfitting and overfitting.

The central idea is to introduce a hidden random variable called $Z$ between the random variable $W$ for a word and the variable $W'$ for the following word. Specifically, the Bayesian network has edges $W \to Z$ and $Z \to W'$. The alternative values of the variable $Z$ can be any discrete set. Intuitively, each of these values identifies a different possible linguistic context. Each context has a certain probability depending on the previous word, and each following word has a certain probability depending on the context. We can say that the previous word triggers each context with a word-specific probability, while each context suggests following words with word-specific probabilities.

Let the number of alternative contexts be $c$. Marginalizing out the variable $Z$ gives
$$p(w' \mid w) = \sum_{z=1}^{c} p(w' \mid z)\, p(z \mid w).$$
This context model has $m(c-1) + c(m-1)$ parameters where $m$ is the size of the vocabulary. If $c = 1$, the model reduces to the unigram model, while if $c = m$, the model has a quadratic number of parameters, like the bigram model.

End of the lecture on Tuesday November 6.

The following M step derivation is the same as in the quiz. The goal for training is to maximize the log likelihood of the training data, which is $\sum_t \log p(w_t, w'_t)$. (We ignore the complication that training examples are not independent, if they are taken from consecutive text.) Training the model means estimating $p(z \mid w)$ and $p(w' \mid z)$. Consider the former task first. The M step of EM is to perform the update
$$p(Z = z \mid W = w) := \frac{\sum_t p(Z = z, W = w \mid W = w_t, W' = w'_t)}{\sum_t p(W = w \mid W = w_t, W' = w'_t)} = \frac{\sum_t I(w_t = w)\, p(Z = z \mid W = w_t, W' = w'_t)}{\sum_t I(w_t = w)} = \frac{\sum_{t : w_t = w} p(Z = z \mid W = w_t, W' = w'_t)}{\mathrm{count}(W = w)}.$$
This M step is intuitively reasonable. First, the denominator says that the probability of context $z$ given current word $w$ depends only on training examples which have this word. Second, the numerator says that this probability should be high if $z$ is compatible with the following word as well as with the current word.

The E step is to evaluate $p(z \mid w, w')$ for all $z$, for each pair of consecutive words $w$ and $w'$. By Bayes' rule this is
$$p(z \mid w, w') = \frac{p(w' \mid z, w)\, p(z \mid w)}{\sum_{z'} p(w' \mid z', w)\, p(z' \mid w)} = \frac{p(w' \mid z)\, p(z \mid w)}{\sum_{z'} p(w' \mid z')\, p(z' \mid w)}.$$
This result is also intuitively reasonable. It says that the weight of a context $z$ is proportional to its probability given $w$ and to the probability of $w'$ given it.

Finally, consider estimating $p(w' \mid z)$. The M step for this is to perform the update
$$p(W' = w' \mid Z = z) := \frac{\sum_t p(W' = w', Z = z \mid W = w_t, W' = w'_t)}{\sum_t p(Z = z \mid W = w_t, W' = w'_t)} = \frac{\sum_t I(w'_t = w')\, p(Z = z \mid W = w_t, W' = w'_t)}{\sum_t p(Z = z \mid W = w_t, W' = w'_t)} = \frac{\sum_{t : w'_t = w'} p(Z = z \mid W = w_t, W' = w'_t)}{\sum_t p(Z = z \mid W = w_t, W' = w'_t)}.$$
The denominator here says that the update is based on all training examples, but each one is weighted according to the probability of the context $z$. The numerator selects, with the same weights, just those training examples for which the second word is $w'$. The E step is actually the same as above: to evaluate $p(z \mid w, w')$ for all $z$, for each pair of consecutive words $w$ and $w'$.
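Putting the E step and both M steps together gives the following EM loop for the context model (a sketch with hypothetical names; words are coded as integers, and no smoothing is applied):

```python
import numpy as np

def em_context_model(pairs, m, c, iters=20, seed=0):
    """EM for the hidden-context model p(w' | w) = sum_z p(w' | z) p(z | w).
    `pairs` is a list of (w, w2) bigrams with words coded as integers 0..m-1."""
    rng = np.random.default_rng(seed)
    p_z_w = rng.dirichlet(np.ones(c), size=m)   # p(z | w), one row per word w
    p_w_z = rng.dirichlet(np.ones(m), size=c)   # p(w' | z), one row per context z
    for _ in range(iters):
        num_zw = np.zeros((m, c))
        num_wz = np.zeros((c, m))
        for w, w2 in pairs:
            # E step: p(z | w, w') is proportional to p(w' | z) p(z | w).
            q = p_w_z[:, w2] * p_z_w[w, :]
            q = q / q.sum()
            # Accumulate the expected counts that the two M steps need.
            num_zw[w, :] += q
            num_wz[:, w2] += q
        # M steps: normalize the expected counts, as derived above.
        p_z_w = num_zw / num_zw.sum(axis=1, keepdims=True)
        p_w_z = num_wz / num_wz.sum(axis=1, keepdims=True)
    return p_z_w, p_w_z

pairs = [(0, 1), (0, 1), (1, 2), (2, 0), (0, 2)]
p_z_w, p_w_z = em_context_model(pairs, m=3, c=2)
print(p_z_w @ p_w_z)   # the implied table of p(w' | w)
```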
8 Mixture models

Suppose that we have alternative models $p(x; \theta_j)$ for $j = 1$ to $j = k$ that are applicable to the same data points $x$. The linear combination
$$p(x) = \sum_{j=1}^{k} \lambda_j\, p(x; \theta_j)$$
is a valid probability distribution if $\lambda_j \geq 0$ and $\sum_{j=1}^{k} \lambda_j = 1$. The combined model is interesting because it is more flexible than any individual model. It is often called a mixture model with $k$ components, but it can also be called an interpolation model, or a cluster model with $k$ clusters.

We can formulate the task of learning the coefficients from training examples using a Bayesian network that has an observed node $X$, an unobserved node $Z$, and one edge $Z \to X$. The CPT for $Z$ is simply $p(Z = j) = \lambda_j$, while the CPT for $X$ is $p(x \mid z) = p(x; \theta_z)$. The goal is to maximize the log likelihood of training examples $x_1$ to $x_T$. Marginalizing over $Z$, then using the product rule, shows that
$$p(x) = \sum_z p(x, z) = \sum_z p(z)\, p(x \mid z) = \sum_{j=1}^{k} \lambda_j\, p(x; \theta_j)$$
which is the same mixture model.

The CPT of the node $Z$ can be learned using EM. The E step is to compute $p(Z = j \mid x_t)$ for all $j$, for each training example $x_t$. Using Bayes' rule, this is
$$p(Z = j \mid x_t) = \frac{p(x_t \mid Z = j)\, p(Z = j)}{p(x_t)} = \frac{p(x_t; \theta_j)\, \lambda_j}{\sum_{i=1}^{k} \lambda_i\, p(x_t; \theta_i)}.$$
The general M step for Bayesian networks is
$$p(X_i = x \mid \mathrm{pa}_i = \pi) := \frac{\sum_t p(X_i = x, \mathrm{pa}_i = \pi \mid v_t)}{\sum_{x'} \sum_t p(X_i = x', \mathrm{pa}_i = \pi \mid v_t)}.$$
For the application here, $X_i$ is $Z$ and the parents of $X_i$ are the empty set. We get the update
$$p(Z = j) = \lambda_j := \frac{\sum_t p(Z = j \mid x_t)}{\sum_{i=1}^{k} \sum_t p(Z = i \mid x_t)} = \frac{\sum_t p(Z = j \mid x_t)}{T}$$
where $T$ is the number of training examples.

End of the lecture on Thursday November 8.
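A sketch of this EM procedure for learning only the mixing weights, with the component models held fixed (toy Gaussian components via scipy; all names and numbers are illustrative):

```python
import numpy as np
from scipy.stats import norm

def fit_mixture_weights(X, components, iters=100):
    """EM for the weights lambda_j of p(x) = sum_j lambda_j p(x; theta_j),
    holding the component densities fixed. `components` is a list of
    functions, each mapping a data point to its density p(x; theta_j)."""
    k = len(components)
    lam = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E step: responsibilities p(Z = j | x_t) for every example.
        R = np.array([[lam[j] * components[j](x) for j in range(k)] for x in X])
        R = R / R.sum(axis=1, keepdims=True)
        # M step: lambda_j = (1/T) sum_t p(Z = j | x_t).
        lam = R.mean(axis=0)
    return lam

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 700)])
components = [lambda x: norm.pdf(x, -2.0, 1.0), lambda x: norm.pdf(x, 3.0, 1.0)]
print(fit_mixture_weights(X, components))   # roughly [0.3, 0.7]
```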
9 Interpolating language models

As a special case of training a mixture model, consider a linear combination of language models of different orders:
$$p(w_l \mid w_{l-1}, w_{l-2}) = \lambda_1\, p_1(w_l) + \lambda_2\, p_2(w_l \mid w_{l-1}) + \lambda_3\, p_3(w_l \mid w_{l-1}, w_{l-2})$$
where all three component models are trained on the same corpus $A$. What is a principled way to estimate the interpolation weights $\lambda_i$? The first important point is that the weights should be trained using a different corpus, say $C$. Specifically, we can choose the weights to optimize the log likelihood of $C$. If the weights are estimated on $A$, the result will always be $\lambda_n = 1$ and $\lambda_i = 0$ for $i < n$, where $n$ indicates the highest-order model, because this model fits the $A$ corpus the most closely. When testing the final combined model, we must use a third corpus $B$, since the weights will overfit $C$, at least slightly.

We can formulate the task of learning the $\lambda_i$ weights using a Bayesian network. The network has nodes $W_{l-2}$, $W_{l-1}$, $W_l$, and $Z$, with edges $W_{l-2} \to W_l$, $W_{l-1} \to W_l$, and $Z \to W_l$. The CPT for $Z$ is simply $p(Z = i) = \lambda_i$, while the CPT for $W_l$ is
$$p(w_l \mid w_{l-1}, w_{l-2}, z) = \begin{cases} p_1(w_l) & \text{if } z = 1 \\ p_2(w_l \mid w_{l-1}) & \text{if } z = 2 \\ p_3(w_l \mid w_{l-1}, w_{l-2}) & \text{if } z = 3. \end{cases}$$
The goal is to maximize the log likelihood of the tuning corpus $C$. Marginalizing over $Z$, then using the product rule and conditional independence, shows that
$$p(w_l \mid w_{l-1}, w_{l-2}) = \lambda_1\, p_1(w_l) + \lambda_2\, p_2(w_l \mid w_{l-1}) + \lambda_3\, p_3(w_l \mid w_{l-1}, w_{l-2})$$
as above.

To learn values for the parameters $\lambda_i = p(Z = i)$, the E step is to compute the posterior probability $p(Z = i \mid w_l, w_{l-1}, w_{l-2})$. Using Bayes' rule, this is
$$p(Z = i \mid w_l, w_{l-1}, w_{l-2}) = \ldots$$
The M step is to update the $\lambda_i$ values. The general M step for Bayesian networks is
$$p(X_i = x \mid \mathrm{pa}_i = \pi) := \frac{\sum_t p(X_i = x, \mathrm{pa}_i = \pi \mid v_t)}{\sum_{x'} \sum_t p(X_i = x', \mathrm{pa}_i = \pi \mid v_t)}.$$
For the application here, training example number $t$ is the word triple ending in $w_l$, $X_i$ is $Z$, and the parents of $X_i$ are the empty set. We get the update
$$p(Z = i) := \frac{\sum_l p(Z = i \mid w_l, w_{l-1}, w_{l-2})}{\sum_{j=1}^{3} \sum_l p(Z = j \mid w_l, w_{l-1}, w_{l-2})} = \frac{\sum_l p(Z = i \mid w_l, w_{l-1}, w_{l-2})}{L}$$
where $L$ is the number of words in the corpus.
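The elided E step is determined by Bayes' rule exactly as in Sections 7 and 8: $p(Z = i \mid w_l, w_{l-1}, w_{l-2})$ is proportional to $\lambda_i$ times the $i$th component's probability for $w_l$. A sketch of the whole procedure (hypothetical names; the toy component models are placeholders, not trained on any corpus):

```python
import numpy as np

def fit_interpolation_weights(triples, p1, p2, p3, iters=100):
    """EM for lambda_1..lambda_3 on a tuning corpus C. `triples` lists
    (w_l, w_{l-1}, w_{l-2}); p1, p2, p3 are the fixed component models."""
    lam = np.array([1/3, 1/3, 1/3])
    for _ in range(iters):
        post_sum = np.zeros(3)
        for w, w1, w2 in triples:
            # E step (Bayes' rule): p(Z = i | w, w1, w2) is proportional
            # to lambda_i times the i-th component probability of w.
            q = lam * np.array([p1(w), p2(w, w1), p3(w, w1, w2)])
            post_sum += q / q.sum()
        # M step: lambda_i = (1/L) sum_l p(Z = i | w_l, w_{l-1}, w_{l-2}).
        lam = post_sum / len(triples)
    return lam

# Toy component models over a two-word vocabulary (made-up numbers):
p1 = lambda w: 0.5
p2 = lambda w, w1: 0.7 if w == w1 else 0.3
p3 = lambda w, w1, w2: 0.8 if w == w2 else 0.2
triples = [(0, 0, 0), (1, 1, 0), (0, 1, 0), (1, 0, 1)]
print(fit_interpolation_weights(triples, p1, p2, p3))
```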