ONLINE BAYESIAN KERNEL REGRESSION FROM NONLINEAR MAPPING OF OBSERVATIONS

Matthieu Geist 1,2, Olivier Pietquin 1 and Gabriel Fricout 2
1 SUPELEC, IMS Research Group, Metz, France
2 ArcelorMittal Research, MCE Department, Maizières-lès-Metz, France

ABSTRACT

In a large number of applications, engineers have to estimate a function linked to the state of a dynamic system. To do so, a sequence of samples drawn from this unknown function is observed while the system transits from state to state, and the problem is to generalize these observations to unvisited states. Several solutions can be envisioned, among which regressing a family of parameterized functions so as to fit the observed samples as well as possible. However, classical methods cannot handle the case where the actual samples are not directly observable and only a nonlinear mapping of them is available, which happens when a special sensor has to be used or when solving the Bellman equation in order to control the system. This paper introduces a method based on Bayesian filtering and kernel machines designed to solve this tricky problem. First experimental results are promising.

1. INTRODUCTION

In a large number of applications, engineers have to estimate a function linked to the state of a dynamic system. This function generally qualifies the system (e.g. the magnitude of the electromagnetic field in a wifi network according to the power emitted by each antenna in the network). To do so, a sequence of samples drawn from this unknown function is observed while the system transits from state to state, and the problem is to generalize these observations to unvisited states. Several solutions can be envisioned, among which regressing a family of parameterized functions so as to fit the observed samples as well as possible. This is a well-known problem that is addressed by many standard methods in the literature, such as kernel machines [1] or neural networks [2]. Yet several problems are not usually handled by standard techniques. For instance, the actual samples are sometimes not directly observable and only a nonlinear mapping of them is available. This is the case when a special sensor has to be used (e.g. measuring a temperature with a spectrometer or a thermocouple). This is also the case when solving the Bellman equation in a Markovian decision process with unknown deterministic transitions [3]. This is important for (asynchronous and online) dynamic programming, and more generally for control theory. Another problem is to handle online regression, that is, to incrementally improve the regression results as new samples are observed by recursively updating previously computed parameters. In this paper we propose a method, based on the Bayesian filtering paradigm and on kernel machines, for online regression from nonlinear mappings of observations. We have chosen here to describe a quite general formulation of the problem in order to appeal to a broader audience. Indeed, the technique introduced below handles nonlinearities well in a derivative-free way, and it should be useful in other fields.

1.1. Problem statement

Throughout the rest of this paper, x will denote a column vector and x a scalar. Let x = (x^1, x^2) ∈ X = X_1 × X_2, where X_1 (resp. X_2) is a compact set of R^n (resp. R^m). Let t : X_1 × X_2 → X_1 be a nonlinear transformation (transitions in the case of dynamic systems) which will be observed. Let g be a nonlinear mapping such that g : f ∈ R^X ↦ g_f ∈ R^{X×X_1}. The aim here is to approximate sequentially the nonlinear function f : x ∈ X ↦ f(x) = f(x^1, x^2) ∈ R, from samples (x_k, t_k = t(x^1_k, x^2_k), y_k = g_f(x^1_k, x^2_k, t_k))_k, by a function f̂_θ(x) = f̂_θ(x^1, x^2) parameterized by the vector θ.
Here the output is scalar; however, the proposed method can be straightforwardly extended to vectorial outputs. Note that, as one can associate to any g_f ∈ R^{X×X_1} the function g̃_f ∈ R^X defined by g̃_f : x ∈ X ↦ g_f(x, t(x)), this problem statement is quite general. Thus the work presented in this paper can also be considered with g : f ∈ R^X ↦ g_f ∈ R^X (g being known analytically), this case being more specific. The interest of this particular formulation is that one part of the nonlinearities has to be known analytically (the mapping g), whereas the other part can simply be observed (the t function).
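To make the setting concrete, here is a small, purely illustrative instance of the problem statement with made-up choices of f, t and g (none of them taken from the paper): the regressor only ever sees the input x_k, the transformed point t_k and the mapped value y_k, never f(x_k) itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ingredients, for illustration only: X_1 = X_2 = [-1, 1].
f = lambda x1, x2: np.sin(3 * x1) + x1 * x2              # unknown function to be regressed
t = lambda x1, x2: np.tanh(x1 + x2)                      # observed nonlinear transformation t : X_1 x X_2 -> X_1
g_f = lambda x1, x2, tk: np.exp(f(x1, x2)) + f(tk, x2)   # nonlinear mapping of f (g is known, f is not)

def sample():
    """Draw one training triple (x_k, t_k, y_k); note that f(x_k) itself is never observed."""
    x1, x2 = rng.uniform(-1.0, 1.0, size=2)
    tk = t(x1, x2)
    return (x1, x2), tk, g_f(x1, x2, tk)

print(sample())
```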

1.2. Outline

A kernel-based regression is used, namely the approximation is of the form f̂_θ(x) = Σ_{i=1}^p α_i K(x, x_i), where K is a kernel, that is a continuous, symmetric and positive semi-definite function. The parameter vector θ contains the weights (α_i)_i, and possibly the centers (x_i)_i and some parameters of the kernel (e.g. the variance for Gaussian kernels). These methods rely on the Mercer theorem [4], which states that each kernel is a dot product in a higher dimensional space. More precisely, for each kernel K, there exists a mapping φ : X → F (F being called the feature space) such that ∀x, y ∈ X, K(x, y) = ⟨φ(x), φ(y)⟩. Thus, any linear regression algorithm which only uses dot products can be cast by this kernel trick into a nonlinear one by implicitly mapping the original space X to a higher dimensional one. Many approaches to kernel regression can be found in the literature, the most classical being the Support Vector Machines (SVM) framework [4]. There are quite fewer Bayesian approaches; nonetheless, the reader can refer to [5] or [6] for interesting examples. To our knowledge, none of them is designed to handle the regression problem described in this paper.

As a preprocessing step, a prior on the kernel to be used is put (e.g. a Gaussian kernel with a specific width), and a dictionary method [7] is used to compute an approximate basis of φ(X). This gives a fairly sparse and noise-independent initialisation for the number of kernels to be used and their centers. Following [8], the regression problem is cast into the state-space representation

  θ_{k+1} = θ_k + v_k
  y_k = g_{f̂_{θ_k}}(x^1_k, x^2_k, t_k) + n_k        (1)

where v_k is an artificial process noise and n_k is an observation noise. A Sigma Point Kalman Filter (SPKF) [9] is used to sequentially estimate the parameters, as the samples (x_k, t(x_k), y_k)_k are observed.

In the following sections, the SPKF approach and the dictionary method will first be briefly presented. The proposed algorithm, which is based on the two aforementioned methods, will then be exhibited, and first experimental results will be shown. Eventually, future works will be sketched.

2. BACKGROUND

2.1. Bayesian Filtering Paradigm

The problem of Bayesian filtering can be expressed in its state-space formulation:

  s_{k+1} = ψ_k(s_k, v_k)
  y_k = h_k(s_k, n_k)        (2)

The objective is to sequentially infer E[s_k | y_{1:k}], the expectation of the hidden state s_k given the sequence of observations up to time k, y_{1:k}, knowing that the state evolution is driven by the possibly nonlinear mapping ψ_k and the process noise v_k. The observation y_k, through the possibly nonlinear mapping h_k, is a function of the state s_k, corrupted by an observation noise n_k. If the mappings are linear and if the noises v_k and n_k are additive Gaussian noises, the optimal solution is given by the Kalman filter [10]: quantities of interest are random variables, and inference (that is, prediction of these quantities and correction of them given a new observation) is done online by propagating sufficient statistics through linear transformations. If one of these two hypotheses does not hold, approximate solutions exist. See [11] for a survey on Bayesian filtering.
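For reference, the linear-Gaussian special case mentioned above can be written in a few lines: when ψ_k and h_k are linear and the noises are additive Gaussian, the sufficient statistics (mean and covariance) are propagated exactly. This is only a reminder sketch of the classical Kalman recursion [10], with generic matrices F and H chosen for the example.

```python
import numpy as np

def kalman_step(s, P, y, F, H, Q, R):
    """One predict/correct cycle for the linear-Gaussian model
    s_{k+1} = F s_k + v_k,  y_k = H s_k + n_k,  v ~ N(0, Q), n ~ N(0, R)."""
    # Prediction: propagate mean and covariance through the (linear) evolution
    s_pred = F @ s
    P_pred = F @ P @ F.T + Q
    # Correction: update the sufficient statistics with the new observation y
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    s_new = s_pred + K @ (y - H @ s_pred)
    P_new = P_pred - K @ S @ K.T
    return s_new, P_new

# Tiny usage example with a 2-d state and a scalar observation
F = np.eye(2); H = np.array([[1.0, 0.5]])
Q = 0.01 * np.eye(2); R = np.array([[0.1]])
s, P = np.zeros(2), np.eye(2)
s, P = kalman_step(s, P, np.array([1.2]), F, H, Q, R)
print(s)
```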
The Sigma Point framework [9] is a nonlinear extension of Kalman filtering (random noises are still Gaussian). The basic idea is that it is easier to approximate a probability distribution than an arbitrary nonlinear function. Recall that in the Kalman filter framework, basically linear transformations are applied to sufficient statistics. In the Extended Kalman Filter (EKF) approach, nonlinear mappings are linearized. In the Sigma Point approach, a set of so-called sigma points is deterministically computed using the estimates of the mean and covariance of the random variable of interest. These points are representative of the current distribution. Then, instead of computing a linear (or linearized) mapping of the distribution of interest, one calculates a nonlinear mapping of these sigma points, and uses them to compute the sufficient statistics of interest for the prediction and correction equations, that is, to approximate the following distributions:

  p(s_k | y_{1:k-1}) = ∫_S p(s_k | s_{k-1}) p(s_{k-1} | y_{1:k-1}) ds_{k-1}
  p(s_k | y_{1:k}) = p(y_k | s_k) p(s_k | y_{1:k-1}) / ∫_S p(y_k | s_k) p(s_k | y_{1:k-1}) ds_k

Note that the SPKF and classical Kalman equations are very similar. The major change is how the sufficient statistics are computed (directly for Kalman, through sigma-point computation for the SPKF). Table 1 sketches an SPKF update in the case of additive noise, based on the state-space formulation, and using the standard Kalman notations: s_{k|k-1} denotes a prediction, s_{k|k} an estimate (or correction), P_{s_k,y_k} a covariance matrix, n̄_k a mean, and k is the discrete time index. The reader can refer to [9] for details.

Table 1. SPKF Update
  inputs: s̄_{k-1|k-1}, P_{s_{k-1|k-1}}
  outputs: s̄_{k|k}, P_{s_{k|k}}
  Sigma-point computation:
    compute deterministically the sigma-point set S_{k-1|k-1} from s̄_{k-1|k-1} and P_{s_{k-1|k-1}}
  Prediction step:
    compute S_{k|k-1} from ψ_k(S_{k-1|k-1}, v̄_k) and the process noise covariance
    compute s̄_{k|k-1} and P_{s_{k|k-1}} from S_{k|k-1}
  Correction step:
    observe y_k
    Y_{k|k-1} = h_k(S_{k|k-1}, n̄_k)
    compute ȳ_{k|k-1}, P_{y_{k|k-1}} and P_{s_{k|k-1},y_{k|k-1}} from S_{k|k-1}, Y_{k|k-1} and the observation noise covariance
    K_k = P_{s_{k|k-1},y_{k|k-1}} P_{y_{k|k-1}}^{-1}
    s̄_{k|k} = s̄_{k|k-1} + K_k (y_k − ȳ_{k|k-1})
    P_{s_{k|k}} = P_{s_{k|k-1}} − K_k P_{y_{k|k-1}} K_k^T
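To make the correction step of Table 1 more concrete, here is a rough sketch of a sigma-point measurement update specialized to the parameter-estimation case used later (identity evolution, additive noises, scalar observation). It follows a generic unscented recipe with a simple weight choice, not the exact SR-CDKF implementation of [9]; the callable h stands for the observation mapping (g_{f̂_θ} in this paper) and is an assumption of the example.

```python
import numpy as np

def sigma_points(mean, cov, kappa):
    """Deterministic sigma-point set and weights for a Gaussian N(mean, cov)."""
    q = mean.size
    S = np.linalg.cholesky((q + kappa) * cov)            # scaled matrix square root
    pts = [mean] + [mean + S[:, i] for i in range(q)] + [mean - S[:, i] for i in range(q)]
    w = np.full(2 * q + 1, 1.0 / (2.0 * (q + kappa)))
    w[0] = kappa / (q + kappa)
    return np.array(pts), w

def spkf_parameter_update(theta, P, x, y, h, R_v, R_n, kappa=1.0):
    """One sigma-point update for theta_{k+1} = theta_k + v_k, y_k = h(theta_k, x_k) + n_k
    (identity evolution, additive noises, scalar observation)."""
    P_pred = P + R_v                                     # prediction step
    pts, w = sigma_points(theta, P_pred, kappa)
    Y = np.array([h(p, x) for p in pts])                 # propagate sigma points through h
    y_pred = w @ Y                                       # predicted observation
    P_yy = w @ (Y - y_pred) ** 2 + R_n                   # innovation variance
    P_ty = (pts - theta).T @ (w * (Y - y_pred))          # parameter/observation cross-covariance
    K = P_ty / P_yy                                      # Kalman gain
    theta_new = theta + K * (y - y_pred)                 # correction of the mean
    P_new = P_pred - np.outer(K, K) * P_yy               # correction of the covariance
    return theta_new, P_new, K

# Tiny usage example: estimate the single weight of a fixed Gaussian kernel
h = lambda th, x: th[0] * np.exp(-x ** 2 / 2.0)
theta, P = np.zeros(1), np.eye(1)
theta, P, K = spkf_parameter_update(theta, P, x=0.5, y=1.0, h=h, R_v=1e-3 * np.eye(1), R_n=0.05 ** 2)
print(theta)
```

A square-root, central-difference variant of this update (the SR-CDKF used later in the paper) avoids manipulating the full covariance explicitly and is computationally cheaper.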

2.2. Dictionary method

As said in section 1, the kernel trick corresponds to a dot product in a higher dimensional space F, associated with a mapping φ. By observing that, although F is a (very) high dimensional space, φ(X) can be a quite smaller embedding, the objective is to find a set of p points in X such that φ(X) ≈ Span{φ(x̃_1), ..., φ(x̃_p)}. This procedure is iterative. Suppose that samples x_1, x_2, ... are sequentially observed. At time k, a dictionary D_{k-1} = (x̃_j)_{j=1}^{m_{k-1}} ⊂ (x_j)_{j=1}^{k-1} of m_{k-1} elements is available, where by construction the feature vectors φ(x̃_j) are approximately linearly independent in F. A sample x_k is then observed, and it is added to the dictionary if φ(x_k) is linearly independent of D_{k-1}. To test this, weights a = (a_1, ..., a_{m_{k-1}})^T have to be computed so as to verify

  δ_k = min_{a ∈ R^{m_{k-1}}} || Σ_{j=1}^{m_{k-1}} a_j φ(x̃_j) − φ(x_k) ||²        (3)

Formally, if δ_k = 0 then the feature vectors are linearly dependent, otherwise not. Practically, an approximate dependence is allowed, and δ_k is compared to a predefined threshold ν determining the quality of the approximation (and consequently the sparsity of the dictionary). Thus the feature vectors will be considered as approximately linearly dependent if δ_k ≤ ν. By using the kernel trick and the bilinearity of dot products, equation (3) can be rewritten as

  δ_k = min_a { a^T K̃_{k-1} a − 2 a^T k̃_{k-1}(x_k) + K(x_k, x_k) }        (4)

where (K̃_{k-1})_{i,j} = K(x̃_i, x̃_j) is an m_{k-1} × m_{k-1} matrix and (k̃_{k-1}(x))_i = K(x, x̃_i) is an m_{k-1} × 1 vector. If δ_k > ν, x_k = x̃_{m_k} is added to the dictionary, otherwise not. Equation (4) admits the analytical solution a_k = K̃_{k-1}^{-1} k̃_{k-1}(x_k). Moreover, there exists a computationally efficient algorithm which uses the partitioned matrix inversion formula to construct this dictionary. See [7] for details.

Thus, by choosing a prior on the kernel to be used, and by applying this algorithm to a set of N points randomly sampled from X, a sparse set of good candidates for the kernel regression problem is obtained. This method is theoretically well founded, easy to implement, computationally efficient, and it depends neither on the kernel nor on the topology of the space. Note that, despite the fact that this algorithm is naturally online, this dictionary cannot be built (straightforwardly) while estimating the parameters, since the parameters of the chosen kernels (such as mean and deviation for Gaussian kernels) will be parameterized as well. Observe that only bounds on X have to be known in order to compute this dictionary, and not the samples used for regression.
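The sparsification test (3)-(4) translates almost directly into code. The sketch below is a hedged re-implementation of the dictionary construction of [7], growing K̃^{-1} with the partitioned matrix inversion formula; the kernel is passed as a generic callable, and the settings of the usage example (N = 1000 points, width 4.4, ν = 0.1) simply mirror those used in the experiments of Section 4.

```python
import numpy as np

def build_dictionary(samples, kernel, nu):
    """Greedy ALD sparsification: keep a point if its feature vector is not approximately
    spanned by the feature vectors of the current dictionary (delta_k > nu)."""
    D = [samples[0]]
    K_inv = np.array([[1.0 / kernel(samples[0], samples[0])]])   # inverse of the 1x1 kernel matrix
    for x in samples[1:]:
        k_vec = np.array([kernel(x, d) for d in D])              # vector k~_{k-1}(x)
        a = K_inv @ k_vec                                        # optimal expansion weights of Eq. (4)
        delta = kernel(x, x) - k_vec @ a                         # residual delta_k
        if delta > nu:                                           # approximately independent: add it
            D.append(x)
            # grow K~^{-1} with the partitioned matrix inversion formula
            K_inv = np.block([
                [delta * K_inv + np.outer(a, a), -a[:, None]],
                [-a[None, :],                    np.ones((1, 1))],
            ]) / delta
    return D

# Usage mirroring the preprocessing of Section 4: Gaussian kernel, width 4.4, nu = 0.1
rng = np.random.default_rng(0)
points = rng.uniform(-10.0, 10.0, size=(1000, 2))
gauss = lambda u, v: float(np.exp(-np.sum((u - v) ** 2) / (2.0 * 4.4 ** 2)))
D = build_dictionary(points, gauss, nu=0.1)
print(len(D))   # number of kernel centers retained
```

The threshold ν directly trades the sparsity of the dictionary against the quality of the feature-space approximation.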
3. PROPOSED ALGORITHM

Recall that the objective here is to sequentially approximate a nonlinear function, as samples (x_k, t_k = t(x^1_k, x^2_k), y_k) become available, with a set of kernel functions. This parameter estimation problem can be expressed as the state-space problem (1). Note that here f does not depend on time, but we think that this approach can easily be extended to nonstationary function approximation. In this paper the approximation is of the form

  f̂_θ(x) = Σ_{i=1}^p α_i K_{σ_i^1}(x^1, x_i^1) K_{σ_i^2}(x^2, x_i^2),        (5)

with θ = [(α_i)_{i=1}^p, ((x_i^1)^T)_{i=1}^p, (σ_i^1)_{i=1}^p, ((x_i^2)^T)_{i=1}^p, (σ_i^2)_{i=1}^p]^T and K_{σ_i^j}(x^j, x_i^j) = exp(−||x^j − x_i^j||² / (2(σ_i^j)²)), j = 1, 2. Note that K_{σ_i}(x, x_i) = K_{(σ_i^1,σ_i^2)}((x^1, x^2), (x_i^1, x_i^2)) = K_{σ_i^1}(x^1, x_i^1) K_{σ_i^2}(x^2, x_i^2) is a product of kernels, thus it is a kernel.

The optimal number of kernels and a good initialisation for the parameters have to be determined first. To tackle this initial difficulty, the dictionary method is used in a preprocessing step. A prior σ_0 = (σ_0^1, σ_0^2) on the Gaussian widths is first put (the same for each kernel). Then a set of N random points is sampled uniformly from X. Finally, these samples are used to compute the dictionary. A set of p points D = {x̃_1, ..., x̃_p} is thus obtained such that φ_{σ_0}(X) ≈ Span{φ_{σ_0}(x̃_1), ..., φ_{σ_0}(x̃_p)}, where φ_{σ_0} is the mapping corresponding to K_{σ_0}. Note that even though this sparsification procedure is offline, the proposed algorithm (the regression part) is online. Moreover, no training sample is required for this preprocessing step, but only a classical prior which is anyway required for the Bayesian filter (σ_0), one sparsification parameter ν and bounds for X.
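As an illustration of the parameterization (5), the following sketch evaluates f̂_θ for a parameter vector packed as [weights, centers x^1, widths σ^1, centers x^2, widths σ^2]; this packing order (and the helper name) is an assumption made for the example, the text only specifies that θ contains these (n + m + 3)p quantities.

```python
import numpy as np

def f_hat(theta, x1, x2, p, n, m):
    """Evaluate Eq. (5): a weighted sum of p products of Gaussian kernels, one per input block.
    theta packs, in order: p weights, p centers in R^n, p widths sigma^1, p centers in R^m,
    p widths sigma^2, so that len(theta) = (n + m + 3) * p."""
    i = 0
    alpha = theta[i:i + p];                  i += p
    c1 = theta[i:i + p * n].reshape(p, n);   i += p * n
    s1 = theta[i:i + p];                     i += p
    c2 = theta[i:i + p * m].reshape(p, m);   i += p * m
    s2 = theta[i:i + p]
    k1 = np.exp(-np.sum((x1 - c1) ** 2, axis=1) / (2.0 * s1 ** 2))
    k2 = np.exp(-np.sum((x2 - c2) ** 2, axis=1) / (2.0 * s2 ** 2))
    return float(alpha @ (k1 * k2))

# Example with p = 2 kernels and scalar inputs (n = m = 1)
theta = np.array([1.0, -0.5,  0.0, 2.0,  1.5, 1.5,  0.0, 1.0,  2.0, 2.0])
print(f_hat(theta, np.array([0.5]), np.array([0.3]), p=2, n=1, m=1))
```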

A SPKF is then used to estimate the parameters. As for any Bayesian approach, an initial prior on the parameter distribution has to be put. The initial parameter vector follows a Gaussian law, that is θ_0 ∼ N(θ̄_0, Σ_{θ_0}), where

  θ̄_0 = [α_0, ..., α_0, D^1, σ_0^1, ..., σ_0^1, D^2, σ_0^2, ..., σ_0^2]^T,
  Σ_{θ_0} = diag(σ_{α_0}², ..., σ_{μ_0^1}², ..., σ_{σ_0^1}², ..., σ_{μ_0^2}², ..., σ_{σ_0^2}², ...).

In these equations, α_0 is the prior mean on the kernel weights, D^j = [(x̃_1^j)^T, ..., (x̃_p^j)^T] is derived from the dictionary computed in the preprocessing step, σ_0^j is the prior mean on the kernel deviations, and σ_{α_0}², σ_{μ_0^j}² (which is a row vector with one variance for each component of x^j) and σ_{σ_0^j}² are respectively the prior variances on kernel weights, centers and deviations. All these parameters (except the dictionary) have to be set up beforehand. Let |θ| = q = (n + m + 3)p; note that θ̄_0 ∈ R^q and Σ_{θ_0} ∈ R^{q×q}. A prior on the noises has to be put too: more precisely, v_0 ∼ N(0, R_{v_0}) where R_{v_0} = σ_{v_0}² I_q, I_q being the identity matrix, and n ∼ N(0, R_n) where R_n = σ_n². Note that here the observation noise is also structural; in other words, it models the ability of the parameterization f̂_θ to approximate the true function of interest f. Then an SPKF update (which updates the sufficient statistics of the parameters) is simply applied at each time step, as a new training sample (x_k, t_k, y_k) becomes available.

A specific form of SPKF is used, the so-called Square-Root Central Difference Kalman Filter (SR-CDKF) in its parameter estimation form. This implementation uses the fact that, for this parameter estimation problem, the evolution equation is linear and the noise is additive. This approach is computationally cheaper: the complexity per iteration is O(|θ|²), whereas it is O(|θ|³) in the general case. See [9] for details.

A last issue is to choose the artificial process noise. Formally, since the target function is stationary, there is no process noise. But the parameter space has to be explored in order to find a good solution, that is, an artificial process noise must be introduced. Choosing this noise is still an open research problem. Following [9], a Robbins-Monro stochastic approximation scheme for estimating the innovation is used. That is, the process noise covariance is set to

  R_{v_k} = (1 − α) R_{v_{k-1}} + α K_k (y_k − g_{f̂_{θ_{k-1}}}(x_k, t_k))² K_k^T.

Here α is a forgetting factor set by the user, and K_k is the Kalman gain obtained during the SR-CDKF update. This approach is summarized in Table 2.

Table 2. Proposed algorithm
  inputs: ν, N, α_0, σ_0^j, σ_{α_0}, σ_{σ_0^j}, σ_{μ_0^j}, σ_{v_0}, σ_n
  outputs: θ̄, Σ_θ
  Compute dictionary:
    ∀i ∈ {1 ... N}, x_i ∼ U_X
    set X̃ = {x_1, ..., x_N}
    D = Compute-Dictionary(X̃, ν, σ_0)
  Initialization:
    initialize θ̄_0, Σ_{θ_0}, R_n, R_{v_0}
  for k = 1, 2, ... do
    observe (x_k, t_k, y_k)
    SR-CDKF update:
      [θ̄_k, Σ_{θ_k}, K_k] = SR-CDKF-Update(θ̄_{k-1}, Σ_{θ_{k-1}}, x_k, t_k, y_k, R_{v_{k-1}}, R_n)
    Artificial process noise update:
      R_{v_k} = (1 − α) R_{v_{k-1}} + α K_k (y_k − g_{f̂_{θ_{k-1}}}(x_k, t_k))² K_k^T
  end for
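The artificial process-noise recursion above is simple enough to state directly in code; the sketch below is a minimal stand-alone version of that single update (the Kalman gain K and the innovation are assumed to come from the SR-CDKF update, which is not reproduced here).

```python
import numpy as np

def update_process_noise(R_v, K, innovation, alpha):
    """Robbins-Monro adaptation of the artificial process-noise covariance:
    R_v_k = (1 - alpha) * R_v_{k-1} + alpha * K_k * (y_k - y_hat_k)^2 * K_k^T."""
    return (1.0 - alpha) * R_v + alpha * (innovation ** 2) * np.outer(K, K)

# Example with q = 3 parameters and the forgetting factor alpha = 0.7 used in the experiments
R_v = 0.1 * np.eye(3)
R_v = update_process_noise(R_v, K=np.array([0.2, -0.1, 0.05]), innovation=0.3, alpha=0.7)
print(R_v)
```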
4. PRELIMINARY RESULTS

In this section the results of our preliminary experiments are presented. The proposed artificial problem is quite arbitrary; it has been chosen for its interesting nonlinearities so as to emphasize the potential benefits of the proposed approach. More results focusing on the application of this method to reinforcement learning are presented in [12].

Let X = [−10, 10]² and let f be a 2d nonlinear cardinal sine:

  f(x^1, x^2) = sin(x^1)/x^1 + x^1 x^2 / 10².

Let the observed nonlinear transformation be

  t(x^1, x^2) = 10 tanh(x^1/7) + sin(x^2).

Recall that this analytical expression is only used to generate the observations in this experiment; it is not used in the regression algorithm. Let the nonlinear mapping g be

  g_f(x^1, x^2, t(x^1, x^2)) = f(x^1, x^2) − γ max_{z∈X_2} f(t(x^1, x^2), z),

with γ ∈ [0, 1) a predefined constant. This is a specific form of the Bellman equation [3], with the transformation t being a deterministic transition function. Recall that this function is of special interest in optimal control theory. The associated state-space formulation is thus

  θ_{k+1} = θ_k + v_k
  y_k = f̂_{θ_k}(x^1_k, x^2_k) − γ max_{z∈X_2} f̂_{θ_k}(t(x^1_k, x^2_k), z) + n_k        (6)

4.1. Problem statement and settings

At each time step k, the filter observes x_k ∼ U_X (x_k uniformly sampled from X), the transformation t_k = t(x^1_k, x^2_k) and y_k = f(x_k) − γ max_{z∈X_2} f(t_k, z). Once again, the regressor does not have to know the function t analytically; it just has to observe it. The function f is shown on Fig. 1(a) and its nonlinear mapping on Fig. 1(b). For these experiments the algorithm parameters were set to N = 1000, σ_0^j = 4.4, ν = 0.1, α_0 = 0, σ_n = 0.05, α = 0.7, and all variances (σ_{α_0}², σ_{μ_0^j}², σ_{σ_0^j}², σ_{v_0}²) to 0.1, for j = 1, 2. The factor γ was set to 0.9. Note that these empirical parameters were not finely tuned.
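A sketch of how the training triples of this experiment can be generated: the functions f and t are those given above, and the max over z ∈ X_2 is approximated on a finite grid, which is an implementation choice of this example rather than something specified in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9
z_grid = np.linspace(-10.0, 10.0, 201)                   # grid approximation of the max over X_2

def f(x1, x2):
    """2d nonlinear cardinal sine, sin(x1)/x1 + x1*x2/10^2 (np.sinc(u) = sin(pi u)/(pi u))."""
    return np.sinc(x1 / np.pi) + x1 * x2 / 100.0

def t(x1, x2):
    """Observed nonlinear transformation."""
    return 10.0 * np.tanh(x1 / 7.0) + np.sin(x2)

def sample():
    """One training triple (x_k, t_k, y_k), with y_k = f(x_k) - gamma * max_z f(t_k, z)."""
    x1, x2 = rng.uniform(-10.0, 10.0, size=2)
    tk = t(x1, x2)
    yk = f(x1, x2) - gamma * np.max(f(tk, z_grid))
    return (x1, x2), tk, yk

print(sample())
```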

Fig. 1. 2-dimensional nonlinear cardinal sine and its observed nonlinear mapping: (a) the 2-dimensional nonlinear cardinal sine; (b) the nonlinear mapping observed by the regressor.

4.2. Quality of regression

Because of the special form of g_f, any bias added to f still gives the same observations, and other invariances may exist: the root mean square error (RMSE) between f and f̂ should therefore not be used to measure the quality of regression. Instead, the nonlinear mapping of f is computed from its estimate f̂ and is then used to calculate the RMSE. The quality of regression is thus measured with the following RMSE:

  ( ∫_X (g_f(x, t(x)) − g_{f̂_θ}(x, t(x)))² dx )^{1/2}.

As it is computed from f̂, it really measures the quality of regression, and as it uses the associated nonlinear mapping, it does not take the bias into account. Practically, it is computed on 10^4 equally spaced points. The function g_f is what is observed (Fig. 1(b)), and it is used by the SPKF to approximate the function f (Fig. 1(a)) by f̂_θ (Fig. 2(a)). We compute g_{f̂_θ} from f̂_θ (Fig. 2(b)) and use it to compute the RMSE.

Fig. 2. Approximation of f(x) and associated approximated nonlinear mapping: (a) approximation of f(x); (b) nonlinear mapping calculated from f̂_θ(x).

As the proposed algorithm is stochastic, the results have been averaged over 100 runs. Fig. 3 shows an errorbar plot, that is, the mean ± standard deviation of the RMSE. The average number of kernels was ±.

Fig. 3. RMSE (g_{f̂_θ}) averaged over 100 runs.

These results can be compared with the RMSE directly computed from f̂, that is ( ∫_{x∈X} (f(x) − f̂_θ(x))² dx )^{1/2}, which is illustrated in Table 3. There is more variance and bias in these results because of the possible invariances of the considered nonlinear mapping. However, one can observe on Fig. 2(a) that a good approximation is obtained.

Table 3. RMSE (mean ± deviation) of g_{f̂_θ} and f̂_θ as a function of the number of samples.
  g_{f̂_θ}:   ±      ±      ±
  f̂_θ:       ±      ±      ±

Thus the RMSE computed from g_f is a better quality measure. As far as we know, no other method handling such a problem has been published so far, which makes comparison with other approaches difficult. However, the order of magnitude of the RMSE obtained from g_f is comparable with state-of-the-art regression approaches when the function of interest is directly observed. This is demonstrated in Table 4. Here the problem is to regress a linear 2d cardinal sine f(x^1, x^2) = sin(x^1)/x^1 + x^2/2. For the proposed algorithm, the nonlinear mapping is the same as considered before. We compare the RMSE results (10^3 training points) to Kernel Recursive Least Squares (KRLS) and Support Vector Regression (SVR). For the proposed approach, the approximation is computed from the nonlinear mapping of observations; for KRLS and SVR, it is computed from direct observations. Notice that SVR is a batch method, that is, it needs the whole set of samples in advance. The target of the regressor is indeed g_f, and we obtain the same order of magnitude (even though many more nonlinearities are introduced for our method and the representation is sparser). The RMSE on f̂ is slightly higher for this contribution, but this can mostly be explained by the invariances (such as the bias invariance) induced by the nonlinear mapping.

Table 4. Comparative results (KRLS and SVR results are reproduced from [7]). The first columns give the test error (RMSE from f̂ and from g_{f̂} for the proposed algorithm, from f̂ for the two others), and the last one the percentage of dictionary/support vectors from the training samples used.
  Method          test error (f̂ / g_{f̂})    %DVs/SVs
  Proposed Alg.                                  %
  KRLS                                           %
  SVR                                            %
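A sketch of the quality measure of Section 4.2: the RMSE is computed between g_f and g_{f̂} on a regular grid of 10^4 points, rather than between f and f̂. The stand-in estimate f̂ below is just a biased copy of f, chosen only to show that the mapped RMSE is far less sensitive to a constant bias than the direct one.

```python
import numpy as np

gamma = 0.9
grid = np.linspace(-10.0, 10.0, 100)                     # 100 x 100 = 10^4 evaluation points
z_grid = np.linspace(-10.0, 10.0, 201)

f = lambda x1, x2: np.sinc(x1 / np.pi) + x1 * x2 / 100.0     # true function
f_hat = lambda x1, x2: f(x1, x2) + 0.5                        # stand-in estimate: a biased copy of f
t = lambda x1, x2: 10.0 * np.tanh(x1 / 7.0) + np.sin(x2)

def g(func, x1, x2):
    """Bellman-like mapping g_func(x, t(x)) = func(x) - gamma * max_z func(t(x), z)."""
    tk = t(x1, x2)
    return func(x1, x2) - gamma * np.max(func(tk, z_grid))

# RMSE computed through the nonlinear mapping
err2 = [(g(f, a, b) - g(f_hat, a, b)) ** 2 for a in grid for b in grid]
print(np.sqrt(np.mean(err2)))   # about 0.05 here, versus a direct RMSE of 0.5 between f and f_hat
```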

5. CONCLUSIONS AND FUTURE WORKS

An approach allowing to regress a function of interest f from observations obtained through a nonlinear mapping of it has been proposed. The regression is online, kernel-based, nonlinear and Bayesian. A preprocessing step allows to achieve sparsity of the representation. This method has proven to be effective on an artificial problem and reaches good performance although many more nonlinearities are introduced. This approach is planned to be extended to stochastic transformation functions (the observed part of the nonlinearities) in order to solve a more general form of the Bellman equation. The adaptive artificial process noise is planned to be extended, and an adaptive observation noise is under study. Finally, the approximation f̂_θ being a mapping of the random variable θ, it is itself a random variable. An uncertainty information can thus be derived from it and is planned to be linked to the quality of regression. Yet the uncertainty reduction occurring when acquiring more samples (thus more information) could be quantified.

6. REFERENCES

[1] B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press.
[2] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, New York, USA.
[3] R. Bellman, Dynamic Programming, Dover Publications, sixth edition.
[4] V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, Inc.
[5] C. M. Bishop and M. E. Tipping, "Bayesian Regression and Classification," in Advances in Learning Theory: Methods, Models and Applications, vol. 190, IOS Press, NATO Science Series III: Computer and Systems Sciences, 2003.
[6] J. Vermaak, S. J. Godsill, and A. Doucet, "Sequential Bayesian Kernel Regression," in Advances in Neural Information Processing Systems, MIT Press.
[7] Y. Engel, S. Mannor, and R. Meir, "The Kernel Recursive Least Squares Algorithm," IEEE Transactions on Signal Processing, vol. 52, no. 8.
[8] E. A. Wan and R. van der Merwe, "The Unscented Kalman Filter for nonlinear estimation," in Adaptive Systems for Signal Processing, Communications, and Control Symposium, 2000.
[9] R. van der Merwe and E. A. Wan, "Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-Space Models," in Workshop on Advances in Machine Learning, Montreal, Canada.
[10] R. E. Kalman, "A New Approach to Linear Filtering and Prediction Problems," Transactions of the ASME, Journal of Basic Engineering, vol. 82, Series D, 1960.
[11] Z. Chen, "Bayesian Filtering: From Kalman Filters to Particle Filters, and Beyond," Tech. Rep., Adaptive Systems Lab, McMaster University.
[12] M. Geist, O. Pietquin, and G. Fricout, "Bayesian Reward Filtering," in European Workshop on Reinforcement Learning (EWRL 2008), Lille (France), 2008, Lecture Notes in Artificial Intelligence, Springer Verlag, to appear.
