Semi-supervised Learning using Kernel Self-consistent Labeling


Keywords: semi-supervised learning, kernel methods, support vector machine, leave-one-out cross validation, Gaussian process.

Preliminary work. Under review by the International Conference on Machine Learning (ICML). Do not distribute.

Abstract

We present a new method for semi-supervised learning based on any given valid kernel. Our strategy is to view the kernel as the covariance matrix of a Gaussian process and predict the label of each instance conditioned on all other instances. We then find a self-consistent labeling of the instances by using the hinge loss on the predictions on labeled data and the ε-insensitive loss on the predictions on unlabeled data. Computationally, this leads to a quadratic programming problem with only bound constraints. We examine the similarities and differences between the algorithm and other semi-supervised learning algorithms. Evaluation of the method on both synthetic and real-world datasets gives encouraging results.

1. Introduction

In many applications, labeled data are expensive but unlabeled data can be obtained easily. In these applications, it is desirable to exploit the unlabeled data to improve the performance of learning algorithms. Methods that learn from both labeled and unlabeled data are called transductive or semi-supervised learning algorithms.

Kernel methods have been used successfully in various machine learning problems including classification, regression, ranking and dimensionality reduction. The use of kernels allows non-linear functions to be learned efficiently in these problems by mapping the instances into a higher-dimensional feature space. By designing the kernels appropriately, a rich family of high-dimensional mappings is available to suit various learning problems. In view of the advantages and flexibility of kernel methods, effective semi-supervised kernel learning algorithms could be quite useful.

We use the self-consistency principle to design a semi-supervised kernel classification method. The idea behind this principle is to label the unlabeled data in such a way that a classifier trained using a subset of the (labeled and artificially labeled) data would have low validation error on the (labeled and artificially labeled) data that is not used in training. This principle was used in designing the mincut-based semi-supervised learner of (Blum & Chawla, 2001), where the algorithm was shown to be able to generate a labeling that minimizes the leave-one-out cross validation (LOOCV) error of the 1-nearest neighbor algorithm. A normalized version of the algorithm based on minimizing the ratiocut (the Spectral Graph Transducer) was proposed in (Joachims, 2003).

We propose a support vector style algorithm with ε-insensitive regression constraints for self-consistency on unlabeled points and hinge loss penalties on labeled points. The problem can be formulated as a quadratic programming problem with only bound constraints, for which the global optimum can be found efficiently. As the method is based on leave-one-out Gaussian process prediction using kernels, we call the method kernel self-consistent labeling.

Kernel self-consistent labeling has interesting connections with many existing algorithms. Without the ε-insensitive self-consistency constraints, the method is very similar to a leave-one-out version of the support vector machine. The Representer Theorem states that the solution of the support vector machine can be represented by linearly combining only the kernel functions associated with the labeled data, regardless of whether unlabeled data is present. Interestingly, kernel self-consistent labeling utilizes kernel functions of both the labeled and unlabeled instances even when only the hinge loss penalties on labeled points are used.

Like many semi-supervised learning methods, it is also possible to view kernel self-consistent labeling as a graph-based semi-supervised learning method. Both the mincut and the Spectral Graph Transducer algorithms are based on minimizing edge costs on graphs. Similarly, (Zhu, Ghahramani & Lafferty, 2003) minimizes the sum of squared differences of the function over all pairs of points (edges), weighted by the pair-wise similarity in the graph.
Instead of minimizing a weighted cost on edges, a second class of graph algorithms minimizes node costs. These algorithms express the predicted label of one instance as the average of the other instances' labels, weighted by mutual similarity. In (Roweis & Saul, 2000), the cost function is defined on each node as the squared difference between the node's own function value and the weighted average of the values of its neighbors in a certain locality. Similarly, we can view kernel self-consistent labeling as a method for

minimizing node costs on a graph, using what is known as the equivalent kernel (Silverman, 1984) of the original kernel as the weight matrix of the graph. Conversely, in some graph-based semi-supervised learning methods, such as harmonic energy minimization (Zhu et al., 2003), the Laplacian of the graph (Chung, 1997) can be viewed as the pseudo-inverse of a kernel (Ham et al., 2004).

Kernels can be interpreted as the covariance matrix of a Gaussian process, except with independent Gaussian noise added to the output (Seeger, 2004). Thus, the relationship between the given labels, the probability distribution of the unknown labels, and the pair-wise similarity among examples can all be represented under the framework of multivariate Gaussian distributions. Interestingly, in Gaussian distributions, the expected label of an unlabeled point, conditioned on the known labels, turns out not to utilize the other unlabeled data once the kernel is fixed. In harmonic energy minimization (Zhu et al., 2003), the unlabeled data is utilized to construct the Laplacian, so changing an unlabeled instance results in a different kernel and hence different predictions on the other unlabeled data. In contrast, kernel self-consistent labeling uses a fixed kernel and allows a change in an unlabeled instance to affect the predictions on the other unlabeled instances through self-consistency and the optimization of an appropriate cost function.

The rest of this paper is organized as follows. Section 2 gives some background on Gaussian processes and introduces the leave-one-out prediction formula, which establishes the relationships among all data points based on any valid kernel. Section 3 presents the proposed mixed classification and regression leave-one-out style algorithm that uses the hinge loss on labeled data and the ε-insensitive loss on unlabeled data. The model's connections to existing learning algorithms are discussed in Section 4. Computational efficiency is considered in Section 5. Section 6 contains experimental results. Finally, the paper is concluded in Section 7.
2. Leave-one-out Gaussian Process Prediction

We assume that either a proper kernel function k(x, x') satisfying Mercer's theorem, or a valid Gram matrix K (symmetric and positive semi-definite) (Schölkopf & Smola, 2002) over both the labeled and unlabeled data, is given. By Mercer's theorem,

k(x, x') = \sum_i \lambda_i \phi_i(x) \phi_i(x'),

where {\phi_i(\cdot)} are the eigenfunctions of k(\cdot,\cdot), or the eigenvectors of the Gram matrix. We then define a Bayesian linear regression model:

z(x) = \theta^T \phi(x),   y(x) = z(x) + \epsilon = \theta^T \phi(x) + \epsilon,

where \phi(x) = (\phi_1(x), \phi_2(x), ...), \theta \sim N(0, \Sigma) with \Sigma = diag(\lambda_1, \lambda_2, ...), and the noise \epsilon \sim N(0, \sigma^2) is independent between different x. Then

cov(z(x), z(x')) = \phi(x)^T \Sigma \phi(x') = k(x, x'),
cov(y(x), y(x')) = \phi(x)^T \Sigma \phi(x') + \sigma^2 \delta(x, x') = k(x, x') + \sigma^2 \delta(x, x'),

where \delta(x, x') is the Kronecker delta. So we can view {Z(x_i)} as a Gaussian process (GP) with covariance matrix K if K is nonsingular.

Suppose we have l labeled examples {(x_i, c_i)}_{i=1}^{l} with c_i \in {-1, +1}. Then the negative log posterior of the Gaussian process Z(x) is given, up to constants, by

J = \frac{1}{\sigma^2} (c - z)^T (c - z) + z^T K^{-1} z,    (1)

where the first term comes from the log likelihood and the second term is the log of the prior, acting as a regularization factor. Assume there are U unlabeled examples {(x_i, \cdot)}_{i=l+1}^{n} and denote n = l + U. We can view {Y(x_i)} as a GP as well, with covariance matrix \tilde{K} = K + \sigma^2 I. Suppose \tilde{K} is written in block form as

\tilde{K} = [ \tilde{K}_{LL}  \tilde{K}_{LU} ; \tilde{K}_{UL}  \tilde{K}_{UU} ].

As K is positive semi-definite, \tilde{K} must be positive definite if \sigma^2 > 0. Conditioning on the given labels c, the expected value of the unlabeled data's labels is y_U = \tilde{K}_{UL} \tilde{K}_{LL}^{-1} c. Unfortunately, this shows that the unlabeled data are not used in predicting y_U; only \tilde{K}_{LL} and the row corresponding to the single test example are involved. Unlabeled data are not mutually related, directly or indirectly via the labeled data, in the process of classifying other unlabeled data.

Our strategy for utilizing unlabeled data is to make a leave-one-out prediction for each instance, assuming that the labels of all other instances are known, and to impose self-consistency. We compute a decision vector y with each element associated with one example and then assign labels based on y.
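As a small illustration of the conditional-mean formula above, the following numpy sketch (the toy RBF kernel, sizes, and labels are assumptions, not from the paper) shows that the \tilde{K}_{UU} block never enters the standard GP prediction:

```python
import numpy as np

def gp_conditional_mean(K_noisy, c, n_labeled):
    """Conditional mean of the unlabeled outputs given labels c, for a GP
    with covariance K_noisy = K + sigma^2 I: E[y_U | c] = K_UL K_LL^{-1} c."""
    K_LL = K_noisy[:n_labeled, :n_labeled]
    K_UL = K_noisy[n_labeled:, :n_labeled]
    return K_UL @ np.linalg.solve(K_LL, c)

# A small synthetic RBF Gram matrix on toy 1-D inputs (an assumption).
rng = np.random.default_rng(0)
x = rng.normal(size=8)
K = np.exp(-(x[:, None] - x[None, :]) ** 2) + 1e-4 * np.eye(8)

l = 3
c = np.array([1.0, -1.0, 1.0])
pred = gp_conditional_mean(K, c, l)

# Perturbing the lower-right U x U block leaves the prediction unchanged,
# which is exactly the point made in the text.
K2 = K.copy()
K2[l:, l:] += np.eye(8 - l)
pred2 = gp_conditional_mean(K2, c, l)
print(np.allclose(pred, pred2))  # -> True
```

This is why the leave-one-out construction in the next paragraphs is needed: without it, the unlabeled block of the kernel is simply ignored.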
For labeled data, we do not fix the associated element of y to any predefined value, allowing it to vary. Assume we know z_{-i} = (y_1, ..., y_{i-1}, y_{i+1}, ..., y_n)^T. Then the prediction of y_i is k_i^T K_{-i}^{-1} z_{-i}, where k_i is the i-th row vector of K without its i-th element, and K_{-i} is the matrix obtained by removing the i-th row and i-th column of K. Let b_i be the vector created by inserting a 0 at the i-th position of K_{-i}^{-1} k_i; then the prediction of y_i based on all other elements is b_i^T y. The function b(\cdot,\cdot) is called the equivalent kernel or dual kernel, and is very similar to the equivalent kernel of (Silverman, 1984; Sollich & Williams, 2004), defined in a supervised learning setting. Let B = (b_1, ..., b_n)^T. We call B the GP prediction coefficient matrix.

3. Leave-one-out ρ-ε Learning

Having built up the relationship among all examples through the leave-one-out prediction formula, we can implement the self-consistency principle. For labeled data, since we are only concerned about their class and this is a classification problem, there is no need to force a strong match to a target value as in regression; rather, the maximum margin principle is preferable. On unlabeled data, if we still use margin constraints, as in the Transductive SVM (Joachims, 1999), to find an assignment such that y_i b_i^T y \geq 1 - \xi_i, where y_i is a variable, then we inevitably end up with a non-convex optimization problem. One simple alternative is to use regression constraints such as the ε-insensitive loss function (Vapnik, 1995). Finally, regularization is incorporated in the same way as in (1). Putting it all together, suppose the labels of the first l instances are known and K has already incorporated the Gaussian noise; the problem is formulated as:

Minimize with respect to {y, \xi, \xi'}:

\frac{1}{2} y^T K^{-1} y + C_1 \sum_{i=1}^{l} \xi_i + C_2 \sum_{i=l+1}^{n} (\xi_i + \xi_i')    (2)

s.t.  c_i b_i^T y \geq 1 - \xi_i,            i = 1, ..., l    (3)
      b_i^T y - y_i \leq \epsilon + \xi_i,   i = l+1, ..., n    (4)
      y_i - b_i^T y \leq \epsilon + \xi_i',  i = l+1, ..., n    (5)
      \xi_i \geq 0, i = 1, ..., n;  \xi_i' \geq 0, i = l+1, ..., n    (6)

Constraint (3) is a margin constraint. Constraints (4) and (5) implement ε-insensitive regression. The first term in (2) is for regularization. We name this mixed classification-regression algorithm ρ-ε learning, because ρ is usually used to denote the margin. The whole problem is a quadratic programming (QP) problem with K (and thus K^{-1}) positive definite. The Lagrangian is:

L = \frac{1}{2} y^T K^{-1} y + C_1 \sum_{i=1}^{l} \xi_i + C_2 \sum_{i=l+1}^{n} (\xi_i + \xi_i')
    - \sum_{i=1}^{l} \alpha_i (c_i b_i^T y - 1 + \xi_i)
    - \sum_{i=l+1}^{n} \beta_i (y_i - b_i^T y + \epsilon + \xi_i)
    - \sum_{i=l+1}^{n} \gamma_i (b_i^T y - y_i + \epsilon + \xi_i')
    - \sum_{i=1}^{n} \lambda_i \xi_i - \sum_{i=l+1}^{n} \sigma_i \xi_i'    (7)

Setting the derivatives with respect to the slack variables to zero gives

\partial L / \partial \xi_i = 0:   C_1 - \alpha_i - \lambda_i = 0,   i = 1, ..., l    (8)
\partial L / \partial \xi_i = 0:   C_2 - \beta_i - \lambda_i = 0,    i = l+1, ..., n    (9)
\partial L / \partial \xi_i' = 0:  C_2 - \gamma_i - \sigma_i = 0,    i = l+1, ..., n    (10)

and setting \partial L / \partial y = 0 gives

K^{-1} y - \sum_{i=1}^{l} \alpha_i c_i b_i + \sum_{i=l+1}^{n} \beta_i (b_i - e_i) - \sum_{i=l+1}^{n} \gamma_i (b_i - e_i) = 0,

where e_i = (0, ..., 0, 1, 0, ..., 0)^T is the i-th standard basis vector.
So

y = K ( \sum_{i=1}^{l} \alpha_i c_i b_i + \sum_{i=l+1}^{n} \beta_i (e_i - b_i) - \sum_{i=l+1}^{n} \gamma_i (e_i - b_i) ).    (11)

Define x = (\alpha^T, \beta^T, \gamma^T)^T and

R_0 = (c_1 b_1, c_2 b_2, ..., c_l b_l),
R_1 = (e_{l+1} - b_{l+1}, ..., e_n - b_n) = [ 0_{l \times U} ; I_U ] - (b_{l+1}, ..., b_n),
R = (R_0, R_1, -R_1),    (12)

where I_U is the U \times U identity matrix and 0_{l \times U} is the l \times U zero matrix. We obtain

y = K (R_0 \alpha + R_1 \beta - R_1 \gamma) = K R x.    (13)

Substituting (8)-(13) into (7), we obtain the dual problem:

Maximize with respect to {\alpha, \beta, \gamma}:

-\frac{1}{2} x^T R^T K R x + \sum_{i=1}^{l} \alpha_i - \epsilon \sum_{i=l+1}^{n} (\beta_i + \gamma_i)    (14)

s.t.  \alpha_i \in [0, C_1],  i = 1, ..., l    (15)
      \beta_i \in [0, C_2],  \gamma_i \in [0, C_2],  i = l+1, ..., n    (16)

and the prediction formula is (13). Now the constraints are all bound constraints, and a bound-constrained QP can be solved very efficiently. In our experiments using the TRON algorithm (Lin & Moré, 1999), the optimization usually converges to the global optimum within seconds, after a few tens of iterations. As in the standard SVM, which employs a 1-norm penalty, the solutions of ρ-ε learning also enjoy a sparse structure.

4. Model Interpretation and Connections to Other Work

The principal idea of our model is to minimize the leave-one-out (LOO) cross validation error. This is embodied both in the Gaussian process prediction formula and in the mixed classification-regression ρ-ε model.
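The GP prediction coefficient matrix B that underlies the model can be built exactly as defined: row i is K_{-i}^{-1} k_i with a 0 inserted at position i. A minimal numpy sketch follows; the closed form via a single inverse of K is a standard leave-one-out identity added here as an assumption (it is not stated in the paper, which instead uses rank-one updates):

```python
import numpy as np

def coeff_matrix_naive(K):
    """GP prediction coefficient matrix B: row i predicts y_i from all
    other entries, b_i = K_{-i}^{-1} k_i with a 0 inserted at position i."""
    n = K.shape[0]
    B = np.zeros_like(K)
    for i in range(n):
        mask = np.arange(n) != i
        k_i = K[i, mask]
        B[i, mask] = np.linalg.solve(K[np.ix_(mask, mask)], k_i)
    return B

# Toy RBF Gram matrix with a small jitter (all assumptions).
rng = np.random.default_rng(1)
X = rng.normal(size=(7, 2))
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)) + 1e-3 * np.eye(7)

B = coeff_matrix_naive(K)

# Equivalent closed form from one inversion of K (assumed LOO identity:
# b_i^T y = y_i - [K^{-1} y]_i / [K^{-1}]_{ii}).
Kinv = np.linalg.inv(K)
B_fast = np.eye(7) - Kinv / np.diag(Kinv)[:, None]
print(np.allclose(B, B_fast))  # -> True
```

Note that the diagonal of B is zero by construction: each instance is predicted only from the others, which is the self-consistency hook exploited by the ρ-ε program.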

4.1 LOO Gaussian Process Prediction

Semi-supervised learning relies heavily on the similarity between instances. Since kernels can also be viewed as descriptions of similarity, it is interesting to interpret existing semi-supervised learning algorithms from the perspective of kernels, or Gaussian processes. In (Zhu et al., 2003), the RBF kernel W is used as the similarity measure:

w_{ij} = \exp( -\|x_i - x_j\|^2 / \sigma^2 ).    (17)

Defining a decision vector f = (f_L^T, f_U^T)^T with f_L fixed to the given 0/1 labels, that algorithm minimizes the weighted sum of the squared differences of f over each edge, which can also be interpreted as a plausible cost for a one-dimensional embedding of the nodes:

\frac{1}{2} \sum_{i,j} w_{ij} (f_i - f_j)^2.    (18)

Let D = diag(d_1, ..., d_n), where d_i = \sum_j w_{ij}, and define the graph Laplacian as \Delta = D - W. The algorithm essentially minimizes the quadratic form f^T \Delta f. Suppose

\Delta = [ \Delta_{LL}  \Delta_{LU} ; \Delta_{UL}  \Delta_{UU} ].

Then the closed-form solution is

f_U = -\Delta_{UU}^{-1} \Delta_{UL} f_L.    (19)

Locally linear embedding (LLE) (Roweis & Saul, 2000) establishes cost functions on each node. Suppose that for node x_i, the combined contribution from all other nodes is \sum_j w_{ij} f_j, subject to \sum_j w_{ij} = 1, with w_{ij} \geq 0 (if the reconstruction of each point is required to lie within the convex hull of its neighbors) and w_{ij} = 0 if x_j does not belong to the set of neighbors of x_i. We would then like the weighted average to be close to f_i, giving the following problem:

Minimize \sum_i ( f_i - \sum_j w_{ij} f_j )^2.    (20)

Using the notation above, this essentially minimizes f^T (I - W)^T (I - W) f, and a similar closed-form solution can be given. (Ham et al., 2004) pointed out that both the Laplacian algorithm (18) and LLE are kernel PCA, with kernels being the pseudo-inverse \Delta^+ and \lambda_{max} I - M (or their centered forms) respectively, where M = (I - W)^T (I - W) and \lambda_{max} is the largest eigenvalue of M. In the minimization problems (18) and (20) for classification tasks, it is M in LLE that plays a role similar to \Delta in the Laplacian method. To make \Delta^+ and \lambda_{max} I - M valid covariance matrices of the associated Gaussian processes, we add a very small noise term \sigma^2 I. Surprisingly, from this perspective the prediction, say (19), does not make use of the information in the lower-right U \times U sub-matrix of the covariance matrix (kernel) associated with the unlabeled data.
This is because for a multivariate Gaussian distribution over (x_L, x_U) with zero mean and covariance

\Sigma = [ \Sigma_{LL}  \Sigma_{LU} ; \Sigma_{UL}  \Sigma_{UU} ],

the expected mean of x_U given x_L is \Sigma_{UL} \Sigma_{LL}^{-1} x_L. What (18) seeks is the x_U that maximizes the logarithm of the Gaussian probability density with x_L fixed, and its solution (19) is exactly this expected conditional mean of x_U. Note, however, that when these methods are used for semi-supervised learning, it is the Laplacian that is constructed from the unlabeled data, hence the kernel changes when the unlabeled data change.

One restriction of the Laplacian algorithm is the requirement of non-negative similarities; otherwise the Laplacian is no longer positive semi-definite. In LLE, the weights are decided by minimizing a cost function, and positive weights may also be required for certain applications. As some kernel functions can take negative values, this restriction precludes a rich source of similarity functions. Using our leave-one-out Gaussian process prediction, we obtain a naturally scaled formula for combining the contributions from all other nodes while utilizing kernel functions, even those that may take negative values. The objective now becomes minimizing \sum_i (f_i - b_i^T f)^2 with respect to f_U in f = (f_L^T, f_U^T)^T, clamping f_L to the given labels (\pm 1 or 0/1). This LOO Gaussian process prediction algorithm makes it possible to use general kernels as similarities in graph algorithms.

4.2 LOO Mixed ρ-ε Learning

It is interesting to investigate the model's behavior as an interpolation between putting unlabeled data on a similar footing to the labeled data and ignoring the unlabeled data completely. In mincut-class graph algorithms, the re-weighting of labeled and unlabeled data can usually be carried out by modifying the weights between nodes, scaling the original weight by a factor η (0 \leq η \leq 1) when an edge connects unlabeled nodes. The simplest corresponding manipulation in our model is to down-weight the entries of B associated with the unlabeled points:

B = [ B_{LL}  B_{LU} ; B_{UL}  B_{UU} ]  \to  [ B_{LL}  ηB_{LU} ; B_{UL}  ηB_{UU} ].

In general, η \leq 1, as unlabeled points are less reliable and less informative than labeled data. Another way to provide this tradeoff is to tune ε.
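The LOO objective from Section 4.1, minimizing \sum_i (f_i - b_i^T f)^2 over f_U with f_L clamped, is an ordinary least-squares problem in f_U. A minimal numpy sketch (the toy data and the single-inverse shortcut for B are assumptions, not from the paper):

```python
import numpy as np

# Toy 1-D data: the first l points are labeled, the rest unlabeled.
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(-1.0, 1.0, size=9))
K = np.exp(-4.0 * (x[:, None] - x[None, :]) ** 2) + 1e-3 * np.eye(9)
l = 3
f_L = np.array([-1.0, -1.0, 1.0])

# Equivalent-kernel coefficient rows via the standard LOO identity
# (an assumed shortcut; the paper builds B by per-instance inversion).
Kinv = np.linalg.inv(K)
B = np.eye(9) - Kinv / np.diag(Kinv)[:, None]

# Minimize ||(I - B) f||^2 over f_U with f_L clamped: split the columns
# of M = I - B into labeled and unlabeled blocks and solve least squares.
M = np.eye(9) - B
f_U, *_ = np.linalg.lstsq(M[:, l:], -M[:, :l] @ f_L, rcond=None)
f = np.concatenate([f_L, f_U])
```

Labels for the unlabeled points would then be read off as the signs of f_U; the least-squares solution satisfies the stationarity condition M_U^T M f = 0 of the clamped objective.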
By applying the Karush-Kuhn-Tucker (KKT) conditions, we have:

\beta_i ( \epsilon + \xi_i - b_i^T y + y_i ) = 0,
\gamma_i ( \epsilon + \xi_i' + b_i^T y - y_i ) = 0.
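The two losses whose multipliers appear in these conditions can be written down directly; a minimal sketch with toy values (all inputs here are illustrative assumptions):

```python
def hinge(c, pred):
    """Hinge loss on a labeled point (constraint (3)): penalize c*pred < 1."""
    return max(0.0, 1.0 - c * pred)

def eps_insensitive(target, pred, eps):
    """Epsilon-insensitive loss on an unlabeled point (constraints (4)-(5)):
    no penalty while |pred - target| <= eps."""
    return max(0.0, abs(pred - target) - eps)

# Inside the epsilon tube the regression constraints are inactive
# (beta = gamma = 0); a larger eps widens the tube and so shuts off
# the unlabeled part of the model, as discussed next.
print(hinge(+1, 1.3))                  # -> 0.0 (margin satisfied)
print(eps_insensitive(0.2, 0.3, 0.5))  # -> 0.0 (inside the tube)
print(hinge(+1, 0.4))
print(eps_insensitive(0.2, 0.9, 0.5))
```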

When ε is very large, constraints (4) and (5) are unlikely to be active, so the β_i and γ_i are almost certainly 0. The only remaining constraints are on the labeled data. In this sense, tuning ε re-weights the effect of the unlabeled data on our model. If β = 0 and γ = 0, then Rx = R_0 α and the prediction formula becomes

y = K R_0 \alpha.    (21)

Substituting (21) into (14), we have a simplified model:

Maximize  -\frac{1}{2} \alpha^T R_0^T K R_0 \alpha + \sum_{i=1}^{l} \alpha_i    (22)
s.t.  \alpha_i \in [0, C_1],  i = 1, ..., l.    (23)

Specifically, for a data point x, the prediction is

y(x) = k_x^T R_0 \alpha = k_x^T \sum_{i=1}^{l} \alpha_i c_i b_i,    (24)

where k_x = (k(x, x_1), ..., k(x, x_n))^T is the column of K corresponding to the instance x. Since \sum_{i=1}^{l} \alpha_i c_i b_i is a vector independent of x, this result shows that the optimal solution lies in the span of the kernel functions evaluated at both the labeled and unlabeled data points. In comparison, according to the Representer Theorem (Schölkopf & Smola, 2002), the solution of the support vector machine can be represented entirely with kernels associated with the labeled data, even when unlabeled data is present. Note also that the Representer Theorem, applied to ρ-ε learning, shows that the solution can be represented using only kernels associated with the labeled and unlabeled data and does not require kernels associated with unseen data. Hence, when additional examples are given at test time, the model can operate in an inductive fashion rather than only in the transductive style.

There is a natural connection between our mixed ρ-ε model and the leave-one-out (LOO) SVM (Herbrich, 2002). The LOO SVM is formulated as:

Minimize \sum_{i=1}^{l} \xi_i    (25)
s.t.  c_i ( \sum_{j \neq i} \alpha_j c_j k(x_j, x_i) + b ) \geq 1 - \xi_i,  i = 1, ..., l    (26)
      \xi_i \geq 0,  \alpha_i \geq 0.    (27)

The constraints of our mixed ρ-ε model have the form c_i b_i^T y \geq 1 - \xi_i, which differs from (26) but is still a leave-one-out prediction constraint. The objective function (2) has an additional regularization term compared with (25), which just minimizes the 1-norm of the penalties. Despite these differences, both the LOO SVM and our model minimize the leave-one-out error, using similar frameworks to predict one instance's label assuming the labels of the others are known.

5. Efficiency Considerations

Two costly operations are involved in this algorithm: the matrix inversions for building the GP prediction coefficient matrix B, and the calculation of the Hessian matrix for the quadratic program. To calculate B, one needs K_{-i}^{-1} k_i for all i = 1, ..., n, so the bottleneck lies in computing K_{-i}^{-1}. Here we can apply the matrix inversion lemma in its simplest form:

(A + u v^T)^{-1} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u},

where A is an invertible matrix and u, v are vectors. We first calculate K^{-1}. Noticing that each K_{-i} differs from K only in the i-th row and i-th column, each K_{-i}^{-1} can be reached from K^{-1} by rank-one updates of the form u e_i^T and e_i v^T, where e_i is the vector with all elements 0 except the i-th element, which is 1. So K_{-1}^{-1}, ..., K_{-n}^{-1} can be calculated one after another. As each application of the matrix inversion lemma has computational complexity O(n^2), the total cost of building B is O(n^3).

The QP itself can be solved efficiently, and the calculation of the Hessian matrix in (14), with R defined in (12), can also be done quickly. If the one-against-all approach is adopted for multi-class tasks, however, calculating R^T K R multiple times can be costly. But since R = (R_0, R_1, -R_1) and only R_0 depends on the labels, which change during the one-against-all process, we only need to recompute R_0^T K R_0 and R_0^T K R_1. Both cost O(n^2 l), which is small under the typical semi-supervised setting l \ll U; therefore both can be done efficiently. The O(n^2 U) step producing R_1^T K R_1 needs to be done only once.

6. Experimental Results

We evaluate our algorithm on eight datasets. Three are from the 20-newsgroup dataset (19997 instances) (Lang, 1995): baseball-hockey (1993 instances / 2 classes), religion-atheism (1440 / 2), and pc-mac (1943 / 2). Two are from a handwritten digit recognition task: odd-even and 10-digits (10-way), for which we used a simplified form of the data, with 196 features each ranging between 0 and 9; we extracted the same number of examples from each class. The other three datasets are the synthetic 2-spiral (194 / 2 / 2 features), wine (178 / 3 / 13) (UCI Repository), and yeast (1484 / 10 / 8).

All input feature vectors are normalized to unit length, except for the three newsgroup datasets and the 2-spiral dataset. We randomly pick training sets and test the prediction accuracy on the remaining examples, under the constraint that all classes, including each digit as a subclass for the odd-even dataset, must be present in the labeled set. For multi-class tasks, the one-against-all heuristic is used and we pick the class whose function value is largest. The program is written in Matlab, except for the QP solver, which is implemented more efficiently using the TRON algorithm provided by the Toolkit for Advanced Optimization (TAO) (Benson et al., 2004).

We compare our algorithm with the SVM. To ensure that the comparison is fair, we use the RBF kernel for both models. For the SVM, we choose the RBF kernel bandwidth σ (as in (17)) and tradeoff parameter C that yield the best average generalization performance, i.e., that maximize the average testing accuracy over all choices of training set size. For kernel self-consistent labeling, we fix the Gaussian noise variance to a small constant. After some experimentation, we find that a single setting of C_1, C_2, and ε performs well across all datasets and all numbers of labeled data. In fact, the performance is not very sensitive to the exact values of C_1 and C_2, as long as they are of the proper magnitude; this is similar to the behavior of the tradeoff parameter C in the SVM. However, the RBF kernel bandwidth σ needs to be tuned for each dataset. To demonstrate that good values of σ can be found with reasonable effort, we experimented with four bandwidth settings, indexed by a scale factor r, for all datasets. In practice, when enough labeled data exist, k-fold cross validation can be used to select good settings of the parameters.

Figures 1 to 4 show the resulting accuracies. Each result is averaged over 30 trials of randomly picking the labeled data. We observe that self-consistent labeling with a single shared bandwidth setting outperformed the SVM on 6 out of the 8 datasets.
For the other two datasets, yeast and 10-digit (both 10-way multi-class tasks), we tried more settings to see whether settings that outperform the SVM exist, and indeed we found better settings, as shown in Figure 4. For the yeast dataset, a smaller r yields consistently better results, while for the 10-digit dataset, a smaller r gives results with little statistical difference from the SVM in most cases. Whether the preference for a smaller r is due to the larger number of classes or simply to properties of the dataset itself requires further investigation, because the effect of the one-against-all heuristic on kernel self-consistent labeling is less well understood. One avenue of investigation, as future work, is to transplant strategies that extend the SVM to multi-class classification to our algorithm.

For all datasets except the 10-digit dataset, we find that r > 1 yields better results than r < 1 (not shown, because of the consistently poor performance at the smallest r and the trend already visible). In normal supervised learning, a larger training set size usually allows the use of smaller bandwidths to improve performance. This suggests that kernel self-consistent labeling is able to take advantage of the unlabeled data and use kernels with a smaller bandwidth effectively.

It is interesting to investigate the influence of the ratio between labeled positive and negative data. For all five binary tasks, the performance of the SVM grows worse along the sequence #positive/#negative = 1/1, 1/2, 1/3, 1/4, even though the total amount of training data increases. Kernel self-consistent labeling shows a different trend when using unbalanced training data on the 2-spiral dataset: with the number of labeled data growing along that sequence, albeit increasingly unbalanced, there is consistent improvement in accuracy. The bandwidth σ and the optimal C for the SVM are given in Table 1.

Table 1. Optimal SVM parameters (σ and C) for each dataset: baseball-hockey, religion-atheism, pc-mac, odd-even, 10-digits, 2-spiral, wine, and yeast.

Figure 1. Accuracy results (in percentage) for the 2-spiral data. In the upper figure there are equal numbers of positive and negative labeled data; the lower figure shows different numbers of positive/negative labeled data. The legend entries are the values of r.

7 Fgure. Accuracy results for baseball vs hockey (left), relgon vs athesm (mddle), and pc vs mac (rght). he upper mddle fgure has no curve for r = because ts accuracy s always below 5%. Fgure 3. Accuracy results for odd vs even. 7. Conclusons In ths paper, we descrbe a new method of utlzng unlabeled data for classfcaton based on leave-one-out Gaussan process predcton and mxed classfcatonregresson. We motvate ths approach usng the prncples of low tranng error on labeled data and low OOCV error on whole dataset. he essence of the algorthm s to vew any generc vald kernel or Gram matrx as covarance matrx of a Gaussan process and then establsh the leave-one-out predcton formula based on condtonal expectaton under multvarate Gaussan dstrbuton. hs formula connects all nstances and can also possbly be nterpreted as a smlarty measure. Bult upon ths connecton, a support vector style model s proposed by mnmzng the hnge loss on labeled data for classfcaton, and the ε-nsenstve cost on unlabeled data for regresson. he model reduces to a local-optma-free quadratc programmng problem wth only bound constran and can be solved effcently. Both the Hessan matrx and GP predcton coeffcent matrx can be computed

8 Fgure 4. Accuracy results for wne (left), yeast (mddle), and -dgt (rght). For these mult-class tasks, we do not show the nfluence of dfferent rato of labeled data from each class. here s no lne for r = n the rght fgure (-dgt) because ts accuracy s always below 8%. effcently, by matrx block decomposton and matrx nverson lemma respectvely. Expermental results llustrate the advantage of sem-supervsed learnng usng kernel self-consstent labelng when compared to usng the. Further work needs to be done to extend the current bnary classfcaton settngs so that t uses an n-bult mechansm for mult-class classfcaton, rather than general one-aganst-all heurstcs. Other varants of such as γ - or other loss functons lke Huber s robust loss (Schölkopf & Smola, ) can also be utlzed under the framework of self-consstency. Fnally, t s desrable to explore and establsh, under the settngs of sem-supervsed learnng, stronger theoretcal connectons wth RKHS models and learnng theory, where many state-of-the-art learnng models are rooted. References Benson, S., McInnes,., Moré, J., & Sarch, J. (4). AO User Manual (Revson.7) A/MCS-M-4. Blum, A., & Chawla, S. (). earnng from labeled and unlabeled data usng graph mncuts. Proc. of the 8 th Int l Conf. on Machne earnng (pp. 9-6). Chung, F. R. K. (997). Spectral graph theory. o. 9 n Regonal Conference Seres n Mathematcs. Amercan Mathematcal Socety. Crstann,., Shawe-aylor, J., Elsseeff, A., & Kandola, J. (). On kernel-target algnment. In Advances n eural Informaton Processng Systems, 4, Ham, J., ee, D., Mka, S., & Schölkopf, B. (4). A kernel vew of the dmensonalty reducton of manfolds. Proceedngs of the th Internatonal Conference on Machne earnng (pp. 9-97). Joachms,. (999). ransductve nference for text Classfcaton usng support vector machnes. Proceedngs of the 6 th Internatonal Conference on Machne earnng (pp. -9). Joachms,. (3). ransductve learnng va Spectral Graph Parttonng. Proceedngs of the th Internatonal Conference on Machne earnng (pp. 
9-97). n C.-J., & Moré, J. (999). ewton s method for large bound-constraned optmzaton problems. SIOP, 9(4), -7. Herbrch, R. (). earnng kernel classfers: theory and algorthms. Cambrdge, Mass.: MI Press. Rowes, S., & Saul, K. (). onlnear dmensonalty reducton by locally lnear embeddng. Scence, 9, Seeger, M. (4). Gaussan processes for machne learnng. Int l Journal of eural Systems, 4(), Schölkopf, B., & Smola, A. (). earnng wth kernels: support vector machnes, regularzaton, optmzaton, and beyond. Cambrdge, MA: MI Press. Slverman, B. (984). Splne smoothng: the equvalent varable kernel method. Annals of Statstcs,, Sollch, P., & Wllams, C. (4). Usng the equvalent kernel to understand Gaussan Process regresson. In Advances n eural Informaton Processng Systems, 7. Szummer, M., & Jaakkola,. (). Partally labeled classfcaton wth Markov random walks. In Advances n eural Informaton Processng Systems, 4, UCI (). Repostory of machne learnng databases. Vapnk, V. (995). he ature of Statstcal earnng heory. Sprnger, ew York. Zhu, X., Ghahraman, Z., & afferty, J. (3). Semsupervsed learnng usng Gaussan felds and harmonc functons. Proceedngs of the th Internatonal Conference on Machne earnng (pp. 9-99).


Composite Hypotheses testing Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter

More information

1 Convex Optimization

1 Convex Optimization Convex Optmzaton We wll consder convex optmzaton problems. Namely, mnmzaton problems where the objectve s convex (we assume no constrants for now). Such problems often arse n machne learnng. For example,

More information

CSE 252C: Computer Vision III

CSE 252C: Computer Vision III CSE 252C: Computer Vson III Lecturer: Serge Belonge Scrbe: Catherne Wah LECTURE 15 Kernel Machnes 15.1. Kernels We wll study two methods based on a specal knd of functon k(x, y) called a kernel: Kernel

More information

MLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012

MLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012 MLE and Bayesan Estmaton Je Tang Department of Computer Scence & Technology Tsnghua Unversty 01 1 Lnear Regresson? As the frst step, we need to decde how we re gong to represent the functon f. One example:

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

The exam is closed book, closed notes except your one-page cheat sheet.

The exam is closed book, closed notes except your one-page cheat sheet. CS 89 Fall 206 Introducton to Machne Learnng Fnal Do not open the exam before you are nstructed to do so The exam s closed book, closed notes except your one-page cheat sheet Usage of electronc devces

More information

P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering /

P R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering / Theory and Applcatons of Pattern Recognton 003, Rob Polkar, Rowan Unversty, Glassboro, NJ Lecture 4 Bayes Classfcaton Rule Dept. of Electrcal and Computer Engneerng 0909.40.0 / 0909.504.04 Theory & Applcatons

More information

MMA and GCMMA two methods for nonlinear optimization

MMA and GCMMA two methods for nonlinear optimization MMA and GCMMA two methods for nonlnear optmzaton Krster Svanberg Optmzaton and Systems Theory, KTH, Stockholm, Sweden. krlle@math.kth.se Ths note descrbes the algorthms used n the author s 2007 mplementatons

More information

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:

More information

INF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018

INF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018 INF 5860 Machne learnng for mage classfcaton Lecture 3 : Image classfcaton and regresson part II Anne Solberg January 3, 08 Today s topcs Multclass logstc regresson and softma Regularzaton Image classfcaton

More information

Homework Assignment 3 Due in class, Thursday October 15

Homework Assignment 3 Due in class, Thursday October 15 Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.

More information

C4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z )

C4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z ) C4B Machne Learnng Answers II.(a) Show that for the logstc sgmod functon dσ(z) dz = σ(z) ( σ(z)) A. Zsserman, Hlary Term 20 Start from the defnton of σ(z) Note that Then σ(z) = σ = dσ(z) dz = + e z e z

More information

Ensemble Methods: Boosting

Ensemble Methods: Boosting Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement

More information

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton

More information

Linear Classification, SVMs and Nearest Neighbors

Linear Classification, SVMs and Nearest Neighbors 1 CSE 473 Lecture 25 (Chapter 18) Lnear Classfcaton, SVMs and Nearest Neghbors CSE AI faculty + Chrs Bshop, Dan Klen, Stuart Russell, Andrew Moore Motvaton: Face Detecton How do we buld a classfer to dstngush

More information

The Minimum Universal Cost Flow in an Infeasible Flow Network

The Minimum Universal Cost Flow in an Infeasible Flow Network Journal of Scences, Islamc Republc of Iran 17(2): 175-180 (2006) Unversty of Tehran, ISSN 1016-1104 http://jscencesutacr The Mnmum Unversal Cost Flow n an Infeasble Flow Network H Saleh Fathabad * M Bagheran

More information

Maximum Likelihood Estimation (MLE)

Maximum Likelihood Estimation (MLE) Maxmum Lkelhood Estmaton (MLE) Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Wnter 01 UCSD Statstcal Learnng Goal: Gven a relatonshp between a feature vector x and a vector y, and d data samples (x,y

More information

Week 5: Neural Networks

Week 5: Neural Networks Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple

More information

Online Classification: Perceptron and Winnow

Online Classification: Perceptron and Winnow E0 370 Statstcal Learnng Theory Lecture 18 Nov 8, 011 Onlne Classfcaton: Perceptron and Wnnow Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton In ths lecture we wll start to study the onlne learnng

More information

Kernels in Support Vector Machines. Based on lectures of Martin Law, University of Michigan

Kernels in Support Vector Machines. Based on lectures of Martin Law, University of Michigan Kernels n Support Vector Machnes Based on lectures of Martn Law, Unversty of Mchgan Non Lnear separable problems AND OR NOT() The XOR problem cannot be solved wth a perceptron. XOR Per Lug Martell - Systems

More information

Supporting Information

Supporting Information Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Experment-I MODULE VII LECTURE - 3 ANALYSIS OF COVARIANCE Dr Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Any scentfc experment s performed

More information

Singular Value Decomposition: Theory and Applications

Singular Value Decomposition: Theory and Applications Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real

More information

On the Multicriteria Integer Network Flow Problem

On the Multicriteria Integer Network Flow Problem BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 5, No 2 Sofa 2005 On the Multcrtera Integer Network Flow Problem Vassl Vasslev, Marana Nkolova, Maryana Vassleva Insttute of

More information

A Robust Method for Calculating the Correlation Coefficient

A Robust Method for Calculating the Correlation Coefficient A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal

More information

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there

More information

Estimation: Part 2. Chapter GREG estimation

Estimation: Part 2. Chapter GREG estimation Chapter 9 Estmaton: Part 2 9. GREG estmaton In Chapter 8, we have seen that the regresson estmator s an effcent estmator when there s a lnear relatonshp between y and x. In ths chapter, we generalzed the

More information

Classification as a Regression Problem

Classification as a Regression Problem Target varable y C C, C,, ; Classfcaton as a Regresson Problem { }, 3 L C K To treat classfcaton as a regresson problem we should transform the target y nto numercal values; The choce of numercal class

More information

18-660: Numerical Methods for Engineering Design and Optimization

18-660: Numerical Methods for Engineering Design and Optimization 8-66: Numercal Methods for Engneerng Desgn and Optmzaton n L Department of EE arnege Mellon Unversty Pttsburgh, PA 53 Slde Overve lassfcaton Support vector machne Regularzaton Slde lassfcaton Predct categorcal

More information

Support Vector Machines

Support Vector Machines CS 2750: Machne Learnng Support Vector Machnes Prof. Adrana Kovashka Unversty of Pttsburgh February 17, 2016 Announcement Homework 2 deadlne s now 2/29 We ll have covered everythng you need today or at

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

Chapter 12 Analysis of Covariance

Chapter 12 Analysis of Covariance Chapter Analyss of Covarance Any scentfc experment s performed to know somethng that s unknown about a group of treatments and to test certan hypothess about the correspondng treatment effect When varablty

More information

Sparse Gaussian Processes Using Backward Elimination

Sparse Gaussian Processes Using Backward Elimination Sparse Gaussan Processes Usng Backward Elmnaton Lefeng Bo, Lng Wang, and Lcheng Jao Insttute of Intellgent Informaton Processng and Natonal Key Laboratory for Radar Sgnal Processng, Xdan Unversty, X an

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

Learning Theory: Lecture Notes

Learning Theory: Lecture Notes Learnng Theory: Lecture Notes Lecturer: Kamalka Chaudhur Scrbe: Qush Wang October 27, 2012 1 The Agnostc PAC Model Recall that one of the constrants of the PAC model s that the data dstrbuton has to be

More information

Boostrapaggregating (Bagging)

Boostrapaggregating (Bagging) Boostrapaggregatng (Baggng) An ensemble meta-algorthm desgned to mprove the stablty and accuracy of machne learnng algorthms Can be used n both regresson and classfcaton Reduces varance and helps to avod

More information

LECTURE 9 CANONICAL CORRELATION ANALYSIS

LECTURE 9 CANONICAL CORRELATION ANALYSIS LECURE 9 CANONICAL CORRELAION ANALYSIS Introducton he concept of canoncal correlaton arses when we want to quantfy the assocatons between two sets of varables. For example, suppose that the frst set of

More information

Yong Joon Ryang. 1. Introduction Consider the multicommodity transportation problem with convex quadratic cost function. 1 2 (x x0 ) T Q(x x 0 )

Yong Joon Ryang. 1. Introduction Consider the multicommodity transportation problem with convex quadratic cost function. 1 2 (x x0 ) T Q(x x 0 ) Kangweon-Kyungk Math. Jour. 4 1996), No. 1, pp. 7 16 AN ITERATIVE ROW-ACTION METHOD FOR MULTICOMMODITY TRANSPORTATION PROBLEMS Yong Joon Ryang Abstract. The optmzaton problems wth quadratc constrants often

More information

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009

College of Computer & Information Science Fall 2009 Northeastern University 20 October 2009 College of Computer & Informaton Scence Fall 2009 Northeastern Unversty 20 October 2009 CS7880: Algorthmc Power Tools Scrbe: Jan Wen and Laura Poplawsk Lecture Outlne: Prmal-dual schema Network Desgn:

More information

Salmon: Lectures on partial differential equations. Consider the general linear, second-order PDE in the form. ,x 2

Salmon: Lectures on partial differential equations. Consider the general linear, second-order PDE in the form. ,x 2 Salmon: Lectures on partal dfferental equatons 5. Classfcaton of second-order equatons There are general methods for classfyng hgher-order partal dfferental equatons. One s very general (applyng even to

More information

Regularized Discriminant Analysis for Face Recognition

Regularized Discriminant Analysis for Face Recognition 1 Regularzed Dscrmnant Analyss for Face Recognton Itz Pma, Mayer Aladem Department of Electrcal and Computer Engneerng, Ben-Guron Unversty of the Negev P.O.Box 653, Beer-Sheva, 845, Israel. Abstract Ths

More information

17 Support Vector Machines

17 Support Vector Machines 17 We now dscuss an nfluental and effectve classfcaton algorthm called (SVMs). In addton to ther successes n many classfcaton problems, SVMs are responsble for ntroducng and/or popularzng several mportant

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

Unified Subspace Analysis for Face Recognition

Unified Subspace Analysis for Face Recognition Unfed Subspace Analyss for Face Recognton Xaogang Wang and Xaoou Tang Department of Informaton Engneerng The Chnese Unversty of Hong Kong Shatn, Hong Kong {xgwang, xtang}@e.cuhk.edu.hk Abstract PCA, LDA

More information

Resource Allocation with a Budget Constraint for Computing Independent Tasks in the Cloud

Resource Allocation with a Budget Constraint for Computing Independent Tasks in the Cloud Resource Allocaton wth a Budget Constrant for Computng Independent Tasks n the Cloud Wemng Sh and Bo Hong School of Electrcal and Computer Engneerng Georga Insttute of Technology, USA 2nd IEEE Internatonal

More information

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results. Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson

More information

Some modelling aspects for the Matlab implementation of MMA

Some modelling aspects for the Matlab implementation of MMA Some modellng aspects for the Matlab mplementaton of MMA Krster Svanberg krlle@math.kth.se Optmzaton and Systems Theory Department of Mathematcs KTH, SE 10044 Stockholm September 2004 1. Consdered optmzaton

More information

A Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach

A Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach A Bayes Algorthm for the Multtask Pattern Recognton Problem Drect Approach Edward Puchala Wroclaw Unversty of Technology, Char of Systems and Computer etworks, Wybrzeze Wyspanskego 7, 50-370 Wroclaw, Poland

More information

MACHINE APPLIED MACHINE LEARNING LEARNING. Gaussian Mixture Regression

MACHINE APPLIED MACHINE LEARNING LEARNING. Gaussian Mixture Regression 11 MACHINE APPLIED MACHINE LEARNING LEARNING MACHINE LEARNING Gaussan Mture Regresson 22 MACHINE APPLIED MACHINE LEARNING LEARNING Bref summary of last week s lecture 33 MACHINE APPLIED MACHINE LEARNING

More information

Limited Dependent Variables

Limited Dependent Variables Lmted Dependent Varables. What f the left-hand sde varable s not a contnuous thng spread from mnus nfnty to plus nfnty? That s, gven a model = f (, β, ε, where a. s bounded below at zero, such as wages

More information

The Order Relation and Trace Inequalities for. Hermitian Operators

The Order Relation and Trace Inequalities for. Hermitian Operators Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence

More information

MAXIMUM A POSTERIORI TRANSDUCTION

MAXIMUM A POSTERIORI TRANSDUCTION MAXIMUM A POSTERIORI TRANSDUCTION LI-WEI WANG, JU-FU FENG School of Mathematcal Scences, Peng Unversty, Bejng, 0087, Chna Center for Informaton Scences, Peng Unversty, Bejng, 0087, Chna E-MIAL: {wanglw,

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Exerments-I MODULE III LECTURE - 2 EXPERIMENTAL DESIGN MODELS Dr. Shalabh Deartment of Mathematcs and Statstcs Indan Insttute of Technology Kanur 2 We consder the models

More information

Chapter 13: Multiple Regression

Chapter 13: Multiple Regression Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

Image classification. Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing i them?

Image classification. Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing i them? Image classfcaton Gven te bag-of-features representatons of mages from dfferent classes ow do we learn a model for dstngusng tem? Classfers Learn a decson rule assgnng bag-offeatures representatons of

More information

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING 1 ADVANCED ACHINE LEARNING ADVANCED ACHINE LEARNING Non-lnear regresson technques 2 ADVANCED ACHINE LEARNING Regresson: Prncple N ap N-dm. nput x to a contnuous output y. Learn a functon of the type: N

More information

Non-linear Canonical Correlation Analysis Using a RBF Network

Non-linear Canonical Correlation Analysis Using a RBF Network ESANN' proceedngs - European Smposum on Artfcal Neural Networks Bruges (Belgum), 4-6 Aprl, d-sde publ., ISBN -97--, pp. 57-5 Non-lnear Canoncal Correlaton Analss Usng a RBF Network Sukhbnder Kumar, Elane

More information

Chapter 8 Indicator Variables

Chapter 8 Indicator Variables Chapter 8 Indcator Varables In general, e explanatory varables n any regresson analyss are assumed to be quanttatve n nature. For example, e varables lke temperature, dstance, age etc. are quanttatve n

More information

Nonlinear Classifiers II

Nonlinear Classifiers II Nonlnear Classfers II Nonlnear Classfers: Introducton Classfers Supervsed Classfers Lnear Classfers Perceptron Least Squares Methods Lnear Support Vector Machne Nonlnear Classfers Part I: Mult Layer Neural

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

CIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M

CIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M CIS56: achne Learnng Lecture 3 (Sept 6, 003) Preparaton help: Xaoyng Huang Lnear Regresson Lnear regresson can be represented by a functonal form: f(; θ) = θ 0 0 +θ + + θ = θ = 0 ote: 0 s a dummy attrbute

More information

The Geometry of Logit and Probit

The Geometry of Logit and Probit The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.

More information

Chapter 6 Support vector machine. Séparateurs à vaste marge

Chapter 6 Support vector machine. Séparateurs à vaste marge Chapter 6 Support vector machne Séparateurs à vaste marge Méthode de classfcaton bnare par apprentssage Introdute par Vladmr Vapnk en 1995 Repose sur l exstence d un classfcateur lnéare Apprentssage supervsé

More information

Efficient, General Point Cloud Registration with Kernel Feature Maps

Efficient, General Point Cloud Registration with Kernel Feature Maps Effcent, General Pont Cloud Regstraton wth Kernel Feature Maps Hanchen Xong, Sandor Szedmak, Justus Pater Insttute of Computer Scence Unversty of Innsbruck 30 May 2013 Hanchen Xong (Un.Innsbruck) 3D Regstraton

More information

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS)

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS) Some Comments on Acceleratng Convergence of Iteratve Sequences Usng Drect Inverson of the Iteratve Subspace (DIIS) C. Davd Sherrll School of Chemstry and Bochemstry Georga Insttute of Technology May 1998

More information

Relevance Vector Machines Explained

Relevance Vector Machines Explained October 19, 2010 Relevance Vector Machnes Explaned Trstan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introducton Ths document has been wrtten n an attempt to make Tppng s [1] Relevance Vector Machnes

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Konstantn Tretyakov (kt@ut.ee) MTAT.03.227 Machne Learnng So far Supervsed machne learnng Lnear models Least squares regresson Fsher s dscrmnant, Perceptron, Logstc model Non-lnear

More information

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16 STAT 39: MATHEMATICAL COMPUTATIONS I FALL 218 LECTURE 16 1 why teratve methods f we have a lnear system Ax = b where A s very, very large but s ether sparse or structured (eg, banded, Toepltz, banded plus

More information

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.

More information

Support Vector Machines

Support Vector Machines Separatng boundary, defned by w Support Vector Machnes CISC 5800 Professor Danel Leeds Separatng hyperplane splts class 0 and class 1 Plane s defned by lne w perpendcular to plan Is data pont x n class

More information

Report on Image warping

Report on Image warping Report on Image warpng Xuan Ne, Dec. 20, 2004 Ths document summarzed the algorthms of our mage warpng soluton for further study, and there s a detaled descrpton about the mplementaton of these algorthms.

More information

Time-Varying Systems and Computations Lecture 6

Time-Varying Systems and Computations Lecture 6 Tme-Varyng Systems and Computatons Lecture 6 Klaus Depold 14. Januar 2014 The Kalman Flter The Kalman estmaton flter attempts to estmate the actual state of an unknown dscrete dynamcal system, gven nosy

More information

The Gaussian classifier. Nuno Vasconcelos ECE Department, UCSD

The Gaussian classifier. Nuno Vasconcelos ECE Department, UCSD he Gaussan classfer Nuno Vasconcelos ECE Department, UCSD Bayesan decson theory recall that we have state of the world X observatons g decson functon L[g,y] loss of predctng y wth g Bayes decson rule s

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm The Expectaton-Maxmaton Algorthm Charles Elan elan@cs.ucsd.edu November 16, 2007 Ths chapter explans the EM algorthm at multple levels of generalty. Secton 1 gves the standard hgh-level verson of the algorthm.

More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

Probability Density Function Estimation by different Methods

Probability Density Function Estimation by different Methods EEE 739Q SPRIG 00 COURSE ASSIGMET REPORT Probablty Densty Functon Estmaton by dfferent Methods Vas Chandraant Rayar Abstract The am of the assgnment was to estmate the probablty densty functon (PDF of

More information