Journal of Mathematical Analysis and Applications

J. Math. Anal. Appl.

Contents lists available at ScienceDirect

Journal of Mathematical Analysis and Applications

www.elsevier.com/locate/jmaa

Semi-Supervised Learning with the help of Parzen Windows

Shao-Gao Lv (a,*), Yun-Long Feng (b)

(a) Statistics School, Southwestern University of Finance and Economics, Chengdu, China
(b) Joint Advanced Research Center of University of Science and Technology of China and City University of Hong Kong, Suzhou, China

Article history: Received 15 April 2011. Available online 29 July 2011. Submitted by U. Stadtmüller.

Keywords: Semi-Supervised Learning; Graph-based models; Support vector machine; Least square regression; Reproducing kernel Hilbert spaces.

Abstract. Semi-Supervised Learning is a family of machine learning techniques that make use of both labeled and unlabeled data for training, typically a small amount of labeled data together with a large number of unlabeled data. In this paper we propose a Semi-Supervised regression algorithm built on a density estimator generated by Parzen window functions. We conduct an error analysis by a capacity-independent technique and obtain satisfactory learning rates in terms of the regularity of the target function and a decay condition on the marginal distribution near the boundary. (c) 2011 Elsevier Inc. All rights reserved.

1. Introduction

Traditional learning from examples uses only labeled data for training. In many real-world applications, however, it is relatively easy to acquire a large amount of unlabeled data. For example, in phonological acquisition a child is exposed to many acoustic utterances, and these utterances do not come with identifiable phonological markers; corrective feedback is the main source of directly labeled examples, yet a small amount of feedback is often sufficient for the child to master the acoustic-to-phonetic mapping of a language. Another instance is a robot with a video camera: the robot continuously takes high frame-rate video of its surroundings and seeks to learn the names of the objects in the video, but names are stored in the system beforehand only very rarely. These phenomena can be formulated as a Semi-Supervised Learning situation: most objects are unlabeled, while only a few are labeled. Semi-Supervised Learning addresses this problem by using a large amount of unlabeled data, together with the labeled data, to build better learners. Requiring less human effort while often giving higher accuracy, Semi-Supervised Learning can be of great practical value.

At first glance, unlabeled data by itself does not carry any information on the mapping from the input space X to the output space Y. How can it help us learn a better predictor $f : X \to Y$? Balcan and Blum pointed out in [1] that the key lies in an implicit ordering of the functions $f \in F$ induced by the unlabeled data. Roughly speaking, if this implicit order happens to rank the target predictor near the top, then one needs less labeled data to learn it. This idea is formalized in [1] using PAC learning bounds. In other contexts, the implicit ordering is interpreted as a prior over F or as a regularizer. Therefore, finding the implicit order of the input space and designing a good learning algorithm are both critical in Semi-Supervised Learning.

The history of Semi-Supervised Learning goes back to at least the 1970s, when self-training, transduction and Gaussian mixtures with the EM algorithm first emerged. A variety of Semi-Supervised techniques have since been developed, all of which can be classified into generative and discriminative methods. A straightforward generative Semi-Supervised method is the Expectation Maximization (EM) algorithm; the EM approach for naive Bayes text classification models is discussed by Nigam [12].
* Corresponding author. E-mail address: kenan76@mail.ustc.edu.cn (S.-G. Lv).

Discriminative Semi-Supervised methods [9] include probabilistic and non-probabilistic approaches, such as transductive or Semi-Supervised Support Vector Machines (TSVMs, S3VMs) and a variety of other graph-based methods [23,2]. Different learning methods for Semi-Supervised Learning rest on different assumptions, and fulfilling these assumptions is crucial for the success of the methods. More discussion can be found in [3,22].

From the learning-kernel point of view, many existing structured learning algorithms (e.g. conditional random fields, max-margin Markov networks) can be endowed with a Semi-Supervised kernel. The key idea is that a graph kernel can be constructed from all the unlabeled data; one then applies the graph kernel to a standard structured learning kernel machine. Such kernel machines include the kernelized conditional random fields [10] and the max-margin Markov networks [17], which differ primarily in the loss function they use.

As is well known, the Parzen window method [13] is a widely used technique for kernel density estimation and regression (see e.g. [19]), and recently it has also been applied to multi-class classification [14] and to regression problems [21] on a general subset of $\mathbb{R}^d$ satisfying a decay condition near the boundary. These works motivate us to study Semi-Supervised regression integrated with density estimation using Parzen windows. Intuitively, the unknown marginal distribution can be characterized well by Parzen windows applied to large amounts of unlabeled data, and this provides additional useful information for supervised learning.

Definition 1. We say that $W : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ is a basic window function if it is continuous and satisfies:
(i) $\int_{\mathbb{R}^d} W(x, y)\,dy = 1$ for each $x \in \mathbb{R}^d$;
(ii) there exist some $q > d + 2$ and a constant $c_q > 0$ such that
    $W(x, y) \le \dfrac{c_q}{(1 + |x - y|)^q}$ for all $x, y \in \mathbb{R}^d$.

The decay condition is reasonable since it is satisfied by many commonly used functions in multivariate analysis and learning theory.

Suppose that the input space X is a compact subset of $\mathbb{R}^d$ and the output space $Y \subset \mathbb{R}$. Let $\rho$ be a probability measure defined on $X \times Y$. We are given a sample $\mathbf{z} := \{(x_i, y_i)\}_{i=1}^m$ drawn independently according to $\rho$, and $n$ unlabeled data $\{x_i\}_{i=m+1}^{m+n}$ drawn from the marginal distribution, denoted by $\rho_X$; often $n \gg m$. In our setting, the objective is to learn the regression function
    $f_\rho(x) = \int_Y y\,d\rho(y|x)$, $x \in X$,   (1.1)
where $\rho(y|x)$ is the conditional probability measure at $x$ induced by $\rho$.

We consider a learning algorithm in a reproducing kernel Hilbert space (RKHS) $\mathcal{H}_K$ associated with a Mercer kernel $K$, where $K : X \times X \to \mathbb{R}$ is a continuous, symmetric and positive semi-definite function. Without loss of generality, assume that $\sup_{x,y \in X} K(x, y) \le \kappa^2$. On the basis of the density estimator generated by Parzen windows, we learn the regression function by
    $f_{\mathbf{z},\lambda} = \arg\min_{f \in \mathcal{H}_K} \Big\{ \frac{1}{m(m+n)h^d} \sum_{i=1}^{m} \sum_{j=1}^{m+n} W\Big(\frac{x_i}{h}, \frac{x_j}{h}\Big) (f(x_i) - y_i)^2 + \lambda \|f\|_K^2 \Big\}$,   (1.2)
where $h = h(m) > 0$ is a scaling parameter and $\lambda \in (0, 1)$ a regularization parameter controlling the trade-off between the empirical error and the penalty term $\|f\|_K^2$.

Since X is a compact subset of $\mathbb{R}^d$, some regularity conditions on both the marginal distribution and its density are required. Throughout the paper we make the following assumption.

Assumption 1. Suppose that for some $0 < \tau \le 1$ and $c_p > 0$, the marginal distribution $\rho_X$ satisfies
    $\rho_X\big(\{x \in X : \inf_{u \in \mathbb{R}^d \setminus X} |u - x| \le s\}\big) \le c_p^2 s^{2\tau}$ for all $s > 0$,   (1.3)
and the density $p(x)$ of $\rho_X$ satisfies
    $\sup_{x \in X} p(x) \le c_p$ and $|p(x) - p(v)| \le c_p |v - x|^\tau$ for all $v, x \in X$.   (1.4)

Denote by $L^2_\mu$ the $L^2$-space with inner product $\langle f, g \rangle_{L^2_\mu} = \int_X f(x) g(x)\,d\mu$. Our learning rates will be achieved under the regularity assumption on the regression function that $f_\rho$ lies in the range of $L_K^r$ for some $r > 0$. Here $L_K$ is the integral operator on $L^2_\nu$ defined by

    $L_K f(x) = \int_X K(x, y) f(y)\,d\nu(y)$, $x \in X$, $f \in L^2_\nu$,
where $d\nu(x) = \frac{p(x)}{V_p}\,d\rho_X(x)$ with $V_p := \int_X p(x)^2\,dx > 0$. The operator $L_K$ is linear, compact and positive, and it can also be regarded as a self-adjoint operator on $\mathcal{H}_K$. Since the measure $d\nu(x) = \frac{p(x)}{V_p}\,d\rho_X(x)$ is a probability measure on X and $p \le c_p$, the fractional operator $L_K^r$ ($0 < r \le 1$) is also well defined on $L^2_{\rho_X}$; see [4].

We can now state our learning rate for the Semi-Supervised regression algorithm; it will be proved in Section 5.

Theorem 1. Assume $|y| \le M$ almost surely for some constant $M > 0$. Suppose Assumption 1 holds and $L_K^{-r} f_\rho \in L^2_{\rho_X}$ with $0 < r \le 1$. Choose $h = \lambda^{1/\tau}$ and $\lambda = m^{-\frac{\tau}{2\tau r + 1 + d}}$. Then for any $0 < \delta < 1$, with confidence $1 - \delta$, there holds
    $\|f_{\mathbf{z},\lambda} - f_\rho\|_{L^2_{\rho_X}} \le C \log\frac{2}{\delta}\, \Big(\frac{1}{m}\Big)^{\frac{\tau r}{2\tau r + 1 + d}}$,
where C is a constant independent of m and δ.

The key technique behind the proof of convergence of the learning scheme is an error decomposition consisting of a sample error term and an approximation error term. The first term, the sample error, is a function of the sample z and is bounded using a concentration inequality in a Hilbert space. The second term, the approximation error, does not depend on the sample, and we use approximation theory to bound it.

The paper is organized as follows. In Section 2 a new Semi-Supervised regression algorithm based on density estimation is presented. In Section 3 we introduce a necessary concentration inequality in a Hilbert space and estimate the sample error in $\mathcal{H}_K$. In Section 4 we bound the approximation error under Assumption 1 and the regularity of the regression function. Finally we obtain the learning rate by combining the sample error with the approximation error.

2. Semi-Supervised algorithm with density estimators

Let us turn our attention to Parzen windows. Let $x_1, x_2, \dots, x_n$ be an i.i.d. sample drawn from some distribution with an unknown density p. We are interested in estimating the shape of this function p. Its kernel density estimator by means of the Parzen window method is given as
    $\hat{p}_h(x) = \frac{1}{n h^d} \sum_{i=1}^n W\Big(\frac{x}{h}, \frac{x_i}{h}\Big)$,   (2.1)
where $h > 0$ is a smoothing parameter called the bandwidth. Parzen showed that $\hat{p}_h(x)$ converges to $p(x)$ whenever $h = h(n)$ satisfies
    $\lim_{n\to\infty} h(n) = 0$ and $\lim_{n\to\infty} n [h(n)]^d = \infty$.
The kernel with subscript h, $W_h(x, y) = \frac{1}{h^d} W(\frac{x}{h}, \frac{y}{h})$, is called the scaled kernel. Intuitively one wants to choose h as small as the data allow; however, there is always a trade-off between the bias of the estimator and its variance. When the boundary of X satisfies a certain decay condition, by choosing a proper parameter $h = h(n)$, a standard convergence rate for density estimation, found in [6,8], can be expressed as
    $\|\hat{p}_h - p\|_{L^2} = O\big(n^{-2/(d+4)}\big)$.   (2.2)
Note that if the unknown density p is smooth enough, the convergence rate can be improved by using higher order Parzen windows [21].

In this paper, our idea is that more accuracy should be demanded at sample points where the density of the marginal distribution $\rho_X$ is much larger than elsewhere. In the classical regression algorithm all sample points are treated equally, which may cause a large deviation when there exist some singular points among the sample data. Since an unseen data point appears with greater confidence in denser regions, our efforts should focus there, exploiting the huge surplus of unlabeled data. Taking the data structure of the input space into account, we propose the kernel-based regression estimator weighted by the density function:
    $f_{\mathbf{z},\lambda} = \arg\min_{f \in \mathcal{H}_K} \Big\{ \frac{1}{m} \sum_{i=1}^m p(x_i) (f(x_i) - y_i)^2 + \lambda \|f\|_K^2 \Big\}$.   (2.3)
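Before replacing the unknown density in (2.3) by its estimate, it may help to see the Parzen window estimator $\hat{p}_h$ of (2.1) in code. The following minimal Python sketch is only an illustration and is not part of the paper: the Gaussian window used here is one convenient basic window function satisfying Definition 1, and the bandwidth rule $h = n^{-1/(d+4)}$ is the classical choice behind the rate (2.2).

```python
# Illustrative sketch (not from the paper): Parzen window density estimation
# with a basic window function in the sense of Definition 1.  The Gaussian
# window and the bandwidth rule h ~ n^(-1/(d+4)) are assumptions made purely
# for this example.
import numpy as np

def gaussian_window(u, v):
    """Basic window W(u, v): integrates to 1 in v and decays faster than any
    polynomial, so conditions (i) and (ii) of Definition 1 hold."""
    d = u.shape[-1]
    sq = np.sum((u - v) ** 2, axis=-1)
    return (2.0 * np.pi) ** (-d / 2.0) * np.exp(-0.5 * sq)

def parzen_density(x_eval, x_sample, h):
    """Parzen window estimator  p_hat(x) = 1/(n h^d) * sum_i W(x/h, x_i/h)."""
    n, d = x_sample.shape
    vals = gaussian_window(x_eval[:, None, :] / h, x_sample[None, :, :] / h)
    return vals.sum(axis=1) / (n * h ** d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 2, 2000
    x_sample = rng.normal(size=(n, d))        # unlabeled sample from rho_X
    h = n ** (-1.0 / (d + 4))                 # classical bandwidth choice
    x_eval = np.zeros((1, d))
    print("estimated density at 0:", parzen_density(x_eval, x_sample, h)[0])
    print("true density at 0:     ", (2.0 * np.pi) ** (-d / 2.0))
```

Shrinking h reduces the bias of $\hat{p}_h$ but increases its variance, which is exactly the trade-off discussed above.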

As mentioned above, $p(x)$ is usually unknown, and we shall replace it with a density estimator. Here we make use of the Parzen window method as the kernel density estimate. Thus, given the set of n unlabeled examples $\{x_i\}_{i=m+1}^{m+n}$, we arrive at the optimization problem (1.2).

The optimization scheme (1.2) looks like weighted least squares regression (WLSR) in Supervised Learning [5,7], but here we are mainly concerned with the underlying data shape induced by large amounts of unlabeled data. In the classical WLSR setting there is no surplus of unlabeled data, and the estimated parameters may be sensitive to a weight function built from only a few observations.

By the reproducing property of the RKHS, the Representer Theorem holds, which means that the minimizer can be obtained in a finite dimensional space in terms of both labeled and unlabeled examples. As before, the Representer Theorem shows that the solution has the form
    $f_{\mathbf{z},\lambda}(x) = \sum_{i=1}^m \alpha_i K(x, x_i)$.
Substituting this form into the problem above, we arrive at the following convex differentiable objective function of the m-dimensional variable $a = [a_1, \dots, a_m]^T$:
    $\alpha = \arg\min_{a \in \mathbb{R}^m} \frac{1}{m}(Y - K_{\mathbf{x}} a)^T Q (Y - K_{\mathbf{x}} a) + \lambda a^T K_{\mathbf{x}} a$,
where $K_{\mathbf{x}}$ is the Gram matrix $(K_{\mathbf{x}})_{i,j} = K(x_i, x_j)$, $Q = \mathrm{diag}(\hat{p}_h(x_1), \dots, \hat{p}_h(x_m))$ and Y is the label vector $Y = [y_1, \dots, y_m]^T$. The derivative of the objective function vanishes at the minimizer, which leads to the following solution (a numerical sketch of this closed form is given after (3.4) below):
    $\alpha = (Q K_{\mathbf{x}} + \lambda m I)^{-1} Q Y$.

3. Sample error estimate

In this section we investigate the sample error of the proposed algorithm as the parameters h and λ change. To analyze (1.2), denote by $\mathbf{x}$ the set of inputs $\{x_1, \dots, x_m\}$ and define the sampling operator $S_{\mathbf{x}} : \mathcal{H}_K \to \mathbb{R}^m$ by $S_{\mathbf{x}} f = (f(x_i))_{i=1}^m$. The adjoint of the sampling operator, $S_{\mathbf{x}}^T : \mathbb{R}^m \to \mathcal{H}_K$, can be written as
    $S_{\mathbf{x}}^T c = \sum_{i=1}^m c_i K_{x_i}$, $c \in \mathbb{R}^m$,
where $K_x := K(x, \cdot)$. A method similar to that of [4] applies to algorithm (1.2), whose minimizer can be represented as
    $f_{\mathbf{z},\lambda} = \Big(\frac{1}{m(m+n)h^d} S_{\mathbf{x}}^T G S_{\mathbf{x}} + \lambda I\Big)^{-1} \frac{1}{m(m+n)h^d} S_{\mathbf{x}}^T G Y$,   (3.1)
where $G = \mathrm{diag}(g(x_1), \dots, g(x_m))$ with $g(x_i) = \sum_{j=1}^{m+n} W(\frac{x_i}{h}, \frac{x_j}{h})$, $i = 1, \dots, m$, and Y is defined as above. To study the limiting behavior of $f_{\mathbf{z},\lambda}$, the following notions play a key role in our theoretical analysis.

Definition 2. The regularization error and the regularizing function of the triple $(\mathcal{H}_K, f_\rho, \rho)$ are defined respectively as
    $D(\lambda) = \inf_{f \in \mathcal{H}_K} \Big\{ \frac{1}{h^d} \int_X \int_X W\Big(\frac{x}{h}, \frac{t}{h}\Big) \big(f(x) - f_\rho(x)\big)^2\,d\rho_X(x)\,d\rho_X(t) + \lambda \|f\|_K^2 \Big\}$,   (3.2)
and
    $f_\lambda = \arg\min_{f \in \mathcal{H}_K} \Big\{ \frac{1}{h^d} \int_X \int_X W\Big(\frac{x}{h}, \frac{t}{h}\Big) \big(f(x) - f_\rho(x)\big)^2\,d\rho_X(x)\,d\rho_X(t) + \lambda \|f\|_K^2 \Big\}$, $\lambda > 0$.   (3.3)

In order to obtain the explicit form of $f_\lambda$, we introduce the operator $\bar{L}_h : L^2_{\rho_X} \to L^2_{\rho_X}$ given by
    $\bar{L}_h f(u) = \frac{1}{h^d} \int_X \int_X W\Big(\frac{x}{h}, \frac{t}{h}\Big) f(x) K(u, x)\,d\rho_X(x)\,d\rho_X(t)$, for $u \in X$.
Taking functional derivatives, we find that $f_\lambda$ can be expressed in terms of $\bar{L}_h$ as
    $f_\lambda = (\lambda I + \bar{L}_h)^{-1} \bar{L}_h f_\rho$.   (3.4)
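The closed form $\alpha = (Q K_{\mathbf{x}} + \lambda m I)^{-1} Q Y$ obtained at the end of Section 2 can be turned into a few lines of code. The sketch below is an illustration only, not the authors' implementation: the Gaussian window, the Gaussian Mercer kernel K and all parameter values are assumptions chosen for the example, and the density weights in Q are computed from the labeled and unlabeled inputs together, as in (1.2).

```python
# Illustrative sketch (not the authors' code): semi-supervised kernel
# regression (1.2) solved through its finite-dimensional form
#     alpha = (Q K_x + lambda * m * I)^{-1} Q Y,
# with Parzen density weights Q built from labeled + unlabeled inputs.
import numpy as np

def sq_dists(a, b):
    return np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)

def scaled_window(a, b, h, d):
    """Scaled window (1/h^d) W(a/h, b/h); Gaussian window as an assumption."""
    return (2.0 * np.pi) ** (-d / 2.0) * np.exp(-0.5 * sq_dists(a, b) / h ** 2) / h ** d

def mercer_kernel(a, b, sigma=1.0):
    """Gaussian Mercer kernel K for the RKHS H_K (an illustrative choice)."""
    return np.exp(-sq_dists(a, b) / (2.0 * sigma ** 2))

def fit_semisupervised(x_lab, y_lab, x_unlab, lam, h):
    m, d = x_lab.shape
    x_all = np.vstack([x_lab, x_unlab])                   # all m + n inputs
    # Parzen density weights at the labeled points, using all m + n inputs
    q = scaled_window(x_lab, x_all, h, d).mean(axis=1)    # ~ p_hat(x_i)
    K = mercer_kernel(x_lab, x_lab)
    alpha = np.linalg.solve(np.diag(q) @ K + lam * m * np.eye(m), q * y_lab)
    return alpha

def predict(alpha, x_lab, x_new):
    return mercer_kernel(x_new, x_lab) @ alpha

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    target = lambda x: np.sin(2.0 * x[:, 0])
    x_lab = rng.uniform(-1, 1, size=(30, 1))
    y_lab = target(x_lab) + 0.1 * rng.normal(size=30)
    x_unlab = rng.uniform(-1, 1, size=(500, 1))           # n >> m unlabeled points
    alpha = fit_semisupervised(x_lab, y_lab, x_unlab, lam=1e-3, h=0.2)
    x_test = np.linspace(-1, 1, 5).reshape(-1, 1)
    print(np.c_[target(x_test), predict(alpha, x_lab, x_test)])
```

The only difference from ordinary kernel ridge regression is the diagonal matrix Q: labeled points sitting in dense regions of $\rho_X$, as measured with the help of the unlabeled inputs, receive larger weight in the fit.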

Intuitively, for fixed $f \in \mathcal{H}_K$ there holds
    $E\Big[\frac{1}{m(m+n)h^d} S_{\mathbf{x}}^T G S_{\mathbf{x}} f\Big] = \bar{L}_h f$ and $E\Big[\frac{1}{m(m+n)h^d} S_{\mathbf{x}}^T G Y\Big] = \bar{L}_h f_\rho$.
Therefore one expects $f_{\mathbf{z},\lambda}$ to approximate $f_\lambda$. To bound the sample error $f_{\mathbf{z},\lambda} - f_\lambda$, we need a McDiarmid–Bernstein type probability inequality for vector-valued random variables, which appears in [15].

Lemma 1. Let $\mathbf{x} = \{x_i\}_{i=1}^m$ be i.i.d. draws from a probability distribution ρ on X, $(H, \|\cdot\|)$ a Hilbert space, and $F : X^m \to H$ measurable. If there exists $\tilde{M} \ge 0$ such that $\|F(\mathbf{x}) - E_{x_i} F(\mathbf{x})\| \le \tilde{M}$ for each $1 \le i \le m$ and almost every $\mathbf{x}$, then for every $\varepsilon > 0$,
    $\mathrm{Prob}_{\mathbf{x}}\big\{ \|F(\mathbf{x}) - E F(\mathbf{x})\| \ge \varepsilon \big\} \le 2 \exp\Big\{ -\frac{\varepsilon^2}{2(\tilde{M}\varepsilon + \sigma^2)} \Big\}$,   (3.5)
where $\sigma^2 := \sum_{i=1}^m \sup_{\mathbf{x} \setminus \{x_i\}} E_{x_i} \|F(\mathbf{x}) - E_{x_i} F(\mathbf{x})\|^2$. Consequently, for any $0 < \delta < 1$, with confidence $1 - \delta$, there holds
    $\|F(\mathbf{x}) - E F(\mathbf{x})\| \le 2 \log\frac{2}{\delta}\,\big(\tilde{M} + \sigma\big)$.

It is worth pointing out that, although estimating the sample error is a standard method in learning theory [11,16], the operator $\bar{L}_h$ is not necessarily Hilbert–Schmidt, so the corresponding techniques are not directly applicable here.

Proposition 1. Suppose that the density $p(x)$ of $\rho_X$ exists and satisfies $\sup_{x \in X} p(x) \le c_p$. Then for any $0 < h \le 1$ and $0 < \delta < 1$, with confidence $1 - \delta$, there holds
    $\|f_{\mathbf{z},\lambda} - f_\lambda\|_K \le \frac{2 c_q \kappa (M + \|f_\lambda\|_K)}{\lambda} \Big( \frac{6}{m h^d} + \frac{6 c_q c_p}{\sqrt{m}\, h^{d/2}} \Big) \log\frac{2}{\delta}$.   (3.6)

Proof. By (3.1), we have
    $f_{\mathbf{z},\lambda} - f_\lambda = \Big(\frac{1}{m(m+n)h^d} S_{\mathbf{x}}^T G S_{\mathbf{x}} + \lambda I\Big)^{-1} \Big[ \frac{1}{m(m+n)h^d} \big( S_{\mathbf{x}}^T G Y - S_{\mathbf{x}}^T G S_{\mathbf{x}} f_\lambda \big) - \lambda f_\lambda \Big]$.
Define a function $F : Z^m \times X^n \to \mathcal{H}_K$ by
    $F(\mathbf{z}, \mathbf{x}) = \frac{1}{m(m+n)h^d} \big( S_{\mathbf{x}}^T G Y - S_{\mathbf{x}}^T G S_{\mathbf{x}} f_\lambda \big) = \frac{1}{m(m+n)h^d} \sum_{i=1}^m \sum_{j=1}^{m+n} W\Big(\frac{x_i}{h}, \frac{x_j}{h}\Big) \big( y_i - f_\lambda(x_i) \big) K_{x_i}$.
By independence, the expectation of $F(\mathbf{z}, \mathbf{x})$ equals
    $\frac{1}{h^d} \int_X \int_X W\Big(\frac{x}{h}, \frac{t}{h}\Big) \big( f_\rho(x) - f_\lambda(x) \big) K_x\,d\rho_X(x)\,d\rho_X(t) = \bar{L}_h (f_\rho - f_\lambda)$.
By (3.4), we see that $\lambda f_\lambda = \bar{L}_h(f_\rho - f_\lambda) = E F(\mathbf{z}, \mathbf{x})$. Therefore,
    $\|f_{\mathbf{z},\lambda} - f_\lambda\|_K \le \frac{1}{\lambda} \big\| F(\mathbf{z}, \mathbf{x}) - E F(\mathbf{z}, \mathbf{x}) \big\|_K$.
When $k \in \{1, \dots, m\}$, the difference $F(\mathbf{z}, \mathbf{x}) - E_{z_k} F(\mathbf{z}, \mathbf{x})$ collects exactly the terms of F in which the point $(x_k, y_k)$ occurs, namely the terms with $i = k$ and the terms with $j = k$, $i \ne k$, together with the corresponding conditional expectations. The reproducing property of the RKHS implies that
    $\|f\|_{L^2_{\rho_X}} \le \|f\|_\infty \le \kappa \|f\|_K$, for $f \in \mathcal{H}_K$.   (3.7)

Hence, when $k \in \{1, \dots, m\}$, we have
    $\|F(\mathbf{z}, \mathbf{x}) - E_{z_k} F(\mathbf{z}, \mathbf{x})\|_K \le \tilde{M} := \frac{6 c_q \kappa (M + \|f_\lambda\|_K)}{m h^d}$.   (3.8)
Clearly, the same conclusion also holds for the case $k \in \{m+1, \dots, m+n\}$. Next we need to estimate the variance $\sigma^2$. The quantity $\|F(\mathbf{z}, \mathbf{x}) - E_{z_k} F(\mathbf{z}, \mathbf{x})\|_K$ can also be bounded by
    $\frac{3 c_q \kappa (M + \|f_\lambda\|_K)}{m(m+n)h^d} \Big( \sum_{j=1}^{m+n} W\Big(\frac{x_k}{h}, \frac{x_j}{h}\Big) + \sum_{i \ne k} W\Big(\frac{x_i}{h}, \frac{x_k}{h}\Big) \Big)$.
Note that from Definition 1 of the Parzen window it follows that $\big( E_{x_k} \|F(\mathbf{z}, \mathbf{x}) - E_{z_k} F(\mathbf{z}, \mathbf{x})\|_K^2 \big)^{1/2}$ is bounded by
    $\frac{6 c_q \kappa (M + \|f_\lambda\|_K)}{m(m+n)h^d} \sum_{j=1}^{m+n} \Big( \int_{\mathbb{R}^d} \frac{c_q^2\, p(x)}{(1 + |x_j - x|/h)^{2q}}\,dx \Big)^{1/2} \le \frac{6 c_q^2 \kappa c_p (M + \|f_\lambda\|_K)}{m h^{d/2}}$.   (3.9)
It follows that for $0 < h \le 1$,
    $\sigma^2 \le \frac{36 c_q^4 \kappa^2 c_p^2 (M + \|f_\lambda\|_K)^2}{m h^d}$.   (3.10)
Thus Proposition 1 follows from Lemma 1. □

4. Approximation error estimate

To estimate the approximation error, we need to consider the convergence of $\bar{L}_h$ as $h \to 0$.

Proposition 2. Under Assumption 1, $V_p \le c_p$, and for any $0 < h \le 1$ we have
    $\|\bar{L}_h - V_p L_K\|_{L^2_{\rho_X} \to L^2_{\rho_X}} \le h^\tau \kappa^2 c_q c_p \big( M_{q+2} + M_{q+\tau} + M_q \big)$.
Here $M_{q+2}$, $M_{q+\tau}$ and $M_q$ are constants independent of h.

Proof. Let $f \in L^2_{\rho_X}$. By Definition 1 we see that $\|\bar{L}_h f - V_p L_K f\|_{L^2_{\rho_X}}$ is bounded by
    $\kappa^2 \int_X |f(x)| \frac{1}{h^d} \int_X W\Big(\frac{x}{h}, \frac{t}{h}\Big) |p(x) - p(t)|\,dt\,d\rho_X(x) + \kappa^2 \int_X |f(x)|\, p(x) \frac{1}{h^d} \int_{\mathbb{R}^d \setminus X} W\Big(\frac{x}{h}, \frac{t}{h}\Big)\,dt\,d\rho_X(x)$.
For the first term, condition (1.4) and Definition 1(ii) give
    $\kappa^2 \int_X |f(x)| \frac{1}{h^d} \int_X W\Big(\frac{x}{h}, \frac{t}{h}\Big) |p(x) - p(t)|\,dt\,d\rho_X(x) \le \kappa^2 c_p c_q \int_X |f(x)| \frac{1}{h^d} \int_{\mathbb{R}^d} \frac{|x - t|^\tau}{(1 + |x - t|/h)^q}\,dt\,d\rho_X(x) \le h^\tau \kappa^2 c_p c_q M_{q+\tau} \|f\|_{L^2_{\rho_X}}$,
where $M_{q+\tau}$ is a constant independent of h and the last inequality is obtained by the Cauchy–Schwarz inequality.

For the second term, separate the domain X into $X_h := \{ x \in X : \inf_{u \in \mathbb{R}^d \setminus X} |u - x| \le h \}$ and its complement $X \setminus X_h$. When $x \in X \setminus X_h$, every $t \in \mathbb{R}^d \setminus X$ satisfies $|t - x| \ge h$, and the decay condition in Definition 1(ii), together with $q > d + 2$ and $h \le 1$, yields
    $\kappa^2 \int_{X \setminus X_h} |f(x)|\, p(x) \frac{1}{h^d} \int_{\mathbb{R}^d \setminus X} W\Big(\frac{x}{h}, \frac{t}{h}\Big)\,dt\,d\rho_X(x) \le h^\tau \kappa^2 c_q c_p M_{q+2} \|f\|_{L^2_{\rho_X}}$,
where $M_{q+2}$ is a constant similar to $M_{q+\tau}$. For the subset $X_h$, we use the Cauchy–Schwarz inequality and obtain
    $\kappa^2 \int_{X_h} |f(x)|\, p(x) \frac{1}{h^d} \int_{\mathbb{R}^d \setminus X} W\Big(\frac{x}{h}, \frac{t}{h}\Big)\,dt\,d\rho_X(x) \le \kappa^2 c_q c_p M_q \sqrt{\rho_X(X_h)}\, \|f\|_{L^2_{\rho_X}}$.

By (1.3), $\rho_X(X_h) \le c_p^2 h^{2\tau}$. Thus, for any $0 < h \le 1$, we have
    $\|\bar{L}_h f - V_p L_K f\|_{L^2_{\rho_X}} \le h^\tau \kappa^2 c_q c_p \big( M_{q+2} + M_{q+\tau} + M_q \big) \|f\|_{L^2_{\rho_X}}$.
This proves the desired result. □

Define the regularizing function independent of h,
    $\tilde{f}_\lambda = \Big( \frac{\lambda}{V_p} I + L_K \Big)^{-1} L_K f_\rho$.

Proposition 3. Under assumptions (1.3) and (1.4), if $L_K^{-r} f_\rho \in L^2_{\rho_X}$ with $0 < r \le 1$, we have
    $\|f_\lambda - \tilde{f}_\lambda\|_{L^2_{\rho_X}} \le C_0\, h^\tau \lambda^{r-1}$,
where $C_0 = \kappa^2 c_q c_p V_p^{-r} \big( M_{q+2} + M_{q+\tau} + M_q \big) \|L_K^{-r} f_\rho\|_{L^2_{\rho_X}}$.

Proof. Observe that
    $f_\lambda - \tilde{f}_\lambda = \lambda \big( \lambda I + \bar{L}_h \big)^{-1} \big( \bar{L}_h - V_p L_K \big) \big( \lambda I + V_p L_K \big)^{-1} f_\rho$.
It follows that
    $\|f_\lambda - \tilde{f}_\lambda\|_{L^2_{\rho_X}} \le \lambda \big\| (\lambda I + \bar{L}_h)^{-1} \big\|\, \big\| \bar{L}_h - V_p L_K \big\|\, \big\| (\lambda I + V_p L_K)^{-1} (V_p L_K)^r \big\|\, V_p^{-r} \big\| L_K^{-r} f_\rho \big\|_{L^2_{\rho_X}} \le h^\tau \lambda^{r-1} C_0$,
where we used $\|(\lambda I + \bar{L}_h)^{-1}\| \le \lambda^{-1}$ and $\|(\lambda I + V_p L_K)^{-1} (V_p L_K)^r\| \le \lambda^{r-1}$. The desired result follows from Proposition 2. □

To get the total error $f_{\mathbf{z},\lambda} - f_\rho$ we also need a bound for the approximation error $\tilde{f}_\lambda - f_\rho$. The following result is well known; it was proved by Smale and Zhou in [16].

Lemma 2. Define $\tilde{f}_\lambda$ as above. If $L_K^{-r} f_\rho \in L^2_{\rho_X}$ with $0 < r \le 1$, then
    $\|\tilde{f}_\lambda - f_\rho\|^2_{L^2_{\rho_X}} + \lambda \|\tilde{f}_\lambda\|^2_K \le \lambda^{2r} V_p^{-2r} \|L_K^{-r} f_\rho\|^2_{L^2_{\rho_X}}$, if $0 < r < \frac{1}{2}$,   (4.1)
and
    $\|\tilde{f}_\lambda - f_\rho\|_K \le \lambda^{r - \frac{1}{2}} V_p^{\frac{1}{2} - r} \|L_K^{-r} f_\rho\|_{L^2_{\rho_X}}$, if $\frac{1}{2} \le r \le 1$.   (4.2)
Moreover, for $0 < r \le 1$, there holds
    $\|\tilde{f}_\lambda - f_\rho\|_{L^2_{\rho_X}} \le \lambda^{r} V_p^{-r} \|L_K^{-r} f_\rho\|_{L^2_{\rho_X}}$.   (4.3)

Combining Proposition 3 with (4.3), the approximation error can be stated as follows.

Proposition 4. Under assumptions (1.3) and (1.4), if $L_K^{-r} f_\rho \in L^2_{\rho_X}$ with $0 < r \le 1$, we have
    $\|f_\lambda - f_\rho\|_{L^2_{\rho_X}} \le \big( h^\tau \lambda^{r-1} + \lambda^{r} \big) \big( C_0 + V_p^{-r} \|L_K^{-r} f_\rho\|_{L^2_{\rho_X}} \big)$,   (4.4)
where $C_0$ is defined as in Proposition 3.

5. Learning rate and conclusion

We are now in a position to derive the learning rate of the Semi-Supervised Learning algorithm (1.2).

Proof of Theorem 1. Combining Propositions 1 and 4 with (3.7), with confidence $1 - \delta$,
    $\|f_{\mathbf{z},\lambda} - f_\rho\|_{L^2_{\rho_X}} \le \|f_{\mathbf{z},\lambda} - f_\lambda\|_{L^2_{\rho_X}} + \|f_\lambda - f_\rho\|_{L^2_{\rho_X}}$
    $\le \frac{2 c_q \kappa^2 (M + \|f_\lambda\|_K)}{\lambda} \Big( \frac{6}{m h^d} + \frac{6 c_q c_p}{\sqrt{m}\, h^{d/2}} \Big) \log\frac{2}{\delta} + \big( h^\tau \lambda^{r-1} + \lambda^{r} \big) \big( C_0 + V_p^{-r} \|L_K^{-r} f_\rho\|_{L^2_{\rho_X}} \big)$
    $\le C \log\frac{2}{\delta}\, \Big(\frac{1}{m}\Big)^{\frac{\tau r}{2\tau r + 1 + d}}$,
provided that $h = \lambda^{1/\tau}$ and $\lambda = m^{-\frac{\tau}{2\tau r + 1 + d}}$, where C is a constant independent of m and δ. □
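To make the exponent in Theorem 1 concrete, the following worked instance plugs one set of example values into the parameter choices as reconstructed above; the specific values $\tau = 1$, $r = \tfrac{1}{2}$, $d = 1$ are only an illustration.

```latex
% Example instance of Theorem 1 (illustrative values; uses the exponents as stated above)
% With \tau = 1, r = \tfrac12, d = 1:
\lambda = m^{-\frac{\tau}{2\tau r + 1 + d}} = m^{-\frac{1}{3}}, \qquad
h = \lambda^{1/\tau} = m^{-\frac{1}{3}}, \qquad
\|f_{\mathbf z,\lambda} - f_\rho\|_{L^2_{\rho_X}}
  = O\!\Big(\log\tfrac{2}{\delta}\; m^{-\frac{\tau r}{2\tau r + 1 + d}}\Big)
  = O\!\Big(\log\tfrac{2}{\delta}\; m^{-\frac{1}{6}}\Big).
```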

We have introduced a Semi-Supervised Learning algorithm with density estimators generated by Parzen window functions. The main difference from weighted Supervised Learning is that large amounts of unlabeled data are employed in the density estimation; the estimated parameters may therefore be less sensitive than in settings where only a few observations are used, such as weighted least squares regression. An error analysis is given for the convergence of the learner to the regression function. The technique we use here is a capacity-independent one. Taking into account the complexity of the hypothesis space, through Rademacher complexity [18] or various covering numbers [20], we expect that the convergence rate can be improved considerably when the hypothesis space is good enough.

References

[1] M.F. Balcan, A. Blum, A discriminative model for semi-supervised learning, J. ACM (2009).
[2] M. Belkin, I. Matveeva, P. Niyogi, Regularization and semi-supervised learning on large graphs, in: COLT, 2004.
[3] O. Chapelle, A. Zien, B. Schölkopf, Semi-Supervised Learning, MIT Press, 2006.
[4] F. Cucker, S. Smale, On the mathematical foundations of learning, Bull. Amer. Math. Soc. (N.S.) 39 (2002) 1-49.
[5] F. Cucker, D.X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge University Press, Cambridge, 2007.
[6] L. Devroye, G. Lugosi, Combinatorial Methods in Density Estimation, Springer, Heidelberg, 2000.
[7] T. Evgeniou, M. Pontil, T. Poggio, Regularization networks and support vector machines, Adv. Comput. Math. 13 (2000) 1-50.
[8] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, London, 1990.
[9] R.K. Ando, T. Zhang, Two-view feature generation model for semi-supervised learning, in: Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, 2007.
[10] J. Lafferty, X. Zhu, Y. Liu, Kernel conditional random fields: representation and clique selection, in: Proceedings of the 21st International Conference on Machine Learning (ICML), 2004.
[11] S. Mukherjee, D.X. Zhou, Learning coordinate covariances via gradients, J. Mach. Learn. Res. 7 (2006) 519-549.
[12] K. Nigam, T. Mitchell, A. McCallum, S. Thrun, Text classification from labeled and unlabeled documents using EM, Machine Learning 39 (2000) 103-134.
[13] E. Parzen, On the estimation of a probability density function and the mode, Ann. Math. Statist. 33 (1962) 1065-1076.
[14] Z.W. Pan, D.H. Xiang, Q.W. Xiao, D.X. Zhou, Parzen windows for multi-class classification, J. Complexity.
[15] I. Pinelis, Optimum bounds for the distributions of martingales in Banach spaces, Ann. Probab. 22 (1994) 1679-1706.
[16] S. Smale, D.X. Zhou, Learning theory estimates via integral operators and their approximations, Constr. Approx. 26 (2007) 153-172.
[17] B. Taskar, C. Guestrin, D. Koller, Max-margin Markov networks, in: NIPS, 2003.
[18] A.W. van der Vaart, J.A. Wellner, Weak Convergence and Empirical Processes, Springer-Verlag, New York, 1996.
[19] M.P. Wand, M.C. Jones, Kernel Smoothing, Monogr. Statist. Appl. Probab., vol. 60, Chapman & Hall, London, 1995.
[20] D.X. Zhou, Capacity of reproducing kernel spaces in learning theory, IEEE Trans. Inform. Theory.
[21] X.J. Zhou, D.X. Zhou, High order Parzen windows and randomized sampling, Adv. Comput. Math.
[22] X.J. Zhu, Semi-supervised learning literature survey, Computer Sciences Technical Report 1530, University of Wisconsin-Madison.
[23] X.J. Zhu, Semi-supervised learning with graphs, Doctoral Thesis, Carnegie Mellon University, 2005.
