Manifold learning via Multi-Penalty Regularization
Manifold Learning via Multi-Penalty Regularization

Abhishake Rastogi
Department of Mathematics, Indian Institute of Technology Delhi, New Delhi 110016, India

Abstract. Manifold regularization is an approach which exploits the geometry of the marginal distribution. The main goal of this paper is to analyze the convergence issues of such regularization algorithms in learning theory. We propose a more general multi-penalty framework and establish optimal convergence rates under a general smoothness assumption. We present a theoretical analysis of the performance of multi-penalty regularization over a reproducing kernel Hilbert space, and discuss error estimates of the regularization schemes under certain prior assumptions on the joint probability measure on the sample space. We analyze the convergence rates of the learning algorithms measured in the norm of the reproducing kernel Hilbert space and in the norm of the Hilbert space of square-integrable functions. Convergence is established in a probabilistic sense via exponential tail inequalities. In order to optimize the regularization functional, a crucial issue is to select the regularization parameters so as to ensure good performance of the solution. We propose a new parameter choice rule, the penalty balancing principle, based on augmented Tikhonov regularization. The superiority of multi-penalty regularization over single-penalty regularization is demonstrated on an academic example and the two moon data set.

Keywords: Learning theory, Multi-penalty regularization, General source condition, Optimal rates, Penalty balancing principle.

Mathematics Subject Classification 2010: 68T05, 68Q32.

1 Introduction

Let X be a compact metric space and Y ⊂ R, with a joint probability measure ρ on Z = X × Y. Suppose z = {(x_i, y_i)}_{i=1}^m ⊂ Z is an observation set drawn from the unknown probability measure ρ. The learning problem [1, 2, 3, 4] aims to approximate a function f_z based on z such that f_z(x) ≈ y.
The goodness of the estimator can be measured by the generalization error of a function f, defined as

  E(f) := E_ρ(f) = ∫_Z V(f(x), y) dρ(x, y),    (1)

DOI: 10.5121/ijaia
where V : Y × Y → R is the loss function. The minimizer of E(f) for the square loss V(f(x), y) = (f(x) − y)^2 is given by

  f_ρ(x) := ∫_Y y dρ(y|x),    (2)

where f_ρ is called the regression function. Its optimality is clear from the following identity:

  E(f) = ∫_X (f(x) − f_ρ(x))^2 dρ_X(x) + σ_ρ^2,

where σ_ρ^2 = ∫_X ∫_Y (y − f_ρ(x))^2 dρ(y|x) dρ_X(x). The regression function f_ρ belongs to the space of square-integrable functions provided that

  ∫_Z y^2 dρ(x, y) < ∞.    (3)

Therefore our objective becomes to estimate the regression function f_ρ.

Single-penalty regularization is widely used to infer an estimator from a given set of random samples [5, 6, 7, 8, 9, 10]. Smale et al. [9, 11, 12] provided the foundations of the theoretical analysis of the square-loss regularization scheme under Hölder's source condition. Caponnetto et al. [6] improved the error estimates to optimal convergence rates for the regularized least-squares algorithm using a polynomial decay condition on the eigenvalues of the integral operator. Sometimes, however, one may need to add further penalties in order to incorporate more features in the regularized solution. Multi-penalty regularization has been studied by various authors for both inverse problems and learning algorithms [13, 14, 15, 16, 17, 18, 19, 20]. Belkin et al. [13] discussed manifold regularization, which controls the complexity of the function in the ambient space as well as the geometry of the probability space:

  f* = argmin_{f ∈ H_K} { (1/m) Σ_{i=1}^m (f(x_i) − y_i)^2 + λ_A ||f||_{H_K}^2 + λ_I Σ_{i,j=1}^n (f(x_i) − f(x_j))^2 ω_ij },    (4)

where {(x_i, y_i) ∈ X × Y : 1 ≤ i ≤ m} ∪ {x_i ∈ X : m < i ≤ n} is the given set of labeled and unlabeled data, λ_A and λ_I are non-negative regularization parameters, the ω_ij are non-negative weights, H_K is a reproducing kernel Hilbert space and ||·||_{H_K} is its norm. The manifold regularization algorithm has further been developed and widely studied in the vector-valued framework to analyze multi-task learning problems [21, 22, 23, 24] (see also references therein). This motivates us to analyze the problem theoretically.
The convergence of the multi-penalty regularizer is discussed under a general source condition in [25], but the convergence rates obtained there are not optimal. Here we achieve the optimal minimax convergence rates using the polynomial decay condition on the eigenvalues of the integral operator. In order to optimize the regularization functional, a crucial problem is the parameter choice strategy. Various a priori and a posteriori parameter choice rules have been proposed for single-penalty regularization [26, 27, 28, 29, 30] (see also references therein). Many parameter selection approaches have been discussed for multi-penalized ill-posed inverse problems, such as the discrepancy principle [15, 31], the quasi-optimality principle [18, 32], the balanced-discrepancy principle [33], the heuristic L-curve [34], noise-structure-based parameter choice rules [35, 36, 37], and approaches which require reduction to single-penalty regularization [38]. Due to the growing interest in multi-penalty
International Journal of Artificial Intelligence and Applications (IJAIA), Vol. 8, No. 5, September 2017

regularization in learning, multi-parameter choice rules have been discussed in the learning theory framework, such as the discrepancy principle [15, 16], the balanced-discrepancy principle [25], and a parameter choice strategy based on the generalized cross-validation score [19]. Here we discuss the penalty balancing principle (PB-principle), considered for multi-penalty regularization in ill-posed problems [33], to choose the regularization parameters in our learning theory framework.

1.1 Mathematical Preliminaries and Notations

Definition 1.1. Let X be a non-empty set and H a Hilbert space of real-valued functions on X. If the pointwise evaluation map F_x : H → R, defined by F_x(f) = f(x) for all f ∈ H, is continuous for every x ∈ X, then H is called a reproducing kernel Hilbert space.

For each reproducing kernel Hilbert space H there exists a Mercer kernel K : X × X → R such that, for K_x : X → R defined as K_x(y) = K(x, y), the span of the set {K_x : x ∈ X} is dense in H. Moreover, there is a one-to-one correspondence between Mercer kernels and reproducing kernel Hilbert spaces [39]. So we denote by H_K the reproducing kernel Hilbert space corresponding to a Mercer kernel K, and by ||·||_K its norm.

Definition 1.2. The sampling operator S_x : H_K → R^m associated with a discrete subset x = {x_i}_{i=1}^m is defined by S_x(f) = (f(x_i))_{i=1}^m.

Denote by S_x* : R^m → H_K the adjoint of S_x, where R^m carries the weighted inner product ⟨c, d⟩ = (1/m) Σ_{i=1}^m c_i d_i. Then for c ∈ R^m,

  ⟨f, S_x* c⟩_K = ⟨S_x f, c⟩ = (1/m) Σ_{i=1}^m c_i f(x_i) = ⟨f, (1/m) Σ_{i=1}^m c_i K_{x_i}⟩_K,  f ∈ H_K,

so the adjoint is given by

  S_x* c = (1/m) Σ_{i=1}^m c_i K_{x_i},  c = (c_1, …, c_m) ∈ R^m.

From the following estimate we observe that S_x is a bounded operator:

  ||S_x f||^2 = (1/m) Σ_{i=1}^m ⟨f, K_{x_i}⟩_K^2 ≤ ||f||_K^2 (1/m) Σ_{i=1}^m ||K_{x_i}||_K^2 ≤ κ^2 ||f||_K^2,

which implies ||S_x|| ≤ κ, where κ := sup_{x ∈ X} √K(x, x).

For each (x_i, y_i) ∈ Z, y_i = f_ρ(x_i) + η_{x_i}, where the probability distribution of η_{x_i} has mean 0 and variance σ_{x_i}^2. Denote σ^2 := (1/m) Σ_{i=1}^m σ_{x_i}^2 < ∞.

1.2 Learning Scheme
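As a concrete illustration of the sampling operator and its adjoint defined above, the pairing identity ⟨S_x f, c⟩ = ⟨f, S_x* c⟩_K can be checked numerically for a function in the span of kernel sections. The following is a minimal sketch under illustrative assumptions: a Gaussian kernel, random points on [0, 1], and the (1/m)-weighted inner product on R^m; none of these choices come from the paper's experiments.

```python
import numpy as np

def K(x, y, gamma=1.0):
    """Gaussian Mercer kernel matrix K(x_i, y_j) (illustrative choice)."""
    return np.exp(-gamma * (np.asarray(x)[:, None] - np.asarray(y)[None, :]) ** 2)

rng = np.random.default_rng(0)
t = rng.uniform(0, 1, 5)        # centers: f = sum_j a_j K_{t_j} lies in H_K
a = rng.normal(size=5)
x = rng.uniform(0, 1, 8)        # sampling points, m = 8
c = rng.normal(size=8)
m = len(x)

Sx_f = K(x, t) @ a              # S_x f = (f(x_1), ..., f(x_m))
lhs = (1 / m) * c @ Sx_f        # <S_x f, c> with the (1/m)-weighted product
# S_x* c = (1/m) sum_i c_i K_{x_i}; by the reproducing property
# <K_{t_j}, K_{x_i}>_K = K(t_j, x_i), so <f, S_x* c>_K reduces to:
rhs = (1 / m) * a @ (K(t, x) @ c)
print(abs(lhs - rhs))           # the two pairings agree
```

The agreement of `lhs` and `rhs` is exactly the adjoint relation that makes S_x* c = (1/m) Σ_i c_i K_{x_i} the correct expression.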
The optimization functional (4) can be expressed as

  f* = argmin_{f ∈ H_K} { ||S_x f − y||^2 + λ_A ||f||_K^2 + λ_I ||(S_x* L S_x)^{1/2} f||_K^2 },    (5)
where x = {x_i ∈ X : 1 ≤ i ≤ n}, ||y||^2 = (1/m) Σ_{i=1}^m y_i^2, and L = D − W, with W = (ω_ij) a weight matrix with non-negative entries and D the diagonal matrix with D_ii = Σ_{j=1}^n ω_ij. Here we consider a more general regularized learning scheme based on two penalties:

  f_{z,λ} := argmin_{f ∈ H_K} { ||S_x f − y||^2 + λ1 ||f||_K^2 + λ2 ||Bf||_K^2 },    (6)

where B : H_K → H_K is a bounded operator and λ1, λ2 are non-negative parameters.

Theorem 1.1. If S_x* S_x + λ1 I + λ2 B*B is invertible, then the optimization functional (6) has the unique minimizer

  f_{z,λ} = S S_x* y,  where S := (S_x* S_x + λ1 I + λ2 B*B)^{−1}.

Proof. We know that for f ∈ H_K,

  ||S_x f − y||^2 + λ1 ||f||_K^2 + λ2 ||Bf||_K^2 = ⟨(S_x* S_x + λ1 I + λ2 B*B)f, f⟩_K − 2⟨S_x* y, f⟩_K + ||y||^2.

Taking the functional derivative with respect to f ∈ H_K, we see that any minimizer f_{z,λ} of (6) satisfies

  (S_x* S_x + λ1 I + λ2 B*B) f_{z,λ} = S_x* y.

This proves Theorem 1.1.

Define f_{x,λ} as the minimizer of the optimization problem

  f_{x,λ} := argmin_{f ∈ H_K} { (1/m) Σ_{i=1}^m (f(x_i) − f_ρ(x_i))^2 + λ1 ||f||_K^2 + λ2 ||Bf||_K^2 },    (7)

which gives

  f_{x,λ} = S S_x* S_x f_ρ.    (8)

The data-free version of the considered regularization scheme (6) is

  f_λ := argmin_{f ∈ H_K} { ||f − f_ρ||_ρ^2 + λ1 ||f||_K^2 + λ2 ||Bf||_K^2 },    (9)

where ||·||_ρ := ||·||_{L^2_{ρ_X}}. Then we get the expression

  f_λ = (L_K + λ1 I + λ2 B*B)^{−1} L_K f_ρ.    (10)

Similarly, the single-penalty data-free minimizer is

  f_{λ1} := argmin_{f ∈ H_K} { ||f − f_ρ||_ρ^2 + λ1 ||f||_K^2 },    (11)

which implies

  f_{λ1} = (L_K + λ1 I)^{−1} L_K f_ρ,    (12)

where the integral operator L_K : L^2_{ρ_X} → L^2_{ρ_X} is a self-adjoint, non-negative, compact operator, defined as

  L_K(f)(x) := ∫_X K(x, t) f(t) dρ_X(t),  x ∈ X.
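In a finite-dimensional surrogate, the normal equation of the minimizer above becomes a linear system for the coefficients of f = Σ_i α_i K(x_i, ·). The sketch below assumes B*B = S_x* L S_x for a graph Laplacian L (as in the manifold penalty), in which case the coefficients solve ((1/m) J K + λ1 I + λ2 L K) α = (1/m) J y, where J projects onto the labeled points; the Gaussian kernel, the data, and all parameter values are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def gauss(X, Y, gamma=1.0):
    """Illustrative Gaussian kernel matrix."""
    d2 = (np.asarray(X)[:, None] - np.asarray(Y)[None, :]) ** 2
    return np.exp(-gamma * d2)

def two_penalty_fit(x, y, Lap, lam1, lam2, gamma=1.0):
    """Coefficients alpha of f = sum_i alpha_i K(x_i, .) minimizing
    (1/m)||S_x f - y||^2 + lam1 ||f||_K^2 + lam2 <f, S_x* Lap S_x f>_K.
    The first m = len(y) points of x are labeled; the rest are unlabeled."""
    n, m = len(x), len(y)
    Kmat = gauss(x, x, gamma)
    J = np.zeros(n); J[:m] = 1.0                  # selects labeled outputs
    A = (1 / m) * J[:, None] * Kmat + lam1 * np.eye(n) + lam2 * Lap @ Kmat
    b = np.zeros(n); b[:m] = y / m                # (1/m) J y, zero-padded
    return np.linalg.solve(A, b)

# Sanity check: with lam2 = 0 the labeled coefficients reduce to kernel ridge
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 12)                          # 8 labeled + 4 unlabeled points
y = np.sin(2 * np.pi * x[:8]) + 0.05 * rng.normal(size=8)
Lap = np.zeros((12, 12))                           # trivial Laplacian for the check
alpha = two_penalty_fit(x, y, Lap, lam1=1e-3, lam2=0.0)
```

With λ2 = 0 the unlabeled coefficients vanish and the labeled block solves (K + mλ1 I)α = y, the standard kernel ridge system, which serves as a correctness check of the assembled normal equation; with λ2 > 0 and a genuine graph Laplacian the same routine is a sketch of the manifold-regularized solution (4).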
The integral operator L_K can also be defined as a self-adjoint operator on H_K; we use the same notation L_K for both operators. Using the singular value decomposition L_K = Σ_{i=1}^∞ t_i ⟨·, e_i⟩_K e_i for an orthonormal system {e_i} in H_K and a sequence of singular values κ^2 ≥ t_1 ≥ t_2 ≥ … ≥ 0, we define

  φ(L_K) = Σ_{i=1}^∞ φ(t_i) ⟨·, e_i⟩_K e_i,

where φ is a continuous increasing index function defined on the interval [0, κ^2] with φ(0) = 0. We require some prior assumptions on the probability measure ρ to achieve uniform convergence rates for the learning algorithms.

Assumption 1 (Source condition). Suppose

  f_ρ ∈ Ω_{φ,R} := { f ∈ H_K : f = φ(L_K) g and ||g||_K ≤ R }.

The condition f_ρ ∈ Ω_{φ,R} is usually referred to as a general source condition [40].

Assumption 2 (Polynomial decay condition). We assume that the eigenvalues t_n of the integral operator L_K follow a polynomial decay: for fixed positive constants α, β and b > 1,

  α n^{−b} ≤ t_n ≤ β n^{−b}  for all n ∈ N.

Following Bauer et al. [5] and Caponnetto et al. [6], we consider the class of probability measures P_φ which satisfy the source condition, and the class P_{φ,b} which satisfies both the source condition and the polynomial decay condition.

Under the polynomial decay condition, the effective dimension N(λ1) can be estimated from Proposition 3 of [6] as follows:

  N(λ1) := Tr((L_K + λ1 I)^{−1} L_K) ≤ (βb/(b − 1)) λ1^{−1/b},  for b > 1,    (13)

where Tr(A) := Σ_{k=1}^∞ ⟨A e_k, e_k⟩ for some orthonormal basis {e_k}.

Shuai Lu et al. [41] and Blanchard et al. [42] considered a logarithmic decay condition on the effective dimension N(λ1):

Assumption 3 (Logarithmic decay). Assume that there exists some positive constant c > 0 such that

  N(λ1) ≤ c log(1/λ1),  for all λ1 > 0.    (14)

2 Convergence Analysis

In this section, we discuss the convergence of the multi-penalty regularization scheme on the reproducing kernel Hilbert space under the smoothness priors considered in the learning theory framework.
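The effective-dimension bound N(λ1) ≲ λ1^{−1/b} above can be checked numerically: for a polynomially decaying spectrum t_i = i^{−b}, the quantity N(λ1) = Σ_i t_i/(t_i + λ1) grows at the predicted rate. A small sketch (the truncated spectrum and the values of b and λ1 are illustrative):

```python
import numpy as np

b = 2.0
t = np.arange(1, 200001, dtype=float) ** (-b)   # eigenvalues t_i = i^{-b}
for lam in [1e-2, 1e-3, 1e-4]:
    N = np.sum(t / (t + lam))                   # effective dimension N(lam)
    print(f"lam={lam:.0e}  N={N:9.2f}  N*lam^(1/b)={N * lam ** (1 / b):.3f}")
```

The last column stays roughly constant as λ1 decreases, consistent with N(λ1) ≤ C λ1^{−1/b}; for this particular spectrum the sum behaves like a multiple of λ1^{−1/2}.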
We address the convergence rates of the multi-penalty regularizer by estimating the sample error f_{z,λ} − f_λ and the approximation error f_λ − f_ρ in the interpolation norm. In Theorem 2.1, the upper convergence rates of the multi-penalty regularized solution f_{z,λ} are derived from the estimates of
Propositions 2.2 and 2.3 for the class of probability measures P_{φ,b}. We discuss the error estimates under the general source condition, with a parameter choice rule based on the index function φ and the sample size m. Under the polynomial decay condition, in Theorem 2.2 we obtain the optimal minimax convergence rates in terms of the index function φ, the parameter b and the number of samples m. In particular, under Hölder's source condition, we present the convergence rates under the logarithmic decay condition on the effective dimension in Corollary 2.2.

Proposition 2.1. Let z be i.i.d. samples drawn according to the probability measure ρ with the hypothesis |y_i| ≤ M for each (x_i, y_i) ∈ Z. Then for 0 ≤ s ≤ 1/2 and for every 0 < δ < 1, with probability 1 − δ,

  ||L_K^s(f_{z,λ} − f_{x,λ})||_K ≤ 2 λ1^{s−1/2} { Ξ (1 + √(2 log(2/δ))) + (4κM/(3m√λ1)) log(2/δ) },

where N_{x_i}(λ1) = Tr((L_K + λ1 I)^{−1} K_{x_i} ⊗ K_{x_i}) and Ξ = (1/m)(Σ_{i=1}^m σ_{x_i}^2 N_{x_i}(λ1))^{1/2}, for the variance σ_{x_i}^2 of the probability distribution of η_{x_i} = y_i − f_ρ(x_i).

Proof. The expression f_{z,λ} − f_{x,λ} can be written as S S_x*(y − S_x f_ρ). Then we find that

  ||L_K^s(f_{z,λ} − f_{x,λ})||_K ≤ I_1 ||L_K^s(L_K + λ1 I)^{−1/2}|| ||(L_K + λ1 I)^{1/2} S (L_K + λ1 I)^{1/2}|| ≤ I_1 I_2 ||L_K^s(L_K + λ1 I)^{−1/2}||,    (15)

where I_1 = ||(L_K + λ1 I)^{−1/2} S_x*(y − S_x f_ρ)||_K and I_2 = ||(L_K + λ1 I)^{1/2}(S_x* S_x + λ1 I)^{−1}(L_K + λ1 I)^{1/2}||; the second inequality uses the non-negativity of B*B.

For a sufficiently large sample size, the following inequality holds:

  √m ≥ (8κ^2/λ1) log(4/δ).    (16)

Then from Theorem 2 of [43] we have, with confidence 1 − δ/2,

  I_3 := ||(L_K + λ1 I)^{−1/2}(L_K − S_x* S_x)(L_K + λ1 I)^{−1/2}|| ≤ ||S_x* S_x − L_K||/λ1 ≤ (4κ^2/(√m λ1)) log(4/δ) ≤ 1/2.

Then the Neumann series gives

  I_2 = || Σ_{i=0}^∞ { (L_K + λ1 I)^{−1/2}(L_K − S_x* S_x)(L_K + λ1 I)^{−1/2} }^i || ≤ Σ_{i=0}^∞ I_3^i = 1/(1 − I_3) ≤ 2.    (17)

Now we have

  ||L_K^s(L_K + λ1 I)^{−1/2}|| ≤ sup_{0 < t ≤ κ^2} t^s/(t + λ1)^{1/2} ≤ λ1^{s−1/2}  for 0 ≤ s ≤ 1/2.    (18)

To estimate the bound for ||(L_K + λ1 I)^{−1/2} S_x*(y − S_x f_ρ)||_K using McDiarmid's inequality (Lemma 2 of [2]), define the function F : R^m → R as

  F(y) = ||(L_K + λ1 I)^{−1/2} S_x*(y − S_x f_ρ)||_K.
So

  F^2(y) = ||(1/m) Σ_{i=1}^m (y_i − f_ρ(x_i)) (L_K + λ1 I)^{−1/2} K_{x_i}||_K^2
         = (1/m^2) Σ_{i,j=1}^m (y_i − f_ρ(x_i))(y_j − f_ρ(x_j)) ⟨(L_K + λ1 I)^{−1} K_{x_i}, K_{x_j}⟩_K.

The independence of the samples, together with E_y(y_i − f_ρ(x_i)) = 0 and E_y(y_i − f_ρ(x_i))^2 = σ_{x_i}^2, implies

  E_y(F^2) = (1/m^2) Σ_{i=1}^m σ_{x_i}^2 N_{x_i}(λ1) = Ξ^2,

where N_{x_i}(λ1) = Tr((L_K + λ1 I)^{−1} K_{x_i} ⊗ K_{x_i}) and Ξ = (1/m)(Σ_{i=1}^m σ_{x_i}^2 N_{x_i}(λ1))^{1/2}. Since E_y(F) ≤ √(E_y(F^2)), it implies E_y(F) ≤ Ξ.

Let y^i = (y_1, …, y_{i−1}, y'_i, y_{i+1}, …, y_m), where y'_i is another sample at x_i. We have

  |F(y) − F(y^i)| ≤ ||(L_K + λ1 I)^{−1/2} S_x*(y − y^i)||_K = (1/m)|y_i − y'_i| ||(L_K + λ1 I)^{−1/2} K_{x_i}||_K ≤ 2κM/(m√λ1).

This can be taken as B in Lemma 2(2) of [2]. Now

  E_{y'_i}(|F(y) − E_{y'_i}(F(y))|^2) ≤ (1/m^2) ∫_Y ( ∫_Y |y_i − y'_i| ||(L_K + λ1 I)^{−1/2} K_{x_i}||_K dρ(y'_i|x_i) )^2 dρ(y_i|x_i)
    ≤ (1/m^2) ∫_Y ∫_Y (y_i − y'_i)^2 N_{x_i}(λ1) dρ(y'_i|x_i) dρ(y_i|x_i) ≤ (2/m^2) σ_{x_i}^2 N_{x_i}(λ1),

which implies Σ_{i=1}^m σ_i^2(F) ≤ 2Ξ^2. In view of Lemma 2(2) of [2], for every ε > 0,

  Prob_{y ∈ Y^m} { F(y) − E_y(F(y)) ≥ ε } ≤ exp{ −ε^2 / (4(Ξ^2 + εκM/(3m√λ1))) } =: δ.

In terms of δ, the probability inequality becomes

  Prob_{y ∈ Y^m} { F(y) ≤ Ξ(1 + √(2 log(1/δ))) + (4κM/(3m√λ1)) log(1/δ) } ≥ 1 − δ.

Incorporating this inequality together with (17), (18) in (15), we get the desired result.

Proposition 2.2. Let z be i.i.d. samples drawn according to the probability measure ρ with the hypothesis |y_i| ≤ M for each (x_i, y_i) ∈ Z. Suppose f_ρ ∈ Ω_{φ,R}. Then for 0 ≤ s ≤ 1/2 and for every 0 < δ < 1, with probability 1 − δ,

  ||L_K^s(f_{z,λ} − f_λ)||_K ≤ 2λ1^{s−1/2} { (3M√N(λ1))/√m + (4κ/√(mλ1)) ||f_λ − f_ρ||_ρ + (16/√m) ||f_λ − f_ρ||_K + (7κM)/(m√λ1) } log(4/δ).
Proof. We can express f_{x,λ} − f_λ = S(S_x* S_x − L_K)(f_ρ − f_λ), which implies

  ||L_K^s(f_{x,λ} − f_λ)||_K ≤ I_4 || (1/m) Σ_{i=1}^m (f_ρ(x_i) − f_λ(x_i)) K_{x_i} − L_K(f_ρ − f_λ) ||_K,

where I_4 = ||L_K^s S||. Using Lemma 3 of [2] for the function f_ρ − f_λ, we get with confidence 1 − δ/2,

  ||L_K^s(f_{x,λ} − f_λ)||_K ≤ I_4 ( (4κ ||f_λ − f_ρ||_∞/(3m)) log(4/δ) + (κ ||f_λ − f_ρ||_ρ/√m)(1 + √(8 log(4/δ))) ).    (19)

For a sufficiently large sample (16), from Theorem 2 of [43] we get

  ||(L_K − S_x* S_x)(L_K + λ1 I)^{−1}|| ≤ ||S_x* S_x − L_K||/λ1 ≤ (4κ^2/(√m λ1)) log(4/δ) ≤ 1/2

with confidence 1 − δ/2, which implies

  ||(L_K + λ1 I)(S_x* S_x + λ1 I)^{−1}|| = || Σ_{i=0}^∞ { (L_K − S_x* S_x)(L_K + λ1 I)^{−1} }^i || ≤ 2.    (20)

We have

  ||L_K^s(L_K + λ1 I)^{−1}|| ≤ sup_{0 < t ≤ κ^2} t^s/(t + λ1) ≤ λ1^{s−1}  for 0 ≤ s ≤ 1.    (21)

Now equations (20) and (21) imply the following inequality:

  I_4 ≤ ||L_K^s(S_x* S_x + λ1 I)^{−1}|| ≤ ||L_K^s(L_K + λ1 I)^{−1}|| ||(L_K + λ1 I)(S_x* S_x + λ1 I)^{−1}|| ≤ 2λ1^{s−1}.    (22)

Let ξ(x) = σ_x^2 N_x(λ1) be the random variable. Then it satisfies ||ξ||_∞ ≤ 4κ^2 M^2/λ1, E_x(ξ) ≤ M^2 N(λ1) and σ^2(ξ) ≤ 4κ^2 M^4 N(λ1)/λ1. Using Bernstein's inequality we get

  Prob_{x ∈ X^m} { (1/m) Σ_{i=1}^m σ_{x_i}^2 N_{x_i}(λ1) − M^2 N(λ1) > t } ≤ exp{ −(m t^2/2) / (4κ^2 M^4 N(λ1)/λ1 + 4κ^2 M^2 t/(3λ1)) },

which implies, with confidence 1 − δ/2,

  m Ξ^2 = (1/m) Σ_{i=1}^m σ_{x_i}^2 N_{x_i}(λ1) ≤ M^2 N(λ1) + (8κ^2 M^2/(√m λ1)) log(2/δ).    (23)

We get the required error estimate by combining the estimate of Proposition 2.1 with inequalities (19), (22), (23).

Proposition 2.3. Suppose f_ρ ∈ Ω_{φ,R}. Then under the assumption that φ(t) and t^s/φ(t) are nondecreasing functions, we have

  ||L_K^s(f_λ − f_ρ)||_K ≤ λ1^s ( R φ(λ1) + λ2 λ1^{−3/2} M ||B*B|| ).    (24)

Proof. To obtain this estimate, we decompose f_λ − f_ρ as (f_λ − f_{λ1}) + (f_{λ1} − f_ρ). The first term can be expressed as

  f_λ − f_{λ1} = −λ2 (L_K + λ1 I + λ2 B*B)^{−1} B*B f_{λ1}.
Then we get

  ||L_K^s(f_λ − f_{λ1})||_K ≤ λ2 ||L_K^s(L_K + λ1 I)^{−1}|| ||B*B|| ||f_{λ1}||_K ≤ λ2 λ1^{s−1} ||B*B|| ||f_{λ1}||_K ≤ λ2 λ1^{s−3/2} M ||B*B||,    (25)

and for the second term,

  ||L_K^s(f_{λ1} − f_ρ)||_K ≤ R ||r_{λ1}(L_K) L_K^s φ(L_K)|| ≤ R λ1^s φ(λ1),    (26)

where r_{λ1}(t) = λ1(t + λ1)^{−1}. Combining these error bounds, we achieve the required estimate.

Theorem 2.1. Let z be i.i.d. samples drawn according to a probability measure ρ ∈ P_{φ,b}. Suppose φ(t) and t^s/φ(t) are nondecreasing functions. Then under the parameter choice λ1 ∈ (0, 1], λ1 = Ψ^{−1}(m^{−1/2}), λ2 = (Ψ^{−1}(m^{−1/2}))^{3/2} φ(Ψ^{−1}(m^{−1/2})), where Ψ(t) = t^{1/2 + 1/(2b)} φ(t), for 0 ≤ s ≤ 1/2 and for all 0 < δ < 1, the following error estimate holds with confidence 1 − δ:

  Prob_{z ∈ Z^m} { ||L_K^s(f_{z,λ} − f_ρ)||_K ≤ C (Ψ^{−1}(m^{−1/2}))^s φ(Ψ^{−1}(m^{−1/2})) log(4/δ) } ≥ 1 − δ,

where C = 4κM + (2 + 8κ)(R + M ||B*B||) + 6M √(βb/(b − 1)), and

  lim_{τ→∞} limsup_{m→∞} sup_{ρ ∈ P_{φ,b}} Prob_{z ∈ Z^m} { ||L_K^s(f_{z,λ} − f_ρ)||_K > τ (Ψ^{−1}(m^{−1/2}))^s φ(Ψ^{−1}(m^{−1/2})) } = 0.

Proof. Let Ψ(t) = t^{1/2 + 1/(2b)} φ(t). Then, writing y = Ψ(t),

  lim_{t→0} Ψ(t)^2/t = lim_{y→0} y^2/Ψ^{−1}(y) = 0,

so that under the parameter choice λ1 = Ψ^{−1}(m^{−1/2}) we have lim_{m→∞} m λ1 = ∞. Moreover, m^{−1/2} = λ1^{1/2 + 1/(2b)} φ(λ1) implies

  1/(√m √λ1) = λ1^{1/(2b)} φ(λ1),

so that for sufficiently large m the condition (16) is satisfied. Using λ1 ≤ 1, from Propositions 2.2 and 2.3 and the estimate (13) it follows that, with confidence 1 − δ,

  ||L_K^s(f_{z,λ} − f_ρ)||_K ≤ C (Ψ^{−1}(m^{−1/2}))^s φ(Ψ^{−1}(m^{−1/2})) log(4/δ),    (27)

where C = 4κM + (2 + 8κ)(R + M ||B*B||) + 6M √(βb/(b − 1)). Now defining τ := C log(4/δ) gives δ = 4e^{−τ/C}, and the estimate (27) can be re-expressed as

  Prob_{z ∈ Z^m} { ||L_K^s(f_{z,λ} − f_ρ)||_K > τ (Ψ^{−1}(m^{−1/2}))^s φ(Ψ^{−1}(m^{−1/2})) } ≤ 4e^{−τ/C}.    (28)

Letting τ → ∞ proves the second assertion.

Theorem 2.2. Let z be i.i.d. samples drawn according to a probability measure ρ ∈ P_{φ,b}. Then under the parameter choice λ0 ∈ (0, 1], λ0 = Ψ^{−1}(m^{−1/2}), λ_j = (Ψ^{−1}(m^{−1/2}))^{3/2} φ(Ψ^{−1}(m^{−1/2})) for 1 ≤ j ≤ p, where Ψ(t) = t^{1/2 + 1/(2b)} φ(t), for all 0 < η < 1 the following error estimates hold with confidence 1 − η:
(i) If φ(t) and t/φ(t) are nondecreasing functions, then we have

  Prob_{z ∈ Z^m} { ||f_{z,λ} − f_H||_H ≤ C' φ(Ψ^{−1}(m^{−1/2})) log(4/η) } ≥ 1 − η

and

  lim_{τ→∞} limsup_{m→∞} sup_{ρ ∈ P_{φ,b}} Prob_{z ∈ Z^m} { ||f_{z,λ} − f_H||_H > τ φ(Ψ^{−1}(m^{−1/2})) } = 0,

where C' = 2R + 4κM + 2M ||B*B||_{L(H)} + 4κΣ.

(ii) If φ(t) and √t/φ(t) are nondecreasing functions, then we have

  Prob_{z ∈ Z^m} { ||f_{z,λ} − f_H||_ρ ≤ C' (Ψ^{−1}(m^{−1/2}))^{1/2} φ(Ψ^{−1}(m^{−1/2})) log(4/η) } ≥ 1 − η

and

  lim_{τ→∞} limsup_{m→∞} sup_{ρ ∈ P_{φ,b}} Prob_{z ∈ Z^m} { ||f_{z,λ} − f_H||_ρ > τ (Ψ^{−1}(m^{−1/2}))^{1/2} φ(Ψ^{−1}(m^{−1/2})) } = 0.

Corollary 2.1. Under the same assumptions as Theorem 2.1, for Hölder's source condition f_ρ ∈ Ω_{φ,R}, φ(t) = t^r, for 0 ≤ s ≤ 1/2 and for all 0 < δ < 1, with confidence 1 − δ, for the parameter choice λ1 = m^{−b/(2br+b+1)} and λ2 = m^{−(2br+3b)/(4br+2b+2)} we have the following convergence rate:

  ||L_K^s(f_{z,λ} − f_ρ)||_K ≤ C m^{−b(r+s)/(2br+b+1)} log(4/δ)  for 0 ≤ r ≤ 1 − s.

Remark 2.1. For the Hölder source condition f_H ∈ Ω_{φ,R}, φ(t) = t^r, with the parameter choice λ0 ∈ (0, 1], λ0 = m^{−b/(2br+b+1)} and λ_j = m^{−(2br+3b)/(4br+2b+2)} for 1 ≤ j ≤ p, we obtain the optimal minimax convergence rates O(m^{−br/(2br+b+1)}) for 0 ≤ r ≤ 1 in the RKHS-norm and O(m^{−(2br+b)/(4br+2b+2)}) for 0 ≤ r ≤ 1/2 in the L^2_{ρ_X}-norm, respectively.

Corollary 2.2. Under the logarithmic decay condition on the effective dimension N(λ1), for Hölder's source condition f_ρ ∈ Ω_{φ,R}, φ(t) = t^r, for 0 ≤ s ≤ 1/2 and for all 0 < δ < 1, with confidence 1 − δ, for the parameter choice λ1 = (log m / m)^{1/(2r+1)} and λ2 = (log m / m)^{(2r+3)/(4r+2)} we have the following convergence rate:

  ||L_K^s(f_{z,λ} − f_ρ)||_K ≤ C (log m / m)^{(s+r)/(2r+1)} log(4/δ)  for 0 ≤ r ≤ 1 − s.

Remark 2.2. The upper convergence rates of the regularized solution are estimated in the interpolation norm for the parameter s ∈ [0, 1/2]. In particular, we obtain the error estimates in the H_K-norm for s = 0 and in the L^2_{ρ_X}-norm for s = 1/2. We present the error estimates of the multi-penalty regularizer over the regularity class P_{φ,b} in Theorem 2.1 and Corollary 2.1. We can also obtain the convergence rates of the estimator f_{z,λ} under the source condition, without the polynomial decay of the eigenvalues of the integral operator L_K, by substituting N(λ1) ≤ κ^2/λ1.
In addition, for B = (S_x* L S_x)^{1/2} we obtain the error estimates of the manifold regularization scheme (30) considered in [13].

Remark 2.3. The parameter choice is said to be optimal if the minimax lower rates coincide with the upper convergence rates for some λ = λ(m). For the parameter choice λ1 = Ψ^{−1}(m^{−1/2}) and
λ2 = (Ψ^{−1}(m^{−1/2}))^{3/2} φ(Ψ^{−1}(m^{−1/2})), Theorem 2.2 shares its upper convergence rates with the lower convergence rates of Theorems 3.1 and 3.2 of [44]. Therefore the choice of parameters is optimal.

Remark 2.4. The results can easily be generalized to n-penalty regularization in the vector-valued framework. For simplicity, we discuss the two-parameter regularization scheme in the scalar-valued function setting.

Remark 2.5. We can also address the convergence issues of the binary classification problem [45] using our error estimates, similarly to the discussions in Section 3.3 of [5] and Section 5 of [9].

The proposed choice of parameters in Theorem 2.1 is based on the regularity parameters, which are generally not known in practice. In the following section, we discuss parameter choice rules based on the samples.

3 Parameter Choice Rules

Most regularized learning algorithms depend on a tuning parameter, whose appropriate choice is crucial to ensure good performance of the regularized solution. Many parameter choice strategies have been discussed for single-penalty regularization schemes, for both ill-posed problems and learning algorithms [27, 28] (see also references therein). Various parameter choice rules have been studied for multi-penalty regularization schemes [15, 18, 19, 25, 31, 32, 33, 36, 46]. Ito et al. [33] studied a balancing principle for choosing the regularization parameters based on the augmented Tikhonov regularization approach for ill-posed inverse problems. In the learning theory framework, we discuss a fixed-point algorithm based on the penalty balancing principle considered in [33].

The Bayesian inference approach provides a mechanism for selecting the regularization parameters through hierarchical modeling. Various authors have successfully applied this approach to different problems. Thompson et al. [47] applied it to parameter selection in image restoration. Jin et al. [48] considered the approach for the ill-posed Cauchy problem of steady-state heat conduction.
The posterior probability density function (PPDF) for the functional (6) is given by

  P(f, σ^2, µ | z) ∝ (σ^2)^{−m/2} exp( −||S_x f − y||^2/(2σ^2) ) · µ1^{n1/2} exp( −(µ1/2) ||f||_K^2 ) · µ2^{n2/2} exp( −(µ2/2) ||Bf||_K^2 )
      · µ1^{α1} e^{−β1 µ1} · µ2^{α1} e^{−β1 µ2} · (σ^2)^{−α'_o} e^{−β'_o/σ^2},

where (α1, β1) is the parameter pair for µ = (µ1, µ2) and (α'_o, β'_o) is the parameter pair for the inverse variance σ^{−2}. In the Bayesian inference approach, we select the parameter set (f, σ^2, µ) which maximizes the PPDF. Taking the negative logarithm and simplifying, the problem can be reformulated as the minimization of

  J(f, τ, µ) = τ ||S_x f − y||^2 + µ1 ||f||_K^2 + µ2 ||Bf||_K^2 + β(µ1 + µ2) − α(log µ1 + log µ2) + β_o τ − α_o log τ,

where τ = 1/σ^2, β = 2β1, α = n1 + 2α1 − 2, β_o = 2β'_o, α_o = m + 2α'_o − 2. We assume that the scalars τ and µ_i have Gamma distributions with known parameter pairs. The resulting functional is known as the augmented Tikhonov (a-Tikhonov) regularization. For the non-informative prior β_o = β = 0, the optimality conditions of the a-Tikhonov functional reduce to
  f_{z,λ} = argmin_{f ∈ H_K} { ||S_x f − y||^2 + λ1 ||f||_K^2 + λ2 ||Bf||_K^2 },
  µ1 = α/||f_{z,λ}||_K^2,  µ2 = α/||B f_{z,λ}||_K^2,  τ = α_o/||S_x f_{z,λ} − y||^2,

where λ1 = µ1/τ, λ2 = µ2/τ and γ = α_o/α. This can be reformulated as

  f_{z,λ} = argmin_{f ∈ H_K} { ||S_x f − y||^2 + λ1 ||f||_K^2 + λ2 ||Bf||_K^2 },
  λ1 = ||S_x f_{z,λ} − y||^2 / (γ ||f_{z,λ}||_K^2),  λ2 = ||S_x f_{z,λ} − y||^2 / (γ ||B f_{z,λ}||_K^2),

which implies λ1 ||f_{z,λ}||_K^2 = λ2 ||B f_{z,λ}||_K^2. The rule thus selects the regularization parameters λ in the functional (6) by balancing each penalty against the fidelity term; hence the name "penalty balancing principle". Now we describe the fixed-point algorithm based on the PB-principle.

Algorithm 1 (Parameter choice rule: Penalty Balancing Principle).
1. For an initial value λ^0 = (λ1^0, λ2^0), start with k = 0.
2. Calculate f_{z,λ^k} and update λ by
     λ1^{k+1} = ( ||S_x f_{z,λ^k} − y||^2 + λ2^k ||B f_{z,λ^k}||_K^2 ) / ( (1 + γ) ||f_{z,λ^k}||_K^2 ),
     λ2^{k+1} = ( ||S_x f_{z,λ^k} − y||^2 + λ1^k ||f_{z,λ^k}||_K^2 ) / ( (1 + γ) ||B f_{z,λ^k}||_K^2 ).
3. If the stopping criterion ||λ^{k+1} − λ^k|| < ε is satisfied, stop; otherwise set k = k + 1 and go to step 2.

4 Numerical Realization

In this section, the performance of single-penalty regularization versus multi-penalty regularization is demonstrated using an academic example and the two moon data set. For single-penalty regularization the parameters are chosen according to the quasi-optimality principle, while for two-parameter regularization they are chosen according to the PB-principle. We consider the well-known academic example [28, 16, 49] to test multi-penalty regularization under the PB-principle parameter choice rule:

  f_ρ(x) = (1/10) { x + 2 ( e^{−8(4π/3 − x)^2} − e^{−8(π/2 − x)^2} − e^{−8(3π/2 − x)^2} ) },  x ∈ [0, 2π],    (29)

which belongs to the reproducing kernel Hilbert space H_K corresponding to the kernel K(x, y) = xy + exp(−8(x − y)^2). We generate noisy data 100 times in the form y = f_ρ(x) + δξ, corresponding to the inputs x = {x_i = π(i − 1)/10}, where ξ follows the uniform distribution over [−1, 1] with δ = 0.02.
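The PB-principle fixed-point iteration above can be sketched in a finite-dimensional surrogate, writing f = Σ_i α_i K(x_i, ·) so that ||f||_K^2 = α^T K α and, for a graph-Laplacian penalty with B*B = S_x* L S_x, ||Bf||_K^2 = α^T K L K α on the sample. The data, kernel, chain-graph Laplacian and the value γ = 1 below are illustrative assumptions, not the paper's experiment:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 40
x = np.sort(rng.uniform(0, 2 * np.pi, m))
y = np.sin(x) + 0.1 * rng.normal(size=m)
Kmat = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
W = np.diag(np.ones(m - 1), 1); W = W + W.T        # chain graph on sorted points
Lap = np.diag(W.sum(axis=1)) - W                   # L = D - W
gamma = 1.0                                        # gamma = alpha_o / alpha (assumed)

lam = np.array([1e-2, 1e-2])                       # initial (lam1, lam2)
converged = False
for _ in range(100):
    # solve the two-penalty normal equation ((1/m)K + lam1 I + lam2 L K) a = y/m
    A = (1 / m) * Kmat + lam[0] * np.eye(m) + lam[1] * Lap @ Kmat
    alpha = np.linalg.solve(A, y / m)
    f = Kmat @ alpha
    fid = np.mean((f - y) ** 2)                    # ||S_x f - y||^2
    p1 = alpha @ Kmat @ alpha                      # ||f||_K^2
    p2 = alpha @ Kmat @ Lap @ (Kmat @ alpha)       # ||B f||_K^2
    new = np.array([(fid + lam[1] * p2) / ((1 + gamma) * p1),
                    (fid + lam[0] * p1) / ((1 + gamma) * p2)])
    if np.max(np.abs(new - lam) / lam) < 1e-8:
        converged = True
        break
    lam = new
print("lam =", lam, " penalties:", lam[0] * p1, lam[1] * p2)
```

At any fixed point, the two update equations force λ1 ||f||_K^2 = λ2 ||Bf||_K^2, i.e. the penalties balance, which the final line displays; convergence of the plain iteration is not guaranteed in general, which is why a stopping criterion (and, in practice, an iteration cap) is used.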
We consider the following multi-penalty functional, proposed in manifold regularization [13, 15]:

  argmin_{f ∈ H_K} { (1/m) Σ_{i=1}^m (f(x_i) − y_i)^2 + λ1 ||f||_K^2 + λ2 ||(S_x* L S_x)^{1/2} f||_K^2 },    (30)

where x = {x_i ∈ X : 1 ≤ i ≤ n} and L = D − W, with W = (ω_ij) a weight matrix with non-negative entries and D the diagonal matrix with D_ii = Σ_{j=1}^n ω_ij.

In our experiment, we illustrate the error estimates of the single-penalty regularizers f = f_{z,λ1}, f = f_{z,λ2} and the multi-penalty regularizer f = f_{z,λ}, using the relative error measure ||f − f_ρ||/||f|| for the academic example in the sup-norm, the H_K-norm and the empirical norm, shown in Fig. 1 (a), (b) and (c), respectively.

Figure 1: Relative errors of the different estimators for the academic example in the ||·||_K-norm (a), the empirical norm (b) and the infinity norm (c), over 100 test problems with noise δ = 0.02 for all estimators.

Now we compare the performance of multi-penalty regularization against the single-penalty regularization method using the well-known two moon data set (Fig. 2) in the context of manifold learning. The data set contains 200 examples, with k labeled examples for each class. We perform the experiment 500 times, taking l = 2k = 2, 6, 10, 20 labeled points at random. We solve the manifold regularization problem (30) for the Mercer kernel K(x_i, x_j) = exp(−γ||x_i − x_j||^2) with the exponential weights ω_ij = exp(−||x_i − x_j||^2/(4b)), for some b, γ > 0. We fix the initial parameters λ1 = 10^{−4} and λ2, the kernel parameter γ = 3.5, and the weight parameter b in all experiments. The performance of the single-penalty (λ2 = 0) and the proposed multi-penalty regularizer (30) is presented in Fig. 2 and Table 1. Based on the considered examples, we observe that the proposed multi-penalty regularization with the penalty balancing principle parameter choice outperforms the single-penalty regularizers.
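The data generation for the academic example (29) and a baseline single-penalty fit can be sketched as follows. The test grid, the fixed value λ1 = 10^{−4} used here, and the substitution of plain kernel ridge regression for the quasi-optimality- and PB-tuned estimators are illustrative assumptions:

```python
import numpy as np

def f_rho(x):
    """The academic example (29) on [0, 2*pi]."""
    return (x + 2 * (np.exp(-8 * (4 * np.pi / 3 - x) ** 2)
                     - np.exp(-8 * (np.pi / 2 - x) ** 2)
                     - np.exp(-8 * (3 * np.pi / 2 - x) ** 2))) / 10

def kernel(a, b):
    """K(x, y) = x*y + exp(-8 (x - y)^2), the kernel used in the example."""
    a, b = np.asarray(a), np.asarray(b)
    return np.outer(a, b) + np.exp(-8 * (a[:, None] - b[None, :]) ** 2)

rng = np.random.default_rng(0)
x = np.pi * np.arange(21) / 10              # inputs x_i = pi (i - 1)/10
delta = 0.02
y = f_rho(x) + delta * rng.uniform(-1, 1, x.size)
m = x.size

lam1 = 1e-4                                  # single-penalty regularizer (assumed value)
alpha = np.linalg.solve(kernel(x, x) + m * lam1 * np.eye(m), y)
xt = np.linspace(0, 2 * np.pi, 200)
sup_err = np.max(np.abs(kernel(xt, x) @ alpha - f_rho(xt)))
print("sup-norm error:", sup_err)
```

Since f_ρ lies in H_K for this kernel, even the untuned single-penalty fit recovers the target closely at this noise level; replacing λ2 = 0 by the manifold penalty of (30) and tuning (λ1, λ2) with the PB-principle would give the multi-penalty counterpart compared in Fig. 1.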
Figure 2: Decision surfaces generated from two labeled samples (red stars) by the single-penalty regularizer (a), based on the quasi-optimality principle, and by the manifold regularizer (b), based on the PB-principle.

Table 1: Statistical performance of the single-penalty (λ2 = 0) and multi-penalty regularizers of the functional (30). Symbols: number of labeled points (l); percentage of successfully predicted points (SP %); maximum number of wrongly classified points (WC).

5 Conclusion

In summary, we achieved the optimal minimax rates for the multi-penalized regression problem under the general source condition with decay conditions on the effective dimension. In particular, the convergence analysis of multi-penalty regularization provides error estimates for the manifold regularization problem. We can also address the convergence issues of the binary classification problem using our error estimates. We discussed the penalty balancing principle, based on augmented Tikhonov regularization, for the choice of the regularization parameters. Many other parameter choice rules have been proposed for multi-parameter regularization schemes; a rigorous analysis of the different parameter choice rules for multi-penalty regularization schemes is a natural next problem of interest. Finally, the superiority of multi-penalty regularization over single-penalty regularization was shown using the academic example and the two moon data set.

Acknowledgements: The author is grateful for the valuable suggestions and comments of the anonymous referees that led to improvements in the quality of the paper.
References

[1] O. Bousquet, S. Boucheron, and G. Lugosi, "Introduction to statistical learning theory," in Advanced Lectures on Machine Learning, Berlin/Heidelberg: Springer, 2004.
[2] F. Cucker and S. Smale, "On the mathematical foundations of learning," Bull. Amer. Math. Soc. (N.S.), vol. 39, no. 1, pp. 1–49, 2002.
[3] T. Evgeniou, M. Pontil, and T. Poggio, "Regularization networks and support vector machines," Adv. Comput. Math., vol. 13, no. 1, pp. 1–50, 2000.
[4] V. N. Vapnik, Statistical Learning Theory, New York: Wiley, 1998.
[5] F. Bauer, S. Pereverzev, and L. Rosasco, "On regularization algorithms in learning theory," J. Complexity, vol. 23, no. 1, pp. 52–72, 2007.
[6] A. Caponnetto and E. De Vito, "Optimal rates for the regularized least-squares algorithm," Found. Comput. Math., vol. 7, no. 3, pp. 331–368, 2007.
[7] H. W. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems, Math. Appl., vol. 375, Dordrecht, The Netherlands: Kluwer Academic Publishers Group, 1996.
[8] L. Lo Gerfo, L. Rosasco, F. Odone, E. De Vito, and A. Verri, "Spectral algorithms for supervised learning," Neural Computation, vol. 20, no. 7, 2008.
[9] S. Smale and D. X. Zhou, "Learning theory estimates via integral operators and their approximations," Constr. Approx., vol. 26, no. 2, pp. 153–172, 2007.
[10] A. N. Tikhonov and V. Y. Arsenin, Solutions of Ill-posed Problems, Washington, DC: W. H. Winston, 1977.
[11] S. Smale and D. X. Zhou, "Shannon sampling and function reconstruction from point values," Bull. Amer. Math. Soc., vol. 41, no. 3, 2004.
[12] S. Smale and D. X. Zhou, "Shannon sampling II: Connections to learning theory," Appl. Comput. Harmonic Anal., vol. 19, no. 3, 2005.
[13] M. Belkin, P. Niyogi, and V. Sindhwani, "Manifold regularization: A geometric framework for learning from labeled and unlabeled examples," J. Mach. Learn. Res., vol. 7, pp. 2399–2434, 2006.
[14] D. Düvelmeyer and B. Hofmann, "A multi-parameter regularization approach for estimating parameters in jump diffusion processes," J. Inverse Ill-Posed Probl., vol. 14, no. 9, 2006.
[15] S. Lu and S. V. Pereverzev, "Multi-parameter regularization and its numerical realization," Numer. Math., vol. 118, no. 1, pp. 1–31, 2011.
[16] S. Lu, S. Pereverzyev Jr., and S. Sivananthan, "Multiparameter regularization for construction of extrapolating estimators in statistical learning theory," in Multiscale Signal Analysis and Modeling, New York: Springer, 2013.
[17] Y. Lu, L. Shen, and Y. Xu, "Multi-parameter regularization methods for high-resolution image reconstruction with displacement errors," IEEE Trans. Circuits Syst. I: Regular Papers, vol. 54, no. 8, 2007.
[18] V. Naumova and S. V. Pereverzyev, "Multi-penalty regularization with a component-wise penalization," Inverse Problems, vol. 29, no. 7, 075002, 2013.
[19] S. N. Wood, "Modelling and smoothing parameter estimation with multiple quadratic penalties," J. R. Statist. Soc. B, vol. 62, 2000.
[20] P. Xu, Y. Fukuda, and Y. Liu, "Multiple parameter regularization: Numerical solutions and applications to the determination of geopotential from precise satellite orbits," J. Geodesy, vol. 80, no. 1, pp. 17–27, 2006.
[21] Y. Luo, D. Tao, C. Xu, D. Li, and C. Xu, "Vector-valued multi-view semi-supervised learning for multi-label image classification," in AAAI, 2013.
[22] Y. Luo, D. Tao, C. Xu, C. Xu, H. Liu, and Y. Wen, "Multiview vector-valued manifold regularization for multilabel image classification," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 5, 2013.
[23] H. Q. Minh, L. Bazzani, and V. Murino, "A unifying framework in vector-valued reproducing kernel Hilbert spaces for manifold regularization and co-regularized multi-view learning," J. Mach. Learn. Res., vol. 17, no. 25, pp. 1–72, 2016.
[24] H. Q. Minh and V. Sindhwani, "Vector-valued manifold regularization," in International Conference on Machine Learning, 2011.
[25] Abhishake and S. Sivananthan, "Multi-penalty regularization in learning theory," J. Complexity, vol. 36, pp. 141–165, 2016.
[26] F. Bauer and S. Kindermann, "The quasi-optimality criterion for classical inverse problems," Inverse Problems, vol. 24, 035002, 2008.
[27] A. Caponnetto and Y. Yao, "Cross-validation based adaptation for regularization operators in learning theory," Anal. Appl., vol. 8, no. 2, pp. 161–183, 2010.
[28] E. De Vito, S. Pereverzyev, and L. Rosasco, "Adaptive kernel methods using the balancing principle," Found. Comput. Math., vol. 10, no. 4, pp. 455–479, 2010.
[29] V. A. Morozov, "On the solution of functional equations by the method of regularization," Soviet Math. Dokl., vol. 7, 1966.
[30] J. Xie and J. Zou, "An improved model function method for choosing regularization parameters in linear inverse problems," Inverse Problems, vol. 18, no. 3, 2002.
[31] S. Lu, S. V. Pereverzev, and U. Tautenhahn, "A model function method in regularized total least squares," Appl. Anal., vol. 89, no. 11, 2010.
[32] M. Fornasier, V. Naumova, and S. V. Pereverzyev, "Parameter choice strategies for multipenalty regularization," SIAM J. Numer. Anal., vol. 52, no. 4, 2014.
[33] K. Ito, B. Jin, and T. Takeuchi, "Multi-parameter Tikhonov regularization — an augmented approach," Chinese Ann. Math., vol. 35, no. 3, 2014.
[34] M. Belge, M. E. Kilmer, and E. L. Miller, "Efficient determination of multiple regularization parameters in a generalized L-curve framework," Inverse Problems, vol. 18, pp. 1161–1183, 2002.
[35] F. Bauer and O. Ivanyshyn, "Optimal regularization with two interdependent regularization parameters," Inverse Problems, vol. 23, no. 1, 2007.
[36] F. Bauer and S. V. Pereverzev, "An utilization of a rough approximation of a noise covariance within the framework of multi-parameter regularization," Int. J. Tomogr. Stat., vol. 4, pp. 1–12, 2006.
[37] Z. Chen, Y. Lu, Y. Xu, and H. Yang, "Multi-parameter Tikhonov regularization for linear ill-posed operator equations," J. Comp. Math., vol. 26, 2008.
[38] C. Brezinski, M. Redivo-Zaglia, G. Rodriguez, and S. Seatzu, "Multi-parameter regularization techniques for ill-conditioned linear systems," Numer. Math., vol. 94, no. 2, 2003.
[39] N. Aronszajn, "Theory of reproducing kernels," Trans. Amer. Math. Soc., vol. 68, pp. 337–404, 1950.
[40] P. Mathé and S. V. Pereverzev, "Geometry of linear ill-posed problems in variable Hilbert scales," Inverse Problems, vol. 19, no. 3, pp. 789–803, 2003.
[41] S. Lu, P. Mathé, and S. Pereverzyev, "Balancing principle in supervised learning for a general regularization scheme," RICAM-Report No. 38, 2016.
[42] G. Blanchard and P. Mathé, "Discrepancy principle for statistical inverse problems with application to conjugate gradient iteration," Inverse Problems, vol. 28, no. 11, 115011, 2012.
[43] E. De Vito, L. Rosasco, A. Caponnetto, U. De Giovannini, and F. Odone, "Learning from examples as an inverse problem," J. Mach. Learn. Res., vol. 6, pp. 883–904, 2005.
[44] A. Rastogi and S. Sivananthan, "Optimal rates for the regularized learning algorithms under general source condition," Front. Appl. Math. Stat., vol. 3, Article 3, 2017.
[45] S. Boucheron, O. Bousquet, and G. Lugosi, "Theory of classification: A survey of some recent advances," ESAIM: Probability and Statistics, vol. 9, pp. 323–375, 2005.
[46] S. Lu and S. Pereverzev, Regularization Theory for Ill-posed Problems: Selected Topics, vol. 58, Berlin: Walter de Gruyter, 2013.
[47] A. M. Thompson and J. Kay, "On some choices of regularization parameter in image restoration," Inverse Problems, vol. 9, 1993.
[48] B. Jin and J. Zou, "Augmented Tikhonov regularization," Inverse Problems, vol. 25, no. 2, 025001, 2009.
[49] C. A. Micchelli and M. Pontil, "Learning the kernel function via regularization," J. Mach. Learn. Res., vol. 6, 2005.
More information