Factorized Sparse Learning Models with Interpretable High Order Feature Interactions
Sanjay Purushotham, University of Southern California, Los Angeles, CA, USA
Martin Renqiang Min, NEC Labs America, Princeton, NJ, USA
Rachel Ostroff, SomaLogic, Inc., Boulder, CO, USA
C.-C. Jay Kuo, University of Southern California, Los Angeles, CA, USA

ABSTRACT

Identifying interpretable discriminative high-order feature interactions given limited training data in high dimensions is challenging in both machine learning and data mining. In this paper, we propose a factorization based sparse learning framework termed FHIM for identifying high-order feature interactions in linear and logistic regression models, and study several optimization methods for solving them. Unlike previous sparse learning methods, our model FHIM recovers both the main effects and the interaction terms accurately without imposing tree-structured hierarchical constraints. Furthermore, we show that FHIM has oracle properties when extended to generalized linear regression models with pairwise interactions. Experiments on simulated data show that FHIM outperforms the state-of-the-art sparse learning techniques. Further experiments on our experimentally generated data from patient blood samples using a novel SOMAmer (Slow Off-rate Modified Aptamer) technology show that FHIM performs blood-based cancer diagnosis and biomarker discovery for Renal Cell Carcinoma much better than other competing methods, and it identifies interpretable block-wise high-order gene interactions predictive of the cancer stages of samples. A literature survey shows that the interactions identified by FHIM play important roles in cancer development.
Categories and Subject Descriptors

J.3 [Computer Applications]: Life and Medical Sciences; I.2.6 [Computing Methodologies]: Artificial Intelligence - sparse learning, feature selection

Co-first author. Co-first author, corresponding author. To whom data requests should be sent.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. KDD '14, August 24-27, 2014, New York, NY, USA. Copyright 2014 ACM.

Keywords

Sparse Learning; High-order Interactions; Biomarker Discovery; Blood-based Cancer Diagnosis

1. INTRODUCTION

Identifying interpretable high-order feature interactions is an important problem in machine learning, data mining, and biomedical informatics, because feature interactions often help reveal some hidden domain knowledge and the structure of the problems under consideration. For example, genes and proteins seldom perform their functions independently, so many human diseases are often manifested as the dysfunction of some pathways or functional gene modules, and the disrupted patterns due to diseases are often more obvious at a pathway or module level. Identifying these disrupted gene interactions for different diseases such as cancer will help us understand the underlying mechanisms of the diseases and develop effective drugs to cure them.
However, identifying reliable discriminative high-order gene/protein or SNP interactions for accurate disease diagnosis, such as early cancer diagnosis directly based on patient blood samples, is still a challenging problem, because we often have very limited patient samples but a huge number of complex feature interactions to consider. In this paper, we propose a sparse learning framework based on weight matrix factorizations and regularizations for identifying discriminative high-order feature interactions in linear and logistic regression models, and we study several optimization methods for solving them. Experimental results on synthetic and real-world datasets show that our method outperforms the state-of-the-art sparse learning techniques, and it provides interpretable block-wise high-order interactions for disease status prediction. Our proposed sparse learning framework is general, and can be used to identify any discriminative complex system input interactions that are predictive of system outputs given limited high-dimensional training data. Our contributions are as follows: (1) We propose a method capable of simultaneously identifying both informative single
discriminative features and discriminative block-wise high-order interactions in a sparse learning framework, which can be easily extended to handle arbitrarily high-order feature interactions; (2) Our method works on high-dimensional input feature spaces and ill-posed problems with many more features than data points, which is typical for biomedical applications such as biomarker discovery and cancer diagnosis; (3) Our method has interesting theoretical properties for generalized linear regression models; (4) The interactions identified by our method lead to biomedical insight into understanding blood-based cancer diagnosis.

2. RELATED WORK

Variable selection has been a well studied topic in the statistics, machine learning, and data mining literature. Generally, variable selection approaches focus on identifying discriminative features using regularization techniques. Most recent methods focus on identifying discriminative features or groups of discriminative features based on the Lasso penalty [8], Group Lasso [2], Trace-norm [6], Dirty model [8] and Support Vector Machines (SVMs) [6]. A recent approach [20] heuristically adds some possible high-order interactions into the input feature set in a greedy way based on lasso penalized logistic regression. Some recent approaches [2], [3] enforce strong and/or weak heredity constraints to recover the pairwise interactions in linear regression models. In strong heredity, an interaction term can be included in the model only if the corresponding main terms are also included in the model, while in weak heredity, an interaction term is included when either of the main terms is included in the model. However, recent studies in bioinformatics have shown that feature interactions need not follow heredity constraints for the manifestation of diseases, and thus the above approaches [2], [3] have limited chance of recovering relevant interactions.
Kernel methods such as Gaussian Processes [4] and Multiple Kernel Learning [10] can be used to model high-order feature interactions, but they can only tell which orders are important. Thus, all these previous approaches either failed to identify specific high-order interactions for prediction or identified sporadic pairwise interactions in a greedy way, which is very unlikely to recover the interpretable block-wise high-order interactions among features in different sub-components (for example, pathways or gene functional modules) of systems. Recently, [4] proposed an efficient way to identify combinatorial interactions among interactive genes in complex diseases by using overlapping group lasso and screening. However, they use prior information such as gene ontology in their approach, which is generally not available or difficult to collect for some machine learning problems. Thus, there is a need to develop new efficient techniques to automatically capture the important block-wise high-order feature interactions in regression models, which is the focus of this paper.

The remainder of the paper is organized as follows: in Section 3 we discuss our problem formulation and the relevant notation used in the paper. In Section 4, we discuss the main idea of our approach, and in Section 5 we give an overview of the theoretical properties associated with our method. In Section 6, we present the optimization methods which we use to solve our optimization problem. In Section 7, we discuss our experimental setup and present our results on synthetic and real datasets. Finally, in Section 8 we conclude the paper with discussions and future research directions.

3. PROBLEM FORMULATION

Consider a regression setup with a training set of n samples and p features, {(X^{(i)}, y^{(i)})}, where X^{(i)} ∈ R^p is the i-th instance (column) of the design matrix X (p × n), y^{(i)} ∈ R is the i-th instance of the response variable y (n × 1), and i = 1, ..., n.
To model the response in terms of the predictors, we can set up a linear regression model or a logistic regression model:

$$y^{(i)} = \beta^T X^{(i)} + \epsilon^{(i)}, \qquad (1)$$

$$p(y^{(i)} = 1 \mid X^{(i)}) = \frac{1}{1 + \exp(-\beta^T X^{(i)} - \beta_0)}, \qquad (2)$$

where β ∈ R^p is the weight vector associated with single features (also called main effects), ε ∈ R^n is a noise vector, and β_0 ∈ R is the bias term. In many practical fields such as bioinformatics and medical informatics, the main terms (the terms only involving single features) are not enough to capture the complex relationship between the response and the predictors, and thus high-order interactions are necessary. In this paper, we consider regression models with both main effects and high-order interaction terms. Equation (3) shows a linear regression model with pairwise interaction terms:

$$y^{(i)} = \beta^T X^{(i)} + X^{(i)T} W X^{(i)} + \epsilon^{(i)}, \qquad (3)$$

where W (p × p) is the weight matrix associated with the pairwise feature interactions. The corresponding loss function (the sum of squared errors) is as follows (we center the data to avoid an additional bias term):

$$L_{sqerr}(\beta, W) = \frac{1}{2} \sum_{i=1}^{n} \left\| y^{(i)} - \beta^T X^{(i)} - X^{(i)T} W X^{(i)} \right\|_2^2. \qquad (4)$$

We can similarly write the logistic regression model with pairwise interactions as follows:

$$p(y^{(i)} \mid X^{(i)}) = \frac{1}{1 + \exp\left(-y^{(i)}(\beta^T X^{(i)} + X^{(i)T} W X^{(i)} + \beta_0)\right)}, \qquad (5)$$

and the corresponding loss function (the sum of the negative log-likelihood of the training data) is

$$L_{logistic}(\beta, W, \beta_0) = \sum_{i=1}^{n} \log\left(1 + \exp\left(-y^{(i)}(\beta^T X^{(i)} + X^{(i)T} W X^{(i)} + \beta_0)\right)\right). \qquad (6)$$

4. OUR APPROACH

In this section, we propose an optimization-driven sparse learning framework to identify discriminative single features and groups of high-order interactions among input features for output prediction in the setting of limited training data. When the number of input features is huge (e.g., in biomedical applications), it is practically impossible to explicitly consider quadratic or even higher-order interactions among all the input features based on simple lasso penalized linear regression or logistic regression.
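To make the pairwise-interaction model of Equation (3) concrete, here is a small numerical sketch (ours, not from the paper; all values are illustrative):

```python
import numpy as np

def predict_pairwise(x, beta, W):
    """Eq. (3) without noise: y = beta^T x + x^T W x."""
    return beta @ x + x @ W @ x

beta = np.array([1.0, 0.0, -2.0])   # main effects
W = np.array([[0.0, 0.5, 0.0],      # a single pairwise interaction
              [0.5, 0.0, 0.0],      # between features 0 and 1
              [0.0, 0.0, 0.0]])
x = np.array([1.0, 2.0, 3.0])
print(predict_pairwise(x, beta, W))  # -3.0  (main effects -5, interaction +2)
```

Note that a dense symmetric W already costs p(p + 1)/2 parameters, which motivates the factorization introduced next.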
To solve this problem, we propose to factorize the weight matrix W associated with high-order interactions between input features as a sum of K rank-one matrices for pairwise interactions, or as a sum of low-rank high-order tensors for higher-order interactions.
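The computational payoff of this factorization can be seen directly: with W = Σ_k a_k a_kᵀ, the quadratic term xᵀWx equals Σ_k (a_kᵀx)², so the p × p matrix W never needs to be materialized. A small sketch of this identity (ours; shapes are illustrative):

```python
import numpy as np

def interaction_term(x, A):
    """x^T W x with W = sum_k a_k a_k^T, computed as sum_k (a_k^T x)^2.
    A is (K, p): row k holds the factor a_k, so only K*p parameters
    are stored instead of the p(p+1)/2 of a full symmetric W."""
    return float(np.sum((A @ x) ** 2))

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 4))   # K = 2 rank-one factors, p = 4 features
x = rng.normal(size=4)
W = A.T @ A                   # explicit W = sum_k a_k a_k^T, for comparison only
assert np.isclose(x @ W @ x, interaction_term(x, A))
```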
Each rank-one matrix for pairwise feature interactions is represented by an outer product of two identical vectors, and each m-order (m > 2) tensor is represented by the outer product of m identical vectors. Besides minimizing the loss function of linear regression or logistic regression, we penalize the norm of both the weights associated with single input features and the weights associated with high-order feature interactions. Mathematically, we solve the following optimization problem to identify the discriminative single and pairwise interaction features:

$$\{\hat\beta, \hat a_k\} = \arg\min_{a_k, \beta} \; L_{sqerr}(\beta, W) + \lambda_\beta \|\beta\|_1 + \sum_{k=1}^{K} \lambda_{a_k} \|a_k\|_1, \qquad (7)$$

where $W = \sum_{k=1}^{K} a_k \otimes a_k$, ⊗ represents the tensor (outer) product, and β̂, â_k represent the estimated parameters of our model; let Q represent the objective function of (7). For logistic regression, we replace L_sqerr(β, W) in (7) by L_logistic(β, W, β_0). We call our model the Factorization-based High-order Interaction Model (FHIM).

Proposition 4.1. The optimization problem in Equation (7) is convex in β and non-convex in a_k.

Because of the non-convexity of our optimization problem, it is difficult to propose optimization algorithms which guarantee convergence to the global optimum. Here, we adopt a greedy alternating optimization method to find a local optimum for our problem. In the case of pairwise interactions, fixing the other weights, we solve for one rank-one weight matrix at a time. Please note that our symmetric positive semidefinite factorization of W makes this sub-optimization problem very easy. Moreover, for a particular rank-one weight matrix a_k ⊗ a_k, the nonzero entries of the corresponding vector a_k can be interpreted as the block-wise interaction feature indices of a densely interacting feature group. In the case of higher-order interactions, the optimization procedure is similar to the one for pairwise interactions except that we have more rounds of alternating optimization.
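A minimal sketch of this greedy alternating scheme (ours: the exact coordinate-wise sub-problem solvers are replaced here by proximal-gradient steps with backtracking, and all names, step sizes and stopping rules are illustrative, not the paper's implementation):

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding: proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def _obj(X, y, beta, factors, lam_beta, lam_a):
    r = y - X @ beta - sum((X @ f) ** 2 for f in factors)
    pen = lam_beta * np.abs(beta).sum() + lam_a * sum(np.abs(f).sum() for f in factors)
    return 0.5 * r @ r + pen

def fit_fhim_greedy(X, y, lam_beta=0.1, lam_a=0.1, max_K=5, n_iter=50):
    """Greedily add rank-one factors a_k a_k^T; alternate proximal-gradient
    updates on beta and the newest a_k; stop when the newest factor is zero.
    Backtracking guarantees each step never increases the objective."""
    n, p = X.shape
    beta, factors = np.zeros(p), []
    while len(factors) < max_K:
        factors.append(np.ones(p))            # new factor initialized to 1
        for _ in range(n_iter):
            # proximal step on beta, with backtracking line search
            r = y - X @ beta - sum((X @ f) ** 2 for f in factors)
            g, t = -X.T @ r, 1.0
            f0 = _obj(X, y, beta, factors, lam_beta, lam_a)
            while t > 1e-12:
                cand = soft(beta - t * g, t * lam_beta)
                if _obj(X, y, cand, factors, lam_beta, lam_a) <= f0:
                    beta = cand
                    break
                t *= 0.5
            # proximal step on the newest factor a, with backtracking
            a = factors[-1]
            r = y - X @ beta - sum((X @ f) ** 2 for f in factors)
            g, t = -2.0 * X.T @ (r * (X @ a)), 1.0
            f0 = _obj(X, y, beta, factors, lam_beta, lam_a)
            while t > 1e-12:
                factors[-1] = soft(a - t * g, t * lam_a)
                if _obj(X, y, beta, factors, lam_beta, lam_a) <= f0:
                    break
                factors[-1] = a
                t *= 0.5
        if np.allclose(factors[-1], 0.0):     # newest a_K shrank to zero: stop
            factors.pop()
            break
    return beta, factors

# tiny demo on data with one true rank-one interaction factor
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
a_true = np.array([1., 1., 0., 0., 0., 0.])
y = X @ np.array([1., 0., 0., 0., 0., -1.]) + (X @ a_true) ** 2
beta_hat, F = fit_fhim_greedy(X, y, lam_beta=0.5, lam_a=0.5, max_K=3)
```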
The parameter K of W is generally unknown in real datasets; thus, we greedily estimate K during the alternating optimization algorithm. In fact, the combination of our factorization formulation and the greedy algorithm is effective for estimating the interaction weight matrix W. β is re-estimated when K is greedily incremented during the alternating optimization, as shown in Algorithm 1.

Algorithm 1 Greedy Alternating Optimization
1: Initialize β to 0, K = 1 and a_K = 1
2: While (K == 1) or (a_K ≠ 0 for K > 1)
3:   Repeat until convergence
4:     β_j^t = arg min_{β_j} Q((β_1^t, ..., β_{j-1}^t, β_j, β_{j+1}^{t-1}, ..., β_p^{t-1}), a_k^{t-1})
5:     a_{k,j}^t = arg min_{a_{k,j}} Q((a_{k,1}^t, ..., a_{k,j-1}^t, a_{k,j}, a_{k,j+1}^{t-1}, ..., a_{k,p}^{t-1}), β^t)
6:   End Repeat
7:   K = K + 1; a_K = 1
8: End While
9: Remove a_K from a

5. THEORETICAL PROPERTIES

In this section, we study the asymptotic behavior of FHIM for likelihood-based generalized linear regression models. The lemmas and theorems proved here are similar to the ones shown in [3]. However, in that paper the authors make a strong heredity assumption (i.e., interaction term coefficients are dependent on the main effects), which is not assumed in our model, since we are interested in identifying all high-order interactions irrespective of heredity constraints. Here, we discuss the asymptotic properties w.r.t. the main effects and the factorized coefficients.

Problem Setup: Assume that the data V_i = (X_i, y_i), i = 1, ..., n are collected independently and Y_i has a density f(g(X_i), y_i) conditioned on X_i, where g is a known regression function with main effects and all possible pairwise interactions. Let β*_j and a*_{k,j} denote the underlying true parameters satisfying the block-wise properties implied by our factorization. Let Q_n(θ) denote the objective with the negative log-likelihood and θ = (β^T, α^T)^T, where α = (a_1^T, ..., a_K^T)^T.
We consider the estimates for FHIM as θ̂_n:

$$\hat\theta_n = \arg\min_\theta Q_n(\theta) = \arg\min_\theta \frac{1}{n} \sum_{i=1}^{n} L(g(X_i), y_i) + \lambda_\beta \|\beta\|_1 + \sum_k \lambda_{\alpha_k} \|\alpha_k\|_1, \qquad (8)$$

where L(g(X_i), y_i) is the loss function of generalized linear regression models with pairwise interactions. In the case of linear regression, g(·) takes the form of Equation (3) without the noise term ε, and L(·) takes the form of Equation (4). Now, let us define

$$A_1 = \{j : \beta_j^* \neq 0\}, \quad A_2 = \{(k, l) : \alpha_{k,l}^* \neq 0\}, \quad A = A_1 \cup A_2, \qquad (9)$$

where A_1 contains the indices of the main terms whose true coefficients are nonzero, and similarly A_2 contains the indices of the factorized interaction terms whose true coefficients are nonzero. Let us define

$$a_n = \max\{\lambda_{\beta_j}, \lambda_{\alpha_{k,l}} : j \in A_1, (k, l) \in A_2\},$$
$$b_n = \min\{\lambda_{\beta_j}, \lambda_{\alpha_{k,l}} : j \in A_1^c, (k, l) \in A_2^c\}. \qquad (10)$$

Now, we show that our model possesses the oracle properties for (i) n → ∞ with fixed p, and (ii) p_n → ∞ as n → ∞, under some regularity conditions. Please refer to the Appendix for the proofs of the lemmas and theorems of Sections 5.1 and 5.2.

5.1 Asymptotic Oracle Properties when n → ∞

The asymptotic properties when the sample size increases and the number of predictors is fixed are described in the following lemmas and theorems. FHIM possesses oracle properties [3] under the regularity conditions (C1)-(C3) shown below. Let Ω denote the parameter space for θ.

(C1) The observations V_i : i = 1, ..., n are independent and identically distributed with a probability density f(V, θ), which has a common support. We assume the density f satisfies the following equations:

$$E_\theta\left[\frac{\partial \log f(V, \theta)}{\partial \theta_j}\right] = 0 \quad \text{for } j = 1, \ldots, p(K+1),$$

and

$$I_{jk}(\theta) = E_\theta\left[\frac{\partial \log f(V, \theta)}{\partial \theta_j} \frac{\partial \log f(V, \theta)}{\partial \theta_k}\right] = E_\theta\left[-\frac{\partial^2 \log f(V, \theta)}{\partial \theta_j \partial \theta_k}\right].$$
(C2) The Fisher information matrix

$$I(\theta) = E\left[\left(\frac{\partial \log f(V, \theta)}{\partial \theta}\right)\left(\frac{\partial \log f(V, \theta)}{\partial \theta}\right)^T\right]$$

is finite and positive definite at θ = θ*.

(C3) There exists an open set ω of Ω that contains the true parameter point θ* such that for almost all V the density f(V, θ) admits all third derivatives ∂³f(V, θ)/(∂θ_j ∂θ_k ∂θ_l) for all θ ∈ ω and any j, k, l = 1, ..., p(K+1). Furthermore, there exist functions M_{jkl} such that

$$\left|\frac{\partial^3 \log f(V, \theta)}{\partial \theta_j \partial \theta_k \partial \theta_l}\right| \le M_{jkl}(V) \quad \text{for all } \theta \in \omega,$$

where m_{jkl} = E_{θ*}[M_{jkl}(V)] < ∞.

These regularity conditions are the existence of a common support and of first and second derivatives for f(V, θ); the Fisher information matrix being finite and positive definite; and the existence of a bounded third derivative for f(V, θ). These regularity conditions guarantee the asymptotic normality of the ordinary maximum likelihood estimates.

Lemma 5.1. Assume a_n = o(1) as n → ∞. Then under regularity conditions (C1)-(C3), there exists a local minimizer θ̂_n of Q_n(θ) such that ‖θ̂_n − θ*‖ = O_P(n^{−1/2} + a_n).

Theorem 5.2. Assume √n·b_n → ∞ and the minimizer θ̂_n given in Lemma 5.1 satisfies ‖θ̂_n − θ*‖ = O_P(n^{−1/2}). Then under regularity conditions (C1)-(C3), we have P(β̂_{A_1^c} = 0) → 1 and P(α̂_{A_2^c} = 0) → 1.

Lemma 5.1 implies that when the tuning parameters associated with the nonzero coefficients of the main effects and pairwise interactions tend to 0 at a rate faster than n^{−1/2}, there exists a local minimizer of Q_n(θ) which is √n-consistent (the sampling error is O_P(n^{−1/2})). Theorem 5.2 shows that our model removes noise consistently with probability tending to 1. If √n·a_n → 0 and √n·b_n → ∞, then Lemma 5.1 and Theorem 5.2 imply that the √n-consistent estimator θ̂_n satisfies P(θ̂_{A^c} = 0) → 1.

Theorem 5.3. Assume √n·a_n → 0 and √n·b_n → ∞. Then under the regularity conditions (C1)-(C3), the component θ̂_A of the local minimizer θ̂_n (given in Lemma 5.1) satisfies √n(θ̂_A − θ*_A) →_d N(0, I^{−1}(θ*_A)), where I(θ*_A) is the Fisher information matrix of θ_A at θ_A = θ*_A, assuming that θ*_{A^c} = 0 is known in advance.
Theorem 5.3 shows that our model estimates the nonzero coefficients of the true model with the same asymptotic distribution as if the zero coefficients were known in advance. Based on Theorems 5.2 and 5.3, we can say that our model has the oracle property [3], [5] when the tuning parameters satisfy the conditions √n·a_n → 0 and √n·b_n → ∞. To satisfy these conditions, we have to consider adaptive weights w_{β_j}, w_{α_{k,l}} [23] for our tuning parameters λ_β, λ_{α_k} (see the Appendix for more details). Thus, our tuning parameters are:

$$\lambda_{\beta_j} = \frac{\log n}{n} \lambda_\beta w_{\beta_j}, \qquad \lambda_{\alpha_{k,l}} = \frac{\log n}{n} \lambda_{\alpha_k} w_{\alpha_{k,l}}.$$

5.2 Asymptotic Oracle Properties when p_n → ∞ as n → ∞

In this section, we consider the asymptotic behavior of our model when the number of predictors p_n grows to infinity along with the sample size n. If certain regularity conditions (C4)-(C6) (shown below) hold, then we can show that our model possesses the oracle property. We denote the total number of predictors by p_n, and we denote all quantities that change with sample size by adding n as a subscript. A_1, A_2, A are defined as in Section 5, and let s_n = |A_n|. The asymptotic properties of our model when the number of predictors increases along with the sample size are described in the following lemma and theorem. The regularity conditions (C4)-(C6) are given below. Let Ω_n denote the parameter space for θ_n.

(C4) The observations V_{ni} : i = 1, ..., n are independent and identically distributed with a probability density f_n(V_n, θ_n), which has a common support. We assume the density f_n satisfies the following equations:

$$E_{\theta_n}\left[\frac{\partial \log f_n(V_n, \theta_n)}{\partial \theta_{nj}}\right] = 0 \quad \text{for } j = 1, \ldots, p_n,$$

and

$$I_{jk}(\theta_n) = E_{\theta_n}\left[\frac{\partial \log f_n(V_n, \theta_n)}{\partial \theta_{nj}} \frac{\partial \log f_n(V_n, \theta_n)}{\partial \theta_{nk}}\right] = E_{\theta_n}\left[-\frac{\partial^2 \log f_n(V_n, \theta_n)}{\partial \theta_{nj} \partial \theta_{nk}}\right].$$

(C5) $I_n(\theta_n) = E\left[\left(\frac{\partial \log f_n(V_n, \theta_n)}{\partial \theta_n}\right)\left(\frac{\partial \log f_n(V_n, \theta_n)}{\partial \theta_n}\right)^T\right]$ satisfies

$$0 < C_1 < \lambda_{\min}(I_n(\theta_n)) \le \lambda_{\max}(I_n(\theta_n)) < C_2 < \infty \quad \text{for all } n,$$

where λ_min(·) and λ_max(·) represent the smallest and largest eigenvalues of a matrix, respectively.
Moreover, for any j, k = 1, ..., p_n,

$$E_{\theta_n}\left\{\left(\frac{\partial \log f_n(V_n, \theta_n)}{\partial \theta_{nj}} \frac{\partial \log f_n(V_n, \theta_n)}{\partial \theta_{nk}}\right)^2\right\} < C_3 < \infty$$

and

$$E_{\theta_n}\left\{\left(\frac{\partial^2 \log f_n(V_n, \theta_n)}{\partial \theta_{nj} \partial \theta_{nk}}\right)^2\right\} < C_4 < \infty.$$

(C6) There exists a large open set ω_n ⊂ Ω_n ⊂ R^{p_n} which contains the true parameter θ*_n such that for almost all V_{ni} the density admits all third derivatives ∂³f_n(V_{ni}, θ_n)/(∂θ_{nj} ∂θ_{nk} ∂θ_{nl}) for all θ_n ∈ ω_n. Furthermore, there are functions M_{njkl} such that

$$\left|\frac{\partial^3 \log f_n(V_{ni}, \theta_n)}{\partial \theta_{nj} \partial \theta_{nk} \partial \theta_{nl}}\right| \le M_{njkl}(V_{ni})$$

for all θ_n ∈ ω_n, and

$$E_{\theta_n}\left[M_{njkl}^2(V_{ni})\right] < C_5 < \infty$$

for all p_n, n, and j, k, l.

Lemma 5.4. Assume that the density f_n(V_n, θ_n) satisfies the regularity conditions (C4)-(C6). If √n·a_n → 0 and p_n^5/n → 0 as n → ∞, then there exists a local minimizer θ̂_n of Q_n(θ) such that ‖θ̂_n − θ*_n‖ = O_P(√p_n (n^{−1/2} + a_n)).
Theorem 5.5. Suppose that the density f_n(V_n, θ_n) satisfies the regularity conditions (C4)-(C6). If √(n p_n)·a_n → 0, √(n/p_n)·b_n → ∞ and p_n^5/n → 0 as n → ∞, then with probability tending to 1, the √(n/p_n)-consistent local minimizer θ̂_n in Lemma 5.4 satisfies the following:

Sparsity: θ̂_{n A_n^c} = 0.

Asymptotic normality: √n A_n I_n^{1/2}(θ*_{nA_n})(θ̂_{nA_n} − θ*_{nA_n}) →_d N(0, G), where A_n is an arbitrary m × s_n matrix with finite m such that A_n A_n^T → G, G is an m × m nonnegative symmetric matrix, and I_n(θ*_{nA_n}) is the Fisher information matrix of θ_{nA_n} at θ_{nA_n} = θ*_{nA_n}.

Since the dimension of θ̂_{nA_n} grows as the sample size n → ∞, we consider arbitrary linear combinations A_n θ̂_{nA_n} for the asymptotic normality of our model's estimates. Similar to Section 5.1, to satisfy the oracle property we have to consider adaptive weights w_{β_nj}, w_{α_{k,nl}} [23] for our tuning parameters λ_β, λ_{α_k}:

$$\lambda_{\beta_{nj}} = \frac{\log(n)\, p_n}{n} \lambda_\beta w_{\beta_{nj}}, \qquad \lambda_{\alpha_{k,nl}} = \frac{\log(n)\, p_n}{n} \lambda_{\alpha_k} w_{\alpha_{k,nl}}.$$

6. OPTIMIZATION

In this section, we outline three optimization methods that we employ to solve our objective function (7), which corresponds to Lines 4 and 5 in Algorithm 1. [5] provides a good survey of several optimization approaches for solving ℓ1-regularized regression problems. In this paper, we use the sub-gradient and coordinate-wise soft-thresholding based optimization methods, since they work well and are easy to implement. We compare these methods in the experimental results in Section 7.

6.1 Sub-Gradient Methods

Sub-gradient based strategies treat the non-differentiable objective as a non-smooth optimization problem and use sub-gradients of the objective function at the non-differentiable points. For our model, the optimality conditions w.r.t. the parameter vectors β and a_k can be written out separately based on the objective function (7). The optimality conditions w.r.t. a_k are:

$$\nabla_j L(a_k) + \lambda_{a_k} \mathrm{sgn}(a_{kj}) = 0, \quad |a_{kj}| > 0,$$
$$|\nabla_j L(a_k)| \le \lambda_{a_k}, \quad a_{kj} = 0,$$

where L(a_k) is the loss function of our linear regression model or logistic regression model in Equation (7) w.r.t. a_k. Similarly, optimality conditions can be written for β.
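The optimality conditions above rely on the gradient of the squared-error loss with respect to a_k. The following sketch (ours, with illustrative names) states it for a single rank-one factor and verifies it by central finite differences:

```python
import numpy as np

def loss(X, y, beta, a):
    """Squared-error loss of Eq. (4) with a single rank-one factor W = a a^T."""
    r = y - X @ beta - (X @ a) ** 2
    return 0.5 * r @ r

def grad_a(X, y, beta, a):
    """Analytic gradient: dL/da_j = -2 * sum_i X_ij (X_i^T a) r_i."""
    r = y - X @ beta - (X @ a) ** 2
    return -2.0 * X.T @ (r * (X @ a))

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
y = rng.normal(size=20)
beta = rng.normal(size=5)
a = rng.normal(size=5)

# central finite differences agree with the analytic gradient
eps = 1e-6
g_num = np.array([(loss(X, y, beta, a + eps * e) - loss(X, y, beta, a - eps * e)) / (2 * eps)
                  for e in np.eye(5)])
assert np.allclose(g_num, grad_a(X, y, beta, a), rtol=1e-4, atol=1e-6)
```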
The sub-gradient s_j f(a_k) for each a_{kj} is given by

$$s_j f(a_k) = \begin{cases} \nabla_j L(a_k) + \lambda_{a_k} \mathrm{sgn}(a_{kj}), & |a_{kj}| > 0 \\ \nabla_j L(a_k) + \lambda_{a_k}, & a_{kj} = 0,\; \nabla_j L(a_k) < -\lambda_{a_k} \\ \nabla_j L(a_k) - \lambda_{a_k}, & a_{kj} = 0,\; \nabla_j L(a_k) > \lambda_{a_k} \\ 0, & -\lambda_{a_k} \le \nabla_j L(a_k) \le \lambda_{a_k} \end{cases}$$

where, for our linear regression model,

$$\nabla_j L(a_k) = -2 \sum_i X_j^{(i)} \left(X^{(i)T} a_k\right) \left[y^{(i)} - \beta^T X^{(i)} - X^{(i)T} W X^{(i)}\right].$$

The negation of the sub-gradient represents the steepest descent direction. Similarly, the sub-gradients s_j f(β) for β can be calculated. The differential of the loss function of the linear regression in Equation (7) w.r.t. β is given by

$$\nabla_j L(\beta) = -\sum_i X_j^{(i)} \left[y^{(i)} - \beta^T X^{(i)} - X^{(i)T} W X^{(i)}\right].$$

6.1.1 Orthant-Wise Descent (OWD)

Andrew and Gao [1] proposed an effective strategy for solving large-scale ℓ1-regularized regression problems based on choosing an appropriate steepest descent direction for the objective function and taking a step like a Newton iteration in this direction (with an L-BFGS Hessian approximation [2]). The orthant-wise learning descent method for our model takes the following form:

$$\beta \leftarrow P_O\left[\beta - \gamma_\beta P_S\left[H_\beta^{-1} s f(\beta)\right]\right],$$
$$a_k \leftarrow P_O\left[a_k - \gamma_{a_k} P_S\left[H_{a_k}^{-1} s f(a_k)\right]\right],$$

where P_O and P_S are two projection operators, H_β is the positive definite approximation of the Hessian of the quadratic approximation of the objective function f(β), and γ_β and γ_{a_k} are step sizes. P_S projects the Newton-like direction to guarantee that it is a descent direction. P_O projects the step onto the orthant containing β or a_k and ensures that the line search does not cross points of non-differentiability.

6.1.2 Projected Scaled Sub-Gradient (PSS)

Schmidt [5] proposed optimization methods called Projected Scaled Sub-Gradient methods, where the iterations can be written as the projection of a scaling of a sub-gradient of the objective. Please refer to [1] and [5] for more details on the OWD and PSS methods.

6.2 Soft-thresholding

A soft-thresholding based coordinate descent optimization method can be used to find the β and a_k updates in the alternating optimization algorithm for our FHIM model.
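The soft-thresholding operator S referred to here is the proximal map of the ℓ1 penalty, S(z, λ) = sign(z)·max(|z| − λ, 0); a direct implementation (ours):

```python
import numpy as np

def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0): shrink toward zero,
    and set exactly to zero whenever |z| <= lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

print(soft_threshold(3.0, 1.0))    # 2.0
print(soft_threshold(-0.5, 1.0))   # -0.0 (zeroed out: |z| <= lam)
print(soft_threshold(np.array([-2.0, 0.3, 1.5]), 1.0))  # [-1.   0.   0.5]
```

It is this hard zeroing of small coordinates that produces the sparse supports of β and a_k.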
The β updates β̃_j are given by

$$\tilde\beta_j(\lambda_\beta) \leftarrow S\left(\tilde\beta_j(\lambda_\beta) + \frac{1}{n} \sum_{i=1}^{n} X_{ij}\left(y_i - \sum_{k \neq j} X_{ik} \tilde\beta_k - X_i^T W X_i\right),\; \lambda_\beta\right),$$

where $W = \sum_k a_k \otimes a_k$ and S is the soft-thresholding operator [7]. Similarly, the updates ã_{kj} for a_k are given by

$$\tilde a_{kj}(\lambda_{a_k}) \leftarrow S\left(\tilde a_{kj}(\lambda_{a_k}) + \frac{1}{n} \sum_{i=1}^{n} X_{ij}\left(\sum_{r=1}^{p} a_{kr} X_{ir}\right)\left[y_i - \sum_k X_{ik} \tilde\beta_k - X_i^T W_{-j} X_i\right],\; \lambda_{a_k}\right),$$

where W_{−j} is W with the j-th column and j-th row elements all set to zero.

7. EXPERIMENTS

In this section, we use synthetic and real datasets to demonstrate the efficacy of our model (FHIM), and compare its performance with LASSO [8], All-Pairs Lasso [2], Hierarchical LASSO [2], Group Lasso [2], Trace-norm [9], Dirty model [8] and QUIRE [4]. For all these models, we perform 5 runs of 5-fold cross-validation on a training dataset (80%)
to find the optimal parameters, and we evaluate prediction error on a test dataset (20%). We search tuning parameters for all methods using grid search; for our model, the parameters λ_β and λ_{a_k} are searched in the range [0.01, 10]. We also discuss the support recovery of β and W for our model.

7.1 Datasets

We use synthetic datasets and a real dataset for classification and support recovery experiments. We give a detailed description of these datasets below.

7.1.1 Synthetic Dataset

We generate the predictors of the design matrix X using a normal distribution with mean zero and variance one. The weight matrix W was generated as a sum of K rank-one matrices, i.e. W = Σ_{k=1}^K a_k a_k^T. β and a_k were generated as sparse vectors from a normal distribution with mean 0 and variance 1, while the noise vector ε was generated from a normal distribution with mean 0 and variance 0.1. Finally, the response vectors y of the linear and logistic regression models with pairwise interactions were generated using Equations (3) and (5), respectively. We generated several synthetic datasets by varying the number of instances (n), the number of variables/predictors (p), the rank K of W, and the sparsity level of β and a_k. We denote the combined total number of predictors (that is, main-effect predictors plus predictors for interaction terms) by q; here q = p(p + 1)/2. The sparsity level (non-zeros) was chosen as 2-4% for large p (> 100), and 5-10% for small p (< 100), for both β and a_k. In this paper, we show results for synthetic data in these settings: Case (1) n > p and q > n (high-dimensional setting w.r.t. the combined predictors), and Case (2) p > n (high-dimensional w.r.t. the original predictors).

7.1.2 Real Dataset

To predict cancer progression status directly from blood samples, we generated our own dataset. All samples and clinical information were collected under Health Insurance Portability and Accountability Act compliance from study participants after obtaining written informed consent under clinical research protocols approved by the institutional review boards of each site.
Blood was processed within 2 hours of collection according to established standard operating procedures. To predict RCC status, serum samples were collected at a single study site from patients diagnosed with RCC or a benign renal mass prior to treatment. A definitive pathology diagnosis of RCC and the cancer stage were made after resection. Outcome data were obtained through follow-up from 3 months to 5 years after initial treatment. Our RCC dataset contains 122 RCC samples from benign and 4 different stages of tumor. Expression levels of 1092 proteins were collected based on a high-throughput SOMAmer protein quantification technology. The numbers of Benign, Stage 1, Stage 2, Stage 3 and Stage 4 tumor samples are 40, 10, 17, 24 and 31, respectively.

7.1.3 Experimental Design

We use linear regression models (Equation 3) for all the following experiments, and we only use logistic regression models (Equation 5) for the synthetic data experiments shown in Table 2. We evaluate the performance of our method (FHIM) by the following experiments:

1. Prediction error and support recovery experiments on synthetic datasets.

2. Classification experiments using RCC samples: we perform three stage-wise binary classification experiments using RCC samples:
(a) Case 1: Classification of Benign samples from Stage 4 samples.
(b) Case 2: Classification of Benign and Stage 1 samples from Stage 2-4 samples.
(c) Case 3: Classification of Benign, Stage 1, 2 samples from Stage 3, 4 samples.

7.1.4 Performance on Synthetic Dataset

We evaluate the performance of our model (FHIM) on the synthetic datasets by the following experiments: (i) comparison of the optimization methods presented in Section 6, (ii) prediction error on the test data for q > n and p > n (high-dimensional settings), (iii) support recovery accuracy of β and W, and (iv) prediction of the rank of W using the greedy approach. Table 3 shows the prediction error on test data when the different optimization methods (discussed in Section 6) are used for our model (FHIM).
From Table 3, we see that both the OWD and PSS methods perform nearly the same (OWD is marginally better), and both are better than the soft-thresholding method. This is because, in soft-thresholding, the coordinate updates of the variables might not be accurate in high-dimensional settings (i.e., the solution is affected by the path taken during the updates). We observed that soft-thresholding is in general slower than the OWD and PSS methods. For all the other experiments discussed in this paper, we choose OWD as the optimization method for FHIM.

Figure 1: Support recovery of β (90% sparse) and W (99% sparse) for synthetic data Case 1: n > p and q > n, where n = 1000, p = 50, q = 1275.

Figure 2: Support recovery of W (99.5% sparse) for synthetic data Case 2: p > n, where p = 500, n = 100. Online supplementary materials contain high-quality images for this figure.

Table 1 and Table 2 show the performance comparison (in terms of prediction error on the test dataset) of FHIM for linear and logistic regression models with respect to state-of-the-art approaches such as Lasso, Fused Lasso, Trace-Norm and Hierarchical Lasso (HLasso is a general version of SHIM [3]). From Tables 1 and 2, we see that FHIM generally outperforms all the state-of-the-art approaches for both linear and logistic pairwise regression models. For q > n, we see that the test data prediction
| n, p, K | FHIM | Fused Lasso | Lasso | HLasso | Trace norm | Dirty Model |
|---|---|---|---|---|---|---|
| 1000, 50 (q > n) | 338.4 (4.5) | 425.9 (20.7) | 474.7 (5.3) | (24.82) | 464.4 (36.3) | 63.5 (0.76) |
| 1000, 50 (q > n) | (2.9) | 888.3 (2.) | 922.9 (43.9) | 889. (2.5) | 822.6 (99.8) | (0.76) |
| 10000, 500 (q > n) | 093. (9.5) | (55.) | (29.5) | - | (0.) | (0.8) |
| 10000, 500 (q > n) | (2.2) | 22720 (597.8) | (23.3) | - | (32.4) | 2924 (0.8) |
| 100, 500 (p > n) | (50.3) | 57.2 (355.0) | 335.0 (59.2) | - | (299.7) | 65.9 (62.6) |
| 100, 1000 (p > n) | 340. (40.02) | 770.9 (27.6) | 879. (80.3) | - | (208.7) | 808. (5.) |
| 100, 2000 (p > n) | (00.) | 022.3 (406.2) | 99.2 (32.) | - | (47.6) | 96.7 (63.4) |

Table 1: Performance comparison for synthetic data on the linear regression model with high-order interactions. Prediction error (MSE) and the standard deviation of the MSE (shown inside brackets) on test data are used to measure the model's performance. For p >= 500, Hierarchical Lasso (HLasso) has heavy computational complexity, hence we don't show its results here.

| n, p, K | FHIM | Fused Lasso | Lasso | HLasso | Trace norm |
|---|---|---|---|---|---|
| 1000, 50 (q > n) | 0.27 (0.009) | 0.28 (0.07) | 0.56 (0.07) | 0.36 (0.02) | 0.28 (0.06) |
| 1000, 50 (q > n) | (0.03) | (0.024) | (0.042) | (0.022) | (0.027) |
| 10000, 500 (q > n) | 0.35 (0.002) | (0.007) | 0.6 (0.02) | - | (0.077) |
| 10000, 500 (q > n) | (0.05) | 0.54 (0.006) | 0.507 (0.08) | - | (0.006) |
| 100, 500 (p > n) | (0.04) | (0.086) | (0.054) | - | (0.079) |
| 100, 1000 (p > n) | (0.056) | 0.409 (0.086) | 0.458 (0.083) | - | (0.0) |

Table 2: Performance comparison for the synthetic dataset on the logistic regression model with high-order interactions. Misclassification error on test data is used to measure the model's performance.

error for FHIM is consistently lower compared to all other approaches. For p > n, FHIM performs slightly better than the other approaches; however, the prediction error for all the approaches is high, since it is hard to accurately recover the coefficients of the main effects and pairwise interactions in very high-dimensional settings. From Figure 1 and Table 4, we see that our model performs very well (F1 score close to 1) in the support recovery of β and W for the q > n setting. From Figure 2 and Table 5, we see that our model performs fairly well in the support recovery of W for the p > n setting.
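For reference, the synthetic-data protocol of Section 7.1.1 and an F1-style support-recovery score of the kind reported above can be sketched as follows (ours; sizes, sparsity levels and function names are illustrative):

```python
import numpy as np

def sparse_vec(rng, p, nnz):
    """A p-vector with nnz randomly placed N(0,1) entries, rest zero."""
    v = np.zeros(p)
    idx = rng.choice(p, size=nnz, replace=False)
    v[idx] = rng.normal(size=nnz)
    return v

def make_synthetic(n, p, K, nnz=2, seed=0):
    """Synthetic data per Section 7.1.1: X ~ N(0,1), sparse beta and a_k,
    W = sum_k a_k a_k^T, response from Eq. (3) with N(0, 0.1) noise."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    beta = sparse_vec(rng, p, nnz)
    A = np.vstack([sparse_vec(rng, p, nnz) for _ in range(K)])
    W = A.T @ A
    y = X @ beta + np.einsum('ij,jk,ik->i', X, W, X) + rng.normal(scale=np.sqrt(0.1), size=n)
    return X, y, beta, W

def support_f1(true_mat, est_mat, tol=1e-8):
    """F1 score of the recovered nonzero pattern (support)."""
    t, e = np.abs(true_mat) > tol, np.abs(est_mat) > tol
    tp = np.sum(t & e)
    if tp == 0:
        return 0.0
    prec, rec = tp / e.sum(), tp / t.sum()
    return 2 * prec * rec / (prec + rec)

X, y, beta, W = make_synthetic(n=200, p=20, K=2)
print(support_f1(W, W))   # 1.0: a perfect estimate recovers the full support
```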
We observe that when the tuning parameters are correctly chosen, support recovery of W works very well when W is low-rank (see Tables 4 and 5), and the F1 score for the support recovery of W decreases as the rank of W increases. Table 5 shows that for q > n the greedy strategy of FHIM accurately recovers the rank K of W, while for p > n the greedy strategy might not correctly recover K. This is because the factorization of W is not unique, and slightly correlated variables can enter our model during the optimization.

RCC Dataset

Classification Performance on RCC

In this section, we report systematic experimental results on the classification of samples from different stages of RCC. The predictive performance of the markers and pairwise interactions selected by our model (FHIM) is compared against the markers selected by Lasso, All-Pairs Lasso [2], Group Lasso, Dirty model [8] and QUIRE. We use the SLEP [13], MALSAR [22] and QUIRE packages for the implementation of these models. The overall performance of the algorithms is shown in Figure 3. In this figure, we report the average AUC score over five runs of 5-fold cross-validation experiments for cancer-stage prediction in RCC. In the 5-fold cross-validation experiments, we train our model on four folds to identify the main effects and pairwise interactions, and we use the remaining fold to test prediction. The average AUC achieved by the features selected with our model is 0.68, 0.89 and 0.84, respectively, for the three classification cases shown in Figure 3. We performed pairwise t-tests comparing our method with each of the other methods, and all p-values are below the chosen significance level. From Figure 3, it is clear that our model outperforms all the other algorithms that do not use prior feature-group information in all three classification cases of RCC prediction. In addition, our model has performance similar to that of the state-of-the-art technique QUIRE [14], which uses Gene Ontology based functional annotation for the grouping and clustering of genes to identify high-order interactions.

Figure 3: Comparison of the classification performance of different feature-selection approaches (APLasso, Lasso, Trace-norm, Group Lasso, Dirty Model, QUIRE, and FHIM (ours)) in identifying the different stages of RCC (cases: Benign vs. Stage 1-4; Benign, Stage 1 vs. Stage 2-4; Benign, Stage 1-2 vs. Stage 3-4). We perform five-fold cross-validation five times and report the average AUC score.

Figure 4: Examples of functional modules for RCC Case 3, induced by markers and interactions discovered by our model and enriched in pathways and functions associated with RCC.

7.1.6 Informative interactions discovered by FHIM

An investigation of the pairwise interactions identified by our model on the RCC dataset reveals that many of these interactions are indeed relevant to the prediction of cancer. Figure 4 shows some of the interactions associated with higher
weighted pairwise coefficients for Case 3 of the RCC classification experiment. The interactions include CX3CL1-CD97 and CHEK2-IL5RA, which are known to be related to proteins in blood. CX3CL1 was recently found to promote breast cancer [17], while CD97 was found to promote colorectal cancer [19]. We believe these protein interactions might lead to renal cell cancer. Further investigations of the interactions identified by our model might reveal novel protein interactions associated with renal cell cancer, and thus lead to testable hypotheses.

Time Complexity

FHIM has O(np) time complexity for Algorithm 1. In general, FHIM takes more time than the Lasso approach, since we perform an alternating optimization of β and the a_k. For the q > n setting with n = 1000, q = 1275, our OWD learning optimization method in Matlab takes around 1 minute for 5-fold cross-validation, while for p > n with p = 2000, n = 100, our FHIM model took around 2 hours for 5-fold cross-validation. Our experiments were run on an Intel i3 dual-core 2.9 GHz CPU with 8 GB RAM.

Table 3 (columns: n, p; OWD; Soft-thresholding; PSS): Comparison of optimization methods for our FHIM model, based on test-data prediction error.

Table 4 (columns: n, p; sparsity; K; support recovery of β, a_k and of β, W (F1 score)): Support recovery of β and W.

Table 5 (columns: n, p; true K; estimated K; F1 score of the support recovery of W): Recovering K using the greedy strategy.

8. CONCLUSIONS

In this paper, we proposed a factorization-based sparse learning framework called FHIM for identifying high-order feature interactions in linear and logistic regression models, and studied several optimization methods for our model. Empirical experiments on synthetic and real datasets showed that our model outperforms several well-known techniques such as Lasso, Trace-norm and Group Lasso, and achieves performance comparable to the current state-of-the-art method QUIRE, while not assuming any prior knowledge about the data.
Our model gives interpretable results for high-order feature interactions on the RCC dataset, which can be used for biomarker discovery for disease diagnosis. In the future, we will consider the following directions: (i) we will consider a factorization of the weight matrix W as W = Σ_k a_k b_kᵀ, together with higher-order feature interactions, which is more general but makes the optimization problem non-convex; (ii) we will extend our optimization methods from single-task learning to multi-task learning; (iii) we will consider groupings of features for both single-task and multi-task learning.

References

[1] G. Andrew and J. Gao. Scalable training of L1-regularized log-linear models. In Proceedings of the 24th International Conference on Machine Learning. ACM, 2007.
[2] J. Bien, J. Taylor, and R. Tibshirani. A lasso for hierarchical interactions. The Annals of Statistics, 41(3):1111-1141, 2013.
[3] N. H. Choi, W. Li, and J. Zhu. Variable selection with the strong heredity constraint and its oracle property. Journal of the American Statistical Association, 105(489), 2010.
[4] D. K. Duvenaud, H. Nickisch, and C. E. Rasmussen. Additive Gaussian processes. In NIPS, 2011.
[5] J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456):1348-1360, 2001.
[6] R. Foygel, N. Srebro, and R. Salakhutdinov. Matrix reconstruction with the local max norm. arXiv preprint arXiv:1210.5196, 2012.
[7] J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani. Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2):302-332, 2007.
[8] A. Jalali, P. Ravikumar, and S. Sanghavi. A dirty model for multiple sparse regression. arXiv preprint, 2011.
[9] S. Ji and J. Ye. An accelerated gradient method for trace norm minimization. In Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009.
[10] G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. E. Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. J. Mach. Learn.
Res., 5:27-72, Dec. 2004.
[11] E. L. Lehmann and G. Casella. Theory of Point Estimation, volume 3. Springer, 1998.
[12] D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1-3):503-528, 1989.
[13] J. Liu, S. Ji, and J. Ye. SLEP: Sparse Learning with Efficient Projections. Arizona State University, 2009.
[14] R. Min, S. Chowdhury, Y. Qi, A. Stewart, and R. Ostroff. An integrated approach to blood-based cancer diagnosis and biomarker discovery. In Proceedings of the Pacific Symposium on Biocomputing, 2014.
[15] M. Schmidt. Graphical model structure learning with l1-regularization. PhD thesis, University of British Columbia, 2010.
[16] J. A. Suykens and J. Vandewalle. Least squares support vector machine classifiers. Neural Processing Letters, 9(3):293-300, 1999.
[17] M. Tardaguila, E. Mira, M. A. García-Cabezas, A. M. Feijoo, M. Quintela-Fandino, I. Azcoitia, S. A. Lira, and S. Manes. CX3CL1 promotes breast cancer via transactivation of the EGF pathway. Cancer Research, 2013.
[18] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), pages 267-288, 1996.
[19] M. Wobus, O. Huber, J. Hamann, and G. Aust. CD97 overexpression in tumor cells at the invasion front in colorectal cancer (CC) is independently regulated of the canonical Wnt pathway. Molecular Carcinogenesis, 45(11):881-886, 2006.
[20] T. T. Wu, Y. F. Chen, T. Hastie, E. Sobel, and K. Lange. Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics, 25(6):714-721, 2009.
[21] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):49-67, 2006.
[22] J. Zhou, J. Chen, and J. Ye. MALSAR: Multi-tAsk Learning via StructurAl Regularization. Arizona State University, 2011.
[23] H. Zou. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476):1418-1429, 2006.

APPENDIX
A. PROOFS FOR SECTION 5

Proof of Lemma 5.1: Let η_n = n^(−1/2) + a_n, and let {θ* + η_n δ : ||δ|| ≤ d} be the ball around θ*, where δ = (u_1, ..., u_p, v_1, ..., v_{Kp})ᵀ = (uᵀ, vᵀ)ᵀ. Define

D_n(δ) ≡ Q_n(θ* + η_n δ) − Q_n(θ*),

where Q_n(θ) is defined in equation (8).
For δ satisfying ||δ|| = d, we have

D_n(δ) = −L_n(θ* + η_n δ) + L_n(θ*)
  + n Σ_j λ_{β_j} (|β*_j + η_n u_j| − |β*_j|)
  + n Σ_{k,l} λ_{α_k} (|α*_{k,l} + η_n v_{k,l}| − |α*_{k,l}|)
≥ −L_n(θ* + η_n δ) + L_n(θ*)
  + n Σ_{j ∈ A_1} λ_{β_j} (|β*_j + η_n u_j| − |β*_j|)
  + n Σ_{(k,l) ∈ A_2} λ_{α_k} (|α*_{k,l} + η_n v_{k,l}| − |α*_{k,l}|)
≥ −L_n(θ* + η_n δ) + L_n(θ*) − n η_n Σ_{j ∈ A_1} λ_{β_j} |u_j| − n η_n Σ_{(k,l) ∈ A_2} λ_{α_k} |v_{k,l}|
≥ −L_n(θ* + η_n δ) + L_n(θ*) − n η_n² ( Σ_{j ∈ A_1} |u_j| + Σ_{(k,l) ∈ A_2} |v_{k,l}| )
≥ −L_n(θ* + η_n δ) + L_n(θ*) − n η_n² (|A_1| + |A_2|) d
= −[∇L_n(θ*)]ᵀ (η_n δ) − (1/2) (η_n δ)ᵀ [∇² L_n(θ*)] (η_n δ) (1 + o_p(1)) − n η_n² (|A_1| + |A_2|) d,

where we used a Taylor expansion in the last step. We split the expression above into three parts:

K_1 = −η_n [∇L_n(θ*)]ᵀ δ = −n η_n (n^(−1) ∇L_n(θ*))ᵀ δ = O_p(n η_n²) ||δ||,
K_2 = (1/2) n η_n² { δᵀ [−n^(−1) ∇² L_n(θ*)] δ (1 + o_p(1)) } = (1/2) n η_n² { δᵀ I(θ*) δ (1 + o_p(1)) },
K_3 = −n η_n² (|A_1| + |A_2|) d,

so that D_n(δ) ≥ K_1 + K_2 + K_3. For d large enough, K_2 dominates the other terms and is positive, since I(θ) is positive definite at θ = θ* by regularity condition (C2). Therefore, for any given ε > 0 there exists a large enough constant d such that

P{ inf_{||δ|| = d} Q_n(θ* + η_n δ) > Q_n(θ*) } ≥ 1 − ε.

This implies that, with probability at least 1 − ε, there exists a local minimizer in the ball {θ* + η_n δ : ||δ|| ≤ d}. Thus, there exists a local minimizer θ̂_n of Q_n(θ) such that ||θ̂_n − θ*|| = O_p(η_n).

Proof of Theorem 5.2: Let us first consider P(α̂_{A_2^c} = 0). It is sufficient to show that, for any (k,l) ∈ A_2^c,

∂Q_n(θ̂_n)/∂α_{k,l} < 0 for −ε_n < α̂_{k,l} < 0,  (1)
∂Q_n(θ̂_n)/∂α_{k,l} > 0 for ε_n > α̂_{k,l} > 0,   (2)

with probability tending to 1, where ε_n = C n^(−1/2) and C > 0 is any constant. To show (2), notice that

∂Q_n(θ̂_n)/∂α_{k,l} = −∂L_n(θ̂_n)/∂α_{k,l} + n λ_{α_k} sgn(α̂_{k,l})
= −∂L_n(θ*)/∂α_{k,l} − Σ_{j=1}^{p(K+1)} [∂² L_n(θ*)/∂α_{k,l} ∂θ_j] (θ̂_j − θ*_j)
  − (1/2) Σ_{j=1}^{p(K+1)} Σ_{m=1}^{p(K+1)} [∂³ L_n(θ̃)/∂α_{k,l} ∂θ_j ∂θ_m] (θ̂_j − θ*_j)(θ̂_m − θ*_m)
  + n λ_{α_k} sgn(α̂_{k,l}),

where θ̃ lies between θ̂_n and θ*. By regularity conditions (C1)-(C3) and Lemma 5.1, we have

∂Q_n(θ̂_n)/∂α_{k,l} = n { O_p(n^(−1/2)) + λ_{α_k} sgn(α̂_{k,l}) }.

Since √n λ_{α_k} → ∞ for (k,l) ∈ A_2^c by assumption, the sign of ∂Q_n(θ̂_n)/∂α_{k,l} is dominated by sgn(α̂_{k,l}). Thus,

P( ∂Q_n(θ̂_n)/∂α_{k,l} > 0 for 0 < α̂_{k,l} < ε_n ) → 1 as n → ∞.

(1) has an identical proof. Also, P(β̂_{A_1^c} = 0) can be proved similarly, since in our model β and α are independent of each other.

Proof of Theorem 5.3: Let Q_n(θ_A) denote the objective function Q_n restricted to the A components of θ, that is, Q_n(θ) with θ_{A^c} = 0. Based on Lemma 5.1 and Theorem 5.2, we have P(θ̂_{A^c} = 0) → 1. Thus,

P( argmin Q_n(θ_A) = (A component of argmin_θ Q_n(θ)) ) → 1.

Thus, θ̂_A should satisfy

∂Q_n(θ_A)/∂θ_j |_{θ_A = θ̂_A} = 0 for all j ∈ A  (3)

with probability tending to 1. Let L_n(θ_A) and P_λ(θ_A) denote the log-likelihood function of θ_A and the penalty function of θ_A, respectively, so that Q_n(θ_A) = −L_n(θ_A) + n P_λ(θ_A). From (3), we have

∇_A Q_n(θ̂_A) = −∇_A L_n(θ̂_A) + n ∇_A P_λ(θ̂_A) = 0,  (4)

with probability tending to 1. Now, Taylor expansions of the first and second terms at θ_A = θ*_A give

∇_A L_n(θ̂_A) = ∇_A L_n(θ*_A) + [∇²_A L_n(θ*_A) + o_p(1)] (θ̂_A − θ*_A)
            = √n [ n^(−1/2) ∇_A L_n(θ*_A) − I(θ*_A) √n (θ̂_A − θ*_A) + o_p(1) ],

n ∇_A P_λ(θ̂_A) = n { λ_{β_j} sgn(β̂_j), λ_{α_k} sgn(α̂_{k,l}) }_{j ∈ A_1, (k,l) ∈ A_2} + o_p(1)(θ̂_A − θ*_A) = √n · o_p(1),

since √n a_n = o(1) and ||θ̂_A − θ*_A|| = O_p(n^(−1/2)). Thus, we get

0 = √n [ −n^(−1/2) ∇_A L_n(θ*_A) + I(θ*_A) √n (θ̂_A − θ*_A) + o_p(1) ].

Therefore, from the central limit theorem,

√n (θ̂_A − θ*_A) →_d N(0, I^(−1)(θ*_A)).

The proofs for the lemma and theorem of Section 5.2 are along the same lines as above. Please refer to Appendix Section C for more details.

B. COMPUTING ADAPTIVE WEIGHTS

Here, we explain how the adaptive weights w^β_j, w^α_k can be calculated for the tuning parameters λ_{β_j}, λ_{α_k} in the theoretical properties (Theorems 5.3 & 5.5) of Section 5. Let q be the total number of predictors, and let n be the total number of instances.
When n > q, we can compute the adaptive weights w^β_j, w^α_k for the tuning parameters λ_{β_j}, λ_{α_k} using the ordinary least squares (OLS) estimates on the training observations:

λ_{β_j} = (log n / n) λ_β w^β_j,  λ_{α_k} = (log n / n) λ_α w^α_k,

where w^β_j = 1 / |β̂^OLS_j| and w^α_k = 1 / |α̂^OLS_k|.

When q > n, the OLS estimates are not available, so we compute the weights using ridge-regression estimates; that is, we replace all the OLS estimates above with ridge-regression estimates. The tuning parameter for ridge regression can be selected using cross-validation. Note that we find α̂^OLS_k by taking least squares with respect to each α_k, where k ∈ [0, K] for some K ≥ trueK. Without loss of generality, we can assume K = trueK when proving the theoretical properties in Section 5. Even if K ≠ trueK, it does not affect the theoretical properties, since the cardinality of A_2 (|A_2|) does not affect the root-n consistency (see the proof of Lemma 5.1). In practice, K is greedily chosen by Algorithm 1 in our paper.

C. SUPPLEMENTARY MATERIALS

For interested readers, we provide online supplementary materials (PDF) with detailed proofs for Lemma 5.4 and Theorem 5.5. These proofs do not affect the understanding of this paper. We also provide high-quality images for Figure 1 in the supplementary materials, and more details about the experimental settings of the state-of-the-art techniques used in this paper.
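The pilot-estimate step above (OLS when n > q, ridge otherwise, with weights inversely proportional to the pilot coefficients) can be sketched as follows. This is a minimal illustration of our own; the function name, the `ridge_lambda` default (which would in practice be chosen by cross-validation), and the small epsilon guard are assumptions, not the authors' code:

```python
import numpy as np

def adaptive_weights(X, y, ridge_lambda=1.0):
    """Adaptive-lasso-style weights w_j = 1 / |coef_j| from pilot estimates.

    Uses ordinary least squares when n > q; otherwise falls back to ridge
    regression, since X^T X + lambda*I is invertible even when q > n.
    """
    n, q = X.shape
    if n > q:
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS pilot estimate
    else:
        coef = np.linalg.solve(X.T @ X + ridge_lambda * np.eye(q), X.T @ y)
    return 1.0 / (np.abs(coef) + 1e-12)   # eps guards against zero coefficients
```

Large pilot coefficients thus receive small penalties and small pilot coefficients large ones, which is what drives the oracle behavior of the adaptive weighting.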
More informationA Simple and Efficient Algorithm of 3-D Single-Source Localization with Uniform Cross Array Bing Xue 1 2 a) * Guangyou Fang 1 2 b and Yicai Ji 1 2 c)
A Simpe Efficient Agorithm of 3-D Singe-Source Locaization with Uniform Cross Array Bing Xue a * Guangyou Fang b Yicai Ji c Key Laboratory of Eectromagnetic Radiation Sensing Technoogy, Institute of Eectronics,
More informationSupport Vector Machine and Its Application to Regression and Classification
BearWorks Institutiona Repository MSU Graduate Theses Spring 2017 Support Vector Machine and Its Appication to Regression and Cassification Xiaotong Hu As with any inteectua project, the content and views
More informationConvergence Property of the Iri-Imai Algorithm for Some Smooth Convex Programming Problems
Convergence Property of the Iri-Imai Agorithm for Some Smooth Convex Programming Probems S. Zhang Communicated by Z.Q. Luo Assistant Professor, Department of Econometrics, University of Groningen, Groningen,
More informationFORECASTING TELECOMMUNICATIONS DATA WITH AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODELS
FORECASTING TEECOMMUNICATIONS DATA WITH AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODES Niesh Subhash naawade a, Mrs. Meenakshi Pawar b a SVERI's Coege of Engineering, Pandharpur. nieshsubhash15@gmai.com
More informationDetermining The Degree of Generalization Using An Incremental Learning Algorithm
Determining The Degree of Generaization Using An Incrementa Learning Agorithm Pabo Zegers Facutad de Ingeniería, Universidad de os Andes San Caros de Apoquindo 22, Las Condes, Santiago, Chie pzegers@uandes.c
More informationHILBERT? What is HILBERT? Matlab Implementation of Adaptive 2D BEM. Dirk Praetorius. Features of HILBERT
Söerhaus-Workshop 2009 October 16, 2009 What is HILBERT? HILBERT Matab Impementation of Adaptive 2D BEM joint work with M. Aurada, M. Ebner, S. Ferraz-Leite, P. Godenits, M. Karkuik, M. Mayr Hibert Is
More informationPartial permutation decoding for MacDonald codes
Partia permutation decoding for MacDonad codes J.D. Key Department of Mathematics and Appied Mathematics University of the Western Cape 7535 Bevie, South Africa P. Seneviratne Department of Mathematics
More informationTesting for the Existence of Clusters
Testing for the Existence of Custers Caudio Fuentes and George Casea University of Forida November 13, 2008 Abstract The detection and determination of custers has been of specia interest, among researchers
More informationNEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION
NEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION Hsiao-Chang Chen Dept. of Systems Engineering University of Pennsyvania Phiadephia, PA 904-635, U.S.A. Chun-Hung Chen
More informationAn Information Geometrical View of Stationary Subspace Analysis
An Information Geometrica View of Stationary Subspace Anaysis Motoaki Kawanabe, Wojciech Samek, Pau von Bünau, and Frank C. Meinecke Fraunhofer Institute FIRST, Kekuéstr. 7, 12489 Berin, Germany Berin
More informationSome Measures for Asymmetry of Distributions
Some Measures for Asymmetry of Distributions Georgi N. Boshnakov First version: 31 January 2006 Research Report No. 5, 2006, Probabiity and Statistics Group Schoo of Mathematics, The University of Manchester
More informationProcess Capability Proposal. with Polynomial Profile
Contemporary Engineering Sciences, Vo. 11, 2018, no. 85, 4227-4236 HIKARI Ltd, www.m-hikari.com https://doi.org/10.12988/ces.2018.88467 Process Capabiity Proposa with Poynomia Profie Roberto José Herrera
More informationAlgorithms to solve massively under-defined systems of multivariate quadratic equations
Agorithms to sove massivey under-defined systems of mutivariate quadratic equations Yasufumi Hashimoto Abstract It is we known that the probem to sove a set of randomy chosen mutivariate quadratic equations
More informationReichenbachian Common Cause Systems
Reichenbachian Common Cause Systems G. Hofer-Szabó Department of Phiosophy Technica University of Budapest e-mai: gszabo@hps.ete.hu Mikós Rédei Department of History and Phiosophy of Science Eötvös University,
More informationNonlinear Gaussian Filtering via Radial Basis Function Approximation
51st IEEE Conference on Decision and Contro December 10-13 01 Maui Hawaii USA Noninear Gaussian Fitering via Radia Basis Function Approximation Huazhen Fang Jia Wang and Raymond A de Caafon Abstract This
More informationII. PROBLEM. A. Description. For the space of audio signals
CS229 - Fina Report Speech Recording based Language Recognition (Natura Language) Leopod Cambier - cambier; Matan Leibovich - matane; Cindy Orozco Bohorquez - orozcocc ABSTRACT We construct a rea time
More informationEfficiently Generating Random Bits from Finite State Markov Chains
1 Efficienty Generating Random Bits from Finite State Markov Chains Hongchao Zhou and Jehoshua Bruck, Feow, IEEE Abstract The probem of random number generation from an uncorreated random source (of unknown
More informationFUSED MULTIPLE GRAPHICAL LASSO
FUSED MULTIPLE GRAPHICAL LASSO SEN YANG, ZHAOSONG LU, XIAOTONG SHEN, PETER WONKA, JIEPING YE Abstract. In this paper, we consider the probem of estimating mutipe graphica modes simutaneousy using the fused
More information18-660: Numerical Methods for Engineering Design and Optimization
8-660: Numerica Methods for Engineering esign and Optimization in i epartment of ECE Carnegie Meon University Pittsburgh, PA 523 Side Overview Conjugate Gradient Method (Part 4) Pre-conditioning Noninear
More informationMATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES
MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES Separation of variabes is a method to sove certain PDEs which have a warped product structure. First, on R n, a inear PDE of order m is
More informationTraffic data collection
Chapter 32 Traffic data coection 32.1 Overview Unike many other discipines of the engineering, the situations that are interesting to a traffic engineer cannot be reproduced in a aboratory. Even if road
More informationAvailable online at ScienceDirect. Procedia Computer Science 96 (2016 )
Avaiabe onine at www.sciencedirect.com ScienceDirect Procedia Computer Science 96 (206 92 99 20th Internationa Conference on Knowedge Based and Inteigent Information and Engineering Systems Connected categorica
More informationOptimality of Inference in Hierarchical Coding for Distributed Object-Based Representations
Optimaity of Inference in Hierarchica Coding for Distributed Object-Based Representations Simon Brodeur, Jean Rouat NECOTIS, Département génie éectrique et génie informatique, Université de Sherbrooke,
More informationCategories and Subject Descriptors B.7.2 [Integrated Circuits]: Design Aids Verification. General Terms Algorithms
5. oward Eicient arge-scae Perormance odeing o Integrated Circuits via uti-ode/uti-corner Sparse Regression Wangyang Zhang entor Graphics Corporation Ridder Park Drive San Jose, CA 953 wangyan@ece.cmu.edu
More information8 APPENDIX. E[m M] = (n S )(1 exp( exp(s min + c M))) (19) E[m M] n exp(s min + c M) (20) 8.1 EMPIRICAL EVALUATION OF SAMPLING
8 APPENDIX 8.1 EMPIRICAL EVALUATION OF SAMPLING We wish to evauate the empirica accuracy of our samping technique on concrete exampes. We do this in two ways. First, we can sort the eements by probabiity
More informationApproximated MLC shape matrix decomposition with interleaf collision constraint
Approximated MLC shape matrix decomposition with intereaf coision constraint Thomas Kainowski Antje Kiese Abstract Shape matrix decomposition is a subprobem in radiation therapy panning. A given fuence
More informationUniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete
Uniprocessor Feasibiity of Sporadic Tasks with Constrained Deadines is Strongy conp-compete Pontus Ekberg and Wang Yi Uppsaa University, Sweden Emai: {pontus.ekberg yi}@it.uu.se Abstract Deciding the feasibiity
More informationA Separability Index for Distance-based Clustering and Classification Algorithms
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL.?, NO.?, JANUARY 1? 1 A Separabiity Index for Distance-based Custering and Cassification Agorithms Arka P. Ghosh, Ranjan Maitra and Anna
More informationAnother Look at Linear Programming for Feature Selection via Methods of Regularization 1
Another Look at Linear Programming for Feature Seection via Methods of Reguarization Yonggang Yao, The Ohio State University Yoonkyung Lee, The Ohio State University Technica Report No. 800 November, 2007
More information