Factorized Sparse Learning Models with Interpretable High Order Feature Interactions


Sanjay Purushotham (University of Southern California, Los Angeles, CA, USA), Martin Renqiang Min (NEC Labs America, Princeton, NJ, USA), Rachel Ostroff (SomaLogic, Inc., Boulder, CO, USA), C.-C. Jay Kuo (University of Southern California, Los Angeles, CA, USA)

(Sanjay Purushotham and Martin Renqiang Min are co-first authors; Martin Renqiang Min is the corresponding author; data requests should be sent to Rachel Ostroff.)

ABSTRACT

Identifying interpretable discriminative high-order feature interactions given limited training data in high dimensions is challenging in both machine learning and data mining. In this paper, we propose a factorization based sparse learning framework termed FHIM for identifying high-order feature interactions in linear and logistic regression models, and study several optimization methods for solving them. Unlike previous sparse learning methods, our model FHIM recovers both the main effects and the interaction terms accurately without imposing tree-structured hierarchical constraints. Furthermore, we show that FHIM has oracle properties when extended to generalized linear regression models with pairwise interactions. Experiments on simulated data show that FHIM outperforms state-of-the-art sparse learning techniques. Further experiments on our experimentally generated data from patient blood samples using a novel SOMAmer (Slow Off-rate Modified Aptamer) technology show that FHIM performs blood-based cancer diagnosis and biomarker discovery for Renal Cell Carcinoma much better than other competing methods, and that it identifies interpretable block-wise high-order gene interactions predictive of the cancer stages of samples. A literature survey shows that the interactions identified by FHIM play important roles in cancer development.

Categories and Subject Descriptors: J.3 [Computer Applications]: Life and Medical Sciences; I.2.6 [Computing Methodologies]: Artificial Intelligence: sparse learning, feature selection

Keywords: Sparse Learning; High-order Interactions; Biomarker Discovery; Blood-based Cancer Diagnosis

1. INTRODUCTION

Identifying interpretable high-order feature interactions is an important problem in machine learning, data mining, and biomedical informatics, because feature interactions often help reveal hidden domain knowledge and the structure of the problems under consideration. For example, genes and proteins seldom perform their functions independently, so many human diseases are manifested as the dysfunction of some pathways or functional gene modules, and the disrupted patterns due to disease are often more obvious at the pathway or module level. Identifying these disrupted gene interactions for different diseases such as cancer will help us understand the underlying mechanisms of the diseases and develop effective drugs to cure them.
However, identifying reliable discriminative high-order gene/protein or SNP interactions for accurate disease diagnosis, such as early cancer diagnosis directly based on patient blood samples, is still a challenging problem, because we often have very limited patient samples but a huge number of complex feature interactions to consider. In this paper, we propose a sparse learning framework based on weight matrix factorizations and regularizations for identifying discriminative high-order feature interactions in linear and logistic regression models, and we study several optimization methods for solving them. Experimental results on synthetic and real-world datasets show that our method outperforms the state-of-the-art sparse learning techniques, and that it provides interpretable blockwise high-order interactions for disease status prediction. Our proposed sparse learning framework is general, and can be used to identify any discriminative complex system input interactions that are predictive of system outputs given limited high-dimensional training data. Our contributions are as follows: (1) We propose a method capable of simultaneously identifying both informative single

discriminative features and discriminative block-wise high-order interactions in a sparse learning framework, which can be easily extended to handle arbitrarily high-order feature interactions; (2) Our method works on high-dimensional input feature spaces and ill-posed problems with many more features than data points, which is typical for biomedical applications such as biomarker discovery and cancer diagnosis; (3) Our method has interesting theoretical properties for generalized linear regression models; (4) The interactions identified by our method lead to biomedical insight into understanding blood-based cancer diagnosis.

2. RELATED WORK

Variable selection is a well studied topic in the statistics, machine learning, and data mining literature. Generally, variable selection approaches focus on identifying discriminative features using regularization techniques. Most recent methods focus on identifying discriminative features or groups of discriminative features based on the Lasso penalty [18], Group Lasso [21], Trace-norm [6], the Dirty model [8], and Support Vector Machines (SVMs) [16]. A recent approach [20] heuristically adds some possible high-order interactions into the input feature set in a greedy way based on lasso penalized logistic regression. Some recent approaches [2], [3] enforce strong and/or weak heredity constraints to recover the pairwise interactions in linear regression models. In strong heredity, an interaction term can be included in the model only if the corresponding main terms are also included, while in weak heredity, an interaction term is included when either of the main terms is included. However, recent studies in bioinformatics have shown that feature interactions need not follow heredity constraints in the manifestation of diseases, and thus the above approaches [2], [3] have limited chance of recovering the relevant interactions. Kernel methods such as Gaussian Processes [4] and Multiple Kernel Learning [10] can be used to model high-order feature interactions, but they can only tell which orders are important. Thus, all these previous approaches either fail to identify specific high-order interactions for prediction or identify sporadic pairwise interactions in a greedy way, which is very unlikely to recover the interpretable blockwise high-order interactions among features in different sub-components (for example, pathways or gene functional modules) of systems. Recently, [14] proposed an efficient way to identify combinatorial interactions among interactive genes in complex diseases by using overlapping group lasso and screening. However, they use prior information such as gene ontology, which is generally not available or difficult to collect for some machine learning problems. Thus, there is a need to develop new efficient techniques to automatically capture the important blockwise high-order feature interactions in regression models, which is the focus of this paper.

The remainder of the paper is organized as follows: in Section 3 we discuss our problem formulation and the notation used in the paper. In Section 4, we present the main idea of our approach, and in Section 5 we give an overview of the theoretical properties associated with our method. In Section 6, we present the optimization methods used to solve our optimization problem. In Section 7, we discuss our experimental setup and present our results on synthetic and real datasets. Finally, in Section 8 we conclude the paper with discussions and future research directions.
3. PROBLEM FORMULATION

Consider a regression setup with a training set of n samples and p features, \{(X^{(i)}, y^{(i)})\}, where X^{(i)} \in \mathbb{R}^p is the i-th instance (column) of the design matrix X (p \times n), y^{(i)} \in \mathbb{R} is the i-th instance of the response variable y (n \times 1), and i = 1, \dots, n. To model the response in terms of the predictors, we can set up a linear regression model or a logistic regression model:

    y^{(i)} = \beta^T X^{(i)} + \epsilon^{(i)},   (1)

    p(y^{(i)} = 1 \mid X^{(i)}) = \frac{1}{1 + \exp(-\beta^T X^{(i)} - \beta_0)},   (2)

where \beta \in \mathbb{R}^p is the weight vector associated with single features (also called main effects), \epsilon \in \mathbb{R}^n is a noise vector, and \beta_0 \in \mathbb{R} is the bias term. In many practical fields such as bioinformatics and medical informatics, the main terms (the terms only involving single features) are not enough to capture the complex relationship between the response and the predictors, and thus high-order interactions are necessary. In this paper, we consider regression models with both main effects and high-order interaction terms. Equation (3) shows a linear regression model with pairwise interaction terms:

    y^{(i)} = \beta^T X^{(i)} + X^{(i)T} W X^{(i)} + \epsilon^{(i)},   (3)

where W (p \times p) is the weight matrix associated with the pairwise feature interactions. The corresponding loss function (the sum of squared errors) is as follows (we center the data to avoid an additional bias term):

    L_{sqerr}(\beta, W) = \frac{1}{2} \sum_{i=1}^{n} \big\| y^{(i)} - \beta^T X^{(i)} - X^{(i)T} W X^{(i)} \big\|_2^2.   (4)

We can similarly write the logistic regression model with pairwise interactions as

    p(y^{(i)} \mid X^{(i)}) = \frac{1}{1 + \exp\big(-y^{(i)}(\beta^T X^{(i)} + X^{(i)T} W X^{(i)} + \beta_0)\big)},   (5)

and the corresponding loss function (the negative log-likelihood of the training data) is

    L_{logistic}(\beta, W, \beta_0) = \sum_{i=1}^{n} \log\Big(1 + \exp\big(-y^{(i)}(\beta^T X^{(i)} + X^{(i)T} W X^{(i)} + \beta_0)\big)\Big).   (6)
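To make the notation concrete, the following is a minimal numpy sketch of the penalized squared-error objective of Equations (4) and (7), using the factorization W = \sum_k a_k a_k^T introduced in Section 4, so that X^{(i)T} W X^{(i)} = \sum_k (a_k^T X^{(i)})^2. All names are ours, not the authors' code, and rows of X are samples here (the transpose of the p \times n convention above); a single lam_a is used in place of one \lambda_{a_k} per factor, for brevity.

```python
import numpy as np

def fhim_objective(X, y, beta, A, lam_beta, lam_a):
    """Penalized squared-error objective of Eqs. (4) and (7).

    X: (n, p) with samples as rows; y: (n,) responses;
    beta: (p,) main-effect weights; A: (p, K) whose columns
    a_1..a_K factorize W = sum_k a_k a_k^T.
    """
    XA = X @ A                            # entry (i, k) is a_k^T X^(i)
    quad = (XA ** 2).sum(axis=1)          # X^(i)T W X^(i) per sample
    resid = y - X @ beta - quad
    return (0.5 * np.sum(resid ** 2)
            + lam_beta * np.abs(beta).sum()
            + lam_a * np.abs(A).sum())
```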

4. OUR APPROACH

In this section, we propose an optimization-driven sparse learning framework to identify discriminative single features and groups of high-order interactions among input features for output prediction in the setting of limited training data. When the number of input features is huge (e.g., in biomedical applications), it is practically impossible to explicitly consider quadratic or even higher-order interactions among all the input features based on simple lasso penalized linear or logistic regression. To solve this problem, we propose to factorize the weight matrix W associated with high-order interactions between input features as a sum of K rank-one matrices for pairwise interactions, or a sum of low-rank high-order tensors for higher-order interactions. Each rank-one matrix for pairwise feature interactions is represented by the outer product of two identical vectors, and each m-order (m > 2) tensor is represented by the outer product of m identical vectors. Besides minimizing the loss function of linear or logistic regression, we penalize the \ell_1 norm of both the weights associated with single input features and the weights associated with high-order feature interactions. Mathematically, we solve the following optimization problem to identify the discriminative single and pairwise interaction features:

    \{\hat{\beta}, \hat{a}_k\} = \arg\min_{\beta, a_k} \; L_{sqerr}(\beta, W) + \lambda_\beta \|\beta\|_1 + \sum_{k=1}^{K} \lambda_{a_k} \|a_k\|_1,   (7)

where W = \sum_{k=1}^{K} a_k \otimes a_k, \otimes represents the tensor (outer) product, and \hat{\beta}, \hat{a}_k are the estimated parameters of our model. Let Q denote the objective function of (7). For logistic regression, we replace L_{sqerr}(\beta, W) in (7) by L_{logistic}(\beta, W, \beta_0). We call our model the Factorization-based High-order Interaction Model (FHIM).

Proposition 4.1. The optimization problem in Equation (7) is convex in \beta and non-convex in a_k.

Because of the non-convexity of our optimization problem, it is difficult to propose optimization algorithms that guarantee convergence to a global optimum. Here, we adopt a greedy alternating optimization method to find a local optimum for our problem. In the case of pairwise interactions, fixing the other weights, we solve for one rank-one weight matrix at a time. Please note that our symmetric positive semi-definite factorization of W makes this sub-optimization problem very easy. Moreover, for a particular rank-one weight matrix a_k \otimes a_k, the nonzero entries of the corresponding vector a_k can be interpreted as the block-wise interaction feature indices of a densely interacting feature group. In the case of higher-order interactions, the optimization procedure is similar to the one for pairwise interactions, except that we have more rounds of alternating optimization. The parameter K of W is generally unknown in real datasets; thus, we greedily estimate K during the alternating optimization. In fact, the combination of our factorization formulation and the greedy algorithm is effective for estimating the interaction weight matrix W. \beta is re-estimated each time K is greedily incremented during the alternating optimization, as shown in Algorithm 1.

Algorithm 1 Greedy Alternating Optimization
1: Initialize \beta = 0, K = 1 and a_K = 1
2: While (K == 1) or (a_{K-1} \ne 0 for K > 1)
3:   Repeat until convergence
4:     \beta_j^t = \arg\min_{\beta_j} Q((\beta_1^t, \dots, \beta_{j-1}^t, \beta_j, \beta_{j+1}^{t-1}, \dots, \beta_p^{t-1}), a_k^{t-1})
5:     a_{k,j}^t = \arg\min_{a_{k,j}} Q((a_{k,1}^t, \dots, a_{k,j-1}^t, a_{k,j}, a_{k,j+1}^{t-1}, \dots, a_{k,p}^{t-1}), \beta^t)
6:   End Repeat
7:   K = K + 1; a_K = 1
8: End While
9: Remove a_K and a_{K-1} from \alpha.
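A compact sketch of Algorithm 1 under the same conventions as above, using plain proximal-gradient steps in place of the OWD/PSS solvers described in Section 6; the step size, iteration counts and stopping rule are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fhim_fit(X, y, lam_beta, lam_a, max_K=5, n_iter=200, lr=1e-4):
    """Greedy alternating optimization for Eq. (7), sketched."""
    n, p = X.shape
    beta = np.zeros(p)
    A = np.empty((p, 0))                   # columns a_1, ..., a_K
    K = 0
    while K < max_K:
        K += 1
        A = np.hstack([A, np.ones((p, 1))])     # line 7: new factor a_K = 1
        for _ in range(n_iter):                 # lines 3-6: alternate blocks
            r = y - X @ beta - ((X @ A) ** 2).sum(axis=1)
            beta = soft(beta + lr * (X.T @ r), lr * lam_beta)
            for k in range(K):
                r = y - X @ beta - ((X @ A) ** 2).sum(axis=1)
                g = 2.0 * X.T @ (r * (X @ A[:, k]))   # negative of dL/da_k
                A[:, k] = soft(A[:, k] + lr * g, lr * lam_a)
        if not A[:, K - 1].any():          # newest factor shrunk to zero
            A = A[:, :K - 1]               # line 9: drop it and stop
            break
    return beta, A
```

The greedy outer loop mirrors lines 2 and 7-9 of Algorithm 1: a fresh all-ones factor a_K is appended after each inner convergence, and the procedure stops once the newest factor is shrunk entirely to zero.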
5. THEORETICAL PROPERTIES

In this section, we study the asymptotic behavior of FHIM for likelihood-based generalized linear regression models. The lemmas and theorems proved here are similar to the ones shown in [3]. However, in that paper the authors assume strong heredity (i.e., the interaction-term coefficients depend on the main effects), which is not assumed in our model, since we are interested in identifying all high-order interactions irrespective of heredity constraints. Here, we discuss the asymptotic properties with respect to the main effects and the factorized coefficients.

Problem Setup: Assume that the data V_i = (X_i, y_i), i = 1, \dots, n are collected independently and that y_i has a density f(g(X_i), y_i) conditioned on X_i, where g is a known regression function with main effects and all possible pairwise interactions. Let \beta^*_j and a^*_{k,j} denote the underlying true parameters satisfying the block-wise properties implied by our factorization. Let Q_n(\theta) denote the objective with negative log-likelihood and \theta = (\beta^T, \alpha^T)^T, where \alpha = (a_k), k = 1, \dots, K. We consider the FHIM estimates \hat{\theta}_n:

    \hat{\theta}_n = \arg\min_\theta Q_n(\theta) = \arg\min_\theta \frac{1}{n} \sum_{i=1}^{n} L(g(X_i), y_i) + \lambda_\beta \|\beta\|_1 + \sum_k \lambda_{\alpha_k} \|\alpha_k\|_1,   (8)

where L(g(X_i), y_i) is the loss function of generalized linear regression models with pairwise interactions. In the case of linear regression, g(\cdot) takes the form of Equation (3) without the noise term \epsilon, and L(\cdot) takes the form of Equation (4). Now, let us define

    A_1 = \{j : \beta^*_j \ne 0\}, \quad A_2 = \{(k,l) : \alpha^*_{k,l} \ne 0\}, \quad A = A_1 \cup A_2,   (9)

where A_1 contains the indices of the main terms whose true coefficients are nonzero, and similarly A_2 contains the indices of the factorized interaction terms whose true coefficients are nonzero. Let us define

    a_n = \max\{\lambda_{\beta_j}, \lambda_{\alpha_{k,l}} : j \in A_1, (k,l) \in A_2\}, \quad b_n = \min\{\lambda_{\beta_j}, \lambda_{\alpha_{k,l}} : j \in A_1^c, (k,l) \in A_2^c\}.   (10)

Now, we show that our model possesses the oracle properties for (i) n \to \infty with fixed p, and (ii) p_n \to \infty as n \to \infty, under some regularity conditions. Please refer to the Appendix for the proofs of the lemmas and theorems of Sections 5.1 and 5.2.

5.1 Asymptotic Oracle Properties when n \to \infty

The asymptotic properties when the sample size increases and the number of predictors is fixed are described in the following lemmas and theorems. FHIM possesses oracle properties [3] under the regularity conditions (C1)-(C3) shown below. Let \Omega denote the parameter space for \theta.

(C1) The observations V_i, i = 1, \dots, n are independent and identically distributed with a probability density f(V, \theta), which has a common support. We assume the density f satisfies the following equations:

    E_\theta\Big[\frac{\partial \log f(V,\theta)}{\partial \theta_j}\Big] = 0 \quad \text{for } j = 1, \dots, p(K+1),

and

    I_{jk}(\theta) = E_\theta\Big[\frac{\partial \log f(V,\theta)}{\partial \theta_j}\,\frac{\partial \log f(V,\theta)}{\partial \theta_k}\Big] = E_\theta\Big[-\frac{\partial^2 \log f(V,\theta)}{\partial \theta_j \,\partial \theta_k}\Big].

(C2) The Fisher information matrix

    I(\theta) = E\Big[\Big(\frac{\partial \log f(V,\theta)}{\partial \theta}\Big)\Big(\frac{\partial \log f(V,\theta)}{\partial \theta}\Big)^T\Big]

is finite and positive definite at \theta = \theta^*.

(C3) There exists an open set \omega of \Omega that contains the true parameter point \theta^* such that for almost all V the density f(V, \theta) admits all third derivatives \partial^3 f(V,\theta)/(\partial\theta_j \partial\theta_k \partial\theta_l) for all \theta \in \omega and any j, k, l = 1, \dots, p(K+1). Furthermore, there exist functions M_{jkl} such that

    \Big|\frac{\partial^3 \log f(V,\theta)}{\partial \theta_j \,\partial \theta_k \,\partial \theta_l}\Big| \le M_{jkl}(V) \quad \text{for all } \theta \in \omega,

where m_{jkl} = E_{\theta^*}[M_{jkl}(V)] < \infty.

These regularity conditions amount to: the existence of a common support and of first and second derivatives of f(V,\theta); the Fisher information matrix being finite and positive definite; and the existence of a bounded third derivative of f(V,\theta). These conditions guarantee the asymptotic normality of the ordinary maximum likelihood estimates [11].

Lemma 5.1. Assume a_n = o(1) as n \to \infty. Then under the regularity conditions (C1)-(C3), there exists a local minimizer \hat{\theta}_n of Q_n(\theta) such that \|\hat{\theta}_n - \theta^*\| = O_P(n^{-1/2} + a_n).

Theorem 5.2. Assume \sqrt{n}\, b_n \to \infty and that the minimizer \hat{\theta}_n given in Lemma 5.1 satisfies \|\hat{\theta}_n - \theta^*\| = O_P(n^{-1/2}). Then under the regularity conditions (C1)-(C3), we have P(\hat{\beta}_{A_1^c} = 0) \to 1 and P(\hat{\alpha}_{A_2^c} = 0) \to 1.

Lemma 5.1 implies that when the tuning parameters associated with the nonzero coefficients of the main effects and pairwise interactions tend to 0 at a rate faster than n^{-1/2}, there exists a local minimizer of Q_n(\theta) which is \sqrt{n}-consistent (the sampling error is O_p(n^{-1/2})). Theorem 5.2 shows that our model removes noise consistently with probability tending to 1. If \sqrt{n}\, a_n \to 0 and \sqrt{n}\, b_n \to \infty, then Lemma 5.1 and Theorem 5.2 imply that the \sqrt{n}-consistent estimator \hat{\theta}_n satisfies P(\hat{\theta}_{A^c} = 0) \to 1.

Theorem 5.3. Assume \sqrt{n}\, a_n \to 0 and \sqrt{n}\, b_n \to \infty. Then under the regularity conditions (C1)-(C3), the component \hat{\theta}_A of the local minimizer \hat{\theta}_n (given in Lemma 5.1) satisfies \sqrt{n}(\hat{\theta}_A - \theta^*_A) \to_d N(0, I^{-1}(\theta^*_A)), where I(\theta^*_A) is the Fisher information matrix of \theta_A at \theta_A = \theta^*_A assuming that \theta_{A^c} = 0 is known in advance.

Theorem 5.3 shows that our model estimates the nonzero coefficients of the true model with the same asymptotic distribution as if the zero coefficients were known in advance. Based on Theorems 5.2 and 5.3, our model has the oracle property [3], [5] when the tuning parameters satisfy \sqrt{n}\, a_n \to 0 and \sqrt{n}\, b_n \to \infty. To satisfy these conditions, we consider adaptive weights w^\beta_j, w^{\alpha_k}_l [23] for the tuning parameters \lambda_\beta, \lambda_{\alpha_k} (see the Appendix for more details). Thus, our tuning parameters are

    \lambda_{\beta_j} = \frac{\log n}{n} \lambda_\beta w^\beta_j, \qquad \lambda_{\alpha_{k,l}} = \frac{\log n}{n} \lambda_{\alpha_k} w^{\alpha_k}_l.

5.2 Asymptotic Oracle Properties when p_n \to \infty as n \to \infty

In this section, we consider the asymptotic behavior of our model when the number of predictors p_n grows to infinity along with the sample size n. If the regularity conditions (C4)-(C6) below hold, we can show that our model possesses the oracle property. We denote the total number of predictors by p_n and add the subscript n to all quantities that change with the sample size. A_1, A_2 and A are defined as in Section 5.1, and we let s_n = |A_n|. The asymptotic properties of our model when the number of predictors increases along with the sample size are described in the following lemma and theorem. Let \Omega_n denote the parameter space for \theta_n.

(C4) The observations V_{ni}, i = 1, \dots, n are independent and identically distributed with a probability density f_n(V_n, \theta_n), which has a common support.
We assume the density f_n satisfies the following equations:

    E_{\theta_n}\Big[\frac{\partial \log f_n(V_n,\theta_n)}{\partial \theta_{nj}}\Big] = 0 \quad \text{for } j = 1, \dots, p_n,

and

    I_{jk}(\theta_n) = E_{\theta_n}\Big[\frac{\partial \log f_n(V_n,\theta_n)}{\partial \theta_{nj}}\,\frac{\partial \log f_n(V_n,\theta_n)}{\partial \theta_{nk}}\Big] = E_{\theta_n}\Big[-\frac{\partial^2 \log f_n(V_n,\theta_n)}{\partial \theta_{nj}\,\partial \theta_{nk}}\Big].

(C5) I_n(\theta_n) = E\big[\big(\frac{\partial \log f_n(V_n,\theta_n)}{\partial \theta_n}\big)\big(\frac{\partial \log f_n(V_n,\theta_n)}{\partial \theta_n}\big)^T\big] satisfies

    0 < C_1 < \lambda_{\min}\, I_n(\theta_n) \le \lambda_{\max}\, I_n(\theta_n) < C_2 < \infty \quad \text{for all } n,

where \lambda_{\min}(\cdot) and \lambda_{\max}(\cdot) represent the smallest and largest eigenvalues of a matrix, respectively. Moreover, for any j, k = 1, \dots, p_n,

    E_{\theta_n}\Big\{\frac{\partial \log f_n(V_n,\theta_n)}{\partial \theta_{nj}}\,\frac{\partial \log f_n(V_n,\theta_n)}{\partial \theta_{nk}}\Big\}^2 < C_3 < \infty \quad \text{and} \quad E_{\theta_n}\Big\{\frac{\partial^2 \log f_n(V_n,\theta_n)}{\partial \theta_{nj}\,\partial \theta_{nk}}\Big\}^2 < C_4 < \infty.

(C6) There exists a large open set \omega_n \subset \Omega_n \subset \mathbb{R}^{p_n} which contains the true parameters \theta^*_n such that for almost all V_{ni} the density admits all third derivatives \partial^3 f_n(V_{ni},\theta_n)/\partial\theta_{nj}\partial\theta_{nk}\partial\theta_{nl} for all \theta_n \in \omega_n. Furthermore, there are functions M_{njkl} such that

    \Big|\frac{\partial^3 \log f_n(V_{ni},\theta_n)}{\partial \theta_{nj}\,\partial \theta_{nk}\,\partial \theta_{nl}}\Big| \le M_{njkl}(V_{ni}) \quad \text{for all } \theta_n \in \omega_n,

and E_{\theta_n} M^2_{njkl}(V_{ni}) < C_5 < \infty for all p_n, n and j, k, l.

Lemma 5.4. Assume that the density f_n(V_n, \theta_n) satisfies the regularity conditions (C4)-(C6). If \sqrt{n}\, a_n \to 0 and p_n^5/n \to 0 as n \to \infty, then there exists a local minimizer \hat{\theta}_n of Q_n(\theta) such that \|\hat{\theta}_n - \theta^*_n\| = O_P(\sqrt{p_n}(n^{-1/2} + a_n)).

Theorem 5.5. Suppose that the density f_n(V_n, \theta_n) satisfies the regularity conditions (C4)-(C6). If \sqrt{n p_n}\, a_n \to 0, \sqrt{n/p_n}\, b_n \to \infty and p_n^5/n \to 0 as n \to \infty, then with probability tending to 1, the \sqrt{n/p_n}-consistent local minimizer \hat{\theta}_n in Lemma 5.4 satisfies the following:

Sparsity: \hat{\theta}_{nA_n^c} = 0.

Asymptotic normality: \sqrt{n}\, A_n I_n^{1/2}(\theta^*_{nA_n})(\hat{\theta}_{nA_n} - \theta^*_{nA_n}) \to_d N(0, G), where A_n is an arbitrary m \times s_n matrix with finite m such that A_n A_n^T \to G, G is an m \times m nonnegative symmetric matrix, and I_n(\theta^*_{nA_n}) is the Fisher information matrix of \theta_{nA_n} at \theta_{nA_n} = \theta^*_{nA_n}.

Since the dimension of \hat{\theta}_{nA_n} grows with the sample size n, we consider arbitrary linear combinations A_n \hat{\theta}_{nA_n} for the asymptotic normality of our model's estimates. As in Section 5.1, to satisfy the oracle property we consider adaptive weights w^\beta_{nj}, w^{\alpha_k}_{nl} [23] for the tuning parameters \lambda_\beta, \lambda_{\alpha_k}:

    \lambda_{\beta_{nj}} = \frac{\log(n)\, p_n}{n} \lambda_\beta w^\beta_{nj}, \qquad \lambda_{\alpha_{k,nl}} = \frac{\log(n)\, p_n}{n} \lambda_{\alpha_k} w^{\alpha_k}_{nl}.

6. OPTIMIZATION

In this section, we outline three optimization methods that we employ to solve our objective function (7), corresponding to Lines 4 and 5 in Algorithm 1. [15] provides a good survey of several optimization approaches for solving \ell_1-regularized regression problems. In this paper, we use the sub-gradient and coordinate-wise soft-thresholding based optimization methods, since they work well and are easy to implement. We compare these methods in the experimental results in Section 7.

6.1 Sub-Gradient Methods

Sub-gradient based strategies treat the non-differentiable objective as a non-smooth optimization problem and use sub-gradients of the objective function at the non-differentiable points. For our model, the optimality conditions with respect to the parameter vectors \beta and a_k can be written out separately based on the objective function (7). The optimality conditions with respect to a_k are

    \nabla_j L(a_k) + \lambda_{a_k}\,\mathrm{sgn}(a_{kj}) = 0 \quad \text{for } |a_{kj}| > 0; \qquad |\nabla_j L(a_k)| \le \lambda_{a_k} \quad \text{for } a_{kj} = 0,

where L(a_k) is the loss function of our linear or logistic regression model in Equation (7) with respect to a_k. Similar optimality conditions can be written for \beta. The sub-gradient s_j f(a_k) for each a_{kj} is given by

    s_j f(a_k) = \nabla_j L(a_k) + \lambda_{a_k}\,\mathrm{sgn}(a_{kj}), \quad |a_{kj}| > 0,
    s_j f(a_k) = \nabla_j L(a_k) + \lambda_{a_k}, \quad a_{kj} = 0,\; \nabla_j L(a_k) < -\lambda_{a_k},
    s_j f(a_k) = \nabla_j L(a_k) - \lambda_{a_k}, \quad a_{kj} = 0,\; \nabla_j L(a_k) > \lambda_{a_k},
    s_j f(a_k) = 0, \quad -\lambda_{a_k} \le \nabla_j L(a_k) \le \lambda_{a_k},

where, for our linear regression model,

    \nabla_j L(a_k) = -2 \sum_i X^{(i)}_j (X^{(i)T} a_k)\big[y^{(i)} - \beta^T X^{(i)} - X^{(i)T} W X^{(i)}\big].

The negation of the sub-gradient gives the steepest descent direction. Similarly, the sub-gradients s_j f(\beta) for \beta can be calculated; the differential of the linear-regression loss in Equation (7) with respect to \beta is

    \nabla_j L(\beta) = -\sum_i X^{(i)}_j \big[y^{(i)} - \beta^T X^{(i)} - X^{(i)T} W X^{(i)}\big].
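A sketch of the case analysis above for the linear model, computing the sub-gradient s f(a_k) whose negation is the steepest descent direction; the names are ours, lam plays the role of \lambda_{a_k}, and rows of X are samples as in the earlier sketches.

```python
import numpy as np

def subgrad_a(X, y, beta, A, k, lam):
    """Sub-gradient s f(a_k) for the linear model, per the case analysis
    above; it is zero at a_kj = 0 whenever |grad_j L| <= lam."""
    r = y - X @ beta - ((X @ A) ** 2).sum(axis=1)   # residuals
    gradL = -2.0 * X.T @ (r * (X @ A[:, k]))        # nabla_{a_k} L
    a_k = A[:, k]
    s = np.zeros_like(a_k)
    nz = a_k != 0
    s[nz] = gradL[nz] + lam * np.sign(a_k[nz])      # |a_kj| > 0
    lo = (~nz) & (gradL < -lam)                     # a_kj = 0, grad < -lam
    hi = (~nz) & (gradL > lam)                      # a_kj = 0, grad > lam
    s[lo] = gradL[lo] + lam
    s[hi] = gradL[hi] - lam
    return s
```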
6.1.1 Orthant-Wise Descent (OWD)

Andrew and Gao [1] proposed an effective strategy for solving large-scale \ell_1-regularized regression problems, based on choosing an appropriate steepest descent direction for the objective function and taking a step like a Newton iteration in this direction (with an L-BFGS Hessian approximation [12]). The orthant-wise descent for our model takes the form

    \beta \leftarrow P_O\big[\beta - \gamma_\beta P_S[H_\beta^{-1}\, s f(\beta)]\big], \qquad a_k \leftarrow P_O\big[a_k - \gamma_{a_k} P_S[H_{a_k}^{-1}\, s f(a_k)]\big],

where P_O and P_S are two projection operators, H_\beta is the positive definite approximation of the Hessian of the quadratic approximation of the objective f(\beta), and \gamma_\beta, \gamma_{a_k} are step sizes. P_S projects the Newton-like direction to guarantee that it is a descent direction, while P_O projects the step onto the orthant containing \beta or a_k and ensures that the line search does not cross points of non-differentiability.

6.1.2 Projected Scaled Sub-Gradient (PSS)

Schmidt [15] proposed the Projected Scaled Sub-Gradient methods, whose iterations can be written as the projection of a scaling of a sub-gradient of the objective. Please refer to [1] and [15] for more details on the OWD and PSS methods.

6.2 Soft-thresholding

A soft-thresholding based coordinate descent method can be used to compute the \beta and a_k updates in the alternating optimization algorithm for our FHIM model. The \beta updates are

    \tilde{\beta}_j(\lambda_\beta) \leftarrow S\Big(\tilde{\beta}_j(\lambda_\beta) + \sum_{i=1}^{n} X_{ij}\big(y_i - \textstyle\sum_{k \ne j} X_{ik}\tilde{\beta}_k - X_i^T W X_i\big),\; \lambda_\beta\Big),

where W = \sum_k a_k \otimes a_k and S is the soft-thresholding operator [7]. Similarly, the updates for a_{kj} are

    \tilde{a}_{kj}(\lambda_{a_k}) \leftarrow S\Big(\tilde{a}_{kj}(\lambda_{a_k}) + \sum_{i=1}^{n} X_{ij}\big(\textstyle\sum_{r=1}^{p} a_{kr} X_{ir}\big)\big[y_i - \textstyle\sum_k X_{ik}\tilde{\beta}_k - X_i^T W_{\setminus j} X_i\big],\; \lambda_{a_k}\Big),

where W_{\setminus j} is W with the entries of the j-th column and j-th row set to zero.
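For the linear model, a single coordinate-wise soft-thresholding update for \beta_j might look as follows. This is our reading of the update above, not the authors' code, and it assumes standardized columns of X (so the coordinate curvature is approximately n).

```python
import numpy as np

def update_beta_j(X, y, beta, A, j, lam_beta):
    """One coordinate-wise soft-thresholding update for beta_j, with the
    interaction part X_i^T W X_i held fixed (W = sum_k a_k a_k^T)."""
    n = X.shape[0]
    quad = ((X @ A) ** 2).sum(axis=1)                # interaction terms
    r_j = y - quad - X @ beta + X[:, j] * beta[j]    # partial residual w/o beta_j
    z = X[:, j] @ r_j / n                            # least-squares coordinate fit
    return np.sign(z) * max(abs(z) - lam_beta, 0.0)  # soft-threshold S(z, lam)
```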

7. EXPERIMENTS

In this section, we use synthetic and real datasets to demonstrate the efficacy of our model (FHIM), and compare its performance with LASSO [18], All-Pairs Lasso [2], Hierarchical LASSO [2], Group Lasso [21], Trace-norm [9], the Dirty model [8] and QUIRE [14]. For all these models, we perform 5 runs of 5-fold cross-validation on the training dataset (80%) to find the optimal parameters, and evaluate the prediction error on a test dataset (20%). We search the tuning parameters for all methods using grid search; for our model, the parameters \lambda_\beta and \lambda_{a_k} are searched in the range [0.01, 10]. We also discuss the support recovery of \beta and W for our model.

7.1 Datasets

We use synthetic datasets and a real dataset for the classification and support recovery experiments. We give a detailed description of these datasets below.

7.1.1 Synthetic Dataset

We generate the predictors of the design matrix X using a normal distribution with mean zero and variance one. The weight matrix W was generated as a sum of K rank-one matrices, i.e. W = \sum_{k=1}^{K} a_k a_k^T. \beta and a_k were generated as sparse vectors from a normal distribution with mean 0 and variance 1, while the noise vector \epsilon was generated from a normal distribution with mean 0 and variance 0.1. Finally, the response vectors y of the linear and logistic regression models with pairwise interactions were generated using Equations (3) and (5) respectively. We generated several synthetic datasets by varying the number of instances (n), the number of variables/predictors (p), the rank K of W, and the sparsity level of \beta and a_k. We denote the combined total number of predictors (main-effect predictors plus interaction-term predictors) by q; here q = p(p+1)/2. The sparsity level (fraction of non-zeros) was chosen as 2-4% for large p (> 100) and 5-10% for small p (< 100), for both \beta and a_k. In this paper, we show results for the synthetic data in two settings: Case (1) n > p and q > n (high-dimensional with respect to the combined predictors), and Case (2) p > n (high-dimensional with respect to the original predictors).
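A sketch of this generative protocol, under our reading of it (a Bernoulli mask for the sparse supports, and a noise standard deviation of sqrt(0.1) to match a variance of 0.1):

```python
import numpy as np

def make_synthetic(n=1000, p=50, K=1, sparsity=0.05, seed=0):
    """Synthetic data as in Section 7.1.1: X ~ N(0,1), sparse beta and
    a_k ~ N(0,1), W = sum_k a_k a_k^T, noise ~ N(0, 0.1), y from Eq. (3)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    mask = lambda: rng.random(p) < sparsity          # sparse support
    beta = rng.standard_normal(p) * mask()
    A = np.stack([rng.standard_normal(p) * mask() for _ in range(K)], axis=1)
    eps = rng.standard_normal(n) * np.sqrt(0.1)
    y = X @ beta + ((X @ A) ** 2).sum(axis=1) + eps
    return X, y, beta, A
```

For example, make_synthetic(n=1000, p=50, K=1) reproduces the q > n regime of Case (1), since q = 50 * 51 / 2 = 1275 > 1000.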
7.1.2 Real Dataset

To predict cancer progression status directly from blood samples, we generated our own dataset. All samples and clinical information were collected under Health Insurance Portability and Accountability Act compliance from study participants, after obtaining written informed consent under clinical research protocols approved by the institutional review boards of each site. Blood was processed within 2 hours of collection according to established standard operating procedures. To predict RCC status, serum samples were collected at a single study site from patients diagnosed with RCC or a benign renal mass prior to treatment. The definitive pathology diagnosis of RCC and the cancer stage were made after resection. Outcome data was obtained through follow-up from 3 months to 5 years after the initial treatment. Our RCC dataset contains 122 RCC samples from benign and 4 different stages of tumor. Expression levels of 1092 proteins were collected using a high-throughput SOMAmer protein quantification technology. The numbers of Benign, Stage 1, Stage 2, Stage 3 and Stage 4 tumor samples are 40, 10, 17, 24 and 31, respectively.

7.1.3 Experimental Design

We use linear regression models (Equation 3) for all the following experiments, and we use logistic regression models (Equation 5) only for the synthetic data experiments shown in Table 2. We evaluate the performance of our method (FHIM) with the following experiments:

1. Prediction error and support recovery experiments on the synthetic datasets.
2. Classification experiments using the RCC samples. We perform three stage-wise binary classification experiments (evaluated with the cross-validation protocol sketched below):
(a) Case 1: classification of Benign samples from Stage 4 samples.
(b) Case 2: classification of Benign and Stage 1 samples from Stage 2-4 samples.
(c) Case 3: classification of Benign and Stage 1-2 samples from Stage 3-4 samples.
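A sketch of this evaluation protocol (five runs of stratified 5-fold cross-validation with the average AUC reported), using scikit-learn for the folds and the AUC; fit and predict are placeholders for any of the compared models, not the authors' pipeline.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

def cv_auc(X, y, fit, predict, n_runs=5, n_splits=5, seed=0):
    """Average AUC over repeated stratified k-fold cross-validation."""
    aucs = []
    for run in range(n_runs):
        skf = StratifiedKFold(n_splits=n_splits, shuffle=True,
                              random_state=seed + run)
        for tr, te in skf.split(X, y):
            model = fit(X[tr], y[tr])                 # train on 4 folds
            aucs.append(roc_auc_score(y[te], predict(model, X[te])))
    return float(np.mean(aucs))
```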

7.1.4 Performance on the Synthetic Datasets

We evaluate the performance of our model (FHIM) on the synthetic datasets with the following experiments: (i) comparison of the optimization methods presented in Section 6; (ii) prediction error on the test data for q > n and p > n (high-dimensional settings); (iii) support recovery accuracy of \beta and W; and (iv) prediction of the rank of W using the greedy approach.

Table 3 shows the prediction error on test data when the different optimization methods (discussed in Section 6) are used for our model (FHIM). From Table 3, we see that the OWD and PSS methods perform nearly identically (OWD is marginally better), and both are better than the soft-thresholding method. This is because, in soft-thresholding, the coordinate updates of the variables might not be accurate in high-dimensional settings (i.e., the solution is affected by the path taken during the updates). We observed that soft-thresholding is in general slower than the OWD and PSS methods. For all the other experiments discussed in this paper, we choose OWD as the optimization method for FHIM.

[Figure 1: Support recovery of \beta (90% sparse) and W (99% sparse) for synthetic data Case 1: n > p and q > n, where n = 1000, p = 50, q = 1275.]

[Figure 2: Support recovery of W (99.5% sparse) for synthetic data Case 2: p > n, where p = 500, n = 100. The online supplementary materials contain high-quality images for this figure.]

[Table 1: Performance comparison for synthetic data on the linear regression model with high-order interactions, comparing FHIM with Fused Lasso, Lasso, HLasso, Trace-norm and the Dirty model across (n, p, K) settings in both the q > n and p > n regimes. Prediction error (MSE) on test data, with the standard deviation of the MSE in brackets, is used to measure each model's performance. For p >= 500, Hierarchical Lasso (HLasso) has heavy computational complexity, hence we do not show its results.]

[Table 2: Performance comparison for the synthetic datasets on the logistic regression model with high-order interactions, for the same methods and settings; misclassification error on test data is used to measure each model's performance.]

Tables 1 and 2 show the performance comparison (in terms of prediction error on the test dataset) of FHIM for linear and logistic regression models against state-of-the-art approaches such as Lasso, Fused Lasso, Trace-norm and Hierarchical Lasso (HLasso is a general version of SHIM [3]). From Tables 1 and 2, we see that FHIM generally outperforms all the state-of-the-art approaches for both linear and logistic pairwise regression models. For q > n, the test-data prediction error for FHIM is consistently lower compared to all other approaches. For p > n, FHIM performs slightly better than the other approaches; however, the prediction error of all approaches is high, since it is hard to accurately recover the coefficients of the main effects and pairwise interactions in very high-dimensional settings.

From Figure 1 and Table 4, we see that our model performs very well (F1 score close to 1) in the support recovery of \beta and W for the q > n setting. From Figure 2 and Table 5, we see that our model performs fairly well in the support recovery of W in the p > n setting. We observe that when the tuning parameters are correctly chosen, support recovery of W works very well when W is low-rank (see Tables 4 and 5), and that the F1 score for the support recovery of W decreases as the rank of W increases. Table 5 shows that for q > n the greedy strategy of FHIM accurately recovers the rank K of W, while for p > n the greedy strategy might not correctly recover K. This is because the tensor factorization is not unique, and slightly correlated variables can enter our model during the optimization.

7.1.5 Classification Performance on RCC

In this section, we report systematic experimental results on the classification of samples from different stages of RCC. The predictive performance of the markers and pairwise interactions selected by our model (FHIM) is compared against the markers selected by Lasso, All-Pairs Lasso [2], Group Lasso, the Dirty model [8] and QUIRE [14]. We use the SLEP [13], MALSAR [22] and QUIRE packages for the implementations of these models. The overall performance of the algorithms is shown in Figure 3, where we report the average AUC score over five runs of 5-fold cross-validation experiments for cancer stage prediction in RCC. In the 5-fold cross-validation experiments, we train our model on four folds to identify the main effects and pairwise interactions, and we use the remaining fold to test the prediction. The average AUCs achieved by the features selected with our model are 0.68, 0.89 and 0.84, respectively, for the three cases discussed in Section 7.1.3. We performed pairwise t-tests for the comparisons of our method vs. the other methods, and the p-values indicate that the improvements are statistically significant. From Figure 3, it is clear that our model outperforms all the other algorithms that do not use prior feature-group information, for all three classification cases of RCC prediction. In addition, our model has performance similar to the state-of-the-art technique QUIRE [14], which uses Gene Ontology based functional annotation for the grouping and clustering of genes to identify high-order interactions.

[Figure 3: Comparison of the classification performance of different feature selection approaches (APLasso, Lasso, Trace-norm, Group Lasso, Dirty model, QUIRE and FHIM) in identifying the different stages of RCC, for the three cases Benign vs. Stage 4; Benign, Stage 1 vs. Stage 2-4; and Benign, Stage 1-2 vs. Stage 3-4. We perform five-fold cross-validation five times and the average AUC score is reported.]
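Given a fitted factor matrix, block-wise interactions such as those discussed in the next subsection can be read directly off W = \sum_k a_k a_k^T. A small illustrative helper, not from the paper:

```python
import numpy as np

def top_interactions(A, feature_names, top=10):
    """Rank feature pairs by |W_ij|, where W = A @ A.T = sum_k a_k a_k^T,
    to surface the highest-weighted pairwise interactions."""
    W = A @ A.T
    iu = np.triu_indices_from(W, k=1)               # off-diagonal pairs
    order = np.argsort(-np.abs(W[iu]))[:top]
    return [(feature_names[iu[0][m]], feature_names[iu[1][m]], W[iu][m])
            for m in order]
```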
In addition, our mode has simiar performance to the state-of-the-art technique - QUIRE [4, which uses Gene Ontoogy based functiona annotation for grouping and custering of genes to identify high order interactions Benign vs. Stage-4 Benign, Stage vs. Stage 2-4 Benign, Stage -2 vs. Stage 3-4 APLasso Lasso Trace-norm Group Lasso Dirty Mode QUIRE FHIM (Ours) IL5RA CHEK2 ACE2 PES AIP CXC3L CD97 IL6ST Figure 3: Comparison of the cassification performances of different feature seection approaches with our mode in identifying the different stages of RCC. We perform five fod cross vaidation five times and average AUC score is reported Cassification Performance on RCC In this section, we report systematic experimenta resuts on cassification of sampes from different stages of RCC. The predictive performance of the markers and pairwise interactions seected by our mode (FHIM) is compared against Figure 4: Exampes of functiona modues for RCC Case 3, induced by markers and interactions discovered by our mode and enriched in pathways and functions associated with RCC 7..6 Informative interactions discovered by FHIM An investigation of the pairwise interactions identified by our mode on RCC dataset reveas that many of these interactions are indeed reevant to the prediction of cancer. Figure 4 shows some of the interactions associated with higher

7.1.6 Informative Interactions Discovered by FHIM

An investigation of the pairwise interactions identified by our model on the RCC dataset reveals that many of these interactions are indeed relevant to the prediction of cancer. Figure 4 shows some of the interactions associated with the highest-weighted pairwise coefficients for Case 3 of the RCC classification experiment. The interactions include CX3CL1-CD97 and CHEK2-IL5RA, which are known to be related to proteins in blood. CX3CL1 was recently found to promote breast cancer [17], while CD97 was found to promote colorectal cancer [19]. We believe these protein interactions might lead to renal cell cancer. Further investigation of the interactions identified by our model might reveal novel protein interactions associated with renal cell cancer, leading to testable hypotheses.

[Figure 4: Examples of functional modules for RCC Case 3 (involving proteins such as IL5RA, CHEK2, ACE2, PES1, AIP, CX3CL1, CD97 and IL6ST), induced by markers and interactions discovered by our model and enriched in pathways and functions associated with RCC.]

7.1.7 Time Complexity

FHIM has O(np) time complexity for Algorithm 1. In general, FHIM takes more time than the Lasso approach, since we perform alternating optimization of \beta and a_k. For the q > n setting with n = 1000 and q = 1275, our OWD-based optimization in Matlab takes around 1 minute for 5-fold cross-validation, while for p > n with p = 2000 and n = 100, our FHIM model took around 2 hours for 5-fold cross-validation. Our experiments were run on an Intel i3 dual-core 2.9 GHz CPU with 8 GB RAM.

[Table 3: Comparison of the optimization methods (OWD, soft-thresholding, PSS) for our FHIM model based on test-data prediction error over three (n, p) settings.]

[Table 4: Support recovery (F1 score) of \beta and W for varying (n, p), sparsity levels and K; recovery of both \beta and W is nearly perfect for q > n with K = 1, and the F1 score for W decreases as K grows.]

[Table 5: Recovering K using the greedy strategy; the estimated K matches the true K for q > n, while for p > n the greedy strategy may not recover K correctly.]

8. CONCLUSIONS

In this paper, we proposed a factorization based sparse learning framework called FHIM for identifying high-order feature interactions in linear and logistic regression models, and studied several optimization methods for our model. Empirical experiments on synthetic and real datasets showed that our model outperforms several well-known techniques such as Lasso, Trace-norm and Group Lasso, and achieves performance comparable to the current state-of-the-art method QUIRE, while not assuming any prior knowledge about the data. Our model gives interpretable results for high-order feature interactions on the RCC dataset, which can be used for biomarker discovery for disease diagnosis. In the future, we will consider the following directions: (i) factorization of the weight matrix W as W = \sum_k a_k b_k^T and higher-order feature interactions, which is more general but makes the optimization problem non-convex; (ii) extending our optimization methods from single-task learning to multi-task learning; (iii) groupings of features for both single-task and multi-task learning.

References

[1] G. Andrew and J. Gao. Scalable training of l1-regularized log-linear models. In Proceedings of the 24th International Conference on Machine Learning. ACM, 2007.
[2] J. Bien, J. Taylor, and R. Tibshirani. A lasso for hierarchical interactions. The Annals of Statistics, 41(3):1111-1141, 2013.
[3] N. H. Choi, W. Li, and J. Zhu. Variable selection with the strong heredity constraint and its oracle property. Journal of the American Statistical Association, 105(489), 2010.
[4] D. K. Duvenaud, H. Nickisch, and C. E. Rasmussen. Additive Gaussian processes. In NIPS, 2011.
[5] J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456):1348-1360, 2001.
[6] R. Foygel, N. Srebro, and R. Salakhutdinov. Matrix reconstruction with the local max norm. arXiv preprint arXiv:1210.5196, 2012.
[7] J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani. Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2):302-332, 2007.
[8] A. Jalali, P. Ravikumar, and S. Sanghavi. A dirty model for multiple sparse regression. arXiv preprint, 2011.
[9] S. Ji and J. Ye. An accelerated gradient method for trace norm minimization. In Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009.
[10] G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. E. Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. J. Mach. Learn. Res., 5:27-72, Dec. 2004.
[11] E. L. Lehmann and G. Casella. Theory of Point Estimation. Springer, 1998.
[12] D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1-3):503-528, 1989.
[13] J. Liu, S. Ji, and J. Ye. SLEP: Sparse Learning with Efficient Projections. Arizona State University, 2009.
[14] R. Min, S. Chowdhury, Y. Qi, A. Stewart, and R. Ostroff. An integrated approach to blood-based cancer diagnosis and biomarker discovery. In Proceedings of the Pacific Symposium on Biocomputing, 2014.

[15] M. Schmidt. Graphical model structure learning with l1-regularization. PhD thesis, University of British Columbia, 2010.
[16] J. A. Suykens and J. Vandewalle. Least squares support vector machine classifiers. Neural Processing Letters, 9(3):293-300, 1999.
[17] M. Tardaguila, E. Mira, M. A. García-Cabezas, A. M. Feijoo, M. Quintela-Fandino, I. Azcoitia, S. A. Lira, and S. Manes. CX3CL1 promotes breast cancer via transactivation of the EGF pathway. Cancer Research, 2013.
[18] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), 58(1):267-288, 1996.
[19] M. Wobus, O. Huber, J. Hamann, and G. Aust. CD97 overexpression in tumor cells at the invasion front in colorectal cancer (CC) is independently regulated of the canonical Wnt pathway. Molecular Carcinogenesis, 45(11):881-886, 2006.
[20] T. T. Wu, Y. F. Chen, T. Hastie, E. Sobel, and K. Lange. Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics, 25(6):714-721, 2009.
[21] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):49-67, 2006.
[22] J. Zhou, J. Chen, and J. Ye. MALSAR: Multi-tAsk Learning via StructurAl Regularization. Arizona State University, 2011.
[23] H. Zou. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476):1418-1429, 2006.

APPENDIX

A. PROOFS FOR SECTION 5

Proof of Lemma 5.1. Let \eta_n = n^{-1/2} + a_n and let \{\theta^* + \eta_n\delta : \|\delta\| \le d\} be the ball around \theta^*, where \delta = (u_1, \dots, u_p, v_1, \dots, v_{Kp})^T = (u^T, v^T)^T. Define

    D_n(\delta) \equiv Q_n(\theta^* + \eta_n\delta) - Q_n(\theta^*),

where Q_n(\theta) is defined in Equation (8). For \delta satisfying \|\delta\| = d, we have

    D_n(\delta) = L_n(\theta^* + \eta_n\delta) - L_n(\theta^*) + n\sum_j \lambda_{\beta_j}(|\beta^*_j + \eta_n u_j| - |\beta^*_j|) + n\sum_{k,l} \lambda_{\alpha_k}(|\alpha^*_{k,l} + \eta_n v_{k,l}| - |\alpha^*_{k,l}|)
    \ge L_n(\theta^* + \eta_n\delta) - L_n(\theta^*) + n\sum_{j \in A_1} \lambda_{\beta_j}(|\beta^*_j + \eta_n u_j| - |\beta^*_j|) + n\sum_{(k,l) \in A_2} \lambda_{\alpha_k}(|\alpha^*_{k,l} + \eta_n v_{k,l}| - |\alpha^*_{k,l}|)
    \ge L_n(\theta^* + \eta_n\delta) - L_n(\theta^*) - n\eta_n \sum_{j \in A_1} \lambda_{\beta_j}|u_j| - n\eta_n \sum_{(k,l) \in A_2} \lambda_{\alpha_k}|v_{k,l}|
    \ge L_n(\theta^* + \eta_n\delta) - L_n(\theta^*) - n\eta_n^2\Big(\sum_{j \in A_1}|u_j| + \sum_{(k,l) \in A_2}|v_{k,l}|\Big)
    \ge L_n(\theta^* + \eta_n\delta) - L_n(\theta^*) - n\eta_n^2\big(\sqrt{|A_1|} + \sqrt{|A_2|}\big)d,

where the second-to-last step uses a_n \le \eta_n and the last uses the Cauchy-Schwarz inequality. Using Taylor's expansion in the first two terms,

    D_n(\delta) \ge [\nabla L_n(\theta^*)]^T(\eta_n\delta) + \tfrac{1}{2}(\eta_n\delta)^T[\nabla^2 L_n(\theta^*)](\eta_n\delta)(1 + o_p(1)) - n\eta_n^2(\sqrt{|A_1|} + \sqrt{|A_2|})d.

We split this bound into three parts:

    K_1 = \eta_n[\nabla L_n(\theta^*)]^T\delta = n\eta_n(n^{-1}\nabla L_n(\theta^*))^T\delta = O_p(n\eta_n^2)\,\|\delta\|,
    K_2 = \tfrac{1}{2} n\eta_n^2\{\delta^T[n^{-1}\nabla^2 L_n(\theta^*)]\delta(1 + o_p(1))\} = \tfrac{1}{2} n\eta_n^2\{\delta^T I(\theta^*)\delta(1 + o_p(1))\},
    K_3 = -n\eta_n^2(\sqrt{|A_1|} + \sqrt{|A_2|})d,

so that D_n(\delta) \ge K_1 + K_2 + K_3. We see that K_2 dominates the other terms for sufficiently large d and is positive, since I(\theta) is positive definite at \theta = \theta^* by regularity condition (C2). Therefore, for any given \epsilon > 0 there exists a large enough constant d such that

    P\Big\{\inf_{\|\delta\| = d} Q_n(\theta^* + \eta_n\delta) > Q_n(\theta^*)\Big\} \ge 1 - \epsilon.

This implies that with probability at least 1 - \epsilon there exists a local minimizer in the ball \{\theta^* + \eta_n\delta : \|\delta\| \le d\}. Thus, there exists a local minimizer of Q_n(\theta) such that \|\hat{\theta}_n - \theta^*\| = O_p(\eta_n).

Proof of Theorem 5.2. Let us first consider P(\hat{\alpha}_{A_2^c} = 0) \to 1. It is sufficient to show that, for any (k,l) \in A_2^c,

    \frac{\partial Q_n(\hat{\theta}_n)}{\partial \alpha_{k,l}} < 0 \quad \text{for } -\epsilon_n < \hat{\alpha}_{k,l} < 0,   (11)

    \frac{\partial Q_n(\hat{\theta}_n)}{\partial \alpha_{k,l}} > 0 \quad \text{for } 0 < \hat{\alpha}_{k,l} < \epsilon_n,   (12)

with probability tending to 1, where \epsilon_n = C n^{-1/2} and C > 0 is any constant.

To show (12), notice that

    \frac{\partial Q_n(\hat{\theta}_n)}{\partial \alpha_{k,l}} = \frac{\partial L_n(\hat{\theta}_n)}{\partial \alpha_{k,l}} + n\lambda_{\alpha_k}\,\mathrm{sgn}(\hat{\alpha}_{k,l})
    = \frac{\partial L_n(\theta^*)}{\partial \alpha_{k,l}} + \sum_{j=1}^{p(K+1)} \frac{\partial^2 L_n(\theta^*)}{\partial \alpha_{k,l}\,\partial\theta_j}(\hat{\theta}_j - \theta^*_j) + \sum_{j=1}^{p(K+1)}\sum_{m=1}^{p(K+1)} \frac{\partial^3 L_n(\tilde{\theta})}{\partial \alpha_{k,l}\,\partial\theta_j\,\partial\theta_m}(\hat{\theta}_j - \theta^*_j)(\hat{\theta}_m - \theta^*_m) + n\lambda_{\alpha_k}\,\mathrm{sgn}(\hat{\alpha}_{k,l}),

where \tilde{\theta} lies between \hat{\theta}_n and \theta^*. By the regularity conditions (C1)-(C3) and Lemma 5.1, we have

    \frac{\partial Q_n(\hat{\theta}_n)}{\partial \alpha_{k,l}} = \sqrt{n}\,\{O_p(1) + \sqrt{n}\,\lambda_{\alpha_k}\,\mathrm{sgn}(\hat{\alpha}_{k,l})\}.

As \sqrt{n}\,\lambda_{\alpha_k} \to \infty for (k,l) \in A_2^c by assumption, the sign of \partial Q_n(\hat{\theta}_n)/\partial\alpha_{k,l} is dominated by \mathrm{sgn}(\hat{\alpha}_{k,l}). Thus

    P\Big(\frac{\partial Q_n(\hat{\theta}_n)}{\partial \alpha_{k,l}} > 0 \text{ for } 0 < \hat{\alpha}_{k,l} < \epsilon_n\Big) \to 1 \quad \text{as } n \to \infty.

(11) has an identical proof. Also, P(\hat{\beta}_{A_1^c} = 0) \to 1 can be proved similarly, since in our model \beta and \alpha are penalized independently of each other.

Proof of Theorem 5.3. Let Q_n(\theta_A) denote the objective function Q_n restricted to the A-components of \theta, that is, Q_n(\theta) with \theta_{A^c} = 0. Based on Lemma 5.1 and Theorem 5.2, we have P(\hat{\theta}_{A^c} = 0) \to 1. Thus

    P\big(\arg\min Q_n(\theta_A) = (A\text{-component of } \arg\min_\theta Q_n(\theta))\big) \to 1.

Hence \hat{\theta}_A must satisfy

    \frac{\partial Q_n(\theta_A)}{\partial \theta_j}\Big|_{\theta_A = \hat{\theta}_A} = 0 \quad \text{for all } j \in A   (13)

with probability tending to 1. Let L_n(\theta_A) and P_\lambda(\theta_A) denote the log-likelihood function of \theta_A and the penalty function of \theta_A, respectively, so that Q_n(\theta_A) = L_n(\theta_A) + n P_\lambda(\theta_A). From (13), we have

    \nabla_A Q_n(\hat{\theta}_A) = \nabla_A L_n(\hat{\theta}_A) + n\nabla_A P_\lambda(\hat{\theta}_A) = 0   (14)

with probability tending to 1. Now, by Taylor expansion of the first and second terms at \theta_A = \theta^*_A, we get the following:

    \nabla_A L_n(\hat{\theta}_A) = \nabla_A L_n(\theta^*_A) + [\nabla^2_A L_n(\theta^*_A) + o_p(1)](\hat{\theta}_A - \theta^*_A) = \sqrt{n}\Big[\frac{1}{\sqrt{n}}\nabla_A L_n(\theta^*_A) + I(\theta^*_A)\sqrt{n}(\hat{\theta}_A - \theta^*_A) + o_p(1)\Big],

    n\nabla_A P_\lambda(\hat{\theta}_A) = n\Big\{\big(\lambda_{\beta_j}\,\mathrm{sgn}(\beta^*_j),\; \lambda_{\alpha_k}\,\mathrm{sgn}(\alpha^*_{k,l})\big)_{j \in A_1, (k,l) \in A_2} + o_p(1)(\hat{\theta}_A - \theta^*_A)\Big\} = \sqrt{n}\, o_p(1),

where the last equality holds since \sqrt{n}\, a_n = o(1) and \hat{\theta}_A - \theta^*_A = O_p(n^{-1/2}). Thus, we get

    0 = \sqrt{n}\Big[\frac{1}{\sqrt{n}}\nabla_A L_n(\theta^*_A) + I(\theta^*_A)\sqrt{n}(\hat{\theta}_A - \theta^*_A) + o_p(1)\Big].

Therefore, by the central limit theorem, \sqrt{n}(\hat{\theta}_A - \theta^*_A) \to_d N(0, I^{-1}(\theta^*_A)). The proofs of the lemma and theorem of Section 5.2 follow along the same lines as above.

B. COMPUTING ADAPTIVE WEIGHTS

Here, we explain how the adaptive weights w^\beta_j, w^{\alpha_k}_l can be calculated for the tuning parameters \lambda_\beta, \lambda_{\alpha_k} used in the theoretical properties (Theorems 5.3 and 5.5) of Section 5. Let q be the total number of predictors and n the total number of instances. When n > q, we can compute the adaptive weights for the tuning parameters \lambda_{\beta_j}, \lambda_{\alpha_{k,l}} using the ordinary least squares (OLS) estimates of the training observations:

    \lambda_{\beta_j} = \frac{\log n}{n}\lambda_\beta w^\beta_j, \qquad \lambda_{\alpha_{k,l}} = \frac{\log n}{n}\lambda_{\alpha_k} w^{\alpha_k}_l,

where

    w^\beta_j = \frac{1}{|\hat{\beta}^{OLS}_j|}, \qquad w^{\alpha_k}_l = \frac{1}{|\hat{\alpha}^{OLS}_{k,l}|}.

When q > n, the OLS estimates are not available, so we compute the weights using ridge regression estimates, that is, replacing all the above OLS estimates with the ridge regression estimates. The tuning parameter for ridge regression can be selected using cross-validation. Note that we find \hat{\alpha}^{OLS}_{k,l} by taking least squares with respect to each \alpha_k, where k ranges over some number of factors at least as large as the true K. Without loss of generality, we can assume this number equals the true K for proving the theoretical properties in Section 5. Even if it differs from the true K, the theoretical properties are unaffected, since the cardinality of A_2 (|A_2|) does not affect the root-n consistency (see the proof of Lemma 5.1). In practice, K is chosen greedily by Algorithm 1 in our paper.

C. SUPPLEMENTARY MATERIALS

For interested readers, we provide online supplementary materials (PDF) with detailed proofs of Lemma 5.4 and Theorem 5.5. These proofs do not affect the understanding of this paper. We also provide high-quality images for the support-recovery figures in the supplementary materials, along with more details about the experimental settings of the state-of-the-art techniques used in this paper.
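To make Appendix B concrete, the following is a minimal sketch of the ridge-based adaptive weights for the q > n case (the n > q case would replace the ridge solve with an ordinary least-squares solve); alpha is a hypothetical ridge parameter to be chosen by cross-validation, as the text suggests.

```python
import numpy as np

def adaptive_weights_ridge(Z, y, alpha=1.0):
    """Adaptive weights w_j = 1 / |theta_hat_j| from Appendix B, using
    ridge estimates on the design Z (main-effect and/or interaction
    predictors) when OLS is unavailable (q > n)."""
    n, q = Z.shape
    theta = np.linalg.solve(Z.T @ Z + alpha * np.eye(q), Z.T @ y)
    return 1.0 / np.maximum(np.abs(theta), 1e-12)   # guard against 0-division
```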


More information

Distributed average consensus: Beyond the realm of linearity

Distributed average consensus: Beyond the realm of linearity Distributed average consensus: Beyond the ream of inearity Usman A. Khan, Soummya Kar, and José M. F. Moura Department of Eectrica and Computer Engineering Carnegie Meon University 5 Forbes Ave, Pittsburgh,

More information

arxiv: v1 [cs.lg] 31 Oct 2017

arxiv: v1 [cs.lg] 31 Oct 2017 ACCELERATED SPARSE SUBSPACE CLUSTERING Abofaz Hashemi and Haris Vikao Department of Eectrica and Computer Engineering, University of Texas at Austin, Austin, TX, USA arxiv:7.26v [cs.lg] 3 Oct 27 ABSTRACT

More information

Primal and dual active-set methods for convex quadratic programming

Primal and dual active-set methods for convex quadratic programming Math. Program., Ser. A 216) 159:469 58 DOI 1.17/s117-15-966-2 FULL LENGTH PAPER Prima and dua active-set methods for convex quadratic programming Anders Forsgren 1 Phiip E. Gi 2 Eizabeth Wong 2 Received:

More information

Data Mining Technology for Failure Prognostic of Avionics

Data Mining Technology for Failure Prognostic of Avionics IEEE Transactions on Aerospace and Eectronic Systems. Voume 38, #, pp.388-403, 00. Data Mining Technoogy for Faiure Prognostic of Avionics V.A. Skormin, Binghamton University, Binghamton, NY, 1390, USA

More information

arxiv: v2 [cs.lg] 4 Sep 2014

arxiv: v2 [cs.lg] 4 Sep 2014 Cassification with Sparse Overapping Groups Nikhi S. Rao Robert D. Nowak Department of Eectrica and Computer Engineering University of Wisconsin-Madison nrao2@wisc.edu nowak@ece.wisc.edu ariv:1402.4512v2

More information

Unconditional security of differential phase shift quantum key distribution

Unconditional security of differential phase shift quantum key distribution Unconditiona security of differentia phase shift quantum key distribution Kai Wen, Yoshihisa Yamamoto Ginzton Lab and Dept of Eectrica Engineering Stanford University Basic idea of DPS-QKD Protoco. Aice

More information

Two view learning: SVM-2K, Theory and Practice

Two view learning: SVM-2K, Theory and Practice Two view earning: SVM-2K, Theory and Practice Jason D.R. Farquhar jdrf99r@ecs.soton.ac.uk Hongying Meng hongying@cs.york.ac.uk David R. Hardoon drh@ecs.soton.ac.uk John Shawe-Tayor jst@ecs.soton.ac.uk

More information

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel Sequentia Decoding of Poar Codes with Arbitrary Binary Kerne Vera Miosavskaya, Peter Trifonov Saint-Petersburg State Poytechnic University Emai: veram,petert}@dcn.icc.spbstu.ru Abstract The probem of efficient

More information

CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION

CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION SAHAR KARIMI AND STEPHEN VAVASIS Abstract. In this paper we present a variant of the conjugate gradient (CG) agorithm in which we invoke a subspace minimization

More information

Stochastic Automata Networks (SAN) - Modelling. and Evaluation. Paulo Fernandes 1. Brigitte Plateau 2. May 29, 1997

Stochastic Automata Networks (SAN) - Modelling. and Evaluation. Paulo Fernandes 1. Brigitte Plateau 2. May 29, 1997 Stochastic utomata etworks (S) - Modeing and Evauation Pauo Fernandes rigitte Pateau 2 May 29, 997 Institut ationa Poytechnique de Grenobe { IPG Ecoe ationae Superieure d'informatique et de Mathematiques

More information

Learning Fully Observed Undirected Graphical Models

Learning Fully Observed Undirected Graphical Models Learning Fuy Observed Undirected Graphica Modes Sides Credit: Matt Gormey (2016) Kayhan Batmangheich 1 Machine Learning The data inspires the structures we want to predict Inference finds {best structure,

More information

A Comparison Study of the Test for Right Censored and Grouped Data

A Comparison Study of the Test for Right Censored and Grouped Data Communications for Statistica Appications and Methods 2015, Vo. 22, No. 4, 313 320 DOI: http://dx.doi.org/10.5351/csam.2015.22.4.313 Print ISSN 2287-7843 / Onine ISSN 2383-4757 A Comparison Study of the

More information

Chemical Kinetics Part 2

Chemical Kinetics Part 2 Integrated Rate Laws Chemica Kinetics Part 2 The rate aw we have discussed thus far is the differentia rate aw. Let us consider the very simpe reaction: a A à products The differentia rate reates the rate

More information

Ant Colony Algorithms for Constructing Bayesian Multi-net Classifiers

Ant Colony Algorithms for Constructing Bayesian Multi-net Classifiers Ant Coony Agorithms for Constructing Bayesian Muti-net Cassifiers Khaid M. Saama and Aex A. Freitas Schoo of Computing, University of Kent, Canterbury, UK. {kms39,a.a.freitas}@kent.ac.uk December 5, 2013

More information

MARKOV CHAINS AND MARKOV DECISION THEORY. Contents

MARKOV CHAINS AND MARKOV DECISION THEORY. Contents MARKOV CHAINS AND MARKOV DECISION THEORY ARINDRIMA DATTA Abstract. In this paper, we begin with a forma introduction to probabiity and expain the concept of random variabes and stochastic processes. After

More information

Alberto Maydeu Olivares Instituto de Empresa Marketing Dept. C/Maria de Molina Madrid Spain

Alberto Maydeu Olivares Instituto de Empresa Marketing Dept. C/Maria de Molina Madrid Spain CORRECTIONS TO CLASSICAL PROCEDURES FOR ESTIMATING THURSTONE S CASE V MODEL FOR RANKING DATA Aberto Maydeu Oivares Instituto de Empresa Marketing Dept. C/Maria de Moina -5 28006 Madrid Spain Aberto.Maydeu@ie.edu

More information

Lecture Note 3: Stationary Iterative Methods

Lecture Note 3: Stationary Iterative Methods MATH 5330: Computationa Methods of Linear Agebra Lecture Note 3: Stationary Iterative Methods Xianyi Zeng Department of Mathematica Sciences, UTEP Stationary Iterative Methods The Gaussian eimination (or

More information

Scalable Spectrum Allocation for Large Networks Based on Sparse Optimization

Scalable Spectrum Allocation for Large Networks Based on Sparse Optimization Scaabe Spectrum ocation for Large Networks ased on Sparse Optimization innan Zhuang Modem R&D Lab Samsung Semiconductor, Inc. San Diego, C Dongning Guo, Ermin Wei, and Michae L. Honig Department of Eectrica

More information

First-Order Corrections to Gutzwiller s Trace Formula for Systems with Discrete Symmetries

First-Order Corrections to Gutzwiller s Trace Formula for Systems with Discrete Symmetries c 26 Noninear Phenomena in Compex Systems First-Order Corrections to Gutzwier s Trace Formua for Systems with Discrete Symmetries Hoger Cartarius, Jörg Main, and Günter Wunner Institut für Theoretische

More information

Discriminant Analysis: A Unified Approach

Discriminant Analysis: A Unified Approach Discriminant Anaysis: A Unified Approach Peng Zhang & Jing Peng Tuane University Eectrica Engineering & Computer Science Department New Oreans, LA 708 {zhangp,jp}@eecs.tuane.edu Norbert Riede Tuane University

More information

Chemical Kinetics Part 2. Chapter 16

Chemical Kinetics Part 2. Chapter 16 Chemica Kinetics Part 2 Chapter 16 Integrated Rate Laws The rate aw we have discussed thus far is the differentia rate aw. Let us consider the very simpe reaction: a A à products The differentia rate reates

More information

Separation of Variables and a Spherical Shell with Surface Charge

Separation of Variables and a Spherical Shell with Surface Charge Separation of Variabes and a Spherica She with Surface Charge In cass we worked out the eectrostatic potentia due to a spherica she of radius R with a surface charge density σθ = σ cos θ. This cacuation

More information

A Solution to the 4-bit Parity Problem with a Single Quaternary Neuron

A Solution to the 4-bit Parity Problem with a Single Quaternary Neuron Neura Information Processing - Letters and Reviews Vo. 5, No. 2, November 2004 LETTER A Soution to the 4-bit Parity Probem with a Singe Quaternary Neuron Tohru Nitta Nationa Institute of Advanced Industria

More information

Convolutional Networks 2: Training, deep convolutional networks

Convolutional Networks 2: Training, deep convolutional networks Convoutiona Networks 2: Training, deep convoutiona networks Hakan Bien Machine Learning Practica MLP Lecture 8 30 October / 6 November 2018 MLP Lecture 8 / 30 October / 6 November 2018 Convoutiona Networks

More information

Related Topics Maxwell s equations, electrical eddy field, magnetic field of coils, coil, magnetic flux, induced voltage

Related Topics Maxwell s equations, electrical eddy field, magnetic field of coils, coil, magnetic flux, induced voltage Magnetic induction TEP Reated Topics Maxwe s equations, eectrica eddy fied, magnetic fied of cois, coi, magnetic fux, induced votage Principe A magnetic fied of variabe frequency and varying strength is

More information

Melodic contour estimation with B-spline models using a MDL criterion

Melodic contour estimation with B-spline models using a MDL criterion Meodic contour estimation with B-spine modes using a MDL criterion Damien Loive, Ney Barbot, Oivier Boeffard IRISA / University of Rennes 1 - ENSSAT 6 rue de Kerampont, B.P. 80518, F-305 Lannion Cedex

More information

6.434J/16.391J Statistics for Engineers and Scientists May 4 MIT, Spring 2006 Handout #17. Solution 7

6.434J/16.391J Statistics for Engineers and Scientists May 4 MIT, Spring 2006 Handout #17. Solution 7 6.434J/16.391J Statistics for Engineers and Scientists May 4 MIT, Spring 2006 Handout #17 Soution 7 Probem 1: Generating Random Variabes Each part of this probem requires impementation in MATLAB. For the

More information

Target Location Estimation in Wireless Sensor Networks Using Binary Data

Target Location Estimation in Wireless Sensor Networks Using Binary Data Target Location stimation in Wireess Sensor Networks Using Binary Data Ruixin Niu and Pramod K. Varshney Department of ectrica ngineering and Computer Science Link Ha Syracuse University Syracuse, NY 344

More information

Active Learning & Experimental Design

Active Learning & Experimental Design Active Learning & Experimenta Design Danie Ting Heaviy modified, of course, by Lye Ungar Origina Sides by Barbara Engehardt and Aex Shyr Lye Ungar, University of Pennsyvania Motivation u Data coection

More information

A unified framework for Regularization Networks and Support Vector Machines. Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio

A unified framework for Regularization Networks and Support Vector Machines. Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES A.I. Memo No. 1654 March23, 1999

More information

A Simple and Efficient Algorithm of 3-D Single-Source Localization with Uniform Cross Array Bing Xue 1 2 a) * Guangyou Fang 1 2 b and Yicai Ji 1 2 c)

A Simple and Efficient Algorithm of 3-D Single-Source Localization with Uniform Cross Array Bing Xue 1 2 a) * Guangyou Fang 1 2 b and Yicai Ji 1 2 c) A Simpe Efficient Agorithm of 3-D Singe-Source Locaization with Uniform Cross Array Bing Xue a * Guangyou Fang b Yicai Ji c Key Laboratory of Eectromagnetic Radiation Sensing Technoogy, Institute of Eectronics,

More information

Support Vector Machine and Its Application to Regression and Classification

Support Vector Machine and Its Application to Regression and Classification BearWorks Institutiona Repository MSU Graduate Theses Spring 2017 Support Vector Machine and Its Appication to Regression and Cassification Xiaotong Hu As with any inteectua project, the content and views

More information

Convergence Property of the Iri-Imai Algorithm for Some Smooth Convex Programming Problems

Convergence Property of the Iri-Imai Algorithm for Some Smooth Convex Programming Problems Convergence Property of the Iri-Imai Agorithm for Some Smooth Convex Programming Probems S. Zhang Communicated by Z.Q. Luo Assistant Professor, Department of Econometrics, University of Groningen, Groningen,

More information

FORECASTING TELECOMMUNICATIONS DATA WITH AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODELS

FORECASTING TELECOMMUNICATIONS DATA WITH AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODELS FORECASTING TEECOMMUNICATIONS DATA WITH AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODES Niesh Subhash naawade a, Mrs. Meenakshi Pawar b a SVERI's Coege of Engineering, Pandharpur. nieshsubhash15@gmai.com

More information

Determining The Degree of Generalization Using An Incremental Learning Algorithm

Determining The Degree of Generalization Using An Incremental Learning Algorithm Determining The Degree of Generaization Using An Incrementa Learning Agorithm Pabo Zegers Facutad de Ingeniería, Universidad de os Andes San Caros de Apoquindo 22, Las Condes, Santiago, Chie pzegers@uandes.c

More information

HILBERT? What is HILBERT? Matlab Implementation of Adaptive 2D BEM. Dirk Praetorius. Features of HILBERT

HILBERT? What is HILBERT? Matlab Implementation of Adaptive 2D BEM. Dirk Praetorius. Features of HILBERT Söerhaus-Workshop 2009 October 16, 2009 What is HILBERT? HILBERT Matab Impementation of Adaptive 2D BEM joint work with M. Aurada, M. Ebner, S. Ferraz-Leite, P. Godenits, M. Karkuik, M. Mayr Hibert Is

More information

Partial permutation decoding for MacDonald codes

Partial permutation decoding for MacDonald codes Partia permutation decoding for MacDonad codes J.D. Key Department of Mathematics and Appied Mathematics University of the Western Cape 7535 Bevie, South Africa P. Seneviratne Department of Mathematics

More information

Testing for the Existence of Clusters

Testing for the Existence of Clusters Testing for the Existence of Custers Caudio Fuentes and George Casea University of Forida November 13, 2008 Abstract The detection and determination of custers has been of specia interest, among researchers

More information

NEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION

NEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION NEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION Hsiao-Chang Chen Dept. of Systems Engineering University of Pennsyvania Phiadephia, PA 904-635, U.S.A. Chun-Hung Chen

More information

An Information Geometrical View of Stationary Subspace Analysis

An Information Geometrical View of Stationary Subspace Analysis An Information Geometrica View of Stationary Subspace Anaysis Motoaki Kawanabe, Wojciech Samek, Pau von Bünau, and Frank C. Meinecke Fraunhofer Institute FIRST, Kekuéstr. 7, 12489 Berin, Germany Berin

More information

Some Measures for Asymmetry of Distributions

Some Measures for Asymmetry of Distributions Some Measures for Asymmetry of Distributions Georgi N. Boshnakov First version: 31 January 2006 Research Report No. 5, 2006, Probabiity and Statistics Group Schoo of Mathematics, The University of Manchester

More information

Process Capability Proposal. with Polynomial Profile

Process Capability Proposal. with Polynomial Profile Contemporary Engineering Sciences, Vo. 11, 2018, no. 85, 4227-4236 HIKARI Ltd, www.m-hikari.com https://doi.org/10.12988/ces.2018.88467 Process Capabiity Proposa with Poynomia Profie Roberto José Herrera

More information

Algorithms to solve massively under-defined systems of multivariate quadratic equations

Algorithms to solve massively under-defined systems of multivariate quadratic equations Agorithms to sove massivey under-defined systems of mutivariate quadratic equations Yasufumi Hashimoto Abstract It is we known that the probem to sove a set of randomy chosen mutivariate quadratic equations

More information

Reichenbachian Common Cause Systems

Reichenbachian Common Cause Systems Reichenbachian Common Cause Systems G. Hofer-Szabó Department of Phiosophy Technica University of Budapest e-mai: gszabo@hps.ete.hu Mikós Rédei Department of History and Phiosophy of Science Eötvös University,

More information

Nonlinear Gaussian Filtering via Radial Basis Function Approximation

Nonlinear Gaussian Filtering via Radial Basis Function Approximation 51st IEEE Conference on Decision and Contro December 10-13 01 Maui Hawaii USA Noninear Gaussian Fitering via Radia Basis Function Approximation Huazhen Fang Jia Wang and Raymond A de Caafon Abstract This

More information

II. PROBLEM. A. Description. For the space of audio signals

II. PROBLEM. A. Description. For the space of audio signals CS229 - Fina Report Speech Recording based Language Recognition (Natura Language) Leopod Cambier - cambier; Matan Leibovich - matane; Cindy Orozco Bohorquez - orozcocc ABSTRACT We construct a rea time

More information

Efficiently Generating Random Bits from Finite State Markov Chains

Efficiently Generating Random Bits from Finite State Markov Chains 1 Efficienty Generating Random Bits from Finite State Markov Chains Hongchao Zhou and Jehoshua Bruck, Feow, IEEE Abstract The probem of random number generation from an uncorreated random source (of unknown

More information

FUSED MULTIPLE GRAPHICAL LASSO

FUSED MULTIPLE GRAPHICAL LASSO FUSED MULTIPLE GRAPHICAL LASSO SEN YANG, ZHAOSONG LU, XIAOTONG SHEN, PETER WONKA, JIEPING YE Abstract. In this paper, we consider the probem of estimating mutipe graphica modes simutaneousy using the fused

More information

18-660: Numerical Methods for Engineering Design and Optimization

18-660: Numerical Methods for Engineering Design and Optimization 8-660: Numerica Methods for Engineering esign and Optimization in i epartment of ECE Carnegie Meon University Pittsburgh, PA 523 Side Overview Conjugate Gradient Method (Part 4) Pre-conditioning Noninear

More information

MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES

MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES Separation of variabes is a method to sove certain PDEs which have a warped product structure. First, on R n, a inear PDE of order m is

More information

Traffic data collection

Traffic data collection Chapter 32 Traffic data coection 32.1 Overview Unike many other discipines of the engineering, the situations that are interesting to a traffic engineer cannot be reproduced in a aboratory. Even if road

More information

Available online at ScienceDirect. Procedia Computer Science 96 (2016 )

Available online at  ScienceDirect. Procedia Computer Science 96 (2016 ) Avaiabe onine at www.sciencedirect.com ScienceDirect Procedia Computer Science 96 (206 92 99 20th Internationa Conference on Knowedge Based and Inteigent Information and Engineering Systems Connected categorica

More information

Optimality of Inference in Hierarchical Coding for Distributed Object-Based Representations

Optimality of Inference in Hierarchical Coding for Distributed Object-Based Representations Optimaity of Inference in Hierarchica Coding for Distributed Object-Based Representations Simon Brodeur, Jean Rouat NECOTIS, Département génie éectrique et génie informatique, Université de Sherbrooke,

More information

Categories and Subject Descriptors B.7.2 [Integrated Circuits]: Design Aids Verification. General Terms Algorithms

Categories and Subject Descriptors B.7.2 [Integrated Circuits]: Design Aids Verification. General Terms Algorithms 5. oward Eicient arge-scae Perormance odeing o Integrated Circuits via uti-ode/uti-corner Sparse Regression Wangyang Zhang entor Graphics Corporation Ridder Park Drive San Jose, CA 953 wangyan@ece.cmu.edu

More information

8 APPENDIX. E[m M] = (n S )(1 exp( exp(s min + c M))) (19) E[m M] n exp(s min + c M) (20) 8.1 EMPIRICAL EVALUATION OF SAMPLING

8 APPENDIX. E[m M] = (n S )(1 exp( exp(s min + c M))) (19) E[m M] n exp(s min + c M) (20) 8.1 EMPIRICAL EVALUATION OF SAMPLING 8 APPENDIX 8.1 EMPIRICAL EVALUATION OF SAMPLING We wish to evauate the empirica accuracy of our samping technique on concrete exampes. We do this in two ways. First, we can sort the eements by probabiity

More information

Approximated MLC shape matrix decomposition with interleaf collision constraint

Approximated MLC shape matrix decomposition with interleaf collision constraint Approximated MLC shape matrix decomposition with intereaf coision constraint Thomas Kainowski Antje Kiese Abstract Shape matrix decomposition is a subprobem in radiation therapy panning. A given fuence

More information

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete Uniprocessor Feasibiity of Sporadic Tasks with Constrained Deadines is Strongy conp-compete Pontus Ekberg and Wang Yi Uppsaa University, Sweden Emai: {pontus.ekberg yi}@it.uu.se Abstract Deciding the feasibiity

More information

A Separability Index for Distance-based Clustering and Classification Algorithms

A Separability Index for Distance-based Clustering and Classification Algorithms IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL.?, NO.?, JANUARY 1? 1 A Separabiity Index for Distance-based Custering and Cassification Agorithms Arka P. Ghosh, Ranjan Maitra and Anna

More information

Another Look at Linear Programming for Feature Selection via Methods of Regularization 1

Another Look at Linear Programming for Feature Selection via Methods of Regularization 1 Another Look at Linear Programming for Feature Seection via Methods of Reguarization Yonggang Yao, The Ohio State University Yoonkyung Lee, The Ohio State University Technica Report No. 800 November, 2007

More information