Another Look at Linear Programming for Feature Selection via Methods of Regularization


Yonggang Yao, The Ohio State University
Yoonkyung Lee, The Ohio State University

Technical Report No. 800
November, 2007

Department of Statistics, The Ohio State University, 1958 Neil Avenue, Columbus, OH

This is the first revision of Technical Report 800, Department of Statistics, The Ohio State University, July. Some numerical results in Section 6 have been added.

Summary. We consider statistical procedures for feature selection defined by a family of regularization problems with convex piecewise linear loss functions and penalties of $\ell_1$ nature. Many known statistical procedures (e.g. quantile regression and support vector machines with $\ell_1$-norm penalty) are subsumed under this category. Computationally, the regularization problems are linear programming (LP) problems indexed by a single parameter, which are known as parametric cost LP or parametric right-hand-side LP in the optimization theory. Exploiting the connection with the LP theory, we lay out general algorithms, namely, the simplex algorithm and its variant, for generating regularized solution paths for the feature selection problems. The significance of such algorithms is that they allow a complete exploration of the model space along the paths and provide a broad view of persistent features in the data. The implications of the general path-finding algorithms are outlined for a few statistical procedures, and they are illustrated with numerical examples.

Keywords: $\ell_1$-norm penalty; Parametric linear programming; Quantile regression; Simplex method; Structured learning; Support vector machines.

Address for correspondence: Yoonkyung Lee, Department of Statistics, The Ohio State University, 1958 Neil Ave, Columbus, OH 43210, USA. E-mail: yklee@stat.osu.edu

1. Introduction

Regularization methods cover a wide range of statistical procedures for estimation and prediction, and they have been used in many modern applications. To name a few, examples are ridge regression (Hoerl and Kennard, 1970), the LASSO regression (Tibshirani, 1996), smoothing splines (Wahba, 1990), and support vector machines (SVM) (Vapnik, 1998).

Given a training data set, $\{(y_i, x_i) : x_i \in \mathbb{R}^p;\ i = 1, \ldots, n\}$, many statistical problems can be phrased as the problem of finding a functional relationship between the covariates, $x \in \mathbb{R}^p$, and the response $y$ based on the observed pairs. For example, a regularization method for prediction looks for a model $f(x; \beta)$ with unknown parameters $\beta$ that minimizes a prediction error over the training data while controlling its model complexity. To be precise, let $L(y, f(x; \beta))$ be a convex loss function for the prediction error and $J(f(x; \beta))$ be a convex penalty functional that measures the model complexity. The training error with respect to $L$ is defined by $L(Y, f(X; \beta)) := \sum_{i=1}^{n} L(y_i, f(x_i; \beta))$, where $Y := (y_1, \ldots, y_n)'$ and $X := (x_1', \ldots, x_n')'$. Formally, the solution to a regularization problem is defined to be $f$ with the model parameters $\hat\beta$ that minimize
$$L(Y, f) + \lambda \cdot J(f), \tag{1}$$
where $\lambda \ge 0$ is a pre-specified regularization parameter. The parameter $\lambda$ determines the trade-off between the prediction error and the model complexity, and thus the quality of the solution highly depends on the choice of $\lambda$. Identification of a proper value of the regularization parameter for model selection, or a proper range for model averaging, is a critical statistical problem. Note that $\hat\beta(\lambda)$ is a function of $\lambda$. As in (1), each regularization method defines a continuum of optimization problems indexed by a tuning parameter. In most cases, the solution as a function of the tuning parameter is expected to change continuously with $\lambda$. This allows for the possibility of complete exploration of the model space as $\lambda$ varies, and for computational savings if (1) is to be optimized for multiple values of $\lambda$.

Alternatively, the regularization problem in (1) can be formulated to bound the model complexity or the penalty. In this complexity-bounded formulation, the optimal parameters are sought by minimizing
$$L(Y, f) \quad \text{s.t.} \quad J(f) \le s, \tag{2}$$

where $s$ is an upper bound of the complexity.

For a certain combination of the loss $L$ and the complexity measure $J$, it is feasible to generate the entire solution path of the regularization problem. Here, the path refers to the entire set of solutions to the regularization problem, for instance, $\hat\beta(\lambda)$ in (1) as a function of $\lambda$ (or $\hat\beta(s)$ in (2) as a function of $s$). Some pairs of the loss and the complexity are known to allow such fast and efficient path-finding algorithms; for instance, LARS (Efron et al., 2004), the standard binary SVM (Hastie et al., 2004), the multi-category SVM (Lee and Cui, 2006), and the $\ell_1$-norm quantile regression (Li and Zhu, 2005). Rosset and Zhu (2007) study general conditions for the combination of $L$ and $J$ such that solutions indexed by a regularization parameter are piecewise linear and thus can be sequentially characterized. They provide generic path-finding algorithms under some appropriate assumptions on $L$ and $J$.

In this paper, we focus on an array of regularization methods aimed at feature selection with penalties of $\ell_1$ nature and piecewise linear loss functions. Many existing procedures are subsumed under this category. Examples include the $\ell_1$-norm SVM (Zhu et al., 2004) and its extension to the multi-class case (Wang and Shen, 2006), $\ell_1$-norm quantile regression (Li and Zhu, 2005), Sup-norm multi-category SVM (Zhang et al., 2006), the functional component selection step (called the "θ-step") for structured multi-category SVM (Lee et al., 2006), and the Dantzig selector (Candes and Tao, 2005). We also note that the $\epsilon$-insensitive loss in the SVM regression (Vapnik, 1998) fits into the category of a piecewise linear loss, and the sup norm gives rise to a linear penalty just as the $\ell_1$ norm in general.

There is a great commonality among these methods. That is, computationally the associated optimization problems are all linear programming (LP) problems indexed by a single regularization parameter. This family of LP problems is known as parametric cost linear programming and has long been studied in the optimization theory. Furthermore, there already exist efficient algorithms for the solution paths. Despite the commonality, so far only case-by-case treatments of some of the problems are available, as in Zhu et al. (2004), Li and Zhu (2005), and Wang and Shen (2006). Although Wang and Shen (2006) notice that those solution path algorithms have fundamental connections with the parametric right-hand-side LP (see (8) for the definition), such connections have not been adequately explored for other problems with generality. As noted, Rosset and Zhu (2007) have a comprehensive take on the computational properties of regularized solutions; however, they did not tap into the LP theory for general treatments of the problems of our focus.

The goal of this paper is to make more explicit the link between the parametric LP and a family of computational problems arising in statistics for feature selection via regularization, and to put those feature selection problems in perspective. To this end, we pull together results from the linear programming literature and summarize them in an accessible and self-contained fashion.

Section 2 begins with an overview of the standard LP and parametric LP problems, and gives a brief account of the optimality conditions for their solutions. Section 3 introduces the simplex algorithm and the tableau-simplex algorithm for finding the entire solution paths of the parametric LP problems. Section 4 describes a few examples of LP for feature selection, paraphrasing their computational elements in the LP terms. A detailed comparison of the simplex algorithm with the existing algorithm for the $\ell_1$-norm SVM (Zhu et al., 2004) is given in Section 5, highlighting the generality of the proposed approach. Numerical examples and data application of the algorithm follow in Section 6 for illustration. Technical proofs except for the key theorems are collected in the Appendix.

2. Linear Programming

Linear programming (LP) is one of the cornerstones of the optimization theory. Since the publication of the simplex algorithm by Dantzig in 1947, LP has quickly found its applications in operations research, microeconomics, business management, and other engineering fields. We give an overview of LP here and describe the optimality conditions of the LP solution pertinent to our discussion of path-finding algorithms. Some properties of the LP to be described are well known in the optimization literature, but we include them and their proofs for completeness. Our treatment of LP closely follows that in standard references such as Bertsimas and Tsitsiklis (1997) and Murty (1983). The readers are referred to them and references therein for more complete discussions.

2.1. Standard Linear Programs

A standard form of LP is
$$\min_{z \in \mathbb{R}^N} c'z \quad \text{s.t.} \quad Az = b, \quad z \ge 0, \tag{3}$$
where $z$ is an $N$-vector of variables, $c$ is a fixed $N$-vector, $b$ is a fixed $M$-vector, and $A$ is an $M \times N$ fixed matrix. Without loss of generality, it is assumed that $M \le N$ and $A$ is of full rank. Standard techniques for solving LP include the simplex method, dual simplex method, tableau method, and interior point methods.

Geometrically speaking, the standard LP problem in (3) searches for the minimum of a linear function over a polyhedron whose edges are defined by hyperplanes. Therefore, if there exists a finite solution for the LP problem, at least one of the intersection points (formally called basic solutions) of the hyperplanes should attain the minimum. For formal discussion of the optimality, a brief review of some terminologies in LP is provided. Let $\mathcal{N}$ denote the index set $\{1, \ldots, N\}$ of the unknowns, $z$, in the LP problem in (3).

DEFINITION 1. A set $B^* := \{B^*_1, \ldots, B^*_M\} \subset \mathcal{N}$ is called a basic index set if $A_{B^*} := [A_{B^*_1}, \ldots, A_{B^*_M}]$ is invertible, where $A_{B^*_i}$ is the $B^*_i$th column vector of $A$ for $B^*_i \in B^*$. $A_{B^*}$ is called the basic matrix associated with $B^*$. Correspondingly, a vector $z^* \in \mathbb{R}^N$ is called the basic solution associated with $B^*$ if $z^*$ satisfies
$$z^*_{B^*} := (z^*_{B^*_1}, \ldots, z^*_{B^*_M})' = A_{B^*}^{-1} b, \qquad z^*_j = 0 \ \text{ for } j \in \mathcal{N} \setminus B^*.$$

DEFINITION 2. Let $z^*$ be the basic solution associated with $B^*$.
$z^*$ is called a basic feasible solution if $z^*_{B^*} \ge 0$;
$z^*$ is called a non-degenerate basic feasible solution if $z^*_{B^*} > 0$;
$z^*$ is called a degenerate basic feasible solution if $z^*_{B^*} \ge 0$ and $z^*_{B^*_i} = 0$ for some $i \in \mathcal{M} := \{1, \ldots, M\}$;
$z^*$ is called an optimal basic solution if $z^*$ is a solution of the LP problem.

Since each basic solution is associated with its basic index set, the optimal basic solution can be identified with the optimal basic index set as defined below.

DEFINITION 3. A basic index set $B^*$ is called a feasible basic index set if $A_{B^*}^{-1} b \ge 0$. A feasible basic index set $B^*$ is also called an optimal basic index set if $\left[c - A'(A_{B^*}^{-1})' c_{B^*}\right] \ge 0$.

The following theorem indicates that the standard LP problem can be solved by finding the optimal basic index set.

THEOREM 4. For the LP problem in (3), let $z^*$ be the basic solution associated with $B^*$, an optimal basic index set. Then $z^*$ is an optimal basic solution.

PROOF. We need to show $c'z \ge c'z^*$, or $c'(z - z^*) \ge 0$, for any feasible vector $z \in \mathbb{R}^N$ with $Az = b$ and $z \ge 0$. Denote $d := (d_1, \ldots, d_N)' := (z - z^*)$. From
$$Ad = A_{B^*} d_{B^*} + \sum_{i \in \mathcal{N} \setminus B^*} A_i d_i = 0,$$
we have $d_{B^*} = -\sum_{i \in \mathcal{N} \setminus B^*} A_{B^*}^{-1} A_i d_i$.

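Then,
$$c'(z - z^*) = c'd = c_{B^*}' d_{B^*} + \sum_{i \in \mathcal{N} \setminus B^*} c_i d_i = \sum_{i \in \mathcal{N} \setminus B^*} \left(c_i - c_{B^*}' A_{B^*}^{-1} A_i\right) d_i.$$
Recall that for $i \in \mathcal{N} \setminus B^*$, $z^*_i = 0$, which implies $d_i := (z_i - z^*_i) \ge 0$. Together with $\left[c - A'(A_{B^*}^{-1})' c_{B^*}\right] \ge 0$, it ensures $(c_i - c_{B^*}' A_{B^*}^{-1} A_i) d_i \ge 0$. Thus, we have $c'd \ge 0$.

2.2. Parametric Linear Programs

In practical applications, the cost coefficients $c$ or the constraint constants $b$ in (3) are often partially known or controllable so that they may be modeled linearly as $(c + \lambda a)$ or $(b + \omega b^*)$ with some parameters $\lambda$ and $\omega \in \mathbb{R}$. A family of regularization methods for feature selection to be discussed share this characteristic. Although every parameter value creates a new LP problem in the setting, it is feasible to generate solutions for all values of the parameter via sequential updates. The new LP problems indexed by the parameters are called the parametric-cost LP and parametric right-hand-side LP, respectively. For reference, see Bertsimas and Tsitsiklis (1997) and Murty (1983).

The standard form of a parametric-cost LP is defined as
$$\min_{z \in \mathbb{R}^N} (c + \lambda a)'z \quad \text{s.t.} \quad Az = b, \quad z \ge 0. \tag{4}$$
Since the basic index sets of the parametric-cost LP do not depend on the parameter $\lambda$, it is not hard to see that an optimal basic index set $B^*$ for some fixed value of $\lambda$ would remain optimal for a range of $\lambda$ values, say, $[\underline{\lambda}, \overline{\lambda}]$. The interval is called the optimality interval of $B^*$ for the parametric-cost LP problem.

COROLLARY 5. For a fixed $\lambda^* \ge 0$, let $B^*$ be an optimal basic index set of the problem in (4) at $\lambda = \lambda^*$. Define
$$\underline{\lambda} := \max_{\{j:\ \check{a}_j > 0;\ j \in \mathcal{N} \setminus B^*\}} \left(-\frac{\check{c}_j}{\check{a}_j}\right) \quad \text{and} \quad \overline{\lambda} := \min_{\{j:\ \check{a}_j < 0;\ j \in \mathcal{N} \setminus B^*\}} \left(-\frac{\check{c}_j}{\check{a}_j}\right), \tag{5}$$
where $\check{a}_j := a_j - a_{B^*}' A_{B^*}^{-1} A_j$ and $\check{c}_j := c_j - c_{B^*}' A_{B^*}^{-1} A_j$ for $j \in \mathcal{N}$. Then, $B^*$ is an optimal basic index set of (4) for $\lambda \in [\underline{\lambda}, \overline{\lambda}]$, which includes $\lambda^*$.

PROOF. From the optimality of $B^*$ for $\lambda = \lambda^*$, we have $A_{B^*}^{-1} b \ge 0$ and
$$\left[c - A'(A_{B^*}^{-1})' c_{B^*}\right] + \lambda^* \left[a - A'(A_{B^*}^{-1})' a_{B^*}\right] \ge 0,$$
which implies that $\check{c}_j + \lambda^* \check{a}_j \ge 0$ for $j \in \mathcal{N}$. To find the optimality interval $[\underline{\lambda}, \overline{\lambda}]$ of $B^*$, by Theorem 4, we need to investigate the following inequality for each $j \in \mathcal{N}$:
$$\check{c}_j + \lambda \check{a}_j \ge 0. \tag{6}$$
It is easy to see that $A_{B^*}^{-1} A_{B^*_i} = e_i$ for $i \in \mathcal{M}$ since $A_{B^*_i}$ is the $i$th column of $A_{B^*}$. Consequently, the $j$th entries of $(c' - c_{B^*}' A_{B^*}^{-1} A)$ and $(a' - a_{B^*}' A_{B^*}^{-1} A)$ are both 0 for $j \in B^*$, and $\check{c}_j + \lambda \check{a}_j = 0$ for any $\lambda$. So, the inequality holds for any $\lambda \in \mathbb{R}$ and $j \in B^*$. When $\check{a}_j > 0$ (or $\check{a}_j < 0$) for $j \in (\mathcal{N} \setminus B^*)$, (6) holds if and only if $\lambda \ge -\check{c}_j / \check{a}_j$ (or $\lambda \le -\check{c}_j / \check{a}_j$). Thus, the lower bound and the upper bound of the optimality interval of $B^*$ are given by the $\underline{\lambda}$ and $\overline{\lambda}$ in (5).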
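The interval in Corollary 5 can be computed directly from the relative cost coefficients. The following is a minimal sketch, not code from the report: the function names `solve`, `reduced`, and `optimality_interval`, the zero-based indices, and the convention of returning minus or plus infinity when no index qualifies for the maximum or minimum are all assumptions made for illustration. Feasibility of the basis ($A_{B^*}^{-1} b \ge 0$) is assumed rather than checked.

```python
from fractions import Fraction


def solve(M, rhs):
    """Solve the square system M x = rhs exactly by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def reduced(A, v, basis):
    """Relative coefficients v_j - v_B' A_B^{-1} A_j for every column j."""
    m = len(A)
    # y solves A_B' y = v_B, so that y' = v_B' A_B^{-1}.
    y = solve([[A[i][j] for i in range(m)] for j in basis], [v[j] for j in basis])
    return [v[j] - sum(y[i] * A[i][j] for i in range(m)) for j in range(len(v))]


def optimality_interval(A, c, a, basis):
    """Lower and upper bounds of the optimality interval in (5)."""
    c_chk, a_chk = reduced(A, c, basis), reduced(A, a, basis)
    nonbasic = [j for j in range(len(c)) if j not in basis]
    lo = max((-c_chk[j] / a_chk[j] for j in nonbasic if a_chk[j] > 0),
             default=float("-inf"))  # assumption: no lower bound if the set is empty
    hi = min((-c_chk[j] / a_chk[j] for j in nonbasic if a_chk[j] < 0),
             default=float("inf"))   # assumption: no upper bound if the set is empty
    return lo, hi


# Toy problem (hypothetical data): one constraint z1 + z2 + z3 = 1, so each
# basic solution puts all mass on one variable with cost c_j + lambda * a_j.
A = [[1, 1, 1]]
c = [0, 1, 3]
a = [2, 1, 0]
lo, hi = optimality_interval(A, c, a, basis=[1])  # lo = 1, hi = 2
```

In this toy problem the answer is easy to check by hand: the second variable is optimal exactly when $1 + \lambda \le 2\lambda$ and $1 + \lambda \le 3$, that is, for $\lambda \in [1, 2]$.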

Note that $\check{c}_j$ and $\check{a}_j$ define the relative cost coefficient of $z_j$. Since the number of basic index sets is finite for fixed $A$, there exist only a finite number of optimal basic index sets of the problem in (4). Corollary 5 implies that a version of the solution path of the problem as a function of $\lambda$, $z(\lambda)$, is a step function.

On the other hand, if the parametric cost LP in (4) is recast in the form of (2), then the property of the solution path changes. The alternative complexity-bounded formulation of (4) is given by
$$\min_{z \in \mathbb{R}^N,\ \delta \in \mathbb{R}} c'z \quad \text{s.t.} \quad Az = b, \quad a'z + \delta = s, \quad z \ge 0,\ \delta \ge 0. \tag{7}$$
It can be transformed into a standard parametric right-hand-side LP problem:
$$\min_{\tilde{z} \in \mathbb{R}^{N+1}} \tilde{c}'\tilde{z} \quad \text{s.t.} \quad \tilde{A}\tilde{z} = \tilde{b} + \omega \tilde{b}^*, \quad \tilde{z} \ge 0 \tag{8}$$
by setting
$$\omega = s, \quad \tilde{z} = \begin{bmatrix} z \\ \delta \end{bmatrix}, \quad \tilde{c} = \begin{bmatrix} c \\ 0 \end{bmatrix}, \quad \tilde{b} = \begin{bmatrix} b \\ 0 \end{bmatrix}, \quad \tilde{b}^* = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad \text{and} \quad \tilde{A} = \begin{bmatrix} A & 0 \\ a' & 1 \end{bmatrix}.$$
Note that when $A$ is of full rank, so is $\tilde{A}$. Let $B^*$ be an optimal basic index set of (8) at $\omega = \omega^*$. Similarly, we can show that $B^*$ is optimal for any $\omega$ satisfying $\tilde{z}_{B^*} = \tilde{A}_{B^*}^{-1}(\tilde{b} + \omega \tilde{b}^*) \ge 0$, and there exist $\underline{\omega}$ and $\overline{\omega}$ such that $B^*$ is optimal for $\omega \in [\underline{\omega}, \overline{\omega}]$. This implies that a version of the solution path of (8) is a piecewise linear function.

3. Generating the Solution Path

Based on the basic concepts and the optimality condition of LP introduced in Section 2, we describe algorithms to generate the solution paths for (4) and (7), namely, the simplex and tableau-simplex algorithms. Since the examples of the LP problem in Section 4 for feature selection involve non-negative $a$, $\lambda$, and $s$ only, we assume that they are non-negative in the following algorithms and take $s = 0$ (equivalently $\lambda = \infty$) as a starting value.

3.1. Simplex Algorithm

3.1.1. Initialization

Let $z^0 := (z^0_1, \ldots, z^0_N)'$ denote the initial solution of (7) at $s = 0$. $a'z^0 = 0$ implies $z^0_j = 0$ for all $j \notin I_a := \{i : a_i = 0,\ i \in \mathcal{N}\}$. Thus, by extracting the coordinates of $c$, $z$, and the columns in $A$ corresponding to $I_a$, we can simplify the initial LP problem of (4) and (7) to
$$\min_{z_{I_a} \in \mathbb{R}^{|I_a|}} c_{I_a}' z_{I_a} \quad \text{s.t.} \quad A_{I_a} z_{I_a} = b, \quad z_{I_a} \ge 0, \tag{9}$$
where $|I_a|$ is the cardinality of $I_a$. Accordingly, any initial optimal basic index set, $B^0$, of (4) and (7) contains that of the reduced problem (9) and determines the initial solution $z^0$.

3.1.2. Main Algorithm

For simplicity, we describe the algorithm for the solution path of the parametric-cost LP problem in (4) first, and then discuss how it also solves the complexity-bounded LP problem in (7).

Let $B^l$ be the $l$th optimal basic index set at $\lambda = \lambda_{l-1}$. For convenience, define $\lambda_{-1} := \infty$, the starting value of the regularization parameter for the solution path of (4). Given $B^l$, let $z^l$ be the $l$th joint solution, which is given by $z^l_{B^l} = A_{B^l}^{-1} b$ and $z^l_j = 0$ for $j \in \mathcal{N} \setminus B^l$. Since the optimal LP solution is identified by the optimal basic index set as in Theorem 4, it suffices to describe how to update the optimal basic index set as $\lambda$ decreases. By the invertibility of

the basic matrix associated with the index set, updating amounts to finding a new index that enters and the other that exits the current basic index set.

By Corollary 5, we can compute the lower bound of the optimality interval of $B^l$, denoted by $\lambda_l$, and identify the entry index associated with it. Let
$$j^l := \arg\max_{\{j:\ \check{a}^l_j > 0;\ j \in (\mathcal{N} \setminus B^l)\}} \left(-\frac{\check{c}^l_j}{\check{a}^l_j}\right), \tag{10}$$
where $\check{a}^l_j := (a_j - a_{B^l}' A_{B^l}^{-1} A_j)$ and $\check{c}^l_j := (c_j - c_{B^l}' A_{B^l}^{-1} A_j)$. Then, the lower bound is given by $\lambda_l := -\check{c}^l_{j^l} / \check{a}^l_{j^l}$, and $B^l$ is optimal for $\lambda \in [\lambda_l, \lambda_{l-1}]$.

To determine the index exiting $B^l$, consider the moving direction from $z^l$ to the next joint solution. Define $d^l := (d^l_1, \ldots, d^l_N)'$ as
$$d^l_{B^l} = -A_{B^l}^{-1} A_{j^l}, \quad d^l_{j^l} = 1, \quad \text{and} \quad d^l_i = 0 \ \text{ for } i \in \mathcal{N} \setminus (B^l \cup \{j^l\}). \tag{11}$$
Lemma 12 in the Appendix shows that $d^l$ is the moving direction at $\lambda = \lambda_l$ in the sense that $z^{l+1} = z^l + \tau d^l$ for some $\tau \ge 0$. For the feasibility of $z^{l+1} \ge 0$, the step size $\tau$ cannot exceed the minimum of $-z^l_i / d^l_i$ for $i \in B^l$ with $d^l_i < 0$, and the index attaining the minimum is to leave $B^l$. Denote the exit index by
$$i^l := \arg\min_{i \in \{j:\ d^l_j < 0,\ j \in B^l\}} \left(-\frac{z^l_i}{d^l_i}\right). \tag{12}$$
Therefore, the optimal basic index set at $\lambda = \lambda_l$ is given by $B^{l+1} := B^l \cup \{j^l\} \setminus \{i^l\}$. More precisely, we can verify the optimality of $B^{l+1}$ at $\lambda = \lambda_l$ by showing that
$$(c + \lambda_l a) - A'(A_{B^{l+1}}^{-1})'(c_{B^{l+1}} + \lambda_l a_{B^{l+1}}) = (c + \lambda_l a) - A'(A_{B^l}^{-1})'(c_{B^l} + \lambda_l a_{B^l}). \tag{13}$$
The proof is given in Appendix B. Then the fact that $B^l$ is optimal at $\lambda = \lambda_l$ implies that $B^{l+1}$ is also optimal at $\lambda = \lambda_l$. As a result, the updating procedure can be repeated with $B^{l+1}$ and $\lambda_l$ successively until $\lambda_l \le 0$ or equivalently $\check{c}^l_{j^l} \ge 0$. The algorithm for updating the optimal basic index sets is summarized as follows.

1. Initialize the optimal basic index set at $\lambda_{-1} = \infty$ with $B^0$.
2. Given $B^l$, the $l$th optimal basic index set at $\lambda = \lambda_{l-1}$, determine the solution $z^l$ by $z^l_{B^l} = A_{B^l}^{-1} b$ and $z^l_j = 0$ for $j \in \mathcal{N} \setminus B^l$.
3. Find the entry index $j^l = \arg\max_{\{j:\ \check{a}^l_j > 0;\ j \in \mathcal{N} \setminus B^l\}} \left(-\check{c}^l_j / \check{a}^l_j\right)$.
4. Find the exit index $i^l = \arg\min_{i \in \{j:\ d^l_j < 0,\ j \in B^l\}} \left(-z^l_i / d^l_i\right)$. If there are multiple indices, choose one of them.
5. Update the optimal basic index set to $B^{l+1} = B^l \cup \{j^l\} \setminus \{i^l\}$.
6. Terminate the algorithm if $\check{c}^l_{j^l} \ge 0$ or equivalently $\lambda_l \le 0$. Otherwise, repeat Steps 2 to 5.

If $-z^l_{i^l} / d^l_{i^l} = 0$, then $z^l = z^{l+1}$, which may result in the problem of cycling among several basic index sets with the same solution. We defer the description of the tableau-simplex algorithm, which can avoid the cycling problem, to Section 3.2. For brevity, we just assume that $z^l + \tau d^l \ge 0$ for some $\tau > 0$ so that $z^l \ne z^{l+1}$ for each $l$, and call this the non-degeneracy assumption. Under this assumption, suppose the simplex algorithm terminates after $J$ iterations with $\{(z^l, \lambda_l) : l = 0, 1, \ldots, J\}$. Then the entire solution path is obtained as described below.
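The updating steps of the simplex algorithm summarized above can be sketched in a few lines of Python. This is an illustrative sketch under the non-degeneracy assumption, not the authors' implementation: the names `solve` and `simplex_path`, the zero-based indices, the exact arithmetic with `Fraction`, the early return when no index has a positive relative penalty coefficient, and the toy data are all assumptions. The termination test is placed before the pivot, which yields the same sequence of joints.

```python
from fractions import Fraction


def solve(M, rhs):
    """Solve the square system M x = rhs exactly by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def simplex_path(A, b, c, a, basis):
    """Return the joints [(z^l, lambda_l)] given an optimal basis at lambda = infinity."""
    m, n = len(A), len(c)
    basis, joints = list(basis), []
    while True:
        A_B = [[A[i][j] for j in basis] for i in range(m)]
        A_Bt = list(zip(*A_B))
        # Basic solution: z_B = A_B^{-1} b, zero elsewhere.
        z_B = solve(A_B, b)
        z = [Fraction(0)] * n
        for k, j in enumerate(basis):
            z[j] = z_B[k]
        # Entry index: maximize -c_check_j / a_check_j over a_check_j > 0.
        y_c = solve(A_Bt, [c[j] for j in basis])
        y_a = solve(A_Bt, [a[j] for j in basis])
        ratios = {}
        for j in range(n):
            if j in basis:
                continue
            c_chk = c[j] - sum(y_c[i] * A[i][j] for i in range(m))
            a_chk = a[j] - sum(y_a[i] * A[i][j] for i in range(m))
            if a_chk > 0:
                ratios[j] = -c_chk / a_chk
        if not ratios:  # assumption: the basis stays optimal for all smaller lambda
            joints.append((z, float("-inf")))
            return joints
        j_in = max(ratios, key=ratios.get)
        lam = ratios[j_in]
        joints.append((z, lam))
        if lam <= 0:  # termination rule
            return joints
        # Exit index: minimum ratio z_i / u_i over u_i > 0, where u = -d_B.
        u = solve(A_B, [A[i][j_in] for i in range(m)])
        cand = [k for k in range(m) if u[k] > 0]
        if not cand:
            raise ValueError("no exit index: the direction is unbounded")
        k_out = min(cand, key=lambda k: z_B[k] / u[k])
        basis[k_out] = j_in  # swap the exiting index for the entering one


# Hypothetical toy problem: minimize |1 - beta| + |3 - beta| + lambda * |beta|.
# Variables: beta+, beta-, zeta1+, zeta2+, zeta1-, zeta2-.
A = [[1, -1, 1, 0, -1, 0],
     [1, -1, 0, 1, 0, -1]]
b = [1, 3]
c = [0, 0, 1, 1, 1, 1]
a = [1, 1, 0, 0, 0, 0]
joints = simplex_path(A, b, c, a, basis=[2, 3])
betas = [z[0] - z[1] for z, _ in joints]
lambdas = [lam for _, lam in joints]
```

For this toy problem the path has two joints: the coefficient is 0 for $\lambda > 2$ and 1 for $0 < \lambda < 2$, which agrees with minimizing the piecewise linear objective by hand.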

THEOREM 6. The solution path of (4) is
$$\begin{cases} z^0 & \text{for } \lambda > \lambda_0 \\ z^l & \text{for } \lambda_l < \lambda < \lambda_{l-1},\ l = 1, \ldots, J \\ \tau z^l + (1 - \tau) z^{l+1} & \text{for } \lambda = \lambda_l \text{ and } \tau \in [0, 1],\ l = 0, \ldots, J - 1. \end{cases} \tag{14}$$

Likewise, the solutions to the alternative formulation of (7) with the complexity bound can be obtained as a function of $s$. By the correspondence of the two formulations, the $l$th joint of the piecewise linear solution is given by $s_l = a'z^l$, and the solution between the joints is a linear combination of $z^l$ and $z^{l+1}$ as described in Theorem 7 below. Its proof is in Appendix C.

THEOREM 7. For $s \ge 0$, the solution path for (7) can be expressed as
$$\begin{cases} \dfrac{s_{l+1} - s}{s_{l+1} - s_l}\, z^l + \dfrac{s - s_l}{s_{l+1} - s_l}\, z^{l+1} & \text{if } s_l \le s < s_{l+1} \text{ and } l = 0, \ldots, J - 1 \\ z^J & \text{if } s \ge s_J. \end{cases}$$

3.2. Tableau-Simplex Algorithm

The non-degeneracy assumption in the simplex method that any two consecutive joint solutions are different may not hold in practice for many problems. When some columns of a basic matrix are discrete, the assumption may fail at some degenerate joint solutions. To deal with more general settings where the cycling problem may occur in generating the LP solution path by the simplex method, we discuss the tableau-simplex algorithm.

A tableau is a big matrix which contains all the information about the LP. It consists of the relevant terms in LP associated with a basic matrix such as the basic solution and the cost.

DEFINITION 8. For a basic index set $B^*$, its tableau is defined as
$$\begin{array}{l|c|c} & \text{zeroth column} & \text{pivot columns} \\ \hline \text{cost row} & -c_{B^*}' A_{B^*}^{-1} b & c' - c_{B^*}' A_{B^*}^{-1} A \\ \text{penalty row} & -a_{B^*}' A_{B^*}^{-1} b & a' - a_{B^*}' A_{B^*}^{-1} A \\ \text{pivot rows} & A_{B^*}^{-1} b & A_{B^*}^{-1} A \end{array}$$

We follow the convention for the names of the columns and rows in the tableau. For reference, see Murty (1983) and Bertsimas and Tsitsiklis (1997). Note that the zeroth column contains $z^*_{B^*} := A_{B^*}^{-1} b$, the non-zero part of the basic solution, $-c_{B^*}' A_{B^*}^{-1} b = -c'z^*$, the negative cost, and $-a_{B^*}' A_{B^*}^{-1} b = -a'z^*$, the negative penalty of $z^*$ associated with $B^*$, and the pivot columns contain $\check{c}_j$'s and $\check{a}_j$'s.

The algorithm to be discussed updates the basic index sets by using the tableau, in particular, by ordering some rows of the tableau. To describe the algorithm, we introduce the lexicographic order of vectors first.

DEFINITION 9. For $v$ and $w \in \mathbb{R}^n$, we say that $v$ is lexicographically greater than $w$ (denoted by $v \overset{L}{>} w$) if the first non-zero entry of $v - w$ is strictly positive. We say that $v$ is lexicographically positive if $v \overset{L}{>} 0$.

Consider the parametric-cost LP in (4).

3.2.1. Initial Tableau

With the index set $B^0$, initialize the tableau. Since $z^0_{B^0} = A_{B^0}^{-1} b \ge 0$ and the columns of $A$ can be rearranged such that the first $M$ columns of $A_{B^0}^{-1} A$ are $I$, we assume that the pivot rows, $[A_{B^0}^{-1} b \ \ A_{B^0}^{-1} A]$, of the initial tableau are lexicographically positive. In other words, there is a permutation $\pi : \mathcal{N} \to \mathcal{N}$ which maps $B^0$ to $\mathcal{M} := \{1, \ldots, M\}$, and we can replace the problem with the $\pi$-permuted version (e.g., $z_{\pi(\mathcal{N})}$ and $A_{\pi(\mathcal{N})}$).
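The lexicographic order of Definition 9, which is used to assume lexicographically positive pivot rows in the initial tableau, amounts to comparing two vectors at the first coordinate where they differ. A minimal sketch follows; the function names `lex_greater` and `lex_positive` are hypothetical and not taken from the report.

```python
def lex_greater(v, w):
    """True if the first non-zero entry of v - w is strictly positive."""
    for vi, wi in zip(v, w):
        if vi != wi:
            return vi > wi
    return False  # v equals w, so neither is lexicographically greater


def lex_positive(v):
    """True if v is lexicographically greater than the zero vector."""
    return lex_greater(v, [0] * len(v))


# A row with a zero in the zeroth column can still be lexicographically
# positive, which is what allows degenerate basic solutions to be ordered.
row = [0, 0, 3, -1]
is_positive = lex_positive(row)
```

Note that a row whose leading entries are zero is ordered by its first non-zero entry, so two rows that tie in the zeroth column are still strictly ordered as long as they are not identical.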

3.2.2. Updating Tableau

Given the current optimal basic index set $B^l$, the current tableau is
$$\begin{array}{l|c|c} & \text{zeroth column} & \text{pivot columns} \\ \hline \text{cost row} & -c_{B^l}' A_{B^l}^{-1} b & c' - c_{B^l}' A_{B^l}^{-1} A \\ \text{penalty row} & -a_{B^l}' A_{B^l}^{-1} b & a' - a_{B^l}' A_{B^l}^{-1} A \\ \text{pivot rows} & A_{B^l}^{-1} b & A_{B^l}^{-1} A \end{array}$$

Suppose all the pivot rows of the current tableau are lexicographically positive. The tableau-simplex algorithm differs from the simplex algorithm only in the way the exit index is determined. The following procedure is a generalization of Step 4 in the simplex algorithm for finding the exit index.

Step 4'. Let $u^l := (u^l_1, \ldots, u^l_M)' := A_{B^l}^{-1} A_{j^l}$. For each $i \in \mathcal{M}$ with $u^l_i > 0$, divide the $i$th pivot row (including the entry in the zeroth column) by $u^l_i$. Among those rows, find the index, $i^l_*$, of the lexicographically smallest row. Then, $i^l := B^l_{i^l_*}$ is the exit index.

Remark. Since $u^l = -d^l_{B^l}$, if $i^l$ in (12) is unique with $z^l_{i^l} > 0$, then it is the same as the lexicographically smallest row that the tableau-simplex algorithm seeks. Hence the two algorithms coincide.

The simplex algorithm determines the exit index based only on the zeroth column in the tableau, while the lexicographic ordering involves the pivot columns additionally. The optimality of $B^l$ for $\lambda \in [\lambda_l, \lambda_{l-1}]$ immediately follows by the same Step 3, and (13) remains true for the exit index $i^l$ of the tableau-simplex algorithm, which implies the optimality of $B^{l+1}$ at $\lambda = \lambda_l$.

Some characteristics of the updated tableau associated with $B^{l+1}$ are described in the next theorem. The proof is adapted from that for the lexicographic pivoting rule in Bertsimas and Tsitsiklis (1997), p. 108. See Appendix D for details.

THEOREM 10. For the updated basic index set $B^{l+1}$ by the tableau-simplex algorithm,
i) all the pivot rows of the updated tableau are still lexicographically positive, and
ii) the updated cost row is lexicographically greater than that for $B^l$.

Since $A_{B^{l+1}}^{-1} b$ is the zeroth column of the pivot rows, i) says that the basic solution for $B^{l+1}$ is feasible, i.e., $z^{l+1} \ge 0$. Moreover, it implies that the updating procedure can be repeated with $B^{l+1}$ and the new tableau.

It is not hard to see that $z^{l+1} = z^l$ if and only if $z^l_{i^l} = 0$ (see the proof of Theorem 10 in the Appendix for more details). When $z^l_{i^l} = 0$, $z^{l+1} = z^l$; however, the tableau-simplex algorithm uniquely updates $B^{l+1}$ such that the previous optimal basic index sets $B^l$'s never reappear in the process. This anti-cycling property is guaranteed by ii). By ii), we can strictly order the optimal basic index sets $B^l$ based on their cost rows. Because of this and the fact that the number of possible basic index sets is finite, the total number of iterations must be finite. This proves the following.

COROLLARY 11. The tableau updating procedure terminates after a finite number of iterations.

Suppose that the tableau-simplex algorithm stops after $J$ iterations with $\lambda_J \le 0$. In parallel to the simplex algorithm, the tableau-simplex algorithm outputs the sequence $\{(z^l, s_l, \lambda_l) : l = 0, \ldots, J\}$, and the solution paths for (4) and (7) admit the same forms as in Theorem 6 and Theorem 7 except for any duplicate joints $\lambda_l$ and $s_l$.

4. Examples of LP for Regularization

A few concrete examples of LP problems that arise in statistics for feature selection via regularization are given. For each example, the elements in the standard LP form are identified, and their commonalities and how they can be utilized for efficient computation are discussed.

4.1. $\ell_1$-Norm Quantile Regression

Quantile regression is a regression technique, introduced by Koenker and Bassett (1978), intended to estimate the conditional quantile functions. It is obtained by replacing the squared error loss of the classical linear regression for the conditional mean function with a piecewise linear loss called the check function. For a general introduction to quantile regression, see Koenker and Hallock (2001).

For simplicity, assume that the conditional quantiles are linear in the predictors. Given a data set, $\{(x_i, y_i) : x_i \in \mathbb{R}^p,\ y_i \in \mathbb{R},\ i = 1, \ldots, n\}$, the $\tau$th conditional quantile function is estimated by
$$\min_{\beta_0 \in \mathbb{R},\ \beta \in \mathbb{R}^p} \sum_{i=1}^{n} \rho_\tau(y_i - \beta_0 - x_i \beta), \tag{15}$$
where $\beta_0$ and $\beta := (\beta_1, \ldots, \beta_p)'$ are the quantile regression coefficients for $\tau \in (0, 1)$, and $\rho_\tau(\cdot)$ is the check function defined as
$$\rho_\tau(t) := \begin{cases} \tau \cdot t & \text{for } t > 0 \\ -(1 - \tau) \cdot t & \text{for } t \le 0. \end{cases}$$
For example, when $\tau = 1/2$, it looks for the median regression function. The standard quantile regression problem in (15) can be cast as an LP problem itself, and enumeration of the entire range of quantile functions parametrized by $\tau$ is feasible as noted in Koenker (2005), p. 85. Since it is somewhat different from the array of statistical optimization problems for feature selection that we intend to address in this paper, we leave an adequate treatment of this topic elsewhere and turn to a regularized quantile regression.

Aiming at estimating the conditional quantile function simultaneously with selecting relevant predictors, Li and Zhu (2005) propose the $\ell_1$-norm quantile regression. It is defined by the following constrained optimization problem:
$$\min_{\beta_0 \in \mathbb{R},\ \beta \in \mathbb{R}^p} \sum_{i=1}^{n} \rho_\tau(y_i - \beta_0 - x_i \beta) \quad \text{s.t.} \quad \|\beta\|_1 \le s,$$
where $s > 0$ is a regularization parameter. Equivalently, with $\lambda_Q = (1 - \tau)/\tau$, the weight on negative residuals relative to positive ones implied by the check function, and $\lambda$ another tuning parameter, the $\ell_1$-norm quantile regression can be recast as
$$\min_{\beta_0 \in \mathbb{R},\ \beta \in \mathbb{R}^p,\ \zeta \in \mathbb{R}^n} \sum_{i=1}^{n} \{(\zeta_i)_+ + \lambda_Q (\zeta_i)_-\} + \lambda \|\beta\|_1 \quad \text{s.t.} \quad \beta_0 + x_i \beta + \zeta_i = y_i \ \text{ for } i = 1, \ldots, n, \tag{16}$$
where $(x)_+ = \max(x, 0)$ and $(x)_- = \max(-x, 0)$.
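The step from the check function to positive and negative parts of the residual rests on the identity $\rho_\tau(t) = \tau (t)_+ + (1 - \tau)(t)_-$. The short sketch below checks this identity numerically; the function names `check_loss`, `pos`, and `neg` are hypothetical, and `Fraction` is used only so that the comparison is exact.

```python
from fractions import Fraction


def check_loss(t, tau):
    """Check function: tau * t for t > 0 and -(1 - tau) * t otherwise."""
    return tau * t if t > 0 else -(1 - tau) * t


def pos(x):
    """Positive part (x)_+ = max(x, 0)."""
    return max(x, 0)


def neg(x):
    """Negative part (x)_- = max(-x, 0)."""
    return max(-x, 0)


tau = Fraction(1, 4)
residuals = [Fraction(-2), Fraction(0), Fraction(3, 2)]
# Both lists hold the same losses, computed two ways.
direct = [check_loss(t, tau) for t in residuals]
split = [tau * pos(t) + (1 - tau) * neg(t) for t in residuals]
```

Because the loss is linear in $(t)_+$ and $(t)_-$, replacing each residual by a pair of non-negative variables turns the objective into a linear function, which is what allows the problem to be written as an LP.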
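Problem (16) can be formulated as an LP parametrized by $\lambda$, which is a common feature of the examples discussed. For the non-negativity constraint in the standard form of LP, consider both positive and negative parts of each variable and denote, for example, $((\beta_1)_+, \ldots, (\beta_p)_+)'$ by $\beta^+$ and $((\beta_1)_-, \ldots, (\beta_p)_-)'$ by $\beta^-$. Note that $\beta = \beta^+ - \beta^-$ and the $\ell_1$-norm $\|\beta\|_1 := \sum_{i=1}^{p} |\beta_i|$ is given by $\mathbf{1}'(\beta^+ + \beta^-)$ with $\mathbf{1} := (1, \ldots, 1)'$ of appropriate length. Let $Y := (y_1, \ldots, y_n)'$, $X := (x_1', \ldots, x_n')'$, $\zeta := (\zeta_1, \ldots, \zeta_n)'$, and $\mathbf{0} := (0, \ldots, 0)'$ of appropriate length. Then the following elements define the $\ell_1$-norm quantile regression in the standard form of a parametric-cost LP in (4):
$$\begin{aligned} z &:= (\ \beta_0^+ \ \ \beta_0^- \ \ (\beta^+)' \ \ (\beta^-)' \ \ (\zeta^+)' \ \ (\zeta^-)'\ )' \\ c &:= (\ 0 \ \ 0 \ \ \mathbf{0}' \ \ \mathbf{0}' \ \ \mathbf{1}' \ \ \lambda_Q \mathbf{1}'\ )' \\ a &:= (\ 0 \ \ 0 \ \ \mathbf{1}' \ \ \mathbf{1}' \ \ \mathbf{0}' \ \ \mathbf{0}'\ )' \\ A &:= (\ \mathbf{1} \ \ -\mathbf{1} \ \ X \ \ -X \ \ I \ \ -I\ ) \\ b &:= Y \end{aligned}$$
with a total of $N = 2(1 + p + n)$ variables and $M = n$ equality constraints.

4.2. $\ell_1$-Norm SVM

Consider a binary classification problem where $y_i \in \{-1, 1\}$, $i = 1, \ldots, n$, denote the class labels. The Support Vector Machine (SVM) introduced by Cortes and Vapnik (1995) is a classification method that finds the optimal hyperplane maximizing the margin between the classes. It is another example of a regularization method with a margin based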

hinge loss and the ridge regression type $\ell_2$ norm penalty. The optimal hyperplane ($\beta_0 + x\beta = 0$) in the standard SVM is determined by the solution to the problem:
$$\min_{\beta_0 \in \mathbb{R},\ \beta \in \mathbb{R}^p} \sum_{i=1}^{n} \{1 - y_i(\beta_0 + x_i \beta)\}_+ + \lambda \|\beta\|_2^2,$$
where $\lambda > 0$ is a tuning parameter. Replacing the $\ell_2$ norm with the $\ell_1$ norm for selection of variables as in Bradley and Mangasarian (1998) and Zhu et al. (2004), we arrive at a variant of the soft-margin SVM:
$$\min_{\beta_0 \in \mathbb{R},\ \beta \in \mathbb{R}^p,\ \zeta \in \mathbb{R}^n} \sum_{i=1}^{n} (\zeta_i)_+ + \lambda \|\beta\|_1 \quad \text{s.t.} \quad y_i(\beta_0 + x_i \beta) + \zeta_i = 1 \ \text{ for } i = 1, \ldots, n. \tag{17}$$
Similarly, this $\ell_1$-norm SVM can be formulated as a parametric cost LP with the following elements in the standard form:
$$\begin{aligned} z &:= (\ \beta_0^+ \ \ \beta_0^- \ \ (\beta^+)' \ \ (\beta^-)' \ \ (\zeta^+)' \ \ (\zeta^-)'\ )' \\ c &:= (\ 0 \ \ 0 \ \ \mathbf{0}' \ \ \mathbf{0}' \ \ \mathbf{1}' \ \ \mathbf{0}'\ )' \\ a &:= (\ 0 \ \ 0 \ \ \mathbf{1}' \ \ \mathbf{1}' \ \ \mathbf{0}' \ \ \mathbf{0}'\ )' \\ A &:= (\ Y \ \ -Y \ \ \mathrm{diag}(Y)X \ \ -\mathrm{diag}(Y)X \ \ I \ \ -I\ ) \\ b &:= \mathbf{1}. \end{aligned}$$
This example will be revisited in great detail in Section 5.

4.3. $\ell_1$-Norm Functional Component Selection

We have considered only linear functions in the original variables for conditional quantiles and separating hyperplanes so far. In general, the technique of $\ell_1$ norm regularization for variable selection can be extended to nonparametric regression and classification. Although many different extensions are possible, we discuss here a specific extension for feature selection which is well suited to a wide range of function estimation and prediction problems. In a nutshell, the space of linear functions is substituted with a rich function space such as a reproducing kernel Hilbert space (Wahba, 1990; Schölkopf and Smola, 2002), where functions are decomposed into interpretable functional components, and the decomposition corresponds to a set of different kernels which generate the functional subspaces. Let an ANOVA-like decomposition of $f$ with, say, $d$ components be $f = f_1 + \cdots + f_d$ and $K_\nu$, $\nu = 1, \ldots, d$, be the associated kernels. Non-negative weights $\theta_\nu$ are then introduced for recalibration of the functional components $f_\nu$. Treating $f_\nu$'s as features and restricting the $\ell_1$ norm of $\theta := (\theta_1, \ldots, \theta_d)'$ akin to the LASSO leads to a general procedure for feature selection and shrinkage. Detailed discussions of the idea can be found in Lin and Zhang (2006), Gunn and Kandola (2002), Zhang (2006), and Lee et al. (2006). More generally, Micchelli and Pontil (2005) treat it as a regularization procedure for optimal kernel combination.

For illustration, we consider the "θ-step" of the structured SVM in Lee et al. (2006), which yields another parametric cost LP problem. For generality, consider a $k$-category problem with potentially different misclassification costs. The class labels are coded by $k$-vectors; $y_i = (y_i^1, \ldots, y_i^k)'$ denotes a vector with $y_i^j = 1$ and $-1/(k-1)$ elsewhere if the $i$th observation falls into class $j$. $L(y_i) = (L^1_{y_i}, \ldots, L^k_{y_i})$ is a misclassification cost vector, where $L^{j}_{j'}$ is the cost of misclassifying $j'$ as $j$. The SVM aims to find $f = (f^1, \ldots, f^k)'$ closely matching an appropriate class code $y$ given $x$, which induces a classifier $\phi(x) = \arg\max_{j=1,\ldots,k} f^j(x)$. Suppose that each $f^j$ is of the form $\beta_0^j + h^j(x) := \beta_0^j + \sum_{i=1}^{n} \beta_i^j \sum_{\nu=1}^{d} \theta_\nu K_\nu(x_i, x)$. Define the squared norm of $h^j$ as $\|h^j\|_K^2 := (\beta^j)' \left(\sum_{\nu=1}^{d} \theta_\nu K_\nu\right) \beta^j$, where $\beta^j := (\beta_1^j, \ldots, \beta_n^j)'$ is the $j$th coefficient vector, and $K_\nu$ is the $n$ by $n$ kernel matrix associated with $K_\nu$. With the extended hinge loss $L\{y_i, f(x_i)\} := L(y_i)\{f(x_i) - y_i\}_+$, the structured SVM finds $f$ with $\beta$ and $\theta$ minimizing
$$\sum_{i=1}^{n} L(y_i)\{f(x_i) - y_i\}_+ + \frac{\lambda}{2} \sum_{j=1}^{k} \|h^j\|_K^2 + \lambda_\theta \sum_{\nu=1}^{d} \theta_\nu \tag{18}$$
subject to $\theta_\nu \ge 0$ for $\nu = 1, \ldots, d$. $\lambda$ and $\lambda_\theta$ are tuning parameters. By alternating estimation of $\beta$ and $\theta$, we attempt to find the optimal kernel configuration (a linear combination of pre-specified kernels) and the coefficients associated

12 with the optima kerne. The θ-step refers to optimization of the functiona component weights θ given β. More specificay, treating β as fixed, the weights of the features are chosen to minimize { } ( k d (L j ) β j 0 + θ ν K ν β j y j + λ k d ) d 2 ν= j=(β j ) θ ν K ν β j + λ θ θ ν, ν= ν= j= where L j := (L j y,...,l j y n ) and y j = (y j,..., yj n). This optimization probem can be rephrased as min k ζ R nk, θ R d j= (Lj ) (ζ j ) + + λ ( d 2 ν= θ k ν j= (βj ) K ν β j) + λ d θ ν= θ ν d s.t. ν= θ νk ν β j ζ j = y j β j 0 for j =,...,k θ ν 0 for ν =,...,d. + Let g ν := (λ/2) k j= (βj ) K ν β j, g := (g,, g d ), L := ( (L ),, (L k ) )., and ζ := ((ζ ),, (ζ )) k Aso, et X := K β K d β K β k K d β k Then the foowing eements define the θ-step as a parametric cost LP indexed by λ θ with N = d + 2nk variabes and M = nk equaity constraints:. z := ( θ (ζ + ) (ζ ) ) c := ( g L 0 ) a := ( 0 0 ) A := ( X I I ) b := ((y β 0),..., (y k β k 0) ) Computation The LP probems for the foregoing three exampes share a simiar structure that can be expoited in computation. First of a, the A matrix has both I and I as its sub-matrices, and the entries of the penaty coefficient vector a corresponding to I and I in A are zero. Thus, the ranks of A and A Ia are M, and the initia optima soution exists and can be easiy identified. Due to the specia structure of A Ia, it is easy to find a basic index set B Ia for the initia LP probem in (9) which gives a feasibe soution. For instance, a feasibe basic soution can be obtained by constructing a basic index set B such that for b j 0, we choose the jth index from those for I, and otherwise from the indices for I. For the θ-step of structured SVM, B itsef is the initia optima basic index set, and it gives a trivia initia soution. For the -norm SVM and -norm quantie regression, the basic index set defined above is not optima. However, the initia optima basic index set can be obtained easiy from B. 
In general, the tableau-simplex algorithm in Section 3 can be used to find the optimal basic index set of a standard LP problem, taking any feasible basic index set $\mathcal{B}$ as a starting point. The necessary modification of the algorithm for standard LP problems is that the entry index $j^l \in \mathcal{N}$ is chosen to satisfy $\check{c}_{j^l} < 0$ at Step 3. For $\mathcal{B}$, all but the indices $j$ for $\beta_0^+$ and $\beta_0^-$ satisfy $\check{c}_j \ge 0$. Therefore, one of the indices for $\beta_0$ will move into the basic index set first by the algorithm, and it may take some iterations to get the initial optimal index set for the two regularization problems.

A tableau contains all the information on the current LP solution and the terms necessary for the next update. To discuss the computational complexity of the tableau updating algorithm in Section 3.2.2, let $T^l$ denote the tableau, an $(N+1) \times (M+2)$ matrix associated with the current optimal basic index set $\mathcal{B}^l$. For a compact statement of the updating formula, assume that the tableau is rearranged such that the pivot columns and the pivot rows precede the zeroth column and the cost row and the penalty row, respectively. For the entry index $j^l$ and exit index $i^l$ defined in the algorithm, $T^l_{*j^l}$ denotes its $j^l$th column vector, $T^l_{i^l *}$ the $i^l$th row vector of $T^l$, and $T^l_{i^l j^l}$ the $(i^l, j^l)$th entry of $T^l$. The proof of Theorem 10 in Appendix D implies the following updating formula:

\[ T^{l+1} = T^l - \frac{1}{T^l_{i^l j^l}} \big( T^l_{*j^l} - e_{i^l} \big) T^l_{i^l *}. \qquad (19) \]
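The rank-one form of the update in (19) can be illustrated with a small pure-Python sketch. It is a generic pivot on a dense array under assumed conventions (the function name `pivot_update` and the list-of-rows storage are hypothetical), not the compressed tableau the report alludes to.

```python
def pivot_update(T, i, j):
    """Return T - (T[:, j] - e_i) T[i, :] / T[i][j] for a list-of-rows tableau.

    Row i ends up divided by the pivot entry, and every other row k has
    T[k][j] / T[i][j] times row i subtracted, so column j becomes e_i.
    """
    pivot = T[i][j]
    if pivot == 0:
        raise ValueError("pivot entry must be non-zero")
    pivot_row = T[i]
    updated = []
    for k, row in enumerate(T):
        # k-th entry of the column vector (T[:, j] - e_i), scaled by 1/pivot
        factor = (row[j] - (1.0 if k == i else 0.0)) / pivot
        updated.append([row[c] - factor * pivot_row[c] for c in range(len(row))])
    return updated


T = [[2.0, 1.0, 4.0],
     [1.0, 3.0, 5.0]]
print(pivot_update(T, 0, 0))
```

Every entry is touched once per pivot, so the cost of one update is proportional to the number of rows times the number of columns of the tableau.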

Therefore, the computational complexity of the tableau updating is approximately $O(MN)$ for each iteration in general. For the three examples, tableau update can be further streamlined. Exploiting the structure of $A$ with paired columns and fixed elements in the tableau associated with $\mathcal{B}^l$, we can compress each tableau, retaining the information about the current tableau, and update the reduced tableau instead. We leave discussion of implementation details elsewhere, but updating such a reduced tableau has the complexity of $O((N_g - M)M)$ for each iteration, where $N_g$ is the reduced number of columns in $A$ counting only one for each of the paired columns. As a result, when the tableau algorithm stops in $J$ iterations, the complexity of both the $\ell_1$-norm SVM and $\ell_1$-norm QR as a whole is $O((p+1)nJ)$ while that of the θ-step of structured SVM is roughly $O(dnkJ)$, where $p$ is the number of variables, $d$ is the number of kernel functions, and $k$ is the number of classes.

5. A Closer Look at the $\ell_1$-Norm Support Vector Machine

Taking the $\ell_1$-norm SVM as a case in point, we describe the implications of the tableau-simplex algorithm for generating the solution path. Zhu et al. (2004) provide a specific path-finding algorithm for the $\ell_1$-norm SVM in the complexity-bounded formulation of (7) and give a careful treatment of this particular problem. We discuss the correspondence and generality of the tableau-simplex algorithm in comparison with their algorithm.

5.1. Status Sets

For the SVM problem with the complexity bound $s$, i.e. $\|\beta\|_1 \le s$, let $\beta_0(s)$ and $\beta(s) := (\beta_1(s), \dots, \beta_p(s))'$ be the optimal solution at $s$. Zhu et al. (2004) categorize the variables and cases that are involved in the regularized LP problem as follows:

Active set: $\mathcal{A}(s) := \{j : \beta_j(s) \ne 0,\ j = 0, 1, \dots, p\}$
Elbow set: $\mathcal{E}(s) := \{i : y_i\{\beta_0(s) + x_i \beta(s)\} = 1,\ i = 1, \dots, n\}$
Left set: $\mathcal{L}(s) := \{i : y_i\{\beta_0(s) + x_i \beta(s)\} < 1,\ i = 1, \dots, n\}$
Right set: $\mathcal{R}(s) := \{i : y_i\{\beta_0(s) + x_i \beta(s)\} > 1,\ i = 1, \dots, n\}$.
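The four status sets can be computed directly from a fitted coefficient vector. The following Python sketch is illustrative only: the function name `status_sets` and the numerical tolerance `tol` (used to decide when a margin equals 1 or a coefficient equals 0 in floating point) are assumptions, not part of the definitions above.

```python
def status_sets(beta0, beta, X, y, tol=1e-9):
    """Classify coefficients and cases into active, elbow, left and right sets.

    Coefficients are indexed j = 0, 1, ..., p (0 is the intercept) and
    cases i = 1, ..., n, matching the definitions in the text.
    """
    coefs = [beta0] + list(beta)
    active = [j for j, bj in enumerate(coefs) if abs(bj) > tol]
    elbow, left, right = [], [], []
    for i, (xi, yi) in enumerate(zip(X, y), start=1):
        margin = yi * (beta0 + sum(b * x for b, x in zip(beta, xi)))
        if abs(margin - 1.0) <= tol:
            elbow.append(i)      # margin equal to 1
        elif margin < 1.0:
            left.append(i)       # margin below 1, positive hinge loss
        else:
            right.append(i)      # margin above 1, zero hinge loss
    return active, elbow, left, right


X = [[1.0, 5.0], [0.5, 2.0], [-2.0, 1.0]]
y = [1, 1, -1]
print(status_sets(0.0, [1.0, 0.0], X, y))
```

In this toy example the margins are 1, 0.5 and 2, so the three cases fall into the elbow, left and right sets respectively, and only $\beta_1$ is active.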
Now, consider the solution $z(s)$ given by the tableau-simplex algorithm as defined in Section 4.2 and the equality constraints of $Az(s) = b$, that is,

\[ Az(s) := \beta_0(s) y + \mathrm{diag}(y) X \beta(s) + \zeta(s) = \mathbf{1}. \]

It is easy to see that for any solution $z(s)$, its non-zero elements must be one of the following types, and hence associated with $\mathcal{A}(s)$, $\mathcal{L}(s)$, and $\mathcal{R}(s)$:

$\beta_j^+(s) > 0$ or $\beta_j^-(s) > 0$ (but not both) $\Rightarrow j \in \mathcal{A}(s)$;
$\zeta_i^+(s) > 0$ and $\zeta_i^-(s) = 0$ $\Rightarrow i \in \mathcal{L}(s)$;
$\zeta_i^+(s) = 0$ and $\zeta_i^-(s) > 0$ $\Rightarrow i \in \mathcal{R}(s)$.

On the other hand, if $\zeta_i^+(s) = 0$ and $\zeta_i^-(s) = 0$, then $i \in \mathcal{E}(s)$, the elbow set.

5.2. Assumption

Suppose that the $l$th joint solution at $s = s_l$ is non-degenerate. Then $z_j(s_l) > 0$ if and only if $j \in \mathcal{B}^l$. Since the basic index set has $M = n$ elements, this gives $|\mathcal{A}(s_l)| + |\mathcal{L}(s_l)| + |\mathcal{R}(s_l)| = n$. Since $\mathcal{E}(s) \cup \mathcal{L}(s) \cup \mathcal{R}(s) = \{1, \dots, n\}$ for any $s$, the relationship that $|\mathcal{A}(s_l)| = |\mathcal{E}(s_l)|$ must hold for all the joint solutions. In fact, the equality of the cardinality of the active set and the elbow set is stated as an assumption for uniqueness of the solution in the algorithm of Zhu et al. (2004). The implicit assumption of $z^l_{\mathcal{B}^l} > 0$ at each joint implies $z^{l+1} \ne z^l$, the non-degeneracy assumption for the simplex algorithm. Thus the simplex algorithm is less

restrictive. In practice, the assumption that joint solutions are non-degenerate may not hold, especially when important predictors are discrete or coded categorical variables such as gender. For instance, the initial solution of the $\ell_1$-norm SVM violates the assumption in most cases, requiring a separate treatment for finding the next joint solution after initialization. In general, there could be more than one degenerate joint solution along the solution path. This would make the tableau-simplex algorithm appealing as it does not rely on any restrictive assumption.

5.3. Duality in Algorithm

To move from one joint solution to the next, the simplex algorithm finds the entry index $j^l$. For the $\ell_1$-norm SVM, each index is associated with either $\beta_j$ or $\zeta_i$. Under the non-degeneracy assumption, the variable associated with $j^l$ must change from zero to non-zero after the joint ($s > s_l$). Therefore, only one of the following events as defined in Zhu et al. (2004) can happen immediately after a joint solution:

$\beta_j(s_l) = 0$ becomes $\beta_j(s) \ne 0$, i.e., an inactive variable becomes active;
$\zeta_i(s_l) = 0$ becomes $\zeta_i(s) \ne 0$, i.e., an element leaves the elbow set and joins either the left set or the right set.

In conjunction with the entry index, the simplex algorithm determines the leaving index, which accompanies one of the reverse events. The algorithm in Zhu et al. (2004), driven by the Karush-Kuhn-Tucker optimality conditions, seeks the event with the smallest $\Delta\mathrm{loss}/\Delta s$, in other words, the one that decreases the cost with the fastest rate. The simplex algorithm is consistent with this existing algorithm. As in (10), recall that the entry index $j^l$ is chosen to minimize $\check{c}_j/\check{a}_j$ among $j \in \mathcal{N} \setminus \mathcal{B}^l$ with $\check{a}_j > 0$. $\mathcal{N} \setminus \mathcal{B}^l$ contains those indices corresponding to $j \notin \mathcal{A}(s_l)$ or $i \in \mathcal{E}(s_l)$. Analogous to the optimal moving direction $d^l$ in (), define $v^j = (v^j_1, \dots, v^j_N)'$ such that $v^j_{\mathcal{B}^l} = -A_{\mathcal{B}^l}^{-1} A_j$, $v^j_j = 1$, and $v^j_i = 0$ for $i \in \mathcal{N} \setminus (\mathcal{B}^l \cup \{j\})$. Then

\[ \check{a}_j := a_j - a_{\mathcal{B}^l}' A_{\mathcal{B}^l}^{-1} A_j = a' v^j \propto \Delta s_j \quad \text{and} \quad \check{c}_j := c_j - c_{\mathcal{B}^l}' A_{\mathcal{B}^l}^{-1} A_j = c' v^j \propto \Delta \mathrm{loss}_j. \]
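The ratio rule for the entry index stated above can be sketched as follows. The sketch assumes the reduced penalties $\check{a}_j$ and reduced costs $\check{c}_j$ have already been computed and are passed in as dictionaries keyed by the non-basic index; the function name `choose_entry_index` and this data layout are hypothetical.

```python
def choose_entry_index(c_check, a_check, nonbasic):
    """Return the non-basic index minimizing c_check[j] / a_check[j].

    Only indices with a_check[j] > 0 are eligible, i.e. those whose entry
    increases the complexity bound s; None is returned if there is none.
    """
    candidates = [j for j in nonbasic if a_check[j] > 0]
    if not candidates:
        return None
    # The smallest ratio is the steepest decrease in loss per unit increase in s.
    return min(candidates, key=lambda j: c_check[j] / a_check[j])


c_check = {2: -3.0, 4: -1.0, 5: -4.0}
a_check = {2: 1.0, 4: 0.5, 5: -2.0}
print(choose_entry_index(c_check, a_check, [2, 4, 5]))
```

Here index 5 is excluded because its reduced penalty is negative, and index 2 wins with ratio $-3$ against $-2$ for index 4.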
Thus, the index chosen by the simplex algorithm in (10) maximizes the rate of reduction in the cost, $\Delta\mathrm{loss}/\Delta s$. The existing $\ell_1$-norm SVM path algorithm needs to solve roughly $p$ groups of $|\mathcal{E}|$-variate linear equation systems for each iteration. Its computational complexity can be $O(p|\mathcal{E}|^2 + p|\mathcal{L}|)$ if the Sherman-Morrison updating formula is used. On the other hand, the computational complexity of the tableau-simplex algorithm is $O(pn)$ for each iteration as mentioned in Section 4. Therefore, the former could be faster if $n/p$ is large; otherwise, the tableau-simplex algorithm is faster. Most of the arguments in this section also apply to the comparison of the simplex algorithms with the extended solution path algorithm for the $\ell_1$-norm multi-class SVM by Wang and Shen (2006).

6. Numerical Results

We illustrate the use of the tableau-simplex algorithm for the parametric LP in statistical applications with two simulated examples and analysis of real data, and discuss model selection or variable selection problems therein.

6.1. Classification

Consider a simple binary classification problem with linear classifiers. In this simulation, 10-dimensional independent covariates are generated from the standard normal distribution, $x := (x_1, \dots, x_{10}) \sim N(0, I)$, and the response variable is generated via the following probit model:

\[ Y = \mathrm{sign}(\beta_0 + x\beta + \epsilon), \qquad (20) \]

where $\epsilon \sim N(0, \sigma^2)$, and $x$ and $\epsilon$ are assumed to be mutually independent. Let $\phi(x) = \mathrm{sign}(\hat\beta_0 + x\hat\beta)$ denote a linear classifier with $\hat\beta_0$ and $\hat\beta$ estimated from data. Under the probit model, the theoretical error rate of $\phi(x)$ can be analytically obtained as follows. Given $\beta_0$ and $\beta$,

\[ \Pr\{ Y \ne \mathrm{sign}(\hat\beta_0 + X\hat\beta) \} = \Phi(-\hat u_0) + E\left\{ \Phi\left( -\frac{u_0 + \hat u' u Z}{\sqrt{(1 + 1/\mathrm{SNR}) - (\hat u' u)^2}} \right) \mathrm{sign}(Z + \hat u_0) \right\}, \]

where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution, $Z$ is a standard normal random variable, $u_0 := \beta_0/\|\beta\|_2$, $u := \beta/\|\beta\|_2$, $\hat u_0 := \hat\beta_0/\|\hat\beta\|_2$, and $\hat u := \hat\beta/\|\hat\beta\|_2$. The SNR refers to the signal-to-noise ratio defined as $\mathrm{var}(x\beta)/\sigma^2$ in this case. Note that the error rate is invariant to scaling of $(\hat\beta_0, \hat\beta)$. Setting $\sigma^2 = 50$, $\beta_0 = 0$, and $\beta = (2, 0, 2, 0, 2, 0, 0, 0, 0, 2)'$, we have $\mathrm{var}(x\beta) = \|\beta\|_2^2 = 16$ and hence the SNR of $16/50 = 0.32$. Then, for the Bayes decision rule, in particular, we have the error rate of

\[ \Pr\{ Y \ne \mathrm{sign}(\beta_0 + X\beta) \} = \frac{1}{2} - \frac{1}{\pi} \arctan\sqrt{\mathrm{SNR}} \approx 0.336, \qquad (21) \]

which is the minimum possible value under the probit model (20).

Figure 1 shows the coefficient paths of the $\ell_1$-norm SVM indexed by $\log(\lambda)$ (piecewise constant) and $s$ (piecewise linear) for a simulated data set of size 400 from the model. Clearly, as $1/\lambda$ or $s$ increases, those estimated coefficients corresponding to the non-zero $\beta_j$'s ($j = 1, 3, 5$, and 10) grow large very quickly. The error rate associated with the solution at each point of the paths is theoretically available for this example, and thus the optimal value of the regularization parameter can be defined. However, in practice, $\lambda$ (or $s$) needs to be chosen data-dependently, and this gives rise to an important class of model selection problems in general. To assess the feasibility of a data-dependent choice of $\lambda$, we carried out cross validation and made comparison with the theoretically optimal values. The dashed lines in Figure 1 indicate the optimal values of $\lambda$ (or $s$) chosen by five-fold cross validation with 0-1 loss (blue) and hinge loss (red), respectively. The discontinuity of the 0-1 loss tends to give jagged cross validation curves, which have an adverse effect on identification of the optimal value of the tuning parameter. To increase the stability, one may smooth out individual cross validated error rate curves by averaging them over different splits of the data. To that effect, cross validation was repeated 50 times with respect to the 0-1 loss and the hinge loss for averaging.
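The SNR and the Bayes error rate in (21) for this simulation setting can be checked numerically. The sketch below is a hedged illustration: the function name `bayes_error` is hypothetical, and the closed form it uses is the one in (21), which applies to the case $\beta_0 = 0$ with $x \sim N(0, I)$ so that $\mathrm{var}(x\beta) = \|\beta\|_2^2$.

```python
import math


def bayes_error(beta, sigma2):
    """Return (SNR, Bayes error) for the probit model with beta_0 = 0.

    With x ~ N(0, I), var(x beta) equals the squared Euclidean norm of beta,
    and the Bayes error is 1/2 - arctan(sqrt(SNR)) / pi as in (21).
    """
    snr = sum(b * b for b in beta) / sigma2
    error = 0.5 - math.atan(math.sqrt(snr)) / math.pi
    return snr, error


snr, error = bayes_error([2, 0, 2, 0, 2, 0, 0, 0, 0, 2], 50.0)
print(round(snr, 2), round(error, 3))
```

As the SNR grows the arctangent term approaches $1/2$ and the Bayes error tends to zero, while an SNR of zero gives an error of $1/2$, which matches the intuition for pure noise.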
Fig. 1. The solution paths of the $\ell_1$-norm SVM for simulated data. The numbers at the end of the paths are the indices of the $\beta$'s. The values of $\log(\lambda)$ (or $s$) with the minimum five-fold cross validated error rate and hinge loss are indicated by the blue and red dashed lines, respectively.

Figure 2 displays the path of average misclassification rates from five-fold cross validation over the training data and the true error rate path for the $\ell_1$-norm SVM under the probit model. The true error rates were approximated by numerical integration up to the precision of $10^{-4}$. Selection of $s$ by cross validation with the 0-1 loss and hinge loss gave very similar results. The smallest error rate achieved by the $\ell_1$-norm SVM for this particular training data set is approximately 0.34, which is fairly close to the Bayes error rate. We observe that the linear classifiers at both of the chosen values include the four relevant predictors.

6.2. Quantile Regression

For another example, consider a quantile regression problem where covariates are simulated by the same setting as in the previous example, but a continuous response variable is defined by $Y = \beta_0 + x\beta + \epsilon$. Under the assumption that

$\epsilon \sim N(0, \sigma^2)$, the theoretical $\tau$th conditional quantile function is given by $m_\tau(x) = \sigma\Phi^{-1}(\tau) + \beta_0 + x\beta$.

Fig. 2. Error rate paths. The left panel shows the path of average misclassification rates from five-fold cross validation of the training data repeated 50 times, and the right panel shows the true error rate path for the $\ell_1$-norm SVM under the probit model. Both panels are indexed by $s$. The vertical dashed lines are the same as in Figure 1. The cross on the path in the right panel pinpoints the value of $s$ with the minimum error rate, and the gray horizontal dashed line indicates the Bayes error rate.

Restricting to linear functions only, suppose that an estimated $\tau$th conditional quantile function is $f(x) = \hat\beta_0 + x\hat\beta$. With respect to the check function as a loss, one can calculate the risk of $f$, which is defined by

\[ R(f; \beta_0, \beta) := E\{ \tau (Y - \hat\beta_0 - X\hat\beta)_+ + (1 - \tau)(Y - \hat\beta_0 - X\hat\beta)_- \} \]
\[ = \left\{ \tau - \Phi\left( \frac{\hat\beta_0 - \beta_0}{\sqrt{\sigma^2 + \|\beta - \hat\beta\|_2^2}} \right) \right\} (\beta_0 - \hat\beta_0) + \sqrt{\frac{\sigma^2 + \|\beta - \hat\beta\|_2^2}{2\pi}} \exp\left\{ -\frac{(\hat\beta_0 - \beta_0)^2}{2(\sigma^2 + \|\beta - \hat\beta\|_2^2)} \right\}. \]

For each $\tau$, the true risk of $m_\tau(x)$ is $(\sigma/\sqrt{2\pi}) \exp\{-\Phi^{-1}(\tau)^2/2\}$, which represents the minimal achievable risk. Note that the maximum of the minimal risks occurs when $\tau = 0.5$ in this case, i.e., for the median, and the true conditional median function is $m_{0.5}(x) = \beta_0 + x\beta$ with the risk of $\sigma/\sqrt{2\pi}$.

Figure 3 shows the coefficient paths of the $\ell_1$-norm median regression applied to simulated data of size 400. Similarly, Figure 4 shows the corresponding path of the averaged 10-fold cross validated risk with respect to the check loss from 0 repetitions and its corresponding theoretical risk path. At the value of $\lambda$ chosen by cross validation, the four correct predictors and one extra predictor have non-zero coefficients, and the theoretical risk of the selected model is not far from the minimal risk denoted by the horizontal reference line. We note that the complete risk path levels off roughly after $\log\lambda = 5$, implying that moderately regularized models are almost as good as the full model of the unconstrained solution.
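The closed-form check-loss risk above lends itself to a quick numerical sanity check. The following Python sketch uses only the standard library; the function names `check_loss_risk` and `minimal_risk` are hypothetical, and the formulas are the ones stated in the text under the assumptions $x \sim N(0, I)$ and $\epsilon \sim N(0, \sigma^2)$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf, pdf and inv_cdf


def check_loss_risk(tau, beta0, beta, beta0_hat, beta_hat, sigma):
    """Risk of f(x) = beta0_hat + x beta_hat under the check loss."""
    # Y - f(X) is normal with mean beta0 - beta0_hat and this standard deviation.
    s = math.sqrt(sigma ** 2 + sum((b - bh) ** 2 for b, bh in zip(beta, beta_hat)))
    shift = beta0_hat - beta0
    return (tau - STD_NORMAL.cdf(shift / s)) * (-shift) + s * STD_NORMAL.pdf(shift / s)


def minimal_risk(tau, sigma):
    """Risk of the true tau-th conditional quantile function m_tau."""
    return sigma * STD_NORMAL.pdf(STD_NORMAL.inv_cdf(tau))


beta, sigma, tau = [2.0, 0.0], 1.5, 0.25
beta0_hat = 0.0 + sigma * STD_NORMAL.inv_cdf(tau)  # intercept of m_tau when beta_0 = 0
print(check_loss_risk(tau, 0.0, beta, beta0_hat, beta, sigma), minimal_risk(tau, sigma))
```

Plugging the true quantile function into the general formula reproduces the minimal risk, because the factor $\tau - \Phi(\Phi^{-1}(\tau))$ vanishes and only the density term remains.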
In terms of the risk, the realized benefit of penalization appears small compared to the previous classification example.

6.3. Income Data Analysis

For a real application, we take the income data in Hastie et al. (2001), which are extracted from a marketing database for a survey conducted in the Bay area (1987). The data set is available at tibs/eemstatlearn/. It consists of 14 demographic attributes with a mixture of categorical and continuous variables, which include age, gender, education, occupation, marital status, householder status

(own home/rent/other), and annual income among others. The main goal of the analysis is to predict the annual income of the household (or personal income if single) from the other 13 demographic attributes. The original response of the annual income takes one of the following income brackets: $< 10$, $[10, 15)$, $[15, 20)$, $[20, 25)$, $[25, 30)$, $[30, 40)$, $[40, 50)$, $[50, 75)$, and $\ge 75$ in the unit of $1,000. For simplification, we created a proxy numerical response by converting each bracket into its middle value except the first and the last ones, which were mapped to some reasonable, albeit arbitrary, values. Removing the records with missing values yields a total of 6,876 records.

Fig. 3. The solution paths of the $\ell_1$-norm median regression for simulated data, indexed by $\log(\lambda)$ and by $s$. The dashed lines specify the value of the regularization parameter with the minimum of 10-fold cross validated risk with respect to the check loss over the training data.

Because of the granularity in the response, the normal-theory regression would not be appropriate. As an alternative, we considered median regression, in particular, the $\ell_1$-norm median regression for simultaneous variable selection and prediction. In the analysis, each categorical variable with $k$ categories was coded by $(k-1)$ 0-1 dummy variables with the majority category treated as the baseline. Some genuinely numerical but bracketed predictors such as age were coded in the same way as the response. As a result, 35 variables were generated from the 13 original variables. The data set was split into a training set of 2,000 observations and a test set of 4,876 for evaluation. All the predictors were centered to zero and scaled to have the squared norm equal to the training sample size before fitting models. Inspection of the marginal associations of the original attributes with the response necessitated inclusion of a quadratic term for age. We then considered linear median regression with the main effect terms only (35 variables plus the quadratic term) and with two-way interaction terms as well as the main effects.
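The two preprocessing steps described above, dummy coding against the majority category and standardizing each predictor to mean zero and squared norm equal to the sample size, can be sketched as below. The function names `dummy_code` and `center_and_scale` and the toy values are hypothetical; this is an illustration of the stated coding scheme, not the code used for the analysis.

```python
import math
from collections import Counter


def dummy_code(values):
    """Code a categorical variable by (k - 1) 0-1 dummies, majority as baseline."""
    counts = Counter(values)
    baseline = counts.most_common(1)[0][0]
    levels = sorted(v for v in counts if v != baseline)
    rows = [[1.0 if v == level else 0.0 for level in levels] for v in values]
    return baseline, levels, rows


def center_and_scale(column):
    """Center a column to mean zero and scale it so its squared norm equals n."""
    n = len(column)
    mean = sum(column) / n
    centered = [c - mean for c in column]
    sum_sq = sum(c * c for c in centered)
    if sum_sq == 0:
        raise ValueError("a constant column cannot be scaled")
    scale = math.sqrt(n / sum_sq)
    return [c * scale for c in centered]


baseline, levels, rows = dummy_code(["rent", "own", "own", "other", "own"])
print(baseline, levels, rows[0])
print(center_and_scale([1.0, 2.0, 3.0, 4.0]))
```

Scaling to squared norm $n$ is the same as giving each predictor unit variance with divisor $n$, which puts all coefficients on a comparable footing under the $\ell_1$ penalty.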
There are potentially 53 two-way interaction terms by taking the product of each pair of the normalized main effect terms from different attributes. In an attempt to exclude nearly constant terms, we screened out any product with the relative frequency of its mode 90% or above. This resulted in addition of 69 two-way interactions to the main effects model. Note that the interaction terms were put in the partial two-way interaction model without further centering and normalization for the clarity of the model. Approximately three quarters of the interactions had their norms within 10% difference from that of the main effects.

Figure 5 shows the coefficient paths of the main effects model in the left panel and the partial two-way interaction model in the right panel for the training data set. The coefficients of the dummy variables grouped for each categorical variable are of the same color. In both models, several variables emerge at early stages as important predictors of the household income and remain important throughout the paths. Among those, the factors positively associated with household income are home ownership (in dark blue relative to renting), education (in brown), dual income due to marriage (in purple relative to "not married"), age (in skyblue), and being male (in light green). Marital status and occupation are also strong predictors. As opposed to those positive factors, being single or divorced (in red relative to "married") and being a student, clerical worker, retired or unemployed (in green relative to professionals/managers) are negatively associated with the income. So is the quadratic term of age in blue as expected.

In general, it would be too simplistic to assume that the demographic factors in the data affect the household income additively. Truthful models would need to take into account some high order interactions, reflecting


More information

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel Sequentia Decoding of Poar Codes with Arbitrary Binary Kerne Vera Miosavskaya, Peter Trifonov Saint-Petersburg State Poytechnic University Emai: veram,petert}@dcn.icc.spbstu.ru Abstract The probem of efficient

More information

Efficiently Generating Random Bits from Finite State Markov Chains

Efficiently Generating Random Bits from Finite State Markov Chains 1 Efficienty Generating Random Bits from Finite State Markov Chains Hongchao Zhou and Jehoshua Bruck, Feow, IEEE Abstract The probem of random number generation from an uncorreated random source (of unknown

More information

Nonlinear Analysis of Spatial Trusses

Nonlinear Analysis of Spatial Trusses Noninear Anaysis of Spatia Trusses João Barrigó October 14 Abstract The present work addresses the noninear behavior of space trusses A formuation for geometrica noninear anaysis is presented, which incudes

More information

Haar Decomposition and Reconstruction Algorithms

Haar Decomposition and Reconstruction Algorithms Jim Lambers MAT 773 Fa Semester 018-19 Lecture 15 and 16 Notes These notes correspond to Sections 4.3 and 4.4 in the text. Haar Decomposition and Reconstruction Agorithms Decomposition Suppose we approximate

More information

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1 Inductive Bias: How to generaize on nove data CS 478 - Inductive Bias 1 Overfitting Noise vs. Exceptions CS 478 - Inductive Bias 2 Non-Linear Tasks Linear Regression wi not generaize we to the task beow

More information

(This is a sample cover image for this issue. The actual cover is not yet available at this time.)

(This is a sample cover image for this issue. The actual cover is not yet available at this time.) (This is a sampe cover image for this issue The actua cover is not yet avaiabe at this time) This artice appeared in a journa pubished by Esevier The attached copy is furnished to the author for interna

More information

The EM Algorithm applied to determining new limit points of Mahler measures

The EM Algorithm applied to determining new limit points of Mahler measures Contro and Cybernetics vo. 39 (2010) No. 4 The EM Agorithm appied to determining new imit points of Maher measures by Souad E Otmani, Georges Rhin and Jean-Marc Sac-Épée Université Pau Veraine-Metz, LMAM,

More information

DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM

DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM MIKAEL NILSSON, MATTIAS DAHL AND INGVAR CLAESSON Bekinge Institute of Technoogy Department of Teecommunications and Signa Processing

More information

Coupling of LWR and phase transition models at boundary

Coupling of LWR and phase transition models at boundary Couping of LW and phase transition modes at boundary Mauro Garaveo Dipartimento di Matematica e Appicazioni, Università di Miano Bicocca, via. Cozzi 53, 20125 Miano Itay. Benedetto Piccoi Department of

More information

Convergence Property of the Iri-Imai Algorithm for Some Smooth Convex Programming Problems

Convergence Property of the Iri-Imai Algorithm for Some Smooth Convex Programming Problems Convergence Property of the Iri-Imai Agorithm for Some Smooth Convex Programming Probems S. Zhang Communicated by Z.Q. Luo Assistant Professor, Department of Econometrics, University of Groningen, Groningen,

More information

FRIEZE GROUPS IN R 2

FRIEZE GROUPS IN R 2 FRIEZE GROUPS IN R 2 MAXWELL STOLARSKI Abstract. Focusing on the Eucidean pane under the Pythagorean Metric, our goa is to cassify the frieze groups, discrete subgroups of the set of isometries of the

More information

The Group Structure on a Smooth Tropical Cubic

The Group Structure on a Smooth Tropical Cubic The Group Structure on a Smooth Tropica Cubic Ethan Lake Apri 20, 2015 Abstract Just as in in cassica agebraic geometry, it is possibe to define a group aw on a smooth tropica cubic curve. In this note,

More information

Analysis of Emerson s Multiple Model Interpolation Estimation Algorithms: The MIMO Case

Analysis of Emerson s Multiple Model Interpolation Estimation Algorithms: The MIMO Case Technica Report PC-04-00 Anaysis of Emerson s Mutipe Mode Interpoation Estimation Agorithms: The MIMO Case João P. Hespanha Dae E. Seborg University of Caifornia, Santa Barbara February 0, 004 Anaysis

More information

MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES

MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES Separation of variabes is a method to sove certain PDEs which have a warped product structure. First, on R n, a inear PDE of order m is

More information

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES SARAH DAY, JEAN-PHILIPPE LESSARD, AND KONSTANTIN MISCHAIKOW Abstract. One of the most efficient methods for determining the equiibria of a continuous parameterized

More information

Support Vector Machine and Its Application to Regression and Classification

Support Vector Machine and Its Application to Regression and Classification BearWorks Institutiona Repository MSU Graduate Theses Spring 2017 Support Vector Machine and Its Appication to Regression and Cassification Xiaotong Hu As with any inteectua project, the content and views

More information

Reichenbachian Common Cause Systems

Reichenbachian Common Cause Systems Reichenbachian Common Cause Systems G. Hofer-Szabó Department of Phiosophy Technica University of Budapest e-mai: gszabo@hps.ete.hu Mikós Rédei Department of History and Phiosophy of Science Eötvös University,

More information

THE REACHABILITY CONES OF ESSENTIALLY NONNEGATIVE MATRICES

THE REACHABILITY CONES OF ESSENTIALLY NONNEGATIVE MATRICES THE REACHABILITY CONES OF ESSENTIALLY NONNEGATIVE MATRICES by Michae Neumann Department of Mathematics, University of Connecticut, Storrs, CT 06269 3009 and Ronad J. Stern Department of Mathematics, Concordia

More information

c 2007 Society for Industrial and Applied Mathematics

c 2007 Society for Industrial and Applied Mathematics SIAM REVIEW Vo. 49,No. 1,pp. 111 1 c 7 Society for Industria and Appied Mathematics Domino Waves C. J. Efthimiou M. D. Johnson Abstract. Motivated by a proposa of Daykin [Probem 71-19*, SIAM Rev., 13 (1971),

More information

Two-sample inference for normal mean vectors based on monotone missing data

Two-sample inference for normal mean vectors based on monotone missing data Journa of Mutivariate Anaysis 97 (006 6 76 wwweseviercom/ocate/jmva Two-sampe inference for norma mean vectors based on monotone missing data Jianqi Yu a, K Krishnamoorthy a,, Maruthy K Pannaa b a Department

More information

Statistics for Applications. Chapter 7: Regression 1/43

Statistics for Applications. Chapter 7: Regression 1/43 Statistics for Appications Chapter 7: Regression 1/43 Heuristics of the inear regression (1) Consider a coud of i.i.d. random points (X i,y i ),i =1,...,n : 2/43 Heuristics of the inear regression (2)

More information

THE THREE POINT STEINER PROBLEM ON THE FLAT TORUS: THE MINIMAL LUNE CASE

THE THREE POINT STEINER PROBLEM ON THE FLAT TORUS: THE MINIMAL LUNE CASE THE THREE POINT STEINER PROBLEM ON THE FLAT TORUS: THE MINIMAL LUNE CASE KATIE L. MAY AND MELISSA A. MITCHELL Abstract. We show how to identify the minima path network connecting three fixed points on

More information

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES SIAM J. NUMER. ANAL. Vo. 0, No. 0, pp. 000 000 c 200X Society for Industria and Appied Mathematics VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES SARAH DAY, JEAN-PHILIPPE LESSARD, AND KONSTANTIN MISCHAIKOW

More information

BALANCING REGULAR MATRIX PENCILS

BALANCING REGULAR MATRIX PENCILS BALANCING REGULAR MATRIX PENCILS DAMIEN LEMONNIER AND PAUL VAN DOOREN Abstract. In this paper we present a new diagona baancing technique for reguar matrix pencis λb A, which aims at reducing the sensitivity

More information

BP neural network-based sports performance prediction model applied research

BP neural network-based sports performance prediction model applied research Avaiabe onine www.jocpr.com Journa of Chemica and Pharmaceutica Research, 204, 6(7:93-936 Research Artice ISSN : 0975-7384 CODEN(USA : JCPRC5 BP neura networ-based sports performance prediction mode appied

More information

Asymptotic Properties of a Generalized Cross Entropy Optimization Algorithm

Asymptotic Properties of a Generalized Cross Entropy Optimization Algorithm 1 Asymptotic Properties of a Generaized Cross Entropy Optimization Agorithm Zijun Wu, Michae Koonko, Institute for Appied Stochastics and Operations Research, Caustha Technica University Abstract The discrete

More information

Schedulability Analysis of Deferrable Scheduling Algorithms for Maintaining Real-Time Data Freshness

Schedulability Analysis of Deferrable Scheduling Algorithms for Maintaining Real-Time Data Freshness 1 Scheduabiity Anaysis of Deferrabe Scheduing Agorithms for Maintaining Rea-Time Data Freshness Song Han, Deji Chen, Ming Xiong, Kam-yiu Lam, Aoysius K. Mok, Krithi Ramamritham UT Austin, Emerson Process

More information

On the evaluation of saving-consumption plans

On the evaluation of saving-consumption plans On the evauation of saving-consumption pans Steven Vanduffe Jan Dhaene Marc Goovaerts Juy 13, 2004 Abstract Knowedge of the distribution function of the stochasticay compounded vaue of a series of future

More information

u(x) s.t. px w x 0 Denote the solution to this problem by ˆx(p, x). In order to obtain ˆx we may simply solve the standard problem max x 0

u(x) s.t. px w x 0 Denote the solution to this problem by ˆx(p, x). In order to obtain ˆx we may simply solve the standard problem max x 0 Bocconi University PhD in Economics - Microeconomics I Prof M Messner Probem Set 4 - Soution Probem : If an individua has an endowment instead of a monetary income his weath depends on price eves In particuar,

More information

Akaike Information Criterion for ANOVA Model with a Simple Order Restriction

Akaike Information Criterion for ANOVA Model with a Simple Order Restriction Akaike Information Criterion for ANOVA Mode with a Simpe Order Restriction Yu Inatsu * Department of Mathematics, Graduate Schoo of Science, Hiroshima University ABSTRACT In this paper, we consider Akaike

More information

(f) is called a nearly holomorphic modular form of weight k + 2r as in [5].

(f) is called a nearly holomorphic modular form of weight k + 2r as in [5]. PRODUCTS OF NEARLY HOLOMORPHIC EIGENFORMS JEFFREY BEYERL, KEVIN JAMES, CATHERINE TRENTACOSTE, AND HUI XUE Abstract. We prove that the product of two neary hoomorphic Hece eigenforms is again a Hece eigenform

More information

Multicategory Classification by Support Vector Machines

Multicategory Classification by Support Vector Machines Muticategory Cassification by Support Vector Machines Erin J Bredensteiner Department of Mathematics University of Evansvie 800 Lincon Avenue Evansvie, Indiana 47722 eb6@evansvieedu Kristin P Bennett Department

More information

Appendix A: MATLAB commands for neural networks

Appendix A: MATLAB commands for neural networks Appendix A: MATLAB commands for neura networks 132 Appendix A: MATLAB commands for neura networks p=importdata('pn.xs'); t=importdata('tn.xs'); [pn,meanp,stdp,tn,meant,stdt]=prestd(p,t); for m=1:10 net=newff(minmax(pn),[m,1],{'tansig','purein'},'trainm');

More information

Algorithms to solve massively under-defined systems of multivariate quadratic equations

Algorithms to solve massively under-defined systems of multivariate quadratic equations Agorithms to sove massivey under-defined systems of mutivariate quadratic equations Yasufumi Hashimoto Abstract It is we known that the probem to sove a set of randomy chosen mutivariate quadratic equations

More information

Separation of Variables and a Spherical Shell with Surface Charge

Separation of Variables and a Spherical Shell with Surface Charge Separation of Variabes and a Spherica She with Surface Charge In cass we worked out the eectrostatic potentia due to a spherica she of radius R with a surface charge density σθ = σ cos θ. This cacuation

More information

https://doi.org/ /epjconf/

https://doi.org/ /epjconf/ HOW TO APPLY THE OPTIMAL ESTIMATION METHOD TO YOUR LIDAR MEASUREMENTS FOR IMPROVED RETRIEVALS OF TEMPERATURE AND COMPOSITION R. J. Sica 1,2,*, A. Haefee 2,1, A. Jaai 1, S. Gamage 1 and G. Farhani 1 1 Department

More information

On a geometrical approach in contact mechanics

On a geometrical approach in contact mechanics Institut für Mechanik On a geometrica approach in contact mechanics Aexander Konyukhov, Kar Schweizerhof Universität Karsruhe, Institut für Mechanik Institut für Mechanik Kaiserstr. 12, Geb. 20.30 76128

More information

Turbo Codes. Coding and Communication Laboratory. Dept. of Electrical Engineering, National Chung Hsing University

Turbo Codes. Coding and Communication Laboratory. Dept. of Electrical Engineering, National Chung Hsing University Turbo Codes Coding and Communication Laboratory Dept. of Eectrica Engineering, Nationa Chung Hsing University Turbo codes 1 Chapter 12: Turbo Codes 1. Introduction 2. Turbo code encoder 3. Design of intereaver

More information

A Graphical Approach for Solving Single Machine Scheduling Problems Approximately

A Graphical Approach for Solving Single Machine Scheduling Problems Approximately A Graphica Approach for Soving Singe Machine Scheduing Probems Approximatey Evgeny R Gafarov Aexandre Dogui Aexander A Lazarev Frank Werner Institute of Contro Sciences of the Russian Academy of Sciences,

More information

4 1-D Boundary Value Problems Heat Equation

4 1-D Boundary Value Problems Heat Equation 4 -D Boundary Vaue Probems Heat Equation The main purpose of this chapter is to study boundary vaue probems for the heat equation on a finite rod a x b. u t (x, t = ku xx (x, t, a < x < b, t > u(x, = ϕ(x

More information

Schedulability Analysis of Deferrable Scheduling Algorithms for Maintaining Real-Time Data Freshness

Schedulability Analysis of Deferrable Scheduling Algorithms for Maintaining Real-Time Data Freshness 1 Scheduabiity Anaysis of Deferrabe Scheduing Agorithms for Maintaining Rea- Data Freshness Song Han, Deji Chen, Ming Xiong, Kam-yiu Lam, Aoysius K. Mok, Krithi Ramamritham UT Austin, Emerson Process Management,

More information

A unified framework for Regularization Networks and Support Vector Machines. Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio

A unified framework for Regularization Networks and Support Vector Machines. Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES A.I. Memo No. 1654 March23, 1999

More information

ASummaryofGaussianProcesses Coryn A.L. Bailer-Jones

ASummaryofGaussianProcesses Coryn A.L. Bailer-Jones ASummaryofGaussianProcesses Coryn A.L. Baier-Jones Cavendish Laboratory University of Cambridge caj@mrao.cam.ac.uk Introduction A genera prediction probem can be posed as foows. We consider that the variabe

More information

Paper presented at the Workshop on Space Charge Physics in High Intensity Hadron Rings, sponsored by Brookhaven National Laboratory, May 4-7,1998

Paper presented at the Workshop on Space Charge Physics in High Intensity Hadron Rings, sponsored by Brookhaven National Laboratory, May 4-7,1998 Paper presented at the Workshop on Space Charge Physics in High ntensity Hadron Rings, sponsored by Brookhaven Nationa Laboratory, May 4-7,998 Noninear Sef Consistent High Resoution Beam Hao Agorithm in

More information

ORTHOGONAL MULTI-WAVELETS FROM MATRIX FACTORIZATION

ORTHOGONAL MULTI-WAVELETS FROM MATRIX FACTORIZATION J. Korean Math. Soc. 46 2009, No. 2, pp. 281 294 ORHOGONAL MLI-WAVELES FROM MARIX FACORIZAION Hongying Xiao Abstract. Accuracy of the scaing function is very crucia in waveet theory, or correspondingy,

More information

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract Stochastic Compement Anaysis of Muti-Server Threshod Queues with Hysteresis John C.S. Lui The Dept. of Computer Science & Engineering The Chinese University of Hong Kong Leana Goubchik Dept. of Computer

More information

<C 2 2. λ 2 l. λ 1 l 1 < C 1

<C 2 2. λ 2 l. λ 1 l 1 < C 1 Teecommunication Network Contro and Management (EE E694) Prof. A. A. Lazar Notes for the ecture of 7/Feb/95 by Huayan Wang (this document was ast LaT E X-ed on May 9,995) Queueing Primer for Muticass Optima

More information

Robust Sensitivity Analysis for Linear Programming with Ellipsoidal Perturbation

Robust Sensitivity Analysis for Linear Programming with Ellipsoidal Perturbation Robust Sensitivity Anaysis for Linear Programming with Eipsoida Perturbation Ruotian Gao and Wenxun Xing Department of Mathematica Sciences Tsinghua University, Beijing, China, 100084 September 27, 2017

More information

Rate-Distortion Theory of Finite Point Processes

Rate-Distortion Theory of Finite Point Processes Rate-Distortion Theory of Finite Point Processes Günther Koiander, Dominic Schuhmacher, and Franz Hawatsch, Feow, IEEE Abstract We study the compression of data in the case where the usefu information

More information

Alberto Maydeu Olivares Instituto de Empresa Marketing Dept. C/Maria de Molina Madrid Spain

Alberto Maydeu Olivares Instituto de Empresa Marketing Dept. C/Maria de Molina Madrid Spain CORRECTIONS TO CLASSICAL PROCEDURES FOR ESTIMATING THURSTONE S CASE V MODEL FOR RANKING DATA Aberto Maydeu Oivares Instituto de Empresa Marketing Dept. C/Maria de Moina -5 28006 Madrid Spain Aberto.Maydeu@ie.edu

More information

Moreau-Yosida Regularization for Grouped Tree Structure Learning

Moreau-Yosida Regularization for Grouped Tree Structure Learning Moreau-Yosida Reguarization for Grouped Tree Structure Learning Jun Liu Computer Science and Engineering Arizona State University J.Liu@asu.edu Jieping Ye Computer Science and Engineering Arizona State

More information

MATRIX CONDITIONING AND MINIMAX ESTIMATIO~ George Casella Biometrics Unit, Cornell University, Ithaca, N.Y. Abstract

MATRIX CONDITIONING AND MINIMAX ESTIMATIO~ George Casella Biometrics Unit, Cornell University, Ithaca, N.Y. Abstract MATRIX CONDITIONING AND MINIMAX ESTIMATIO~ George Casea Biometrics Unit, Corne University, Ithaca, N.Y. BU-732-Mf March 98 Abstract Most of the research concerning ridge regression methods has deat with

More information