Stochastic Subgradient Methods


Lingjie Weng, Yutian Chen
Bren School of Information and Computer Science, University of California, Irvine
{wengl,

Abstract

Stochastic subgradient methods play an important role in machine learning. We introduce the concepts of subgradient methods and stochastic subgradient methods in this project, discuss their convergence conditions, and weigh their strengths and weaknesses against those of their competitors. We demonstrate the application of (stochastic) subgradient methods to machine learning with a running example of training support vector machines (SVMs) throughout this report.

1 Introduction

We have explored a variety of optimization methods in class (e.g., generalized descent methods, Newton's method, interior-point methods), all of which apply to differentiable functions. However, many functions that arise in practice may be non-differentiable at certain points. A common example is the absolute value function f(x) = |x|, or its multivariate extension, the \ell_1-norm f(x) = \|x\|_1 = \sum_{i=1}^n |x_i|. We therefore aim to discuss a generalization of the gradient to non-differentiable convex functions.

In many applications the exact subgradient can be difficult to calculate because of errors in measurements, uncertainty in the data, or because the computation is intractable. However, it is usually possible to obtain a noisy estimate of the subgradient. In this case we can use the noisy estimate in place of the true value in the subgradient method, which is then called the stochastic subgradient method. We will show in this report that many convergence results still apply to the stochastic version, and that in expectation the solution achieves the same order of convergence rate as the exact subgradient method, often with much less computation. The example of training SVMs with a stochastic subgradient method illustrates its advantage over deterministic counterparts. It also shows the close connection between stochastic optimization and machine learning theory. This part of the report is mostly based on the lecture notes of Stephen Boyd [1] and a tutorial on stochastic optimization at ICML 2010 [8].

2 Subgradient Methods

2.1 Definition

A vector g \in R^n is a subgradient of a convex function f at x if

    f(y) \ge f(x) + g^T (y - x)  for all y.   (1)

If f is convex and differentiable, then its gradient at x is a subgradient. But a subgradient can exist even when f is not differentiable at x, as illustrated in Figure 1. The same example shows that there can be more than one subgradient of a function f at a point x. The set of all subgradients at x is called the subdifferential, and is denoted by \partial f(x).
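For concreteness, a minimal Python sketch that picks one particular subgradient of the \ell_1-norm and numerically checks the defining inequality (1) at randomly sampled points (the helper names are hypothetical):

    import numpy as np

    def l1_subgradient(x):
        """One subgradient of f(x) = ||x||_1: g_i = sign(x_i) when x_i != 0,
        and any value in [-1, 1] (here 0) when x_i == 0."""
        return np.sign(x)

    def check_subgradient_inequality(f, g, x, n_trials=1000, scale=5.0):
        """Numerically verify f(y) >= f(x) + g^T (y - x) for randomly drawn y."""
        rng = np.random.default_rng(0)
        for _ in range(n_trials):
            y = x + scale * rng.standard_normal(x.shape)
            if f(y) < f(x) + g @ (y - x) - 1e-12:
                return False
        return True

    x = np.array([1.5, 0.0, -2.0])
    f = lambda v: np.abs(v).sum()
    print(check_subgradient_inequality(f, l1_subgradient(x), x))  # True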

Figure 1: At x_1, the convex function f is differentiable, and g_1 (the derivative of f at x_1) is the unique subgradient at x_1. At the point x_2, f is not differentiable; at this point f has many subgradients, two of which, g_2 and g_3, are shown. Taken from [1].

2.2 Convergence of subgradient methods

Subgradient methods are iterative methods for solving convex minimization problems. Let f : R^n \to R be a convex function with domain R^n. A classical subgradient method iterates

    x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)},   (2)

where x^{(k)} denotes the kth iterate, g^{(k)} denotes any subgradient of f at x^{(k)}, and \alpha_k > 0 is the kth step size. Thus, at each iteration of the subgradient method, we take a step in the direction of a negative subgradient. Recall that a subgradient of f at x is any vector g that satisfies the inequality f(y) \ge f(x) + g^T (y - x) for all y. It may happen that g^{(k)} is not a descent direction for f at x^{(k)}. We therefore maintain a running value f_{best} that keeps track of the lowest objective function value found so far. At each step we set

    f^{(k)}_{best} = \min\{ f^{(k-1)}_{best}, f(x^{(k)}) \}   (3)

and set i^{(k)}_{best} = k if f(x^{(k)}) = f^{(k)}_{best}, i.e., if x^{(k)} is the best point found so far. (In a descent method there is no need to do this, since the current point is always the best one so far.) Then the best objective value found in k iterations is

    f^{(k)}_{best} = \min\{ f(x^{(1)}), \ldots, f(x^{(k)}) \}.   (4)

Subgradient methods are guaranteed to converge to within some range of the optimal value under certain assumptions. We assume (1) there is a minimizer of f, say x^*; (2) the norm of the subgradients is bounded, i.e., there is a G such that \|g^{(k)}\|_2 \le G for all k, which is the case if f satisfies the Lipschitz condition

    |f(u) - f(v)| \le G \|u - v\|_2   (5)

for all u, v, because then \|g\|_2 \le G for any g \in \partial f(x) and any x; and (3) a number R is known that satisfies R \ge \|x^{(1)} - x^*\|_2. Under these three assumptions, the subgradient algorithm is guaranteed to converge to within some range of the optimal value, i.e., we have

    \lim_{k \to \infty} f^{(k)}_{best} - f^* < \epsilon,   (6)

where f^* denotes the optimal value of the function f. The number \epsilon is a function of the step size, so different step size rules give different convergence results. The following are five basic step size rules.

(a) Constant step size, \alpha_k = \alpha.

(b) Constant step length, \alpha_k = \gamma / \|g^{(k)}\|_2, which gives \|x^{(k+1)} - x^{(k)}\|_2 = \gamma.

(c) Square summable but not summable step sizes, i.e., any step sizes satisfying

    \alpha_k \ge 0,  \sum_{k=1}^{\infty} \alpha_k^2 < \infty,  \sum_{k=1}^{\infty} \alpha_k = \infty.   (7)

(d) Nonsummable diminishing step sizes, i.e., any step sizes satisfying

    \alpha_k \ge 0,  \lim_{k \to \infty} \alpha_k = 0,  \sum_{k=1}^{\infty} \alpha_k = \infty.   (8)

(e) Nonsummable diminishing step lengths, i.e.,

    \alpha_k = \gamma_k / \|g^{(k)}\|_2,   (9)

where \gamma_k \ge 0, \lim_{k \to \infty} \gamma_k = 0, and \sum_{k=1}^{\infty} \gamma_k = \infty.

For all five rules, the step sizes are determined off-line, before the method is iterated; they do not depend on preceding iterations. This off-line property of subgradient methods differs from the on-line step size rules used in descent methods for differentiable functions: many methods for minimizing differentiable functions satisfy Wolfe's sufficient conditions for convergence, where step sizes typically depend on the current point and the current search direction. Terminating the subgradient method when

    (R^2 + G^2 \sum_{i=1}^k \alpha_i^2) / (2 \sum_{i=1}^k \alpha_i) \le \epsilon

(the left-hand side is an upper bound on f^{(k)}_{best} - f^*) is really, really slow. In fact, up to the present, the subgradient method has suffered from the lack of a definite termination criterion, so subgradient methods are usually used for problems in which very high accuracy is not required.

2.3 Subgradient methods for the unconstrained problem

Let f : R^n \to R be a convex function. We seek to solve the unconstrained problem

    p^* = \min_x f(x),  x^* \in \arg\min_{x \in R^n} f(x).   (10)

For differentiable f the optimality condition reads

    f(x^*) = \min_{x \in R^n} f(x)  \Leftrightarrow  0 = \nabla f(x^*).   (11)

For non-differentiable f the optimality condition generalizes to

    f(x^*) = \min_{x \in R^n} f(x)  \Leftrightarrow  0 \in \partial f(x^*),   (12)

which follows from the definition of the subgradient: f(x) \ge f(x^*) + 0^T (x - x^*) for all x \in R^n. The pseudocode of the subgradient method for an unconstrained convex function f is shown in Algorithm 1.

Algorithm 1 Subgradient method for minimizing unconstrained f(x)
    Initialize x^{(0)}, k = 0
    repeat
        compute a subgradient g^{(k)} \in \partial f(x^{(k)})
        x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)}
    until termination condition is satisfied
x^{(k)}: kth iterate, \alpha_k > 0: kth step size, g^{(k)}: any subgradient of f at x^{(k)}.

Let us take the L_2-norm SVM learning problem as an example. Consider the primal form of the SVM:

    \min_{w, \xi}  \|w\|^2 + C \sum_{i=1}^m \xi_i
    s.t.  \langle w, x_i \rangle y_i \ge 1 - \xi_i,  i = 1, \ldots, m,
          \xi_i \ge 0,  i = 1, \ldots, m.   (13)
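A minimal Python sketch of Algorithm 1, with the best-point bookkeeping of equation (4), a square-summable-but-not-summable step size, and a toy \ell_1 objective chosen purely for illustration:

    import numpy as np

    def subgradient_method(subgrad, f, x0, step_size, n_iters=1000):
        """Basic subgradient method (Algorithm 1) with best-point tracking.
        subgrad(x) returns any subgradient of f at x; step_size(k) returns alpha_k."""
        x = np.asarray(x0, dtype=float)
        x_best, f_best = x.copy(), f(x)
        for k in range(1, n_iters + 1):
            x = x - step_size(k) * subgrad(x)
            if f(x) < f_best:                        # not a descent method, so keep
                x_best, f_best = x.copy(), f(x)      # the best iterate seen so far
        return x_best, f_best

    # Example: minimize the nondifferentiable f(x) = ||x - c||_1 with alpha_k = 1/k.
    c = np.array([1.0, -2.0, 3.0])
    x_best, f_best = subgradient_method(lambda x: np.sign(x - c),
                                        lambda x: np.abs(x - c).sum(),
                                        np.zeros(3), lambda k: 1.0 / k)
    print(x_best, f_best)   # x_best is close to c, f_best close to 0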

Table 1: Subgradients of the function F(w, \xi).

Figure 2: The value of f^{(t)}_{best} - f^* versus iteration number t, panels (a) and (b). Taken from [10].

Note that \xi_i \ge \max\{0, 1 - \langle w, x_i \rangle y_i\} and the equality is attained at the optimum, i.e., we can set

    \xi_i = \max\{0, 1 - \langle w, x_i \rangle y_i\} = l(\langle w, x_i \rangle y_i)   (14)

(which is usually called the hinge loss). We can therefore reformulate the problem as the unconstrained convex problem

    \min_w  \|w\|^2 + C \sum_{i=1}^m l(\langle w, x_i \rangle y_i).   (15)

The first term is differentiable and the second term is a point-wise maximum, so this formulation fits the subgradient framework. The corresponding subgradients are shown in Table 1, where b is the coefficient of the constant term of x.

For this L_2-norm SVM learning problem, we considered the constant step length rule \alpha_t = \gamma / \|g^{(t)}\|_2 and the diminishing step size rule \alpha_t = \gamma / \sqrt{t}. The objective converges to a suboptimal value, \lim_{t \to \infty} f^{(t)}_{best} - f^* \le \gamma G / 2, for the first step size rule, while it converges to the optimal value, \lim_{t \to \infty} f^{(t)}_{best} - f^* = 0, for the second. Figures 2(a) and 2(b) show the convergence of f^{(t)}_{best} - f^* for different values of \gamma. The figures reveal a trade-off: a larger \gamma gives faster convergence, but a larger final suboptimality. The non-monotonicity and slow convergence of f(x^{(t)}) are also clearly visible in these plots.
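To make these subgradients concrete, a minimal sketch assuming a data matrix X whose rows are the x_i, labels y in {-1, +1}, and no bias term:

    import numpy as np

    def svm_objective(w, X, y, C):
        """Unconstrained L2-norm SVM objective (15): ||w||^2 + C * sum_i max(0, 1 - y_i <w, x_i>)."""
        margins = y * (X @ w)
        return w @ w + C * np.maximum(0.0, 1.0 - margins).sum()

    def svm_subgradient(w, X, y, C):
        """One subgradient of (15): 2w - C * sum over margin-violating examples of y_i x_i.
        For examples with margin exactly 1, any value between 0 and -y_i x_i is valid;
        here we simply take 0 (hence the strict inequality below)."""
        margins = y * (X @ w)
        violated = margins < 1.0
        return 2.0 * w - C * (y[violated, None] * X[violated]).sum(axis=0)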

Figure 3: Constrained SVM: the value of f^{(t)}_{best} - f^* versus iteration number t, panels (a) and (b). Taken from [10].

2.4 Subgradient methods for the constrained problem

One extension of the subgradient method is the projected subgradient method, which solves the constrained optimization problem

    minimize  f(x)  subject to  x \in C,   (16)

where C is a convex set. The projected subgradient method uses the iteration

    x^{(k+1)} = P( x^{(k)} - \alpha_k g^{(k)} ),   (17)

where P is the (Euclidean) projection onto C. The pseudocode of the projected subgradient method for a constrained convex function f is shown in Algorithm 2.

Algorithm 2 Subgradient method for minimizing constrained f(x)
    Initialize x^{(0)}, k = 0
    repeat
        compute a subgradient g^{(k)} \in \partial f(x^{(k)})
        \tilde{x}^{(k+1)} = x^{(k)} - \alpha_k g^{(k)}
        x^{(k+1)} = P( \tilde{x}^{(k+1)} )
    until termination condition is satisfied
x^{(k)}: kth iterate, \alpha_k > 0: kth step size, g^{(k)}: any subgradient of f at x^{(k)}.

Consider the same SVM learning example,

    \min_w  \|w\|^2 + C \sum_{i=1}^m l(\langle w, x_i \rangle y_i).   (18)

An alternative formulation of the same problem is

    \min_w  \frac{1}{m} \sum_{i=1}^m l(\langle w, x_i \rangle y_i)   s.t.  \|w\| \le B;   (19)

note that the regularization constant C > 0 is replaced here by a less abstract constant B > 0. To implement the projected subgradient method we need only a projection operator,

    P(w) = w  if \|w\| \le B,  and  P(w) = B w / \|w\|  otherwise,   (20)

so after the update w := w - \alpha_k g^{(k)} we project w back onto the ball whenever \|w\| > B.

For the reformulated L_2-norm SVM learning problem, we also plotted the convergence results for the constant step length rule and the diminishing step size rule. Figures 3(a) and 3(b) show the convergence of f^{(t)}_{best} - f^* for different values of \gamma. Comparing these two plots with Figures 2(a) and 2(b), respectively, we found that this approach converges much faster.
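A minimal sketch of Algorithm 2 applied to problem (19), under the same assumptions as before (data matrix X with rows x_i, labels y in {-1, +1}, no bias term):

    import numpy as np

    def project_l2_ball(w, B):
        """Euclidean projection of w onto the ball {w : ||w||_2 <= B}, as in (20)."""
        norm = np.linalg.norm(w)
        return w if norm <= B else (B / norm) * w

    def projected_subgradient_svm(X, y, B, step_size, n_iters=1000):
        """Projected subgradient method (Algorithm 2) for problem (19):
        minimize the average hinge loss subject to ||w||_2 <= B."""
        m, d = X.shape
        w = np.zeros(d)
        for k in range(1, n_iters + 1):
            margins = y * (X @ w)
            violated = margins < 1.0
            g = -(y[violated, None] * X[violated]).sum(axis=0) / m  # subgradient of the average hinge loss
            w = project_l2_ball(w - step_size(k) * g, B)
        return w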

The subgradient method can also be extended to solve the general inequality constrained problem

    minimize  f_0(x)  subject to  f_i(x) \le 0,  i = 1, \ldots, m,   (21)

where the f_i are convex. The algorithm takes the same form,

    x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)},   (22)

where \alpha_k > 0 is a step size and g^{(k)} is a subgradient of the objective or of one of the constraint functions at x^{(k)}. More specifically, we take

    g^{(k)} \in \partial f_0(x^{(k)})  if f_i(x^{(k)}) \le 0 for all i = 1, \ldots, m,
    g^{(k)} \in \partial f_j(x^{(k)})  for some j such that f_j(x^{(k)}) > 0, otherwise,   (23)

where \partial f denotes the subdifferential of f. In other words: if the current point is feasible, the algorithm uses an objective subgradient, as if the problem were unconstrained; if the current point is infeasible, the algorithm chooses a subgradient of any violated constraint. In this generalized version of the subgradient algorithm, the iterates can be (and often are) infeasible. In contrast, the iterates of the projected subgradient method (and the basic subgradient algorithm) are always feasible. So when the current point is infeasible, we can use an over-projection step length, as we would when solving convex inequalities. If we know (or can estimate) f^*, we can use Polyak's step length when the current point is feasible. Thus our step lengths are chosen as

    \alpha_k = (f_0(x^{(k)}) - f^*) / \|g^{(k)}\|_2^2   if x^{(k)} is feasible,
    \alpha_k = (f_i(x^{(k)}) + \epsilon) / \|g^{(k)}\|_2^2   if x^{(k)} is infeasible,   (24)

where \epsilon is a small positive margin and i is the index of the most violated inequality in the case where x^{(k)} is infeasible.

3 Stochastic Subgradient Methods

When the subgradient of the objective function is difficult to compute exactly, for reasons such as errors in measurements or intractability of the computation, we can use a noisy estimate of the subgradient for optimization. With a proper choice of the step size, we can guarantee convergence with probability 1, or even stronger conclusions. Stochastic methods also apply when the objective function itself is difficult to compute exactly. In the following subsections we introduce the stochastic subgradient descent (SD) method, its convergence conditions, and stochastic programming.

3.1 Stochastic Subgradient Methods

For a convex function f : R^n \to R, \tilde{g} is called a noisy (unbiased) subgradient of f at x \in dom f if g = E \tilde{g} \in \partial f(x), i.e.,

    f(z) \ge f(x) + (E \tilde{g})^T (z - x)  for all z \in dom f.   (25)

If x is also a random variable, then we say that \tilde{g} is a noisy subgradient of f at x (which is random) if E(\tilde{g} | x) \in \partial f(x). We can consider the noisy subgradient as the sum of g and a random variable with zero mean. The noise can come from error in computing the true subgradient, error in Monte Carlo evaluation of a function defined as an expected value, or measurement error.

The stochastic subgradient method is almost the same as the regular subgradient method, except that it uses a noisy estimate instead of the true subgradient and allows a more limited set of step size rules. The pseudocode of the stochastic subgradient method for minimizing a convex function f without constraints is shown in Algorithm 3. As in the regular subgradient method, f(x^{(k)}) can increase during the algorithm, so we keep track of the best point found so far and the associated function value

    f^{(k)}_{best} = \min\{ f(x^{(1)}), \ldots, f(x^{(k)}) \}.   (26)
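A toy sketch of a noisy unbiased subgradient in the sense of (25), obtained here by adding zero-mean Gaussian noise to a true subgradient of f(x) = \|x - c\|_1 and plugging it into the subgradient iteration with \alpha_k = 1/k (the objective and the noise model are chosen purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    c = np.array([1.0, -2.0, 3.0])

    def noisy_subgradient(x, sigma=0.5):
        """Noisy unbiased subgradient of f(x) = ||x - c||_1: the true subgradient
        sign(x - c) plus zero-mean Gaussian noise, so E[g_tilde] is in the subdifferential."""
        return np.sign(x - c) + sigma * rng.standard_normal(x.shape)

    x = np.zeros(3)
    for k in range(1, 5001):
        x = x - (1.0 / k) * noisy_subgradient(x)   # alpha_k = 1/k
    print(x)   # close to the minimizer c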

Algorithm 3 Stochastic subgradient method for minimizing f(x)
    Initialize x^{(0)}, k = 0
    repeat
        obtain a noisy subgradient \tilde{g}^{(k)} of f at x^{(k)}
        x^{(k+1)} = x^{(k)} - \alpha_k \tilde{g}^{(k)}
    until termination condition is satisfied
x^{(k)}: kth iterate, \alpha_k > 0: kth step size, \tilde{g}^{(k)}: noisy subgradient of f at x^{(k)}.

We now show a very basic convergence result for the stochastic subgradient method. Assume there is an x^* that minimizes f, and a G for which E \|\tilde{g}^{(k)}\|_2^2 \le G^2 for all k. We also assume that B satisfies E \|x^{(1)} - x^*\|_2^2 \le B^2. It can be proved that

    E f^{(k)}_{best} - f^*  \le  (B^2 + G^2 \sum_{i=1}^k \alpha_i^2) / (2 \sum_{i=1}^k \alpha_i),   (27)

so we get convergence in expectation,

    \lim_{k \to \infty} E f^{(k)}_{best} = f^*,   (28)

for various step size rules, such as square-summable-but-not-summable sequences (e.g., \alpha_k = 1/k) and nonsummable diminishing sequences (e.g., \alpha_k = 1/\sqrt{k}). Using Markov's inequality, we obtain convergence in probability, that is, for any \epsilon > 0,

    \lim_{k \to \infty} Prob( f^{(k)}_{best} \ge f^* + \epsilon ) = 0.   (29)

More sophisticated methods can be used to show almost sure convergence.

3.2 Applying SD to SVM

Let us again take the SVM learning problem as an example. Consider the norm-constrained formulation of the SVM:

    \min_{\|w\|_2 \le B} F(w) = \min_{\|w\|_2 \le B} \frac{1}{m} \sum_{i=1}^m l(\langle w, x_i \rangle y_i).   (30)

The objective function F(w) is the average hinge loss over the training set. The regular subgradient descent algorithm, referred to as the batch subgradient method, computes the subgradient as the average of the subgradients of the loss function over every data point, i.e.,

    g^{(k)} = \frac{1}{m} \sum_{i=1}^m \partial_w l(\langle w^{(k)}, x_i \rangle y_i),   (31)

and then projects the new parameter vector back onto the ball of radius B at each step, as w^{(k+1)} := B w^{(k+1)} / \|w^{(k+1)}\|, if the new vector falls outside the ball.

In contrast, we consider the stochastic version, Algorithm 3, with the extra projection step above. At each iteration, the stochastic subgradient is computed by randomly drawing a sample from the data points, x_{i(k)}, and computing the noisy subgradient as

    \tilde{g}^{(k)} = \partial_w l(\langle w^{(k)}, x_{i(k)} \rangle y_{i(k)}).   (32)

It is easy to see that E \tilde{g}^{(k)} = g^{(k)}. In this example, we choose the step size rule \alpha_k = B / (G \sqrt{k}), and the final solution is computed as \bar{w} = \frac{1}{k} \sum_{j=1}^k w^{(j)}. It can be proved that for batch subgradient descent with such a choice of step size, we achieve the suboptimality

    F(w^{(k)}) - F(w^*) \le O( G B / \sqrt{k} ),   (33)

where w^* is the optimal solution.
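A minimal sketch of this stochastic scheme, combining the single-sample update (32), the projection onto the ball \|w\|_2 \le B, the step size \alpha_k = B/(G\sqrt{k}), and the averaged output \bar{w} (same data assumptions as before):

    import numpy as np

    def sd_svm(X, y, B, G, n_iters=10000, seed=0):
        """Stochastic subgradient method for problem (30): draw one random example per
        step, take a hinge-loss subgradient step (32) with alpha_k = B / (G * sqrt(k)),
        project back onto {w : ||w||_2 <= B}, and return the averaged iterate."""
        rng = np.random.default_rng(seed)
        m, d = X.shape
        w = np.zeros(d)
        w_avg = np.zeros(d)
        for k in range(1, n_iters + 1):
            i = rng.integers(m)
            g = -y[i] * X[i] if y[i] * (X[i] @ w) < 1.0 else np.zeros(d)
            w = w - (B / (G * np.sqrt(k))) * g
            norm = np.linalg.norm(w)
            if norm > B:
                w = (B / norm) * w                 # projection onto the norm ball
            w_avg += (w - w_avg) / k               # running average of the iterates
        return w_avg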

This is the best possible convergence rate using only F(w) and \partial F(w). For SD we can get the same guarantee in expectation of the objective function:

    E[ F(\bar{w}^{(k)}) ] - F(w^*) \le O( G B / \sqrt{k} ).   (34)

Taking into consideration that the batch method takes m times as many operations as the stochastic method for the same guarantee on the error, we come to an intuitive conclusion: if we are only taking simple gradient steps, it is better to be stochastic. Formally speaking, to guarantee an error bound of \epsilon in F(w), the runtime of SD is O( (G^2 B^2 / \epsilon^2) d ), while the runtime of the batch subgradient method is O( (G^2 B^2 / \epsilon^2) m d ), where d is the dimensionality of the feature vector. The stochastic method achieves the optimal runtime among methods that only use gradients and only assume the Lipschitz condition. The difference in speed is significant for high dimensional problems.

We should notice that the error bound of the stochastic method is defined in expectation. The convergence rate as a function of update steps for individual realizations of the method is usually slower than for the batch method, due to the noise; the slow convergence acts as a smoothing factor that reduces the variance in the stochastic subgradients. We could consider a mixed method that draws a subset of the data points from {x_i} and computes the average of their subgradients as the stochastic subgradient. This method, usually called the mini-batch method, provides a tradeoff between the computational burden at each step and the variance of the estimate (which is relevant to the convergence rate). Also, the runtime of SD is optimal only among first order methods. Second order methods, such as Newton's method, can use much fewer steps than SD for the same error bound, although they may take much more time per step.

3.3 Stochastic Programming

We have discussed in the previous subsections the situation where the subgradient is difficult or expensive to compute and we use a noisy estimate to optimize the objective function. Now we turn to the case where the objective function itself is difficult to compute. A stochastic programming problem has the form

    minimize  E f_0(x, \omega)  subject to  E f_i(x, \omega) \le 0,  i = 1, \ldots, m,   (35)

where x \in R^n is the optimization variable and \omega is a random variable. If f_i(x, \omega) is convex in x for each \omega, the problem is a convex stochastic programming problem; in this case the objective and constraint functions are convex.

Stochastic programming can be used to model a variety of robust design or decision problems with uncertain data. Although the basic form above involves only expectations or average values, some tricks can be used to capture other measures of the probability distributions of f_i(x, \omega). We can replace an objective or constraint term E f(x, \omega) with E \Phi(f(x, \omega)), where \Phi is a convex increasing function. For example, with \Phi(u) = \max(u, 0), we can form a constraint of the form

    E f_i(x, \omega)_+ \le \epsilon,   (36)

where \epsilon is a positive parameter and (\cdot)_+ denotes the positive part. Here E f_i(x, \omega)_+ has a simple interpretation as the expected violation of the ith constraint. It is also possible to combine the constraints, using a single constraint of the form

    E \max( f_1(x, \omega)_+, \ldots, f_m(x, \omega)_+ ) \le \epsilon,   (37)

whose left-hand side can be interpreted as the expected worst violation (over all constraints).

Problem (35) is associated with another optimization problem that replaces the random variable by its expected value, which is sometimes called the certainty equivalent of the original stochastic program,

    minimize  f_0(x, E\omega)  subject to  f_i(x, E\omega) \le 0,  i = 1, \ldots, m.   (38)

By Jensen's inequality, we get

    E f_i(x, \omega) \ge f_i(x, E\omega).   (39)

This tells us that the constraint set of the certainty equivalent problem is larger than that of the original stochastic problem (35), and its objective is smaller. It follows that the optimal value of the certainty equivalent problem gives a lower bound on the optimal value of the stochastic problem (35). (It can be a poor bound, of course.)
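A tiny numerical check of the Jensen bound (39), using one convex function and one distribution chosen purely for illustration:

    import numpy as np

    # f(x, w) = max(0, x + w) is convex; take x fixed and w ~ N(0, 1), so E[w] = 0.
    rng = np.random.default_rng(0)
    x = 0.3
    omega = rng.standard_normal(100000)
    lhs = np.maximum(0.0, x + omega).mean()   # Monte Carlo estimate of E f(x, omega)
    rhs = max(0.0, x + 0.0)                   # f(x, E omega), the certainty equivalent value
    print(lhs, rhs, lhs >= rhs)               # the expectation dominates, as in (39)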

The objective function f(x) = \int F(x, \omega) p(\omega) d\omega is usually not easy to compute exactly, especially for a function with a high dimensional variable. However, in many applications we can approximately compute f using Monte Carlo methods. Assume we generate M independent identically distributed (iid) samples \omega_1, \ldots, \omega_M. We can compute an estimate of f as

    \hat{f}(x) = \frac{1}{M} \sum_{i=1}^M F(x, \omega_i).   (40)

Furthermore, if we calculate a subgradient \partial_x F(x, \omega_i) for each sample \omega_i, it can be proved that their average, \tilde{g}(x) = \frac{1}{M} \sum_{i=1}^M \partial_x F(x, \omega_i), gives an unbiased noisy estimate of a subgradient of f(x), i.e.,

    E \tilde{g}(x) = E \partial_x F(x, \omega) \in \partial_x E F(x, \omega) = \partial f(x).   (41)

Therefore, we can use the stochastic subgradient descent method described in subsection 3.1 to solve the optimization problem (35). This result is independent of M; we can even take M = 1. In this case \tilde{g} = \partial_x F(x, \omega_1); in other words, we simply generate one sample \omega_1 and use a subgradient of F for that value of \omega. In this case \tilde{g} could hardly be called a good approximation of a subgradient of f, but its mean is a subgradient, so it is a valid noisy unbiased subgradient. On the other hand, we can take M to be large. In this case \tilde{g} is a random vector with mean g and very small variance, i.e., it is a good estimate of g, a subgradient of f at x.

3.4 Applying Stochastic Programming to SVM

Stochastic programming is closely related to another training setting in machine learning: minimizing the generalization error. Since the purpose of machine learning is to learn a model that generalizes to unobserved data, it makes more sense to minimize the generalization error E_p[l(w, z)] rather than the error on the empirical distribution, \frac{1}{M} \sum_{i=1}^M l(w, z_i) (sometimes called the empirical risk), where p is the true distribution of the data z, and z_i is the ith of the M data items in the training set. In fact, optimizing the empirical risk often leads to over-fitting on the training set. From this perspective, we would like to formulate the machine learning problem in the generic learning setting [9] (taking SVM training as an example again) as

    \min_{\|w\|_2 \le B} F(w) = \min_{\|w\|_2 \le B} E_z f(w, z)   (42)
                             = \min_{\|w\|_2 \le B} E_{(x,y)} l(\langle w, x \rangle y).   (43)

In general, training a model by minimizing the generalization error (42) is a stochastic programming problem. Since the loss function l of the SVM is convex, problem (43) is also a convex stochastic programming problem.

We compare two approaches to problem (43). One approach is called sample average approximation (SAA) [4, 7, 5]. When the mth sample, (x_m, y_m), is collected, SAA optimizes the empirical loss

    \min_w \hat{F}(w) = \min_w \frac{1}{m} \sum_{i=1}^m l(\langle w, x_i \rangle y_i)   (44)

by any optimization method, such as SD, batch subgradient descent, or Newton's method. This corresponds to the empirical risk minimization method we mentioned above if we consider the set of samples as the training set. Analysis of the error bound for SAA is typically based on the uniform concentration theorem. The other approach is called stochastic approximation (SA) [6]. SA uses the simple stochastic subgradient descent method to directly minimize F(w). Each time a new sample is collected, a noisy subgradient is computed as \tilde{g}^{(k)} = \partial_w l(\langle w^{(k)}, x_k \rangle y_k).
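A minimal sketch of the Monte Carlo subgradient estimator of (40)-(41), with hypothetical helpers sample_omega and subgrad_F; M = 1 recovers the single-sample update used by SA above:

    import numpy as np

    def monte_carlo_subgradient(subgrad_F, x, sample_omega, M, rng=None):
        """Unbiased noisy subgradient of f(x) = E_omega[F(x, omega)], as in (41):
        average subgradients of F(x, .) over M iid samples of omega. M = 1 gives a
        very noisy but still valid estimate; a larger M reduces the variance."""
        rng = np.random.default_rng() if rng is None else rng
        omegas = [sample_omega(rng) for _ in range(M)]
        return sum(subgrad_F(x, w) for w in omegas) / M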

Figure 4: Illustration of the relationships between the initialization w^{(0)}, the outputs of SAA/SA, \hat{w} / \bar{w}^{(m)}, and the optimal solution w^*. Taken from [8].

Then w^{(k)} is updated based on that weak estimate of a subgradient of F(w^{(k)}). Since this is a typical stochastic subgradient descent method applied to a convex optimization problem, the conclusions about convergence rates for general stochastic methods also apply here.

We state the conclusions for SAA and SA below without proofs. Let \hat{w} be the optimal solution of problem (44). By the uniform concentration theorem we can find an upper bound on the error, L(\hat{w}) - L(w^*) \le O( \sqrt{ G^2 B^2 / m } ). Thus, to guarantee L( \hat{w}^{(m)}_{SAA} ) - L(w^*) \le \epsilon, we need a sample of size m = \Omega( G^2 B^2 / \epsilon^2 ), which leads to the conclusion that the runtime needed to guarantee an error of at most \epsilon is \Omega( m d ) = \Omega( (G^2 B^2 / \epsilon^2) d ). Notice that this result applies to any optimization algorithm.

In contrast, for SA, after one pass over the m data items the generalization error of \bar{w}^{(m)}_{SA} is upper bounded by L( \bar{w}^{(m)}_{SA} ) \le L(w^*) + O( \sqrt{ G^2 B^2 / m } ). So to guarantee that L( \bar{w}^{(m)}_{SA} ) - L(w^*) \le \epsilon, the sample size would be m = O( G^2 B^2 / \epsilon^2 ), and consequently the runtime for an error gap of \epsilon is O( m d ) = O( (G^2 B^2 / \epsilon^2) d ).

SA guarantees the best possible generalization error with the optimal runtime in theory. The relationship between the outputs of SAA/SA and the optimal solution is shown in Figure 4. However, we should notice that SA achieves the optimal generalization error only up to a constant factor, and in practice SAA still seems to be better than SA in terms of generalization error. Nevertheless, stochastic gradient descent is usually considered the fastest algorithm for solving many machine learning problems.

4 Discussion

The subgradient method is very simple and general: it applies directly to nondifferentiable f, and any subgradient from \partial f(x^{(k)}) is sufficient. It is guaranteed to converge for properly selected step sizes. However, compared with the ordinary gradient method, convergence can be much slower, as it usually requires a lot of iterations. There is no monotonic convergence, since it is not a descent method, so a line search cannot be used to find the optimal \alpha_k. Besides, there is no good stopping criterion.

The subgradient method is part of a large class of methods known as first-order methods, which are based on subgradients or closely related objects. In contrast, second-order methods (like interior-point methods or Newton's method) typically use Hessian information; each iteration is expensive (both in flops and memory), but convergence is obtained very quickly. There are several general approaches that can be used to speed up subgradient methods. Localization methods such as cutting-plane and ellipsoid methods are typically much faster than basic subgradient methods. Another general approach is to base the update on some conic combination of previously evaluated subgradients. For example, in bundle methods, the update direction is found as the least-norm convex combination of some previous subgradients. Furthermore, the convergence can be improved by keeping a memory of past steps,

    x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)} + \beta_k ( x^{(k)} - x^{(k-1)} ),   (45)

where \alpha_k and \beta_k are positive. We can interpret the second term as a memory term.
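A minimal sketch of the update (45) with a memory term; alpha and beta are assumed to be callables returning the positive sequences \alpha_k and \beta_k:

    import numpy as np

    def subgradient_with_memory(subgrad, x0, alpha, beta, n_iters=1000):
        """Subgradient method with a memory term, equation (45):
        x_{k+1} = x_k - alpha_k * g_k + beta_k * (x_k - x_{k-1})."""
        x_prev = np.asarray(x0, dtype=float)
        x = x_prev.copy()
        for k in range(1, n_iters + 1):
            x_next = x - alpha(k) * subgrad(x) + beta(k) * (x - x_prev)
            x_prev, x = x, x_next
        return x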

Such algorithms have state, whereas the basic subgradient method is stateless. Conjugate gradient methods have a similar form.

The stochastic variant of the subgradient method relaxes the conditions for applying the subgradient method: exact computation of the subgradient is no longer necessary. This usually slows down the convergence as a function of update steps, due to the noise in the descent direction, but it can save a lot of time at each step because a noisy estimate can be easy to obtain. Moreover, achieving the same order of convergence rate as the deterministic descent methods, whether in minimizing the empirical error or the generalization error, makes SD a very competitive tool in machine learning. Stochastic learning is in fact closely related to online learning, apart from a few differences such as the objective function (online learning minimizes the empirical regret) and the assumption on the observations (adversarial vs. iid) [8]. There are also some other learning algorithms that enjoy similar or even better advantages than SD, including stochastic coordinate descent [3] and stochastic second order methods [2].

References

[1] S. Boyd. Stochastic subgradient methods. Lecture slides and notes for EE364b, Stanford University, Spring quarter.

[2] R. Byrd, G. Chin, W. Neveitt, and J. Nocedal. On the use of stochastic Hessian information in unconstrained optimization. Technical report, Northwestern University.

[3] C. Hsieh, K. Chang, C. Lin, S. Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. In Proceedings of the 25th International Conference on Machine Learning. ACM.

[4] A. J. Kleywegt, A. Shapiro, and T. H. de Mello. The sample average approximation method for stochastic discrete optimization. SIAM Journal on Optimization, 12, February.

[5] E. Plambeck, B. Fu, S. Robinson, and R. Suri. Sample-path optimization of convex stochastic performance functions. Mathematical Programming, 75(2).

[6] H. Robbins and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, 22(3).

[7] R. Y. Rubinstein and A. Shapiro. Optimization of static simulation models by the score function method. Math. Comput. Simul., 32, October.

[8] N. Srebro and A. Tewari. Stochastic optimization: ICML 2010 tutorial, July.

[9] V. Vapnik. The Nature of Statistical Learning Theory.

[10] F. Vojtech. A short tutorial on subgradient methods.
