Boosting with log-loss

Marco Cusumano-Towner

September 2, 2012

1 The problem

Suppose we have data examples $\{(x_i, y_i),\ i = 1 \ldots m\}$ for a two-class problem with $y_i \in \{-1, +1\}$. Let $F(x)$ be the predictor function, with the sign of $F(x_i)$ giving the prediction on example $i$. We assume that $F$ is a sum of a set of basis functions $f_j \in \mathcal{F}$, where $\mathcal{F}$ is the set of possible basis functions:

$$F(x) = \sum_{j=1}^{T} f_j(x)$$

For now, we focus on weak learners of the following form, although similar algorithms apply for other forms of weak learner:

$$f_{A,c}(x) = \begin{cases} c & x \in A \\ 0 & x \notin A \end{cases}$$

where we have some set $\mathcal{A}$ of possible partitions $A$ and a score $c \in \mathbb{R}$, so that $\mathcal{F} = \mathcal{A} \times \mathbb{R}$.

An obvious objective to minimize is the average training error:

$$J_{\mathrm{err}} = \frac{1}{m} \sum_i \mathbf{1}\left[y_i \neq \mathrm{sign}\left(F(x_i)\right)\right]$$

This is a very challenging optimization problem because the objective is not even continuous. Machine learning algorithms use various other loss functions that approximate this objective but are easier to optimize.
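To make the setup concrete, here is a minimal NumPy sketch (illustrative only, with hypothetical data and function names) of the indicator weak learners $f_{A,c}$, the additive predictor $F$, and the 0-1 training error $J_{\mathrm{err}}$.

```python
import numpy as np

def f_Ac(x_in_A, c):
    """Weak learner f_{A,c}: score c if x falls in partition A, else 0.
    x_in_A is a boolean vector giving membership of each example in A."""
    return c * x_in_A.astype(float)

def ensemble_predict(memberships, scores):
    """F(x_i) = sum_j f_{A_j,c_j}(x_i): one membership vector and one score per term."""
    return sum(f_Ac(m, c) for m, c in zip(memberships, scores))

def training_error(F, y):
    """J_err: fraction of examples whose predicted sign disagrees with the label."""
    return np.mean(np.sign(F) != y)

# Tiny synthetic example (hypothetical data): an ensemble with three terms.
rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=20)
memberships = [rng.random(20) < 0.5 for _ in range(3)]   # partitions A_1..A_3
scores = [0.7, -0.4, 1.1]                                # their scores c_1..c_3
F = ensemble_predict(memberships, scores)
print(training_error(F, y))
```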

2 Adaboost as greedy coordinate descent

Adaboost attempts to minimize the exponential loss:

$$J_{\mathrm{exp}} = \sum_i \exp\left(-y_i \sum_{j=1}^{T} f_j(x_i)\right)$$

which is an upper bound on the average training error. As discussed in [3] and [2], Adaboost minimizes this loss by doing greedy coordinate descent on the functions $f_j$. That is, each $f_k$ is chosen such that:

$$f_k = \arg\min_{f_k \in \mathcal{F}} \sum_i \exp\left(-y_i \left(\sum_{j=1}^{k-1} f_j(x_i) + f_k(x_i)\right)\right)$$

where the previous $f_j$ for $j = 1 \ldots k-1$ are fixed. Let $F_{k-1}(x_i) = \sum_{j=1}^{k-1} f_j(x_i)$ be the aggregated contribution from the existing terms for each data point $i$. Then,

$$J_{\mathrm{exp}} = \sum_i \exp\left(-y_i \left(F_{k-1}(x_i) + f_k(x_i)\right)\right) = \sum_i w_i \exp\left(-y_i f_k(x_i)\right)$$

where the (un-normalized) weights $w_i$ are defined to be

$$w_i := \exp\left(-y_i F_{k-1}(x_i)\right)$$

Each iteration is then interpreted as training a new weak learner $f_k$ on the weighted training set with respect to the exponential loss. This coordinate-wise procedure can also be seen as a procedure for feature selection.

We now derive the Adaboost updates for our choice of basis functions. At each iteration:

$$J_{\mathrm{exp}}(A, c) = \sum_i w_i \exp\left(-y_i f_{A,c}(x_i)\right) = \sum_{i: x_i \in A,\, y_i = +1} w_i \exp(-c) + \sum_{i: x_i \in A,\, y_i = -1} w_i \exp(c) + \sum_{i: x_i \notin A} w_i = \exp(-c)\, W_+(A) + \exp(c)\, W_-(A) + W_{\notin}(A)$$

where $W_+(A)$, $W_-(A)$, and $W_{\notin}(A)$ are the sums of weights for positive examples with $x_i \in A$, negative examples with $x_i \in A$, and examples with $x_i \notin A$, respectively.

We first find the optimal value of $c$ for each choice of $A$. Differentiating with respect to $c$ and setting to zero gives:

$$c^*(A) = \frac{1}{2} \log\left(\frac{W_+(A)}{W_-(A)}\right)$$

Plugging this expression into $J_{\mathrm{exp}}(A, c)$ gives the minimum achievable loss using a basis function with partition set $A$:

$$J^*_{\mathrm{exp}}(A) = W_+(A)\left(\frac{W_-(A)}{W_+(A)}\right)^{1/2} + W_-(A)\left(\frac{W_+(A)}{W_-(A)}\right)^{1/2} + W_{\notin}(A) = 2\sqrt{W_+(A)\, W_-(A)} + W_{\notin}(A)$$

This expression is very convenient because it is written in terms of sums of weights, which can be computed efficiently for all $A$ with sparse matrix multiplication when $\mathcal{A}$ is large but discrete.
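A minimal sketch of one such greedy AdaBoost step, assuming the candidate partitions are supplied as a 0/1 membership matrix `M` (examples by partitions) so that $W_+(A)$ and $W_-(A)$ are obtained for every partition with a single matrix-vector product. The names and data are illustrative, not the author's MEDUSA code.

```python
import numpy as np

def adaboost_round(M, y, F_prev, eps=1e-12):
    """One greedy AdaBoost step for indicator weak learners f_{A,c}.

    M      : (m, n_A) 0/1 membership matrix, M[i, a] = 1 iff x_i is in partition A_a.
    y      : (m,) labels in {-1, +1}.
    F_prev : (m,) current ensemble scores F_{k-1}(x_i).

    Returns the index of the best partition and its score c*(A).
    """
    w = np.exp(-y * F_prev)                      # un-normalized AdaBoost weights
    W_plus = (w * (y == 1)) @ M                  # W_+(A) for every A, one matvec
    W_minus = (w * (y == -1)) @ M                # W_-(A) for every A
    W_out = w.sum() - W_plus - W_minus           # weight of examples outside A
    J_star = 2.0 * np.sqrt(W_plus * W_minus) + W_out
    a = int(np.argmin(J_star))                   # best partition
    c = 0.5 * np.log((W_plus[a] + eps) / (W_minus[a] + eps))
    return a, c

# Hypothetical usage: 50 random examples, 8 candidate partitions.
rng = np.random.default_rng(1)
M = (rng.random((50, 8)) < 0.4).astype(float)
y = rng.choice([-1.0, 1.0], size=50)
a, c = adaboost_round(M, y, np.zeros(50))
print(a, c)
```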

3 Greedy coordinate descent with the log-loss

We can derive an algorithm similar to Adaboost that will perform feature selection, but with the log-loss instead of the exponential loss. The log-loss does not increase exponentially as the margin gets more and more negative (see Figure 1), and is therefore more robust to outliers and noisy data.

Figure 1: The log-loss as a function of the new score $c$ for a given set of weights $w$ and a given $A$, with the scores chosen by the various algorithms above.

The log-loss is:

$$J_{\log} = \sum_i \log\left[1 + \exp\left(-y_i F_{k-1}(x_i) - y_i f_k(x_i)\right)\right]$$

Following what we did for Adaboost, we seek to greedily minimize the following function at each step with respect to the new function $f_k$, keeping the previous functions fixed:

$$f_k = \arg\min_{f \in \mathcal{F}} \sum_i \log\left[1 + \exp\left(-y_i F_{k-1}(x_i) - y_i f(x_i)\right)\right]$$

Unfortunately, this cannot be expressed as an optimization on a weighted training set, because each log-loss term does not factorize into the contribution of the previous functions $F_{k-1}$ and the contribution from the new function $f_k$, as was the case for the exponential loss.

Algorithm 1 (LogitBoost)

For our choice of basis functions $\mathcal{F}$, to solve this problem exactly we would have to perform a numerical optimization for each $A$ to find the best $c^*(A)$, and then plug this $c^*(A)$ into the loss function to get $J^*_{\log}(A)$ and choose the $A$ with the minimum value. The objective for the optimization of $c$, $J_{\log}(c; A)$, is convex in $c$, as we now show.
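For intuition, and in the spirit of Figure 1, the following sketch evaluates the one-dimensional objective $J_{\log}(c; A)$ on a grid of $c$ values; the synthetic data, membership vector, and function names are assumptions made for illustration.

```python
import numpy as np

def log_loss_in_c(c, y, F_prev, in_A):
    """J_log(c; A): total log-loss after adding f_{A,c}, as a 1-D function of c.

    Examples outside A contribute a constant; examples inside A have their
    margin shifted by y_i * c.
    """
    margin = y * F_prev + np.where(in_A, y * c, 0.0)
    return np.sum(np.log1p(np.exp(-margin)))

# Hypothetical data: evaluate the (convex) objective on a grid of c values.
rng = np.random.default_rng(2)
y = rng.choice([-1.0, 1.0], size=30)
F_prev = rng.normal(size=30)
in_A = rng.random(30) < 0.5
for c in np.linspace(-2.0, 2.0, 9):
    print(f"c = {c:+.2f}   J_log = {log_loss_in_c(c, y, F_prev, in_A):.4f}")
```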

Let $F^i_{k-1} = F_{k-1}(x_i)$ for more compact notation. Then:

$$\frac{d}{dc} J_{\log}(c; A) = \sum_{i: x_i \in A} \frac{-y_i \exp\left(-y_i (F^i_{k-1} + c)\right)}{1 + \exp\left(-y_i (F^i_{k-1} + c)\right)}$$

$$\frac{d^2}{dc^2} J_{\log}(c; A) = \sum_{i: x_i \in A} \left[ \frac{y_i^2 \exp\left(-y_i (F^i_{k-1} + c)\right)}{1 + \exp\left(-y_i (F^i_{k-1} + c)\right)} - \frac{\left(y_i \exp\left(-y_i (F^i_{k-1} + c)\right)\right)^2}{\left(1 + \exp\left(-y_i (F^i_{k-1} + c)\right)\right)^2} \right] = \sum_{i: x_i \in A} \left[ \frac{y_i^2 \exp\left(-y_i (F^i_{k-1} + c)\right)}{1 + \exp\left(-y_i (F^i_{k-1} + c)\right)} - \frac{y_i^2 \exp\left(-2 y_i (F^i_{k-1} + c)\right)}{\left(1 + \exp\left(-y_i (F^i_{k-1} + c)\right)\right)^2} \right]$$

Each single term is positive if:

$$\frac{\exp\left(-y_i (F^i_{k-1} + c)\right)}{1 + \exp\left(-y_i (F^i_{k-1} + c)\right)} \geq \frac{\exp\left(-2 y_i (F^i_{k-1} + c)\right)}{\left(1 + \exp\left(-y_i (F^i_{k-1} + c)\right)\right)^2}$$

$$\left(1 + \exp\left(-y_i (F^i_{k-1} + c)\right)\right) \exp\left(-y_i (F^i_{k-1} + c)\right) \geq \exp\left(-2 y_i (F^i_{k-1} + c)\right)$$

$$\exp\left(-y_i (F^i_{k-1} + c)\right) \geq 0$$

Therefore the optimization over $c$ is convex, and can be solved with Newton's method. The LogitBoost algorithm of Friedman, Hastie and Tibshirani ([2]) executes one Newton step, and this appears to be sufficient to get very close to the true minimum, as seen in Figure 2. However, Friedman, Hastie and Tibshirani also suggest intentionally using a single Newton step instead of exact minimization for the Adaboost objective, which they call Gentle Adaboost.

The optimal $c$ for the log-loss single-Newton-step algorithm (LogitBoost) is just the Newton step itself, since the starting point is always $c = 0$:

$$c^*(A) = -\frac{\left.\frac{d}{dc} J_{\log}(c; A)\right|_{c=0}}{\left.\frac{d^2}{dc^2} J_{\log}(c; A)\right|_{c=0}} = \frac{\sum_{i: x_i \in A} \frac{y_i \exp\left(-y_i F^i_{k-1}\right)}{1 + \exp\left(-y_i F^i_{k-1}\right)}}{\sum_{i: x_i \in A} \left[\frac{y_i^2 \exp\left(-y_i F^i_{k-1}\right)}{1 + \exp\left(-y_i F^i_{k-1}\right)} - \frac{y_i^2 \exp\left(-2 y_i F^i_{k-1}\right)}{\left(1 + \exp\left(-y_i F^i_{k-1}\right)\right)^2}\right]} = \frac{\sum_{i: x_i \in A, y_i = +1} w_i - \sum_{i: x_i \in A, y_i = -1} w_i}{\sum_{i: x_i \in A} w_i (1 - w_i)} = \frac{W_+(A) - W_-(A)}{\tilde{W}(A)}$$

where $w_i := \frac{1}{1 + \exp\left(y_i F_{k-1}(x_i)\right)}$ are interpreted as (un-normalized) weights, $W_+(A)$ and $W_-(A)$ are sums of weights as before, and $\tilde{W}(A) = \sum_{i: x_i \in A} w_i (1 - w_i)$.
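A minimal sketch of this single Newton step, assuming a boolean membership vector `in_A` for the partition; it also evaluates the exact log-loss at the resulting $c$, which is the per-partition pass over the training set that makes LogitBoost expensive. Names and data are illustrative assumptions.

```python
import numpy as np

def logitboost_step(in_A, y, F_prev, eps=1e-12):
    """Single Newton step for c on partition A (Algorithm 1, LogitBoost).

    w_i = 1 / (1 + exp(y_i F_{k-1}(x_i))) are the log-loss weights;
    c*(A) = (W_+(A) - W_-(A)) / sum_{i in A} w_i (1 - w_i).
    Returns c*(A) and the exact log-loss obtained with that c.
    """
    w = 1.0 / (1.0 + np.exp(y * F_prev))
    W_plus = np.sum(w[in_A & (y == 1)])
    W_minus = np.sum(w[in_A & (y == -1)])
    W_tilde = np.sum((w * (1.0 - w))[in_A])
    c = (W_plus - W_minus) / (W_tilde + eps)

    # Exact log-loss at this c requires a pass over the training set,
    # which is why LogitBoost is the most expensive variant in these notes.
    margin = y * (F_prev + np.where(in_A, c, 0.0))
    J = np.sum(np.log1p(np.exp(-margin)))
    return c, J

# Hypothetical usage.
rng = np.random.default_rng(3)
y = rng.choice([-1.0, 1.0], size=40)
F_prev = rng.normal(scale=0.5, size=40)
in_A = rng.random(40) < 0.5
print(logitboost_step(in_A, y, F_prev))
```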

Plugging in this value of $c$ for one Newton step gives an estimate of the best loss achievable for a choice of $A$:

$$J^*_{\log}(A) \approx \sum_{i: x_i \in A} \log\left(1 + \exp\left(-y_i F^i_{k-1} - y_i \frac{W_+(A) - W_-(A)}{\tilde{W}(A)}\right)\right) + \sum_{i: x_i \notin A} \log\left(1 + \exp\left(-y_i F^i_{k-1}\right)\right)$$

We refer to this process of choosing $c^*(A)$ by one Newton step, and then finding the $A$ with minimum resulting log-loss, as LogitBoost. Unfortunately, this does not decompose into a direct function of sums of weights, and so we cannot use sparse matrix multiplication to choose the best $A$ as we did with the exponential loss. Instead, for each $A$, we must loop over the training set to compute the true log-loss.

Algorithm 2 (QuadBoost)

If we approximate the minimum loss for each $A$ at $c^*(A)$ by the minimum of the second-order expansion of the loss (which is what the Newton step actually minimizes), then we get a quantity that is expressed in terms of sums. This is given by:

$$J^*(A) \approx J_{\mathrm{prev}} - \frac{1}{2} \frac{\left(\left.\frac{d}{dc} J_{\log}(c; A)\right|_{c=0}\right)^2}{\left.\frac{d^2}{dc^2} J_{\log}(c; A)\right|_{c=0}} = J_{\mathrm{prev}} - \frac{\left(\sum_{i: x_i \in A} y_i w_i\right)^2}{2 \sum_{i: x_i \in A} w_i (1 - w_i)} = J_{\mathrm{prev}} - \frac{\left(\sum_{i: x_i \in A, y_i = +1} w_i - \sum_{i: x_i \in A, y_i = -1} w_i\right)^2}{2 \sum_{i: x_i \in A} w_i (1 - w_i)}$$

This expression only uses sums of weights, which we can compute efficiently for all $A$ at once using matrix multiplication. We refer to this process of choosing $c^*(A)$ by a single Newton step, and then using the second-order approximation criterion above to find the best $A$, as "QuadBoost". Note that for MEDUSA this update only requires one set of matrix multiplications, instead of two for the existing code.

Algorithm 3 (GradientBoost)

A different approximation also allows the search over our set $\mathcal{A}$ to be done using sparse matrix multiplication. Instead of using the $A$ that gives the optimal loss when $J_{\log}(c; A)$ is optimized, we use the $A$ that gives the highest value of $-\left.\frac{d}{dc} J_{\log}(c; A)\right|_{c=0}$. The intuition is that this is the local direction of steepest descent in $\mathcal{A}$ space. Then, we perform a line search for the optimal $c$ for our choice of $A$. For the log-loss, the criterion for choosing $A$ is:

$$-\left.\frac{d}{dc} J_{\log}(c; A)\right|_{c=0} = \sum_{i: x_i \in A} \frac{y_i \exp\left(-y_i F_{k-1}(x_i)\right)}{1 + \exp\left(-y_i F_{k-1}(x_i)\right)} = W_+(A) - W_-(A)$$

where the un-normalized weights $w_i$ are as in LogitBoost.
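The following sketch computes the QuadBoost and GradientBoost criteria for all candidate partitions at once from the weight sums, using only matrix-vector products; the membership matrix `M` and the helper names are assumptions for illustration, not the author's code.

```python
import numpy as np

def score_partitions(M, y, F_prev, eps=1e-12):
    """Score every candidate partition with the QuadBoost and GradientBoost
    criteria (Algorithms 2 and 3), using only matrix-vector products.

    M : (m, n_A) 0/1 membership matrix for the candidate partitions.
    Returns (quad_improvement, grad_criterion, newton_c), one entry per A.
    """
    w = 1.0 / (1.0 + np.exp(y * F_prev))              # log-loss weights
    grad = (w * y) @ M                                # W_+(A) - W_-(A) for all A
    curv = (w * (1.0 - w)) @ M                        # sum_{i in A} w_i (1 - w_i)
    quad_improvement = grad**2 / (2.0 * curv + eps)   # estimate of J_prev - J*(A)
    newton_c = grad / (curv + eps)                    # one Newton step per A
    return quad_improvement, grad, newton_c

# Hypothetical usage: QuadBoost picks argmax of quad_improvement;
# GradientBoost picks the partition with the largest gradient criterion
# and then line-searches (or Newton-steps) c.
rng = np.random.default_rng(4)
M = (rng.random((60, 12)) < 0.3).astype(float)
y = rng.choice([-1.0, 1.0], size=60)
F_prev = rng.normal(scale=0.3, size=60)
quad, grad, c = score_partitions(M, y, F_prev)
print(int(np.argmax(quad)), int(np.argmax(grad)))
```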

The $A \in \mathcal{A}$ that maximizes this criterion can be found efficiently, since the $W$'s are found for all $A$ with sparse matrix multiplication. Then, we can find the exact optimal choice of $c^*(A)$ by a single Newton step. A similar idea to this is presented in [?].

Algorithm 4 (CollinsBoost)

Collins, Schapire, and Singer ([1]) present a sequential-update algorithm for both the exponential and the log-loss in Figure 2 of their paper, using the language of Bregman divergences to make the connection. When applied to the exponential loss, the algorithm is basically identical to Adaboost. When applied to the log-loss, the algorithm is distinct from any of the above. We derive a version of the algorithm here in a simple way, without discussing Bregman distances. Consider the following upper bound on the log-loss for a given training example $i$, as a function of the choice of new basis function $f_k$:

$$J^i_{\log}(f_k) = \log\left(1 + \exp\left(-y_i (F^i_{k-1} + f_k(x_i))\right)\right)$$

$$= \log\left(1 + \exp\left(-y_i F^i_{k-1} - y_i f_k(x_i)\right)\right) - \log\left(1 + \exp\left(-y_i F^i_{k-1}\right)\right) + \log\left(1 + \exp\left(-y_i F^i_{k-1}\right)\right)$$

$$= \log\left(\frac{1 + \exp\left(-y_i F^i_{k-1} - y_i f_k(x_i)\right)}{1 + \exp\left(-y_i F^i_{k-1}\right)}\right) + \log\left(1 + \exp\left(-y_i F^i_{k-1}\right)\right)$$

$$= \log\left(\frac{\exp\left(y_i F^i_{k-1}\right) + \exp\left(-y_i f_k(x_i)\right)}{1 + \exp\left(y_i F^i_{k-1}\right)}\right) + \log\left(1 + \exp\left(-y_i F^i_{k-1}\right)\right)$$

$$= \log\left(1 - \frac{1}{1 + \exp\left(y_i F^i_{k-1}\right)} + \frac{1}{1 + \exp\left(y_i F^i_{k-1}\right)} \exp\left(-y_i f_k(x_i)\right)\right) + \log\left(1 + \exp\left(-y_i F^i_{k-1}\right)\right)$$

$$\leq \frac{1}{1 + \exp\left(y_i F^i_{k-1}\right)} \left(\exp\left(-y_i f_k(x_i)\right) - 1\right) + \log\left(1 + \exp\left(-y_i F^i_{k-1}\right)\right)$$

where the last line comes from $\log(1 + x) \leq x$. Note that this bound is exact when $f_k(x_i) = 0$. An upper bound on the total log-loss is then:

$$J_{\log}(f_k) \leq \sum_i \left[\frac{1}{1 + \exp\left(y_i F^i_{k-1}\right)} \left(\exp\left(-y_i f_k(x_i)\right) - 1\right) + \log\left(1 + \exp\left(-y_i F^i_{k-1}\right)\right)\right] = \sum_i w_i \exp\left(-y_i f_k(x_i)\right) - \sum_i w_i + \sum_i \log\left(1 + \exp\left(-y_i F^i_{k-1}\right)\right)$$

where $w_i$ is again as in Algorithms 1 and 2. The terms on the right are constant, so minimizing this upper bound at each iteration is done by finding:

$$f_k = \arg\min_f \sum_i w_i \exp\left(-y_i f(x_i)\right)$$

Note that this is identical to the update rule for Adaboost, except for the redefinition of the weights $w_i$. This was noted briefly in [3]. This means that the update rules for finding the optimal $A$ and $c$ are also identical, except for the choice of weighting function.
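A minimal sketch of this CollinsBoost step: the weights are the log-loss weights $w_i = 1/(1 + \exp(y_i F^i_{k-1}))$, but the update itself is the AdaBoost-style one from Section 2. The membership matrix, data, and names are illustrative assumptions.

```python
import numpy as np

def collinsboost_round(M, y, F_prev, eps=1e-12):
    """One CollinsBoost step (Algorithm 4): minimize the AdaBoost-style
    surrogate sum_i w_i exp(-y_i f(x_i)) with log-loss weights
    w_i = 1 / (1 + exp(y_i F_{k-1}(x_i))).

    M : (m, n_A) 0/1 membership matrix of candidate partitions.
    Returns the chosen partition index and its score c.
    """
    w = 1.0 / (1.0 + np.exp(y * F_prev))          # log-loss weights, not exp(-yF)
    W_plus = (w * (y == 1)) @ M
    W_minus = (w * (y == -1)) @ M
    W_out = w.sum() - W_plus - W_minus
    surrogate = 2.0 * np.sqrt(W_plus * W_minus) + W_out   # same form as AdaBoost
    a = int(np.argmin(surrogate))
    c = 0.5 * np.log((W_plus[a] + eps) / (W_minus[a] + eps))
    return a, c

# Hypothetical usage on random data.
rng = np.random.default_rng(5)
M = (rng.random((50, 10)) < 0.4).astype(float)
y = rng.choice([-1.0, 1.0], size=50)
print(collinsboost_round(M, y, np.zeros(50)))
```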

Note that each step is not training a log-loss weak learner on the weighted training set, but an exponential-loss weak learner. The fact that the upper bound is exact for $c = 0$ suggests that this algorithm will be most accurate in the later stages of training. This algorithm appears to perform significantly worse than Algorithm 1.

Algorithm 5 (MedusaBoost)

The current logit-boost MEDUSA code minimizes the following function (exactly) at each step with respect to $f_{A,c}$:

$$J_{\mathrm{med}}(A, c) = \sum_i \frac{1}{1 + \exp\left(y_i F^i_{k-1}\right)} \log\left(1 + \exp\left(-y_i f_{A,c}(x_i)\right)\right) = \sum_i w_i \log\left(1 + \exp\left(-y_i f_{A,c}(x_i)\right)\right)$$

The intuition is that we are training a log-loss weak learner on a re-weighted version of the training set at each iteration. This makes sense conceptually, but it is not any of the algorithms in the literature discussed above. For our set of basis functions, this function is:

$$J_{\mathrm{med}}(A, c) = \sum_{i: x_i \in A,\, y_i = +1} w_i \log\left(1 + \exp(-c)\right) + \sum_{i: x_i \in A^C,\, y_i = +1} w_i \log 2 + \sum_{i: x_i \in A,\, y_i = -1} w_i \log\left(1 + \exp(c)\right) + \sum_{i: x_i \in A^C,\, y_i = -1} w_i \log 2$$

Finding the optimal $c$ for a given $A$ by differentiating:

$$c^*(A) = \log\left(\frac{W_+(A)}{W_-(A)}\right)$$

Note that this is twice the prediction of Algorithm 3. Plugging this into the objective gives the expression for the optimal loss for a given $A$:

$$J^*_{\mathrm{med}}(A) = W_{\mathrm{total}} \log 2 + W_+(A) \log\left(\frac{1 + W_-(A)/W_+(A)}{2}\right) + W_-(A) \log\left(\frac{1 + W_+(A)/W_-(A)}{2}\right)$$

where $W_{\mathrm{total}} = \sum_i w_i$. This is the criterion used to choose a rule $A$ in the code. If interpreted as an algorithm for approximately greedily minimizing the total log-loss $J_{\log}$, this algorithm finds the optimal $c$ by using the following approximation of the derivative of the log-loss:

$$\frac{d}{dc} \log\left(1 + \exp\left(-y_i (F^i_{k-1} + c)\right)\right) = \frac{-y_i}{1 + \exp\left(y_i (F^i_{k-1} + c)\right)} \approx \frac{-y_i}{\left(1 + \exp\left(y_i F^i_{k-1}\right)\right)\left(1 + \exp\left(y_i c\right)\right)}$$

This approximation is best when the margin $y_i F^i_{k-1}$ is near zero.
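A sketch of the MedusaBoost update under the same assumptions as the earlier snippets: it computes $c^*(A) = \log(W_+(A)/W_-(A))$ and evaluates the weighted log-loss criterion directly from the weight sums, which is algebraically equivalent to the $J^*_{\mathrm{med}}(A)$ expression above. Names and data are illustrative.

```python
import numpy as np

def medusaboost_scores(M, y, F_prev, eps=1e-12):
    """MedusaBoost-style update (Algorithm 5): weights w_i = 1/(1+exp(y_i F)),
    closed-form score c*(A) = log(W_+(A)/W_-(A)), and the weighted log-loss
    criterion for every candidate partition, from the weight sums alone.
    """
    w = 1.0 / (1.0 + np.exp(y * F_prev))
    W_plus = (w * (y == 1)) @ M + eps
    W_minus = (w * (y == -1)) @ M + eps
    c = np.log(W_plus / W_minus)
    # Weighted log-loss criterion from Algorithm 5, evaluated at c*(A);
    # examples outside A each contribute w_i * log(2).
    J_med = (W_plus * np.log1p(W_minus / W_plus)
             + W_minus * np.log1p(W_plus / W_minus)
             + (w.sum() - (w @ M)) * np.log(2.0))
    return c, J_med

# Hypothetical usage.
rng = np.random.default_rng(6)
M = (rng.random((40, 6)) < 0.5).astype(float)
y = rng.choice([-1.0, 1.0], size=40)
c, J = medusaboost_scores(M, y, rng.normal(size=40))
a = int(np.argmin(J))
print(a, c[a])
```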

3.1 Comparison of algorithms

I evaluated how well each of the above algorithms greedily minimizes the log-loss at each iteration, as well as the train and test accuracy, for a particular data setting: I used a set of 3D chip features (exonarray_cons_vs_uw_fdr0.0_pks_proxk_v), with no paired regulator features, in addition to the NOT of all these chip features and a bias feature. The training log-loss (the quantity being minimized) is plotted in Figure 2 for each of the algorithms. I also evaluated the train and test accuracy of each of these algorithms, shown in Figure 3.

Figure 2: LogitBoost, MedusaBoost, and QuadBoost all perform very similarly, with QuadBoost becoming less optimal in later iterations. LogitBoost is a much more computationally heavy algorithm, since it requires an additional loop over the training examples that the other two do not. AdaBoost is shown even though it is not attempting to minimize this loss function.

Figure 3: All algorithms have test accuracy within 1 or 2% on this problem. CollinsBoost appears to perform the worst of the log-loss algorithms. The training accuracies of LogitBoost, QuadBoost and MedusaBoost are basically identical.

3.2 Extension to real-valued features

Although it was not presented this way, the Alternating Decision Tree (ADT) algorithm uses a predictor function of the following form:

$$F(x) = \sum_{t=1}^{T} c_t \prod_{j=1}^{n_t} f_{tj}(x)$$

where the $f_{tj}(x)$ are base features that are boolean conditions on $x$, and the product of these functions forms an AND. The loss function (the exponential loss was used in the ADT algorithm) is then minimized greedily with respect to adding a new term at each iteration. The tree part of the algorithm derives from the restriction that new terms can only involve a combination of base features that existed in a previous term, plus one additional base feature.

We can attempt to extend this model to incorporate general real-valued base features $f_{tj}(x)$. Let $c$ be the score of the new feature, and let $f_k(x)$ be the new feature. The exponential loss used by Adaboost and the original ADT is then:

$$J_{\mathrm{exp}}(c, f_k) = \sum_i w_i \exp\left(-y_i c f_k(x_i)\right)$$

whereas previously $f_k(x_i)$ was either zero or one. In our new case, there is no longer a closed-form expression for the optimal $c$ given a choice of $f_k$; we would have to resort to a numerical optimization. The same holds if we want to use the log-loss.
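As an illustration of the numerical optimization this would require, here is a sketch that fits $c$ for a single real-valued feature under the weighted exponential loss with a few Newton iterations on the convex one-dimensional objective; the data and function names are assumptions, not part of the original notes' implementation.

```python
import numpy as np

def fit_c_exponential(fk, y, w, n_steps=20):
    """Numerically fit the score c for a real-valued feature f_k under the
    weighted exponential loss J(c) = sum_i w_i exp(-y_i c f_k(x_i)),
    which has no closed-form minimizer (unlike the 0/1 feature case).
    Uses a few Newton iterations on the convex 1-D objective."""
    c = 0.0
    for _ in range(n_steps):
        e = w * np.exp(-y * c * fk)
        grad = np.sum(-y * fk * e)          # dJ/dc
        curv = np.sum((y * fk) ** 2 * e)    # d^2 J/dc^2 > 0
        if curv <= 0:
            break
        c -= grad / curv
    return c

# Hypothetical usage with a real-valued feature weakly correlated with the label.
rng = np.random.default_rng(7)
y = rng.choice([-1.0, 1.0], size=80)
fk = rng.normal(size=80) + 0.5 * y
w = np.ones(80)
print(fit_c_exponential(fk, y, w))
```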

Interestingly, if we use the QuadBoost or GradientBoost procedure as above, we can still use matrix multiplications to approximate which $f_k$ should be chosen. The same derivations as above apply to this case. For GradientBoost, the criterion for choosing a feature $f_k$ becomes:

$$-\left.\frac{d J_{\log}(c; f_k)}{dc}\right|_{c=0} = \sum_i \frac{y_i f_k(x_i) \exp\left(-y_i F_{k-1}(x_i)\right)}{1 + \exp\left(-y_i F_{k-1}(x_i)\right)} = \sum_i y_i f_k(x_i)\, w_i = \sum_{i: y_i = +1} w_i f_k(x_i) - \sum_{i: y_i = -1} w_i f_k(x_i)$$

where $w_i := \frac{1}{1 + \exp\left(y_i F_{k-1}(x_i)\right)}$ as before. If the $f_k(x_i)$ are expressed in a matrix (or a pair of matrices, for the decomposable features used in MEDUSA), then these sums can be performed with a matrix multiplication.

For LogitBoost, having chosen an $f_k$ using the above GradientBoost method, we would choose the optimal $c$ using a single Newton step, which is now:

$$c^*(f_k) = \frac{\sum_{i: y_i = +1} w_i f_k(x_i) - \sum_{i: y_i = -1} w_i f_k(x_i)}{\sum_i f_k(x_i)^2\, w_i (1 - w_i)} = \frac{\sum_i w_i y_i f_k(x_i)}{\sum_i f_k(x_i)^2\, w_i (1 - w_i)}$$

The QuadBoost criterion for choosing an $f_k$ (the predicted decrease in the loss) is then:

$$J^*(f_k) = \frac{\left(\sum_i w_i y_i f_k(x_i)\right)^2}{2 \sum_i f_k(x_i)^2\, w_i (1 - w_i)}$$

and these sums can be computed by matrix multiplication. For MEDUSA with decomposable features, the sum over $i$ becomes a sum over $g$ and $e$, and the code does not have to change.

3.3 Binary response variable (logistic regression)

The loss as a function of the new feature choice $k$ and new coefficient $c$ is:

$$J(k, c) = \sum_i \log\left(1 + \exp\left(-y_i (F_i + c A_{ki})\right)\right)$$

The derivative with respect to $c$ is:

$$\frac{dJ}{dc} = \sum_i \frac{-\exp\left(-y_i (F_i + c A_{ki})\right) y_i A_{ki}}{1 + \exp\left(-y_i (F_i + c A_{ki})\right)} = \sum_i \frac{-y_i A_{ki}}{1 + \exp\left(y_i (F_i + c A_{ki})\right)}$$

Setting the derivative equal to zero does not give an analytic solution (even if we restrict ourselves to binary features). Instead, we resort to a single Newton step, as before.
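Since there is no analytic solution, a single Newton step at $c = 0$ can be computed for every candidate feature at once; the sketch below does this for the binary-response case, assuming the features are stored as a matrix `A` with `A[k, i]` $= A_{ki}$. Names and data are illustrative assumptions.

```python
import numpy as np

def logistic_newton_scores(A, y, F, eps=1e-12):
    """Binary-response (logistic) boosting: one Newton step at c = 0 for every
    candidate feature k, plus the second-order estimate of the loss decrease.

    A : (K, m) feature matrix, A[k, i] = A_ki (may be real-valued).
    y : (m,) labels in {-1, +1};  F : (m,) current predictions F_i.
    """
    w = 1.0 / (1.0 + np.exp(y * F))            # w_i = 1 / (1 + exp(y_i F_i))
    grad = A @ (y * w)                          # -dJ/dc at c = 0, for all k
    curv = (A ** 2) @ (w * (1.0 - w))           # d^2 J/dc^2 at c = 0, for all k
    c = grad / (curv + eps)                     # Newton step per feature
    decrease = grad ** 2 / (2.0 * curv + eps)   # predicted drop in the log-loss
    return c, decrease

# Hypothetical usage: pick the feature with the largest predicted decrease.
rng = np.random.default_rng(8)
A = rng.normal(size=(15, 100))
y = rng.choice([-1.0, 1.0], size=100)
c, dec = logistic_newton_scores(A, y, np.zeros(100))
k = int(np.argmax(dec))
print(k, c[k])
```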

3.4 Poisson response variable

If the response variable is distributed according to a Poisson distribution where $\log(\lambda)$ is a weighted sum of features, then the loss as a function of the new choice of feature $k$ and the new score $c$ (derived from the negative log-likelihood) is:

$$J(k, c) = \sum_i \left[-y_i (F_i + c A_{ki}) + \exp\left(F_i + c A_{ki}\right)\right] = -c \sum_i y_i A_{ki} - \sum_i y_i F_i + \sum_i \exp\left(c A_{ki}\right) \exp\left(F_i\right)$$

The derivative with respect to $c$ is

$$\frac{dJ}{dc} = \sum_i \left[-y_i A_{ki} + A_{ki} \exp\left(c A_{ki}\right) \exp\left(F_i\right)\right]$$

Setting the derivative to zero does not give an analytic solution for the optimal $c$ in the general case, and we will resort to using one Newton step as before. However, in the case of boolean features ($A_{ki} \in \{0, 1\}$) we can find an analytic solution:

$$\exp(c) \sum_i A_{ki} \exp(F_i) = \sum_i A_{ki} y_i \quad \Longrightarrow \quad c^*_k = \log\left(\frac{\sum_i A_{ki} y_i}{\sum_i A_{ki} \exp(F_i)}\right)$$

Substituting this expression into $J(k, c)$ gives the optimal loss achievable for each choice of feature $k$ to add:

$$J^*_k = J_{\mathrm{prev}} - \sum_i A_{ki} \exp(F_i) + \sum_i A_{ki} y_i \left(1 - \log\left(\frac{\sum_i A_{ki} y_i}{\sum_i A_{ki} \exp(F_i)}\right)\right)$$

The required sums are $\sum_i A_{ki} y_i$ and $\sum_i A_{ki} \exp(F_i)$, which can each be computed for all $k$ with matrix multiplications.

If we do not restrict ourselves to boolean features, then we resort to a Newton step. The first and second derivatives with respect to $c$, evaluated at $c = 0$, are:

$$\left.\frac{dJ}{dc}\right|_{c=0} = \sum_i \left[-y_i A_{ki} + A_{ki} \exp(F_i)\right] \qquad \left.\frac{d^2 J}{dc^2}\right|_{c=0} = \sum_i A_{ki}^2 \exp(F_i)$$

The Newton step is then:

$$c^*_k = -\frac{\left.\frac{dJ}{dc}\right|_{c=0}}{\left.\frac{d^2 J}{dc^2}\right|_{c=0}} = \frac{\sum_i A_{ki} \left(y_i - \exp(F_i)\right)}{\sum_i A_{ki}^2 \exp(F_i)}$$
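A sketch of the boolean-feature Poisson update, assuming the features are given as a 0/1 matrix `A`; the sums $\sum_i A_{ki} y_i$ and $\sum_i A_{ki} \exp(F_i)$ for all $k$ are each a single matrix-vector product. The data and names are illustrative assumptions.

```python
import numpy as np

def poisson_best_feature(A, y, F, eps=1e-12):
    """Poisson boosting step for boolean features (Section 3.4).

    A : (K, m) 0/1 feature matrix, A[k, i] = A_ki.
    y : (m,) non-negative counts; F : (m,) current log-rate predictions.
    Returns the per-feature closed-form scores c*_k and losses J*_k.
    """
    J_prev = np.sum(-y * F + np.exp(F))
    S_y = A @ y                      # sum_i A_ki y_i, for all k at once
    S_e = A @ np.exp(F)              # sum_i A_ki exp(F_i), for all k at once
    c = np.log((S_y + eps) / (S_e + eps))
    J_star = J_prev - S_e + S_y * (1.0 - np.log((S_y + eps) / (S_e + eps)))
    return c, J_star

# Hypothetical usage with Poisson-distributed counts.
rng = np.random.default_rng(9)
A = (rng.random((10, 200)) < 0.3).astype(float)
F = rng.normal(scale=0.2, size=200)
y = rng.poisson(np.exp(F + 0.5 * A[3])).astype(float)   # feature 3 is informative
c, J_star = poisson_best_feature(A, y, F)
k = int(np.argmin(J_star))
print(k, c[k])
```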

3.5 Gaussian response variable

If the response variable is normally distributed, then the loss as a function of the new choice of feature $k$ and the new score $c$ is:

$$J(k, c) = \sum_i \left(y_i - F_i - c A_{ki}\right)^2$$

Minimizing with respect to $c$, we have the optimal $c$ for each $k$:

$$c^*_k = \frac{\sum_i A_{ki} (y_i - F_i)}{\sum_i A_{ki}^2}$$

Substituting this expression for $c^*_k$ into $J(k, c)$ gives the optimal loss achievable for each choice $k$ of feature to add:

$$J^*_k = \sum_i \left(y_i - F_i - A_{ki} \frac{\sum_j A_{kj} (y_j - F_j)}{\sum_j A_{kj}^2}\right)^2 = \sum_i (y_i - F_i)^2 - 2 \sum_i (y_i - F_i) A_{ki} \frac{\sum_j A_{kj} (y_j - F_j)}{\sum_j A_{kj}^2} + \sum_i A_{ki}^2 \left(\frac{\sum_j A_{kj} (y_j - F_j)}{\sum_j A_{kj}^2}\right)^2 = J_{\mathrm{prev}} - \frac{\left(\sum_i A_{ki} (y_i - F_i)\right)^2}{\sum_i A_{ki}^2}$$

The numerator and denominator can be calculated efficiently for all $k$ at once using matrix multiplication. If the data consist of two dimensions $g$ and $e$, and the features are the product of two base features $m$ and $r$ of the form $A^{mr}_{ge} = B^m_{ge} C^r_e$, then the optimal loss for a choice of $m$ and $r$ is:

$$J^*_{mr} = J_{\mathrm{prev}} - \frac{\left(\sum_{g=1}^{G} \sum_{e=1}^{E} B^m_{ge} C^r_e \left(y_{ge} - F_{ge}\right)\right)^2}{\sum_{g=1}^{G} \sum_{e=1}^{E} \left(B^m_{ge}\right)^2 \left(C^r_e\right)^2}$$

Now, the numerator involves a specialized matrix multiplication $(B \circ (y - F))\, C$, where $\circ$ denotes the elementwise product, of the same complexity as a normal matrix multiplication.

References

[1] Michael Collins, Robert E. Schapire, and Yoram Singer. Logistic regression, AdaBoost and Bregman distances. In Machine Learning, pages 158-169.

[2] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337-407, 2000.

[3] Robert E. Schapire. The boosting approach to machine learning: An overview.
