A Theoretical Analysis of a Warm Start Technique


Martin A. Zinkevich
Yahoo! Labs
701 First Avenue
Sunnyvale, CA

Abstract

Batch gradient descent looks at every data point for every step, which is wasteful for early steps where the current position is nowhere near optimal. There has been a lot of interest in warm-start approaches to gradient descent techniques, but little analysis. In this paper, we formally analyze a method of warm-starting batch gradient descent using small batch sizes. We argue that this approach is fundamentally different than mini-batch, in that after an initial shuffle, it requires only sequential passes over the data, improving performance on datasets stored on a disk drive.

1 Support Vector Machine Problems

Suppose you have a set of examples (x_1, y_1), ..., (x_m, y_m), drawn from a distribution D, and a regularization parameter λ. For each example i, there is some convex loss L_i : R^d → R over the parameter space associated with example i, such that for any S ⊆ {1...m}, we have a cost function f_S : R^d → R:

    f_S(w) = (λ/2)||w||² + (1/|S|) Σ_{i∈S} L_i(w).    (1)

Also, we can have a distribution D over loss functions. Using this we can define:

    f_D(w) = (λ/2)||w||² + ∫ L(w) dD(L).    (2)

Here, we will focus on finding a point w such that f_D(w) is as small as possible. This is often done by minimizing f_{1...m}. As an intermediate step, we focus on Euclidean distance to w* = argmin_w f_D(w), as many times this is a symmetric problem. Define G to be a bound on the magnitude of the gradient of functions in D. Define w*_{1...m} ∈ R^d such that w*_{1...m} = argmin_w f_{1...m}(w). We will first bound ||∇f_{1...m}(w*)|| with high probability, and then connect this to the distance between w*_{1...m} and w*. A quick theorem, in the vein of statistical learning theory:

Lemma 1 With probability of at least 1 − δ,

    ||w*_{1...m} − w*|| ≤ √( (ln 2 + ln d + ln(1/δ)) 8dG² / (λ²m) ).

The proof is in the appendix. In [1], they argue that with access to an infinite amount of data, it makes sense to balance the loss due to optimization error and the loss due to estimation error. Here, we use this balance not just to tune the algorithm but to alter its structure. We modify batch gradient by having multiple epochs, doubling the size of the batch each epoch.
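To make the objective in (1) concrete, here is a small illustrative Python sketch (not from the paper) that evaluates f_S and its gradient. The choice of the logistic loss L_i(w) = log(1 + exp(−y_i w·x_i)), the NumPy representation, and the function names are assumptions; any convex per-example loss with bounded gradient fits the setup equally well.

    import numpy as np

    def f_S(w, X, y, lam, S):
        """Regularized cost f_S(w) = (lam/2)*||w||^2 + (1/|S|)*sum_{i in S} L_i(w),
        with the logistic loss L_i(w) = log(1 + exp(-y_i * w.x_i)) as an illustrative choice."""
        margins = y[S] * (X[S] @ w)
        loss = np.mean(np.logaddexp(0.0, -margins))
        return 0.5 * lam * np.dot(w, w) + loss

    def grad_f_S(w, X, y, lam, S):
        """Gradient of f_S at w for the same illustrative logistic loss."""
        margins = y[S] * (X[S] @ w)
        # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m))
        coeffs = -y[S] / (1.0 + np.exp(margins))
        return lam * w + (X[S].T @ coeffs) / len(S)

With a helper like this, w*_{1...m} is simply the minimizer of f_S over S = {1, ..., m}.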

One key practical element that we want to emphasize is that random access has a high cost, especially on hard disk drives. For example, a modern disk drive can read or write about 100 megabytes a second of data written in sequence in a file; however, if you want a random chunk off the disk, it can take about 10 milliseconds for the disk to spin to a specific location and for the head to move to access that one byte. Therefore, if your examples are 1 kilobyte in size, then you can either read 10,000 sequentially per second or 100 randomly per second. On more expensive solid state drives, the gap is smaller, but this can make storage prohibitively expensive. However, special tricks can be applied to shuffle data (see below). Thus, stochastic gradient descent [2] and mini-batch [7, 5] can be significantly more expensive than one might expect by just looking at the number of iterations. In practical implementations such as [6], the stochastic aspect of stochastic gradient descent is ignored, and the examples are passed over sequentially (Algorithm 1). Although in practice this algorithm works fine and you can prove that it converges, getting practical convergence rates is incredibly difficult. This represents a huge discrepancy between theory and practice in the field.

Algorithm 1 Shuffled Stochastic Gradient Descent
  Input: cost functions L_1 ... L_m, epochs T, step size η, initial w ∈ R^n.
  If not already shuffled, shuffle L_1 ... L_m.
  for t = 1 to T do
    for i = 1 to m do
      w ← w − η ∇f_{{i}}(w)

One critical issue is that you must shuffle the data: if the data was generated independently and identically (each example drawn at random from the same distribution), then no shuffling is necessary. On the other hand, if it was then reordered (for example, the data was ordered by date, or label), then it must be shuffled. However, oddly, shuffling the data does not require much random access. For example, we can take a dataset and add a random key to each entry. Then, using fast methods for large scale sorting such as mergesort, we can sort by the random key, effecting a shuffle in n log n time. Therefore, the question we ask in this paper is: Given a dataset that is shuffled once and accessed sequentially thereafter, what theoretical guarantees can be obtained? Although we do not have a tight analysis of Shuffled Stochastic Gradient Descent, in this paper we present another practical algorithm which is easier to analyze.

Algorithm 2 Doubling Batch Gradient Descent
  Input: cost functions L_1 ... L_m, initial batch size m_initial, iterations T per size, step size η, initial w ∈ R^n, iterations T_0 in the first phase.
  If not already shuffled, shuffle L_1 ... L_m.
  m' ← m_initial.   {The first epoch}
  for t = 1 to T_0 do
    w ← w − η ∇f_{1...m'}(w)
  while m' < m do   {Each iteration of the while loop is another epoch}
    m' ← min(2m', m).
    for t = 1 to T do
      w ← w − η ∇f_{1...m'}(w)
  end while
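As a concrete reference point before the analysis, the following is a minimal in-memory Python mock-up of Algorithm 2, reusing the hypothetical grad_f_S helper sketched in Section 1. In the setting the paper cares about, the prefix L_1 ... L_m' would instead be streamed sequentially from disk each epoch; the permutation below stands in for the shuffle-by-sorting trick described above.

    import numpy as np

    def doubling_batch_gd(X, y, lam, eta, T0, T, m_initial, rng=np.random.default_rng(0)):
        """Sketch of Algorithm 2: run T0 steps on the first m_initial examples,
        then double the prefix each epoch and run T full-batch steps on it."""
        m, d = X.shape
        perm = rng.permutation(m)          # the one-time shuffle
        X, y = X[perm], y[perm]
        w = np.zeros(d)
        mb = m_initial
        S = np.arange(mb)
        for _ in range(T0):                # the first epoch
            w -= eta * grad_f_S(w, X, y, lam, S)
        while mb < m:                      # each pass of this loop is another epoch
            mb = min(2 * mb, m)
            S = np.arange(mb)
            for _ in range(T):
                w -= eta * grad_f_S(w, X, y, lam, S)
        return w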

2 Theoretical Analysis of Doubling Batch Gradient Descent

In this section we show how to set the parameters to prove a significant improvement over traditional analysis. Define C = √( (ln 2 + ln d + ln(1/δ) + ln log₂ m) 8dG² / λ² ) (see Lemma 3 for why).

Theorem 2 Running Algorithm 2 with m_initial = 1, T_0 = T = 2(κ + 1) ln(1 + 2√2), and η = 1/(H + λ) achieves an error of (H + λ)4C²/(2m).

Before we prove this, we need two lemmas. The parameters T_0 and T are selected such that at the end of each epoch, the bound on ||w − w*_{1...m}|| is roughly equal to the bound on ||w*_{1...m} − w*||.

Lemma 3 With probability at least 1 − δ, for all m' used, ||w*_{1...m'} − w*|| ≤ C/√m'.

Proof: This follows from union bounds over the log₂ m batch sizes used, using Lemma 1 and substituting δ/log₂ m for δ.

Suppose that there is an upper bound H on the Lipschitz constant of the gradient of the functions L_1 ... L_m. Then, for all S ⊆ {1...m}, we have a bound (namely H + λ) on the Lipschitz constant of the gradient of f_S. Define d(x, y) = ||x − y||. Therefore, we can use a slight tweak on Lemma 16 from the appendix of [8], similar to the exponential convergence in cost described in [3], Chapter 9:

Lemma 4 Given a convex function L where ∇L is Lipschitz continuous with constant H, define c(x) = (λ/2)||x||² + L(x), with minimizer w*. If η(λ + H) ≤ 1 and η < 1, then for all w:

    d(w − η∇c(w), w*)² ≤ (1 − ηλ) d(w, w*)².    (3)

The proof is mostly in [8], but can be slightly generalized, making strict-inequality conditions simply inequalities.

Proof of Theorem 2: Thus, if we set η = 1/(H + λ), then we know that we move exponentially fast. Define κ = H/λ to be the conditioning number of the system. Define w_0 to be the initial position at the beginning of an epoch run on the first m examples. After T steps of that epoch:

    d(w, w*_{1...m})² ≤ (1 − λ/(H + λ))^T d(w_0, w*_{1...m})²    (4)
    d(w, w*_{1...m}) ≤ exp( −λT / (2(H + λ)) ) d(w_0, w*_{1...m}).    (5)

Suppose that we choose T_0 and the initial sample size m_initial such that d(w, w*_{1...m}) ≤ C/√m after the first epoch (with m = m_initial). Consider any epoch that begins with this guarantee for the current batch size m. Since d(w*, w*_{1...m}) ≤ C/√m and d(w*, w*_{1...2m}) ≤ C/√(2m), at the start of the next epoch:

    d(w, w*_{1...2m}) ≤ d(w, w*_{1...m}) + d(w*_{1...m}, w*) + d(w*, w*_{1...2m})    (6)
                      ≤ C/√m + C/√m + C/√(2m)    (7)
                      = (1 + 2√2) C/√(2m).    (8)

Therefore, if T = 2(κ + 1) ln(1 + 2√2) ≈ 2.68(κ + 1), then at the end of the epoch, d(w, w*_{1...2m}) ≤ C/√(2m), and the argument repeats for each subsequent doubling. Here is the key element: note that T only has a linear dependence on κ, and is independent of m, or even of the final error of the algorithm. Moreover, a single iteration on a batch of size m has a cost of md, and so a single epoch (except the first) has a cost of 2 ln(1 + 2√2)(κ + 1) m d. Note that C > G/λ, so if m_initial = 1 and w_0 = 0, then ||w_0 − w*_{1...m_initial}|| ≤ C/√m_initial. Therefore, we can choose T_0 < T, and push m_initial to be significantly higher than 1. So, the first epoch will not take longer than the others.
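The contraction argument above can be sanity-checked numerically. The toy Python snippet below (an illustration, not an experiment from the paper) runs gradient descent with η = 1/(H + λ) on a regularized quadratic whose Hessian eigenvalues lie in [λ, H + λ], and checks that after roughly 2.68(κ + 1) steps the distance to the minimizer has shrunk by at least the factor 1/(1 + 2√2) that the epoch argument needs. The quadratic objective and all constants are assumptions made only for this check.

    import numpy as np

    # Toy check of the per-epoch contraction: minimize c(w) = (lam/2)||w||^2 + 0.5 w'Aw,
    # where A is PSD with largest eigenvalue H, so grad c is (H+lam)-Lipschitz
    # and c is lam-strongly convex.
    rng = np.random.default_rng(0)
    d, lam, H = 20, 0.1, 10.0
    B = rng.standard_normal((d, d))
    A = B @ B.T
    A *= H / np.linalg.eigvalsh(A).max()      # scale so the largest eigenvalue is H
    w_star = np.zeros(d)                      # the minimizer of c is the origin
    kappa = H / lam
    eta = 1.0 / (H + lam)
    T = int(np.ceil(2 * (kappa + 1) * np.log(1 + 2 * np.sqrt(2))))   # about 2.68*(kappa+1)

    w = rng.standard_normal(d)
    d0 = np.linalg.norm(w - w_star)
    for _ in range(T):
        w -= eta * (lam * w + A @ w)          # one gradient step on c
    shrink = np.linalg.norm(w - w_star) / d0
    # the observed shrinkage should be at most the 1/(1+2*sqrt(2)) factor the proof needs
    assert shrink <= 1.0 / (1 + 2 * np.sqrt(2)) + 1e-9
    print(T, shrink, 1.0 / (1 + 2 * np.sqrt(2)))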

Assume that m/m_initial is a power of 2. The computational cost of the algorithm will be:

    Cost ≤ ( m_initial + 2m_initial + 4m_initial + ··· + m/2 + m ) · 2.68(κ + 1) · d    (9)
    Cost ≤ Σ_{t=0}^{∞} (m/2^t) · 2.68(κ + 1) · d    (10)
    Cost ≤ 2m · 2.68(κ + 1) · d.    (11)

This algorithm achieves a parameter distance of 2C/√m from the optimum. Using the bound of H + λ on the Lipschitz constant of the gradient (the second derivative), this achieves a bound in cost¹ of (H + λ)4C²/(2m). Note that we did not change m_initial and T_0 from the natural defaults of 1 and T. One other reason for increasing m_initial is to affect the conditioning number of the resulting problem: as the loss functions from many examples are averaged, the problem can become more well-conditioned. Moreover, in certain cases, one can empirically bound the value of H by one pass over the data where you average the examples, allowing you to adjust the number of iterations per epoch accordingly. See Appendix B.

Running the batch gradient method to obtain this optimization error, referring to [1] or using the analysis here, requires a cost at least on the order of mκd log( 2m / (4C²(H + λ)) ), barring varying definitions of the conditioning number. Therefore, by scaling up the data set size, we have eliminated this last logarithmic term from our bounds.

3 Conclusion

Most discussions of large-scale learning focus on parallelization across different machines. However, another direction is to push learning deeper into the memory hierarchy. This paper has argued that deeper in the memory hierarchy, the difference between sequential access time and random access time is very significant, making stochastic gradient and mini-batch approaches less attractive than they would otherwise be. One very powerful argument against batch is the overkill involved in looking at the gradient of every example at the initial point: this paper has to some degree addressed this issue. However, barring the issues of random access, stochastic gradient descent and mini-batch have very different properties, not to mention second order methods. Overall, the larger question is: Given a dataset that is shuffled once and accessed sequentially thereafter, what theoretical guarantees can be obtained? In other words, what guarantees can be obtained with pseudo-stochastic gradient descent, where the data is shuffled once, but then passed over multiple times? What if mini-batch is used? Theoretically speaking, Newton's method basically takes six steps once it has reached its asymptotic stage [3]. However, it can take a while to reach that phase: can this analysis apply in that phase?

One reasonable argument for ignoring some of the practical issues with stochastic methods is the arrival of solid state drives, which have radically different performance characteristics than disk drives. But for the foreseeable future, a machine focused on processing data streaming from a disk will be a very cheap option. Secondly, the memory hierarchy itself presents interesting opportunities and challenges: it is conceivable that data could be accessed multiple times in memory while new data is being drawn from disk. These types of approaches are being enabled with tools like Torch [4], but there is little theoretical understanding.

We make no claims that this is the first example of shuffling once and running many times. In fact, this represents a standard trick (e.g., it can be applied using Vowpal Wabbit [6]), but one with little theory supporting it. If the theory behind shuffling algorithms and pseudo-stochastic algorithms can be expanded, it will better connect the theory of machine learning optimization and its practice.
¹ Note that this is a slightly different bound than in [1]: C has a 1/λ term, so this bound is neither strictly better nor worse than the bounds in that paper. In this work, we define the conditioning number to be the ratio of the Lipschitz constant of the gradient to the strong convexity constant; in [1], they focus on a similar ratio in the vicinity of the optimal point, whereas [3] uses a measure similar to the one in this paper.

References

[1] L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems, 2008.
[2] Léon Bottou. Stochastic learning. In Olivier Bousquet and Ulrike von Luxburg, editors, Advanced Lectures on Machine Learning, 2004.
[3] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, New York, 2004.
[4] R. Collobert, S. Bengio, L. Bottou, J. Weston, and I. Melvin. Torch 5.
[5] O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao. Optimal distributed online prediction. In Proceedings of the Twenty-Eighth International Conference on Machine Learning, 2011.
[6] J. Langford. Vowpal Wabbit.
[7] N. Schraudolph and T. Graepel. Combining conjugate direction methods with stochastic approximation of gradients. In C. Bishop and B. Frey, editors, Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003.
[8] M. Zinkevich, M. Weimer, A. Smola, and L. Li. Parallelized stochastic gradient descent. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems.

A Non-Asymptotic Estimation Error

Lemma 5 (Hoeffding's Inequality) If for all i in 1...n, Pr[X_i ∈ [a_i, b_i]] = 1, and X̄ = (X_1 + ··· + X_n)/n, then

    Pr[ |X̄ − E[X̄]| ≥ t ] ≤ 2 exp( −2n²t² / Σ_{i=1}^{n} (b_i − a_i)² ).    (12)

We will also use a basic result from strong convexity:

Lemma 6 ([3], Equation 9.11) If x* is the minimum of f, and f is h-strongly convex, then ||x − x*|| ≤ (2/h) ||∇f(x)||.

An interesting issue is establishing how accurate we are. Suppose that we know that, regardless of the dataset sampled, we have a lower bound on the curvature. Define f_{1...m} : R^d → R to be the cost function built from m samples; Lemma 1 bounds how far its minimizer can be from w*.

Lemma 1 With probability of at least 1 − δ,

    ||w*_{1...m} − w*|| ≤ √( (ln 2 + ln d + ln(1/δ)) 8dG² / (λ²m) ).

Proof: Since w* minimizes f_D, ∇f_D(w*) = 0. Therefore, if L is drawn from D, then the expected gradient of (λ/2)||w||² + L(w) at w* is zero. Thus, the ith component of the gradient, λw*_i + (∇L(w*))_i, has expected value zero, and is between −G + λw*_i and G + λw*_i. Thus, we can use Hoeffding bounds to show that, for the average over the m examples, for any dimension i:

    Pr[ |(∇f_{1...m}(w*))_i| ≥ t ] ≤ 2 exp( −mt² / (2G²) ).    (13)

Then, using union bounds:

    Pr[ ∃i, |(∇f_{1...m}(w*))_i| ≥ t ] ≤ 2d exp( −mt² / (2G²) )    (14)
    Pr[ ||∇f_{1...m}(w*)|| ≥ t√d ] ≤ 2d exp( −mt² / (2G²) ).    (15)

Now, from Lemma 6, since f_{1...m} is λ-strongly convex, we can use ||∇f_{1...m}(w*)|| to bound the distance between w* and the optimal solution of f_{1...m}. Formally,

    ||w* − w*_{1...m}|| ≤ (2/λ) ||∇f_{1...m}(w*)||.    (16)

Therefore, we push this into our previous results:

    Pr[ (2/λ)||∇f_{1...m}(w*)|| ≥ (2/λ) t√d ] ≤ 2d exp( −mt² / (2G²) )    (17)
    Pr[ ||w* − w*_{1...m}|| ≥ (2/λ) t√d ] ≤ 2d exp( −mt² / (2G²) ).    (18)

Setting ε = 2t√d/λ, and substituting ελ/(2√d) for t:

    Pr[ ||w* − w*_{1...m}|| ≥ ε ] ≤ 2d exp( −mε²λ² / (8dG²) ).    (19)

Therefore, setting

    δ = 2d exp( −mε²λ² / (8dG²) ),    (20)

we have now established:

    Pr[ ||w* − w*_{1...m}|| ≥ ε ] ≤ δ.    (21)

Therefore, solving for ε:

    ln δ = ln 2 + ln d − mε²λ² / (8dG²)    (22)
    mε²λ² / (8dG²) = ln 2 + ln d + ln(1/δ)    (23)
    ε² = (ln 2 + ln d + ln(1/δ)) 8dG² / (λ²m)    (24)
    ε = √( (ln 2 + ln d + ln(1/δ)) 8dG² / (λ²m) ).    (25)
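The 1/√m rate in Lemma 1 can also be eyeballed numerically. The sketch below (illustrative only, not from the paper) uses ridge regression, where w*_{1...m} has a closed form, and a Gaussian data model as stand-ins; note that the squared loss only satisfies the bounded-gradient assumption on a bounded region, so this is a picture of the rate rather than of the constant.

    import numpy as np

    # Toy illustration of Lemma 1's O(1/sqrt(m)) rate using ridge regression,
    # where argmin (lam/2)||w||^2 + (1/m) sum 0.5*(w.x_i - y_i)^2 has a closed form.
    rng = np.random.default_rng(1)
    d, lam = 5, 0.5
    w_true = rng.standard_normal(d)

    def empirical_minimizer(X, y):
        m = len(y)
        return np.linalg.solve(X.T @ X / m + lam * np.eye(d), X.T @ y / m)

    # Approximate w* = argmin f_D with a very large sample.
    X_big = rng.standard_normal((200_000, d))
    y_big = X_big @ w_true + rng.standard_normal(200_000)
    w_star = empirical_minimizer(X_big, y_big)

    for m in [100, 400, 1600, 6400]:
        dists = []
        for _ in range(20):
            X = rng.standard_normal((m, d))
            y = X @ w_true + rng.standard_normal(m)
            dists.append(np.linalg.norm(empirical_minimizer(X, y) - w_star))
        # sqrt(m) times the average distance should stay roughly flat as m grows
        print(m, np.sqrt(m) * np.mean(dists))

If the lemma's rate holds in this toy model, the printed values of √m times the average distance should stay roughly flat as m grows.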

B Bounding H

One drawback of this approach is that it requires deep knowledge of the parameters of the system, such as the maximum gradient and the Lipschitz constant of the derivative of the average of the losses. It turns out that it is fairly easy to bound H for the whole dataset in a single pass. Suppose that you consider the maximum curvature of the average of a bunch of loss functions. Suppose that these loss functions are of the form:

    L_i(w) = l(w · x_i, y_i).    (26)

We assume that the Lipschitz constant of the derivative of the loss function l in its first argument is bounded above by H, and that for all i, ||x_i|| = 1. First, we are trying to bound the curvature of:

    L_{1...m}(w) = (1/m) Σ_{i=1}^{m} L_i(w).    (27)

Define x̄ = (1/m) Σ_{i=1}^{m} x_i.

Lemma 7 If for all i, for all j, (x_i)_j ≥ 0, then the Lipschitz constant of ∇L_{1...m} is bounded above by H ||x̄||.

Proof: We know that:

    ∇L_i(w) = (∂l(ŷ, y_i)/∂ŷ)|_{ŷ = w·x_i} · x_i    (28)
    ∇L_i(w) − ∇L_i(w') = ( (∂l(ŷ, y_i)/∂ŷ)|_{ŷ = w·x_i} − (∂l(ŷ, y_i)/∂ŷ)|_{ŷ = w'·x_i} ) · x_i    (29)
    ||∇L_i(w) − ∇L_i(w')|| = | (∂l(ŷ, y_i)/∂ŷ)|_{ŷ = w·x_i} − (∂l(ŷ, y_i)/∂ŷ)|_{ŷ = w'·x_i} | · ||x_i||.    (30)

Using the Lipschitz bound on the derivative of l and the bound ||x_i|| = 1:

    ||∇L_i(w) − ∇L_i(w')|| ≤ H |w·x_i − w'·x_i|    (31)
    ||∇L_i(w) − ∇L_i(w')|| ≤ H |(w − w')·x_i|.    (32)

Summing over all i:

    ||∇L_{1...m}(w) − ∇L_{1...m}(w')|| ≤ (1/m) Σ_i ||∇L_i(w) − ∇L_i(w')||    (33)
                                       ≤ (1/m) Σ_i H |(w − w')·x_i|.    (34)

Therefore, the Lipschitz constant of ∇L_{1...m} is bounded by

    max_{w ≠ w'} ||∇L_{1...m}(w) − ∇L_{1...m}(w')|| / ||w − w'|| ≤ max_{w ≠ w'} (H/m) Σ_i |(w − w')·x_i| / ||w − w'||.    (35)

Since for all i and all j, (x_i)_j ≥ 0, then for all j, if we wish to upper bound the left-hand side, we can assume without loss of generality that w_j − w'_j ≥ 0 and remove the absolute values. Replace w − w' with a vector v:

    max_{v ≥ 0, v ≠ 0} (H/m) Σ_i (v · x_i) / ||v|| = max_{v ≥ 0, v ≠ 0} H (v · x̄) / ||v||.    (36)

Using Cauchy-Schwarz:

    max_{v ≥ 0, v ≠ 0} H (v · x̄) / ||v|| ≤ max_{v ≥ 0, v ≠ 0} H ||v|| ||x̄|| / ||v|| = H ||x̄||.    (37)
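A one-pass estimate of the Lemma 7 bound might look like the following sketch. It is illustrative: the streaming interface, the default H = 1/4 (the standard bound for the logistic loss), and the assumption that features arrive already nonnegative and unit-norm are all choices layered on top of the appendix.

    import numpy as np

    def one_pass_H_bound(example_stream, d, H_loss=0.25):
        """One sequential pass: average the (nonnegative, unit-norm) feature vectors
        and return H_loss * ||x_bar||, the Lemma 7 bound on the curvature of L_{1...m}.
        H_loss = 0.25 is the bound for logistic loss; plug in the constant for your loss."""
        x_sum = np.zeros(d)
        m = 0
        for x in example_stream:          # x is assumed nonnegative with ||x|| = 1
            x_sum += x
            m += 1
        x_bar = x_sum / m
        return H_loss * np.linalg.norm(x_bar)

The smoothness constant fed into the choice of η in Section 2 would then be λ plus the returned value.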

C One Shuffle Mini-Batch

The impractical aspect of applying mini-batch on a disk drive is the need to grab a bunch of independent elements. However, if one first divides the data into mini-batches, and these mini-batches are sufficiently large, then seek time does not dominate the time to read the batch data.

Algorithm 3 One Shuffle Mini-Batch
  Input: cost functions L_1 ... L_m, mini-batch size b, iterations T, step size η, initial w ∈ R^n.
  If not already shuffled, shuffle L_1 ... L_m.
  for t = 1 to T do
    Draw i from {0 ... m/b − 1}.
    w ← w − η ∇f_{ib+1...(i+1)b}(w)

The averages of the gradients of the mini-batches are still correct, so running stochastic gradient descent over them with a sufficiently small step size will work well. However, what is difficult is understanding what kind of win is obtained in terms of the variances of the gradients; moreover, it may be necessary to bound this over the entire domain of the parameters, which might prove difficult.
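For completeness, here is a matching in-memory Python sketch of Algorithm 3, again leaning on the hypothetical grad_f_S helper from the earlier sketch; a disk-backed version would read each contiguous block sequentially.

    import numpy as np

    def one_shuffle_minibatch(X, y, lam, eta, T, b, rng=np.random.default_rng(0)):
        """Sketch of Algorithm 3: shuffle once, then repeatedly pick a random
        contiguous block of b examples and take a gradient step on it.
        Assumes b divides the number of examples."""
        m, d = X.shape
        perm = rng.permutation(m)            # the one-time shuffle
        X, y = X[perm], y[perm]
        w = np.zeros(d)
        for _ in range(T):
            i = rng.integers(0, m // b)      # block index in 0 ... m/b - 1
            S = np.arange(i * b, (i + 1) * b)
            w -= eta * grad_f_S(w, X, y, lam, S)
        return w

Because each block is contiguous after the single shuffle, the per-step read is one seek plus a sequential scan of b examples.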
