The local equivalence of two distances between clusterings: the Misclassification Error metric and the $\chi^2$ distance


Marina Meilă
University of Washington, Department of Statistics
Seattle, WA
mmp@stat.washington.edu

February 20, 2006

Abstract

We prove that the above two distances between partitions of a finite set are equivalent in a neighborhood of 0. In other words, if the two partitions are very similar, then $d_{\chi^2}$ defines upper and lower bounds on $d_{ME}$ and vice versa. The proof is geometric and relies on the convexity of a certain set of probability measures. The motivation for this work is in the area of data clustering, where these distances are frequently used to compare two clusterings of a set of observations. Moreover, our result applies to any pair of finite-valued random variables, and provides simple yet tight upper and lower bounds on the $\chi^2$ measure of (in)dependence, valid when the two variables are strongly dependent.

1 Motivation

Clustering, or finding partitions in data, has become an increasingly popular part of data analysis. In order to study clustering theoretically, or to assess its behaviour empirically, one needs to compare clusterings of a finite set in a meaningful way. The Misclassification Error and the $\chi^2$ distance are two distinct criteria for comparing clusterings, the first widely used in the computer science literature on clustering and the second originating in statistics. Here we show that these two distances are equivalent when the two partitions are very similar. In other words, if $d_{ME}$ is small, then $d_{\chi^2}$ is small too, and vice versa. This result is, to my knowledge, the first to give a detailed local comparison of two distances between partitions. The case of small distances is of utmost importance, as it is in this regime that one desires the behaviour of any clustering algorithm to lie. Therefore, this proof provides a theoretical tool for the analysis of algorithm behaviour and of clustering criteria. In the empirical evaluation of clustering algorithms, understanding the small-distance case allows one to make fine distinctions between various algorithms. The present equivalence theorems represent a step towards removing the dependence of the evaluation outcome on the particular distance used.

2 Definitions and representation

We consider a finite set $D_n$ with $n$ elements. A clustering is a partition of $D_n$ into sets $C_1, C_2, \dots, C_K$ called clusters, such that $C_k \cap C_l = \emptyset$ for $k \neq l$ and $\bigcup_{k=1}^K C_k = D_n$. Let the cardinality of cluster $C_k$ be $n_k$. We have, of course, that $n = \sum_{k=1}^K n_k$. We also assume that $n_k > 0$; in other words, $K$ represents the number of non-empty clusters.

Representing clusterings as matrices. W.l.o.g. the set $D_n$ can be taken to be $\{1, 2, \dots, n\} \stackrel{\mathrm{def}}{=} [n]$. Denote by $X$ a clustering $\{C_1, C_2, \dots, C_K\}$; $X$ can be represented by the $n \times K$ matrix $A_X$ with $A_{ik} = 1$ if $i \in C_k$ and 0 otherwise. In this representation, the columns of $A_X$ are indicator vectors of the clusters and are orthogonal.

Representing clusterings as random variables. The clustering $X$ can also be represented as the random variable (denoted abusively by) $X : [n] \to [K]$ taking value $k \in [K]$ with probability $n_k/n$.
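For concreteness, here is a small Python sketch of the two representations (not part of the paper; the helper name indicator_matrix and the toy labels are illustrative assumptions). It builds $A_X$ from a vector of cluster labels and reads off both the orthogonality of its columns and the distribution of the random variable $X$.

```python
import numpy as np

def indicator_matrix(labels, K):
    """n x K matrix A_X with A_X[i, k] = 1 iff point i belongs to cluster k."""
    A = np.zeros((len(labels), K))
    A[np.arange(len(labels)), labels] = 1.0
    return A

labels_X = np.array([0, 0, 1, 1, 1, 2])       # a clustering of n = 6 points into K = 3 clusters
A_X = indicator_matrix(labels_X, 3)

# Columns are indicator vectors of the clusters: A_X^T A_X is diagonal, with the cluster sizes n_k.
print(A_X.T @ A_X)                            # diag(2, 3, 1)
print(A_X.sum(axis=0) / len(labels_X))        # distribution of the random variable X: n_k / n
```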

One typically requires distances between partitions to be invariant to permutations of the labels $1, \dots, K$. By this representation, any distance between two clusterings can be seen as a particular type of distance between random variables which is invariant to permutations.

Let a second clustering of $D_n$ be $Y = \{C'_1, C'_2, \dots, C'_{K'}\}$, with cluster sizes $n'_k$. Note that the two clusterings may have different numbers of clusters.

Lemma 1 The joint distribution of the variables $X, Y$ is given by
$$p_{XY} = \frac{1}{n} A_X^T A_Y \qquad (2.1)$$
In other words, $p_{XY}(x, y)$ is the $(x, y)$-th element of the $K \times K'$ matrix in (2.1). In the above, the superscript $(\cdot)^T$ denotes matrix transposition. The proof is immediate and is left to the reader.

We now define the two distances between clusterings in terms of the joint probability matrix defined above.

Definition 2 The misclassification error distance $d_{ME}$ between the clusterings $X, Y$ (with $K \le K'$) is
$$d_{ME}(X, Y) = 1 - \max_{\pi \in \Pi_{K'}} \sum_{x \in [K]} p_{XY}(x, \pi(x))$$
where $\Pi_{K'}$ is the set of all permutations of $K'$ objects.

Although the maximization above is over a set of size $K'!$, $d_{ME}$ can be computed in polynomial time by a maximum bipartite matching algorithm [Papadimitriou and Steiglitz, 1998]. It can be shown that $d_{ME}$ is a metric (see e.g. [Meila, 2005]). This distance is widely used in the computer science literature on clustering, due to its direct relationship with the misclassification error cost of classification. It has indeed very appealing properties as long as $X, Y$ are close [Meilă, 2006]. Otherwise, its poor resolution represents a major hindrance.

Definition 3 The $\chi^2$ distance $d_{\chi^2}$ is defined as
$$d_{\chi^2}(X, Y) = \min(K, K') - \chi^2(p_{XY}) \quad \text{with} \quad \chi^2(p_{XY}) = \sum_{x,y} \frac{p_{XY}(x, y)^2}{p_X(x)\, p_Y(y)} \qquad (2.2)$$
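The following sketch (an illustration under the definitions above, not code from the paper; the toy joint distribution is made up) computes both distances from a joint probability matrix, using scipy's linear_sum_assignment for the maximum bipartite matching mentioned above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def d_ME(p):
    """Definition 2: 1 - max_pi sum_x p(x, pi(x)); the max is a maximum bipartite matching."""
    rows, cols = linear_sum_assignment(-p)   # maximize the matched mass by minimizing -p
    return 1.0 - p[rows, cols].sum()

def d_chi2(p):
    """Definition 3: min(K, K') - chi^2(p_XY), with chi^2 = sum_xy p_xy^2 / (p_x p'_y)."""
    chi2 = (p ** 2 / np.outer(p.sum(axis=1), p.sum(axis=0))).sum()
    return min(p.shape) - chi2

# Joint distribution (Lemma 1) of two clusterings of n = 6 points that disagree on a single point.
p_XY = np.array([[2.0, 0.0, 0.0],
                 [0.0, 2.0, 1.0],
                 [0.0, 0.0, 1.0]]) / 6.0
print(d_ME(p_XY), d_chi2(p_XY))              # approx. 0.167 and 0.667
```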

Definition 3 and its notation are motivated as follows.

Lemma 4 Let $p_X = (p_x)_{x \in [K]}$ and $p_Y = (p'_y)_{y \in [K']}$ be the marginals of $p_{XY}$. Then the function $\chi^2(p_{XY})$ defined in (2.2) equals the functional $\chi^2(f, g) + 1$ applied to $f = p_{XY}$, $g = p_X p_Y$.

Proof Denote $p_{xy} = p_{XY}(x, y)$. Then
$$\chi^2(f, g) = \sum_{xy} \frac{(p_{xy} - p_x p'_y)^2}{p_x p'_y} = \sum_{xy}\left[\frac{p_{xy}^2}{p_x p'_y} - 2 p_{xy} + p_x p'_y\right] = \sum_{xy}\frac{p_{xy}^2}{p_x p'_y} - 1$$

Hence, $d_{\chi^2}$ is a measure of independence. It is equal to 0 when the random variables $X, Y$ are identical up to a permutation, and it equals $\min(K, K') - 1$ when they are independent. From Lemma 4 one can see that $\chi^2(p_{XY}) \ge 1$, while Lemma 5 below implies $\chi^2(p_{XY}) \le \min(K, K')$; hence $0 \le d_{\chi^2} \le \min(K, K') - 1$. This distance, with slight variants, has been used as a distance between partitions by [Hubert and Arabie, 1985, Bach and Jordan, 2004], with the obvious motivation of being related to the familiar $\chi^2$ functional. The following lemma gives another, technical motivation for paying attention to $d_{\chi^2}$.

Lemma 5 Let $\tilde A_X$, $\tilde A_Y$ be the normalized matrix representations of $X, Y$ defined by $\tilde A_X(i, k) = 1/\sqrt{n_k}$ if $i \in C_k$ and 0 otherwise (and analogously for $\tilde A_Y$). Hence, $\tilde A_X$ and $\tilde A_Y$ have orthonormal columns. Then
$$\chi^2(p_{XY}) = \|\tilde A_X^T \tilde A_Y\|_F^2 \qquad (2.3)$$
where $\|\cdot\|_F$ denotes the Frobenius norm.

Proof Note that $(\tilde A_X^T \tilde A_Y)_{xy} = \dfrac{p_{xy}}{\sqrt{p_x p'_y}}$.

The above lemma shows that the $d_{\chi^2}$ distance is a quadratic function, making it a convenient instrument in proofs. Contrast this with the apparently simple $d_{ME}$ distance, which is not everywhere differentiable and is theoretically much harder to analyze. We close this section by noting that both $d_{\chi^2}$ and $d_{ME}$ are concave in $p_{XY}$. For $d_{\chi^2}$, this follows from the convexity of the $\chi^2$ functional [Vajda, 1989]; $d_{ME}$ is the minimum, over all permutations, of the off-diagonal mass of $p_{XY}$, that is, a minimum of linear functions, and is therefore concave.
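As a quick numerical illustration of Lemma 5 (a self-contained sketch on a made-up toy example, not from the paper), one can check that $\chi^2(p_{XY})$ coincides with the squared Frobenius norm of $\tilde A_X^T \tilde A_Y$:

```python
import numpy as np

def indicator(labels, K):
    A = np.zeros((len(labels), K))
    A[np.arange(len(labels)), labels] = 1.0
    return A

labels_X = np.array([0, 0, 1, 1, 1, 2])
labels_Y = np.array([0, 0, 1, 1, 2, 2])
A_X, A_Y = indicator(labels_X, 3), indicator(labels_Y, 3)
p = A_X.T @ A_Y / len(labels_X)                        # joint distribution, Lemma 1

# Normalized representations: each column scaled by 1/sqrt(n_k), hence orthonormal columns.
A_Xn = A_X / np.sqrt(A_X.sum(axis=0, keepdims=True))
A_Yn = A_Y / np.sqrt(A_Y.sum(axis=0, keepdims=True))

chi2_frobenius = np.linalg.norm(A_Xn.T @ A_Yn, "fro") ** 2
chi2_direct = (p ** 2 / np.outer(p.sum(axis=1), p.sum(axis=0))).sum()
print(np.isclose(chi2_frobenius, chi2_direct))         # True: eq. (2.3)
```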

3 Small $d_{\chi^2}$ implies small $d_{ME}$

To prove this statement, we adopt the following framework. First, for simplicity, we assume that $K = K'$; the generalization to $K \le K'$ is straightforward. Second, we assume w.l.o.g. that partition $X$ is fixed, while $Y$ is allowed to vary. In terms of random variables, the two assumptions describe the set of distributions over $[K] \times [K]$ that have a fixed first marginal $p_X = (p_1, \dots, p_K)$. We denote this domain by $P$. In the rest of the section we adopt the following notation: $\tilde p$ represents a distribution from $P$, $\tilde p_{xy}$ is the probability of the pair $(x, y) \in [K] \times [K]$ under $\tilde p$, and $\tilde p_Y = (\tilde p'_1, \dots, \tilde p'_K)$ is the second marginal of $\tilde p$. Thus,
$$P = \Big\{\, \tilde p = [\tilde p_{xy}]_{x,y \in [K]} :\ \tilde p_{xy} \ge 0,\ \sum_y \tilde p_{xy} = p_x \text{ for all } x \in [K] \,\Big\}$$
Consequently, $P$ is convex and bounded.

We will show that the maxima of $\chi^2$ over $P$ have value $K$ and are attained when the second random variable is a one-to-one function of the first. We call such a point optimal; the set of optimal points of $P$ is denoted by $E^*$. Any element $\tilde p^\pi$ of $E^*$ is of the form
$$\tilde p^\pi_{kk'} = \begin{cases} p_k, & k' = \pi(k) \\ 0, & \text{otherwise} \end{cases}$$
where $\pi$ is a permutation of the indices $1, 2, \dots, K$. In the following it will be proved that if a joint distribution $\tilde p$ in $P$ is more than $\epsilon$ away from any optimal point, then $\chi^2(\tilde p)$ will be bounded away from $K$.

Theorem 6 For two clusterings represented by the joint distribution $p_{XY}$, denote $p_{min} = \min_{x \in [K]} p_x$ and $p_{max} = \max_{x \in [K]} p_x$. Then, for any $\epsilon \le p_{min}$, if $d_{\chi^2}(p_{XY}) \le \epsilon / p_{max}$, then $d_{ME}(p_{XY}) \le \epsilon$.

Outline of proof For a fixed $\pi$, we denote the corresponding optimal point by $\tilde p^\pi$, and the point which differs from $\tilde p^\pi$ by $\epsilon$ in two entries of row $a$ by $\tilde p^\pi_\epsilon(a, b)$. Below is the definition of $\tilde p^\pi_\epsilon(a, b)$ in the case of the identity permutation. In what follows, whenever we consider one optimal point only, we shall assume w.l.o.g. that $\pi$ is the identity permutation and omit it from the notation, writing $\tilde p^*$ for the optimal point and $\tilde p_\epsilon(a, b)$ for the perturbed points.
$$[\tilde p_\epsilon(a, b)]_{xy} = \begin{cases} \epsilon, & x = a,\ y = b \\ p_a - \epsilon, & x = y = a \\ p_x, & x = y \neq a \\ 0, & \text{otherwise} \end{cases} \qquad (3.1)$$

and thus
$$[\tilde p^* - \tilde p_\epsilon(a, b)]_{xy} = \begin{cases} \epsilon, & x = y = a \\ -\epsilon, & x = a,\ y = b \\ 0, & \text{otherwise} \end{cases} \qquad (3.2)$$
For $\epsilon \le p_{min} = \min_x p_x$, let $E_\epsilon^\pi = \{\tilde p^\pi_\epsilon(a, b) :\ (a, b) \in [K] \times [K],\ a \neq b\}$. We upper bound the value of $\chi^2$ at all points of $E_\epsilon^\pi$, and then show that if $d_{ME}$ is greater than $\epsilon$, the value of $\chi^2$ cannot exceed this bound. These results will be proved as a series of lemmas, after which the formal proof of the theorem will close this section.

Lemma 7 (i) The set of extreme points of $P$ is
$$E = \{\, \tilde p :\ \exists\, \phi : [K] \to [K] \text{ such that } \tilde p_{xy} = p_x \text{ if } y = \phi(x),\ 0 \text{ otherwise} \,\} \qquad (3.3)$$
(ii) For $\tilde p \in E$, $\chi^2(\tilde p) = |\mathrm{Range}\,\phi|$.

Proof The proof of (i) is immediate and left to the reader. To prove (ii), let $\tilde p \in E$ with associated map $\phi$; its second marginal is $\tilde p'_y = \sum_{z \in \phi^{-1}(y)} p_z$. We can write successively
$$\chi^2(\tilde p) = \sum_{y:\, \tilde p'_y > 0}\ \sum_{x \in \phi^{-1}(y)} \frac{p_x^2}{p_x \sum_{z \in \phi^{-1}(y)} p_z} = \sum_{y:\, \tilde p'_y > 0} \frac{\sum_{x \in \phi^{-1}(y)} p_x}{\sum_{z \in \phi^{-1}(y)} p_z} = \sum_{y:\, \tilde p'_y > 0} 1 = |\mathrm{Range}\,\phi|$$

If $|\mathrm{Range}\,\phi| = K$, then $\phi$ is a permutation and we denote it by $\pi$. Let $E^* = \{\tilde p^\pi\}_{\pi \in \Pi_K}$ be the set of extreme points for which $\chi^2 = K$, and $\bar E = E \setminus E^*$ the set of extreme points for which $\chi^2 \le K - 1$.

Lemma 8 Let $B_1(r)$ be the 1-norm ball of radius $r$ centered at $\tilde p^* \in E^*$. Then
$$B_1(2\epsilon) \cap P = \mathrm{conv}(\{\tilde p^*\} \cup E_\epsilon)$$

Proof First we show that $\|\tilde p^* - \tilde p_\epsilon(a, b)\|_1 = 2\epsilon$:
$$\|\tilde p^* - \tilde p_\epsilon(a, b)\|_1 = \sum_{x,y} \big|\tilde p^*_{xy} - [\tilde p_\epsilon(a, b)]_{xy}\big| = \big|\tilde p^*_{aa} - [\tilde p_\epsilon(a, b)]_{aa}\big| + \big|\tilde p^*_{ab} - [\tilde p_\epsilon(a, b)]_{ab}\big| = \epsilon + \epsilon = 2\epsilon \qquad (3.4)$$

For any point $\tilde p \in B_1(2\epsilon) \cap P$ denote by
$$e = \sum_x \sum_{y \neq x} \tilde p_{xy}$$
its total off-diagonal mass. Then it is easy to check that $\|\tilde p - \tilde p^*\|_1 = 2e$, hence $e \le \epsilon$, and that
$$\tilde p = \Big(1 - \frac{e}{\epsilon}\Big)\, \tilde p^* + \sum_a \sum_{b \neq a} \frac{\tilde p_{ab}}{\epsilon}\, \tilde p_\epsilon(a, b), \qquad \Big(1 - \frac{e}{\epsilon}\Big) + \sum_a \sum_{b \neq a} \frac{\tilde p_{ab}}{\epsilon} = 1$$
which shows that $\tilde p \in \mathrm{conv}(\{\tilde p^*\} \cup E_\epsilon)$. The reverse inclusion follows from (3.4) and the convexity of $B_1(2\epsilon) \cap P$.

Lemma 9 For all $\tilde p \in B_1(2\epsilon) \cap P$, $d_{ME}(\tilde p) \le \epsilon$.

Proof Obvious, since $d_{ME}(\tilde p) \le \sum_x \sum_{y \neq x} \tilde p_{xy} = e \le \epsilon$.

Lemma 10 Let $x = \sum_i \alpha_i x_i$ with $\alpha_i \ge 0$, $\sum_i \alpha_i = 1$ and, for all $i$, let $y_i$ be a point of the segment $(x, x_i]$. Then $x$ is a convex combination of $\{y_i\}$.

Proof Let $y_i = \beta_i x + (1 - \beta_i) x_i$ with $\beta_i \in [0, 1)$. Then $x_i = \dfrac{y_i - \beta_i x}{1 - \beta_i}$, and replacing the above in the expression of $x$ we get successively
$$x = \sum_i \left[\frac{\alpha_i}{1 - \beta_i}\, y_i - \frac{\alpha_i \beta_i}{1 - \beta_i}\, x\right] \qquad (3.5)$$
$$= \sum_i \frac{\alpha_i}{1 - \beta_i}\, y_i - x \sum_i \frac{\alpha_i \beta_i}{1 - \beta_i} \qquad (3.6)$$
Hence
$$x = \sum_i \underbrace{\frac{\dfrac{\alpha_i}{1 - \beta_i}}{1 + \sum_j \dfrac{\alpha_j \beta_j}{1 - \beta_j}}}_{\gamma_i}\, y_i \qquad (3.7)$$
with $\gamma_i \ge 0$ and
$$\sum_i \gamma_i = \frac{\sum_i \dfrac{\alpha_i}{1 - \beta_i}}{1 + \sum_j \dfrac{\alpha_j \beta_j}{1 - \beta_j}} = 1 \qquad (3.8)$$
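The convex-combination decomposition used in the proof of Lemma 8 can be checked numerically. Below is a minimal sketch (illustration only; the fixed marginal and the random construction of a point inside the ball are made-up assumptions), which rebuilds a point $\tilde p$ with off-diagonal mass $e \le \epsilon$ from $\tilde p^*$ and the points $\tilde p_\epsilon(a,b)$.

```python
import numpy as np

rng = np.random.default_rng(0)
p_x = np.array([0.2, 0.3, 0.5])                # fixed first marginal
eps = 0.15                                     # eps <= p_min
K = len(p_x)

def perturbed(a, b):
    """The point p_eps(a, b) of eq. (3.1), identity permutation."""
    q = np.diag(p_x).copy()
    q[a, a] -= eps
    q[a, b] += eps
    return q

# A random point of P with off-diagonal mass e <= eps (hence inside B_1(2*eps)).
e_target = eps * rng.random()
off = rng.random((K, K)) * (1 - np.eye(K))
off *= e_target / off.sum()
p = np.diag(p_x - off.sum(axis=1)) + off
e = off.sum()

# Convex combination from the proof of Lemma 8.
combo = (1 - e / eps) * np.diag(p_x)
for a in range(K):
    for b in range(K):
        if a != b:
            combo += (p[a, b] / eps) * perturbed(a, b)

print(np.allclose(combo, p))                   # True
```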

Lemma 11 The set $\{\tilde p :\ d_{ME}(\tilde p) \ge \epsilon\}$ with $\epsilon \le p_{min}$ is included in the convex hull of $\big(\bigcup_{\pi \in \Pi_K} E_\epsilon^\pi\big) \cup \bar E$.

Proof Let $A = \{\tilde p :\ d_{ME}(\tilde p) \ge \epsilon\}$ and $\tilde p \in A$. Because $\tilde p \in P$, it is a convex combination of the extreme points of $P$; it can be written as
$$\tilde p = \sum_{i=1}^{|E|} \alpha_i\, \tilde p_i = \sum_{i=1}^{K!} \alpha_i\, \tilde p^{\pi_i} + \sum_{i=K!+1}^{|E|} \alpha_i\, \tilde p_i, \qquad \alpha_i \ge 0,\ \sum_i \alpha_i = 1$$
where the first sum is over the optimal extreme points $E^*$ and the second over $\bar E$. Consider the segment $[\tilde p, \tilde p^{\pi_i}]$; its first end, $\tilde p$, is in $A$, while its other end is outside $A$ and inside the ball $B_1^{\pi_i}(2\epsilon)$ centered at $\tilde p^{\pi_i}$. As the ball is convex, there is a (unique) point $\bar p_i \in [\tilde p, \tilde p^{\pi_i}] \cap \partial B_1^{\pi_i}(2\epsilon)$. This point, being on the boundary of the ball, can be written as a convex combination of points in $E_\epsilon^{\pi_i}$ by Lemma 8. We now apply Lemma 10, with $y_i = \bar p_i$ for $i = 1, \dots, K!$ and $y_i = \tilde p_i$ for $i > K!$. It follows that $\tilde p$ is a convex combination of the points $\{y_i\}$, each of which lies in $\mathrm{conv}(E_\epsilon^{\pi_i})$ or in $\bar E$, which completes the proof.

Lemma 12 For $\epsilon \le p_{min}$,
$$\chi^2(\tilde p^*) - \chi^2(\tilde p_\epsilon(a, b)) \ge \frac{\epsilon}{p_{max}}$$

Proof Compute $\chi^2(\tilde p_\epsilon(a, b))$:
$$\chi^2(\tilde p_\epsilon(a, b)) = K - 2 + \frac{(p_a - \epsilon)^2}{p_a (p_a - \epsilon)} + \frac{\epsilon^2}{p_a (p_b + \epsilon)} + \frac{p_b^2}{p_b (p_b + \epsilon)} \qquad (3.9)$$
$$= K - \frac{\epsilon}{p_a} + \frac{\epsilon^2}{p_a (p_b + \epsilon)} - \frac{\epsilon}{p_b + \epsilon} \qquad (3.10)$$
$$= K - \frac{\epsilon (p_a + p_b)}{p_a (p_b + \epsilon)} \qquad (3.11)$$
$$\le K - \frac{\epsilon}{p_a} \quad \text{(since } \epsilon \le p_{min} \le p_a\text{)} \qquad (3.12)$$
Therefore
$$\chi^2(\tilde p^*) - \chi^2(\tilde p_\epsilon(a, b)) \ge \frac{\epsilon}{p_a} \ge \frac{\epsilon}{p_{max}}$$

Proof of Theorem 6 By contradiction. Assume $d_{ME}(\tilde p) > \epsilon$. Then $\tilde p \in A$, and by Lemma 11 it lies in the convex hull of $\big(\bigcup_\pi E_\epsilon^\pi\big) \cup \bar E$. Since $\chi^2$ is convex, $\chi^2(\tilde p)$ cannot be larger than its maximum value at these points. But we know by Lemma 12 that the value of $\chi^2$ is bounded above by $K - \epsilon/p_{max}$ at any point in $E_\epsilon^\pi$, and by Lemma 7 it is at most $K - 1$ at any point in $\bar E$. Since $\epsilon \le p_{min} \le p_{max}$, both bounds are at most $K - \epsilon/p_{max}$, hence $d_{\chi^2}(\tilde p) \ge \epsilon/p_{max}$, which contradicts the hypothesis.

Note also that a tight, non-linear bound can be obtained by maximizing (3.11) over all $a, b$.
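To make the construction (3.1) and Lemma 12 concrete, here is a small numerical sketch (illustration only; the marginal, $\epsilon$ and the pair $(a, b)$ are made-up choices). It builds $\tilde p_\epsilon(a, b)$ for the identity permutation and checks the closed form (3.11) and the bound of Lemma 12.

```python
import numpy as np

p_x = np.array([0.2, 0.3, 0.5])          # fixed first marginal: p_min = 0.2, p_max = 0.5
eps, a, b = 0.1, 0, 2                    # eps <= p_min, and a pair (a, b) with a != b

def chi2(p):
    return (p ** 2 / np.outer(p.sum(axis=1), p.sum(axis=0))).sum()

p_star = np.diag(p_x)                    # optimal point for the identity permutation

# Perturbed point of eq. (3.1): move mass eps from cell (a, a) to cell (a, b).
p_eps = p_star.copy()
p_eps[a, a] -= eps
p_eps[a, b] += eps

closed_form = len(p_x) - eps * (p_x[a] + p_x[b]) / (p_x[a] * (p_x[b] + eps))   # eq. (3.11)
print(np.isclose(chi2(p_eps), closed_form))             # True
print(chi2(p_star) - chi2(p_eps) >= eps / p_x.max())    # True: Lemma 12
```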

4 Small $d_{ME}$ implies small $d_{\chi^2}$

Theorem 13 Let $p_{XY}$ represent a pair of clusterings with $d_{ME}(p_{XY}) \le \epsilon_0$. Then
$$d_{\chi^2}(p_{XY}) \le \frac{2\epsilon_0}{p_{min}}$$

The proof is based on the fact that a convex function always lies above any tangent to its graph. We pick a point $\tilde p$ that has $d_{ME}(\tilde p) = \epsilon$ and lower bound $\chi^2(\tilde p)$ by the tangent to $\chi^2$ at the nearest optimal point $\tilde p^*$. We start by proving three lemmas, then follow with the formal proof of the theorem.

Lemma 14 The unconstrained partial derivatives of $\chi^2$ at $\tilde p^*$ are
$$\frac{\partial \chi^2}{\partial \tilde p_{xy}}\bigg|_{\tilde p^*} = \begin{cases} \dfrac{1}{p_x}, & x = y \\[6pt] -\dfrac{1}{p_y}, & x \neq y \end{cases} \qquad (4.1)$$

Proof
$$\frac{\partial \chi^2}{\partial \tilde p_{ab}} = \frac{\partial}{\partial \tilde p_{ab}} \left[ \sum_x \frac{1}{p_x} \sum_y \frac{\tilde p_{xy}^2}{\tilde p'_y} \right] \qquad (4.2)$$
$$= \frac{1}{p_a}\left( \frac{2 \tilde p_{ab}}{\tilde p'_b} - \frac{\tilde p_{ab}^2}{\tilde p'^{\,2}_b} \right) - \sum_{x \neq a} \frac{\tilde p_{xb}^2}{p_x\, \tilde p'^{\,2}_b} \qquad (4.3)$$
$$= \frac{2 \tilde p_{ab}}{p_a\, \tilde p'_b} - \sum_x \frac{\tilde p_{xb}^2}{p_x\, \tilde p'^{\,2}_b} \qquad (4.4)$$
The result follows now by setting $\tilde p_{xb} = p_x \delta_{xb}$ and $\tilde p'_b = p_b$.
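The gradient formula (4.1) can be verified by finite differences. The sketch below (illustration only; the fixed marginal is a made-up example) compares central differences of the unconstrained $\chi^2$, with the first marginal held fixed, against the values predicted by Lemma 14.

```python
import numpy as np

p_x = np.array([0.2, 0.3, 0.5])                       # fixed first marginal
K = len(p_x)

def chi2(p):
    """chi^2 with the first marginal held fixed at p_x and p'_y taken as the column sums of p."""
    return (p ** 2 / (p_x[:, None] * p.sum(axis=0)[None, :])).sum()

p_star = np.diag(p_x)                                 # optimal point, identity permutation

# Gradient predicted by eq. (4.1): 1/p_x on the diagonal, -1/p_y off the diagonal.
grad_lemma14 = np.where(np.eye(K, dtype=bool), 1.0 / p_x[:, None], -1.0 / p_x[None, :])

# Central finite differences of the unconstrained partial derivatives at p_star.
h = 1e-6
grad_fd = np.zeros((K, K))
for a in range(K):
    for b in range(K):
        step = np.zeros((K, K))
        step[a, b] = h
        grad_fd[a, b] = (chi2(p_star + step) - chi2(p_star - step)) / (2 * h)

print(np.allclose(grad_fd, grad_lemma14, atol=1e-4))  # True: Lemma 14
```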

10 Proof χ 2 is convex, therefore χ 2 ( p) is above the tangent at p, i.e χ 2 ( p) χ 2 ( p ) + vec( χ 2 ( p )) vec( p p ) (4.5) Denote vec( χ 2 ( p )) vec( p p ) = 1 xy + p x x y xp 1 p x y y x = ( pxy + p ) xy p x x p y y x p xy (4.6) (4.7) x = 1 p xy, x [K] (4.8) p x y x y = 1 p xy y [K] (4.9) p y x y These quantitites represent the relative leak of probability mass from the diagonal to the offdiagonal cells in row x, respectively in column y of the matrix p w.r.t p. Lemma 16 Let x, x [K] be as defined above, and assume that the marginals p x are sorted so that p min = p 1 p 2 p 3... p K = p max with x p x x =. Then, p 1, if [0, p 1 ] 1 + p1 p max x = 2, if (p 1, p 1 + p 2 ] { x} x... k + P x k px p k+1, if (p p k, p p k+1 ] Proof It is easy to verify the solution for p 1. For the other intervals, on verifies the solution by induction over k [K]. Proof [Theorem 13] Assume that d ME ( p) = 0. Then, w.l.o.g. one can assume that the off-diagonal elements of p sum to. It is easy to see that under the conditions of lemma 16 x x p min By symmetry, this bound also holds for y y. Therefore, by lemma 15 χ 2 ( p ) χ 2 ( p) 10 2 p min (4.10)

5 Remarks

Although the original motivation for this work stems from comparing partitions, we have proved a result which holds for any two finite-valued random variables. In particular, the two theorems give lower and upper bounds on the $\chi^2$ measure of independence between two random variables, holding locally when the two variables are strongly dependent. The present approximation complements an older approximation of $\chi^2$ by the mutual information $I_{XY} = \sum_{xy} p_{xy} \ln \frac{p_{xy}}{p_x p'_y}$. It is known [Cover and Thomas, 1991] that the second-order Taylor approximation of $I_{XY}$ is $\frac{1}{2}\big(\chi^2(p_{XY}) - 1\big)$, with $\chi^2$ defined as in (2.2). This approximation is good around $p_{XY} = p_X p_Y$, hence in the weak-dependence regime.

The non-linear bound (3.11) in Theorem 6 is tight. The proofs hold when the condition $K = K'$ is replaced by $K \le K'$, or even by an arbitrary $K'$. It can be seen that both sets of bounds are tighter, and hold for a larger range of $\epsilon$, when the clusterings have approximately equal clusters, that is, when $p_{min}$ and $p_{max}$ both approach $1/K$. This confirms the general intuition that clusterings with equal-sized clusters are "easier" (and its counterpart, that clusterings containing very small clusters are "hard").

Finally, a useful property of the theorems presented here is that they involve the values $p_{min}$, $p_{max}$ of one clustering only. Hence they can be applied in cases when only one clustering is known. For example, [Meilă et al., 2005] used this result in the context of spectral clustering, to prove that any clustering with a low enough normalized cut is close to the (unknown) optimal clustering of that data set.

References

[Bach and Jordan, 2004] Bach, F. and Jordan, M. I. (2004). Learning spectral clustering. In Thrun, S. and Saul, L., editors, Advances in Neural Information Processing Systems 16, Cambridge, MA. MIT Press.

[Cover and Thomas, 1991] Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory. Wiley.

[Hubert and Arabie, 1985] Hubert, L. and Arabie, P. (1985). Comparing partitions. Journal of Classification, 2.

[Meila, 2005] Meila, M. (2005). Comparing clusterings: an axiomatic view. In Wrobel, S. and De Raedt, L., editors, Proceedings of the International Machine Learning Conference (ICML). Morgan Kaufmann.

[Meilă, 2006] Meilă, M. (2006). Comparing clusterings: an information based metric. Journal of Multivariate Analysis. (In press.)

[Meilă et al., 2005] Meilă, M., Shortreed, S., and Xu, L. (2005). Regularized spectral learning. In Cowell, R. and Ghahramani, Z., editors, Proceedings of the Artificial Intelligence and Statistics Workshop (AISTATS 05).

[Papadimitriou and Steiglitz, 1998] Papadimitriou, C. and Steiglitz, K. (1998). Combinatorial Optimization: Algorithms and Complexity. Dover Publications, Inc., Mineola, NY.

[Vajda, 1989] Vajda, I. (1989). Theory of Statistical Inference and Information. Theory and Decision Library, Series B: Mathematical and Statistical Methods. Kluwer Academic Publishers, Norwell, MA.
