k-variates++: more pluses in the k-means++


1 k-variates++: more pluses in the k-means++ (Poster #2). Richard Nock, Raphaël Canyasse, Roksana Boreli, Frank Nielsen. Data61 (formerly NICTA), ANU, Technion, Ecole Polytechnique, UNSW, Sony CS Labs, Inc. www.data61.csiro.au

2-4 In this talk: k-variates++, a generalization of the popular k-means++ seeding. Two theorems on k-variates++, and more (see poster): a guarantee on the approximation of the global optimum, and a likelihood-ratio bound between neighbouring instances (see paper). Applications: reductions between clustering algorithms, approximation bounds for new clustering algorithms, privacy.

5 Motivation. k-means++ seeding is a gold standard in clustering: utterly simple to implement (iteratively pick centers with probability proportional to the squared distance to the previously chosen centers), and it comes with an assumption-free (expected) approximation guarantee w.r.t. the k-means global optimum: E_C[potential] <= 8 (2 + log k) opt (Arthur & Vassilvitskii, SODA 2007). It has inspired many variants: tensor clustering, distributed, data-stream, on-line and parallel clustering, clustering without centroids in closed form, clustering with more general potentials, etc.

6 Motivation. The approaches above are spawns of k-means++: they either modify the algorithm (e.g. distributed, on-line, streamed, no closed-form centroid, more general potentials) or use it as a building block. Our objective: put all of them in the same bag, i.e. a generalisation of k-means++ (k-variates++) from which such approaches would be just instantiations, via reductions. Because it is general, it also yields new applications.

7 k-means++ (Arthur & Vassilvitskii, SODA 2007)
Input: data A ⊆ R^d with |A| = m, k ∈ N*;
Step 1: initialise centers C ← ∅;
Step 2: for t = 1, 2, ..., k:
  2.1: randomly sample a ~ q_t over A, with q_1 = u_m (uniform) and, for t > 1,
       q_t(a) = D_t(a) / Σ_{a' ∈ A} D_t(a'), where D_t(a) = min_{x ∈ C} ||a − x||²_2;
  2.2: x ← a;
  2.3: C ← C ∪ {x};
Output: C;
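For concreteness, here is a minimal Python sketch of this seeding step (the function name and NumPy formulation are mine, not from the paper):

```python
import numpy as np

def kmeanspp_seeding(A, k, rng=None):
    """k-means++ seeding: pick the next center with probability
    proportional to the squared distance to the closest center so far."""
    rng = np.random.default_rng(rng)
    m = len(A)
    centers = [A[rng.integers(m)]]          # first center: uniform over A
    for _ in range(1, k):
        # D_t(a) = min_{x in C} ||a - x||^2 for every data point a
        dists = np.min([np.sum((A - c) ** 2, axis=1) for c in centers], axis=0)
        probs = dists / dists.sum()
        centers.append(A[rng.choice(m, p=probs)])
    return np.array(centers)
```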

8 k-variates++
Input: data A ⊆ R^d with |A| = m, k ∈ N*, random variables {X_a, a ∈ A}, probe functions ℘_t : A → R^d (t ≥ 1);
Step 1: initialise centers C ← ∅;
Step 2: for t = 1, 2, ..., k:
  2.1: randomly sample a ~ q_t over A, with q_1 = u_m (uniform) and, for t > 1,
       q_t(a) = D_t(a) / Σ_{a' ∈ A} D_t(a'), where D_t(a) = min_{x ∈ C} ||℘_t(a) − x||²_2;
  2.2: randomly sample x ~ X_a;
  2.3: C ← C ∪ {x};
Output: C;
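The generalisation keeps the same loop; here is a hedged Python sketch in which `probe` stands for the probe functions ℘_t and `sample_X` draws from the random variable X_a (both are left as user-supplied callables, since their choice is exactly what the framework leaves open):

```python
import numpy as np

def kvariatespp_seeding(A, k, probe, sample_X, rng=None):
    """k-variates++ seeding (sketch).
    probe(t, a): probe function for round t, maps a data point to R^d.
    sample_X(a, rng): draws one sample from the random variable X_a."""
    rng = np.random.default_rng(rng)
    m = len(A)
    a = A[rng.integers(m)]                   # round 1: uniform over A ...
    centers = [sample_X(a, rng)]             # ... then sample from X_a
    for t in range(2, k + 1):
        probed = np.array([probe(t, a_) for a_ in A])
        # D_t(a) = min_{x in C} ||probe_t(a) - x||^2
        dists = np.min([np.sum((probed - c) ** 2, axis=1) for c in centers], axis=0)
        probs = dists / dists.sum()
        a = A[rng.choice(m, p=probs)]
        centers.append(sample_X(a, rng))
    return np.array(centers)
```

With probe = lambda t, a: a and sample_X = lambda a, rng: a (identity probes, Dirac densities), this reduces to the k-means++ sketch above.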

9 Two theorems & applications

10-12 Theorem 1 (approximation of the global optimum)
k-means potential for C: φ(A; C) = Σ_{a ∈ A} ||a − c(a)||²_2, with c(a) = argmin_{c ∈ C} ||a − c||²_2, and
  opt  = Σ_{a ∈ A} ||a − c_opt(a)||²_2,
  bias = Σ_{a ∈ A} ||E[X_a] − c_opt(a)||²_2,
  var  = Σ_{a ∈ A} tr(cov[X_a]).
Suppose ℘ is γ-stretching (γ ≥ 0): for any optimal cluster Ā ⊆ A with size > 1 and any a' ∈ Ā,
  φ(Ā; C) − φ(Ā; {a'}) ≤ (1 + γ) (φ(℘_t(Ā); C) − φ(℘_t(Ā); {℘_t(a')})), ∀t.
Then
  E_{C ~ k-variates++}[φ(A; C)] ≤ (2 + log k) ψ, with ψ = (6 + 4γ) opt + 2 bias + 2 var.
k-means++ special case: probes = Id, X_a = Diracs, so bias = opt, var = 0 and γ = 0, hence ψ = 8 opt, recovering the Arthur-Vassilvitskii bound.

13 Remarks. The guarantee approaches the statistical lower bound (Fréchet-Cramér-Rao-Darmois). It can be better than the Arthur-Vassilvitskii bound, in particular if bias < opt; bias is the knob through which background / domain knowledge may improve the general bound.

14 Applications: reductions from k-variates++ ⇒ approximability ratios. Pick a clustering algorithm L; show that the expected output of L equals that of k-variates++ for particular choices of ℘_t and X (note: no computational constraint, we just need existence); then get an approximability ratio for L.

15-16 Summary (poster, paper)

  Setting      Algorithm L      Probe functions ℘_t        Densities X
  Batch        k-means++        Identity                   Diracs
  Distributed  d-k-means++      Identity                   Uniform, support = subsets
  Distributed  p+d-k-means++    Identity                   Non-uniform, compact support
  Streaming    s-k-means++      Synopses                   Diracs
  On-line      ol-k-means++     Point (batch not hit)      Diracs / closest center (batch hit)

17 Distributed clustering. Setting: data nodes (Forgy nodes) F_1, ..., F_n, each holding local data A_i with ∪_i A_i = A and performing uniform sampling, plus a special sampling node N with no data, which performs the non-uniform sampling (e.g. hybrid, server-assisted P2P networks; N may itself be a Forgy node). In total, only k data points are communicated among the Forgy nodes, and no data point is communicated to N.

18 Algorithm (d-k-means++) + Theorem
Algorithm: iterate for t = 1, 2, ..., k:
  - N chooses (non-uniformly, according to D_t) a Forgy node, say F_i;
  - F_i samples (uniformly) a point a_t ∈ A_i and sends it to every F_j;
  - every F_j computes and sends a scalar d_j ∈ R_+ to N, which updates D_t.
Theorem: E[φ(A; C)] ≤ (2 + log k) ψ, with ψ = 10 opt + 6 F_s and F_s = Σ_{i ∈ [n]} Σ_{a ∈ A_i} ||c(A_i) − a||²_2 the spread of the Forgy nodes.
Remarks: opt is the global optimum on the total data; the bound gets all the better as Forgy nodes aggregate local data.
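A hedged Python sketch of one way to simulate this protocol; the split of roles follows the description above, but the exact content of the scalars d_j and all names are my reading, not the paper's:

```python
import numpy as np

def d_kmeanspp_seeding(node_data, k, rng=None):
    """Sketch of the distributed seeding protocol over Forgy nodes.
    node_data[i] is the local dataset A_i of node F_i (an (m_i, d) array).
    The sampling node N only ever receives one scalar d_j per node per round."""
    rng = np.random.default_rng(rng)
    n = len(node_data)
    sizes = np.array([len(Ai) for Ai in node_data], dtype=float)

    # Round 1: pick a node proportionally to |A_i|, then a uniform local point,
    # so the first center is uniform over the union of the local datasets.
    i = rng.choice(n, p=sizes / sizes.sum())
    centers = [node_data[i][rng.integers(len(node_data[i]))]]

    for _ in range(1, k):
        # Each Forgy node F_j computes d_j = sum_{a in A_j} min_c ||a - c||^2
        # with respect to the centers broadcast so far, and sends it to N.
        d = np.array([
            np.min([np.sum((Aj - c) ** 2, axis=1) for c in centers], axis=0).sum()
            for Aj in node_data
        ])
        i = rng.choice(n, p=d / d.sum())                   # N picks a node ~ d_j
        a = node_data[i][rng.integers(len(node_data[i]))]  # F_i samples uniformly
        centers.append(a)                                  # a_t broadcast as new center
    return np.array(centers)
```

In distribution, one round of this sketch is a k-variates++ round with identity probes and X_a uniform over the local subset containing a, which matches the Distributed row of the summary table.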

19-21 Theorem 2 (likelihood-ratio bound for neighbour samples)
Assumption: the common support is 𝕏 = Ball(L_2, R), and all X_a satisfy (dP_{X_{a'}} / dP_{X_a})(x) ≤ ρ(R), ∀a, a' ∈ A, ∀x ∈ 𝕏 (see e.g. differential privacy).
Fix ℘_t = Id. For any neighbour A' of A (the two datasets differ in one point),
  P_{C ~ k-variates++}[C | A'] / P_{C ~ k-variates++}[C | A] ≤ ((1 + ε_w)^{k−1} + f(k) ε_w (1 + ε_s)^{k−1}) ρ(R),
where 0 < ε_w, ε_s ≤ 1 are spread and monotonicity parameters (formal definitions in the poster / paper). They can be estimated / computed from data, and in general they → 0 as m grows. Under which conditions does the first factor go to 1 and the second to 0?

22 Theorem 2 (likelihood-ratio bound for neighbour samples), continued
If the densities of all X_a lie in [ρ_m, ρ_M] with ρ_m > 0, then with high probability
  P[C | A'] / P[C | A] ≤ (1 + o(1)) ρ(2R),
as long as k is not too large (the exact condition, involving √m, ρ_M / ρ_m and the dimension d, is in the paper). No ε_w, ε_s appear in this bound (the proof exhibits small values w.h.p.; experiments display such values). Application in differential privacy (sublinear noise!).
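For intuition on these density assumptions, here is a hedged Python sketch of one X_a that fits them under mild conditions: a Gaussian centered at a, truncated to a fixed L2 ball of radius R, whose density on the ball is bounded above and below by positive constants depending on R and sigma. This is an illustrative choice of mine, not necessarily the one analysed in the paper.

```python
import numpy as np

def sample_truncated_gaussian_in_ball(a, sigma, R, rng, max_tries=1000):
    """Draw one sample from N(a, sigma^2 I) truncated to the L2 ball of
    radius R centered at the origin, via rejection sampling.
    Assumes ||a|| <= R, so the acceptance probability does not vanish."""
    for _ in range(max_tries):
        x = a + sigma * rng.standard_normal(a.shape)
        if np.linalg.norm(x) <= R:
            return x
    return a  # pathological fallback; tune sigma / R so it is never used

# Plugged into the k-variates++ sketch above, e.g.:
# sample_X = lambda a, rng: sample_truncated_gaussian_in_ball(a, 0.1, 1.0, rng)
```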

23 Experiments. k-variates++ (d-k-means++) vs k-means++ and k-means|| (Bahmani et al., 2012), on simulated data with d = 50. Peers are sampled until Σ_i |A_i| reaches the target total, with E[|A_i|] = 500. For each peer, (a) data is uniformly sampled in a hyperrectangle, then (b) p% of its points are given to a random peer (as p increases, the spread F_s increases and the problem becomes more difficult). [Plot: F_s(p) / F_s(0) as a function of p.]
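A hedged sketch of a data generator following this description; the overall shape (per-peer hyperrectangle, E[|A_i|] = 500, moving p% of points to a random peer) is from the slide, while every concrete constant (rectangle sides, Poisson-distributed sizes, etc.) is my own illustrative choice:

```python
import numpy as np

def make_peer_data(n_peers, d=50, mean_size=500, p=0.10, rng=None):
    """Simulated distributed-clustering data: each peer draws points uniformly
    in its own hyperrectangle, then gives a fraction p of them to a random
    other peer (larger p means larger spread F_s, hence a harder problem)."""
    rng = np.random.default_rng(rng)
    assert n_peers >= 2
    peers = []
    for _ in range(n_peers):
        size = max(1, rng.poisson(mean_size))           # E[|A_i|] ~ mean_size
        corner = rng.uniform(-10.0, 10.0, size=d)       # peer-specific rectangle
        lengths = rng.uniform(0.5, 2.0, size=d)
        peers.append(corner + lengths * rng.uniform(size=(size, d)))
    # Move a fraction p of each peer's points to a uniformly chosen other peer.
    received = [[] for _ in range(n_peers)]
    for i in range(n_peers):
        Ai = peers[i]
        idx = rng.choice(len(Ai), size=int(p * len(Ai)), replace=False)
        for x in Ai[idx]:
            j = (i + rng.integers(1, n_peers)) % n_peers  # any peer but i
            received[j].append(x)
        peers[i] = np.delete(Ai, idx, axis=0)
    return [np.vstack([peers[i]] + received[i]) if received[i] else peers[i]
            for i in range(n_peers)]
```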

24 Experiments. k-variates++ (d-k-means++) vs k-means++ and k-means|| (Bahmani et al., 2012), the latter two used with their best parameters. [Plots: potentials of (k-means++), (d-k-means++) and (k-means||) as functions of k and p.]

25 Conclusions. We provide a generalisation of k-means++ with a guaranteed approximation of the global optimum. k-variates++ can be used as is (e.g. privacy, k-means++) or to prove approximation properties of other algorithms via reductions between clustering algorithms. Come see the poster for more examples! Future work: use the Theorems to address stability, generalisation and smoothed analysis.

26 Thank you! Questions?

