k-variates++: more pluses in the k-means++
1 k-variates++: more pluses in the k-means++. Richard Nock, Raphaël Canyasse, Roksana Boreli, Frank Nielsen. Data61 (formerly NICTA), ANU, Technion, Ecole Polytechnique, UNSW, Sony CS Labs, Inc. Poster #2, Mon 9, 3-7 pm. www.data61.csiro.au
2-4 In this talk: k-variates++, a generalization of the popular k-means++ seeding. Two theorems on k-variates++, and more: guarantees on the approximation of the global optimum (see poster); a likelihood ratio bound between neighbouring instances (see paper). Applications: reductions between clustering algorithms + approximation bounds for new clustering algorithms; privacy.
5 Motivation. k-means++ seeding = a gold standard in clustering: utterly simple to implement (iteratively pick centers with probability proportional to the squared distance to the previously picked centers); assumption-free (expected) approximation guarantee with respect to the k-means global optimum: $\mathbb{E}_C[\mathrm{potential}] \le 8(2 + \log k)\,\mathrm{opt}$ (Arthur & Vassilvitskii, SODA 2007). It has inspired many variants: tensor clustering, distributed, data-stream, on-line, parallel clustering, clustering without centroids in closed form, etc.
6 Motivation. These approaches are spawns of k-means++: they either modify the algorithm (distributed, on-line, streamed, no closed-form centroid, more general potentials) or use it as a building block. Our objective: put them all in the same bag, i.e. a generalisation of k-means++ of which such approaches would be just instantiations (reductions), because general ⇒ new applications.
7 k-means++ (Arthur & Vassilvitskii, SODA 2007).
Input: data $A \subseteq \mathbb{R}^d$ with $|A| = m$, $k \in \mathbb{N}_*$;
Step 1: initialise centers $C \leftarrow \emptyset$;
Step 2: for $t = 1, 2, \ldots, k$:
  2.1: randomly sample $a \sim q_t(A)$, with $q_1 = u_m$ (uniform) and, for $t > 1$, $q_t(a) = D_t(a) \big(\sum_{a' \in A} D_t(a')\big)^{-1}$, where $D_t(a) = \min_{x \in C} \|a - x\|_2^2$;
  2.2: $x \leftarrow a$;
  2.3: $C \leftarrow C \cup \{x\}$;
Output: $C$.
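To make the loop above concrete, here is a minimal NumPy sketch of the seeding step; the function name and array conventions are mine, not from the paper or its code.

```python
import numpy as np

def kmeans_pp_seeding(A, k, rng=None):
    """k-means++ seeding sketch: pick k centers from the rows of A (shape (m, d)).

    The first center is uniform; each subsequent center is drawn with
    probability proportional to its squared distance to the closest
    center chosen so far (D_t in the slide's notation).
    """
    rng = np.random.default_rng(rng)
    m = A.shape[0]
    centers = [A[rng.integers(m)]]             # q_1 = uniform over the m points
    for _ in range(1, k):
        C = np.stack(centers)                  # current centers, shape (t, d)
        # D_t(a) = min_{x in C} ||a - x||_2^2 for every point a
        D = np.min(((A[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        q = D / D.sum()                        # q_t: D_t-weighted distribution
        centers.append(A[rng.choice(m, p=q)])
    return np.stack(centers)
```

For instance, `kmeans_pp_seeding(np.random.rand(1000, 50), k=10)` would return a (10, 50) array of seeds.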
8 k-variates++.
Input: data $A \subseteq \mathbb{R}^d$ with $|A| = m$, $k \in \mathbb{N}_*$, random variables $\{X_a, a \in A\}$, probe functions $\wp_t : A \to \mathbb{R}^d$ ($t \ge 1$);
Step 1: initialise centers $C \leftarrow \emptyset$;
Step 2: for $t = 1, 2, \ldots, k$:
  2.1: randomly sample $a \sim q_t(A)$, with $q_1 = u_m$ and, for $t > 1$, $q_t(a) = D_t(a) \big(\sum_{a' \in A} D_t(a')\big)^{-1}$, where $D_t(a) = \min_{x \in C} \|\wp_t(a) - x\|_2^2$;
  2.2: randomly sample $x \sim X_a$;
  2.3: $C \leftarrow C \cup \{x\}$;
Output: $C$.
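The generalization only changes two steps of the previous sketch: distances are computed on the probed points $\wp_t(A)$, and the retained center is a draw from $X_a$ rather than the point $a$ itself. A hedged sketch, with `sample_X` and `probe` as illustrative callables of my own naming:

```python
import numpy as np

def kvariates_pp_seeding(A, k, sample_X, probe=None, rng=None):
    """k-variates++ seeding sketch (names and signature are illustrative).

    sample_X(a, rng) -> a draw from the random variable X_a attached to point a.
    probe(A, t)      -> the probed points wp_t(A), shape (m, d); identity if None.
    """
    rng = np.random.default_rng(rng)
    m = A.shape[0]
    if probe is None:
        probe = lambda A, t: A                   # wp_t = identity
    a0 = A[rng.integers(m)]                      # q_1 = uniform over A
    centers = [sample_X(a0, rng)]
    for t in range(2, k + 1):
        P = probe(A, t)                          # wp_t(A), shape (m, d)
        C = np.stack(centers)
        # D_t(a) = min over current centers of ||wp_t(a) - x||_2^2
        D = np.min(((P[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        q = D / D.sum()
        a = A[rng.choice(m, p=q)]
        centers.append(sample_X(a, rng))         # x ~ X_a
    return np.stack(centers)

# With Dirac X_a and the identity probe this reduces to k-means++:
# kvariates_pp_seeding(A, k, sample_X=lambda a, rng: a)
```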
9 Two theorems & applications
10-12 Theorem 1: approximation of the global optimum.
k-means potential for $C$: $\phi(A; C) = \sum_{a \in A} \|a - c(a)\|_2^2$, with $c(a) = \arg\min_{c \in C} \|a - c\|_2^2$.
Suppose $\wp_t$ is $\gamma$-stretching ($\gamma \ge 0$): for any optimal cluster $B \subseteq A$ with size $> 1$ and any $a' \in B$,
$\phi(B; C) - \phi(B; \{a'\}) \le (1 + \gamma)\,\big(\phi(\wp_t(B); C) - \phi(\wp_t(B); \{\wp_t(a')\})\big), \ \forall t$.
Then $\mathbb{E}_{C \sim \text{k-variates++}}[\phi(A; C)] \le (2 + \log k)\,\psi$, with $\psi = (6 + 4\gamma)\,\mathrm{opt} + 2\,\mathrm{bias} + 2\,\mathrm{var}$, where
$\mathrm{opt} = \sum_{a \in A} \|a - c_{\mathrm{opt}}(a)\|_2^2$, $\mathrm{bias} = \sum_{a \in A} \|\mathbb{E}[X_a] - c_{\mathrm{opt}}(a)\|_2^2$, $\mathrm{var} = \sum_{a \in A} \mathrm{tr}(\mathrm{cov}[X_a])$.
Special case k-means++ (probe = identity, $X_a$ = Diracs): bias = opt, var = 0, $\gamma = 0$, hence $\psi = 8\,\mathrm{opt}$, recovering the Arthur-Vassilvitskii bound.
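Spelling out that specialization as a short derivation (a worked instance of Theorem 1 in the notation of slides 10-12, not an extra result from the paper):

```latex
% k-means++ as an instance of k-variates++: probe \wp_t = Id, X_a = Dirac mass at a
\begin{align*}
\mathbb{E}[X_a] = a \;&\Rightarrow\; \mathrm{bias} = \sum_{a \in A} \|a - c_{\mathrm{opt}}(a)\|_2^2 = \mathrm{opt}, \\
\mathrm{cov}[X_a] = 0 \;&\Rightarrow\; \mathrm{var} = 0, \qquad \gamma = 0 \ \text{(identity probe)}, \\
\psi \;&=\; (6 + 4 \cdot 0)\,\mathrm{opt} + 2\,\mathrm{opt} + 2 \cdot 0 \;=\; 8\,\mathrm{opt}, \\
\mathbb{E}_C[\phi(A;C)] \;&\le\; (2 + \log k)\,\psi \;=\; 8(2 + \log k)\,\mathrm{opt}.
\end{align*}
```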
13 Remarks. The guarantee approaches the statistical lower bound (Fréchet-Cramér-Rao-Darmois). It can be better than the Arthur-Vassilvitskii bound, in particular if bias is smaller than opt. The $X_a$ are the knob through which background / domain knowledge may improve the general bound.
14 Applications: reductions from k-variates++ ⇒ approximability ratios. Recipe: pick a clustering algorithm L; show that the expected output of L equals that of k-variates++ for particular choices of $\wp_t$ and $X$ (note: no computational constraint, we just need existence); get an approximability ratio for L.
15-16 Summary (poster, paper).
Setting     | Algorithm L   | Probe functions $\wp_t$                            | Densities $X_a$
Batch       | k-means++     | Identity                                           | Diracs
Distributed | d-k-means++   | Identity                                           | Uniform, support = subsets
Distributed | p+d-k-means++ | Identity                                           | Non-uniform, compact support
Streaming   | s-k-means++   | Synopses                                           | Diracs
On-line     | ol-k-means++  | Point (batch not hit) / closest center (batch hit) | Diracs
17 Distributed clustering. Setting: data nodes ("Forgy nodes") plus a special sampling node N (e.g. hybrid or server-assisted P2P networks; N may itself be a Forgy node). The Forgy nodes $(F_i, A_i)$ hold the data ($\cup_i A_i = A$) and sample uniformly; N holds no data and samples non-uniformly. In total, k data points are communicated among Forgy nodes and no data points are communicated to N. [Diagram: N linked to Forgy nodes $(F_1, A_1), \ldots, (F_5, A_5)$.]
18 Algorithm + Theorem (d-k-means++). Algorithm: iterate for $t = 1, 2, \ldots, k$: N chooses a Forgy node (non-uniformly, according to $D_t$), say $F_i$; $F_i$ samples (uniformly) a point $a_t \in A_i$ and sends $a_t$ to every $F_j$; every $F_j$ computes and sends a scalar $d_j \in \mathbb{R}_+$ to N, which updates $D_t$. Theorem: $\mathbb{E}[\phi(A, C)] \le (2 + \log k)\,\psi$, with $\psi = 10\,\mathrm{opt} + 6\,F_s$ and $F_s = \sum_{i \in [n]} \sum_{a \in A_i} \|c(A_i) - a\|_2^2$ the spread of the Forgy nodes. Remarks: opt is the global optimum on the total data; the bound gets all the better as Forgy nodes aggregate local data.
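A single-process sketch of the round structure just described, with node N reduced to bookkeeping of the per-node scalars; all names (`d_kmeans_pp_simulation`, `peers`, `node_scores`) are illustrative, not from the paper:

```python
import numpy as np

def d_kmeans_pp_simulation(peers, k, rng=None):
    """Simulate the d-k-means++ rounds on a list of local datasets A_i.

    Node N only ever sees the per-node scalars, never the data itself;
    the k sampled points are what gets broadcast among Forgy nodes.
    """
    rng = np.random.default_rng(rng)
    centers = []
    # Initial round: node picked proportionally to |A_i|, point uniform within it,
    # which mimics the uniform first draw q_1 = u_m over the whole data.
    node_scores = np.array([len(A_i) for A_i in peers], dtype=float)
    for t in range(k):
        # N picks a Forgy node non-uniformly, proportionally to its score
        i = rng.choice(len(peers), p=node_scores / node_scores.sum())
        # F_i samples a point uniformly from its local data and broadcasts it
        a_t = peers[i][rng.integers(len(peers[i]))]
        centers.append(a_t)
        C = np.stack(centers)
        # Every F_j computes its scalar d_j (sum of squared distances to the
        # closest current center over its local data) and sends it to N
        node_scores = np.array([
            np.min(((A_j[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1).sum()
            for A_j in peers
        ])
    return np.stack(centers)
```

Only the k broadcast points and the scalars $d_j$ cross node boundaries, matching the communication pattern of slide 17.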
19 Theorem 2: likelihood ratio bound for neighbour samples. Assumption: the support is $\mathcal{X} = \mathrm{Ball}(L_2, R)$ and all $X_a$ satisfy $(\mathrm{d}P_{X_{a'}} / \mathrm{d}P_{X_a})(x) \le \varrho(R)$ for all $a, a' \in A$ and all $x \in \mathcal{X}$ (see e.g. differential privacy).
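The slide leaves the choice of the $X_a$ open. One simple family that satisfies this kind of bounded-likelihood-ratio assumption (my illustration, not the paper's construction) is a Laplace-like density truncated to the ball, for which the ratio is at most $\exp(2\|a - a'\|_2/\sigma)$:

```python
import numpy as np

def sample_truncated_laplace(a, rng, R=1.0, sigma=0.5):
    """Draw x ~ X_a with density proportional to exp(-||x - a||_2 / sigma)
    restricted to the L2 ball of radius R centred at the origin, via rejection
    sampling with a uniform proposal on the ball.

    For two points a, a' the density ratio is at most exp(2 ||a - a'||_2 / sigma),
    so this family satisfies a bound of the form rho(R) above (illustrative choice).
    """
    d = a.shape[0]
    while True:
        # uniform proposal in the L2 ball: uniform direction, radius R * U^(1/d)
        u = rng.standard_normal(d)
        x = R * rng.random() ** (1.0 / d) * u / np.linalg.norm(u)
        # accept with probability equal to the (unnormalised, <= 1) target density
        if rng.random() <= np.exp(-np.linalg.norm(x - a) / sigma):
            return x

# Plugging this into the k-variates++ sketch above:
# kvariates_pp_seeding(A, k, sample_X=lambda a, rng: sample_truncated_laplace(a, rng))
```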
20-21 Theorem 2 ($\wp_t = \mathrm{Id}$). Fix $C$. For any neighbour $A'$ of $A$ (the two samples differ by one point),
$\dfrac{\Pr_{\text{k-variates++}}[C \mid A']}{\Pr_{\text{k-variates++}}[C \mid A]} \le \big((1 + \epsilon_w)^{k-1} + f(k)\,\epsilon_w\big)\,(1 + \epsilon_s)^{k-1}\,\varrho(R)$,
where $0 < \epsilon_w, \epsilon_s \le 1$ are spread and monotonicity parameters (formal definition in the poster / paper). They can be estimated / computed from the data and, in general, they tend to 0 as $m$ grows. Question: under which conditions does the $\epsilon$-dependent factor tend to 1, i.e. $\epsilon_w, \epsilon_s \to 0$?
22 Theorem 2 (continued). If the densities of all $X_a$ are bounded within $[\pi_m, \pi_M]$ with $\pi_m > 0$, then with high probability the likelihood ratio admits a bound involving $\varrho(2R)$ and terms that are $o(1)$ in $m$, provided $k$ grows slowly enough with $m$ (precise statement in the paper). No $\epsilon_w, \epsilon_s$ appear in this bound (the proof exhibits small values with high probability, and experiments display such values). Application in differential privacy (sublinear noise!).
23 Experiments. k-variates++ (instantiated as d-k-means++) vs k-means++ and k-means|| (Bahmani et al., 2012), on simulated data with $d = 50$. Peers are sampled with $\mathbb{E}[|A_i|] = 500$ until the total $\sum_i |A_i|$ reaches the target data size. For each peer, (a) data are uniformly sampled in a hyper-rectangle, then (b) $p\%$ of the points are given to a random peer, which increases the Forgy spread ($F_s(p) \ge F_s(0)$) and makes the problem more difficult as $p$ grows.
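A sketch of the simulated setup as described; the Poisson size model, the hyper-rectangle bounds and the fixed number of peers are illustrative assumptions on my part:

```python
import numpy as np

def make_peer_data(n_peers, p, d=50, mean_peer_size=500, rng=None):
    """Generate per-peer datasets following the setup of slide 23 (sketch).

    Each peer first draws points uniformly in a hyper-rectangle, then gives
    p% of its points to a randomly chosen peer, which increases the Forgy
    spread F_s and makes the distributed problem harder.
    """
    rng = np.random.default_rng(rng)
    # (a) per-peer data, uniform in a peer-specific hyper-rectangle
    peers = []
    for _ in range(n_peers):
        size = max(1, rng.poisson(mean_peer_size))
        low, high = rng.uniform(-10, 0, d), rng.uniform(0, 10, d)
        peers.append(rng.uniform(low, high, size=(size, d)))
    # (b) move p% of each peer's points to a random other peer
    moved = [[] for _ in range(n_peers)]
    for i, A_i in enumerate(peers):
        n_move = int(len(A_i) * p / 100)
        idx = rng.choice(len(A_i), size=n_move, replace=False)
        j = rng.integers(n_peers)
        moved[j].append(A_i[idx])
        peers[i] = np.delete(A_i, idx, axis=0)
    return [np.vstack([A_i] + extra) if extra else A_i
            for A_i, extra in zip(peers, moved)]
```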
24 Experiments (continued). k-variates++ (d-k-means++) vs k-means++ and k-means|| (Bahmani et al., 2012), the latter used with their best parameters. [Plot: clustering potential as a function of $k$ and $p$ for k-means++, d-k-means++ and k-means||.]
25 Conclusions. We provide a generalisation of k-means++ with a guaranteed approximation of the global optimum. k-variates++ can be used as is (e.g. privacy, k-means++) or to prove approximation properties of other algorithms via reductions between clustering algorithms. Come see the poster for more examples. Future work: use the Theorems to address stability, generalisation and smoothed analysis.
26 Thank you! k-variates++: questions?