QUALITY AND EFFICIENCY IN KERNEL DENSITY ESTIMATES FOR LARGE DATA
|
|
- Philomena Newman
- 6 years ago
- Views:
Transcription
1 QUALITY AND EFFICIENCY IN KERNEL DENSITY ESTIMATES FOR LARGE DATA Yan Zheng Jeffrey Jestes Jeff M. Philips Feifei Li University of Utah
2 Salt Lake City Sigmod 2014 Snowbird
3 Kernel Density Estimates Let P be an input point set of size n in domain µ from an unknown distribution f. P 2 R 1 or P 2 R 2 in our experiments. Kernel Density Estimates (KDE) function approximates the density of f at any possible input point x 2 µ KDE P (x) = 1 X K(p, x) P p2p KDE represents a continuous distribution from a finite set of points.
4 Kernel Density Estimates
5 Drawback of Histograms Relative Frequency Query value change significantly across the boundary of the bin The choice of origin can have quite an effect Relative Frequency Annual Income (K) Annual Income (K)
6 Example of Kernel Density Estimates 0.02 Kernel Density Estimate Annual Income (K)
7 Examples of kernels (2D) Gaussian: Triangle: Epanechnikov: Ball K(p, x) = exp K(p, x) = 3 2 max K(p, x) = K(p, x) = 2 max 2 kp xk 2 n 0, 1 n 0, ( 1/ 2 if kp xk < 0 otherwise kp xk o kp xk 2 2 o Gaussian Triangle Epanechnikov Ball
8 Comparing of Different Kernels 0.02 Gaussian Triangular Epan Triangular Gaussian Epan Bal Kernel Density Estimate Annual Income (K)
9 Approximate Kernel Density Estimates Kernel Density Estimates kde P (x) = 1 P P p2p K(p, x) KDE is very expensive for large dataset. Each query takes O(n) " -approximation " Given input P, and error, the goal is to produce a small set Q to ensure: max kde P (x) kde Q (x) = kkde P kde Q k 1 apple " x2r d
10 MergeReduce(MR) framework Initialization phase: Arbitrarily decompose P into disjoint sets of size k Combination phase Merge Step k+k->2k Reduce Step 2k->k (Matching Pairs & Randomly Select one) Proceeds in log(n/k) rounds P 1 P 2 P 3 P 4 P i P j P 1,P 2,...,P n/k P 12 P 12 P 34 P 34 P ij P 1234 Q
11 Matching Min-cost Matching [Phillips SODA13] Edmonds Blossom algorithm 2 approximation Greedy algorithm Introduce More Efficient Matching Edge Map E M = {e(p, q) (p, q) 2 M} Given a disk B, E M \ B C M,B = P = {e(p, q) \ B e(p, q) 2 E M } e(p,q)2e M \B kp qk2 C M = max B C M,B. Min-cost Matching C M = O(1) Our Matching C M = O(log(1/"))
12 Grid Matching Min-cost matching C M = O(1) Grid Matching C M = O(log(1/")) Running time O(n log(1/")) Starting with i = 0, construct Gi, length l ",i = p 2 "2 i 2 Inside of each cell, match points arbitrarily. Only the unmatched points survive to the next round. Each cell in Gi +1 is the union of 4 cells from Gi After log(1/ ")+1rounds, match points left arbitrarily.
13 Grid Matching Point set P R 2 with n points, we can construct Q giving an ε-approximate KDE in O(n log 1 " ) running time and Q = O(1 " log n 1 log1.5 " ) Using Grid-MR With Random sampling (RS). Random sample a set P from P with size O((1/" 2 ) log(1/ )) O(n + 1 " 2 log 1 " ) running time and Q = O(1 1 " log2.5 " ) Using Grid-MR+RS
14 Deterministic Z-Order Selection The levels of the Z-order curve are reminiscent of the grids Zrandom Compute the Z-order of all points, and of every two points discard one at random. Repeat this discarding of half the points until the remaining set is sufficiently small. Zorder Select one point from each set of P /k points in the sorted order using ε-approximate quantiles algorithm
15 Other Baseline Methods Improved fast gauss transform (IFGT) K-center clustering Hermite expansion For n points, m query points, reduce time from to No error vs. size or error vs. time guarantees. Only works for Gaussian Kernel Kernel herding (KH) Explore the reproducing kernel Hilbert space. It adds at each step the single point p 2 P \ Q to Q which most decreases Running Time Bounds only `2 kkde Q kde P k 2 O( Q n) error. O(mn) O(m + n)
16 Centralized Experiments Data sets: OpenStreetMap 160million records in 6.6GB Default setting 10 million records 10 random trials = = 200 on a domain 50,000 50,000 Test points 4000 randomly from P 1000 from the domain of P
17 Centralized Experiments Compare with Baseline Methods Grid-MR Blossom-MR Greedy-MR Kernel Herding Using Random Sampling as preprocessing step. Grid-MR Zorder Grid-MR+RS Zorder+RS O(n log(1/")) O(n 3 ) O(n 2 log n) Time (seconds) err Time (seconds) G-MR B-MR KH Greedy-MR (a) Construction time vs. err G-MR G-MR+RS Z Z+RS err (b) Construction time vs. err
18 Centralized Experiments Our construction time and size beats IFGT Just random sampling is slightly faster, but worse size bounds Time (seconds) G-MR+RS Z+RS IFGT RS Size (bytes) G-MR+RS Z+RS IFGT RS err (a) Construction time vs. err err (b) Size
19 Communication (bytes) Distributed Experiments Default setting: 50 million records G-MR+RS Z+RS RS err (a) Communication cost vs. err Size (bytes) G-MR+RS Z+RS RS err (b) Size vs. err Best small approximate, easy to construct representation of data for kernel density estimates.
20 Thank you
21
22 Observed error vs. guaranteed error 10 7 Grid RS+Grid zkern RS+zKern 10 8 err (b) err vs. "
23 Compare with Random Sampling for High Accuracy Time (seconds) G-MR+RS Z+RS RS Size (bytes) G-MR+RS Z+RS RS (x10 3 ) (x10 3 ) (b) Size vs. " (b) Size vs. " Time (seconds) G-MR+RS Z+RS RS err (c) Size vs. err Size (bytes) G-MR+RS Z+RS RS err (d) Size vs. err
Geometric Inference on Kernel Density Estimates
Geometric Inference on Kernel Density Estimates Bei Wang 1 Jeff M. Phillips 2 Yan Zheng 2 1 Scientific Computing and Imaging Institute, University of Utah 2 School of Computing, University of Utah Feb
More informationImproved Fast Gauss Transform. Fast Gauss Transform (FGT)
10/11/011 Improved Fast Gauss Transform Based on work by Changjiang Yang (00), Vikas Raykar (005), and Vlad Morariu (007) Fast Gauss Transform (FGT) Originally proposed by Greengard and Strain (1991) to
More informationScalable machine learning for massive datasets: Fast summation algorithms
Scalable machine learning for massive datasets: Fast summation algorithms Getting good enough solutions as fast as possible Vikas Chandrakant Raykar vikas@cs.umd.edu University of Maryland, CollegePark
More informationImproved Fast Gauss Transform. (based on work by Changjiang Yang and Vikas Raykar)
Improved Fast Gauss Transform (based on work by Changjiang Yang and Vikas Raykar) Fast Gauss Transform (FGT) Originally proposed by Greengard and Strain (1991) to efficiently evaluate the weighted sum
More informationComputational tractability of machine learning algorithms for tall fat data
Computational tractability of machine learning algorithms for tall fat data Getting good enough solutions as fast as possible Vikas Chandrakant Raykar vikas@cs.umd.edu University of Maryland, CollegePark
More informationMulti-Approximate-Keyword Routing Query
Bin Yao 1, Mingwang Tang 2, Feifei Li 2 1 Department of Computer Science and Engineering Shanghai Jiao Tong University, P. R. China 2 School of Computing University of Utah, USA Outline 1 Introduction
More informationL Error and Bandwidth Selection for Kernel Density Estimates of Large Data
L Error and Bandwidth Selection for Kernel Density Estimates of Large Data Yan Zheng, Jeff M. Phillips School of Computing, University of Utah Salt Lake City, UT, USA yanzheng@cs.utah.edu, jeffp@cs.utah.edu
More informationSubsampling in Smoothed Range Spaces
Subsampling in Smoothed Range Spaces Jeff M. Phillips and Yan Zheng University of Utah Abstract. We consider smoothed versions of geometric range spaces, so an element of the ground set (e.g. a point)
More informationJoint distribution optimal transportation for domain adaptation
Joint distribution optimal transportation for domain adaptation Changhuang Wan Mechanical and Aerospace Engineering Department The Ohio State University March 8 th, 2018 Joint distribution optimal transportation
More information10 Distributed Matrix Sketching
10 Distributed Matrix Sketching In order to define a distributed matrix sketching problem thoroughly, one has to specify the distributed model, data model and the partition model of data. The distributed
More informationMATH 590: Meshfree Methods
MATH 590: Meshfree Methods Chapter 14: The Power Function and Native Space Error Estimates Greg Fasshauer Department of Applied Mathematics Illinois Institute of Technology Fall 2010 fasshauer@iit.edu
More informationTesting Lipschitz Functions on Hypergrid Domains
Testing Lipschitz Functions on Hypergrid Domains Pranjal Awasthi 1, Madhav Jha 2, Marco Molinaro 1, and Sofya Raskhodnikova 2 1 Carnegie Mellon University, USA, {pawasthi,molinaro}@cmu.edu. 2 Pennsylvania
More informationAssignment 4: Solutions
Math 340: Discrete Structures II Assignment 4: Solutions. Random Walks. Consider a random walk on an connected, non-bipartite, undirected graph G. Show that, in the long run, the walk will traverse each
More informationLecture 1 : Probabilistic Method
IITM-CS6845: Theory Jan 04, 01 Lecturer: N.S.Narayanaswamy Lecture 1 : Probabilistic Method Scribe: R.Krithika The probabilistic method is a technique to deal with combinatorial problems by introducing
More informationImproved Fast Gauss Transform
Improved Fast Gauss Transform Changjiang Yang Department of Computer Science University of Maryland Fast Gauss Transform (FGT) Originally proposed by Greengard and Strain (1991) to efficiently evaluate
More informationNotes on MapReduce Algorithms
Notes on MapReduce Algorithms Barna Saha 1 Finding Minimum Spanning Tree of a Dense Graph in MapReduce We are given a graph G = (V, E) on V = N vertices and E = m N 1+c edges for some constant c > 0. Our
More informationMean-Shift Tracker Computer Vision (Kris Kitani) Carnegie Mellon University
Mean-Shift Tracker 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Mean Shift Algorithm A mode seeking algorithm Fukunaga & Hostetler (1975) Mean Shift Algorithm A mode seeking algorithm
More informationLecture 18: March 15
CS71 Randomness & Computation Spring 018 Instructor: Alistair Sinclair Lecture 18: March 15 Disclaimer: These notes have not been subjected to the usual scrutiny accorded to formal publications. They may
More informationProofs and derivations
A Proofs and derivations Proposition 1. In the sheltering decision problem of section 1.1, if m( b P M + s) = u(w b P M + s), where u( ) is weakly concave and twice continuously differentiable, then f
More informationActive Learning: Disagreement Coefficient
Advanced Course in Machine Learning Spring 2010 Active Learning: Disagreement Coefficient Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz In previous lectures we saw examples in which
More informationAn efficient approximation for point-set diameter in higher dimensions
CCCG 2018, Winnipeg, Canada, August 8 10, 2018 An efficient approximation for point-set diameter in higher dimensions Mahdi Imanparast Seyed Naser Hashemi Ali Mohades Abstract In this paper, we study the
More informationDiscovering Spatial and Temporal Links among RDF Data Panayiotis Smeros and Manolis Koubarakis
Discovering Spatial and Temporal Links among RDF Data Panayiotis Smeros and Manolis Koubarakis WWW2016 Workshop: Linked Data on the Web (LDOW2016) April 12, 2016 - Montréal, Canada Outline Introduction
More informationSUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION
SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology
More informationCS412: Lecture #17. Mridul Aanjaneya. March 19, 2015
CS: Lecture #7 Mridul Aanjaneya March 9, 5 Solving linear systems of equations Consider a lower triangular matrix L: l l l L = l 3 l 3 l 33 l n l nn A procedure similar to that for upper triangular systems
More information2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51
2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 Star Joins A common structure for data mining of commercial data is the star join. For example, a chain store like Walmart keeps a fact table whose tuples each
More informationNearest Neighbor Searching Under Uncertainty. Wuzhou Zhang Supervised by Pankaj K. Agarwal Department of Computer Science Duke University
Nearest Neighbor Searching Under Uncertainty Wuzhou Zhang Supervised by Pankaj K. Agarwal Department of Computer Science Duke University Nearest Neighbor Searching (NNS) S: a set of n points in R. q: any
More informationDistributed Summaries
Distributed Summaries Graham Cormode graham@research.att.com Pankaj Agarwal(Duke) Zengfeng Huang (HKUST) Jeff Philips (Utah) Zheiwei Wei (HKUST) Ke Yi (HKUST) Summaries Summaries allow approximate computations:
More informationarxiv:cs/ v1 [cs.cg] 7 Feb 2006
Approximate Weighted Farthest Neighbors and Minimum Dilation Stars John Augustine, David Eppstein, and Kevin A. Wortman Computer Science Department University of California, Irvine Irvine, CA 92697, USA
More informationEnumeration. Phong Nguyễn
Enumeration Phong Nguyễn http://www.di.ens.fr/~pnguyen March 2017 References Joint work with: Yoshinori Aono, published at EUROCRYPT 2017: «Random Sampling Revisited: Lattice Enumeration with Discrete
More informationA Statistical Journey through Historic Haverford
A Statistical Journey through Historic Haverford Arthur Berg University of California, San Diego Arthur Berg A Statistical Journey through Historic Haverford 2/ 18 The Dataset Histograms Oldest? Arthur
More informationIntensity Analysis of Spatial Point Patterns Geog 210C Introduction to Spatial Data Analysis
Intensity Analysis of Spatial Point Patterns Geog 210C Introduction to Spatial Data Analysis Chris Funk Lecture 4 Spatial Point Patterns Definition Set of point locations with recorded events" within study
More informationThe first bound is the strongest, the other two bounds are often easier to state and compute. Proof: Applying Markov's inequality, for any >0 we have
The first bound is the strongest, the other two bounds are often easier to state and compute Proof: Applying Markov's inequality, for any >0 we have Pr (1 + ) = Pr For any >0, we can set = ln 1+ (4.4.1):
More information1 Approximate Quantiles and Summaries
CS 598CSC: Algorithms for Big Data Lecture date: Sept 25, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri Suppose we have a stream a 1, a 2,..., a n of objects from an ordered universe. For simplicity
More information1 Matchings in Non-Bipartite Graphs
CS 598CSC: Combinatorial Optimization Lecture date: Feb 9, 010 Instructor: Chandra Chekuri Scribe: Matthew Yancey 1 Matchings in Non-Bipartite Graphs We discuss matching in general undirected graphs. Given
More informationMassive Experiments and Observational Studies: A Linearithmic Algorithm for Blocking/Matching/Clustering
Massive Experiments and Observational Studies: A Linearithmic Algorithm for Blocking/Matching/Clustering Jasjeet S. Sekhon UC Berkeley June 21, 2016 Jasjeet S. Sekhon (UC Berkeley) Methods for Massive
More informationLinear Sketches A Useful Tool in Streaming and Compressive Sensing
Linear Sketches A Useful Tool in Streaming and Compressive Sensing Qin Zhang 1-1 Linear sketch Random linear projection M : R n R k that preserves properties of any v R n with high prob. where k n. M =
More informationarxiv: v1 [stat.ml] 28 Oct 2017
Jinglin Chen Jian Peng Qiang Liu UIUC UIUC Dartmouth arxiv:1710.10404v1 [stat.ml] 28 Oct 2017 Abstract We propose a new localized inference algorithm for answering marginalization queries in large graphical
More informationApproximating the Minimum Closest Pair Distance and Nearest Neighbor Distances of Linearly Moving Points
Approximating the Minimum Closest Pair Distance and Nearest Neighbor Distances of Linearly Moving Points Timothy M. Chan Zahed Rahmati Abstract Given a set of n moving points in R d, where each point moves
More informationEfficient Kernel Machines Using the Improved Fast Gauss Transform
Efficient Kernel Machines Using the Improved Fast Gauss Transform Changjiang Yang, Ramani Duraiswami and Larry Davis Department of Computer Science University of Maryland College Park, MD 20742 {yangcj,ramani,lsd}@umiacs.umd.edu
More informationEffective computation of biased quantiles over data streams
Effective computation of biased quantiles over data streams Graham Cormode cormode@bell-labs.com S. Muthukrishnan muthu@cs.rutgers.edu Flip Korn flip@research.att.com Divesh Srivastava divesh@research.att.com
More informationThe τ-skyline for Uncertain Data
CCCG 2014, Halifax, Nova Scotia, August 11 13, 2014 The τ-skyline for Uncertain Data Haitao Wang Wuzhou Zhang Abstract In this paper, we introduce the notion of τ-skyline as an alternative representation
More informationLinear Algebraic Equations
Linear Algebraic Equations 1 Fundamentals Consider the set of linear algebraic equations n a ij x i b i represented by Ax b j with [A b ] [A b] and (1a) r(a) rank of A (1b) Then Axb has a solution iff
More informationCS 347 Parallel and Distributed Data Processing
CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 4: Query Optimization Query Optimization Cost estimation Strategies for exploring plans Q min CS 347 Notes 4 2 Cost Estimation Based on
More informationRESEARCH ON THE DISTRIBUTED PARALLEL SPATIAL INDEXING SCHEMA BASED ON R-TREE
RESEARCH ON THE DISTRIBUTED PARALLEL SPATIAL INDEXING SCHEMA BASED ON R-TREE Yuan-chun Zhao a, b, Cheng-ming Li b a. Shandong University of Science and Technology, Qingdao 266510 b. Chinese Academy of
More informationApproximating Graph Spanners
Approximating Graph Spanners Michael Dinitz Johns Hopkins University Joint work with combinations of Robert Krauthgamer (Weizmann), Eden Chlamtáč (Ben Gurion), Ran Raz (Weizmann), Guy Kortsarz (Rutgers-Camden)
More informationAditya Bhaskara CS 5968/6968, Lecture 1: Introduction and Review 12 January 2016
Lecture 1: Introduction and Review We begin with a short introduction to the course, and logistics. We then survey some basics about approximation algorithms and probability. We also introduce some of
More informationIntensity Analysis of Spatial Point Patterns Geog 210C Introduction to Spatial Data Analysis
Intensity Analysis of Spatial Point Patterns Geog 210C Introduction to Spatial Data Analysis Chris Funk Lecture 5 Topic Overview 1) Introduction/Unvariate Statistics 2) Bootstrapping/Monte Carlo Simulation/Kernel
More information12 - Nonparametric Density Estimation
ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6
More informationBiased Quantiles. Flip Korn Graham Cormode S. Muthukrishnan
Biased Quantiles Graham Cormode cormode@bell-labs.com S. Muthukrishnan muthu@cs.rutgers.edu Flip Korn flip@research.att.com Divesh Srivastava divesh@research.att.com Quantiles Quantiles summarize data
More informationEFFICIENT SUMMARIZATION TECHNIQUES FOR MASSIVE DATA
EFFICIENT SUMMARIZATION TECHNIQUES FOR MASSIVE DATA by Jeffrey Jestes A thesis submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of
More informationOptimal compression of approximate Euclidean distances
Optimal compression of approximate Euclidean distances Noga Alon 1 Bo az Klartag 2 Abstract Let X be a set of n points of norm at most 1 in the Euclidean space R k, and suppose ε > 0. An ε-distance sketch
More informationRandomized Algorithms
Randomized Algorithms Prof. Tapio Elomaa tapio.elomaa@tut.fi Course Basics A new 4 credit unit course Part of Theoretical Computer Science courses at the Department of Mathematics There will be 4 hours
More information1 Some loose ends from last time
Cornell University, Fall 2010 CS 6820: Algorithms Lecture notes: Kruskal s and Borůvka s MST algorithms September 20, 2010 1 Some loose ends from last time 1.1 A lemma concerning greedy algorithms and
More informationAlgorithms, Geometry and Learning. Reading group Paris Syminelakis
Algorithms, Geometry and Learning Reading group Paris Syminelakis October 11, 2016 2 Contents 1 Local Dimensionality Reduction 5 1 Introduction.................................... 5 2 Definitions and Results..............................
More informationto be more efficient on enormous scale, in a stream, or in distributed settings.
16 Matrix Sketching The singular value decomposition (SVD) can be interpreted as finding the most dominant directions in an (n d) matrix A (or n points in R d ). Typically n > d. It is typically easy to
More informationEdo Liberty Principal Scientist Amazon Web Services. Streaming Quantiles
Edo Liberty Principal Scientist Amazon Web Services Streaming Quantiles Streaming Quantiles Manku, Rajagopalan, Lindsay. Random sampling techniques for space efficient online computation of order statistics
More informationNumerical Minimization of Potential Energies on Specific Manifolds
Numerical Minimization of Potential Energies on Specific Manifolds SIAM Conference on Applied Linear Algebra 2012 22 June 2012, Valencia Manuel Gra f 1 1 Chemnitz University of Technology, Germany, supported
More informationData Streams & Communication Complexity
Data Streams & Communication Complexity Lecture 1: Simple Stream Statistics in Small Space Andrew McGregor, UMass Amherst 1/25 Data Stream Model Stream: m elements from universe of size n, e.g., x 1, x
More informationNotes on Gaussian processes and majorizing measures
Notes on Gaussian processes and majorizing measures James R. Lee 1 Gaussian processes Consider a Gaussian process {X t } for some index set T. This is a collection of jointly Gaussian random variables,
More informationPolynomial Selection. Thorsten Kleinjung École Polytechnique Fédérale de Lausanne
Polynomial Selection Thorsten Kleinjung École Polytechnique Fédérale de Lausanne Contents Brief summary of polynomial selection (no root sieve) Motivation (lattice sieving, monic algebraic polynomial)
More informationEfficient Threshold Monitoring for Distributed Probabilistic Data
Efficient Threshold Monitoring for Distributed Probabilistic Data Mingwang Tang 1, Feifei Li 1, Jeff M. Phillips 2, Jeffrey Jestes 1 1 Computer Science Department, Florida State University; 2 School of
More informationCS 450 Numerical Analysis. Chapter 8: Numerical Integration and Differentiation
Lecture slides based on the textbook Scientific Computing: An Introductory Survey by Michael T. Heath, copyright c 2018 by the Society for Industrial and Applied Mathematics. http://www.siam.org/books/cl80
More informationApproximation Algorithms for Maximum. Coverage and Max Cut with Given Sizes of. Parts? A. A. Ageev and M. I. Sviridenko
Approximation Algorithms for Maximum Coverage and Max Cut with Given Sizes of Parts? A. A. Ageev and M. I. Sviridenko Sobolev Institute of Mathematics pr. Koptyuga 4, 630090, Novosibirsk, Russia fageev,svirg@math.nsc.ru
More informationAdvances in Manifold Learning Presented by: Naku Nak l Verm r a June 10, 2008
Advances in Manifold Learning Presented by: Nakul Verma June 10, 008 Outline Motivation Manifolds Manifold Learning Random projection of manifolds for dimension reduction Introduction to random projections
More informationThe Monte Carlo Algorithm
The Monte Carlo Algorithm This algorithm was invented by N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller, J. Chem. Phys., 21, 1087 (1953). It was listed as one of the The Top
More information[Title removed for anonymity]
[Title removed for anonymity] Graham Cormode graham@research.att.com Magda Procopiuc(AT&T) Divesh Srivastava(AT&T) Thanh Tran (UMass Amherst) 1 Introduction Privacy is a common theme in public discourse
More informationSTAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 13
STAT 309: MATHEMATICAL COMPUTATIONS I FALL 208 LECTURE 3 need for pivoting we saw that under proper circumstances, we can write A LU where 0 0 0 u u 2 u n l 2 0 0 0 u 22 u 2n L l 3 l 32, U 0 0 0 l n l
More informationBEWARE OF RELATIVELY LARGE BUT MEANINGLESS IMPROVEMENTS
TECHNICAL REPORT YL-2011-001 BEWARE OF RELATIVELY LARGE BUT MEANINGLESS IMPROVEMENTS Roi Blanco, Hugo Zaragoza Yahoo! Research Diagonal 177, Barcelona, Spain {roi,hugoz}@yahoo-inc.com Bangalore Barcelona
More informationApplication: Bucket Sort
5.2.2. Application: Bucket Sort Bucket sort breaks the log) lower bound for standard comparison-based sorting, under certain assumptions on the input We want to sort a set of =2 integers chosen I+U@R from
More informationTight Approximation Ratio of a General Greedy Splitting Algorithm for the Minimum k-way Cut Problem
Algorithmica (2011 59: 510 520 DOI 10.1007/s00453-009-9316-1 Tight Approximation Ratio of a General Greedy Splitting Algorithm for the Minimum k-way Cut Problem Mingyu Xiao Leizhen Cai Andrew Chi-Chih
More informationNordhaus-Gaddum Theorems for k-decompositions
Nordhaus-Gaddum Theorems for k-decompositions Western Michigan University October 12, 2011 A Motivating Problem Consider the following problem. An international round-robin sports tournament is held between
More informationCIS 520: Machine Learning Oct 09, Kernel Methods
CIS 520: Machine Learning Oct 09, 207 Kernel Methods Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture They may or may not cover all the material discussed
More informationPrediction of double gene knockout measurements
Prediction of double gene knockout measurements Sofia Kyriazopoulou-Panagiotopoulou sofiakp@stanford.edu December 12, 2008 Abstract One way to get an insight into the potential interaction between a pair
More information(1) Sort all time observations from least to greatest, so that the j th and (j + 1) st observations are ordered by t j t j+1 for all j = 1,..., J.
AFFIRMATIVE ACTION AND HUMAN CAPITAL INVESTMENT 8. ONLINE APPENDIX TO ACCOMPANY Affirmative Action and Human Capital Investment: Theory and Evidence from a Randomized Field Experiment, by CHRISTOPHER COTTON,
More information10-701/ Recitation : Kernels
10-701/15-781 Recitation : Kernels Manojit Nandi February 27, 2014 Outline Mathematical Theory Banach Space and Hilbert Spaces Kernels Commonly Used Kernels Kernel Theory One Weird Kernel Trick Representer
More informationRandom Sampling on Big Data: Techniques and Applications Ke Yi
: Techniques and Applications Ke Yi Hong Kong University of Science and Technology yike@ust.hk Big Data in one slide The 3 V s: Volume Velocity Variety Integers, real numbers Points in a multi-dimensional
More informationRandomized algorithms for the approximation of matrices
Randomized algorithms for the approximation of matrices Luis Rademacher The Ohio State University Computer Science and Engineering (joint work with Amit Deshpande, Santosh Vempala, Grant Wang) Two topics
More informationEfficient Threshold Monitoring for Distributed Probabilistic Data
Efficient Threshold Monitoring for Distributed Probabilistic Data Mingwang Tang, Feifei Li, Jeff M. Phillips, Jeffrey Jestes School of Computing, University of Utah, Salt Lake City, USA {tang, lifeifei,
More information(a) Calculate the bee s mean final position on the hexagon, and clearly label this position on the figure below. Show all work.
1. A worker bee inspects a hexagonal honeycomb cell, starting at corner A. When done, she proceeds to an adjacent corner (always facing inward as shown), either by randomly moving along the lefthand edge
More informationQuerying. 1 o Semestre 2008/2009
Querying Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2008/2009 Outline 1 2 3 4 5 Outline 1 2 3 4 5 function sim(d j, q) = 1 W d W q W d is the document norm W q is the
More informationPROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS
PROBABILITY: LIMIT THEOREMS II, SPRING 218. HOMEWORK PROBLEMS PROF. YURI BAKHTIN Instructions. You are allowed to work on solutions in groups, but you are required to write up solutions on your own. Please
More informationTopological Data Analysis for Brain Networks
Topological Data Analysis for Brain Networks Relating Functional Brain Network Topology to Clinical Measures of Behavior Bei Wang Phillips 1,2 University of Utah Joint work with Eleanor Wong 1,2, Sourabh
More informationRandomized algorithm
Tutorial 4 Joyce 2009-11-24 Outline Solution to Midterm Question 1 Question 2 Question 1 Question 2 Question 3 Question 4 Question 5 Solution to Midterm Solution to Midterm Solution to Midterm Question
More informationCarnegie Mellon Univ. Dept. of Computer Science Database Applications. SAMs - Detailed outline. Spatial Access Methods - problem
Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #26: Spatial Databases (R&G ch. 28) SAMs - Detailed outline spatial access methods problem dfn R-trees Faloutsos 2
More informationIn practice one often meets a situation where the function of interest, f(x), is only represented by a discrete set of tabulated points,
1 Interpolation 11 Introduction In practice one often meets a situation where the function of interest, f(x), is only represented by a discrete set of tabulated points, {x i, y i = f(x i ) i = 1 n, obtained,
More informationStream Computation and Arthur- Merlin Communication
Stream Computation and Arthur- Merlin Communication Justin Thaler, Yahoo! Labs Joint Work with: Amit Chakrabarti, Dartmouth Graham Cormode, University of Warwick Andrew McGregor, Umass Amherst Suresh Venkatasubramanian,
More informationRandomized algorithms for the low-rank approximation of matrices
Randomized algorithms for the low-rank approximation of matrices Yale Dept. of Computer Science Technical Report 1388 Edo Liberty, Franco Woolfe, Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert
More informationRandomized Algorithms
Randomized Algorithms Saniv Kumar, Google Research, NY EECS-6898, Columbia University - Fall, 010 Saniv Kumar 9/13/010 EECS6898 Large Scale Machine Learning 1 Curse of Dimensionality Gaussian Mixture Models
More informationAdditive Number Theory and Automata
Additive Number Theory and Automata Jeffrey Shallit School of Computer Science, University of Waterloo Waterloo, ON N2L 3G1 Canada Joint work with Carlo Sanna Daniel Kane P. Madhusudan Dirk Nowotka Aayush
More information2017 ACM ICPC ASIA GWALIOR REGIONAL CONTEST
Official Problem Set DO NOT OPEN UNTIL CONTEST BEGINS 2017 ACM ICPC ASIA GWALIOR REGIONAL CONTEST NOTE: The order of problems here can be different than the one displayed on the contest page. 1 Problem
More informationScaling up Bayesian Inference
Scaling up Bayesian Inference David Dunson Departments of Statistical Science, Mathematics & ECE, Duke University May 1, 2017 Outline Motivation & background EP-MCMC amcmc Discussion Motivation & background
More informationParallel Multivariate SpatioTemporal Clustering of. Large Ecological Datasets on Hybrid Supercomputers
Parallel Multivariate SpatioTemporal Clustering of Large Ecological Datasets on Hybrid Supercomputers Sarat Sreepathi1, Jitendra Kumar1, Richard T. Mills2, Forrest M. Hoffman1, Vamsi Sripathi3, William
More information(f(x) P 3 (x)) dx. (a) The Lagrange formula for the error is given by
1. QUESTION (a) Given a nth degree Taylor polynomial P n (x) of a function f(x), expanded about x = x 0, write down the Lagrange formula for the truncation error, carefully defining all its elements. How
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationvariance of independent variables: sum of variances So chebyshev predicts won t stray beyond stdev.
Announcements No class monday. Metric embedding seminar. Review expectation notion of high probability. Markov. Today: Book 4.1, 3.3, 4.2 Chebyshev. Remind variance, standard deviation. σ 2 = E[(X µ X
More informationEECS 275 Matrix Computation
EECS 275 Matrix Computation Ming-Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced, CA 95344 http://faculty.ucmerced.edu/mhyang Lecture 22 1 / 21 Overview
More informationSVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)
Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular
More informationAlgorithms for Data Science
Algorithms for Data Science CSOR W4246 Eleni Drinea Computer Science Department Columbia University Tuesday, December 1, 2015 Outline 1 Recap Balls and bins 2 On randomized algorithms 3 Saving space: hashing-based
More informationScenario Grouping and Decomposition Algorithms for Chance-constrained Programs
Scenario Grouping and Decomposition Algorithms for Chance-constrained Programs Siqian Shen Dept. of Industrial and Operations Engineering University of Michigan Joint work with Yan Deng (UMich, Google)
More informationConflict-Free Colorings of Rectangles Ranges
Conflict-Free Colorings of Rectangles Ranges Khaled Elbassioni Nabil H. Mustafa Max-Planck-Institut für Informatik, Saarbrücken, Germany felbassio, nmustafag@mpi-sb.mpg.de Abstract. Given the range space
More information