cient and E ective Similarity Search based on Earth Mover s Distance
|
|
- Bennett Robinson
- 5 years ago
- Views:
Transcription
1 36th International Conference on Very Large Data Bases E cient and E ective Similarity Search over Probabilistic Data based on Earth Mover s Distance Jia Xu 1, Zhenjie Zhang 2, Anthony K.H. Tung 2, Ge Yu 1 1 Northeastern University, China 2 National University of Singapore, Republic of Singapore
2 Agenda Motivation Rltd Related works Definitions Index construction Algorithms for EMD based similarity il i search Experimental results Conclusion
3 Motivation Probabilistic data management Where do probabilities come from? Discrete events? Car ID Color Brand Probability NEU2010 Red Mazda 0.7 NEU2010 Silver Honda 0.3 SGD2010 Black BMW 0.5 Not easy to get the probabilities in real scenario
4 Motivation Probabilistic data management Where do probabilities come from? Discrete events? Car ID Color Brand Probability NEU2010 Red Mazda 0.7 NEU2010 Silver Honda 0.3 SGD2010 Black BMW 0.5 Not easy to get the probabilities in real scenario
5 Motivation Where do probabilities come from? Probability histograms are naturally available
6 Motivation Where do probabilities come from? Probability histograms are naturally available Samples of the pixels Pixel R G B P P P P
7 Motivation Where do probabilities come from? Probability histograms are naturally available Samples of the pixels Pixel R G B P P P P D color histogram
8 Motivation Where do probabilities come from? Probability histograms are naturally available Samples of the pixels Pixel R G B P P P P More examples on sensor network, RFID, etc. 3D color histogram
9 Motivation Queries on probabilistic histograms Accumulated range query Top k query Similarity search over histograms is more important
10 Motivation Queries on probabilistic histograms Accumulated range query Top k query Similarity search over histograms is more important
11 Motivation Distance metric on probabilistic domain? Euclidean distance? L p distance? H H1 H H
12 Motivation Distance metric on probabilistic domain? Euclidean distance? L p distance? H H1 H H
13 Motivation Distance metric on probabilistic domain? Euclidean distance? L p distance? H H1 H H
14 Motivation Distance metric on probabilistic domain? Euclidean distance? L p distance? Against human intuition H H1 L 1 (H 1, H 2 )=2 > L 1 (H 1, H 3 )=1.15 H H
15 Motivation Intuition of the Earth Mover s Distance (EMD) Work flow moved between bins H1 H1 H2 H3
16 Motivation Intuition of the Earth Mover s Distance (EMD) Work flow moved between bins Distance moved betweenbins bins H1 H1 H2 H3
17 Motivation Intuition of the Earth Mover s Distance (EMD) Work flow moved between bins Distance moved betweenbins bins H1 H1 EMD(H 1, H 2 ) < EMD(H 1, H 3 ) H2 H3
18 Motivation Intuition of the Earth Mover s Distance (EMD) Work flow moved between bins Distance moved betweenbins bins H1 H1 Better reflect the human intuition H2 H3
19 Motivation In this paper, we probabilistic records are represented by histograms discuss the problem of similarity search based on the Earth Mover s Distance (EMD) present a new indexing i scheme to answer similarity il it queries based on EMD with efficiency and effectiveness 19
20 Agenda Motivation Rltd Related works Definitions Index construction Algorithms for EMD based similarity il i search Experimental results Conclusion 20
21 RltdW Related Works Scan And Filter framework Ira Assent et al, Approximation techniques for indexing i the earth mover's distance in multimedia databases ICDE 2006 Marc Wichterich et al, Efficient EMD-based similarity search in multimedia databases via flexible dimensionality reduction SIGMOD 2008 Preprocessing: dimensionality reduction Scan: the histogram records are scan one by one and filtered by two lower bound filters on EMD Refinement: if the lower bound filters cannot prune the record, run complete EMD computation 21
22 RltdW Related Works Our work Does not scan the whole data set Index all records with a forest of B+ trees Easy to be implemented in commercial relational dtb database Achieves high scalability and concurrency 22
23 Agenda Motivation Rltd Related works Definitions Index construction Algorithms for EMD based similarity il i search Experimental results Conclusion 23
24 Dfiiti Definitions Formal definition of EMD Linear programming problem P Q
25 Dfiiti Definitions Formal definition of EMD Linear programming problem P Q
26 Dfiiti Definitions Formal definition of EMD Linear programming problem P f 2g3 =0.5 f 56 =0.2 f 78 =0.3 Q
27 Dfiiti Definitions Formal definition of EMD Linear programming problem Constraint 1: cannot send more earth than there is
28 Dfiiti Definitions Formal definition of EMD Linear programming problem Constraint 1: cannot send more earth than there is Constraint 2: cannot receive more earth than it can hold
29 Dfiiti Definitions Formal definition of EMD Linear programming problem Constraint 1: cannot send more earth than there is Constraint 2: cannot receive more earth than it can hold Constraint 3: The amount of earth moved should not be a negative number
30 Dfiiti Definitions Formal definition of EMD Linear programming problem Constraint 1: cannot send more earth than there is Constraint 2: cannot receive more earth than it can hold Constraint 3: The amount of earth moved should not be a negative number Time complexity: O(N 3 logn) negative number
31 Agenda Motivation Rltd Related works Preliminaries Index construction Algorithms for EMD based similarity il i search Experimental results Conclusion 31
32 Pi Primal Dual ld ltransformation Primal problem (EMD)
33 Pi Primal Dual ld ltransformation Primal problem (EMD) Dual problem
34 Pi Primal Dual ld ltransformation Primal problem (EMD) Dual problem
35 Pi Primal Dual ld ltransformation Primal problem (EMD) Dual problem
36 Pi Primal Dual ld ltransformation Feasible solution F = {f ij } in the primal problem Ф= {π i, Ф j } in the dual problem
37 Pi Primal Dual ld ltransformation Feasible solution F = {f ij } in the primal problem Ф= {π i, Ф j } in the dual problem Ф j p[i]+ π i q[j] EMD(p,q) f ij d ij
38 Pi Primal Dual ld ltransformation Feasible solution F = {f ij } in the primal problem Ф= {π i, Ф j } in the dual problem Ф j p[i]+ π i q[j] EMD(p,q) f ij d ij Dual Space Lower bound
39 Pi Primal Dual ld ltransformation Feasible solution F = {f ij } in the primal problem Ф= {π i, Ф j } in the dual problem Ф j p[i]+ π i q[j] EMD(p,q) f ij d ij Dual Space Lower bound Primal Space(EMD) Upper bound
40 Pi Primal Dual ld ltransformation Two good properties of dual space Independency constrains are independent to those histograms involved (i.e., p and q ) can derive any feasible solution Ф ={Ф i, π j } regardless of p and q 40
41 Pi Primal Dual ld ltransformation Two good properties of dual space Independency constrains are independent to those histograms involved (i.e., p and q ) can derive any feasible solution Ф ={Ф i, π j} regardless of p and q Lower bounding a feasible solution Ф ={Ф i, π j }, derives a lower bound (i.e., φ i pi [] + π i q [ j ] ) for EMD(p,q) i i j j can use lower bound to filter out those no hit records 41
42 Index Construction ti Dual space:
43 Index Construction ti Dual space: Ф π
44 Index Construction ti Dual space: P Ф π
45 Index Construction ti Dual space: P Ф π key = Ф i p i
46 Index Construction ti Dual space: P Ф Q π key = Ф i p i
47 Index Construction ti Dual space: P Ф Q π key = counterkey = Ф i p i π i q i
48 Index Construction ti Dual space: P Ф Q π key = counterkey = Ф i p i π i q i key + counterkey = Ф i p i + π i q i
49 Index Construction ti Dual space: P Ф Q π key = counterkey = Ф i p i π i q i EMD(P, Q) key + counterkey Lower bound = Ф i p i + π i q i
50 Index Construction ti Dual space: P Ф Q π S 5 key = counterkey = Ф i p i π i q i EMD(P, Q) key + counterkey Lower bound = Ф i p i + π i q i
51 Index Construction ti Dual space: P Ф Q π S 5 key = Ф i p i counterkey = π i q i EMD(P, Q) key + counterkey Lower bound = Ф i p i + π i q i
52 Index Construction ti Dual space: P Ф Q π S 5 key = Ф i p i counterkey = π i q i EMD(P, Q) key + counterkey Lower bound = Ф i p i + π i q i
53 Index Construction ti B+ tree filter S 5
54 Index Construction ti B+ tree filter S 5
55 Index Construction ti B+ tree filter S 5
56 Index Construction ti A forest of B+ trees
57 Index Construction ti A forest of B+ trees Ф 1 = <Ф i, π j > Ф 2 = <Ф i, π j >
58 Index Construction ti A forest of B+ trees Ф 1 = <Ф i, π j > Ф 2 = <Ф i, π j > S1 S2 S3 S4 S5 S6 S7 S8 S15 S9 S13 S5 S14 S12 S7
59 Agenda Motivation Rltd Related works Definitions Index Structure Construction Algorithms for EMD based Similarity il i Search Experimental results Conclusion 59
60 Algo. Range Query T 1 T 2 S1 S2 S9 S4 S5 S3 S6 S7 S8 S15 S9 S3 S5 S14 S12 S7
61 Algo. Range Query T 1 T 2 S1 S2 S9 S4 S5 S3 S6 S7 S8 S15 S9 S3 S5 S14 S12 S7
62 Algo. Range Query T 1 T 2 S1 S2 S9 S4 S5 S3 S6 S7 S8 S15 S9 S3 S5 S14 S12 S7 S3 S5
63 Algo. Range Query T 1 T 2 S1 S2 S9 S4 S5 S3 S6 S7 S8 S15 S9 S3 S5 S14 S12 S7 S3 S5 R EMD Filter (Sigmod 08)
64 Algo. Range Query T 1 T 2 S1 S2 S9 S4 S5 S3 S6 S7 S8 S15 S9 S3 S5 S14 S12 S7 S3 S5 R EMD Filter (Sigmod 08) LB IM Filter (ICDE 06)
65 Algo. Range Query T 1 T 2 S1 S2 S9 S4 S5 S3 S6 S7 S8 S15 S9 S3 S5 S14 S12 S7 S3 S5 R EMD Filter (Sigmod 08) LB IM Filter (ICDE 06) UB p Filter
66 Algo. Range Query T 1 T 2 S1 S2 S9 S4 S5 S3 S6 S7 S8 S15 S9 S3 S5 S14 S12 S7 S3 S5 R EMD Filter (Sigmod 08) LB IM Filter (ICDE 06) UB p Filter Exact EMD cal.
67 Algo. knn Query (k=2) T 1 T 2 S1 S2 S3 S4 S5 S6 S7 S8 S15 S9 S13 S5 S14 S12 S7
68 Algo. knn Query (k=2) T 1 T 2 S1 S2 S3 S4 S5 S6 S7 S8 S15 S9 S13 S5 S14 S12 S7 Key(q,Ф 1 ) Key(q,Ф 2 )
69 Algo. knn Query (k=2) T 1 T 2 S1 S2 S3 S4 S5 S6 S7 S8 S15 S9 S13 S5 S14 S12 S7 CR1 CL1 Key(q,Ф 1 ) Key(q,Ф 2 ) CR2 S6 S7 S8 S14 S12 S7 CL2 S5 S4 S3 S2 S1 S5 S13 S9 S15
70 Algo. knn Query (k=2) T 1 T 2 S1 S2 S3 S4 S5 S6 S7 S8 S15 S9 S13 S5 S14 S12 S7 CR1 CL1 CR2 S6 S7 S8 S14 S12 S7 CL2 S5 S4 S3 S2 S1 S5 S13 S9 S15 Buffer
71 Algo. knn Query (k=2) T 1 T 2 S1 S2 S3 S4 S5 S6 S7 S8 S15 S9 S13 S5 S14 S12 S7 CR1 CL1 CR2 S6 S7 S8 S14 S12 S7 CL2 S5 S4 S3 S2 S1 S5 S13 S9 S15 Buffer S5 S5 2 S6 1 S14 1
72 Algo. knn Query (k=2) T 1 T 2 S1 S2 S3 S4 S5 S6 S7 S8 S15 S9 S13 S5 S14 S12 S7 CR1 CL1 CR2 S6 S7 S8 S14 S12 S7 CL2 S5 S4 S3 S2 S1 S5 S13 S9 S15 Buffer S5 S5 S5 2 S6 1 S14 1 S6 1 S14 1 S7 1 S4 1 S12 1 S13 1
73 Algo. knn Query (k=2) T 1 T 2 S1 S2 S3 S4 S5 S6 S7 S8 S15 S9 S13 S5 S14 S12 S7 CR1 CL1 CR2 S6 S7 S8 S14 S12 S7 CL2 S5 S4 S3 S2 S1 S5 S13 S9 S15 Buffer S5 S5 S5 S7 S5 2 S6 1 S14 1 S6 1 S14 1 S7 1 S4 1 S12 1 S13 1 S7 2 S14 1 S6 1 S4 1 S12 1 S13 1 S8 1 S1 1 S9 1
74 Algo. knn Query (k=2) T 1 T 2 S1 S2 S3 S4 S5 S6 S7 S8 S15 S9 S13 S5 S14 S12 S7 CR1 CL1 CR2 S6 S7 S8 S14 S12 S7 CL2 S5 S4 S3 S2 S1 S5 S13 S9 S15 Buffer S5 S5 S5 S7 S5 2 S6 1 S14 1 S6 1 S14 1 S7 1 S4 1 S12 1 S13 1 S7 2 S14 1 S6 1 S4 1 S12 1 S13 1 S8 1 S1 1 S9 1
75 Agenda Motivation Rltd Related works Definitions Index structure construction Algorithms for EMD based similarity il i search Experimental results Conclusion
76 Experimental lresults Data set RETINA1 Dimension: 96 Cardinality = 3,932 IRMA Dimension: 199 Cardinality = 10,000 DBLP Dimension: 8 Cardinality = 250,000000
77 Experimental lresults Comparison algorithms TBI (Tree Based Index) TBI C: Using clustering based method to find feasible solution TBI R: Using random sampling based method to find feasible solution SAR (Scan And Refinement) Measurement CPU time, # of EMD refinement 77
78 Experimental lresults: Range Queries
79 Experimental lresults: k NN Queries
80 Conclusions We present a new B + tree based indexing scheme for the general purposesofsimilarityof similarity search on Earth Mover's Distance Our index method relies on the primal dual ltheory to construct mapping functions from the original probabilistic space to one dimensional i domain Our B+ tree based index framework is High scalability High efficiency
81 For more information Introduction.html ti
82 36th International Conference on Very Large Data Bases Thank you! contact: ed
Anticipatory DTW for Efficient Similarity Search in Time Series Databases
Anticipatory DTW for Efficient Similarity Search in Time Series Databases Ira Assent Marc Wichterich Ralph Krieger Hardy Kremer Thomas Seidl RWTH Aachen University, Germany Aalborg University, Denmark
More informationWhat is (certain) Spatio-Temporal Data?
What is (certain) Spatio-Temporal Data? A spatio-temporal database stores triples (oid, time, loc) In the best case, this allows to look up the location of an object at any time 2 What is (certain) Spatio-Temporal
More informationScalable Processing of Snapshot and Continuous Nearest-Neighbor Queries over One-Dimensional Uncertain Data
VLDB Journal manuscript No. (will be inserted by the editor) Scalable Processing of Snapshot and Continuous Nearest-Neighbor Queries over One-Dimensional Uncertain Data Jinchuan Chen Reynold Cheng Mohamed
More informationTASM: Top-k Approximate Subtree Matching
TASM: Top-k Approximate Subtree Matching Nikolaus Augsten 1 Denilson Barbosa 2 Michael Böhlen 3 Themis Palpanas 4 1 Free University of Bozen-Bolzano, Italy augsten@inf.unibz.it 2 University of Alberta,
More informationPredictive Nearest Neighbor Queries Over Uncertain Spatial-Temporal Data
Predictive Nearest Neighbor Queries Over Uncertain Spatial-Temporal Data Jinghua Zhu, Xue Wang, and Yingshu Li Department of Computer Science, Georgia State University, Atlanta GA, USA, jhzhu.ellen@gmail.com
More informationNearest Neighbor Search for Relevance Feedback
Nearest Neighbor earch for Relevance Feedbac Jelena Tešić and B.. Manjunath Electrical and Computer Engineering Department University of California, anta Barbara, CA 93-9 {jelena, manj}@ece.ucsb.edu Abstract
More informationScalable Algorithms for Distribution Search
Scalable Algorithms for Distribution Search Yasuko Matsubara (Kyoto University) Yasushi Sakurai (NTT Communication Science Labs) Masatoshi Yoshikawa (Kyoto University) 1 Introduction Main intuition and
More informationLOCALITY SENSITIVE HASHING FOR BIG DATA
1 LOCALITY SENSITIVE HASHING FOR BIG DATA Wei Wang, University of New South Wales Outline 2 NN and c-ann Existing LSH methods for Large Data Our approach: SRS [VLDB 2015] Conclusions NN and c-ann Queries
More informationA Survey on Spatial-Keyword Search
A Survey on Spatial-Keyword Search (COMP 6311C Advanced Data Management) Nikolaos Armenatzoglou 06/03/2012 Outline Problem Definition and Motivation Query Types Query Processing Techniques Indices Algorithms
More informationLarge Scale Environment Partitioning in Mobile Robotics Recognition Tasks
Large Scale Environment in Mobile Robotics Recognition Tasks Boyan Bonev, Miguel Cazorla {boyan,miguel}@dccia.ua.es Robot Vision Group Department of Computer Science and Artificial Intelligence University
More informationImage Characteristics
1 Image Characteristics Image Mean I I av = i i j I( i, j 1 j) I I NEW (x,y)=i(x,y)-b x x Changing the image mean Image Contrast The contrast definition of the entire image is ambiguous In general it is
More informationProbabilistic Similarity Search for Uncertain Time Series
Probabilistic Similarity Search for Uncertain Time Series Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz Ludwig-Maximilians-Universität München, Oettingenstr. 67, 80538 Munich, Germany
More informationFinding Frequent Items in Probabilistic Data
Finding Frequent Items in Probabilistic Data Qin Zhang, Hong Kong University of Science & Technology Feifei Li, Florida State University Ke Yi, Hong Kong University of Science & Technology SIGMOD 2008
More informationOn the Optimality of the Greedy Heuristic in Wavelet Synopses for Range Queries
On the Optimality of the Greedy Heuristic in Wavelet Synopses for Range Queries Yossi Matias School of Computer Science Tel Aviv University Tel Aviv 69978, Israel matias@tau.ac.il Daniel Urieli School
More informationMulti-Approximate-Keyword Routing Query
Bin Yao 1, Mingwang Tang 2, Feifei Li 2 1 Department of Computer Science and Engineering Shanghai Jiao Tong University, P. R. China 2 School of Computing University of Utah, USA Outline 1 Introduction
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationTowards Indexing Functions: Answering Scalar Product Queries Arijit Khan, Pouya Yanki, Bojana Dimcheva, Donald Kossmann
Towards Indexing Functions: Answering Scalar Product Queries Arijit Khan, Pouya anki, Bojana Dimcheva, Donald Kossmann Systems Group ETH Zurich Moving Objects Intersection Finding Position at a future
More informationECE 592 Topics in Data Science
ECE 592 Topics in Data Science Final Fall 2017 December 11, 2017 Please remember to justify your answers carefully, and to staple your test sheet and answers together before submitting. Name: Student ID:
More informationUncertain Time-Series Similarity: Return to the Basics
Uncertain Time-Series Similarity: Return to the Basics Dallachiesa et al., VLDB 2012 Li Xiong, CS730 Problem Problem: uncertain time-series similarity Applications: location tracking of moving objects;
More informationConvex Hull-Based Metric Refinements for Topological Spatial Relations
ABSTRACT Convex Hull-Based Metric Refinements for Topological Spatial Relations Fuyu (Frank) Xu School of Computing and Information Science University of Maine Orono, ME 04469-5711, USA fuyu.xu@maine.edu
More informationInderjit Dhillon The University of Texas at Austin
Inderjit Dhillon The University of Texas at Austin ( Universidad Carlos III de Madrid; 15 th June, 2012) (Based on joint work with J. Brickell, S. Sra, J. Tropp) Introduction 2 / 29 Notion of distance
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationSearching Dimension Incomplete Databases
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 6, NO., JANUARY 3 Searching Dimension Incomplete Databases Wei Cheng, Xiaoming Jin, Jian-Tao Sun, Xuemin Lin, Xiang Zhang, and Wei Wang Abstract
More informationInformation Retrieval Basic IR models. Luca Bondi
Basic IR models Luca Bondi Previously on IR 2 d j q i IRM SC q i, d j IRM D, Q, R q i, d j d j = w 1,j, w 2,j,, w M,j T w i,j = 0 if term t i does not appear in document d j w i,j and w i:1,j assumed to
More informationNearest-Neighbor Searching Under Uncertainty
Nearest-Neighbor Searching Under Uncertainty Wuzhou Zhang Joint work with Pankaj K. Agarwal, Alon Efrat, and Swaminathan Sankararaman. To appear in PODS 2012. Nearest-Neighbor Searching S: a set of n points
More informationA Bayesian Method for Guessing the Extreme Values in a Data Set
A Bayesian Method for Guessing the Extreme Values in a Data Set Mingxi Wu University of Florida May, 2008 Mingxi Wu (University of Florida) May, 2008 1 / 74 Outline Problem Definition Example Applications
More informationOn Two Class-Constrained Versions of the Multiple Knapsack Problem
On Two Class-Constrained Versions of the Multiple Knapsack Problem Hadas Shachnai Tami Tamir Department of Computer Science The Technion, Haifa 32000, Israel Abstract We study two variants of the classic
More informationECE 661: Homework 10 Fall 2014
ECE 661: Homework 10 Fall 2014 This homework consists of the following two parts: (1) Face recognition with PCA and LDA for dimensionality reduction and the nearest-neighborhood rule for classification;
More informationFeasibility of Periodic Scan Schedules
Feasibility of Periodic Scan Schedules Ants PI Meeting, Seattle, May 2000 Bruno Dutertre System Design Laboratory SRI International e-mail: bruno@sdl.sri.com 1 Scan Scheduling Scan scheduling: Given n
More informationThe Optimal Mechanism in Differential Privacy
The Optimal Mechanism in Differential Privacy Quan Geng Advisor: Prof. Pramod Viswanath 11/07/2013 PhD Final Exam of Quan Geng, ECE, UIUC 1 Outline Background on Differential Privacy ε-differential Privacy:
More informationTransport planning of the Dutch Gas Network
Transport planning of the Dutch Gas Network Finding challenging transport situations by a sufficient method by S. Maring Literature Study Student number: 4226518 Project duration: March 1, 2017 December
More informationFaloutsos, Tong ICDE, 2009
Large Graph Mining: Patterns, Tools and Case Studies Christos Faloutsos Hanghang Tong CMU Copyright: Faloutsos, Tong (29) 2-1 Outline Part 1: Patterns Part 2: Matrix and Tensor Tools Part 3: Proximity
More informationMultiple-Site Distributed Spatial Query Optimization using Spatial Semijoins
11 Multiple-Site Distributed Spatial Query Optimization using Spatial Semijoins Wendy OSBORN a, 1 and Saad ZAAMOUT a a Department of Mathematics and Computer Science, University of Lethbridge, Lethbridge,
More informationOn the Semantics and Evaluation of Top-k Queries in Probabilistic Databases
On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases Xi Zhang Jan Chomicki SUNY at Buffalo September 23, 2008 Xi Zhang, Jan Chomicki (SUNY at Buffalo) Topk Queries in Prob. DB September
More information1 Regression with High Dimensional Data
6.883 Learning with Combinatorial Structure ote for Lecture 11 Instructor: Prof. Stefanie Jegelka Scribe: Xuhong Zhang 1 Regression with High Dimensional Data Consider the following regression problem:
More informationCompressive Sensing of Sparse Tensor
Shmuel Friedland Univ. Illinois at Chicago Matheon Workshop CSA2013 December 12, 2013 joint work with Q. Li and D. Schonfeld, UIC Abstract Conventional Compressive sensing (CS) theory relies on data representation
More informationIterative Laplacian Score for Feature Selection
Iterative Laplacian Score for Feature Selection Linling Zhu, Linsong Miao, and Daoqiang Zhang College of Computer Science and echnology, Nanjing University of Aeronautics and Astronautics, Nanjing 2006,
More informationSIFT: SCALE INVARIANT FEATURE TRANSFORM BY DAVID LOWE
SIFT: SCALE INVARIANT FEATURE TRANSFORM BY DAVID LOWE Overview Motivation of Work Overview of Algorithm Scale Space and Difference of Gaussian Keypoint Localization Orientation Assignment Descriptor Building
More informationLarge-Scale Behavioral Targeting
Large-Scale Behavioral Targeting Ye Chen, Dmitry Pavlov, John Canny ebay, Yandex, UC Berkeley (This work was conducted at Yahoo! Labs.) June 30, 2009 Chen et al. (KDD 09) Large-Scale Behavioral Targeting
More informationNearest Neighbor Search with Keywords in Spatial Databases
776 Nearest Neighbor Search with Keywords in Spatial Databases 1 Sphurti S. Sao, 2 Dr. Rahila Sheikh 1 M. Tech Student IV Sem, Dept of CSE, RCERT Chandrapur, MH, India 2 Head of Department, Dept of CSE,
More informationClustering and Model Integration under the Wasserstein Metric. Jia Li Department of Statistics Penn State University
Clustering and Model Integration under the Wasserstein Metric Jia Li Department of Statistics Penn State University Clustering Data represented by vectors or pairwise distances. Methods Top- down approaches
More informationClassification Based on Logical Concept Analysis
Classification Based on Logical Concept Analysis Yan Zhao and Yiyu Yao Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2 E-mail: {yanzhao, yyao}@cs.uregina.ca Abstract.
More informationMultiscale bi-harmonic Analysis of Digital Data Bases and Earth moving distances.
Multiscale bi-harmonic Analysis of Digital Data Bases and Earth moving distances. R. Coifman, Department of Mathematics, program of Applied Mathematics Yale University Joint work with M. Gavish and W.
More informationEE 6882 Visual Search Engine
/0/0 EE 688 Visual Search Engine Prof. Shih Fu Chang, Jan. 0, 0 Lecture # Visual Features: Global features and matching Evaluation metrics (Many slides from. Efors, W. Freeman, C. Kambhamettu, L. Xie,
More informationOverview: Parallelisation via Pipelining
Overview: Parallelisation via Pipelining three type of pipelines adding numbers (type ) performance analysis of pipelines insertion sort (type ) linear system back substitution (type ) Ref: chapter : Wilkinson
More informationEXERCISES SHORTEST PATHS: APPLICATIONS, OPTIMIZATION, VARIATIONS, AND SOLVING THE CONSTRAINED SHORTEST PATH PROBLEM. 1 Applications and Modelling
SHORTEST PATHS: APPLICATIONS, OPTIMIZATION, VARIATIONS, AND SOLVING THE CONSTRAINED SHORTEST PATH PROBLEM EXERCISES Prepared by Natashia Boland 1 and Irina Dumitrescu 2 1 Applications and Modelling 1.1
More informationLOWER BOUNDS FOR THE UNCAPACITATED FACILITY LOCATION PROBLEM WITH USER PREFERENCES. 1 Introduction
LOWER BOUNDS FOR THE UNCAPACITATED FACILITY LOCATION PROBLEM WITH USER PREFERENCES PIERRE HANSEN, YURI KOCHETOV 2, NENAD MLADENOVIĆ,3 GERAD and Department of Quantitative Methods in Management, HEC Montréal,
More informationk-nearest Neighbor Classification over Semantically Secure Encry
k-nearest Neighbor Classification over Semantically Secure Encrypted Relational Data Reporter:Ximeng Liu Supervisor: Rongxing Lu School of EEE, NTU May 9, 2014 1 2 3 4 5 Outline 1. Samanthula B K, Elmehdwi
More informationThe τ-skyline for Uncertain Data
CCCG 2014, Halifax, Nova Scotia, August 11 13, 2014 The τ-skyline for Uncertain Data Haitao Wang Wuzhou Zhang Abstract In this paper, we introduce the notion of τ-skyline as an alternative representation
More informationA Novel Probabilistic Pruning Approach to Speed Up Similarity Queries in Uncertain Databases
A Novel Probabilistic Pruning Approach to Speed Up Similarity Queries in Uncertain Databases Thomas Bernecker #, Tobias Emrich #, Hans-Peter Kriegel #, Nikos Mamoulis, Matthias Renz #, Andreas Züfle #
More informationMining Positive and Negative Fuzzy Association Rules
Mining Positive and Negative Fuzzy Association Rules Peng Yan 1, Guoqing Chen 1, Chris Cornelis 2, Martine De Cock 2, and Etienne Kerre 2 1 School of Economics and Management, Tsinghua University, Beijing
More informationComposite Quantization for Approximate Nearest Neighbor Search
Composite Quantization for Approximate Nearest Neighbor Search Jingdong Wang Lead Researcher Microsoft Research http://research.microsoft.com/~jingdw ICML 104, joint work with my interns Ting Zhang from
More informationDIT - University of Trento Modeling and Querying Data Series and Data Streams with Uncertainty
PhD Dissertation International Doctorate School in Information and Communication Technologies DIT - University of Trento Modeling and Querying Data Series and Data Streams with Uncertainty Michele Dallachiesa
More informationProbabilistic Verifiers: Evaluating Constrained Nearest-Neighbor Queries over Uncertain Data
Probabilistic Verifiers: Evaluating Constrained Nearest-Neighbor Queries over Uncertain Data Reynold Cheng, Jinchuan Chen, Mohamed Mokbel,Chi-YinChow Deptartment of Computing, Hong Kong Polytechnic University
More informationCorrelation Preserving Unsupervised Discretization. Outline
Correlation Preserving Unsupervised Discretization Jee Vang Outline Paper References What is discretization? Motivation Principal Component Analysis (PCA) Association Mining Correlation Preserving Discretization
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun yzsun@cs.ucla.edu October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)
More informationIndexing Objects Moving on Fixed Networks
Indexing Objects Moving on Fixed Networks Elias Frentzos Department of Rural and Surveying Engineering, National Technical University of Athens, Zographou, GR-15773, Athens, Hellas efrentzo@central.ntua.gr
More informationRevenue Maximization in a Cloud Federation
Revenue Maximization in a Cloud Federation Makhlouf Hadji and Djamal Zeghlache September 14th, 2015 IRT SystemX/ Telecom SudParis Makhlouf Hadji Outline of the presentation 01 Introduction 02 03 04 05
More informationDiffusion Geometries, Diffusion Wavelets and Harmonic Analysis of large data sets.
Diffusion Geometries, Diffusion Wavelets and Harmonic Analysis of large data sets. R.R. Coifman, S. Lafon, MM Mathematics Department Program of Applied Mathematics. Yale University Motivations The main
More informationAIBO experiences change of surface incline.
NW Computational Intelligence Laboratory AIBO experiences change of surface incline. Decisions based only on kinesthetic experience vector ICNC 07, China 8/27/07 # 51 NW Computational Intelligence Laboratory
More informationDM534: Introduction to Computer Science Autumn term Exercise Clustering: Clustering, Color Histograms
University of Southern Denmark IMADA Rolf Fagerberg Richard Roettger based on the work of Arthur Zimek DM: Introduction to Computer Science Autumn term 0 Exercise Clustering: Clustering, Color Histograms
More informationSimilarity Search in High Dimensions II. Piotr Indyk MIT
Similarity Search in High Dimensions II Piotr Indyk MIT Approximate Near(est) Neighbor c-approximate Nearest Neighbor: build data structure which, for any query q returns p P, p-q cr, where r is the distance
More informationImproving Performance of Similarity Measures for Uncertain Time Series using Preprocessing Techniques
Improving Performance of Similarity Measures for Uncertain Time Series using Preprocessing Techniques Mahsa Orang Nematollaah Shiri 27th International Conference on Scientific and Statistical Database
More informationSimilarity Join Size Estimation using Locality Sensitive Hashing
Similarity Join Size Estimation using Locality Sensitive Hashing Hongrae Lee, Google Inc Raymond Ng, University of British Columbia Kyuseok Shim, Seoul National University Highly Similar, but not Identical,
More informationNearest Neighbor Searching Under Uncertainty. Wuzhou Zhang Supervised by Pankaj K. Agarwal Department of Computer Science Duke University
Nearest Neighbor Searching Under Uncertainty Wuzhou Zhang Supervised by Pankaj K. Agarwal Department of Computer Science Duke University Nearest Neighbor Searching (NNS) S: a set of n points in R. q: any
More informationData Analytics Beyond OLAP. Prof. Yanlei Diao
Data Analytics Beyond OLAP Prof. Yanlei Diao OPERATIONAL DBs DB 1 DB 2 DB 3 EXTRACT TRANSFORM LOAD (ETL) METADATA STORE DATA WAREHOUSE SUPPORTS OLAP DATA MINING INTERACTIVE DATA EXPLORATION Overview of
More informationData Mining Techniques
Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 12 Jan-Willem van de Meent (credit: Yijun Zhao, Percy Liang) DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Linear Dimensionality
More informationWolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig
Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 14 Indexes for Multimedia Data 14 Indexes for Multimedia
More informationWolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig
Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 13 Indexes for Multimedia Data 13 Indexes for Multimedia
More informationExact and Approximate Flexible Aggregate Similarity Search
Noname manuscript No. (will be inserted by the editor) Exact and Approximate Flexible Aggregate Similarity Search Feifei Li, Ke Yi 2, Yufei Tao 3, Bin Yao 4, Yang Li 4, Dong Xie 4, Min Wang 5 University
More informationHardness of Embedding Metric Spaces of Equal Size
Hardness of Embedding Metric Spaces of Equal Size Subhash Khot and Rishi Saket Georgia Institute of Technology {khot,saket}@cc.gatech.edu Abstract. We study the problem embedding an n-point metric space
More informationFractal Dimension and Vector Quantization
Fractal Dimension and Vector Quantization [Extended Abstract] Krishna Kumaraswamy Center for Automated Learning and Discovery, Carnegie Mellon University skkumar@cs.cmu.edu Vasileios Megalooikonomou Department
More information15-780: LinearProgramming
15-780: LinearProgramming J. Zico Kolter February 1-3, 2016 1 Outline Introduction Some linear algebra review Linear programming Simplex algorithm Duality and dual simplex 2 Outline Introduction Some linear
More informationOptimal Data-Dependent Hashing for Approximate Near Neighbors
Optimal Data-Dependent Hashing for Approximate Near Neighbors Alexandr Andoni 1 Ilya Razenshteyn 2 1 Simons Institute 2 MIT, CSAIL April 20, 2015 1 / 30 Nearest Neighbor Search (NNS) Let P be an n-point
More informationA Novel Gaussian Based Similarity Measure for Clustering Customer Transactions Using Transaction Sequence Vector
A Novel Gaussian Based Similarity Measure for Clustering Customer Transactions Using Transaction Sequence Vector M.S.B.Phridvi Raj 1, Vangipuram Radhakrishna 2, C.V.Guru Rao 3 1 prudviraj.kits@gmail.com,
More informationB490 Mining the Big Data
B490 Mining the Big Data 1 Finding Similar Items Qin Zhang 1-1 Motivations Finding similar documents/webpages/images (Approximate) mirror sites. Application: Don t want to show both when Google. 2-1 Motivations
More informationmaxz = 3x 1 +4x 2 2x 1 +x 2 6 2x 1 +3x 2 9 x 1,x 2
ex-5.-5. Foundations of Operations Research Prof. E. Amaldi 5. Branch-and-Bound Given the integer linear program maxz = x +x x +x 6 x +x 9 x,x integer solve it via the Branch-and-Bound method (solving
More informationMinimal Attribute Space Bias for Attribute Reduction
Minimal Attribute Space Bias for Attribute Reduction Fan Min, Xianghui Du, Hang Qiu, and Qihe Liu School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu
More informationRaRE: Social Rank Regulated Large-scale Network Embedding
RaRE: Social Rank Regulated Large-scale Network Embedding Authors: Yupeng Gu 1, Yizhou Sun 1, Yanen Li 2, Yang Yang 3 04/26/2018 The Web Conference, 2018 1 University of California, Los Angeles 2 Snapchat
More informationSkylines. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland
Yufei Tao ITEE University of Queensland Today we will discuss problems closely related to the topic of multi-criteria optimization, where one aims to identify objects that strike a good balance often optimal
More informationPavel Zezula, Giuseppe Amato,
SIMILARITY SEARCH The Metric Space Approach Pavel Zezula Giuseppe Amato Vlastislav Dohnal Michal Batko Table of Content Part I: Metric searching in a nutshell Foundations of metric space searching Survey
More informationOutline. Approximation: Theory and Algorithms. Application Scenario. 3 The q-gram Distance. Nikolaus Augsten. Definition and Properties
Outline Approximation: Theory and Algorithms Nikolaus Augsten Free University of Bozen-Bolzano Faculty of Computer Science DIS Unit 3 March 13, 2009 2 3 Nikolaus Augsten (DIS) Approximation: Theory and
More informationResearch Reports on Mathematical and Computing Sciences
ISSN 1342-284 Research Reports on Mathematical and Computing Sciences Exploiting Sparsity in Linear and Nonlinear Matrix Inequalities via Positive Semidefinite Matrix Completion Sunyoung Kim, Masakazu
More informationSynthetic Aperture Radar Ship Detection Using Modified Gamma Fisher Metric
Progress In Electromagnetics Research Letters, Vol. 68, 85 91, 2017 Synthetic Aperture Radar Ship Detection Using Modified Gamma Fisher Metric Meng Yang * Abstract This article proposes a novel ship detection
More informationDistances & Similarities
Introduction to Data Mining Distances & Similarities CPSC/AMTH 445a/545a Guy Wolf guy.wolf@yale.edu Yale University Fall 2016 CPSC 445 (Guy Wolf) Distances & Similarities Yale - Fall 2016 1 / 22 Outline
More informationImproving Performance of Similarity Measures for Uncertain Time Series using Preprocessing Techniques
Improving Performance of Similarity Measures for Uncertain Time Series using Preprocessing Techniques Mahsa Orang Nematollaah Shiri 27th International Conference on Scientific and Statistical Database
More informationFPGA Implementation of a HOG-based Pedestrian Recognition System
MPC Workshop Karlsruhe 10/7/2009 FPGA Implementation of a HOG-based Pedestrian Recognition System Sebastian Bauer sebastian.bauer@fh-aschaffenburg.de Laboratory for Pattern Recognition and Computational
More informationInterior-Point versus Simplex methods for Integer Programming Branch-and-Bound
Interior-Point versus Simplex methods for Integer Programming Branch-and-Bound Samir Elhedhli elhedhli@uwaterloo.ca Department of Management Sciences, University of Waterloo, Canada Page of 4 McMaster
More informationSemi Supervised Distance Metric Learning
Semi Supervised Distance Metric Learning wliu@ee.columbia.edu Outline Background Related Work Learning Framework Collaborative Image Retrieval Future Research Background Euclidean distance d( x, x ) =
More informationwhere X is the feasible region, i.e., the set of the feasible solutions.
3.5 Branch and Bound Consider a generic Discrete Optimization problem (P) z = max{c(x) : x X }, where X is the feasible region, i.e., the set of the feasible solutions. Branch and Bound is a general semi-enumerative
More informationMultimedia Databases 1/29/ Indexes for Multimedia Data Indexes for Multimedia Data Indexes for Multimedia Data
1/29/2010 13 Indexes for Multimedia Data 13 Indexes for Multimedia Data 13.1 R-Trees Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig
More informationOnline Learning With Kernel
CS 446 Machine Learning Fall 2016 SEP 27, 2016 Online Learning With Kernel Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes Overview Stochastic Gradient Descent Algorithms Regularization Algorithm Issues
More information2 Push and Pull Source Sink Push
1 Plaing Push vs Pull: Models and Algorithms for Disseminating Dnamic Data in Networks. R.C. Chakinala, A. Kumarasubramanian, Kofi A. Laing R. Manokaran, C. Pandu Rangan, R. Rajaraman Push and Pull 2 Source
More informationIssues Concerning Dimensionality and Similarity Search
Issues Concerning Dimensionality and Similarity Search Jelena Tešić, Sitaram Bhagavathy, and B. S. Manjunath Electrical and Computer Engineering Department University of California, Santa Barbara, CA 936-956
More informationGeometric Facility Location Problems on Uncertain Data
Utah State University DigitalCommons@USU All Graduate Theses and Dissertations Graduate Studies 8-2017 Geometric Facility Location Problems on Uncertain Data Jingru Zhang Utah State University Follow this
More informationEdge Image Description Using Angular Radial Partitioning
Edge Image Description Using Angular Radial Partitioning A.Chalechale, A. Mertins and G. Naghdy IEE Proc.-Vis. Image Signal Processing, 2004 Slides by David Anthony Torres Computer Science and Engineering
More informationCarnegie Mellon Univ. Dept. of Computer Science Database Applications. SAMs - Detailed outline. Spatial Access Methods - problem
Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #26: Spatial Databases (R&G ch. 28) SAMs - Detailed outline spatial access methods problem dfn R-trees Faloutsos 2
More informationCSC 411 Lecture 17: Support Vector Machine
CSC 411 Lecture 17: Support Vector Machine Ethan Fetaya, James Lucas and Emad Andrews University of Toronto CSC411 Lec17 1 / 1 Today Max-margin classification SVM Hard SVM Duality Soft SVM CSC411 Lec17
More informationResearch Article A New Global Optimization Algorithm for Solving Generalized Geometric Programming
Mathematical Problems in Engineering Volume 2010, Article ID 346965, 12 pages doi:10.1155/2010/346965 Research Article A New Global Optimization Algorithm for Solving Generalized Geometric Programming
More informationEfficient query evaluation
Efficient query evaluation Maria Luisa Sapino Set of values E.g. select * from EMPLOYEES where SALARY = 1500; Result of a query Sorted list E.g. select * from CAR-IMAGE where color = red ; 2 Queries as
More information