Presentation for CSG399 By Jian Wen

Similar documents
Skylines. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland

Ph.D. Preliminary Examination MICROECONOMIC THEORY Applied Economics Graduate Program June 2016

Econ 2148, spring 2019 Statistical decision theory

Finding Pareto Optimal Groups: Group based Skyline

Crowdsourcing Pareto-Optimal Object Finding by Pairwise Comparisons

The Paradox Severity Linear Latency General Latency Extensions Conclusion. Braess Paradox. Julian Romero. January 22, 2008.

Differentiable Welfare Theorems Existence of a Competitive Equilibrium: Preliminaries

Knapsack Auctions. Sept. 18, 2018

NETS 412: Algorithmic Game Theory March 28 and 30, Lecture Approximation in Mechanism Design. X(v) = arg max v i (a)

MS&E 246: Lecture 18 Network routing. Ramesh Johari

: Cryptography and Game Theory Ran Canetti and Alon Rosen. Lecture 8

Game Theory. Bargaining Theory. ordi Massó. International Doctorate in Economic Analysis (IDEA) Universitat Autònoma de Barcelona (UAB)

Second Welfare Theorem

Nondeterministic finite automata

Microeconomics. Joana Pais. Fall Joana Pais

The Interdisciplinary Center, Herzliya School of Economics Advanced Microeconomics Fall Bargaining The Axiomatic Approach

Analysis of Algorithms I: Asymptotic Notation, Induction, and MergeSort

4 Lecture Applications

Design and Analysis of Algorithms

Efficient Evaluation of Semi-Skylines

Mathematical Foundations -1- Constrained Optimization. Constrained Optimization. An intuitive approach 2. First Order Conditions (FOC) 7

Game Theory: introduction and applications to computer networks

CS 573: Algorithmic Game Theory Lecture date: January 23rd, 2008

LINEAR PROGRAMMING I. a refreshing example standard form fundamental questions geometry linear algebra simplex algorithm

Lecture slides by Kevin Wayne

Topics in Theoretical Computer Science April 08, Lecture 8

The Ohio State University Department of Economics. Homework Set Questions and Answers

Economics 201B Economic Theory (Spring 2017) Bargaining. Topics: the axiomatic approach (OR 15) and the strategic approach (OR 7).

Topic 17. Analysis of Algorithms

First Welfare Theorem

Lecture notes on statistical decision theory Econ 2110, fall 2013

Combining Equity and Efficiency. in Health Care. John Hooker. Joint work with H. P. Williams, LSE. Imperial College, November 2010

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

ON A THEOREM OF KALAI AND SAMET

Fundamentals of Operations Research. Prof. G. Srinivasan. Indian Institute of Technology Madras. Lecture No. # 15

Oligopoly Theory 2 Bertrand Market Games

CMSC 451: Lecture 7 Greedy Algorithms for Scheduling Tuesday, Sep 19, 2017

On the Impossibility of Black-Box Truthfulness Without Priors

Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Advanced Microeconomics

Econ Slides from Lecture 2

Linear Programming. Linear Programming I. Lecture 1. Linear Programming. Linear Programming

General Equilibrium. General Equilibrium, Berardino. Cesi, MSc Tor Vergata

Revenue Maximization in a Cloud Federation

CSE 421, Spring 2017, W.L.Ruzzo. 8. Average-Case Analysis of Algorithms + Randomized Algorithms

CS 573: Algorithmic Game Theory Lecture date: Feb 6, 2008

Three-partition Flow Cover Inequalities for Constant Capacity Fixed-charge Network Flow Problems

Lecture 9: Dynamics in Load Balancing

Ph.D. Preliminary Examination MICROECONOMIC THEORY Applied Economics Graduate Program May 2012

Reinforcement Learning Wrap-up

Efficiency, Fairness and Competitiveness in Nash Bargaining Games

HOW TO DECOMPOSE A PERMUTATION INTO A PAIR OF LABELED DYCK PATHS BY PLAYING A GAME

The assignment game: core, competitive equilibria and multiple partnership

Motivating examples Introduction to algorithms Simplex algorithm. On a particular example General algorithm. Duality An application to game theory

Dynamic Programming. Reading: CLRS Chapter 15 & Section CSE 6331: Algorithms Steve Lai

EconS Advanced Microeconomics II Handout on Subgame Perfect Equilibrium (SPNE)

Calculus Review Session. Rob Fetter Duke University Nicholas School of the Environment August 13, 2015

THE EXISTENCE AND USEFULNESS OF EQUALITY CUTS IN THE MULTI-DEMAND MULTIDIMENSIONAL KNAPSACK PROBLEM LEVI DELISSA. B.S., Kansas State University, 2014

Game Theory Lecture 10+11: Knowledge

Lecture 3: Big-O and Big-Θ

a i,1 a i,j a i,m be vectors with positive entries. The linear programming problem associated to A, b and c is to find all vectors sa b

The Fundamental Welfare Theorems

Utility Maximizing Routing to Data Centers

Public Goods and Private Goods

Market Equilibrium and the Core

September Math Course: First Order Derivative

On the Structure and Complexity of Worst-Case Equilibria

Finding Top-k Preferable Products

Msc Micro I exam. Lecturer: Todd Kaplan.

Algorithms for Playing and Solving games*

EC487 Advanced Microeconomics, Part I: Lecture 2

Suggested solutions to the 6 th seminar, ECON4260

Multiagent Systems Motivation. Multiagent Systems Terminology Basics Shapley value Representation. 10.

Linear Time Selection

Tribology Prof. Dr. Harish Hirani Department of Mechanical Engineering Indian Institute of Technology, Delhi

CSC236 Week 3. Larry Zhang

Econ 2140, spring 2018, Part IIa Statistical Decision Theory

Some forgotten equilibria of the Bertrand duopoly!?

The Wumpus Game. Stench Gold. Start. Cao Hoang Tru CSE Faculty - HCMUT

Econ Slides from Lecture 1

Yale University Department of Computer Science

PhD Qualifier Examination

5. DIVIDE AND CONQUER I

ECO Class 6 Nonparametric Econometrics

ECONOMICS 001 Microeconomic Theory Summer Mid-semester Exam 2. There are two questions. Answer both. Marks are given in parentheses.

This is designed for one 75-minute lecture using Games and Information. October 3, 2006

Matching Theory. Mihai Manea. Based on slides by Fuhito Kojima. MIT

II. Analysis of Linear Programming Solutions

Algorithms and Data Structures 2016 Week 5 solutions (Tues 9th - Fri 12th February)

CS 310 Advanced Data Structures and Algorithms

Routing. Topics: 6.976/ESD.937 1

Computational Complexity

Core. Ichiro Obara. December 3, 2008 UCLA. Obara (UCLA) Core December 3, / 22

Midterm #1 EconS 527 Wednesday, February 21st, 2018

Intro to Economic analysis

Microeconomic Algorithms for Flow Control in Virtual Circuit Networks (Subset in Infocom 1989)

Mathematics-I Prof. S.K. Ray Department of Mathematics and Statistics Indian Institute of Technology, Kanpur. Lecture 1 Real Numbers

Algorithmic Game Theory and Applications

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

Simple Consumption / Savings Problems (based on Ljungqvist & Sargent, Ch 16, 17) Jonathan Heathcote. updated, March The household s problem X

Transcription:

Presentation for CSG399 By Jian Wen

Outline Skyline Definition and properties Related problems Progressive Algorithms Analysis SFS Analysis Algorithm Cost Estimate Epsilon Skyline(overview and idea)

Skyline Example Hotel Search: Price (lower, better) Distance(to downtown, etc.) (closer, better) Skyline Objects Given a set of d dimensional points, a skyline point should have the property on each dimension no worse than any other points.

Dominance and Skyline Given a set of d dimensional points T: We say that one point t 1 dominates another point t 2 if and only if: t 1 is better than or equal to t 2 on all dimensions, and t 1 is better than t 2 on at least one dimension. Here better can be either smaller better (minimum skyline) or larger better (maximum skyline) or any other criteria measurable and comparable The skyline points will be the ones which will not be dominated by any other points in T

Dominance For the dominance relation, the following two properties hold: Transitivity; Asymmetry.

Formal Definition Given a set of d dimensional points T: We say that one point t 1 dominates, denoted by, another point t 2 if and only if: i j (1,2,..., d), t (1,2,..., d), t [ i], [ j] t [ j] AND Skyline points in T will be the points t satisfying: 1 1 [ i] t't, so that t' t t 2

Related Problems Convex Hull Let S be the union of the both maximum Skyline and minimum Skyline, C be the convex hull. C S In the image right, black points are minimum skyline points.

Related Problems Maximal Vectors The concept of Skyline in fact comes from maximal vectors. Find the vectors which are not dominated by any other vectors. Dominate is just the same as the definition in the previous slide. Skyline computation focuses on the tuples in databases, where there will be multiple attributes, instead of dimensions.

Related Problems Pareto Set In microeconomics in general, and game theory in particular, the Pareto set, named after economist Vilfredo Pareto, is the set of all Pareto-efficient outcomes. Given a set of alternative allocations of, say, goods or income for a set of individuals, a movement from one allocation to another that can make at least one individual better off without making any other individual worse off is called a Pareto improvement. An allocation is Pareto efficient or Pareto optimal when no further Pareto improvements can be made.

Related Problems Pareto Set is a restrict skyline There will be a demand function where all of the points in the set will be the feasible solution to this function. When Pareto Efficiency is reached, the marginal rate of each variable in the function will be equal. That is, we can not make the solution totally better, or, we cannot improve any one s benefits without deducing any other one s benefits. In the general skyline computation, there is no such a function which can be used to measure the total benefits over all the variables.

Algorithms Analysis Assumption on the data set: Independence Values of each dimension will be independent to any other dimensions. Distinct values No duplicates on any dimensions Uniformity For each dimension, the values will be uniformly distributed.

Algorithms Analysis It can be seen clearly that Skyline computation can be done in polynomial time. For a given d-dimensional set with n points Straight-forward method: For each point, compare it with all the other points to check whether it can be dominated by some points: If so, remove it; If not, mark this point as a skyline point. Recursively run the step above until all points have been checked. O( dn 2 )

Algorithms Analysis Is it worse for a polynomial time algorithm? Actual running experiment: NBA player data, from 1946-2004, 16 comparable attributes, 16380 records. Straight-forward way, all in main memory. CPU 1.73G, main memory 1G 2 mins. It is not very bad at the first view. But consider the situation: Users want to get the results ASAP, and maybe at first just some options and later more choices(progressive). Users may want to specify the number of Skyline records returned, which may be more or less then the actual number(top-k Skyline).

Two attempts Prune the points dominated will deduce the number of comparison sharply. SFS(Sort Filter Skyline): Skyline With Presorting, Jan Chomicki, Parke Godfrey, Jarek Gryz, Dongming Liang, 2002 Progressive Skyline computation. Reduce the dimension Epsilon Skyline: Top-k Skyline computation.

Sort Filter Skyline Idea: Presort the data Make sure that no point can be dominated by the ones comes after it. Presort the data with the sum of all the attributes/dimensions of each record/point.

Sort Filter Skyline Analysis: After sorting on the data set, the one with the maximum sum will be certainly the skyline point. Can be proved by the definition of dominance: If this record is not a Skyline record, it must be dominated by some other point, so according to the definition of dominance, that point must have a greater sum than this point. Contradiction! Proof done.

Sort Filter Skyline For all the other points: Theorem: after sorting, for each record t in the sorted data set, t will not be dominated by any of the records after it in a descent order. Proof can be made just like the previous one. So if we want to find all the Skyline records For each record, if it will not be dominated by any of the Skyline records with the sum greater than it, this record will not be dominated by any record in the data set, so then this record will be a Skyline record.

Sort Filter Skyline According to the analysis above, we have: With the processing of the SFS algorithm, Skyline records can be returned in a progressive way. When one record is added into S, it must be a Skyline record; The first part of Skyline records will be returned fast: First Skyline record is returned at once; Other records with greater sum will be returned with less comparison. Totally the algorithm is much more faster than the straight-forward way: Comparison will be made just on the Skyline records; All the records will only need to visit once.

Sort Filter Skyline Algorithm: data set T, d dimensions Sort all records by the sum of all attributes in descent way, get T ; Maintain a set S for skyline records. Move the first record into S from T ; While T is not empty, for each record t in T : Compare t with all the records in S: If t is dominated by some record in S, then remove t from T and continue the while loop with the next record; Else move t from T to S, do the while loop with the next record. Return S

SFS Time Cost Analysis To evaluate the SFS algorithm, we need to calculate all the comparisons, plus the cost on sort(o(nlgn)). The total comparison can be divided into two parts: The comparison between Skyline records(m is the number of Skyline records): ( m 1) ( m 2)... 1 ( m 11)* m 2 2 m 2 The comparison for eliminating the non-skyline records: How to calculate?

SFS Time Cost Analysis Consider the situations that: Best case: all the non-skyline records will be eliminated at once when we make the comparison between it and a skyline record. The total comparison will be n-m. Worst case: to eliminate each non-skyline record, all of the skyline records should be used for comparison. The total comparison will be at most m(n-m)

SFS Time Cost Analysis Then, how to evaluate m(the number of Skyline records)? Consider the situation that all the records will be sorted on the first column. Then the probability of ith record is a skyline record is equal to the probability of i-th record is a skyline record in the space of (I->n, d-1).

SFS Time Cost Analysis Proof: The probability of i-th record is a skyline record can be divided into: Probability of i-th record is not dominated by the first i-1 records, which is definitely correct since in the first column, i-th record has greater value than the first i-1 records. Probability of i-th record is not dominated by the later n-i+1 records Since i-th record has the smallest value on the first column, this probability will be equal to the one on the (d-1)- dimensional space starting from i to n.

SFS Time Cost Analysis Then let A(n, d) be the expectation of the number of Skyline records for n records on d-dimensional space In the situation that data set is uniformly independent, we can assume that A(n, d) is monotone to n: n i n i i i n d i n A t P d n A 1 1 1 1) 1, ( isa skyline record} { ), ( ) (ln ) ( ) ( 1), ( 1), ( 1), ( ), ( 1 1 1 1 n O n H n H d n A j d n A j d j A d n A d d n j n j

SFS Time Cost Analysis Now from the estimation on the Skyline records, we can have the final cost analysis on SFS: Best case: d [ O(ln nlgn 2 Worst case: 1 n)] 2 n O(ln d 1 n) O( nlgn) d [ O(ln nlgn 2 1 n)] 2 O(ln d 1 n)[ n O(ln d 1 n)] O( nlgn)

SFS Time Cost Analysis Since we are using the assumption that the number of the Skyline records are expectation in the UI data set, the estimation we got above is just the average cost. In fact the worst may be much more worse, and the best case much more better: Best: Only one skyline record, which will be returned at once, and then all the other records will be dominated, so the cost is O(n). Worst: All the records are skyline records, so all the records will be checked and compared. The cost will be O(n 2 ).

Epsilon Skyline Idea: reduce the dimension Consider the definition of dominance: If we add or subtract a ε value on the left side when we make the comparison between two points, for the maximum skyline: If ε -> +, any of the points will not be dominated by any other ones, since ε makes it larger enough. So the ε-skyline will be the whole data set. If ε = 0, the ε -skyline will be just as the original skyline. If ε -> -, any of the points will be dominated by any other ones, since ε makes it smaller enough. So the ε-skyline will be an empty set.

Epsilon Skyline Our goal: Assign a distinct ε value to each points, so that when we need to get the skyline points, we only need to return all the points with ε <=0. ε value can be seen as a rank on each points, so with the rank, we can return a given k points but not only the exact skyline points. Problem: How to define ε? How to calculate ε efficiently?

Epsilon Skyline (Intuitively)Definition on ε: To increase the number of Skyline records: for all the non-skyline points, recursively find the one with the smallest ε so that after adding ε this point will not be dominated by any of the original skyline points. To decrease the number of Skyline records: for all the skyline points, recursively find the one with the smallest ε so that after subtracting ε this point will be dominated by some point in the whole data set.

Epsilon Skyline

Epsilon Skyline This topic is still a on-going research: Define the ε in a formal way so that we can use it in the analysis and the design of the algorithms; More efficient algorithms: since now based on the intuitive idea, we can see that if we want to get the ε value for each points, we should do comparison with all the other points.