Instructor: Sudeepa Roy

Similar documents
Topics in Probabilistic and Statistical Databases. Lecture 2: Representation of Probabilistic Databases. Dan Suciu University of Washington

Constraints: Functional Dependencies

Database Design and Normalization

Foundations of Databases

But RECAP. Why is losslessness important? An Instance of Relation NEWS. Suppose we decompose NEWS into: R1(S#, Sname) R2(City, Status)

Constraints: Functional Dependencies

Incomplete Information in RDF

UVA UVA UVA UVA. Database Design. Relational Database Design. Functional Dependency. Loss of Information

Composing Schema Mappings: Second-Order Dependencies to the Rescue

On Incomplete XML Documents with Integrity Constraints

GAV-sound with conjunctive queries

Özgür L. Özçep Data Exchange 2

Correlated subqueries. Query Optimization. Magic decorrelation. COUNT bug. Magic example (slide 2) Magic example (slide 1)

Database Systems SQL. A.R. Hurson 323 CS Building

From Constructibility and Absoluteness to Computability and Domain Independence

Obtaining More Answers From Information Integration Systems

Query answering using views

A Knowledge-Based Approach to Entity Resolution

Database Applications (15-415)

COSC 430 Advanced Database Topics. Lecture 2: Relational Theory Haibo Zhang Computer Science, University of Otago

Chapter 7: Relational Database Design

Approximate Rewriting of Queries Using Views

A Dichotomy. in in Probabilistic Databases. Joint work with Robert Fink. for Non-Repeating Queries with Negation Queries with Negation

arxiv: v1 [cs.db] 21 Sep 2016

Applied Logic. Lecture 1 - Propositional logic. Marcin Szczuka. Institute of Informatics, The University of Warsaw

Phil Introductory Formal Logic

an efficient procedure for the decision problem. We illustrate this phenomenon for the Satisfiability problem.

Schema Refinement and Normal Forms Chapter 19

Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1

General Overview - rel. model. Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Motivation. Overview - detailed

Computational Logic. Relational Query Languages with Negation. Free University of Bozen-Bolzano, Werner Nutt

A New Approach to Knowledge Base Revision in DL-Lite

A Capturing Missing Tuples and Missing Values

CSE 132B Database Systems Applications

Database Applications (15-415)

Chapter 7: Relational Database Design. Chapter 7: Relational Database Design

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #3: SQL---Part 1

Relational completeness of query languages for annotated databases

Parallel-Correctness and Transferability for Conjunctive Queries

Axiomatic set theory. Chapter Why axiomatic set theory?

Knowledge base (KB) = set of sentences in a formal language Declarative approach to building an agent (or other system):

INF1383 -Bancos de Dados

Automata Theory and Formal Grammars: Lecture 1

Relative Information Completeness

CSE507. Course Introduction. Computer-Aided Reasoning for Software. Emina Torlak

CS 347 Parallel and Distributed Data Processing

1 First-order logic. 1 Syntax of first-order logic. 2 Semantics of first-order logic. 3 First-order logic queries. 2 First-order query evaluation

Composing Schema Mappings: Second-Order Dependencies to the Rescue

CWA-Solutions for Data Exchange Settings with Target Dependencies

CS5300 Database Systems

Definite Logic Programs

Introduction to Data Management. Lecture #7 (Relational DB Design Theory II)

Stabilizing Boolean Games by Sharing Information

Database Systems Relational Algebra. A.R. Hurson 323 CS Building

Today s topics. Introduction to Set Theory ( 1.6) Naïve set theory. Basic notations for sets

Introduction. next up previous Next: Problem Description Up: Bayesian Networks Previous: Bayesian Networks

3. Only sequences that were formed by using finitely many applications of rules 1 and 2, are propositional formulas.

Learning Goals of CS245 Logic and Computation

Chapter 10. Normalization Ext (from E&N and my editing)

Database Design and Implementation

Logical Agents. September 14, 2004

PRUNING PROCEDURE FOR INCOMPLETE-OBSERVATION DIAGNOSTIC MODEL SIMPLIFICATION

Exchanging More than Complete Data

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #15: BCNF, 3NF and Normaliza:on

Queries and Materialized Views on Probabilistic Databases

Lectures 6. Lecture 6: Design Theory

COMP9414: Artificial Intelligence Propositional Logic: Automated Reasoning

CSC242: Intro to AI. Lecture 11. Tuesday, February 26, 13

Enhancing the Updatability of Projective Views

Schema Refinement: Other Dependencies and Higher Normal Forms

Introduction to Data Management CSE 344

Manipulating Games by Sharing Information

Provenance Semirings. Todd Green Grigoris Karvounarakis Val Tannen. presented by Clemens Ley

CHAPTER 4 CLASSICAL PROPOSITIONAL SEMANTICS

Homework Assignment 2. Due Date: October 17th, CS425 - Database Organization Results

CS632 Notes on Relational Query Languages I

Probabilistic Databases

CS1021. Why logic? Logic about inference or argument. Start from assumptions or axioms. Make deductions according to rules of reasoning.

Schema Refinement and Normal Forms

SUMMER MATH PACKET. Geometry A COURSE 227

Annotated revision programs

Where are we? Knowledge Engineering Semester 2, Reasoning under Uncertainty. Probabilistic Reasoning

Predicate Logic: Sematics Part 1

Instructor: Sudeepa Roy

Schema Refinement and Normal Forms. Chapter 19

Logical Provenance in Data-Oriented Workflows (Long Version)

Path Queries under Distortions: Answering and Containment

First-Order Logic. Chapter Overview Syntax

Definability in Boolean bunched logic

Show Your Work! Point values are in square brackets. There are 35 points possible. Some facts about sets are on the last page.

CS 186, Fall 2002, Lecture 6 R&G Chapter 15

Schema Refinement & Normalization Theory

Schema Refinement and Normal Forms

Logic and Databases. Phokion G. Kolaitis. UC Santa Cruz & IBM Research Almaden. Lecture 4 Part 1

Okanagan College Math 111 (071) Fall 2010 Assignment Three Problems & Solutions

Lecture 5: Introduction to (Robertson/Spärck Jones) Probabilistic Retrieval

Notes on State Minimization

FUZZY ASSOCIATION RULES: A TWO-SIDED APPROACH

Propositional Logic: Methods of Proof (Part II)

Provenance Semirings

Transcription:

CompSci 590.6 Understanding Data: Theory and Applications Lecture 13 Incomplete Databases Instructor: Sudeepa Roy Email: sudeepa@cs.duke.edu 1

Today s Reading Alice Book : Foundations of Databases Abiteboul- Hull- Vianu Chapter 19 Incomplete Information in Relational Databases Imielinski- Lipski JACM 1984 2

Incomplete Databases : Missing Information Several scenarios: Information exists, but we do not have it John bought a car, but I don t know which one Fuzzy values Heather lives in a large and cheap apartment Irrelevant attributes for some tuples Spouse s information if one is single Unreliable information 3

Semantic vs. Representation Semantic of incomplete databases A set of possible worlds or instances (again) Representation of incomplete databases What is actually stored Same problem as probabilistic databases, but now there is no probability 4

How to store missing values? Use NULL values @ in Imielinski- Lipski paper Use variables Different assignments = different possible worlds How do we choose? Choice depends on our requirements of the representation system the query language we want to support (SP, PJ, SPJ, SPJU etc) 5

Desired properties of a Representation System Intuitively, the representation system should be 1. Sound/Safe: No incorrect conclusion is derived for the specified language 2. Complete All valid conclusions are derivable 6

Representation Systems Overview (IL 84) SUPPLIER LOCATION PRODUCT QUANTITY Smith London Nails @ Brown @ Bolts @ Jones @ Nuts 40000 Codd- table @ = NULL Supports SP Cannot support both PJ COURSE TEACHER WEEKDAY Databases x Nails Programming y Tuesday Databases x Thursday FORTRAN Smith z SUPPLIER LOCATION PRODUCT con x London Nails x=smith Brown New York Nails x Smith V- table Naïve table (Alice book) Can use same x twice Supports positive RA Conditional- table or c- table Nail is supplied by either Smith or Brown but not both 7

Codd- table or Table Switching to notations/terminology in the Alice book Constants: Alice, NY, 50 Variables: x, y, z Attributes U A Table/Codd- table is a relation over U A finite set of tuples over U Each tuple contains constants and/or variables No variable appears twice Extends to multiple tables 0 1 x y z 1 2 0 v Codd- table 8

Valuation of a Codd- table Given a set of variables Var, a valuation v is a total function from Var to constants Consts v: Var - > Const Semantic of a Codd- table T all possible valuations of variables in T rep(r) = {v(t): v is a valuation of variables in T} 0 1 x y z 1 2 0 v Codd- table 0 1 2 2 0 1 2 0 0 0 1 2 3 0 1 2 0 5 Some Instances 0 1 1 2 0 1 Why two rows? 9

Closed vs. Open World Assumptions 0 1 x y z 1 2 0 v c- table 0 1 2 2 0 1 2 0 0 0 1 2 3 0 1 2 0 5 Some Instances 0 1 1 2 0 1 Why two rows? Closed World Assumption (CWA) Each tuple in any instance must be justified by the presence of a free tuple in the c- table Open World Assumption (OWA) All instances contain an instance from the valuations rep(t) 10

So far Semantic of incomplete databases Possible worlds (CWA or OWA) Representation Codd- tables Next Query evaluation 11

Query Evaluation: Semantic Codd- table T Represents possible worlds rep(t) by CWA Query q For each instance I rep(t), the answer is q(i) The set of possible answers q(rep(t)) This gives the semantic. But what are the requirements of its representation? 12

Query Evaluation: Representation Codd- table T A system to represent input incomplete database Query q from a language L (e.g. RA) For each I rep(t), the answer is q(i) The set of possible answers q(rep(t)) We would like to represent q(rep(t)) in the same system For each representation T and query q, there should exist a computable representation q(t) such that rep(q(t)) = q(rep(t)) Strong representation system: The above property holds for a query language L (here RA) i.e. the answer to a query represents ALL POSSIBLE answers Weak representation system: Weaker requirement 13

Query q = σ A=3 (T) Challenges: Query evaluation on Codd- table 0 1 x y z 1 2 0 v Codd- table T 0 1 2 2 0 1 2 0 0 0 1 2 3 0 1 2 0 5 0 1 1 2 0 1 I 1 I I 2 3 No Codd- table representing the possible answers to q I 1, I 3 : empty relation I 2 : non- empty relation (single tuple) Suppose a table T exists such that T represents q(t) Either T is empty Then q(i 2 ) is not in rep(t ) Or T is non- empty Then empty relation is not in rep(t ) Contradiction. T does not exist 14

So far The representation system of Codd- table is insufficient Alternatives 1. Be less demanding weak representation system Next 2. Consider richer representation system that lead to a complete representation system for all of RA c- table Later 15

Weak Representation Systems DOES NOT require that the answer to a query be a representation of the set of all possible answers ASKS which tuples are SURELY in the answer i.e. belongs to all possible answers 16

Sure Tuples The set of sure tuples sure(q, T) = {q(i) I rep(t)} t sure(q, T) ó t is in q(i) for each possible world I Can be computed by dropping all variables But there is a problem 17

q = σ A=2 (T) q = Π AB (R) Problem with dropping vars 0 1 x y z 1 2 0 v Codd- table T 0 1 2 0 1 2 0 1 1 2 0 1 3 0 1 2 0 1 2 0 0 2 0 5 I 1 I 2 I 3 sure(q, T) = ɸ q (sure(q, T)) = ɸ <2, 0, *> I for all I q(rep(t)) <2, 0> I for all I q (q(rep(t))) sure(q (q(rep(t))) = {<2, 0>} (why) Not compositional Need a notion for equivalence 18

Equivalence of Incomplete Databases L is a query language Two incomplete databases I, J are L- equivalent if they are indistinguishable w.r.t. sure tuples for all q in L I L J {q(i) I I} = {q(j) J J} For each q L 19

Weak Representation System (WRS) WRS for a query language L For every q in L For each representation T of an incomplete database there can be several representation for the same db There is a representation q(t) such that rep(q(t)) L q(rep(t)) i.e. their sure tuples are identical 20

Weak Representation System (WRS) rep(q(t)) L q(rep(t)) NOTE: q(t) is not precisely sure(q, T) But sure(q, T) can be obtained by eliminating tuples with varsat the end Codd- tables form a WRS for Select- Project NOT for Union- Join (proof in the Alice book) 21

WRS examples: SP Select σ: return tuples s.t. the condition holds for all valuations of vars Project Π: ordinary projection q = σ A=2 (T) q = Π AB (R) 0 1 x y z 1 2 0 v σ A=2 (T) = {<2, 0, v>} Π AB σ A=2 (T) = {<2, 0>} The problem with Join/Union: repeated vars are not allowed in Codd- tables 22

Next: Naïve Tables Naïve Tables called V- tables in Imilienski- Lipski 84 allow repeated vars Work for SPJU = positive RA Basically vars are treated as distinct constants However the representation system is still weak Next c- tables 0 1 x x z 1 2 0 v 23

Extend representation with conditions on vars Limitations of Codd- tables and naïve tables for selection conditions, sometimes they hold sometimes do not Solution: Extend representation with conditions on vars c- tables Form a strong representation system for RA 24

Condition Conjunctions of equality atoms x = y x = c (const) inequality atoms x y x c (const) Two types of conditions associated with a table T Global: ɸ T associated with the entire table Local: ϕ t associated with a tuple t in T 25

conditional- table or c- table A triple (T, ɸ T, ϕ) T is a naive- table ɸ T is a global condition ϕ is a mapping that associates a local condition ϕ t to every tuple t of T Omit a condition = true (always holds) or x = x ɸ, ϕ can contain vars that DO NOT belong to table T or tuple t A B x 2, y 2 0 1 z = z 1 x y = 0 y x x y c- table T A B A B A B 0 1 0 1 0 1 means TRUE 1 3 1 0 0 3 Some instances J 1 J 2 J 3 26

conditional- table or c- table A c- table (T, ɸ T, ϕ) represents possible worlds as follows rep(t) = {I there is a valuation v satisfying ɸ T such that the relation I consists exactly of those facts v(t) for which v satisfies ϕ t } Sample assignments of (x, y, z) J 1 : (3, 0, 0) J 2 : (0, 1, 0) J 3 : (1, 1, 0) A B x 2, y 2 0 1 z = z 1 x y = 0 y x x y c- table T A B 0 1 1 3 0 3 Some instances A B A B 0 1 0 1 1 0 J 1 J 2 J 3 27

Can capture disjunction Example Representation Power Sally is taking MATH or CS (but NOT both) + another course Alice takes BIOLOGY if Sally takes MATH Alice takes MATH or PHYSICS (but not both) if Sally takes PHYSICS Student Course (x MATH) ^ (x CS) Sally MATH z = 0 Sally CS z 0 Sally x Alice BIOLOGY z = 0 Alice MATH (x = PHYSICS) ^ (t = 0) Alice PHYSICS (x = PHYSICS) ^ (t 0) 28

Observations There may be several c- tables for the same incomplete database Equivalence: Two representations T, T are equivalence if rep(t) = rep(t ) Even checking membership in rep(t) is NP- complete Therefore, testing equivalence is not easy but is decidable 29

c- tables form Strong Representation Systems Projection simply project the columns but preserve conditions Selection add new conjuncts to local condition Union make sure the tables use distinct vars then choose appropriate local conditions Join consider all possible pairs from two tables Set difference not limited to positive RA 30

Example T 1 T 2 T 3 B x C c B C y c y = b z Compute the c- tables for (on whiteboard) Π B (T 2 ) T 1 T 3 - - - natural join σ B=b (T 1 T 3 ) T 1 U T 2 T 1 - T 2 w A a B y 31