Database Design and Implementation

1 Database Design and Implementation, CS 645: Data provenance

2 Provenance
provenance, n. The fact of coming from some particular source or quarter; origin, derivation. [Oxford English Dictionary]
Data provenance, or lineage [BunemanKhannaTan01], aims to explain how a particular result was derived.
Data-intensive science needs to worry about provenance.

3 Motivation
Application areas: data integration [WangMadnick90, LeeBressanMadnick98]; data warehousing [CuiWidomWiener00]; scientific data management [BunemanKhannaTan01].
Provenance determines trust in results, helps ensure the reliability and quality of data, supports repeatability and verifiability, avoids duplication of effort, and explains how annotations are transported through queries.

4 Example of data provenance
A typical question: for a given database query Q, a database D, and a tuple t in the output of Q(D), which parts of D contribute to t?
R(Emp, Dept)
  John   D01
  Susan  D02
  Anna   D04
S(Did, Mgr)
  D01  Mary
  D02  Ken
  D03  Ed
Q = SELECT r.Emp, r.Dept, s.Mgr FROM R r, S s WHERE r.Dept = s.Did
Q(D) (Emp, Dept, Mgr)
  John   D01  Mary
  Susan  D02  Ken
The same question can be asked about attribute values, tables, etc.
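A minimal Python sketch of this question for the example instance: it evaluates Q with a nested-loop join and records, for each output tuple, which input tuples produced it. The tuple identifiers r1..r3 and s1..s3 are hypothetical labels added for illustration; they are not part of the slide.

```python
# Instance from the example; r1..r3 and s1..s3 are hypothetical tuple ids.
R = {"r1": ("John", "D01"), "r2": ("Susan", "D02"), "r3": ("Anna", "D04")}
S = {"s1": ("D01", "Mary"), "s2": ("D02", "Ken"), "s3": ("D03", "Ed")}


def q_with_lineage(R, S):
    """Evaluate Q = SELECT r.Emp, r.Dept, s.Mgr FROM R r, S s WHERE r.Dept = s.Did
    and map each output tuple to the set of input tuple ids that produced it."""
    lineage = {}
    for rid, (emp, dept) in R.items():
        for sid, (did, mgr) in S.items():
            if dept == did:  # join condition r.Dept = s.Did
                lineage.setdefault((emp, dept, mgr), set()).update({rid, sid})
    return lineage


result = q_with_lineage(R, S)
for t, parts in result.items():
    print(t, sorted(parts))
# ('John', 'D01', 'Mary') ['r1', 's1']
# ('Susan', 'D02', 'Ken') ['r2', 's2']
```

Anna (D04) and the D03 row of S have no join partner, so neither contributes to any output tuple.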

5 Two approaches
Eager (annotation-based): the query Q is transformed into a query Q' that carries extra (provenance) information along with the result. The source data is not needed after the transformation.
Lazy (non-annotation-based): Q is unchanged. Good when extra storage is an issue, but recomputation and access to the source data are required.
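To contrast the two approaches on the Emp/Dept example, here is a sketch using Python's built-in sqlite3 module. The eager variant rewrites Q into a query Q' that also returns the identifiers of the contributing source rows; the lazy variant leaves Q alone and, when asked about one output tuple, goes back to the source tables. Using SQLite's rowid as the tuple identifier, and the helper name `lazy_lineage`, are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE R (Emp TEXT, Dept TEXT);
CREATE TABLE S (Did TEXT, Mgr TEXT);
INSERT INTO R VALUES ('John', 'D01'), ('Susan', 'D02'), ('Anna', 'D04');
INSERT INTO S VALUES ('D01', 'Mary'), ('D02', 'Ken'), ('D03', 'Ed');
""")

# Eager: Q' carries provenance (source rowids) as extra columns of every result row.
Q_PRIME = """
SELECT r.Emp, r.Dept, s.Mgr, r.rowid AS r_id, s.rowid AS s_id
FROM R r, S s
WHERE r.Dept = s.Did
ORDER BY r.Emp
"""
eager = conn.execute(Q_PRIME).fetchall()


# Lazy: Q is unchanged; to explain one output tuple, recompute over the source.
def lazy_lineage(conn, emp, dept, mgr):
    return conn.execute(
        "SELECT r.rowid, s.rowid FROM R r, S s "
        "WHERE r.Dept = s.Did AND r.Emp = ? AND r.Dept = ? AND s.Mgr = ?",
        (emp, dept, mgr),
    ).fetchall()


print(eager)  # [('John', 'D01', 'Mary', 1, 1), ('Susan', 'D02', 'Ken', 2, 2)]
print(lazy_lineage(conn, "John", "D01", "Mary"))  # [(1, 1)]
```

The eager result is self-describing but wider; the lazy lookup stores nothing extra but only works while R and S are still available and unchanged.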

6 Types of provenance
Why: which DB tuples contribute to the presence of each result tuple?
How: by what process is each output tuple produced from the DB instance?
Where: from which attribute of which tuple does each output value come?

7 Why-provenance example
Query: SELECT DISTINCT a.name, a.phone ...

8 Lineage
The lineage of an output tuple t is a subset of the input tuples that are relevant to t.
Example (SELECT DISTINCT a.name, a.phone ...): lineage = {t1, t5, t6}.
Problem: lineage is not very precise. For example, the lineage above does not say that t5 and t6 need not both exist.

9 Why-provenance
Witness of t: any subset of the database sufficient to reconstruct tuple t in the query result.
Witness basis: the leaves of the proof tree showing how result tuple t is generated.
Example (SELECT DISTINCT a.name, a.phone ...):
Lineage: {t1, t5, t6}
Some witnesses: {t1, t5}, {t1, t6}, {t1, t2, t6, t8}
Witness basis: {{t1, t5}, {t1, t6}}
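The slide's instance is not recoverable from the transcript, so the sketch below assumes a hypothetical stand-in for "is t in Q(J)?" that agrees with the witness basis shown: t is produced exactly when t1 is present together with t5 or t6. It enumerates subsets by increasing size and keeps each witness that contains no smaller witness; since SPJU queries are monotone, this yields the minimal witnesses.

```python
from itertools import combinations

# Tuple ids of the example instance (its contents are not recoverable here).
DB = ["t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8"]


def reconstructs_t(subset):
    """Hypothetical stand-in for 't in Q(subset)', chosen to agree with the
    slide's witness basis {{t1, t5}, {t1, t6}}."""
    return "t1" in subset and ("t5" in subset or "t6" in subset)


def minimal_witnesses(db, holds):
    """Enumerate subsets by size and keep witnesses that contain no smaller one.
    Exponential in |db|, which is fine for a toy example."""
    found = []
    for k in range(len(db) + 1):
        for combo in combinations(db, k):
            s = set(combo)
            if holds(s) and not any(w <= s for w in found):
                found.append(s)
    return found


print(reconstructs_t({"t1", "t2", "t6", "t8"}))  # True: a witness, but not minimal
print(reconstructs_t({"t5", "t6"}))              # False: t1 is required
print([sorted(w) for w in minimal_witnesses(DB, reconstructs_t)])
# [['t1', 't5'], ['t1', 't6']]
```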

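10 Why-provenance and query rewriting
Two equivalent queries Q and Q' can have different why-provenance for the same output tuple t over an instance I with tuples t1, t2, t3:
Why(Q, I, t) = {{t1}}
Why(Q', I, t) = {{t1}, {t1, t2}}
Minimal witness basis: the minimal witnesses in the witness basis (here {{t1}} for both queries).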
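Computing the minimal witness basis from a witness basis only requires discarding every witness that strictly contains another one. A minimal Python sketch (the function name is ours), applied to the two why-provenance sets on this slide, {{t1}} and {{t1}, {t1, t2}}:

```python
def minimal_witness_basis(basis):
    """Keep only the witnesses in `basis` that have no strictly smaller witness in it."""
    sets = {frozenset(w) for w in basis}
    return {w for w in sets if not any(v < w for v in sets)}


why_original = [{"t1"}]
why_rewritten = [{"t1"}, {"t1", "t2"}]
print(minimal_witness_basis(why_original) == minimal_witness_basis(why_rewritten))  # True
```

Frozensets are used so that witnesses can be stored in a set and compared with `<` (strict subset).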

11 The view deletion problem
Let D be a database instance and V = Q(D) a view defined over D. Find a set of tuples ΔD to remove from D so that a specific tuple t is removed from the view.
View side-effect problem: minimize the number of side effects in the view, i.e., other view tuples that are removed as well. NP-hard for queries involving joins together with projection or union; PTIME for the rest.
Source side-effect problem: minimize the number of tuples deleted from D. Same dichotomy.
[BunemanKhannaTan, PODS 2002]
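Both objectives can be illustrated by brute force on the Emp/Dept join from slide 4. This is a sketch for toy instances only (it tries every ΔD, consistent with the problem being hard in general). The extra tuple ("Bob", "D01") is hypothetical, added so that the two candidate deletions differ: removing S(D01, Mary) also removes Bob's view tuple, while removing R(John, D01) has no side effect.

```python
from itertools import combinations

# The Emp/Dept example, plus a hypothetical tuple ("Bob", "D01").
R = {("John", "D01"), ("Susan", "D02"), ("Anna", "D04"), ("Bob", "D01")}
S = {("D01", "Mary"), ("D02", "Ken"), ("D03", "Ed")}


def view(R, S):
    """V = Q(D): join R(Emp, Dept) with S(Did, Mgr) on Dept = Did."""
    return {(emp, dept, mgr) for (emp, dept) in R for (did, mgr) in S if dept == did}


def candidate_deletions(R, S, t):
    """Yield (delta, view_side_effects) for every ΔD that removes t from the view.
    Exponential in |D|: only usable on toy instances."""
    source = [("R", x) for x in sorted(R)] + [("S", x) for x in sorted(S)]
    before = view(R, S)
    for k in range(1, len(source) + 1):
        for delta in combinations(source, k):
            r_left = R - {x for rel, x in delta if rel == "R"}
            s_left = S - {x for rel, x in delta if rel == "S"}
            after = view(r_left, s_left)
            if t not in after:
                yield delta, (before - after) - {t}


t = ("John", "D01", "Mary")
options = list(candidate_deletions(R, S, t))
# View side-effect problem: fewest other view tuples lost (ties broken by |ΔD|).
best_view = min(options, key=lambda o: (len(o[1]), len(o[0])))
# Source side-effect problem: fewest source tuples deleted.
best_source = min(options, key=lambda o: len(o[0]))
print(best_view)  # ((('R', ('John', 'D01')),), set())
```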

12 How-provenance
Identifies the witness tuples and the operations performed on them to produce each result tuple.
Expresses these operations using provenance semirings:
MERGE (+): union or projection
JOIN (·): join

13 Propagating annotations (1): join
R(A, B, C)
  a  b  c   p
S(D, B, E)
  d  b  e   r
R ⋈ S, joined on B (A, B, C, D, E)
  a  b  c  d  e   p·r
The annotation p·r means joint use of the data annotated by p and the data annotated by r.

14 Propagating annotations (2): union
R(A, B, C)
  a  b  c   p
S(A, B, C)
  a  b  c   r
R ∪ S (A, B, C)
  a  b  c   p + r
The annotation p + r means alternative use of the data annotated by p and the data annotated by r.

15 Propagating annotations (3): projection
R(A, B, C)
  a  b  c1   p
  a  b  c2   r
  a  b  c3   s
π_{AB} R (A, B)
  a  b   p + r + s
Here + denotes alternative use of data.

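16 An example (SPJU)
R(A, B, C)
  a  b  c   p
  d  b  e   r
  f  g  e   s
Q = σ_{C=e} π_{AC}((π_{AB} R ⋈ π_{BC} R) ∪ (π_{AC} R ⋈ π_{BC} R))
Tuples of π_{AC}(…), before the selection (A, C):
  a  c
  a  e
  d  c
  d  e
  f  e
For selection, multiply each annotation by 1 if the tuple satisfies the predicate and by 0 if it does not.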
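The propagation laws (join multiplies, union and projection add, selection multiplies by 1 or 0) can be sketched as operations on annotated relations whose annotations are polynomials with natural-number coefficients. This is a minimal in-memory Python illustration, assuming annotations are kept as `collections.Counter` objects mapping monomials to coefficients; it is not the implementation of any particular system. Evaluating Q on R should reproduce the polynomials on the "Applying the laws" slide.

```python
from collections import Counter
from itertools import product


# Provenance polynomial in N[p, r, s]: Counter {monomial: coefficient},
# where a monomial is a sorted tuple of tokens, e.g. ("r", "r") stands for r^2.
def token(x):
    return Counter({(x,): 1})


def padd(f, g):  # + : alternative use
    out = Counter(f)
    out.update(g)
    return out


def pmul(f, g):  # · : joint use
    out = Counter()
    for (m1, c1), (m2, c2) in product(f.items(), g.items()):
        out[tuple(sorted(m1 + m2))] += c1 * c2
    return out


# An annotated relation is a pair (schema, {row: polynomial}).
def project(rel, attrs):  # duplicates are merged with +
    schema, rows = rel
    idx = [schema.index(a) for a in attrs]
    out = {}
    for row, ann in rows.items():
        key = tuple(row[i] for i in idx)
        out[key] = padd(out.get(key, Counter()), ann)
    return tuple(attrs), out


def union(r1, r2):  # same attributes; r2's columns are reordered to match r1
    s1, rows1 = r1
    out = dict(rows1)
    for row, ann in project(r2, s1)[1].items():
        out[row] = padd(out.get(row, Counter()), ann)
    return s1, out


def join(r1, r2):  # natural join; annotations of matching rows are multiplied
    (s1, rows1), (s2, rows2) = r1, r2
    common = [a for a in s1 if a in s2]
    extra = [a for a in s2 if a not in s1]
    out = {}
    for (t1, a1), (t2, a2) in product(rows1.items(), rows2.items()):
        if all(t1[s1.index(a)] == t2[s2.index(a)] for a in common):
            row = t1 + tuple(t2[s2.index(a)] for a in extra)
            out[row] = padd(out.get(row, Counter()), pmul(a1, a2))
    return s1 + tuple(extra), out


def select(rel, pred):  # multiply by 1 if pred holds, by 0 otherwise (0 = absent)
    schema, rows = rel
    return schema, {row: ann for row, ann in rows.items() if pred(dict(zip(schema, row)))}


def show(poly):  # render e.g. Counter({("r", "r"): 2, ("r", "s"): 1}) as "2r^2 + rs"
    terms = []
    for mono, coef in sorted(poly.items()):
        body = "".join(v if k == 1 else f"{v}^{k}" for v, k in sorted(Counter(mono).items()))
        terms.append(("" if coef == 1 else str(coef)) + body)
    return " + ".join(terms) or "0"


R = (("A", "B", "C"),
     {("a", "b", "c"): token("p"), ("d", "b", "e"): token("r"), ("f", "g", "e"): token("s")})
AB, BC, AC = project(R, ("A", "B")), project(R, ("B", "C")), project(R, ("A", "C"))
inner = project(union(join(AB, BC), join(AC, BC)), ("A", "C"))
Q = select(inner, lambda t: t["C"] == "e")
for row, ann in Q[1].items():
    print(row, show(ann))
# ('a', 'e') pr
# ('d', 'e') 2r^2 + rs
# ('f', 'e') rs + 2s^2
```

Before the selection, `inner` also holds (a, c) with 2p^2 and (d, c) with pr; the selection multiplies those by 0, so they disappear.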

27 Back to example
R(A, B, C)
  a  b  c   p
  d  b  e   r
  f  g  e   s
Q, before the selection σ_{C=e} (A, C): (a, c), (a, e), (d, c), (d, e), (f, e)

28 Applying the laws: polynomials
R(A, B, C)
  a  b  c   p
  d  b  e   r
  f  g  e   s
Q(A, C)
  a  e   pr
  d  e   2r² + rs
  f  e   rs + 2s²
Polynomials with coefficients in ℕ and the annotation tokens p, r, s as indeterminates capture a very general form of provenance.

29 How to read this provenance
With R and Q as above, (d, e) has provenance 2r² + rs, so there are 3 ways to derive (d, e):
2 of the ways use only r, but they use it twice;
the 3rd uses r once and s once.
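Reading a polynomial this way is mechanical: a coefficient counts how many derivations share a monomial, and an exponent counts how often one source tuple is used within a derivation. A small Python sketch, with the polynomial written by hand as a dict from monomials to coefficients (a representation chosen for this illustration):

```python
from collections import Counter

# Provenance of the output tuple (d, e): 2r^2 + rs, stored as {monomial: coefficient}.
prov_de = {("r", "r"): 2, ("r", "s"): 1}


def derivations(poly):
    """Expand the coefficients: each copy of a monomial is one derivation,
    recorded as {token: number of times it is used}."""
    return [dict(Counter(mono)) for mono, coef in sorted(poly.items()) for _ in range(coef)]


ds = derivations(prov_de)
print(len(ds))  # 3
print(ds)       # [{'r': 2}, {'r': 2}, {'r': 1, 's': 1}]
```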

30 Deletion propagation
Delete (d, b, e) from R: set r to 0!
Q(A, C)      before        after r := 0
  a  e       pr            0
  d  e       2r² + rs      0
  f  e       rs + 2s²      2s²
Tuples annotated 0 disappear, leaving Q = {(f, e)} with annotation 2s².
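Because the polynomials record exactly which source tuples each derivation uses, a deletion can be propagated without re-running the query: drop every monomial that mentions the deleted tuple's token, then drop output tuples whose polynomial becomes 0. A minimal sketch over the polynomials on this slide (the dict representation and function name are illustrative choices):

```python
# Provenance polynomials of Q from the slide: {output tuple: {monomial: coefficient}}.
Q = {
    ("a", "e"): {("p", "r"): 1},
    ("d", "e"): {("r", "r"): 2, ("r", "s"): 1},
    ("f", "e"): {("r", "s"): 1, ("s", "s"): 2},
}


def delete_source_tuple(q, tok):
    """Set the deleted tuple's token to 0: every monomial that mentions it
    vanishes, and output tuples whose polynomial becomes 0 are dropped."""
    result = {}
    for row, poly in q.items():
        kept = {mono: c for mono, c in poly.items() if tok not in mono}
        if kept:
            result[row] = kept
    return result


print(delete_source_tuple(Q, "r"))  # {('f', 'e'): {('s', 's'): 2}}
```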

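31 Some useful commutative semirings
Set semantics: (𝔹, ∨, ∧, false, true)
Bag semantics: (ℕ, +, ×, 0, 1)
Probabilistic events: (𝒫(Ω), ∪, ∩, ∅, Ω)
Access control: ({P < C < S < T < 0}, min, max, 0, P), with levels Public < Confidential < Secret < Top secret, and 0 meaning never available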
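A provenance polynomial can be specialized to any of these semirings by substituting values for the tokens and reading + and · as the semiring's operations. The sketch below does this for set semantics and bag semantics; the `Semiring` class and the valuations (r with multiplicity 2 and s with multiplicity 3 under bag semantics) are illustrative assumptions. Probabilistic events and access control fit the same interface with other choices of zero, one, plus, and times.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    zero: Any
    one: Any
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]


BOOL = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)  # set semantics
NAT = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)             # bag semantics


def evaluate(poly, k, valuation):
    """Interpret a polynomial {monomial: coefficient} in semiring k by
    substituting valuation[token] for each token."""
    total = k.zero
    for mono, coef in poly.items():
        term = k.one
        for tok in mono:
            term = k.times(term, valuation[tok])
        for _ in range(coef):  # coefficient n means n copies of the term
            total = k.plus(total, term)
    return total


prov_de = {("r", "r"): 2, ("r", "s"): 1}                    # 2r^2 + rs
print(evaluate(prov_de, NAT, {"r": 2, "s": 3}))          # 14 = 2*2*2 + 2*3
print(evaluate(prov_de, BOOL, {"r": True, "s": False}))   # True
print(evaluate(prov_de, BOOL, {"r": False, "s": True}))   # False
```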

32 Example: access control
R(A, B, C)
  a  b  c   p = P
  d  b  e   r = S
  f  g  e   s = T
Q(A, C) with its provenance polynomials:
  a  c   2p²
  a  e   pr
  d  c   pr
  d  e   2r² + rs
  f  e   2s² + rs
Evaluate with p = P, r = S, s = T, using min for + and max for ·:
  a  c   P
  a  e   S
  d  c   S
  d  e   S
  f  e   T
A user with secret clearance sees the tuples whose level is at most S, i.e., all except (f, e).
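The evaluation on this slide can be reproduced with a short sketch in which + is min and · is max over the ordered levels P < C < S < T < 0. Treating "0" as the most restrictive level (never available) follows the usual presentation of this semiring and is an assumption here, as are the helper names.

```python
from functools import reduce

# Clearance levels, least to most restrictive; "0" means the data is never available.
RANK = {lvl: i for i, lvl in enumerate(["P", "C", "S", "T", "0"])}


def plus(a, b):   # alternative use: the least restrictive derivation wins
    return min(a, b, key=RANK.__getitem__)


def times(a, b):  # joint use: the most restrictive input dominates
    return max(a, b, key=RANK.__getitem__)


def evaluate(poly, valuation):
    """Evaluate {monomial: coefficient}; "P" is the identity for times, "0" for plus."""
    terms = [reduce(times, (valuation[t] for t in mono), "P")
             for mono, coef in poly.items() for _ in range(coef)]
    return reduce(plus, terms, "0")


Q = {
    ("a", "c"): {("p", "p"): 2},
    ("a", "e"): {("p", "r"): 1},
    ("d", "c"): {("p", "r"): 1},
    ("d", "e"): {("r", "r"): 2, ("r", "s"): 1},
    ("f", "e"): {("r", "s"): 1, ("s", "s"): 2},
}
levels = {row: evaluate(poly, {"p": "P", "r": "S", "s": "T"}) for row, poly in Q.items()}
print(levels)
# {('a', 'c'): 'P', ('a', 'e'): 'S', ('d', 'c'): 'S', ('d', 'e'): 'S', ('f', 'e'): 'T'}
visible = [row for row, lvl in levels.items() if RANK[lvl] <= RANK["S"]]
print(visible)  # [('a', 'c'), ('a', 'e'), ('d', 'c'), ('d', 'e')]
```

Coefficients need no special handling here: adding a level to itself with min leaves it unchanged, so 2s² evaluates the same as s².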

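33 Where-provenance
Identifies witness cells; important for annotations.
Over R(A, B), the query
  SELECT * FROM R WHERE A <> 5 UNION SELECT A, 7 AS B FROM R WHERE A = 5
and the update
  UPDATE R SET B = 7 WHERE A = 5
produce the same relation. From which cells of R does each cell of the result come?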
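A quick check with Python's sqlite3 module, on a hypothetical two-row instance R = {(5, 3), (1, 2)}, confirms that the query and the update produce exactly the same rows. Where-provenance therefore cannot be read off the output; it depends on how each cell was produced. Informally, in the query the value 7 is a constant introduced by the query itself, while under the update it is written into an existing cell of R.

```python
import sqlite3


def run(statements, final_query):
    """Load the hypothetical instance, apply `statements`, return sorted rows of `final_query`."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE R (A INTEGER, B INTEGER);"
        "INSERT INTO R VALUES (5, 3), (1, 2);"
    )
    for stmt in statements:
        conn.execute(stmt)
    rows = sorted(conn.execute(final_query).fetchall())
    conn.close()
    return rows


query_result = run([], "SELECT * FROM R WHERE A <> 5 UNION SELECT A, 7 AS B FROM R WHERE A = 5")
update_result = run(["UPDATE R SET B = 7 WHERE A = 5"], "SELECT * FROM R")
print(query_result == update_result)  # True: both are [(1, 2), (5, 7)]
```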

34 Color algebra [Geerts, Kementsietsidis, Milano 06]
R(A, B) and P[Q](A, B) for
Q = SELECT * FROM R WHERE A <> 5 UNION SELECT A, 7 AS B FROM R WHERE A = 5

35 Color algebra
R(A, B) and P[Q](A, B) for
Q = UPDATE R SET B = 7 WHERE A = 5

36 Where-provenance and semirings
Here annotations are attached to relations, attributes, tuples, and values.
R^u(A^x, B^y, C^1)
  a^1  b^1  c^1   p
S^v(B^1, C^1)
  b^z  c^1   m
π_{AC}(π_{AB} R ⋈ (π_{BC} R ∪ S)) (A^1, C^1)
  a^1  c^1   u²p²xy² + uvpmxyz
1 is a neutral annotation, used when we don't bother to track data.

37 Different annotations → different tuples
R(A, B, C), where the two occurrences of e carry different value annotations z and w:
  a  b  c     p
  d  b  e^z   r
  f  g  e^w   s
π_{C} σ_{C=e} π_{AC}(π_{AB} R ⋈ π_{BC} R) (C)
  e^z   pr + r²
  e^w   s²
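A small Python sketch of this slide, assuming value annotations are modelled as (value, tag) pairs so that e^z and e^w compare as different values. Running the same query with the tags stripped shows the contrast: the derivations then collapse into a single output tuple e.

```python
from collections import Counter, defaultdict

# R(A, B, C) with tuple annotations p, r, s. C values are (value, tag) pairs:
# the two occurrences of e carry different value annotations z and w.
R = [
    (("a", "b", ("c", None)), "p"),
    (("d", "b", ("e", "z")), "r"),
    (("f", "g", ("e", "w")), "s"),
]


def query(rel):
    """pi_C sigma_{C=e} pi_AC(pi_AB R join pi_BC R), returning {C value: polynomial}.
    Projections are summed only at the end; by distributivity of the semiring
    this gives the same polynomials as merging duplicates at every step."""
    out = defaultdict(Counter)
    for (_, b1, _), x in rel:          # pi_AB R
        for (_, b2, c), y in rel:      # pi_BC R
            if b1 == b2 and c[0] == "e":            # join on B, then select C = e
                out[c][tuple(sorted((x, y)))] += 1  # project onto C, adding up
    return dict(out)


tagged = query(R)
untagged = query([((a, b, (c[0], None)), ann) for (a, b, c), ann in R])
# tagged:   e^z -> pr + r^2,  e^w -> s^2   (two distinct output tuples)
# untagged: e   -> pr + r^2 + s^2          (a single output tuple)
```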

38 Wrap-up: issues and directions
Archiving; compression; generalizations, e.g. program slicing [Cheney07]; negative provenance, e.g. Why Not? [SIGMOD09] and Artemis [PVLDB09]; causality.


More information

COMPUTING SIMILARITY BETWEEN DOCUMENTS (OR ITEMS) This part is to a large extent based on slides obtained from

COMPUTING SIMILARITY BETWEEN DOCUMENTS (OR ITEMS) This part is to a large extent based on slides obtained from COMPUTING SIMILARITY BETWEEN DOCUMENTS (OR ITEMS) This part is to a large extent based on slides obtained from http://www.mmds.org Distance Measures For finding similar documents, we consider the Jaccard

More information

Exam 1. March 12th, CS525 - Midterm Exam Solutions

Exam 1. March 12th, CS525 - Midterm Exam Solutions Name CWID Exam 1 March 12th, 2014 CS525 - Midterm Exam s Please leave this empty! 1 2 3 4 5 Sum Things that you are not allowed to use Personal notes Textbook Printed lecture notes Phone The exam is 90

More information

Schema Refinement & Normalization Theory

Schema Refinement & Normalization Theory Schema Refinement & Normalization Theory Functional Dependencies Week 13 1 What s the Problem Consider relation obtained (call it SNLRHW) Hourly_Emps(ssn, name, lot, rating, hrly_wage, hrs_worked) What

More information

Compositions of Tree Series Transformations

Compositions of Tree Series Transformations Compositions of Tree Series Transformations Andreas Maletti a Technische Universität Dresden Fakultät Informatik D 01062 Dresden, Germany maletti@tcs.inf.tu-dresden.de December 03, 2004 1. Motivation 2.

More information

Justifications for Logic Programming

Justifications for Logic Programming Justifications for Logic Programming C. V. Damásio 1 and A. Analyti 2 and G. Antoniou 3 1 CENTRIA, Departamento de Informática Faculdade de Ciências e Tecnologia Universidade Nova de Lisboa, 2829-516 Caparica,

More information

SYMMETRIC POLYNOMIALS

SYMMETRIC POLYNOMIALS SYMMETRIC POLYNOMIALS KEITH CONRAD Let F be a field. A polynomial f(x 1,..., X n ) F [X 1,..., X n ] is called symmetric if it is unchanged by any permutation of its variables: for every permutation σ

More information

Lectures 6. Lecture 6: Design Theory

Lectures 6. Lecture 6: Design Theory Lectures 6 Lecture 6: Design Theory Lecture 6 Announcements Solutions to PS1 are posted online. Grades coming soon! Project part 1 is out. Check your groups and let us know if you have any issues. We have

More information

Compositions of Bottom-Up Tree Series Transformations

Compositions of Bottom-Up Tree Series Transformations Compositions of Bottom-Up Tree Series Transformations Andreas Maletti a Technische Universität Dresden Fakultät Informatik D 01062 Dresden, Germany maletti@tcs.inf.tu-dresden.de May 17, 2005 1. Motivation

More information

In-Database Factorised Learning fdbresearch.github.io

In-Database Factorised Learning fdbresearch.github.io In-Database Factorised Learning fdbresearch.github.io Mahmoud Abo Khamis, Hung Ngo, XuanLong Nguyen, Dan Olteanu, and Maximilian Schleich December 2017 Logic for Data Science Seminar Alan Turing Institute

More information

LESSON 8.1 RATIONAL EXPRESSIONS I

LESSON 8.1 RATIONAL EXPRESSIONS I LESSON 8. RATIONAL EXPRESSIONS I LESSON 8. RATIONAL EXPRESSIONS I 7 OVERVIEW Here is what you'll learn in this lesson: Multiplying and Dividing a. Determining when a rational expression is undefined Almost

More information

Scalable Uncertainty Management

Scalable Uncertainty Management Scalable Uncertainty Management 05 Query Evaluation in Probabilistic Databases Rainer Gemulla Jun 1, 2012 Overview In this lecture Primer: relational calculus Understand complexity of query evaluation

More information

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19 Schema Refinement and Normal Forms [R&G] Chapter 19 CS432 1 The Evils of Redundancy Redundancy is at the root of several problems associated with relational schemas: redundant storage, insert/delete/update

More information

Introduction to Data Management. Lecture #12 (Relational Algebra II)

Introduction to Data Management. Lecture #12 (Relational Algebra II) Introduction to Data Management Lecture #12 (Relational Algebra II) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v HW and exams:

More information

Brief Tutorial on Probabilistic Databases

Brief Tutorial on Probabilistic Databases Brief Tutorial on Probabilistic Databases Dan Suciu University of Washington Simons 206 About This Talk Probabilistic databases Tuple-independent Query evaluation Statistical relational models Representation,

More information

Functional Dependencies and Normalization. Instructor: Mohamed Eltabakh

Functional Dependencies and Normalization. Instructor: Mohamed Eltabakh Functional Dependencies and Normalization Instructor: Mohamed Eltabakh meltabakh@cs.wpi.edu 1 Goal Given a database schema, how do you judge whether or not the design is good? How do you ensure it does

More information

review To find the coefficient of all the terms in 15ab + 60bc 17ca: Coefficient of ab = 15 Coefficient of bc = 60 Coefficient of ca = -17

review To find the coefficient of all the terms in 15ab + 60bc 17ca: Coefficient of ab = 15 Coefficient of bc = 60 Coefficient of ca = -17 1. Revision Recall basic terms of algebraic expressions like Variable, Constant, Term, Coefficient, Polynomial etc. The coefficients of the terms in 4x 2 5xy + 6y 2 are Coefficient of 4x 2 is 4 Coefficient

More information

Semantic Optimization Techniques for Preference Queries

Semantic Optimization Techniques for Preference Queries Semantic Optimization Techniques for Preference Queries Jan Chomicki Dept. of Computer Science and Engineering, University at Buffalo,Buffalo, NY 14260-2000, chomicki@cse.buffalo.edu Abstract Preference

More information

CS2742 midterm test 2 study sheet. Boolean circuits: Predicate logic:

CS2742 midterm test 2 study sheet. Boolean circuits: Predicate logic: x NOT ~x x y AND x /\ y x y OR x \/ y Figure 1: Types of gates in a digital circuit. CS2742 midterm test 2 study sheet Boolean circuits: Boolean circuits is a generalization of Boolean formulas in which

More information

Lineage implementation in PostgreSQL

Lineage implementation in PostgreSQL Lineage implementation in PostgreSQL Andrin Betschart, 09-714-882 Martin Leimer, 09-728-569 3. Oktober 2013 Contents Contents 1. Introduction 3 2. Lineage computation in TPDBs 4 2.1. Lineage......................................

More information

BCNF revisited: 40 Years Normal Forms

BCNF revisited: 40 Years Normal Forms Full set of slides BCNF revisited: 40 Years Normal Forms Faculty of Computer Science Technion - IIT, Haifa janos@cs.technion.ac.il www.cs.technion.ac.il/ janos 1 Full set of slides Acknowledgements Based

More information

Logical Provenance in Data-Oriented Workflows (Long Version)

Logical Provenance in Data-Oriented Workflows (Long Version) Logical Provenance in Data-Oriented Workflows (Long Version) Robert Ikeda Stanford University rmikeda@cs.stanford.edu Akash Das Sarma IIT Kanpur akashds.iitk@gmail.com Jennifer Widom Stanford University

More information

An Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems

An Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems An Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems Arnab Bhattacharyya, Palash Dey, and David P. Woodruff Indian Institute of Science, Bangalore {arnabb,palash}@csa.iisc.ernet.in

More information