Database design and implementation CMPSCI 645


1 Database design and implementation CMPSCI 645 Lecture 20: Probabilistic Databases (based on a tutorial by Dan Suciu)

2 Have we seen uncertainty in DB yet? } NULL values Age Height Weight 20 NULL 200 NULL Denotes uncertainty / lack of information

3 Databases today are deterministic } An item either is in the database or is not } A tuple either is in the query answer or is not } This applies to all varieties of data models: } Relational, E/R, NF2, hierarchical, XML, ...

4 What is a probabilistic database? } "An item belongs to the database" is a probabilistic event } "A tuple is an answer to the query" is a probabilistic event } Can be extended to all data models; we discuss only probabilistic relational data

5 Example: Information extraction [Gupta&Sarawagi 2006] ...52 A Goregaon West Mumbai... ID House-No Street City P 1 52 Goregaon West Mumbai A Goregaon West Mumbai Goregaon West Mumbai A Goregaon West Mumbai Here probabilities are meaningful: 20% of such extractions are correct

6 Example: RFID ecosystem at UW [Welbourne 2007]

7 Two types of probabilistic data } Database is deterministic; query answers are probabilistic } Database is probabilistic; query answers are probabilistic

8 Motivating applications } Text extraction / record linking } Inconsistent data } Ranking query answers

9 Ranking query answers Database is deterministic. The query returns a ranked list of tuples } User interested in top-k answers

10 [Agrawal,Chaudhuri,Das,Gionis 2003] The empty answers problem Query is over-specified: no answers Example: try to buy a house in Amherst
    SELECT * FROM Houses
    WHERE bedrooms = 4 AND style = 'craftsman' AND district = 'downtown' AND price <

11 [Agrawal,Chaudhuri,Das,Gionis 2003] Ranking: Compute a similarity score between a tuple and the query
    Q = SELECT * FROM R WHERE A_1 = v_1 AND ... AND A_m = v_m
Query is a vector: $Q = (v_1, \ldots, v_m)$ Tuple is a vector: $T = (u_1, \ldots, u_m)$
Rank tuples by their TF/IDF similarity to the query Q. Includes partial matches
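The ranking idea can be illustrated with a minimal sketch. It assumes a simplified IDF-style score in which a tuple earns log(n / frequency) for every attribute value it shares with the query, so matching a rare value counts for more; the exact weighting in the cited paper may differ. The function name `idf_rank` and the `HOUSES` rows are hypothetical.

```python
import math
from collections import Counter

def idf_rank(rows, query):
    """Rank rows (dicts) by the summed IDF of the values they share with the query.

    Simplified, assumed scoring: a match on attribute a with value v is worth
    log(n / number of rows having a = v).
    """
    n = len(rows)
    # How often each queried (attribute, value) pair occurs in the table.
    freq = Counter((a, r.get(a)) for r in rows for a in query)

    def score(row):
        return sum(
            math.log(n / freq[(a, v)])
            for a, v in query.items()
            if row.get(a) == v
        )

    # Highest score first; partial matches still receive a positive score.
    return sorted(rows, key=score, reverse=True)

# Hypothetical data: no row needs to match every predicate to be ranked.
HOUSES = [
    {"id": 1, "bedrooms": 4, "style": "craftsman", "district": "downtown"},
    {"id": 2, "bedrooms": 4, "style": "ranch", "district": "suburb"},
    {"id": 3, "bedrooms": 3, "style": "ranch", "district": "downtown"},
    {"id": 4, "bedrooms": 4, "style": "ranch", "district": "suburb"},
]
QUERY = {"bedrooms": 4, "style": "craftsman", "district": "downtown"}

if __name__ == "__main__":
    for house in idf_rank(HOUSES, QUERY):
        print(house["id"])
```

Under this scoring an over-specified query no longer returns an empty answer: every tuple is ranked, and tuples that match rare values come first.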

12 [Motro:1988,Dalvi&S:2004] Similarity predicates in SQL Beyond a single table: "Find the good deals in a neighborhood"
    SELECT * FROM Houses x
    WHERE x.bedrooms ~ 4 AND x.style ~ 'craftsman' AND x.price ~ 600k
      AND NOT EXISTS (SELECT * FROM Houses y
                      WHERE x.district = y.district AND x.id != y.id
                        AND y.bedrooms ~ 4 AND y.style ~ 'craftsman' AND y.price ~ 600k)
Users specify similarity predicates with ~ System combines atomic similarities using probabilities

13 [Hristidis&Papakonstantinou 2002,Bhalotia et al.2002] Keyword search in databases Goal: } Users want to search via keywords } Do not know the schema Techniques: } Matching objects may be scattered across physical tables due to normalization; need on-the-fly joins } Score of a tuple = number of joins, plus prestige based on in-degree

14 [Hristidis,Papakonstantinou 2002] Q = Abiteboul and Widom Join sequences (tuple trees): [Figure: tuple trees connecting Person "Abiteboul" and Person "Widom" through Author, Paper, Conference and Editor tuples]

15 Record linkage Determine if two data records describe the same object Scenarios: } Join/merge two relations } Remove duplicates from a single relation } Validate incoming tuples against a reference

16 [Bertossi&Chomicki:2003] Inconsistent data Goal: consistent query answers from inconsistent databases Applications: } Integration of autonomous data sources } Un-enforced integrity constraints } Temporary inconsistencies

17 [Bertossi&Chomicki:2003] The repair semantics

Name (key ?!?)  Affiliation  State  Area
Miklau          UW           WA     Data security
Dalvi           UW           WA     Prob. Data
Balazinska      UW           WA     Data streams
Balazinska      MIT          MA     Data streams
Miklau          UMass        MA     Data security

Find people in State=WA: Dalvi
Find people in State=MA: ∅
Consider all "repairs"
High precision, but low recall

18 Alternative probabilistic semantics

Name        Affiliation  State  Area           P
Miklau      UW           WA     Data security  0.5
Dalvi       UW           WA     Prob. Data     1
Balazinska  UW           WA     Data streams   0.5
Balazinska  MIT          MA     Data streams   0.5
Miklau      UMass        MA     Data security  0.5

State=WA: Dalvi, Balazinska(0.5), Miklau(0.5)
State=MA: Balazinska(0.5), Miklau(0.5)
Lower precision, but better recall
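The contrast between the repair semantics and the probabilistic semantics can be checked with a small sketch. It assumes that a repair keeps exactly one tuple per Name and that all repairs are equally likely, which matches the 0.5 probabilities in the table. The Area column is omitted, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# (Name, Affiliation, State); Name is the key that the data violates.
ROWS = [
    ("Miklau", "UW", "WA"),
    ("Dalvi", "UW", "WA"),
    ("Balazinska", "UW", "WA"),
    ("Balazinska", "MIT", "MA"),
    ("Miklau", "UMass", "MA"),
]

def repairs(rows):
    """Yield every repair: one tuple kept for each distinct Name."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    for choice in product(*groups.values()):
        yield list(choice)

def people_in(state, repair):
    """Answer 'find people in State = state' on one consistent instance."""
    return {name for name, _, st in repair if st == state}

def certain_answers(state, rows):
    """Repair semantics: names returned in every repair."""
    return set.intersection(*(people_in(state, r) for r in repairs(rows)))

def answer_probabilities(state, rows):
    """Probabilistic semantics, assuming a uniform distribution over repairs."""
    all_repairs = list(repairs(rows))
    counts = {}
    for repair in all_repairs:
        for name in people_in(state, repair):
            counts[name] = counts.get(name, 0) + 1
    return {name: Fraction(c, len(all_repairs)) for name, c in counts.items()}

if __name__ == "__main__":
    print(certain_answers("WA", ROWS))
    print(answer_probabilities("WA", ROWS))
```

The certain answers keep only Dalvi for WA and nothing for MA, while the probabilistic semantics also returns Balazinska and Miklau with probability 1/2 each.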

19 [Barbara et al.1992] What is a Probabilistic Database (PDB)?

HasObject^p
Keys: Object, Time. Non-keys: Person. Probability: P.

Object    Time  Person  P
Laptop77  9:07  John    0.62
                Jim     0.34
Book302   9:18  Mary    0.45
                John    0.33
                Fred    0.11

What does it mean?

20 Possible world semantics

PDB:
Object    Time  Person  P
Laptop77  9:07  John    p1
                Jim     p2
Book302   9:18  Mary    p3
                John    p4
                Fred    p5

Possible worlds Ω: [Figure: the set of possible worlds, each containing at most one Person per (Object, Time) pair. For example, the world {(Laptop77, 9:07, John), (Book302, 9:18, Mary)} has probability $p_1 p_3$, the world {(Laptop77, 9:07, John), (Book302, 9:18, John)} has probability $p_1 p_4$, and the world {(Laptop77, 9:07, John)} has probability $p_1 (1 - p_3 - p_4 - p_5)$]
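A minimal sketch of how the possible worlds can be enumerated, using the numeric probabilities from the HasObject table on slide 19. It assumes that alternatives sharing an (Object, Time) key are mutually exclusive, that different keys are independent, and that the leftover mass of a key means no tuple for it is present. The name `possible_worlds` is hypothetical.

```python
from itertools import product

# Alternatives grouped by key (Object, Time); values map Person to probability.
HAS_OBJECT = {
    ("Laptop77", "9:07"): {"John": 0.62, "Jim": 0.34},
    ("Book302", "9:18"): {"Mary": 0.45, "John": 0.33, "Fred": 0.11},
}

def possible_worlds(table):
    """Return a list of (world, probability) pairs.

    A world is a frozenset of ((Object, Time), Person) tuples with at most
    one Person per key.
    """
    per_key = []
    for key, alternatives in table.items():
        options = [((key, person), p) for person, p in alternatives.items()]
        # Remaining mass: no tuple with this key appears in the world.
        options.append((None, 1.0 - sum(alternatives.values())))
        per_key.append(options)

    worlds = []
    for combo in product(*per_key):
        world = frozenset(t for t, _ in combo if t is not None)
        prob = 1.0
        for _, p in combo:
            prob *= p  # keys are independent, so probabilities multiply
        worlds.append((world, prob))
    return worlds

if __name__ == "__main__":
    for world, prob in possible_worlds(HAS_OBJECT):
        print(sorted(world), round(prob, 4))
```

With two alternatives for the laptop and three for the book, plus the "absent" option for each, the sketch produces 3 × 4 = 12 worlds whose probabilities sum to 1.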

21 HasObject(Object, Time, Person, P)

Object    Time  Person  P
Laptop77  9:07  John    p1
                Jim     p2
Book302   9:18  Mary    p3
                John    p4
                Fred    p5

Tuples with the same (Object, Time) are disjoint; tuples with different (Object, Time) are independent.

Meets(Person1, Person2, Time, P)

Person1  Person2  Time  P
John     Jim      9:12  p1
Mary     Sue      9:20  p2
John     Mary     9:20  p3

All tuples are independent.

22 Tuple correlation

Disjoint:              $\Pr(t_1 \wedge t_2) = 0$
Negatively correlated: $\Pr(t_1 \wedge t_2) < \Pr(t_1)\Pr(t_2)$
Independent:           $\Pr(t_1 \wedge t_2) = \Pr(t_1)\Pr(t_2)$
Positively correlated: $\Pr(t_1 \wedge t_2) > \Pr(t_1)\Pr(t_2)$
Identical:             $\Pr(t_1 \wedge t_2) = \Pr(t_1) = \Pr(t_2)$

23 Query semantics Given a query Q and a probabilistic database $I^p$, what is the meaning of $Q(I^p)$?

24 Query semantics

Semantics 1: Possible Answers. A probability distribution on sets of tuples:
$\Pr(Q = A) = \sum_{I \in \mathrm{INST},\ Q(I) = A} \Pr(I)$

Semantics 2: Possible Tuples. A probability function on tuples:
$\Pr(t \in Q) = \sum_{I \in \mathrm{INST},\ t \in Q(I)} \Pr(I)$

25 Purchase^p: four possible worlds

I_1, Pr(I_1) = 1/3
Name  City     Product
John  Seattle  Gizmo
John  Seattle  Camera
Sue   Denver   Gizmo
Sue   Denver   Camera

I_2, Pr(I_2) = 1/12
Name  City     Product
John  Boston   Gizmo
Sue   Denver   Gizmo
Sue   Seattle  Gadget

I_3, Pr(I_3) = 1/2
Name  City     Product
John  Seattle  Gizmo
John  Seattle  Camera
Sue   Seattle  Camera

I_4, Pr(I_4) = 1/12
Name  City     Product
John  Boston   Camera
Sue   Seattle  Camera

    SELECT DISTINCT x.product
    FROM Purchase^p x, Purchase^p y
    WHERE x.name = 'John' AND x.product = y.product AND y.name = 'Sue'

Possible answers semantics:
Answer set     Probability
Gizmo, Camera  1/3    = Pr(I_1)
Gizmo          1/12   = Pr(I_2)
Camera         7/12   = Pr(I_3) + Pr(I_4)

Possible tuples semantics:
Tuple   Probability
Camera  11/12  = Pr(I_1) + Pr(I_3) + Pr(I_4)
Gizmo   5/12   = Pr(I_1) + Pr(I_2)
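Both semantics can be reproduced by evaluating the query in each world and summing world probabilities. The sketch below hard-codes the four worlds of the example; the Python function names are hypothetical, and the query is written as a set intersection instead of SQL.

```python
from collections import defaultdict
from fractions import Fraction

# (probability, world); each world is a list of (Name, City, Product) tuples.
WORLDS = [
    (Fraction(1, 3), [("John", "Seattle", "Gizmo"), ("John", "Seattle", "Camera"),
                      ("Sue", "Denver", "Gizmo"), ("Sue", "Denver", "Camera")]),
    (Fraction(1, 12), [("John", "Boston", "Gizmo"), ("Sue", "Denver", "Gizmo"),
                       ("Sue", "Seattle", "Gadget")]),
    (Fraction(1, 2), [("John", "Seattle", "Gizmo"), ("John", "Seattle", "Camera"),
                      ("Sue", "Seattle", "Camera")]),
    (Fraction(1, 12), [("John", "Boston", "Camera"), ("Sue", "Seattle", "Camera")]),
]

def query(world):
    """Products bought by both John and Sue (the SELECT DISTINCT self-join)."""
    john = {product for name, _, product in world if name == "John"}
    sue = {product for name, _, product in world if name == "Sue"}
    return frozenset(john & sue)

def possible_answers(worlds):
    """Semantics 1: a distribution over whole answer sets."""
    dist = defaultdict(Fraction)
    for prob, world in worlds:
        dist[query(world)] += prob
    return dict(dist)

def possible_tuples(worlds):
    """Semantics 2: the marginal probability of each answer tuple."""
    marginals = defaultdict(Fraction)
    for prob, world in worlds:
        for product in query(world):
            marginals[product] += prob
    return dict(marginals)

if __name__ == "__main__":
    print(possible_answers(WORLDS))
    print(possible_tuples(WORLDS))
```

The answer-set probabilities sum to 1, whereas the tuple marginals (11/12 and 5/12) need not, because a single world can contribute to several tuples.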

26 How do we evaluate queries?

27 Probability of Boolean Expressions

$\Phi = X_1 X_2 \vee X_1 X_3 \vee X_2 X_3$, with $P(X_1) = p_1$, $P(X_2) = p_2$, $P(X_3) = p_3$. Compute $P(\Phi)$.

Ω = the set of truth assignments to $X_1, X_2, X_3$. The assignments with Φ = 1 are:

X1  X2  X3  P
0   1   1   (1-p1) p2 p3
1   0   1   p1 (1-p2) p3
1   1   0   p1 p2 (1-p3)
1   1   1   p1 p2 p3

$\Pr(\Phi) = (1-p_1) p_2 p_3 + p_1 (1-p_2) p_3 + p_1 p_2 (1-p_3) + p_1 p_2 p_3$
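The sum over satisfying assignments can be computed directly by enumeration. This is a brute-force sketch that assumes independent variables and a formula given as a list of clauses (a disjunction of conjunctions of positive variables); it takes time exponential in the number of variables. The name `prob_dnf` is hypothetical.

```python
from itertools import product

def prob_dnf(clauses, probs):
    """Exact probability of a positive DNF over independent variables.

    clauses: list of tuples of variable names (each tuple is a conjunction).
    probs:   dict mapping every variable name to its probability of being true.
    Enumerates all 2^n assignments, so it is only usable for small n.
    """
    variables = sorted(probs)
    total = 0.0
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        # The formula holds if at least one clause has all its variables true.
        if any(all(assignment[v] for v in clause) for clause in clauses):
            weight = 1.0
            for v in variables:
                weight *= probs[v] if assignment[v] else 1.0 - probs[v]
            total += weight
    return total

# Phi = X1 X2 v X1 X3 v X2 X3
PHI = [("X1", "X2"), ("X1", "X3"), ("X2", "X3")]

if __name__ == "__main__":
    print(prob_dnf(PHI, {"X1": 0.5, "X2": 0.5, "X3": 0.5}))
```

For three variables this visits eight assignments, four of which satisfy Φ, which matches the four terms of the closed-form sum.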

28 Complexity of Boolean expression probability [Valiant:1979] Theorem: For a Boolean expression E, computing Pr(E) is #P-complete NP: class of problems of the form "is there a witness?" (SAT) #P: class of problems of the form "how many witnesses?" (#SAT)

29 Query q + Database PDB → Φ

q = R(x, y), S(x, z)

PDB:

R^p
A   B   P   Variable
a1  b1  p1  X1
a2  b2  p2  X2

S^p
A   C   P   Variable
a1  c1  q1  Y1
a1  c2  q2  Y2
a2  c3  q3  Y3
a2  c4  q4  Y4
a2  c5  q5  Y5

$\Phi = X_1 Y_1 \vee X_1 Y_2 \vee X_2 Y_3 \vee X_2 Y_4 \vee X_2 Y_5$
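The translation from query and database to the Boolean formula can be sketched as a join that records, for each matching pair of tuples, the pair of Boolean variables attached to them. The representation of a clause as a Python tuple and the name `lineage` are assumptions made for illustration.

```python
def lineage(r_rows, s_rows):
    """Lineage of the Boolean query q = R(x, y), S(x, z).

    Each input row is (A, other attribute, variable name). The result is a
    list of clauses; clause (X, Y) stands for the conjunction X AND Y, and
    the whole list stands for the disjunction of its clauses.
    """
    return [
        (x_var, y_var)
        for (a_r, _b, x_var) in r_rows
        for (a_s, _c, y_var) in s_rows
        if a_r == a_s  # join condition on the shared variable x
    ]

R = [("a1", "b1", "X1"), ("a2", "b2", "X2")]
S = [
    ("a1", "c1", "Y1"),
    ("a1", "c2", "Y2"),
    ("a2", "c3", "Y3"),
    ("a2", "c4", "Y4"),
    ("a2", "c5", "Y5"),
]

if __name__ == "__main__":
    print(" v ".join(x + " " + y for x, y in lineage(R, S)))
```

The probability of the query is then the probability of this formula when each variable is true independently with the probability of its tuple.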

30 Application to query evaluation Corollary: Fix an FO query q. Exact evaluation of Pr(q) on input PDB is in #P Corollary: Fix a conjunctive query q. Approximation of Pr(q) on input PDB is in PTIME (FPTRAS) [Graedel,Gurevich,Hirsch:1998]
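The slide does not spell out an approximation algorithm. As an illustration only, the sketch below uses the Karp-Luby estimator, a standard randomized approximation scheme for the probability of a DNF formula, applied to a lineage formula with independent variables. It is not necessarily the construction used in the cited work, and the function names are hypothetical.

```python
import random

def clause_prob(clause, probs):
    """Probability that every variable of a positive clause is true."""
    p = 1.0
    for v in clause:
        p *= probs[v]
    return p

def karp_luby(clauses, probs, samples, rng):
    """Estimate Pr(C_1 v ... v C_k) for positive clauses over independent variables.

    Each clause is a tuple of distinct variable names.
    """
    weights = [clause_prob(c, probs) for c in clauses]
    total = sum(weights)
    hits = 0
    for _ in range(samples):
        # Pick a clause with probability proportional to Pr(C_i).
        i = rng.choices(range(len(clauses)), weights=weights)[0]
        # Sample a world conditioned on clause i being true.
        world = {v: rng.random() < p for v, p in probs.items()}
        for v in clauses[i]:
            world[v] = True
        # Count the sample only if i is the first clause the world satisfies,
        # so that each satisfying world is counted exactly once.
        first = next(j for j, c in enumerate(clauses) if all(world[v] for v in c))
        if first == i:
            hits += 1
    return total * hits / samples

PHI = [("X1", "X2"), ("X1", "X3"), ("X2", "X3")]

if __name__ == "__main__":
    rng = random.Random(0)
    print(karp_luby(PHI, {"X1": 0.5, "X2": 0.5, "X3": 0.5}, 20000, rng))
```

Unlike naive sampling of worlds, the estimate keeps a bounded relative error even when the formula's probability is very small, which is what makes a polynomial number of samples sufficient.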

31 Query complexity Data complexity of a query Q: } Compute $Q(I^p)$, for probabilistic database $I^p$ Simplest scenario only: } Possible tuples semantics for Q } Independent tuples for $I^p$

32 Challenges } Query optimization } A #P query with subqueries in PTIME } Combine safe plans and inference } Theoretical analysis of self-joins } Complex probabilistic models } Capture complex correlations } Hard and soft constraints on query evaluation

Topics in Probabilistic and Statistical Databases. Lecture 2: Representation of Probabilistic Databases. Dan Suciu University of Washington

Topics in Probabilistic and Statistical Databases. Lecture 2: Representation of Probabilistic Databases. Dan Suciu University of Washington Topics in Probabilistic and Statistical Databases Lecture 2: Representation of Probabilistic Databases Dan Suciu University of Washington 1 Review: Definition The set of all possible database instances:

More information

Query Evaluation on Probabilistic Databases. CSE 544: Wednesday, May 24, 2006

Query Evaluation on Probabilistic Databases. CSE 544: Wednesday, May 24, 2006 Query Evaluation on Probabilistic Databases CSE 544: Wednesday, May 24, 2006 Problem Setting Queries: Tables: Review A(x,y) :- Review(x,y), Movie(x,z), z > 1991 name rating p Movie Monkey Love good.5 title

More information

Queries and Materialized Views on Probabilistic Databases

Queries and Materialized Views on Probabilistic Databases Queries and Materialized Views on Probabilistic Databases Nilesh Dalvi Christopher Ré Dan Suciu September 11, 2008 Abstract We review in this paper some recent yet fundamental results on evaluating queries

More information

Probabilistic Databases

Probabilistic Databases Probabilistic Databases Dan Olteanu (Oxford) London, November 2017 The National Archives Probabilistic Databases For the purpose of this introductory talk: Probabilistic data = Relational data + Probabilities

More information

Brief Tutorial on Probabilistic Databases

Brief Tutorial on Probabilistic Databases Brief Tutorial on Probabilistic Databases Dan Suciu University of Washington Simons 206 About This Talk Probabilistic databases Tuple-independent Query evaluation Statistical relational models Representation,

More information

Semantics of Ranking Queries for Probabilistic Data and Expected Ranks

Semantics of Ranking Queries for Probabilistic Data and Expected Ranks Semantics of Ranking Queries for Probabilistic Data and Expected Ranks Graham Cormode AT&T Labs Feifei Li FSU Ke Yi HKUST 1-1 Uncertain, uncertain, uncertain... (Probabilistic, probabilistic, probabilistic...)

More information

CMPT 354: Database System I. Lecture 9. Design Theory

CMPT 354: Database System I. Lecture 9. Design Theory CMPT 354: Database System I Lecture 9. Design Theory 1 Design Theory Design theory is about how to represent your data to avoid anomalies. Design 1 Design 2 Student Course Room Mike 354 AQ3149 Mary 354

More information

10/12/10. Outline. Schema Refinements = Normal Forms. First Normal Form (1NF) Data Anomalies. Relational Schema Design

10/12/10. Outline. Schema Refinements = Normal Forms. First Normal Form (1NF) Data Anomalies. Relational Schema Design Outline Introduction to Database Systems CSE 444 Design theory: 3.1-3.4 [Old edition: 3.4-3.6] Lectures 6-7: Database Design 1 2 Schema Refinements = Normal Forms 1st Normal Form = all tables are flat

More information

Database Design and Implementation

Database Design and Implementation Database Design and Implementation CS 645 Schema Refinement First Normal Form (1NF) A schema is in 1NF if all tables are flat Student Name GPA Course Student Name GPA Alice 3.8 Bob 3.7 Carol 3.9 Alice

More information

Introduction to Database Systems CSE 414. Lecture 20: Design Theory

Introduction to Database Systems CSE 414. Lecture 20: Design Theory Introduction to Database Systems CSE 414 Lecture 20: Design Theory CSE 414 - Spring 2018 1 Class Overview Unit 1: Intro Unit 2: Relational Data Models and Query Languages Unit 3: Non-relational data Unit

More information

Practice and Applications of Data Management CMPSCI 345. Lecture 15: Functional Dependencies

Practice and Applications of Data Management CMPSCI 345. Lecture 15: Functional Dependencies Practice and Applications of Data Management CMPSCI 345 Lecture 15: Functional Dependencies First Normal Form (1NF) } A database schema is in First Normal Form if all tables are flat Student Student Name

More information

Probabilistic Databases

Probabilistic Databases Probabilistic Databases Amol Deshpande University of Maryland Goal Introduction to probabilistic databases Focus on an overview of: Different possible representations Challenges in using them Probabilistic

More information

Introduction to Data Management CSE 344

Introduction to Data Management CSE 344 Introduction to Data Management CSE 344 Lecture 18: Design Theory Wrap-up 1 Announcements WQ6 is due on Tuesday Homework 6 is due on Thursday Be careful about your remaining late days. Today: Midterm review

More information

Databases Lecture 8. Timothy G. Griffin. Computer Laboratory University of Cambridge, UK. Databases, Lent 2009

Databases Lecture 8. Timothy G. Griffin. Computer Laboratory University of Cambridge, UK. Databases, Lent 2009 Databases Lecture 8 Timothy G. Griffin Computer Laboratory University of Cambridge, UK Databases, Lent 2009 T. Griffin (cl.cam.ac.uk) Databases Lecture 8 DB 2009 1 / 15 Lecture 08: Multivalued Dependencies

More information

A Dichotomy. in in Probabilistic Databases. Joint work with Robert Fink. for Non-Repeating Queries with Negation Queries with Negation

A Dichotomy. in in Probabilistic Databases. Joint work with Robert Fink. for Non-Repeating Queries with Negation Queries with Negation Dichotomy for Non-Repeating Queries with Negation Queries with Negation in in Probabilistic Databases Robert Dan Olteanu Fink and Dan Olteanu Joint work with Robert Fink Uncertainty in Computation Simons

More information

A Knowledge-Based Approach to Entity Resolution

A Knowledge-Based Approach to Entity Resolution A Knowledge-Based Approach to Entity Resolution Klaus-Dieter Schewe Software Competence Center Hagenberg and Johannes-Kepler-University Linz Austria kd.schewe@scch.at, kd.schewe@faw.at Qing Wang Research

More information

CSE 344 AUGUST 6 TH LOSS AND VIEWS

CSE 344 AUGUST 6 TH LOSS AND VIEWS CSE 344 AUGUST 6 TH LOSS AND VIEWS ADMINISTRIVIA WQ6 due tonight HW7 due Wednesday DATABASE DESIGN PROCESS Conceptual Model: name product makes company price name address Relational Model: Tables + constraints

More information

Introduction to Data Management CSE 344

Introduction to Data Management CSE 344 Introduction to Data Management CSE 344 Lectures 18: BCNF 1 What makes good schemas? 2 Review: Relation Decomposition Break the relation into two: Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234

More information

Answering Queries from Statistics and Probabilistic Views. Nilesh Dalvi and Dan Suciu, University of Washington.

Answering Queries from Statistics and Probabilistic Views. Nilesh Dalvi and Dan Suciu, University of Washington. Answering Queries from Statistics and Probabilistic Views Nilesh Dalvi and Dan Suciu, University of Washington. Background Query answering using Views problem: find answers to a query q over a database

More information

CSE 544 Principles of Database Management Systems

CSE 544 Principles of Database Management Systems CSE 544 Principles of Database Management Systems Lecture 3 Schema Normalization CSE 544 - Winter 2018 1 Announcements Project groups due on Friday First review due on Tuesday (makeup lecture) Run git

More information

Design Theory. Design Theory I. 1. Normal forms & functional dependencies. Today s Lecture. 1. Normal forms & functional dependencies

Design Theory. Design Theory I. 1. Normal forms & functional dependencies. Today s Lecture. 1. Normal forms & functional dependencies Design Theory BBM471 Database Management Systems Dr. Fuat Akal akal@hacettepe.edu.tr Design Theory I 2 Today s Lecture 1. Normal forms & functional dependencies 2. Finding functional dependencies 3. Closures,

More information

The Veracity of Big Data

The Veracity of Big Data The Veracity of Big Data Pierre Senellart École normale supérieure 13 October 2016, Big Data & Market Insights 23 April 2013, Dow Jones (cnn.com) 23 April 2013, Dow Jones (cnn.com) 23 April 2013, Dow Jones

More information

The Trichotomy of HAVING Queries on a Probabilistic Database

The Trichotomy of HAVING Queries on a Probabilistic Database VLDBJ manuscript No. (will be inserted by the editor) The Trichotomy of HAVING Queries on a Probabilistic Database Christopher Ré Dan Suciu Received: date / Accepted: date Abstract We study the evaluation

More information

CSC 261/461 Database Systems Lecture 13. Spring 2018

CSC 261/461 Database Systems Lecture 13. Spring 2018 CSC 261/461 Database Systems Lecture 13 Spring 2018 BCNF Decomposition Algorithm BCNFDecomp(R): Find X s.t.: X + X and X + [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose

More information

11/6/11. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

11/6/11. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design) Relational Schema Design Introduction to Management CSE 344 Lectures 16: Database Design Conceptual Model: Relational Model: plus FD s name Product buys Person price name ssn Normalization: Eliminates

More information

11/1/12. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

11/1/12. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design) Relational Schema Design Introduction to Management CSE 344 Lectures 16: Database Design Conceptual Model: Relational Model: plus FD s name Product buys Person price name ssn Normalization: Eliminates

More information

Lectures 6. Lecture 6: Design Theory

Lectures 6. Lecture 6: Design Theory Lectures 6 Lecture 6: Design Theory Lecture 6 Announcements Solutions to PS1 are posted online. Grades coming soon! Project part 1 is out. Check your groups and let us know if you have any issues. We have

More information

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #3: SQL---Part 1

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #3: SQL---Part 1 CS 4604: Introduc0on to Database Management Systems B. Aditya Prakash Lecture #3: SQL---Part 1 Announcements---Project Goal: design a database system applica=on with a web front-end Project Assignment

More information

Introduction to Management CSE 344

Introduction to Management CSE 344 Introduction to Management CSE 344 Lectures 17: Design Theory 1 Announcements No class/office hour on Monday Midterm on Wednesday (Feb 19) in class HW5 due next Thursday (Feb 20) No WQ next week (WQ6 due

More information

Schema Refinement. Feb 4, 2010

Schema Refinement. Feb 4, 2010 Schema Refinement Feb 4, 2010 1 Relational Schema Design Conceptual Design name Product buys Person price name ssn ER Model Logical design Relational Schema plus Integrity Constraints Schema Refinement

More information

CS 347 Parallel and Distributed Data Processing

CS 347 Parallel and Distributed Data Processing CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 2: Distributed Database Design Logistics Gradiance No action items for now Detailed instructions coming shortly First quiz to be released

More information

Outline. Part 1: Motivation Part 2: Probabilistic Databases Part 3: Weighted Model Counting Part 4: Lifted Inference for WFOMC

Outline. Part 1: Motivation Part 2: Probabilistic Databases Part 3: Weighted Model Counting Part 4: Lifted Inference for WFOMC Outline Part 1: Motivation Part 2: Probabilistic Databases Part 3: Weighted Model Counting Part 4: Lifted Inference for WFOMC Part 5: Completeness of Lifted Inference Part 6: Query Compilation Part 7:

More information

CSE 303: Database. Outline. Lecture 10. First Normal Form (1NF) First Normal Form (1NF) 10/1/2016. Chapter 3: Design Theory of Relational Database

CSE 303: Database. Outline. Lecture 10. First Normal Form (1NF) First Normal Form (1NF) 10/1/2016. Chapter 3: Design Theory of Relational Database CSE 303: Database Lecture 10 Chapter 3: Design Theory of Relational Database Outline 1st Normal Form = all tables attributes are atomic 2nd Normal Form = obsolete Boyce Codd Normal Form = will study 3rd

More information

Extending Conditional Dependencies with Built-in Predicates

Extending Conditional Dependencies with Built-in Predicates IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. X, NO. X, DECEMBER 214 1 Extending Conditional Dependencies with Built-in Predicates Shuai Ma, Liang Duan, Wenfei Fan, Chunming Hu, and Wenguang

More information

Data Dependencies in the Presence of Difference

Data Dependencies in the Presence of Difference Data Dependencies in the Presence of Difference Tsinghua University sxsong@tsinghua.edu.cn Outline Introduction Application Foundation Discovery Conclusion and Future Work Data Dependencies in the Presence

More information

Event Queries on Correlated Probabilistic Streams

Event Queries on Correlated Probabilistic Streams Event Queries on Correlated Probabilistic Streams Christopher Ré, Julie Letchner, Magdalena Balazinska and Dan Suciu Dept. of Computer Science and Engineering,University of Washington Seattle, WA, USA

More information

Representing and Querying Correlated Tuples in Probabilistic Databases

Representing and Querying Correlated Tuples in Probabilistic Databases Representing and Querying Correlated Tuples in Probabilistic Databases Prithviraj Sen Amol Deshpande Department of Computer Science University of Maryland, College Park. International Conference on Data

More information

Instructor: Sudeepa Roy

Instructor: Sudeepa Roy CompSci 590.6 Understanding Data: Theory and Applications Lecture 13 Incomplete Databases Instructor: Sudeepa Roy Email: sudeepa@cs.duke.edu 1 Today s Reading Alice Book : Foundations of Databases Abiteboul-

More information

Representing and Querying Correlated Tuples in Probabilistic Databases

Representing and Querying Correlated Tuples in Probabilistic Databases Representing and Querying Correlated Tuples in Probabilistic Databases Prithviraj Sen Dept. of Computer Science, University of Maryland, College Park, MD 20742. sen@cs.umd.edu Amol Deshpande Dept. of Computer

More information

Semantic Optimization Techniques for Preference Queries

Semantic Optimization Techniques for Preference Queries Semantic Optimization Techniques for Preference Queries Jan Chomicki Dept. of Computer Science and Engineering, University at Buffalo,Buffalo, NY 14260-2000, chomicki@cse.buffalo.edu Abstract Preference

More information

SCHEMA NORMALIZATION. CS 564- Fall 2015

SCHEMA NORMALIZATION. CS 564- Fall 2015 SCHEMA NORMALIZATION CS 564- Fall 2015 HOW TO BUILD A DB APPLICATION Pick an application Figure out what to model (ER model) Output: ER diagram Transform the ER diagram to a relational schema Refine the

More information

Databases 2011 The Relational Algebra

Databases 2011 The Relational Algebra Databases 2011 Christian S. Jensen Computer Science, Aarhus University What is an Algebra? An algebra consists of values operators rules Closure: operations yield values Examples integers with +,, sets

More information

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018 CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018 Announcement Read Chapter 14 and 15 You must self-study these chapters Too huge to cover in Lectures Project 2 Part 1 due tonight Agenda 1.

More information

Query answering using views

Query answering using views Query answering using views General setting: database relations R 1,...,R n. Several views V 1,...,V k are defined as results of queries over the R i s. We have a query Q over R 1,...,R n. Question: Can

More information

Practice and Applications of Data Management CMPSCI 345. Lecture 16: Schema Design and Normalization

Practice and Applications of Data Management CMPSCI 345. Lecture 16: Schema Design and Normalization Practice and Applications of Data Management CMPSCI 345 Lecture 16: Schema Design and Normalization Keys } A superkey is a set of a/ributes A 1,..., A n s.t. for any other a/ribute B, we have A 1,...,

More information

GAV-sound with conjunctive queries

GAV-sound with conjunctive queries GAV-sound with conjunctive queries Source and global schema as before: source R 1 (A, B),R 2 (B,C) Global schema: T 1 (A, C), T 2 (B,C) GAV mappings become sound: T 1 {x, y, z R 1 (x,y) R 2 (y,z)} T 2

More information

A Toolbox of Query Evaluation Techniques for Probabilistic Databases

A Toolbox of Query Evaluation Techniques for Probabilistic Databases 2nd Workshop on Management and mining Of UNcertain Data (MOUND) Long Beach, March 1st, 2010 A Toolbox of Query Evaluation Techniques for Probabilistic Databases Dan Olteanu, Oxford University Computing

More information

Homework Assignment 2. Due Date: October 17th, CS425 - Database Organization Results

Homework Assignment 2. Due Date: October 17th, CS425 - Database Organization Results Name CWID Homework Assignment 2 Due Date: October 17th, 2017 CS425 - Database Organization Results Please leave this empty! 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 2.10 2.11 2.12 2.15 2.16 2.17 2.18 2.19 Sum

More information

Query Processing. 3 steps: Parsing & Translation Optimization Evaluation

Query Processing. 3 steps: Parsing & Translation Optimization Evaluation rela%onal algebra Query Processing 3 steps: Parsing & Translation Optimization Evaluation 30 Simple set of algebraic operations on relations Journey of a query SQL select from where Rela%onal algebra π

More information

Topics in Probabilistic and Statistical Databases. Lecture 9: Histograms and Sampling. Dan Suciu University of Washington

Topics in Probabilistic and Statistical Databases. Lecture 9: Histograms and Sampling. Dan Suciu University of Washington Topics in Probabilistic and Statistical Databases Lecture 9: Histograms and Sampling Dan Suciu University of Washington 1 References Fast Algorithms For Hierarchical Range Histogram Construction, Guha,

More information

Multi-join Query Evaluation on Big Data Lecture 2

Multi-join Query Evaluation on Big Data Lecture 2 Multi-join Query Evaluation on Big Data Lecture 2 Dan Suciu March, 2015 Dan Suciu Multi-Joins Lecture 2 March, 2015 1 / 34 Multi-join Query Evaluation Outline Part 1 Optimal Sequential Algorithms. Thursday

More information

CSC 261/461 Database Systems Lecture 8. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

CSC 261/461 Database Systems Lecture 8. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101 CSC 261/461 Database Systems Lecture 8 Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101 Agenda 1. Database Design 2. Normal forms & functional dependencies 3. Finding functional dependencies

More information

Meelis Kull Autumn Meelis Kull - Autumn MTAT Data Mining - Lecture 05

Meelis Kull Autumn Meelis Kull - Autumn MTAT Data Mining - Lecture 05 Meelis Kull meelis.kull@ut.ee Autumn 2017 1 Sample vs population Example task with red and black cards Statistical terminology Permutation test and hypergeometric test Histogram on a sample vs population

More information

CSE 344 MAY 16 TH NORMALIZATION

CSE 344 MAY 16 TH NORMALIZATION CSE 344 MAY 16 TH NORMALIZATION ADMINISTRIVIA HW6 Due Tonight Prioritize local runs OQ6 Out Today HW7 Out Today E/R + Normalization Exams In my office; Regrades through me DATABASE DESIGN PROCESS Conceptual

More information

arxiv: v1 [cs.ai] 21 Sep 2014

arxiv: v1 [cs.ai] 21 Sep 2014 Oblivious Bounds on the Probability of Boolean Functions WOLFGANG GATTERBAUER, Carnegie Mellon University DAN SUCIU, University of Washington arxiv:409.6052v [cs.ai] 2 Sep 204 This paper develops upper

More information

Data-Driven Logical Reasoning

Data-Driven Logical Reasoning Data-Driven Logical Reasoning Claudia d Amato Volha Bryl, Luciano Serafini November 11, 2012 8 th International Workshop on Uncertainty Reasoning for the Semantic Web 11 th ISWC, Boston (MA), USA. Heterogeneous

More information

Databases. Exercises on Relational Algebra

Databases. Exercises on Relational Algebra Databases Exercises on Relational Algebra The Lab Sessions Giacomo Bergami (giacomo.bergami2@unibo.it) bergami.co.nr 2016/10/07 Keys and Superkeys Relational Algebra (I) Negation Minimum 2016/10/14 Relational

More information

Efficient Query Evaluation on Probabilistic Databases

Efficient Query Evaluation on Probabilistic Databases Efficient Query Evaluation on Probabilistic Databases Nilesh Dalvi and Dan Suciu April 4, 2004 Abstract We describe a system that supports arbitrarily complex SQL queries with uncertain predicates. The

More information

GIS Lecture 4: Data. GIS Tutorial, Third Edition GIS 1

GIS Lecture 4: Data. GIS Tutorial, Third Edition GIS 1 GIS Lecture 4: Data GIS 1 Outline Data Types, Tables, and Formats Geodatabase Tabular Joins Spatial Joins Field Calculator ArcCatalog Functions GIS 2 Data Types, Tables, Formats GIS 3 Directly Loadable

More information

Schema Refinement and Normal Forms

Schema Refinement and Normal Forms Schema Refinement and Normal Forms UMass Amherst Feb 14, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke, Dan Suciu 1 Relational Schema Design Conceptual Design name Product buys Person price name

More information

Database Design and Implementation

Database Design and Implementation Database Design and Implementation CS 645 Data provenance Provenance provenance, n. The fact of coming from some particular source or quarter; origin, derivation [Oxford English Dictionary] Data provenance

More information
