Database design and implementation CMPSCI 645
- Trevor Fisher
1 Database Design and Implementation, CMPSCI 645. Lecture 20: Probabilistic Databases (based on a tutorial by Dan Suciu)
2 Have we seen uncertainty in DB yet?
- NULL values:
    Age   Height  Weight
    20    NULL    200
    NULL
- NULL denotes uncertainty / lack of information
3 Databases today are deterministic
- An item either is in the database or is not
- A tuple either is in the query answer or is not
- This applies to all varieties of data models: relational, E/R, NF2, hierarchical, XML, ...
4 What is a probabilistic database?
- "An item belongs to the database" is a probabilistic event
- "A tuple is an answer to the query" is a probabilistic event
- Can be extended to all data models; we discuss only probabilistic relational data
5 Example: Information extraction [Gupta & Sarawagi 2006]
Input text: ...52 A Goregaon West Mumbai...
Extracted table, columns ID, House-No, Street, City, P:
    1 52 Goregaon West Mumbai
    A Goregaon West Mumbai
    Goregaon West Mumbai
    A Goregaon West Mumbai
Here probabilities are meaningful: 20% of such extractions are correct
6 Example: RFID ecosystem at UW [Welbourne 2007]
7 Two types of probabilistic data
- Database is deterministic; query answers are probabilistic
- Database is probabilistic; query answers are probabilistic
8 Motivating applications
- Text extraction / record linking
- Inconsistent data
- Ranking query answers
9 Ranking query answers
- The database is deterministic
- The query returns a ranked list of tuples
- The user is interested in the top-k answers
10 [Agrawal, Chaudhuri, Das, Gionis 2003] The empty answers problem
Query is over-specified: no answers
Example: try to buy a house in Amherst
SELECT * FROM Houses
WHERE bedrooms = 4
  AND style = 'craftsman'
  AND district = 'downtown'
  AND price <
11 [Agrawal, Chaudhuri, Das, Gionis 2003] Ranking: compute a similarity score between a tuple and the query
Q = SELECT * FROM R WHERE A_1 = v_1 AND ... AND A_m = v_m
Query is a vector: Q = (v_1, ..., v_m)
Tuple is a vector: T = (u_1, ..., u_m)
- Rank tuples by their TF/IDF similarity to the query Q
- Includes partial matches
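The ranking idea on this slide can be sketched in a few lines of Python. This is a simplified IDF-style score, not necessarily the exact formula of the cited paper: each query attribute that a tuple matches contributes log(n / frequency of that value), so matches on rare values count for more and partial matches still receive a score. The `HOUSES` rows, the function name `rank_by_idf`, and the attribute values are hypothetical illustrations.

```python
import math
from collections import Counter

# Hypothetical Houses table; no row matches every predicate of the query.
HOUSES = [
    {"id": 1, "bedrooms": 4, "style": "craftsman", "district": "suburb"},
    {"id": 2, "bedrooms": 4, "style": "ranch", "district": "suburb"},
    {"id": 3, "bedrooms": 3, "style": "ranch", "district": "downtown"},
    {"id": 4, "bedrooms": 4, "style": "ranch", "district": "suburb"},
]
QUERY = {"bedrooms": 4, "style": "craftsman", "district": "downtown"}


def rank_by_idf(rows, query):
    """Return (score, row) pairs sorted by descending IDF-style similarity."""
    n = len(rows)
    # Per-attribute value frequencies over the whole table.
    freq = {attr: Counter(row[attr] for row in rows) for attr in query}
    scored = []
    for row in rows:
        score = 0.0
        for attr, value in query.items():
            if row[attr] == value:
                # A matching value that is rare in the table counts for more.
                score += math.log(n / freq[attr][value])
        scored.append((score, row))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


ranked = rank_by_idf(HOUSES, QUERY)
ranked_ids = [row["id"] for _, row in ranked]
```

With this data the exact-match query would return nothing, while the ranked list puts house 1 (two matches, one on a rare style) ahead of house 3 (one match on a rare district).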
12 [Motro:1988, Dalvi&S:2004] Similarity predicates in SQL
Beyond a single table: "Find the good deals in a neighborhood"
SELECT * FROM Houses x
WHERE x.bedrooms ~ 4 AND x.style ~ 'craftsman' AND x.price ~ 600k
  AND NOT EXISTS
    (SELECT * FROM Houses y
     WHERE x.district = y.district AND x.id != y.id
       AND y.bedrooms ~ 4 AND y.style ~ 'craftsman' AND y.price ~ 600k)
- Users specify similarity predicates with ~
- The system combines atomic similarities using probabilities
13 [Hristidis & Papakonstantinou 2002, Bhalotia et al. 2002] Keyword search in databases
Goal:
- Users want to search via keywords
- Users do not know the schema
Techniques:
- Matching objects may be scattered across physical tables due to normalization; need on-the-fly joins
- Score of a tuple = number of joins, plus prestige based on in-degree
14 [Hristidis, Papakonstantinou 2002] Q = Abiteboul and Widom
Join sequences (tuple trees):
[Figure: several join trees, each connecting Person "Abiteboul" to Person "Widom" through Author, Paper, Conference, and Editor tuples]
15 Record linkage
Determine if two data records describe the same object
Scenarios:
- Join/merge two relations
- Remove duplicates from a single relation
- Validate incoming tuples against a reference
16 [Bertossi & Chomicki 2003] Inconsistent data
Goal: consistent query answers from inconsistent databases
Applications:
- Integration of autonomous data sources
- Un-enforced integrity constraints
- Temporary inconsistencies
17 [Bertossi & Chomicki 2003] The repair semantics
Key (?!?): Name
    Name        Affiliation  State  Area
    Miklau      UW           WA     Data security
    Dalvi       UW           WA     Prob. Data
    Balazinska  UW           WA     Data streams
    Balazinska  MIT          MA     Data streams
    Miklau      UMass        MA     Data security
Find people in State=WA: Dalvi
Find people in State=MA: ∅
Consider all repairs: high precision, but low recall
18 Alternative probabilistic semantics
    Name        Affiliation  State  Area           P
    Miklau      UW           WA     Data security  0.5
    Dalvi       UW           WA     Prob. Data     1
    Balazinska  UW           WA     Data streams   0.5
    Balazinska  MIT          MA     Data streams   0.5
    Miklau      UMass        MA     Data security  0.5
State=WA: Dalvi, Balazinska (0.5), Miklau (0.5)
State=MA: Balazinska (0.5), Miklau (0.5)
Lower precision, but better recall
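The two semantics for inconsistent data can be compared with a small Python sketch. It assumes that a repair keeps exactly one row per Name (the violated key) and that all repairs are equally likely; that assumption reproduces the 0.5 and 1 values in the table but is not spelled out on the slides. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Rows copied from the slide: (Name, Affiliation, State, Area).
ROWS = [
    ("Miklau", "UW", "WA", "Data security"),
    ("Dalvi", "UW", "WA", "Prob. Data"),
    ("Balazinska", "UW", "WA", "Data streams"),
    ("Balazinska", "MIT", "MA", "Data streams"),
    ("Miklau", "UMass", "MA", "Data security"),
]


def repairs(rows):
    """Yield every repair: exactly one row kept per Name."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    for choice in product(*groups.values()):
        yield list(choice)


def answer_probabilities(rows, state):
    """Pr(name is returned for State=state), assuming equally likely repairs."""
    all_repairs = list(repairs(rows))
    weight = Fraction(1, len(all_repairs))
    probs = {}
    for repair in all_repairs:
        for name, _, row_state, _ in repair:
            if row_state == state:
                probs[name] = probs.get(name, 0) + weight
    return probs


wa = answer_probabilities(ROWS, "WA")
ma = answer_probabilities(ROWS, "MA")
# Repair semantics keeps only answers that hold in every repair (probability 1).
consistent_wa = sorted(name for name, p in wa.items() if p == 1)
consistent_ma = sorted(name for name, p in ma.items() if p == 1)
```

Under these assumptions the consistent answers are just Dalvi for WA and nothing for MA, while the probabilistic answers also return Balazinska and Miklau with probability 1/2 each.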
19 [Barbara et al. 1992] What is a Probabilistic Database (PDB)?
HasObject^p (keys: Object, Time; non-key: Person; probability: P)
    Object    Time  Person  P
    Laptop77  9:07  John    0.62
                    Jim     0.34
    Book302   9:18  Mary    0.45
                    John    0.33
                    Fred    0.11
What does it mean?
20 Possible world semantics
PDB:
    Object    Time  Person  P
    Laptop77  9:07  John    p_1
                    Jim     p_2
    Book302   9:18  Mary    p_3
                    John    p_4
                    Fred    p_5
Possible worlds Ω = { all instances over (Object, Time, Person) that keep at most one Person per (Object, Time) key }, for example:
    { (Laptop77, 9:07, John), (Book302, 9:18, Mary) }  with probability p_1 p_3
    { (Laptop77, 9:07, John), (Book302, 9:18, John) }  with probability p_1 p_4
    { (Laptop77, 9:07, John) }                         with probability p_1 (1 - p_3 - p_4 - p_5)
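The set of possible worlds can be enumerated mechanically. The sketch below uses the numeric probabilities from the HasObject table and assumes the reading suggested by the world probabilities on this slide: alternatives sharing an (Object, Time) key are mutually exclusive, the leftover mass of a key means "no tuple for this key", and different keys are independent. The names `BLOCKS` and `possible_worlds` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# (Object, Time) key -> {Person: probability}, copied from the HasObject table.
BLOCKS = {
    ("Laptop77", "9:07"): {"John": Fraction("0.62"), "Jim": Fraction("0.34")},
    ("Book302", "9:18"): {
        "Mary": Fraction("0.45"),
        "John": Fraction("0.33"),
        "Fred": Fraction("0.11"),
    },
}


def possible_worlds(blocks):
    """Yield (world, probability); a world is a frozenset of (Object, Time, Person)."""
    per_block = []
    for (obj, time), people in blocks.items():
        options = [((obj, time, person), p) for person, p in people.items()]
        # Leftover probability mass: no tuple with this key is present.
        options.append((None, 1 - sum(people.values())))
        per_block.append(options)
    # Keys are independent, so a world picks one option per key.
    for combo in product(*per_block):
        world = frozenset(t for t, _ in combo if t is not None)
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        yield world, prob


WORLDS = dict(possible_worlds(BLOCKS))
john_and_mary = frozenset({("Laptop77", "9:07", "John"), ("Book302", "9:18", "Mary")})
john_only = frozenset({("Laptop77", "9:07", "John")})
```

There are 3 × 4 = 12 worlds and their probabilities sum to 1; `WORLDS[john_and_mary]` is p_1 p_3 and `WORLDS[john_only]` is p_1 (1 - p_3 - p_4 - p_5) for the concrete values above.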
21 HasObject(Object, Time, Person, P)
    Object    Time  Person  P
    Laptop77  9:07  John    p_1
                    Jim     p_2
    Book302   9:18  Mary    p_3
                    John    p_4
                    Fred    p_5
Rows that share a key (Object, Time) are disjoint; rows with different keys are independent.
Meets(Person1, Person2, Time, P)
    Person1  Person2  Time  P
    John     Jim      9:12  p_1
    Mary     Sue      9:20  p_2
    John     Mary     9:20  p_3
All rows of Meets are independent.
22 Tuple correlation
- Disjoint: $\Pr(t_1 t_2) = 0$
- Negatively correlated: $\Pr(t_1 t_2) < \Pr(t_1)\Pr(t_2)$
- Independent: $\Pr(t_1 t_2) = \Pr(t_1)\Pr(t_2)$
- Positively correlated: $\Pr(t_1 t_2) > \Pr(t_1)\Pr(t_2)$
- Identical: $\Pr(t_1 t_2) = \Pr(t_1) = \Pr(t_2)$
23 Query semantics
Given a query $Q$ and a probabilistic database $I^p$, what is the meaning of $Q(I^p)$?
24 Query semantics
Semantics 1: possible answers. A probability distribution on sets of tuples:
$\Pr(Q = A) = \sum_{I \in \mathrm{INST},\ Q(I) = A} \Pr(I)$
Semantics 2: possible tuples. A probability function on tuples:
$\Pr(t \in Q) = \sum_{I \in \mathrm{INST},\ t \in Q(I)} \Pr(I)$
25 Purchase^p: four possible worlds over (Name, City, Product)
I_1, Pr(I_1) = 1/3:
    John  Seattle  Gizmo
    John  Seattle  Camera
    Sue   Denver   Gizmo
    Sue   Denver   Camera
I_2, Pr(I_2) = 1/12:
    John  Boston   Gizmo
    Sue   Denver   Gizmo
    Sue   Seattle  Gadget
I_3, Pr(I_3) = 1/2:
    John  Seattle  Gizmo
    John  Seattle  Camera
    Sue   Seattle  Camera
I_4, Pr(I_4) = 1/12:
    John  Boston   Camera
    Sue   Seattle  Camera
SELECT DISTINCT x.product
FROM Purchase^p x, Purchase^p y
WHERE x.name = 'John' AND x.product = y.product AND y.name = 'Sue'
Possible answers semantics:
    Answer set     Probability
    Gizmo, Camera  1/3   = Pr(I_1)
    Gizmo          1/12  = Pr(I_2)
    Camera         7/12  = Pr(I_3) + Pr(I_4)
Possible tuples semantics:
    Tuple   Probability
    Camera  11/12  = Pr(I_1) + Pr(I_3) + Pr(I_4)
    Gizmo   5/12   = Pr(I_1) + Pr(I_2)
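Both semantics can be computed directly from the four worlds of the Purchase example. The Python sketch below evaluates the self-join in each world and accumulates world probabilities per answer set (possible answers) and per tuple (possible tuples). The helper name `query` and the variable names are hypothetical; the data is taken from the slide.

```python
from collections import defaultdict
from fractions import Fraction

# (probability, rows of (name, city, product)) for worlds I_1 .. I_4.
WORLDS = [
    (Fraction(1, 3), [("John", "Seattle", "Gizmo"), ("John", "Seattle", "Camera"),
                      ("Sue", "Denver", "Gizmo"), ("Sue", "Denver", "Camera")]),
    (Fraction(1, 12), [("John", "Boston", "Gizmo"), ("Sue", "Denver", "Gizmo"),
                       ("Sue", "Seattle", "Gadget")]),
    (Fraction(1, 2), [("John", "Seattle", "Gizmo"), ("John", "Seattle", "Camera"),
                      ("Sue", "Seattle", "Camera")]),
    (Fraction(1, 12), [("John", "Boston", "Camera"), ("Sue", "Seattle", "Camera")]),
]


def query(rows):
    """Products bought by both John and Sue (the SELECT DISTINCT self-join)."""
    john = {product for name, _, product in rows if name == "John"}
    sue = {product for name, _, product in rows if name == "Sue"}
    return frozenset(john & sue)


possible_answers = defaultdict(Fraction)  # answer set -> probability
possible_tuples = defaultdict(Fraction)   # single tuple -> probability
for prob, rows in WORLDS:
    answer = query(rows)
    possible_answers[answer] += prob
    for product_name in answer:
        possible_tuples[product_name] += prob
```

The resulting dictionaries reproduce the two tables of the slide: 1/3, 1/12, and 7/12 for the answer sets, and 11/12 and 5/12 for Camera and Gizmo.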
26 How do we evaluate queries?
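27 Probability of Boolean expressions
$\Phi = X_1 X_2 \vee X_1 X_3 \vee X_2 X_3$, with $P(X_1) = p_1$, $P(X_2) = p_2$, $P(X_3) = p_3$. Compute $P(\Phi)$.
Ω = all truth assignments (P is shown for the rows where Φ = 1):
    X_1  X_2  X_3  Φ  P
    0    0    0    0
    0    0    1    0
    0    1    0    0
    0    1    1    1  (1-p_1) p_2 p_3
    1    0    0    0
    1    0    1    1  p_1 (1-p_2) p_3
    1    1    0    1  p_1 p_2 (1-p_3)
    1    1    1    1  p_1 p_2 p_3
$\Pr(\Phi) = (1-p_1) p_2 p_3 + p_1 (1-p_2) p_3 + p_1 p_2 (1-p_3) + p_1 p_2 p_3$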
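The truth-table computation can be written as a brute-force enumeration. This Python sketch sums the weights of all satisfying assignments of Φ, treating the variables as independent; the concrete values p_1 = 1/2, p_2 = 1/3, p_3 = 1/4 are hypothetical and chosen only to make the arithmetic easy to check. The running time is exponential in the number of variables, which is what the complexity result on the next slide leads one to expect in general.

```python
from fractions import Fraction
from itertools import product


def phi(x1, x2, x3):
    """The formula from the slide: X1 X2 v X1 X3 v X2 X3."""
    return (x1 and x2) or (x1 and x3) or (x2 and x3)


def probability(formula, probs):
    """Sum the weights of all satisfying assignments (independent variables)."""
    total = Fraction(0)
    for values in product([False, True], repeat=len(probs)):
        if formula(*values):
            weight = Fraction(1)
            for value, p in zip(values, probs):
                weight *= p if value else 1 - p
            total += weight
    return total


# Hypothetical probabilities for X1, X2, X3.
p1, p2, p3 = Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)
pr_phi = probability(phi, [p1, p2, p3])
```

For these values the four satisfying rows contribute 1/24, 2/24, 3/24, and 1/24, so `pr_phi` is 7/24.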
28 Complexity of Boolean expression probability [Valiant 1979]
Theorem: For a Boolean expression E, computing Pr(E) is #P-complete
- NP: class of problems of the form "is there a witness?" (SAT)
- #P: class of problems of the form "how many witnesses?" (#SAT)
29 Query q + database PDB → Φ
q = R(x, y), S(x, z)
PDB =
R^p:
    A    B    P    variable
    a_1  b_1  p_1  X_1
    a_2  b_2  p_2  X_2
S^p:
    A    C    P    variable
    a_1  c_1  q_1  Y_1
    a_1  c_2  q_2  Y_2
    a_2  c_3  q_3  Y_3
    a_2  c_4  q_4  Y_4
    a_2  c_5  q_5  Y_5
$\Phi = X_1 Y_1 \vee X_1 Y_2 \vee X_2 Y_3 \vee X_2 Y_4 \vee X_2 Y_5$
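The step from query and database to Φ can be sketched as a join that records, for every pair of joining tuples, the Boolean variables of the two tuples. The Python sketch below builds this DNF for q = R(x, y), S(x, z) and then computes its probability by brute force. Setting every p_i and q_j to 1/2 is a hypothetical choice, and the function names are illustrative only.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)  # hypothetical value used for every p_i and q_j

# Tuple -> Boolean variable, as annotated on the slide.
R = {("a1", "b1"): "X1", ("a2", "b2"): "X2"}
S = {("a1", "c1"): "Y1", ("a1", "c2"): "Y2", ("a2", "c3"): "Y3",
     ("a2", "c4"): "Y4", ("a2", "c5"): "Y5"}
PROBS = {var: HALF for var in list(R.values()) + list(S.values())}


def lineage(r, s):
    """One conjunction (X, Y) per pair of tuples that agree on attribute A."""
    return [(x_var, y_var)
            for (a_r, _), x_var in r.items()
            for (a_s, _), y_var in s.items()
            if a_r == a_s]


def dnf_probability(clauses, probs):
    """Brute-force Pr of a positive DNF over independent variables."""
    names = sorted(probs)
    total = Fraction(0)
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if any(all(world[v] for v in clause) for clause in clauses):
            weight = Fraction(1)
            for name in names:
                weight *= probs[name] if world[name] else 1 - probs[name]
            total += weight
    return total


PHI = lineage(R, S)
pr_q = dnf_probability(PHI, PROBS)
```

For this particular query the result can be cross-checked by grouping on x: Pr(q) = 1 - (1 - p_1 (1 - (1-q_1)(1-q_2))) (1 - p_2 (1 - (1-q_3)(1-q_4)(1-q_5))), which gives 83/128 when every probability is 1/2.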
30 Application to query evaluation
Corollary: Fix an FO query q. Exact evaluation of Pr(q) on input PDB is in #P.
Corollary: Fix a conjunctive query q. Approximation of Pr(q) on input PDB is in PTIME (FPTRAS). [Graedel, Gurevich, Hirsch 1998]
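The slide does not say which approximation algorithm is meant. One standard scheme of this kind for DNF formulas is the Karp-Luby estimator, sketched here in Python under the assumption of independent variables and a positive DNF such as the formula Φ from the previous slide. The probabilities (all 0.5), the sample count, and the seed are hypothetical choices.

```python
import random

# Phi = X1Y1 v X1Y2 v X2Y3 v X2Y4 v X2Y5, with hypothetical probabilities of 0.5.
CLAUSES = [("X1", "Y1"), ("X1", "Y2"), ("X2", "Y3"), ("X2", "Y4"), ("X2", "Y5")]
PROBS = {v: 0.5 for v in ("X1", "X2", "Y1", "Y2", "Y3", "Y4", "Y5")}


def karp_luby(clauses, probs, samples, rng):
    """Estimate Pr(clause_1 OR ... OR clause_k) for a positive DNF."""
    # Probability of each clause on its own: product over its variables.
    weights = []
    for clause in clauses:
        w = 1.0
        for var in clause:
            w *= probs[var]
        weights.append(w)
    total_weight = sum(weights)
    acc = 0.0
    for _ in range(samples):
        # Pick a clause proportionally to its weight ...
        chosen = rng.choices(clauses, weights=weights, k=1)[0]
        # ... and sample a world in which that clause is true.
        world = {v: (v in chosen) or (rng.random() < p) for v, p in probs.items()}
        # Divide by the number of satisfied clauses to undo double counting.
        satisfied = sum(all(world[v] for v in clause) for clause in clauses)
        acc += 1.0 / satisfied
    return total_weight * acc / samples


estimate = karp_luby(CLAUSES, PROBS, samples=20000, rng=random.Random(0))
```

The exact value for this formula with all probabilities at 0.5 is 83/128 ≈ 0.648, so the estimate should land close to that. The appeal of the estimator is that each sample contributes a value between total_weight / k and total_weight, so the number of samples needed for a given relative error stays polynomial in the number of clauses k.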
31 Query complexity
Data complexity of a query Q:
- Compute $Q(I^p)$, for probabilistic database $I^p$
Simplest scenario only:
- Possible tuples semantics for Q
- Independent tuples for $I^p$
32 Challenges
- Query optimization
- A #P query with subqueries in PTIME
- Combine safe plans and inference
- Theoretical analysis of self-joins
- Complex probabilistic models
- Capture complex correlations
- Hard and soft constraints on query evaluation
More informationUniversity of New Mexico Department of Computer Science. Final Examination. CS 561 Data Structures and Algorithms Fall, 2013
University of New Mexico Department of Computer Science Final Examination CS 561 Data Structures and Algorithms Fall, 2013 Name: Email: This exam lasts 2 hours. It is closed book and closed notes wing
More informationData Cleaning and Query Answering with Matching Dependencies and Matching Functions
Data Cleaning and Query Answering with Matching Dependencies and Matching Functions Leopoldo Bertossi 1, Solmaz Kolahi 2, and Laks V. S. Lakshmanan 2 1 Carleton University, Ottawa, Canada. bertossi@scs.carleton.ca
More informationActive Integrity Constraints and Revision Programming
Under consideration for publication in Theory and Practice of Logic Programming 1 Active Integrity Constraints and Revision Programming Luciano Caroprese 1 and Miros law Truszczyński 2 1 Università della
More information15 Introduction to Data Mining
15 Introduction to Data Mining 15.1 Introduction to principle methods 15.2 Mining association rule see also: A. Kemper, Chap. 17.4, Kifer et al.: chap 17.7 ff 15.1 Introduction "Discovery of useful, possibly
More informationCS60021: Scalable Data Mining. Similarity Search and Hashing. Sourangshu Bha>acharya
CS62: Scalable Data Mining Similarity Search and Hashing Sourangshu Bha>acharya Finding Similar Items Distance Measures Goal: Find near-neighbors in high-dim. space We formally define near neighbors as
More information3. Only sequences that were formed by using finitely many applications of rules 1 and 2, are propositional formulas.
1 Chapter 1 Propositional Logic Mathematical logic studies correct thinking, correct deductions of statements from other statements. Let us make it more precise. A fundamental property of a statement is
More informationCanadian Board of Examiners for Professional Surveyors Core Syllabus Item C 5: GEOSPATIAL INFORMATION SYSTEMS
Study Guide: Canadian Board of Examiners for Professional Surveyors Core Syllabus Item C 5: GEOSPATIAL INFORMATION SYSTEMS This guide presents some study questions with specific referral to the essential
More informationPattern Logics and Auxiliary Relations
Pattern Logics and Auxiliary Relations Diego Figueira Leonid Libkin University of Edinburgh Abstract A common theme in the study of logics over finite structures is adding auxiliary predicates to enhance
More informationChap 2: Classical models for information retrieval
Chap 2: Classical models for information retrieval Jean-Pierre Chevallet & Philippe Mulhem LIG-MRIM Sept 2016 Jean-Pierre Chevallet & Philippe Mulhem Models of IR 1 / 81 Outline Basic IR Models 1 Basic
More information