Topics in Probabilistic and Statistical Databases. Lecture 2: Representation of Probabilistic Databases. Dan Suciu University of Washington
|
|
- Lucas French
- 5 years ago
- Views:
Transcription
1 Topics in Probabilistic and Statistical Databases Lecture 2: Representation of Probabilistic Databases Dan Suciu University of Washington 1
2 Review: Definition The set of all possible database instances: Inst = {I 1, I 2, I 3,..., I N } Sample space (Ω) Definition A probabilistic database PDB = (Inst, Pr) is a discrete probability distribution: Pr : Inst! [0,1] s.t. i=1,n Pr(I i ) = 1 Definition A possible world is I s.t. Pr(I) > 0 A possible tuple is a tuple t 2 I, for a possible world I 2
3 Representation System Informally: it is a syntax + semantics that allows us to represent a probabilistic database concisely Definition A representation system for ProbDB is (S,Rep), where S is a set of representations and Rep: S (set of PDBs) assigns a probabilistic database to each representation 3
4 Review: Disjoint-Independent Databases Definition A PDB is disjoint-independent if for any set T of possible tuples one of the following holds: T is an independent set, or T contains two disjoint tuples A disjoint-independent database can be fully specified by: all marginal tuple probabilities an indication of which tuples are disjoint or independent 4
5 Representations Systems for D/I- Databases MystiQ s representation Trio s xor-, maybe- tuples Attribute-level probabilities 5
6 D/I Relations in MystiQ At the schema level: Possible worlds key = set of attributes Probability = expression on attributes Constraint at the instance level: For each key value, sum of probabilities 1 6
7 Possible worlds key =? Attribute expression =? Constraint =? S = Object Time Person P LaptopX77 9:07 John p 1 LaptopX77 9:07 Jim p 2 Book302 9:18 Mary p 3 Book302 9:18 John p 4 Rep(S) = Book302 9:18 Fred p 5 { Object Time Person Object Time Person LaptopX77 9:07 John } Object Time Person LaptopX77 9:07 John Book302 9:18 Object Mary Time Person LaptopX77 Object 9:07 John Time Person Book302 LaptopX77 9:18 John Object 9:07 Jim Time Person Book302 LaptopX77 9:18 Fred 9:07 Jim Book302 LaptopX77 9:18 Object Mary 9:07 Time Jim Person Book302 9:18 Object John Time Person Book302 LaptopX77 9:18 Object 9:07 Fred John Time Person LaptopX77 Object 9:07 Jim Time Person Book302 Object 9:18 Mary Time Person Book302 Object 9:18 John Time Person p 1 (1- p 3 -p 4 -p 5 ) Book302 9:18 Fred p 1 p 3 p 1 p 4 7
8 X-Relations in Trio Maybe tuple is t?, meaning it may be missing X-tuple is <t1, t2, > meaning it may be any of t1, t2, An X-relation is a collection of X-tuples and maybe-tuples 8
9 Maybe- and X-tuples in Trio S = Rep(S) =? 9
10 Attribute-level Uncertainty For some attribute A, give a probability distribution on possible values Note: sum of probabilities must be 1 More generally, for a set of attributes A1, A2, give a probability distribution on possible values 10
11 [Barbara 92] Attribute-Level Uncertainty S = Rep(S) =? 11
12 Review Query Semantics Semantics 1: Possible Sets of Answers A probability distributions on sets of tuples 8 A. Pr(Q = A) = I 2 Inst. Q(I) = A Pr(I) Semantics 2: Possible Tuples A probability function on tuples 8 t. Pr(t 2 Q) = I 2 Inst. t2 Q(I) Pr(I) 12
13 The Representation Problem How do we represent correlations between tuples in a probdb? How do we represent query answers (views)? 13
14 Main Techniques Incomplete databases Concerned with representing possible worlds, i.e. no probabilities Probabilistic Networks Concerned with representing correlations, i.e. no databases 14
15 Incomplete Databases 15
16 [Imilelinski&Lipsi 1984, Green&Tannen 2006] Incomplete Databases Let Inst = the set of all possible instances over domain D Definition An incomplete database is a set of possible worlds I = {I 1, I 2, I 3,...} Inst Definition A representation system is (S, Rep) where S is a set of representations described by some syntax, and: Rep : S! 2 Inst 16
17 Discussion We often denote an incomplete database {I 1, I 2, } with <I 1, I 2, > This is called an OR-set: the true state can be any I <I 1, I 2, > 17
18 Rule of Thumb #1 A ProbDB = an Incomplete DB + probabilities 18
19 Discussion Question: what does < > mean (the empty set of worlds)? 19
20 Discussion Question: what does < > mean (the empty set of worlds)? Answer: inconsistency! We do not allow < > 20
21 Examples Every traditional database instance I is also an incomplete database I = <I> The no-information, or zero-information incomplete database is: N = < I I Inst> (in other words: N = Inst) 21
22 Four Representation Systems Codd-tables V-tables C-tables OR-sets 22
23 Codd-Tables T = NULL or Rep(T) =? 23
24 V-Tables T = Rep(T) =? marked nulls : 1, 2, 24
25 C-Tables T = Any Boolean condition Rep(T) =? 25
26 Boolean C-Tables: Vars only in Cond T = A B Cond a1 b1 (x=1) (y=2) (z=1) a1 b2 x=2 a2 b2 z=1 We often state explicitly the domain of each variable: Dom(X) = {1, 2} Dom(Y) = {1,2,3} Dom(Z) = {0,1}... Rep(T) =? In G&T s definition all vars are Boolean; minor distinction. 26
27 OR-Sets Review of Nested Relational Algebra NRA: Types: T ::= basetype T T { T } Operations (in class): 27
28 OR-Sets Types: T ::= basetype T T { T } < T > Operations (in class): 28
29 OR-Sets Examples What are their types? What do they mean? {[Gizmo, <99,110>], [Camera, <10,12,19>]} {<3,5>,<3,7>,<5,8,12>} {<{<1,2>,<3>},{<4>}>,<{<1>},{<2,3>,<4>}>} 29
30 Discussion Which concepts from incomplete databases where borrowed by the three simple representation systems for ProbDB? MystiQ s disjoint/independent tables Trio s X-tuples Attribute level probabilities 30
31 Two Important Properties Completeness: can a representation system represent all incomplete databases? Closure: can the result of a query also be represented in the same system? 31
32 Closure and Completeness Assume the domain D is finite Why? Definition A representation system is complete if for any incomplete database I there exists a representation S s.t. Rep(S) = I. 32
33 Which Are Complete? Why? Codd-Tables? V-Tables? C-Tables? OR-Sets? 33
34 Which Are Complete? Why? Codd-Tables? NO: constant cardinality V-Tables? NO: constant cardinality C-Tables? YES OR-Sets? YES 34
35 Closure Definition A representation system is closed w.r.t. a query language Q, if for every representation S and query q in Q, there exists S s.t. Rep(S ) = q(rep(s) q S S Rep Rep(S)={I 1, I 2, } q Rep Rep(S )={q(i 1 ), q(i 2 ), } 35
36 Which Are Closed w.r.t. RA? Codd-Tables? V-Tables? C-Tables? OR-Sets? 36
37 Which Are Closed w.r.t. RA? Codd-Tables? NO in class V-Tables? NO in class C-Tables? YES in class OR-Sets? YES in class 37
38 Completeness Closure Fact: every complete system is closed w.r.t. the Relational Algebra Why? 38
39 Completeness Closure? Challenge: give an example of a system that is closed w.r.t. Relational Algebra but is not complete! 39
40 [Das Sarma (unpublished) 2008] Completeness Closure? Consider a representation system S s.t. S can represent any deterministic instance <I> S can represent <{0}, {1}> S is closed under {Π,,σ} Then S is complete PROOF: in class 40
41 Lineage Lineage = a Boolean expression annotating a tuple that explains why the tuple is there Technically: lineage = condition in a Boolean C-table A.k.a provenance 41
42 [Benjelloun, VLDBJ 2008] Example Start from Trio s X-relations: ID Saw(Witness, Car) X1 <[Amy, Mazda], [Amy, Toyota]>? X2 <[Betty, Honda]> ID Drives(Person, Car) Y1 <[Jimmy, Mazda], [Jimmy, Toyota]> Y2 <[Billy, Mazda], [Billy, Honda]> 42
43 [Benjelloun, VLDBJ 2008] Example Convert to Boolean C-Tables Saw Drives Person Car Cond Witness Car Cond Jimmy Mazda Y1=1 Amy Mazda X1=1 Jimmy Toyota Y1=2 Amy Toyota X1=2 Billy Mazda Y2=1 Betty Honda X2=1 Billy Honda Y2=2 Q: How do we say Amy is a maybe tuple, Betty is a certain tuple? 43
44 Example Answer: Dom(X1) = {0,1,2} Dom(X2) = {1} Dom(Y1) = {1,2} Dom(Y2) = {1,2} 44
45 [Benjelloun, VLDBJ 2008] Saw Example Compute the query q(w,p) :- Saw(w,c),Drives(c,p) Drives Person Car Cond Witness Car Cond Jimmy Mazda Y1=1 Amy Mazda X1=1 Jimmy Toyota Y1=2 Amy Toyota X1=2 Billy Mazda Y2=1 Betty Honda X2=1 Billy Honda Y2=2 Witness Person Cond Amy Jimmy X1=1 Y1=1 X1=2 Y1=2 Betty Billy X2=1 Y2=2 45
46 Uniform Lineage Call a lineage expression uniform if: It is in DNF All conjuncts have the same number k of literals Each literal is of the form X=v Representation of uniform lineage: add 2k columns! 46
47 Saw Example Compute the query q(w,p) :- Saw(w,c),Drives(c,p) Drives Person Car Y W Witness Car X V Jimmy Mazda Y1 1 Amy Mazda X1 1 Jimmy Toyota Y1 2 Amy Toyota X1 2 Billy Mazda Y2 1 Betty Honda X2 1 Billy Honda Y2 2 Witness Person X V Y W Amy Jimmy X1 1 Y1 1 Amy Jimmy X1 2 Y1 2 Betty Billy X2 1 Y2 2 X1=1 Y1=1 X1=2 Y1=2 47
48 Summary of Incomplete DBs Main goal: specify a set of possible worlds Very relevant to ProbDBs! Key concepts: closure and completeness Turn out to be equivalent, except trivial cases Boolean C-tables closed&complete; others not Lineage: important tool in ProbDB, is derived from C-tables Some open questions next Want to tell & show? by Wed. 48
49 Open Questions in Incomplete/ Probabilistic Databases Variables OWA v.s. CWA Certain tuples, possible tuples Partial information order Strong v.s. weak representation systems Homework: pick one and apply to probdbs. Tell us next time your thoughts 49
50 Variables as Attribute Values T = A B b1 Rep(T) = A large, or infinite set a2 b1 ProbDBs don t use variables as attribute values What if we allow ProbDBs to have variables? 50
51 OWA v.s. CWA CWA: if R(a,b,c) is not mentioned in the knowledge/data base, then R(a,b,c) is assumed OWA: R(a,b,c) is inferred only if it is explicitly stated in the knowledge/data base Which one is standard semantics in DB? And in KR? 51
52 OWA v.s. CWA in Incomplete DB T = A B b1 a2 b1 CWA: Rep(T) = A B a1 b1 A B a1 b1 A B a3 b1 A B a4 b1... a2 b1 a2 b1 a2 b1 OWA: Rep(T) = A B a1 b1 a2 b1 A B... A B A B a1 b1 a1 b1 a1 b1 a2 b What would OWA mean for ProbDBs?... 52
53 Certain v.s. Possible Tuples Consider an incomplete database I = {I 1, I 2, I 3,...} Definition The certain tuples I =I 1 I 2 I 3 Definition The possible tuples I =I 1 I 2 I 3 What do certain/possible tuples correspond to in ProbDB? 53
54 Certain v.s. Possible Tuples Two incomplete databases I, J are equivalent if I = J Two incomplete databases I, J are equivalent w.r.t. a query language Q if forall q in Q, q(i) = q(j) These notions are used to define weak representation systems (see [AHV]).. Are there similar notions of equivalences between ProbDBs 54?
55 Partial Information Order Let (D, ) be an ordered set. Consider two sets A={a 1,,a m }, B={b 1,,b n }. When can we say A B? Definition [Smythe or upper] A B if b j a i : a i b j Definition [Hoare or lower] A B if a i b j : a i b j Definition [Plotkin or convex] A B if A B and A B 55
56 Partial Information Order Let s order OR-sets Values of base type: a a and a Records: [x,y] [u,v] when? OR-sets <a1,a2,a3> <b1,b2> when? Sets: {a1,a2,a3} {b1, b2} when? 56
57 Partial Information Order Let s order OR-sets Values of base type: a a and a Records: [x,y] [u,v] when? When x u and y v OR-sets <a1,a2,a3> <b1,b2> when? Smythe Sets: {a1,a2,a3} {b1, b2} when? Hoare (for OWA), or equality (for CWA) 57
58 Partial Information Order What is the partial information order on ProbDBs? 58
59 Probabilistic Networks 59
60 Probabilistic Models Problem setting: Given m random variables V 1,, V m Each with domain D (same for all V i ) A probability space over D m : Pr: D m [0,1], st v1,,vm in D Pr(v 1,,v m ) = 1 Called the joint probability distribution Problem: give a compact representation of Pr 60
61 Example V1 V2 P A disjoint probabilistic database Here m=2, but in general m is large, need compact rep. 61
62 Background Marginal probability: Pr(V i =a) = v1,,vm in D, vi = a Pr(v 1,,v m ) What does this mean? Pr(V i ), Pr(V i V j ) Conditional prob: Pr(E F) = Pr(EF)/Pr(F) Independence: Pr(EF) = Pr(E) * Pr(F) Equivalently: Pr(E F) = Pr(E) Conditional indep: Pr(E,F G)= Pr(E G)*Pr(F G) 62
63 Independence V1 V2 P V1 P V2 P Marginal probabilities Are V1, V2 independent, i.e. Pr(V1,V2) = Pr(V1)*Pr(V2)? Note: need to check 4 equalities (why?) 63
64 Factored Representation More general: if P(V1=a,V2=b) = p(a) * q(b) forall a,b then V1, V2 are independent V1 V2 P a 1 b 1 p 1 q 1 V1 P V2 P a 1 b 2 p 1 q 2 a 2 b 1 p 2 q 1 a 2 b 2 p 2 q = a 1 p 1 a 2 p 2... Factors b 1 q 1 b 2 q
65 1 st Connection to ProbDBs: Variable Attribute Every joint distribution over variables V 1,, V m corresponds to a trivial probabilistic table with attributes V 1,, V m What s trivial about it? 65
66 1 st Connection to ProbDBs: Variable Attribute Conversely: each block in a disjoint/ independent relation defines a joint distribution on the values of its attribute Object Time Person P LaptopX77 9:07 John 0.5 Jim 0.5 Mary 0.2 Book302 9:18 John 0.4 Fred
67 Independence 67
68 Independence Employee Department Quality Bonus Sales P John Smith Toy Great Yes 30k-34k 0.4*0.3 John Smith Toy Good Yes 30k-34k 0.5*0.3 John Smith Toy Fair No 30k-34k 0.1*0.3 John Smith Toy Great Yes 35k-39k 0.4*0.7 John Smith Toy Good Yes 35k-39k 0.5*0.7 John Smith Toy Fair No 35k-39k 0.1*0.7 Fre Jones Houseware Good Yes 20k-24k 0.5 Fre Jones Houseware Good Yes 24k-29k
69 Independence Factors are disjoint/indep. Tables! Employee Department Quality Bonus P John Smith Toy Great Yes 0.4 John Smith Toy Good Yes 0.5 John Smith Toy Fair No 0.1 Fre Jones Houseware Good Yes 1 Employee Department Sales P John Smith Toy 30k-34k 0.3 John Smith Toy 35k-39k 0.7 Fre Jones Houseware 20k-24k 0.5 Fre Jones Houseware 24k-29k
70 Rule of Thumb #2 Correlated table = Independent tables + join Same principle as in traditional schema normalization Decompose big table into small tables with independent attributes Big table = a view (single join!) over the small tables 70
71 Database 101 Assume that (Quality, Bonus) is independent from (Sales) Drop the P column a traditional table (no more probabilities) Question What functional dependency holds in this table? 71
72 Database 101 Assume that (Quality, Bonus) is independent from (Sales) Drop the P column a traditional table (no more probabilities) Question What functional dependency holds in this table? A Multivalued FD!: Emp, Dept Quality, Bonus Sales 72
73 Rule of Thumb #3 Factor decomposition = MVD decomposition + probability identities 73
74 Conditional Independence V1 V2 W P 1 1 a a a a b b b b Are V1, V2 independent given W, i.e. Pr(V1,V2 W) = Pr(V1 W) * Pr(V2 W)? 74
75 Factors Conditional Independence V1 V2 W P 1 1 a a a a b b b b They: they are conditional independent P(W=a)=0.5; P(-- W=a): V1 V2 P P(W=b)=0.5; P(-- W=b): V1 V2 P
76 Conditional Independence V1 V2 W P 1 1 a a a a 0.28 = W P a 0.5 b b b b b V1 W P 1 a a b b 0.5 V2 W P 1 a a b b
77 Representing Conditional Independent Vars More general: if P(V1=a,V2=b,W=c) = p(a,c) * q(b,c) forall a,b,c then V1, V2 are independent given c V1 V2 W P = W P c 1 r 1 a 1 b 1 c 1 p 11 q 11 c 2 r 2 a 1 b 2 c 1 p 11 q a 2 b 1 c 1 p 21 q 11 V1 W P V2 W P a 2 b 2 c 1 p 21 q c 2 P 12 q 12 a 1 c 1 p 11 /P 1 b 1 c 1 q 11 /Q a 2 c 1 p 21 /P 1.. c 2 p 12 /P 2 b 2 c 1 q 21 /Q 1.. c 2 q 12 /Q
78 Summary of Prob Networks (Conditional) indep. = MVDs + prob ident. Factored decomposition = base tables + joins Next time: Discuss the networks in probabilistic networks; WSdecomposition, U-relations Partial/approximate representations Begin query evaluation Send by Wed. if want to show&tell 78
Instructor: Sudeepa Roy
CompSci 590.6 Understanding Data: Theory and Applications Lecture 13 Incomplete Databases Instructor: Sudeepa Roy Email: sudeepa@cs.duke.edu 1 Today s Reading Alice Book : Foundations of Databases Abiteboul-
More informationDatabase design and implementation CMPSCI 645
Database design and implementation CMPSCI 645 Lectures 20: Probabilistic Databases *based on a tutorial by Dan Suciu Have we seen uncertainty in DB yet? } NULL values Age Height Weight 20 NULL 200 NULL
More informationQuery Evaluation on Probabilistic Databases. CSE 544: Wednesday, May 24, 2006
Query Evaluation on Probabilistic Databases CSE 544: Wednesday, May 24, 2006 Problem Setting Queries: Tables: Review A(x,y) :- Review(x,y), Movie(x,z), z > 1991 name rating p Movie Monkey Love good.5 title
More informationRelational Database Design
Relational Database Design Jan Chomicki University at Buffalo Jan Chomicki () Relational database design 1 / 16 Outline 1 Functional dependencies 2 Normal forms 3 Multivalued dependencies Jan Chomicki
More informationProbabilistic Databases
Probabilistic Databases Amol Deshpande University of Maryland Goal Introduction to probabilistic databases Focus on an overview of: Different possible representations Challenges in using them Probabilistic
More informationUVA UVA UVA UVA. Database Design. Relational Database Design. Functional Dependency. Loss of Information
Relational Database Design Database Design To generate a set of relation schemas that allows - to store information without unnecessary redundancy - to retrieve desired information easily Approach - design
More informationCMPT 354: Database System I. Lecture 9. Design Theory
CMPT 354: Database System I Lecture 9. Design Theory 1 Design Theory Design theory is about how to represent your data to avoid anomalies. Design 1 Design 2 Student Course Room Mike 354 AQ3149 Mary 354
More informationProvenance Semirings. Todd Green Grigoris Karvounarakis Val Tannen. presented by Clemens Ley
Provenance Semirings Todd Green Grigoris Karvounarakis Val Tannen presented by Clemens Ley place of origin Provenance Semirings Todd Green Grigoris Karvounarakis Val Tannen presented by Clemens Ley place
More information10/12/10. Outline. Schema Refinements = Normal Forms. First Normal Form (1NF) Data Anomalies. Relational Schema Design
Outline Introduction to Database Systems CSE 444 Design theory: 3.1-3.4 [Old edition: 3.4-3.6] Lectures 6-7: Database Design 1 2 Schema Refinements = Normal Forms 1st Normal Form = all tables are flat
More informationCSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018
CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018 Announcement Read Chapter 14 and 15 You must self-study these chapters Too huge to cover in Lectures Project 2 Part 1 due tonight Agenda 1.
More informationSchema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1
Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1 Background We started with schema design ER model translation into a relational schema Then we studied relational
More informationSchema Refinement. Feb 4, 2010
Schema Refinement Feb 4, 2010 1 Relational Schema Design Conceptual Design name Product buys Person price name ssn ER Model Logical design Relational Schema plus Integrity Constraints Schema Refinement
More informationDatabase Design and Implementation
Database Design and Implementation CS 645 Schema Refinement First Normal Form (1NF) A schema is in 1NF if all tables are flat Student Name GPA Course Student Name GPA Alice 3.8 Bob 3.7 Carol 3.9 Alice
More information11/1/12. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)
Relational Schema Design Introduction to Management CSE 344 Lectures 16: Database Design Conceptual Model: Relational Model: plus FD s name Product buys Person price name ssn Normalization: Eliminates
More informationSchema Refinement and Normal Forms
Schema Refinement and Normal Forms UMass Amherst Feb 14, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke, Dan Suciu 1 Relational Schema Design Conceptual Design name Product buys Person price name
More information11/6/11. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)
Relational Schema Design Introduction to Management CSE 344 Lectures 16: Database Design Conceptual Model: Relational Model: plus FD s name Product buys Person price name ssn Normalization: Eliminates
More informationCOSC 430 Advanced Database Topics. Lecture 2: Relational Theory Haibo Zhang Computer Science, University of Otago
COSC 430 Advanced Database Topics Lecture 2: Relational Theory Haibo Zhang Computer Science, University of Otago Learning objectives and references You should be able to: define the elements of the relational
More informationCSC 261/461 Database Systems Lecture 8. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101
CSC 261/461 Database Systems Lecture 8 Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101 Agenda 1. Database Design 2. Normal forms & functional dependencies 3. Finding functional dependencies
More informationDatabases Lecture 8. Timothy G. Griffin. Computer Laboratory University of Cambridge, UK. Databases, Lent 2009
Databases Lecture 8 Timothy G. Griffin Computer Laboratory University of Cambridge, UK Databases, Lent 2009 T. Griffin (cl.cam.ac.uk) Databases Lecture 8 DB 2009 1 / 15 Lecture 08: Multivalued Dependencies
More informationDesign Theory. Design Theory I. 1. Normal forms & functional dependencies. Today s Lecture. 1. Normal forms & functional dependencies
Design Theory BBM471 Database Management Systems Dr. Fuat Akal akal@hacettepe.edu.tr Design Theory I 2 Today s Lecture 1. Normal forms & functional dependencies 2. Finding functional dependencies 3. Closures,
More informationIntroduction to Data Management CSE 344
Introduction to Data Management CSE 344 Lectures 18: BCNF 1 What makes good schemas? 2 Review: Relation Decomposition Break the relation into two: Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234
More informationIntroduction to Management CSE 344
Introduction to Management CSE 344 Lectures 17: Design Theory 1 Announcements No class/office hour on Monday Midterm on Wednesday (Feb 19) in class HW5 due next Thursday (Feb 20) No WQ next week (WQ6 due
More informationSchema Refinement and Normal Forms Chapter 19
Schema Refinement and Normal Forms Chapter 19 Instructor: Vladimir Zadorozhny vladimir@sis.pitt.edu Information Science Program School of Information Sciences, University of Pittsburgh Database Management
More informationInformation Systems for Engineers. Exercise 8. ETH Zurich, Fall Semester Hand-out Due
Information Systems for Engineers Exercise 8 ETH Zurich, Fall Semester 2017 Hand-out 24.11.2017 Due 01.12.2017 1. (Exercise 3.3.1 in [1]) For each of the following relation schemas and sets of FD s, i)
More informationConstraints: Functional Dependencies
Constraints: Functional Dependencies Fall 2017 School of Computer Science University of Waterloo Databases CS348 (University of Waterloo) Functional Dependencies 1 / 42 Schema Design When we get a relational
More informationLectures 6. Lecture 6: Design Theory
Lectures 6 Lecture 6: Design Theory Lecture 6 Announcements Solutions to PS1 are posted online. Grades coming soon! Project part 1 is out. Check your groups and let us know if you have any issues. We have
More informationAnswering Queries from Statistics and Probabilistic Views. Nilesh Dalvi and Dan Suciu, University of Washington.
Answering Queries from Statistics and Probabilistic Views Nilesh Dalvi and Dan Suciu, University of Washington. Background Query answering using Views problem: find answers to a query q over a database
More informationFunctional Dependencies. Getting a good DB design Lisa Ball November 2012
Functional Dependencies Getting a good DB design Lisa Ball November 2012 Outline (2012) SEE NEXT SLIDE FOR ALL TOPICS (some for you to read) Normalization covered by Dr Sanchez Armstrong s Axioms other
More informationCSE 303: Database. Outline. Lecture 10. First Normal Form (1NF) First Normal Form (1NF) 10/1/2016. Chapter 3: Design Theory of Relational Database
CSE 303: Database Lecture 10 Chapter 3: Design Theory of Relational Database Outline 1st Normal Form = all tables attributes are atomic 2nd Normal Form = obsolete Boyce Codd Normal Form = will study 3rd
More informationCSC384: Intro to Artificial Intelligence Reasoning under Uncertainty-II
CSC384: Intro to Artificial Intelligence Reasoning under Uncertainty-II 1 Bayes Rule Example Disease {malaria, cold, flu}; Symptom = fever Must compute Pr(D fever) to prescribe treatment Why not assess
More informationDatabase Design and Implementation
Database Design and Implementation CS 645 Data provenance Provenance provenance, n. The fact of coming from some particular source or quarter; origin, derivation [Oxford English Dictionary] Data provenance
More informationIntroduction to Data Management. Lecture #6 (Relational DB Design Theory)
Introduction to Data Management Lecture #6 (Relational DB Design Theory) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v Homework
More informationQueries and Materialized Views on Probabilistic Databases
Queries and Materialized Views on Probabilistic Databases Nilesh Dalvi Christopher Ré Dan Suciu September 11, 2008 Abstract We review in this paper some recent yet fundamental results on evaluating queries
More informationOn Factorisation of Provenance Polynomials
On Factorisation of Provenance Polynomials Dan Olteanu and Jakub Závodný Oxford University Computing Laboratory Wolfson Building, Parks Road, OX1 3QD, Oxford, UK 1 Introduction Tracking and managing provenance
More informationA Dichotomy. in in Probabilistic Databases. Joint work with Robert Fink. for Non-Repeating Queries with Negation Queries with Negation
Dichotomy for Non-Repeating Queries with Negation Queries with Negation in in Probabilistic Databases Robert Dan Olteanu Fink and Dan Olteanu Joint work with Robert Fink Uncertainty in Computation Simons
More informationSchema Refinement: Other Dependencies and Higher Normal Forms
Schema Refinement: Other Dependencies and Higher Normal Forms Spring 2018 School of Computer Science University of Waterloo Databases CS348 (University of Waterloo) Higher Normal Forms 1 / 14 Outline 1
More informationLecture 8. Probabilistic Reasoning CS 486/686 May 25, 2006
Lecture 8 Probabilistic Reasoning CS 486/686 May 25, 2006 Outline Review probabilistic inference, independence and conditional independence Bayesian networks What are they What do they mean How do we create
More informationCSE 544 Principles of Database Management Systems
CSE 544 Principles of Database Management Systems Lecture 3 Schema Normalization CSE 544 - Winter 2018 1 Announcements Project groups due on Friday First review due on Tuesday (makeup lecture) Run git
More informationIntroduction to Database Systems CSE 414. Lecture 20: Design Theory
Introduction to Database Systems CSE 414 Lecture 20: Design Theory CSE 414 - Spring 2018 1 Class Overview Unit 1: Intro Unit 2: Relational Data Models and Query Languages Unit 3: Non-relational data Unit
More informationDesign Theory: Functional Dependencies and Normal Forms, Part I Instructor: Shel Finkelstein
Design Theory: Functional Dependencies and Normal Forms, Part I Instructor: Shel Finkelstein Reference: A First Course in Database Systems, 3 rd edition, Chapter 3 Important Notices CMPS 180 Final Exam
More informationL13: Normalization. CS3200 Database design (sp18 s2) 2/26/2018
L13: Normalization CS3200 Database design (sp18 s2) https://course.ccs.neu.edu/cs3200sp18s2/ 2/26/2018 274 Announcements! Keep bringing your name plates J Page Numbers now bigger (may change slightly)
More informationSCHEMA NORMALIZATION. CS 564- Fall 2015
SCHEMA NORMALIZATION CS 564- Fall 2015 HOW TO BUILD A DB APPLICATION Pick an application Figure out what to model (ER model) Output: ER diagram Transform the ER diagram to a relational schema Refine the
More informationChapter 10. Normalization Ext (from E&N and my editing)
Chapter 10 Normalization Ext (from E&N and my editing) Outline BCNF Multivalued Dependencies and Fourth Normal Form 2 BCNF A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever an FD X ->
More informationIntroduction to Data Management. Lecture #7 (Relational DB Design Theory II)
Introduction to Data Management Lecture #7 (Relational DB Design Theory II) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v Homework
More informationInverting Proof Systems for Secrecy under OWA
Inverting Proof Systems for Secrecy under OWA Giora Slutzki Department of Computer Science Iowa State University Ames, Iowa 50010 slutzki@cs.iastate.edu May 9th, 2010 Jointly with Jia Tao and Vasant Honavar
More informationSchema Refinement and Normal Forms. Chapter 19
Schema Refinement and Normal Forms Chapter 19 1 Review: Database Design Requirements Analysis user needs; what must the database do? Conceptual Design high level descr. (often done w/er model) Logical
More informationTopics in Probabilistic and Statistical Databases. Lecture 9: Histograms and Sampling. Dan Suciu University of Washington
Topics in Probabilistic and Statistical Databases Lecture 9: Histograms and Sampling Dan Suciu University of Washington 1 References Fast Algorithms For Hierarchical Range Histogram Construction, Guha,
More informationDecomposition. Rudolf Kruse, Alexander Dockhorn Bayesian Networks 81
Decomposition Rudolf Kruse, Alexander Dockhorn Bayesian Networks 81 Object Representation Property Car family body Property Hatchback Motor Radio Doors Seat cover 2.8 L Type 4 Leather, 150kW alpha Type
More informationCS 186, Fall 2002, Lecture 6 R&G Chapter 15
Schema Refinement and Normalization CS 186, Fall 2002, Lecture 6 R&G Chapter 15 Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus Functional Dependencies (Review)
More informationComp 5311 Database Management Systems. 5. Functional Dependencies Exercises
Comp 5311 Database Management Systems 5. Functional Dependencies Exercises 1 Assume the following table contains the only set of tuples that may appear in a table R. Which of the following FDs hold in
More informationIntroduction to Data Management. Lecture #6 (Relational Design Theory)
Introduction to Data Management Lecture #6 (Relational Design Theory) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v HW#2 is
More informationSchema Refinement and Normal Forms. The Evils of Redundancy. Schema Refinement. Yanlei Diao UMass Amherst April 10, 2007
Schema Refinement and Normal Forms Yanlei Diao UMass Amherst April 10, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke 1 The Evils of Redundancy Redundancy is at the root of several problems associated
More informationCSE 344 MAY 16 TH NORMALIZATION
CSE 344 MAY 16 TH NORMALIZATION ADMINISTRIVIA HW6 Due Tonight Prioritize local runs OQ6 Out Today HW7 Out Today E/R + Normalization Exams In my office; Regrades through me DATABASE DESIGN PROCESS Conceptual
More informationSchema Refinement and Normal Forms
Schema Refinement and Normal Forms Chapter 19 Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 1 The Evils of Redundancy Redundancy is at the root of several problems associated with relational
More informationCSE 344 AUGUST 3 RD NORMALIZATION
CSE 344 AUGUST 3 RD NORMALIZATION ADMINISTRIVIA WQ6 due Monday DB design HW7 due next Wednesday DB design normalization DATABASE DESIGN PROCESS Conceptual Model: name product makes company price name address
More informationGlobal Database Design based on Storage Space and Update Time Minimization
Journal of Universal Computer Science, vol. 15, no. 1 (2009), 195-240 submitted: 11/1/08, accepted: 15/8/08, appeared: 1/1/09 J.UCS Global Database Design based on Storage Space and Update Time Minimization
More informationData Bases Data Mining Foundations of databases: from functional dependencies to normal forms
Data Bases Data Mining Foundations of databases: from functional dependencies to normal forms Database Group http://liris.cnrs.fr/ecoquery/dokuwiki/doku.php?id=enseignement: dbdm:start March 1, 2017 Exemple
More informationDECOMPOSITION & SCHEMA NORMALIZATION
DECOMPOSITION & SCHEMA NORMALIZATION CS 564- Spring 2018 ACKs: Dan Suciu, Jignesh Patel, AnHai Doan WHAT IS THIS LECTURE ABOUT? Bad schemas lead to redundancy To correct bad schemas: decompose relations
More informationSchema Refinement & Normalization Theory
Schema Refinement & Normalization Theory Functional Dependencies Week 13 1 What s the Problem Consider relation obtained (call it SNLRHW) Hourly_Emps(ssn, name, lot, rating, hrly_wage, hrs_worked) What
More informationSchema Refinement and Normal Forms
Schema Refinement and Normal Forms Chapter 19 Quiz #2 Next Thursday Comp 521 Files and Databases Fall 2012 1 The Evils of Redundancy v Redundancy is at the root of several problems associated with relational
More informationChapter 7: Relational Database Design
Chapter 7: Relational Database Design Chapter 7: Relational Database Design! First Normal Form! Pitfalls in Relational Database Design! Functional Dependencies! Decomposition! Boyce-Codd Normal Form! Third
More informationChapter 3 Design Theory for Relational Databases
1 Chapter 3 Design Theory for Relational Databases Contents Functional Dependencies Decompositions Normal Forms (BCNF, 3NF) Multivalued Dependencies (and 4NF) Reasoning About FD s + MVD s 2 Our example
More informationChapter 7: Relational Database Design. Chapter 7: Relational Database Design
Chapter 7: Relational Database Design Chapter 7: Relational Database Design First Normal Form Pitfalls in Relational Database Design Functional Dependencies Decomposition Boyce-Codd Normal Form Third Normal
More informationThe Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd.
The Evils of Redundancy Schema Refinement and Normal Forms Chapter 19 Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 1 Redundancy is at the root of several problems associated with relational
More informationThe Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram
Schema Refinement and Normal Forms Chapter 19 Database Management Systems, R. Ramakrishnan and J. Gehrke 1 The Evils of Redundancy Redundancy is at the root of several problems associated with relational
More informationJournal of Computer and System Sciences
Journal of Computer and System Sciences 78 (2012) 1026 1044 Contents lists available at SciVerse ScienceDirect Journal of Computer and System Sciences www.elsevier.com/locate/jcss Characterisations of
More informationRelational Design Theory
Relational Design Theory CSE462 Database Concepts Demian Lessa/Jan Chomicki Department of Computer Science and Engineering State University of New York, Buffalo Fall 2013 Overview How does one design a
More informationReasoning Under Uncertainty: Conditional Probability
Reasoning Under Uncertainty: Conditional Probability CPSC 322 Uncertainty 2 Textbook 6.1 Reasoning Under Uncertainty: Conditional Probability CPSC 322 Uncertainty 2, Slide 1 Lecture Overview 1 Recap 2
More informationSchema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004
Schema Refinement and Normal Forms CIS 330, Spring 2004 Lecture 11 March 2, 2004 1 The Evils of Redundancy Redundancy is at the root of several problems associated with relational schemas: redundant storage,
More informationCSC 261/461 Database Systems Lecture 13. Spring 2018
CSC 261/461 Database Systems Lecture 13 Spring 2018 BCNF Decomposition Algorithm BCNFDecomp(R): Find X s.t.: X + X and X + [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose
More informationA Toolbox of Query Evaluation Techniques for Probabilistic Databases
2nd Workshop on Management and mining Of UNcertain Data (MOUND) Long Beach, March 1st, 2010 A Toolbox of Query Evaluation Techniques for Probabilistic Databases Dan Olteanu, Oxford University Computing
More informationSebastian Link University of Auckland, Auckland, New Zealand Henri Prade IRIT, CNRS and Université de Toulouse III, Toulouse, France
CDMTCS Research Report Series Relational Database Schema Design for Uncertain Data Sebastian Link University of Auckland, Auckland, New Zealand Henri Prade IRIT, CNRS and Université de Toulouse III, Toulouse,
More informationCS122A: Introduction to Data Management. Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li
CS122A: Introduction to Data Management Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li 1 Third Normal Form (3NF) v Relation R is in 3NF if it is in 2NF and it has no transitive dependencies
More informationDatabase Design and Normalization
Database Design and Normalization Chapter 12 (Week 13) EE562 Slides and Modified Slides from Database Management Systems, R. Ramakrishnan 1 Multivalued Dependencies Employee Child Salary Year Hilbert Hubert
More informationDatabase Design and Normalization
Database Design and Normalization Chapter 11 (Week 12) EE562 Slides and Modified Slides from Database Management Systems, R. Ramakrishnan 1 1NF FIRST S# Status City P# Qty S1 20 London P1 300 S1 20 London
More informationProbabilistic Databases
Probabilistic Databases Dan Olteanu (Oxford) London, November 2017 The National Archives Probabilistic Databases For the purpose of this introductory talk: Probabilistic data = Relational data + Probabilities
More informationTruth-Functional Logic
Truth-Functional Logic Syntax Every atomic sentence (A, B, C, ) is a sentence and are sentences With ϕ a sentence, the negation ϕ is a sentence With ϕ and ψ sentences, the conjunction ϕ ψ is a sentence
More informationDatabases 2012 Normalization
Databases 2012 Christian S. Jensen Computer Science, Aarhus University Overview Review of redundancy anomalies and decomposition Boyce-Codd Normal Form Motivation for Third Normal Form Third Normal Form
More informationSchema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19
Schema Refinement and Normal Forms [R&G] Chapter 19 CS432 1 The Evils of Redundancy Redundancy is at the root of several problems associated with relational schemas: redundant storage, insert/delete/update
More informationCS54100: Database Systems
CS54100: Database Systems Keys and Dependencies 18 January 2012 Prof. Chris Clifton Functional Dependencies X A = assertion about a relation R that whenever two tuples agree on all the attributes of X,
More informationKU Leuven Department of Computer Science
Implementation and Performance of Probabilistic Inference Pipelines: Data and Results Dimitar Shterionov Gerda Janssens Report CW 679, March 2015 KU Leuven Department of Computer Science Celestijnenlaan
More informationThe Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd.
The Evils of Redundancy Schema Refinement and Normal Forms INFO 330, Fall 2006 1 Redundancy is at the root of several problems associated with relational schemas: redundant storage, insert/delete/update
More informationSchema Refinement and Normal Forms
Schema Refinement and Normal Forms Yanlei Diao UMass Amherst April 10 & 15, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke 1 Case Study: The Internet Shop DBDudes Inc.: a well-known database consulting
More informationRelational completeness of query languages for annotated databases
Relational completeness of query languages for annotated databases Floris Geerts 1,2 and Jan Van den Bussche 1 1 Hasselt University/Transnational University Limburg 2 University of Edinburgh Abstract.
More informationarxiv: v1 [cs.db] 21 Sep 2016
Ladan Golshanara 1, Jan Chomicki 1, and Wang-Chiew Tan 2 1 State University of New York at Buffalo, NY, USA ladangol@buffalo.edu, chomicki@buffalo.edu 2 Recruit Institute of Technology and UC Santa Cruz,
More informationReasoning Under Uncertainty: Introduction to Probability
Reasoning Under Uncertainty: Introduction to Probability CPSC 322 Lecture 23 March 12, 2007 Textbook 9 Reasoning Under Uncertainty: Introduction to Probability CPSC 322 Lecture 23, Slide 1 Lecture Overview
More informationEE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS
EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS Lecture 16, 6/1/2005 University of Washington, Department of Electrical Engineering Spring 2005 Instructor: Professor Jeff A. Bilmes Uncertainty & Bayesian Networks
More informationIntroduction to Data Management CSE 344
Introduction to Data Management CSE 344 Lecture 18: Design Theory Wrap-up 1 Announcements WQ6 is due on Tuesday Homework 6 is due on Thursday Be careful about your remaining late days. Today: Midterm review
More informationSchema Refinement. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke
Schema Refinement Yanlei Diao UMass Amherst Slides Courtesy of R. Ramakrishnan and J. Gehrke 1 Revisit a Previous Example ssn name Lot Employees rating hourly_wages hours_worked ISA contractid Hourly_Emps
More informationIncomplete Information in RDF
Incomplete Information in RDF Charalampos Nikolaou and Manolis Koubarakis charnik@di.uoa.gr koubarak@di.uoa.gr Department of Informatics and Telecommunications National and Kapodistrian University of Athens
More informationReasoning under Uncertainty: Intro to Probability
Reasoning under Uncertainty: Intro to Probability Computer Science cpsc322, Lecture 24 (Textbook Chpt 6.1, 6.1.1) March, 15, 2010 CPSC 322, Lecture 24 Slide 1 To complete your Learning about Logics Review
More informationDatabase Design: Normal Forms as Quality Criteria. Functional Dependencies Normal Forms Design and Normal forms
Database Design: Normal Forms as Quality Criteria Functional Dependencies Normal Forms Design and Normal forms Design Quality: Introduction Good conceptual model: - Many alternatives - Informal guidelines
More informationA Comparative Study of Noncontextual and Contextual Dependencies
A Comparative Study of Noncontextual Contextual Dependencies S.K.M. Wong 1 C.J. Butz 2 1 Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 E-mail: wong@cs.uregina.ca
More informationPractice and Applications of Data Management CMPSCI 345. Lecture 16: Schema Design and Normalization
Practice and Applications of Data Management CMPSCI 345 Lecture 16: Schema Design and Normalization Keys } A superkey is a set of a/ributes A 1,..., A n s.t. for any other a/ribute B, we have A 1,...,
More informationSchema Refinement and Normalization
Schema Refinement and Normalization Schema Refinements and FDs Redundancy is at the root of several problems associated with relational schemas. redundant storage, I/D/U anomalies Integrity constraints,
More informationOn the logical Implication of Multivalued Dependencies with Null Values
On the logical Implication of Multivalued Dependencies with Null Values Sebastian Link Department of Information Systems, Information Science Research Centre Massey University, Palmerston North, New Zealand
More informationChapter 3 Design Theory for Relational Databases
1 Chapter 3 Design Theory for Relational Databases Contents Functional Dependencies Decompositions Normal Forms (BCNF, 3NF) Multivalued Dependencies (and 4NF) Reasoning About FD s + MVD s 2 Remember our
More informationDatabase Applications (15-415)
Database Applications (15-415) Relational Calculus Lecture 6, January 26, 2016 Mohammad Hammoud Today Last Session: Relational Algebra Today s Session: Relational calculus Relational tuple calculus Announcements:
More informationDesign by Example for SQL Tables with Functional Dependencies
VLDB Journal manuscript No. (will be inserted by the editor) Design by Example for SQL Tables with Functional Dependencies Sven Hartmann Markus Kirchberg Sebastian Link Received: date / Accepted: date
More informationApproximate Lineage for Probabilistic Databases. University of Washington Technical Report UW TR: #TR
Approximate Lineage for Probabilistic Databases University of Washington Technical Report UW TR: #TR2008-03-02 Christopher Ré and Dan Suciu {chrisre,suciu}@cs.washington.edu Abstract In probabilistic databases,
More information