Semantic Foundations for Probabilistic Programming
Chris Heunen (with Ohad Kammar, Sam Staton, Frank Wood, Hongseok Yang)
Semantic foundations

Programs s1, s2, and their composition s1;s2 are interpreted as mathematical objects.

Operational semantics: remember implementation details (efficiency).
Denotational semantics: see what a program does conceptually (correctness).

Motivation:
- Ground the programmer's unspoken intuitions
- Justify, refute, or suggest program transformations
- Understand programming through mathematics, and probability through program equations
Probabilistic programming

Bayes' law:  P(A|B) = P(B|A) P(A) / P(B)
             posterior ∝ likelihood × prior

idealized Anglican = functional programming + sample + observe + normalize
Overview

Interpret types as measurable spaces, e.g. ⟦real⟧ = R.
Interpret open terms as kernels.
Interpret closed terms as measures.
Inference normalizes measures: posterior ∝ likelihood × prior.

[Kozen, Semantics of probabilistic programs, J Comp Syst Sci, 1981]

But:
- Commutativity? Fubini is not true for all kernels.
- Higher order functions? R → R is not a measurable space.
  [Aumann, Borel structures for function spaces, Ill J Math, 1961]
- Extensionality? Recursion?
Example

1. Toss a fair coin to get outcome x
2. Set up exponential decay with rate r depending on x
3. Observe immediate decay
4. What is the outcome x?

    let x = sample(bern(0.5)) in
    let r = if x then 2.0 else 1.0 in
    observe(0.0 from exp(r));
    return x

Two traces:
    x = true:  r = 2.0, score 2, return true
    x = false: r = 1.0, score 1, return false

posterior ∝ likelihood × prior:
    2 × 0.5 : true
    1 × 0.5 : false

The unnormalized weights are P(true) = 1, P(false) = 0.5; dividing by the model evidence (the total score) 1.5 gives P(true) = 66%, P(false) = 33%.

Programs may also sample continuous distributions, so we have to deal with an uncountable number of traces:

    let y = sample(gauss(7,2))
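The two-trace posterior can be checked by exhaustive enumeration. A minimal Python sketch, assuming the score of observe(0.0 from exp(r)) is the density of the exponential distribution at 0.0 (the helper names are ours):

    from math import exp

    def exp_pdf(rate, x):
        # density of the exponential distribution with the given rate
        return rate * exp(-rate * x) if x >= 0 else 0.0

    # Enumerate both traces of the coin/decay program.
    weights = {}
    for x in (True, False):
        prior = 0.5                      # sample(bern(0.5))
        r = 2.0 if x else 1.0            # let r = if x then 2.0 else 1.0
        likelihood = exp_pdf(r, 0.0)     # observe(0.0 from exp(r))
        weights[x] = prior * likelihood  # unnormalized posterior weight

    evidence = sum(weights.values())
    posterior = {x: w / evidence for x, w in weights.items()}
    print(evidence, posterior)  # 1.5 {True: 0.666..., False: 0.333...}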
Measure theory

It is impossible to sample exactly 0.5 from the standard normal distribution, but one samples in the interval (0, 1) with probability around 0.34.

A measurable space is a set X with a family Σ_X of subsets that is closed under countable unions and complements.
A (probability) measure on X is a function p: Σ_X → [0, ∞] that satisfies p(⋃ U_n) = Σ p(U_n) for disjoint U_n (and has p(X) = 1).
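The 0.34 figure is just the standard normal mass of (0, 1), which can be checked with the error function (a quick sketch):

    import math

    def std_normal_cdf(x):
        # CDF of the standard normal via the error function
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    # P(0 < X < 1) for X ~ gauss(0, 1)
    print(std_normal_cdf(1.0) - std_normal_cdf(0.0))  # ~0.3413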
First order language

Types:  A, B ::= R | P(A) | 1 | A × B | Σ_{i∈I} A_i
    R: real numbers; P(A): distributions over A; 1 and A × B: finite products; Σ_{i∈I} A_i: countable sums
    bool := 1 + 1,  nat := Σ_{i∈N} 1

Deterministic terms may not sample:
- variables x, y, z
- constructors for sums and products: case, in_i, if, false, true
- measurable functions: bern, exp, gauss, dirac, ...

    ⊢d 42.0 : R
    ⊢d gauss(2.0, 7.0) : P(R)
    x: R, y: R ⊢d x + y : R
    x: R, y: R ⊢d x < y : bool

Probabilistic terms may sample:
- sequencing: return, let
- constraints: score
- priors: sample

    Γ ⊢d t : A                         ⟹  Γ ⊢p return(t) : A
    Γ ⊢d t : R                         ⟹  Γ ⊢p score(t) : 1
    Γ ⊢p t : A  and  Γ, x: A ⊢p u : B  ⟹  Γ ⊢p let x = t in u : B
    Γ ⊢d t : P(A)                      ⟹  Γ ⊢p sample(t) : A

Inference is performed by norm (below).
First order semantics

Interpret:
- a type A as a measurable space ⟦A⟧
- a deterministic term Γ ⊢d t : A as a measurable function ⟦t⟧: ⟦Γ⟧ → ⟦A⟧
- a probabilistic term Γ ⊢p t : A as a kernel ⟦t⟧: ⟦Γ⟧ × Σ_⟦A⟧ → [0, ∞]
  (fixing the first argument gives a measure; fixing the second, a measurable function)

    ⟦score(t)⟧(γ, ⋆) = ⟦t⟧(γ)
    ⟦sample(t)⟧(γ, U) = (⟦t⟧(γ))(U)
    ⟦let x = t in u⟧(γ, U) = ∫_⟦A⟧ ⟦u⟧(γ, x, U) ⟦t⟧(γ, dx)    (in short: ⟦let x = t in u⟧ = ∫_A u dt)
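For intuition, here is a toy model of this semantics for the discrete fragment only, representing a kernel as a Python function from an environment to a dictionary of unnormalized weights (the combinator names and encoding are ours, not the talk's construction):

    def ret(t):
        # return(t): point mass at the value of t
        return lambda env: {t(env): 1.0}

    def score(t):
        # score(t): the one-point space (), weighted by t(env)
        return lambda env: {(): t(env)}

    def sample(t):
        # sample(t): t(env) is itself a distribution, given as a weight dict
        return lambda env: dict(t(env))

    def let(t, u):
        # let x = t in u: integrate u against t (here, a weighted sum)
        def kernel(env):
            out = {}
            for x, w in t(env).items():
                for y, v in u(env, x).items():
                    out[y] = out.get(y, 0.0) + w * v
            return out
        return kernel

    # The coin/decay example, with the observe desugared to a score:
    coin = sample(lambda env: {True: 0.5, False: 0.5})
    def body(env, x):
        weight = score(lambda _: 2.0 if x else 1.0)
        return let(weight, lambda e, _unit: ret(lambda _: x)(e))(env)

    print(let(coin, body)(None))  # {True: 1.0, False: 0.5}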
Example

    let x = sample(bern(0.5)) in
    let r = if x then 2.0 else 1.0 in
    observe(0.0 from exp(r));
    return x

The meaning of a program returning values in X is a measure on X:
    ∅ has measure 0.0
    {true} has measure 1.0
    {false} has measure 0.5
    {true, false} has measure 1.5
Normalization: posterior ∝ likelihood × prior

    Γ ⊢p t : A  ⟹  Γ ⊢d norm(t) : (R × P(A)) + errors
                       (model evidence, normalized posterior)

The interpretation of a probabilistic term is a kernel ⟦Γ⟧ × Σ_⟦A⟧ → [0, ∞], so fixing the first argument gives a measure ⟦t⟧(γ, −):
- ⟦t⟧(γ, −) / ⟦t⟧(γ, ⟦A⟧) is the normalized probability measure
- the normalizing constant ⟦t⟧(γ, ⟦A⟧) is the model evidence
- errors occur when the constant is 0 or ∞

For the running example (whose unnormalized measure is true: 1.0, false: 0.5):

    norm( let x = sample(bern(0.5)) in
          let r = if x then 2.0 else 1.0 in
          observe(0.0 from exp(r));
          return x )
    = (1.5, bern(0.66))
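In symbols, for a closed probabilistic term t denoting the measure μ = ⟦t⟧(⋆, −) on ⟦A⟧, the definition on the slides reads (our transcription):

    \[
    \mathrm{norm}(t) =
    \begin{cases}
      \bigl(\mu(A),\; \mu(-)/\mu(A)\bigr) & \text{if } 0 < \mu(A) < \infty,\\
      \text{error} & \text{if } \mu(A) = 0 \text{ or } \mu(A) = \infty.
    \end{cases}
    \]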
Example: sequential Monte Carlo

    norm( let x = t in u )
    = norm( let (e, d) = norm(t) in
            score(e);
            let x = sample(d) in u )
Example: importance sampling

    sample(exp(2))
    = let x = sample(gauss(0,1)) in
      score(exp-pdf(2,x) / gauss-pdf(0,1,x));
      return x
    = let x = sample(gauss(0,1)) in
      score(1 / gauss-pdf(0,1,x));
      score(exp-pdf(2,x));
      return x

Applying norm preserves the first equation:

    norm( sample(exp(2)) )
    = norm( let x = sample(gauss(0,1)) in
            score(exp-pdf(2,x) / gauss-pdf(0,1,x));
            return x )

but normalizing between the two scores of the third program,

    norm( norm( let x = sample(gauss(0,1)) in
                score(1 / gauss-pdf(0,1,x));
                score(exp-pdf(2,x)) );
          return x )

does not give the same result. Don't normalize as you go.
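A Monte Carlo reading of the first equation: sampling from gauss(0,1) and weighting by the density ratio exp-pdf(2,x) / gauss-pdf(0,1,x) reproduces expectations under exp(2). A self-contained sketch (the estimator and helper names are ours; the gaussian proposal has lighter tails than exp(2), so the estimate is noisy):

    import math
    import random

    def exp_pdf(rate, x):
        return rate * math.exp(-rate * x) if x >= 0 else 0.0

    def gauss_pdf(mu, sigma, x):
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

    random.seed(0)
    n = 200_000
    total_w = total_wx = 0.0
    for _ in range(n):
        x = random.gauss(0.0, 1.0)                    # proposal: gauss(0,1)
        w = exp_pdf(2.0, x) / gauss_pdf(0.0, 1.0, x)  # importance weight
        total_w += w
        total_wx += w * x

    # Self-normalized estimate of E[x] under exp(2); the exact value is 0.5.
    print(total_wx / total_w)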
Commutativity

Reordering lines is a very useful program transformation:

    let x = t in          let y = u in
    let y = u in     =    let x = t in
    v                     v

It amounts to Fubini's theorem:

    ∫_A ∫_B v du dt = ∫_B ∫_A v dt du

This is not true for arbitrary kernels, only for s-finite ones:
- a kernel k: ⟦Γ⟧ × Σ_⟦A⟧ → [0, ∞] is bounded if ∃n ∀γ ∀U: k(γ, U) < n
- a kernel is s-finite when it is a countable sum of bounded ones
- a kernel k is s-finite iff it can be built from sub-probability distributions, score, and binding, where k >>= l is (γ, V) ↦ ∫ l(γ, x, V) k(γ, dx)

Measurable spaces and s-finite kernels form a distributive symmetric monoidal category.
Interpret terms as s-finite kernels.
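In the discrete fragment, commutativity is just reordering a finite sum, which can be checked by brute force; the subtlety the talk addresses is the continuous case, where s-finiteness is needed. A small enumeration sketch (the program and names are ours):

    # let x = bern(0.5) in let y = bern(0.3) in score(f(x,y)); return (x,y)
    # versus the same program with the two sampling lines swapped.
    def run(x_first):
        t = {True: 0.5, False: 0.5}               # t = bern(0.5)
        u = {True: 0.3, False: 0.7}               # u = bern(0.3)
        f = lambda x, y: 2.0 if x and y else 1.0  # score depending on both
        out = {}
        outer, inner = (t, u) if x_first else (u, t)
        for a, wa in outer.items():
            for b, wb in inner.items():
                x, y = (a, b) if x_first else (b, a)
                out[(x, y)] = out.get((x, y), 0.0) + wa * wb * f(x, y)
        return out

    print(run(True) == run(False))  # True: both orderings give the same measure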
Example: facts about distributions

    let x = sample(gauss(0.0, 1.0)) in
    return (x < 0)
    = sample(bern(0.5))
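A corresponding numeric check: the standard normal is symmetric about 0, so returning (x < 0) behaves like a fair coin (a quick simulation sketch):

    import random

    random.seed(3)
    n = 200_000
    p = sum(random.gauss(0.0, 1.0) < 0 for _ in range(n)) / n
    print(p)  # ~0.5, i.e. bern(0.5)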
Example: conjugate priors

    let x = sample(beta(1,1)) in        observe(bern(0.5), true);
    observe(bern(x), true);        =    let x = sample(beta(2,1)) in
    return x                            return x
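This equation can be sanity-checked numerically: on the left, x ~ beta(1,1) (uniform) carries likelihood weight x for observing true; on the right, a constant weight 0.5 multiplies samples from beta(2,1). Both posteriors have mean 2/3. A Monte Carlo sketch comparing the means (our check, not from the talk):

    import random

    random.seed(1)
    n = 200_000

    # Left: x ~ beta(1,1) = uniform(0,1), weight = P(true | bern(x)) = x
    total_w = total_wx = 0.0
    for _ in range(n):
        x = random.betavariate(1, 1)
        total_w += x
        total_wx += x * x
    left_mean = total_wx / total_w  # tends to (1/3)/(1/2) = 2/3

    # Right: the constant weight 0.5 cancels; beta(2,1) has mean 2/3
    right_mean = sum(random.betavariate(2, 1) for _ in range(n)) / n

    print(left_mean, right_mean)  # both ~0.667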
Higher order functions

Allow probabilistic terms as input/output for other terms:

    A, B ::= R | P(A) | 1 | A × B | Σ_{i∈I} A_i | A → B

[Roy et al, A stochastic programming perspective on nonparametric Bayes, ICML 2008]

But R → R is not a measurable space.
[Aumann, Borel structures for function spaces, Ill J Math, 1961]

Easy to handle operationally. What to do denotationally?
[Borgström et al, Measure transformer semantics for Bayesian machine learning, ESOP 2011]
Higher order semantics

Use category theory to extend measure theory:
- measurable spaces do not have enough function spaces
- sheaves on measurable spaces (presheaves that preserve countable products) preserve all the structure and have all function spaces
- the Giry monad P(A) on measurable spaces, which interprets the distribution types, extends to sheaves by left Kan extension
  [Power, Generic models for computational effects, Th Comp Sci 2006]

Consequences:
- 1 → (R → R) consists of random functions
- all definable functions R → R are measurable (measurable Ω × R → R; Church-Turing)
- denotational and operational semantics match (soundness & adequacy)
Extensionality

Not extensional: there are f, g: A → B with f ≠ g even though f ∘ p = g ∘ p for all points p: 1 → A.
Solution: restrict to a subcategory that is extensional.

A quasi-measurable space is a set X with a set M_X ⊆ [R → X] satisfying:
- if f: R → R is measurable and g ∈ M_X, then g ∘ f ∈ M_X
- if f: R → X is constant, then f ∈ M_X
- if f: R → N is measurable and g_n ∈ M_X for each n, then t ↦ g_{f(t)}(t) is in M_X

Morphisms are functions f: X → Y such that g ∈ M_X implies f ∘ g ∈ M_Y.

Example: for X a measurable space, take M_X to be the measurable functions R → X; a morphism X → Y is then exactly a measurable function.

Theorem: this gives a cartesian closed category with countable sums.
Corollary: if a term t has first order type, then ⟦t⟧ is measurable, even if t involves higher order functions.

A measure on (X, M_X) is a measure μ on R together with a function f ∈ M_X.
Proposition: measures on [X → Y] are random functions: a measurable map R × X → Y, modulo a measure on R.
Recursion

No recursion / least fixed points yet. Idea: restrict to presheaves over domains.

An ω-complete partial order (ωcpo) has suprema of increasing sequences x_1 ≤ x_2 ≤ x_3 ≤ ...; morphisms preserve suprema of increasing sequences and infima.

A quasi-measurable space is ordered when X is an ωcpo and M_X is closed under pointwise increasing suprema.

Example: any ωcpo, e.g. [0,1]; take M_X to be all measurable functions R → X, where X has the Borel σ-algebra of the Lawson topology.

Theorem: this gives a cartesian closed category with countable sums.
Example: von Neumann's trick

    let g = bern(0.66) in
    letrec f() =
      (let x = sample(g) in
       let y = sample(g) in
       if x = y then f() else return x)
    in f()
    ?= sample(bern(0.5))

(Yes: conditioned on x ≠ y, the two outcomes are equally likely, whatever the bias of g.)
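A quick empirical check of the trick, with the letrec replaced by a loop (a simulation sketch):

    import random

    random.seed(2)

    def biased():
        # g = bern(0.66)
        return random.random() < 0.66

    def von_neumann():
        # sample g twice; retry on agreement, otherwise return the first flip
        while True:
            x, y = biased(), biased()
            if x != y:
                return x

    n = 200_000
    print(sum(von_neumann() for _ in range(n)) / n)  # ~0.5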
Conclusion

Foundational semantics for probabilistic programming:
- continuous distributions
- soft constraints
- commutativity
- higher order functions
- recursion

It can verify and suggest program transformations. Approximations?