Scribe Notes 11/30/07


Lauren Hannah reviewed salient points from "Some Developments of the Blackwell-MacQueen Urn Scheme" by Jim Pitman. The paper links existing ideas on Dirichlet Processes (Chinese Restaurant Process, stick-breaking construction, etc.) to the Poisson-Dirichlet distribution. Lauren reviewed measure theory, described the Poisson-Dirichlet distribution, and reviewed theorems from the paper. Peter Frazier then presented a novel extension from a theorem in the paper to Dirichlet hyperparameters.

1 Measure theory review

E is a state space and ℰ is a σ-algebra (a collection of subsets of E) if

- A ∈ ℰ implies E \ A ∈ ℰ, i.e. the collection of subsets is closed under complements;
- A_1, A_2, ... ∈ ℰ implies ∪_{i=1}^∞ A_i ∈ ℰ, i.e. the collection of subsets is closed under countable unions.

Examples:

- ℰ = {∅, E}, i.e. the trivial σ-algebra
- ℰ = B(E), i.e. the Borel σ-algebra generated by the collection of all open sets, most often on the real line R, denoted by B(R)

(E, ℰ) is a measurable space. µ is a function ℰ → R_+ = [0, ∞]. µ is a measure if

- µ(∅) = 0
- µ(∪_n A_n) = ∑_n µ(A_n) whenever the subsets A_1, A_2, ... are pairwise disjoint (A_i ∩ A_j = ∅ for i ≠ j).

1.1 Lebesgue Measure

On R, Leb(A) is the length of the interval A; on R², the area. Usually denoted by Leb(A).

1.2 Dirac Measure

δ_x(A) = 1 if x ∈ A and 0 if x ∉ A, i.e. a point measure [or delta measure] at the point x.

2 Random measures

Recall that a random variable X is defined as a [deterministic] function that maps an outcome to a state space, e.g. X = f : Ω → R. Assume a state space (E, ℰ) and an outcome [or sample] space (Ω, H). M : Ω × ℰ → R_+ is a random measure if

- for fixed A ∈ ℰ, ω ↦ M(ω, A) is a random variable;
- for fixed ω ∈ Ω, A ↦ M(ω, A) is a measure on ℰ.

To understand the definition above it helps to temporarily drop all notions of randomness. In a purely deterministic world we have two objects: ω, a state of the world, and A, the subset of E (an element of ℰ, the σ-algebra on the state space) that we care about in a random experiment. We additionally have a function M(ω, A) that maps any combination of ω and A to the positive portion of the real line, R_+. In formal measure-theory terms, M is a transition kernel from the product space Ω × ℰ to R_+. See figs. 1, 2, and 3. Randomness comes about by stating that some other agent fixed ω in advance, yielding the observed A and the corresponding M(ω, A); or rather, as Prof. Cinlar likes to say, the Greek goddess of chance, Tyche, fixes ω for you.

Doesn't it seem unnecessarily complex and unproductive to describe the source of randomness in a model through ω and then map it to another variable of interest, A? Why not just represent probabilities, etc., directly on the values of the variable of interest, e.g. through a probability density function? Well, the relationship between states of the world and experiment realizations may be more complicated than just a 1:1 bijection; e.g. consider an experiment where the cardinalities of H and ℰ differ substantially: many states of the world influencing the outcome of a binary-valued variable of interest vs. only two states of the world influencing the outcome. As more complex issues arise, one finds additional comfort modeling a random experiment in measure-theoretic terms. Joint probabilities and conditional expectations may be interpreted as the sequential composition of functions, mapping a value from one space to another, and to the next, and so on. Thus, by positing the existence of ω c. 1920, the forefathers of probability theory (Kolmogorov et al.) allowed us to retain the deterministic tools of math to analyze seemingly random or undeterministic systems. Flowery notions of chance and dice throws were violently usurped, seemingly overnight, by the strict formalism of measure theory and calculus, the very elements of probability theory as we know them today. One can only imagine the bewilderment of haggardly gambling-types upon initial confrontation with a presumably equally-bewildered camp of geeky Russian mathematicians. This mixing phenomenon still resonates within the halls of the Princeton Graduate School, a.k.a. "Where the Nerd World Meets the Third World."
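To make the two bullets in the definition concrete, here is a minimal Python/NumPy sketch (not from the lecture; the particular atoms-and-weights construction, parameters, and seed are arbitrary illustrative choices). It builds a purely atomic random measure M(ω, A) = ∑_i w_i δ_{x_i}(A): fixing ω gives an additive set function on subsets of E = [0, 1], while fixing A and re-drawing ω gives a real-valued random variable.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_omega(n_atoms=5):
    """Tyche's draw: fix omega by sampling random atom locations and weights.
    (Illustrative choice only; any joint distribution of atoms/weights would do.)"""
    locs = rng.uniform(0.0, 1.0, size=n_atoms)   # atom locations x_i in E = [0, 1]
    wts = rng.exponential(1.0, size=n_atoms)     # nonnegative weights w_i
    return locs, wts

def M(omega, A):
    """Random measure M(omega, A) = sum_i w_i * delta_{x_i}(A),
    where the set A is passed as an indicator function on E."""
    locs, wts = omega
    return float(np.sum(wts * np.array([A(x) for x in locs])))

A = lambda x: 0.25 <= x < 0.75   # a fixed measurable subset of E

# Fix omega, vary A: a deterministic measure, hence additive over disjoint sets.
omega = sample_omega()
left = lambda x: 0.25 <= x < 0.50
right = lambda x: 0.50 <= x < 0.75
assert abs(M(omega, A) - (M(omega, left) + M(omega, right))) < 1e-12

# Fix A, vary omega: an R_+-valued random variable.
draws = [M(sample_omega(), A) for _ in range(10_000)]
print("mean of M(., A) over many omegas:", np.mean(draws))   # about 5 * 1.0 * 0.5 = 2.5
```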

Figure 1: Fixing ω ∈ Ω. The resulting cdf of M(ω, ·) on (E, ℰ) is depicted here. The cdf is a deterministic measure dependent on ω.

Figure 2: Fixing A. The area of A is a random variable dependent on the underlying measure, which itself depends on ω.

Figure 3: Fixing A. Let M(ω, ·) be Poisson(λ). The measure of A [the # of dots in A, here 4] is a Poisson-distributed random variable. The dot pattern on E is a realization of the outcome ω.

An example of a random measure that we've seen is a Dirichlet Random Measure (DRM). In a DRM, realizations of M are almost surely purely atomic measures on E. Fixing ω while varying A, we obtain an R_+-valued scalar M(ω, A) for each A, which is a deterministic measure, here a probability distribution, on (E, ℰ). This would look like fig. 4. Similarly [as is often done with Poisson random measures, although not commonly with Dirichlet Processes] we can isolate a specific subset A, i.e. restrict ourselves to only a portion of our state space, while varying ω. This yields an R_+-valued scalar random variable ω ↦ M(ω, A). See fig. 5.

Figure 4: DRM realization fixing ω ∈ Ω. Discrete stick-valued probability atoms sitting on E, summing to 1.

Figure 5: Fixing A.

3 Poisson random measures

Let ν be a measure on (E, ℰ). A random measure N is a Poisson random measure with mean measure ν if

- N(A) is a Poisson random variable:

  P(N(A) = k) = e^{-ν(A)} ν(A)^k / k!

- whenever A_1, A_2, ..., A_n are disjoint, the random variables N(A_1), N(A_2), ..., N(A_n) are independent.
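As a quick sanity check of these two defining properties, here is a sketch (not part of the notes) for the simplest homogeneous case, where the mean measure is ν(dx) = λ dx on E = [0, 1] and NumPy supplies the randomness:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 10.0   # mean measure nu(dx) = lam * dx on E = [0, 1]

def sample_poisson_scatter():
    """One realization omega of the Poisson random measure:
    a Poisson(lam) total number of points, placed uniformly on [0, 1]."""
    n = rng.poisson(lam)
    return rng.uniform(0.0, 1.0, size=n)

def N(points, a, b):
    """N(A) for A = [a, b): the number of points falling in A."""
    return int(np.sum((points >= a) & (points < b)))

# Property 1: N(A) ~ Poisson(nu(A)), so mean and variance both equal nu(A).
counts = [N(sample_poisson_scatter(), 0.2, 0.5) for _ in range(20_000)]
print("empirical mean/var of N([0.2, 0.5)):", np.mean(counts), np.var(counts))
print("nu([0.2, 0.5)) =", lam * 0.3)

# Property 2: counts over disjoint sets are (empirically) uncorrelated.
pairs = np.array([(N(p, 0.0, 0.5), N(p, 0.5, 1.0))
                  for p in (sample_poisson_scatter() for _ in range(20_000))])
print("correlation of counts over disjoint sets:", np.corrcoef(pairs.T)[0, 1])
```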

3.1 Examples of Poisson random measures

Figure 6: Poisson arrival process.

Figure 7: Compound Poisson Process.

Poisson Process. See fig. 6. Interarrival times T_i ~ exp(λ). The number of arrivals in each disjoint subinterval of the space is independent. A is an arbitrary subset, e.g. A = {(X_i, Γ_i) : Γ_i² > X_i²}.

Compound Poisson Process. See fig. 7. The X_i are the Poisson arrival locations and the Γ_i are gamma-distributed values. A is a region of the x-y plane. The # of dots in A is a Poisson random variable with parameter p, where p is the integral of the mean measure over the region; here the mean measure is a Lebesgue measure (λ dt) on the x-axis [time] times a gamma measure on the y-axis [value]. For a pictorial understanding of the y-axis, see fig. 8.
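The claim that the number of dots in a region is Poisson with parameter equal to the mean measure of that region can be checked directly. The sketch below (not from the notes) uses Exponential(1) marks, i.e. the shape-1 special case of the gamma marks in fig. 7, so the mean measure of a rectangle has a closed form:

```python
import numpy as np

rng = np.random.default_rng(2)
lam, T = 4.0, 50.0   # arrival rate and time horizon

def sample_marked_scatter():
    """One realization: Poisson(lam * T) arrivals on [0, T] (equivalently,
    exp(lam) inter-arrival gaps), each carrying an independent Gamma(1, 1) mark."""
    n = rng.poisson(lam * T)
    times = rng.uniform(0.0, T, size=n)
    marks = rng.gamma(shape=1.0, scale=1.0, size=n)   # shape 1 = Exponential(1)
    return times, marks

# Rectangle A = [t1, t2) x [y1, y2) in the time-value plane.
t1, t2, y1, y2 = 10.0, 20.0, 0.5, 2.0

def count_in_A(times, marks):
    return int(np.sum((times >= t1) & (times < t2) & (marks >= y1) & (marks < y2)))

counts = [count_in_A(*sample_marked_scatter()) for _ in range(20_000)]

# Mean measure of A: (Lebesgue on the time axis) x (mark distribution on the value axis).
p = lam * (t2 - t1) * (np.exp(-y1) - np.exp(-y2))
print("empirical mean/var of N(A):", np.mean(counts), np.var(counts))
print("integral of the mean measure over A:", p)   # all three should agree
```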

4 Theorems from the paper

We now review some theorems from the paper.

4.1 Theorems 1 and 2

Let (X_n) be a sample from F. Then

  F | X_1, ..., X_n ~ Dirichlet(µ_n),   µ_n = µ + ∑_{i=1}^n δ(X_i).

Note: here, µ = αG_0 from before.

4.2 Theorem 3

Let µ be a measure with 0 < µ(S) < ∞, ν = µ/θ, θ = µ(S). Let Γ_(1) > Γ_(2) > ... be the points of a Poisson random measure on (0, ∞) with mean measure θ x⁻¹ e^{-x} dx.

To further aid our understanding of what a gamma measure might look like, we isolate the y-axis from fig. 7 above. Plotting the y-values now horizontally, we observe a dense clustering of points [raindrops] near the origin, thinning out as x → ∞. See fig. 8.

Figure 8: The y-values Γ_i in fig. 7 are reordered and labeled Γ_(i), here plotted horizontally. The dotted line, the resulting density of the Γ_(i), is a gamma distribution.

If we were to count the # of points falling in each subinterval of the x-axis, in other words integrate the mean measure θ x⁻¹ e^{-x} dx over the subintervals [ɛ, ∞) and [0, ɛ], we would get

  ∫_ɛ^∞ θ x⁻¹ e^{-x} dx ≤ (θ/ɛ) ∫_ɛ^∞ e^{-x} dx < ∞   (1)

  ∫_0^ɛ θ x⁻¹ e^{-x} dx ≥ θ e^{-ɛ} ∫_0^ɛ x⁻¹ dx = ∞.   (2)

(1) reveals that there are a finite # of points as x → ∞, while (2) reveals that there are an infinite # of points on [0, ɛ]. Note that the ∞ on the right-hand side of (2) does not render the measure invalid, since we are not using the Lebesgue measure but instead the gamma measure. Also note that the factor θ x⁻¹ gives us (1) and (2), since as x → ∞ the mean measure θ x⁻¹ e^{-x} dx → 0 [i.e. the intensity of raindrops decreases].

Peter illustrated how we would simulate these raindrops on a computer:

1. Choose an interval of the x-axis [ɛ_1, ∞). For example, [1, ∞).
2. To determine the # of points in the subinterval, draw a sample N from a Poisson distribution whose rate is the mean measure of the subinterval, i.e. compute ∫_1^∞ θ x⁻¹ e^{-x} dx. Let's assume that N = 3.
3. Obtain the x-values for these N points by taking N samples from this gamma distribution, conditioned on the interval [1, ∞). Here, we sample 3 values from the conditional gamma distribution and plot the 3 points along the x-axis at those values.
4. Choose the next interval of the x-axis [ɛ_2, ɛ_1), e.g. [0.1, 1), and repeat from step 2.

As ɛ → 0, we achieve our x-axis raindrop plot as depicted above. Thus, the density parameter θ x⁻¹ controls both the number of points and the location of the points in each subinterval. Generating a sequence of (x_i, y_i) values using the recipe above for the y_i, while drawing the x_i with exp(λ) gaps, yields the Compound Poisson process depicted above.
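Here is a minimal NumPy version of this recipe (a sketch, not the code shown in lecture; the grid-based inverse-CDF sampler and the finite stand-ins for 0 and ∞ are implementation shortcuts):

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 2.0   # plays the role of the Dirichlet concentration parameter

def intensity(x):
    """Mean-measure density theta * x^{-1} * e^{-x} on (0, infinity)."""
    return theta * np.exp(-x) / x

def sample_interval(a, b, grid_size=10_000):
    """Points of the Poisson random measure falling in [a, b):
    draw a Poisson(mass of [a, b)) count, then place the points by sampling
    the normalized intensity restricted to [a, b) (inverse-CDF on a grid)."""
    x = np.linspace(a, b, grid_size)
    dens = intensity(x)
    mass = float(np.sum(0.5 * (dens[:-1] + dens[1:]) * np.diff(x)))   # integral over [a, b)
    n = rng.poisson(mass)
    cdf = np.cumsum(dens)
    cdf /= cdf[-1]
    return np.interp(rng.uniform(size=n), cdf, x)

# Step through subintervals [eps_k, eps_{k-1}) as eps -> 0, plus a tail interval.
edges = [1e-4, 1e-3, 1e-2, 1e-1, 1.0, 20.0]   # 20.0 stands in for "infinity"
points = np.concatenate([sample_interval(a, b) for a, b in zip(edges[:-1], edges[1:])])

print("total number of raindrops:", len(points))
print("sum of the raindrops (one draw; should behave like Gamma(theta, 1)):", points.sum())
```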

D. Blei provided some intuition relating the θ parameter in this Compound Poisson process to the α parameter as we have traditionally seen it.

- θ, α large: more points having a smaller height. In fig. 7 this would correspond to many raindrops clustered around y = 0, which, in a Dirichlet stick-breaking figure, looks like many low-probability stick lengths spread out evenly along the x-axis.
- θ, α small: fewer points having a larger height. In fig. 7 this would correspond to fewer raindrops around y = 0 and more raindrops around higher y-values, which, in fig. 4, would look like a few highly-varying probability stick lengths with a larger degree of clustering.

Continuing with the theorem, put

  P_i = Γ_(i) / Σ,   Σ = ∑_{i=1}^∞ Γ_(i).

Finally, define

  F = ∑_{i=1}^∞ P_i δ(X̂_i)   (3)

where the X̂_i are i.i.d. (ν), independent also of the Γ_(i). [It may be worth noting that we only use the Poisson random measure to generate the gamma random variables (rather than just drawing gamma random variables) in order to get an ordering of the gamma random variables. Otherwise, try to answer the question: have we drawn the largest random variable yet? The second largest? etc.]

Then F has a Dirichlet(θν) distribution, independently of Σ, which has a gamma(θ) distribution.
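To connect equation (3) back to something checkable, the following hedged sketch (not from the notes) reuses the raindrop recipe, truncates (0, ∞) to [10⁻⁴, 20], attaches uniform tags as the X̂_i, and compares the first two moments of F(A) with those of the Beta(θ ν(A), θ (1 - ν(A))) marginal that a Dirichlet(θν) random measure would have:

```python
import numpy as np

rng = np.random.default_rng(4)
theta = 2.0

# Grid approximation of the mean measure theta x^{-1} e^{-x} dx on [eps, upper].
# (The truncation and the grid inverse-CDF sampler are shortcuts, not part of the theorem.)
eps, upper = 1e-4, 20.0
x = np.linspace(eps, upper, 50_000)
dens = theta * np.exp(-x) / x
mass = float(np.sum(0.5 * (dens[:-1] + dens[1:]) * np.diff(x)))   # total mean measure
cdf = np.cumsum(dens)
cdf /= cdf[-1]

def raindrops():
    """One realization of the (truncated) Poisson random measure's points."""
    n = rng.poisson(mass)
    return np.interp(rng.uniform(size=n), cdf, x)

def draw_F_of_A(nu_A=0.3):
    """One draw of F(A) for F = sum_i P_i delta(X_hat_i), X_hat_i iid nu = U[0, 1],
    and A = [0, nu_A): add up the normalized weights whose atom lands in A."""
    gammas = raindrops()
    weights = gammas / gammas.sum()            # the P_i of equation (3)
    atoms = rng.uniform(size=len(weights))     # the X_hat_i
    return float(weights[atoms < nu_A].sum())

draws = np.array([draw_F_of_A() for _ in range(2_000)])
# If F ~ Dirichlet(theta * nu), then F(A) ~ Beta(theta*nu(A), theta*(1 - nu(A))).
print("empirical mean of F(A):", draws.mean(), "(target 0.3)")
print("empirical var  of F(A):", draws.var(), "(target", 0.3 * 0.7 / (theta + 1.0), ")")
```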

How does a gamma distribution emerge from a Poisson random measure? Specifically, how do we have Σ ~ gamma(θ, 1)? Let f(x) = x and

  Nf = ∫_{R_+} f(x) N(ω, dx)   [N(ω, dx) is a measure]
     = ∑_{i=1}^∞ ∫ x δ_{Γ_i}(dx) = ∑_{i=1}^∞ Γ_i.

Then

  E[e^{-rΣ}] = E[e^{-rNf}]   (4)
             = exp( - ∫_0^∞ θ x⁻¹ e^{-x} (1 - e^{-rx}) dx )   (5)
             = exp( - θ log(1 + r) )   (6)
             = (1/(1 + r))^θ,   so Nf ~ gamma(θ, 1).   (7)

Note: the step (4) → (5) is a property of Poisson random measures: E[e^{-rNf}] = exp( - ∫ ν(dx) (1 - e^{-r f(x)}) ).

Thus, using this setup, we are able to order our stick breaks such that they follow a gamma distribution. Each data point X_i has an associated value Γ_i drawn from a Poisson random measure with mean measure θ x⁻¹ e^{-x} dx. Reordering the Γ_i yields a gamma distribution with parameter θ, as shown in eqn. 7. Integrating Γ_(i) over ω ∈ Ω and normalizing so that the weights sum to 1 gives the posterior probability stick-lengths. The parameter θ of the obtained gamma distribution is exactly the hyperparameter α of the posterior Dirichlet distribution.
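A quick numerical sanity check of the step from (5) to (6), approximating the exponent integral on a finite grid (a sketch; the grid limits and the values of θ and r are arbitrary):

```python
import numpy as np

theta, r = 2.0, 0.7

# Check (5) -> (6): the exponent integral equals log(1 + r).
x = np.linspace(1e-8, 60.0, 2_000_000)          # fine grid standing in for (0, infinity)
integrand = np.exp(-x) * (1.0 - np.exp(-r * x)) / x
integral = float(np.sum(0.5 * (integrand[:-1] + integrand[1:]) * np.diff(x)))

print("integral of x^{-1} e^{-x} (1 - e^{-rx}) dx:", integral)
print("log(1 + r):", np.log(1.0 + r))
print("exp(-theta * integral):", np.exp(-theta * integral))
print("(1 + r)^{-theta}  [gamma(theta, 1) Laplace transform]:", (1.0 + r) ** (-theta))
```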

5 Theorem 5

[Definitions 4 and 6, as well as Theorem 5 and its associated corollaries from the paper, were written out.]

6 Species sampling

We demonstrate how the previous definition can be applied to species sampling models. First, let

- (P_i) be the frequencies (of each species);
- (X̂_i) be the tags (of each species);
- X_n be the species of the nth observation;
- X̃_j be the jth species to be observed;
- N_jn be the number of times the jth species appears in the sample X_1, ..., X_n, i.e. ∑_{m=1}^n 1(X_m = X̃_j);
- K_n be the number of distinct species in the first n observations, i.e. max{j : N_jn > 0}.

Further assume that ν is a uniform distribution U[0, 1]. Using the Blackwell-MacQueen prediction rule, the predictive distribution can be written as

  P(X_{n+1} ∈ A | X_1, ..., X_n, K_n = k) = ∑_{j=1}^k (N_jn / (n + θ)) 1(X̃_j ∈ A) + (θ / (n + θ)) ν(A).   (8)
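A small simulation of the prediction rule (a sketch, not from the notes): it tracks only the tags and the counts N_jn, and compares the number of distinct species it produces against the standard identity E[K_n] = ∑_{i=0}^{n-1} θ/(θ + i) for this urn.

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n = 2.0, 500

def sample_sequence():
    """Sample X_1, ..., X_n from the prediction rule (8) with nu = U[0, 1]:
    a new tag w.p. theta/(i + theta), otherwise an old tag w.p. proportional to its count."""
    tags, counts = [], []
    for i in range(n):
        if rng.uniform() < theta / (i + theta):
            tags.append(rng.uniform())          # new species, tag drawn from nu
            counts.append(1)
        else:
            j = rng.choice(len(tags), p=np.array(counts) / i)   # existing species j
            counts[j] += 1
    return counts

k_draws = [len(sample_sequence()) for _ in range(500)]
expected_k = np.sum(theta / (theta + np.arange(n)))   # E[K_n] = sum_i theta/(theta + i)
print("average number of distinct species K_n:", np.mean(k_draws))
print("theoretical E[K_n]:", expected_k)
```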

Observe that the terms N_jn/(n + θ) and θ/(n + θ) can be seen as functions of the partition. This leads to generalizations of Equation 8 wherein the aforementioned terms are replaced by other functions of the partition. Define N_n ≡ (N_1n, N_2n, ...) ≡ (n_1, ..., n_k) ≡ n, and

  p_j(n) = P(X_{n+1} = X̃_j | N_n = n),   1 ≤ j ≤ k(n) + 1   (9)

  P(X_1 ∈ A) = ν(A).   (10)

With these definitions in place, (X_n) now has a distribution determined by the p_j(n). The Blackwell-MacQueen rule can then be seen as a special case of this formulation, with

  p_j(n) = n_j / (n + θ)   (1 ≤ j ≤ k),   p_j(n) = θ / (n + θ)   (j = k + 1).   (11)

A statement about this more general case is given by Proposition 12. It assumes that

1. (X_n) is an exchangeable sequence;
2. (X_n) obeys the predictive rules given by Equation 9 and Equation 10.

Then the predictive distribution converges (in total variation norm) a.s. as n → ∞ to

  F(·) = ∑_j P_j δ(X̃_j)(·) + (1 - ∑_j P_j) ν(·),   (12)

where the X̃_j are drawn i.i.d. from ν and

  P_j = lim_{n→∞} N_jn / n.   (13)

This proposition implies that

- the number of values may be finite;
- F is almost surely discrete iff ∑_j P_j = 1;
- the form of F is not specified.

7 Exchangeable partition probability functions (EPPFs)

Let [n] = {1, ..., n} be partitioned into k non-empty subsets A_1, ..., A_k. If (X_n) is exchangeable then

  P( ∩_{j=1}^k {X_l = X̃_j, l ∈ A_j} ) = p(#A_1, ..., #A_k),   (14)

where p is symmetric and #A_j denotes the number of elements of A_j. p is called the exchangeable partition probability function (EPPF) derived from the exchangeable sequence (X_n). Now denote

  n = (n_1, ..., n_k, 0, 0, ...)   (15)

  n^{j+} = (n_1, ..., n_j + 1, ..., n_k, 0, ...).   (16)

An EPPF must then satisfy

  p(1) = 1   (17)

  p(n) = ∑_{j=1}^{k(n)+1} p(n^{j+}).   (18)

Therefore, a p defined in such a way is an EPPF of an exchangeable sequence (X_n). Such a p can thus be used to define the predictive rules given in Equation 9 and Equation 10 in such a way as to satisfy the condition for Proposition 12 (Equation 12). The p_j governing the prediction rule can be written in terms of p as

  p_j(n) = p(n^{j+}) / p(n),   1 ≤ j ≤ k(n) + 1,   p(n) > 0.   (19)

The previous statement can be clarified by Proposition 13: corresponding to each pair (p, ν), where p is an EPPF and ν is a diffuse probability distribution, there is a unique distribution for a sampling sequence (X_n) such that p is the EPPF of (X_n) and ν is the distribution of X_1.

The implication of all this is the stronger Theorem 14, which states: given a diffuse probability distribution ν and a sequence of functions p_j which satisfy Equation 15 and Equation 16, let (X_n) be a sequence governed by Equation 9 and Equation 10. (X_n) is exchangeable iff there exists a non-negative, symmetric function p such that Equation 19 holds. Then (X_n) is a sample from F as in Equation 12 and the EPPF of (X_n) is uniquely this p.

Finally, we can once again return to the Blackwell-MacQueen Urn Scheme we are familiar with. For a given θ, we can write the EPPF p_θ as

  p_θ(n) = θ^{k-1} ∏_{i=1}^k (n_i - 1)! / [1 + θ]_{n-1},   (20)

where [x]_m = ∏_{j=0}^{m-1} (x + j). The result of Theorem 14 is that p_θ is the EPPF of a sample from a Dirichlet process with parameter θν.
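Finally, a short plain-Python check of equation (20) against the EPPF consistency condition (18) and the Blackwell-MacQueen weights of (11) via relation (19) (a sketch; the composition n = (3, 1, 2) and θ = 2 are arbitrary choices):

```python
import math

def rising(x, m):
    """[x]_m = x (x + 1) ... (x + m - 1), with [x]_0 = 1."""
    out = 1.0
    for j in range(m):
        out *= x + j
    return out

def eppf(counts, theta):
    """Equation (20): p_theta(n_1, ..., n_k) = theta^(k-1) prod (n_i - 1)! / [1 + theta]_(n-1)."""
    k, n = len(counts), sum(counts)
    num = theta ** (k - 1) * math.prod(math.factorial(c - 1) for c in counts)
    return num / rising(1.0 + theta, n - 1)

theta = 2.0
n = (3, 1, 2)          # block sizes of a partition of [6] into k = 3 blocks
k, total = len(n), sum(n)

# Consistency condition (18): p(n) = sum over the k + 1 ways of placing element n + 1.
succ = [eppf(n[:j] + (n[j] + 1,) + n[j + 1:], theta) for j in range(k)]
succ.append(eppf(n + (1,), theta))           # the new-block case, j = k + 1
print("p(n):", eppf(n, theta))
print("sum_j p(n^{j+}):", sum(succ))         # should equal p(n)

# Relation (19) recovers the Blackwell-MacQueen weights of equation (11).
print("p(n^{j+}) / p(n):", [s / eppf(n, theta) for s in succ])
print("n_j/(n+theta) and theta/(n+theta):",
      [c / (total + theta) for c in n] + [theta / (total + theta)])
```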
