Probability Background


CS76 Spring 0 Advanced Machine Learning
Lecturer: Xiaojin Zhu, jerryzhu@cs.wisc.edu

1 Probability Measure

A sample space $\Omega$ is the set of all possible outcomes. Elements $\omega \in \Omega$ are called sample outcomes, while subsets $A \subseteq \Omega$ are called events. For example, for a die roll, $\Omega = \{1, 2, 3, 4, 5, 6\}$, $\omega = 5$ is an outcome, $A_1 = \{5\}$ is the event that the outcome is 5, and $A_2 = \{1, 3, 5\}$ is the event that the outcome is odd.

Ideally, one would like to be able to assign probabilities to all $A$s. This is trivial for finite $\Omega$. However, when $\Omega = \mathbb{R}$ strange things happen: it is no longer possible to consistently assign probabilities to all subsets of $\mathbb{R}$. The problem is not singleton sets such as $A = \{\pi\}$, which happily has probability 0 (we will define it formally). The difficulty comes from the existence of non-measurable sets, whose construction is non-trivial. See for example http://www.math.kth.se/matstat/gru/godis/nonmeas.pdf

Instead, we will restrict ourselves to only some of the events. A σ-algebra $\mathcal{B}$ is a set of subsets of $\Omega$ satisfying:

1. $\emptyset \in \mathcal{B}$
2. if $A \in \mathcal{B}$ then $A^c \in \mathcal{B}$ (complementarity)
3. if $A_1, A_2, \ldots \in \mathcal{B}$ then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{B}$ (countable unions)

The sets in $\mathcal{B}$ are called measurable. The pair $(\Omega, \mathcal{B})$ is called a measurable space. For $\Omega = \mathbb{R}$, we take the smallest σ-algebra that contains all the open subsets and call it the Borel σ-algebra. It turns out the Borel σ-algebra can be defined alternatively as the smallest σ-algebra that contains all the closed subsets. Can you see why? It also follows that every singleton set $\{x\}$ with $x \in \mathbb{R}$ is in the Borel σ-algebra.

A measure is a function $P : \mathcal{B} \to \mathbb{R}$ satisfying:

1. $P(A) \ge 0$ for all $A \in \mathcal{B}$
2. If $A_1, A_2, \ldots$ are disjoint then $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$

Note these imply $P(\emptyset) = 0$. The triple $(\Omega, \mathcal{B}, P)$ is called a measure space. The Lebesgue measure is a uniform measure over the Borel σ-algebra with the usual meaning of length, area or volume, depending on dimensionality. For example, for $\mathbb{R}$ the Lebesgue measure of the interval $(a, b)$ is $b - a$.
A probability measure is a measure satisfying additionally the normalization constraint $P(\Omega) = 1$. Such a triple $(\Omega, \mathcal{B}, P)$ is called a probability space. When $\Omega$ is finite, $P(\{\omega\})$ has the intuitive meaning: the chance that the outcome is $\omega$. When $\Omega = \mathbb{R}$, $P((a, b))$ is the probability mass assigned to the interval $(a, b)$.
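The definitions above can be made concrete for the die roll. The following is a minimal sketch, assuming a fair die (the uniform measure is an assumption made here for illustration) and using the power set as the σ-algebra; the helper names `power_set`, `prob`, and `is_sigma_algebra` are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

OMEGA = frozenset({1, 2, 3, 4, 5, 6})  # sample space of one die roll


def power_set(omega):
    """All subsets of a finite sample space: the largest sigma-algebra on it."""
    items = sorted(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}


def prob(event):
    """Uniform probability measure P(A) = |A| / |Omega| (fair-die assumption)."""
    return Fraction(len(frozenset(event) & OMEGA), len(OMEGA))


def is_sigma_algebra(family, omega):
    """Check the three axioms; pairwise unions suffice for a finite family."""
    has_empty = frozenset() in family
    closed_complement = all(omega - a in family for a in family)
    closed_union = all(a | b in family for a in family for b in family)
    return has_empty and closed_complement and closed_union


if __name__ == "__main__":
    B = power_set(OMEGA)
    odd = frozenset({1, 3, 5})
    print(len(B), is_sigma_algebra(B, OMEGA), prob(odd))
```

Exact rational arithmetic (`Fraction`) is used so that additivity over disjoint events can be checked with equality rather than a floating-point tolerance. Note that a much smaller family such as {∅, odd, even, Ω} also passes `is_sigma_algebra`, which mirrors the idea of restricting attention to only some events.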

2 Random Variables

Let $(\Omega, \mathcal{B}, P)$ be a probability space. Let $(\mathbb{R}, \mathcal{R})$ be the usual measurable space of reals and its Borel σ-algebra. A random variable is a function $X : \Omega \to \mathbb{R}$ such that the preimage of any set $A \in \mathcal{R}$ is measurable in $\mathcal{B}$: $X^{-1}(A) = \{\omega : X(\omega) \in A\} \in \mathcal{B}$. This allows us to define the following (the first $P$ is the new definition, while the 2nd and 3rd $P$s are the already-defined probability measure on $\mathcal{B}$):

$$P(X \in A) = P(X^{-1}(A)) = P(\{\omega : X(\omega) \in A\}) \tag{1}$$
$$P(X = x) = P(X^{-1}(x)) = P(\{\omega : X(\omega) = x\}) \tag{2}$$

A random variable is a function. Intuitively, the world generates random outcomes $\omega$, while a random variable deterministically translates them into numbers.

Example 1 Let $\Omega = \{(x, y) : x^2 + y^2 \le 1\}$ be the unit disk. Consider drawing a point at random from $\Omega$. The outcome is of the form $\omega = (x, y)$. Some example random variables are $X(\omega) = x$, $Y(\omega) = y$, $Z(\omega) = xy$, and $W(\omega) = x + y$.

Example 2 Let $X = \omega$ be a uniform random variable (to be defined later) with the sample space $\Omega = [0, 1]$. A sequence of different random variables $\{X_n\}_{n=1}^{\infty}$ can be defined as follows, where $\mathbb{1}\{z\} = 1$ if $z$ is true, and 0 otherwise:

$$X_1(\omega) = \omega + \mathbb{1}\{\omega \in [0, 1]\} \tag{3}$$
$$X_2(\omega) = \omega + \mathbb{1}\{\omega \in [0, \tfrac{1}{2}]\} \tag{4}$$
$$X_3(\omega) = \omega + \mathbb{1}\{\omega \in [\tfrac{1}{2}, 1]\} \tag{5}$$
$$X_4(\omega) = \omega + \mathbb{1}\{\omega \in [0, \tfrac{1}{3}]\} \tag{6}$$
$$X_5(\omega) = \omega + \mathbb{1}\{\omega \in [\tfrac{1}{3}, \tfrac{2}{3}]\} \tag{7}$$
$$X_6(\omega) = \omega + \mathbb{1}\{\omega \in [\tfrac{2}{3}, 1]\} \tag{8}$$
$$\ldots$$

Given a random variable $X$, the cumulative distribution function (CDF) is the function $F_X : \mathbb{R} \to [0, 1]$

$$F_X(x) = P(X \le x) = P(X \in (-\infty, x]) = P(\{\omega : X(\omega) \in (-\infty, x]\}). \tag{9}$$

A random variable $X$ is discrete if it takes countably many values. We define the probability mass function $f_X(x) = P(X = x)$.

A random variable $X$ is continuous if there exists a function $f_X$ such that

1. $f_X(x) \ge 0$ for all $x \in \mathbb{R}$
2. $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
3. for every $a \le b$, $P(X \in [a, b]) = \int_a^b f_X(x)\,dx$.

The function $f_X$ is called the probability density function (PDF) of $X$. CDF and PDF are related by

$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt \tag{10}$$
$$f_X(x) = F_X'(x) \text{ at all points } x \text{ at which } F_X \text{ is differentiable.} \tag{11}$$

It is certainly possible for the PDF of a continuous random variable to be larger than one: $f_X(x) > 1$.
In fact, the PDF can be unbounded, as in $f_X(x) = \frac{2}{3} x^{-1/3}$ for $x \in (0, 1)$ and 0 otherwise. However, $P(X = x) = 0$ for all $x$. Recall that $P$ and $f_X$ are related by integration over an interval. We will often use $p$ instead of $f_X$ to denote a PDF later in class.
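The relation between PDF and CDF can be checked numerically. The sketch below assumes the illustrative density $f(x) = \frac{2}{3} x^{-1/3}$ on $(0, 1)$, whose CDF is $F(x) = x^{2/3}$; this choice of density, the midpoint-rule integrator, and all function names are assumptions made for the example.

```python
import random


def pdf(x):
    """Assumed density f(x) = (2/3) x^(-1/3) on (0, 1), zero elsewhere."""
    return (2.0 / 3.0) * x ** (-1.0 / 3.0) if 0.0 < x < 1.0 else 0.0


def cdf(x):
    """Closed-form CDF F(x) = x^(2/3) on [0, 1], obtained by integrating pdf."""
    if x <= 0.0:
        return 0.0
    return min(x, 1.0) ** (2.0 / 3.0)


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def sample(rng):
    """Inverse-CDF sampling: if U ~ uniform(0, 1) then U^(3/2) has CDF x^(2/3)."""
    return rng.random() ** 1.5


if __name__ == "__main__":
    print("density near 0:", pdf(1e-6))  # far above 1, yet still a valid PDF
    print("integral over [0.1, 0.5]:", integrate(pdf, 0.1, 0.5))
    print("F(0.5) - F(0.1):", cdf(0.5) - cdf(0.1))
    rng = random.Random(0)
    draws = [sample(rng) for _ in range(20_000)]
    print("empirical P(X <= 0.125):", sum(d <= 0.125 for d in draws) / len(draws))
```

The density takes values far above one near zero, yet every interval probability stays at most one because probability is the integral of the density, not its height.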

3 Some Random Variables

3.1 Discrete

Dirac or point mass distribution. $X \sim \delta_a$ if $P(X = a) = 1$, with CDF $F(x) = 0$ if $x < a$ and $1$ if $x \ge a$.

Binomial. A random variable $X$ has a binomial distribution with parameters $n$ (number of trials) and $p$ (head probability), $X \sim \text{Binomial}(n, p)$, with probability mass function

$$f(x) = \begin{cases} \binom{n}{x} p^x (1 - p)^{n - x} & \text{for } x = 0, 1, \ldots, n \\ 0 & \text{otherwise} \end{cases} \tag{12}$$

If $X_1 \sim \text{Binomial}(n_1, p)$ and $X_2 \sim \text{Binomial}(n_2, p)$ are independent, then $X_1 + X_2 \sim \text{Binomial}(n_1 + n_2, p)$. Think of this as merging two coin flip experiments on the same coin. The binomial distribution is used to model the test set error of a classifier. Assuming a classifier's true error rate is $p$ (with respect to the unknown underlying joint distribution; we will make it precise later in class), then on a test set of size $n$ the number of misclassified items follows $\text{Binomial}(n, p)$.

Bernoulli. Binomial with $n = 1$.

Multinomial. The $d$-dimensional version of binomial. The parameter $p = (p_1, \ldots, p_d)$ is now the probabilities of a $d$-sided die, and $x = (x_1, \ldots, x_d)$ is the counts of each face.

$$f(x) = \begin{cases} \binom{n}{x_1, \ldots, x_d} \prod_{k=1}^{d} p_k^{x_k} & \text{if } \sum_{k=1}^{d} x_k = n \\ 0 & \text{otherwise} \end{cases} \tag{13}$$

The multinomial distribution is typically used with the bag-of-word representation of text documents.

Poisson. $X \sim \text{Poisson}(\lambda)$ if $f(x) = e^{-\lambda} \frac{\lambda^x}{x!}$ for $x = 0, 1, 2, \ldots$. If $X_1 \sim \text{Poisson}(\lambda_1)$ and $X_2 \sim \text{Poisson}(\lambda_2)$ are independent, then $X_1 + X_2 \sim \text{Poisson}(\lambda_1 + \lambda_2)$. This is a distribution on unbounded counts with a probability mass function "hump" (the mode, not the mean, is at $\lceil \lambda \rceil - 1$). It can be used, for example, to model the length of a document.

Geometric. $X \sim \text{Geom}(p)$ if $f(x) = p (1 - p)^{x - 1}$ for $x = 1, 2, \ldots$. This is another distribution on unbounded counts. Its probability mass function has no "hump" (the mode, not the mean, is at 1) and decreases monotonically. Think of it as a stick breaking procedure: start with a stick of length 1. Each day you take away a $p$ fraction of the remaining stick. Then $f(x)$ is the length you get on day $x$. We will see more interesting stick breaking in Bayesian nonparametrics.

3.2 Continuous

Gaussian (also called Normal distribution).
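$X \sim N(\mu, \sigma^2)$ with parameters $\mu \in \mathbb{R}$ (the mean) and $\sigma^2$ (the variance) if

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right). \tag{14}$$

The square root of the variance, $\sigma > 0$, is called the standard deviation. If $\mu = 0$, $\sigma = 1$, $X$ has a standard normal distribution. In this case, $X$ is usually written as $Z$. Some useful properties:

(Scaling) If $X \sim N(\mu, \sigma^2)$, then $Z = (X - \mu)/\sigma \sim N(0, 1)$
(Independent sum) If $X_i \sim N(\mu_i, \sigma_i^2)$ are independent, then $\sum_i X_i \sim N\left(\sum_i \mu_i, \sum_i \sigma_i^2\right)$

Note there is concentration of measure in the independent sum: the spread or stddev grows as $\sqrt{n}$, not $n$.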
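The independent-sum property and the $\sqrt{n}$ growth of the spread can be illustrated with a short simulation. This is a sketch only: the function names, the number of trials, and the seed are assumptions, and `random.gauss` from the Python standard library stands in for draws from $N(0, 1)$.

```python
import math
import random
import statistics


def sum_of_gaussians(params):
    """(mean, variance) of a sum of independent N(mu_i, sigma_i^2) variables."""
    mean = sum(mu for mu, _ in params)
    variance = sum(var for _, var in params)
    return mean, variance


def empirical_std_of_sum(n, trials, rng):
    """Sample standard deviation of the sum of n iid N(0, 1) draws."""
    sums = [sum(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(trials)]
    return statistics.stdev(sums)


if __name__ == "__main__":
    n = 100
    _, variance = sum_of_gaussians([(0.0, 1.0)] * n)
    print("theoretical stddev:", math.sqrt(variance))  # sqrt(n), not n
    print("simulated stddev:", empirical_std_of_sum(n, 2000, random.Random(0)))
```

With $n = 100$ standard normal terms the theoretical standard deviation of the sum is 10, an order of magnitude below $n$ itself.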

This is one PDF you need to pay close attention to. The CDF of a standard normal is usually written as $\Phi(z)$, which has no closed-form expression. However, it qualitatively resembles a sigmoid function and is often used in translating unbounded real "margin" values into probabilities of binary classes. You might recognize the RBF kernel as an unnormalized version of the Gaussian PDF. The parameter $\sigma^2$ (or, confusingly, $\sigma$, or a scaled version of either) is called the bandwidth.

$\chi^2$ distribution. If $Z_1, \ldots, Z_k$ are independent standard normal random variables, then $Y = \sum_{i=1}^{k} Z_i^2$ has a $\chi^2$ distribution with $k$ degrees of freedom. If $X_i \sim N(\mu_i, \sigma_i^2)$ are independent, then $\sum_{i=1}^{k} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2$ has a $\chi^2$ distribution with $k$ degrees of freedom. The PDF for the $\chi^2$ distribution with $k$ degrees of freedom is

$$f(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2 - 1} e^{-x/2}, \quad x > 0. \tag{15}$$

Multivariate Gaussian. Let $x, \mu \in \mathbb{R}^d$, and let $\Sigma \in S_+^d$ be a symmetric, positive definite matrix of size $d \times d$. Then $X \sim N(\mu, \Sigma)$ with PDF

$$f(x) = \frac{1}{|\Sigma|^{1/2} (2\pi)^{d/2}} \exp\left(-\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu)\right). \tag{16}$$

Here, $\mu$ is the mean vector, $\Sigma$ is the covariance matrix, $|\Sigma|$ its determinant, and $\Sigma^{-1}$ its inverse (which exists because $\Sigma$ is positive definite). If we have two (groups of) variables that are jointly Gaussian:

$$\begin{bmatrix} x \\ y \end{bmatrix} \sim N\left(\begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \begin{bmatrix} A & C \\ C^\top & B \end{bmatrix}\right) \tag{17}$$

then we have:

(Marginal) $x \sim N(\mu_x, A)$
(Conditional) $y \mid x \sim N\left(\mu_y + C^\top A^{-1} (x - \mu_x),\; B - C^\top A^{-1} C\right)$

You want to pay attention to the multivariate Gaussian too. It is used frequently in mixture models (Mixture of Gaussians), and as a building block for Gaussian Processes. The conditional is useful for inference.

Exponential. $X \sim \text{Exp}(\beta)$ with parameter $\beta > 0$, if $f(x) = \frac{1}{\beta} e^{-x/\beta}$ for $x > 0$. Not to be confused with the exponential family (more later).

Gamma. The Gamma function (not distribution) is defined as $\Gamma(\alpha) = \int_0^{\infty} y^{\alpha - 1} e^{-y}\,dy$ with $\alpha > 0$. The Gamma function is an extension of the factorial function: $\Gamma(n) = (n - 1)!$ when $n$ is a positive integer. It also satisfies $\Gamma(\alpha + 1) = \alpha \Gamma(\alpha)$ for $\alpha > 0$. $X$ has a Gamma distribution with shape parameter $\alpha > 0$ and scale parameter $\beta > 0$, denoted by $X \sim \text{Gamma}(\alpha, \beta)$, if

$$f(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha - 1} e^{-x/\beta}, \quad x > 0. \tag{18}$$

$\text{Gamma}(1, \beta)$ is the same as $\text{Exp}(\beta)$.

Beta. $X \sim \text{Beta}(\alpha, \beta)$ with parameters $\alpha, \beta > 0$, if

$$f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad x \in (0, 1). \tag{19}$$

$\text{Beta}(1, 1)$ is uniform in $[0, 1]$. $\text{Beta}(\alpha < 1, \beta < 1)$ has a U-shape. $\text{Beta}(\alpha > 1, \beta > 1)$ is unimodal with mean $\alpha/(\alpha + \beta)$ and mode $(\alpha - 1)/(\alpha + \beta - 2)$. The Beta distribution has a special status in machine learning because it is the conjugate prior of the binomial and Bernoulli distributions. A draw from a beta distribution can be thought of as generating a (biased) coin. A draw from the corresponding Bernoulli distribution can be thought of as a flip of that coin.
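The coin analogy can be sketched in code: a Beta draw produces a coin bias, Bernoulli draws flip that coin, and conjugacy means the posterior over the bias is again a Beta. The update rule used below, Beta(α, β) to Beta(α + heads, β + tails), is the standard conjugate update and is not spelled out in these notes; the function names and the seed are assumptions for illustration.

```python
import random


def beta_mean(a, b):
    """Mean a / (a + b) of Beta(a, b)."""
    return a / (a + b)


def beta_mode(a, b):
    """Mode (a - 1) / (a + b - 2); defined here only for a > 1 and b > 1."""
    if a <= 1 or b <= 1:
        raise ValueError("mode formula requires a > 1 and b > 1")
    return (a - 1) / (a + b - 2)


def posterior(a, b, flips):
    """Beta(a, b) prior plus Bernoulli flips (1 = head) gives Beta(a + heads, b + tails)."""
    heads = sum(flips)
    tails = len(flips) - heads
    return a + heads, b + tails


def draw_coin_and_flip(a, b, n, rng):
    """Generate a coin bias p ~ Beta(a, b), then flip that coin n times."""
    p = rng.betavariate(a, b)
    flips = [1 if rng.random() < p else 0 for _ in range(n)]
    return p, flips


if __name__ == "__main__":
    rng = random.Random(0)
    p, flips = draw_coin_and_flip(2, 2, 1000, rng)
    a_post, b_post = posterior(2, 2, flips)
    print("true bias:", p)
    print("posterior mean:", beta_mean(a_post, b_post))
```

After many flips the posterior mean sits close to the bias of the coin that was drawn, which is the practical payoff of conjugacy: the update is just counting heads and tails.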

Dirichlet. The Dirichlet distribution is the multivariate version of the beta distribution. $X \sim \text{Dir}(\alpha_1, \ldots, \alpha_d)$ with parameters $\alpha_i > 0$, if

$$f(x) = \frac{\Gamma\left(\sum_{i=1}^{d} \alpha_i\right)}{\prod_{i=1}^{d} \Gamma(\alpha_i)} \prod_{i=1}^{d} x_i^{\alpha_i - 1}, \tag{20}$$

where $x = (x_1, \ldots, x_d)$ with $x_i > 0$, $\sum_{i=1}^{d} x_i = 1$. This support is called the open $(d - 1)$ dimensional simplex. The Dirichlet distribution is important for machine learning because it is the conjugate prior for multinomial distributions. The analogy here is that of a dice factory (Dirichlet) and die rolls (multinomial). It is used extensively for modeling bag-of-word documents. We will also see it in Dirichlet Processes.

t (or Student's t) distribution. $X \sim t_\nu$ has a t distribution with $\nu$ degrees of freedom, if

$$f(x) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-(\nu + 1)/2}. \tag{21}$$

It is similar to a Gaussian distribution, but with heavier tails (more likely to produce extreme values). When $\nu = \infty$, it becomes a standard Gaussian distribution. "Student" is the pen name of William Gosset, who worked for Guinness; he used a pen name because the brewery did not want its employees to publish any scientific papers for fear of leaking trade secrets (sounds familiar?).

Cauchy. The Cauchy distribution is a special case of the t-distribution with $\nu = 1$. The PDF is

$$f(x) = \frac{1}{\pi\gamma \left(1 + \left(\frac{x - x_0}{\gamma}\right)^2\right)} \tag{22}$$

where $x_0$ is the location parameter for the mode, and $\gamma$ is the scale parameter (the t-distribution with $\nu = 1$ corresponds to $x_0 = 0$, $\gamma = 1$). It is notable for the lack of a mean: $E(X)$ does not exist because $\int |x|\,dF_X(x) = \infty$. Similarly, it has no variance or higher moments. However, the mode and median are both $x_0$. This is a very heavy-tailed distribution compared to the Gaussian, so much so that draws from a Cauchy will occasionally produce very large values and the sample mean never settles.

4 Convergence of Random Variables

Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of random variables, and $X$ be another random variable. Think of $X_n$ as the classifier trained on a training set of size $n$. It is a random variable because the training set is random (an iid sample of the underlying unknown joint distribution $P_{XY}$).
Think of $X$ as the "true" classifier (well, it is a fixed quantity, not random, but that's fine). We are interested in how the classifiers behave as we have more and more training data: Do they get closer to $X$? What do we mean by "close"? These questions are made precise by the different types of convergence. The study of such topics is called large sample theory or asymptotic theory.

$X_n$ converges to $X$ in distribution, written $X_n \rightsquigarrow X$, if

$$\lim_{n \to \infty} F_{X_n}(t) = F(t) \tag{23}$$

at all $t$ where $F$ is continuous. Here, $F_{X_n}$ is the CDF of $X_n$, and $F$ is the CDF of $X$. We expect to see the next outcome in a sequence of random experiments becoming better and better modeled by the probability distribution of $X$. In other words, the probability for $X_n$ to be in a given interval is approximately equal to the probability for $X$ to be in the same interval, as $n$ grows.

Example 3 Let $X_1, \ldots, X_n$ be iid continuous random variables. Then trivially $X_n \rightsquigarrow X_1$. But note $P(X_1 = X_n) = 0$.

It is their distributions that are the same, not their values, which are controlled by different randomness.

Example 4 Let $X_2, \ldots, X_n$ be identical (but not independent) copies of $X_1 \sim N(0, 1)$. Then $X_n \rightsquigarrow Y = -X_1$.

Example 5 Let $X_n \sim \text{uniform}[0, 1/n]$. Then $X_n \rightsquigarrow \delta_0$. This is often written as $X_n \rightsquigarrow 0$. Interestingly, note $F(0) = 1$ but $F_{X_n}(0) = 0$ for all $n$, so $\lim_{n \to \infty} F_{X_n}(0) \ne F(0)$. This does not contradict the definition of convergence in distribution, because $t = 0$ is not a point at which $F$ is continuous.

Example 6 Let $X_n$ have the PDF $f_n(x) = 1 - \cos(2\pi n x)$, $x \in (0, 1)$. Then $X_n \rightsquigarrow \text{uniform}(0, 1)$. Note the PDFs $f_n$ do not converge at all, but the CDFs do.

Theorem 1 (The Central Limit Theorem). Let $X_1, \ldots, X_n$ be iid with finite mean $\mu$ and finite variance $\sigma^2 > 0$. Let $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$. Then

$$Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \rightsquigarrow N(0, 1). \tag{24}$$

Example 7 Let $X_i \sim \text{uniform}(-1, 1)$ for $i = 1, \ldots$. Note $\mu = 0$, $\sigma^2 = 1/3$. Then $Z_n = \sqrt{\frac{3}{n}} \sum_i X_i \rightsquigarrow N(0, 1)$.

Example 8 Let $X_i \sim \text{beta}(\frac{1}{2}, \frac{1}{2})$. Note $\mu = \frac{1}{2}$, $\sigma^2 = \frac{1}{8}$. Then $Z_n = \sqrt{8n} \left(\frac{1}{n} \sum_i X_i - \frac{1}{2}\right) \rightsquigarrow N(0, 1)$.

The Central Limit Theorem is the most widely used application of convergence in distribution. Demo: CLT uniform.m and CLT beta.m

Does the CLT apply to the Cauchy distribution?

$X_n$ converges to $X$ in probability, written $X_n \xrightarrow{P} X$, if for any $\epsilon > 0$

$$\lim_{n \to \infty} P(|X_n - X| > \epsilon) = 0. \tag{25}$$

Convergence in probability is central to machine learning because a very important concept, consistency, is defined using it. Convergence in probability implies convergence in distribution. The reverse is not true in general. However, convergence in distribution to a point mass distribution implies convergence in probability. Example 3, Example 4, and Example 6 do not converge in probability. Example 5 does.

The definition might be easier to understand if we rewrite it as $\lim_{n \to \infty} P(\{\omega : |X_n(\omega) - X(\omega)| > \epsilon\}) = 0$. That is, the fraction of outcomes $\omega$ on which $X_n$ and $X$ disagree must shrink to zero. When $X_n(\omega)$ and $X(\omega)$ do disagree, they can differ by a lot in value. More importantly, note that $X_n(\omega)$ does not need to converge to $X(\omega)$ pointwise for any $\omega$.
This will be the distinguishing property between convergence in probability and convergence almost surely (to be defined shortly).

Example 9 $X_n \xrightarrow{P} X$ in Example 2.

$X_n$ converges almost surely to $X$, written $X_n \xrightarrow{as} X$, if

$$P\left(\left\{\omega : \lim_{n \to \infty} X_n(\omega) = X(\omega)\right\}\right) = 1. \tag{26}$$
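The gap between the two modes of convergence can be seen on the sequence from Example 2, where $X_n(\omega) = \omega$ plus the indicator of a sliding interval: block $k$ consists of $k$ intervals of length $1/k$ that sweep across $[0, 1]$. The sketch below is an illustration under that reading of the example; the function names are hypothetical and exact fractions are used to avoid rounding at interval endpoints.

```python
from fractions import Fraction


def interval(n):
    """Endpoints of the indicator interval used by X_n (1-indexed)."""
    k = 1
    while n > k:  # skip the complete blocks of sizes 1, 2, ..., k - 1
        n -= k
        k += 1
    return Fraction(n - 1, k), Fraction(n, k)


def x_n(omega, n):
    """X_n(omega) = omega + 1{omega in interval(n)}."""
    lo, hi = interval(n)
    return omega + (1 if lo <= omega <= hi else 0)


def prob_disagree(n):
    """P(|X_n - X| > eps) for 0 < eps < 1: the Lebesgue measure of the interval."""
    lo, hi = interval(n)
    return hi - lo


def deviations(omega, n_max):
    """The sequence X_n(omega) - X(omega) for n = 1, ..., n_max."""
    return [x_n(omega, n) - omega for n in range(1, n_max + 1)]


if __name__ == "__main__":
    print("P(disagree) at n = 5050:", prob_disagree(5050))
    devs = deviations(Fraction(3, 10), 5050)
    print("times X_n differs from X at omega = 3/10:", sum(devs))
```

The probability of disagreement shrinks like $1/k$, which gives convergence in probability. For any fixed $\omega$, however, every block contains an interval covering it, so $X_n(\omega) - X(\omega)$ keeps returning to 1 and the pointwise limit does not exist.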

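Example 10 Example 2 does not converge almost surely to $X$ there. This is because $\lim_{n \to \infty} X_n(\omega)$ does not exist for any $\omega$.

Pointwise convergence is central to almost sure convergence ($\xrightarrow{as}$). Almost sure convergence implies convergence in probability ($\xrightarrow{P}$), which in turn implies convergence in distribution ($\rightsquigarrow$).

Example 11 $X_n \xrightarrow{as} 0$ (and hence $X_n \xrightarrow{P} 0$ and $X_n \rightsquigarrow 0$) does not imply convergence in expectation $E(X_n) \to 0$. To see this, let

$$P(X_n) = \begin{cases} 1/n, & \text{if } X_n = n^2 \\ 1 - 1/n, & \text{if } X_n = 0 \end{cases} \tag{27}$$

Then $X_n \xrightarrow{P} 0$, and the convergence is also almost sure if, for instance, $X_n(\omega) = n^2\,\mathbb{1}\{\omega \in [0, 1/n]\}$ on $\Omega = [0, 1]$ with uniform $P$. However, $E(X_n) = \frac{1}{n} n^2 = n$ does not converge.

$X_n$ converges in $r$th mean where $r \ge 1$, written $X_n \xrightarrow{L_r} X$, if

$$\lim_{n \to \infty} E(|X_n - X|^r) = 0. \tag{28}$$

Convergence in $r$th mean implies convergence in probability. $\xrightarrow{L_r}$ implies $\xrightarrow{L_s}$ if $r > s$. There is no general order between convergence in $r$th mean and almost sure convergence.

Theorem 2 (The Weak Law of Large Numbers). If $X_1, \ldots, X_n$ are iid with finite mean $\mu(X_1)$, then $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \xrightarrow{P} \mu(X_1)$.

Theorem 3 (The Strong Law of Large Numbers). If $X_1, \ldots, X_n$ are iid, and $E(|X_1|) < \infty$, then $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \xrightarrow{as} \mu(X_1)$.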
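The laws of large numbers and the Central Limit Theorem can be explored with a small simulation. The sketch below uses uniform(−1, 1) draws, for which $\mu = 0$ and $\sigma^2 = 1/3$, and also generates standard Cauchy draws for contrast. It is written in Python with standard-library calls only and is not the MATLAB demo mentioned in the notes; the function names, sample sizes, and seed are assumptions.

```python
import math
import random
import statistics


def standardized_mean(n, rng):
    """Z_n = sqrt(3 / n) * (sum of n uniform(-1, 1) draws)."""
    total = sum(rng.uniform(-1.0, 1.0) for _ in range(n))
    return math.sqrt(3.0 / n) * total


def clt_summary(n, trials, rng):
    """Mean, variance, and fraction with |Z_n| <= 1.96 over repeated trials."""
    zs = [standardized_mean(n, rng) for _ in range(trials)]
    inside = sum(abs(z) <= 1.96 for z in zs) / trials
    return statistics.fmean(zs), statistics.pvariance(zs), inside


def running_means(draws):
    """Sample means after 1, 2, ..., len(draws) observations."""
    means, total = [], 0.0
    for i, x in enumerate(draws, start=1):
        total += x
        means.append(total / i)
    return means


def cauchy_draw(rng):
    """Standard Cauchy draw via the inverse CDF tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))


if __name__ == "__main__":
    rng = random.Random(0)
    print("CLT summary (mean, variance, coverage):", clt_summary(50, 2000, rng))
    uniform_means = running_means([rng.uniform(-1.0, 1.0) for _ in range(20_000)])
    cauchy_means = running_means([cauchy_draw(rng) for _ in range(20_000)])
    for n in (100, 1000, 20_000):
        print(n, uniform_means[n - 1], cauchy_means[n - 1])
```

For the uniform draws the standardized means should have mean near 0, variance near 1, and roughly 95% of values within ±1.96, while the running mean approaches 0. The Cauchy running means are printed without any expected value, since the notes point out that the Cauchy has no mean and its sample mean never settles.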


Class 26: review for final exam 18.05, Spring 2014

Class 26: review for final exam 18.05, Spring 2014 Probability Class 26: review for final eam 8.05, Spring 204 Counting Sets Inclusion-eclusion principle Rule of product (multiplication rule) Permutation and combinations Basics Outcome, sample space, event

More information

Name: Firas Rassoul-Agha

Name: Firas Rassoul-Agha Midterm 1 - Math 5010 - Spring 016 Name: Firas Rassoul-Agha Solve the following 4 problems. You have to clearly explain your solution. The answer carries no points. Only the work does. CALCULATORS ARE

More information

Probability Theory for Machine Learning. Chris Cremer September 2015

Probability Theory for Machine Learning. Chris Cremer September 2015 Probability Theory for Machine Learning Chris Cremer September 2015 Outline Motivation Probability Definitions and Rules Probability Distributions MLE for Gaussian Parameter Estimation MLE and Least Squares

More information

System Identification

System Identification System Identification Arun K. Tangirala Department of Chemical Engineering IIT Madras July 27, 2013 Module 3 Lecture 1 Arun K. Tangirala System Identification July 27, 2013 1 Objectives of this Module

More information

Lecture 6 Basic Probability

Lecture 6 Basic Probability Lecture 6: Basic Probability 1 of 17 Course: Theory of Probability I Term: Fall 2013 Instructor: Gordan Zitkovic Lecture 6 Basic Probability Probability spaces A mathematical setup behind a probabilistic

More information

Random Variables. Definition: A random variable (r.v.) X on the probability space (Ω, F, P) is a mapping

Random Variables. Definition: A random variable (r.v.) X on the probability space (Ω, F, P) is a mapping Random Variables Example: We roll a fair die 6 times. Suppose we are interested in the number of 5 s in the 6 rolls. Let X = number of 5 s. Then X could be 0, 1, 2, 3, 4, 5, 6. X = 0 corresponds to the

More information

Sample Spaces, Random Variables

Sample Spaces, Random Variables Sample Spaces, Random Variables Moulinath Banerjee University of Michigan August 3, 22 Probabilities In talking about probabilities, the fundamental object is Ω, the sample space. (elements) in Ω are denoted

More information

Contents 1. Contents

Contents 1. Contents Contents 1 Contents 6 Distributions of Functions of Random Variables 2 6.1 Transformation of Discrete r.v.s............. 3 6.2 Method of Distribution Functions............. 6 6.3 Method of Transformations................

More information

Continuous Probability Spaces

Continuous Probability Spaces Continuous Probability Spaces Ω is not countable. Outcomes can be any real number or part of an interval of R, e.g. heights, weights and lifetimes. Can not assign probabilities to each outcome and add

More information

Review of probabilities

Review of probabilities CS 1675 Introduction to Machine Learning Lecture 5 Density estimation Milos Hauskrecht milos@pitt.edu 5329 Sennott Square Review of probabilities 1 robability theory Studies and describes random processes

More information

Lecture 3. Discrete Random Variables

Lecture 3. Discrete Random Variables Math 408 - Mathematical Statistics Lecture 3. Discrete Random Variables January 23, 2013 Konstantin Zuev (USC) Math 408, Lecture 3 January 23, 2013 1 / 14 Agenda Random Variable: Motivation and Definition

More information

X 1 ((, a]) = {ω Ω : X(ω) a} F, which leads us to the following definition:

X 1 ((, a]) = {ω Ω : X(ω) a} F, which leads us to the following definition: nna Janicka Probability Calculus 08/09 Lecture 4. Real-valued Random Variables We already know how to describe the results of a random experiment in terms of a formal mathematical construction, i.e. the

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

STAT2201. Analysis of Engineering & Scientific Data. Unit 3

STAT2201. Analysis of Engineering & Scientific Data. Unit 3 STAT2201 Analysis of Engineering & Scientific Data Unit 3 Slava Vaisman The University of Queensland School of Mathematics and Physics What we learned in Unit 2 (1) We defined a sample space of a random

More information

LIST OF FORMULAS FOR STK1100 AND STK1110

LIST OF FORMULAS FOR STK1100 AND STK1110 LIST OF FORMULAS FOR STK1100 AND STK1110 (Version of 11. November 2015) 1. Probability Let A, B, A 1, A 2,..., B 1, B 2,... be events, that is, subsets of a sample space Ω. a) Axioms: A probability function

More information

Introduction to Probability and Statistics (Continued)

Introduction to Probability and Statistics (Continued) Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:

More information

Chapter 3 sections. SKIP: 3.10 Markov Chains. SKIP: pages Chapter 3 - continued

Chapter 3 sections. SKIP: 3.10 Markov Chains. SKIP: pages Chapter 3 - continued Chapter 3 sections Chapter 3 - continued 3.1 Random Variables and Discrete Distributions 3.2 Continuous Distributions 3.3 The Cumulative Distribution Function 3.4 Bivariate Distributions 3.5 Marginal Distributions

More information

Lecture 2: Review of Probability

Lecture 2: Review of Probability Lecture 2: Review of Probability Zheng Tian Contents 1 Random Variables and Probability Distributions 2 1.1 Defining probabilities and random variables..................... 2 1.2 Probability distributions................................

More information

Math Review Sheet, Fall 2008

Math Review Sheet, Fall 2008 1 Descriptive Statistics Math 3070-5 Review Sheet, Fall 2008 First we need to know about the relationship among Population Samples Objects The distribution of the population can be given in one of the

More information

Statistics 1B. Statistics 1B 1 (1 1)

Statistics 1B. Statistics 1B 1 (1 1) 0. Statistics 1B Statistics 1B 1 (1 1) 0. Lecture 1. Introduction and probability review Lecture 1. Introduction and probability review 2 (1 1) 1. Introduction and probability review 1.1. What is Statistics?

More information

Lecture 1: Probability Fundamentals

Lecture 1: Probability Fundamentals Lecture 1: Probability Fundamentals IB Paper 7: Probability and Statistics Carl Edward Rasmussen Department of Engineering, University of Cambridge January 22nd, 2008 Rasmussen (CUED) Lecture 1: Probability

More information

6.1 Moment Generating and Characteristic Functions

6.1 Moment Generating and Characteristic Functions Chapter 6 Limit Theorems The power statistics can mostly be seen when there is a large collection of data points and we are interested in understanding the macro state of the system, e.g., the average,

More information

Review of probability

Review of probability Review of probability Computer Sciences 760 Spring 2014 http://pages.cs.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts definition of probability random variables

More information

Some Special Distributions (Hogg Chapter Three)

Some Special Distributions (Hogg Chapter Three) Some Special Distributions Hogg Chapter Three STAT 45-: Mathematical Statistics I Fall Semester 5 Contents Binomial & Related Distributions. The Binomial Distribution............... Some advice on calculating

More information

Example 1. The sample space of an experiment where we flip a pair of coins is denoted by:

Example 1. The sample space of an experiment where we flip a pair of coins is denoted by: Chapter 8 Probability 8. Preliminaries Definition (Sample Space). A Sample Space, Ω, is the set of all possible outcomes of an experiment. Such a sample space is considered discrete if Ω has finite cardinality.

More information

1 Introduction. P (n = 1 red ball drawn) =

1 Introduction. P (n = 1 red ball drawn) = Introduction Exercises and outline solutions. Y has a pack of 4 cards (Ace and Queen of clubs, Ace and Queen of Hearts) from which he deals a random of selection 2 to player X. What is the probability

More information

15-388/688 - Practical Data Science: Basic probability. J. Zico Kolter Carnegie Mellon University Spring 2018

15-388/688 - Practical Data Science: Basic probability. J. Zico Kolter Carnegie Mellon University Spring 2018 15-388/688 - Practical Data Science: Basic probability J. Zico Kolter Carnegie Mellon University Spring 2018 1 Announcements Logistics of next few lectures Final project released, proposals/groups due

More information

Probability Distributions Columns (a) through (d)

Probability Distributions Columns (a) through (d) Discrete Probability Distributions Columns (a) through (d) Probability Mass Distribution Description Notes Notation or Density Function --------------------(PMF or PDF)-------------------- (a) (b) (c)

More information

Distributions of Functions of Random Variables. 5.1 Functions of One Random Variable

Distributions of Functions of Random Variables. 5.1 Functions of One Random Variable Distributions of Functions of Random Variables 5.1 Functions of One Random Variable 5.2 Transformations of Two Random Variables 5.3 Several Random Variables 5.4 The Moment-Generating Function Technique

More information

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) D. ARAPURA This is a summary of the essential material covered so far. The final will be cumulative. I ve also included some review problems

More information

Multivariate random variables

Multivariate random variables DS-GA 002 Lecture notes 3 Fall 206 Introduction Multivariate random variables Probabilistic models usually include multiple uncertain numerical quantities. In this section we develop tools to characterize

More information

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities PCMI 207 - Introduction to Random Matrix Theory Handout #2 06.27.207 REVIEW OF PROBABILITY THEORY Chapter - Events and Their Probabilities.. Events as Sets Definition (σ-field). A collection F of subsets

More information

3. Probability and Statistics

3. Probability and Statistics FE661 - Statistical Methods for Financial Engineering 3. Probability and Statistics Jitkomut Songsiri definitions, probability measures conditional expectations correlation and covariance some important

More information

The Exciting Guide To Probability Distributions Part 2. Jamie Frost v1.1

The Exciting Guide To Probability Distributions Part 2. Jamie Frost v1.1 The Exciting Guide To Probability Distributions Part 2 Jamie Frost v. Contents Part 2 A revisit of the multinomial distribution The Dirichlet Distribution The Beta Distribution Conjugate Priors The Gamma

More information

Chapter 1. Probability, Random Variables and Expectations. 1.1 Axiomatic Probability

Chapter 1. Probability, Random Variables and Expectations. 1.1 Axiomatic Probability Chapter 1 Probability, Random Variables and Expectations Note: The primary reference for these notes is Mittelhammer (1999. Other treatments of probability theory include Gallant (1997, Casella & Berger

More information

APM 504: Probability Notes. Jay Taylor Spring Jay Taylor (ASU) APM 504 Fall / 65

APM 504: Probability Notes. Jay Taylor Spring Jay Taylor (ASU) APM 504 Fall / 65 APM 504: Probability Notes Jay Taylor Spring 2015 Jay Taylor (ASU) APM 504 Fall 2013 1 / 65 Outline Outline 1 Probability and Uncertainty 2 Random Variables Discrete Distributions Continuous Distributions

More information

Bayesian Nonparametrics

Bayesian Nonparametrics Bayesian Nonparametrics Lorenzo Rosasco 9.520 Class 18 April 11, 2011 About this class Goal To give an overview of some of the basic concepts in Bayesian Nonparametrics. In particular, to discuss Dirichelet

More information

Central Limit Theorem and the Law of Large Numbers Class 6, Jeremy Orloff and Jonathan Bloom

Central Limit Theorem and the Law of Large Numbers Class 6, Jeremy Orloff and Jonathan Bloom Central Limit Theorem and the Law of Large Numbers Class 6, 8.5 Jeremy Orloff and Jonathan Bloom Learning Goals. Understand the statement of the law of large numbers. 2. Understand the statement of the

More information

15 Discrete Distributions

15 Discrete Distributions Lecture Note 6 Special Distributions (Discrete and Continuous) MIT 4.30 Spring 006 Herman Bennett 5 Discrete Distributions We have already seen the binomial distribution and the uniform distribution. 5.

More information

Set Theory Digression

Set Theory Digression 1 Introduction to Probability 1.1 Basic Rules of Probability Set Theory Digression A set is defined as any collection of objects, which are called points or elements. The biggest possible collection of

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

Probability- the good parts version. I. Random variables and their distributions; continuous random variables.

Probability- the good parts version. I. Random variables and their distributions; continuous random variables. Probability- the good arts version I. Random variables and their distributions; continuous random variables. A random variable (r.v) X is continuous if its distribution is given by a robability density

More information

Review: mostly probability and some statistics

Review: mostly probability and some statistics Review: mostly probability and some statistics C2 1 Content robability (should know already) Axioms and properties Conditional probability and independence Law of Total probability and Bayes theorem Random

More information

MAS223 Statistical Inference and Modelling Exercises

MAS223 Statistical Inference and Modelling Exercises MAS223 Statistical Inference and Modelling Exercises The exercises are grouped into sections, corresponding to chapters of the lecture notes Within each section exercises are divided into warm-up questions,

More information

18.175: Lecture 3 Integration

18.175: Lecture 3 Integration 18.175: Lecture 3 Scott Sheffield MIT Outline Outline Recall definitions Probability space is triple (Ω, F, P) where Ω is sample space, F is set of events (the σ-algebra) and P : F [0, 1] is the probability

More information