Probability Background
CS761 Spring 2010 Advanced Machine Learning

Lecturer: Xiaojin Zhu  jerryzhu@cs.wisc.edu

1 Probability Measure

A sample space Ω is the set of all possible outcomes. Elements ω ∈ Ω are called sample outcomes, while subsets A ⊆ Ω are called events. For example, for a die roll, Ω = {1, 2, 3, 4, 5, 6}; ω = 5 is an outcome; A = {5} is the event that the outcome is 5; and A = {1, 3, 5} is the event that the outcome is odd. Ideally, one would like to be able to assign probabilities to all events A. This is trivial for finite Ω. However, when Ω = R strange things happen: it is no longer possible to consistently assign probabilities to all subsets of R. The problem is not singleton sets such as A = {π}, which happily have probability 0 (we will define this formally). It arises because of the existence of non-measurable sets, whose construction is non-trivial. See for example http://www.math.kth.se/matstat/gru/godis/nonmeas.pdf

Instead, we will restrict ourselves to only some of the events. A σ-algebra B is a set of subsets of Ω satisfying:

1. ∅ ∈ B
2. if A ∈ B then A^c ∈ B (complementarity)
3. if A_1, A_2, ... ∈ B then ∪_{i=1}^∞ A_i ∈ B (countable unions)

The sets in B are called measurable. The pair (Ω, B) is called a measurable space. For Ω = R, we take the smallest σ-algebra that contains all the open subsets and call it the Borel σ-algebra. It turns out the Borel σ-algebra can be defined alternatively as the smallest σ-algebra that contains all the closed subsets. Can you see why? It also follows that every singleton set {x}, x ∈ R, is in the Borel σ-algebra.

A measure is a function µ : B → R satisfying:

1. µ(A) ≥ 0 for all A ∈ B
2. if A_1, A_2, ... are disjoint then µ(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ µ(A_i)

Note these imply µ(∅) = 0. The triple (Ω, B, µ) is called a measure space. The Lebesgue measure is a uniform measure over the Borel σ-algebra with the usual meaning of length, area, or volume, depending on dimensionality. For example, on R the Lebesgue measure of the interval (a, b) is b − a.
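For a finite Ω the full power set is a σ-algebra, and the axioms above can be checked mechanically. The following Python sketch (not part of the original notes; the uniform die measure is our choice) verifies the closure properties and the additivity of a measure over disjoint events:

```python
from itertools import chain, combinations

# Toy check of the sigma-algebra axioms on a finite sample space: a die roll.
# B is the power set of Omega, which is trivially a sigma-algebra.
Omega = frozenset({1, 2, 3, 4, 5, 6})

def powerset(s):
    s = list(s)
    return {frozenset(c)
            for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))}

B = powerset(Omega)

# Axioms: contains the empty set, closed under complement, closed under union.
assert frozenset() in B
assert all(Omega - A in B for A in B)
assert all((A | C) in B for A in B for C in B)

# A probability measure on (Omega, B): uniform weights over the six outcomes.
def P(A):
    return len(A) / len(Omega)

# Additivity over disjoint events, and normalization P(Omega) = 1.
odd = frozenset({1, 3, 5})
assert P(odd) == P(frozenset({1})) + P(frozenset({3})) + P(frozenset({5}))
assert P(Omega) == 1.0
```

For infinite Ω the power set is exactly what fails (non-measurable sets), which is why the Borel σ-algebra is needed; the finite case only illustrates the axioms.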
A probability measure is a measure P satisfying the additional normalization constraint P(Ω) = 1. Such a triple (Ω, B, P) is called a probability space. When Ω is finite, P({ω}) has the intuitive meaning: the chance that the outcome is ω. When Ω = R, P((a, b)) is the probability mass assigned to the interval (a, b).
2 Random Variables

Let (Ω, B, P) be a probability space, and let (R, R) be the usual measurable space of the reals and its Borel σ-algebra. A random variable is a function X : Ω → R such that the preimage of any set A ∈ R is measurable in B: X^{−1}(A) = {ω : X(ω) ∈ A} ∈ B. This allows us to define the following (the first expression is the new definition, while the 2nd and 3rd are the already-defined probability measure on B):

P(X ∈ A) = P(X^{−1}(A)) = P({ω : X(ω) ∈ A})    (1)
P(X = x) = P(X^{−1}(x)) = P({ω : X(ω) = x})    (2)

A random variable is a function. Intuitively, the world generates random outcomes ω, while a random variable deterministically translates them into numbers.

Example 1  Let Ω = {(x, y) : x² + y² ≤ 1} be the unit disk. Consider drawing a point at random from Ω. The outcome is of the form ω = (x, y). Some example random variables are X(ω) = x, Y(ω) = y, Z(ω) = xy, and W(ω) = √(x² + y²).

Example 2  Let X = ω be a uniform random variable (to be defined later) with the sample space Ω = [0, 1]. A sequence of different random variables {X_n}_{n=1}^∞ can be defined as follows, where 1{z} = 1 if z is true, and 0 otherwise:

X_1(ω) = ω + 1{ω ∈ [0, 1]}        (3)
X_2(ω) = ω + 1{ω ∈ [0, 1/2]}      (4)
X_3(ω) = ω + 1{ω ∈ [1/2, 1]}      (5)
X_4(ω) = ω + 1{ω ∈ [0, 1/3]}      (6)
X_5(ω) = ω + 1{ω ∈ [1/3, 2/3]}    (7)
X_6(ω) = ω + 1{ω ∈ [2/3, 1]}      (8)
...

Given a random variable X, the cumulative distribution function (CDF) is the function F_X : R → [0, 1],

F_X(x) = P(X ≤ x) = P(X ∈ (−∞, x]) = P({ω : X(ω) ∈ (−∞, x]}).    (9)

A random variable X is discrete if it takes countably many values. We define the probability mass function f_X(x) = P(X = x). A random variable X is continuous if there exists a function f_X such that:

1. f_X(x) ≥ 0 for all x ∈ R
2. ∫_{−∞}^{∞} f_X(x) dx = 1
3. for every a ≤ b, P(X ∈ [a, b]) = ∫_a^b f_X(x) dx.

The function f_X is called the probability density function (PDF) of X. The CDF and the PDF are related by

F_X(x) = ∫_{−∞}^x f_X(t) dt    (10)
f_X(x) = F′_X(x) at all points x at which F_X is differentiable.    (11)

It is certainly possible for the PDF of a continuous random variable to be larger than one: f_X(x) > 1.
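The point that a density may exceed one while the CDF stays in [0, 1] can be checked numerically. A small Python sketch (our own example, not from the notes), using the uniform density on (0, 0.25), which equals 4 on its support:

```python
# A valid PDF can exceed 1: uniform(0, 0.25) has density 4 on its support,
# yet it integrates to 1 and its CDF stays within [0, 1].

def pdf(x):
    return 4.0 if 0.0 <= x < 0.25 else 0.0

def cdf(x, steps=100000):
    # F_X(x) = integral of the PDF up to x (midpoint Riemann sum from 0 suffices here)
    if x <= 0:
        return 0.0
    h = x / steps
    return sum(pdf((i + 0.5) * h) for i in range(steps)) * h

assert pdf(0.1) > 1                    # a density larger than one is fine
assert abs(cdf(0.25) - 1.0) < 1e-6     # total probability mass is still 1
assert abs(cdf(0.125) - 0.5) < 1e-6    # P(X <= 0.125) = 0.5
```

Probabilities come from integrating the density over intervals, which is why the pointwise value of f_X is not itself a probability.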
In fact, the PDF can even be unbounded; for instance, a density proportional to x^{−1/2} for x ∈ (0, 1) (and 0 otherwise) is a valid PDF that diverges near 0. However, P(X = x) = 0 for all x. Recall that P and f_X are related by integration over an interval. We will often use p instead of f_X to denote a PDF later in class.
3 Some Random Variables

3.1 Discrete

Dirac or point mass distribution. X ∼ δ_a if P(X = a) = 1, with CDF F(x) = 0 if x < a and 1 if x ≥ a.

Binomial. A random variable X has a binomial distribution with parameters n (the number of trials) and p (the head probability), written X ∼ Binomial(n, p), with probability mass function

f(x) = (n choose x) p^x (1 − p)^{n−x} for x = 0, 1, ..., n, and 0 otherwise.    (12)

If X_1 ∼ Binomial(n_1, p) and X_2 ∼ Binomial(n_2, p) then X_1 + X_2 ∼ Binomial(n_1 + n_2, p). Think of this as merging two coin flip experiments on the same coin. The binomial distribution is used to model the test set error of a classifier. Assuming a classifier's true error rate is p (with respect to the unknown underlying joint distribution; we will make this precise later in class), then on a test set of size n the number of misclassified items follows Binomial(n, p).

Bernoulli. Binomial with n = 1.

Multinomial. The d-dimensional version of the binomial. The parameter p = (p_1, ..., p_d) now gives the probabilities of a d-sided die, and x = (x_1, ..., x_d) is the count of each face:

f(x) = (n choose x_1, ..., x_d) ∏_{k=1}^d p_k^{x_k} if Σ_{k=1}^d x_k = n, and 0 otherwise.    (13)

The multinomial distribution is typically used with the bag-of-words representation of text documents.

Poisson. X ∼ Poisson(λ) if f(x) = e^{−λ} λ^x / x! for x = 0, 1, 2, .... If X_1 ∼ Poisson(λ_1) and X_2 ∼ Poisson(λ_2) then X_1 + X_2 ∼ Poisson(λ_1 + λ_2). This is a distribution on unbounded counts whose probability mass function has a hump (its mode is near the mean λ). It can be used, for example, to model the length of a document.

Geometric. X ∼ Geom(p) if f(x) = p(1 − p)^{x−1} for x = 1, 2, .... This is another distribution on unbounded counts. Its probability mass function has no hump (its mode is at 1, not the mean) and decreases monotonically. Think of it as a stick breaking procedure: start with a stick of length 1; each day you take away a p fraction of the remaining stick. Then f(x) is the length you take on day x. We will see more interesting stick breaking in Bayesian nonparametrics.

3.2 Continuous

Gaussian (also called the Normal distribution).
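The binomial merging property can be verified exactly by convolving two pmfs, since P(X_1 + X_2 = s) sums over all ways to split s heads between the two experiments. A Python sketch (the parameter values 3, 5, and 0.3 are arbitrary choices for illustration):

```python
from math import comb

# Verify Binomial(n1, p) + Binomial(n2, p) = Binomial(n1 + n2, p) by direct convolution.

def binom_pmf(n, p, x):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def convolve(n1, n2, p, s):
    # P(X1 + X2 = s): sum over the ways to split s between the two experiments
    return sum(binom_pmf(n1, p, x) * binom_pmf(n2, p, s - x)
               for x in range(max(0, s - n2), min(n1, s) + 1))

p = 0.3
for s in range(9):  # support of Binomial(3 + 5, p)
    assert abs(convolve(3, 5, p, s) - binom_pmf(8, p, s)) < 1e-12
```

The identity underneath is Vandermonde's convolution; note it requires the two experiments to share the same p, matching the "same coin" intuition above.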
X ∼ N(µ, σ²) with parameters µ ∈ R (the mean) and σ² (the variance) if

f(x) = (1/√(2πσ²)) exp(−(x − µ)² / (2σ²)).    (14)

The square root of the variance, σ > 0, is called the standard deviation. If µ = 0 and σ = 1, X has a standard normal distribution; in this case X is usually written as Z. Some useful properties:

(Scaling) If X ∼ N(µ, σ²), then Z = (X − µ)/σ ∼ N(0, 1).
(Independent sum) If the X_i ∼ N(µ_i, σ_i²) are independent, then Σ_i X_i ∼ N(Σ_i µ_i, Σ_i σ_i²).

Note there is concentration of measure in the independent sum: over n iid terms the spread or stddev grows as √n, not n.
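The scaling property means any Gaussian CDF is a shifted and rescaled Φ, and Φ itself, while lacking a closed form, is expressible via the error function. A Python sketch (the values µ = 2, σ = 3 are arbitrary):

```python
from math import erf, sqrt

# Scaling: if X ~ N(mu, sigma^2) then (X - mu)/sigma ~ N(0, 1), so the CDF of X
# at x equals the standard normal CDF Phi at z = (x - mu)/sigma.

def std_normal_cdf(z):
    # Phi(z) has no closed form but can be written with the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_cdf(x, mu, sigma):
    return std_normal_cdf((x - mu) / sigma)

mu, sigma = 2.0, 3.0
assert abs(normal_cdf(mu, mu, sigma) - 0.5) < 1e-12                 # median = mean
assert abs(normal_cdf(mu + sigma, mu, sigma) - std_normal_cdf(1.0)) < 1e-12

# Concentration of measure in the independent sum: the stddev of a sum of
# n iid N(0, sigma^2) terms is sqrt(n) * sigma, far smaller than the n * sigma
# one might naively expect from adding n "spreads".
n = 100
assert sqrt(n * sigma**2) == sqrt(n) * sigma < n * sigma
```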
This is one PDF you need to pay close attention to. The CDF of a standard normal is usually written Φ(z); it has no closed-form expression. However, it qualitatively resembles a sigmoid function and is often used to translate unbounded real margin values into probabilities of binary classes. You might recognize the RBF kernel as an unnormalized version of the Gaussian PDF. The parameter σ (or, confusingly, σ², or a scaled version of either) is called the bandwidth.

χ² distribution. If Z_1, ..., Z_k are independent standard normal random variables, then Y = Σ_{i=1}^k Z_i² has a χ² distribution with k degrees of freedom. If the X_i ∼ N(µ_i, σ_i²) are independent, then Σ_{i=1}^k ((X_i − µ_i)/σ_i)² has a χ² distribution with k degrees of freedom. The PDF of the χ² distribution with k degrees of freedom is

f(x) = (1/(2^{k/2} Γ(k/2))) x^{k/2 − 1} e^{−x/2}, x > 0.    (15)

Multivariate Gaussian. Let x, µ ∈ R^d and let Σ ∈ S_+^d be a symmetric, positive definite matrix of size d × d. Then X ∼ N(µ, Σ) with PDF

f(x) = (1/((2π)^{d/2} |Σ|^{1/2})) exp(−(1/2)(x − µ)^⊤ Σ^{−1} (x − µ)).    (16)

Here, µ is the mean vector, Σ is the covariance matrix, |Σ| its determinant, and Σ^{−1} its inverse (which exists because Σ is positive definite). If we have two (groups of) variables that are jointly Gaussian,

(x, y) ∼ N((µ_x, µ_y), [A, C; C^⊤, B]),

then we have:

(Marginal) x ∼ N(µ_x, A)
(Conditional) y | x ∼ N(µ_y + C^⊤ A^{−1}(x − µ_x), B − C^⊤ A^{−1} C)

You want to pay attention to the multivariate Gaussian too. It is used frequently in mixture models (mixtures of Gaussians) and as a building block for Gaussian Processes. The conditional is useful for inference.

Exponential. X ∼ Exp(β) with parameter β > 0 if

f(x) = (1/β) e^{−x/β}, x > 0.    (17)

Not to be confused with the exponential family (more later).

Gamma. The Gamma function (not distribution) is defined as Γ(α) = ∫_0^∞ y^{α−1} e^{−y} dy with α > 0. The Gamma function is an extension of the factorial function: Γ(n) = (n − 1)! when n is a positive integer. It also satisfies Γ(α + 1) = αΓ(α) for α > 0. X has a Gamma distribution with shape parameter α > 0 and scale parameter β > 0, denoted X ∼ Gamma(α, β), if

f(x) = (1/(Γ(α) β^α)) x^{α−1} e^{−x/β}, x > 0.    (18)

Gamma(1, β) is the same as Exp(β).
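In the bivariate case the blocks A, B, C of the joint covariance are scalars, so the conditioning formula can be written in a few lines. A Python sketch (all numeric values are made up for illustration):

```python
# The Gaussian conditioning formula, specialized to the bivariate case where the
# blocks A (variance of x), B (variance of y), and C (their covariance) are scalars.

def gaussian_conditional(mu_x, mu_y, A, B, C, x):
    """Mean and variance of y | x when (x, y) ~ N([mu_x, mu_y], [[A, C], [C, B]])."""
    mean = mu_y + C / A * (x - mu_x)
    var = B - C / A * C
    return mean, var

mean, var = gaussian_conditional(mu_x=0.0, mu_y=1.0, A=2.0, B=1.0, C=1.0, x=2.0)
assert abs(mean - 2.0) < 1e-12   # 1 + (1/2) * (2 - 0) = 2
assert abs(var - 0.5) < 1e-12    # 1 - (1/2) * 1 = 0.5

# Observing x shifts the mean of y toward the observation (when C > 0) and
# always shrinks its variance: B - C A^{-1} C <= B.
assert var <= 1.0
```

The same two formulas, with matrix inverses in place of scalar division, give the general block case used in Gaussian Process inference.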
Beta. X ∼ Beta(α, β) with parameters α, β > 0, if

f(x) = (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1} (1 − x)^{β−1}, x ∈ (0, 1).    (19)

Beta(1, 1) is the uniform distribution on [0, 1]. Beta(α < 1, β < 1) has a U-shape. Beta(α > 1, β > 1) is unimodal with mean α/(α + β) and mode (α − 1)/(α + β − 2). The Beta distribution has a special status in machine learning because it is the conjugate prior of the binomial and Bernoulli distributions. A draw from a Beta distribution can be thought of as generating a (biased) coin; a draw from the corresponding Bernoulli distribution can be thought of as a flip of that coin.
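Conjugacy makes the Beta posterior update a matter of adding counts: a Beta(a, b) prior combined with h heads and t tails yields a Beta(a + h, b + t) posterior. A Python sketch (the prior Beta(2, 2) and the 7/3 split are made-up numbers):

```python
# Beta-Bernoulli conjugacy: prior Beta(a, b), observe h heads and t tails,
# posterior is Beta(a + h, b + t). Mean and mode follow the formulas above.

def beta_mean(a, b):
    return a / (a + b)

def beta_mode(a, b):
    assert a > 1 and b > 1, "mode formula assumes a unimodal Beta"
    return (a - 1) / (a + b - 2)

def posterior(a, b, heads, tails):
    return a + heads, b + tails

a, b = posterior(2, 2, heads=7, tails=3)   # prior Beta(2, 2), ten coin flips
assert (a, b) == (9, 5)
assert abs(beta_mean(a, b) - 9 / 14) < 1e-12
assert abs(beta_mode(a, b) - 8 / 12) < 1e-12
```

Note how the prior acts as pseudo-counts: Beta(2, 2) behaves like having already seen one head and one tail beyond Beta(1, 1).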
Dirichlet. The Dirichlet distribution is the multivariate version of the Beta distribution. X ∼ Dir(α_1, ..., α_d) with parameters α_i > 0, if

f(x) = (Γ(Σ_{i=1}^d α_i) / ∏_{i=1}^d Γ(α_i)) ∏_{i=1}^d x_i^{α_i − 1},    (20)

where x = (x_1, ..., x_d) with x_i > 0 and Σ_{i=1}^d x_i = 1. This support is called the open (d − 1)-dimensional simplex. The Dirichlet distribution is important for machine learning because it is the conjugate prior for multinomial distributions. The analogy here is that of a dice factory (Dirichlet) and die rolls (multinomial). It is used extensively for modeling bag-of-words documents. We will also see it in Dirichlet Processes.

t (or Student's t) distribution. X ∼ t_ν has a t distribution with ν degrees of freedom if

f(x) = (Γ((ν + 1)/2) / (√(νπ) Γ(ν/2))) (1 + x²/ν)^{−(ν+1)/2}.    (21)

It is similar to a Gaussian distribution, but with heavier tails (more likely to produce extreme values). When ν = ∞, it becomes a standard Gaussian distribution. Student is the pen name of William Gosset, who worked for Guinness; the brewery did not want its employees to publish any scientific papers for fear of leaking trade secrets (sounds familiar?).

Cauchy. The Cauchy distribution is the special case of the t distribution with ν = 1. The PDF is

f(x) = 1 / (πγ [1 + ((x − x_0)/γ)²]),    (22)

where x_0 is the location parameter (the mode) and γ is the scale parameter. It is notable for the lack of a mean: E(X) does not exist because ∫ |x| dF_X(x) = ∞. Similarly, it has no variance or higher moments. However, the mode and median are both x_0. This is a very heavy-tailed distribution compared to the Gaussian, so much so that draws from a Cauchy will occasionally produce very large values and the sample mean never settles.

4 Convergence of Random Variables

Let X_1, X_2, ..., X_n, ... be a sequence of random variables, and let X be another random variable. Think of X_n as the classifier trained on a training set of size n. It is a random variable because the training set is random (an iid sample from the underlying unknown joint distribution P_XY).
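One standard way to sample from a Dirichlet is to draw d independent Gamma(α_i, 1) variables and normalize them; the resulting vector lands on the simplex. A Python sketch using the standard library (the parameter vector (2, 3, 5) is an arbitrary choice):

```python
import random

# Drawing from Dir(alpha_1, ..., alpha_d) by normalizing independent Gamma draws,
# a standard construction: if G_i ~ Gamma(alpha_i, 1), then (G_1, ..., G_d) / sum(G)
# is Dirichlet distributed.

def dirichlet_draw(alphas, rng):
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(0)
x = dirichlet_draw([2.0, 3.0, 5.0], rng)

# The draw lies in the open 2-dimensional simplex: positive coordinates summing to 1.
assert abs(sum(x) - 1.0) < 1e-12
assert all(xi > 0 for xi in x)
```

This is the dice-factory picture made concrete: each call manufactures one 3-sided die, i.e., one multinomial parameter vector.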
Think of X as the true classifier (well, it is a fixed quantity, not random, but that is fine). We are interested in how the classifiers behave as we get more and more training data: Do they get closer to X? What do we mean by close? These questions are made precise by the different types of convergence. The study of such topics is called large sample theory or asymptotic theory.

X_n converges to X in distribution, written X_n →_D X, if

lim_{n→∞} F_{X_n}(t) = F(t)    (23)

at all t where F is continuous. Here, F_{X_n} is the CDF of X_n, and F is the CDF of X. We expect to see the next outcome in a sequence of random experiments becoming better and better modeled by the probability distribution of X. In other words, the probability for X_n to be in a given interval is approximately equal to the probability for X to be in the same interval as n grows.

Example 3  Let X, X_1, ..., X_n be iid continuous random variables. Then trivially X_n →_D X. But note P(X = X_n) = 0.
It is their distributions that are the same, not their values, which are controlled by different randomness.

Example 4  Let X_1, ..., X_n be identical (but not independent) copies of X ∼ N(0, 1). Then X_n →_D Y = −X.

Example 5  Let X_n ∼ uniform[0, 1/n]. Then X_n →_D δ_0. This is often written X_n →_D 0. Interestingly, note F(0) = 1 but F_{X_n}(0) = 0 for all n, so lim F_{X_n}(0) ≠ F(0). This does not contradict the definition of convergence in distribution, because t = 0 is not a point at which F is continuous.

Example 6  Let X_n have the PDF f_n(x) = 1 − cos(2πnx), x ∈ (0, 1). Then X_n →_D uniform(0, 1). Note the PDFs f_n do not converge at all, but the CDFs do.

Theorem 1 (The Central Limit Theorem)  Let X_1, ..., X_n be iid with finite mean µ and finite variance σ² > 0. Let X̄_n = (1/n) Σ_{i=1}^n X_i. Then

Z_n = (X̄_n − µ) / (σ/√n) →_D N(0, 1).    (24)

Example 7  Let X_i ∼ uniform(−1, 1) for i = 1, 2, .... Note µ = 0, σ² = 1/3. Then Z_n = √(3/n) Σ_{i=1}^n X_i →_D N(0, 1).

Example 8  Let X_i ∼ Beta(1/2, 1/2). Note µ = 1/2, σ² = 1/8. Then Z_n = √(8n) ((1/n) Σ_{i=1}^n X_i − 1/2) →_D N(0, 1).

The Central Limit Theorem is the most widely used application of convergence in distribution. Demo: CLT uniform.m and CLT beta.m. Does the CLT apply to the Cauchy distribution?

X_n converges to X in probability, written X_n →_P X, if for any ε > 0

lim_{n→∞} P(|X_n − X| > ε) = 0.    (25)

Convergence in probability is central to machine learning because a very important concept, consistency, is defined using it. Convergence in probability implies convergence in distribution. The reverse is not true in general; however, convergence in distribution to a point mass distribution does imply convergence in probability. Examples 3, 4, and 6 do not converge in probability; Example 5 does. The definition might be easier to understand if we rewrite it as

lim_{n→∞} P({ω : |X_n(ω) − X(ω)| > ε}) = 0.

That is, the fraction of outcomes ω on which X_n and X disagree must shrink to zero. When X_n(ω) and X(ω) do disagree, they can differ by a lot in value. More importantly, note that X_n(ω) does not need to converge to X(ω) pointwise for any ω.
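The MATLAB demos referenced above are not reproduced here, but the uniform case of Example 7 is easy to simulate. A Python sketch (the sample sizes and seed are our choices): Z_n = √(3/n) Σ X_i for X_i ∼ uniform(−1, 1) should look roughly standard normal for large n.

```python
import random

# Simulate the CLT for X_i ~ uniform(-1, 1): mu = 0, sigma^2 = 1/3, so
# Z_n = sqrt(3/n) * sum(X_i) is approximately N(0, 1) for large n.

def z_n(n, rng):
    s = sum(rng.uniform(-1.0, 1.0) for _ in range(n))
    return (3.0 / n) ** 0.5 * s

rng = random.Random(42)
draws = [z_n(200, rng) for _ in range(2000)]
mean = sum(draws) / len(draws)
var = sum((z - mean) ** 2 for z in draws) / len(draws)

# Loose sanity checks: empirical mean near 0, empirical variance near 1.
assert abs(mean) < 0.1
assert abs(var - 1.0) < 0.1
```

The same experiment with Cauchy draws would fail: the standardized sum of iid Cauchy variables is again Cauchy, so the CLT hypothesis of finite variance is essential.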
This will be the distinguishing property between convergence in probability and convergence almost surely (to be defined shortly).

Example 9  X_n →_P X in Example 2.

X_n converges almost surely to X, written X_n →_a.s. X, if

P({ω : lim_{n→∞} X_n(ω) = X(ω)}) = 1.    (26)
Example 10  Example 2 does not converge almost surely to X there. This is because lim_{n→∞} X_n(ω) does not exist for any ω. Pointwise convergence is central to almost sure convergence. Almost sure convergence implies convergence in probability, which in turn implies convergence in distribution.

Example 11  X_n →_a.s. 0 (and hence X_n →_P 0 and X_n →_D 0) does not imply convergence in expectation E(X_n) → 0. To see this, let

P(X_n = n²) = 1/n,  P(X_n = 0) = 1 − 1/n.    (27)

Then X_n →_a.s. 0. However, E(X_n) = n² · (1/n) = n, which does not converge.

X_n converges in rth mean, where r ≥ 1, written X_n →_{Lr} X, if

lim_{n→∞} E(|X_n − X|^r) = 0.    (28)

Lr convergence implies convergence in probability, and Lr convergence implies Ls convergence if r > s. There is no general order between Lr convergence and almost sure convergence.

Theorem 2 (The Weak Law of Large Numbers)  If X_1, ..., X_n are iid, then X̄_n = (1/n) Σ_{i=1}^n X_i →_P µ(X_1).

Theorem 3 (The Strong Law of Large Numbers)  If X_1, ..., X_n are iid and E(|X_1|) < ∞, then X̄_n = (1/n) Σ_{i=1}^n X_i →_a.s. µ(X_1).
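The law of large numbers is easy to see empirically: the sample mean of iid draws settles near the true mean as n grows. A Python sketch (the Bernoulli(0.25) choice, sample size, and seed are ours):

```python
import random

# Empirical illustration of the law of large numbers: the sample mean of iid
# Bernoulli(p) variables approaches mu = p as n grows.

def sample_mean(n, p, rng):
    return sum(1 if rng.random() < p else 0 for _ in range(n)) / n

rng = random.Random(7)
large = sample_mean(50000, 0.25, rng)

# With n = 50000 the stddev of the sample mean is sqrt(p(1-p)/n) ~ 0.002,
# so being within 0.01 of mu = 0.25 is overwhelmingly likely.
assert abs(large - 0.25) < 0.01
```

Contrast this with the Cauchy distribution discussed earlier: there µ(X_1) does not exist, the hypotheses of both laws fail, and the sample mean never settles.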
Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization
More informationSTAT2201. Analysis of Engineering & Scientific Data. Unit 3
STAT2201 Analysis of Engineering & Scientific Data Unit 3 Slava Vaisman The University of Queensland School of Mathematics and Physics What we learned in Unit 2 (1) We defined a sample space of a random
More informationLIST OF FORMULAS FOR STK1100 AND STK1110
LIST OF FORMULAS FOR STK1100 AND STK1110 (Version of 11. November 2015) 1. Probability Let A, B, A 1, A 2,..., B 1, B 2,... be events, that is, subsets of a sample space Ω. a) Axioms: A probability function
More informationIntroduction to Probability and Statistics (Continued)
Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:
More informationChapter 3 sections. SKIP: 3.10 Markov Chains. SKIP: pages Chapter 3 - continued
Chapter 3 sections Chapter 3 - continued 3.1 Random Variables and Discrete Distributions 3.2 Continuous Distributions 3.3 The Cumulative Distribution Function 3.4 Bivariate Distributions 3.5 Marginal Distributions
More informationLecture 2: Review of Probability
Lecture 2: Review of Probability Zheng Tian Contents 1 Random Variables and Probability Distributions 2 1.1 Defining probabilities and random variables..................... 2 1.2 Probability distributions................................
More informationMath Review Sheet, Fall 2008
1 Descriptive Statistics Math 3070-5 Review Sheet, Fall 2008 First we need to know about the relationship among Population Samples Objects The distribution of the population can be given in one of the
More informationStatistics 1B. Statistics 1B 1 (1 1)
0. Statistics 1B Statistics 1B 1 (1 1) 0. Lecture 1. Introduction and probability review Lecture 1. Introduction and probability review 2 (1 1) 1. Introduction and probability review 1.1. What is Statistics?
More informationLecture 1: Probability Fundamentals
Lecture 1: Probability Fundamentals IB Paper 7: Probability and Statistics Carl Edward Rasmussen Department of Engineering, University of Cambridge January 22nd, 2008 Rasmussen (CUED) Lecture 1: Probability
More information6.1 Moment Generating and Characteristic Functions
Chapter 6 Limit Theorems The power statistics can mostly be seen when there is a large collection of data points and we are interested in understanding the macro state of the system, e.g., the average,
More informationReview of probability
Review of probability Computer Sciences 760 Spring 2014 http://pages.cs.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts definition of probability random variables
More informationSome Special Distributions (Hogg Chapter Three)
Some Special Distributions Hogg Chapter Three STAT 45-: Mathematical Statistics I Fall Semester 5 Contents Binomial & Related Distributions. The Binomial Distribution............... Some advice on calculating
More informationExample 1. The sample space of an experiment where we flip a pair of coins is denoted by:
Chapter 8 Probability 8. Preliminaries Definition (Sample Space). A Sample Space, Ω, is the set of all possible outcomes of an experiment. Such a sample space is considered discrete if Ω has finite cardinality.
More information1 Introduction. P (n = 1 red ball drawn) =
Introduction Exercises and outline solutions. Y has a pack of 4 cards (Ace and Queen of clubs, Ace and Queen of Hearts) from which he deals a random of selection 2 to player X. What is the probability
More information15-388/688 - Practical Data Science: Basic probability. J. Zico Kolter Carnegie Mellon University Spring 2018
15-388/688 - Practical Data Science: Basic probability J. Zico Kolter Carnegie Mellon University Spring 2018 1 Announcements Logistics of next few lectures Final project released, proposals/groups due
More informationProbability Distributions Columns (a) through (d)
Discrete Probability Distributions Columns (a) through (d) Probability Mass Distribution Description Notes Notation or Density Function --------------------(PMF or PDF)-------------------- (a) (b) (c)
More informationDistributions of Functions of Random Variables. 5.1 Functions of One Random Variable
Distributions of Functions of Random Variables 5.1 Functions of One Random Variable 5.2 Transformations of Two Random Variables 5.3 Several Random Variables 5.4 The Moment-Generating Function Technique
More informationSUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)
SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) D. ARAPURA This is a summary of the essential material covered so far. The final will be cumulative. I ve also included some review problems
More informationMultivariate random variables
DS-GA 002 Lecture notes 3 Fall 206 Introduction Multivariate random variables Probabilistic models usually include multiple uncertain numerical quantities. In this section we develop tools to characterize
More informationPCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities
PCMI 207 - Introduction to Random Matrix Theory Handout #2 06.27.207 REVIEW OF PROBABILITY THEORY Chapter - Events and Their Probabilities.. Events as Sets Definition (σ-field). A collection F of subsets
More information3. Probability and Statistics
FE661 - Statistical Methods for Financial Engineering 3. Probability and Statistics Jitkomut Songsiri definitions, probability measures conditional expectations correlation and covariance some important
More informationThe Exciting Guide To Probability Distributions Part 2. Jamie Frost v1.1
The Exciting Guide To Probability Distributions Part 2 Jamie Frost v. Contents Part 2 A revisit of the multinomial distribution The Dirichlet Distribution The Beta Distribution Conjugate Priors The Gamma
More informationChapter 1. Probability, Random Variables and Expectations. 1.1 Axiomatic Probability
Chapter 1 Probability, Random Variables and Expectations Note: The primary reference for these notes is Mittelhammer (1999. Other treatments of probability theory include Gallant (1997, Casella & Berger
More informationAPM 504: Probability Notes. Jay Taylor Spring Jay Taylor (ASU) APM 504 Fall / 65
APM 504: Probability Notes Jay Taylor Spring 2015 Jay Taylor (ASU) APM 504 Fall 2013 1 / 65 Outline Outline 1 Probability and Uncertainty 2 Random Variables Discrete Distributions Continuous Distributions
More informationBayesian Nonparametrics
Bayesian Nonparametrics Lorenzo Rosasco 9.520 Class 18 April 11, 2011 About this class Goal To give an overview of some of the basic concepts in Bayesian Nonparametrics. In particular, to discuss Dirichelet
More informationCentral Limit Theorem and the Law of Large Numbers Class 6, Jeremy Orloff and Jonathan Bloom
Central Limit Theorem and the Law of Large Numbers Class 6, 8.5 Jeremy Orloff and Jonathan Bloom Learning Goals. Understand the statement of the law of large numbers. 2. Understand the statement of the
More information15 Discrete Distributions
Lecture Note 6 Special Distributions (Discrete and Continuous) MIT 4.30 Spring 006 Herman Bennett 5 Discrete Distributions We have already seen the binomial distribution and the uniform distribution. 5.
More informationSet Theory Digression
1 Introduction to Probability 1.1 Basic Rules of Probability Set Theory Digression A set is defined as any collection of objects, which are called points or elements. The biggest possible collection of
More informationPattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions
Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite
More informationProbability- the good parts version. I. Random variables and their distributions; continuous random variables.
Probability- the good arts version I. Random variables and their distributions; continuous random variables. A random variable (r.v) X is continuous if its distribution is given by a robability density
More informationReview: mostly probability and some statistics
Review: mostly probability and some statistics C2 1 Content robability (should know already) Axioms and properties Conditional probability and independence Law of Total probability and Bayes theorem Random
More informationMAS223 Statistical Inference and Modelling Exercises
MAS223 Statistical Inference and Modelling Exercises The exercises are grouped into sections, corresponding to chapters of the lecture notes Within each section exercises are divided into warm-up questions,
More information18.175: Lecture 3 Integration
18.175: Lecture 3 Scott Sheffield MIT Outline Outline Recall definitions Probability space is triple (Ω, F, P) where Ω is sample space, F is set of events (the σ-algebra) and P : F [0, 1] is the probability
More information