COMBINING VARIABLES ERROR PROPAGATION. What is the error on a quantity that is a function of several random variables?


COMBINING VARIABLES ERROR PROPAGATION

What is the error on a quantity that is a function of several random variables, $\theta = f(x, y, \ldots)$?

If the variances of $x, y, \ldots$ are small and the variables uncorrelated, then

$$\mathrm{var}(\theta) = \left(\frac{\partial f}{\partial x}\right)^2 \mathrm{var}(x) + \left(\frac{\partial f}{\partial y}\right)^2 \mathrm{var}(y) + \ldots$$

Usually this is not the case, which is a problem. Why: define $z = f(x, y)$, $w = y$, $x = g(z, w)$. The joint PDF of $(w, z)$, given the PDFs $P_1(x)$ and $P_2(y)$, is

$$P(w, z)\,dw\,dz = P_1(x)\,P_2(y)\,dx\,dy = P_1(g(z, w))\,P_2(w)\,\left|\frac{\partial(x, y)}{\partial(w, z)}\right| dw\,dz,$$

and the PDF of $z$ follows by marginalising,

$$P(z) = \int P(w, z)\,dw.$$

E.g. $z = x + y$: $P(z) = \int P_1(z - w)\,P_2(w)\,dw$, a convolution.

E.g. $z = x/y$: $P(z) = \int P_1(wz)\,P_2(w)\,|w|\,dw$.

In the latter case, suppose $P_1 = N(0, \sigma_1^2)$ and $P_2 = N(0, \sigma_2^2)$; then

$$P(z) = \frac{1}{\pi}\,\frac{\sigma_1/\sigma_2}{\sigma_1^2/\sigma_2^2 + z^2},$$

a Cauchy distribution.
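As a quick sanity check of this last result, here is a short Monte Carlo sketch (my own addition; the values $\sigma_1 = 2$, $\sigma_2 = 1$ and the sample size are arbitrary choices) comparing the empirical density of $z = x/y$ with the Cauchy form above.

    import numpy as np

    # Monte Carlo check: z = x/y for zero-mean Gaussians follows the Cauchy form above
    rng = np.random.default_rng(1)
    sig1, sig2 = 2.0, 1.0                       # illustrative values, not from the notes
    x = rng.normal(0.0, sig1, 1_000_000)
    y = rng.normal(0.0, sig2, 1_000_000)
    z = x / y

    # Empirical density vs P(z) = (1/pi) (sig1/sig2) / (sig1^2/sig2^2 + z^2)
    edges = np.linspace(-10.0, 10.0, 41)
    counts, _ = np.histogram(z, bins=edges)
    centres = 0.5 * (edges[:-1] + edges[1:])
    empirical = counts / (z.size * (edges[1] - edges[0]))
    cauchy = (sig1 / sig2) / (np.pi * ((sig1 / sig2) ** 2 + centres ** 2))
    print(np.max(np.abs(empirical - cauchy)))   # small, up to Monte Carlo and binning noise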

CENTRAL LIMIT THEOREM - SIMPLE PROOF

Consider a series of random variables $\{x_i;\ i = 1, 2, \ldots, n\}$, identically distributed with mean $\mu$ and variance $\sigma^2$. Define $S = \sum_{i=1}^{n} x_i$, with $\mu_s = n\mu$ and $\sigma_s^2 = n\sigma^2$, and

$$U = \frac{S - \mu_s}{\sqrt{\sigma_s^2}} = \frac{\sum_{i=1}^{n}(x_i - \mu)}{\sqrt{n\sigma^2}}.$$

Consider the characteristic function of $U$, $\phi_u(t)$, and that of the $i$th term in the summation, $\phi_i(t)$; then

$$\phi_u(t) = \prod_{i=1}^{n} \phi_i(t), \qquad \phi_i(t) = \phi(t) = 1 + \sum_{r=1}^{\infty} \frac{\mu_r\,(it)^r}{r!} = 1 + \mu_1\,(it) + \mu_2\,\frac{(it)^2}{2!} + \ldots,$$

where $\mu_r$ is the $r$th moment of $(x_i - \mu)/\sqrt{n\sigma^2}$, so that $\mu_1 = 0$ and $\mu_2 = 1/n$. Hence

$$\phi_u(t) = \left[1 - \frac{t^2}{2n} + O\!\left(n^{-3/2}\right)\right]^{n}.$$

As $n \to \infty$,

$$\phi_u(t) \to e^{-t^2/2},$$

which is the Fourier transform of an $N(0, 1)$ distribution. This implies that $S$ is distributed as $N(\mu_s, \sigma_s^2)$.

This can be generalised to arbitrary distributions, but it does not hold for distributions that lack a 1st or 2nd moment, cf. the Cauchy distribution.

Figure 1: Example of the CLT using U[-0.5, 0.5]: red PDF n=1; green PDF n=2; blue PDF n=4; black PDF n=12, generated distribution with the Gaussian equivalent overlaid.
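For readers without the figure to hand, the following sketch (an illustration added here, not part of the original notes) repeats the experiment numerically: it forms standardised sums of n U[-0.5, 0.5] variates for the same n values and checks how close they come to N(0, 1).

    import numpy as np

    rng = np.random.default_rng(0)
    mu, var = 0.0, 1.0 / 12.0                      # mean and variance of U[-0.5, 0.5]
    for n in (1, 2, 4, 12):
        s = rng.uniform(-0.5, 0.5, size=(100_000, n)).sum(axis=1)
        u = (s - n * mu) / np.sqrt(n * var)        # standardised sum, as in the proof above
        # For N(0, 1) about 68.3% of samples fall within one sigma
        print(n, round(float(np.mean(np.abs(u) < 1.0)), 3))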

Generating Random Numbers

The fundamental computer-generated random number is drawn from a uniform distribution $U(0, 1)$; Gaussian distributions and the rest are derived from it. These are extensively used in simulations, random sampling, testing algorithms, Monte Carlo methods and so on.

Linear congruential method (Lehmer, 1949):

$$X_{n+1} = (a\,X_n + c) \bmod m, \qquad U_n = X_n / m, \quad \text{e.g. } m = 2^{32},$$

where $m$ is the modulus, $a$ the multiplier, $c$ the increment and $X_0$ the starting value (seed), usually a large odd number. The sequence is recursive, since $X_{n+1} = f(X_n)$, and hence periodic; the useful range of the cycle in $t$ dimensions is therefore $< m^{1/t}$, since the maximum period is $m$.

Most system-supplied random number generators are awful. Knuth recommends: $m = 2^{32}$ or $2^{64}$ (integer arithmetic is efficient at mod $m$), $a \bmod 8 = 5$, $0.01m < a < 0.99m$, $c = 1$ or $c = a$. For example

$$X_{n+1} = (69069\,X_n + 1) \bmod 2^{32}.$$
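The recommended generator above is simple enough to spell out directly; this is a minimal sketch (the seed value is an arbitrary choice) of $X_{n+1} = (69069\,X_n + 1) \bmod 2^{32}$ with $U_n = X_n/m$.

    def lcg(seed, m=2**32, a=69069, c=1):
        """Linear congruential generator X_{n+1} = (a X_n + c) mod m, yielding U_n = X_n / m."""
        x = seed
        while True:
            x = (a * x + c) % m
            yield x / m

    gen = lcg(seed=12345)                    # arbitrary seed for illustration
    print([next(gen) for _ in range(5)])     # five U(0, 1) variates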

BAYES THEOREM I

Laplace: "La théorie des probabilités n'est que le bon sens confirmé par le calcul" (the theory of probability is nothing but common sense confirmed by calculation).

Kolmogorov axioms:
1. Any random event A has a probability P(A) bounded by 0 and 1.
2. The sure event has P(A) = 1.
3. If A and B are exclusive, P(A or B) = P(A) + P(B).

If A and B are dependent, P(A and B) = P(A|B) P(B); if A and B are independent, P(A|B) = P(A) and P(A, B) = P(A) P(B).

Laplace's theory of probability:
a. P(A|B) + P(Ā|B) = 1
b. Conditional probability: P(A, B) = P(A|B) P(B)

Repeated application of the above leads to Bayes' theorem (1769):

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}.$$

An innocent and uncontroversial result, until you add in Bayes' postulate, that in the absence of other knowledge all prior probabilities should be treated as equal, and substitute...
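A tiny numerical illustration of the theorem (the joint probabilities below are invented purely for this example): P(A|B) computed via P(B|A) P(A) / P(B) agrees with the value read directly from the joint table.

    # Illustrative joint distribution over events A and B (numbers are made up)
    p_joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

    p_a = sum(p for (a, b), p in p_joint.items() if a == 1)    # P(A)
    p_b = sum(p for (a, b), p in p_joint.items() if b == 1)    # P(B)
    p_b_given_a = p_joint[(1, 1)] / p_a                        # P(B | A)

    # Bayes' theorem: P(A | B) = P(B | A) P(A) / P(B); compare with the direct value
    print(p_b_given_a * p_a / p_b, p_joint[(1, 1)] / p_b)      # both 0.666...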

BAYES THEOREM II

1. Hypothesis testing and confidence intervals. Which hypothesis? How many parameters?

$$P(\text{hypothesis} \mid \text{data}) = \frac{P(\text{data} \mid \text{hypothesis})\,P(\text{hypothesis})}{P(\text{data})}$$

The Bayesian estimator (MAP) is a model of the learning process: prior probability → posterior probability.

2. Parameter estimation, e.g. of a model parameter θ from the data d:

$$p(\theta \mid d) = \frac{P(d \mid \theta)\,P(\theta)}{P(d)}$$

The prior is important if the number of parameters is comparable to the number of data points. Examples: the maximum entropy method (Jaynes 1957...), pixon-based image reconstruction (Puetter 1996).

3. Bayes' theorem is also used, generally less controversially, in classification schemes, whereby class c_j has probability

$$P(c_j \mid d) = \frac{P(d \mid c_j)\,P(c_j)}{\sum_j P(d \mid c_j)\,P(c_j)}$$

Bayes classification: ANNs are generally non-parametric; the industry-standard parametric classifier is AUTOCLASS (Cheeseman 1996).
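As an illustration of case 2, here is a minimal sketch of MAP estimation for a binomial model on a parameter grid (the data n = 20, k = 14 and the flat prior are assumptions made for the example, not values from the notes).

    import numpy as np
    from math import comb

    # Hypothetical data: k successes in n trials; theta is the unknown success probability
    n, k = 20, 14
    theta = np.linspace(0.001, 0.999, 999)

    likelihood = comb(n, k) * theta**k * (1 - theta)**(n - k)    # P(d | theta)
    prior = np.ones_like(theta)                                  # flat prior P(theta)
    posterior = likelihood * prior
    posterior /= posterior.sum() * (theta[1] - theta[0])         # normalise; P(d) is absorbed here

    theta_map = theta[np.argmax(posterior)]
    print(theta_map)                                             # near k/n = 0.7 for a flat prior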

Some practical problems with Bayesian estimation

How do you define priors? For the location parameter µ it is reasonable to use a uniform distribution to express ignorance about the prior distribution; however, what range should be used?

Next consider the scale (scatter) parameter σ. Is the Bayesian prior P(σ) = const appropriate? Again, over what range? σ is presumably positive; how should that be incorporated? P(σ²) = const solves the positivity problem, but if the flat prior above is correct then P(σ) ∝ σ??

Jeffreys (1932, Theory of Probability) suggested using P(log_e σ) = const, i.e.

$$P(\sigma)\,d\sigma \propto \frac{d\sigma}{\sigma};$$

this is invariant under both scale changes and powers-of-σ transformations, and also solves the positivity problem.

Jaynes (1957) suggested using the concept of maximum entropy to define priors in a consistent way, by incorporating all the constraints such that, subject to these constraints, the assigned prior distribution has maximum entropy (randomness).

Some conceptual problems with Bayesian versus likelihood estimation
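To make the claimed invariance explicit, a short derivation (added here for clarity, using the same notation): under a power transformation $u = \sigma^k$,

$$P(u)\,du = P(\sigma)\,d\sigma \propto \frac{d\sigma}{\sigma} = \frac{1}{k}\,\frac{du}{u} \propto \frac{du}{u},$$

so the prior again has the form $P(\log u) = \mathrm{const}$; a pure scale change $u = k\sigma$ gives $du/u = d\sigma/\sigma$ directly.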

The Gambler's Dilemma: a solution?

You observe a successes and b failures in a + b trials; what is the probability of c successes and d failures in c + d further trials?

From the binomial distribution, if r is the unknown probability of success,

$$P(a \mid r) = {}^{a+b}C_a\, r^a (1 - r)^b.$$

Simple-minded approach (wrong... but... well... simple): take $\hat r = a/(a+b)$, $1 - \hat r = b/(a+b)$, so

$$P(c) = \frac{(c+d)!}{c!\,d!}\,\frac{a^c\, b^d}{(a+b)^{c+d}}.$$

Bayesian approach: using Bayes' theorem, one can show that

$$P(r) = P(r \mid a, b) = \frac{(a+b+1)!}{a!\,b!}\, r^a (1-r)^b.$$

Integrating out the unwanted variable and substituting for $P(c \mid r)$,

$$P(c) = \int P(c, r)\,dr = \int P(c \mid r)\,P(r)\,dr = \frac{(a+c)!\,(b+d)!\,(c+d)!\,(a+b+1)!}{a!\,b!\,c!\,d!\,(a+b+c+d+1)!}.$$

Fisher's likelihood ratio method (see the book by Edwards, Likelihood):

$$P(c) = \frac{(a+c)^{a+c}\,(b+d)^{b+d}\,(c+d)^{c+d}\,(a+b)^{a+b}}{a^a\, b^b\, c^c\, d^d\,(a+b+c+d)^{a+b+c+d}}.$$
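The three predictions can be compared numerically; the sketch below (with invented counts a = 7, b = 3 and c + d = 5 further trials) evaluates each formula exactly as written above, essentially the ingredients of a plot like Figure 2.

    from math import comb, factorial

    def simple(a, b, c, d):
        # "Simple-minded": plug r = a/(a+b) into the binomial for c successes in c+d trials
        r = a / (a + b)
        return comb(c + d, c) * r**c * (1 - r)**d

    def bayes(a, b, c, d):
        # Bayesian predictive: (a+c)!(b+d)!(c+d)!(a+b+1)! / (a! b! c! d! (a+b+c+d+1)!)
        f = factorial
        return (f(a + c) * f(b + d) * f(c + d) * f(a + b + 1)) / (
            f(a) * f(b) * f(c) * f(d) * f(a + b + c + d + 1))

    def likelihood_ratio(a, b, c, d):
        # Fisher's likelihood-ratio prediction (not normalised over c at fixed c+d)
        return ((a + c)**(a + c) * (b + d)**(b + d) * (c + d)**(c + d) * (a + b)**(a + b)) / (
            a**a * b**b * c**c * d**d * (a + b + c + d)**(a + b + c + d))

    a, b = 7, 3            # hypothetical: 7 successes and 3 failures observed
    for c in range(6):     # predictions for c successes in 5 further trials
        d = 5 - c
        print(c, round(simple(a, b, c, d), 4), round(bayes(a, b, c, d), 4),
              round(likelihood_ratio(a, b, c, d), 4))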

Figure 2: Different predictions for the Gambler's Dilemma problem.

RAYLEIGH DISTRIBUTION

For example: measure (x, y) positions, or projected (v_x, v_y) velocities; what is the PDF of the error in distance, or in total projected velocity?

Start from a bivariate Gaussian distribution, where $x \equiv x - \mu_x$ and $y \equiv y - \mu_y$:

$$P(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-[x^2/2\sigma^2 + y^2/2\sigma^2]}.$$

What is the PDF of $(r, \theta)$, where $x = r\cos\theta$ and $y = r\sin\theta$?

$$P(r, \theta)\,dr\,d\theta = P(x, y)\,dx\,dy = \frac{1}{2\pi\sigma^2}\, e^{-r^2/2\sigma^2}\, r\,dr\,d\theta.$$

The distribution of $\theta$ is clearly uniform, while

$$P(r)\,dr = \frac{1}{\sigma^2}\, r\, e^{-r^2/2\sigma^2}\, dr.$$

The maximum occurs at $r = \sigma$. The average value of $r$ is $\langle r \rangle = \sigma\sqrt{\pi/2}$, and the mean-square value is $\langle r^2 \rangle = 2\sigma^2$. The cumulative probability distribution is

$$C(r < R) = 1 - e^{-R^2/2\sigma^2}, \qquad C(r > R) = e^{-R^2/2\sigma^2}.$$
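A quick Monte Carlo check of these results (the value of σ and the sample size are arbitrary choices for the illustration): draw (x, y) from the bivariate Gaussian and compare the moments and tail probability of r with the formulae above.

    import numpy as np

    rng = np.random.default_rng(2)
    sigma = 1.5                                     # arbitrary choice for the check
    x = rng.normal(0.0, sigma, 500_000)
    y = rng.normal(0.0, sigma, 500_000)
    r = np.hypot(x, y)

    print(r.mean(), sigma * np.sqrt(np.pi / 2))     # <r> = sigma sqrt(pi/2)
    print((r**2).mean(), 2 * sigma**2)              # <r^2> = 2 sigma^2
    R = 2.0
    print((r > R).mean(), np.exp(-R**2 / (2 * sigma**2)))   # C(r > R) = exp(-R^2 / 2 sigma^2)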

Figure 3: Rayleigh distribution P(r) and C(<r) as a function of r/σ.

LIKELIHOOD OF IDENTIFICATION

Assume radially symmetric errors; the probability of the true identification being detected at distance r is then

$$P(r\,\delta r \mid \mathrm{id}) = \frac{r}{\sigma^2}\exp\!\left(-\frac{r^2}{2\sigma^2}\right)\delta r,$$

where $\sigma^2$ is the combined variance. The probability of a confusing source at that distance is

$$P(r\,\delta r \mid c) = 2\pi r \rho\,\delta r,$$

where $\rho$ is the surface density of such sources. From Bayes' theorem,

$$P(\mathrm{id} \mid r) = \frac{P(\mathrm{id})\,P(r \mid \mathrm{id})}{P(\mathrm{id})\,P(r \mid \mathrm{id}) + P(c)\,P(r \mid c)}$$

and

$$P(c \mid r) = \frac{P(c)\,P(r \mid c)}{P(\mathrm{id})\,P(r \mid \mathrm{id}) + P(c)\,P(r \mid c)}.$$

Therefore

$$P(\mathrm{id} \mid r) = \frac{P(\mathrm{id})\,L(r)}{P(\mathrm{id})\,L(r) + 1} \qquad \text{and} \qquad P(c \mid r) = \frac{1}{P(\mathrm{id})\,L(r) + 1},$$

where

$$L(r) = \frac{P(r \mid \mathrm{id})}{P(r \mid c)} = \frac{\exp(-r^2/2\sigma^2)}{\sigma^2 \cdot 2\pi\rho}.$$
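A short sketch evaluating L(r) and the resulting reliability P(id|r) for assumed values of σ, ρ and the prior P(id); all three numbers are illustrative, not taken from the notes.

    import numpy as np

    def reliability(r, sigma, rho, p_id):
        """P(id | r) = P(id) L(r) / (P(id) L(r) + 1), with L(r) = exp(-r^2/2sigma^2) / (2 pi rho sigma^2)."""
        L = np.exp(-r**2 / (2 * sigma**2)) / (2 * np.pi * rho * sigma**2)
        return p_id * L / (p_id * L + 1.0)

    sigma = 0.5      # combined positional error (assumed, in whatever units r is measured)
    rho = 0.05       # surface density of confusing sources per unit area (assumed)
    for r in (0.2, 0.5, 1.0, 2.0):
        print(r, round(reliability(r, sigma, rho, p_id=0.9), 3))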

ORDER STATISTICS

What is the PDF of the maximum, minimum or median of a series of n samples {x_k} from a random distribution with CDF F(x) and PDF f(x)?

The probability that the kth ordered value is at most y is

$$P(X_{(k)} \le y) = \sum_{j=k}^{n} {}^{n}C_j\,[F(y)]^j\,[1 - F(y)]^{n-j}.$$

For example, the CDFs of the minimum and maximum are

$$P(x_{\min} \le y) = 1 - [1 - F(y)]^n, \qquad P(x_{\max} \le y) = [F(y)]^n,$$

and the PDFs are given by

$$P_{\min}(y) = n\,[1 - F(y)]^{n-1} f(y), \qquad P_{\max}(y) = n\,[F(y)]^{n-1} f(y).$$

Simple example: the uniform distribution on $\{-a, +a\}$:

$$P_{\min}(y) = \frac{n}{2^n a}\left[1 - \frac{y}{a}\right]^{n-1}, \qquad P_{\max}(y) = \frac{n}{2^n a}\left[1 + \frac{y}{a}\right]^{n-1},$$

$$\langle y_{\min} \rangle = -a\left[1 - \frac{2}{n+1}\right], \qquad \langle y_{\max} \rangle = a\left[1 - \frac{2}{n+1}\right].$$
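A quick numerical check of the uniform case (a, n and the sample size are arbitrary choices): the sample means of the minimum and maximum should match the expressions above.

    import numpy as np

    rng = np.random.default_rng(3)
    a, n = 2.0, 5
    samples = rng.uniform(-a, a, size=(200_000, n))

    y_max = samples.max(axis=1)
    y_min = samples.min(axis=1)
    print(y_max.mean(), a * (1 - 2 / (n + 1)))      # <y_max> =  a [1 - 2/(n+1)]
    print(y_min.mean(), -a * (1 - 2 / (n + 1)))     # <y_min> = -a [1 - 2/(n+1)]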