Data Analysis and Monte Carlo Methods

Lecturer: Allen Caldwell, Max Planck Institute for Physics & TUM
Recitation Instructor: Oleksander (Alex) Volynets, MPP & TUM

General Information:
- Lectures will be held in English, Mondays 16:00-18:00
- Recitation: Mondays 14:00-16:00, CIP room, starting May 9, 2011
- Exercises will be given at the end of each lecture for you to try out. They will be discussed, along with other material, in the recitation of the following week.
- Course material available under:
  http://www.mpp.mpg.de/~caldwell/ss11.html
  http://www.mpp.mpg.de/~volynets/ss11/

NB: you will get the most out of this course if you can practice solving problems on your own computer!

Lecture 1

Modeling and Data

We imagine flipping a coin, rolling dice, or picking a lottery number. The initial conditions are not known, so we assume symmetry and say every outcome is equally likely:
- coin: heads and tails each have equal chance, probability of each is 1/2
- rolling dice: each number on each die is equally likely, so each pair (6x6 of them) is equally likely, e.g. (1,1), (3,4), ...

I.e., we make a model for the physical process. The model contains assumptions (e.g., each outcome equally likely).
- Given the model, we can make predictions and compare these to the data.
- From the comparison, we decide if the model is reasonable.

Theory: $g(\vec{y}\,|\,\vec{\lambda},M)$, where $\vec{y}$ are the theoretical observables, $\vec{\lambda}$ are the parameters of the theory, and $M$ is the model or theory.

Modeling of experiment: $f(\vec{x}\,|\,\vec{\lambda},M)$, where $\vec{x}$ is a possible data outcome, which is compared to the measured data $D$ from the experiment.

Notation

$g(\vec{y}\,|\,\vec{\lambda},M)$ is understood as a probability density; i.e., the probability that $\vec{y}$ is in the interval $\vec{y} \to \vec{y}+d\vec{y}$ given the model $M$ and the parameter values specified by $\vec{\lambda}$.

E.g., if we are considering the decay of an unstable particle, we would have the probability density of a decay occurring at time $t$:

$g(t\,|\,\tau) \propto e^{-t/\tau}$

for a single particle, assuming a constant decay probability per unit time.
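As a small Monte Carlo illustration of this density (a sketch I added, not part of the lecture; assumes Python with numpy, and the lifetime value tau = 2.0 is an arbitrary choice):

```python
import numpy as np

# Draw decay times from g(t|tau) = (1/tau) exp(-t/tau) and check simple properties.
rng = np.random.default_rng(seed=1)
tau = 2.0
t = rng.exponential(scale=tau, size=100_000)

print("sample mean of t:", t.mean())            # should be close to tau
print("fraction with t < tau:", (t < tau).mean())  # analytically 1 - e^{-1} ~ 0.632
```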

How we learn

(Slide shows a diagram illustrating deduction.)

How we Learn

We learn by comparing measured data with distributions for predicted results, assuming a theory, parameters, and a modeling of the experimental process. What we typically want to know:
- Is the theory reasonable? I.e., is the observed data a likely result from this theory (+ experiment)?
- If we have more than one potential explanation, we want to be able to quantify which theory is more likely to be correct given the observations.
- Assuming we have a reasonable theory, we want to estimate the most probable values of the parameters, and their uncertainties. This includes setting limits (parameter > or < some value at XX% probability).

Logical Basis

Model building and making predictions from models follow deductive reasoning:
Given $A \Rightarrow B$ (major premise)
Given $B \Rightarrow C$ (major premise)
Then, given A, you can conclude that C is true, etc.

Everything is clear: we can make frequency distributions of possible outcomes within the model, etc. This is math, so it is correct.

Logical Basis

However, in physics what we want to know is the validity of the model given the data, i.e., logic of the form:
Given $A \Rightarrow C$, we measure C; what can we say about A?
Well, maybe also $A_1 \Rightarrow C$, $A_2 \Rightarrow C$, ...

We now need inductive logic. We can never say anything absolutely conclusive about A unless we can guarantee a complete set of alternatives $A_i$ and only one of them can give outcome C. This does not happen in science, so we can never say we have found the true model.

Logical Basis

Instead of truth, we consider knowledge. Knowledge = justified true belief; the justification comes from the data.
- Start with some knowledge, or maybe plain belief
- Do the experiment
- Data analysis gives updated knowledge

Formulation of Data Analysis

In the following, I will formulate data analysis as a knowledge-updating scheme:

knowledge + data → updated knowledge

This leads to the usual Bayes equation, but I prefer this derivation to the usual one in the textbooks.

Formulation

We require that

$P(\vec{x}\,|\,\vec{\lambda},M) \ge 0$ and $\int P(\vec{x}\,|\,\vec{\lambda},M)\, d\vec{x} = 1$,

although, as we will see, the normalization condition is not really needed.

The modeling of the experiment will typically add other (nuisance) parameters. E.g., there are often uncertainties, such as the energy scale of the experiment; different assumptions on these lead to different predictions for the data. We can then have

$P(\vec{x}\,|\,\vec{\lambda},\vec{\nu},M)$

where $\vec{\nu}$ represents our nuisance parameters.

Formulation

The expected distribution (density) of the data, assuming a model M and parameters $\vec{\lambda}$, is written as $P(\vec{x}\,|\,\vec{\lambda},M)$, where $\vec{x}$ is a possible realization of the data. There are different possible definitions of this function.

Imagine we flip a coin 10 times and get the following result:

T H T H H T H T T H

We now repeat the process with a different coin and get:

T T T T T T T T T T

Which outcome has the higher probability?

Take a model where H, T are equally likely. Then

outcome 1: prob = $(1/2)^{10}$
outcome 2: prob = $(1/2)^{10}$

Does something seem wrong with this result? This is because we evaluate many probabilities at once: the result above is the probability for any particular sequence of ten flips of a fair coin. Given a fair coin, we could also calculate the chance of getting H exactly $n$ times:

$\binom{10}{n} \left(\frac{1}{2}\right)^{10}$

And we find the following result:

n  : p
0  : 1/2^10
1  : 10/2^10
2  : 45/2^10
3  : 120/2^10
4  : 210/2^10
5  : 252/2^10
6  : 210/2^10
7  : 120/2^10
8  : 45/2^10
9  : 10/2^10
10 : 1/2^10

There are many more ways to get 5 H than 0 H, which is why the first result somehow looks more probable, even though each individual sequence has exactly the same probability in the model.

Maybe the model is wrong and one coin is not fair? How would we test this?
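As a quick check of the table above, here is a minimal sketch (in Python with numpy, my addition rather than part of the lecture) that computes $\binom{10}{n}/2^{10}$ for all n and compares it with a simple Monte Carlo simulation of fair-coin flips:

```python
import math
import numpy as np

# Exact probabilities P(n heads in 10 flips of a fair coin) = C(10, n) / 2^10
N = 10
exact = [math.comb(N, n) / 2**N for n in range(N + 1)]

# Monte Carlo check: simulate many sets of 10 fair-coin flips and count heads
rng = np.random.default_rng(seed=0)
n_heads = rng.integers(0, 2, size=(200_000, N)).sum(axis=1)
simulated = np.bincount(n_heads, minlength=N + 1) / len(n_heads)

for n in range(N + 1):
    print(f"n={n:2d}  exact={exact[n]:.4f}  simulated={simulated[n]:.4f}")
```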

The message: there are usually many ways to define the probability for your data. Which is better, or whether to use several, depends on what you are trying to do.

E.g., suppose we have measured times in an exponential decay. We can define the probability density as

$P(\vec{t}\,|\,\tau) = \prod_{i=1}^{N} \frac{1}{\tau}\, e^{-t_i/\tau}$

or we can count events in time intervals and compare to expectations:

$P(\vec{t}\,|\,\tau) = \prod_{j=1}^{M} \frac{e^{-\nu_j}\, \nu_j^{n_j}}{n_j!}$

where $\nu_j$ = expected events in bin j and $n_j$ = observed events in bin j.
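To make the two definitions concrete, here is a minimal sketch (my own illustration in Python, not from the lecture) that evaluates both as log-likelihoods for one simulated set of decay times; the true lifetime and the binning are arbitrary choices:

```python
import numpy as np
from math import lgamma

# Simulate one data set of decay times with an assumed true lifetime.
rng = np.random.default_rng(seed=3)
tau_true, N = 2.0, 500
t = rng.exponential(scale=tau_true, size=N)

def log_unbinned(t, tau):
    # log of prod_i (1/tau) exp(-t_i / tau)
    return np.sum(-np.log(tau) - t / tau)

def log_binned(t, tau, edges):
    # log of prod_j exp(-nu_j) nu_j^{n_j} / n_j!
    # nu_j = expected events in bin j for len(t) decays with lifetime tau
    n_obs, _ = np.histogram(t, bins=edges)       # events beyond the last edge are ignored
    nu = len(t) * np.diff(1.0 - np.exp(-edges / tau))
    return float(np.sum(-nu + n_obs * np.log(nu) - [lgamma(n + 1) for n in n_obs]))

edges = np.linspace(0.0, 10.0, 21)   # 20 bins of width 0.5 (arbitrary choice)
for tau in (1.5, 2.0, 2.5):
    print(f"tau={tau}: unbinned logL={log_unbinned(t, tau):.1f}, "
          f"binned logL={log_binned(t, tau, edges):.1f}")
```

Both definitions peak near the true lifetime; they simply weight the information in the data differently.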

Formulation

For the model, we have $0 \le P(M) \le 1$. For a fully Bayesian analysis, we require

$\sum_i P(M_i) = 1$

For the parameters, assuming a model, we have:

$P(\vec{\lambda}\,|\,M_i) \ge 0$ and $\int P(\vec{\lambda}\,|\,M_i)\, d\vec{\lambda} = 1$

The joint probability distribution is

$P(\vec{\lambda}, M) = P(\vec{\lambda}\,|\,M)\, P(M)$

and

$\sum_i P(M_i) \int P(\vec{\lambda}\,|\,M_i)\, d\vec{\lambda} = 1$

Learning Rule

$P_{i+1}(\vec{\lambda}, M\,|\,\vec{D}) \propto P(\vec{x}=\vec{D}\,|\,\vec{\lambda}, M)\, P_i(\vec{\lambda}, M)$

where the index $i$ represents a state of knowledge. We have to satisfy our normalization condition, so

$P_{i+1}(\vec{\lambda}, M\,|\,\vec{D}) = \dfrac{P(\vec{x}=\vec{D}\,|\,\vec{\lambda}, M)\, P_i(\vec{\lambda}, M)}{\sum_M \int P(\vec{x}=\vec{D}\,|\,\vec{\lambda}, M)\, P_i(\vec{\lambda}, M)\, d\vec{\lambda}}$

We usually write $P_0(\vec{\lambda}, M)$ for the starting distribution: this is our prior information before performing the measurement.

Learning Rule

$P_{i+1}(\vec{\lambda}, M\,|\,\vec{D}) = \dfrac{P(\vec{x}=\vec{D}\,|\,\vec{\lambda}, M)\, P_i(\vec{\lambda}, M)}{\sum_M \int P(\vec{x}=\vec{D}\,|\,\vec{\lambda}, M)\, P_i(\vec{\lambda}, M)\, d\vec{\lambda}}$

The denominator is the probability to get the data, summing over all possible models and all possible values of the parameters:

$P(\vec{D}) = \sum_M \int P(\vec{x}=\vec{D}\,|\,\vec{\lambda}, M)\, P_i(\vec{\lambda}, M)\, d\vec{\lambda}$

so

$P_{i+1}(\vec{\lambda}, M\,|\,\vec{D}) = \dfrac{P(\vec{x}=\vec{D}\,|\,\vec{\lambda}, M)\, P_i(\vec{\lambda}, M)}{P(\vec{D})}$

Bayes' Equation
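To make the learning rule concrete, here is a minimal sketch (an assumed illustration in Python, not part of the lecture) that applies it on a grid of parameter values: a single model with parameter p = P(heads), a flat prior, and the two 10-flip sequences shown earlier (5 heads vs. 0 heads) as data. This is one way to address the earlier question of whether one coin is not fair.

```python
import numpy as np
from math import comb

# Learning rule on a parameter grid: one model, one parameter p = P(heads),
# flat prior P_0(p), data D = number of heads in 10 flips.
p = np.linspace(0.0, 1.0, 1001)   # grid of parameter values
dp = p[1] - p[0]
prior = np.ones_like(p)
prior /= prior.sum() * dp          # normalize the flat prior

def update(prior, n_heads, n_flips=10):
    # P_{i+1}(p|D) = P(D|p) P_i(p) / P(D), with a binomial P(D|p)
    likelihood = comb(n_flips, n_heads) * p**n_heads * (1 - p)**(n_flips - n_heads)
    posterior = likelihood * prior
    return posterior / (posterior.sum() * dp)   # denominator plays the role of P(D)

post1 = update(prior, n_heads=5)   # first coin:  T H T H H T H T T H  -> 5 heads
post2 = update(prior, n_heads=0)   # second coin: T T T T T T T T T T  -> 0 heads

for name, post in (("coin 1", post1), ("coin 2", post2)):
    mode = p[np.argmax(post)]
    in_range = (p > 0.4) & (p < 0.6)
    print(f"{name}: most probable p = {mode:.2f}, "
          f"P(0.4 < p < 0.6 | D) = {post[in_range].sum() * dp:.2f}")
```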

Bayes-Laplace Equation

Here is the standard derivation:

$P(A,B) = P(A\,|\,B)\, P(B)$
$P(A,B) = P(B\,|\,A)\, P(A)$

so

$P(B\,|\,A) = \dfrac{P(A\,|\,B)\, P(B)}{P(A)}$

(Slide shows a Venn diagram: a sample space S containing sets A and B with overlap $A \cap B$.) This is clear for logic propositions and well-defined S, A, B. In our case, B = model + parameters, A = data.

Notation (cont.)

Cumulative distribution function:

$F(a) = \sum_{x_i \le a} P(x_i\,|\,\theta)$   for probabilities
$F(a) = \int_{-\infty}^{a} P(x\,|\,\theta)\, dx$   for probability densities

$0 \le F(a) \le 1$
$P(a \le x \le b) = F(b) - F(a)$   (equality may not be possible for the discrete case)

Expectation value:

$E[x] = \sum_{i=-\infty}^{\infty} x_i\, P(x_i\,|\,\theta)$   for probabilities
$E[x] = \int_{-\infty}^{\infty} x\, P(x\,|\,\theta)\, dx$   for probability densities
$E[u(x)] = \int u(x)\, P(x\,|\,\theta)\, dx$

For u(x), v(x) any two functions of x, $E[u+v] = E[u] + E[v]$. For c, k any constants, $E[cu+k] = cE[u] + k$.

Notation (cont.)

The n-th moment of the variable is given by:

$\alpha_n \equiv E[x^n] = \int x^n\, P(x\,|\,\theta)\, dx$

(for discrete probabilities, the integrals become sums in the obvious way). $\mu \equiv \alpha_1 = E[x]$ is known as the mean.

The n-th central moment of x:

$m_n \equiv E[(x-\alpha_1)^n] = \int (x-\alpha_1)^n\, P(x\,|\,\theta)\, dx$

$\sigma^2 \equiv V[x] \equiv m_2 = \alpha_2 - \mu^2$ is known as the variance, and $\sigma$ is known as the standard deviation. $\mu$ and $\sigma$ (or $\sigma^2$) are the most commonly used measures to characterize a distribution.
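As a small numerical illustration (a sketch I added in Python with numpy, not part of the lecture), the moments above can be evaluated for the exponential decay density $P(t\,|\,\tau) = \frac{1}{\tau}e^{-t/\tau}$, for which the mean is $\tau$ and the variance is $\tau^2$:

```python
import numpy as np

# Numerically evaluate the first moments of P(t|tau) = (1/tau) exp(-t/tau)
# on a grid; for this density mu = tau and sigma^2 = tau^2.
tau = 2.0                                   # arbitrary lifetime for illustration
t = np.linspace(0.0, 50.0 * tau, 200_001)   # grid reaching far into the tail
dt = t[1] - t[0]
pdf = np.exp(-t / tau) / tau

alpha1 = np.sum(t * pdf) * dt               # mu = E[t]
alpha2 = np.sum(t**2 * pdf) * dt            # E[t^2]
variance = alpha2 - alpha1**2               # sigma^2 = alpha_2 - mu^2

print(f"mean     = {alpha1:.4f}  (expect {tau})")
print(f"variance = {variance:.4f}  (expect {tau**2})")
```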

Notation (cont.)

Other useful characteristics:
- the most probable value (mode) is the value of x which maximizes $f(x;\theta)$
- the median is a value of x such that $F(x_{med}) = 0.5$

(Figure: a skewed density f(x) with the mode, median, and mean marked on the x axis.)

Examples of using Bayes' Theorem

A particle detector has an efficiency of 95% for detecting particles of type A and an efficiency of 2% for detecting particles of type B. Assume the detector gives a positive signal. What can be concluded about the probability that the particle was of type A?

Answer: NOTHING. It is first necessary to know the relative flux of particles of type A and B.

Now assume that we know that 90% of particles are of type B and 10% are of type A. Then we can calculate:

$P(A\,|\,\text{signal}) = \dfrac{P(\text{signal}\,|\,A)\, P(A)}{P(\text{signal}\,|\,A)\, P(A) + P(\text{signal}\,|\,B)\, P(B)} = \dfrac{0.95 \times 0.1}{0.95 \times 0.1 + 0.02 \times 0.9} = 0.84$

We are told in the problem that we know $P(A)$, $P(B)$, $P(\text{signal}\,|\,A)$, and $P(\text{signal}\,|\,B)$. This information was somehow determined separately, possibly as a frequency, and it is the job of the experimenter to determine it.

Suppose we want the signal-to-background ratio for a sample of many measurements, where signal is the number of particles of type A and background is the number of particles of type B:

$\dfrac{P(A\,|\,\text{signal})}{P(B\,|\,\text{signal})} = \dfrac{P(\text{signal}\,|\,A)\, P(A)}{P(\text{signal}\,|\,B)\, P(B)} = \dfrac{0.95 \times 0.1}{0.02 \times 0.9} = 5.3$
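A quick cross-check of these two numbers (a sketch I added in Python, not from the lecture): simulate a stream of particles with the stated fluxes and efficiencies and look at the composition of the events that give a signal.

```python
import numpy as np

# 10% type A, 90% type B; detection efficiencies 95% and 2% respectively.
rng = np.random.default_rng(seed=7)
n = 1_000_000
is_A = rng.random(n) < 0.10
efficiency = np.where(is_A, 0.95, 0.02)
detected = rng.random(n) < efficiency

p_A_given_signal = is_A[detected].mean()
signal_to_background = is_A[detected].sum() / (~is_A[detected]).sum()

print(f"P(A | signal)      ~ {p_A_given_signal:.3f}   (Bayes: 0.84)")
print(f"signal/background  ~ {signal_to_background:.2f}   (Bayes: 5.3)")
```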

Notation (cont.)

For two random variables x, y, define the joint p.d.f. $P(x,y)$, where we leave off the parameters as shorthand. The probability that x is in the range $x \to x+dx$ and simultaneously y is in the range $y \to y+dy$ is $P(x,y)\,dx\,dy$.

To evaluate expectation values, etc., we usually need the marginal p.d.f. The marginal p.d.f. of x (y unobserved) is

$P_x(x) = \int P(x,y)\, dy$

The mean of x is then

$\mu_x = \iint x\, P(x,y)\, dx\, dy = \int x\, P_x(x)\, dx$

The covariance of x and y is defined as

$\mathrm{cov}[x,y] = E[(x-\mu_x)(y-\mu_y)] = E[xy] - \mu_x \mu_y$

and the correlation coefficient is $\rho_{xy} = \mathrm{cov}[x,y] / \sigma_x \sigma_y$.

Examples

(Figure: three scatter plots of (x, y), with shading representing equal-probability-density contours, illustrating $\rho_{xy} = 0$, $\rho_{xy} = -0.8$, and $\rho_{xy} = 0.2$.)

The correlation coefficient is limited to $-1 \le \rho_{xy} \le 1$.

Independent Variables

Two variables are independent if and only if

$P(x,y) = P_x(x)\, P_y(y)$

Then

$\mathrm{cov}[x,y] = E[xy] - \mu_x\mu_y = \iint xy\, P(x,y)\, dx\, dy - \mu_x\mu_y = \iint xy\, P_x(x) P_y(y)\, dx\, dy - \mu_x\mu_y = \int x P_x(x)\, dx \int y P_y(y)\, dy - \mu_x\mu_y = 0$
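To see this numerically, here is a minimal sketch (my own illustration in Python, not part of the lecture) that estimates the correlation coefficient from samples, once for independent variables and once for a correlated pair built by mixing in a common term:

```python
import numpy as np

rng = np.random.default_rng(seed=11)
n = 200_000

# Independent variables: sample correlation should be consistent with zero.
x = rng.normal(size=n)
y = rng.normal(size=n)
print("independent:", np.corrcoef(x, y)[0, 1])

# Correlated variables: build y from x plus independent noise.
y_corr = 0.8 * x + 0.6 * rng.normal(size=n)   # gives rho = 0.8 for unit-variance x
print("correlated: ", np.corrcoef(x, y_corr)[0, 1])
```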

Notation (cont.)

If x, y are independent, then $E[u(x)v(y)] = E[u(x)]\,E[v(y)]$ and $V[x+y] = V[x] + V[y]$.

If x, y are not independent:

$V[x+y] = E[(x+y)^2] - (E[x+y])^2$
$\qquad = E[x^2] + E[y^2] + 2E[xy] - (E[x]+E[y])^2$
$\qquad = V[x] + V[y] + 2(E[xy] - E[x]E[y])$
$\qquad = V[x] + V[y] + 2\,\mathrm{cov}[x,y]$

Binomial Distribution

Bernoulli process: a random process with exactly two possible outcomes which occur with fixed probabilities (e.g., flip of a coin: heads or tails; particle recorded / not recorded; ...). The probabilities come from a symmetry argument or other information.

Definitions:
- p is the probability of a success (heads, detection of a particle, ...), $0 \le p \le 1$
- N independent trials (flips of the coin, number of particles crossing the detector, ...)
- r is the number of successes (heads, observed particles, ...), $0 \le r \le N$

Then the probability of r successes in N trials is

$P(r\,|\,N,p) = \dfrac{N!}{r!(N-r)!}\, p^r q^{N-r}$, where $q = 1-p$

and $\frac{N!}{r!(N-r)!}$ is the number of combinations (the binomial coefficient).

Derivation: Binomial Coefficient

The number of ways to order N distinct objects is $N! = N(N-1)(N-2)\cdots 1$: N choices for the first position, then (N-1) for the second, then (N-2), ...

Now suppose we don't have N distinct objects, but subsets of identical objects. E.g., in flipping a coin, there are two subsets (tails and heads). Within a subset, the objects are indistinguishable, so for the i-th subset the $n_i!$ orderings are all equivalent. The number of distinct combinations is then

$\dfrac{N!}{n_1!\, n_2! \cdots}$, where $\sum_i n_i = N$

For the binomial case, there are two subclasses (success and failure, heads or tails, ...). The combinatorial coefficient is therefore

$\binom{N}{r} = \dfrac{N!}{r!(N-r)!}$
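A one-line numerical check (my own sketch in Python, not from the lecture) that this formula matches the library binomial coefficient, e.g. for the 10-flip table above:

```python
from math import factorial, comb

# Number of distinct orderings of r heads and (N - r) tails in N flips.
N = 10
for r in range(N + 1):
    assert factorial(N) // (factorial(r) * factorial(N - r)) == comb(N, r)
    print(r, comb(N, r))
```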

Binomial Distribution (cont.)

(Figure: binomial distributions for p = 0.5 with N = 4, 5, 15, 50; p = 0.1 with N = 5, 15; and p = 0.8 with N = 5, 15.)

$E[r] = Np$
$V[r] = Np(1-p)$

Notes:
- for large N and p near 0.5, the distribution is approximately symmetric
- for p near 0 or 1, the variance is reduced
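A short check of these mean and variance formulas by simulation (my own sketch in Python with numpy; the parameter values are taken from the figure and are otherwise arbitrary):

```python
import numpy as np

# Verify E[r] = N p and V[r] = N p (1 - p) by drawing many binomial samples.
rng = np.random.default_rng(seed=5)
for N, p in [(15, 0.5), (15, 0.1), (15, 0.8)]:
    r = rng.binomial(N, p, size=500_000)
    print(f"N={N}, p={p}: mean={r.mean():.3f} (Np={N*p}), "
          f"var={r.var():.3f} (Np(1-p)={N*p*(1-p):.3f})")
```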

Example

You are designing a particle tracking system and require at least three measurements of the position of the particle along its trajectory to determine the parameters. You know that each detector element has an efficiency of 95%. How many detector elements would have to see the track to have a 99% reconstruction efficiency?

Solution: We are happy with 3 or more hits, so we need

$P(r \ge 3\,|\,N,p) = \sum_{r=3}^{N} P(r\,|\,N,p) > 0.99$

Example (cont.)

N = 3:
$f(3;3,0.95) = \frac{3!}{3!\,0!}(0.95)^3(1-0.95)^0 = (0.95)^3 = 0.857$

N = 4:
$f(3;4,0.95) = \frac{4!}{3!\,1!}(0.95)^3(1-0.95)^1 = 4(0.95)^3(0.05) = 0.171$
$f(4;4,0.95) = \frac{4!}{4!\,0!}(0.95)^4(1-0.95)^0 = (0.95)^4 = 0.815$
$P(r \ge 3) = 0.986$

N = 5:
$f(3;5,0.95) = \frac{5!}{3!\,2!}(0.95)^3(1-0.95)^2 = 10(0.95)^3(0.05)^2 = 0.021$
$f(4;5,0.95) = \frac{5!}{4!\,1!}(0.95)^4(1-0.95)^1 = 5(0.95)^4(0.05) = 0.204$
$f(5;5,0.95) = \frac{5!}{5!\,0!}(0.95)^5(1-0.95)^0 = (0.95)^5 = 0.774$
$P(r \ge 3) = 0.999$

With 5 detector layers, we have a >99% chance of getting at least 3 hits.
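The same scan can be done programmatically; here is a minimal sketch (my addition in Python, not part of the lecture) that computes $P(r \ge 3\,|\,N, p=0.95)$ for increasing N and reports the first N exceeding 99%:

```python
from math import comb

def prob_at_least(k, N, p):
    # P(r >= k | N, p) = sum_{r=k}^{N} C(N, r) p^r (1-p)^(N-r)
    return sum(comb(N, r) * p**r * (1 - p)**(N - r) for r in range(k, N + 1))

p, k_min, target = 0.95, 3, 0.99
for N in range(3, 8):
    prob = prob_at_least(k_min, N, p)
    print(f"N={N}: P(r>=3) = {prob:.4f}")
    if prob > target:
        print(f"-> {N} detector layers give a >{target:.0%} chance of at least {k_min} hits")
        break
```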