BACKGROUND NOTES FYS 4550/FYS9550 - EXPERIMENTAL HIGH ENERGY PHYSICS AUTUMN 2016 PROBABILITY A. STRANDLIE NTNU AT GJØVIK AND UNIVERSITY OF OSLO


Before embarking on the concept of probability, we first define a few other concepts. A stochastic experiment is characterized by the following:
- All possible elementary outcomes of the experiment are known
- Only one of the outcomes can occur in a single experiment
- The outcome of an experiment is not known a priori
Example: throwing a die. The outcomes are S = {1, 2, 3, 4, 5, 6}; you can only observe one of these each time you throw, and you don't know beforehand which one you will observe. The set S is called the sample space of the experiment.

An event A is one or more outcomes which satisfy certain specifications. Example: A = "odd number" when throwing a die. An event is therefore also a subset of S; here A = {1, 3, 5}. If B = "even number", what is the subset of S describing B? The probability of occurrence of an event A, P(A), is a number between 0 and 1. Intuitively, a value of P(A) close to 0 means that A occurs very rarely in an experiment, whereas a value close to 1 means that A occurs very often.

There are three ways of quantifying probability:
1. Classical approach, valid when all outcomes can be assumed equally likely. Probability is defined as the number of favourable outcomes for a given event divided by the total number of outcomes. Example: throwing a die has N = 6 different outcomes. Assume that the event A = "observing six spots". Only n = 1 of the outcomes is favourable for A, so P(A) = n/N = 1/6 ≈ 0.167.
2. Approach based on the convergence value of the relative frequency for a very large number of repeated, identical experiments. Example: throwing a die and recording the relative frequency of occurrence of A for various numbers of trials (see the sketch below).
3. Subjective approach, reflecting the degree of belief in the occurrence of a certain event A. Possible guideline: the convergence value of a large number of hypothetical experiments.

[Figure: convergence of the relative frequency towards the true probability as a function of the base-10 logarithm of the number of trials]
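As an illustration of approach 2, the following minimal Python sketch (not from the original notes; it assumes numpy is available) simulates die throws and prints the relative frequency of the event A = "six spots" for increasing numbers of trials:

import numpy as np

rng = np.random.default_rng(seed=1)
for n in (10, 100, 1_000, 10_000, 100_000):
    throws = rng.integers(1, 7, size=n)   # n simulated die throws, values 1..6
    rel_freq = np.mean(throws == 6)       # relative frequency of the event A
    print(f"n = {n:>6d}: {rel_freq:.4f} (true p = {1/6:.4f})")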

Approach 2 forms the basis of frequentist statistics, whereas approach 3 is the baseline of Bayesian statistics. These are two different schools. When estimating parameters from a set of data, the two approaches usually give the same numbers for the estimates if there is a large amount of data; if there is little available data, the estimates might differ. There is no easy way of determining which approach is best, and both approaches are advocated in high-energy physics experiments. We will not enter any further into such questions in this course.

We will now look at probabilities of combinations of events. We need some concepts from set theory:
- The union A ∪ B is a new event which occurs if A or B or both occur.
- Two events are disjoint if they cannot occur simultaneously.
- The intersection A ∩ B is a new event which occurs if both A and B occur.
- The complement Ā is a new event which occurs if A does not occur.

[Venn diagram: the sample space S of outcomes, with events A and B, and an event C disjoint with both A and B]

The mathematical axioms of probability:
1. Probability is never negative: P(A) ≥ 0.
2. The probability of the event which corresponds to the entire sample space S, i.e. the probability of observing any of the possible outcomes of the experiment, is equal to one: P(S) = 1.
3. Probability must comply with the addition rule for disjoint events: P(A_1 ∪ A_2 ∪ … ∪ A_n) = P(A_1) + P(A_2) + … + P(A_n).
A couple of useful formulas can be derived from the axioms:
P(Ā) = 1 − P(A)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Concept of conditional probability: what is the probability of occurrence of A given that we know that B will occur, i.e. P(A|B)?

Recalling the definition of probability as the number of favourable outcomes divided by the total number of outcomes, we get
P(A|B) = N(A ∩ B)/N(B) = [N(A ∩ B)/N_tot] / [N(B)/N_tot] = P(A ∩ B)/P(B).
Example: throwing a die with A = {2, 4, 6} and B = {3, 4, 5, 6}. What is P(A|B)?
A ∩ B = {4, 6}, so P(A ∩ B) = 2/6 = 1/3 and P(B) = 4/6 = 2/3, giving
P(A|B) = (1/3)/(2/3) = 1/2.

Important observation: A ∩ B and A ∩ B̄ are disjoint, and together they make up A: A = (A ∩ B) ∪ (A ∩ B̄).

Therefore:
P(A) = P(A ∩ B) + P(A ∩ B̄) = P(A|B) P(B) + P(A|B̄) P(B̄).
Expressing P(A) in terms of a subdivision of S into a set of other, disjoint events is called the law of total probability. The general formulation of this law is
P(A) = Σ_i P(A|B_i) P(B_i),
where all {B_i} are disjoint and span the entire sample space S.

From the definition of conditional probability it follows that
P(A ∩ B) = P(A|B) P(B) = P(B|A) P(A).
A quick manipulation gives
P(B|A) = P(A|B) P(B) / P(A),
which is called Bayes' theorem.

By using the law of total probability, one ends up with the general formulation of Bayes' theorem:
P(B_i|A) = P(A|B_i) P(B_i) / Σ_j P(A|B_j) P(B_j),
which is an extremely important result in statistics. Particularly in Bayesian statistics this theorem is often used to update or refine the knowledge about a set of unknown parameters by the introduction of information from new data.
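As a hypothetical numerical illustration (the scenario and all numbers below are invented for this sketch, not taken from the notes), consider two particle hypotheses with assumed priors and likelihoods for an observed detector signal A:

# Hypothetical example: priors P(Bi) and likelihoods P(A|Bi) are invented.
priors = {"pion": 0.9, "kaon": 0.1}
likelihoods = {"pion": 0.05, "kaon": 0.60}

# Law of total probability: P(A) = sum_j P(A|Bj) P(Bj)
p_a = sum(likelihoods[h] * priors[h] for h in priors)

# Bayes' theorem: P(Bi|A) = P(A|Bi) P(Bi) / P(A)
posteriors = {h: likelihoods[h] * priors[h] / p_a for h in priors}
print(posteriors)  # approximately {'pion': 0.43, 'kaon': 0.57}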

This can be explained by a rewrite of Bayes' theorem:
P(parameters|data) ∝ P(data|parameters) P(parameters).
P(data|parameters) is often called the likelihood, P(parameters) denotes the prior knowledge of the parameters, whereas P(parameters|data) is the posterior probability of the parameters given the data. If P(parameters) cannot be deduced by any objective means, a subjective belief about its value is used in Bayesian statistics. Since there is no fundamental rule describing how to deduce this prior probability, Bayesian statistics is still debated, also in high-energy physics!

Definition of independence of the events A and B: P(A|B) = P(A), i.e. any given information about B does not affect the probability of observing A. Physically this means that the events A and B are uncorrelated. For practical applications such independence cannot be derived but rather has to be assumed, given the nature of the physical problem one intends to model. General multiplication rule for independent events A_1, A_2, …, A_n:
P(A_1 ∩ A_2 ∩ … ∩ A_n) = P(A_1) P(A_2) … P(A_n).

Stochastic or random variable: a number which can be attached to all outcomes of an experiment. Example: throwing two dice and recording the sum of the number of spots. In mathematical terminology, a random variable is a real-valued function defined over the elements of the sample space S of an experiment. A capital letter is often used to denote a random variable, for instance X. Simulation experiment: throwing two dice N times, recording the sum of spots each time and calculating the relative frequency of occurrence of each of the outcomes.

[Histograms for N = 10, 20, 100, 1000, 10^4, 10^5, 10^6 and 10^7 throws: blue columns show the observed relative frequencies, red columns the theoretically expected probabilities]
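A minimal sketch of this simulation experiment (an illustration assuming numpy, not the original code behind the figures):

import numpy as np

rng = np.random.default_rng(seed=1)
N = 1_000_000
sums = rng.integers(1, 7, size=N) + rng.integers(1, 7, size=N)

for s in range(2, 13):
    exact = (6 - abs(s - 7)) / 36     # exact probability of the sum s
    observed = np.mean(sums == s)     # observed relative frequency
    print(f"sum {s:2d}: observed {observed:.4f}, exact {exact:.4f}")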

The relative frequencies seem to converge towards the theoretically expected probabilities. Such a diagram is an expression of a probability distribution: a list of all different values of a random variable together with the associated probabilities. Mathematically it is a function f(x) = P(X = x), defined for all possible values x of X given by the experiment at hand. The values of X can be discrete, as in the previous example, or continuous. For continuous x, f(x) is called a probability density function. Simulation experiment: heights of Norwegian men, collecting data and calculating relative frequencies of occurrence in intervals of various widths.

[Histograms of the height data with interval widths 10 cm, 5 cm, 1 cm and 0.5 cm; in the limit of zero interval width one obtains a continuous probability distribution]

Cumulative distribution function: F(a) = P(X ≤ a).
For discrete random variables: F(a) = Σ_{x_i ≤ a} f(x_i) = Σ_{x_i ≤ a} P(X = x_i).
For continuous random variables: F(a) = ∫_{−∞}^{a} f(x) dx.

It follows that P(a < X ≤ b) = F(b) − F(a). For continuous variables: P(a < X ≤ b) = ∫_a^b f(x) dx.
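For instance, with scipy one can evaluate such probabilities directly from the cumulative distribution function; the choice of a Gaussian and its parameters here is purely illustrative:

from scipy.stats import norm

mu, sigma = 180.0, 7.0   # illustrative mean and standard deviation (cm)
a, b = 175.0, 190.0
p = norm.cdf(b, loc=mu, scale=sigma) - norm.cdf(a, loc=mu, scale=sigma)
print(f"P({a} < X <= {b}) = {p:.4f}")   # F(b) - F(a)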

[Figures: pdf with the shaded area showing P(a < X < b), P(X < b) and P(X > a), respectively]

A function u(X) of a random variable X is also a random variable. The expectation value of such a function is
E[u(X)] = ∫ u(x) f(x) dx.
Two very important special cases are
the mean, μ = E[X] = ∫ x f(x) dx,
and the variance, σ² = Var(X) = E[(X − μ)²] = ∫ (x − μ)² f(x) dx.

The mean μ is the most important measure of the centre of the distribution of X. The variance, or its square root σ, the standard deviation, is the most important measure of the spread of the distribution of X around the mean. The mean is the first moment of X, whereas the variance is the second central moment of X. In general, the n-th moment of X is
E[X^n] = ∫ x^n f(x) dx.

The n-th central moment is
m_n = E[(X − μ)^n] = ∫ (x − μ)^n f(x) dx.
Another measure of the centre of the distribution of X is the median, defined by
F(x_med) = 1/2,
or, in words, the value of X above which half of the probability lies and below which the other half lies.

Assume now that X and Y are two random variables with a joint probability density function (pdf) f(x, y). The marginal pdf of X is
f_1(x) = ∫ f(x, y) dy,
whereas the marginal pdf of Y is
f_2(y) = ∫ f(x, y) dx.

The mean values of X and Y are
μ_X = ∫∫ x f(x, y) dx dy = ∫ x f_1(x) dx,
μ_Y = ∫∫ y f(x, y) dx dy = ∫ y f_2(y) dy.
The covariance of X and Y is
cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = E[XY] − μ_X μ_Y.
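A small sketch (assuming numpy) estimating these quantities from simulated data, where Y is made correlated with X by construction:

import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)       # Y correlated with X by construction

print(np.cov(x, y))                          # 2x2 sample covariance matrix
print(np.mean(x * y) - x.mean() * y.mean())  # cov via E[XY] - mu_X mu_Y, about 0.5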

If several random variables are considered simultaneously, one frequently arranges the variables in a stochastic or random vector X = (X_1, X_2, …, X_n)^T. The covariances are then naturally displayed in a covariance matrix:
cov(X) = | cov(X_1, X_1)  cov(X_1, X_2)  …  cov(X_1, X_n) |
         | cov(X_2, X_1)  cov(X_2, X_2)  …  cov(X_2, X_n) |
         |       …              …        …        …       |
         | cov(X_n, X_1)  cov(X_n, X_2)  …  cov(X_n, X_n) |

If two variables X and Y are independent, the joint pdf can be written
f(x, y) = f_1(x) f_2(y).
The covariance of X and Y vanishes in this case (why?), and the variances add: V(X + Y) = V(X) + V(Y). If X and Y are not independent, the general formula is V(X + Y) = V(X) + V(Y) + 2 cov(X, Y). For n mutually independent random variables the covariance matrix becomes diagonal, i.e. all off-diagonal terms are identically zero.

If a random vector Y = (Y_1, Y_2, …, Y_n) is related to a vector X with pdf f(x) by a function Y(X), the pdf of Y is
g(y) = f(x(y)) |J|,
where |J| is the absolute value of the determinant of the matrix J. This matrix is the so-called Jacobian of the transformation from Y to X:
J = | ∂x_1/∂y_1  …  ∂x_1/∂y_n |
    |     …      …      …     |
    | ∂x_n/∂y_1  …  ∂x_n/∂y_n |

The transformation of the covariance matrix is
cov(Y) = J⁻¹ cov(X) (J⁻¹)^T,
where the inverse of J is
J⁻¹ = | ∂y_1/∂x_1  …  ∂y_1/∂x_n |
      |     …      …      …     |
      | ∂y_n/∂x_1  …  ∂y_n/∂x_n |
The transformation from x to y must be one-to-one, such that the inverse functional relationship exists.

Obtaining cov(Y) from cov(X) as in the previous slide is a much-used technique in high-energy physics data analysis. It is called linear error propagation and is applicable any time one wants to transform from one set of estimated parameters to another:
- transformation between different sets of parameters describing a reconstructed particle track,
- transport of track parameters from one location in a detector to another.
We will see examples later in the course; a minimal sketch is given below.
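The following sketch (an illustration with invented numbers, not a track-parameter example from the notes) propagates a covariance matrix through the transformation from Cartesian coordinates (x, y) to polar coordinates (r, φ):

import numpy as np

x, y = 3.0, 4.0                       # illustrative measured values
cov_xy = np.array([[0.010, 0.002],
                   [0.002, 0.040]])   # illustrative covariance matrix

r = np.hypot(x, y)
# Derivatives of (r, phi) with respect to (x, y), i.e. the matrix J^{-1}:
J_inv = np.array([[ x / r,     y / r   ],
                  [-y / r**2,  x / r**2]])

cov_rphi = J_inv @ cov_xy @ J_inv.T   # cov(Y) = J^{-1} cov(X) (J^{-1})^T
print(cov_rphi)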

The characteristic function Φ(u) associated with the pdf f(x) is the Fourier transform of f(x):
Φ(u) = E[e^{iuX}] = ∫ e^{iux} f(x) dx.
Such functions are useful for deriving results about the moments of random variables. The relation between Φ(u) and the moments of X is
d^nΦ/du^n |_{u=0} = i^n ∫ x^n f(x) dx = i^n E[X^n].
If Φ(u) is known, all moments of f(x) can be calculated without knowledge of f(x) itself.
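As a check of this relation, the following sympy sketch uses the known characteristic function of the exponential distribution, Φ(u) = λ/(λ − iu), and recovers its moments E[X^n] = n!/λ^n (the exponential example is chosen here for illustration; it is not from the notes):

import sympy as sp

u = sp.Symbol("u", real=True)
lam = sp.Symbol("lam", positive=True)
phi = lam / (lam - sp.I * u)              # characteristic function of Exp(lam)

for n in (1, 2, 3):
    # E[X^n] = i^{-n} * d^n Phi / du^n evaluated at u = 0
    moment = sp.simplify(sp.diff(phi, u, n).subs(u, 0) / sp.I**n)
    print(n, moment)                      # prints n!/lam**n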

Some common probability distributions:
- Binomial distribution
- Poisson distribution
- Gaussian distribution
- Chi-square distribution
- Student's t distribution
- Gamma distribution
We will take a closer look at some of them.

Binomial distribution: assume that we make n identical experiments with only two possible outcomes, success or no success, where the probability of success p is the same for all experiments and the individual experiments are independent of each other. The probability of x successes out of n trials is then
P(X = x) = C(n, x) p^x (1 − p)^(n−x), x = 0, 1, …, n,
where C(n, x) = n!/(x!(n − x)!) is the binomial coefficient. Example: throwing a die n times, defining the event of success to be the occurrence of six spots in a throw; the probability of success is p = 1/6.

[Figures: binomial probability distributions for the number of successes in 5, 15 and 50 throws. Anything familiar about the shape of the last distribution?]

Mean value and variance:
E[X] = np, Var(X) = np(1 − p).
Five throws of a die:
E[# six spots] = 5/6, Var[# six spots] = 25/36, Std[# six spots] = 5/6.
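These values can be checked with scipy, a minimal sketch:

from scipy.stats import binom

n, p = 5, 1 / 6
for x in range(n + 1):
    print(x, binom.pmf(x, n, p))   # P(X = x) = C(n, x) p^x (1-p)^(n-x)

print("mean:", binom.mean(n, p))   # np = 5/6
print("var: ", binom.var(n, p))    # np(1-p) = 25/36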

Poisson distribution: the number of occurrences of an event A per given time length, area, volume or interval, when the average number of occurrences is constant and equal to λ. The probability distribution of observing x occurrences in the interval is
P(X = x) = λ^x e^{−λ} / x!.
Both the mean value and the variance of X equal λ. Example: the number of particles in a beam passing through a given area in a given time is Poisson distributed. If the average number λ is known, the probabilities for all x can be calculated according to the formula above.
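A short sketch (with an invented beam intensity λ = 3 per interval, purely for illustration) evaluating the Poisson formula via scipy:

from scipy.stats import poisson

lam = 3.0   # illustrative average number of particles per interval
for x in range(8):
    print(x, poisson.pmf(x, lam))           # P(X = x) = lam^x e^{-lam} / x!
print(poisson.mean(lam), poisson.var(lam))  # both equal lam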

Gaussian distribution: the most frequently occurring distribution in nature. Most measurement uncertainties, disturbances of the directions of charged particles penetrating through enough matter, the number of ionizations created by a charged particle in a slab of material etc. follow a Gaussian distribution. The main reason is the CENTRAL LIMIT THEOREM, which states that a sum of n independent random variables converges to a Gaussian distribution when n is large enough, irrespective of the individual distributions of the variables. The above-mentioned examples are typically of this type; a quick numerical check is sketched below.
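A quick numerical check of the theorem (a sketch assuming numpy; uniform variables are chosen only because individually they look nothing like a Gaussian):

import numpy as np

rng = np.random.default_rng(seed=1)
n, trials = 50, 100_000
sums = rng.uniform(size=(trials, n)).sum(axis=1)   # 100000 sums of 50 uniforms

# The sums should be approximately Gaussian with mean n/2 and std sqrt(n/12):
print(sums.mean(), n / 2)
print(sums.std(), np.sqrt(n / 12))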

Gaussian probability density function with mean value μ and standard deviation σ:
f(x; μ, σ) = 1/(√(2π) σ) exp(−(x − μ)²/(2σ²)).
For a random vector X of size n with mean value vector μ and covariance matrix V, the corresponding function is the multivariate Gaussian distribution:
f(x; μ, V) = 1/((2π)^{n/2} √(det V)) exp(−(1/2) (x − μ)^T V⁻¹ (x − μ)).

Usual terminology: X ~ N(μ, σ) means that X is distributed according to a Gaussian (normal) distribution with mean value μ and standard deviation σ.
- 68 % of the distribution lies within plus/minus one σ.
- 95 % of the distribution lies within plus/minus two σ.
- 99.7 % of the distribution lies within plus/minus three σ.
Standard normal variable Z ~ N(0, 1): Z = (X − μ)/σ.
Quantiles of the standard normal distribution: the value z_α satisfying P(Z > z_α) = α, equivalently P(Z ≤ z_α) = 1 − α, is denoted the 100·α % quantile of the standard normal distribution. Such quantiles can be found in tables or by computer programs.

[Figures: standard normal density showing the 10 % quantile, the 5 % quantile (1.64), and 95 % of the area within plus/minus the 2.5 % quantile (1.96)]
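With scipy, the quantiles quoted in the figures can be reproduced via the inverse of the cumulative distribution function (a sketch):

from scipy.stats import norm

for alpha in (0.10, 0.05, 0.025):
    z = norm.ppf(1 - alpha)   # z_alpha such that P(Z > z_alpha) = alpha
    print(f"{100 * alpha:4.1f} % quantile: {z:.2f}")   # 1.28, 1.64, 1.96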

χ² distribution: if X_1, X_2, …, X_n are independent Gaussian random variables with means μ_i and standard deviations σ_i, then
Y = Σ_{i=1}^n (X_i − μ_i)²/σ_i²
follows a χ² distribution with n degrees of freedom. It is often used for evaluating the level of compatibility between observed data and the assumed pdf of the data. Example: is the position of a measurement in a particle detector compatible with the assumed distribution of the measurement? The mean value is n and the variance is 2n.

[Figure: χ² distribution with 10 degrees of freedom]
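A minimal compatibility-test sketch along these lines (measurements and uncertainties invented for illustration; assumes numpy and scipy):

import numpy as np
from scipy.stats import chi2

measured = np.array([1.1, 0.9, 1.3])   # illustrative measurements
expected = np.array([1.0, 1.0, 1.0])   # assumed true values
sigma    = np.array([0.1, 0.1, 0.2])   # assumed uncertainties

chisq = np.sum(((measured - expected) / sigma) ** 2)
p_value = chi2.sf(chisq, df=len(measured))   # probability of a larger chi-square
print(chisq, p_value)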