Lecture 1: Probability Fundamentals


Lecture 1: Probability Fundamentals
IB Paper 7: Probability and Statistics
Carl Edward Rasmussen
Department of Engineering, University of Cambridge
January 22nd, 2008

Lecture overview

There will be six lectures on Probability and Statistics, covering roughly:

1. What is probability and why is it useful? Types of probability, observations, inference, information and entropy.
2. Characterization of probability distributions: mean, variance, median, mode, moments and entropy.
3. Discrete random variables and distributions: properties of the Bernoulli, Binomial and Poisson distributions.
4. Continuous random variables and distributions: properties of the Beta, Gaussian and Exponential distributions.
5. The multivariate Gaussian and the Central Limit Theorem: joint, conditional and marginal distributions, and covariance.
6. Tests and confidence intervals: null hypothesis, one- and two-sided testing.

All materials will be available at mlg.eng.cam.ac.uk/teaching.

Today's Lecture: Motivation

What are probability, probability theory and statistics, and what are they useful for?

- Making inferences about uncertain events
- Forming the basis of information theory
- Testing the strength of statistical evidence

Example: Three different laboratories have measured the speed of light, with slightly differing results. What is the true speed likely to be?

Example: Two drugs are compared. Five out of nine patients responded to treatment with drug A, whereas seven out of ten responded to drug B. What do you conclude?

Examples of places where uncertainty plays a role: medical diagnosis, scientific measurements, speech recognition (human or artificial), budgets, ...

Probability is useful, since there is uncertainty everywhere.

What is Probability and Statistics?

Probability is used to quantify the extent to which an uncertain event is likely to occur. Probability theory is a calculus of uncertain events: it enables one to infer probabilities of interest based on assumptions and observations.

Example: The probability of getting 2 heads when tossing a pair of coins is 1/4, since probability theory tells us to multiply the probabilities of the (independent) individual outcomes.

Whereas probability theory is uncontroversial, the meaning of probability is sometimes debated.

Statistics is concerned with the analysis of collections of observations.
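As a quick check of this 1/4, here is a minimal Python simulation sketch (a hypothetical illustration, not part of the lecture):

```python
import random

# Estimate p(two heads) by tossing a pair of fair coins many times.
trials = 100_000
two_heads = sum(
    random.random() < 0.5 and random.random() < 0.5  # both coins land heads
    for _ in range(trials)
)
print(two_heads / trials)  # close to 1/4, matching the product 1/2 * 1/2
```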

The Meaning of Probability

In Classical (or frequentist) statistics, the probability of an event is defined as its long-run frequency in a repeatable experiment.

Example: The probability of rolling a 6 with a fair die is 1/6, because this is the relative frequency of this event as the number of experiments tends to infinity.

However, some notions of chance don't lend themselves to a frequentist interpretation.

Example: In "There is a 50% chance that the arctic polar ice-cap will have melted by the year 2100", it is not possible to define a repeatable experiment. An interpretation of probability as a (subjective) degree of belief is possible here. This is also known as the Bayesian interpretation.

Both types of probability can be treated using (the same) probability theory.

What is randomness?

An event is said to be random when it is uncertain whether it is going to happen or not. But there are several possible reasons for such uncertainty. Here are two examples:

- Inherent uncertainty, e.g. whether a radioactive atom will decay within some time interval.
- Lack of knowledge: I may be uncertain about the number of legs my pet centipede has (if I haven't counted them).

Another important concept is a random sample from a population.

Classical and Bayesian inference

Example: We want to know the proportion π of blond engineering students, and we want to base our inference on the proportion observed in a randomly chosen subset of the engineering students.

Classically, we think of the true proportion as deterministic but unknown. It is the sample which is treated as random: you think about what may happen when you repeatedly draw random samples from the population.

Using Bayesian inference, we don't know the true proportion, so we treat it as random. But we do know the actual observed sample, so this is treated as deterministic.

So, curiously, we see that opposite quantities are treated as random in the two settings. Often, the two types of inference give very similar or even identical results.

When analysing an inference problem, be clear about which quantities are treated as deterministic and which as random.

Axioms of Probability

The foundations of probability are based on three axioms:

- The probability of an event $E$ is a non-negative real number: $p(E) \geq 0$ for all $E \subseteq \Omega$, where $\Omega$ is the sample space.
- The certain event has unit probability: $p(\Omega) = 1$.
- (Countable) additivity: for disjoint events $E_1, E_2, \ldots, E_n$,
$$p(E_1 \cup E_2 \cup \ldots \cup E_n) = \sum_{i=1}^{n} p(E_i).$$

Remarkably, these axioms are sufficient.

Some consequences

- Complement rule: $p(\Omega \setminus E) = p(\neg E) = 1 - p(E)$.
- Impossible event: $p(\emptyset) = 0$.
- If $E_1 \subseteq E_2$ then $p(E_1) \leq p(E_2)$.
- General addition rule: $p(E_1 \cup E_2) = p(E_1) + p(E_2) - p(E_1 \cap E_2)$.

As an exercise, prove these!

Generally, we write the probability of the intersection event as $p(E_1 \cap E_2) = p(E_1, E_2)$.
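These identities can be checked numerically. Here is a minimal Python sketch over events on a fair die (a hypothetical illustration, not from the slides):

```python
from fractions import Fraction

# Sample space for one fair die; each outcome has probability 1/6.
omega = {1, 2, 3, 4, 5, 6}

def p(event):
    """Probability of an event (a subset of omega) under the uniform measure."""
    return Fraction(len(event & omega), len(omega))

E1 = {2, 4, 6}  # "even"
E2 = {4, 5, 6}  # "at least 4"

# Complement rule: p(not E) = 1 - p(E)
assert p(omega - E1) == 1 - p(E1)

# General addition rule: p(E1 or E2) = p(E1) + p(E2) - p(E1 and E2)
assert p(E1 | E2) == p(E1) + p(E2) - p(E1 & E2)

print(p(E1 | E2))  # 4/6 = 2/3
```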

The Venn Diagram and Conditional Probability

Events can sometimes usefully be visualised in a Venn diagram.

[Venn diagram: two overlapping events A and B]

The intersection of A and B corresponds to both events happening: $p(A, B)$.

By the addition axiom: $p(A) = p(A, B) + p(A, \neg B)$.

Conditional probability, the probability of A given that we already know B, is defined as
$$p(A \mid B) = \frac{p(A, B)}{p(B)},$$
assuming that $p(B) \neq 0$.
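To make the definition concrete, a small Python sketch (hypothetical events, not from the slides) computing $p(A \mid B)$ for die events:

```python
from fractions import Fraction

# Outcomes of one fair die, all equally likely.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # "even"
B = {4, 5, 6}  # "at least 4"

p = lambda E: Fraction(len(E), len(omega))

# p(A | B) = p(A, B) / p(B): restrict attention to outcomes where B holds.
print(p(A & B) / p(B))  # 2/3: of the outcomes {4, 5, 6}, two are even
```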

Example: Inference in Medical Diagnosis

A rare disease occurs with probability $p(D) = 0.01$. A screening test exists, which detects the disease with probability $p(T \mid D) = 0.99$. However, the test has a false positive rate of $p(T \mid \neg D) = 0.05$. A patient takes the test, and the result is positive. What is the probability that she has the disease?

We are looking for $p(D \mid T)$:
$$p(D \mid T) = \frac{p(D, T)}{p(T)} = \frac{p(T \mid D)\,p(D)}{p(T)} = \frac{p(T \mid D)\,p(D)}{p(T, D) + p(T, \neg D)} = \frac{p(T \mid D)\,p(D)}{p(T \mid D)\,p(D) + p(T \mid \neg D)\,p(\neg D)} = \frac{1}{6}.$$

Despite the positive test result, it is still more likely that she does not have the disease.

The identity $p(D \mid T) = p(T \mid D)\,p(D)/p(T)$ (marked in blue on the original slide) is known as Bayes' rule.
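The same computation in a few lines of Python, a sketch of the slide's arithmetic:

```python
# Posterior probability of disease given a positive test, via Bayes' rule.
p_D = 0.01             # prior: p(D)
p_T_given_D = 0.99     # sensitivity: p(T | D)
p_T_given_notD = 0.05  # false positive rate: p(T | not D)

# Total probability: p(T) = p(T | D) p(D) + p(T | not D) p(not D)
p_T = p_T_given_D * p_D + p_T_given_notD * (1 - p_D)
p_D_given_T = p_T_given_D * p_D / p_T

print(p_D_given_T)  # 0.1666... = 1/6
```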

Random Variables

"A random variable is an abstraction of the intuitive concept of chance into the theoretical domains of mathematics, forming the foundations of probability theory and mathematical statistics." [Wikipedia]

Throughout, we'll use intuitive notions of random variables, and won't even bother defining them precisely.

Sloppy definition: a random variable associates a numerical value with the outcome of a random experiment (measurement).

Example: a random variable $X$ takes values from $\{1, \ldots, 6\}$, reflecting the number of eyes showing when rolling a die.

Example: a random variable $Y$ takes values in $\mathbb{R}^+$, reflecting the measured car velocity in a radar speed detector.

Probability distributions

The probability function specifies the probability with which the random variable $X$ takes on the value $x$, written $p(X = x)$.

Example: $X$ represents the number of eyes on a fair die. The probability of rolling a 5 is 1/6, written $p(X = 5) = 1/6$.

The notation $p(X = x)$ is precise but a bit pedantic. Sometimes we use the shorthand $p(x)$, when it is clear from the context which random variable we are talking about.

The cumulative probability function $F(x)$ is related to the probability function through
$$F(x) = p(X \leq x).$$
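A minimal sketch of a probability function and its cumulative counterpart, using a fair die (hypothetical code, not from the lecture):

```python
from fractions import Fraction

# Probability function of a fair die: p(X = x) = 1/6 for x in 1..6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def F(x):
    """Cumulative probability function F(x) = p(X <= x)."""
    return sum(p for value, p in pmf.items() if value <= x)

print(F(3))  # 1/2: p(X <= 3) = 3/6
```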

Example

Let $Z$ be the sum of the values of two fair dice. Altogether, there are 36 possible outcomes for the two dice. For each value of the random variable, the probability is the number of outcomes which agree with this value of $Z$, divided by the total number of outcomes.

[Bar chart: probability $p(z)$ against value $z = 2, \ldots, 12$; the distribution is triangular, rising from 1/36 at $z = 2$ to a peak of 6/36 at $z = 7$, then falling back to 1/36 at $z = 12$]

For example, you can get $Z = 4$ in 3 possible ways: (1,3), (2,2) and (3,1).
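The full distribution of $Z$ can be tabulated by enumerating all 36 outcomes; a short Python sketch:

```python
from collections import Counter
from fractions import Fraction

# Count how many of the 36 equally likely outcomes give each sum Z.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
pmf = {z: Fraction(n, 36) for z, n in sorted(counts.items())}

for z, p in pmf.items():
    # Fractions print in lowest terms, e.g. p(Z = 4) = 3/36 = 1/12
    print(z, p)
```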

Simple properties of probability distributions

The mean of a random variable is defined as
$$E[X] = \sum_i x_i\, p(x_i) = \sum_x x\, p(x).$$
The mean is also called the average or expectation.

Example: The mean number of eyes when rolling a fair die is 3.5.

Example: The mean number of days in a calendar month is 365.25/12.

Note: the mean doesn't have to correspond to a possible event.

The mean provides a very basic but useful characterization of a probability distribution. Examples: average income, life expectancy, radioactive half-life, etc.

You can also take expectations of functions of random variables:
$$E[f(X)] = \sum_i f(x_i)\, p(x_i).$$
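A small sketch of these definitions for a fair die (hypothetical code; `expectation` is an illustrative helper, not from the lecture):

```python
from fractions import Fraction

# Probability function of a fair die
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(pmf, f=lambda x: x):
    """E[f(X)] = sum_i f(x_i) p(x_i)."""
    return sum(f(x) * p for x, p in pmf.items())

print(expectation(pmf))                    # 7/2 = 3.5
print(expectation(pmf, f=lambda x: x**2))  # E[X^2] = 91/6
```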

Randomness, Surprise and Entropy

How random is something? The surprise of an event is given by
$$\text{surprise}(x_i) = -\log(p(x_i)).$$
The lower the probability, the more surprised we are when the event occurs.

The mean or average surprise is called the entropy, and quantifies the amount of randomness:
$$\text{entropy}(p) = E[-\log(p)] = -\sum_i p(x_i) \log(p(x_i)).$$

Example: The entropy associated with flipping a fair coin is $-2 \times \frac{1}{2} \log(\frac{1}{2}) = 1$ bit (when the log is taken to base 2). An unfair coin has lower entropy. Why?

Example: The entropy of a fair die is 2.6 bits.

Note the strong indication that information and probability are intricately linked.
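A minimal entropy calculation reproducing the coin and die examples (hypothetical code, not from the slides):

```python
from math import log2

def entropy(probs):
    """Entropy in bits: -sum_i p_i log2(p_i), skipping zero-probability events."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # fair coin: 1.0 bit
print(entropy([0.9, 0.1]))  # unfair coin: ~0.47 bits (lower entropy)
print(entropy([1/6] * 6))   # fair die: ~2.585 bits
```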