Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber

Similar documents
Preliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com

Review of Probabilities and Basic Statistics

Introduction to Mobile Robotics Probabilistic Robotics

M378K In-Class Assignment #1

Bayesian Inference and MCMC

Math Review Sheet, Fall 2008

Why Probability? It's the right way to look at the world.

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Classical and Bayesian inference

Bayesian Approaches Data Mining Selected Technique

Probability Review. Yutian Li. January 18, Stanford University. Yutian Li (Stanford University) Probability Review January 18, / 27

Lecture 1: Probability Fundamentals

Lecture 10: Probability distributions TUESDAY, FEBRUARY 19, 2019

Naïve Bayes classification

Probability and Inference

Probabilistic Robotics

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Probability Review. Chao Lan

Lecture 1: Bayesian Framework Basics

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

MA : Introductory Probability

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak

What is Probability? Probability. Sample Spaces and Events. Simple Event

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Sociology 6Z03 Topic 10: Probability (Part I)

Quantifying uncertainty & Bayesian networks

Basics of Stochastic Modeling: Part II

Strong Lens Modeling (II): Statistical Methods

Cartesian-product sample spaces and independence

Discrete Random Variables

Introduction to Stochastic Processes

EXAMINATIONS OF THE HONG KONG STATISTICAL SOCIETY

2. A Basic Statistical Toolbox

Modeling Uncertainty in the Earth Sciences Jef Caers Stanford University

Review of Statistics

Communication Theory II

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.


Counting principles, including permutations and combinations.

Consider an experiment that may have different outcomes. We are interested to know what is the probability of a particular set of outcomes.

Data Analysis and Uncertainty Part 1: Random Variables

Probability, Entropy, and Inference / More About Inference

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Uncertainty. Chapter 13

Probability Theory for Machine Learning. Chris Cremer September 2015

Bayesian Statistics Part III: Building Bayes Theorem Part IV: Prior Specification

Algorithms for Uncertainty Quantification

Outline PMF, CDF and PDF Mean, Variance and Percentiles Some Common Distributions. Week 5 Random Variables and Their Distributions

Basic Probability. Robert Platt Northeastern University. Some images and slides are used from: 1. AIMA 2. Chris Amato

Probability. Paul Schrimpf. January 23, UBC Economics 326. Probability. Paul Schrimpf. Definitions. Properties. Random variables.

Single Maths B: Introduction to Probability

Analysis of Experimental Designs

Introduction to Probability and Statistics (Continued)

A Probability Primer. A random walk down a probabilistic path leading to some stochastic thoughts on chance events and uncertain outcomes.

Review (Probability & Linear Algebra)

Basic Probability and Decisions

Discrete Random Variables

Artificial Intelligence

Recitation 2: Probability

Preliminary statistics

Aarti Singh. Lecture 2, January 13, Reading: Bishop: Chap 1,2. Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell

Chapter 4. Continuous Random Variables 4.1 PDF

Probability Theory Review Reading Assignments

5. Conditional Distributions

Probability Theory and Simulation Methods

Probability theory basics

Decision making and problem solving Lecture 1. Review of basic probability Monte Carlo simulation

Resolution or modus ponens are exact there is no possibility of mistake if the rules are followed exactly.

Dept. of Linguistics, Indiana University Fall 2015

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

STATISTICAL METHODS IN AI/ML Vibhav Gogate The University of Texas at Dallas. Propositional Logic and Probability Theory: Review

Generative Techniques: Bayes Rule and the Axioms of Probability

Machine Learning Srihari. Probability Theory. Sargur N. Srihari

Artificial Intelligence Uncertainty

Set Theory. Pattern Recognition III. Michal Haindl. Set Operations. Outline

CIS 390 Fall 2016 Robotics: Planning and Perception Final Review Questions

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Lecture 2: Repetition of probability theory and statistics

Terminology. Experiment = Prior = Posterior =

STAT J535: Introduction

Basic Probabilistic Reasoning SEG

Sampling Distributions

(c) Find the product moment correlation coefficient between s and t.

Probability Distributions

Gamma and Normal Distribuions

Introduction to Statistical Inference Self-study

Chapter 2. Probability

Formulas for probability theory and linear models SF2941

Math438 Actuarial Probability

Review: mostly probability and some statistics

Probability Review. September 25, 2015

Design of Engineering Experiments

Probabilistic Reasoning. (Mostly using Bayesian Networks)

AUTOMOTIVE ENVIRONMENT SENSORS

Probability. Paul Schrimpf. January 23, Definitions 2. 2 Properties 3

p(z)

COMP5211 Lecture Note on Reasoning under Uncertainty

Chapter 6: Functions of Random Variables

Quiz 1. Name: Instructions: Closed book, notes, and no electronic devices.

Transcription:

Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2017 1

Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields Probability Mainly a field of theoretical mathematics Deals with predicting the likelihood of events given a set of assumptions (e.g. a distribution) Statistics Field of applied mathematics Deals with the collection, analysis, and interpretation of data Manfred Huber 2017 2

Statistics Statistics deals with real world data Data collection How do we have to collect the data to get valid results Analysis of data What properties does the data have What distribution does it come from Interpretation of the data Is it different from other data What could cause issues in the data Manfred Huber 2017 3

Probability Probability is a formal framework to model likelihoods mainly used to make predictions There are two main (interchangeable) views of probability Subjective / uncertainty view: Probabilities summarize the effects of uncertainty on the state of knowledge In Bayesian probability all types of uncertainty are combined in one number Frequency view: Probabilities represent relative frequencies of events P(e) = (# of times of event e) / (# of events) Manfred Huber 2017 4

Data Modeling & Analysis Techniques Probability Theory Manfred Huber 2017 5

Probability Random variables define the entities of probability theory Propositional random variables: E.g.: IsRed, Earthquake Multivalued random variables: E.g.: Event, Color, Weather Real-Valued random variables: E.g.: Height, Weight Manfred Huber 2017 6

Axioms of Probability Probability follows a fixed set of rules Propositional random variables: P(A)![ 0..1] P(T ) = 1, P(F) = 0 P(A! B) = P(A) + P(B) " P(A # B) P(A! B) = P(A)P(B A) " P(X = x) = 1 x!{t,f} Manfred Huber 2017 7

Axioms of Probability The same axioms apply to multi-valued and continuous random variables Multi-valued variables ( X!S = {x 1,..., x N };Y,Z " S ): P(Y )![ 0..1] P(Y = S) = 1, P(Y =!) = 0 P(Y! Z) = P(Y ) + P(Z) " P(Y # Z) P(Y! Z) = P(Y )P(Z Y ) " P(X = x) = 1 x!s Manfred Huber 2017 8

Continuous Random Variables The probability of continuous random variables requires additional tools The probability of a continuous random variable to take on a specific value is often 0 for all (or almost all) possible values Probability has to be defined over ranges of values P(a! X! b) Individual assignments to random variables have to be addressed using probability densities p(x = x)![ 0.." ] Manfred Huber 2017 9

Continuous Random Variables Probability density is a measure of the increase in likelihood when adding the corresponding value to the range of values b P(a! X! b) = p(x = x)dx " a The probability density is effectively the derivative of the cumulative probability distribution p(x = x) = df(x) ; F(x) = P(!" # X # x) dx Manfred Huber 2017 10

Probability Syntax Unconditional or prior probabilities represent the state of knowledge before new observations or evidence P(H) A probability distribution gives values for all possible assignments to a random variable A joint probability distribution gives values for all possible assignments to all random variables Manfred Huber 2017 11

Conditional Probability Conditional probabilities represent the probability after certain observations or facts have been considered P(H E) is the posterior probability of H after evidence E is taken into account Bayes rule allows to derive posterior probabilities from prior probabilities P(H E) = P(E H) P(H)/P(E) Manfred Huber 2017 12

Conditional Probability Probability calculations can be conditioned by conditioning all terms Often it is easier to find conditional probabilities Conditions can be removed by marginalization P(H ) =! E P(H E)P(E) Manfred Huber 2017 13

Joint Distributions A joint distribution defines the probability values for all possible assignments to all random variables Exponential in the number of random variables Conditional probabilities can be computed from a joint probability distribution P(A B) = P(A! B) / P(B) Manfred Huber 2017 14

Inference Inference in probabilistic representation involves the computation of (conditional) probabilities from the available information Most frequently the computation of a posterior probability P(H E) form a prior probability P(H) and new evidence E Manfred Huber 2017 15

Statistics Statistics attempt to represent the important characteristics of a set of data items (or of a probability distribution) and the uncertainty contained in the set (or the distribution). Statistics represent different attributes of the probability distribution represented by the data Statistics are aimed at making it possible to analyze the data based on its important characteristics Manfred Huber 2017 16

Experiment and Sample Space A (random) experiment is a procedure that has a number of possible outcomes and it is not certain which one will occur The sample space is the set of all possible outcomes of an experiment (often denoted by S). Examples: Coin : S={H, T} Two coins: S={HH, HT, TH, TT} Lifetime of a system: S={0.. } Manfred Huber 2017 17

Statistics A number of important statistics can be used to characterize a data set (or a population from which the data items are drawn) Mean Median Mode Variance Standard deviation Manfred Huber 2017 18

Mean The arithmetic mean represents the average value of data set {X i } µ N i= = 1 N X i The arithmetic mean is the expected value of a random variable, i.e. the expected value of a data item drawn at random from a population µ = E[X ] µ Manfred Huber 2017 19

Median and Mode The median m is the middle of a distribution { X X m} = { X X m} i i The mode of a distribution is the most frequently (i.e. most likely) value i i Manfred Huber 2017 20

Variance and Standard Deviation The variance represents the spread of a distribution N In a data set {X i } an unbiased estimate s 2 for the variance can be calculated as N N-1 is often called the number of degrees of freedom of the data set The standard deviation! is the square root of the variance! 2 σ s N 2 i= = 1 2 = i= 1 ( X µ) ( X i i In the case of a sample set, s is often referred to as standard error 2 X ) N 1 2 Manfred Huber 2017 21

Moments Moments are important to characterize distributions r th moment: E "( x! a) r $ # % Important moments: Mean: Variance: Skewness: "( ) 1 E # x! 0 $ % E # x! µ "( ) 2 E # x! µ $ % "( ) 3 $ % Manfred Huber 2017 22