PHASES OF STATISTICAL ANALYSIS

1. Initial Data Manipulation
   - Assembling data
   - Checks of data quality - graphical and numeric
2. Preliminary Analysis: Clarify Directions for Analysis
   - Identifying data structure: What is to be regarded as an individual? Are individuals grouped or associated? What types of variables are measured on each individual?
   - Response versus explanatory variables
   - Are any observations missing?
3. Definitive Analysis
   - Formulation of model, fitting, ...
4. Presentation of Conclusions

STYLES OF ANALYSIS

Descriptive Methods
- Graphical
- Numerical summaries

Probabilistic Methods
- probability model for data
- probability model for any uncertain quantities
- probabilistic properties of estimates and uncertainties
- graphics help convey results
- numerical statements for results/conclusions

TYPES OF STUDIES

Experiments
- RCT: randomize treatments A and B; outcome is survival
- Difference in responses due to treatment

Observational Studies (no control by investigator)
- Hospital records of treatments and survival
- Other causes? Caution with conclusions!

Prospective Studies (individuals selected by investigator)
- random sample of smokers/non-smokers; follow-up response cancer/non-cancer

Retrospective Studies (individuals selected by investigator)
- sample of cancer/non-cancer; identify differences in explanatory variables

ANALYSIS GOALS

- Estimation of uncertain quantities
- Prediction of future events
- Tests of hypotheses

Theoretical framework based on probability models

Probability: Measurement of uncertainty

Interpretations?
- Counting: equally likely outcomes (card and dice games, sampling)
- Frequency: Hurricane in September? Relative frequency from similar years
- Subjective: measure of belief, judgment
  - can include the counting and frequency cases
  - also applies to unique events: the Duke/E.NC game, the probability of terrorist attacks
  - expert opinion, computer simulations

Probability Models

At least two physical meanings for probability models:

1. A well-defined population; individuals (the sample) are drawn from the population in a random way
   - could conceivably measure the entire population
   - probability model specifies properties of the full population
2. Observations are made on a system subject to random fluctuations
   - probability model specifies what would happen if, hypothetically, observations were repeated again and again under the same conditions
   - the population is hypothetical

Models almost always involve unknown parameters!
Models are almost always an approximation!

Simplest statistical inference context: DIAGNOSIS example

x = 0, 1: Data (to be observed), the results of the experiment/observation
- (Binary) sample space x ∈ {0, 1}: discrete, simple; the test outcome

θ = 0, 1: Parameter, truth, state of nature, uncertain/unknown
- Primary objective: inference on θ
- (Binary) parameter space θ ∈ {0, 1}: discrete, simple; the disease state

Uncertainty about outcomes is described by probabilities p(x | θ) (conditional on θ)

NOTATION: p(x | θ), p(x = 1 | θ = 1), Pr(x = 1 | θ = 1)

p(x | θ) defines the sampling model
- Where do these probabilities come from?
- Measures of likelihood of the observed data given states of nature: x = 1 is 9.9 times as likely when θ = 1 as when θ = 0

Likelihood ratios: r(x) = p(x | θ = 1) / p(x | θ = 0)
- r(1) = 9.9, r(0) = 0.0111...
- Evidence, information to compare θ = 1 : 0 given the data x
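A minimal Python sketch of the likelihood ratio. The two sampling probabilities (0.99 and 0.10) are not given on the slide; they are assumed here only because they reproduce r(1) = 9.9 and r(0) = 0.0111...

    # Hypothetical test probabilities chosen to reproduce the slide's ratios:
    # p(x=1 | theta=1) = 0.99, p(x=1 | theta=0) = 0.10.
    P_POS_GIVEN_DISEASE = 0.99
    P_POS_GIVEN_HEALTHY = 0.10

    def likelihood_ratio(x):
        """r(x) = p(x | theta=1) / p(x | theta=0) for the binary test outcome x."""
        if x == 1:
            return P_POS_GIVEN_DISEASE / P_POS_GIVEN_HEALTHY
        return (1 - P_POS_GIVEN_DISEASE) / (1 - P_POS_GIVEN_HEALTHY)

    print(round(likelihood_ratio(1), 4))  # 9.9
    print(round(likelihood_ratio(0), 4))  # 0.0111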

Odds of disease (initial or prior odds): Odds = Prob / (1 - Prob), i.e.

    o(θ = 1) = p(θ = 1) / (1 - p(θ = 1)) = p(θ = 1) / p(θ = 0)

Prob = Odds / (1 + Odds), i.e. p(θ = 1) = o(θ = 1) / (1 + o(θ = 1))

- p(θ = 1) = 0.5 and o(θ = 1) = 1
- p(θ = 1) = 0.01 and o(θ = 1) = 0.0101...
- p(θ = 1) = 0.99 and o(θ = 1) = 99
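The conversions are easy to check in code; this small sketch reproduces the three examples above.

    def odds(p):
        """Odds = Prob / (1 - Prob)."""
        return p / (1 - p)

    def prob(o):
        """Prob = Odds / (1 + Odds)."""
        return o / (1 + o)

    for p in (0.5, 0.01, 0.99):
        o = odds(p)
        print(p, round(o, 4), round(prob(o), 4))  # 0.5 -> 1.0, 0.01 -> 0.0101, 0.99 -> 99.0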

JOINT PROBABILITY DISTRIBUTION: p(x, θ) = p(x | θ) p(θ)

Marginal distribution for x via the Law of Total Probability:

    p(x) = p(x | θ = 0) p(θ = 0) + p(x | θ = 1) p(θ = 1)

Bayes theorem:

    p(θ | x) = p(θ) p(x | θ) / p(x)

- Bayes theorem reverses the conditioning: now x is fixed, conditioned upon
- The initial/prior probability distribution for θ is revised/updated
- Final probability = posterior probability p(θ | x)
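A short sketch of the marginal p(x) and the posterior p(θ | x) for the binary diagnosis example. The sampling probabilities are the same hypothetical 0.99/0.10 values assumed earlier, and the prior p(θ = 1) = 0.01 is likewise only an illustration.

    # Hypothetical values, for illustration only.
    PRIOR_DISEASE = 0.01                 # p(theta = 1)
    P_POS = {1: 0.99, 0: 0.10}           # p(x = 1 | theta)

    def posterior_disease(x):
        """p(theta = 1 | x) via Bayes theorem; p(x) via the law of total probability."""
        p_x_given = {t: P_POS[t] if x == 1 else 1 - P_POS[t] for t in (0, 1)}
        p_x = p_x_given[0] * (1 - PRIOR_DISEASE) + p_x_given[1] * PRIOR_DISEASE
        return p_x_given[1] * PRIOR_DISEASE / p_x

    print(round(posterior_disease(1), 4))  # 0.0909: a positive test updates 0.01 to about 0.09
    print(round(posterior_disease(0), 4))  # 0.0001: a negative test pushes it near zero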

In odds form, o(θ = 1 | x) = o(θ = 1) r(x), based on the observed likelihood ratio r(x) = p(x | θ = 1) / p(x | θ = 0):

    Posterior odds = Prior odds × Likelihood ratio

Scientific learning/statistical inference: p(θ) → p(θ | x)
- Mapping via the likelihood function p(x | θ) from the sampling model
- Formally, p(θ | H) → p(θ | H, x), where H = prior information
- Technically: simply apply Bayes theorem
- Final inferences depend on both the data x and the prior information H

Sensitivity to prior: any prior a = p(θ = 1) implies a posterior p(θ | x)
- Variation with respect to a for given data, hence fixed r(x)...? Graph p(θ = 1 | x) versus a (sketched below)
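The suggested graph of p(θ = 1 | x) against the prior a follows directly from the odds-form update; r = 9.9 below is the hypothetical likelihood ratio r(1) assumed earlier.

    R_POSITIVE = 9.9   # assumed likelihood ratio r(1) from the earlier sketch

    def posterior_prob(a, r=R_POSITIVE):
        """p(theta = 1 | x) from the prior a via posterior odds = prior odds * r(x)."""
        posterior_odds = (a / (1 - a)) * r
        return posterior_odds / (1 + posterior_odds)

    for a in (0.01, 0.1, 0.5, 0.9):
        print(a, round(posterior_prob(a), 3))
    # 0.01 -> 0.091, 0.1 -> 0.524, 0.5 -> 0.908, 0.9 -> 0.989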

Statistics By Example

Your mischievous brother owns a coin which you know to be loaded so that it comes up heads 70% of the time. He comes to you with a coin and wants to make a bet with you. You don't know if this is the loaded coin or a fair one. He lets you flip it 5 times to check it out, and you get 2 heads and 3 tails. Which coin do you think it is?

1. What is our unknown parameter of interest?
2. What are our data?
3. What is our probability model for the data?
4. How do we estimate our unknown parameter?

The Unknown Parameter of Interest

We want to know which coin it is. We could denote the coin type by θ, where θ ∈ {fair, loaded}.

The Data

Our data are the 5 flips of the coin. The observed values are 2 heads and 3 tails. We could write X for the number of heads observed in 5 flips.

The Probability Model for the Data

We use a binomial model for the number of heads, where the probability of a single flip being heads is determined by θ and flips are independent of the outcomes of other flips:

    P(X = x | θ = fair)   = (5 choose x) (1/2)^5
    P(X = x | θ = loaded) = (5 choose x) (0.7)^x (0.3)^(5-x)

For a particular value of θ, the model gives us a probability model for X (the probability of observing different values of x). Once we have observed X (here X = 2), we can think of the probability model as a function of θ alone. This gives the probability of the particular observed data as a function of θ. This function is called the likelihood and is denoted L(θ).
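A minimal sketch of this sampling model using only the Python standard library; the head probabilities 0.5 and 0.7 come from the problem statement.

    from math import comb

    def p_x_given_theta(x, theta, n=5):
        """Binomial sampling model: probability of x heads in n flips given the coin type."""
        p_heads = 0.5 if theta == "fair" else 0.7
        return comb(n, x) * p_heads**x * (1 - p_heads)**(n - x)

    # For each value of theta the model is a probability distribution over x = 0, ..., 5:
    for theta in ("fair", "loaded"):
        print(theta, [round(p_x_given_theta(x, theta), 4) for x in range(6)])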

Estimating the Unknown Parameter - I

One way to make statements about the value of θ is maximum likelihood. The idea is that our best guess of the true value of θ is the value that maximizes the likelihood.

    L(θ | X = 2) = 0.3125 if θ = fair
                 = 0.1323 if θ = loaded

So in this example, we would guess that the coin is fair.

Some problems with this approach:
- It can be hard to make useful statements of our confidence/estimation error. (In particular, we can't give a probability of being right.)
- It can be hard to incorporate other information.
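A self-contained sketch that evaluates the likelihood at X = 2 and picks its maximizer:

    from math import comb

    # Likelihood of the observed data X = 2 heads in 5 flips, as a function of theta:
    likelihood = {
        "fair":   comb(5, 2) * 0.5**2 * 0.5**3,   # 0.3125
        "loaded": comb(5, 2) * 0.7**2 * 0.3**3,   # 0.1323
    }
    print(max(likelihood, key=likelihood.get))     # 'fair' is the maximum likelihood estimate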

Estimating the Unknown Parameter - II

A different (and our preferred) approach to making statements about the value of θ is the Bayesian approach. In this case we express prior uncertainty about θ using a probability distribution. We start with a prior distribution for θ, P(θ) (representing our beliefs before seeing the data). We then observe the data and compute a posterior distribution, P(θ | X), by using Bayes Theorem:

    P(θ | X) = P(θ, X) / P(X)
             = P(X | θ) P(θ) / ∫ P(X | θ) P(θ) dθ
             ∝ L(θ) P(θ)

Example

Since you know your brother reasonably well, you might think that there is a 60% chance that this is the loaded coin, before you get to flip it 5 times. You then get 2 heads and 3 tails. What is your posterior probability that the coin is loaded?

    Pr(loaded | X) = Pr(X | loaded) Pr(loaded) / [Pr(X | loaded) Pr(loaded) + Pr(X | fair) Pr(fair)]
                   = (5 choose 2)(.7)^2(.3)^3(.6) / [(5 choose 2)(.7)^2(.3)^3(.6) + (5 choose 2)(.5)^2(.5)^3(.4)]
                   = 0.07938 / (0.07938 + 0.125)
                   ≈ 0.388
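This posterior is a short computation; the sketch below reproduces the 0.388.

    from math import comb

    prior_loaded = 0.6
    lik_loaded = comb(5, 2) * 0.7**2 * 0.3**3     # Pr(X = 2 | loaded) = 0.1323
    lik_fair   = comb(5, 2) * 0.5**5              # Pr(X = 2 | fair)   = 0.3125

    posterior_loaded = (lik_loaded * prior_loaded /
                        (lik_loaded * prior_loaded + lik_fair * (1 - prior_loaded)))
    print(round(posterior_loaded, 3))              # 0.388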

Prior Sensitivity

We could have made a different choice for a prior:
- Pr(loaded) = 0.9 gives Pr(loaded | X) = 0.792
- Pr(loaded) = 0.5 gives Pr(loaded | X) = 0.297

Note that the non-informative prior gives the same answer as the MLE (with Pr(loaded) = 0.5, the posterior still favors the fair coin). Also note that since the sample size is small, the result is highly sensitive to the choice of prior. As the sample size increases, the prior matters less.

Advantages of this approach:
- Our answer is a probability distribution for θ, so we can make probability statements and error estimates.
- We can incorporate other information into the prior.
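Sweeping the same calculation over a grid of priors reproduces these numbers and traces out the curve shown on the next slide.

    from math import comb

    LIK_LOADED = comb(5, 2) * 0.7**2 * 0.3**3     # Pr(X = 2 | loaded) = 0.1323
    LIK_FAIR   = comb(5, 2) * 0.5**5              # Pr(X = 2 | fair)   = 0.3125

    def posterior_loaded(prior):
        """Posterior Pr(loaded | X = 2) as a function of the prior Pr(loaded)."""
        return LIK_LOADED * prior / (LIK_LOADED * prior + LIK_FAIR * (1 - prior))

    for prior in (0.1, 0.3, 0.5, 0.6, 0.9):
        print(prior, round(posterior_loaded(prior), 3))
    # 0.5 -> 0.297, 0.6 -> 0.388, 0.9 -> 0.792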

[Figure: posterior Pr(loaded | X) plotted against the prior Pr(loaded), both axes from 0.0 to 1.0.]