Probability and Statistics. Terms and concepts


Probability and Statistics
Joyeeta Dutta-Moscato
June 30, 2014

Terms and concepts

Descriptive Statistics:
  Sample vs population
  Central tendency: mean, median, mode
  Variance, standard deviation
  Normal distribution
  Cumulative distribution

Statistical Hypothesis Testing:
  Hypothesis
  Null hypothesis (H0)
  Alternate hypothesis (HA)
  Significance
  P-value
  Confidence interval

Probability: How likely is it?

How likely is a certain observation?

Possible outcomes of a coin toss: Head, Tail
P(Head) = ?  P(Tail) = ?

Possible outcomes of a die roll: 1, 2, 3, 4, 5, 6
P(1) = ?  P(2) = ?  ...  P(6) = ?

Probability of Multiple Events

Toss a coin twice. How likely are you to observe 2 Heads?
P(2 Heads) = P(Head) x P(Head)
Key condition: INDEPENDENCE

What is the DISTRIBUTION of outcomes?

Distribution of outcomes for two coin tosses:
P(2 Heads) = ¼
P(2 Tails) = ¼
P(1 Head) = P(Head, Tail) + P(Tail, Head) = ¼ + ¼ = ½
Key condition: The probabilities must sum to 1

[Figure: histogram of outcomes of 10 tosses]
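The two-toss distribution can be checked by listing every equally likely sequence of tosses and counting heads. The following is a minimal sketch that assumes a fair coin and independent tosses; the function name `heads_distribution` is illustrative and not part of the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_tosses):
    """Return {number of heads: probability} for n fair, independent tosses."""
    outcomes = list(product("HT", repeat=n_tosses))  # all equally likely sequences
    counts = Counter(seq.count("H") for seq in outcomes)
    return {k: Fraction(c, len(outcomes)) for k, c in sorted(counts.items())}


two_tosses = heads_distribution(2)
for heads, prob in two_tosses.items():
    print(f"P({heads} heads) = {prob}")

# The probabilities of all possible outcomes must sum to 1.
assert sum(two_tosses.values()) == 1
```

Calling `heads_distribution(10)` gives the exact probabilities behind a histogram of 10 tosses.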

Normal Distribution

As the number of independent (random) events grows, the distribution of their sum (or average) approaches a NORMAL or GAUSSIAN distribution.
This property is often used in statistics and science.
http://www.mathsisfun.com/data/standard-normal-distribution.html

Cumulative Distribution

The probability distribution shows the probability of the value X.
The cumulative distribution shows the probability of a value less than or equal to X.
Wikipedia: http://en.wikipedia.org/wiki/cumulative_distribution_function
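Both ideas can be seen together in the number of heads in many coin tosses. The sketch below, which assumes a fair coin and 100 tosses (values chosen only for illustration), builds the cumulative distribution by summing the probability distribution and compares it with a normal curve that has the same mean and standard deviation.

```python
import math
from itertools import accumulate


def binomial_pmf(n, p=0.5):
    """P(X = k) for k = 0..n, where X counts heads in n independent tosses."""
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


n = 100
pmf = binomial_pmf(n)
cdf = list(accumulate(pmf))  # cdf[k] = P(X <= k)

mean, sd = n * 0.5, math.sqrt(n * 0.5 * 0.5)
for k in (40, 50, 60):
    # The +0.5 is a continuity correction for approximating a discrete count.
    print(k, round(cdf[k], 4), round(normal_cdf(k + 0.5, mean, sd), 4))
```

The two printed columns should agree closely, and the agreement generally improves as the number of tosses grows.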

Statistical Hypothesis Testing

You are running experiments to test the effect of a drug on subjects. How likely is it that the effect would be observed even if no real relation exists?
If this likelihood is sufficiently small (e.g. < 1%), then it can be assumed that a real relation exists. Otherwise, any observed effect may simply be due to chance.

H0: Null hypothesis. No relation exists.
HA: Alternate hypothesis. There is some sort of relation.

The SIGNIFICANCE LEVEL is chosen a priori and determines whether H0 is rejected or not (e.g. 0.1, 0.05, 0.01).
The P-VALUE is the probability of observing an effect at least as extreme as the one measured if H0 were true.
If P-VALUE < significance level, then H0 is rejected, i.e. the result is considered STATISTICALLY SIGNIFICANT.
Wikipedia: http://en.wikipedia.org/wiki/P-value
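The decision rule can be illustrated with a simpler experiment than a drug trial. The sketch below assumes H0 is "the coin is fair" and uses a one-sided p-value, the probability of seeing at least as many heads as observed. The data (16 heads in 20 tosses) and the function name are hypothetical.

```python
import math


def one_sided_p_value(n_tosses, n_heads):
    """P(at least n_heads heads in n_tosses) if H0 (fair coin) is true."""
    favourable = sum(math.comb(n_tosses, k) for k in range(n_heads, n_tosses + 1))
    return favourable / 2**n_tosses


ALPHA = 0.01  # significance level, fixed before looking at the data

p_value = one_sided_p_value(n_tosses=20, n_heads=16)  # hypothetical data
reject_h0 = p_value < ALPHA
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

With these hypothetical counts the p-value falls below 0.01, so H0 would be rejected at that level. A two-sided test, or a different significance level, could lead to a different decision.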

Error reporting

How reliable is the measurement? (How reliable is the estimate?)
E.g. 95% CONFIDENCE INTERVAL: We are 95% confident that the true value is within this interval.
STANDARD ERROR can be used to approximate confidence intervals.
Standard error = Standard deviation of the sampling distribution

Correlation

When we say that two genes are correlated, we mean that they vary together. But how do we quantify the degree of correlation?
Pearson's r measures the extent to which two random variables are linearly related. A value of 1 indicates a perfect positive correlation (that is, as one variable increases, the other increases proportionally in linear fashion). A value of -1 indicates a perfect negative correlation. A value of 0 indicates no linear relationship.
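Both quantities are short computations. The sketch below estimates the standard error of a sample mean as the sample standard deviation divided by the square root of the sample size, and uses mean ± 1.96 standard errors as an approximate 95% confidence interval. This normal approximation is an assumption, and a t-based interval is usually preferred for small samples. The gene expression values are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% CI for the mean: mean +/- z * standard error."""
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * standard_error, mean + z * standard_error


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression levels of two genes across six samples.
gene_a = [2.1, 3.4, 4.0, 5.2, 6.8, 7.5]
gene_b = [1.0, 1.9, 2.2, 2.9, 3.6, 4.1]

low, high = mean_confidence_interval(gene_a)
print(f"95% CI for mean of gene_a: ({low:.2f}, {high:.2f})")
print(f"Pearson r = {pearson_r(gene_a, gene_b):.3f}")
```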

[Figure: examples of positive correlations and negative correlations]

What do correlations tell us?

Interesting site: http://www.tylervigen.com/
So how do we make statements of causality?
- Can ask the question: How likely is event X given an event Y?

Back to Probability

0 ≤ Prob ≤ 1
P(A) = 1 - P(A^C)   [A^C = complement of A]

If events A and B are independent (event B has no effect on the probability of event A), then:
P(A, B) = P(A) P(B)
If they are not independent, then:
P(A, B) = P(A | B) P(B)

P(A, B) = JOINT PROBABILITY of A and B
P(A | B) = CONDITIONAL PROBABILITY of A given B
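These rules can be verified on a small sample space. The sketch below assumes one roll of a fair six-sided die and treats events as sets of outcomes. The events chosen (`even`, `greater_than_3`, `at_most_2`) are illustrative and do not come from the slides.

```python
from fractions import Fraction

SAMPLE_SPACE = frozenset(range(1, 7))  # one roll of a fair six-sided die


def prob(event):
    """Probability of an event (a set of outcomes) under equally likely outcomes."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))


def conditional(a, b):
    """P(A | B) = P(A, B) / P(B)."""
    return prob(a & b) / prob(b)


even = {2, 4, 6}
greater_than_3 = {4, 5, 6}
at_most_2 = {1, 2}

# Complement rule: P(A) = 1 - P(A^C)
assert prob(even) == 1 - prob(SAMPLE_SPACE - even)

# Dependent events: the joint probability needs the conditional probability.
assert prob(even & greater_than_3) == conditional(even, greater_than_3) * prob(greater_than_3)
assert prob(even & greater_than_3) != prob(even) * prob(greater_than_3)

# Independent events: the joint probability is the plain product.
assert prob(even & at_most_2) == prob(even) * prob(at_most_2)
```

Knowing that the roll is greater than 3 raises the probability of an even number from 1/2 to 2/3, so those two events are not independent, whereas knowing the roll is at most 2 leaves it at 1/2.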

Exercise 1

We are given 2 urns, each containing a collection of colored balls. Urn 1 contains 2 white and 3 blue balls; Urn 2 contains 3 white and 4 blue balls. A ball is drawn at random from urn 1 and put into urn 2, and then a ball is picked at random from urn 2 and examined. What is the probability that the ball is blue?

Bayes' Theorem

P(A | B) = P(B | A) P(A) / P(B)

How?
P(A, B) = P(A | B) P(B)
or
P(A | B) = P(A, B) / P(B)
Also,
P(A, B) = P(B, A)
P(B, A) = P(B | A) P(A)
so
P(A | B) = P(B | A) P(A) / P(B)

This is equivalent to:
P(A | B) = P(B | A) P(A) / (P(B | A) P(A) + P(B | A^C) P(A^C))
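One possible way to work through Exercise 1 with these formulas is sketched below. It assumes A is "the ball moved from urn 1 is blue" and B is "the ball drawn from urn 2 is blue". This labelling and the function names are choices made for illustration, not part of the original exercise. The denominator of Bayes' theorem is exactly the probability the exercise asks for.

```python
from fractions import Fraction


def total_probability(p_a, p_b_given_a, p_b_given_not_a):
    """P(B) = P(B | A) P(A) + P(B | A^C) P(A^C)."""
    return p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)


def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) = P(B | A) P(A) / P(B), with P(B) from total probability."""
    return p_b_given_a * p_a / total_probability(p_a, p_b_given_a, p_b_given_not_a)


# A = the ball moved from urn 1 is blue; B = the ball drawn from urn 2 is blue.
p_a = Fraction(3, 5)              # urn 1 holds 2 white and 3 blue balls
p_b_given_a = Fraction(5, 8)      # urn 2 then holds 3 white and 5 blue balls
p_b_given_not_a = Fraction(4, 8)  # urn 2 then holds 4 white and 4 blue balls

print("P(B)     =", total_probability(p_a, p_b_given_a, p_b_given_not_a))
print("P(A | B) =", bayes(p_a, p_b_given_a, p_b_given_not_a))
```

Under this reading, P(B) answers the exercise, and P(A | B) answers the reverse question: given that the examined ball is blue, how likely is it that the transferred ball was blue?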

Contingency Table
Courtesy: Rich Tsui, PhD

You have developed a test to detect a certain disease.
TP, FP, TN, FN = numbers of true positives, false positives, true negatives, and false negatives

What are the True Positive Rate (TPR) and True Negative Rate (TNR) of this test?
Sensitivity = TPR = TP / (TP + FN) = P(Test+ | Disease+)
Specificity = TNR = TN / (TN + FP) = P(Test- | Disease-)

What are the Positive Predictive Value (PPV) and Negative Predictive Value (NPV)?
PPV = TP / (TP + FP) = P(Disease+ | Test+)
NPV = TN / (TN + FN) = P(Disease- | Test-)
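The four ratios follow directly from the four cells of the table. The sketch below is a minimal illustration; the counts are hypothetical, and the function name `diagnostic_metrics` is an assumption rather than anything defined in the slides.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Summary measures of a 2 x 2 contingency table of test results."""
    return {
        "sensitivity": tp / (tp + fn),   # TPR = P(Test+ | Disease+)
        "specificity": tn / (tn + fp),   # TNR = P(Test- | Disease-)
        "ppv": tp / (tp + fp),           # P(Disease+ | Test+)
        "npv": tn / (tn + fn),           # P(Disease- | Test-)
        "prevalence": (tp + fn) / (tp + fp + fn + tn),
    }


# Hypothetical counts, not taken from any table in the slides.
metrics = diagnostic_metrics(tp=90, fp=30, fn=10, tn=270)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are computed within the diseased and healthy groups respectively, while PPV and NPV are computed within the test-positive and test-negative groups.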

Sensitivity (TPR): The probability that a sick person is correctly identified as having the condition.
Specificity (TNR): The probability that a healthy person is correctly identified as not having the condition.
Positive predictive value (PPV): Given that you test positive, the probability that you actually have the condition.
Negative predictive value (NPV): Given that you test negative, the probability that you actually do not have the condition.

Exercise 2

The results of a hypothetical study to measure test performance of the PCR test for HIV are shown in the 2 x 2 table in Table 1.
(a) Calculate the sensitivity, specificity, disease prevalence, positive predictive value (PV+), and negative predictive value (PV-).
(b) Use the TPR and TNR calculated in part (a) to fill the 2 x 2 table in Table 2. Calculate the disease prevalence, positive predictive value (PV+), and negative predictive value (PV-).
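Sensitivity and specificity describe the test itself, whereas the predictive values also depend on how common the disease is in the population tested. The sketch below applies Bayes' theorem to show this; the sensitivity, specificity, and prevalence values are illustrative assumptions and are not the values from Tables 1 and 2.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from test characteristics and disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Illustrative values: the same test applied to populations with different prevalence.
for prevalence in (0.01, 0.1, 0.5):
    ppv, npv = predictive_values(sensitivity=0.9, specificity=0.9, prevalence=prevalence)
    print(f"prevalence={prevalence}: PPV={ppv:.3f}, NPV={npv:.3f}")
```

With the test characteristics held fixed, the PPV rises and the NPV falls as prevalence increases.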

Recall

Test question:
The prevalence of a particular disease is 1/10. A test for this disease provides a correct diagnosis in 90% of cases (i.e. if you have the disease, 90% of the time you will test positive, and if you do not have the disease, 90% of the time you will test negative). Given that you test positive for the disease, what is the probability that you actually have the disease?

Solution:
T+ = Test positive, T- = Test negative, D+ = Disease present, D- = Disease absent
Prevalence = Prior probability in population

P(D+) = 0.1
P(T+ | D+) = 0.9
P(T- | D-) = 0.9, therefore P(T+ | D-) = 1 - 0.9 = 0.1

P(D+ | T+) = P(T+ | D+) P(D+) / (P(T+ | D+) P(D+) + P(T+ | D-) P(D-))
           = (0.9)(0.1) / ((0.9)(0.1) + (0.1)(0.9))
           = 0.5

Assessing quality of the predictive model

ROC-AUROC
The area under the ROC curve (AUROC) is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one.

[Figure: ROC curves]
Q: Why is the blue curve worthless?
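The ranking interpretation of the AUROC can be computed directly by comparing every positive instance with every negative instance. The sketch below does this with hypothetical classifier scores and counts ties as half a correct ranking, a common convention that is assumed here rather than stated in the slides.

```python
from itertools import product


def auroc(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pairs = list(product(positive_scores, negative_scores))
    wins = sum(1.0 if pos > neg else 0.5 if pos == neg else 0.0 for pos, neg in pairs)
    return wins / len(pairs)


# Hypothetical classifier scores (higher means "more likely positive").
positives = [0.9, 0.8, 0.6, 0.55]
negatives = [0.7, 0.5, 0.4, 0.3]

print(f"AUROC = {auroc(positives, negatives):.3f}")
```

A classifier whose scores carry no information about the class ranks a positive above a negative only about half the time, which gives an AUROC near 0.5.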