Announcements. Proposals graded

Size: px
Start display at page:

Download "Announcements. Proposals graded"

Transcription

1 Announcements Proposals graded Kevin Jamieson

2 Hypothesis testing Machine Learning CSE546 Kevin Jamieson University of Washington October 30, Kevin Jamieson 2

3 Anomaly detection You are Amazon and wish to detect transactions with stolen credit cards. For each transaction we observe a feature vector X: { -address, age of account, anonymous PO box, price of items, copies of purchased item, etc. } and the transaction is either real (Y=0) or fraudulent (Y=1) Hypothesis testing: H0: X P 0 H1: X P 1 P k = P(X = x Y = k) Your job is to build a (possibly randomized) decision function (x) 2 {0, 1} Kevin Jamieson

4 Anomaly detection Hypothesis testing: H0: X P 0 P k = P(X = x Y = k) H1: X P 1 Your job is to build a (possibly randomized) decision function (x) 2 {0, 1} Bayesian Hypothesis Testing: Assume P(Y = 1) = P(X = x) = P 1 (x)+(1 )P 0 (x) arg min P XY (Y 6= (X)) Kevin Jamieson

5 Anomaly detection Hypothesis testing: H0: X P 0 P k = P(X = x Y = k) H1: X P 1 Your job is to build a (possibly randomized) decision function (x) 2 {0, 1} Minimax Hypothesis Testing: arg min max{p( (X) =0 Y = 1), P( (X) =1 Y = 0)} Kevin Jamieson

6 Anomaly detection Hypothesis testing: H0: X P 0 P k = P(X = x Y = k) H1: X P 1 Your job is to build a (possibly randomized) decision function (x) 2 {0, 1} Neyman-Pearson Hypothesis Testing: arg max P( (X) =1 Y = 1), subject to P( (X) =1 Y = 0) apple Kevin Jamieson

7 Neyman-Pearson Testing Hypothesis testing: H0: X P 0 P k = P(X = x Y = k) H1: X P 1 Neyman-Pearson Hypothesis Testing: arg max P( (X) =1 Y = 1), subject to P( (X) =1 Y = 0) apple Theorem: The optimal test has the form P( (X) = 1) = and satisfies P( (X) =1 Y = 0) = 8 >< 1 if P 1(x) P 0 (x) > if P 1(x) P 0 (x) >: = 0 if P 1(x) P 0 (x) < Kevin Jamieson

8 Neyman-Pearson Testing Hypothesis testing: H0: X P 0 P k = P(X = x Y = k) H1: X P 1 Neyman-Pearson Hypothesis Testing: arg max P( (X) =1 Y = 1), subject to P( (X) =1 Y = 0) apple Example: Kevin Jamieson

9 ROC Curve Hypothesis testing: H0: X P 0 P k = P(X = x Y = k) H1: X P 1 1 Prob of Detection P( (X) =1 Y = 1) 0 0 Prob of False Alarm 1 P( (X) =1 Y = 0)} Kevin Jamieson

10 p-values Machine Learning CSE546 Kevin Jamieson University of Washington October 30, Kevin Jamieson 10

11 Anomaly detection You are Amazon and wish to detect transactions with stolen credit cards. For each transaction we observe a feature vector X: { -address, age of account, anonymous PO box, price of items, copies of purchased item, etc. } and the transaction is either real (Y=0) or fraudulent (Y=1) Hypothesis testing: H0: X P 0 H1: X P 1 P k = P(X = x Y = k) Your job is to build a (possibly randomized) decision function (x) 2 {0, 1} Natural to have model for P 0 (regular purchases). But what if we have no model for P 1 since people are strategic? Kevin Jamieson

12 p-value Hypothesis testing: H0: X P 0 P k = P(X = x Y = k) H1: X P 1 Your job is to build a (possibly randomized) decision function (x) 2 {0, 1} Definition p-value: probability of finding the observed, or more extreme, results when the null hypothesis H0 is true (e.g., X P 0 ) Definition p-value: a uniformly distributed random variable under the null hypothesis (e.g., X P 0 ) WARNING: A small p-value is NOT evidence that H1 is true. Kevin Jamieson

13 p-value Hypothesis testing: H0: X P 0 P k = P(X = x Y = k) H1: X P 1 Your job is to build a (possibly randomized) decision function (x) 2 {0, 1} Definition p-value: a uniformly distributed random variable under the null hypothesis (e.g., X P 0 ) P 0 (x) =N (x; µ 0, 2 ) Observe: x i 2 R p-value: p i = P 0 (X x i ) = R 1 x=x i 1 p 2 2 e (x µ 0) 2 /2 2 dx Kevin Jamieson

14 p-value: used the right way Hypothesis testing: H0: X P 0 P k = P(X = x Y = k) H1: X P 1 Your job is to build a (possibly randomized) decision function (x) 2 {0, 1} Definition p-value: a uniformly distributed random variable under the null hypothesis (e.g., X P 0 ) Set: =.05 Observe: x i 2 R p-value: p i = P 0 (X x i ) Test: Ifp i apple then reject the null hypothesis H0 Kevin Jamieson

15 p-value: used the wrong way Hypothesis testing: H0: X P 0 P k = P(X = x Y = k) H1: X P 1 Your job is to build a (possibly randomized) decision function (x) 2 {0, 1} Definition p-value: a uniformly distributed random variable under the null hypothesis (e.g., X P 0 ) BAD Set: =.05 Observe: x i 2 R p-value: p i = P 0 (X x i ) Test: Ifp i apple then reject the null hypothesis H0 If p i > repeat the experiment with new x i until p i apple Kevin Jamieson

16 p-value: used the wrong way Each day i=1,2, you measure an iid x i N (µ, 1) H0: µ =0 Under H0 the statistic Z i = 1 p i P i j=1 x j N (0, 1) 1 p i = 1 2 Z 1 e z 2 /2 dz z=z i.05 0 days i Kevin Jamieson

17 Multiple testing Machine Learning CSE546 Kevin Jamieson University of Washington October 30, Kevin Jamieson 17

18 Case study in adaptive sampling tradeoffs Drosophila RNAi screen identifies host genes important for influenza virus replication, Nature Wild type strain with 13,071 genes Inhibit a single gene infect with fluorescing virus (indicating gene s influence) 72 h microwell array Each gene i=1,2,,n you measure an H0(i): µ i =0 x i N (µ i, 1) Consider procedure for individual hypothesis testing: Set: =.05 Observe: x i 2 R p-value: p i = P 0 (X x i ) Test: Ifp i apple then reject the null hypothesis H0 Under H0, how many genes do we expect to reject the null hypothesis? 18

19 Multiple Testing If we make n rejections individually at level I 0 = {i : H0(i) istrue} E[ X i2i 0 1{p i apple }] = X i2i 0 P(p i apple ) = I 0 That s a lot of false alarms! Kevin Jamieson

20 Multiple Testing - FWER Family-wise error rate FWER= P(reject any true null) I 0 = {i : H0(i) istrue} Bonferroni rule: Reject i if p i apple /n! FWER = P [ I 0 {p i apple /n} = Kevin Jamieson

21 Multiple Testing - FDR False discovery rate FDR= E h I0 \R R i I 0 = {i : H0(i) istrue} Benjamini-Hochberg procedure: Sort p-values such that p (1) apple p (2) apple apple p (n) i max = max{i : p (i) apple i n } R = {i : i apple i max } Theorem: BH( ) satisfies FDRapple Kevin Jamieson

22 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington October 30, Kevin Jamieson 22

23 MLE Recap - coin flips Data: sequence D= (HHTHT ), k heads out of n flips Hypothesis: P(Heads) = θ, P(Tails) = 1-θ P (D ) = k (1 ) n k Maximum likelihood estimation (MLE): Choose θ that maximizes the probability of observed data: b MLE = arg max = arg max P (D ) log P (D ) b MLE = k n 2017 Kevin Jamieson 23

24 What about prior Billionaire: Wait, I know that the coin is close to What can you do for me now? You say: I can learn it the Bayesian way 2017 Kevin Jamieson 24

25 Bayesian Learning Use Bayes rule: Or equivalently: 2017 Kevin Jamieson 25

26 Bayesian Learning for Coins Likelihood function is simply Binomial: What about prior? Represent expert knowledge Conjugate priors: Closed-form representation of posterior For Binomial, conjugate prior is Beta distribution 2017 Kevin Jamieson 26

27 Beta prior distribution P(θ) Mean: Mode: Beta(2,3) Beta(20,30) Likelihood function: Posterior: 2017 Kevin Jamieson 27

28 Posterior distribution Prior: Data: α H heads and α T tails Posterior distribution: Beta(2,3) Beta(20,30) 2017 Kevin Jamieson 28

29 Using Bayesian posterior Posterior distribution: Bayesian inference: Estimate mean E[ ] = Z 1 0 P ( D)d Estimate arbitrary function f For arbitrary f integral is often hard to compute 2017 Kevin Jamieson 29

30 MAP: Maximum a posteriori approximation As more data is observed, Beta is more certain MAP: use most likely parameter: 2017 Kevin Jamieson 30

31 MAP for Beta distribution MAP: use most likely parameter: 2017 Kevin Jamieson 31

32 MAP for Beta distribution MAP: use most likely parameter: H + H 1 H + T + H + T 2 Beta prior equivalent to extra coin flips As N 1, prior is forgotten But, for small sample size, prior is important! 2017 Kevin Jamieson 32

33 Bayesian vs Frequentist Data: D Estimator: b = t(d) loss: `(t(d), ) Frequentists treat unknown θ as fixed and the data D as random. Bayesian treat the data D as fixed and the unknown θ as random 2017 Kevin Jamieson 33

34 Recap for Bayesian learning Bayesians are optimists: If we model it correctly, we output most likely answer Assumes one can accurately model: Observations and link to unknown parameter θ: Distribution, structure of unknown θ: p( ) p(x ) Frequentist are pessimists: All models are wrong, prove to me your estimate is good Makes very few assumptions, e.g. E[X 2 ] < 1 and constructs an estimator (e.g., median of means of disjoint subsets of data) Must analyze each estimate 2017 Kevin Jamieson 34

Accouncements. You should turn in a PDF and a python file(s) Figure for problem 9 should be in the PDF

Accouncements. You should turn in a PDF and a python file(s) Figure for problem 9 should be in the PDF Accouncements You should turn in a PDF and a python file(s) Figure for problem 9 should be in the PDF Please do not zip these files and submit (unless there are >5 files) 1 Bayesian Methods Machine Learning

More information

Announcements. Proposals graded

Announcements. Proposals graded Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:

More information

Point Estimation. Vibhav Gogate The University of Texas at Dallas

Point Estimation. Vibhav Gogate The University of Texas at Dallas Point Estimation Vibhav Gogate The University of Texas at Dallas Some slides courtesy of Carlos Guestrin, Chris Bishop, Dan Weld and Luke Zettlemoyer. Basics: Expectation and Variance Binary Variables

More information

Machine Learning CSE546 Carlos Guestrin University of Washington. September 30, 2013

Machine Learning CSE546 Carlos Guestrin University of Washington. September 30, 2013 Bayesian Methods Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2013 1 What about prior n Billionaire says: Wait, I know that the thumbtack is close to 50-50. What can you

More information

Machine Learning CSE546 Carlos Guestrin University of Washington. September 30, What about continuous variables?

Machine Learning CSE546 Carlos Guestrin University of Washington. September 30, What about continuous variables? Linear Regression Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2014 1 What about continuous variables? n Billionaire says: If I am measuring a continuous variable, what

More information

Machine Learning CSE546 Sham Kakade University of Washington. Oct 4, What about continuous variables?

Machine Learning CSE546 Sham Kakade University of Washington. Oct 4, What about continuous variables? Linear Regression Machine Learning CSE546 Sham Kakade University of Washington Oct 4, 2016 1 What about continuous variables? Billionaire says: If I am measuring a continuous variable, what can you do

More information

Some slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2

Some slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2 Logistics CSE 446: Point Estimation Winter 2012 PS2 out shortly Dan Weld Some slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2 Last Time Random variables, distributions Marginal, joint & conditional

More information

Machine Learning CMPT 726 Simon Fraser University. Binomial Parameter Estimation

Machine Learning CMPT 726 Simon Fraser University. Binomial Parameter Estimation Machine Learning CMPT 726 Simon Fraser University Binomial Parameter Estimation Outline Maximum Likelihood Estimation Smoothed Frequencies, Laplace Correction. Bayesian Approach. Conjugate Prior. Uniform

More information

Point Estimation. Maximum likelihood estimation for a binomial distribution. CSE 446: Machine Learning

Point Estimation. Maximum likelihood estimation for a binomial distribution. CSE 446: Machine Learning Point Estimation Emily Fox University of Washington January 6, 2017 Maximum likelihood estimation for a binomial distribution 1 Your first consulting job A bored Seattle billionaire asks you a question:

More information

Bayesian Methods: Naïve Bayes

Bayesian Methods: Naïve Bayes Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior

More information

Introduc)on to Bayesian methods (con)nued) - Lecture 16

Introduc)on to Bayesian methods (con)nued) - Lecture 16 Introduc)on to Bayesian methods (con)nued) - Lecture 16 David Sontag New York University Slides adapted from Luke Zettlemoyer, Carlos Guestrin, Dan Klein, and Vibhav Gogate Outline of lectures Review of

More information

Looking at the Other Side of Bonferroni

Looking at the Other Side of Bonferroni Department of Biostatistics University of Washington 24 May 2012 Multiple Testing: Control the Type I Error Rate When analyzing genetic data, one will commonly perform over 1 million (and growing) hypothesis

More information

Bayesian Methods. David S. Rosenberg. New York University. March 20, 2018

Bayesian Methods. David S. Rosenberg. New York University. March 20, 2018 Bayesian Methods David S. Rosenberg New York University March 20, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 March 20, 2018 1 / 38 Contents 1 Classical Statistics 2 Bayesian

More information

Gaussians Linear Regression Bias-Variance Tradeoff

Gaussians Linear Regression Bias-Variance Tradeoff Readings listed in class website Gaussians Linear Regression Bias-Variance Tradeoff Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University January 22 nd, 2007 Maximum Likelihood Estimation

More information

Machine Learning CSE546

Machine Learning CSE546 https://www.cs.washington.edu/education/courses/cse546/18au/ Machine Learning CSE546 Kevin Jamieson University of Washington September 27, 2018 1 Traditional algorithms Social media mentions of Cats vs.

More information

Machine Learning CSE546

Machine Learning CSE546 http://www.cs.washington.edu/education/courses/cse546/17au/ Machine Learning CSE546 Kevin Jamieson University of Washington September 28, 2017 1 You may also like ML uses past data to make personalized

More information

Computational Cognitive Science

Computational Cognitive Science Computational Cognitive Science Lecture 9: Bayesian Estimation Chris Lucas (Slides adapted from Frank Keller s) School of Informatics University of Edinburgh clucas2@inf.ed.ac.uk 17 October, 2017 1 / 28

More information

Machine Learning

Machine Learning Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University August 30, 2017 Today: Decision trees Overfitting The Big Picture Coming soon Probabilistic learning MLE,

More information

Linear Regression. Machine Learning CSE546 Kevin Jamieson University of Washington. Oct 5, Kevin Jamieson 1

Linear Regression. Machine Learning CSE546 Kevin Jamieson University of Washington. Oct 5, Kevin Jamieson 1 Linear Regression Machine Learning CSE546 Kevin Jamieson University of Washington Oct 5, 2017 1 The regression problem Given past sales data on zillow.com, predict: y = House sale price from x = {# sq.

More information

Statistical testing. Samantha Kleinberg. October 20, 2009

Statistical testing. Samantha Kleinberg. October 20, 2009 October 20, 2009 Intro to significance testing Significance testing and bioinformatics Gene expression: Frequently have microarray data for some group of subjects with/without the disease. Want to find

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 2. MLE, MAP, Bayes classification Barnabás Póczos & Aarti Singh 2014 Spring Administration http://www.cs.cmu.edu/~aarti/class/10701_spring14/index.html Blackboard

More information

(1) Introduction to Bayesian statistics

(1) Introduction to Bayesian statistics Spring, 2018 A motivating example Student 1 will write down a number and then flip a coin If the flip is heads, they will honestly tell student 2 if the number is even or odd If the flip is tails, they

More information

Introduction: MLE, MAP, Bayesian reasoning (28/8/13)

Introduction: MLE, MAP, Bayesian reasoning (28/8/13) STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this

More information

Lecture Testing Hypotheses: The Neyman-Pearson Paradigm

Lecture Testing Hypotheses: The Neyman-Pearson Paradigm Math 408 - Mathematical Statistics Lecture 29-30. Testing Hypotheses: The Neyman-Pearson Paradigm April 12-15, 2013 Konstantin Zuev (USC) Math 408, Lecture 29-30 April 12-15, 2013 1 / 12 Agenda Example:

More information

Bayesian RL Seminar. Chris Mansley September 9, 2008

Bayesian RL Seminar. Chris Mansley September 9, 2008 Bayesian RL Seminar Chris Mansley September 9, 2008 Bayes Basic Probability One of the basic principles of probability theory, the chain rule, will allow us to derive most of the background material in

More information

Hypothesis Testing. Testing Hypotheses MIT Dr. Kempthorne. Spring MIT Testing Hypotheses

Hypothesis Testing. Testing Hypotheses MIT Dr. Kempthorne. Spring MIT Testing Hypotheses Testing Hypotheses MIT 18.443 Dr. Kempthorne Spring 2015 1 Outline Hypothesis Testing 1 Hypothesis Testing 2 Hypothesis Testing: Statistical Decision Problem Two coins: Coin 0 and Coin 1 P(Head Coin 0)

More information

Comparison of Bayesian and Frequentist Inference

Comparison of Bayesian and Frequentist Inference Comparison of Bayesian and Frequentist Inference 18.05 Spring 2014 First discuss last class 19 board question, January 1, 2017 1 /10 Compare Bayesian inference Uses priors Logically impeccable Probabilities

More information

Computational Cognitive Science

Computational Cognitive Science Computational Cognitive Science Lecture 8: Frank Keller School of Informatics University of Edinburgh keller@inf.ed.ac.uk Based on slides by Sharon Goldwater October 14, 2016 Frank Keller Computational

More information

Machine Learning

Machine Learning Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University September 22, 2011 Today: MLE and MAP Bayes Classifiers Naïve Bayes Readings: Mitchell: Naïve Bayes and Logistic

More information

DS-GA 1002 Lecture notes 11 Fall Bayesian statistics

DS-GA 1002 Lecture notes 11 Fall Bayesian statistics DS-GA 100 Lecture notes 11 Fall 016 Bayesian statistics In the frequentist paradigm we model the data as realizations from a distribution that depends on deterministic parameters. In contrast, in Bayesian

More information

High-throughput Testing

High-throughput Testing High-throughput Testing Noah Simon and Richard Simon July 2016 1 / 29 Testing vs Prediction On each of n patients measure y i - single binary outcome (eg. progression after a year, PCR) x i - p-vector

More information

Learning P-maps Param. Learning

Learning P-maps Param. Learning Readings: K&F: 3.3, 3.4, 16.1, 16.2, 16.3, 16.4 Learning P-maps Param. Learning Graphical Models 10708 Carlos Guestrin Carnegie Mellon University September 24 th, 2008 10-708 Carlos Guestrin 2006-2008

More information

MLE/MAP + Naïve Bayes

MLE/MAP + Naïve Bayes 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University MLE/MAP + Naïve Bayes Matt Gormley Lecture 19 March 20, 2018 1 Midterm Exam Reminders

More information

Detection and Estimation Chapter 1. Hypothesis Testing

Detection and Estimation Chapter 1. Hypothesis Testing Detection and Estimation Chapter 1. Hypothesis Testing Husheng Li Min Kao Department of Electrical Engineering and Computer Science University of Tennessee, Knoxville Spring, 2015 1/20 Syllabus Homework:

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Generative Models Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

Naïve Bayes. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824

Naïve Bayes. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824 Naïve Bayes Jia-Bin Huang ECE-5424G / CS-5824 Virginia Tech Spring 2019 Administrative HW 1 out today. Please start early! Office hours Chen: Wed 4pm-5pm Shih-Yang: Fri 3pm-4pm Location: Whittemore 266

More information

False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data

False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data Ståle Nygård Trial Lecture Dec 19, 2008 1 / 35 Lecture outline Motivation for not using

More information

Bayesian Inference: Posterior Intervals

Bayesian Inference: Posterior Intervals Bayesian Inference: Posterior Intervals Simple values like the posterior mean E[θ X] and posterior variance var[θ X] can be useful in learning about θ. Quantiles of π(θ X) (especially the posterior median)

More information

COMP 551 Applied Machine Learning Lecture 19: Bayesian Inference

COMP 551 Applied Machine Learning Lecture 19: Bayesian Inference COMP 551 Applied Machine Learning Lecture 19: Bayesian Inference Associate Instructor: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551 Unless otherwise noted, all material posted

More information

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012 Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood

More information

ECE521 W17 Tutorial 6. Min Bai and Yuhuai (Tony) Wu

ECE521 W17 Tutorial 6. Min Bai and Yuhuai (Tony) Wu ECE521 W17 Tutorial 6 Min Bai and Yuhuai (Tony) Wu Agenda knn and PCA Bayesian Inference k-means Technique for clustering Unsupervised pattern and grouping discovery Class prediction Outlier detection

More information

Introduction to Probabilistic Machine Learning

Introduction to Probabilistic Machine Learning Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning

More information

Probabilistic modeling. The slides are closely adapted from Subhransu Maji s slides

Probabilistic modeling. The slides are closely adapted from Subhransu Maji s slides Probabilistic modeling The slides are closely adapted from Subhransu Maji s slides Overview So far the models and algorithms you have learned about are relatively disconnected Probabilistic modeling framework

More information

Bayesian Classifiers, Conditional Independence and Naïve Bayes. Required reading: Naïve Bayes and Logistic Regression (available on class website)

Bayesian Classifiers, Conditional Independence and Naïve Bayes. Required reading: Naïve Bayes and Logistic Regression (available on class website) Bayesian Classifiers, Conditional Independence and Naïve Bayes Required reading: Naïve Bayes and Logistic Regression (available on class website) Machine Learning 10-701 Tom M. Mitchell Machine Learning

More information

Outline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution

Outline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution Outline A short review on Bayesian analysis. Binomial, Multinomial, Normal, Beta, Dirichlet Posterior mean, MAP, credible interval, posterior distribution Gibbs sampling Revisit the Gaussian mixture model

More information

Frequentist Statistics and Hypothesis Testing Spring

Frequentist Statistics and Hypothesis Testing Spring Frequentist Statistics and Hypothesis Testing 18.05 Spring 2018 http://xkcd.com/539/ Agenda Introduction to the frequentist way of life. What is a statistic? NHST ingredients; rejection regions Simple

More information

Multiple Testing. Hoang Tran. Department of Statistics, Florida State University

Multiple Testing. Hoang Tran. Department of Statistics, Florida State University Multiple Testing Hoang Tran Department of Statistics, Florida State University Large-Scale Testing Examples: Microarray data: testing differences in gene expression between two traits/conditions Microbiome

More information

Estimation of reliability parameters from Experimental data (Parte 2) Prof. Enrico Zio

Estimation of reliability parameters from Experimental data (Parte 2) Prof. Enrico Zio Estimation of reliability parameters from Experimental data (Parte 2) This lecture Life test (t 1,t 2,...,t n ) Estimate θ of f T t θ For example: λ of f T (t)= λe - λt Classical approach (frequentist

More information

Machine Learning

Machine Learning Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 13, 2011 Today: The Big Picture Overfitting Review: probability Readings: Decision trees, overfiting

More information

Readings: K&F: 16.3, 16.4, Graphical Models Carlos Guestrin Carnegie Mellon University October 6 th, 2008

Readings: K&F: 16.3, 16.4, Graphical Models Carlos Guestrin Carnegie Mellon University October 6 th, 2008 Readings: K&F: 16.3, 16.4, 17.3 Bayesian Param. Learning Bayesian Structure Learning Graphical Models 10708 Carlos Guestrin Carnegie Mellon University October 6 th, 2008 10-708 Carlos Guestrin 2006-2008

More information

Naïve Bayes classification

Naïve Bayes classification Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss

More information

Detection theory 101 ELEC-E5410 Signal Processing for Communications

Detection theory 101 ELEC-E5410 Signal Processing for Communications Detection theory 101 ELEC-E5410 Signal Processing for Communications Binary hypothesis testing Null hypothesis H 0 : e.g. noise only Alternative hypothesis H 1 : signal + noise p(x;h 0 ) γ p(x;h 1 ) Trade-off

More information

Lecture 2: Repetition of probability theory and statistics

Lecture 2: Repetition of probability theory and statistics Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:

More information

Introduction to Bayesian Methods. Introduction to Bayesian Methods p.1/??

Introduction to Bayesian Methods. Introduction to Bayesian Methods p.1/?? to Bayesian Methods Introduction to Bayesian Methods p.1/?? We develop the Bayesian paradigm for parametric inference. To this end, suppose we conduct (or wish to design) a study, in which the parameter

More information

Bayesian Models in Machine Learning

Bayesian Models in Machine Learning Bayesian Models in Machine Learning Lukáš Burget Escuela de Ciencias Informáticas 2017 Buenos Aires, July 24-29 2017 Frequentist vs. Bayesian Frequentist point of view: Probability is the frequency of

More information

Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution.

Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution. Hypothesis Testing Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution. Suppose the family of population distributions is indexed

More information

Machine Learning 4771

Machine Learning 4771 Machine Learning 4771 Instructor: Tony Jebara Topic 11 Maximum Likelihood as Bayesian Inference Maximum A Posteriori Bayesian Gaussian Estimation Why Maximum Likelihood? So far, assumed max (log) likelihood

More information

Notes on Decision Theory and Prediction

Notes on Decision Theory and Prediction Notes on Decision Theory and Prediction Ronald Christensen Professor of Statistics Department of Mathematics and Statistics University of New Mexico October 7, 2014 1. Decision Theory Decision theory is

More information

Detection theory. H 0 : x[n] = w[n]

Detection theory. H 0 : x[n] = w[n] Detection Theory Detection theory A the last topic of the course, we will briefly consider detection theory. The methods are based on estimation theory and attempt to answer questions such as Is a signal

More information

ORF 245 Fundamentals of Statistics Chapter 9 Hypothesis Testing

ORF 245 Fundamentals of Statistics Chapter 9 Hypothesis Testing ORF 245 Fundamentals of Statistics Chapter 9 Hypothesis Testing Robert Vanderbei Fall 2014 Slides last edited on November 24, 2014 http://www.princeton.edu/ rvdb Coin Tossing Example Consider two coins.

More information

For use only in [the name of your school] 2014 S4 Note. S4 Notes (Edexcel)

For use only in [the name of your school] 2014 S4 Note. S4 Notes (Edexcel) s (Edexcel) Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 1 Copyright www.pgmaths.co.uk - For AS, A2 notes and IGCSE / GCSE worksheets 2 Copyright www.pgmaths.co.uk - For AS,

More information

CS 361: Probability & Statistics

CS 361: Probability & Statistics October 17, 2017 CS 361: Probability & Statistics Inference Maximum likelihood: drawbacks A couple of things might trip up max likelihood estimation: 1) Finding the maximum of some functions can be quite

More information

Bayesian statistics. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Bayesian statistics. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Bayesian statistics DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall15 Carlos Fernandez-Granda Frequentist vs Bayesian statistics In frequentist statistics

More information

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf 1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a

More information

Deciding, Estimating, Computing, Checking

Deciding, Estimating, Computing, Checking Deciding, Estimating, Computing, Checking How are Bayesian posteriors used, computed and validated? Fundamentalist Bayes: The posterior is ALL knowledge you have about the state Use in decision making:

More information

Deciding, Estimating, Computing, Checking. How are Bayesian posteriors used, computed and validated?

Deciding, Estimating, Computing, Checking. How are Bayesian posteriors used, computed and validated? Deciding, Estimating, Computing, Checking How are Bayesian posteriors used, computed and validated? Fundamentalist Bayes: The posterior is ALL knowledge you have about the state Use in decision making:

More information

Intelligent Systems Statistical Machine Learning

Intelligent Systems Statistical Machine Learning Intelligent Systems Statistical Machine Learning Carsten Rother, Dmitrij Schlesinger WS2014/2015, Our tasks (recap) The model: two variables are usually present: - the first one is typically discrete k

More information

Decision theory. 1 We may also consider randomized decision rules, where δ maps observed data D to a probability distribution over

Decision theory. 1 We may also consider randomized decision rules, where δ maps observed data D to a probability distribution over Point estimation Suppose we are interested in the value of a parameter θ, for example the unknown bias of a coin. We have already seen how one may use the Bayesian method to reason about θ; namely, we

More information

Introduction to Machine Learning. Lecture 2

Introduction to Machine Learning. Lecture 2 Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for

More information

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish

More information

Introduction to AI Learning Bayesian networks. Vibhav Gogate

Introduction to AI Learning Bayesian networks. Vibhav Gogate Introduction to AI Learning Bayesian networks Vibhav Gogate Inductive Learning in a nutshell Given: Data Examples of a function (X, F(X)) Predict function F(X) for new examples X Discrete F(X): Classification

More information

Controlling Bayes Directional False Discovery Rate in Random Effects Model 1

Controlling Bayes Directional False Discovery Rate in Random Effects Model 1 Controlling Bayes Directional False Discovery Rate in Random Effects Model 1 Sanat K. Sarkar a, Tianhui Zhou b a Temple University, Philadelphia, PA 19122, USA b Wyeth Pharmaceuticals, Collegeville, PA

More information

Bayesian Learning (II)

Bayesian Learning (II) Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP

More information

Data Mining. CS57300 Purdue University. March 22, 2018

Data Mining. CS57300 Purdue University. March 22, 2018 Data Mining CS57300 Purdue University March 22, 2018 1 Hypothesis Testing Select 50% users to see headline A Unlimited Clean Energy: Cold Fusion has Arrived Select 50% users to see headline B Wedding War

More information

ECE531 Lecture 4b: Composite Hypothesis Testing

ECE531 Lecture 4b: Composite Hypothesis Testing ECE531 Lecture 4b: Composite Hypothesis Testing D. Richard Brown III Worcester Polytechnic Institute 16-February-2011 Worcester Polytechnic Institute D. Richard Brown III 16-February-2011 1 / 44 Introduction

More information

Probability and Statistics. Terms and concepts

Probability and Statistics. Terms and concepts Probability and Statistics Joyeeta Dutta Moscato June 30, 2014 Terms and concepts Sample vs population Central tendency: Mean, median, mode Variance, standard deviation Normal distribution Cumulative distribution

More information

DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling

DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling Due: Tuesday, May 10, 2016, at 6pm (Submit via NYU Classes) Instructions: Your answers to the questions below, including

More information

Bayesian Analysis for Natural Language Processing Lecture 2

Bayesian Analysis for Natural Language Processing Lecture 2 Bayesian Analysis for Natural Language Processing Lecture 2 Shay Cohen February 4, 2013 Administrativia The class has a mailing list: coms-e6998-11@cs.columbia.edu Need two volunteers for leading a discussion

More information

Intelligent Systems Statistical Machine Learning

Intelligent Systems Statistical Machine Learning Intelligent Systems Statistical Machine Learning Carsten Rother, Dmitrij Schlesinger WS2015/2016, Our model and tasks The model: two variables are usually present: - the first one is typically discrete

More information

Today. Statistical Learning. Coin Flip. Coin Flip. Experiment 1: Heads. Experiment 1: Heads. Which coin will I use? Which coin will I use?

Today. Statistical Learning. Coin Flip. Coin Flip. Experiment 1: Heads. Experiment 1: Heads. Which coin will I use? Which coin will I use? Today Statistical Learning Parameter Estimation: Maximum Likelihood (ML) Maximum A Posteriori (MAP) Bayesian Continuous case Learning Parameters for a Bayesian Network Naive Bayes Maximum Likelihood estimates

More information

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 Nasser Sadeghkhani a.sadeghkhani@queensu.ca There are two main schools to statistical inference: 1-frequentist

More information

Aarti Singh. Lecture 2, January 13, Reading: Bishop: Chap 1,2. Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell

Aarti Singh. Lecture 2, January 13, Reading: Bishop: Chap 1,2. Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell Machine Learning 0-70/5 70/5-78, 78, Spring 00 Probability 0 Aarti Singh Lecture, January 3, 00 f(x) µ x Reading: Bishop: Chap, Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell Announcements Homework

More information

Machine Learning

Machine Learning Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 1, 2011 Today: Generative discriminative classifiers Linear regression Decomposition of error into

More information

Nearest Neighbor. Machine Learning CSE546 Kevin Jamieson University of Washington. October 26, Kevin Jamieson 2

Nearest Neighbor. Machine Learning CSE546 Kevin Jamieson University of Washington. October 26, Kevin Jamieson 2 Nearest Neighbor Machine Learning CSE546 Kevin Jamieson University of Washington October 26, 2017 2017 Kevin Jamieson 2 Some data, Bayes Classifier Training data: True label: +1 True label: -1 Optimal

More information

Machine Learning. Bayes Basics. Marc Toussaint U Stuttgart. Bayes, probabilities, Bayes theorem & examples

Machine Learning. Bayes Basics. Marc Toussaint U Stuttgart. Bayes, probabilities, Bayes theorem & examples Machine Learning Bayes Basics Bayes, probabilities, Bayes theorem & examples Marc Toussaint U Stuttgart So far: Basic regression & classification methods: Features + Loss + Regularization & CV All kinds

More information

Probability and Estimation. Alan Moses

Probability and Estimation. Alan Moses Probability and Estimation Alan Moses Random variables and probability A random variable is like a variable in algebra (e.g., y=e x ), but where at least part of the variability is taken to be stochastic.

More information

Evaluating Hypotheses

Evaluating Hypotheses Evaluating Hypotheses IEEE Expert, October 1996 1 Evaluating Hypotheses Sample error, true error Confidence intervals for observed hypothesis error Estimators Binomial distribution, Normal distribution,

More information

The Random Variable for Probabilities Chris Piech CS109, Stanford University

The Random Variable for Probabilities Chris Piech CS109, Stanford University The Random Variable for Probabilities Chris Piech CS109, Stanford University Assignment Grades 10 20 30 40 50 60 70 80 90 100 10 20 30 40 50 60 70 80 90 100 Frequency Frequency 10 20 30 40 50 60 70 80

More information

F79SM STATISTICAL METHODS

F79SM STATISTICAL METHODS F79SM STATISTICAL METHODS SUMMARY NOTES 9 Hypothesis testing 9.1 Introduction As before we have a random sample x of size n of a population r.v. X with pdf/pf f(x;θ). The distribution we assign to X is

More information

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem

More information

False Discovery Rate

False Discovery Rate False Discovery Rate Peng Zhao Department of Statistics Florida State University December 3, 2018 Peng Zhao False Discovery Rate 1/30 Outline 1 Multiple Comparison and FWER 2 False Discovery Rate 3 FDR

More information

COMP90051 Statistical Machine Learning

COMP90051 Statistical Machine Learning COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 2. Statistical Schools Adapted from slides by Ben Rubinstein Statistical Schools of Thought Remainder of lecture is to provide

More information

Parametric Models: from data to models

Parametric Models: from data to models Parametric Models: from data to models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Jan 22, 2018 Recall: Model-based ML DATA MODEL LEARNING MODEL MODEL INFERENCE KNOWLEDGE Learning:

More information

Review of Probabilities and Basic Statistics

Review of Probabilities and Basic Statistics Alex Smola Barnabas Poczos TA: Ina Fiterau 4 th year PhD student MLD Review of Probabilities and Basic Statistics 10-701 Recitations 1/25/2013 Recitation 1: Statistics Intro 1 Overview Introduction to

More information

CS 361: Probability & Statistics

CS 361: Probability & Statistics March 14, 2018 CS 361: Probability & Statistics Inference The prior From Bayes rule, we know that we can express our function of interest as Likelihood Prior Posterior The right hand side contains the

More information

Non-parametric Methods

Non-parametric Methods Non-parametric Methods Machine Learning Alireza Ghane Non-Parametric Methods Alireza Ghane / Torsten Möller 1 Outline Machine Learning: What, Why, and How? Curve Fitting: (e.g.) Regression and Model Selection

More information

CSE 312 Final Review: Section AA

CSE 312 Final Review: Section AA CSE 312 TAs December 8, 2011 General Information General Information Comprehensive Midterm General Information Comprehensive Midterm Heavily weighted toward material after the midterm Pre-Midterm Material

More information

Probability and Statistics. Joyeeta Dutta-Moscato June 29, 2015

Probability and Statistics. Joyeeta Dutta-Moscato June 29, 2015 Probability and Statistics Joyeeta Dutta-Moscato June 29, 2015 Terms and concepts Sample vs population Central tendency: Mean, median, mode Variance, standard deviation Normal distribution Cumulative distribution

More information

Non-specific filtering and control of false positives

Non-specific filtering and control of false positives Non-specific filtering and control of false positives Richard Bourgon 16 June 2009 bourgon@ebi.ac.uk EBI is an outstation of the European Molecular Biology Laboratory Outline Multiple testing I: overview

More information