ECE521 W17 Tutorial 6. Min Bai and Yuhuai (Tony) Wu



Agenda: k-means and PCA; Bayesian Inference

k-means: a technique for clustering. Uses: unsupervised pattern and grouping discovery, class prediction, outlier detection.

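The k-means procedure alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. A minimal numpy sketch (the function name, initialization scheme, and convergence test are illustrative, not from the tutorial):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    # Initialize centroids at k distinct random data points
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its cluster
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

This sketch assumes Euclidean distance and no empty clusters; library implementations (e.g., scikit-learn's KMeans) handle restarts and empty clusters more carefully.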

PCA: dimensionality reduction, visualization, compression.

Olivetti Faces Dataset: grayscale images, 64x64 pixels; 40 subjects, 10 images per subject, 400 images total. Problem: a 4096-dimensional feature vector for only 400 data points. Would like: a 200-dimensional feature space (each image described by 200 numbers).

PCA

Olivetti Faces Dataset - PCA First six principal components (eigenfaces) u_0, ..., u_5, where u_j is column j of the matrix U.

PCA Reconstruction Z^n is the vector of coefficients of the selected principal components, specific to image n; B is the vector of coefficients of the non-selected principal components, common to all images.

Olivetti Faces Dataset - PCA We can now find the weight vector for each face in the dataset. Original vs. reconstructed with 200 components.

Olivetti Faces Dataset - PCA Original: dimension = 4096 New: dimension = 200
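The reconstruction described above (project each centered image onto the selected principal components to get its coefficients Z^n, then map back) can be sketched via an SVD of the centered data matrix. This is an illustrative implementation, not the tutorial's code:

```python
import numpy as np

def pca_reconstruct(X, m):
    """Project X (n x d) onto its top-m principal components and reconstruct."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # SVD of the centered data: rows of Vt are the principal directions u_j
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:m]            # (m, d): the m leading components (eigenfaces)
    Z = Xc @ W.T          # (n, m): coefficients Z^n for each image
    return Z @ W + mu     # rank-m reconstruction
```

For the Olivetti setting, X would be the 400 x 4096 image matrix and m = 200; keeping all components reproduces the data exactly, and the reconstruction error shrinks as m grows.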

Bayesian Inference: basic concepts (Bayes' theorem, Bayesian modelling, conjugacy); the Beta-Binomial conjugate prior; a coin toss example; the Bayesian predictive distribution as an ensemble.

Bayesian Modelling

Conjugacy If the posterior distributions p(θ|x) are in the same family as the prior probability distribution p(θ), the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood.

Binomial Data, Beta Prior Suppose the prior distribution for θ is Beta(α1, α2) and the conditional distribution of X given θ is Bin(n, θ). Then p(θ) ∝ θ^(α1-1) (1-θ)^(α2-1) and p(x|θ) = C(n,x) θ^x (1-θ)^(n-x).

Binomial Data, Beta Prior (cont.) We now calculate the posterior: p(θ|x) = p(x|θ) p(θ) / p(x) ∝ θ^x (1-θ)^(n-x) · θ^(α1-1) (1-θ)^(α2-1).

Binomial Data, Beta Prior (cont.) Given x, p(x) is a constant. Therefore, p(θ|x) ∝ θ^(x+α1-1) (1-θ)^(n-x+α2-1). We now recognize this as another Beta distribution, with parameters (x+α1) and (n-x+α2): Beta(x+α1, n-x+α2). Same family as the prior distribution: a conjugate prior!
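As a sanity check (not part of the tutorial), the conjugate update can be verified numerically: normalize likelihood times prior on a grid of θ values and compare it to the Beta(x+α1, n-x+α2) density. The function names here are illustrative:

```python
import numpy as np
from math import lgamma

def beta_pdf(theta, a, b):
    """Beta(a, b) density, computed via log-gamma for numerical stability."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return np.exp(log_norm + (a - 1) * np.log(theta) + (b - 1) * np.log(1 - theta))

def posterior_params(x, n, a1, a2):
    """Conjugate update: Beta(a1, a2) prior + x heads in n Binomial trials
    -> Beta(x + a1, n - x + a2) posterior."""
    return x + a1, n - x + a2
```

On a fine grid, the normalized product θ^x (1-θ)^(n-x) · θ^(α1-1) (1-θ)^(α2-1) matches beta_pdf(θ, x+α1, n-x+α2) up to discretization error.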

Coin toss example You have a coin that comes up heads with probability θ and tails with probability (1-θ). To estimate θ, you flip the coin 14 times; it comes up heads 10 times. What is the probability that "in the next two tosses we will get two heads in a row"? Would you bet on yes?

Coin toss example (cont.) --- Frequentist approach Using frequentist statistics, the best (maximum likelihood) estimate for θ is 10/14, i.e., θ ≈ 0.714. In this case, the probability of two heads is 0.714^2 ≈ 0.51, so it makes sense to bet on the event. Therefore, the frequentist will bet yes!

Coin toss example (Cont.) --- Bayesian First let's consider what the likelihood function is. The coin toss follows a binomial distribution Bin(n, θ). Hence, p(x|θ) = C(n,x) θ^x (1-θ)^(n-x). In our case: n = 14, x = 10.

Coin toss example (Cont.) --- Bayesian As we showed earlier, a very convenient prior for a binomial likelihood is its conjugate prior: the Beta distribution. Let's put this prior on θ, with hyperparameters α1, α2: p(θ) = Beta(θ; α1, α2) ∝ θ^(α1-1) (1-θ)^(α2-1).

Coin toss example (Cont.) --- Bayesian Therefore, the posterior distribution for θ is Beta(x+α1, n-x+α2) --- in our case, Beta(10+α1, 4+α2). If we assume we know nothing about θ, then α1 = α2 = 1. We plot the posterior distribution, i.e., Beta(11, 5):

Coin toss example (Cont.) --- Two heads in a row We perform prediction by integrating out θ under our posterior belief: P(two heads | data) = ∫ θ^2 p(θ|x) dθ = E[θ^2]. When α1 = α2 = 1, this is ≈ 0.485. The Bayesian will bet no!
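The predictive integral reduces to the second moment of the Beta posterior, which has the closed form E[θ^2] = a(a+1) / ((a+b)(a+b+1)) for Beta(a, b). A one-line sketch (illustrative, not the tutorial's code):

```python
def prob_two_heads(a, b):
    """P(next two tosses are both heads) = E[theta^2] under a Beta(a, b)
    posterior, using E[theta^2] = a(a+1) / ((a+b)(a+b+1))."""
    return a * (a + 1) / ((a + b) * (a + b + 1))
```

prob_two_heads(11, 5) gives 132/272 ≈ 0.485, below 0.5, whereas the MLE plug-in estimate (10/14)^2 ≈ 0.51 is above it: averaging over the posterior, rather than squaring a point estimate, flips the bet.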

Coin toss example (Cont.) --- Model Comparison Consider the following two models to fit the data: M1, a model using a fixed θ = 0.5; M2, a model employing a uniform prior over the unknown θ. To choose which model is better, we need to compute the marginal likelihood, or model evidence: p(D|M) = ∫ p(D|θ, M) p(θ|M) dθ.

Coin toss example (Cont.) --- Model Comparison For our data (10 heads in 14 tosses): the marginal likelihood of M1 is p(D|M1) = C(14,10) (1/2)^14 ≈ 0.061; the marginal likelihood of M2 is p(D|M2) = ∫ C(14,10) θ^10 (1-θ)^4 dθ = 1/15 ≈ 0.067. M2, with an additional free parameter, is the slightly better model.
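Both marginal likelihoods have closed forms: M1's evidence is just the binomial likelihood at θ = 0.5, and M2's Beta integral gives C(n,x) · B(x+1, n-x+1), which simplifies to 1/(n+1). A sketch (function names illustrative, not from the tutorial):

```python
from math import comb, exp, lgamma

def evidence_fixed(x, n, theta=0.5):
    """Model evidence for M1: theta is fixed, so p(D|M1) is just the
    binomial likelihood evaluated at that theta."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def evidence_uniform(x, n):
    """Model evidence for M2: integrate the binomial likelihood against a
    uniform Beta(1, 1) prior; the Beta integral is C(n, x) * B(x+1, n-x+1)."""
    log_beta = lgamma(x + 1) + lgamma(n - x + 1) - lgamma(n + 2)
    return comb(n, x) * exp(log_beta)
```

For x = 10, n = 14: evidence_fixed gives 1001/16384 ≈ 0.061 and evidence_uniform gives 1/15 ≈ 0.067, so M2 wins by a small margin.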

Prediction as Ensemble Given models M1 and M2 and their model evidence, we can make predictions as a model ensemble, weighting each model's prediction by its posterior probability p(M_i|D).
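With equal prior model probabilities, p(M_i|D) is proportional to the evidence p(D|M_i), so the ensemble prediction is an evidence-weighted average of the per-model predictions. A minimal sketch (illustrative, not the tutorial's code):

```python
def ensemble_predict(preds, evidences):
    """Bayesian model averaging with equal prior model probabilities:
    p(y|D) = sum_i p(y|D, M_i) * p(M_i|D), where p(M_i|D) is the
    evidence p(D|M_i) normalized over models."""
    total = sum(evidences)
    return sum(p * e / total for p, e in zip(preds, evidences))
```

For the coin example, P(two heads|M1) = 0.25 and P(two heads|M2, D) ≈ 0.485; the ensemble prediction lies between the two, weighted slightly toward M2 since its evidence is larger.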