Overview. Basic concepts of Bayesian learning. Most probable model given data: coin tosses, linear regression, logistic regression.


Overview
Basic concepts of Bayesian learning.
Most probable model given data: coin tosses, linear regression, logistic regression.
Bayesian predictions: coin tosses, linear regression.

Recap: regression problems
Input to the learning problem: training data $L = \{(x_1, y_1), \dots, (x_n, y_n)\}$.
Instances given by feature vector $x_i = (x_{i1}, \dots, x_{im})^T$ and label $y_i$.
Training data in matrix form:
$$X = \begin{pmatrix} x_{11} & \cdots & x_{1m} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nm} \end{pmatrix}, \qquad y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$$
Output: model $f : X \to Y$.

Recap: linear regression
Linear regression: the prediction is a weighted sum of features; the model is given by the weights:
$$f_\theta(x) = \theta_0 + \sum_{i=1}^{m} \theta_i x_i = \theta^T x$$
The model is defined by the parameter vector $\theta$ (the "weight vector"); the constant term $\theta_0$ is integrated into the weight vector by adding a constant attribute $x_0 = 1$:
$$x = (1, x_1, \dots, x_m)^T, \qquad \theta = (\theta_0, \theta_1, \dots, \theta_m)^T$$

Recap: ridge regression
Training data $L = \{(x_1, y_1), \dots, (x_n, y_n)\}$.
Approach: minimize a regularized loss function
$$\theta^* = \arg\min_\theta \sum_{i=1}^{n} \ell(f_\theta(x_i), y_i) + \lambda\, \Omega(\theta)$$
with quadratic loss function $\ell(f_\theta(x), y) = (f_\theta(x) - y)^2$ and L2-regularizer $\Omega(\theta) = \|\theta\|^2$, where $\|\theta\| = (\theta^T \theta)^{1/2}$.
Solution: $\theta^* = (X^T X + \lambda I)^{-1} X^T y$.
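
The closed-form solution is easy to compute numerically. Below is a minimal NumPy sketch (not part of the slides; the function name and `lam` are chosen here for illustration):

```python
import numpy as np

def ridge_regression(X, y, lam):
    """Closed-form ridge solution theta* = (X^T X + lambda I)^{-1} X^T y."""
    m = X.shape[1]
    # Solving the linear system is cheaper and more stable than forming the inverse.
    return np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ y)
```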

Insertion: univariate normal distribution
Distribution over $x \in \mathbb{R}$. Given by a density function with parameters $\mu$ (mean) and $\sigma^2$ (variance).
[Figure: density of the normal distribution]

Insertion: multivariate normal distribution
Distribution over vectors $x \in \mathbb{R}^D$. Given by the density function
$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{Z} \exp\left( -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)$$
with parameters mean vector $\mu \in \mathbb{R}^D$ and covariance matrix $\Sigma \in \mathbb{R}^{D \times D}$, and normalizer $Z = (2\pi)^{D/2} |\Sigma|^{1/2}$.
[Figure, example $D = 2$: density and samples from the distribution]
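
As an illustration of the density formula (not part of the slides; `mvn_density` is a name chosen here), the density can be evaluated directly in NumPy:

```python
import numpy as np

def mvn_density(x, mu, Sigma):
    """Density N(x | mu, Sigma) with normalizer Z = (2 pi)^{D/2} |Sigma|^{1/2}."""
    D = len(mu)
    diff = x - mu
    Z = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / Z)
```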

Probabilistic linear regression
Linear regression as a probabilistic model:
$$p(y \mid x, \theta) = \mathcal{N}(y \mid f_\theta(x), \sigma^2) = \mathcal{N}(y \mid \theta^T x, \sigma^2)$$
[Figure: density $p(y \mid x, \theta)$ around the regression line $f_\theta(x) = \theta^T x$]

Probabilistic linear regression
Linear regression as a probabilistic model:
$$p(y \mid x, \theta) = \mathcal{N}(y \mid f_\theta(x), \sigma^2) = \mathcal{N}(y \mid \theta^T x, \sigma^2)$$
The label $y$ is generated by the linear model $f_{\theta^*}(x) = \theta^{*T} x$ plus normally distributed noise:
$$y = \theta^{*T} x + \varepsilon \quad \text{with} \quad \varepsilon \sim \mathcal{N}(0, \sigma^2)$$
[Figure: density $p(y \mid x, \theta)$ around the regression line]
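
To make the generative view concrete, here is a small sketch that samples data from this model (the particular $\theta^*$, noise level, and sample size are illustrative assumptions, not values from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star = np.array([1.0, 2.0])  # assumed "true" weights, constant term first
sigma = 0.5                        # assumed noise standard deviation

# Instances with constant attribute x_0 = 1 plus one feature.
X = np.column_stack([np.ones(50), rng.uniform(-1.0, 1.0, 50)])
y = X @ theta_star + rng.normal(0.0, sigma, 50)  # y = theta*^T x + noise
```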

Most probable model given the data
Goal: the most probable model given the data,
$$\theta^* = \arg\max_\theta p(\theta \mid L)$$
Approach: derive the a-posteriori distribution via Bayes' rule,
$$p(\theta \mid L) = \frac{p(L \mid \theta)\, p(\theta)}{p(L)}$$
with likelihood $p(L \mid \theta)$ (the probability of the data given the model) and prior distribution $p(\theta)$ over the parameters.

Bayesian linear regression: likelihood
Likelihood of the data (instances independent, and instances $x_i$ independent of $\theta$):
$$p(L \mid \theta) = p(y_1, \dots, y_n \mid x_1, \dots, x_n, \theta) = \prod_{i=1}^{n} p(y_i \mid x_i, \theta) = \prod_{i=1}^{n} \mathcal{N}(y_i \mid \theta^T x_i, \sigma^2) = \mathcal{N}(y \mid X\theta, \sigma^2 I)$$
a multivariate normal distribution with covariance matrix $\sigma^2 I$, where $X = (x_1, \dots, x_n)^T$ is the data matrix, $y = (y_1, \dots, y_n)^T$, and $X\theta$ is the vector of predictions $f_\theta(x_i) = \theta^T x_i$.

Bayesian linear regression: prior
Prior distribution over weight vectors. An appropriate prior distribution is the normal distribution:
$$p(\theta) = \mathcal{N}(\theta \mid 0, \sigma_p^2 I) = \frac{1}{(2\pi)^{m/2} \sigma_p^m} \exp\left( -\frac{\theta^T \theta}{2 \sigma_p^2} \right)$$
where $\sigma_p^2$ controls the strength of the prior.
The normal distribution is conjugate to itself: a normally distributed prior and a normal likelihood result in a normally distributed posterior.
[Figure: prior density $p(\theta)$ centered at $0$]

Bayesian linear regression: posterior
Posterior distribution over models given the data:
$$p(\theta \mid L) = \frac{1}{Z}\, p(L \mid \theta)\, p(\theta) \qquad \text{(Bayes' rule)}$$
$$= \frac{1}{Z}\, \mathcal{N}(y \mid X\theta, \sigma^2 I)\, \mathcal{N}(\theta \mid 0, \sigma_p^2 I) = \mathcal{N}(\theta \mid \bar{\mu}, A^{-1}) \qquad \text{(theorem describing properties of normal distributions)}$$
with
$$\bar{\mu} = \left( X^T X + \frac{\sigma^2}{\sigma_p^2} I \right)^{-1} X^T y, \qquad A = \frac{1}{\sigma^2} X^T X + \frac{1}{\sigma_p^2} I$$
where $X$ is the data matrix, $y$ the label vector, $\sigma^2$ the variance of the i.i.d. noise, and $\sigma_p^2$ the variance of the prior.
The posterior distribution over parameter vectors is again normally distributed, with new mean $\bar{\mu}$ and covariance matrix $A^{-1}$.
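
A minimal sketch of this computation (not from the slides; names chosen here), returning the posterior mean and covariance:

```python
import numpy as np

def posterior_params(X, y, sigma2, sigma2_p):
    """Posterior p(theta | L) = N(theta | mean, A^{-1}) for Bayesian linear regression."""
    m = X.shape[1]
    A = X.T @ X / sigma2 + np.eye(m) / sigma2_p  # precision matrix A
    mean = np.linalg.solve(X.T @ X + (sigma2 / sigma2_p) * np.eye(m), X.T @ y)
    return mean, np.linalg.inv(A)                # covariance is A^{-1}
```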

Summary: derivation of the posterior
Approach: Bayes' rule,
$$p(\theta \mid L) = \frac{p(L \mid \theta)\, p(\theta)}{p(L)}$$
Derive the likelihood, choose an appropriate prior distribution (normal distribution).
Posterior $p(\theta \mid L)$: how probable is a linear regression model after having seen the data $L$?
Computation of the posterior is relatively simple: prior $\mathcal{N}(\theta \mid 0, \sigma_p^2 I)$, observation of the data $L$, resulting posterior $\mathcal{N}(\theta \mid \bar{\mu}, A^{-1})$.

Bayesian linear regression: MAP model
The posterior over parameter vectors is again normally distributed, with new mean $\bar{\mu}$ and covariance matrix $A^{-1}$.
Most probable model given the data:
$$\theta^* = \arg\max_\theta p(\theta \mid L) = \arg\max_\theta \mathcal{N}(\theta \mid \bar{\mu}, A^{-1}) = \bar{\mu}$$
with
$$\bar{\mu} = \left( X^T X + \frac{\sigma^2}{\sigma_p^2} I \right)^{-1} X^T y \qquad \text{and} \qquad A = \frac{1}{\sigma^2} X^T X + \frac{1}{\sigma_p^2} I$$

Example: MAP solution for regression
Training data:
$$x_1 = \begin{pmatrix} 2 \\ 3 \\ 0 \end{pmatrix}, \quad x_2 = \begin{pmatrix} 4 \\ 3 \\ 2 \end{pmatrix}, \quad x_3 = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}, \qquad y_1 = 2, \quad y_2 = 3, \quad y_3 = 4$$
Matrix notation (adding the constant attribute):
$$X = \begin{pmatrix} 1 & 2 & 3 & 0 \\ 1 & 4 & 3 & 2 \\ 1 & 0 & 1 & 2 \end{pmatrix}, \qquad y = \begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix}$$

Example: MAP solution for regression
Choose variance of the prior $\sigma_p^2 = 1$ and noise parameter $\sigma^2 = 0.25$.
Compute
$$\theta^* = \left( X^T X + \frac{\sigma^2}{\sigma_p^2} I \right)^{-1} X^T y = \begin{pmatrix} 0.7975 \\ -0.5598 \\ 0.7543 \\ 1.1217 \end{pmatrix}$$

Example: MAP solution for regression
Predictions of the model on the training data:
$$\hat{y} = X \theta^* = \begin{pmatrix} 1 & 2 & 3 & 0 \\ 1 & 4 & 3 & 2 \\ 1 & 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} 0.7975 \\ -0.5598 \\ 0.7543 \\ 1.1217 \end{pmatrix} = \begin{pmatrix} 1.9408 \\ 3.0646 \\ 3.7952 \end{pmatrix}$$
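
A few lines of NumPy reproduce these numbers (a sketch for checking the example; values rounded to four decimals):

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0, 0.0],
              [1.0, 4.0, 3.0, 2.0],
              [1.0, 0.0, 1.0, 2.0]])
y = np.array([2.0, 3.0, 4.0])
sigma2, sigma2_p = 0.25, 1.0

# MAP solution theta* = (X^T X + (sigma^2/sigma_p^2) I)^{-1} X^T y
theta = np.linalg.solve(X.T @ X + (sigma2 / sigma2_p) * np.eye(4), X.T @ y)
print(theta.round(4))        # [ 0.7975 -0.5598  0.7543  1.1217]
print((X @ theta).round(4))  # [1.9408 3.0646 3.7952]
```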

Connection to ridge regression
MAP parameter of linear regression:
$$\theta^* = \left( X^T X + \frac{\sigma^2}{\sigma_p^2} I \right)^{-1} X^T y$$
Recall ridge regression:
$$\theta^* = \arg\min_\theta \sum_{i=1}^{n} (y_i - \theta^T x_i)^2 + \lambda \|\theta\|^2 = (X^T X + \lambda I)^{-1} X^T y$$
The MAP solution of Bayesian regression is identical to the solution of ridge regression for $\lambda = \sigma^2 / \sigma_p^2$.
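
A quick numeric check of this equivalence on the example data (a sketch: the gradient of the ridge objective vanishes at the MAP solution):

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0, 0.0],
              [1.0, 4.0, 3.0, 2.0],
              [1.0, 0.0, 1.0, 2.0]])
y = np.array([2.0, 3.0, 4.0])
sigma2, sigma2_p = 0.25, 1.0
lam = sigma2 / sigma2_p  # ridge parameter induced by the Bayesian model

theta_map = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)

# Gradient of sum_i (theta^T x_i - y_i)^2 + lam * ||theta||^2 at theta_map:
grad = 2 * X.T @ (X @ theta_map - y) + 2 * lam * theta_map
assert np.allclose(grad, 0.0)  # theta_map minimizes the regularized loss
```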

Connection to ridge regression
Connection between loss function and likelihood, regularizer and prior. MAP model:
$$\theta^* = \arg\max_\theta p(\theta \mid L) = \arg\max_\theta p(L \mid \theta)\, p(\theta) = \arg\max_\theta \left[ \log p(L \mid \theta) + \log p(\theta) \right] = \arg\min_\theta \left[ -\log p(L \mid \theta) - \log p(\theta) \right]$$
with negative log-likelihood $-\log p(L \mid \theta)$ and negative log-prior $-\log p(\theta)$.

Connection to ridge regression
The negative log-likelihood corresponds to the squared loss:
$$-\log p(L \mid \theta) = -\log \prod_{i=1}^{n} p(y_i \mid x_i, \theta) = -\sum_{i=1}^{n} \log \mathcal{N}(y_i \mid \theta^T x_i, \sigma^2)$$
$$= -\sum_{i=1}^{n} \log \left[ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i - \theta^T x_i)^2}{2\sigma^2} \right) \right] = \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \theta^T x_i)^2 + \text{const}$$

Connection to ridge regression
The negative log-prior corresponds to the regularizer:
$$-\log p(\theta) = -\log \mathcal{N}(\theta \mid 0, \sigma_p^2 I) = -\log \left[ \frac{1}{(2\pi)^{m/2} \sigma_p^m} \exp\left( -\frac{\theta^T \theta}{2\sigma_p^2} \right) \right] = \frac{1}{2\sigma_p^2} \|\theta\|^2 + \text{const}$$
The MAP solution therefore corresponds to the minimization of a regularized loss function.

Visualization: sequential update of the posterior
Computation of the posterior by sequential updating: multiply in the likelihoods of the individual instances one by one (instances independent):
$$p(\theta \mid L) \propto p(\theta) \prod_{i=1}^{n} p(y_i \mid x_i, \theta) = p(\theta)\, p(y_1 \mid x_1, \theta)\, p(y_2 \mid x_2, \theta) \cdots p(y_n \mid x_n, \theta)$$
Let $p_0(\theta) = p(\theta)$, and let $p_k(\theta)$ be the posterior if only the first $k$ instances in $L$ are used; then $p_k(\theta) \propto p_{k-1}(\theta)\, p(y_k \mid x_k, \theta)$.
[Figure: sequence of densities $p_0(\theta) \to p_1(\theta) \to p_2(\theta) \to p_3(\theta)$]
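
Because prior and likelihood are normal, each sequential step has a closed form. A minimal sketch of one update (function name chosen here; the update equations are standard Gaussian algebra, not spelled out on the slide):

```python
import numpy as np

def sequential_update(mean, cov, x, y, sigma2):
    """One step: p_k(theta) proportional to p_{k-1}(theta) * N(y | theta^T x, sigma2)."""
    prec = np.linalg.inv(cov) + np.outer(x, x) / sigma2  # updated precision matrix
    cov_new = np.linalg.inv(prec)
    mean_new = cov_new @ (np.linalg.solve(cov, mean) + y * x / sigma2)
    return mean_new, cov_new

# Starting from the prior N(0, sigma_p^2 I) and folding in the n instances one by
# one yields the same posterior as the batch formula on the earlier slides.
```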

Example: sequential update of the posterior
$f_\theta(x) = \theta_0 + \theta_1 x$ (one-dimensional regression).
Sequential update: $p_0(\theta) = p(\theta)$.
[Figure: prior density $p_0(\theta)$ over $(\theta_0, \theta_1)$; lines sampled from $p_0(\theta)$]

Example: sequential update of the posterior
$f_\theta(x) = \theta_0 + \theta_1 x$ (one-dimensional regression).
Sequential update: $p_1(\theta) \propto p_0(\theta)\, p(y_1 \mid x_1, \theta)$ with likelihood $p(y_1 \mid x_1, \theta)$.
Instance $(x_1, y_1)$ with $y_1 = f(x_1) + \varepsilon$.
[Figure: likelihood $p(y_1 \mid x_1, \theta)$; lines sampled from $p_1(\theta)$]

Example: sequential update of the posterior
$f_\theta(x) = \theta_0 + \theta_1 x$ (one-dimensional regression).
Sequential update: $p_1(\theta) \propto p_0(\theta)\, p(y_1 \mid x_1, \theta)$ with likelihood $p(y_1 \mid x_1, \theta)$.
[Figure: posterior $p_1(\theta)$; lines sampled from $p_1(\theta)$]

Example: sequential update of the posterior
$f_\theta(x) = \theta_0 + \theta_1 x$ (one-dimensional regression).
Sequential update: $p_2(\theta) \propto p_1(\theta)\, p(y_2 \mid x_2, \theta)$ with likelihood $p(y_2 \mid x_2, \theta)$.
[Figure: posterior $p_2(\theta)$; lines sampled from $p_2(\theta)$]

Example: sequential update of the posterior
$f_\theta(x) = \theta_0 + \theta_1 x$ (one-dimensional regression).
Sequential update: $p_3(\theta) \propto p_2(\theta)\, p(y_3 \mid x_3, \theta)$ with likelihood $p(y_3 \mid x_3, \theta)$.
[Figure: posterior $p_3(\theta)$; lines sampled from $p_3(\theta)$]

Overview
Basic concepts of Bayesian learning.
Most probable model given data: coin tosses, linear regression, logistic regression.
Bayesian predictions: coin tosses, linear regression.