
Parameter estimation Conditional risk

Formalizing the problem
- Specify random variables we care about, e.g., commute time, or heights of buildings in a city
- We might then pick a particular distribution over these random variables, e.g., say we think our variable is Gaussian
- Now we want a way to use the data to inform the model choice, e.g., let the data tell us the parameters for that Gaussian

<latexit sha1_base64="dpbgw+tifmcqwl/qe7qytlb7rwm=">aaaca3icbvdlssnafj3uv62vqdvddbahgpskclosunelfewdmlgm00k7dgyszizccqu3/oobf4q49sfc+tdo2iy09ccfwzn3cu89qcyo0o7zbrwwlldw14rrpy3nre0de3evpajeytleeytkj0ckmcpiu1pnscewbpgakxywusr89gorikbito9j4nm0edskggkj9ewdjym9xiiln5okx5nt6ck64oi+dtkzy07vmqiuejcnzzcj0bo/vh6ee06exgwp1xwdwpspkppiriyll1ekrniebqrrqecckd+d/jcbx0bpwzcsposgu/x3riq4ummemm7syjxvzej/xjfr4ywfuhenmgg8wxqmdooizohappueazy2bgfjza0qd5fewjvysiyed/7lrdkqvv2n6t6eleuxerxfcaioqaw44bzuwtvogcba4be8g1fwzj1zl9a79tfrlvj5zd74a+vzb+xhlwm=</latexit> <latexit sha1_base64="dpbgw+tifmcqwl/qe7qytlb7rwm=">aaaca3icbvdlssnafj3uv62vqdvddbahgpskclosunelfewdmlgm00k7dgyszizccqu3/oobf4q49sfc+tdo2iy09ccfwzn3cu89qcyo0o7zbrwwlldw14rrpy3nre0de3evpajeytleeytkj0ckmcpiu1pnscewbpgakxywusr89gorikbito9j4nm0edskggkj9ewdjym9xiiln5okx5nt6ck64oi+dtkzy07vmqiuejcnzzcj0bo/vh6ee06exgwp1xwdwpspkppiriyll1ekrniebqrrqecckd+d/jcbx0bpwzcsposgu/x3riq4ummemm7syjxvzej/xjfr4ywfuhenmgg8wxqmdooizohappueazy2bgfjza0qd5fewjvysiyed/7lrdkqvv2n6t6eleuxerxfcaioqaw44bzuwtvogcba4be8g1fwzj1zl9a79tfrlvj5zd74a+vzb+xhlwm=</latexit> <latexit sha1_base64="dpbgw+tifmcqwl/qe7qytlb7rwm=">aaaca3icbvdlssnafj3uv62vqdvddbahgpskclosunelfewdmlgm00k7dgyszizccqu3/oobf4q49sfc+tdo2iy09ccfwzn3cu89qcyo0o7zbrwwlldw14rrpy3nre0de3evpajeytleeytkj0ckmcpiu1pnscewbpgakxywusr89gorikbito9j4nm0edskggkj9ewdjym9xiiln5okx5nt6ck64oi+dtkzy07vmqiuejcnzzcj0bo/vh6ee06exgwp1xwdwpspkppiriyll1ekrniebqrrqecckd+d/jcbx0bpwzcsposgu/x3riq4ummemm7syjxvzej/xjfr4ywfuhenmgg8wxqmdooizohappueazy2bgfjza0qd5fewjvysiyed/7lrdkqvv2n6t6eleuxerxfcaioqaw44bzuwtvogcba4be8g1fwzj1zl9a79tfrlvj5zd74a+vzb+xhlwm=</latexit> <latexit sha1_base64="dpbgw+tifmcqwl/qe7qytlb7rwm=">aaaca3icbvdlssnafj3uv62vqdvddbahgpskclosunelfewdmlgm00k7dgyszizccqu3/oobf4q49sfc+tdo2iy09ccfwzn3cu89qcyo0o7zbrwwlldw14rrpy3nre0de3evpajeytleeytkj0ckmcpiu1pnscewbpgakxywusr89gorikbito9j4nm0edskggkj9ewdjym9xiiln5okx5nt6ck64oi+dtkzy07vmqiuejcnzzcj0bo/vh6ee06exgwp1xwdwpspkppiriyll1ekrniebqrrqecckd+d/jcbx0bpwzcsposgu/x3riq4ummemm7syjxvzej/xjfr4ywfuhenmgg8wxqmdooizohappueazy2bgfjza0qd5fewjvysiyed/7lrdkqvv2n6t6eleuxerxfcaioqaw44bzuwtvogcba4be8g1fwzj1zl9a79tfrlvj5zd74a+vzb+xhlwm=</latexit> Parameter estimation Assume that we are given some model class, M, e.g., Gaussian with parameters mu and sigma N (µ, 2 ) selection of model from the class corresponds to selecting mu, sigma Now want to select best model; how do we define best? Generally assume data comes from that model class; might want to find model that best explains the data (or most likely given the data) Might want most likely model, with preference for important samples Might want most likely model, that also matches expert prior info Might want most likely model, that is the simplest (least parameters) 3 These additional requirements are usually in place to enable better generalization to unseen data

Some notation
- ∑_{i=1}^n xᵢ = x₁ + x₂ + ... + xₙ
- ∏_{i=1}^n xᵢ = x₁ x₂ ... xₙ
- c : R^d → R
- w* = argmax_{w ∈ R^d} c(w), so that c(w*) = max_{w ∈ R^d} c(w)
- F is a set of models, e.g., F = {N(µ, σ) | (µ, σ) ∈ R², σ > 0}, or F = {f | f(x) = xᵀw, w ∈ R^d}

Definition of optimization
- We select some (error) function c we care about
- Maximizing c means we are finding its largest point
- Minimizing c means we are finding its smallest point

Maximum a posteriori (MAP) estimation

f_MAP = argmax_{f ∈ F} p(f | D)

- We want the f that is most likely, given the data
- p(f | D) is the posterior distribution of the model given the data
- e.g., F could be the space of Gaussian distributions; the model is f, and f(x) returns the probability/density of a point x
- e.g., we could assume x is Gaussian distributed with variance σ² = 1, so F could be the reals and the model f is the mean
- Question: What is the function we are optimizing and what are the parameters we are learning?

MAP

f_MAP = argmax_{f ∈ F} p(f | D)

- p(f | D) is the posterior distribution of the model given the data
- e.g., we could assume x is Gaussian distributed with variance σ² = 1, so F could be the reals and the model f is the mean
- Question: What is the function we are optimizing and what are the parameters we are learning? The objective is c(f) = p(mean is f | D), and we solve max_{f ∈ R} c(f)

Maximum a posteriori (MAP)

f_MAP = argmax_{f ∈ F} p(f | D)

- p(f | D) is the posterior distribution of the model given the data
- In discrete spaces, p(f | D) is a PMF; the MAP estimate is exactly the most probable model, e.g., the bias of the coin is 0.1, 0.5, or 0.7, with posterior values p(f = 0.1 | D), ...
- In continuous spaces, p(f | D) is a PDF; the MAP estimate is the model with the largest value of the posterior density function, e.g., the bias of the coin is in [0, 1]
- But what is p(f | D)? Do we pick it? If so, how?

MAP calculation

Start by applying Bayes' rule:

p(f | D) = p(D | f) p(f) / p(D)

- p(D | f) is the likelihood of the data, under the model
- p(f) is the prior over models
- p(D) is the marginal distribution of the data; we will often be able to ignore this term

Why is this conversion important?
- We do not always have a known form for p(f | D)
- We usually have chosen (known) forms for p(D | f) and p(f)
- Let θ = the parameters of the model (distribution); we use p(D | f) = p(D | θ) and p(f) = p(θ) interchangeably
- Example: Let D = {x₁} (one sample). Then, if the model class is the set of Gaussians: p(D | f) = p(x₁ | µ, σ)
- p(f | D) is not obvious, since we specified our model class for p(x | f)
- What is p(f) in this case? We may put some prior preferences on µ and σ, e.g., a normal distribution over µ, specifying that really large magnitude values of µ are unlikely
- Specifying and using p(f) is related to regularization and Bayesian parameter estimation, which we will discuss more later

<latexit sha1_base64="me6gobzojpz1cxptawiq1fj7la8=">aaab7xicbvbnswmxej34wetx1aoxybhqpeykomeif48v7ae0s8mm2ty2myxjvixl/4mxd4p49f9489+ytnvq1gcdj/dmmjkxjoib63nfagv1bx1js7bv3n7z3dsvhrw2juo1zq2qhnltkbgmugqny61g7uqzeoectclrzdrvptjtujl3dpywicydysnoixvsm6k89fyzxqnsvb0z8dlxc1kghpve6avbvzsnmbruegm6vpfyicpacirypnhndusihzeb6zgqscxmkm2unebtp/rxplqraffm/t2rkdiycry6zpjyovn0puj/xie10vwqczmklkk6xxslalufp6/jptemwjf2hfdn3a2ydokm1lqaii4ef/hlzdi8r/pe1b+7kneu8zgkcawnuaeflqegt1chblb4ggd4htek0at6rx/z1hwuzxzbh6dph6lzjn8=</latexit> <latexit sha1_base64="me6gobzojpz1cxptawiq1fj7la8=">aaab7xicbvbnswmxej34wetx1aoxybhqpeykomeif48v7ae0s8mm2ty2myxjvixl/4mxd4p49f9489+ytnvq1gcdj/dmmjkxjoib63nfagv1bx1js7bv3n7z3dsvhrw2juo1zq2qhnltkbgmugqny61g7uqzeoectclrzdrvptjtujl3dpywicydysnoixvsm6k89fyzxqnsvb0z8dlxc1kghpve6avbvzsnmbruegm6vpfyicpacirypnhndusihzeb6zgqscxmkm2unebtp/rxplqraffm/t2rkdiycry6zpjyovn0puj/xie10vwqczmklkk6xxslalufp6/jptemwjf2hfdn3a2ydokm1lqaii4ef/hlzdi8r/pe1b+7kneu8zgkcawnuaeflqegt1chblb4ggd4htek0at6rx/z1hwuzxzbh6dph6lzjn8=</latexit> <latexit sha1_base64="me6gobzojpz1cxptawiq1fj7la8=">aaab7xicbvbnswmxej34wetx1aoxybhqpeykomeif48v7ae0s8mm2ty2myxjvixl/4mxd4p49f9489+ytnvq1gcdj/dmmjkxjoib63nfagv1bx1js7bv3n7z3dsvhrw2juo1zq2qhnltkbgmugqny61g7uqzeoectclrzdrvptjtujl3dpywicydysnoixvsm6k89fyzxqnsvb0z8dlxc1kghpve6avbvzsnmbruegm6vpfyicpacirypnhndusihzeb6zgqscxmkm2unebtp/rxplqraffm/t2rkdiycry6zpjyovn0puj/xie10vwqczmklkk6xxslalufp6/jptemwjf2hfdn3a2ydokm1lqaii4ef/hlzdi8r/pe1b+7kneu8zgkcawnuaeflqegt1chblb4ggd4htek0at6rx/z1hwuzxzbh6dph6lzjn8=</latexit> <latexit sha1_base64="me6gobzojpz1cxptawiq1fj7la8=">aaab7xicbvbnswmxej34wetx1aoxybhqpeykomeif48v7ae0s8mm2ty2myxjvixl/4mxd4p49f9489+ytnvq1gcdj/dmmjkxjoib63nfagv1bx1js7bv3n7z3dsvhrw2juo1zq2qhnltkbgmugqny61g7uqzeoectclrzdrvptjtujl3dpywicydysnoixvsm6k89fyzxqnsvb0z8dlxc1kghpve6avbvzsnmbruegm6vpfyicpacirypnhndusihzeb6zgqscxmkm2unebtp/rxplqraffm/t2rkdiycry6zpjyovn0puj/xie10vwqczmklkk6xxslalufp6/jptemwjf2hfdn3a2ydokm1lqaii4ef/hlzdi8r/pe1b+7kneu8zgkcawnuaeflqegt1chblb4ggd4htek0at6rx/z1hwuzxzbh6dph6lzjn8=</latexit> <latexit sha1_base64="9ebez7asyxnus7k0kmzsc78bsse=">aaab6nicbvbns8naej3ur1q/qh69lbbbu0leqmeif48v7qe0owy2k3bpzhn2n2ij/qlepcji1v/kzx/jts1bwx8mpn6bywzekaiujet+o4w19y3nrej2awd3b/+gfhju0ngqgdzzlglvcahgwsu2dtcco4lcgguc28h4zua3h1fphsshm0nqj+hq8pazaqx0/9t3+uwkw3xnikvey0kfcjt65a/eigzphniwqbxuem5i/iwqw5naaamxakwog9mhdi2vneltz/ntp+tmkgmsxsqwngsu/p7iakt1japsz0tnsc97m/e/r5ua8mrpuexsg5itfowpicyms7/jgctkrkwsouxxeythi6oomzadkg3bw355lbquqp5b9e4uk/xrpi4inmapnimhnajdltsgcqyg8ayv8oyi58v5dz4wrqunnzmgp3a+fwalsi2g</latexit> <latexit sha1_base64="9ebez7asyxnus7k0kmzsc78bsse=">aaab6nicbvbns8naej3ur1q/qh69lbbbu0leqmeif48v7qe0owy2k3bpzhn2n2ij/qlepcji1v/kzx/jts1bwx8mpn6bywzekaiujet+o4w19y3nrej2awd3b/+gfhju0ngqgdzzlglvcahgwsu2dtcco4lcgguc28h4zua3h1fphsshm0nqj+hq8pazaqx0/9t3+uwkw3xnikvey0kfcjt65a/eigzphniwqbxuem5i/iwqw5naaamxakwog9mhdi2vneltz/ntp+tmkgmsxsqwngsu/p7iakt1japsz0tnsc97m/e/r5ua8mrpuexsg5itfowpicyms7/jgctkrkwsouxxeythi6oomzadkg3bw355lbquqp5b9e4uk/xrpi4inmapnimhnajdltsgcqyg8ayv8oyi58v5dz4wrqunnzmgp3a+fwalsi2g</latexit> <latexit 
sha1_base64="9ebez7asyxnus7k0kmzsc78bsse=">aaab6nicbvbns8naej3ur1q/qh69lbbbu0leqmeif48v7qe0owy2k3bpzhn2n2ij/qlepcji1v/kzx/jts1bwx8mpn6bywzekaiujet+o4w19y3nrej2awd3b/+gfhju0ngqgdzzlglvcahgwsu2dtcco4lcgguc28h4zua3h1fphsshm0nqj+hq8pazaqx0/9t3+uwkw3xnikvey0kfcjt65a/eigzphniwqbxuem5i/iwqw5naaamxakwog9mhdi2vneltz/ntp+tmkgmsxsqwngsu/p7iakt1japsz0tnsc97m/e/r5ua8mrpuexsg5itfowpicyms7/jgctkrkwsouxxeythi6oomzadkg3bw355lbquqp5b9e4uk/xrpi4inmapnimhnajdltsgcqyg8ayv8oyi58v5dz4wrqunnzmgp3a+fwalsi2g</latexit> <latexit sha1_base64="su5hbpak+kk9dlsnnulnhnxer84=">aaab6nicbvbns8naej3ur1q/qh69lbbbu0mkomeif48v7qe0owy2k3bpzhn2n2ij/qlepcji1v/kzx/jts1bwx8mpn6bywzekaiujet+o4w19y3nrej2awd3b/+gfhju0ngqgdzzlglvcahgwsu2dtcco4lcgguc28h4zua3h1fphsshm0nqj+hq8pazaqx0/9sv9csvt+roqvajl5mk5gj0y1+9qczsckvhgmrd9dze+blvhjob01iv1zhqnqzd7foqaytaz+antsmzvqykjjutachc/t2r0ujrsrtyzoiakv72zuj/xjc14zwfczmkbivblaptquxmzn+tavfijjhyqpni9lbcrlrrzmw6jruct/zykmnvqp5b9e4ukvxrpi4inmapnimhl1chw2haexgm4rle4c0rzovz7nwswgtopnmmf+b8/ganni2h</latexit> <latexit sha1_base64="su5hbpak+kk9dlsnnulnhnxer84=">aaab6nicbvbns8naej3ur1q/qh69lbbbu0mkomeif48v7qe0owy2k3bpzhn2n2ij/qlepcji1v/kzx/jts1bwx8mpn6bywzekaiujet+o4w19y3nrej2awd3b/+gfhju0ngqgdzzlglvcahgwsu2dtcco4lcgguc28h4zua3h1fphsshm0nqj+hq8pazaqx0/9sv9csvt+roqvajl5mk5gj0y1+9qczsckvhgmrd9dze+blvhjob01iv1zhqnqzd7foqaytaz+antsmzvqykjjutachc/t2r0ujrsrtyzoiakv72zuj/xjc14zwfczmkbivblaptquxmzn+tavfijjhyqpni9lbcrlrrzmw6jruct/zykmnvqp5b9e4ukvxrpi4inmapnimhl1chw2haexgm4rle4c0rzovz7nwswgtopnmmf+b8/ganni2h</latexit> <latexit sha1_base64="su5hbpak+kk9dlsnnulnhnxer84=">aaab6nicbvbns8naej3ur1q/qh69lbbbu0mkomeif48v7qe0owy2k3bpzhn2n2ij/qlepcji1v/kzx/jts1bwx8mpn6bywzekaiujet+o4w19y3nrej2awd3b/+gfhju0ngqgdzzlglvcahgwsu2dtcco4lcgguc28h4zua3h1fphsshm0nqj+hq8pazaqx0/9sv9csvt+roqvajl5mk5gj0y1+9qczsckvhgmrd9dze+blvhjob01iv1zhqnqzd7foqaytaz+antsmzvqykjjutachc/t2r0ujrsrtyzoiakv72zuj/xjc14zwfczmkbivblaptquxmzn+tavfijjhyqpni9lbcrlrrzmw6jruct/zykmnvqp5b9e4ukvxrpi4inmapnimhl1chw2haexgm4rle4c0rzovz7nwswgtopnmmf+b8/ganni2h</latexit> <latexit sha1_base64="su5hbpak+kk9dlsnnulnhnxer84=">aaab6nicbvbns8naej3ur1q/qh69lbbbu0mkomeif48v7qe0owy2k3bpzhn2n2ij/qlepcji1v/kzx/jts1bwx8mpn6bywzekaiujet+o4w19y3nrej2awd3b/+gfhju0ngqgdzzlglvcahgwsu2dtcco4lcgguc28h4zua3h1fphsshm0nqj+hq8pazaqx0/9sv9csvt+roqvajl5mk5gj0y1+9qczsckvhgmrd9dze+blvhjob01iv1zhqnqzd7foqaytaz+antsmzvqykjjutachc/t2r0ujrsrtyzoiakv72zuj/xjc14zwfczmkbivblaptquxmzn+tavfijjhyqpni9lbcrlrrzmw6jruct/zykmnvqp5b9e4ukvxrpi4inmapnimhl1chw2haexgm4rle4c0rzovz7nwswgtopnmmf+b8/ganni2h</latexit> <latexit sha1_base64="1zatsa6aurzb6tfxhpdtmv4pqxs=">aaab7xicbvbnswmxej2tx7v+vt16crahxspuefry9okxgv2adinznnvgzpmlyypl6x/w4kerr/4fb/4b0+0etpxbwoo9gwbmbtfn2rjut1nyw9/y3cpul3z29/ypyodhbs0trwilsc5vn8cacizoyzddatdwfecbp51gcjp3o49uasbfvzng1i/wslcqewys1i6rt4p6+abccwtubrrkvjxuiedzup7qdyvjiiom4vjrnufgxk+xmoxwoiv1e01jtcz4rhuwchxr7afzttn0zpuhcqwyjqzk1n8tky60nkab7yywgetlby7+5/use175krnxyqggi0vhwpgrap46gjjfieftszbrzn6kybgrtiwnqgrd8jzfxixtes1za97dravxncdrhbm4hsp4caknuiumtidaazzdk7w50nlx3p2prwvbyweo4q+czx+kxo6a</latexit> <latexit sha1_base64="1zatsa6aurzb6tfxhpdtmv4pqxs=">aaab7xicbvbnswmxej2tx7v+vt16crahxspuefry9okxgv2adinznnvgzpmlyypl6x/w4kerr/4fb/4b0+0etpxbwoo9gwbmbtfn2rjut1nyw9/y3cpul3z29/ypyodhbs0trwilsc5vn8cacizoyzddatdwfecbp51gcjp3o49uasbfvzng1i/wslcqewys1i6rt4p6+abccwtubrrkvjxuiedzup7qdyvjiiom4vjrnufgxk+xmoxwoiv1e01jtcz4rhuwchxr7afzttn0zpuhcqwyjqzk1n8tky60nkab7yywgetlby7+5/use175krnxyqggi0vhwpgrap46gjjfieftszbrzn6kybgrtiwnqgrd8jzfxixtes1za97dravxncdrhbm4hsp4caknuiumtidaazzdk7w50nlx3p2prwvbyweo4q+czx+kxo6a</latexit> 
<latexit sha1_base64="1zatsa6aurzb6tfxhpdtmv4pqxs=">aaab7xicbvbnswmxej2tx7v+vt16crahxspuefry9okxgv2adinznnvgzpmlyypl6x/w4kerr/4fb/4b0+0etpxbwoo9gwbmbtfn2rjut1nyw9/y3cpul3z29/ypyodhbs0trwilsc5vn8cacizoyzddatdwfecbp51gcjp3o49uasbfvzng1i/wslcqewys1i6rt4p6+abccwtubrrkvjxuiedzup7qdyvjiiom4vjrnufgxk+xmoxwoiv1e01jtcz4rhuwchxr7afzttn0zpuhcqwyjqzk1n8tky60nkab7yywgetlby7+5/use175krnxyqggi0vhwpgrap46gjjfieftszbrzn6kybgrtiwnqgrd8jzfxixtes1za97dravxncdrhbm4hsp4caknuiumtidaazzdk7w50nlx3p2prwvbyweo4q+czx+kxo6a</latexit> <latexit sha1_base64="1zatsa6aurzb6tfxhpdtmv4pqxs=">aaab7xicbvbnswmxej2tx7v+vt16crahxspuefry9okxgv2adinznnvgzpmlyypl6x/w4kerr/4fb/4b0+0etpxbwoo9gwbmbtfn2rjut1nyw9/y3cpul3z29/ypyodhbs0trwilsc5vn8cacizoyzddatdwfecbp51gcjp3o49uasbfvzng1i/wslcqewys1i6rt4p6+abccwtubrrkvjxuiedzup7qdyvjiiom4vjrnufgxk+xmoxwoiv1e01jtcz4rhuwchxr7afzttn0zpuhcqwyjqzk1n8tky60nkab7yywgetlby7+5/use175krnxyqggi0vhwpgrap46gjjfieftszbrzn6kybgrtiwnqgrd8jzfxixtes1za97dravxncdrhbm4hsp4caknuiumtidaazzdk7w50nlx3p2prwvbyweo4q+czx+kxo6a</latexit> Example of p(d theta) D Probabilities for fixed theta = N (x 1 µ =, 2 = 1) = 1 1 2 exp 2 (x 1 ) 2 p(x 1 ) p(x 2 ) 11 x<latexit sha1_base64="9ebez7asyxnus7k0kmzsc78bsse=">aaab6nicbvbns8naej3ur1q/qh69lbbbu0leqmeif48v7qe0owy2k3bpzhn2n2ij/qlepcji1v/kzx/jts1bwx8mpn6bywzekaiujet+o4w19y3nrej2awd3b/+gfhju0ngqgdzzlglvcahgwsu2dtcco4lcgguc28h4zua3h1fphsshm0nqj+hq8pazaqx0/9t3+uwkw3xnikvey0kfcjt65a/eigzphniwqbxuem5i/iwqw5naaamxakwog9mhdi2vneltz/ntp+tmkgmsxsqwngsu/p7iakt1japsz0tnsc97m/e/r5ua8mrpuexsg5itfowpicyms7/jgctkrkwsouxxeythi6oomzadkg3bw355lbquqp5b9e4uk/xrpi4inmapnimhnajdltsgcqyg8ayv8oyi58v5dz4wrqunnzmgp3a+fwalsi2g</latexit> 1 x 2

<latexit sha1_base64="b7fkygxrip6cevss/e9i0kxe/jw=">aaacnxichzhfahnbfmznt1prtg2qd3rhwwbjoitdinsbqogkkkuqmkaqszezk0kydpypm2dlw5q38km8822ctsluvtozdhx853eymw/ixctlqfdb8zcepnx8tpw49utp9s5ufe/zqc0ki2rfzdozzzfaqvuq+6riy7pcsexilqfxxfuqp7iuxqos/u7zxi4snkzqogsss6l6z7zje6szqf1+wmap4dsthhaezi8ogfs1tfc802rb/he4ulykwgo4ijr8tprzwsxznpo23cqdaiw0glv1l+7cr0f1rtaolgw3rbgwdbauk6j+i48zusqyjahr2mey5dqq0zaswi5qvlayr3gbuzl0msve2lg5thcbb5wzhklm3e4jlu71irita+dj7mgqsxuzv5n/6w0lmrwblsrnc5kpwb00ktrqbtvxwvgzkujpnubhllsribkafoq+tozccg8++by47btdob1+e9vo9tzxblgx7dvrspadsi77xe5ynwnvhdf1pntf/ff+r//y/7pcfw8985z9u/7gdx1bwxq=</latexit> <latexit sha1_base64="b7fkygxrip6cevss/e9i0kxe/jw=">aaacnxichzhfahnbfmznt1prtg2qd3rhwwbjoitdinsbqogkkkuqmkaqszezk0kydpypm2dlw5q38km8822ctsluvtozdhx853eymw/ixctlqfdb8zcepnx8tpw49utp9s5ufe/zqc0ki2rfzdozzzfaqvuq+6riy7pcsexilqfxxfuqp7iuxqos/u7zxi4snkzqogsss6l6z7zje6szqf1+wmap4dsthhaezi8ogfs1tfc802rb/he4ulykwgo4ijr8tprzwsxznpo23cqdaiw0glv1l+7cr0f1rtaolgw3rbgwdbauk6j+i48zusqyjahr2mey5dqq0zaswi5qvlayr3gbuzl0msve2lg5thcbb5wzhklm3e4jlu71irita+dj7mgqsxuzv5n/6w0lmrwblsrnc5kpwb00ktrqbtvxwvgzkujpnubhllsribkafoq+tozccg8++by47btdob1+e9vo9tzxblgx7dvrspadsi77xe5ynwnvhdf1pntf/ff+r//y/7pcfw8985z9u/7gdx1bwxq=</latexit> <latexit sha1_base64="b7fkygxrip6cevss/e9i0kxe/jw=">aaacnxichzhfahnbfmznt1prtg2qd3rhwwbjoitdinsbqogkkkuqmkaqszezk0kydpypm2dlw5q38km8822ctsluvtozdhx853eymw/ixctlqfdb8zcepnx8tpw49utp9s5ufe/zqc0ki2rfzdozzzfaqvuq+6riy7pcsexilqfxxfuqp7iuxqos/u7zxi4snkzqogsss6l6z7zje6szqf1+wmap4dsthhaezi8ogfs1tfc802rb/he4ulykwgo4ijr8tprzwsxznpo23cqdaiw0glv1l+7cr0f1rtaolgw3rbgwdbauk6j+i48zusqyjahr2mey5dqq0zaswi5qvlayr3gbuzl0msve2lg5thcbb5wzhklm3e4jlu71irita+dj7mgqsxuzv5n/6w0lmrwblsrnc5kpwb00ktrqbtvxwvgzkujpnubhllsribkafoq+tozccg8++by47btdob1+e9vo9tzxblgx7dvrspadsi77xe5ynwnvhdf1pntf/ff+r//y/7pcfw8985z9u/7gdx1bwxq=</latexit> <latexit sha1_base64="b7fkygxrip6cevss/e9i0kxe/jw=">aaacnxichzhfahnbfmznt1prtg2qd3rhwwbjoitdinsbqogkkkuqmkaqszezk0kydpypm2dlw5q38km8822ctsluvtozdhx853eymw/ixctlqfdb8zcepnx8tpw49utp9s5ufe/zqc0ki2rfzdozzzfaqvuq+6riy7pcsexilqfxxfuqp7iuxqos/u7zxi4snkzqogsss6l6z7zje6szqf1+wmap4dsthhaezi8ogfs1tfc802rb/he4ulykwgo4ijr8tprzwsxznpo23cqdaiw0glv1l+7cr0f1rtaolgw3rbgwdbauk6j+i48zusqyjahr2mey5dqq0zaswi5qvlayr3gbuzl0msve2lg5thcbb5wzhklm3e4jlu71irita+dj7mgqsxuzv5n/6w0lmrwblsrnc5kpwb00ktrqbtvxwvgzkujpnubhllsribkafoq+tozccg8++by47btdob1+e9vo9tzxblgx7dvrspadsi77xe5ynwnvhdf1pntf/ff+r//y/7pcfw8985z9u/7gdx1bwxq=</latexit> <latexit sha1_base64="me6gobzojpz1cxptawiq1fj7la8=">aaab7xicbvbnswmxej34wetx1aoxybhqpeykomeif48v7ae0s8mm2ty2myxjvixl/4mxd4p49f9489+ytnvq1gcdj/dmmjkxjoib63nfagv1bx1js7bv3n7z3dsvhrw2juo1zq2qhnltkbgmugqny61g7uqzeoectclrzdrvptjtujl3dpywicydysnoixvsm6k89fyzxqnsvb0z8dlxc1kghpve6avbvzsnmbruegm6vpfyicpacirypnhndusihzeb6zgqscxmkm2unebtp/rxplqraffm/t2rkdiycry6zpjyovn0puj/xie10vwqczmklkk6xxslalufp6/jptemwjf2hfdn3a2ydokm1lqaii4ef/hlzdi8r/pe1b+7kneu8zgkcawnuaeflqegt1chblb4ggd4htek0at6rx/z1hwuzxzbh6dph6lzjn8=</latexit> <latexit sha1_base64="me6gobzojpz1cxptawiq1fj7la8=">aaab7xicbvbnswmxej34wetx1aoxybhqpeykomeif48v7ae0s8mm2ty2myxjvixl/4mxd4p49f9489+ytnvq1gcdj/dmmjkxjoib63nfagv1bx1js7bv3n7z3dsvhrw2juo1zq2qhnltkbgmugqny61g7uqzeoectclrzdrvptjtujl3dpywicydysnoixvsm6k89fyzxqnsvb0z8dlxc1kghpve6avbvzsnmbruegm6vpfyicpacirypnhndusihzeb6zgqscxmkm2unebtp/rxplqraffm/t2rkdiycry6zpjyovn0puj/xie10vwqczmklkk6xxslalufp6/jptemwjf2hfdn3a2ydokm1lqaii4ef/hlzdi8r/pe1b+7kneu8zgkcawnuaeflqegt1chblb4ggd4htek0at6rx/z1hwuzxzbh6dph6lzjn8=</latexit> <latexit 
sha1_base64="me6gobzojpz1cxptawiq1fj7la8=">aaab7xicbvbnswmxej34wetx1aoxybhqpeykomeif48v7ae0s8mm2ty2myxjvixl/4mxd4p49f9489+ytnvq1gcdj/dmmjkxjoib63nfagv1bx1js7bv3n7z3dsvhrw2juo1zq2qhnltkbgmugqny61g7uqzeoectclrzdrvptjtujl3dpywicydysnoixvsm6k89fyzxqnsvb0z8dlxc1kghpve6avbvzsnmbruegm6vpfyicpacirypnhndusihzeb6zgqscxmkm2unebtp/rxplqraffm/t2rkdiycry6zpjyovn0puj/xie10vwqczmklkk6xxslalufp6/jptemwjf2hfdn3a2ydokm1lqaii4ef/hlzdi8r/pe1b+7kneu8zgkcawnuaeflqegt1chblb4ggd4htek0at6rx/z1hwuzxzbh6dph6lzjn8=</latexit> <latexit sha1_base64="me6gobzojpz1cxptawiq1fj7la8=">aaab7xicbvbnswmxej34wetx1aoxybhqpeykomeif48v7ae0s8mm2ty2myxjvixl/4mxd4p49f9489+ytnvq1gcdj/dmmjkxjoib63nfagv1bx1js7bv3n7z3dsvhrw2juo1zq2qhnltkbgmugqny61g7uqzeoectclrzdrvptjtujl3dpywicydysnoixvsm6k89fyzxqnsvb0z8dlxc1kghpve6avbvzsnmbruegm6vpfyicpacirypnhndusihzeb6zgqscxmkm2unebtp/rxplqraffm/t2rkdiycry6zpjyovn0puj/xie10vwqczmklkk6xxslalufp6/jptemwjf2hfdn3a2ydokm1lqaii4ef/hlzdi8r/pe1b+7kneu8zgkcawnuaeflqegt1chblb4ggd4htek0at6rx/z1hwuzxzbh6dph6lzjn8=</latexit> <latexit sha1_base64="9ebez7asyxnus7k0kmzsc78bsse=">aaab6nicbvbns8naej3ur1q/qh69lbbbu0leqmeif48v7qe0owy2k3bpzhn2n2ij/qlepcji1v/kzx/jts1bwx8mpn6bywzekaiujet+o4w19y3nrej2awd3b/+gfhju0ngqgdzzlglvcahgwsu2dtcco4lcgguc28h4zua3h1fphsshm0nqj+hq8pazaqx0/9t3+uwkw3xnikvey0kfcjt65a/eigzphniwqbxuem5i/iwqw5naaamxakwog9mhdi2vneltz/ntp+tmkgmsxsqwngsu/p7iakt1japsz0tnsc97m/e/r5ua8mrpuexsg5itfowpicyms7/jgctkrkwsouxxeythi6oomzadkg3bw355lbquqp5b9e4uk/xrpi4inmapnimhnajdltsgcqyg8ayv8oyi58v5dz4wrqunnzmgp3a+fwalsi2g</latexit> <latexit sha1_base64="9ebez7asyxnus7k0kmzsc78bsse=">aaab6nicbvbns8naej3ur1q/qh69lbbbu0leqmeif48v7qe0owy2k3bpzhn2n2ij/qlepcji1v/kzx/jts1bwx8mpn6bywzekaiujet+o4w19y3nrej2awd3b/+gfhju0ngqgdzzlglvcahgwsu2dtcco4lcgguc28h4zua3h1fphsshm0nqj+hq8pazaqx0/9t3+uwkw3xnikvey0kfcjt65a/eigzphniwqbxuem5i/iwqw5naaamxakwog9mhdi2vneltz/ntp+tmkgmsxsqwngsu/p7iakt1japsz0tnsc97m/e/r5ua8mrpuexsg5itfowpicyms7/jgctkrkwsouxxeythi6oomzadkg3bw355lbquqp5b9e4uk/xrpi4inmapnimhnajdltsgcqyg8ayv8oyi58v5dz4wrqunnzmgp3a+fwalsi2g</latexit> <latexit sha1_base64="9ebez7asyxnus7k0kmzsc78bsse=">aaab6nicbvbns8naej3ur1q/qh69lbbbu0leqmeif48v7qe0owy2k3bpzhn2n2ij/qlepcji1v/kzx/jts1bwx8mpn6bywzekaiujet+o4w19y3nrej2awd3b/+gfhju0ngqgdzzlglvcahgwsu2dtcco4lcgguc28h4zua3h1fphsshm0nqj+hq8pazaqx0/9t3+uwkw3xnikvey0kfcjt65a/eigzphniwqbxuem5i/iwqw5naaamxakwog9mhdi2vneltz/ntp+tmkgmsxsqwngsu/p7iakt1japsz0tnsc97m/e/r5ua8mrpuexsg5itfowpicyms7/jgctkrkwsouxxeythi6oomzadkg3bw355lbquqp5b9e4uk/xrpi4inmapnimhnajdltsgcqyg8ayv8oyi58v5dz4wrqunnzmgp3a+fwalsi2g</latexit> <latexit sha1_base64="su5hbpak+kk9dlsnnulnhnxer84=">aaab6nicbvbns8naej3ur1q/qh69lbbbu0mkomeif48v7qe0owy2k3bpzhn2n2ij/qlepcji1v/kzx/jts1bwx8mpn6bywzekaiujet+o4w19y3nrej2awd3b/+gfhju0ngqgdzzlglvcahgwsu2dtcco4lcgguc28h4zua3h1fphsshm0nqj+hq8pazaqx0/9sv9csvt+roqvajl5mk5gj0y1+9qczsckvhgmrd9dze+blvhjob01iv1zhqnqzd7foqaytaz+antsmzvqykjjutachc/t2r0ujrsrtyzoiakv72zuj/xjc14zwfczmkbivblaptquxmzn+tavfijjhyqpni9lbcrlrrzmw6jruct/zykmnvqp5b9e4ukvxrpi4inmapnimhl1chw2haexgm4rle4c0rzovz7nwswgtopnmmf+b8/ganni2h</latexit> <latexit sha1_base64="su5hbpak+kk9dlsnnulnhnxer84=">aaab6nicbvbns8naej3ur1q/qh69lbbbu0mkomeif48v7qe0owy2k3bpzhn2n2ij/qlepcji1v/kzx/jts1bwx8mpn6bywzekaiujet+o4w19y3nrej2awd3b/+gfhju0ngqgdzzlglvcahgwsu2dtcco4lcgguc28h4zua3h1fphsshm0nqj+hq8pazaqx0/9sv9csvt+roqvajl5mk5gj0y1+9qczsckvhgmrd9dze+blvhjob01iv1zhqnqzd7foqaytaz+antsmzvqykjjutachc/t2r0ujrsrtyzoiakv72zuj/xjc14zwfczmkbivblaptquxmzn+tavfijjhyqpni9lbcrlrrzmw6jruct/zykmnvqp5b9e4ukvxrpi4inmapnimhl1chw2haexgm4rle4c0rzovz7nwswgtopnmmf+b8/ganni2h</latexit> 
<latexit sha1_base64="su5hbpak+kk9dlsnnulnhnxer84=">aaab6nicbvbns8naej3ur1q/qh69lbbbu0mkomeif48v7qe0owy2k3bpzhn2n2ij/qlepcji1v/kzx/jts1bwx8mpn6bywzekaiujet+o4w19y3nrej2awd3b/+gfhju0ngqgdzzlglvcahgwsu2dtcco4lcgguc28h4zua3h1fphsshm0nqj+hq8pazaqx0/9sv9csvt+roqvajl5mk5gj0y1+9qczsckvhgmrd9dze+blvhjob01iv1zhqnqzd7foqaytaz+antsmzvqykjjutachc/t2r0ujrsrtyzoiakv72zuj/xjc14zwfczmkbivblaptquxmzn+tavfijjhyqpni9lbcrlrrzmw6jruct/zykmnvqp5b9e4ukvxrpi4inmapnimhl1chw2haexgm4rle4c0rzovz7nwswgtopnmmf+b8/ganni2h</latexit> <latexit sha1_base64="su5hbpak+kk9dlsnnulnhnxer84=">aaab6nicbvbns8naej3ur1q/qh69lbbbu0mkomeif48v7qe0owy2k3bpzhn2n2ij/qlepcji1v/kzx/jts1bwx8mpn6bywzekaiujet+o4w19y3nrej2awd3b/+gfhju0ngqgdzzlglvcahgwsu2dtcco4lcgguc28h4zua3h1fphsshm0nqj+hq8pazaqx0/9sv9csvt+roqvajl5mk5gj0y1+9qczsckvhgmrd9dze+blvhjob01iv1zhqnqzd7foqaytaz+antsmzvqykjjutachc/t2r0ujrsrtyzoiakv72zuj/xjc14zwfczmkbivblaptquxmzn+tavfijjhyqpni9lbcrlrrzmw6jruct/zykmnvqp5b9e4ukvxrpi4inmapnimhl1chw2haexgm4rle4c0rzovz7nwswgtopnmmf+b8/ganni2h</latexit> <latexit sha1_base64="1zatsa6aurzb6tfxhpdtmv4pqxs=">aaab7xicbvbnswmxej2tx7v+vt16crahxspuefry9okxgv2adinznnvgzpmlyypl6x/w4kerr/4fb/4b0+0etpxbwoo9gwbmbtfn2rjut1nyw9/y3cpul3z29/ypyodhbs0trwilsc5vn8cacizoyzddatdwfecbp51gcjp3o49uasbfvzng1i/wslcqewys1i6rt4p6+abccwtubrrkvjxuiedzup7qdyvjiiom4vjrnufgxk+xmoxwoiv1e01jtcz4rhuwchxr7afzttn0zpuhcqwyjqzk1n8tky60nkab7yywgetlby7+5/use175krnxyqggi0vhwpgrap46gjjfieftszbrzn6kybgrtiwnqgrd8jzfxixtes1za97dravxncdrhbm4hsp4caknuiumtidaazzdk7w50nlx3p2prwvbyweo4q+czx+kxo6a</latexit> <latexit sha1_base64="1zatsa6aurzb6tfxhpdtmv4pqxs=">aaab7xicbvbnswmxej2tx7v+vt16crahxspuefry9okxgv2adinznnvgzpmlyypl6x/w4kerr/4fb/4b0+0etpxbwoo9gwbmbtfn2rjut1nyw9/y3cpul3z29/ypyodhbs0trwilsc5vn8cacizoyzddatdwfecbp51gcjp3o49uasbfvzng1i/wslcqewys1i6rt4p6+abccwtubrrkvjxuiedzup7qdyvjiiom4vjrnufgxk+xmoxwoiv1e01jtcz4rhuwchxr7afzttn0zpuhcqwyjqzk1n8tky60nkab7yywgetlby7+5/use175krnxyqggi0vhwpgrap46gjjfieftszbrzn6kybgrtiwnqgrd8jzfxixtes1za97dravxncdrhbm4hsp4caknuiumtidaazzdk7w50nlx3p2prwvbyweo4q+czx+kxo6a</latexit> <latexit sha1_base64="1zatsa6aurzb6tfxhpdtmv4pqxs=">aaab7xicbvbnswmxej2tx7v+vt16crahxspuefry9okxgv2adinznnvgzpmlyypl6x/w4kerr/4fb/4b0+0etpxbwoo9gwbmbtfn2rjut1nyw9/y3cpul3z29/ypyodhbs0trwilsc5vn8cacizoyzddatdwfecbp51gcjp3o49uasbfvzng1i/wslcqewys1i6rt4p6+abccwtubrrkvjxuiedzup7qdyvjiiom4vjrnufgxk+xmoxwoiv1e01jtcz4rhuwchxr7afzttn0zpuhcqwyjqzk1n8tky60nkab7yywgetlby7+5/use175krnxyqggi0vhwpgrap46gjjfieftszbrzn6kybgrtiwnqgrd8jzfxixtes1za97dravxncdrhbm4hsp4caknuiumtidaazzdk7w50nlx3p2prwvbyweo4q+czx+kxo6a</latexit> <latexit sha1_base64="1zatsa6aurzb6tfxhpdtmv4pqxs=">aaab7xicbvbnswmxej2tx7v+vt16crahxspuefry9okxgv2adinznnvgzpmlyypl6x/w4kerr/4fb/4b0+0etpxbwoo9gwbmbtfn2rjut1nyw9/y3cpul3z29/ypyodhbs0trwilsc5vn8cacizoyzddatdwfecbp51gcjp3o49uasbfvzng1i/wslcqewys1i6rt4p6+abccwtubrrkvjxuiedzup7qdyvjiiom4vjrnufgxk+xmoxwoiv1e01jtcz4rhuwchxr7afzttn0zpuhcqwyjqzk1n8tky60nkab7yywgetlby7+5/use175krnxyqggi0vhwpgrap46gjjfieftszbrzn6kybgrtiwnqgrd8jzfxixtes1za97dravxncdrhbm4hsp4caknuiumtidaazzdk7w50nlx3p2prwvbyweo4q+czx+kxo6a</latexit> With n samples For iid samples x1,, xn, we could choose (e.g.,) a Gaussian distribution for P(xi theta), with theta = {mu,sigma} Recall iid = independent and identically distributed P(x1,, xn theta) = P(x1 theta) P(xn theta) Example: D = {x1, x2} p(d =(µ, 2 )) = p({x 1,x 2 } =(µ, = p x 1 =(µ, 2 ) 2 )) p x 2 =(µ, 2 ) 12 p(x 1 ) x<latexit 
sha1_base64="9ebez7asyxnus7k0kmzsc78bsse=">aaab6nicbvbns8naej3ur1q/qh69lbbbu0leqmeif48v7qe0owy2k3bpzhn2n2ij/qlepcji1v/kzx/jts1bwx8mpn6bywzekaiujet+o4w19y3nrej2awd3b/+gfhju0ngqgdzzlglvcahgwsu2dtcco4lcgguc28h4zua3h1fphsshm0nqj+hq8pazaqx0/9t3+uwkw3xnikvey0kfcjt65a/eigzphniwqbxuem5i/iwqw5naaamxakwog9mhdi2vneltz/ntp+tmkgmsxsqwngsu/p7iakt1japsz0tnsc97m/e/r5ua8mrpuexsg5itfowpicyms7/jgctkrkwsouxxeythi6oomzadkg3bw355lbquqp5b9e4uk/xrpi4inmapnimhnajdltsgcqyg8ayv8oyi58v5dz4wrqunnzmgp3a+fwalsi2g</latexit> 1 x 2 p(x 2 )

What does it mean to say X₁ and X₂ are independent?
- Independent samples X₁, ..., Xₙ, e.g., p(X₁, X₂ | θ) = p(X₁ | θ) p(X₂ | θ)
- X₁ and X₂ are random variables, from sampling the distribution twice
- Because of randomness, X₁ could have been many things; it represents the random variable that is the first sample

September 13, 2018

Pre-requisites for the course
- Need to have probability, statistics, linear algebra and programming background (not just prototyping in R or Matlab)
- Graduate students come from diverse undergraduate programs; I expect you to fill in any gaps you have
- If you are missing important background, do consider if this course is right for you, because it might be a struggle
- Add/drop deadline is September 17
- Any questions?

Formulating the problem
- As motivated upfront: our main goal is to learn a function f for some inputs x to make predictions about targets y
- As we will discuss, this will translate into learning conditional distributions p(y | x)
- A general goal, then, is to learn the best parameters for a (conditional) distribution over some variables, e.g., p(x) or p(y | x)

How can we identify the parameters for a distribution?
- All we get is access to samples x₁, ..., xₙ from the true underlying distribution p(x)
- We want to find parameters θ so that p_θ approximates the true distribution p well
- e.g., θ is the mean and variance for a Gaussian
- e.g., θ is the parameters to a Gamma distribution (commute times)
- Any ideas?

<latexit sha1_base64="zxry4cg4iusonw3ypxspoj00050=">aaacgxicbvfrsxtben47w7wxamwf+7i0cvpgujncrslic6wptjqq5gly28wls/z2j905nwzvf/r39a1/pntveksjawvffn98zoxmwkhhmyr+buhki5era+uvwhuvn7e22ztvlqwudyce11kbq5rzkejbdwvkucomsdyvcjlov9t65q0yk7t6ibmcbjkbk5ejztbtw/bvjgc44uy6rxx9rboxndcgslp3o7oe0csi8qszmfr2guat+su/bh9oze6uoqjt99l1grq4z2+9aeeohc20ovbnqktholblxathuxn1oybouxavqics4nzy/pomnc9zumgls7yfrwuohdmouisqlzqwcsanbax9dxxlwq5cs8gk7nlm1ayxayw0yr86hmutnewpr6zhtmtatt6n9uvmjgdoqkjeuhzekcslru3rc9crmmbrzjxg3ag/k+utzhhhf7swx0k8/own4okog0fd+puhztnnxtrwytvynuytmhwkz+qbosc9wsm/ydc4dlrhsngqruhrvdqmfp635fgep/8bqjpc+w==</latexit> <latexit sha1_base64="zxry4cg4iusonw3ypxspoj00050=">aaacgxicbvfrsxtben47w7wxamwf+7i0cvpgujncrslic6wptjqq5gly28wls/z2j905nwzvf/r39a1/pntveksjawvffn98zoxmwkhhmyr+buhki5era+uvwhuvn7e22ztvlqwudyce11kbq5rzkejbdwvkucomsdyvcjlov9t65q0yk7t6ibmcbjkbk5ejztbtw/bvjgc44uy6rxx9rboxndcgslp3o7oe0csi8qszmfr2guat+su/bh9oze6uoqjt99l1grq4z2+9aeeohc20ovbnqktholblxathuxn1oybouxavqics4nzy/pomnc9zumgls7yfrwuohdmouisqlzqwcsanbax9dxxlwq5cs8gk7nlm1ayxayw0yr86hmutnewpr6zhtmtatt6n9uvmjgdoqkjeuhzekcslru3rc9crmmbrzjxg3ag/k+utzhhhf7swx0k8/own4okog0fd+puhztnnxtrwytvynuytmhwkz+qbosc9wsm/ydc4dlrhsngqruhrvdqmfp635fgep/8bqjpc+w==</latexit> <latexit sha1_base64="zxry4cg4iusonw3ypxspoj00050=">aaacgxicbvfrsxtben47w7wxamwf+7i0cvpgujncrslic6wptjqq5gly28wls/z2j905nwzvf/r39a1/pntveksjawvffn98zoxmwkhhmyr+buhki5era+uvwhuvn7e22ztvlqwudyce11kbq5rzkejbdwvkucomsdyvcjlov9t65q0yk7t6ibmcbjkbk5ejztbtw/bvjgc44uy6rxx9rboxndcgslp3o7oe0csi8qszmfr2guat+su/bh9oze6uoqjt99l1grq4z2+9aeeohc20ovbnqktholblxathuxn1oybouxavqics4nzy/pomnc9zumgls7yfrwuohdmouisqlzqwcsanbax9dxxlwq5cs8gk7nlm1ayxayw0yr86hmutnewpr6zhtmtatt6n9uvmjgdoqkjeuhzekcslru3rc9crmmbrzjxg3ag/k+utzhhhf7swx0k8/own4okog0fd+puhztnnxtrwytvynuytmhwkz+qbosc9wsm/ydc4dlrhsngqruhrvdqmfp635fgep/8bqjpc+w==</latexit> <latexit sha1_base64="zxry4cg4iusonw3ypxspoj00050=">aaacgxicbvfrsxtben47w7wxamwf+7i0cvpgujncrslic6wptjqq5gly28wls/z2j905nwzvf/r39a1/pntveksjawvffn98zoxmwkhhmyr+buhki5era+uvwhuvn7e22ztvlqwudyce11kbq5rzkejbdwvkucomsdyvcjlov9t65q0yk7t6ibmcbjkbk5ejztbtw/bvjgc44uy6rxx9rboxndcgslp3o7oe0csi8qszmfr2guat+su/bh9oze6uoqjt99l1grq4z2+9aeeohc20ovbnqktholblxathuxn1oybouxavqics4nzy/pomnc9zumgls7yfrwuohdmouisqlzqwcsanbax9dxxlwq5cs8gk7nlm1ayxayw0yr86hmutnewpr6zhtmtatt6n9uvmjgdoqkjeuhzekcslru3rc9crmmbrzjxg3ag/k+utzhhhf7swx0k8/own4okog0fd+puhztnnxtrwytvynuytmhwkz+qbosc9wsm/ydc4dlrhsngqruhrvdqmfp635fgep/8bqjpc+w==</latexit> <latexit sha1_base64="ebh8vwe41e+mncrq1yv7w0htm9m=">aaackhicbvdlsgmxfm34rpu16tjnsah1yzkpgm7eoiauk9ghdnohk2ba0cqzjbmxdpm5bvwvnykkdouxmd4w2nog5hdovdx7txazqrtjjkyl5zxvtfxcrn5za3tn197br6sokzjucmqi2qyqiowkutnum9kmjue8ykqrdg7gfuorseuj8achmwlz1bm0pbhpi/n2lcep8nmqelrajypdx4ilt1kgpzvwp6wxbtyrsbgwj2yqpk+zt0/gkryar1p27yjtciaai8sdkqkyoerb7143wgknqmoglgq5tqzbkzkaykayvjcoeim8qd3smlqgtlq7nryawwojdgeysfoehhp1d0ekufjdhpjk8bzq3hul/3mtricx7zskonfe4omgmgfqr3ccguxssbbmq0mqlttscnefsys1ytzvqndnt14k9xljduru/vmhcj2liwcowreoahecgwq4a1vqaxg8g1fwat6tf+vn+rjg09ila9zzap7a+v4blcalna==</latexit> <latexit 
sha1_base64="ebh8vwe41e+mncrq1yv7w0htm9m=">aaackhicbvdlsgmxfm34rpu16tjnsah1yzkpgm7eoiauk9ghdnohk2ba0cqzjbmxdpm5bvwvnykkdouxmd4w2nog5hdovdx7txazqrtjjkyl5zxvtfxcrn5za3tn197br6sokzjucmqi2qyqiowkutnum9kmjue8ykqrdg7gfuorseuj8achmwlz1bm0pbhpi/n2lcep8nmqelrajypdx4ilt1kgpzvwp6wxbtyrsbgwj2yqpk+zt0/gkryar1p27yjtciaai8sdkqkyoerb7143wgknqmoglgq5tqzbkzkaykayvjcoeim8qd3smlqgtlq7nryawwojdgeysfoehhp1d0ekufjdhpjk8bzq3hul/3mtricx7zskonfe4omgmgfqr3ccguxssbbmq0mqlttscnefsys1ytzvqndnt14k9xljduru/vmhcj2liwcowreoahecgwq4a1vqaxg8g1fwat6tf+vn+rjg09ila9zzap7a+v4blcalna==</latexit> <latexit sha1_base64="ebh8vwe41e+mncrq1yv7w0htm9m=">aaackhicbvdlsgmxfm34rpu16tjnsah1yzkpgm7eoiauk9ghdnohk2ba0cqzjbmxdpm5bvwvnykkdouxmd4w2nog5hdovdx7txazqrtjjkyl5zxvtfxcrn5za3tn197br6sokzjucmqi2qyqiowkutnum9kmjue8ykqrdg7gfuorseuj8achmwlz1bm0pbhpi/n2lcep8nmqelrajypdx4ilt1kgpzvwp6wxbtyrsbgwj2yqpk+zt0/gkryar1p27yjtciaai8sdkqkyoerb7143wgknqmoglgq5tqzbkzkaykayvjcoeim8qd3smlqgtlq7nryawwojdgeysfoehhp1d0ekufjdhpjk8bzq3hul/3mtricx7zskonfe4omgmgfqr3ccguxssbbmq0mqlttscnefsys1ytzvqndnt14k9xljduru/vmhcj2liwcowreoahecgwq4a1vqaxg8g1fwat6tf+vn+rjg09ila9zzap7a+v4blcalna==</latexit> <latexit sha1_base64="ebh8vwe41e+mncrq1yv7w0htm9m=">aaackhicbvdlsgmxfm34rpu16tjnsah1yzkpgm7eoiauk9ghdnohk2ba0cqzjbmxdpm5bvwvnykkdouxmd4w2nog5hdovdx7txazqrtjjkyl5zxvtfxcrn5za3tn197br6sokzjucmqi2qyqiowkutnum9kmjue8ykqrdg7gfuorseuj8achmwlz1bm0pbhpi/n2lcep8nmqelrajypdx4ilt1kgpzvwp6wxbtyrsbgwj2yqpk+zt0/gkryar1p27yjtciaai8sdkqkyoerb7143wgknqmoglgq5tqzbkzkaykayvjcoeim8qd3smlqgtlq7nryawwojdgeysfoehhp1d0ekufjdhpjk8bzq3hul/3mtricx7zskonfe4omgmgfqr3ccguxssbbmq0mqlttscnefsys1ytzvqndnt14k9xljduru/vmhcj2liwcowreoahecgwq4a1vqaxg8g1fwat6tf+vn+rjg09ila9zzap7a+v4blcalna==</latexit> What about in the prediction setting? That feels easier Here, the goal is to learn a function f for some inputs x to make predictions about targets y How do we identify the best f in our set of functions F? F = {f : R d! R f(x) =x > w for some w 2 R d } Any ideas? Maybe min f2f nx (f(x i ) y i ) 2 i=1 What does that entail? What if y_i is always 0 or 1? 17

<latexit sha1_base64="i0g3dmevbxg/atdafqorl9xpeys=">aaacf3icbvdjsgnbeo1xjxeb9eilmqjxemze0gnqdx4jmauyifr0kkmtnoxugkky5y+8+ctepcjivw/+jz3loikpch7vvvfvz4+l0og439bs8srq2npui7+5tb2za+/t13suka5vhslinxymqyoqqihqqinwwajfqt0fxi39+j0olalwdkcxtalwc0vxcizgatsld2gikvo9zavysj162adkgy2lu0yfji59zmr6nz207yjtciagi8sdkqkzodk2v7xoxjmaqussad10nrhbzh8klihle4mgmpeb60ht0jafofvp5k+mhhulq7urmhuinai/j1iwad0kfnm5vlhpe2pxp6+zypeilyowthbcpl3utstfii5doh2hgkmcgck4euzwyvtmmy4myrwjwz1/ezhutkuuu3jvzwrly1kcoxjijkiruosclmknqzaq4esrpjnx8my9ws/wu/uxbv2yzjmh5a+szx9nkacw</latexit> <latexit sha1_base64="i0g3dmevbxg/atdafqorl9xpeys=">aaacf3icbvdjsgnbeo1xjxeb9eilmqjxemze0gnqdx4jmauyifr0kkmtnoxugkky5y+8+ctepcjivw/+jz3loikpch7vvvfvz4+l0og439bs8srq2npui7+5tb2za+/t13suka5vhslinxymqyoqqihqqinwwajfqt0fxi39+j0olalwdkcxtalwc0vxcizgatsld2gikvo9zavysj162adkgy2lu0yfji59zmr6nz207yjtciagi8sdkqkzodk2v7xoxjmaqussad10nrhbzh8klihle4mgmpeb60ht0jafofvp5k+mhhulq7urmhuinai/j1iwad0kfnm5vlhpe2pxp6+zypeilyowthbcpl3utstfii5doh2hgkmcgck4euzwyvtmmy4myrwjwz1/ezhutkuuu3jvzwrly1kcoxjijkiruosclmknqzaq4esrpjnx8my9ws/wu/uxbv2yzjmh5a+szx9nkacw</latexit> <latexit sha1_base64="i0g3dmevbxg/atdafqorl9xpeys=">aaacf3icbvdjsgnbeo1xjxeb9eilmqjxemze0gnqdx4jmauyifr0kkmtnoxugkky5y+8+ctepcjivw/+jz3loikpch7vvvfvz4+l0og439bs8srq2npui7+5tb2za+/t13suka5vhslinxymqyoqqihqqinwwajfqt0fxi39+j0olalwdkcxtalwc0vxcizgatsld2gikvo9zavysj162adkgy2lu0yfji59zmr6nz207yjtciagi8sdkqkzodk2v7xoxjmaqussad10nrhbzh8klihle4mgmpeb60ht0jafofvp5k+mhhulq7urmhuinai/j1iwad0kfnm5vlhpe2pxp6+zypeilyowthbcpl3utstfii5doh2hgkmcgck4euzwyvtmmy4myrwjwz1/ezhutkuuu3jvzwrly1kcoxjijkiruosclmknqzaq4esrpjnx8my9ws/wu/uxbv2yzjmh5a+szx9nkacw</latexit> <latexit sha1_base64="i0g3dmevbxg/atdafqorl9xpeys=">aaacf3icbvdjsgnbeo1xjxeb9eilmqjxemze0gnqdx4jmauyifr0kkmtnoxugkky5y+8+ctepcjivw/+jz3loikpch7vvvfvz4+l0og439bs8srq2npui7+5tb2za+/t13suka5vhslinxymqyoqqihqqinwwajfqt0fxi39+j0olalwdkcxtalwc0vxcizgatsld2gikvo9zavysj162adkgy2lu0yfji59zmr6nz207yjtciagi8sdkqkzodk2v7xoxjmaqussad10nrhbzh8klihle4mgmpeb60ht0jafofvp5k+mhhulq7urmhuinai/j1iwad0kfnm5vlhpe2pxp6+zypeilyowthbcpl3utstfii5doh2hgkmcgck4euzwyvtmmy4myrwjwz1/ezhutkuuu3jvzwrly1kcoxjijkiruosclmknqzaq4esrpjnx8my9ws/wu/uxbv2yzjmh5a+szx9nkacw</latexit> Using the language of probability Hard to just guess an objective function that tells you what is the best model in your class A natural objective is: find the model that is the most likely given the data (sampled according to a true underlying model) arg max p( D) Reflects the inherent uncertainty in identifying the model, since many models are possible given only samples 18

MAP calculation

Start by applying Bayes' rule:

p(f | D) = p(D | f) p(f) / p(D)

- p(D | f) is the likelihood of the data, under the model; for iid data, p(D | f) = p(x₁ | f) ... p(xₙ | f)
- p(f) is the prior over models
- p(D) is the marginal distribution of the data; we will often be able to ignore this term

<latexit sha1_base64="yezbwiq5na/vuerwi02av9wta/y=">aaacknichvhlthsxfpvmodaumvsx68zqbijnninaoerluviwakvuioaur5hhuznyeb6y71reu39qf6e7/k09msxiqprklo7poffh6yhx0maq/px8fyurl9fwxzveb2y+abbevrs2waef9esmmn0bcqnkptbdiqpucw08irtcrhfnlx7ze7srwxqf0xwgcr+nmpaco6ogrd8mj4b8wdkeeyy/n3wtpy3tl7s+cz22loh3lv75lgwx5qlmdxyle8fvewhpl1qru9txnbklll3l2hnv/1upswgrhxscwddhijydnplhd9j6w0azkbjiushutd8mchy45iifattghygcizs+hr6dku/admrzsi3dcsyixpl2j0u6yx9mldwxzppezlknbpa1inxk6xcyhw1kmeyfqirqrnghkga0+h86khoeqqkdxgjpzqviwt3o0f1iwy0hxh7yy3c91wmdtvhjv336db6odfkrfci7jcsh5jrcki7peee1vqpv2dvxp/if/tp/vlb63jznpvki/9s/6z3i9w==</latexit> <latexit sha1_base64="yezbwiq5na/vuerwi02av9wta/y=">aaacknichvhlthsxfpvmodaumvsx68zqbijnninaoerluviwakvuioaur5hhuznyeb6y71reu39qf6e7/k09msxiqprklo7poffh6yhx0maq/px8fyurl9fwxzveb2y+abbevrs2waef9esmmn0bcqnkptbdiqpucw08irtcrhfnlx7ze7srwxqf0xwgcr+nmpaco6ogrd8mj4b8wdkeeyy/n3wtpy3tl7s+cz22loh3lv75lgwx5qlmdxyle8fvewhpl1qru9txnbklll3l2hnv/1upswgrhxscwddhijydnplhd9j6w0azkbjiushutd8mchy45iifattghygcizs+hr6dku/admrzsi3dcsyixpl2j0u6yx9mldwxzppezlknbpa1inxk6xcyhw1kmeyfqirqrnghkga0+h86khoeqqkdxgjpzqviwt3o0f1iwy0hxh7yy3c91wmdtvhjv336db6odfkrfci7jcsh5jrcki7peee1vqpv2dvxp/if/tp/vlb63jznpvki/9s/6z3i9w==</latexit> <latexit sha1_base64="yezbwiq5na/vuerwi02av9wta/y=">aaacknichvhlthsxfpvmodaumvsx68zqbijnninaoerluviwakvuioaur5hhuznyeb6y71reu39qf6e7/k09msxiqprklo7poffh6yhx0maq/px8fyurl9fwxzveb2y+abbevrs2waef9esmmn0bcqnkptbdiqpucw08irtcrhfnlx7ze7srwxqf0xwgcr+nmpaco6ogrd8mj4b8wdkeeyy/n3wtpy3tl7s+cz22loh3lv75lgwx5qlmdxyle8fvewhpl1qru9txnbklll3l2hnv/1upswgrhxscwddhijydnplhd9j6w0azkbjiushutd8mchy45iifattghygcizs+hr6dku/admrzsi3dcsyixpl2j0u6yx9mldwxzppezlknbpa1inxk6xcyhw1kmeyfqirqrnghkga0+h86khoeqqkdxgjpzqviwt3o0f1iwy0hxh7yy3c91wmdtvhjv336db6odfkrfci7jcsh5jrcki7peee1vqpv2dvxp/if/tp/vlb63jznpvki/9s/6z3i9w==</latexit> <latexit sha1_base64="yezbwiq5na/vuerwi02av9wta/y=">aaacknichvhlthsxfpvmodaumvsx68zqbijnninaoerluviwakvuioaur5hhuznyeb6y71reu39qf6e7/k09msxiqprklo7poffh6yhx0maq/px8fyurl9fwxzveb2y+abbevrs2waef9esmmn0bcqnkptbdiqpucw08irtcrhfnlx7ze7srwxqf0xwgcr+nmpaco6ogrd8mj4b8wdkeeyy/n3wtpy3tl7s+cz22loh3lv75lgwx5qlmdxyle8fvewhpl1qru9txnbklll3l2hnv/1upswgrhxscwddhijydnplhd9j6w0azkbjiushutd8mchy45iifattghygcizs+hr6dku/admrzsi3dcsyixpl2j0u6yx9mldwxzppezlknbpa1inxk6xcyhw1kmeyfqirqrnghkga0+h86khoeqqkdxgjpzqviwt3o0f1iwy0hxh7yy3c91wmdtvhjv336db6odfkrfci7jcsh5jrcki7peee1vqpv2dvxp/if/tp/vlb63jznpvki/9s/6z3i9w==</latexit> <latexit sha1_base64="t32l0pz9/3qsrtn4nlr65itgcly=">aaacaxichvhbssmwge7rac5tpyiin8ghujvriqa3wlavvfrwtljhslpuhavtsp4koxz8ru98aw98cdnt4mgcpwq+vgp58ywqgmtw3vflnpmdm18olzaxlldw15zk+p1oukvzkyyiufcb0uzwmdwbg2d3ujesbyk1gsffobcemdi8iw9hkfknig8xdzklykiu8ywpfegzipgj+xgbpiuiu8xr+oam+6eindogl75wjew1/bms5t8ttdz3ywe+vimebp+b7jpvt+6obk8dbwkqadlxxeff7yu0jvgmvbct254rozmrbzwklpf9vdnj6ia8slabmymy7msjpnk8b5gedhnltgx4xh5pzctsehgfxlmsrx9rbfmx1k4hpo1kpjypsjiolwptgu0fre24xxwjiiygekq42rxtpjh9gvmcsinb+/3kaxb3vpfcundzxg2ct+ooov20hw6rh05qa12ha9refl1zy9amtww92xv7294zw21rktlap8aufgc1x7hb</latexit> <latexit 
sha1_base64="t32l0pz9/3qsrtn4nlr65itgcly=">aaacaxichvhbssmwge7rac5tpyiin8ghujvriqa3wlavvfrwtljhslpuhavtsp4koxz8ru98aw98cdnt4mgcpwq+vgp58ywqgmtw3vflnpmdm18olzaxlldw15zk+p1oukvzkyyiufcb0uzwmdwbg2d3ujesbyk1gsffobcemdi8iw9hkfknig8xdzklykiu8ywpfegzipgj+xgbpiuiu8xr+oam+6eindogl75wjew1/bms5t8ttdz3ywe+vimebp+b7jpvt+6obk8dbwkqadlxxeff7yu0jvgmvbct254rozmrbzwklpf9vdnj6ia8slabmymy7msjpnk8b5gedhnltgx4xh5pzctsehgfxlmsrx9rbfmx1k4hpo1kpjypsjiolwptgu0fre24xxwjiiygekq42rxtpjh9gvmcsinb+/3kaxb3vpfcundzxg2ct+ooov20hw6rh05qa12ha9refl1zy9amtww92xv7294zw21rktlap8aufgc1x7hb</latexit> <latexit sha1_base64="t32l0pz9/3qsrtn4nlr65itgcly=">aaacaxichvhbssmwge7rac5tpyiin8ghujvriqa3wlavvfrwtljhslpuhavtsp4koxz8ru98aw98cdnt4mgcpwq+vgp58ywqgmtw3vflnpmdm18olzaxlldw15zk+p1oukvzkyyiufcb0uzwmdwbg2d3ujesbyk1gsffobcemdi8iw9hkfknig8xdzklykiu8ywpfegzipgj+xgbpiuiu8xr+oam+6eindogl75wjew1/bms5t8ttdz3ywe+vimebp+b7jpvt+6obk8dbwkqadlxxeff7yu0jvgmvbct254rozmrbzwklpf9vdnj6ia8slabmymy7msjpnk8b5gedhnltgx4xh5pzctsehgfxlmsrx9rbfmx1k4hpo1kpjypsjiolwptgu0fre24xxwjiiygekq42rxtpjh9gvmcsinb+/3kaxb3vpfcundzxg2ct+ooov20hw6rh05qa12ha9refl1zy9amtww92xv7294zw21rktlap8aufgc1x7hb</latexit> <latexit sha1_base64="t32l0pz9/3qsrtn4nlr65itgcly=">aaacaxichvhbssmwge7rac5tpyiin8ghujvriqa3wlavvfrwtljhslpuhavtsp4koxz8ru98aw98cdnt4mgcpwq+vgp58ywqgmtw3vflnpmdm18olzaxlldw15zk+p1oukvzkyyiufcb0uzwmdwbg2d3ujesbyk1gsffobcemdi8iw9hkfknig8xdzklykiu8ywpfegzipgj+xgbpiuiu8xr+oam+6eindogl75wjew1/bms5t8ttdz3ywe+vimebp+b7jpvt+6obk8dbwkqadlxxeff7yu0jvgmvbct254rozmrbzwklpf9vdnj6ia8slabmymy7msjpnk8b5gedhnltgx4xh5pzctsehgfxlmsrx9rbfmx1k4hpo1kpjypsjiolwptgu0fre24xxwjiiygekq42rxtpjh9gvmcsinb+/3kaxb3vpfcundzxg2ct+ooov20hw6rh05qa12ha9refl1zy9amtww92xv7294zw21rktlap8aufgc1x7hb</latexit> Reformulating MAP The constant is irrelevant, as it is the same for all D MAP = arg max = arg max p(d )p( ) p(d) p(d )p( ) Will often write: p( D) = p(d )p( ) p(d) / p(d )p( ) 20

What is the prior?
- The prior gives a chance to incorporate expert knowledge: the knowledge or information prior to seeing any data
- Example: maybe ahead of time you know what you think are reasonable values for the heights of people
- The prior could be a Gaussian distribution over θ = height, where p(θ) is a Gaussian with mean 170 (cm) and variance 60
- What does the variance mean here, for the prior? (See the sketch below)
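
As a sketch of what that prior looks like (the mean and variance are from the slide; the code itself is hypothetical): the density is largest at 170 cm, and the variance controls how quickly plausibility falls off for heights further away:

```python
import math

def prior_density(theta, mean=170.0, var=60.0):
    # Gaussian prior over the height parameter theta
    return math.exp(-0.5 * (theta - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

for height in (150.0, 170.0, 190.0):
    print(height, prior_density(height))  # 170 is most plausible a priori
```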

<latexit sha1_base64="svrzvnnsfuzrtbi7pif7u1/bhtu=">aaacl3icbvdlsgnbejynrxhfuy9ebomsl2fxbl0ioiiefbsmebihze46yzdzbzo9ylj3j7z4k7mikolvv3a2yugndq3vvd10d3mrfbpt+9xktu3pzm7l5wsli0vlk8xvtvsdxopdlycyvhce0ybfafuukoeuusb8t0ln651keu0elbzhcip9cbo+6wsilthdqzwlzy52avkzcreemlm8sfna2d6ko5kptur67cgts7aurmvty5czmzym9jgo+j1mswrx7ghqsecmqymm46pzhlitkmc+bmgl07ru2be2zeiuxejacgmneem91og6gqhzqtes4b8p3tjmi7zdztjaomr/titm17rve6yzo1b/1tlyp60ey/ugkyggihecplrujixfkgbm0zzqwfh2dwbccxmr5v2mgedjccgy4px9erlc7lycu+jc75wojsd25mkg2srl4pb9cktoyrwpek6eyic8kxfr2xqxpqzpuwvogs+sk19hfx0dvcopzg==</latexit> <latexit sha1_base64="svrzvnnsfuzrtbi7pif7u1/bhtu=">aaacl3icbvdlsgnbejynrxhfuy9ebomsl2fxbl0ioiiefbsmebihze46yzdzbzo9ylj3j7z4k7mikolvv3a2yugndq3vvd10d3mrfbpt+9xktu3pzm7l5wsli0vlk8xvtvsdxopdlycyvhce0ybfafuukoeuusb8t0ln651keu0elbzhcip9cbo+6wsilthdqzwlzy52avkzcreemlm8sfna2d6ko5kptur67cgts7aurmvty5czmzym9jgo+j1mswrx7ghqsecmqymm46pzhlitkmc+bmgl07ru2be2zeiuxejacgmneem91og6gqhzqtes4b8p3tjmi7zdztjaomr/titm17rve6yzo1b/1tlyp60ey/ugkyggihecplrujixfkgbm0zzqwfh2dwbccxmr5v2mgedjccgy4px9erlc7lycu+jc75wojsd25mkg2srl4pb9cktoyrwpek6eyic8kxfr2xqxpqzpuwvogs+sk19hfx0dvcopzg==</latexit> <latexit sha1_base64="svrzvnnsfuzrtbi7pif7u1/bhtu=">aaacl3icbvdlsgnbejynrxhfuy9ebomsl2fxbl0ioiiefbsmebihze46yzdzbzo9ylj3j7z4k7mikolvv3a2yugndq3vvd10d3mrfbpt+9xktu3pzm7l5wsli0vlk8xvtvsdxopdlycyvhce0ybfafuukoeuusb8t0ln651keu0elbzhcip9cbo+6wsilthdqzwlzy52avkzcreemlm8sfna2d6ko5kptur67cgts7aurmvty5czmzym9jgo+j1mswrx7ghqsecmqymm46pzhlitkmc+bmgl07ru2be2zeiuxejacgmneem91og6gqhzqtes4b8p3tjmi7zdztjaomr/titm17rve6yzo1b/1tlyp60ey/ugkyggihecplrujixfkgbm0zzqwfh2dwbccxmr5v2mgedjccgy4px9erlc7lycu+jc75wojsd25mkg2srl4pb9cktoyrwpek6eyic8kxfr2xqxpqzpuwvogs+sk19hfx0dvcopzg==</latexit> <latexit sha1_base64="svrzvnnsfuzrtbi7pif7u1/bhtu=">aaacl3icbvdlsgnbejynrxhfuy9ebomsl2fxbl0ioiiefbsmebihze46yzdzbzo9ylj3j7z4k7mikolvv3a2yugndq3vvd10d3mrfbpt+9xktu3pzm7l5wsli0vlk8xvtvsdxopdlycyvhce0ybfafuukoeuusb8t0ln651keu0elbzhcip9cbo+6wsilthdqzwlzy52avkzcreemlm8sfna2d6ko5kptur67cgts7aurmvty5czmzym9jgo+j1mswrx7ghqsecmqymm46pzhlitkmc+bmgl07ru2be2zeiuxejacgmneem91og6gqhzqtes4b8p3tjmi7zdztjaomr/titm17rve6yzo1b/1tlyp60ey/ugkyggihecplrujixfkgbm0zzqwfh2dwbccxmr5v2mgedjccgy4px9erlc7lycu+jc75wojsd25mkg2srl4pb9cktoyrwpek6eyic8kxfr2xqxpqzpuwvogs+sk19hfx0dvcopzg==</latexit> <latexit sha1_base64="t32l0pz9/3qsrtn4nlr65itgcly=">aaacaxichvhbssmwge7rac5tpyiin8ghujvriqa3wlavvfrwtljhslpuhavtsp4koxz8ru98aw98cdnt4mgcpwq+vgp58ywqgmtw3vflnpmdm18olzaxlldw15zk+p1oukvzkyyiufcb0uzwmdwbg2d3ujesbyk1gsffobcemdi8iw9hkfknig8xdzklykiu8ywpfegzipgj+xgbpiuiu8xr+oam+6eindogl75wjew1/bms5t8ttdz3ywe+vimebp+b7jpvt+6obk8dbwkqadlxxeff7yu0jvgmvbct254rozmrbzwklpf9vdnj6ia8slabmymy7msjpnk8b5gedhnltgx4xh5pzctsehgfxlmsrx9rbfmx1k4hpo1kpjypsjiolwptgu0fre24xxwjiiygekq42rxtpjh9gvmcsinb+/3kaxb3vpfcundzxg2ct+ooov20hw6rh05qa12ha9refl1zy9amtww92xv7294zw21rktlap8aufgc1x7hb</latexit> <latexit sha1_base64="t32l0pz9/3qsrtn4nlr65itgcly=">aaacaxichvhbssmwge7rac5tpyiin8ghujvriqa3wlavvfrwtljhslpuhavtsp4koxz8ru98aw98cdnt4mgcpwq+vgp58ywqgmtw3vflnpmdm18olzaxlldw15zk+p1oukvzkyyiufcb0uzwmdwbg2d3ujesbyk1gsffobcemdi8iw9hkfknig8xdzklykiu8ywpfegzipgj+xgbpiuiu8xr+oam+6eindogl75wjew1/bms5t8ttdz3ywe+vimebp+b7jpvt+6obk8dbwkqadlxxeff7yu0jvgmvbct254rozmrbzwklpf9vdnj6ia8slabmymy7msjpnk8b5gedhnltgx4xh5pzctsehgfxlmsrx9rbfmx1k4hpo1kpjypsjiolwptgu0fre24xxwjiiygekq42rxtpjh9gvmcsinb+/3kaxb3vpfcundzxg2ct+ooov20hw6rh05qa12ha9refl1zy9amtww92xv7294zw21rktlap8aufgc1x7hb</latexit> <latexit 
sha1_base64="t32l0pz9/3qsrtn4nlr65itgcly=">aaacaxichvhbssmwge7rac5tpyiin8ghujvriqa3wlavvfrwtljhslpuhavtsp4koxz8ru98aw98cdnt4mgcpwq+vgp58ywqgmtw3vflnpmdm18olzaxlldw15zk+p1oukvzkyyiufcb0uzwmdwbg2d3ujesbyk1gsffobcemdi8iw9hkfknig8xdzklykiu8ywpfegzipgj+xgbpiuiu8xr+oam+6eindogl75wjew1/bms5t8ttdz3ywe+vimebp+b7jpvt+6obk8dbwkqadlxxeff7yu0jvgmvbct254rozmrbzwklpf9vdnj6ia8slabmymy7msjpnk8b5gedhnltgx4xh5pzctsehgfxlmsrx9rbfmx1k4hpo1kpjypsjiolwptgu0fre24xxwjiiygekq42rxtpjh9gvmcsinb+/3kaxb3vpfcundzxg2ct+ooov20hw6rh05qa12ha9refl1zy9amtww92xv7294zw21rktlap8aufgc1x7hb</latexit> <latexit sha1_base64="t32l0pz9/3qsrtn4nlr65itgcly=">aaacaxichvhbssmwge7rac5tpyiin8ghujvriqa3wlavvfrwtljhslpuhavtsp4koxz8ru98aw98cdnt4mgcpwq+vgp58ywqgmtw3vflnpmdm18olzaxlldw15zk+p1oukvzkyyiufcb0uzwmdwbg2d3ujesbyk1gsffobcemdi8iw9hkfknig8xdzklykiu8ywpfegzipgj+xgbpiuiu8xr+oam+6eindogl75wjew1/bms5t8ttdz3ywe+vimebp+b7jpvt+6obk8dbwkqadlxxeff7yu0jvgmvbct254rozmrbzwklpf9vdnj6ia8slabmymy7msjpnk8b5gedhnltgx4xh5pzctsehgfxlmsrx9rbfmx1k4hpo1kpjypsjiolwptgu0fre24xxwjiiygekq42rxtpjh9gvmcsinb+/3kaxb3vpfcundzxg2ct+ooov20hw6rh05qa12ha9refl1zy9amtww92xv7294zw21rktlap8aufgc1x7hb</latexit> Maximum likelihood ML = arg max In some situations, may not have a reason to prefer one model over another (i.e., no prior knowledge or preferences) Can loosely think of maximum likelihood as instance of MAP, with uniform prior p(theta) = u for some constant u If domain is infinite (example, the set of reals), the uniform distribution is not defined! but the interpretation is still similar in practice, typically have a bounded space in mind for the model class p(d ) p(x) p( D) = p(d )p( ) p(d) / p(d )p( ) 22

ML example

e.g., F = R, θ is the mean of a Gaussian, with fixed σ = 1:

c(θ) = p(D | θ) = N(x₁ | µ = θ, σ² = 1) = (1/√(2π)) exp(−(1/2)(x₁ − θ)²)

[Figure: the curve c(θ) = p(D | θ), peaked at θ = x₁]

The variable is θ; we are optimizing over it.

Maximizing the log-likelihood

We want to maximize the likelihood, but often instead maximize the log-likelihood:

argmax_{θ ∈ F} p(D | θ) = argmax_{θ ∈ F} log p(D | θ)

Why? Or maybe first: is this equivalent? The Why is that it makes the optimization much simpler when we have more than one sample.

<latexit sha1_base64="befek43afpkcfmcy8mkkpnkg5zi=">aaaca3icbvc7sgnbfj2nrxhfq3badayhnmfxbk0kagmzwtwgg8ls5g4yzpbbzf0xlaebf8xgqhfbf8lov3e32uitt3u4517uucenpnbowd9gywl5zxwtuf7a2nza3jf395o6jbwhbg9lqnou0ybfaa0ukkedkwc+k6hljq4zv3upsoswumnxbf2fdqlhcc4wlxrmgypwgikxkjqhvolgejcd0etq0z5ztqrwfhsr2dkpkxz1nvnl9eme+xagl0zrjm1f2e2yqselteporcfifmqg0elpwhzq3wt6w4qep0qfzjm8mea6vx9vjmzxeuy76atpckjnvuz8z+ve6f10exfemulaz4e8wfimavyi7qsfhou4jywrkwalfmgu45jwvkplsodfxitn06ptve3bs3ltkq+jsa7jeakqm5ytgrkhddigndysz/jk3own48v4nz5mowuj39knf2b8/gdbzpzd</latexit> <latexit sha1_base64="befek43afpkcfmcy8mkkpnkg5zi=">aaaca3icbvc7sgnbfj2nrxhfq3badayhnmfxbk0kagmzwtwgg8ls5g4yzpbbzf0xlaebf8xgqhfbf8lov3e32uitt3u4517uucenpnbowd9gywl5zxwtuf7a2nza3jf395o6jbwhbg9lqnou0ybfaa0ukkedkwc+k6hljq4zv3upsoswumnxbf2fdqlhcc4wlxrmgypwgikxkjqhvolgejcd0etq0z5ztqrwfhsr2dkpkxz1nvnl9eme+xagl0zrjm1f2e2yqselteporcfifmqg0elpwhzq3wt6w4qep0qfzjm8mea6vx9vjmzxeuy76atpckjnvuz8z+ve6f10exfemulaz4e8wfimavyi7qsfhou4jywrkwalfmgu45jwvkplsodfxitn06ptve3bs3ltkq+jsa7jeakqm5ytgrkhddigndysz/jk3own48v4nz5mowuj39knf2b8/gdbzpzd</latexit> <latexit sha1_base64="befek43afpkcfmcy8mkkpnkg5zi=">aaaca3icbvc7sgnbfj2nrxhfq3badayhnmfxbk0kagmzwtwgg8ls5g4yzpbbzf0xlaebf8xgqhfbf8lov3e32uitt3u4517uucenpnbowd9gywl5zxwtuf7a2nza3jf395o6jbwhbg9lqnou0ybfaa0ukkedkwc+k6hljq4zv3upsoswumnxbf2fdqlhcc4wlxrmgypwgikxkjqhvolgejcd0etq0z5ztqrwfhsr2dkpkxz1nvnl9eme+xagl0zrjm1f2e2yqselteporcfifmqg0elpwhzq3wt6w4qep0qfzjm8mea6vx9vjmzxeuy76atpckjnvuz8z+ve6f10exfemulaz4e8wfimavyi7qsfhou4jywrkwalfmgu45jwvkplsodfxitn06ptve3bs3ltkq+jsa7jeakqm5ytgrkhddigndysz/jk3own48v4nz5mowuj39knf2b8/gdbzpzd</latexit> <latexit sha1_base64="befek43afpkcfmcy8mkkpnkg5zi=">aaaca3icbvc7sgnbfj2nrxhfq3badayhnmfxbk0kagmzwtwgg8ls5g4yzpbbzf0xlaebf8xgqhfbf8lov3e32uitt3u4517uucenpnbowd9gywl5zxwtuf7a2nza3jf395o6jbwhbg9lqnou0ybfaa0ukkedkwc+k6hljq4zv3upsoswumnxbf2fdqlhcc4wlxrmgypwgikxkjqhvolgejcd0etq0z5ztqrwfhsr2dkpkxz1nvnl9eme+xagl0zrjm1f2e2yqselteporcfifmqg0elpwhzq3wt6w4qep0qfzjm8mea6vx9vjmzxeuy76atpckjnvuz8z+ve6f10exfemulaz4e8wfimavyi7qsfhou4jywrkwalfmgu45jwvkplsodfxitn06ptve3bs3ltkq+jsa7jeakqm5ytgrkhddigndysz/jk3own48v4nz5mowuj39knf2b8/gdbzpzd</latexit> Why can we shift by log? c( 1 ) >c( 2 ) () log c( 1 ) > log c( 2 ) for c( ) > 0 2 1.5 Monotone increasing 1 0.5 0 0.5 1 1.5 2 2.5 3 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 Likelihood values always > 0 25

Maximizing the log-likelihood

e.g., F = R, θ is the mean of a Gaussian, with fixed σ = 1. Using log(ab) = log a + log b and log(a^c) = c log a:

c(θ) = log p(D | θ)
     = log [ (1/√(2π)) exp(−(1/2)(x₁ − θ)²) ]
     = −(1/2) log(2π) − (1/2)(x₁ − θ)²

[Figure: the curve c(θ) = log p(D | θ), peaked at θ = x₁]

This conversion is even more important when we have more than one sample

Example: Let D = {x₁, x₂} (two samples). If x₁ and x₂ are independent samples from the same distribution (same model), then p(x₁, x₂ | θ) = p(x₁ | θ) p(x₂ | θ):

p(x₁ | θ) p(x₂ | θ) = (1/√(2π)) exp(−(1/2)(x₁ − θ)²) · (1/√(2π)) exp(−(1/2)(x₂ − θ)²)

log (p(x₁ | θ) p(x₂ | θ)) = log p(x₁ | θ) + log p(x₂ | θ)
                          = −log(2π) − (1/2)(x₁ − θ)² − (1/2)(x₂ − θ)²
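
A numerical aside (a sketch, with hypothetical samples): with many iid samples the raw product of densities underflows floating point to 0.0, while the sum of log-densities stays finite, which is another practical reason to work with the log-likelihood:

```python
import math

def log_density(x, theta):
    # log of N(x | mu=theta, sigma^2=1)
    return -0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2

data = [0.1 * i for i in range(2000)]  # hypothetical samples

product = 1.0
for x in data:
    product *= math.exp(log_density(x, theta=0.0))
print(product)  # 0.0: the product underflows

print(sum(log_density(x, theta=0.0) for x in data))  # large negative, but finite
```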

Visualizing the objective

[Figure: side-by-side plots over θ of the likelihood p(x₁ | θ) p(x₂ | θ) and the log-likelihood log(p(x₁ | θ) p(x₂ | θ)); both curves peak at the same θ]
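
A sketch that reproduces plots of this shape (the sample values are hypothetical); note that both curves peak at the same θ, here (x₁ + x₂)/2:

```python
import numpy as np
import matplotlib.pyplot as plt

x1, x2 = 0.0, 2.0  # hypothetical samples
theta = np.linspace(-3.0, 5.0, 400)

def density(x, t):
    # N(x | mu=t, sigma^2=1) as a function of t
    return np.exp(-0.5 * (x - t) ** 2) / np.sqrt(2.0 * np.pi)

lik = density(x1, theta) * density(x2, theta)

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot(theta, lik)
ax1.set_title("p(x1|theta) p(x2|theta)")
ax2.plot(theta, np.log(lik))
ax2.set_title("log p(x1|theta) p(x2|theta)")
plt.show()
```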

Exercise: about the marginal over the data, p(D)
- Why do we avoid computing p(D)?
- How do we compute p(D)? We have the tools, since we know about marginals

How do we compute p(D)?
- If we have p(D, f), can we obtain p(D)? Marginalization
- If we have p(D | f) and p(f), do we have p(D, f)?

Data marginal

Using the formula of total probability:

p(D) = ∑_{f ∈ F} p(D | f) p(f)    (f discrete)
p(D) = ∫_F p(D | f) p(f) df       (f continuous)

Fully expressible in terms of the likelihood and the prior. (See the sketch below)
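
As a sketch of the discrete case (using the coin model that appears later in this lecture; the prior values here are hypothetical), p(D) is a weighted sum of likelihoods:

```python
# p(f) for each model f in a discrete model set F (hypothetical values)
prior = {0.1: 0.15, 0.5: 0.8, 0.8: 0.05}

def likelihood(data, theta):
    # p(D | f) for Bernoulli coin flips: x = 1 is heads, x = 0 is tails
    p = 1.0
    for x in data:
        p *= theta if x == 1 else (1.0 - theta)
    return p

D = [1, 0, 0, 1, 0]  # hypothetical flips
p_D = sum(likelihood(D, f) * p_f for f, p_f in prior.items())
print(p_D)  # the data marginal: sum over f of p(D | f) p(f)
```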

How do we solve the MAP and ML maximization problems?

Naive strategy:
1. Guess 100 solutions θ
2. Pick the one with the largest value

Can we do something better?
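
The naive strategy, literally implemented (a sketch; the bounded search range is a hypothetical choice):

```python
import random

def naive_search(objective, num_guesses=100):
    # Guess candidate thetas, keep the one with the largest objective value
    best_theta, best_value = None, float("-inf")
    for _ in range(num_guesses):
        theta = random.uniform(-10.0, 10.0)  # hypothetical bounded range
        value = objective(theta)
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta

# e.g., maximize the log-likelihood of one Gaussian sample x1 = 1.7 (constants dropped)
print(naive_search(lambda t: -0.5 * (1.7 - t) ** 2))  # lands near theta = 1.7
```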

Crash course in optimization
- Goal: find maximal (or minimal) points of functions
- Generally assume functions are smooth; use gradient descent
- Derivative (d/dx) c(x): direction of ascent from a scalar point
- Gradient ∇c(x): direction of ascent from a vector point

Function surface

[Figure: a function surface with local minima, a saddle point, and the global minimum labeled]

Single-variate calculus

For a function f defined on a scalar x, the derivative is

df/dx (x) = lim_{h→0} (f(x + h) − f(x)) / h

At any point x, df/dx (x) gives the slope of the tangent to the function at f(x). [GIF from Wikipedia: Tangent] (Note: the f in this slide is the objective c elsewhere.)
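
The limit definition translates directly into a finite-difference approximation; a small sketch:

```python
def numerical_derivative(f, x, h=1e-6):
    # Finite-difference approximation of df/dx at x (the limit above, with small h)
    return (f(x + h) - f(x)) / h

f = lambda x: x ** 2
print(numerical_derivative(f, 3.0))  # close to the exact derivative 2x = 6
```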

Why don't constants matter?

max_x c(x): we set (d/dx) c(x) = 0
max_x u c(x) for a constant u > 0: (d/dx) u c(x) = u (d/dx) c(x) = 0

Both have derivative zero under the same condition, regardless of u > 0.

Can either minimize or maximize

argmin_θ c(θ) = argmax_θ −c(θ)

A function c is convex if c(tw₁ + (1 − t)w₂) ≤ t c(w₁) + (1 − t) c(w₂) for t ∈ [0, 1]; this says that when we draw a line between any two points on the function, the function lies below the line.

Example: maximum likelihood for discrete distributions
- Imagine you are flipping a biased coin; the model parameter is the bias of the coin, θ
- You get a dataset D = {x₁, ..., xₙ} of coin flips, where xᵢ = 1 if it was heads, and xᵢ = 0 if it was tails
- What is p(D | θ)?

p(D | θ) = p(x₁, ..., xₙ | θ) = ∏_{i=1}^n p(xᵢ | θ),  where p(x | θ) = θˣ (1 − θ)^{1−x}

Example: maximum likelihood for discrete distributions

How do we estimate θ? Counting:
- count the number of heads, N_h
- count the number of tails, N_t
- normalize: θ = N_h / (N_h + N_t)

What if you actually try to maximize the likelihood, i.e., solve argmax_θ p(D | θ), where

p(D | θ) = p(x₁, ..., xₙ | θ) = ∏_{i=1}^n p(xᵢ | θ),  p(x | θ) = θˣ (1 − θ)^{1−x}

Example: maximum likelihood for discrete distributions

What if you actually try to maximize the likelihood to get θ? i.e., solve

argmax_θ p(D | θ) = argmax_θ ∏_{i=1}^n p(xᵢ | θ)

using the fact that argmax_θ c(θ) = argmax_θ log c(θ), where

p(D | θ) = p(x₁, ..., xₙ | θ) = ∏_{i=1}^n p(xᵢ | θ),  p(x | θ) = θˣ (1 − θ)^{1−x}

Example: maximum likelihood for discrete distributions

argmax_θ c(θ) = argmax_θ log c(θ)

log c(θ) = log ∏_{i=1}^n p(xᵢ | θ) = ∑_{i=1}^n log p(xᵢ | θ)

using log(ab) = log a + log b and log(a^c) = c log a. With p(x | θ) = θˣ (1 − θ)^{1−x}:

log p(x | θ) = log(θˣ) + log((1 − θ)^{1−x}) = x log(θ) + (1 − x) log(1 − θ)

Example: maximum likelihood for discrete distributions

∑_{i=1}^n log p(xᵢ | θ) = ∑_{i=1}^n xᵢ log(θ) + ∑_{i=1}^n (1 − xᵢ) log(1 − θ)
                        = log(θ) ∑_{i=1}^n xᵢ + log(1 − θ) ∑_{i=1}^n (1 − xᵢ)

Setting the derivative to zero, with x̄ = ∑_{i=1}^n xᵢ:

d/dθ = (1/θ) x̄ − (1/(1 − θ))(n − x̄) = 0

Example: maximum likelihood for discrete distributions

With x̄ = ∑_{i=1}^n xᵢ:

d/dθ = (1/θ) x̄ − (1/(1 − θ))(n − x̄) = 0
  ⟹ x̄/θ = (n − x̄)/(1 − θ)
  ⟹ (1 − θ) x̄ = θ (n − x̄)
  ⟹ x̄ − θ x̄ = θ n − θ x̄
  ⟹ θ = x̄ / n
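
A quick check (a sketch, with hypothetical flips) that the counting estimate N_h / (N_h + N_t) and the derived MLE θ = x̄/n agree, and that brute-force maximization of the log-likelihood finds the same value:

```python
import math

data = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]  # hypothetical flips (1 = heads)
n = len(data)
print(sum(data) / n)  # closed-form MLE: xbar / n = 0.4 = N_h / (N_h + N_t)

def log_likelihood(theta):
    return sum(x * math.log(theta) + (1 - x) * math.log(1 - theta) for x in data)

# Brute-force maximization over a grid of theta in (0, 1)
best = max((i / 1000.0 for i in range(1, 1000)), key=log_likelihood)
print(best)  # approximately 0.4
```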

We solved for a stationary point. But how do we know that this is the optimal solution to the optimization problem we specified?

Univariate optimization

A stationary point, where f′(a) = 0, can be one of:
- Minima
- Maxima
- Saddle points

where f′(a) = lim_{h→0} (f(a + h) − f(a)) / h

Second derivative test
- If f″(a) < 0, then f has a local maximum at a.
- If f″(a) > 0, then f has a local minimum at a.
- If f″(a) = 0, the test is inconclusive.
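Applied to the coin log-likelihood from the maximum likelihood example, the second derivative is −x̄/θ² − (n − x̄)/(1 − θ)², which is negative for all θ in (0, 1), so the stationary point θ = x̄/n is a maximum. A sketch with hypothetical counts:

```python
n, xbar = 10, 4  # hypothetical: 4 heads in 10 flips
theta = xbar / n

# Second derivative of xbar*log(theta) + (n - xbar)*log(1 - theta)
second_derivative = -xbar / theta ** 2 - (n - xbar) / (1 - theta) ** 2
print(second_derivative)  # negative, so theta = xbar / n is a local maximum
```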

Example: MAP for discrete distributions
- Imagine you are flipping a biased coin; the model parameter is the bias of the coin, θ
- You get a dataset D = {x₁, ..., xₙ} of coin flips, where xᵢ = 1 if it was heads, and xᵢ = 0 if it was tails
- What if we also specify p(θ)?
- What is the MAP estimate?

Example: MAP for discrete distributions

We still need to fully specify the prior p(θ). To avoid complexities resulting from continuous variables, we'll consider a discrete θ with only three possible states, θ ∈ {0.1, 0.5, 0.8}. Specifically, we assume

p(θ = 0.1) = 0.15,  p(θ = 0.5) = 0.8,  p(θ = 0.8) = 0.05

[Figure: bar chart of this prior over θ ∈ {0.1, 0.5, 0.8}]

Example: MAP for discrete distributions

For an experiment with N_H = 2, N_T = 8, the posterior distribution is

p(θ | D) ∝ p(D | θ) p(θ)

[Figure: bar charts of the prior p(θ) and the posterior p(θ | D) over θ ∈ {0.1, 0.5, 0.8}]

If we were asked to choose a single a posteriori most likely value for θ, it would be θ = 0.5, although our confidence in this is low since the posterior belief that θ = 0.1 is also appreciable. This result is intuitive since, even though we observed more Tails than Heads, our prior belief was that it was more likely the coin is fair.
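
A sketch that reproduces this posterior (the prior and counts are from the slide): normalizing p(D | θ) p(θ) over the three states shows θ = 0.5 is most probable, with θ = 0.1 still appreciable:

```python
prior = {0.1: 0.15, 0.5: 0.8, 0.8: 0.05}
N_H, N_T = 2, 8

unnormalized = {t: (t ** N_H) * ((1 - t) ** N_T) * p for t, p in prior.items()}
Z = sum(unnormalized.values())  # this normalizer is exactly p(D)
posterior = {t: u / Z for t, u in unnormalized.items()}
print(posterior)  # theta = 0.5 most probable; theta = 0.1 appreciable
# With N_H, N_T = 20, 80 the posterior instead concentrates on theta = 0.1
```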

Example: MAP for discrete distributions

Repeating the above with N_H = 20, N_T = 80, the posterior changes to

[Figure: bar charts of the prior p(θ) and the posterior p(θ | D), now concentrated on θ = 0.1]

so that the posterior belief in θ = 0.1 dominates. There are so many more tails than heads that this is unlikely to occur from a fair coin. Even though we a priori thought that the coin was fair, a posteriori we have enough evidence to change our minds.

What is the maximum likelihood solution for these two cases? Exercise: solve for the MAP estimate.

Now on to some careful examples of MAP!
- Whiteboard for Examples 8, 9, 10, 11
- More fun with derivatives and finding the minimum of a function
- Next: introduction to prediction problems for ML