Parameter estimation Conditional risk

1 Parameter estimation Conditional risk

2 Formalizing the problem. Specify random variables we care about, e.g., commute time, or heights of buildings in a city. We might then pick a particular distribution over these random variables, e.g., say we think our variable is Gaussian. Now we want a way to use the data to inform the model choice, e.g., let the data tell us the parameters for that Gaussian.

3 Parameter estimation. Assume that we are given some model class M, e.g., Gaussians with parameters µ and σ: N(µ, σ²); selecting a model from the class corresponds to selecting µ and σ. Now we want to select the best model; how do we define best? Generally we assume the data comes from that model class; we might want to find the model that best explains the data (or is most likely given the data). Might want the most likely model, with preference for important samples. Might want the most likely model that also matches expert prior info. Might want the most likely model that is the simplest (fewest parameters). These additional requirements are usually in place to enable better generalization to unseen data.

4 Some notation.
Σ_{i=1}^n x_i = x_1 + x_2 + ... + x_n
Π_{i=1}^n x_i = x_1 x_2 ... x_n
f : X → R, c : R^d → R
x* = argmax_x f(x), so f(x*) = max_x f(x); similarly w* = argmax_{w ∈ R^d} c(w), so c(w*) = max_{w ∈ R^d} c(w)
f ∈ F, where F is a set of models
e.g., F = {N(µ, σ²) : (µ, σ) ∈ R², σ > 0}
e.g., F = {f : f(x) = xᵀw, w ∈ R^d}

5 Definition of optimization. We select some (error) function c we care about. Maximizing c means we are finding its largest point; minimizing c means we are finding its smallest point.

6 Maximum a posteriori (MAP) estimation. f_MAP = argmax_{f ∈ F} p(f | D). We want the f that is most likely, given the data. p(f | D) is the posterior distribution of the model given the data. E.g., F could be the space of Gaussian distributions, the model is f, and f(x) returns the probability/density of a point x. E.g., we could assume x is Gaussian distributed with variance σ² = 1, so F could be the reals and the model f is the mean. Question: what is the function we are optimizing and what are the parameters we are learning?

7 MAP. f_MAP = argmax_{f ∈ F} p(f | D). p(f | D) is the posterior distribution of the model given the data. E.g., we could assume x is Gaussian distributed with variance σ² = 1, so F could be the reals and the model f is the mean. Question: what is the function we are optimizing and what are the parameters we are learning? Here c(f) = p(mean is f | D), and we solve max_{f ∈ R} c(f).

8 Maximum a posteriori (MAP). f_MAP = argmax_{f ∈ F} p(f | D). p(f | D) is the posterior distribution of the model given the data. In discrete spaces: p(f | D) is the PMF, and the MAP estimate is exactly the most probable model, e.g., the bias of a coin is 0.1, 0.5, or 0.7, with posterior values p(f = 0.1 | D), etc. In continuous spaces: p(f | D) is the PDF, and the MAP estimate is the model with the largest value of the posterior density function, e.g., the bias of a coin is in [0, 1]. But what is p(f | D)? Do we pick it? If so, how?

9 MAP calculation. Start by applying Bayes rule: p(f | D) = p(D | f) p(f) / p(D). p(D | f) is the likelihood of the data under the model. p(f) is the prior of the model. p(D) is the marginal distribution of the data; we will often be able to ignore this term.

10 Why is this conversion important? We do not always have a known form for p(f | D), but we usually have chosen (known) forms for p(D | f) and p(f). Let θ denote the parameters of the model (distribution); we use p(D | f) = p(D | θ) and p(f) = p(θ) interchangeably. Example: let D = {x1} (one sample). Then, if the model class is the set of Gaussians: p(D | f) = p(x1 | µ, σ). p(f | D) is not obvious, since we specified our model class for p(x | f). What is p(f) in this case? We may put some prior preferences on µ and σ, e.g., a normal distribution over µ, specifying that really large magnitude values of µ are unlikely. Specifying and using p(f) is related to regularization and Bayesian parameter estimation, which we will discuss more later.

11 Example of p(D | θ). Probabilities for a fixed θ: p(x1 | θ) = N(x1 | µ = θ, σ² = 1) = (1/√(2π)) exp(−½ (x1 − θ)²). (Figure: the density curve for this fixed θ, with the values p(x1 | θ) and p(x2 | θ) marked at the sample points x1 and x2.)

12 With n samples. For iid samples x1, ..., xn, we could choose (e.g.) a Gaussian distribution for p(xi | θ), with θ = {µ, σ}. Recall iid = independent and identically distributed, so p(x1, ..., xn | θ) = p(x1 | θ) · ... · p(xn | θ). Example: D = {x1, x2}: p(D | θ = (µ, σ²)) = p({x1, x2} | θ = (µ, σ²)) = p(x1 | θ = (µ, σ²)) · p(x2 | θ = (µ, σ²)). (Figure: the density curve with p(x1 | θ) and p(x2 | θ) marked at x1 and x2.)

13 What does it mean to say X1 and X2 are independent? Independent samples X1, ..., Xn satisfy, e.g., p(X1, X2 | θ) = p(X1 | θ) p(X2 | θ). X1 and X2 are random variables, from sampling the distribution twice. Because of randomness, X1 could have been many things; it represents the random variable that is the first sample.

14 September 13, 2018. Prerequisites for the course: you need to have a probability, statistics, linear algebra, and programming background (not just prototyping in R or Matlab). Graduate students come from diverse undergraduate programs; I expect you to fill in any gaps you have. If you are missing important background, do consider if this course is right for you, because it might be a struggle. Add/drop deadline is September 17. Any questions?

15 Formulating the problem. As motivated up front, our main goal is to learn a function f on some inputs x to make predictions about targets y. As we will discuss, this will translate into learning conditional distributions p(y | x). A general goal then is to learn the best parameters for a (conditional) distribution over some variables, e.g., p(x) or p(y | x).

16 How can we identify the parameters for a distribution? All we get is access to samples x1, ..., xn from the true underlying distribution p(x). We want to find parameters θ so that p_θ approximates the true distribution p well; e.g., θ is the mean and variance for a Gaussian, or θ is the parameters of a Gamma distribution (commute times). Any ideas?

17 What about in the prediction setting? That feels easier. Here, the goal is to learn a function f on some inputs x to make predictions about targets y. How do we identify the best f in our set of functions F = {f : R^d → R | f(x) = xᵀw for some w ∈ R^d}? Any ideas? Maybe min_{f ∈ F} Σ_{i=1}^n (f(xi) − yi)². What does that entail? What if yi is always 0 or 1?

18 Using the language of probability. It is hard to just guess an objective function that tells you what is the best model in your class. A natural objective is: find the model that is the most likely given the data (sampled according to a true underlying model): argmax_θ p(θ | D). This reflects the inherent uncertainty in identifying the model, since many models are possible given only samples.

19 MAP calculation. Start by applying Bayes rule: p(f | D) = p(D | f) p(f) / p(D). p(D | f) is the likelihood of the data under the model; for iid data, p(D | f) = p(x1 | f) · ... · p(xn | f). p(f) is the prior of the model. p(D) is the marginal distribution of the data; we will often be able to ignore this term.

20 Reformulating MAP. The constant p(D) is irrelevant, as it is the same for all θ: θ_MAP = argmax_θ p(D | θ) p(θ) / p(D) = argmax_θ p(D | θ) p(θ). We will often write: p(θ | D) = p(D | θ) p(θ) / p(D) ∝ p(D | θ) p(θ).

21 What is the prior? The prior gives a chance to incorporate expert knowledge: the knowledge or information prior to seeing any data. Example: maybe ahead of time you know what you think are reasonable values for the heights of people. The prior could be a Gaussian distribution over θ = height, where p(θ) is a Gaussian with mean 170 (cm) and variance 60. What does the variance mean here, for the prior?

22 Maximum likelihood. θ_ML = argmax_θ p(D | θ). In some situations, we may not have a reason to prefer one model over another (i.e., no prior knowledge or preferences). We can loosely think of maximum likelihood as an instance of MAP with a uniform prior p(θ) = u for some constant u. If the domain is infinite (for example, the set of reals), the uniform distribution is not defined! But the interpretation is still similar; in practice, we typically have a bounded space in mind for the model class. Recall p(θ | D) = p(D | θ) p(θ) / p(D) ∝ p(D | θ) p(θ).

23 ML example. E.g., F = R, θ is the mean of a Gaussian with fixed σ = 1: c(θ) = p(D | θ) = N(x1 | µ = θ, σ² = 1) = (1/√(2π)) exp(−½ (x1 − θ)²). (Figure: the curve c(θ) = p(D | θ) as a function of θ, peaking at x1.) The variable here is θ; we are optimizing over it.

24 Maximizing the log-likelihood. We want to maximize the likelihood, but often instead maximize the log-likelihood: argmax_{θ ∈ F} p(D | θ) = argmax_{θ ∈ F} log p(D | θ). Why? Or maybe first, is this equivalent? The why is that it makes the optimization much simpler when we have more than one sample.

25 Why can we shift by log? c(θ1) > c(θ2) ⟺ log c(θ1) > log c(θ2) for c(θ) > 0, because log is monotone increasing and likelihood values are always > 0.

26 Maximizing the log-likelihood. E.g., F = R, θ is the mean of a Gaussian with fixed σ = 1. Using log(ab) = log a + log b and log(a^c) = c log a: log c(θ) = log p(D | θ) = log[(1/√(2π)) exp(−½ (x1 − θ)²)] = −½ log(2π) − ½ (x1 − θ)². (Figure: the curves c(θ) and log c(θ) = log p(D | θ) as functions of θ.)

27 This conversion is even more important when we have more than one sample. Example: let D = {x1, x2} (two samples). If x1 and x2 are independent samples from the same distribution (same model), then p(x1, x2 | θ) = p(x1 | θ) p(x2 | θ):
p(x1 | θ) p(x2 | θ) = (1/(2π)) exp(−½ (x1 − θ)²) exp(−½ (x2 − θ)²)
log(p(x1 | θ) p(x2 | θ)) = log p(x1 | θ) + log p(x2 | θ) = −log(2π) − ½ (x1 − θ)² − ½ (x2 − θ)²
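A quick numeric sanity check of this identity (a sketch with hypothetical samples, assuming numpy; not code from the slides): the log of the product of densities equals the sum of the log densities, which is the quantity we actually optimize.

```python
import numpy as np

def log_gaussian(x, theta):
    # log N(x | mu = theta, sigma^2 = 1)
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * (x - theta) ** 2

D = np.array([1.2, 0.7])             # hypothetical samples x1, x2
theta = 1.0
log_of_product = np.log(np.prod(np.exp(log_gaussian(D, theta))))
sum_of_logs = np.sum(log_gaussian(D, theta))
print(log_of_product, sum_of_logs)   # equal up to floating-point error
```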

28 Visualizing the objective. (Figure: plots of p(x1 | θ) p(x2 | θ) and log(p(x1 | θ) p(x2 | θ)) as functions of θ.)

29 Exercise: about the marginal over the data p(D). Why do we avoid computing p(D)? How do we compute p(D)? We have the tools, since we know about marginals.

30 How do we compute p(D)? If we have p(D, f), can we obtain p(D)? Marginalization. If we have p(D | f) and p(f), do we have p(D, f)?

31 Data marginal. Using the formula of total probability: p(D) = Σ_{f ∈ F} p(D | f) p(f) if f is discrete, and p(D) = ∫_F p(D | f) p(f) df if f is continuous. So p(D), and hence the posterior distribution, can be fully described using the likelihood and the prior.

32 How do we solve the MAP and ML maximization problems? Naive strategy: 1. guess 100 solutions θ; 2. pick the one with the largest value. Can we do something better?

33 Crash course in optimization. Goal: find maximal (or minimal) points of functions. We generally assume the functions are smooth and use gradient descent (or ascent). Derivative d/dx c(x): direction of ascent from a scalar point. Gradient ∇c(x): direction of ascent from a vector point.
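As a minimal sketch of what gradient-based optimization looks like in this setting (assumptions of mine: a Gaussian log-likelihood with σ = 1, a hand-picked step size, and hypothetical data), gradient ascent on the log-likelihood of the mean converges to the sample mean. Gradient descent on the negative log-likelihood is the equivalent minimization view.

```python
import numpy as np

def grad_log_likelihood(theta, data):
    # Derivative of sum_i [-0.5*log(2*pi) - 0.5*(x_i - theta)^2] with respect to theta
    return np.sum(data - theta)

data = np.array([1.2, 0.7, 1.5, 0.9])      # hypothetical samples
theta, step_size = 0.0, 0.1
for _ in range(100):
    theta += step_size * grad_log_likelihood(theta, data)   # ascent: move along the gradient
print(theta, data.mean())                  # both are (approximately) the sample mean
```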

34 Function surface. (Figure: a function surface with local minima, a saddle point, and global minima labelled.)

35 Single-variate calculus. For a function f defined on a scalar x, the derivative is df/dx (x) = lim_{h→0} [f(x + h) − f(x)] / h. At any point x, df/dx (x) gives the slope of the tangent to the function at f(x). (GIF from Wikipedia: Tangent.) *Note: f is c in this slide.
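A small numerical illustration of this limit (my own sketch, not from the slides): for a small nonzero h, the difference quotient is close to the true derivative.

```python
def c(x):
    return (x - 2.0) ** 2                 # example function; exact derivative is 2*(x - 2)

def finite_difference(f, x, h=1e-6):
    # Approximates the limit definition by using a small but nonzero h
    return (f(x + h) - f(x)) / h

print(finite_difference(c, 3.0))          # approximately 2.0, the exact slope at x = 3
```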

36 Why don't constants matter? For max_x c(x), the condition is d/dx c(x) = 0. For max_x u·c(x) with a constant u > 0, the condition is d/dx [u·c(x)] = u · d/dx c(x) = 0. Both have derivative zero under the same condition, regardless of u > 0.

37 Can either minimize or maximize: argmin_θ c(θ) = argmax_θ −c(θ). A function c is convex if c(t w1 + (1 − t) w2) ≤ t c(w1) + (1 − t) c(w2) for t ∈ [0, 1]; this means that when we draw a line between any two points on the function, the function lies below that line.

38 Example: maximum likelihood for discrete distributions. Imagine you are flipping a biased coin; the model parameter is the bias of the coin, θ. You get a dataset D = {x1, ..., xn} of coin flips, where xi = 1 if it was heads and xi = 0 if it was tails. What is p(D | θ)? p(D | θ) = p(x1, ..., xn | θ) = Π_{i=1}^n p(xi | θ), with p(x | θ) = θ^x (1 − θ)^(1−x).

39 Example: maximum likelihood for discrete distributions. How do we estimate θ? Counting: count the number of heads Nh, count the number of tails Nt, and normalize: θ = Nh / (Nh + Nt). What if you actually try to maximize the likelihood, i.e., solve argmax_θ p(D | θ)? Recall p(D | θ) = p(x1, ..., xn | θ) = Π_{i=1}^n p(xi | θ), with p(x | θ) = θ^x (1 − θ)^(1−x).

40 Example: maximum likelihood for discrete distributions. What if you actually try to maximize the likelihood to get θ, i.e., solve argmax_θ p(D | θ) = argmax_θ Π_{i=1}^n p(xi | θ)? Since argmax_θ c(θ) = argmax_θ log c(θ), we can instead maximize the log of Π_{i=1}^n p(xi | θ). Recall p(D | θ) = Π_{i=1}^n p(xi | θ), with p(x | θ) = θ^x (1 − θ)^(1−x).

41 Example: maximum likelihood for discrete distributions. argmax_θ c(θ) = argmax_θ log c(θ), and log c(θ) = log Π_{i=1}^n p(xi | θ) = Σ_{i=1}^n log p(xi | θ), using log(ab) = log a + log b and log(a^c) = c log a. With p(x | θ) = θ^x (1 − θ)^(1−x), we get log p(x | θ) = log(θ^x) + log((1 − θ)^(1−x)) = x log θ + (1 − x) log(1 − θ).

42 Example: maximum likelihood for discrete distributions.
Σ_{i=1}^n log p(xi | θ) = Σ_{i=1}^n xi log θ + Σ_{i=1}^n (1 − xi) log(1 − θ) = log(θ) Σ_{i=1}^n xi + log(1 − θ) Σ_{i=1}^n (1 − xi).
Let x̄ = Σ_{i=1}^n xi. Setting the derivative to zero: d/dθ = (1/θ) x̄ − (1/(1 − θ)) (n − x̄) = 0.

43 Example: maximum likelihood for discrete distributions. With x̄ = Σ_{i=1}^n xi:
(1/θ) x̄ − (1/(1 − θ)) (n − x̄) = 0 ⟹ x̄ / θ = (n − x̄) / (1 − θ) ⟹ (1 − θ) x̄ = θ (n − x̄) ⟹ x̄ − θ x̄ = θ n − θ x̄ ⟹ θ = x̄ / n.
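The following sketch (hypothetical coin flips, numpy assumed; not code from the course) checks numerically that the counting estimate θ = x̄/n is the maximizer of this log-likelihood over a grid of candidate θ values:

```python
import numpy as np

def log_likelihood(theta, data):
    # sum_i [x_i log(theta) + (1 - x_i) log(1 - theta)]
    return np.sum(data * np.log(theta) + (1 - data) * np.log(1 - theta))

data = np.array([1, 0, 0, 1, 1, 0, 1, 1])          # hypothetical coin flips
thetas = np.linspace(0.01, 0.99, 99)
best = thetas[np.argmax([log_likelihood(t, data) for t in thetas])]
print(best, data.mean())                           # grid maximizer vs. the counting estimate 5/8
```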

44 We solved for a stationary point. But how do we know that this is the optimal solution to the optimization problem we specified?

45 Univariate optimization: minima, maxima, saddle points. Second derivative test: if f''(a) < 0 then f has a local maximum at a; if f''(a) > 0 then f has a local minimum at a; if f''(a) = 0, the test is inconclusive. Here f'(a) = lim_{h→0} [f(a + h) − f(a)] / h.

46 Second derivative test. If f''(a) < 0 then f has a local maximum at a. If f''(a) > 0 then f has a local minimum at a. If f''(a) = 0, the test is inconclusive. (Ignore the green line in the figure.)

47 Second derivative test. If f''(a) < 0 then f has a local maximum at a. If f''(a) > 0 then f has a local minimum at a. If f''(a) = 0, the test is inconclusive.

48 Example: MAP for discrete distributions. Imagine you are flipping a biased coin; the model parameter is the bias of the coin, θ. You get a dataset D = {x1, ..., xn} of coin flips, where xi = 1 if it was heads and xi = 0 if it was tails. What if we also specify p(θ)? What is the MAP estimate?

49 Example: MAP for discrete distributions. We still need to fully specify the prior p(θ). To avoid complexities resulting from continuous variables, we'll consider a discrete θ with only three possible states, θ ∈ {0.1, 0.5, 0.8}. Specifically, we assume p(θ = 0.1) = 0.15, p(θ = 0.5) = 0.8, p(θ = 0.8) = 0.05 (the remaining probability mass).

50 Example: MAP for discrete distributions. For an experiment with N_H = 2, N_T = 8, the posterior distribution is p(θ | D) ∝ p(D | θ) p(θ). If we were asked to choose a single a posteriori most likely value for θ, it would be θ = 0.5, although our confidence in this is low since the posterior belief that θ = 0.1 is also appreciable. This result is intuitive since, even though we observed more tails than heads, our prior belief was that it was more likely the coin is fair.

51 Example: MAP for discrete distributions. Repeating the above with N_H = 20, N_T = 80, the posterior changes so that the posterior belief in θ = 0.1 dominates. There are so many more tails than heads that this is unlikely to occur from a fair coin. Even though we a priori thought that the coin was fair, a posteriori we have enough evidence to change our minds. What is the maximum likelihood solution for these two cases? Exercise: solve for the MAP estimate.
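A sketch of this computation (the prior and counts are the values from these slides; the code itself is my own, not the course's): compute p(θ | D) ∝ p(D | θ) p(θ) for the three candidate biases and normalize by p(D), which here is just the sum over the three states.

```python
import numpy as np

thetas = np.array([0.1, 0.5, 0.8])
prior = np.array([0.15, 0.8, 0.05])        # p(theta) as specified on the earlier slide

def posterior(n_heads, n_tails):
    likelihood = thetas ** n_heads * (1 - thetas) ** n_tails   # p(D | theta)
    joint = likelihood * prior                                 # p(D | theta) p(theta)
    return joint / joint.sum()                                 # normalize by p(D)

print(posterior(2, 8))     # mode at theta = 0.5, with appreciable mass at theta = 0.1
print(posterior(20, 80))   # mode shifts to theta = 0.1
```

The second call shows the posterior mass moving to θ = 0.1 as the evidence accumulates, matching the discussion above.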

52 Now on to some careful examples of MAP! Whiteboard for Examples 8, 9, 10, 11. More fun with derivatives and finding the minimum of a function. Next: introduction to prediction problems for ML.

More information