Introduction to Logistic Regression


1 Introduction to Logistic Regression

2 Problem & Data Overview. Primary Research Question: What are the risk factors associated with CHD? Regression Questions: 1. What is Y? Whether the patient developed CHD. 2. What is X? Health info.

3 Exploratory Data Analysis 1. Side-by-side boxplots of Age by CHD status (no vs. yes).

4 Exploratory Data Analysis 2. Scatterplot of CHD (Yes = 1, No = 0) against Age.

5 Exploratory Data Analysis 3. Scatterplot of CHD against Age with a smooth curve.

6 Exploratory Data Analysis 4. Cross-tabulation of cigarette use (Cigs) by CHD status (no/yes), with marginal sums.

7 Can we use linear regression? Our response is a categorical variable, so can we just use an indicator variable and set

Y_i = 1 if CHD, 0 otherwise,

then use regular least-squares multiple regression? No, because: 1. predictions will fall outside {0, 1}; 2. the linearity assumption might be violated; 3. the errors certainly won't be normal; 4. equal variance is also likely to be violated. We need an entirely new regression framework!

8 Logistic regression. Going back to Day 1, we have the following generic framework for statistical modeling:

Y_i ~ iid p_Y(y_i),  E(y_i) = f(x_{i1}, ..., x_{ip}).

E.g., for simple and multiple linear regression we had

Y_i ~ iid N(β_0 + Σ_{p=1}^P x_{ip} β_p, σ²),

where the normal assumption was OK because Y was quantitative.

9 Logistic regression. What's an appropriate distribution when Y_i ∈ {0, 1}? The Bernoulli distribution:

f(y_i) = p^{y_i} (1 − p)^{1 − y_i}.

If our response follows a Bernoulli distribution then E(y_i) = p = Prob(Y = 1). So can we just set E(y_i) = p = β_0 + Σ_{p=1}^P x_{ip} β_p? No, because p has to be between 0 and 1. We need to choose a different math function than we have used before (one that keeps p between 0 and 1).

10 Logistic regression. Logistic Regression Model (a Generalized Linear Model):

Y_i ~ ind Bern(p_i),  log(p_i / (1 − p_i)) = β_0 + Σ_{j=1}^J β_j x_{ij},

so that

p_i = exp{β_0 + Σ_j β_j x_{ij}} / (1 + exp{β_0 + Σ_j β_j x_{ij}}) ∈ (0, 1).

Here p_i / (1 − p_i) is the odds ratio, its log is the logit transform, and the inverse mapping is the logistic function.
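A quick numeric check of that last claim (a sketch in Python; the course's own examples use R): the logistic function returns a value strictly between 0 and 1 no matter how extreme the linear predictor gets.

```python
import math

def logistic(eta):
    """Map a linear predictor eta = b0 + sum(b_j * x_ij) to a probability."""
    return math.exp(eta) / (1 + math.exp(eta))

# However extreme eta gets, the result stays inside (0, 1).
for eta in [-10, -1, 0, 1, 10]:
    assert 0 < logistic(eta) < 1

print(logistic(0))  # 0.5: a linear predictor of zero means even odds
```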

11 Logistic Regression Model: log(p_i / (1 − p_i)) = β_0 + Σ_{j=1}^J β_j x_{ij}. How do we interpret β_j? 1. For every unit increase in x_j, the log-odds ratio increases by β_j. 2. Just interpret the sign: if β_j > 0, then p_i increases as x_j increases. 3. As x_j increases by 1, a patient is exp{β_j} times more likely to have CHD. 4. As x_j increases by 1, a patient is 100 (exp{β_j} − 1)% more likely to have CHD.
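Interpretations 3 and 4 are plain arithmetic on the fitted slope. A sketch in Python, with a slope of 0.07 made up purely for illustration:

```python
import math

beta = 0.07  # hypothetical slope for a single predictor

odds_multiplier = math.exp(beta)           # interpretation 3
pct_increase = 100 * (math.exp(beta) - 1)  # interpretation 4

print(round(odds_multiplier, 3))  # 1.073
print(round(pct_increase, 1))     # 7.3
```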

12 Logistic Regression Model: Y_i ~ ind Bern(p_i), log(p_i / (1 − p_i)) = β_0 + Σ_j β_j x_{ij}. How do we estimate the β's? We use maximum likelihood (see Stat 340). In this class, we'll let R do it for us.
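To make "maximum likelihood" concrete, here is a minimal sketch in Python (the class will use R's glm(), which uses a faster Newton-type algorithm): plain gradient ascent on the Bernoulli log-likelihood for a one-predictor model, run on simulated data whose true slope is positive.

```python
import math
import random

def fit_logistic(xs, ys, lr=0.05, steps=10000):
    """Maximize the Bernoulli log-likelihood of log(p/(1-p)) = b0 + b1*x
    by gradient ascent on the averaged score equations."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            g0 += y - p        # d loglik / d b0
            g1 += (y - p) * x  # d loglik / d b1
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Simulated data with true intercept -2 and true slope 0.5.
random.seed(1)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [1 if random.random() < 1 / (1 + math.exp(-(-2 + 0.5 * x))) else 0 for x in xs]

b0_hat, b1_hat = fit_logistic(xs, ys)
print(b1_hat > 0)  # the estimate recovers the positive association
```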

13 Logistic Regression Model: Y_i ~ ind Bern(p_i), log(p_i / (1 − p_i)) = β_0 + Σ_j β_j x_{ij}. Example: the fitted age coefficient β̂_age. How do we interpret this number? 1. As age increases by 1, the log(odds) goes up by β̂_age. 2. As age increases by 1, the likelihood of having CHD goes up by 100 (e^{β̂_age} − 1) ≈ 6.91%.

14 Logistic Regression Model: Y_i ~ ind Bern(p_i), log(p_i / (1 − p_i)) = β_0 + Σ_j β_j x_{ij}. What assumptions are we making? Linear in log-odds (monotone in probability). Check with a scatterplot with a smoother.

15 What assumptions are we making? Linear in log-odds (monotone in probability). [Scatterplot of CHD vs. Age with a smoother.]

16 Logistic Regression Model: Y_i ~ ind Bern(p_i), log(p_i / (1 − p_i)) = β_0 + Σ_j β_j x_{ij}. What assumptions are we making? Linear in log-odds (monotone in probability): check using a scatterplot with a smoother. Independence. Normality and equal variance no longer apply: the Bernoulli model determines both the distribution and the variance.

17 Logistic Regression Model: Y_i ~ ind Bern(p_i), log(p_i / (1 − p_i)) = β_0 + Σ_j β_j x_{ij}. How can we perform variable selection? The same way as before: compare AIC or BIC.

18 Logistic Regression Model: Y_i ~ ind Bern(p_i), log(p_i / (1 − p_i)) = β_0 + Σ_j β_j x_{ij}. How do we build confidence intervals (or perform hypothesis tests) for our effects? Approximately,

(β̂_j − β_j) / SE(β̂_j) ~ N(0, 1),

so a confidence interval is β̂_j ± z* SE(β̂_j).

19 Logistic Regression Model: Y_i ~ ind Bern(p_i), log(p_i / (1 − p_i)) = β_0 + Σ_j β_j x_{ij}. How do we build confidence intervals (or perform hypothesis tests) for our effects? The 95% CI for age is (0.037, 0.097). How do we interpret this interval? 1. We are 95% confident that as age increases by 1, the log(odds) of CHD goes up by between 0.037 and 0.097.

20 Logistic Regression Model: Y_i ~ ind Bern(p_i), log(p_i / (1 − p_i)) = β_0 + Σ_j β_j x_{ij}. How do we build confidence intervals (or perform hypothesis tests) for our effects? The 95% CI for age is (0.037, 0.097). How do we interpret this interval? 2. We are 95% confident that as age increases by 1, the likelihood of CHD increases by between 100 (exp{(0.037, 0.097)} − 1) = (3.7%, 10.2%).
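The percent-scale interval comes from pushing both CI endpoints through the same transformation used for a single coefficient. A quick check in Python:

```python
import math

lo, hi = 0.037, 0.097  # 95% CI for the age slope

pct_lo = 100 * (math.exp(lo) - 1)
pct_hi = 100 * (math.exp(hi) - 1)

# Matches the (3.7%, 10.2%) reported on the slide, up to rounding.
print(round(pct_lo, 1), round(pct_hi, 1))
```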

21 Logistic Regression Model: Y_i ~ ind Bern(p_i), log(p_i / (1 − p_i)) = β_0 + Σ_j β_j x_{ij}. How do we predict? Predict probabilities:

p̂ = exp{β̂_0 + Σ_{p=1}^P x_{ip} β̂_p} / (1 + exp{β̂_0 + Σ_{p=1}^P x_{ip} β̂_p}).

22 Logistic Regression Model: Y_i ~ ind Bern(p_i), log(p_i / (1 − p_i)) = β_0 + Σ_j β_j x_{ij}. Many times we want to classify, so we set

ŷ = 1 if p̂ > c, 0 if p̂ ≤ c,

where c = cutoff probability.

23 Logistic Regression. Using a cutoff value, we can produce a confusion matrix:

              Predicted Yes   Predicted No
True Yes            99             158
True No             54             446

Sensitivity: percent of true positives (99/(99+158)). Specificity: percent of true negatives (446/(446+54)). Positive Predictive Value: % correctly predicted Yes's (99/(99+54)). Negative Predictive Value: % correctly predicted No's (446/(446+158)).
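All four summaries are ratios of confusion-matrix cells. A sketch in Python using the counts above (TP = 99, FN = 158, FP = 54, TN = 446):

```python
tp, fn = 99, 158   # True Yes row: predicted yes / predicted no
fp, tn = 54, 446   # True No row:  predicted yes / predicted no

sensitivity = tp / (tp + fn)  # 99/257
specificity = tn / (tn + fp)  # 446/500
ppv = tp / (tp + fp)          # 99/153
npv = tn / (tn + fn)          # 446/604

print(round(sensitivity, 3), round(specificity, 3))  # 0.385 0.892
```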

24 So, what do we use for the cutoff value? It depends! [Plot: error rate vs. threshold, comparing the overall error, false negative rate, and false positive rate.]

25 Logistic Regression Model: Y_i ~ ind Bern(p_i), log(p_i / (1 − p_i)) = β_0 + Σ_j β_j x_{ij}. So, how do we choose the cutoff value? 1. c = 0.5 (the Bayes classifier). 2. Choose c to minimize the misclassification rate:

(1/n) Σ_{i=1}^n I(y_i ≠ ŷ_i) = percent misclassified.

3. Choose c to achieve a desired sensitivity, specificity, positive predictive value, or negative predictive value based on a cost-benefit analysis.
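Option 2 can be done with a simple grid search over cutoffs. A sketch in Python with hypothetical fitted probabilities and labels:

```python
def misclassification_rate(probs, ys, c):
    """Fraction of observations whose class prediction (1 if p > c) is wrong."""
    return sum(int(y != (1 if p > c else 0)) for p, y in zip(probs, ys)) / len(ys)

# Hypothetical fitted probabilities and true labels.
probs = [0.1, 0.4, 0.35, 0.8, 0.9, 0.6]
ys    = [0,   0,   1,    1,   1,   0  ]

# Scan a grid of cutoffs; keep the first one attaining the lowest in-sample error.
best_c = min((c / 100 for c in range(1, 100)),
             key=lambda c: misclassification_rate(probs, ys, c))
print(best_c, misclassification_rate(probs, ys, best_c))
```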

26 [Plot: misclassification rate vs. cutoff.]

27 Logistic Regression Model: Y_i ~ ind Bern(p_i), log(p_i / (1 − p_i)) = β_0 + Σ_j β_j x_{ij}. How can we tell how well our model fits? In-sample confusion matrix: tabulate True Yes/No against Predicted Yes/No and report sensitivity, specificity, etc. for a single cutoff.

28 Thought Question: classifiers are built on a cutoff. So how well do we do across all cutoffs? ROC (Receiver Operating Characteristic) Curves: for many cutoff values, compare the sensitivity to the false positive rate (1 − specificity).

29 Thought Question: classifiers are built on a cutoff. So how well do we do across all cutoffs? [ROC curve: sensitivity vs. 1 − specificity, with the diagonal coin-flip line for reference.] Summarize an ROC curve by the area under the curve (AUC).
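The AUC can be computed without drawing the curve: it equals the fraction of (true-yes, true-no) pairs in which the true-yes case received the higher fitted probability (ties counted as 1/2). A sketch in Python with hypothetical probabilities:

```python
def roc_auc(probs, ys):
    """Area under the ROC curve via the pairwise-comparison identity."""
    pos = [p for p, y in zip(probs, ys) if y == 1]
    neg = [p for p, y in zip(probs, ys) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

probs = [0.1, 0.4, 0.35, 0.8, 0.9, 0.6]
ys    = [0,   0,   1,    1,   1,   0  ]
print(roc_auc(probs, ys))  # 7/9: better than the coin-flip value of 0.5
```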

30 Logistic Regression Model: Y_i ~ ind Bern(p_i), log(p_i / (1 − p_i)) = β_0 + Σ_j β_j x_{ij}. How can we tell how well our model fits? Report the AUC (area under the ROC curve), which says how well we classify across all thresholds.

31 Logistic Regression Model: Y_i ~ ind Bern(p_i), log(p_i / (1 − p_i)) = β_0 + Σ_j β_j x_{ij}. How can we tell how well our model fits? Pseudo-R²:

R²_pseudo = 1 − (What's Left Over After Modeling) / (Total Variation) = 1 − (Residual Deviance) / (Null Deviance).

Interpretation: percent of variation in log(p/(1 − p)) explained by the model. Warning: low R² values are the norm even if you classify well (the upper bound in practice isn't 1).
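Computing the pseudo-R² from the two deviances a fitted model reports (in R, both appear in glm()'s summary output; the numbers below are hypothetical):

```python
null_deviance = 645.0      # deviance of the intercept-only model
residual_deviance = 583.0  # deviance of the fitted model

r2_pseudo = 1 - residual_deviance / null_deviance
print(round(r2_pseudo, 3))  # 0.096: small values like this are typical
```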

32 Logistic Regression Model: Y_i ~ ind Bern(p_i), log(p_i / (1 − p_i)) = β_0 + Σ_j β_j x_{ij}. How can we tell how well our model predicts? Cross-validated confusion matrix: split into test and training sets, then report cross-validated sensitivity, specificity, positive predictive value, negative predictive value, or AUC.

33 End of CHD Analysis (see webpage for R code)


More information

Statistical Consulting Topics Classification and Regression Trees (CART)

Statistical Consulting Topics Classification and Regression Trees (CART) Statistical Consulting Topics Classification and Regression Trees (CART) Suppose the main goal in a data analysis is the prediction of a categorical variable outcome. Such as in the examples below. Given

More information

BMI 541/699 Lecture 22

BMI 541/699 Lecture 22 BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based

More information

Statistical Modelling with Stata: Binary Outcomes

Statistical Modelling with Stata: Binary Outcomes Statistical Modelling with Stata: Binary Outcomes Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 21/11/2017 Cross-tabulation Exposed Unexposed Total Cases a b a + b Controls

More information

Applied Machine Learning Annalisa Marsico

Applied Machine Learning Annalisa Marsico Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 22 April, SoSe 2015 Goals Feature Selection rather than Feature

More information

Regression modeling for categorical data. Part II : Model selection and prediction

Regression modeling for categorical data. Part II : Model selection and prediction Regression modeling for categorical data Part II : Model selection and prediction David Causeur Agrocampus Ouest IRMAR CNRS UMR 6625 http://math.agrocampus-ouest.fr/infogluedeliverlive/membres/david.causeur

More information

Lecture 6: Linear Regression

Lecture 6: Linear Regression Lecture 6: Linear Regression Reading: Sections 3.1-3 STATS 202: Data mining and analysis Jonathan Taylor, 10/5 Slide credits: Sergio Bacallado 1 / 30 Simple linear regression Model: y i = β 0 + β 1 x i

More information

Simple Linear Regression for the Advertising Data

Simple Linear Regression for the Advertising Data Revenue 0 10 20 30 40 50 5 10 15 20 25 Pages of Advertising Simple Linear Regression for the Advertising Data What do we do with the data? y i = Revenue of i th Issue x i = Pages of Advertisement in i

More information

Statistical Methods for Data Mining

Statistical Methods for Data Mining Statistical Methods for Data Mining Kuangnan Fang Xiamen University Email: xmufkn@xmu.edu.cn Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find

More information

Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses.

Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses. 1 Review: Let X 1, X,..., X n denote n independent random variables sampled from some distribution might not be normal!) with mean µ) and standard deviation σ). Then X µ σ n In other words, X is approximately

More information

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: June 9, 2018, 09.00 14.00 RESPONSIBLE TEACHER: Andreas Svensson NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical

More information

y ˆ i = ˆ " T u i ( i th fitted value or i th fit)

y ˆ i = ˆ  T u i ( i th fitted value or i th fit) 1 2 INFERENCE FOR MULTIPLE LINEAR REGRESSION Recall Terminology: p predictors x 1, x 2,, x p Some might be indicator variables for categorical variables) k-1 non-constant terms u 1, u 2,, u k-1 Each u

More information

Binary Response: Logistic Regression. STAT 526 Professor Olga Vitek

Binary Response: Logistic Regression. STAT 526 Professor Olga Vitek Binary Response: Logistic Regression STAT 526 Professor Olga Vitek March 29, 2011 4 Model Specification and Interpretation 4-1 Probability Distribution of a Binary Outcome Y In many situations, the response

More information

Classification: Logistic Regression and Naive Bayes Book Chapter 4. Carlos M. Carvalho The University of Texas McCombs School of Business

Classification: Logistic Regression and Naive Bayes Book Chapter 4. Carlos M. Carvalho The University of Texas McCombs School of Business Classification: Logistic Regression and Naive Bayes Book Chapter 4. Carlos M. Carvalho The University of Texas McCombs School of Business 1 1. Classification 2. Logistic Regression, One Predictor 3. Inference:

More information

The logistic regression model is thus a glm-model with canonical link function so that the log-odds equals the linear predictor, that is

The logistic regression model is thus a glm-model with canonical link function so that the log-odds equals the linear predictor, that is Example The logistic regression model is thus a glm-model with canonical link function so that the log-odds equals the linear predictor, that is log p 1 p = β 0 + β 1 f 1 (y 1 ) +... + β d f d (y d ).

More information

Lecture 6: Linear Regression (continued)

Lecture 6: Linear Regression (continued) Lecture 6: Linear Regression (continued) Reading: Sections 3.1-3.3 STATS 202: Data mining and analysis October 6, 2017 1 / 23 Multiple linear regression Y = β 0 + β 1 X 1 + + β p X p + ε Y ε N (0, σ) i.i.d.

More information

Machine Learning and Data Mining. Bayes Classifiers. Prof. Alexander Ihler

Machine Learning and Data Mining. Bayes Classifiers. Prof. Alexander Ihler + Machine Learning and Data Mining Bayes Classifiers Prof. Alexander Ihler A basic classifier Training data D={x (i),y (i) }, Classifier f(x ; D) Discrete feature vector x f(x ; D) is a con@ngency table

More information

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis Lecture 6: Logistic Regression Analysis Christopher S. Hollenbeak, PhD Jane R. Schubart, PhD The Outcomes Research Toolbox Review Homework 2 Overview Logistic regression model conceptually Logistic regression

More information

Linear regression. Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear.

Linear regression. Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear. Linear regression Linear regression is a simple approach to supervised learning. It assumes that the dependence of Y on X 1,X 2,...X p is linear. 1/48 Linear regression Linear regression is a simple approach

More information

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression 22s:52 Applied Linear Regression Ch. 4 (sec. and Ch. 5 (sec. & 4: Logistic Regression Logistic Regression When the response variable is a binary variable, such as 0 or live or die fail or succeed then

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00 / 5 Administration Homework on web page, due Feb NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00... administration / 5 STA 44/04 Jan 6,

More information

Regression modeling for categorical data. Part II : Model selection and prediction

Regression modeling for categorical data. Part II : Model selection and prediction Regression modeling for categorical data Part II : Model selection and prediction David Causeur Agrocampus Ouest IRMAR CNRS UMR 6625 http://math.agrocampus-ouest.fr/infogluedeliverlive/membres/david.causeur

More information

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024

More information

Using the same data as before, here is part of the output we get in Stata when we do a logistic regression of Grade on Gpa, Tuce and Psi.

Using the same data as before, here is part of the output we get in Stata when we do a logistic regression of Grade on Gpa, Tuce and Psi. Logistic Regression, Part III: Hypothesis Testing, Comparisons to OLS Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 14, 2018 This handout steals heavily

More information

Announcements. Proposals graded

Announcements. Proposals graded Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:

More information

Lecture 01: Introduction

Lecture 01: Introduction Lecture 01: Introduction Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 01: Introduction

More information

Boosting. Ryan Tibshirani Data Mining: / April Optional reading: ISL 8.2, ESL , 10.7, 10.13

Boosting. Ryan Tibshirani Data Mining: / April Optional reading: ISL 8.2, ESL , 10.7, 10.13 Boosting Ryan Tibshirani Data Mining: 36-462/36-662 April 25 2013 Optional reading: ISL 8.2, ESL 10.1 10.4, 10.7, 10.13 1 Reminder: classification trees Suppose that we are given training data (x i, y

More information

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model Stat 3302 (Spring 2017) Peter F. Craigmile Simple linear logistic regression (part 1) [Dobson and Barnett, 2008, Sections 7.1 7.3] Generalized linear models for binary data Beetles dose-response example

More information

Introduction to Signal Detection and Classification. Phani Chavali

Introduction to Signal Detection and Classification. Phani Chavali Introduction to Signal Detection and Classification Phani Chavali Outline Detection Problem Performance Measures Receiver Operating Characteristics (ROC) F-Test - Test Linear Discriminant Analysis (LDA)

More information

Introduction to Data Science

Introduction to Data Science Introduction to Data Science Winter Semester 2018/19 Oliver Ernst TU Chemnitz, Fakultät für Mathematik, Professur Numerische Mathematik Lecture Slides Contents I 1 What is Data Science? 2 Learning Theory

More information

22s:152 Applied Linear Regression. Take random samples from each of m populations.

22s:152 Applied Linear Regression. Take random samples from each of m populations. 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Lecture 2. Judging the Performance of Classifiers. Nitin R. Patel

Lecture 2. Judging the Performance of Classifiers. Nitin R. Patel Lecture 2 Judging the Performance of Classifiers Nitin R. Patel 1 In this note we will examine the question of how to udge the usefulness of a classifier and how to compare different classifiers. Not only

More information

Evaluation requires to define performance measures to be optimized

Evaluation requires to define performance measures to be optimized Evaluation Basic concepts Evaluation requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain (generalization error) approximation

More information

CPSC 340: Machine Learning and Data Mining

CPSC 340: Machine Learning and Data Mining CPSC 340: Machine Learning and Data Mining MLE and MAP Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. 1 Admin Assignment 4: Due tonight. Assignment 5: Will be released

More information

Machine Learning, Fall 2012 Homework 2

Machine Learning, Fall 2012 Homework 2 0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0

More information

Lecture Slides for INTRODUCTION TO. Machine Learning. ETHEM ALPAYDIN The MIT Press,

Lecture Slides for INTRODUCTION TO. Machine Learning. ETHEM ALPAYDIN The MIT Press, Lecture Slides for INTRODUCTION TO Machine Learning ETHEM ALPAYDIN The MIT Press, 2004 alpaydin@boun.edu.tr http://www.cmpe.boun.edu.tr/~ethem/i2ml CHAPTER 14: Assessing and Comparing Classification Algorithms

More information

36-463/663: Multilevel & Hierarchical Models

36-463/663: Multilevel & Hierarchical Models 36-463/663: Multilevel & Hierarchical Models (P)review: in-class midterm Brian Junker 132E Baker Hall brian@stat.cmu.edu 1 In-class midterm Closed book, closed notes, closed electronics (otherwise I have

More information

Article from. Predictive Analytics and Futurism. July 2016 Issue 13

Article from. Predictive Analytics and Futurism. July 2016 Issue 13 Article from Predictive Analytics and Futurism July 2016 Issue 13 Regression and Classification: A Deeper Look By Jeff Heaton Classification and regression are the two most common forms of models fitted

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

ECE521 Lecture7. Logistic Regression

ECE521 Lecture7. Logistic Regression ECE521 Lecture7 Logistic Regression Outline Review of decision theory Logistic regression A single neuron Multi-class classification 2 Outline Decision theory is conceptually easy and computationally hard

More information

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

Performance Evaluation

Performance Evaluation Performance Evaluation David S. Rosenberg Bloomberg ML EDU October 26, 2017 David S. Rosenberg (Bloomberg ML EDU) October 26, 2017 1 / 36 Baseline Models David S. Rosenberg (Bloomberg ML EDU) October 26,

More information

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The

More information

Classification. Classification is similar to regression in that the goal is to use covariates to predict on outcome.

Classification. Classification is similar to regression in that the goal is to use covariates to predict on outcome. Classification Classification is similar to regression in that the goal is to use covariates to predict on outcome. We still have a vector of covariates X. However, the response is binary (or a few classes),

More information

Classification: Linear Discriminant Analysis

Classification: Linear Discriminant Analysis Classification: Linear Discriminant Analysis Discriminant analysis uses sample information about individuals that are known to belong to one of several populations for the purposes of classification. Based

More information

INTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP

INTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP INTRODUCTION TO BAYESIAN INFERENCE PART 2 CHRIS BISHOP Personal Healthcare Revolution Electronic health records (CFH) Personal genomics (DeCode, Navigenics, 23andMe) X-prize: first $10k human genome technology

More information

Simple logistic regression

Simple logistic regression Simple logistic regression Biometry 755 Spring 2009 Simple logistic regression p. 1/47 Model assumptions 1. The observed data are independent realizations of a binary response variable Y that follows a

More information

Contents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II)

Contents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II) Contents Lecture Lecture Linear Discriminant Analysis Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University Email: fredriklindsten@ituuse Summary of lecture

More information

How do we compare the relative performance among competing models?

How do we compare the relative performance among competing models? How do we compare the relative performance among competing models? 1 Comparing Data Mining Methods Frequent problem: we want to know which of the two learning techniques is better How to reliably say Model

More information

Machine Learning 4771

Machine Learning 4771 Machine Learning 4771 Instructor: Tony Jebara Topic 7 Unsupervised Learning Statistical Perspective Probability Models Discrete & Continuous: Gaussian, Bernoulli, Multinomial Maimum Likelihood Logistic

More information

Ensemble Methods. Charles Sutton Data Mining and Exploration Spring Friday, 27 January 12

Ensemble Methods. Charles Sutton Data Mining and Exploration Spring Friday, 27 January 12 Ensemble Methods Charles Sutton Data Mining and Exploration Spring 2012 Bias and Variance Consider a regression problem Y = f(x)+ N(0, 2 ) With an estimate regression function ˆf, e.g., ˆf(x) =w > x Suppose

More information

Experimental Design and Statistical Methods. Workshop LOGISTIC REGRESSION. Jesús Piedrafita Arilla.

Experimental Design and Statistical Methods. Workshop LOGISTIC REGRESSION. Jesús Piedrafita Arilla. Experimental Design and Statistical Methods Workshop LOGISTIC REGRESSION Jesús Piedrafita Arilla jesus.piedrafita@uab.cat Departament de Ciència Animal i dels Aliments Items Logistic regression model Logit

More information

STA 450/4000 S: January

STA 450/4000 S: January STA 450/4000 S: January 6 005 Notes Friday tutorial on R programming reminder office hours on - F; -4 R The book Modern Applied Statistics with S by Venables and Ripley is very useful. Make sure you have

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Yan Lu Jan, 2018, week 3 1 / 67 Hypothesis tests Likelihood ratio tests Wald tests Score tests 2 / 67 Generalized Likelihood ratio tests Let Y = (Y 1,

More information

Lecture 5: Clustering, Linear Regression

Lecture 5: Clustering, Linear Regression Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 .0.0 5 5 1.0 7 5 X2 X2 7 1.5 1.0 0.5 3 1 2 Hierarchical clustering

More information

Methods and Criteria for Model Selection. CS57300 Data Mining Fall Instructor: Bruno Ribeiro

Methods and Criteria for Model Selection. CS57300 Data Mining Fall Instructor: Bruno Ribeiro Methods and Criteria for Model Selection CS57300 Data Mining Fall 2016 Instructor: Bruno Ribeiro Goal } Introduce classifier evaluation criteria } Introduce Bias x Variance duality } Model Assessment }

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot, we get creative in two

More information

2/26/2017. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

2/26/2017. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 When and why do we use logistic regression? Binary Multinomial Theory behind logistic regression Assessing the model Assessing predictors

More information

Data-analysis and Retrieval Ordinal Classification

Data-analysis and Retrieval Ordinal Classification Data-analysis and Retrieval Ordinal Classification Ad Feelders Universiteit Utrecht Data-analysis and Retrieval 1 / 30 Strongly disagree Ordinal Classification 1 2 3 4 5 0% (0) 10.5% (2) 21.1% (4) 42.1%

More information

COMS 4771 Introduction to Machine Learning. James McInerney Adapted from slides by Nakul Verma

COMS 4771 Introduction to Machine Learning. James McInerney Adapted from slides by Nakul Verma COMS 4771 Introduction to Machine Learning James McInerney Adapted from slides by Nakul Verma Announcements HW1: Please submit as a group Watch out for zero variance features (Q5) HW2 will be released

More information

Modelling Binary Outcomes 21/11/2017

Modelling Binary Outcomes 21/11/2017 Modelling Binary Outcomes 21/11/2017 Contents 1 Modelling Binary Outcomes 5 1.1 Cross-tabulation.................................... 5 1.1.1 Measures of Effect............................... 6 1.1.2 Limitations

More information

Analysis of Covariance. The following example illustrates a case where the covariate is affected by the treatments.

Analysis of Covariance. The following example illustrates a case where the covariate is affected by the treatments. Analysis of Covariance In some experiments, the experimental units (subjects) are nonhomogeneous or there is variation in the experimental conditions that are not due to the treatments. For example, a

More information

Hypothesis tests

Hypothesis tests 6.1 6.4 Hypothesis tests Prof. Tesler Math 186 February 26, 2014 Prof. Tesler 6.1 6.4 Hypothesis tests Math 186 / February 26, 2014 1 / 41 6.1 6.2 Intro to hypothesis tests and decision rules Hypothesis

More information

CS 195-5: Machine Learning Problem Set 2

CS 195-5: Machine Learning Problem Set 2 Decision Theory Problem CS 95-5: Machine Learning Problem Set 2 Douglas Lanman dlanman@brown.edu October 26 Part : In this problem we will examine randomized classifiers. Define, for any point x, a probabilty

More information

Bayesian Learning (II)

Bayesian Learning (II) Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP

More information

1. Logistic Regression, One Predictor 2. Inference: Estimating the Parameters 3. Multiple Logistic Regression 4. AIC and BIC in Logistic Regression

1. Logistic Regression, One Predictor 2. Inference: Estimating the Parameters 3. Multiple Logistic Regression 4. AIC and BIC in Logistic Regression Logistic Regression 1. Logistic Regression, One Predictor 2. Inference: Estimating the Parameters 3. Multiple Logistic Regression 4. AIC and BIC in Logistic Regression 5. Target Marketing: Tabloid Data

More information

Confidence Intervals and Hypothesis Tests

Confidence Intervals and Hypothesis Tests Confidence Intervals and Hypothesis Tests STA 281 Fall 2011 1 Background The central limit theorem provides a very powerful tool for determining the distribution of sample means for large sample sizes.

More information

Stephen Scott.

Stephen Scott. 1 / 35 (Adapted from Ethem Alpaydin and Tom Mitchell) sscott@cse.unl.edu In Homework 1, you are (supposedly) 1 Choosing a data set 2 Extracting a test set of size > 30 3 Building a tree on the training

More information

Evaluation. Andrea Passerini Machine Learning. Evaluation

Evaluation. Andrea Passerini Machine Learning. Evaluation Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.

More information

PATTERN RECOGNITION AND MACHINE LEARNING

PATTERN RECOGNITION AND MACHINE LEARNING PATTERN RECOGNITION AND MACHINE LEARNING Slide Set 3: Detection Theory January 2018 Heikki Huttunen heikki.huttunen@tut.fi Department of Signal Processing Tampere University of Technology Detection theory

More information

Naïve Bayes classification

Naïve Bayes classification Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information