The conjugate prior to a Bernoulli is. A) Bernoulli B) Gaussian C) Beta D) none of the above
|
|
- Laureen Foster
- 5 years ago
- Views:
Transcription
1 The conjugate pror to a Bernoull s A) Bernoull B) Gaussan C) Beta D) none of the above
2 The conjugate pror to a Gaussan s A) Bernoull B) Gaussan C) Beta D) none of the above
3 MAP estmates A) argmax θ p(θ D) B) argmax θ p(d θ) C) argmax θ p(d θ)p(θ) D) None of the above
4 MLE estmates A) argmax θ p(θ D) B) argmax θ p(d θ) C) argmax θ p(d θ)p(θ) D) None of the above
5 Consstent estmator A consstent estmator (or asymptotcally consstent estmator) s an estmator a rule for computng estmates of a parameter θ havng the property that as the number of data ponts used ncreases ndefntely, the resultng sequence of estmates converges n probablty to the true parameter θ.
6 Whch s consstent for our conflppng example? A) MLE B) MAP C) Both D) Nether P(D θ) P(θ D) ~ P(D θ)p(θ) Slde 6
7 Covarance Gven random varables X and Y wth jont densty p(x, y) and means E(X) = µ 1, E(Y) = µ The covarance of X and Y s cov(x,y) = E[(X µ 1 )(Y µ )] cov(x, Y) = E(XY) E(X) E(Y) Proof follows easly from the defnton cov(x, X) = var(x)
8 Covarance If X and Y are ndependent then cov(x, Y) = 0. A) True B) False If cov(x, Y) = 0 then X and Y are ndependent. A) True B) False
9 Covarance If X and Y are ndependent then cov(x, Y) = 0 Proof: Independence of X and Y mples that E(XY) = E(X)E(Y). Remark: The converse f NOT true n general. It can happen that the covarance s 0 but X and Y are hghly dependent. (Try to thnk of an example.) For the bvarate normal case the converse does hold.
10 Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your own needs. PowerPont orgnals are avalable. If you make use of a sgnfcant porton of these sldes n your own lecture, please nclude ths message, or the followng lnk to the source repostory of Andrew s ntroducton tutorals: to regresson Comments and correctons gratefully receved. Mostly by Andrew W. Moore But wth modfcatons by Lyle Ungar Copyrght 001, 003, Andrew W. Moore Slde 10
11 Two nterpretatons of regresson Lnear regresson ŷ = w. x Probablstc/Bayesan (MLE and MAP) y ~ N(w. x, σ ) argmax w p(d w) argmax w p(d w)p(w) Error mnmzaton y - w. X p p + λ w q q here: argmax w p(y w,x) But frst, we ll look at Gaussans
12 Sngle-Parameter Lnear Regresson Copyrght 001, 003, Andrew W. Moore
13 Lnear Regresson nputs outputs 1 w x 1 = 1 y 1 = 1 x = 3 y =. x 3 = y 3 = x 4 = 1.5 y 4 = 1.9 x 5 = 4 y 5 = 3.1 Lnear regresson assumes that the expected value of the output gven an nput, E[y x], s lnear. Smplest case: Out(x) = wx for some unknown w. Copyrght 001, 003, Andrew W. Moore Gven the data, we can estmate w.
14 Copyrght 001, 003, Andrew W. Moore 1-parameter lnear regresson Assume that the data s formed by y = wx + nose where the nose sgnals are ndependent the nose has a normal dstrbuton wth mean 0 and unknown varance σ p(y w,x) has a normal dstrbuton wth mean wx varance σ
15 Bayesan Lnear Regresson Copyrght 001, 003, Andrew W. Moore p(y w,x) = Normal (mean: wx, varance: σ ) y ~ N(wx, σ ) We have a set of data (x 1,y 1 ) (x,y ) (x n,y n ) We want to nfer w from the data. p(w x 1, x, x 3, x n, y 1, y y n ) = P(w D) You can use BAYES rule to work out a posteror dstrbuton for w gven the data. Or you could do Maxmum Lkelhood Estmaton
16 Maxmum lkelhood estmaton of w MLE asks : For whch value of w s ths data most lkely to have happened? <=> For what w s p(y 1, y y n w, x 1, x, x 3, x n ) maxmzed? <=> n For what w s ( y w, x ) maxmzed? Copyrght 001, 003, Andrew W. Moore p = 1
17 For what w s n For what w s Copyrght 001, 003, Andrew W. Moore p( y = 1 For what w s For what w s w, x 1 ( ) n y exp( wx = 1 σ n 1 = 1 σ n = 1 y ) maxmzed? wx ) maxmzed? maxmzed? ( ) mnmzed? y wx
18 Frst result MLE wth Gaussan nose s the same as mnmzng the L error argmn n =1 ( y wx ) Copyrght 001, 003, Andrew W. Moore
19 Lnear Regresson The maxmum lkelhood w s the one that mnmzes sum-of-squares of resduals We want to mnmze a quadratc functon of w. Copyrght 001, 003, Andrew W. Moore Ε = = E(w) y ( y wx ) w ( x y ) w + ( x ) w
20 Easy to show the sum of squares s mnmzed when x y w = x Copyrght 001, 003, Andrew W. Moore Lnear Regresson The maxmum lkelhood model s Out x We can use t for predcton ( ) = wx
21 Lnear Regresson Easy to show the sum of squares s mnmzed when x y w = x The maxmum lkelhood model s Out ( x ) = wx We can use t for predcton p(w) w Note: In Bayesan stats you d have ended up wth a prob dstrbuton of w And predctons would have gven a prob dsrbuton of expected output Often useful to know your confdence. Max lkelhood can gve some knds of confdence too. Copyrght 001, 003, Andrew W. Moore
22 But what about MAP? MLE n argmax p(y w, x ) =1 MAP argmax n =1 p(y w, x )p(w) Copyrght 001, 003, Andrew W. Moore
23 MAP argmax We assumed y ~ N(w x, σ ) But what about MAP? Now add a pror that assumpton that w ~ N(0, γ ) n =1 p(y w, x )p(w)
24 For what w s n =1 p(y For what w s n =1 exp( 1 For what w s n =1 1 For what w s n =1 # % $ w, x ) p(w) maxmzed? ( y wx σ y wx σ ( y wx ) ) ) exp( 1 ( w γ ) )maxmzed? & ( ' 1 ( w γ ) maxmzed? + ( σ w γ ) mnmzed? Copyrght 001, 003, Andrew W. Moore
25 Second result MAP wth a Gaussan pror on w s the same as mnmzng the L error plus an L penalty on w argmn Ths s called Rdge regresson Shrnkage Regularzaton Copyrght 001, 003, Andrew W. Moore n =1 ( y wx ) + λw
26 The speed of lectures s A) too slow B) good C) too fast Copyrght Andrew W. Moore
27 Multvarate Lnear Regresson Copyrght 001, 003, Andrew W. Moore
28 Multvarate Regresson What f the nputs are vectors? x -d nput example Copyrght 001, 003, Andrew W. Moore x 1 Dataset has form x 1 y 1 x y x 3 y 3.: :. x n y n
29 Multvarate Regresson Wrte matrx X and Y thus: x =...x x...!...x n... = x 11 x 1... x 1p x 1 x... x p! x n1 x n... x np y = y 1 y! y n (There are R data ponts. Each nput has m components) The lnear regresson model assumes a vector w such that Out(x) = x. w = w 1 x[1] + w x[] +.w p x[p] Copyrght 001, 003, Andrew W. Moore The max. lkelhood w s w = (X T X) -1 (X T y)
30 Multvarate Regresson Wrte matrx X and Y thus: x =... x... x!... x 1 R = x x x 11 1 R1 x x x 1 R......!... x x x 1m m Rm y = y y! y 1 R (There are R dataponts. Each nput has m components) The lnear regresson model assumes a vector w such that Out(x) = w T x = w 1 x[1] + w x[] +.w m x[d] Copyrght 001, 003, Andrew W. Moore IMPORTANT EXERCISE: PROVE IT!!!!! The max. lkelhood w s w = (X T X) -1 (X T Y)
31 Multvarate Regresson (con t) The max. lkelhood w s w = (X T X) -1 (X T y) X T X s an m x m matrx:,j th element s X T Y s an m-element vector: th element R k = 1 R k=1 x k x kj x k y k Copyrght 001, 003, Andrew W. Moore
32 Constant Term n Lnear Regresson Copyrght 001, 003, Andrew W. Moore
33 What about a constant term? We may expect lnear data that does not go through the orgn. Statstcans and Neural Net Folks all agree on a smple obvous hack. Can you guess?? Copyrght 001, 003, Andrew W. Moore
34 The constant term The trck s to create a fake nput X 0 that always takes the value 1 X 1 X Y Before: Y=w 1 X 1 + w X has to be a poor model Copyrght 001, 003, Andrew W. Moore In ths example, You should be able to see the MLE w 0, w 1 and w by nspecton X 0 X 1 X Y After: Y= w 0 X 0 +w 1 X 1 + w X = w 0 +w 1 X 1 + w X has a fne constant term
35 Heteroscedastcty... Lnear Regresson wth varyng nose Copyrght 001, 003, Andrew W. Moore
36 Regresson wth varyng nose Suppose you know the varance of the nose that was added to each datapont. x y σ ½ ½ /4 3 4 y=3 y= y=1 y=0 σ= σ=1 3 1/4 ~ (, Assume y N wx σ ) σ= σ=1/ x=0 x=1 x= x=3 σ=1/ Copyrght 001, 003, Andrew W. Moore
37 Copyrght 001, 003, Andrew W. Moore MLE estmaton wth varyng nose = ),,...,,,,...,,,...,, ( log argmax w x x x y y y p w R R R σ σ σ = = R y wx w 1 ) ( argmn σ = = = 0 ) ( such that 1 R wx y x w σ = = R R x y x 1 1 σ σ Assumng ndependence among nose and then pluggng n equaton for Gaussan and smplfyng. Settng dll/dw equal to zero Trval algebra
38 Ths s Weghted Regresson We are askng to mnmze the weghted sum of squares argmn w R = 1 ( y wx σ ) y=3 y= y=1 y=0 where weght for th datapont s σ= σ=1 σ= σ=1/ x=0 x=1 x= x=3 1 σ σ=1/ Copyrght 001, 003, Andrew W. Moore
39 Non-lnear Regresson Copyrght 001, 003, Andrew W. Moore
40 Non-lnear Regresson Suppose you know that y s related to a functon of x n such a way that the predcted values have a non-lnear dependence on w, e.g: x ½ y ½ y=3 y= y=1 y=0 x=0 x=1 x= x=3 Assume y ~ N( w + x, σ ) Copyrght 001, 003, Andrew W. Moore
41 argmax w Non-lnear MLE estmaton log p( y1, y,..., yr x1, x,..., x argmn w w such that R = 1 R = 1 ( y w + ) = y x w + x w + x R = 0 =, σ, w) = Assumng..d. and then pluggng n equaton for Gaussan and smplfyng. Settng dll/dw equal to zero Copyrght 001, 003, Andrew W. Moore
42 argmax w Non-lnear MLE estmaton log p( y1, y,..., yr x1, x,..., x argmn w w such that R = 1 R = 1 ( y w + ) = y x w + x w + x R = 0 =, σ, w) = Assumng..d. and then pluggng n equaton for Gaussan and smplfyng. Settng dll/dw equal to zero Copyrght 001, 003, Andrew W. Moore We re down the algebrac tolet So guess what we do?
43 argmax Copyrght 001, 003, Andrew W. Moore Non-lnear MLE estmaton log p( y1, y,..., yr x1, x,..., xr, σ, w) = w Common (but not only) approach: R Numercal Solutons: argmn ( y w + x ) = = 1 Lne Search w Smulated Annealng R y + w x Gradent Descent w such that = = 0 = 1 w + x Conjugate Gradent Levenberg Marquart Newton s Method Also, specal purpose statstcaloptmzaton-specfc trcks such as E.M. (See Gaussan Mxtures lecture for ntroducton) Assumng..d. and then pluggng n equaton for Gaussan and smplfyng. Settng dll/dw equal to zero We re down the algebrac tolet So guess what we do?
44 Polynomal Regresson Copyrght 001, 003, Andrew W. Moore
45 X 1 X Y Polynomal Regresson So far we ve manly been dealng wth lnear regresson X= 3 y= 1 1 : : : : : x 1 =(3,).. y 1 =7.. Z= 1 3 y= : : : β=(z T Z) -1 (Z T y) z 1 =(1,3,).. z k =(1,x k1,x k ) y 1 = : y est = β 0 + β 1 x 1 + β x Copyrght 001, 003, Andrew W. Moore
46 Quadratc Regresson It s trval to do lnear fts of fxed nonlnear bass functons X 1 X Y X= 3 y= : : : : : x 1 =(3,).. y 1 = y= Z= β=(z : : T Z) -1 (Z T y) : z=(1, x 1, x, x 1, x 1 x,x y, ) est = β 0 + β 1 x 1 + β x + β 3 x 1 + β 4 x 1 x + β 5 x : Copyrght 001, 003, Andrew W. Moore
47 Quadratc Regresson It s Each trval component to do lnear of a fts z vector of fxed s called nonlnear a term. bass functons X 1 X Y X= 3 y= Each column of the Z matrx s called a term 7 column How many terms n a quadratc regresson wth m nputs? : : : 1 constant term : : : x 1 =(3,).. y 1 =7.. 1 m 3 lnear terms y= Z= 7 1 (m+1)-choose = 1 m(m+1)/ 1 quadratc terms 3 β=(z : (m+)-choose- terms n : total = O(m ) T Z) -1 (Z T y) : z=(1, x 1, x, x 1, x 1 x,x y, ) est = β 0 + β 1 x 1 + β x + β 3 x 1 + β 4 x 1 x + β 5 x Note that solvng β=(z T Z) -1 (Z T y) s thus O(m 6 ) Copyrght 001, 003, Andrew W. Moore
48 Q th -degree polynomal Regresson X 1 X Y X= 3 y= 1 1 : : : : : x 1 =(3,).. y 1 = y= 7 Z= : : z=(all products of powers of nputs n whch sum of powers s q or less, ) 7 3 : β=(z T Z) -1 (Z T y) y est = β 0 + β 1 x 1 + Copyrght 001, 003, Andrew W. Moore
49 m nputs, degree Q: how many terms? = the number of unque terms m of the form q1 q qm x x... x where q Q 1 m =1 = the number of unque terms m of the form q0 q1 q qm 1 x x... x where q = Q 1 m = the number of lsts of non-negatve ntegers [q 0,q 1,q,..q m ] n whch Σq = Q = the number of ways of placng Q red dsks on a row of squares of length Q+m = (Q+m)-choose-Q =0 q 0 = q 1 = q =0 q 3 =4 q 4 =3 Q=11, m=4 Copyrght 001, 003, Andrew W. Moore
50 What we have seen MLE wth Gaussan nose s the same as mnmzng the L error Other nose models wll gve other loss functons MLE wth a Gaussan pror adds a penalty to the L error, gvng Rdge regresson Other prors wll gve dfferent penaltes One can make nonlnear relatons lnear by transformng the features Polynomal regresson Radal Bass Functons (RBF) wll be covered later
MLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012
MLE and Bayesan Estmaton Je Tang Department of Computer Scence & Technology Tsnghua Unversty 01 1 Lnear Regresson? As the frst step, we need to decde how we re gong to represent the functon f. One example:
More informationInstance-Based Learning (a.k.a. memory-based learning) Part I: Nearest Neighbor Classification
Instance-Based earnng (a.k.a. memory-based learnng) Part I: Nearest Neghbor Classfcaton Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n
More informationClustering with Gaussian Mixtures
Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your
More informationSingle- Parameter Linear Regression
Predctng Real-valued outputs an ntroducton to Regresson Note to other teachers and users of these sldes. Andrew would be delghted f ou found ths source materal useful n gvng our own lectures. Feel free
More informationxp(x µ) = 0 p(x = 0 µ) + 1 p(x = 1 µ) = µ
CSE 455/555 Sprng 2013 Homework 7: Parametrc Technques Jason J. Corso Computer Scence and Engneerng SUY at Buffalo jcorso@buffalo.edu Solutons by Yngbo Zhou Ths assgnment does not need to be submtted and
More informationLogistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI
Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton
More informationLearning with Maximum Likelihood
Learnng wth Mamum Lelhood Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm,
More informationCS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements
CS 750 Machne Learnng Lecture 5 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square CS 750 Machne Learnng Announcements Homework Due on Wednesday before the class Reports: hand n before
More informationCIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M
CIS56: achne Learnng Lecture 3 (Sept 6, 003) Preparaton help: Xaoyng Huang Lnear Regresson Lnear regresson can be represented by a functonal form: f(; θ) = θ 0 0 +θ + + θ = θ = 0 ote: 0 s a dummy attrbute
More informationClustering with Gaussian Mixtures
Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your
More information8/25/17. Data Modeling. Data Modeling. Data Modeling. Patrice Koehl Department of Biological Sciences National University of Singapore
8/5/17 Data Modelng Patrce Koehl Department of Bologcal Scences atonal Unversty of Sngapore http://www.cs.ucdavs.edu/~koehl/teachng/bl59 koehl@cs.ucdavs.edu Data Modelng Ø Data Modelng: least squares Ø
More informationOutline. Multivariate Parametric Methods. Multivariate Data. Basic Multivariate Statistics. Steven J Zeil
Outlne Multvarate Parametrc Methods Steven J Zel Old Domnon Unv. Fall 2010 1 Multvarate Data 2 Multvarate ormal Dstrbuton 3 Multvarate Classfcaton Dscrmnants Tunng Complexty Dscrete Features 4 Multvarate
More informationMaximum Likelihood Estimation
Maxmum Lkelhood Estmaton INFO-2301: Quanttatve Reasonng 2 Mchael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quanttatve Reasonng 2 Paul and Boyd-Graber Maxmum Lkelhood Estmaton 1 of 9 Why MLE?
More informationGenerative classification models
CS 675 Intro to Machne Learnng Lecture Generatve classfcaton models Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Data: D { d, d,.., dn} d, Classfcaton represents a dscrete class value Goal: learn
More informationLecture Notes on Linear Regression
Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume
More information10-701/ Machine Learning, Fall 2005 Homework 3
10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40
More informationThe Gaussian classifier. Nuno Vasconcelos ECE Department, UCSD
he Gaussan classfer Nuno Vasconcelos ECE Department, UCSD Bayesan decson theory recall that we have state of the world X observatons g decson functon L[g,y] loss of predctng y wth g Bayes decson rule s
More informationSupport Vector Machines
Separatng boundary, defned by w Support Vector Machnes CISC 5800 Professor Danel Leeds Separatng hyperplane splts class 0 and class 1 Plane s defned by lne w perpendcular to plan Is data pont x n class
More informationJAB Chain. Long-tail claims development. ASTIN - September 2005 B.Verdier A. Klinger
JAB Chan Long-tal clams development ASTIN - September 2005 B.Verder A. Klnger Outlne Chan Ladder : comments A frst soluton: Munch Chan Ladder JAB Chan Chan Ladder: Comments Black lne: average pad to ncurred
More informationMACHINE APPLIED MACHINE LEARNING LEARNING. Gaussian Mixture Regression
11 MACHINE APPLIED MACHINE LEARNING LEARNING MACHINE LEARNING Gaussan Mture Regresson 22 MACHINE APPLIED MACHINE LEARNING LEARNING Bref summary of last week s lecture 33 MACHINE APPLIED MACHINE LEARNING
More informationGeneralized Linear Methods
Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set
More informationClassification learning II
Lecture 8 Classfcaton learnng II Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Logstc regresson model Defnes a lnear decson boundar Dscrmnant functons: g g g g here g z / e z f, g g - s a logstc functon
More informationSupport Vector Machines. Vibhav Gogate The University of Texas at dallas
Support Vector Machnes Vbhav Gogate he Unversty of exas at dallas What We have Learned So Far? 1. Decson rees. Naïve Bayes 3. Lnear Regresson 4. Logstc Regresson 5. Perceptron 6. Neural networks 7. K-Nearest
More informationClassification as a Regression Problem
Target varable y C C, C,, ; Classfcaton as a Regresson Problem { }, 3 L C K To treat classfcaton as a regresson problem we should transform the target y nto numercal values; The choce of numercal class
More informationFirst Year Examination Department of Statistics, University of Florida
Frst Year Examnaton Department of Statstcs, Unversty of Florda May 7, 010, 8:00 am - 1:00 noon Instructons: 1. You have four hours to answer questons n ths examnaton.. You must show your work to receve
More informationRegression Analysis. Regression Analysis
Regresson Analyss Smple Regresson Multvarate Regresson Stepwse Regresson Replcaton and Predcton Error 1 Regresson Analyss In general, we "ft" a model by mnmzng a metrc that represents the error. n mn (y
More informationMaximum Likelihood Estimation (MLE)
Maxmum Lkelhood Estmaton (MLE) Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Wnter 01 UCSD Statstcal Learnng Goal: Gven a relatonshp between a feature vector x and a vector y, and d data samples (x,y
More informationHidden Markov Models
Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your
More informationLinear Feature Engineering 11
Lnear Feature Engneerng 11 2 Least-Squares 2.1 Smple least-squares Consder the followng dataset. We have a bunch of nputs x and correspondng outputs y. The partcular values n ths dataset are x y 0.23 0.19
More informationMachine learning: Density estimation
CS 70 Foundatons of AI Lecture 3 Machne learnng: ensty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square ata: ensty estmaton {.. n} x a vector of attrbute values Objectve: estmate the model of
More informationPredictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore
Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.
More informationβ0 + β1xi and want to estimate the unknown
SLR Models Estmaton Those OLS Estmates Estmators (e ante) v. estmates (e post) The Smple Lnear Regresson (SLR) Condtons -4 An Asde: The Populaton Regresson Functon B and B are Lnear Estmators (condtonal
More informationLogistic Classifier CISC 5800 Professor Daniel Leeds
lon 9/7/8 Logstc Classfer CISC 58 Professor Danel Leeds Classfcaton strategy: generatve vs. dscrmnatve Generatve, e.g., Bayes/Naïve Bayes: 5 5 Identfy probablty dstrbuton for each class Determne class
More informationRegression and Classification with Neural Networks
egresson and Classfcaton th Neural Netors Note to other teachers and users of these sldes. Andre ould be delghted f you found ths source materal useful n gvng your on lectures. Feel free to use these sldes
More informationSemi-Supervised Learning
Sem-Supervsed Learnng Consder the problem of Prepostonal Phrase Attachment. Buy car wth money ; buy car wth wheel There are several ways to generate features. Gven the lmted representaton, we can assume
More informationParametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010
Parametrc fractonal mputaton for mssng data analyss Jae Kwang Km Survey Workng Group Semnar March 29, 2010 1 Outlne Introducton Proposed method Fractonal mputaton Approxmaton Varance estmaton Multple mputaton
More informationChapter 2 - The Simple Linear Regression Model S =0. e i is a random error. S β2 β. This is a minimization problem. Solution is a calculus exercise.
Chapter - The Smple Lnear Regresson Model The lnear regresson equaton s: where y + = β + β e for =,..., y and are observable varables e s a random error How can an estmaton rule be constructed for the
More informationFinite Mixture Models and Expectation Maximization. Most slides are from: Dr. Mario Figueiredo, Dr. Anil Jain and Dr. Rong Jin
Fnte Mxture Models and Expectaton Maxmzaton Most sldes are from: Dr. Maro Fgueredo, Dr. Anl Jan and Dr. Rong Jn Recall: The Supervsed Learnng Problem Gven a set of n samples X {(x, y )},,,n Chapter 3 of
More information15-381: Artificial Intelligence. Regression and cross validation
15-381: Artfcal Intellgence Regresson and cross valdaton Where e are Inputs Densty Estmator Probablty Inputs Classfer Predct category Inputs Regressor Predct real no. Today Lnear regresson Gven an nput
More informationC4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z )
C4B Machne Learnng Answers II.(a) Show that for the logstc sgmod functon dσ(z) dz = σ(z) ( σ(z)) A. Zsserman, Hlary Term 20 Start from the defnton of σ(z) Note that Then σ(z) = σ = dσ(z) dz = + e z e z
More informatione i is a random error
Chapter - The Smple Lnear Regresson Model The lnear regresson equaton s: where + β + β e for,..., and are observable varables e s a random error How can an estmaton rule be constructed for the unknown
More informationSupport Vector Machines
/14/018 Separatng boundary, defned by w Support Vector Machnes CISC 5800 Professor Danel Leeds Separatng hyperplane splts class 0 and class 1 Plane s defned by lne w perpendcular to plan Is data pont x
More informationChapter 11: Simple Linear Regression and Correlation
Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests
More informationINF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018
INF 5860 Machne learnng for mage classfcaton Lecture 3 : Image classfcaton and regresson part II Anne Solberg January 3, 08 Today s topcs Multclass logstc regresson and softma Regularzaton Image classfcaton
More informationNegative Binomial Regression
STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...
More informationLinear Approximation with Regularization and Moving Least Squares
Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...
More informationSpace of ML Problems. CSE 473: Artificial Intelligence. Parameter Estimation and Bayesian Networks. Learning Topics
/7/7 CSE 73: Artfcal Intellgence Bayesan - Learnng Deter Fox Sldes adapted from Dan Weld, Jack Breese, Dan Klen, Daphne Koller, Stuart Russell, Andrew Moore & Luke Zettlemoyer What s Beng Learned? Space
More informationSTAT 3340 Assignment 1 solutions. 1. Find the equation of the line which passes through the points (1,1) and (4,5).
(out of 15 ponts) STAT 3340 Assgnment 1 solutons (10) (10) 1. Fnd the equaton of the lne whch passes through the ponts (1,1) and (4,5). β 1 = (5 1)/(4 1) = 4/3 equaton for the lne s y y 0 = β 1 (x x 0
More informationTopic 5: Non-Linear Regression
Topc 5: Non-Lnear Regresson The models we ve worked wth so far have been lnear n the parameters. They ve been of the form: y = Xβ + ε Many models based on economc theory are actually non-lnear n the parameters.
More informationLecture 10 Support Vector Machines II
Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed
More informationHomework Assignment 3 Due in class, Thursday October 15
Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.
More informationEM and Structure Learning
EM and Structure Learnng Le Song Machne Learnng II: Advanced Topcs CSE 8803ML, Sprng 2012 Partally observed graphcal models Mxture Models N(μ 1, Σ 1 ) Z X N N(μ 2, Σ 2 ) 2 Gaussan mxture model Consder
More informationMATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2)
1/16 MATH 829: Introducton to Data Mnng and Analyss The EM algorthm (part 2) Domnque Gullot Departments of Mathematcal Scences Unversty of Delaware Aprl 20, 2016 Recall 2/16 We are gven ndependent observatons
More informationThe big picture. Outline
The bg pcture Vncent Claveau IRISA - CNRS, sldes from E. Kjak INSA Rennes Notatons classes: C = {ω = 1,.., C} tranng set S of sze m, composed of m ponts (x, ω ) per class ω representaton space: R d (=
More informationLecture 3: Probability Distributions
Lecture 3: Probablty Dstrbutons Random Varables Let us begn by defnng a sample space as a set of outcomes from an experment. We denote ths by S. A random varable s a functon whch maps outcomes nto the
More informationLinear regression. Regression Models. Chapter 11 Student Lecture Notes Regression Analysis is the
Chapter 11 Student Lecture Notes 11-1 Lnear regresson Wenl lu Dept. Health statstcs School of publc health Tanjn medcal unversty 1 Regresson Models 1. Answer What Is the Relatonshp Between the Varables?.
More informationThe exam is closed book, closed notes except your one-page cheat sheet.
CS 89 Fall 206 Introducton to Machne Learnng Fnal Do not open the exam before you are nstructed to do so The exam s closed book, closed notes except your one-page cheat sheet Usage of electronc devces
More informationLinear Regression Introduction to Machine Learning. Matt Gormley Lecture 5 September 14, Readings: Bishop, 3.1
School of Computer Scence 10-601 Introducton to Machne Learnng Lnear Regresson Readngs: Bshop, 3.1 Matt Gormle Lecture 5 September 14, 016 1 Homework : Remnders Extenson: due Frda (9/16) at 5:30pm Rectaton
More informationLinear Classification, SVMs and Nearest Neighbors
1 CSE 473 Lecture 25 (Chapter 18) Lnear Classfcaton, SVMs and Nearest Neghbors CSE AI faculty + Chrs Bshop, Dan Klen, Stuart Russell, Andrew Moore Motvaton: Face Detecton How do we buld a classfer to dstngush
More informationENG 8801/ Special Topics in Computer Engineering: Pattern Recognition. Memorial University of Newfoundland Pattern Recognition
EG 880/988 - Specal opcs n Computer Engneerng: Pattern Recognton Memoral Unversty of ewfoundland Pattern Recognton Lecture 7 May 3, 006 http://wwwengrmunca/~charlesr Offce Hours: uesdays hursdays 8:30-9:30
More information2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification
E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton
More informationLecture 2: Prelude to the big shrink
Lecture 2: Prelude to the bg shrnk Last tme A slght detour wth vsualzaton tools (hey, t was the frst day... why not start out wth somethng pretty to look at?) Then, we consdered a smple 120a-style regresson
More informationSee Book Chapter 11 2 nd Edition (Chapter 10 1 st Edition)
Count Data Models See Book Chapter 11 2 nd Edton (Chapter 10 1 st Edton) Count data consst of non-negatve nteger values Examples: number of drver route changes per week, the number of trp departure changes
More informationIntroduction to Regression
Introducton to Regresson Dr Tom Ilvento Department of Food and Resource Economcs Overvew The last part of the course wll focus on Regresson Analyss Ths s one of the more powerful statstcal technques Provdes
More informationFor now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.
Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson
More informationProbabilistic Classification: Bayes Classifiers. Lecture 6:
Probablstc Classfcaton: Bayes Classfers Lecture : Classfcaton Models Sam Rowes January, Generatve model: p(x, y) = p(y)p(x y). p(y) are called class prors. p(x y) are called class condtonal feature dstrbutons.
More informationWhich Separator? Spring 1
Whch Separator? 6.034 - Sprng 1 Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng 3 Margn of a pont " # y (w $ + b) proportonal
More informationRockefeller College University at Albany
Rockefeller College Unverst at Alban PAD 705 Handout: Maxmum Lkelhood Estmaton Orgnal b Davd A. Wse John F. Kenned School of Government, Harvard Unverst Modfcatons b R. Karl Rethemeer Up to ths pont n
More informationLimited Dependent Variables
Lmted Dependent Varables. What f the left-hand sde varable s not a contnuous thng spread from mnus nfnty to plus nfnty? That s, gven a model = f (, β, ε, where a. s bounded below at zero, such as wages
More informationDepartment of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6
Department of Quanttatve Methods & Informaton Systems Tme Seres and Ther Components QMIS 30 Chapter 6 Fall 00 Dr. Mohammad Zanal These sldes were modfed from ther orgnal source for educatonal purpose only.
More informationConjugacy and the Exponential Family
CS281B/Stat241B: Advanced Topcs n Learnng & Decson Makng Conjugacy and the Exponental Famly Lecturer: Mchael I. Jordan Scrbes: Bran Mlch 1 Conjugacy In the prevous lecture, we saw conjugate prors for the
More informationStatistical pattern recognition
Statstcal pattern recognton Bayes theorem Problem: decdng f a patent has a partcular condton based on a partcular test However, the test s mperfect Someone wth the condton may go undetected (false negatve
More informationKernel Methods and SVMs Extension
Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general
More informationStatistics for Business and Economics
Statstcs for Busness and Economcs Chapter 11 Smple Regresson Copyrght 010 Pearson Educaton, Inc. Publshng as Prentce Hall Ch. 11-1 11.1 Overvew of Lnear Models n An equaton can be ft to show the best lnear
More informationStatistics MINITAB - Lab 2
Statstcs 20080 MINITAB - Lab 2 1. Smple Lnear Regresson In smple lnear regresson we attempt to model a lnear relatonshp between two varables wth a straght lne and make statstcal nferences concernng that
More informationSection 8.3 Polar Form of Complex Numbers
80 Chapter 8 Secton 8 Polar Form of Complex Numbers From prevous classes, you may have encountered magnary numbers the square roots of negatve numbers and, more generally, complex numbers whch are the
More informationSTAT 3008 Applied Regression Analysis
STAT 3008 Appled Regresson Analyss Tutoral : Smple Lnear Regresson LAI Chun He Department of Statstcs, The Chnese Unversty of Hong Kong 1 Model Assumpton To quantfy the relatonshp between two factors,
More informationx i1 =1 for all i (the constant ).
Chapter 5 The Multple Regresson Model Consder an economc model where the dependent varable s a functon of K explanatory varables. The economc model has the form: y = f ( x,x,..., ) xk Approxmate ths by
More informationLaboratory 1c: Method of Least Squares
Lab 1c, Least Squares Laboratory 1c: Method of Least Squares Introducton Consder the graph of expermental data n Fgure 1. In ths experment x s the ndependent varable and y the dependent varable. Clearly
More informationComposite Hypotheses testing
Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter
More informationHidden Markov Models & The Multivariate Gaussian (10/26/04)
CS281A/Stat241A: Statstcal Learnng Theory Hdden Markov Models & The Multvarate Gaussan (10/26/04) Lecturer: Mchael I. Jordan Scrbes: Jonathan W. Hu 1 Hdden Markov Models As a bref revew, hdden Markov models
More informationLINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity
LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 31 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 6. Rdge regresson The OLSE s the best lnear unbased
More informationRecitation 2. Probits, Logits, and 2SLS. Fall Peter Hull
14.387 Rectaton 2 Probts, Logts, and 2SLS Peter Hull Fall 2014 1 Part 1: Probts, Logts, Tobts, and other Nonlnear CEFs 2 Gong Latent (n Bnary): Probts and Logts Scalar bernoull y, vector x. Assume y =
More informationEconomics 130. Lecture 4 Simple Linear Regression Continued
Economcs 130 Lecture 4 Contnued Readngs for Week 4 Text, Chapter and 3. We contnue wth addressng our second ssue + add n how we evaluate these relatonshps: Where do we get data to do ths analyss? How do
More informationDr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur
Analyss of Varance and Desgn of Exerments-I MODULE III LECTURE - 2 EXPERIMENTAL DESIGN MODELS Dr. Shalabh Deartment of Mathematcs and Statstcs Indan Insttute of Technology Kanur 2 We consder the models
More informationMaximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models
ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Maxmum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models
More informationDifferentiating Gaussian Processes
Dfferentatng Gaussan Processes Andrew McHutchon Aprl 17, 013 1 Frst Order Dervatve of the Posteror Mean The posteror mean of a GP s gven by, f = x, X KX, X 1 y x, X α 1 Only the x, X term depends on the
More informationSmall Area Interval Estimation
.. Small Area Interval Estmaton Partha Lahr Jont Program n Survey Methodology Unversty of Maryland, College Park (Based on jont work wth Masayo Yoshmor, Former JPSM Vstng PhD Student and Research Fellow
More informationHowever, since P is a symmetric idempotent matrix, of P are either 0 or 1 [Eigen-values
Fall 007 Soluton to Mdterm Examnaton STAT 7 Dr. Goel. [0 ponts] For the general lnear model = X + ε, wth uncorrelated errors havng mean zero and varance σ, suppose that the desgn matrx X s not necessarly
More informationCommunication with AWGN Interference
Communcaton wth AWG Interference m {m } {p(m } Modulator s {s } r=s+n Recever ˆm AWG n m s a dscrete random varable(rv whch takes m wth probablty p(m. Modulator maps each m nto a waveform sgnal s m=m
More informationExpected Value and Variance
MATH 38 Expected Value and Varance Dr. Neal, WKU We now shall dscuss how to fnd the average and standard devaton of a random varable X. Expected Value Defnton. The expected value (or average value, or
More informationj) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1
Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons
More informationHydrological statistics. Hydrological statistics and extremes
5--0 Stochastc Hydrology Hydrologcal statstcs and extremes Marc F.P. Berkens Professor of Hydrology Faculty of Geoscences Hydrologcal statstcs Mostly concernes wth the statstcal analyss of hydrologcal
More informationProperties of Least Squares
Week 3 3.1 Smple Lnear Regresson Model 3. Propertes of Least Squares Estmators Y Y β 1 + β X + u weekly famly expendtures X weekly famly ncome For a gven level of x, the expected level of food expendtures
More informationn α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0
MODULE 2 Topcs: Lnear ndependence, bass and dmenson We have seen that f n a set of vectors one vector s a lnear combnaton of the remanng vectors n the set then the span of the set s unchanged f that vector
More informationp(z) = 1 a e z/a 1(z 0) yi a i x (1/a) exp y i a i x a i=1 n i=1 (y i a i x) inf 1 (y Ax) inf Ax y (1 ν) y if A (1 ν) = 0 otherwise
Dustn Lennon Math 582 Convex Optmzaton Problems from Boy, Chapter 7 Problem 7.1 Solve the MLE problem when the nose s exponentally strbute wth ensty p(z = 1 a e z/a 1(z 0 The MLE s gven by the followng:
More informationLaboratory 3: Method of Least Squares
Laboratory 3: Method of Least Squares Introducton Consder the graph of expermental data n Fgure 1. In ths experment x s the ndependent varable and y the dependent varable. Clearly they are correlated wth
More informationLossy Compression. Compromise accuracy of reconstruction for increased compression.
Lossy Compresson Compromse accuracy of reconstructon for ncreased compresson. The reconstructon s usually vsbly ndstngushable from the orgnal mage. Typcally, one can get up to 0:1 compresson wth almost
More informationsince [1-( 0+ 1x1i+ 2x2 i)] [ 0+ 1x1i+ assumed to be a reasonable approximation
Econ 388 R. Butler 204 revsons Lecture 4 Dummy Dependent Varables I. Lnear Probablty Model: the Regresson model wth a dummy varables as the dependent varable assumpton, mplcaton regular multple regresson
More information9.913 Pattern Recognition for Vision. Class IV Part I Bayesian Decision Theory Yuri Ivanov
9.93 Class IV Part I Bayesan Decson Theory Yur Ivanov TOC Roadmap to Machne Learnng Bayesan Decson Makng Mnmum Error Rate Decsons Mnmum Rsk Decsons Mnmax Crteron Operatng Characterstcs Notaton x - scalar
More information