The Bayesian Learning Framework. Back to Maximum Likelihood. Naïve Bayes. Simple Example: Coin Tosses. Given a generative model
- Julie Ball
- 6 years ago
Transcription
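Back to Maximum Likelihood

Given a generative model
$$f(x, y = k) = \pi_k f_k(x).$$
Using a generative modelling approach, we assume a parametric form $f_k(x) = f(x; \phi_k)$ and compute the MLE $\hat{\theta}$ of $\theta = (\pi_k, \phi_k)_{k=1}^{K}$ based on the training data $\{x_i, y_i\}_{i=1}^{n}$. We then use a plug-in approach to perform classification:
$$p(Y = k \mid X = x, \hat{\theta}) = \frac{\hat{\pi}_k f(x; \hat{\phi}_k)}{\sum_{j=1}^{K} \hat{\pi}_j f(x; \hat{\phi}_j)}.$$
Even for simple models, this can prove difficult; e.g. for LDA, $f(x; \phi_k) = \mathcal{N}(x; \mu_k, \Sigma)$, and the MLE of $\Sigma$ is not full rank for $p > n$.
- One answer: simplify even further, e.g. using axis-aligned covariances, but this is usually too crude.
- Another answer: regularization.

The Bayesian Learning Framework

Bayes Theorem: given two random variables $X$ and $\Theta$,
$$p(\Theta \mid X) = \frac{p(X \mid \Theta)\, p(\Theta)}{p(X)}.$$
- Likelihood: $p(X \mid \Theta)$
- Prior: $p(\Theta)$
- Posterior: $p(\Theta \mid X)$
- Marginal likelihood: $p(X) = \int p(X \mid \Theta)\, p(\Theta)\, d\Theta$

Treat parameters as random variables; the process of learning is then just the computation of the posterior $p(\Theta \mid X)$.

Summarizing the posterior:
- Posterior mode (maximum a posteriori): $\hat{\theta}^{\mathrm{MAP}} = \arg\max_{\theta} p(\theta \mid X)$.
- Posterior mean: $\hat{\theta}^{\mathrm{mean}} = E[\Theta \mid X]$.
- Posterior variance: $\mathrm{Var}[\Theta \mid X]$.

How to make decisions and predictions? Decision theory. How to compute the posterior?

Naïve Bayes

Return to the spam classification example with two-class naïve Bayes:
$$f(x_i; \phi_k) = \prod_{j=1}^{p} \phi_{kj}^{x_{ij}} (1 - \phi_{kj})^{1 - x_{ij}}.$$
The MLEs are given by
$$\hat{\phi}_{kj} = \frac{\sum_{i=1}^{n} \mathbb{1}(x_{ij} = 1, y_i = k)}{n_k}, \qquad \hat{\pi}_k = \frac{n_k}{n}, \qquad \text{where } n_k = \sum_{i=1}^{n} \mathbb{1}(y_i = k).$$
If a word $j$ does not appear in class $k$ by chance, but it does appear in a document $\tilde{x}$, then $\hat{p}(\tilde{x} \mid \tilde{y} = k) = 0$ and so the posterior $\hat{p}(\tilde{y} = k \mid \tilde{x}) = 0$. Worse things can happen: e.g., the probability of a document under all classes can be 0, so the posterior is ill-defined.

Simple Example: Coin Tosses

A very simple example: we have a coin with probability $\phi$ of coming up heads. Model coin tosses as iid Bernoullis, 1 = head, 0 = tail. Learn about $\phi$ given a dataset $D = (x_i)_{i=1}^{n}$ of tosses:
$$f(D \mid \phi) = \phi^{n_1} (1 - \phi)^{n_0}, \qquad \text{with } n_j = \sum_{i=1}^{n} \mathbb{1}(x_i = j).$$
Maximum likelihood:
$$\hat{\phi}^{\mathrm{ML}} = \frac{n_1}{n}.$$
Bayesian approach: treat the unknown parameter as a random variable $\Phi$. Simple prior: $\Phi \sim U[0, 1]$. Posterior distribution:
$$p(\phi \mid D) = \frac{1}{Z} \phi^{n_1} (1 - \phi)^{n_0}, \qquad Z = \int_0^1 \phi^{n_1} (1 - \phi)^{n_0}\, d\phi = \frac{n_1!\, n_0!}{(n + 1)!}.$$
The posterior is a $\mathrm{Beta}(n_1 + 1, n_0 + 1)$ distribution.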
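As a small numerical check of the coin-toss example, the sketch below computes the maximum likelihood estimate and the normalising constant $Z = n_1!\, n_0! / (n+1)!$ under the uniform prior, and compares $Z$ with the Beta function $B(n_1 + 1, n_0 + 1)$. The function names and the ten-toss dataset are hypothetical, chosen only for illustration.

```python
from math import exp, factorial, lgamma


def coin_posterior_uniform(tosses):
    """Summarise the posterior of the heads probability under a U[0, 1] prior.

    `tosses` is a sequence of 0/1 outcomes (1 = head, 0 = tail).
    """
    n1 = sum(tosses)           # number of heads
    n0 = len(tosses) - n1      # number of tails
    n = n1 + n0
    # Maximum likelihood estimate n1 / n (undefined for an empty dataset).
    phi_ml = n1 / n if n > 0 else None
    # Normalising constant Z = n1! n0! / (n + 1)!.
    z = factorial(n1) * factorial(n0) / factorial(n + 1)
    # The posterior is Beta(n1 + 1, n0 + 1).
    return {"n1": n1, "n0": n0, "phi_ml": phi_ml, "Z": z,
            "beta_params": (n1 + 1, n0 + 1)}


def beta_function(a, b):
    """Beta function B(a, b) via log-gamma, used as a cross-check for Z."""
    return exp(lgamma(a) + lgamma(b) - lgamma(a + b))


tosses = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]   # hypothetical data: 7 heads, 3 tails
summary = coin_posterior_uniform(tosses)
print(summary)
print(beta_function(*summary["beta_params"]))
```

The two printed values of the normalising constant should agree up to floating-point error, which is one way to see that the posterior is a Beta density.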
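Simple Example: Coin Tosses

[Figure: posterior density of $\phi$ for six datasets of increasing size $n$.]

The posterior becomes peaked at the true value $\phi^* = 0.7$ as the dataset grows.

Simple Example: Coin Tosses

What about test data? The posterior predictive distribution is the conditional distribution of $x_{n+1}$ given $(x_i)_{i=1}^{n}$:
$$p(x_{n+1} \mid (x_i)_{i=1}^{n}) = \int_0^1 p(x_{n+1}, \phi \mid (x_i)_{i=1}^{n})\, d\phi = \int_0^1 p(x_{n+1} \mid \phi)\, p(\phi \mid (x_i)_{i=1}^{n})\, d\phi = (\hat{\phi}^{\mathrm{mean}})^{x_{n+1}} (1 - \hat{\phi}^{\mathrm{mean}})^{1 - x_{n+1}}.$$
We predict on new data by averaging the predictive distribution over the posterior. This accounts for uncertainty about $\phi$.

Simple Example: Coin Tosses

The posterior distribution captures all learnt information.
- Posterior mode: $\hat{\phi}^{\mathrm{MAP}} = \dfrac{n_1}{n}$
- Posterior mean: $\hat{\phi}^{\mathrm{mean}} = \dfrac{n_1 + 1}{n + 2}$
- Posterior variance: $\dfrac{1}{n + 3}\, \hat{\phi}^{\mathrm{mean}} (1 - \hat{\phi}^{\mathrm{mean}})$

Asymptotically, for large $n$, the variance decreases as $1/n$ and is given by the inverse of Fisher's information. The posterior distribution converges to the true parameter $\phi^*$ as $n \to \infty$.

Simple Example: Coin Tosses

The posterior distribution has a known analytic form. In fact, the posterior distribution is in the same beta family as the prior. This is an example of a conjugate prior.

A beta distribution $\mathrm{Beta}(a, b)$ with parameters $a, b > 0$ is an exponential family distribution with density
$$p(\phi \mid a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\, \phi^{a - 1} (1 - \phi)^{b - 1},$$
where $\Gamma(t) = \int_0^{\infty} u^{t - 1} e^{-u}\, du$ is the gamma function.

If the prior is $\phi \sim \mathrm{Beta}(a, b)$, then the posterior distribution is
$$p(\phi \mid D, a, b) \propto \phi^{a + n_1 - 1} (1 - \phi)^{b + n_0 - 1},$$
so it is $\mathrm{Beta}(a + n_1, b + n_0)$.

Hyperparameters $a$ and $b$ are pseudo-counts, an imaginary initial sample that reflects our prior beliefs about $\phi$.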
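The conjugate update is simple enough to sketch in a few lines. The example below assumes a Beta(a, b) prior and counts of heads and tails; it returns the Beta posterior parameters, their standard summaries, and the posterior predictive probability of the next toss. The function names and the counts (7 heads, 3 tails) are illustrative assumptions.

```python
def beta_update(a, b, n1, n0):
    """Posterior Beta parameters after n1 heads and n0 tails, from a Beta(a, b) prior."""
    return a + n1, b + n0


def beta_summary(a, b):
    """Mean, mode and variance of Beta(a, b); the mode formula needs a, b > 1."""
    mean = a / (a + b)
    mode = (a - 1) / (a + b - 2) if a > 1 and b > 1 else None
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, mode, var


def predictive(x_new, a_post, b_post):
    """Posterior predictive p(x_new | D) for x_new in {0, 1}."""
    mean = a_post / (a_post + b_post)
    return mean ** x_new * (1 - mean) ** (1 - x_new)


# A uniform prior is Beta(1, 1); hypothetical data with 7 heads and 3 tails.
a_post, b_post = beta_update(1, 1, n1=7, n0=3)
mean, mode, var = beta_summary(a_post, b_post)
print(a_post, b_post, mean, mode, var, predictive(1, a_post, b_post))
```

With the uniform prior the mode coincides with the maximum likelihood estimate $n_1/n$, while the mean $(n_1+1)/(n+2)$ is pulled slightly towards 1/2 by the two pseudo-counts.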
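Beta Distributions

[Figure: Beta(a, b) densities for several parameter settings, including (0.8, 0.8) and (5, 5).]

Dirichlet Distributions

[Figure: (A) Support of the Dirichlet density. (B), (C) Dirichlet densities for two settings of $\alpha_k$.]

Bayesian Inference for Multinomials

Suppose $x_i \in \{1, \ldots, K\}$ instead, and we model $(x_i)_{i=1}^{n}$ as iid multinomials:
$$p(D \mid \pi) = \prod_{i=1}^{n} \pi_{x_i} = \prod_{k=1}^{K} \pi_k^{n_k},$$
with $n_k = \sum_{i=1}^{n} \mathbb{1}(x_i = k)$ and $\pi_k > 0$, $\sum_{k=1}^{K} \pi_k = 1$.

The conjugate prior is the Dirichlet distribution. $\mathrm{Dir}(\alpha_1, \ldots, \alpha_K)$ has parameters $\alpha_k > 0$ and density
$$p(\pi) = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \pi_k^{\alpha_k - 1}$$
on the probability simplex $\{\pi : \pi_k > 0, \sum_{k=1}^{K} \pi_k = 1\}$.

The posterior is also Dirichlet, with parameters $(\alpha_k + n_k)_{k=1}^{K}$. The posterior mean is
$$\hat{\pi}_k^{\mathrm{mean}} = \frac{\alpha_k + n_k}{\sum_{j=1}^{K} (\alpha_j + n_j)}.$$

Text Classification with (Less) Naïve Bayes

Under the Naïve Bayes model, the joint distribution of labels $y_i \in \{1, \ldots, K\}$ and data vectors $x_i \in \{0, 1\}^p$ is
$$\prod_{i=1}^{n} p(x_i, y_i) = \prod_{i=1}^{n} \prod_{k=1}^{K} \left( \pi_k \prod_{j=1}^{p} \phi_{kj}^{x_{ij}} (1 - \phi_{kj})^{1 - x_{ij}} \right)^{\mathbb{1}(y_i = k)} = \prod_{k=1}^{K} \pi_k^{n_k} \prod_{j=1}^{p} \phi_{kj}^{n_{kj}} (1 - \phi_{kj})^{n_k - n_{kj}},$$
where $n_k = \sum_{i=1}^{n} \mathbb{1}(y_i = k)$ and $n_{kj} = \sum_{i=1}^{n} \mathbb{1}(y_i = k, x_{ij} = 1)$.

For a conjugate prior, we can use $\mathrm{Dir}((\alpha_k)_{k=1}^{K})$ for $\pi$, and $\mathrm{Beta}(a, b)$ for each $\phi_{kj}$ independently.

Because the likelihood factorizes, the posterior distribution over $\pi$ and $(\phi_{kj})$ also factorizes: the posterior for $\pi$ is $\mathrm{Dir}((\alpha_k + n_k)_{k=1}^{K})$, and the posterior for $\phi_{kj}$ is $\mathrm{Beta}(a + n_{kj}, b + n_k - n_{kj})$.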
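The factorized posterior can be turned into a classifier by using the posterior means of $\pi_k$ and $\phi_{kj}$ as smoothed probability estimates. The sketch below is one minimal way to do this, assuming a symmetric Dirichlet prior with a single value `alpha`, classes labelled 0, ..., K-1 (rather than 1, ..., K), and a tiny hypothetical dataset in which the first word never occurs in the second class.

```python
from math import exp, log


def fit_counts(X, y, K):
    """Return class counts n_k and feature counts n_kj for binary features."""
    p = len(X[0])
    n_k = [0] * K
    n_kj = [[0] * p for _ in range(K)]
    for x_i, y_i in zip(X, y):
        n_k[y_i] += 1
        for j, x_ij in enumerate(x_i):
            n_kj[y_i][j] += x_ij
    return n_k, n_kj


def predict_proba(x_new, n_k, n_kj, alpha=1.0, a=1.0, b=1.0):
    """Class probabilities for x_new using posterior-mean (pseudo-count) estimates."""
    K, n = len(n_k), sum(n_k)
    log_joint = []
    for k in range(K):
        # Posterior mean of pi_k under a symmetric Dir(alpha, ..., alpha) prior.
        lp = log((alpha + n_k[k]) / (n + K * alpha))
        for j, x_j in enumerate(x_new):
            # Posterior mean of phi_kj under Beta(a + n_kj, b + n_k - n_kj).
            phi = (a + n_kj[k][j]) / (a + b + n_k[k])
            lp += log(phi) if x_j == 1 else log(1 - phi)
        log_joint.append(lp)
    m = max(log_joint)                       # subtract the max for numerical stability
    w = [exp(lp - m) for lp in log_joint]
    total = sum(w)
    return [w_k / total for w_k in w]


# Hypothetical data: word 0 never appears in class 1.
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
y = [0, 0, 1, 1]
n_k, n_kj = fit_counts(X, y, K=2)
probs = predict_proba([1, 0, 1], n_k, n_kj)
print(probs)
```

With maximum likelihood estimates the second class would receive probability exactly 0 for this document; with pseudo-counts it keeps a small but non-zero probability.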
Text Classification with (Less) Naïve Bayes

For prediction given $D = (x_i, y_i)_{i=1}^{n}$ we can calculate
$$p(\tilde{x}, \tilde{y} = k \mid D) = p(\tilde{y} = k \mid D)\, p(\tilde{x} \mid \tilde{y} = k, D)$$
with
$$p(\tilde{y} = k \mid D) = \frac{\alpha_k + n_k}{n + \sum_{l=1}^{K} \alpha_l}, \qquad p(\tilde{x}_j = 1 \mid \tilde{y} = k, D) = \frac{a + n_{kj}}{a + b + n_k}.$$
The predicted class is given by
$$p(\tilde{y} = k \mid \tilde{x}, D) = \frac{p(\tilde{y} = k \mid D)\, p(\tilde{x} \mid \tilde{y} = k, D)}{p(\tilde{x} \mid D)}.$$
Compared to the ML plug-in estimator, pseudo-counts help to regularize probabilities away from extreme values.

Bayesian Learning Discussion

Clear separation between models, which frame learning problems and encapsulate prior information, and algorithms, which compute posteriors and predictions.

Bayesian computations: most posteriors are intractable, and algorithms are needed to approximate the posterior efficiently:
- Monte Carlo methods (Markov chain and sequential varieties).
- Variational methods (variational Bayes, belief propagation etc).

No optimization, no overfitting (!), but there can still be model misfit.

Tuning parameters $\Psi$ can be optimized (without need for cross-validation):
$$p(X \mid \Psi) = \int p(X \mid \theta)\, p(\theta \mid \Psi)\, d\theta, \qquad p(\Psi \mid X) = \frac{p(X \mid \Psi)\, p(\Psi)}{p(X)}.$$
- Be Bayesian about $\Psi$: compute its posterior.
- Type II maximum likelihood: find $\Psi$ maximizing $p(X \mid \Psi)$.

Bayesian Learning and Regularization

Consider a Bayesian approach to logistic regression: introduce a multivariate normal prior for $b$, and a uniform (improper) prior for $a$. The prior density is
$$p(a, b) = (2\pi\sigma^2)^{-p/2}\, e^{-\frac{1}{2\sigma^2} \|b\|_2^2}.$$
The posterior is
$$p(a, b \mid D) \propto \exp\left( -\frac{1}{2\sigma^2} \|b\|_2^2 - \sum_{i=1}^{n} \log\left(1 + \exp(-y_i (a + b^\top x_i))\right) \right).$$
The posterior mode is the pair of parameters maximizing the above, which is equivalent to minimizing the $L_2$-regularized empirical risk.

Regularized empirical risk minimization is (often) equivalent to having a prior and finding the maximum a posteriori (MAP) parameters.
- $L_2$ regularization: multivariate normal prior.
- $L_1$ regularization: multivariate Laplace prior.

From a Bayesian perspective, the MAP parameters are just one way to summarize the posterior distribution.

Bayesian Learning Further Readings

- Zoubin Ghahramani. Bayesian Learning. Graphical models. Videolectures.
- Gelman et al. Bayesian Data Analysis.
- Kevin Murphy. Machine Learning: a Probabilistic Perspective.
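To illustrate the link between the normal prior on $b$ and $L_2$ regularization in logistic regression, the following sketch minimizes the negative log-posterior by plain gradient descent for a single scalar feature, with labels in $\{-1, +1\}$. The optimizer, the step size, the number of steps and the eight-point dataset are all assumptions made for illustration, not part of the original slides.

```python
from math import exp, log


def neg_log_posterior(a, b, xs, ys, sigma2):
    """b^2 / (2 sigma^2) + sum_i log(1 + exp(-y_i (a + b x_i))), up to a constant."""
    penalty = b * b / (2.0 * sigma2)
    loss = sum(log(1.0 + exp(-y * (a + b * x))) for x, y in zip(xs, ys))
    return penalty + loss


def map_estimate(xs, ys, sigma2, lr=0.05, steps=5000):
    """Gradient descent on the negative log-posterior (one scalar feature)."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = 0.0              # flat (improper) prior on a: no penalty term
        grad_b = b / sigma2       # gradient of the Gaussian prior term
        for x, y in zip(xs, ys):
            # d/dz log(1 + exp(-y z)) = -y / (1 + exp(y z)), with z = a + b x
            g = -y / (1.0 + exp(y * (a + b * x)))
            grad_a += g
            grad_b += g * x
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Hypothetical, non-separable data with labels in {-1, +1}.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [-1, -1, 1, -1, 1, -1, 1, 1]

a_tight, b_tight = map_estimate(xs, ys, sigma2=0.1)     # strong prior
a_weak, b_weak = map_estimate(xs, ys, sigma2=100.0)     # nearly flat prior
print(b_tight, b_weak)
```

A smaller prior variance $\sigma^2$ corresponds to a larger regularization weight $1/(2\sigma^2)$, so the MAP slope under the tight prior is shrunk towards zero relative to the nearly flat prior.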
Gaussian Processes

Suppose we are given a dataset consisting of $n$ inputs $x = (x_i)_{i=1}^{n}$ and $n$ outputs $y = (y_i)_{i=1}^{n}$. Regression: learn the underlying function $f(x)$.

Gaussian Processes

We can model the response as a noisy version of an underlying function $f(x)$:
$$y_i \mid f(x_i) \sim \mathcal{N}(f(x_i), \sigma^2).$$
Typical approach: parametrize $f(x; \beta)$ and learn $\beta$, e.g.,
$$f(x) = \sum_{j=1}^{d} \beta_j \phi_j(x).$$
More direct approach: since $f(x)$ is unknown, we take a Bayesian approach, introduce a prior over functions, and compute a posterior over functions.

Instead of trying to work with the whole function, just work with the function values at the inputs:
$$\mathbf{f} = (f(x_1), \ldots, f(x_n))^\top.$$

Gaussian Processes

The prior $p(\mathbf{f})$ encodes our prior knowledge about the function. What properties of the function can we incorporate?

Multivariate normal assumption:
$$\mathbf{f} \sim \mathcal{N}(0, G).$$
Use a kernel function $\kappa$ to define $G$:
$$G_{ij} = \kappa(x_i, x_j).$$
Expect regression functions to be smooth: if $x$ and $x'$ are close by, then $f(x)$ and $f(x')$ have similar values, i.e. they are strongly correlated:
$$\begin{pmatrix} f(x) \\ f(x') \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \kappa(x, x) & \kappa(x, x') \\ \kappa(x', x) & \kappa(x', x') \end{pmatrix} \right).$$
In particular, we want $\kappa(x, x') \approx \kappa(x, x) = \kappa(x', x')$.

Model:
$$\mathbf{f} \sim \mathcal{N}(0, G), \qquad y_i \mid f_i \sim \mathcal{N}(f_i, \sigma^2).$$

Gaussian Processes

What does a multivariate normal prior mean? Imagine $x$ forms a very dense grid of the data space. Simulate prior draws
$$\mathbf{f} \sim \mathcal{N}(0, G)$$
and plot $f_i$ vs $x_i$ for $i = 1, \ldots, n$.

The prior over functions is called a Gaussian process (GP).
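Simulating a prior draw on a grid, as described above, needs only a kernel, the Gram matrix $G$, and a Cholesky factor $L$ with $LL^\top = G$, since $\mathbf{f} = Lz$ with standard normal $z$ then has covariance $G$. The sketch below uses only the standard library; the squared-exponential kernel, the lengthscale, the diagonal jitter and the grid are assumptions chosen for illustration.

```python
import math
import random


def sq_exp_kernel(x, x2, lengthscale=1.0):
    """Squared-exponential kernel; an assumed choice of kappa for illustration."""
    return math.exp(-0.5 * ((x - x2) / lengthscale) ** 2)


def gram_matrix(xs, kernel, jitter=1e-6):
    """G_ij = kappa(x_i, x_j), plus a small diagonal jitter for numerical stability."""
    n = len(xs)
    return [[kernel(xs[i], xs[j]) + (jitter if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def sample_prior(xs, kernel, rng):
    """One draw f ~ N(0, G), computed as f = L z with z standard normal."""
    L = cholesky(gram_matrix(xs, kernel))
    z = [rng.gauss(0.0, 1.0) for _ in xs]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(xs))]


xs = [i / 4 for i in range(21)]       # grid on [0, 5]
rng = random.Random(0)                # fixed seed for reproducibility
f = sample_prior(xs, sq_exp_kernel, rng)
print([round(v, 3) for v in f])
```

Plotting `f` against `xs` for several seeds gives a picture of the functions the prior considers plausible; changing the kernel or its lengthscale changes how smooth the draws look.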
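Gaussian Processes

Different kernels lead to different function characteristics.

Carl Rasmussen. Tutorial on Gaussian Processes at NIPS.

Gaussian Processes

$$\mathbf{f} \mid x \sim \mathcal{N}(0, G), \qquad y \mid \mathbf{f} \sim \mathcal{N}(\mathbf{f}, \sigma^2 I).$$
Posterior distribution:
$$\mathbf{f} \mid y \sim \mathcal{N}\left( G (G + \sigma^2 I)^{-1} y,\; G - G (G + \sigma^2 I)^{-1} G \right).$$
Posterior predictive distribution: suppose $x'$ is a test set. We can extend our model to include the function values $\mathbf{f}'$ at the test set:
$$\begin{pmatrix} \mathbf{f} \\ \mathbf{f}' \end{pmatrix} \Big|\; x, x' \sim \mathcal{N}\left( 0, \begin{pmatrix} K_{xx} & K_{xx'} \\ K_{x'x} & K_{x'x'} \end{pmatrix} \right), \qquad y \mid \mathbf{f} \sim \mathcal{N}(\mathbf{f}, \sigma^2 I),$$
where $K_{zz'}$ is the matrix with $ij$-th entry $\kappa(z_i, z'_j)$, and $K_{xx} = G$.

Some manipulation of multivariate normals gives:
$$\mathbf{f}' \mid y \sim \mathcal{N}\left( K_{x'x} (K_{xx} + \sigma^2 I)^{-1} y,\; K_{x'x'} - K_{x'x} (K_{xx} + \sigma^2 I)^{-1} K_{xx'} \right).$$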
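The posterior predictive formulas can be evaluated directly for a handful of points. The sketch below computes the predictive mean $K_{x'x}(K_{xx} + \sigma^2 I)^{-1} y$ and the diagonal of the predictive covariance, one test point at a time, using a small Gaussian-elimination solver from the standard library rather than a linear algebra package. The squared-exponential kernel, the noise level and the four training points are assumptions for illustration.

```python
import math


def kernel(x, x2, lengthscale=1.0):
    """Squared-exponential kernel; an assumed choice of kappa."""
    return math.exp(-0.5 * ((x - x2) / lengthscale) ** 2)


def solve(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]   # augmented matrix [A | b]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        v[r] = (M[r][n] - sum(M[r][k] * v[k] for k in range(r + 1, n))) / M[r][r]
    return v


def gp_predict(x_train, y_train, x_test, sigma2):
    """Predictive mean and variance of f' at each test input."""
    n = len(x_train)
    # A = K_xx + sigma^2 I
    A = [[kernel(x_train[i], x_train[j]) + (sigma2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(A, y_train)                        # (K_xx + sigma^2 I)^{-1} y
    means, variances = [], []
    for x_new in x_test:
        k_star = [kernel(x_new, x_i) for x_i in x_train]   # one row of K_{x'x}
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        w = solve(A, k_star)                         # (K_xx + sigma^2 I)^{-1} K_{xx'}
        variances.append(kernel(x_new, x_new) - sum(k * w_i for k, w_i in zip(k_star, w)))
    return means, variances


x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [math.sin(x) for x in x_train]             # hypothetical observations
means, variances = gp_predict(x_train, y_train, x_test=[1.0, 10.0], sigma2=0.01)
print(means, variances)
```

At a test input that coincides with a training input the predictive variance is small, while far from the data the prediction reverts to the prior, with mean close to 0 and variance close to $\kappa(x', x') = 1$.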
More informationMachine Learning.
10-701 Machie Learig http://www.cs.cmu.edu/~epxig/class/10701-15f/ Orgaizatioal ifo All up-to-date ifo is o the course web page (follow liks from my page). Istructors - Eric Xig - Ziv Bar-Joseph TAs: See
More informationOverview of Estimation
Topic Iferece is the problem of turig data ito kowledge, where kowledge ofte is expressed i terms of etities that are ot preset i the data per se but are preset i models that oe uses to iterpret the data.
More informationEstimation for Complete Data
Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of
More informationApproximations and more PMFs and PDFs
Approximatios ad more PMFs ad PDFs Saad Meimeh 1 Approximatio of biomial with Poisso Cosider the biomial distributio ( b(k,,p = p k (1 p k, k λ: k Assume that is large, ad p is small, but p λ at the limit.
More informationQuestions and Answers on Maximum Likelihood
Questios ad Aswers o Maximum Likelihood L. Magee Fall, 2008 1. Give: a observatio-specific log likelihood fuctio l i (θ) = l f(y i x i, θ) the log likelihood fuctio l(θ y, X) = l i(θ) a data set (x i,
More information17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15
17. Joit distributios of extreme order statistics Lehma 5.1; Ferguso 15 I Example 10., we derived the asymptotic distributio of the maximum from a radom sample from a uiform distributio. We did this usig
More informationPattern Classification
Patter Classificatio All materials i these slides were tae from Patter Classificatio (d ed) by R. O. Duda, P. E. Hart ad D. G. Stor, Joh Wiley & Sos, 000 with the permissio of the authors ad the publisher
More informationIntroduction to Probability I: Expectations, Bayes Theorem, Gaussians, and the Poisson Distribution. 1
Itroductio to Probability I: Expectatios, Bayes Theorem, Gaussias, ad the Poisso Distributio. 1 Pakaj Mehta February 25, 2019 1 Read: This will itroduce some elemetary ideas i probability theory that we
More information