1 Example

The logistic regression model is thus a GLM with canonical link function, so that the log-odds equals the linear predictor, that is,
\[
\log \frac{p}{1-p} = \beta_0 + \beta_1 f_1(y_1) + \cdots + \beta_d f_d(y_d).
\]
The probit link is the inverse of the distribution function for the normal distribution; that is, with
\[
\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{y^2}{2}\right) \, \mathrm{d}y
\]
the success probability is related to the linear predictor via $p = \Phi(\eta)$, and the link function is $l = \Phi^{-1}$.
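As a small illustration (not part of the original slides), the sketch below evaluates the two inverse link functions mentioned above, the canonical logit and the probit, on a grid of linear-predictor values; both map $\eta$ to a success probability in $(0, 1)$:

```python
# Compare the logistic (canonical) and probit inverse link functions;
# both send a linear predictor eta to a probability p in (0, 1).
import numpy as np
from scipy.stats import norm

def inv_logit(eta):
    """Canonical inverse link: p = exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + np.exp(-eta))

def inv_probit(eta):
    """Probit inverse link: p = Phi(eta), the standard normal CDF."""
    return norm.cdf(eta)

eta = np.linspace(-4, 4, 9)   # a grid of linear-predictor values
print(inv_logit(eta))
print(inv_probit(eta))
```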
2 Example

The Poisson point probabilities are
\[
p(x) = e^{-\lambda} \frac{\lambda^x}{x!} = \exp\bigl(x \log(\lambda) - \lambda - \log(x!)\bigr),
\]
thus we have $\theta = \log(\lambda)$, $b(\theta) = e^{\theta}$ and $c(x) = \log(x!)$. The mean and variance are computed as
\[
EX = \frac{\mathrm{d}b}{\mathrm{d}\theta}(\theta) = e^{\theta} = \lambda
\quad \text{and} \quad
VX = \frac{\mathrm{d}^2 b}{\mathrm{d}\theta^2}(\theta) = e^{\theta} = \lambda.
\]
The canonical link function is $\log$, and when the log of the mean is a linear combination of the covariates we often talk about a log-linear model.
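The exponential-family identities above are easy to check numerically. The following sketch (illustrative values only) compares $b'(\theta) = b''(\theta) = e^{\theta}$ with the simulated mean and variance of a Poisson sample:

```python
# Numerical check: for Poisson(lambda) the derivatives of b(theta) = e^theta
# should match the simulated mean and variance; lambda is a toy value.
import numpy as np

rng = np.random.default_rng(0)
lam = 3.5
theta = np.log(lam)           # canonical parameter theta = log(lambda)
x = rng.poisson(lam, size=200_000)

print(np.exp(theta))          # b'(theta) = b''(theta) = lambda
print(x.mean(), x.var())      # both approximately lambda
```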
3 Classification problems

- Medical diagnosis.
- Drug tests.
- Spam detection.
- Mail sorting based on automatic reading of postal codes.
- Gene discovery.
- ...
4 Prediction problems

- Stock prices.
- Prices in general/interest rates.
- Temperature increase.
- Technical prediction problems, e.g. ELISA (calibration).
- Survival times given diagnosis.
- ...
5 Models behind prediction

Methods based on probabilistic modeling are often called generative modeling: we construct models that can actually generate data, as opposed to pure prediction methods that produce mathematics/algorithms that predict but cannot generate data.

Pure prediction: Fit a least squares regression line to data and use it for prediction. The line cannot generate data.

Model based prediction: Formulate the generative model, including the iid $\varepsilon$'s, say, and their distribution. We can still use the line for prediction, but given that the model is adequate we can also use the model for addressing questions such as the performance of the method.
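To make the contrast concrete, here is a small sketch on simulated data (toy numbers, not from the lecture): the least squares fit by itself only predicts, while the same line combined with an estimated error distribution defines a generative model that can simulate new data sets:

```python
# Contrast pure prediction with model-based (generative) prediction on
# simulated data; all numbers here are illustrative.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=x.size)

# Pure prediction: a least squares line, usable only for predicting.
b, a = np.polyfit(x, y, 1)          # slope, intercept
y_pred = a + b * x

# Model-based prediction: the line plus an estimated iid error
# distribution is a generative model and can produce new data.
sigma_hat = np.std(y - y_pred, ddof=2)
y_new = a + b * x + rng.normal(0, sigma_hat, size=x.size)
print(a, b, sigma_hat)
```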
6 Prediction and classification

We want to predict the value of an unobserved random variable $Y$ given that $X = x$. For this to be sensible, $X$ and $Y$ should certainly not be independent. What we need is the conditional distribution of $Y$ given $X = x$, which tells us precisely which values $Y$ could take.

If the joint density (or point probabilities) is $f : E_1 \times E_2 \to (0, \infty)$, then the conditional distribution of $Y$ given $X = x$ has conditional density (or conditional point probabilities)
\[
f(y \mid x) = \frac{f(x, y)}{f_1(x)} \tag{1}
\]
where $f_1(x) = \int_{E_2} f(x, y) \, \mathrm{d}y$ is the density for the marginal distribution of $X$.
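For discrete variables, equation (1) is just a row-wise normalization of the joint probability table. A toy illustration (the numbers are invented for the example):

```python
# Conditional point probabilities f(y | x) = f(x, y) / f1(x) for a
# discrete joint distribution given as a table; toy numbers.
import numpy as np

f = np.array([[0.10, 0.20],    # rows index x in {0, 1}
              [0.30, 0.40]])   # columns index y in {0, 1}
f1 = f.sum(axis=1, keepdims=True)   # marginal point probabilities of X
f_cond = f / f1                     # row x holds f(. | x)
print(f_cond)                       # each row sums to 1
```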
7 Classification

If $E_2$ is discrete while $E_1$ is continuous or discrete, we often talk about classification, in particular if $E_2 = \{0, 1\}$. If the conditional distribution of $X$ given $Y = y$ has density $g(x \mid y)$, the marginal distribution of $X$ is
\[
P(X \in A) = \sum_{y \in E_2} P(X \in A, Y = y)
= \sum_{y \in E_2} P(X \in A \mid Y = y) f_2(y)
= \sum_{y \in E_2} f_2(y) \int_A g(x \mid y) \, \mathrm{d}x
= \int_A \sum_{y \in E_2} f_2(y) g(x \mid y) \, \mathrm{d}x.
\]
8 Classification

The previous computations show that the marginal distribution of $X$ has density
\[
f_1(x) = \sum_{y \in E_2} f_2(y) g(x \mid y).
\]
The conditional distribution of $Y$ given $X = x$ has point probabilities
\[
f(y \mid x) = \frac{f_2(y) g(x \mid y)}{f_1(x)}.
\]
9 Maximum a posteriori predictor

If we know the conditional distribution of $Y$ given $X = x$, the maximum a posteriori predictor is the predictor
\[
\hat{y}(x) = \arg\max_y f(y \mid x) = \arg\max_y \frac{f_2(y) g(x \mid y)}{f_1(x)}.
\]
Since $f_1(x)$ does not depend on $y$, the maximum a posteriori predictor also equals
\[
\hat{y}(x) = \arg\max_y f_2(y) g(x \mid y) = \arg\max_y \{\log f_2(y) + \log g(x \mid y)\}.
\]
Mathematically this is extremely close to interpreting $y$ as a parameter, computing the log-likelihood function $\log g(x \mid y)$, and adding a penalty term $\log f_2(y)$. However, a parameter is fixed when the experiment is repeated and a random variable is not.
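A minimal sketch of the maximum a posteriori predictor for a binary $Y$, assuming, purely for illustration, Gaussian class-conditional densities $g(x \mid y)$ with known means and a known prior $f_2$:

```python
# Maximum a posteriori prediction: argmax over y of
# log f2(y) + log g(x | y); all parameters are assumed toy values.
import numpy as np
from scipy.stats import norm

prior = np.array([0.7, 0.3])             # f2(y) for y = 0, 1
mu, sigma = np.array([0.0, 2.0]), 1.0    # g(x | y) is N(mu[y], sigma^2)

def y_hat(x):
    scores = np.log(prior) + norm.logpdf(x, loc=mu, scale=sigma)
    return int(np.argmax(scores))

print([y_hat(x) for x in (-1.0, 0.5, 1.5, 3.0)])   # [0, 0, 1, 1]
```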
10 Example

If $E_2 = \mathbb{R}$ and the conditional distribution of $Y$ given $X = x$ is a normal distribution $N(g(x), \sigma^2)$, then
\[
f(y \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - g(x))^2}{2\sigma^2}\right).
\]
Thus the maximum a posteriori predictor is
\[
\hat{y}(x) = \arg\max_{y \in \mathbb{R}} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - g(x))^2}{2\sigma^2}\right)
= \arg\min_{y \in \mathbb{R}} (y - g(x))^2 = g(x),
\]
that is, the maximum a posteriori predictor is the conditional expectation $g(x)$ of $Y$ given $X = x$.
11 Statistical decision theory

With a loss function $L : E_2 \times E_2 \to [0, \infty)$ and $\hat{y} : E_1 \to E_2$ any predictor, $L(y, \hat{y}(x))$ is the loss of predicting $\hat{y}(x)$ when $Y = y$. The distribution of the losses is summarized by the expected prediction error
\[
\mathrm{EPE}(\hat{y}) = E(L(Y, \hat{y}(X))).
\]
We define an optimal predictor as a predictor that minimizes EPE over the set of possible predictors.
12 Example

The squared error loss is $L(y, \hat{y}) = (y - \hat{y})^2$. The optimal predictor under squared error loss is the conditional expectation of $Y$ given $X = x$ (see the notes for the computations).
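The claim is easy to probe by Monte Carlo. In the toy model below (parameters invented for the illustration), $E(Y \mid X = x) = x^2$, and the conditional mean attains a smaller expected squared error loss than a shifted competitor:

```python
# Monte Carlo check that the conditional expectation minimizes the
# expected squared error loss; toy generative model.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, 100_000)
y = x**2 + rng.normal(0, 1, size=x.size)   # E[Y | X = x] = x^2

def epe(predictor):
    return np.mean((y - predictor(x))**2)

print(epe(lambda t: t**2))         # about 1.0 (the noise variance)
print(epe(lambda t: t**2 + 0.5))   # about 1.25, strictly worse
```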
13 Classification

With $E_2 = \{1, \ldots, n\}$ we talk about classification, and the values of the losses, $L(y, \hat{y})$, can be organized as the matrix
\[
L = \begin{pmatrix}
L(1, 1) & L(1, 2) & \ldots & L(1, n) \\
\vdots & & & \vdots \\
L(n, 1) & L(n, 2) & \ldots & L(n, n)
\end{pmatrix},
\]
usually with the diagonal elements being 0 and the off-diagonal elements $> 0$. Then
\[
\mathrm{EPE}(\hat{y}) = E(L(Y, \hat{y}(X)))
= \int \Bigl( \sum_y L(y, \hat{y}(x)) f(y \mid x) \Bigr) f_1(x) \, \mathrm{d}x.
\]
14 Zero-one loss function

The zero-one loss function is given by $L(i, j) = 1$ whenever $i \neq j$ and $L(i, i) = 0$. Then $L(Y, \hat{y}(X))$ is a Bernoulli variable, and $\mathrm{EPE}(\hat{y})$ equals the probability that the loss equals 1. Using the zero-one loss function, the expected prediction error is thus the probability of misclassification:
\[
\mathrm{EPE}(\hat{y}) = P(Y \neq \hat{y}(X)) = \int \sum_{y \neq \hat{y}(x)} f(y \mid x) \, f_1(x) \, \mathrm{d}x.
\]
15 Zero-one loss function

$\mathrm{EPE}(\hat{y})$ is minimized by minimizing
\[
\sum_{y \neq \hat{y}(x)} f(y \mid x) = 1 - f(\hat{y}(x) \mid x)
\]
for every $x \in E_1$. The optimal choice of $\hat{y}$ is
\[
\hat{y}(x) = \arg\max_y f(y \mid x),
\]
which is precisely the maximum a posteriori predictor. In this context it is also known as the Bayes classifier, and $\mathrm{EPE}(\hat{y}) = P(Y \neq \hat{y}(X))$ (the probability of misclassification) is called the Bayes rate.
16 Confusion matrix

With $E_2 = \{0, 1\}$ we denote the marginal distribution of $Y$ by $\pi$ and write $\pi_0$ and $\pi_1$ for the point probabilities. The Bayes classifier divides the sample space $E_1$ into the two sets
\[
G_0 = \Bigl\{ x \in E_1 \Bigm| \frac{g(x \mid 0)}{g(x \mid 1)} > \frac{\pi_1}{\pi_0} \Bigr\}
\quad \text{and} \quad
G_1 = \Bigl\{ x \in E_1 \Bigm| \frac{g(x \mid 0)}{g(x \mid 1)} < \frac{\pi_1}{\pi_0} \Bigr\}.
\]
The confusion matrix is

                  predicted 0   predicted 1
    observed 0       p_00          p_01
    observed 1       p_10          p_11

where $p_{ij} = P(Y = i, \hat{y}(X) = j)$ (Bayes rate $= p_{01} + p_{10}$).
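The entries $p_{ij}$ can be estimated by simulation. The sketch below (toy parameters, assumed for illustration) draws from a two-class model with normal class-conditional densities and classifies each point by comparing $\pi_0 g(x \mid 0)$ with $\pi_1 g(x \mid 1)$:

```python
# Monte Carlo sketch of the confusion-matrix entries
# p_ij = P(Y = i, yhat(X) = j) for the Bayes classifier; toy parameters.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
pi0, pi1 = 0.6, 0.4
mu0, mu1, sigma = 0.0, 2.0, 1.0

n = 200_000
y = rng.binomial(1, pi1, n)
x = rng.normal(np.where(y == 1, mu1, mu0), sigma)

# Classify x as 1 exactly when g(x|0)/g(x|1) < pi1/pi0, i.e. x in G_1.
y_hat = (pi0 * norm.pdf(x, mu0, sigma)
         < pi1 * norm.pdf(x, mu1, sigma)).astype(int)

p = np.array([[np.mean((y == i) & (y_hat == j)) for j in (0, 1)]
              for i in (0, 1)])
print(p)                  # estimated confusion matrix
print(p[0, 1] + p[1, 0])  # estimated Bayes rate
```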
17 Estimation

Problem: Statistical decision theory is based on knowledge of the probability generating mechanism, and we never have that.

One solution: Estimate the probability measure and compute the resulting optimal predictor (the plug-in principle again).

Keep in mind: We lose optimality when estimating. The estimated predictor may be good, but essentially we are not able to tell. A central question is how the estimation affects the performance of the resulting predictor.
18 Example

If $E_1 = \mathbb{R}$ and $P(X \leq x, Y = y) = \pi_y G_y(x)$, $y = 0, 1$, then
\[
P(X \leq x) = \pi_0 G_0(x) + \pi_1 G_1(x).
\]
If $g(x \mid y)$ is a density for the conditional distribution, the marginal distribution of $X$ has density
\[
f_1(x) = \pi_0 g(x \mid 0) + \pi_1 g(x \mid 1).
\]
If the conditional distributions are $N(\mu(0), \sigma^2)$ and $N(\mu(1), \sigma^2)$, respectively, then the boundary between $G_0$ and $G_1$ is the point
\[
\frac{2\sigma^2 (\log \pi_0 - \log \pi_1) + \mu(1)^2 - \mu(0)^2}{2(\mu(1) - \mu(0))}.
\]
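With toy parameters (assumed for the illustration) the boundary point can be evaluated directly, and one can check that the two weighted class densities agree there:

```python
# Decision boundary between N(mu0, sigma^2) and N(mu1, sigma^2) with
# priors pi0, pi1, using the closed form from the slide; toy values.
import numpy as np
from scipy.stats import norm

pi0, pi1 = 0.6, 0.4
mu0, mu1, sigma = 0.0, 2.0, 1.0

x_star = (2 * sigma**2 * (np.log(pi0) - np.log(pi1)) + mu1**2 - mu0**2) \
         / (2 * (mu1 - mu0))
print(x_star)
# The two weighted densities are equal at the boundary:
print(pi0 * norm.pdf(x_star, mu0, sigma), pi1 * norm.pdf(x_star, mu1, sigma))
```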
19 Example

Consider the logistic regression model of $Y$ given $X$. We obtain the estimates $\hat{\alpha}$ and $\hat{\beta}$, and the estimated point probability $\hat{p}(x)$ of $Y = 1$ given $X = x$ is
\[
\hat{p}(x) = \frac{\exp(\hat{\alpha} + \hat{\beta} x)}{1 + \exp(\hat{\alpha} + \hat{\beta} x)}.
\]
The logit transformation $p \mapsto \log(p/(1-p))$ is monotonically increasing, and $\hat{p}(x) > 1/2$ if and only if $\hat{\alpha} + \hat{\beta} x > 0$. The Bayes classifier is therefore
\[
\hat{y}(x) = \begin{cases} 1 & \text{if } \hat{\alpha} + \hat{\beta} x > 0 \\ 0 & \text{if } \hat{\alpha} + \hat{\beta} x < 0. \end{cases}
\]
Observe that $LD_{50}$ is the boundary point that separates the $G_0$ set from the $G_1$ set.
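An end-to-end sketch on simulated data (all values are assumptions for the illustration): estimate $\hat{\alpha}$ and $\hat{\beta}$ by numerically maximizing the likelihood, then classify by the sign of $\hat{\alpha} + \hat{\beta} x$; the printed ratio $-\hat{\alpha}/\hat{\beta}$ is the boundary point:

```python
# Logistic regression by maximum likelihood, followed by classification
# via the sign of alpha_hat + beta_hat * x; simulated toy data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = rng.normal(0, 2, 500)
p_true = 1 / (1 + np.exp(-(-1.0 + 1.5 * x)))   # true alpha = -1, beta = 1.5
y = rng.binomial(1, p_true)

def neg_loglik(par):
    eta = par[0] + par[1] * x
    return np.sum(np.log1p(np.exp(eta)) - y * eta)

alpha_hat, beta_hat = minimize(neg_loglik, x0=[0.0, 0.0]).x
y_pred = (alpha_hat + beta_hat * x > 0).astype(int)
print(alpha_hat, beta_hat, -alpha_hat / beta_hat)   # last: boundary point
```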
20 Generalization error

The following is phrased in a setup where we have a parameterized model $(P_\theta)_{\theta \in \Theta}$ for the joint distribution of $(X, Y)$ on $E_1 \times E_2$, and we have iid observations $(X_1, Y_1), \ldots, (X_n, Y_n)$.
\[
\mathrm{EPE}(\theta) = E_\theta(L(Y, \hat{y}_\theta(X)))
\]
is the minimal expected prediction error as a function of $\theta$. With reference to the plug-in principle, $\mathrm{EPE}(\hat{\theta})$ is an estimator of the unknown optimal value of EPE. But this is not so interesting; we want to know the expected prediction error of the non-optimal predictor $\hat{y}_{\hat{\theta}}$ under $P_\theta$. The generalization error or test error is
\[
\mathrm{Err}(\theta) = E_\theta(L(Y, \hat{y}_{\hat{\theta}}(X))),
\]
where the expectation $E_\theta$ denotes expectation over the estimator $\hat{\theta}$ as well as an independent copy of $(X, Y)$.
21 What we want to know

In reality we are mostly interested in the quantity
\[
R(\hat{\theta}, \theta) = E_\theta(L(Y, \hat{y}_{\hat{\theta}}(X)) \mid \hat{\theta}) \geq E_\theta(L(Y, \hat{y}_\theta(X))),
\]
which is the conditional expectation of the loss given the estimator. The inequality follows from the fact that $\hat{y}_\theta$ is the optimal predictor under $P_\theta$. But we cannot get our hands on $R(\hat{\theta}, \theta)$; the plug-in principle just yields
\[
R(\hat{\theta}, \hat{\theta}) = \mathrm{EPE}(\hat{\theta}),
\]
which is still not so interesting. The generalization error, $\mathrm{Err}(\theta) = E_\theta(R(\hat{\theta}, \theta))$, is the expected performance of the estimated predictor and is thus a measure of the quality of the estimator.
22 Plug-in estimate of Err

We have a dataset $(x, y) = (x_1, \ldots, x_n, y_1, \ldots, y_n) \in E_1^n \times E_2^n$ and an estimate $\hat{\theta} = \hat{\theta}(x, y)$. We want to compute the plug-in estimate $\mathrm{Err}(\hat{\theta})$.

- Choose $B$ sufficiently large and simulate $B$ new independent, identically distributed data sets, $(x^1, y^1), \ldots, (x^B, y^B) \in E_1^n \times E_2^n$, each simulation being from the probability measure $P_{\hat{\theta}}$.
- Compute, for each data set $(x^i, y^i)$, $i = 1, \ldots, B$, new estimates $\hat{\theta}^i = \hat{\theta}(x^i, y^i)$ using the estimator $\hat{\theta}$, and new optimal predictors $\hat{y}_{\hat{\theta}^i}$.
- Compute $R(\hat{\theta}^i, \hat{\theta})$ for $i = 1, \ldots, B$. This may again be done via simulations.
- Compute
\[
\widehat{\mathrm{Err}}(\hat{\theta}) = \frac{1}{B} \sum_{i=1}^{B} R(\hat{\theta}^i, \hat{\theta}).
\]
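Below is a sketch of the whole recipe for the two-normal-classes example from earlier, as a parametric bootstrap. The model, the moment estimator, the Monte Carlo evaluation of $R$, and all sample sizes are choices made for the illustration, and $\sigma = 1$ is treated as known:

```python
# Parametric bootstrap estimate of Err for a two-class model with
# N(mu_y, 1) class-conditionals; everything here is illustrative.
import numpy as np

rng = np.random.default_rng(4)

def simulate(theta, n):
    pi1, mu0, mu1 = theta
    y = rng.binomial(1, pi1, n)
    x = rng.normal(np.where(y == 1, mu1, mu0), 1.0)
    return x, y

def estimate(x, y):
    # Simple moment estimator of (pi1, mu0, mu1).
    return (y.mean(), x[y == 0].mean(), x[y == 1].mean())

def R(theta_fit, theta_true, m=50_000):
    # Monte Carlo estimate of the misclassification probability of the
    # classifier fitted under theta_fit, on data drawn from theta_true.
    x, y = simulate(theta_true, m)
    pi1, mu0, mu1 = theta_fit
    boundary = (2 * (np.log(1 - pi1) - np.log(pi1)) + mu1**2 - mu0**2) \
               / (2 * (mu1 - mu0))
    return np.mean((x > boundary).astype(int) != y)

x, y = simulate((0.4, 0.0, 2.0), n=200)   # the "observed" data
theta_hat = estimate(x, y)

B = 200
err_hat = np.mean([R(estimate(*simulate(theta_hat, 200)), theta_hat)
                   for _ in range(B)])
print(err_hat)
```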