Administration. Homework 1 on web page, due Feb 11. NSERC summer undergraduate award applications due Feb 5. Some helpful books.
1 STA 450/4000 S, January 2005 — Administration
- Homework 1 on web page, due Feb 11
- NSERC summer undergraduate award applications due Feb 5
- Some helpful books
2 ... administration
3 ... administration
- collection of tools for regression and classification
- some old (least squares, discriminant analysis), some new (lasso, support vector machines)
- statistical justifications: loss, likelihood, mean squared error, classification error, posterior probabilities, ...
- statistical thinking: a framework for analysing new tools
4 Linear regression plus
- variable selection: forward, backward, stepwise, all possible subsets
- comparing models: adjusted R², C_p vs. p, AIC and variants, K-fold cross-validation
- shrinkage methods: ridge regression, lasso, Least Angle Regression
- tuning parameter (amount of shrinkage): validation data, K-fold cross-validation
- derived variables: principal components regression, partial least squares
C_p and AIC can be shown to be estimates of expected prediction error
5 Predictions with smoothing
regression on training data: the x's are centered and scaled when fitting lasso, LAR, and ridge regression; β̂_0 is not included in the shrinkage

β̂_ridge = argmin_β Σ_{i=1}^N (y_i − β_0 − Σ_{j=1}^p x_ij β_j)² + λ Σ_{j=1}^p β_j²   (.4)
β̂_lasso = argmin_β Σ_{i=1}^N (y_i − β_0 − Σ_{j=1}^p x_ij β_j)² + λ Σ_{j=1}^p |β_j|   (.5)

β̂_0 = ȳ = ȳ_train; for predicting a new y_0 = β̂_0 + Σ_{j=1}^p (x_j⁰ − x̄_j) β̂_j, use β̂_0 = ȳ_train, where x̄_j is the average of the jth feature on the training data
see construction of tx and mm in TableR.txt
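A minimal sketch of this centre-and-scale convention, on simulated data (this is not the course's `TableR.txt` construction): the ridge estimate is computed in closed form on standardized inputs, the intercept is left out of the penalty, and a new case is predicted with β̂_0 = ȳ_train and the training means and scales.

```r
# Ridge regression with the centring/scaling convention from the slide
# (simulated data; lambda is an arbitrary illustrative value).
set.seed(1)
n <- 50; p <- 3
X <- matrix(rnorm(n * p), n, p)
y <- 2 + X %*% c(1, -1, 0.5) + rnorm(n)

# centre and scale the training inputs; intercept excluded from shrinkage
xbar <- colMeans(X); s <- apply(X, 2, sd)
Z <- scale(X, center = xbar, scale = s)

lambda <- 1
# closed-form ridge estimate on the standardized inputs
beta <- solve(t(Z) %*% Z + lambda * diag(p), t(Z) %*% y)
beta0 <- mean(y)                      # beta0-hat = mean of training y

# predict a new observation: standardize with the *training* means/scales
x0 <- c(0.2, -0.1, 0.3)
y0hat <- beta0 + sum(((x0 - xbar) / s) * beta)
```

Setting `lambda <- 0` recovers ordinary least squares on the standardized inputs, which is one way to sanity-check the algebra.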
6 ...predictions
> options(digits=4)
> as.vector(predict.lars(pr.lars, newx=test[,1:8], type="fit", s=0.6, mode="fraction")$fit)
> as.vector(tx %*% coef(pr.lars, s=0.6, mode="fraction")) + mean(train$lpsa)
(the two calls return identical fitted values; the numeric output was not preserved in the transcription)
7 Derived features (§3.5)
- replace x_1, ..., x_p with linear combinations of the columns
- principal components from the SVD are natural candidates
  X = UDV^T, U^T U = I, V^T V = I
  z_m = X v_m, m = 1, ..., M < p
- the z_m are orthogonal by construction
- ŷ_pcr(M) = ȳ 1 + Σ_{m=1}^M θ̂_m z_m, where θ̂_m = ⟨z_m, y⟩ / ⟨z_m, z_m⟩ = Σ_i z_mi y_i / Σ_i z_mi²
- inputs should be scaled first (mean 0, variance 1)
- angle brackets notation explained at (.5)
- Exercise: show that z̄_m = 0
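The PCR recipe above can be sketched in a few lines on simulated, already centred/scaled inputs (M and the data are arbitrary choices for illustration); this also verifies the exercise that each derived feature z_m has mean zero.

```r
# Principal components regression via the SVD (simulated data).
set.seed(2)
n <- 40; p <- 4; M <- 2
X <- scale(matrix(rnorm(n * p), n, p))      # mean 0, variance 1 inputs
y <- X %*% c(1, 0.5, 0, 0) + rnorm(n)

s <- svd(X)                                  # X = U D V^T
Z <- X %*% s$v[, 1:M]                        # derived features z_m = X v_m
theta <- drop(t(Z) %*% y) / colSums(Z^2)     # <z_m, y> / <z_m, z_m>
yhat <- mean(y) + Z %*% theta                # yhat_pcr(M)

crossprod(Z[, 1], Z[, 2])   # the z_m are orthogonal by construction
colMeans(Z)                  # and centred (the exercise)
```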
8 ... derived features
- a closely related method, partial least squares, also constructs derived variables
- widely used in chemometrics, where often p > N
- see §3.6 for discussion
9 [figure slide; no text transcribed]
10 (Chapter 4) Linear methods for classification
- inputs X = (X_1, ..., X_p) (notation x used on p.0)
- output takes values in one of K classes
- output G is a group label: values 1, ..., K
- construct a response Y as needed (Y ↔ G), e.g. Y = 1, 0 as G = blue, orange; Fig.; eq. (.7)
- data (x_i, g_i), i = 1, ..., N
- goal: learn a model to predict the correct class for a future observation, based on its inputs
11 Linear methods
- rule: Ĝ = 1 if a linear function of the inputs > something, else Ĝ = 2 (with extensions for K > 2)
- see Fig. 2.2 for a nonlinear method
- note that boundaries can be curved by including terms like X_j² and X_j X_k: see Fig. 4.1
- types of linear methods:
  - linear regression (4.2): Y = β_0 + β^T X
  - linear discriminant analysis (4.3)
  - logistic regression (4.4): log{P(Y = 1)/P(Y = 0)} = β_0 + β^T X
  - separating hyperplanes (4.5)
12 Linear regression (§4.2)
Y is the N × K indicator matrix with rows y_i = (y_i1, ..., y_iK), a "multivariate Bernoulli":

Y = ( y_11 ... y_1K ; y_21 ... y_2K ; ... ; y_N1 ... y_NK )

E(Y|X) = XB, where X is the matrix with rows x_i^T (augmented with a leading 1)
B̂ = (X^T X)^{−1} X^T Y : dimension?
new observation (1, x_0^T); new prediction (1, x_0^T) B̂ = (f̂_1, ..., f̂_K): find the largest
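A small sketch of the indicator-matrix regression classifier, on simulated three-class data (the new point `x0` is made up for illustration). A useful by-product to check: with an intercept column, the fitted f̂_k(x_0) sum to 1.

```r
# Linear regression on an indicator response matrix (simulated data).
set.seed(3)
n <- 60; p <- 2; K <- 3
g <- rep(1:K, each = n / K)
X <- cbind(1, matrix(rnorm(n * p), n, p) + g)   # rows (1, x_i^T), class-shifted
Y <- outer(g, 1:K, "==") * 1                    # N x K indicator matrix
B <- solve(t(X) %*% X, t(X) %*% Y)              # Bhat: (p + 1) x K

x0 <- c(1, 3, 3)                                # new observation (1, x0^T)
f <- drop(x0 %*% B)                             # (fhat_1, ..., fhat_K)
which.max(f)                                    # predicted class: the largest
```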
13 ... linear regression
- if K = 2 this is linear regression of a 0/1 response
- it will produce predictions larger than 1 and smaller than 0
- more natural to consider instead modelling Pr(Y = 1|X) directly, with methods that predict in (0, 1)
- but if the data support 0.2 < Pr(Y = 1|X) < 0.8, results will not be very different; Figure 4.
- targets t_k (p. 04): t_k = (0, ..., 1, ..., 0), with the 1 in position k
- classify to the class whose target is closest: minimize ||f̂(x_0) − t_k||²; another way to state the least squares solution
14 Discriminant analysis (§4.3)
G ∈ {1, 2, ..., K}; f_k(x) = f(x|G = k) = density of x in class k
new ingredient: the density of the inputs
Bayes theorem:
Pr(G = k|x) = f(x|G = k) π_k / f(x),  k = 1, ..., K
associated classification rule: assign a new observation to class k if
Pr(G = k|x) > Pr(G = k′|x) for all k′ ≠ k (maximize the posterior probability)
15 Special case — normal: x|G = k ∼ N_p(μ_k, Σ_k)

Pr(G = k|x) ∝ π_k (2π)^{−p/2} |Σ_k|^{−1/2} exp{−½ (x − μ_k)^T Σ_k^{−1} (x − μ_k)}

which is maximized by maximizing the log:

max_k {log π_k − ½ log|Σ_k| − ½ (x − μ_k)^T Σ_k^{−1} (x − μ_k)}

if we further assume Σ_k ≡ Σ, then

max_k {log π_k − ½ (x − μ_k)^T Σ^{−1} (x − μ_k)}
= max_k {log π_k − ½ (x^T Σ^{−1} x − x^T Σ^{−1} μ_k − μ_k^T Σ^{−1} x + μ_k^T Σ^{−1} μ_k)}
= max_k {log π_k + x^T Σ^{−1} μ_k − ½ μ_k^T Σ^{−1} μ_k}
16 Procedure: compute
δ_k(x) = log π_k + x^T Σ^{−1} μ_k − ½ μ_k^T Σ^{−1} μ_k
and classify observation x to the class k for which δ_k(x) is largest (see Figure 4.5, left)
Estimate the unknown parameters π_k, μ_k, Σ:
π̂_k = N_k / N,  μ̂_k = (1/N_k) Σ_{i: g_i = k} x_i
Σ̂ = Σ_{k=1}^K Σ_{i: g_i = k} (x_i − μ̂_k)(x_i − μ̂_k)^T / (N − K)
Figure 4.5, 4., 4.
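The plug-in procedure above can be coded directly; the following hedged sketch simulates two Gaussian classes with a common covariance, computes the pooled estimates, and evaluates the discriminant scores δ_k(x) at a new point.

```r
# Linear discriminant scores delta_k(x) from plug-in estimates
# (two simulated Gaussian classes; class 2 is shifted by (2, 2)).
set.seed(4)
n <- 100
g <- rep(1:2, each = n / 2)
X <- matrix(rnorm(n * 2), n, 2) + 2 * (g - 1)

Nk <- as.numeric(table(g)); pihat <- Nk / n
mu <- rbind(colMeans(X[g == 1, ]), colMeans(X[g == 2, ]))
# pooled covariance with divisor N - K
S <- (crossprod(sweep(X[g == 1, ], 2, mu[1, ])) +
      crossprod(sweep(X[g == 2, ], 2, mu[2, ]))) / (n - 2)
Sinv <- solve(S)

delta <- function(x, k)
  log(pihat[k]) + x %*% Sinv %*% mu[k, ] - 0.5 * mu[k, ] %*% Sinv %*% mu[k, ]

x <- c(2, 2)                      # a new observation near the class-2 mean
d <- c(delta(x, 1), delta(x, 2))
which.max(d)                      # class with the largest discriminant score
```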
17 Special case: 2 classes
log π̂_1 + x^T Σ̂^{−1} μ̂_1 − ½ μ̂_1^T Σ̂^{−1} μ̂_1 > (<) log π̂_2 + x^T Σ̂^{−1} μ̂_2 − ½ μ̂_2^T Σ̂^{−1} μ̂_2,
i.e. x^T Σ̂^{−1} (μ̂_1 − μ̂_2) > (<) ½ μ̂_1^T Σ̂^{−1} μ̂_1 − ½ μ̂_2^T Σ̂^{−1} μ̂_2 + log(N_2/N) − log(N_1/N)
LHS is a linear combination of the inputs; RHS is the cutpoint
If the Σ_k are not all equal, the discriminant function δ_k(x) defines a quadratic boundary; see Figure 4.6, left
An alternative is to augment the original set of features with quadratic terms and use linear discriminant functions; see Figure 4.6, right
18 Another description of LDA (§4.3.2, 4.3.3)
Let
W = within-class covariance matrix = Σ_{k=1}^K Σ_{i: g_i = k} (x_i − x̄_k)(x_i − x̄_k)^T
B = between-class covariance matrix = Σ_{k=1}^K N_k (x̄_k − x̄)(x̄_k − x̄)^T
B + W = Σ_{k=1}^K Σ_{i: g_i = k} (x_i − x̄)(x_i − x̄)^T = T
linear classification rule: a^T x with
max_a a^T B a / a^T W a, equivalently max_a a^T B a subject to a^T W a = 1
19 ... LDA
The solution a_1, say, is the eigenvector of W^{−1}B corresponding to the largest eigenvalue. This determines a line in R^p.
Continue, finding a_2, orthogonal (with respect to W) to a_1, which is the eigenvector corresponding to the second largest eigenvalue, and so on.
There are at most min(p, K − 1) positive eigenvalues.
These eigenvectors are the linear discriminants, also called canonical variates.
This technique can be useful for visualization of the groups. Figure 4.
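The eigen-description above is easy to check numerically; this sketch (simulated data with K = 3 classes in p = 4 dimensions, not the wine data) builds W and B from their definitions and confirms that W^{−1}B has at most min(p, K − 1) = 2 nonzero eigenvalues.

```r
# Canonical variates as eigenvectors of W^{-1} B (simulated data).
set.seed(5)
n <- 90; p <- 4; K <- 3
g <- rep(1:K, each = n / K)
# class means (0,0,0,0), (3,0,0,0), (0,3,0,0): they span two directions
shift <- cbind(3 * (g == 2), 3 * (g == 3), 0, 0)
X <- matrix(rnorm(n * p), n, p) + shift

xbar <- colMeans(X)
W <- matrix(0, p, p); B <- matrix(0, p, p)
for (k in 1:K) {
  Xk <- X[g == k, , drop = FALSE]
  mk <- colMeans(Xk)
  W <- W + crossprod(sweep(Xk, 2, mk))        # within-class scatter
  B <- B + nrow(Xk) * tcrossprod(mk - xbar)   # between-class scatter
}
ev <- eigen(solve(W) %*% B)
Re(ev$values)   # at most min(p, K - 1) = 2 eigenvalues are nonzero
```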
20 ... LDA (§4.3.2)
- write Σ̂ = UDU^T, where U^T U = I and D is diagonal (see p.09 for Σ̂)
- X* = D^{−1/2} U^T X, so that X* has Σ̂* = I ("sphering")
- classification rule: choose k if μ̂_k* is closest (closest class centroid)
- only needs the K points μ̂_k, and the (K − 1)-dimensional subspace they span, to compute this, since the remaining directions are orthogonal (in the X* space)
- if K = 3 can plot the first two variates (cf. the wine data)
- Figures 4.4 and 4.8; algorithm on p.4
21 R code for the wine data
> library(MASS)
> wine.lda <- lda(class ~ alcohol + malic + ash + alcil + mag + totphen + flav + nonflav + proanth + col + hue + dil + proline, data = wine)
> wine.lda
Call:
lda.formula(class ~ alcohol + malic + ash + alcil + mag + totphen + flav + nonflav + proanth + col + hue + dil + proline, data = wine)
Prior probabilities of groups:
Group means:
  alcohol malic ash alcil mag totphen flav nonflav proanth col hue dil proline
Coefficients of linear discriminants:
  LD1 LD2 for alcohol, malic, ash, alcil, mag, totphen, flav, nonflav, ...
(numeric output not preserved in the transcription)
22 [Plot: wine data projected onto the canonical variates; axes "first linear discriminant" vs. "second linear discriminant"]
23 Logistic regression
data (x_i, g_i), g_i = 1, 2; equivalently (x_i, y_i), y_i = 0, 1
natural starting point is a Bernoulli distribution for y_i: y_i = 1 with probability p_i = p_i(β)
likelihood function
L(β) = Π_{i=1}^N p_i^{y_i} (1 − p_i)^{1 − y_i}
log-likelihood
l(β) = Σ_{i=1}^N [ y_i log p_i(β) + (1 − y_i) log{1 − p_i(β)} ]
A common choice for p_i(β) is the logistic function
p_i(β) = exp(β^T x_i) / {1 + exp(β^T x_i)}
x_i is a column vector: the ith row of X is x_i^T
24 Likelihood methods
log-likelihood
l(β) = Σ_{i=1}^N { y_i β^T x_i − log(1 + e^{β^T x_i}) }
(constant term included in the set of inputs; β has length p + 1)
Maximum likelihood estimate of β: ∂l(β)/∂β |_{β=β̂} = 0, i.e.
Σ_{i=1}^N y_i x_ij = Σ_{i=1}^N p_i(β̂) x_ij,  j = 1, ..., p
Fisher information
−∂²l(β)/∂β ∂β^T |_{β=β̂} = Σ_{i=1}^N x_i x_i^T p̂_i (1 − p̂_i)
Fitting: use an iteratively reweighted least squares algorithm; equivalent to Newton-Raphson
Asymptotics: β̂ →_d N(β, {−l″(β̂)}^{−1})
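The score and information formulas above translate directly into a Newton-Raphson (IRLS) loop; this sketch on simulated data (with the constant included as the first element of each x_i, as on the slide) checks the hand-rolled iteration against `glm`.

```r
# Newton-Raphson / IRLS for logistic regression, checked against glm()
# (simulated data; beta_true is an arbitrary illustrative choice).
set.seed(6)
n <- 200
X <- cbind(1, rnorm(n), rnorm(n))           # first element of x_i is 1
beta_true <- c(-0.5, 1, -1)
prob <- 1 / (1 + exp(-X %*% beta_true))
y <- rbinom(n, 1, prob)

beta <- rep(0, ncol(X))
for (it in 1:25) {
  eta <- drop(X %*% beta)
  mu <- 1 / (1 + exp(-eta))                  # p_i(beta)
  w <- mu * (1 - mu)                         # IRLS weights
  score <- t(X) %*% (y - mu)                 # score vector
  info <- t(X) %*% (w * X)                   # Fisher information
  beta <- drop(beta + solve(info, score))    # Newton step
}
fit <- glm(y ~ X[, -1], family = binomial)
cbind(irls = beta, glm = unname(coef(fit)))  # the two columns agree
```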
25 Inference
Component: β̂_j ≈ N(β_j, σ̂_j²), with σ̂_j² = [{−l″(β̂)}^{−1}]_jj; gives a t-test (z-test) for each component
2{l(β̂) − l(β_j, β̂_{(j)})} ≈ χ²_{dim β_j}; in particular for each component we get a χ²_1, or equivalently
sign(β̂_j − β_j) √[ 2{l(β̂) − l(β_j, β̂_{(j)})} ] ≈ N(0, 1)
this gives two (asymptotically equivalent) ways to test whether an input is statistically significant
To compare nested models M_0 ⊂ M_1 we can use this twice to get
2{l_{M_1}(β̂) − l_{M_0}(β̃_q)} ≈ χ²_{p−q},
which provides a test of the adequacy of M_0
LHS is the difference in (residual) deviances; analogous to ΔSS in regression
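In R the deviance-difference test for nested models falls out of two `glm` fits; a hedged sketch on simulated data (where the extra input `x2` is pure noise, so M_0 should look adequate):

```r
# Comparing nested logistic models by the drop in deviance (simulated data).
set.seed(7)
n <- 300
x1 <- rnorm(n); x2 <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.3 + x1))   # x2 plays no role in the truth

m0 <- glm(y ~ x1, family = binomial)        # M_0
m1 <- glm(y ~ x1 + x2, family = binomial)   # M_1 (adds one parameter)

lrt <- deviance(m0) - deviance(m1)   # 2{ l_M1(betahat) - l_M0(betatilde) }
pval <- pchisq(lrt, df = 1, lower.tail = FALSE)
c(lrt = lrt, p = pval)
```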
26 Heart Data
> library(ElemStatLearn)
> data(SAheart)
> hr = SAheart
> pairs(hr[,1:9], pch=21, bg=c("red","green")[codes(factor(hr$chd))])
> hr.glm = glm(chd ~ ., data = hr, family=binomial)
> summary(hr.glm)
Call:
glm(formula = chd ~ ., family = binomial, data = hr)
Deviance Residuals:
    Min  1Q  Median  3Q  Max
Coefficients: Estimate Std. Error z value Pr(>|z|)
(Intercept)      ...  e-06 ***
sbp              ...
tobacco          ...  **
ldl              ...  **
adiposity        ...
famhistPresent   ...  e-05 ***
typea            ...  **
obesity          ...
alcohol          ...
age              ...  ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 596.1 on 461 degrees of freedom
Residual deviance: 472.14 on 452 degrees of freedom
AIC: 492.14
27 [Scatterplot matrix (pairs plot) of the heart-data inputs: sbp, tobacco, ldl, adiposity, famhist, typea, obesity, alcohol, age]
28 Extensions
- E(y_i) = p_i, var(y_i) = p_i(1 − p_i) under the Bernoulli model
- Often the model is generalized to allow var(y_i) = φ p_i(1 − p_i); called over-dispersion
- Most software provides an estimate of φ based on residuals
- if y_i ∼ Binom(n_i, p_i) the same model applies; E(y_i) = n_i p_i and var(y_i) = n_i p_i(1 − p_i) under the Binomial
- Model selection uses AIC: −2 l(β̂) + 2p
- In R, use glm to fit logistic regression, step for model selection
- glm can be used for all exponential family models: uses iteratively reweighted least squares
- See also §4.4
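One way to get the dispersion estimate mentioned above is R's `quasibinomial` family, which reports a Pearson-residual-based φ̂; a hedged sketch on simulated grouped binomial data with a random effect built in to create extra-binomial variation:

```r
# Estimating the dispersion phi via family = quasibinomial (simulated data).
set.seed(8)
m <- 40; ni <- 20                            # m groups of size n_i = 20
x <- rnorm(m)
# a latent random effect makes the counts over-dispersed
p <- plogis(-0.2 + x + rnorm(m, sd = 0.7))
y <- rbinom(m, ni, p)

fit <- glm(cbind(y, ni - y) ~ x, family = quasibinomial)
phi <- summary(fit)$dispersion   # estimate of phi (1 under a plain binomial)
phi
```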
29 step
> step(hr.glm)
Start: AIC=492.14
chd ~ sbp + tobacco + ldl + adiposity + famhist + typea + obesity + alcohol + age
            Df Deviance AIC
- alcohol
- adiposity
- sbp
<none>
- obesity
- ldl
- tobacco
- typea
- age
- famhist
(deviance and AIC values not preserved in the transcription)

Step: AIC=490.4
chd ~ sbp + tobacco + ldl + adiposity + famhist + typea + obesity + age
            Df Deviance AIC
- adiposity
- sbp
<none>
- obesity
- ldl
- tobacco
- typea
- age
- famhist
30 ... step
Step: AIC=...
chd ~ sbp + tobacco + ldl + famhist + typea + obesity + age
            Df Deviance AIC
- sbp
<none>
- obesity
- tobacco
- ldl
- typea
- famhist
- age

Step: AIC=...
chd ~ tobacco + ldl + famhist + typea + obesity + age
            Df Deviance AIC
- obesity
<none>
- tobacco
- typea
- ldl
- famhist
- age
31 ... step
Step: AIC=...
chd ~ tobacco + ldl + famhist + typea + age
            Df Deviance AIC
<none>
- ldl
- typea
- tobacco
- famhist
- age

Call: glm(formula = chd ~ tobacco + ldl + famhist + typea + age, family = binomial, data = hr)
Coefficients: (Intercept) tobacco ldl famhistPresent typea age
Degrees of Freedom: 461 Total (i.e. Null); 456 Residual
Null Deviance: 596.1
Residual Deviance: ...  AIC: ...
32 Interpretation of coefficients
e.g. tobacco (measured in kg): coefficient 0.081
logit{p_i(β)} = β^T x_i
an increase of one unit in x_ij leads to an increase in logit p_i of 0.081,
i.e. an increase in the odds p_i/(1 − p_i) by the factor exp(0.081) = 1.084
estimated s.e. 0.026: interval for the change in logit p_i is 0.081 ± 2(0.026);
for the odds, (exp(0.081 − 2 × 0.026), exp(0.081 + 2 × 0.026)) = (1.03, 1.14)
similarly for age: β̂_j = 0.044; increased odds exp(0.044) = 1.045 for a 1-year increase
prediction of new values: classify to class 1 or 0 according as p̂ > (<) 0.5
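The tobacco calculation above generalizes to any fitted coefficient: exponentiate the estimate for the odds multiplier and exponentiate estimate ± 2 s.e. for a rough interval. A hedged sketch on simulated stand-in data (not the South African heart data):

```r
# Odds ratios and rough Wald intervals from a fitted logistic model
# (simulated data mimicking the slide's tobacco-style calculation).
set.seed(9)
n <- 500
tobacco <- rexp(n)                                  # made-up exposure
y <- rbinom(n, 1, plogis(-1 + 0.08 * tobacco))
fit <- glm(y ~ tobacco, family = binomial)

b  <- coef(fit)["tobacco"]
se <- sqrt(diag(vcov(fit)))["tobacco"]
or <- exp(b)                     # multiplicative change in odds per unit
ci <- exp(b + c(-2, 2) * se)     # ~95% interval, as on the slide
c(odds.ratio = unname(or), lo = unname(ci[1]), hi = unname(ci[2]))
```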
33 L1 regularization — eqn (4.)
max_{β_0, β} [ Σ_{i=1}^N { y_i(β_0 + β^T x_i) − log(1 + e^{β_0 + β^T x_i}) } − λ Σ_{j=1}^p |β_j| ]
note that the intercept has been separated out
Figure 4. — interpretation?
recently developed software: glmpath and glmnet
34 Notes
- how to choose between logistic regression and discriminant analysis?
- classification error on the training data is overly optimistic
- logistic regression (and its generalizations to K classes) doesn't assume any distribution for the inputs
- discriminant analysis is more efficient if the assumed distribution is correct
- warning: in §4.3, x and x_i are p-vectors, and we estimate β_0 and β, the latter a p-vector; in §4.4 they are (p + 1)-vectors with first element 1, and β is (p + 1)-dimensional
35 Misclassification
> hr.glm.class = predict(hr.glm) > 0
> table(hr.glm.class, hr$chd)
hr.glm.class   0   1
       FALSE   .   .
       TRUE   46   8
(some counts not preserved in the transcription)
> hr.glm = glm(chd ~ tobacco + ldl + famhist + age, data = hr, family=binomial)
> table(predict(hr.glm) > 0, hr$chd)
> hr.lda = lda(chd ~ ., data = hr)
> table(predict(hr.lda)$class, hr$chd)
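The confusion-table idiom above (cutting at linear predictor > 0, i.e. p̂ > 0.5) can be sketched on simulated data; note `predict.glm` returns the linear predictor by default, which is why the cut is at 0 rather than 0.5.

```r
# Training-set confusion matrix and misclassification rate for a
# logistic fit (simulated data, not the heart data).
set.seed(10)
n <- 400
x <- matrix(rnorm(n * 2), n, 2)
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))
fit <- glm(y ~ x, family = binomial)

pred <- as.integer(predict(fit) > 0)  # linear predictor > 0  <=>  phat > 0.5
tab <- table(pred, y)                 # confusion matrix
err <- mean(pred != y)                # training misclassification rate
tab
err
```

As the slide before this one warns, this training-set error rate is overly optimistic as an estimate of prediction error.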
More informationAnswer Key for STAT 200B HW No. 8
Answer Key for STAT 200B HW No. 8 May 8, 2007 Problem 3.42 p. 708 The values of Ȳ for x 00, 0, 20, 30 are 5/40, 0, 20/50, and, respectively. From Corollary 3.5 it follows that MLE exists i G is identiable
More informationNeural networks (not in book)
(not in book) Another approach to classification is neural networks. were developed in the 1980s as a way to model how learning occurs in the brain. There was therefore wide interest in neural networks
More informationStatistical Machine Learning Hilary Term 2018
Statistical Machine Learning Hilary Term 2018 Pier Francesco Palamara Department of Statistics University of Oxford Slide credits and other course material can be found at: http://www.stats.ox.ac.uk/~palamara/sml18.html
More informationRegression, Ridge Regression, Lasso
Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.
More informationIntroduction to Machine Learning
1, DATA11002 Introduction to Machine Learning Lecturer: Antti Ukkonen TAs: Saska Dönges and Janne Leppä-aho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer,
More informationStat 579: Generalized Linear Models and Extensions
Stat 579: Generalized Linear Models and Extensions Yan Lu Jan, 2018, week 3 1 / 67 Hypothesis tests Likelihood ratio tests Wald tests Score tests 2 / 67 Generalized Likelihood ratio tests Let Y = (Y 1,
More informationSTAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method.
STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method. Rebecca Barter May 5, 2015 Linear Regression Review Linear Regression Review
More informationSingle-level Models for Binary Responses
Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent
More informationFSAN815/ELEG815: Foundations of Statistical Learning
FSAN815/ELEG815: Foundations of Statistical Learning Gonzalo R. Arce Chapter 14: Logistic Regression Fall 2014 Course Objectives & Structure Course Objectives & Structure The course provides an introduction
More information10702/36702 Statistical Machine Learning, Spring 2008: Homework 3 Solutions
10702/36702 Statistical Machine Learning, Spring 2008: Homework 3 Solutions March 24, 2008 1 [25 points], (Jingrui) (a) Generate data as follows. n = 100 p = 1000 X = matrix(rnorm(n p), n, p) beta = c(rep(10,
More informationMATH 829: Introduction to Data Mining and Analysis Logistic regression
1/11 MATH 829: Introduction to Data Mining and Analysis Logistic regression Dominique Guillot Departments of Mathematical Sciences University of Delaware March 7, 2016 Logistic regression 2/11 Suppose
More information7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis
Lecture 6: Logistic Regression Analysis Christopher S. Hollenbeak, PhD Jane R. Schubart, PhD The Outcomes Research Toolbox Review Homework 2 Overview Logistic regression model conceptually Logistic regression
More informationMachine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart
Machine Learning Bayesian Regression & Classification learning as inference, Bayesian Kernel Ridge regression & Gaussian Processes, Bayesian Kernel Logistic Regression & GP classification, Bayesian Neural
More informationLeast Squares Regression
CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the
More informationProbabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationBIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation
BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)
More informationMSH3 Generalized linear model Ch. 6 Count data models
Contents MSH3 Generalized linear model Ch. 6 Count data models 6 Count data model 208 6.1 Introduction: The Children Ever Born Data....... 208 6.2 The Poisson Distribution................. 210 6.3 Log-Linear
More informationMSH3 Generalized linear model
Contents MSH3 Generalized linear model 5 Logit Models for Binary Data 173 5.1 The Bernoulli and binomial distributions......... 173 5.1.1 Mean, variance and higher order moments.... 173 5.1.2 Normal limit....................
More informationLDA, QDA, Naive Bayes
LDA, QDA, Naive Bayes Generative Classification Models Marek Petrik 2/16/2017 Last Class Logistic Regression Maximum Likelihood Principle Logistic Regression Predict probability of a class: p(x) Example:
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationData Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models Generalized Linear Models - part III Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.
More informationLeast Squares Regression
E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute
More informationIntroduction to the Generalized Linear Model: Logistic regression and Poisson regression
Introduction to the Generalized Linear Model: Logistic regression and Poisson regression Statistical modelling: Theory and practice Gilles Guillot gigu@dtu.dk November 4, 2013 Gilles Guillot (gigu@dtu.dk)
More informationGeneralized Linear Models 1
Generalized Linear Models 1 STA 2101/442: Fall 2012 1 See last slide for copyright information. 1 / 24 Suggested Reading: Davison s Statistical models Exponential families of distributions Sec. 5.2 Chapter
More information