Homework 2: Solutions
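Statistics 63, Fall 207

Theoretical Problems

1. Since $\hat\beta = \arg\min_\beta \{ \|Y - X\beta\|_2^2/2n + \lambda\|\beta\|_1 \}$, we have:

$$\|Y - X\hat\beta\|_2^2/2n + \lambda\|\hat\beta\|_1 \le \|Y - X\beta_0\|_2^2/2n + \lambda\|\beta_0\|_1$$

Also, $\beta_0$ is the true parameter value, so $Y = X\beta_0 + \epsilon$. Hence:

$$\begin{aligned}
\|X\beta_0 + \epsilon - X\hat\beta\|_2^2/2n + \lambda\|\hat\beta\|_1 &\le \|X\beta_0 + \epsilon - X\beta_0\|_2^2/2n + \lambda\|\beta_0\|_1 \\
\|X(\hat\beta - \beta_0) - \epsilon\|_2^2/2n + \lambda\|\hat\beta\|_1 &\le \|\epsilon\|_2^2/2n + \lambda\|\beta_0\|_1 \\
\|X(\hat\beta - \beta_0)\|_2^2/2n - \epsilon^T X(\hat\beta - \beta_0)/n + \lambda\|\hat\beta\|_1 &\le \lambda\|\beta_0\|_1 \\
\|X(\hat\beta - \beta_0)\|_2^2/n + 2\lambda\|\hat\beta\|_1 &\le 2\epsilon^T X(\hat\beta - \beta_0)/n + 2\lambda\|\beta_0\|_1
\end{aligned}$$

2. Write $J(\theta) = \frac{1}{n}\sum_{i=1}^n \log\left(1 + e^{-y^{(i)}\theta^T x^{(i)}}\right)$ with labels $y^{(i)} \in \{-1, 1\}$, and $h_\theta(x) = g(\theta^T x)$ with $g(z) = 1/(1 + e^{-z})$. Since

$$\frac{\partial}{\partial\theta_k}\log\left(1 + e^{-y x^T\theta}\right) = \frac{-y x_k\, e^{-y x^T\theta}}{1 + e^{-y x^T\theta}} = -h_\theta(-yx)\, y\, x_k,$$

we have:

$$\frac{\partial J(\theta)}{\partial\theta_k} = -\frac{1}{n}\sum_{i=1}^n h_\theta(-y^{(i)}x^{(i)})\, y^{(i)} x^{(i)}_k$$

The Hessian:

$$H_{kl} = \frac{\partial^2 J(\theta)}{\partial\theta_k\,\partial\theta_l} = -\frac{1}{n}\sum_{i=1}^n \frac{\partial}{\partial\theta_l}\, h_\theta(-y^{(i)}x^{(i)})\, y^{(i)} x^{(i)}_k = \frac{1}{n}\sum_{i=1}^n h_\theta(x^{(i)})\left(1 - h_\theta(x^{(i)})\right) x^{(i)}_l x^{(i)}_k$$

The last equality uses that, for $g(z) = 1/(1 + e^{-z})$, $g'(z) = g(z)(1 - g(z))$. Therefore, for $h(x) = g(\theta^T x)$, $\partial h(x)/\partial\theta_k = h(x)(1 - h(x))x_k$. It also uses $(y^{(i)})^2 = 1$ and $g(-z) = 1 - g(z)$, so that $h(-yx)(1 - h(-yx)) = h(x)(1 - h(x))$. So we have for the Hessian matrix $H$:

$$H = \frac{1}{n}\sum_{i=1}^n h(x^{(i)})\left(1 - h(x^{(i)})\right) x^{(i)} x^{(i)T}$$

To prove $H$ is positive semidefinite, we show $z^T H z \ge 0$ for all $z$:

$$\begin{aligned}
z^T H z &= \frac{1}{n}\, z^T\left(\sum_{i=1}^n h(x^{(i)})\left(1 - h(x^{(i)})\right) x^{(i)} x^{(i)T}\right) z \\
&= \frac{1}{n}\sum_{i=1}^n h(x^{(i)})\left(1 - h(x^{(i)})\right) z^T x^{(i)} x^{(i)T} z \\
&= \frac{1}{n}\sum_{i=1}^n h(x^{(i)})\left(1 - h(x^{(i)})\right) (z^T x^{(i)})^2 \ge 0
\end{aligned}$$

The last inequality holds since $0 \le h(x^{(i)}) \le 1$, which implies $h(x^{(i)})(1 - h(x^{(i)})) \ge 0$, and $(z^T x^{(i)})^2 \ge 0$.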
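The positive semidefiniteness argument in Problem 2 can be checked numerically. The following is a minimal base-R sketch on simulated data; the sample size, the dimension, and the names `X`, `theta`, `g` and `H` are illustrative assumptions, not part of the original solution.

```r
# Illustrative check that H = (1/n) * sum_i h_i (1 - h_i) x_i x_i^T is PSD.
set.seed(1)
n <- 50; p <- 4
X <- matrix(rnorm(n * p), n, p)        # rows play the role of x^{(i)}
theta <- rnorm(p)                      # an arbitrary parameter value

g <- function(z) 1 / (1 + exp(-z))     # logistic function
h <- g(drop(X %*% theta))              # h(x^{(i)}) for each observation
w <- h * (1 - h)                       # weights, all in [0, 1/4]

# Scale row i of X by sqrt(w_i); crossprod then gives sum_i w_i x_i x_i^T
H <- crossprod(X * sqrt(w)) / n

eig <- eigen(H, symmetric = TRUE, only.values = TRUE)$values
z <- rnorm(p)
quad <- drop(t(z) %*% H %*% z)         # z^T H z for one random z
c(min_eigenvalue = min(eig), quadratic_form = quad)
```

Both reported numbers should be non-negative up to rounding error, in line with the proof above.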
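3. (a) The between-class covariance is

$$\Sigma_B = \frac{1}{n}\sum_{k=1}^K n_k \mu_k \mu_k^T$$

where $n_k$ is the number of observations in the $k$-th class.

(b) Since $Y$ is the $n \times K$ indicator matrix of class labels, $X^T Y$ gives us a $p \times K$ matrix $[\, n_1\mu_1 \;\; n_2\mu_2 \;\; \cdots \;\; n_K\mu_K \,]$. Therefore

$$\Sigma_B = \frac{1}{n}(X^T Y)(Y^T Y)^{-1}(X^T Y)^T$$

(c) We have

$$\Sigma_W = \frac{1}{n}\sum_{k=1}^K \sum_{\{i:\, y_{ik} = 1\}} (X_i - \mu_k)(X_i - \mu_k)^T$$

where $X_i$ is the $i$-th observation (the $i$-th row of $X$, written as a column vector). We can also write $\Sigma_W$ as

$$\Sigma_W = \frac{1}{n} X^T\left(I - Y(Y^T Y)^{-1} Y^T\right) X$$

Hence

$$\begin{aligned}
\Sigma_B + \Sigma_W &= \frac{1}{n}(X^T Y)(Y^T Y)^{-1}(X^T Y)^T + \frac{1}{n} X^T\left(I - Y(Y^T Y)^{-1} Y^T\right) X \\
&= \frac{1}{n} X^T Y(Y^T Y)^{-1} Y^T X + \frac{1}{n} X^T\left(I - Y(Y^T Y)^{-1} Y^T\right) X \\
&= \frac{1}{n} X^T X = \Sigma_T
\end{aligned}$$

4. Let us assume that the data has been centered so that the grand mean $\mu = 0$. Let

- $K$ be the total number of classes
- $X$ be the data matrix
- $Y$ be an $n \times K$ indicator matrix of class membership
- $n_i$ be the number of samples in class $i$
- $N = \sum_{i=1}^K n_i$ be the total number of samples
- $\mu$ be the grand mean of the data, by assumption $0$
- $\mu_i$ be the (estimated) center of class $i$
- $\Sigma_W$ be the within-class covariance
- $\Sigma_B$ be the between-class covariance

Before digging into details, note that

$$Y^T X = \left(\, n_1\mu_1 \;\; n_2\mu_2 \;\; \cdots \;\; n_K\mu_K \,\right)^T$$

which gives

$$(Y^T Y)^{-1} Y^T X = M = \left(\, \mu_1 \;\; \mu_2 \;\; \cdots \;\; \mu_K \,\right)^T \in \mathbb{R}^{K \times p}$$

(see footnote 1).

Footnote 1: Notation-wise, it may help to recall that $M$ is an upper-case $\mu$.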
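To see this, note that

$$\begin{aligned}
M_{i,j} = (\mu_i)_j &= \frac{1}{n_i}\sum_{k:\, y_k = i} (x_k)_j = \frac{1}{n_i}\sum_{k:\, y_k = i} X_{k,j} \\
&= \frac{1}{n_i}\sum_{k=1}^N \mathbb{1}_{y_k = i}\, X_{k,j} = \frac{1}{n_i}\sum_{k=1}^N Y_{k,i} X_{k,j} \\
&= \frac{1}{n_i}\sum_{k=1}^N Y^T_{i,k} X_{k,j} = \frac{1}{n_i}(Y^T X)_{i,j}
\end{aligned}$$

where the $(i,i)$-th element of $Y^T Y$ is $n_i$ and we don't worry about cross terms since $(Y^T Y)^{-1}$ is diagonal (see footnote 2).

Recall that, for a general centered data set $Z$ of $k$ observations, the covariance is given by $\frac{1}{k} Z^T Z$. Applying this principle to $M = (Y^T Y)^{-1} Y^T X$, we have:

$$\begin{aligned}
\Sigma_B = \frac{M^T M}{K} &= \frac{\left((Y^T Y)^{-1} Y^T X\right)^T (Y^T Y)^{-1} Y^T X}{K} \\
&= \frac{X^T Y (Y^T Y)^{-1}(Y^T Y)^{-1} Y^T X}{K} = \frac{X^T Y (Y^T Y)^{-2} Y^T X}{K}
\end{aligned}$$

When all classes have the same size $n_i = N/K$, we have $(Y^T Y)^{-1} = (K/N) I$, so this expression equals $\frac{1}{N} X^T Y (Y^T Y)^{-1} Y^T X$. From here, recall that

$$\begin{aligned}
\Sigma_T = \frac{1}{N} X^T X &= \frac{1}{N} X^T\left[ Y(Y^T Y)^{-1} Y^T + I - Y(Y^T Y)^{-1} Y^T \right] X \\
&= \frac{1}{N} X^T\left[ Y(Y^T Y)^{-1} Y^T \right] X + \frac{1}{N} X^T\left[ I - Y(Y^T Y)^{-1} Y^T \right] X \\
&= \Sigma_B + \Sigma_W
\end{aligned}$$

as claimed in lecture. This can also be verified directly (taking equal class sizes $n_k = N/K$):

$$\begin{aligned}
(\Sigma_W)_{i,j} &= \frac{1}{K}\sum_{k=1}^K \left(\Sigma_W^{(k)}\right)_{i,j} \\
&= \frac{1}{K}\sum_{k=1}^K \left( E[X_{\cdot,i} X_{\cdot,j} \mid y = k] - E[X_{\cdot,i} \mid y = k]\, E[X_{\cdot,j} \mid y = k] \right) \\
&= \frac{1}{K}\sum_{k=1}^K \left( \frac{1}{n_k}\sum_{l=1}^N \mathbb{1}_{y_l = k}\, x_{l,i} x_{l,j} - \mu_{k,i}\mu_{k,j} \right) \\
&= \frac{1}{N}\sum_{l=1}^N x_{l,i} x_{l,j} - \frac{1}{K}\sum_{k=1}^K \mu_{k,i}\mu_{k,j} \\
&= \frac{1}{N}\sum_{l=1}^N X^T_{i,l} X_{l,j} - \frac{1}{K}\sum_{k=1}^K M^T_{i,k} M_{k,j} \\
&= \left( \frac{1}{N} X^T X - \frac{1}{K} M^T M \right)_{i,j} \\
&= (\Sigma_T)_{i,j} - (\Sigma_B)_{i,j}
\end{aligned}$$

This reflects a general result of probability theory, the Law of Total Variance:

$$\underbrace{\operatorname{Var}(X)}_{\text{total variance of } X} = \underbrace{E[\operatorname{Var}(X \mid Y)]}_{\text{within-group variance}} + \underbrace{\operatorname{Var}(E[X \mid Y])}_{\text{between-group variance}} \implies E[\operatorname{Var}(X \mid Y)] = \operatorname{Var}(X) - \operatorname{Var}(E[X \mid Y])$$

From here forward, let us assume without loss of generality that $K = 2$ and $n_1 = n_2$ (hence $N = 2n_1$).

Equivalence of LDA and FDA: With the above relationships worked out, we can now prove the equivalence of LDA and FDA. Recall that FDA solves the problem:

$$\text{maximize}_{\beta}\;\; \beta^T \Sigma_B \beta \quad \text{subject to} \quad \beta^T \Sigma_W \beta = 1$$

This is a generalized eigenvalue problem and can be solved easily. We can write it in Lagrangian form and take the gradient with respect to $\beta$:

$$\begin{aligned}
L &= \beta^T \Sigma_B \beta - \lambda(\beta^T \Sigma_W \beta - 1) \\
0 = \nabla_\beta L &= 2\Sigma_B\beta - 2\lambda\Sigma_W\beta \\
\implies \Sigma_B\beta &= \lambda\Sigma_W\beta \\
\implies \Sigma_W^{-1}\Sigma_B\beta &= \lambda\beta
\end{aligned}$$

assuming $\Sigma_W$ is invertible (see footnote 3). Hence, our solution vector $\beta$ is the first eigenvector of $\Sigma_W^{-1}\Sigma_B$ (see footnote 4).

Alternatively, we can consider FDA as the problem of finding $w$ which maximizes the ratio of the between- and within-class variances:

$$J(w) = \frac{w^T \Sigma_B w}{w^T \Sigma_W w}$$

This problem does not have a unique solution ($J(w) = J(\alpha w)$ for any nonzero $\alpha \in \mathbb{R}$, $w \in \mathbb{R}^p$), but our decision rule depends only on the direction of $w$, not on its scale, so this isn't a problem and we can play a bit fast-and-loose with constants. Taking the gradient of $J(\cdot)$ and setting it equal to zero we find:

$$\begin{aligned}
0 = \nabla_w J &= \frac{(w^T \Sigma_W w)(2\Sigma_B w) - (w^T \Sigma_B w)(2\Sigma_W w)}{(w^T \Sigma_W w)^2} \\
\implies (w^T \Sigma_B w)(2\Sigma_W w) &= (w^T \Sigma_W w)(2\Sigma_B w) \\
\implies (w^T \Sigma_B w)(\Sigma_W w) &= (w^T \Sigma_W w)(\Sigma_B w) \\
\implies \Sigma_W w &\propto \Sigma_B w
\end{aligned}$$

Here we note that $\Sigma_B w$ will always lie in the span of $\mu_2 - \mu_1$, so we have

$$\Sigma_W w \propto \mu_2 - \mu_1 \quad\text{or}\quad w \propto \Sigma_W^{-1}(\mu_2 - \mu_1)$$

which defines the discriminant vector.

Now consider LDA. From [HTF09, Eq. 4.9], we know that the decision boundary for two-class LDA is a line of the form:

$$\begin{aligned}
0 &= \log\left(\frac{n_1}{n_2}\right) - \frac{1}{2}(\mu_1 + \mu_2)^T \Sigma_W^{-1}(\mu_1 - \mu_2) + x^T \Sigma_W^{-1}(\mu_1 - \mu_2) \\
&= \log\left(\frac{n_1}{n_2}\right) - \frac{1}{2}\left( \mu_1^T\Sigma_W^{-1}\mu_1 - \mu_1^T\Sigma_W^{-1}\mu_2 + \mu_2^T\Sigma_W^{-1}\mu_1 - \mu_2^T\Sigma_W^{-1}\mu_2 \right) + x^T \Sigma_W^{-1}(\mu_1 - \mu_2) \\
&= \log\left(\frac{n_1}{n_2}\right) - \frac{1}{2}\Big( \mu_1^T\Sigma_W^{-1}\mu_1 - \mu_1^T\Sigma_W^{-1}\mu_2 + \underbrace{\mu_1^T\Sigma_W^{-1}\mu_2}_{\text{transpose of a scalar is itself}} - \mu_2^T\Sigma_W^{-1}\mu_2 \Big) + x^T \Sigma_W^{-1}(\mu_1 - \mu_2) \\
&= \log\left(\frac{n_1}{n_2}\right) - \frac{1}{2}\mu_1^T\Sigma_W^{-1}\mu_1 + \frac{1}{2}\mu_2^T\Sigma_W^{-1}\mu_2 + x^T \Sigma_W^{-1}(\mu_1 - \mu_2)
\end{aligned}$$

Hence the decision boundary is a hyperplane whose normal vector lies in the span of $\Sigma_W^{-1}(\mu_1 - \mu_2)$. By construction, it is clear that $\operatorname{Span}(\mu_1 - \mu_2) = \operatorname{range}(\Sigma_B)$, so we have the same direction as before and hence the same decision boundary (see footnote 5).

Equivalence of FDA and Optimal Scoring: Next we show that FDA and Optimal Scoring are equivalent.

Footnote 2: $(Y^T Y)_{i,j} = \sum_{k=1}^N Y^T_{i,k} Y_{k,j} = \sum_{k=1}^N Y_{k,i} Y_{k,j} = \sum_{k=1}^N \mathbb{1}_{y_k = i}\mathbb{1}_{y_k = j} = \sum_{k=1}^N \mathbb{1}_{y_k = i = j} = \mathbb{1}_{i = j}\, n_i$. $(Y^T Y)^{-1}$ is diagonal because the inverse of a diagonal matrix is simply the matrix with the diagonal (non-zero) elements inverted.

Footnote 3: A reasonable assumption since $\Sigma_W$ is a covariance matrix (and hence positive semi-definite by construction). If it is not invertible, then our data lies in a linear manifold and we should apply some form of dimension reduction before classification.

Footnote 4: If we considered the $K$-class case, FDA would identify $K - 1$ eigenvectors. Note here that $\Sigma_W^{-1}\Sigma_B$ has only one eigenvector with a non-zero eigenvalue under the centering constraint.

Footnote 5: For completeness, we should show that the constant from LDA has a relationship with the decision boundary from FDA. I omit this step.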
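The scatter decomposition and the agreement between the FDA and LDA directions can be checked numerically. The sketch below uses base R on simulated two-class data with equal class sizes and a centered design, matching the assumptions above; the data, the class shift, and all variable names are illustrative assumptions.

```r
# Illustrative check: Sigma_B + Sigma_W = Sigma_T, and the leading eigenvector
# of Sigma_W^{-1} Sigma_B is parallel to Sigma_W^{-1} (mu_2 - mu_1).
set.seed(3)
n1 <- 40; p <- 3
cls <- rep(1:2, each = n1)                    # two classes of equal size
N <- length(cls)
X <- matrix(rnorm(N * p), N, p)
X[cls == 2, ] <- sweep(X[cls == 2, ], 2, c(1, -0.5, 2), "+")  # shift class 2
X <- sweep(X, 2, colMeans(X))                 # center: grand mean is 0

Y <- diag(2)[cls, ]                           # N x K indicator matrix
P <- Y %*% solve(crossprod(Y)) %*% t(Y)       # Y (Y^T Y)^{-1} Y^T
M <- solve(crossprod(Y)) %*% t(Y) %*% X       # K x p matrix of class means

Sigma_T <- crossprod(X) / N
Sigma_B <- t(X) %*% P %*% X / N
Sigma_W <- t(X) %*% (diag(N) - P) %*% X / N
decomp_gap <- max(abs(Sigma_B + Sigma_W - Sigma_T))

# FDA direction: first eigenvector of Sigma_W^{-1} Sigma_B
beta_fda <- Re(eigen(solve(Sigma_W, Sigma_B))$vectors[, 1])
# LDA direction: Sigma_W^{-1} (mu_2 - mu_1)
w_lda <- solve(Sigma_W, M[2, ] - M[1, ])

cosine <- sum(beta_fda * w_lda) / sqrt(sum(beta_fda^2) * sum(w_lda^2))
c(decomp_gap = decomp_gap, abs_cosine = abs(cosine))
```

A gap near zero and an absolute cosine near one are what the derivation predicts; the sign of the eigenvector is arbitrary, which is why the absolute value is compared.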
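We first find a solution to the Optimal Scoring problem:

$$\text{minimize}_{\beta,\Theta}\;\; \|Y\Theta - X\beta\|_2^2 \quad \text{subject to} \quad \Theta^T Y^T Y \Theta = 1$$

Let us fix $\beta$ temporarily and optimize with respect to $\Theta \in \mathbb{R}^2$. Moving the constraint into a penalty in the Lagrangian form of the problem, we cast this as a generalized ridge regression problem (see footnote 6):

$$\text{minimize}_{\Theta}\;\; \|Y\Theta - X\beta\|_2^2 + \lambda\Theta^T Y^T Y \Theta$$

with solution given by:

$$\begin{aligned}
L &= \|Y\Theta - X\beta\|_2^2 + \lambda\Theta^T Y^T Y \Theta \\
0 = \nabla_\Theta L &= 2Y^T(Y\Theta - X\beta) + 2\lambda Y^T Y \Theta \\
\implies 2Y^T X\beta &= 2Y^T Y\Theta + 2\lambda Y^T Y \Theta \\
\implies Y^T X\beta &= (Y^T Y + \lambda Y^T Y)\Theta \\
\implies \Theta &= (Y^T Y + \lambda Y^T Y)^{-1} Y^T X\beta = \frac{1}{1 + \lambda}(Y^T Y)^{-1} Y^T X\beta \\
\implies Y\Theta &= \frac{1}{1 + \lambda} Y(Y^T Y)^{-1} Y^T X\beta
\end{aligned}$$

Note here that $Y^T Y$ is the diagonal matrix of counts so it is invertible. Next we choose $\lambda$ so that the original problem is feasible:

$$\begin{aligned}
1 = \|Y\Theta\|_2^2 &= \left[\frac{1}{1 + \lambda} Y(Y^T Y)^{-1} Y^T X\beta\right]^T \left[\frac{1}{1 + \lambda} Y(Y^T Y)^{-1} Y^T X\beta\right] \\
&= \frac{1}{(1 + \lambda)^2}\,\beta^T X^T Y(Y^T Y)^{-1} Y^T Y (Y^T Y)^{-1} Y^T X\beta \\
&= \frac{1}{(1 + \lambda)^2}\,\beta^T X^T Y(Y^T Y)^{-1} Y^T X\beta \\
\implies \lambda &= \sqrt{\beta^T X^T Y(Y^T Y)^{-1} Y^T X\beta} - 1
\end{aligned}$$

Substituting this back into the original optimal scoring problem (dropping additive constants and absorbing factors of $N$ into $\beta$), we find the optimal $\beta$ is that which satisfies:

$$\text{minimize}_{\beta}\;\; -2\sqrt{\beta^T \Sigma_B \beta} + \beta^T \Sigma_T \beta$$

or equivalently

$$\text{minimize}_{\beta}\;\; -2\sqrt{\beta^T \Sigma_B \beta} + \beta^T \Sigma_B \beta + \beta^T \Sigma_W \beta$$

Footnote 6: With the substitutions:

| Generalized Ridge | Optimal Scoring |
|---|---|
| $\beta$ | $\Theta$ |
| $\Omega$ | $Y^T Y$ |
| $Y$ | $X\beta$ |
| $X$ | $Y$ |

To avoid clutter, let $\tilde\beta = \Sigma_W^{1/2}\beta$ and $\tilde\Sigma_B = \Sigma_W^{-1/2}\Sigma_B\Sigma_W^{-1/2}$. Our problem then becomes

$$\text{minimize}_{\tilde\beta}\;\; -2\sqrt{\tilde\beta^T \tilde\Sigma_B \tilde\beta} + \tilde\beta^T(\tilde\Sigma_B + I)\tilde\beta$$

Suppose $\hat\beta$ is a non-trivial solution to this problem: we then have $\hat\beta^T \tilde\Sigma_B \hat\beta > 0$, so $\hat\beta$ satisfies the first-order condition

$$\frac{\tilde\Sigma_B \hat\beta}{\sqrt{\hat\beta^T \tilde\Sigma_B \hat\beta}} = (\tilde\Sigma_B + I)\hat\beta$$

Rearranging gives $\left(1/\sqrt{\hat\beta^T \tilde\Sigma_B \hat\beta} - 1\right)\tilde\Sigma_B\hat\beta = \hat\beta$, and hence $\hat\beta$ is an eigenvector of $\tilde\Sigma_B$. Making the same notational substitutions into FDA, we see that FDA is characterized by the first eigenvector of $\tilde\Sigma_B$, hence the solutions are equal.

This proof is due to [WT, Appendix A.6] with $\Omega = 0$. [HBT95, Section 3] gives an alternate proof of this result (based on a clever use of the SVD) which you may find clearer.

Note: This equivalence only holds for the unpenalized form of these classifiers. The equivalence is broken for the penalized forms of these classifiers. See jrojo/4th-lehmann/slides/witten.pdf or [WT].

References

[HBT95] Trevor Hastie, Andreas Buja, and Robert Tibshirani. Penalized discriminant analysis. Annals of Statistics, 23(1):73-102, 1995.

[HTF09] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer, 2nd edition, February 2009. tibs/elemstatlearn/.

[WT] Daniel M. Witten and Robert Tibshirani. Penalized classification using Fisher's linear discriminant. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 73(5), 2011.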
STAT 63 Homework II
Yanjun Yang

Data Analysis

1. (a) The test misclassification errors for these classifiers are:

[Table: Test Error Rate for NB, LDA, QDA, Logistic Regression (LR), LR with lasso, LR with ridge, LR with elastic net (alpha = 0.8), and Linear SVMs; numeric values not recoverable.]

(v) For regularized logistic regression, I tried the lasso, ridge, and elastic net penalties. If I had to choose one, I would choose ridge: the predictors are grayscale pixel values, which are highly correlated for handwritten zip codes.

(b) Logistic regression with ridge regularization and the linear SVMs gave the best performance. The predictors are highly correlated, especially for nearby pixels. On the other hand, since 3's and 8's differ only on the left side, only the predictors in that region are essential for separating 3's from 8's. Given the highly correlated predictors, ridge performs better than the lasso, so ridge-penalized logistic regression performs best. Linear SVMs with slack variables perform equally well because, viewed as a penalization method, they do a similar job. Meanwhile, the poor performance of QDA indicates that it overfitted the data; the decision boundaries are closer to linear. The R code is attached after Problem 2 of this part.
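The claim that ridge copes well with highly correlated predictors can be illustrated on a toy problem. The sketch below is an assumption-laden illustration, not the analysis from the homework: it uses squared-error ridge in closed form (rather than the penalized logistic regression fitted with glmnet), two simulated near-duplicate predictors, and a hypothetical helper named `ridge`.

```r
# Illustrative only: closed-form ridge on two nearly identical predictors.
set.seed(4)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)          # almost a copy of x1
y  <- x1 + x2 + rnorm(n, sd = 0.1)      # true coefficients are (1, 1)
X  <- cbind(x1, x2)

# Hypothetical helper: ridge estimate without intercept,
# (X^T X + lambda I)^{-1} X^T y; lambda = 0 gives least squares.
ridge <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

b_ols   <- ridge(X, y, lambda = 0)
b_ridge <- ridge(X, y, lambda = 1)
rbind(ols = b_ols, ridge = b_ridge)
```

The least-squares coefficients are poorly determined along the direction in which the two predictors differ, while the ridge penalty shrinks that direction and spreads the effect almost evenly over the correlated pair.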
2. (a) The test misclassification errors for these classifiers are:

[Table: Test Error Rate for NB, LDA, Multinomial Regression (MR), MR with lasso, MR with ridge, MR with elastic net (alpha = 0.8), MR with grouped lasso, and Linear SVMs (one vs one); numeric values not recoverable.]

For regularized multinomial regression, I tried the lasso, ridge, elastic net, and grouped lasso penalties. If I had to choose one, I would choose the grouped lasso, because the predictors are highly correlated and only some of them are essential for this classification. For linear SVMs, I used the one-vs-one method to implement a multi-class SVM: one-vs-one is usually more accurate than one-vs-all, and since we only have 10 classes here the computational cost is affordable.

(b) Linear SVMs perform best, followed by regularized multinomial regression with a grouped lasso penalty. Because the decision boundaries are close to linear and the predictors are highly correlated, linear SVMs with slack variables can be expected to perform very well in this situation. The confusion matrices for these classifiers are:

i. Naive Bayes:

[Confusion matrix (rows yhat, columns y); entries not recoverable.]

The most often misclassified digit is 4 (as 9), followed by 5 (as 6) and 3 (as 8).

ii. LDA:

[Confusion matrix (rows yhat, columns y); entries not recoverable.]

The most often misclassified digit is 2 (as 4 or 8), followed by 5 (as 3) and 8 (as 3).

iii. Multinomial regression:

[Confusion matrix (rows yhat, columns y); entries not recoverable.]

The most often misclassified digit is 8 (as 5, 0 or 2), followed by 2 (as 4 or 8) and 4 (as 9).

iv. Regularized multinomial regression (with a grouped lasso penalty):

[Confusion matrix (rows yhat, columns y); entries not recoverable.]

The most often misclassified digit is 2 (as 4 or 8), followed by 4 (as 2 or 9) and 8 (as 0 or 5).

v. Linear SVMs (one-vs-one):

[Confusion matrix (rows yhat, columns y); entries not recoverable.]

The most often misclassified digit is 2 (as 4 or 8), followed by 3 (as 5) and 8 (as 5 or 0).

Across all methods, the most often misclassified digits are 2 and 8: they rank in the top 3 for 4 of these 5 classifiers. This is because they look similar to many other classes in handwriting; for example, 2 looks similar to 4, 5, 8, etc., and 8 looks similar to 0, 2, 3, 5, etc. The R code is attached below, together with that for Problem 1.

```r
train = as.matrix(read.csv(file="zip.train.csv", header=FALSE))
test = as.matrix(read.csv(file="zip.test.csv", header=FALSE))
```
```r
# Problem 1: Binary classification
# Select digits 3 and 8 (the first column holds the class label)
tetrain = train[(train[,1]==3 | train[,1]==8),]
tetest = test[(test[,1]==3 | test[,1]==8),]
tetrainy = as.factor(tetrain[,1])
tetrainx = as.matrix(tetrain[,-1])
tetesty = as.factor(tetest[,1])
tetestx = as.matrix(tetest[,-1])

# Naive Bayes
library("e1071")
mod.nb = naiveBayes(x=tetrainx, y=as.factor(tetrainy))
pred.nb = predict(mod.nb, newdata = tetestx, type = "class")
conmat.nb = table(pred.nb, tetesty)
err.nb = 1 - mean(pred.nb == tetesty)

# LDA
library(MASS)
tetrain = data.frame(tetrain)
tetest = data.frame(tetest)
mod.lda = lda(V1~., data = tetrain)
pred.lda = predict(mod.lda, tetest)
conmat.lda = table(pred.lda$class, tetesty)
err.lda = 1 - mean(pred.lda$class == tetesty)

# QDA: the predictors are jittered first, presumably to avoid
# rank-deficient class covariance matrices
tetrain.j <- tetrain
tetrain.j[, -1] <- apply(tetrain[,-1], 2, jitter)
mod.qda = qda(V1~., data = tetrain.j)
pred.qda = predict(mod.qda, tetest)
conmat.qda = table(pred.qda$class, tetesty)
err.qda = 1 - mean(pred.qda$class == tetesty)

# Logistic regression: recode the response as TRUE for digit 3
tetrain.b <- tetrain
tetrain.b[,1] = (tetrain[,1]==3)
tetest.b <- tetest
tetest.b[,1] = (tetest[,1]==3)
mod.lr = glm(V1~., data = tetrain.b, family = "binomial")
pred.lr = predict(mod.lr, newdata = tetest.b, type = "response")
pred.lr2 = rep("8", length(tetesty))
pred.lr2[pred.lr>.5] = "3"
conmat.lr = table(pred.lr2, tetesty)
err.lr = 1 - mean(pred.lr2==tetesty)

# Regularized logistic regression
library("glmnet")
# lasso
tetrainy.b = as.numeric(tetrain.b[,1])
tetrainx.b = as.matrix(tetrain.b[,-1])
tetesty.b = as.numeric(tetest.b[,1])
tetestx.b = as.matrix(tetest.b[,-1])
mod.rlr.lasso = glmnet(tetrainx.b, tetrainy.b, family = "binomial", alpha = 1)
cv.rlr.lasso = cv.glmnet(tetrainx.b, tetrainy.b, family = "binomial", alpha = 1)
bestlam = cv.rlr.lasso$lambda.min
pred.rlr.lasso0 = predict(mod.rlr.lasso, s=bestlam, newx = tetestx, type = "response")
pred.rlr.lasso = rep("8", length(tetesty))
pred.rlr.lasso[pred.rlr.lasso0>.5] = "3"
conmat.rlr.lasso = table(pred.rlr.lasso, tetesty)
err.rlr.lasso = 1 - mean(pred.rlr.lasso==tetesty)

# ridge
mod.rlr.ridge = glmnet(tetrainx.b, tetrainy.b, family = "binomial", alpha = 0)
cv.rlr.ridge = cv.glmnet(tetrainx.b, tetrainy.b, family = "binomial", alpha = 0)
bestlam = cv.rlr.ridge$lambda.min
pred.rlr.ridge0 = predict(mod.rlr.ridge, s=bestlam, newx = tetestx, type = "response")
pred.rlr.ridge = rep("8", length(tetesty))
pred.rlr.ridge[pred.rlr.ridge0>.5] = "3"
conmat.rlr.ridge = table(pred.rlr.ridge, tetesty)
err.rlr.ridge = 1 - mean(pred.rlr.ridge==tetesty)

# elastic net
# Use the 0/1 response (1 = digit 3), as for lasso and ridge. With the factor
# tetrainy, glmnet would model the probability of the last level ("8"), and
# the 0.5 threshold below would assign the labels the wrong way round.
mod.rlr.ent = glmnet(tetrainx.b, tetrainy.b, family = "binomial", alpha = 0.8)
cv.rlr.ent = cv.glmnet(tetrainx.b, tetrainy.b, family = "binomial", alpha = 0.8)
bestlam = cv.rlr.ent$lambda.min
pred.rlr.ent0 = predict(mod.rlr.ent, s=bestlam, newx = tetestx, type = "response")
pred.rlr.ent = rep("8", length(tetesty))
pred.rlr.ent[pred.rlr.ent0>.5] = "3"
conmat.rlr.ent = table(pred.rlr.ent, tetesty)
err.rlr.ent = 1 - mean(pred.rlr.ent==tetesty)

# LiblineaR might be helpful as well
# Linear SVMs: tune the cost parameter by cross-validation
tetraindat = data.frame(x=tetrainx, y=as.factor(tetrainy))
tetestdat = data.frame(x=tetestx, y=as.factor(tetesty))
tune.svm = tune(svm, y~., data=tetraindat, kernel="linear",
                ranges = list(cost=c(0.001,0.005,0.01,0.05,0.1,1,5,10,100)),
                scale = FALSE)
bestsvmmod = tune.svm$best.model
pred.svm = predict(bestsvmmod, tetestdat)
conmat.svm = table(pred.svm, tetesty)
err.svm = 1 - mean(pred.svm == tetesty)

# Summary
bierror <- rbind(err.nb, err.lda, err.qda, err.lr, err.rlr.lasso,
                 err.rlr.ridge, err.rlr.ent, err.svm)
colnames(bierror) <- c("Test Error Rate")
rownames(bierror) <- c("NB", "LDA", "QDA", "Logistic Regression (LR)",
                       "LR with lasso", "LR with ridge",
                       "LR with elastic net alpha = 0.8", "Linear SVMs")

# Problem 2: Multi-class classification
trainy = as.factor(train[,1])
trainx = as.matrix(train[,-1])
testy = as.factor(test[,1])
testx = as.matrix(test[,-1])

# Naive Bayes
mod.nb2 = naiveBayes(x=trainx, y=as.factor(trainy))
pred.nb2 = predict(mod.nb2, newdata = testx, type = "class")
conmat.nb2 = table(yhat=pred.nb2, y=testy)
err.nb2 = 1 - mean(pred.nb2 == testy)

# LDA
traindat = data.frame(x=trainx, y=as.factor(trainy))
testdat = data.frame(x=testx, y=as.factor(testy))
mod.lda2 = lda(y~., data = traindat)
pred.lda2 = predict(mod.lda2, testdat)
conmat.lda2 = table(yhat = pred.lda2$class, y=testy)
err.lda2 = 1 - mean(pred.lda2$class == testy)

# Multinomial regression
library(nnet)
mod.mr2 = multinom(y~., traindat, MaxNWts = 3000)
pred.mr2 = predict(mod.mr2, testdat)
conmat.mr2 = table(yhat=pred.mr2, y=testy)
err.mr2 = 1 - mean(pred.mr2 == testy)

# Regularized multinomial regression
# lasso
mod.rmr.lasso2 = glmnet(trainx, as.factor(trainy), family = "multinomial", alpha = 1)
cv.rmr.lasso2 = cv.glmnet(trainx, as.factor(trainy), family = "multinomial", alpha = 1)
bestlam = cv.rmr.lasso2$lambda.min
pred.rmr.lasso2 = predict(mod.rmr.lasso2, s=bestlam, newx = testx, type = "class")
conmat.lasso2 = table(yhat=pred.rmr.lasso2, y=testy)
err.lasso2 = 1 - mean(pred.rmr.lasso2 == testy)

# ridge
mod.rmr.ridge2 = glmnet(trainx, as.factor(trainy), family = "multinomial", alpha = 0)
cv.rmr.ridge2 = cv.glmnet(trainx, as.factor(trainy), family = "multinomial", alpha = 0)
bestlam = cv.rmr.ridge2$lambda.min
pred.rmr.ridge2 = predict(mod.rmr.ridge2, s=bestlam, newx = testx, type = "class")
conmat.ridge2 = table(yhat = pred.rmr.ridge2, y = testy)
err.ridge2 = 1 - mean(pred.rmr.ridge2 == testy)

# elastic net
mod.rmr.enet2 = glmnet(trainx, as.factor(trainy), family = "multinomial", alpha = 0.8)
cv.rmr.enet2 = cv.glmnet(trainx, as.factor(trainy), family = "multinomial", alpha = 0.8)
bestlam = cv.rmr.enet2$lambda.min
pred.rmr.enet2 = predict(mod.rmr.enet2, s=bestlam, newx = testx, type = "class")
conmat.enet2 = table(yhat = pred.rmr.enet2, y = testy)
err.enet2 = 1 - mean(pred.rmr.enet2 == testy)

# lasso 2 (grouped)
mod.rmr.lasso3 = glmnet(trainx, as.factor(trainy), family = "multinomial",
                        type.multinomial = "grouped", alpha = 1)
cv.rmr.lasso3 = cv.glmnet(trainx, as.factor(trainy), family = "multinomial",
                          type.multinomial = "grouped", alpha = 1)
bestlam = cv.rmr.lasso3$lambda.min
pred.rmr.lasso3 = predict(mod.rmr.lasso3, s=bestlam, newx = testx, type = "class")
conmat.lasso3 = table(yhat = pred.rmr.lasso3, y = testy)
err.lasso3 = 1 - mean(pred.rmr.lasso3 == testy)

# Linear SVMs
# svm() uses the one-vs-one method to implement a multi-class SVM
tune.svm2 = tune(svm, y~., data = traindat, kernel = "linear",
                 ranges = list(cost=c(0.001,0.005,0.01,0.05,0.1,1,5,10)),
                 scale = FALSE)
bestsvmmod2 = tune.svm2$best.model
pred.svm2 = predict(bestsvmmod2, testdat)
conmat.svm2 = table(yhat = pred.svm2, y = testy)
err.svm2 = 1 - mean(pred.svm2 == testy)

# Summary
error <- rbind(err.nb2, err.lda2, err.mr2, err.lasso2, err.ridge2,
               err.enet2, err.lasso3, err.svm2)
colnames(error) <- c("Test Error Rate")
rownames(error) <- c("NB", "LDA", "Multinomial Regression (MR)", "MR with lasso",
                     "MR with ridge", "MR with elastic net alpha = 0.8",
                     "MR with grouped lasso", "Linear SVMs (one vs one)")
```
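The "most often misclassified" statements in Problem 2(b) can be read off a confusion matrix programmatically. The following is a minimal base-R sketch, assuming a confusion matrix laid out like the `table(yhat = ..., y = ...)` objects in the listing; the toy label vectors and the helper name `top_confusions` are hypothetical and stand in for the real predictions.

```r
# Toy labels standing in for testy and a vector of predictions (hypothetical)
y    <- factor(c(2, 2, 2, 4, 4, 8, 8, 8, 8, 3), levels = 0:9)
yhat <- factor(c(2, 4, 8, 4, 9, 8, 5, 5, 0, 3), levels = 0:9)
conmat <- table(yhat = yhat, y = y)     # rows: predicted, columns: true

# Hypothetical helper: the k largest off-diagonal cells of a confusion matrix
top_confusions <- function(conmat, k = 3) {
  off <- unclass(conmat)                # plain matrix, dimnames kept
  diag(off) <- 0                        # ignore correct classifications
  idx <- order(off, decreasing = TRUE)[seq_len(k)]
  pos <- arrayInd(idx, dim(off))        # (row, column) of each selected cell
  data.frame(true      = colnames(off)[pos[, 2]],
             predicted = rownames(off)[pos[, 1]],
             count     = off[idx])
}

# Number of misclassified test cases per true digit
cm <- unclass(conmat)
errs <- colSums(cm) - diag(cm)

top <- top_confusions(conmat, k = 1)
worst_digit <- names(which.max(errs))
top
```

Applied to objects such as `conmat.svm2`, the same helper would list the digit pairs that a classifier confuses most often; ties between equal counts are broken by position in the matrix.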
More informationThe Bayes classifier
The Bayes classifier Consider where is a random vector in is a random variable (depending on ) Let be a classifier with probability of error/risk given by The Bayes classifier (denoted ) is the optimal
More informationMachine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling
Machine Learning B. Unsupervised Learning B.2 Dimensionality Reduction Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University
More informationA Study of Relative Efficiency and Robustness of Classification Methods
A Study of Relative Efficiency and Robustness of Classification Methods Yoonkyung Lee* Department of Statistics The Ohio State University *joint work with Rui Wang April 28, 2011 Department of Statistics
More informationKernel Logistic Regression and the Import Vector Machine
Kernel Logistic Regression and the Import Vector Machine Ji Zhu and Trevor Hastie Journal of Computational and Graphical Statistics, 2005 Presented by Mingtao Ding Duke University December 8, 2011 Mingtao
More information6.867 Machine Learning
6.867 Machine Learning Problem Set 2 Due date: Wednesday October 6 Please address all questions and comments about this problem set to 6867-staff@csail.mit.edu. You will need to use MATLAB for some of
More informationNeural networks (NN) 1
Neural networks (NN) 1 Hedibert F. Lopes Insper Institute of Education and Research São Paulo, Brazil 1 Slides based on Chapter 11 of Hastie, Tibshirani and Friedman s book The Elements of Statistical
More informationInternational Journal of Pure and Applied Mathematics Volume 19 No , A NOTE ON BETWEEN-GROUP PCA
International Journal of Pure and Applied Mathematics Volume 19 No. 3 2005, 359-366 A NOTE ON BETWEEN-GROUP PCA Anne-Laure Boulesteix Department of Statistics University of Munich Akademiestrasse 1, Munich,
More informationMachine Learning And Applications: Supervised Learning-SVM
Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine
More informationGraphical Model Selection
May 6, 2013 Trevor Hastie, Stanford Statistics 1 Graphical Model Selection Trevor Hastie Stanford University joint work with Jerome Friedman, Rob Tibshirani, Rahul Mazumder and Jason Lee May 6, 2013 Trevor
More information10-701/ Machine Learning - Midterm Exam, Fall 2010
10-701/15-781 Machine Learning - Midterm Exam, Fall 2010 Aarti Singh Carnegie Mellon University 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 15 numbered pages in this exam
More informationA Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression
A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression Noah Simon Jerome Friedman Trevor Hastie November 5, 013 Abstract In this paper we purpose a blockwise descent
More informationRegularized Discriminant Analysis and Its Application in Microarrays
Biostatistics (2005), 1, 1, pp. 1 18 Printed in Great Britain Regularized Discriminant Analysis and Its Application in Microarrays By YAQIAN GUO Department of Statistics, Stanford University Stanford,
More informationSparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationMachine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression
Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression Due: Monday, February 13, 2017, at 10pm (Submit via Gradescope) Instructions: Your answers to the questions below,
More informationHomework 4. Convex Optimization /36-725
Homework 4 Convex Optimization 10-725/36-725 Due Friday November 4 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationLinear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging
More informationLecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides
Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Intelligent Data Analysis and Probabilistic Inference Lecture
More informationLearning From Data: Modelling as an Optimisation Problem
Learning From Data: Modelling as an Optimisation Problem Iman Shames April 2017 1 / 31 You should be able to... Identify and formulate a regression problem; Appreciate the utility of regularisation; Identify
More informationMSA220 Statistical Learning for Big Data
MSA220 Statistical Learning for Big Data Lecture 4 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology More on Discriminant analysis More on Discriminant
More informationLDA, QDA, Naive Bayes
LDA, QDA, Naive Bayes Generative Classification Models Marek Petrik 2/16/2017 Last Class Logistic Regression Maximum Likelihood Principle Logistic Regression Predict probability of a class: p(x) Example:
More informationCovariance-regularized regression and classification for high-dimensional problems
Covariance-regularized regression and classification for high-dimensional problems Daniela M. Witten Department of Statistics, Stanford University, 390 Serra Mall, Stanford CA 94305, USA. E-mail: dwitten@stanford.edu
More informationIntroduction to Machine Learning Midterm Exam Solutions
10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,
More informationIntroduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones
Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Prediction Instructor: Yizhou Sun yzsun@ccs.neu.edu September 21, 2015 Announcements TA Monisha s office hour has changed to Thursdays 10-12pm, 462WVH (the same
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Prediction Instructor: Yizhou Sun yzsun@ccs.neu.edu September 14, 2014 Today s Schedule Course Project Introduction Linear Regression Model Decision Tree 2 Methods
More informationApplied Multivariate and Longitudinal Data Analysis
Applied Multivariate and Longitudinal Data Analysis Discriminant analysis and classification Ana-Maria Staicu SAS Hall 5220; 919-515-0644; astaicu@ncsu.edu 1 Consider the examples: An online banking service
More informationMachine Learning (CS 567) Lecture 5
Machine Learning (CS 567) Lecture 5 Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol
More informationRegularization Paths. Theme
June 00 Trevor Hastie, Stanford Statistics June 00 Trevor Hastie, Stanford Statistics Theme Regularization Paths Trevor Hastie Stanford University drawing on collaborations with Brad Efron, Mee-Young Park,
More informationNonlinear Support Vector Machines through Iterative Majorization and I-Splines
Nonlinear Support Vector Machines through Iterative Majorization and I-Splines P.J.F. Groenen G. Nalbantov J.C. Bioch July 9, 26 Econometric Institute Report EI 26-25 Abstract To minimize the primal support
More informationDoes Modeling Lead to More Accurate Classification?
Does Modeling Lead to More Accurate Classification? A Comparison of the Efficiency of Classification Methods Yoonkyung Lee* Department of Statistics The Ohio State University *joint work with Rui Wang
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More information9 Classification. 9.1 Linear Classifiers
9 Classification This topic returns to prediction. Unlike linear regression where we were predicting a numeric value, in this case we are predicting a class: winner or loser, yes or no, rich or poor, positive
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationLinear Methods for Prediction
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More information1. Kernel ridge regression In contrast to ordinary least squares which has a cost function. m (θ T x (i) y (i) ) 2, J(θ) = 1 2.
CS229 Problem Set #2 Solutions 1 CS 229, Public Course Problem Set #2 Solutions: Theory Kernels, SVMs, and 1. Kernel ridge regression In contrast to ordinary least squares which has a cost function J(θ)
More informationHigh-dimensional regression modeling
High-dimensional regression modeling David Causeur Department of Statistics and Computer Science Agrocampus Ouest IRMAR CNRS UMR 6625 http://www.agrocampus-ouest.fr/math/causeur/ Course objectives Making
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun yzsun@cs.ucla.edu October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationSupport Vector Machines
Wien, June, 2010 Paul Hofmarcher, Stefan Theussl, WU Wien Hofmarcher/Theussl SVM 1/21 Linear Separable Separating Hyperplanes Non-Linear Separable Soft-Margin Hyperplanes Hofmarcher/Theussl SVM 2/21 (SVM)
More informationA Short Introduction to the Lasso Methodology
A Short Introduction to the Lasso Methodology Michael Gutmann sites.google.com/site/michaelgutmann University of Helsinki Aalto University Helsinki Institute for Information Technology March 9, 2016 Michael
More informationMidterm. Introduction to Machine Learning. CS 189 Spring You have 1 hour 20 minutes for the exam.
CS 189 Spring 2013 Introduction to Machine Learning Midterm You have 1 hour 20 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. Please use non-programmable calculators
More informationBoosting. Ryan Tibshirani Data Mining: / April Optional reading: ISL 8.2, ESL , 10.7, 10.13
Boosting Ryan Tibshirani Data Mining: 36-462/36-662 April 25 2013 Optional reading: ISL 8.2, ESL 10.1 10.4, 10.7, 10.13 1 Reminder: classification trees Suppose that we are given training data (x i, y
More informationLinear Models in Machine Learning
CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,
More informationSupervised Learning. Regression Example: Boston Housing. Regression Example: Boston Housing
Supervised Learning Unsupervised learning: To extract structure and postulate hypotheses about data generating process from observations x 1,...,x n. Visualize, summarize and compress data. We have seen
More informationMachine Learning Practice Page 2 of 2 10/28/13
Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes
More information