Applied Machine Learning Annalisa Marsico
1 Applied Machine Learning. Annalisa Marsico, OWL RNA Bioinformatics group, Max Planck Institute for Molecular Genetics / Free University of Berlin. 22 April 2015, SoSe 2015
2 Goals. Feature selection rather than feature reduction: regularized linear models. From regression to classification: logistic regression (regularization and partial least squares also possible). How to reduce overfitting. How to evaluate a classification model. Class imbalance.
3 The Variance-Bias Tradeoff. Mean Square Error: MSE = (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)². Its expectation can be decomposed as E(MSE) = σ² + (model bias)² + model variance, where σ² is the irreducible noise, the bias term reflects how close the model's function is to the real input-output relationship, and the variance term reflects how good the model is at generalizing. Low variance / high bias vs. low bias / high variance.
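The decomposition above can be illustrated with a small simulation (a hypothetical sketch: the sine target, noise level, and polynomial degrees are illustrative choices, not from the slides). A flexible model's predictions at a fixed point vary far more across resampled training sets than a rigid model's do.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 3, 15)
x0 = 1.5  # point at which we track the prediction
preds = {1: [], 6: []}

for _ in range(200):  # 200 resampled training sets
    y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
    for deg in (1, 6):
        coefs = np.polyfit(x, y, deg)
        preds[deg].append(np.polyval(coefs, x0))

var_simple = np.var(preds[1])   # straight line: low variance / high bias
var_complex = np.var(preds[6])  # degree-6 polynomial: high variance / low bias
```

Across the 200 training sets, the degree-6 fit chases the noise and its prediction at x0 fluctuates much more than the straight line's.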
4 The Variance-Bias Tradeoff. Complex models can have high variance. Collinearity gives rise to high-variance models -> we can try to reduce the model's variance as a way to reduce the effect of collinearity -> by reducing the variance we increase the bias of the model: E(MSE) = σ² + (model bias)² + model variance. N.B. ordinary linear regression produces unbiased coefficients.
5 Ridge Regression. Controlling or regularizing the parameter estimates can be done by adding a penalty to the SSE. Ordinary: SSE = Σ_{i=1}^{N} (y_i − ŷ_i)². Ridge: SSE_{L2} = Σ_{i=1}^{N} (y_i − ŷ_i)² + λ Σ_{j=1}^{P} β_j². The penalty λ controls the amount of shrinkage (L2-norm penalty). Path of the regression coefficients for different values of λ.
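The shrinkage path can be sketched with scikit-learn, whose `Ridge` uses `alpha` for λ (the synthetic data and λ grid below are illustrative): as λ grows, the L2 norm of the coefficient vector shrinks.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=50)   # a collinear pair of predictors
y = X[:, 0] + rng.normal(scale=0.5, size=50)

l2_norms = []
for lam in (0.01, 1.0, 100.0):
    coef = Ridge(alpha=lam).fit(X, y).coef_
    l2_norms.append(float(np.sum(coef ** 2)))
# larger lambda -> stronger shrinkage -> smaller L2 norm of the coefficients
```

With the collinear pair, the nearly unpenalized fit (λ = 0.01) splits a large, unstable weight between the two correlated columns; the penalty tames it.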
6 Ridge Regression - How to choose the penalty. What's happening in this region?
7 Lasso (Least Absolute Shrinkage and Selection Operator) regression: SSE_{L1} = Σ_{i=1}^{N} (y_i − ŷ_i)² + λ Σ_{j=1}^{P} |β_j|. It seems like a small modification, but the practical implications are significant: while the regression coefficients are still shrunk towards zero, penalizing the absolute value sets some parameters exactly to zero for some values of λ.
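The exact-zero behaviour is easy to observe with scikit-learn's `Lasso` (its `alpha` is λ; the data and penalty value below are illustrative): the noise predictors get coefficients of exactly 0.0, while the informative ones survive, shrunk.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
# only the first two predictors carry signal
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

coef = Lasso(alpha=0.5).fit(X, y).coef_
n_zero = int(np.sum(coef == 0.0))  # noise predictors are set exactly to zero
```

This is why the Lasso, unlike Ridge, performs feature selection: zeroed coefficients drop their predictors from the model entirely.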
8 Questions Is PLSR a feature selection method? Is Ridge Regression a feature selection method? Is Lasso a feature selection method?
9 Elastic net. Generalization of the Lasso model that combines the two types of penalty: SSE_{Enet} = Σ_{i=1}^{N} (y_i − ŷ_i)² + λ₁ Σ_{j=1}^{P} |β_j| + λ₂ Σ_{j=1}^{P} β_j². Advantage: it enables regularization via the ridge-type penalty and feature selection via the Lasso-type penalty. Zou and Hastie (2005) proposed this penalty; the model deals well with groups of highly correlated predictors.
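A hedged sketch with scikit-learn's `ElasticNet` (the data are synthetic; in this API `l1_ratio` mixes the Lasso and Ridge penalties and `alpha` sets their overall strength), built around a group of highly correlated predictors of the kind the elastic net handles well:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(3)
z = rng.normal(size=100)
# three highly correlated predictors plus three pure-noise predictors
X = np.column_stack(
    [z + 0.05 * rng.normal(size=100) for _ in range(3)]
    + [rng.normal(size=100) for _ in range(3)]
)
y = z + rng.normal(scale=0.5, size=100)

# l1_ratio=0.5: equal mix of L1 (selection) and L2 (shrinkage) penalties
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
r2 = model.score(X, y)
```

The ridge component encourages the correlated trio to share the signal rather than arbitrarily picking one, as a pure Lasso tends to do.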
10 Comparison between Ridge, Lasso and Elastic Net. Lasso is subject to the penalty Σ_j |β_j| ≤ t; Ridge is subject to the penalty Σ_j β_j² ≤ t. The elliptical regions are contours of the residual sum of squares function; the center is the least-squares estimate. Elastic net penalty: Σ_j (α β_j² + (1 − α) |β_j|), shown for α = 0.2.
11 Linear models for Classification
12 Classification. The process of predicting categorical / qualitative responses; often we predict the probability of belonging to a certain class / category. We have a set of observations (x_1, y_1), ..., (x_n, y_n) to train a classifier. Example: predict whether an individual will default on his credit card based on annual income and monthly balance for a set of individuals. We want to learn a model that predicts Y (default) from X_1 (balance) and X_2 (income).
13 Classification. Linear regression is not suitable: there is no natural way to convert a qualitative response into a quantitative one. For binary classes we can use a dummy variable: Y = 0 if no default, Y = 1 if default. Fitted values Ŷ are converted to the output class G: G = default if Ŷ ≥ 0.5, G = no_default if Ŷ < 0.5. If we try to predict Y with linear regression we might not get a number between 0 and 1. Rather than modeling the response Y directly, we model the probability that Y belongs to a certain class: P(G=1|X) and P(G=0|X).
14 Classification. Linear decision boundary: {x : β₀ + β₁x₁ + … + β_p x_p = 0}.
15 Logistic Regression. How do we model the relationship between p(X) = P(Y=1|X) and X? Logistic function: p(X) = P(Y=1|X) = e^{β₀+β₁X} / (1 + e^{β₀+β₁X}), and P(Y=0|X) = 1 − p(X). After a bit of manipulation: p(X) / (1 − p(X)) = e^{β₀+β₁X} (the odds), so log[p(X) / (1 − p(X))] = β₀ + β₁X (the log-odds or logit). The logit is linear in X!
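The two formulas above can be checked numerically (the β values here are arbitrary illustrations): applying the logit to the logistic probabilities recovers the linear predictor exactly.

```python
import numpy as np

def logistic(z):
    # p = e^z / (1 + e^z), with z = b0 + b1 * x
    return np.exp(z) / (1.0 + np.exp(z))

b0, b1 = -1.0, 2.0
x = np.array([-2.0, 0.0, 3.0])
p = logistic(b0 + b1 * x)           # probabilities, strictly between 0 and 1

log_odds = np.log(p / (1.0 - p))    # the logit: linear in x again
```

However nonlinear p(X) looks as a function of X, the log-odds transformation straightens it back into β₀ + β₁X.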
16 Estimating coefficients in logistic regression. We use maximum likelihood to fit the model and learn the β parameters, i.e. we choose the β that maximize the likelihood function of the data: L(β₀, β₁) = Π_{i: y_i=1} p(x_i) · Π_{i: y_i=0} (1 − p(x_i)), where p(x_i) = P(Y=1|X=x_i).
17 Logistic Regression interpretation. What do the coefficients represent?
18 Generalized linear models. Linear model: Ŷ = f(X) = X^T β̂. We will always have an error Y − Ŷ in trying to approximate the real function. Generalized linear model: Ŷ = f(X) = g(X^T β̂), where g is the activation function. In a linear model g is the identity function; in logistic regression g is the logistic function. The RSS criterion can still be used to find f(X); only, f(X) is a more complicated function.
19 Logistic Regression vs Linear Regression. Linear regression: ŷ ∈ ℝ, y(x) = X^T β + β₀, g = identity function. Logistic regression: ŷ ∈ {0, 1}, y(x) = σ(X^T β + β₀), g = sigmoid function.
20 Regularized Logistic Regression. Classification models can also use a penalty (Ridge, Lasso, etc.) to improve the fit. E.g. for logistic regression we can maximize a constrained likelihood with a Ridge-like penalty: log L(β) − λ Σ_{j=1}^{p} β_j². The glmnet package in R uses a combination of the Ridge and Lasso penalties: log L(β) − λ [α Σ_{j=1}^{p} |β_j| + (1 − α) Σ_{j=1}^{p} β_j²]. α is the mixing proportion that toggles between the pure Lasso penalty (α = 1) and pure Ridge (α = 0); λ controls the total amount of penalization.
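The same glmnet-style mixed penalty is available in Python via scikit-learn's `LogisticRegression` (a sketch on synthetic data; in this API `C` is 1/λ, `l1_ratio` is the mixing proportion α, and the elastic-net penalty requires the `saga` solver):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 20))
# the class depends on the first two features only
y = (X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=200) > 0).astype(int)

# l1_ratio=1 -> pure Lasso, l1_ratio=0 -> pure Ridge; small C -> strong penalty
model = LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.9, C=0.1, max_iter=5000).fit(X, y)
n_zero = int(np.sum(model.coef_ == 0.0))  # the Lasso part zeroes noise features
train_acc = model.score(X, y)
```

The mostly-Lasso mix discards many of the 18 uninformative features while keeping enough signal to classify well.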
21 Regularized Logistic Regression Example: accuracy for different models with different α and λ parameters
22 Can Partial Least Squares be extended to Logistic Regression? Yes. It will find new variables that simultaneously reduce the dimension and correlate with the response (but with Y = 0, 1). One tuning parameter: the number of components.
23 Over-Fitting and Model Tuning
24 The problem of Over-Fitting Tendency to over-emphasize patterns Need to evaluate the model to be confident that it will do well in the future (on new data) Problems in the data: Data quality Limited number of samples
25 The problem of Over-Fitting. We want to use the existing data to find the parameters that give not only the best accuracy, but also the most realistic performance. Originally: split the data into a training set and a test set. Modern approaches: split the data into multiple sets for training, i.e. parameter tuning, and split off one (or more) distinct set for evaluation purposes.
26 The problem of Over-Fitting. Over-fitting occurs when the model, in addition to learning the general patterns in the data, also learns the noise. This kind of model will have poor accuracy when predicting a new sample. Let's consider the following classification problem: which of these two classifiers is likely to generalize better to new data?
27 The problem of over-fitting
28 Parameter Tuning Several models have at least one tuning parameter We want to find the best set of parameters General strategy for parameter tuning
29 Data Splitting. Given a certain amount of data, we have to decide how to spend the data points, i.e. which ones to use for tuning / training and which ones for evaluation. Important: evaluation must be carried out on samples never used in model tuning. Many data points -> test set. Few data points -> re-sampling (cross-validation). Stratified sampling: random sampling within subgroups when a disproportion between classes is present.
30 Resampling techniques
31 K-fold Cross Validation. Example: predicting cancer patients from gene expression & clinical data. The patients are split into k sets of roughly the same size. The model is fit using all patients except the first subset (first fold). The held-out patients are used for predictions and estimation of performance. The first subset is then returned to the training set and the procedure is repeated for all k sets. The k estimates of performance are summarized (usually averaged). Schema of the cross-validation process with k = 3.
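The whole procedure is one call in scikit-learn (a sketch on a synthetic classification task, with k = 3 to match the schema):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=150, n_features=10, random_state=0)

cv = KFold(n_splits=3, shuffle=True, random_state=0)  # k = 3 folds
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
mean_acc = scores.mean()  # the k performance estimates are averaged
```

`cross_val_score` handles the fit-on-two-folds / score-on-the-held-out-fold rotation internally and returns the k per-fold accuracies.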
32 Leave One Out Cross-Validation (LOOCV): k-fold cross-validation with k = number of patients (only one patient is held out at a time). The final performance is computed from the k individual held-out predictions. Computationally expensive! k = 10 is more attractive, but a small k increases the bias between the predicted performance and the real performance. In practice they give similar results.
33 Bootstrap. Take a random sample of the patients with replacement, i.e. after a data point (patient) is selected for a subset, it can still be selected again for the same subset. Some patients are represented multiple times in a set, others are not selected at all. The non-selected, "out-of-bag" samples are used for prediction and performance estimation. Schema of the bootstrap procedure.
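One bootstrap draw can be sketched in a few lines (the cohort size is illustrative): sampling n indices with replacement leaves roughly a third of the patients out-of-bag.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                                   # number of patients
boot_idx = rng.integers(0, n, size=n)     # sample n patients WITH replacement
in_bag = np.unique(boot_idx)              # some patients appear several times
oob = np.setdiff1d(np.arange(n), in_bag)  # out-of-bag: never selected

frac_oob = len(oob) / n  # on average about 1 - (1 - 1/n)^n, i.e. ~0.368
```

The out-of-bag fraction is what makes the bootstrap usable for evaluation: each draw comes with its own built-in held-out set.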
34 Choosing the final tuning parameters. Pick the parameter setting associated with the best accuracy / minimum error. Not always the best choice... Example: SVM accuracy vs cost parameter, 5-fold cross-validation.
35 Practical hints to choose the model. A test set is a single evaluation: it sometimes has limited ability. Small sample size: we might want to use all points for model building, so resampling might be a better solution. There is no resampling method that is better than the others; it depends on the situation, e.g. sample size and computational cost. The bootstrap can have a lower error rate compared to k-fold CV. How to practically choose between models, e.g. SVM or logistic regression? How can you compare different models?
36 Performance in Classification Models
37 Class Prediction. RMSE and R² are not appropriate in the context of classification. Although classification models mainly return a continuous value (e.g. a probability between 0 and 1), we need a discrete class prediction. However, sometimes the probability itself is useful as a measure of confidence.
38 Class Prediction - examples. A message with p = 0.51 and another message with p = 0.9 would both be classified as spam. Imagine a model to classify molecules based on toxicity: molecule 1 with class probabilities (0.52, 0.48) and molecule 2 with class probabilities (0.98, 0.02) will both be classified as non-toxic -> but the confidence for molecule 2 is higher.
39 Softmax Transformation. Prediction for the l-th class: the scores are transformed into probabilities p_l = e^{ŷ_l} / Σ_{k} e^{ŷ_k}.
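A minimal sketch of the transformation (subtracting the maximum score is purely for numerical stability and does not change the result, since the softmax is shift-invariant):

```python
import numpy as np

def softmax(scores):
    z = scores - np.max(scores)  # stabilize: avoids overflow in exp
    e = np.exp(z)
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
# the outputs are positive, sum to one, and preserve the ranking of the scores
```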
40 Evaluating Predicted Classes. Confusion matrix: example for a two-class outcome (rows: predicted, columns: observed).

Predicted \ Observed | Event | Nonevent
Event                | TP    | FP
Nonevent             | FN    | TN

TP and TN are where classes are correctly predicted; FP and FN are where classes are wrongly predicted. E.g. Event: healthy, Nonevent: toxic.
41 Drawbacks of the accuracy measure. It does not make a distinction about the type of error being made: e.g. in spam filtering, the cost of deleting an important email is higher than the cost of letting a spam message past the filter. It does not consider the frequency of each class: e.g. in a compound-screening model, the molecules with biological activity are a minority.
42 Other metrics (same TP/FP/FN/TN confusion matrix as above). Sensitivity (true positive rate) is the rate of correctly predicting the event of interest among all samples having the event: Sensitivity = TP / (TP + FN). Specificity is the rate of non-event samples predicted correctly: Specificity = TN / (TN + FP). False Positive Rate = 1 − Specificity. Potential trade-offs between sensitivity and specificity can be made while keeping the same accuracy.
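With hypothetical confusion-matrix counts, the rates defined above are straightforward to compute:

```python
# hypothetical counts (event = positive class)
TP, FP, FN, TN = 40, 10, 5, 45

sensitivity = TP / (TP + FN)            # true positive rate: 40/45
specificity = TN / (TN + FP)            # correctly predicted non-events: 45/55
false_positive_rate = 1 - specificity   # 10/55
accuracy = (TP + TN) / (TP + FP + FN + TN)  # 85/100
```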
43 Other metrics (same TP/FP/FN/TN confusion matrix as above). Sensitivity and specificity are conditional measures -> they condition on the event. In theory, if the event is rare (prevalence w), this should be taken into account:
PPV = (Sensitivity · w) / (Sensitivity · w + (1 − Specificity)(1 − w))
NPV = (Specificity · (1 − w)) / (w · (1 − Sensitivity) + Specificity · (1 − w))
44 Receiver Operating Characteristic (ROC) Curves. Given a collection of continuous scores, the ROC curve plots sensitivity against the false positive rate at different thresholds. What is the effect of altering the threshold? AUC = Area Under the Curve, a quantitative assessment of the model. ROC curve for a logistic regression model to predict toxicity of a molecule.
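Both the curve and the AUC come directly from scikit-learn (the toy scores below are illustrative): `roc_curve` sweeps the threshold and returns one (FPR, TPR) point per cut.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# toy continuous scores for 4 non-events (0) and 4 events (1)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.1, 0.3, 0.35, 0.8, 0.4, 0.6, 0.7, 0.9])

fpr, tpr, thresholds = roc_curve(y_true, scores)  # one point per threshold
auc = roc_auc_score(y_true, scores)
# here 13 of the 16 event/non-event pairs are ranked correctly -> AUC = 0.8125
```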
45 Precision-Recall curve. Recall = TP / (TP + FN), Precision = TP / (TP + FP) (whereas the ROC curve uses FPR = FP / (FP + TN) and TPR = TP / (TP + FN)). The PR curve is much more sensitive to false positives (e.g. healthy patients that were predicted to have cancer) in cases where the negative class (e.g. healthy patients) dominates.
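The PR curve is obtained the same way (reusing the same toy scores as the ROC sketch):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.1, 0.3, 0.35, 0.8, 0.4, 0.6, 0.7, 0.9])

precision, recall, thresholds = precision_recall_curve(y_true, scores)
# by scikit-learn's convention the curve ends at precision = 1, recall = 0
```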
46 Class imbalance. Imbalance: when one or more classes have a very low proportion in the training data. This can have a significant impact on the effectiveness of the model. E.g. pharmaceutical research: in high-throughput screening only a few molecules show activity, so the frequency of interesting compounds is low.
47 The effect of class imbalance - example. Three models used to model the high-throughput screening data and evaluated on a test set. The result of class imbalance (most compounds show no activity) is that the models are comparable and have good specificity, but very little sensitivity.
48 Class imbalance. What can be done? Change the cutoff to increase the prediction accuracy of the minority class -> find an appropriate balance between sensitivity and specificity. How do we determine the new cutoff? Find the point on the ROC curve closest to the perfect model, i.e. min(distance) to the top-left corner.
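The min-distance rule can be sketched directly (the ROC points below are made up for illustration): pick the threshold whose (FPR, sensitivity) point lies closest to the perfect corner (0, 1).

```python
import numpy as np

# hypothetical ROC points (fpr[i], tpr[i]) at successive thresholds
fpr = np.array([0.0, 0.1, 0.3, 0.6, 1.0])
tpr = np.array([0.0, 0.6, 0.85, 0.95, 1.0])

dist = np.sqrt(fpr ** 2 + (1.0 - tpr) ** 2)  # distance to the (0, 1) corner
best = int(np.argmin(dist))                   # index of the new cutoff
```

Here the third point (FPR 0.3, sensitivity 0.85) minimizes the distance, trading some specificity for much better sensitivity on the minority class.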
49 Class Imbalance - sampling methods. Reduce the effect of imbalance during training: down-sampling (of the majority class), e.g. via bootstrap such that the classes are balanced in the bootstrap set; up-sampling (of the minority class), where some samples from the minority class appear more than once in the set; SMOTE (a combination of down-sampling and up-sampling). Class 1: healthy, Class 2: cancer. Predictor A: mutation, Predictor B: patient age.
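Up-sampling is a one-liner with scikit-learn's `resample` utility (a sketch; the 90/10 split and two predictors are illustrative stand-ins for the healthy/cancer example):

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(0)
X_majority = rng.normal(size=(90, 2))  # e.g. class 1: healthy patients
X_minority = rng.normal(size=(10, 2))  # e.g. class 2: cancer patients

# up-sampling: draw minority samples with replacement until the classes balance
X_minority_up = resample(X_minority, replace=True, n_samples=90, random_state=0)
```

Down-sampling is the mirror image (`resample(X_majority, replace=False, n_samples=10, ...)`); SMOTE additionally synthesizes new minority points rather than repeating existing ones.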
50 Goals (recap). Feature selection rather than feature reduction: regularized linear models. From regression to classification: logistic regression (regularization and partial least squares also possible). How to reduce overfitting. How to evaluate a classification model. Class imbalance.
SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels Karl Stratos June 21, 2018 1 / 33 Tangent: Some Loose Ends in Logistic Regression Polynomial feature expansion in logistic
More informationHoldout and Cross-Validation Methods Overfitting Avoidance
Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest
More informationMidterm. Introduction to Machine Learning. CS 189 Spring Please do not open the exam before you are instructed to do so.
CS 89 Spring 07 Introduction to Machine Learning Midterm Please do not open the exam before you are instructed to do so. The exam is closed book, closed notes except your one-page cheat sheet. Electronic
More informationChapter 10 Logistic Regression
Chapter 10 Logistic Regression Data Mining for Business Intelligence Shmueli, Patel & Bruce Galit Shmueli and Peter Bruce 2010 Logistic Regression Extends idea of linear regression to situation where outcome
More informationReal Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report
Real Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report Hujia Yu, Jiafu Wu [hujiay, jiafuwu]@stanford.edu 1. Introduction Housing prices are an important
More informationLinear classifiers: Logistic regression
Linear classifiers: Logistic regression STAT/CSE 416: Machine Learning Emily Fox University of Washington April 19, 2018 How confident is your prediction? The sushi & everything else were awesome! The
More informationMSA220/MVE440 Statistical Learning for Big Data
MSA220/MVE440 Statistical Learning for Big Data Lecture 9-10 - High-dimensional regression Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Recap from
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 12: Logistic regression (v1) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 30 Regression methods for binary outcomes 2 / 30 Binary outcomes For the duration of this
More informationIMBALANCED DATA. Phishing. Admin 9/30/13. Assignment 3: - how did it go? - do the experiments help? Assignment 4. Course feedback
9/3/3 Admin Assignment 3: - how did it go? - do the experiments help? Assignment 4 IMBALANCED DATA Course feedback David Kauchak CS 45 Fall 3 Phishing 9/3/3 Setup Imbalanced data. for hour, google collects
More informationSTK4900/ Lecture 5. Program
STK4900/9900 - Lecture 5 Program 1. Checking model assumptions Linearity Equal variances Normality Influential observations Importance of model assumptions 2. Selection of predictors Forward and backward
More informationShrinkage Methods: Ridge and Lasso
Shrinkage Methods: Ridge and Lasso Jonathan Hersh 1 Chapman University, Argyros School of Business hersh@chapman.edu February 27, 2019 J.Hersh (Chapman) Ridge & Lasso February 27, 2019 1 / 43 1 Intro and
More informationMachine Learning Concepts in Chemoinformatics
Machine Learning Concepts in Chemoinformatics Martin Vogt B-IT Life Science Informatics Rheinische Friedrich-Wilhelms-Universität Bonn BigChem Winter School 2017 25. October Data Mining in Chemoinformatics
More informationLecture Data Science
Web Science & Technologies University of Koblenz Landau, Germany Lecture Data Science Regression Analysis JProf. Dr. Last Time How to find parameter of a regression model Normal Equation Gradient Decent
More informationMachine Learning
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 1, 2011 Today: Generative discriminative classifiers Linear regression Decomposition of error into
More informationCPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017
CPSC 340: Machine Learning and Data Mining MLE and MAP Fall 2017 Assignment 3: Admin 1 late day to hand in tonight, 2 late days for Wednesday. Assignment 4: Due Friday of next week. Last Time: Multi-Class
More informationLecture 6: Methods for high-dimensional problems
Lecture 6: Methods for high-dimensional problems Hector Corrada Bravo and Rafael A. Irizarry March, 2010 In this Section we will discuss methods where data lies on high-dimensional spaces. In particular,
More informationClassifica(on and predic(on omics style. Dr Nicola Armstrong Mathema(cs and Sta(s(cs Murdoch University
Classifica(on and predic(on omics style Dr Nicola Armstrong Mathema(cs and Sta(s(cs Murdoch University Classifica(on Learning Set Data with known classes Prediction Classification rule Data with unknown
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More informationOnline Advertising is Big Business
Online Advertising Online Advertising is Big Business Multiple billion dollar industry $43B in 2013 in USA, 17% increase over 2012 [PWC, Internet Advertising Bureau, April 2013] Higher revenue in USA
More informationCS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS
CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS * Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington Mingon Kang, Ph.D. Computer Science, Kennesaw State University Problems
More informationLDA, QDA, Naive Bayes
LDA, QDA, Naive Bayes Generative Classification Models Marek Petrik 2/16/2017 Last Class Logistic Regression Maximum Likelihood Principle Logistic Regression Predict probability of a class: p(x) Example:
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 9: Logistic regression (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 28 Regression methods for binary outcomes 2 / 28 Binary outcomes For the duration of this lecture suppose
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationClassification Logistic Regression
Announcements: Classification Logistic Regression Machine Learning CSE546 Sham Kakade University of Washington HW due on Friday. Today: Review: sub-gradients,lasso Logistic Regression October 3, 26 Sham
More informationStatistical Methods for SVM
Statistical Methods for SVM Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find a plane that separates the classes in feature space. If we cannot,
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More information