Machine Learning & Data Mining CS/CNS/EE 155. Lecture 4: Recent Applications of Lasso
1 Machine Learning & Data Mining CS/CNS/EE 155 Lecture 4: Recent Applications of Lasso 1
2 Today: Two Recent Applications. Cancer Detection, and Personalization via Twitter (music, Biden, soccer, Labour). Applications of Lasso (and related methods): think about the data and modeling goals; some new learning problems. Slide material borrowed from Rob Tibshirani and Khalid El-Arini. Image Sources: http:// https://dl.dropboxusercontent.com/u/ /papers/badgepaper-kdd2013.pdf 2
3 Aside: Convexity. Not convex vs. convex: convexity makes it easy to find global optima! Strictly convex if the second derivative is always > 0. Image Source: http://en.wikipedia.org/wiki/Convex_function 3
4 Aside: Convexity. All local optima are global optima. Strictly convex: unique global optimum. Almost all objectives discussed in this course are (strictly) convex: SVMs, LR, Ridge, Lasso (except ANNs). 4
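The convexity claim for logistic regression can be checked numerically. The sketch below (illustrative, not lecture code; the finite-difference helper is a hypothetical utility) estimates the curvature of the logistic loss as a function of the margin and confirms it is non-negative everywhere sampled:

```python
import numpy as np

# Sketch: estimate curvature of the logistic loss by central finite
# differences; a convex function has non-negative second derivative.
def second_derivative(f, z, h=1e-3):
    return (f(z + h) - 2.0 * f(z) + f(z - h)) / h ** 2

# Logistic loss as a function of the margin m = y * (w^T x - b).
def logistic_loss(m):
    return np.log1p(np.exp(-m))

margins = np.linspace(-10.0, 10.0, 201)
curvatures = np.array([second_derivative(logistic_loss, m) for m in margins])
is_convex_on_grid = bool(np.all(curvatures >= 0.0))
print(is_convex_on_grid)
```

The true second derivative here is sigma(m) * sigma(-m), which is strictly positive, so the logistic loss is strictly convex in the margin.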
5 Cancer Detection 5
6 Molecular assessment of surgical-resection margins of gastric cancer by mass-spectrometric imaging. Proceedings of the National Academy of Sciences (2014). Livia S. Eberlin, Robert Tibshirani, Jialing Zhang, Teri Longacre, Gerald Berry, David B. Bingham, Jeffrey Norton, Richard N. Zare, and George A. Poultsides. http://statweb.stanford.edu/~tibs/gp/canc.pdf. Gastric (stomach) cancer: tissue is either extracted or left in the patient; cell types are Cancer, Stromal, Epithelial; the goal is a normal margin. Workflow: 1. Surgeon removes tissue. 2. Pathologist examines tissue under a microscope. 3. If no clear margin, GOTO step 1. Image Source: http://statweb.stanford.edu/~tibs/gp/canc.pdf 6
7 Drawbacks. Expensive: requires a pathologist. Slow: examination can take up to an hour. Unreliable: in 20%-30% of cases the pathologist can't predict on the spot. (Workflow repeated: surgeon removes tissue; pathologist examines tissue under microscope; if no clear margin, GOTO step 1.) Image Source: http://statweb.stanford.edu/~tibs/gp/canc.pdf 7
8 Machine Learning to the Rescue! (Actually just statistics: Lasso originated in the statistics community, but we machine learners love it!) Basic Lasso: train a model to predict cancerous regions! y ∈ {C, E, S} (how do we predict 3 possible labels?). What is x? What is the loss function? argmin_{w,b} λ ||w||_1 + Σ_{i=1}^N L(y_i, w^T x_i − b) 8
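With squared loss plugged in for L, the objective above is ordinary Lasso regression. A minimal sketch on synthetic data (the dimensions and values are made up, not the actual mass-spec features) shows the sparsity that makes Lasso attractive here:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic regression problem: 200 "pixels", 50 features,
# but only the first 3 features actually carry signal.
rng = np.random.RandomState(0)
X = rng.randn(200, 50)
w_true = np.zeros(50)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.1 * rng.randn(200)

# sklearn's alpha plays the role of lambda in the slide's objective.
model = Lasso(alpha=0.1).fit(X, y)
n_nonzero = np.sum(model.coef_ != 0)
print(n_nonzero)  # far fewer than 50 nonzero weights
```

Because the L1 penalty zeroes out irrelevant weights, the fitted model keeps only a handful of the 50 features.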
9 Mass Spectrometry Imaging. DESI-MSI (Desorption Electrospray Ionization) effectively runs in real time (used to generate x). http://en.wikipedia.org/wiki/Desorption_electrospray_ionization Image Source: http://statweb.stanford.edu/~tibs/gp/canc.pdf 9
10 Each pixel is a data point: x via spectroscopy, y via cell-type label (Cancer, Epithelial, Stromal). For each pixel, x is the spectrum sampled at 11,000 m/z values. Image Source: http://statweb.stanford.edu/~tibs/gp/canc.pdf 10
11 x: spectrum sampled at 11,000 m/z values, so each pixel has 11K features. [Slide visualizes a few of these features across the tissue image.] Image Source: http://statweb.stanford.edu/~tibs/gp/canc.pdf 11
12 Multiclass Prediction. Multiclass y: S = {(x_i, y_i)}_{i=1}^N, x ∈ R^D, y ∈ {1, 2, ..., K}. Most common model: replicate weights, w = (w_1, w_2, ..., w_K) and b = (b_1, b_2, ..., b_K). Score all classes: f(x | w, b) = (w_1^T x − b_1, w_2^T x − b_2, ..., w_K^T x − b_K). Predict via largest score: argmax_k (w_k^T x − b_k). Loss function? 12
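A tiny sketch of this prediction rule (random numbers standing in for learned weights):

```python
import numpy as np

# One weight vector w_k and offset b_k per class; predict the argmax score.
rng = np.random.RandomState(1)
D, K = 4, 3                      # feature dimension, number of classes
W = rng.randn(K, D)              # row k is w_k
b = rng.randn(K)
x = rng.randn(D)

scores = W @ x - b               # f(x | w, b): one score per class
y_hat = int(np.argmax(scores))   # predict via largest score
print(y_hat)
```

Stacking the per-class weights as rows of a matrix turns "score all classes" into a single matrix-vector product.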
13 Multiclass Logistic Regression. Binary LR: P(y | x, w, b) = e^{y(w^T x − b)} / (e^{(w^T x − b)} + e^{−(w^T x − b)}), y ∈ {−1, +1}. Log-linear property: P(y | x, w, b) ∝ e^{y(w^T x − b)}; equivalently, keep one (w_k, b_k) per class with (w_{+1}, b_{+1}) = (−w_{−1}, −b_{−1}). Extension to multiclass: P(y = k | x, w, b) ∝ e^{w_k^T x − b_k}, keeping a (w_k, b_k) for each class. Multiclass LR: P(y = k | x, w, b) = e^{w_k^T x − b_k} / Σ_m e^{w_m^T x − b_m}. Referred to as multinomial log-likelihood by Tibshirani. http://statweb.stanford.edu/~tibs/gp/canc.pdf 13
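The multiclass probability above is just a softmax over the per-class scores. A minimal sketch (made-up values; the max-shift is a standard numerical-stability trick, not from the slides):

```python
import numpy as np

# P(y = k | x) = exp(w_k^T x - b_k) / sum_m exp(w_m^T x - b_m)
def multiclass_lr_prob(W, b, x):
    scores = W @ x - b
    scores = scores - scores.max()   # shift for numerical stability
    e = np.exp(scores)
    return e / e.sum()

rng = np.random.RandomState(2)
W, b, x = rng.randn(3, 4), rng.randn(3), rng.randn(4)
p = multiclass_lr_prob(W, b, x)
print(p)   # three probabilities that sum to 1
```

Subtracting the max score leaves the probabilities unchanged (it cancels in the ratio) but prevents overflow in the exponentials.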
14 Multiclass Log Loss. argmin_{w,b} Σ_i −ln P(y_i | x_i, w, b), where P(y = k | x, w, b) = e^{w_k^T x − b_k} / Σ_m e^{w_m^T x − b_m}, x ∈ R^D, y ∈ {1, 2, ..., K}, w = (w_1, ..., w_K), b = (b_1, ..., b_K). Expanding: −ln P(y | x, w, b) = −w_y^T x + b_y + ln Σ_m e^{w_m^T x − b_m}. Gradient: ∂_{w_k} −ln P(y | x, w, b) = (−1 + P(y = k | x, w, b)) x if y = k, and P(y = k | x, w, b) x if y ≠ k. 14
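The gradient formula can be verified against finite differences. The sketch below (illustrative, not lecture code) drops b for brevity, since an offset can be folded into w via a constant feature:

```python
import numpy as np

# Gradient of the multiclass log loss: d/dw_k [-ln P(y|x)] = (P(k|x) - 1{y==k}) x
def probs(W, x):
    s = W @ x
    s = s - s.max()
    e = np.exp(s)
    return e / e.sum()

def neg_log_lik(W, x, y):
    return -np.log(probs(W, x)[y])

rng = np.random.RandomState(3)
K, D = 3, 5
W, x, y = rng.randn(K, D), rng.randn(D), 1

p = probs(W, x)
analytic = np.outer(p - np.eye(K)[y], x)   # (P(k|x) - 1{y=k}) x for every k

# Central finite differences over every entry of W.
numeric = np.zeros_like(W)
h = 1e-6
for k in range(K):
    for d in range(D):
        Wp, Wm = W.copy(), W.copy()
        Wp[k, d] += h
        Wm[k, d] -= h
        numeric[k, d] = (neg_log_lik(Wp, x, y) - neg_log_lik(Wm, x, y)) / (2 * h)

print(np.max(np.abs(analytic - numeric)))  # agreement to high precision
```

The two gradients agree, confirming the sign pattern on the slide: the gradient pushes w_y's score up and every other class's score down, in proportion to the predicted probabilities.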
15 Multiclass Log Loss (intuition). Suppose x = 1 and ignore b, so the model score for class k is just w_k. Vary one weight w_k while holding the others at 1, and plot the log loss: it decreases in w_k when y = k and increases when y ≠ k. Recall −ln P(y | x, w, b) = −w_y^T x + b_y + ln Σ_m e^{w_m^T x − b_m}, with gradient ∂_{w_k} −ln P = (−1 + P(y = k | x, w, b)) x if y = k, and P(y = k | x, w, b) x if y ≠ k. 15
16 Lasso Multiclass Logistic Regression: argmin_{w,b} λ ||w||_1 + Σ_i −ln P(y_i | x_i, w, b), x ∈ R^D, y ∈ {1, 2, ..., K}, where ||w||_1 = Σ_k ||w_k||_1 = Σ_k Σ_d |w_kd|, w = (w_1, ..., w_K), b = (b_1, ..., b_K). A probabilistic model with sparse weights. 16
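This model is available off the shelf: scikit-learn's multinomial logistic regression supports an L1 penalty with the 'saga' solver. A sketch on synthetic stand-in data (not the actual {Cancer, Epithelial, Stromal} pixels; sizes and constants are made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic 3-class problem where only 3 of 40 features matter.
rng = np.random.RandomState(0)
X = rng.randn(300, 40)
w_true = np.zeros((3, 40))
w_true[0, 0], w_true[1, 1], w_true[2, 2] = 3.0, 3.0, 3.0
y = np.argmax(X @ w_true.T + 0.5 * rng.randn(300, 3), axis=1)

# penalty='l1' + solver='saga' gives the Lasso-penalized multinomial model;
# small C means strong regularization (C is the inverse of lambda).
clf = LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=5000)
clf.fit(X, y)
sparsity = np.mean(clf.coef_ == 0)
print(sparsity)   # a large fraction of weights are exactly zero
```

The exact zeros are what make manual inspection of the learned weights feasible, as the later slide notes.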
17 Back to the Problem. Image the tissue samples: each pixel is an x with 11K features via mass spec (the spectrum sampled at 11,000 m/z values), computable in real time, one prediction per pixel. y comes via lab results, with roughly a 2-week turnaround. [Slide shows a visualization of all pixels for one feature.] 17
18 Learn a Predictive Model. Training set: 28 tissue samples from 14 patients; cross-validation to select λ. Test set: 21 tissue samples from 9 patients. Test performance: predictions made with a 0.2 margin in probability, compared against pathology. [Agreement tables: predicted Cancer / Epithelium / Stroma / Don't know vs. pathology, and Cancer vs. Normal, with per-class and overall agreement percentages; the exact counts are not recoverable from this transcription.] 18
19 Lasso yields sparse weights! (Manual inspection feasible!) Many features are correlated; Lasso tends to focus on one of them. http://cshprotocols.cshlp.org/content/2008/5/pdb.prot 19
20 Extension: Local Linearity. P(y | x, w, b) = e^{w_y^T x − b_y} / Σ_m e^{w_m^T x − b_m} assumes the probability shifts along a straight line in x, which is often not true. Approach: cluster based on x, then train a customized model for each cluster. Error rates (per patient, then overall): standard training 0.29%, 4.56%, 6.78%, 0.00%, 13.76%, 2.77%, overall 3.58%; customized training 0.71%, 1.89%, 0.82%, 0.40%, 9.43%, 0.92%, overall 1.89%. http://statweb.stanford.edu/~tibs/gp/canc.pdf 20
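A minimal sketch of this customized-training idea (the clustering method and data here are assumptions for illustration, not the paper's setup): cluster points by x, then fit a separate logistic regression per cluster, so each local model only needs to be linear within its own region.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Two well-separated groups of points with different local decision rules.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(150, 2) + c for c in ([0, 0], [6, 6])])
y = np.hstack([(X[:150, 0] > 0), (X[150:, 1] < 6)]).astype(int)

# Cluster on x alone, then train one model per cluster.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
models = {c: LogisticRegression().fit(X[clusters == c], y[clusters == c])
          for c in np.unique(clusters)}

# Predict each point with its own cluster's model.
pred = np.array([models[c].predict(x.reshape(1, -1))[0]
                 for c, x in zip(clusters, X)])
print(np.mean(pred == y))
```

A single global linear model cannot represent both local boundaries at once, while the per-cluster models handle each region easily.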
21 Recap: Cancer Detection. Seems awesome! What's the catch? Small sample size: tested on only 9 patients. Machine learning is only part of the solution: infrastructure investment is needed, and the scientific legitimacy must be analyzed. Social/political/legal questions: if there is a misprediction, who is at fault? 21
22 Personalization via Twitter (music, Biden, soccer, Labour) 22
23 Representing Documents Through Their Readers. Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (2013). Khalid El-Arini, Min Xu, Emily Fox, Carlos Guestrin. https://dl.dropboxusercontent.com/u/ /papers/badgepaper-kdd2013.pdf. Overloaded by news: 1 million news articles and blog posts generated every hour.* 23
24 News Recommendation Engine: user, corpus. Vector representation: bag of words, LDA topics, etc. 24
25 Challenge. The most common representations don't naturally line up with user interests. Fine-grained representations (bag of words) are too specific. High-level topics (e.g., from a topic model) are too fuzzy and/or vague, and can be inconsistent over time. 25
26 Goal: improve recommendation performance through a more natural document representation. 26
27 An Opportunity: News is Now Social. In 2012, the Guardian announced that more readers visit its site via Facebook than via Google search. 27
28 badges 28
29 Approach: learn a document representation based on how readers publicly describe themselves. 29
30 30
31 Using many tweets, can we learn that someone who identifies with "music" via profile badges reads articles with these words? 31
32 Given: a training set of tweeted news articles from a specific period of time. 1. Learn a badge dictionary from the training set (e.g., "music"; 3 million articles; a words × badges matrix). 2. Use the badge dictionary to encode new articles. 32
33 Advantages: Interpretable. Clear labels. Correspond to user interests. Higher-level than words. 33
35 Advantages: Interpretable. Clear labels. Correspond to user interests. Higher-level than words. Semantically consistent over time (e.g., the "politics" badge). 35
36 Given: a training set of tweeted news articles from a specific period of time. 1. Learn a badge dictionary from the training set (e.g., "music"; 3 million articles; a words × badges matrix). 2. Use the badge dictionary to encode new articles. 36
37 Training Data: Dictionary Learning. S = {(z_i, y_i)}_{i=1}^N, where z_i identifies the badges in the Twitter profile of the tweeter and y_i is the bag-of-words representation of the document. Example: y contains words like album, Fleetwood Mac, love, Nicks; z contains badges gig, music, cycling, linux. 37
38 Training Objective: Dictionary Learning. argmin_{B,W} λ_B ||B||_1 + λ_W ||W||_1 + Σ_{i=1}^N ||y_i − B W_i||^2, where S = {(z_i, y_i)}_{i=1}^N, z_i identifies badges in the Twitter profile of the tweeter, and y_i is the bag-of-words representation of the document. B is the dictionary (words × badges) and W is the encoding (badges × articles). 38
argmin_{B,W} λ_B ||B||_1 + λ_W ||W||_1 + Σ_{i=1}^N ||y_i − B W_i||^2, with dictionary B (words × badges) and encoding W (badges × articles). Not convex (because of the BW term)! But it is convex if we optimize only B or only W (not both), so use alternating optimization between B and W. How to initialize? Use S = {(z_i, y_i)}_{i=1}^N: initialize the encoding W_i from the badge indicators z_i (e.g., gig, music, cycling, linux). 39
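The alternating scheme can be sketched in a few lines. This is a minimal illustration under assumptions (synthetic sizes, the graph term omitted, random initialization instead of the z-based one, and sklearn's Lasso standing in as the per-subproblem solver):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Each subproblem is convex: solve for B with W fixed, then for W with B
# fixed, and repeat.  Y collects the article word vectors y_i as columns,
# so the data term is ||Y - B W||^2 summed over articles.
rng = np.random.RandomState(0)
n_words, n_badges, n_articles = 30, 5, 60
B_true = rng.exponential(1.0, (n_words, n_badges)) * (rng.rand(n_words, n_badges) < 0.3)
W_true = rng.exponential(1.0, (n_badges, n_articles)) * (rng.rand(n_badges, n_articles) < 0.5)
Y = B_true @ W_true + 0.01 * rng.randn(n_words, n_articles)

B = rng.rand(n_words, n_badges)
W = rng.rand(n_badges, n_articles)
recon_before = np.sum((Y - B @ W) ** 2)

for _ in range(10):
    # min_B ||Y - B W||^2 + lam ||B||_1: one Lasso per word (row of Y)
    B = Lasso(alpha=0.01, fit_intercept=False, max_iter=5000).fit(W.T, Y.T).coef_
    # min_W ||Y - B W||^2 + lam ||W||_1: one Lasso per article (column of Y)
    W = Lasso(alpha=0.01, fit_intercept=False, max_iter=5000).fit(B, Y).coef_.T

recon_after = np.sum((Y - B @ W) ** 2)
print(recon_after < recon_before)
```

Each half-step is an ordinary Lasso fit, which is why the non-convex joint problem can still be attacked with standard convex solvers.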
argmin_{B,W} λ_B ||B||_1 + λ_W ||W||_1 + Σ_{i=1}^N ||y_i − B W_i||^2. Suppose badge s often co-occurs with badge t, so the dictionary columns B_s and B_t are correlated. From the perspective of W, the columns of B are features, and Lasso tends to focus on just one of a group of correlated features. Yet many articles might be about "gig", "festival", and "music" simultaneously. 40
Suppose badge s often co-occurs with badge t, so B_s and B_t are correlated; from the perspective of W, the columns of B are features, and Lasso tends to focus on one correlated feature. Fix: Graph-Guided Fused Lasso: argmin_{B,W} λ_B ||B||_1 + λ_W ||W||_1 + λ_G Σ_{(s,t)∈E(G)} ω_st Σ_{i=1}^N |W_is − W_it| + Σ_{i=1}^N ||y_i − B W_i||^2, where the graph G of related badges is built from badge co-occurrence rates on Twitter profiles. 41
Encoding New Articles. The badge dictionary B is already learned. Given a new document j with word vector y_j, learn the badge encoding W_j: argmin_{W_j} λ_W ||W_j||_1 + λ_G Σ_{(s,t)∈E(G)} ω_st |W_js − W_jt| + ||y_j − B W_j||^2. 42
43 Recap: Badge Dictionary Learning. 1. Learn a badge dictionary (words × badges) from the training set (e.g., "music"). 2. Use the badge dictionary to encode new articles. 43
44 Examining B: top words learned for badges such as music, Biden, soccer, and Labour (September 2012). 44
45 Badges Over Time: word lists for the music and Biden badges compared across time periods (September 2012 onward). 45
46 A Spectrum of Pundits. TCOT stands for "top conservatives on Twitter". Limit the badges to "progressive" and "TCOT": can we predict the political alignments of likely readers? Take all articles by each columnist and look at the average encoding score, progressive vs. TCOT. Columnists placed along the spectrum toward more conservative: Dowd, Krugman, Parker, Rich, Kristof, Kristol, Klein, Goldberg, Brooks, Friedman, Zakaria, Ignatius, Krauthammer, Coulter. 46
47 User Study. Which representation best captures user preferences over time? Study on Amazon Mechanical Turk with 112 users. 1. Show users 20 random articles from the Guardian, from time period 1, and obtain ratings. 2. Pick a random representation: bag of words, high-level topics, or badges. 3. Represent user preferences as the mean of liked articles. 4. Go to the next time period, recommend according to preferences, and GOTO step 2. 47
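Step 3's preference model and the recommendation step can be sketched as follows (a minimal illustration with made-up vectors; cosine similarity is an assumption here, since the slide does not name the similarity measure):

```python
import numpy as np

# Represent the user as the mean of liked-article vectors, then recommend
# by similarity in whatever representation is in use (bag of words,
# high-level topics, or badges).
rng = np.random.RandomState(0)
articles = rng.rand(50, 6)              # 50 articles, 6-dim encoding
liked = articles[[3, 7, 9]]             # articles the user rated up
profile = liked.mean(axis=0)            # user preference vector

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

sims = np.array([cosine(profile, a) for a in articles])
recommended = np.argsort(-sims)[:5]     # top-5 for the next time period
print(recommended)
```

The same loop works for every representation being compared; only the article vectors change, which is what makes the study a clean comparison of representations.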
48 User Study results: number of liked articles (higher is better) for bag of words (tf-idf), high-level topics (LDA), and badges. [Bar chart; badges performed best.] 48
49 Recap: Personalization via Twitter. Sparse dictionary learning: learn a new representation of articles and encode articles using the dictionary. Better than bag of words; better than high-level topics. Based on social data: badges on the Twitter profile of whoever is tweeting, capturing semantics not directly evident from the text alone. 49
50 Next Week: Sequence Prediction, Hidden Markov Models, Conditional Random Fields. Homework 1 due Tuesday via Moodle. 50
More informationCSCE 478/878 Lecture 6: Bayesian Learning
Bayesian Methods Not all hypotheses are created equal (even if they are all consistent with the training data) Outline CSCE 478/878 Lecture 6: Bayesian Learning Stephen D. Scott (Adapted from Tom Mitchell
More informationMixtures of Gaussians continued
Mixtures of Gaussians continued Machine Learning CSE446 Carlos Guestrin University of Washington May 17, 2013 1 One) bad case for k-means n Clusters may overlap n Some clusters may be wider than others
More informationLatent Dirichlet Allocation
Outlines Advanced Artificial Intelligence October 1, 2009 Outlines Part I: Theoretical Background Part II: Application and Results 1 Motive Previous Research Exchangeability 2 Notation and Terminology
More informationConvex optimization COMS 4771
Convex optimization COMS 4771 1. Recap: learning via optimization Soft-margin SVMs Soft-margin SVM optimization problem defined by training data: w R d λ 2 w 2 2 + 1 n n [ ] 1 y ix T i w. + 1 / 15 Soft-margin
More informationReplicated Softmax: an Undirected Topic Model. Stephen Turner
Replicated Softmax: an Undirected Topic Model Stephen Turner 1. Introduction 2. Replicated Softmax: A Generative Model of Word Counts 3. Evaluating Replicated Softmax as a Generative Model 4. Experimental
More informationWhy Should I Trust You? Speaker: Philip-W. Grassal Date: 05/03/18
Why Should I Trust You? Speaker: Philip-W. Grassal Date: 05/03/18 Why should I trust you? Why Should I Trust You? - Philip Grassal 2 Based on... Title: Why Should I Trust You? Explaining the Predictions
More informationMachine Learning for Signal Processing Bayes Classification and Regression
Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For
More informationProbabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier
More informationInformation Extraction from Text
Information Extraction from Text Jing Jiang Chapter 2 from Mining Text Data (2012) Presented by Andrew Landgraf, September 13, 2013 1 What is Information Extraction? Goal is to discover structured information
More informationQuan&fying Uncertainty. Sai Ravela Massachuse7s Ins&tute of Technology
Quan&fying Uncertainty Sai Ravela Massachuse7s Ins&tute of Technology 1 the many sources of uncertainty! 2 Two days ago 3 Quan&fying Indefinite Delay 4 Finally 5 Quan&fying Indefinite Delay P(X=delay M=
More informationSupport vector machines Lecture 4
Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The
More informationCS 570: Machine Learning Seminar. Fall 2016
CS 570: Machine Learning Seminar Fall 2016 Class Information Class web page: http://web.cecs.pdx.edu/~mm/mlseminar2016-2017/fall2016/ Class mailing list: cs570@cs.pdx.edu My office hours: T,Th, 2-3pm or
More informationDetec%ng and Analyzing Urban Regions with High Impact of Weather Change on Transport
Detec%ng and Analyzing Urban Regions with High Impact of Weather Change on Transport Ye Ding, Yanhua Li, Ke Deng, Haoyu Tan, Mingxuan Yuan, Lionel M. Ni Presenta;on by Karan Somaiah Napanda, Suchithra
More informationCS 540: Machine Learning Lecture 1: Introduction
CS 540: Machine Learning Lecture 1: Introduction AD January 2008 AD () January 2008 1 / 41 Acknowledgments Thanks to Nando de Freitas Kevin Murphy AD () January 2008 2 / 41 Administrivia & Announcement
More informationMachine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 2: Linear Classification Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d.
More informationLogistic Regression Introduction to Machine Learning. Matt Gormley Lecture 9 Sep. 26, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 9 Sep. 26, 2018 1 Reminders Homework 3:
More informationCOMS 4771 Introduction to Machine Learning. James McInerney Adapted from slides by Nakul Verma
COMS 4771 Introduction to Machine Learning James McInerney Adapted from slides by Nakul Verma Announcements HW1: Please submit as a group Watch out for zero variance features (Q5) HW2 will be released
More informationLecture 12. Neural Networks Bastian Leibe RWTH Aachen
Advanced Machine Learning Lecture 12 Neural Networks 10.12.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization
More informationMachine Learning, Fall 2012 Homework 2
0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0
More information