Machine Teaching and its Applications
1 Machine Teaching and its Applications. Jerry Zhu, University of Wisconsin-Madison. Jan. 8.
2 Introduction
3 Machine teaching. Given a target model θ and a learner A, find the best training set D so that A(D) ≈ θ.
4 Passive learning, active learning, teaching
5 Passive learning: with large probability, ‖θ̂ − θ‖ = O(1/n)
6 Active learning: ‖θ̂ − θ‖ = O(2^−n)
7 Machine teaching: any ɛ > 0 is achievable with n = 2
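The three regimes on the slides above can be contrasted on a toy problem. This is a minimal sketch under assumptions not in the talk: a 1-D threshold classifier on [0, 1], a hypothetical learner that returns the midpoint of the innermost oppositely-labeled pair, binary search as the active learner, and a threshold THETA known to the teacher.

```python
import random

THETA = 0.6  # true threshold, assumed known to the teacher

def learner(data):
    """Hypothetical midpoint learner: A(D) = midpoint of the label boundary."""
    negs = [x for x, y in data if y == 0]
    poss = [x for x, y in data if y == 1]
    return (max(negs) + min(poss)) / 2

def passive(n, rng):
    """n i.i.d. labeled examples; error shrinks like O(1/n) typically."""
    data = [(x, int(x >= THETA)) for x in (rng.random() for _ in range(n))]
    data += [(0.0, 0), (1.0, 1)]          # anchor both classes
    return learner(data)

def active(n):
    """Binary search: each query halves the interval, error O(2^-n)."""
    lo, hi = 0.0, 1.0
    for _ in range(n):
        mid = (lo + hi) / 2
        if mid >= THETA:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def teach(eps):
    """The teacher needs only TWO examples, eps-tight around the threshold."""
    return learner([(THETA - eps, 0), (THETA + eps, 1)])

print(abs(passive(100, random.Random(0)) - THETA))  # O(1/n) typically
print(abs(active(10) - THETA))                      # at most 2^-10
print(abs(teach(1e-9) - THETA))                     # ~0 with n = 2
```

The point of the comparison: the passive learner's error decays polynomially, the active learner's exponentially, and the teacher hits any precision with a constant-size training set.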
8 Another example: teaching a hard-margin SVM. Teaching dimension TD = 2 vs. VC dimension VC = d + 1
9 Machine learning vs. machine teaching.
Learning (D given, learn θ̂): θ̂ = argmin_θ Σ_{(x,y)∈D} l(x, y, θ) + λ‖θ‖²
Teaching (θ given, find D): min_{D,θ̂} ‖θ̂ − θ‖² + η‖D‖₀ s.t. θ̂ = argmin_θ Σ_{(x,y)∈D} l(x, y, θ) + λ‖θ‖²
D is not i.i.d.: synthetic or pool-based
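The learning vs. teaching duality above can be made concrete by inverting a learner with a closed form. A sketch, not the talk's construction: 1-D ridge regression, where the teacher solves the learner's optimality condition for a single example (x, y) that makes A(D) hit θ exactly.

```python
# Learning:  D given; theta_hat = argmin_theta sum (y - theta*x)^2 + lam*theta^2
#            has the closed form theta_hat = sum(x*y) / (sum(x^2) + lam).
# Teaching:  theta* given; invert the learner. One example (x, y) with
#            y = theta* * (x^2 + lam) / x already achieves A(D) = theta*.
lam = 0.1
theta_star = 2.0

def ridge_learn(data, lam):
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + lam)

def teach_one(theta_star, lam, x=1.0):
    # note y != theta* * x: the teacher over-shoots to defeat regularization
    return [(x, theta_star * (x * x + lam) / x)]

D = teach_one(theta_star, lam)
print(len(D), ridge_learn(D, lam))  # 1 example, learner lands exactly on theta*
```

This also illustrates why D is "not i.i.d.": the teaching example (1.0, 2.2) does not lie on the target line y = 2x.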
10 Why bother if we already know θ? teach·ing /ˈtiːtʃɪŋ/ noun 1. education 2. controlling 3. shaping 4. persuasion 5. influence maximization 6. attacking 7. poisoning
11 The coding view: message = θ, decoder = learning algorithm A, language = D. The teacher encodes θ as a training set D ∈ A⁻¹(θ); the learner decodes A(D) ≈ θ.
12 Machine teaching, generic form: min_{D,θ̂} TeachingRisk(θ̂) + η TeachingCost(D) s.t. θ̂ = MachineLearning(D)
13 Fascinating things I will not discuss today: probing gray-box learners; teaching by features or pairwise comparisons; learners that anticipate teaching; reward shaping, reinforcement learning, optimal control
14 Machine learning debugging
15 Harry Potter toy example
16 [Scatter plot: education vs. magical heritage.] Labels y contain historical bias: + hired by the Ministry of Magic, o no magical heritage.
17 [Scatter plot: education vs. magical heritage.] Trusted items (x̃, ỹ): expensive, and insufficient on their own to learn from.
18 Idea: flip training labels and re-train so the model agrees with the trusted items. Postcondition: Ψ(θ̂) := [θ̂(x̃) = ỹ]
19 Not our goal: only to learn a better model. min_{θ∈Θ} l(X, Y, θ) + λ‖θ‖ s.t. Ψ(θ) = true
20 Our goal: to find the bugs and learn a better model. min_{Y′,θ̂} ‖Y′ − Y‖ s.t. Ψ(θ̂) = true, θ̂ = argmin_{θ∈Θ} l(X, Y′, θ) + λ‖θ‖
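Before the continuous relaxation on the next slides, the combinatorial objective can be brute-forced on a tiny instance. A sketch under assumptions not in the talk: four 1-D points, a nearest-centroid classifier as the lower-level learner, and one trusted item; we enumerate all label vectors and keep the one satisfying Ψ with the fewest flips.

```python
from itertools import product

X = [0.0, 0.2, 0.8, 1.0]
Y = [1, 1, 0, 1]            # bug-ridden labels (planted for illustration)
trusted = [(0.1, 0)]        # one expensive but clean item

def learn(X, Y):
    """Hypothetical lower-level learner: nearest-centroid classifier."""
    c0 = [x for x, y in zip(X, Y) if y == 0]
    c1 = [x for x, y in zip(X, Y) if y == 1]
    m0, m1 = sum(c0) / len(c0), sum(c1) / len(c1)
    return lambda x: int(abs(x - m1) < abs(x - m0))

def postcondition(model):
    return all(model(tx) == ty for tx, ty in trusted)

assert not postcondition(learn(X, Y))   # the dirty pipeline violates Psi

best = None
for Yp in product([0, 1], repeat=len(X)):
    if len(set(Yp)) < 2:
        continue                        # need both classes to train
    if postcondition(learn(X, list(Yp))):
        flips = sum(a != b for a, b in zip(Yp, Y))
        if best is None or flips < best[0]:
            best = (flips, Yp)

print(best)   # minimal number of flips and the repaired label vector
```

The exhaustive search is exponential in n, which is exactly why the talk relaxes the problem next.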
21 Solving the combinatorial, bilevel optimization (a Stackelberg game).
Step 1: relax each label y_i to a point δ_i on the probability simplex.
Step 2: relax the flip count ‖Y′ − Y‖ to the probability mass (1/n) Σ_{i=1}^n (1 − δ_{i,y_i}).
Step 3: soften the postcondition θ̂(X̃) = Ỹ to the loss (1/m) Σ_{i=1}^m l(x̃_i, ỹ_i, θ̂).
22 Continuous now, but still bilevel:
argmin_{δ,θ̂} (1/m) Σ_{i=1}^m l(x̃_i, ỹ_i, θ̂) + γ (1/n) Σ_{i=1}^n (1 − δ_{i,y_i})
s.t. θ̂ = argmin_θ (1/n) Σ_{i=1}^n Σ_{j=1}^k δ_{ij} l(x_i, j, θ) + λ‖θ‖²
23 Removing the lower-level problem.
Step 4: replace the lower-level problem θ̂ = argmin_θ (1/n) Σ_{i=1}^n Σ_{j=1}^k δ_{ij} l(x_i, j, θ) + λ‖θ‖² by its KKT condition (1/n) Σ_{i=1}^n Σ_{j=1}^k δ_{ij} ∇_θ l(x_i, j, θ) + 2λθ = 0.
Step 5: plug the implicit function θ(δ) into the upper-level problem: argmin_δ (1/m) Σ_{i=1}^m l(x̃_i, ỹ_i, θ(δ)) + γ (1/n) Σ_{i=1}^n (1 − δ_{i,y_i}).
Step 6: compute the gradient ∇_δ with the implicit function theorem.
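Steps 4-6 can be sketched numerically. Assumptions for illustration only: a squared-loss learner whose lower-level problem has a closed form, so the implicit function θ(δ) is explicit; binary labels encoded as a single soft score in [0, 1] instead of a full simplex; and finite differences standing in for the implicit-function-theorem gradient.

```python
# Lower level: theta(delta) = argmin_t (1/n) sum_i (delta_i - t*x_i)^2 + lam*t^2
#            = sum_i delta_i x_i / (sum_i x_i^2 + n*lam)   (closed form).
import numpy as np

X = np.array([0.0, 0.2, 0.8, 1.0])
Y = np.array([1.0, 1.0, 0.0, 1.0])         # buggy soft labels (toy data)
tx, ty = np.array([0.9]), np.array([1.0])  # trusted item
lam, gamma, lr = 0.1, 0.01, 0.5

def theta(delta):                          # the implicit function, explicit here
    return (delta @ X) / (X @ X + len(X) * lam)

def upper(delta):                          # trusted-item loss + flip mass
    return np.mean((ty - theta(delta) * tx) ** 2) \
        + gamma * np.sum(np.abs(delta - Y))

delta = Y.astype(float).copy()
for _ in range(500):                       # gradient descent on the upper level
    g = np.zeros_like(delta)
    for i in range(len(delta)):            # finite-difference gradient
        e = np.zeros_like(delta); e[i] = 1e-5
        g[i] = (upper(delta + e) - upper(delta - e)) / 2e-5
    delta = np.clip(delta - lr * g, 0.0, 1.0)

print(np.round(delta, 2))   # the suspect label (index 2) is flipped toward 1
```

With a real learner the lower level has no closed form, which is why the talk invokes the KKT condition and the implicit function theorem instead.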
24 Software available.
25 [Precision-recall curves on the Harry Potter toy example (predicting magical heritage): our debugger (DUTI) vs. the influence-function (INF), nearest-neighbor (NN), and label-noise-detection (LND) baselines, plus an oracle; average PR reported.]
26 Adversarial Attacks
27 Level 1 attack: test item (x̃, ỹ) manipulation. The model θ̂ is fixed. min_x ‖x − x̃‖_p s.t. θ̂(x) ≠ ỹ.
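For a fixed linear classifier and p = 2, the level 1 attack has a closed form: project x̃ onto the decision hyperplane and overshoot slightly. A sketch with assumed model parameters:

```python
import numpy as np

w, b = np.array([2.0, -1.0]), 0.5      # fixed model theta_hat (assumed)
x_tilde = np.array([1.0, 0.0])         # test item, currently on the + side

def attack(x, w, b, overshoot=1e-6):
    """Smallest ell_2 perturbation crossing the boundary w.x + b = 0."""
    margin = (w @ x + b) / (w @ w)     # signed distance, scaled by ||w||^2
    return x - (1 + overshoot) * margin * w

x_adv = attack(x_tilde, w, b)
print(np.sign(w @ x_tilde + b), np.sign(w @ x_adv + b))  # the label flips
print(np.linalg.norm(x_adv - x_tilde))                   # = |w.x+b| / ||w||
```

The minimal cost equals the classifier's margin on x̃, which is why large-margin models are harder to attack this way.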
28 Level 2 attack: training-set poisoning. min_D ‖D₀ − D‖_p s.t. Ψ(A(D)), e.g. Ψ(θ) := [θ(x̃ + ɛ) = y]
29 Level 2 attack on regression. [Scatter plot (x, y): Lake Mendota, Wisconsin; x = year, y = ice days.]
30 Level 2 attack on regression.
min_{δ,β} ‖δ‖_p s.t. β₁ ≥ 0, β = argmin_{β′} ‖(y + δ) − Xβ′‖²
[Plots: original vs. attacked fits (year vs. ice days) when minimizing ‖δ‖₂² and when minimizing ‖δ‖₁.] [Mei, Z 15a]
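Because the least-squares estimate is linear in the responses, the p = 2 version of this attack has a closed form. A sketch on synthetic data standing in for the ice-days trend (the data, slope target, and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(30, dtype=float)
y = 120.0 - 0.5 * years + rng.normal(0, 2, 30)   # synthetic declining trend

X = np.column_stack([np.ones_like(years), years])
P = np.linalg.pinv(X)     # beta(delta) = P @ (y + delta): linear in delta
a = P[1]                  # slope row, so beta_1 = a @ (y + delta)
target = 0.0              # attacker wants the trend erased

beta1 = a @ y
# minimum-norm delta satisfying the linear constraint a @ (y + delta) = target
delta = a * (target - beta1) / (a @ a)

beta_poisoned = P @ (y + delta)
print(beta_poisoned[1])            # slope forced to ~0
print(np.linalg.norm(delta))       # ell_2 attack cost
```

Swapping the ℓ₂ cost for ℓ₁ concentrates the poison on a few responses, which matches the two panels compared on the slide.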
31 Level 2 attack on latent Dirichlet allocation [Mei, Z 15b]
32 Guess the classification task. Ready?
33 Guess the classification task (1)
34 Guess the classification task (2)
35 Guess the classification task (3)
+ The Angels won their home opener against the Brewers today before 33,000+ at Anaheim Stadium, 3-1 on a 3-hitter by Mark Lan
+ I'm *very* interested in finding out how I might be able to get two tickets for the All Star game in Baltimore this year.
+ I know there's been a lot of talk about Jack Morris horrible start, but what about Dennis Martinez. Last I checked he's 0-3 with 6+ E...
- Where are all the Bruins fans??? Good point - there haven't even been any recent posts about Ulf!
- I agree thouroughly!! Screw the damn contractual agreements! Show the exciting hockey game. They will lose fans of ESPN
- TV Coverage - NHL to blame! Give this guy a drug test, and some Ridalin whale you are at it
36 Did you get it right? (1) gun vs. phone
37 Did you get it right? (2) woman vs. man
38 Did you get it right? (3) 20Newsgroups, soc.religion.christian vs. alt.atheism
+ : T H E  W I T N E S S  &  P R O O F  O F  J E S U S  C H R I S T ' S  R E S U R R E C T I O N  F R O M  T H E  D E A D :
+ I've heard it said that the accounts we have of Christ's life and ministry in the Gospels were actually written many years after
- An Introduction to Atheism by mathew <mathew@mantis.co.uk>
- Computers are an excellent example... of evolution without a creator.
39 Camouflage attack: social engineering against Eve
40 Camouflage attack. Alice knows: S (e.g. women vs. men), C (e.g. digits 7 vs. 1), the learner A, and Eve's inspection function, MMD (maximum mean discrepancy). Alice finds
argmin_D Σ_{(x,y)∈S} l(A(D), x, y) s.t. MMD(D, C) ≤ α
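The constraint above uses MMD as Eve's inspection statistic. A sketch of the (biased) squared-MMD estimate with an RBF kernel; the bandwidth, sample sizes, and Gaussian stand-ins for the payload and cover sets are assumptions for illustration:

```python
import numpy as np

def mmd2(A, B, bandwidth=1.0):
    """Biased estimate of squared MMD with an RBF kernel."""
    def k(U, V):
        d2 = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    return k(A, A).mean() + k(B, B).mean() - 2 * k(A, B).mean()

rng = np.random.default_rng(1)
C = rng.normal(0, 1, (200, 2))         # cover distribution Eve expects
D_close = rng.normal(0, 1, (200, 2))   # camouflaged payload: looks like C
D_far = rng.normal(3, 1, (200, 2))     # naive payload: easily spotted

print(mmd2(D_close, C))   # small: passes the MMD <= alpha inspection
print(mmd2(D_far, C))     # large: rejected by Eve
```

Alice's optimization then trades off teaching loss on S against keeping this statistic below α.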
41 [Plot: 0/1 test error vs. attack budget for (Gun vs. Phone) camouflaged as (5 vs. 2), against two baselines.]
42 Enhance human learning
43 Hedging. 1. Find D to maximize accuracy under cognitive model A. 2. Give humans D: either human performance improves, or cognitive model A gets revised.
44 Human learning example 1 [Patil et al. 2014]. Two classes y = −1, y = 1 on [0, 1] with threshold θ = 0.5; A = kernel density estimator. Human test accuracy: trained on random items 69.8%, trained on D 72.5% (statistically significant).
45 Human learning example 2 [Sen et al., in preparation]. Lewis vs. space-filling molecule drawings; A = neural network. Human test error: trained on random items 28.6%, expert-chosen items 28.1%, D 25.1% (statistically significant).
46 Human learning example 3 [Nosofsky & Sanders, Psychonomics 2017]. A = Generalized Context Model (GCM). Human accuracy: trained on random items 67.2%, coverage 71.2%, D 69.3%. D was not better on humans (the experts are revising the model).
47 Super Teaching
49 Super teaching example 1. Let D ~ iid U(0, 1), A(D) = SVM. Whole training set: O(1/n); most symmetrical pair: O(1/n²). (This is not training-set reduction.)
50 Super teaching example 2. Let D ~ iid N(0, 1), A(D) = (1/|D|) Σ_{x∈D} x. Theorem: fix k. For n sufficiently large, with large probability, min_{S⊆D, |S|=k} |A(S)| ≤ k^k / n^{k−ɛ}, far smaller than the typical |A(D)|.
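A quick Monte-Carlo illustration of example 2 (a sketch, with n, k, and the seed chosen for illustration): for the sample-mean learner, the teacher's best k-subset of an i.i.d. sample estimates the true mean 0 far better than the whole sample does.

```python
import itertools
import random
import statistics

rng = random.Random(0)
n, k = 20, 2
D = [rng.gauss(0, 1) for _ in range(n)]

full_err = abs(statistics.mean(D))         # learner on all of D: ~n^(-1/2)
best_err = min(abs(statistics.mean(S))     # teacher's best k-subset
               for S in itertools.combinations(D, k))

print(full_err, best_err)   # the chosen pair sits much closer to 0
```

The teacher only deletes points, yet typically beats the full sample: with ~n² pairs available, some pair nearly cancels, matching the slide's claim that this is more than training-set reduction.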
51 Thank you. Contact me for the Machine Teaching Tutorial. Collaborators: Security: Scott Alfeld, Paul Barford. HCI: Saleema Amershi, Bilge Mutlu, Jina Suh. Programming languages: Aws Albarghouthi, Loris D'Antoni, Shalini Ghosh. Machine learning: Ran Gilad-Bachrach, Manuel Lopes, Yuzhe Ma, Christopher Meek, Shike Mei, Robert Nowak, Gorune Ohannessian, Philippe Rigollet, Ayon Sen, Patrice Simard, Ara Vartanian, Xuezhou Zhang. Optimization: Ji Liu, Stephen Wright. Psychology: Bradley Love, Robert Nosofsky, Martina Rau, Tim Rogers.
53 Yet another example: teaching a Gaussian density. Teaching dimension TD = d + 1: tetrahedron vertices
54 [Scatter plot: education vs. magical heritage.] Proposed bugs: flipping them makes the re-trained model agree with the trusted items; the proposals are given to experts to interpret.
55 The ML pipeline: data (X, Y), learner l, parameter λ, model θ̂, with θ̂ = argmin_{θ∈Θ} l(X, Y, θ) + λ‖θ‖
56 Postconditions Ψ(θ̂). Examples: the learned model must correctly predict an important item (x̃, ỹ): θ̂(x̃) = ỹ; the learned model must satisfy individual fairness: ∀x, x′, |p(y = 1 | x, θ̂) − p(y = 1 | x′, θ̂)| ≤ L‖x − x′‖
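The fairness postcondition can be checked empirically. A sketch with an assumed logistic model and Lipschitz budget L: probe pairs along the steepest direction w/‖w‖, where the ratio |p(y=1|x) − p(y=1|x′)| / ‖x − x′‖ is largest, and compare the worst ratio against L.

```python
import math

w, b = [6.0, -2.0], 0.2   # learned logistic model (assumed)
L = 1.0                   # fairness Lipschitz budget (assumed)

def p1(x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def worst_ratio(step=0.05):
    """Worst secant slope of p1 along the unit steepest direction."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    u = [wi / norm for wi in w]
    worst, t = 0.0, -2.0
    while t < 2.0:
        x = [t * ui for ui in u]
        xp = [(t + step) * ui for ui in u]
        worst = max(worst, abs(p1(x) - p1(xp)) / step)
        t += step
    return worst

print(worst_ratio() <= L)   # False here: ||w||/4 exceeds L, so Psi fails
```

For a logistic model the exact Lipschitz constant is ‖w‖/4 (the sigmoid's maximum slope is 1/4), so the probe is tightest near the decision boundary.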
57 Bug assumptions: Ψ would be satisfied had we trained through the clean pipeline; bugs are changes to the clean pipeline; Ψ is violated on the dirty pipeline.
58 Debugging formulation: min_{Y′} ‖Y′ − Y‖ s.t. θ̂(X̃) = Ỹ, θ̂ = argmin_{θ∈Θ} (1/n) Σ_{i=1}^n l(x_i, y′_i, θ) + λ‖θ‖². A bilevel optimization (Stackelberg game), and combinatorial.
59 Another special case: a bug in the regularization weight (logistic regression). [Figure: pass/fail vs. score.]
60 Postcondition violated. Ψ(θ̂): individual fairness (a Lipschitz condition): ∀x, x′, |p(y = 1 | x, θ̂) − p(y = 1 | x′, θ̂)| ≤ L‖x − x′‖
61 Bug assumption: the learner's regularization weight λ was inappropriate. θ̂ = argmin_{θ∈Θ} l(X, Y, θ) + λ‖θ‖²
62 Debugging formulation: min_{λ′,θ̂} (λ′ − λ)² s.t. Ψ(θ̂) = true, θ̂ = argmin_{θ∈Θ} l(X, Y, θ) + λ′‖θ‖²
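A sketch of this regularization-weight debugger, with a toy 1-D dataset, a grid search in place of the continuous optimization, and the fairness postcondition checked via the exact Lipschitz constant |w|/4 of a 1-D logistic model (all assumptions for illustration):

```python
import math

X = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]   # toy 1-D data (assumed)
Y = [0, 0, 0, 1, 1, 1]
L = 0.5                                  # fairness Lipschitz budget
lam0 = 0.001                             # original (buggy) regularization weight

def train(lam, steps=1000, lr=0.5):
    """Logistic regression by gradient descent with weight lam."""
    w = 0.0
    for _ in range(steps):
        g = sum((1 / (1 + math.exp(-w * x)) - y) * x
                for x, y in zip(X, Y)) / len(X)
        w -= lr * (g + 2 * lam * w)
    return w

def fair(w):
    """Psi: x -> sigmoid(w*x) is L-Lipschitz iff |w|/4 <= L."""
    return abs(w) / 4 <= L

assert not fair(train(lam0))             # the dirty pipeline violates Psi

candidates = [lam0 + i * 0.01 for i in range(100)]
lam_fix = min((l for l in candidates if fair(train(l))),
              key=lambda l: (l - lam0) ** 2)
print(lam_fix)   # smallest change to lambda that restores the postcondition
```

Minimizing (λ′ − λ)² keeps the suggested fix close to the original pipeline, which is what makes the output interpretable as "the regularization weight was the bug".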
63 Suggested bug 1: change λ to the corrected λ′. [Figure: pass/fail vs. score.]
64 Guaranteed defense? Suppose A(D₀)(x̃) = ỹ. The attacker can use the debugging formulation: D₁ := argmin_D ‖D₀ − D‖_p s.t. Ψ₁(A(D)) := [A(D)(x̃) ≠ ỹ]. The defender can use the debugging formulation too: D₂ := argmin_D ‖D₁ − D‖_p s.t. Ψ₂(A(D)) := [A(D)(x̃) = ỹ]. When does D₂ = D₀?
More informationSupport Vector Machines
Support Vector Machines Reading: Ben-Hur & Weston, A User s Guide to Support Vector Machines (linked from class web page) Notation Assume a binary classification problem. Instances are represented by vector
More informationStatistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.
http://goo.gl/xilnmn Course website KYOTO UNIVERSITY Statistical Machine Learning Theory From Multi-class Classification to Structured Output Prediction Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT
More informationMachine Learning (COMP-652 and ECSE-608)
Machine Learning (COMP-652 and ECSE-608) Lecturers: Guillaume Rabusseau and Riashat Islam Email: guillaume.rabusseau@mail.mcgill.ca - riashat.islam@mail.mcgill.ca Teaching assistant: Ethan Macdonald Email:
More informationLecture Learning infinite hypothesis class via VC-dimension and Rademacher complexity;
CSCI699: Topics in Learning and Game Theory Lecture 2 Lecturer: Ilias Diakonikolas Scribes: Li Han Today we will cover the following 2 topics: 1. Learning infinite hypothesis class via VC-dimension and
More informationQualifying Exam in Machine Learning
Qualifying Exam in Machine Learning October 20, 2009 Instructions: Answer two out of the three questions in Part 1. In addition, answer two out of three questions in two additional parts (choose two parts
More informationCSC 411 Lecture 17: Support Vector Machine
CSC 411 Lecture 17: Support Vector Machine Ethan Fetaya, James Lucas and Emad Andrews University of Toronto CSC411 Lec17 1 / 1 Today Max-margin classification SVM Hard SVM Duality Soft SVM CSC411 Lec17
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More information