Adversarial Machine Learning: Big Data Meets Cyber Security
1 Adversarial Machine Learning: Big Data Meets Cyber Security. Bowei Xi, Department of Statistics, Purdue University. With Murat Kantarcioglu.
2 Introduction. Many adversarial learning problems arise in practice: image classification/object recognition, intrusion detection, fraud detection, spam detection, and social networks. The adversary adapts to avoid being detected, so new solutions are needed to address this challenge.
3 Million ways to write a spam. Example of an obfuscated spam email:
From: "Ezra Martens" <ezrabngktbbem...
To: "Eleftheria Marconi" <clifton@pu...
Subject: shunless Phaxrrmaceutical
Date: Fri, 30 Sep
The body interleaves words column by column to defeat keyword filters; read vertically it advertises "Ordering / Shipping / Prrices / Delivery / Confidentiality" for drugs such as VIAGRA, AMBIEN, CIALIS, LEVITRA, VALIUM, and XANAX, with deliberate misspellings ("Phaxrrmaceutical", "informmation") throughout.
4 Understanding Adversarial Learning. It is not concept drift, and it is not online learning: the adversary changes the objects under its control to avoid being detected. This is a game between the defender and the adversary.
5 Solution Ideas. Constantly update the learning algorithm: Dalvi et al., Adversarial classification, KDD 2004. Optimize worst-case performance: Globerson and Roweis, Nightmare at test time: robust learning by feature selection, ICML 2006. Goals: How to evaluate a defensive learning algorithm? How to construct a resilient defensive learning algorithm?
6 Game Theoretic Framework. Often a classifier is updated after the defender observes the adversary's actions, as with spam filters. Adversarial Stackelberg Game: two players take sequential actions.
1. The adversary moves first by choosing a transformation T, which turns the bad-class density $f_b(x)$ into $f_b^T(x)$.
2. After observing T, the defender sets parameter values for a learning algorithm and creates a defensive rule $h(x)$.
Overall the population is a mixture distribution with two classes: $f(x) = p_g f_g(x) + p_b f_b^T(x)$, with $p_g + p_b = 1$.
7 Game Theoretic Framework. The adversary's payoff is defined as
$u_b(T, h) = \int_{L(h,g)} g(T, x)\, f_b^T(x)\, dx$,
where $g(T, x)$ is the profit from a bad object x being classified as a good one after transformation T has been applied (there is a penalty for transformation), and $L(h, g)$ is the region where objects are identified as legitimate under decision rule $h(x)$. Let $C(T, h)$ be the misclassification cost; the defender's payoff is $u_g(T, h) = -C(T, h)$. $h_T(x)$ is the defender's best response facing T.
8 Game Theoretic Framework. The adversary's gain from applying transformation T is
$W(T) = u_b(T, h_T) = \int_{L(h_T,g)} g(T, x)\, f_b^T(x)\, dx = E_{f_b^T}\big[ I_{\{L(h_T,g)\}}(x)\, g(T, x) \big]$.
$(T^e, h_{T^e})$ is a subgame perfect equilibrium such that $T^e = \arg\max_{T \in S} W(T)$. After an equilibrium is reached, each player has no incentive to change its action.
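To make the equilibrium search concrete, here is a minimal sketch (not the authors' code) in one dimension, assuming $f_g = N(1,1)$ and $f_b = N(3,1)$ as in the example on the next slide, equal priors, a constant profit k per evading object, and a linear penalty a per unit of mean shift; the defender's best response is taken to be the Bayes midpoint rule for equal-variance Gaussians.

```python
import numpy as np
from scipy.stats import norm

mu_g, mu_b, sigma = 1.0, 3.0, 1.0
k, a = 1.0, 0.3          # profit and linear transformation penalty (assumed)

def adversary_gain(s):
    """W(T) when T shifts the bad-class mean toward mu_g by s."""
    mu_b_T = mu_b - s
    # Defender's best response h_T: with equal priors and equal variances,
    # the Bayes threshold is the midpoint of the two class means.
    threshold = (mu_g + mu_b_T) / 2.0
    # Probability a transformed bad object lands in the legitimate region
    # L(h_T, g), times the penalized profit g(T, x) = k - a*s.
    p_evade = norm.cdf(threshold, loc=mu_b_T, scale=sigma)
    return (k - a * s) * p_evade

# Exhaustive search over a grid of transformations for T^e = argmax W(T).
grid = np.linspace(0.0, mu_b - mu_g, 201)
gains = [adversary_gain(s) for s in grid]
s_e = grid[int(np.argmax(gains))]
print(f"equilibrium shift s^e = {s_e:.3f}, W(T^e) = {max(gains):.4f}")
```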
9 Game Theoretic Framework. Attribute selection is important in adversarial learning. Example: Gaussian mixture distribution, naive Bayes classifier, and a linear penalty a for transformation.

Attribute   π_g       π_b
X_1         N(1, 1)   N(3, 1)
X_2         N(1, 1)   N(3.5, 1)
X_3         N(1, 1)   N(4, 1)

(The slide's table also lists each attribute's penalty a and equilibrium Bayes error; those numeric values did not survive transcription.)
10 Game Theoretic Framework. Discretize the continuous numerical variables. Use the product of one-dimensional marginal distributions to approximate the joint distribution. Search for an equilibrium using linear programming.
Case study: Lending Club data. Lending Club is a peer-to-peer financing platform offering small loans starting at $1000. Except for the credit report, many fields are easy to lie about, such as job information, purpose of the loan, etc.
11 Game Theoretic Framework. We use the following attributes from the data set:
X_1 Amount requested
X_2 Loan purpose
X_3 Debt-to-income ratio
X_4 Home ownership (any, none, rent, own, mortgage)
X_5 Monthly income
X_6 FICO range
X_7 Open credit lines
X_8 Total credit lines
X_9 Revolving credit balance
X_10 Revolving line utilization
X_11 Inquiries in the last 6 months
X_12 Accounts now delinquent
X_13 Delinquent amount
X_14 Delinquencies in the last 2 years
X_15 Months since last delinquency
12 Game Theoretic Framework. The instances originally are classified as: 1) Removed; 2) Loan is being issued; 3) Late ( days); 4) Late (16-30 days); 5) Issued; 6) In review; 7) In grace period; 8) In funding; 9) Fully paid; 10) Expired; 11) Default; 12) Current; 13) Charged off; 14) New. After data cleaning we set $p_g = p_b = 0.5$. Numerical attributes are discretized using 10 equi-width buckets. We use a naive Bayes classifier with independent attributes, and the attacker transforms each attribute independently.
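A minimal sketch of the discretization step, assuming plain NumPy and illustrative data rather than the actual Lending Club file:

```python
import numpy as np

rng = np.random.default_rng(0)
income = rng.lognormal(mean=8.0, sigma=0.5, size=1000)  # stand-in attribute

n_buckets = 10
edges = np.linspace(income.min(), income.max(), n_buckets + 1)
# np.digitize maps each value to a bucket index in 1..n_buckets; the clip
# keeps the maximum value inside the last equi-width bucket.
buckets = np.clip(np.digitize(income, edges), 1, n_buckets)

# Class-conditional marginals p_i = P(X = w_i | class) are then estimated
# by counting, one attribute at a time (naive Bayes independence).
p_hat = np.bincount(buckets, minlength=n_buckets + 1)[1:] / len(buckets)
print(p_hat.round(3))
```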
13 Game Theoretic Framework. The adversary's optimal transformation solves
$\max \;\; \sum_{i=1}^m I_{\{y_i < q_i\}}\, y_i u_i \; - \; \sum_{i=1}^m \sum_{j=1}^m f_{ij} d_{ij}$
subject to
(1) $f_{ij} \ge 0$, $i = 1, \ldots, m$, $j = 1, \ldots, m$;
(2) $\sum_{j=1}^m f_{ij} = p_i$, $i = 1, \ldots, m$;
(3) $\sum_{i=1}^m f_{ij} = y_j$, $j = 1, \ldots, m$;
(4) $\sum_{j=1}^m y_j = 1$.
Here $p_i$ is the probability of $x = w_i$ given that the instance is in the bad class; $q_i$ is the probability of $x = w_i$ given that the instance is in the good class; $f_{ij}$ is the proportion moved from $w_i$ to $w_j$ by the adversary; $d_{ij}$ is the penalty of transforming X from $w_i$ to $w_j$; $u_i$ is the expected profit; and $y_i$ is the probability of $x = w_i$ after transformation.
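The indicator $I_{\{y_i < q_i\}}$ makes the objective above nonlinear; one way to expose the linear-programming core is to fix the acceptance region in advance and solve the resulting transportation-style LP. The sketch below does that with scipy and toy numbers (not the Lending Club values); the fixed `accept` set is an assumption.

```python
import numpy as np
from scipy.optimize import linprog

m = 4
p = np.array([0.4, 0.3, 0.2, 0.1])   # bad-class marginals p_i
u = np.array([1.0, 1.0, 1.0, 1.0])   # expected profits u_i
d = 0.2 * np.abs(np.subtract.outer(np.arange(m), np.arange(m)))  # d_ij
accept = np.array([0, 0, 1, 1])      # assumed acceptance region {j : y_j < q_j}

# Decision variables f_ij, flattened row-major. linprog minimizes, so the
# per-cell cost is d_ij minus the profit earned when w_j is accepted.
c = d - (accept * u)[None, :]
A_eq = np.zeros((m, m * m))
for i in range(m):
    A_eq[i, i * m:(i + 1) * m] = 1.0  # sum_j f_ij = p_i
res = linprog(c.ravel(), A_eq=A_eq, b_eq=p, bounds=(0, None))

f = res.x.reshape(m, m)
y = f.sum(axis=0)                    # transformed marginals y_j
print(np.round(f, 3), np.round(y, 3))
```

In the full procedure one would check that the resulting y is consistent with the assumed acceptance region and search over candidate regions.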
14 Game Theoretic Framework. The worst case scenario has transformation cost 0. For the penalized case, we assume improving an attribute by one level costs $500 (results do not change in the $300-$600 range). Attribute selection (choosing the top 6 attributes according to one-dimensional classification accuracy after transformation) works well. Equilibrium performance is a good indicator of the long-term success of a learning algorithm, and the penalty prevents aggressive attacks.

Experiment Type                             Accuracy
Initial data set prior to transformation    89.95%
Worst case scenario                         61.25%
Penalized transformation scenario           72.28%
Attribute selection scenario                70.92%
15 Leader vs. Follower. How does the defender being a leader versus being a follower affect its equilibrium strategies? There is one defender with utility D(h) and m adversaries, each with utility $A_i(t_i)$; h and the $t_i$ are the players' strategies.
16 Leader vs. Follower. Defender being the leader: a one-leader-m-follower game.
1. For a fixed leader strategy h, assume the m adversaries' (i.e., the followers') strategies are the attacks $T = (t_1, \ldots, t_m)$. For the i-th adversary, further assume all other adversaries' strategies $t_j$, $j \ne i$, are fixed, and solve
$t_i^h = \arg\max_{t_i \in S_i} A_i(t_i, h)$.
2. With $T^h = (t_1^h, \ldots, t_m^h)$ the m adversaries' joint optimal attack for a given defender strategy h, the defender solves another optimization problem:
$h^e = \arg\max_{h \in H} D(t_1^h, \ldots, t_m^h, h)$.
$(h^e, t_1^{h^e}, \ldots, t_m^{h^e})$ is an equilibrium strategy profile for all players in the game.
17 Leader vs. Follower. Defender being the follower: an m-leader-one-follower game.
1. Given the joint attacks $T = (t_1, \ldots, t_m)$ from the m adversaries, solve for the defender's optimal strategy
$h_T = \arg\max_{h \in H} D(t_1, \ldots, t_m, h)$.
2. With $h_T$ as the defender's optimal response to joint attacks T, solve for the optimal joint attacks
$T^e = (t_1^e, \ldots, t_m^e) = \arg\max_{t_i \in S_i \,\forall i} \sum_{i=1}^m A_i(t_i, h_T)$.
$(h_{T^e}, t_1^e, \ldots, t_m^e)$ is an equilibrium strategy profile for all players in the game.
18 Leader vs. Follower. Gaussian Mixture Population. The defender controls the normal population $N_p(\mu_g, \Sigma_g)$. Each adversary controls a population $N_p(\mu_i^b, \Sigma_i^b)$. For an attack t, adversarial objects move toward $\mu_g$, the center of the normal population:
$X_t = \mu_g + (1-t)(X - \mu_g)$, $0 \le t \le 1$; t = 1 is the strongest attack.
The adversary population after the attack is $N((1-t)\mu^b + t\mu_g,\, (1-t)^2 \Sigma^b)$. The m adversaries' strategy is the joint attack $(t_1, \ldots, t_m)$. The defender's strategy is to build a defensive wall around the center $\mu_g$.
19 Leader vs. Follower. A Euclidean defensive wall is an ellipsoid-shaped defensive region
$(X - \hat\mu_g)^\top S_g^{-1} (X - \hat\mu_g) \le \chi^2_p(\alpha)$,
where $0 < \alpha < 1$ controls the size. It is an approximate confidence region for a multivariate normal. A Manhattan defensive wall forms a diamond-shaped region
$\sum_{i=1}^p |X_i - \hat\mu_i^g| \,/\, \hat\sigma_i^g \le \eta$,
where $\hat\sigma_i^g$ is the sample standard deviation, $\hat\mu_i^g$ is the sample mean, and η controls the size.
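As a sketch, the two walls reduce to simple membership tests. The following assumes estimates computed from the normal (good) training objects and uses illustrative data:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
normal = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=500)
mu_hat = normal.mean(axis=0)
S = np.cov(normal, rowvar=False)
S_inv = np.linalg.inv(S)
sigma_hat = normal.std(axis=0, ddof=1)
p = normal.shape[1]

def inside_euclidean_wall(x, alpha=0.9):
    """Ellipsoid wall: (x - mu)' S^{-1} (x - mu) <= chi^2_p(alpha)."""
    dev = x - mu_hat
    return dev @ S_inv @ dev <= chi2.ppf(alpha, df=p)

def inside_manhattan_wall(x, eta=2.5):
    """Diamond wall: sum_i |x_i - mu_i| / sigma_i <= eta (eta assumed)."""
    return np.sum(np.abs(x - mu_hat) / sigma_hat) <= eta

x_new = np.array([0.5, -0.2])
print(inside_euclidean_wall(x_new), inside_manhattan_wall(x_new))
```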
20 Leader vs. Follower. Obtain η(α) as a function of α: using the sample mean vector $\hat\mu_g$ and variance-covariance matrix $S_g$ from the normal objects, generate a large sample from $N(\hat\mu_g, S_g)$. For a given η(α), (100α)% of the generated sample points fall into the diamond-shaped Manhattan region with vertices at $\hat\mu_i^g \pm \eta(\alpha)\hat\sigma_i^g$ on each dimension.
A defender's strategy is to choose h = α or h = η(α), with defender utility D(α) or D(η(α)). Let c be the misclassification cost; then $D(\alpha) = -100(\text{error-of-normal} + c \cdot \text{error-of-adversary})$. An adversary's utility is $L(t) = E\big( \max\{\, k - a\|X_t - X\|^2,\; 0 \,\} \big)$, where k is the maximum utility of an adversarial object and $\|X_t - X\|^2$ measures how far an adversarial object is moved toward the normal population. Exhaustive search is used to find the equilibria of the games.
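A minimal sketch of the Monte Carlo calibration of η(α) described above, under the same illustrative Gaussian assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
mu_hat = np.zeros(2)
S = np.array([[1.0, 0.3], [0.3, 1.0]])
sigma_hat = np.sqrt(np.diag(S))

def eta_of_alpha(alpha, n=200_000):
    """Smallest eta whose Manhattan region captures (100*alpha)% of a
    large sample drawn from N(mu_hat, S)."""
    sample = rng.multivariate_normal(mu_hat, S, size=n)
    scores = np.sum(np.abs(sample - mu_hat) / sigma_hat, axis=1)
    # The alpha-quantile of the standardized L1 scores is the wall size.
    return np.quantile(scores, alpha)

for alpha in (0.6, 0.9, 0.95):
    print(alpha, round(float(eta_of_alpha(alpha)), 3))
```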
21 Leader vs. Follower Euclidean defensive wall. Left is defender being the leader; right is defender being the follower.
22 Leader vs. Follower Manhattan defensive wall. Left is defender being the leader; right is defender being the follower.
23 Adversarial Clustering. Previous adversarial learning work assumes the availability of large amounts of labeled instances, yet a classifier created from very few labeled objects is inaccurate. We develop a novel grid-based adversarial clustering algorithm that uses mostly unlabeled objects along with a few labeled instances. We identify 1) the centers of normal objects, 2) sub-clusters of attack objects, 3) the overlapping areas, and 4) outliers and unknown clusters. We draw defensive walls around the centers of the normal objects to block out attack objects; the size of a defensive wall is based on the previous game-theoretic study.
24 Adversarial Clustering. It is not semi-supervised learning; we operate under very different assumptions. In adversarial settings, objects similar to each other can belong to different classes, while objects in different clusters can belong to the same class. Within a cluster, objects from the two classes can overlap significantly, and adversaries can bridge the gap between two previously well-separated clusters. Semi-supervised learning assigns labels to all the unlabeled objects, aiming at the best accuracy; we do not label the overlapping regions and outliers.
25 Adversarial Clustering. We obtain nearly purely normal objects inside the defensive walls, despite an increased Bayes error. It is like airport security: a small number of passengers use the fast pre-check lane, while all other passengers go through the full security check. The goal is not to let a single terrorist enter the airport, even at the cost of blocking out many normal objects. A classification boundary is a defensive wall, analogous to a point estimate, while the overlapping areas are analogous to confidence regions.
26 Adversarial Clustering. A Grid Based Adversarial Clustering Algorithm (a toy sketch of the grid idea follows this outline):
Initialization: set parameter values.
Merge 1: create labeled normal and abnormal sub-clusters.
Merge 2: cluster the remaining unlabeled data points.
Merge 3: merge all the data points without considering the labels.
Match: match the big unlabeled clusters with the normal and abnormal clusters, and with the clusters of the remaining unlabeled data points from the first pass.
Draw α-level defensive walls inside the normal regions.
A weight parameter k controls the size of the overlapping areas.
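The toy sketch below illustrates only the grid-based core, binning points into cells and merging adjacent dense cells into clusters; it is not the full algorithm with label matching, overlap regions, or defensive walls.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(3)
pts = np.vstack([rng.normal(0, 0.5, (200, 2)), rng.normal(3, 0.5, (200, 2))])

n_cells, min_count = 20, 3                 # grid resolution and density cutoff
lo, hi = pts.min(axis=0), pts.max(axis=0)
cell = np.minimum(((pts - lo) / (hi - lo) * n_cells).astype(int), n_cells - 1)

counts = {}
for c in map(tuple, cell):
    counts[c] = counts.get(c, 0) + 1
dense = {c for c, n in counts.items() if n >= min_count}

# Merge adjacent dense cells (8-neighborhood) via breadth-first search.
cluster_of, next_id = {}, 0
for start in dense:
    if start in cluster_of:
        continue
    cluster_of[start] = next_id
    queue = deque([start])
    while queue:
        cx, cy = queue.popleft()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nb = (cx + dx, cy + dy)
                if nb in dense and nb not in cluster_of:
                    cluster_of[nb] = next_id
                    queue.append(nb)
    next_id += 1

labels = np.array([cluster_of.get(tuple(c), -1) for c in cell])  # -1 = outlier
print("clusters:", next_id, "outlier points:", int((labels == -1).sum()))
```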
27 Adversarial Clustering. Compared with two semi-supervised learning methods, EM least squares and S4VM, at α = 0.6. True labels are shown below: blue for normal; orange for abnormal; purple for unlabeled; yellow and black for outliers.
28 Adversarial Clustering Exp.1. Left is ADClust with k = 10; right is ADClust with k = 20.
29 Adversarial Clustering. Exp.1. Left is EM least squares; right is S4VM.
30 Adversarial Clustering Exp.2. Left is ADClust with k = 10; right is ADClust with k = 20
31 Adversarial Clustering. Exp.2. Left is EM least squares; right is S4VM.
32 Adversarial Clustering. Experiment 1 has an attack taking place between two normal clusters. Experiment 2 has an unknown cluster, with normal and abnormal clusters heavily mixed under a strong attack. Our algorithm found the core areas of the normal class. The semi-supervised methods wrongly labeled a whole normal cluster in Experiment 1, and labeled the unknown cluster as normal in Experiment 2. A smaller k is more conservative, creating larger unlabeled mixed areas.
33 Adversarial Clustering. KDD Cup 1999 Network Intrusion Data: around 40 percent of the instances are network intrusions. We use instances from the training data and the top 7 continuous features. Experiment 1 has 100 runs; in each run, 150 instances are randomly sampled with their labels, so 99.4% of the points are unlabeled. Experiment 2 has 100 runs; in each run, 100 instances are randomly sampled with their labels, so 99.6% of the points are unlabeled. A larger k produces fewer unlabeled points and bigger normal regions containing more attack instances; it is more aggressive.
34 Adversarial Clustering. Exp.1. Increase the weight k from 1 to 100. Top left: percent of abnormal points in the mixed region; bottom left: percent of abnormal points among outliers; top right: number of points in the mixed region; bottom right: number of points flagged as outliers.
35 Adversarial Clustering. Exp.2. k equal to 1, 30, and 50 as low, medium, and high weights, with α levels ranging upward from 0.6. The KDD data is highly mixed, yet we achieve on average a nearly 90% pure normal rate inside the defensive walls.
36 Adversarial SVM. Classification boundaries: + marks the untransformed bad objects; o marks the good objects; * marks the transformed bad objects, i.e., the attack objects. The black dashed line is the standard SVM classification boundary, and the blue line is the adversarial SVM classification boundary. Both the untransformed and the transformed bad objects are what we want to detect and block.
37 Adversarial SVM. AD-SVM solves a convex optimization problem whose constraints are tied to adversarial attack models.
Free-range attack: the adversary can move attack objects anywhere in the domain:
$C_f (x_{\min,j} - x_{ij}) \le \delta_{ij} \le C_f (x_{\max,j} - x_{ij})$,
where the j-th feature of an instance $x_i$ falls between $x_{\min,j}$ and $x_{\max,j}$. $C_f \in [0, 1]$; $C_f = 0$ means no attack, and $C_f = 1$ is the most aggressive attack.
38 Adversarial SVM. Targeted attack: the adversary can only move attack instances closer to a targeted value $x_i^t$. The adversary adds $\delta_{ij}$ to the j-th feature $x_{ij}$ of an attack object, with $|\delta_{ij}| \le |x_{ij}^t - x_{ij}|$. An upper bound on the amount of movement for $x_{ij}$:
$0 \le (x_{ij}^t - x_{ij})\,\delta_{ij} \le C_\xi \left(1 - C_\delta \frac{|x_{ij}^t - x_{ij}|}{|x_{ij}| + |x_{ij}^t|}\right) (x_{ij}^t - x_{ij})^2$.
$C_\delta$ reflects the loss of malicious utility and can be larger than 1. $1 - C_\delta |x_{ij}^t - x_{ij}| / (|x_{ij}| + |x_{ij}^t|)$ is the maximum percentage of $x_{ij}^t - x_{ij}$ that $\delta_{ij}$ is allowed to be. $C_\xi \in [0, 1]$ is a discount factor; larger $C_\xi$ allows more movement.
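A small numeric check of the targeted-attack bound, assuming the inequality is reconstructed correctly; the function name and parameter values are illustrative.

```python
def max_targeted_move(x, x_t, C_xi=1.0, C_delta=0.5):
    """Largest |delta| allowed when moving feature value x toward target x_t,
    from 0 <= (x_t - x)*delta <= C_xi * frac * (x_t - x)^2."""
    frac = 1.0 - C_delta * abs(x_t - x) / (abs(x) + abs(x_t))
    return max(C_xi * frac, 0.0) * abs(x_t - x)

print(max_targeted_move(x=2.0, x_t=5.0, C_xi=1.0, C_delta=0.5))   # ~2.357
# A smaller C_delta (more aggressive attack) leaves a larger allowed move.
print(max_targeted_move(x=2.0, x_t=5.0, C_xi=1.0, C_delta=0.2))   # ~2.743
```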
39 Adversarial SVM. SVM risk minimization model, free-range attack:
$\arg\min_{w, b, \xi_i, t_i, u_i, v_i} \; \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i$
subject to
$\xi_i \ge 0$;
$\xi_i \ge 1 - y_i(w \cdot x_i + b) + t_i$;
$t_i \ge C_f \sum_j \big( v_{ij}(x_{\max,j} - x_{ij}) - u_{ij}(x_{\min,j} - x_{ij}) \big)$;
$u_i - v_i = \tfrac{1}{2}(1 + y_i)\, w$, $u_i \ge 0$, $v_i \ge 0$.
SVM risk minimization model, targeted attack:
$\arg\min_{w, b, \xi_i, t_i, u_i, v_i} \; \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i$
subject to
$\xi_i \ge 0$;
$\xi_i \ge 1 - y_i(w \cdot x_i + b) + t_i$;
$t_i \ge \sum_j e_{ij} u_{ij}$;
$(u_i + v_i) \circ (x_i^t - x_i) = \tfrac{1}{2}(1 + y_i)\, w$, $u_i \ge 0$, $v_i \ge 0$.
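For concreteness, a hedged cvxpy sketch of the free-range model as reconstructed above; this reflects my reading of the constraints and uses illustrative data, not the authors' implementation.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(4)
n, d, C, C_f = 40, 2, 1.0, 0.5
X = np.vstack([rng.normal(-1, 0.6, (n // 2, d)),
               rng.normal(1, 0.6, (n // 2, d))])
y = np.array([-1.0] * (n // 2) + [1.0] * (n // 2))   # +1 = bad/attack class
x_min, x_max = X.min(axis=0), X.max(axis=0)

w, b = cp.Variable(d), cp.Variable()
xi, t = cp.Variable(n), cp.Variable(n)
u, v = cp.Variable((n, d)), cp.Variable((n, d))

cons = [xi >= 0, u >= 0, v >= 0,
        xi >= 1 - cp.multiply(y, X @ w + b) + t]
for i in range(n):
    # t_i >= C_f * sum_j ( v_ij (x_max_j - x_ij) - u_ij (x_min_j - x_ij) )
    cons.append(t[i] >= C_f * (v[i] @ (x_max - X[i]) - u[i] @ (x_min - X[i])))
    # u_i - v_i = (1/2)(1 + y_i) w ties the bound variables to w.
    cons.append(u[i] - v[i] == 0.5 * (1 + y[i]) * w)

prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), cons)
prob.solve()
print("w =", np.round(w.value, 3), " b =", round(float(b.value), 3))
```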
40 Adversarial SVM. The spambase data set is from the UCI data repository. The slide reports the accuracy of AD-SVM (at five settings of $C_f$), standard SVM, and one-class SVM on spambase as free-range attacks intensify, with columns $f_{attack} = 0, 0.3, 0.5, 0.7, 1.0$; the numeric accuracies did not survive transcription. $C_f$ increases as attacks become more aggressive. Attacks are generated as $\delta_{ij} = f_{attack}\,(x_{ij}^t - x_{ij})$.
41 Adversarial SVM. $C_\delta$ decreases as targeted attacks become more aggressive. With $C_\xi = 1$, the slide reports the accuracy of AD-SVM (at five settings of $C_\delta$), standard SVM, and one-class SVM across $f_{attack} = 0, 0.3, 0.5, 0.7, 1.0$; the numeric accuracies did not survive transcription. AD-SVM is more resilient to modest attacks than the other SVM learning algorithms.
42 Funding Support
ARO W911NF: Data Analytics for Cyber Security: Defeating the Active Adversaries. UT Dallas PI: M. Kantarcioglu, Purdue PI: B. Xi. $470,000, 08/07/ /06/2020, amount responsible: $222,500.
ARO W911NF: A Game Theoretic Framework for Adversarial Classification. UT Dallas PI: M. Kantarcioglu, Purdue PI: B. Xi. $440,000, 08/01/ /31/2015, amount responsible: $210,140.
Publications
Kantarcioglu, M., Xi, B., and Clifton, C. Classifier Evaluation and Attribute Selection against Active Adversaries. Data Mining and Knowledge Discovery, 22(1-2), 2011.
Zhou, Y., Kantarcioglu, M., Thuraisingham, B., and Xi, B. Adversarial Support Vector Machine Learning. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2012).
43 Kantarcioglu, M., and Xi, B. Adversarial Data Mining. NATO SAS-106 Symposium on Analysis Support to Decision Making in Cyber Defense, 1-11, June 2014, Estonia.
Zhou, Y., Kantarcioglu, M., and Xi, B. (Hot Topic Essay) Adversarial Learning: Mind Games with the Opponent. ACM Computing Reviews, August 2017.
Zhou, Y., Kantarcioglu, M., and Xi, B. A Survey of Game Theoretic Approaches for Adversarial Machine Learning. Revision submitted.
Wei, W., Xi, B., and Kantarcioglu, M. Adversarial Clustering: A Grid Based Clustering Algorithm against Active Adversaries. Submitted.
Zhou, Y., Kantarcioglu, M., and Xi, B. Adversarial Active Learning. Submitted.