Adversarial Machine Learning: Big Data Meets Cyber Security

1 Adversarial Machine Learning: Big Data Meets Cyber Security Bowei Xi Department of Statistics Purdue University With Murat Kantarcioglu

2 Introduction Many adversarial learning problems arise in practice: image classification/object recognition, intrusion detection, fraud detection, spam detection, social networks. The adversary adapts to avoid being detected. New solutions are needed to address this challenge.

3 Million ways to write a spam
From: "Ezra Martens" <ezrabngktbbem...
To: "Eleftheria Marconi" <clifton@pu...
Subject: shunless Phaxrrmaceutical
Date: Fri, 30 Sep
Hello, Easy Fast = Best Home Total OrdeShipPrricDelivConf ringpingeseryidentiality VIAAmbCIALevVALXan GRAienLISitraIUMax $ $ $$ Get =additional informmation...

4 Understanding Adversarial Learning It is not concept drift. It is not online learning. Adversary changes the objects under its control to avoid being detected. This is a game between the defender and the adversary.

5 Solution Ideas Constantly update a learning algorithm: Dalvi et al., Adversarial Classification, KDD 2004. Optimize worst-case performance: Globerson and Roweis, Nightmare at Test Time: Robust Learning by Feature Selection, ICML 2006. Goals: How to evaluate a defensive learning algorithm? How to construct a resilient defensive learning algorithm?

6 Game Theoretic Framework Often a classifier is updated after observing adversaries' actions, such as spam filters. Adversarial Stackelberg Game: two players take sequential actions.
1. The adversary moves first by choosing a transformation $T$: $f_b(x) \to f_b^T(x)$.
2. After observing $T$, the defender sets parameter values for a learning algorithm and creates a defensive rule $h(x)$.
Overall the population is a mixture distribution with two classes: $f(x) = p_g f_g(x) + p_b f_b^T(x)$, with $p_g + p_b = 1$.
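
A minimal numerical sketch of this setup (the one-dimensional Gaussian classes, the shift transformation, and all parameter values below are illustrative assumptions, not the slide's example):

```python
from scipy.stats import norm

# Class-conditional densities: f_g for the good class, f_b for the bad class
# (illustrative one-dimensional Gaussians with equal priors).
p_g, p_b = 0.5, 0.5
f_g = norm(loc=1.0, scale=1.0).pdf

def f_b_T(x, t):
    """Bad-class density after a hypothetical shift transformation T that
    moves the bad class toward the good class by t."""
    return norm(loc=4.0 - t, scale=1.0).pdf(x)

def mixture(x, t):
    """Overall population density f(x) = p_g f_g(x) + p_b f_b^T(x)."""
    return p_g * f_g(x) + p_b * f_b_T(x, t)

def defensive_rule(x, t):
    """Defender's response after observing T: flag x as bad when the
    transformed bad-class component dominates (Bayes rule under 0-1 loss)."""
    return p_b * f_b_T(x, t) > p_g * f_g(x)
```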

7 Game Theoretic Framework The adversary's payoff is defined as
$$u_b(T, h) = \int_{L(h,g)} g(T, x)\, f_b^T(x)\, dx.$$
$g(T, x)$ is the profit for a bad object $x$ being classified as a good one after transformation $T$ is applied; there is a penalty for transformation. $L(h, g)$ is the region where objects are identified as legitimate under decision rule $h(x)$. Let $C(T, h)$ be the misclassification cost. The defender's payoff is $u_g(T, h) = -C(T, h)$. $h_T(x)$ is the defender's best response facing $T$.

8 Game Theoretic Framework The adversary's gain from applying transformation $T$ is $W(T) = u_b(T, h_T)$:
$$W(T) = \int_{L(h_T,g)} g(T, x)\, f_b^T(x)\, dx = E_{f_b^T}\big( I_{\{L(h_T,g)\}}(x)\, g(T, x) \big).$$
$(T^e, h_{T^e})$ is a subgame perfect equilibrium, s.t. $T^e = \operatorname{argmax}_{T \in S} W(T)$. After an equilibrium is reached, each player has no incentive to change its action.
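
A sketch of how $W(T)$ and the equilibrium $T^e$ could be approximated numerically for a one-dimensional shift attack (the densities, the profit and penalty model, and the grid search are illustrative assumptions, not the slides' analytical treatment):

```python
import numpy as np

rng = np.random.default_rng(0)

def adversary_gain(t, n=100_000, profit=1.0, penalty=0.5):
    """Monte Carlo estimate of W(T) = E_{f_b^T}[ I_{L(h_T,g)}(x) g(T, x) ]
    for a one-dimensional shift attack t (hypothetical profit/penalty model)."""
    x = rng.normal(4.0 - t, 1.0, size=n)          # draw from f_b^T
    # Defender's best response h_T: Bayes boundary between N(1,1) and N(4-t,1)
    # under equal priors.
    boundary = (1.0 + (4.0 - t)) / 2.0
    in_L = x < boundary                            # classified as legitimate
    g = np.maximum(profit - penalty * t, 0.0)      # profit net of transformation penalty
    return np.mean(in_L * g)

# Grid search over the adversary's strategy space S for the equilibrium T^e.
ts = np.linspace(0.0, 3.0, 61)
gains = [adversary_gain(t) for t in ts]
t_e = ts[int(np.argmax(gains))]
print(f"approximate equilibrium transformation t^e = {t_e:.2f}")
```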

9 Game Theoretic Framework Attribute selection is important in adversarial learning. Example: Gaussian mixture distribution, naive Bayes classifier, and linear penalty for transformation.

        π_g        π_b        Penalty    Equi. Bayes Error
X_1     N(1, 1)    N(3, 1)    a =
X_2     N(1, 1)    N(3.5, 1)  a =
X_3     N(1, 1)    N(4, 1)    a =

10 Game Theoretic Framework Discretize the continuous numerical variables. Use the product of one-dimensional marginal distributions to approximate the joint distribution. Search for an equilibrium using linear programming. Case study: Lending Club data. Peer-to-peer financing, offering small loans starting from $1000. Except for the credit report, many fields are easy to lie about, such as job information, the purpose of the loan, etc.

11 Game Theoretic Framework We use the following attributes from the data set:
X_1 Amount requested
X_2 Loan purpose
X_3 Debt-to-income ratio
X_4 Home ownership (any, none, rent, own, mortgage)
X_5 Monthly income
X_6 FICO range
X_7 Open credit lines
X_8 Total credit lines
X_9 Revolving credit balance
X_10 Revolving line utilization
X_11 Inquiries in the last 6 months
X_12 Accounts now delinquent
X_13 Delinquent amount
X_14 Delinquencies in last 2 years
X_15 Months since last delinquency

12 Game Theoretic Framework The instances originally are classified as: 1) Removed; 2) Loan is being issued; 3) Late ( days); 4) Late (16-30 days); 5) Issued; 6) In review; 7) In grace period; 8) In funding; 9) Fully paid; 10) Expired; 11) Default; 12) Current; 13) Charged Off; 14) New. After data cleaning, the remaining instances are used with p_g = p_b = 0.5. Numerical attributes are discretized using 10 equi-width buckets. Naive Bayes classifier with independent attributes. The attacker transforms each attribute independently.

13 Game Theoretic Framework The adversary's transformation is found by solving
$$\max \; -\sum_{i=1}^{m}\sum_{j=1}^{m} f_{ij}\, d_{ij} + \sum_{i=1}^{m} I_{\{y_i < q_i\}}\, y_i\, u_i$$
subject to
$$f_{ij} \geq 0, \quad i = 1, \ldots, m, \; j = 1, \ldots, m \qquad (1)$$
$$\sum_{j=1}^{m} f_{ij} = p_i, \quad i = 1, \ldots, m \qquad (2)$$
$$\sum_{i=1}^{m} f_{ij} = y_j, \quad j = 1, \ldots, m \qquad (3)$$
$$\sum_{j=1}^{m} y_j = 1 \qquad (4)$$
Here $p_i$ is the probability of $x = w_i$ given that the instance is in the bad class; $q_i$ is the probability of $x = w_i$ given that the instance is in the good class; $f_{ij}$ is the proportion moved from $w_i$ to $w_j$ by the adversary; $d_{ij}$ is the penalty of transforming $X$ from $w_i$ to $w_j$; $u_i$ is the expected profit; and $y_i$ is the probability after transformation.
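
A simplified sketch of this transformation LP using scipy. The simplifications and all numbers are assumptions: the indicator $I_{\{y_i < q_i\}}$ is dropped so profit is credited for every target cell, and the penalty is subtracted from the profit in a single linear objective.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative discretized problem with m = 4 cells w_1, ..., w_4.
m = 4
p = np.array([0.4, 0.3, 0.2, 0.1])      # bad-class prob. of each discretized value w_i
u = np.array([1.0, 2.0, 4.0, 8.0])      # hypothetical expected profit per target cell
d = np.abs(np.subtract.outer(np.arange(m), np.arange(m))) * 0.5   # penalty d_ij

# Decision variables: f_ij flattened row-major; y_j = sum_i f_ij is implicit.
# Objective (to maximize): sum_ij f_ij * (u_j - d_ij); linprog minimizes, so negate.
c = -(u[None, :] - d).ravel()

# Constraint (2): each row of f sums to p_i.
A_eq = np.zeros((m, m * m))
for i in range(m):
    A_eq[i, i * m:(i + 1) * m] = 1.0
b_eq = p

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
f = res.x.reshape(m, m)        # f_ij >= 0 is constraint (1)
y = f.sum(axis=0)              # transformed distribution, constraints (3)-(4)
print(np.round(f, 3), np.round(y, 3))
```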

14 Game Theoretic Framework The worst case scenario has transformation cost 0. For the penalized case, we assume improving an attribute by one level costs $500 (results do not change in the $300-$600 range). Attribute selection (choose the top 6 attributes according to one-dimensional classification accuracy after transformation) works well. Equilibrium performance is a good indicator of the long-term success of a learning algorithm. Penalty prevents aggressive attacks.

Experiment Type                               Accuracy
Initial data set prior to transformation      89.95%
Worst case scenario                           61.25%
Penalized transformation scenario             72.28%
Attribute selection scenario                  70.92%

15 Leader vs. Follower How does the defender being a leader versus being a follower affect its equilibrium strategies? One defender with utility $D(h)$; $m$ adversaries, each with utility $A_i(t_i)$. $h$ and $t_i$ are the players' strategies.

16 Leader vs. Follower Defender being the leader: a one-leader-$m$-follower game.
1. Given a leader's strategy $h$ fixed, assume the $m$ adversaries' (i.e., the followers') strategies are the attacks $T = (t_1, \ldots, t_m)$. For the $i$-th adversary, further assume all other adversaries' strategies are fixed, i.e., fixed $t_j$, $j \neq i$. Solve the following optimization for $t_i^h$:
$$t_i^h = \operatorname{argmax}_{t_i \in S_i} \{ A_i(t_i, h) \}$$
2. With the solution from above, $T^h = (t_1^h, \ldots, t_m^h)$ is the $m$ adversaries' joint optimal attack for a given defender strategy $h$; the defender then solves another optimization problem:
$$h^e = \operatorname{argmax}_{h \in H} \{ D(t_1^h, \ldots, t_m^h, h) \}$$
$(h^e, t_1^{h^e}, \ldots, t_m^{h^e})$ is an equilibrium strategy for all players in the game.

17 Leader vs. Follower Defender being the follower: an $m$-leader-one-follower game.
1. Given the joint attacks from the $m$ adversaries, $T = (t_1, \ldots, t_m)$, solve for the defender's optimal strategy:
$$h_T = \operatorname{argmax}_{h \in H} \{ D(t_1, \ldots, t_m, h) \}$$
2. With the solution above as the defender's optimal strategy $h_T$ against joint attacks $T = (t_1, \ldots, t_m)$, solve for the optimal joint attacks $T^e$:
$$T^e = (t_1^e, \ldots, t_m^e) = \operatorname{argmax}_{t_i \in S_i, \forall i} \sum_{i=1}^{m} A_i(t_i, h_T)$$
$(h_{T^e}, t_1^e, \ldots, t_m^e)$ is an equilibrium strategy for all players in the game.

18 Leader vs. Follower Gaussian Mixture Population. The defender controls the normal population $N_p(\mu^g, \Sigma^g)$. Each adversary controls a population $N_p(\mu_i^b, \Sigma_i^b)$. For an attack $t$, adversarial objects move toward $\mu^g$, the center of the normal population:
$$X_t = \mu^g + (1 - t)(X - \mu^g), \quad 0 \leq t \leq 1.$$
$t = 1$ is the strongest attack. The adversary population after the attack is $N\big((1-t)\mu^b + t\mu^g,\, (1-t)^2 \Sigma^b\big)$. The $m$ adversaries' strategy is the joint attack $(t_1, \ldots, t_m)$. The defender's strategy is to build a defensive wall around the center $\mu^g$.
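
A small sketch of the attack transformation and the population it induces (the dimension, centers, and attack strength below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 2
mu_g = np.zeros(p)                      # normal (defender) center
mu_b = np.array([4.0, 4.0])             # one adversary's center (illustrative)
Sigma_b = np.eye(p)

X_b = rng.multivariate_normal(mu_b, Sigma_b, size=5000)

def attack(X, t, mu_g=mu_g):
    """Move adversarial objects toward the normal center:
    X_t = mu_g + (1 - t)(X - mu_g)."""
    return mu_g + (1.0 - t) * (X - mu_g)

X_t = attack(X_b, t=0.5)
# After the attack the population is N((1-t) mu_b + t mu_g, (1-t)^2 Sigma_b):
print(X_t.mean(axis=0))     # approx. 0.5 * mu_b = [2, 2]
print(np.cov(X_t.T))        # approx. 0.25 * I
```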

19 Leader vs. Follower A Euclidean defensive wall is an ellipsoid-shaped defensive region
$$(X - \hat\mu^g)^\top S_g^{-1} (X - \hat\mu^g) = \chi^2_p(\alpha), \quad 0 < \alpha < 1,$$
where $\alpha$ controls the size. It is an approximate confidence region for a multivariate normal. A Manhattan defensive wall forms a diamond-shaped region
$$\sum_{i=1}^{p} \frac{|X_i - \hat\mu_i^g|}{\hat\sigma_i^g} = \eta,$$
where $\hat\sigma_i^g$ is the sample standard deviation, $\hat\mu_i^g$ is the sample mean, and $\eta$ controls the size.

20 Leader vs. Follower Obtain $\eta(\alpha)$ as a function of $\alpha$: using the sample mean vector $\hat\mu^g$ and variance-covariance matrix $S_g$ from the normal objects, generate a large sample from $N(\hat\mu^g, S_g)$. For an $\eta(\alpha)$, $(100\alpha)\%$ of the generated sample points fall into the diamond-shaped Manhattan region with vertices at $\hat\mu_i^g \pm \eta(\alpha)\hat\sigma_i^g$ on each dimension. A defender's strategy is to choose $h = \alpha$ or $h = \eta(\alpha)$; the defender's utility is $D(\alpha)$ or $D(\eta(\alpha))$. Let $c$ be the misclassification cost:
$$D(\alpha) = -100\,(\text{error-of-normal} + c \cdot \text{error-of-adversary}).$$
An adversary's utility is
$$L(t) = E\big(\max\{k - a\|X_t - X\|^2,\, 0\}\big),$$
where $k$ is the maximum utility of an adversarial object and $\|X_t - X\|^2$ measures how much an adversarial object is moved toward the normal population. Do an exhaustive search to find the equilibria of the games.
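
A sketch of the Monte Carlo calibration of $\eta(\alpha)$ and of the two wall-membership tests described above (the sample size and the example parameters are illustrative assumptions):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)

def calibrate_eta(mu_hat, S_hat, alpha, n=200_000):
    """Monte Carlo calibration of eta(alpha): the Manhattan-wall size such that
    100*alpha percent of N(mu_hat, S_hat) samples fall inside the diamond."""
    Z = rng.multivariate_normal(mu_hat, S_hat, size=n)
    sigma_hat = np.sqrt(np.diag(S_hat))
    dist = np.abs(Z - mu_hat) / sigma_hat
    return np.quantile(dist.sum(axis=1), alpha)

def inside_euclidean_wall(X, mu_hat, S_hat, alpha):
    """Ellipsoidal wall: (x - mu)' S^{-1} (x - mu) <= chi^2_p(alpha)."""
    d = X - mu_hat
    m2 = np.einsum("ij,jk,ik->i", d, np.linalg.inv(S_hat), d)
    return m2 <= chi2.ppf(alpha, df=X.shape[1])

def inside_manhattan_wall(X, mu_hat, S_hat, eta):
    """Diamond wall: sum_i |x_i - mu_i| / sigma_i <= eta."""
    sigma_hat = np.sqrt(np.diag(S_hat))
    return (np.abs(X - mu_hat) / sigma_hat).sum(axis=1) <= eta

mu_hat, S_hat = np.zeros(2), np.eye(2)
eta = calibrate_eta(mu_hat, S_hat, alpha=0.9)
```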

21 Leader vs. Follower Euclidean defensive wall. Left is defender being the leader; right is defender being the follower.

22 Leader vs. Follower Manhattan defensive wall. Left is defender being the leader; right is defender being the follower.

23 Adversarial Clustering Previous adversarial learning work assumes the availability of large amounts of labeled instances. We develop a novel grid-based adversarial clustering algorithm using mostly unlabeled objects with a few labeled instances; a classifier created using very few labeled objects is inaccurate. We identify 1) the centers of normal objects, 2) sub-clusters of attack objects, 3) the overlapping areas, and 4) outliers and unknown clusters. We draw defensive walls around the centers of the normal objects to block out attack objects. The size of a defensive wall is based on the previous game theoretic study.

24 Adversarial Clustering It is not semi-supervised learning; we operate under very different assumptions. In adversarial settings, objects similar to each other can belong to different classes, while objects in different clusters can belong to the same class. Within a cluster, objects from the two classes can overlap significantly. Adversaries can bridge the gap between two previously well-separated clusters. Semi-supervised learning assigns labels to all the unlabeled objects, aiming at the best accuracy. We do not label the overlapping regions and outliers.

25 Adversarial Clustering We have nearly purely normal objects inside the defensive walls, despite an increased Bayes error. It is like airport security: a small number of passengers use the fast pre-check lane, while all other passengers go through security check. The goal is not to let a single terrorist enter an airport, at the cost of blocking out many normal objects. A classification boundary, i.e., a defensive wall, is analogous to a point estimate, while the overlapping areas are analogous to confidence regions.

26 Adversarial Clustering A Grid Based Adversarial Clustering Algorithm (a simplified sketch follows below):
Initialization: set parameter values.
Merge 1: create labeled normal and abnormal sub-clusters.
Merge 2: cluster the remaining unlabeled data points.
Merge 3: merge all the data points without considering the labels.
Match: match the big unlabeled clusters with the normal and abnormal clusters, and with the clusters of the remaining unlabeled data points from the first pass.
Draw α-level defensive walls inside the normal regions.
A weight parameter k controls the size of the overlapping areas.
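
A highly simplified sketch of the grid-and-merge idea only (cell binning, labeled seeds, and an unlabeled mixed region). It is not the authors' full algorithm; the helper names, thresholds, and label encoding are assumptions.

```python
import numpy as np
from collections import defaultdict

def grid_cells(X, h):
    """Assign each point to a grid cell of side length h."""
    return [tuple(c) for c in np.floor(X / h).astype(int)]

def grid_cluster(X, labels, h=1.0, min_pts=5):
    """Dense cells seeded only by 'normal' (or only by 'abnormal') labeled
    points become normal (or abnormal) sub-clusters; dense cells with both
    kinds of seeds stay 'mixed' (overlapping area); dense cells with no seeds
    stay 'unlabeled'; sparse cells are treated as outliers."""
    cells = grid_cells(X, h)
    members = defaultdict(list)
    for idx, c in enumerate(cells):
        members[c].append(idx)

    cell_label = {}
    for c, idxs in members.items():
        if len(idxs) < min_pts:
            cell_label[c] = "outlier"
            continue
        seen = {labels[i] for i in idxs if labels[i] is not None}
        if seen == {"normal"}:
            cell_label[c] = "normal"
        elif seen == {"abnormal"}:
            cell_label[c] = "abnormal"
        elif len(seen) == 2:
            cell_label[c] = "mixed"
        else:
            cell_label[c] = "unlabeled"
    return [cell_label[c] for c in cells]
```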

27 Adversarial Clustering Compare with semi-supervised learning: EM least squares and S4VM. α = 0.6. True labels below: blue is for normal; orange is for abnormal; purple for unlabeled; yellow and black for outliers.

28 Adversarial Clustering Exp.1. Left is ADClust with k = 10; right is ADClust with k = 20.

29 Adversarial Clustering Exp.1. Left is EM least square; right is S4VM

30 Adversarial Clustering Exp.2. Left is ADClust with k = 10; right is ADClust with k = 20

31 Adversarial Clustering Exp.2. Left is EM least square; right is S4VM

32 Adversarial Clustering Experiment 1 has an attack taking place between two normal clusters. Experiment 2 has an unknown cluster, with normal and abnormal clusters heavily mixed under a strong attack. Our algorithm found the core areas of the normal objects. Semi-supervised learning wrongly labeled a whole normal cluster in experiment 1, and labeled the unknown cluster as normal in experiment 2. Smaller k is more conservative, creating larger unlabeled mixed areas.

33 Adversarial Clustering KDD Cup 1999 Network Intrusion Data: around 40 percent are network intrusion instances. We use instances from the training data and the top 7 continuous features. Experiment 1 has 100 runs; in one run, 150 instances are randomly sampled with their labels, so 99.4% become unlabeled. Experiment 2 has 100 runs; in one run, 100 instances are randomly sampled with their labels, so 99.6% become unlabeled. Larger k produces fewer unlabeled points and bigger normal regions containing more attack instances; it is more aggressive.

34 Adversarial Clustering Exp. 1. Increase the weight k from 1 to 100. Top left is the percent of abnormal points in the mixed region; bottom left is the percent of abnormal points among outliers; top right is the number of points in the mixed region; bottom right is the number of points labeled as outliers.

35 Adversarial Clustering Exp. 2. k equals 1, 30, and 50 as low, medium, and high weights. Set α levels from 0.6 upward. The KDD data is highly mixed, yet we achieve on average a nearly 90% pure normal rate inside the defensive walls.

36 Adversarial SVM Classification boundaries. + is for the untransformed bad objects; o is for the good objects; * is for the transformed bad objects, i.e., the attack objects. The black dashed line is the standard SVM classification boundary, and the blue line is the Adversarial SVM classification boundary. Both the untransformed and the transformed bad objects are what we want to detect and block.

37 Adversarial SVM AD-SVM solves a convex optimization problem where the constraints are tied to adversarial attack models. Free-range attack: the adversary can move attack objects anywhere in the domain,
$$C_f (x_{\min,j} - x_{ij}) \leq \delta_{ij} \leq C_f (x_{\max,j} - x_{ij}).$$
The $j$-th feature of an instance $x_i$ falls between $x_{\min,j}$ and $x_{\max,j}$. $C_f \in [0, 1]$: $C_f = 0$ means no attack and $C_f = 1$ is the most aggressive attack.
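
A small sketch of the free-range bound on the allowed perturbation $\delta_{ij}$; estimating the feature ranges from the training data is an assumption.

```python
import numpy as np

def free_range_bounds(X, Cf):
    """Per-feature bounds C_f(x_min_j - x_ij) <= delta_ij <= C_f(x_max_j - x_ij)
    for a free-range attack of intensity C_f in [0, 1]."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    lower = Cf * (x_min - X)
    upper = Cf * (x_max - X)
    return lower, upper
```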

38 Adversarial SVM Targeted attack: the adversary can only move attack instances closer to a targeted value $x_i^t$. The adversary adds $\delta_{ij}$ to the $j$-th feature $x_{ij}$ of an attack object, with $|\delta_{ij}| \leq |x_{ij}^t - x_{ij}|$. An upper bound on the amount of movement for $x_{ij}$:
$$0 \leq (x_{ij}^t - x_{ij})\,\delta_{ij} \leq C_\xi \left(1 - C_\delta \frac{|x_{ij}^t - x_{ij}|}{|x_{ij}| + |x_{ij}^t|}\right)(x_{ij}^t - x_{ij})^2.$$
$C_\delta$ reflects the loss of malicious utility; it can be larger than 1. $1 - C_\delta \frac{|x_{ij}^t - x_{ij}|}{|x_{ij}| + |x_{ij}^t|}$ is the maximum percentage of $x_{ij}^t - x_{ij}$ that $\delta_{ij}$ is allowed to be. $C_\xi \in [0, 1]$ is a discount factor; larger $C_\xi$ allows more movement.

39 Adversarial SVM SVM risk minimization model, free-range attack:
$$\begin{aligned}
\operatorname{argmin}_{w,b,\xi_i,t_i,u_i,v_i} \;& \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i \\
\text{s.t. } & \xi_i \geq 0 \\
& \xi_i \geq 1 - y_i (w \cdot x_i + b) + t_i \\
& t_i \geq \sum_j C_f \big( v_{ij} (x_{\max,j} - x_{ij}) - u_{ij} (x_{\min,j} - x_{ij}) \big) \\
& u_i - v_i = \tfrac{1}{2}(1 + y_i) w \\
& u_i \geq 0, \quad v_i \geq 0
\end{aligned}$$
SVM risk minimization model, targeted attack:
$$\begin{aligned}
\operatorname{argmin}_{w,b,\xi_i,t_i,u_i,v_i} \;& \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i \\
\text{s.t. } & \xi_i \geq 0 \\
& \xi_i \geq 1 - y_i (w \cdot x_i + b) + t_i \\
& t_i \geq \sum_j e_{ij} u_{ij} \\
& (u_i + v_i) \circ (x_i^t - x_i) = \tfrac{1}{2}(1 + y_i) w \\
& u_i \geq 0, \quad v_i \geq 0
\end{aligned}$$
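
A minimal sketch of how the free-range model above could be set up as a convex program. Using cvxpy is an assumed tool choice, not the authors' implementation, and only the free-range variant is shown.

```python
import numpy as np
import cvxpy as cp

def adsvm_free_range(X, y, C=1.0, Cf=0.5):
    """Free-range AD-SVM risk minimization as written above.
    X: (n, d) feature matrix; y: labels in {-1, +1}."""
    n, d = X.shape
    x_min, x_max = X.min(axis=0), X.max(axis=0)

    w, b = cp.Variable(d), cp.Variable()
    xi, t = cp.Variable(n), cp.Variable(n)
    u = cp.Variable((n, d), nonneg=True)
    v = cp.Variable((n, d), nonneg=True)

    cons = [xi >= 0]
    for i in range(n):
        cons += [
            xi[i] >= 1 - y[i] * (X[i] @ w + b) + t[i],
            t[i] >= Cf * (v[i] @ (x_max - X[i]) - u[i] @ (x_min - X[i])),
            u[i] - v[i] == 0.5 * (1 + y[i]) * w,
        ]
    prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), cons)
    prob.solve()
    return w.value, b.value
```

A larger Cf makes the worst-case term t_i grow faster, which pushes the separating hyperplane away from the region the adversary can reach.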

40 Adversarial SVM The spambase data set is from the UCI data repository. Accuracy of AD-SVM, SVM, and one-class SVM on the spambase data as free-range attacks intensify; $C_f$ increases as attacks become more aggressive. Attacks are generated as $\delta_{ij} = f_{attack}(x_{ij}^t - x_{ij})$ for $f_{attack} = 0, 0.3, 0.5, 0.7, 1.0$. (Table: accuracy of AD-SVM at several $C_f$ settings, standard SVM, and one-class SVM for each $f_{attack}$ level.)
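
A sketch of the test-time attack generation described above; choosing the good-class mean as the target profile $x^t$ is an assumption for illustration.

```python
import numpy as np

def generate_attacks(X_bad, x_target, f_attack):
    """Move each malicious instance a fraction f_attack of the way toward the
    target profile: delta_ij = f_attack * (x^t_ij - x_ij)."""
    return X_bad + f_attack * (x_target - X_bad)

# Example usage (X_bad, X_good assumed to be numpy arrays of test instances):
# X_attacked = generate_attacks(X_bad, X_good.mean(axis=0), f_attack=0.5)
```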

41 Adversarial SVM $C_\delta$ decreases as targeted attacks become more aggressive; $C_\xi = 1$. (Table: accuracy of AD-SVM at several $C_\delta$ settings, standard SVM, and one-class SVM for $f_{attack} = 0, 0.3, 0.5, 0.7, 1.0$.) AD-SVM is more resilient to modest attacks than other SVM learning algorithms.

42 Funding Support
ARO W911NF: Data Analytics for Cyber Security: Defeating the Active Adversaries. UT Dallas PI: M. Kantarcioglu, Purdue PI: B. Xi. $470,000, 08/07/ /06/2020. Amount responsible: $222,500.
ARO W911NF: A Game Theoretic Framework for Adversarial Classification. UT Dallas PI: M. Kantarcioglu, Purdue PI: B. Xi. $440,000, 08/01/ /31/2015. Amount responsible: $210,140.
Publications
Kantarcioglu, M., Xi, B., and Clifton, C., Classifier Evaluation and Attribute Selection against Active Adversaries, Data Mining and Knowledge Discovery, 22(1-2), 2011.
Zhou, Y., Kantarcioglu, M., Thuraisingham, B., and Xi, B., Adversarial Support Vector Machine Learning, Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2012).

43
Kantarcioglu, M., and Xi, B., Adversarial Data Mining, North Atlantic Treaty Organization (NATO) SAS-106 Symposium on Analysis Support to Decision Making in Cyber Defense, 1-11, June 2014, Estonia.
Zhou, Y., Kantarcioglu, M., Xi, B., (Hot Topic Essay) Adversarial Learning: Mind Games with the Opponent, ACM Computing Reviews, August 2017.
Zhou, Y., Kantarcioglu, M., Xi, B., A Survey of Game Theoretic Approaches for Adversarial Machine Learning, revision submitted.
Wei, W., Xi, B., Kantarcioglu, M., Adversarial Clustering: A Grid Based Clustering Algorithm against Active Adversaries, submitted.
Zhou, Y., Kantarcioglu, M., Xi, B., Adversarial Active Learning, submitted.
