Introduction of Recruit


1 Apr. 11, 2018

2 Introduction of Recruit. We provide various kinds of online services, from job search to hotel reservations, across the world. Service areas: Housing, Beauty, Travel, Life & Local O2O, Education, Automobile, Bridal & Baby, Human Resources, IT & Trends, Media, Dining.

3 Introduction of Recruit. We help users find the best clients through our services. Data science plays an important role in the business. (Diagram: Internet users and clients.)

4 Data Science at Recruit. Recruit has hosted two data mining competitions on Kaggle: Recruit Restaurant Visitor Forecasting (2018) and Coupon Purchase Prediction (2015). Kaggle and KDD Cup are international data mining competitions. We are passionate about data science: some of us came in 1st and 2nd place in KDD Cup 2015. (Figure: engineers at Recruit, as of March 2018.) © Recruit Communications Co., Ltd.

5 Feature Selection: A Key Technique. Feature selection is a key technique for winning data mining competitions: find the most relevant features and balance the bias-variance trade-off. Benefits: improved prediction and reduced computational cost. (Figure: data matrix with User 1 to User n as rows and features as columns.) Source: Beating Kaggle the easy way, studien/2015/dong_ying.pdf

6 Types of Feature Selection (FS) Algorithms. Wrapper methods: iteratively evaluate feature subsets with a black-box learning algorithm. Embedded methods: train a model and select features at the same time. Filter methods: select features by a criterion such as mutual information; they are independent of the learning algorithm and can be used as a pre-processing step.

7 What is Mutual Information (MI)? Mutual information I(X;Y) is a measure of the mutual dependence between two random variables X and Y, defined in terms of Shannon entropy. When I(X;Y) is high, Y can be predicted from X; when it is low, Y is hard to predict from X. Unlike Pearson's correlation coefficient, MI can capture non-linear relationships. (Figure: three scatter plots labeled Pearson r = 0.8, MI = 0.5; Pearson r = 0.0, MI = 0.7; Pearson r = 0.0, MI = .) Figures are retrieved from
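To make the contrast with Pearson's correlation concrete, here is a minimal sketch in plain Python. It is not from the slides: the function names `mutual_information` and `pearson_r` and the toy data are assumptions chosen for illustration, and the MI estimate is the simple plug-in estimate for discrete samples.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Toy data (illustrative only): Y = X^2 on a symmetric range,
# a deterministic but non-linear relationship.
xs = [-2, -1, 0, 1, 2] * 20
ys = [x * x for x in xs]

print("Pearson r:", pearson_r(xs, ys))
print("MI (bits):", mutual_information(xs, ys))
```

On this toy data the linear correlation vanishes while the MI equals the entropy of Y (about 1.52 bits), because Y is fully determined by X.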

8 Mutual Information based Feature Selection (MIFS). MIFS uses mutual information as the criterion in a filter method. General formulation of MIFS: select the feature subset of size k that maximizes the mutual information (MI) between the selected features and the target variable.
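The formula on this slide did not survive extraction. A reconstruction consistent with the sentence above, under assumed notation (n candidate features X_1, ..., X_n, target Y, and X_S the set of features indexed by S), would be:

```latex
S^{*} \;=\; \operatorname*{arg\,max}_{S \subseteq \{1,\dots,n\},\; |S| = k} \; I(X_S ; Y),
\qquad X_S = \{ X_i : i \in S \}
```

The search space has \binom{n}{k} subsets, which is why heuristic or relaxed solvers are used in practice.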

9 Heuristic MIFS Algorithms. Max Relevance method: select the most relevant feature iteratively, repeating k times. Min Redundancy & Max Relevance method (MRMR) [1]: select the most relevant and least redundant feature iteratively, repeating k times. [1] H. Peng et al., 2005 [2] J. R. Vergara & P. A. Estévez, 2015
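The two greedy heuristics can be sketched as follows. This is an illustration, not the presenters' code: it assumes the MI values are precomputed, with `relevance[i]` standing for I(X_i;Y) and `redundancy[i][j]` for I(X_i;X_j), and it uses the difference form of the MRMR score (relevance minus mean redundancy with the features already chosen). The numbers are made up.

```python
def max_relevance(relevance, k):
    """Pick the k features with the largest I(X_i;Y)."""
    order = sorted(range(len(relevance)), key=lambda i: relevance[i], reverse=True)
    return order[:k]

def mrmr(relevance, redundancy, k):
    """Greedy MRMR: at each step add the feature maximizing
    relevance minus mean redundancy with the already selected features."""
    selected = []
    remaining = set(range(len(relevance)))
    while len(selected) < k and remaining:
        def score(j):
            if not selected:
                return relevance[j]
            penalty = sum(redundancy[j][i] for i in selected) / len(selected)
            return relevance[j] - penalty
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical MI values: features 0 and 1 are both relevant but nearly duplicates.
relevance = [0.9, 0.85, 0.5]
redundancy = [
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]

print(max_relevance(relevance, 2))     # picks the two individually best features
print(mrmr(relevance, redundancy, 2))  # skips the near-duplicate feature
```

Both procedures commit to one feature at a time, so neither evaluates the joint value of the whole subset.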

10 Our Contributions. (1) We reformulate MIFS as a QUBO (Quadratic Unconstrained Binary Optimization) problem. (2) We confirmed that optimization by D-Wave performs well on MIFS. (Figure: MI increase (%) w.r.t. Linear versus #features; higher is better.) image is retrieved from

11 Reformulation of MIFS by QUBO (1): Expand the MI term. Proof. Theorem 1.1: chain rule for conditional mutual information. Applying Theorem 1.1 to each i ∈ S and averaging the resulting equations over all i ∈ S expands the MI term.
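The equations on this slide were lost in extraction. A reconstruction that follows from the chain rule, under assumed notation (S is the selected index set with |S| = k, and S \ i is S without i), might read:

```latex
% Chain rule, valid for every i \in S:
I(X_S ; Y) \;=\; I(X_i ; Y) \;+\; I(X_{S \setminus i} ; Y \mid X_i)

% Averaging over the k elements of S:
I(X_S ; Y) \;=\; \frac{1}{k} \sum_{i \in S}
  \Bigl[\, I(X_i ; Y) \;+\; I(X_{S \setminus i} ; Y \mid X_i) \,\Bigr]
```

Both lines are exact identities; no approximation has been made at this step.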

12 Reformulation of MIFS by QUBO (2): Approximate under the assumption of conditional independence (CI). Proof. Assuming conditional independence among the selected features, the conditional MI term of the expanded form decomposes into pairwise terms.
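The slide's formulas are missing from the transcript, so the exact form of the assumption is not recoverable. One reading that makes the step go through, stated here as an assumption, is that the features X_j with j in S \ i are conditionally independent of each other given X_i, and also given (X_i, Y). Under that reading:

```latex
% Conditional independence turns the joint conditional MI into a sum of pairwise terms:
I(X_{S \setminus i} ; Y \mid X_i) \;\approx\; \sum_{j \in S \setminus i} I(X_j ; Y \mid X_i)

% Substituting into the averaged chain-rule expansion:
I(X_S ; Y) \;\approx\; \frac{1}{k} \sum_{i \in S}
  \Bigl[\, I(X_i ; Y) \;+\; \sum_{j \in S \setminus i} I(X_j ; Y \mid X_i) \,\Bigr]
```

The result contains only single-feature and pairwise terms, which is what a quadratic objective can represent.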

13 Reformulation of MIFS by QUBO (3). The MIFS optimization is rewritten as a QUBO whose objective combines the MI term with a penalty term for selecting only k features, where α is the penalty strength.
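A small sketch of what such an objective could look like in code. The exact formula is missing from the transcript, so the form below is an assumption: binary variables x_i mark selected features, `Q[i][i]` stands for I(X_i;Y), `Q[i][j]` (i ≠ j) for I(X_j;Y|X_i), and the penalty is α(Σx_i − k)². The matrix values are made up, and the brute-force solver only serves to check the objective on a tiny instance.

```python
from itertools import product

def qubo_objective(x, Q, k, alpha):
    """MI term x^T Q x minus the cardinality penalty alpha * (sum(x) - k)^2."""
    n = len(x)
    mi_term = sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    penalty = alpha * (sum(x) - k) ** 2
    return mi_term - penalty

def brute_force_qubo(Q, k, alpha):
    """Enumerate all 2^n binary vectors; feasible only for very small n."""
    n = len(Q)
    return max(product((0, 1), repeat=n),
               key=lambda x: qubo_objective(x, Q, k, alpha))

# Hypothetical MI values for three features (illustrative only).
Q = [
    [0.5, 0.1, 0.3],
    [0.1, 0.4, 0.3],
    [0.2, 0.2, 0.3],
]

print(brute_force_qubo(Q, k=2, alpha=10.0))  # penalty enforces exactly 2 features
print(brute_force_qubo(Q, k=2, alpha=0.0))   # without penalty, all features are kept
```

With non-negative MI values the unpenalized objective always prefers more features, so α has to be large enough to make violating the size constraint unprofitable.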

14 Interpretation of the Derived Formulation. Expanding the derived formulation yields three terms: relevance, redundancy, and complementarity. Maximizing the objective increases relevance and complementarity and reduces redundancy.
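The expansion itself is not preserved in the transcript. A standard identity that produces exactly these three terms, offered here as a likely reading rather than the slide's own formula, is:

```latex
I(X_j ; Y \mid X_i)
  \;=\; \underbrace{I(X_j ; Y)}_{\text{relevance}}
  \;-\; \underbrace{I(X_i ; X_j)}_{\text{redundancy}}
  \;+\; \underbrace{I(X_i ; X_j \mid Y)}_{\text{complementarity}}
```

It follows from applying the chain rule to I(X_j ; X_i, Y) in both orders.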

15 Comparison of Optimization Methods
Problem formulation: Binary Quadratic Problem (BQP). Optimization methods: Linear Relaxation [1] (Linear), Truncated Power [1,2] (TPower).
Problem formulation: QUBO. Optimization methods: Tabu Search by qbsolv [3], D-Wave 2000Q.
[1] H. Venkateswara, et al., 2015 [2] X. T. Yuan & T. Zhang, 2013 [3]

16 Linear Relaxation Method (Linear). Linearize the quadratic term by introducing new variables. One of the optimality conditions leads to a simplified problem. Since Q_ij ≥ 0, the solution of this problem is given by the features with the k largest column sums of Q. This solution is tightly bounded [1]. Time complexity is O(nk). [1] H. Venkateswara, et al., 2015
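The closing rule of this slide (take the k largest column sums of Q) can be sketched directly. The function name `linear_relaxation_select` and the matrix values are hypothetical, and the sketch covers only this final selection rule, not the relaxation that justifies it.

```python
def linear_relaxation_select(Q, k):
    """Return the indices of the k columns of Q with the largest sums."""
    n = len(Q)
    col_sums = [sum(Q[i][j] for i in range(n)) for j in range(n)]
    ranked = sorted(range(n), key=lambda j: col_sums[j], reverse=True)
    return sorted(ranked[:k])

# Hypothetical non-negative matrix (illustrative only).
Q = [
    [0.5, 0.1, 0.3],
    [0.1, 0.4, 0.3],
    [0.2, 0.2, 0.3],
]

print(linear_relaxation_select(Q, 2))
```

This sketch sorts all columns for brevity; a partial selection of the top k would avoid the full sort.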

17 Truncated Power Method (TPower). TPower finds the largest k-sparse eigenvector x of Q; we select the i-th feature if x_i > 0. The eigenvector is computed by an iterative procedure [1] that is repeated T times. This method is confirmed to be the best-performing method for BQP with a non-negative matrix [2]. The time complexity of the algorithm is O(Tn^2). [1] X. T. Yuan & T. Zhang, 2013 [2] H. Venkateswara, et al., 2015
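The procedure on this slide is not preserved, so the following is a sketch of the generic truncated power iteration (multiply by Q, keep the k largest-magnitude entries, renormalize), not a transcription of the slide. The uniform starting vector, the default T, and the example matrix are assumptions. A symmetric Q is assumed; a non-symmetric QUBO matrix can be replaced by (Q + Qᵀ)/2 without changing xᵀQx.

```python
import math

def truncated_power(Q, k, T=50):
    """Approximate the dominant k-sparse eigenvector of a symmetric matrix Q
    and return the indices of its positive entries."""
    n = len(Q)
    x = [1.0 / math.sqrt(n)] * n  # assumed initialization: uniform unit vector
    for _ in range(T):
        # Power step: y = Q x, costing O(n^2) per iteration
        y = [sum(Q[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Truncation step: keep only the k entries of largest magnitude
        keep = set(sorted(range(n), key=lambda i: abs(y[i]), reverse=True)[:k])
        y = [y[i] if i in keep else 0.0 for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            break
        x = [v / norm for v in y]
    return sorted(i for i in range(n) if x[i] > 0)

# Hypothetical symmetric non-negative matrix (illustrative only).
Q = [
    [0.5, 0.1, 0.3],
    [0.1, 0.4, 0.1],
    [0.3, 0.1, 0.45],
]

print(truncated_power(Q, k=2))
```

Each of the T iterations is dominated by the matrix-vector product, which matches the O(Tn^2) cost stated on the slide.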

18 Optimization by D-Wave Machine. We used the D-Wave machine with the following settings. Machine: D-Wave 2000Q. Embedding: 64-bit full connection. Annealing time: 20 µs. Annealing repetitions: 10. When the feature size n is larger than the hardware size h (= 64), we use Linear as a pre-processing step to narrow the candidate features down to h. (Figure: full-connection embedding for C(4,4,4).)

19 Comparison of Mutual Information Score. We compared the MI scores of each optimization method on a public dataset. The graph shows the increase in MI score relative to Linear. (Figure: MI increase (%) w.r.t. Linear versus #features; higher is better.) Data name: a1a, #features: 122, #data points: 8000

20 Classification Accuracy. We calculated the classification accuracy for different #features. Accuracy is a good measure of the quality of a selected feature subset. (Diagram: from the original features, select k features, then measure the classification accuracy with random forest classifiers.)

21 Classification Accuracy. We evaluated each method by classification accuracy for different #features. (Figure: classification accuracy versus #features for D-Wave, TPower, Tabu (qbsolv), and Linear; higher is better.) Data name: a1a, #features: 122, #data points: 8000

22 Summary. We derived the QUBO formulation of MIFS so that the problem can be embedded in Ising machines. We used the D-Wave quantum annealing machine as a solver for MIFS. Optimization by D-Wave outperformed TPower, which is the state-of-the-art optimization method for BQP. We are planning to use MIFS by D-Wave in Kaggle!

23 Thank you for listening

24 Runtime of Optimizations. Average runtime by method:
Linear: 9.0 msec
TPower: 26.1 msec
Tabu (qbsolv): 14.3 sec
D-Wave: 9.0 msec (Linear) μsec (annealing)
Data name: a1a, #features: 122, #data points:

25 Comparison to MRMR, Max Rel. (Figure: accuracy versus #features for D-Wave, MRMR, and Max Rel.) Data name: a1a, #features: 122, #data points:


Linear vs Non-linear classifier. CS789: Machine Learning and Neural Network. Introduction Linear vs Non-linear classifier CS789: Machine Learning and Neural Network Support Vector Machine Jakramate Bootkrajang Department of Computer Science Chiang Mai University Linear classifier is in the

More information

Decision Trees. CSC411/2515: Machine Learning and Data Mining, Winter 2018 Luke Zettlemoyer, Carlos Guestrin, and Andrew Moore

Decision Trees. CSC411/2515: Machine Learning and Data Mining, Winter 2018 Luke Zettlemoyer, Carlos Guestrin, and Andrew Moore Decision Trees Claude Monet, The Mulberry Tree Slides from Pedro Domingos, CSC411/2515: Machine Learning and Data Mining, Winter 2018 Luke Zettlemoyer, Carlos Guestrin, and Andrew Moore Michael Guerzhoy

More information

Perceptron Revisited: Linear Separators. Support Vector Machines

Perceptron Revisited: Linear Separators. Support Vector Machines Support Vector Machines Perceptron Revisited: Linear Separators Binary classification can be viewed as the task of separating classes in feature space: w T x + b > 0 w T x + b = 0 w T x + b < 0 Department

More information

Randomized Decision Trees

Randomized Decision Trees Randomized Decision Trees compiled by Alvin Wan from Professor Jitendra Malik s lecture Discrete Variables First, let us consider some terminology. We have primarily been dealing with real-valued data,

More information

arxiv: v1 [cs.ds] 25 Jan 2016

arxiv: v1 [cs.ds] 25 Jan 2016 A Novel Graph-based Approach for Determining Molecular Similarity Maritza Hernandez 1, Arman Zaribafiyan 1,2, Maliheh Aramon 1, and Mohammad Naghibi 3 1 1QB Information Technologies (1QBit), Vancouver,

More information

In: Advances in Intelligent Data Analysis (AIDA), International Computer Science Conventions. Rochester New York, 1999

In: Advances in Intelligent Data Analysis (AIDA), International Computer Science Conventions. Rochester New York, 1999 In: Advances in Intelligent Data Analysis (AIDA), Computational Intelligence Methods and Applications (CIMA), International Computer Science Conventions Rochester New York, 999 Feature Selection Based

More information

Maximum Entropy Klassifikator; Klassifikation mit Scikit-Learn

Maximum Entropy Klassifikator; Klassifikation mit Scikit-Learn Maximum Entropy Klassifikator; Klassifikation mit Scikit-Learn Benjamin Roth Centrum für Informations- und Sprachverarbeitung Ludwig-Maximilian-Universität München beroth@cis.uni-muenchen.de Benjamin Roth

More information

Automated Solar Flare Prediction: Is it a myth?

Automated Solar Flare Prediction: Is it a myth? Automated Solar Flare Prediction: Is it a myth? Tufan Colak, t.colak@bradford.ac.uk Rami Qahwaji, Omar W. Ahmed, Paul Higgins* University of Bradford, U.K.,Trinity Collage Dublin, Ireland* European Space

More information

MSc Project Feature Selection using Information Theoretic Techniques. Adam Pocock

MSc Project Feature Selection using Information Theoretic Techniques. Adam Pocock MSc Project Feature Selection using Information Theoretic Techniques Adam Pocock pococka4@cs.man.ac.uk 15/08/2008 Abstract This document presents a investigation into 3 different areas of feature selection,

More information

Mini-project in scientific computing

Mini-project in scientific computing Mini-project in scientific computing Eran Treister Computer Science Department, Ben-Gurion University of the Negev, Israel. March 7, 2018 1 / 30 Scientific computing Involves the solution of large computational

More information

The connection of dropout and Bayesian statistics

The connection of dropout and Bayesian statistics The connection of dropout and Bayesian statistics Interpretation of dropout as approximate Bayesian modelling of NN http://mlg.eng.cam.ac.uk/yarin/thesis/thesis.pdf Dropout Geoffrey Hinton Google, University

More information

Midterm: CS 6375 Spring 2018

Midterm: CS 6375 Spring 2018 Midterm: CS 6375 Spring 2018 The exam is closed book (1 cheat sheet allowed). Answer the questions in the spaces provided on the question sheets. If you run out of room for an answer, use an additional

More information

Natural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley

Natural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley Natural Language Processing Classification Classification I Dan Klein UC Berkeley Classification Automatically make a decision about inputs Example: document category Example: image of digit digit Example:

More information

arxiv: v2 [quant-ph] 2 Oct 2014

arxiv: v2 [quant-ph] 2 Oct 2014 A Quantum Annealing Approach for Fault Detection and Diagnosis of Graph-Based Systems Alejandro Perdomo-Ortiz,, 2, a) Joseph Fluegemann,, 3 Sriram Narasimhan, 2 Rupak Biswas, and Vadim N. Smelyanskiy )

More information

Midterm: CS 6375 Spring 2015 Solutions

Midterm: CS 6375 Spring 2015 Solutions Midterm: CS 6375 Spring 2015 Solutions The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an

More information

Reducing Computation Time for the Analysis of Large Social Science Datasets

Reducing Computation Time for the Analysis of Large Social Science Datasets Reducing Computation Time for the Analysis of Large Social Science Datasets Douglas G. Bonett Center for Statistical Analysis in the Social Sciences University of California, Santa Cruz Jan 28, 2014 Overview

More information

Support vector machines Lecture 4

Support vector machines Lecture 4 Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The

More information

Support Vector Machines: Maximum Margin Classifiers

Support Vector Machines: Maximum Margin Classifiers Support Vector Machines: Maximum Margin Classifiers Machine Learning and Pattern Recognition: September 16, 2008 Piotr Mirowski Based on slides by Sumit Chopra and Fu-Jie Huang 1 Outline What is behind

More information

CSE 151 Machine Learning. Instructor: Kamalika Chaudhuri

CSE 151 Machine Learning. Instructor: Kamalika Chaudhuri CSE 151 Machine Learning Instructor: Kamalika Chaudhuri Ensemble Learning How to combine multiple classifiers into a single one Works well if the classifiers are complementary This class: two types of

More information