Instance-based Domain Adaptation
1 Instance-based Domain Adaptation
Rui Xia, School of Computer Science and Engineering, Nanjing University of Science and Technology
2 Problem Background
[Figure: a sentiment classifier is trained on Movie-domain training data and evaluated on test data from several domains; the reported accuracies are 85%, 65%, and 75%.]
3 How to find the right training data?
[Picture from IJCAI-13 tutorial: Transfer Learning with Applications]
4 A Summary of Domain Adaptation Methods in NLP
- Semi-supervised based: build a semi-supervised classifier based on the source-domain labeled data and the target-domain unlabeled data. [Aue, 2005; Tan, 2007; Dai, 2007; Tan, 2009]
- Parameter based: assume that the models of the source and target domain share the same prior parameter. [Chelba, 2006; Li, 2009; Xue, 2008; Finkel, 2009]
- Feature based: learn a new feature representation (or a new labeling function) for the target domain. [Daume III, 2007; Blitzer, 2007; Gao, 2008; Pan, 2009; Pan, 2010b; Ji, 2011; Xia, 2011; Samdani, 2011; Glorot, 2011; Duan, 2012]
- Instance based: learn the importance of labeled data in the source domain by instance selection and instance weighting. [Sugiyama, 2007; Bickel, 2009; Axelrod, 2011]
5 Our Work
- PUIS/PUIW (Instance Selection and Instance Weighting via PU Learning): Rui Xia, Xuelei Hu, Jianfeng Lu, Jian Yang, and Chengqing Zong. Instance Selection and Instance Weighting for Cross-domain Sentiment Classification via PU Learning. IJCAI.
- ILA (Instance Adaptation via In-target-domain Logistic Approximation): Rui Xia, Jianfei Yu, Feng Xu, and Shumei Wang. Instance-based Domain Adaptation in NLP via In-target-domain Logistic Approximation. AAAI.
6 Part 1. PUIS/PUIW (Instance Selection and Instance Weighting via PU Learning)
7 Introduction to PU Learning
What is PU learning? Given a set of positive (P) samples of a particular class and a set of unlabeled (U) samples that contains hidden positive and negative ones, build a binary classifier using P and U.
Relation to traditional semi-supervised learning: traditional semi-supervised learning learns with a small set of labeled data and a large set of unlabeled data (LU learning); in PU learning, the labeled data contain only positive samples.
8 Illustration of PU Learning
[Figure: the unlabeled set U contains hidden positive and hidden negative samples. PU learning splits U into Nr (reliable negatives) and Ur (remaining unlabeled), then applies traditional semi-supervised learning based on P, Nr, and Ur.]
9 A Motivating Example of PUIS/PUIW
- The large source-domain training set is used as the U set (blue crosses).
- Some target-domain unlabeled data are used as the P set (red dots).
- PU learning first identifies a reliable negative set Nr from U, and then builds a semi-supervised classifier using P, Nr, and U - Nr.
10 PU Learning for Instance Selection (PUIS)
- Step 1: Detect reliable negative samples by sampling some spy samples and building a supervised naïve Bayes (NB) classifier.
- Step 2: Build a semi-supervised NB based on EM training (NB-EM).
- Step 3: Use the NB-EM classifier as an in-target-domain selector.
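Step 1 above (the spy technique) can be sketched as follows. The tiny bag-of-words multinomial NB and all helper names are illustrative, not the paper's implementation: spies are hidden positives mixed into U, and any unlabeled document scoring below every spy is declared a reliable negative.

```python
import numpy as np

def nb_train(X, y, alpha=1.0):
    """Multinomial naive Bayes with Laplace smoothing.
    X: (n_docs, vocab) count matrix, y: 0/1 labels."""
    priors, cond = [], []
    for c in (0, 1):
        Xc = X[y == c]
        priors.append(len(Xc) / len(X))
        counts = Xc.sum(axis=0) + alpha
        cond.append(counts / counts.sum())
    return np.log(priors), np.log(np.array(cond))

def nb_posterior(X, log_prior, log_cond):
    """P(class = 1 | x) for each row of X."""
    joint = X @ log_cond.T + log_prior          # (n_docs, 2) log-joint
    joint -= joint.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(joint)
    return p[:, 1] / p.sum(axis=1)

def reliable_negatives(P, U, spy_frac=0.2, seed=0):
    """Spy technique: hide some positives inside U, train NB treating U
    as negative, and call unlabeled docs scoring below every spy
    'reliable negatives' (Nr)."""
    rng = np.random.default_rng(seed)
    n_spy = max(1, int(spy_frac * len(P)))
    spy_idx = rng.choice(len(P), n_spy, replace=False)
    spies, P_rest = P[spy_idx], np.delete(P, spy_idx, axis=0)
    X = np.vstack([P_rest, U, spies])
    y = np.array([1] * len(P_rest) + [0] * (len(U) + n_spy))
    lp, lc = nb_train(X, y)
    threshold = nb_posterior(spies, lp, lc).min()
    scores = nb_posterior(U, lp, lc)
    return U[scores < threshold], scores

# Toy data: P is target-domain-like (word 0 heavy); U mixes both kinds.
P = np.array([[5, 0], [4, 1], [5, 1], [6, 0], [5, 0]])
U = np.array([[4, 0], [0, 5], [1, 6], [0, 4], [5, 1]])
Nr, scores = reliable_negatives(P, U)
```

Documents dominated by word 1 (unlike anything in P) receive low in-target scores and land in Nr, which then seeds the NB-EM step.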
11 The Instance Weighting Framework
Expected log-likelihood of the target domain:
$$\mathcal{L}(\theta) = \int_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p_t(x, y) \log p(x, y \mid \theta)\, dx = \int_{x \in \mathcal{X}} \frac{p_t(x)}{p_s(x)}\, p_s(x) \sum_{y \in \mathcal{Y}} p_s(y \mid x) \log p(x, y \mid \theta)\, dx = \int_{x \in \mathcal{X}} r(x)\, p_s(x) \sum_{y \in \mathcal{Y}} p_s(y \mid x) \log p(x, y \mid \theta)\, dx$$
Instance weighting therefore reduces to density ratio estimation, $r(x) = p_t(x)/p_s(x)$, and to instance-weighted maximum likelihood estimation:
$$\theta^* = \arg\max_{\theta} \frac{1}{N_s} \sum_{n=1}^{N_s} r(x_n) \log p(x_n, y_n \mid \theta)$$
12 PU Learning for Instance Weighting (PUIW)
In-target-domain sampling assumption:
$$q_t(x) = r(x)\, p_s(x) \propto p(d = 1 \mid x)\, p_s(x)$$
Instance weighting via PU learning and probability calibration:
$$r(x) = \frac{p_t(x)}{p_s(x)} \propto p(d = 1 \mid x) = \frac{1}{1 + \exp(-\alpha f(x))}$$
where $\alpha$ is the calibration parameter and $f(x)$ is the log-likelihood output of PU learning.
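Given the log-likelihood outputs f(x) of the PU classifier, the calibration step can be sketched as below. Normalizing the weights to mean 1 over the source set is an added convention for readability, not something the slides specify:

```python
import numpy as np

def puiw_weights(f, alpha=0.1):
    """Turn PU-learning log-likelihood outputs f(x) into sampling
    weights r(x) proportional to sigmoid(alpha * f(x)), rescaled so the
    mean weight over the source set is 1."""
    r = 1.0 / (1.0 + np.exp(-alpha * f))
    return r / r.mean()

# Hypothetical log-likelihood outputs of the PU classifier on four
# source-domain instances; larger f means more in-target-domain-like.
f = np.array([3.0, -1.0, 0.0, 5.0])
r = puiw_weights(f, alpha=0.5)
```

The calibration parameter alpha flattens (small alpha) or sharpens (large alpha) the weight distribution, which is the quantity swept on the accuracy plots later in the talk.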
13 Instance-weighted Classification Model
Learning in traditional naïve Bayes:
$$p(c_j) = \frac{\sum_k I(y_k = c_j)}{\sum_{j'} \sum_k I(y_k = c_{j'})} = \frac{N_j}{N}, \qquad p(t_i \mid c_j) = \frac{\sum_k I(y_k = c_j)\, N(t_i, x_k)}{\sum_k I(y_k = c_j) \sum_{i'=1}^{V} N(t_{i'}, x_k)}$$
Learning in instance-weighted naïve Bayes:
$$p(c_j) = \frac{\sum_k I(y_k = c_j)\, r(x_k)}{\sum_k r(x_k)}, \qquad p(t_i \mid c_j) = \frac{\sum_k I(y_k = c_j)\, r(x_k)\, N(t_i, x_k)}{\sum_k I(y_k = c_j)\, r(x_k) \sum_{i'=1}^{V} N(t_{i'}, x_k)}$$
Note: PUIW can also be applied to discriminative models (e.g., weighted MaxEnt, weighted SVMs).
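A minimal sketch of the weighted naïve Bayes estimates above; the Laplace smoothing term is an addition for numerical safety (the slide's formulas omit it), and with uniform weights the estimates reduce to the traditional ones:

```python
import numpy as np

def weighted_nb(X, y, r, classes=(0, 1), alpha=1.0):
    """Instance-weighted multinomial NB: every count of instance x_k is
    scaled by its sampling weight r(x_k).
    X: (n_docs, vocab) counts, y: labels, r: per-instance weights."""
    priors, cond = [], []
    for c in classes:
        m = (y == c)
        priors.append(r[m].sum() / r.sum())                   # p(c_j)
        counts = (r[m][:, None] * X[m]).sum(axis=0) + alpha   # weighted term counts
        cond.append(counts / counts.sum())                    # p(t_i | c_j)
    return np.array(priors), np.array(cond)

X = np.array([[2, 0], [0, 2], [1, 1], [0, 3]])
y = np.array([1, 0, 1, 0])
pri_u, cond_u = weighted_nb(X, y, np.ones(4))                 # uniform weights
pri_w, cond_w = weighted_nb(X, y, np.array([1.0, 1.0, 3.0, 1.0]))
```

Upweighting the third (class-1) document shifts the class prior toward class 1, exactly the effect the weighted estimator is meant to have.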
14 Experiments on Cross-domain Sentiment Classification
- Setting 1 (normal-size training set): source domain Movie (training set size: 2,000); target domains {Book, DVD, Electronics, Kitchen}.
- Setting 2 (large training set): source domain Video (training set size: 10,000); target domains: 12 review domains extracted from the Multi-domain dataset.
15 Classification Accuracy: Normal-size Training Set
[Table: columns KLD, ALL, PUIS, PUIW; rows Movie→Book, Movie→DVD, Movie→Electronics, Movie→Kitchen, and Average. Numeric values not preserved in the transcription.]
16 Accuracy: Normal-size Training Set
[Four accuracy plots: Movie→Book, Movie→DVD, Movie→Electronics, Movie→Kitchen, each comparing PUIW, PUIS, and Random. For Random and PUIS, the bottom x-axis (number of selected samples) is used; for PUIW, the top x-axis (calibration parameter alpha) is used.]
17 Classification Accuracy: Large-size Training Set
[Table: columns KLD, ALL, PUIS, PUIW; rows Video→Apparel, Video→Baby, Video→Books, Video→Camera, Video→DVD, Video→Electronics, Video→Health, Video→Kitchen, Video→Magazines, Video→Music, Video→Software, Video→Toys, and Average. Numeric values not preserved in the transcription.]
18 Accuracy: Large-size Training Set
[Six accuracy plots: Video→Apparel, Video→Baby, Video→Books, Video→Camera, Video→Health, Video→Kitchen, each comparing PUIW, PUIS, and Random. For Random and PUIS the bottom x-axis (number of selected samples) is used; for PUIW the top x-axis (calibration parameter alpha) is used.]
19 Conclusions from the Results
- The effect of adding more training samples: it differs between a small training set (<2,000) and a larger one (>3,000).
- The necessity of PUIS/PUIW: the improvements of both PUIS and PUIW are significant, and PUIW is overall better than PUIS.
- The stability of PUIS/PUIW: the number of selected samples in PUIS is hard to determine, while the curve of PUIW is unimodal (best α ≈ 0.1).
20 The Relation of K-L Distance and Accuracy Improvement
[Scatter plot: accuracy improvement (%) against KL divergence across the experimental settings.]
KLD and accuracy improvement are roughly linearly related.
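One common way to estimate such a cross-domain KLD is from smoothed unigram term distributions. This is a sketch of a plausible estimator under that assumption, not necessarily the one behind the reported numbers:

```python
import numpy as np

def unigram_kl(source_counts, target_counts, alpha=1.0):
    """Smoothed KL(p_s || p_t) between the term distributions of two
    domains, computed from per-domain term count vectors."""
    p = source_counts + alpha
    p = p / p.sum()
    q = target_counts + alpha
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

src = np.array([30.0, 5.0, 1.0])    # term counts in the source domain
tgt = np.array([10.0, 10.0, 10.0])  # term counts in the target domain
kl_same = unigram_kl(src, src)      # identical distributions -> 0
kl_diff = unigram_kl(src, tgt)      # mismatched domains -> positive
```

Under the roughly linear relation observed on the slide, a larger value of this divergence predicts a larger accuracy gain from instance adaptation.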
21 Part 2. Instance Adaptation via In-target-domain Logistic Approximation (ILA)
22 In-target-domain Sampling Model
PUIW (PU learning for instance weighting):
$$q_t(x) = r(x)\, p_s(x) \propto \frac{1}{1 + \exp(-\alpha f(x))}\, p_s(x)$$
ILA (in-target-domain logistic approximation):
$$q_t(x) \propto p(d = 1 \mid x)\, p_s(x) = \frac{1}{1 + e^{-w^T x}}\, p_s(x)$$
Shortcoming of PUIW: PU learning and probability calibration are conducted separately. In ILA, the instance weight is learned in one single model.
23 Illustration of Instance-weighted Sampling
[Figure: three bar charts over features f1-f6 with sampling weights [1, 1, 1], [0.5, 1.5, 1], and [1.5, 0.5, 1], showing how the instance weights reshape the sampled distribution.]
24 Illustration of the Parameter Learning Criterion
[Figure: two examples over features f1-f6. In each, subtracting the target-domain true distribution from an approximated target-domain distribution yields a distributional distance; the approximation with the smaller distance is preferred.]
25 Parameter Learning in ILA
K-L distance (KLD) between the true and approximated target-domain distributions:
$$KL(p_t(x) \,\|\, q_t(x)) = \int_{x \in \mathcal{X}} p_t(x) \log \frac{p_t(x)}{q_t(x)}\, dx$$
Minimizing the KLD over $w$ is equivalent to:
$$\max_{w} \int_{x \in \mathcal{X}} p_t(x) \log \frac{p_s(x)}{1 + e^{-w^T x}}\, dx$$
subject to the normalization constraint
$$\int_{x \in \mathcal{X}} q_t(x)\, dx = \int_{x \in \mathcal{X}} \frac{1}{1 + e^{-w^T x}}\, p_s(x)\, dx = 1$$
26 Empirical Form of KLD Minimization
Empirical K-L distance minimization:
$$\max_{w} \frac{1}{N_t} \sum_{i=1}^{N_t} \log \frac{1}{1 + e^{-w^T x_i}} \quad \text{s.t.} \quad \lambda(w) = \frac{1}{N_s} \sum_{j=1}^{N_s} \frac{1}{1 + e^{-w^T x_j}} = 1$$
Folding the normalization term into the objective gives the final optimization formula (loss function):
$$w^* = \arg\max_{w} \frac{1}{N_t} \sum_{i=1}^{N_t} \log \frac{\frac{1}{1 + e^{-w^T x_i}}}{\frac{1}{N_s} \sum_{j=1}^{N_s} \frac{1}{1 + e^{-w^T x_j}}} = \arg\min_{w} \sum_{i=1}^{N_t} \log \frac{\sum_{j=1}^{N_s} \frac{1}{1 + e^{-w^T x_j}}}{\frac{1}{1 + e^{-w^T x_i}}}$$
A standard unconstrained optimization problem!
27 Gradient-based Optimization
Gradient of the loss with respect to $w_k$:
$$\frac{\partial L}{\partial w_k} = \frac{1}{\sum_{j=1}^{N_s} s(w^T x_j)} \sum_{j=1}^{N_s} s(w^T x_j)\big(1 - s(w^T x_j)\big)\, x_{j,k} - \frac{1}{N_t} \sum_{i=1}^{N_t} \big(1 - s(w^T x_i)\big)\, x_{i,k}$$
where $s(w^T x) = \frac{1}{1 + e^{-w^T x}}$.
Optimization method: gradient descent, $w_k^{(t+1)} = w_k^{(t)} - \eta\, \frac{\partial L}{\partial w_k}$; Newton, quasi-Newton, and L-BFGS methods can also work.
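The unconstrained loss of slide 26, the gradient above, and the resulting per-instance sampling weights can be sketched in a few lines of NumPy. The toy data, learning rate, and iteration count are illustrative choices, not the paper's settings:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ila_loss_grad(w, Xs, Xt):
    """Empirical KLD objective of ILA and its gradient.
    Xs: source-domain feature matrix, Xt: target-domain feature matrix."""
    ss = sigmoid(Xs @ w)                    # s(w^T x_j) over source
    st = sigmoid(Xt @ w)                    # s(w^T x_i) over target
    S = ss.sum()
    loss = np.log(S) - np.log(st).mean()
    grad = (ss * (1 - ss)) @ Xs / S - ((1 - st) @ Xt) / len(Xt)
    return loss, grad

def ila_fit(Xs, Xt, lr=0.5, iters=500):
    """Plain gradient descent on the ILA objective."""
    w = np.zeros(Xs.shape[1])
    for _ in range(iters):
        _, g = ila_loss_grad(w, Xs, Xt)
        w -= lr * g
    return w

def ila_weights(w, Xs):
    """In-target-domain sampling weights r(x_n) = s(w^T x_n) / lambda,
    with lambda the mean of s(w^T x_j) over the source set."""
    s = sigmoid(Xs @ w)
    return s / s.mean()

# Toy 1-D data with a bias feature: the source mixes an in-target
# cluster (around +2) and an out-of-target cluster (around -2).
rng = np.random.default_rng(0)
a = rng.normal(2.0, 0.3, size=(50, 1))
b = rng.normal(-2.0, 0.3, size=(50, 1))
Xs = np.hstack([np.vstack([a, b]), np.ones((100, 1))])
t = rng.normal(2.0, 0.3, size=(60, 1))
Xt = np.hstack([t, np.ones((60, 1))])

w = ila_fit(Xs, Xt)
r = ila_weights(w, Xs)   # source instances near the target get large r
```

The learned logistic model assigns high weights to the source cluster that overlaps the target domain and low weights to the other, which is precisely the in-target-domain approximation the method is named after.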
28 Instance-weighted Classification
In-target-domain sampling weight for each source-domain labeled instance:
$$r(x_n) = \frac{q_t(x_n)}{p_s(x_n)} = \frac{1}{\lambda\,(1 + e^{-w^T x_n})}, \quad \text{where } \lambda = \frac{1}{N_s} \sum_{j=1}^{N_s} \frac{1}{1 + e^{-w^T x_j}}$$
Instance-weighted classification models: generative (weighted naïve Bayes) or discriminative (weighted logistic regression, weighted SVMs).
29 Instance Adaptation with Feature Selection
Domain-sensitive information gain:
$$IG(t_k) = -\sum_{l \in \{0,1\}} p(d = l) \log p(d = l) + p(t_k) \sum_{l \in \{0,1\}} p(d = l \mid t_k) \log p(d = l \mid t_k) + p(\bar{t}_k) \sum_{l \in \{0,1\}} p(d = l \mid \bar{t}_k) \log p(d = l \mid \bar{t}_k)$$
Feature space: ILA maps from the original feature space to a dimension-reduced feature space ($x \to \hat{x}$); KLIEP maps from the original feature space to a kernel space ($x \to \phi(x)$).
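The domain-sensitive information gain above can be computed from document frequencies. A sketch with hypothetical argument names (df_s/df_t are the numbers of source/target documents containing the term, n_s/n_t the domain sizes; d is the domain indicator):

```python
import numpy as np

def entropy(ps):
    """Shannon entropy (natural log) of a discrete distribution."""
    ps = np.array([p for p in ps if p > 0])
    return float(-np.sum(ps * np.log(ps)))

def domain_ig(df_s, n_s, df_t, n_t):
    """Domain-sensitive information gain of a term: how much observing
    the term's presence/absence reduces uncertainty about the domain d."""
    n = n_s + n_t
    p_term = (df_s + df_t) / n                          # p(t_k)
    h_d = entropy([n_s / n, n_t / n])                   # -sum p(d) log p(d)
    h_present = entropy([df_s / (df_s + df_t),
                         df_t / (df_s + df_t)])         # H(d | t_k present)
    a_s, a_t = n_s - df_s, n_t - df_t
    h_absent = entropy([a_s / (a_s + a_t),
                        a_t / (a_s + a_t)])             # H(d | t_k absent)
    return h_d - p_term * h_present - (1 - p_term) * h_absent

ig_neutral = domain_ig(50, 100, 50, 100)   # term equally common in both domains
ig_specific = domain_ig(90, 100, 5, 100)   # term concentrated in the source
```

A domain-neutral term carries no information about d (IG near zero), while a domain-specific term scores high; selecting by this criterion yields the dimension-reduced space that ILA operates in.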
30 Experimental Settings
- Setting 1: cross-domain document categorization on the 20 Newsgroups dataset; top categories are used as class labels, and subcategories are used to generate source and target domains [Dai, 2007].
- Setting 2: cross-domain sentiment classification; source domain Movie, target domains {Book, DVD, Electronics, Kitchen}.
31 Experimental Results: Cross-domain Document Categorization
[Table: columns K-L divergence, No Adaptation, KLIEP-Linear, KLIEP-Gaussian, PUIS, PUIW, ILA; rows sci vs com, talk vs com, sci vs talk, rec vs sci, rec vs com, talk vs rec, and Avg. Numeric values not preserved in the transcription.]
A vs B means that the top categories A and B are used as class labels, and the subcategories under them are used to generate the source- and target-domain datasets.
32 Experimental Results: Cross-domain Sentiment Classification
[Table: columns K-L divergence, No Adaptation, KLIEP-Linear, KLIEP-Gaussian, PUIS, PUIW, ILA; rows movie→book, movie→dvd, movie→elec, movie→kitchen, and Average. Numeric values not preserved in the transcription.]
A→B denotes that dataset A is used as the source domain and B as the target domain.
33 Parameter Sensitivity: Effect of Instance Adaptation Feature Selection
[Plot: accuracy against the percentage of selected features for movie→dvd, movie→elec, talk vs rec, and rec vs sci.]
ILA can work efficiently in a dimensionality-reduced linear feature space.
34 Comparison of Parameter Stability
[Two plots of parameter stability for PUIW and KLIEP-Gaussian on movie→dvd, movie→elec, talk vs rec, and rec vs sci. The x-axis is the calibration parameter α in PUIW (left) and the kernel parameter σ in KLIEP-Gaussian (right).]
ILA is more convenient for parameter tuning than PUIW and KLIEP.
35 Comparison of Computational Efficiency
Computational time cost:

Task                      | KLIEP-Gaussian (#1000) | KLIEP-Gaussian (#100) | PUIW | ILA
Text categorization       | 19346s                 | 2121s                 | 241s | 161s
Sentiment classification  | 5219s                  | 482s                  | 184s | 97s

Overall summary: a joint model rather than a separate model; parameters are learned rather than tuned; better performance in classification accuracy, parameter stability, and computational efficiency.
36 Feasibility of Applying to Other NLP Tasks
- Classification tasks: cross-domain text/sentiment classification; cross-domain NER? cross-domain WSD?
- Sequential learning tasks: cross-domain Chinese word segmentation? cross-domain parsing?
- Bilingual alignment learning tasks: cross-domain MT?
37 Any Questions?
Probabilistic modeling The slides are closely adapted from Subhransu Maji s slides Overview So far the models and algorithms you have learned about are relatively disconnected Probabilistic modeling framework
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationDistributed Estimation, Information Loss and Exponential Families. Qiang Liu Department of Computer Science Dartmouth College
Distributed Estimation, Information Loss and Exponential Families Qiang Liu Department of Computer Science Dartmouth College Statistical Learning / Estimation Learning generative models from data Topic
More informationLogistic Regression Introduction to Machine Learning. Matt Gormley Lecture 8 Feb. 12, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 8 Feb. 12, 2018 1 10-601 Introduction
More informationInstance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows Kn-Nearest
More informationIntroduction to Machine Learning Spring 2018 Note 18
CS 189 Introduction to Machine Learning Spring 2018 Note 18 1 Gaussian Discriminant Analysis Recall the idea of generative models: we classify an arbitrary datapoint x with the class label that maximizes
More informationThe exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet.
CS 189 Spring 013 Introduction to Machine Learning Final You have 3 hours for the exam. The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet. Please
More informationStatistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.
http://goo.gl/xilnmn Course website KYOTO UNIVERSITY Statistical Machine Learning Theory From Multi-class Classification to Structured Output Prediction Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT
More informationChap 2. Linear Classifiers (FTH, ) Yongdai Kim Seoul National University
Chap 2. Linear Classifiers (FTH, 4.1-4.4) Yongdai Kim Seoul National University Linear methods for classification 1. Linear classifiers For simplicity, we only consider two-class classification problems
More informationSequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them
HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated
More informationCS446: Machine Learning Fall Final Exam. December 6 th, 2016
CS446: Machine Learning Fall 2016 Final Exam December 6 th, 2016 This is a closed book exam. Everything you need in order to solve the problems is supplied in the body of this exam. This exam booklet contains
More informationApprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning
Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning Nicolas Thome Prenom.Nom@cnam.fr http://cedric.cnam.fr/vertigo/cours/ml2/ Département Informatique Conservatoire
More informationStatistical Methods for NLP
Statistical Methods for NLP Text Categorization, Support Vector Machines Sameer Maskey Announcement Reading Assignments Will be posted online tonight Homework 1 Assigned and available from the course website
More informationSupport Vector Machines. CAP 5610: Machine Learning Instructor: Guo-Jun QI
Support Vector Machines CAP 5610: Machine Learning Instructor: Guo-Jun QI 1 Linear Classifier Naive Bayes Assume each attribute is drawn from Gaussian distribution with the same variance Generative model:
More informationLINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES. Supervised Learning
LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES Supervised Learning Linear vs non linear classifiers In K-NN we saw an example of a non-linear classifier: the decision boundary
More informationMachine Learning for NLP
Machine Learning for NLP Linear Models Joakim Nivre Uppsala University Department of Linguistics and Philology Slides adapted from Ryan McDonald, Google Research Machine Learning for NLP 1(26) Outline
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Clustering: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu November 3, 2015 Methods to Learn Matrix Data Text Data Set Data Sequence Data Time Series Graph
More informationMathematical Formulation of Our Example
Mathematical Formulation of Our Example We define two binary random variables: open and, where is light on or light off. Our question is: What is? Computer Vision 1 Combining Evidence Suppose our robot
More information6.867 Machine Learning
6.867 Machine Learning Problem Set 2 Due date: Wednesday October 6 Please address all questions and comments about this problem set to 6867-staff@csail.mit.edu. You will need to use MATLAB for some of
More informationLearning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute
More informationMachine Learning Practice Page 2 of 2 10/28/13
Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes
More information