Semi-supervised learning for node classification in networks


1 Semi-supervised learning for node classification in networks Jennifer Neville Departments of Computer Science and Statistics Purdue University (joint work with Paul Bennett, John Moore, and Joel Pfeiffer)

2 How to exploit relational dependencies to improve predictions about users?

3 Data network: G = (V, E), V := users, E := friendships.

4 Data network: G = (V, E), V := users, E := friendships. Which node attributes can we predict: Gender? Married? Politics? Religion?

5 Attributed network: G = (V, E), V := users, E := friendships; each node carries an attribute vector X and, for labeled nodes, a class label, e.g., <X, Y_i = 1> and <X, Y_j = 0>. (Figure: example attribute tables for the labeled nodes.)

6 How to predict labels within a single, partially labeled network? (Figure: a few nodes observed as <X, Y_i = 1> or <X, Y_j = 0>; the rest unlabeled.)

7 Graph regularization: Assume linked nodes exhibit homophily and make predictions via label propagation. (N.A. Christakis and J.H. Fowler, The Spread of Obesity in a Large Social Network Over 32 Years, New England Journal of Medicine, 2007.)
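
A minimal sketch of this label-propagation idea (in the spirit of wvRN/GRF, not their exact implementations); the toy adjacency list and seed labels below are illustrative, not from the talk.

```python
# Minimal sketch of homophily-based label propagation: labeled nodes are
# clamped, unlabeled nodes repeatedly take the average of their neighbors.

adj = {                                  # undirected adjacency list (toy graph)
    "a": ["b", "c"], "b": ["a", "c", "d"],
    "c": ["a", "b"], "d": ["b", "e"], "e": ["d"],
}
seeds = {"a": 1.0, "e": 0.0}             # observed P(y=1) for the labeled nodes

# Initialize unlabeled nodes at the prior, then iterate neighborhood averaging.
probs = {v: seeds.get(v, 0.5) for v in adj}
for _ in range(50):
    for v in adj:
        if v in seeds:
            continue                     # labeled nodes stay clamped
        probs[v] = sum(probs[u] for u in adj[v]) / len(adj[v])

print({v: round(p, 3) for v, p in probs.items()})
```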

8 Apply the model to make predictions collectively. Example: small-world graph, 30% labeled nodes, autocorrelation 0.5.
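
A sketch of the kind of synthetic setup described on this slide, under assumed parameters; the generator and the autocorrelation heuristic are my own illustration, not the authors' experimental code.

```python
# Assumed synthetic setup: a small-world graph with autocorrelated labels,
# of which only 30% are observed for collective inference.
import random
import networkx as nx

random.seed(0)
G = nx.watts_strogatz_graph(n=500, k=6, p=0.1)        # small-world graph

# Plant autocorrelated labels: start random, then nudge nodes toward the
# neighborhood majority (partial mixing ~ moderate autocorrelation).
labels = {v: random.randint(0, 1) for v in G}
for _ in range(3):
    for v in G:
        nbr_mean = sum(labels[u] for u in G.neighbors(v)) / max(G.degree(v), 1)
        if random.random() < 0.5:
            labels[v] = int(nbr_mean >= 0.5)

# Observe labels for ~30% of nodes; the rest are targets for inference.
observed = {v: labels[v] for v in G if random.random() < 0.3}
```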

9 Taxonomy of approaches:
Graph regularization / no learning: GRF (Zhu et al. 03); wvRN (Macskassy et al. 07). Graph regularization / add links: wvRN+ (Macskassy 07); GhostEdges (Gallagher et al. 08); SCRN (Wang et al. 13). Graph regularization / learn weights: GNetMine (Ji et al. 10); LNP (Shi et al. 11); LGCW (Dhurandhar et al. 13).
Probabilistic modeling / learn from labeled only: RMN (Taskar et al. 01); MLN (Richardson et al. 06); RDN (Neville et al. 06). Probabilistic modeling / add graph features: SocialDim (Tang et al. 09); RelationStr (Xiang et al. 10). Probabilistic modeling / semi-supervised learning: LBC (Lu et al. 03); PL-EM (Xiang et al. 08); CC-HLR (McDowell et al. 12); RDA (Pfeiffer et al. 14).

10 Probabilistic modeling: Learn a statistical relational model and use joint inference to make predictions. (Figure: a graphical model coupling each node's attributes X_i and label Y_i through the edges.) Estimate the joint distribution P(Y | {X}^n, G). Note: we often have only a single network for learning.

11 (Same taxonomy of approaches as slide 9.)

12 How does regularization compare to probabilistic modeling?

13 Regularization vs. probabilistic modeling (Zeno & Neville, MLG 16): weighted-vote relational neighbor vs. relational Bayes collective classifier. (Figure: AUC vs. network density, across levels of attribute correlation.) Finding: probabilistic modeling can exploit dependencies across a wider range of scenarios, and link density impacts performance.

14 How to learn from one partially labeled network?

15 Method 1: Ignore unlabeled nodes during learning. Drop the unlabeled nodes; training data = labeled part of the network. The model over labels given attributes and edges, P(Y | X, E, Θ), is defined via local conditionals; optimize the parameters using the pseudolikelihood
Θ̂ = argmax_Θ Σ_{v_i ∈ V_L} log P_Θ(y_i | Y_MB_L(v_i), x_i),
a summation over local log conditionals, where MB_L(v_i) is the labeled portion of v_i's Markov blanket.
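
A minimal sketch of this pseudolikelihood learning step, using a logistic regression as the local conditional over node attributes plus an aggregate of labeled-neighbor labels; the toy graph and features are made up for illustration.

```python
# Relational logistic regression trained only on labeled nodes, conditioning on
# each node's attributes plus its labeled Markov blanket (neighbor-label mean).
import numpy as np
from sklearn.linear_model import LogisticRegression

attrs  = {0: [1.0, 0.2], 1: [0.3, 0.9], 2: [0.8, 0.1], 3: [0.2, 0.7]}
labels = {0: 1, 1: 0, 2: 1}                 # node 3 is unlabeled and dropped
adj    = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}

def relational_features(v, label_map):
    """Attributes of v plus the mean label of v's labeled neighbors."""
    nbr_labels = [label_map[u] for u in adj[v] if u in label_map]
    agg = float(np.mean(nbr_labels)) if nbr_labels else 0.5
    return attrs[v] + [agg]

X = np.array([relational_features(v, labels) for v in labels])
y = np.array([labels[v] for v in labels])

# Maximizing the logistic log-likelihood of these per-node conditionals is a
# sum of local log conditionals, i.e., the pseudolikelihood objective.
model = LogisticRegression().fit(X, y)
```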

16 Method 1: Apply the learned model to the remainder. Test data = full network (but evaluate only on the unlabeled nodes). Labeled nodes seed the inference. Use approximate inference (e.g., variational, Gibbs) to collectively classify under P(Y | X, E, Θ): for unlabeled instances, iteratively estimate P_Θ(y_i | Y_MB(v_i), x_i).
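
A minimal sketch of the collective inference step: labeled nodes are clamped and unlabeled nodes are iteratively re-estimated from their Markov blanket. The `local_conditional` below is a hand-written stand-in for a learned P_Θ(y_i | Y_MB(v_i), x_i), and the toy graph is illustrative.

```python
# Iterative (ICA/Gibbs-flavored) collective classification seeded by labels.
import numpy as np

attrs  = {0: [1.0, 0.2], 1: [0.3, 0.9], 2: [0.8, 0.1], 3: [0.2, 0.7]}
labels = {0: 1, 1: 0, 2: 1}                       # node 3 is unlabeled
adj    = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}

def local_conditional(x, nbr_label_mean):
    """Stand-in for a learned conditional: logistic in x and the neighbor aggregate."""
    z = 0.5 * x[0] - 0.5 * x[1] + 2.0 * (nbr_label_mean - 0.5)
    return 1.0 / (1.0 + np.exp(-z))

# Seed every node: labeled nodes are clamped, unlabeled nodes start at the prior.
probs = {v: float(labels[v]) if v in labels else 0.5 for v in adj}

for _ in range(20):                               # iterative approximate inference
    for v in adj:
        if v in labels:
            continue
        nbr_mean = float(np.mean([probs[u] for u in adj[v]]))
        probs[v] = local_conditional(attrs[v], nbr_mean)
```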

17 Method 2: Semi-supervised learning. Use the entire network to jointly learn parameters and make inferences about the class labels of unlabeled nodes. Lu and Getoor (ICML 03) use relational features and ICA; McDowell and Aha (ICML 12) combine two classifiers with label regularization.

18 Semi-supervised relational learning: Relational Expectation Maximization (EM) (Xiang & Neville 08).
Expectation (E) step: predict labels with collective classification, estimating P_Θ(y_i | Y_MB(v_i), x_i) for the unlabeled nodes.
Maximization (M) step: Θ̂ = argmax_Θ Σ_{Y^U} P_Θ(Y^U) [summation over local log conditionals], summing over assignments Y^U to the unlabeled nodes, i.e., use the predicted probabilities during optimization (in the local conditionals).
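
A minimal sketch of the relational EM loop, not the exact PL-EM implementation: the M step retrains the local conditional with unlabeled nodes entering as both classes weighted by their predicted probabilities (an expected pseudolikelihood), and the E step reruns collective inference; the toy data is illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

attrs  = {0: [1.0, 0.2], 1: [0.3, 0.9], 2: [0.8, 0.1], 3: [0.2, 0.7], 4: [0.9, 0.3]}
labels = {0: 1, 1: 0, 2: 1}                           # nodes 3 and 4 are unlabeled
adj    = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 4], 4: [2, 3]}

def feats(v, probs):
    nbr = [probs[u] for u in adj[v]]
    return attrs[v] + [float(np.mean(nbr))]           # attributes + relational aggregate

probs = {v: float(labels.get(v, 0.5)) for v in adj}   # initialize predictions
for _ in range(10):                                   # EM iterations
    # M step: weighted retraining on labeled nodes plus soft-labeled unlabeled nodes.
    X, y, w = [], [], []
    for v in adj:
        if v in labels:
            X.append(feats(v, probs)); y.append(labels[v]); w.append(1.0)
        else:
            for c in (0, 1):
                X.append(feats(v, probs)); y.append(c)
                w.append(probs[v] if c == 1 else 1.0 - probs[v])
    model = LogisticRegression().fit(np.array(X), np.array(y), sample_weight=np.array(w))
    # E step: collective inference with the updated parameters.
    for _ in range(10):
        for v in adj:
            if v not in labels:
                probs[v] = model.predict_proba([feats(v, probs)])[0, 1]
```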

19 How does relational EM perform? It works well when the network has a moderate amount of labels. If the network is sparsely labeled, it is often better to use a model that is not learned (graph regularization outperforms relational EM). Why? In sparsely labeled networks, errors from collective classification compound during propagation; both learning and inference require approximation, and network structure impacts the errors.

20 Finding: Network structure can bias inference in partially-labeled networks; maximum entropy constraints correct for bias

21 Effect of relational biases on relational EM. We compared CL-EM and PL-EM and examined the distribution of predicted probabilities P(+) on a real-world dataset (Amazon co-occurrence network from SNAP; varied class priors, 10% labeled). (Figure: error of RML, CL-EM, and PL-EM relative to the actual P(+).) Over-propagation error during inference causes PL-EM to collapse to a single prediction, and it is worse on sparsely labeled datasets. We need a method that corrects this bias for any approach based on a local (relational) conditional.

22 Maximum entropy inference for PL-EM (Pfeiffer et al., WWW 15): a correction to inference (E step) that enables estimation with the pseudolikelihood (M step). Idea: the proportion of negatively predicted items should equal the proportion of negatively labeled items. Fix: shift the probabilities up/down. Repeat for each inference iteration:
transform probabilities to logit space: h_i = σ^{-1}(P(y_i = 1));
compute the offset location: k = P(0) * |V_U|;
adjust the logits: h_i = h_i - h_(k) (subtract the k-th order statistic);
transform back to probabilities: P(y_i) = σ(h_i).
(Figure: probabilities pivoted around P(y) = 0.5.) The corrected probabilities are used to retrain during PL-EM (M step).
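
A minimal sketch of this correction, assuming the negative class prior P(0) is known; variable names are illustrative.

```python
# Shift predicted logits so the fraction of unlabeled nodes predicted negative
# matches the negative class prior (maximum entropy correction to the E step).
import numpy as np

def maxent_correct(probs, neg_prior):
    """probs: predicted P(y=1) for the unlabeled nodes; neg_prior: P(y=0)."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-6, 1 - 1e-6)
    logits = np.log(p / (1 - p))                  # h_i = sigma^{-1}(P(y_i = 1))
    k = int(round(neg_prior * len(logits)))       # offset location k = P(0) * |V_U|
    pivot = np.sort(logits)[min(k, len(logits) - 1)]
    shifted = logits - pivot                      # h_i <- h_i - h_(k)
    return 1.0 / (1.0 + np.exp(-shifted))         # back to probabilities

# Example: over-propagated predictions collapsed toward the positive class.
corrected = maxent_correct([0.9, 0.8, 0.85, 0.7, 0.95], neg_prior=0.6)
```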

23 Experimental results - correction effects. (Figure: predicted P(+) on Amazon with a small and a large class prior, comparing Actual, RML, CL-EM, PL-EM (Naive), and PL-EM (MaxEntInf).) The max entropy correction removes the bias due to over-propagation in collective inference.

24 Experimental results - large patent dataset. (Figure: BAE vs. proportion labeled on the Computers and Organic patent categories, comparing LR, LR (EM), RLR, LP, CL-EM, PL-EM (Naive), and PL-EM (MaxEntInf).) The correction allows relational EM to improve over competing methods in sparsely labeled domains. Note: McDowell & Aha (ICML 12) may correct the same effect, but during estimation rather than inference.

25 Can neural networks improve predictions by further reducing bias?

26 Deep collective inference (Moore & Neville AAAI 17). Approach: use a neural network with the neighbors' attributes and class label predictions as inputs. Key ideas: represent the set of neighbors as a sequence, in random order; to deal with heterogeneous inputs (i.e., a varying number of neighbors), use a recurrent model (LSTM); to learn with a partially labeled network, use semi-supervised collective classification.

27 Example network. (Figure: nodes a-j; red = target node, blue = neighbors, grey = labeled, white = unlabeled.)

28 Example network (red = target node, blue = neighbors, grey = labeled, white = unlabeled). For target node d, each labeled neighbor contributes [f_j, y_j] and each unlabeled neighbor contributes [f_j, ŷ_j^(t_c-1)], e.g., [f_e, y_e], [f_i, y_i], [f_a, y_a], [f_b, ŷ_b^(t_c-1)], [f_f, ŷ_f^(t_c-1)]; the model outputs ŷ_d^(t_c).

29 Deep collective inference (DCI): Model description. For node v_i and current iteration t_c, the input is the node's features concatenated with its previous prediction, [f_i, ŷ_i^(t_c-1)], and each neighbor's features concatenated with its prediction or label, {[f_j, (y_j or ŷ_j^(t_c-1))] : v_j ∈ N_i}. The full input sequence for v_i is
x_i = [x_i^(0), x_i^(1), ..., x_i^(|N_i|)] = [[f_j1, y_j1], [f_j2, y_j2], ..., [f_i, ŷ_i^(t_c-1)]],
with the target node's own input last. Example (target node d from the previous slide):
x_d = [<f_b, ŷ_b^(t_c-1)>, <f_e, y_e>, <f_i, y_i>, <f_f, ŷ_f^(t_c-1)>, <f_a, y_a>, <f_d, ŷ_d^(t_c-1)>] = [x_d^(0), x_d^(1), x_d^(2), x_d^(3), x_d^(4), x_d^(5)]
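
A minimal sketch of how one such input sequence could be assembled for a target node; the features, labels, previous predictions, and neighbor list below are made up for illustration.

```python
# Build the DCI input sequence for one target node: each neighbor contributes
# [features, label-or-previous-prediction] in random order, and the target
# node's own [features, previous prediction] comes last.
import random

feats     = {"a": [0.1, 0.9], "b": [0.7, 0.2], "d": [0.4, 0.4],
             "e": [0.3, 0.8], "f": [0.6, 0.1], "i": [0.2, 0.5]}
labels    = {"a": 1, "e": 0, "i": 1}          # observed labels
prev_pred = {"b": 0.7, "d": 0.4, "f": 0.3}    # y-hat from iteration t_c - 1
neighbors = {"d": ["b", "e", "i", "f", "a"]}  # neighbors of target node d

def build_sequence(target):
    nbrs = list(neighbors[target])
    random.shuffle(nbrs)                      # random neighbor order each iteration
    seq = [feats[j] + [labels.get(j, prev_pred.get(j))] for j in nbrs]
    seq.append(feats[target] + [prev_pred[target]])   # target input ends the sequence
    return seq                                # shape: (|N_d| + 1, p + 1)

x_d = build_sequence("d")
```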

30 LSTM structure. (Figure: an LSTM with w hidden units processes the sequential inputs x^(0), x^(1), ..., x^(|N|); the target node's own input, with p features and the previous prediction, arrives at the end of the sequence, and the final hidden state produces ŷ.)
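
A minimal PyTorch sketch of this structure (a plausible stand-in, not the authors' released code): the per-node sequence is run through an LSTM and the final hidden state is mapped to ŷ.

```python
import torch
import torch.nn as nn

class DCISketch(nn.Module):
    def __init__(self, num_features, hidden_units):
        super().__init__()
        self.lstm = nn.LSTM(input_size=num_features + 1,   # features + label/prediction
                            hidden_size=hidden_units, batch_first=True)
        self.out = nn.Linear(hidden_units, 1)

    def forward(self, seq):
        # seq: (batch, |N_i| + 1, num_features + 1); the target node's input is last.
        _, (h_n, _) = self.lstm(seq)
        return torch.sigmoid(self.out(h_n[-1]))            # y-hat in [0, 1]

model = DCISketch(num_features=2, hidden_units=16)
y_hat = model(torch.randn(1, 6, 3))                        # one node with 5 neighbors
```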

31 Learning: key aspects. Initialize label predictions with a non-collective version of the model, Deep Relational Inference (DRI). Semi-supervised learning: estimate parameters until convergence, then perform collective inference to make predictions for all unlabeled nodes. Randomize the neighbor order on every iteration. Correct for imbalanced classes, either by balancing the objective function or by balancing the data with augmentation. Use backpropagation through time with early stopping and cross-entropy loss.

32 Evaluation on small-to-medium sized networks. (Figure: BAE on Amazon DVD with 21/79 and 50/50 class proportions, comparing LR, LP, LR+, RNCC, PLEM, PLEM+, and DCI; lower BAE is better.)

33 Evaluation on a large network (900K nodes). (Figure: BAE on Patents with a 17/83 class proportion, comparing PLEM, PLEM+, RNCC, and DCI; lower BAE is better.) Overall: 12% gain over PLEM+N2V and 20% gain over RNCC (a competing neural network).

34 How does network structure impact performance of learning and inference?

35 Impact of network structure on collective classification. Non-stationarity in network structure reduces accuracy (Angin & Neville 08). Propagation error during inference is mediated by local network structure (Xiang & Neville 11). Differences in network structure and label availability produce variance among node marginals, which impairs performance (Xiang & Neville 13). The structure of the rolled-out model can lead to inference instability in collective classification (Pfeiffer et al. 14). The common thread among these effects is a difference in graph distribution between the learning and inference settings, which increases error due to bias in learning and/or inference. Understanding the mechanisms that lead to error can help improve models and algorithms; e.g., neural networks may help reduce bias, but at the expense of increased variance.

36 Thanks
