Machine Learning


Machine Learning 10-701
Tom M. Mitchell
Machine Learning Department, Carnegie Mellon University
April 5, 2011

Today:
- Latent Dirichlet Allocation topic models
- Social network analysis based on latent probabilistic models
- Kernel regression

Readings:
- Kernels: Bishop Ch. 6.1; optional: Bishop Ch. 6.2, 6.3
- Kernel Methods for Pattern Analysis, Shawe-Taylor & Cristianini, Chapter 2

Supervised Dimensionality Reduction

- Neural nets: learn a hidden-layer representation, designed to optimize network prediction accuracy
- PCA: unsupervised, minimizes reconstruction error; but people sometimes use PCA to re-represent the original data before classification (to reduce dimension, to reduce overfitting)
- Fisher Linear Discriminant: like PCA, learns a linear projection of the data, but supervised: it uses labels to choose the projection

Fisher Linear Discriminant

A method for projecting data into a lower dimension, to hopefully improve classification. We'll consider the 2-class case. Project data onto the vector that connects the class means?
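As a point of comparison for the unsupervised baseline above, here is a minimal numpy sketch of PCA via the SVD, re-representing data with its top-k principal components. The function and variable names are my own illustration, not code from the lecture:

```python
import numpy as np

def pca_project(X, k):
    """Project rows of X onto the top-k principal components.

    Returns the k-dimensional codes and the reconstruction, whose
    squared error PCA minimizes among all rank-k linear projections.
    """
    mu = X.mean(axis=0)
    Xc = X - mu                      # center the data
    # Rows of Vt are principal directions (eigenvectors of Xc^T Xc)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                     # d x k projection matrix
    Z = Xc @ W                       # k-dimensional representation
    X_hat = Z @ W.T + mu             # reconstruction from k components
    return Z, X_hat

# Example: reduce 5-D data to 2-D before feeding it to a classifier
X = np.random.randn(100, 5)
Z, X_hat = pca_project(X, k=2)
print(np.mean((X - X_hat) ** 2))     # reconstruction error
```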

Fisher Linear Discriminant

Project data onto one dimension, to help classification.

Define class means: m_j = (1/N_j) sum of x over class j, for j = 1, 2.

Could choose w according to: the direction that maximizes the separation of the projected means, w ∝ (m_2 - m_1).

Instead, Fisher Linear Discriminant chooses:

w = argmax_w (w^T (m_2 - m_1))^2 / (w^T S_W w)

where S_W is the within-class scatter matrix; the solution is w ∝ S_W^{-1} (m_2 - m_1).

Summary: Fisher Linear Discriminant
- Choose an (n-1)-dimension projection for an n-class classification problem
- Use within-class covariances to determine the projection
- Minimizes a different error function than PCA (the projected within-class variances)
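A minimal two-class numpy sketch of this projection, following the formula above (the data and names are my own illustration, not code from the lecture):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher Linear Discriminant direction for two classes.

    X1, X2: arrays of shape (n_i, d), one example per row.
    Returns w proportional to S_W^{-1} (m2 - m1).
    """
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of per-class centered outer products
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_W, m2 - m1)
    return w / np.linalg.norm(w)

# Two Gaussian blobs; project both onto the 1-D Fisher direction
rng = np.random.default_rng(0)
X1 = rng.normal([0, 0], 1.0, size=(50, 2))
X2 = rng.normal([3, 1], 1.0, size=(50, 2))
w = fisher_direction(X1, X2)
print(X1 @ w, X2 @ w)   # 1-D projections used for classification
```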

Example topics induced from a large collection of text [Tennenbaum et al]:

- STORY STORIES TELL CHARACTER CHARACTERS AUTHOR READ TOLD SETTING TALES PLOT TELLING SHORT FICTION ACTION TRUE EVENTS TELLS TALE NOVEL
- MIND WORLD DREAM DREAMS THOUGHT IMAGINATION MOMENT THOUGHTS OWN REAL LIFE IMAGINE SENSE CONSCIOUSNESS STRANGE FEELING WHOLE BEING MIGHT HOPE
- WATER FISH SEA SWIM SWIMMING POOL LIKE SHELL SHARK TANK SHELLS SHARKS DIVING DOLPHINS SWAM LONG SEAL DIVE DOLPHIN UNDERWATER
- DISEASE BACTERIA DISEASES GERMS FEVER CAUSE CAUSED SPREAD VIRUSES INFECTION VIRUS MICROORGANISMS PERSON INFECTIOUS COMMON CAUSING SMALLPOX BODY INFECTIONS CERTAIN
- MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES IRON COMPASS LINES CORE ELECTRIC DIRECTION FORCE MAGNETS BE MAGNETISM POLE INDUCED
- SCIENCE STUDY SCIENTISTS SCIENTIFIC KNOWLEDGE WORK RESEARCH CHEMISTRY TECHNOLOGY MANY MATHEMATICS BIOLOGY PHYSICS LABORATORY STUDIES WORLD SCIENTIST STUDYING SCIENCES
- BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY PLAYER BASKETBALL COACH PLAYED PLAYING HIT TENNIS TEAMS GAMES SPORTS BAT TERRY
- JOB WORK JOBS CAREER EXPERIENCE EMPLOYMENT OPPORTUNITIES WORKING TRAINING SKILLS CAREERS POSITIONS FIND POSITION OCCUPATIONS REQUIRE OPPORTUNITY EARN ABLE

What about Probabilistic Approaches? Supervised? Unsupervised?

Plate Notation

[Figure: plate-notation diagram for the topic model; only the slide title survives the transcription.]

Latent Dirichlet Allocation Model

Clustering words into topics with Latent Dirichlet Allocation [Blei, Ng, Jordan 2003].

Probabilistic model for a document set. For each of the D documents:
1. Pick θ_d ~ P(θ | α) to define P(z | θ_d)
2. For each of the N_d words w_n:
   - Pick topic z_n ~ P(z | θ_d)
   - Pick word w_n ~ P(w | z_n, φ)

Training this model defines the topics (i.e., φ, which defines P(w | z)).

Also extended to the case where the number of topics is not known in advance (hierarchical Dirichlet processes [Blei et al., 2004]).
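To make the generative process concrete, here is a minimal numpy sketch that samples a toy corpus from the model above. The sizes K, V, D, the Dirichlet parameters, and the Poisson document length are illustrative values I chose, not details from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, D = 3, 50, 5            # topics, vocabulary size, documents
alpha = np.full(K, 0.1)       # Dirichlet prior over topic proportions
beta = np.full(V, 0.01)       # Dirichlet prior over per-topic word distributions

# phi[z] is P(w | z): one distribution over the vocabulary per topic
phi = rng.dirichlet(beta, size=K)

corpus = []
for d in range(D):
    theta_d = rng.dirichlet(alpha)                # 1. theta_d ~ P(theta | alpha)
    N_d = rng.poisson(20)                         # document length (toy choice)
    z = rng.choice(K, size=N_d, p=theta_d)        # 2a. z_n ~ P(z | theta_d)
    words = [rng.choice(V, p=phi[z_n]) for z_n in z]  # 2b. w_n ~ P(w | z_n, phi)
    corpus.append(words)

print(corpus[0])   # word ids of the first sampled document
```

Training runs this process in reverse: given only the observed words, inference recovers φ and the per-document θ_d.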

Example topics induced from a large collection of text [Tennenbaum et al]

(The same eight example topics as on the earlier slide, shown again.)

Significance:
- Learned topics reveal implicit semantic categories of words within the documents
- In many cases, we can represent documents with 10^2 topics instead of 10^5 words
- Especially important for short documents (e.g., emails). Topics overlap when words don't!

Analyzing topic distributions in email

Author-Recipient-Topic model for Email

- Latent Dirichlet Allocation (LDA) [Blei, Ng, Jordan, 2003]
- Author-Recipient-Topic (ART) [McCallum, Corrada, Wang, 2005]

Enron Email Corpus
- 250k email messages
- 23k people

Example message:

Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT)
From: debra.perlingiere@enron.com
To: steve.hooser@enron.com
Subject: Enron/TransAlta Contract dated Jan 1, 2001

Please see below. Katalin Kiss of TransAlta has requested an electronic copy of our final draft? Are you OK with this? If so, the only version I have is the original draft without revisions. DP

Debra Perlingiere
Enron North America Corp., Legal Department
1400 Smith Street, EB 3885
Houston, Texas 77002
dperlin@enron.com
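ART conditions each message's topic distribution on its (author, recipient) pair rather than on the document alone. A rough numpy sketch of that generative step, analogous to the LDA sketch above; the uniform recipient choice and all sizes reflect my reading of the ART paper, not details given on the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
K, V, P = 10, 1000, 23        # topics, vocabulary size, people
alpha, beta = 0.1, 0.01

phi = rng.dirichlet(np.full(V, beta), size=K)           # P(w | z)
# One topic distribution per (author, recipient) pair
theta = rng.dirichlet(np.full(K, alpha), size=(P, P))

def generate_message(author, recipients, n_words):
    words = []
    for _ in range(n_words):
        r = recipients[rng.integers(len(recipients))]   # pick a recipient uniformly
        z = rng.choice(K, p=theta[author, r])           # topic depends on (author, recipient)
        words.append(rng.choice(V, p=phi[z]))
    return words

print(generate_message(author=0, recipients=[3, 7], n_words=12))
```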

Topics, and prominent sender/receivers discovered by ART [McCallum et al, 2005]

[Figure: for each discovered topic, the top words within the topic and the top author-recipient pairs exhibiting it.]

- Beck = Chief Operations Officer
- Dasovich = Government Relations Executive
- Shapiro = Vice President of Regulatory Affairs
- Steffes = Vice President of Government Affairs

Discovering Role Similarity

connection strength (A, B) =
- Traditional SNA: similarity in the recipients they sent email to
- ART: similarity in authored topics, conditioned on recipient

Tracy Geaconne vs. Dan McCarty:
- Traditional SNA: similar (they send email to the same individuals)
- ART: different (they discuss different topics)
- Geaconne = Secretary; McCarty = Vice President
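A toy illustration of how the two notions of connection strength can disagree, as they do for Geaconne and McCarty. Cosine similarity and the per-person summary vectors are my own simplification; the slides do not specify the exact measures:

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Traditional SNA: compare who each person sends email to
# (counts of messages sent to each of 5 colleagues)
recipients_A = np.array([10, 0, 5, 0, 1])
recipients_B = np.array([9, 1, 6, 0, 0])
print("SNA similarity:", cosine(recipients_A, recipients_B))   # high

# ART: compare the topics each person writes about
# (each person's distribution over 4 topics)
topics_A = np.array([0.7, 0.1, 0.1, 0.1])
topics_B = np.array([0.1, 0.1, 0.1, 0.7])
print("ART similarity:", cosine(topics_A, topics_B))           # low
```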

Discovering Role Similarity

Lynn Blair vs. Kimberly Watson:
- Traditional SNA: different (they send to different individuals)
- ART: similar (they discuss the same topics)
- Blair = gas pipeline logistics; Watson = pipeline facilities planning

What you should know

- Unsupervised dimension reduction using all features:
  - Principal Components Analysis: minimize reconstruction error
  - Singular Value Decomposition: efficient PCA
  - Independent components analysis
  - Canonical correlation analysis
  - Probabilistic models with latent variables
- Supervised dimension reduction:
  - Fisher Linear Discriminant: project to n-1 dimensions to discriminate n classes
  - Hidden layers of neural networks: most flexible, but local-minima issues
- LOTS of ways of combining discovery of latent features with classification tasks

Kernel Functions

Kernel functions provide a way to manipulate data as though it were projected into a higher-dimensional space, by operating on it in its original space. This leads to efficient algorithms, and is a key component of algorithms such as:
- Support Vector Machines
- kernel PCA
- kernel CCA
- kernel regression

Linear Regression

Wish to learn f: X → Y, where X = <X_1, ..., X_n> and Y is real-valued.

Learn f(x) = w^T x, where w minimizes the squared training error:

w* = argmin_w Σ_l (w^T x^l - y^l)^2

In matrix form, w* = argmin_w (Xw - y)^T (Xw - y), where the l-th row of X is the l-th training example x^l (transposed), and y is the vector of training labels.

Solve by taking the derivative w.r.t. w and setting it to zero, so:

w* = (X^T X)^{-1} X^T y
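A minimal numpy sketch of this closed-form solution, together with the dual form that connects it back to the kernel-functions slide: in the dual, only inner products between examples appear, so a kernel k(x, x') can replace <x, x'> without ever forming the high-dimensional features. The tiny ridge term lam is my addition for numerical stability, not part of the derivation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # l-th row is training example x^l
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)

lam = 1e-8                                 # tiny ridge term for stability (my addition)

# Primal: w = (X^T X)^{-1} X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Dual: f(x) = sum_l a_l <x^l, x>, with a = (X X^T + lam I)^{-1} y.
K = X @ X.T                                # Gram matrix of pairwise inner products
a = np.linalg.solve(K + lam * np.eye(100), y)

x_new = np.array([1.0, 1.0, 1.0])
f_primal = w @ x_new
f_dual = a @ (X @ x_new)                   # sum_l a_l <x^l, x_new>
print(f_primal, f_dual)                    # agree (up to the tiny ridge term)
```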