CS249: ADVANCED DATA MINING


1 CS249: ADVANCED DATA MINING Vector Data: Clustering: Part II Instructor: Yizhou Sun May 3, 2017

2 Methods to Learn: Last Lecture (course overview matrix: rows are data types, columns are tasks)
Vector Data: Classification (Decision Tree; Naïve Bayes; Logistic Regression; SVM; NN), Clustering (K-means; hierarchical clustering; DBSCAN; Mixture Models; kernel k-means), Prediction (Linear Regression; GLM).
Text Data: Clustering (PLSA; LDA), Feature Representation (Word embedding).
Recommender System: Matrix Factorization; Collaborative Filtering.
Graph & Network: Classification (Label Propagation), Clustering (SCAN; Spectral Clustering), Ranking (PageRank), Feature Representation (Network embedding).

3 Methods to Learn (the same overview matrix as the previous slide, repeated to highlight this lecture's topic: clustering of vector data)

4 Vector Data: Clustering: Part 2 (outline)
Revisit K-means
Kernel K-means
Mixture Model and EM algorithm
Summary

5 Recall K-Means
Objective function: $J = \sum_{j=1}^{k} \sum_{C(i)=j} \|x_i - c_j\|^2$ (the total within-cluster variance).
Re-arrange the objective function: $J = \sum_{j=1}^{k} \sum_i w_{ij} \|x_i - c_j\|^2$, where $w_{ij} \in \{0,1\}$; $w_{ij} = 1$ if $x_i$ belongs to cluster $j$, and $w_{ij} = 0$ otherwise.
Looking for: the best assignment $w_{ij}$ and the best centers $c_j$.

6 Solution of K-Means: Iterations
$J = \sum_{j=1}^{k} \sum_i w_{ij} \|x_i - c_j\|^2$
Step 1: Fix the centers $c_j$ and find the assignment $w_{ij}$ that minimizes $J$: set $w_{ij} = 1$ iff $\|x_i - c_j\|^2$ is the smallest over all clusters $j$.
Step 2: Fix the assignment $w_{ij}$ and find the centers that minimize $J$: set the first derivative of $J$ to 0, $\partial J / \partial c_j = -2 \sum_i w_{ij}(x_i - c_j) = 0$, which gives $c_j = \sum_i w_{ij} x_i / \sum_i w_{ij}$.
Note: $\sum_i w_{ij}$ is the total number of objects in cluster $j$, so $c_j$ is simply the mean of the points assigned to cluster $j$.
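
These two alternating steps map directly to code. Below is a minimal NumPy sketch of the iteration (my own illustration, not the course's code; the function name and arguments are hypothetical):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means: alternate assignment (Step 1) and center update (Step 2)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    for _ in range(n_iter):
        # Step 1: assign each point to its closest center (w_ij = 1 for the argmin j)
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (n, k)
        labels = dists.argmin(axis=1)
        # Step 2: recompute each center as the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):  # J can no longer decrease
            break
        centers = new_centers
    return labels, centers
```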

7-11 (Figure-only slides: step-by-step illustration of the K-means iterations; the images are not captured in this transcription.)

12 Converges! Why? (Each of the two steps either decreases the objective J or leaves it unchanged, and J is bounded below by 0, so the algorithm converges, though only to a local optimum.)

13 Limitations of K-Means: K-means has problems when clusters have different sizes and densities, or non-spherical shapes.

14 Limitations of K-Means: Different Sizes and Density 14

15 Limitations of K-Means: Non-Spherical Shapes 15

16 Demo: ...ster/applet/code/cluster.html (URL truncated in the transcription)

17 Connections of K-Means to Other Methods (diagram): K-means is closely related to both kernel K-means and the Gaussian mixture model.

18 Vector Data: Clustering: Part 2 (outline): Revisit K-means; Kernel K-means (next); Mixture Model and EM algorithm; Summary

19 Kernel K-Means
How to cluster the following data? (figure)
Use a non-linear map $\phi: R^n \to F$ to map each data point into a higher (possibly infinite) dimensional space: $x \to \phi(x)$.
Dot-product (kernel) matrix $K$: $K_{ij} = \langle \phi(x_i), \phi(x_j) \rangle$.

20 Typical Kernel Functions: recall kernel SVM (the table of kernel functions is not captured in this transcription).

21 Solution of Kernel K-Means
Objective function under the new feature space: $J = \sum_{j=1}^{k} \sum_i w_{ij} \|\phi(x_i) - c_j\|^2$
Algorithm:
By fixing the assignment $w_{ij}$: $c_j = \sum_i w_{ij} \phi(x_i) / \sum_i w_{ij}$
In the assignment step, assign each data point to the closest center:
$d(x_i, c_j) = \left\|\phi(x_i) - \frac{\sum_{i'} w_{i'j} \phi(x_{i'})}{\sum_{i'} w_{i'j}}\right\|^2 = \phi(x_i)\!\cdot\!\phi(x_i) - \frac{2 \sum_{i'} w_{i'j}\, \phi(x_i)\!\cdot\!\phi(x_{i'})}{\sum_{i'} w_{i'j}} + \frac{\sum_{i'} \sum_{l} w_{i'j} w_{lj}\, \phi(x_{i'})\!\cdot\!\phi(x_l)}{\left(\sum_{i'} w_{i'j}\right)^2}$
We never need $\phi(x)$ explicitly, only the kernel entries $K_{ij}$.
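
Since the distance to each center involves only inner products, the whole algorithm can be run from the kernel matrix alone. A sketch under that assumption, with an RBF kernel purely as an illustrative choice (my own code, not the lecture's demo):

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """K_ij = <phi(x_i), phi(x_j)> for the Gaussian (RBF) feature map."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

def kernel_kmeans(K, k, n_iter=50, seed=0):
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=n)          # random initial assignment
    for _ in range(n_iter):
        dist = np.zeros((n, k))
        for j in range(k):
            mask = (labels == j)
            nj = mask.sum()
            if nj == 0:
                dist[:, j] = np.inf
                continue
            # ||phi(x_i) - c_j||^2 = K_ii - 2*sum_{l in j} K_il / n_j
            #                        + sum_{l,l' in j} K_ll' / n_j^2
            dist[:, j] = (np.diag(K)
                          - 2.0 * K[:, mask].sum(axis=1) / nj
                          + K[np.ix_(mask, mask)].sum() / nj ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```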

22 Advantages and Disadvantages of Kernel K-Means
Advantages: the algorithm can identify non-linear cluster structures.
Disadvantages: the number of clusters must be predefined; the algorithm is more complex and its time complexity is high.
References:
Kernel k-means and Spectral Clustering, by Max Welling.
Kernel k-means, Spectral Clustering and Normalized Cut, by Inderjit S. Dhillon, Yuqiang Guan and Brian Kulis.
An Introduction to Kernel Methods, by Colin Campbell.

23 Vector Data: Clustering: Part 2 (outline): Revisit K-means; Kernel K-means; Mixture Model and EM algorithm (next); Summary

24 Fuzzy Set and Fuzzy Cluster
The clustering methods discussed so far assign every data object to exactly one cluster.
Some applications need fuzzy or soft cluster assignment. Ex.: an e-game could belong to both entertainment and software.
Methods: fuzzy clusters and probabilistic model-based clusters.
Fuzzy cluster: a fuzzy set S is defined by a membership function $F_S: X \to [0, 1]$ (values between 0 and 1).

25 Mixture Model-Based Clustering
A set C of k probabilistic clusters $C_1, \dots, C_k$, with probability density functions $f_1, \dots, f_k$ and cluster prior probabilities $w_1, \dots, w_k$, where $\sum_j w_j = 1$.
Joint probability of an object $x_i$ and its cluster $C_j$: $P(x_i, z_i = C_j) = w_j f_j(x_i)$
Probability of $x_i$: $P(x_i) = \sum_j w_j f_j(x_i)$

26 Maximum Likelihood Estimation
Since objects are assumed to be generated independently, for a data set $D = \{x_1, \dots, x_n\}$ we have:
$P(D) = \prod_i P(x_i) = \prod_i \sum_j w_j f_j(x_i)$
$\log P(D) = \sum_i \log P(x_i) = \sum_i \log \sum_j w_j f_j(x_i)$
Task: find a set C of k probabilistic clusters such that $P(D)$ is maximized.
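
In practice the inner sum of densities can underflow, so the log-likelihood is usually evaluated with the log-sum-exp trick. A small sketch I am adding (not part of the slides), taking the component densities to be 1-d Gaussians as in the GMM slides that follow:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def gmm_log_likelihood(x, w, mu, sigma):
    """log P(D) = sum_i log sum_j w_j * N(x_i | mu_j, sigma_j^2), computed stably."""
    x = np.asarray(x)[:, None]                                   # shape (n, 1)
    log_comp = np.log(w) + norm.logpdf(x, loc=mu, scale=sigma)   # shape (n, k)
    return logsumexp(log_comp, axis=1).sum()
```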

27 The EM (Expectation-Maximization) Algorithm
The EM algorithm: a framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models.
E-step: assign objects to clusters according to the current fuzzy clustering or current parameters of the probabilistic clusters:
$w_{ij}^{t+1} = p(z_i = j \mid \theta_j^t, x_i) \propto p(x_i \mid z_i = j, \theta_j^t)\, p(z_i = j)$
M-step: find the new clustering or parameters that maximize the expected likelihood, with respect to the conditional distribution $p(z_i = j \mid \theta_j^t, x_i)$:
$\theta^{t+1} = \arg\max_{\theta} \sum_i \sum_j w_{ij}^{t+1} \log L(x_i, z_i = j \mid \theta)$
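
The E/M alternation itself is model-independent; only the two steps change from model to model. A bare skeleton of the loop (my own sketch; `e_step` and `m_step` are placeholders for the model-specific updates, such as the GMM formulas on the following slides):

```python
def em(data, theta0, e_step, m_step, n_iter=100):
    """Generic EM loop.
    e_step(data, theta) returns the soft assignments w_ij = p(z_i = j | x_i, theta);
    m_step(data, w) returns the parameters maximizing the expected log-likelihood."""
    theta = theta0
    for _ in range(n_iter):
        w = e_step(data, theta)    # E-step
        theta = m_step(data, w)    # M-step
    return theta
```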

28 Gaussian Mixture Model
Generative model. For each object:
Pick its distribution component: $Z \sim \mathrm{Multinomial}(w_1, \dots, w_k)$
Sample a value from the selected distribution: $X \sim N(\mu_Z, \sigma_Z^2)$
Overall likelihood function: $L(D \mid \theta) = \prod_i \sum_j w_j\, p(x_i \mid \mu_j, \sigma_j^2)$, subject to $\sum_j w_j = 1$ and $w_j \ge 0$.
Q: What is $\theta$ here? (The set of all component parameters: $\theta = \{w_j, \mu_j, \sigma_j^2\}_{j=1}^{k}$.)

29 Estimating Parameters
$L(D; \theta) = \sum_i \log \sum_j w_j\, p(x_i \mid \mu_j, \sigma_j^2)$
Considering the first derivative with respect to $\mu_j$:
$\frac{\partial L}{\partial \mu_j} = \sum_i \frac{w_j}{\sum_{j'} w_{j'}\, p(x_i \mid \mu_{j'}, \sigma_{j'}^2)} \frac{\partial p(x_i \mid \mu_j, \sigma_j^2)}{\partial \mu_j} = \sum_i \frac{w_j\, p(x_i \mid \mu_j, \sigma_j^2)}{\sum_{j'} w_{j'}\, p(x_i \mid \mu_{j'}, \sigma_{j'}^2)} \cdot \frac{1}{p(x_i \mid \mu_j, \sigma_j^2)} \frac{\partial p(x_i \mid \mu_j, \sigma_j^2)}{\partial \mu_j}$
The first factor is $w_{ij} = P(Z = j \mid X = x_i, \theta)$, and the remaining factor is $\partial \log p(x_i \mid \mu_j, \sigma_j^2) / \partial \mu_j$.
This looks like weighted likelihood estimation, but the weights are themselves determined by the parameters!

30 Apply the EM Algorithm: 1-d
An iterative algorithm (at iteration t+1):
E (expectation) step: evaluate the weights $w_{ij}$ when $\mu_j, \sigma_j, w_j$ are given:
$w_{ij}^{t+1} = \frac{w_j^t\, p(x_i \mid \mu_j^t, (\sigma_j^2)^t)}{\sum_{j'} w_{j'}^t\, p(x_i \mid \mu_{j'}^t, (\sigma_{j'}^2)^t)}$
M (maximization) step: evaluate $\mu_j, \sigma_j, w_j$ given the $w_{ij}$'s so as to maximize the weighted likelihood. This is equivalent to Gaussian parameter estimation where each point carries a weight for each component:
$\mu_j^{t+1} = \frac{\sum_i w_{ij}^{t+1} x_i}{\sum_i w_{ij}^{t+1}}; \quad (\sigma_j^2)^{t+1} = \frac{\sum_i w_{ij}^{t+1} (x_i - \mu_j^{t+1})^2}{\sum_i w_{ij}^{t+1}}; \quad w_j^{t+1} \propto \sum_i w_{ij}^{t+1}$
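
Plugging these 1-d formulas into code gives a short EM loop. A minimal NumPy sketch (my own illustration; initialization and stopping criteria are deliberately simplified):

```python
import numpy as np
from scipy.stats import norm

def gmm_em_1d(x, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    w = np.full(k, 1.0 / k)                        # mixing weights w_j
    mu = rng.choice(x, size=k, replace=False)      # initialize means at random points
    var = np.full(k, x.var())                      # initialize variances
    for _ in range(n_iter):
        # E-step: w_ij proportional to w_j * N(x_i | mu_j, var_j)
        resp = w * norm.pdf(x[:, None], loc=mu, scale=np.sqrt(var))   # (n, k)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted mean, weighted variance, and updated mixing weights
        nj = resp.sum(axis=0)                      # effective cluster sizes
        mu = (resp * x[:, None]).sum(axis=0) / nj
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nj
        w = nj / n
    return w, mu, var
```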

31 Example: 1-D GMM 31

32 2-d Gaussian
Bivariate Gaussian distribution for a two-dimensional random variable $X = (X_1, X_2)^T$:
$X \sim N\!\left(\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},\ \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma(X_1, X_2) \\ \sigma(X_1, X_2) & \sigma_2^2 \end{pmatrix}\right)$
$\mu_1$ and $\mu_2$ are the means of $X_1$ and $X_2$; $\sigma_1$ and $\sigma_2$ are their standard deviations; $\sigma(X_1, X_2)$ is the covariance between $X_1$ and $X_2$, i.e., $\sigma(X_1, X_2) = E[(X_1 - \mu_1)(X_2 - \mu_2)]$.

33 Apply the EM Algorithm: 2-d
An iterative algorithm (at iteration t+1):
E (expectation) step: evaluate the weights $w_{ij}$ when $\mu_j, \Sigma_j, w_j$ are given:
$w_{ij}^{t+1} = \frac{w_j^t\, p(x_i \mid \mu_j^t, \Sigma_j^t)}{\sum_{j'} w_{j'}^t\, p(x_i \mid \mu_{j'}^t, \Sigma_{j'}^t)}$
M (maximization) step: evaluate $\mu_j, \Sigma_j, w_j$ given the $w_{ij}$'s so as to maximize the weighted likelihood (Gaussian parameter estimation with per-point weights):
$\mu_j^{t+1} = \frac{\sum_i w_{ij}^{t+1} x_i}{\sum_i w_{ij}^{t+1}}; \quad (\sigma_{j,1}^2)^{t+1} = \frac{\sum_i w_{ij}^{t+1} (x_{i,1} - \mu_{j,1}^{t+1})^2}{\sum_i w_{ij}^{t+1}}; \quad (\sigma_{j,2}^2)^{t+1} = \frac{\sum_i w_{ij}^{t+1} (x_{i,2} - \mu_{j,2}^{t+1})^2}{\sum_i w_{ij}^{t+1}};$
$\sigma(X_1, X_2)_j^{t+1} = \frac{\sum_i w_{ij}^{t+1} (x_{i,1} - \mu_{j,1}^{t+1})(x_{i,2} - \mu_{j,2}^{t+1})}{\sum_i w_{ij}^{t+1}}; \quad w_j^{t+1} \propto \sum_i w_{ij}^{t+1}$
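
Moving to two (or more) dimensions only changes the component density and the covariance update. A sketch using SciPy's multivariate normal (again my own illustration of the updates above; a small ridge is added to keep the covariances non-singular):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.full(k, 1.0 / k)
    mu = X[rng.choice(n, size=k, replace=False)]           # (k, d) initial means
    cov = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * k)   # (k, d, d) initial covariances
    for _ in range(n_iter):
        # E-step: responsibilities w_ij
        resp = np.column_stack([w[j] * multivariate_normal.pdf(X, mu[j], cov[j])
                                for j in range(k)])
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted means, covariances, and mixing weights
        nj = resp.sum(axis=0)
        mu = (resp.T @ X) / nj[:, None]
        for j in range(k):
            diff = X - mu[j]
            cov[j] = (resp[:, j, None] * diff).T @ diff / nj[j] + 1e-6 * np.eye(d)
        w = nj / n
    return w, mu, cov
```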

34 K-Means: A Special Case of the Gaussian Mixture Model
When each Gaussian component has covariance matrix $\sigma^2 I$ (soft K-means):
$w_{ij} \propto w_j\, p(x_i \mid \mu_j, \sigma^2) = w_j \exp\!\left(-\frac{\|x_i - \mu_j\|^2}{2\sigma^2}\right)$  (the exponent is a scaled squared distance!)
When $\sigma^2 \to 0$, the soft assignment becomes a hard assignment: $w_{ij} \to 1$ iff $x_i$ is closest to $\mu_j$ (why? because the exponential term for the closest mean dominates every other component as $\sigma^2 \to 0$).
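
The hardening of the soft assignment as $\sigma^2 \to 0$ is easy to check numerically. A tiny sketch with made-up example values (not from the slides):

```python
import numpy as np

x = np.array([0.0, 0.0])                  # one data point
mus = np.array([[1.0, 0.0], [3.0, 0.0]])  # two equal-weight spherical components

for sigma2 in [10.0, 1.0, 0.1, 0.01]:
    # soft assignment: w_ij proportional to exp(-||x - mu_j||^2 / (2 sigma^2))
    logits = -((x - mus) ** 2).sum(axis=1) / (2 * sigma2)
    w = np.exp(logits - logits.max())     # subtract the max for numerical stability
    w /= w.sum()
    print(sigma2, w)                      # w -> [1, 0]: hard assignment to the closer mean
```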

35 Why Does EM Work?
E-step: compute a tight lower bound $L$ of the original objective function $\ell$ at $\theta^{old}$.
M-step: find $\theta^{new}$ that maximizes the lower bound.
Then $\ell(\theta^{new}) \ge L(\theta^{new}) \ge L(\theta^{old}) = \ell(\theta^{old})$, so the likelihood never decreases.

36 How to Find a Tight Lower Bound?
$q(h)$: the key to the tight lower bound we want to get (via Jensen's inequality).
When does '=' hold, so that the lower bound is tight? When $q(h) = p(h \mid d, \theta)$ (why?): this choice gives the tight lower bound.
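
The derivation behind this slide (largely lost in the transcription) is the standard Jensen-inequality argument; a sketch in the slide's notation, with hidden variable $h$ and observed data $d$:

```latex
\log p(d \mid \theta)
  = \log \sum_h p(d, h \mid \theta)
  = \log \sum_h q(h)\,\frac{p(d, h \mid \theta)}{q(h)}
  \;\ge\; \sum_h q(h)\,\log \frac{p(d, h \mid \theta)}{q(h)}
  \quad \text{(Jensen's inequality, since $\log$ is concave).}
```

Equality holds iff the ratio $p(d, h \mid \theta) / q(h)$ is constant in $h$, i.e., $q(h) \propto p(d, h \mid \theta)$, which after normalization gives $q(h) = p(h \mid d, \theta)$: evaluating the bound with this choice at the current parameters makes it touch $\ell(\theta)$, which is exactly the E-step.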

37 In the GMM Case
$L(D; \theta) = \sum_i \log \sum_j w_j\, p(x_i \mid \mu_j, \sigma_j^2) \ge \sum_i \sum_j w_{ij} \left( \log\big(w_j\, p(x_i \mid \mu_j, \sigma_j^2)\big) - \log w_{ij} \right)$
The term $\log\big(w_j\, p(x_i \mid \mu_j, \sigma_j^2)\big)$ is $\log L(x_i, z_i = j \mid \theta)$; the term $-\log w_{ij}$ does not involve $\theta$ and can be dropped in the M-step.

38 Advantages and Disadvantages of GMM
Strengths:
Mixture models are more general than partitioning: clusters can have different densities and sizes.
Clusters can be characterized by a small number of parameters.
The results may satisfy the statistical assumptions of the generative model.
Weaknesses:
Converges to a local optimum (mitigation: run multiple times with random initialization).
Computationally expensive if the number of distributions is large or the data set contains very few observed data points.
Hard to estimate the number of clusters.
Can only deal with spherical clusters.

39 Vector Data: Clustering: Part 2 (outline): Revisit K-means; Kernel K-means; Mixture Model and EM algorithm; Summary (next)

40 Summary
Revisit k-means: objective function; derivation of the solution.
Kernel k-means: objective function; solution; connection to k-means.
Mixture models: Gaussian mixture model; multinomial mixture model; EM algorithm; connection to k-means.
