CS249: ADVANCED DATA MINING
1 CS249: ADVANCED DATA MINING Vector Data: Clustering: Part II Instructor: Yizhou Sun May 3, 2017
2-3 Methods to Learn (recap of the course map)
- Vector Data — Classification: Decision Tree; Naïve Bayes; Logistic Regression; SVM; NN. Clustering: K-means; hierarchical clustering; DBSCAN; Mixture Models; kernel k-means. Prediction: Linear Regression; GLM.
- Text Data — Clustering: PLSA; LDA. Feature Representation: Word embedding.
- Recommender System — Prediction: Matrix Factorization; Collaborative Filtering.
- Graph & Network — Classification: Label Propagation. Clustering: SCAN; Spectral Clustering. Ranking: PageRank. Feature Representation: Network embedding.
4 Vector Data: Clustering: Part 2
- Revisit K-means
- Kernel K-means
- Mixture Model and EM algorithm
- Summary
5 Recall K-Means
Objective function: $J = \sum_{j=1}^{k} \sum_{i: C(i)=j} \|x_i - c_j\|^2$ — the total within-cluster variance.
Re-arrange the objective function:
$J = \sum_{j=1}^{k} \sum_i w_{ij} \|x_i - c_j\|^2$, where $w_{ij} \in \{0, 1\}$; $w_{ij} = 1$ if $x_i$ belongs to cluster $j$, and $w_{ij} = 0$ otherwise.
Looking for: the best assignment $w_{ij}$ and the best centers $c_j$.
6 Solution of K-Means: Iterations
$J = \sum_{j=1}^{k} \sum_i w_{ij} \|x_i - c_j\|^2$
Step 1: Fix centers $c_j$, find the assignment $w_{ij}$ that minimizes $J$ => $w_{ij} = 1$ iff $\|x_i - c_j\|^2$ is the smallest.
Step 2: Fix assignment $w_{ij}$, find centers that minimize $J$ => set the first derivative of $J$ to 0: $\partial J / \partial c_j = -2 \sum_i w_{ij} (x_i - c_j) = 0 \Rightarrow c_j = \frac{\sum_i w_{ij} x_i}{\sum_i w_{ij}}$
Note: $\sum_i w_{ij}$ is the total number of objects in cluster $j$.
7-11 (Figures: K-means iterations on a 2-d example — initialize centers, assign points, re-compute centers, repeat)
12 Converges! Why? (Each step can only decrease $J$, $J \ge 0$, and there are only finitely many possible assignments.)
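The two alternating steps can be sketched in plain Python (a minimal sketch; the function names and the toy data below are illustrative, not from the slides). It records the objective $J$ after each assignment step, which makes the monotone decrease behind the convergence argument visible:

```python
import random

def dist2(p, c):
    """Squared Euclidean distance between two points (lists of floats)."""
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c))

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    history = []  # objective J recorded after each assignment step
    assign = [0] * len(points)
    for _ in range(iters):
        # Step 1: fix centers c_j, pick w_ij minimizing J => assign to closest center
        assign = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        history.append(sum(dist2(p, centers[a]) for p, a in zip(points, assign)))
        # Step 2: fix assignment w_ij, set dJ/dc_j = 0 => c_j = mean of cluster j
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = [sum(coords) / len(members) for coords in zip(*members)]
    return centers, assign, history

centers, assign, history = kmeans([[0.0], [0.1], [0.2], [9.9], [10.0], [10.2]], 2)
```

On this toy data the centers converge to the two blob means, and `history` is non-increasing, which is exactly why the iterations must converge.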
13 Limitations of K-Means
K-means has problems when clusters have different sizes or densities, or non-spherical shapes.
14 Limitations of K-Means: Different Sizes and Density 14
15 Limitations of K-Means: Non-Spherical Shapes 15
16 Demo ster/applet/code/cluster.html 16
17 Connections of K-means to Other Methods: K-means, Kernel K-means, Gaussian Mixture Model
18 Vector Data: Clustering: Part 2
- Revisit K-means
- Kernel K-means
- Mixture Model and EM algorithm
- Summary
19 Kernel K-Means
How to cluster the following data?
A non-linear map $\phi: \mathbb{R}^n \to F$ maps a data point into a higher- (or infinite-) dimensional space: $x \mapsto \phi(x)$.
Dot product matrix $K_{ij}$: $K_{ij} = \langle \phi(x_i), \phi(x_j) \rangle$
20 Typical Kernel Functions Recall kernel SVM: 20
21 Solution of Kernel K-Means
Objective function under the new feature space:
$J = \sum_{j=1}^{k} \sum_i w_{ij} \|\phi(x_i) - c_j\|^2$
Algorithm: by fixing the assignment $w_{ij}$, $c_j = \sum_i w_{ij} \phi(x_i) / \sum_i w_{ij}$.
In the assignment step, assign each data point to the closest center:
$d(x_i, c_j) = \left\| \phi(x_i) - \frac{\sum_{i'} w_{i'j} \phi(x_{i'})}{\sum_{i'} w_{i'j}} \right\|^2 = \phi(x_i) \cdot \phi(x_i) - 2 \frac{\sum_{i'} w_{i'j}\, \phi(x_i) \cdot \phi(x_{i'})}{\sum_{i'} w_{i'j}} + \frac{\sum_{i'} \sum_{l} w_{i'j} w_{lj}\, \phi(x_{i'}) \cdot \phi(x_l)}{(\sum_{i'} w_{i'j})^2}$
We do not really need to know $\phi(x)$, but only $K_{ij}$.
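Since the expanded distance uses only dot products, the whole algorithm can run off a precomputed kernel matrix. A minimal pure-Python sketch (the RBF kernel choice, the initialization scheme, and the toy data are illustrative assumptions, not from the slides):

```python
import math
import random

def rbf_kernel(xs, gamma=1.0):
    """Gram matrix K[i][j] = <phi(x_i), phi(x_j)> = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(xs)
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xs[i], xs[j])))
             for j in range(n)] for i in range(n)]

def kernel_kmeans(K, k, iters=30, init=None, seed=0):
    n = len(K)
    rng = random.Random(seed)
    assign = list(init) if init is not None else [rng.randrange(k) for _ in range(n)]
    for _ in range(iters):
        clusters = [[i for i in range(n) if assign[i] == j] for j in range(k)]
        # Third term of the expanded distance: sum_{l,m in C_j} K[l][m] / |C_j|^2
        intra = []
        for members in clusters:
            nj = max(len(members), 1)
            intra.append(sum(K[l][m] for l in members for m in members) / nj ** 2)
        new_assign = []
        for i in range(n):
            # d(x_i, c_j) = K_ii - 2 * sum_{l in C_j} K[i][l] / |C_j| + intra_j
            dists = [K[i][i]
                     - 2 * sum(K[i][l] for l in clusters[j]) / max(len(clusters[j]), 1)
                     + intra[j] for j in range(k)]
            new_assign.append(min(range(k), key=lambda j: dists[j]))
        if new_assign == assign:
            break
        assign = new_assign
    return assign

xs = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
labels = kernel_kmeans(rbf_kernel(xs), 2, init=[1, 0, 0, 0, 0, 0])
```

Note that `kernel_kmeans` never touches the raw coordinates — only the Gram matrix `K` — which is the point of the kernel trick.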
22 Advantages and Disadvantages of Kernel K-Means
Advantages
- The algorithm can identify non-linear cluster structures.
Disadvantages
- The number of clusters needs to be predefined.
- The algorithm is more complex in nature, and its time complexity is large.
References
- Kernel k-means and Spectral Clustering, by Max Welling.
- Kernel k-means, Spectral Clustering and Normalized Cut, by Inderjit S. Dhillon, Yuqiang Guan, and Brian Kulis.
- An Introduction to Kernel Methods, by Colin Campbell.
23 Vector Data: Clustering: Part 2
- Revisit K-means
- Kernel K-means
- Mixture Model and EM algorithm
- Summary
24 Fuzzy Set and Fuzzy Cluster
- The clustering methods discussed so far assign every data object to exactly one cluster.
- Some applications may need fuzzy or soft cluster assignment. Ex.: an e-game could belong to both entertainment and software.
- Methods: fuzzy clusters and probabilistic model-based clusters.
- Fuzzy cluster: a fuzzy set $S$ with membership function $F_S: X \to [0, 1]$ (a value between 0 and 1).
25 Mixture Model-Based Clustering
- A set $C$ of $k$ probabilistic clusters $C_1, \ldots, C_k$ with probability density functions $f_1, \ldots, f_k$ and cluster prior probabilities $w_1, \ldots, w_k$, $\sum_j w_j = 1$.
- Joint probability of an object $x_i$ and its cluster $C_j$: $P(x_i, z_i = C_j) = w_j f_j(x_i)$
- Probability of $x_i$: $P(x_i) = \sum_j w_j f_j(x_i)$
26 Maximum Likelihood Estimation
Since objects are assumed to be generated independently, for a data set $D = \{x_1, \ldots, x_n\}$ we have:
$P(D) = \prod_i P(x_i) = \prod_i \sum_j w_j f_j(x_i)$
$\log P(D) = \sum_i \log P(x_i) = \sum_i \log \sum_j w_j f_j(x_i)$
Task: find a set $C$ of $k$ probabilistic clusters such that $P(D)$ is maximized.
27 The EM (Expectation Maximization) Algorithm
The EM algorithm: a framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models.
- E-step: assigns objects to clusters according to the current fuzzy clustering or parameters of probabilistic clusters:
$w_{ij}^{t+1} = p(z_i = j \mid \theta_j^t, x_i) \propto p(x_i \mid z_i = j, \theta_j^t)\, p(z_i = j)$
- M-step: finds the new clustering or parameters that maximize the expected likelihood, with respect to the conditional distribution $p(z_i = j \mid \theta_j^t, x_i)$:
$\theta^{t+1} = \arg\max_{\theta} \sum_i \sum_j w_{ij}^{t+1} \log L(x_i, z_i = j \mid \theta)$
28 Gaussian Mixture Model
Generative model: for each object,
- Pick its distribution component: $Z \sim \mathrm{Multinomial}(w_1, \ldots, w_k)$
- Sample a value from the selected distribution: $X \sim N(\mu_Z, \sigma_Z^2)$
Overall likelihood function:
$L(D \mid \theta) = \prod_i \sum_j w_j\, p(x_i \mid \mu_j, \sigma_j^2)$, s.t. $\sum_j w_j = 1$ and $w_j \ge 0$
Q: What is $\theta$ here?
29 Estimating Parameters
$L(D; \theta) = \sum_i \log \sum_j w_j\, p(x_i \mid \mu_j, \sigma_j^2)$
Considering the first derivative w.r.t. $\mu_j$:
$\frac{\partial L}{\partial \mu_j} = \sum_i \frac{w_j}{\sum_{j'} w_{j'}\, p(x_i \mid \mu_{j'}, \sigma_{j'}^2)} \cdot \frac{\partial p(x_i \mid \mu_j, \sigma_j^2)}{\partial \mu_j} = \sum_i \frac{w_j\, p(x_i \mid \mu_j, \sigma_j^2)}{\sum_{j'} w_{j'}\, p(x_i \mid \mu_{j'}, \sigma_{j'}^2)} \cdot \frac{\partial \log p(x_i \mid \mu_j, \sigma_j^2)}{\partial \mu_j}$
where the first factor is $w_{ij} = P(Z = j \mid X = x_i, \theta)$.
This looks like weighted likelihood estimation — but the weights are themselves determined by the parameters!
30 Apply the EM Algorithm: 1-d
An iterative algorithm (at iteration t+1):
E(expectation)-step: evaluate the weight $w_{ij}$ when $\mu_j, \sigma_j, w_j$ are given:
$w_{ij}^{t+1} = \frac{w_j^t\, p(x_i \mid \mu_j^t, (\sigma_j^2)^t)}{\sum_{j'} w_{j'}^t\, p(x_i \mid \mu_{j'}^t, (\sigma_{j'}^2)^t)}$
M(maximization)-step: evaluate $\mu_j, \sigma_j, w_j$ when the $w_{ij}$'s are given, maximizing the weighted likelihood. This is equivalent to Gaussian distribution parameter estimation where each point carries a weight of belonging to each distribution:
$\mu_j^{t+1} = \frac{\sum_i w_{ij}^{t+1} x_i}{\sum_i w_{ij}^{t+1}}; \quad (\sigma_j^2)^{t+1} = \frac{\sum_i w_{ij}^{t+1} (x_i - \mu_j^{t+1})^2}{\sum_i w_{ij}^{t+1}}; \quad w_j^{t+1} \propto \sum_i w_{ij}^{t+1}$
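The two steps can be sketched for the 1-d case in plain Python (a minimal sketch; the initialization choices, the variance floor, and the toy data are my assumptions, not from the slides). It also tracks the log-likelihood, which EM guarantees never decreases:

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, k, iters=50, seed=0):
    rng = random.Random(seed)
    mus = rng.sample(xs, k)        # initialize means at random data points
    variances = [1.0] * k
    ws = [1.0 / k] * k             # cluster priors w_j
    lls = []                       # log-likelihood per iteration
    for _ in range(iters):
        # E-step: w_ij = w_j p(x_i|mu_j,var_j) / sum_j' w_j' p(x_i|mu_j',var_j')
        resp, ll = [], 0.0
        for x in xs:
            num = [ws[j] * normal_pdf(x, mus[j], variances[j]) for j in range(k)]
            s = sum(num)
            resp.append([v / s for v in num])
            ll += math.log(s)
        lls.append(ll)
        # M-step: weighted Gaussian MLE per component
        for j in range(k):
            nj = sum(r[j] for r in resp)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - mus[j]) ** 2
                                   for r, x in zip(resp, xs)) / nj, 1e-6)
            ws[j] = nj / len(xs)
    return ws, mus, variances, lls

ws, mus, variances, lls = em_gmm_1d([-1.2, -1.0, -0.8, 4.8, 5.0, 5.2], 2)
```

The variance floor (`1e-6`) is a practical guard against a component collapsing onto a single point, which is a known failure mode of GMM maximum likelihood.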
31 Example: 1-D GMM 31
32 2-d Gaussian
Bivariate Gaussian distribution for a two-dimensional random variable $X = (X_1, X_2)^T$:
$X \sim N\left( \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},\ \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma(X_1, X_2) \\ \sigma(X_1, X_2) & \sigma_2^2 \end{pmatrix} \right)$
- $\mu_1$ and $\mu_2$ are the means of $X_1$ and $X_2$
- $\sigma_1$ and $\sigma_2$ are the standard deviations of $X_1$ and $X_2$
- $\sigma(X_1, X_2)$ is the covariance between $X_1$ and $X_2$, i.e., $\sigma(X_1, X_2) = E[(X_1 - \mu_1)(X_2 - \mu_2)]$
33 Apply the EM Algorithm: 2-d
An iterative algorithm (at iteration t+1):
E(expectation)-step: evaluate the weight $w_{ij}$ when $\mu_j, \Sigma_j, w_j$ are given:
$w_{ij}^{t+1} = \frac{w_j^t\, p(x_i \mid \mu_j^t, \Sigma_j^t)}{\sum_{j'} w_{j'}^t\, p(x_i \mid \mu_{j'}^t, \Sigma_{j'}^t)}$
M(maximization)-step: evaluate $\mu_j, \Sigma_j, w_j$ when the $w_{ij}$'s are given, maximizing the weighted likelihood (again weighted Gaussian parameter estimation):
$\mu_j^{t+1} = \frac{\sum_i w_{ij}^{t+1} x_i}{\sum_i w_{ij}^{t+1}}; \quad (\sigma_{j,1}^2)^{t+1} = \frac{\sum_i w_{ij}^{t+1} (x_{i,1} - \mu_{j,1}^{t+1})^2}{\sum_i w_{ij}^{t+1}}; \quad (\sigma_{j,2}^2)^{t+1} = \frac{\sum_i w_{ij}^{t+1} (x_{i,2} - \mu_{j,2}^{t+1})^2}{\sum_i w_{ij}^{t+1}};$
$\sigma(X_1, X_2)_j^{t+1} = \frac{\sum_i w_{ij}^{t+1} (x_{i,1} - \mu_{j,1}^{t+1})(x_{i,2} - \mu_{j,2}^{t+1})}{\sum_i w_{ij}^{t+1}}; \quad w_j^{t+1} \propto \sum_i w_{ij}^{t+1}$
34 K-Means: A Special Case of Gaussian Mixture Model
When each Gaussian component has covariance matrix $\sigma^2 I$:
Soft K-means: $w_{ij} \propto p(x_i \mid \mu_j, \sigma^2)\, w_j = \exp\left( -\frac{\|x_i - \mu_j\|^2}{2\sigma^2} \right) w_j$ — a function of the distance!
When $\sigma^2 \to 0$, the soft assignment becomes a hard assignment: $w_{ij} \to 1$ iff $x_i$ is closest to $\mu_j$ (why?).
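The hardening of the soft assignment as $\sigma^2 \to 0$ can be checked numerically (an illustrative sketch with equal priors; the means and query point are arbitrary choices of mine):

```python
import math

def soft_assign(x, mus, var):
    """Soft K-means responsibilities: w_ij proportional to exp(-(x - mu_j)^2 / (2 var)),
    assuming equal priors w_j."""
    e = [-(x - m) ** 2 / (2 * var) for m in mus]
    mx = max(e)                           # subtract the max exponent for numerical stability
    u = [math.exp(v - mx) for v in e]
    s = sum(u)
    return [v / s for v in u]

# x = 1.0 is closer to mu = 0.0; shrinking var hardens the assignment toward [1, 0]
for var in (4.0, 1.0, 0.1, 0.01):
    print(var, soft_assign(1.0, [0.0, 4.0], var))
```

With a large variance the responsibility is genuinely split; as the variance shrinks, essentially all the mass goes to the closest mean, which is exactly the hard K-means assignment.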
35 Why Does EM Work?
- E-step: compute a tight lower bound $L$ of the original objective function $l$ at $\theta^{old}$
- M-step: find $\theta^{new}$ to maximize the lower bound
- Therefore $l(\theta^{new}) \ge L(\theta^{new}) \ge L(\theta^{old}) = l(\theta^{old})$
36 How to Find a Tight Lower Bound?
$l(\theta) = \log p(d \mid \theta) = \log \sum_h p(d, h \mid \theta) = \log \sum_h q(h)\, \frac{p(d, h \mid \theta)}{q(h)} \ge \sum_h q(h) \log \frac{p(d, h \mid \theta)}{q(h)}$ (Jensen's inequality)
$q(h)$: the key to the tight lower bound we want to get.
When does "=" hold, giving a tight lower bound? When $q(h) = p(h \mid d, \theta)$ (why?).
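The bound and its tightness at $q(h) = p(h \mid d, \theta)$ can be checked numerically for a two-valued hidden variable (the joint probabilities below are made-up numbers, chosen only for illustration):

```python
import math

# Joint p(d, h) for one observed d and a hidden h in {0, 1}
p_joint = [0.15, 0.35]                    # p(d, h=0), p(d, h=1)
log_pd = math.log(sum(p_joint))           # the objective l(theta) = log p(d | theta)

def jensen_bound(q):
    """sum_h q(h) log(p(d, h | theta) / q(h)) -- the lower bound from Jensen's inequality."""
    return sum(qh * math.log(ph / qh) for qh, ph in zip(q, p_joint) if qh > 0)

posterior = [p / sum(p_joint) for p in p_joint]   # q(h) = p(h | d, theta)

for q in ([0.5, 0.5], [0.9, 0.1], posterior):
    assert jensen_bound(q) <= log_pd + 1e-12      # always a lower bound
assert abs(jensen_bound(posterior) - log_pd) < 1e-12  # tight exactly at the posterior
```

Any valid distribution $q$ gives a lower bound, but only the posterior makes the bound touch the objective, which is why the E-step uses exactly that choice.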
37 In the GMM Case
$L(D; \theta) = \sum_i \log \sum_j w_j\, p(x_i \mid \mu_j, \sigma_j^2) \ge \sum_i \sum_j w_{ij} \left( \log\left( w_j\, p(x_i \mid \mu_j, \sigma_j^2) \right) - \log w_{ij} \right)$
The first part is $\sum_i \sum_j w_{ij} \log L(x_i, z_i = j \mid \theta)$; the $-\log w_{ij}$ term does not involve $\theta$ and can be dropped.
38 Advantages and Disadvantages of GMM
Strengths
- Mixture models are more general than partitioning: clusters can have different densities and sizes
- Clusters can be characterized by a small number of parameters
- The results may satisfy the statistical assumptions of the generative models
Weaknesses
- Converges to a local optimum (mitigation: run multiple times with random initialization)
- Computationally expensive if the number of distributions is large, or the data set contains very few observed data points
- Hard to estimate the number of clusters
- Can only deal with spherical clusters
39 Vector Data: Clustering: Part 2
- Revisit K-means
- Kernel K-means
- Mixture Model and EM algorithm
- Summary
40 Summary
- Revisit k-means: objective function; solution via the derivative
- Kernel k-means: objective function; solution; connection to k-means
- Mixture models: Gaussian mixture model; multinomial mixture model; EM algorithm; connection to k-means
More informationInternational Journal of Scientific & Engineering Research, Volume 7, Issue 8, August ISSN MM Double Exponential Distribution
International Journal of Scientific & Engineering Research, Volume 7, Issue 8, August-216 72 MM Double Exponential Distribution Zahida Perveen, Mubbasher Munir Abstract: The exponential distribution is
More informationClustering. CSL465/603 - Fall 2016 Narayanan C Krishnan
Clustering CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Supervised vs Unsupervised Learning Supervised learning Given x ", y " "%& ', learn a function f: X Y Categorical output classification
More informationECE 5984: Introduction to Machine Learning
ECE 5984: Introduction to Machine Learning Topics: (Finish) Expectation Maximization Principal Component Analysis (PCA) Readings: Barber 15.1-15.4 Dhruv Batra Virginia Tech Administrativia Poster Presentation:
More informationPHY103A: Lecture # 4
Semester II, 2017-18 Department of Physics, IIT Kanpur PHY103A: Lecture # 4 (Text Book: Intro to Electrodynamics by Griffiths, 3 rd Ed.) Anand Kumar Jha 10-Jan-2018 Notes The Solutions to HW # 1 have been
More informationMachine Learning (CS 567) Lecture 5
Machine Learning (CS 567) Lecture 5 Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol
More informationMixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate
Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means
More information7.3 The Jacobi and Gauss-Seidel Iterative Methods
7.3 The Jacobi and Gauss-Seidel Iterative Methods 1 The Jacobi Method Two assumptions made on Jacobi Method: 1.The system given by aa 11 xx 1 + aa 12 xx 2 + aa 1nn xx nn = bb 1 aa 21 xx 1 + aa 22 xx 2
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationLecture 11: Unsupervised Machine Learning
CSE517A Machine Learning Spring 2018 Lecture 11: Unsupervised Machine Learning Instructor: Marion Neumann Scribe: Jingyu Xin Reading: fcml Ch6 (Intro), 6.2 (k-means), 6.3 (Mixture Models); [optional]:
More informationReview for Exam Hyunse Yoon, Ph.D. Assistant Research Scientist IIHR-Hydroscience & Engineering University of Iowa
57:020 Fluids Mechanics Fall2013 1 Review for Exam3 12. 11. 2013 Hyunse Yoon, Ph.D. Assistant Research Scientist IIHR-Hydroscience & Engineering University of Iowa 57:020 Fluids Mechanics Fall2013 2 Chapter
More informationP.2 Multiplication of Polynomials
1 P.2 Multiplication of Polynomials aa + bb aa + bb As shown in the previous section, addition and subtraction of polynomials results in another polynomial. This means that the set of polynomials is closed
More information(1) Correspondence of the density matrix to traditional method
(1) Correspondence of the density matrix to traditional method New method (with the density matrix) Traditional method (from thermal physics courses) ZZ = TTTT ρρ = EE ρρ EE = dddd xx ρρ xx ii FF = UU
More informationGeneral Strong Polarization
General Strong Polarization Madhu Sudan Harvard University Joint work with Jaroslaw Blasiok (Harvard), Venkatesan Gurswami (CMU), Preetum Nakkiran (Harvard) and Atri Rudra (Buffalo) May 1, 018 G.Tech:
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationIntroduction to Graphical Models
Introduction to Graphical Models The 15 th Winter School of Statistical Physics POSCO International Center & POSTECH, Pohang 2018. 1. 9 (Tue.) Yung-Kyun Noh GENERALIZATION FOR PREDICTION 2 Probabilistic
More information