Probabilistic Unsupervised Learning
HT2015: SC4 Statistical Data Mining and Machine Learning
Dino Sejdinovic, Department of Statistics, Oxford

Probabilistic Methods

Algorithmic approach: Data → Algorithm → Analysis/Interpretation.
Probabilistic modelling approach: Unobserved process → Generative Model → Data → Analysis/Interpretation.

Mixture Models

Mixture models suppose that our dataset $X$ was created by sampling iid from $K$ distinct populations (called mixture components). Typical samples in population $k$ can be modelled using a distribution $F_{\mu_k}$ with density $f(x \mid \mu_k)$. For a concrete example, consider a Gaussian with unknown mean $\mu_k$ and known diagonal covariance $\sigma^2 I$,
$$f(x \mid \mu_k) = (2\pi\sigma^2)^{-p/2} \exp\left( -\frac{1}{2\sigma^2} \| x - \mu_k \|_2^2 \right).$$

Generative model: for $i = 1, 2, \dots, n$:
- First determine which population item $i$ came from (independently): $Z_i \sim \mathrm{Discrete}(\pi_1, \dots, \pi_K)$, i.e. $P(Z_i = k) = \pi_k$, where the mixing proportions satisfy $\pi_k \geq 0$ for each $k$ and $\sum_{k=1}^K \pi_k = 1$.
- If $Z_i = k$, then $X_i = (X_{i1}, \dots, X_{ip})$ is sampled (independently) from the corresponding population distribution: $X_i \mid Z_i = k \sim F_{\mu_k}$.

We observe $X_i = x_i$ for each $i$, and would like to learn about the unknown parameters of the process.
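As a concrete illustration of the two-step generative process above, the following sketch samples a dataset from a mixture of spherical Gaussians. The sizes and parameter values (`n`, `K`, `pi`, `mu`, `sigma`) are made up for the example, not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter values: K = 3 components in p = 2 dimensions.
n, p, K = 500, 2, 3
pi = np.array([0.5, 0.3, 0.2])                        # mixing proportions, sum to 1
mu = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])  # component means mu_k
sigma = 1.0                                           # known spherical covariance sigma^2 I

# Step 1: draw the population indicator Z_i ~ Discrete(pi_1, ..., pi_K).
z = rng.choice(K, size=n, p=pi)

# Step 2: given Z_i = k, draw X_i ~ N(mu_k, sigma^2 I).
x = mu[z] + sigma * rng.standard_normal((n, p))
```

In a clustering problem we would then observe only `x`; the indicators `z` are the latent variables the methods below try to recover.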
Posterior Distribution

Unknowns to learn given data are
- Parameters: $\pi_1, \dots, \pi_K \in [0, 1]$, $\mu_1, \dots, \mu_K \in \mathbb{R}^p$, as well as
- Latent variables: $z_1, \dots, z_n$.

The joint probability over all cluster indicator variables $\{Z_i\}$ is:
$$p_Z\big((z_i)_{i=1}^n\big) = \prod_{i=1}^n \prod_{k=1}^K \pi_k^{\mathbf{1}(z_i = k)}$$
The joint density at observations $X_i = x_i$ given $Z_i = z_i$ is:
$$p_X\big((x_i)_{i=1}^n \mid (Z_i = z_i)_{i=1}^n\big) = \prod_{i=1}^n \prod_{k=1}^K f(x_i \mid \mu_k)^{\mathbf{1}(z_i = k)}$$
So the joint probability/density¹ is:
$$p_{X,Z}\big((x_i, z_i)_{i=1}^n\big) = \prod_{i=1}^n \prod_{k=1}^K \big( \pi_k f(x_i \mid \mu_k) \big)^{\mathbf{1}(z_i = k)}$$
¹ In this course we will treat probabilities and densities equivalently for notational simplicity. In general, the quantity is a density with respect to the product base measure, where the base measure is the counting measure for discrete variables and Lebesgue for continuous variables.

Responsibilities

Suppose we know the parameters $(\pi_k, \mu_k)_{k=1}^K$. $Z_i$ is a random variable and its posterior distribution given the data set $X$ is:
$$Q_{ik} := p(Z_i = k \mid x_i) = \frac{p(Z_i = k, x_i)}{p(x_i)} = \frac{\pi_k f(x_i \mid \mu_k)}{\sum_{j=1}^K \pi_j f(x_i \mid \mu_j)}$$
where the marginal probability of the $i$-th instance is:
$$p(x_i) = \sum_{j=1}^K p(Z_i = j, x_i) = \sum_{j=1}^K \pi_j f(x_i \mid \mu_j).$$
The posterior probability $Q_{ik}$ of $Z_i = k$ is called the responsibility of mixture component $k$ for data point $x_i$. The posterior distribution softly partitions the dataset among the components.

Maximum Likelihood

How can we learn about the parameters $\theta = (\pi_k, \mu_k)_{k=1}^K$ from data? Standard statistical methodology asks for the maximum likelihood estimator (MLE). The goal is to maximize the marginal probability of the data over the parameters:
$$\hat\theta_{\mathrm{ML}} = \operatorname*{argmax}_\theta p(X \mid \theta) = \operatorname*{argmax}_{(\pi_k, \mu_k)_{k=1}^K} \prod_{i=1}^n p\big(x_i \mid (\pi_k, \mu_k)_{k=1}^K\big) = \operatorname*{argmax}_{(\pi_k, \mu_k)_{k=1}^K} \underbrace{\sum_{i=1}^n \log \sum_{k=1}^K \pi_k f(x_i \mid \mu_k)}_{=:\,\ell\left((\pi_k, \mu_k)_{k=1}^K\right)}$$
Marginal log-likelihood:
$$\ell\big((\pi_k, \mu_k)_{k=1}^K\big) := \log p\big(X \mid (\pi_k, \mu_k)_{k=1}^K\big) = \sum_{i=1}^n \log \sum_{k=1}^K \pi_k f(x_i \mid \mu_k)$$
The gradient w.r.t. $\mu_k$:
$$\nabla_{\mu_k} \ell\big((\pi_k, \mu_k)_{k=1}^K\big) = \sum_{i=1}^n \frac{\pi_k f(x_i \mid \mu_k)}{\sum_{j=1}^K \pi_j f(x_i \mid \mu_j)} \nabla_{\mu_k} \log f(x_i \mid \mu_k) = \sum_{i=1}^n Q_{ik} \, \nabla_{\mu_k} \log f(x_i \mid \mu_k).$$
Difficult to solve, as $Q_{ik}$ depends implicitly on $\mu_k$.
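The responsibility formula above is easy to compute directly. A minimal sketch for spherical Gaussian components (the function names and the log-space stabilisation are my own choices for numerical safety, not part of the slides):

```python
import numpy as np

def log_gaussian_sph(x, mu, sigma2):
    """Log-density of N(mu_k, sigma2 * I) at each row of x; x is (n, p), mu is (K, p)."""
    p = x.shape[1]
    sq = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (n, K) squared distances
    return -0.5 * p * np.log(2 * np.pi * sigma2) - sq / (2 * sigma2)

def responsibilities(x, pi, mu, sigma2):
    """Q_{ik} = pi_k f(x_i | mu_k) / sum_j pi_j f(x_i | mu_j), computed in log-space."""
    log_w = np.log(pi)[None, :] + log_gaussian_sph(x, mu, sigma2)  # (n, K)
    log_w -= log_w.max(axis=1, keepdims=True)  # stabilise before exponentiating
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)
```

Each row of the returned matrix sums to one: the soft partition of data point $x_i$ across the $K$ components.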
Maximum Likelihood (continued)

$$\sum_{i=1}^n Q_{ik} \, \nabla_{\mu_k} \log f(x_i \mid \mu_k) = 0$$
What if we ignore the dependence of $Q_{ik}$ on the parameters? Taking the mixture of Gaussians with covariance $\sigma^2 I$ as an example,
$$\nabla_{\mu_k} \sum_{i=1}^n Q_{ik} \left( -\frac{p}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \|x_i - \mu_k\|_2^2 \right) = \frac{1}{\sigma^2} \sum_{i=1}^n Q_{ik} (x_i - \mu_k) = \frac{1}{\sigma^2} \left( \sum_{i=1}^n Q_{ik} x_i - \mu_k \sum_{i=1}^n Q_{ik} \right) = 0$$
$$\hat\mu_k^{\mathrm{ML?}} = \frac{\sum_{i=1}^n Q_{ik} x_i}{\sum_{i=1}^n Q_{ik}}.$$
The estimate is a weighted average of data points, where the estimated mean of cluster $k$ uses its responsibilities to data points as weights. Makes sense: suppose we knew that data point $x_i$ came from population $z_i$. Then $Q_{i z_i} = 1$ and $Q_{ik} = 0$ for $k \neq z_i$, and:
$$\hat\mu_k^{\mathrm{ML?}} = \frac{\sum_{i:\, z_i = k} x_i}{\sum_{i:\, z_i = k} 1} = \operatorname{avg}\{x_i : z_i = k\}$$
Our best guess of the originating population is given by $Q_{ik}$.

The gradient w.r.t. the mixing proportion $\pi_k$ (including a Lagrange multiplier term $\lambda \left( \sum_k \pi_k - 1 \right)$ to enforce the constraint $\sum_k \pi_k = 1$):
$$\frac{\partial}{\partial \pi_k} \left( \ell\big((\pi_k, \mu_k)_{k=1}^K\big) - \lambda \left( \sum_{k=1}^K \pi_k - 1 \right) \right) = \sum_{i=1}^n \frac{f(x_i \mid \mu_k)}{\sum_{j=1}^K \pi_j f(x_i \mid \mu_j)} - \lambda = \sum_{i=1}^n \frac{Q_{ik}}{\pi_k} - \lambda = 0$$
Note: $\sum_{k=1}^K Q_{ik} = 1$, so summing the stationarity condition over $k$ gives $\lambda = n$, and
$$\hat\pi_k^{\mathrm{ML?}} = \frac{1}{n} \sum_{i=1}^n Q_{ik}$$
Again makes sense: the estimate is simply (our best guess of) the proportion of data points coming from population $k$.

Putting all the derivations together, we get an iterative algorithm for learning about the unknowns in the mixture model. Start with some initial parameters $(\pi_k^{(0)}, \mu_k^{(0)})_{k=1}^K$. Iterate for $t = 1, 2, \dots$:
- Expectation Step: $\displaystyle Q_{ik}^{(t)} := \frac{\pi_k^{(t-1)} f(x_i \mid \mu_k^{(t-1)})}{\sum_{j=1}^K \pi_j^{(t-1)} f(x_i \mid \mu_j^{(t-1)})}$
- Maximization Step: $\displaystyle \pi_k^{(t)} = \frac{1}{n} \sum_{i=1}^n Q_{ik}^{(t)}, \qquad \mu_k^{(t)} = \frac{\sum_{i=1}^n Q_{ik}^{(t)} x_i}{\sum_{i=1}^n Q_{ik}^{(t)}}$

Will the algorithm converge? What does it converge to?
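The E and M updates above fit together into a short loop. This is a minimal sketch for the spherical-Gaussian case with known, shared $\sigma^2$; the function name, the initialisation at random data points, and the fixed iteration count are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def em_spherical_gmm(x, K, sigma2, n_iter=50, seed=0):
    """EM for a mixture of K spherical Gaussians with known, shared sigma^2.
    Returns mixing proportions pi and component means mu."""
    rng = np.random.default_rng(seed)
    n, p = x.shape
    mu = x[rng.choice(n, K, replace=False)]  # initialise means at random data points
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E step: Q_{ik} proportional to pi_k f(x_i | mu_k)
        # (the constant -(p/2) log(2 pi sigma^2) cancels in the normalisation).
        sq = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        log_w = np.log(pi)[None, :] - sq / (2 * sigma2)
        log_w -= log_w.max(axis=1, keepdims=True)
        Q = np.exp(log_w)
        Q /= Q.sum(axis=1, keepdims=True)
        # M step: pi_k = (1/n) sum_i Q_{ik},  mu_k = sum_i Q_{ik} x_i / sum_i Q_{ik}
        Nk = Q.sum(axis=0)
        pi = Nk / n
        mu = (Q.T @ x) / Nk[:, None]
    return pi, mu
```

With $K = 1$ the responsibilities are all one, so a single M step recovers the sample mean, which is a useful sanity check.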
Likelihood Surface for a Simple Example

[Figure: (left) 200 data points sampled from a mixture of two 1D Gaussians, with $\pi_1 = \pi_2 = 0.5$, $\sigma = 5$, $\mu_1 = -10$ and $\mu_2 = 10$. (right) The log-likelihood surface $\ell(\mu_1, \mu_2)$, with all other parameters set to their true values. We see the two symmetric modes, reflecting the unidentifiability of the parameters. Produced by mixGaussLikSurfaceDemo.]

Unidentifiability

Note that mixture models are not identifiable, which means there are many settings of the parameters which have the same likelihood. Specifically, in a mixture model with K components, there are K! equivalent parameter settings, which differ merely by permuting the labels of the hidden states. See Figure 11.6 for an illustration. The existence of equivalent global modes does not matter when computing a single point estimate, such as the ML or MAP estimate, but it does complicate Bayesian inference, as we will see later. Unfortunately, even finding just one of these global modes is computationally difficult. The EM algorithm is only guaranteed to find a local mode. A variety of methods can be used to increase the chance of finding a good local optimum. The simplest, and most widely used, is to perform multiple random restarts.

[Figures: Example with a mixture of 3 Gaussians, showing the fit after the 1st and after the 5th E and M steps.]

K-means algorithm

There is a variant of the EM algorithm for GMMs known as the K-means algorithm, which we now discuss. Consider a GMM in which we make the following assumptions: $\Sigma_k = \sigma^2 I_D$ is fixed, and $\pi_k = 1/K$ is fixed, so only the cluster centers, $\mu_k \in \mathbb{R}^D$, have to be estimated. Now consider an approximation to EM in which we make the approximation
$$p(z_i = k \mid x_i, \theta) \approx \mathbf{1}(k = z_i^*) \qquad (11.61)$$
where $z_i^* = \operatorname*{argmax}_k p(z_i = k \mid x_i, \theta)$. This is sometimes called hard EM, since we are making a hard assignment of points to clusters.
Since we assumed an equal spherical covariance matrix for each cluster, the most probable cluster for $x_i$ can be computed by finding the nearest prototype:
$$z_i^* = \operatorname*{argmin}_k \|x_i - \mu_k\|^2 \qquad (11.62)$$
Hence in each E step, we must find the Euclidean distance between N data points and K cluster centers, which takes $O(NKD)$ time. However, this can be sped up using various techniques, such as applying the triangle inequality to avoid some redundant computations [Elk03]. Given the hard cluster assignments, the M step updates each cluster center by computing the mean of all points assigned to it.

EM Algorithm as Lower-Bound Maximization

In a maximum likelihood framework, the objective function is the log likelihood,
$$\ell(\theta) = \sum_{i=1}^n \log \sum_{k=1}^K \pi_k f(x_i \mid \mu_k)$$
Direct maximization is not feasible. Consider another objective function $F(\theta, q)$ such that:
$$F(\theta, q) \leq \ell(\theta) \text{ for all } \theta, q, \qquad \max_q F(\theta, q) = \ell(\theta)$$
i.e. $F(\theta, q)$ is a lower bound on the log likelihood. We can construct an alternating maximization algorithm as follows. For $t = 1, 2, \dots$ until convergence:
$$q^{(t)} := \operatorname*{argmax}_q F(\theta^{(t-1)}, q), \qquad \theta^{(t)} := \operatorname*{argmax}_\theta F(\theta, q^{(t)})$$
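The hard-EM view of K-means described above can be sketched directly: the E step replaces the responsibilities with an indicator of the nearest prototype, and the M step averages the assigned points. Initialising centres at random data points is an illustrative choice, not part of the derivation:

```python
import numpy as np

def kmeans_hard_em(x, K, n_iter=100, seed=0):
    """K-means as hard EM for a spherical GMM with fixed sigma^2 and pi_k = 1/K."""
    rng = np.random.default_rng(seed)
    mu = x[rng.choice(x.shape[0], K, replace=False)].copy()
    for _ in range(n_iter):
        # Hard E step: z_i* = argmin_k ||x_i - mu_k||^2 (O(NKD) distance computations).
        d2 = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        z = d2.argmin(axis=1)
        # M step: each centre becomes the mean of the points currently assigned to it.
        for k in range(K):
            if np.any(z == k):
                mu[k] = x[z == k].mean(axis=0)
    return z, mu
```

On well-separated data this converges in a handful of iterations to one cluster per blob (up to label permutation, reflecting the unidentifiability discussed earlier).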
Solving for q

The lower bound we use is called the variational free energy. $q$ is a probability mass function for a distribution over $z = (z_i)_{i=1}^n$:
$$F(\theta, q) = \mathbb{E}_q[\log p(x, z \mid \theta) - \log q(z)] = \mathbb{E}_q\left[ \left( \sum_{i=1}^n \sum_{k=1}^K \mathbf{1}(z_i = k)\big(\log \pi_k + \log f(x_i \mid \mu_k)\big) \right) - \log q(z) \right]$$
$$= \sum_z q(z) \left[ \left( \sum_{i=1}^n \sum_{k=1}^K \mathbf{1}(z_i = k)\big(\log \pi_k + \log f(x_i \mid \mu_k)\big) \right) - \log q(z) \right]$$
Gradient of $F$ w.r.t. $q$ (with a Lagrange multiplier for the constraint $\sum_z q(z) = 1$):
$$\frac{\partial F(\theta, q)}{\partial q(z)} = \sum_{i=1}^n \sum_{k=1}^K \mathbf{1}(z_i = k)\big(\log \pi_k + \log f(x_i \mid \mu_k)\big) - \log q(z) - 1 - \lambda = \sum_{i=1}^n \big(\log \pi_{z_i} + \log f(x_i \mid \mu_{z_i})\big) - \log q(z) - 1 - \lambda = 0$$
$$q^*(z) \propto \prod_{i=1}^n \pi_{z_i} f(x_i \mid \mu_{z_i})$$
$$q^*(z) = \frac{\prod_{i=1}^n \pi_{z_i} f(x_i \mid \mu_{z_i})}{\sum_{z'} \prod_{i=1}^n \pi_{z'_i} f(x_i \mid \mu_{z'_i})} = \prod_{i=1}^n \frac{\pi_{z_i} f(x_i \mid \mu_{z_i})}{\sum_{k=1}^K \pi_k f(x_i \mid \mu_k)} = \prod_{i=1}^n p(z_i \mid x_i, \theta).$$
The optimal $q$ is simply the posterior distribution for fixed $\theta$. Plugging the optimal $q^*$ into the variational free energy,
$$F(\theta, q^*) = \sum_{i=1}^n \log \sum_{k=1}^K \pi_k f(x_i \mid \mu_k) = \ell(\theta).$$

Solving for θ

Setting the derivative with respect to $\mu_k$ to 0,
$$\nabla_{\mu_k} F(\theta, q) = \sum_z q(z) \sum_{i=1}^n \mathbf{1}(z_i = k) \nabla_{\mu_k} \log f(x_i \mid \mu_k) = \sum_{i=1}^n q(z_i = k) \nabla_{\mu_k} \log f(x_i \mid \mu_k) = 0$$
This equation can be solved quite easily. E.g., for a mixture of Gaussians,
$$\mu_k = \frac{\sum_{i=1}^n q(z_i = k) x_i}{\sum_{i=1}^n q(z_i = k)}$$
If it cannot be solved exactly, we can use a gradient ascent algorithm:
$$\mu_k \leftarrow \mu_k + \alpha \sum_{i=1}^n q(z_i = k) \nabla_{\mu_k} \log f(x_i \mid \mu_k).$$
A similar derivation gives the optimal $\pi_k$ as before.

EM Algorithm

Start with some initial parameters $(\pi_k^{(0)}, \mu_k^{(0)})_{k=1}^K$. Iterate for $t = 1, 2, \dots$:
- Expectation Step: $\displaystyle q^{(t)}(z_i = k) := \frac{\pi_k^{(t-1)} f(x_i \mid \mu_k^{(t-1)})}{\sum_{j=1}^K \pi_j^{(t-1)} f(x_i \mid \mu_j^{(t-1)})} = \mathbb{E}_{p(z_i \mid x_i, \theta^{(t-1)})}[\mathbf{1}(z_i = k)]$
- Maximization Step: $\displaystyle \pi_k^{(t)} = \frac{1}{n} \sum_{i=1}^n q^{(t)}(z_i = k), \qquad \mu_k^{(t)} = \frac{\sum_{i=1}^n q^{(t)}(z_i = k) x_i}{\sum_{i=1}^n q^{(t)}(z_i = k)}$

Each step increases the log likelihood:
$$\ell(\theta^{(t-1)}) = F(\theta^{(t-1)}, q^{(t)}) \leq F(\theta^{(t)}, q^{(t)}) \leq F(\theta^{(t)}, q^{(t+1)}) = \ell(\theta^{(t)}).$$
An additional assumption, that the Hessians $\nabla^2_\theta F(\theta^{(t)}, q^{(t)})$ are negative definite with eigenvalues bounded below $-\epsilon < 0$, implies that $\theta^{(t)} \to \theta^*$ where $\theta^*$ is a local MLE.
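The monotonicity guarantee $\ell(\theta^{(t-1)}) \leq \ell(\theta^{(t)})$ can be checked numerically. The sketch below runs EM on toy 1D data (made-up values) with known $\sigma^2$ and records the marginal log-likelihood after every M step; the recorded sequence should never decrease:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-d data from two populations (values chosen only for illustration).
x = np.concatenate([rng.normal(-5, 1, 100), rng.normal(5, 1, 100)])[:, None]
n, K, sigma2 = x.shape[0], 2, 1.0

pi = np.full(K, 1.0 / K)
mu = x[rng.choice(n, K, replace=False)]

def marginal_loglik(x, pi, mu, sigma2):
    """l(theta) = sum_i log sum_k pi_k f(x_i | mu_k), via a stable log-sum-exp."""
    sq = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    logf = -0.5 * x.shape[1] * np.log(2 * np.pi * sigma2) - sq / (2 * sigma2)
    a = np.log(pi)[None, :] + logf
    m = a.max(axis=1, keepdims=True)
    return (m[:, 0] + np.log(np.exp(a - m).sum(axis=1))).sum()

lls = []
for t in range(30):
    # E step: responsibilities q(z_i = k).
    sq = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    logw = np.log(pi)[None, :] - sq / (2 * sigma2)
    logw -= logw.max(axis=1, keepdims=True)
    Q = np.exp(logw)
    Q /= Q.sum(axis=1, keepdims=True)
    # M step: exact maximisers of F(theta, q) in pi and mu.
    Nk = Q.sum(axis=0)
    pi, mu = Nk / n, (Q.T @ x) / Nk[:, None]
    lls.append(marginal_loglik(x, pi, mu, sigma2))

# lls is non-decreasing, as guaranteed by the lower-bound argument above.
```

Plotting `lls` against the iteration index gives the familiar monotone EM learning curve.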
Notes on the Probabilistic Approach

Some good things:
- Guaranteed convergence to locally optimal parameters.
- Formal reasoning about uncertainties, using both Bayes' theorem and maximum likelihood theory.
- The rich language of probability theory allows us to express a wide range of generative models, with straightforward derivation of algorithms for ML estimation.

Some bad things:
- Can get stuck in local optima, so multiple starts are recommended.
- Slower and more expensive than K-means.
- The choice of K is still problematic, but a rich array of methods for model selection comes to the rescue.

Flexible Gaussian Mixtures

We can allow each cluster to have its own mean and covariance structure, which allows greater flexibility in the model: different covariances; identical covariances; different, but diagonal covariances; identical and spherical covariances.

Probabilistic PCA

A probabilistic model related to PCA has the following generative model: for $i = 1, 2, \dots, n$:
- Let $k < p$ be given. Let $Y_i$ be a (latent) $k$-dimensional normally distributed random variable with zero mean and identity covariance: $Y_i \sim N(0, I_k)$.
- We model the distribution of the $i$-th data point given $Y_i$ as a $p$-dimensional normal: $X_i \sim N(\mu + L Y_i, \sigma^2 I)$, where the parameters are a vector $\mu \in \mathbb{R}^p$, a matrix $L \in \mathbb{R}^{p \times k}$ and $\sigma^2 > 0$.

[Figures illustrating the PPCA latents, the principal subspace and the PCA projection; figures by M. Sahani.]
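The PPCA generative model is easy to simulate, and marginalising out $Y_i$ gives $X_i \sim N(\mu, LL^\top + \sigma^2 I)$, so the sample covariance of simulated data should approach $LL^\top + \sigma^2 I$. A sketch with made-up sizes and a randomly drawn loading matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: latent dimension k = 2, observed dimension p = 5.
n, k, p = 50000, 2, 5
mu = np.zeros(p)
L = rng.standard_normal((p, k))  # loading matrix, a free parameter of the model
sigma2 = 0.5

# Generative model: Y_i ~ N(0, I_k), then X_i | Y_i ~ N(mu + L Y_i, sigma^2 I_p).
Y = rng.standard_normal((n, k))
X = mu + Y @ L.T + np.sqrt(sigma2) * rng.standard_normal((n, p))

# Marginally, X_i ~ N(mu, L L^T + sigma^2 I); the sample covariance approaches this.
model_cov = L @ L.T + sigma2 * np.eye(p)
sample_cov = np.cov(X, rowvar=False)
```

This marginal-covariance identity is what makes PPCA a low-parameter model of covariance structure: $pk + 1$ covariance parameters instead of $p(p+1)/2$.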
Mixture of Probabilistic PCAs

We have learnt two types of unsupervised learning techniques:
- Dimensionality reduction, e.g. PCA, MDS, Isomap.
- Clustering, e.g. K-means, linkage and mixture models.

Probabilistic models allow us to construct more complex models from simpler pieces. A mixture of probabilistic PCAs allows both clustering and dimensionality reduction at the same time:
$$Z_i \sim \mathrm{Discrete}(\pi_1, \dots, \pi_K), \qquad Y_i \sim N(0, I_d), \qquad X_i \mid Z_i = k, Y_i = y_i \sim N(\mu_k + L_k y_i, \sigma_k^2 I_p)$$
This allows flexible modelling of the covariance structure without using too many parameters (Ghahramani and Hinton, 1996).

[Figures illustrating the PPCA latent prior, posterior, noise, projection and principal subspace; figures by M. Sahani.]

Further Reading: Unsupervised Learning
- Hastie et al, Chapter 14.
- James et al, Chapter 10.
- Ripley, Chapter 9.
- Tukey, John W. (1980). We need both exploratory and confirmatory. The American Statistician 34(1).
More information4. Partial Sums and the Central Limit Theorem
1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems
More informationLecture 23: Minimal sufficiency
Lecture 23: Miimal sufficiecy Maximal reductio without loss of iformatio There are may sufficiet statistics for a give problem. I fact, X (the whole data set) is sufficiet. If T is a sufficiet statistic
More information5 : Exponential Family and Generalized Linear Models
0-708: Probabilistic Graphical Models 0-708, Sprig 206 5 : Expoetial Family ad Geeralized Liear Models Lecturer: Matthew Gormley Scribes: Yua Li, Yichog Xu, Silu Wag Expoetial Family Probability desity
More informationVector Quantization: a Limiting Case of EM
. Itroductio & defiitios Assume that you are give a data set X = { x j }, j { 2,,, }, of d -dimesioal vectors. The vector quatizatio (VQ) problem requires that we fid a set of prototype vectors Z = { z
More informationChi-Squared Tests Math 6070, Spring 2006
Chi-Squared Tests Math 6070, Sprig 2006 Davar Khoshevisa Uiversity of Utah February XXX, 2006 Cotets MLE for Goodess-of Fit 2 2 The Multiomial Distributio 3 3 Applicatio to Goodess-of-Fit 6 3 Testig for
More informationOptimization Methods MIT 2.098/6.255/ Final exam
Optimizatio Methods MIT 2.098/6.255/15.093 Fial exam Date Give: December 19th, 2006 P1. [30 pts] Classify the followig statemets as true or false. All aswers must be well-justified, either through a short
More informationStatistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.
Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized
More informationEmpirical Processes: Glivenko Cantelli Theorems
Empirical Processes: Gliveko Catelli Theorems Mouliath Baerjee Jue 6, 200 Gliveko Catelli classes of fuctios The reader is referred to Chapter.6 of Weller s Torgo otes, Chapter??? of VDVW ad Chapter 8.3
More information5.1 Review of Singular Value Decomposition (SVD)
MGMT 69000: Topics i High-dimesioal Data Aalysis Falll 06 Lecture 5: Spectral Clusterig: Overview (cotd) ad Aalysis Lecturer: Jiamig Xu Scribe: Adarsh Barik, Taotao He, September 3, 06 Outlie Review of
More information10-701/ Machine Learning Mid-term Exam Solution
0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it
More information6.867 Machine learning
6.867 Machie learig Mid-term exam October, ( poits) Your ame ad MIT ID: Problem We are iterested here i a particular -dimesioal liear regressio problem. The dataset correspodig to this problem has examples
More informationBig Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates.
5. Data, Estimates, ad Models: quatifyig the accuracy of estimates. 5. Estimatig a Normal Mea 5.2 The Distributio of the Normal Sample Mea 5.3 Normal data, cofidece iterval for, kow 5.4 Normal data, cofidece
More informationHypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance
Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?
More informationRandom Variables, Sampling and Estimation
Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig
More informationCSIE/GINM, NTU 2009/11/30 1
Itroductio ti to Machie Learig (Part (at1: Statistical Machie Learig Shou de Li CSIE/GINM, NTU sdli@csie.tu.edu.tw 009/11/30 1 Syllabus of a Itro ML course ( Machie Learig, Adrew Ng, Staford, Autum 009
More information