Probabilistic Unsupervised Learning

Statistical Data Mining and Machine Learning, Hilary Term 2016
Dino Sejdinovic, Department of Statistics, Oxford
Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml.html

Probabilistic Methods

Algorithmic approach: Data -> Algorithm -> Analysis/Interpretation.
Probabilistic modelling approach: Unobserved process -> Generative Model -> Data -> Analysis/Interpretation.

Mixture Models

Mixture models suppose that our dataset X was created by sampling iid from K distinct populations (called mixture components). Samples in population k can be modelled using a distribution F_{μ_k} with density f(x | μ_k), where μ_k is the model parameter for the k-th component. For a concrete example, consider a Gaussian with unknown mean μ_k and known diagonal covariance σ²I,

f(x \mid \mu_k) = (2\pi\sigma^2)^{-p/2} \exp\left( -\frac{1}{2\sigma^2} \|x - \mu_k\|_2^2 \right).

Generative model: for i = 1, 2, ..., n:
- First determine the assignment variable independently for each data item i:
  Z_i ~ Discrete(π_1, ..., π_K), i.e., P(Z_i = k) = π_k,
  where the mixing proportions (additional model parameters) satisfy π_k ≥ 0 for each k and Σ_{k=1}^K π_k = 1.
- Given the assignment Z_i = k, the data item X_i = (X_i^(1), ..., X_i^(p)) is sampled (independently) from the corresponding k-th component:
  X_i | Z_i = k ~ f(x | μ_k).

We observe X_i = x_i for each i, but not the Z_i's (latent variables), and would like to infer the parameters.
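To make the generative story concrete, here is a minimal simulation sketch (not part of the original slides; the function name and parameter values are purely illustrative) that samples a dataset from a mixture of Gaussians with known spherical covariance σ²I using NumPy.

```python
import numpy as np

def sample_gmm(n, pi, mus, sigma, rng=None):
    """Sample n points from a mixture of Gaussians with covariance sigma^2 I.

    pi  : (K,) mixing proportions, non-negative and summing to one
    mus : (K, p) component means
    """
    rng = np.random.default_rng(rng)
    pi = np.asarray(pi)
    mus = np.asarray(mus)
    # Assignment variables Z_i ~ Discrete(pi_1, ..., pi_K)
    z = rng.choice(len(pi), size=n, p=pi)
    # Given Z_i = k, sample X_i ~ N(mu_k, sigma^2 I)
    x = mus[z] + sigma * rng.standard_normal((n, mus.shape[1]))
    return x, z

# Illustrative parameters (hypothetical, chosen only for this example)
X, Z = sample_gmm(n=500,
                  pi=[0.3, 0.5, 0.2],
                  mus=[[0.0, 0.0], [5.0, 5.0], [-4.0, 3.0]],
                  sigma=1.0,
                  rng=0)
```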

Mixture Models

Unknowns to learn given data are
- Parameters: θ = (π_k, μ_k)_{k=1}^K, where π_1, ..., π_K ∈ [0, 1] and μ_1, ..., μ_K ∈ R^p, and
- Latent variables: z_1, ..., z_n.

Mixture Models: Joint pmf/pdf of observed and latent variables

The joint probability mass function/density (1) over all cluster indicator variables {Z_i} is:

p_Z((z_i)_{i=1}^n) = \prod_{i=1}^n \pi_{z_i} = \prod_{i=1}^n \prod_{k=1}^K \pi_k^{\mathbf{1}(z_i = k)}.

The joint density of the observations X_i = x_i given Z_i = z_i is:

p_X((x_i)_{i=1}^n \mid (Z_i = z_i)_{i=1}^n) = \prod_{i=1}^n f(x_i \mid \mu_{z_i}) = \prod_{i=1}^n \prod_{k=1}^K f(x_i \mid \mu_k)^{\mathbf{1}(z_i = k)},

so the joint pmf/pdf of observed and latent variables is:

p_{X,Z}((x_i, z_i)_{i=1}^n) = p_Z((z_i)_{i=1}^n)\, p_X((x_i)_{i=1}^n \mid (Z_i = z_i)_{i=1}^n) = \prod_{i=1}^n \prod_{k=1}^K \left( \pi_k f(x_i \mid \mu_k) \right)^{\mathbf{1}(z_i = k)},

and the marginal density of x_i (the resulting model on the observed data) is:

p(x_i) = \sum_{j=1}^K p(z_i = j, x_i) = \sum_{j=1}^K \pi_j f(x_i \mid \mu_j).

(1) In this course we will treat probability mass functions and densities in the same way for notational simplicity. Strictly speaking, p_{X,Z} is a density with respect to the product base measure, where the base measure is the counting measure for discrete variables and the Lebesgue measure for continuous variables.

[Table 11.1 from Murphy, 2012, Ch. 11: a summary of some popular directed latent variable models (mixture of Gaussians, mixture of multinomials, factor analysis / probabilistic PCA, probabilistic ICA / sparse coding, multinomial PCA, latent Dirichlet allocation, BN20/QMR, sigmoid belief net); "Prod." denotes a factored (product) distribution over the observed coordinates.]

Mixture Models: Gaussian Mixtures with Unequal Covariances

The most widely used mixture model is the mixture of Gaussians (MoG), also called a Gaussian mixture model or GMM. In this model, each base distribution in the mixture is a multivariate Gaussian with mean μ_k and covariance matrix Σ_k, so the model has the form

p(x) = \sum_{k=1}^K \pi_k f(x \mid \mu_k, \Sigma_k),

where θ = (π_k, μ_k, Σ_k)_{k=1}^K are all the model parameters and

f(x \mid \mu_k, \Sigma_k) = (2\pi)^{-p/2} |\Sigma_k|^{-1/2} \exp\left( -\tfrac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) \right).

[Figure 11.3 from Murphy, 2012, Ch. 11: a mixture of 3 Gaussians in 2d; (a) contours of constant probability for each component in the mixture, (b) a surface plot of the overall density.]

Mixture Models: Responsibility

Suppose we know the parameters θ = (π_k, μ_k)_{k=1}^K. Z_i is a random variable and its conditional distribution given the data set X is:

Q_{ik} := p(Z_i = k \mid x_i) = \frac{p(Z_i = k, x_i)}{p(x_i)} = \frac{\pi_k f(x_i \mid \mu_k)}{\sum_{j=1}^K \pi_j f(x_i \mid \mu_j)}.

The conditional probability Q_{ik} is called the responsibility of mixture component k for data point x_i. These conditionals softly partition the dataset among the components: Σ_{k=1}^K Q_{ik} = 1.
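The responsibilities Q_ik can be computed directly from the formula above. The following sketch (an illustrative helper, not from the slides) does this for the spherical-covariance Gaussian components of the running example, working in log space so that very small densities do not underflow.

```python
import numpy as np

def responsibilities(X, pi, mus, sigma):
    """Compute Q[i, k] = pi_k f(x_i | mu_k) / sum_j pi_j f(x_i | mu_j)
    for Gaussian components with covariance sigma^2 I, in log space for stability."""
    X = np.asarray(X); mus = np.asarray(mus); pi = np.asarray(pi)
    p = X.shape[1]
    # log f(x_i | mu_k) for every (i, k) pair
    sq_dists = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(-1)       # (n, K)
    log_f = -0.5 * p * np.log(2 * np.pi * sigma**2) - sq_dists / (2 * sigma**2)
    log_unnorm = np.log(pi)[None, :] + log_f                          # log pi_k + log f
    # Normalise each row by subtracting its log-sum-exp over components
    log_Q = log_unnorm - np.logaddexp.reduce(log_unnorm, axis=1, keepdims=True)
    return np.exp(log_Q)                                              # rows sum to 1
```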

Mixture Models: Maximum Likelihood

How can we learn about the parameters θ = (π_k, μ_k)_{k=1}^K from data? Standard statistical methodology asks for the maximum likelihood estimator (MLE). The goal is to maximise the marginal probability of the data over the parameters:

\hat\theta_{ML} = \arg\max_{(\pi_k,\mu_k)_{k=1}^K} p(x \mid \theta) = \arg\max_{(\pi_k,\mu_k)_{k=1}^K} \prod_{i=1}^n p(x_i \mid (\pi_k,\mu_k)_{k=1}^K) = \arg\max_{(\pi_k,\mu_k)_{k=1}^K} \prod_{i=1}^n \sum_{k=1}^K \pi_k f(x_i \mid \mu_k).

Marginal log-likelihood:

\ell((\pi_k,\mu_k)_{k=1}^K) := \log p(x \mid (\pi_k,\mu_k)_{k=1}^K) = \sum_{i=1}^n \log \sum_{k=1}^K \pi_k f(x_i \mid \mu_k).

The gradient w.r.t. μ_k:

\nabla_{\mu_k} \ell((\pi_k,\mu_k)_{k=1}^K) = \sum_{i=1}^n \frac{\pi_k f(x_i \mid \mu_k)}{\sum_{j=1}^K \pi_j f(x_i \mid \mu_j)} \nabla_{\mu_k} \log f(x_i \mid \mu_k) = \sum_{i=1}^n Q_{ik} \nabla_{\mu_k} \log f(x_i \mid \mu_k).

This is difficult to solve, as Q_{ik} depends implicitly on μ_k.

Likelihood Surface for a Simple Example

If the latent variables z_i were all observed, we would have a unimodal likelihood surface; but when we marginalise out the latents, the likelihood surface becomes multimodal: there is no unique MLE. Indeed, mixture models are not identifiable: in a mixture with K components there are K! equivalent parameter settings, which differ merely in the labelling of the components.

[Figure 11.6 from Murphy, 2012: (left) n = 200 data points sampled from a mixture of two 1d Gaussians with π_1 = π_2 = 0.5, σ = 5, μ_1 = 10 and μ_2 = -10; (right) the observed-data log-likelihood surface ℓ(μ_1, μ_2), all other parameters set to their true values, showing two symmetric modes.]

Mixture Models: Maximum Likelihood

Recall we would like to solve:

\nabla_{\mu_k} \ell((\pi_k,\mu_k)_{k=1}^K) = \sum_{i=1}^n Q_{ik} \nabla_{\mu_k} \log f(x_i \mid \mu_k) = 0.

What if we ignore the dependence of Q_{ik} on the parameters? Taking the mixture of Gaussians with covariance σ²I as an example,

\sum_{i=1}^n Q_{ik} \nabla_{\mu_k} \left( -\tfrac{p}{2}\log(2\pi\sigma^2) - \tfrac{1}{2\sigma^2}\|x_i - \mu_k\|_2^2 \right) = \frac{1}{\sigma^2} \sum_{i=1}^n Q_{ik}(x_i - \mu_k) = \frac{1}{\sigma^2}\left( \sum_{i=1}^n Q_{ik} x_i - \mu_k \sum_{i=1}^n Q_{ik} \right) = 0

\Rightarrow \quad \mu_k^{ML?} = \frac{\sum_{i=1}^n Q_{ik} x_i}{\sum_{i=1}^n Q_{ik}}.
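A quantity worth computing in practice is the marginal log-likelihood ℓ(θ) itself, e.g. to monitor the iterative algorithm developed next. A minimal sketch, assuming the same spherical-covariance setup and using the log-sum-exp trick (the function name is illustrative):

```python
import numpy as np

def log_likelihood(X, pi, mus, sigma):
    """Marginal log-likelihood  l(theta) = sum_i log sum_k pi_k f(x_i | mu_k)
    for Gaussian components with covariance sigma^2 I, via log-sum-exp."""
    X = np.asarray(X); mus = np.asarray(mus); pi = np.asarray(pi)
    p = X.shape[1]
    sq_dists = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(-1)       # (n, K)
    log_f = -0.5 * p * np.log(2 * np.pi * sigma**2) - sq_dists / (2 * sigma**2)
    # Sum over data points of the log-sum-exp over components
    return np.logaddexp.reduce(np.log(pi)[None, :] + log_f, axis=1).sum()
```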

Mixture Models: Maximum Likelihood

The estimate is a weighted average of data points, where the estimated mean of cluster k uses its responsibilities to data points as weights:

\mu_k^{ML?} = \frac{\sum_{i=1}^n Q_{ik} x_i}{\sum_{i=1}^n Q_{ik}}.

This makes sense: suppose we knew that data point x_i came from population z_i. Then Q_{i z_i} = 1 and Q_{ik} = 0 for k ≠ z_i, and

\mu_k^{ML?} = \frac{\sum_{i: z_i = k} x_i}{\sum_{i: z_i = k} 1} = \mathrm{avg}\{x_i : z_i = k\}.

Our best guess of the originating population is given by Q_{ik}. A soft K-means algorithm?

Mixture Models: Maximum Likelihood

Gradient w.r.t. the mixing proportion π_k, including a Lagrange multiplier term λ(Σ_k π_k − 1) to enforce the constraint Σ_k π_k = 1 (note also that Σ_{k=1}^K Q_{ik} = 1):

\frac{\partial}{\partial \pi_k}\left( \ell((\pi_k,\mu_k)_{k=1}^K) - \lambda\Big(\sum_{k=1}^K \pi_k - 1\Big) \right) = \sum_{i=1}^n \frac{f(x_i \mid \mu_k)}{\sum_{j=1}^K \pi_j f(x_i \mid \mu_j)} - \lambda = \sum_{i=1}^n \frac{Q_{ik}}{\pi_k} - \lambda = 0,

so π_k = (1/λ) Σ_{i=1}^n Q_{ik}; summing over k and using Σ_k π_k = 1 and Σ_k Q_{ik} = 1 gives λ = n, hence

\pi_k^{ML?} = \frac{1}{n} \sum_{i=1}^n Q_{ik}.

Again this makes sense: the estimate is simply (our best guess of) the proportion of data points coming from population k.

Mixture Models: The EM Algorithm

Putting all the derivations together, we get an iterative algorithm for learning about the unknowns in the mixture model. Start with some initial parameters (π_k^{(0)}, μ_k^{(0)})_{k=1}^K and iterate for t = 1, 2, ...:

Expectation Step:
Q_{ik}^{(t)} := \frac{\pi_k^{(t-1)} f(x_i \mid \mu_k^{(t-1)})}{\sum_{j=1}^K \pi_j^{(t-1)} f(x_i \mid \mu_j^{(t-1)})}

Maximization Step:
\pi_k^{(t)} = \frac{1}{n}\sum_{i=1}^n Q_{ik}^{(t)}, \qquad \mu_k^{(t)} = \frac{\sum_{i=1}^n Q_{ik}^{(t)} x_i}{\sum_{i=1}^n Q_{ik}^{(t)}}

Will the algorithm converge? What does it converge to?

An example with 3 clusters: [scatter plot of the dataset in the (X1, X2) plane.]
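Putting the E-step and M-step formulas into code gives a compact EM loop. This is a minimal, self-contained sketch rather than the official course code: it treats σ as known, as in the running example, and initialises the means at randomly chosen data points, which is just one arbitrary choice of initialisation.

```python
import numpy as np

def em_gmm(X, K, sigma, n_iter=100, rng=None):
    """EM for a mixture of Gaussians with known covariance sigma^2 I.
    Returns estimated mixing proportions pi and component means mus."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    # Initialisation: uniform proportions, means at K randomly chosen data points.
    pi = np.full(K, 1.0 / K)
    mus = X[rng.choice(n, size=K, replace=False)].copy()
    for _ in range(n_iter):
        # E-step: responsibilities Q[i, k], computed in log space for stability.
        sq = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(-1)
        log_f = -0.5 * p * np.log(2 * np.pi * sigma**2) - sq / (2 * sigma**2)
        log_u = np.log(pi)[None, :] + log_f
        Q = np.exp(log_u - np.logaddexp.reduce(log_u, axis=1, keepdims=True))
        # M-step: weighted proportions and weighted means.
        Nk = Q.sum(axis=0)                 # effective number of points per component
        pi = Nk / n                        # pi_k = (1/n) sum_i Q_ik
        mus = (Q.T @ X) / Nk[:, None]      # mu_k = sum_i Q_ik x_i / sum_i Q_ik
    return pi, mus
```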

[Figures: the fitted mixture after the 1st, 2nd, 3rd and 4th E and M steps (iterations 1-4).]

[Figure: the fitted mixture after the 5th E and M step (iteration 5).]

The EM Algorithm

In a maximum likelihood framework, the objective function is the log likelihood,

\ell(\theta) = \sum_{i=1}^n \log \sum_{k=1}^K \pi_k f(x_i \mid \mu_k).

Direct maximisation is not feasible. Consider another objective function F(θ, q), where q is any probability distribution on the latent variables z, such that:

F(\theta, q) \le \ell(\theta) \text{ for all } \theta, q, \qquad \max_q F(\theta, q) = \ell(\theta).

F(θ, q) is a lower bound on the log likelihood. We can construct an alternating maximisation algorithm as follows. For t = 1, 2, ... until convergence:

q^{(t)} := \arg\max_q F(\theta^{(t-1)}, q), \qquad \theta^{(t)} := \arg\max_\theta F(\theta, q^{(t)}).

EM Algorithm - Solving for q

The lower bound we use is called the variational free energy. Here q is a probability mass function for a distribution over z = (z_i)_{i=1}^n:

F(\theta, q) = \mathbb{E}_q\left[ \log p(x, z \mid \theta) - \log q(z) \right]
            = \mathbb{E}_q\left[ \left( \sum_{i=1}^n \sum_{k=1}^K \mathbf{1}(z_i = k)\,(\log \pi_k + \log f(x_i \mid \mu_k)) \right) - \log q(z) \right]
            = \sum_z q(z) \left[ \left( \sum_{i=1}^n \sum_{k=1}^K \mathbf{1}(z_i = k)\,(\log \pi_k + \log f(x_i \mid \mu_k)) \right) - \log q(z) \right].

Lemma. F(θ, q) ≤ ℓ(θ) for all q and for all θ.
Lemma. F(θ, q) = ℓ(θ) for q(z) = p(z | x, θ).

In combination with the previous lemma, this implies that q(z) = p(z | x, θ) maximizes F(θ, q) for fixed θ, i.e., the optimal q is simply the conditional distribution of the latents given the data and that fixed θ. In the mixture model,

q^\star(z) = p(z \mid x, \theta) = \frac{p(z, x \mid \theta)}{p(x \mid \theta)} = \frac{\prod_{i=1}^n \pi_{z_i} f(x_i \mid \mu_{z_i})}{\sum_{z'} \prod_{i=1}^n \pi_{z'_i} f(x_i \mid \mu_{z'_i})} = \prod_{i=1}^n \frac{\pi_{z_i} f(x_i \mid \mu_{z_i})}{\sum_{k=1}^K \pi_k f(x_i \mid \mu_k)} = \prod_{i=1}^n p(z_i \mid x_i, \theta).
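The first lemma above is stated without proof on the slides; a standard argument applies Jensen's inequality to the concave logarithm. A sketch in the notation above:

```latex
\begin{align*}
\ell(\theta) &= \log p(x \mid \theta)
              = \log \sum_{z} q(z)\,\frac{p(x, z \mid \theta)}{q(z)} \\
             &\ge \sum_{z} q(z) \log \frac{p(x, z \mid \theta)}{q(z)}
              \qquad \text{(Jensen's inequality, since $\log$ is concave)} \\
             &= \mathbb{E}_q\!\left[\log p(x, z \mid \theta) - \log q(z)\right]
              = F(\theta, q).
\end{align*}
```

Equality holds precisely when p(x, z | θ)/q(z) is constant in z, i.e. when q(z) = p(z | x, θ), which is the content of the second lemma.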

EM Algorithm - Solving for θ

Setting the derivative with respect to μ_k to 0,

\nabla_{\mu_k} F(\theta, q) = \sum_z q(z) \sum_{i=1}^n \mathbf{1}(z_i = k)\, \nabla_{\mu_k} \log f(x_i \mid \mu_k) = \sum_{i=1}^n q(z_i = k)\, \nabla_{\mu_k} \log f(x_i \mid \mu_k) = 0.

This equation can often be solved quite easily. E.g., for a mixture of Gaussians,

\mu_k = \frac{\sum_{i=1}^n q(z_i = k)\, x_i}{\sum_{i=1}^n q(z_i = k)}.

If it cannot be solved exactly, we can use a gradient ascent algorithm:

\mu_k \leftarrow \mu_k + \alpha \sum_{i=1}^n q(z_i = k)\, \nabla_{\mu_k} \log f(x_i \mid \mu_k).

A similar derivation gives the optimal π_k as before.

The EM Algorithm

Start with some initial parameters (π_k^{(0)}, μ_k^{(0)})_{k=1}^K and iterate for t = 1, 2, ...:

Expectation Step:
q^{(t)}(z_i = k) := p(z_i = k \mid x_i, \theta^{(t-1)}) = \frac{\pi_k^{(t-1)} f(x_i \mid \mu_k^{(t-1)})}{\sum_{j=1}^K \pi_j^{(t-1)} f(x_i \mid \mu_j^{(t-1)})}

Maximization Step:
\pi_k^{(t)} = \frac{1}{n}\sum_{i=1}^n q^{(t)}(z_i = k), \qquad \mu_k^{(t)} = \frac{\sum_{i=1}^n q^{(t)}(z_i = k)\, x_i}{\sum_{i=1}^n q^{(t)}(z_i = k)}

Theorem. The EM algorithm monotonically increases the log likelihood.
Proof: \ell(\theta^{(t-1)}) = F(\theta^{(t-1)}, q^{(t)}) \le F(\theta^{(t)}, q^{(t)}) \le F(\theta^{(t)}, q^{(t+1)}) = \ell(\theta^{(t)}).

Under the additional assumption that the Hessians \nabla^2_\theta F(\theta^{(t)}, q^{(t)}) are negative definite with eigenvalues bounded above by −ε < 0, the iterates θ^{(t)} converge to θ*, where θ* is a local MLE.

Notes on Probabilistic Approach and EM Algorithm

Some good things:
- Guaranteed convergence to locally optimal parameters.
- Formal reasoning about uncertainties, using both Bayes' theorem and maximum likelihood theory.
- The rich language of probability theory lets us express a wide range of generative models, with straightforward derivation of algorithms for ML estimation.

Some bad things:
- Can get stuck in local optima, so multiple restarts are recommended.
- Slower and more expensive than K-means.
- The choice of K is still problematic, but a rich array of methods for model selection comes to the rescue.

Flexible Gaussian Mixtures

We can allow each cluster to have its own mean and covariance structure to enable greater flexibility in the model (an M-step update for the most flexible case is sketched below). Common choices, shown in the figure panels:
- different covariances;
- identical covariances;
- different, but diagonal, covariances;
- identical and spherical covariances.
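For the flexible variants listed above, only the M-step changes. The sketch below shows the standard weighted-moment updates when every cluster has its own mean and full covariance matrix; these updates are not derived on the slides, and the responsibilities Q are assumed to have been computed in the E-step as before.

```python
import numpy as np

def m_step_full_covariance(X, Q):
    """M-step updates when each component has its own mean and full covariance:
       pi_k    = (1/n) sum_i Q_ik
       mu_k    = sum_i Q_ik x_i / sum_i Q_ik
       Sigma_k = sum_i Q_ik (x_i - mu_k)(x_i - mu_k)^T / sum_i Q_ik
    """
    n, p = X.shape
    Nk = Q.sum(axis=0)                              # (K,) effective counts
    pi = Nk / n
    mus = (Q.T @ X) / Nk[:, None]                   # (K, p) weighted means
    Sigmas = np.empty((len(Nk), p, p))
    for k in range(len(Nk)):
        Xc = X - mus[k]                             # centre the data at mu_k
        Sigmas[k] = (Q[:, k, None] * Xc).T @ Xc / Nk[k]   # weighted covariance
    return pi, mus, Sigmas
```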

Probabilistic PCA

A probabilistic model related to PCA has the following generative model: for i = 1, 2, ..., n:
- Let k < p be given. Let Y_i be a (latent) k-dimensional normally distributed random variable with zero mean and identity covariance:
  Y_i ~ N(0, I_k).
- We model the distribution of the i-th data point given Y_i as a p-dimensional normal:
  X_i ~ N(μ + L Y_i, σ² I),
  where the parameters are a vector μ ∈ R^p, a matrix L ∈ R^{p×k} and σ² > 0.

[Figures from M. Sahani's UCL course on Unsupervised Learning: the PPCA latents, the latent prior, the noise model, the PCA projection, and the principal subspace.]
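A short sampling sketch for this generative model (function name, dimensions and parameter values are illustrative, not from the slides):

```python
import numpy as np

def sample_ppca(n, mu, L, sigma, rng=None):
    """Sample n points from the probabilistic PCA model:
       Y_i ~ N(0, I_k),  X_i | Y_i ~ N(mu + L Y_i, sigma^2 I_p)."""
    rng = np.random.default_rng(rng)
    p, k = L.shape
    Y = rng.standard_normal((n, k))                          # latent coordinates
    X = mu + Y @ L.T + sigma * rng.standard_normal((n, p))   # observed data
    return X, Y

# Illustrative example: k = 2 latent dimensions embedded in p = 5 dimensions
rng = np.random.default_rng(0)
L = rng.standard_normal((5, 2))
X, Y = sample_ppca(n=1000, mu=np.zeros(5), L=L, sigma=0.1, rng=1)
```

Marginally, X_i ~ N(μ, LLᵀ + σ²I), so PPCA parameterises a full p × p covariance using only the pk entries of L plus σ², which is what makes it attractive as a building block inside mixtures below.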

Mixture of Probabilistic PCAs

We have learnt two types of unsupervised learning techniques:
- Dimensionality reduction, e.g. PCA, MDS, Isomap.
- Clustering, e.g. K-means, linkage and mixture models.

Probabilistic models allow us to construct more complex models from simpler pieces. A mixture of probabilistic PCAs allows both clustering and dimensionality reduction at the same time:

Z_i ~ Discrete(π_1, ..., π_K)
Y_i ~ N(0, I_d)
X_i | Z_i = k, Y_i = y_i ~ N(μ_k + L_k y_i, σ² I_p)

This allows flexible modelling of covariance structure without using too many parameters (Ghahramani and Hinton, 1996).

[Figures from M. Sahani's UCL course on Unsupervised Learning: the PPCA latents, posterior, noise model, latent prior, projection, and principal subspace.]

Further Reading: Unsupervised Learning
- Hastie et al, Chapter 14.
- James et al, Chapter 10.
- Ripley, Chapter 9.
- Tukey, John W. (1980). We need both exploratory and confirmatory. The American Statistician 34(1).
