CSIE/GINM, NTU 2009/11/30
1 Introduction to Machine Learning (Part 1: Statistical Machine Learning). Shou-de Lin, CSIE/GINM, NTU, 2009/11/30
2 Syllabus of an Intro ML course (Machine Learning, Andrew Ng, Stanford, Autumn 2009). Supervised learning (7 classes): supervised learning setup, LMS, logistic regression, perceptron, exponential family, generative learning algorithms, Gaussian discriminant analysis, Naive Bayes, support vector machines, model selection and feature selection, ensemble methods (bagging, boosting, ECOC), evaluating and debugging learning algorithms. Learning theory (3 classes): bias/variance tradeoff, union and Chernoff/Hoeffding bounds, VC dimension, worst-case (online) learning, practical advice on how to use learning algorithms. Unsupervised learning (5 classes): clustering, K-means, EM, mixture of Gaussians, factor analysis, PCA, MDS, pPCA, independent components analysis (ICA). Reinforcement learning and control (4 classes): MDPs, Bellman equations, value iteration and policy iteration, linear quadratic regulation (LQR), LQG, Q-learning, value function approximation, policy search, REINFORCE, POMDPs. HT has done a great job teaching you Advanced SL and Learning Theory, and my mission is to fill one missing piece in the puzzle.
3 Why teach an Intro to ML? When you reveal that you have taken an ML course, people will more or less expect you to already know something, e.g. Naive Bayes. There are some ML methods so commonly applied in research and the real world that you need to know a little bit about them, e.g. K-means clustering. And there are some ML methods that are too unbelievable and amazing to ignore, e.g. the EM framework.
4 To Bring You Back to Earth. Statistical machine learning (2 hours): a Bayesian view of ML; generative learning models; Gaussian discriminant analysis; Naive Bayes. Unsupervised learning (3 hours): clustering, K-means, EM. Reinforcement learning (0.5 hour): value iteration and policy iteration; Q-learning and SARSA.
5 Theoretical ML vs. Statistical ML. What you already know: SL takes many (x, t) pairs as input to train a learner f(x), then applies it to an unseen x_k and predicts f(x_k). For example (x is 3-dimensional): training {([1,2,3], 0.1), ([2,3,4], 0.2), ([3,4,5], 0.5)}; testing: [2,4,5] → 0.7. However, uncertainty exists in the real world, so an error distribution (e.g. Gaussian) is usually added: t = f(x) + error. That is, the same input can generate different outputs, for example: training {([1,2,3], 0.1), ([1,2,3], 0.2), ([1,2,3], 0.1)}; testing: [1,2,3] → ?
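The noisy-target idea above can be sketched in a few lines of Python; the underlying function f and the precision value below are invented for illustration, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # hypothetical underlying function of a 3-dimensional input
    return 0.1 * x.sum()

x = np.array([1.0, 2.0, 3.0])
beta = 100.0                    # precision = 1/variance of the error term
sigma = 1.0 / np.sqrt(beta)

# the same input x generates different targets because of the noise term:
# t = f(x) + error, with error ~ N(0, 1/beta)
targets = np.array([f(x) + rng.normal(0.0, sigma) for _ in range(3)])
```

Each draw gives a slightly different t for the identical input, which is exactly why the output must be modeled as a distribution rather than a single value.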
6 The Probabilistic Form of t. The output t is a distribution caused by the (assumed Gaussian) error term: p(t|x, w, β) = N(t | y(x,w), β⁻¹). β is called the precision parameter and equals the inverse of the variance, 1/σ².
7 The SL Process under Probability. Given training data {X, T}, we want to determine the unknown parameters W and β, so that we know the distribution of t. Assuming we observed N data points, the likelihood function is p(T|X,W,β) = p(t_1|x_1,W,β) · … · p(t_N|x_N,W,β) = Π_{n=1}^N N(t_n | y(x_n,W), β⁻¹), and its logarithm, called the log-likelihood function, is ln p(T|X,W,β) = −(β/2) Σ_{n=1}^N {y(x_n,W) − t_n}² + (N/2) ln β − (N/2) ln(2π).
8 Maximum Likelihood Estimation (MLE). Idea: adjust the unknown parameters (i.e. W and β) to maximize the likelihood, or equivalently the log-likelihood, ln p(T|X,W,β) = −(β/2) Σ_{n=1}^N {y(x_n,W) − t_n}² + (N/2) ln β − (N/2) ln(2π). Adjusting W to maximize this log-likelihood under a Gaussian error model is equivalent to finding the W_ML that minimizes the mean-square-error function.
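As a minimal sketch of this equivalence: for a model that is linear in its parameters, the W_ML that maximizes the Gaussian likelihood is just the least-squares solution. The data-generating coefficients below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic training data: y(x, W) = w0 + w1*x, targets with Gaussian noise
x = rng.uniform(-1, 1, size=50)
t = 2.0 + 3.0 * x + rng.normal(0.0, 0.1, size=50)

Phi = np.column_stack([np.ones_like(x), x])   # design matrix

# W_ML minimizes the sum-of-squares error, which under Gaussian noise
# is the same as maximizing the (log-)likelihood
W_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
```

With enough data the recovered coefficients approach the true [2.0, 3.0].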
9 Maximum Likelihood Estimation for β. First we calculate W_ML, which governs the mean of the distribution. Then we plug W_ML into the likelihood function to determine the optimal β_ML: setting ∂/∂β ln p(T|X,W_ML,β) = 0 gives 1/β_ML = (1/N) Σ_{n=1}^N {y(x_n,W_ML) − t_n}².
10 An SL System Using MLE. 1. First determine W as the W_ML that minimizes the error function (1/2) Σ_{n=1}^N {y(x_n,w) − t_n}² (this tends to overfit). 2. Use W_ML to find β via 1/β_ML = (1/N) Σ_{n=1}^N {y(x_n,W_ML) − t_n}². 3. Prediction stage: use W_ML and β_ML to construct the distribution of t: p(t|x,W_ML,β_ML) = N(t | y(x,W_ML), β_ML⁻¹). 4. Predict the value for an input x by sampling t from the distribution in (3). The MLE approach consistently underestimates the variance of the data and can lead to overfitting.
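The four steps above can be sketched end to end, again assuming a linear model; the target function and noise level are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# step 0: synthetic noisy training data from a hypothetical linear target
x = rng.uniform(-1, 1, size=200)
t = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=200)
Phi = np.column_stack([np.ones_like(x), x])

# step 1: W_ML minimizes the sum-of-squares error
W_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)

# step 2: 1/beta_ML is the mean squared residual around y(x, W_ML)
residuals = t - Phi @ W_ml
beta_ml = 1.0 / np.mean(residuals ** 2)

# steps 3-4: predict a new x by sampling from N(t | y(x, W_ML), 1/beta_ML)
x_new = 0.5
mean = W_ml[0] + W_ml[1] * x_new
sample = rng.normal(mean, 1.0 / np.sqrt(beta_ml))
```

Here 1/sqrt(beta_ml) recovers roughly the true noise level 0.5, and repeated sampling in step 4 yields the spread of plausible targets at x_new.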
11 Bayesian Approach for Regression. Why a Bayesian approach: some w's are preferable to others. For example, regularization prefers simple models (i.e. small w's). Consequently, p(w) cannot be treated as uniformly distributed.
12 Bayes Rule Review. P(W|X,T) = P(T|X,W) · P(W|X) / P(T|X) ∝ P(T|X,W) · P(W|X). P(W|X) is the prior probability; P(T|X,W) is the likelihood probability (what MLE tries to optimize: argmax_W P(T|X,W)); P(W|X,T) is the posterior probability.
13 Bayesian Curve Fitting. P(W|X,T) ∝ P(T|X,W) · P(W|X). Likelihood probability (we have already done this): ln p(T|X,W,β) = −(β/2) Σ_{n=1}^N {y(x_n,W) − t_n}² + (N/2) ln β − (N/2) ln(2π). Prior: assume W is independent of X and Gaussian with mean 0 and variance 1/α: p(W|X) = (α/2π)^{(M+1)/2} exp(−(α/2) WᵀW). Then the log posterior is proportional to −(β/2) Σ_{n=1}^N {y(x_n,W) − t_n}² + (N/2)(ln β − ln 2π) + ((M+1)/2)(ln α − ln 2π) − (α/2) WᵀW.
14 Maximum Posterior Estimation (MAP). The best parameter set should maximize the posterior probability instead of the likelihood: −(β/2) Σ_{n=1}^N {y(x_n,W) − t_n}² + (N/2)(ln β − ln 2π) + ((M+1)/2)(ln α − ln 2π) − (α/2) WᵀW. The MAP solution for Gaussian noise and a Gaussian prior is to find the W that minimizes (β/2) Σ_{n=1}^N {y(x_n,W) − t_n}² + (α/2) WᵀW. Maximizing the posterior distribution is therefore equivalent to minimizing the regularized sum-of-squares error function with regularization parameter λ = α/β.
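A minimal sketch of the MAP solution for a linear model: with λ = α/β, W_MAP is the ridge-regression solution of the regularized normal equations. The data and hyperparameters below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# synthetic data from a hypothetical linear target
x = rng.uniform(-1, 1, size=50)
t = 3.0 * x + rng.normal(0.0, 0.1, size=50)
Phi = np.column_stack([np.ones_like(x), x])

alpha, beta = 1.0, 100.0
lam = alpha / beta            # regularization parameter lambda = alpha/beta

# W_MAP minimizes (beta/2)||Phi W - t||^2 + (alpha/2)||W||^2,
# i.e. the regularized sum-of-squares (ridge regression) solution
W_map = np.linalg.solve(lam * np.eye(2) + Phi.T @ Phi, Phi.T @ t)
```

Because of the Gaussian prior, W_MAP is shrunk toward zero relative to the pure-MLE (least-squares) solution.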
15 What We Have Discussed So Far. 1. Learning phase (MLE or MAP): find the W_ML that maximizes the likelihood function p(T|X,W), i.e. the W that minimizes the squared-error loss function; or find the W_MAP that maximizes the posterior function P(W|T,X), i.e. the W that minimizes the regularized sum-of-squares loss function. 2. Inference phase: when a new x comes in, use the determined W to predict the output y.
16 Potential Issues. The problem of MLE: overfitting. The problem of MAP: it loses information, since the full posterior P(W|X,T) is collapsed to the single point W_MAP. (Figure: three posteriors P(W|X,T) with the same W_MAP but different shapes.) Since in MAP we have learned P(W|X,T), why not use the theorem of total probability: p(t|x,X,T) = ∫ p(t|x,W) · p(W|X,T) dW, where p(t|x,W) = N(t | y(x,W), β⁻¹).
17 The Predictive Distribution of t. p(t|x,X,T) = ∫ p(t|x,W) · p(W|X,T) dW, where p(t|x,W) = N(t | y(x,W), β⁻¹). It can be proved that when the posterior and p(t|x,W) are Gaussian, the predictive distribution p(t|x,X,T) is also Gaussian, with mean m(x) and variance s²(x).
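For a linear-in-parameters model with a Gaussian prior, m(x) and s²(x) have a closed form: the posterior over W is N(m_N, S_N) with S_N = (αI + βΦᵀΦ)⁻¹ and m_N = βS_NΦᵀT, giving m(x) = m_Nᵀφ(x) and s²(x) = 1/β + φ(x)ᵀS_Nφ(x). The sketch below uses these standard Bayesian linear-regression formulas; the cubic basis, target function, and hyperparameters are invented:

```python
import numpy as np

rng = np.random.default_rng(4)

# synthetic curve-fitting data
x = rng.uniform(-1, 1, size=30)
t = np.sin(np.pi * x) + rng.normal(0.0, 0.2, size=30)
Phi = np.column_stack([x ** k for k in range(4)])   # cubic polynomial basis

alpha, beta = 0.5, 25.0   # assumed prior precision and noise precision

# posterior over W is Gaussian: N(W | m_N, S_N)
S_N = np.linalg.inv(alpha * np.eye(4) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t

def predictive(x_new):
    """Mean m(x) and variance s^2(x) of p(t | x, X, T)."""
    phi = np.array([x_new ** k for k in range(4)])
    mean = m_N @ phi
    var = 1.0 / beta + phi @ S_N @ phi   # noise + parameter uncertainty
    return mean, var

mean, var = predictive(0.0)
```

Note that s²(x) is always strictly larger than the noise floor 1/β: the extra term φᵀS_Nφ is the uncertainty contributed by not knowing W exactly.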
18 Example of a Predictive Distribution. Green: true function. Red line: mean of the predicted function. Red zone: one variance from the mean.
19 y(x,w) from Sampling the Posterior Distribution over w.
20 The Benefit of Statistical Learning. It can produce not only the output but also the distribution of the outputs. The distribution tells us more about the data, including how confident the system is about its prediction. It can also be used to generate the dataset.
21 We have talked about regression, so how about classification?
22 Two Classification Strategies. Strategy 1: two-stage methods. Classification is broken down into two stages. Inference stage: for each C_k, use its own training data to learn a model for p(C_k|x). Decision stage: use p(C_k|x) and the loss matrix to make the optimal class assignment. Strategy 2: one-shot methods (or discriminant models). Use all training data to learn a function that directly maps inputs x to the output class.
23 Two Models for Strategy 1 (1/2). Model 1: Generative model. First solve the inference problem of determining p(x|C_k) for each class C_k individually. Separately infer the prior class probabilities p(C_k). Use Bayes' theorem to find the posterior class probabilities p(C_k|x) = p(x|C_k) · p(C_k) / p(x); note that the denominator can be computed as p(x) = Σ_k p(x|C_k) p(C_k). Finally, use p(C_k|x) and decision theory to find the best class assignment. This is called a generative model since we can learn p(x) and p(C_k, x).
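The generative-model steps above can be sketched directly; the two 1-D class-conditional Gaussians and the priors below are invented toy numbers:

```python
from math import exp, pi, sqrt

def gauss(x, mu, var):
    # density of a 1-D Gaussian, playing the role of p(x | C_k)
    return exp(-(x - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)

priors = {"C1": 0.7, "C2": 0.3}                 # p(C_k), inferred separately
params = {"C1": (0.0, 1.0), "C2": (3.0, 1.0)}   # (mean, variance) per class

def posterior(x):
    # Bayes' theorem: p(C_k | x) = p(x | C_k) p(C_k) / p(x),
    # with the denominator p(x) = sum_k p(x | C_k) p(C_k)
    joint = {k: gauss(x, *params[k]) * priors[k] for k in priors}
    p_x = sum(joint.values())
    return {k: v / p_x for k, v in joint.items()}

post = posterior(0.0)
```

At x = 0 the point sits on C1's mean, so p(C1|x) dominates; the posteriors always sum to one because of the p(x) normalizer.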
24 Two Approaches for Strategy 1 (2/2). Model 2: Discriminative model. Directly learn p(C_k|x) from the data, knowing nothing about p(x|C_k) and p(x). Logistic regression is a typical example.
25 Classification Models. Generative model: learn P(C_k|x) using Bayes' rule. First solve the inference problem of determining p(x|C_k) and p(C_k) for each class C_k individually, then use Bayes' rule to find the posterior class probabilities p(C_k|x). Discriminative model: learn P(C_k|x) directly from the data, then apply decision theory to decide which C_k is the best assignment for x (e.g. logistic regression). Discriminant model: learn a function that directly maps inputs x to the output class. Linear discriminant functions: learn linear functions to separate the classes, e.g. least squares, Fisher's linear discriminant, the perceptron algorithm.
26 Generative vs. Discriminative Models. Generative model. Pros: p(x) can be used to generate samples of inputs, which is useful for knowledge discovery and data mining (e.g. outlier detection and novelty detection). Cons: very demanding, since it has to find the joint distribution of C_k and x; it needs a lot of training data. Discriminative model. Pros: can be learned with less data. Cons: cannot learn the detailed structure of the data.
27 Generative vs. Discriminant Models (1/3). A discriminant approach learns a discriminant function and uses it for decision making; it does not learn P(C_k|x). However, P(C_k|x) is useful in many respects. 1. It can be combined with the cost function to produce the final decision; if the cost function changes, we do not need to retrain the whole model as a discriminant model does. 2. It can be used to determine the reject region, e.g. P(C_HT|x) = 0.1, P(C_PJ|x) = 0.05 versus P(C_HT|x) = 0.7, P(C_PJ|x) = 0.2.
28 Generative vs. Discriminant Models (2/3). A generative model takes care of the class prior P(y) explicitly. E.g. in cancer prediction, only a small amount of the data (e.g. 0.1%) are positive; a normal classifier will simply guess negative and receive 99.9% accuracy. Using P(C_k|x) and P(C_k) allows us to ignore the interference from the prior during learning.
29 Generative vs. Discriminant Models (3/3). Generative models are better in terms of combining several models. Assume in the previous example we have two types of information for each photo: the image features (X_i) and the social information (X_s). It might be more effective and meaningful to build separate models P(C_k|X_i) and P(C_k|X_s) for these two sets of features. A generative model allows us to combine them: p(C_k|x_i,x_s) ∝ P(x_i,x_s|C_k) P(C_k) ≈ P(x_i|C_k) P(x_s|C_k) P(C_k) (by the naive Bayes assumption) ∝ P(C_k|x_i) P(C_k|x_s) / P(C_k).
30 Naive Bayes Assumption. Recall that in the Bayesian setup we have p(C_k|x) = p(x|C_k) p(C_k) / p(x). If we assume the features of an instance are independent given the class (conditionally independent), then P(X|C) = P(X_1, X_2, …, X_n | C) = Π_{i=1}^n P(X_i|C). Therefore we only need to know P(X_i|C) for each possible pair of a feature value and a class. If C and all X_i are binary, this requires specifying only 2n parameters: P(X_i=true|C=true) and P(X_i=true|C=false) for each X_i, with P(X_i=false|C) = 1 − P(X_i=true|C). Compare this to specifying on the order of 2^n parameters without any independence assumptions.
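A minimal sketch of a binary naive Bayes classifier that stores exactly the 2n conditional parameters; the tiny dataset is invented, and a real implementation would smooth the zero counts (e.g. Laplace smoothing):

```python
import numpy as np

# toy binary data: rows are instances, columns are n = 3 binary features
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]])
y = np.array([1, 1, 0, 0])   # binary class labels

# the naive Bayes assumption needs only 2n conditional parameters:
# P(X_i = 1 | C = c) for each feature i and each class c
p_x_given_c = {c: X[y == c].mean(axis=0) for c in (0, 1)}
prior = {c: np.mean(y == c) for c in (0, 1)}

def predict(x):
    scores = {}
    for c in (0, 1):
        p = p_x_given_c[c]
        # product over features of P(x_i | c), times the class prior P(c)
        likelihood = np.prod(np.where(x == 1, p, 1 - p))
        scores[c] = likelihood * prior[c]
    return max(scores, key=scores.get)

pred = predict(np.array([1, 1, 0]))
```

Here `p_x_given_c[c]` holds the n per-feature probabilities for class c, so the whole model is 2n numbers plus the priors, versus an exponential table for the full joint.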
31 Gaussian Discriminant Analysis (GDA). This is another generative model. GDA assumes p(x|y) is distributed according to a multivariate normal distribution (MND). An MND in n dimensions is parameterized by a mean vector μ ∈ Rⁿ and a covariance matrix Σ ∈ Rⁿˣⁿ, also written N(μ, Σ). Its density is p(x; μ, Σ) = (2π)^{−n/2} |Σ|^{−1/2} exp(−(1/2)(x−μ)ᵀ Σ⁻¹ (x−μ)).
32 Examples of 2-D multivariate normal distributions with Σ = I, Σ = 0.6I, and Σ = 2I.
33 The Model for GDA (1/2). p(x|y) is MND; p(y=0) = φ, p(y=1) = 1 − φ (assuming the different values of y share the same Σ). The log-likelihood of the data is the log of the joint likelihood, ℓ(φ, μ_0, μ_1, Σ) = log Π_i p(x^{(i)} | y^{(i)}; μ_0, μ_1, Σ) · p(y^{(i)}; φ).
34 The Model for GDA (2/2). Using the maximum likelihood estimate (MLE), we can obtain closed-form solutions: φ is the empirical class frequency, μ_0 and μ_1 are the per-class sample means of x, and Σ is the pooled sample covariance of x around its class mean.
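The closed-form estimates can be sketched on synthetic data; here the convention φ = p(y=1) is used, and the true class means and shared covariance are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

# synthetic 2-D data: p(x | y) Gaussian with class-dependent mean, shared cov
n = 500
y = (rng.uniform(size=n) < 0.5).astype(int)
mu_true = np.array([[0.0, 0.0], [3.0, 3.0]])
X = mu_true[y] + rng.normal(0.0, 1.0, size=(n, 2))

# closed-form MLE of the GDA parameters
phi = y.mean()                                 # empirical p(y = 1)
mu0 = X[y == 0].mean(axis=0)                   # per-class sample means
mu1 = X[y == 1].mean(axis=0)
centered = X - np.where(y[:, None] == 1, mu1, mu0)
Sigma = centered.T @ centered / n              # pooled shared covariance
```

No iterative optimization is needed: unlike logistic regression, the GDA likelihood is maximized by these sample statistics directly.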
35 Discussion: GDA vs. Logistic Regression. In GDA, p(y|x) is of the form 1/(1+exp(−θᵀx)), where θ is a function of φ, Σ, and μ. This is exactly the form logistic regression uses to model p(y|x). That is, if p(x|y) is multivariate Gaussian, then p(y|x) follows a logistic function; however, the converse is not true. This implies that GDA makes stronger modeling assumptions about the data than LR does. Trained on the same dataset, the two algorithms will produce different decision boundaries. If p(x|y) is indeed Gaussian, GDA will get better results. In particular, if x is some sort of mean value of something whose size is not small, then by the central limit theorem GDA should perform very well. If p(x|y=1) and p(x|y=0) are both Poisson, then P(y|x) is still logistic; in this case LR can work better than GDA. If we are sure the data is non-Gaussian, we should use LR rather than GDA.
10-701/ Machine Learning Mid-term Exam Solution
0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it
More informationOutline. CSCI-567: Machine Learning (Spring 2019) Outline. Prof. Victor Adamchik. Mar. 26, 2019
Outlie CSCI-567: Machie Learig Sprig 209 Gaussia mixture models Prof. Victor Adamchik 2 Desity estimatio U of Souther Califoria Mar. 26, 209 3 Naive Bayes Revisited March 26, 209 / 57 March 26, 209 2 /
More informationRegression and generalization
Regressio ad geeralizatio CE-717: Machie Learig Sharif Uiversity of Techology M. Soleymai Fall 2016 Curve fittig: probabilistic perspective Describig ucertaity over value of target variable as a probability
More informationClustering. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar.
Clusterig CM226: Machie Learig for Bioiformatics. Fall 216 Sriram Sakararama Ackowledgmets: Fei Sha, Ameet Talwalkar Clusterig 1 / 42 Admiistratio HW 1 due o Moday. Email/post o CCLE if you have questios.
More informationNaïve Bayes. Naïve Bayes
Statistical Data Miig ad Machie Learig Hilary Term 206 Dio Sejdiovic Departmet of Statistics Oxford Slides ad other materials available at: http://www.stats.ox.ac.uk/~sejdiov/sdmml : aother plug-i classifier
More informationThe Bayesian Learning Framework. Back to Maximum Likelihood. Naïve Bayes. Simple Example: Coin Tosses. Given a generative model
Back to Maximum Likelihood Give a geerative model f (x, y = k) =π k f k (x) Usig a geerative modellig approach, we assume a parametric form for f k (x) =f (x; k ) ad compute the MLE θ of θ =(π k, k ) k=
More informationExpectation-Maximization Algorithm.
Expectatio-Maximizatio Algorithm. Petr Pošík Czech Techical Uiversity i Prague Faculty of Electrical Egieerig Dept. of Cyberetics MLE 2 Likelihood.........................................................................................................
More informationECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015
ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],
More informationOutline. Linear regression. Regularization functions. Polynomial curve fitting. Stochastic gradient descent for regression. MLE for regression
REGRESSION 1 Outlie Liear regressio Regularizatio fuctios Polyomial curve fittig Stochastic gradiet descet for regressio MLE for regressio Step-wise forward regressio Regressio methods Statistical techiques
More informationCS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5
CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio
More informationMixtures of Gaussians and the EM Algorithm
Mixtures of Gaussias ad the EM Algorithm CSE 6363 Machie Learig Vassilis Athitsos Computer Sciece ad Egieerig Departmet Uiversity of Texas at Arligto 1 Gaussias A popular way to estimate probability desity
More informationClassification with linear models
Lecture 8 Classificatio with liear models Milos Hauskrecht milos@cs.pitt.edu 539 Seott Square Geerative approach to classificatio Idea:. Represet ad lear the distributio, ). Use it to defie probabilistic
More information15-780: Graduate Artificial Intelligence. Density estimation
5-780: Graduate Artificial Itelligece Desity estimatio Coditioal Probability Tables (CPT) But where do we get them? P(B)=.05 B P(E)=. E P(A B,E) )=.95 P(A B, E) =.85 P(A B,E) )=.5 P(A B, E) =.05 A P(J
More information6.867 Machine learning
6.867 Machie learig Mid-term exam October, ( poits) Your ame ad MIT ID: Problem We are iterested here i a particular -dimesioal liear regressio problem. The dataset correspodig to this problem has examples
More informationBoosting. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 1, / 32
Boostig Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machie Learig Algorithms March 1, 2017 1 / 32 Outlie 1 Admiistratio 2 Review of last lecture 3 Boostig Professor Ameet Talwalkar CS260
More informationIntroduction to Artificial Intelligence CAP 4601 Summer 2013 Midterm Exam
Itroductio to Artificial Itelligece CAP 601 Summer 013 Midterm Exam 1. Termiology (7 Poits). Give the followig task eviromets, eter their properties/characteristics. The properties/characteristics of the
More informationExponential Families and Bayesian Inference
Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where
More informationECE 901 Lecture 12: Complexity Regularization and the Squared Loss
ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality
More information1 Review of Probability & Statistics
1 Review of Probability & Statistics a. I a group of 000 people, it has bee reported that there are: 61 smokers 670 over 5 960 people who imbibe (drik alcohol) 86 smokers who imbibe 90 imbibers over 5
More informationStatistical Pattern Recognition
Statistical Patter Recogitio Classificatio: No-Parametric Modelig Hamid R. Rabiee Jafar Muhammadi Sprig 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Ageda Parametric Modelig No-Parametric Modelig
More informationLecture 22: Review for Exam 2. 1 Basic Model Assumptions (without Gaussian Noise)
Lecture 22: Review for Exam 2 Basic Model Assumptios (without Gaussia Noise) We model oe cotiuous respose variable Y, as a liear fuctio of p umerical predictors, plus oise: Y = β 0 + β X +... β p X p +
More informationLecture 11 and 12: Basic estimation theory
Lecture ad 2: Basic estimatio theory Sprig 202 - EE 94 Networked estimatio ad cotrol Prof. Kha March 2 202 I. MAXIMUM-LIKELIHOOD ESTIMATORS The maximum likelihood priciple is deceptively simple. Louis
More informationMachine Learning Regression I Hamid R. Rabiee [Slides are based on Bishop Book] Spring
Machie Learig Regressio I Hamid R. Rabiee [Slides are based o Bishop Book] Sprig 015 http://ce.sharif.edu/courses/93-94//ce717-1 Liear Regressio Liear regressio: ivolves a respose variable ad a sigle predictor
More informationMATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4
MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.
More informationLecture 2: Monte Carlo Simulation
STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?
More informationLecture 4. Hw 1 and 2 will be reoped after class for every body. New deadline 4/20 Hw 3 and 4 online (Nima is lead)
Lecture 4 Homework Hw 1 ad 2 will be reoped after class for every body. New deadlie 4/20 Hw 3 ad 4 olie (Nima is lead) Pod-cast lecture o-lie Fial projects Nima will register groups ext week. Email/tell
More informationEECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1
EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 12
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig
More informationHypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance
Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?
More informationMachine Learning Brett Bernstein
Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio
More informationIntroductory statistics
CM9S: Machie Learig for Bioiformatics Lecture - 03/3/06 Itroductory statistics Lecturer: Sriram Sakararama Scribe: Sriram Sakararama We will provide a overview of statistical iferece focussig o the key
More informationProbability and MLE.
10-701 Probability ad MLE http://www.cs.cmu.edu/~pradeepr/701 (brief) itro to probability Basic otatios Radom variable - referrig to a elemet / evet whose status is ukow: A = it will rai tomorrow Domai
More informationStatistical and Mathematical Methods DS-GA 1002 December 8, Sample Final Problems Solutions
Statistical ad Mathematical Methods DS-GA 00 December 8, 05. Short questios Sample Fial Problems Solutios a. Ax b has a solutio if b is i the rage of A. The dimesio of the rage of A is because A has liearly-idepedet
More informationProblem Set 4 Due Oct, 12
EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationEE 6885 Statistical Pattern Recognition
EE 6885 Statistical Patter Recogitio Fall 5 Prof. Shih-Fu Chag http://www.ee.columbia.edu/~sfchag Lecture 6 (9/8/5 EE6887-Chag 6- Readig EM for Missig Features Textboo, DHS 3.9 Bayesia Parameter Estimatio
More informationIntroduction to Machine Learning DIS10
CS 189 Fall 017 Itroductio to Machie Learig DIS10 1 Fu with Lagrage Multipliers (a) Miimize the fuctio such that f (x,y) = x + y x + y = 3. Solutio: The Lagragia is: L(x,y,λ) = x + y + λ(x + y 3) Takig
More informationIntro to Learning Theory
Lecture 1, October 18, 2016 Itro to Learig Theory Ruth Urer 1 Machie Learig ad Learig Theory Comig soo 2 Formal Framework 21 Basic otios I our formal model for machie learig, the istaces to be classified
More informationTopics Machine learning: lecture 2. Review: the learning problem. Hypotheses and estimation. Estimation criterion cont d. Estimation criterion
.87 Machie learig: lecture Tommi S. Jaakkola MIT CSAIL tommi@csail.mit.edu Topics The learig problem hypothesis class, estimatio algorithm loss ad estimatio criterio samplig, empirical ad epected losses
More informationJanuary 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS
Jauary 25, 207 INTRODUCTION TO MATHEMATICAL STATISTICS Abstract. A basic itroductio to statistics assumig kowledge of probability theory.. Probability I a typical udergraduate problem i probability, we
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More information( θ. sup θ Θ f X (x θ) = L. sup Pr (Λ (X) < c) = α. x : Λ (x) = sup θ H 0. sup θ Θ f X (x θ) = ) < c. NH : θ 1 = θ 2 against AH : θ 1 θ 2
82 CHAPTER 4. MAXIMUM IKEIHOOD ESTIMATION Defiitio: et X be a radom sample with joit p.m/d.f. f X x θ. The geeralised likelihood ratio test g.l.r.t. of the NH : θ H 0 agaist the alterative AH : θ H 1,
More informationStep 1: Function Set. Otherwise, output C 2. Function set: Including all different w and b
Logistic Regressio Step : Fuctio Set We wat to fid P w,b C x σ z = + exp z If P w,b C x.5, output C Otherwise, output C 2 z P w,b C x = σ z z = w x + b = w i x i + b i z Fuctio set: f w,b x = P w,b C x
More informationStat 421-SP2012 Interval Estimation Section
Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible
More informationINF Introduction to classifiction Anne Solberg Based on Chapter 2 ( ) in Duda and Hart: Pattern Classification
INF 4300 90 Itroductio to classifictio Ae Solberg ae@ifiuioo Based o Chapter -6 i Duda ad Hart: atter Classificatio 90 INF 4300 Madator proect Mai task: classificatio You must implemet a classificatio
More informationBayesian Methods: Introduction to Multi-parameter Models
Bayesia Methods: Itroductio to Multi-parameter Models Parameter: θ = ( θ, θ) Give Likelihood p(y θ) ad prior p(θ ), the posterior p proportioal to p(y θ) x p(θ ) Margial posterior ( θ, θ y) is Iterested
More informationChapter 6 Principles of Data Reduction
Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a
More informationFactor Analysis. Lecture 10: Factor Analysis and Principal Component Analysis. Sam Roweis
Lecture 10: Factor Aalysis ad Pricipal Compoet Aalysis Sam Roweis February 9, 2004 Whe we assume that the subspace is liear ad that the uderlyig latet variable has a Gaussia distributio we get a model
More informationTAMS24: Notations and Formulas
TAMS4: Notatios ad Formulas Basic otatios ad defiitios X: radom variable stokastiska variabel Mea Vätevärde: µ = X = by Xiagfeg Yag kpx k, if X is discrete, xf Xxdx, if X is cotiuous Variace Varias: =
More informationStatistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.
Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized
More informationProbabilistic Unsupervised Learning
HT2015: SC4 Statistical Data Miig ad Machie Learig Dio Sejdiovic Departmet of Statistics Oxford http://www.stats.ox.ac.u/~sejdiov/sdmml.html Probabilistic Methods Algorithmic approach: Data Probabilistic
More informationLecture 7: Density Estimation: k-nearest Neighbor and Basis Approach
STAT 425: Itroductio to Noparametric Statistics Witer 28 Lecture 7: Desity Estimatio: k-nearest Neighbor ad Basis Approach Istructor: Ye-Chi Che Referece: Sectio 8.4 of All of Noparametric Statistics.
More informationSupport vector machine revisited
6.867 Machie learig, lecture 8 (Jaakkola) 1 Lecture topics: Support vector machie ad kerels Kerel optimizatio, selectio Support vector machie revisited Our task here is to first tur the support vector
More informationECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization
ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where
More informationA quick activity - Central Limit Theorem and Proportions. Lecture 21: Testing Proportions. Results from the GSS. Statistics and the General Population
A quick activity - Cetral Limit Theorem ad Proportios Lecture 21: Testig Proportios Statistics 10 Coli Rudel Flip a coi 30 times this is goig to get loud! Record the umber of heads you obtaied ad calculate
More informationMachine Learning.
10-701 Machie Learig http://www.cs.cmu.edu/~epxig/class/10701-15f/ Orgaizatioal ifo All up-to-date ifo is o the course web page (follow liks from my page). Istructors - Eric Xig - Ziv Bar-Joseph TAs: See
More informationCSE 527, Additional notes on MLE & EM
CSE 57 Lecture Notes: MLE & EM CSE 57, Additioal otes o MLE & EM Based o earlier otes by C. Grat & M. Narasimha Itroductio Last lecture we bega a examiatio of model based clusterig. This lecture will be
More informationSTATS 200: Introduction to Statistical Inference. Lecture 1: Course introduction and polling
STATS 200: Itroductio to Statistical Iferece Lecture 1: Course itroductio ad pollig U.S. presidetial electio projectios by state (Source: fivethirtyeight.com, 25 September 2016) Pollig Let s try to uderstad
More informationAlgorithms for Clustering
CR2: Statistical Learig & Applicatios Algorithms for Clusterig Lecturer: J. Salmo Scribe: A. Alcolei Settig: give a data set X R p where is the umber of observatio ad p is the umber of features, we wat
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationThis exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.
Probability ad Statistics FS 07 Secod Sessio Exam 09.0.08 Time Limit: 80 Miutes Name: Studet ID: This exam cotais 9 pages (icludig this cover page) ad 0 questios. A Formulae sheet is provided with the
More informationChapter 12 EM algorithms The Expectation-Maximization (EM) algorithm is a maximum likelihood method for models that have hidden variables eg. Gaussian
Chapter 2 EM algorithms The Expectatio-Maximizatio (EM) algorithm is a maximum likelihood method for models that have hidde variables eg. Gaussia Mixture Models (GMMs), Liear Dyamic Systems (LDSs) ad Hidde
More informationMachine Learning. Logistic Regression -- generative verses discriminative classifier. Le Song /15-781, Spring 2008
Machie Learig 070/578 Srig 008 Logistic Regressio geerative verses discrimiative classifier Le Sog Lecture 5 Setember 4 0 Based o slides from Eric Xig CMU Readig: Cha. 3..34 CB Geerative vs. Discrimiative
More informationAsymptotics. Hypothesis Testing UMP. Asymptotic Tests and p-values
of the secod half Biostatistics 6 - Statistical Iferece Lecture 6 Fial Exam & Practice Problems for the Fial Hyu Mi Kag Apil 3rd, 3 Hyu Mi Kag Biostatistics 6 - Lecture 6 Apil 3rd, 3 / 3 Rao-Blackwell
More informationSolution of Final Exam : / Machine Learning
Solutio of Fial Exam : 10-701/15-781 Machie Learig Fall 2004 Dec. 12th 2004 Your Adrew ID i capital letters: Your full ame: There are 9 questios. Some of them are easy ad some are more difficult. So, if
More informationCS284A: Representations and Algorithms in Molecular Biology
CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by
Clustering: Mixture Models. Machine Learning 10-601B, Seyoung Kim. Many of these slides are derived from Tom Mitchell, Ziv Bar-Joseph, and Eric Xing. Thanks! Problem with K-means: hard assignment of samples into three ...
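The hard assignment that this excerpt flags as K-means' defining step looks like the following in one dimension. This is an illustrative sketch, not the lecture's code; the quantile-based initialization (which assumes k >= 2) and the toy data are arbitrary choices.

```python
def kmeans_1d(points, k, n_iter=20):
    # Lloyd's algorithm: hard-assign each point to its nearest centre,
    # then move each centre to the mean of the points assigned to it.
    pts = sorted(points)
    # Spread initial centres over the data range via quantiles (assumes k >= 2).
    centres = [pts[i * (len(pts) - 1) // (k - 1)] for i in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in pts:
            j = min(range(k), key=lambda c: (p - centres[c]) ** 2)
            clusters[j].append(p)          # hard (all-or-nothing) assignment
        centres = [sum(c) / len(c) if c else centres[j]
                   for j, c in enumerate(clusters)]
    return centres

data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1, 8.9, 9.0, 9.1]
centres = kmeans_1d(data, 3)
```

The mixture models of the title replace this hard assignment with soft responsibilities, which is exactly the E-step of EM.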
Machine Learning Theory (CS 6783). Lecture 2: Learning Frameworks, Examples. Setting up learning problems. X: instance space or input space. Examples: Computer Vision: raw M x N image vectorized, X = {0, ..., 255}^{M N}; SIFT ...
REGRESSION WITH QUADRATIC LOSS. MAXIM RAGINSKY. Regression with quadratic loss is another basic problem studied in statistical learning theory. We have a random couple Z = (X, Y), where, as before, X is an R^d ...
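Under quadratic loss, the best constant prediction for Y is its mean, which a brute-force search over candidate constants confirms. This is a toy numerical check, not code from these notes; the sample values are made up.

```python
# Under quadratic loss, the constant c minimizing E[(Y - c)^2] is c = E[Y].
ys = [1.0, 2.0, 2.0, 3.0, 7.0]

def risk(c):
    # Empirical quadratic risk of the constant predictor c.
    return sum((y - c) ** 2 for y in ys) / len(ys)

mean = sum(ys) / len(ys)
# Search a grid of candidates around the mean; the minimizer is the mean itself.
candidates = [mean + d / 10.0 for d in range(-20, 21)]
best = min(candidates, key=risk)
```

The same calculation with conditional distributions shows the regression function f(x) = E[Y | X = x] minimizes the quadratic risk.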
Axis-Aligned Ellipsoid. Machine Learning for Data Science (CS 4786), Lectures 6, 7 & 8: Ellipsoidal Clustering, Gaussian Mixture Models and General Mixture Models. The text in black outlines high-level ideas. The text in blue provides simple ...
Outline. L7: Probability Basics. CS 344R/393R: Robotics, Benjamin Kuipers. 1. Bayes Law. 2. Probability distributions. 3. Decisions under uncertainty. Bayes Law for Diagnosis: Which hypothesis to prefer? p(A,B) = p(B|A) p(A). Probability: for a proposition A, the probability p(A) is your degree ...
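The diagnosis use of Bayes' law, p(A|B) = p(B|A) p(A) / p(B), is easy to compute directly. The numbers below are illustrative, not from the lecture.

```python
def posterior(prior, sensitivity, false_positive_rate):
    # Bayes' law: P(disease | positive) = P(pos | disease) P(disease) / P(pos),
    # with P(pos) expanded by the law of total probability.
    p_pos = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_pos

# A rare condition (1% prior) with a fairly accurate test still yields a
# surprisingly low posterior, because false positives dominate.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
```

Here p is about 0.16: even a positive result from a 95%-sensitive test leaves the hypothesis "disease" less likely than "no disease".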
EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY. GRADUATE DIPLOMA, 2016. MODULE: Statistical Inference. Time allowed: three hours. Candidates should answer FIVE questions. All questions carry equal marks. The number ...
Lecture 15: Learning Theory: Concentration Inequalities. STAT 425: Introduction to Nonparametric Statistics, Winter 2018. Instructor: Yen-Chi Chen. 15.1 Introduction. Recall that in the lecture on classification, we have seen that ...
Agnostic Learning and Concentration Inequalities. ECE901 Spring 2004: Statistical Regularization and Learning Theory, Lecture 7. Lecturer: Rob Nowak. Scribe: Aravind Kailas. 1 Introduction. 1.1 Motivation. In the last lecture ...
STATISTICAL INFERENCE: INTRODUCTION. Statistical inference is that branch of Statistics in which one typically makes a statement about a population based upon the results of a sample. If, for instance, we were required to test whether the population mean μ could be equal to a certain value μ ... In one-sample testing, we essentially ...
Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen). MATH-252-01: Probability and Statistics II, Spring 2019. Contents: 1 Chi-Squared Tests with Known Probabilities; 1.1 Chi-Squared Testing ...
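A chi-squared statistic with known probabilities, X^2 = sum_i (O_i - E_i)^2 / E_i with E_i = n p_i, is a one-liner. This is a sketch in the spirit of the chapter, not its code; the die-roll counts are made up.

```python
def chi_squared_statistic(observed, probs):
    # X^2 = sum (O_i - E_i)^2 / E_i with expected counts E_i = n * p_i.
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

# 60 rolls of a (hypothetically fair) six-sided die.
obs = [12, 8, 10, 9, 11, 10]
stat = chi_squared_statistic(obs, [1 / 6] * 6)
```

The statistic is then compared to a chi-squared distribution with (number of categories - 1) = 5 degrees of freedom; a value near 1 here is entirely consistent with a fair die.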
Properties and Tests of Zeros of Polynomial Functions. The Remainder and Factor Theorems: synthetic division can be used to find the values of polynomials in a sometimes easier way than substitution. This is shown by ...
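Synthetic division, as the excerpt notes, divides and evaluates at once: by the Remainder Theorem the final carried value equals p(r). A minimal sketch (not from the handout; the example polynomial is an arbitrary choice):

```python
def synthetic_division(coeffs, r):
    # Divide p(x), given by coefficients in descending order, by (x - r).
    # Each step: bring down, multiply by r, add to the next coefficient.
    # Returns (quotient coefficients, remainder); the remainder equals p(r).
    out = [coeffs[0]]
    for c in coeffs[1:]:
        out.append(c + r * out[-1])
    return out[:-1], out[-1]

# p(x) = x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
q, rem = synthetic_division([1, -6, 11, -6], 2)
```

Since the remainder is 0, the Factor Theorem says (x - 2) is a factor, and the quotient coefficients give x^2 - 4x + 3.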
Class 23. Daniel B. Rowe, Ph.D., Department of Mathematics, Statistics, and Computer Science, Marquette University, MATH 1700. Copyright 2017 by D.B. Rowe. Agenda: Recap Chapter 9.1. Lecture Chapter 9.2. Review Exam 6 Problem Solving Session.
6.867 Machine learning, lecture 7 (Jaakkola). Lecture topics: kernel form of linear regression; kernels, examples, construction, properties. Linear regression and kernels: consider a slightly simpler model where we omit ...
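The kernel form of linear regression mentioned here predicts with f(x) = sum_i alpha_i k(x_i, x), where alpha solves (K + lambda I) alpha = y. The sketch below is self-contained and illustrative, not Jaakkola's code; the polynomial kernel, toy data, and regularizer lambda are arbitrary choices.

```python
def solve(A, b):
    # Gaussian elimination with partial pivoting, for small dense systems.
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def kernel_ridge_fit(xs, ys, kernel, lam=1e-6):
    # alpha = (K + lam*I)^{-1} y; prediction f(x) = sum_i alpha_i k(x_i, x).
    K = [[kernel(a, b) + (lam if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    return solve(K, ys)

k = lambda a, b: (1.0 + a * b) ** 2       # degree-2 polynomial kernel
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]                  # target function: f(x) = x^2
alpha = kernel_ridge_fit(xs, ys, k)
pred = sum(a * k(x, 1.5) for a, x in zip(alpha, xs))
```

Because x^2 lies in the feature space of the degree-2 polynomial kernel, the prediction at x = 1.5 recovers 2.25 up to the tiny regularization.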
Lecture 3: MLE and Regression. STAT/Q SCI 403: Introduction to Resampling Methods, Spring 2017. Instructor: Yen-Chi Chen. 3.1 Parameters and Distributions. Some distributions are indexed by their underlying parameters. Thus, ...
ENGI 4421 Confidence Intervals (Two Samples), Page 12-01. Two-Sample Confidence Interval for a Difference in Population Means [Navidi sections 5.4-5.7; Devore chapter 9]. From the central limit theorem, we know that, for sufficiently ...
Statistical Inference Based on Extremum Estimators. T. Rothenberg, Fall 2007. Introduction: suppose θ_0, the true value of a p-dimensional parameter, is known to lie in some subset S ⊆ R^p. Often we choose to estimate θ_0 ...
Lecture 3. Properties of Summary Statistics: Sampling Distribution. Main Theme: How can we use math to justify that our numerical summaries from the sample are good summaries of the population? Lecture Summary ...
6.3 Testing Series With Positive Terms. 6.3.1 Review of what is known up to now. In theory, testing a series Σ_{i=1}^∞ a_i for convergence amounts to finding the sequence of partial ...
Lecture 6: Chi-Square Distribution (χ²) and Least Squares Fitting. Chi-Square Distribution (χ²). Suppose we have a set of measurements {x_1, x_2, ..., x_n}. We know the true value of each x_i (x_{t1}, x_{t2}, ..., x_{tn}). We would ...
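Least-squares fitting of a straight line with per-point measurement errors σ_i minimizes χ² = Σ ((y_i − a − b x_i)/σ_i)², and has a closed form. This is a sketch in the lecture's spirit, not its code; the data points and errors are made up.

```python
def fit_line_chi2(xs, ys, sigmas):
    # Minimize chi^2 = sum ((y_i - (a + b x_i)) / sigma_i)^2 over a and b.
    # Standard weighted-least-squares normal equations for a straight line.
    w = [1.0 / s ** 2 for s in sigmas]
    S = sum(w)
    Sx = sum(wi * x for wi, x in zip(w, xs))
    Sy = sum(wi * y for wi, y in zip(w, ys))
    Sxx = sum(wi * x * x for wi, x in zip(w, xs))
    Sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    delta = S * Sxx - Sx ** 2
    a = (Sxx * Sy - Sx * Sxy) / delta      # intercept
    b = (S * Sxy - Sx * Sy) / delta        # slope
    chi2 = sum(wi * (y - a - b * x) ** 2 for wi, x, y in zip(w, xs, ys))
    return a, b, chi2

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]                  # exactly y = 1 + 2x
a, b, chi2 = fit_line_chi2(xs, ys, [0.5] * 4)
```

For real (noisy) data the minimized χ² would be compared to a chi-square distribution with n − 2 degrees of freedom to judge the fit.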
Chapter 6: Sampling Distributions. In most experiments, we have more than one measurement for any given variable, each measurement being associated with one randomly selected member of a population. Hence we need to ...
Lecture 5. Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f. f(x) = 1/θ for 0 ≤ x ≤ θ, and 0 otherwise. The likelihood function φ(θ) = ∏_{i=1}^n f(X_i) = θ^{-n} I(X_1, ..., X_n ∈ [0, θ]) ...
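For U[0, θ] the likelihood θ^{-n} I(max_i X_i ≤ θ) is decreasing in θ wherever the indicator allows it to be positive, so the MLE is the sample maximum. A quick numerical check (illustrative, not from the lecture; the true θ and sample size are arbitrary):

```python
import random

random.seed(2)
theta_true = 3.0
sample = [random.uniform(0.0, theta_true) for _ in range(1000)]

# theta^{-n} decreases in theta, and the indicator forces theta >= max(sample),
# so the likelihood is maximized exactly at the sample maximum.
theta_hat = max(sample)
```

Note the estimator is biased low (it can never exceed θ), but the bias vanishes at rate 1/n.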
Simulation. Two Rules for Inverting a Distribution Function. Rule 1: If F(x) = u is constant on an interval [x_1, x_2), then the uniform value u is mapped onto x_2 through the inversion process. Rule 2: If there is a jump ...
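Away from the flat regions and jumps that the two rules handle, inversion is direct: for a continuous, strictly increasing F, X = F^{-1}(U) with U uniform has distribution F. A sketch using the exponential distribution (an illustrative choice, not the text's example):

```python
import math
import random

def sample_exponential(rate, rng):
    # Inverse-transform sampling: F(x) = 1 - exp(-rate*x) is strictly
    # increasing, so F^{-1}(u) = -ln(1 - u) / rate is exponential(rate).
    u = rng.random()
    return -math.log(1.0 - u) / rate

rng = random.Random(3)
draws = [sample_exponential(2.0, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)   # should approach 1/rate = 0.5
```

Rules 1 and 2 extend this to the generalized inverse F^{-1}(u) = inf{x : F(x) >= u}, which covers distributions with atoms or flat stretches.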
Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight). MATH-252-01: Probability and Statistics II, Spring 2018. Contents: 1 Hypothesis Tests illustrated with z-tests; 1.1 Overview of Hypothesis Testing ...
Economics 241B: Relation to Method of Moments and Maximum Likelihood. OLSE as a Maximum Likelihood Estimator. Under Assumption 5 we have specified the distribution of the error, so we can estimate the model parameters ...
The Expectation-Maximization (EM) Algorithm. Reading Assignments: T. Mitchell, Machine Learning, McGraw-Hill, 1997 (section 6.2, hard copy); S. Gong et al., Dynamic Vision: From Images to Face Recognition, Imperial College ...
Final Examination Solutions, 17/6/2010. The Islamic University of Gaza, Faculty of Commerce, Department of Economics and Political Sciences. An Introduction to Statistics Course (ECOE 30), Spring Semester 2009-2010. Name: ID: Instructor: ...
1 Inferential Methods for Correlation and Regression Analysis. In the chapter on Correlation and Regression Analysis, tools for describing bivariate continuous data were introduced. The sample Pearson Correlation Coefficient ...
10-701 Machine Learning, Spring 2011: Homework 1 Solution. February 1, 2011. Instructions: There are 3 questions on this assignment. The last question involves coding. Attach your code to the writeup. Please submit your ...
Estimation for Complete Data. Complete data: there is no loss of information during the study. Complete individual data vs. grouped data: a complete individual data set is one in which the complete information of ...
Lecture 9: Boosting. Akshay Krishnamurthy (akshay@cs.umass.edu), October 3, 2017. Recap: last week we discussed some algorithmic aspects of machine learning. We saw one very powerful family of learning algorithms, namely ...
Recall: STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. Comments: So far we have estimates of the parameters β_0 and β_1, but have no idea how good these estimates are. Assumption: E(Y|x) = β_0 + β_1 x (linear conditional ...
Hypothesis testing. PSYCHOLOGICAL RESEARCH (PYC 304-C), Lecture 9. Statistical inference is that branch of Statistics in which one typically makes a statement about a population based upon the results of a sample. In ...
Empirical Process Theory and Oracle Inequalities. Stat 928: Statistical Learning Theory, Lecture 10. Instructor: Sham Kakade. 1 Risk vs Risk: see Lecture 0 for a discussion on terminology. 2 The Union Bound / Bonferroni ...
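The union bound is typically combined with Hoeffding's inequality, P(|p̂ − p| ≥ ε) ≤ 2 exp(−2nε²) for [0,1]-valued variables, and a simulation shows the bound holds with plenty of room. This is an illustrative check, not code from the Stat 928 notes; n, ε, and the trial count are arbitrary.

```python
import math
import random

def hoeffding_bound(n, eps):
    # Hoeffding: P(|empirical mean - p| >= eps) <= 2 * exp(-2 * n * eps^2)
    # for i.i.d. random variables bounded in [0, 1].
    return 2.0 * math.exp(-2.0 * n * eps * eps)

rng = random.Random(4)
n, p, eps = 500, 0.5, 0.1
trials = 2000
bad = 0
for _ in range(trials):
    # Empirical mean of n Bernoulli(p) draws.
    mean = sum(rng.random() < p for _ in range(n)) / n
    if abs(mean - p) >= eps:
        bad += 1
empirical = bad / trials               # observed deviation frequency
bound = hoeffding_bound(n, eps)        # 2 * exp(-10), about 9.1e-5
```

A union bound over m such events (e.g. m hypotheses in a finite class) just multiplies the right-hand side by m, which is the step behind the oracle inequalities in the title.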