1 Introduction to Machine Learning (Part 1): Statistical Machine Learning. Shou-de Lin, CSIE/GINM, NTU. 2009/11/30.

2 Syllabus of an Intro ML course (Machine Learning, Andrew Ng, Stanford, Autumn 2009).
Supervised learning (7 classes): supervised learning setup, LMS, logistic regression, perceptron, exponential family, generative learning algorithms, Gaussian discriminant analysis, Naive Bayes, support vector machines, model selection and feature selection, ensemble methods (bagging, boosting, ECOC), evaluating and debugging learning algorithms.
Learning theory (3 classes): bias/variance tradeoff, union and Chernoff/Hoeffding bounds, VC dimension, worst-case (online) learning, practical advice on how to use learning algorithms.
Unsupervised learning (5 classes): clustering, K-means, EM, mixture of Gaussians, factor analysis, PCA, MDS, pPCA, independent components analysis (ICA).
Reinforcement learning and control (4 classes): MDPs, Bellman equations, value iteration and policy iteration, linear quadratic regulation (LQR), LQG, Q-learning, value function approximation, policy search, REINFORCE, POMDPs.
HT has done a great job teaching you Advanced SL and Learning Theory, and my mission is to fill one missing piece in the puzzle.

3 Why teach Intro to ML? When you reveal that you have taken an ML course, people will more or less expect you to already know something, e.g. Naive Bayes. Some ML methods are so commonly applied in research and the real world that you will need to know a little about them, e.g. K-means clustering. Some ML methods are too unbelievable and amazing to ignore, e.g. the EM framework.

4 To Bring You Back to Earth.
Statistical machine learning (2 hours): a Bayesian view of ML; generative learning models; Gaussian discriminant analysis; Naive Bayes.
Unsupervised learning (3 hours): clustering, K-means, EM.
Reinforcement learning (0.5 hour): value iteration and policy iteration; Q-learning & SARSA.

5 Theoretical ML vs. Statistical ML. What you already know: SL takes many (x, t) pairs as input to train a learner f(x), then applies it to an unseen x_k and predicts f(x_k). For example (x is 3-dimensional):
Training: { ([1,2,3], 0.1), ([2,3,4], 0.2), ([3,4,5], 0.5) }
Testing: [2,4,5] -> 0.7
However, uncertainty exists in the real world, so an error distribution (e.g. Gaussian) is usually added: t = f(x) + error. That is, the same input can generate different outputs, for example:
Training: { ([1,2,3], 0.1), ([1,2,3], 0.2), ([1,2,3], 0.1) }
Testing: [1,2,3] = ?

6 The Probabilistic Form of t. The output t is a distribution caused by the (assumed Gaussian) error term:

p(t | x, w, β) = N(t | y(x, w), β^{-1})

β is called the precision parameter; it equals the inverse of the variance, 1/σ².
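To make the noise model concrete, here is a minimal Python sketch (the weights and precision are illustrative choices, not from the slides):

import numpy as np

rng = np.random.default_rng(0)
beta = 25.0                       # precision; the noise variance is 1/beta

def y(x, w):
    # Deterministic part of the model, here a polynomial in x.
    return np.polyval(w, x)

w = [1.0, -0.5]                   # hypothetical weights
x = np.array([1.0, 1.0, 1.0])     # the same input three times
t = y(x, w) + rng.normal(0.0, np.sqrt(1.0 / beta), size=x.shape)
print(t)                          # three different targets drawn from N(y(x,w), 1/beta)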

7 The SL process under probability. Given training data {X, T}, we want to determine the unknown parameters W and β so that we know the distribution of t. Assuming we observed N data points, the likelihood function is

p(T \mid X, W, \beta) = \prod_{n=1}^{N} p(t_n \mid x_n, W, \beta) = \prod_{n=1}^{N} N(t_n \mid y(x_n, W), \beta^{-1})

and the log-likelihood function is

\ln p(T \mid X, W, \beta) = -\frac{\beta}{2} \sum_{n=1}^{N} \{y(x_n, W) - t_n\}^2 + \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi)
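As a sanity check, the log-likelihood can be evaluated directly; a small sketch under the same polynomial-model assumption (the function name is mine):

import numpy as np

def log_likelihood(w, beta, X, T):
    # ln p(T | X, w, beta) for the Gaussian noise model above.
    N = len(X)
    sq_err = np.sum((np.polyval(w, X) - T) ** 2)
    return -0.5 * beta * sq_err + 0.5 * N * np.log(beta) - 0.5 * N * np.log(2 * np.pi)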

8 Maximum Likelihood Estimation (MLE). Idea: adjust the unknown parameters (i.e. W and β) to maximize the likelihood function, or equivalently the log-likelihood function

\ln p(T \mid X, W, \beta) = -\frac{\beta}{2} \sum_{n=1}^{N} \{y(x_n, W) - t_n\}^2 + \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi)

Adjusting W to maximize this log-likelihood under a Gaussian error model is equivalent to finding the W_ML that minimizes the mean-square-error function.

9 Maximum Likelihood Estimation for β. First we calculate W_ML, which governs the mean of the distribution. Then we substitute W_ML into the likelihood function and solve for the optimal β_ML:

\frac{\partial}{\partial \beta} \ln p(T \mid X, W_{ML}, \beta) = -\frac{1}{2} \sum_{n=1}^{N} \{y(x_n, W_{ML}) - t_n\}^2 + \frac{N}{2\beta} = 0

\frac{1}{\beta_{ML}} = \frac{1}{N} \sum_{n=1}^{N} \{y(x_n, W_{ML}) - t_n\}^2

10 A SL system using MLE.
1. First determine W as the W_ML that minimizes the error function \frac{1}{2} \sum_{n=1}^{N} \{y(x_n, w) - t_n\}^2. (This tends to overfit.)
2. Using W_ML, find β from \frac{1}{\beta_{ML}} = \frac{1}{N} \sum_{n=1}^{N} \{y(x_n, W_{ML}) - t_n\}^2.
3. Prediction stage: use W_ML and β_ML to construct the distribution of t: p(t | x, W_ML, β_ML) = N(t | y(x, W_ML), β_ML^{-1}).
4. Predict the value for an input x by sampling t from the distribution in (3).
The MLE approach consistently underestimates the variance of the data and can lead to overfitting.
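The four steps translate into a short Python sketch (illustrative: np.polyfit stands in for minimizing the squared error, and the toy data and degree are my choices):

import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(0, 1, 10)
T = np.sin(2 * np.pi * X) + rng.normal(0, 0.2, X.shape)   # toy data

M = 3                                     # polynomial degree
w_ml = np.polyfit(X, T, M)                # step 1: W_ML minimizes the squared error
residuals = np.polyval(w_ml, X) - T
beta_ml = 1.0 / np.mean(residuals ** 2)   # step 2: 1/beta_ML = mean squared residual

def predict(x):
    # steps 3-4: sample t from N(y(x, W_ML), 1/beta_ML)
    return rng.normal(np.polyval(w_ml, x), np.sqrt(1.0 / beta_ml))

print(predict(0.5))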

11 Bayesian Approach for Regression. Why a Bayesian approach: some w's are preferable to others. For example, regularization prefers simple models (i.e. small w's). Consequently, p(w) cannot be treated as uniformly distributed.

12 Bayes Rule Review.

P(W | X, T) = P(T | X, W) P(W | X) / P(T | X), so P(W | X, T) ∝ P(T | X, W) P(W | X)

P(W | X): prior probability.
P(T | X, W): likelihood probability (what MLE tries to optimize: argmax_W P(T | X, W)).
P(W | X, T): posterior probability.

13 Bayesian Curve Fitting.

P(W | X, T) ∝ P(T | X, W) P(W | X)

Likelihood (we have already done this):

\ln p(T \mid X, W, \beta) = -\frac{\beta}{2} \sum_{n=1}^{N} \{y(x_n, W) - t_n\}^2 + \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi)

Prior: assume W is independent of X and is Gaussian with mean 0 and variance 1/α:

p(W \mid X) = p(W) = \left(\frac{\alpha}{2\pi}\right)^{(M+1)/2} e^{-\frac{\alpha}{2} w^{\mathsf T} w}

Then the log posterior is proportional to

-\frac{\beta}{2} \sum_{n=1}^{N} \{y(x_n, W) - t_n\}^2 + \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) - \frac{\alpha}{2} w^{\mathsf T} w + \frac{M+1}{2} (\ln \alpha - \ln(2\pi))

14 Maximum Posterior Estimation (MAP). The best parameter set should maximize the posterior probability instead of the likelihood probability:

-\frac{\beta}{2} \sum_{n=1}^{N} \{y(x_n, W) - t_n\}^2 + \frac{N}{2} (\ln \beta - \ln 2\pi) + \frac{M+1}{2} (\ln \alpha - \ln 2\pi) - \frac{\alpha}{2} w^{\mathsf T} w

The MAP solution for Gaussian noise and a Gaussian prior is to find a W that minimizes

\frac{\beta}{2} \sum_{n=1}^{N} \{y(x_n, W) - t_n\}^2 + \frac{\alpha}{2} w^{\mathsf T} w

Maximizing the posterior distribution is therefore equivalent to minimizing the regularized sum-of-squares error function with regularization parameter λ = α/β.
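For a model that is linear in its features, this MAP solution is exactly ridge regression; a minimal sketch assuming a polynomial feature map (α and β would come from the prior and noise model):

import numpy as np

def fit_map(X, T, M, alpha, beta):
    # W_MAP minimizes beta/2 * ||Phi w - T||^2 + alpha/2 * ||w||^2.
    Phi = np.vander(X, M + 1)                 # polynomial design matrix
    lam = alpha / beta                        # regularization parameter lambda = alpha/beta
    A = Phi.T @ Phi + lam * np.eye(M + 1)
    return np.linalg.solve(A, Phi.T @ T)      # closed-form regularized least squares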

15 What we have discussed so far.
1. Learning phase (MLE or MAP): find the W_ML that maximizes the likelihood function p(T | X, W), i.e. the W that minimizes the squared-error loss function; or find the W_MAP that maximizes the posterior function P(W | T, X), i.e. the W that minimizes the regularized sum-of-squares loss function.
2. Inference phase: when a new x comes in, use the determined W to predict the output y.

16 Potential Issues. The problem of MLE: overfitting. The problem of MAP: it loses information.
[Figure: three sketches of the posterior P(W | X, T), each with the single point W_MAP marked.]
Since in MAP we have learned P(W | X, T), why not use the total probability theorem:

p(t | x, X, T) = ∫ p(t | x, W) p(W | X, T) dW, where p(t | x, W) = N(t | y(x, W), β^{-1})

17 The predictive distribution of t.

p(t | x, X, T) = ∫ p(t | x, W) p(W | X, T) dW, where p(t | x, W) = N(t | y(x, W), β^{-1})

It can be proved that when the posterior and p(t | x, W) are Gaussian, the predictive distribution p(t | x, X, T) is also Gaussian, with mean m(x) and variance s²(x).
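For Bayesian linear regression with a Gaussian prior (precision α) and Gaussian noise (precision β), m(x) and s²(x) have a well-known closed form (cf. Bishop); a sketch under the same polynomial-feature assumption as above:

import numpy as np

def predictive(x, X, T, M, alpha, beta):
    # Returns mean m(x) and variance s^2(x) of p(t | x, X, T).
    Phi = np.vander(X, M + 1)
    S_inv = alpha * np.eye(M + 1) + beta * Phi.T @ Phi   # posterior precision
    S = np.linalg.inv(S_inv)
    phi_x = np.vander(np.atleast_1d(x), M + 1)[0]        # features of the query point
    m = beta * phi_x @ S @ (Phi.T @ T)                   # predictive mean m(x)
    s2 = 1.0 / beta + phi_x @ S @ phi_x                  # predictive variance s^2(x)
    return m, s2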

18 Example of the predictive distribution. [Figure] Green: true function. Red line: mean of the predicted function. Red zone: one standard deviation from the mean.

19 y(x, w) curves obtained by sampling from the posterior distribution over w. [Figure]

20 The benefit of statistical learning: it produces not only the output but also the distribution over outputs. The distribution tells us more about the data, including how confident the system is in its prediction. It can also be used to generate datasets.

21 We have talked about regression, so how about classification?

22 Two Classification Strategies.
Strategy 1: two-stage methods. Classification is broken down into two stages. Inference stage: for each C_k, use its own training data to learn a model for p(C_k | X). Decision stage: use p(C_k | X) and the loss matrix to make the optimal class assignment.
Strategy 2: one-shot methods (or discriminant models). Use all training data to learn a function that directly maps inputs x to an output class.

23 Two Models for Strategy 1 (1/2). Model 1: generative model. First solve the inference problem of determining p(x | C_k) for each class C_k individually. Separately infer the prior class probabilities p(C_k). Use Bayes' theorem to find the posterior class probabilities:

p(C_k | x) = p(x | C_k) p(C_k) / p(x)

Note that the denominator can be generated as p(x) = Σ_k p(x | C_k) p(C_k). Finally, use p(C_k | x) and decision theory to find the best class assignment. This is called a generative model since we can learn p(x) and p(C_k, x).
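A toy numeric sketch of this generative pipeline, with made-up class-conditional values for one observed x:

# Hypothetical numbers for two classes and one observed x.
p_x_given_C = {"C1": 0.30, "C2": 0.05}   # inferred p(x | C_k)
p_C = {"C1": 0.40, "C2": 0.60}           # inferred priors p(C_k)

p_x = sum(p_x_given_C[k] * p_C[k] for k in p_C)             # denominator p(x)
posterior = {k: p_x_given_C[k] * p_C[k] / p_x for k in p_C}
best = max(posterior, key=posterior.get)                    # decision: maximum posterior
print(posterior, best)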

24 Two Approaches for Strategy 1 (2/2). Model 2: discriminative model. Directly learn p(C_k | x) from the data (knowing nothing about p(x | C_k) and p(x)). Logistic regression is a typical example.

25 Classification Models.
Generative model: learn P(C_k | X) using Bayes' rule. First solve the inference problem of determining p(x | C_k) and p(C_k) for each class C_k individually, then use Bayes' rule to find the posterior class probabilities p(C_k | x).
Discriminative model: learn P(C_k | X) directly from data, then apply decision theory to decide which C_k is the best assignment for x (e.g. logistic regression).
Discriminant model: learn a function that directly maps inputs x to an output class. Linear discriminant functions: learn linear functions to separate the classes (least squares, Fisher's linear discriminant, the perceptron algorithm).

26 Generative vs. Discriminative Model.
Generative model. Pros: P(x) can be used to generate samples of inputs, which is useful for knowledge discovery & data mining (e.g. outlier detection and novelty detection). Cons: very demanding, since it has to find the joint distribution of C_k and x; it needs a lot of training data.
Discriminative model. Pros: can be learned with fewer data. Cons: cannot learn the detailed structure of the data.

27 Generative vs. Discriminant Model (1/3). A discriminant approach learns a discriminant function and uses it for decision making. It does not learn P(C_k | x). However, P(C_k | x) is useful in many respects:
1. It can be combined with the cost function to produce the final decision. If the cost function changes, we don't need to re-train the whole model as a discriminant model does.
2. It can be used to determine the reject region, e.g. P(C_HT | x) = 0.1, P(C_PJ | x) = 0.05 versus P(C_HT | x) = 0.7, P(C_PJ | x) = ...

28 Generative vs. Discriminant Model (2/3). A generative model takes care of the class prior P(y) explicitly. E.g., in cancer prediction only a small amount of the data (e.g. 0.1%) is positive; a normal classifier will guess negative and receive 99.9% accuracy. Using P(C_k | x) and P(C_k) allows us to ignore the interference from the prior during learning.

29 Generative vs. Discriminant Model (3/3). Generative models are better at combining several models. Assume, as in the previous example, we have two types of information for each photo: the image features (X_i) and the social information (X_s). It might be more effective and meaningful to build separate models P(C_k | X_i), P(C_k | X_s) for these two sets of features. A generative approach allows us to combine these models:

P(C_k | x_i, x_s) ∝ P(x_i, x_s | C_k) P(C_k) ≈ P(x_i | C_k) P(x_s | C_k) P(C_k)   (naive Bayes assumption)
∝ P(C_k | x_i) P(C_k | x_s) / P(C_k)
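A small numeric sketch of this combination rule, with invented per-feature-set posteriors and a uniform prior:

# Hypothetical posteriors from the two separate models for one photo.
p_C_given_xi = {"HT": 0.70, "PJ": 0.30}   # from image features
p_C_given_xs = {"HT": 0.40, "PJ": 0.60}   # from social information
p_C = {"HT": 0.50, "PJ": 0.50}            # class prior

scores = {k: p_C_given_xi[k] * p_C_given_xs[k] / p_C[k] for k in p_C}
Z = sum(scores.values())
combined = {k: v / Z for k, v in scores.items()}   # normalized P(C_k | x_i, x_s)
print(combined)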

30 Naive Bayes Assumption. Recall that in the Bayesian setup we have

p(C_k | x) = p(x | C_k) p(C_k) / p(x)

If we assume the features of an instance are independent given the class (conditionally independent):

P(X \mid C) = P(X_1, X_2, \ldots, X_n \mid C) = \prod_{i=1}^{n} P(X_i \mid C)

Then we only need to know P(X_i | C) for each possible pair of a feature value and a class. If C and all the X_i are binary, this requires specifying only 2n parameters: P(X_i = true | C = true) and P(X_i = true | C = false) for each X_i, since P(X_i = false | C) = 1 - P(X_i = true | C). Compare this with specifying 2(2^n - 1) parameters without any independence assumptions.
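A minimal binary naive Bayes sketch; the Laplace smoothing is my addition to avoid zero counts, not something the slide specifies:

import numpy as np

def fit_naive_bayes(X, y):
    # X: (N, n) binary feature matrix; y: (N,) binary labels.
    # Returns P(y=1) and, for each class c, the vector P(X_i = 1 | y = c).
    p_y1 = y.mean()
    p_xi = {c: (X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2) for c in (0, 1)}
    return p_y1, p_xi

def posterior_y1(x, p_y1, p_xi):
    # P(y=1 | x) for one binary feature vector x.
    def lik(c):
        p = p_xi[c]
        return np.prod(np.where(x == 1, p, 1 - p))
    s1, s0 = lik(1) * p_y1, lik(0) * (1 - p_y1)
    return s1 / (s1 + s0)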

31 Gaussian Discriminant Analysis (GDA). This is another generative model. GDA assumes p(x | y) is distributed according to a multivariate normal distribution (MND). An MND in n dimensions is parameterized by a mean vector μ ∈ R^n and a covariance matrix Σ ∈ R^{n×n}, also written as N(μ, Σ). Its density is:

p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\!\left( -\frac{1}{2} (x - \mu)^{\mathsf T} \Sigma^{-1} (x - \mu) \right)

32 Examples of 2-D multivariate normal distributions. [Figure: density plots for Σ = I, Σ = 0.6I, Σ = 2I.]

33 The Model for GDA (1/2). p(x | y) is an MND; p(y = 0) = φ, p(y = 1) = 1 - φ (assuming the different y share the same Σ). The log-likelihood of the data is given below.
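The slide's formula did not survive extraction; reconstructed from the standard GDA presentation (cf. Ng's lecture notes), the joint log-likelihood over m training examples is

\ell(\phi, \mu_0, \mu_1, \Sigma) = \log \prod_{i=1}^{m} p\big(x^{(i)}, y^{(i)}; \phi, \mu_0, \mu_1, \Sigma\big) = \sum_{i=1}^{m} \Big[ \log p\big(x^{(i)} \mid y^{(i)}; \mu_0, \mu_1, \Sigma\big) + \log p\big(y^{(i)}; \phi\big) \Big]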

34 The Model for GDA (2/2). Using the maximum likelihood estimate (MLE), we can obtain the closed-form solutions below.
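These formulas were also lost in extraction; the standard maximum-likelihood solutions, adapted to this slide's convention p(y = 0) = φ, are

\phi = \frac{1}{m} \sum_{i=1}^{m} 1\{y^{(i)} = 0\}, \qquad \mu_k = \frac{\sum_{i=1}^{m} 1\{y^{(i)} = k\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{y^{(i)} = k\}} \ (k = 0, 1), \qquad \Sigma = \frac{1}{m} \sum_{i=1}^{m} \big(x^{(i)} - \mu_{y^{(i)}}\big)\big(x^{(i)} - \mu_{y^{(i)}}\big)^{\mathsf T}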

35 Discussion: GDA vs. Logistic Regression. In GDA, p(y | x) has the form 1/(1 + exp(-θᵀx)), where θ is a function of φ, Σ, and μ. This is exactly the form logistic regression uses to model p(y | x). That is, if p(x | y) is multivariate Gaussian, then p(y | x) follows a logistic function; however, the converse is not true. This implies that GDA makes stronger modeling assumptions about the data than LR does. Trained on the same dataset, the two algorithms will produce different decision boundaries. If p(x | y) is indeed Gaussian, then GDA will get better results. In particular, if x is some sort of mean value of something whose size is not small, then by the central limit theorem GDA should perform very well. If p(x | y = 1) and p(x | y = 0) are both Poisson, then p(y | x) will still be logistic; in this case LR can work better than GDA. If we are sure the data is non-Gaussian, we should use LR rather than GDA.
