NYU Center for Data Science: DS-GA 1003 Machine Learning and Computational Statistics (Spring 2018)

Size: px
Start display at page:

Download "NYU Center for Data Science: DS-GA 1003 Machine Learning and Computational Statistics (Spring 2018)"

Transcription

1 NYU Ceter for Data Sciece: DS-GA 003 Machie Learig ad Computatioal Statistics (Sprig 208) Brett Berstei, David Roseberg, Be Jakubowski Jauary 20, 208 Istructios: Followig most lab ad lecture sectios, we will be providig cocept checks for review. Each cocept check will: List the lab/lecture learig objectives. You will be resposible for masterig these objectives, ad demostratig mastery through homework assigmets, exams (midterm ad fial), ad o the fial course project. Iclude cocept check questios. These questios are iteded to reiforce the lab/lectures, ad help you master the learig objectives. You are strogly ecourage to complete all cocept check questios, ad to discuss these (ad related) problems o Piazza ad at office hours. However, problems marked with a ( ) are cosidered optioal. Lecture : Itroductio to Statistical Learig Theory Topic : Statistical Learig Theory Learig Objectives. Idetify the iput, actio, ad outcome spaces for a give machie learig problem. 2. Provide a example for which the actio space ad outcome spaces are the same ad oe for which they are differet. 3. Explai the relatioships betwee the decisio fuctio, the loss fuctio, the iput space, the actio space, ad the outcome space. 4. Defie the risk of a decisio fuctio ad a Bayes decisio fuctio. Brett authored these cocept checks for Sprig 207 DS-GA 003, ad the work is almost etirely his. Later (mior) modificatios were made by David Roseberg ad Be Jakubowski.

2 5. Provide example decisio problems for which the Bayes risk is 0 ad the Bayes risk is ozero. 6. Kow the Bayes decisio fuctios for square loss ad multiclass 0/ loss. 7. Defie the empirical risk for a decisio fuctio ad the empirical risk miimizer. 8. Explai what a hypothesis space is, ad how it ca be used with costraied empirical risk miimizatio to cotrol overfittig. Cocept Check Questios. Suppose A = Y = R ad X is some other set. Furthermore, assume P X Y is a discrete joit distributio. Compute a Bayes decisio fuctio whe the loss fuctio l : A Y R is give by l(a, y) = (a y), the 0 loss. Solutio. The Bayes decisio fuctio f satisfies f = arg mi f where (X, Y ) P X Y. Let R(f) = arg mi f E[(f(X) Y )] = arg mi P (f(x) Y ), f f (x) = arg max P (Y = y X = x), y the maximum a posteriori estimate of Y. If there is a tie, we choose ay of the maximizers. If f 2 is aother decisio fuctio we have P (f (X) Y ) = x P (f (x) Y X = x)p (X = x) = x ( P (f (x) = Y X = x))p (X = x) x ( P (f 2(x) = Y X = x))p (X = x) (Def of f ) = x P (f 2(x) Y X = x)p (X = x) = P (f 2 (X) Y ). Thus f = f. 2. ( ) Suppose A = Y = R, X is some other set, ad l : A Y R is give by l(a, y) = (a y) 2, the square error loss. What is the Bayes risk ad how does it compare with the variace of Y? Solutio. From Homework we kow that the Bayes decisio fuctio is give by f (x) = E[Y X = x]. Thus the Bayes risk is give by E[(f (X) Y ) 2 ] = E[(E[Y X] Y ) 2 ] = E[E[(E[Y X] Y ) 2 X]] = E[Var(Y X)], 2

3 where we applied the law of iterated expectatios. The law of total variace states that Var(Y ) = E[Var(Y X)] + Var[E(Y X)]. This proves the Bayes risk satisfies E[Var(Y X)] = Var(Y ) Var[E(Y X)] Var(Y ). Recall from Homework that Var(Y ) is the Bayes risk whe we estimate Y without ay iput X. This shows that usig X i our estimatio reduces the Bayes risk, ad that the improvemet is measured by Var[E(Y X)]. As a saity check, ote that if X, Y are idepedet the E(Y X) = E(Y ) so Var[E(Y X)] = 0. If X = Y the E(Y X) = Y ad Var[E(Y X)] = Var(Y ). The promiet role of variace i our aalysis above is due to the fact that we are usig the square loss. 3. Let X = {,..., 0}, let Y = {,..., 0}, ad let A = Y. Suppose the data geeratig distributio, P, has margial X Uif{,..., 0} ad coditioal distributio Y X = x Uif{,..., x}. For each loss fuctio below give a Bayes decisio fuctio. (a) l(a, y) = (a y) 2, (b) l(a, y) = a y, (c) l(a, y) = (a y). Solutio. (a) From Homework we kow that f (x) = E[Y X = x] = (x + )/2. (b) From Homework, we kow that f (x) is the coditioal media of Y give X = x. If x is odd, the f (x) = (x + )/2. If x is eve, the we ca choose ay value i the iterval [ ] x + x +,. 2 2 (c) From questio above, we kow that f (x) = arg max y P (Y = y X = x). Thus we ca choose ay iteger betwee ad x, iclusive, for f (x). 4. Show that the empirical risk is a ubiased ad cosistet estimator of the Bayes risk. You may assume the Bayes risk is fiite. Solutio. We assume a give loss fuctio l ad a i.i.d. sample (x, y i ),..., (x, y ). 3

4 To show it is ubiased, ote that [ ] E[ ˆR (f)] = E l(f(x i ), y i ) i= = E[l(f(x i ), y i )] (Liearity of E) i= = E[l(f(x ), y )] (i.i.d.) = R(f). For cosistecy, we must show that as we have ˆR (f) R(f) with probability. Lettig z i = l(f(x i ), y i ), we see that the z i are i.i.d. with fiite mea. Thus cosistecy follows by applyig the strog law of large umbers. 5. Let X = [0, ] ad Y = A = R. Suppose you receive the (x, y) data poits (0, 5), (.2, 3), (.37, 4.2), (.9, 3), (, 5). Throughout assume we are usig the 0 loss. (a) Suppose we restrict our decisio fuctios to the hypothesis space F of costat fuctios. Give a decisio fuctio that miimizes the empirical risk over F ad the correspodig empirical risk. Is the empirical risk miimizig fuctio uique? (b) Suppose we restrict our decisio fuctios to the hypothesis space F 2 of piecewisecostat fuctios with at most discotiuity. Give a decisio fuctio that miimizes the empirical risk over F 2 ad the correspodig empirical risk. Is the empirical risk miimizig fuctio uique? Solutio. (a) We ca let ˆf(x) = 5 or ˆf(x) = 3 ad obtai the miimal empirical risk of 3/5. Thus the empirical risk miimizer is ot uique. (b) Oe solutio is to let ˆf(x) = 5 for x [0,.] ad ˆf(x) = 3 for x (., ] givig a empirical risk of 2/5. There are ucoutably may empirical risk miimizers, so agai we do ot have uiqueess. 6. ( ) Let X = [ 0, 0], Y = A = R ad suppose the data geeratig distributio has margial distributio X Uif[ 0, 0] ad coditioal distributio Y X = x N (a + bx, ) for some fixed a, b R. Suppose you are also give the followig data poits: (0, ), (0, 2), (, 3), (2.5, 3.), ( 4, 2.). (a) Assumig the 0 loss, what is the Bayes risk? (b) Assumig the square error loss l(a, y) = (a y) 2, what is the Bayes risk? 4

5 (c) Usig the full hypothesis space of all (measurable) fuctios, what is the miimum achievable empirical risk for the square error loss. (d) Usig the hypothesis space of all affie fuctios (i.e., of the form f(x) = cx + d for some c, d R), what is the miimum achievable empirical risk for the square error loss. (e) Usig the hypothesis space of all quadratic fuctios (i.e., of the form f(x) = cx 2 + dx + e for some c, d, e R), what is the miimum achievable empirical risk for the square error loss. Solutio. (a) For ay decisio fuctio f the risk is give by To see this ote that P (f(x) = Y ) = E[(f(X) Y )] = P (f(x) Y ) = P (f(x) = Y ) =. 20 2π 0 0 (f(x) = y)e (y a bx)2 /2 dy dx = 20 2π Thus every decisio fuctio is a Bayes decisio fuctio, ad the Bayes risk is. (b) By problem 2 above we kow the Bayes risk is give by sice Var(Y X = x) =. (c) We choose ˆf such that E[Var(Y X)] = E[] =, ˆf(0) =.5, ˆf() = 3, ˆf(2.5) = 3., ˆf( 4) = 2., 0 ad ˆf(x) = 0 otherwise. The we achieve the miimum empirical risk of /0. (d) Lettig 0 0 A = 2.5, y = we obtai (usig a computer) ( ) ˆd ŵ = = (A T A) A T y = ĉ This gives ( ) ˆR 5 ( ˆf) = 5 Aŵ y 2 2 = [Aside: I geeral, to solve systems like the oe above o a computer you should t actually ivert the matrix A T A, but use somethig like w=a\y i Matlab which performs a QR factorizatio of A.] dx = 0.

6 (e) Lettig we obtai (usig a computer) A = , y = ê.775 ŵ = ˆd = (A T A) A T y = ĉ This gives ˆR 5 ( ˆf) = 5 Aŵ y 2 2 = Topic 2: Stochastic Gradiet Descet Learig Objectives. Be able to write the empirical risk for a particular loss fuctio over a particular parameterized hypothesis space, such as for square loss over a hypothesis space of liear fuctios. 2. Compare ad costrast gradiet descet, miibatch gradiet descet, ad stochastic gradiet descet. Cocept Check Questios. Whe performig mii-batch gradiet descet, we ofte radomly choose the miibatch from the full traiig set without replacemet. Show that the resultig miibatch gradiet is a ubiased estimate of the gradiet of the full traiig set. Here we assume each decisio fuctio f w i our hypothesis space is determied by a parameter vector w R d. Solutio. Let (x m, y m ),..., (x m, y m ) be our mii-batch selected uiformly without 6

7 replacemet from the full traiig set (x, y ),..., (x, y ). [ ] E w l(f w (x mi, y mi )) = E [ w l(f w (x mi ), y mi )] (Liearity of, E) i= i= = E [ w l(f w (x m ), y m )] (Margials are the same) i= = E [ w l(f w (x m ), y m )] N = N wl(f w (x i ), y i ) i= N = w l(f w (x i ), y i ) N i= (Liearity of ). 2. You wat to estimate the average age of the people visitig your website. Over a fixed week we will receive a total of N visitors (which we will call our full populatio). Suppose the populatio mea µ is ukow but the variace σ 2 is kow. Sice we do t wat to bother every visitor, we will ask a small sample what their ages are. How may visitors must we radomly sample so that our estimator ˆµ has variace at most ɛ > 0? Solutio. Let x,..., x deote our radomly sampled ages, ad let ˆx deote the sample mea i= x i. The Var(ˆx) = σ2. Thus we require σ 2 /ɛ. Note that this does t deped o N, the full populatio size. 3. ( ) Suppose you have bee successfully ruig mii-batch gradiet descet with a full traiig set size of 0 5 ad a mii-batch size of 00. After receivig more data your full traiig set size icreases to 0 9. Give a heuristic argumet as to why the mii-batch size eed ot icrease eve though we have 0000 times more data. Solutio. Throughout we assume our gradiet lies i R d. Cosider the empirical distributio o the full traiig set (i.e., each sample is chose with probability /N where N is the full traiig set size). Assume this distributio has mea vector µ R d (the full-batch gradiet) ad covariace matrix Σ R d d. By the cetral limit theorem the mii-batch gradiet will be approximately ormally distributed with mea µ ad covariace Σ, where is the mii-batch size. As N grows the etries of Σ eed ot grow, ad thus eed ot grow. I fact, as N grows, the empirical mea ad covariace matrix will coverge to their true values. More precisely, the mea of the 7

8 empirical distributio will coverge to E l(f(x), Y ) ad the covariace will coverge to E[( l(f(x), Y ))( l(f(x), Y )) T ] E[ l(f(x), Y )]E[ l(f(x), Y )] T where (X, Y ) P X Y. The importat takeaway here is that the size of the mii-batch is depedet o the speed of computatio, ad o the characteristics of the distributio of the gradiets (such as the momets), ad thus may vary idepedetly of the size of the full traiig set. 8

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week Lecture: Cocept Check Exercises Starred problems are optioal. Statistical Learig Theory. Suppose A = Y = R ad X is some other set. Furthermore, assume P X Y is a discrete

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio

More information

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality

More information

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where

More information

Maximum Likelihood Estimation and Complexity Regularization

Maximum Likelihood Estimation and Complexity Regularization ECE90 Sprig 004 Statistical Regularizatio ad Learig Theory Lecture: 4 Maximum Likelihood Estimatio ad Complexity Regularizatio Lecturer: Rob Nowak Scribe: Pam Limpiti Review : Maximum Likelihood Estimatio

More information

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015 ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],

More information

Problem Set 4 Due Oct, 12

Problem Set 4 Due Oct, 12 EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios

More information

Lecture 13: Maximum Likelihood Estimation

Lecture 13: Maximum Likelihood Estimation ECE90 Sprig 007 Statistical Learig Theory Istructor: R. Nowak Lecture 3: Maximum Likelihood Estimatio Summary of Lecture I the last lecture we derived a risk (MSE) boud for regressio problems; i.e., select

More information

Agnostic Learning and Concentration Inequalities

Agnostic Learning and Concentration Inequalities ECE901 Sprig 2004 Statistical Regularizatio ad Learig Theory Lecture: 7 Agostic Learig ad Cocetratio Iequalities Lecturer: Rob Nowak Scribe: Aravid Kailas 1 Itroductio 1.1 Motivatio I the last lecture

More information

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam. Probability ad Statistics FS 07 Secod Sessio Exam 09.0.08 Time Limit: 80 Miutes Name: Studet ID: This exam cotais 9 pages (icludig this cover page) ad 0 questios. A Formulae sheet is provided with the

More information

Estimation for Complete Data

Estimation for Complete Data Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of

More information

January 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS

January 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS Jauary 25, 207 INTRODUCTION TO MATHEMATICAL STATISTICS Abstract. A basic itroductio to statistics assumig kowledge of probability theory.. Probability I a typical udergraduate problem i probability, we

More information

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors ECONOMETRIC THEORY MODULE XIII Lecture - 34 Asymptotic Theory ad Stochastic Regressors Dr. Shalabh Departmet of Mathematics ad Statistics Idia Istitute of Techology Kapur Asymptotic theory The asymptotic

More information

Lecture 11 and 12: Basic estimation theory

Lecture 11 and 12: Basic estimation theory Lecture ad 2: Basic estimatio theory Sprig 202 - EE 94 Networked estimatio ad cotrol Prof. Kha March 2 202 I. MAXIMUM-LIKELIHOOD ESTIMATORS The maximum likelihood priciple is deceptively simple. Louis

More information

Efficient GMM LECTURE 12 GMM II

Efficient GMM LECTURE 12 GMM II DECEMBER 1 010 LECTURE 1 II Efficiet The estimator depeds o the choice of the weight matrix A. The efficiet estimator is the oe that has the smallest asymptotic variace amog all estimators defied by differet

More information

Lecture 4. Hw 1 and 2 will be reoped after class for every body. New deadline 4/20 Hw 3 and 4 online (Nima is lead)

Lecture 4. Hw 1 and 2 will be reoped after class for every body. New deadline 4/20 Hw 3 and 4 online (Nima is lead) Lecture 4 Homework Hw 1 ad 2 will be reoped after class for every body. New deadlie 4/20 Hw 3 ad 4 olie (Nima is lead) Pod-cast lecture o-lie Fial projects Nima will register groups ext week. Email/tell

More information

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2.

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2. SAMPLE STATISTICS A radom sample x 1,x,,x from a distributio f(x) is a set of idepedetly ad idetically variables with x i f(x) for all i Their joit pdf is f(x 1,x,,x )=f(x 1 )f(x ) f(x )= f(x i ) The sample

More information

Lecture 3. Properties of Summary Statistics: Sampling Distribution

Lecture 3. Properties of Summary Statistics: Sampling Distribution Lecture 3 Properties of Summary Statistics: Samplig Distributio Mai Theme How ca we use math to justify that our umerical summaries from the sample are good summaries of the populatio? Lecture Summary

More information

10-701/ Machine Learning Mid-term Exam Solution

10-701/ Machine Learning Mid-term Exam Solution 0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it

More information

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n. Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator

More information

Statistical Inference Based on Extremum Estimators

Statistical Inference Based on Extremum Estimators T. Rotheberg Fall, 2007 Statistical Iferece Based o Extremum Estimators Itroductio Suppose 0, the true value of a p-dimesioal parameter, is kow to lie i some subset S R p : Ofte we choose to estimate 0

More information

Lecture 33: Bootstrap

Lecture 33: Bootstrap Lecture 33: ootstrap Motivatio To evaluate ad compare differet estimators, we eed cosistet estimators of variaces or asymptotic variaces of estimators. This is also importat for hypothesis testig ad cofidece

More information

AMS570 Lecture Notes #2

AMS570 Lecture Notes #2 AMS570 Lecture Notes # Review of Probability (cotiued) Probability distributios. () Biomial distributio Biomial Experimet: ) It cosists of trials ) Each trial results i of possible outcomes, S or F 3)

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

Exponential Families and Bayesian Inference

Exponential Families and Bayesian Inference Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where

More information

Topics Machine learning: lecture 2. Review: the learning problem. Hypotheses and estimation. Estimation criterion cont d. Estimation criterion

Topics Machine learning: lecture 2. Review: the learning problem. Hypotheses and estimation. Estimation criterion cont d. Estimation criterion .87 Machie learig: lecture Tommi S. Jaakkola MIT CSAIL tommi@csail.mit.edu Topics The learig problem hypothesis class, estimatio algorithm loss ad estimatio criterio samplig, empirical ad epected losses

More information

Introduction to Machine Learning DIS10

Introduction to Machine Learning DIS10 CS 189 Fall 017 Itroductio to Machie Learig DIS10 1 Fu with Lagrage Multipliers (a) Miimize the fuctio such that f (x,y) = x + y x + y = 3. Solutio: The Lagragia is: L(x,y,λ) = x + y + λ(x + y 3) Takig

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

Empirical Process Theory and Oracle Inequalities

Empirical Process Theory and Oracle Inequalities Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi

More information

Output Analysis and Run-Length Control

Output Analysis and Run-Length Control IEOR E4703: Mote Carlo Simulatio Columbia Uiversity c 2017 by Marti Haugh Output Aalysis ad Ru-Legth Cotrol I these otes we describe how the Cetral Limit Theorem ca be used to costruct approximate (1 α%

More information

REGRESSION WITH QUADRATIC LOSS

REGRESSION WITH QUADRATIC LOSS REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig

More information

ECE 901 Lecture 13: Maximum Likelihood Estimation

ECE 901 Lecture 13: Maximum Likelihood Estimation ECE 90 Lecture 3: Maximum Likelihood Estimatio R. Nowak 5/7/009 The focus of this lecture is to cosider aother approach to learig based o maximum likelihood estimatio. Ulike earlier approaches cosidered

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Expectation and Variance of a random variable

Expectation and Variance of a random variable Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio

More information

Intro to Learning Theory

Intro to Learning Theory Lecture 1, October 18, 2016 Itro to Learig Theory Ruth Urer 1 Machie Learig ad Learig Theory Comig soo 2 Formal Framework 21 Basic otios I our formal model for machie learig, the istaces to be classified

More information

This section is optional.

This section is optional. 4 Momet Geeratig Fuctios* This sectio is optioal. The momet geeratig fuctio g : R R of a radom variable X is defied as g(t) = E[e tx ]. Propositio 1. We have g () (0) = E[X ] for = 1, 2,... Proof. Therefore

More information

Mathematical Statistics - MS

Mathematical Statistics - MS Paper Specific Istructios. The examiatio is of hours duratio. There are a total of 60 questios carryig 00 marks. The etire paper is divided ito three sectios, A, B ad C. All sectios are compulsory. Questios

More information

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Convergence of random variables. (telegram style notes) P.J.C. Spreij Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak

More information

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f. Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,

More information

EE 4TM4: Digital Communications II Probability Theory

EE 4TM4: Digital Communications II Probability Theory 1 EE 4TM4: Digital Commuicatios II Probability Theory I. RANDOM VARIABLES A radom variable is a real-valued fuctio defied o the sample space. Example: Suppose that our experimet cosists of tossig two fair

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5 CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio

More information

Ada Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities

Ada Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities CS8B/Stat4B Sprig 008) Statistical Learig Theory Lecture: Ada Boost, Risk Bouds, Cocetratio Iequalities Lecturer: Peter Bartlett Scribe: Subhrasu Maji AdaBoost ad Estimates of Coditioal Probabilities We

More information

Unbiased Estimation. February 7-12, 2008

Unbiased Estimation. February 7-12, 2008 Ubiased Estimatio February 7-2, 2008 We begi with a sample X = (X,..., X ) of radom variables chose accordig to oe of a family of probabilities P θ where θ is elemet from the parameter space Θ. For radom

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

32 estimating the cumulative distribution function

32 estimating the cumulative distribution function 32 estimatig the cumulative distributio fuctio 4.6 types of cofidece itervals/bads Let F be a class of distributio fuctios F ad let θ be some quatity of iterest, such as the mea of F or the whole fuctio

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio

More information

Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.

More information

Lecture Chapter 6: Convergence of Random Sequences

Lecture Chapter 6: Convergence of Random Sequences ECE5: Aalysis of Radom Sigals Fall 6 Lecture Chapter 6: Covergece of Radom Sequeces Dr Salim El Rouayheb Scribe: Abhay Ashutosh Doel, Qibo Zhag, Peiwe Tia, Pegzhe Wag, Lu Liu Radom sequece Defiitio A ifiite

More information

Lecture 12: September 27

Lecture 12: September 27 36-705: Itermediate Statistics Fall 207 Lecturer: Siva Balakrisha Lecture 2: September 27 Today we will discuss sufficiecy i more detail ad the begi to discuss some geeral strategies for costructig estimators.

More information

Outline. CSCI-567: Machine Learning (Spring 2019) Outline. Prof. Victor Adamchik. Mar. 26, 2019

Outline. CSCI-567: Machine Learning (Spring 2019) Outline. Prof. Victor Adamchik. Mar. 26, 2019 Outlie CSCI-567: Machie Learig Sprig 209 Gaussia mixture models Prof. Victor Adamchik 2 Desity estimatio U of Souther Califoria Mar. 26, 209 3 Naive Bayes Revisited March 26, 209 / 57 March 26, 209 2 /

More information

Ma 530 Introduction to Power Series

Ma 530 Introduction to Power Series Ma 530 Itroductio to Power Series Please ote that there is material o power series at Visual Calculus. Some of this material was used as part of the presetatio of the topics that follow. What is a Power

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

STAT Homework 2 - Solutions

STAT Homework 2 - Solutions STAT-36700 Homework - Solutios Fall 08 September 4, 08 This cotais solutios for Homework. Please ote that we have icluded several additioal commets ad approaches to the problems to give you better isight.

More information

1 Introduction to reducing variance in Monte Carlo simulations

1 Introduction to reducing variance in Monte Carlo simulations Copyright c 010 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a ukow mea µ = E(X) of a distributio by

More information

First Year Quantitative Comp Exam Spring, Part I - 203A. f X (x) = 0 otherwise

First Year Quantitative Comp Exam Spring, Part I - 203A. f X (x) = 0 otherwise First Year Quatitative Comp Exam Sprig, 2012 Istructio: There are three parts. Aswer every questio i every part. Questio I-1 Part I - 203A A radom variable X is distributed with the margial desity: >

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Lecture 7: October 18, 2017

Lecture 7: October 18, 2017 Iformatio ad Codig Theory Autum 207 Lecturer: Madhur Tulsiai Lecture 7: October 8, 207 Biary hypothesis testig I this lecture, we apply the tools developed i the past few lectures to uderstad the problem

More information

1 Review of Probability & Statistics

1 Review of Probability & Statistics 1 Review of Probability & Statistics a. I a group of 000 people, it has bee reported that there are: 61 smokers 670 over 5 960 people who imbibe (drik alcohol) 86 smokers who imbibe 90 imbibers over 5

More information

Summary. Recap ... Last Lecture. Summary. Theorem

Summary. Recap ... Last Lecture. Summary. Theorem Last Lecture Biostatistics 602 - Statistical Iferece Lecture 23 Hyu Mi Kag April 11th, 2013 What is p-value? What is the advatage of p-value compared to hypothesis testig procedure with size α? How ca

More information

Asymptotics. Hypothesis Testing UMP. Asymptotic Tests and p-values

Asymptotics. Hypothesis Testing UMP. Asymptotic Tests and p-values of the secod half Biostatistics 6 - Statistical Iferece Lecture 6 Fial Exam & Practice Problems for the Fial Hyu Mi Kag Apil 3rd, 3 Hyu Mi Kag Biostatistics 6 - Lecture 6 Apil 3rd, 3 / 3 Rao-Blackwell

More information

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would

More information

Review Questions, Chapters 8, 9. f(y) = 0, elsewhere. F (y) = f Y(1) = n ( e y/θ) n 1 1 θ e y/θ = n θ e yn

Review Questions, Chapters 8, 9. f(y) = 0, elsewhere. F (y) = f Y(1) = n ( e y/θ) n 1 1 θ e y/θ = n θ e yn Stat 366 Lab 2 Solutios (September 2, 2006) page TA: Yury Petracheko, CAB 484, yuryp@ualberta.ca, http://www.ualberta.ca/ yuryp/ Review Questios, Chapters 8, 9 8.5 Suppose that Y, Y 2,..., Y deote a radom

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

6. Sufficient, Complete, and Ancillary Statistics

6. Sufficient, Complete, and Ancillary Statistics Sufficiet, Complete ad Acillary Statistics http://www.math.uah.edu/stat/poit/sufficiet.xhtml 1 of 7 7/16/2009 6:13 AM Virtual Laboratories > 7. Poit Estimatio > 1 2 3 4 5 6 6. Sufficiet, Complete, ad Acillary

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?

More information

Slide Set 13 Linear Model with Endogenous Regressors and the GMM estimator

Slide Set 13 Linear Model with Endogenous Regressors and the GMM estimator Slide Set 13 Liear Model with Edogeous Regressors ad the GMM estimator Pietro Coretto pcoretto@uisa.it Ecoometrics Master i Ecoomics ad Fiace (MEF) Uiversità degli Studi di Napoli Federico II Versio: Friday

More information

MBACATÓLICA. Quantitative Methods. Faculdade de Ciências Económicas e Empresariais UNIVERSIDADE CATÓLICA PORTUGUESA 9. SAMPLING DISTRIBUTIONS

MBACATÓLICA. Quantitative Methods. Faculdade de Ciências Económicas e Empresariais UNIVERSIDADE CATÓLICA PORTUGUESA 9. SAMPLING DISTRIBUTIONS MBACATÓLICA Quatitative Methods Miguel Gouveia Mauel Leite Moteiro Faculdade de Ciêcias Ecoómicas e Empresariais UNIVERSIDADE CATÓLICA PORTUGUESA 9. SAMPLING DISTRIBUTIONS MBACatólica 006/07 Métodos Quatitativos

More information

Section 11.8: Power Series

Section 11.8: Power Series Sectio 11.8: Power Series 1. Power Series I this sectio, we cosider geeralizig the cocept of a series. Recall that a series is a ifiite sum of umbers a. We ca talk about whether or ot it coverges ad i

More information

Regression with quadratic loss

Regression with quadratic loss Regressio with quadratic loss Maxim Ragisky October 13, 2015 Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X,Y, where, as before,

More information

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would

More information

ECON 3150/4150, Spring term Lecture 3

ECON 3150/4150, Spring term Lecture 3 Itroductio Fidig the best fit by regressio Residuals ad R-sq Regressio ad causality Summary ad ext step ECON 3150/4150, Sprig term 2014. Lecture 3 Ragar Nymoe Uiversity of Oslo 21 Jauary 2014 1 / 30 Itroductio

More information

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters

More information

5. INEQUALITIES, LIMIT THEOREMS AND GEOMETRIC PROBABILITY

5. INEQUALITIES, LIMIT THEOREMS AND GEOMETRIC PROBABILITY IA Probability Let Term 5 INEQUALITIES, LIMIT THEOREMS AND GEOMETRIC PROBABILITY 51 Iequalities Suppose that X 0 is a radom variable takig o-egative values ad that c > 0 is a costat The P X c E X, c is

More information

1 Covariance Estimation

1 Covariance Estimation Eco 75 Lecture 5 Covariace Estimatio ad Optimal Weightig Matrices I this lecture, we cosider estimatio of the asymptotic covariace matrix B B of the extremum estimator b : Covariace Estimatio Lemma 4.

More information

Simulation. Two Rule For Inverting A Distribution Function

Simulation. Two Rule For Inverting A Distribution Function Simulatio Two Rule For Ivertig A Distributio Fuctio Rule 1. If F(x) = u is costat o a iterval [x 1, x 2 ), the the uiform value u is mapped oto x 2 through the iversio process. Rule 2. If there is a jump

More information

Statistical and Mathematical Methods DS-GA 1002 December 8, Sample Final Problems Solutions

Statistical and Mathematical Methods DS-GA 1002 December 8, Sample Final Problems Solutions Statistical ad Mathematical Methods DS-GA 00 December 8, 05. Short questios Sample Fial Problems Solutios a. Ax b has a solutio if b is i the rage of A. The dimesio of the rage of A is because A has liearly-idepedet

More information

Lecture 6 Simple alternatives and the Neyman-Pearson lemma

Lecture 6 Simple alternatives and the Neyman-Pearson lemma STATS 00: Itroductio to Statistical Iferece Autum 06 Lecture 6 Simple alteratives ad the Neyma-Pearso lemma Last lecture, we discussed a umber of ways to costruct test statistics for testig a simple ull

More information

An Introduction to Randomized Algorithms

An Introduction to Randomized Algorithms A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis

More information

Summary: CORRELATION & LINEAR REGRESSION. GC. Students are advised to refer to lecture notes for the GC operations to obtain scatter diagram.

Summary: CORRELATION & LINEAR REGRESSION. GC. Students are advised to refer to lecture notes for the GC operations to obtain scatter diagram. Key Cocepts: 1) Sketchig of scatter diagram The scatter diagram of bivariate (i.e. cotaiig two variables) data ca be easily obtaied usig GC. Studets are advised to refer to lecture otes for the GC operatios

More information

Stat410 Probability and Statistics II (F16)

Stat410 Probability and Statistics II (F16) Some Basic Cocepts of Statistical Iferece (Sec 5.) Suppose we have a rv X that has a pdf/pmf deoted by f(x; θ) or p(x; θ), where θ is called the parameter. I previous lectures, we focus o probability problems

More information

Lecture 19: Convergence

Lecture 19: Convergence Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may

More information

MATH 472 / SPRING 2013 ASSIGNMENT 2: DUE FEBRUARY 4 FINALIZED

MATH 472 / SPRING 2013 ASSIGNMENT 2: DUE FEBRUARY 4 FINALIZED MATH 47 / SPRING 013 ASSIGNMENT : DUE FEBRUARY 4 FINALIZED Please iclude a cover sheet that provides a complete setece aswer to each the followig three questios: (a) I your opiio, what were the mai ideas

More information

An Introduction to Asymptotic Theory

An Introduction to Asymptotic Theory A Itroductio to Asymptotic Theory Pig Yu School of Ecoomics ad Fiace The Uiversity of Hog Kog Pig Yu (HKU) Asymptotic Theory 1 / 20 Five Weapos i Asymptotic Theory Five Weapos i Asymptotic Theory Pig Yu

More information

Naïve Bayes. Naïve Bayes

Naïve Bayes. Naïve Bayes Statistical Data Miig ad Machie Learig Hilary Term 206 Dio Sejdiovic Departmet of Statistics Oxford Slides ad other materials available at: http://www.stats.ox.ac.uk/~sejdiov/sdmml : aother plug-i classifier

More information

Advanced Stochastic Processes.

Advanced Stochastic Processes. Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.

More information

Chapter 6 Sampling Distributions

Chapter 6 Sampling Distributions Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to

More information

Basics of Probability Theory (for Theory of Computation courses)

Basics of Probability Theory (for Theory of Computation courses) Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.

More information

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach STAT 425: Itroductio to Noparametric Statistics Witer 28 Lecture 7: Desity Estimatio: k-nearest Neighbor ad Basis Approach Istructor: Ye-Chi Che Referece: Sectio 8.4 of All of Noparametric Statistics.

More information

Chapter 6 Infinite Series

Chapter 6 Infinite Series Chapter 6 Ifiite Series I the previous chapter we cosidered itegrals which were improper i the sese that the iterval of itegratio was ubouded. I this chapter we are goig to discuss a topic which is somewhat

More information

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals 7-1 Chapter 4 Part I. Samplig Distributios ad Cofidece Itervals 1 7- Sectio 1. Samplig Distributio 7-3 Usig Statistics Statistical Iferece: Predict ad forecast values of populatio parameters... Test hypotheses

More information

Probability 2 - Notes 10. Lemma. If X is a random variable and g(x) 0 for all x in the support of f X, then P(g(X) 1) E[g(X)].

Probability 2 - Notes 10. Lemma. If X is a random variable and g(x) 0 for all x in the support of f X, then P(g(X) 1) E[g(X)]. Probability 2 - Notes 0 Some Useful Iequalities. Lemma. If X is a radom variable ad g(x 0 for all x i the support of f X, the P(g(X E[g(X]. Proof. (cotiuous case P(g(X Corollaries x:g(x f X (xdx x:g(x

More information