# Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach


STAT 425: Introduction to Nonparametric Statistics, Winter 2018
Instructor: Yen-Chi Chen
Reference: Section 8.4 of *All of Nonparametric Statistics*.

## 7.1 k-nearest neighbor

k-nearest neighbor (k-NN) is a cool and powerful idea for nonparametric estimation. Today we will talk about its application in density estimation. In the future, we will learn how to use it for regression analysis and classification.

Let $X_1, \dots, X_n$ be our random sample. Assume each observation has $d$ different variables; namely, $X_i \in \mathbb{R}^d$. For a given point $x$, we first rank every observation based on its distance to $x$. Let $R_k(x)$ denote the distance from $x$ to its $k$-th nearest neighbor point. For a given point $x$, the kNN density estimator estimates the density by

$$\widehat{p}_k(x) = \frac{k}{n}\cdot\frac{1}{V_d R_k^d(x)},$$

where $V_d R_k^d(x)$ is the volume of a $d$-dimensional ball with radius $R_k(x)$,

$$V_d = \frac{\pi^{d/2}}{\Gamma\!\left(\frac{d}{2}+1\right)}$$

is the volume of a unit $d$-dimensional ball, and $\Gamma(x)$ is the Gamma function. Here are the results when $d = 1, 2,$ and $3$:

- $d = 1$, $V_1 = 2$: $\widehat{p}_k(x) = \frac{k}{n}\cdot\frac{1}{2R_k(x)}$.
- $d = 2$, $V_2 = \pi$: $\widehat{p}_k(x) = \frac{k}{n}\cdot\frac{1}{\pi R_k^2(x)}$.
- $d = 3$, $V_3 = \frac{4}{3}\pi$: $\widehat{p}_k(x) = \frac{k}{n}\cdot\frac{3}{4\pi R_k^3(x)}$.

What is the intuition of a kNN density estimator? By the definition of $R_k(x)$, the ball centered at $x$ with radius $R_k(x)$,

$$B(x, R_k(x)) = \{y : \|x - y\| \le R_k(x)\},$$

satisfies the fact that

$$\sum_{i=1}^{n} I(X_i \in B(x, R_k(x))) = k.$$

Namely, the ratio of observations within $B(x, R_k(x))$ is $k/n$. Recall from the relation between the EDF and the CDF that the quantity

$$\frac{1}{n}\sum_{i=1}^{n} I(X_i \in B(x, R_k(x)))$$

can be viewed as an estimator of the quantity

$$P(X_i \in B(x, R_k(x))) = \int_{B(x, R_k(x))} p(y)\,dy.$$
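A minimal numerical sketch of this estimator in pure Python (the function names are our own; the 1-d function uses $V_1 = 2$, and the data points come from the worked example later in this lecture):

```python
import math

def unit_ball_volume(d):
    # V_d = pi^(d/2) / Gamma(d/2 + 1): volume of the unit d-dimensional ball.
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def knn_density_1d(x, data, k):
    # kNN density estimate in d = 1: p_hat_k(x) = (k/n) / (2 * R_k(x)).
    n = len(data)
    dists = sorted(abs(xi - x) for xi in data)  # distances from x to every observation
    r_k = dists[k - 1]                          # R_k(x): distance to the k-th nearest neighbor
    return (k / n) / (unit_ball_volume(1) * r_k)

print(unit_ball_volume(1), unit_ball_volume(2), unit_ball_volume(3))  # ~2, ~pi, ~4*pi/3

# Data from the worked example later in this lecture:
X = [1, 2, 6, 11, 13, 14, 20, 33]
print(knn_density_1d(5, X, k=2))  # R_2(5) = 3, so (2/8)/(2*3) = 1/24
print(knn_density_1d(5, X, k=5))  # R_5(5) = 8, so (5/8)/(2*8) = 5/128
```

Sorting all $n$ distances is the simplest way to get $R_k(x)$; for large data sets a kd-tree query would avoid the full sort.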

When $n$ is large and $k$ is relatively small compared to $n$, $R_k(x)$ will be small because the ratio $\frac{k}{n}$ is small. Thus, the density $p(y)$ within the region $B(x, R_k(x))$ will not change too much. Namely, $p(y) \approx p(x)$ for every $y \in B(x, R_k(x))$. Note that $x$ is the center of the ball $B(x, R_k(x))$. Therefore,

$$P(X_i \in B(x, R_k(x))) = \int_{B(x, R_k(x))} p(y)\,dy \approx p(x)\int_{B(x, R_k(x))} dy = p(x)\, V_d R_k^d(x).$$

This quantity is the target of the estimator $\frac{1}{n}\sum_{i=1}^{n} I(X_i \in B(x, R_k(x)))$, which equals $\frac{k}{n}$. As a result, we can say that

$$p(x)\, V_d R_k^d(x) \approx P(X_i \in B(x, R_k(x))) \approx \frac{1}{n}\sum_{i=1}^{n} I(X_i \in B(x, R_k(x))) = \frac{k}{n},$$

which leads to

$$p(x) \approx \frac{k}{n}\cdot\frac{1}{V_d R_k^d(x)}.$$

This motivates us to use

$$\widehat{p}_k(x) = \frac{k}{n}\cdot\frac{1}{V_d R_k^d(x)}$$

as a density estimator.

**Example.** We consider a simple example in $d = 1$. Assume our data is $X = \{1, 2, 6, 11, 13, 14, 20, 33\}$. What is the kNN density estimator at $x = 5$ with $k = 2$? First, we calculate $R_2(5)$. The distances from $x = 5$ to the data points in $X$ are $\{4, 3, 1, 6, 8, 9, 15, 28\}$. Thus, $R_2(5) = 3$ and

$$\widehat{p}_k(5) = \frac{2}{8}\cdot\frac{1}{2 R_2(5)} = \frac{1}{24}.$$

What will the density estimator be when we choose $k = 5$? In this case, $R_5(5) = 8$, so

$$\widehat{p}_k(5) = \frac{5}{8}\cdot\frac{1}{2 R_5(5)} = \frac{5}{128}.$$

Now we see that a different value of $k$ gives a different density estimate even at the same $x$. How do we choose $k$? Well, just as with the smoothing bandwidth in the KDE, it is a very difficult problem in practice. However, we can do some theoretical analysis to get a rough idea about how $k$ should change with respect to the sample size $n$.

### 7.1.1 Asymptotic theory

The asymptotic analysis of a k-NN estimator is quite complicated, so here I only state its result in $d = 1$. The bias of the k-NN estimator is

$$\mathrm{bias}(\widehat{p}_k(x)) = E(\widehat{p}_k(x)) - p(x) = b_1 \frac{p''(x)}{p^2(x)}\left(\frac{k}{n}\right)^2 + b_2 \frac{p(x)}{k} + o\!\left(\left(\frac{k}{n}\right)^2\right) + o\!\left(\frac{1}{k}\right),$$

where $b_1$ and $b_2$ are two constants. The variance of the k-NN estimator is

$$\mathrm{Var}(\widehat{p}_k(x)) = v_1 \frac{p^2(x)}{k} + o\!\left(\frac{1}{k}\right),$$

where $v_1$ is a constant. The quantity $k$ is something we can choose. We need $k \to \infty$ and $\frac{k}{n} \to 0$ when $n \to \infty$ to make sure both bias and variance converge to $0$. However, how $k$ diverges affects the quality of estimation. When $k$ is large, the variance is small while the bias is large. When $k$ is small, the variance is large and the bias tends to be small, but it could also be large (the second component in the bias will be large). To balance the bias and variance, we consider the mean squared error, which is at the rate

$$\mathrm{MSE}(\widehat{p}_k(x)) = O\!\left(\frac{k^4}{n^4}\right) + O\!\left(\frac{1}{k}\right).$$

This motivates us to choose $k = C n^{4/5}$ for some constant $C$. This leads to the optimal convergence rate for a k-NN density estimator:

$$\mathrm{MSE}(\widehat{p}_{k,\mathrm{opt}}(x)) = O(n^{-4/5}).$$

**Remark.** When we consider $d$-dimensional data, the bias will be of the rate

$$\mathrm{bias}(\widehat{p}_k(x)) = O\!\left(\left(\frac{k}{n}\right)^{2/d}\right) + O\!\left(\frac{1}{k}\right)$$

and the variance is still at rate

$$\mathrm{Var}(\widehat{p}_k(x)) = O\!\left(\frac{1}{k}\right).$$

This shows a very different phenomenon compared to the KDE. In the KDE, the rate of the variance depends on the dimension whereas the rate of the bias remains the same. In kNN, the rate of the variance stays the same but the rate of the bias changes with respect to the dimension. One intuition is that no matter what the dimension is, the ball $B(x, R_k(x))$ always contains $k$ observations. Thus, the variability of a kNN estimator is caused by $k$ points, which is independent of the dimension. On the other hand, in the KDE, the same $h$ in different dimensions will cover different numbers of observations, so the variability changes with respect to the dimension.

The kNN approach has the advantage that it can be computed very efficiently using the kd-tree algorithm. This is a particularly useful feature when we have a huge amount of data and when the dimension of the data is large. However, a downside of kNN is that the estimated density often has a heavy tail, which implies it may not work well when $\|x\|$ is very large. Moreover, when $d = 1$, the density estimator $\widehat{p}_k(x)$ is not even a density function (the integral is infinite!).

## 7.2 Basis approach

In this section, we assume that the PDF $p(x)$ is supported on $[0, 1]$. Namely, $p(x) > 0$ only in $[0, 1]$.
When the PDF $p(x)$ is smooth (in general, we need $p$ to be square integrable, i.e., $\int_0^1 p(x)^2\,dx < \infty$), we can use an orthonormal basis to approximate this function. This approach has several other names: the basis estimator, the projection estimator, and the orthogonal series estimator.
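Before setting up the estimator, it helps to see numerically what "orthonormal" requires. The sketch below checks the unit-norm and orthogonality conditions (7.1) and (7.2) for the cosine basis used later in this section (the grid size and function names are our own choices):

```python
import math

def phi(j, x):
    # Cosine basis on [0, 1]: phi_1(x) = 1, phi_j(x) = sqrt(2) cos(pi (j - 1) x) for j >= 2.
    return 1.0 if j == 1 else math.sqrt(2.0) * math.cos(math.pi * (j - 1) * x)

def inner_product(j, k, n_grid=20000):
    # Midpoint Riemann sum approximating int_0^1 phi_j(x) phi_k(x) dx.
    h = 1.0 / n_grid
    return h * sum(phi(j, (i + 0.5) * h) * phi(k, (i + 0.5) * h) for i in range(n_grid))

print(inner_product(3, 3))  # unit norm (7.1): close to 1
print(inner_product(2, 5))  # orthogonality (7.2): close to 0
```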

Let $\{\varphi_1(x), \varphi_2(x), \dots, \varphi_m(x), \dots\}$ be a set of basis functions. Then we have

$$p(x) = \sum_{j=1}^{\infty} \theta_j \varphi_j(x).$$

The quantity $\theta_j$ is the coefficient of each basis function. In signal processing, these quantities are referred to as the signals. The collection $\{\varphi_1(x), \varphi_2(x), \dots, \varphi_m(x), \dots\}$ is called an orthonormal basis if its elements have the following properties:

- Unit norm: $\int_0^1 \varphi_j^2(x)\,dx = 1$ for every $j = 1, 2, \dots$ (7.1)
- Orthogonality: $\int_0^1 \varphi_j(x)\varphi_k(x)\,dx = 0$ for every $j \ne k$ (7.2)

Here are some concrete examples of bases:

- Cosine basis: $\varphi_1(x) = 1$, $\varphi_j(x) = \sqrt{2}\cos(\pi (j-1) x)$ for $j = 2, 3, \dots$.
- Trigonometric basis: $\varphi_1(x) = 1$, $\varphi_{2j}(x) = \sqrt{2}\cos(2\pi j x)$, $\varphi_{2j+1}(x) = \sqrt{2}\sin(2\pi j x)$ for $j = 1, 2, \dots$.

Often the basis is something we can choose, so it is known to us. What is unknown to us is the coefficients $\theta_1, \theta_2, \dots$. Thus, the goal is to estimate these coefficients using the random sample $X_1, \dots, X_n$.

How do we estimate these parameters? We start with some simple analysis. For any basis function $\varphi_j(x)$, consider the following integral:

$$E(\varphi_j(X_1)) = \int \varphi_j(x)\,dP(x) = \int \varphi_j(x) p(x)\,dx = \int \varphi_j(x) \sum_{k=1}^{\infty}\theta_k\varphi_k(x)\,dx = \sum_{k=1}^{\infty}\theta_k \underbrace{\int \varphi_j(x)\varphi_k(x)\,dx}_{=\,0 \text{ except when } k = j} = \sum_{k=1}^{\infty}\theta_k I(k = j) = \theta_j.$$

Namely, the expectation of $\varphi_j(X_1)$ is exactly the coefficient $\theta_j$. This motivates us to use the sample average as an estimator:

$$\widehat{\theta}_j = \frac{1}{n}\sum_{i=1}^{n} \varphi_j(X_i).$$

By construction, this estimator is unbiased, i.e., $E(\widehat{\theta}_j) = \theta_j$. The variance of this estimator is

$$\mathrm{Var}(\widehat{\theta}_j) = \frac{\mathrm{Var}(\varphi_j(X_1))}{n} = \frac{1}{n}\left(E(\varphi_j^2(X_1)) - E^2(\varphi_j(X_1))\right) = \frac{E(\varphi_j^2(X_1)) - \theta_j^2}{n} = \frac{\sigma_j^2}{n},$$

where $\sigma_j^2 = E(\varphi_j^2(X_1)) - \theta_j^2$.

In practice, we cannot use all the basis functions because there would be an infinite number of them to be calculated. So we will use only $M$ basis functions in our estimator. Namely, our estimator is

$$\widehat{p}_{n,M}(x) = \sum_{j=1}^{M} \widehat{\theta}_j \varphi_j(x). \quad (7.3)$$

Later in the asymptotic analysis, we will show that we should not choose $M$ to be too large because of the bias-variance tradeoff.

### 7.2.1 Asymptotic theory

To analyze the quality of our estimator, we use the mean integrated squared error (MISE). Namely, we want to analyze

$$\mathrm{MISE}(\widehat{p}_{n,M}) = E\left(\int \left(\widehat{p}_{n,M}(x) - p(x)\right)^2 dx\right) = \int \left[\mathrm{bias}^2(\widehat{p}_{n,M}(x)) + \mathrm{Var}(\widehat{p}_{n,M}(x))\right] dx.$$

**Analysis of bias.** Because each $\widehat{\theta}_j$ is an unbiased estimator of $\theta_j$, we have

$$E(\widehat{p}_{n,M}(x)) = E\left(\sum_{j=1}^{M}\widehat{\theta}_j\varphi_j(x)\right) = \sum_{j=1}^{M} E(\widehat{\theta}_j)\varphi_j(x) = \sum_{j=1}^{M}\theta_j\varphi_j(x).$$

Thus, the bias at point $x$ is

$$\mathrm{bias}(\widehat{p}_{n,M}(x)) = E(\widehat{p}_{n,M}(x)) - p(x) = \sum_{j=1}^{M}\theta_j\varphi_j(x) - \sum_{j=1}^{\infty}\theta_j\varphi_j(x) = -\sum_{j=M+1}^{\infty}\theta_j\varphi_j(x).$$

Thus, the integrated squared bias is

$$\int \mathrm{bias}^2(\widehat{p}_{n,M}(x))\,dx = \int\left(\sum_{j=M+1}^{\infty}\theta_j\varphi_j(x)\right)^2 dx = \sum_{j=M+1}^{\infty}\sum_{k=M+1}^{\infty}\theta_j\theta_k\underbrace{\int\varphi_j(x)\varphi_k(x)\,dx}_{=\,I(j=k)} = \sum_{j=M+1}^{\infty}\theta_j^2. \quad (7.4)$$

Namely, the bias is determined by the signal strength of the ignored basis functions, which makes sense because the bias should reflect the fact that we are not using all the basis functions, and if some important basis functions (the ones with large $\theta_j$) are ignored, the bias ought to be large.

We know that in the KDE and kNN, the bias is often associated with the smoothness of the density function. How does smoothness come into play in this case? It turns out that if the density is smooth, the remaining signals $\sum_{j=M+1}^{\infty}\theta_j^2$ will also be small. To see this, we consider a very simple model by assuming that we are using the cosine basis and

$$\int_0^1 p''(x)^2\,dx \le L \quad (7.5)$$

for some constant $L > 0$. Namely, the overall curvature of the density function is bounded. For the cosine basis functions,

$$\varphi_j'(x) = -\sqrt{2}\,\pi(j-1)\sin(\pi(j-1)x), \qquad \varphi_j''(x) = -\sqrt{2}\,\pi^2(j-1)^2\cos(\pi(j-1)x) = -\pi^2(j-1)^2\varphi_j(x).$$

Thus, equation (7.5) implies

$$L \ge \int_0^1 p''(x)^2\,dx = \int_0^1\left(\sum_{j=1}^{\infty}\pi^2(j-1)^2\theta_j\varphi_j(x)\right)^2 dx = \pi^4\sum_{j=1}^{\infty}\sum_{k=1}^{\infty}(j-1)^2(k-1)^2\theta_j\theta_k\underbrace{\int\varphi_j(x)\varphi_k(x)\,dx}_{=\,I(j=k)} = \pi^4\sum_{j=1}^{\infty}(j-1)^4\theta_j^2.$$

Namely, equation (7.5) implies

$$\sum_{j=1}^{\infty}(j-1)^4\theta_j^2 \le \frac{L}{\pi^4}. \quad (7.6)$$

This further implies that

$$\sum_{j=M+1}^{\infty}\theta_j^2 = O(M^{-4}). \quad (7.7)$$

An intuitive explanation is as follows. Equation (7.6) implies that when $j \to \infty$, the signal $\theta_j^2 = O(j^{-5})$. The reason is: if $\theta_j^2 \sim j^{-5}$, equation (7.6) becomes

$$\sum_{j}(j-1)^4\theta_j^2 \sim \sum_{j}(j-1)^4\, j^{-5} \sim \sum_{j}\frac{1}{j} = \infty!$$

Thus, the tail signals have to shrink toward $0$ at rate $O(j^{-5})$. As a result,

$$\sum_{j=M+1}^{\infty}\theta_j^2 = \sum_{j=M+1}^{\infty} O(j^{-5}) = O(M^{-4}).$$

Equations (7.7) and (7.4) together imply that the bias is at rate

$$\int \mathrm{bias}^2(\widehat{p}_{n,M}(x))\,dx = O(M^{-4}).$$

**Analysis of variance.** Now we turn to the analysis of variance:

$$\begin{aligned}
\int \mathrm{Var}(\widehat{p}_{n,M}(x))\,dx &= \int \mathrm{Var}\left(\sum_{j=1}^{M}\widehat{\theta}_j\varphi_j(x)\right) dx\\
&= \int\left[\sum_{j=1}^{M}\varphi_j^2(x)\,\mathrm{Var}(\widehat{\theta}_j) + \sum_{j\ne k}\varphi_j(x)\varphi_k(x)\,\mathrm{Cov}(\widehat{\theta}_j,\widehat{\theta}_k)\right] dx\\
&= \sum_{j=1}^{M}\mathrm{Var}(\widehat{\theta}_j)\underbrace{\int\varphi_j^2(x)\,dx}_{=\,1} + \sum_{j\ne k}\mathrm{Cov}(\widehat{\theta}_j,\widehat{\theta}_k)\underbrace{\int\varphi_j(x)\varphi_k(x)\,dx}_{=\,0}\\
&= \sum_{j=1}^{M}\mathrm{Var}(\widehat{\theta}_j) = \sum_{j=1}^{M}\frac{\sigma_j^2}{n} = O\!\left(\frac{M}{n}\right).
\end{aligned}$$

Now, putting both bias and variance together, we obtain the rate of the MISE:

$$\mathrm{MISE}(\widehat{p}_{n,M}) = \int\left[\mathrm{bias}^2(\widehat{p}_{n,M}(x)) + \mathrm{Var}(\widehat{p}_{n,M}(x))\right] dx = O(M^{-4}) + O\!\left(\frac{M}{n}\right).$$

Thus, the optimal choice is $M = M_n = C\,n^{1/5}$ for some positive constant $C$, and this leads to

$$\mathrm{MISE}(\widehat{p}_{n,M_n}) = O(n^{-4/5}),$$

which is the same rate as the KDE and kNN.
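To make the whole pipeline concrete, here is a small simulation sketch. Everything in it is our own choice rather than from the lecture: the target density $p(x) = 2x$ on $[0, 1]$, the sample size, and the function names. We draw a sample, estimate the coefficients $\widehat{\theta}_j$ by sample averages of the cosine basis, and evaluate $\widehat{p}_{n,M}$ from equation (7.3):

```python
import math
import random

def phi(j, x):
    # Cosine basis on [0, 1]: phi_1 = 1, phi_j(x) = sqrt(2) cos(pi (j - 1) x) for j >= 2.
    return 1.0 if j == 1 else math.sqrt(2.0) * math.cos(math.pi * (j - 1) * x)

def fit_coefficients(sample, M):
    # theta_hat_j = (1/n) * sum_i phi_j(X_i), for j = 1, ..., M.
    n = len(sample)
    return [sum(phi(j, x) for x in sample) / n for j in range(1, M + 1)]

def density_estimate(x, theta_hat):
    # p_hat_{n,M}(x) = sum_{j=1}^M theta_hat_j phi_j(x)   (equation 7.3).
    return sum(t * phi(j, x) for j, t in enumerate(theta_hat, start=1))

random.seed(0)
# Sample from p(x) = 2x on [0, 1] via the inverse CDF: X = sqrt(U).
sample = [math.sqrt(random.random()) for _ in range(5000)]
theta_hat = fit_coefficients(sample, M=8)  # M grows like n^{1/5}; the constant is arbitrary

print(theta_hat[0])                        # theta_1 = E[phi_1(X)] = 1 by construction
print(density_estimate(0.75, theta_hat))   # roughly p(0.75) = 1.5
```

With this target density, the true coefficients can be computed in closed form (e.g., $\theta_2 = -4\sqrt{2}/\pi^2$), so the sample averages can be checked directly against them.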


### INFINITE SEQUENCES AND SERIES

11 INFINITE SEQUENCES AND SERIES INFINITE SEQUENCES AND SERIES 11.4 The Compariso Tests I this sectio, we will lear: How to fid the value of a series by comparig it with a kow series. COMPARISON TESTS

### Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece 1, 1, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet

### Math 2784 (or 2794W) University of Connecticut

ORDERS OF GROWTH PAT SMITH Math 2784 (or 2794W) Uiversity of Coecticut Date: Mar. 2, 22. ORDERS OF GROWTH. Itroductio Gaiig a ituitive feel for the relative growth of fuctios is importat if you really

### Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.

Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the

### Machine Learning Brett Bernstein

Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio

### 4.3 Growth Rates of Solutions to Recurrences

4.3. GROWTH RATES OF SOLUTIONS TO RECURRENCES 81 4.3 Growth Rates of Solutios to Recurreces 4.3.1 Divide ad Coquer Algorithms Oe of the most basic ad powerful algorithmic techiques is divide ad coquer.

### MA131 - Analysis 1. Workbook 9 Series III

MA3 - Aalysis Workbook 9 Series III Autum 004 Cotets 4.4 Series with Positive ad Negative Terms.............. 4.5 Alteratig Series.......................... 4.6 Geeral Series.............................

### Basis for simulation techniques

Basis for simulatio techiques M. Veeraraghava, March 7, 004 Estimatio is based o a collectio of experimetal outcomes, x, x,, x, where each experimetal outcome is a value of a radom variable. x i. Defiitios

### Lecture 2: April 3, 2013

TTIC/CMSC 350 Mathematical Toolkit Sprig 203 Madhur Tulsiai Lecture 2: April 3, 203 Scribe: Shubhedu Trivedi Coi tosses cotiued We retur to the coi tossig example from the last lecture agai: Example. Give,