Lecture 7: Density Estimation: k-Nearest Neighbor and Basis Approach


STAT 425: Introduction to Nonparametric Statistics — Winter 2018

Instructor: Yen-Chi Chen

Reference: Section 8.4 of All of Nonparametric Statistics.

7.1 k-nearest neighbor

k-nearest neighbor (kNN) is a cool and powerful idea for nonparametric estimation. Today we will talk about its application in density estimation. In the future, we will learn how to use it for regression analysis and classification.

Let $X_1, \ldots, X_n$ be our random sample. Assume each observation has $d$ different variables; namely, $X_i \in \mathbb{R}^d$. For a given point $x$, we first rank every observation based on its distance to $x$. Let $R_k(x)$ denote the distance from $x$ to its $k$-th nearest neighbor. The kNN density estimator then estimates the density by

$$\hat{p}_k(x) = \frac{k}{n \cdot V_d \cdot R_k^d(x)},$$

where $V_d \cdot R_k^d(x)$ is the volume of a $d$-dimensional ball with radius $R_k(x)$,

$$V_d = \frac{\pi^{d/2}}{\Gamma\left(\frac{d}{2} + 1\right)}$$

is the volume of a unit $d$-dimensional ball, and $\Gamma(x)$ is the Gamma function. Here are the results when $d = 1, 2,$ and $3$:

- $d = 1$, $V_1 = 2$: $\hat{p}_k(x) = \frac{k}{n \cdot 2 R_k(x)}$.
- $d = 2$, $V_2 = \pi$: $\hat{p}_k(x) = \frac{k}{n \cdot \pi R_k^2(x)}$.
- $d = 3$, $V_3 = \frac{4}{3}\pi$: $\hat{p}_k(x) = \frac{3k}{n \cdot 4\pi R_k^3(x)}$.

What is the intuition of a kNN density estimator? By the definition of $R_k(x)$, the ball centered at $x$ with radius $R_k(x)$,

$$B(x, R_k(x)) = \{y : \|x - y\| \le R_k(x)\},$$

satisfies

$$\sum_{i=1}^{n} I(X_i \in B(x, R_k(x))) = k.$$

Namely, the ratio of observations within $B(x, R_k(x))$ is $k/n$. Recall from the relation between the EDF and the CDF that the quantity

$$\frac{1}{n}\sum_{i=1}^{n} I(X_i \in B(x, R_k(x)))$$

can be viewed as an estimator of the quantity

$$P(X_i \in B(x, R_k(x))) = \int_{B(x, R_k(x))} p(y)\, dy.$$
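The estimator above is easy to implement directly. Here is a minimal sketch (my own, not from the notes): $R_k(x)$ is obtained by sorting the distances from $x$ to the data, and $V_d$ comes from the Gamma-function formula.

```python
import math

def knn_density(x, data, k):
    """kNN density estimate: p_hat_k(x) = k / (n * V_d * R_k(x)^d),
    where R_k(x) is the distance from x to its k-th nearest neighbor.
    x is a tuple/list of d coordinates; data is a list of such points."""
    n, d = len(data), len(x)
    # Rank every observation by its distance to x; R_k(x) is the k-th smallest.
    r_k = sorted(math.dist(x, xi) for xi in data)[k - 1]
    # V_d = pi^(d/2) / Gamma(d/2 + 1): volume of the unit d-dimensional ball.
    v_d = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return k / (n * v_d * r_k ** d)
```

For $d = 1, 2, 3$ the computed $V_d$ reduces to $2$, $\pi$, and $\frac{4}{3}\pi$, matching the three special cases listed above.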
When $n$ is large and $k$ is relatively small compared to $n$, $R_k(x)$ will be small because the ratio $k/n$ is small. Thus, the density $p(y)$ within the region $B(x, R_k(x))$ will not change too much; namely, $p(y) \approx p(x)$ for every $y \in B(x, R_k(x))$. Note that $x$ is the center of the ball $B(x, R_k(x))$. Therefore,

$$P(X_i \in B(x, R_k(x))) = \int_{B(x, R_k(x))} p(y)\, dy \approx p(x) \int_{B(x, R_k(x))} dy = p(x) \cdot V_d R_k^d(x).$$

This quantity is the target of the estimator $\frac{1}{n}\sum_{i=1}^{n} I(X_i \in B(x, R_k(x)))$, which equals $k/n$. As a result, we can say that

$$p(x) \cdot V_d R_k^d(x) \approx P(X_i \in B(x, R_k(x))) \approx \frac{1}{n}\sum_{i=1}^{n} I(X_i \in B(x, R_k(x))) = \frac{k}{n},$$

which leads to

$$p(x) \approx \frac{k}{n V_d R_k^d(x)}.$$

This motivates us to use

$$\hat{p}_k(x) = \frac{k}{n V_d R_k^d(x)}$$

as a density estimator.

Example. We consider a simple example in $d = 1$. Assume our data is $X = \{1, 2, 6, 11, 13, 14, 20, 33\}$. What is the kNN density estimator at $x = 5$ with $k = 2$? First, we calculate $R_2(5)$. The distances from $x = 5$ to the data points in $X$ are $\{4, 3, 1, 6, 8, 9, 15, 28\}$. Thus, $R_2(5) = 3$ and

$$\hat{p}_k(5) = \frac{2}{8 \cdot 2 \cdot R_2(5)} = \frac{1}{24}.$$

What will the density estimator be when we choose $k = 5$? In this case, $R_5(5) = 8$, so

$$\hat{p}_k(5) = \frac{5}{8 \cdot 2 \cdot R_5(5)} = \frac{5}{128}.$$

Now we see that different values of $k$ give different density estimates even at the same $x$. How do we choose $k$? Well, just as with the smoothing bandwidth in the KDE, it is a very difficult problem in practice. However, we can do some theoretical analysis to get a rough idea of how $k$ should change with respect to the sample size $n$.

7.1.1 Asymptotic theory

The asymptotic analysis of a kNN estimator is quite complicated, so here I only state the result for $d = 1$. The bias of the kNN estimator is

$$\mathrm{bias}(\hat{p}_k(x)) = E(\hat{p}_k(x)) - p(x) = b_1 \frac{p''(x)}{p^2(x)} \left(\frac{k}{n}\right)^2 + b_2\, p(x)\, \frac{1}{k} + o\!\left(\left(\frac{k}{n}\right)^2\right) + o\!\left(\frac{1}{k}\right),$$

where $b_1$ and $b_2$ are two constants. The variance of the kNN estimator is

$$\mathrm{Var}(\hat{p}_k(x)) = v_1 \frac{p^2(x)}{k} + o\!\left(\frac{1}{k}\right),$$

where $v_1$ is a constant.
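The worked example can be reproduced in a few lines of Python. This is a sanity-check sketch assuming the sample $\{1, 2, 6, 11, 13, 14, 20, 33\}$, whose distances from $x = 5$ match those used in the example.

```python
# d = 1 worked example: p_hat_k(5) = k / (n * 2 * R_k(5)).
data = [1, 2, 6, 11, 13, 14, 20, 33]          # sample X, n = 8
x = 5
dists = sorted(abs(xi - x) for xi in data)    # sorted distances to x = 5

n = len(data)
p2 = 2 / (n * 2 * dists[2 - 1])   # R_2(5) = 3  ->  1/24
p5 = 5 / (n * 2 * dists[5 - 1])   # R_5(5) = 8  ->  5/128
```

The two estimates differ at the same point $x = 5$, illustrating how the choice of $k$ matters.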
The quantity $k$ is something we can choose. We need $k \to \infty$ and $k/n \to 0$ as $n \to \infty$ to make sure both the bias and the variance converge to $0$. However, how $k$ diverges affects the quality of estimation. When $k$ is large, the variance is small while the bias is large. When $k$ is small, the variance is large and the bias tends to be small, but it could also be large (the second component in the bias will be large). To balance the bias and variance, we consider the mean squared error, which is at the rate

$$\mathrm{MSE}(\hat{p}_k(x)) = O\!\left(\frac{k^4}{n^4} + \frac{1}{k}\right).$$

This motivates us to choose $k = C \cdot n^{4/5}$ for some constant $C$. This leads to the optimal convergence rate for a kNN density estimator:

$$\mathrm{MSE}(\hat{p}_{k,\mathrm{opt}}(x)) = O(n^{-4/5}).$$

Remark. When we consider $d$-dimensional data, the bias will be of the rate

$$\mathrm{bias}(\hat{p}_k(x)) = O\!\left(\left(\frac{k}{n}\right)^{2/d} + \frac{1}{k}\right)$$

and the variance is still at the rate

$$\mathrm{Var}(\hat{p}_k(x)) = O\!\left(\frac{1}{k}\right).$$

This shows a very different phenomenon compared to the KDE. In the KDE, the rate of the variance depends on the dimension whereas the rate of the bias remains the same. In kNN, the rate of the variance stays the same but the rate of the bias changes with the dimension. One intuition is that no matter what the dimension is, the ball $B(x, R_k(x))$ always contains $k$ observations. Thus, the variability of a kNN estimator is driven by $k$ points, which is independent of the dimension. On the other hand, in the KDE, the same $h$ in different dimensions will cover different numbers of observations, so the variability changes with the dimension.

The kNN approach has the advantage that it can be computed very efficiently using the k-d tree algorithm. This is a particularly useful feature when we have a huge amount of data and when the dimension of the data is large. However, a downside of the kNN estimator is that the estimated density often has a heavy tail, which implies it may not work well when $\|x\|$ is very large. Moreover, when $d = 1$, the density estimator $\hat{p}_k(x)$ is not even a density function (the integral is infinite!).

7.2 Basis approach

In this section, we assume that the PDF $p(x)$ is supported on $[0, 1]$. Namely, $p(x) > 0$ only in $[0, 1]$.
When the PDF $p(x)$ is smooth (in general, we need $p$ to be square integrable, i.e., $\int_0^1 p(x)^2\, dx < \infty$), we can use an orthonormal basis to approximate this function. This approach has several other names: the basis estimator, the projection estimator, and the orthogonal series estimator.
Let $\{\varphi_1(x), \varphi_2(x), \ldots, \varphi_m(x), \ldots\}$ be a set of basis functions. Then we have

$$p(x) = \sum_{j=1}^{\infty} \theta_j \varphi_j(x).$$

The quantity $\theta_j$ is the coefficient of each basis function. In signal processing, these quantities are referred to as the signals. The collection $\{\varphi_1(x), \varphi_2(x), \ldots\}$ is called an orthonormal basis if its elements have the following properties:

Unit norm:
$$\int_0^1 \varphi_j^2(x)\, dx = 1 \quad (7.1)$$
for every $j = 1, 2, \ldots$.

Orthogonality:
$$\int_0^1 \varphi_j(x)\varphi_k(x)\, dx = 0 \quad (7.2)$$
for every $j \neq k$.

Here are some concrete examples of bases:

- Cosine basis: $\varphi_1(x) = 1$, $\varphi_j(x) = \sqrt{2}\cos(\pi(j-1)x)$, $j = 2, 3, \ldots$.
- Trigonometric basis: $\varphi_1(x) = 1$, $\varphi_{2j}(x) = \sqrt{2}\cos(2\pi j x)$, $\varphi_{2j+1}(x) = \sqrt{2}\sin(2\pi j x)$, $j = 1, 2, \ldots$.

Often the basis is something we can choose, so it is known to us. What is unknown to us is the coefficients $\theta_1, \theta_2, \ldots$. Thus, the goal is to estimate these coefficients using the random sample $X_1, \ldots, X_n$.

How do we estimate these parameters? We start with some simple analysis. For any basis function $\varphi_j(x)$, consider the following integral:

$$E(\varphi_j(X_1)) = \int \varphi_j(x)\, dP(x) = \int \varphi_j(x) p(x)\, dx = \int \varphi_j(x) \sum_{k=1}^{\infty} \theta_k \varphi_k(x)\, dx = \sum_{k=1}^{\infty} \theta_k \underbrace{\int \varphi_j(x)\varphi_k(x)\, dx}_{=\,0\ \text{except when }k=j} = \sum_{k=1}^{\infty} \theta_k I(k = j) = \theta_j.$$
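To make the definitions concrete, here is a small Python sketch (mine, not the notes'): it checks (7.1) and (7.2) numerically for the cosine basis with a midpoint Riemann sum, and estimates a coefficient by a sample average. The Uniform[0,1] sample is an illustrative assumption; for that density $\theta_1 = 1$ and $\theta_j = 0$ for $j \ge 2$.

```python
import math, random

def phi(j, x):
    """Cosine basis on [0, 1]: phi_1(x) = 1, phi_j(x) = sqrt(2) cos(pi (j-1) x)."""
    return 1.0 if j == 1 else math.sqrt(2) * math.cos(math.pi * (j - 1) * x)

# Midpoint Riemann sums approximating the integrals in (7.1) and (7.2).
grid = [(i + 0.5) / 10_000 for i in range(10_000)]
unit_norm = sum(phi(2, x) ** 2 for x in grid) / 10_000          # close to 1
orthogonal = sum(phi(2, x) * phi(3, x) for x in grid) / 10_000  # close to 0

# Since E(phi_j(X_1)) = theta_j, estimate theta_j by the sample average.
random.seed(0)
sample = [random.random() for _ in range(5_000)]  # Uniform[0,1]: theta_2 = 0
theta2_hat = sum(phi(2, x) for x in sample) / len(sample)
```

Here `theta2_hat` fluctuates around its target $\theta_2 = 0$ with standard error of order $1/\sqrt{n}$, matching the variance computation that follows.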
Namely, the expectation of $\varphi_j(X_1)$ is exactly the coefficient $\theta_j$. This motivates us to use the sample average as an estimator:

$$\hat{\theta}_j = \frac{1}{n}\sum_{i=1}^{n} \varphi_j(X_i).$$

By construction, this estimator is unbiased, i.e., $E(\hat{\theta}_j) = \theta_j$. The variance of this estimator is

$$\mathrm{Var}(\hat{\theta}_j) = \frac{1}{n}\mathrm{Var}(\varphi_j(X_1)) = \frac{1}{n}\left(E(\varphi_j^2(X_1)) - E^2(\varphi_j(X_1))\right) = \frac{E(\varphi_j^2(X_1)) - \theta_j^2}{n} = \frac{\sigma_j^2}{n},$$

where $\sigma_j^2 = E(\varphi_j^2(X_1)) - \theta_j^2$.

In practice, we cannot use all the basis functions because there would be an infinite number of them to calculate. So we will use only $M$ basis functions in our estimator. Namely, our estimator is

$$\hat{p}_{n,M}(x) = \sum_{j=1}^{M} \hat{\theta}_j \varphi_j(x). \quad (7.3)$$

Later, in the asymptotic analysis, we will show that we should not choose $M$ to be too large because of the bias-variance tradeoff.

7.2.1 Asymptotic theory

To analyze the quality of our estimator, we use the mean integrated squared error (MISE). Namely, we want to analyze

$$\mathrm{MISE}(\hat{p}_{n,M}) = E\!\left(\int (\hat{p}_{n,M}(x) - p(x))^2\, dx\right) = \int \left[\mathrm{bias}^2(\hat{p}_{n,M}(x)) + \mathrm{Var}(\hat{p}_{n,M}(x))\right] dx.$$

Analysis of Bias. Because each $\hat{\theta}_j$ is an unbiased estimator of $\theta_j$, we have

$$E(\hat{p}_{n,M}(x)) = \sum_{j=1}^{M} E(\hat{\theta}_j)\varphi_j(x) = \sum_{j=1}^{M} \theta_j \varphi_j(x).$$

Thus, the bias at a point $x$ is

$$\mathrm{bias}(\hat{p}_{n,M}(x)) = E(\hat{p}_{n,M}(x)) - p(x) = \sum_{j=1}^{M} \theta_j \varphi_j(x) - \sum_{j=1}^{\infty} \theta_j \varphi_j(x) = -\sum_{j=M+1}^{\infty} \theta_j \varphi_j(x).$$
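Putting the pieces together, a minimal implementation of the estimator in (7.3) looks as follows. This is a sketch; the cosine basis and the Uniform[0,1] test sample are my own choices for illustration.

```python
import math, random

def phi(j, x):
    # Cosine basis on [0, 1] (one possible choice of orthonormal basis).
    return 1.0 if j == 1 else math.sqrt(2) * math.cos(math.pi * (j - 1) * x)

def basis_density(x, sample, M):
    """Basis estimator (7.3): p_hat_{n,M}(x) = sum_{j=1}^M theta_hat_j phi_j(x),
    where theta_hat_j is the sample average of phi_j(X_i)."""
    n = len(sample)
    theta_hat = [sum(phi(j, xi) for xi in sample) / n for j in range(1, M + 1)]
    return sum(t * phi(j, x) for j, t in zip(range(1, M + 1), theta_hat))

random.seed(1)
unif = [random.random() for _ in range(5_000)]   # true density: p(x) = 1 on [0, 1]
p_hat = basis_density(0.5, unif, 5)              # should be close to 1
```

Because $\hat{\theta}_1 = 1$ exactly and the remaining estimated coefficients concentrate around $0$, the estimate at any point is close to the true density value $1$ up to $O(1/\sqrt{n})$ noise.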
Thus, the integrated squared bias is

$$\int \mathrm{bias}^2(\hat{p}_{n,M}(x))\, dx = \int \left(\sum_{j=M+1}^{\infty} \theta_j \varphi_j(x)\right)^2 dx = \int \left(\sum_{j=M+1}^{\infty} \theta_j \varphi_j(x)\right)\left(\sum_{k=M+1}^{\infty} \theta_k \varphi_k(x)\right) dx = \sum_{j=M+1}^{\infty}\sum_{k=M+1}^{\infty} \theta_j \theta_k \underbrace{\int \varphi_j(x)\varphi_k(x)\, dx}_{=\,I(j=k)} = \sum_{j=M+1}^{\infty} \theta_j^2. \quad (7.4)$$

Namely, the bias is determined by the signal strength of the ignored basis functions, which makes sense: the bias should reflect the fact that we are not using all the basis functions, and if some important basis functions (the ones with large $\theta_j$) are ignored, the bias ought to be large.

We know that in the KDE and kNN, the bias is often associated with the smoothness of the density function. How does the smoothness come into play in this case? It turns out that if the density is smooth, the remaining signal $\sum_{j=M+1}^{\infty} \theta_j^2$ will also be small. To see this, we consider a very simple model by assuming that we are using the cosine basis and that

$$\int_0^1 |p''(x)|^2\, dx \le L \quad (7.5)$$

for some constant $L > 0$. Namely, the overall curvature of the density function is bounded. We use the fact that for the cosine basis function $\varphi_j(x)$,

$$\varphi_j'(x) = -\sqrt{2}\,\pi(j-1)\sin(\pi(j-1)x), \qquad \varphi_j''(x) = -\sqrt{2}\,\pi^2(j-1)^2\cos(\pi(j-1)x) = -\pi^2(j-1)^2\varphi_j(x).$$

Thus, equation (7.5) implies

$$L \ge \int_0^1 |p''(x)|^2\, dx = \int_0^1 \left|\sum_{j=1}^{\infty} \theta_j \varphi_j''(x)\right|^2 dx = \int_0^1 \left|\sum_{j=1}^{\infty} \pi^2(j-1)^2 \theta_j \varphi_j(x)\right|^2 dx = \sum_{j=1}^{\infty}\sum_{k=1}^{\infty} \pi^4 (j-1)^2 (k-1)^2 \theta_j \theta_k \underbrace{\int_0^1 \varphi_j(x)\varphi_k(x)\, dx}_{=\,I(j=k)} = \pi^4 \sum_{j=1}^{\infty} (j-1)^4 \theta_j^2.$$
Namely, equation (7.5) implies

$$\sum_{j=1}^{\infty} (j-1)^4 \theta_j^2 \le \frac{L}{\pi^4}. \quad (7.6)$$

This further implies that

$$\sum_{j=M+1}^{\infty} \theta_j^2 = O(M^{-4}). \quad (7.7)$$

An intuitive explanation is as follows. Equation (7.6) implies that when $j \to \infty$, the signal $\theta_j^2 = O(j^{-5})$. The reason is: if $\theta_j^2 \sim j^{-5}$, the left-hand side of equation (7.6) becomes

$$\sum_j (j-1)^4 \theta_j^2 \sim \sum_j (j-1)^4 j^{-5} \sim \sum_j \frac{1}{j} = \infty!$$

Thus, the tail signals have to shrink toward $0$ at rate $O(j^{-5})$. As a result,

$$\sum_{j=M+1}^{\infty} \theta_j^2 = \sum_{j=M+1}^{\infty} O(j^{-5}) = O(M^{-4}).$$

Equations (7.7) and (7.4) together imply that the bias is at the rate

$$\int \mathrm{bias}^2(\hat{p}_{n,M}(x))\, dx = O(M^{-4}).$$

Analysis of Variance. Now we turn to the analysis of the variance:

$$\int \mathrm{Var}(\hat{p}_{n,M}(x))\, dx = \int \mathrm{Var}\!\left(\sum_{j=1}^{M} \hat{\theta}_j \varphi_j(x)\right) dx = \int \left[\sum_{j=1}^{M} \varphi_j^2(x)\mathrm{Var}(\hat{\theta}_j) + \sum_{j \neq k} \varphi_j(x)\varphi_k(x)\mathrm{Cov}(\hat{\theta}_j, \hat{\theta}_k)\right] dx$$

$$= \sum_{j=1}^{M} \mathrm{Var}(\hat{\theta}_j) \underbrace{\int \varphi_j^2(x)\, dx}_{=\,1} + \sum_{j \neq k} \mathrm{Cov}(\hat{\theta}_j, \hat{\theta}_k) \underbrace{\int \varphi_j(x)\varphi_k(x)\, dx}_{=\,0} = \sum_{j=1}^{M} \mathrm{Var}(\hat{\theta}_j) = \sum_{j=1}^{M} \frac{\sigma_j^2}{n} = O\!\left(\frac{M}{n}\right).$$

Now, putting the bias and variance together, we obtain the rate of the MISE:

$$\mathrm{MISE}(\hat{p}_{n,M}) = \int \left[\mathrm{bias}^2(\hat{p}_{n,M}(x)) + \mathrm{Var}(\hat{p}_{n,M}(x))\right] dx = O(M^{-4}) + O\!\left(\frac{M}{n}\right).$$

Thus, the optimal choice is $M = M_n = C \cdot n^{1/5}$ for some positive constant $C$, and this leads to

$$\mathrm{MISE}(\hat{p}_{n,M_n}) = O(n^{-4/5}),$$

which is the same rate as the KDE and kNN.
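The tail bound in (7.7) is easy to check numerically: if $\theta_j^2$ decays like $j^{-5}$ (the borderline rate above), then $M^4 \sum_{j > M} \theta_j^2$ stays bounded as $M$ grows. A quick sketch:

```python
def tail(M, terms=500_000):
    """Truncated tail sum of theta_j^2 = j^(-5) beyond index M."""
    return sum(j ** -5.0 for j in range(M + 1, M + 1 + terms))

# M^4 * tail(M) stays bounded (it approaches 1/4, since the tail sum
# behaves like the integral of x^(-5) from M to infinity), illustrating (7.7).
scaled = [M ** 4 * tail(M) for M in (10, 20, 40)]
```

The scaled values creep up toward the limiting constant $1/4$ as $M$ increases, confirming the $O(M^{-4})$ rate of the ignored signal.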
Remark.

(Sobolev space) Again, we see that the bias depends on the smoothness of the density function. Here, as long as the density function has bounded overall curvature (second derivative), we have an optimal MISE at the rate $O(n^{-4/5})$. In fact, if the density function has stronger smoothness, such as a bounded overall third derivative, we will have an even faster convergence rate, $O(n^{-6/7})$. Now, for any positive integer $\beta$ and a positive number $L > 0$, we define

$$W(\beta, L) = \left\{p : \int |p^{(\beta)}(x)|^2\, dx \le L < \infty\right\}$$

to be a collection of smooth density functions. Then the integrated squared bias between a basis estimator and any density $p \in W(\beta, L)$ is at the rate $O(M^{-2\beta})$ and the optimal convergence rate is $O(n^{-2\beta/(2\beta+1)})$. The collection $W(\beta, L)$ is a space of functions known as a Sobolev space.

(Tuning parameters) The number of basis functions $M$ in a basis estimator, the number of neighbors $k$ in the kNN approach, and the smoothing bandwidth $h$ in the KDE all play a very similar role in the bias-variance tradeoff. These quantities are called tuning parameters in statistics and machine learning. Just as we have seen in the analysis of the KDE, choosing these parameters is often a very difficult task because the optimal choice depends on the actual density function, which is unknown to us. A principled approach to choosing the tuning parameters is based on minimizing an error estimator. For instance, in the basis estimator, we try to find an estimator $\widehat{\mathrm{MISE}}(\hat{p}_{n,M})$ of the MISE. When we change the value of $M$, this error estimator also changes. We then choose $M$ by minimizing the estimated error, i.e.,

$$\hat{M} = \underset{M}{\mathrm{argmin}}\ \widehat{\mathrm{MISE}}(\hat{p}_{n,M}).$$

Then we pick $\hat{M}$ basis functions and construct our estimator.
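One concrete way to build such an error estimator for the basis estimator is the following (a sketch under my own assumptions; the notes do not spell out a specific estimator). Since the MISE equals $\sum_{j \le M} \sigma_j^2/n + \sum_{j > M} \theta_j^2$, which is $\sum_{j \le M}(\sigma_j^2/n - \theta_j^2)$ plus a constant not depending on $M$, we can plug in $\hat{\theta}_j$ and $\hat{\sigma}_j^2$ and minimize over $M$:

```python
import math, random

def phi(j, x):
    # Cosine basis on [0, 1] (an assumed choice).
    return 1.0 if j == 1 else math.sqrt(2) * math.cos(math.pi * (j - 1) * x)

def select_M(sample, M_max):
    """Choose M by minimizing a crude plug-in risk estimate:
    sum_{j<=M} (sigma_hat_j^2 / n - theta_hat_j^2), which tracks the MISE
    up to an additive constant that does not depend on M."""
    n = len(sample)
    running, risks = 0.0, []
    for j in range(1, M_max + 1):
        vals = [phi(j, x) for x in sample]
        theta_hat = sum(vals) / n
        sigma2_hat = sum(v * v for v in vals) / n - theta_hat ** 2
        running += sigma2_hat / n - theta_hat ** 2
        risks.append(running)
    return 1 + risks.index(min(risks))   # argmin over M = 1, ..., M_max

random.seed(2)
data = [random.betavariate(2, 2) for _ in range(2_000)]  # a smooth test density
M_hat = select_M(data, 20)
```

The selected $\hat{M}$ includes a basis function exactly when its estimated signal $\hat{\theta}_j^2$ exceeds its estimated variance cost $\hat{\sigma}_j^2/n$, which is the bias-variance tradeoff made operational.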
Outlie CSCI567: Machie Learig Sprig 209 Gaussia mixture models Prof. Victor Adamchik 2 Desity estimatio U of Souther Califoria Mar. 26, 209 3 Naive Bayes Revisited March 26, 209 / 57 March 26, 209 2 /
More informationSieve Estimators: Consistency and Rates of Convergence
EECS 598: Statistical Learig Theory, Witer 2014 Topic 6 Sieve Estimators: Cosistecy ad Rates of Covergece Lecturer: Clayto Scott Scribe: Julia KatzSamuels, Brado Oselio, PiYu Che Disclaimer: These otes
More informationLecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables
CSCIB609: A Theorist s Toolkit, Fall 06 Aug 3 Lecture 0: the Cetral Limit Theorem Lecturer: Yua Zhou Scribe: Yua Xie & Yua Zhou Cetral Limit Theorem for iid radom variables Let us say that we wat to aalyze
More informationKolmogorovSmirnov type Tests for Local Gaussianity in HighFrequency Data
Proceedigs 59th ISI World Statistics Cogress, 530 August 013, Hog Kog (Sessio STS046) p.09 KolmogorovSmirov type Tests for Local Gaussiaity i HighFrequecy Data George Tauche, Duke Uiversity Viktor Todorov,
More informationChapter 6 Principles of Data Reduction
Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a
More informationSeunghee Ye Ma 8: Week 5 Oct 28
Week 5 Summary I Sectio, we go over the Mea Value Theorem ad its applicatios. I Sectio 2, we will recap what we have covered so far this term. Topics Page Mea Value Theorem. Applicatios of the Mea Value
More informationMath 113 Exam 3 Practice
Math Exam Practice Exam will cover..9. This sheet has three sectios. The first sectio will remid you about techiques ad formulas that you should kow. The secod gives a umber of practice questios for you
More informationMath 525: Lecture 5. January 18, 2018
Math 525: Lecture 5 Jauary 18, 2018 1 Series (review) Defiitio 1.1. A sequece (a ) R coverges to a poit L R (writte a L or lim a = L) if for each ǫ > 0, we ca fid N such that a L < ǫ for all N. If the
More informationx a x a Lecture 2 Series (See Chapter 1 in Boas)
Lecture Series (See Chapter i Boas) A basic ad very powerful (if pedestria, recall we are lazy AD smart) way to solve ay differetial (or itegral) equatio is via a series expasio of the correspodig solutio
More informationINFINITE SEQUENCES AND SERIES
11 INFINITE SEQUENCES AND SERIES INFINITE SEQUENCES AND SERIES 11.4 The Compariso Tests I this sectio, we will lear: How to fid the value of a series by comparig it with a kow series. COMPARISON TESTS
More informationSequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence
Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece 1, 1, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet
More informationMath 2784 (or 2794W) University of Connecticut
ORDERS OF GROWTH PAT SMITH Math 2784 (or 2794W) Uiversity of Coecticut Date: Mar. 2, 22. ORDERS OF GROWTH. Itroductio Gaiig a ituitive feel for the relative growth of fuctios is importat if you really
More informationProduct measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.
Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the
More informationMachine Learning Brett Bernstein
Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio
More information4.3 Growth Rates of Solutions to Recurrences
4.3. GROWTH RATES OF SOLUTIONS TO RECURRENCES 81 4.3 Growth Rates of Solutios to Recurreces 4.3.1 Divide ad Coquer Algorithms Oe of the most basic ad powerful algorithmic techiques is divide ad coquer.
More informationMA131  Analysis 1. Workbook 9 Series III
MA3  Aalysis Workbook 9 Series III Autum 004 Cotets 4.4 Series with Positive ad Negative Terms.............. 4.5 Alteratig Series.......................... 4.6 Geeral Series.............................
More informationBasis for simulation techniques
Basis for simulatio techiques M. Veeraraghava, March 7, 004 Estimatio is based o a collectio of experimetal outcomes, x, x,, x, where each experimetal outcome is a value of a radom variable. x i. Defiitios
More informationLecture 2: April 3, 2013
TTIC/CMSC 350 Mathematical Toolkit Sprig 203 Madhur Tulsiai Lecture 2: April 3, 203 Scribe: Shubhedu Trivedi Coi tosses cotiued We retur to the coi tossig example from the last lecture agai: Example. Give,
More information
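The kNN density estimator defined earlier, p_k(x) = k / (n V_d R_k^d(x)), is straightforward to compute directly. Below is a minimal sketch in Python (the function name, the example data, and the query point are illustrative, not from the lecture); it sorts the distances from x to every observation, takes the k-th smallest as R_k(x), and plugs in the volume V_d of the unit d-ball.

```python
import numpy as np
from math import gamma, pi

def knn_density(x, data, k):
    """kNN density estimate p_k(x) = k / (n * V_d * R_k(x)^d),
    where R_k(x) is the distance from x to its k-th nearest neighbor."""
    data = np.atleast_2d(np.asarray(data, dtype=float))
    if data.shape[0] == 1:          # treat a flat list as n points in R^1
        data = data.T
    n, d = data.shape
    # distances from x to every observation, sorted ascending
    dists = np.sort(np.linalg.norm(data - np.asarray(x, dtype=float), axis=1))
    r_k = dists[k - 1]              # R_k(x): k-th nearest-neighbor distance
    v_d = pi ** (d / 2) / gamma(d / 2 + 1)   # volume of the unit d-ball
    return k / (n * v_d * r_k ** d)

# Illustrative 1-d example: with k = 2, the two nearest points to x = 5
# here are at distances 1 and 3, so R_2(5) = 3 and
# p_2(5) = 2 / (8 * 2 * 3) = 1/24.
p = knn_density(5.0, [1, 2, 6, 10, 13, 14, 20, 33], k=2)
```

In d = 1 the unit-ball volume is V_1 = 2, so the estimator reduces to k / (2 n R_k(x)), matching the d = 1 formula in the lecture; the same function handles d = 2 and d = 3 through the Gamma-function expression for V_d.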