Lecture 7: Density Estimation: knearest Neighbor and Basis Approach


 Scot Fitzgerald
 1 years ago
 Views:
Transcription
1 STAT 425: Itroductio to Noparametric Statistics Witer 28 Lecture 7: Desity Estimatio: knearest Neighbor ad Basis Approach Istructor: YeChi Che Referece: Sectio 8.4 of All of Noparametric Statistics. 7. kearest eighbor kearest eighbor (knn is a cool ad powerful idea for oparametric estimatio. Today we will talk about its applicatio i desity estimatio. I the future, we will lear how to use it for regressio aalysis ad classificatio. Let X,, X p be our radom sample. Assume each observatio has d differet variables; amely, X i R d. For a give poit x, we first rak every observatio based o its distace to x. Let R k (x deotes the distace from x to its kth earest eighbor poit. For a give poit x, the knn desity estimator estimates the desity by where V d p k (x k πd/2 Γ( d 2 V d Rk d(x k Volume of a ddimesioal ball with radius beig R k (x, + is the volume of a uit ddimesioal ball ad Γ(x is the Gamma fuctio. Here are the results whe d, 2, ad 3 d, V 2: p k (x k d 2, V π: p k (x k d 3, V 4 3 π: p k(x k 2R k (x. πrk 2 (x. 3 4πRk 3 (x. What is the ituitio of a knn desity estimator? By the defiitio of R k (x, the ball cetered at x with radius R k (x B(x, R k (x {y : x y R k (x} satisfies the fact that k I(X i B(x, R k (x. i Namely, ratio of observatios withi B(x, R k (x is k/. Recall from the relatio betwee EDF ad CDF, the quatity I(X i B(x, R k (x i ca be viewed as a estimator of the quatity P (X i B(x, R k (x 7 B(x,R k (x p(ydy.
2 72 Lecture 7: Desity Estimatio: knearest Neighbor ad Basis Approach Whe is large ad k is relatively small compared to, R k (x will be small because the ratio k is small. Thus, the desity p(y withi the regio B(x, R k (x will ot chage too much. Namely, p(y p(x for every y B(x, R k (x. Note that p(x is the ceter of the ball B(x, R k (x. Therefore, P (X i B(x, R k (x B(x,R k (x p(ydy p(x dy p(x V d Rk(x. d B(x,R k (x This quatity will be the target of the estimator i I(X i B(x, R k (x, which equals to k. As a result, we ca say that which leads to This motivates us to use as a desity estimator. p(x V d R d k(x P (X i B(x, R k (x I(X i B(x, R k (x k, i p(x V d Rk(x d k p(x k V d Rk d(x p k (x k V d Rk d(x Example. We cosider a simple example i d. Assume our data is X {, 2, 6,, 3, 4, 2, 33}. What is the knn desity estimator at x 5 with k 2? First, we calculate R 2 (5. The distace from x 5 to each data poit i X is {4, 3,, 6, 8, 9, 5, 28}. Thus, R 2 (5 3 ad p k ( R 2 (5 24. What will the desity estimator be whe we choose k 5? I this case, R 5 (5 8 so p k ( R 5 ( Now we see that differet value of k gives a differet desity estimate eve at the same x. How do we choose k? Well, just as the smoothig badwidth i the KDE, it is a very difficult problem i practice. However, we ca do some theoretical aalysis to get a rough idea about how k should be chagig with respect to the sample size. 7.. Asymptotic theory The asymptotic aalysis of a knn estimator is quiet complicated so here I oly stated its result i d. The bias of the knn estimator is p ( ( 2 ( 2 (x k p(x k bias( p k (x E( p k (x p(x b p 2 + b 2 (x k + o +, k where b ad b 2 are two costats. The variace of the knn estimator is ( Var( p k (x v p2 (x + o, k k
3 Lecture 7: Desity Estimatio: knearest Neighbor ad Basis Approach 73 where v is a costat. The quatity k is somethig we ca choose. We eed k whe to make sure both bias ad variace coverge to. However, how k diverges affects the quality of estimatio. Whe k is large, the variace is small while the bias is large. Whe k is small, the variace is large ad the bias teds to be small but it could also be large (the secod compoet i the bias will be large. To balace the bias ad variace, we cosider the mea square error, which is at the rate ( k 4 MSE( p k (x O 4 +. k This motivates us to choose k C 4 5 for some costat C. This leads to the optimal covergece rate for a knn desity estimator. Remark. MSE( p k,opt (x O( 4 5 Whe we cosider a ddimesioal data, the bias will be of the rate ( ( 2 k d bias( p k (x O + k ad the variace is still at rate Var( p k (x O (. k This shows a very differet pheomeo compared to the KDE. I KDE, the rate of variace depeds o the dimesio whereas the bias remais the same. I knn, the rate of variace stays the same rate but the rate of bias chages with respect to the dimesio. Oe ituitio is that o matter what dimesio is, the ball B(x, R k (x always cotai k observatios. Thus, the variability of a knn estimator is caused by k poits, which is idepedet of the dimesio. O the other had, i KDE, the same h i differet dimesios will cover differet umber of observatios so the variability chages with respect to the dimesio. The knn approach has a advatage that it ca be computed very efficietly usig kdtree algorithm. This is a particularly useful feature whe we have a huge amout of data ad whe the dimesio of the data is large. However, a dowside of the knn is that the desity ofte has a heavytail, which implies it may ot work well whe x is very large. Moreover, whe d, the desity estimator p k (x is ot eve a desity fuctio (the itegral is ifiite!. 7.2 Basis approach I this sectio, we assume that the PDF p(x is supported o [, ]. Namely, p(x > oly i [, ]. Whe the PDF p(x is smooth (i geeral, we eed p to be squared itegrable, i.e., p(x2 dx <, we ca use a orthoormal basis to approximate this fuctio. This approach has several other ames: the basis estimator, projectio estimator, ad a orthogoal series estimator.
4 74 Lecture 7: Desity Estimatio: knearest Neighbor ad Basis Approach Let {φ (x, φ 2 (x,, φ m (x, } be a set of basis fuctios. The we have p(x θ j φ j (x. The quatity θ j is the coefficiet of each basis. I sigal process, these quatities are refereed to as the sigal. The collectio {φ (x, φ 2 (x,, φ m (x, } is called a basis if its elemets have the followig property: Uit : φ 2 j(xdx (7. for every j,. Orthoormal: φ j (xφ k (xdx (7.2 for every j k,. Here are some cocrete examples of the basis: Cosie basis: φ (x, φ j (x 2 cos(π(j x, j 2, 3,, Trigoometric basis: φ (x, φ 2j (x 2 cos(2πjx, φ 2j+ (x 2 si(2πjx, j, 2, Ofte the basis is somethig we ca choose so it is kow to us. What is ukow to use is the coefficiets θ, θ 2,,. Thus, the goal is to estimate these coefficiets usig the radom sample X,, X. How do we estimate these parameters? We start with some simple aalysis. For ay basis φ j (x, cosider the followig itegral: E(φ j (X φ j (xdp (x φ j (xp(xdx φ j (x θ k φ k (xdx k θ k φ j (xφ k (xdx k }{{} except k j θ k I(k j k θ j.
5 Lecture 7: Desity Estimatio: knearest Neighbor ad Basis Approach 75 Namely, the expectatio of φ j (X is exactly the coefficiet θ j. This motivates us to use the sample average as a estimator: θ j φ j (X i. By costructio, this estimator is ubiased, i.e., E( θ j θ j. The variace of this estimator is i where σ 2 j E(φ2 j (X φ 2 j. Var( θ j Var(φ j(x ( E(φ 2 j (X E 2 (φ j (X (E(φ2 j(x φ 2 j σ2 j, I practice, we caot use all the basis because there will be ifiite umber of them beig calculated. So we will use oly M basis as our estimator. Namely, our estimator is p,m (x θ j φ j (x. (7.3 Later i the asymptotic aalysis, we will show that we should ot choose M to be too large because of the biasvariace tradeoff Asymptotic theory To aalyze the quality of our estimator, we use the mea itegrated squared error (MISE. Namely, we wat to aalyze MISE( p,m E( ( p,m (x p(x 2 dx [ bias 2 ( p,m (x + Var( p,m (x ] dx Aalysis of Bias. Because each θ j is a ubiased estimator of θ j, we have E( p,m (x E( θ j φ j (x E( θ j φ j (x θ j φ j (x. Thus, the bias at poit x is bias( p,m (x E( p,m (x p(x θ j φ j (x θ j φ j (x θ j φ j (x. jm+
6 76 Lecture 7: Desity Estimatio: knearest Neighbor ad Basis Approach Thus, the itegrated squared bias is bias 2 ( p,m (xdx jm+ jm+ jm+ km+ jm+ θ 2 j. 2 θ j φ j (x dx ( θ j φ j (x km+ θ k φ k (x θ j θ k φ j (xφ k (xdx } {{ } I(jk Namely, the bias is determied by the sigal stregth of the igored basis, which makes sese because the bias should be reflectig the fact that we are ot usig all the basis ad if there are some importat basis (the oes with large θ j beig igored, the bias ought to be large. We kow that i KDE ad knn, the bias is ofte associated with the smoothess of the desity fuctio. How does the smoothess comes ito play i this case? It turs out that if the desity is smooth, the remaiig sigals jm+ θ2 j will also be small. To see this, we cosider a very simple model by assumig that we are usig the cosie basis ad dx (7.4 p (x 2 dx L (7.5 for some costat L >. Namely, the overall curvature of the desity fuctio is bouded. Usig the fact that for cosie basis fuctio φ j (x, Thus, equatio (7.5 implies φ j(x 2π(j si(π(j x, φ j (x 2π 2 (j 2 cos(π(j x π 2 (j 2 φ j (x. L p (x 2 dx π 4 θ j φ j (x 2 dx π 2 (j 2 θ j φ j (x 2 dx ( π 2 (j 2 θ j φ j (x π 2 (k 2 θ k φ k (x dx k k (j 2 (k 2 θ j θ k φ j (xφ k (xdx } {{ } I(jk π 4 (j 2 θj 2.
7 Lecture 7: Desity Estimatio: knearest Neighbor ad Basis Approach 77 Namely, equatio (7.5 implies This further implies that (j 4 θj 2 L π 4. (7.6 θj 2 O(M 4. (7.7 jm+ A ituitive explaatio is as follows. Equatio (7.6 implies that whe j, the sigal θj 2 O(j 5. The reaso is: if θj 2 j, equatio (7.6 becomes 5 (j 4 θj 2 (j 4 j 5 j! Thus, the tail sigals have to be shrikig toward at rate O(j 5. As a result, jm+ θ2 j jm+ O(j 5 O(M 4. Equatio (7.7 ad (7.4 together imply that the bias is at rate bias 2 ( p,m (xdx O(M 4. Aalysis of Variace. Now we tur to the aalysis of variace. Var( p,m (xdx Var θ j φ j (x dx φ 2 j(xvar( θ j + Var( θ j Var( θ j O σ 2 j ( M φ 2 j(xdx + } {{ } j k j k Now puttig both bias ad variace together, we obtai the rate of the MISE MISE( p,m. φ j (xφ k (xcov( θ j, θ k dx [ bias 2 ( p,m (x + Var( p,m (x ] dx O ( M 4 Cov( θ j, θ k φ j (xφ k (xdx } {{ } + O ( M. Thus, the optimal choice is M M C /5 for some positive costat C. Ad this leads to which is the same rate as the KDE ad knn. Remark. MISE( p,m O( 4/5,
8 78 Lecture 7: Desity Estimatio: knearest Neighbor ad Basis Approach (Sobolev space Agai, we see that the bias is depedet o the smoothess of the desity fuctio. Here, as log as the desity fuctio has a overall curvature (secod derivative beig bouded, we have a optimal MISE at the rate of O( 4/5. I fact, if the desity fuctio has stroger smoothess, such as the overall third derivatives beig bouded, we will have a eve faster covergece rate O( 6/7. Now for ay positive iteger β ad a positive umber L >, we defie W (β, L { p : } p (β (x 2 dx L < be a collectio of smooth desity fuctios. The the bias betwee a basis estimator ad ay desity p W (β, L is at the rate O(M 2β ad the optimal covergece rate is O( 2β 2β+. The collectio W (β, L is a space of fuctios ad is kow as the Sobolev Space. (Tuig parameter The umber of basis M i a basis estimator, the umber of eighborhood k i the knn approach, ad the smoothig badwidth h i the KDE all play a very similar role i biasvariace tradeoff. These quatities are called tuig parameters i statistics ad machie learig. Just like what we have see i the aalysis of the KDE, choosig these parameters is ofte a very difficult task because the optimal choice depeds o the actual desity fuctio, which is ukow to us. A priciple approach to choosig the tuig parameters is based o miimizig a error estimator. For istace, i the basis estimator, we try to fid a estimator for the MISE, MISE( p,m. Whe we chage value of M, this error estimator will also chage. We the choose M by miimizig the error, i.e., M argmi M MISE( p,m. The we pick M basis ad costruct our estimator.
Lecture 2: Monte Carlo Simulation
STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: YeChi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?
More informationLecture 15: Learning Theory: Concentration Inequalities
STAT 425: Itroductio to Noparametric Statistics Witer 208 Lecture 5: Learig Theory: Cocetratio Iequalities Istructor: YeChi Che 5. Itroductio Recall that i the lecture o classificatio, we have see that
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 11
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract We will itroduce the otio of reproducig kerels ad associated Reproducig Kerel Hilbert Spaces (RKHS). We will cosider couple
More informationStudy the bias (due to the nite dimensional approximation) and variance of the estimators
2 Series Methods 2. Geeral Approach A model has parameters (; ) where is itedimesioal ad is oparametric. (Sometimes, there is o :) We will focus o regressio. The fuctio is approximated by a series a ite
More informationJanuary 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS
Jauary 25, 207 INTRODUCTION TO MATHEMATICAL STATISTICS Abstract. A basic itroductio to statistics assumig kowledge of probability theory.. Probability I a typical udergraduate problem i probability, we
More informationResampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.
Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator
More informationLecture 10 October Minimaxity and least favorable prior sequences
STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least
More informationJacob Hays Amit Pillay James DeFelice 4.1, 4.2, 4.3
NoParametric Techiques Jacob Hays Amit Pillay James DeFelice 4.1, 4.2, 4.3 Parametric vs. NoParametric Parametric Based o Fuctios (e.g Normal Distributio) Uimodal Oly oe peak Ulikely real data cofies
More information6.3 Testing Series With Positive Terms
6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial
More informationMa 530 Introduction to Power Series
Ma 530 Itroductio to Power Series Please ote that there is material o power series at Visual Calculus. Some of this material was used as part of the presetatio of the topics that follow. What is a Power
More informationLecture Chapter 6: Convergence of Random Sequences
ECE5: Aalysis of Radom Sigals Fall 6 Lecture Chapter 6: Covergece of Radom Sequeces Dr Salim El Rouayheb Scribe: Abhay Ashutosh Doel, Qibo Zhag, Peiwe Tia, Pegzhe Wag, Lu Liu Radom sequece Defiitio A ifiite
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationDiscrete Mathematics for CS Spring 2008 David Wagner Note 22
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationECE 901 Lecture 13: Maximum Likelihood Estimation
ECE 90 Lecture 3: Maximum Likelihood Estimatio R. Nowak 5/7/009 The focus of this lecture is to cosider aother approach to learig based o maximum likelihood estimatio. Ulike earlier approaches cosidered
More informationEfficient GMM LECTURE 12 GMM II
DECEMBER 1 010 LECTURE 1 II Efficiet The estimator depeds o the choice of the weight matrix A. The efficiet estimator is the oe that has the smallest asymptotic variace amog all estimators defied by differet
More informationEcon 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chisquare Distribution, Student s t distribution 1.
Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chisquare Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio
More informationSection 11.8: Power Series
Sectio 11.8: Power Series 1. Power Series I this sectio, we cosider geeralizig the cocept of a series. Recall that a series is a ifiite sum of umbers a. We ca talk about whether or ot it coverges ad i
More informationLecture 19: Convergence
Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may
More informationEmpirical Process Theory and Oracle Inequalities
Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi
More informationLecture 3. Properties of Summary Statistics: Sampling Distribution
Lecture 3 Properties of Summary Statistics: Samplig Distributio Mai Theme How ca we use math to justify that our umerical summaries from the sample are good summaries of the populatio? Lecture Summary
More informationPattern Classification, Ch4 (Part 1)
Patter Classificatio All materials i these slides were take from Patter Classificatio (2d ed) by R O Duda, P E Hart ad D G Stork, Joh Wiley & Sos, 2000 with the permissio of the authors ad the publisher
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationECE 901 Lecture 12: Complexity Regularization and the Squared Loss
ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality
More informationEECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1
EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum
More information9.3 The INTEGRAL TEST; pseries
Lecture 9.3 & 9.4 Math 0B Nguye of 6 Istructor s Versio 9.3 The INTEGRAL TEST; pseries I this ad the followig sectio, you will study several covergece tests that apply to series with positive terms. Note
More informationRandom Variables, Sampling and Estimation
Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig
More informationMonte Carlo Integration
Mote Carlo Itegratio I these otes we first review basic umerical itegratio methods (usig Riema approximatio ad the trapezoidal rule) ad their limitatios for evaluatig multidimesioal itegrals. Next we itroduce
More informationChapter 4. Fourier Series
Chapter 4. Fourier Series At this poit we are ready to ow cosider the caoical equatios. Cosider, for eample the heat equatio u t = u, < (4.) subject to u(, ) = si, u(, t) = u(, t) =. (4.) Here,
More information(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3
MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special
More informationIntroduction to Machine Learning DIS10
CS 189 Fall 017 Itroductio to Machie Learig DIS10 1 Fu with Lagrage Multipliers (a) Miimize the fuctio such that f (x,y) = x + y x + y = 3. Solutio: The Lagragia is: L(x,y,λ) = x + y + λ(x + y 3) Takig
More information8.1 Introduction. 8. Nonparametric Inference Using Orthogonal Functions
8. Noparametric Iferece Usig Orthogoal Fuctios 1. Itroductio. Noparametric Regressio 3. Irregular Desigs 4. Desity Estimatio 5. Compariso of Methods 8.1 Itroductio Use a orthogoal basis to covert oparametric
More informationLecture Notes for Analysis Class
Lecture Notes for Aalysis Class Topological Spaces A topology for a set X is a collectio T of subsets of X such that: (a) X ad the empty set are i T (b) Uios of elemets of T are i T (c) Fiite itersectios
More information62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 +
62. Power series Defiitio 16. (Power series) Give a sequece {c }, the series c x = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + is called a power series i the variable x. The umbers c are called the coefficiets of
More informationLet us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.
Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,
More informationConvergence of random variables. (telegram style notes) P.J.C. Spreij
Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space
More informationLinear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d
Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y
More informationKernel density estimator
Jauary, 07 NONPARAMETRIC ERNEL DENSITY ESTIMATION I this lecture, we discuss kerel estimatio of probability desity fuctios PDF Noparametric desity estimatio is oe of the cetral problems i statistics I
More informationMAT1026 Calculus II Basic Convergence Tests for Series
MAT026 Calculus II Basic Covergece Tests for Series Egi MERMUT 202.03.08 Dokuz Eylül Uiversity Faculty of Sciece Departmet of Mathematics İzmir/TURKEY Cotets Mootoe Covergece Theorem 2 2 Series of Real
More informationAlgorithms for Clustering
CR2: Statistical Learig & Applicatios Algorithms for Clusterig Lecturer: J. Salmo Scribe: A. Alcolei Settig: give a data set X R p where is the umber of observatio ad p is the umber of features, we wat
More informationOutline. Linear regression. Regularization functions. Polynomial curve fitting. Stochastic gradient descent for regression. MLE for regression
REGRESSION 1 Outlie Liear regressio Regularizatio fuctios Polyomial curve fittig Stochastic gradiet descet for regressio MLE for regressio Stepwise forward regressio Regressio methods Statistical techiques
More informationLecture 13: Maximum Likelihood Estimation
ECE90 Sprig 007 Statistical Learig Theory Istructor: R. Nowak Lecture 3: Maximum Likelihood Estimatio Summary of Lecture I the last lecture we derived a risk (MSE) boud for regressio problems; i.e., select
More informationInfinite Sequences and Series
Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet
More informationLECTURE 2 LEAST SQUARES CROSSVALIDATION FOR KERNEL DENSITY ESTIMATION
Jauary 3 07 LECTURE LEAST SQUARES CROSSVALIDATION FOR ERNEL DENSITY ESTIMATION Noparametric kerel estimatio is extremely sesitive to te coice of badwidt as larger values of result i averagig over more
More informationSTAT Homework 2  Solutions
STAT36700 Homework  Solutios Fall 08 September 4, 08 This cotais solutios for Homework. Please ote that we have icluded several additioal commets ad approaches to the problems to give you better isight.
More informationMa 530 Infinite Series I
Ma 50 Ifiite Series I Please ote that i additio to the material below this lecture icorporated material from the Visual Calculus web site. The material o sequeces is at Visual Sequeces. (To use this li
More information6.867 Machine learning, lecture 7 (Jaakkola) 1
6.867 Machie learig, lecture 7 (Jaakkola) 1 Lecture topics: Kerel form of liear regressio Kerels, examples, costructio, properties Liear regressio ad kerels Cosider a slightly simpler model where we omit
More informationChapter 10: Power Series
Chapter : Power Series 57 Chapter Overview: Power Series The reaso series are part of a Calculus course is that there are fuctios which caot be itegrated. All power series, though, ca be itegrated because
More informationSequences and Series of Functions
Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges
More information1 Approximating Integrals using Taylor Polynomials
Seughee Ye Ma 8: Week 7 Nov Week 7 Summary This week, we will lear how we ca approximate itegrals usig Taylor series ad umerical methods. Topics Page Approximatig Itegrals usig Taylor Polyomials. Defiitios................................................
More informationEcon 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara
Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio
More informationMA131  Analysis 1. Workbook 2 Sequences I
MA3  Aalysis Workbook 2 Sequeces I Autum 203 Cotets 2 Sequeces I 2. Itroductio.............................. 2.2 Icreasig ad Decreasig Sequeces................ 2 2.3 Bouded Sequeces..........................
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 12
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig
More informationLecture 9: Regression: Regressogram and Kernel Regression
STAT 425: Itroductio to Noparametric Statistics Witer 208 Lecture 9: Regressio: Regressogram ad erel Regressio Istructor: YeCi Ce Referece: Capter 5 of All of oparametric statistics 9 Itroductio Let X,
More informationEconomics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator
Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters
More informationStatistics 511 Additional Materials
Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability
More information3. Z Transform. Recall that the Fourier transform (FT) of a DT signal xn [ ] is ( ) [ ] = In order for the FT to exist in the finite magnitude sense,
3. Z Trasform Referece: Etire Chapter 3 of text. Recall that the Fourier trasform (FT) of a DT sigal x [ ] is ω ( ) [ ] X e = j jω k = xe I order for the FT to exist i the fiite magitude sese, S = x [
More informationLecture 33: Bootstrap
Lecture 33: ootstrap Motivatio To evaluate ad compare differet estimators, we eed cosistet estimators of variaces or asymptotic variaces of estimators. This is also importat for hypothesis testig ad cofidece
More informationMachine Learning for Data Science (CS 4786)
Machie Learig for Data Sciece CS 4786) Lecture & 3: Pricipal Compoet Aalysis The text i black outlies high level ideas. The text i blue provides simple mathematical details to derive or get to the algorithm
More informationMath 155 (Lecture 3)
Math 55 (Lecture 3) September 8, I this lecture, we ll cosider the aswer to oe of the most basic coutig problems i combiatorics Questio How may ways are there to choose a elemet subset of the set {,,,
More informationLecture 3: August 31
36705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,
More informationMetric Space Properties
Metric Space Properties Math 40 Fial Project Preseted by: Michael Brow, Alex Cordova, ad Alyssa Sachez We have already poited out ad will recogize throughout this book the importace of compact sets. All
More informationFall 2013 MTH431/531 Real analysis Section Notes
Fall 013 MTH431/531 Real aalysis Sectio 8.18. Notes Yi Su 013.11.1 1. Defiitio of uiform covergece. We look at a sequece of fuctios f (x) ad study the coverget property. Notice we have two parameters
More informationMATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4
MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.
More information11 Correlation and Regression
11 Correlatio Regressio 11.1 Multivariate Data Ofte we look at data where several variables are recorded for the same idividuals or samplig uits. For example, at a coastal weather statio, we might record
More informationMachine Learning Brett Bernstein
Machie Learig Brett Berstei Week Lecture: Cocept Check Exercises Starred problems are optioal. Statistical Learig Theory. Suppose A = Y = R ad X is some other set. Furthermore, assume P X Y is a discrete
More informationVector Quantization: a Limiting Case of EM
. Itroductio & defiitios Assume that you are give a data set X = { x j }, j { 2,,, }, of d dimesioal vectors. The vector quatizatio (VQ) problem requires that we fid a set of prototype vectors Z = { z
More informationEstimation of the Mean and the ACVF
Chapter 5 Estimatio of the Mea ad the ACVF A statioary process {X t } is characterized by its mea ad its autocovariace fuctio γ ), ad so by the autocorrelatio fuctio ρ ) I this chapter we preset the estimators
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More information1.010 Uncertainty in Engineering Fall 2008
MIT OpeCourseWare http://ocw.mit.edu.00 Ucertaity i Egieerig Fall 2008 For iformatio about citig these materials or our Terms of Use, visit: http://ocw.mit.edu.terms. .00  Brief Notes # 9 Poit ad Iterval
More informationDirection: This test is worth 250 points. You are required to complete this test within 50 minutes.
Term Test October 3, 003 Name Math 56 Studet Number Directio: This test is worth 50 poits. You are required to complete this test withi 50 miutes. I order to receive full credit, aswer each problem completely
More informationA sequence of numbers is a function whose domain is the positive integers. We can see that the sequence
Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece,, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet as
More informationThe Growth of Functions. Theoretical Supplement
The Growth of Fuctios Theoretical Supplemet The Triagle Iequality The triagle iequality is a algebraic tool that is ofte useful i maipulatig absolute values of fuctios. The triagle iequality says that
More informationStatistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.
Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized
More informationBayesian Methods: Introduction to Multiparameter Models
Bayesia Methods: Itroductio to Multiparameter Models Parameter: θ = ( θ, θ) Give Likelihood p(y θ) ad prior p(θ ), the posterior p proportioal to p(y θ) x p(θ ) Margial posterior ( θ, θ y) is Iterested
More informationUniversity of Colorado Denver Dept. Math. & Stat. Sciences Applied Analysis Preliminary Exam 13 January 2012, 10:00 am 2:00 pm. Good luck!
Uiversity of Colorado Dever Dept. Math. & Stat. Scieces Applied Aalysis Prelimiary Exam 13 Jauary 01, 10:00 am :00 pm Name: The proctor will let you read the followig coditios before the exam begis, ad
More informationCHAPTER 10 INFINITE SEQUENCES AND SERIES
CHAPTER 10 INFINITE SEQUENCES AND SERIES 10.1 Sequeces 10.2 Ifiite Series 10.3 The Itegral Tests 10.4 Compariso Tests 10.5 The Ratio ad Root Tests 10.6 Alteratig Series: Absolute ad Coditioal Covergece
More informationChapter 3. Strong convergence. 3.1 Definition of almost sure convergence
Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i
More informationMath 116 Final Exam December 12, 2014
Math 6 Fial Exam December 2, 24 Name: EXAM SOLUTIONS Istructor: Sectio:. Do ot ope this exam util you are told to do so. 2. This exam has 4 pages icludig this cover. There are 2 problems. Note that the
More informationCEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering
CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio
More informationLecture 12: September 27
36705: Itermediate Statistics Fall 207 Lecturer: Siva Balakrisha Lecture 2: September 27 Today we will discuss sufficiecy i more detail ad the begi to discuss some geeral strategies for costructig estimators.
More informationOutline. CSCI567: Machine Learning (Spring 2019) Outline. Prof. Victor Adamchik. Mar. 26, 2019
Outlie CSCI567: Machie Learig Sprig 209 Gaussia mixture models Prof. Victor Adamchik 2 Desity estimatio U of Souther Califoria Mar. 26, 209 3 Naive Bayes Revisited March 26, 209 / 57 March 26, 209 2 /
More informationSieve Estimators: Consistency and Rates of Convergence
EECS 598: Statistical Learig Theory, Witer 2014 Topic 6 Sieve Estimators: Cosistecy ad Rates of Covergece Lecturer: Clayto Scott Scribe: Julia KatzSamuels, Brado Oselio, PiYu Che Disclaimer: These otes
More informationLecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables
CSCIB609: A Theorist s Toolkit, Fall 06 Aug 3 Lecture 0: the Cetral Limit Theorem Lecturer: Yua Zhou Scribe: Yua Xie & Yua Zhou Cetral Limit Theorem for iid radom variables Let us say that we wat to aalyze
More informationKolmogorovSmirnov type Tests for Local Gaussianity in HighFrequency Data
Proceedigs 59th ISI World Statistics Cogress, 530 August 013, Hog Kog (Sessio STS046) p.09 KolmogorovSmirov type Tests for Local Gaussiaity i HighFrequecy Data George Tauche, Duke Uiversity Viktor Todorov,
More informationChapter 6 Principles of Data Reduction
Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a
More informationSeunghee Ye Ma 8: Week 5 Oct 28
Week 5 Summary I Sectio, we go over the Mea Value Theorem ad its applicatios. I Sectio 2, we will recap what we have covered so far this term. Topics Page Mea Value Theorem. Applicatios of the Mea Value
More informationMath 113 Exam 3 Practice
Math Exam Practice Exam will cover..9. This sheet has three sectios. The first sectio will remid you about techiques ad formulas that you should kow. The secod gives a umber of practice questios for you
More informationMath 525: Lecture 5. January 18, 2018
Math 525: Lecture 5 Jauary 18, 2018 1 Series (review) Defiitio 1.1. A sequece (a ) R coverges to a poit L R (writte a L or lim a = L) if for each ǫ > 0, we ca fid N such that a L < ǫ for all N. If the
More informationLecture 14: Maximum Likelihood and Complexity Regularization
ECE90 Sprig 007 Statistical Learig Theory Istructor: R Nowak Lecture 4: Maximum Likelihood ad Complexity Regularizatio Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio
More informationx a x a Lecture 2 Series (See Chapter 1 in Boas)
Lecture Series (See Chapter i Boas) A basic ad very powerful (if pedestria, recall we are lazy AD smart) way to solve ay differetial (or itegral) equatio is via a series expasio of the correspodig solutio
More informationINFINITE SEQUENCES AND SERIES
11 INFINITE SEQUENCES AND SERIES INFINITE SEQUENCES AND SERIES 11.4 The Compariso Tests I this sectio, we will lear: How to fid the value of a series by comparig it with a kow series. COMPARISON TESTS
More informationSequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence
Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece 1, 1, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet
More informationMath 2784 (or 2794W) University of Connecticut
ORDERS OF GROWTH PAT SMITH Math 2784 (or 2794W) Uiversity of Coecticut Date: Mar. 2, 22. ORDERS OF GROWTH. Itroductio Gaiig a ituitive feel for the relative growth of fuctios is importat if you really
More informationProduct measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.
Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the
More informationMachine Learning Brett Bernstein
Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio
More information4.3 Growth Rates of Solutions to Recurrences
4.3. GROWTH RATES OF SOLUTIONS TO RECURRENCES 81 4.3 Growth Rates of Solutios to Recurreces 4.3.1 Divide ad Coquer Algorithms Oe of the most basic ad powerful algorithmic techiques is divide ad coquer.
More informationMA131  Analysis 1. Workbook 9 Series III
MA3  Aalysis Workbook 9 Series III Autum 004 Cotets 4.4 Series with Positive ad Negative Terms.............. 4.5 Alteratig Series.......................... 4.6 Geeral Series.............................
More informationBasis for simulation techniques
Basis for simulatio techiques M. Veeraraghava, March 7, 004 Estimatio is based o a collectio of experimetal outcomes, x, x,, x, where each experimetal outcome is a value of a radom variable. x i. Defiitios
More informationLecture 2: April 3, 2013
TTIC/CMSC 350 Mathematical Toolkit Sprig 203 Madhur Tulsiai Lecture 2: April 3, 203 Scribe: Shubhedu Trivedi Coi tosses cotiued We retur to the coi tossig example from the last lecture agai: Example. Give,
More information