Nonparametric Regression. Badr Missaoui
Outline
- Kernel and local polynomial regression.
- Penalized regression.
We are given $n$ pairs of observations $(X_1, Y_1), \dots, (X_n, Y_n)$ where
$$Y_i = r(X_i) + \varepsilon_i, \quad i = 1, \dots, n,$$
and $r(x) = E(Y \mid X = x)$. If the $X_i$ are deterministic, we assume that $\varepsilon_i \sim N(0, \sigma^2)$. If the $X_i$ are random variables, we assume that the $\varepsilon_i \sim N(0, \sigma^2)$ are independent of the $X_i$. In the absence of any hypothesis on the function $r$, we are in the nonparametric framework.
The simplest nonparametric estimator is the regressogram. Suppose that the $X_i$ lie in the interval $[a, b]$ and denote the bins by $B_1, \dots, B_m$. Let $k_j$ be the number of observations in bin $B_j$. Define, for $x \in B_j$,
$$\hat{r}_n(x) = \frac{1}{k_j} \sum_{i : X_i \in B_j} Y_i = \bar{Y}_j.$$
We can rewrite the estimator as
$$\hat{r}_n(x) = \sum_{i=1}^n l_i(x) Y_i$$
where $l_i(x) = 1/k_j$ if $x$ and $X_i$ lie in the same bin $B_j$, and $l_i(x) = 0$ otherwise. In other words, the estimate $\hat{r}_n$ is a step function obtained by averaging the $Y_i$ over each bin.
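As a concrete illustration, here is a minimal R sketch of the regressogram; the function name, bin count, and simulated data are our own choices, not from the slides:

regressogram <- function(x, y, a, b, m) {
  # m equal-width bins over [a, b]
  breaks <- seq(a, b, length.out = m + 1)
  bin <- cut(x, breaks, include.lowest = TRUE)
  means <- tapply(y, bin, mean)   # Ybar_j for each bin B_j
  # the fitted value at x0 is the mean of the bin containing x0
  function(x0) unname(means[cut(x0, breaks, include.lowest = TRUE)])
}

set.seed(1)
x <- runif(100); y <- sin(2 * pi * x) + rnorm(100, sd = 0.3)
rhat <- regressogram(x, y, a = 0, b = 1, m = 10)
plot(x, y); lines(sort(x), rhat(sort(x)), type = "s")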
Recall from our discussion of model selection that
$$R(h) = E(Y - \hat{r}_n(X))^2 = \sigma^2 + E(r(X) - \hat{r}_n(X))^2 = \sigma^2 + \text{MSE}.$$
We can write the MSE as
$$\text{MSE} = \int \text{bias}^2(x)\, p(x)\, dx + \int \text{var}(x)\, p(x)\, dx$$
where $\text{bias}(x) = E(\hat{r}_n(x)) - r(x)$ and $\text{var}(x) = \text{Var}(\hat{r}_n(x))$. When the data are oversmoothed, the bias term is large and the variance is small. When the data are undersmoothed, the opposite is true. This is called the bias-variance tradeoff. Minimizing risk corresponds to balancing bias and variance.
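The tradeoff is easy to see in a small simulation; the following sketch (our own illustration, using the base R ksmooth smoother) compares an oversmoothed and an undersmoothed fit on the same data:

set.seed(2)
x <- seq(0, 1, length.out = 200)
y <- sin(4 * pi * x) + rnorm(200, sd = 0.4)
plot(x, y, col = "grey")
# large bandwidth: low variance, high bias (oversmoothed)
lines(ksmooth(x, y, kernel = "normal", bandwidth = 0.5), lwd = 2)
# small bandwidth: low bias, high variance (undersmoothed)
lines(ksmooth(x, y, kernel = "normal", bandwidth = 0.02), lty = 2)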
Ideally, we would like to choose $h$ to minimize $R(h)$, but $R(h)$ depends on the unknown function $r(x)$. One might estimate $R(h)$ by the average residual sum of squares
$$\frac{1}{n} \sum_{i=1}^n (Y_i - \hat{r}_n(X_i))^2,$$
but this uses the data twice and tends to underestimate the risk. Instead, we estimate the risk by leave-one-out cross-validation, defined by
$$CV = \hat{R}(h) = \frac{1}{n} \sum_{i=1}^n \left( Y_i - \hat{r}_{(-i)}(X_i) \right)^2$$
where $\hat{r}_{(-i)}$ is the estimator obtained by omitting the $i$th pair $(X_i, Y_i)$.
There is a shortcut formula for computing $\hat{R}(h)$, just as in linear regression. Let $\hat{r}_n$ be a linear smoother. Then the leave-one-out cross-validation score can be written as
$$\hat{R}(h) = \frac{1}{n} \sum_{i=1}^n \left( \frac{Y_i - \hat{r}_n(X_i)}{1 - L_{ii}} \right)^2$$
where $L_{ii} = l_i(X_i)$. An alternative is to use generalized cross-validation:
$$GCV(h) = \frac{1}{n} \sum_{i=1}^n \left( \frac{Y_i - \hat{r}_n(X_i)}{1 - \frac{1}{n} \sum_{j=1}^n L_{jj}} \right)^2.$$
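For a linear smoother, both scores are one-liners once the $n \times n$ smoothing matrix is in hand; this sketch assumes the matrix L (with rows $l(X_i)^T$) and the fitted values are already available:

# leave-one-out CV via the shortcut formula
loocv_score <- function(y, yhat, L) {
  mean(((y - yhat) / (1 - diag(L)))^2)
}
# generalized CV: replace each L_ii by their average
gcv_score <- function(y, yhat, L) {
  mean(((y - yhat) / (1 - mean(diag(L))))^2)
}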
Kernel regression

A kernel is any smooth function $K$ such that $K(x) \ge 0$ and
$$\int K(x)\, dx = 1, \quad \int x K(x)\, dx = 0, \quad \sigma_K^2 = \int x^2 K(x)\, dx > 0.$$
Some commonly used kernels:
- the boxcar kernel: $K(x) = \frac{1}{2} I(|x| \le 1)$,
- the Gaussian kernel: $K(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$.
Kernel regression

Let $h > 0$ be a positive number, called the bandwidth. The Nadaraya-Watson kernel estimator is defined by
$$\hat{r}_n(x) = \sum_{i=1}^n l_i(x) Y_i$$
where
$$l_i(x) = \frac{K\left(\frac{x - X_i}{h}\right)}{\sum_{j=1}^n K\left(\frac{x - X_j}{h}\right)}.$$
In R, we suggest using the loess command or the locfit library:

out = loess(y ~ x, span = 0.25, degree = 0)
lines(x, fitted(out))

out = locfit(y ~ x, deg = 0, alpha = c(0, h))
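For intuition, the estimator can also be coded directly; this from-scratch sketch (our own function, using a Gaussian kernel) implements the weights $l_i(x)$ above:

nw <- function(x0, x, y, h) {
  sapply(x0, function(u) {
    w <- dnorm((u - x) / h)   # K((u - X_i)/h)
    sum(w * y) / sum(w)       # weighted average of the Y_i
  })
}

set.seed(3)
x <- runif(150); y <- sin(3 * pi * x) + rnorm(150, sd = 0.3)
xs <- seq(0, 1, length.out = 200)
plot(x, y); lines(xs, nw(xs, x, y, h = 0.05), lwd = 2)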
Kernel regression

To do the cross-validation, create a vector of bandwidths $h = (h_1, \dots, h_k)$:

h = c( ... put your values here ... )
k = length(h)
zero = rep(0, k)
H = cbind(zero, h)
out = gcvplot(y ~ x, deg = 0, alpha = H)
plot(out$df, out$values)
Kernel regression

The choice of the kernel $K$ is not too important. What matters much more is the choice of the bandwidth, which controls the amount of smoothing. In general the bandwidth depends on the sample size ($h_n$). Assume that $f$ is the density of $X_1, \dots, X_n$. The risk of the Nadaraya-Watson kernel estimator is
$$R(h_n) = \frac{h_n^4}{4} \left( \int x^2 K(x)\, dx \right)^2 \int \left( r''(x) + 2 r'(x) \frac{f'(x)}{f(x)} \right)^2 dx + \frac{\sigma^2 \int K^2(x)\, dx}{n h_n} \int \frac{1}{f(x)}\, dx + o\left( (n h_n)^{-1} \right) + o(h_n^4)$$
as $h_n \to 0$ and $n h_n \to \infty$.
Kernel regression

What is especially notable is the presence of the term
$$2 r'(x) \frac{f'(x)}{f(x)}.$$
This means that the bias is sensitive to the positions of the $X_i$'s. If we differentiate $R$ with respect to $h_n$ and set the result to 0, we find that the optimal bandwidth is
$$h_* = n^{-1/5} \left( \frac{\sigma^2 \int K^2(x)\, dx \int \frac{1}{f(x)}\, dx}{\left( \int x^2 K(x)\, dx \right)^2 \int \left( r''(x) + 2 r'(x) \frac{f'(x)}{f(x)} \right)^2 dx} \right)^{1/5}.$$
Kernel regression

Thus, $h_* \sim n^{-1/5}$, and the risk $R(h_n)$ decreases at rate $O(n^{-4/5})$. In practice we cannot use the formula for $h_*$ above, since it depends on the unknown function $r$. Instead, we use leave-one-out cross-validation.
Local Polynomials

Kernel estimators suffer from design bias. This problem can be alleviated by using local polynomial regression. The idea is to approximate a smooth regression function $r(u)$ near the target value $x$ by the polynomial $r(u) \approx P_x(u; a)$, where
$$P_x(u; a) = a_0 + a_1 (u - x) + \frac{a_2}{2!} (u - x)^2 + \dots + \frac{a_p}{p!} (u - x)^p.$$
We estimate $a = (a_0, \dots, a_p)^T$ by minimizing
$$\sum_{i=1}^n w_i(x) \left( Y_i - P_x(X_i; a) \right)^2$$
where $w_i(x) = K\left( \frac{x - X_i}{h} \right)$ are kernel weights.
Local Polynomials

To find $\hat{a}(x)$, it is helpful to re-express the problem in matrix notation. Let
$$X_x = \begin{pmatrix} 1 & X_1 - x & \cdots & \frac{(X_1 - x)^p}{p!} \\ 1 & X_2 - x & \cdots & \frac{(X_2 - x)^p}{p!} \\ \vdots & \vdots & & \vdots \\ 1 & X_n - x & \cdots & \frac{(X_n - x)^p}{p!} \end{pmatrix}$$
and let $W_x = \text{diag}\{w_i(x)\}_i$. We can then write the problem as minimizing
$$(Y - X_x a)^T W_x (Y - X_x a).$$
Minimizing this gives the weighted least squares estimator
$$\hat{a}(x) = (X_x^T W_x X_x)^{-1} X_x^T W_x Y.$$
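The weighted least squares solution can be computed directly at a single target point; this sketch (our own code, for $p = 1$ with Gaussian weights) returns $\hat{r}_n(x_0) = \hat{a}_0(x_0)$:

loclin_at <- function(x0, x, y, h) {
  w  <- dnorm((x - x0) / h)   # kernel weights w_i(x0)
  Xx <- cbind(1, x - x0)      # design matrix X_x for p = 1
  # weighted least squares: (Xx' W Xx)^{-1} Xx' W y
  a  <- solve(t(Xx) %*% (w * Xx), t(Xx) %*% (w * y))
  a[1]                        # intercept = fitted value at x0
}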
Local Polynomials

The local polynomial regression estimate is
$$\hat{r}_n(x) = \sum_{i=1}^n l_i(x) Y_i$$
where $l(x)^T = (l_1(x), \dots, l_n(x))$ is given by
$$l(x)^T = e_1^T (X_x^T W_x X_x)^{-1} X_x^T W_x, \qquad e_1 = (1, 0, \dots, 0)^T.$$
The R code is the same as before, except we use deg=1 for local linear and deg=2 for local quadratic:

out = loess(y ~ x, degree = 1, span = h)
out = locfit(y ~ x, deg = 1, alpha = c(0, h))
Local Polynomials

Let $Y_i = r(X_i) + \sigma(X_i) \varepsilon_i$ for $i = 1, \dots, n$ and $a < X_i < b$. Assume that $X_1, \dots, X_n$ are a sample from a distribution with density $f$ and that (i) $f(x) > 0$, (ii) $f$, $r''$ and $\sigma^2$ are continuous in a neighborhood of $x$, and (iii) $h_n \to 0$ and $n h_n \to \infty$. The local linear estimator has variance
$$\frac{\sigma^2(x) \int K^2(u)\, du}{f(x)\, n h_n} + o\left( \frac{1}{n h_n} \right)$$
and asymptotic bias
$$\frac{h_n^2}{2}\, r''(x) \int u^2 K(u)\, du + o(h_n^2).$$
The bias does not involve $f$: the local linear estimator is free from design bias.
Penalized Regression

Consider polynomial regression
$$Y = \sum_{j=0}^p \beta_j x^j + \varepsilon, \qquad \hat{r}(x) = \sum_{j=0}^p \hat{\beta}_j x^j.$$
We have the design matrix
$$X = \begin{pmatrix} 1 & x_1 & \cdots & x_1^p \\ 1 & x_2 & \cdots & x_2^p \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^p \end{pmatrix}.$$
Penalized Regression

Least squares minimizes $(Y - X\beta)^T (Y - X\beta)$, which gives
$$\hat{\beta} = (X^T X)^{-1} X^T Y.$$
Ridge regression instead minimizes $(Y - X\beta)^T (Y - X\beta) + \lambda \beta^T \beta$, which gives
$$\hat{\beta} = (X^T X + \lambda I)^{-1} X^T Y.$$
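The ridge estimate is a one-line translation of the formula; in this sketch, X is assumed to be a numeric design matrix and lambda a chosen penalty value:

ridge_beta <- function(X, y, lambda) {
  # (X'X + lambda I)^{-1} X'y
  solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% y)
}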
Penalized Regression

An alternative is to minimize the penalized sum of squares
$$M(\lambda) = \sum_i (Y_i - \hat{r}_n(X_i))^2 + \lambda J(r)$$
where
$$J(r) = \int (r''(x))^2\, dx$$
is the roughness penalty. This penalty leads to a solution that favors smoother functions. The parameter $\lambda$ controls the amount of smoothing.
Penalized Regression

The most commonly used splines are piecewise cubic splines. Let $\xi_1 < \dots < \xi_k$ be a set of ordered points, called knots, contained in some interval $(a, b)$. A cubic spline is a continuous function $r$ such that (i) $r$ is a cubic polynomial on each interval $(\xi_1, \xi_2), (\xi_2, \xi_3), \dots$, and (ii) $r$ has continuous first and second derivatives at the knots. A spline that is linear beyond the boundary knots is called a natural spline.

Theorem. The function $\hat{r}_n(x)$ that minimizes $M(\lambda)$ is a natural cubic spline with knots at the data points. The estimator $\hat{r}_n$ is called a smoothing spline.
Penalized Regression

The theorem above does not give an explicit form for $\hat{r}_n$. We construct a basis for the set of splines (the cubic B-splines):
$$B_{i,0}(t) = \begin{cases} 1 & t \in [t_i, t_{i+1}) \\ 0 & \text{otherwise} \end{cases}$$
$$B_{i,d}(t) = \frac{t - t_i}{t_{i+d} - t_i} B_{i,d-1}(t) + \frac{t_{i+d+1} - t}{t_{i+d+1} - t_{i+1}} B_{i+1,d-1}(t).$$
Without the penalty, the B-spline basis interpolates the data and therefore provides a perfect fit.
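The recursion translates directly into R. This is a naive sketch for illustration (we add guards for repeated knots; splines::splineDesign is the production route); t is the knot vector, i the index, d the degree:

bspline <- function(u, t, i, d) {
  if (d == 0) return(as.numeric(u >= t[i] & u < t[i + 1]))
  # guard against zero-width knot intervals
  c1 <- if (t[i + d] > t[i]) (u - t[i]) / (t[i + d] - t[i]) else 0
  c2 <- if (t[i + d + 1] > t[i + 1]) (t[i + d + 1] - u) / (t[i + d + 1] - t[i + 1]) else 0
  c1 * bspline(u, t, i, d - 1) + c2 * bspline(u, t, i + 1, d - 1)
}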
Penalized Regression

Now we can write
$$\hat{r}_n(x) = \sum_{j=1}^n \hat{\beta}_j B_j(x)$$
where the $B_j(x)$ are the B-spline basis functions. Following the pattern of polynomial regression, define
$$B = \begin{pmatrix} B_1(x_1) & \cdots & B_n(x_1) \\ B_1(x_2) & \cdots & B_n(x_2) \\ \vdots & & \vdots \\ B_1(x_n) & \cdots & B_n(x_n) \end{pmatrix}.$$
Penalized Regression

We can rewrite the problem as
$$\arg\min_\beta\; (Y - B\beta)^T (Y - B\beta) + \lambda \beta^T \Omega \beta$$
where $B_{ij} = B_j(X_i)$ and $\Omega_{jk} = \int B_j''(x) B_k''(x)\, dx$. The solution is
$$\hat{\beta} = (B^T B + \lambda \Omega)^{-1} B^T Y$$
and $\hat{Y} = L Y$, where
$$L = B (B^T B + \lambda \Omega)^{-1} B^T.$$
We define the effective degrees of freedom by $df = \text{trace}(L)$, and we choose $\lambda$ using GCV or CV. In R:

out = smooth.spline(x, y, df = 10, cv = TRUE)
lines(x, out$y)
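Given a basis matrix B and penalty matrix Omega (both assumed precomputed), the smoother matrix, fitted values and effective degrees of freedom follow directly from the formulas above; a minimal sketch:

penalized_fit <- function(B, y, Omega, lambda) {
  # smoother matrix L = B (B'B + lambda Omega)^{-1} B'
  L <- B %*% solve(t(B) %*% B + lambda * Omega, t(B))
  list(fitted = as.vector(L %*% y),
       df = sum(diag(L)))   # effective df = tr(L)
}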
Penalized Regression

In the more general case, we assume that $r$ admits a series expansion with respect to an orthonormal basis $(\phi_j)_j$:
$$r(x) = \sum_{j=1}^\infty \beta_j \phi_j(x).$$
We approximate $r$ by
$$r_J(x) = \sum_{j=1}^J \beta_j \phi_j(x).$$
The number of terms $J$ is our smoothing parameter. Our estimate is
$$\hat{r}(x) = \sum_{j=1}^J \hat{\beta}_j \phi_j(x).$$
Penalized Regression

The estimate $\hat{\beta}$ is
$$\hat{\beta} = (U^T U)^{-1} U^T Y$$
where $U = [\phi_j(X_i)]_{ij}$, and $\hat{Y} = S Y$ where $S = U (U^T U)^{-1} U^T$. We can choose $J$ by cross-validation. Note that $\text{trace}(S) = J$, so the GCV takes the simple form
$$GCV(J) = \frac{SSE / n}{(1 - J/n)^2}.$$
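As an illustration of choosing J by GCV, this sketch uses a cosine basis (our own choice of phi_j for the example; any orthonormal basis would do):

set.seed(4)
n <- 100; x <- sort(runif(n)); y <- sin(4 * pi * x) + rnorm(n, sd = 0.3)
gcv <- sapply(1:20, function(J) {
  # phi_1 = 1, phi_j = sqrt(2) cos((j-1) pi x) for j > 1
  U <- outer(x, 1:J, function(x, j) ifelse(j == 1, 1, sqrt(2) * cos(pi * (j - 1) * x)))
  fit <- U %*% solve(t(U) %*% U, t(U) %*% y)
  (sum((y - fit)^2) / n) / (1 - J / n)^2
})
which.min(gcv)   # selected number of basis terms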
Penalized Regression: Variance estimation

Theorem. Let $\hat{r}_n$ be a linear smoother. Let
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^n (Y_i - \hat{r}(X_i))^2}{n - 2\nu + \tilde{\nu}}$$
where $\nu = \text{tr}(L)$ and $\tilde{\nu} = \text{tr}(L^T L)$. If $r$ is sufficiently smooth, $\nu = o(n)$ and $\tilde{\nu} = o(n)$, then $\hat{\sigma}^2$ is a consistent estimator of $\sigma^2$.
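The estimator translates directly into R for a linear smoother whose smoothing matrix L is available (for instance from the penalized fit sketched earlier):

sigma2_hat <- function(y, yhat, L) {
  nu     <- sum(diag(L))   # tr(L)
  nu.til <- sum(L^2)       # tr(L'L), the sum of squared entries
  sum((y - yhat)^2) / (length(y) - 2 * nu + nu.til)
}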