Two Applications of Nonparametric Regression in Survey Estimation

Size: px
Start display at page:

Download "Two Applications of Nonparametric Regression in Survey Estimation"

Transcription

1 Two Applications of Nonparametric Regression in Survey Estimation 1/56 Jean Opsomer Iowa State University Joint work with Jay Breidt, Colorado State University Gerda Claeskens, Université Catholique de Louvain Göran Kauermann, Universität Bielefeld Giovanna Ranalli, Università di Perugia April 9, 2004

2 Outline 2/56 1. Introduction (a) Two examples (b) Generic and specific inference 2. Nonparametric Regression Using Penalized Splines 3. Generic Inference for Survey Data (a) Nonparametric model-assisted estimator (b) Utah Forest Survey 4. Specific Inference for Survey Data (a) Nonparametric small area estimator (b) Northeastern Lakes Survey 5. Conclusion

3 1. Introduction: Utah Forest Survey 3/56 Forest inventory survey conducted by U.S. Forest Service

4 Utah Forest Survey (2) Phase Variables Geographic X,Y coordinates E-W, N-S Information elev elevation (m) System asp aspect (deg) (GIS) slope slope (deg) hillshd hillshade (solar radiation) N = 67, 216 nlcd vegetation cover type (class) Forest fortyp forest type (class) Inventory trees number of trees and agemax max tree age (years) Analysis ageavg avg tree age (years) (FIA) bamax max tree basal area (sq in) crcov tree crown cover (%) n I = 3, Forest lichen lichen species present (count) Health... Monitoring (FHM) n = 71 Goal: estimate mean number of lichen in region/domains 4/56

5 Introduction: Northeastern Lakes Survey 5/56 Ecological condition survey of Northeastern lakes conducted by U.S. EPA Data collected for 338 lakes

6 Northeastern Lakes Survey (2) Region includes digit Hydrologic Unit Codes (HUC) 6/56 Goal: estimate mean lake Acid Neutralizing Capacity (ANC) for all HUCs

7 Types of Statistical Inference 7/56 Specific Inference: expensive, high quality, targeted using custom-built method (or model) to achieve best possible estimator for particular variable(s) willing to defend model

8 Types of Statistical Inference 7/56 Specific Inference: expensive, high quality, targeted using custom-built method (or model) to achieve best possible estimator for particular variable(s) willing to defend model Generic Inference: cheap, reasonable quality, good for many purposes using method appropriate for large number of variables that need to be estimated jointly Corn Flakes NET WT. 12 OZ.

9 Statistical Inference in Surveys 8/56 Number of Inference Modelling observations Large { generic none generic model-assisted Moderate specific model-based Small specific small domain estimation Number of observations depends on domain (subpopulation) size For moderate sample size, use of generic inference depends on model goodness-of-fit Nonparametric methods can improve both generic and specific inference

10 2. Nonparametric Regression Using Penalized Splines 9/56 Many nonparametric regression methods are available Kernel and local polynomial methods Smoothing splines, regression splines,... Orthogonal decomposition (wavelet, Fourier series) Penalized splines or P-splines regression (Eilers and Marx, 1996) is simple, flexible and computationally attractive smoothing method

11 Definition of Penalized Splines Model 10/56 Regression model y i = m(x i ) + ε i Function m( ) is unknown but assumed well approximated by polynomial spline m(x; β) = β 0 + β 1 x β p x p + K β p+k (x κ k ) p + k=1 p : degree of spline (fixed) κ 1 <... < κ K : set of K knots (fixed) β = (β 0,..., β p+k ) : vector of parameters (unknown) (Ruppert, Wand and Carroll, 2003)

12 Polynomial Spline Basis Functions 11/56 (x κ) p + { (x κ) p if x κ > 0 0 if x κ p=1 0.8 p= basis functions x x Other spline basis functions are possible (B-splines, radial splines)

13 Choosing K 12/56 If K is sufficiently large, m( ; β) can approximate large class of functions y K= K= y K= x Rule of thumb: K = min(#x/4, 35) K= x

14 Expressing Spline Model as Parametric Model 13/56 m(x; β) = β 0 + β 1 x β p x p + x β K β p+k (x κ k ) p + k=1 with x = (1, x,..., x p, (x κ 1 ) p +,..., (x κ K ) p +) β = (β 0,..., β p+k ) T

15 Fitting by Penalized Splines Regression Minimize penalized sum of squares 14/56 min β n (y i m(x i ; β)) 2 + λ i=1 K k=1 β 2 p+k ˆβ λ = ( X T X + λd ) 1 X T Y λ = smoothing penalty (fixed) D = diag{0,..., 0, 1,..., 1} X = design matrix (including spline terms) ˆβ λ is ridge regression estimator, with ridge penalty on nonlinear (spline) terms of model

16 Fitting by Penalized Splines Regression (2) 15/56 λ protects against overfitting and determines smoothness of fit y lambda= lambda= y lambda= x lambda= x

17 Choosing the Penalty λ Cross-Validation: minimize CV sum of squares with respect to λ Mixed model approach: treat spline parameters β p+1,..., β p+k as a random effect with common variance σβ 2 and fit regression using Maximum Likelihood approach (MLE, REML) 16/56

18 P-spline: Hybrid Regression Method 17/56 P-spline is a nonparametric regression method: can fit very large classes of functions adaptive to local features in the data smoothness of function is determined by penalty parameter λ P-spline is a parametric regression method: model can be written as x β fitted by (global) least squares method number of parameters p + K puts upper bound on flexibility of model

19 Extending the model 18/56 Semi-parametric regression Model y i = m(x 1i ; β 1 ) + x 2i β 2 + ε i min β 1,β 2 n (y i m(x 1i ; β 1 ) x 2i β 2 ) 2 + λ i=1 K k=1 β 2 1,p+k

20 Extending the model 18/56 Semi-parametric regression Model y i = m(x 1i ; β 1 ) + x 2i β 2 + ε i min β 1,β 2 n (y i m(x 1i ; β 1 ) x 2i β 2 ) 2 + λ i=1 K k=1 β 2 1,p+k Additive model min β 1,β 2 Model y i = m 1 (x 1i ; β 1 ) + m 2 (x 2i ; β 2 ) + ε i n K (y i m 1 (x 1i ; β 1 ) m 2 (x 2i ; β 2 )) 2 +λ 1 i=1 Other... k=1 β 2 1,p+k+λ 2 K β2,p+k 2 k=1

21 3.1 Generic Inference for Survey Data Population U = {1,..., k,..., N} with unknown population parameters ȳ N = 1 y i and z N, x N,... N U Sample s selected from U according to known sampling design p(s) stratification clustering multiple phases 19/56

22 3.1 Generic Inference for Survey Data Population U = {1,..., k,..., N} with unknown population parameters ȳ N = 1 y i and z N, x N,... N U Sample s selected from U according to known sampling design p(s) stratification clustering multiple phases Generic estimator for the population mean ŷ s = [ w i y i ẑ s = ] w i z i, ˆx s,... s s 19/56

23 Simple Generic Estimation: Design-based Horvitz-Thompson estimator (1952) ŷ π = 1 N s 1 π i y i with inclusion probabilities π i = Pr(i s) Properties of ŷ π (for any design p) with E p (ŷ π ) = 1 N Var p (ŷ π ) = 1 N 2 y i = ȳ N U U π ij = Pr(i, j s) y i π i y j π j (π ij π i π j ) 20/56

24 Better Generic Estimation: Model-assisted Superpopulation model ξ: y i are iid with E ξ (y i ) = β 0 + β 1 x i = x T i β Var ξ (y i ) = σ 2 Least squares population fit for β 21/56 B U = (X T UX U ) 1 X U Y U

25 Better Generic Estimation: Model-assisted Superpopulation model ξ: y i are iid with E ξ (y i ) = β 0 + β 1 x i = x T i β Var ξ (y i ) = σ 2 Least squares population fit for β 21/56 B U = (X T UX U ) 1 X U Y U B U is estimated by sample-based estimator ˆB = ( X T s Π 1 s with Π s = diag{π i, i s} X s ) 1 X T s Π 1 s Y s Model-assisted ( regression ) estimator (Cassel et al., 1977) ŷ r = 1 x T i ˆB + 1 y i x T i ˆB N N U s π i

26 Properties of Regression Estimator Generic estimator ŷ r = w i (s)y i s 22/56 Consistent, asymptotically design unbiased E p (ŷ r ) ȳ N Approximate design variance Var p (ŷ r ) 1 N 2 U y i x T i B U π i y j x T j B U π j (π ij π i π j ) If model is true, smaller than HT variance [ ] HT : Var p (ŷ π ) = 1 y i y j (π N 2 ij π i π j ) π i π j U

27 Nonparametric Regression Estimation 23/56 Superpopulation model ξ: E ξ (y i ) = x T i β Var ξ (y i ) = σ 2

28 Nonparametric Regression Estimation 23/56 Superpopulation model ξ: E ξ (y i ) = x T i β Var ξ (y i ) = σ 2 Replace by: Superpopulation model ξ: E ξ (y i ) = m(x i ) Var ξ (y i ) = v(x i ) Breidt and Opsomer (2000) developed model-assisted nonparametric estimator based on local polynomial regression

29 Nonparametric Regression Estimation 23/56 Superpopulation model ξ: E ξ (y i ) = x T i β Var ξ (y i ) = σ 2 Replace by: Superpopulation model ξ: E ξ (y i ) = m(x i ) Var ξ (y i ) = v(x i ) Breidt and Opsomer (2000) developed model-assisted nonparametric estimator based on local polynomial regression Easier to do with P-splines!

30 Sample-weighted Penalized Splines Regression 24/56 Superpopulation model ξ: E ξ (y i ) = m(x i ; β) = x i β Var ξ (y i ) = v(x i ) Least squares P-spline population fit (for fixed λ) B U,λ = ( X T U X U + λd ) 1 X T U Y U B U,λ is estimated by design-based estimator ˆB λ = ( X T s Π 1 s X s + λd ) 1 X T s Π 1 s Y s

31 Penalized Splines Regression Estimation Model-assisted P-splines estimator ŷ p = 1 N U x T i ˆB λ + 1 N s y i x T i π i ˆB λ 25/56 Design properties: identical to regression estimator (!)

32 Improved Efficiency from Nonparametric Fit Approximate design variance Var p (ŷ p ) 1 N 2 [ Var p (ŷ r ) 1 N 2 [ U y i x T i B U,λ π i U y i x T i B π i HT : Var p (ŷ π ) = 1 N 2 U y j x T j B U,λ (π ij π i π j ) π j y j x T j B ] (π ij π i π j ) π j y i π i y j π j (π ij π i π j ) ] 26/56

33 P-Splines or Local Polynomials for Model- Assisted Estimation? 27/56 Advantages of P-Splines ly related to parametric modelling (ratio, linear models) Model is easy to extend to multivariate, additive, semiparametric cases Handles data sparseness easily, very fast to compute

34 P-Splines or Local Polynomials for Model- Assisted Estimation? 27/56 Advantages of P-Splines ly related to parametric modelling (ratio, linear models) Model is easy to extend to multivariate, additive, semiparametric cases Handles data sparseness easily, very fast to compute Disadvantages of P-Splines Flexibility of model limited by number of parameters p + K

35 Multi-phase Sampling 28/56 Population U (N elements) Phase 1 sample S (n I elements) Phase 2 sample R (n elements)

36 Multi-phase Sampling 28/56 Population U (N elements) Geographic Information System (GIS) N = 67, 216 Phase 1 sample S (n I elements) Phase 2 sample R (n elements) Forest Inventory and Analysis (FIA) n I = 3, 107 Forest Health Monitoring (FHM) n = 71

37 Model-assisted Estimation for Multi-phase samples Different information available at different phases population x ak, z ak, k U GIS variables phase 1 x bk, z bk, k S FIA measurements phase 2 y k, k R FHM measurements (lichen count) (x bk, z bk contains x ak, z ak ) 29/56 Goal: estimate ȳ N with generic but efficient estimator

38 Model-assisted Estimation for Multi-phase samples Different information available at different phases population x ak, z ak, k U GIS variables phase 1 x bk, z bk, k S FIA measurements phase 2 y k, k R FHM measurements (lichen count) (x bk, z bk contains x ak, z ak ) 29/56 Goal: estimate ȳ N with generic but efficient estimator Approach: 1. use P-splines to fit semiparametric additive model for each level of auxiliary info 2. construct model-assisted estimator

39 Model-assisted Estimation for Multi-phase samples (2) Models Model a: using predictors available for U 30/56 E ξ (y i ) = g a (x ai, z ai ) = m a (x ai ; β a ) + z ai γ a Model b: using predictors available for S E ξ (y i ) = g b (x bi, z bi ) = m b (x bi ; β b ) + z bi γ b

40 Model-assisted Estimation for Multi-phase samples (2) Models Model a: using predictors available for U 30/56 E ξ (y i ) = g a (x ai, z ai ) = m a (x ai ; β a ) + z ai γ a Model b: using predictors available for S E ξ (y i ) = g b (x bi, z bi ) = m b (x bi ; β b ) + z bi γ b P-splines regression estimator ŷ p = 1 ĝ ai + 1 N N U S ĝ bi ĝ ai π i(s) + 1 N R y i ĝ bi π i(s) π k(r S) Design properties: follow nonparametric model-assisted framework

41 3.2 Utah Forest Survey Phase Variables Geographic X,Y coordinates E-W, N-S Information elev elevation (m) System asp aspect (deg) (GIS) slope slope (deg) hillshd hillshade (solar radiation) N = 67, 216 nlcd vegetation cover type (class) Forest fortyp forest type (class) Inventory trees number of trees and agemax max tree age (years) Analysis ageavg avg tree age (years) (FIA) bamax max tree basal area (sq in) crcov tree crown cover (%) n I = 3, Forest lichen lichen species present (count) Health... Monitoring (FHM) n = 71 31/56

42 Semiparametric Model a 32/56 E ξ (LICHEN) = m(hillshd; β) + z NLCD γ ps(hillshd) hillshd partial for nlcd Fitted using Splus functions gam(), ps() with p = 2, K = 15, λ = 1 nlcd2

43 Semiparametric Model b E ξ (LICHEN) = m 1 (CRCOV; β 1 ) + m 2 (AGEMAX; β 2 ) +m 3 (BAMAX; β 3 ) + z NLCD γ 33/56 ps(crcov) ps(agemax) crcov agemax ps(bamax) bamax partial for nlcd nlcd2

44 Forest Health Monitoring Estimates 34/56 HT = Generic estimator, ignores all auxiliary information Linear = Generic estimator, all models are fitted by linear regression Semiparametric = Generic estimator, semiparametric (P-spline) models HT Linear Semiparametric Estimate Est. St. Dev (69%) (44%) Nonparametric regression can improve the efficiency of generic estimators in surveys

45 35/56 Estimators for Domains Domain : subpopulation for which separate estimator is needed Models can improve precision of domain estimators, if they have good local properties Nonparametric model better able to adapt to local features of data crcov s(crcov)

46 Model-Assisted Estimators for Domains 36/56 (Särndal, 1984) Obtain sample-weighted model fit for complete sample s Estimator for domain U d U with realized sample s d s is ŷ p = 1 ĝ i + 1 N d N d U d s d y i ĝ i π i Variance follows from model-assisted estimation theory Approach maintains additivity of domain estimates

47 Example: Estimation for Domain with NLCD > 50 37/56 Phase Longitude Index Latitude Index Only n = 27 observations in domain

48 Example (2) All Data (n = 71) HT Linear Semiparametric Estimate Est. St. Dev /56 Domain (n = 27) HT Linear Semiparametric Estimate Est. St. Dev (1.58) (1.56) (1.06) Nonparametric regression makes it possible to maintain generic approach at smaller scales

49 4.1 Specific Inference for Survey Data 39/56 Data contain 557 observations over 113 HUCs Goal: produce estimates of ANC for all HUCs

50 HUC Sample Means 40/56 Problems: unreliable estimates, missing HUCs

51 HUC as Small Areas 41/56 Few sample observations available in most HUCs Average sample size/huc: HUCs contain less than 5 observations 27 out of 113 HUCs contain no sample observations Modelling required to construct reliable HUC-level estimates called small area estimation in surveys statistics mixed model/prediction problem = type of specific inference

52 Nonparametric Small Area Estimation 42/56 P-splines are ideally suited for small area estimation when the mean function is difficult to specify a priori close relationship between classical small area estimation models and P-splines availability of existing software ability to evaluate need for nonlinearity in model and significance of small area effects

53 P-splines as Random Effects 43/56 y i = m(x i ; β) + ε i = x i β + ε i

54 P-splines as Random Effects 43/56 y i = m(x i ; β) + ε i = x i β + ε i = x F i β F + z i γ + ε i x F i β F β 0 + x i β x p i β p (parametric, fixed component) z i γ = z 1i γ z Ki γ K (x i κ 1 ) p +β p (x i κ K ) p +β p+k

55 P-splines as Random Effects 43/56 y i = m(x i ; β) + ε i = x i β + ε i = x F i β F + z i γ + ε i x F i β F β 0 + x i β x p i β p (parametric, fixed component) z i γ = z 1i γ z Ki γ K (x i κ 1 ) p +β p (x i κ K ) p +β p+k (deviations from parametric, treated as random effect) γ = (γ 1,..., γ K ) iid F γ (0, σ 2 γ) ε i F ε (0, σ 2 ε)

56 P-splines Estimator as BLUP Assuming σγ, 2 σε 2 known, Best Linear Unbiased Estimator/Predictor (BLUE/BLUP) is solution to n ( min yi x F β F i β F + z i γ ) 2 σ 2 K + ε γ 2,γ σγ 2 k i=1 (Henderson et al., 1959) [ ] ˆβF = ˆγ with [ ˆβF ˆγ k=1 ( 1 X T X + σ2 ε D) X T Y σγ 2 D = diag{0,..., 0, 1,..., 1} X = [ x F z ] ] = P-splines (ridge) regression estimator ˆβ λ with λ = σε/σ 2 γ 2 44/56

57 P-splines Estimator as EBLUP If σ 2 γ, σ 2 ε are unknown, estimates can be obtained by Maximum Likelihood (ML) or Restricted Maximum Likelihood (REML) (e.g. Searle, Casella and McCulloch, 1992), and ˆβˆλ = [ ˆβF ˆγ ] = ( 1 X T X + ˆσ2 ε D) X T Y ˆσ γ 2 45/56 ˆβˆλ is Empirical BLUP (EBLUP) for β

58 P-splines Estimator as EBLUP If σ 2 γ, σ 2 ε are unknown, estimates can be obtained by Maximum Likelihood (ML) or Restricted Maximum Likelihood (REML) (e.g. Searle, Casella and McCulloch, 1992), and ˆβˆλ = [ ˆβF ˆγ ] = ( 1 X T X + ˆσ2 ε D) X T Y ˆσ γ 2 45/56 ˆβˆλ is Empirical BLUP (EBLUP) for β Smoothing penalty λ = ˆσ 2 ε/ˆσ 2 γ is determined by data Automatically adjusts λ to patterns in data small deviations from parametric shape ˆσ 2 γ small more smoothing data exhibit significant deviations from parametric shape ˆσ 2 γ large less smoothing

59 Spatial Smoothing using P-splines NE Lakes data require bivariate (spatial) smoothing Low-rank radial basis ( thin-plate spline) [ ] z = C(x i κ k ) 1 i n 1 k K [ C(κ k κ k ) with C(r) = r 2 log r (Ruppert et al. 2003) Mixed model ] 1/2 1 k,k K 46/56 y i = x F i βf + z i γ + ε i γ = (γ 1,..., γ K ) iid F γ (0, σ 2 γ) ε i F ε (0, σ 2 ε) Knot selection: regular spatial grid, space-filling algorithm

60 Small Area Estimation as Mixed Effect Regression Classical small area estimation (Battese, Harter and Fuller, 1988): T small areas, with u t the random effect for small area t = 1,..., T Model 47/56 y i = x i β + u t + ε i = x i β + d i u + ε i d i = (d i1,..., d it ) d it = { 1 if i small area t 0 otherwise u = (u 1,..., u T ) iid F u (0, σ 2 u) ε i F ε (0, σ 2 ε) Target Ȳt = X t β + u t estimated by (E)BLUP ȳ t = X t ˆβ + ût

61 Nonparametric Small Area Model Combine both random effects models 48/56 y i = m(x i ; β) + d i u + ε i = x F i β F + z i γ + d i u + ε i Variance components γ iid F γ (0, σ 2 γ) u iid F u (0, σ 2 u) ε i F ε (0, σ 2 ε)

62 Nonparametric Small Area Model Combine both random effects models 48/56 y i = m(x i ; β) + d i u + ε i = x F i β F + z i γ + d i u + ε i Variance components γ iid F γ (0, σ 2 γ) u iid F u (0, σ 2 u) ε i F ε (0, σ 2 ε) EBLUP can be computed by REML, and ȳ t = X F t ˆβ F + Z t ˆγ + û t

63 4.2 Small Area Estimation for NE Lakes 557 measurements on 338 lakes Dependent variable Independent variables Fixed effects Random effects ANC Acid Neutralizing Capacity INT intercept ELEV elevation TPS spatial thin-plate spline for K = 81 HUC 113 small areas 49/56

64 Knot Locations for Spatial Spline 50/56 utmy utmx

65 Full Model: Model Fit 51/56 Estimates Fixed effects Random effects ˆβ F INT 586 ELEV ˆσ TPS 79 HUC 420 Error 173 Correlation between model predictions and ANC: 0.96

66 Full Model: HUC Predictions 52/ Correlation between HUC means and model predictions: 0.97

67 Are both random effects needed? AIC model selection criterion TPS HUC yes no yes no /56 Correlation between ANC and model prediction TPS HUC yes no yes no

68 Spline in Small Area Model? 54/ observed HUCs non observed HUCs HUC predictions HUC model HUC predictions HUC + TPS model Spline model provides better predictions for empty HUCs

69 Small Area Random Effect in Spatial Model? 55/ observed HUCs non observed HUCs HUC predictions TPS model HUC predictions HUC + TPS model HUC effect results in predictions that are closer to observed data

70 5. Conclusions 56/56 Generic and specific inference can be improved with nonparametric regression methods P-splines easily extend existing survey estimation approaches Utah: more efficient generic estimators for region and for domains NE Lakes: improved prediction for HUCs without observations To do: smoothing parameter selection mixed model inference, test for significance of random effects Contact information: http// jopsomer/home.html

Nonparametric Small Area Estimation Using Penalized Spline Regression

Nonparametric Small Area Estimation Using Penalized Spline Regression Nonparametric Small Area Estimation Using Penalized Spline Regression 0verview Spline-based nonparametric regression Nonparametric small area estimation Prediction mean squared error Bootstrapping small

More information

Model-assisted Estimation of Forest Resources with Generalized Additive Models

Model-assisted Estimation of Forest Resources with Generalized Additive Models Model-assisted Estimation of Forest Resources with Generalized Additive Models Jean Opsomer, Jay Breidt, Gretchen Moisen, Göran Kauermann August 9, 2006 1 Outline 1. Forest surveys 2. Sampling from spatial

More information

Nonparametric Small Area Estimation Using Penalized Spline Regression

Nonparametric Small Area Estimation Using Penalized Spline Regression Nonparametric Small Area Estimation Using Penalized Spline Regression J. D. Opsomer Iowa State University G. Claeskens Katholieke Universiteit Leuven M. G. Ranalli Universita degli Studi di Perugia G.

More information

Nonparametric Small Area Estimation via M-quantile Regression using Penalized Splines

Nonparametric Small Area Estimation via M-quantile Regression using Penalized Splines Nonparametric Small Estimation via M-quantile Regression using Penalized Splines Monica Pratesi 10 August 2008 Abstract The demand of reliable statistics for small areas, when only reduced sizes of the

More information

Penalized Balanced Sampling. Jay Breidt

Penalized Balanced Sampling. Jay Breidt Penalized Balanced Sampling Jay Breidt Colorado State University Joint work with Guillaume Chauvet (ENSAI) February 4, 2010 1 / 44 Linear Mixed Models Let U = {1, 2,...,N}. Consider linear mixed models

More information

F. Jay Breidt Colorado State University

F. Jay Breidt Colorado State University Model-assisted survey regression estimation with the lasso 1 F. Jay Breidt Colorado State University Opening Workshop on Computational Methods in Social Sciences SAMSI August 2013 This research was supported

More information

Small Area Estimation Using a Nonparametric Model Based Direct Estimator

Small Area Estimation Using a Nonparametric Model Based Direct Estimator University of Wollongong Research Online Centre for Statistical & Survey Methodology Working Paper Series Faculty of Engineering and Information Sciences 2009 Small Area Estimation Using a Nonparametric

More information

Penalized Splines, Mixed Models, and Recent Large-Sample Results

Penalized Splines, Mixed Models, and Recent Large-Sample Results Penalized Splines, Mixed Models, and Recent Large-Sample Results David Ruppert Operations Research & Information Engineering, Cornell University Feb 4, 2011 Collaborators Matt Wand, University of Wollongong

More information

Small Area Estimation for Skewed Georeferenced Data

Small Area Estimation for Skewed Georeferenced Data Small Area Estimation for Skewed Georeferenced Data E. Dreassi - A. Petrucci - E. Rocco Department of Statistics, Informatics, Applications "G. Parenti" University of Florence THE FIRST ASIAN ISI SATELLITE

More information

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University Integrated Likelihood Estimation in Semiparametric Regression Models Thomas A. Severini Department of Statistics Northwestern University Joint work with Heping He, University of York Introduction Let Y

More information

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science 1 Likelihood Ratio Tests that Certain Variance Components Are Zero Ciprian M. Crainiceanu Department of Statistical Science www.people.cornell.edu/pages/cmc59 Work done jointly with David Ruppert, School

More information

The Hodrick-Prescott Filter

The Hodrick-Prescott Filter The Hodrick-Prescott Filter A Special Case of Penalized Spline Smoothing Alex Trindade Dept. of Mathematics & Statistics, Texas Tech University Joint work with Rob Paige, Missouri University of Science

More information

Cross-validation in model-assisted estimation

Cross-validation in model-assisted estimation Graduate Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 009 Cross-validation in model-assisted estimation Lifeng You Iowa State University Follow this and additional

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

Model-assisted Estimation of Forest Resources with Generalized Additive Models

Model-assisted Estimation of Forest Resources with Generalized Additive Models Model-assisted Estimation of Forest Resources with Generalized Additive Models Jean D. Opsomer, F. Jay Breidt, Gretchen G. Moisen, and Göran Kauermann March 26, 2003 Abstract Multi-phase surveys are often

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Institute of Statistics and Econometrics Georg-August-University Göttingen Department of Statistics

More information

Additional results for model-based nonparametric variance estimation for systematic sampling in a forestry survey

Additional results for model-based nonparametric variance estimation for systematic sampling in a forestry survey Additional results for model-based nonparametric variance estimation for systematic sampling in a forestry survey J.D. Opsomer Colorado State University M. Francisco-Fernández Universidad de A Coruña July

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of

More information

NONPARAMETRIC ENDOGENOUS POST-STRATIFICATION ESTIMATION

NONPARAMETRIC ENDOGENOUS POST-STRATIFICATION ESTIMATION Statistica Sinica 2011): Preprint 1 NONPARAMETRIC ENDOGENOUS POST-STRATIFICATION ESTIMATION Mark Dahlke 1, F. Jay Breidt 1, Jean D. Opsomer 1 and Ingrid Van Keilegom 2 1 Colorado State University and 2

More information

Nonparametric Regression Estimation of Finite Population Totals under Two-Stage Sampling

Nonparametric Regression Estimation of Finite Population Totals under Two-Stage Sampling Nonparametric Regression Estimation of Finite Population Totals under Two-Stage Sampling Ji-Yeon Kim Iowa State University F. Jay Breidt Colorado State University Jean D. Opsomer Colorado State University

More information

1. Introduction. Keywords: Auxiliary information; Nonparametric regression; Pseudo empirical likelihood; Model-assisted approach; MARS.

1. Introduction. Keywords: Auxiliary information; Nonparametric regression; Pseudo empirical likelihood; Model-assisted approach; MARS. Nonparametric Methods for Sample Surveys of Environmental Populations Metodi nonparametrici nell inferenza per popolazioni finite di carattere ambientale Giorgio E. Montanari Dipartimento di Economia,

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of

More information

Some properties of Likelihood Ratio Tests in Linear Mixed Models

Some properties of Likelihood Ratio Tests in Linear Mixed Models Some properties of Likelihood Ratio Tests in Linear Mixed Models Ciprian M. Crainiceanu David Ruppert Timothy J. Vogelsang September 19, 2003 Abstract We calculate the finite sample probability mass-at-zero

More information

Chapter 5: Models used in conjunction with sampling. J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70

Chapter 5: Models used in conjunction with sampling. J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70 Chapter 5: Models used in conjunction with sampling J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70 Nonresponse Unit Nonresponse: weight adjustment Item Nonresponse:

More information

Linear Regression. Aarti Singh. Machine Learning / Sept 27, 2010

Linear Regression. Aarti Singh. Machine Learning / Sept 27, 2010 Linear Regression Aarti Singh Machine Learning 10-701/15-781 Sept 27, 2010 Discrete to Continuous Labels Classification Sports Science News Anemic cell Healthy cell Regression X = Document Y = Topic X

More information

Analysing geoadditive regression data: a mixed model approach

Analysing geoadditive regression data: a mixed model approach Analysing geoadditive regression data: a mixed model approach Institut für Statistik, Ludwig-Maximilians-Universität München Joint work with Ludwig Fahrmeir & Stefan Lang 25.11.2005 Spatio-temporal regression

More information

Andreas Blöchl: Trend Estimation with Penalized Splines as Mixed Models for Series with Structural Breaks

Andreas Blöchl: Trend Estimation with Penalized Splines as Mixed Models for Series with Structural Breaks Andreas Blöchl: Trend Estimation with Penalized Splines as Mixed Models for Series with Structural Breaks Munich Discussion Paper No. 2014-2 Department of Economics University of Munich Volkswirtschaftliche

More information

Lecture 3: Statistical Decision Theory (Part II)

Lecture 3: Statistical Decision Theory (Part II) Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical

More information

Small Area Modeling of County Estimates for Corn and Soybean Yields in the US

Small Area Modeling of County Estimates for Corn and Soybean Yields in the US Small Area Modeling of County Estimates for Corn and Soybean Yields in the US Matt Williams National Agricultural Statistics Service United States Department of Agriculture Matt.Williams@nass.usda.gov

More information

ASYMPTOTICS FOR PENALIZED SPLINES IN ADDITIVE MODELS

ASYMPTOTICS FOR PENALIZED SPLINES IN ADDITIVE MODELS Mem. Gra. Sci. Eng. Shimane Univ. Series B: Mathematics 47 (2014), pp. 63 71 ASYMPTOTICS FOR PENALIZED SPLINES IN ADDITIVE MODELS TAKUMA YOSHIDA Communicated by Kanta Naito (Received: December 19, 2013)

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic

More information

Recovering Indirect Information in Demographic Applications

Recovering Indirect Information in Demographic Applications Recovering Indirect Information in Demographic Applications Jutta Gampe Abstract In many demographic applications the information of interest can only be estimated indirectly. Modelling events and rates

More information

Calibration estimation using exponential tilting in sample surveys

Calibration estimation using exponential tilting in sample surveys Calibration estimation using exponential tilting in sample surveys Jae Kwang Kim February 23, 2010 Abstract We consider the problem of parameter estimation with auxiliary information, where the auxiliary

More information

Regularization in Cox Frailty Models

Regularization in Cox Frailty Models Regularization in Cox Frailty Models Andreas Groll 1, Trevor Hastie 2, Gerhard Tutz 3 1 Ludwig-Maximilians-Universität Munich, Department of Mathematics, Theresienstraße 39, 80333 Munich, Germany 2 University

More information

Exact Likelihood Ratio Tests for Penalized Splines

Exact Likelihood Ratio Tests for Penalized Splines Exact Likelihood Ratio Tests for Penalized Splines By CIPRIAN CRAINICEANU, DAVID RUPPERT, GERDA CLAESKENS, M.P. WAND Department of Biostatistics, Johns Hopkins University, 615 N. Wolfe Street, Baltimore,

More information

1 Mixed effect models and longitudinal data analysis

1 Mixed effect models and longitudinal data analysis 1 Mixed effect models and longitudinal data analysis Mixed effects models provide a flexible approach to any situation where data have a grouping structure which introduces some kind of correlation between

More information

Nonparametric Regression. Badr Missaoui

Nonparametric Regression. Badr Missaoui Badr Missaoui Outline Kernel and local polynomial regression. Penalized regression. We are given n pairs of observations (X 1, Y 1 ),...,(X n, Y n ) where Y i = r(x i ) + ε i, i = 1,..., n and r(x) = E(Y

More information

Estimating Timber Volume using Airborne Laser Scanning Data based on Bayesian Methods J. Breidenbach 1 and E. Kublin 2

Estimating Timber Volume using Airborne Laser Scanning Data based on Bayesian Methods J. Breidenbach 1 and E. Kublin 2 Estimating Timber Volume using Airborne Laser Scanning Data based on Bayesian Methods J. Breidenbach 1 and E. Kublin 2 1 Norwegian University of Life Sciences, Department of Ecology and Natural Resource

More information

Forecasting Data Streams: Next Generation Flow Field Forecasting

Forecasting Data Streams: Next Generation Flow Field Forecasting Forecasting Data Streams: Next Generation Flow Field Forecasting Kyle Caudle South Dakota School of Mines & Technology (SDSMT) kyle.caudle@sdsmt.edu Joint work with Michael Frey (Bucknell University) and

More information

Approaches to Modeling Menstrual Cycle Function

Approaches to Modeling Menstrual Cycle Function Approaches to Modeling Menstrual Cycle Function Paul S. Albert (albertp@mail.nih.gov) Biostatistics & Bioinformatics Branch Division of Epidemiology, Statistics, and Prevention Research NICHD SPER Student

More information

Likelihood ratio testing for zero variance components in linear mixed models

Likelihood ratio testing for zero variance components in linear mixed models Likelihood ratio testing for zero variance components in linear mixed models Sonja Greven 1,3, Ciprian Crainiceanu 2, Annette Peters 3 and Helmut Küchenhoff 1 1 Department of Statistics, LMU Munich University,

More information

High-dimensional regression

High-dimensional regression High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and

More information

Gaussian processes for spatial modelling in environmental health: parameterizing for flexibility vs. computational efficiency

Gaussian processes for spatial modelling in environmental health: parameterizing for flexibility vs. computational efficiency Gaussian processes for spatial modelling in environmental health: parameterizing for flexibility vs. computational efficiency Chris Paciorek March 11, 2005 Department of Biostatistics Harvard School of

More information

Simultaneous Confidence Bands for the Coefficient Function in Functional Regression

Simultaneous Confidence Bands for the Coefficient Function in Functional Regression University of Haifa From the SelectedWorks of Philip T. Reiss August 7, 2008 Simultaneous Confidence Bands for the Coefficient Function in Functional Regression Philip T. Reiss, New York University Available

More information

Nonstationary Regressors using Penalized Splines

Nonstationary Regressors using Penalized Splines Estimation and Inference for Varying-coefficient Models with Nonstationary Regressors using Penalized Splines Haiqiang Chen, Ying Fang and Yingxing Li Wang Yanan Institute for Studies in Economics, MOE

More information

Single-index model-assisted estimation in survey sampling

Single-index model-assisted estimation in survey sampling Journal of onparametric Statistics Vol. 21, o. 4, May 2009, 487 504 Single-index model-assisted estimation in survey sampling Li Wang* Department of Statistics, University of Georgia, Athens, GA, 30602,

More information

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we

More information

Combining data from two independent surveys: model-assisted approach

Combining data from two independent surveys: model-assisted approach Combining data from two independent surveys: model-assisted approach Jae Kwang Kim 1 Iowa State University January 20, 2012 1 Joint work with J.N.K. Rao, Carleton University Reference Kim, J.K. and Rao,

More information

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data Jae-Kwang Kim 1 Iowa State University June 28, 2012 1 Joint work with Dr. Ming Zhou (when he was a PhD student at ISU)

More information

Linear regression methods

Linear regression methods Linear regression methods Most of our intuition about statistical methods stem from linear regression. For observations i = 1,..., n, the model is Y i = p X ij β j + ε i, j=1 where Y i is the response

More information

Model-Assisted Estimation of Forest Resources With Generalized Additive Models

Model-Assisted Estimation of Forest Resources With Generalized Additive Models Model-Assisted Estimation of Forest Resources With Generalized Additive Models Jean D. OPSOMER, F.JayBREIDT, Gretchen G. MOISEN, and Göran KAUERMANN Multiphase surveys are often conducted in forest inventories,

More information

arxiv: v2 [math.st] 20 Jun 2014

arxiv: v2 [math.st] 20 Jun 2014 A solution in small area estimation problems Andrius Čiginas and Tomas Rudys Vilnius University Institute of Mathematics and Informatics, LT-08663 Vilnius, Lithuania arxiv:1306.2814v2 [math.st] 20 Jun

More information

Lecture 18: Kernels Risk and Loss Support Vector Regression. Aykut Erdem December 2016 Hacettepe University

Lecture 18: Kernels Risk and Loss Support Vector Regression. Aykut Erdem December 2016 Hacettepe University Lecture 18: Kernels Risk and Loss Support Vector Regression Aykut Erdem December 2016 Hacettepe University Administrative We will have a make-up lecture on next Saturday December 24, 2016 Presentations

More information

P -spline ANOVA-type interaction models for spatio-temporal smoothing

P -spline ANOVA-type interaction models for spatio-temporal smoothing P -spline ANOVA-type interaction models for spatio-temporal smoothing Dae-Jin Lee 1 and María Durbán 1 1 Department of Statistics, Universidad Carlos III de Madrid, SPAIN. e-mail: dae-jin.lee@uc3m.es and

More information

Nonparametric Regression

Nonparametric Regression Nonparametric Regression Econ 674 Purdue University April 8, 2009 Justin L. Tobias (Purdue) Nonparametric Regression April 8, 2009 1 / 31 Consider the univariate nonparametric regression model: where y

More information

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Statistica Sinica 24 (2014), 1001-1015 doi:http://dx.doi.org/10.5705/ss.2013.038 INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Seunghwan Park and Jae Kwang Kim Seoul National Univeristy

More information

Beyond Mean Regression

Beyond Mean Regression Beyond Mean Regression Thomas Kneib Lehrstuhl für Statistik Georg-August-Universität Göttingen 8.3.2013 Innsbruck Introduction Introduction One of the top ten reasons to become statistician (according

More information

High-dimensional regression with unknown variance

High-dimensional regression with unknown variance High-dimensional regression with unknown variance Christophe Giraud Ecole Polytechnique march 2012 Setting Gaussian regression with unknown variance: Y i = f i + ε i with ε i i.i.d. N (0, σ 2 ) f = (f

More information

One-stage dose-response meta-analysis

One-stage dose-response meta-analysis One-stage dose-response meta-analysis Nicola Orsini, Alessio Crippa Biostatistics Team Department of Public Health Sciences Karolinska Institutet http://ki.se/en/phs/biostatistics-team 2017 Nordic and

More information

Missing Data Issues in the Studies of Neurodegenerative Disorders: the Methodology

Missing Data Issues in the Studies of Neurodegenerative Disorders: the Methodology Missing Data Issues in the Studies of Neurodegenerative Disorders: the Methodology Sheng Luo, PhD Associate Professor Department of Biostatistics & Bioinformatics Duke University Medical Center sheng.luo@duke.edu

More information

Spatial Lasso with Application to GIS Model Selection. F. Jay Breidt Colorado State University

Spatial Lasso with Application to GIS Model Selection. F. Jay Breidt Colorado State University Spatial Lasso with Application to GIS Model Selection F. Jay Breidt Colorado State University with Hsin-Cheng Huang, Nan-Jung Hsu, and Dave Theobald September 25 The work reported here was developed under

More information

Variable Selection and Model Choice in Survival Models with Time-Varying Effects

Variable Selection and Model Choice in Survival Models with Time-Varying Effects Variable Selection and Model Choice in Survival Models with Time-Varying Effects Boosting Survival Models Benjamin Hofner 1 Department of Medical Informatics, Biometry and Epidemiology (IMBE) Friedrich-Alexander-Universität

More information

Comments on: Model-free model-fitting and predictive distributions : Applications to Small Area Statistics and Treatment Effect Estimation

Comments on: Model-free model-fitting and predictive distributions : Applications to Small Area Statistics and Treatment Effect Estimation Article Comments on: Model-free model-fitting and predictive distributions : Applications to Small Area Statistics and Treatment Effect Estimation SPERLICH, Stefan Andréas Reference SPERLICH, Stefan Andréas.

More information

Restricted Likelihood Ratio Tests in Nonparametric Longitudinal Models

Restricted Likelihood Ratio Tests in Nonparametric Longitudinal Models Restricted Likelihood Ratio Tests in Nonparametric Longitudinal Models Short title: Restricted LR Tests in Longitudinal Models Ciprian M. Crainiceanu David Ruppert May 5, 2004 Abstract We assume that repeated

More information

Additive Isotonic Regression

Additive Isotonic Regression Additive Isotonic Regression Enno Mammen and Kyusang Yu 11. July 2006 INTRODUCTION: We have i.i.d. random vectors (Y 1, X 1 ),..., (Y n, X n ) with X i = (X1 i,..., X d i ) and we consider the additive

More information

Linear Regression Model. Badr Missaoui

Linear Regression Model. Badr Missaoui Linear Regression Model Badr Missaoui Introduction What is this course about? It is a course on applied statistics. It comprises 2 hours lectures each week and 1 hour lab sessions/tutorials. We will focus

More information

AUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPINGS UNIVERSITET. Questions AUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPINGS UNIVERSITET

AUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPINGS UNIVERSITET. Questions AUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPINGS UNIVERSITET The Problem Identification of Linear and onlinear Dynamical Systems Theme : Curve Fitting Division of Automatic Control Linköping University Sweden Data from Gripen Questions How do the control surface

More information

REGRESSION WITH SPATIALLY MISALIGNED DATA. Lisa Madsen Oregon State University David Ruppert Cornell University

REGRESSION WITH SPATIALLY MISALIGNED DATA. Lisa Madsen Oregon State University David Ruppert Cornell University REGRESSION ITH SPATIALL MISALIGNED DATA Lisa Madsen Oregon State University David Ruppert Cornell University SPATIALL MISALIGNED DATA 10 X X X X X X X X 5 X X X X X 0 X 0 5 10 OUTLINE 1. Introduction 2.

More information

WLS and BLUE (prelude to BLUP) Prediction

WLS and BLUE (prelude to BLUP) Prediction WLS and BLUE (prelude to BLUP) Prediction Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark April 21, 2018 Suppose that Y has mean X β and known covariance matrix V (but Y need

More information

STAT 540: Data Analysis and Regression

STAT 540: Data Analysis and Regression STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State

More information

Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"

Kneib, Fahrmeir: Supplement to Structured additive regression for categorical space-time data: A mixed model approach Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach" Sonderforschungsbereich 386, Paper 43 (25) Online unter: http://epub.ub.uni-muenchen.de/

More information

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Journal of Data Science 9(2011), 549-564 Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Masaru Kanba and Kanta Naito Shimane University Abstract: This paper discusses the

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Jianqing Fan Department of Statistics Chinese University of Hong Kong AND Department of Statistics

More information

Spatial Process Estimates as Smoothers: A Review

Spatial Process Estimates as Smoothers: A Review Spatial Process Estimates as Smoothers: A Review Soutir Bandyopadhyay 1 Basic Model The observational model considered here has the form Y i = f(x i ) + ɛ i, for 1 i n. (1.1) where Y i is the observed

More information

Model Selection and Geometry

Model Selection and Geometry Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model

More information

Efficient Estimation for the Partially Linear Models with Random Effects

Efficient Estimation for the Partially Linear Models with Random Effects A^VÇÚO 1 33 ò 1 5 Ï 2017 c 10 Chinese Journal of Applied Probability and Statistics Oct., 2017, Vol. 33, No. 5, pp. 529-537 doi: 10.3969/j.issn.1001-4268.2017.05.009 Efficient Estimation for the Partially

More information

Bayesian linear regression

Bayesian linear regression Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding

More information

A comparison of stratified simple random sampling and sampling with probability proportional to size

A comparison of stratified simple random sampling and sampling with probability proportional to size A comparison of stratified simple random sampling and sampling with probability proportional to size Edgar Bueno Dan Hedlin Per Gösta Andersson 1 Introduction When planning the sampling strategy (i.e.

More information

Modeling Real Estate Data using Quantile Regression

Modeling Real Estate Data using Quantile Regression Modeling Real Estate Data using Semiparametric Quantile Regression Department of Statistics University of Innsbruck September 9th, 2011 Overview 1 Application: 2 3 4 Hedonic regression data for house prices

More information

Statistics 203: Introduction to Regression and Analysis of Variance Penalized models

Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Jonathan Taylor - p. 1/15 Today s class Bias-Variance tradeoff. Penalized regression. Cross-validation. - p. 2/15 Bias-variance

More information

Regularized PCA to denoise and visualise data

Regularized PCA to denoise and visualise data Regularized PCA to denoise and visualise data Marie Verbanck Julie Josse François Husson Laboratoire de statistique, Agrocampus Ouest, Rennes, France CNAM, Paris, 16 janvier 2013 1 / 30 Outline 1 PCA 2

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University UMN Statistics Seminar, Mar 30, 2017 Overview Parameter est. Model selection Point est. MLE, M-est.,... Cross-validation

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

A Hierarchical Perspective on Lee-Carter Models

A Hierarchical Perspective on Lee-Carter Models A Hierarchical Perspective on Lee-Carter Models Paul Eilers Leiden University Medical Centre L-C Workshop, Edinburgh 24 The vantage point Previous presentation: Iain Currie led you upward From Glen Gumbel

More information

Selection of Variables and Functional Forms in Multivariable Analysis: Current Issues and Future Directions

Selection of Variables and Functional Forms in Multivariable Analysis: Current Issues and Future Directions in Multivariable Analysis: Current Issues and Future Directions Frank E Harrell Jr Department of Biostatistics Vanderbilt University School of Medicine STRATOS Banff Alberta 2016-07-04 Fractional polynomials,

More information

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures FE661 - Statistical Methods for Financial Engineering 9. Model Selection Jitkomut Songsiri statistical models overview of model selection information criteria goodness-of-fit measures 9-1 Statistical models

More information

AIR FORCE INSTITUTE OF TECHNOLOGY

AIR FORCE INSTITUTE OF TECHNOLOGY TOPICS IN STATISTICAL CALIBRATION DISSERTATION Brandon M. Greenwell, Civilian AFIT-ENC-DS-14-M-01 DEPARTMENT OF THE AIR FORCE AIR UNIVERSITY AIR FORCE INSTITUTE OF TECHNOLOGY Wright-Patterson Air Force

More information

Lesson 6 Maximum Likelihood: Part III. (plus a quick bestiary of models)

Lesson 6 Maximum Likelihood: Part III. (plus a quick bestiary of models) Lesson 6 Maximum Likelihood: Part III (plus a quick bestiary of models) Homework You want to know the density of fish in a set of experimental ponds You observe the following counts in ten ponds: 5,6,7,3,6,5,8,4,4,3

More information

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Final Overview. Introduction to ML. Marek Petrik 4/25/2017 Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,

More information

Imputation for Missing Data under PPSWR Sampling

Imputation for Missing Data under PPSWR Sampling July 5, 2010 Beijing Imputation for Missing Data under PPSWR Sampling Guohua Zou Academy of Mathematics and Systems Science Chinese Academy of Sciences 1 23 () Outline () Imputation method under PPSWR

More information

Estimation of cumulative distribution function with spline functions

Estimation of cumulative distribution function with spline functions INTERNATIONAL JOURNAL OF ECONOMICS AND STATISTICS Volume 5, 017 Estimation of cumulative distribution function with functions Akhlitdin Nizamitdinov, Aladdin Shamilov Abstract The estimation of the cumulative

More information

41903: Introduction to Nonparametrics

41903: Introduction to Nonparametrics 41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific

More information

Flexible Spatio-temporal smoothing with array methods

Flexible Spatio-temporal smoothing with array methods Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session IPS046) p.849 Flexible Spatio-temporal smoothing with array methods Dae-Jin Lee CSIRO, Mathematics, Informatics and

More information

ERRATA. for Semiparametric Regression. Last Updated: 30th September, 2014

ERRATA. for Semiparametric Regression. Last Updated: 30th September, 2014 1 ERRATA for Semiparametric Regression by D. Ruppert, M. P. Wand and R. J. Carroll Last Updated: 30th September, 2014 p.6. In the vertical axis Figure 1.7 the lower 1, 2 and 3 should have minus signs.

More information

Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

More information

Spatial bias modeling with application to assessing remotely-sensed aerosol as a proxy for particulate matter

Spatial bias modeling with application to assessing remotely-sensed aerosol as a proxy for particulate matter Spatial bias modeling with application to assessing remotely-sensed aerosol as a proxy for particulate matter Chris Paciorek Department of Biostatistics Harvard School of Public Health application joint

More information

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas 0 0 5 Motivation: Regression discontinuity (Angrist&Pischke) Outcome.5 1 1.5 A. Linear E[Y 0i X i] 0.2.4.6.8 1 X Outcome.5 1 1.5 B. Nonlinear E[Y 0i X i] i 0.2.4.6.8 1 X utcome.5 1 1.5 C. Nonlinearity

More information

Stat542 (F11) Statistical Learning. First consider the scenario where the two classes of points are separable.

Stat542 (F11) Statistical Learning. First consider the scenario where the two classes of points are separable. Linear SVM (separable case) First consider the scenario where the two classes of points are separable. It s desirable to have the width (called margin) between the two dashed lines to be large, i.e., have

More information

Chapter 8: Estimation 1

Chapter 8: Estimation 1 Chapter 8: Estimation 1 Jae-Kwang Kim Iowa State University Fall, 2014 Kim (ISU) Ch. 8: Estimation 1 Fall, 2014 1 / 33 Introduction 1 Introduction 2 Ratio estimation 3 Regression estimator Kim (ISU) Ch.

More information