Nonparametric model calibration estimation in survey sampling


Ames, February 18, 2004

Nonparametric model calibration estimation in survey sampling

M. Giovanna Ranalli, Department of Statistics, Colorado State University
(Joint work with G.E. Montanari, Dipartimento di Scienze Statistiche, Università di Perugia, Italy)

Outline

- Introduction to the framework and notation
- Auxiliary information: Calibration
- Allowing for more flexible models: Model Calibration
- Nonparametric methods for Model Calibration: Neural Networks and Local Polynomials
- A small simulation study
- Conclusions and perspectives

The Problem - aim and tools

We want to know more about a Finite Population...

- $U = \{u_1, u_2, \ldots, u_N\}$: finite population of $N$ distinct units labelled by the integers $i = 1, \ldots, N$.
- $y_i$: value taken by the survey variable $y$ on unit $i$.
- $\bar{Y} = N^{-1}\sum_U y_i$: population mean of $y$, the parameter of interest.

... by means of a Probabilistic Sample and Auxiliary Information.

- $s$: sample of size $n$ selected from $U$ according to a sampling design $p(s)$, which induces first and second order inclusion probabilities $\pi_i$ and $\pi_{ij}$.
- $y_i$, for $i \in s$: values of $y$ observed on the sample units.
- $x_i = (x_{1i}, x_{2i}, \ldots, x_{Qi})'$: value taken by the $Q$ auxiliary variables $x$ on unit $i$, known for all $i \in U$.

Calibration Estimators (Deville & Särndal, 1992)

$$\hat{\bar{Y}}_{CAL} = N^{-1}\sum_{i \in s} w_i y_i$$

The weights $w_i$ are found to be as close as possible to $d_i = \pi_i^{-1}$, while meeting benchmark constraints:

$$\min_{w_i}\sum_{i \in s}\Phi_i(w_i, d_i) \quad \text{s.t.} \quad N^{-1}\sum_{i \in s} w_i x_i = N^{-1}\sum_{i \in U} x_i$$

Different distance measures can be employed in the minimization procedure as long as they meet some basic requirements.

How to get a solution

The minimization problem is solved by $\phi_i(w_i, d_i) - x_i'l = 0$, where $\phi_i(w, d) = \partial\Phi_i(w, d)/\partial w$ and $l$ is a vector of Lagrange multipliers.

If a solution exists, the basic requirements guarantee it is unique and can be written as $w_i = d_i F(x_i'l)$, where $d_i F(\cdot)$ is the reciprocal mapping of $\phi_i(\cdot, d_i)$.

The vector $l$ can be determined through the calibration constraints:

$$\bar{x} = \frac{1}{N}\sum_{i \in s} w_i x_i = \frac{1}{N}\sum_{i \in s} d_i F(x_i'l)\, x_i \;\Longrightarrow\; g_s(l) = \frac{1}{N}\sum_{i \in s} d_i\{F(x_i'l) - 1\}\, x_i = \bar{x} - \hat{\bar{x}}.$$

Therefore, given a sample $s$ and a chosen distance function $\Phi_i(\cdot, d)$:

1. determine the uniquely corresponding function $F(\cdot)$;
2. solve $g_s(l) = \bar{x} - \hat{\bar{x}}$ for $l$;
3. obtain the calibration estimator as $\hat{\bar{Y}}_{CAL} = N^{-1}\sum_{i \in s} w_i y_i = N^{-1}\sum_{i \in s} d_i F(x_i'l)\, y_i$.

An example - not chosen by chance...

1. Assume you choose the quadratic distance function $\Phi_i(w_i, d_i) = (w_i - d_i)^2/(2 d_i q_i)$, where the $q_i$'s are known constants unrelated to $d_i$. The partial derivative is given by $\phi_i(w_i, d_i) = (w_i/d_i - 1)/q_i$ and its reciprocal mapping by $d_i F(u) = d_i(1 + q_i u)$; therefore $w_i = d_i(1 + q_i x_i'l)$.

2. The Lagrange multipliers are determined by means of $N^{-1}\sum_{i \in s} d_i q_i x_i x_i'\, l = \bar{x} - \hat{\bar{x}}$, so that $l = \big(N^{-1}\sum_{i \in s} d_i q_i x_i x_i'\big)^{-1}(\bar{x} - \hat{\bar{x}})$.

An example - not chosen by chance... (follows)

3. And the calibration estimator follows to be

$$\hat{\bar{Y}}_{CAL} = N^{-1}\sum_{i \in s} w_i y_i
= N^{-1}\sum_{i \in s} d_i\Big(1 + q_i x_i'\big(N^{-1}\textstyle\sum_{j \in s} d_j q_j x_j x_j'\big)^{-1}(\bar{x} - \hat{\bar{x}})\Big)\, y_i$$
$$= \hat{\bar{Y}} + (\bar{x} - \hat{\bar{x}})'\big(\textstyle\sum_{i \in s} d_i q_i x_i x_i'\big)^{-1}\textstyle\sum_{i \in s} d_i q_i x_i y_i
= \hat{\bar{Y}} + (\bar{x} - \hat{\bar{x}})'\hat{\beta}_c,$$

which is equivalent to the Generalized REGression estimator, GREG, if we choose the $q_i$'s to represent the variance structure of the working model.
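To make the chain above concrete, here is a minimal numerical sketch of the quadratic-distance calibration in Python/numpy. The population, sample size and model generating $y$ are entirely hypothetical placeholders (not the MAHA data): the code solves the linear system for $l$, forms the weights $w_i = d_i(1 + q_i x_i'l)$, and checks that they reproduce the population mean of $x$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population and SRSWOR sample (all names and values are illustrative)
N, n, Q = 1000, 100, 2
x_U = rng.uniform(0, 1, size=(N, Q))                 # auxiliary variables, known on all of U
y_U = 2 + 3 * x_U[:, 0] + rng.normal(0, 0.5, N)
s = rng.choice(N, size=n, replace=False)
x_s, y_s = x_U[s], y_U[s]

d = np.full(n, N / n)                                # d_i = 1/pi_i under SRSWOR
q = np.ones(n)                                       # q_i constants of the quadratic distance

xbar_U = x_U.mean(axis=0)                            # population mean of x
xbar_HT = (d[:, None] * x_s).sum(axis=0) / N         # Horvitz-Thompson estimate of it

# Solve N^{-1} sum_s d_i q_i x_i x_i' l = xbar_U - xbar_HT for the Lagrange multipliers
A = x_s.T @ ((d * q)[:, None] * x_s) / N
l = np.linalg.solve(A, xbar_U - xbar_HT)

w = d * (1 + q * (x_s @ l))                          # calibrated weights w_i = d_i(1 + q_i x_i' l)
Y_CAL = (w * y_s).sum() / N

print("benchmark check:", (w[:, None] * x_s).sum(axis=0) / N, "vs", xbar_U)
print("CAL estimate:", Y_CAL, "  true mean:", y_U.mean())
```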

Calibration Estimators - properties

In an asymptotic framework as in Isaki and Fuller (1982) and under regularity conditions (Deville & Särndal, 1992, p. 379), $\hat{\bar{Y}}_{CAL}$ is shown to be design $\sqrt{n}$-consistent for $\bar{Y}$, in the sense that $\hat{\bar{Y}}_{CAL} - \bar{Y} = O_p(n^{-1/2})$.

Moreover, all the calibration estimators are asymptotically equivalent to the one that employs a quadratic distance measure. In fact, $\hat{\bar{Y}}_{CAL} - \hat{\bar{Y}}_{GREG} = O_p(n^{-1})$.

Therefore CAL and GREG share the same limiting distribution, and then

$$AV(\hat{\bar{Y}}_{CAL}) = N^{-2}\sum_{i \in U}\sum_{j \in U} (\pi_{ij} - \pi_i\pi_j)\,\frac{E_i}{\pi_i}\,\frac{E_j}{\pi_j},$$

with $E_i = y_i - x_i'\beta_c$ and $\beta_c = \big(\sum_U q_i x_i x_i'\big)^{-1}\sum_U q_i x_i y_i$.
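The asymptotic variance expression is straightforward to evaluate when the joint inclusion probabilities are available in closed form, as under simple random sampling without replacement. A small numpy sketch under those assumptions (toy population, $q_i = 1$, and an intercept column added to $x_i$ purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population (illustration only)
N, n = 400, 40
x = rng.uniform(0, 1, N)
y = 2 + 3 * x + rng.normal(0, 0.5, N)
q = np.ones(N)

# Population regression coefficient beta_c and residuals E_i (scalar x plus intercept)
X = np.column_stack([np.ones(N), x])
beta_c = np.linalg.solve((q[:, None] * X).T @ X, (q[:, None] * X).T @ y)
E = y - X @ beta_c

# SRSWOR inclusion probabilities
pi = np.full(N, n / N)
pi_ij = np.full((N, N), n * (n - 1) / (N * (N - 1)))
np.fill_diagonal(pi_ij, pi)

# AV(CAL) = N^-2 sum_i sum_j (pi_ij - pi_i pi_j) (E_i/pi_i) (E_j/pi_j)
check = E / pi
AV = ((pi_ij - np.outer(pi, pi)) * np.outer(check, check)).sum() / N**2
print("approximate design variance of CAL:", AV)
```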

Model Calibration - the idea (Wu & Sitter, 2001)

1. Assume a (linear,) nonlinear, or generalized linear model well describes the relationship between $x$ and $y$;
2. obtain fitted values of $y$ for all units in the population according to the model, accounting for the sampling plan;
3. build a calibration estimator by calibrating on the population mean of such fitted values (and not with respect to $x$).

Model Calibration - in symbols

1. Working model
$$E_\xi(y_i) = \mu(x_i, \theta), \quad i \in U, \quad \mu(\cdot)\ \text{known}$$
$$V_\xi(y_i) = \sigma^2 v(x_i), \quad i \in U, \quad v(\cdot)\ \text{known}$$
with $\mu(x_i, \theta)$ a nonlinear model, or $g(\mu_i) = x_i'\theta$ a (linear or) generalized linear model.

2. Obtain fitted values $\hat{\mu}_i = \mu(x_i, \hat{\theta})$, with $\hat{\theta} = f\{y_i, x_i, \pi_i\}_{i \in s}$ obtained e.g. by estimating equations modified to account for the design.

Model Calibration - the new calibration step

3. Build the model calibration estimator
$$\hat{\bar{Y}}_{MC} = N^{-1}\sum_{i \in s} w_i y_i$$
with weights $w_i$ found again to be as close as possible to $d_i = \pi_i^{-1}$, but with new benchmark constraints:
$$\min_{w_i}\sum_{i \in s}\frac{(w_i - d_i)^2}{d_i q_i} \quad \text{s.t.} \quad \sum_{i \in s} w_i = N, \qquad N^{-1}\sum_{i \in s} w_i\hat{\mu}_i = N^{-1}\sum_{i \in U}\hat{\mu}_i$$

The estimator is obtained as in the previous example as
$$\hat{\bar{Y}}_{MC} = \hat{\bar{Y}} + \frac{1}{N}\Big(\sum_{i \in U}\hat{\mu}_i - \sum_{i \in s} d_i\hat{\mu}_i\Big)\hat{\beta}_{mc},$$
where $\hat{\beta}_{mc}$ is the GREG coefficient of a regression estimator of $y$ on $\hat{\mu}$.
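A sketch of the full recipe, again with hypothetical data: a parametric working model (here a quadratic in $x$, standing in for $\mu(x,\theta)$) is fitted by design-weighted least squares, fitted values are computed for every unit of $U$, and the weights are then calibrated on $(1, \hat{\mu}_i)$ exactly as in the GREG example, with $q_i = 1$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population with a curved mean function (illustration only)
N, n = 1000, 100
x_U = rng.uniform(0, 1, N)
y_U = np.exp(2 * x_U) + rng.normal(0, 0.3, N)

s = rng.choice(N, n, replace=False)
d = np.full(n, N / n)                                   # design weights under SRSWOR

# Working model mu(x, theta) = theta0 + theta1*x + theta2*x^2, fitted with design weights
Xs = np.column_stack([np.ones(n), x_U[s], x_U[s] ** 2])
theta = np.linalg.solve(Xs.T @ (d[:, None] * Xs), Xs.T @ (d * y_U[s]))
mu_U = np.column_stack([np.ones(N), x_U, x_U ** 2]) @ theta   # fitted values for all of U

# Calibrate on z_i = (1, mu_i): sum_s w_i = N and sum_s w_i mu_i = sum_U mu_i
z_s = np.column_stack([np.ones(n), mu_U[s]])
zbar_U = np.array([1.0, mu_U.mean()])
zbar_HT = (d[:, None] * z_s).sum(axis=0) / N
A = (d[:, None] * z_s).T @ z_s / N
l = np.linalg.solve(A, zbar_U - zbar_HT)
w = d * (1 + z_s @ l)

Y_MC = (w * y_U[s]).sum() / N
print("MC estimate:", Y_MC, "  HT estimate:", (d * y_U[s]).sum() / N, "  true mean:", y_U.mean())
```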

Nonparametric model calibration - the idea

Allow more flexible modelling than parametric model calibration by assuming more general models and employing nonparametric techniques to obtain the fitted values to calibrate on.

Any nonparametric technique can in principle be employed. We will focus on Neural Networks, because this method
1. allows deriving theoretical properties of the resulting estimator,
2. is a popular nonparametric technique for which software is easily available,
3. allows straightforward insertion of multivariate auxiliary information;
and on Local Polynomials, since this has been the first nonparametric method considered for model-assisted survey sampling (Breidt & Opsomer, 2000).

...Quickly, what is a Feedforward Neural Network with skip-layer connections?

It is just a linear combination of sigmoidal functions of linear combinations of the auxiliary variables PLUS a linear model:

$$f(x_i) = \sum_{q=1}^{Q}\beta_q x_{qi} + \sum_{m=1}^{M} a_m\,\phi\Big(\sum_{q=1}^{Q}\gamma_{qm} x_{qi} + \gamma_{0m}\Big) + a_0$$

We will denote by $\theta = \{\beta_1, \ldots, \beta_Q, a_0, a_1, \ldots, a_M, \gamma_{01}, \ldots, \gamma_{0M}, \gamma_1, \ldots, \gamma_M\}$ the set of all parameters of the net.
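A direct transcription of $f(x_i)$ into code may help fix the notation. The parameter values below are random placeholders rather than fitted weights, and the logistic function stands in for the sigmoidal $\phi$.

```python
import numpy as np

def skip_layer_nn(x, beta, a0, a, gamma0, gamma):
    """Feedforward net with skip-layer connections:
    f(x) = sum_q beta_q x_q + sum_m a_m * phi(sum_q gamma_qm x_q + gamma_0m) + a_0,
    with phi the logistic sigmoid.  x: (Q,), beta: (Q,), a: (M,), gamma0: (M,), gamma: (Q, M).
    """
    phi = lambda u: 1.0 / (1.0 + np.exp(-u))
    hidden = phi(x @ gamma + gamma0)          # (M,) hidden-unit activations
    return x @ beta + hidden @ a + a0         # skip layer + hidden layer + bias

# Placeholder parameters for Q = 2 inputs and M = 3 hidden units (illustration only)
rng = np.random.default_rng(3)
Q, M = 2, 3
theta = dict(beta=rng.normal(size=Q), a0=0.1, a=rng.normal(size=M),
             gamma0=rng.normal(size=M), gamma=rng.normal(size=(Q, M)))

print(skip_layer_nn(np.array([0.2, 0.7]), **theta))
```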

A diagram that usually represents a NNet: the auxiliary variables $x_1, \ldots, x_Q$ feed the hidden neurons (inner weights $\gamma_{qm}$, activations $\phi$, output weights $a_1, \ldots, a_M$ and bias $a_0$) and, through the skip-layer weights $\beta_1, \ldots, \beta_Q$, the response variable $y$ directly.

Nonparametric model calibration - the steps

1. Working model - Neural Network model
$$E_\xi(y_i) = f(x_i), \quad V_\xi(y_i) = v(x_i), \quad i \in U$$

2. Obtain fitted values. First define the population parameter as the minimizer
$$\theta = \arg\min_{\theta \in \Theta}\Bigg\{\sum_{i=1}^{N}\frac{1}{v_i}\big(y_i - f(x_i, \theta)\big)^2 + \lambda\sum_{l=1}^{r}\theta_l^2\Bigg\},$$
where $\lambda$ is a weight decay parameter, obtained by means of the estimating equations
$$\sum_{i=1}^{N}\Big\{\big(y_i - f(x_i, \theta)\big)\frac{\partial f(x_i, \theta)}{\partial\theta}\frac{1}{v_i} - \frac{\lambda}{N}\theta\Big\} = 0.$$
Then get $\hat{\theta}$ as the solution of the design-based sample version of it,
$$\sum_{i \in s}\frac{1}{\pi_i}\Big\{\big(y_i - f(x_i, \theta)\big)\frac{\partial f(x_i, \theta)}{\partial\theta}\frac{1}{v_i} - \frac{\lambda}{N}\theta\Big\} = 0.$$
Then we can obtain fitted values $\hat{f}_i = f(x_i, \hat{\theta})$ such that $\hat{f}_i = f(x_i, \theta) + O_p(n^{-1/2})$.
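Solving the sample estimating equations amounts to minimizing the corresponding design-weighted penalized least squares criterion. A sketch using scipy.optimize.minimize, assuming $v_i = 1$, a made-up weight decay value, a penalty on all parameters, and a toy SRSWOR sample (under SRSWOR $\sum_s d_i/N = 1$, so the plain $\lambda\sum_l\theta_l^2$ penalty matches the $\lambda\theta/N$ term above); the skip-layer net is re-coded here so the snippet stands alone.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

# Hypothetical SRSWOR sample from a curved population (illustration only)
N, n, Q, M = 1000, 100, 1, 3
x_U = rng.uniform(0, 1, (N, Q))
y_U = np.sin(2 * np.pi * x_U[:, 0]) + rng.normal(0, 0.2, N)
s = rng.choice(N, n, replace=False)
x_s, y_s, d = x_U[s], y_U[s], np.full(n, N / n)

def unpack(theta):
    beta = theta[:Q]
    a0, a = theta[Q], theta[Q + 1:Q + 1 + M]
    gamma0 = theta[Q + 1 + M:Q + 1 + 2 * M]
    gamma = theta[Q + 1 + 2 * M:].reshape(Q, M)
    return beta, a0, a, gamma0, gamma

def f(x, theta):
    beta, a0, a, gamma0, gamma = unpack(theta)
    hidden = 1.0 / (1.0 + np.exp(-(x @ gamma + gamma0)))
    return x @ beta + hidden @ a + a0

lam = 1e-3                                            # weight decay (assumed value)

def objective(theta):
    # design-weighted penalized least squares with v_i = 1
    resid = y_s - f(x_s, theta)
    return np.sum(d * resid**2) + lam * np.sum(theta**2)

theta0 = 0.1 * rng.normal(size=Q + 1 + M + M + Q * M)
fit = minimize(objective, theta0, method="BFGS")
f_hat_U = f(x_U, fit.x)                               # fitted values for every unit in U
print("converged:", fit.success, "  mean fitted value:", f_hat_U.mean())
```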

Nonparametric model calibration - the calibration step

3. Build the calibration estimator as for parametric model calibration; only the fitted values in the calibration constraint are different:
$$\hat{\bar{Y}}_{NNMC} = N^{-1}\sum_{i \in s} w_i y_i$$
$$\min_{w_i}\sum_{i \in s}\frac{(w_i - d_i)^2}{d_i q_i} \quad \text{s.t.} \quad \sum_{i \in s} w_i = N, \qquad N^{-1}\sum_{i \in s} w_i\hat{f}_i = N^{-1}\sum_{i \in U}\hat{f}_i$$
Then,
$$\hat{\bar{Y}}_{NNMC} = \hat{\bar{Y}} + \frac{1}{N}\Big(\sum_{i \in U}\hat{f}_i - \sum_{i \in s} d_i\hat{f}_i\Big)\hat{\beta}_{nn},$$
where $\hat{\beta}_{nn}$ is the GREG coefficient of a regression estimator of $y$ on $\hat{f}$. (Cfr. with a modified GAM regression estimator in Opsomer et al., 2001.)

Local polynomials model calibration - the steps

1. Working model - A very general univariate model
$$E_\xi(y_i) = m(x_i), \quad i \in U, \qquad V_\xi(y_i) = v(x_i), \quad i \in U$$

2. Get fitted values by means of local polynomials,
$$\hat{m}_i = e_1'\,(X_{si}'W_{si}X_{si})^{-1}X_{si}'W_{si}\,y_s,$$
where $e_1 = (1, 0, \ldots, 0)'$ is a column vector of length $p+1$, $p$ is the order of the local polynomial fit, $y_s = (y_1, \ldots, y_n)'$, $W_{si} = \mathrm{diag}\{d_j K_h(x_j - x_i)\}_{j \in s}$, $K$ is a kernel function, $h$ is the bandwidth, and $X_{si} = [1 \;\; x_j - x_i \;\; \cdots \;\; (x_j - x_i)^p]_{j \in s}$.
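The $\hat{m}_i$ above is just a design-weighted locally weighted least squares fit evaluated at $x_i$. A sketch with a Gaussian kernel (the kernel, bandwidth, polynomial order and data below are illustrative choices, not those of the simulation study):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical SRSWOR sample (illustration only)
N, n = 1000, 100
x_U = rng.uniform(0, 1, N)
y_U = np.sin(2 * np.pi * x_U) + rng.normal(0, 0.2, N)
s = rng.choice(N, n, replace=False)
x_s, y_s, d = x_U[s], y_U[s], np.full(n, N / n)

def local_poly_fit(x0, x_s, y_s, d, h=0.25, p=1):
    """Design-weighted local polynomial estimate of m(x0):
    m_hat(x0) = e1' (Xs' Ws Xs)^{-1} Xs' Ws y_s,
    with Ws = diag{d_j K_h(x_j - x0)} and K a Gaussian kernel."""
    u = (x_s - x0) / h
    k = np.exp(-0.5 * u**2) / (h * np.sqrt(2 * np.pi))   # K_h(x_j - x0)
    W = d * k
    Xs = np.vander(x_s - x0, p + 1, increasing=True)     # columns 1, (x_j - x0), ..., (x_j - x0)^p
    coef = np.linalg.solve(Xs.T @ (W[:, None] * Xs), Xs.T @ (W * y_s))
    return coef[0]                                       # e1' picks out the intercept

m_hat_U = np.array([local_poly_fit(x0, x_s, y_s, d) for x0 in x_U])
print("fitted values computed for all N units; mean:", m_hat_U.mean())
```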

Local polynomials model calibration - the steps

3. Build the model calibration estimator as previously:
$$\hat{\bar{Y}}_{LPMC} = N^{-1}\sum_{i \in s} w_i y_i$$
$$\min_{w_i}\sum_{i \in s}\frac{(w_i - d_i)^2}{d_i q_i} \quad \text{s.t.} \quad \sum_{i \in s} w_i = N, \qquad N^{-1}\sum_{i \in s} w_i\hat{m}_i = N^{-1}\sum_{i \in U}\hat{m}_i$$
Then,
$$\hat{\bar{Y}}_{LPMC} = \hat{\bar{Y}} + \frac{1}{N}\Big(\sum_{i \in U}\hat{m}_i - \sum_{i \in s} d_i\hat{m}_i\Big)\hat{\beta}_{lp},$$
where $\hat{\beta}_{lp}$ is the GREG coefficient of a regression estimator of $y$ on $\hat{m}$.

Which is the difference between LPMC and LPREG?

The Local polynomial regression estimator (Breidt & Opsomer, 2000) is given by
$$\hat{\bar{Y}}_{LPREG} = \hat{\bar{Y}} + \frac{1}{N}\Big(\sum_{i \in U}\hat{m}_i - \sum_{i \in s} d_i\hat{m}_i\Big).$$

The only difference lies in the supplementary regression step provided by $\hat{\beta}_{lp}$. LPMC can also be thought of as a GREG estimator with a working model of the type $E_\xi(y_i) = a + b\, m(x_i)$.

If the nonparametric technique provides biased estimates of the mean function, or the working model is not valid, then this step will lead to more efficient estimates by correcting this bias.
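In code the distinction is a one-line change: LPREG adds the gap between the population total and the weighted sample total of the fitted values as is, while LPMC first rescales that gap by the design-weighted slope of $y$ on $\hat{m}$. A sketch with $q_i = 1$ and a crude global polynomial fit standing in for the local polynomial smoother:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical setting: m_hat_U holds smoother fitted values for all units,
# obtained from the sample s (here a cubic polyfit is a crude stand-in)
N, n = 1000, 100
x_U = rng.uniform(0, 1, N)
y_U = np.sin(2 * np.pi * x_U) + rng.normal(0, 0.2, N)
s = rng.choice(N, n, replace=False)
d = np.full(n, N / n)
m_hat_U = np.poly1d(np.polyfit(x_U[s], y_U[s], 3))(x_U)

Y_HT = (d * y_U[s]).sum() / N
gap = (m_hat_U.sum() - (d * m_hat_U[s]).sum()) / N        # (1/N)(sum_U m_hat - sum_s d_i m_hat)

# LPREG adds the gap as is; LPMC rescales it by the design-weighted slope of y on m_hat
m_s, y_s = m_hat_U[s], y_U[s]
mbar, ybar = (d * m_s).sum() / d.sum(), (d * y_s).sum() / d.sum()
beta_lp = (d * (m_s - mbar) * (y_s - ybar)).sum() / (d * (m_s - mbar) ** 2).sum()

print("LPREG:", Y_HT + gap, "  LPMC:", Y_HT + gap * beta_lp, "  true mean:", y_U.mean())
```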

Simulation Study - the MAHA region

The Mid-Atlantic Highlands study region includes the area from the Blue Ridge Mountains in the east to the Ohio River in the west, and from the Catskill Mountains in the north to Virginia in the south.

Simulation Study - data

Mid-Atlantic Highlands Area (MAHA), EPA Region 3: PA, VA, WV, DE, MD.
Data collected by EPA's EMAP. Streams sampled from 1993 through 1996.
Total Nitrogen (NTL) and Total Phosphorus (PTL) sampled at 574 sites.
Proportion of watershed devoted to Agriculture (AG) available from remote sensing.

Scatterplots of the two survey variables (NTL and PTL) with respect to AG.

Simulation Study - setup

$N = 574$; auxiliary variable: AG; survey variables: NTL and PTL.
A thousand simple random samples without replacement of size $n = 100$ have been drawn from $U$.

For each sample we computed the following estimators:
- HT, the sample mean;
- CAL, the regression estimator;
- NNMC, with four combinations of the complexity parameters $M$ and $\lambda$;
- LPMC and LPREG, with a local constant and a local linear fit, with bandwidths $h = 0.10$ and $h = 0.5$.

Performance is evaluated by means of $\mathrm{Smse}(\cdot) = \mathrm{Mse}(\cdot)/\mathrm{Mse}(\mathrm{CAL})$.
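A sketch of this Monte Carlo design for the two simplest estimators (HT and the regression estimator CAL), with a synthetic population standing in for the MAHA data; the numbers in the table that follows come from the actual study, not from this toy code.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical population standing in for the MAHA data (illustration only)
N, n, R = 574, 100, 1000
x_U = rng.uniform(0, 1, N)
y_U = 50 + 400 * x_U**2 + rng.normal(0, 40, N)
Ybar, xbar = y_U.mean(), x_U.mean()

err_HT, err_CAL = [], []
for _ in range(R):                                    # R simple random samples without replacement
    s = rng.choice(N, n, replace=False)
    x_s, y_s = x_U[s], y_U[s]
    d = np.full(n, N / n)
    Y_HT = (d * y_s).sum() / N
    x_HT = (d * x_s).sum() / N
    # regression (calibration) estimator with quadratic distance and q_i = 1
    Xs = np.column_stack([np.ones(n), x_s])
    beta = np.linalg.solve(Xs.T @ (d[:, None] * Xs), Xs.T @ (d * y_s))
    Y_CAL = Y_HT + (xbar - x_HT) * beta[1]
    err_HT.append(Y_HT - Ybar)
    err_CAL.append(Y_CAL - Ybar)

mse = lambda e: np.mean(np.square(e))
print("Smse(HT) relative to CAL:", mse(err_HT) / mse(err_CAL))
```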

Simulation Study - results (Smse, relative to CAL)

Estimator                     NTL      PTL
HT                            1.44     1.07
CAL                           1.000    1.000
NNMC   M=    λ=5e-4           0.810    1.00
       M=4   λ=5e-4           0.81     1.010
       M=6   λ=1e-            0.809    1.010
       M=8   λ=1e-            0.808    1.00
LPMC   p=0   h=0.1            0.807    1.051
       p=0   h=0.5            0.809    1.00
       p=1   h=0.1            0.4      1.44
       p=1   h=0.5            0.814    1.008
LPREG  p=0   h=0.1            0.81     1.1
       p=0   h=0.5            0.845    1.00
       p=1   h=0.1            1.5      1.465
       p=1   h=0.5            0.815    1.008

Conclusions

The use of nonparametric methods to obtain fitted values for the non-sampled units provides more efficient estimates by allowing for more complex modelling than with parametric calibration.

Neural Networks have shown efficient and robust behaviour in both univariate and (multivariate, not shown here) settings. Such behaviour seems to be less influenced by the choice of the complexity parameters than that of other nonparametric methods.

With respect to nonparametric regression estimation, the supplementary calibration step performed by model calibration provides more efficient estimators when the fitted values underfit the data (low degree polynomial and/or large bandwidth).

Ongoing research and perspectives

The performance of NNMC is being explored with multiple auxiliary information and compared to other nonparametric techniques (GAM, MARS, DART).

Model calibration weights depend on the survey variable, so a single set of weights is not available for more than one survey variable. We are also studying the effects of multiple simultaneous calibration (idea in Opsomer et al., 2001).

Variance estimation is an issue when nonparametric methods are employed: it is usually largely underestimated.

Essential bibliography

Breidt F.J. and Opsomer J.D. (2000), Local polynomial regression estimators in survey sampling, The Annals of Statistics, 28, 1026-1053.

Deville J.C. and Särndal C.E. (1992), Calibration estimators in survey sampling, JASA, 87, 376-382.

Opsomer J.D., Moisen G.G. and Kim J.Y. (2001), Model-assisted estimation of forest resources with generalized additive models, ASA Proceedings of the Section on Survey Research Methods.

Wu C. and Sitter R.R. (2001), A model-calibration approach to using complete auxiliary information from survey data, JASA, 96, 185-193.

The work reported here was in part developed under the STAR Research Assistance Agreement CR-829095 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by EPA. The views expressed here are solely those of the presenter and of STARMAP. EPA does not endorse any products or commercial services mentioned in this presentation.