Sampling Coordination of Business Surveys Conducted by Insee

Fabien Guggemos¹, Olivier Sautory¹

¹ Insee, Business Statistics Directorate, 18 boulevard Adolphe Pinard, 75675 Paris cedex 14, France

Abstract

The method presently used at Insee for negative coordination is based on random numbers drawn from the uniform distribution on [0,1]. In a given stratum, the units with the smallest numbers are selected. After the draw, a rotation is applied to the random numbers. As a result, the units that have just been selected have a lower probability of being selected in the next drawing, while the good properties of the classical statistical estimators are preserved. Other schemes are used to achieve positive coordination.

A new method of coordination, which generalizes the current technique, is now being tested and is presented in this paper. The random number is replaced by a "coordination function", which transforms the random numbers and has the property of preserving the uniform probability. This function changes with each selection, depending on the desired type of coordination: negative coordination for separate samples, or for updating panels. The method takes into account the cumulative response burden over several samples. It can be used with Poisson sampling and with stratified simple random sampling.

Key words: sample coordination, random numbers, response burden, stratified samples

1. Introduction

Each year, the public statistical system carries out a significant number of surveys of businesses and establishments. Businesses are also called units in the following. The samples for business surveys are most often drawn according to stratified simple random sampling designs. The population of firms in the scope of the survey is divided into strata built from known characteristics of businesses, usually available in the administrative business register (activity, size, geographical location, ...). The objective of the negative coordination of samples is to favor, when selecting a sample, businesses that have not already been selected in recent surveys, while preserving the unbiasedness of the samples.
This coordination helps to reduce the statistical burden on small businesses; large businesses, above a certain size threshold, are systematically surveyed in most surveys. The method currently used at Insee can provide samples of businesses (or samples of establishments) that are negatively coordinated in pairs. A more general method is currently being tested by the methodological unit for business statistics at Insee. It takes into account the cumulative burden of businesses and the overall coordination of a set of surveys. This method was proposed by C. Hesse and studied by Pascal Ardilly (see [1] and [2]).

The first part of this paper presents this new methodology for drawing negatively coordinated samples. The second part presents the first results of simulations carried out to test the method.

2. Coordination function: selection of the samples

As in the current method, the new methodology of negative coordination is based on random numbers assigned to the units. The current method relies on permutations performed on the random numbers. In the new method, by contrast, these numbers are assigned once and for all to all units, and transformations of these numbers are applied to obtain the desired coordination. These transformations are carried out by functions with special properties, called coordination functions. The concept of coordination function plays an essential role in the method.

2.1 Definition of a coordination function

A coordination function $g$ is a measurable function from $[0,1]$ onto itself which preserves the uniform probability: if $P$ is the uniform probability on $[0,1]$, then the image probability $P^g = P \circ g^{-1}$ is equal to $P$. This means that for any interval $I = [a, b[$ included in $[0,1]$:

$$P\big[g^{-1}(I)\big] = P^g(I) = P(I) = b - a.$$

The length of the inverse image of any interval under $g$ equals the length of this interval. In other words, a coordination function preserves, by inverse image, the length of intervals and of unions of intervals.

2.2 Selection of the samples

We consider a sequence of surveys $t = 1, 2, \dots$, where $t$ refers to the date and the number of the survey. We denote by $S_t$ the sample corresponding to survey $t$. Each unit $k$ of the population is given a permanent random number $\omega_k$, drawn from the uniform distribution on the interval $[0,1[$. The draws of the $\omega_k$ are mutually independent. For each unit we define a coordination function which changes at each survey: $g_{k,t}$ is the coordination function associated with unit $k$ for survey $t$. The selection of sample $S_t$ depends on the values of the transformed random numbers $g_{k,t}(\omega_k)$. We study two sampling methods frequently used to select samples for business surveys: Poisson sampling and stratified simple random sampling.

2.2.1 Poisson sampling

We present this sampling method because it is very easy to implement, although it is not used at Insee.
The principle is the following. Each unit $k$ of the sampling frame is given a first-order inclusion probability $\pi_k$. Each unit $k$ is then drawn with probability $\pi_k$, and the draws are mutually independent (as a consequence, the sample size is random). Here, to select sample $S_t$, we select the units $k$ such that $g_{k,t}(\omega_k) \in [0, \pi_{k,t}[$, where $\pi_{k,t}$ denotes the inclusion probability of unit $k$ for survey $t$. By definition of a coordination function, we then get:

$$P(k \in S_t) = P\big(g_{k,t}(\omega_k) \in [0, \pi_{k,t}[\big) = P^{g_{k,t}}\big([0, \pi_{k,t}[\big) = P\big([0, \pi_{k,t}[\big) = \pi_{k,t}.$$

The inclusion probabilities are therefore respected. The draws of the units are independent, since the $\omega_k$ are independent.
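Sections 2.1 and 2.2.1 can be checked numerically. The sketch below uses a rotation $g(\omega) = (\omega + c) \bmod 1$ as a stand-in coordination function. This is an illustrative assumption, chosen only because a rotation obviously preserves interval lengths; it is not the burden-based function built in Section 3. The sketch transforms permanent random numbers and applies the Poisson rule $g_{k,t}(\omega_k) \in [0, \pi_{k,t}[$. If $g$ preserves the uniform probability, the empirical inclusion rate should be close to $\pi$. The names `rotation` and `poisson_select` and all parameter values are hypothetical.

```python
import numpy as np


def rotation(omega: np.ndarray, shift: float) -> np.ndarray:
    """Hypothetical coordination function: rotation of [0, 1[ by `shift`.

    The inverse image of an interval [a, b[ is one or two intervals of total
    length b - a, so the uniform probability is preserved (Section 2.1).
    """
    return (omega + shift) % 1.0


def poisson_select(transformed: np.ndarray, pi: np.ndarray) -> np.ndarray:
    """Poisson sampling: unit k is selected iff g_{k,t}(omega_k) is in [0, pi_{k,t}[."""
    return transformed < pi


rng = np.random.default_rng(seed=1)
n_units = 200_000
omega = rng.uniform(0.0, 1.0, size=n_units)  # permanent random numbers omega_k
pi = np.full(n_units, 0.15)                  # inclusion probabilities pi_{k,t}

g = rotation(omega, shift=0.37)              # transformed numbers g_{k,t}(omega_k)
selected = poisson_select(g, pi)

print(f"inclusion rate: {selected.mean():.4f} (target 0.15)")
print(f"share of g in [0.5, 0.8[: {np.mean((g >= 0.5) & (g < 0.8)):.4f} (target 0.30)")
```

Any other measure-preserving map could be substituted for `rotation`. The selection rule and the inclusion-probability check are unchanged.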

2.2.2 Stratified simple random sampling

The principle is the following: we divide the sampling frame into strata, and in each stratum $h$ we select $n_h$ units by simple random sampling. Here, at date $t$, the sampling frame is divided into strata $(h,t)$. Within each stratum $(h,t)$, of size $N_{(h,t)}$, we select the $n_{(h,t)}$ units associated with the $n_{(h,t)}$ smallest numbers $g_{k,t}(\omega_k)$, $k = 1, \dots, N_{(h,t)}$. We omit the indexes $h$ and $t$ from now on: $N$ is the size of the stratum and $n$ is the sample size in the stratum. Then:

$$k \in S_t \iff g_{k,t}(\omega_k) \in E_n\big(g_{j,t}(\omega_j),\ j = 1, \dots, N\big),$$

where $E_n\big(g_{j,t}(\omega_j),\ j = 1, \dots, N\big)$ is the set of the $n$ smallest values $g_{j,t}(\omega_j)$.

The $N$ random numbers $\omega_k$ associated with the $N$ units of the stratum have been drawn independently from the uniform probability on $[0,1]$, denoted $P$. Since $P^{g_{k,t}} = P$ for each $k$, the $N$ numbers $g_{k,t}(\omega_k)$ are also independently distributed according to $P$. By a well-known result, the units with the $n$ smallest values $g_{k,t}(\omega_k)$ therefore form a simple random sample of size $n$ in the stratum.

3. A step-by-step procedure

3.1 Cumulative response burden and coordination function

The general idea of negative coordination is to give priority, for a given sample selection, to the units that have had the lowest response burden during the recent period. Let $\omega = (\omega_k)$ denote the vector of random numbers given to the population units. Let $I_{k,t}(\omega)$ be the indicator function of the inclusion of unit $k$ in sample $S_t$. It equals 1 if the values in $\omega$ lead to the selection of unit $k$, and 0 otherwise:

$$k \in S_t \iff I_{k,t}(\omega) = 1.$$

The inclusion of $k$ in sample $S_t$ depends only on the vector $\omega$: $I_{k,t}$ is a random variable that depends on $\omega$.

Let $\gamma_{k,t}$ be the response burden of a surveyed business $k$ at survey $t$. We will often assume that it has the same value for all the units of a given survey. The effective burden is the random variable $\gamma_{k,t}\, I_{k,t}(\omega)$. The cumulative burden of unit $k$ is a function of $\omega$, equal to:

$$\Gamma_{k,t}(\omega) = \sum_{u=1}^{t} \gamma_{k,u}\, I_{k,u}(\omega). \qquad (1)$$

To meet the objective of negative coordination when selecting sample $S_t$, we wish to define, for each unit $k$, a coordination function $g_{k,t}$ based on $\Gamma_{k,t-1}$, the cumulative burden of unit $k$ up to survey $t-1$.
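Under the selection scheme, the smaller the number $g_{k,t}(\omega_k)$, the higher the probability that the unit is selected. A desirable property for any coordination function is therefore the following:

$$\Gamma_{k,t-1}\big(\omega^{(1)}\big) < \Gamma_{k,t-1}\big(\omega^{(2)}\big) \implies g_{k,t}\big(\omega_k^{(1)}\big) \le g_{k,t}\big(\omega_k^{(2)}\big),$$

where $\omega_k^{(i)}$ ($i = 1, 2$) denotes the $k$-th component of vector $\omega^{(i)}$.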
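The stratum-level rule of Section 2.2.2 (keep the $n$ smallest transformed numbers) and formula (1) can be illustrated together. In this sketch, every survey draws fresh random numbers and uses the identity as coordination function, so the surveys are independent rather than coordinated. This is the kind of independent baseline that coordination is compared with in Section 6. The sketch checks that each unit's empirical inclusion frequency is close to $n/N$, as the simple-random-sampling argument predicts. The single stratum, the constant burden $\gamma = 1$ and the function name `srs_by_smallest` are assumptions of the sketch.

```python
import numpy as np


def srs_by_smallest(transformed: np.ndarray, n: int) -> np.ndarray:
    """Inclusion indicators: the n units with the smallest g_{k,t}(omega_k)."""
    included = np.zeros(transformed.size, dtype=bool)
    included[np.argsort(transformed)[:n]] = True
    return included


rng = np.random.default_rng(seed=2)
N, n, T = 50, 10, 20_000
gamma = 1.0               # assumed constant response burden gamma_{k,t}
burden = np.zeros(N)      # cumulative burden Gamma_{k,T}, formula (1)

for t in range(T):
    omega = rng.uniform(size=N)           # fresh numbers: independent surveys
    included = srs_by_smallest(omega, n)  # identity coordination function
    burden += gamma * included            # Gamma_{k,t} = Gamma_{k,t-1} + gamma * I_{k,t}

freq = burden / T
print("average inclusion frequency:", freq.mean(), " n/N =", n / N)
print("largest deviation from n/N over units:", np.abs(freq - n / N).max())
```

Because each survey selects exactly $n$ units, the total burden is always $\gamma n T$. Only its distribution across units changes, and that distribution is what negative coordination aims to even out.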

This condition is not easy to handle, because the cumulative burden $\Gamma_{k,t-1}$ is a function of the whole vector $\omega$. It depends not only on the random number $\omega_k$ given to unit $k$, but also on all the other random numbers $\omega_1, \dots, \omega_N$. We will see later how this function can be replaced by a function $\Gamma'_{k,t-1}(\omega_k)$ which depends only on $\omega_k$. The desirable property for any coordination function $g_{k,t}$ then becomes:

$$\Gamma'_{k,t-1}\big(\omega_k^{(1)}\big) < \Gamma'_{k,t-1}\big(\omega_k^{(2)}\big) \implies g_{k,t}\big(\omega_k^{(1)}\big) \le g_{k,t}\big(\omega_k^{(2)}\big).$$

3.2 Construction of a coordination function from a given criterion

Let us consider the problem in a more general way. We suppose that we have a criterion $C_{k,t}(\omega_k)$ such that the smaller the criterion, the larger the probability of selecting unit $k$ in sample $S_t$. This criterion could be a burden, but not necessarily: it can be seen as a simple sorting criterion. For the sake of simplicity, we omit the subscripts $k$ and $t$, so $\omega$ is now a simple real number between 0 and 1. $C$ is supposed to be a bounded measurable function: $\omega \in [0,1] \mapsto C(\omega) \in \mathbb{R}$. We wish to associate with this criterion a coordination function $g$ such that:

$$C\big(\omega^{(1)}\big) < C\big(\omega^{(2)}\big) \implies g\big(\omega^{(1)}\big) \le g\big(\omega^{(2)}\big). \qquad (2)$$

Let $P_C$ be the image probability of the uniform probability $P$ under $C$. Let $F_C$ be the cumulative distribution function of $C$, taken here as $F_C(x) = P_C(]-\infty, x[)$. We define the function $G_C = F_C(C)$. Given the definitions of $P_C$ and $F_C$, we can write:

$$G_C(\omega) = P_C\big(]-\infty, C(\omega)[\big) = P\big(C^{-1}(]-\infty, C(\omega)[)\big) = P\big(\{u \mid C(u) < C(\omega)\}\big).$$

Before looking at the properties of the function $G_C$, let us define the notion of level.

Definition of a level

We call level of criterion $C$ any measurable set $A$ included in $[0,1]$, with $P(A) > 0$, such that there exists a real number $x$ satisfying $C^{-1}(\{x\}) = A$. In other words, $C$ has levels when horizontal line segments form part of the graph of $C$.

Properties of $G_C$

It can be proved that:
- the range of the function $G_C$ is included in $[0,1]$;
- $G_C$ has the same levels as $C$ (up to a set of null probability);
- $G_C$ satisfies implication (2);
- for every $y$ in the range of $G_C$, $F_{G_C}(y) = P(\{u \mid G_C(u) < y\}) = y$, where $F_{G_C}$ is the cumulative distribution function of $G_C$;
- if $C$ has no level, the range of $G_C$ is exactly $[0,1]$ (possibly minus a set of null probability), and $G_C$ is a coordination function;
- if $C$ has at least one level, the range of the function $G_C$ is strictly included in $[0,1]$.
When $C$ has at least one level, we therefore have to modify $G_C$ to obtain a coordination function, denoted by $g_C$, whose range is equal to $[0,1]$. This can be done in the following way. We set $g_C(\omega) = G_C(\omega)$ for any $\omega$ which does not belong to a level.
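For a criterion without levels, $G_C = F_C(C)$ is already a coordination function. The sketch below checks this numerically for the hypothetical criterion $C(\omega) = (\omega - 0.5)^2$, approximating $P$ by a fine grid of $M$ points (an assumption of the sketch). For this $C$, $G_C(\omega) = P(|u - 0.5| < |\omega - 0.5|) = 2|\omega - 0.5|$. This function is not monotone in $\omega$, yet it maps the uniform distribution onto itself.

```python
import numpy as np

M = 20_000
u = (np.arange(M) + 0.5) / M     # grid midpoints standing in for the uniform P
C = (u - 0.5) ** 2               # hypothetical criterion without any level

# G_C(omega) = P({u : C(u) < C(omega)}): fraction of grid points strictly below
G = np.searchsorted(np.sort(C), C, side="left") / M

exact = 2.0 * np.abs(u - 0.5)    # closed form of G_C for this particular C
print("max |G - exact|:", np.abs(G - exact).max())

# G_C should preserve the uniform probability: P({u : G_C(u) < y}) = y
for y in (0.1, 0.5, 0.9):
    print(f"P(G_C < {y}) ~ {np.mean(G < y):.4f}")
```

The grid error is of order $1/M$. It comes from counting grid points strictly below each value, which mirrors the strict inequality in the definition of $G_C$.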

Let $A$ be a level of $C$ (and of $G_C$): if $\omega \in A$, then $G_C(\omega) = y$, a constant. Let $B$ denote the largest interval $[y, \cdot[$ such that $G_C^{-1}(B) = A$. On the graph of $G_C$, $B$ is the interval on the ordinate axis which marks the gap between level $A$ and the first level above $A$. One can show that $P(B) = P(A)$, i.e. the segments $A$ and $B$ have the same length. However, the probability $P^{G_C}$ restricted to $B$ is concentrated at the point $y$. To obtain a uniform distribution over $B$, we use the linear function with slope 1 which transforms $A$ into $B$: if $A = [a, b[$, then for any $\omega \in A$ we define $g_C(\omega) = \omega - a + y$. This can be written:

$$g_C(\omega) = G_C(\omega) + \int_{[0,\omega]} \mathbb{1}_A(u)\, du, \qquad \omega \in A.$$

More generally, if $C$ has several levels $A$, we define:

$$g_C(\omega) = G_C(\omega) + \sum_{A} \mathbb{1}_A(\omega) \int_{[0,\omega]} \mathbb{1}_A(u)\, du.$$

An example of the shape of these three functions is shown in the next figure.

4. Application to Poisson sampling

With this sampling method, we select a unit $k$ in sample $S_t$ if $g_{k,t}(\omega_k) \in [0, \pi_{k,t}[$, where $\pi_{k,t}$ is the inclusion probability of unit $k$. Consequently, the indicator function $I_{k,t}$ defined in 3.1 is a function of $\omega_k$ only:

$$I_{k,t}(\omega) = 1 \iff \omega_k \in g_{k,t}^{-1}\big([0, \pi_{k,t}[\big).$$

Selection of sample $S_1$

Initialization: we set $\Gamma_{k,0}(\omega_k) = 0$ for all $\omega_k \in [0,1]$.

There is no coordination at this step: $k \in S_1 \iff \omega_k \in [0, \pi_{k,1}[$. This amounts to using the identity function on $[0,1]$ as a coordination function: $g_{k,1}(\omega_k) = \omega_k$ for all $\omega_k \in [0,1]$. The indicator function is $I_{k,1} = \mathbb{1}_{[0, \pi_{k,1}[}(\omega_k)$; it depends only on the $k$-th component of the vector $\omega$. It is a step function, equal to 1 on the interval $[0, \pi_{k,1}[$ and to 0 on the interval $[\pi_{k,1}, 1]$.

The "cumulative" burden function is defined by $\Gamma_{k,1}(\omega_k) = \gamma_{k,1}\, \mathbb{1}_{[0, \pi_{k,1}[}(\omega_k)$, where $\gamma_{k,1}$ denotes the response burden of unit $k$ at the first survey. The cumulative burden is therefore a function of the $k$-th component of the vector $\omega$. It is a step function as well, equal to $\gamma_{k,1}$ on the interval $[0, \pi_{k,1}[$ and to 0 on the interval $[\pi_{k,1}, 1]$.

Selection of sample $S_2$

For a given $k$, we use this cumulative burden function $\Gamma_{k,1}$ as a "criterion" (as defined in 3.2) to build a coordination function $g_{k,2}$ for the selection of the second sample $S_2$. This criterion has two levels, so the resulting coordination function is very simple. Note that several coordination functions satisfy condition (2) of 3.2. The figure opposite shows the shape of a coordination function corresponding to $\pi_{k,1} = c = 0.3$. With the construction of 3.2, it is $g_{k,2}(\omega) = \omega + 0.7$ on $[0, 0.3[$ and $g_{k,2}(\omega) = \omega - 0.3$ on $[0.3, 1]$. Each of the two line segments, with slope 1, could also be replaced by a line segment with slope $-1$ defined on the same interval.

The selection of sample $S_2$ is performed as follows: $k \in S_2 \iff g_{k,2}(\omega_k) \in [0, \pi_{k,2}[$. If we set $A_{k,2} = g_{k,2}^{-1}([0, \pi_{k,2}[)$, the indicator function is $I_{k,2} = \mathbb{1}_{A_{k,2}}$. The cumulative burden is then given by $\Gamma_{k,2}(\omega_k) = \Gamma_{k,1}(\omega_k) + \gamma_{k,2}\, \mathbb{1}_{A_{k,2}}(\omega_k)$, where $\gamma_{k,2}$ denotes the response burden of unit $k$ at the second survey. The indicator function and the cumulative burden are functions of the $k$-th component of the vector $\omega$.

Selection of sample $S_t$

More generally, to select sample $S_t$, we build coordination functions $g_{k,t}$ from the cumulative burden functions $\Gamma_{k,t-1}$. We then obtain:

$$k \in S_t \iff g_{k,t}(\omega_k) \in [0, \pi_{k,t}[, \qquad I_{k,t} = \mathbb{1}_{A_{k,t}} \text{ with } A_{k,t} = g_{k,t}^{-1}\big([0, \pi_{k,t}[\big),$$

$$\Gamma_{k,t}(\omega_k) = \Gamma_{k,t-1}(\omega_k) + \gamma_{k,t}\, \mathbb{1}_{A_{k,t}}(\omega_k).$$

At each step, all the indicator functions and cumulative burdens are step functions, with one more level at each step. We therefore have to build coordination functions from a criterion which has only levels. These coordination functions are piecewise linear functions, with slope 1 (or $-1$).

5. Application to stratified simple random sampling

With this sampling method, we select a unit $k$ in sample $S_t$ if the random number $g_{k,t}(\omega_k)$ is one of the $n$ lowest numbers $g_{j,t}(\omega_j)$ associated with all the units of the sampling frame¹. The inclusion of $k$ in $S_t$ then depends on the random numbers $\omega_j$ of all the units. The indicator function $I_{k,t}$ and the cumulative burden $\Gamma_{k,t}$ are both functions of the vector $\omega$. We therefore need to replace the indicator function $I_{k,t}$ with an approximate indicator function $I'_{k,t}$, which should be a function of $\omega_k$ close to $I_{k,t}$.

5.1 The expected indicator function

Let $\Omega = (\Omega_1, \Omega_2, \dots, \Omega_N)$ denote the random vector of which $\omega = (\omega_1, \omega_2, \dots, \omega_N)$, the $N$ random numbers associated with the units of the sampling frame, is an outcome. The best approximation of the indicator function depending only on $\omega_k$, in the $L^2$-norm sense, is its conditional expectation given $\Omega_k$:

$$I'_{k,t}(\omega_k) = E\big[I_{k,t}(\Omega) \mid \Omega_k = \omega_k\big] = P\big(k \in S_t \mid \Omega_k = \omega_k\big).$$

If we suppose that the coordination functions are bijective² functions, we can write:

$$I'_{k,t}(\omega_k) = P\big(k \in S_t \mid \Omega_k = \omega_k\big) = P\big(k \in S_t \mid g_{k,t}(\Omega_k) = g_{k,t}(\omega_k)\big) = b_{k,t}\big(g_{k,t}(\omega_k)\big),$$

where $b_{k,t}(x) = P\big(k \in S_t \mid g_{k,t}(\Omega_k) = x\big)$. The function $b_{k,t}(x) = b(x)$ is the probability that, among the $N-1$ random numbers $g_{j,t}(\omega_j)$ ($j \neq k$), at most $n-1$ are lower than $x$. Using a well-known result of probability theory, it can be shown that:

$$b(x) = 1 - \frac{1}{B(p,q)} \int_{0}^{x} u^{p-1} (1-u)^{q-1}\, du, \quad \text{with } p = n,\ q = N - n,\ \text{and } B(p,q) = \frac{(p-1)!\,(q-1)!}{(p+q-1)!}.$$

The graph hereafter shows the shape of the function $b(x)$ for some values of $n$ and $N$. A function $b(x)$ has the following shape:
- a first part, "almost horizontal" and close to 1, corresponding to an "almost certain" selection of the unit in the sample;
- a third part, "almost horizontal" and close to 0, corresponding to an "almost certain" non-selection of the unit;
- between them, a decreasing part "with a steep (negative) slope" over a more or less short interval of the abscissa axis.

This interval is nearly centered on the value $n/N$³, equal to the sampling rate. Around this value there is uncertainty about the selection of the unit in the sample.

¹ We recall that we omit the stratum index.
² This property is satisfied with the method described here, but it is not an intrinsic property of a coordination function.

5.2 The expected cumulative burden function

Since an expected indicator function is substituted for the indicator function, the cumulative burden function itself is replaced, in formula (1) of 3.1, by an expected cumulative burden function $\Gamma'_{k,t}$, given $\omega_k$:

$$\Gamma'_{k,t}(\omega_k) = \sum_{u=1}^{t} \gamma_{k,u}\, I'_{k,u}(\omega_k).$$

For the algorithm to perform well, that is, to lead to unbiased samples, it is necessary to use this expected burden instead of the actual burden. The latter is based on the observed inclusions of unit $k$ in the successive samples:

$$\Gamma_{k,t} = \sum_{u=1}^{t} \gamma_{k,u}\, \mathbb{1}(k \in S_u).$$

Note that, in the Poisson sampling case, the algorithm uses the actual burden.

5.3 Approximation by step functions

The expected indicator functions $I'$ and the expected cumulative burden functions are neither step functions nor functions that can easily be computed. We use two kinds of simplification. We divide the interval $[0,1]$ into $L$ equal subintervals $\big[\frac{l-1}{L}; \frac{l}{L}\big[$, $l = 1, \dots, L$⁴, where $L$ is a "large enough" integer (at least greater than 50).
1. We replace the approximate indicator function $b_{k,t}$ by a piecewise linear function $\tilde b_{k,t}$, which takes the same values as $b_{k,t}$ at the endpoints of the subintervals.
2. We then define a step function $\beta_{k,t}$, constant on each subinterval and equal to the average value of $\tilde b_{k,t}$ on that subinterval. This gives an approximation of the approximate indicator function.

5.4 Construction of a coordination function

The stepwise algorithm, described below, is similar to the one presented for Poisson sampling. It uses the expected cumulative burden functions $\Gamma'_{k,t}$ defined in 5.2, in which the indicator functions are replaced by the approximate indicator functions $\beta_{k,t}$. Like the $\beta_{k,t}$, the cumulative burden functions are then step functions, constant on each of the subintervals. Using the method described in 3.2, we derive from each cumulative burden function:
- the function $G_C$, which is constant on each subinterval;
- the function $g_C$, which is linear (with slope 1) on each subinterval.
³ The abscissa of the knee point of the curve is equal to $(n-1)/(N-2)$.
⁴ If $l = L$, the interval is $\big[\frac{L-1}{L}; 1\big]$.
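The expected indicator of Section 5.1 can be computed without special functions. The probability that at most $n-1$ of $N-1$ independent uniform numbers lie below $x$ is a binomial distribution function, $b(x) = \sum_{j=0}^{n-1} \binom{N-1}{j} x^j (1-x)^{N-1-j}$, which equals the incomplete-beta expression of 5.1. The sketch then builds the step approximation of 5.3: since $\tilde b$ is linear on each subinterval, its average there is the mean of its two endpoint values. As a check, the average of $\beta$ over $[0,1]$ should be close to $n/N$, because $\int_0^1 b(x)\,dx = P(k \in S_t) = n/N$. The function names and the values $n = 10$, $N = 50$, $L = 100$ are illustrative.

```python
from math import comb


def b(x: float, n: int, N: int) -> float:
    """Expected indicator b(x) of Section 5.1: P(at most n-1 of N-1 uniforms < x)."""
    return sum(comb(N - 1, j) * x**j * (1.0 - x) ** (N - 1 - j) for j in range(n))


def beta_steps(n: int, N: int, L: int) -> list[float]:
    """Step function beta of Section 5.3, one value per subinterval [(l-1)/L, l/L[.

    b~ interpolates b linearly between the endpoints l/L, so its average on a
    subinterval is the mean of the two endpoint values.
    """
    ends = [b(l / L, n, N) for l in range(L + 1)]
    return [(ends[l] + ends[l + 1]) / 2.0 for l in range(L)]


n, N, L = 10, 50, 100
steps = beta_steps(n, N, L)
print("b(0) =", b(0.0, n, N), " b(1) =", b(1.0, n, N))
print("average of beta:", round(sum(steps) / L, 5), " n/N =", n / N)
print("first, middle and last steps:", [round(steps[i], 4) for i in (0, L // 2, L - 1)])
```

The binomial form avoids numerical integration. For large $N$, a library implementation of the regularized incomplete beta function would be a natural substitute.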

5.5 Selection of the samples

Selection of sample $S_1$

Initialization: we set $\Gamma'_{k,0}(\omega_k) = 0$ for all $\omega_k \in [0,1]$.

There is no coordination at this step: $k \in S_1 \iff \omega_k \in E_n(\omega_j,\ j = 1, \dots, N)$⁵. This amounts to using the identity function on $[0,1]$ as a coordination function: $g_{k,1}(\omega_k) = \omega_k$ for all $\omega_k \in [0,1]$. The actual burden is $\Gamma_{k,1} = \gamma_{k,1}\, \mathbb{1}(k \in S_1)$, where $\gamma_{k,1}$ denotes the response burden of unit $k$ at survey 1. The expected cumulative burden function, however, is a function of the $k$-th component of the vector $\omega$ only. It uses the expected indicator function $\beta_{k,1}$: $\Gamma'_{k,1}(\omega_k) = \gamma_{k,1}\, \beta_{k,1}(\omega_k)$. Given the shape of the function $\beta_{k,1}$ described above, the actual burden function and the expected burden function are nearly the same on the interval $[0,1]$, except in a neighborhood of the value $n/N$. Selection in the sample is most uncertain for the units whose random number is close to $n/N$.

Selection of sample $S_2$

We use the expected cumulative burden function $\Gamma'_{k,1}$ as a "criterion" (as defined in 3.2) to build a coordination function $g_{k,2}$ for the selection of the second sample $S_2$. The selection of sample $S_2$ is performed as follows: $k \in S_2 \iff g_{k,2}(\omega_k) \in E_n\big(g_{j,2}(\omega_j),\ j = 1, \dots, N\big)$. The expected cumulative burden function is $\Gamma'_{k,2}(\omega_k) = \Gamma'_{k,1}(\omega_k) + \gamma_{k,2}\, \beta_{k,2}(\omega_k)$. Here $\gamma_{k,2}$ denotes the response burden of unit $k$ at survey 2, and $\beta_{k,2}$ the expected indicator function associated with this selection.

Selection of sample $S_t$

More generally, to select sample $S_t$, we build coordination functions $g_{k,t}$ from the expected cumulative burden functions $\Gamma'_{k,t-1}$. We then obtain $k \in S_t \iff g_{k,t}(\omega_k) \in E_n\big(g_{j,t}(\omega_j),\ j = 1, \dots, N\big)$ and $\Gamma'_{k,t}(\omega_k) = \Gamma'_{k,t-1}(\omega_k) + \gamma_{k,t}\, \beta_{k,t}(\omega_k)$.

6. Empirical assessment of the method on simulated data

6.1 Performance indicators of the method

The prototype developed in SAS has been designed to select a stratified simple random sample (SSRS, the usual sampling design for INSEE business surveys) within a given population. This sample is coordinated with a set of parameterized past surveys whose samples have all been drawn by the same procedure. The population can be affected by demographic changes, i.e. the field of units of interest can change from one survey to another. The stratification of the population may also differ completely from one survey to another.

⁵ $E_n(\omega_j,\ j = 1, \dots, N)$ is the set of the $n$ smallest values $\omega_j$.
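Sections 5.3 to 5.5 can be put together in a short simulation, under one reading of those sections and in the simplest setting. The sketch assumes a single stratum with a fixed population of $N$ units, the same sample size $n$ at each of $T$ surveys, a constant burden $\gamma = 1$, and coordination with all past surveys. The SAS prototype described in Section 6 handles far more, including demographic changes and changing strata.

Under these assumptions, every unit has the same expected cumulative burden function (written $\Gamma'$ here), so one array of $L$ step values is enough. $g_C$ is built from a stable sort of the criterion, which reproduces $G_C$ plus the within-level integral of Section 3.2. Because $g_C$ then maps each subinterval onto another subinterval with slope 1, $\beta_{k,t}$ on a subinterval is taken as the step value of $b$ on its image. Independent samples of the same size are drawn alongside, so that the ratio $R$ defined in Section 6.1 can be computed. All names and parameter values are hypothetical, and the output is illustrative, not a reproduction of the paper's tables.

```python
from math import comb

import numpy as np


def b(x: float, n: int, N: int) -> float:
    """Expected indicator of Section 5.1 (binomial form of the incomplete beta)."""
    return sum(comb(N - 1, j) * x**j * (1.0 - x) ** (N - 1 - j) for j in range(n))


def step_coordination(criterion: np.ndarray) -> np.ndarray:
    """g_C for a criterion constant on L equal subintervals (Sections 3.2 and 5.4).

    Returns rank: subinterval l is mapped with slope 1 onto
    [rank[l]/L, (rank[l]+1)/L[. The stable sort puts smaller criterion values
    first (the G_C term) and keeps the left-to-right order inside a level
    (the integral of 1_A over [0, omega]).
    """
    order = np.argsort(criterion, kind="stable")
    rank = np.empty(criterion.size, dtype=int)
    rank[order] = np.arange(criterion.size)
    return rank


def simulate(N=1000, n=200, T=20, L=100, gamma=1.0, seed=0):
    """Number of selections per unit: coordinated vs independent samples."""
    rng = np.random.default_rng(seed)
    ends = np.array([b(l / L, n, N) for l in range(L + 1)])
    beta_base = (ends[:-1] + ends[1:]) / 2.0           # step approximation of b (5.3)

    omega = rng.uniform(size=N)                        # permanent random numbers
    cell = np.minimum((omega * L).astype(int), L - 1)  # subinterval containing omega_k
    # Same N, n and gamma for every unit, so Gamma'_{k,t} is one common step function.
    expected_burden = np.zeros(L)                      # Gamma'_{k,0} = 0
    coord = np.zeros(N, dtype=int)
    indep = np.zeros(N, dtype=int)

    for _ in range(T):
        rank = step_coordination(expected_burden)      # g_{k,t} built from Gamma'_{k,t-1}
        g = rank[cell] / L + (omega - cell / L)        # g_{k,t}(omega_k)
        coord[np.argsort(g)[:n]] += 1                  # the n smallest transformed numbers
        # beta_{k,t} on subinterval l: step value of b on its image subinterval rank[l]
        expected_burden += gamma * beta_base[rank]
        indep[rng.permutation(N)[:n]] += 1             # independent SRS, for comparison
    return coord, indep


coord, indep = simulate()
print("mean number of selections:", coord.mean(), indep.mean())
print("R = sigma_Coord / sigma_Indep =", round(coord.std() / indep.std(), 3))
```

At the first survey the criterion is identically zero, so the stable sort returns the identity permutation and $g_{k,1}(\omega_k) = \omega_k$, as in 5.5. Both designs select exactly $n$ units per survey, so the two mean numbers of selections coincide.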

The simulations whose results are shown later in this section aim to evaluate the efficiency of the sampling coordination method, depending on the various parameters characterizing the sampling design or the population of statistical units. A given simulation consists of drawing 20 coordinated samples and 20 independent samples (for 20 surveys) within a population affected by demographic changes, and then comparing the samples obtained in these two ways. For the independent case as for the coordinated case, we compute the distribution over the population of the variable "number of selections": how many units have been selected once, how many have been selected twice, etc. This leads to the two histograms of Figure 1 (corresponding to the simulation of the last line of Table 4 below).

Figure 1: Distributions over the population of the variable "number of selections". On the left, when samples are coordinated. On the right, when samples are independent. The population and the tested sampling designs are those corresponding to the results of the last line of Table 4 below.

The two histograms of Figure 1 are emblematic because they highlight a situation where coordination is very effective. The sample sizes are exactly the same for coordinated and independent sampling. As a result, the average number of samples in which a unit of the population is selected is the same in both cases. In other words, the means of the distributions represented by the two histograms of Figure 1 are necessarily equal. The quality of the sampling coordination method is therefore assessed by analysing the higher-order moments of these distributions. Here, the values taken by the variable "number of selections" are clearly more tightly concentrated around their mean when samples are coordinated. This means that the sampling coordination method spreads the response burden much more evenly over the population units. In the case of Figure 1, a unit is selected 3.55 times on average. The sampling coordination method selects more than 87% of the units in only three or four samples; only 6.4% of the units are selected five times, and 0.4% six times.
When samples are independent, such performance is far from being achieved: some units are even selected eight, nine or ten times. The quality of coordination is therefore reflected in the fact that the standard deviation of the distribution is much smaller for the histogram on the left of Figure 1 than for the one on the right.

We therefore define the main performance indicator of the sampling coordination method as follows:

$$R = \frac{\sigma_{Coord}}{\sigma_{Indep}},$$

where $\sigma_{Coord}$ and $\sigma_{Indep}$ are the standard deviations of the distributions over the population of the variable "number of selections", when the samples are coordinated and independent respectively. A ratio $R$ below 1 means that drawing coordinated samples allocates the response burden across businesses more evenly. The closer the ratio $R$ is to 0, the more efficient the sampling coordination method. The advantage of this indicator is that its numerical values are directly interpretable: if $R$ is below 1, sampling coordination reduces the dispersion of burdens over the population units by $(1 - R) \times 100\%$.

Let us also remember that the primary purpose of sampling coordination is to avoid situations where units have too high a response burden. Allocating the burden evenly over all units of the population is not an end in itself but a means to achieve this goal. We can therefore expect that, if the samples are coordinated, the distribution of the variable "number of selections" over the population spreads less to the right, that is, towards values higher than the average. A good sampling coordination method should avoid selecting the same unit too frequently. This leads us to define a secondary performance indicator of the sampling coordination method:

$$\Delta S = S_{Coord} - S_{Indep},$$

where $S_{Coord}$ and $S_{Indep}$ are the skewness coefficients (asymmetry coefficients) of the distributions over the population of the variable "number of selections", when the samples are coordinated and independent respectively. The spread of the distribution to the right is expected to be smaller for coordinated samples, so good coordination of samples should result in a negative value of $\Delta S$. The numerical value of this secondary indicator is, however, harder to interpret, so its sign is what matters most.
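A minimal sketch of the two indicators follows. It assumes that each distribution is given as the vector of per-unit numbers of selections. It uses the population (not sample-corrected) standard deviation and the moment coefficient of skewness, because the text does not specify the estimators. The toy counts are made up for illustration and are not simulation results. The function names are hypothetical.

```python
import numpy as np


def skewness(x: np.ndarray) -> float:
    """Moment coefficient of skewness m3 / m2**1.5, with population moments."""
    d = x - x.mean()
    return float(np.mean(d**3) / np.mean(d**2) ** 1.5)


def coordination_indicators(counts_coord, counts_indep):
    """Return (R, Delta S) from per-unit numbers of selections."""
    coord_arr = np.asarray(counts_coord, dtype=float)
    indep_arr = np.asarray(counts_indep, dtype=float)
    r = coord_arr.std() / indep_arr.std()                  # sigma_Coord / sigma_Indep
    delta_s = skewness(coord_arr) - skewness(indep_arr)    # S_Coord - S_Indep
    return r, delta_s


# Made-up per-unit counts, for illustration only (not simulation output)
coord = [3, 4, 4, 5]
indep = [2, 3, 4, 7]
R, delta_s = coordination_indicators(coord, indep)
print(f"R = {R:.3f}: dispersion reduced by {(1 - R) * 100:.0f}%")
print(f"Delta S = {delta_s:.3f}")
print("coordinated histogram (units selected 0, 1, 2, ... times):", np.bincount(coord))
```

Both toy vectors have the same mean (4), as the coordinated and independent distributions do in the simulations, so only the higher moments differ.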
6.2 Impact of the sampling design and of the population characteristics on the quality of the sampling procedure

6.2.1 Impact of discretization

The sampling coordination method is based on the calculation, for each unit, of different functions (expected cumulative burden, expected indicator of being sampled, coordination function) that are defined on the continuous interval $[0, 1]$. In practice, only a finite number of values of these functions can be computed. In this first subsection, we analyze the efficiency of coordination when the number $L$ of subintervals partitioning $[0, 1]$ varies. Simulations have been performed on a stratified population of size 1000. Table 1 shows the characteristics (mean, standard deviation, skewness) of the two distributions of the variable "number of selections" over the population, for coordinated and for independent sampling. Recall that these two distributions have the same mean. Table 1 also includes the values of the two performance indicators introduced above.

Overall, the results are very promising, since the sampling coordination method appears very efficient. $R$ is always much less than 1, even when coarse approximations of the functions are computed ($L = 10$, for instance). The dispersion of response burdens over the population units is reduced by roughly 45 to 60% when the samples are coordinated. The secondary performance indicator itself is always negative, as expected. The impact of the discretization of the interval $[0, 1]$ on the quality of coordination appears to be quite limited. Nevertheless, additional simulations have shown that it is best to choose a parameter $L$ at least equal to 100.

            Mean    σ Coord  σ Indep  S Coord  S Indep  R      ΔS
L = 10      3.514   0.947    1.711    0.140    0.299    0.554  -0.158
L = 50      3.514   0.689    1.736    0.152    0.515    0.397  -0.363
L = 100     3.514   0.683    1.737    0.120    0.351    0.393  -0.232
L = 200     3.514   0.692    1.684    0.132    0.340    0.411  -0.208
L = 500     3.514   0.673    1.748    0.236    0.456    0.385  -0.220
L = 1000    3.514   0.677    1.705    0.260    0.392    0.397  -0.132

Table 1: Impact of discretization of the interval [0, 1], divided into L subintervals. Stratified population of size 1000, average stratum size: 200, coverage rates around 86%, sampling rates around 20%, coordination with all the past surveys (20 surveys in all), constant response burden.

6.2.2 Impact of sampling rate

In this section, the sampling coordination method is tested with several sampling rates. In each simulation, the same sampling rate is applied to the 20 successive surveys. The results are reported in Table 2.

            Mean    σ Coord  σ Indep  S Coord  S Indep  R      ΔS
f = 0.01    0.113   0.317    0.329    2.448    2.781    0.962  -0.332
f = 0.05    0.839   0.462    0.879    -0.427   0.967    0.526  -1.393
f = 0.10    1.738   0.572    1.279    -0.186   0.691    0.447  -0.877
f = 0.20    3.569   0.701    1.714    0.251    0.495    0.409  -0.244
f = 0.40    7.105   0.865    2.176    -0.065   0.248    0.398  -0.313
f = 0.60    10.728  1.073    2.322    -0.173   -0.006   0.462  -0.168
f = 0.80    14.175  1.282    2.091    -0.390   -0.170   0.613  -0.219
f = 0.95    17.129  1.337    1.535    -0.388   -0.487   0.871  0.100
f = 0.99    17.733  1.427    1.432    -0.554   -0.538   0.997  -0.016

Table 2: Impact of sampling rate f. Discretization: L = 100, stratified population of size 1000, average stratum size: 150, coverage rates around 90%, coordination with all the past surveys (20 surveys in all), constant response burden.
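As a consistency check on Tables 1 and 2, the derived columns can be recomputed from the others. $R$ should equal $\sigma_{Coord}/\sigma_{Indep}$, and $\Delta S$ should equal $S_{Coord} - S_{Indep}$. In the printed values the two agree to within 0.002. This is what one would expect if the indicators were computed before the columns were rounded to three decimals, although that is an assumption not stated in the text. The dictionaries simply transcribe the printed tables.

```python
# Columns: sigma_Coord, sigma_Indep, S_Coord, S_Indep, R, Delta S (as printed)
table1 = {
    "L = 10": (0.947, 1.711, 0.140, 0.299, 0.554, -0.158),
    "L = 50": (0.689, 1.736, 0.152, 0.515, 0.397, -0.363),
    "L = 100": (0.683, 1.737, 0.120, 0.351, 0.393, -0.232),
    "L = 200": (0.692, 1.684, 0.132, 0.340, 0.411, -0.208),
    "L = 500": (0.673, 1.748, 0.236, 0.456, 0.385, -0.220),
    "L = 1000": (0.677, 1.705, 0.260, 0.392, 0.397, -0.132),
}
table2 = {
    "f = 0.01": (0.317, 0.329, 2.448, 2.781, 0.962, -0.332),
    "f = 0.05": (0.462, 0.879, -0.427, 0.967, 0.526, -1.393),
    "f = 0.10": (0.572, 1.279, -0.186, 0.691, 0.447, -0.877),
    "f = 0.20": (0.701, 1.714, 0.251, 0.495, 0.409, -0.244),
    "f = 0.40": (0.865, 2.176, -0.065, 0.248, 0.398, -0.313),
    "f = 0.60": (1.073, 2.322, -0.173, -0.006, 0.462, -0.168),
    "f = 0.80": (1.282, 2.091, -0.390, -0.170, 0.613, -0.219),
    "f = 0.95": (1.337, 1.535, -0.388, -0.487, 0.871, 0.100),
    "f = 0.99": (1.427, 1.432, -0.554, -0.538, 0.997, -0.016),
}

for name, rows in (("Table 1", table1), ("Table 2", table2)):
    for label, (s_coord, s_indep, sk_coord, sk_indep, r, ds) in rows.items():
        r_gap = abs(s_coord / s_indep - r)               # |sigma ratio - printed R|
        ds_gap = abs((sk_coord - sk_indep) - ds)         # |skewness difference - printed dS|
        print(f"{name}, {label}: R gap {r_gap:.4f}, Delta S gap {ds_gap:.4f}")
```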

Once again, these results confirm the earlier conclusions. The sampling coordination method allocates the response burden to all population units very efficiently: the main indicator R is always less than 1, and the secondary indicator ΔS is negative (except in one case, f = 0.95). Furthermore, the sampling coordination method provides very little improvement when the sampling rate is very low or very high. This is not surprising: when the sampling rate is very close to 0 (respectively 1), the units have an extremely low probability of being sampled twice (respectively of being absent from a sample), whether or not successive surveys are coordinated. In the borderline cases f = 0 and f = 1, coordinating is obviously useless: the ratio R will be equal to 1 and the difference ΔS to 0, since the underlying distributions with coordinated and with independent samples will be perfectly identical. Finally, the sampling coordination method provides gains of roughly 50% or more in terms of response burden allocation over a wide range of sampling rates (here, between 0.05 and 0.60). The method still performs well for a sampling rate of 0.8, the gain being approximately 40% (the ratio R is close to 0.6). However, the way the efficiency of the method decreases as the sampling rate approaches 0 or 1 also depends on the number of past surveys with which the current ones are coordinated (see section 6.2.4).

6.2.3 Robustness of coordination when the population is very unstable over time or when sampling rates vary strongly from one survey to another

Some tests have been performed on a larger population (of size 10000). Very large demographic changes have been simulated (coverage rates between 10% and 90%), and sampling rates differ significantly from one survey to another (ranging from 1% to 99%). Two main simulations have been performed, whose results are presented in Table 3. The samples are coordinated with all the past surveys in the first row of the table, but only with the previous five surveys in the second row.
The sampling coordination method generally remains very efficient, even if the gains are less substantial than those observed in the previous simulations: ΔS is always negative, and R is still below 1, but with higher values than before. The coordination method seems to be quite robust to the volatility of the parameters characterizing the population and the survey design.

Coordination with       Mean   Disp. Coord  Disp. Indep  S Coord  S Indep  R      ΔS
All the past surveys    3.906  1.153        1.689        0.132    0.287    0.683  -0.156
The previous 5 surveys  5.704  1.371        1.810        -0.002   0.187    0.757  -0.188

Table 3: Strong variations in the characteristics of the population and the sampling rates: coverage rates varying between 10% and 90%, sampling rates between 1% and 99%. Discretization: L = 100, stratified population of size 10000, average size of strata: 400, 20 surveys in all, constant response burden.

6.2.4 Impact of durability and variability of the response burden

Table 4 shows the results of simulations testing the effect of the durability of the response burden. Each row of the table corresponds to a value of the parameter NCoord: in the corresponding simulation, each sample is coordinated with the samples of the previous NCoord surveys. The results in Table 4 highlight the need to coordinate with a sufficiently large number of past surveys. By systematically coordinating with at least the previous four surveys, the method is as efficient as in the previous simulations, with an indicator R much lower than 1, fluctuating between 0.4 and 0.5, and an indicator ΔS that remains negative. When coordinating with too few past surveys, the sampling coordination method is counterproductive because it tends to concentrate the response burden on some units. Let us consider the first simulation, where each sample is simply coordinated with the previous survey. During the first draw, the units selected are those with a very low random number. Thus, these units have the lowest probabilities (actually very close to zero) of being selected during the second draw. But then, as the third sample is coordinated with the second one only, they will again have the highest probabilities (close to 1) of being selected in the third sample. This reasoning can be repeated and reveals a perverse effect. Coordination with too few past surveys creates "basins of attraction" (ranges of random numbers in [0, 1] for which the units periodically have very high conditional inclusion probabilities, once every NCoord + 1 surveys) as well as "basins of repulsion" (ranges of random numbers in [0, 1] for which the units systematically have very low conditional inclusion probabilities). When there is no demographic change and the sampling rate, constant from survey to survey, is significantly lower than 1/(1 + NCoord), the existence of these basins of attraction and repulsion can easily be proved in theory.
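The mechanism behind these basins can be reproduced with a deliberately simplified toy model. The sketch below is hypothetical and is not the coordination method of the paper: permanent random numbers are replaced by ranks, conditional inclusion probabilities are replaced by a deterministic rule that selects the units with the lowest burden over the last NCoord samples, and the population and sampling rate are held fixed. The function name and its parameters are illustrative only.

```python
from statistics import pstdev


def simulate_toy_coordination(n_units, n_sampled, n_surveys, n_coord):
    """Hypothetical toy model, not the paper's coordination procedure.

    Each unit's permanent random number is represented by its rank
    (0 = smallest). Each survey deterministically selects the n_sampled
    units with the lowest burden over the previous n_coord samples,
    ties being broken by the permanent rank. The population is fixed.
    Returns the number of times each unit was selected.
    """
    history = []              # selected units of each past survey, as sets
    counts = [0] * n_units
    for _ in range(n_surveys):
        # Only the last n_coord samples count (guard: history[-0:] is the whole list).
        window = history[-n_coord:] if n_coord > 0 else []
        burden = [sum(u in s for s in window) for u in range(n_units)]
        # Lowest burden first, then lowest permanent random number.
        ranked = sorted(range(n_units), key=lambda u: (burden[u], u))
        chosen = set(ranked[:n_sampled])
        for u in chosen:
            counts[u] += 1
        history.append(chosen)
    return counts


if __name__ == "__main__":
    for n_coord in (1, 2, 4):
        counts = simulate_toy_coordination(1000, 200, 20, n_coord)
        print(n_coord, max(counts), round(pstdev(counts), 3))
```

With 1000 units, 200 selected per survey (f = 0.2) and 20 surveys, NCoord = 1 makes the same two blocks of 200 units alternate: 400 units are selected 10 times (half the time) and 600 are never selected. With NCoord = 4, the selections cycle through the whole population and every unit is selected exactly 4 times. A real coordination method only shifts inclusion probabilities instead of applying a hard rule, so this toy model should be read as a caricature of the effect, not as a prediction of the values in Table 4.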
In the general case, the reasoning on basins of attraction suggests choosing the number NCoord of previous surveys with which the current ones are coordinated according to a rule of thumb such as

$$\mathrm{NCoord} + 1 > \frac{1}{f},$$

where f is the average sampling rate of the surveys concerned (note that, in Table 4, the average sampling rate is equal to 1/5, which leads to NCoord > 4). These basins of attraction and repulsion explain the poor performance of the sampling coordination method in the first two rows of Table 4, where the ratio R is greater than 1 (and even greater than 2 in the first case). Indeed, when coordinating with the previous survey only, a particularly high proportion of units are selected 10 times, that is to say half the time.
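The rule of thumb can be applied mechanically. The hypothetical helper below returns the smallest integer NCoord satisfying NCoord + 1 > 1/f, i.e. ⌊1/f − 1⌋ + 1. Converting the rate to a fraction with a bounded denominator is a design choice (an assumption, not part of the paper): a decimal rate such as 0.2 is then treated as exactly 1/5, so the strict inequality is not disturbed by binary floating-point rounding.

```python
import math
from fractions import Fraction


def min_ncoord(f):
    """Smallest integer NCoord with NCoord + 1 > 1/f (hypothetical helper).

    f is the average sampling rate of the surveys concerned, 0 < f <= 1.
    """
    # Snap binary floats back to simple fractions (0.2 -> 1/5) so that the
    # strict inequality is evaluated exactly; assumes modest denominators.
    rate = Fraction(f).limit_denominator(10**6)
    if not 0 < rate <= 1:
        raise ValueError("the sampling rate f must lie in (0, 1]")
    # NCoord > 1/f - 1, and the smallest integer strictly above x is floor(x) + 1.
    return math.floor(1 / rate - 1) + 1
```

For the average rate f = 1/5 of Table 4, `min_ncoord(0.2)` returns 5, in line with NCoord > 4; for f = 0.25 it returns 4, and for f = 0.3 it returns 3.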

             Mean   Disp. Coord  Disp. Indep  S Coord  S Indep  R      ΔS
NCoord = 1   3.557  3.905        1.704        0.481    0.308    2.292  0.173
NCoord = 2   3.557  2.306        1.723        0.025    0.424    1.338  -0.400
NCoord = 3   3.557  1.387        1.756        -0.612   0.403    0.790  -1.015
NCoord = 4   3.557  0.835        1.676        -0.506   0.491    0.498  -0.998
NCoord = 5   3.557  0.816        1.718        0.110    0.266    0.475  -0.156
NCoord = 7   3.557  0.859        1.722        0.037    0.379    0.499  -0.343
NCoord = 9   3.557  0.748        1.723        -0.310   0.388    0.434  -0.698
NCoord = 12  3.557  0.777        1.647        0.272    0.388    0.472  -0.116
NCoord = 15  3.557  0.787        1.670        -0.226   0.333    0.471  -0.559
NCoord = 19  3.557  0.718        1.674        0.027    0.369    0.429  -0.342

Table 4: Impact of the durability of the response burden, characterized by the number NCoord of previous surveys with which the current ones are coordinated. Discretization: L = 100, stratified population of size 1000, average size of strata: 150, coverage rates around 90%, sampling rates around 20%, constant response burden.

Other simulations have also been performed, in which the response burden varies from one survey to another. These sets of simulations yield similar conclusions, confirming the efficiency of the sampling coordination method. However, it is difficult to detect any impact of the variation of the response burden on the statistics and performance indicators introduced in 6.1.

6.3 Conclusion

Overall, all the numerical results highlight the efficiency of the sampling coordination method presented in this paper, which provides significant and easily measurable gains in terms of response burden allocation over the population units. The efficiency of the method seems to hold up in very unstable situations, for instance when the population of interest is affected by strong demographic changes, or when the stratifications are independent and the sampling rates vary greatly from one survey to another. While, unsurprisingly, the method becomes less efficient as the sampling rates tend to 0 or 1, it remains necessary in all cases to coordinate with a sufficiently large number of past surveys in order to avoid the phenomena related to basins of attraction or repulsion. A rule of thumb for choosing the number of previous surveys with which the current ones have to be coordinated has been suggested.
Simulations could next be carried out with larger populations, before considering putting the process into production at INSEE. The sampling coordination method could also be refined to account for panel selection.

References

[1] Christian Hesse, «Généralisation des tirages aléatoires à numéros aléatoires permanents, ou la méthode JALES+», Insee working paper E0101, 2001.
[2] Pascal Ardilly, «Présentation de la méthode JALES+ conçue par Christian Hesse», internal Insee working paper, 2009.
[3] Franc Cotton and Christian Hesse, «Tirages coordonnés d'échantillons», Insee working paper E9206, 1992.