Random Partitions of Samples

Klaus Th. Hess
Institut für Mathematische Stochastik
Technische Universität Dresden

Abstract

In the present paper we construct a decomposition of a sample into a finite number of subsamples in the case where the sample size is random and the decomposition depends on the values of the sampling variables. We investigate the basic properties of the subsamples and compute the first and second order moments of their sample sums, sample means, and sample variances.

1 Introduction

In the present paper we construct a decomposition of a sample into a finite number of subsamples in the case where the sample size is random and the decomposition depends on the values of the sampling variables. We investigate the basic properties of the subsamples and compute the first and second order moments of their sample sums, sample means, and sample variances.

Throughout the paper let $(\Omega, \mathcal{F}, P)$ be a probability space, $(M, \mathcal{M})$ be a measurable space where $M$ is a linear space, $H$ be a finite set of indices, $\{M_h\}_{h \in H}$ be a finite partition of $M$, $N : \Omega \to \mathbb{N}_0$ be a random variable with $P[N \in \mathbb{N}] > 0$, and $\{Y_i\}_{i \in \mathbb{N}}$ be a sequence of random variables $\Omega \to M$ such that
- the sequence $\{Y_i\}_{i \in \mathbb{N}}$ is i.i.d.,
- the pair $\{N, \{Y_i\}_{i \in \mathbb{N}}\}$ is independent, and
- $\eta_h := P[Y \in M_h] > 0$ holds for all $h \in H$,
where $Y$ denotes a random variable having the same distribution as each $Y_i$.

This is a corrected reprint of the version from February 24, 2000. The author would like to thank Markus Hübel (München) for some helpful comments.

Present Address: Universität Rostock, Institut für Mathematik, D-18055 Rostock, Germany
Interpretation: The family $Y_1, \dots, Y_N$ is a sample with random sample size $N$. We want to construct a decomposition into some random subsamples $Y_1^{(h)}, \dots, Y_{N_h}^{(h)}$ with random sample sizes $N_h$, such that $Y_i^{(h)} \in M_h$ for each $h \in H$. The aim of this paper is to prove some properties of these random subsamples and to calculate the first two moments of the sample sum, the sample mean, and the sample variance.

The present paper partly generalizes the results of Franke and Macht (1995) and of Hess, Macht, and Schmidt (1995), which are included in Schmidt (1996), in the sense that we consider a more general structure of the sample and a decomposition into more than two subsamples. Properties of thinned samples were also studied by Stenger (1986). He described the procedure of thinning and called it Poisson sampling.

The paper is organized as follows: In Section 2 we give the results on decomposed samples, and in Section 3 we establish the first and second order moments of the sample sum, sample mean, and sample variance of the subsamples by reducing our situation to the classical case (see e.g. Cramér (1946)). In Section 4 we present some applications in insurance. Section 5 includes the proof of Theorem 2.1, which is the main result of this paper.

2 Decomposition of samples

For the moment fix $h \in H$. First let
\[ N_h := \sum_{i=1}^{N} \chi_{\{Y_i \in M_h\}} \]
Then $N_h$ is the random sample size of group $h$. Second we define recursively a sequence of stopping times and a sequence of random variables
\[ \nu_0^{(h)} := 0 \]
\[ \nu_i^{(h)} := \inf\{ j \in \mathbb{N} \mid \nu_{i-1}^{(h)} < j,\ Y_j \in M_h \} \]
\[ Y_i^{(h)} := \sum_{j=1}^{\infty} \chi_{\{\nu_i^{(h)} = j\}}\, Y_j \]
for all $i \in \mathbb{N}$. Then $\{Y_i^{(h)}\}_{i \in \mathbb{N}}$ is the sequence of sampling variables of group $h$ and $Y_i^{(h)} \in M_h$. The pair $\{\{Y_i^{(h)}\}_{i \in \mathbb{N}}, N_h\}$ is called a random subsample of group $h$.

The next theorem shows the properties of random subsamples:
2.1 Theorem.
(a) For each $h \in H$ the sequence $\{Y_i^{(h)}\}_{i \in \mathbb{N}}$ is i.i.d. with
\[ P[Y^{(h)} \in A] = P[Y \in A \mid Y \in M_h] \]
for all $A \in \mathcal{M}$.
(b) The family $\{\{Y_i^{(h)}\}_{i \in \mathbb{N}}\}_{h \in H}$ is independent.
(c) The pair $\{\{\{Y_i^{(h)}\}_{i \in \mathbb{N}}\}_{h \in H}, \{N_h\}_{h \in H}\}$ is independent.
(d) The conditional joint distribution of $\{N_h\}_{h \in H}$ given $N$ is the conditional multinomial distribution with parameters $N$ and $\{\eta_h\}_{h \in H}$.

Proof. See Section 5.

In general the family of random sample sizes of all groups is not independent:

2.2 Corollary. The following are equivalent:
(a) The family $\{N_h\}_{h \in H}$ is independent.
(b) $N$ has a non-degenerate Poisson distribution.

Proof. The assertion follows from Theorem 2.1 (d); see e.g. Hess and Schmidt (1994).

Furthermore we have the following properties of the random sample sizes:

2.3 Corollary.
(a) For each $h \in H$ the conditional distribution of $N_h$ given $N$ is the conditional binomial distribution with the parameters $N$ and $\eta_h$.
(b) If $\mathrm{E}[N] < \infty$, then $\mathrm{E}[N_h] = \eta_h\, \mathrm{E}[N]$ holds for all $h \in H$.
(c) If $\mathrm{E}[N^2] < \infty$, then $\mathrm{cov}[N_h, N_j] = \eta_h \eta_j \left( \mathrm{var}[N] - \mathrm{E}[N] \right) + \delta_{hj}\, \eta_h\, \mathrm{E}[N]$ holds for all $h, j \in H$.
(d) If $N$ has the binomial distribution with parameters $n \in \mathbb{N}$ and $\vartheta \in (0,1)$, then $\{N_h\}_{h \in H}$ has the multinomial distribution with parameters $n$ and $\{\vartheta \eta_h\}_{h \in H}$ and each $N_h$ has the binomial distribution with parameters $n$ and $\vartheta \eta_h$.
(e) If $N$ has the Poisson distribution with parameter $\lambda \in (0, \infty)$, then $\{N_h\}_{h \in H}$ is independent and each $N_h$ has the Poisson distribution with parameter $\lambda \eta_h$.
(f) If $N$ has the negative binomial distribution with parameters $\varrho \in (0, \infty)$ and $\vartheta \in (0,1)$, then $\{N_h\}_{h \in H}$ has the negative multinomial distribution with parameters $\varrho$ and $\{\vartheta \eta_h\}_{h \in H}$ and each $N_h$ has the negative binomial distribution with parameters $\varrho$ and $\vartheta \eta_h / (1 - \vartheta + \vartheta \eta_h)$.

Proof. Straightforward. Partly see Hess and Schmidt (1994) and Schmidt and Wünsche (1998).
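The construction of Section 2 is easy to mirror in code. The following sketch (Python; the function `decompose` and the concrete three-group partition are our own illustration, not part of the paper) draws one sample of random size, splits it along a partition $\{M_h\}_{h \in H}$ in order of appearance, exactly as the stopping times $\nu_i^{(h)}$ do, and checks the elementary facts that the group sizes $N_h$ add up to $N$ and that every $Y_i^{(h)}$ lies in $M_h$:

```python
import random

def decompose(sample, groups):
    """Split a realized sample into one subsample per group.

    `groups` maps each index h to the membership test of the set M_h.
    Keeping the order of first appearance mirrors the stopping times
    nu_i^(h) of the construction in Section 2.
    """
    subsamples = {h: [] for h in groups}
    for y in sample:
        for h, in_M_h in groups.items():
            if in_M_h(y):
                subsamples[h].append(y)
                break  # {M_h} is a partition: exactly one group matches
    return subsamples

random.seed(1)
N = random.randint(0, 50)                      # realized random sample size
sample = [random.random() for _ in range(N)]   # realized i.i.d. sampling variables
groups = {0: lambda y: y < 0.3,                # M_0 = [0, 0.3)
          1: lambda y: 0.3 <= y < 0.7,         # M_1 = [0.3, 0.7)
          2: lambda y: y >= 0.7}               # M_2 = [0.7, 1)
sub = decompose(sample, groups)

assert sum(len(s) for s in sub.values()) == N          # the N_h add up to N
assert all(groups[h](y) for h in sub for y in sub[h])  # each Y_i^(h) lies in M_h
assert sub[2] == [y for y in sample if y >= 0.7]       # order of appearance kept
```

Theorem 2.1 says considerably more than these structural checks: the subsamples are i.i.d. samples from the conditional distributions $P[Y \in \cdot \mid Y \in M_h]$, mutually independent, and independent of the family of sizes.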
3 Moments of sample moments

Let $\{g_h\}_{h \in H}$ be a family of measurable functions $g_h : M \to \mathbb{R}$, and define sequences of real random variables by
\[ X_i^{(h)} := g_h(Y_i^{(h)}) \]
for all $i \in \mathbb{N}$ and $h \in H$. Then Theorem 2.1 remains valid with $X_i^{(h)}$ instead of $Y_i^{(h)}$ and
\[ P[X^{(h)} \in A] = P[Y \in g_h^{-1}(A) \mid Y \in M_h] \]
for all $A \in \mathcal{B}(\mathbb{R})$. Define $\mathcal{N} := \sigma(\{N_h\}_{h \in H})$. Because of Theorem 2.1 (c), the assertions (a) and (b) of Theorem 2.1 also hold conditionally with respect to $\mathcal{N}$.

Now we consider some sample functions, and using Theorem 2.1 we can calculate the first two moments of these sample functions. First denote by
\[ S_h := \sum_{i=1}^{N_h} X_i^{(h)} \]
the sample sum of group $h$.

3.1 Theorem. Assume that $\mathrm{E}[N^2] < \infty$ and $\mathrm{E}[(X^{(h)})^2] < \infty$ for all $h \in H$. Then the equations
\[ \mathrm{E}[S_h] = \mathrm{E}[N_h]\, \mathrm{E}[X^{(h)}] \]
\[ \mathrm{cov}[S_h, S_j] = \mathrm{cov}[N_h, N_j]\, \mathrm{E}[X^{(h)}]\, \mathrm{E}[X^{(j)}] + \delta_{hj}\, \mathrm{E}[N_h]\, \mathrm{var}[X^{(h)}] \]
hold for all $h, j \in H$.

Proof. For $h = j \in H$ the assertions follow from Wald's equalities (see e.g. Schmidt (1996)); for $h, j \in H$ with $h \neq j$ we get
\[
\begin{aligned}
\mathrm{cov}[S_h, S_j] &= \mathrm{cov}[\mathrm{E}(S_h \mid \mathcal{N}), \mathrm{E}(S_j \mid \mathcal{N})] + \mathrm{E}[\mathrm{cov}(S_h, S_j \mid \mathcal{N})] \\
&= \mathrm{cov}[N_h\, \mathrm{E}(X^{(h)} \mid \mathcal{N}),\, N_j\, \mathrm{E}(X^{(j)} \mid \mathcal{N})] \\
&= \mathrm{cov}[N_h\, \mathrm{E}[X^{(h)}],\, N_j\, \mathrm{E}[X^{(j)}]] \\
&= \mathrm{cov}[N_h, N_j]\, \mathrm{E}[X^{(h)}]\, \mathrm{E}[X^{(j)}]
\end{aligned}
\]
where $\mathrm{cov}(S_h, S_j \mid \mathcal{N}) = 0$ holds since the sequences of different groups are conditionally independent, and the assertion follows.

Next denote by
\[ \overline{X}_h := \begin{cases} \dfrac{1}{N_h}\, S_h & \text{if } N_h \geq 1 \\ 0 & \text{otherwise} \end{cases} \]
the sample mean of group $h$. The concept of the sample mean is useful only if we have at least one observation in group $h$, which means that $N_h > 0$. Therefore we will consider only the conditional moments of the sample mean under the condition that we have at least one observation. For each non-empty $J \subseteq H$ define
\[ A_J := \bigcap_{h \in J} \{N_h > 0\} \cap \bigcap_{h \in H \setminus J} \{N_h = 0\} \]
Then we have $A_J \in \mathcal{N}$ and, by Theorem 2.1 (d), $P[A_J] > 0$. Now we can calculate the first two conditional moments of the sample mean:

3.2 Theorem. Let $J \subseteq H$ with $J \neq \emptyset$ and assume that $\mathrm{E}[N^2] < \infty$ and $\mathrm{E}[(X^{(h)})^2] < \infty$ for all $h \in H$. Then the equations
\[ \mathrm{E}[\overline{X}_h \mid A_J] = \mathrm{E}[X^{(h)}] \]
\[ \mathrm{cov}[\overline{X}_h, \overline{X}_j \mid A_J] = \delta_{hj}\, \mathrm{E}\!\left[\frac{1}{N_h} \,\middle|\, A_J\right] \mathrm{var}[X^{(h)}] \]
hold for all $h, j \in J$.

Proof. We get for all $h, j \in J$
\[ \mathrm{E}[\overline{X}_h \mid A_J] = \mathrm{E}[\mathrm{E}(\overline{X}_h \mid \mathcal{N}) \mid A_J] = \mathrm{E}[\mathrm{E}[X^{(h)}] \mid A_J] = \mathrm{E}[X^{(h)}] \]
and
\[
\begin{aligned}
\mathrm{cov}[\overline{X}_h, \overline{X}_j \mid A_J] &= \mathrm{cov}[\mathrm{E}(\overline{X}_h \mid \mathcal{N}), \mathrm{E}(\overline{X}_j \mid \mathcal{N}) \mid A_J] + \mathrm{E}[\mathrm{cov}(\overline{X}_h, \overline{X}_j \mid \mathcal{N}) \mid A_J] \\
&= \mathrm{cov}[\mathrm{E}[X^{(h)}], \mathrm{E}[X^{(j)}] \mid A_J] + \delta_{hj}\, \mathrm{E}\!\left[\frac{1}{N_h}\, \mathrm{var}(X^{(h)} \mid \mathcal{N}) \,\middle|\, A_J\right] \\
&= \delta_{hj}\, \mathrm{E}\!\left[\frac{1}{N_h} \,\middle|\, A_J\right] \mathrm{var}[X^{(h)}]
\end{aligned}
\]
which proves the assertion.

We conclude this section with analogous results for the sample variance, which is defined for each group $h$ by
\[ V_h := \begin{cases} \dfrac{1}{N_h - 1} \displaystyle\sum_{i=1}^{N_h} \left( X_i^{(h)} - \overline{X}_h \right)^2 & \text{if } N_h \geq 2 \\ 0 & \text{otherwise} \end{cases} \]

For the sample variance we need at least two observations. Therefore we define for all non-empty $J \subseteq H$
\[ B_J := \bigcap_{h \in J} \{N_h > 1\} \cap \bigcap_{h \in H \setminus J} \{N_h \leq 1\} \]
Then we have again $B_J \in \mathcal{N}$ and, by Theorem 2.1 (d), $P[B_J] > 0$ if we assume in addition that $P[N \geq 2] > 0$ holds. Now we can calculate the first two conditional moments of the sample variance:

3.3 Theorem. Let $J \subseteq H$ with $J \neq \emptyset$ and assume that $P[N \geq 2] > 0$, $\mathrm{E}[N^2] < \infty$, and $\mathrm{E}[(X^{(h)})^4] < \infty$ for all $h \in H$. Then the equations
\[ \mathrm{E}[V_h \mid B_J] = \mathrm{var}[X^{(h)}] \]
\[ \mathrm{cov}[V_h, V_j \mid B_J] = \delta_{hj}\, \mathrm{E}\!\left[\frac{3 - N_h}{N_h (N_h - 1)} \,\middle|\, B_J\right] \left( \mathrm{var}[X^{(h)}] \right)^2 + \delta_{hj}\, \mathrm{E}\!\left[\frac{1}{N_h} \,\middle|\, B_J\right] \mathrm{E}\!\left[\left( X^{(h)} - \mathrm{E}[X^{(h)}] \right)^4\right] \]
hold for all $h, j \in J$.

Proof. We get for all $h, j \in J$
\[ \mathrm{E}[V_h \mid B_J] = \mathrm{E}[\mathrm{E}(V_h \mid \mathcal{N}) \mid B_J] = \mathrm{E}[\mathrm{var}[X^{(h)}] \mid B_J] = \mathrm{var}[X^{(h)}] \]
and, using the classical formula for the variance of the sample variance of a sample of fixed size (see e.g. Cramér (1946)),
\[
\begin{aligned}
\mathrm{cov}[V_h, V_j \mid B_J] &= \mathrm{cov}[\mathrm{E}(V_h \mid \mathcal{N}), \mathrm{E}(V_j \mid \mathcal{N}) \mid B_J] + \mathrm{E}[\mathrm{cov}(V_h, V_j \mid \mathcal{N}) \mid B_J] \\
&= \mathrm{cov}[\mathrm{var}[X^{(h)}], \mathrm{var}[X^{(j)}] \mid B_J] + \delta_{hj}\, \mathrm{E}\!\left[\frac{3 - N_h}{N_h (N_h - 1)} \left( \mathrm{var}[X^{(h)}] \right)^2 + \frac{1}{N_h}\, \mathrm{E}\!\left[\left( X^{(h)} - \mathrm{E}[X^{(h)}] \right)^4\right] \,\middle|\, B_J\right] \\
&= \delta_{hj}\, \mathrm{E}\!\left[\frac{3 - N_h}{N_h (N_h - 1)} \,\middle|\, B_J\right] \left( \mathrm{var}[X^{(h)}] \right)^2 + \delta_{hj}\, \mathrm{E}\!\left[\frac{1}{N_h} \,\middle|\, B_J\right] \mathrm{E}\!\left[\left( X^{(h)} - \mathrm{E}[X^{(h)}] \right)^4\right]
\end{aligned}
\]
as was to be shown.
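Theorem 3.1 (together with Corollary 2.3) lends itself to a quick Monte Carlo illustration; the following sketch is our own and not part of the paper. Take $N$ Poisson distributed with parameter $\lambda = 4$, $Y$ uniform on $(0,1)$, $M_1 := (0.5, 1)$, $M_0$ its complement, and $g_h(y) := y$. Then $\mathrm{E}[X^{(1)}] = 0.75$ and $\mathrm{E}[X^{(0)}] = 0.25$, and since $\mathrm{var}[N] = \mathrm{E}[N]$ in the Poisson case, Corollary 2.3 (c) gives $\mathrm{cov}[N_0, N_1] = 0$, so the sample sums of the two groups are uncorrelated:

```python
import math
import random

random.seed(7)
LAM, TRIALS = 4.0, 40000

def poisson(lam):
    """Draw a Poisson variate (Knuth's method; fine for small lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

sums = []
for _ in range(TRIALS):
    n = poisson(LAM)                          # random sample size N
    ys = [random.random() for _ in range(n)]  # i.i.d. sampling variables
    s1 = sum(y for y in ys if y > 0.5)        # sample sum of group M_1 = (0.5, 1)
    s0 = sum(y for y in ys if y <= 0.5)       # sample sum of the complement M_0
    sums.append((s0, s1))

m0 = sum(s for s, _ in sums) / TRIALS
m1 = sum(s for _, s in sums) / TRIALS
cov01 = sum((s - m0) * (t - m1) for s, t in sums) / TRIALS

# Theorem 3.1: E[S_1] = E[N_1] E[X^(1)] = (0.5 * LAM) * 0.75 = 1.5
#              E[S_0] = E[N_0] E[X^(0)] = (0.5 * LAM) * 0.25 = 0.5
# Poisson case: cov[S_0, S_1] = cov[N_0, N_1] E[X^(0)] E[X^(1)] = 0
assert abs(m1 - 1.5) < 0.05
assert abs(m0 - 0.5) < 0.05
assert abs(cov01) < 0.05
```

The tolerances are generous relative to the Monte Carlo standard errors (roughly $0.005$ for the means at this number of trials), so the checks are stable under the fixed seed.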
4 Examples

The following example on reinsurance was considered by Hess, Macht, and Schmidt (1995). The sampling problem in health insurance occurs in Siegel (1995).

Excess of Loss Reinsurance: Let $(M, \mathcal{M}) := (\mathbb{R}, \mathcal{B}(\mathbb{R}))$. We consider the collective model of risk theory given by the random variable $N$, which represents the number of claims occurring in one year, and the sequence $\{Y_i\}_{i \in \mathbb{N}}$, where $Y_i$ represents the claim amount of claim $i \in \mathbb{N}$. It is assumed that the sequence $\{Y_i\}_{i \in \mathbb{N}}$ is i.i.d. and independent of $N$. In excess of loss reinsurance, the reinsurer covers for each individual claim that part of the claim amount which exceeds a given priority $c > 0$. The aggregate claim amount of the reinsurer is then given by
\[ S := \sum_{i=1}^{N} (Y_i - c)_+ \]
In general, the probability $P[(Y - c)_+ = 0]$ is large. It is therefore convenient to consider the thinned sequence of all claims exceeding the priority $c$. We define $H := \{0, 1\}$ and $M_1 := (c, \infty)$. Then
\[ N_1 = \sum_{i=1}^{N} \chi_{\{Y_i > c\}} \]
is the number of all claims exceeding the priority $c$. The aggregate claim amount of the reinsurer is the sample sum
\[ S = \sum_{i=1}^{N_1} \left( Y_i^{(1)} - c \right) \]
where $\{Y_i^{(1)}\}_{i \in \mathbb{N}}$ is i.i.d. with $P[Y^{(1)} > c] = 1$ and independent of $N_1$ by Theorem 2.1.

Health Insurance: We consider a portfolio of $n$ risks, that means $P[N = n] = 1$. The annual cost per head depends on the age of the insured and on the observation year. Let $\mathcal{X} \subseteq \mathbb{N}_0^2$ be the finite set consisting of the possible pairs of ages and observation years and define $(M, \mathcal{M}) := (\mathcal{X} \times \mathbb{R}, 2^{\mathcal{X}} \otimes \mathcal{B}(\mathbb{R}))$. Then the sequence of random variables $\{Y_i\}_{i \in \mathbb{N}}$ is assumed to be i.i.d. with $Y_i = (X_i, T_i, K_i)$, where $X_i$ is the age of the insured, $T_i$ is the observation period, and $K_i$ is the annual cost. In order to estimate the average cost per head for each age and each observation year, the sample $\{Y_i\}_{i \in \{1,\dots,n\}}$ has to be decomposed according to the values of the sample $\{(X_i, T_i)\}_{i \in \{1,\dots,n\}}$. Therefore we define $H := \mathcal{X}$ and $M_{(x,t)} := \{(x,t)\} \times \mathbb{R}$. Then for each $(x,t) \in \mathcal{X}$
\[ N_{(x,t)} = \sum_{i=1}^{n} \chi_{\{(X_i, T_i) = (x,t)\}} \]
is the random number of insured of age $x$ and observed in period $t$. The sampling variables of group $(x,t)$ are $Y_i^{(x,t)} = (X_i^{(x,t)}, T_i^{(x,t)}, K_i^{(x,t)}) = (x, t, K_i^{(x,t)})$. We are interested in an estimator of the average cost per head $\mathrm{E}[K^{(x,t)}]$ in each group. By Theorem 3.2 the sample mean of the thinned sequence
\[ \overline{K}_{(x,t)} := \frac{1}{N_{(x,t)}} \sum_{i=1}^{N_{(x,t)}} K_i^{(x,t)} \]
is an unbiased estimator of $\mathrm{E}[K^{(x,t)}]$ if we have at least one observation in group $(x,t) \in \mathcal{X}$. For further calculations like regression the theorem gives the variance-covariance structure of the estimators.

5 Proof of Theorem 2.1

We first prove that the sequence of the sampling variables of each group is i.i.d. (assertion (a)).

5.1 Theorem. For each $h \in H$ the sequence $\{Y_i^{(h)}\}_{i \in \mathbb{N}}$ is i.i.d. with
\[ P[Y^{(h)} \in A] = P[Y \in A \mid Y \in M_h] \]
for all $A \in \mathcal{M}$.

Proof. Let $h \in H$. For all $k \in \mathbb{N}$, let $\mathcal{E}(k)$ denote the collection of all strictly increasing sequences $\{m_i\}_{i \in \{1,\dots,k\}} \subseteq \mathbb{N}$. For $E = \{m_i\}_{i \in \{1,\dots,k\}} \in \mathcal{E}(k)$, define $J(E) := \{1,\dots,m_k\} \setminus E$. Then the identities
\[ \bigcap_{i=1}^{k} \{\nu_i^{(h)} = m_i\} = \bigcap_{m \in E} \{Y_m \in M_h\} \cap \bigcap_{m \in J(E)} \{Y_m \notin M_h\} \]
and hence
\[ P\left[\bigcap_{i=1}^{k} \{\nu_i^{(h)} = m_i\}\right] = \eta_h^{k} (1 - \eta_h)^{m_k - k} \]
hold for all $k \in \mathbb{N}$ and for all $E = \{m_i\}_{i \in \{1,\dots,k\}} \in \mathcal{E}(k)$. For two distinct sequences $\{m_i\}_{i \in \{1,\dots,k\}} \in \mathcal{E}(k)$ and $\{\overline{m}_i\}_{i \in \{1,\dots,k\}} \in \mathcal{E}(k)$ we have
\[ \bigcap_{i=1}^{k} \{\nu_i^{(h)} = m_i\} \cap \bigcap_{i=1}^{k} \{\nu_i^{(h)} = \overline{m}_i\} = \emptyset \]
Furthermore we get
\[ \sum_{E \in \mathcal{E}(k)} P\left[\bigcap_{i=1}^{k} \{\nu_i^{(h)} = m_i\}\right] = \sum_{E \in \mathcal{E}(k)} \eta_h^{k} (1 - \eta_h)^{m_k - k} = \sum_{m_k = k}^{\infty} \binom{m_k - 1}{k - 1} \eta_h^{k} (1 - \eta_h)^{m_k - k} = \sum_{l = 0}^{\infty} \binom{k + l - 1}{l} \eta_h^{k} (1 - \eta_h)^{l} = 1 \]
For all $A_1, \dots, A_k \in \mathcal{M}$ and $E = \{m_i\}_{i \in \{1,\dots,k\}} \in \mathcal{E}(k)$ we have
\[
\begin{aligned}
P\left[\bigcap_{i=1}^{k} \left( \{Y_i^{(h)} \in A_i\} \cap \{\nu_i^{(h)} = m_i\} \right)\right]
&= P\left[\bigcap_{i=1}^{k} \left( \{Y_{m_i} \in A_i\} \cap \{\nu_i^{(h)} = m_i\} \right)\right] \\
&= P\left[\bigcap_{i=1}^{k} \left( \{Y_{m_i} \in A_i\} \cap \{Y_{m_i} \in M_h\} \right) \cap \bigcap_{l \in J(E)} \{Y_l \notin M_h\}\right] \\
&= \prod_{i=1}^{k} P\left[\{Y_{m_i} \in A_i\} \cap \{Y_{m_i} \in M_h\}\right] \prod_{l \in J(E)} P[Y_l \notin M_h] \\
&= \prod_{i=1}^{k} P[Y_{m_i} \in A_i \mid Y_{m_i} \in M_h]\, P[Y_{m_i} \in M_h] \prod_{l \in J(E)} P[Y_l \notin M_h] \\
&= \prod_{i=1}^{k} P[Y \in A_i \mid Y \in M_h] \cdot P\left[\bigcap_{i=1}^{k} \{Y_{m_i} \in M_h\} \cap \bigcap_{l \in J(E)} \{Y_l \notin M_h\}\right] \\
&= \prod_{i=1}^{k} P[Y \in A_i \mid Y \in M_h] \cdot P\left[\bigcap_{i=1}^{k} \{\nu_i^{(h)} = m_i\}\right]
\end{aligned}
\]
Summation over all sequences in $\mathcal{E}(k)$ yields
\[ P\left[\bigcap_{i=1}^{k} \{Y_i^{(h)} \in A_i\}\right] = \prod_{i=1}^{k} P[Y \in A_i \mid Y \in M_h] \]
Using this identity we get
\[ P[Y_i^{(h)} \in A] = P[Y \in A \mid Y \in M_h] \]
for all $i \in \{1,\dots,k\}$ and therefore
\[ P\left[\bigcap_{i=1}^{k} \{Y_i^{(h)} \in A_i\}\right] = \prod_{i=1}^{k} P[Y_i^{(h)} \in A_i] \]
which completes the proof.

For the proof of assertion (b) we need a family of sequences which generalizes the set $\mathcal{E}(k)$ from the last proof. For $s \in \mathbb{N}$ and $k_1, \dots, k_s \in \mathbb{N}_0$ such that $\max\{k_1,\dots,k_s\} \geq 1$, and for each $r \in \{1,\dots,s\}$ denote by $\mathcal{D}_r(k_1,\dots,k_s)$ the collection of all $s$-tuples of strictly increasing sequences $\{m_i^{(j)}\}_{i \in \{1,\dots,l_j\}} \subseteq \mathbb{N}$ satisfying $l_r = k_r$ and $l_j \geq k_j$ as well as $m_{l_j}^{(j)} < m_{l_r}^{(r)} = l_1 + \dots + l_s$ for all $j \in \{1,\dots,s\} \setminus \{r\}$, such that some of these sequences may be empty and the disjoint union of these sequences is $\{1,\dots,l_1+\dots+l_s\}$. Further we define
\[ \mathcal{D}(k_1,\dots,k_s) := \bigcup_{r=1}^{s} \mathcal{D}_r(k_1,\dots,k_s) \]
Note that $\mathcal{D}_r(k_1,\dots,k_s) = \emptyset$ if $k_r = 0$. Furthermore there exists a bijection between $\mathcal{D}(k, 0)$ and $\mathcal{E}(k)$ as used in the proof of Theorem 5.1. For the proof of assertion (b) we need the following lemma.

5.2 Lemma. For all $s \in \mathbb{N}$ and $k_1,\dots,k_s \in \mathbb{N}_0$ with $\max\{k_1,\dots,k_s\} \geq 1$ the equation
\[ \sum_{\mathcal{D}(k_1,\dots,k_s)} \prod_{i=1}^{s} \vartheta_i^{l_i} = 1 \]
holds for each $\vartheta_1,\dots,\vartheta_s \in (0,1)$ with $\vartheta_1 + \dots + \vartheta_s = 1$.

Proof. We will prove this lemma by induction over both $s$ and $k_1 + \dots + k_s$. If $s = 1$ and $k_1 \geq 1$ the assertion follows immediately. Now we consider the case that at least one of the $k_j$ is equal to zero. Without loss of generality we assume that $k_s = 0$. In this case we have $s > 1$ and hence $\vartheta_i < 1$ for all $i \in \{1,\dots,s\}$. Since $\mathcal{D}_s(k_1,\dots,k_{s-1},0) = \emptyset$, counting the tuples in $\mathcal{D}_j(k_1,\dots,k_s)$ with prescribed lengths $l_i = r_i$ and summing the resulting negative binomial series over $r_s$, we obtain
\[
\begin{aligned}
\sum_{\mathcal{D}(k_1,\dots,k_s)} \prod_{i=1}^{s} \vartheta_i^{l_i}
&= \sum_{j=1}^{s-1} \sum_{\substack{r_1 \geq k_1,\, \dots,\, r_{s-1} \geq k_{s-1},\ r_s \geq 0 \\ r_j = k_j}} \#\{D \in \mathcal{D}_j(k_1,\dots,k_s) \mid l_i = r_i \text{ for all } i\} \prod_{i=1}^{s} \vartheta_i^{r_i} \\
&= \sum_{j=1}^{s-1} \sum_{\substack{r_1 \geq k_1,\, \dots,\, r_{s-1} \geq k_{s-1},\ r_s \geq 0 \\ r_j = k_j}} \frac{(r_1 + \dots + r_s - 1)!}{r_1! \cdots r_{j-1}!\, (r_j - 1)!\, r_{j+1}! \cdots r_s!} \prod_{i=1}^{s} \vartheta_i^{r_i} \\
&= \sum_{j=1}^{s-1} \sum_{\substack{r_1 \geq k_1,\, \dots,\, r_{s-1} \geq k_{s-1} \\ r_j = k_j}} \frac{(r_1 + \dots + r_{s-1} - 1)!}{r_1! \cdots (r_j - 1)! \cdots r_{s-1}!} \prod_{i=1}^{s-1} \vartheta_i^{r_i} \sum_{r_s = 0}^{\infty} \frac{\Gamma(r_1 + \dots + r_{s-1} + r_s)}{\Gamma(r_1 + \dots + r_{s-1})\, r_s!}\, \vartheta_s^{r_s} \\
&= \sum_{j=1}^{s-1} \sum_{\substack{r_1 \geq k_1,\, \dots,\, r_{s-1} \geq k_{s-1} \\ r_j = k_j}} \frac{(r_1 + \dots + r_{s-1} - 1)!}{r_1! \cdots (r_j - 1)! \cdots r_{s-1}!} \prod_{i=1}^{s-1} \left( \frac{\vartheta_i}{1 - \vartheta_s} \right)^{r_i} \\
&= \sum_{\mathcal{D}(k_1,\dots,k_{s-1})} \prod_{i=1}^{s-1} \left( \frac{\vartheta_i}{1 - \vartheta_s} \right)^{l_i} \\
&= 1
\end{aligned}
\]
by the induction hypothesis, since $\vartheta_1/(1-\vartheta_s) + \dots + \vartheta_{s-1}/(1-\vartheta_s) = 1$.

If all $k_j > 0$, then we split $\mathcal{D}(k_1,\dots,k_s)$ into $s$ parts: For each $r \in \{1,\dots,s\}$ denote by $\widetilde{\mathcal{D}}_r(k_1,\dots,k_s)$ the collection of all $s$-tuples $(\{m_i^{(1)}\}_{i \in \{1,\dots,l_1\}}, \dots, \{m_i^{(s)}\}_{i \in \{1,\dots,l_s\}}) \in \mathcal{D}(k_1,\dots,k_s)$ satisfying $m_1^{(r)} = 1$. Then we have
\[ \mathcal{D}(k_1,\dots,k_s) = \bigcup_{r=1}^{s} \widetilde{\mathcal{D}}_r(k_1,\dots,k_s) \]
Furthermore, there are obvious bijections between $\widetilde{\mathcal{D}}_r(k_1,\dots,k_{r-1},k_r,k_{r+1},\dots,k_s)$ and $\mathcal{D}(k_1,\dots,k_{r-1},k_r-1,k_{r+1},\dots,k_s)$. Therefore the assertion follows by induction.

Now we are able to prove that the sequences of sampling variables of different groups are independent (assertion (b)). Furthermore, we shall prove that the family of all these sequences is independent of the family of stopping times and the sample size. We will need this result for the proof of assertion (c).

5.3 Theorem. The family
\[ \left\{ \{Y_i^{(h)}\}_{i \in \mathbb{N}} \right\}_{h \in H} \]
of the thinned sequences is independent, and the pair
\[ \left\{ \left\{ \{Y_i^{(h)}\}_{i \in \mathbb{N}} \right\}_{h \in H},\ \left\{ \left\{ \{\nu_i^{(h)}\}_{i \in \mathbb{N}} \right\}_{h \in H},\ N \right\} \right\} \]
is independent.

Proof. For all families $\{k_h\}_{h \in H} \subseteq \mathbb{N}_0$ such that $\max\{k_h \mid h \in H\} \geq 1$ and for all $\{\{m_i^{(h)}\}_{i \in \{1,\dots,l_h\}}\}_{h \in H} \in \mathcal{D}(\{k_h\}_{h \in H})$ the identities
\[ \bigcap_{h \in H} \bigcap_{i=1}^{l_h} \{\nu_i^{(h)} = m_i^{(h)}\} = \bigcap_{h \in H} \bigcap_{i=1}^{l_h} \{Y_{m_i^{(h)}} \in M_h\} \]
and hence
\[ P\left[\bigcap_{h \in H} \bigcap_{i=1}^{l_h} \{\nu_i^{(h)} = m_i^{(h)}\}\right] = \prod_{h \in H} \eta_h^{l_h} \]
hold. Using Lemma 5.2 we get
\[ \sum_{\mathcal{D}(\{k_h\}_{h \in H})} P\left[\bigcap_{h \in H} \bigcap_{i=1}^{l_h} \{\nu_i^{(h)} = m_i^{(h)}\}\right] = 1 \]
Let $\{\{m_i^{(h)}\}_{i \in \{1,\dots,l_h\}}\}_{h \in H} \in \mathcal{D}(\{k_h\}_{h \in H})$ and $\{\{\overline{m}_i^{(h)}\}_{i \in \{1,\dots,\overline{l}_h\}}\}_{h \in H} \in \mathcal{D}(\{k_h\}_{h \in H})$ be two distinct families. Then we have
\[ \bigcap_{h \in H} \bigcap_{i=1}^{l_h} \{\nu_i^{(h)} = m_i^{(h)}\} \cap \bigcap_{h \in H} \bigcap_{i=1}^{\overline{l}_h} \{\nu_i^{(h)} = \overline{m}_i^{(h)}\} = \emptyset \]
By using Theorem 5.1 we get for all $k \in \mathbb{N}$ and $A_i^{(h)} \in \mathcal{M}$ for $i \in \{1,\dots,k\}$ and $h \in H$, for all families $\{\{m_i^{(h)}\}_{i \in \{1,\dots,l_h\}}\}_{h \in H} \in \mathcal{D}(\{k\}_{h \in H})$, and for all $n \in \mathbb{N}_0$
\[
\begin{aligned}
& P\left[\bigcap_{h \in H} \bigcap_{j=1}^{k} \{Y_j^{(h)} \in A_j^{(h)}\} \cap \bigcap_{h \in H} \bigcap_{i=1}^{l_h} \{\nu_i^{(h)} = m_i^{(h)}\} \cap \{N = n\}\right] \\
&\qquad = P\left[\bigcap_{h \in H} \left( \bigcap_{j=1}^{k} \left( \{Y_{m_j^{(h)}} \in A_j^{(h)}\} \cap \{Y_{m_j^{(h)}} \in M_h\} \right) \cap \bigcap_{i=k+1}^{l_h} \{Y_{m_i^{(h)}} \in M_h\} \right) \cap \{N = n\}\right] \\
&\qquad = \prod_{h \in H} \prod_{j=1}^{k} P\left[\{Y_{m_j^{(h)}} \in A_j^{(h)}\} \cap \{Y_{m_j^{(h)}} \in M_h\}\right] \prod_{h \in H} \prod_{i=k+1}^{l_h} P\left[Y_{m_i^{(h)}} \in M_h\right] P[N = n] \\
&\qquad = \prod_{h \in H} \prod_{j=1}^{k} P\left[Y_{m_j^{(h)}} \in A_j^{(h)} \,\middle|\, Y_{m_j^{(h)}} \in M_h\right] \prod_{h \in H} \prod_{i=1}^{l_h} P\left[Y_{m_i^{(h)}} \in M_h\right] P[N = n] \\
&\qquad = \prod_{h \in H} \prod_{j=1}^{k} P\left[Y_j^{(h)} \in A_j^{(h)}\right] \cdot P\left[\bigcap_{h \in H} \bigcap_{i=1}^{l_h} \{\nu_i^{(h)} = m_i^{(h)}\} \cap \{N = n\}\right]
\end{aligned}
\]
Summation over all $\{\{m_i^{(h)}\}_{i \in \{1,\dots,l_h\}}\}_{h \in H} \in \mathcal{D}(\{k\}_{h \in H})$ and all $n \in \mathbb{N}_0$ yields
\[ P\left[\bigcap_{h \in H} \bigcap_{j=1}^{k} \{Y_j^{(h)} \in A_j^{(h)}\}\right] = \prod_{h \in H} \prod_{j=1}^{k} P\left[Y_j^{(h)} \in A_j^{(h)}\right] \]
Hence it is clear that the sequences of sampling variables of different groups are independent. By using the last equality we also get
\[ P\left[\bigcap_{h \in H} \bigcap_{j=1}^{k} \{Y_j^{(h)} \in A_j^{(h)}\} \cap \bigcap_{h \in H} \bigcap_{i=1}^{l_h} \{\nu_i^{(h)} = m_i^{(h)}\} \cap \{N = n\}\right] = P\left[\bigcap_{h \in H} \bigcap_{j=1}^{k} \{Y_j^{(h)} \in A_j^{(h)}\}\right] \cdot P\left[\bigcap_{h \in H} \bigcap_{i=1}^{l_h} \{\nu_i^{(h)} = m_i^{(h)}\} \cap \{N = n\}\right] \]
It is easily seen that
\[ P\left[\bigcap_{h \in H_1} \bigcap_{j \in K_h} \{Y_j^{(h)} \in A_j^{(h)}\} \cap \bigcap_{h \in H_2} \bigcap_{i \in L_h} \{\nu_i^{(h)} = m_i^{(h)}\} \cap \{N = n\}\right] = P\left[\bigcap_{h \in H_1} \bigcap_{j \in K_h} \{Y_j^{(h)} \in A_j^{(h)}\}\right] \cdot P\left[\bigcap_{h \in H_2} \bigcap_{i \in L_h} \{\nu_i^{(h)} = m_i^{(h)}\} \cap \{N = n\}\right] \]
holds for all $H_1, H_2 \subseteq H$, finite $K_h, L_h \subseteq \mathbb{N}$, $A_j^{(h)} \in \mathcal{M}$ for $h \in H_1$ and $j \in K_h$, $m_i^{(h)} \in \mathbb{N}_0$ for $h \in H_2$ and $i \in L_h$, and $n \in \mathbb{N}_0$, which completes the proof.

Next we prove that the family of the sequences of sampling variables of each group is independent of the family of random sample sizes (assertion (c)).

5.4 Theorem. The pair
\[ \left\{ \left\{ \{Y_i^{(h)}\}_{i \in \mathbb{N}} \right\}_{h \in H},\ \{N_h\}_{h \in H} \right\} \]
is independent.

Proof. We have for each $h \in H$ and $n_h \in \mathbb{N}_0$ the identity
\[ \{N_h = n_h\} = \{\nu_{n_h}^{(h)} \leq N < \nu_{n_h+1}^{(h)}\} \]
For all $k \in \mathbb{N}$, $A_i^{(h)} \in \mathcal{M}$ for $i \in \{1,\dots,k\}$ and $h \in H$, and $\{n_h\}_{h \in H} \subseteq \mathbb{N}_0$ define $n := \sum_{h \in H} n_h$. By using Theorem 5.3 we get
\[
\begin{aligned}
P\left[\bigcap_{h \in H} \bigcap_{i=1}^{k} \{Y_i^{(h)} \in A_i^{(h)}\} \cap \bigcap_{h \in H} \{N_h = n_h\}\right]
&= P\left[\bigcap_{h \in H} \bigcap_{i=1}^{k} \{Y_i^{(h)} \in A_i^{(h)}\} \cap \bigcap_{h \in H} \{\nu_{n_h}^{(h)} \leq N < \nu_{n_h+1}^{(h)}\} \cap \{N = n\}\right] \\
&= P\left[\bigcap_{h \in H} \bigcap_{i=1}^{k} \{Y_i^{(h)} \in A_i^{(h)}\} \cap \bigcap_{h \in H} \{\nu_{n_h}^{(h)} \leq n < \nu_{n_h+1}^{(h)}\} \cap \{N = n\}\right] \\
&= P\left[\bigcap_{h \in H} \bigcap_{i=1}^{k} \{Y_i^{(h)} \in A_i^{(h)}\}\right] \cdot P\left[\bigcap_{h \in H} \{\nu_{n_h}^{(h)} \leq n < \nu_{n_h+1}^{(h)}\} \cap \{N = n\}\right] \\
&= P\left[\bigcap_{h \in H} \bigcap_{i=1}^{k} \{Y_i^{(h)} \in A_i^{(h)}\}\right] \cdot P\left[\bigcap_{h \in H} \{\nu_{n_h}^{(h)} \leq N < \nu_{n_h+1}^{(h)}\} \cap \{N = n\}\right] \\
&= P\left[\bigcap_{h \in H} \bigcap_{i=1}^{k} \{Y_i^{(h)} \in A_i^{(h)}\}\right] \cdot P\left[\bigcap_{h \in H} \{N_h = n_h\}\right]
\end{aligned}
\]
and the assertion follows.

We finish the proof of Theorem 2.1 by showing that the joint distribution of the random sample sizes of all groups is a conditional multinomial distribution given the sample size (assertion (d)).

5.5 Theorem. The conditional joint distribution of $\{N_h\}_{h \in H}$ given $N$ is the conditional multinomial distribution with the parameters $N$ and $\{\eta_h\}_{h \in H}$.

Proof. For all $\{n_h\}_{h \in H} \subseteq \mathbb{N}_0$ define $n := \sum_{h \in H} n_h$. If $P[N = n] > 0$, then we get
\[
\begin{aligned}
P\left[\bigcap_{h \in H} \{N_h = n_h\} \,\middle|\, N = n\right]
&= P\left[\bigcap_{h \in H} \{N_h = n_h\} \cap \{N = n\}\right] \Big/ P[N = n] \\
&= P\left[\bigcap_{h \in H} \left\{ \sum_{i=1}^{N} \chi_{\{Y_i \in M_h\}} = n_h \right\} \cap \{N = n\}\right] \Big/ P[N = n] \\
&= P\left[\bigcap_{h \in H} \left\{ \sum_{i=1}^{n} \chi_{\{Y_i \in M_h\}} = n_h \right\} \cap \{N = n\}\right] \Big/ P[N = n] \\
&= P\left[\bigcap_{h \in H} \left\{ \sum_{i=1}^{n} \chi_{\{Y_i \in M_h\}} = n_h \right\}\right] \\
&= \sum_{\substack{\{I_h\}_{h \in H},\ I_h \subseteq \{1,\dots,n\} \\ \#I_h = n_h}} P\left[\bigcap_{h \in H} \bigcap_{i \in I_h} \{Y_i \in M_h\}\right] \\
&= \sum_{\substack{\{I_h\}_{h \in H},\ I_h \subseteq \{1,\dots,n\} \\ \#I_h = n_h}} \prod_{h \in H} \eta_h^{n_h} \\
&= \frac{n!}{\prod_{h \in H} n_h!} \prod_{h \in H} \eta_h^{n_h}
\end{aligned}
\]
where the sums run over all partitions $\{I_h\}_{h \in H}$ of $\{1,\dots,n\}$ with $\#I_h = n_h$. The assertion now follows.

Acknowledgment

The author wishes to thank Klaus D. Schmidt for various discussions and for his helpful suggestions.

References

Cramér, H. (1946): Mathematical Methods of Statistics. Princeton, N.J.: Princeton University Press.

Franke, T.; Macht, W. (1995): Decomposition of Risk Processes. Dresdner Schriften zur Versicherungsmathematik 2/1995.

Hess, K. Th.; Schmidt, K. D. (1994): A Remark on Modelling IBNR Claim Numbers with Random Delay Pattern. Dresdner Schriften zur Versicherungsmathematik 4/1994.

Hess, K. Th.; Macht, W.; Schmidt, K. D. (1995): Thinning of Risk Processes. Dresdner Schriften zur Versicherungsmathematik 1/1995.

Schmidt, K. D. (1996): Lectures on Risk Theory. Stuttgart: B. G. Teubner.

Siegel, G. (1995): Gewichtete Ausgleichsverfahren für Kopfschadenreihen und Leistungsschätzungen in der PKV. Blätter DGVM 22, 419-441.

Stenger, H. (1986): Stichproben. Heidelberg, Wien: Physica-Verlag.

Klaus Th. Hess
Institut für Mathematische Stochastik
Technische Universität Dresden
D-01062 Dresden
Germany

April 2, 2009