Developng a Data Valdaton Tool Based on Mendelan Samplng Devatons Flppo Bscarn, Stefano Bffan and Fabola Canaves A.N.A.F.I. Italan Holsten Breeders Assocaton Va Bergamo, 292 Cremona, ITALY Abstract A more comprehensve valdaton procedure for nternatonal genetc evaluaton data s lkely to be welcome n a scenaro of growng exchange of lvestock and semen. Mendelan samplng (MS) devatons are used n ths paper to buld a tool for valdaton of natonal genetc proofs before submsson to the Interbull system. The regresson analyss of the mean of MS terms proved to be more senstve to small bases than the current valdaton methodology: t detected a bas that n the current stuaton would have a major nfluence on the nternatonal rankngs (295% more Italan bulls for proten yeld). Introducton The ssue of data qualty for the genetc evaluaton of dary cattle has become a great concern over tme, for a number of reasons such as: 1. the ncreased nternatonal trade of lvestock and cattle semen; 2. the related ncreasng mportance of the nternatonal genetc evaluaton of cattle carred out by the Interbull Centre; 3. the adopton, n many countres, of Test-day models for producton trats, whch restrcted the possblty of use of current valdaton methods; 4. the evaluaton of new trats, for whch Webull and threshold models are used, that make t more dffcult to apply the current valdaton methods; 5. the ncreasng demand for hghly accurate genetc nformaton from the ndustry, breeders and farmers. At present, Interbull checks for valdaton of genetc trend and sre standard devaton. The genetc trend valdaton test comprses three alternatve methods: the frst one compares frst-lactaton to multple-lactaton proofs; the second one analyses daughter yeld devatons (DYD) over tme; the last one s based on a regresson analyss of offcal proofs through years. These three methods all have some drawbacks: the frst one s not sutable for genetc evaluaton procedures that do not use a lactaton repeatablty model. DYDs are not easly estmated for Test-day models. The thrd method does not take nto account the contnuous updatng of genetc evaluaton technques. The sre standard devaton test s not entrely approprate for data valdaton purposes, snce t focuses only on the degree of change n sre standard devaton across consecutve MACE runs (Mglor et al., 22). Furthermore, as shown n fgure 1, a small bas that could not be detected by the current valdaton procedure (purposely just under the threshold of detecton) can lead to huge dfferences n the results of the nternatonal genetc evaluaton; the proporton of Italan bulls n the top1 rankng for proten yeld can be up to 19% before the bas s detected, a 295% ncrease compared to the offcal results (Bscarn et al., 26). Thus, there are both need and scope for the development of addtonal tools to valdate genetc evaluaton data: as suggested by prevous works (Van Doormal et al., 1999; Van Doormal & Mglor, 2; Mglor et al., 22) Mendelan samplng (MS) devatons can form the bass of a new data qualty assessment tool. 117
MS terms can be used to valdate both natonal data before submsson to the MACE model and nternatonal proofs after Interbull s calculatons. The theoretcal expectaton s that trends n mean and varance of MS should reman constant over tme. Objectve of ths paper s to present results of the use of MS terms to valdate natonal proofs before submsson for the MACE model under offcal and based crcumstances. Materal and Methods Ths study s based on the February 25 Holsten bulls genetc proofs for mlk, proten and fat yeld of 8 countres (Canada, Germany, Denmark, France, Italy, The Netherlands, USA and UK), whch consttute the nput data of the MACE model. Three sets of Italan nput EBVs were used: the offcal February 25 EBVs and two artfcally based sets: - one wth a large bas; - one wth a small bas. The Italan based data were compared to the offcal data from the other countres n reduced MACE runs. Both bases were constructed so to accumulate over tme, usng the followng quadratc functon: n y = a, where: y s the bas to be added to the th trat (mlk, proten, or fat yeld); a s an ad hoc coeffcent for each trat and each based dataset (large or small bas); n s the renumbered brth-year of bulls used as exponent of the functon. The coeffcents for mlk, fat and proten yeld were emprcally derved and were 1.31, 1.28, 1.26 and 1.4, 1.33, 1.33 for the small and large bas respectvely. For fat and proten yelds these coeffcents were dvded by 1, to account for the dfferences n magntude of ther standard devatons compared to that of mlk yeld. The same bases were added to the hol4 fles, wth nformaton on past genetc evaluatons, needed for the offcal trend valdaton method 3, whch analyses the varaton of natonal proofs across evaluaton runs (Bochard et al., 1994). For trend valdaton, Interbull method 3 and the regresson analyss of MS terms have been compared for effcacy. Accordng to the Interbull Code of Practce, the t parameter of method 3 must be lower than 2% of the standard devaton of the trat consdered for the data to be accepted (Interbull, 26); the same crteron has been used for the regresson coeffcents of MS trends. Wthn-country sre varances for each one of the 3 above mentoned Italan datasets, were calculated usng a copy of the offcal MACE programs provded by the Interbull Centre. SAS statstcal procedures have been used to generate the based datasets, and to valdate the genetc trend usng ether offcal method 3 or MS devatons trends. Results Frst of all, the mpact of bases on sre varance estmaton has been consdered: results are shown n table 1. Whle n the case of the large bas the varaton far exceeds the 5% lmt set by the Interbull Centre and data are therefore easly dscarded, when the bas s small the sre standard devatons are well wthn the boundares (,85, 1,3 and,81% for mlk, fat and proten respectvely), data are accepted and have qute an effect on fnal MACE results. 118
Interbull method 3 for trend valdaton (table 2) can t detect a small but not neglgble bas ether, both n the scenaro of a recently ntroduced bas (offcal hol4 fle) and of a long establshed one (based hol4 fle). When lookng at the regresson analyss of MS terms n table 3, t can be notced that ther average trend through years s more senstve to bases than method 3, beng able to detect also the small bas that has been ntroduced n the Italan data: regresson coeffcents are well beyond the lmt of the 2% of the standard devaton of the trat even n ths case. Ths can be deduced also vsually, lookng at the graph n fgure 2. Contrarwse, the regresson analyss of the varance of MS devatons does not seem very useful n assessng the valdty of natonal genetc evaluatons: trends look flat and regresson coeffcents reman always wthn the 2% threshold of acceptance. However, ths s lkely due to the addtve nature of the bas. Conclusons Current Interbull offcal valdaton procedures can hypothetcally allow for the ntroducton of bases wth potentally consderable effects on MACE results. The development of a new tool, based on MS terms, can provde a more senstve method to ensure a better data qualty. The regresson analyss of the average MS devatons, combned wth the other currently avalable methods, can help dentfy also small, but not neglgble, bases n natonal genetc evaluatons. MS varance does not seem as effectve. The analyss of MS of also nternatonal proofs, would properly supplement ths valdaton tool and can be the objectve of a further study. References Bscarn, F., Bffan, S. & Canaves, F. 26. The consequences of bases n the nternatonal genetc evalauton. Proceedngs of WCGALP 8 (submtted). Mglor, F., Sullvan, P. & Van Doormaal, B. 22. Prelmnary analyss of Mendelan samplng terms for genetc evaluaton valdaton. Interbull Bulletn 29, 183-187. Van Doormaal, B. & Mglor, F. 2. Trends n sre varance estmates by brth year. Interbull Bulletn 25, 7-73. Van Doormaal, B., Kstemaker, G. & Sullvan, P. 1999. Heterogeneous varances of Canadan Bull EBVs over tme. Interbull Bulletn 22, 141-148. Bonat, B., Bochard, D., Barbat, A. & Mattala, S. 1994. Three methods to valdate the estmaton of genetc trend n dary cattle. Interbull Bulletn 1, 9 pp. Interbull code of practce, 26. - http://wwwnterbull.slu.se/servce_documentaton/ge neral/code_of_practce/framesdacode.htm. Acc.29/5/6. SAS 1982. User s Gude: Statstcs. Verson 5.18. SAS Inst., Inc., Cary, NC. 119
Table 1. Italan sre standard devaton. uff small bas large bas mlk 355 358 443 fat 13,8 13,25 14,47 proten 11,15 11,24 12,83 mlk fat proten Table 2. Results of trend valdaton by means of Interbull method 3. offcal large bas small bas reference Trat t std err t std err t std err dev std 2% hol4_off 1,517 12,93-68,1 24,31-14,37 12,7 hol4_b4 - - -67,66 24,18 - - 875,396 17,58 hol4_b5 - - - - -14,11 12,7 hol4_off,347,53-1,96,84 -,62,55 hol4_b4 - - -1,71,76 - - 36,95,738 hol4_b5 - - - - -,42,53 hol4_off,334,47-2,35,79 -,45,46 hol4_b4 - - -2,7,7 - - 27,139,543 hol4_b5 - - - - -,21,45 Table 3. Regresson analyss of average and standard devaton of MS terms. Offcal feb 5 small bas large bas Ms_mlk ms_fat ms_prot ms_mlk ms_fat ms_prot ms_mlk ms_fat ms_prot mean ref std dev B -14,24 -,58 -,59 28,55 1,89,99 233,94 6,12 6,6 std err 1,55,7,4 6,37,31,2 35,19,85,83 std dev 875,396 36,95 27,139 mlk fat proten 2% 17,58,738,543 std err 1,55,32 1,11,5,3 7,5,14,14 B -1,78,23 -,21 -,55,6,5 23,31,38,42 Ms_stdm ms_stdf ms_stdp ms_stdm ms_stdf ms_stdp ms_stdm ms_stdf ms_stdp 12
Fgure 1. Effects of the sze of an addtve bas on nternatonal proten yeld rankngs. % ITA bulls n MACE top1 1,9,8,7,6,5,4,3,2,1 Influence of bas on results y =,175e,8273x,,17,117,13,146 log transformed bas top1 threshold functon Fgure 2. Trend of average MS for proten yeld. MS trend - proten mean ms_prot_uff ms_prot_bas5 16 ms_prot_bas4 14 12 1 8 6 4 2-2 1981 1982 1983 1984 1985 1986 1987 1988 1989 199 1991 1992 1993 1994 1995 1996 1997 1998 ms mean 1999 2 brthyear Fgure 3. Trend of the standard devaton of MS for proten yeld. MS Trend - proten std std_prot_off std_prot_largeb 2 std_prot_smallb 18 16 14 MS std 12 1 8 6 4 2 1981 1983 1985 1987 1989 1991 1993 1995 1997 1999 brthyear 121