Developing a Data Validation Tool Based on Mendelian Sampling Deviations

Similar documents
J. Dairy Sci. 91: doi: /jds American Dairy Science Association, 2008.

Comparison of Regression Lines

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method

A Robust Method for Calculating the Correlation Coefficient

STAT 511 FINAL EXAM NAME Spring 2001

Negative Binomial Regression

UNR Joint Economics Working Paper Series Working Paper No Further Analysis of the Zipf Law: Does the Rank-Size Rule Really Exist?

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

Statistical Evaluation of WATFLOOD

Copyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for U Charts. Dr. Wayne A. Taylor

Chapter 13: Multiple Regression

DETERMINATION OF UNCERTAINTY ASSOCIATED WITH QUANTIZATION ERRORS USING THE BAYESIAN APPROACH

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

USE OF DOUBLE SAMPLING SCHEME IN ESTIMATING THE MEAN OF STRATIFIED POPULATION UNDER NON-RESPONSE

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Copyright 2017 by Taylor Enterprises, Inc., All Rights Reserved. Adjusted Control Limits for P Charts. Dr. Wayne A. Taylor

/ n ) are compared. The logic is: if the two

Kernel Methods and SVMs Extension

Estimation of Genetic and Phenotypic Covariance Functions for Body Weight as Longitudinal Data of SD-II Swine Line

STATISTICS QUESTIONS. Step by Step Solutions.

[ ] λ λ λ. Multicollinearity. multicollinearity Ragnar Frisch (1934) perfect exact. collinearity. multicollinearity. exact

Chapter 9: Statistical Inference and the Relationship between Two Variables

SIMPLE LINEAR REGRESSION

Uncertainty in measurements of power and energy on power networks

Lecture 6: Introduction to Linear Regression

Evaluation of Validation Metrics. O. Polach Final Meeting Frankfurt am Main, September 27, 2013

Statistical registers by restricted neighbor imputation

Regression Analysis. Regression Analysis

Sampling Theory MODULE VII LECTURE - 23 VARYING PROBABILITY SAMPLING

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6

Chapter 11: Simple Linear Regression and Correlation

BOOTSTRAP METHOD FOR TESTING OF EQUALITY OF SEVERAL MEANS. M. Krishna Reddy, B. Naveen Kumar and Y. Ramu

Regulation No. 117 (Tyres rolling noise and wet grip adhesion) Proposal for amendments to ECE/TRANS/WP.29/GRB/2010/3

Uncertainty as the Overlap of Alternate Conditional Distributions

Recall that quantitative genetics is based on the extension of Mendelian principles to polygenic traits.

Linear Regression Analysis: Terminology and Notation

Turbulence classification of load data by the frequency and severity of wind gusts. Oscar Moñux, DEWI GmbH Kevin Bleibler, DEWI GmbH

Statistics II Final Exam 26/6/18

NUMERICAL DIFFERENTIATION

Computing MLE Bias Empirically

Andreas C. Drichoutis Agriculural University of Athens. Abstract

ONE DIMENSIONAL TRIANGULAR FIN EXPERIMENT. Technical Advisor: Dr. D.C. Look, Jr. Version: 11/03/00

2016 Wiley. Study Session 2: Ethical and Professional Standards Application

Chapter 6. Supplemental Text Material

arxiv:cs.cv/ Jun 2000

Supporting Information

Statistics for Economics & Business

Statistics for Managers Using Microsoft Excel/SPSS Chapter 13 The Simple Linear Regression Model and Correlation

Appendix B: Resampling Algorithms

Resource Allocation and Decision Analysis (ECON 8010) Spring 2014 Foundations of Regression Analysis

Sampling Theory MODULE V LECTURE - 17 RATIO AND PRODUCT METHODS OF ESTIMATION

on the improved Partial Least Squares regression

Phase I Monitoring of Nonlinear Profiles

ECONOMETRICS - FINAL EXAM, 3rd YEAR (GECO & GADE)

Homework Assignment 3 Due in class, Thursday October 15

x = , so that calculated

Statistics for Business and Economics

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA

ECONOMICS 351*-A Mid-Term Exam -- Fall Term 2000 Page 1 of 13 pages. QUEEN'S UNIVERSITY AT KINGSTON Department of Economics

Computation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models

Cokriging Partial Grades - Application to Block Modeling of Copper Deposits

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding

THE SUMMATION NOTATION Ʃ

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding

Supplemental document

Global Sensitivity. Tuesday 20 th February, 2018

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests

Basically, if you have a dummy dependent variable you will be estimating a probability.

Statistical tools to perform Sensitivity Analysis in the Context of the Evaluation of Measurement Uncertainty

experimenteel en correlationeel onderzoek

Regularized Discriminant Analysis for Face Recognition

The Granular Origins of Aggregate Fluctuations : Supplementary Material

One-sided finite-difference approximations suitable for use with Richardson extrapolation

AS-Level Maths: Statistics 1 for Edexcel

Ensemble Methods: Boosting

Uncertainty and auto-correlation in. Measurement

Effective plots to assess bias and precision in method comparison studies

x i1 =1 for all i (the constant ).

UCLA STAT 13 Introduction to Statistical Methods for the Life and Health Sciences. Chapter 11 Analysis of Variance - ANOVA. Instructor: Ivo Dinov,

STAT 3008 Applied Regression Analysis

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010

Development of a Semi-Automated Approach for Regional Corrector Surface Modeling in GPS-Levelling

A Hybrid Variational Iteration Method for Blasius Equation

HLW. Vol.9 No.2 [2,3] aleatory uncertainty 10. ignorance epistemic uncertainty. variability. variability ignorance. Keywords:

Introduction to Dummy Variable Regressors. 1. An Example of Dummy Variable Regressors

Lecture 4 Hypothesis Testing

Generalized Linear Methods

DERIVATION OF THE PROBABILITY PLOT CORRELATION COEFFICIENT TEST STATISTICS FOR THE GENERALIZED LOGISTIC DISTRIBUTION

1. Inference on Regression Parameters a. Finding Mean, s.d and covariance amongst estimates. 2. Confidence Intervals and Working Hotelling Bands

a. (All your answers should be in the letter!

Multivariate Ratio Estimator of the Population Total under Stratified Random Sampling

Introduction to Regression

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin

Fuzzy Boundaries of Sample Selection Model

On Outlier Robust Small Area Mean Estimate Based on Prediction of Empirical Distribution Function

NEW ASTERISKS IN VERSION 2.0 OF ACTIVEPI

Genetic Evaluation of Fertility Traits of Dairy Cattle Using a Multiple Trait Model

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Transcription:

Developng a Data Valdaton Tool Based on Mendelan Samplng Devatons Flppo Bscarn, Stefano Bffan and Fabola Canaves A.N.A.F.I. Italan Holsten Breeders Assocaton Va Bergamo, 292 Cremona, ITALY Abstract A more comprehensve valdaton procedure for nternatonal genetc evaluaton data s lkely to be welcome n a scenaro of growng exchange of lvestock and semen. Mendelan samplng (MS) devatons are used n ths paper to buld a tool for valdaton of natonal genetc proofs before submsson to the Interbull system. The regresson analyss of the mean of MS terms proved to be more senstve to small bases than the current valdaton methodology: t detected a bas that n the current stuaton would have a major nfluence on the nternatonal rankngs (295% more Italan bulls for proten yeld). Introducton The ssue of data qualty for the genetc evaluaton of dary cattle has become a great concern over tme, for a number of reasons such as: 1. the ncreased nternatonal trade of lvestock and cattle semen; 2. the related ncreasng mportance of the nternatonal genetc evaluaton of cattle carred out by the Interbull Centre; 3. the adopton, n many countres, of Test-day models for producton trats, whch restrcted the possblty of use of current valdaton methods; 4. the evaluaton of new trats, for whch Webull and threshold models are used, that make t more dffcult to apply the current valdaton methods; 5. the ncreasng demand for hghly accurate genetc nformaton from the ndustry, breeders and farmers. At present, Interbull checks for valdaton of genetc trend and sre standard devaton. The genetc trend valdaton test comprses three alternatve methods: the frst one compares frst-lactaton to multple-lactaton proofs; the second one analyses daughter yeld devatons (DYD) over tme; the last one s based on a regresson analyss of offcal proofs through years. These three methods all have some drawbacks: the frst one s not sutable for genetc evaluaton procedures that do not use a lactaton repeatablty model. DYDs are not easly estmated for Test-day models. The thrd method does not take nto account the contnuous updatng of genetc evaluaton technques. The sre standard devaton test s not entrely approprate for data valdaton purposes, snce t focuses only on the degree of change n sre standard devaton across consecutve MACE runs (Mglor et al., 22). Furthermore, as shown n fgure 1, a small bas that could not be detected by the current valdaton procedure (purposely just under the threshold of detecton) can lead to huge dfferences n the results of the nternatonal genetc evaluaton; the proporton of Italan bulls n the top1 rankng for proten yeld can be up to 19% before the bas s detected, a 295% ncrease compared to the offcal results (Bscarn et al., 26). Thus, there are both need and scope for the development of addtonal tools to valdate genetc evaluaton data: as suggested by prevous works (Van Doormal et al., 1999; Van Doormal & Mglor, 2; Mglor et al., 22) Mendelan samplng (MS) devatons can form the bass of a new data qualty assessment tool. 117

MS terms can be used to valdate both natonal data before submsson to the MACE model and nternatonal proofs after Interbull s calculatons. The theoretcal expectaton s that trends n mean and varance of MS should reman constant over tme. Objectve of ths paper s to present results of the use of MS terms to valdate natonal proofs before submsson for the MACE model under offcal and based crcumstances. Materal and Methods Ths study s based on the February 25 Holsten bulls genetc proofs for mlk, proten and fat yeld of 8 countres (Canada, Germany, Denmark, France, Italy, The Netherlands, USA and UK), whch consttute the nput data of the MACE model. Three sets of Italan nput EBVs were used: the offcal February 25 EBVs and two artfcally based sets: - one wth a large bas; - one wth a small bas. The Italan based data were compared to the offcal data from the other countres n reduced MACE runs. Both bases were constructed so to accumulate over tme, usng the followng quadratc functon: n y = a, where: y s the bas to be added to the th trat (mlk, proten, or fat yeld); a s an ad hoc coeffcent for each trat and each based dataset (large or small bas); n s the renumbered brth-year of bulls used as exponent of the functon. The coeffcents for mlk, fat and proten yeld were emprcally derved and were 1.31, 1.28, 1.26 and 1.4, 1.33, 1.33 for the small and large bas respectvely. For fat and proten yelds these coeffcents were dvded by 1, to account for the dfferences n magntude of ther standard devatons compared to that of mlk yeld. The same bases were added to the hol4 fles, wth nformaton on past genetc evaluatons, needed for the offcal trend valdaton method 3, whch analyses the varaton of natonal proofs across evaluaton runs (Bochard et al., 1994). For trend valdaton, Interbull method 3 and the regresson analyss of MS terms have been compared for effcacy. Accordng to the Interbull Code of Practce, the t parameter of method 3 must be lower than 2% of the standard devaton of the trat consdered for the data to be accepted (Interbull, 26); the same crteron has been used for the regresson coeffcents of MS trends. Wthn-country sre varances for each one of the 3 above mentoned Italan datasets, were calculated usng a copy of the offcal MACE programs provded by the Interbull Centre. SAS statstcal procedures have been used to generate the based datasets, and to valdate the genetc trend usng ether offcal method 3 or MS devatons trends. Results Frst of all, the mpact of bases on sre varance estmaton has been consdered: results are shown n table 1. Whle n the case of the large bas the varaton far exceeds the 5% lmt set by the Interbull Centre and data are therefore easly dscarded, when the bas s small the sre standard devatons are well wthn the boundares (,85, 1,3 and,81% for mlk, fat and proten respectvely), data are accepted and have qute an effect on fnal MACE results. 118

Interbull method 3 for trend valdaton (table 2) can t detect a small but not neglgble bas ether, both n the scenaro of a recently ntroduced bas (offcal hol4 fle) and of a long establshed one (based hol4 fle). When lookng at the regresson analyss of MS terms n table 3, t can be notced that ther average trend through years s more senstve to bases than method 3, beng able to detect also the small bas that has been ntroduced n the Italan data: regresson coeffcents are well beyond the lmt of the 2% of the standard devaton of the trat even n ths case. Ths can be deduced also vsually, lookng at the graph n fgure 2. Contrarwse, the regresson analyss of the varance of MS devatons does not seem very useful n assessng the valdty of natonal genetc evaluatons: trends look flat and regresson coeffcents reman always wthn the 2% threshold of acceptance. However, ths s lkely due to the addtve nature of the bas. Conclusons Current Interbull offcal valdaton procedures can hypothetcally allow for the ntroducton of bases wth potentally consderable effects on MACE results. The development of a new tool, based on MS terms, can provde a more senstve method to ensure a better data qualty. The regresson analyss of the average MS devatons, combned wth the other currently avalable methods, can help dentfy also small, but not neglgble, bases n natonal genetc evaluatons. MS varance does not seem as effectve. The analyss of MS of also nternatonal proofs, would properly supplement ths valdaton tool and can be the objectve of a further study. References Bscarn, F., Bffan, S. & Canaves, F. 26. The consequences of bases n the nternatonal genetc evalauton. Proceedngs of WCGALP 8 (submtted). Mglor, F., Sullvan, P. & Van Doormaal, B. 22. Prelmnary analyss of Mendelan samplng terms for genetc evaluaton valdaton. Interbull Bulletn 29, 183-187. Van Doormaal, B. & Mglor, F. 2. Trends n sre varance estmates by brth year. Interbull Bulletn 25, 7-73. Van Doormaal, B., Kstemaker, G. & Sullvan, P. 1999. Heterogeneous varances of Canadan Bull EBVs over tme. Interbull Bulletn 22, 141-148. Bonat, B., Bochard, D., Barbat, A. & Mattala, S. 1994. Three methods to valdate the estmaton of genetc trend n dary cattle. Interbull Bulletn 1, 9 pp. Interbull code of practce, 26. - http://wwwnterbull.slu.se/servce_documentaton/ge neral/code_of_practce/framesdacode.htm. Acc.29/5/6. SAS 1982. User s Gude: Statstcs. Verson 5.18. SAS Inst., Inc., Cary, NC. 119

Table 1. Italan sre standard devaton. uff small bas large bas mlk 355 358 443 fat 13,8 13,25 14,47 proten 11,15 11,24 12,83 mlk fat proten Table 2. Results of trend valdaton by means of Interbull method 3. offcal large bas small bas reference Trat t std err t std err t std err dev std 2% hol4_off 1,517 12,93-68,1 24,31-14,37 12,7 hol4_b4 - - -67,66 24,18 - - 875,396 17,58 hol4_b5 - - - - -14,11 12,7 hol4_off,347,53-1,96,84 -,62,55 hol4_b4 - - -1,71,76 - - 36,95,738 hol4_b5 - - - - -,42,53 hol4_off,334,47-2,35,79 -,45,46 hol4_b4 - - -2,7,7 - - 27,139,543 hol4_b5 - - - - -,21,45 Table 3. Regresson analyss of average and standard devaton of MS terms. Offcal feb 5 small bas large bas Ms_mlk ms_fat ms_prot ms_mlk ms_fat ms_prot ms_mlk ms_fat ms_prot mean ref std dev B -14,24 -,58 -,59 28,55 1,89,99 233,94 6,12 6,6 std err 1,55,7,4 6,37,31,2 35,19,85,83 std dev 875,396 36,95 27,139 mlk fat proten 2% 17,58,738,543 std err 1,55,32 1,11,5,3 7,5,14,14 B -1,78,23 -,21 -,55,6,5 23,31,38,42 Ms_stdm ms_stdf ms_stdp ms_stdm ms_stdf ms_stdp ms_stdm ms_stdf ms_stdp 12

Fgure 1. Effects of the sze of an addtve bas on nternatonal proten yeld rankngs. % ITA bulls n MACE top1 1,9,8,7,6,5,4,3,2,1 Influence of bas on results y =,175e,8273x,,17,117,13,146 log transformed bas top1 threshold functon Fgure 2. Trend of average MS for proten yeld. MS trend - proten mean ms_prot_uff ms_prot_bas5 16 ms_prot_bas4 14 12 1 8 6 4 2-2 1981 1982 1983 1984 1985 1986 1987 1988 1989 199 1991 1992 1993 1994 1995 1996 1997 1998 ms mean 1999 2 brthyear Fgure 3. Trend of the standard devaton of MS for proten yeld. MS Trend - proten std std_prot_off std_prot_largeb 2 std_prot_smallb 18 16 14 MS std 12 1 8 6 4 2 1981 1983 1985 1987 1989 1991 1993 1995 1997 1999 brthyear 121