Atte Moilanen 1, Liselotte Sundström 2 & Jes Søe Pedersen 3


MATESOFT
A PROGRAM FOR GENETIC ANALYSIS OF MATING SYSTEMS
VERSION.0 DOCUMENTATION

Atte Moilanen 1, Liselotte Sundström 2 & Jes Søe Pedersen 3

1 Implementation and algorithms
ajmoilan@mappi.helsinki.fi
Department of Biological and Environmental Sciences
University of Helsinki

2 Concept and testing
liselotte.sundstrom@helsinki.fi
Department of Biological and Environmental Sciences
University of Helsinki

3 Algorithms, documentation and testing
JSPedersen@bi.ku.dk
Department of Population Biology
Institute of Biology
University of Copenhagen

November 2006

MateSoft.Documentation.(6).doc/JSP

Table of Contents

A. MATESOFT HELP
    Introduction
    Data Types and Analyses
    General Info on Input Format
    F Data: Deducing Queens from Known Offspring
    FQ Data: Deducing Fathers and Assigning Patrilines
    FQM Data: Mating Frequency Statistics
    QM Data: Mating Frequency Statistics
    File Configuration Menu
    Queen and Mate Deduction Menu
    Mating Frequency Statistics Menu
    Mating Frequency Statistics Output
    Troubleshooting and Special Use
    FAQ
    List of Example Files

B. DEDUCING QUEEN GENOTYPES: ALGORITHMS FOR F DATA ANALYSIS
    Power Analysis
    Deducing Putative Queen Genotypes
    Worked-out Examples

C. DEDUCING MATE GENOTYPES AND ASSIGNING PATRILINES: ALGORITHMS FOR FQ DATA ANALYSIS
    Defining Possible Fathers
    Selecting Putative Fathers (Parentage Analysis)
    Output Patriline Assignment and Putative Mates
    Some Simple Worked-out Examples

D. ESTIMATION OF MATING FREQUENCY STATISTICS
    Estimation of the Paternity Skew c
    Estimation of the Nonidentification Error f
    Estimation of the Proportion of Double-Mated Queens D_est and Effective Mate Number m_e,p
    Estimations Based on Sperm Typing
    Statistics for Data with Unlimited Number of Queen Matings
    Statistics for Data with Only Single Matings Detected

E. RELEVANT SOFTWARE AND LITERATURE
    Current Software for Parentage Analysis
    Literature

A. MATESOFT HELP

1. Introduction

MATESOFT is a program for the analysis of mating systems in male-haplodiploid organisms based on co-dominant genetic marker data. It is intended to be used in studies of hymenopteran social insects, where queens may have mated with one or several males and males may sire a variable proportion of the queen's offspring. The genetic data can be queen genotypes, genotypes of single-queen female offspring, and genotypes of sperm stored in the queen's spermatheca.

The current version of MATESOFT has the following main features:

- Deduction of possible queen genotypes from offspring data when the mother could not be analysed (so-called F data; F stands for female offspring).
- Deduction of genotypes of putative males mated with the queen from offspring and queen data (FQ data).
- Assignment of offspring to patrilines corresponding to the queen's putative mates (also FQ data).
- Estimation of mating frequency statistics from data where the queen can be either single or double mated. The statistics calculated are paternity skew, proportion of multiple mated queens, and average effective mate number, using the algorithms in Pedersen and Boomsma (1999a). The input is either genotypes of queens, their putative mates and offspring assigned to patrilines (FQM data), or genotypes of queens and sperm from their spermathecae (QM data).
- Estimation of summary mating frequency statistics from data where the queen can have any number of matings (FQM data). These include the sum of squared paternity contributions, frequency distribution of observed mate number, and average observed mate number.

Deduction of parental genotypes and patriline assignment can be done for any number of queen matings. There are currently no general procedures available for estimating the effective mate number and related statistics in systems where a queen may have an arbitrary number of mates, each contributing an unknown proportion to the brood. We hope, however, to include methods for this in a future version of MATESOFT.

System requirements: MATESOFT runs on any PC under a 32 bit version of Microsoft Windows and will also work under PC-Windows emulation on a Macintosh.

Availability: MATESOFT can be downloaded free of charge.

Registration: Please register by using the form on the MATESOFT homepage. This ensures that we can keep you updated about bugs and new releases.

Support: If you have a problem or question, please first have a close look at the example files and check the FAQ section. If you remain puzzled, contact Atte Moilanen about possible bugs or how to make the program work, or Jes Søe Pedersen or Lotta Sundström about any other issue.

Citation as publication (preferred): Moilanen A, Sundström L & Pedersen JS (2004) MateSoft: a program for deducing parental genotypes and estimating mating system statistics in haplodiploid species. Molecular Ecology Notes, 4.

Citation as software (alternative): Moilanen A, Sundström L & Pedersen JS (2004) MateSoft: a program for genetic analysis of mating systems.0. Institute of Biology, University of Copenhagen, Copenhagen.

Acknowledgements: We thank the beta users for their feedback on software performance, and in particular Koos Boomsma, Elisabeth Brunner, Michael Haberl, Annette Bruun Jensen, Daniel Kronauer, Cathy Liautard, Alexandra Schrempf, Christoph Strehl, Seirian Sumner, and Palle Villesen. The development of this software was supported by the Carlsberg Foundation (J.S.P.), the Swiss National Science Foundation, the Danish National Science Research Council (J.S.P.), the Academy of Finland (L.S. and the Spatial Ecology Programme), and the FW5 EU Research-training network INSECTS (contract HPRN-CT ).

The present Documentation is divided into five chapters. Chapter A provides the information needed for most users, including the formats of input and output data and explanations of the various program menus. The three subsequent chapters give a detailed account of the algorithms and estimations performed by the program. Finally, Chapter E lists other available software for parentage and related analysis along with relevant literature. Furthermore, a set of example files is included in the MATESOFT distribution package.

2. Data Types and Analyses

To estimate mating frequency statistics from brood data you need to know the following: population allele frequencies, offspring genotypes, queen genotypes, male genotypes, and the sire of each offspring.
If the queen genotypes are not known, MATESOFT can deduce them from the offspring data, assuming the lowest number of queen mates that can explain the genotype array among offspring. Furthermore, if genotypes of putative mates are not deduced by the user, MATESOFT can do the analysis and assign offspring to patrilines, thereby completing the information for estimating mating frequency statistics. The deduction and handling of parental genotypes are less trivial than often assumed in such studies, and the procedures implemented in MATESOFT replace error-prone analyses previously done by hand. When the statistics are based on sperm from the queen's spermatheca, only queen and sperm genotypes in addition to the population allele frequencies are needed for the estimations.

In overview, MATESOFT is able to handle four different data types, called as follows:

- F data: Genotypes of female offspring sorted in groups of sisters.
- FQ data: As F data but also including actual or deduced queen genotypes.
- FQM data: As FQ data but also including deduced mate genotypes and assignment of offspring to putative patrilines.
- QM data: Genotypes of queens and the sperm stored in their spermathecae. Contains no offspring genotypes.

These data types are used in the following analyses, where each arrow corresponds to processing in MATESOFT with input and output files:

    F.in -> F.out
    FQ.in -> FQ.out
    FQM.in -> FQM mating frequency statistics
    QM.in -> QM mating frequency statistics

In F and FQ analyses two output files are produced. One output file, referred to as the extended output data file, contains the original data plus the further information deduced by the analysis. This file can usually be applied directly as input file at the following step, but in any case the file should be inspected and modifications may be needed before proceeding. The other file produced, simply referred to as the output file, saves the analysis details also given in the screen output. A further description can be found in the relevant sections, and example files are provided for all cases.

Estimations done by MATESOFT are based on the following assumptions:

- Loci are neutral and unlinked.
- Population allele frequencies are the same for males and females.
- Queens are not related to their mates, i.e. no regular inbreeding.
- Multiple mates of the same queen are not related.

3. General Info on Input Format

This section gives information that applies to all data types regarding the format of the input file. Please refer to the file {Ex3-FQM.in.txt} for an example of most of the features mentioned.

To the extent possible MATESOFT uses the common file format of RELATEDNESS and KINSHIP 1.3 (both by Keith Goodnight and Dave Queller, see Section E.1), so that data can be transferred between these programs with only moderate file modifications. In particular, genotypes can be given in the same format, and through a configuration dialogue box the user can control how the individual data are loaded by the program.

The input file is plain text and can be prepared in any text editor, or in a spreadsheet program like Excel by using the save as text option. The file is organised as a spreadsheet with rows and columns. Any row starting with an asterisk (*) is treated as a comment and simply disregarded by the program. IMPORTANT: all other rows are read with any white-space (space and/or tab stop) delimiting the columns. Empty columns are only allowed at the end of a row. If other columns contain no information they should be filled with an asterisk as place holder, e.g. Queen4<tab>*<tab>*<tab>30/36 in the case with two empty columns before the genotype. It follows that the first column cannot be empty, as the row would otherwise be interpreted as a comment line. Furthermore, no spaces are allowed in variable names, e.g. use Queen4 instead of Queen 4. Keeping these rules ensures easy inspection of the file for formatting errors and that MATESOFT counts the columns correctly.
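The row rules above (comment rows, white-space delimited columns, asterisk place holders) can be illustrated with a minimal Python sketch. It is not MATESOFT's own loader: the function name `parse_row` is hypothetical, and treating blank rows and rows with leading white-space before the asterisk as comments is an assumption made for the illustration.

```python
def parse_row(line):
    """Split one input row into columns, following the rules described above.

    Returns None for comment rows (first non-blank character is '*') and,
    as an assumption of this sketch, for blank rows. A lone '*' column is
    a place holder and is returned as None.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("*"):
        return None
    # str.split() with no argument splits on any run of spaces and tabs.
    return [None if column == "*" else column for column in stripped.split()]


if __name__ == "__main__":
    print(parse_row("Queen4\t*\t*\t30/36"))
    print(parse_row("*Group data section starts below this line"))
```

Because a row whose first column is a place holder would also start with an asterisk, the sketch treats it as a comment, which matches the rule that the first column cannot be empty.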

The input is divided in two sections: one for group data and one for individual genotype data.

3.1 Group Data Section

The group data section has three subsections, starting with the lines Demes, K-Groups and Loci and allele frequencies, respectively, and each concluding with the line end. IMPORTANT: The identification of these lines is case-sensitive and the text should be entered exactly as shown here. The first part of the example file {Ex3-FQ.in.txt} illustrates the general lay-out:

    *MateSoft Data File: Fictive Sample Data #3
    *The dataset has two demes, three loci analysed, single/double mating of queens
    *Group data section starts below this line
    Demes
    IslandA
    IslandB
    end
    *Group Deme Included
    K-Groups
    IAN IslandA
    IAN IslandA
    IAN3 IslandA
    IAN4 IslandA
    IAN5 IslandA
    IAN6 IslandA
    IAN7 IslandA
    IAN8 IslandA
    IBN IslandB
    IBN IslandB
    IBN3 IslandB
    IBN4 IslandB
    IBN5 IslandB
    IBN6 IslandB
    IBN7 IslandB
    end
    *
    Loci and allele frequencies
    *Locus IslandA IslandB
    L-Vsp0b
    L-Ric
    L-PGM
    f
    s
    end
    *

Demes lists the names of all populations to which the groups (see below) belong. For each group the allele frequencies of its deme will be used as the population reference in all relevant calculations. Hence, it functions the same way as the deme option in RELATEDNESS. A deme name should be given even if all groups come from the same population.

K-Groups lists all group names used by the individuals in the genotype data section. A group comprises all individuals, actual or putative, pertaining to a single queen, her male mates, and her offspring. This is the important unit for estimating the mating frequency statistics and is analogous to the group level in RELATEDNESS. The second column shows to what deme the group belongs. Groups can be included or excluded in the subsequent analysis of the data by assigning 1 (include) or 0 (exclude) in the third column.

As the name indicates, Loci and allele frequencies gives the names of all loci studied and the frequency of all alleles found in each deme. For each locus, the first row starts with the locus name preceded by L-, e.g. L-Loc to indicate locus Loc. The second and subsequent columns give the sample size per deme as the number of diploid genomes (i.e. diploid and haploid individuals counted as 1 and 0.5, respectively) analysed at this locus. Please note that the sample size refers to the individuals analysed for estimates of allele frequencies and not to the individuals in the present dataset. Ideally the estimated allele frequencies for the background population should be based on a large sample of unrelated individuals different from the ones included in the data file. Data for multiple demes should be given in the same order as in the deme list above. The rows below the locus name give the names and frequencies of each allele, deme by deme. The information for all loci is given this way, one after another.

The variable names for demes, groups, loci, and alleles can be any combination of up to eight letters and figures (case-sensitive). Longer names will be truncated to eight characters by the program when loading the data. Spaces and punctuation characters are not allowed, i.e. F 7, F-7, F_7, and F.7 are invalid names.

3.2 Individual Genotype Data Section

The section with individual data starts with the line Individual genotype data section and concludes with the line end, both case-sensitive. The second row contains labels for the variables as column headings, and subsequent rows give the individual data. The individuals can be given in any order, as the data in each row is loaded independently and sorted by the program. The format is illustrated by a stretch of the FQ data in {Ex3-FQ.in.txt}:

    Individual genotype data section starts below this line
    Ind-ID K-Group Class AltQ/M Alt-P Vsp0b Ric45 PGM
    IANQ IAN Q.0000 /4 60/7 s/s
    IANF0 IAN F * * /0 7/8 f/s
    IANF0 IAN F * * / 63/7 f/s
    IANF03 IAN F * * 4/0 60/8 f/s
    IANF04 IAN F * * 4/0 60/8 f/s
    IANF05 IAN F * * / 60/63 f/s
    { }
    IAN3Q IAN3 Q.0000 /8 60/69 f/s
    IAN3F0 IAN3 F * * 8/8 63/69 f/s
    IAN3F0 IAN3 F * * 8/8 60/63 f/s
    IAN3F03 IAN3 F * * /8 60/63 f/f
    IAN3F04 IAN3 F * * 8/8 63/69 f/s
    IAN3F05 IAN3 F * * 8/8 60/63 f/f
    IAN3F06 IAN3 F * * ?/? 63/69 f/f
    IAN3F07 IAN3 F * * 8/8 63/69 f/s

and by some of the FQM data in {Ex-FQM.in.txt}:

    Individual genotype data section starts below this line
    * Here starts computer generated section of queen genotypes
    *Male genotypes in the below section are computer generated
    *Ind-ID K-Group Class AltQ/M Alt-P Loc
    Q G0 Q.0000 aa
    M G0 M /.0000 a
    F00 G0 F * * aa
    F00 G0 F * * aa
    F003 G0 F * * aa
    F004 G0 F * * aa
    F005 G0 F * * aa
    { }
    Q0 G08 Q ab
    Q G08 Q cd
    M4 G08 M /.0000 c
    M5 G08 M /.0000 d
    M6 G08 M /.0000 a
    M7 G08 M /.0000 b
    F085 G08 F * * ac
    F086 G08 F * * bc
    F087 G08 F * * bc
    F088 G08 F * * bc
    F089 G08 F * * ad
    F090 G08 F * * ac
    F09 G08 F * * ad
    F09 G08 F * * bc
    F093 G08 F * * ac
    F094 G08 F * * bd
    F095 G08 F * * bd
    F096 G08 F * * ac

Like the individuals, variables can be in any order, and columns may contain data not used by MATESOFT, as the user tells the program which columns to read and which variables they represent. For the same reason the actual column headings used in the file don't matter. In the following list of variables, R indicates that an analogous variable is used in RELATEDNESS (small r if it can be used as a demographic variable in this program), and numbers 1-4 show whether the variable is relevant for the F, FQ, FQM or QM data type, respectively:

    Variable  Type    Explanation
    Ind-ID    1234 R  Individual ID. A unique label for each individual in the data set. Individuals having several alternative genotypes listed (see below) should still have a unique ID for each of their alternatives, even though the alternatives represent the same, true individual.
    K-Group   1234 R  Group. Use the same name for all individuals pertaining to the same queen. All group names should be given in the list in the group data section.
    Class     1234 r  Class. Type of individual, being either queen (Q), mate (M) or female offspring (F). Only capital letters are allowed. There is no distinction between putative male mates and sperm scored.
    AltQ/M    23 r    Alternative Queen/Mate. This is used to distinguish between possible alternative genotypes of the same, true individual. Read more about this variable in the sections on FQ and FQM data.
    Alt-P     23      Probability of Alternative. This is the weighted probability of the alternative, so that all probabilities for a true individual sum to one.
    Locus#    1234 R  Locus Name #. Loci should be given in the same order as in the list of loci and allele frequencies in the group data section.
    PatriQ#   3 r     Patrilines for Queen Alternative #. Every mate and his putative offspring (i.e. a patriline) are assigned the same number. There is one column of such paternity assignments for each alternative genotype of the queen (#). See the section on FQM data for details.

As for demes, groups, loci, and alleles, variable names for individual IDs can be any combination of up to eight letters and figures, case-sensitive and not including spaces or punctuation characters. The variables AltQ/M and Alt-P are not relevant for offspring, and PatriQ# has no use in queens and mates. In these cases, and if the variables are not in the last columns, * should be entered as a place holder to ensure correct loading of data.

Diploid genotypes for queens and offspring are entered as the names of the two alleles scored, separated by a special character, e.g. 56/59. Haploid genotypes for male mates are entered simply as the allele scored. All alleles in the genotypic data should be defined with their name and population (deme) frequencies in the group data section. A missing genotype is indicated by ?/? or just ?, independent of the ploidy of the individual. Having only one of two alleles scored for diploids, like 56/?, is not allowed. Both the character used as allele delimiter and the missing value indicator can be defined by the user in the file configuration menu, the default characters being / and ?, respectively. Alternatively, no allele delimiter may be used if all alleles are given by single characters, as in the example above from {Ex-FQM.in.txt}.

The individuals can be given in any order, and the data rows can be separated by comment lines (preceded by *) for easier inspection of the file.
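The naming and genotype rules above can be summarised in a short Python sketch. It is an illustration under stated assumptions, not MATESOFT's implementation: the function names `normalise_name` and `parse_genotype` are hypothetical, and passing `delimiter=None` is this sketch's own way of expressing the "no allele delimiter, single-character alleles" case.

```python
import re

# Names may contain only letters and figures (no spaces or punctuation).
NAME_PATTERN = re.compile(r"[A-Za-z0-9]+")


def normalise_name(name, max_length=8):
    """Validate a name and truncate it to eight characters, as described above."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid name: {name!r}")
    return name[:max_length]


def parse_genotype(text, diploid, delimiter="/", missing="?"):
    """Return a tuple of allele names, or None for a missing genotype.

    delimiter=None (an assumption of this sketch) means that every allele
    is a single character and no delimiter is used, e.g. 'ab'.
    """
    missing_forms = {missing, missing + (delimiter or "") + missing}
    if text in missing_forms:
        return None
    if not diploid:
        if delimiter and delimiter in text:
            raise ValueError(f"haploid genotype with two alleles: {text!r}")
        return (text,)
    alleles = list(text) if delimiter is None else text.split(delimiter)
    if len(alleles) != 2 or "" in alleles or missing in alleles:
        # Covers partially scored genotypes such as 56/?, which are not allowed.
        raise ValueError(f"invalid diploid genotype: {text!r}")
    return tuple(alleles)


if __name__ == "__main__":
    print(parse_genotype("56/59", diploid=True))
    print(parse_genotype("?/?", diploid=True))
    print(parse_genotype("ab", diploid=True, delimiter=None))
```

Returning None for a missing genotype keeps it distinct from any allele name, so later steps can skip that locus for the individual concerned.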
The following size limitations for the input data apply:

    Number of                             Limit
    Demes                                 *
    Loci                                  00
    Alleles per locus                     00
    Groups (queens)                       *
    Offspring per group                   4000
    Queen alternatives per group          00
    Mates per group                       000
    Alternative patrilines per offspring  00
    Individuals in total                  *

    *Limited only by the available memory in your computer.

These limitations should not be of any practical importance.

4. F Data: Deducing Queens from Known Offspring

The input for F analysis is genotypes of female offspring organised in groups of sisters (F data), and MATESOFT outputs a modified data file with possible genotypes of mother queens appended. The algorithms applied are described in Chapter B.

The input data should include the following variables in the individual genotype data section: Ind-ID, K-Group, Class, and Locus#.

The deduction of queen genotypes works for any number of fathers that have sired the offspring in a brood, and if the maternal genotype cannot be deduced unequivocally, alternative queen genotypes are listed in the output with associated probability weights. See the section on FQ data for more about alternative genotypes and their probabilities. In case offspring genotypes can only be explained by a combination of different mothers, the brood is excluded from further analysis and a warning about polygyny is given.

4.1 The Power of Correct Deduction of Queen Genotypes

The algorithm for deducing queen genotypes assumes that a sufficient number of offspring is analysed per brood, so that at least one copy of each queen allele is present in the brood. MATESOFT calculates the probability that this assumption is true, both per group and for the overall data, and outputs the values in a line starting with Your power of correctly deducing all queen genotypes is (see section B.1). Groups with low deduction power should be omitted from further analysis.

4.2 Narrow and Broad Deduction of Queen Genotypes

In the menu launching the F analysis (section 9) the user decides to apply one of two algorithms for deducing the queen genotypes: the narrow deduction option (default) or the broad deduction option. In both cases the algorithms work locus by locus to construct the maternal genotype from offspring data, but they differ in their assumption regarding the number of times the queen may have mated.

The narrow deduction option always assumes single mating of the queen when this can explain the offspring genotypes at the locus considered. However, data on other loci may show that the queen is in fact multiple mated, meaning that this assumption was too restrictive and alternative queen genotypes are possible. This option will always subsequently lead to the smallest number of mates possible and correct assignment of patrilines, although the algorithm may give wrong queen and mate genotypes at single loci.

The broad deduction option in principle assumes that single and multiple matings are equally likely and allows for all possible queen genotypes to be deduced. If it turns out from the analysis of other loci that multiple mating was not needed to explain the data, the FQ analysis will include some queen alternatives with too high mate numbers. For parsimony, it may be preferred to consider only the queen alternative(s) with the smallest number of mates. In that case, queen alternatives with more than the minimum number of mates should be removed manually by the user from the extended output data before further analysis. Alternatively, if it turns out that the queen is in fact multiple mated, the correct queen genotype will be included among the alternative queen genotypes with associated mates. The algorithms are described in section B.2.

The general advice on what deduction option to apply is: if the exact genotypes of queens and fathers are of importance for the study, then use the broad deduction and be critical about the possible number of mates; if the parental genotypes are of no importance, go for the narrow deduction instead.
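The assumption behind the deduction power in section 4.1 (that each queen allele shows up in at least one offspring) can be illustrated for the simplest case. The sketch below is not the calculation MATESOFT performs, which is described in Chapter B. It assumes a single locus, a heterozygous queen, and independent Mendelian segregation with probability 0.5 per allele, and the function name `prob_both_alleles_seen` is hypothetical.

```python
def prob_both_alleles_seen(n_offspring):
    """Probability that both alleles of a heterozygous queen occur among her offspring.

    Each offspring inherits one of the two maternal alleles with probability 0.5.
    The assumption fails only if all offspring carry the same maternal allele,
    which happens with probability 2 * 0.5 ** n_offspring.
    """
    if n_offspring < 1:
        return 0.0
    return 1.0 - 2.0 * 0.5 ** n_offspring


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, prob_both_alleles_seen(n))
```

Under these assumptions the probability rises quickly with brood size, which is why small broods are the ones most likely to be flagged for low deduction power.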

4.3 Checking Mendelian Segregation of the Queen's Alleles

If the putative queen is heterozygous at a given locus, MATESOFT calculates the probability that the queen's alleles follow Mendelian segregation in the offspring and gives the value in the output file. Refer to section B..3 for the formulae. These probabilities should be inspected carefully before further analysis, as low values may indicate that important assumptions of the analysis are violated. One possibility is that more than one mother has contributed to the offspring (polygyny) although monogyny is assumed. Another possibility is that the queen is mated more times than the minimum number needed to explain the offspring genotypes.

An obvious example of such violation is the following: Consider ten offspring of genotypes,,,,,,,,, and 3. As monogyny is assumed, the queen genotype will be deduced to or 3 (queen type..3.), although the probability of such segregation of the queen's alleles in combination with the mate's allele is as low as (10!/(9!1!))(0.5)^10 (0.). Rather, the brood has several mothers, with the last offspring having a different mother than the first nine.

The output is in the following format, here with suggestions of what action to take if the probability is regarded as too low to be acceptable:

    Group G0, Locus Loc
    Queen type...a : Mendelian probability for only single mating of queen.
    Interpretation and action: Single mating can explain the data. However, if the Mendelian probability is too low, then allow for more matings by applying the broad instead of the narrow deduction option in a new F analysis.

    Group G04, Locus Loc
    Queen type...4a : Mendelian probability for only double mating of queen.
    Interpretation and action: Double mating can explain the data. However, if the Mendelian probability is too low, then allow for more matings by applying the broad instead of the narrow deduction option in a new F analysis.

    Group G08, Locus Loc
    Queen type.. : Mendelian probability for monogynous group.
    Interpretation and action: A single queen mother can explain the data. However, if the Mendelian probability is too low, then inspect the offspring genotypes to detect the offspring of a possible alien mother. Either purge the data from these offspring or, as the safest option, skip this group in further analysis.

The Mendelian probability can't be applied as a P value in statistical hypothesis testing, where values < 5% normally would lead you to discard the null hypothesis. What to adopt as critical values for the Mendelian probabilities should be based on biological more than statistical considerations, i.e. you should ask the question: how biologically likely is it that my first assumptions about the number of mothers and fathers were correct? For example, if your experimental setup makes it biologically impossible that more queens are involved, then you should accept the analysis even when a low probability for a monogynous group is the result.
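The binomial term in the ten-offspring example of section 4.3 can be checked numerically. The sketch below covers only that term, i.e. the probability of a given split of the queen's two alleles among her offspring under Mendelian segregation. It does not reproduce the formulae of Chapter B or any factor for the mate's allele, and the function name `segregation_probability` is hypothetical.

```python
from math import comb


def segregation_probability(n_offspring, n_with_first_allele):
    """Binomial probability that exactly n_with_first_allele of n_offspring
    inherit the first of the queen's two alleles, each with probability 0.5."""
    return comb(n_offspring, n_with_first_allele) * 0.5 ** n_offspring


if __name__ == "__main__":
    # Nine offspring with one maternal allele and one with the other:
    # (10!/(9!1!)) * 0.5**10 = 10/1024
    print(segregation_probability(10, 9))
```

A split of nine to one among ten offspring has a probability just below 1% under this term alone, which illustrates why such a brood suggests a violated assumption rather than an unlucky sample.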

4.4 The Extended Output Data File

The extended output data file can be used as input data file for the FQ analysis. Inspect the file in advance, and modify this or the configuration file according to the subsequent analysis. Only relevant variables from the original data are included in the new genotype data section, so the position of columns may have changed. The default order of variables in the extended data file from F analysis is: Ind-ID, Group, Class, AltQ/M, Alt-P, Locus#.

The files {Ex-F.in.txt}, {Ex4-F.in.txt}, {Ex-F.out.txt} and {Ex4-F.out.txt} exemplify the input and output data from F analysis.

5. FQ Data: Deducing Fathers and Assigning Patrilines

The input for FQ analysis is genotypes of queens and their female offspring organised in groups (FQ data). MATESOFT outputs a modified data file with possible genotypes of the queens' mates and the patriline assignment of each offspring appended. The relevant algorithms can be found in Chapter C.

When both queens and offspring are analysed genetically, the FQ data file is prepared by the user. When only offspring are scored, the input file is produced by MATESOFT in the F analysis by appending putative queen genotypes.

The input data should include the following variables in the individual genotype data section: Ind-ID, K-Group, Class, AltQ/M, Alt-P, and Locus#.

When the queen genotype is deduced from the offspring, several multilocus genotypes may be compatible with the brood data. If there are no such alternatives, the only possible genotype is just labelled 1 under the variable AltQ/M (for alternative queen or mate genotypes), with the associated probability 1.0000 for the variable Alt-P. In the case of alternative queen genotypes, these are labelled by ordinals, i.e. 1, 2, 3 and so on, with their probabilities summing to unity. The probability calculation in Alt-P is based on the population frequencies of the alleles involved and/or the allele segregation among the offspring. These variables are generated by MATESOFT for the extended data file when processing F or FQ data.

MATESOFT will deduce genotypes of paternal males and assign the offspring to patrilines for any number of queen matings. If an offspring genotype is incompatible with the genotype of the queen analysed, a warning about possible polygyny is given in the output. Such offspring, or the complete group they belong to, should be removed from the data set by the user before further analysis.

The output data file can be used for estimating mating frequency statistics. Inspect the file in advance, and modify this or the configuration file according to the subsequent analysis. Only relevant variables from the original data are included in the new genotype data section, so the position of columns may have changed. The default order of variables in the extended data file from FQ analysis is: Ind-ID, Group, Class, AltQ/M, Alt-P, Locus#, PatriQ#.

The files {Ex-FQ.in.txt}, {Ex-FQ.in.txt}, {Ex4-FQ.in.txt}, {Ex-FQ.out.txt}, {Ex-FQ.out.txt}, and {Ex4-FQ.out.txt} exemplify the input and output from FQ analysis.

14 4 6. FQM Data: Mating Frequency Statistics The main data type for estimation of mating frequency statistics is genotypes of queens, their putative mates and their female offspring organised in groups (FQM data). MATESOFT outputs summary statistics for each group and over-all estimates of average paternity skew, proportion of multiple mated queens, and effective mate number. The input file can either be prepared by the user or produced by MATESOFT in the FQ analysis. The input data should include the following variables in the individual genotype data section: Ind-ID, K-Group, Class, AltQ/M, Alt-P, Locus#, and PatriQ#. As for queens in FQ data, alternative genotypes of putative mates that may occur in FQM data are indicated by use of the AltQ/M variable. All mates are labelled in the format A/B where A is the queen alternative he may have mated and B is the label used for him and his associated patriline, both given by ordinals. That is, / designates the mate that sired the offspring of patriline, given the queen is alternative. Contrary to queens, mates being alternatives to each other have identical AltQ/M values but are distinguished by the program by their different individual IDs. If not present in the input file, this variable is generated by MATESOFT for the output data when processing F or FQ data. This notation is illustrated by data from the file {Ex-FQM.in.txt}. First a simple case Individual genotype data section starts below this line * Here starts computer generated section of queen genotypes *Male genotypes in the below section are computer generated *Ind-ID K-Group Class AltQ/M Alt-P loci... Q G0 Q.0000 aa M G0 M /.0000 a where both the genotypes of the queen and her single mate are scored unambiguously, i.e. with no alternatives. Then a more complex example *Ind-ID K-Group Class AltQ/M Alt-P loci... 
{ } Q G09 Q ab Q3 G09 Q 0.57 ac Q4 G09 Q bc M8 G09 M /.0000 c M9 G09 M / a M0 G09 M / b M G09 M /.0000 b M G09 M / a M3 G09 M / c M4 G09 M 3/ b M5 G09 M 3/ c M6 G09 M 3/.0000 a where the double mated queen can be one of three genotypes (each with associated probability), and one of the queen s two putative mates has two alternative genotypes. For example, if Q3 is the true maternal genotype (queen alternative ), she is mated to M siring patriline and either M or M3 siring patriline. Finally about the patriline variable PatriQ#. All offspring possibly sired by a putative male mate (i.e. a patriline) are labelled with the same number, and ordinals are assigned to MateSoft.Documentation.(6).doc/JSP

15 5 patrilines in the order that these are encountered among the offspring, i.e. the first patriline encountered is labelled, the second, etc. This numbering is done within each alternative queen and different queen alternatives may have been mated to a different number of putative mates (e.g. see group Col03 in the example file {Ex4-FQM.in.txt}). Both the putative mates and the paternity assignment depends on what queen genotype is assumed, so each queen alternative has an associated paternity assignment for the offspring. Consequently, the number of datafile columns with patrilines corresponds to the maximum number of queen alternatives in the overall data. Patriline variables are created by MATESOFT when analysing FQ data. In some cases, an offspring cannot be exclusively assigned to a patriline but is the possible daughter of one among several candidate fathers. The relevant patrilines are then listed separated by commas, e.g. 4, as for the offspring F07 in *Ind-ID K-Group Class AltQ/M Alt-P Hal54 Pha Pha77 { } Q0 Col36 Q c/f a/k a/f Q Col36 Q c/f a/m a/f Q Col36 Q c/f a/a a/f M8 Col36 M /.0000 h a a M8 Col36 M /.0000 k a h M83 Col36 M / k a d M84 Col36 M / k k d M85 Col36 M / e a a M86 Col36 M / e m d M87 Col36 M / c a d M88 Col36 M /.0000 h a a M89 Col36 M /.0000 k a h M90 Col36 M / k k d M9 Col36 M / e a a M9 Col36 M / e a d M93 Col36 M / e m d M94 Col36 M / c a d M95 Col36 M 3/.0000 h a a M96 Col36 M 3/.0000 k a h M97 Col36 M 3/ k k d M98 Col36 M 3/ e a a M99 Col36 M 3/ e m d M00 Col36 M 3/ c a d F07 Col36 F * *?/? a/a a/f 4, 4, 4, F07 Col36 F * * c/k a/k d/f F073 Col36 F * * f/k a/k d/f F074 Col36 F * * e/f a/m a/d F075 Col36 F * * e/f a/a a/a F076 Col36 F * * c/h a/a a/f F077 Col36 F * * c/k a/a a/h F078 Col36 F * * c/c a/a a/d F079 Col36 F * * c/k a/a f/h F080 Col36 F * * f/k a/a f/h F08 Col36 F * * c/h a/a a/a taken from the example file {Ex4-FQM.in.txt}. This means that both male and 4 are possible fathers of this offspring. 
The genotype, and hence the individual ID label, of these males depends on what queen alternative is assumed: M8,M85; M88,M9; and M95,M98 for queen alternatives 1, 2 and 3, respectively.

The files {Ex-FQM.in.txt}, {Ex-FQM.in.txt}, and {Ex4-FQM.in.txt} exemplify the input to FQM analysis.
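The patriline numbering described above (ordinals given in the order that patrilines are first encountered among the offspring) can be sketched as follows. This is only an illustration and not MATESOFT's own code: the function name `label_patrilines` and the father labels are hypothetical, and the sketch assumes every offspring has exactly one candidate father, so it does not cover the comma-separated ambiguous case.

```python
def label_patrilines(offspring_fathers):
    """Assign patriline ordinals in order of first encounter.

    offspring_fathers: list with one (hypothetical) father label per
    offspring, in the order the offspring appear in the data file.
    Returns (patriline number per offspring, father -> number mapping).
    """
    numbering = {}   # father label -> patriline ordinal
    assigned = []    # patriline ordinal for each offspring
    for father in offspring_fathers:
        if father not in numbering:
            # First time this father is seen: give it the next ordinal.
            numbering[father] = len(numbering) + 1
        assigned.append(numbering[father])
    return assigned, numbering


# Hypothetical example: three putative mates seen among five offspring.
patrilines, numbering = label_patrilines(
    ["mateB", "mateA", "mateB", "mateC", "mateA"]
)
print(patrilines)   # one patriline number per offspring
print(numbering)    # which father received which number
```

Because the numbering is done separately for each alternative queen, such a function would be called once per queen alternative, giving one patriline column per alternative.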

7. QM Data: Mating Frequency Statistics

The alternative data type for estimation of mating frequency statistics is genotypes of queens and their putative mates, based on genetic analysis of the queens and the sperm stored in their spermathecae (QM data). MATESOFT outputs summary statistics for each group and overall estimates of the proportion of multiple-mated queens and the effective mate number. The current version of the program is limited to a maximum of two matings per queen.

The input data should include the following variables in the individual genotype data section: Ind-ID, K-Group, Class, AltQ/M, Alt-P, Locus#, and PatriQ#. Note that, in principle, all Alt-P equal 1 for this data type, as all parental alleles have been scored with certainty.

When mate genotypes are deduced by sperm typing and several alleles are scored at several loci, the multilocus genotype of each male cannot be deduced. For example, if the sperm contains alleles a and b at the first locus and c and d at the second locus, then the two males can be either ac and bd or ad and bc, respectively. However, whichever interpretation of the multilocus genotypes is used in the input data, it has no influence on the calculations. The reason is that the estimation of the error of not identifying a multiple mating is based on the genotypes of putative single mates only.

The files {Ex-QM.in.txt} and {Ex-QM.out.txt} exemplify the input and output for QM analysis.

8. File Configuration Menu

Before any analysis can be done, the user has to define the configuration of the input file. This is done by opening and filling in the menu under File > File configuration. The configuration can be saved for later application and loaded from file by using the File > File configuration > File menu. The name extension for configuration files is cfg (see the example files provided). After loading the configuration file {Ex-F.cfg} the menu will look like this:


See section 3 for a description of the variables and types of allele coding. If a variable is not relevant for the subsequent analysis, enter 0 or simply leave the column number blank. Under Count, give the number of loci analysed that follow in consecutive columns starting with the column indicated.

The output file to be named is the general output file, which is different from the file with extended data from the F or FQ analysis. The general output file is almost identical to the screen output when running the analyses. In F and FQ analysis some, but not all, of this output is rather technical and included for testing purposes, and as such it can be ignored by the average user. In FQM and QM analysis this file contains the mating frequency statistics calculated.

9. Queen and Mate Deduction Menu

This self-explanatory menu is used to launch analysis of F or FQ data. See section 4 for a discussion on what deduction option to apply for F analysis. When ready to run the data in {Ex-F.in.txt} it looks like this:


10. Mating Frequency Statistics Menu

Use this menu to launch estimation of mating frequency statistics based on FQM or QM data. When ready to run the calculations for {Ex-FQM.in.txt} it looks like this:

10.1 Data with Single and Double Mating of Queens

The paternity skew c of a double-mated queen is the proportion of the offspring sired by the mate having the largest contribution to the brood. The user can decide to let MATESOFT calculate the average paternity skew c (c-bar) by the estimator in Pedersen and Boomsma (1999a) or to input a predefined value. When QM data are analysed, the skew cannot be estimated from the data and the value given by the user is always used (0.5 is default). Furthermore, this option may be relevant if the user wants to examine the effect of a range of possible skews on other statistics. See also the section about the output and the FAQ on data with no detected multiple matings.

Two methods can be used to provide dispersal measures of the estimates: jackknifing over groups for standard errors and bootstrapping by groups for confidence limits. The statistical properties of these measures are not yet fully investigated and caution should be taken in their interpretation. We recommend that jackknifing and bootstrapping are only used for relatively large datasets, i.e. data that include at least five groups in each category of detected single and double matings, respectively.

The option Maintain M/M ratios in BS restricts the bootstrap replicates to have the same number of single and double mated groups as in the original data. This is included for testing purposes and is expected to give a more accurate confidence interval for c, as this statistic is based on the sample of double mated groups only.

10.2 Data with Three or More Queen Matings

If the data contain groups where three or more matings of the queen have been detected, the calculation option FQM 3+ matings should be used.
This will produce summary mating frequency statistics that can be presented directly or used for further analysis, but not an integrated correction of sampling and detection errors. It should be mentioned that MATESOFT is able to load data with high mate numbers and perform the calculations for single and double mating systems using the FQM - matings

option. In this case the program will merge all patrilines numbered 2 and above into a single patriline. If three or more queen matings are rare in the population, if additional mates contribute only little to the brood, and if the most common patriline always holds the majority of offspring, then it may be recommended instead to analyse the data as if only single and double matings occurred. In that case D est should be understood as the estimated proportion of multiple-mated queens in the population. Currently no procedures are implemented for calculating dispersal measures of the statistics for 3+ matings.

11. Mating Frequency Statistics Output

As the calculation procedures differ, the output also varies according to data type and the number of queen matings observed.

11.1 FQM Data with Single or Double Mating of Queens

The following overall statistics are given in the output, as exemplified by the file {Ex-FQM.out}:

Estimation of paternity skew
Average number of offspring in double mated groups n
Observed pi for double mated groups, pi
Corrected average paternity skew, c-bar
Deviation from target value [abs(pi_double-pi)
Paternity skew directly calculated from data, c.obs
{ }
Estimation of proportion of double mating: summary
Average weighted nonidentification error f'
Observed proportion of double mated queens, D.obs
Estimated proportion of double mated queens, D.est
Average pedigree effective mate number, m.e,p .3933

The calculation procedures follow Pedersen and Boomsma (1999) and are described in more detail in sections D.1 to D.3.
Furthermore, when the user has chosen bootstrapping and jackknifing of the statistics, the following are included:

95% confidence limits
c-bar:
f':
D.obs:
D.est:
m.e,p:
{ }
Jacknifed average SD sample size
c-bar:
f':
D.est:
D.obs:
m.e,p:

The wide confidence intervals indicate that for this dataset (many) more queens have to be analysed to obtain accurate estimates of D est and m e,p, which is partly due to a high error in

identifying double mated queens (f). Comparing the bootstrapped confidence limits and jackknifed SDs further gives the more general message that the estimates are not likely to follow the t-distribution, and that care should be taken in statistical tests assuming that the true sample SE equals the jackknifed SD.

The files {Ex-FQM.out} and {Ex3-FQM.out} exemplify the output from FQM analysis of data with single or double mating of queens.

11.1.1 Special Advice on Estimation of Paternity Skew

When MATESOFT estimates the average paternity skew c, the method of Pedersen and Boomsma (1999a) is applied, and the average observed skew is given for comparison. Both values should be inspected, as in some extreme cases of sampling errors or limited data the estimation will generate misleading results.

One case is when patrilines are more equal in frequency than expected by random sampling given c ≥ 0.5 (e.g. 3-3, -, and 5-4 offspring sampled from the first and second mate, respectively, of each of three queens). Here, c cannot get low enough for the expected values to fit, and the program outputs c close to 0.5 with a large difference abs(pi_double-pi). This is still correct, as 0.5 is then the best estimate for c.

The other extreme is when patriline 2 is so rare that it is never represented by more than one offspring in any group. This leads to an estimate of c close to 1, because the rarer the patriline is, the more likely it is that just a single individual from this patriline was sampled. However, this usually leads to absurd estimates of D est exceeding one, assuming more hidden double matings than the number of detected single mated queens. In this case the best estimate of c is the observed skew. The user should then take this value and recalculate the mating frequency statistics by applying the Use value option.
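As a rough illustration of the observed skew that is given for comparison, the sketch below computes the skew of each double-mated group as the share of offspring sired by the majority mate and then averages over groups. The function name `observed_skew` is hypothetical, and averaging per group (rather than pooling all offspring) is an assumption; the manual does not state how MATESOFT forms c.obs.

```python
def observed_skew(groups):
    """Average observed paternity skew over double-mated groups.

    groups: list of (n_first, n_second) offspring counts, one pair per
    double-mated queen. The skew of a group is the proportion of
    offspring sired by the mate with the larger contribution, so it
    always lies between 0.5 and 1.
    Assumption: the overall value is the unweighted mean of group skews.
    """
    if not groups:
        raise ValueError("at least one double-mated group is required")
    skews = [max(a, b) / (a + b) for a, b in groups]
    return sum(skews) / len(skews)


# Nearly even patrilines: the observed skew sits close to 0.5.
print(observed_skew([(3, 3), (5, 4)]))

# Second patriline seen only once per group: the observed skew is high.
print(observed_skew([(9, 1), (7, 1)]))
```

The two calls mirror the two extreme situations discussed above: patrilines that are almost equally frequent, and a minority patriline that never appears more than once in any group.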
11.2 QM Data with Single or Double Mating of Queens

The slightly different output from analysis of queen and sperm genotypic data is exemplified by the file {Ex-QM.out}:

Estimation of proportion of double mating: summary
Average weighted nonidentification error f'
Observed proportion of double mated queens, D.obs
Estimated proportion of double mated queens, D.est
Average pedigree effective mate number, m.e,p
{ }
95% confidence limits
c-bar:
f':
D.obs:
D.est:
m.e,p:
{ }
Jacknifed average SD sample size
f':
D.est:
D.obs:

m.e,p:

Note that there is no calculation of c, as this is fixed by the user. The specific calculation of f for queen and sperm genotypic data is given in section D.4; otherwise refer to section D.

11.3 FQM Data with Three or More Queen Matings

For each group the number of offspring (n) assigned to each patriline is given in a table, and the sum of squared paternity contributions (π, given as pi) is calculated as in this example from the file {Ex4-FQM.out}:

Groupwise statistics
{ }
Pline Sum(w) Sum(w<) n y
Group Col36/QQ pi0.77 (n)
Group Col36, overall pi0.730

In this case a single offspring can belong to both patrilines and 4, and consequently its assignment is weighted (Sum(w<)) according to the relative frequencies of these patrilines based on offspring with unambiguous assignments (Sum(w)). The groupwise π is calculated corrected for sample size following the formula of Pamilo (1993), given here as Eqn. D.5.3. If a group has alternative queens, π for each alternative is calculated along with the overall value for the group, weighted by the probabilities of the alternative queen genotypes.

Furthermore, summary statistics are produced, including the average π over all groups and the average number of matings detected (k) based on the frequency distribution of the number of patrilines per group. If alternative queens differ in the number of patrilines found, their contribution to the frequency distribution is weighted according to the probabilities of the alternatives (weighted k). Alternatively, minimum k calculates the average number of matings detected based on the smallest number of matings found per group.

Summary statistics
Average pi over all groups pi0.454
Frequency of observed mate number k minimum weighted
Average minimum k
Average weighted k
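The groupwise π calculation described above can be sketched as follows. This is a sketch under stated assumptions, not MATESOFT's implementation: the function names are hypothetical, and the sample-size correction used here, (n Σp² − 1)/(n − 1), is the commonly cited form attributed to Pamilo (1993), which is assumed, but not confirmed, to correspond to Eqn. D.5.3.

```python
def pi_observed(counts):
    """Sum of squared paternity contributions from (possibly weighted)
    offspring counts per patriline."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)


def pi_corrected(counts):
    """Sample-size corrected pi.

    Assumption: corrected pi = (n * sum(p_i^2) - 1) / (n - 1), the form
    commonly attributed to Pamilo (1993); n is the number of offspring.
    """
    n = sum(counts)
    if n <= 1:
        raise ValueError("need more than one offspring in the group")
    return (n * pi_observed(counts) - 1) / (n - 1)


def pi_group(alternatives):
    """Overall pi for a group with alternative queen genotypes.

    alternatives: list of (probability, counts) pairs, one per queen
    alternative; the result is the probability-weighted mean of the
    corrected pi values.
    """
    total = sum(p for p, _ in alternatives)
    return sum(p * pi_corrected(c) for p, c in alternatives) / total


# Hypothetical group: under one queen alternative two patrilines share
# the offspring equally, under the other all offspring share one father.
print(pi_corrected([5, 5]))
print(pi_group([(0.5, [5, 5]), (0.5, [10])]))
```

Fractional counts are accepted, so offspring with ambiguous assignments can be split between patrilines according to their weights before the counts are passed in.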

See section D.5 for the application of these statistics in estimating the average effective mate number.

The file {Ex4-FQM.out} exemplifies the output from FQM analysis of data with three or more queen matings.

12. Troubleshooting and Special Use FAQ

Q: It doesn't work! Why do I just get stupid error messages?
A: Hard to tell. The best place to start is to make sure that MATESOFT works with the example file that fits your data type. If it does, make a copy of the example file and modify it to contain your own data instead. Remember to modify the file configuration accordingly, if needed. Errors typically arise from non-printing characters like spaces and tab stops ending up in the wrong places in the data file and being hard to catch.

Q: I didn't detect any multiple matings at all. What can I do to get something interesting out of the data anyway?
A: Then have a look at the calculation methods suggested in section D.6.

Q: I don't have a large and independent sample of individuals for estimating the population allele frequencies. Actually, the individuals in this study are the only ones I've got. What can I do?
A: First you should realise that every group may represent as few as three haploid genomes (given the queen is single mated), so the basic problem is sample size, and you won't get a good estimate of population allele frequencies unless many broods are analysed. The best option is simply to calculate the allele frequencies in the offspring, weighting groups equally. This provides unbiased estimates but with a large variation, as paternal alleles are counted double compared to maternal ones. However, this is preferred to other methods involving the deduction of maternal and paternal alleles, as they have the shortcoming that the frequency of common alleles is underestimated.

Q: My data are a mess: the number of offspring varies a lot between groups and many genotypes are not complete for all loci. Is this a problem?
A: No!
Estimations and analyses will work correctly anyway based on the available data. Just make sure that no offspring lacks scoring at all loci, and that all groups have some offspring (minimum one) scored at each particular locus.

Q: I've analysed the queens but wasn't able to score all queens at all loci. Do I just indicate the gaps as missing genotypes in the FQ data file?
A: No, for the FQ analysis every queen should have a complete multilocus genotype. Take the offspring groups with incomplete queens and run a separate F analysis on this part of the data to deduce the missing genotypes. Then use the output to fill in the gaps in your original data set.

Q: I've scored the genotypes of queens, offspring and sperm. Can I take advantage of having both sperm and brood data?
A: Based on the sperm typing you may be able to exclude some of the alternative mate genotypes in the FQM data. Then the FQM and QM analyses should be carried out as usual, and you'll have the possibility to compare the mating frequency statistics based on
