Aster Modeling of Chamaecrista fasciculata
Allen Clark
February 9, 2018

Abstract

Aster models are a type of graphical model in which each node is modeled by an exponential family distribution. In biological applications, aster models are well suited for estimating the fitness of a species. In this report, the species of interest is Chamaecrista fasciculata, commonly known as the partridge pea. Fitness is assessed by the number of seeds each plant produced. Our model includes both fixed effects for each parameter in the graphical model and random effects to account for the genetic variability of each plant's parentage. We fit a Bayesian aster model and sample from the posterior distribution via Markov chain Monte Carlo (MCMC). The prior distributions were elicited via subject matter expertise and additional data on Chamaecrista fasciculata from another growing location, both provided by collaborators in the evolutionary biology group.

Contents

1 Introduction
2 Aster Models
  2.1 Exponential Families
  2.2 Aster Graphs
    Requirements for the Aster Graph
    An Aster Graph Example
  2.3 Conditional and Unconditional Models
  2.4 Saturated Aster Models and Aster Submodels
  2.5 Aster Model Transformations
3 Random Effects
  3.1 Breeding Values
  3.2 Pedigrees
  3.3 Avoiding Inverting the Numerator Relationship Matrix
4 Data
  4.1 Why C. fasciculata?
  4.2 C. fasciculata Component of Fitness Data
  4.3 Pedigree Data
5 Bayesian Analysis
  5.1 Log Likelihood
  5.2 Log Priors
    5.2.1 Fixed Effect Parameters
    5.2.2 Random Effect Parameters
  5.3 Prior Elicitation
6 Computation via MCMC
  Why MCMC?
  Markov Chains
  Monte Carlo
  Markov Chain Monte Carlo
  Metropolis Algorithm
    Proposal Distributions
    Metropolis Ratio
    Metropolis Rejection
  MCMC spacing: saving memory
  Checkpointing
  MCMC Central Limit Theorem Variance
  MCMC Diagnostics
    Time Series Plots
    Auto-correlation Function Plots
7 Results

1 Introduction

In evolutionary biology, one way to measure an organism's success at passing down genetic information to future generations is by counting the lifetime number of offspring produced. Plants or animals that produce more offspring contribute more to the genetic makeup of the species in future generations; this counts as a reproductive success for the organism. On the other hand, organisms that do not produce any offspring will not directly contribute to the future of the species' genetics. Following the terminology of Shaw et al. (2008, p. E35), this report defines fitness as the lifetime number of offspring produced by an individual.¹

Note that two individuals with nearly identical genetics placed in the same environment can produce different numbers of offspring purely by random chance. Therefore, the fitness of an individual is a random variable. It can take on a range of values from zero upward and has an expected value, which we call expected fitness. An organism must survive to breeding periods in order to reproduce, and organisms that reproduce more often will have higher observed fitness. Therefore, fitness is influenced by:

1. Longevity - an organism's ability to survive
2. Fecundity - an organism's frequency of reproduction

Because fitness is count data, a naive researcher might estimate fitness with µ̂ = x̄ under a Poisson model, which has probability mass function f(x) = e^{−µ} µ^x / x!. Unfortunately, this easy way doesn't work. Figure 1 shows two problems with the Poisson approach: the observed data has a very heavy tail, and too many individuals have zero offspring.
Neither a Poisson distribution nor any other single well-known distribution can accommodate these features. The Poisson distribution considers fecundity (offspring produced), but longevity, another key aspect of fitness, has been left out of the equation.

¹ Lifetime offspring count is sometimes called observed fitness to distinguish between an organism's ability to fit into its environment and an observation of how well it does so. See Beatty (1992) for a discussion.
Figure 1: Comparison of Poisson data versus fitness data: Poisson is not a good fit. Top: simulated Poisson data. Bottom: real fitness data from Chamaecrista fasciculata. Note the heavy tail and the frequent occurrence of zero beyond what is expected from a Poisson distribution.
A better approach is to model fitness conditioned on the survival and development of an organism throughout its fertile life. The overall fitness, which is the total number of offspring, can be broken down into components of fitness, which represent key measurements related to overall fitness. Examples of components of fitness include survival to fertility, the presence or absence of offspring, and the number of offspring produced in a given breeding period. Breaking down an organism's reproduction into its biologically relevant parts provides the model with the statistical flexibility it needs to produce valid results. The approach works as follows:

1. Identify the most important life cycle stages for an organism as components of fitness.
2. Connect these components of fitness by forming a graphical dependence structure based on how they influence each other biologically.
3. Decide on statistical distributions and parameters for each component of fitness.
4. Use the data to estimate the chosen parameters.

Prior to the introduction of aster models (Geyer et al., 2007), the practice was to model components of fitness separately, conditioned on survival. The sticking point was how to combine these separate analyses to draw inference about fitness. Aster models offer a solution by directly modeling the joint distribution of all components of fitness, providing parameter estimates that directly correspond to fitness. Aster models are named for the first species analyzed with them, Echinacea angustifolia, a flowering plant in the aster family (Geyer et al., 2007). In aster models, the dependence structure between components of fitness can be represented visually in an aster graph, which for E. angustifolia is shown in figure 2. The dependence structure is the same across three years and consists of three layers: survival, flowering status, and flower count. If the plant survives, it has a chance to produce flowers.
If the plant produces flowers, it can have any number of them. Survival status also depends on the survival status in the previous year, so arrows run across the survival layer from left to right. The terminal nodes, flower count, are proxies for offspring, so the sum over all terminal nodes is used as a proxy for fitness.

Figure 2: Graphical dependence structure for E. angustifolia flower count data (Geyer et al., 2007). An initial node (Initial = 1) leads via Bernoulli arrows into the plant survival layer (Y_1, Y_2, Y_3, one node per year); each survival node leads via a Bernoulli arrow to a flowering status node (FS_1, FS_2, FS_3); and each flowering status node leads via a zero-truncated Poisson arrow to a flower count node (FC_1, FC_2, FC_3).
2 Aster Models

Aster models are exponential family graphical models. They are exponential family models in the sense that the conditional distribution of each component of fitness given its predecessor is an exponential family, and in the sense that the joint distribution of all components of fitness is also an exponential family (see section 2.1). Aster models are also graphical models in that each component of fitness depends on the previous component of fitness in the aster graph (see section 2.2). There are six parameterizations available for aster models (see sections 2.3, 2.4, 2.5).

2.1 Exponential Families

An exponential family is a category of distributions having a common form. Many well-known distributions fit into this category (e.g. normal, binomial, and Poisson). The advantage of exponential family distributions is shared theory: one only has to show that a given distribution is an exponential family, and all the properties proven about exponential family distributions apply. Suppose X is the raw data and Y(X) is a k-dimensional statistic calculated from the raw data X. An exponential family is defined as any distribution with probability density function (PDF), probability mass function (PMF), or probability mass-density function (PMDF)² of the form

    f_θ(x) = h(x) exp( Σ_{i=1}^k y_i(x) θ_i − c(θ) )    (1)

Here h(x) is a function of the data only. The statistic Y(X) is called the canonical statistic when the distribution is written in the form of equation 1. Likewise, the parameter θ is called the canonical parameter when it is in the form of equation 1. The function c(θ) is called the cumulant function. The key to exponential families is that the statistic Y(X), the parameter θ, and the functions h(·) and c(·) are not allowed to mix data and parameter. For example, an indicator function such as h(x) = I_{x<θ} violates the separation of data and parameter rule.
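As a quick sanity check of the canonical form, the Poisson distribution fits equation 1 with canonical statistic y(x) = x, canonical parameter θ = log µ, cumulant function c(θ) = e^θ, and h(x) = 1/x!. A minimal sketch (with an illustrative value of µ):

```python
import math

mu = 2.5                      # illustrative mean
theta = math.log(mu)          # canonical parameter of the Poisson family

def pois_pmf(x):
    """Poisson PMF in its usual form."""
    return math.exp(-mu) * mu**x / math.factorial(x)

def expfam_pmf(x):
    """The same PMF in exponential family form h(x) exp(y theta - c(theta))."""
    h = 1.0 / math.factorial(x)
    c = math.exp(theta)       # cumulant function c(theta) = exp(theta)
    return h * math.exp(x * theta - c)

# The two forms agree term by term.
for x in range(10):
    assert abs(pois_pmf(x) - expfam_pmf(x)) < 1e-12
```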
Following from equation 1, the log likelihood of an exponential family is

    l(θ) = log h(x) + Σ_{i=1}^k y_i θ_i − c(θ)
         = Σ_{i=1}^k y_i θ_i − c(θ)    (dropping terms without θ)    (2)

The cumulant function c(·) is useful because it allows calculation of the mean of Y. Assuming that θ ∈ Interior(Θ), so that the canonical parameter θ is in the interior of its parameter space, exponential families have the property that

    µ = E_θ{Y} = ∇c(θ)
    Var_θ(Y) = ∇²c(θ)    (3)

The parameter µ also parameterizes the exponential family (Geyer, 2016). Since µ is also a parameter, the exponential family can use µ as a parameter in place of θ. Of course, this transformation will then require a matching transformation of Y in the distribution. The parameterization involving µ is so useful that it gets its own name, the mean-value parameterization.

² When some components of a random vector are discrete and others are continuous, the distribution of the random vector is partly discrete and partly continuous.
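The derivative identities in equation 3 can be verified numerically. The sketch below uses the Bernoulli family, whose cumulant function is c(θ) = log(1 + e^θ), and checks the first and second derivatives by finite differences (the value of θ is illustrative):

```python
import math

def c(theta):
    """Bernoulli cumulant function c(theta) = log(1 + exp(theta))."""
    return math.log1p(math.exp(theta))

theta, h = 0.7, 1e-5
p = math.exp(theta) / (1.0 + math.exp(theta))   # success probability

# Central finite differences approximate c'(theta) and c''(theta).
mean = (c(theta + h) - c(theta - h)) / (2 * h)
var = (c(theta + h) - 2 * c(theta) + c(theta - h)) / h**2

assert abs(mean - p) < 1e-8           # c'(theta) = E(Y) = p
assert abs(var - p * (1 - p)) < 1e-4  # c''(theta) = Var(Y) = p(1 - p)
```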
The mean-value parameter is useful for drawing inference in an applied problem. The canonical parameter is useful because it allows use of theorems about exponential families and can be used for maximum likelihood estimation. Aster model estimation is accomplished using the canonical parameter, which is then mapped to the mean-value parameter to draw conclusions about applied problems. Important exponential family distributions for aster models include

- Bernoulli: Ber(p)
- Normal: N(µ, Σ)
- Negative binomial: NegBin(r, p)
- Poisson: Pois(µ)
- Zero-truncated Poisson: 0-Pois(µ)

The zero-truncated Poisson distribution is a Poisson distribution with zero removed as a possible value. The PMF is

    f_µ(x) = ( 1 / (1 − e^{−µ}) ) ( e^{−µ} µ^x / x! ),    x = 1, 2, ...    (4)

Notice the second factor is the usual Poisson PMF. The first factor rescales the distribution by the probability that x ≠ 0. Models using zero-truncated Poisson distributions can avoid the inflated-zeros issue in figure 1 with a two-step setup: a Bernoulli variable models the zero values, and a zero-truncated Poisson models the count observed given that the Bernoulli variable is one.

2.2 Aster Graphs

Aster models have a graphical dependence structure that can be expressed visually in an aster graph. There are four rules for creating an aster graph (Geyer et al., 2007).

1. Nodes are random variables for components of fitness.
2. Edges are conditional distributions for components of fitness.
3. Predecessors are sample sizes.
4. Initial nodes are constant random variables.

An aster graph relationship like

    X ---f_θ(y|x)---> Y    (5)

means that component of fitness X has a direct influence on component of fitness Y. The conditional distribution is

    Y | X = Σ_{i=1}^X Y_i    where Y_i ~ f_θ(y_i), IID    (6)

where θ is the canonical parameter of the f_θ distribution. Exponential family distributions have a special formula for the sum of n independent and identically distributed (IID) random variables.
If Y_1, ..., Y_n are IID with the same exponential family distribution and cumulant function c(θ), then the sum Σ_{i=1}^n Y_i again has an exponential family distribution, with canonical statistic Σ_{i=1}^n Y_i, canonical parameter θ, and cumulant function n c(θ) (Geyer, 2013, deck 2, slide 22). This gives a convenient way to find the conditional distribution of Y | X in the aster graph. For many exponential family distributions, the resulting sum will be a well-known distribution. For example, if Y_1, Y_2, ..., Y_n are IID, then the distribution of Σ_{i=1}^n Y_i is
- Bin(n, p) if Y_i ~ Ber(p), IID
- Pois(nµ) if Y_i ~ Pois(µ), IID
- N(nµ, nσ²) if Y_i ~ N(µ, σ²), IID

For other exponential family distributions, such as the zero-truncated Poisson, the sum does not follow any well-known distribution. Regardless, the distribution of the sum of n IID exponential family random variables is always an exponential family distribution with cumulant function n c(θ).

Initial nodes must be constant to denote the sample size of the first non-degenerate random variable in the aster graph. For example, if an experimenter planted three seeds and then established a component of fitness to see if the seeds germinate, the first arrow in the graph would be

    3 ---Ber(p)---> Germinate

where Germinate is the sum of three independent Ber(p) random variables, which by the Bernoulli sum rule is a single Bin(3, p) random variable. Components of fitness are often zero. When a predecessor variable is zero, the successor is an empty sum of zero terms; by convention this is also zero. The nodes in the graph come together to form the joint distribution of all the components of fitness. Suppose we have n_nodes non-initial nodes X_1, X_2, ..., X_{n_nodes} in the aster graph. Then the joint distribution of X = {X_1, X_2, ..., X_{n_nodes}} is

    f(X_1, X_2, ..., X_{n_nodes}) = Π_{i=1}^{n_nodes} f(X_i | X_{p(i)})    (7)

where p(i) is the index of the predecessor node that comes immediately before X_i. Notice that the initial nodes are not among the factors in this product. Immediate successors of initial nodes have conditional distribution f(X_i | X_{p(i)}) = f(X_i).

Requirements for the Aster Graph

Aster models place some requirements on the aster graph (Geyer et al., 2007).

At most one predecessor: Each node has at most one predecessor. The initial node has no predecessors.

Acyclic: The aster graph must be acyclic. Without this property, the joint distribution cannot be factored into a product of conditionals as in equation 7. An acyclic graph is necessary to obtain closed-form densities and likelihoods.
Initial node is constant: An initial node must be constant. It represents the sample size of the succeeding random variable. Suppose we plant seeds in a garden. If we place X_initial = 3 seeds in a single slot, then there are three random variables (seeds) that can either sprout or not sprout. If X_initial = 1, we plant only one seed in the slot, and either one or zero plants can sprout from that seed.

An Aster Graph Example

To gain more intuition, let's apply these rules to a simple example. Suppose we want to know how many seeds a plant produces, and we use three components of fitness: an initial node X_initial, the number of flowers F, and the total number of seeds S the plant produces. For simplicity,
suppose we plant one plant in each growing slot, so X_initial = 1. Plants can't produce seeds without flowers, so it makes sense for F to be the predecessor of S. A plant can have any number 0, 1, 2, 3, ... of flowers, so F follows a Poisson distribution. Depending on the type of plant being modeled, each flower may bear at least one seed. If each flower must have at least one seed, then S | F follows a zero-truncated Poisson distribution. Using this model we obtain the aster graph in equation 8:

    1 ---Pois(µ_F)---> F ---0-Pois(µ_S)---> S    (8)

With the aster graph above fully specified, the conditional distribution of seeds given flower count, S | F, is

- concentrated at zero if F = 0
- a zero-truncated Poisson if F = 1
- the sum of n IID zero-truncated Poisson random variables if F = n > 1

In the first case, the plant has no flowers, so F is zero. Hence there are no seeds, and f(s | F = 0) is a degenerate distribution concentrated at the value zero. In the second case, the plant has one flower, so F = 1, and f(s | F = 1) is the distribution of a single zero-truncated Poisson random variable. In the third case, F > 1, so we get a proper summation: the sum of IID zero-truncated Poisson random variables over all the flowers that the plant produced.

2.3 Conditional and Unconditional Models

There are two different ways to parameterize the same saturated aster model (see section 2.4): through a conditional model or through an unconditional model. These conditional and unconditional models relate to equation 7, which we now repeat here.

    f_ϕ(X_1, X_2, ..., X_{n_nodes}) = Π_{i=1}^{n_nodes} f_{θ_i}(X_i | X_{p(i)})

The right-hand side is a product of conditional PMDFs. Each random variable X_i | X_{p(i)} has conditional distribution f_{θ_i}(X_i | X_{p(i)}), which is an exponential family with canonical parameter θ_i. The joint distribution, obtained by taking the product, is also an exponential family distribution.
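Returning for a moment to the seed-count example of section 2.2.2, the sum rule can be checked numerically: the distribution of S given F = n is the n-fold convolution of the zero-truncated Poisson PMF. The sketch below (illustrative parameter values, with the support truncated for computation) confirms this is a proper distribution whose mean is n times the single-flower mean:

```python
import math

mu_S = 2.0        # illustrative zero-truncated Poisson parameter
TRUNC = 60        # truncation point for the numerical support

def ztp_pmf(x):
    """Zero-truncated Poisson PMF on x = 1, 2, ..."""
    if x < 1:
        return 0.0
    pois = math.exp(-mu_S) * mu_S**x / math.factorial(x)
    return pois / (1.0 - math.exp(-mu_S))

def convolve(f, g):
    """Distribution of the sum of two independent counts on 0..TRUNC."""
    return [sum(f[j] * g[k - j] for j in range(k + 1)) for k in range(TRUNC + 1)]

single = [ztp_pmf(x) for x in range(TRUNC + 1)]

n = 4                                   # a plant with four flowers
dist = single
for _ in range(n - 1):
    dist = convolve(dist, single)       # S | F = n as an n-fold convolution

total = sum(dist)
mean = sum(x * px for x, px in enumerate(dist))
xi = mu_S / (1.0 - math.exp(-mu_S))     # mean of one zero-truncated Poisson

print(total)          # ~1: a proper distribution
print(mean, n * xi)   # mean of the sum is n times the single-flower mean
```

As the text notes, this n-fold sum is not itself a zero-truncated Poisson, but it is still an exponential family with cumulant function n c(θ).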
The joint parameter θ = (θ_1, θ_2, ..., θ_{n_nodes}) is a valid parameter for the joint distribution, but it is not the joint distribution's canonical parameter ϕ. The left-hand side is a single unconditional PMDF. The joint random vector (X_1, X_2, ..., X_{n_nodes}) has an unconditional joint distribution f_ϕ(X_1, X_2, ..., X_{n_nodes}), which is an exponential family with canonical parameter ϕ. The conditional distributions, obtained by factoring, are also exponential family distributions. Each conditional parameter ϕ_i is a valid parameter for the corresponding conditional distribution f_{ϕ_i}(X_i | X_{p(i)}), but it is not that conditional distribution's canonical parameter θ_i. The relationship between θ and ϕ is determined by a mapping called the aster transform: the aster transform maps θ to ϕ, and the inverse aster transform maps ϕ to θ. Section 2.5 presents this mapping in more detail. The conditional model allows biologists to model the relationships between components of fitness that specify the aster graph (see figures 2 and 6). Both the conditional model and the unconditional model may be useful for inference, depending on the application.
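A concrete sketch of the transform, consistent with the formulas given in section 2.5 and using illustrative parameter values: for a chain 1 → Ber → Pois, the terminal node's two canonical parameters agree, while the Bernoulli node's unconditional parameter absorbs the cumulant function of its Poisson successor. Transforming and then back-solving recovers the conditional parameters exactly.

```python
import math

# Conditional canonical parameters for a two-node chain:
# node 1 is Bernoulli (e.g. survival), node 2 is Poisson (count).
theta = [0.3, -0.5]                     # illustrative values

c2 = lambda t: math.exp(t)              # Poisson cumulant function

# Aster transform: a terminal node is unchanged; a non-terminal node's
# unconditional parameter subtracts its successor's cumulant function.
phi = [theta[0] - c2(theta[1]), theta[1]]

# Inverse aster transform: back-solve from the terminal node inward.
theta2_back = phi[1]
theta1_back = phi[0] + c2(theta2_back)

assert abs(theta1_back - theta[0]) < 1e-12
assert abs(theta2_back - theta[1]) < 1e-12
```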
2.4 Saturated Aster Models and Aster Submodels

So far we have discussed the aster model for a single organism. We now expand this to allow data on many individuals. In the most general case, each individual in the study may have its own aster graph. This can be handled mathematically by the framework presented in Geyer et al. (2007) and computationally with the tools included in the aster2 package (Geyer, 2017b). However, this is more general than necessary for either the E. angustifolia example (figure 2) or the C. fasciculata data (figure 6). If we assume all individuals share the same aster graph, the unconditional canonical statistic Y is a vector of length n_nodes × n_ind. Likewise, ϕ and θ are vectors of length n_nodes × n_ind. This means that the model given by the aster graph using θ or ϕ is saturated: it has as many parameters as there are data points and no degrees of freedom available. Aster models deal with the saturated model issue the same way that linear models and generalized linear models do; an affine submodel reduces the dimension of the problem and thus gives back degrees of freedom.

    Model Type            Saturated Model        Affine Submodel
    Linear regression     Y | X ~ N(µ, σ²I_n)    µ = o + Mβ
    Logistic regression   Y | X ~ Bin(n, p)      logit(p) = o + Mβ
    Aster models          Y | X ~ ExpFam(ϕ)      ϕ = o + Mβ

In each case, the saturated model, which is specified by giving a probability model to the target variable Y | X, has too many parameters to estimate directly. Rather, some function of the parameter in the saturated model is reduced in dimension by mapping to o + Mβ. The dimension is reduced to the number of columns of M, which gives back degrees of freedom. In linear or logistic regression, the model matrix M tracks covariate information. Continuous variables in X may become columns in M directly, categorical variables are converted to binary columns in M, and interaction columns or alternative basis functions may be included as desired.
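A minimal sketch of the dimension reduction, with made-up numbers: three individuals and two nodes each give a saturated parameter of length six, while a submodel with an intercept column plus a node-indicator column has only two coefficients.

```python
# phi = o + M beta for a toy problem: n_ind = 3 individuals and
# n_nodes = 2 nodes give a saturated parameter of length 6, while the
# submodel has only n_coef = 2 coefficients (all numbers illustrative).
n_ind, n_nodes = 3, 2

# Rows are ordered (individual, node); columns are an intercept and an
# indicator for the second node.
M = [[1, 1 if node == 1 else 0] for _ in range(n_ind) for node in range(n_nodes)]

o = [0.0] * (n_ind * n_nodes)            # offset, taken to be zero here
beta = [0.4, -1.1]                       # submodel canonical parameter

phi = [oi + sum(mij * bj for mij, bj in zip(row, beta))
       for oi, row in zip(o, M)]

print(len(phi), len(beta))   # 6 saturated components from 2 coefficients
```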
In aster models, the model matrix tracks component of fitness data in addition to other covariates. This is accomplished by treating the component of fitness as an indicator variable and appending n_nodes − 1 binary columns to the model matrix M. In the affine submodel for aster models,

    ϕ = o + Mβ    (9)

the offset vector o is a known vector that may optionally depend on covariate information or be left out entirely. The parameter β is called the canonical submodel parameter.

2.5 Aster Model Transformations

So far we have introduced three parameterizations for aster models: conditional, unconditional, and submodel. θ, ϕ, and β are the canonical parameters for the conditional model, unconditional model, and submodel respectively. Aster model parameters can also take mean-value form rather than canonical form. Like canonical parameters, mean-value parameters exist for all three model types: conditional, unconditional, and submodel. In total, there are 2 × 3 = 6 possible aster model parameterizations: first choose the parameter's role, mean-value or canonical; next choose the model type, unconditional, conditional, or submodel. The mean-value parameter for the conditional model is ξ, defined as

    ξ_i = E(Y_i | Y_{p(i)} = 1)    (10)
Figure 3: The six aster model parameterizations and the maps between them. The canonical parameters are θ (conditional model), ϕ (unconditional model), and β (submodel); the corresponding mean-value parameters are ξ, µ, and τ. The conditional and unconditional model parameters have dimension n_ind × n_nodes, and the submodel parameters have dimension n_coef. The aster transform maps θ to ϕ and the inverse aster transform maps back; ϕ = o + Mβ connects the submodel to the unconditional model; ξ_i = c_i′(θ_i) connects θ to ξ; µ = ∇c(ϕ) connects ϕ to µ; τ = Mᵀµ connects µ to τ; multiplication and division connect ξ and µ. The remaining maps have no closed form.

The ξ_i can be obtained from θ_i via ξ_i = c_i′(θ_i). The mean-value parameter for the unconditional model, µ, is defined similarly as

    µ = E(Y)    (11)

From exponential family theory, we have µ = E(Y) = ∇c(ϕ). Finally, the mean-value parameter for the submodel, τ, is the expected value of Y mapped through the submodel matrix. That is,

    τ = Mᵀ E(Y) = Mᵀ µ    (12)

The six parameters θ, ϕ, β, ξ, µ, τ are displayed in figure 3. The conditional model and unconditional model parameters all have dimension n_ind × n_nodes, since there is one component for each aster graph node per individual. The submodel parameters have dimension n_coef, the number of columns of the model matrix M. The relationships between the six parameters are indicated by the arrows in figure 3. The aster transform converts the conditional canonical parameter θ to the unconditional canonical parameter ϕ. When we express the factorization of the log likelihood using θ we get

    l(θ; y) = Σ_{i=1}^{n_nodes} l(θ_i; y_i)    (13)

Using the fact that the successor node Y_i is the sum of Y_{p(i)} IID random variables and applying the exponential family summation rule, this becomes

    l(θ; y) = log Π_{i=1}^{n_nodes} exp( y_i θ_i − y_{p(i)} c_i(θ_i) ) = Σ_{i=1}^{n_nodes} ( y_i θ_i − y_{p(i)} c_i(θ_i) )    (14)

This sum is linear in y. The y_{p(i)} terms can be grouped into non-random initial node data and random non-initial node data. Let J be the set of random non-initial nodes in the aster graph.
Then the result matches the unconditional canonical parameterization:

    l(θ; y) = Σ_{i∈J} y_i ( θ_i − Σ_{j∈J, p(j)=i} c_j(θ_j) ) − Σ_{j∈J, p(j)∉J} y_{p(j)} c_j(θ_j) = yᵀϕ − c(ϕ)    (15)

The last step determines the aster transform by setting the i-th component of ϕ to

    ϕ_i = θ_i − Σ_{j∈J, p(j)=i} c_j(θ_j)    (16)

and the unconditional cumulant function to

    c(ϕ) = Σ_{j∈J, p(j)∉J} y_{p(j)} c_j(θ_j)    (17)

The inverse aster transform relies on back-solving the aster transform for θ_i in terms of ϕ_i. At terminal nodes there are no successors, so ϕ_i = θ_i. The approach is to start at the terminal nodes and work towards the initial node, solving for θ_i in terms of the components of ϕ. This specifies the inverse aster transform via

    θ_i = ϕ_i + Σ_{j∈J, p(j)=i} c_j(θ_j)    (18)

where each θ_j has already been determined in a previous step of the back-solving process. The relationship between µ and ξ is one of multiplication and division:

    µ_i = E(Y_i) = E{E(Y_i | Y_{p(i)})} = E{Y_{p(i)} ξ_i} = ξ_i µ_{p(i)}    (19)

This recursion can be repeated until the evaluation reaches an initial node.

3 Random Effects

In aster models, the components of fitness already account for variability via the variance of the conditional models for each component of fitness in the aster graph. Additional sources of variability come from explicit random effects introduced into the model. Specifically, we extend the submodel to allow for random effects. Recall equation 9, ϕ = o + Mβ. We now revise this submodel to include random effects:

    ϕ = o + Mβ + Za    (20)

As before, o is an offset term, but now there are two model matrices, M and Z. M is the model matrix for fixed effects; Z is the model matrix for random effects. β and a are vectors
representing the fixed and random effects of the model. This formulation of random effect aster models was introduced in Geyer et al. (2013). Our purpose for random effects is to estimate the variability in fitness that arises from the genetic differences between individuals. The quantitative genetics literature (Lynch and Walsh, 1998; Wilson et al., 2010) provides a random effect called the breeding value for this purpose. Here we take a to be the vector of breeding values, with one component per individual.

3.1 Breeding Values

Breeding values carry a dependence structure specifying how much genetic information is shared between individuals. This structure is determined by the numerator relationship matrix N and a scalar variance component σ²_A known as the additive genetic variance. The vector of breeding values a then follows a normal distribution,

    a ~ N(0, σ²_A N)    (21)

The numerator relationship matrix indicates how much genetic similarity there is between two individuals. This is measured via family relationships between individuals, such as parent, child, sibling, and half-sibling. When there is no inbreeding (i.e. an individual's parents are unrelated), a parent and a child share half of their genetic information, since the child inherits half of its genetic information from each parent. Likewise, siblings also share half of their genetic information. The full rules for computing N under the most general circumstances can be found on page 763, equations (26.16a) and (26.16b), of Lynch and Walsh (1998). For our purposes, we rule out the possibility of inbreeding and limit the family structure to two generations. In the parent generation, all individuals are taken to be unrelated. In the offspring generation, covariance is determined by the family relationships discussed in section 3.2. Under these assumptions, the possible relationships between individuals reduce to parent, child, sibling, half-sibling, and unrelated.
With these assumptions, the amount of genetic information shared between individuals i and j is represented by n_ij and can be computed with the rules:

- Individual i with itself: n_ii = 1
- i is the parent of j: n_ij = 1/2
- i is a child of j: n_ij = 1/2
- i and j are siblings: n_ij = 1/2
- i and j are half-siblings: n_ij = 1/4
- i and j are unrelated: n_ij = 0

3.2 Pedigrees

The family relationships between all the individuals in the study are known as a pedigree and can be visualized with a family tree diagram. These diagrams use the terminology sire and dam for the father and mother, as is common in quantitative genetics. The mathematical notation follows from this, with s(i) and d(i) representing the father (sire) and mother (dam) of individual i. The pedigree for the data in this project has two generations: a parent generation without component of fitness data, and an offspring generation with component of fitness data. The parent generation is used only for the pedigree. There are two assumptions about the pedigree data:
Figure 4: An example pedigree with a parent generation and an offspring generation. Blue squares denote sires, red circles denote dams, and black diamonds denote offspring.

1. There is no inbreeding between individuals.
2. Individuals either have two parents in the pedigree or none.

An example pedigree is shown in figure 4. There are 11 individuals: 3 sires, 3 dams, and 5 offspring. The numerator relationship matrix N for all 11 individuals follows directly from the rules above: each parent has n_ii = 1 with itself, 0 with every other parent, and 1/2 with each of its offspring, while the offspring entries are determined by whether two offspring are full siblings, half-siblings, or unrelated. Since our data only include component of fitness data for the offspring, we can drop the parents from the matrix. The reduced numerator relationship matrix, which only includes information on the five offspring individuals 7, 8, 9, 10, 11, is the bottom-right 5 × 5 block of the original:

        ( 1    1/2  1/4  0    0 )
        ( 1/2  1    1/4  0    0 )
    N = ( 1/4  1/4  1    1/4  0 )
        ( 0    0    1/4  1    0 )
        ( 0    0    0    0    1 )

To summarize, the pedigree graph provides a visual tool with which to calculate the numerator relationship matrix N. This is then used to fully specify the breeding values as a ~ N(0, σ²_A N), where dim(a) = n_ind and dim(N) = n_ind × n_ind.
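The rules above are special cases of a mechanical recursion often called the tabular method (Lynch and Walsh, 1998): list parents before their offspring, set n_jj = 1 in the absence of inbreeding, and for i before j with parents s(j) and d(j) set n_ij = (n_{i,s(j)} + n_{i,d(j)})/2. The sketch below applies it to a hypothetical two-generation pedigree in the spirit of figure 4; the particular sire and dam assignments are made up for illustration.

```python
# Hypothetical pedigree: individuals 0-5 are unrelated parents
# (sires 0-2, dams 3-5); individuals 6-10 are offspring, recorded as
# (sire, dam). Parents precede their offspring in the ordering.
pedigree = {6: (0, 3), 7: (0, 3),   # 6 and 7 are full siblings
            8: (0, 4),              # 8 is a half-sibling of 6 and 7
            9: (1, 4),              # 9 is a half-sibling of 8
            10: (2, 5)}             # 10 is unrelated to the others

n_total = 11
N = [[0.0] * n_total for _ in range(n_total)]

for j in range(n_total):
    for i in range(j):
        if j in pedigree:           # tabular method recursion
            s, d = pedigree[j]
            N[i][j] = N[j][i] = 0.5 * (N[i][s] + N[i][d])
        # founders are unrelated to everyone earlier, so N[i][j] stays 0
    N[j][j] = 1.0                   # no inbreeding

print(N[6][7])    # full siblings share 1/2
print(N[6][8])    # half-siblings share 1/4
print(N[0][6])    # parent and child share 1/2
```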
3.3 Avoiding Inverting the Numerator Relationship Matrix

With the breeding values fully specified, this information can be incorporated into the log likelihood:

    l(ϕ) = l(o + Mβ + Za) + l(a)
         = l(o + Mβ + Za) − (1/(2σ²_A)) aᵀ N⁻¹ a − (1/2) log( σ_A^{2 n_ind} Det(N) )    (22)

The N⁻¹ in equation 22 poses a problem: inverting N is difficult when n_ind is large, and it usually is. This prompts a factorization of a that avoids inverting N. The key is that the dependencies in a involve offspring depending on their sire and dam. Since the joint distribution of a is normal, the univariate conditionals a_i | a_s(i), a_d(i) must also be univariate normal. In Geyer (2012, p. 3), it is shown that

    a_i | a_s(i), a_d(i) ~ N( (a_s(i) + a_d(i))/2, σ²_A/2 )    (23)

Thus the PDF of a can be factored as

    f(a) = Π_{i∈F} (2πσ²_A)^{−1/2} exp( −a_i² / (2σ²_A) ) × Π_{i∉F} (πσ²_A)^{−1/2} exp( −(2a_i − a_s(i) − a_d(i))² / (4σ²_A) )    (24)

where F is the first generation of parents in the pedigree data. If we assume that there is no component of fitness data on this generation, then all individuals in the likelihood come from offspring generations. This further simplifies the PDF of a to

    f(a) = Π_{i∉F} (πσ²_A)^{−1/2} exp( −(2a_i − a_s(i) − a_d(i))² / (4σ²_A) )    (25)

This avoids inverting the numerator relationship matrix. The additive genetic variance parameter σ²_A is sometimes divided into the variance of the genetic contributions coming from the sire, the dam, and the individual itself. This is expressed as

    σ²_A = σ²_ind + σ²_sire + σ²_dam    (26)

where

    σ²_ind = σ²_A / 2,  σ²_sire = σ²_A / 4,  σ²_dam = σ²_A / 4    (27)

4 Data

The primary data used in this project record information on Chamaecrista fasciculata grown at McCarthy Lake, MN. A second population of C. fasciculata grown at the Grey Cloud Dunes is used for prior elicitation. Both data sets feature component of fitness information and a pedigree used to compute the numerator relationship matrix N.

4.1 Why C. fasciculata?

There are several reasons why the C. fasciculata plant is well suited for a Darwinian fitness study.
Annual plants - C. fasciculata are annual plants, meaning each generation lives only one year. This allows experimenters to collect a full generation of data in a single year. In contrast, experimenters are often not able to collect complete data on perennial plants because the lifespan can be too long.

Primarily outcrossing - C. fasciculata are primarily outcrossing, as opposed to self-fertilizing. Outcrossing creates more genetic diversity by passing genes on to different plants, and in the case of C. fasciculata a mechanical mechanism is in place to prevent self-fertilization. Self-fertilization is a complication for individual model research because it can mask the pedigree of an individual: it can be difficult to determine whether an offspring plant was self-fertilized by a sole parent or crossed by a sire and a dam plant. Outcrossing is a desirable trait for individual model research because it avoids this confusion and offers a clear pedigree. In nature, bees land on C. fasciculata flowers to collect nectar; the buzzing of the bees shakes pollen free, which sticks to the bee, and the bee carries the pollen to another plant to fertilize the female ovules. In this study, researchers used a device similar to an electric toothbrush to shake the pollen free from the plants. The pollen was given to the desired dam in order to produce the desired pedigree.

Perfect flowers - C. fasciculata have perfect flowers, which means that a flower includes both male and female reproductive organs. This allows for easier determination of sire and dam plants.

Low seed dormancy tendency - C. fasciculata have little tendency for seed dormancy. In some species, seeds can remain dormant in the soil for multiple years before germinating. This adds a complication in determining the pedigree in individual models. It is possible that C.
fasciculata seeds from previous years found their way into the soil at the experiment growing sites. If these seeds were to sprout it would mix plants from an older generation and unknown parentage with the current generation of plants from the experiment. If this were to happen, the data set would not contain accurate records of the sire and dam. Fortunately, seed dormancy is uncommon for C. fasciculata plants so the recorded sire and dam in the data set can be trusted. Natural range - The natural habitat for C. fasciculata stretches into Minnesota at the northern limit of the range (figure 5). If this were not the case, estimates of Darwinian fitness would be artificially low, since plants would still need to adapt to the new environment. Furthermore, observing non-native plants would complicate genetics by environment interactions. As a native species, C. fasciculata escapes these concerns. 4.2 C. fasciculata Component of Fitness Data The components of fitness chosen for C. fasciculata are based on its life cycle and reproduction. The first (non-initial) component of fitness is germination, a Bernoulli random variable indicating whether a seed successfully germinates and sprouts out of the ground. As a plant continues to grow, it may produce flowers, giving a plant a chance to reproduce. The flower status of a plant is also taken as a Bernoulli random variable and serves as the second component of fitness. Each flower then has the potential to produce fruit. For C. fasciculata this fruit is a pea-pod and is referred to as a pod in this report. The number of pods produced must be at least one. Therefore, it is reasonable to model the pod count as a zero-truncated Poisson random variable. This becomes the third component of fitness. Like all fruit, the C. fasciculata pods contain the seeds of the plant. 
The total number of seeds across all the pods is called the total seed count and is modeled by the fourth and final component of fitness in the aster graph as a zero-truncated Poisson random variable. The resulting graph is visualized in figure 6 along with pictures of a C. fasciculata plant at each stage.

[Figure 5: Native Range of C. fasciculata. Image from plants.usda.gov.]

[Figure 6: C. fasciculata aster graph. Planted (1) -> Germination (G, Ber) -> Flowering Status (FS, Ber) -> Pod Count (PC, 0-Pois) -> Seed Count (SC, 0-Pois).]

4.3 Pedigree Data

The pedigree is divided into the parent generation, for which there is no component of fitness data, and the offspring generation, for which there is component of fitness data. The parent generation contains 48 sires (fathers) and 132 dams (mothers). The offspring generation contains component of fitness and pedigree data on 3445 individuals.
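The component-of-fitness chain in figure 6 can be simulated forward, which is a useful sanity check on the graph structure. The sketch below is Python (the report's software is R) and every rate and function name is hypothetical. Each arrow is a conditional distribution, and the seed-count node given k pods is taken here as the sum of k independent zero-truncated Poisson draws, following the usual aster-model convention that a successor is a sum of predecessor-many independent arrow distributions.

```python
import math
import random

def zero_trunc_poisson(lam, rng):
    # Rejection sampling from a zero-truncated Poisson: redraw until >= 1.
    while True:
        # Knuth's algorithm for an ordinary Poisson draw
        L, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p < L:
                break
            k += 1
        if k >= 1:
            return k

def simulate_plant(p_germ, p_flower, lam_pods, lam_seeds_per_pod, rng):
    # Walk the aster graph of figure 6; a node is zero whenever its
    # predecessor is zero (a seed that never germinates produces nothing).
    germ = 1 if rng.random() < p_germ else 0
    flower = (1 if rng.random() < p_flower else 0) if germ else 0
    pods = zero_trunc_poisson(lam_pods, rng) if flower else 0
    # Seed count given pod count: one zero-truncated Poisson draw per pod.
    seeds = sum(zero_trunc_poisson(lam_seeds_per_pod, rng) for _ in range(pods))
    return germ, flower, pods, seeds
```

By construction a flowering plant has at least one pod, and every pod contributes at least one seed, matching the zero-truncation in the graph.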
5 Bayesian Analysis

Bayesian analysis works by Bayes rule (Bayes and Price, 1763). A prior distribution P(\theta) represents the belief about the parameter before data is observed. A likelihood L(\theta; X) specifies the model by which the data is generated. The prior and likelihood combine to make the posterior distribution, which gives an updated belief about \theta after the data has been observed.

P(\theta \mid X) = \frac{L(\theta; X)\, P(\theta)}{\int L(\theta; X)\, P(\theta)\, d\theta}   (28)

The normalizing constant \int L(\theta; X) P(\theta) d\theta is constant in \theta because \theta has been integrated out. Since the posterior distribution is a valid density, it must integrate to one, and the normalizing constant is exactly the constant factor that makes it do so. As is standard practice, we relax this equation by dropping multiplicative factors that do not involve \theta. The normalizing constant in equation 28 is one such factor, but the likelihood and prior may contain other multiplicative factors not involving \theta; any function of the data only can be dropped. The posterior is now unnormalized, but this will not cause difficulties.

P(\theta \mid X) \propto L(\theta; X)\, P(\theta)   (29)

Here the right-hand side, the likelihood times the unnormalized prior, is called the unnormalized posterior.

In simple cases, the prior may be chosen from the conjugate family of the likelihood. If so, the posterior distribution is in the same distribution family as the prior, and analysis can proceed analytically. In practice, conjugate priors are often undesirable or infeasible: they may not reflect the subject matter knowledge about a parameter, or the likelihood may be a more complicated function with no known conjugate prior. Markov chain Monte Carlo (MCMC) is an alternative approach that approximates the posterior distribution numerically.

Taking the log of the posterior can often reduce difficulties with computer arithmetic overflow.
On the log scale, the same multiplicative constants that were ignored in equation 29 become additive constants that may be safely ignored. The equation becomes

\log P(\theta \mid X) = l(\theta; X) + \log P(\theta)   (30)

where the left-hand side is the log unnormalized posterior, l(\theta; X) is the log likelihood, and \log P(\theta) is the log unnormalized prior.

Under the Bayesian viewpoint, uncertainty is handled by random variables. Since the parameters built into a model are uncertain, a Bayesian maintains that parameters are random variables. Likewise, latent variables and random effects hold uncertainty in their values; the Bayesian addresses this by treating random effects as unknown parameters which are themselves random variables. From this point onward, this report takes the Bayesian view, using the terminology random effect parameters for the breeding values a. With this clarification on the Bayesian perspective, we now turn to the likelihood and priors in more detail.
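As a concrete check that dropping constants is harmless, here is a small Python sketch using a toy Beta-Binomial model (not the aster model). The log unnormalized posterior of equation 30 differs from the exact log posterior only by an additive constant, so differences between any two parameter values agree.

```python
import math

def log_unnorm_posterior(theta, x, n, a, b):
    # log likelihood (Binomial, constants dropped) + log prior (Beta, constants dropped)
    loglik = x * math.log(theta) + (n - x) * math.log(1 - theta)
    logprior = (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    return loglik + logprior

def log_exact_posterior(theta, x, n, a, b):
    # Conjugacy: the posterior is Beta(a + x, b + n - x), normalized via lgamma.
    a2, b2 = a + x, b + n - x
    lognorm = math.lgamma(a2 + b2) - math.lgamma(a2) - math.lgamma(b2)
    return lognorm + (a2 - 1) * math.log(theta) + (b2 - 1) * math.log(1 - theta)
```

Any MCMC algorithm driven by differences (or ratios) of posterior values is therefore unaffected by the missing normalizing constant.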
5.1 Log Likelihood

The log likelihood l(\varphi) = l(o + M\beta + Za) is computed via the minimal function in the animal package.³ Since a \sim N(0, \sigma_A^2 N), the log likelihood can be expressed as

l(\varphi) = l(\beta, a, \sigma_A^2)   (31)

The minimal function is written to be used with additional random effects, but can be adapted to fit an aster model with only a breeding value random effect. minimal assumes that the model has three random effect variance parameters, one each for the individual, sire, and dam random effect components. Adapting this to a model with a single variance parameter for the breeding values requires the constraint that \sigma_A^2 = \sigma_{ind}^2 + \sigma_{sire}^2 + \sigma_{dam}^2, where \sigma_{ind}^2 = \sigma_A^2/2, \sigma_{sire}^2 = \sigma_A^2/4, and \sigma_{dam}^2 = \sigma_A^2/4.

The first random effect aster paper (Geyer et al., 2013) discusses another complication in the likelihood. We would like to allow random effect variance parameters to be zero. Modeling the standard deviation parameters instead allows the standard deviation to be negative or zero and removes the restriction that the variance be positive. In addition, the minimal function avoids problems with taking the square root of negative numbers by modeling the additive genetic standard deviation instead of the additive genetic variance (Geyer et al., 2013, p. 1783). The resulting log likelihood becomes

l(\varphi) = l(\beta, a, \sigma_A) = l(\beta, a, \sigma_{ind}, \sigma_{sire}, \sigma_{dam})   (32)

5.2 Log Priors

The parameters in this model come from the aster submodel, \varphi = o + M\beta + Za. \beta is the parameter vector for fixed effects; it includes both component of fitness parameters and block effect parameters. The random effect parameter vector a includes one breeding value for each individual in the data. a has one associated variance parameter, \sigma_A^2, the additive genetic variance. Rather than model \sigma_A^2 directly, we chose to model the standard deviation \sigma_A.
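The variance-splitting constraint can be checked numerically. A minimal Python sketch (the report's computation is in R, and the helper name here is hypothetical):

```python
import math

def split_sigma_A(sigma_A):
    # Split the additive genetic standard deviation into individual, sire,
    # and dam components so that the variances satisfy
    # sigma_ind^2 = sigma_A^2 / 2 and sigma_sire^2 = sigma_dam^2 = sigma_A^2 / 4.
    return sigma_A / math.sqrt(2.0), sigma_A / 2.0, sigma_A / 2.0
```

The three component variances then sum back to the additive genetic variance, as the constraint requires.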
Priors were chosen under the assumption of independence between the components.

5.2.1 Fixed Effect Parameters

Each component of \beta has an independent logistic distribution prior. The logistic distribution is a location-scale family. We use \mu_{hp} to represent the location parameter and \sigma_{hp} to represent the scale parameter, where the subscript hp indicates that \mu_{hp} and \sigma_{hp} are hyper-parameters and are not the same as the \mu or \sigma used elsewhere in this report.

f_{\mu_{hp}, \sigma_{hp}}(\beta_i) = \frac{e^{-(\beta_i - \mu_{hp})/\sigma_{hp}}}{\sigma_{hp} \left(1 + e^{-(\beta_i - \mu_{hp})/\sigma_{hp}}\right)^2}   (33)

Here \mu_{hp} really is the mean of \beta_i, but \sigma_{hp} is not the variance. Rather, Var(\beta_i) = \sigma_{hp}^2 \pi^2 / 3. Computation is performed via the dlogis(..., log = TRUE) function in R.

³ The minimal function computes what frequentists know as the complete data log likelihood, i.e. the log likelihood if the random effects a could be observed. Under the Bayesian perspective, this is just the log likelihood.
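The logistic log density of equation 33 is straightforward to evaluate directly. Below is a Python analogue of R's dlogis(..., log = TRUE) (a sketch; the report itself calls the R function), together with a numerical check of the variance relation Var(\beta_i) = \sigma_{hp}^2 \pi^2 / 3.

```python
import math

def log_dlogis(x, mu_hp, sigma_hp):
    # Log of equation 33. With z = (x - mu_hp) / sigma_hp the density is
    # exp(-z) / (sigma_hp * (1 + exp(-z))^2), so the log is:
    z = (x - mu_hp) / sigma_hp
    return -z - math.log(sigma_hp) - 2.0 * math.log1p(math.exp(-z))
```

The density is symmetric about mu_hp, so mu_hp really is the mean, while the scale enters the variance through the factor pi^2 / 3.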
5.2.2 Random Effect Parameters

The only random effect parameter vector is the vector of breeding values, specified by a \sim N(0, \sigma_A^2 N). Since N, the numerator relationship matrix, is known, the only other parameter for the random effects is the additive genetic variance, \sigma_A^2. In principle, the additive genetic variance could be zero or any positive number, and this must be reflected in the choice of prior distribution. The PDF of an exponential distribution, f_\lambda(x) = \lambda e^{-\lambda x}, has support on the positive real numbers. Moreover, \lim_{x \to 0} f_\lambda(x) = \lambda > 0, so it is realistic to observe samples from an exponential distribution with value arbitrarily close to zero. This matches the modeling assumptions.

Rather than model the additive genetic variance \sigma_A^2 directly, we use the additive genetic standard deviation \sigma_A. If

\sigma_A^2 \sim \mathrm{Exp}(\lambda)   (34)

then the change of variable formula gives

f_\lambda(\sigma_A) = 2 \sigma_A \lambda e^{-\lambda \sigma_A^2}   (35)

Taking the log and dropping additive constant terms gives

logprior(\sigma_A) = \log(\sigma_A) - \lambda \sigma_A^2   (36)

When the log prior for \sigma_A is programmed in R, care must be taken to ensure computer arithmetic errors do not occur. For example, if equation 36 were used as is, R would encounter an overflow error when \sigma_A is too large. R would compute equation 36 as

logprior(\sigma_A) = \log(\sigma_A) - \lambda \sigma_A^2 = \log(\mathrm{Inf}) - \lambda\, \mathrm{Inf}^2 = \mathrm{Inf} - \mathrm{Inf} = \mathrm{NaN}   (37)

but the desired behavior is logprior(\sigma_A) = -Inf, since \lim_{\sigma_A \to \infty} logprior(\sigma_A) = -\infty. One solution is to return -Inf whenever the result of logprior(\sigma_A) is NaN or Inf.

5.2.3 Prior Elicitation

Researchers grew C. fasciculata plants at four different growing locations in Minnesota. The McCarthy Lake growing site is the primary location of interest, so the analysis was performed using the data from that site. Biologists consider growing conditions at Grey Cloud Dunes to be most similar to those at McCarthy Lake. This makes the Grey Cloud Dunes data useful for prior elicitation.
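The NaN/Inf guard described above can be sketched in Python (the report implements this in R; the function name is hypothetical):

```python
import math

def log_prior_sigma_A(sigma_A, lam):
    # Equation 36 up to an additive constant: log(sigma_A) - lam * sigma_A^2.
    # Non-positive values are outside the support of the prior.
    if sigma_A <= 0.0:
        return -math.inf
    val = math.log(sigma_A) - lam * (sigma_A * sigma_A)
    # Guard (equation 37): for very large sigma_A the arithmetic can produce
    # inf - inf = nan, or overflow to -inf; the correct limiting value is -inf.
    if math.isnan(val) or math.isinf(val):
        return -math.inf
    return val
```

The quadratic penalty always dominates the logarithm for large sigma_A, so replacing NaN or Inf results with -Inf recovers the correct limit rather than poisoning the MCMC acceptance ratio.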
The means and variances of each fixed effect were calculated on the Grey Cloud Dunes data, then transformed into the \mu_{hp} and \sigma_{hp} used as hyper-parameters for the logistic fixed effect priors. Researchers also came up with a point estimate for the additive genetic variance parameter to be used for the random effect; setting \lambda = 1/(\text{point estimate}) then specifies the prior distribution for \sigma_A.

6 Computation via MCMC

We used Markov chain Monte Carlo to sample from the posterior distribution. The code in this report uses the metrop function in the R package mcmc, which implements the Metropolis random-walk algorithm. This section follows a discussion of MCMC from Geyer (2011) and explains how this was implemented for the aster model fit on the C. fasciculata data.
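The transformation from empirical moments to hyper-parameters can be read off the variance relation stated in section 5.2.1. A minimal Python sketch, assuming the transformation is moment matching (the report does not spell out the formula, and the function names here are hypothetical):

```python
import math

def logistic_hyperparams(sample_mean, sample_var):
    # Moment matching for the logistic prior: the mean equals mu_hp and the
    # variance equals sigma_hp^2 * pi^2 / 3, so invert for sigma_hp.
    return sample_mean, math.sqrt(3.0 * sample_var) / math.pi

def exponential_rate(point_estimate):
    # Exponential prior rate whose mean equals the elicited point estimate.
    return 1.0 / point_estimate
```

With these helpers, each Grey Cloud Dunes fixed-effect estimate maps directly to a (mu_hp, sigma_hp) pair, and the elicited point estimate maps to lambda.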
6.1 Why MCMC?

Both Bayesian and frequentist approaches commonly encounter integrals that cannot be solved analytically. One place where these integrals arise is the likelihood normalizing constant. Another place is in drawing inference. Two common examples are the posterior expectation

E(\theta \mid X) = \int_\Theta \theta\, P(\theta \mid X)\, d\theta

and marginal posterior distributions, which are obtained by integrating undesired components of \theta out of the posterior distribution. MCMC offers the ability to avoid computing these integrals. Integrating to find the normalizing constant is avoided with a cancellation trick, and integrals that arise in the inference step can be approximated using the samples produced by MCMC rather than numerical integration.

Aster models with random effects must address these issues. With a handful of random effects, aster models may use an approximate integrated likelihood with the Breslow-Clayton approximation (Breslow and Clayton, 1993). This approach assumes that the likelihood is nearly quadratic in the random effect parameters and approximates the likelihood with a form that can be integrated analytically. It is implemented in the R package aster (Geyer, 2017a) through the function reaster. However, in high dimensional settings the Breslow-Clayton approximation breaks down and other methods are needed. Geyer et al. (2013, p. 1793) point out that the Breslow-Clayton approximation is not workable for quantitative genetic models with one random effect parameter per individual.

Numerical integration could offer a solution to these integrals, but this technique comes with its own issues. In equation 28 we saw that the posterior distribution can be decomposed into the likelihood times the prior divided by the normalizing constant.

P(\theta \mid X) = \frac{L(\theta; X)\, P(\theta)}{\int_\Theta L(\theta; X)\, P(\theta)\, d\theta}

Computing the normalizing constant \int_\Theta L(\theta; X) P(\theta) d\theta requires integrating over each dimension of the vector \theta.
We will illustrate the computational complexity of this integration in the simplest case possible, when each component of \theta is binary. Suppose \dim(\theta) = d. Then the integral becomes a summation over the d dimensions of \theta.

\int_\Theta L(\theta; X)\, P(\theta)\, d\theta = \sum_{\theta_1 \in \{0,1\}} \cdots \sum_{\theta_d \in \{0,1\}} L(\theta_1, \ldots, \theta_d; X)\, P(\theta_1, \ldots, \theta_d)   (38)

There are d components of \theta taking 2 values each, so there are 2^d terms in the sum. This is an exponential time algorithm and quickly gets out of hand: if d = 30, there are over a billion terms in the sum. This phenomenon of rapidly increasing computation with the dimension of \theta is known as the curse of dimensionality, and it prevents direct computation of the integral in a reasonable amount of time.

In cases where components of \theta are continuous rather than binary, numerical integration techniques such as the trapezoid rule or Simpson's rule suffer even more profoundly from the curse of dimensionality. These methods create a grid over the dimensions of \theta; if there are k grid points in each dimension, the computation runs in exponential time O(k^d). More advanced numerical integration methods such as sparse grids (Heiss and Winschel, 2008) can improve this running time to polynomial in d, but this still does not address the need to compute both the normalizing constant and the other integrals needed for inference.

Another approach, peculiar to the Bayesian perspective, is to use conjugate priors: a prior distribution chosen for the property that the posterior and the prior belong to the same family of distributions. For many simple likelihoods the posterior can be determined analytically to belong to the same family as the prior. For instance, when the likelihood is a binomial
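A minimal Python sketch of the 2^d enumeration in equation 38 makes the blow-up concrete (the joint function here is a hypothetical stand-in for L(\theta; X) P(\theta)):

```python
from itertools import product

def brute_force_normalizing_constant(unnorm_joint, d):
    # Enumerate all 2^d binary parameter vectors and sum the unnormalized
    # joint L(theta; X) * P(theta) over them, counting terms as we go.
    total, n_terms = 0.0, 0
    for theta in product((0, 1), repeat=d):
        total += unnorm_joint(theta)
        n_terms += 1
    return total, n_terms
```

Every extra component doubles n_terms, which is exactly why this enumeration is hopeless at d = 30 and beyond.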
distribution, setting the prior to follow a beta distribution guarantees that the posterior is also a beta distribution. However, no conjugate prior distributions are known for aster models with random effects for each individual. Consequently, we turn to MCMC to draw samples from the posterior distribution without actually knowing what that distribution is. The MCMC sampling algorithm uses a cancellation trick to produce samples from the posterior distribution without computing the normalizing constant.

6.2 Markov Chains

From a Bayesian perspective, data is not random once it has been seen. Instead, the uncertainty in the model comes from the parameter vector \theta. Thus \theta, not X, is random. A Markov chain is a sequence of random vectors \theta_1, \theta_2, \ldots having the property that the conditional distribution of each state depends only on the most recent state in the chain, not on all the previous states. That is,

P(\theta_{n+1} \mid \theta_1, \theta_2, \ldots, \theta_n) = P(\theta_{n+1} \mid \theta_n), \quad n = 1, 2, 3, \ldots   (39)

Equation 39 describes the memoryless property, so named because the process that determines \theta_{n+1} has no memory of the previous states \theta_1, \ldots, \theta_{n-1}. If further we have that P(\theta_2 \mid \theta_1) = P(\theta_3 \mid \theta_2) = \cdots = P(\theta_{n+1} \mid \theta_n), then the Markov chain has stationary transition probabilities. Markov chains with stationary transition probabilities are determined by two simpler distributions: the initial distribution P(\theta_1) and the transition probability distribution P(\theta_{n+1} \mid \theta_n). When the initial distribution and transition probability distribution interact so that the marginal distributions are equal, P(\theta_1) = P(\theta_2) = \cdots = P(\theta_n), we say that the Markov chain is in equilibrium and P(\theta_i) is the equilibrium distribution.

Often in MCMC simulations, the main interest for inference is not in the Markov chain itself, but in a particular function on the state space, g(\cdot). That is,

Raw MC: \theta_1, \theta_2, \ldots
Functional MC: g(\theta_1), g(\theta_2), \ldots   (40)
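A toy illustration of an equilibrium distribution, using a two-state chain (not part of the report's model): starting the chain in the distribution \pi satisfying \pi P = \pi leaves every subsequent marginal equal to \pi.

```python
def stationary_two_state(p01, p10):
    # Closed-form solution of pi P = pi for a two-state chain, where
    # p01 = P(next = 1 | current = 0) and p10 = P(next = 0 | current = 1).
    return (p10 / (p01 + p10), p01 / (p01 + p10))

def step(pi, p01, p10):
    # One application of the transition kernel to a marginal distribution.
    pi0, pi1 = pi
    return (pi0 * (1.0 - p01) + pi1 * p10, pi0 * p01 + pi1 * (1.0 - p10))
```

Applying the kernel to the stationary distribution returns it unchanged, which is exactly the equilibrium property described above.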
6.3 Monte Carlo

A Monte Carlo method is a way of understanding a random variable \theta through simulated data \theta_1, \theta_2, \ldots, \theta_n. The simplest case is known as ordinary Monte Carlo, where \theta_1, \theta_2, \ldots are iid draws from f_\theta(\theta). Monte Carlo uses the law of large numbers to show that the sample average converges to its expected value. That is, for a function g(\cdot),

\overline{g(\theta)}_n = \frac{1}{n} \sum_{i=1}^n g(\theta_i) \to E(g(\theta))   (41)

Since the samples \theta_i on the left-hand side are readily available, Monte Carlo is effective when the expectation involves an integral that is difficult to compute numerically. Furthermore, the central limit theorem gives the approximate normal distribution

\overline{g(\theta)}_n \approx N\left(E(g(\theta)), \frac{\sigma^2}{n}\right)   (42)
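Ordinary Monte Carlo is a few lines of code. The sketch below (Python; a toy example with hypothetical names, not the report's R code) estimates E(g(\theta)) by the sample average of equation 41 and reports the CLT standard error implied by equation 42.

```python
import math
import random

def monte_carlo_mean(g, sampler, n, rng):
    # Ordinary Monte Carlo (equation 41): average g over n iid draws, and
    # report the CLT standard error sqrt(sample variance of g / n) (equation 42).
    draws = [g(sampler(rng)) for _ in range(n)]
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / (n - 1)
    return mean, math.sqrt(var / n)
```

For example, with theta ~ N(0, 1) and g(theta) = theta^2, the estimate converges to E(theta^2) = 1, and the reported standard error shrinks at rate 1/sqrt(n).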
Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Advanced Statistics and Data Mining Summer School
More informationMetropolis-Hastings Algorithm
Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to
More informationRepresent processes and observations that span multiple levels (aka multi level models) R 2
Hierarchical models Hierarchical models Represent processes and observations that span multiple levels (aka multi level models) R 1 R 2 R 3 N 1 N 2 N 3 N 4 N 5 N 6 N 7 N 8 N 9 N i = true abundance on a
More informationBayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence
Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns
More informationOne-parameter models
One-parameter models Patrick Breheny January 22 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/17 Introduction Binomial data is not the only example in which Bayesian solutions can be worked
More informationIntroduction to Genetics
Introduction to Genetics The Work of Gregor Mendel B.1.21, B.1.22, B.1.29 Genetic Inheritance Heredity: the transmission of characteristics from parent to offspring The study of heredity in biology is
More informationBayesian linear regression
Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding
More informationBayesian Inference. Chapter 4: Regression and Hierarchical Models
Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative
More informationProbability Review - Bayes Introduction
Probability Review - Bayes Introduction Statistics 220 Spring 2005 Copyright c 2005 by Mark E. Irwin Advantages of Bayesian Analysis Answers the questions that researchers are usually interested in, What
More informationEstimation of Operational Risk Capital Charge under Parameter Uncertainty
Estimation of Operational Risk Capital Charge under Parameter Uncertainty Pavel V. Shevchenko Principal Research Scientist, CSIRO Mathematical and Information Sciences, Sydney, Locked Bag 17, North Ryde,
More informationBayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference
Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Osnat Stramer 1 and Matthew Bognar 1 Department of Statistics and Actuarial Science, University of
More information9-1 The Work of Gregor
9-1 The Work of Gregor 11-1 The Work of Gregor Mendel Mendel 1 of 32 11-1 The Work of Gregor Mendel Gregor Mendel s Peas Gregor Mendel s Peas Genetics is the scientific study of heredity. Gregor Mendel
More informationRobert Collins CSE586, PSU Intro to Sampling Methods
Robert Collins Intro to Sampling Methods CSE586 Computer Vision II Penn State Univ Robert Collins A Brief Overview of Sampling Monte Carlo Integration Sampling and Expected Values Inverse Transform Sampling
More informationBayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference
1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE
More informationComputing Likelihood Functions for High-Energy Physics Experiments when Distributions are Defined by Simulators with Nuisance Parameters
Computing Likelihood Functions for High-Energy Physics Experiments when Distributions are Defined by Simulators with Nuisance Parameters Radford M. Neal Dept. of Statistics, University of Toronto Abstract
More informationSTA 294: Stochastic Processes & Bayesian Nonparametrics
MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a
More informationBayes: All uncertainty is described using probability.
Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w
More informationCSC 2541: Bayesian Methods for Machine Learning
CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationBayesian Inference and MCMC
Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the
More informationDirected Graphical Models
CS 2750: Machine Learning Directed Graphical Models Prof. Adriana Kovashka University of Pittsburgh March 28, 2017 Graphical Models If no assumption of independence is made, must estimate an exponential
More informationStat 542: Item Response Theory Modeling Using The Extended Rank Likelihood
Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal
More informationBiol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference
Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference By Philip J. Bergmann 0. Laboratory Objectives 1. Learn what Bayes Theorem and Bayesian Inference are 2. Reinforce the properties of Bayesian
More informationGuided Reading Chapter 1: The Science of Heredity
Name Number Date Guided Reading Chapter 1: The Science of Heredity Section 1-1: Mendel s Work 1. Gregor Mendel experimented with hundreds of pea plants to understand the process of _. Match the term with
More informationPMR Learning as Inference
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationσ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =
Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,
More informationProbability and Information Theory. Sargur N. Srihari
Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal
More informationEE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS
EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS Lecture 16, 6/1/2005 University of Washington, Department of Electrical Engineering Spring 2005 Instructor: Professor Jeff A. Bilmes Uncertainty & Bayesian Networks
More informationSampling Algorithms for Probabilistic Graphical models
Sampling Algorithms for Probabilistic Graphical models Vibhav Gogate University of Washington References: Chapter 12 of Probabilistic Graphical models: Principles and Techniques by Daphne Koller and Nir
More informationPart 6: Multivariate Normal and Linear Models
Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of
More informationMA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems
MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Principles of Statistical Inference Recap of statistical models Statistical inference (frequentist) Parametric vs. semiparametric
More informationProbabilistic Graphical Networks: Definitions and Basic Results
This document gives a cursory overview of Probabilistic Graphical Networks. The material has been gleaned from different sources. I make no claim to original authorship of this material. Bayesian Graphical
More informationGeneral Bayesian Inference I
General Bayesian Inference I Outline: Basic concepts, One-parameter models, Noninformative priors. Reading: Chapters 10 and 11 in Kay-I. (Occasional) Simplified Notation. When there is no potential for
More information11.1 Traits. Studying traits
11.1 Traits Tyler has free earlobes like his father. His mother has attached earlobes. Why does Tyler have earlobes like his father? In this section you will learn about traits and how they are passed
More informationFigure 36: Respiratory infection versus time for the first 49 children.
y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects
More informationBiol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference Fall 2016
Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference Fall 2016 By Philip J. Bergmann 0. Laboratory Objectives 1. Learn what Bayes Theorem and Bayesian Inference are 2. Reinforce the properties
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More information1. What is genetics and who was Gregor Mendel? 2. How are traits passed from one generation to the next?
Chapter 11 Heredity The fruits, vegetables, and grains you eat are grown on farms all over the world. Tomato seeds produce tomatoes, which in turn produce more seeds to grow more tomatoes. Each new crop
More information