Aster Modeling of Chamaecrista fasciculata


Aster Modeling of Chamaecrista fasciculata

Allen Clark

February 9, 2018

Abstract

Aster models are a type of graphical model in which each node is modeled by an exponential family distribution. In biological applications, aster models are well suited for estimating the fitness of a species. In this report, the species of interest is Chamaecrista fasciculata, commonly known as the partridge pea. Fitness is assessed by the number of seeds each plant produced. Our model includes both fixed effects for each parameter in the graphical model and random effects to account for the genetic variability of each plant's parentage. We fit a Bayesian aster model, sampling from the posterior distribution via Markov chain Monte Carlo (MCMC). The prior distributions were elicited via subject matter expertise and additional data on Chamaecrista fasciculata from another growing location, both provided by collaborators in the evolutionary biology group.

Contents

1 Introduction
2 Aster Models
  2.1 Exponential Families
  2.2 Aster Graphs
    2.2.1 Requirements for the Aster Graph
    2.2.2 An Aster Graph Example
  2.3 Conditional and Unconditional Models
  2.4 Saturated Aster Models and Aster Submodels
  2.5 Aster Model Transformations
3 Random Effects
  3.1 Breeding Values
  3.2 Pedigrees
  3.3 Avoiding Inverting the Numerator Relationship Matrix
4 Data
  4.1 Why C. fasciculata?
  4.2 C. fasciculata Component of Fitness Data
  4.3 Pedigree Data
5 Bayesian Analysis
  5.1 Log Likelihood
  5.2 Log Priors
    5.2.1 Fixed Effect Parameters
    5.2.2 Random Effect Parameters
    5.2.3 Prior Elicitation
6 Computation via MCMC
  6.1 Why MCMC?
  6.2 Markov Chains
  6.3 Monte Carlo
  Markov Chain Monte Carlo; Metropolis Algorithm; Proposal Distributions; Metropolis Ratio; Metropolis Rejection; MCMC Spacing: Saving Memory; Checkpointing; MCMC Central Limit Theorem Variance; MCMC Diagnostics; Time Series Plots; Auto-correlation Function Plots
7 Results

1 Introduction

In evolutionary biology, one way to measure an organism's success at passing genetic information to future generations is to count the lifetime number of offspring produced. Plants or animals that produce more offspring will contribute more to the genetic makeup of the species in future generations. This counts as a reproductive success for the organism. On the other hand, organisms that do not produce any offspring will not directly contribute to the future of a species' genetics. Following the terminology of Shaw et al. (2008, p. E35), this report defines fitness as the lifetime number of offspring produced by an individual.[1]

Note that two individuals with nearly identical genetics placed in the same environment can produce different numbers of offspring purely by random chance. Therefore, the fitness of an individual is a random variable. It can take on a range of values from zero upward and has an expected value, which we call expected fitness. An organism must survive to its breeding periods in order to reproduce, and organisms that reproduce more often will have higher observed fitness. Therefore, fitness is influenced by:

1. Longevity - an organism's ability to survive
2. Fecundity - an organism's frequency of reproduction

Because fitness is count data, a naive researcher might estimate fitness with the sample mean, µ̂ = x̄, under a Poisson model, which has probability mass function f(x) = e^(−µ) µ^x / x!. Unfortunately, the easy way does not work. Figure 1 shows two problems with this Poisson approach. The observed data have a very heavy tail, and too many individuals have zero offspring. Neither a Poisson distribution nor any other known distribution can accommodate these features. The Poisson distribution considers fecundity (offspring produced), but longevity, another key aspect of fitness, has been left out of the equation.

[1] Lifetime offspring count is sometimes called observed fitness to distinguish between an organism's ability to fit into its environment and an observation of how well it can do so. See Beatty (1992, p ) for a discussion.
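To see concretely why a single Poisson fit is inadequate, the following R sketch simulates data with the two features visible in figure 1 (an excess of zeros and a heavy tail) and compares the zero fraction and variance to those of a Poisson fit with the same mean. This is simulated data only, not the C. fasciculata records, and the parameter values are arbitrary illustrations.

    set.seed(42)
    n <- 1000
    # Hurdle-style simulation: many plants leave no offspring at all;
    # survivors produce a heavy-tailed (negative binomial) count.
    alive  <- rbinom(n, size = 1, prob = 0.4)
    counts <- alive * rnbinom(n, size = 0.7, mu = 25)

    # Naive Poisson fit: a single rate estimated by the sample mean.
    mu.hat <- mean(counts)

    # The observed zero fraction and variance far exceed what Pois(mu.hat) allows.
    c(observed.zero.frac = mean(counts == 0), poisson.zero.frac = dpois(0, mu.hat))
    c(observed.variance  = var(counts),       poisson.variance  = mu.hat)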

Figure 1: Comparison of Poisson data versus fitness data; the Poisson is not a good fit. Top panel: simulated Poisson data (counts by fitness value). Bottom panel: real fitness data from Chamaecrista fasciculata. Note the heavy tail and the frequent zeros, beyond what a Poisson distribution predicts.

A better approach is to model fitness conditioned on the survival and development of an organism throughout its fertile life. The overall fitness, which is the total number of offspring, can be broken down into components of fitness, which represent key measurements related to overall fitness. Examples of components of fitness include survival to fertility, the presence or absence of offspring, and the number of offspring produced in a given breeding period. Breaking down an organism's reproduction into its biologically relevant parts provides the model with the statistical flexibility it needs to produce valid results. The approach works as follows:

1. Identify the most important life cycle stages for an organism as components of fitness.
2. Connect these components of fitness by forming a graphical dependence structure based on how they influence each other biologically.
3. Decide on statistical distributions and parameters for each component of fitness.
4. Use data to estimate the chosen parameters.

Prior to the introduction of aster models (Geyer et al., 2007), the practice was to model components of fitness separately, conditioned on survival. The sticking point was how to combine these separate analyses to draw inference about fitness. Aster models offer a solution by directly modeling the joint distribution of all components of fitness and providing parameter estimates that correspond directly to fitness. Aster models are named for the first species analyzed with them, Echinacea angustifolia, a flowering plant in the aster family (Geyer et al., 2007).

In aster models, the dependence structure between components of fitness can be represented visually in an aster graph, which for E. angustifolia is shown in figure 2. The dependence structure is the same across three years and consists of three layers: survival, flowering status, and flower count. If the plant survives, it has a chance to produce flowers. If the plant produces flowers, it can have any number of them. Survival status also depends on the survival status in the previous year, so arrows run across the survival layer from left to right. The terminal nodes, flower count, are proxies for offspring, so the sum over all terminal nodes is used as a proxy for fitness.

Figure 2: Graphical dependence structure for E. angustifolia flower count data (Geyer et al., 2007). A constant initial node (Initial = 1) is followed by Bernoulli survival nodes Y1 → Y2 → Y3 (plant survival layer); each Yt has a Bernoulli flowering status node FSt (flowering status layer), and each FSt has a zero-truncated Poisson flower count node FCt (flower count layer).
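Concretely, a graph like figure 2 can be written down as a predecessor index vector, which is the style in which the aster package (Geyer, 2017a) describes graph structure. The sketch below only illustrates that encoding for one individual; the node ordering, labels, and family strings are my own choices for illustration, not taken from the original analysis, and an actual fit would follow the package documentation.

    # Nodes of figure 2 for one individual: Y1..Y3 (survival), FS1..FS3
    # (flowering status), FC1..FC3 (flower count). pred[i] is the index of
    # node i's predecessor, with 0 denoting the constant initial node.
    nodes <- c("Y1", "Y2", "Y3", "FS1", "FS2", "FS3", "FC1", "FC2", "FC3")
    pred  <- c(   0,    1,    2,     1,     2,     3,     4,     5,     6)
    fam   <- c("bernoulli", "bernoulli", "bernoulli",
               "bernoulli", "bernoulli", "bernoulli",
               "truncated.poisson", "truncated.poisson", "truncated.poisson")

    # Display each node with its predecessor and conditional family.
    data.frame(node = nodes,
               predecessor = c("initial", nodes)[pred + 1],
               family = fam)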

2 Aster Models

Aster models are exponential family graphical models. They are exponential family models in the sense that the conditional distribution of each component of fitness given its predecessor is an exponential family, and in the sense that the joint distribution of all components of fitness is also an exponential family (see section 2.1). Aster models are also graphical models in that each component of fitness depends on the previous component of fitness in the aster graph (see section 2.2). There are six parameterizations available for aster models (see sections 2.3, 2.4, 2.5).

2.1 Exponential Families

An exponential family is a category of distributions having a common form. Many well known distributions fit into this category (e.g. normal, binomial, and Poisson). The advantage of exponential families is shared theory: once a given distribution is shown to be an exponential family, all the properties proven about exponential families apply to it. Suppose X is the raw data, and Y(X) is a k-dimensional statistic calculated from the raw data X. An exponential family is defined as any distribution with probability density function (PDF), probability mass function (PMF), or probability mass-density function (PMDF)[2] of the form

    f_θ(x) = h(x) exp( Σ_{i=1}^k y_i(x) θ_i − c(θ) )    (1)

Here h(x) is a function of the data only. The statistic Y(X) is called the canonical statistic when it is in the form of equation 1. Likewise, the parameter θ is called the canonical parameter when it is in the form of equation 1. The function c(θ) is called the cumulant function. The key to exponential families is that the statistic Y(X), the parameter θ, and the functions h(·) and c(·) are not allowed to mix data and parameter. For example, an indicator function such as h(x) = I_{x < θ} violates the separation of data and parameter rule.

Following from equation 1, the log likelihood of an exponential family is

    l(θ) = log h(x) + Σ_{i=1}^k y_i θ_i − c(θ)
         = Σ_{i=1}^k y_i θ_i − c(θ)    (dropping terms without θ)

The cumulant function c(·) is useful because it allows calculation of the mean of Y. Assuming that θ ∈ Interior(Θ), so that the canonical parameter θ is in the interior of its parameter space, exponential families have the properties

    µ = E_θ{Y} = ∇c(θ)    (2)

    Var_θ(Y) = ∇²c(θ)    (3)

The parameter µ also parameterizes an exponential family (Geyer, 2016). Since µ is also a parameter, the exponential family can use µ as a parameter in place of θ. Of course, this transformation will then require a matching transformation of Y in the distribution. The parameterization involving µ is so useful that it gets its own name, the mean-value parameterization.

[2] When some components of a random vector are discrete and others are continuous, the distribution of the random vector is partly discrete and partly continuous.
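As a small numerical illustration of equations 2 and 3, the Bernoulli distribution written in canonical form has θ = logit(p) and cumulant function c(θ) = log(1 + e^θ). The sketch below (my own example, not from the report) checks numerically that the derivatives of c recover the mean and variance.

    # Bernoulli(p) in canonical form: theta = logit(p), c(theta) = log(1 + exp(theta)).
    p        <- 0.3
    theta    <- log(p / (1 - p))
    cumulant <- function(theta) log(1 + exp(theta))

    # Numerical first and second derivatives of the cumulant function.
    h      <- 1e-5
    first  <- (cumulant(theta + h) - cumulant(theta - h)) / (2 * h)
    second <- (cumulant(theta + h) - 2 * cumulant(theta) + cumulant(theta - h)) / h^2

    c(mean.from.c = first,  expected = p)            # equation 2: c'(theta) = p
    c(var.from.c  = second, expected = p * (1 - p))  # equation 3: c''(theta) = p(1 - p)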

The mean-value parameter is useful for drawing inference in an applied problem. The canonical parameter is useful because it allows use of theorems about exponential families and can be used for maximum likelihood estimation. Aster model estimation is accomplished using the canonical parameter, which is then mapped to the mean-value parameter to draw conclusions about applied problems. Important exponential family distributions for aster models include:

  Bernoulli: Ber(p)
  Normal: N(µ, Σ)
  Negative binomial: NegBin(r, p)
  Poisson: Pois(µ)
  Zero-truncated Poisson: 0-Pois(µ)

The zero-truncated Poisson distribution is a Poisson distribution with the value zero removed as a possible value. The PMF is

    f_µ(x) = ( 1 / (1 − e^(−µ)) ) ( e^(−µ) µ^x / x! ),    x = 1, 2, ...    (4)

Notice the second factor is the usual Poisson PMF. The first factor rescales the distribution by the probability that x ≠ 0. Models using zero-truncated Poisson distributions can avoid the inflated-zeros issue in figure 1 with a two-step setup: a Bernoulli variable models whether the value is zero, and a zero-truncated Poisson models the number observed given that the Bernoulli variable is one.

2.2 Aster Graphs

Aster models have a graphical dependence structure that can be expressed visually in an aster graph. There are four rules for creating an aster graph (Geyer et al., 2007).

1. Nodes are random variables for components of fitness.
2. Edges are conditional distributions for components of fitness.
3. Predecessors are sample sizes.
4. Initial nodes are constant random variables.

An aster graph relationship like

    X —f_θ(y | x)→ Y    (5)

means that component of fitness X has direct influence on component of fitness Y. The conditional distribution is

    Y | X = Σ_{i=1}^{X} Y_i,    where the Y_i are IID with distribution f_θ    (6)

where θ is the canonical parameter of the f_θ distribution. Exponential family distributions have a special formula for the sum of n independent and identically distributed (IID) random variables. If Y_1, ..., Y_n are IID with the same exponential family distribution and cumulant function c(θ), then the sum Σ_{i=1}^n Y_i again has an exponential family distribution with canonical statistic Σ_{i=1}^n Y_i, canonical parameter θ, and cumulant function n c(θ) (Geyer, 2013, deck 2, slide 22). This gives a convenient way to find the conditional distribution of Y | X in the aster graph. For many exponential family distributions, the resulting sum will be a well known distribution. For example, if Y_1, Y_2, ..., Y_n are IID, then the distribution of Σ_{i=1}^n Y_i is

  Bin(n, p) if the Y_i are IID Ber(p)
  Pois(nµ) if the Y_i are IID Pois(µ)
  N(nµ, nσ²) if the Y_i are IID N(µ, σ²)

For other exponential family distributions, such as the zero-truncated Poisson, the sum does not follow any well known distribution. Regardless, the distribution of the sum of IID exponential family random variables is always an exponential family distribution with cumulant function n c(θ).

Initial nodes must be constant to denote the sample size of the first non-degenerate random variable in the aster graph. For example, if an experimenter planted three seeds and then established a component of fitness to track whether the seeds germinate, the first arrow in the graph would be

    3 —Ber(p)→ Germinate

where Germinate would be the sum of three independent Ber(p) random variables, which by the Bernoulli sum rule is a single Bin(3, p) random variable.

Components of fitness are often zero. When a predecessor variable is zero, the successor is an empty sum of zero terms. By convention this is also zero.

The nodes in the graph come together to form the joint distribution of all the components of fitness. Suppose we have n_nodes non-initial nodes X_1, X_2, ..., X_{n_nodes} in the aster graph. Then the joint distribution of X = {X_1, X_2, ..., X_{n_nodes}} is

    f(X_1, X_2, ..., X_{n_nodes}) = Π_{i=1}^{n_nodes} f(X_i | X_{p(i)})    (7)

where p(i) is the index of the predecessor node that comes immediately before X_i. Notice that the initial nodes are not among the factors in this product. Immediate successors of initial nodes have conditional distribution f(X_i | X_{p(i)}) = f(X_i).

2.2.1 Requirements for the Aster Graph

Aster models place some requirements on the aster graph (Geyer et al., 2007).

At most one predecessor: Each node has at most one predecessor. The initial node has no predecessors.

Acyclic: The aster graph must be acyclic. Without this property, the joint distribution cannot be factored into a product of conditionals as in equation 7. An acyclic graph is necessary to obtain closed form densities and likelihoods.

Initial node is constant: An initial node must be constant. It still represents the sample size of the succeeding random variable. Suppose we plant seeds in a garden. If we place X_initial = 3 seeds in a single slot, then there are three random variables (seeds) that can either sprout or not sprout. If X_initial = 1, we only plant one seed in the slot, and either one or zero plants can sprout from that seed.

2.2.2 An Aster Graph Example

To gain more intuition, let's apply these rules to a simple example. Suppose we want to know how many seeds a plant produces and we use three components of fitness: an initial node X_initial, the number of flowers F, and the total number of seeds S the plant produces. For simplicity

suppose we plant one plant in each growing slot, so X_initial = 1. Plants can't produce seeds without flowers, so it makes sense for F to be the predecessor of S. A plant can have any number 0, 1, 2, 3, ... of flowers, so F follows a Poisson distribution. Depending on what type of plant we are modeling, flowers have at least one seed. If each flower must have at least one seed, then S | F follows a zero-truncated Poisson distribution. Using this model we obtain the aster graph in equation 8:

    1 —Pois(µ_F)→ F —0-Pois(µ_S)→ S    (8)

With the aster graph above fully specified, the conditional distribution of seeds given flower count, S | F, is

  concentrated at zero if F = 0
  a zero-truncated Poisson if F = 1
  the sum of n IID zero-truncated Poisson random variables if F = n > 1

In the first case, the plant has no flowers, so F is zero. Hence there are no seeds, and f(S | F = 0) is a degenerate distribution concentrated at the value zero. In the second case, the plant has one flower, so F = 1. We calculate f(S | F = 1) as the distribution of a single zero-truncated Poisson random variable. In the third case, F > 1, so we get a proper summation: we take the sum of IID zero-truncated Poisson random variables over all the flowers that the plant produced.

2.3 Conditional and Unconditional Models

There are two different ways to parameterize the same saturated aster model (see section 2.4): through a conditional model or through an unconditional model. These conditional and unconditional models relate to equation 7, which we now repeat here.

    f_ϕ(X_1, X_2, ..., X_{n_nodes}) = Π_{i=1}^{n_nodes} f_{θ_i}(X_i | X_{p(i)})

The right hand side is a product of conditional PMDFs. Each random variable X_i | X_{p(i)} has conditional distribution f_{θ_i}(X_i | X_{p(i)}), which is an exponential family with canonical parameter θ_i. The joint distribution, which is obtained by taking the product, is also an exponential family distribution. The joint parameter θ = (θ_1, θ_2, ..., θ_{n_nodes}) is a valid parameter for the joint distribution, but it is not the joint distribution's canonical parameter ϕ.

The left hand side is a single unconditional PMDF. The joint random vector (X_1, X_2, ..., X_{n_nodes}) has an unconditional joint distribution f_ϕ(X_1, X_2, ..., X_{n_nodes}), which is an exponential family with canonical parameter ϕ. The conditional distributions, which are obtained by factoring, are also exponential family distributions. Each conditional parameter ϕ_i is a valid parameter for the corresponding conditional distribution f_{ϕ_i}(X_i | X_{p(i)}), but it is not the conditional distribution's canonical parameter θ_i.

The relationship between θ and ϕ is determined by a mapping called the aster transform. The aster transform maps θ → ϕ; the inverse aster transform maps ϕ → θ. Section 2.5 presents this mapping in more detail. The conditional model allows biologists to model the relationships between components of fitness that specify the aster graph (see figures 2 and 6). Both the conditional model and the unconditional model may be useful for inference depending on the application.
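To make the example concrete, the sketch below simulates individuals through the graph in equation 8, including the empty-sum convention when F = 0. The zero-truncated Poisson sampler and the parameter values are my own illustrative choices; they are not taken from the report's data or code.

    set.seed(1)

    # Sample from a zero-truncated Poisson by resampling zeros (simple, adequate here).
    rztpois <- function(n, mu) {
      x <- rpois(n, mu)
      while (any(x == 0)) x[x == 0] <- rpois(sum(x == 0), mu)
      x
    }

    # One individual: initial node 1 -> flower count F -> seed count S (equation 8).
    simulate.plant <- function(mu.F, mu.S) {
      flowers <- rpois(1, mu.F)                              # Pois(mu_F) given initial node 1
      seeds   <- if (flowers == 0) 0 else sum(rztpois(flowers, mu.S))  # empty sum is zero
      c(flowers = flowers, seeds = seeds)
    }

    t(replicate(5, simulate.plant(mu.F = 2, mu.S = 3)))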

2.4 Saturated Aster Models and Aster Submodels

So far we have discussed the aster model for a single organism. We now expand this to allow data on many individuals. In the most general case, each individual in the study may have its own aster graph. This can be handled mathematically by the framework presented in Geyer et al. (2007) and computationally with the tools included in the aster2 package (Geyer, 2017b). However, this is more general than necessary for either the E. angustifolia example (figure 2) or the C. fasciculata data (figure 6). If we assume all individuals share the same aster graph, the unconditional canonical statistic Y is a vector of length n_nodes × n_ind. Likewise, ϕ and θ are vectors of length n_nodes × n_ind. This means that the model given by the aster graph using θ or ϕ is saturated: it has as many parameters as there are data points available and has no degrees of freedom left over.

Aster models deal with the saturated model issue the same way that linear models and generalized linear models do: an affine submodel reduces the dimension of the problem and thus gives back degrees of freedom.

    Model Type            Saturated Model            Affine Submodel
    Linear Regression     Y | X ~ N(µ, σ² I_n)       µ = o + Mβ
    Logistic Regression   Y | X ~ Bin(n, p)          logit(p) = o + Mβ
    Aster Models          Y | X ~ ExpFam(ϕ)          ϕ = o + Mβ

In each case, the saturated model, which is specified by giving a probability model to the target variable Y | X, has too many parameters to estimate directly. Instead, some function of the parameter in the saturated model is reduced in dimension by mapping to o + Mβ. The dimension is now reduced to the number of columns of M, which gives back degrees of freedom. In linear or logistic regression, the model matrix M tracks covariate information. Continuous variables in X may become columns in M directly, categorical variables are converted to binary columns in M, and interaction columns or alternative basis functions may be included as desired. In aster models, the model matrix tracks the component of fitness structure in addition to other covariates. This is accomplished by treating the component of fitness as an indicator variable and appending n_nodes − 1 binary columns to the model matrix M. For an affine aster submodel,

    ϕ = o + Mβ    (9)

the offset vector o is a known vector that may optionally depend on covariate information or be left out entirely. The vector β is called the canonical submodel parameter.

2.5 Aster Model Transformations

So far we have introduced three parameterizations for aster models: conditional, unconditional, and submodel. θ, ϕ, and β are the canonical parameters for the conditional model, the unconditional model, and the submodel, respectively. Aster model parameters can also take mean-value form rather than canonical form. Like canonical parameters, mean-value parameters exist for all three model types: conditional, unconditional, and submodel. In total there are 2 × 3 = 6 possible aster model parameterizations; first choose the parameter's role, mean-value or canonical, then choose the model type, unconditional, conditional, or submodel. The mean-value parameter for the conditional model uses the parameter ξ, defined as

    ξ_i = E(Y_i | Y_{p(i)} = 1)    (10)

Figure 3: The six aster model parameterizations and the transformations between them. The conditional model (canonical parameter θ, mean-value parameter ξ) and the unconditional model (canonical parameter ϕ, mean-value parameter µ) have dimension n_ind × n_nodes; the submodel (canonical parameter β, mean-value parameter τ) has dimension n_coef. θ and ϕ are connected by the aster transform and its inverse, β and ϕ by the affine map ϕ = o + Mβ, θ and ξ by ξ_i = c_i′(θ_i), ϕ and µ by µ = ∇c(ϕ), µ and τ by τ = Mᵀµ, and ξ and µ by multiplication and division; the remaining maps have no closed form.

The ξ_i can be obtained from θ_i via ξ_i = c_i′(θ_i). The mean-value parameter for the unconditional model, µ, is defined similarly as

    µ = E(Y)    (11)

From exponential family theory, we have µ = E(Y) = ∇c(ϕ). Finally, the mean-value parameter for the submodel, τ, is the expected value of Y mapped through the submodel matrix. That is,

    τ = Mᵀ E(Y) = Mᵀ µ    (12)

The six parameters θ, ϕ, β, ξ, µ, τ are displayed in figure 3. The conditional model and unconditional model parameters all have dimension n_ind × n_nodes, since there is one component for each aster graph node per individual. The submodel parameters have dimension n_coef, the number of columns in the model matrix M. The relationships between the six parameters are indicated by the arrows in figure 3.

The aster transform converts the conditional canonical parameter θ to the unconditional canonical parameter ϕ. When we express the factorization of the log likelihood using θ we get

    l(θ; y) = Σ_{i=1}^{n_nodes} l(θ_i; y_i)    (13)

Using the fact that the successor node Y_i is the sum of Y_{p(i)} IID random variables and applying the exponential family summation rule, this becomes

    l(θ; y) = log Π_{i=1}^{n_nodes} exp( y_i θ_i − y_{p(i)} c_i(θ_i) ) = Σ_{i=1}^{n_nodes} ( y_i θ_i − y_{p(i)} c_i(θ_i) )    (14)

This sum is linear in y. The y_{p(i)} terms can be grouped into non-random initial node data and random non-initial node data. Let J be the set of random non-initial nodes in the aster graph.

Then the result matches the unconditional canonical parameterization for ϕ:

    l(θ; y) = Σ_{i=1}^{n_nodes} y_i ( θ_i − Σ_{j ∈ J, p(j) = i} c_j(θ_j) ) − Σ_{j ∈ J, p(j) ∉ J} y_{p(j)} c_j(θ_j) = yᵀϕ − c(ϕ)    (15)

The last step determines the aster transform by setting the i-th component of ϕ to

    ϕ_i = θ_i − Σ_{j ∈ J, p(j) = i} c_j(θ_j)    (16)

and the unconditional cumulant function to

    c(ϕ) = Σ_{j ∈ J, p(j) ∉ J} y_{p(j)} c_j(θ_j)    (17)

The inverse aster transform relies on back-solving the aster transform for θ_i in terms of ϕ_i. At terminal nodes there are no successors, so ϕ_i = θ_i. The approach is to start with the terminal nodes and work towards the initial node to solve for θ_i in terms of the components of ϕ. This specifies the inverse aster transform via

    θ_i = ϕ_i + Σ_{j ∈ J, p(j) = i} c_j(θ_j)    (18)

where θ_j has already been determined from the previous step in the back-solving process.

The relationship between µ and ξ is that of multiplication and division:

    µ_i = E(Y_i) = E{E(Y_i | Y_{p(i)})} = E{Y_{p(i)} ξ_i} = ξ_i E{Y_{p(i)}} = ξ_i µ_{p(i)}    (19)

This process can be repeated until the evaluation reaches an initial node.

3 Random Effects

In aster models, the components of fitness already account for variability via the variance of the conditional models for each component of fitness in the aster graph. Additional sources of variability come from explicit random effects introduced into the model. Specifically, we extend the submodel to allow for random effects. Recall equation 9, ϕ = o + Mβ. We now revise this submodel to include random effects:

    ϕ = o + Mβ + Za    (20)

As before, o is an offset term. But now there are two model matrices, M and Z. M is the model matrix for fixed effects. Z is the model matrix for random effects. β and a are vectors

representing the fixed and random effects of the model. This formulation of random effect aster models was introduced in Geyer et al. (2013). Our purpose for random effects is to estimate the variability in fitness that arises from the genetic differences between individuals. The quantitative genetics literature (Lynch and Walsh, 1998; Wilson et al., 2010) provides a random effect called the breeding value for this purpose. Here we take a to be the breeding values, a vector with one component per individual.

3.1 Breeding Values

Breeding values retain a dependence structure specifying how much genetic information is shared between individuals. This structure is determined by the numerator relationship matrix N and a scalar variance component σ_A², known as the additive genetic variance. The vector of breeding values a then follows the normal distribution

    a ~ N(0, σ_A² N)    (21)

The numerator relationship matrix indicates how much genetic similarity there is between two individuals. This is measured via family relationships between individuals such as parent, child, sibling, half-sibling, etc. When there is no inbreeding (i.e. an individual's parents are unrelated), a parent and a child share half of their genetic information, since the child inherits half of its genetic information from each parent. Likewise, siblings also share half of their genetic information. The full rules for computing N under the most general circumstances can be found on page 763, in equations (26.16a) and (26.16b), of Lynch and Walsh (1998). For our purposes, we rule out the possibility of inbreeding and limit the family structure to two generations. In the parent generation all individuals are taken to be unrelated. In the offspring generation, covariance is determined by the family relationships discussed in section 3.2. Under these assumptions, the possible relationships between individuals are reduced to parent, child, sibling, half-sibling, and unrelated. With these assumptions, the amount of genetic information shared between individuals i and j is represented by n_ij and can be computed with the rules:

  Individual i with itself: n_ii = 1
  i is the parent of j: n_ij = 1/2
  i is a child of j: n_ij = 1/2
  i and j are siblings: n_ij = 1/2
  i and j are half-siblings: n_ij = 1/4
  i and j are unrelated: n_ij = 0

3.2 Pedigrees

The set of family relationships between all the individuals in a study is known as a pedigree and can be visualized with a family tree diagram. These diagrams use the terminology sire and dam for the father and mother, as is common in quantitative genetics. The mathematical notation follows from this, with s(i) and d(i) representing the father (sire) and mother (dam) of individual i. The pedigree for the data in this project has two generations: a parent generation without component of fitness data, and an offspring generation with component of fitness data. The parent generation is used only for the pedigree. There are two assumptions about the pedigree data:

1. There is no inbreeding between individuals.
2. Individuals either have two parents in the pedigree or none.

Figure 4: A pedigree with a parent generation and an offspring generation. Blue squares denote sires, red circles denote dams, and black diamonds denote offspring.

An example pedigree is shown in figure 4. There are 11 individuals: 3 sires, 3 dams, and 5 offspring. For this example, the numerator relationship matrix N is the 11 × 11 symmetric matrix built from the rules in section 3.1: each diagonal entry is 1, each parent–offspring and full-sibling pair has entry 1/2, each half-sibling pair has entry 1/4, and unrelated pairs (including all pairs within the parent generation) have entry 0.

Since our data only include component of fitness data for the offspring, we can drop the parents from the matrix. The reduced numerator relationship matrix, which only includes information on the five offspring (individuals 7, 8, 9, 10, 11), is the bottom right 5 × 5 block of the original.

To summarize, the pedigree graph provides a visual tool with which to calculate the numerator relationship matrix N. This is then used to fully specify the breeding values as a ~ N(0, σ_A² N), where dim(a) = n_ind and dim(N) = n_ind × n_ind.
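To show how a numerator relationship matrix like this might be computed in practice, the sketch below applies the standard recursion for non-inbred pedigrees (an offspring's entry with any earlier individual is the average of the entries for its sire and dam), which reproduces the parent/sibling/half-sibling rules above. The small pedigree here is hypothetical; it is not the pedigree of figure 4.

    # Hypothetical two-generation pedigree (parents listed before offspring):
    # individuals 1-4 are unrelated founders; 5-8 are offspring with both parents known.
    ped <- data.frame(
      id   = 1:8,
      sire = c(NA, NA, NA, NA, 1, 1, 2, 2),
      dam  = c(NA, NA, NA, NA, 3, 3, 3, 4))

    # Numerator relationship matrix: founders are unrelated with n_ii = 1;
    # for an offspring j, n_ij = (n_{i,s(j)} + n_{i,d(j)}) / 2 for earlier i,
    # and n_jj = 1 because there is no inbreeding.
    n <- nrow(ped)
    N <- diag(n)
    for (j in seq_len(n)) {
      if (is.na(ped$sire[j])) next                     # founder row/column stays as-is
      for (i in seq_len(j - 1)) {
        N[i, j] <- (N[i, ped$sire[j]] + N[i, ped$dam[j]]) / 2
        N[j, i] <- N[i, j]
      }
    }

    offspring <- which(!is.na(ped$sire))
    N[offspring, offspring]   # 1/2 for full sibs, 1/4 for half sibs, 0 for unrelated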

3.3 Avoiding Inverting the Numerator Relationship Matrix

With the breeding values fully specified, this information can be incorporated into the log likelihood:

    l(ϕ) = l(o + Mβ + Za) + log f(a)
         = l(o + Mβ + Za) − aᵀ N⁻¹ a / (2σ_A²) − (1/2) log( σ_A^{2 n_ind} Det(N) ) + constant    (22)

The N⁻¹ in equation 22 poses a problem. Inverting N is difficult when n_ind is large, and it usually is. This prompts a factorization of f(a) that avoids inverting N. The key is that the dependencies in a involve offspring depending on their sire and dam. Since the joint distribution of a is normal, the univariate conditionals a_i | a_{s(i)}, a_{d(i)} must also be univariate normal. In Geyer (2012, p. 3), it is shown that

    a_i | a_{s(i)}, a_{d(i)} ~ N( (a_{s(i)} + a_{d(i)}) / 2, σ_A² / 2 )    (23)

Thus the PDF of a can be factored as

    f(a) = Π_{i ∈ F} (2πσ_A²)^{−1/2} exp( −a_i² / (2σ_A²) ) × Π_{i ∉ F} (πσ_A²)^{−1/2} exp( −( a_i − (a_{s(i)} + a_{d(i)})/2 )² / σ_A² )    (24)

where F is the first generation of parents in the pedigree data. If we assume that there is no component of fitness data on this generation, then all individuals in the likelihood come from offspring generations. This further simplifies the PDF of a to

    f(a) = Π_{i ∉ F} (πσ_A²)^{−1/2} exp( −( a_i − (a_{s(i)} + a_{d(i)})/2 )² / σ_A² )    (25)

This avoids inverting the numerator relationship matrix. The additive genetic variance parameter σ_A² is sometimes divided into the variance of the genetic information coming from the sire, the dam, and the individual itself. This is expressed by

    σ_A² = σ_ind² + σ_sire² + σ_dam²    (26)

where

    σ_ind² = σ_A²/2,  σ_sire² = σ_A²/4,  σ_dam² = σ_A²/4    (27)

4 Data

The primary data used in this project record information on Chamaecrista fasciculata grown at McCarthy Lake, MN. A second population of C. fasciculata grown at Grey Cloud Dunes is used for prior elicitation. Both data sets feature component of fitness information and a pedigree used to compute the numerator relationship matrix N.

4.1 Why C. fasciculata?

There are several reasons why the C. fasciculata plant is well suited for a Darwinian fitness study.

Annual plants - C. fasciculata are annual plants, which means that each generation of plants lives only one year. This allows experimenters to collect a full generation of data in only one year. In contrast, experimenters are often not able to collect complete data on perennial plants because the lifespan can be too long.

Primarily outcrossing - C. fasciculata are primarily out-crossing, as opposed to self fertilizing. Out-crossing allows plants to create more genetic diversity by passing genes on to different plants. In the case of C. fasciculata, a mechanical mechanism is in place to prevent self fertilization. Self fertilization is a complication for individual model research because it can mask the pedigree of an individual: it can be difficult to determine whether an offspring plant was self fertilized by its sole parent or crossed by a sire and dam plant. Out-crossing is a desirable trait for individual model research because it avoids this confusion and offers a clear pedigree. In the case of C. fasciculata, bees land on the flowers to collect nectar. The buzzing of the bees shakes pollen free, and the pollen sticks to the bee, which carries it to another plant to fertilize the female ovules. In this study, researchers used a device similar to an electric toothbrush to shake the pollen free from the plants. The pollen was given to the desired dam in order to produce the desired pedigree.

Perfect flowers - C. fasciculata have perfect flowers, which means that a flower includes both male and female reproductive organs. This allows for easier determination of sire and dam plants.

Low seed dormancy tendency - C. fasciculata have little tendency for seed dormancy. In some species, seeds can remain dormant in the soil for multiple years before germination. This adds an additional complication in determining the pedigree in individual models. It is possible that C. fasciculata seeds from previous years found their way into the soil at the experiment growing sites. If these seeds were to sprout, they would mix plants from an older generation and unknown parentage with the current generation of plants from the experiment. If this were to happen, the data set would not contain accurate records of the sire and dam. Fortunately, seed dormancy is uncommon for C. fasciculata, so the recorded sire and dam in the data set can be trusted.

Natural range - The natural habitat for C. fasciculata stretches into Minnesota at the northern limit of the range (figure 5). If this were not the case, estimates of Darwinian fitness would be artificially low, since plants would still need to adapt to the new environment. Furthermore, observing non-native plants would complicate genetics-by-environment interactions. As a native species, C. fasciculata escapes these concerns.

4.2 C. fasciculata Component of Fitness Data

The components of fitness chosen for C. fasciculata are based on its life cycle and reproduction. The first (non-initial) component of fitness is germination, a Bernoulli random variable indicating whether a seed successfully germinates and sprouts out of the ground. As a plant continues to grow, it may produce flowers, giving the plant a chance to reproduce. The flowering status of a plant is also taken as a Bernoulli random variable and serves as the second component of fitness. Each flower then has the potential to produce fruit. For C. fasciculata this fruit is a pea-pod and is referred to as a pod in this report. Given that the plant flowers, the number of pods produced must be at least one.
Therefore, it is reasonable to model the pod count as a zero-truncated Poisson random variable. This becomes the third component of fitness. Like all fruit, the C. fasciculata pods contain the seeds of the plant. The total number of seeds across all the pods is called the total seed count and is modeled by the fourth and final component of fitness in the aster graph as a zero-truncated 15

Poisson random variable. The resulting graph is visualized in figure 6, along with pictures of a C. fasciculata plant at each stage.

Figure 5: Native range of C. fasciculata. Image from plants.usda.gov.

Figure 6: C. fasciculata aster graph. A constant initial node (1, planted) is followed by germination G (Bernoulli), flowering status FS (Bernoulli), pod count PC (zero-truncated Poisson), and seed count SC (zero-truncated Poisson).

4.3 Pedigree Data

The pedigree is divided into the parent generation, for which there is no component of fitness data, and the offspring generation, for which there is component of fitness data. The parent generation contains 48 sires (fathers) and 132 dams (mothers). The offspring generation contains component of fitness and pedigree data on 3445 individuals.
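Given pedigree data in this sire/dam form, the factored breeding-value density from equation 25 in section 3.3 can be evaluated without forming or inverting N. The sketch below is a minimal illustration under assumed variable names (a vector a of breeding values and integer vectors sire and dam indexing into it); it is not the project's actual code.

    # Log density of the offspring breeding values (equation 25): each offspring i
    # contributes a normal term with mean (a[sire[i]] + a[dam[i]]) / 2 and
    # variance sigmaA^2 / 2, so N never needs to be inverted.
    log.breeding.density <- function(a, sire, dam, sigmaA) {
      offspring <- which(!is.na(sire))
      means <- (a[sire[offspring]] + a[dam[offspring]]) / 2
      sum(dnorm(a[offspring], mean = means, sd = sigmaA / sqrt(2), log = TRUE))
    }

    # Tiny made-up example: individuals 1-4 are founders, 5-6 are offspring.
    a    <- c(0.1, -0.2, 0.05, 0.3, 0.0, -0.1)
    sire <- c(NA, NA, NA, NA, 1, 2)
    dam  <- c(NA, NA, NA, NA, 3, 4)
    log.breeding.density(a, sire, dam, sigmaA = 0.5)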

5 Bayesian Analysis

Bayesian analysis works by Bayes' rule (Bayes and Price, 1763). A prior distribution represents the belief about the parameter before data are observed. A likelihood specifies the model by which the data are generated. The prior and likelihood combine to make the posterior distribution, which gives an updated belief about θ after the data have been observed:

    P(θ | X) = L(θ; X) P(θ) / ∫ L(θ; X) P(θ) dθ    (28)

where P(θ | X) is the posterior, L(θ; X) is the likelihood, P(θ) is the prior, and ∫ L(θ; X) P(θ) dθ is the normalizing constant. The normalizing constant is constant in θ because θ has been integrated out. Since the posterior distribution is a valid density, it must integrate to one; the normalizing constant is the constant factor that makes the posterior integrate to one.

As is standard practice, we relax this equation by dropping multiplicative factors that do not include θ. The normalizing constant in equation 28 is one such factor, but the likelihood and prior may contain other multiplicative factors not involving θ. In this way, any function of the data only can be dropped. The posterior is now unnormalized, but this will not cause difficulties:

    P(θ | X) = L(θ; X) P(θ)    (29)

where P(θ | X) is now the unnormalized posterior, L(θ; X) is the likelihood, and P(θ) is the unnormalized prior. In simple cases, the prior may be chosen to be the conjugate family of the likelihood. If this is the case, then the posterior distribution is in the same distribution family as the prior, and analysis can proceed analytically. In practice, conjugate priors are often undesirable or infeasible because they may not reflect the subject matter knowledge about a parameter, or the likelihood may be a more complicated function having no known conjugate prior. Markov chain Monte Carlo (MCMC) is an alternative approach that provides a way to approximate the posterior distribution numerically.

Taking the log of the posterior can often reduce difficulties with computer arithmetic overflow. On the log scale, the same multiplicative constants that were ignored in equation 29 become additive constants that may be safely ignored. The equation becomes

    log P(θ | X) = l(θ; X) + log P(θ)    (30)

where log P(θ | X) is the log unnormalized posterior, l(θ; X) is the log likelihood, and log P(θ) is the log unnormalized prior.

Under the Bayesian viewpoint, uncertainty is handled by random variables. Since the parameters built into a model are uncertain, a Bayesian maintains that parameters are random variables. Likewise, latent variables and random effects hold uncertainty in their values. The Bayesian addresses this by treating random effects as unknown parameters which are themselves random variables. From this point onward, this report takes the Bayesian view, using the terminology random effect parameters for the breeding values a. With this clarification of the Bayesian perspective, we now turn to the likelihood and priors in more detail.
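Equation 30 is the quantity the sampler actually evaluates. The sketch below shows that structure on a deliberately tiny toy problem (a single normal mean with a logistic prior); it only illustrates the shape of the computation, and the real model would substitute the aster log likelihood of section 5.1 and the priors of section 5.2.

    # Log unnormalized posterior (equation 30): log likelihood plus log prior,
    # additive constants dropped. Toy data and hyper-parameters, chosen for
    # illustration only.
    toy.data <- c(1.2, 0.8, 1.5, 0.9)

    log.unnormalized.posterior <- function(theta) {
      loglik   <- sum(dnorm(toy.data, mean = theta, sd = 1, log = TRUE))
      logprior <- dlogis(theta, location = 0, scale = 2, log = TRUE)
      loglik + logprior
    }

    log.unnormalized.posterior(1.0)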

5.1 Log Likelihood

The log likelihood l(ϕ) = l(o + Mβ + Za) is computed via the minimal function in the animal package.[3] Since a ~ N(0, σ_A² N), the log likelihood can be expressed as

    l(ϕ) = l(β, a, σ_A²)    (31)

The minimal function is written to be used with additional random effects, but it can be adapted to fit an aster model with only a breeding value random effect. minimal assumes that the model has three random effect parameters: individual, sire, and dam random effect components. Adapting this to a model with a single random effect parameter for breeding value requires the constraint that σ_A² = σ_ind² + σ_sire² + σ_dam², where

    σ_ind² = σ_A²/2,  σ_sire² = σ_A²/4,  σ_dam² = σ_A²/4

The first random effect aster paper (Geyer et al., 2013) discusses another complication in the likelihood. We would like to allow random effect variance parameters to be zero. Modeling the standard deviation parameters instead allows the standard deviation to be negative or zero and removes the restriction that the variance be positive. In addition, the minimal function avoids problems with taking the square root of negative numbers by modeling the additive genetic standard deviation instead of the additive genetic variance (Geyer et al., 2013, p. 1783). The resulting log likelihood becomes

    l(ϕ) = l(β, a, σ_A) = l(β, a, σ_ind, σ_sire, σ_dam)    (32)

5.2 Log Priors

The parameters in this model come from the aster submodel ϕ = o + Mβ + Za. β is the parameter vector for fixed effects; it includes both component of fitness parameters and block effect parameters. The random effect parameter vector a includes one breeding value for each individual in the data. a has one associated variance parameter, σ_A², which is the additive genetic variance. Rather than model σ_A² directly, we chose to model the standard deviation σ_A. Priors were chosen under the assumption of independence between the components.

5.2.1 Fixed Effect Parameters

Each component of β has an independent logistic distribution prior. The logistic distribution is a location-scale family. We use µ_hp to represent the location parameter and σ_hp to represent the scale parameter, where the subscript hp indicates that µ_hp and σ_hp are hyper-parameters and are not the same as the µ or σ used elsewhere in this report.

    f_{µ_hp, σ_hp}(β_i) = exp( −(β_i − µ_hp)/σ_hp ) / ( σ_hp ( 1 + exp( −(β_i − µ_hp)/σ_hp ) )² )    (33)

Here µ_hp really is the mean of β_i, but σ_hp is not the variance. Rather, Var(β_i) = σ_hp² π² / 3. Computation is performed via the dlogis(..., log = TRUE) function in R.

[3] The minimal function computes what frequentists know as the complete data log likelihood, i.e. the log likelihood if the random effects a could be observed. Under the Bayesian perspective, this is just the log likelihood.
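A minimal sketch of that log prior computation for the fixed effects is below. The conversion from a prior variance to the logistic scale parameter uses Var(β_i) = σ_hp² π²/3 from above; the hyper-parameter values shown are placeholders, not the elicited ones.

    # Log unnormalized prior for the fixed effects: independent logistic priors
    # (equation 33), summed on the log scale with dlogis(..., log = TRUE).
    logistic.scale.from.variance <- function(v) sqrt(3 * v) / pi   # Var = scale^2 * pi^2 / 3

    log.prior.beta <- function(beta, prior.mean, prior.var) {
      sum(dlogis(beta, location = prior.mean,
                 scale = logistic.scale.from.variance(prior.var), log = TRUE))
    }

    # Placeholder hyper-parameters (one per coefficient), not the elicited values.
    beta       <- c(0.2, -1.0, 0.5)
    prior.mean <- c(0.0,  0.0, 0.0)
    prior.var  <- c(4.0,  4.0, 4.0)
    log.prior.beta(beta, prior.mean, prior.var)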

5.2.2 Random Effect Parameters

The only random effect parameter vector is the breeding value vector, specified by a ~ N(0, σ_A² N). Since N, the numerator relationship matrix, is known, the only other parameter for the random effects is the additive genetic variance σ_A². In principle, the additive genetic variance could be zero or any positive number, so this must be reflected in the choice of the prior distribution. The PDF of an exponential distribution, f_λ(x) = λ e^(−λx), has support on the positive real numbers. Moreover, lim_{x→0} f_λ(x) = λ > 0. As a result, it is realistic to observe samples from an exponential distribution arbitrarily close to zero. This matches the modeling assumptions. Rather than model the additive genetic variance σ_A² directly, we use the additive genetic standard deviation σ_A. If

    σ_A² ~ Exp(λ)    (34)

then the change of variables formula gives

    f_λ(σ_A) = 2 σ_A λ e^(−λ σ_A²)    (35)

Taking the log and dropping additive constant terms gives

    logprior(σ_A) = log(σ_A) − λ σ_A²    (36)

When the log prior for σ_A is programmed in R, care must be taken to ensure computer arithmetic errors do not occur. For example, if equation 36 were used as is, R would encounter an overflow error when σ_A is too large. R would compute equation 36 as

    logprior(σ_A) = log(σ_A) − λ σ_A² = log(Inf) − λ Inf² = Inf − Inf = NaN    (37)

but the desired behavior is logprior(σ_A) = -Inf, since lim_{σ_A → ∞} logprior(σ_A) = −∞. One solution is to return -Inf whenever the result of logprior(σ_A) is NaN or Inf.

5.2.3 Prior Elicitation

Researchers grew C. fasciculata plants at four different growing locations in Minnesota. The McCarthy Lake growing site is the primary location of interest; consequently, the analysis was performed using the data from the McCarthy Lake site. Biologists consider growing conditions at Grey Cloud Dunes to be most similar to those at McCarthy Lake. This makes the Grey Cloud Dunes data useful for prior elicitation. The means and variances of each fixed effect were calculated on the Grey Cloud Dunes data, then transformed into the µ_hp and σ_hp used as hyper-parameters for the logistic fixed effect priors. Researchers also provided a point estimate of the additive genetic variance to be used for the random effect; setting λ = 1/(point estimate) then specifies the prior distribution for σ_A.

6 Computation via MCMC

We used Markov chain Monte Carlo to sample from the posterior distribution. The code in this report uses the metrop function in the R package mcmc, which implements the random-walk Metropolis algorithm. This section follows a discussion of MCMC from Geyer (2011) and explains how this was implemented for the aster model fit on the C. fasciculata data.
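The overflow guard described in section 5.2.2 might be written as follows. This is a sketch of equation 36 with the NaN/Inf check; λ is left as an argument rather than the elicited value, and restricting the prior support to σ_A > 0 is my reading of equation 35.

    # Log unnormalized prior for the additive genetic standard deviation
    # (equation 36), guarding against overflow: if the arithmetic produces
    # NaN or +Inf, return -Inf so the Metropolis step simply rejects.
    logprior.sigmaA <- function(sigmaA, lambda) {
      if (sigmaA <= 0) return(-Inf)          # assumed support: positive reals
      out <- log(sigmaA) - lambda * sigmaA^2
      if (is.nan(out) || out == Inf) out <- -Inf
      out
    }

    logprior.sigmaA(0.5, lambda = 2)   # ordinary value
    logprior.sigmaA(Inf, lambda = 2)   # Inf - Inf would be NaN; the guard returns -Inf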

6.1 Why MCMC?

Both Bayesian and frequentist approaches commonly encounter integrals that cannot be solved analytically. One place where these integrals arise is the likelihood normalizing constant. Another place is in drawing inference. Two common examples are the posterior expectation E(θ | X) = ∫_Θ θ P(θ | X) dθ and marginal posterior distributions, which are obtained by integrating undesired components of θ out of the posterior distribution. MCMC offers the ability to avoid computing these integrals. Integrating the normalizing constant is avoided with a cancellation trick. Integrals that arise in the inference step can be approximated using the samples produced by MCMC rather than numerical integration.

Aster models with random effects must address these issues. With a handful of random effects, aster models may use an approximate integrated likelihood with the Breslow-Clayton approximation (Breslow and Clayton, 1993). This approach assumes that the likelihood is nearly quadratic in the random effect parameters and approximates the likelihood with a form that can be solved analytically. This is implemented in the R package aster (Geyer, 2017a) through the function reaster. However, in high dimensional settings, the Breslow-Clayton approximation breaks down and other methods are needed. Geyer et al. (2013, p. 1793) point out that the Breslow-Clayton approximation is not workable for quantitative genetic models with one random effect parameter per individual.

Numerical integration could offer a solution to these integrals, but this technique comes with its own issues. In equation 28 we saw that the posterior distribution can be decomposed into the likelihood times the prior divided by the normalizing constant:

    P(θ | X) = L(θ; X) P(θ) / ∫_Θ L(θ; X) P(θ) dθ

Computing the normalizing constant ∫_Θ L(θ; X) P(θ) dθ requires integrating over each dimension of the vector θ. We illustrate the computational complexity of this integration in the simplest case possible, when each component of θ is binary. Suppose dim(θ) = d. Then the integral becomes a summation over the d dimensions of θ:

    ∫_Θ L(θ; X) P(θ) dθ = Σ_{θ_i ∈ {0,1}, i = 1, 2, ..., d} L(θ_1, ..., θ_d; X) P(θ_1, ..., θ_d)    (38)

There are d components of θ taking 2 values each, so there will be 2^d terms in the sum. This is an exponential time algorithm and quickly gets out of hand. For instance, if d = 30, there are over a billion terms in the sum. This phenomenon of rapidly increasing computation with the dimension of θ is known as the curse of dimensionality, and it prevents direct computation of the integral in a reasonable amount of time.

In cases where components of θ are continuous rather than binary, numerical integration techniques such as the trapezoid rule or Simpson's rule suffer even more profoundly from the curse of dimensionality. These methods create a grid over the dimensions of θ. If there are k grid points in each dimension, the computation will run in exponential time O(k^d). More advanced numerical integration methods such as sparse grids (Heiss and Winschel, 2008) can improve this running time to polynomial in d, but this still does not address the need to compute both the normalizing constant and the other integrals needed for inference.

Another approach, peculiar to the Bayesian perspective, is to use conjugate priors: a prior distribution chosen for the property that the posterior and the prior belong to the same family of distributions.
For many simple likelihood distributions, the posterior can be determined analytically to belong to the same distribution family. For instance, when the likelihood is a binomial

distribution, setting the prior to follow a beta distribution guarantees that the posterior is also a beta distribution. However, no conjugate prior distributions are known for aster models with random effects for each individual. Consequently, we turn to MCMC to draw samples from the posterior distribution without actually knowing what that distribution is. The MCMC sampling algorithm uses a cancellation trick to produce samples from the posterior distribution without computing the normalizing constant.

6.2 Markov Chains

From a Bayesian perspective, data are not random once they have been seen. Instead, the uncertainty in the model comes from the parameter vector θ. Thus θ, not X, is random. A Markov chain is a sequence of random vectors θ_1, θ_2, ... having the property that the conditional distribution of the next state depends only on the most recent state in the chain, not on all the previous states. That is,

    P(θ_{n+1} | θ_1, θ_2, ..., θ_n) = P(θ_{n+1} | θ_n),    n = 1, 2, 3, ...    (39)

Equation 39 describes the memoryless property, named because the process that determines θ_{n+1} has no memory of the previous states θ_1, ..., θ_{n−1}. If further we have that P(θ_2 | θ_1) = P(θ_3 | θ_2) = ... = P(θ_{n+1} | θ_n), then the Markov chain has stationary transition probabilities. Markov chains with stationary transition probabilities are determined by two simpler distributions: the initial distribution P(θ_1) and the transition probability distribution P(θ_{n+1} | θ_n). When the initial distribution and transition probability distribution interact so that the marginal distributions are equal, P(θ_1) = P(θ_2) = ... = P(θ_n), we say that the Markov chain is in equilibrium and P(θ_i) is the equilibrium distribution. Often in MCMC simulations, the main interest for inference is not in the Markov chain itself, but in a particular function on the state space, g(·). That is,

    Raw MC: θ_1, θ_2, ...
    Functional MC: g(θ_1), g(θ_2), ...    (40)

6.3 Monte Carlo

A Monte Carlo method is a way of understanding a random variable θ through simulated data θ_1, θ_2, ..., θ_n. The simplest case is known as ordinary Monte Carlo, where θ_1, θ_2, ... are IID draws from f_θ(θ). Monte Carlo uses the law of large numbers to show that the sample average converges to the corresponding expected value. That is, for a function g(·),

    ḡ(θ)_n = (1/n) Σ_{i=1}^n g(θ_i) → E(g(θ))    (41)

Since the samples θ_i on the left hand side are readily available, Monte Carlo is effective when the expectation involves an integral that is difficult to compute numerically. Furthermore, the central limit theorem gives an approximation to the normal distribution as

    ḡ(θ)_n ≈ N( E(g(θ)), σ²/n )    (42)
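The report states that sampling is done with metrop from the mcmc package, and the Monte Carlo averaging of equation 41 is then applied to the resulting chain. A minimal sketch of that workflow on a toy log posterior of the same shape as the one in section 5 is below; the tuning values (scale, number of iterations) are arbitrary illustrations, not the settings used for the C. fasciculata fit.

    library(mcmc)

    # Toy log unnormalized posterior (same structure as equation 30): a normal
    # likelihood for a single mean with a logistic prior. This stands in for the
    # aster log posterior, which has the same calling convention.
    toy.data <- c(1.2, 0.8, 1.5, 0.9)
    lupost <- function(theta) {
      sum(dnorm(toy.data, mean = theta, sd = 1, log = TRUE)) +
        dlogis(theta, location = 0, scale = 2, log = TRUE)
    }

    # Random-walk Metropolis: metrop() takes the log unnormalized density,
    # an initial state, the number of iterations, and a proposal scale.
    set.seed(42)
    out <- metrop(lupost, initial = 0, nbatch = 5000, scale = 0.8)
    out$accept        # acceptance rate, used to tune 'scale'

    # Ordinary Monte Carlo averaging (equation 41) applied to the MCMC output.
    mean(out$batch)   # estimates the posterior mean of theta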


More information

E. Santovetti lesson 4 Maximum likelihood Interval estimation

E. Santovetti lesson 4 Maximum likelihood Interval estimation E. Santovetti lesson 4 Maximum likelihood Interval estimation 1 Extended Maximum Likelihood Sometimes the number of total events measurements of the experiment n is not fixed, but, for example, is a Poisson

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

Lecture 9. Short-Term Selection Response: Breeder s equation. Bruce Walsh lecture notes Synbreed course version 3 July 2013

Lecture 9. Short-Term Selection Response: Breeder s equation. Bruce Walsh lecture notes Synbreed course version 3 July 2013 Lecture 9 Short-Term Selection Response: Breeder s equation Bruce Walsh lecture notes Synbreed course version 3 July 2013 1 Response to Selection Selection can change the distribution of phenotypes, and

More information

Heredity.. An Introduction Unit 5: Seventh Grade

Heredity.. An Introduction Unit 5: Seventh Grade Heredity.. An Introduction Unit 5: Seventh Grade Why don t you look like a rhinoceros? The answer seems simple --- neither of your parents is a rhinoceros (I assume). But there is more to this answer than

More information

Analysing geoadditive regression data: a mixed model approach

Analysing geoadditive regression data: a mixed model approach Analysing geoadditive regression data: a mixed model approach Institut für Statistik, Ludwig-Maximilians-Universität München Joint work with Ludwig Fahrmeir & Stefan Lang 25.11.2005 Spatio-temporal regression

More information

17 : Markov Chain Monte Carlo

17 : Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo

More information

Based on slides by Richard Zemel

Based on slides by Richard Zemel CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Introduction to Probabilistic Graphical Models

Introduction to Probabilistic Graphical Models Introduction to Probabilistic Graphical Models Sargur Srihari srihari@cedar.buffalo.edu 1 Topics 1. What are probabilistic graphical models (PGMs) 2. Use of PGMs Engineering and AI 3. Directionality in

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

Part 2: One-parameter models

Part 2: One-parameter models Part 2: One-parameter models 1 Bernoulli/binomial models Return to iid Y 1,...,Y n Bin(1, ). The sampling model/likelihood is p(y 1,...,y n ) = P y i (1 ) n P y i When combined with a prior p( ), Bayes

More information

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture

More information

1 Probabilities. 1.1 Basics 1 PROBABILITIES

1 Probabilities. 1.1 Basics 1 PROBABILITIES 1 PROBABILITIES 1 Probabilities Probability is a tricky word usually meaning the likelyhood of something occuring or how frequent something is. Obviously, if something happens frequently, then its probability

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As

More information

ABC methods for phase-type distributions with applications in insurance risk problems

ABC methods for phase-type distributions with applications in insurance risk problems ABC methods for phase-type with applications problems Concepcion Ausin, Department of Statistics, Universidad Carlos III de Madrid Joint work with: Pedro Galeano, Universidad Carlos III de Madrid Simon

More information

Bayesian Regression (1/31/13)

Bayesian Regression (1/31/13) STA613/CBB540: Statistical methods in computational biology Bayesian Regression (1/31/13) Lecturer: Barbara Engelhardt Scribe: Amanda Lea 1 Bayesian Paradigm Bayesian methods ask: given that I have observed

More information

Chapter Eleven: Heredity

Chapter Eleven: Heredity Genetics Chapter Eleven: Heredity 11.1 Traits 11.2 Predicting Heredity 11.3 Other Patterns of Inheritance Investigation 11A Observing Human Traits How much do traits vary in your classroom? 11.1 Traits

More information

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

Bayesian Graphical Models

Bayesian Graphical Models Graphical Models and Inference, Lecture 16, Michaelmas Term 2009 December 4, 2009 Parameter θ, data X = x, likelihood L(θ x) p(x θ). Express knowledge about θ through prior distribution π on θ. Inference

More information

Chris Bishop s PRML Ch. 8: Graphical Models

Chris Bishop s PRML Ch. 8: Graphical Models Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

Introduction to Applied Bayesian Modeling. ICPSR Day 4

Introduction to Applied Bayesian Modeling. ICPSR Day 4 Introduction to Applied Bayesian Modeling ICPSR Day 4 Simple Priors Remember Bayes Law: Where P(A) is the prior probability of A Simple prior Recall the test for disease example where we specified the

More information

1 Mendel and His Peas

1 Mendel and His Peas CHAPTER 6 1 Mendel and His Peas SECTION Heredity 7.2.d California Science Standards BEFORE YOU READ After you read this section, you should be able to answer these questions: What is heredity? Who was

More information

DAG models and Markov Chain Monte Carlo methods a short overview

DAG models and Markov Chain Monte Carlo methods a short overview DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex

More information

Model comparison. Christopher A. Sims Princeton University October 18, 2016

Model comparison. Christopher A. Sims Princeton University October 18, 2016 ECO 513 Fall 2008 Model comparison Christopher A. Sims Princeton University sims@princeton.edu October 18, 2016 c 2016 by Christopher A. Sims. This document may be reproduced for educational and research

More information

1 Probabilities. 1.1 Basics 1 PROBABILITIES

1 Probabilities. 1.1 Basics 1 PROBABILITIES 1 PROBABILITIES 1 Probabilities Probability is a tricky word usually meaning the likelyhood of something occuring or how frequent something is. Obviously, if something happens frequently, then its probability

More information

Bayesian Methods with Monte Carlo Markov Chains II

Bayesian Methods with Monte Carlo Markov Chains II Bayesian Methods with Monte Carlo Markov Chains II Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University hslu@stat.nctu.edu.tw http://tigpbp.iis.sinica.edu.tw/courses.htm 1 Part 3

More information

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

Bayesian Inference. Chapter 4: Regression and Hierarchical Models Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Advanced Statistics and Data Mining Summer School

More information

Metropolis-Hastings Algorithm

Metropolis-Hastings Algorithm Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to

More information

Represent processes and observations that span multiple levels (aka multi level models) R 2

Represent processes and observations that span multiple levels (aka multi level models) R 2 Hierarchical models Hierarchical models Represent processes and observations that span multiple levels (aka multi level models) R 1 R 2 R 3 N 1 N 2 N 3 N 4 N 5 N 6 N 7 N 8 N 9 N i = true abundance on a

More information

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns

More information

One-parameter models

One-parameter models One-parameter models Patrick Breheny January 22 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/17 Introduction Binomial data is not the only example in which Bayesian solutions can be worked

More information

Introduction to Genetics

Introduction to Genetics Introduction to Genetics The Work of Gregor Mendel B.1.21, B.1.22, B.1.29 Genetic Inheritance Heredity: the transmission of characteristics from parent to offspring The study of heredity in biology is

More information

Bayesian linear regression

Bayesian linear regression Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding

More information

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

Bayesian Inference. Chapter 4: Regression and Hierarchical Models Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative

More information

Probability Review - Bayes Introduction

Probability Review - Bayes Introduction Probability Review - Bayes Introduction Statistics 220 Spring 2005 Copyright c 2005 by Mark E. Irwin Advantages of Bayesian Analysis Answers the questions that researchers are usually interested in, What

More information

Estimation of Operational Risk Capital Charge under Parameter Uncertainty

Estimation of Operational Risk Capital Charge under Parameter Uncertainty Estimation of Operational Risk Capital Charge under Parameter Uncertainty Pavel V. Shevchenko Principal Research Scientist, CSIRO Mathematical and Information Sciences, Sydney, Locked Bag 17, North Ryde,

More information

Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference

Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Osnat Stramer 1 and Matthew Bognar 1 Department of Statistics and Actuarial Science, University of

More information

9-1 The Work of Gregor

9-1 The Work of Gregor 9-1 The Work of Gregor 11-1 The Work of Gregor Mendel Mendel 1 of 32 11-1 The Work of Gregor Mendel Gregor Mendel s Peas Gregor Mendel s Peas Genetics is the scientific study of heredity. Gregor Mendel

More information

Robert Collins CSE586, PSU Intro to Sampling Methods

Robert Collins CSE586, PSU Intro to Sampling Methods Robert Collins Intro to Sampling Methods CSE586 Computer Vision II Penn State Univ Robert Collins A Brief Overview of Sampling Monte Carlo Integration Sampling and Expected Values Inverse Transform Sampling

More information

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference 1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE

More information

Computing Likelihood Functions for High-Energy Physics Experiments when Distributions are Defined by Simulators with Nuisance Parameters

Computing Likelihood Functions for High-Energy Physics Experiments when Distributions are Defined by Simulators with Nuisance Parameters Computing Likelihood Functions for High-Energy Physics Experiments when Distributions are Defined by Simulators with Nuisance Parameters Radford M. Neal Dept. of Statistics, University of Toronto Abstract

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

Bayes: All uncertainty is described using probability.

Bayes: All uncertainty is described using probability. Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

Directed Graphical Models

Directed Graphical Models CS 2750: Machine Learning Directed Graphical Models Prof. Adriana Kovashka University of Pittsburgh March 28, 2017 Graphical Models If no assumption of independence is made, must estimate an exponential

More information

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal

More information

Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference

Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference By Philip J. Bergmann 0. Laboratory Objectives 1. Learn what Bayes Theorem and Bayesian Inference are 2. Reinforce the properties of Bayesian

More information

Guided Reading Chapter 1: The Science of Heredity

Guided Reading Chapter 1: The Science of Heredity Name Number Date Guided Reading Chapter 1: The Science of Heredity Section 1-1: Mendel s Work 1. Gregor Mendel experimented with hundreds of pea plants to understand the process of _. Match the term with

More information

PMR Learning as Inference

PMR Learning as Inference Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning

More information

Graphical Models and Kernel Methods

Graphical Models and Kernel Methods Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

Probability and Information Theory. Sargur N. Srihari

Probability and Information Theory. Sargur N. Srihari Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal

More information

EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS

EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS Lecture 16, 6/1/2005 University of Washington, Department of Electrical Engineering Spring 2005 Instructor: Professor Jeff A. Bilmes Uncertainty & Bayesian Networks

More information

Sampling Algorithms for Probabilistic Graphical models

Sampling Algorithms for Probabilistic Graphical models Sampling Algorithms for Probabilistic Graphical models Vibhav Gogate University of Washington References: Chapter 12 of Probabilistic Graphical models: Principles and Techniques by Daphne Koller and Nir

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Principles of Statistical Inference Recap of statistical models Statistical inference (frequentist) Parametric vs. semiparametric

More information

Probabilistic Graphical Networks: Definitions and Basic Results

Probabilistic Graphical Networks: Definitions and Basic Results This document gives a cursory overview of Probabilistic Graphical Networks. The material has been gleaned from different sources. I make no claim to original authorship of this material. Bayesian Graphical

More information

General Bayesian Inference I

General Bayesian Inference I General Bayesian Inference I Outline: Basic concepts, One-parameter models, Noninformative priors. Reading: Chapters 10 and 11 in Kay-I. (Occasional) Simplified Notation. When there is no potential for

More information

11.1 Traits. Studying traits

11.1 Traits. Studying traits 11.1 Traits Tyler has free earlobes like his father. His mother has attached earlobes. Why does Tyler have earlobes like his father? In this section you will learn about traits and how they are passed

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference Fall 2016

Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference Fall 2016 Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference Fall 2016 By Philip J. Bergmann 0. Laboratory Objectives 1. Learn what Bayes Theorem and Bayesian Inference are 2. Reinforce the properties

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

1. What is genetics and who was Gregor Mendel? 2. How are traits passed from one generation to the next?

1. What is genetics and who was Gregor Mendel? 2. How are traits passed from one generation to the next? Chapter 11 Heredity The fruits, vegetables, and grains you eat are grown on farms all over the world. Tomato seeds produce tomatoes, which in turn produce more seeds to grow more tomatoes. Each new crop

More information