OEB 242: Population Genetics Exam Review, Spring 2015

OEB 242: Population Genetics
Statistics Review

HYPOTHESIS TESTING

The null hypothesis has two parts: substantive (what values do we expect if nothing interesting is happening?) and formal (how much deviation from the expected values do we allow?). Some exemplars:

$H_0$: Alleles at locus A and locus B assort independently; thus any deviation from a 1:1:1:1 gametic ratio is no greater than could be explained by chance alone at the α = 0.05 level.

$H_0$: The population is in Hardy-Weinberg equilibrium; thus any deviation from $p^2 : 2pq : q^2$ genotypic proportions (a 1:2:1 ratio when $p = q = 1/2$) is no greater than could be explained by chance alone at the α = 0.1 level.

The p-value represents $P(\text{data at least as extreme as observed} \mid H_0)$.

Statistics means never having to say you're certain: you must specify a significance threshold. If you fail to reject $H_0$, why not simply "accept" $H_0$? If you reject $H_0$, what can you therefore conclude (if anything)?

Degrees of freedom are critical for connecting the test statistic to a p-value in, e.g., a chi-squared test. Given a contingency table, the d.f. represents the minimum number of entries necessary to repopulate the entire table, holding constant what is known about the dataset (the total number of datapoints and the proportions of datapoints that fall into one class or another). Find it by taking the total number of classes of data, subtracting 1 for fixing $N_{\mathrm{tot}}$, and subtracting 1 more for every independent parameter estimated when furnishing the expected values:
$\mathrm{d.f.} = (\text{number of classes}) - 1 - (\text{number of estimated parameters})$

RANDOM VARIABLES

A random variable is an unspecified value that takes on actual values according to a probability distribution. The mean, or expected value, is a weighted average of the possible values an r.v. can take. The variance is the expected value of the squared deviations from the mean: $\mathrm{Var}(X) = E[(X - \mu)^2]$.

We have used a few different kinds of random variables in this course:

Binomial random variables represent the number of successes in n independent trials, each of which has probability of success p. An example is the Wright-Fisher model of drift, where we imagine reproduction as sampling from an infinite pool of gametes.
$P(X = k \mid X \sim \mathrm{Bin}(n, p)) = \binom{n}{k} p^k q^{n-k}$
$E[X] = np$; $\mathrm{Var}(X) = npq$

Poisson random variables approximate binomial random variables with large n and small p, with $\lambda = np$. They are computationally more tractable and are useful to describe scenarios where you have very many chances to do something rare. Mutations, for example, are modeled as a Poisson process.
$P(X = k \mid X \sim \mathrm{Pois}(\lambda)) = \frac{\lambda^k e^{-\lambda}}{k!}$
$E[X] = \mathrm{Var}(X) = \lambda$

Geometric random variables represent the number of failures before the first success, where each trial succeeds with probability p. The Kingman coalescent, for example, treats non-coalescence in a generation as a failure with probability $q = 1 - p$ and coalescence as a success; for a pair of lineages in a diploid population of size N, $p = 1/(2N)$.
$P(X = k \mid X \sim \mathrm{Geom}(p)) = q^k p$
$E[X] = q/p$; $\mathrm{Var}(X) = q/p^2$

Exponential random variables are the continuous analogues of geometric random variables. The Kingman coalescent often uses this approximation, which holds when the population is large.
$f(t \mid T \sim \mathrm{Exp}(\lambda)) = \lambda e^{-\lambda t}$ for $t \ge 0$
$E[T] = \lambda^{-1}$; $\mathrm{Var}(T) = \lambda^{-2}$

There are a few different ways to talk about the dependencies of random variables:

Covariance is an analogue of variance for two random variables. It describes the extent to which two r.v.s track each other: if one changes, how does the other change? Covariance is the expected value of the products of the deviations from the means: $\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$.

The correlation coefficient is a scaled version of covariance that falls between −1 (perfectly anticorrelated) and 1 (perfectly correlated). To normalize, divide the covariance by the product of the standard deviations (i.e., the square roots of the variances) of the two random variables: $\rho = \mathrm{Cov}(X, Y) / (\sigma_X \sigma_Y)$.

The slope of the regression line is slightly different: $b = \mathrm{Cov}(X, Y) / \mathrm{Var}(X)$ measures the steepness of the association between two r.v.s (the expected change in Y per unit change in X), whereas correlation measures the precision or tightness of that association. Recall, for example, that the slope of the regression of offspring on midparent values gives the narrow-sense heritability.

Good luck on the exam, try the review problems on the website, and don't forget to send in your final papers via email by 6PM on Weds 5/6 (end of reading period).
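
As a minimal sketch (Python, with hypothetical midparent and offspring values), the three dependence measures above can be computed from sample data. Doubling the offspring values doubles the regression slope but leaves the correlation unchanged, which is the steepness-versus-tightness distinction drawn above. The function names are illustrative only.

```python
from statistics import mean


def sample_covariance(xs, ys):
    """Sample covariance: average product of deviations from the means (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance scaled by the product of the two standard deviations; lies in [-1, 1]."""
    return sample_covariance(xs, ys) / (
        sample_covariance(xs, xs) ** 0.5 * sample_covariance(ys, ys) ** 0.5
    )


def regression_slope(xs, ys):
    """Least-squares slope of ys on xs: Cov(X, Y) / Var(X)."""
    return sample_covariance(xs, ys) / sample_covariance(xs, xs)


# Hypothetical midparent values and offspring means (illustrative numbers only).
midparent = [1.0, 2.0, 3.0, 4.0, 5.0]
offspring = [0.5 * x + 3.0 for x in midparent]
offspring_scaled = [2.0 * y for y in offspring]

print(regression_slope(midparent, offspring), correlation(midparent, offspring))
print(regression_slope(midparent, offspring_scaled), correlation(midparent, offspring_scaled))
```

Read as a heritability estimate, the first slope would be h² = 0.5 for these made-up numbers; real family data scatter around the line, which pulls the correlation below 1 without necessarily changing the slope.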

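The degrees-of-freedom recipe from the hypothesis-testing section can be made concrete with a chi-squared test of Hardy-Weinberg proportions. This is a minimal sketch assuming hypothetical genotype counts: three genotype classes, minus 1 for fixing N_tot, minus 1 for the estimated allele frequency p, leaves 1 d.f. For 1 d.f. the chi-squared upper-tail probability can be written with the complementary error function, so only the Python standard library is needed.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-squared goodness-of-fit test of genotype counts against HWE proportions."""
    n = n_AA + n_Aa + n_aa
    # One parameter is estimated from the data: the frequency p of allele A.
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    observed = [n_AA, n_Aa, n_aa]
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # d.f. = 3 classes - 1 (fixing N_tot) - 1 (estimating p) = 1
    df = 3 - 1 - 1
    # Upper-tail probability of a chi-squared variable with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, df, p_value


# Hypothetical genotype counts for AA, Aa, aa (heterozygote deficit).
print(hwe_chi_square(40, 20, 40))
```

A statistic above about 3.84 corresponds to p < 0.05 at 1 d.f., so we would reject $H_0$ at α = 0.05. The sketch assumes a polymorphic sample (a monomorphic one gives zero expected counts); with small expected counts an exact test is usually preferred.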
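Chapter 4: Mutation and Neutral Theory

Infinite alleles model
Assume each mutation creates a new allele. Hence, homozygosity implies identity-by-descent (autozygosity).
F = homozygosity = probability that two randomly chosen chromosomes (alleles) are IBD:
$F_t = (1 - \mu)^2 \left[ \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) F_{t-1} \right]$
Equilibrium value of F (mutation-drift balance; $F_t = F_{t-1}$): $\hat{F} \approx \frac{1}{1 + 4N\mu} = \frac{1}{1 + \Theta}$, where $\Theta = 4N\mu$.
H = heterozygosity = $1 - F$.
Equilibrium value of H (mutation-drift balance; $H_t = H_{t-1}$): $\hat{H} \approx \frac{4N\mu}{1 + 4N\mu} = \frac{\Theta}{1 + \Theta}$

Infinite sites model
Assume each mutation affects one base and that there is no recombination. This is used to derive the Kingman coalescent: while there are k lineages, the waiting time (in generations) to the next coalescence is
$T_k \sim \mathrm{Exp}\left(\binom{k}{2} \frac{1}{2N}\right)$, so $E[T_k] = \frac{2N}{\binom{k}{2}} = \frac{4N}{k(k-1)}$
Use it to predict the number of mutations separating two sequences (= pairwise diversity or per-site heterozygosity, Π) by multiplying the mutation rate by the length of the two branches:
$E[\Pi] = \mu \cdot 2E[T_2] = \mu \cdot 4N = \Theta$
Use it to predict the number of segregating sites in a sample of k alleles by multiplying the mutation rate by the total length of the tree:
$E[S] = \mu \sum_{j=2}^{k} j \, E[T_j] = \Theta \sum_{i=1}^{k-1} \frac{1}{i}$

The neutral theory
The infinite sites model lets us estimate Θ in several different ways, which forms the basis of Tajima's D and other neutrality tests:
$D = \frac{\hat{\Theta}_\pi - \hat{\Theta}_S}{\sqrt{\widehat{\mathrm{Var}}(\hat{\Theta}_\pi - \hat{\Theta}_S)}}$, where $\hat{\Theta}_\pi = \Pi$ and $\hat{\Theta}_S = S \big/ \sum_{i=1}^{k-1} \frac{1}{i}$
The denominator is a normalizing factor (an estimate of the standard deviation of the numerator) whose formula is tedious to derive. The numerator tells us whether pairwise diversity or the total number of segregating sites gives the greater estimate of Θ.
If D is positive, this suggests a surplus of intermediate-frequency alleles (which inflate pairwise diversity disproportionately). This is consistent with deep/ancient coalescent times (long internal branches), and suggests balancing selection or admixture.
If D is negative, this suggests a surplus of rare alleles (which inflate the number of segregating sites disproportionately). This is consistent with shallow/recent coalescent times or star-like genealogies with long external branches, and suggests directional selection (e.g., a recent sweep) or population growth.

In the neutral theory, the probability of fixation of an allele is its current frequency. As a corollary, the fixation (substitution) rate equals the neutral mutation rate, independent of population size: the probability that any one new allele fixes is $1/(2N)$, and the population-wide rate of new mutations is $2N\mu$ per generation. The product of these two values is simply µ.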
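
A small Wright-Fisher simulation illustrates the claim that a neutral allele's probability of fixation equals its current frequency. This is a sketch under assumed parameters (2N = 20 gene copies, starting counts of 5 and 1, 2,000 replicates, an arbitrary seed); the binomial sampling of each generation is done with one uniform draw per gene copy so that only the standard library is needed.

```python
import random


def wright_fisher_fixes(two_n, copies, rng):
    """Run one neutral Wright-Fisher replicate; return True if the allele fixes."""
    while 0 < copies < two_n:
        freq = copies / two_n
        # Each of the 2N gene copies in the next generation independently
        # picks a parental copy, so the new count is Binomial(2N, freq).
        copies = sum(rng.random() < freq for _ in range(two_n))
    return copies == two_n


def fixation_fraction(two_n, copies, reps, seed=1):
    """Fraction of replicates in which an allele starting at `copies` of `two_n` fixes."""
    rng = random.Random(seed)
    fixed = sum(wright_fisher_fixes(two_n, copies, rng) for _ in range(reps))
    return fixed / reps


# Hypothetical run: starting frequency 5/20 = 0.25, so roughly a quarter should fix.
print(fixation_fraction(two_n=20, copies=5, reps=2000))
```

Starting from a single new copy (copies=1) the expected fixation fraction is 1/(2N) = 0.05, matching the per-mutation fixation probability used above.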
Because new neutral alleles fix at a rate of µ per generation, the average time between fixation events is $1/\mu$ generations. The expected time to fixation of a new neutral allele, given that it will eventually fix, is 4N generations. The expected time to loss of a new neutral allele, conditional on its eventual loss, is $2\ln(2N)$ generations.
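
Tajima's D can be computed directly from aligned sequences. The sketch below assumes equal-length strings with no missing data, writes n for the sample size (k in the notes), and uses the standard normalizing constants from Tajima's 1989 formulation for the denominator. The toy alignments are hypothetical and chosen only to show the sign behaviour described above: a sample made entirely of singletons gives D < 0, and intermediate-frequency variants give D > 0.

```python
import math
from itertools import combinations


def segregating_sites(seqs):
    """Number of alignment columns with more than one distinct base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of differences over all pairs of sequences (Theta_pi)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def tajimas_d(seqs):
    """Tajima's D; returns NaN when there are no segregating sites."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        return float("nan")
    pi = mean_pairwise_differences(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator: Theta_pi minus Watterson's Theta_S = S / a1.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


singletons = ["1000", "0100", "0010", "0001"]   # every variant seen once
intermediate = ["11", "11", "00", "00"]         # every variant at frequency 1/2
print(tajimas_d(singletons), tajimas_d(intermediate))
```

Real analyses also need to handle gaps, missing data, and multi-allelic sites, which this sketch ignores.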

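Iterating the infinite-alleles recursion for F shows the approach to mutation-drift balance. This is a minimal sketch with hypothetical values of N and µ: the exact fixed point comes from setting $F_t = F_{t-1}$ in the recursion and solving for F, and $1/(1 + 4N\mu)$ is its approximation when µ is small and N is large.

```python
def next_f(f_prev, n, mu):
    """One generation of the infinite-alleles recursion for F (diploid size n)."""
    return (1 - mu) ** 2 * (1 / (2 * n) + (1 - 1 / (2 * n)) * f_prev)


def exact_equilibrium_f(n, mu):
    """Fixed point of next_f, from F = s * (1/(2n) + (1 - 1/(2n)) * F) with s = (1 - mu)^2."""
    s = (1 - mu) ** 2
    return s / (2 * n) / (1 - s * (1 - 1 / (2 * n)))


def iterate_f(n, mu, f0=0.0, generations=10_000):
    """Start from F = f0 and apply the recursion for a fixed number of generations."""
    f = f0
    for _ in range(generations):
        f = next_f(f, n, mu)
    return f


# Hypothetical population size and per-generation mutation rate (4N*mu = 0.4).
N, MU = 100, 1e-3
print(iterate_f(N, MU), exact_equilibrium_f(N, MU), 1 / (1 + 4 * N * MU))
```

The equilibrium is reached from either F = 0 or F = 1, since each generation shrinks the distance to the fixed point by a factor of about $1 - 2\mu - 1/(2N)$.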
Chapter 7: Molecular Population Genetics

Here we are looking at timescales that don't allow us to invoke the infinite sites/alleles models. Generally, we need to account for the possibility of multiple mutations at the same site. Distinguish d (the number of differences observed) from k (the number of substitutions inferred).

The procedure for inferring substitutions from differences depends on whether we are talking about amino acids or nucleotides, and on what assumptions we make about the probabilities of various mutations. The Jukes-Cantor model, for example, assumes that all mutations are equally probable. From this assumption, one can establish a recurrence equation for the probability of a site taking on a given identity, which can then be translated into a differential equation and solved to find an estimator for k based on d.

We can test for selection in protein-coding regions by comparing, per site, either the rate of differences (dN/dS) or the rate of substitutions (Ka/Ks) for non-synonymous (amino-acid-changing) mutations versus synonymous mutations. Mutations that change protein structure are presumably more visible to selection than those that do not. In calculating these statistics, one must account for the number of sites that are potentially synonymous or non-synonymous; a twofold degenerate site, for instance, is often counted as 2/3 non-synonymous and 1/3 synonymous.
Under neutrality, the ratio ≈ 1.
Under purifying (negative) selection, the ratio may be less than 1 (changes to the protein are not tolerated by selection and are removed).
Under positive selection, the ratio may be greater than 1 (changes to the protein are favored and accumulate at an accelerated rate).

The molecular clock assumes that the number of substitutions is directly proportional to the amount of evolutionary time that separates two sequences.
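
Under the Jukes-Cantor model, the estimator for k from the observed proportion of differing sites p = d/L (L aligned sites) takes the closed form $\hat{k} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$ substitutions per site. The sketch below assumes two aligned, equal-length nucleotide strings with no gaps; the sequences are hypothetical. It shows that the corrected value always exceeds p and is undefined once p reaches the saturation value of 3/4.

```python
import math


def proportion_different(seq1, seq2):
    """Observed proportion of aligned sites that differ (d divided by sequence length)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jukes_cantor_k(p):
    """Jukes-Cantor estimate of substitutions per site from the proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("JC correction is undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical aligned sequences differing at 2 of 20 sites (p = 0.1).
s1 = "ACGTACGTACGTACGTACGT"
s2 = "ACGTACCTACGTACGAACGT"
p = proportion_different(s1, s2)
print(p, jukes_cantor_k(p))
```

For small p the correction is slight (p = 0.1 gives k ≈ 0.107), but it grows quickly as multiple hits at the same site become common.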
It is critical to realize, however, that $T_{\mathrm{MRCA}}$ is half of that total evolutionary time, because the two lineages descending from the common ancestor together sum to the total. In general, substitution rates do vary across organisms, across genomic regions, etc., and so the clock is not always constant. But the point remains that we can quantify these rates and then make assumptions about them in order to interpret their significance.

The McDonald-Kreitman test assumes that, under neutrality, a molecular-clock-type assumption will preserve the dN/dS ratio across large evolutionary timescales. The test quantifies this by comparing the dN/dS for recent, microevolutionary events (which give rise to polymorphism within a population or species) with the dN/dS for ancient, macroevolutionary events (which give rise to differences between species). The test is most straightforwardly implemented as a chi-squared test using a contingency table of non-synonymous/synonymous and polymorphic/divergent mutations. In this case there are four data classes and three fixed parameters ($N_{\mathrm{tot}}$, % NS vs. S, % P vs. D), meaning that we have 1 d.f.
If polymorphism is not proportional to divergence, we must interpret:
If divergence exceeds polymorphism ($D_n/D_s > P_n/P_s$): suggests, e.g., positive selection between species.
If divergence falls short of polymorphism ($D_n/D_s < P_n/P_s$): suggests, e.g., purifying selection between species, or balancing selection within the population.
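
The McDonald-Kreitman contingency table can be tested with a chi-squared statistic on 1 d.f. This sketch assumes hypothetical counts of divergent (D) and polymorphic (P) non-synonymous (n) and synonymous (s) changes and omits any continuity correction. It also reports the ratio $(P_n/P_s)/(D_n/D_s)$, often called the neutrality index, which falls below 1 when divergence is relatively enriched for non-synonymous changes.

```python
import math


def mk_test(dn, ds, pn, ps):
    """Chi-squared test (1 d.f.) and neutrality index for a McDonald-Kreitman table."""
    # Rows: non-synonymous, synonymous. Columns: divergent (fixed), polymorphic.
    table = [[dn, pn], [ds, ps]]
    total = dn + ds + pn + ps
    row_sums = [sum(row) for row in table]
    col_sums = [dn + ds, pn + ps]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the NS:S ratio is the same for divergence and polymorphism.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 d.f.
    neutrality_index = (pn / ps) / (dn / ds)
    return chi2, p_value, neutrality_index


# Hypothetical counts with an excess of non-synonymous divergence.
print(mk_test(dn=20, ds=10, pn=5, ps=25))
```

With small counts, Fisher's exact test on the same 2×2 table is the usual alternative, and the neutrality index is undefined when $P_s$ or $D_n$ is zero.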

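The $T_{\mathrm{MRCA}}$ point from the Chapter 7 molecular-clock discussion can be written as a short worked equation. This assumes a constant substitution rate µ per site per unit time along each lineage, so two sequences that split $T_{\mathrm{MRCA}}$ ago are separated by $2T_{\mathrm{MRCA}}$ of total branch length; the numbers in the example are hypothetical.

```latex
k = 2\mu T_{\mathrm{MRCA}}
\quad\Longrightarrow\quad
T_{\mathrm{MRCA}} = \frac{k}{2\mu}

% Hypothetical example: k = 0.02 substitutions/site, \mu = 10^{-9} per site per year
T_{\mathrm{MRCA}} = \frac{0.02}{2 \times 10^{-9}} = 10^{7}\ \text{years}
```

Forgetting the factor of 2 would double the estimated age of the common ancestor.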
Chapter 8: Evolutionary Quantitative Genetics

The Mendelian paradigm is monogenic, but we want to be able to talk about polygenic (complex, quantitative) traits. We want to be able to ask questions like: how much do genetics influence the phenotype? Unfortunately, this is an ill-formed question, and there's no way to talk about how "genetic" a trait is in the abstract. We have to ground our discussion in populations. This opens the door for the concept of heritability.
Technical definition: the proportion of phenotypic variance attributable to genetic variance.
Interpretation: the extent to which genetic differences among individuals explain phenotypic differences among individuals.
It doesn't tell us how many genes are involved in a trait, for example, but it does help us understand the relative contributions of genetics and environment for a given population.

There are two ways to get a hold on it: by measuring variance or by calculating dominance coefficients.

Variance
One approach involves quantifying heritability by looking at the relationships among the variances of various quantities (e.g. phenotype) and how they are related from one individual to its family members.
Variance decomposition:
$V_P = V_G + V_E + V_{GE}$ (variance due to genetics, environment, and gene-environment interactions together explain the total phenotypic variance)
$V_G = V_A + V_D + V_I$ (variance due to genetics, in turn, is explained by the variance due to additive allele effects, dominance effects, and epistatic interactions)
Broad sense: $H^2 = V_G / V_P$
Narrow sense: $h^2 = V_A / V_P$
$h^2$ = slope of the regression of mean offspring value on mean parent (midparent) value (Visscher, Hill & Wray, 2008).
We can use the narrow-sense heritability for predicting the response to selection.
Breeder's equation: $R = h^2 S$, where S = selection differential = (mean phenotypic value of the breeding population) − (mean phenotypic value of the entire population), and R = response to selection = the change in the mean phenotypic value of the population after one generation of selection.

Dominance
Another approach to quantitative genetics posits two values, a and d, which can be used to represent the strength of dominance and the relationship among the phenotypic values associated with each genotypic class. We assume the mean phenotypes (also called genotypic values) of AA, AA′, and A′A′ are a, d, and −a, respectively. Then, using HWE proportions (p = frequency of A, q = frequency of A′), we calculate the population mean, which depends critically on genotype frequencies:
$M = p^2 a + 2pqd - q^2 a = a(p - q) + 2pqd$
We can then describe our genotypic values as deviations from the population mean. We say: (genotypic value) − (something) = (pop mean); therefore (something) = (genotypic value) − (pop mean). Now "something" is our genotypic value expressed as a deviation from the population mean (see the left side of the diagram below).

We can then calculate a new statistic, the breeding value of each genotype. The breeding value of an allele (its average effect) represents the phenotypic contribution it would make if it were strictly additive, and hence the breeding value of a genotype equals the sum of the breeding values of its constituent alleles. Thus, if there is no dominance and all effects are purely additive, genotypic values are equal to breeding values. We could calculate the per-allele effect on phenotype to get breeding values, or we could project our genotypic values onto their least-squares regression line, as in the diagram. On the right side of the diagram below, we express breeding values as deviations from the population mean. As with the genotypic values, this convention makes analysis easier.

Because of the definition of breeding values given above, we can get $V_A$ by looking at the variance of the breeding values. We can compare the breeding values to the genotypic values to get the dominance deviations. (As suggested above, when there is no dominance, breeding values = genotypic values and hence the dominance deviation = 0.) In the diagram below, these values appear in blue, and represent the distance between the genotypic values (black circles) and the breeding values (white circles). We can look at the variance of the dominance deviations to calculate $V_D$.
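
The dominance approach can be checked numerically. This sketch assumes HWE genotype frequencies, genotypic values a, d, −a for AA, AA′, A′A′, and hypothetical values of p, a, and d. It obtains breeding values by the least-squares projection described above (a frequency-weighted regression of genotypic value on the number of A alleles), takes $V_A$ and $V_D$ as the variances of the breeding values and dominance deviations, and compares them with the standard textbook (Falconer) expressions $\alpha = a + d(q - p)$, $V_A = 2pq\alpha^2$, and $V_D = (2pqd)^2$, which these notes do not derive.

```python
def dominance_model(p, a, d):
    """Population mean, average effect, V_A and V_D for one biallelic locus under HWE."""
    q = 1 - p
    freqs = [p * p, 2 * p * q, q * q]   # HWE frequencies of AA, AA', A'A'
    values = [a, d, -a]                 # genotypic values
    counts = [2, 1, 0]                  # number of A alleles in each genotype

    mean = sum(f * v for f, v in zip(freqs, values))
    deviations = [v - mean for v in values]  # genotypic values as deviations from the mean

    # Frequency-weighted least-squares slope of genotypic value on allele count.
    mean_count = sum(f * c for f, c in zip(freqs, counts))
    var_count = sum(f * (c - mean_count) ** 2 for f, c in zip(freqs, counts))
    cov = sum(f * (c - mean_count) * g for f, c, g in zip(freqs, counts, deviations))
    alpha = cov / var_count

    breeding = [alpha * (c - mean_count) for c in counts]       # fitted values (breeding values)
    dominance = [g - b for g, b in zip(deviations, breeding)]   # residuals (dominance deviations)

    v_a = sum(f * b ** 2 for f, b in zip(freqs, breeding))
    v_d = sum(f * x ** 2 for f, x in zip(freqs, dominance))
    return mean, alpha, v_a, v_d


p, a, d = 0.3, 2.0, 1.0  # hypothetical allele frequency and genotypic values
q = 1 - p
print(dominance_model(p, a, d))
print(a * (p - q) + 2 * p * q * d, a + d * (q - p),
      2 * p * q * (a + d * (q - p)) ** 2, (2 * p * q * d) ** 2)
```

Setting d = 0 makes every dominance deviation zero, so $V_D = 0$ and breeding values coincide with genotypic values, as the notes state.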