Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Size: px
Start display at page:

Download "Quantitative Genomics and Genetics BTRY 4830/6830; PBSB"

Transcription

1 Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Lecture11: Quantitative Genomics II Jason Mezey March 7, 019 (Th) 10:10-11:5

2 Announcements Homework #5 will be posted by Sat (March 9) and you will have > 1 week to complete (Due: 11:59PM, Tues., March 19) Reminder: more posts coming (e.g., matrix basics!), computer labs today / tomorrow!

3 Summary of lecture 11 Last lecture, we began our introduction of linear regression Today we will introduce estimation and hypothesis testing for the (genetic) linear regression, which will provide a rigorous introduction to the statistical foundation of Genome-wide Association Study (GWAS) analysis Next lecture, we will begin our introduction to GWAS

4 Conceptual Overview Genetic System Does A1 -> A affect Y? Reject / DNR Measured individuals (genotype, phenotype) Regression model Sample or experimental pop Model params F-test Pr(Y X)

5 Review: genetic system Our goal in quantitative genetics / genomics is to identify loci (positions in the genome) that contain causal mutations / polymorphisms / alleles X causal mutation or polymorphism - a position in the genome where an experimental manipulation of the DNA produces an effect on the phenotype under specified X conditions Formally, we may represent this as follows: A 1! A ) Y Z Our experiment will be a statistical experiment (sample and inference!)

6 P r(t r(t (X ), P r(t (X H0 : ce S in this(x) case, isp therefore: ˆ X H0 : = c) or H0 : = c n P r(t (X), P r(t (X ), P r(t (X H : n = c) 0P (x Review: the statistical model S = {possible individuals} 1 = c) i µ) i=1 n L( x) = e n Pn (xi µ) 1 Pn 1 As with any statistical experiment, we need to begin by defining our sample space i=1 i=1 =c = (X ), P r(te (X H0 :, x) P r(t (x = c)s = {Sg, SP } L( x) = (3) e In the most general sense, our sample space is: could = { Possible Individuals ich we also write S = {Sg }\SnP P },n where the subset SP reflects tha (xi µ) 1 = cthe phenotype can = { Possible Individuals i=1 such }{ = P } or sick or ca discrete L( take x) =(e.g. e cases ues asg healthy =S{isg Pset } o Pmodel r(t (X ), P r(t (X H = c) and where the subset as continuous such 0as: height ) the specifically, each individual in our sample space can be quantified asg(4) More = { g P } g = {A1 A1, A1 A, A Aapair } samplefor outcomes so our sample space cantwo be written and notypes, of where a diploid individual allelesas:at a singleflocus n {g,p } we hav ssible Individuals } Pn (xi µ) 1 g = {A1 A1, A1 A, A A }= (5) ) g i=1 P r(f { } L( x) =S = {A epa A, A A } g {g,p } A, g space at a locus and Where is the genotype sample P is the phenotype g (6) P } P r{g, F{g,P } sample space ere we will use g to define the genotype of an individual (at a single locus), suc P r(y X) X) = P r(y ible Individuals } F{g,P (7) P r(f ) } = P r(y, P {g,p } genotype gi = Aj Ak. isevery the set individual of possible genotypes, where for a in the p ividual has that a genotype that could occur inote = {g P } diploid, with two alleles:for H r(y, X)in = the P r(yset )P S r 0 : P values refore has a Fsingle value their phenotype (that takes P r(f ) P r{g, P } {g,p } (8) {g,p } Pn (xi gle value for their genotype (which is an element of the set S ). g = {A A, A A, A A } 1 g i=1 n e r(x)p } P r{g, P r(f P r(y) X) = P r(y, X) = P r(y )P (9) () {g,p } LRT =sick= For the phenotype, this can be any type of measurement (e.g. or healthy,pn (xi g 1 H : P r(y, X) = P r(y )P r(x) 0 P r(y X) = P r(y, X) = P r(y )Pi=1 r(x) height, etc.) n e P r{g, P } (10) () Pn (xi H0 (µ)) P 4 1 X) = P r(y )P r(x) 0 : P r(y, n ehi=1 L( ˆ x) M

7 Review: the statistical model { P Next, we need to define the probability model on the sigma algebra of the sample space ( ): F { F } {g,p} Pr(F {g,p} F) Which defines the probability { { of } each possible genotype and phenotype pair: Pr{g, P} We will define two (types) or random variables (* = state does not matter): Y :(, P ) R X :( g, ) R Note that the probability model P induces a (joint) probability distribution on this random vector (these random variables): Pr(Y,X)

8 Review: looking ahead (the goal...) The goal of quantitative genomics and genetics is to identify cases of the following relationship where when performing the perfect genotype manipulation experiment { we \ have: } Remember that, regardless of the probability P distribution of our random vector, we can define the expectation: and the variance: Pr(Y \ X) =Pr(Y,X) 6= Pr(Y )Pr(X) Var[Y,X]= E[Y,X]=[EY,EX] apple Var(Y apple ) apple Cov(Y,X) Cov(Y,X) Var(X) The goal of quantitative genomics can be rephrased as assessing the following relationship in the perfect experimental framework (although in practice, we will do this by assessing the following relationship in an uncontrolled setting): Cov(Y,X) 6= 0

9 Linear regression We are going to consider a regression model a parameterized model to represent the probability model of X and Y (that is the true statistical model of genetics!!!): Y = 0 + X 1 + N(0, ) Note that in this model, we consider Y to be the dependent or response variable and X to be the independent variable (what are the parameters!?) Also note implicitly assumes: Pr(Y,X)=Pr(Y X) That is, that X is effectively fixed when considering an individual, although note that X still varies (has a probability)

10 Linear regression is a bivariate distribution We ve seen bivariate (multivariate) distributions before:

11 Linear regression I Let s review the structure of a linear regression (not necessarily a genetic model): Y = 0 + X 1 + N(0, ) Y X

12 Linear regression I

13 Linear regression II The linear regression model allows calculation of the (interval) probability of observations (!!) Y = 0 + X 1 + N(0, ) Y X

14 Linear regression III A multiple regression model has the same structure, with a single dependent variable Y and more than one independent variable Xi, Xj, e.g.,

15 The genetic probability model I The quantitative genetic model is a multiple regression model with the following independent ( dummy ) variables: X a (A 1 A 1 )= 1,X a (A 1 A )=0,X a (A A ) = 1 X d (A 1 A 1 )= 1,X d (A 1 A )=1,X d (A A )= 1 1 A 1 A 1 A 1 A 1 A A and the following multiple regression equation: Y = µ + X a a + X d d + N(0, )

16 The genetic probability model II The probability distribution of this model, is therefore: Which has four parameters: The three parameters are required to model the three separate genotypes (A1A1, A1A, AA) Pr(Y X) N( µ + X a a + X d d, µ, a, d, The can be thought of as a random variable that describes the probability an individual will have a specific value of Y, conditional on the genotype AiAj, where the probability is normally distributed around the value determined by the X s and s N(0, ) )

17 The genetic probability model III P Let s consider a specific example where we are interested modeling the relationship between a genotype and a phenotype (such as height) where the latter is well approximated by a normal distribution For this case, the (unknown) conditions of the experiment define the true values of the parameters (unknown to us!), which we will say are X the following (note these are the same for all individuals in the population since they are { parameters of the probability } distribution): Consider an individual i with gi = A1A such that we have: If this individual has a phenotype value yi =.1 then we have the epsilon value i =0.7where the probability of this particular value (i.e. the interval surrounding g i this = A value) j A k is defined by N(0, µ =0.3, a = 0., d =1.1, =1.1 X = X a (A1A) = 0,X d (A1A) = (0)( 0.) + (1)(1.1) )

18 6 6 The genetic probability model IV Note that, while somewhat arbitrary, the advantage of the Xa and Xd coding is the parameters a and d map directly on to relationships between the genotype and phenotype that are important in genetics: If a 6= 0, d =0 then this is a purely additive case If a =0, d 6= 0 then this is only over- or underdominance (homozygotes have equal effects on phenotype) If both are non-zero, there are both additive and dominance effects If both are zero, there is no effect of the genotype on the phenotype (the genotype is not causal!)

19 Genetic example 1 As an example, consider the following of a purely additive case (= no dominance): µ =, a =5, d =0, =1

20 Genetic example II An example of dominance (= not a pure additive case): µ =0, a =4, d = 1, =1

21 Genetic example III A case of NO genetic effect: µ =, a =0, d =0, =1

22 Quantitative genetic formalism For those of you who have been exposed to classic quantitative genetics, you have seen a different notation for this model: P = G + E P is the phenotypic value - the value of the aspect measured G is the genotypic value - the expected value of the phenotype conditional on the genotype E is the environmental value - the value of the phenotype that we cannot explain given the genotype These translate as follows for our one locus case (although note the formalism extends to any multiple locus case): Y = P G =EP =EY = µ + X a a + X d d = E

23 6 Genetic inference I For our model focusing on one locus: Y = µ + X a a + X d d + N(0, We have four possible parameters we could estimate: = µ, a, d, However, for our purposes, we are only interested in the genetic parameters and testing the following null 6 hypothesis: [ ) H 0 : Cov(X a,y)=0\ Cov(X d,y)=0 H A : Cov(X a,y) 6= 0[ Cov(X d,y) 6= 0 \ OR H 0 : a =0\ d =0 H A : a 6= 0[ d 6=0

24 Genetic inference II Recall that inference (whether estimation or hypothesis testing) starts by collecting a sample and defining a statistic on that sample In this case, we are going to collect a sample of n individuals where for each we will measure their phenotype and their genotype (i.e. at the locus we are focusing on) That is an individual i will have phenotype yi and genotype gi = AjAk (where we translate these into xa and xd) Using the phenotype and genotype we will construct both an estimator (a statistic!) and we will additionally construct a test statistic Remember that our regression probability model defines a sampling distribution on our sample and therefore on our estimator and test statistic (!!)

25 Genetic inference III For notation convenience, we are going to use vector / matrix notation to represent a sample: y i = µ + x i,a a + x i,d d + i y 1 y. y n y 1 y. y n = µ + x 1,a a + x 1,d d + 1 µ + x,a a + x,d d +. µ + x n,a a + x n,d d + n 1 x 1,a x 1,d = 1 x,a x,d..... µ a +. d 1 x n,a x n,d 1 n y = x +

26 Genetic estimation I We will define a MLE for our parameters: =[ µ, a, d] Recall that an MLE is simply a statistic (a function that takes a sample in and outputs a number that is our estimate) In this case, our statistic will be a vector valued function that takes T (y, x a, x d )= ˆ 4 5 =[ˆµ, in the vectors that represent our sample ˆa, ˆd] Note that we calculate an MLE for this case just as we would any case (we use the likelihood of the fixed sample where we identify the parameter values that maximize this function) In the linear regression case (just as with normal parameters) this has a closed form: 3 MLE( ˆ) =(x T x) 1 x T y

27 Genetic estimation II Let s look at the structure of this estimator: y 1 y. y n y = x 1 x 1,a x 1,d = 1 x,a x,d..... µ a +. d 1 x n,a x n,d MLE( ˆ) = 4 + ˆµ ˆa ˆd n MLE( ˆ) =(x T x) 1 x T y

28 That s it for today Next lecture: genetic linear regression hypothesis testing and introduction to GWAS!

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture16: Population structure and logistic regression I Jason Mezey jgm45@cornell.edu April 11, 2017 (T) 8:40-9:55 Announcements I April

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 18: Introduction to covariates, the QQ plot, and population structure II + minimal GWAS steps Jason Mezey jgm45@cornell.edu April

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 20: Epistasis and Alternative Tests in GWAS Jason Mezey jgm45@cornell.edu April 16, 2016 (Th) 8:40-9:55 None Announcements Summary

More information

BTRY 4830/6830: Quantitative Genomics and Genetics Fall 2014

BTRY 4830/6830: Quantitative Genomics and Genetics Fall 2014 BTRY 4830/6830: Quantitative Genomics and Genetics Fall 2014 Homework 4 (version 3) - posted October 3 Assigned October 2; Due 11:59PM October 9 Problem 1 (Easy) a. For the genetic regression model: Y

More information

BTRY 4830/6830: Quantitative Genomics and Genetics

BTRY 4830/6830: Quantitative Genomics and Genetics BTRY 4830/6830: Quantitative Genomics and Genetics Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference Jason Mezey jgm45@cornell.edu Nov. 13, 2014 (Th) 8:40-9:55 Announcements

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics Genetics BTRY 4830/6830; PBSB.5201.01 Lecture13: Introduction to genome-wide association studies (GWAS) II Jason Mezey jgm45@cornell.edu March 16, 2017 (Th) 8:40-9:55 Announcements

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 2: Introduction to probability basics Jason Mezey jgm45@cornell.edu Jan. 31, 2017 (T) 8:40-9:55 Announcements 1 Registration updates

More information

BTRY 7210: Topics in Quantitative Genomics and Genetics

BTRY 7210: Topics in Quantitative Genomics and Genetics BTRY 7210: Topics in Quantitative Genomics and Genetics Jason Mezey Biological Statistics and Computational Biology (BSCB) Department of Genetic Medicine jgm45@cornell.edu February 12, 2015 Lecture 3:

More information

Linear Regression (1/1/17)

Linear Regression (1/1/17) STA613/CBB540: Statistical methods in computational biology Linear Regression (1/1/17) Lecturer: Barbara Engelhardt Scribe: Ethan Hada 1. Linear regression 1.1. Linear regression basics. Linear regression

More information

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do

More information

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification, Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability

More information

Lecture 3: Basic Statistical Tools. Bruce Walsh lecture notes Tucson Winter Institute 7-9 Jan 2013

Lecture 3: Basic Statistical Tools. Bruce Walsh lecture notes Tucson Winter Institute 7-9 Jan 2013 Lecture 3: Basic Statistical Tools Bruce Walsh lecture notes Tucson Winter Institute 7-9 Jan 2013 1 Basic probability Events are possible outcomes from some random process e.g., a genotype is AA, a phenotype

More information

Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments

Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments We consider two kinds of random variables: discrete and continuous random variables. For discrete random

More information

AEC 550 Conservation Genetics Lecture #2 Probability, Random mating, HW Expectations, & Genetic Diversity,

AEC 550 Conservation Genetics Lecture #2 Probability, Random mating, HW Expectations, & Genetic Diversity, AEC 550 Conservation Genetics Lecture #2 Probability, Random mating, HW Expectations, & Genetic Diversity, Today: Review Probability in Populatin Genetics Review basic statistics Population Definition

More information

Lecture 7 Correlated Characters

Lecture 7 Correlated Characters Lecture 7 Correlated Characters Bruce Walsh. Sept 2007. Summer Institute on Statistical Genetics, Liège Genetic and Environmental Correlations Many characters are positively or negatively correlated at

More information

Generalized Linear Models (1/29/13)

Generalized Linear Models (1/29/13) STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability

More information

Proportional Variance Explained by QLT and Statistical Power. Proportional Variance Explained by QTL and Statistical Power

Proportional Variance Explained by QLT and Statistical Power. Proportional Variance Explained by QTL and Statistical Power Proportional Variance Explained by QTL and Statistical Power Partitioning the Genetic Variance We previously focused on obtaining variance components of a quantitative trait to determine the proportion

More information

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 Association Testing with Quantitative Traits: Common and Rare Variants Timothy Thornton and Katie Kerr Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 1 / 41 Introduction to Quantitative

More information

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8 The E-M Algorithm in Genetics Biostatistics 666 Lecture 8 Maximum Likelihood Estimation of Allele Frequencies Find parameter estimates which make observed data most likely General approach, as long as

More information

MATH 450: Mathematical statistics

MATH 450: Mathematical statistics Departments of Mathematical Sciences University of Delaware August 28th, 2018 General information Classes: Tuesday & Thursday 9:30-10:45 am, Gore Hall 115 Office hours: Tuesday Wednesday 1-2:30 pm, Ewing

More information

Statistical Inference

Statistical Inference Statistical Inference Classical and Bayesian Methods Class 7 AMS-UCSC Tue 31, 2012 Winter 2012. Session 1 (Class 7) AMS-132/206 Tue 31, 2012 1 / 13 Topics Topics We will talk about... 1 Hypothesis testing

More information

The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies

The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies Ian Barnett, Rajarshi Mukherjee & Xihong Lin Harvard University ibarnett@hsph.harvard.edu June 24, 2014 Ian Barnett

More information

CSE 312 Final Review: Section AA

CSE 312 Final Review: Section AA CSE 312 TAs December 8, 2011 General Information General Information Comprehensive Midterm General Information Comprehensive Midterm Heavily weighted toward material after the midterm Pre-Midterm Material

More information

Outline for today s lecture (Ch. 14, Part I)

Outline for today s lecture (Ch. 14, Part I) Outline for today s lecture (Ch. 14, Part I) Ploidy vs. DNA content The basis of heredity ca. 1850s Mendel s Experiments and Theory Law of Segregation Law of Independent Assortment Introduction to Probability

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.03 Lecture 2: Introduction to probability basics Jason Mezey jgm45@cornell.edu Jan. 24, 2019 (Th) 10:10-11:25 Registering for the class ANYONE

More information

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017 Lecture 2: Genetic Association Testing with Quantitative Traits Instructors: Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 29 Introduction to Quantitative Trait Mapping

More information

Lecture 2: Introduction to Quantitative Genetics

Lecture 2: Introduction to Quantitative Genetics Lecture 2: Introduction to Quantitative Genetics Bruce Walsh lecture notes Introduction to Quantitative Genetics SISG, Seattle 16 18 July 2018 1 Basic model of Quantitative Genetics Phenotypic value --

More information

Case-Control Association Testing. Case-Control Association Testing

Case-Control Association Testing. Case-Control Association Testing Introduction Association mapping is now routinely being used to identify loci that are involved with complex traits. Technological advances have made it feasible to perform case-control association studies

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

Evolution of quantitative traits

Evolution of quantitative traits Evolution of quantitative traits Introduction Let s stop and review quickly where we ve come and where we re going We started our survey of quantitative genetics by pointing out that our objective was

More information

STA442/2101: Assignment 5

STA442/2101: Assignment 5 STA442/2101: Assignment 5 Craig Burkett Quiz on: Oct 23 rd, 2015 The questions are practice for the quiz next week, and are not to be handed in. I would like you to bring in all of the code you used to

More information

DNA polymorphisms such as SNP and familial effects (additive genetic, common environment) to

DNA polymorphisms such as SNP and familial effects (additive genetic, common environment) to 1 1 1 1 1 1 1 1 0 SUPPLEMENTARY MATERIALS, B. BIVARIATE PEDIGREE-BASED ASSOCIATION ANALYSIS Introduction We propose here a statistical method of bivariate genetic analysis, designed to evaluate contribution

More information

Computational Systems Biology: Biology X

Computational Systems Biology: Biology X Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#7:(Mar-23-2010) Genome Wide Association Studies 1 The law of causality... is a relic of a bygone age, surviving, like the monarchy,

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 2: Introduction to probability basics Jason Mezey jgm45@cornell.edu Feb. 2, 2016 (T) 8:40-9:55 Announcements 1 Registration updates

More information

Inference in Regression Analysis

Inference in Regression Analysis Inference in Regression Analysis Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 4, Slide 1 Today: Normal Error Regression Model Y i = β 0 + β 1 X i + ǫ i Y i value

More information

Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics

Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics Lee H. Dicker Rutgers University and Amazon, NYC Based on joint work with Ruijun Ma (Rutgers),

More information

Lecture 1 Basic Statistical Machinery

Lecture 1 Basic Statistical Machinery Lecture 1 Basic Statistical Machinery Bruce Walsh. jbwalsh@u.arizona.edu. University of Arizona. ECOL 519A, Jan 2007. University of Arizona Probabilities, Distributions, and Expectations Discrete and Continuous

More information

(Genome-wide) association analysis

(Genome-wide) association analysis (Genome-wide) association analysis 1 Key concepts Mapping QTL by association relies on linkage disequilibrium in the population; LD can be caused by close linkage between a QTL and marker (= good) or by

More information

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 20

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 20 CS 70 Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 20 Today we shall discuss a measure of how close a random variable tends to be to its expectation. But first we need to see how to compute

More information

BTRY 7210: Topics in Quantitative Genomics and Genetics

BTRY 7210: Topics in Quantitative Genomics and Genetics BTR 70: Topics in Quantitative Genomics and Genetics Jason Mezey Biological Statistics and Computational Biology (BSCB Department of Genetic Medicine jgm45@cornell.edu March 9 05 Lecture 5: Basics of eqtl

More information

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes COMP 55 Applied Machine Learning Lecture 2: Gaussian processes Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~hvanho2/comp55

More information

Lecture 2. Fisher s Variance Decomposition

Lecture 2. Fisher s Variance Decomposition Lecture Fisher s Variance Decomposition Bruce Walsh. June 008. Summer Institute on Statistical Genetics, Seattle Covariances and Regressions Quantitative genetics requires measures of variation and association.

More information

Lecture 3. Inference about multivariate normal distribution

Lecture 3. Inference about multivariate normal distribution Lecture 3. Inference about multivariate normal distribution 3.1 Point and Interval Estimation Let X 1,..., X n be i.i.d. N p (µ, Σ). We are interested in evaluation of the maximum likelihood estimates

More information

Evolution of phenotypic traits

Evolution of phenotypic traits Quantitative genetics Evolution of phenotypic traits Very few phenotypic traits are controlled by one locus, as in our previous discussion of genetics and evolution Quantitative genetics considers characters

More information

Bayesian Regression (1/31/13)

Bayesian Regression (1/31/13) STA613/CBB540: Statistical methods in computational biology Bayesian Regression (1/31/13) Lecturer: Barbara Engelhardt Scribe: Amanda Lea 1 Bayesian Paradigm Bayesian methods ask: given that I have observed

More information

The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies

The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies Ian Barnett, Rajarshi Mukherjee & Xihong Lin Harvard University ibarnett@hsph.harvard.edu August 5, 2014 Ian Barnett

More information

Lecture WS Evolutionary Genetics Part I 1

Lecture WS Evolutionary Genetics Part I 1 Quantitative genetics Quantitative genetics is the study of the inheritance of quantitative/continuous phenotypic traits, like human height and body size, grain colour in winter wheat or beak depth in

More information

STT 843 Key to Homework 1 Spring 2018

STT 843 Key to Homework 1 Spring 2018 STT 843 Key to Homework Spring 208 Due date: Feb 4, 208 42 (a Because σ = 2, σ 22 = and ρ 2 = 05, we have σ 2 = ρ 2 σ σ22 = 2/2 Then, the mean and covariance of the bivariate normal is µ = ( 0 2 and Σ

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 9 for Applied Multivariate Analysis Outline Addressing ourliers 1 Addressing ourliers 2 Outliers in Multivariate samples (1) For

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

More information

Bivariate distributions

Bivariate distributions Bivariate distributions 3 th October 017 lecture based on Hogg Tanis Zimmerman: Probability and Statistical Inference (9th ed.) Bivariate Distributions of the Discrete Type The Correlation Coefficient

More information

Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate

Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Lucas Janson, Stanford Department of Statistics WADAPT Workshop, NIPS, December 2016 Collaborators: Emmanuel

More information

Simple and Multiple Linear Regression

Simple and Multiple Linear Regression Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.

More information

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q)

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q) Supplementary information S7 Testing for association at imputed SPs puted SPs Score tests A Score Test needs calculations of the observed data score and information matrix only under the null hypothesis,

More information

Introduction to bivariate analysis

Introduction to bivariate analysis Introduction to bivariate analysis When one measurement is made on each observation, univariate analysis is applied. If more than one measurement is made on each observation, multivariate analysis is applied.

More information

Primer on statistics:

Primer on statistics: Primer on statistics: MLE, Confidence Intervals, and Hypothesis Testing ryan.reece@gmail.com http://rreece.github.io/ Insight Data Science - AI Fellows Workshop Feb 16, 018 Outline 1. Maximum likelihood

More information

Simple Linear Regression for the MPG Data

Simple Linear Regression for the MPG Data Simple Linear Regression for the MPG Data 2000 2500 3000 3500 15 20 25 30 35 40 45 Wgt MPG What do we do with the data? y i = MPG of i th car x i = Weight of i th car i =1,...,n n = Sample Size Exploratory

More information

Business Statistics. Lecture 10: Correlation and Linear Regression

Business Statistics. Lecture 10: Correlation and Linear Regression Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

More information

Selection Page 1 sur 11. Atlas of Genetics and Cytogenetics in Oncology and Haematology SELECTION

Selection Page 1 sur 11. Atlas of Genetics and Cytogenetics in Oncology and Haematology SELECTION Selection Page 1 sur 11 Atlas of Genetics and Cytogenetics in Oncology and Haematology SELECTION * I- Introduction II- Modeling and selective values III- Basic model IV- Equation of the recurrence of allele

More information

Introduction to bivariate analysis

Introduction to bivariate analysis Introduction to bivariate analysis When one measurement is made on each observation, univariate analysis is applied. If more than one measurement is made on each observation, multivariate analysis is applied.

More information

Lecture 6: Selection on Multiple Traits

Lecture 6: Selection on Multiple Traits Lecture 6: Selection on Multiple Traits Bruce Walsh lecture notes Introduction to Quantitative Genetics SISG, Seattle 16 18 July 2018 1 Genetic vs. Phenotypic correlations Within an individual, trait values

More information

Lecture 9. QTL Mapping 2: Outbred Populations

Lecture 9. QTL Mapping 2: Outbred Populations Lecture 9 QTL Mapping 2: Outbred Populations Bruce Walsh. Aug 2004. Royal Veterinary and Agricultural University, Denmark The major difference between QTL analysis using inbred-line crosses vs. outbred

More information

Module 11: Linear Regression. Rebecca C. Steorts

Module 11: Linear Regression. Rebecca C. Steorts Module 11: Linear Regression Rebecca C. Steorts Announcements Today is the last class Homework 7 has been extended to Thursday, April 20, 11 PM. There will be no lab tomorrow. There will be office hours

More information

I Have the Power in QTL linkage: single and multilocus analysis

I Have the Power in QTL linkage: single and multilocus analysis I Have the Power in QTL linkage: single and multilocus analysis Benjamin Neale 1, Sir Shaun Purcell 2 & Pak Sham 13 1 SGDP, IoP, London, UK 2 Harvard School of Public Health, Cambridge, MA, USA 3 Department

More information

2.1 Linear regression with matrices

2.1 Linear regression with matrices 21 Linear regression with matrices The values of the independent variables are united into the matrix X (design matrix), the values of the outcome and the coefficient are represented by the vectors Y and

More information

STAT 730 Chapter 4: Estimation

STAT 730 Chapter 4: Estimation STAT 730 Chapter 4: Estimation Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 23 The likelihood We have iid data, at least initially. Each datum

More information

Computational Approaches to Statistical Genetics

Computational Approaches to Statistical Genetics Computational Approaches to Statistical Genetics GWAS I: Concepts and Probability Theory Christoph Lippert Dr. Oliver Stegle Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen

More information

STAT 536: Genetic Statistics

STAT 536: Genetic Statistics STAT 536: Genetic Statistics Tests for Hardy Weinberg Equilibrium Karin S. Dorman Department of Statistics Iowa State University September 7, 2006 Statistical Hypothesis Testing Identify a hypothesis,

More information

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012 Problem Set #6: OLS Economics 835: Econometrics Fall 202 A preliminary result Suppose we have a random sample of size n on the scalar random variables (x, y) with finite means, variances, and covariance.

More information

Lecture 9: Linear Regression

Lecture 9: Linear Regression Lecture 9: Linear Regression Goals Develop basic concepts of linear regression from a probabilistic framework Estimating parameters and hypothesis testing with linear models Linear regression in R Regression

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

CSC321 Lecture 18: Learning Probabilistic Models

CSC321 Lecture 18: Learning Probabilistic Models CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling

More information

Theoretical and computational aspects of association tests: application in case-control genome-wide association studies.

Theoretical and computational aspects of association tests: application in case-control genome-wide association studies. Theoretical and computational aspects of association tests: application in case-control genome-wide association studies Mathieu Emily November 18, 2014 Caen mathieu.emily@agrocampus-ouest.fr - Agrocampus

More information

Lecture 15. Hypothesis testing in the linear model

Lecture 15. Hypothesis testing in the linear model 14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model EPSY 905: Multivariate Analysis Lecture 1 20 January 2016 EPSY 905: Lecture 1 -

More information

Previous lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure)

Previous lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure) Previous lecture Single variant association Use genome-wide SNPs to account for confounding (population substructure) Estimation of effect size and winner s curse Meta-Analysis Today s outline P-value

More information

review session gov 2000 gov 2000 () review session 1 / 38

review session gov 2000 gov 2000 () review session 1 / 38 review session gov 2000 gov 2000 () review session 1 / 38 Overview Random Variables and Probability Univariate Statistics Bivariate Statistics Multivariate Statistics Causal Inference gov 2000 () review

More information

BIOL Evolution. Lecture 9

BIOL Evolution. Lecture 9 BIOL 432 - Evolution Lecture 9 J Krause et al. Nature 000, 1-4 (2010) doi:10.1038/nature08976 Selection http://www.youtube.com/watch?v=a38k mj0amhc&feature=playlist&p=61e033 F110013706&index=0&playnext=1

More information

Exam 2. Jeremy Morris. March 23, 2006

Exam 2. Jeremy Morris. March 23, 2006 Exam Jeremy Morris March 3, 006 4. Consider a bivariate normal population with µ 0, µ, σ, σ and ρ.5. a Write out the bivariate normal density. The multivariate normal density is defined by the following

More information

Review. December 4 th, Review

Review. December 4 th, Review December 4 th, 2017 Att. Final exam: Course evaluation Friday, 12/14/2018, 10:30am 12:30pm Gore Hall 115 Overview Week 2 Week 4 Week 7 Week 10 Week 12 Chapter 6: Statistics and Sampling Distributions Chapter

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 1: August 22, 2012

More information

Econ 583 Homework 7 Suggested Solutions: Wald, LM and LR based on GMM and MLE

Econ 583 Homework 7 Suggested Solutions: Wald, LM and LR based on GMM and MLE Econ 583 Homework 7 Suggested Solutions: Wald, LM and LR based on GMM and MLE Eric Zivot Winter 013 1 Wald, LR and LM statistics based on generalized method of moments estimation Let 1 be an iid sample

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University.

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University. Summer School in Statistics for Astronomers V June 1 - June 6, 2009 Regression Mosuk Chow Statistics Department Penn State University. Adapted from notes prepared by RL Karandikar Mean and variance Recall

More information

Gov 2000: 9. Regression with Two Independent Variables

Gov 2000: 9. Regression with Two Independent Variables Gov 2000: 9. Regression with Two Independent Variables Matthew Blackwell Fall 2016 1 / 62 1. Why Add Variables to a Regression? 2. Adding a Binary Covariate 3. Adding a Continuous Covariate 4. OLS Mechanics

More information

Bias Variance Trade-off

Bias Variance Trade-off Bias Variance Trade-off The mean squared error of an estimator MSE(ˆθ) = E([ˆθ θ] 2 ) Can be re-expressed MSE(ˆθ) = Var(ˆθ) + (B(ˆθ) 2 ) MSE = VAR + BIAS 2 Proof MSE(ˆθ) = E((ˆθ θ) 2 ) = E(([ˆθ E(ˆθ)]

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

Correlation analysis. Contents

Correlation analysis. Contents Correlation analysis Contents 1 Correlation analysis 2 1.1 Distribution function and independence of random variables.......... 2 1.2 Measures of statistical links between two random variables...........

More information

GWAS for Compound Heterozygous Traits: Phenotypic Distance and Integer Linear Programming Dan Gusfield, Rasmus Nielsen.

GWAS for Compound Heterozygous Traits: Phenotypic Distance and Integer Linear Programming Dan Gusfield, Rasmus Nielsen. GWAS for Compound Heterozygous Traits: Phenotypic Distance and Integer Linear Programming Dan Gusfield, Rasmus Nielsen December 11, 2016 GWAS In Genome Wide Association Studies (GWAS) we try to locate

More information

2018 SISG Module 20: Bayesian Statistics for Genetics Lecture 2: Review of Probability and Bayes Theorem

2018 SISG Module 20: Bayesian Statistics for Genetics Lecture 2: Review of Probability and Bayes Theorem 2018 SISG Module 20: Bayesian Statistics for Genetics Lecture 2: Review of Probability and Bayes Theorem Jon Wakefield Departments of Statistics and Biostatistics University of Washington Outline Introduction

More information

1.3 Forward Kolmogorov equation

1.3 Forward Kolmogorov equation 1.3 Forward Kolmogorov equation Let us again start with the Master equation, for a system where the states can be ordered along a line, such as the previous examples with population size n = 0, 1, 2,.

More information

The concept of breeding value. Gene251/351 Lecture 5

The concept of breeding value. Gene251/351 Lecture 5 The concept of breeding value Gene251/351 Lecture 5 Key terms Estimated breeding value (EB) Heritability Contemporary groups Reading: No prescribed reading from Simm s book. Revision: Quantitative traits

More information

Regression used to predict or estimate the value of one variable corresponding to a given value of another variable.

Regression used to predict or estimate the value of one variable corresponding to a given value of another variable. CHAPTER 9 Simple Linear Regression and Correlation Regression used to predict or estimate the value of one variable corresponding to a given value of another variable. X = independent variable. Y = dependent

More information

(a) (3 points) Construct a 95% confidence interval for β 2 in Equation 1.

(a) (3 points) Construct a 95% confidence interval for β 2 in Equation 1. Problem 1 (21 points) An economist runs the regression y i = β 0 + x 1i β 1 + x 2i β 2 + x 3i β 3 + ε i (1) The results are summarized in the following table: Equation 1. Variable Coefficient Std. Error

More information

Regression Models - Introduction

Regression Models - Introduction Regression Models - Introduction In regression models there are two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent

More information

Linear Models and Estimation by Least Squares

Linear Models and Estimation by Least Squares Linear Models and Estimation by Least Squares Jin-Lung Lin 1 Introduction Causal relation investigation lies in the heart of economics. Effect (Dependent variable) cause (Independent variable) Example:

More information

Regression Models - Introduction

Regression Models - Introduction Regression Models - Introduction In regression models, two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent variable,

More information