Quantitative Genomics and Genetics BTRY 4830/6830; PBSB


1 Quantitative Genomics and Genetics BTRY 4830/6830; PBSB
Lecture 16: Population structure and logistic regression I
Jason Mezey, jgm45@cornell.edu
April 11, 2017 (T) 8:40-9:55

2 Announcements I
Schedule (week number in parentheses):
April 11 (week 12): Genome-Wide Association Studies (GWAS) IV: logistic regression I (the model)
April 13 (week 12): Project Assigned. Genome-Wide Association Studies (GWAS) V: logistic regression II (IRLS algorithm and GLMs)
April 18 (week 13): Genome-Wide Association Studies (GWAS) X: Haplotype testing, alternative tests, and minimum GWAS analysis
April 20 (week 13): Advanced topics I: Mixed Models
April 25 (week 14): Advanced topics II: Multiple regression (epistasis) and multivariate regression
MAPPING LOCI: BAYESIAN ANALYSIS
April 27 (week 14): Bayesian inference I: inference basics / linear models
May 2 (week 15): Bayesian inference II: MCMC algorithms
PEDIGREE / INBRED LINE ANALYSIS / CLASSIC QUANTITATIVE GENETICS
May 4 (week 15): Basics of linkage analysis / Inbred line analysis
May 9 (week 16): Project Due. Heritability and additive genetic variance

3 Announcements
Midterm will be available next week.
No more homeworks (!!), just a project and final (and computer labs).
Your PROJECT will be assigned on Thurs.!
I will have office hours today. In Ithaca: same location as always. In NY: go to the SMALL Genetic Med Conference Room.

4 Conceptual Overview
[Figure: flow diagram with the elements Genetic System; "Does A1 -> A2 affect Y?"; Sample or experimental pop; Measured individuals (genotype, phenotype); Regression model Pr(Y|X); Model params; F-test; Reject / DNR]

5 Review: modeling covariates I
If we have a factor that is correlated with our phenotype and we do not handle it in some manner in our analysis, we risk producing false positives AND/OR reducing the power of our tests!
The good news is that, assuming we have measured the factor (i.e. it is part of our GWAS dataset), we can incorporate the factor in our model as a covariate(s):
$$Y = \beta_\mu + X_a \beta_a + X_d \beta_d + X_{z,1} \beta_{z,1} + X_{z,2} \beta_{z,2} + \epsilon$$
The effect of this is that we will estimate the covariate model parameter, and this will account for the correlation of the factor with the phenotype (such that we can test for our marker correlation without false positives / lower power!)

6 Review: modeling covariates II
How do we perform inference with a covariate in our linear regression model?
We perform MLE the same way (!!): our X matrix now simply includes extra columns, one for each of the additional covariates, where for the linear regression we have:
$$MLE(\hat{\beta}) = (x^T x)^{-1} x^T y$$
We perform hypothesis testing the same way (!!) with a slight difference: our LRT includes the covariate in both the null hypothesis and the alternative, but we are testing the same null hypothesis:
$$H_0: \beta_a = 0 \cap \beta_d = 0$$
$$H_A: \beta_a \neq 0 \cup \beta_d \neq 0$$
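As a rough illustration of the point that the MLE is computed the same way once the covariate is added as an extra column of the X matrix, here is a minimal sketch. The toy data, the single covariate `x_z`, and the parameter values in `beta_true` are all hypothetical and chosen only so the estimate can be checked against a known answer; NumPy is assumed to be available.

```python
import numpy as np

def ols_mle(x, y):
    """Return beta_hat = (x^T x)^{-1} x^T y, solved without forming the inverse."""
    return np.linalg.solve(x.T @ x, x.T @ y)

# Hypothetical toy data: 6 individuals, one marker, one covariate (e.g. male / female).
x_a = np.array([-1.0, -1.0, 0.0, 0.0, 1.0, 1.0])    # additive coding
x_d = np.array([-1.0, -1.0, 1.0, 1.0, -1.0, -1.0])  # dominance coding
x_z = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])      # covariate coding

# Design matrix: a column of ones for beta_mu, then X_a, X_d, and the covariate column.
x = np.column_stack([np.ones(6), x_a, x_d, x_z])

# Noise-free phenotype built from assumed parameters (mu, a, d, z).
beta_true = np.array([1.0, 0.5, 0.2, 2.0])
y = x @ beta_true

beta_hat = ols_mle(x, y)
print(beta_hat)
```

Because the phenotype here has no error term, the estimate should recover the assumed parameters up to floating-point error; with real data the same call returns the least-squares estimates.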

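7 Modeling covariates III
First, determine the predicted value of the phenotype of each individual under the null hypothesis (how do we set up x?):
$$\hat{y}_{i,\hat{\theta}_0} = \hat{\beta}_\mu + \sum_{j=1}^{k} x_{i,z,j} \hat{\beta}_{z,j}$$
Second, determine the predicted value of the phenotype of each individual under the alternative hypothesis (set up x?):
$$\hat{y}_{i,\hat{\theta}_1} = \hat{\beta}_\mu + x_{i,a} \hat{\beta}_a + x_{i,d} \hat{\beta}_d + \sum_{j=1}^{k} x_{i,z,j} \hat{\beta}_{z,j}$$
Third, calculate the Error Sum of Squares for each:
$$SSE(\hat{\theta}_0) = \sum_{i=1}^{n} (y_i - \hat{y}_{i,\hat{\theta}_0})^2 \qquad SSE(\hat{\theta}_1) = \sum_{i=1}^{n} (y_i - \hat{y}_{i,\hat{\theta}_1})^2$$
Finally, we calculate the F-statistic with degrees of freedom [2, n-3] (why two degrees of freedom?):
$$F_{[2,n-3]}(y, x) = \frac{\left( SSE(\hat{\theta}_0) - SSE(\hat{\theta}_1) \right) / 2}{SSE(\hat{\theta}_1) / (n-3)}$$
(With k covariates estimated in both models, the denominator degrees of freedom are strictly n - 3 - k.)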
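The four steps above can be sketched as a comparison of two nested least-squares fits. This is only an illustration on simulated data: the sample size, effect sizes, and the single covariate are assumptions, and the denominator degrees of freedom are computed as n minus the number of columns in the alternative design matrix (which reduces to n - 3 when there are no covariates).

```python
import numpy as np

def sse(x, y):
    """Error sum of squares after a least-squares fit of y on the columns of x."""
    beta_hat, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ beta_hat
    return float(resid @ resid)

def f_statistic(x_null, x_alt, y):
    """F-statistic comparing a null design matrix nested inside an alternative one."""
    df1 = x_alt.shape[1] - x_null.shape[1]  # parameters tested (beta_a and beta_d -> 2)
    df2 = len(y) - x_alt.shape[1]           # n minus all parameters in the alternative
    sse0, sse1 = sse(x_null, y), sse(x_alt, y)
    return ((sse0 - sse1) / df1) / (sse1 / df2), df1, df2

# Hypothetical simulated data: 50 individuals, one marker, one binary covariate.
rng = np.random.default_rng(0)
n = 50
x_a = rng.integers(0, 3, size=n) - 1.0          # additive coding in {-1, 0, 1}
x_d = np.where(x_a == 0, 1.0, -1.0)             # dominance coding: 1 for heterozygotes
x_z = rng.integers(0, 2, size=n).astype(float)  # covariate coding in {0, 1}
y = 1.0 + 0.8 * x_a + 2.0 * x_z + rng.normal(size=n)

ones = np.ones(n)
x_null = np.column_stack([ones, x_z])            # covariate only
x_alt = np.column_stack([ones, x_a, x_d, x_z])   # genotype terms plus covariate
f_stat, df1, df2 = f_statistic(x_null, x_alt, y)
print(f_stat, df1, df2)
```

A p-value would then come from the upper tail of the F distribution with (df1, df2) degrees of freedom, for example via `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is installed.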

8 Modeling covariates IV
Say you have GWAS data (a phenotype and genotypes) and your GWAS data also includes information on a number of covariates, e.g. male / female, several different ancestral groups (different populations!!), other risk factors, etc.
First, you need to figure out how to code the $X_z$ for each of these, which may be simple (male / female) but more complex with others (where how to code them involves fuzzy rules, i.e. it depends on your context!!)
Second, you will need to figure out which to include in your analysis (again, fuzzy rules!), but a good rule is: if the parameter estimate associated with the covariate is large (= significant individual p-value), you should include it!
There are many ways to figure out how to include covariates (again a topic in itself!!)

9 Review: population structure
Population structure or stratification is a case where a sample includes groups of people that fit into two or more different ancestry groups (fuzzy def!)
Population structure is often a major issue in GWAS, where it can cause lots of false positives if it is not accounted for in your model.
Intuitively, you can model population structure as a covariate if you know:
How many populations are represented in your sample
Which individual in your sample belongs to which population
QQ plots are good for determining whether there may be population structure.
Clustering techniques are good for detecting population structure and determining which individual is in which population (= ancestry group).

10 Origin of population structure
[Figure credit: Sarver World Cultures]
People geographically separate through migration, and then the set of alleles present in the population evolves (= changes) over time.

11 Principal Component Analysis (PCA) of population structure
[Figure credit: Nature Publishing]

12 Learning unmeasured population factors
To learn a population factor, analyze the genotype data:
$$\text{Data} = \begin{bmatrix} z_{11} & \dots & z_{1k} & y_{11} & \dots & y_{1m} & x_{11} & \dots & x_{1N} \\ \vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\ z_{n1} & \dots & z_{nk} & y_{n1} & \dots & y_{nm} & x_{n1} & \dots & x_{nN} \end{bmatrix}$$
Apply a Principal Component Analysis (PCA) where the axes (features) in this case are individuals and each point is a (scaled) genotype.
[Figure: PCA plot with axes $Z_{i,1}$ and $Z_{i,2}$]
What we are interested in are the projections (loadings) of the individual PCs on each of the individual axes (dotted arrows), where each PC produces n values (i.e. one value for each sample) of a new independent (covariate) variable $X_Z$:
$$Y = \beta_\mu + X_a \beta_a + X_d \beta_d + X_{z,1} \beta_{z,1} + X_{z,2} \beta_{z,2} + \epsilon$$

13 Applying a PCA population structure analysis (in practice)
Calculate the n x n (n = sample size) covariance matrix for the individuals in your sample across all genotypes.
Apply a PCA to this covariance matrix. The output will be matrices containing eigenvalues and eigenvectors (= the Principal Components), where the size of the eigenvalue indicates the ordering of the Principal Component.
Each Principal Component (PC) will be an n-element vector where each element is the loading of the PC on the individual axes. These are the values of your independent variable coding (e.g., if you include the first PC as your first covariate, your coding will be $X_{Z,1}$ = PC loadings).
Note that you could also get the same answer by calculating an N x N (N = measured genotypes) covariance matrix, applying PCA, and taking the projections of each sample on the PCs (why might this be less optimal?)
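The procedure on this slide can be sketched as follows. This is an illustrative sketch rather than a reference implementation: the function name `genotype_pcs`, the scaling of each marker to unit variance, and the simulated two-population genotypes are all assumptions made for the example. One practical reason to prefer the n x n matrix is that N (markers) is usually far larger than n (individuals), so the n x n eigendecomposition is much cheaper.

```python
import numpy as np

def genotype_pcs(genotypes, n_pcs=2):
    """genotypes: n x N array (individuals by markers).

    Returns (pcs, eigvals): an n x n_pcs array of per-individual PC values
    and the corresponding eigenvalues in decreasing order.
    """
    # Center each marker (column) and scale it to unit variance;
    # markers with no variation carry no information and are dropped.
    g = genotypes - genotypes.mean(axis=0)
    sd = g.std(axis=0)
    keep = sd > 0
    g = g[:, keep] / sd[keep]
    # n x n covariance matrix among individuals across all markers.
    cov = g @ g.T / g.shape[1]
    # eigh is for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_pcs]
    return eigvecs[:, order], eigvals[order]

# Hypothetical sample: two populations of 10 individuals each, 200 markers,
# with very different allele frequencies (0.1 versus 0.9) at every marker.
rng = np.random.default_rng(1)
pop1 = rng.binomial(2, 0.1, size=(10, 200))
pop2 = rng.binomial(2, 0.9, size=(10, 200))
genotypes = np.vstack([pop1, pop2]).astype(float)

pcs, eigvals = genotype_pcs(genotypes, n_pcs=2)
x_z1 = pcs[:, 0]  # values to use as the first covariate X_{Z,1}
print(x_z1)
```

With structure this strong, the first PC takes values of opposite sign in the two simulated populations, which is the pattern one would look for when plotting PC1 against PC2.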

14 Using the results of a PCA population structure analysis
Once you have detected the populations (e.g. by eye in a PCA = fuzzy!) in your GWAS sample, set your independent variables equal to the loadings for each individual, e.g., for two pop covariates, set $X_{Z,1} = Z_1$, $X_{Z,2} = Z_2$.
You could also determine which individual is in which pop and define random variables for pop assignment, e.g. for two populations include a single covariate by setting $X_{Z,1}(\text{pop1}) = 1$, $X_{Z,1}(\text{pop2}) = 0$ (generally less optimal but can be used!)
Use one of these approaches to model a covariate in your analysis, i.e. for every genotype marker that you test in your GWAS:
$$Y = \beta_\mu + X_a \beta_a + X_d \beta_d + X_{z,1} \beta_{z,1} + X_{z,2} \beta_{z,2} + \epsilon$$
The goal is to produce a good QQ plot (what if it does not?)
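To make the QQ plot check concrete, here is a minimal sketch of how the plotted points might be computed from a list of GWAS p-values. The function name `qq_points` and the choice of expected quantiles (i - 0.5) / m are assumptions for illustration; other conventions for the expected quantiles are also in use.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) pairs of -log10 p-values, sorted ascending.

    Under the null the observed p-values are uniform on (0, 1), so the points
    should fall along the diagonal; p-values must be strictly positive.
    """
    m = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10((i - 0.5) / m) for i in range(1, m + 1))
    return list(zip(expected, observed))

# Hypothetical p-values that are exactly uniform: points lie on the diagonal.
m = 20
uniform_p = [(i - 0.5) / m for i in range(1, m + 1)]
diagonal = qq_points(uniform_p)

# Hypothetical inflated p-values (each one squared): points lie above the diagonal,
# the pattern that unmodeled population structure tends to produce.
inflated = qq_points([p * p for p in uniform_p])
```

Plotting the pairs (for example with a scatter plot plus the line y = x) before and after adding the population covariates shows whether the covariates brought the bulk of the distribution back onto the diagonal.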

15 Before (top) and after including a population covariate (bottom)

16 Review: linear regression
So far, we have assumed that a linear regression is a reasonable model for the relationship between genotype and phenotype (where this implicitly assumes a normal error provides a reasonable approximation of the phenotype distribution given the genotype):
$$Y = \beta_\mu + X_a \beta_a + X_d \beta_d + \epsilon, \qquad \epsilon \sim N(0, \sigma_\epsilon^2)$$

17 Case / Control Phenotypes I
While a linear regression may provide a reasonable model for many phenotypes, we are commonly interested in analyzing phenotypes where this is NOT a good model.
As an example, we are often in situations where we are interested in identifying causal polymorphisms (loci) that contribute to the risk for developing a disease, e.g. heart disease, diabetes, etc.
In this case, the phenotype we are measuring is often "has disease" or "does not have disease", or more precisely "case" or "control".
Recall that such phenotypes are properties of measured individuals and therefore elements of a sample space, such that we can define a random variable such as Y(case) = 1 and Y(control) = 0.

18 Case / Control Phenotypes II
Let's contrast data we might model with a linear regression model versus case / control data:
[Figure: example data for a linear regression versus case / control data]


20 Logistic regression I
Instead, we're going to consider a logistic regression model.

21 Logistic regression II
It may not be immediately obvious why we choose a regression line function of this shape.
The reason is mathematical convenience, i.e. this function can be considered (along with linear regression) within a broader class of models called Generalized Linear Models (GLM), which we will discuss next lecture.
However, beyond a few differences (the error term and the regression function), we will see that the structure of the model and our approach to inference are the same.

22 Logistic regression III
To begin, let's consider the structure of a regression model:
$$Y = \text{logistic}(\beta_\mu + X_a \beta_a + X_d \beta_d) + \epsilon_l$$
We code the X's the same (!!), although a major difference here is the logistic function, as yet undefined.
However, the expected value of Y has the same structure as we have seen before in a regression:
$$E(Y_i \mid X_i) = \text{logistic}(\beta_\mu + X_{i,a} \beta_a + X_{i,d} \beta_d)$$
We can similarly write for a population using matrix notation (where the X matrix has the same form as we have been considering!):
$$E(Y \mid X) = \text{logistic}(X\beta)$$
In fact the two major differences are in the form of the error and the logistic function.

23 Logistic regression: error term I
Recall that for a linear regression, the error term accounted for the difference between each point and the expected value (the linear regression line), which we assume follows a normal distribution. For a logistic regression we have the same case, but the error has to make up the difference between the expected value and an observed value of either 0 or 1 (what distribution is this?):
[Figure: two panels plotting Y against Xa]

24 Logistic regression: error term II
For the error on an individual i, we therefore have to construct an error that brings the expected value for that individual's genotype to the observed phenotype value of either 0 or 1:
For $Y_i = 0$: $\epsilon_i = -E(Y_i \mid X_i) = -E(Y \mid A_i A_j) = -\text{logistic}(\beta_\mu + X_{i,a} \beta_a + X_{i,d} \beta_d)$
For $Y_i = 1$: $\epsilon_i = 1 - E(Y_i \mid X_i) = 1 - E(Y \mid A_i A_j) = 1 - \text{logistic}(\beta_\mu + X_{i,a} \beta_a + X_{i,d} \beta_d)$
For a distribution that takes two such values, a reasonable distribution is therefore the Bernoulli distribution with the following parameter:
$$\epsilon_i = Z - E(Y_i \mid X_i), \qquad \Pr(Z) \sim \text{bern}(p), \qquad p = \text{logistic}(\beta_\mu + X_a \beta_a + X_d \beta_d)$$

27 Logistic regression: error term III
This may look complicated at first glance, but the intuition is relatively simple.
If the logistic regression line is near zero, the probability distribution of the error term is set up to make the probability of Y being zero greater than being one (and vice versa for the regression line near one!):
$$\epsilon_i = Z - E(Y_i \mid X_i), \qquad \Pr(Z) \sim \text{bern}(p), \qquad p = \text{logistic}(\beta_\mu + X_a \beta_a + X_d \beta_d)$$
[Figure: Y plotted against Xa]
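The intuition can be checked with a small simulation: draw Z from a Bernoulli distribution with parameter p and subtract p. This is a sketch under assumed values (p = 0.1, 10,000 draws, a fixed seed), and `simulate_errors` is a hypothetical helper name, not something from the lecture.

```python
import random

def simulate_errors(p, n_draws, seed=0):
    """Draw Z ~ bern(p) and return the errors Z - p for each draw."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_draws):
        z = 1 if rng.random() < p else 0  # Bernoulli draw with success probability p
        errors.append(z - p)
    return errors

p = 0.1  # regression line near zero
errors = simulate_errors(p, n_draws=10_000)

# The error only ever takes the two values -p (when Y = 0) and 1 - p (when Y = 1).
values = set(errors)
# Fraction of draws where Y = 1; this should be close to p, so Y = 0 is far more common.
frac_one = sum(1 for e in errors if e > 0) / len(errors)
mean_error = sum(errors) / len(errors)
print(values, frac_one, mean_error)
```

With p near zero most draws give the error -p (so Y = 0), and the average error is close to zero; setting p near one reverses the pattern.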

28 Logistic regression: link function I
Next, we have to consider the function for the regression line of a logistic regression (remember that below we are plotting just versus Xa, but this really is a plot versus Xa AND Xd!!):
$$E(Y_i \mid X_i) = \text{logistic}(\beta_\mu + X_{i,a} \beta_a + X_{i,d} \beta_d)$$
$$E(Y_i \mid X_i) = \frac{e^{\beta_\mu + X_{i,a} \beta_a + X_{i,d} \beta_d}}{1 + e^{\beta_\mu + X_{i,a} \beta_a + X_{i,d} \beta_d}}$$
[Figure: Y plotted against Xa]
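It may help to see the same function written two ways and then inverted. In the block below, the symbol $\eta_i$ for the linear part is shorthand introduced only for this illustration; it does not appear on the slides. Dividing the numerator and denominator by $e^{\eta_i}$ gives the second form, and solving for $\eta_i$ shows that the log-odds of the expected value is linear in the genotype codings.

```latex
\begin{aligned}
\eta_i &= \beta_\mu + X_{i,a}\beta_a + X_{i,d}\beta_d \\
E(Y_i \mid X_i) &= \frac{e^{\eta_i}}{1 + e^{\eta_i}} = \frac{1}{1 + e^{-\eta_i}} \\
\ln\!\left(\frac{E(Y_i \mid X_i)}{1 - E(Y_i \mid X_i)}\right) &= \eta_i
\end{aligned}
```

Since $e^{\eta_i} > 0$, the expected value always lies strictly between 0 and 1, which is what makes it usable as a Bernoulli parameter.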

29 Calculating the components of an individual II
For example, say we have an individual i that has genotype A1A1 and phenotype $Y_i = 0$.
We know $X_a = -1$ and $X_d = -1$.
Say we also know that for the population, the true parameters (which we will not know in practice! We need to infer them!) are: $\beta_\mu = 0.2$, $\beta_a = 2.2$, $\beta_d = 0.2$
We can then calculate $E(Y_i \mid X_i)$ and the error term for i:
$$Y_i = \frac{e^{\beta_\mu + x_{i,a} \beta_a + x_{i,d} \beta_d}}{1 + e^{\beta_\mu + x_{i,a} \beta_a + x_{i,d} \beta_d}} + \epsilon_i$$
$$0 = \frac{e^{0.2 + (-1)2.2 + (-1)0.2}}{1 + e^{0.2 + (-1)2.2 + (-1)0.2}} + \epsilon_i$$
$$0 \approx 0.1 - 0.1$$

30 Calculating the components of an individual III
For example, say we have an individual i that has genotype A1A1 and phenotype $Y_i = 1$.
We know $X_a = -1$ and $X_d = -1$.
Say we also know that for the population, the true parameters (which we will not know in practice! We need to infer them!) are: $\beta_\mu = 0.2$, $\beta_a = 2.2$, $\beta_d = 0.2$
We can then calculate $E(Y_i \mid X_i)$ and the error term for i:
$$Y_i = \frac{e^{\beta_\mu + x_{i,a} \beta_a + x_{i,d} \beta_d}}{1 + e^{\beta_\mu + x_{i,a} \beta_a + x_{i,d} \beta_d}} + \epsilon_i$$
$$1 = \frac{e^{0.2 + (-1)2.2 + (-1)0.2}}{1 + e^{0.2 + (-1)2.2 + (-1)0.2}} + \epsilon_i$$
$$1 \approx 0.1 + 0.9$$

31 Calculating the components of an individual IV
For example, say we have an individual i that has genotype A1A2 and phenotype $Y_i = 0$.
We know $X_a = 0$ and $X_d = 1$.
Say we also know that for the population, the true parameters (which we will not know in practice! We need to infer them!) are: $\beta_\mu = 0.2$, $\beta_a = 2.2$, $\beta_d = 0.2$
We can then calculate $E(Y_i \mid X_i)$ and the error term for i:
$$Y_i = \frac{e^{\beta_\mu + x_{i,a} \beta_a + x_{i,d} \beta_d}}{1 + e^{\beta_\mu + x_{i,a} \beta_a + x_{i,d} \beta_d}} + \epsilon_i$$
$$0 = \frac{e^{0.2 + (0)2.2 + (1)0.2}}{1 + e^{0.2 + (0)2.2 + (1)0.2}} + \epsilon_i$$
$$0 \approx 0.6 - 0.6$$

32 Calculating the components of an individual V
For example, say we have an individual i that has genotype A2A2 and phenotype $Y_i = 0$.
We know $X_a = 1$ and $X_d = -1$.
Say we also know that for the population, the true parameters (which we will not know in practice! We need to infer them!) are: $\beta_\mu = 0.2$, $\beta_a = 2.2$, $\beta_d = 0.2$
We can then calculate $E(Y_i \mid X_i)$ and the error term for i:
$$Y_i = \frac{e^{\beta_\mu + x_{i,a} \beta_a + x_{i,d} \beta_d}}{1 + e^{\beta_\mu + x_{i,a} \beta_a + x_{i,d} \beta_d}} + \epsilon_i$$
$$0 = \frac{e^{0.2 + (1)2.2 + (-1)0.2}}{1 + e^{0.2 + (1)2.2 + (-1)0.2}} + \epsilon_i$$
$$0 \approx 0.9 - 0.9$$
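The four worked examples can be reproduced with a few lines of code. The sketch below uses the genotype codings and the parameter values given on these slides; the names `CODING` and `components` are illustrative choices, not part of the lecture.

```python
import math

def logistic(t):
    """Logistic function e^t / (1 + e^t)."""
    return math.exp(t) / (1.0 + math.exp(t))

# Parameter values taken from the worked examples.
BETA_MU, BETA_A, BETA_D = 0.2, 2.2, 0.2

# Genotype -> (X_a, X_d) coding used in the worked examples.
CODING = {"A1A1": (-1, -1), "A1A2": (0, 1), "A2A2": (1, -1)}

def components(genotype, y):
    """Return (E(Y_i | X_i), epsilon_i) for an individual with this genotype and phenotype."""
    x_a, x_d = CODING[genotype]
    expected = logistic(BETA_MU + x_a * BETA_A + x_d * BETA_D)
    return expected, y - expected

for genotype, y in [("A1A1", 0), ("A1A1", 1), ("A1A2", 0), ("A2A2", 0)]:
    expected, error = components(genotype, y)
    print(f"{genotype} Y={y}: E(Y|X)={expected:.3f}, error={error:+.3f}")
```

The expected values come out at roughly 0.1 for A1A1, 0.6 for A1A2, and 0.9 for A2A2, and in every case the expected value plus the error equals the observed phenotype.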

33 For the entire probability distributions I
Recall that the error term is either the negative of $E(Y_i \mid X_i)$ when $Y_i$ is zero, or $1 - E(Y_i \mid X_i)$ when $Y_i$ is one:
$$\epsilon_i \mid (Y_i = 0) = -E(Y_i \mid X_i) \qquad \epsilon_i \mid (Y_i = 1) = 1 - E(Y_i \mid X_i)$$
For the entire distribution of the population, recall that:
$$\Pr(\epsilon_i) \sim \text{bern}(p \mid X) - E(Y \mid X), \qquad p = E(Y \mid X)$$
For example: $\epsilon_i = -0.1$ or $\epsilon_i = 0.9$ when $p = 0.1$

34 For the entire probability distributions II
Recall that the error term is either the negative of $E(Y_i \mid X_i)$ when $Y_i$ is zero, or $1 - E(Y_i \mid X_i)$ when $Y_i$ is one:
$$\epsilon_i \mid (Y_i = 0) = -E(Y_i \mid X_i) \qquad \epsilon_i \mid (Y_i = 1) = 1 - E(Y_i \mid X_i)$$
For the entire distribution of the population, recall that:
$$\Pr(\epsilon_i) \sim \text{bern}(p \mid X) - E(Y \mid X), \qquad p = E(Y \mid X)$$
For example: $\epsilon_i = -0.6$ or $\epsilon_i = 0.4$ when $p = 0.6$

35 For the entire probability distributions III
Recall that the error term is either the negative of $E(Y_i \mid X_i)$ when $Y_i$ is zero, or $1 - E(Y_i \mid X_i)$ when $Y_i$ is one:
$$\epsilon_i \mid (Y_i = 0) = -E(Y_i \mid X_i) \qquad \epsilon_i \mid (Y_i = 1) = 1 - E(Y_i \mid X_i)$$
For the entire distribution of the population, recall that:
$$\Pr(\epsilon_i) \sim \text{bern}(p \mid X) - E(Y \mid X), \qquad p = E(Y \mid X)$$
For example: $\epsilon_i = -0.9$ or $\epsilon_i = 0.1$ when $p = 0.9$
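One consequence of this two-point distribution is worth writing out: the error has mean zero for any p, just as in the linear regression, but its variance depends on p. The short derivation below is a supplementary sketch using only the definitions above ($\epsilon_i = -p$ with probability $1 - p$ and $\epsilon_i = 1 - p$ with probability $p$).

```latex
\begin{aligned}
E(\epsilon_i \mid X) &= (1 - p)(-p) + p(1 - p) = 0 \\
\operatorname{Var}(\epsilon_i \mid X) &= (1 - p)\,p^2 + p\,(1 - p)^2 = p(1 - p)
\end{aligned}
```

For the three examples, this gives a variance of 0.09 when p = 0.1 or p = 0.9 and 0.24 when p = 0.6.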

36 That's it for today
See you on Thurs.!

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 18: Introduction to covariates, the QQ plot, and population structure II + minimal GWAS steps Jason Mezey jgm45@cornell.edu April

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.501.01 Lecture11: Quantitative Genomics II Jason Mezey jgm45@cornell.edu March 7, 019 (Th) 10:10-11:5 Announcements Homework #5 will be posted by

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 20: Epistasis and Alternative Tests in GWAS Jason Mezey jgm45@cornell.edu April 16, 2016 (Th) 8:40-9:55 None Announcements Summary

More information

BTRY 4830/6830: Quantitative Genomics and Genetics

BTRY 4830/6830: Quantitative Genomics and Genetics BTRY 4830/6830: Quantitative Genomics and Genetics Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference Jason Mezey jgm45@cornell.edu Nov. 13, 2014 (Th) 8:40-9:55 Announcements

More information

BTRY 7210: Topics in Quantitative Genomics and Genetics

BTRY 7210: Topics in Quantitative Genomics and Genetics BTRY 7210: Topics in Quantitative Genomics and Genetics Jason Mezey Biological Statistics and Computational Biology (BSCB) Department of Genetic Medicine jgm45@cornell.edu February 12, 2015 Lecture 3:

More information

Methods for Cryptic Structure. Methods for Cryptic Structure

Methods for Cryptic Structure. Methods for Cryptic Structure Case-Control Association Testing Review Consider testing for association between a disease and a genetic marker Idea is to look for an association by comparing allele/genotype frequencies between the cases

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics Genetics BTRY 4830/6830; PBSB.5201.01 Lecture13: Introduction to genome-wide association studies (GWAS) II Jason Mezey jgm45@cornell.edu March 16, 2017 (Th) 8:40-9:55 Announcements

More information

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017 Lecture 2: Genetic Association Testing with Quantitative Traits Instructors: Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 29 Introduction to Quantitative Trait Mapping

More information

BTRY 4830/6830: Quantitative Genomics and Genetics Fall 2014

BTRY 4830/6830: Quantitative Genomics and Genetics Fall 2014 BTRY 4830/6830: Quantitative Genomics and Genetics Fall 2014 Homework 4 (version 3) - posted October 3 Assigned October 2; Due 11:59PM October 9 Problem 1 (Easy) a. For the genetic regression model: Y

More information

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 Association Testing with Quantitative Traits: Common and Rare Variants Timothy Thornton and Katie Kerr Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 1 / 41 Introduction to Quantitative

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification, Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability

More information

EE16B Designing Information Devices and Systems II

EE16B Designing Information Devices and Systems II EE16B Designing Information Devices and Systems II Lecture 9A Geometry of SVD, PCA Intro Last time: Described the SVD in Compact matrix form: U1SV1 T Full form: UΣV T Showed a procedure to SVD via A T

More information

Proportional Variance Explained by QLT and Statistical Power. Proportional Variance Explained by QTL and Statistical Power

Proportional Variance Explained by QLT and Statistical Power. Proportional Variance Explained by QTL and Statistical Power Proportional Variance Explained by QTL and Statistical Power Partitioning the Genetic Variance We previously focused on obtaining variance components of a quantitative trait to determine the proportion

More information

Variance Component Models for Quantitative Traits. Biostatistics 666

Variance Component Models for Quantitative Traits. Biostatistics 666 Variance Component Models for Quantitative Traits Biostatistics 666 Today Analysis of quantitative traits Modeling covariance for pairs of individuals estimating heritability Extending the model beyond

More information

Asymptotic distribution of the largest eigenvalue with application to genetic data

Asymptotic distribution of the largest eigenvalue with application to genetic data Asymptotic distribution of the largest eigenvalue with application to genetic data Chong Wu University of Minnesota September 30, 2016 T32 Journal Club Chong Wu 1 / 25 Table of Contents 1 Background Gene-gene

More information

LECTURE 6. Introduction to Econometrics. Hypothesis testing & Goodness of fit

LECTURE 6. Introduction to Econometrics. Hypothesis testing & Goodness of fit LECTURE 6 Introduction to Econometrics Hypothesis testing & Goodness of fit October 25, 2016 1 / 23 ON TODAY S LECTURE We will explain how multiple hypotheses are tested in a regression model We will define

More information

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin 1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)

More information

EE16B Designing Information Devices and Systems II

EE16B Designing Information Devices and Systems II EE6B Designing Information Devices and Systems II Lecture 9B Geometry of SVD, PCA Uniqueness of the SVD Find SVD of A 0 A 0 AA T 0 ) ) 0 0 ~u ~u 0 ~u ~u ~u ~u Uniqueness of the SVD Find SVD of A 0 A 0

More information

CS4495/6495 Introduction to Computer Vision. 8B-L2 Principle Component Analysis (and its use in Computer Vision)

CS4495/6495 Introduction to Computer Vision. 8B-L2 Principle Component Analysis (and its use in Computer Vision) CS4495/6495 Introduction to Computer Vision 8B-L2 Principle Component Analysis (and its use in Computer Vision) Wavelength 2 Wavelength 2 Principal Components Principal components are all about the directions

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

1. Understand the methods for analyzing population structure in genomes

1. Understand the methods for analyzing population structure in genomes MSCBIO 2070/02-710: Computational Genomics, Spring 2016 HW3: Population Genetics Due: 24:00 EST, April 4, 2016 by autolab Your goals in this assignment are to 1. Understand the methods for analyzing population

More information

Simple linear regression

Simple linear regression Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single

More information

Introduction to Statistical Genetics (BST227) Lecture 6: Population Substructure in Association Studies

Introduction to Statistical Genetics (BST227) Lecture 6: Population Substructure in Association Studies Introduction to Statistical Genetics (BST227) Lecture 6: Population Substructure in Association Studies Confounding in gene+c associa+on studies q What is it? q What is the effect? q How to detect it?

More information

Theoretical and computational aspects of association tests: application in case-control genome-wide association studies.

Theoretical and computational aspects of association tests: application in case-control genome-wide association studies. Theoretical and computational aspects of association tests: application in case-control genome-wide association studies Mathieu Emily November 18, 2014 Caen mathieu.emily@agrocampus-ouest.fr - Agrocampus

More information

(Genome-wide) association analysis

(Genome-wide) association analysis (Genome-wide) association analysis 1 Key concepts Mapping QTL by association relies on linkage disequilibrium in the population; LD can be caused by close linkage between a QTL and marker (= good) or by

More information

Case-Control Association Testing. Case-Control Association Testing

Case-Control Association Testing. Case-Control Association Testing Introduction Association mapping is now routinely being used to identify loci that are involved with complex traits. Technological advances have made it feasible to perform case-control association studies

More information

Linear Regression (1/1/17)

Linear Regression (1/1/17) STA613/CBB540: Statistical methods in computational biology Linear Regression (1/1/17) Lecturer: Barbara Engelhardt Scribe: Ethan Hada 1. Linear regression 1.1. Linear regression basics. Linear regression

More information

The Quantitative TDT

The Quantitative TDT The Quantitative TDT (Quantitative Transmission Disequilibrium Test) Warren J. Ewens NUS, Singapore 10 June, 2009 The initial aim of the (QUALITATIVE) TDT was to test for linkage between a marker locus

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Lecture: Face Recognition and Feature Reduction

Lecture: Face Recognition and Feature Reduction Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 11-1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed

More information

STA442/2101: Assignment 5

STA442/2101: Assignment 5 STA442/2101: Assignment 5 Craig Burkett Quiz on: Oct 23 rd, 2015 The questions are practice for the quiz next week, and are not to be handed in. I would like you to bring in all of the code you used to

More information

PCA vignette Principal components analysis with snpstats

PCA vignette Principal components analysis with snpstats PCA vignette Principal components analysis with snpstats David Clayton October 30, 2018 Principal components analysis has been widely used in population genetics in order to study population structure

More information

Formal Statement of Simple Linear Regression Model

Formal Statement of Simple Linear Regression Model Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor

More information

Lecture 6: Selection on Multiple Traits

Lecture 6: Selection on Multiple Traits Lecture 6: Selection on Multiple Traits Bruce Walsh lecture notes Introduction to Quantitative Genetics SISG, Seattle 16 18 July 2018 1 Genetic vs. Phenotypic correlations Within an individual, trait values

More information

Chapter 12 - Lecture 2 Inferences about regression coefficient

Chapter 12 - Lecture 2 Inferences about regression coefficient Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous

More information

Math 3330: Solution to midterm Exam

Math 3330: Solution to midterm Exam Math 3330: Solution to midterm Exam Question 1: (14 marks) Suppose the regression model is y i = β 0 + β 1 x i + ε i, i = 1,, n, where ε i are iid Normal distribution N(0, σ 2 ). a. (2 marks) Compute the

More information

The concept of breeding value. Gene251/351 Lecture 5

The concept of breeding value. Gene251/351 Lecture 5 The concept of breeding value Gene251/351 Lecture 5 Key terms Estimated breeding value (EB) Heritability Contemporary groups Reading: No prescribed reading from Simm s book. Revision: Quantitative traits

More information

Lecture 9. Short-Term Selection Response: Breeder s equation. Bruce Walsh lecture notes Synbreed course version 3 July 2013

Lecture 9. Short-Term Selection Response: Breeder s equation. Bruce Walsh lecture notes Synbreed course version 3 July 2013 Lecture 9 Short-Term Selection Response: Breeder s equation Bruce Walsh lecture notes Synbreed course version 3 July 2013 1 Response to Selection Selection can change the distribution of phenotypes, and

More information

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES Saurabh Ghosh Human Genetics Unit Indian Statistical Institute, Kolkata Most common diseases are caused by

More information

Vectors and Matrices Statistics with Vectors and Matrices
