BTRY 4830/6830: Quantitative Genomics and Genetics


1 BTRY 4830/6830: Quantitative Genomics and Genetics
Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference
Jason Mezey (jgm45@cornell.edu)
Nov. 13, 2014 (Th) 8:40-9:55

2 Announcements
- Homework #5 available (see your TA!)
- We will get you details for the final next week

3 Summary of lecture 23
- We will review some basics of epistasis and how to test for it (potentially a good topic for your project!?)
- We will briefly discuss alternative testing approaches in GWAS
- We will provide a (brief) introduction to Bayesian inference

4 Review: epistasis
- epistasis - a case where the effect of an allele substitution at one locus (A1 -> A2) alters the effect of substituting an allele at another locus (B1 -> B2)
- This may be equivalently phrased as a change in the expected phenotype (genotypic value) for a genotype at one locus conditional on the genotype at the other locus
- Note that there is a symmetry in epistasis such that if the effect of at least one allelic substitution (from one genotype to another) for one locus depends on the genotype at the other locus, then at least one allelic substitution of the other locus will be dependent as well
- A consequence of this symmetry is that if there is an epistatic relationship between two loci, BOTH will be causal polymorphisms for the phenotype (!!!)
- If there is an epistatic effect (= relationship) between loci, we would therefore like to know this information
- Note that such relationships are not limited to pairs of loci; they can exist among three (three-way), four (four-way), etc.
- The amount of epistasis among loci for any given phenotype is unknown (but without question it is ubiquitous!!)

5 Review: modeling epistasis I
- To model epistasis, we are going to use our same GLM framework (!!)
- The parameterization (using Xa and Xd) that we have considered so far perfectly models any case where there is no epistasis
- We will account for the possibility of epistasis by constructing additional dummy variables and adding additional parameters (so that we have 9 total in our GLM)

6 Review: modeling epistasis II
- Recall the dummy variables we have constructed so far (the second locus B is coded the same way as the first):

X_{a,1} = { -1 for A_1A_1, 0 for A_1A_2, 1 for A_2A_2 }
X_{d,1} = { -1 for A_1A_1, 1 for A_1A_2, -1 for A_2A_2 }
X_{a,2} = { -1 for B_1B_1, 0 for B_1B_2, 1 for B_2B_2 }
X_{d,2} = { -1 for B_1B_1, 1 for B_1B_2, -1 for B_2B_2 }

- We will use these dummy variables to construct additional dummy variables in our GLM (and add additional parameters) to account for epistasis:

Y = \gamma^{-1}(\beta_\mu + X_{a,1}\beta_{a,1} + X_{d,1}\beta_{d,1} + X_{a,2}\beta_{a,2} + X_{d,2}\beta_{d,2} + X_{a,1}X_{a,2}\beta_{a,a} + X_{a,1}X_{d,2}\beta_{a,d} + X_{d,1}X_{a,2}\beta_{d,a} + X_{d,1}X_{d,2}\beta_{d,d})
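The coding above translates directly into a design matrix. Below is a minimal sketch (not from the lecture) of building the nine columns in Python, assuming genotypes arrive as 0/1/2 counts of the A_2 (or B_2) allele; the helper name epistatic_design is hypothetical.

```python
# A minimal sketch (assumptions noted above): genotypes coded as 0/1/2
# copies of the second allele, giving Xa in {-1, 0, 1} and Xd in {-1, 1, -1}.
import numpy as np

def epistatic_design(g1, g2):
    """Return the n x 9 design matrix [1, Xa1, Xd1, Xa2, Xd2,
    Xa1*Xa2, Xa1*Xd2, Xd1*Xa2, Xd1*Xd2] for two loci."""
    xa1 = g1 - 1.0                      # -1, 0, 1 for A1A1, A1A2, A2A2
    xd1 = np.where(g1 == 1, 1.0, -1.0)  # 1 for the heterozygote, -1 otherwise
    xa2 = g2 - 1.0
    xd2 = np.where(g2 == 1, 1.0, -1.0)
    ones = np.ones_like(xa1)
    return np.column_stack([ones, xa1, xd1, xa2, xd2,
                            xa1 * xa2, xa1 * xd2, xd1 * xa2, xd1 * xd2])

g1 = np.array([0, 1, 2, 1])  # hypothetical genotypes at locus A
g2 = np.array([2, 1, 0, 0])  # hypothetical genotypes at locus B
print(epistatic_design(g1, g2).shape)  # (4, 9): one column per parameter
```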

7 Review: modeling epistasis III

Y = \gamma^{-1}(\beta_\mu + X_{a,1}\beta_{a,1} + X_{d,1}\beta_{d,1} + X_{a,2}\beta_{a,2} + X_{d,2}\beta_{d,2} + X_{a,1}X_{a,2}\beta_{a,a} + X_{a,1}X_{d,2}\beta_{a,d} + X_{d,1}X_{a,2}\beta_{d,a} + X_{d,1}X_{d,2}\beta_{d,d})

- To provide some intuition concerning what each of these is capturing, consider the values that each of the genotypes would take for dummy variable X_{a,1}:

         B_1B_1   B_1B_2   B_2B_2
A_1A_1     -1       -1       -1
A_1A_2      0        0        0
A_2A_2      1        1        1

8 Review: modeling epistasis IV

Y = \gamma^{-1}(\beta_\mu + X_{a,1}\beta_{a,1} + X_{d,1}\beta_{d,1} + X_{a,2}\beta_{a,2} + X_{d,2}\beta_{d,2} + X_{a,1}X_{a,2}\beta_{a,a} + X_{a,1}X_{d,2}\beta_{a,d} + X_{d,1}X_{a,2}\beta_{d,a} + X_{d,1}X_{d,2}\beta_{d,d})

- To provide some intuition concerning what each of these is capturing, consider the values that each of the genotypes would take for dummy variable X_{d,1}:

         B_1B_1   B_1B_2   B_2B_2
A_1A_1     -1       -1       -1
A_1A_2      1        1        1
A_2A_2     -1       -1       -1

9 Review: modeling epistasis V

Y = \gamma^{-1}(\beta_\mu + X_{a,1}\beta_{a,1} + X_{d,1}\beta_{d,1} + X_{a,2}\beta_{a,2} + X_{d,2}\beta_{d,2} + X_{a,1}X_{a,2}\beta_{a,a} + X_{a,1}X_{d,2}\beta_{a,d} + X_{d,1}X_{a,2}\beta_{d,a} + X_{d,1}X_{d,2}\beta_{d,d})

- To provide some intuition concerning what each of these is capturing, consider the values that each of the genotypes would take for dummy variable X_{a,1}X_{a,2}:

         B_1B_1   B_1B_2   B_2B_2
A_1A_1      1        0       -1
A_1A_2      0        0        0
A_2A_2     -1        0        1

10 Review: modeling epistasis VI

Y = \gamma^{-1}(\beta_\mu + X_{a,1}\beta_{a,1} + X_{d,1}\beta_{d,1} + X_{a,2}\beta_{a,2} + X_{d,2}\beta_{d,2} + X_{a,1}X_{a,2}\beta_{a,a} + X_{a,1}X_{d,2}\beta_{a,d} + X_{d,1}X_{a,2}\beta_{d,a} + X_{d,1}X_{d,2}\beta_{d,d})

- To provide some intuition concerning what each of these is capturing, consider the values that each of the genotypes would take for dummy variable X_{a,1}X_{d,2} (similarly for X_{d,1}X_{a,2}):

         B_1B_1   B_1B_2   B_2B_2
A_1A_1      1       -1        1
A_1A_2      0        0        0
A_2A_2     -1        1       -1

11 Review: modeling epistasis VII

Y = \gamma^{-1}(\beta_\mu + X_{a,1}\beta_{a,1} + X_{d,1}\beta_{d,1} + X_{a,2}\beta_{a,2} + X_{d,2}\beta_{d,2} + X_{a,1}X_{a,2}\beta_{a,a} + X_{a,1}X_{d,2}\beta_{a,d} + X_{d,1}X_{a,2}\beta_{d,a} + X_{d,1}X_{d,2}\beta_{d,d})

- To provide some intuition concerning what each of these is capturing, consider the values that each of the genotypes would take for dummy variable X_{d,1}X_{d,2}:

         B_1B_1   B_1B_2   B_2B_2
A_1A_1      1       -1        1
A_1A_2     -1        1       -1
A_2A_2      1       -1        1

12 Review: inference for epistasis I
- To infer epistatic relationships, we will use the exact same genetic framework and statistical framework that we have been considering
- For the genetic framework, we are still testing markers that we are assuming are in LD with causal polymorphisms that could have an epistatic relationship (so we are indirectly inferring that there is epistasis from the marker genotypes)
- For inference, we are going to estimate the epistatic parameters using the same approach as before (!!), i.e. for a linear model:

X = [1, X_{a,1}, X_{d,1}, X_{a,2}, X_{d,2}, X_{a,a}, X_{a,d}, X_{d,a}, X_{d,d}]
\beta = [\beta_\mu, \beta_{a,1}, \beta_{d,1}, \beta_{a,2}, \beta_{d,2}, \beta_{a,a}, \beta_{a,d}, \beta_{d,a}, \beta_{d,d}]^T
\hat{\beta} = (X^T X)^{-1} X^T y
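A quick numeric sketch of this estimator, reusing the hypothetical epistatic_design helper from the earlier sketch; the simulated genotypes and effect sizes are invented for illustration.

```python
# Sketch: beta_hat = (X^T X)^{-1} X^T y, computed with a least-squares
# solver (numerically safer than forming the inverse explicitly).
import numpy as np

rng = np.random.default_rng(0)
n = 200
g1 = rng.integers(0, 3, n)               # hypothetical genotypes, locus A
g2 = rng.integers(0, 3, n)               # hypothetical genotypes, locus B
X = epistatic_design(g1, g2)             # 9-column design from the sketch above
beta_true = np.array([1.0, 0.5, 0.0, 0.3, 0.0, 0.4, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(0.0, 1.0, n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 2))             # estimates of the 9 parameters
```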

13 Review: inference for epistasis II
- For hypothesis testing, we will just use an LRT calculated the same way as before (!!)
- For an F-statistic for a linear regression, and for a logistic regression, estimate the parameters under the null and alternative models and substitute these into the likelihood equations, which have the same form as before (with some additional dummy variables and parameters)
- The only difference is the degrees of freedom: for a given test, d.f. = number of parameters in the alternative model - number of parameters in the null model

14 Review: inference for epistasis III
- For example, we could use the entire model to test the same hypothesis that we have been considering for a single marker:

H_0: \beta_{a,1} = 0 \cap \beta_{d,1} = 0
H_A: \beta_{a,1} \neq 0 \cup \beta_{d,1} \neq 0

- We could also test whether either marker has evidence of being a causal polymorphism:

H_0: \beta_{a,1} = 0 \cap \beta_{d,1} = 0 \cap \beta_{a,2} = 0 \cap \beta_{d,2} = 0
H_A: \beta_{a,1} \neq 0 \cup \beta_{d,1} \neq 0 \cup \beta_{a,2} \neq 0 \cup \beta_{d,2} \neq 0

- We can also test just for epistasis (note this is equivalent to testing an interaction effect in an ANOVA!), as in the sketch below:

H_0: \beta_{a,a} = 0 \cap \beta_{a,d} = 0 \cap \beta_{d,a} = 0 \cap \beta_{d,d} = 0
H_A: \beta_{a,a} \neq 0 \cup \beta_{a,d} \neq 0 \cup \beta_{d,a} \neq 0 \cup \beta_{d,d} \neq 0

- We can also test the entire model (what is the interpretation in this case!?):

H_0: \beta_{a,1} = 0 \cap \beta_{d,1} = 0 \cap \beta_{a,2} = 0 \cap \beta_{d,2} = 0 \cap \beta_{a,a} = 0 \cap \beta_{a,d} = 0 \cap \beta_{d,a} = 0 \cap \beta_{d,d} = 0
H_A: \beta_{a,1} \neq 0 \cup \beta_{d,1} \neq 0 \cup \beta_{a,2} \neq 0 \cup \beta_{d,2} \neq 0 \cup \beta_{a,a} \neq 0 \cup \beta_{a,d} \neq 0 \cup \beta_{d,a} \neq 0 \cup \beta_{d,d} \neq 0
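For a linear model, the epistasis-only test above can be run as an F-test comparing the 5-parameter null model (no interaction columns) to the 9-parameter alternative. A sketch reusing the simulated X and y from the previous sketch, with df1 = 9 - 5 = 4 and df2 = n - 9:

```python
# Sketch of the epistasis-only F-test: null = first 5 columns of X,
# alternative = all 9 columns.
import numpy as np
from scipy import stats

def rss(Xm, y):
    """Residual sum of squares from a least-squares fit."""
    b, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    r = y - Xm @ b
    return r @ r

X_null = X[:, :5]                         # drop the four interaction columns
rss0, rss1 = rss(X_null, y), rss(X, y)
df1 = X.shape[1] - X_null.shape[1]        # 4 extra parameters
df2 = len(y) - X.shape[1]                 # n - 9
F = ((rss0 - rss1) / df1) / (rss1 / df2)
print(F, stats.f.sf(F, df1, df2))         # F-statistic and p-value
```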

15 Final notes on testing for epistasis
- Since testing for epistasis requires considering models with more parameters, these tests are generally less powerful than tests of one marker at a time
- In addition, testing for epistasis among all possible pairs of markers (or triples or quadruples!, etc.) produces many tests (how many?)
- Also, identification of a causal polymorphism can be accomplished by testing just one marker at a time (!!)
- For these reasons, epistasis is often a secondary analysis and we often consider a subset of markers (what might be good strategies?)
- Note however that correctly inferring epistasis is of value for many reasons (for example?) so we would like to do this
- How to infer epistasis is an active area of research (!!)

16 Review: GWAS analysis
- So far, we have considered a regression (generalized linear modeling = GLM) approach for constructing statistical models of the association of genetic polymorphisms and phenotype
- With this, we considered the following hypotheses:

H_0: \beta_a = 0 \cap \beta_d = 0
H_A: \beta_a \neq 0 \cup \beta_d \neq 0

- Note that this X coding of genotypes tests the general null hypothesis (in fact, any coding X of the genotypes can be used to construct a test in a GWAS)
- There are therefore many other ways in which we could construct a different hypothesis test, and any of these will be a reasonable (and acceptable) strategy for performing a GWAS analysis

17 Alternative tests in GWAS I
- Since our basic null / alternative hypothesis construction in GWAS covers a large number of possible relationships between genotypes and phenotypes, there are a large number of tests that we could apply in a GWAS, e.g. t-tests, ANOVA, Wald's test, non-parametric permutation-based tests, Kruskal-Wallis tests, other rank-based tests, chi-square, Fisher's exact, Cochran-Armitage, etc. (see PLINK for a somewhat comprehensive list of tests used in GWAS)
- When can we use different tests? The only restriction is that our data conform to the assumptions of the test (examples?)
- We could therefore apply a diversity of tests for any given GWAS

18 Alternative tests in GWAS II
- Should we use different tests in a GWAS (and why)? Yes we should - the reason is that different tests have different performance depending on the (unknown) conditions of the system and experiment, i.e. some may perform better than others
- In general, since we don't know the true conditions (and therefore which test will be best suited), we should run a number of tests and compare results
- How to compare results of different GWAS analyses is a fuzzy case (= no unconditional rules), but a reasonable approach is to treat each test as a distinct GWAS analysis and compare the hits across analyses using the following rules:
- If all methods identify the same hits (= genomic locations), this is good evidence that there is a causal polymorphism
- If methods do not agree on a position (e.g. some are significant, some are not), we should attempt to determine the reason for the discrepancy (this requires understanding of the tests, and experience)

19 Alternative tests in GWAS III
- We do not have time in this course to do a comprehensive review of possible tests (keep in mind, every time you learn a new test in a statistics class, there is a good chance you could apply it in a GWAS!)
- Let's consider a few examples of alternative tests that could be applied
- Remember that to apply these alternative tests, you will perform a test for each of the N marker-phenotype combinations, where in each case we are testing the following hypotheses with different (implicit) codings of X (!!):

H_0: Cov(Y, X) = 0
H_A: Cov(Y, X) \neq 0

20 Alternative test examples I
- First, let's consider a case-control phenotype and a chi-square test (which has deep connections to our logistic regression test under certain assumptions, but it has slightly different properties!)
- To construct the test statistic, we consider the counts of genotype-phenotype combinations (left) and calculate the expected numbers in each cell (right):

Observed:                            Expected:
         Case    Control                      Case             Control
A_1A_1   n_11    n_12    n_1.        A_1A_1   (n_.1 n_1.)/n    (n_.2 n_1.)/n    n_1.
A_1A_2   n_21    n_22    n_2.        A_1A_2   (n_.1 n_2.)/n    (n_.2 n_2.)/n    n_2.
A_2A_2   n_31    n_32    n_3.        A_2A_2   (n_.1 n_3.)/n    (n_.2 n_3.)/n    n_3.
         n_.1    n_.2    n                    n_.1             n_.2             n

- We then construct the following test statistic:

LRT = 2 \ln \Lambda = 2 \sum_{i=1}^{3} \sum_{j=1}^{2} n_{ij} \ln \left( \frac{n_{ij}}{(n_{i.} n_{.j}) / n} \right)

- The (asymptotic) distribution of this statistic when the null hypothesis is true (i.e. as the sample size tends to infinity) is chi-square with d.f. = (#columns - 1)(#rows - 1) = 2, so we calculate the p-value for the statistic in this \chi^2_{d.f.=2}
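A minimal sketch of the statistic above, assuming a hypothetical 3 x 2 table of genotype-by-case/control counts; both the LRT form and scipy's Pearson chi-square use d.f. = 2 under the null and agree asymptotically.

```python
# Sketch: compute the LRT from observed vs. expected counts, and compare
# with scipy's Pearson chi-square on the same (hypothetical) table.
import numpy as np
from scipy import stats

counts = np.array([[30, 50],    # A1A1: cases, controls (hypothetical)
                   [60, 55],    # A1A2
                   [40, 15]])   # A2A2
n = counts.sum()
expected = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / n

lrt = 2 * np.sum(counts * np.log(counts / expected))
p_lrt = stats.chi2.sf(lrt, df=2)

chi2, p_pearson, df, _ = stats.chi2_contingency(counts, correction=False)
print(lrt, p_lrt, chi2, p_pearson)    # the two statistics agree asymptotically
```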

21 Alternative test examples II
- Second, let's consider a Fisher's exact test
- Note that the LRT for the null hypothesis under the chi-square test was only asymptotically exact, i.e. it is exact as the sample size n approaches infinity, but it is not exact for smaller sample sizes (although we hope it is close!)
- Could we construct a test that is exact for smaller sample sizes? Yes, we can calculate a Fisher's exact test statistic for our sample, where the distribution under the null hypothesis is exact for any sample size (I will let you look up how to calculate this statistic and the distribution under the null on your own):

         Case    Control
A_1A_1   n_11    n_12
A_1A_2   n_21    n_22
A_2A_2   n_31    n_32

- Given this test is exact, why would we ever use the chi-square test (which is also often used) / what is a rule for when we should use one versus the other?
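A small sketch of running Fisher's exact test. Note that scipy.stats.fisher_exact handles 2 x 2 tables only, so here the genotype table is collapsed to a hypothetical 2 x 2 (dominant-style grouping); an exact test on the full 3 x 2 table needs other software (e.g. R's fisher.test).

```python
# Sketch: Fisher's exact test on a collapsed (hypothetical) 2 x 2 table.
from scipy import stats

table_2x2 = [[30, 50],        # A1A1: cases, controls (hypothetical counts)
             [100, 70]]       # A1A2 + A2A2 combined
odds_ratio, p_exact = stats.fisher_exact(table_2x2)
print(odds_ratio, p_exact)    # exact p-value, valid at any sample size
```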

22 Alternative test examples III
- Third, let's consider other ways of grouping the cells, where we could apply either a chi-square or a Fisher's exact test
- For minor allele A_1, we can apply a recessive test (left) or a dominant test (right); we could also apply an allele test (note these test names are from PLINK):

Recessive:                          Dominant:                           Allele:
                Case   Control                      Case   Control             Case   Control
A_1A_1          n_11   n_12         A_1A_1, A_1A_2  n_11   n_12         A_1    n_11   n_12
A_1A_2, A_2A_2  n_21   n_22         A_2A_2          n_21   n_22         A_2    n_21   n_22

- When should we expect one of these tests to perform better than the others?
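A sketch of constructing the three groupings above from the same hypothetical 3 x 2 count table used earlier, with A_1 assumed to be the minor allele.

```python
# Sketch: collapse a 3 x 2 genotype table into recessive, dominant, and
# allele-count 2 x 2 tables, then run a chi-square test on each.
import numpy as np
from scipy import stats

counts = np.array([[30, 50], [60, 55], [40, 15]])  # A1A1 / A1A2 / A2A2 rows

recessive = np.array([counts[0], counts[1] + counts[2]])   # A1A1 vs rest
dominant  = np.array([counts[0] + counts[1], counts[2]])   # A1A1+A1A2 vs A2A2
allele    = np.array([2 * counts[0] + counts[1],           # A1 allele counts
                      counts[1] + 2 * counts[2]])          # A2 allele counts

for name, t in [("recessive", recessive), ("dominant", dominant),
                ("allele", allele)]:
    chi2, p, df, _ = stats.chi2_contingency(t, correction=False)
    print(name, round(chi2, 2), round(p, 4))
```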

23 Basic GWAS wrap-up
- You now have all the tools at your disposal to perform a GWAS analysis of real data (!!)
- Recall that producing a good GWAS analysis requires iterative analysis of the data and considering why you might be getting the results that you observe
- Also recall that the more experience you have performing (careful / thoughtful) GWAS analyses, the better you will get at it!

24 Introduction to Bayesian analysis I
- Up to this point, we have considered statistical analysis (and inference) using a Frequentist formalism
- There is an alternative formalism called Bayesian that we will now (and in the final lectures) introduce in a very brief manner
- Note that there is an important conceptual split between statisticians who consider themselves Frequentist or Bayesian, but for GWAS analysis (and for most applications where we are concerned with analyzing data) we do not have a preference, i.e. we only care about getting the right biological answer, so any (or both) frameworks that get us to this goal are useful
- In GWAS (and mapping) analysis, you will see both Frequentist (i.e. the framework we have built up to this point!) and Bayesian approaches applied

25 Introduction to Bayesian analysis II
- In both Frequentist and Bayesian analyses, we have the same probabilistic framework (sample spaces, random variables, probability models, etc.), and when assuming our probability model falls in a family of parameterized distributions, we assume that a single fixed parameter value(s) describes the true model that produced our sample
- However, in a Bayesian framework, we now allow the parameter to have its own probability distribution (we DO NOT do this in a Frequentist analysis), such that we treat it as a random variable
- This may seem strange - how can we consider a parameter to have a probability distribution if it is fixed? However, we can if we have some prior assumptions about what values the parameter will take for our system compared to others, and we can make this prior assumption rigorous by assuming there is a probability distribution associated with the parameter
- It turns out, this assumption produces major differences between the two analysis procedures (in how they consider probability, how they perform inference, etc.)

26 Introduction to Bayesian analysis III
- To introduce Bayesian statistics, we need to begin by introducing Bayes' theorem
- Consider a set of events (remember events!?) A = A_1, ..., A_k of a sample space S (where k may be infinite), which form a partition of the sample space, i.e. \bigcup_{i=1}^{k} A_i = S and A_i \cap A_j = \emptyset for all i \neq j
- For another event B \subseteq S (which may be S itself), define the law of total probability:

Pr(B) = \sum_{i=1}^{k} Pr(B \cap A_i) = \sum_{i=1}^{k} Pr(B | A_i) Pr(A_i)

- Now we can state Bayes' theorem:

Pr(A_i | B) = \frac{Pr(A_i \cap B)}{Pr(B)} = \frac{Pr(B | A_i) Pr(A_i)}{Pr(B)} = \frac{Pr(B | A_i) Pr(A_i)}{\sum_{i=1}^{k} Pr(B | A_i) Pr(A_i)}
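A tiny numeric check of the theorem for a two-event partition; the probabilities below are invented for illustration.

```python
# Sketch: compute Pr(A1 | B) from Pr(B | A_i) and Pr(A_i) via the
# law of total probability, for hypothetical probabilities.
pr_A = [0.3, 0.7]            # Pr(A1), Pr(A2): a partition of S
pr_B_given_A = [0.9, 0.2]    # Pr(B | A1), Pr(B | A2)

pr_B = sum(pb * pa for pb, pa in zip(pr_B_given_A, pr_A))
pr_A1_given_B = pr_B_given_A[0] * pr_A[0] / pr_B
print(pr_B, pr_A1_given_B)   # 0.41, ~0.659
```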

27 Introduction to Bayesian analysis IV
- Remember that in a Bayesian (not Frequentist!) framework, our parameter(s) have a probability distribution associated with them that reflects our belief in the values that might be the true value of the parameter
- Since we are treating the parameter as a random variable, we can consider the joint distribution of the parameter AND a sample Y produced under a probability model: Pr(\theta \cap Y)
- For inference, we are interested in the probability the parameter takes a certain value given a sample: Pr(\theta | y)
- Using Bayes' theorem, we can write:

Pr(\theta | y) = \frac{Pr(y | \theta) Pr(\theta)}{Pr(y)}

- Also note that since the sample is fixed (i.e. we are considering a single sample), Pr(y) = c, so we can rewrite this as follows:

Pr(\theta | y) \propto Pr(y | \theta) Pr(\theta)

28 Introduction to Bayesian analysis V
- Let's consider the structure of our main equation in Bayesian statistics:

Pr(\theta | y) \propto Pr(y | \theta) Pr(\theta)

- The left hand side is called the posterior probability: Pr(\theta | y)
- The first term of the right hand side is something we have seen before, i.e. the likelihood (!!): Pr(y | \theta) = L(\theta | y)
- The second term of the right hand side is new and is called the prior: Pr(\theta)
- Note that the prior is how we incorporate our assumptions concerning the values the true parameter value may take
- In a Bayesian framework, we are making two assumptions (unlike a Frequentist framework, where we make one assumption): 1. the probability distribution that generated the sample, 2. the probability distribution of the parameter

29 Probability in a Bayesian framework
- By allowing the parameter to have a prior probability distribution, we produce a change in how we consider probability in a Bayesian versus Frequentist perspective
- For example, consider a coin flip, with Bern(p)
- In a Frequentist framework, we consider a conception of probability that we use for inference to reflect the outcomes as if we flipped the coin an infinite number of times, i.e. if we flipped the coin 100 times and it was heads each time, we do not use this information to change how we consider a new experiment with this same coin if we flipped it again
- In a Bayesian framework, we consider a conception of probability that can incorporate previous observations, i.e. if we flipped a coin 100 times and it was heads each time, we might want to incorporate this information into our inferences from a new experiment with this same coin if we flipped it again
- Note that this philosophical distinction is very deep (= we have only scratched the surface with this one example)

30 Debating the Frequentist versus Bayesian frameworks
- Frequentists often argue that because they do not take previous experience into account when performing their inference concerning the value of a parameter, they do not introduce biases into their inference framework
- In response, Bayesians often argue:
- Previous experience is used to specify the probability model in the first place
- By not incorporating previous experience in the inference procedure, prior assumptions are still being used (which can introduce logical inconsistencies!)
- The idea of considering an infinite number of observations is not particularly realistic (and can be a nonsensical abstraction for the real world)
- The impact of prior assumptions in Bayesian inference disappears as the sample size goes to infinity
- Again, note that we have only scratched the surface of this debate!

31 Types of priors in Bayesian analysis
- Up to this point, we have discussed priors in an abstract manner
- To start making this concept more clear, let's consider one of our original examples, where we are interested in knowing the mean human height in the US (what are the components of the statistical framework for this example!? Note the basic components are the same in Frequentist / Bayesian!)
- If we assume a normal probability model of human height (what parameter are we interested in inferring in this case and why?) in a Bayesian framework, we will at least need to define a prior: Pr(\mu)
- One possible approach is to make the probability of each possible value of the parameter the same (what distribution are we assuming and what is a problem with this approach?), which defines an improper prior: Pr(\mu) = c
- Another possible approach is to incorporate our previous observations that heights are seldom infinite, etc., where one choice for incorporating these observations is by defining a prior that has the same distributional form as our probability model, which defines a conjugate prior (and is also a proper prior): Pr(\mu) = N(\kappa, \phi^2)

32 Constructing the posterior probability
- Let's put this all together for our heights in the US example
- First recall that our assumption is that the probability model is normal (so what is the form of the likelihood?): Y \sim N(\mu, \sigma^2)
- Second, assume a normal prior for the parameter we are interested in: Pr(\mu) = N(\kappa, \phi^2)
- From the Bayesian equation, we can now put this together as follows:

Pr(\mu | y) \propto Pr(y | \mu) Pr(\mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y_i - \mu)^2}{2\sigma^2}} \times \frac{1}{\sqrt{2\pi\phi^2}} e^{-\frac{(\mu - \kappa)^2}{2\phi^2}}

- Note that with a little rearrangement, this can be written in the following form:

Pr(\mu | y) = N\left( \frac{\kappa / \phi^2 + \sum_i y_i / \sigma^2}{1/\phi^2 + n/\sigma^2},\; \left( \frac{1}{\phi^2} + \frac{n}{\sigma^2} \right)^{-1} \right)
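A sketch of the closed-form posterior above, assuming \sigma^2 is known and using hypothetical prior values for \kappa and \phi^2 (all numbers invented for illustration).

```python
# Sketch: posterior mean and variance of mu under the conjugate
# normal prior N(kappa, phi2), with sigma2 treated as known.
import numpy as np

rng = np.random.default_rng(1)
mu_true, sigma2 = 170.0, 100.0            # hypothetical height model (cm)
y = rng.normal(mu_true, np.sqrt(sigma2), size=50)

kappa, phi2 = 160.0, 400.0                # hypothetical N(kappa, phi2) prior
n, ybar = len(y), y.mean()

precision = 1 / phi2 + n / sigma2
post_mean = (kappa / phi2 + n * ybar / sigma2) / precision
post_var = 1 / precision
print(post_mean, post_var)                # prior mean shrunk toward ybar
```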

33 Bayesian inference: estimation
- Inference in a Bayesian framework differs from a Frequentist framework in both estimation and hypothesis testing
- For example, for estimation in a Bayesian framework, we always construct estimators using the posterior probability distribution, for example:

\hat{\theta} = mean(\theta | y) = \int \theta Pr(\theta | y) d\theta  \quad or \quad \hat{\theta} = median(\theta | y)

- For example, in our heights in the US example our estimator is:

\hat{\mu} = median(\mu | y) = mean(\mu | y) = \frac{\kappa / \phi^2 + n\bar{y} / \sigma^2}{1/\phi^2 + n/\sigma^2}

- Note 1: again notice that the impact of the prior disappears as the sample size goes to infinity (= same as the MLE under this condition):

\frac{\kappa / \phi^2 + n\bar{y} / \sigma^2}{1/\phi^2 + n/\sigma^2} \rightarrow \frac{n\bar{y} / \sigma^2}{n / \sigma^2} = \bar{y} \quad as \; n \rightarrow \infty

- Note 2: estimates in a Bayesian framework can be different than in a likelihood (Frequentist) framework, since estimator construction is fundamentally different (!!)
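A quick numeric illustration (not from the lecture) of Note 1: the posterior-mean estimator approaches the sample mean (the MLE) as n grows, using the same hypothetical prior and model as the sketch above.

```python
# Sketch: the gap between the posterior mean and ybar shrinks with n.
import numpy as np

rng = np.random.default_rng(2)
mu_true, sigma2, kappa, phi2 = 170.0, 100.0, 160.0, 400.0

for n in [5, 50, 5000]:
    y = rng.normal(mu_true, np.sqrt(sigma2), size=n)
    ybar = y.mean()
    post_mean = (kappa / phi2 + n * ybar / sigma2) / (1 / phi2 + n / sigma2)
    print(n, round(ybar, 2), round(post_mean, 2))  # gap shrinks with n
```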

34 That's it for today
- Next lecture: we will continue our brief introduction to Bayesian statistics
