BTRY 4830/6830: Quantitative Genomics and Genetics


BTRY 4830/6830: Quantitative Genomics and Genetics
Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference
Jason Mezey (jgm45@cornell.edu)
Nov. 13, 2014 (Th) 8:40-9:55

Announcements Homework #5 is available (see your TA!). We will get you details for the final next week.

Summary of lecture 23 We will review some basics of epistasis and of testing for epistasis (potentially a good topic for your project!?). We will briefly discuss alternative testing approaches in GWAS. We will provide a (brief) introduction to Bayesian inference.

Review: epistasis epistasis - a case where the effect of an allele substitution at one locus (A1 -> A2) alters the effect of substituting an allele at another locus (B1 -> B2). This may be equivalently phrased as a change in the expected phenotype (genotypic value) for a genotype at one locus conditional on the genotype at the other locus. Note that there is a symmetry in epistasis such that if the effect of at least one allelic substitution (from one genotype to another) at one locus depends on the genotype at the other locus, then at least one allelic substitution at the other locus will be dependent as well. A consequence of this symmetry is that if there is an epistatic relationship between two loci, BOTH will be causal polymorphisms for the phenotype (!!!). If there is an epistatic effect (= relationship) between loci, we would therefore like to know this information. Note that such relationships are not restricted to pairs of loci: they can also exist among three (three-way), four (four-way), etc. loci. The amount of epistasis among loci for any given phenotype is unknown (but without question it is ubiquitous!!).

Review: modeling epistasis I To model epistasis, we are going to use our same GLM framework (!!). The parameterization (using Xa and Xd) that we have considered so far perfectly models any case where there is no epistasis. We will account for the possibility of epistasis by constructing additional dummy variables and adding additional parameters (so that we have 9 total in our GLM).

Review: modeling epistasis II Recall the dummy variables we have constructed so far:

X_{a,1} = -1 for A1A1, 0 for A1A2, 1 for A2A2
X_{d,1} = -1 for A1A1, 1 for A1A2, -1 for A2A2
X_{a,2} = -1 for B1B1, 0 for B1B2, 1 for B2B2
X_{d,2} = -1 for B1B1, 1 for B1B2, -1 for B2B2

We will use these dummy variables to construct additional dummy variables in our GLM (and add additional parameters) to account for epistasis:

Y = γ^{-1}(β_µ + X_{a,1}β_{a,1} + X_{d,1}β_{d,1} + X_{a,2}β_{a,2} + X_{d,2}β_{d,2} + X_{a,1}X_{a,2}β_{a,a} + X_{a,1}X_{d,2}β_{a,d} + X_{d,1}X_{a,2}β_{d,a} + X_{d,1}X_{d,2}β_{d,d})
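As a concrete illustration, here is a minimal Python sketch of this coding (the function names and the 0/1/2 genotype encoding are my own for this example, not from the lecture):

```python
# Map a single-locus genotype, encoded as the number of copies of the "2" allele
# (0, 1, or 2), to the additive and dominance dummy variables used in the lecture:
#   X_a = -1, 0, 1  and  X_d = -1, 1, -1  for the three genotypes.
def xa_xd(n_alt):
    xa = n_alt - 1                    # 0 copies -> -1, 1 copy -> 0, 2 copies -> 1
    xd = 1 if n_alt == 1 else -1      # heterozygote -> 1, homozygotes -> -1
    return xa, xd

# For two loci A and B, the epistatic dummy variables are simply products.
def epistasis_coding(n_alt_A, n_alt_B):
    xa1, xd1 = xa_xd(n_alt_A)
    xa2, xd2 = xa_xd(n_alt_B)
    return {
        "Xa,1": xa1, "Xd,1": xd1, "Xa,2": xa2, "Xd,2": xd2,
        "Xa,1*Xa,2": xa1 * xa2, "Xa,1*Xd,2": xa1 * xd2,
        "Xd,1*Xa,2": xd1 * xa2, "Xd,1*Xd,2": xd1 * xd2,
    }

# Example: an A1A2 / B2B2 individual.
print(epistasis_coding(1, 2))
```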

Review: modeling epistasis III Y = γ^{-1}(β_µ + X_{a,1}β_{a,1} + X_{d,1}β_{d,1} + X_{a,2}β_{a,2} + X_{d,2}β_{d,2} + X_{a,1}X_{a,2}β_{a,a} + X_{a,1}X_{d,2}β_{a,d} + X_{d,1}X_{a,2}β_{d,a} + X_{d,1}X_{d,2}β_{d,d}) To provide some intuition concerning what each of these is capturing, consider the values that each of the two-locus genotypes would take for the dummy variable X_{a,1}:

        B1B1   B1B2   B2B2
A1A1     -1     -1     -1
A1A2      0      0      0
A2A2      1      1      1

Review: modeling epistasis IV Y = γ^{-1}(β_µ + X_{a,1}β_{a,1} + X_{d,1}β_{d,1} + X_{a,2}β_{a,2} + X_{d,2}β_{d,2} + X_{a,1}X_{a,2}β_{a,a} + X_{a,1}X_{d,2}β_{a,d} + X_{d,1}X_{a,2}β_{d,a} + X_{d,1}X_{d,2}β_{d,d}) To provide some intuition concerning what each of these is capturing, consider the values that each of the two-locus genotypes would take for the dummy variable X_{d,1}:

        B1B1   B1B2   B2B2
A1A1     -1     -1     -1
A1A2      1      1      1
A2A2     -1     -1     -1

Review: modeling epistasis V Y = γ^{-1}(β_µ + X_{a,1}β_{a,1} + X_{d,1}β_{d,1} + X_{a,2}β_{a,2} + X_{d,2}β_{d,2} + X_{a,1}X_{a,2}β_{a,a} + X_{a,1}X_{d,2}β_{a,d} + X_{d,1}X_{a,2}β_{d,a} + X_{d,1}X_{d,2}β_{d,d}) To provide some intuition concerning what each of these is capturing, consider the values that each of the two-locus genotypes would take for the dummy variable X_{a,1}X_{a,2}:

        B1B1   B1B2   B2B2
A1A1      1      0     -1
A1A2      0      0      0
A2A2     -1      0      1

Review: modeling epistasis VI Y = γ^{-1}(β_µ + X_{a,1}β_{a,1} + X_{d,1}β_{d,1} + X_{a,2}β_{a,2} + X_{d,2}β_{d,2} + X_{a,1}X_{a,2}β_{a,a} + X_{a,1}X_{d,2}β_{a,d} + X_{d,1}X_{a,2}β_{d,a} + X_{d,1}X_{d,2}β_{d,d}) To provide some intuition concerning what each of these is capturing, consider the values that each of the two-locus genotypes would take for the dummy variable X_{a,1}X_{d,2} (similarly for X_{d,1}X_{a,2}):

        B1B1   B1B2   B2B2
A1A1      1     -1      1
A1A2      0      0      0
A2A2     -1      1     -1

Review: modeling epistasis VII Y = γ^{-1}(β_µ + X_{a,1}β_{a,1} + X_{d,1}β_{d,1} + X_{a,2}β_{a,2} + X_{d,2}β_{d,2} + X_{a,1}X_{a,2}β_{a,a} + X_{a,1}X_{d,2}β_{a,d} + X_{d,1}X_{a,2}β_{d,a} + X_{d,1}X_{d,2}β_{d,d}) To provide some intuition concerning what each of these is capturing, consider the values that each of the two-locus genotypes would take for the dummy variable X_{d,1}X_{d,2}:

        B1B1   B1B2   B2B2
A1A1      1     -1      1
A1A2     -1      1     -1
A2A2      1     -1      1

Review: inference for epistasis I To infer epistatic relationships we will use the exact same genetic framework and statistical framework that we have been considering. For the genetic framework, we are still testing markers that we assume are in LD with causal polymorphisms that could have an epistatic relationship (so we are indirectly inferring that there is epistasis from the marker genotypes). For inference, we are going to estimate the epistatic parameters using the same approach as before (!!), i.e. for a linear model:

X = [1, X_{a,1}, X_{d,1}, X_{a,2}, X_{d,2}, X_{a,a}, X_{a,d}, X_{d,a}, X_{d,d}]
β = [β_µ, β_{a,1}, β_{d,1}, β_{a,2}, β_{d,2}, β_{a,a}, β_{a,d}, β_{d,a}, β_{d,d}]^T
β̂ = (X^T X)^{-1} X^T y
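A minimal numpy sketch of assembling this design matrix and computing the estimator (the genotype codes and phenotypes below are simulated placeholders, not data from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated genotype codings for two loci (in a real analysis these come from the data).
xa1 = rng.integers(0, 3, n) - 1.0                    # additive coding: -1, 0, 1
xd1 = np.where(np.abs(xa1) == 1, -1.0, 1.0)          # dominance coding: -1, 1, -1
xa2 = rng.integers(0, 3, n) - 1.0
xd2 = np.where(np.abs(xa2) == 1, -1.0, 1.0)

# Design matrix: intercept, the four marginal terms, and the four interaction terms.
X = np.column_stack([np.ones(n), xa1, xd1, xa2, xd2,
                     xa1 * xa2, xa1 * xd2, xd1 * xa2, xd1 * xd2])

y = rng.normal(size=n)  # placeholder phenotype

# Least-squares / MLE estimate: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```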

Review: inference for epistasis II For hypothesis testing, we will just use an LRT calculated the same way as before (!!). For an F-statistic in a linear regression, or for an LRT in a logistic regression, we estimate the parameters under the null and alternative models and substitute these into likelihood equations that have the same form as before (just with some additional dummy variables and parameters). The only difference is the degrees of freedom: for a given test, d.f. = (number of parameters in the alternative model) - (number of parameters in the null model).
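For a linear regression, a hedged sketch of this LRT for nested models (the helper names are mine, and it assumes the X and y from the design-matrix sketch above):

```python
import numpy as np
from scipy import stats

def rss(X, y):
    """Residual sum of squares after a least-squares fit of y on X."""
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_hat
    return resid @ resid

def lrt_pvalue(X_null, X_alt, y):
    """LRT for nested linear models; df = (# alt params) - (# null params)."""
    n = len(y)
    # For a Gaussian linear model the LRT statistic is n * ln(RSS_null / RSS_alt).
    lrt = n * np.log(rss(X_null, y) / rss(X_alt, y))
    df = X_alt.shape[1] - X_null.shape[1]
    return stats.chi2.sf(lrt, df)

# Test only the four epistatic parameters: the null model drops the interaction columns.
# (X is the 9-column design matrix from the previous sketch.)
# p = lrt_pvalue(X[:, :5], X, y)
```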

Review: inference for epistasis III For example, we could use the entire model to test the same hypothesis that we have been considering for a single marker:

H_0: β_{a,1} = 0 ∩ β_{d,1} = 0
H_A: β_{a,1} ≠ 0 ∪ β_{d,1} ≠ 0

We could also test whether either marker has evidence of being a causal polymorphism:

H_0: β_{a,1} = 0 ∩ β_{d,1} = 0 ∩ β_{a,2} = 0 ∩ β_{d,2} = 0
H_A: β_{a,1} ≠ 0 ∪ β_{d,1} ≠ 0 ∪ β_{a,2} ≠ 0 ∪ β_{d,2} ≠ 0

We can also test just for epistasis (note this is equivalent to testing an interaction effect in an ANOVA!):

H_0: β_{a,a} = 0 ∩ β_{a,d} = 0 ∩ β_{d,a} = 0 ∩ β_{d,d} = 0
H_A: β_{a,a} ≠ 0 ∪ β_{a,d} ≠ 0 ∪ β_{d,a} ≠ 0 ∪ β_{d,d} ≠ 0

We can also test the entire model (what is the interpretation in this case!?):

H_0: β_{a,1} = 0 ∩ β_{d,1} = 0 ∩ β_{a,2} = 0 ∩ β_{d,2} = 0 ∩ β_{a,a} = 0 ∩ β_{a,d} = 0 ∩ β_{d,a} = 0 ∩ β_{d,d} = 0
H_A: β_{a,1} ≠ 0 ∪ β_{d,1} ≠ 0 ∪ β_{a,2} ≠ 0 ∪ β_{d,2} ≠ 0 ∪ β_{a,a} ≠ 0 ∪ β_{a,d} ≠ 0 ∪ β_{d,a} ≠ 0 ∪ β_{d,d} ≠ 0

Final notes on testing for epistasis Since testing for epistasis requires considering models with more parameters, these tests are generally less powerful than tests of one marker at a time. In addition, testing for epistasis among all possible pairs of markers (or among three, four, etc.!) produces many tests (how many?). Also, identification of a causal polymorphism can be accomplished by testing just one marker at a time (!!). For these reasons, epistasis is often a secondary analysis and we often consider only a subset of markers (what might be good strategies?). Note however that correctly inferring epistasis is of value for many reasons (for example?), so we would like to do this. How to infer epistasis is an active area of research (!!).

Review: GWAS analysis So far, we have considered a regression (generalized linear modeling = GLM) approach for constructing statistical models of the association of genetic polymorphisms and phenotype. With this, we considered the following hypotheses:

H_0: β_a = 0 ∩ β_d = 0
H_A: β_a ≠ 0 ∪ β_d ≠ 0

Note that this X coding of genotypes tests this general null hypothesis (in fact, any coding X of the genotypes can be used to construct a test in a GWAS). There are therefore many other ways in which we could construct a different hypothesis test, and any of these will be a reasonable (and acceptable) strategy for performing a GWAS analysis.

Alternative tests in GWAS I Since our basic null / alternative hypothesis construction in GWAS covers a large number of possible relationships between genotypes and phenotypes, there are a large number of tests that we could apply in a GWAS, e.g. t-tests, ANOVA, Wald's test, non-parametric permutation-based tests, Kruskal-Wallis tests, other rank-based tests, chi-square, Fisher's exact, Cochran-Armitage, etc. (see PLINK for a somewhat comprehensive list of tests used in GWAS). When can we use different tests? The only restriction is that our data conform to the assumptions of the test (examples?). We could therefore apply a diversity of tests for any given GWAS.

Alternative tests in GWAS II Should we use different tests in a GWAS (and why)? Yes we should - the reason is that different tests have different performance depending on the (unknown) conditions of the system and experiment, i.e. some may perform better than others. In general, since we don't know the true conditions (and therefore which test will be best suited), we should run a number of tests and compare results. How to compare the results of different GWAS analyses is a fuzzy case (= no unconditional rules), but a reasonable approach is to treat each test as a distinct GWAS analysis and compare the hits across analyses using the following rules: If all methods identify the same hits (= genomic locations), this is good evidence that there is a causal polymorphism. If methods do not agree on a position (e.g. some are significant, some are not), we should attempt to determine the reason for the discrepancy (this requires that we understand the tests, plus experience).

Alternative tests in GWAS III We do not have time in this course to do a comprehensive review of possible tests (keep in mind, every time you learn a new test in a statistics class, there is a good chance you could apply it in a GWAS!). Let's consider a few example alternative tests that could be applied. Remember that to apply these alternative tests, you will perform N tests, one for each marker-phenotype combination, where in each case we are testing the following hypotheses with a different (implicit) coding of X (!!):

H_0: Cov(Y, X) = 0
H_A: Cov(Y, X) ≠ 0

Alternative test examples I First, let's consider a case-control phenotype and a chi-square test (which has deep connections to our logistic regression test under certain assumptions, but it has slightly different properties!). To construct the test statistic, we consider the observed counts of genotype-phenotype combinations and calculate the expected numbers in each cell under the null:

Observed counts:
        Case   Control
A1A1    n_11   n_12      n_1.
A1A2    n_21   n_22      n_2.
A2A2    n_31   n_32      n_3.
        n_.1   n_.2      n

Expected counts:
        Case             Control
A1A1    (n_.1 n_1.)/n    (n_.2 n_1.)/n
A1A2    (n_.1 n_2.)/n    (n_.2 n_2.)/n
A2A2    (n_.1 n_3.)/n    (n_.2 n_3.)/n

We then construct the following test statistic:

LRT = 2 ln Λ = 2 Σ_{i=1}^{3} Σ_{j=1}^{2} n_ij ln( n_ij / ((n_.j n_i.)/n) )

where the (asymptotic) distribution when the null hypothesis is true, i.e. as the sample size tends to infinity, is chi-square with d.f. = (#columns - 1)(#rows - 1) = 2, so we calculate the statistic and compare it to a χ² distribution with 2 d.f.
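A hedged Python sketch of this statistic on a made-up genotype-by-phenotype table (the counts are placeholders; scipy's chi2_contingency with lambda_='log-likelihood' computes the same LRT/G statistic):

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Placeholder 3x2 table: genotype (rows) by case/control (columns) counts.
counts = np.array([[30, 70],    # A1A1
                   [60, 90],    # A1A2
                   [40, 20]])   # A2A2

# Direct calculation: LRT = 2 * sum_ij n_ij * ln(n_ij / expected_ij).
n = counts.sum()
expected = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / n
lrt = 2.0 * np.sum(counts * np.log(counts / expected))
p_value = chi2.sf(lrt, df=2)     # d.f. = (3 - 1) * (2 - 1) = 2
print(lrt, p_value)

# The same statistic via scipy (lambda_='log-likelihood' selects the LRT / G-test).
stat, p, df, _ = chi2_contingency(counts, correction=False, lambda_="log-likelihood")
print(stat, p, df)
```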

Alternative test examples II Second, let's consider a Fisher's exact test. Note that the LRT for the null hypothesis under the chi-square test was only asymptotically exact, i.e. it is exact as the sample size n approaches infinity, but it is not exact for smaller sample sizes (although we hope it is close!). Could we construct a test that is exact for smaller sample sizes? Yes, we can calculate a Fisher's exact test statistic for our sample, where the distribution under the null hypothesis is exact for any sample size (I will let you look up how to calculate this statistic and the distribution under the null on your own):

        Case   Control
A1A1    n_11   n_12
A1A2    n_21   n_22
A2A2    n_31   n_32

Given this test is exact, why would we ever use the chi-square test / what is a rule for when we should use one versus the other?

Alternative test examples III Third, let's consider other ways of grouping the cells, where we could apply either a chi-square or a Fisher's exact test. When the minor allele is A1, we can apply a recessive test or a dominance test, and we could also apply an allele test (note these test names are from PLINK):

Recessive:
                Case   Control
A1A1            n_11   n_12
A1A2 + A2A2     n_21   n_22

Dominance:
                Case   Control
A1A1 + A1A2     n_11   n_12
A2A2            n_21   n_22

Allele:
        Case   Control
A1      n_11   n_12
A2      n_21   n_22

When should we expect one of these tests to perform better than the others?
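A minimal sketch of collapsing the 3x2 genotype counts into these 2x2 tables and applying both tests (the counts are made-up placeholders and the grouping helpers are mine):

```python
import numpy as np
from scipy.stats import fisher_exact, chi2_contingency

# Placeholder 3x2 genotype (A1A1, A1A2, A2A2) by case/control counts.
counts = np.array([[30, 70],
                   [60, 90],
                   [40, 20]])

# Recessive (A1A1 vs A1A2 + A2A2), dominance (A1A1 + A1A2 vs A2A2),
# and allele (A1 vs A2 allele counts, two alleles per individual) groupings.
recessive = np.vstack([counts[0], counts[1] + counts[2]])
dominance = np.vstack([counts[0] + counts[1], counts[2]])
allele    = np.vstack([2 * counts[0] + counts[1], counts[1] + 2 * counts[2]])

for name, table in [("recessive", recessive), ("dominance", dominance), ("allele", allele)]:
    _, p_fisher = fisher_exact(table)
    chi2_stat, p_chi2, _, _ = chi2_contingency(table, correction=False)
    print(f"{name}: Fisher p = {p_fisher:.3g}, chi-square p = {p_chi2:.3g}")
```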

Basic GWAS wrap-up You now have all the tools at your disposal to perform a GWAS analysis of real data (!!). Recall that producing a good GWAS analysis requires iterative analysis of the data and considering why you might be getting the results that you observe. Also recall that the more experience you have performing (careful / thoughtful) GWAS analyses, the better you will get at it!

Introduction to Bayesian analysis I Up to this point, we have considered statistical analysis (and inference) using a Frequentist formalism. There is an alternative formalism called Bayesian that we will now (and in the final lectures) introduce in a very brief manner. Note that there is an important conceptual split between statisticians who consider themselves Frequentist or Bayesian, but for GWAS analysis (and for most applications where we are concerned with analyzing data) we do not have a preference, i.e. we only care about getting the right biological answer, so any (or both) frameworks that get us to this goal are useful. In GWAS (and mapping) analysis, you will see both Frequentist (i.e. the framework we have built up to this point!) and Bayesian approaches applied.

Introduction to Bayesian analysis II In both Frequentist and Bayesian analyses, we have the same probabilistic framework (sample spaces, random variables, probability models, etc.), and when assuming our probability model falls in a family of parameterized distributions, we assume that a single fixed parameter value(s) describes the true model that produced our sample. However, in a Bayesian framework, we now allow the parameter to have its own probability distribution (we DO NOT do this in a Frequentist analysis), such that we treat it as a random variable. This may seem strange - how can we consider a parameter to have a probability distribution if it is fixed? However, we can if we have some prior assumptions about what values the parameter will take for our system compared to others, and we can make this prior assumption rigorous by assuming there is a probability distribution associated with the parameter. It turns out that this assumption produces major differences between the two analysis procedures (in how they consider probability, how they perform inference, etc.).

Introduction to Bayesian analysis III To introduce Bayesian statistics, we need to begin by introducing Bayes' theorem. Consider a set of events (remember events!?) A = A_1, ..., A_k of a sample space S (where k may be infinite), which form a partition of the sample space, i.e. ∪_{i=1}^{k} A_i = S and A_i ∩ A_j = ∅ for all i ≠ j. For another event B ⊆ S (which may be S itself), define the Law of Total Probability:

Pr(B) = Σ_{i=1}^{k} Pr(B ∩ A_i) = Σ_{i=1}^{k} Pr(B | A_i) Pr(A_i)

Now we can state Bayes' theorem:

Pr(A_i | B) = Pr(A_i ∩ B) / Pr(B) = Pr(B | A_i) Pr(A_i) / Pr(B) = Pr(B | A_i) Pr(A_i) / Σ_{j=1}^{k} Pr(B | A_j) Pr(A_j)

Introduction to Bayesian analysis IV Remember that in a Bayesian (not Frequentist!) framework, our parameter(s) have a probability distribution associated with them that reflects our belief in the values that might be the true value of the parameter. Since we are treating the parameter as a random variable, we can consider the joint distribution of the parameter AND a sample Y produced under a probability model:

Pr(θ ∩ Y)

For inference, we are interested in the probability the parameter takes a certain value given a sample:

Pr(θ | y)

Using Bayes' theorem, we can write:

Pr(θ | y) = Pr(y | θ) Pr(θ) / Pr(y)

Also note that since the sample is fixed (i.e. we are considering a single sample), Pr(y) = c, so we can rewrite this as follows:

Pr(θ | y) ∝ Pr(y | θ) Pr(θ)

Introduction to Bayesian analysis V Let's consider the structure of our main equation in Bayesian statistics:

Pr(θ | y) ∝ Pr(y | θ) Pr(θ)

Note that the left hand side is called the posterior probability, Pr(θ | y), i.e. the probability of the parameter given the observed sample. The first term on the right hand side is something we have seen before, i.e. the likelihood (!!): Pr(y | θ) = L(θ | y). The second term on the right hand side is new and is called the prior: Pr(θ). Note that the prior is how we incorporate our assumptions concerning the values the true parameter may take. In a Bayesian framework, we are making two assumptions (unlike a Frequentist framework, where we make only the first assumption): 1. the probability distribution that generated the sample, 2. the probability distribution of the parameter.

Probability in a Bayesian framework By allowing the parameter to have a prior probability distribution, we produce a change in how we consider probability in a Bayesian versus Frequentist perspective. For example, consider a coin flip, with Y ~ Bern(p). In a Frequentist framework, we consider a conception of probability that we use for inference to reflect the outcomes as if we flipped the coin an infinite number of times, i.e. if we flipped the coin 100 times and it was heads each time, we do not use this information to change how we consider a new experiment with this same coin if we flipped it again. In a Bayesian framework, we consider a conception of probability that can incorporate previous observations, i.e. if we flipped a coin 100 times and it was heads each time, we might want to incorporate this information into our inferences from a new experiment with this same coin if we flipped it again. Note that this philosophical distinction is very deep (= we have only scratched the surface with this one example).
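As one concrete (hedged) illustration of how previous observations can enter a Bayesian analysis - using a Beta prior for p, which is an assumption of this sketch and not something specified in the lecture - the posterior after new flips is obtained by simply updating the prior counts:

```python
# Bayesian updating for a coin with Y ~ Bern(p) and a Beta(alpha, beta) prior on p
# (the conjugate Beta prior here is an illustrative assumption, not from the lecture).
def update_beta_prior(alpha, beta, n_heads, n_tails):
    # Posterior for p is Beta(alpha + n_heads, beta + n_tails).
    return alpha + n_heads, beta + n_tails

# Start with a uniform prior Beta(1, 1), then observe 100 heads in 100 flips.
alpha, beta = update_beta_prior(1.0, 1.0, n_heads=100, n_tails=0)

# The posterior mean of p now reflects the previous observations,
# so inference for a new experiment with the same coin starts from this belief.
posterior_mean = alpha / (alpha + beta)
print(posterior_mean)  # ~0.99, versus 0.5 under the uniform prior
```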

Debating the Frequentist versus Bayesian frameworks Frequentists often argue that, because they do not take previous experience into account when performing their inference concerning the value of a parameter, they do not introduce biases into their inference framework. In response, Bayesians often argue: (1) previous experience is used to specify the probability model in the first place; (2) by not incorporating previous experience in the inference procedure, prior assumptions are still being used (which can introduce logical inconsistencies!); (3) the idea of considering an infinite number of observations is not particularly realistic (and can be a non-sensical abstraction for the real world); (4) the impact of prior assumptions in Bayesian inference disappears as the sample size goes to infinity. Again, note that we have only scratched the surface of this debate!

Types of priors in Bayesian analysis Up to this point, we have discussed priors in an abstract manner. To start making this concept more clear, let's consider one of our original examples, where we are interested in knowing the mean human height in the US (what are the components of the statistical framework for this example!? Note the basic components are the same in a Frequentist / Bayesian analysis!). If we assume a normal probability model of human height (what parameter are we interested in inferring in this case and why?), in a Bayesian framework we will at least need to define a prior: Pr(µ). One possible approach is to make the probability of each possible value of the parameter the same (what distribution are we assuming and what is a problem with this approach?), which defines an improper prior: Pr(µ) = c. Another possible approach is to incorporate our previous observations that heights are seldom infinite, etc., where one choice for incorporating these observations is to define a prior that has the same form of distribution as our probability model, which defines a conjugate prior (which is also a proper prior): Pr(µ) ~ N(κ, φ²)

Constructing the posterior probability Let's put this all together for our heights in the US example. First recall that our assumption is that the probability model is normal (so what is the form of the likelihood?): Y ~ N(µ, σ²). Second, assume a normal prior for the parameter we are interested in: Pr(µ) ~ N(κ, φ²). From the Bayesian equation, we can now put this together as follows:

Pr(µ | y) ∝ Pr(y | µ) Pr(µ) = [ ∏_{i=1}^{n} (1 / √(2πσ²)) e^{-(y_i - µ)² / (2σ²)} ] × (1 / √(2πφ²)) e^{-(µ - κ)² / (2φ²)}

Note that with a little rearrangement, this can be written in the following form:

Pr(µ | y) ~ N( (κ/φ² + Σ_{i} y_i / σ²) / (1/φ² + n/σ²) , (1/φ² + n/σ²)^{-1} )
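A short numerical sketch of this conjugate-normal posterior (σ² is treated as known here, and all numbers are made-up placeholders, not data from the lecture):

```python
import numpy as np

def normal_posterior(y, sigma2, kappa, phi2):
    """Posterior N(mean, var) for mu, given y_i ~ N(mu, sigma2) and prior mu ~ N(kappa, phi2)."""
    n = len(y)
    precision = 1.0 / phi2 + n / sigma2          # 1/phi^2 + n/sigma^2
    post_mean = (kappa / phi2 + y.sum() / sigma2) / precision
    post_var = 1.0 / precision                   # (1/phi^2 + n/sigma^2)^{-1}
    return post_mean, post_var

rng = np.random.default_rng(1)
y = rng.normal(loc=170.0, scale=10.0, size=50)   # placeholder height sample (cm)
print(normal_posterior(y, sigma2=100.0, kappa=160.0, phi2=25.0))
```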

Bayesian inference: estimation Inference in a Bayesian framework differs from a Frequentist framework in both estimation and hypothesis testing. For example, for estimation in a Bayesian framework, we always construct estimators using the posterior probability distribution, for example:

θ̂ = mean(θ | y) = ∫ θ Pr(θ | y) dθ    or    θ̂ = median(θ | y)

For example, in our heights in the US example, our estimator is:

µ̂ = median(µ | y) = mean(µ | y) = (κ/φ² + nȳ/σ²) / (1/φ² + n/σ²)

Note 1: again notice that the impact of the prior disappears as the sample size goes to infinity (= the estimator approaches the MLE under this condition):

(κ/φ² + nȳ/σ²) / (1/φ² + n/σ²) → (nȳ/σ²) / (n/σ²) = ȳ

Note 2: estimates in a Bayesian framework can be different than in a likelihood (Frequentist) framework, since estimator construction is fundamentally different (!!)
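A brief numerical check of Note 1, under the same placeholder prior as the previous sketch (the point is only that the posterior mean approaches ȳ as n grows):

```python
import numpy as np

kappa, phi2, sigma2 = 160.0, 25.0, 100.0
rng = np.random.default_rng(2)

for n in (5, 50, 5000):
    y = rng.normal(loc=170.0, scale=10.0, size=n)
    post_mean = (kappa / phi2 + n * y.mean() / sigma2) / (1.0 / phi2 + n / sigma2)
    # As n grows, the posterior mean converges to the sample mean (the MLE).
    print(n, round(post_mean, 2), round(y.mean(), 2))
```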

That's it for today. Next lecture: we will continue our brief introduction to Bayesian statistics.