Computational Approaches to Statistical Genetics


1 Computational Approaches to Statistical Genetics
GWAS I: Concepts and Probability Theory
Christoph Lippert, Dr. Oliver Stegle, Prof. Dr. Karsten Borgwardt
Max Planck Institutes Tübingen, Germany
Summer 2011

2 Overview / Genome-wide association studies

Given:
Genotypes for multiple individuals, e.g. single nucleotide polymorphisms (SNPs), microsatellite markers, ...
Phenotypes for the same individuals, e.g. disease, height, gene expression, ...

Goal: find genetic markers that explain the variance in the phenotype.

[Slide background: a page from a review article on GWA mapping; its clipped column text is omitted here. The legible Figure 1 caption reads: GWA mapping is ineffective if there is strong genetic differentiation between subpopulations, that is, if there is structure in the population. Two subpopulations of plants are depicted, one tall and one short, together with a schema of the genotype of each plant. Red alleles increase the height of a plant, whereas blue alleles decrease it; one locus has a major effect, and two have a minor effect. The many background markers (orange and green) are mostly exclusive to a specific subpopulation but are also strongly associated with height, even though they are not causal. By crossing the plants and generating an experimental F2 population or recombinant inbred lines, any linkage disequilibrium between background markers and causal markers is broken up, and the causal loci can then easily be mapped, albeit with relatively poor resolution.]

6 Overview / Some definitions

Genotype denotes the genetic state of an individual; usually denoted by x_n for individual n.
Phenotype denotes the state of a trait of an individual; usually denoted by y_n for individual n.
A locus is a position or limited region in the genome; usually denoted by x_s for locus (or SNP) s.
An allele is the genetic state of a locus.

10 Overview / More definitions

An organism/cell is haploid if it has only one chromosome set, or identical chromosome sets, e.g. A. thaliana, sperm cells, or inbred lab strains.
An organism/cell is diploid if it has two separately inherited homologous chromosomes, e.g. human.
An organism/cell is polyploid if it has more than two homologous chromosomes, e.g. sugar cane is hexaploid.

13 Overview / Even more definitions

Haplotype denotes an individual's state of a single set of chromosomes (paternal or maternal).
A locus is heterozygous if it differs between the paternal and maternal haplotypes; a heterozygous genotype is usually encoded as 1.
A locus is homozygous if it matches between the paternal and maternal haplotypes; the homozygous major allele is usually encoded as 0, the homozygous minor allele as 2 (see the encoding sketch below).
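To make the 0/1/2 coding concrete, here is a minimal Python sketch; the data and the helper name encode_genotypes are invented for illustration and are not part of the lecture.

```python
import numpy as np

def encode_genotypes(allele_pairs, major, minor):
    """Encode diploid genotypes at one locus in the 0/1/2 scheme:
    0 = homozygous major, 1 = heterozygous, 2 = homozygous minor."""
    codes = []
    for a, b in allele_pairs:
        if a == b == major:
            codes.append(0)
        elif a == b == minor:
            codes.append(2)
        else:
            codes.append(1)   # alleles differ: heterozygous
    return np.array(codes)

# Hypothetical SNP with major allele 'A' and minor allele 'G'
pairs = [("A", "A"), ("A", "G"), ("G", "G"), ("G", "A")]
print(encode_genotypes(pairs, major="A", minor="G"))   # -> [0 1 2 1]
```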

16 Overview / Association

In statistics, association is any relationship between two measured quantities that renders them statistically dependent. (Oxford Dictionary of Statistics)

Direct association can be beneficial, e.g. linkage.
Indirect association can be harmful, e.g. population structure.

[Diagram: correlation shown as a special case of statistical dependence.]

22 Overview / Linkage Disequilibrium (Gametic Phase Disequilibrium)

Association between two loci: a deviation from random co-inheritance of the loci.
LD can be caused by limited recombination (linkage), population structure, epistasis, ...

Measures of LD between two loci x_1 and x_2 are D and r^2 (a computational sketch follows below):

D = f_AA - f_A. f_.A = f_AA f_BB - f_AB f_BA

r^2 = D^2 / (f_A. f_B. f_.A f_.B)

D ≠ 0 and r^2 ≠ 0 are indicators of LD.

Haplotype frequency table:

            x_2 = A_2   x_2 = B_2
x_1 = A_1   f_AA        f_AB        f_A.
x_1 = B_1   f_BA        f_BB        f_B.
            f_.A        f_.B
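As a numeric sanity check on the formulas above, the following sketch computes D and r^2 from a hypothetical 2x2 table of haplotype frequencies (the example numbers are invented):

```python
import numpy as np

def ld_measures(f):
    """Compute D and r^2 from a 2x2 table of haplotype frequencies.

    f[i, j] is the frequency of the haplotype with allele i at locus 1 and
    allele j at locus 2 (index 0 = A, index 1 = B); entries must sum to 1.
    """
    f_A_dot = f[0, :].sum()    # marginal frequency of A at locus 1 (f_A.)
    f_dot_A = f[:, 0].sum()    # marginal frequency of A at locus 2 (f_.A)
    D = f[0, 0] - f_A_dot * f_dot_A
    r2 = D**2 / (f_A_dot * (1 - f_A_dot) * f_dot_A * (1 - f_dot_A))
    return D, r2

# Invented haplotype frequencies in strong LD
f = np.array([[0.45, 0.05],
              [0.05, 0.45]])
D, r2 = ld_measures(f)
print(f"D = {D:.3f}, r^2 = {r2:.3f}")   # D = 0.200, r^2 = 0.640
```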

29 Overview / Linkage Disequilibrium (LD): Physical LD

Physical linkage causes LD: the less recombination between two loci, the stronger their LD.
LD is not uniform along the chromosome: recombination hotspots lead to conserved haplotype blocks in strong LD.
Physical LD can be used to choose tag SNPs that cover all linked regions (a toy sketch follows below), trading off resolution against genotyping cost.
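As a toy illustration of how pairwise r^2 can drive tag-SNP selection, the sketch below greedily picks tags until every SNP is covered at a chosen r^2 threshold. This is a simplified heuristic written for this transcript, not the method of any particular tagging tool; the data and function names are invented.

```python
import numpy as np

def greedy_tag_snps(X, r2_min=0.8):
    """Greedily select tag SNPs from a genotype matrix X (N individuals x S SNPs).

    A SNP counts as covered once its squared correlation (r^2) with some
    selected tag reaches r2_min; returns the indices of the chosen tags.
    """
    r2 = np.corrcoef(X, rowvar=False) ** 2   # pairwise r^2 between SNP columns
    uncovered = set(range(X.shape[1]))
    tags = []
    while uncovered:
        # pick the SNP that covers the most still-uncovered SNPs
        best = max(uncovered,
                   key=lambda s: sum(r2[s, t] >= r2_min for t in uncovered))
        tags.append(best)
        uncovered -= {t for t in uncovered if r2[best, t] >= r2_min}
    return tags

# Toy data: four SNPs copied from one haplotype block (strong LD) plus two
# independent SNPs, all in the 0/1/2 coding introduced earlier.
rng = np.random.default_rng(0)
block = rng.integers(0, 3, size=(100, 1))
linked = np.repeat(block, 4, axis=1).astype(float)
mask = rng.random(linked.shape) < 0.05            # sprinkle in a little noise
linked[mask] = rng.integers(0, 3, size=int(mask.sum()))
X = np.hstack([linked, rng.integers(0, 3, size=(100, 2))])
print(greedy_tag_snps(X))   # typically one tag for the block plus the two singles
```

Greedy covering is attractive here because each chosen tag immediately removes all SNPs it represents from further consideration, directly trading panel size against the coverage threshold.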

34 Outline

35 Motivation / Outline

Overview, Motivation, Prerequisites, Probability Theory, Parameter Inference for the Gaussian, Summary

36 Motivation / Why probabilistic modeling?

Inferences from data are intrinsically uncertain. Probability theory: model uncertainty instead of ignoring it!
Applications are not limited to statistical genetics: machine learning, data mining, pattern recognition, etc.

Goal of this part of the course: an overview of probabilistic modeling and its key concepts, with a focus on applications in statistical genetics.

39 Motivation / Further reading, useful material

Christopher M. Bishop: Pattern Recognition and Machine Learning. Good background; covers most of the machine learning used in this course and much more. Substantial parts of this tutorial borrow figures and ideas from this book.
David J.C. MacKay: Information Theory, Inference, and Learning Algorithms. Very worthwhile reading, though it overlaps less closely with the lecture synopsis. Freely available online.

40 Motivation / Lecture overview

1. An introduction to probabilistic modeling
2. Linear models
3. Hypothesis testing
4. Principal Components Analysis
5. Linear Mixed Models

41 Outline

42 Prerequisites / Outline

Overview, Motivation, Prerequisites, Probability Theory, Parameter Inference for the Gaussian, Summary

43 Prerequisites / Key concepts: Data

Let D denote a dataset consisting of N datapoints, D = {x_n, y_n}_{n=1}^N, where the x_n are inputs and the y_n are outputs.

Typical for this course:
x = {x_1, ..., x_S} is multivariate, spanning S features per observation (SNPs, markers, etc.).
y is univariate (phenotype, disease status, expression level, etc.).

Notation: scalars are printed as y, vectors in bold: x, matrices in capital bold: Σ.

[Figure: data points plotted as Y against X.]

46 Prerequisites / Key concepts: Predictions

Observed dataset D = {x_n, y_n}_{n=1}^N.
Given D, what can we say about y at an unseen test input x*?

[Figure: the data with an unseen test input x* marked by a question mark.]

48 Prerequisites / Key concepts: Model

Observed dataset D = {x_n, y_n}_{n=1}^N. Given D, what can we say about y at an unseen test input x*?
To make predictions we need to make assumptions. A model H encodes these assumptions and often depends on some parameters θ.
Curve fitting: the model relates x to y, for example the linear model y = f(x | θ) = θ_0 + θ_1 x (a fitting sketch follows below).

[Figure: a curve fitted through the data, with a prediction at x*.]
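A minimal fitting sketch for this linear model on synthetic data; np.polyfit is used here merely as a convenient least-squares routine, and nothing below comes from the lecture itself.

```python
import numpy as np

# Synthetic data from y = theta0 + theta1 * x plus Gaussian noise
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)
y = 0.5 + 2.0 * x + rng.normal(0.0, 0.1, size=x.shape)

theta1, theta0 = np.polyfit(x, y, deg=1)   # least-squares fit of a line
x_star = 1.5                               # unseen test input x*
y_star = theta0 + theta1 * x_star          # prediction at x*
print(f"theta0 = {theta0:.2f}, theta1 = {theta1:.2f}, y* = {y_star:.2f}")
```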

50 Prerequisites / Key concepts: Uncertainty

Virtually every step involves uncertainty:
Measurement uncertainty (D)
Parameter uncertainty (θ)
Uncertainty regarding the correct model (H)

Uncertainty can occur in both inputs and outputs. How do we represent uncertainty?

[Figure: measurement uncertainty around the data points.]

53 Probability Theory / Outline

Overview, Motivation, Prerequisites, Probability Theory, Parameter Inference for the Gaussian, Summary

54 Probability Theory / Probabilities

Let X be a random variable, defined over a set X (or a measurable space).
P(X = x) denotes the probability that X takes the value x, short p(x).
Probabilities are positive: P(X = x) ≥ 0.
Probabilities sum to one: ∫_X p(x) dx = 1 for continuous X, Σ_{x ∈ X} p(x) = 1 for discrete X.
Special case of no uncertainty: p(x) = δ(x - x̂).

55 Probability Theory / Marginal, joint, and conditional probability

Consider N trials in which X takes the value x_i in c_i of them and the pair (X, Y) takes (x_i, y_j) in n_ij of them:

Marginal probability: P(X = x_i) = c_i / N
Joint probability: P(X = x_i, Y = y_j) = n_ij / N
Conditional probability: P(Y = y_j | X = x_i) = n_ij / c_i

Product rule: P(X = x_i, Y = y_j) = n_ij / N = (n_ij / c_i)(c_i / N) = P(Y = y_j | X = x_i) P(X = x_i)

Sum rule: P(X = x_i) = c_i / N = (1/N) Σ_{j=1}^L n_ij = Σ_{j=1}^L P(X = x_i, Y = y_j)

(C.M. Bishop, Pattern Recognition and Machine Learning)

58 Probability Theory / The Rules of Probability: Sum & Product Rule

Sum rule: p(x) = Σ_y p(x, y)
Product rule: p(x, y) = p(y | x) p(x)

59 Probability Theory / The Rules of Probability: Bayes' Theorem

Using the product rule we obtain

p(y | x) = p(x | y) p(y) / p(x), where p(x) = Σ_y p(x | y) p(y).

60 Probability Theory / Bayesian probability calculus

Bayes' rule is the basis for inference and learning.
Assume we have a model with parameters θ, e.g. y = θ_0 + θ_1 x.
Goal: learn the parameters θ given data D.

p(θ | D) = p(D | θ) p(θ) / p(D)
posterior = likelihood × prior / evidence

(A numeric sketch follows below.)
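Before the closed-form Gaussian case treated later, Bayes' rule is easy to evaluate numerically. The sketch below computes posterior ∝ likelihood × prior on a grid for a Bernoulli parameter θ; the data are invented and the flat prior is an arbitrary choice.

```python
import numpy as np

# Data: 10 binary outcomes (e.g. carrier / non-carrier); values are invented
x = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

theta = np.linspace(0.001, 0.999, 999)   # grid over the parameter
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)              # flat prior p(theta)

# Bernoulli likelihood of the whole dataset, computed in log space
log_lik = x.sum() * np.log(theta) + (len(x) - x.sum()) * np.log(1 - theta)
unnorm = np.exp(log_lik) * prior                      # likelihood * prior
posterior = unnorm / (unnorm.sum() * dtheta)          # divide by the evidence

print("posterior mean:", (theta * posterior).sum() * dtheta)  # ~ 8/12 for a flat prior
```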

62 Probability Theory / Information and Entropy

Information is the reduction of uncertainty. The entropy H(X) is the quantitative description of uncertainty:
H(X) = 0 means certainty about X.
H(X) is maximal if all possibilities are equally probable.
Uncertainty and information are additive.

These conditions are fulfilled by the entropy function

H(X) = - Σ_{x ∈ X} P(X = x) log P(X = x)

64 Probability Theory / Definitions related to entropy and information

Entropy is the average surprise, where the surprise of outcome x is -log P(X = x):

H(X) = Σ_{x ∈ X} P(X = x) (-log P(X = x))

Conditional entropy:

H(X | Y) = - Σ_{x ∈ X, y ∈ Y} P(X = x, Y = y) log P(X = x | Y = y)

Mutual information (a computational sketch follows below):

I(X : Y) = H(X) - H(X | Y) = H(Y) - H(Y | X) = H(X) + H(Y) - H(X, Y)

I(X : Y) = 0 exactly under independence of X and Y, p(x, y) = p(x) p(y).
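The following sketch evaluates H(X), H(X | Y), and I(X : Y) directly from a small joint probability table, matching the definitions above; the table values are invented.

```python
import numpy as np

# Joint distribution P(X = x, Y = y) as a table (rows: x, columns: y)
P = np.array([[0.30, 0.10],
              [0.10, 0.50]])

px = P.sum(axis=1)   # marginal P(X = x)
py = P.sum(axis=0)   # marginal P(Y = y)

H_X = -np.sum(px * np.log2(px))                        # entropy H(X)
H_X_given_Y = -np.sum(P * np.log2(P / py[None, :]))    # conditional entropy H(X|Y)
I = H_X - H_X_given_Y                                  # mutual information I(X:Y)

print(f"H(X) = {H_X:.3f} bits, H(X|Y) = {H_X_given_Y:.3f} bits, I(X:Y) = {I:.3f} bits")
```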

68 Probability Theory / Entropy in action: the optimal weighing problem

Given 12 balls, all equal except for one that is lighter or heavier. What is the ideal weighing strategy, and how many weighings are needed to identify the odd ball?
(Hint: there are 24 possible answers, 12 balls times {lighter, heavier}, and each weighing has three outcomes, so at least ⌈log_3 24⌉ = 3 weighings are needed.)

69 Probability Theory / Probability distributions: Gaussian

p(x | μ, σ^2) = N(x | μ, σ^2) = 1/√(2πσ^2) · exp( -(x - μ)^2 / (2σ^2) )

Multivariate Gaussian:

p(x | μ, Σ) = N(x | μ, Σ) = 1/( (2π)^{D/2} |Σ|^{1/2} ) · exp( -(1/2) (x - μ)^T Σ^{-1} (x - μ) )

[Figure: contours of a bivariate Gaussian with covariance matrix Σ.]

71 Probability Theory / Probability distributions continued

Bernoulli: p(x | θ) = θ^x (1 - θ)^{1 - x}, x ∈ {0, 1}

Gamma: p(x | a, b) = ( b^a / Γ(a) ) x^{a - 1} e^{-bx}

[Figure: the Gamma density p(x | a = 1, b = 1).]

73 Probability Theory / Probability distributions: The Gaussian revisited

Gaussian PDF: N(x | μ, σ^2) = 1/√(2πσ^2) · exp( -(x - μ)^2 / (2σ^2) )

Positive: N(x | μ, σ^2) > 0
Normalized: ∫_{-∞}^{+∞} N(x | μ, σ^2) dx = 1 (check!)
Expectation: <x> = ∫_{-∞}^{+∞} N(x | μ, σ^2) x dx = μ
Variance: Var[x] = <x^2> - <x>^2 = (μ^2 + σ^2) - μ^2 = σ^2

(A numerical check follows below.)
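These properties are easy to verify numerically. A quadrature sketch, with arbitrary parameter and grid choices:

```python
import numpy as np

mu, sigma2 = 1.5, 0.8
x = np.linspace(mu - 10 * np.sqrt(sigma2), mu + 10 * np.sqrt(sigma2), 200001)
dx = x[1] - x[0]
pdf = np.exp(-0.5 * (x - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)

print("integral:", (pdf * dx).sum())                 # ~ 1: normalized
print("mean:    ", (x * pdf * dx).sum())             # ~ mu
print("variance:", (x**2 * pdf * dx).sum() - mu**2)  # ~ sigma2
```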

75 Parameter Inference for the Gaussian / Outline

Overview, Motivation, Prerequisites, Probability Theory, Parameter Inference for the Gaussian, Summary

76 Parameter Inference for the Gaussian / Ingredients

Data: D = {x_1, ..., x_N}
Model H_Gauss, the Gaussian PDF with parameters θ = {μ, σ^2}:
N(x | μ, σ^2) = 1/√(2πσ^2) · exp( -(x - μ)^2 / (2σ^2) )
Likelihood: p(D | θ) = Π_{n=1}^N N(x_n | μ, σ^2)

[Figure: the density N(x | μ, σ^2) with the observed datapoints x_n marked on the x axis. (C.M. Bishop, Pattern Recognition and Machine Learning)]

79 Parameter Inference for the Gaussian / Maximum likelihood

Likelihood: p(D | θ) = Π_{n=1}^N N(x_n | μ, σ^2)
Maximum likelihood: θ̂ = argmax_θ p(D | θ)

[Figure: the likelihood of the datapoints under the Gaussian. (C.M. Bishop, Pattern Recognition and Machine Learning)]

80 Parameter Inference for the Gaussian / Maximum likelihood (derivation)

θ̂ = argmax_θ p(D | θ) = argmax_θ Π_{n=1}^N 1/√(2πσ^2) · exp( -(x_n - μ)^2 / (2σ^2) )

Since the logarithm is monotonic, we may equivalently maximize the log-likelihood:

θ̂ = argmax_θ ln p(D | θ) = argmax_θ [ -(N/2) ln(2π) - (N/2) ln σ^2 - (1/(2σ^2)) Σ_{n=1}^N (x_n - μ)^2 ]

Setting the derivatives to zero yields the conditions

μ̂: (d/dμ) ln p(D | μ) = 0,  σ̂^2: (d/dσ^2) ln p(D | σ^2) = 0

85 Parameter Inference for the Gaussian / Maximum likelihood solutions

μ_ML = (1/N) Σ_{n=1}^N x_n
σ^2_ML = (1/N) Σ_{n=1}^N (x_n - μ_ML)^2

Equivalent to the common mean and variance estimators (almost: σ^2_ML divides by N rather than N - 1 and is therefore biased).

Maximum likelihood ignores parameter uncertainty. Think of the ML solution for a single observed datapoint x_1:
μ_ML = x_1, σ^2_ML = (x_1 - μ_ML)^2 = 0

How about Bayesian inference? (A short sketch of the estimators follows below.)
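A short sketch of the ML estimators on synthetic data, including the single-datapoint degeneracy noted above; function and variable names are my own.

```python
import numpy as np

def gaussian_ml(x):
    """Maximum-likelihood estimates for a Gaussian: mean and (biased) variance."""
    mu_ml = x.mean()
    sigma2_ml = ((x - mu_ml) ** 2).mean()   # divides by N, not N - 1
    return mu_ml, sigma2_ml

rng = np.random.default_rng(2)
x = rng.normal(3.0, 2.0, size=1000)
print(gaussian_ml(x))       # close to the true values (3.0, 4.0)
print(gaussian_ml(x[:1]))   # a single datapoint: the variance estimate is 0.0
```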

88 Parameter Inference for the Gaussian / Bayesian inference: Ingredients

Data: D = {x_1, ..., x_N}
Model H_Gauss, the Gaussian PDF N(x | μ, σ^2) with parameters θ = {μ}; for simplicity, assume the variance σ^2 is known.
Likelihood: p(D | μ) = Π_{n=1}^N N(x_n | μ, σ^2)

[Figure: the density with the observed datapoints x_n. (C.M. Bishop, Pattern Recognition and Machine Learning)]

91 Parameter Inference for the Gaussian / Bayesian inference: Bayes' rule

Combine the likelihood with a Gaussian prior over μ: p(μ) = N(μ | m_0, s_0^2).
The posterior is proportional to p(μ | D, σ^2) ∝ p(D | μ, σ^2) p(μ).

92 Parameter Inference for the Gaussian / Bayesian inference: posterior derivation

p(μ | D, σ^2) ∝ p(D | μ) p(μ)
= [ Π_{n=1}^N 1/√(2πσ^2) · exp( -(x_n - μ)^2 / (2σ^2) ) ] · 1/√(2πs_0^2) · exp( -(μ - m_0)^2 / (2s_0^2) )

Dropping constants and collecting the terms in μ, the exponent becomes

-(1/2) [ (1/s_0^2 + N/σ^2) μ^2 - 2μ ( m_0/s_0^2 + (1/σ^2) Σ_{n=1}^N x_n ) ] + const,

so the posterior is again Gaussian, with precision 1/σ̂^2 = 1/s_0^2 + N/σ^2 and mean μ̂ = σ̂^2 ( m_0/s_0^2 + (1/σ^2) Σ_{n=1}^N x_n ). The posterior parameters follow as the new coefficients.

Note: all the constants dropped along the way yield the model evidence, p(μ | D, σ^2) = p(D | μ) p(μ) / Z.

95 Parameter Inference for the Gaussian / Bayesian inference: posterior of the mean

Posterior of the mean: p(μ | D, σ^2) = N(μ | μ̂, σ̂^2), after some rewriting:

μ̂ = σ^2 / (N s_0^2 + σ^2) · m_0 + N s_0^2 / (N s_0^2 + σ^2) · μ_ML,  1/σ̂^2 = 1/s_0^2 + N/σ^2,
with μ_ML = (1/N) Σ_{n=1}^N x_n.

Limiting cases for no and infinite amounts of data (a sketch follows below):
N = 0: μ̂ → m_0 and σ̂^2 → s_0^2.
N → ∞: μ̂ → μ_ML and σ̂^2 → 0.
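The closed-form posterior update as a sketch on synthetic data with known σ^2; the function name and the parameter values are invented for illustration.

```python
import numpy as np

def posterior_mean_params(x, sigma2, m0, s02):
    """Posterior N(mu | mu_hat, sigma_hat2) for the mean of a Gaussian with
    known variance sigma2, given the prior N(mu | m0, s02)."""
    N = len(x)
    sigma_hat2 = 1.0 / (1.0 / s02 + N / sigma2)           # posterior variance
    mu_hat = sigma_hat2 * (m0 / s02 + x.sum() / sigma2)   # posterior mean
    return mu_hat, sigma_hat2

rng = np.random.default_rng(3)
sigma2, m0, s02 = 1.0, 0.0, 10.0
x = rng.normal(2.0, np.sqrt(sigma2), size=50)

# Watch the limiting behaviour: mu_hat moves from m0 towards mu_ML,
# and sigma_hat2 shrinks towards 0 as N grows.
for N in [0, 1, 10, 50]:
    print(N, posterior_mean_params(x[:N], sigma2, m0, s02))
```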


More information

Computational Systems Biology: Biology X

Computational Systems Biology: Biology X Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#7:(Mar-23-2010) Genome Wide Association Studies 1 The law of causality... is a relic of a bygone age, surviving, like the monarchy,

More information

Introduction to Machine Learning

Introduction to Machine Learning Outline Introduction to Machine Learning Bayesian Classification Varun Chandola March 8, 017 1. {circular,large,light,smooth,thick}, malignant. {circular,large,light,irregular,thick}, malignant 3. {oval,large,dark,smooth,thin},

More information

Learning ancestral genetic processes using nonparametric Bayesian models

Learning ancestral genetic processes using nonparametric Bayesian models Learning ancestral genetic processes using nonparametric Bayesian models Kyung-Ah Sohn October 31, 2011 Committee Members: Eric P. Xing, Chair Zoubin Ghahramani Russell Schwartz Kathryn Roeder Matthew

More information

Probabilistic Graphical Models for Image Analysis - Lecture 1

Probabilistic Graphical Models for Image Analysis - Lecture 1 Probabilistic Graphical Models for Image Analysis - Lecture 1 Alexey Gronskiy, Stefan Bauer 21 September 2018 Max Planck ETH Center for Learning Systems Overview 1. Motivation - Why Graphical Models 2.

More information

Parameter Estimation

Parameter Estimation 1 / 44 Parameter Estimation Saravanan Vijayakumaran sarva@ee.iitb.ac.in Department of Electrical Engineering Indian Institute of Technology Bombay October 25, 2012 Motivation System Model used to Derive

More information

CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation

CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation Instructor: Arindam Banerjee November 26, 2007 Genetic Polymorphism Single nucleotide polymorphism (SNP) Genetic Polymorphism

More information

An Introduction to Bayesian Machine Learning

An Introduction to Bayesian Machine Learning 1 An Introduction to Bayesian Machine Learning José Miguel Hernández-Lobato Department of Engineering, Cambridge University April 8, 2013 2 What is Machine Learning? The design of computational systems

More information

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8 The E-M Algorithm in Genetics Biostatistics 666 Lecture 8 Maximum Likelihood Estimation of Allele Frequencies Find parameter estimates which make observed data most likely General approach, as long as

More information

Evolutionary Genetics Midterm 2008

Evolutionary Genetics Midterm 2008 Student # Signature The Rules: (1) Before you start, make sure you ve got all six pages of the exam, and write your name legibly on each page. P1: /10 P2: /10 P3: /12 P4: /18 P5: /23 P6: /12 TOT: /85 (2)

More information

Naïve Bayes classification

Naïve Bayes classification Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss

More information

Genotype Imputation. Biostatistics 666

Genotype Imputation. Biostatistics 666 Genotype Imputation Biostatistics 666 Previously Hidden Markov Models for Relative Pairs Linkage analysis using affected sibling pairs Estimation of pairwise relationships Identity-by-Descent Relatives

More information

Lecture 4: Probabilistic Learning

Lecture 4: Probabilistic Learning DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods

More information

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture16: Population structure and logistic regression I Jason Mezey jgm45@cornell.edu April 11, 2017 (T) 8:40-9:55 Announcements I April

More information

Bayesian Decision Theory

Bayesian Decision Theory Bayesian Decision Theory Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2017 CS 551, Fall 2017 c 2017, Selim Aksoy (Bilkent University) 1 / 46 Bayesian

More information

Bayesian Inference of Interactions and Associations

Bayesian Inference of Interactions and Associations Bayesian Inference of Interactions and Associations Jun Liu Department of Statistics Harvard University http://www.fas.harvard.edu/~junliu Based on collaborations with Yu Zhang, Jing Zhang, Yuan Yuan,

More information

Learning Bayesian network : Given structure and completely observed data

Learning Bayesian network : Given structure and completely observed data Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution

More information

Nonparameteric Regression:

Nonparameteric Regression: Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

Probabilistic Models for Learning Data Representations. Andreas Damianou

Probabilistic Models for Learning Data Representations. Andreas Damianou Probabilistic Models for Learning Data Representations Andreas Damianou Department of Computer Science, University of Sheffield, UK IBM Research, Nairobi, Kenya, 23/06/2015 Sheffield SITraN Outline Part

More information

BTRY 4830/6830: Quantitative Genomics and Genetics

BTRY 4830/6830: Quantitative Genomics and Genetics BTRY 4830/6830: Quantitative Genomics and Genetics Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference Jason Mezey jgm45@cornell.edu Nov. 13, 2014 (Th) 8:40-9:55 Announcements

More information

Lecture WS Evolutionary Genetics Part I 1

Lecture WS Evolutionary Genetics Part I 1 Quantitative genetics Quantitative genetics is the study of the inheritance of quantitative/continuous phenotypic traits, like human height and body size, grain colour in winter wheat or beak depth in

More information

UNIT 8 BIOLOGY: Meiosis and Heredity Page 148

UNIT 8 BIOLOGY: Meiosis and Heredity Page 148 UNIT 8 BIOLOGY: Meiosis and Heredity Page 148 CP: CHAPTER 6, Sections 1-6; CHAPTER 7, Sections 1-4; HN: CHAPTER 11, Section 1-5 Standard B-4: The student will demonstrate an understanding of the molecular

More information

Discrete Mathematics and Probability Theory Fall 2015 Lecture 21

Discrete Mathematics and Probability Theory Fall 2015 Lecture 21 CS 70 Discrete Mathematics and Probability Theory Fall 205 Lecture 2 Inference In this note we revisit the problem of inference: Given some data or observations from the world, what can we infer about

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

Machine Learning CMPT 726 Simon Fraser University. Binomial Parameter Estimation

Machine Learning CMPT 726 Simon Fraser University. Binomial Parameter Estimation Machine Learning CMPT 726 Simon Fraser University Binomial Parameter Estimation Outline Maximum Likelihood Estimation Smoothed Frequencies, Laplace Correction. Bayesian Approach. Conjugate Prior. Uniform

More information

Some Probability and Statistics

Some Probability and Statistics Some Probability and Statistics David M. Blei COS424 Princeton University February 12, 2007 D. Blei ProbStat 01 1 / 42 Who wants to scribe? D. Blei ProbStat 01 2 / 42 Random variable Probability is about

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm 1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable

More information

Statistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart

Statistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart Statistical and Learning Techniques in Computer Vision Lecture 2: Maximum Likelihood and Bayesian Estimation Jens Rittscher and Chuck Stewart 1 Motivation and Problem In Lecture 1 we briefly saw how histograms

More information

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017 Lecture 2: Genetic Association Testing with Quantitative Traits Instructors: Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 29 Introduction to Quantitative Trait Mapping

More information

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 20: Epistasis and Alternative Tests in GWAS Jason Mezey jgm45@cornell.edu April 16, 2016 (Th) 8:40-9:55 None Announcements Summary

More information

Lecture 5. Gaussian Models - Part 1. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. November 29, 2016

Lecture 5. Gaussian Models - Part 1. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. November 29, 2016 Lecture 5 Gaussian Models - Part 1 Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza November 29, 2016 Luigi Freda ( La Sapienza University) Lecture 5 November 29, 2016 1 / 42 Outline 1 Basics

More information

Theoretical and computational aspects of association tests: application in case-control genome-wide association studies.

Theoretical and computational aspects of association tests: application in case-control genome-wide association studies. Theoretical and computational aspects of association tests: application in case-control genome-wide association studies Mathieu Emily November 18, 2014 Caen mathieu.emily@agrocampus-ouest.fr - Agrocampus

More information

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.) Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori

More information

Mathematical statistics

Mathematical statistics October 1 st, 2018 Lecture 11: Sufficient statistic Where are we? Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation

More information

DD Advanced Machine Learning

DD Advanced Machine Learning Modelling Carl Henrik {chek}@csc.kth.se Royal Institute of Technology November 4, 2015 Who do I think you are? Mathematically competent linear algebra multivariate calculus Ok programmers Able to extend

More information

Chapter 13 Meiosis and Sexual Reproduction

Chapter 13 Meiosis and Sexual Reproduction Biology 110 Sec. 11 J. Greg Doheny Chapter 13 Meiosis and Sexual Reproduction Quiz Questions: 1. What word do you use to describe a chromosome or gene allele that we inherit from our Mother? From our Father?

More information

2. Map genetic distance between markers

2. Map genetic distance between markers Chapter 5. Linkage Analysis Linkage is an important tool for the mapping of genetic loci and a method for mapping disease loci. With the availability of numerous DNA markers throughout the human genome,

More information

Probabilistic and Bayesian Machine Learning

Probabilistic and Bayesian Machine Learning Probabilistic and Bayesian Machine Learning Lecture 1: Introduction to Probabilistic Modelling Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London Why a

More information

Lecture 1: Case-Control Association Testing. Summer Institute in Statistical Genetics 2015

Lecture 1: Case-Control Association Testing. Summer Institute in Statistical Genetics 2015 Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2015 1 / 1 Introduction Association mapping is now routinely being used to identify loci that are involved with complex traits.

More information

Learning gene regulatory networks Statistical methods for haplotype inference Part I

Learning gene regulatory networks Statistical methods for haplotype inference Part I Learning gene regulatory networks Statistical methods for haplotype inference Part I Input: Measurement of mrn levels of all genes from microarray or rna sequencing Samples (e.g. 200 patients with lung

More information

Lecture 1 October 9, 2013

Lecture 1 October 9, 2013 Probabilistic Graphical Models Fall 2013 Lecture 1 October 9, 2013 Lecturer: Guillaume Obozinski Scribe: Huu Dien Khue Le, Robin Bénesse The web page of the course: http://www.di.ens.fr/~fbach/courses/fall2013/

More information

NPFL108 Bayesian inference. Introduction. Filip Jurčíček. Institute of Formal and Applied Linguistics Charles University in Prague Czech Republic

NPFL108 Bayesian inference. Introduction. Filip Jurčíček. Institute of Formal and Applied Linguistics Charles University in Prague Czech Republic NPFL108 Bayesian inference Introduction Filip Jurčíček Institute of Formal and Applied Linguistics Charles University in Prague Czech Republic Home page: http://ufal.mff.cuni.cz/~jurcicek Version: 21/02/2014

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Bayesian Classification Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574

More information

Linkage and Linkage Disequilibrium

Linkage and Linkage Disequilibrium Linkage and Linkage Disequilibrium Summer Institute in Statistical Genetics 2014 Module 10 Topic 3 Linkage in a simple genetic cross Linkage In the early 1900 s Bateson and Punnet conducted genetic studies

More information

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model (& discussion on the GPLVM tech. report by Prof. N. Lawrence, 06) Andreas Damianou Department of Neuro- and Computer Science,

More information

Lecture 1 Hardy-Weinberg equilibrium and key forces affecting gene frequency

Lecture 1 Hardy-Weinberg equilibrium and key forces affecting gene frequency Lecture 1 Hardy-Weinberg equilibrium and key forces affecting gene frequency Bruce Walsh lecture notes Introduction to Quantitative Genetics SISG, Seattle 16 18 July 2018 1 Outline Genetics of complex

More information

Case-Control Association Testing. Case-Control Association Testing

Case-Control Association Testing. Case-Control Association Testing Introduction Association mapping is now routinely being used to identify loci that are involved with complex traits. Technological advances have made it feasible to perform case-control association studies

More information

Introduction to Probability and Statistics (Continued)

Introduction to Probability and Statistics (Continued) Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:

More information

Lecture 2: Priors and Conjugacy

Lecture 2: Priors and Conjugacy Lecture 2: Priors and Conjugacy Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de May 6, 2014 Some nice courses Fred A. Hamprecht (Heidelberg U.) https://www.youtube.com/watch?v=j66rrnzzkow Michael I.

More information

Causal Graphical Models in Systems Genetics

Causal Graphical Models in Systems Genetics 1 Causal Graphical Models in Systems Genetics 2013 Network Analysis Short Course - UCLA Human Genetics Elias Chaibub Neto and Brian S Yandell July 17, 2013 Motivation and basic concepts 2 3 Motivation

More information

Introduction to Genetics

Introduction to Genetics Introduction to Genetics We ve all heard of it, but What is genetics? Genetics: the study of gene structure and action and the patterns of inheritance of traits from parent to offspring. Ancient ideas

More information

CS-E3210 Machine Learning: Basic Principles

CS-E3210 Machine Learning: Basic Principles CS-E3210 Machine Learning: Basic Principles Lecture 4: Regression II slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 61 Today s introduction

More information

Objectives. Announcements. Comparison of mitosis and meiosis

Objectives. Announcements. Comparison of mitosis and meiosis Announcements Colloquium sessions for which you can get credit posted on web site: Feb 20, 27 Mar 6, 13, 20 Apr 17, 24 May 15. Review study CD that came with text for lab this week (especially mitosis

More information

Lecture 5: GPs and Streaming regression

Lecture 5: GPs and Streaming regression Lecture 5: GPs and Streaming regression Gaussian Processes Information gain Confidence intervals COMP-652 and ECSE-608, Lecture 5 - September 19, 2017 1 Recall: Non-parametric regression Input space X

More information

Population Genetics: a tutorial

Population Genetics: a tutorial : a tutorial Institute for Science and Technology Austria ThRaSh 2014 provides the basic mathematical foundation of evolutionary theory allows a better understanding of experiments allows the development

More information

1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES:

1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES: .5. ESTIMATION OF HAPLOTYPE FREQUENCIES: Chapter - 8 For SNPs, alleles A j,b j at locus j there are 4 haplotypes: A A, A B, B A and B B frequencies q,q,q 3,q 4. Assume HWE at haplotype level. Only the

More information

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 Association Testing with Quantitative Traits: Common and Rare Variants Timothy Thornton and Katie Kerr Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5 1 / 41 Introduction to Quantitative

More information

Quantitative Biology II Lecture 4: Variational Methods

Quantitative Biology II Lecture 4: Variational Methods 10 th March 2015 Quantitative Biology II Lecture 4: Variational Methods Gurinder Singh Mickey Atwal Center for Quantitative Biology Cold Spring Harbor Laboratory Image credit: Mike West Summary Approximate

More information

B4 Estimation and Inference

B4 Estimation and Inference B4 Estimation and Inference 6 Lectures Hilary Term 27 2 Tutorial Sheets A. Zisserman Overview Lectures 1 & 2: Introduction sensors, and basics of probability density functions for representing sensor error

More information

Multiple regression. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar

Multiple regression. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Multiple regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Multiple regression 1 / 36 Previous two lectures Linear and logistic

More information

Probability Theory for Machine Learning. Chris Cremer September 2015

Probability Theory for Machine Learning. Chris Cremer September 2015 Probability Theory for Machine Learning Chris Cremer September 2015 Outline Motivation Probability Definitions and Rules Probability Distributions MLE for Gaussian Parameter Estimation MLE and Least Squares

More information

A Derivation of the EM Updates for Finding the Maximum Likelihood Parameter Estimates of the Student s t Distribution

A Derivation of the EM Updates for Finding the Maximum Likelihood Parameter Estimates of the Student s t Distribution A Derivation of the EM Updates for Finding the Maximum Likelihood Parameter Estimates of the Student s t Distribution Carl Scheffler First draft: September 008 Contents The Student s t Distribution The

More information

BS 50 Genetics and Genomics Week of Oct 3 Additional Practice Problems for Section. A/a ; B/B ; d/d X A/a ; b/b ; D/d

BS 50 Genetics and Genomics Week of Oct 3 Additional Practice Problems for Section. A/a ; B/B ; d/d X A/a ; b/b ; D/d BS 50 Genetics and Genomics Week of Oct 3 Additional Practice Problems for Section 1. In the following cross, all genes are on separate chromosomes. A is dominant to a, B is dominant to b and D is dominant

More information