Learning Your Identity and Disease from Research Papers: Information Leaks in Genome-Wide Association Study
1 Learning Your Identity and Disease from Research Papers: Information Leaks in Genome-Wide Association Study Rui Wang, Yong Li, XiaoFeng Wang, Haixu Tang and Xiaoyong Zhou Indiana University at Bloomington
2 Genomic Revolutions Low-cost genotyping Revolutionary applications
3 Genome-Wide Association Study Case Group Control Group Single Nucleotide Polymorphism (SNP)
4 Identification Risk Consequence of identifications Participant protection De-identification Aggregation Is this sufficient?
5 Attack on Aggregated Data Single-allele frequencies Major: 0; Minor: 1 Homer's attack NIH's reactions
6 The Rest of The Iceberg Other genome data Test statistics Linkage Disequilibrium (LD) Haplotype sequences Other sources Publications
7 Our Scary Findings ID from GWAS publications Test statistics LD statistics Allele Frequencies Statistical Identification Pair-wise allele frequencies SNP Sequences Work on real genome data Conclusion: Urgent need to study the problem thoroughly
8 Why Do This? Facilitate Dissemination of Genome Data SAFELY A Lesson From the Internet: Build Protection Into the Core!
9 Terms Alleles Single (0 1) Pair-wise (00, 01, 10, 11) Genotype Combinations of two sets of alleles Haplotype SNP Sequence (phased genotype) Locus Surrounding region of a SNP site
10 GWAS: Background GWAS study steps and their info leaks: Quality Control: p-values leak joint allele frequencies of case & control; Association Detection: p-values leak case frequencies & control frequencies; LD in Regions of Association: $r^2$ reveals LD of SNP sequences; Replication: discloses cohorts with the same frequency distributions
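11 Homer's Attack Reference Group (Pop): $Pop_j = 0.3$, $Pop_{j+1} = 0.6$, $Pop_{j+2} = 0.3$; Individual (Y): $Y_j = 1$, $Y_{j+1} = 0$, $Y_{j+2} = 1$; Case Group (M): $M_j = 0.8$, $M_{j+1} = 0.2$, $M_{j+2} = 0.6$; compare $|Y_i - Pop_i|$ with $|Y_i - M_i|$; $H_0$: Not in M; $D(Y_i) = |Y_i - Pop_i| - |Y_i - M_i|$; test statistic: $\sum D$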
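The distance on this slide can be evaluated directly on the three example SNPs. The following is a minimal sketch, not the authors' code: the function name `homer_statistic` is hypothetical, and the sketch only sums $D(Y_i)$ without modeling the distribution of the sum under $H_0$.

```python
def homer_statistic(y, pop, m):
    """Sum of D(Y_i) = |Y_i - Pop_i| - |Y_i - M_i| over all SNPs."""
    return sum(abs(yi - pi) - abs(yi - mi) for yi, pi, mi in zip(y, pop, m))

# Values taken from the slide (SNPs j, j+1, j+2).
pop = [0.3, 0.6, 0.3]   # reference-group allele frequencies
y = [1, 0, 1]           # the individual's alleles
m = [0.8, 0.2, 0.6]     # case-group allele frequencies

total = homer_statistic(y, pop, m)
print(total)  # positive: Y is closer to M than to Pop on these SNPs
```

A sum well above zero over many SNPs is evidence against $H_0$ (the individual is not in M); three SNPs are far too few for a real test and serve only to show the arithmetic.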
12 What we can do Reverse engineer test statistics To find allele frequencies LD-based statistical identification Recover SNP sequences
13 Allele Frequency (Single) [Figure: SNP1, SNP2 and SNP3 linked pair-wise by $r^2(1,2)$, $r^2(2,3)$ and $r^2(1,3)$; labels $C_{0*}$, $p_2$, $p_3$]
14 Allele Frequencies (Pair-wise) $L \le r^2 = \frac{(C_{00} N - C_{0*} C_{*0})^2}{C_{0*} C_{1*} C_{*0} C_{*1}} < U$ (1); $C_{00} + C_{01} = C_{0*}$ (2); $C_{10} + C_{11} = C_{1*}$ (3); $C_{00} + C_{10} = C_{*0}$ (4); $C_{01} + C_{11} = C_{*1}$ (5) Catch: $C_{00}$ not unique; integer constraint; inaccurate r-squares; signs
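The constraints on this slide can be explored by enumeration. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes the single-allele counts $C_{0*}$, $C_{*0}$ and the total $N$ are already known, that $r^2$ was published rounded so it lies in a known interval $[L, U)$, and it uses a hypothetical helper name `candidate_c00`.

```python
def candidate_c00(n, c0_row, c0_col, low, high):
    """Integer C00 values whose implied r^2 lies in [low, high).

    n      -- total number of haplotypes (N)
    c0_row -- C_{0*}, count of allele 0 at the first SNP
    c0_col -- C_{*0}, count of allele 0 at the second SNP
    """
    c1_row, c1_col = n - c0_row, n - c0_col
    denom = c0_row * c1_row * c0_col * c1_col
    if denom == 0:
        return []  # a monomorphic SNP carries no LD information
    # C00 must keep all four cells of the 2x2 table non-negative.
    lowest = max(0, c0_row + c0_col - n)
    highest = min(c0_row, c0_col)
    matches = []
    for c00 in range(lowest, highest + 1):
        r2 = (c00 * n - c0_row * c0_col) ** 2 / denom
        if low <= r2 < high:
            matches.append(c00)
    return matches

# Hypothetical numbers: N = 100, C_{0*} = 60, C_{*0} = 50, r^2 published as 0.17.
print(candidate_c00(100, 60, 50, 0.165, 0.175))
```

With these numbers two values of $C_{00}$ survive, one for each sign of $r$, which is the "not unique" and "signs" catch listed on the slide.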
15 Homer-Style Attack Based On LD? Why? Single AF: $n$ statistics; LD: $n(n-1)/2$ statistics But how? Validity of the test statistic $D(Y_i) = |Y_i - Pop_i| - |Y_i - M_i|$; $r^2 = \frac{(C_{00} C_{11} - C_{10} C_{01})^2}{C_{0*} C_{1*} C_{*0} C_{*1}}$
16 Our Statistical Attack We have to use signed $r$ Distribution of $T_r$? Markov model Reference? $T_r = \sum_{1 \le i < j \le N} T_{ij}$, $T_{ij} = |(Y_{00} + Y_{11}) - (r^R + 1)/2| - |(Y_{00} + Y_{11}) - (r^C + 1)/2| \propto (r^C - r^R)(Y_{00} + Y_{11} - Y_{01} - Y_{10})$
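The last form of the statistic on this slide is easy to compute once signed $r$ values are available for the case and reference groups. The sketch below is an assumption-laden illustration: it computes signed $r$ from raw 0/1 haplotypes (an attacker would instead recover it from published statistics), treats monomorphic SNPs as $r = 0$, and uses hypothetical names `signed_r` and `t_r`. For one individual, $Y_{00} + Y_{11} - Y_{01} - Y_{10}$ is $+1$ when the two alleles match and $-1$ otherwise.

```python
from itertools import combinations
from math import sqrt

def signed_r(haps, i, j):
    """Signed LD coefficient r between SNPs i and j from 0/1 haplotypes."""
    c = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for h in haps:
        c[(h[i], h[j])] += 1
    c0_row, c1_row = c[0, 0] + c[0, 1], c[1, 0] + c[1, 1]
    c0_col, c1_col = c[0, 0] + c[1, 0], c[0, 1] + c[1, 1]
    denom = c0_row * c1_row * c0_col * c1_col
    if denom == 0:
        return 0.0  # monomorphic SNP: treated as r = 0 in this sketch
    return (c[0, 0] * c[1, 1] - c[0, 1] * c[1, 0]) / sqrt(denom)

def t_r(y, case_haps, ref_haps):
    """Sum over SNP pairs of (r^C - r^R) * (Y00 + Y11 - Y01 - Y10)."""
    total = 0.0
    for i, j in combinations(range(len(y)), 2):
        sign = 1 if y[i] == y[j] else -1  # Y00 + Y11 - Y01 - Y10
        total += (signed_r(case_haps, i, j) - signed_r(ref_haps, i, j)) * sign
    return total

# Toy data: the two SNPs are perfectly correlated in the case group
# and uncorrelated in the reference group.
case_haps = [(0, 0), (1, 1), (0, 0), (1, 1)]
ref_haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(t_r((0, 0), case_haps, ref_haps), t_r((0, 1), case_haps, ref_haps))
```

A haplotype that follows the case group's correlation pattern scores positive and one that breaks it scores negative; deciding significance still requires the distribution of $T_r$, which the slide leaves as an open question.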
17 Recover SNP Sequences Contingency table problem Studied for decades Very difficult Divide-and-Conquer 1. Construct each haplotype block 2. Connect different blocks
18 Simple Defense Low-precision statistics Correlation among SNPs Thresholds How to determine them? Noises Consistency check Maximum-likelihood approximation
19 Evaluations Data: the HapMap project Locus: FGFR2, 174 SNPs, used in a real GWAS Population: African background, 200 individuals, half cases and half controls
20 Allele Frequencies and Signs
21 Statistical Power 20 times more powerful than Homer's test ($T_p$)
22 Recover Haplotypes Linear equation solving: rref Integer Programming: bintprog 100 individuals, 10 blocks, 174 SNPs System: 2.80GHz Core 2 Duo, 3GB memory Fully restored within 12 hours
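The slide names MATLAB's `rref` and `bintprog` for this step. As a toy stand-in, the sketch below swaps those solvers for brute-force enumeration on a single 3-SNP block: it searches for every multiset of haplotypes whose pair-wise allele counts match the given ones. The function names and the toy data are hypothetical, and the approach only scales to very small blocks.

```python
from itertools import combinations, product

def pairwise_counts(haps, k):
    """Pair-wise allele counts C[(i, j)][(a, b)] for k SNPs."""
    counts = {}
    for i, j in combinations(range(k), 2):
        table = {ab: 0 for ab in product((0, 1), repeat=2)}
        for h in haps:
            table[(h[i], h[j])] += 1
        counts[(i, j)] = table
    return counts

def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def recover_block(counts, k, n):
    """Every {haplotype: multiplicity} over n haplotypes matching `counts`."""
    patterns = list(product((0, 1), repeat=k))
    solutions = []
    for x in compositions(n, len(patterns)):
        haps = [p for p, c in zip(patterns, x) for _ in range(c)]
        if pairwise_counts(haps, k) == counts:
            solutions.append({p: c for p, c in zip(patterns, x) if c})
    return solutions

# Toy block: 4 haplotypes over 3 SNPs; only their pair-wise counts are "published".
true_haps = [(0, 0, 0), (0, 1, 1), (0, 1, 1), (1, 1, 0)]
published = pairwise_counts(true_haps, 3)
solutions = recover_block(published, 3, 4)
print(solutions)
```

For this toy block the pair-wise counts pin down the haplotypes uniquely; in general several solutions can remain, which is where the integer-programming formulation and the block-by-block strategy come in.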
23 Discussion Genotypes vs. Haplotypes Defense Differential privacy
24 Conclusion New attacks and new understanding Many open research problems
25 Contacts Dr. XiaoFeng Wang Web: du/xw7 System Security Lab: sysseclab.informatics.indiana.edu Dr. Haixu Tang Web: du/hatang
26 References Good: from the same population Bad: from different populations good reference average reference
27 More In-depth Studies Larger populations: Low-precision statistics (200 cases, 200 references)