Sparse canonical correlation analysis, with applications to genomic data

1 Sparse canonical correlation analysis, with applications to genomic data Daniela M. Witten and Robert Tibshirani June 15, 2009

4 The framework Suppose that we have $n$ observations on $p_1 + p_2$ variables, and the variables are naturally partitioned into two groups of $p_1$ and $p_2$ variables, respectively. Let $X_1 \in \mathbb{R}^{n \times p_1}$ correspond to the first set of variables, and let $X_2 \in \mathbb{R}^{n \times p_2}$ correspond to the second set of variables. Assume that the columns of $X_1$ and $X_2$ have been standardized to have mean zero and standard deviation one.
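The standardization assumed above can be sketched in a few lines of NumPy. The function name `standardize_columns` and the simulated data are illustrative only, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the slides do not say which convention is used.

```python
import numpy as np

def standardize_columns(X):
    """Center each column to mean zero and scale it to standard deviation one."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)  # sample standard deviation (assumed convention)
    if np.any(sd == 0):
        raise ValueError("a constant column cannot be standardized")
    return (X - mean) / sd

# Simulated stand-in for one data block: n = 20 observations, p = 4 variables.
rng = np.random.default_rng(0)
X1 = standardize_columns(rng.normal(loc=5.0, scale=3.0, size=(20, 4)))
```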

5 The data

8 Canonical correlation analysis Canonical correlation analysis (CCA) is a classical method in statistics. We can seek $w_1 \in \mathbb{R}^{p_1}$ and $w_2 \in \mathbb{R}^{p_2}$ that maximize the correlation between $X_1 w_1$ and $X_2 w_2$; that is, $\operatorname{maximize}_{w_1, w_2} \; w_1^T X_1^T X_2 w_2$ subject to $w_1^T X_1^T X_1 w_1 = w_2^T X_2^T X_2 w_2 = 1$. CCA is not appropriate when $p_1, p_2 \approx n$ or $p_1, p_2 \gg n$.
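One way to see why classical CCA breaks down in high dimensions is the following minimal sketch. It computes the first canonical correlation as the largest singular value of $Q_1^T Q_2$, where $Q_1$ and $Q_2$ are orthonormal bases for the column spaces of the centered data blocks. The helper names and the simulated, fully independent data are assumptions made for illustration. With more variables than observations, both column spaces fill the whole centered space, so the fitted correlation is essentially one even though the two blocks are unrelated.

```python
import numpy as np

def column_space_basis(X, rel_tol=1e-10):
    """Orthonormal basis for the column space of X, dropping null directions."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, s > rel_tol * s.max()]

def first_canonical_correlation(X1, X2):
    """Largest canonical correlation between the columns of X1 and of X2."""
    Q1 = column_space_basis(X1 - X1.mean(axis=0))
    Q2 = column_space_basis(X2 - X2.mean(axis=0))
    return np.linalg.svd(Q1.T @ Q2, compute_uv=False)[0]

rng = np.random.default_rng(0)

# Few variables, many observations: independent blocks look only weakly related.
rho_tall = first_canonical_correlation(rng.normal(size=(50, 3)),
                                       rng.normal(size=(50, 3)))

# More variables than observations: independent blocks look perfectly related.
rho_wide = first_canonical_correlation(rng.normal(size=(20, 30)),
                                       rng.normal(size=(20, 30)))

print(rho_tall, rho_wide)
```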

9 The data when $p_1, p_2 \gg n$

10 Sparse Canonical Correlation Analysis $\operatorname{maximize}_{w_1, w_2} \; w_1^T X_1^T X_2 w_2$ subject to $w_1^T X_1^T X_1 w_1 = w_2^T X_2^T X_2 w_2 = 1$. We impose $L_1$ constraints on $w_1$ and $w_2$: that is, $\|w_1\|_1 \le c_1$ and $\|w_2\|_1 \le c_2$. We assume that $X_1^T X_1 = I$ and $X_2^T X_2 = I$.

12 Sparse Canonical Correlation Analysis The sparse CCA criterion is $\operatorname{maximize}_{w_1, w_2} \; w_1^T X_1^T X_2 w_2$ subject to $\|w_1\|_2 \le 1$, $\|w_2\|_2 \le 1$, $\|w_1\|_1 \le c_1$, $\|w_2\|_1 \le c_2$. For small $c_1$ and $c_2$, this yields sparse $w_1$ and $w_2$: many of their elements will be exactly zero.

14 Sparse CCA Algorithm
1. Begin with an initial value for $w_2 \in \mathbb{R}^{p_2}$.
2. Iterate until convergence:
   $w_1 \leftarrow \arg\max_{w_1} w_1^T X_1^T X_2 w_2$ subject to $\|w_1\|_2 \le 1$, $\|w_1\|_1 \le c_1$.
   $w_2 \leftarrow \arg\max_{w_2} w_1^T X_1^T X_2 w_2$ subject to $\|w_2\|_2 \le 1$, $\|w_2\|_1 \le c_2$.
Each update takes the form $w_1 \leftarrow \dfrac{S(X_1^T X_2 w_2, \Delta_1)}{\|S(X_1^T X_2 w_2, \Delta_1)\|_2}$, where $\Delta_1 \ge 0$ is chosen so that $\|w_1\|_1 = c_1$. Here, $S$ is the soft-thresholding operator: $S(x, a) = \operatorname{sgn}(x)(|x| - a)_+$.
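The alternating updates can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the bisection search for $\Delta$, the choice of the leading right singular vector of $X_1^T X_2$ as the initial $w_2$, the rule of using $\Delta = 0$ whenever the unthresholded update already satisfies the $L_1$ bound, and the simulated data are all choices made here. The sketch also assumes $c_1, c_2 \ge 1$ and that the vector being thresholded is nonzero.

```python
import numpy as np

def soft_threshold(x, a):
    """S(x, a) = sgn(x) * (|x| - a)_+, applied elementwise."""
    return np.sign(x) * np.maximum(np.abs(x) - a, 0.0)

def l1_constrained_update(z, c, n_bisect=60):
    """Unit-norm soft-thresholded z with L1 norm at most c (assumes c >= 1, z != 0)."""
    if c < 1:
        raise ValueError("c must be at least 1 for a unit-norm solution")
    w = z / np.linalg.norm(z)
    if np.abs(w).sum() <= c:
        return w  # Delta = 0 already satisfies the L1 bound
    # Fallback that is always feasible: all weight on the largest |z| entry.
    best = np.zeros_like(z, dtype=float)
    k = np.argmax(np.abs(z))
    best[k] = np.sign(z[k])
    lo, hi = 0.0, np.abs(z).max()
    for _ in range(n_bisect):
        delta = 0.5 * (lo + hi)
        s = soft_threshold(z, delta)
        norm = np.linalg.norm(s)
        if norm == 0.0:
            hi = delta  # thresholded everything away: Delta too large
            continue
        w = s / norm
        if np.abs(w).sum() > c:
            lo = delta  # not sparse enough: increase Delta
        else:
            hi, best = delta, w  # feasible: try a smaller Delta
    return best

def sparse_cca(X1, X2, c1, c2, n_iter=100, tol=1e-8):
    """Alternate the two constrained updates until w2 stops changing."""
    K = X1.T @ X2
    w2 = np.linalg.svd(K, full_matrices=False)[2][0]  # assumed initial value
    for _ in range(n_iter):
        w1 = l1_constrained_update(K @ w2, c1)
        w2_new = l1_constrained_update(K.T @ w1, c2)
        converged = np.linalg.norm(w2_new - w2) < tol
        w2 = w2_new
        if converged:
            break
    return w1, w2

# Simulated data: a shared signal u drives 3 columns of X1 and 2 columns of X2.
rng = np.random.default_rng(0)
n, p1, p2 = 40, 60, 50
u = rng.normal(size=n)
X1 = rng.normal(size=(n, p1))
X2 = rng.normal(size=(n, p2))
X1[:, :3] += u[:, None]
X2[:, :2] += u[:, None]
X1 = (X1 - X1.mean(axis=0)) / X1.std(axis=0)
X2 = (X2 - X2.mean(axis=0)) / X2.std(axis=0)

w1, w2 = sparse_cca(X1, X2, c1=2.0, c2=2.0)
print(np.count_nonzero(w1), "of", p1, "weights in w1 are nonzero")
print(np.count_nonzero(w2), "of", p2, "weights in w2 are nonzero")
```

Bisection is used here because the $L_1$ norm of the normalized, soft-thresholded vector decreases as $\Delta$ grows, so the search only has to locate the smallest $\Delta$ that meets the bound.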

16 Genomic Data Recently, it has become increasingly common for researchers to use multiple assays to obtain measurements on a single set of patient samples. For instance, a data set might consist of gene expression and DNA copy number measurements on the same set of samples. The Question: Can we identify sets of genes whose expression is correlated with regions of copy number change?

17 DNA copy number change

18 DNA copy number change

19 Gene expression

20 Gene expression + copy number data

23 Gene expression + copy number data We want to find regions of DNA copy number change that are highly correlated with the expression of a set of genes. That is, we want sparse $w_1 \in \mathbb{R}^{p_1}$, $w_2 \in \mathbb{R}^{p_2}$ such that $\operatorname{Cor}(X_1 w_1, X_2 w_2)$ is high. We'll make a statement like "0.3*(gene 1 expression) + 0.2*(gene 2 expression) - 4*(gene 3 expression) is highly correlated with genomic loss on part of chromosome 3."

24 Idea To find regions of copy number change on chromosome 1 that are correlated with gene expression sets anywhere on the genome, run sparse CCA using copy number data on chromosome 1 and expression measurements for all genes.

25 Data set We used the publicly available breast cancer data set of Chin et al. (2006): 89 samples with breast cancer, each with gene expression measurements and DNA copy number measurements.

26 Copy number change on chromosome 1

27 Genes correlated w/copy # change on chrom. 1
i  Gene  Chromosome  Weight
1 jumping translocation breakpoint
translocated promoter region (to activated MET oncogene)
glyceronephosphate O-acyltransferase
NADH dehydrogenase (ubiquinone) Fe-S protein
nucleoporin 133kD
geranylgeranyl diphosphate synthase
rab3 GTPase-activating protein, non-catalytic subunit (150kD)
peroxisomal biogenesis factor 11B
phosphatidylinositol glycan, class C
tubulin-specific chaperone e
protoporphyrinogen oxidase
tuftelin
papillary renal cell carcinoma (translocation-associated)
splicing factor 3b, subunit 4, 49kD
UDP-Gal:betaGlcNAc beta 1,4-galactosyltransferase
hypothetical protein FLJ
hypothetical protein HSPC
mitochondrial ribosomal protein L
HSPC003 protein
hypothetical protein FLJ
CGI-78 protein
chromosome 1 open reading frame
hypothetical protein My

30 Results, continued
1. Similar results for other chromosomes.
2. We can use a permutation approach to estimate a p-value for the $w_1$ and $w_2$ obtained.
3. We described the identification of cis effects. To identify trans effects, run sparse CCA using copy number data on chromosome 1 and expression measurements for all genes NOT on chromosome 1.
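The slides do not spell out the permutation scheme, so the following is only one plausible sketch. It assumes that the rows (samples) of $X_1$ are permuted to break the pairing with $X_2$, that the weights are refit on each permuted data set, and that the test statistic is the absolute correlation between $X_1 w_1$ and $X_2 w_2$. The names `permutation_pvalue`, `abs_correlation`, and `leading_singular_pair` are hypothetical, and `leading_singular_pair` is a simple stand-in for a sparse CCA fit so that the block is self-contained.

```python
import numpy as np

def leading_singular_pair(X1, X2):
    """Stand-in fitting routine: leading singular vectors of X1^T X2."""
    U, _, Vt = np.linalg.svd(X1.T @ X2, full_matrices=False)
    return U[:, 0], Vt[0]

def abs_correlation(X1, X2, w1, w2):
    """Absolute sample correlation between the two canonical variates."""
    return abs(np.corrcoef(X1 @ w1, X2 @ w2)[0, 1])

def permutation_pvalue(X1, X2, fit, n_perm=200, seed=0):
    """Share of permuted fits whose correlation matches or exceeds the observed one."""
    rng = np.random.default_rng(seed)
    observed = abs_correlation(X1, X2, *fit(X1, X2))
    null = np.empty(n_perm)
    for b in range(n_perm):
        # Shuffle the samples of X1 only, so any real association is destroyed.
        X1_perm = X1[rng.permutation(X1.shape[0])]
        null[b] = abs_correlation(X1_perm, X2, *fit(X1_perm, X2))
    # Add-one correction keeps the estimate strictly positive.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)

# Simulated data with one strongly shared signal between the two blocks.
rng = np.random.default_rng(1)
n = 80
u = rng.normal(size=n)
X1 = rng.normal(size=(n, 5))
X2 = rng.normal(size=(n, 4))
X1[:, 0] = u + 0.3 * rng.normal(size=n)
X2[:, 0] = u + 0.3 * rng.normal(size=n)

p_value = permutation_pvalue(X1, X2, leading_singular_pair)
print(p_value)
```

Refitting inside each permutation matters: the weights are chosen to maximize the association, so the null distribution has to include that selection step.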

31 Conclusions A method for the integrative analysis of genomic data sets. An extension of sparse CCA leads to sparse multiple CCA, for the case of $K > 2$ data sets on a single set of observations. If an outcome measurement (e.g. survival time, cancer subtype) is available for each observation, then sparse supervised CCA can be used. Software is available in the R package PMA and the Excel add-in Correlate (implemented by Sam Gross).

32 References
1. Chin et al. (2006) Genomic and transcriptional aberrations linked to breast cancer pathophysiologies, Cancer Cell.
2. Parkhomenko, Tritchler, and Beyene (2009) Sparse canonical correlation analysis with application to genomic data integration, Statistical Applications in Genetics and Molecular Biology 8.
3. Waaijenborg, Verselewel de Witt Hamer, and Zwinderman (2008) Quantifying the association between gene expression and DNA markers by penalized canonical correlation analysis, Statistical Applications in Genetics and Molecular Biology 7.
4. Witten, Tibshirani, and Hastie (2009) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis, Biostatistics.
5. Witten and Tibshirani (2009) Extensions of sparse canonical correlation analysis, with applications to genomic data, Statistical Applications in Genetics and Molecular Biology 8, Article 28.


More information

Chapter 17. From Gene to Protein. Biology Kevin Dees

Chapter 17. From Gene to Protein. Biology Kevin Dees Chapter 17 From Gene to Protein DNA The information molecule Sequences of bases is a code DNA organized in to chromosomes Chromosomes are organized into genes What do the genes actually say??? Reflecting

More information

Analysis of MALDI-TOF Data: from Data Preprocessing to Model Validation for Survival Outcome

Analysis of MALDI-TOF Data: from Data Preprocessing to Model Validation for Survival Outcome Analysis of MALDI-TOF Data: from Data Preprocessing to Model Validation for Survival Outcome Heidi Chen, Ph.D. Cancer Biostatistics Center Vanderbilt University School of Medicine March 20, 2009 Outline

More information

UNIT 5. Protein Synthesis 11/22/16

UNIT 5. Protein Synthesis 11/22/16 UNIT 5 Protein Synthesis IV. Transcription (8.4) A. RNA carries DNA s instruction 1. Francis Crick defined the central dogma of molecular biology a. Replication copies DNA b. Transcription converts DNA

More information

Molecular Biology of the Cell

Molecular Biology of the Cell Alberts Johnson Lewis Morgan Raff Roberts Walter Molecular Biology of the Cell Sixth Edition Chapter 6 (pp. 333-368) How Cells Read the Genome: From DNA to Protein Copyright Garland Science 2015 Genetic

More information

Biology Notes 2. Mitosis vs Meiosis

Biology Notes 2. Mitosis vs Meiosis Biology Notes 2 Mitosis vs Meiosis Diagram Booklet Cell Cycle (bottom corner draw cell in interphase) Mitosis Meiosis l Meiosis ll Cell Cycle Interphase Cell spends the majority of its life in this phase

More information

16 The Cell Cycle. Chapter Outline The Eukaryotic Cell Cycle Regulators of Cell Cycle Progression The Events of M Phase Meiosis and Fertilization

16 The Cell Cycle. Chapter Outline The Eukaryotic Cell Cycle Regulators of Cell Cycle Progression The Events of M Phase Meiosis and Fertilization The Cell Cycle 16 The Cell Cycle Chapter Outline The Eukaryotic Cell Cycle Regulators of Cell Cycle Progression The Events of M Phase Meiosis and Fertilization Introduction Self-reproduction is perhaps

More information

Proteomics. 2 nd semester, Department of Biotechnology and Bioinformatics Laboratory of Nano-Biotechnology and Artificial Bioengineering

Proteomics. 2 nd semester, Department of Biotechnology and Bioinformatics Laboratory of Nano-Biotechnology and Artificial Bioengineering Proteomics 2 nd semester, 2013 1 Text book Principles of Proteomics by R. M. Twyman, BIOS Scientific Publications Other Reference books 1) Proteomics by C. David O Connor and B. David Hames, Scion Publishing

More information

Dirichlet process Bayesian clustering with the R package PReMiuM

Dirichlet process Bayesian clustering with the R package PReMiuM Dirichlet process Bayesian clustering with the R package PReMiuM Dr Silvia Liverani Brunel University London July 2015 Silvia Liverani (Brunel University London) Profile Regression 1 / 18 Outline Motivation

More information

Statistical testing. Samantha Kleinberg. October 20, 2009

Statistical testing. Samantha Kleinberg. October 20, 2009 October 20, 2009 Intro to significance testing Significance testing and bioinformatics Gene expression: Frequently have microarray data for some group of subjects with/without the disease. Want to find

More information

Principles of Genetics

Principles of Genetics Principles of Genetics Snustad, D ISBN-13: 9780470903599 Table of Contents C H A P T E R 1 The Science of Genetics 1 An Invitation 2 Three Great Milestones in Genetics 2 DNA as the Genetic Material 6 Genetics

More information

Sequence analysis and comparison

Sequence analysis and comparison The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species

More information

Data visualization and clustering: an application to gene expression data

Data visualization and clustering: an application to gene expression data Data visualization and clustering: an application to gene expression data Francesco Napolitano Università degli Studi di Salerno Dipartimento di Matematica e Informatica DAA Erice, April 2007 Thanks to

More information

Fuzzy Clustering of Gene Expression Data

Fuzzy Clustering of Gene Expression Data Fuzzy Clustering of Gene Data Matthias E. Futschik and Nikola K. Kasabov Department of Information Science, University of Otago P.O. Box 56, Dunedin, New Zealand email: mfutschik@infoscience.otago.ac.nz,

More information

Mathematics, Genomics, and Cancer

Mathematics, Genomics, and Cancer School of Informatics IUB April 6, 2009 Outline Introduction Class Comparison Class Discovery Class Prediction Example Biological states and state modulation Software Tools Research directions Math & Biology

More information

CHAPTER4 Translation

CHAPTER4 Translation CHAPTER4 Translation 4.1 Outline of Translation 4.2 Genetic Code 4.3 trna and Anticodon 4.4 Ribosome 4.5 Protein Synthesis 4.6 Posttranslational Events 4.1 Outline of Translation From mrna to protein

More information

Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype

Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype Lecture Series 7 From DNA to Protein: Genotype to Phenotype Reading Assignments Read Chapter 7 From DNA to Protein A. Genes and the Synthesis of Polypeptides Genes are made up of DNA and are expressed

More information

Table 5. Genes unique to G. thermodenitrificans NG80-2 Gene ID Gene name Gene product COG functional category

Table 5. Genes unique to G. thermodenitrificans NG80-2 Gene ID Gene name Gene product COG functional category Table 5. Genes unique to G. thermodenitrificans NG80-2 GT0030 gt30 Methionine--tRNA ligase/methionyl-trna synthetase COG0143, Translation GT0033 gt33 Unknown GT0106 gt106 Ribosomal protein L3 COG0087,

More information

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given

More information

Student Learning Outcomes: Nucleus distinguishes Eukaryotes from Prokaryotes

Student Learning Outcomes: Nucleus distinguishes Eukaryotes from Prokaryotes 9 The Nucleus Student Learning Outcomes: Nucleus distinguishes Eukaryotes from Prokaryotes Explain general structures of Nuclear Envelope, Nuclear Lamina, Nuclear Pore Complex Explain movement of proteins

More information

Fitness constraints on horizontal gene transfer

Fitness constraints on horizontal gene transfer Fitness constraints on horizontal gene transfer Dan I Andersson University of Uppsala, Department of Medical Biochemistry and Microbiology, Uppsala, Sweden GMM 3, 30 Aug--2 Sep, Oslo, Norway Acknowledgements:

More information

Objective 3.01 (DNA, RNA and Protein Synthesis)

Objective 3.01 (DNA, RNA and Protein Synthesis) Objective 3.01 (DNA, RNA and Protein Synthesis) DNA Structure o Discovered by Watson and Crick o Double-stranded o Shape is a double helix (twisted ladder) o Made of chains of nucleotides: o Has four types

More information

LARGE NUMBERS OF EXPLANATORY VARIABLES. H.S. Battey. WHAO-PSI, St Louis, 9 September 2018

LARGE NUMBERS OF EXPLANATORY VARIABLES. H.S. Battey. WHAO-PSI, St Louis, 9 September 2018 LARGE NUMBERS OF EXPLANATORY VARIABLES HS Battey Department of Mathematics, Imperial College London WHAO-PSI, St Louis, 9 September 2018 Regression, broadly defined Response variable Y i, eg, blood pressure,

More information

9/11/18. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes

9/11/18. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes Molecular and Cellular Biology Animal Cell ((eukaryotic cell) -----> compare with prokaryotic cell) ENDOPLASMIC RETICULUM (ER) Rough ER Smooth ER Flagellum Nuclear envelope Nucleolus NUCLEUS Chromatin

More information

Honors Biology Reading Guide Chapter 11

Honors Biology Reading Guide Chapter 11 Honors Biology Reading Guide Chapter 11 v Promoter a specific nucleotide sequence in DNA located near the start of a gene that is the binding site for RNA polymerase and the place where transcription begins

More information

Topographic Mapping and Dimensionality Reduction of Binary Tensor Data of Arbitrary Rank

Topographic Mapping and Dimensionality Reduction of Binary Tensor Data of Arbitrary Rank Topographic Mapping and Dimensionality Reduction of Binary Tensor Data of Arbitrary Rank Peter Tiňo, Jakub Mažgút, Hong Yan, Mikael Bodén Topographic Mapping and Dimensionality Reduction of Binary Tensor

More information