Giri Narasimhan. CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Evaluation. Course Homepage.

Similar documents
BSC 4934: QʼBIC Capstone Workshop" Giri Narasimhan. ECS 254A; Phone: x3748

CGS 5991 (2 Credits) Bioinformatics Tools

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Procedure to Create NCBI KOGS

Welcome to BIOL 572: Recombinant DNA techniques

Comparative genomics: Overview & Tools + MUMmer algorithm

BIOINFORMATICS LAB AP BIOLOGY

Computational Structural Bioinformatics

ECOL/MCB 320 and 320H Genetics

Advanced Algorithms and Models for Computational Biology

Related Courses He who asks is a fool for five minutes, but he who does not ask remains a fool forever.

Introduction to Bioinformatics Integrated Science, 11/9/05

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Genomics and bioinformatics summary. Finding genes -- computer searches

Overview of I519 & Introduction to Bioinformatics. Yuzhen Ye School of Informatics and Computing, IUB

Hands-On Nine The PAX6 Gene and Protein

Genomes and Their Evolution

Pyrobayes: an improved base caller for SNP discovery in pyrosequences

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Sequences, Structures, and Gene Regulatory Networks

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Algorithmics and Bioinformatics

Chapter 18 Active Reading Guide Genomes and Their Evolution

Small RNA in rice genome

11/24/13. Science, then, and now. Computational Structural Bioinformatics. Learning curve. ECS129 Instructor: Patrice Koehl

Sequence Database Search Techniques I: Blast and PatternHunter tools

G4120: Introduction to Computational Biology

CS612 - Algorithms in Bioinformatics

Frequently Asked Questions (FAQs)

DATA ACQUISITION FROM BIO-DATABASES AND BLAST. Natapol Pornputtapong 18 January 2018

microrna Studies Chen-Hanson Ting SVFIG June 23, 2018

Introduction to Bioinformatics. Shifra Ben-Dor Irit Orr

- conserved in Eukaryotes. - proteins in the cluster have identifiable conserved domains. - human gene should be included in the cluster.

String Matching Problem

COMPARING DNA SEQUENCES TO UNDERSTAND EVOLUTIONARY RELATIONSHIPS WITH BLAST

Lecture Materials are available on the 321 web site

You are required to know all terms defined in lecture. EXPLORE THE COURSE WEB SITE 1/6/2010 MENDEL AND MODELS

Heuristic Methods. Heuristic methods for alignment Sequence databases Multiple alignment Gene and protein prediction

Introduction to Bioinformatics

Chapter # EVOLUTION AND ORIGIN OF NEUROFIBROMIN, THE PRODUCT OF THE NEUROFIBROMATOSIS TYPE 1 (NF1) TUMOR-SUPRESSOR GENE

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11

Comparative Bioinformatics Midterm II Fall 2004

The Contribution of Bioinformatics to Evolutionary Thought

Algorithms in Computational Biology (236522) spring 2008 Lecture #1

Bio2. Heuristics, Databases ; Multiple Sequence Alignment ; Gene Finding. Biological Databases (sequences) Armstrong, 2007 Bioinformatics 2

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST

# shared OGs (spa, spb) Size of the smallest genome. dist (spa, spb) = 1. Neighbor joining. OG1 OG2 OG3 OG4 sp sp sp

Bioinformatics Exercises

MCB142/ICB163 Overview. Instructors:

Molecular evolution 2. Please sit in row K or forward

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

Od vzniku života po evoluci člověka. Václav Pačes Ústav molekulární gene2ky Akademie věd České republiky

Practical search strategies

Instructor: Dr. Darryl Kropf 203 S. Biology ; please put cell biology in subject line

Graph Alignment and Biological Networks

GEP Annotation Report

6.096 Algorithms for Computational Biology. Prof. Manolis Kellis

Sequencing alignment Ameer Effat M. Elfarash

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are:

Unit 5: Cell Division and Development Guided Reading Questions (45 pts total)

Campbell Biology 10. A Global Approach. Chapter 20 The Evolution of Genomes

Genome Sequencing & DNA Sequence Analysis

2 Genome evolution: gene fusion versus gene fission

EECS730: Introduction to Bioinformatics

Genomes and Their Evolution

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Bioinformatics Chapter 1. Introduction

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Computational Biology: Basics & Interesting Problems

Bioinformatics Study On Mirror DNA In Mycobacterium Tuberculosis Using Fast Parallel Complement Blast Strategy.

Evolutionary Analysis by Whole-Genome Comparisons

PG Diploma in Genome Informatics onwards CCII Page 1 of 6

A Field Guide. GenBank Records. A Typical GenBank Record GenBank Record: Feature Table. The Flatfile Format. part 2. Header. Feature Table.

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

BIOLOGY. Genomes and Their Evolution CAMPBELL. Reece Urry Cain Wasserman Minorsky Jackson

Sequencing alignment Ameer Effat M. Elfarash

Warm Up. What are some examples of living things? Describe the characteristics of living things

CISC 889 Bioinformatics (Spring 2004) Lecture 1

The Gene The gene; Genes Genes Allele;

G4120: Introduction to Computational Biology

Course Descriptions Biology

Homolog. Orthologue. Comparative Genomics. Paralog. What is Comparative Genomics. What is Comparative Genomics

Going Beyond SNPs with Next Genera5on Sequencing Technology Personalized Medicine: Understanding Your Own Genome Fall 2014

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

Annotation of Drosophila grimashawi Contig12

Ensembl Genomes (non-chordates): Quick tour. This quick tour provides a brief introduction to Ensembl Genomes [2], the non-chordate genome browser.

Syllabus BINF Computational Biology Core Course

Introduction to protein alignments

Proteins: Structure & Function. Ulf Leser

Supplemental Materials

High-throughput sequence alignment. November 9, 2017

CSCI 4181 / CSCI 6802 Algorithms in Bioinformatics

How much non-coding DNA do eukaryotes require?

Computational Biology From The Perspective Of A Physical Scientist

"Omics" - Experimental Approachs 11/18/05

Supplementary text for the section Interactions conserved across species: can one select the conserved interactions?

Bioinformatics Practical for Biochemists

Correlate. A method for the integrative analysis of two genomic data sets

Transcription:

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 389; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs06.html 1/12/06 CAP5510/CGS5166 1 Evaluation Semester Project (50 %) Homework Assignments (20 %) Exams (25 %) Class Participation (5 %) Course Homepage www.cis.fiu.edu/~giri/teach/bioinfs05.html Lecture notes, required reading material, homework, announcements, etc. 1/12/06 CAP5510/CGS5166 2 Course Schedules CAP 5510 (3 credits) and CGS 5166 (2 credits) will meet every Tue from 2 PM to 4:45 PM. The first half of the lecture will be meant for both classes. The second half will be more focused on the CS majors Different exams and evaluation. Please attend all classes regardless of registered course. 1/12/06 CAP5510/CGS5166 3

Genome Sizes Organism Size Date Est. # genes HIV type 1 9.2 Kb 1997 9 H. influenzae 1.8 Mb 1995 1,740 M. genitalium 0.58 Mb 1998 525 E. coli 4.7 Mb 1997 4,000 S. cerevisiae 12.1 Mb 1996 6,034 C. elegans 97 Mb 1998 19,099 A. thaliana 100 Mb 2000 25,000 D. melanogaster 180 Mb 2000 13,061 M. musculus 3 Gb 2002 ~30,000 H. sapiens 3 Gb 2001 32,000+ 1/12/06 CAP5510/CGS5166 4 Genomic Databases Entrez Portal at National Center for Biotechnology Information (NCBI) gives access to: Nucleotide (GenBank, EMBL, DDBJ) Protein (PIR, SwissPROT, PRF, and Protein Data Bank or PDB) Genome Structure 3D Domains Conserved Domains Gene; UniGene; HomoloGene; SNP GEO Profiles & Datasets Cancer Chromosomes PubMed Central; Journals; Books OMIM Database Neighbors and Interlinking 1/12/06 CAP5510/CGS5166 5 Caenorhabditis Elegans Entire genome 1998; 8 year effort 1 st animal; 2 nd eukaryote (after yeast) Nematode (phylum) Easy to experiment with; Easily observable 97 million bases; 20,000 genes; 12,000 with known function; 6 Chromosomes; GC content 36% 959 cells; 302-cell nervous system 36% of proteins common with human 15 Kb mitochondrial genome Results in ACeDB 25% of genes in operons Important for HGP: technology, software, scale/efficiency 182 genes with alternative splice variants 1/12/06 CAP5510/CGS5166 6

Homo sapiens Sequenced 2001; 15 year effort 3 billion bases, 500 gaps Variable density of Genes, SNPs, CpG islands ~ 1.1 % of the genome codes for proteins; 99%? ~ 40-48 % of the genome consists of repeat sequences ~ 10 % of the genome consists of repeats called ALUs ~ 5 % of the genome consists of long repeats (>1 Kb) ~ 50 transposon-derived genes 223 genes common with bacteria that are missing from worm, fly or yeast. 1/12/06 CAP5510/CGS5166 7 The Suffix Tree Data Structure Borrelia burgdorferi 1 million bases Shotgun Sequencing: 4612 fragments 2 million bases long totally Using suffix trees - 15 min for Fragment Assembly Using Dynamic Programming - 10 days 1/12/06 CAP5510/CGS5166 8 Sequence Alignment Why? >gi 12643549 sp O18381 PAX6_DROME Paired box protein Pax-6 (Eyeless protein) MRNLPCLGTAGGSGLGGIAGKPSPTMEAVEASTASHRHSTSSYFATTYYHLTDDECHSGVNQLGGVFVGG RPLPDSTRQKIVELAHSGARPCDISRILQVSNGCVSKILGRYYETGSIRPRAIGGSKPRVATAEVVSKIS QYKRECPSIFAWEIRDRLLQENVCTNDNIPSVSSINRVLRNLAAQKEQQSTGSGSSSTSAGNSISAKVSV SIGGNVSNVASGSRGTLSSSTDLMQTATPLNSSESGGASNSGEGSEQEAIYEKLRLLNTQHAAGPGPLEP ARAAPLVGQSPNHLGTRSSHPQLVHGNHQALQQHQQQSWPPRHYSGSWYPTSLSEIPISSAPNIASVTAY ASGPSLAHSLSPPNDIESLASIGHQRNCPVATEDIHLKKELDGHQSDETGSGEGENSNGGASNIGNTEDD QARLILKRKLQRNRTSFTNDQIDSLEKEFERTHYPDVFARERLAGKIGLPEARIQVWFSNRRAKWRREEK LRNQRRTPNSTGASATSSSTSATASLTDSPNSLSACSSLLSGSAGGPSVSTINGLSSPSTLSTNVNAPTL GAGIDSSESPTPIPHIRPSCTSDNDNGRQSEDCRRVCSPCPLGVGGHQNTHHIQSNGHAQGHALVPAISP RLNFNSGSFGAMYSNMHHTALSMSDSYGAVTPIPSFNHSAVGPLAPPSPIPQQGDLTPSSLYPCHMTLRP PPMAPAHHHIVPGDGGRPAGVGLGSGQSANLGASCSGSGYEVLSAYALPPPPMASSSAADSSFSAASSAS ANVTPHHTIAQESCPSPCSSASHFGVAHSSGFSSDPISPAVSSYAHMSYNYASSANTMTPSSASGTSAHV APGKQQFFASCFYSPWV >gi 6174889 PAX6_HUMAN Paired box protein (Oculorhombin) (Aniridia, type II protein) MQNSHSGVNQLGGVFVNGRPLPDSTRQKIVELAHSGARPCDISRILQVSNGCVSKILGRYYETGSIRPRA IGGSKPRVATPEVVSKIAQYKRECPSIFAWEIRDRLLSEGVCTNDNIPSVSSINRVLRNLASEKQQMGAD GMYDKLRMLNGQTGSWGTRPGWYPGTSVPGQPTQDGCQQQEGGGENTNSISSNGEDSDEAQMRLQLKRKL QRNRTSFTQEQIEALEKEFERTHYPDVFARERLAAKIDLPEARIQVWFSNRRAKWRREEKLRNQRRQASN TPSHIPISSSFSTSVYQPIPQPTTPVSSFTSGSMLGRTDTALTNTYSALPPMPSFTMANNLPMQPPVPSQ TSSYSCMLPTSPSVNGRSYDTYTPPHMQTHMNSQPMGTSGTTSTGLISPGVSVPVQVPGSEPDMSQYWPR LQ 1/12/06 CAP5510/CGS5166 9

Drosophila Eyeless vs. Human Aniridia Query: 57 HSGVNQLGGVFVGGRPLPDSTRQKIVELAHSGARPCDISRILQVSNGCVSKILGRYYETG 116 HSGVNQLGGVFV GRPLPDSTRQKIVELAHSGARPCDISRILQVSNGCVSKILGRYYETG Sbjct: 5 HSGVNQLGGVFVNGRPLPDSTRQKIVELAHSGARPCDISRILQVSNGCVSKILGRYYETG 64 Query: 117 SIRPRAIGGSKPRVATAEVVSKISQYKRECPSIFAWEIRDRLLQENVCTNDNIPSVSSIN 176 SIRPRAIGGSKPRVAT EVVSKI+QYKRECPSIFAWEIRDRLL E VCTNDNIPSVSSIN Sbjct: 65 SIRPRAIGGSKPRVATPEVVSKIAQYKRECPSIFAWEIRDRLLSEGVCTNDNIPSVSSIN 124 Query: 177 RVLRNLAAQKEQ 188 RVLRNLA++K+Q Sbjct: 125 RVLRNLASEKQQ 136 Query: 417 TEDDQARLILKRKLQRNRTSFTNDQIDSLEKEFERTHYPDVFARERLAGKIGLPEARIQV 476 +++ Q RL LKRKLQRNRTSFT +QI++LEKEFERTHYPDVFARERLA KI LPEARIQV Sbjct: 197 SDEAQMRLQLKRKLQRNRTSFTQEQIEALEKEFERTHYPDVFARERLAAKIDLPEARIQV 256 Query: 477 WFSNRRAKWRREEKLRNQRR 496 WFSNRRAKWRREEKLRNQRR Sbjct: 257 WFSNRRAKWRREEKLRNQRR 276 E-Value = 2e-31 1/12/06 CAP5510/CGS5166 10 Motif Detection in Protein Sequences MTDKMQSLALAPVGNLDSYIRAANAWPMLSADEERALAEKLHYHGDLEAA KTLILSHLRFVVHIARNYAGYGLPQADLIQEGNIGLMKAVRRFNPEVGVR LVSFAVHWIKAEIHEYVLRNWRIVKVATTKAQRKLFFNLRKTKQRLGWFN QDEVEMVARELGVTSKDVREMESRMAAQDMTFDLSSDDDSDSQPMAPVLY LQDKSSNFADGIEDDNWEEQAANRLTDAMQGLDERSQDIIRARWLDEDNK STLQELADRYGVSAERVRQLEKNAMKKLRAAIEA MTDKMQSLALAPVGNLDSYIRAANAWPMLSADEERALAEKLHYHGDLEAA KTLILSHLRFVVHIARNYAGYGLPQADLIQEGNIGLMKAVRRFNPEVGVR LVSFAVHWIKAEIHEYVLRNWRIVKVATTKAQRKLFFNLRKTKQRLGWFN QDEVEMVARELGVTSKDVREMESRMAAQDMTFDLSSDDDSDSQPMAPVLY LQDKSSNFADGIEDDNWEEQAANRLTDAMQGLDERSQDIIRARWLDEDNK STLQELADRYGVSAERVRQLEKNAMKKLRAAIEA 1/12/06 CAP5510/CGS5166 11 Patterns in Protein Structures 1/12/06 CAP5510/CGS5166 12

-1.0 0 1.0 Microarray Analysis Different patterns of gene expression of oral epithelial IHGK cells upon co-culture with A. actinomycetemcomitans or P. gingivalis. 1/12/06 CAP5510/CGS5166 13 Tools: GenePlot Comparison of proteins from two strains of Helicobacter Pylori, 26695 and J99. Each point represents a pair of proteins from the two organisms showing a symmetrical best BLAST score; the coordinates of each point correspond to the position of the protein genes in the 2 genomes. Note the juxtaposition and inversion of two segments of the genome between the two strains. 1/12/06 CAP5510/CGS5166 14 SIDS 18000 Amish people in Pennsylvania Mostly intermarried due to religious doctrine rare recessive diseases occurred with high frequencies. SIDS: 3000 deaths/year (US); 21 deaths (Amish community) Many research centers failed to identify cause Collaboration between Affymetrix, TGEN & Clinic for special children solved the problem in 2 months Identified genes expressed in key organs (brainstem,testes) Studied 10000 SNPs using microarray technology Conclusion: Disease caused by 2 abnormal copies of TSPYL gene 1/12/06 CAP5510/CGS5166 15