Bioinformatics. Dept. of Computational Biology & Bioinformatics

Similar documents
CS612 - Algorithms in Bioinformatics

Introduction to Bioinformatics. Shifra Ben-Dor Irit Orr

BIOINFORMATICS LAB AP BIOLOGY

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

Journal of Proteomics & Bioinformatics - Open Access

Homology and Information Gathering and Domain Annotation for Proteins

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Homology. and. Information Gathering and Domain Annotation for Proteins

BIOINFORMATICS: An Introduction

Large-Scale Genomic Surveys

Introduction to Bioinformatics

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Introduction to Bioinformatics Online Course: IBT

Computational Biology: Basics & Interesting Problems

Sequence Alignment Techniques and Their Uses

EBI web resources II: Ensembl and InterPro. Yanbin Yin Spring 2013

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

CSCE555 Bioinformatics. Protein Function Annotation

Single alignment: Substitution Matrix. 16 march 2017

Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Week 10: Homology Modelling (II) - HHpred

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

Hands-On Nine The PAX6 Gene and Protein

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

Bioinformatics Exercises

PG Diploma in Genome Informatics onwards CCII Page 1 of 6

Map of AP-Aligned Bio-Rad Kits with Learning Objectives

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Introduction to Bioinformatics

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

October 08-11, Co-Organizer Dr. S D Samantaray Professor & Head,

Berg Tymoczko Stryer Biochemistry Sixth Edition Chapter 1:

Miller & Levine Biology 2014

EBI web resources II: Ensembl and InterPro

DATA ACQUISITION FROM BIO-DATABASES AND BLAST. Natapol Pornputtapong 18 January 2018

Comparative genomics: Overview & Tools + MUMmer algorithm

Prediction of protein function from sequence analysis

Tools and Algorithms in Bioinformatics

Biology Tutorial. Aarti Balasubramani Anusha Bharadwaj Massa Shoura Stefan Giovan

Computational Biology Course Descriptions 12-14

Research Proposal. Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family.

October 6 University Faculty of pharmacy Computer Aided Drug Design Unit

Sequencing alignment Ameer Effat M. Elfarash

Curriculum Links. AQA GCE Biology. AS level

USING BLAST TO IDENTIFY PROTEINS THAT ARE EVOLUTIONARILY RELATED ACROSS SPECIES

Sequence and Structure Alignment Z. Luthey-Schulten, UIUC Pittsburgh, 2006 VMD 1.8.5

CGS 5991 (2 Credits) Bioinformatics Tools

Miller & Levine Biology

Multiple sequence alignment

MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE

Updated: 10/11/2018 Page 1 of 5

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

Sequence analysis and Genomics

I. Molecules and Cells: Cells are the structural and functional units of life; cellular processes are based on physical and chemical changes.

Enduring understanding 1.A: Change in the genetic makeup of a population over time is evolution.

Homology Modeling. Roberto Lins EPFL - summer semester 2005

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

Bioinformatics in the post-sequence era

Identify stages of plant life cycle Botany Oral/written pres, exams

An introduction to SYSTEMS BIOLOGY

Bioinformatics for Biologists

Motivating the need for optimal sequence alignments...

Bioinformatics. Proteins II. - Pattern, Profile, & Structure Database Searching. Robert Latek, Ph.D. Bioinformatics, Biocomputing

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)

TEST SUMMARY AND FRAMEWORK TEST SUMMARY

BMD645. Integration of Omics

Science Online Instructional Materials Correlation to the 2010 Biology Standards of Learning and Curriculum Framework

Algorithms in Computational Biology (236522) spring 2008 Lecture #1

In-Depth Assessment of Local Sequence Alignment

APPLICATION OF GRAPH BASED DATA MINING TO BIOLOGICAL NETWORKS CHANG HUN YOU. Presented to the Faculty of the Graduate School of

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Evaluation. Course Homepage.

Networks & pathways. Hedi Peterson MTAT Bioinformatics

Computational Analysis of the Fungal and Metazoan Groups of Heat Shock Proteins

R.S. Kittrell Biology Wk 10. Date Skill Plan

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics

AP BIOLOGY SUMMER ASSIGNMENT

Computational methods for predicting protein-protein interactions

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Science Textbook and Instructional Materials Correlation to the 2010 Biology Standards of Learning and Curriculum Framework. Publisher Information

Algorithms in Bioinformatics

Big Idea 1: Does the process of evolution drive the diversity and unit of life?

Introduction to Evolutionary Concepts

A A A A B B1

Bioinformatics Chapter 1. Introduction

Readings Lecture Topics Class Activities Labs Projects Chapter 1: Biology 6 th ed. Campbell and Reese Student Selected Magazine Article

Grade Level: AP Biology may be taken in grades 11 or 12.

METABOLIC PATHWAY PREDICTION/ALIGNMENT

2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity.

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Cladistics and Bioinformatics Questions 2013

Introduction Biology before Systems Biology: Reductionism Reduce the study from the whole organism to inner most details like protein or the DNA.

Effects of Gap Open and Gap Extension Penalties

I. Molecules & Cells. A. Unit One: The Nature of Science. B. Unit Two: The Chemistry of Life. C. Unit Three: The Biology of the Cell.

SUPPLEMENTARY INFORMATION

Prentice Hall. Biology: Concepts and Connections, 6th edition 2009, (Campbell, et. al) High School

Transcription:

Bioinformatics Dept. of Computational Biology & Bioinformatics 3

Bioinformatics - play with sequences & structures Dept. of Computational Biology & Bioinformatics 4

ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS Dept. of Computational Biology & Bioinformatics 5

WHAT IS BIOINFORMATICS? Computational Biology/Bioinformatics is the application of computer sciences and allied technologies to answer the questions of Biologists, about the mysteries of life. It has evolved to serve as the bridge between: Observations (data) in diverse biologically-related disciplines and The derivations of understanding (information) APPLICATIONS OF BIOINFORMATICS Computer Aided Drug Design Microarray Bioinformatics Proteomics Genomics Biological Databases Phylogenetics Systems Biology Dept. of Computational Biology & Bioinformatics 6

WHAT IS A BIO-SEQUENCE? DNA, RNA or protein information represented as a series of bases (or amino acids) that appear in bio-molecules. The method by which a biosequence is obtained is called Bio-sequencing. AGTCTTGATTCTTCTAGTTCTGC GTCCTGATAAGTCAGTGTCTCC TGAGTCTAGCTTCTGTCCATGCT RNDCQEGHILKMFPUSTWYZEGNDTWRDC GATCATGTCCATGTTCTAGTCAT PUQEGHILDCLKSTMFEWCUWESTHCFPW GATAGTTGATTCTAGTGTCCTG DTCEDUSTTWEGHILDNDTEGHTWUWWE ATTAGCCTTGAATCTTCTAGTTC PUSTPPUQWRDCCLKSWCUWMFCQEDT TGTCCATTATCCATCTGATGGA RWESPWYZWEGHILDDFPTCTWRDSTTFP GTAGTTATGCGATCTCATG GT EEDCCDTWCUWGHISTDTKKSUNENDCFE CCGATACTATCCTGATATAGCTT WCRGHPPHHLDTWQESRNDCQEGHILKM AATCTTCTAGTTCTGTCCATTAT PUSTWYZEGNDTWRDCFPUQEGHILDCLK CCATCTGTC TMFEWCUWESTHCFPWRDTCEDUSTTWE WHAT IS SEQUENCE ALIGNMENT? HILDNDTEGHTWUWWESPUSTPPUQWRD CLKSWCUWMFCQEDTWRWESPWYZWEG ILDDFPTCTWRDSTTFPUEEDCCDTWCUW Arranging DNA/protein sequences side by side to study the extent of their similarity Dept. of Computational Biology & Bioinformatics DNA/ RNA SEQUENCE PROTEIN SEQUENCE 7

sequencing CRISIS AFTER DATA EXPLOSION!! Dept. of Computational Biology & Bioinformatics 8

DATA EXPLOSION TREND SOLUTION?? BIOLOGICAL DATABASES Dept. of Computational Biology & Bioinformatics 9

BIOLOGICAL DATABASES Dept. of Computational 10 Biology & Bioinformatics

WHAT IS A DATABASE? A structured set of data held in a computer, esp. one that is accessible in various ways. Dept. of Computational Biology & Bioinformatics 11

POPULAR DATABASE WEBSITES Dept. of Computational Biology & Bioinformatics 12

BIOLOGICAL DATABASES Dept. of Computational Biology & Bioinformatics 13

CLASSIFICATION OF BIOLOGICAL DATABASES Based on data source Based on data type Dept. of Computational Biology & Bioinformatics 14

BASED ON DATA SOURCE Dept. of Computational Biology & Bioinformatics 15

BIOLOGICAL DATABASES PRIMARY DATABASES SECONDARY DATABASES First-hand information of experimental data from scientists and researchers Data not edited or validated Raw and original submission of data Made available to public for annotation Derived from information gathered in primary database Data is manually curated and annotated Data of highest quality as it is double checked Dept. of Computational Biology & Bioinformatics 16

PRIMARY DATABASES Database Website 1. NCBI (National Centre for Biotechnology www.ncbi.nlm.nih.gov Information) 2. DDBJ (DNA Data Bank of Japan) www.ddbj.nig.ac.jp 3. EMBL(European Molecular Biology Laboratory) www.ebi.ac.uk/embl 4. PIR (Protein Information Resource) www.pir.georgetown.edu 5. PDB (Protein Data Bank) www.rcsb.org/pdb 6. NDB( Nucleotide Data Bank) www.ndbserver.rutgers.edu 7. SwissProt (Protein- only sequence database) www.expasy.ch

SECONDARY DATABASES Database Website 1. PROSITE (Protein domains, families, functional www.expasy.org/prosite sites) 2. Pfam (Protein families) www.sanger.ac.uk/pfam 3. SCOP (Structural Classification Of Proteins) www.scop.mrc-lmb.cam.ac.uk/scop 4. CATH (Class, Architecture, Topology, Homologous www.cathdb.info Super Family of Proteins) 5. OMIM (Online Mendelian Inheritance in Man) www.ncbi.nlm.nih/omim 6. KEGG (Kyoto Encyclopedia of Genes and Genome) www.genome.jp/kegg/pathway.html 7. MetaCyc (Enzyme Metabolic Pathways) www.metacyc.org Dept. of Computational Biology & Bioinformatics 18

Based on type of data Dept. of Computational Biology & Bioinformatics 19

BASED ON THE TYPE OF DATA BIOLOGICAL DATABASES NUCLEOTIDE SEQUENCE DATABASE PROTEIN SEQUENCE DATABASE GENOME DATABASE GENE EXPRESSION DATABASE ENZYME DATABASE STRUCTURE DATABASE PROTEIN INTERACTION DATABASE PATHWAY DATABASE LITERATURE DATABASE Dept. of Computational Biology & Bioinformatics 20

NUCLEOTIDE SEQUENCE DATABASES Dept. of Computational 21 Biology & Bioinformatics

NCBI- National Centre for Biotechnology Information Dept. of Computational Biology & Bioinformatics 22

EMBL European Molecular Biology Lab Dept. of Computational Biology & Bioinformatics 23

DDBJ- DNA DATA BANK OF JAPAN Dept. of Computational Biology & Bioinformatics 24

PROTEIN SEQUENCE DATABASE Dept. of Computational 25 Biology & Bioinformatics

Dept. of Computational Biology & Bioinformatics 26

PDB- PROTEIN DATA BANK Dept. of Computational Biology & Bioinformatics 27

PATHWAY DATABASES Dept. of Computational 28 Biology & Bioinformatics

KEGG- KYOTO ENCYCLOPEDIA OF GENES AND GENOMES Dept. of Computational Biology & Bioinformatics 29

GENOME DATABASE Dept. of Computational Biology & Bioinformatics 30

WORMBASE : has the entire genome of C. elegans and other nematodes Dept. of Computational Biology & Bioinformatics 31

GENE EXPRESSION DATABASE Dept. of Computational Biology & Bioinformatics 32

Dept. of Computational Biology & Bioinformatics 33

Microarrays provide a means to measure gene expression Dept. of Computational Biology & Bioinformatics 34

Yeast genome on a chip

ENZYME DATABASE Dept. of Computational Biology & Bioinformatics 36

ENZYME DATABASE OF ExPaSy server Dept. of Computational Biology & Bioinformatics 37

STRUCTURE DATABASE Dept. of Computational Biology & Bioinformatics 38

Dept. of Computational Biology & Bioinformatics 39

LITERATURE DATABASE Dept. of Computational Biology & Bioinformatics 40

Dept. of Computational Biology & Bioinformatics 41

Use of Databases in Biology- Sequence Analysis Dept. of Computational Biology & Bioinformatics 42

Where do we get these sequences from? Through genome sequencing projects Dept. of Computational Biology & Bioinformatics 43

Submit sequences to biological databases Biological databases helps in efficient manipulation of large data sets Provides improved search sensitivity, search efficiency Joining of multiple data sets Databases allows the users to analyse the biological data sets DNA RNA Proteins Dept. of Computational Biology & Bioinformatics 44

Analysis of Nucleic acids & Protein Sequences Sequence Analysis Process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods To understand its features, function, structure, or evolution To assign function to genes & proteins by the studying the similarities between the compared sequences Methodologies include: Sequence alignment Searches against biological databases Dept. of Computational Biology & Bioinformatics 45

Sequence analysis in molecular biology includes a very wide range of relevant topics: The comparison of sequences in order to find similarity, infer if they are related (homologous) Identification of active sites, gene structures, reading frames etc. Identification of sequence differences and variations SNP, Point mutations, identify genetic markers Revealing the evolution and genetic diversity of sequences and organisms Identification of molecular structure from sequence alone Dept. of Computational Biology & Bioinformatics 46

Sequence Alignment Relationships between these sequences are usually discovered by aligning them together assigning a score to the alignments Two main types of sequence alignment: Pair-wise sequence alignment - compares only two sequences at a time Multiple sequence alignment- compares many sequences Two important algorithms for aligning pairs of sequences : Needleman-Wunsch algorithm Smith-Waterman algorithm Dept. of Computational Biology & Bioinformatics 47

Popular tools for sequence alignment include: Pair-wise alignment - BLAST Multiple alignment - ClustalW, MUSCLE, MAFFT, T-Coffee etc. Alignment methods: Local alignments - Needleman Wunsch algorithm Global alignments - Smith-Waterman algorithm Dept. of Computational Biology & Bioinformatics 48

Pair-wise alignment Used to find the best-matching piecewise (local or global) alignments of two query sequences Can only be used between two sequences at a time Dept. of Computational Biology & Bioinformatics 49

Multiple Sequence Alignment Is an extension of pairwise alignment to incorporate more than two sequences at a time Align all of the sequences in a given query set Often used in identifying conserved sequence regions across a group of sequences hypothesized to be evolutionarily related Alignments helps to establish evolutionary relationships by constructing phylogenetic trees Dept. of Computational Biology & Bioinformatics 50

Sequence Analaysis Tools Pair-wise alignment - BLAST Basic Local Alignment Search Tool (BLAST) Developed by Research staff at NCBI/GenBank as a new way to perform seq. similarity search Available as free service over internet Very fast,accurate and sensitive database searching Server-NCBI Dept. of Computational Biology & Bioinformatics 51

Types of BLAST Programs: Dept. of Computational Biology & Bioinformatics 52

NCBI -BLAST Dept. of Computational Biology & Bioinformatics 53

Dept. of Computational Biology & Bioinformatics 54

Dept. of Computational Biology & Bioinformatics 55

FASTA DNA & Protein sequence alignment software package Fast A Fast ALL Works on any Alphabets - FAST P Protein - FAST N Nucleotide Dept. of Computational Biology & Bioinformatics 56

Dept. of Computational Biology & Bioinformatics 57

Sequence Analaysis Tools Multiple alignment - ClustalW Study the identities, Similarities & Differences Study evolutionary relationship Identification of conserved sequence regions Useful in predicting Function & structure of proteins Identifying new members of protein families Dept. of Computational Biology & Bioinformatics 58

Dept. of Computational Biology & Bioinformatics 59

Dept. of Computational Biology & Bioinformatics 60

Dept. of Computational Biology & Bioinformatics 61

Includes all methods, theoretical & computational, used to model or mimic the behaviour of molecules Helps to study molecular systems ranging from small chemical systems to large biological molecules The methods are used in the fields of : Computational chemistry Drug design Computational biology Materials science Dept. of Computational Biology & Bioinformatics 62

Structure Analysis of Proteins Researchers predict the 3D structure using protein or molecular modeling Experimentally determined protein structures (templates) are used To predict the structure of another protein that has a similar amino acid sequence (target) Dept. of Computational Biology & Bioinformatics 63

Advantages in Protein Modeling Examining a protein in 3D allows for : greater understanding of protein functions providing a visual understanding that cannot always be conveyed through still photographs or descriptions Dept. of Computational Biology & Bioinformatics 64

Example of 3D-Protein Model Dept. of Computational Biology & Bioinformatics 65

Impact of Bioinformatics in Biology/Biotechnology Dept. of Computational Biology & Bioinformatics 66

Biological research is the most fundamental research to understand complete mechanism of living system The advancements in technologies helps in providing regular updates and contribution to make human life better and better. Reduced the time consuming experimental procedure Software development Bioinformatians & Computational Biologists Submitting biological sequences to databases Dept. of Computational Biology & Bioinformatics 67

Role of Bioinformatics in Biotechnology Dept. of Computational Biology & Bioinformatics 68

Genomics The study of genes and their expression Generates vast amount of data from gene sequences, their interrelations & functions Understand structural genomics, functional genomics and nutritional genomics Proteomics Study of protein structure, function &interactions produced by a particular cell, tissue, or organism Deals with techniques of genetics, biochemistry and molecular biology Study protein-protein interactions, protein profiles, protein activity pattern and organelles compositions Dept. of Computational Biology & Bioinformatics 69

Transcriptomics Study of sets of all messenger RNA molecules in the cell Also be called as Expression Profiling- DNA Micro array RNA sequencing NGS Used to analyse the continuously changing cellular transcriptome Cheminformatics Deals with focuses on storing, indexing, searching, retrieving, and applying information about chemical compounds involves organization of chemical data in a logical form - to facilitate the retrieval of chemical properties, structures & their relationships Helps to identify and structurally modify a natural product Dept. of Computational Biology & Bioinformatics 70

Drug Discovery Increasingly important role in drug discovery, drug assessment & drug development Computer-aided drug design (CADD)- generate more & more drugs in a short period of time with low risk wide range of drug-related databases & softwares - for various purposes related to drug designing & development process Evolutionary Studies Phylogenetics - evolutionary relationship among individuals or group of organisms phylogenetic trees are constructed based on the sequence alignment using various methods Dept. of Computational Biology & Bioinformatics 71

Crop Improvement Innovations in omics based research improve the plant based research Understand molecular system of the plant which are used to improve the plant productivity comparative genomics helps in understanding the genes & their functions, biological properties of each species Biodefense Biosecurity of organisms - subjected to biological threats or infectious diseases (Biowar) Bioinformatics- limited impact on forensic & intelligence operations Need of more algorithms in bioinformatics for biodefense Bioenergy/Biofuels contributing to the growing global demand for alternative sources of renewable energy progress in algal genomics + omics approach - Metabolic pathway & genes genetically engineered micro algal strains Dept. of Computational Biology & Bioinformatics 72