Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Similar documents
Bioinformatics. Dept. of Computational Biology & Bioinformatics

CS612 - Algorithms in Bioinformatics

STRUCTURAL BIOINFORMATICS I. Fall 2015

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Introduction to Bioinformatics

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

BME 5742 Biosystems Modeling and Control

Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST

Computational methods for predicting protein-protein interactions

Introduction to Bioinformatics. Shifra Ben-Dor Irit Orr

O 3 O 4 O 5. q 3. q 4. Transition

Algorithms in Computational Biology (236522) spring 2008 Lecture #1

BIOINFORMATICS LAB AP BIOLOGY

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

Computational Biology: Basics & Interesting Problems

Grundlagen der Bioinformatik, SS 08, D. Huson, May 2,

Course Descriptions Biology

Course plan Academic Year Qualification MSc on Bioinformatics for Health Sciences. Subject name: Computational Systems Biology Code: 30180

Updated: 10/11/2018 Page 1 of 5

Molecular and Cellular Biology

Computational Molecular Biology (

Application of Associative Matrices to Recognize DNA Sequences in Bioinformatics

Using Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics

SCOTCAT Credits: 20 SCQF Level 7 Semester 1 Academic year: 2018/ am, Practical classes one per week pm Mon, Tue, or Wed

BMD645. Integration of Omics

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Bio 101 General Biology 1

MSc Drug Design. Module Structure: (15 credits each) Lectures and Tutorials Assessment: 50% coursework, 50% unseen examination.

MOLECULAR BIOLOGY BIOL 021 SEMESTER 2 (2015) COURSE OUTLINE

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

CHEM 121: Chemical Biology

Visit to BPRC. Data is crucial! Case study: Evolution of AIRE protein 6/7/13

Phylogenomics, Multiple Sequence Alignment, and Metagenomics. Tandy Warnow University of Illinois at Urbana-Champaign

GCD3033:Cell Biology. Transcription

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University

- conserved in Eukaryotes. - proteins in the cluster have identifiable conserved domains. - human gene should be included in the cluster.

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

Hands-On Nine The PAX6 Gene and Protein

MiGA: The Microbial Genome Atlas

Introduction to Biology Spring 2011 Biol 1308 CRN# 78977

MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE

Computational Biology Course Descriptions 12-14

Lecture 3: Markov chains.

BIOINFORMATICS: METHODS AND APPLICATIONS: (Genomics, Proteomics and Drug Discovery)

Bioinformatics and BLAST

Related Courses He who asks is a fool for five minutes, but he who does not ask remains a fool forever.

Syllabus BINF Computational Biology Core Course

Computational Biology From The Perspective Of A Physical Scientist

Homology and Information Gathering and Domain Annotation for Proteins

Honors Biology 9. Dr. Donald Bowlin Ext. 1220

Prac%cal Bioinforma%cs for Life Scien%sts. Week 14, Lecture 28. István Albert Bioinforma%cs Consul%ng Center Penn State

GENE ONTOLOGY (GO) Wilver Martínez Martínez Giovanny Silva Rincón

Homology Modeling. Roberto Lins EPFL - summer semester 2005

MOLECULAR MODELING IN BIOLOGY (BIO 3356) SYLLABUS

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

Advanced Cell Biology. Lecture 1

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

Journal of Proteomics & Bioinformatics - Open Access

Data Mining in Bioinformatics HMM

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Evaluation. Course Homepage.

Introduction to Evolutionary Concepts

A SURVEY OF ORGANIC CHEMISTRY CHEMISTRY 1315 TuTr 9:35-10:55 am, Boggs B6

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week:

Predicting Protein Functions and Domain Interactions from Protein Interactions

BLAST. Varieties of BLAST

Welcome to BIOL 572: Recombinant DNA techniques

Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype

BIOLOGY Grades Summer Units: 10 high school credits UC Requirement Category: d. General Description:

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

or

Computational Structural Bioinformatics

Text: Biology by Miller and Levine. (A copy will be checked-out to each student)

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Week 10: Homology Modelling (II) - HHpred

Instructor: Dr. Darryl Kropf 203 S. Biology ; please put cell biology in subject line

Bioinformatics Exercises

Lecture: Computational Systems Biology Universität des Saarlandes, SS Introduction. Dr. Jürgen Pahle

Single alignment: Substitution Matrix. 16 march 2017

CGS 5991 (2 Credits) Bioinformatics Tools

SUPPLEMENTARY INFORMATION

CELL AND MICROBIOLOGY Nadia Iskandarani

COMPARING DNA SEQUENCES TO UNDERSTAND EVOLUTIONARY RELATIONSHIPS WITH BLAST

Introduction to Bioinformatics

Domain-based computational approaches to understand the molecular basis of diseases

CHEM 181: Chemical Biology

Biology 10 th Grade. Textbook: Biology, Miller and Levine, Pearson (2010) Prerequisite: None

BIOINFORMATICS: An Introduction

EASTERN ARIZONA COLLEGE Biology Concepts

Virginia Western Community College BIO 101 General Biology I

W/F = 10:00 AM - 1:00 PM Other times available by appointment only

Comparative genomics: Overview & Tools + MUMmer algorithm

EBI web resources II: Ensembl and InterPro

Ap Biology Chapter 17 Packet Answers

Comparative Bioinformatics Midterm II Fall 2004

Comparing whole genomes

Campbell Biology AP Edition 11 th Edition, 2018

DATA ACQUISITION FROM BIO-DATABASES AND BLAST. Natapol Pornputtapong 18 January 2018

Sequence Alignment Techniques and Their Uses

Chapter 19: Taxonomy, Systematics, and Phylogeny

Transcription:

Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 1 1 Introduction Grundlagen der Bioinformatik Summer semester 2010 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a) 1.1 Time, place and problem sessions Lectures: Mondays 16ct-18h A301, Sand 1 Thursdays 15ct-17h Small lecture hall, Sand 6/7 Problem sessions organized by: Daniel Huson & Juliane Damaris Klein & Max Schubach & Jonas Müller Groups: Day Time Where Email: Daniel Huson Juliane Damaris Klein huson at informatik.uni-tuebingen.de klein at informatik.uni-tuebingen.de Website: http://www-ab.informatik.uni-tuebingen.de/teaching/summer-semester-2010/ grundlagen-der-bioinformatik 1.2 How to get credit This course is worth 8 ECTS credits. Hence, you will be expected to invest approximately 8 40 = 240 hours in it. Each week, you will be assigned a set of practical and theoretical problems (Übungsblatt). You may work on these, and hand in your results, in groups of up to two people. Your solutions will be graded and you will be asked to present them in the problem sessions (Übungsgruppen). Participation in the problem sessions is mandatory. Your solutions and how well you present them will be graded and this grade will make up 1 3 of your final grade. Additionally, there will be two written exams, namely a mid-term and a final exam. Each will count toward your final grade. 1 3 1.3 Script and assignments At the beginning of each lecture, a script covering the current content of the course will be handed out. The script will also be made available online. Assignment sheets will usually be handed out and posted online on Mondays.

2 Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 You must hand in your solutions on the following Monday before the beginning of the lecture. 1.4 Bioinformatics The following characterization is from: http://www.ncbi.nlm.nih.gov/about/primer/ bioinformatics.html Major advances in the field of molecular biology and advances in genomic technologies have led to an explosive growth in the biological information generated by the scientific community. This huge amount of genomic information has led to an absolute requirement for computerized databases to store, organize, and index the data and for specialized tools to view and analyze the data. 1.4.1 What is a biological database? A biological database is a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system. A simple database might be a single file containing many records, each of which includes the same set of information. For example, a record associated with a nucleotide sequence database typically contains information such as contact name, the input sequence with a description of the type of molecule, the scientific name of the source organism from which it was isolated, and often, literature citations associated with the sequence. For researchers to benefit from the data stored in a database, two additional requirements must be met: easy access to the information a method for extracting only that information needed to answer a specific biological question NCBI (the National Center for Biotechnology Information) provides the largest publicly available database at: www.ncbi.nlm.nih.gov At NCBI, many of the databases are linked through a search and retrieval system called Entrez. Entrez allows a user to access and retrieve specific information from a single database and also to access integrated information from many NCBI databases. For example, the Entrez Protein database is cross-linked to the Entrez Taxonomy database. This allows a researcher to find taxonomic information (taxonomy is a division of the natural sciences that deals with the classification of animals and plants) for the species from which a protein sequence was derived.

Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 3 1.4.2 What is Bioinformatics? Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be worked out. At the beginning of the genomic revolution, a major bioinformatics concern was the creation and maintenance of a database to store biological information, such as nucleotide and amino acid sequences. Development of this type of database involved both design issues and the development of complex interfaces whereby researchers could both access existing data as well as submit new or revised data. Ultimately, however, all of this information must be combined to form a comprehensive picture of normal cellular activities so that researchers may study how these activities are altered in different disease states. Therefore, the field of bioinformatics has evolved such that the focus is on developing and applying methods for the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures.

4 Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 The actual process of analyzing and interpreting data is referred to as computational biology. Important sub-disciplines within bioinformatics and computational biology include: the development and implementation of tools that enable efficient access to, and use and management of, various types of information, and the development of new mathematical models, algorithms and statistics with which to assess relationships among members of large data sets, such as methods to locate a gene within a sequence, predict protein structure and/or function, and cluster protein sequences into families of related sequences. 1.5 Overview Here is an overview of the course: Chapter 1: Introduction Chapter 2: Pairwise alignment The pairwise comparison of two sequences is arguably the most important computation provided by bioinformatics. How to solve this efficiently? How to score an alignment? What variants of the problem exist? Chapter 3: BLAST Searching for similar sequences in large databases requires a fast heuristic for pairwise alignment. We will discuss BLAST, the most well-known bioinformatics tool. Chapter 4: Multiple alignment Two homologous sequences whisper... a full multiple sequence alignment shouts out loud. (Arthur Lesk) Although obtaining an optimal multiple alignment is computationally hard, multiple sequence alignments are often required by biologists. We will discuss some of the most widely-used heuristic approaches.

Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 5 Chapter 5: Phylogenetic analysis Nothing in biology makes sense except in the light of evolution (Theodosius Dobzhansky) Phylogenetic analysis based on molecular sequences is important for understanding many aspects of biology. We will look at the most important distance- and character-based methods for performing phylogenetic analysis. Chapter 6: Machine-learning methods One important class of techniques for analysing biological data are machine-learning approaches. We will discuss some examples of uses of HMMs and SVMs. www.di.unito.it/www/icml96 Chapter 7: Gene finding The goal of computational sequence analysis is to find all genes, transcription factor binding sites and other important sequence features in a genome. We will discuss some of the basic techniques.

6 Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 E0+ E1+ E2+ I0+ I1+ I2+ Einit+ (initial exon) Eterm+ (terminal exon) F+ (5 UTR) Esngl+ (single exon gene) T+ (3 UTR) P+ (promoter) A+ (poly A signal) Forward (+) strand Reverse ( ) strand N (intergenic region) Chapter 8: RNA secondary structure An RNA sequence is single stranded and folds into a characteristic secondary structure by forming base pairs with itself. How to computationally predict the secondary structure of an RNA sequence? Chapter 9: Protein secondary structure What are the secondary structure elements of a protein? How do they form? Given only the primary sequence of a protein, how to predict its secondary structure? Chapter 10: Protein tertiary structure The tertiary structure of a protein determines its function. How to predict the tertiary structure of a protein computationally (a) when the structure of a homologous protein is already known, and (b) when no such information is available. Chapter 11: Microarrays Microarrays allows one to measure the expression levels of many different genes simutaneously. What are the applications of this technology and how to analyze such data?

Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 7 Chapter 12: Sequencing, assembly and resequencing of genomes The most fundamental data associated with an organism is its genome. What is genome assembly? What is resequencing? Chapter 13: Metagenomics How is DNA sequenced? Ultra high-throughput sequencing technologies paired with huge computational resources allow one to study the communities of microbes contained in samples from water, soil or other environments. We will look at some of the main approaches used.