STRUCTURAL BIOINFORMATICS I
Fall 2015

Info
Course Number - Classification: Biology 5411
Class Schedule: Monday, 5:30-7:50 PM, SERC Room 456 (4th floor)
Instructors:
- Vincenzo Carnevale, SERC, Room 704C; vincenzo.carnevale@temple.edu
- Vijayan Ramaswamy, SERC, Room 718E; vijayan@temple.edu
- Vincent Voelz, Beury Hall, Room 240; voelz@temple.edu
Office hours: By appointment

Description
This course will cover the basic concepts of structural bioinformatics. It provides a broad qualitative overview of macromolecular structure and protein folding, including sequence alignment, secondary structure calculation, tertiary structure prediction, and biological databases. It also introduces programming languages, data mining, and the algorithms used in bioinformatics, so that students become competent in handling large and complex biological data.

Objectives
The objective of this course is to introduce students to the fundamental concepts and methods of structural bioinformatics. Students will be trained in the broad range of skills required for a sound understanding of the field. Upon successful completion, students are expected to have a working knowledge of the Linux environment, parallel computing, and scripting, together with the ability to analyze complex biological data and to extract knowledge from biological databases.

Organization
Some of the lectures will be held in the computer lab to keep a close connection between theoretical concepts and computational case studies. Most of the computational work will be performed in class; however, specific tasks will be completed by the students and evaluated as homework assignments.
Materials
In addition to the following reference books, further study materials will be made available in class:
- Arthur M. Lesk, Introduction to Protein Science: Architecture, Function, and Genomics, 2nd ed., Oxford University Press, 2010.
- Gregory A. Petsko and Dagmar Ringe, Protein Structure and Function, New Science Press, 2004.
- Carl Branden and John Tooze, Introduction to Protein Structure, 2nd ed., Garland Science, 1998.
- Arthur M. Lesk, Introduction to Bioinformatics, 3rd ed., Oxford University Press, 2008.
- R. Durbin, S. Eddy, and A. Krogh, Biological Sequence Analysis, Cambridge University Press, 1998.
- Jenny Gu and Philip E. Bourne, Structural Bioinformatics, 2nd ed., Wiley, 2009.
- David Mount, Bioinformatics: Sequence and Genome Analysis, Cold Spring Harbor, New York, 2013.
- Neil C. Jones and Pavel A. Pevzner, An Introduction to Bioinformatics Algorithms, MIT Press, 2004.
- Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

Grading
Grades are based on homework assignments, a midterm exam, and a final exam, weighted as follows:
- Homework: 50%
- Midterm: 20%
- Final exam: 30%
Syllabus

I. Introduction to protein structure (2 lectures)
Instructor: Vijayan Ramaswamy
Dates: 08/24/2015 and 08/31/2015
- Structure and chemistry of amino acids
- Basic structural features of polypeptides
  - Primary structure
  - Secondary structure
  - Tertiary and quaternary structure
- Protein main-chain conformation and Ramachandran plots
- Side-chain conformation and rotamer libraries
- Protein folding patterns, structural classification, and structural superposition

II. Protein folding and design (2 lectures)
Instructor: Vincenzo Carnevale
Dates: 09/14/2015 and 09/21/2015
- Basic concepts: stability of the native state, kinetics of protein folding
- Experimental characterization of events in protein folding
- Thermodynamics of protein folding, hydrophobic collapse, and the molten globule
- Impact of free energy landscapes on folding kinetics: the folding funnel
- Effect of denaturants on the folding/unfolding equilibrium
- Relationship between native structure and folding
- The hierarchical model
- Protein engineering and design

III. Bioalgorithms (4 lectures)
Instructor: Vincent Voelz
Dates: 09/28/2015, 10/05/2015 and 10/12/2015
- Introduction to UNIX and the command line
  - Shell commands, file system
  - Process management
  - Text editing
- Introduction to scripting with Python
  - The interpreter
  - Data types: strings, lists, tuples, and dictionaries
  - Looping and control flow
  - Writing Python scripts
  - Functions, classes, modules
  - Scripting example: mc.py
- Making plots with matplotlib
- Scientific computing with Python
- Searching and sorting
- Graph theory
  - Depth-first vs. breadth-first searches
- Computational complexity
- Data clustering and classification
  - Distance metrics
  - The RMSD
  - Clustering
    - k-centers, k-means clustering
    - Hierarchical clustering
  - Support vector machines, regression

IV. Databases (2 lectures)
Instructor: Vijayan Ramaswamy
Dates: 10/19/2015 and 10/26/2015
- Repositories and information retrieval
- Nucleotide sequence databases
- Protein sequence databases
- Sequence motif databases
- Protein structure databases
- Small molecule databases
- Protein structure repositories and visualization tools

V. Bioinformatics of protein sequence and structure (4 lectures)
Instructor: Vincenzo Carnevale
Dates: 11/02/2015, 11/09/2015, 11/16/2015 and 12/02/2015
- Sequence alignments
  - Measures of sequence similarity
  - Computing the alignment of two sequences (Smith-Waterman)
  - The dynamic programming algorithm
  - Statistical significance of alignments
  - Multiple sequence alignments
  - Structural inferences from multiple sequence alignments
- Markov chains and Hidden Markov Models
  - Formal definition of HMMs
  - Most probable state path: the Viterbi algorithm
  - The forward algorithm
  - Posterior decoding
  - Parameter estimation for HMMs
  - HMM model structure: choice of topology
- Probabilistic modeling of sequence ensembles
  - The direct problem
    - Statistical models and observables
    - Entropy and Kullback-Leibler divergence
  - The inverse problem
    - Statement of the inverse problem
    - Bayesian formulation
    - Maximum likelihood criteria