Computational Biology: Basics & Interesting Problems

Summary
- Sources of information
- Biological concepts: structure & terminology
- Sequencing
- Gene finding
- Protein structure prediction
Sources of information
Too many sources! A selection:
- Course on computational biology: http://www.math.tau.ac.il/~rshamir/algmb.html
- Human Genome Project: http://genome.ucsc.edu/
- Artificial Intelligence and Molecular Biology: http://www.aaai.org/library/books/hunter/hunter.html
- Another course on molecular biology: http://cmgm.stanford.edu/biochem218/
Follow the links on these sites.

The Tree of Life
The Cell

Example: Tissues in the Stomach
DNA Components
Four nucleotide types:
- Adenine (A)
- Guanine (G)
- Cytosine (C)
- Thymine (T)
Hydrogen bonds pair the bases: A-T, C-G

The Double Helix
Source: Alberts et al.
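The A-T and C-G pairing rule means that one strand determines the other. The following is a minimal Python sketch of that idea, not part of the original slides; the function names `complement` and `reverse_complement` are illustrative choices, and the input is assumed to contain only the letters A, C, G, T.

```python
# Watson-Crick pairing as stated on the slide: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own 5'->3' direction.

    The two strands of the helix run in opposite directions, so the
    complement is reversed.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATGC"))          # TACG
    print(reverse_complement("ATGC"))  # GCAT
```

Applying `reverse_complement` twice returns the original strand, which is a quick sanity check on any implementation.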
DNA Duplication
Source: Mathews & van Holde

DNA Organization
Source: Alberts et al.
Genome Sizes
- E. coli (bacterium): 4.6 x 10^6 bases
- Yeast (simple fungus): 15 x 10^6 bases
- Smallest human chromosome: 50 x 10^6 bases
- Entire human genome: 3 x 10^9 bases

Genes
The DNA strings include:
- Coding regions (genes)
  - E. coli has ~4,000 genes
  - Yeast has ~6,000 genes
  - C. elegans has ~13,000 genes
  - Humans have ~32,000 genes
- Control regions
  - These are typically adjacent to the genes
  - They determine when a gene should be expressed
- Junk DNA (unknown function)
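Dividing the genome sizes by the gene counts quoted above gives a rough sense of how densely genes are packed in each organism. The sketch below does only that arithmetic, using the figures from the slides. The ratio is a crude average (it says nothing about actual gene lengths), and the names `GENOMES` and `bases_per_gene` are illustrative.

```python
# (genome size in bases, approximate gene count), as quoted on the slides.
GENOMES = {
    "E. coli": (4.6e6, 4_000),
    "Yeast": (15e6, 6_000),
    "Human": (3e9, 32_000),
}


def bases_per_gene(genome_size: float, gene_count: int) -> float:
    """Average number of genome bases per gene (a crude density measure)."""
    return genome_size / gene_count


if __name__ == "__main__":
    for name, (size, genes) in GENOMES.items():
        print(f"{name}: ~{bases_per_gene(size, genes):,.0f} bases per gene")
```

With these numbers the human genome has far more bases per gene than E. coli, which is consistent with the slide's remark that much of the DNA lies outside coding regions.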
Transcription
Coding sequences can be transcribed to RNA.
Source: Mathews & van Holde
RNA nucleotides:
- Similar to DNA, with a slightly different backbone
- Uracil (U) instead of Thymine (T)
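At the sequence level, the U-for-T substitution is the whole difference between a coding strand and its RNA transcript. A minimal Python sketch, assuming the input is the coding (sense) strand written 5'->3' and ignoring everything else about the real process (promoters, termination, RNA editing); `transcribe` is an illustrative name.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA with the same sequence as the coding strand, U replacing T.

    Assumption: the caller passes the coding (sense) strand, not the template.
    """
    return coding_strand.upper().replace("T", "U")


if __name__ == "__main__":
    print(transcribe("ATGGCTTAA"))  # AUGGCUUAA
```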
RNA Editing
RNA roles
- Messenger RNA (mRNA): encodes protein sequences
- Transfer RNA (tRNA): adaptor between mRNA molecules and amino acids (protein building blocks)
- Ribosomal RNA (rRNA): part of the ribosome, a machine for translating mRNA to proteins
- ...

Transfer RNA
- Anticodon: matches a codon (triplet of mRNA nucleotides)
- Attachment site: matches a specific amino acid
Translation
- Translation is mediated by the ribosome
- The ribosome is a complex of protein & rRNA molecules
- The ribosome attaches to the mRNA at a translation initiation site
- The ribosome then moves along the mRNA sequence and in the process constructs a polypeptide
- When the ribosome encounters a stop signal, it releases the mRNA
- The constructed polypeptide is released and folds into a protein

Translation
Source: Alberts et al.
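The steps above (attach at an initiation site, read the mRNA codon by codon, stop at a stop signal) can be sketched in a few lines of Python. This is a simplification under stated assumptions: the initiation site is taken to be the first AUG in the sequence, which is not how real ribosomes choose a start, and the table is the standard genetic code with one-letter amino acid symbols. The names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

# Standard genetic code in compact form: codons enumerated with bases in the
# order U, C, A, G (first base varies slowest); '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA string into a one-letter amino acid sequence.

    Assumption: translation starts at the first AUG and proceeds in that
    reading frame until a stop codon or the end of the sequence.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiation site found
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop signal: release the polypeptide
            break
        peptide.append(amino_acid)
    return "".join(peptide)


if __name__ == "__main__":
    print(translate("GGAUGGCUUAAGG"))  # MA
```

In the example, the leading `GG` is skipped, `AUG` gives M, `GCU` gives A, and `UAA` ends translation.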
The Amino Acids

Genetic Code
Protein Structure
- Proteins are polypeptides of 70-3000 amino acids
- A protein's structure is (mostly) determined by the sequence of amino acids that make it up
Evolution
- Related organisms have similar DNA
  - Similarity in sequences of proteins
  - Similarity in organization of genes along the chromosomes
- Evolution plays a major role in biology
  - Many mechanisms are shared across a wide range of organisms
  - During the course of evolution, existing components are adapted for new functions
Evolution
Evolution of new organisms is driven by:
- Diversity: different individuals carry different variants of the same basic blueprint
- Mutations: the DNA sequence can be changed by single-base changes, deletion/insertion of DNA segments, etc.
- Selection bias

Four Aspects
- Biological: What is the task?
- Algorithmic: How to perform the task at hand efficiently?
- Learning: How to adapt the parameters of the task from examples?
- Statistics: How to differentiate true phenomena from artifacts?
Example: Sequence Comparison
- Biological: Evolution preserves sequences, thus similar genes might have similar function
- Algorithmic: Consider all ways to align one sequence against another
- Learning: How do we define similar sequences? Use examples to define similarity
- Statistics: When we compare against ~10^6 sequences, what is a random match and what is a true one?

Topics I
Dealing with DNA/protein sequences:
- Genome projects and how sequences are found
- Finding similar sequences
- Models of sequences: Hidden Markov Models
- Transcription regulation
- Protein families
- Gene finding
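For the algorithmic aspect of the sequence comparison example, "consider all ways to align" does not require enumerating alignments one by one. Dynamic programming covers them implicitly. The sketch below uses the standard Needleman-Wunsch global alignment recurrence, which the slides do not name. The scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions, and choosing them well is exactly the "learning" question raised above.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Best global alignment score of a and b (Needleman-Wunsch).

    Only the previous row of the dynamic programming table is kept, so
    memory use is proportional to len(b).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty string
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # a[i-1] against a gap
                curr[j - 1] + gap,   # b[j-1] against a gap
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))  # 7
    print(global_alignment_score("ACGT", "AGT"))         # 2
```

The second example scores 2 because three bases match and one gap is needed to absorb the extra `C`.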
Topics II
Gene expression:
- Genome-wide expression patterns
- Data organization: clustering
- Reconstructing transcription regulation
- Recognizing and classifying cancers

Topics III
Models of genetic change:
- Long term: evolutionary changes among species
  - Reconstructing evolutionary trees from current-day sequences
- Short term: genetic variations in a population
  - Finding genes by linkage and association
Topics IV
Protein world:
- How proteins fold: secondary & tertiary structure
- How to predict protein folds from sequence data alone
- How to analyze protein changes from raw experimental measurements (MassSpec)
- 2D gels

A Computational Biology Project
- From DNA chip data: identify expressed genes
- Collect DNA sequences of expressed genes
- Extract promoter regions of expressed genes from the sequences
- Characterize common regulatory signals in the promoter regions
- Find similar signals in the entire genome
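One simple way to approach the "characterize common regulatory signals" step is to look for short words (k-mers) that occur in most of the promoter regions. The Python sketch below shows that idea only. It is an assumption about how the step might be done, not the method used in the course; real motif finding must also handle inexact matches and background frequencies. The function name `shared_kmers`, its parameters, and the toy sequences are all hypothetical.

```python
from collections import Counter
from typing import List


def shared_kmers(promoters: List[str], k: int = 6,
                 min_fraction: float = 1.0) -> List[str]:
    """Return k-mers present in at least min_fraction of the promoter sequences.

    Each sequence contributes at most one count per k-mer, so a word repeated
    many times within a single promoter is not over-weighted.
    """
    counts = Counter()
    for seq in promoters:
        seq = seq.upper()
        # A set, so each distinct k-mer is counted once per sequence.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    needed = min_fraction * len(promoters)
    return sorted(kmer for kmer, n in counts.items() if n >= needed)


if __name__ == "__main__":
    # Made-up toy promoter fragments, for illustration only.
    toy_promoters = ["TTGACAGGC", "AATTGACAT", "CTTGACATT"]
    print(shared_kmers(toy_promoters, k=6))  # ['TTGACA']
```

The last project step, finding similar signals in the entire genome, would then amount to scanning the genome sequence for the k-mers returned here.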