Computational Biology: Basics & Interesting Problems

Summary
- Sources of information
- Biological concepts: structure & terminology
- Sequencing
- Gene finding
- Protein structure prediction

Sources of information
There are too many sources to list. Some selected lectures and sites:
- Course on computational biology
- Human Genome Project
- Artificial Intelligence and Molecular Biology
- Another course on molecular biology
- The Tree of Life
Follow the links on these sites for more.

The Cell
[Figure: example of tissues in the stomach]

DNA Components
- Four nucleotide types: adenine (A), guanine (G), cytosine (C), thymine (T)
- Hydrogen bonds pair the bases: A-T and C-G

The Double Helix
[Figure: the DNA double helix (source: Alberts et al.)]
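
Because A always pairs with T and C with G, either strand of the double helix determines the other. The Python sketch below derives the partner strand, assuming sequences are plain strings over A, C, G, T (no ambiguity codes such as N). It also reverses the string because the two strands run antiparallel and sequences are conventionally written 5' to 3'. The function name `reverse_complement` is illustrative, not taken from the slides.

```python
# Watson-Crick pairing: A-T and C-G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand of `strand`, written 5' to 3'.

    Each base is replaced by its pairing partner, and the result is
    reversed because the two strands of the helix are antiparallel.
    Raises KeyError on characters other than A, C, G, T.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


print(reverse_complement("ATGC"))  # GCAT
```

Applying the function twice returns the original strand, mirroring the fact that each strand can serve as the template for the other.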

DNA Duplication
[Figure: DNA duplication (source: Mathews & van Holde)]

DNA Organization
[Figure: organization of DNA (source: Alberts et al.)]

Genome Sizes
- E. coli (bacterium): 4.6 x 10^6 bases
- Yeast (a simple fungus): ~12 x 10^6 bases
- Smallest human chromosome: ~50 x 10^6 bases
- Entire human genome: 3 x 10^9 bases

Genes
The DNA strings include:
- Coding regions (genes)
  - E. coli has ~4,000 genes
  - Yeast has ~6,000 genes
  - C. elegans has ~20,000 genes
  - Humans have ~20,000 protein-coding genes (early draft-genome estimates were higher)
- Control regions
  - These are typically adjacent to the genes
  - They determine when a gene should be expressed
- "Junk" DNA (of unknown function)
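
In the crudest possible approach, coding regions can be spotted as open reading frames (ORFs): a start codon (ATG) followed by whole codons up to an in-frame stop codon (TAA, TAG or TGA). The sketch below is only an illustration under strong assumptions. It scans the forward strand only, ignores introns and control regions, and uses a hypothetical `min_codons` length threshold. Practical gene finders rely on statistical models such as Hidden Markov Models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) pairs of ORFs on the forward strand.

    `start` is the index of the A in ATG; `end` is exclusive and includes
    the stop codon, so dna[start:end] is the whole ORF. `min_codons` is an
    illustrative threshold on the number of codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # the three forward reading frames
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon until an in-frame stop codon.
                j = i
                while j + 3 <= len(dna) and dna[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 <= len(dna) and (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3))
                # Resume after this stop codon; nested ATGs are not reported.
                i = j + 3
            else:
                i += 3
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)]
```

The example contains one ORF, `ATGAAATTTTAG`, in the third reading frame.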

Transcription
Coding sequences can be transcribed into RNA.
[Figure: transcription (source: Mathews & van Holde)]
RNA nucleotides:
- Similar to DNA, with a slightly different backbone (ribose instead of deoxyribose)
- Uracil (U) instead of thymine (T)
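
The base substitution described above can be written directly. This sketch assumes DNA is given 5' to 3' as a string over A, C, G, T, and the helper names are hypothetical. The mRNA has the same sequence as the coding strand (with U in place of T). Equivalently, it is the RNA complement of the template strand, read in reverse because the strands are antiparallel.

```python
# Base pairing between a DNA template base and the RNA base placed opposite it.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_coding_strand(coding: str) -> str:
    """mRNA (5'->3') from the coding strand (5'->3'): swap T for U."""
    return coding.upper().replace("T", "U")


def transcribe_template_strand(template: str) -> str:
    """mRNA (5'->3') from the template strand (5'->3'): reverse complement in RNA letters."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template.upper()))


print(transcribe_coding_strand("ATGGCT"))    # AUGGCU
print(transcribe_template_strand("AGCCAT"))  # AUGGCU
```

Both calls give the same mRNA because `AGCCAT` is the template strand paired with the coding strand `ATGGCT`.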

RNA Editing

RNA roles
- Messenger RNA (mRNA): encodes protein sequences
- Transfer RNA (tRNA): adaptor between mRNA molecules and amino acids (the protein building blocks)
- Ribosomal RNA (rRNA): part of the ribosome, the machine that translates mRNA into proteins

Transfer RNA
- Anticodon: matches a codon (a triplet of mRNA nucleotides)
- Attachment site: binds a specific amino acid
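
The anticodon-codon match can be sketched as base pairing between two antiparallel RNA triplets. This is a simplified model: it assumes strict Watson-Crick pairing and deliberately ignores wobble pairing at the third codon position. The function name `anticodon` is illustrative.

```python
# Watson-Crick pairing in RNA: A-U and C-G.
RNA_PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}


def anticodon(codon: str) -> str:
    """Return the tRNA anticodon (5'->3') that pairs with an mRNA codon (5'->3').

    The two triplets pair antiparallel, so the anticodon is the reverse
    complement of the codon. Wobble pairing is ignored in this sketch.
    """
    if len(codon) != 3:
        raise ValueError("a codon is exactly three nucleotides")
    return "".join(RNA_PAIR[base] for base in reversed(codon.upper()))


print(anticodon("AUG"))  # CAU
```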

Translation
- Translation is mediated by the ribosome, a complex of protein and rRNA molecules.
- The ribosome attaches to the mRNA at a translation initiation site.
- The ribosome then moves along the mRNA sequence, constructing a polypeptide as it goes.
- When the ribosome encounters a stop signal, it releases the mRNA.
- The constructed polypeptide is released and folds into a protein.
[Figure: translation (source: Alberts et al.)]
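
The ribosome's behaviour described above can be mimicked in a few lines: find an initiation site, read codons three bases at a time, and stop at a stop codon. This is a simplified sketch under stated assumptions. It treats the first AUG as the initiation site (real initiation also depends on the surrounding sequence context), and it uses the standard genetic code (NCBI translation table 1) written as a 64-letter string in U, C, A, G order, with `*` for stop codons.

```python
BASES = "UCAG"
# Standard genetic code; codons enumerated with U, C, A, G at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate mRNA from the first AUG up to (not including) a stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")  # simplified initiation site
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):  # complete codons only
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # stop signal: release the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGUUUGGCUAAGC"))  # MFG
```

The output uses one-letter amino acid codes: M (methionine), F (phenylalanine), G (glycine).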

The Amino Acids
[Figure: the amino acids]

Genetic Code
[Figure: the genetic code table]
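
The genetic code assigns each of the 4^3 = 64 codons either to one of 20 amino acids or to a stop signal, so most amino acids are encoded by several codons. The sketch below assumes the standard code (NCBI translation table 1), written as a 64-letter string in U, C, A, G order with `*` marking stop codons, and counts how many codons map to each amino acid.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code; codons enumerated with U, C, A, G at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-entry codon -> one-letter amino acid table.
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def codons_per_amino_acid() -> Counter:
    """Count the codons that map to each amino acid ('*' = stop)."""
    return Counter(CODON_TABLE.values())


counts = codons_per_amino_acid()
print(counts["L"], counts["M"], counts["*"])  # 6 1 3
```

Leucine, serine and arginine each have six codons. Methionine (AUG, which also serves as the usual start codon) and tryptophan have only one.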

Protein Structure
- Proteins are polypeptides: chains of amino acids.
- A protein's three-dimensional structure is (mostly) determined by the sequence of amino acids that make up the protein.

Evolution
- Related organisms have similar DNA:
  - similarity in the sequences of proteins
  - similarity in the organization of genes along the chromosomes
- Evolution plays a major role in biology:
  - many mechanisms are shared across a wide range of organisms
  - during the course of evolution, existing components are adapted for new functions
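
The simplest way to quantify "similar DNA" is the percentage of identical positions between two sequences that are already aligned position by position. The sketch assumes equal-length, gap-free sequences. Sequences that differ by insertions or deletions need a real alignment first. The example sequences are made-up toy strings, not real genes.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("expects two non-empty sequences of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
```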

Evolution of new organisms is driven by:
- Diversity: different individuals carry different variants of the same basic blueprint
- Mutations: the DNA sequence can be changed by single-base changes, deletions/insertions of DNA segments, etc.
- Selection bias

Four Aspects
- Biological: What is the task?
- Algorithmic: How can the task at hand be performed efficiently?
- Learning: How can the parameters of the task be adapted from examples?
- Statistics: How can true phenomena be distinguished from artifacts?
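
The mutation types listed above can be modelled as simple string edits. The sketch below uses hypothetical helper names and a made-up toy sequence. It also shows a consequence that matters for coding regions: a single-base change alters at most one codon, whereas inserting one base shifts every downstream codon (a frameshift). Deleting a whole codon's worth of bases (three) keeps the downstream reading frame intact.

```python
def substitute_base(seq: str, pos: int, base: str) -> str:
    """Single-base change: replace the base at 0-based position pos."""
    return seq[:pos] + base + seq[pos + 1:]


def insert_segment(seq: str, pos: int, segment: str) -> str:
    """Insert a DNA segment before 0-based position pos."""
    return seq[:pos] + segment + seq[pos:]


def delete_segment(seq: str, pos: int, length: int) -> str:
    """Delete `length` bases starting at 0-based position pos."""
    return seq[:pos] + seq[pos + length:]


def codons(seq: str) -> list[str]:
    """Split into triplets from position 0; a trailing partial codon is dropped."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


gene = "ATGAAACCCGGGTAA"  # toy sequence, not a real gene
print(codons(substitute_base(gene, 4, "T")))  # ['ATG', 'ATA', 'CCC', 'GGG', 'TAA']
print(codons(insert_segment(gene, 4, "T")))   # ['ATG', 'ATA', 'ACC', 'CGG', 'GTA']
print(codons(delete_segment(gene, 4, 3)))     # ['ATG', 'ACC', 'GGG', 'TAA']
```

After the one-base insertion, the original stop codon TAA is no longer in frame.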

Example: Sequence Comparison
- Biological: evolution preserves sequences, so similar genes might have similar functions.
- Algorithmic: consider all ways to align one sequence against another.
- Learning: how do we define similar sequences? Use examples to define similarity.
- Statistics: when we compare against ~10^6 sequences, which matches are random and which are true?

Topics I
Dealing with DNA/protein sequences:
- Genome projects and how sequences are found
- Finding similar sequences
- Models of sequences: Hidden Markov Models
- Transcription regulation
- Protein families
- Gene finding
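
The "Algorithmic" and "Statistics" aspects of sequence comparison can both be sketched in a few lines. Dynamic programming (the Needleman-Wunsch global alignment algorithm) scores every way of aligning two sequences without enumerating them one by one. Shuffling one sequence gives a rough idea of how often a random sequence with the same base composition scores as well. The match, mismatch and gap scores below are hypothetical parameters. The shuffle test is one simple choice of null model; practical search tools use more refined statistics.

```python
import random


def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch: best score over all global alignments of a and b."""
    # prev is the previous DP row: prev[j] scores the prefix of a handled
    # so far against b[:j]. Row 0 aligns an empty prefix against gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align a[i-1] with b[j-1], gap in b, gap in a.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def empirical_p_value(a: str, b: str, trials: int = 200, seed: int = 0) -> float:
    """Fraction of shuffled copies of b that score at least as well as b.

    Shuffling keeps b's base composition but destroys its order, giving a
    crude null distribution of 'random match' scores.
    """
    rng = random.Random(seed)
    observed = global_alignment_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if global_alignment_score(a, "".join(letters)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (trials + 1)


print(global_alignment_score("ACGT", "AGT"))  # 1
```

The score of 1 corresponds to aligning `ACGT` with `A-GT`: three matches and one gap.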

Topics II
Gene expression:
- Genome-wide expression patterns
- Data organization: clustering
- Reconstructing transcription regulation
- Recognizing and classifying cancers

Topics III
Models of genetic change:
- Long term: evolutionary changes among species
  - Reconstructing evolutionary trees from present-day sequences
- Short term: genetic variations in a population
  - Finding genes by linkage and association

Topics IV
Protein world:
- How proteins fold: secondary and tertiary structure
- How to predict protein folds from sequence data alone
- How to analyze protein changes from raw experimental measurements (mass spectrometry, 2D gels)

A Computational Biology Project
- From DNA chip data, identify the expressed genes
- Collect the DNA sequences of the expressed genes
- Extract the promoter regions of the expressed genes from the sequence
- Characterize common regulatory signals in the promoter regions
- Find similar signals in the entire genome
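
The last three steps of the project can be sketched on toy data. The assumptions are deliberate simplifications. Gene start positions are taken as already known (standing in for the DNA chip step). A "promoter" is just a fixed-length window upstream of each start on the forward strand. A "common regulatory signal" is the k-mer shared by the most promoters. "Similar signals" are exact matches. All function names, the window length and `k` are hypothetical; real motif finding typically uses probabilistic motif models and tolerates mismatches.

```python
from collections import Counter


def promoter_regions(genome: str, gene_starts: list[int], length: int = 20) -> list[str]:
    """Take the `length` bases upstream of each gene start (forward strand only)."""
    return [genome[max(0, s - length):s] for s in gene_starts]


def shared_kmer(promoters: list[str], k: int = 4) -> tuple[str, int]:
    """Most common k-mer, counting each k-mer at most once per promoter."""
    counts = Counter()
    for p in promoters:
        counts.update({p[i:i + k] for i in range(len(p) - k + 1)})
    return counts.most_common(1)[0]


def find_occurrences(genome: str, motif: str) -> list[int]:
    """All start positions (overlaps allowed) of an exact motif match."""
    return [i for i in range(len(genome) - len(motif) + 1)
            if genome[i:i + len(motif)] == motif]


# Toy genome: three 8-base "promoters", each followed by a short "gene".
genome = "CCTATAGG" + "ATGAAA" + "ATTATACC" + "ATGCCC" + "GCTATATT" + "ATGTTT"
promoters = promoter_regions(genome, [8, 22, 36], length=8)
motif, n_promoters = shared_kmer(promoters, k=4)
print(motif, n_promoters)                # TATA 3
print(find_occurrences(genome, motif))   # [2, 16, 30]
```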