Sequence Alignment. Johannes Starlinger

Similar documents
Searching Sear ( Sub- (Sub )Strings Ulf Leser

Sequence Alignment (chapter 6)

Sequence analysis and Genomics

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

Bio nformatics. Lecture 3. Saad Mneimneh

Background: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6)

Computational Biology

Pairwise sequence alignment

Pairwise & Multiple sequence alignments

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Sequence analysis and comparison

SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

EECS730: Introduction to Bioinformatics

Pairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Lecture 5,6 Local sequence alignment

STATC141 Spring 2005 The materials are from Pairwise Sequence Alignment by Robert Giegerich and David Wheeler

Introduction to Bioinformatics

Dynamic Programming: Edit Distance

Introduction to Bioinformatics Algorithms Homework 3 Solution

Algorithms in Bioinformatics

Administration. ndrew Torda April /04/2008 [ 1 ]

Sequence Comparison. mouse human

Single alignment: Substitution Matrix. 16 march 2017

Alignment & BLAST. By: Hadi Mozafari KUMS

Lecture 3: Markov chains.

Pairwise sequence alignments

Algorithms and Data S tructures Structures Complexity Complexit of Algorithms Ulf Leser

Analysis and Design of Algorithms Dynamic Programming

Lecture 4: September 19

EECS730: Introduction to Bioinformatics

1.5 Sequence alignment

First generation sequencing and pairwise alignment (High-tech, not high throughput) Analysis of Biological Sequences

Bioinformatics for Computer Scientists (Part 2 Sequence Alignment) Sepp Hochreiter

Biochemistry 324 Bioinformatics. Pairwise sequence alignment

Lecture 2: Pairwise Alignment. CG Ron Shamir

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Definition 2.3. We define addition and multiplication of matrices as follows.

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

C E N T R. Introduction to bioinformatics 2007 E B I O I N F O R M A T I C S V U F O R I N T. Lecture 5 G R A T I V. Pair-wise Sequence Alignment

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment

Pairwise sequence alignments. Vassilios Ioannidis (From Volker Flegel )

Motivating the need for optimal sequence alignments...

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Pair Hidden Markov Models

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

Similarity or Identity? When are molecules similar?

Pairwise alignment, Gunnar Klau, November 9, 2005, 16:

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT

1 Maintaining a Dictionary

Minimum Edit Distance. Defini'on of Minimum Edit Distance

Matrices. Chapter Definitions and Notations

In-Depth Assessment of Local Sequence Alignment

20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, Global and local alignment of two sequences using dynamic programming

Whole Genome Alignment. Adam Phillippy University of Maryland, Fall 2012

Comparing whole genomes

8 Grundlagen der Bioinformatik, SoSe 11, D. Huson, April 18, 2011

Practical considerations of working with sequencing data

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Haploid & diploid recombination and their evolutionary impact

Implementing Approximate Regularities

BLAST: Basic Local Alignment Search Tool

Practical Bioinformatics

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

Bioinformatics and BLAST

Theory of LSH. Distance Measures LS Families of Hash Functions S-Curves

Computational Biology Lecture 5: Time speedup, General gap penalty function Saad Mneimneh

BINF 730. DNA Sequence Alignment Why?

BLAST: Target frequencies and information content Dannie Durand

Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm. Alignment scoring schemes and theory: substitution matrices and gap models

CS 246 Review of Linear Algebra 01/17/19

An Introduction to Sequence Similarity ( Homology ) Searching

Evolutionary Models. Evolutionary Models

Lecture 1 : Data Compression and Entropy

Outline. Approximation: Theory and Algorithms. Motivation. Outline. The String Edit Distance. Nikolaus Augsten. Unit 2 March 6, 2009

L3: Blast: Keyword match basics

Chapter 2 Notes, Linear Algebra 5e Lay

Finding the Best Biological Pairwise Alignment Through Genetic Algorithm Determinando o Melhor Alinhamento Biológico Através do Algoritmo Genético

Graph Alignment and Biological Networks

NUMB3RS Activity: DNA Sequence Alignment. Episode: Guns and Roses

Evolutionary Tree Analysis. Overview

On the Monotonicity of the String Correction Factor for Words with Mismatches

Quantifying sequence similarity

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

Genomics and bioinformatics summary. Finding genes -- computer searches

Complexity Theory of Polynomial-Time Problems

Lecture 2: Connecting the Three Models

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Outline. Similarity Search. Outline. Motivation. The String Edit Distance

Lecture Notes: Markov chains

8 Grundlagen der Bioinformatik, SS 09, D. Huson, April 28, 2009

Linear Classifiers (Kernels)

Approximation: Theory and Algorithms

Stephen Scott.

Introduction to Bioinformatics Online Course: IBT

An Introduction to Bioinformatics Algorithms Hidden Markov Models

ORIE 6300 Mathematical Programming I August 25, Recitation 1

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

Transcription:

Sequence Alignment Johannes Starlinger

his Lecture Approximate String Matching Edit distance and alignment Computing global alignments Local alignment Johannes Starlinger: Bioinformatics, Summer Semester 2017 2

ene Function A fundamental principle of bioinformatics he function of a protein depends on its physical structure he physical structure depends on the protein sequence he protein sequence depends on the gene sequence If the sequence of two genes is only slightly different, so will be the protein sequence If the sequence of two proteins is only slightly different, so will be their structure If the structure of two proteins is only moderately different, they likely have the same (or at least share some) function Studying the sequence of genes allows the generation of hypotheses about their function Johannes Starlinger: Bioinformatics, Summer Semester 2017 3

How enes Evolve Evolution, sequences, and function Any two species X 1, X 2 have a common ancestor A Any gene from A will undergo independent evolution in X 1 and X 2, leading to genes 1 and 2 he more similar 1 and 2 are, the more likely do they still have the same function (that of ) For any two genes of non-trivial length, the chance that they have a very similar sequence by chance is extremely small Corollary: If two genes 1 and 2 today are very similar, they most likely derive from the same ancestor and most likely have the same function How can we quantify this? AAACCA 2 AAACCA ACA_CCA 1 Johannes Starlinger: Bioinformatics, Summer Semester 2017 4

Basic Evolutionary Events he simplest model: Single bases can be replaced (R), inserted (I), or deleted (D) (or kept (M)) Any changes must be explained by sequences of I, D, R I.e., by singular evolutionary events accumulating over time We call this an edit script Very simple yet quite powerful model One more simplification??????????????? AAACCA ACA_CCA Johannes Starlinger: Bioinformatics, Summer Semester 2017 5

Example: Eyeless (ey) Family of genes identified first in Drosophila When activated in arbitrary cells, non functional eyes start to grow at various places of the body ey is a master gene controls a cascade of activations of other genes eventually leading to eye development Also inflicted with several other neural developments Johannes Starlinger: Bioinformatics, Summer Semester 2017 6

Eyes Red: Only shadow Blue: Lenses etc. reen: Mirrors Oval: Compound eyes Rectangle: Single chamber Source: reisman (2004). Eyes probably are an example of convergent evolution However, genes controlling eye development are highly conserved across a wide range of species Johannes Starlinger: Bioinformatics, Summer Semester 2017 7

Homologues of eyeless isoform D (DM) MFLQPPAIVVPPWSALIERLPSLEDMAHKDNVIAMRNLPCLASL IAKPSPMEAVEASASHPHSSSYFAYYHLDDECHSVNQLVFVRPL PDSRQKIVELAHSARPCDISRILQVSNCVSKILRYYESIRPRAISKPRVA AEVVSKISQYKRECPSIFAWEIRDRLLQENVCNDNIPSVSSINRVLRNLAAQKEQQS SSSSSANSISAKVSVSINVSNVASSRLSSSDLMQAPLNSSESAS NSESEQEAIYEKLRLLNQHAAPPLEPARAAPLVQSPNHLRSSHPQLVH NHQALQQHQQQSWPPRHYSSWYPSLSEIPISSAPNIASVAYASPSLAHSLSPP NDIESLASIHQRNCPVAEDIHLKKELDHQSDESEENSNASNINEDD QARLILKRKLQRNRSFNDQIDSLEKEFERHYPDVFARERLAKILPEARIQVWFS NRRAKWRREEKLRNQRRPNSASASSSSAASLDSPNSLSACSSLLSSA PSVSINLSSPSLSNVNAPLAIDSSESPPIPHIRPSCSDNDNRQSEDCRR VCSPCPLVHQNHHIQSNHAQHALVPAISPRLNFNSSFAMYSNMHHAL SMSDSYAVPIPSFNHSAVPLAPPSPIPQQDLPSSLYPCHMLRPPPMAPAHHH IVPDRPAVLSQSANLASCSSYEVLSAYALPPPPMASSSAADSSFSAAS SASANVPHHIAQESCPSPCSSASHFVAHSSFSSDPISPAVS 250 most similar protein sequences in UniProt Sequence identities all >50%, All p-values < 1E-50 Johannes Starlinger: Bioinformatics, Summer Semester 2017 8

his Lecture Approximate String Matching Edit distance and alignment Computing global alignments Local alignment Johannes Starlinger: Bioinformatics, Summer Semester 2017 9

Edit Scripts and Edit Distances Definition Let A, B Σ * An edit script e is a sequence of operations I, D, R, M e is an edit script for A and B iff e(a)=b Slightly underdetermined which replacement? Which base to insert? he length of an edit script is the number of I,D,R it contains he edit distance between A and B is the length of the shortest edit script for A and B Remarks If we know e(a)=b, determining e with e (B)=A is trivial he shortest edit script is not unique, but its length is MIMMMR IRMMMDI A_A _AA_ AC A_C Johannes Starlinger: Bioinformatics, Summer Semester 2017 10

Alignment Edit scripts are intuitive from an evolutionary point-of-view, but somewhat clumsy from a computational point-of-view Definition A (global) alignment of strings A, B is an arrangement of A and B, enriched with _ at arbitrary positions, under each other such that no column contains two _ he score of an alignment is the number of _ plus the number of mismatching columns it contains he alignment distance between A and B is the minimal score of any alignment of A and B Edit distance and alignment distance are essentially identical Examples A A A A _AAA AAA_ AC AC AAA AAA Score: 3 5 2 2 Johannes Starlinger: Bioinformatics, Summer Semester 2017 11

A Visual Approach: Dotplots A dotplot of two strings A, B is a matrix M with he i th character in A is represented by the i th column he j th character in B is represented by the j th row M[i,j]=1 (blue) iff A[i] = B[j] A C C A A A C A Johannes Starlinger: Bioinformatics, Summer Semester 2017 12

Dotplot and Identical Substrings How do identical substrings look like in a dotplot? A C C A A A C C A A A C A A C A Diagonals from up-left to down-right Longest diagonal is the longest common substring Johannes Starlinger: Bioinformatics, Summer Semester 2017 13

Alignments and Dotplots Every alignment of A, B can be uniquely mapped into a path through M he path starts in the upper-left corner (coord: 0,0) o through the alignment column by column Next column is X,_ move to the right Next column is _, X move down Next column is X, Y move right-down A C CAA A CA ACCAA ACCA A C C A A A C C A A A C A A C A Johannes Starlinger: Bioinformatics, Summer Semester 2017 14

Examples A CCAA A CA A C C A A ACCAA A CA A C C A A A C A A C A Clearly, the number c(p) of 1 s crossed in a diagonal step by a path P is the same as P -e(a,b) Finding the path that minimizes P -c(p) also solves the problem of computing the edit distance Johannes Starlinger: Bioinformatics, Summer Semester 2017 15

his Lecture Approximate String Matching Edit distance and alignment Computing global alignments Local alignment Johannes Starlinger: Bioinformatics, Summer Semester 2017 16

Algorithm How do we compute the edit distance of two strings? Naïve: Enumerate all paths, compute c(p) for each 3 3 5 5 3 5 5 13 Bad news: here exist >3 min(m,n) paths ood news: We can compute e(a,b) with ~3*m*n operations Johannes Starlinger: Bioinformatics, Summer Semester 2017 17

Enumerating all Paths Recursively AC AC AC AC _ AC _ A C AC A C A C AC AC _ A C_ A C A C_ Johannes Starlinger: Bioinformatics, Summer Semester 2017 18

he naïve (recursive) Way Observation Let A =n, B =m Let d(i,j)=e(a[ i], B[ j]) for 0 i n and 0 j m with d(i, 0)=i and d(0,j)=j We can compute e(a,b) = d(n,m) recursively as follows d( i, j) = min d( i d( i, j 1) + 1 d( i 1, j) + 1 1, j 1) + t( i, j) t( i, j) = 1: if A[ i] 0 : else B[ j] Johannes Starlinger: Bioinformatics, Summer Semester 2017 19

Algorithm function d(i,j) { if (i = 0) return j; else if (j = 0) return i; else return min ( d(i,j-1) + 1, d(i-1,j) + 1, d(i-1,j-1) + t(a[i],b[j])); } function t(c 1, c 2 ) { if (c 1 = c 2 ) return 0; else return 1; } Johannes Starlinger: Bioinformatics, Summer Semester 2017 20

What is Happening? d(2,2) d(2,1) d(1,2) d(1,1) d(2,0) d(1,0) d(1,1) d(0,1) d(1,0) d(0,0) d(1,1) d(0,2) d(0,1) d(1,0) d(1,0) d(0,0) d(0,1) d(0,0) d(0,1) Johannes Starlinger: Bioinformatics, Summer Semester 2017 21

Much Redundant Computation d(2,2) d(2,1) d(1,2) d(1,1) d(2,0) d(1,0) d(1,1) d(0,1) d(1,0) d(0,0) d(1,1) d(0,2) d(0,1) d(1,0) d(1,0) d(0,0) d(0,1) d(0,0) d(0,1) here are only ~n*m different parameter combinations Johannes Starlinger: Bioinformatics, Summer Semester 2017 22

Dynamic Programming Using a able Instead of computing top down (from n,m), we compute all different values for d(i,j) bottom up We store all values in a table We can immediately compute d(i,0) and d(0,j) Which values can we compute next? A C C A A A C C A A A C A A C A Johannes Starlinger: Bioinformatics, Summer Semester 2017 23

Example d( i, j) = d( i, j 1) + 1 min d( i 1, j) + 1 d( i 1, j 1) + t( i, j) A 1 2 3 4 A C 0 1 2 3 4 5 6 7 A C 0 1 2 3 4 5 6 7 A 1 0 2 3 4 A C 0 1 2 3 4 5 6 7 A 1 0 1 2 3 4 5 6 2 3 4 A C 0 1 2 3 4 5 6 7 A 1 0 1 2 3 4 5 6 2 1 0 1 2 3 4 5 3 4 A C 0 1 2 3 4 5 6 7 A 1 0 1 2 3 4 5 6 2 1 0 1 2 3 4 5 3 2 1 0 1 2 3 4 4 A C 0 1 2 3 4 5 6 7 A 1 0 1 2 3 4 5 6 2 1 0 1 2 3 4 5 3 2 1 0 1 2 3 4 4 3 2 1 1 1 2 3 Johannes Starlinger: Bioinformatics, Summer Semester 2017 24

Finding the (an) optimal Alignment(s) raceback We find the path from back to front Start at cell (n,m) See which cells were used to compute d(n,m) Walk any of these finds one optimal path Walking all means finding all optimal paths Alternative: Store pointers while filling the table A C 0 1 2 3 4 5 6 7 A 1 0 1 2 3 4 5 6 2 1 0 1 2 3 4 5 3 2 1 0 1 2 3 4 4 3 2 1 1 1 2 3 A C 0 1 2 3 4 5 6 7 A 1 0 1 2 3 4 5 6 2 1 0 1 2 3 4 5 3 2 1 0 1 2 3 4 4 3 2 1 1 1 2 3 A C 0 1 2 3 4 5 6 7 A 1 0 1 2 3 4 5 6 2 1 0 1 2 3 4 5 3 2 1 0 1 2 3 4 4 3 2 1 1 1 2 3 Johannes Starlinger: Bioinformatics, Summer Semester 2017 25

Complexity Building the table For every d(i,j), we need to access three other cells and make some (constantly many) additions and comparisons here are m*n cells hus: approximately 3*m*n operations Finding one optimal alignment We must walk from (n,m) to (1,1) Such a path can have at most length m+n We cannot go wrong! ogether: approximately m+n operations ogether: O(m*n) (for m*n > m+n) Johannes Starlinger: Bioinformatics, Summer Semester 2017 26

Eyeless Again a Closer Look he similar regions in the different homologues are not distributed randomly Actually, a single stretch of 128 AA, the PAX domain, is virtually unchanged in all homologues Controls binding to DNA and hence regulatory effects ypical: Only some parts of a gene are conserved, and these carry function Johannes Starlinger: Bioinformatics, Summer Semester 2017 27

his Lecture Approximate String Matching Edit distance and alignment Computing global alignments Local alignment Johannes Starlinger: Bioinformatics, Summer Semester 2017 28

Example Zufall? A C C C A C A A C A A A C C A A A A A C C A C C A A A A C A A A A C A A A A A A A A A A A A A C C C A C A A - - C - A A A C C A A A A C C A C C A A - A - A C A C A A C A A A A - A - A A A A A A - - A C Kein Zufall! A C C C A C A A C A A A C C A A A A A C C A C C A A A A C A A A A C A A A A A A A A A A A A Johannes Starlinger: Bioinformatics, Summer Semester 2017 29

Distance or Similarity Until now, we computed a global distance he higher e(a,b), the less similar are A and B he longer A and B, the higher is their distance (in general) Different lengths are punished: e(a,b) A - B Often, we want a local similarity instead If we have a sequence and don t know exactly where the genes are If a function is associated to a motif in a protein, i.e., a subsequence in the gene We need to search for substrings A A, B B which are very similar to each other Further, A and B should have a certain length to be interesting e(a,b ) does not help optimal distance is 0 for A =B = Johannes Starlinger: Bioinformatics, Summer Semester 2017 30

Sequence Similarity Let A = B =n A scoring function is a function s: Σ xσ Integer We also call s a substitution matrix he ungapped similarity sim of A, B wrt. s is defined as sim'( A, B) i= 1 ( A[ i], B[ i] ) he similarity sim of A, B (wrt. s) is the highest ungapped similarity score over all alignments of A and B Higher = better; maximal similarity is n*max(s) = n We are not yet there: his still is a global similarity score s Johannes Starlinger: Bioinformatics, Summer Semester 2017 31

Example = {A,C,,,_} A C _ A 4-2 -2-1 -3 C 4-1 -2-3 4-2 -3 4-3 AC_C A_C ACC AC A_CC A_C = -1 = 15 = 10 Johannes Starlinger: Bioinformatics, Summer Semester 2017 32

Computation Same ideas as for edit distance apply But: We want a high similarity, not a low distance hus, we can compute sim(a,b) as d(n,m) with d( i,0) = i k = 1 s ( A[ k], _) d(0, j) = s( _, B[ k] ) j k = 1 d( i, j) = d( i, j 1) + s(_, B[ j]) max d( i 1, j) + s( A[ i], _) d( i 1, j 1) + s( A[ i], B[ j]) Johannes Starlinger: Bioinformatics, Summer Semester 2017 33

Example A C A 4-1 -1-1 4-1 -1 4-1 C 4 - -3-3 -3-3 Edit Distance A C 0 1 2 3 4 5 A 1 0 1 2 3 4 2 1 0 1 2 3 3 2 1 1 1 2 C 4 3 2 2 2 1 C 5 4 3 3 3 2 Similarity A C 0-3 -6-9 -12-15 A -3 4 1-2 -5-8 -6 1 8 5-9 C -12 C -15 Johannes Starlinger: Bioinformatics, Summer Semester 2017 34

Lokal Similarity = Local Alignment Definition he local similarity score sim* of A, B is defined as sim*( A, B) = max A' substringof ( sim( A', B') ) A, B' substringof B Remark Inequality in string length does not matter any more Sounds terribly complex, but there is a neat trick A C C C A C A A C A A A C C A A A A A C C A C C A A A A C A A A A C A A A A A A A A A A A A Johannes Starlinger: Bioinformatics, Summer Semester 2017 35

Example Match: +1 I/R/D: -1 A 0-1 -2-3 -4-5 -6-1 0 1 A 0 Similarity Path length A 0 0 0 1 2 3 A 2 Johannes Starlinger: Bioinformatics, Summer Semester 2017 36

Smith-Waterman Algorithm Smith, Waterman: Identification of common molecular subsequences, J. Mol. Bio 147, 1981 Idea Note: Local paths need not span the entire strings Look at a single (global) path A series of matches (positive values for scoring function s) creates a series of increasing similarity values Any step with s<0 lowers the score Whenever the score gets below 0, we can forget this continuation of the path Instead of carrying on, we conceptually start a new (local) path o this end, we simply set d:=0 whenever it would be d<0 he highest value in the matrix is the end of the best local path Johannes Starlinger: Bioinformatics, Summer Semester 2017 37

Computation he same ideas as before We compute sim*(a,b) as d(n,m) with Assume X: s(x,_)<0 and s(_,x)<0 d( i,0) = 0 d(0, j) = 0 d( i, j) = d( i, j 1) + s(_, B[ j]) d( i 1, j) + s( A[ i], _) max d( i 1, j 1) + s( A[ i], B[ j]) 0 Johannes Starlinger: Bioinformatics, Summer Semester 2017 38

Example Match: +1 I/R/D: -1 A C 0-1 -2-3 -4-5 -6 A -1 1 0-1 -2-3 -4-2 0 2 1 0-1 -2-3 -1 1 3 2 1 0 AC A AC A AC A A C 0 0 0 0 0 0 0 A 0 1 0 0 0 0 0 0 0 2 1 1 0 0 0 0 1 3 2 1 1 AC A Johannes Starlinger: Bioinformatics, Summer Semester 2017 39

Local versus global Alignment lobal Alignment Comparison of two entire sequences Use when you know the sequences are related Interest: he differences Example: Proteins of the same family Local Alignment Finds interesting regions in yet uncharacterized sequences Use when trying to relate a sequence to other (known) sequences Interest: he similarities Often a first step before global alignment Example: Find similar genes in other species Johannes Starlinger: Bioinformatics, Summer Semester 2017 40

Beware: Not all Events are Equal Wildtype Probably fatal Probably fatal Neutral Functional C A A C A C A A A Leu Ser Asp yr ly Lys C A A C A A A A Leu Ser Asp Stop-Codon C A A A C A C A A A Leu Ser His Asp Leu hr C A C A C A C A A A Leu Ser Asp yr ly Lys C A A A A C A A A Leu Ser lu yr ly Lys DNA Protein DNA Protein DNA Protein DNA Protein DNA Protein Johannes Starlinger: Bioinformatics, Summer Semester 2017 41

Further Reading Everywhere Relaxed: Christianini & Hahn, Chapter 3 Step by step: Waack, Chapter 9 Johannes Starlinger: Bioinformatics, Summer Semester 2017 42