Introduction to Python for Biologists

Similar documents
Notater: INF3331. Veronika Heimsbakk December 4, Introduction 3

Python. chrysn

Contents. 6.1 Conditional statements 6.2 Iteration 6.3 Procedures and flow control

Introduction to Python

Regular Expressions and Finite-State Automata. L545 Spring 2008

L435/L555. Dept. of Linguistics, Indiana University Fall 2016

1 Opening URLs. 2 Regular Expressions. 3 Look Back. 4 Graph Theory. 5 Crawler / Spider

Introduction to Python

Python & Numpy A tutorial

Python. Tutorial. Jan Pöschko. March 22, Graz University of Technology

COMP 204. Exceptions continued. Yue Li based on material from Mathieu Blanchette, Carlos Oliver Gonzalez and Christopher Cameron

Functions in Python L435/L555. Dept. of Linguistics, Indiana University Fall Functions in Python. Functions.

Introduction to Python and its unit testing framework

Introduction to Language Theory and Compilation: Exercises. Session 2: Regular expressions

C++ For Science and Engineering Lecture 13

10. Finite Automata and Turing Machines

2 Getting Started with Numerical Computations in Python

MA/CSSE 474 Theory of Computation

MiniMat: Matrix Language in OCaml LLVM

Automata and Computability. Solutions to Exercises

Automata and Computability. Solutions to Exercises

COMP 204. Object Oriented Programming (OOP) Mathieu Blanchette

Sets are one of the basic building blocks for the types of objects considered in discrete mathematics.

1 Alphabets and Languages

Comp 11 Lectures. Dr. Mike Shah. August 9, Tufts University. Dr. Mike Shah (Tufts University) Comp 11 Lectures August 9, / 34

Foundations of

Context Free Grammars

Outline. policies for the second part. with answers. MCS 260 Lecture 10.5 Introduction to Computer Science Jan Verschelde, 9 July 2014

Data Intensive Computing Handout 11 Spark: Transitive Closure

UNIT II REGULAR LANGUAGES

Discrete Mathematical Structures: Theory and Applications

Computation Theory Finite Automata

CS177 Spring Final exam. Fri 05/08 7:00p - 9:00p. There are 40 multiple choice questions. Each one is worth 2.5 points.

CMSC 330: Organization of Programming Languages. Pushdown Automata Parsing

9A. Iteration with range. Topics: Using for with range Summation Computing Min s Functions and for-loops A Graphics Applications

Read all questions and answers carefully! Do not make any assumptions about the code other than those that are clearly stated.

8.2 Examples of Classes

What Is a Language? Grammars, Languages, and Machines. Strings: the Building Blocks of Languages

Multiplying Products of Prime Powers

CS177 Fall Midterm 1. Wed 10/07 6:30p - 7:30p. There are 25 multiple choice questions. Each one is worth 4 points.

Scripting Languages Fast development, extensible programs

How do regular expressions work? CMSC 330: Organization of Programming Languages

IV. Turing Machine. Yuxi Fu. BASICS, Shanghai Jiao Tong University

Using the EartH2Observe data portal to analyse drought indicators. Lesson 4: Using Python Notebook to access and process data

6.001 Recitation 22: Streams

n CS 160 or CS122 n Sets and Functions n Propositions and Predicates n Inference Rules n Proof Techniques n Program Verification n CS 161

Plan for Today and Beginning Next week (Lexical Analysis)

CSE 211. Pushdown Automata. CSE 211 (Theory of Computation) Atif Hasan Rahman

Lab 2 Worksheet. Problems. Problem 1: Geometry and Linear Equations

Introduction. How to use this book. Linear algebra. Mathematica. Mathematica cells

1 Definition of a Turing machine

FIT100 Spring 01. Project 2. Astrological Toys

LAST TIME..THE COMMAND LINE

Lexical Analysis Part II: Constructing a Scanner from Regular Expressions

Part I: Definitions and Properties

8. More on Iteration with For

Math Review. for the Quantitative Reasoning measure of the GRE General Test

CSE 105 THEORY OF COMPUTATION

MA/CSSE 474 Theory of Computation

A High Level Programming Language for Quantum Computing

Introduction to Metalogic

Assignment 1: Due Friday Feb 6 at 6pm

Solutions to Exercises Chapter 2: On numbers and counting

10. The GNFA method is used to show that

MCS 260 Exam 2 13 November In order to get full credit, you need to show your work.

Recap DFA,NFA, DTM. Slides by Prof. Debasis Mitra, FIT.

Algorithms for Uncertainty Quantification

I. Numerical Computing

CMSC 330: Organization of Programming Languages. Regular Expressions and Finite Automata

CS 341 Homework 16 Languages that Are and Are Not Context-Free

Most General computer?

CMSC 330: Organization of Programming Languages

Computer Science Introductory Course MSc - Introduction to Java

(NB. Pages are intended for those who need repeated study in formal languages) Length of a string. Formal languages. Substrings: Prefix, suffix.

CSCI 239 Discrete Structures of Computer Science Lab 6 Vectors and Matrices

CS-141 Exam 2 Review October 19, 2016 Presented by the RIT Computer Science Community

Sets. Introduction to Set Theory ( 2.1) Basic notations for sets. Basic properties of sets CMSC 302. Vojislav Kecman

PetriScript Reference Manual 1.0. Alexandre Hamez Xavier Renault

CSE 20. Lecture 4: Introduction to Boolean algebra. CSE 20: Lecture4

Practical Bioinformatics

Computation and Logic Definitions

CSCI-141 Exam 1 Review September 19, 2015 Presented by the RIT Computer Science Community

Today s topics. Introduction to Set Theory ( 1.6) Naïve set theory. Basic notations for sets

A Little Logic. Propositional Logic. Satisfiability Problems. Solving Sudokus. First Order Logic. Logic Programming

Extended Introduction to Computer Science CS1001.py. Lecture 8 part A: Finding Zeroes of Real Functions: Newton Raphson Iteration

Turing machines Finite automaton no storage Pushdown automaton storage is a stack What if we give the automaton a more flexible storage?

Theoretical Computer Science

Automata Theory and Formal Grammars: Lecture 1

Section 1: Sets and Interval Notation

The Turing Machine. CSE 211 (Theory of Computation) The Turing Machine continued. Turing Machines

Meta-programming & you

Digital Systems and Information Part II

Chapter 0 Introduction. Fourth Academic Year/ Elective Course Electrical Engineering Department College of Engineering University of Salahaddin

1.2 The Role of Variables

Finite State Transducers

Modern Functional Programming and Actors With Scala and Akka

Rigorous Software Development CSCI-GA

Lectures about Python, useful both for beginners and experts, can be found at (

With Question/Answer Animations. Chapter 2

The Pip Project. PLT (W4115) Fall Frank Wallingford

Transcription:

Introduction to Python for Biologists Katerina Taškova 1 Jean-Fred Fontaine 1,2 1 Faculty of Biology, Johannes Gutenberg-Universität Mainz, Mainz, Germany 2 Genomics and Computational Biology, Kernel Press, Mainz, Germany https://cbdm.uni-mainz.de/mb17 March 21, 2017

Introduction to Python for Biologists Table of Contents Introduction Running code Literals and variables Numeric types Strings Exercise Lists, tuples and ranges Sets and dictionaries Convert and copy Loops Functions Branching Regular Expressions Annexes March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 2

Introduction to Python for Biologists Introduction Introduction Running code Literals and variables Numeric types Strings Exercise Lists, tuples and ranges Sets and dictionaries Convert and copy Loops Functions Branching Regular Expressions Annexes March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 3

Introduction to Python for Biologists Introduction What is Python? Python is a general-purpose programming language created by Guido van Rossum (1991) high-level (abstraction from the details of the computer) interpreted (needs an interpreter software) Python design philosophy code readability syntax brevity Python is widely used for Biology rich built-in features powerful scientific extensions plotting capabilities March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 4

Introduction to Python for Biologists Introduction Structured programming I Instructions are executed sequentially, one per line Conditional statements allow selective execution of code blocks Loops allow repeated execution of code blocks Functions allow on-demand execution of code blocks March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 5

Introduction to Python for Biologists Introduction Structured programming II 1 i n s t r u c t i o n 1 # 1 s t i n s t r u c t i o n ( hashtag # s t a r t s comments ) 2 # blank l i n e 3 repeat 20 times # 2nd i n s t r u c t i o n ( loop s t a r t s a block ) 4 i n s t r u c t i o n a # block defined by i n d e n t a t i o n ( spaces or tabs ) 5 i n s t r u c t i o n b # 2nd i n s t r u c t i o n i n block 6 # blank l i n e 7 i f n>10 # 3 rd i n s t r u c t i o n ( C o n d i t i o n a l statement ) 8 i n s t r u c t i o n a # 1 s t i n s t r u c t i o n i n block 9 i n s t r u c t i o n b # 2nd i n s t r u c t i o n i n block 10 # blank l i n e 11 # blank l i n e 12 # backslashs j o i n l i n e s 13 i n s t r u c t i o n 3 \ # 3 rd i n s t r u c t i o n, p a r t 1 14 i n s t r u c t i o n 3 # 3 rd i n s t r u c t i o n, p a r t 2 15 # blank l i n e 16 # Expressions i n ( ), {}, or [ ] can span m u l t i p l e l i n e s 17 i n s t r u c t i o n 4 ( 1, 2, 3 # 4 th i n s t r u c t i o n, p a r t 1 18 4, 5, 6) # 4 th i n s t r u c t i o n, p a r t 2 March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 6

Introduction to Python for Biologists Introduction Namespace Variables are names associated with data e.g. a=2 assigns value 2 to variable a Functions are names associated to specific code blocks built-in functions are available (see list on slide 100) e.g. print(a) will display 2 on the screen The user namespace is the set of names available to the user users can define new names of variables and functions in their namespace imported modules can add names of variables and functions in the user namespace March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 7

Introduction to Python for Biologists Introduction Object-oriented programming Data is organized in classes and objects a class is a template defining what objects can store and do an object is an instance of a class objects have attributes to store data and methods to do actions object namespaces are different from user namespace Example class Human is defined as: has a name (an attribute name ) has an age (an attribute age ) can introduce itself (a method who ) example with 1 existing Human object P1: 1 P1. name = Mary # assigns value to a t t r i b u t e name 2 P1. age = 26 # assigns value to a t t r i b u t e age 3 P1. who ( ) # d i s p l a y s My name i s Mary I am 2 6! 4 who ( ) # e r r o r! not i n the user namespace March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 8

Introduction to Python for Biologists Introduction Modules Modules can add functionalities to Python e.g. classes and functions Example of available modules: NumPy for scientific computing Matplotlib for plotting BioPython for Biology Modules have to be imported into the code 1 # import datetime module i n i t s own namespace 2 import datetime 3 datetime. date. today ( ) # 2017 03 16 4 today ( ) # e r r o r! 5 6 # import f u n c t i o n s log2 and log10 from module math 7 # i n c u r r e n t namespace 8 from math import log2, log10 9 log10 ( 1 ) # equal 0 March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 9

Introduction to Python for Biologists Running code Introduction Running code Literals and variables Numeric types Strings Exercise Lists, tuples and ranges Sets and dictionaries Convert and copy Loops Functions Branching Regular Expressions Annexes March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 10

Introduction to Python for Biologists Running code Running code I From a terminal by using the interactive Python shell 1 $ python3 # opens Python s h e l l 2 a=2 # assigns 2 to a 3 b=3 # assigns 3 to b 4 e x i t ( ) # closes Python s h e l l From a terminal by running a script file e.g. let say myscript.py is a script file (simple text file) and it contains: print( hello world! ) 1 $ python3 myscript. py # runs python3 and the s c r i p t 2 h e l l o world! # r e s u l t of the s c r i p t on the t e r m i n a l March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 11

Introduction to Python for Biologists Running code Running code II From Jupyter Notebook web-based graphical interface manage cells of code or text see execution results on the same notebook save/open notebooks March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 12

Introduction to Python for Biologists Running code Documentation and messages I Documentation and help: https://docs.python.org/3 use the built-in help() function e.g. help(print) to display help for function print() see help menu or Google it Examples of error messages 1 # F o r g e t t i n g quotes 2 p r i n t ( Hello world ) 3 # F i l e < s t d i n >, l i n e 2 4 # p r i n t ( Hello world ) 5 # ˆ 6 # SyntaxError : i n v a l i d syntax March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 13

Introduction to Python for Biologists Running code Documentation and messages II 1 # S p e l l i n g mistakes 2 p r i n ( Hello world ) 3 # Traceback ( most recent c a l l l a s t ) : 4 # F i l e < s t d i n >, l i n e 2, i n <module> 5 # NameError : name p r i n i s not defined 1 # Wrong l i n e break w i t h i n a s t r i n g 2 p r i n t ( Hello 3 World ) 4 # F i l e < s t d i n >, l i n e 2 5 # p r i n t ( Hello 6 # ˆ 7 # SyntaxError : EOL while scanning s t r i n g l i t e r a l March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 14

Introduction to Python for Biologists Literals and variables Introduction Running code Literals and variables Numeric types Strings Exercise Lists, tuples and ranges Sets and dictionaries Convert and copy Loops Functions Branching Regular Expressions Annexes March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 15

Introduction to Python for Biologists Literals and variables Numeric and strings literals I 1 # Numeric l i t e r a l s 2 12 3 123 4 1.6E3 # means 1600 5 6 # S t r i n g s l i t e r a l s 7 A s t r i n g # A s t r i n g 8 A s t r i n g # A s t r i n g 9 A s t r i n g # A s t r i n g 10 Three s i n g l e quotes # Three s i n g l e quotes 11 Three double quotes # Three double quotes 12 A \ s t r i n g \ # A s t r i n g ( backslash escape sequence ) 13 r A \ s t r i n g \ # A \ s t r i n g \ ( raw s t r i n g ) Python stores literals in objects of corresponding classes (class int for integers, float for floatting point, and str for strings) March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 16

Introduction to Python for Biologists Literals and variables Numeric and strings literals II Printing numeric and strings literals 1 p r i n t (12) # 12 2 p r i n t (1+2) # 3 3 4 p r i n t ( Hello World ) # Hello World 5 6 p r i n t ( Hello World, 1+2) # Hello World 3 7 p r i n t ( Hello World, 1+2, sep= ) # Hello World 3 8 p r i n t ( Hello World, 1+2, sep= \ t ) # Hello World 3 9 # (\ t : tab, \n : newline ) 10 11 p r i n t ( AB, end= ) # AB ( avoid newline at the end ) 12 p r i n t ( CD ) # ABCD 13 14 p r i n t ( Max i s, 12, and Min i s, 3) # Max i s 12 and Min i s 3 March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 17

Introduction to Python for Biologists Literals and variables Variables I Variables are names used to access objects first letter is a character (not a digit) no space characters allowed case-sensitive (variable name var is not Var) prefer alphanumeric characters (e.g. abc123) avoid accents, non-alphanumeric, non English underscores may be used (e.g. abc 123) The following keywords can not be used as variable names and, assert, break, class, continue def, del, elif, else, except, exec, finally, for, from global, if, import, in, is, lambda, not, or, pass print, raise, return, try, while, yield March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 18

Introduction to Python for Biologists Literals and variables Variables II 1 # Numeric types 2 a=2 # a i s assigned an i n t o b j e c t of value 2 3 p r i n t ( a ) # p r i n t s the o b j e c t assigned to a ( 2 ) 4 b=a # b i s assigned the same o b j e c t as a ( 2 ) 5 p r i n t ( b ) # 2 6 a=5 # a i s assigned a new o b j e c t of value 5 7 p r i n t ( a ) # 5 8 p r i n t ( b ) # 2 ( b i s s t i l l assigned to o b j e c t of value 2) 9 10 # S t r i n g s 11 c1= a 12 p r i n t ( c1 ) # a 13 myname125 = abc March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 19

Introduction to Python for Biologists Numeric types Introduction Running code Literals and variables Numeric types Strings Exercise Lists, tuples and ranges Sets and dictionaries Convert and copy Loops Functions Branching Regular Expressions Annexes March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 20

Introduction to Python for Biologists Numeric types Numeric types I 1 type ( 7 ) # <class i n t > ( i n t e g e r number ) 2 type ( 8. 2 5 ) # <class f l o a t > ( f l o a t i n g p o i n t ) 3 type (4.52 e 3) # <class f l o a t > ( f l o a t i n g p o i n t ) 4 5 # Operators ( s p e c i a l b u i l t i n f u n c t i o n s ) 6 1 + 3 # 4 ( a d d i t i o n ) 7 4 1 # 3 ( s u b s t r a c t i o n ) 8 3 2 # 6 ( m u l t i p l i c a t i o n ) 9 9 / 2 # 4.5 ( d i v i s i o n ) 10 9 / / 2 # 4 ( i n t e g e r d i v i s i o n ) 11 9 % 2 # 1 ( i n t e g e r d i v i s i o n remainder ) 12 2 3 # 8 ( exponent ) 13 14 # Lowest to highest operators precedence ( equal i f on same l i n e ) 15 +, # Addition, S u b t r a c t i o n 16, /, / /, % # M u l t i p l i c a t i o n, D i v i s i o n s, Remainder 17 +x, x # P o s i t i v e, Negative 18 # Exponentiation March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 21

Introduction to Python for Biologists Numeric types Numeric types II 1 # B u i l t i n f u n c t i o n s 2 abs ( 2.58) # 2.58 ( absolute value of x ) 3 round ( 2. 5 ) # 2 ( round to c l o s e s t i n t e g e r ) 4 5 # With v a r i a b l e s 6 a = 1 # 1 7 b = 1 + 1 # 2 8 c = a + b # 3 9 d = a+c b # 7 ( precedence of over +) 10 d = ( a+c ) b # 8 ( use parentheses to break precedence ) 11 12 # Short n o t a t i o n s ( v a l i d f o r +,,, /,... ) 13 a += 1 # a = a + 1 14 a = 5 # a = a 5 15 16 # Special f l o a t values 17 f l o a t ( NaN ) # nan ( Not a Number ) 18 f l o a t ( I n f ) # i n f : I n f i n i t e p o s i t i v e ; i n f : I n f i n i t e negative March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 22

Introduction to Python for Biologists Strings Introduction Running code Literals and variables Numeric types Strings Exercise Lists, tuples and ranges Sets and dictionaries Convert and copy Loops Functions Branching Regular Expressions Annexes March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 23

Introduction to Python for Biologists Strings Sequence types Text sequence type: Strings: immutable sequences of characters Basic sequence types: Lists: mutable sequences Tuples: immutable sequences Ranges: immutable sequence of numbers Sequence operations: All sequence types support common sequence operations (slide 98) Mutable sequence types support specific operations (slide 99) March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 24

Introduction to Python for Biologists Strings Strings I 1 # Quotes 2 A s t r i n g # A s t r i n g 3 A s t r i n g # A s t r i n g 4 A s t r i n g # A s t r i n g 5 Three s i n g l e quotes # Three s i n g l e quotes 6 Three double quotes # Three double quotes 7 8 # Escape sequences ( see annexes ) 9 A s i n g l e quote # A s i n g l e quote 10 A s i n g l e quote \ # A s i n g l e quote 11 A t a b u l a t i o n \ t 12 A newline \n See other escape sequences in slide 97 Triple quoted strings may span multiple lines - all associated whitespace will be included in the string literal March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 25

Introduction to Python for Biologists Strings Strings II 1 # Operators 2 pipe + t t e # = p i p e t t e ( concatenation ) 3 A 7 # = AAAAAAA ( r e p l i c a t i o n ) 4 A 3 + C 2 # = AAACC 5 A + s t r ( 2. 0 ) # = A2. 0 ( convert number then concatenate ) 6 7 # B u i l t i n f u n c t i o n s 8 len ( A s t r i n g of characters ) # 22 ( l e n g t h i n characters ) 9 type ( a ) # <class s t r > ( s t r i n g ) 10 11 # S l i c e s [ s t a r t : end : step ] (0 i s index of f i r s t character ) 12 ABCDEFG [ 2 : 5 ] # CDE ( F at index 5 excluded ) 13 ABCDEFG [ : 5 ] # ABCDE ( from begining ) 14 ABCDEFG [ 5 : ] # FG ( to the end ) 15 ABCDEFG [ 2:] # FG ( 2 from the end : to the end ) 16 ABCDEFG [ 0 : 5 : 2 ] # ACE ( every second l e t t e r w ith step =2) March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 26

Introduction to Python for Biologists Strings Strings methods I Strings are immutable: new objects are created for changes 1 seq = ACGtCCAgTnAGaaGT 2 3 # Case 4 seq. c a p i t a l i z e ( ) # Acgtccagtnagaagt 5 seq. casefold ( ) # acgtccagtnagaagt ( e s z e t t => ss ) 6 seq. lower ( ) # acgtccagtnagaagt ( e s z e t t => e s z e t t ) 7 seq. swapcase ( ) # acgtccagtnagaagt 8 seq. upper ( ) # ACGTCCAGTNAGAAGT 9 10 # Search and replace 11 seq. count ( a ) # 2 ( case s e n s i t i v e ) 12 seq. count ( G,0, 4) # 1 ( s l i c e s t a r t and end indexes ) 13 seq. endswith ( GT ) # True 14 seq. endswith ( G,0, 4) # False ( s l i c e s t a r t and end indexes ) 15 seq. f i n d ( GtC ) # 2 (1 s t h i t index, 1 otherwise ) 16 seq. replace ( aa, t t ) # ACGtCCAgTnAGttGT ( case s e n s i t i v e ) 17 seq. replace ( A, x, 2) # xcgtccxgtnagaagt (2 f i r s t h i t s only ) March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 27

Introduction to Python for Biologists Strings Strings methods II 1 seq = ACGtCCAgTnAGaaGT 2 3 # I s f u n c t i o n s 4 seq. isalnum ( ) # True ( Are a l l characters alphanumeric?) 5 seq. i s a l p h a ( ) # True ( Are a l l characters a l p h a b e t i c?) 6 seq. i s l o w e r ( ) # False ( Are a l l characters lowercase?) 7 seq. isnumeric ( ) # False ( Are a l l numeric characters?) 8 seq. isspace ( ) # False ( Are a l l whitespace characters?) 9 seq. isupper ( ) # False ( Are a l l characters uppercase?) 10 11 # Join and s p l i t 12. j o i n ( [ A, B ] ) # A B 13. j o i n ( seq ) # A C G t C C A g T n A G a a G T 14 seq. p a r t i t i o n ( aa ) # ( ACGtCCAgTnAG, aa, GT ) : a t u p l e 15 seq. s p l i t ( aa ) # [ ACGtCCAgTnAG, GT ] : a l i s t 16 1\n2. s p l i t l i n e s ( ) # [ 1, 2 ] ( s p l i t at l i n e boundaries \r, \n ) March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 28

Introduction to Python for Biologists Strings Strings methods III 1 seq = ACGtCCAgTnAGaaGT 2 3 # D e l e t i n g 4 seq. l s t r i p ( ) # remove leading whitespace characters 5 seq. r s t r i p ( ) # remove t r a i l i n g whitespace characters 6 seq. s t r i p ( ) # remove whitespace characters from both ends 7 8 seq. l s t r i p ( AC ) # GtCCAgTnAGaaGT ( remove C s or A s ) 9 seq. l s t r i p ( CA ) # GtCCAgTnAGaaGT ( remove C s or A s ) 10 seq. l s t r i p ( C ) # ACGtCCAgTnAGaaGT ( no impact ) 11 # same f o r r s t r i p but from the r i g h t and s t r i p from both ends 12 13 # Simple parsing of t e x t l i n e s from CSV f i l e s 14 l i n e. s t r i p ( ). s p l i t (, ) # remove newline and s p l i t CSV (\ t i f TSV) 15 16 # t r a n s l a t e ( case s e n s i t i v e ) 17 t a b l e = seq. maketrans ( atcg, tagc ) # map characters by index 18 seq. lower ( ). t r a n s l a t e ( t a b l e ) # t g c a g g t c a n t c t t c a March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 29

Introduction to Python for Biologists Exercise Introduction Running code Literals and variables Numeric types Strings Exercise Lists, tuples and ranges Sets and dictionaries Convert and copy Loops Functions Branching Regular Expressions Annexes March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 30

Introduction to Python for Biologists Exercise Exercise Create the following directory structure Dokumente python notebooks data Jupyter Notebook File: Literals.ipynb URL: https://cbdm.uni-mainz.de/mb17 Download the file into the notebooks folder March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 31

Introduction to Python for Biologists Lists, tuples and ranges Introduction Running code Literals and variables Numeric types Strings Exercise Lists, tuples and ranges Sets and dictionaries Convert and copy Loops Functions Branching Regular Expressions Annexes March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 32

Introduction to Python for Biologists Lists, tuples and ranges Sequence types Text sequence type: Strings: immutable sequences of characters Basic sequence types: Lists: mutable sequences Tuples: immutable sequences Ranges: immutable sequence of numbers Sequence operations: All sequence types support common sequence operations (slide 98) Mutable sequence types support specific operations (slide 99) March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 33

Introduction to Python for Biologists Lists, tuples and ranges Lists I A List is an ordered collection of objects 1 L i s t 1 = [ ] # an empty l i s t 2 3 L i s t 1 = [ b, a, 1, cat, K, dog, F ] 4 L i s t 1 [ 0 ] # b ( access item of index 0) 5 L i s t 1 [ 1 ] # a ( access item of index 1) 6 L i s t 1 [ 1] # F ( access the l a s t item ) 7 L i s t 1 [ 2] # dog ( access the second l a s t item ) 8 9 # S l i c e s [ s t a r t : end : step ] 10 L i s t 1 [ 2 : 5 ] # [ 1, cat, K ] ( index 5 excluded ) 11 L i s t 1 [ : 5 ] # [ b, a, 1, cat, K ] 12 L i s t 1 [ 5 : ] # [ dog, F ] 13 L i s t 1 [ 2:] # [ dog, F ] 14 L i s t 1 [ 0 : 5 : 2 ] # [ b, 1, K ] March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 34

Introduction to Python for Biologists Lists, tuples and ranges Lists II 1 # B u i l t i n f u n c t i o n s 2 L i s t 2 = [ 1, 2, 3, 4, 5] 3 len ( L i s t 1 ) # 5 ( l e n g t h = 7 items ) 4 max( L i s t 2 ) # 5 5 min ( L i s t 2 ) # 1 6 sum( L i s t 2 ) # 15 7 8 # L i s t methods 9 L i s t 2 = [ ] # empty l i s t 10 L i s t 2. append ( 1 ) # [ 1 ] 11 L i s t 2. append ( A ) # [ 1, A ] 12 L i s t 2. extend ( [ B, 2 ] ) # [ 1, A, B, 2] 13 L i s t 2. pop ( 2 ) # [ 1, A, 2] 14 L i s t 2. i n s e r t ( 3, A ) # [ 1, A, 2, A ] ( i n s e r t A at index 3) 15 L i s t 2. index ( A ) # 1 ( index of the 1 s t A ) 16 L i s t 2. count ( A ) # 2 ( number of A ) 17 L i s t 2. reverse ( ) # [ A, 2, A, 1] March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 35

Introduction to Python for Biologists Lists, tuples and ranges Lists III 1 # s o r t i n g 2 L i s t 3 = [ 5, 3, 4, 1, 2] 3 sorted ( L i s t 3 ) # [ 1, 2, 3, 4, 5] ( b u i l d a new sorted l i s t ) 4 L i s t 3 # [ 5, 3, 4, 1, 2] ( L i s t 3 not changed ) 5 L i s t 3. s o r t ( ) # modifies the l i s t in place 6 L i s t 3 # [ 1, 2, 3, 4, 5] (. s o r t ( ) did modify L i s t 3! ) 7 8 # nested l i s t / 2D l i s t s / t a b l e s 9 mylist = [ [ b, a ], 10 [ 1, cat ] ] # a l i s t of 2 l i s t s 11 mylist [ 0 ] # r e t u r n s the f i r s t l i s t [ b, a ] 12 mylist [ 0 ] [ 0 ] # b (1 s t item of the 1 s t l i s t ) 13 mylist [ 0 ] [ 1 ] # a (2 nd item of the 1 s t l i s t ) 14 mylist [ 1 ] # r e t u r n s the 2nd l i s t [ 1, cat ] 15 mylist [ 1 ] [ 0 ] = 1 0 # [ [ b, a ], [10, cat ] ] March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 36

Introduction to Python for Biologists Lists, tuples and ranges Lists IV 1 mylist = [ [ b, a ], 2 [ 1, cat ] ] 3 4 f o r s u b l i s t i n mylist : # loop over s u b l i s t s 5 f o r value i n s u b l i s t : # loop over values 6 p r i n t ( value ) # p r i n t 1 value per l i n e 7 # b 8 # a 9 # 10 10 # cat 11 12 f o r s u b l i s t i n mylist : # loop over s u b l i s t s 13 n e w s u b l i s t = map( s t r, s u b l i s t ) # convert each item to s t r i n g 14 p r i n t ( \ t. j o i n ( n e w s u b l i s t ) ) # p r i n t as TSV t a b l e 15 # b a 16 # 10 cat March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 37

Introduction to Python for Biologists Lists, tuples and ranges Tuples and ranges A Tuple is an ordered collection of objects 1 Tuple1 = ( ) # empty t u p l e 2 Tuple1 = ( b, a, 1, cat, K, dog, F ) # defined t u p l e 3 4 Tuple1 [ 0 ] # b 5 Tuple1 [ 1 : 3 ] # ( a, 1) ( index 3 excluded ) Ranges 1 # Range ( s t a r t, stop [, step ] ) 2 range (10) # range ( 0, 10) => no nice p r i n t method 3 l i s t ( range (10) ) # [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9] 4 l i s t ( range ( 0, 30, 5) ) # [ 0, 5, 10, 15, 20, 25] 5 l i s t ( range ( 0, 5, 1) ) # [ 0, 1, 2, 3, 4] 6 l i s t ( range ( 0 ) ) # [ ] 7 l i s t ( range ( 1, 0) ) # [ ] March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 38

Introduction to Python for Biologists Sets and dictionaries Introduction Running code Literals and variables Numeric types Strings Exercise Lists, tuples and ranges Sets and dictionaries Convert and copy Loops Functions Branching Regular Expressions Annexes March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 39

Introduction to Python for Biologists Sets and dictionaries Sets I A Set is a mutable unordered collection of objects 1 S0 = set ( ) # an empty set 2 S0 = { a, 1} # a new set of 2 items 3 S1 = { a, 1, b, R } # a new set of 4 items 4 S2 = { a, 1, b, S } # a new set of 4 items 5 len ( S0 ) # 2 6 7 # Operators 8 R i n S1 # True 9 R not i n S2 # True 10 S1 S2 # i n S1 but not i n S2 => { R } 11 S1 S2 # i n S1 or i n S2 => {1, a, S, R, b } 12 S1 & S2 # i n S1 and i n S2 => {1, b, a } 13 S1 ˆ S2 # i n S1 or i n S2 but not i n both => { R, S } 14 S0 <= S1 # S0 i s subset of S2 => True 15 S1 >= S2 # S1 i s superset of S2 => False 16 S1 >= S0 # True 17 S0. i s d i s j o i n t ( S1 ) # False March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 40

Introduction to Python for Biologists Sets and dictionaries Sets II 1 # Methods 2 S0. copy ( ) # r e t u r n a new set w ith a shallow copy of S0 3 S0. add ( item ) # add element item to the set 4 S0. remove ( item ) # remove element item from the set 5 S0. discard ( item ) # remove element item from the set i f present 6 S0. pop ( ) # remove and r e t u r n an a r b i t r a r y element 7 S0. c l e a r ( ) # remove a l l elements from the set March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 41

Introduction to Python for Biologists Sets and dictionaries Dictionaries I A Dictionary is a mutable indexed collection of objects (indexed by unique keys) 1 d = {} # empty d i c t i o n a r y 2 d = { A : ALA, C : CYS } # d i c t i o n a r y w ith 2 items 3 d [ A ] # ALA 4 d [ C ] # CYS 5 d [ H ]= HIS # add new item 6 d # { H : HIS, C : CYS, A : ALA } 7 del d [ A ] # { C : CYS, H : HIS } 8 9 C i n d # True ( key C i s i n d ) 10 A not i n d # True ( key A i s not i n d anymore ) March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 42

Introduction to Python for Biologists Sets and dictionaries Dictionaries II d[key] get value by key d[key] = val set value by key del d[key] delete item by key d.clear() delete all items len(d) number of items d.copy() make a shallow copy d.keys() return a view of all keys d.values() return a view of all values d.items() return a view of all items (key,value) d.update(d2) add all items from dictionary d2 d.get(key [, val]) get value by key if exists, otherwise val d.setdefaults(key [, val]) like d.get(k,val), also set d[k]=val if k not in d pop(key[, default]) remove key and return its value, return default otherwise. d.popitem() remove a random item and returns it as tuple Table: Functions for dictionaries March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 43

Introduction to Python for Biologists Convert and copy Introduction Running code Literals and variables Numeric types Strings Exercise Lists, tuples and ranges Sets and dictionaries Convert and copy Loops Functions Branching Regular Expressions Annexes March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 44

Introduction to Python for Biologists Convert and copy Converting types I Many Python functions are sensitive to the type of data. For example, you cannot concatenate a string with an integer: 1 sign = You are + 21 + years old # e r r o r!! 2 sign = You are + s t r ( 21) + years old # OK 3 sign # You are 21 years old 4 5 # convert to i n t ( from s t r or f l o a t ) 6 i n t ( 2014 ) # from a s t r i n g 7 i n t (3.141592) # from a f l o a t 8 9 # convert to f l o a t ( from s t r or i n t ) 10 f l o a t ( 1.99 ) # from a s t r i n g 11 f l o a t ( 5 ) # from an i n t e g e r March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 45

Introduction to Python for Biologists Convert and copy Converting types II 1 # convert to s t r ( from i n t, f l o a t, l i s t, tuple, d i c t and set ) 2 s t r (3.141592) # 3.141592 3 s t r ( [ 1, 2, 3, 4 ] ) # [ 1, 2, 3, 4 ] 4 5 # convert a sequence type to another 6 # ( s t r, l i s t, tuple, and set f u n c t i o n s ) 7 new set = set ( o l d l i s t ) # l i s t to set 8 new tuple = t u p l e ( o l d l i s t ) # l i s t to t u p l e 9 new set = set ( Hello ) # s t r i n g to set { H, o, e, l } 10 n e w l i s t = l i s t ( Hello ) # s t r i n g to l i s t [ H, e, l, l, o ] March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 46

Introduction to Python for Biologists Convert and copy Copy I Assignments (=) do not copy objects, they create bindings between a target and an object. 1 # Numeric types ( immutable ) 2 a = 1 # a binds the o b j e c t 1 3 b = a # b binds the o b j e c t 1 4 b = b + 1 # b binds a new o b j e c t created by the sum 5 a # 1 6 b # 2 7 8 # S t r i n g s ( immutable ) 9 a = Hello # a binds the o b j e c t Hello 10 b = a # b binds the o b j e c t Hello 11 a = a. replace ( o, o World! ) # a binds a new o b j e c t 12 a # Hello World! 13 b # Hello March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 47

Introduction to Python for Biologists Convert and copy Copy II For collections that are mutable or contain mutable items, a shallow copy is sometimes needed so one can change one copy without changing the other. 1 # D i c t i o n a r y ( mutable ) 2 d1 = { A : ALA, C : CYS } # d1 binds the o b j e c t 3 d2 = d1 # d2 binds the o b j e c t 4 d2 [ H ]= HIS # add item to the o b j e c t 5 d1 # { A : ALA, H : HIS, C : CYS } 6 d2 # { A : ALA, H : HIS, C : CYS } 7 8 d2 = d1. copy ( ) # d2 binds a shallow copy of the o b j e c t 9 d2 [ P ]= PRO # add item to the copied o b j e c t 10 d1 # { A : ALA, H : HIS, C : CYS } 11 d2 # { A : ALA, H : HIS, P : PRO, C : CYS } March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 48

Introduction to Python for Biologists Convert and copy Copy III 1 # L i s t ( mutable ) 2 l 1 = [ A, H, C ] 3 l 2 = l 1 4 l 2. append ( P ) 5 l 1 # [ A, H, C, P ] 6 l 2 # [ A, H, C, P ] 7 8 l 2 = l 1 [ : ] # shallow copy by assigning a s l i c e of the a l l l i s t 9 l 2. append ( V ) 10 l 1 # [ A, H, C, P ] 11 l 2 # [ A, H, C, P, V ] March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 49

Introduction to Python for Biologists Convert and copy Copy IV Convert types to get copies 1 n e w l i s t = l i s t ( o l d l i s t ) # shallow copy 2 new dict = d i c t ( o l d d i c t ) # shallow copy 3 new set = set ( o l d l i s t ) # copy l i s t as a set 4 new tuple = t u p l e ( o l d l i s t ) # copy l i s t a t u p l e The copy module 1 import copy 2 x. copy ( ) # shallow copy of x 3 x. deepcopy ( ) # deep copy of x, i n c l u d i n g embedded o b j e c t s March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 50

Introduction to Python for Biologists Loops Introduction Running code Literals and variables Numeric types Strings Exercise Lists, tuples and ranges Sets and dictionaries Convert and copy Loops Functions Branching Regular Expressions Annexes March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 51

Introduction to Python for Biologists Loops For loop I 1 # For items i n a l i s t 2 f o r person i n [ I s a b e l, Kate, Michael ] : 3 p r i n t ( Hi, person ) 4 # Hi I s a b e l 5 # Hi Kate 6 # Hi Michael 7 8 # For items i n a d i c t i o n a r y 9 seq = # an empty s t r i n g 10 d = { A : ALA, C : CYS } # a d i c t i o n a r y w ith 2 keys 11 f o r k i n d. keys ( ) : # loop over the keys 12 seq += d [ k ] # append value to seq 13 p r i n t ( seq ) # CYSALA March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 52

Introduction to Python for Biologists Loops For loop II 1 # For items i n a s t r i n g 2 f o r c i n abc : 3 p r i n t ( c ) 4 # a 5 # b 6 # c 7 8 # For items i n a range 9 f o r n i n range ( 3 ) : 10 p r i n t ( n ) 11 # 0 12 # 1 13 # 2 14 15 # For items from any i t e r a t o r 16 f o r n i n i t e r a t o r : 17 p r i n t ( n ) March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 53

Introduction to Python for Biologists Loops Enumerate 1 # loop g e t t i n g index and value 2 RNAs = [ mirna, trna, mrna ] 3 f o r i, rna i n enumerate (RNAs) : 4 p r i n t ( i, rna ) 5 # 0 mirna 6 # 1 trna 7 # 2 mrna 8 9 # loop over 2 l i s t s 10 RNAtypes = [ micro, t r a n s f e r, messenger ] 11 f o r i, t i n enumerate ( RNAtypes ) : 12 r = RNAs[ i ] 13 p r i n t ( i, t, r ) 14 # 0 micro mirna 15 # 1 t r a n s f e r trna 16 # 2 messenger mrna March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 54

Introduction to Python for Biologists Loops While loop 1 i =0 2 value=1 3 while value <200: 4 i +=1 5 value = i 6 p r i n t ( i, value ) 7 # 1 1 8 # 2 2 9 # 3 6 10 # 4 24 11 # 5 120 12 # 6 720 March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 55

Introduction to Python for Biologists Introduction Running code Literals and variables Numeric types Strings Exercise Lists, tuples and ranges Sets and dictionaries Convert and copy Loops Functions Branching Regular Expressions Annexes March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 56

Introduction to Python for Biologists Exercise URL https://cbdm.uni-mainz.de/mb17 Jupyter Notebook File: Sequences.ipynb Download the file into the notebooks folder Data file File: shrub dimensions.csv Download the file into the data folder March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 57

Introduction to Python for Biologists Functions Introduction Running code Literals and variables Numeric types Strings Exercise Lists, tuples and ranges Sets and dictionaries Convert and copy Loops Functions Branching Regular Expressions Annexes March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 58

Introduction to Python for Biologists Functions Functions I 1 from random import choice # import f u n c t i o n choice 2 3 # Simple f u n c t i o n 4 def kmerfixed ( ) : # d e f i n e f u n c t i o n kmerfixed 5 p r i n t ( ACGTAGACGC ) # p r i n t predefined s t r i n g 6 7 kmerfixed ( ) # d i s p l a y ACGTAGACGC 8 9 # Returning a value 10 def kmer10 ( ) : # d e f i n e f u n c t i o n kmer10 11 seq= # d e f i n e an empty s t r i n g 12 f o r count i n range ( 10) : # repeat 10 times 13 seq += choice ( CGTA ) # add 1 random nt to s t r i n g 14 r e t u r n ( seq ) # r e t u r n s t r i n g 15 16 newkmer = kmer10 ( ) # get r e s u l t of f u n c t i o n i n t o v a r i a b l e 17 p r i n t ( newkmer ) # c a l l the f u n c t i o n e. g. ACGGATACGC March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 59

Introduction to Python for Biologists Functions Functions II 1 # One parameter 2 def kmer ( k ) : # defin e kmer with 1 param. k 3 seq= 4 f o r count i n range ( k ) : # k i s used to d e f i n e the range 5 seq+=choice ( CGTA ) 6 r e t u r n ( seq ) 7 8 p r i n t ( kmer ( k =4) ) # e. g. TACC 9 p r i n t ( kmer (20) ) # e. g. CACAATGGGTACCCCGGACC 10 p r i n t ( kmer ( 0 ) ) # 11 p r i n t ( kmer ( ) ) # TypeError : kmer ( ) missing 1 r e q u i r e d 12 # p o s i t i o n a l argument : k March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 60

Introduction to Python for Biologists Functions Functions III 1 # Parameters with more parameters and d e f a u l t values 2 def generic kmer ( alphabet= ACGT, k=10) : 3 seq= 4 f o r count i n range ( k ) : 5 seq+=choice ( alphabet ) 6 r e t u r n ( seq ) 7 8 generic kmer ( AB12, 15) # e. g. 112AA1A12AA1121 9 generic kmer ( AB12 ) # e. g. 1AA1B1BA2A 10 generic kmer ( k=20) # e. g. GTGGGCTTGTGCCCTGCACT 11 generic kmer ( ) # e. g. CTTGCCGGGA 12 generic kmer ( k =8, alphabet= #$%& ) # e. g. $$#&%$%$ March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 61

Introduction to Python for Biologists Functions Name spaces I Variable and function names defined globally can be seen in functions: this is the global namespace 1 a = 10 # g l o b a l v a r i a b l e 2 3 def my function ( ) : 4 p r i n t ( a ) # w i l l use the g l o b a l v a r i a b l e 5 6 my function ( ) # 10 ( the g l o b a l a ) 7 p r i n t ( a ) # 10 ( the g l o b a l a ) March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 62

Introduction to Python for Biologists Functions Name spaces II Names defined within a function can not be seen outside: the function has its own namespace. 1 a = 10 # g l o b a l v a r i a b l e 2 3 def my function ( ) : 4 a = 1 # l o c a l v a r i a b l e defined by assignment 5 b = 2 # l o c a l v a r i a b l e defined by assignment 6 p r i n t ( a ) 7 8 my function ( ) # 1 ( the l o c a l a ) 9 p r i n t ( a ) # 10 ( the g l o b a l a ) 10 p r i n t ( b ) # NameError : name b i s not defined March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 63

Introduction to Python for Biologists Functions Name spaces III Use parameters and returned values to get and set variables outside the name space 1 a = 10 # g l o b a l v a r i a b l e 2 3 def my function ( v a l ) : # l o c a l v a r i a b l e v a l 4 b = 2 5 v a l = v a l + b 6 r e t u r n ( v a l ) 7 p r i n t ( a ) # 10 ( the g l o b a l a ) 8 p r i n t ( my function ( a ) ) # 12 9 p r i n t ( a ) # 10 ( the g l o b a l a unchanged ) 10 11 c = my function ( a ) # set v a l to 10 and assign 10+2 to c 12 p r i n t ( c ) # 12 ( g l o b a l a was changed ) 13 p r i n t ( a ) # 10 ( g l o b a l a was unchanged ) 14 15 a = my function ( a ) # change g l o b a l a w ith value 10+2 March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 64

Introduction to Python for Biologists Branching Introduction Running code Literals and variables Numeric types Strings Exercise Lists, tuples and ranges Sets and dictionaries Convert and copy Loops Functions Branching Regular Expressions Annexes March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 65

Introduction to Python for Biologists Branching Truth Value Testing I Any object can be tested for truth value. The following values are considered false (other values are considered True): None False zero value: e.g. 0 or 0.0 an empty sequence or mapping: e.g., (), [ ], { }. Operations and built-in functions that have a Boolean result always return 0 for False and 1 for True March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 66

Introduction to Python for Biologists Branching Boolean Operations I A Boolean is equal to True or False a and b (true if a and b are true, false otherwise) a or b (true if a or b is true (1 alone or both), false otherwise) a ˆ b (true if either a or b is true (not both), false otherwise) not b (true if b is false, false otherwise) March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 67

Introduction to Python for Biologists Branching Boolean Operations II All example code for tests below return True unless otherwise specified 1 # l e t set values of 3 v a r i a b l e s ( s i n g l e = symbol ) 2 a = True 3 b = False 4 c = True 5 6 7 # simple t e s t s using two = symbols (==) 8 a == True 9 b == False 10 c == True March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 68

Introduction to Python for Biologists Branching Boolean Operations III 1 # l e t set values of 3 v a r i a b l e s ( one = symbol ) 2 a = True 3 b = False 4 c = True 5 6 # order i s i r r e l e v a n t 7 ( a or b ) == ( b or a ) 8 ( a and b ) == ( b and a ) 9 10 # n e u t r a l ( whatever value of a ) 11 ( a or False ) == a 12 ( a and True ) == a 13 14 # always the same ( whatever value of a ) 15 ( a and False ) == False 16 ( a or True ) == True March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 69

Introduction to Python for Biologists Branching Boolean Operations IV 1 # l e t set values of 3 v a r i a b l e s ( one = symbol ) 2 a = True 3 b = False 4 c = True 5 6 # precedence == > not > and > or 7 ( a and b or c ) == ( ( a and b ) or c ) 8 ( not a == b ) == ( not ( a == b ) ) 9 10 # e q u i v a l e n t expressions 11 ( ( a or b ) or c ) == ( a or ( b or c ) ) == ( a or b or c ) 12 ( a or a or a ) == a 13 ( b and b and b ) == b 14 15 b and b and b == b # False and False and True => False!! 16 17 a and ( b or c ) == ( a and b ) or ( a and c ) 18 a or ( b and c ) == ( a or b ) and ( a or c ) March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 70

Introduction to Python for Biologists Branching Comparisons 1 Operations 2 < # s t r i c t l y l ess than 3 <= # less than or equal 4 > # s t r i c t l y g r e a t e r than 5 >= # g r e a t e r than or equal 6 == # equal ( two symbols =) 7 math. i s c l o s e ( a, b ) # equal f o r f l o a t i n g p o i n t s a and b 8! = # not equal 9 i s # o b j e c t i d e n t i t y 10 i s not # negated o b j e c t i d e n t i t y 11 x < y <= z # i s e q u i v a l e n t to x < y and y <= z Comparisons between objects of same class are supported if operator defined for the class. Different numerical types can be compared: e.g. 2<4.56 Floating points can not be compared exactly due to the limited precision to represent infinite numbers such as 1/3 = 0.33333... March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 71

Introduction to Python for Biologists Branching Conditionals IF-ELIF-ELSE 1 seq = ATGAnnATG 2 i f n i n seq : 3 p r i n t ( sequence contains undefined bases ( n ) ) 4 e l i f x i n seq : 5 p r i n t ( sequence contains unknown bases x but not n ) 6 else : 7 p r i n t ( no undefined bases i n sequence ) 8 9 # 10 # sequence contains undefined bases ELIF and ELSE are optional multiple ELIF are possible March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 72

Introduction to Python for Biologists Introduction Running code Literals and variables Numeric types Strings Exercise Lists, tuples and ranges Sets and dictionaries Convert and copy Loops Functions Branching Regular Expressions Annexes March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 73

Introduction to Python for Biologists Exercise URL https://cbdm.uni-mainz.de/mb17 Jupyter Notebook File: Conditionals.ipynb Download the file into the notebooks folder March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 74

Introduction to Python for Biologists Regular Expressions Introduction Running code Literals and variables Numeric types Strings Exercise Lists, tuples and ranges Sets and dictionaries Convert and copy Loops Functions Branching Regular Expressions Annexes March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 75

Introduction to Python for Biologists Regular Expressions RE: Regular Expressions I Regular expressions (called REs, or regexes, or regex patterns) are a powerful language for matching text patterns (re module) In Python a regular expression search is typically written as: 1 match = re. search ( expression, s t r i n g ) The re.search() method takes a regular expression pattern and a string and searches for that pattern within the string. If the search is successful, re.search() returns a Match object (actually class sre.sre Match ) or None otherwise. March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 76

Introduction to Python for Biologists Regular Expressions RE: Regular Expressions II 1 import re # import re module 2 s t r = an example word : cat!! # Example s t r i n g 3 match = re. search ( r word : \w\w\w, s t r ) # Search a p a t t e r n 4 i f match : 5 p r i n t ( found, match. group ( ) ) # found word : cat 6 else : 7 p r i n t ( did not f i n d ) In the pattern string, \w codes a character (letter, digit or underscore) The r at the start of the pattern string designates a python raw string which passes through backslashes without change. March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 77

Introduction to Python for Biologists Regular Expressions RE: Basic Patterns Pattern Match a, X, 9, < ordinary characters match themselves exactly. a period matches any single character except newline \w matches a word character: a letter or digit or underbar [a-za-z0-9 ] \W matches any non-word character \b boundary between word and non-word \s a single whitespace character space, newline, return, tab, form [\n \r \t \f] \S matches any non-whitespace character \t tab \n newline \r return \d decimal digit [0-9] ˆ circumflex (top hat) matches the start of a string $ dollar matches the end of a string \ inhibits the specialness of a character. So, for example, use \. to match a period Table: Regular expressions: basic patterns March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 78

Introduction to Python for Biologists Regular Expressions RE: Basic examples I The basic rules of RE search for a pattern within a string are: The search proceeds through the string from start to end, stopping at the first match found All of the pattern must be matched, but not all of the string If match = re.search(pat, str) is successful, match is not None and in particular match.group() is the matching text March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 79

Introduction to Python for Biologists Regular Expressions RE: Basic examples II 1 match = re. search ( r i i i, p i i i g ) # found 2 match. group ( ) == i i i # True 3 4 match = re. search ( r i g s, p i i i g ) # not found 5 match == None # True 6 7 match = re. search ( r.. g, p i i i g ) # found 8 match. group ( ) == i i g # True 9 10 match = re. search ( r \d\d\d, p123g ) # found 11 match. group ( ) == 123 # True 12 13 match = re. search ( r \w\w\w, @@abcd!! ) # found 14 match. group ( ) == abc # True March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 80

Introduction to Python for Biologists Regular Expressions RE: Repetitions I Repetitions are defined using +, *,? and { } + means 1 or more occurrences of the pattern to its left e.g. i+ = one or more i s * means 0 or more occurrences of the pattern to its left? means match 0 or 1 occurrences of the pattern to its left curly brackets are used to specify exact number of repetitions e.g. A{5} for 5 A letters A{6,10} for 6 to 10 A letters Leftmost and Largest: First the search finds the leftmost match for the pattern, and second it tries to use up as much of the string as possible i.e. + and * go as far as possible (they are said to be greedy ). March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 81

Introduction to Python for Biologists Regular Expressions RE: Repetitions II 1 # simple r e p e t i t i o n s 2 re. search ( r p i +, p i i i g ). group ( ) # p i i i 3 re. search ( r p i?, ap ). group ( ) # p 4 re. search ( r p i?, a p i i ). group ( ) # p i 5 re. search ( r p i, ap ). group ( ) # p 6 re. search ( r p i, a p i i ). group ( ) # p i i 7 re. search ( r p i {3}, a p i i i i i ). group ( ) # p i i i 8 re. search ( r i +, p i i g i i i i ). group ( ) # i i (1 s t h i t only ) 9 10 # 3 d i g i t s p o s s i b l y separated by whitespaces (\ s ) 11 re. search ( r \d\s \d\s \d, xx1 2 3xx ). group ( ) # 1 2 3 12 re. search ( r \d\s \d\s \d, xx12 3xx ). group ( ) # 12 3 13 re. search ( r \d\s \d\s \d, xx123xx ). group ( ) # 123 March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 82

Introduction to Python for Biologists Regular Expressions RE: Sets of characters I Square brackets indicate a set of characters [ABC] matches A or B or C. The codes \w, \s etc. work inside square brackets too with the one exception that dot (.) just means a literal dot Dash indicate a range or itself if put at the end [a-z] for lowercase alphabetic characters [a-za-z] for alphabetic characters [AB-] for A, B or dash Circumflex (ˆ) at the start inverts the set [ˆAB] for any character except A or B. March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 83

Introduction to Python for Biologists Regular Expressions RE: Sets of characters II 1 s t r = purple a l i c e b@google. com monkey dishwasher 2 match = re. search ( r \w+@\w+, s t r ) 3 i f match : 4 p r i n t match. group ( ) ## b@google 5 6 match = re. search ( r [ \w. ]+@[ \w. ]+, s t r ) 7 i f match : 8 p r i n t match. group ( ) ## a l i c e b@google. com March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 84

Introduction to Python for Biologists Regular Expressions RE: Functions I RE module functions: re.match() returns a Match object if occurrence found at begining of string, None otherwise re.search() returns a Match object for 1st occurrence, None if not found re.findall() returns a list of matched sub strings, an empty list if not found re.finditer() returns an iterator on Match objects of the occurrences, an empty iterator if not found Match object methods: match.start() returns start index match.end() returns end index match.span() returns start and end index in a tuple match.group() returns matched string March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 85

Introduction to Python for Biologists Regular Expressions RE: Functions II 1 import re 2 seq = RPAPPDRAPDQX # A sequence 3 expr = A.{ 1, 2}D # A and D separated by 1 or 2 characters 4 5 match = re. search ( expr, seq ) 6 i f match : 7 p r i n t ( 8 match. s t a r t ( ), # s t a r t index 9 match. end ( ), # end index 10 match. span ( ), # s t a r t and end index 11 match. group ( ), # the matched s t r i n g 12 seq [ match. s t a r t ( ) : match. end ( ) ], # the matched s t r i n g 13 sep= 14 ) 15 # 2 6 ( 2, 6) APPD APPD March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 86

Introduction to Python for Biologists Regular Expressions RE: Functions III 1 import re 2 seq = RPAPPDRAPDQX # A sequence 3 expr = A.{ 1, 2}D # A and D separated by 1 or 2 characters 4 5 match = re. match ( expr, seq ) # Not found at begining 6 p r i n t ( match ) 7 # None 8 9 matches = re. f i n d a l l ( expr, seq ) # Found 2 occurrences 10 p r i n t ( matches ) 11 # [ APPD, APD ] 12 13 matches = re. f i n d i t e r ( expr, seq ) # Found 2 occurrences 14 f o r m i n matches : # I t e r a t e over Match o b j e c t s 15 p r i n t ( m. span ( ), m. group ( ) ) # Use each Match o b j e c t 16 # ( 2, 6) APPD 17 # ( 7, 10) APD March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 87

Introduction to Python for Biologists Regular Expressions RE: Group Extraction Groups are defined with parentheses On a successful search match.group(): the whole match text match.group(1): match text of 1st left parenthesis match.group(2): match text of 2nd left parenthesis... 1 import re 2 s t r = purple a l i c e b@google. com monkey dishwasher 3 match = re. search ( ( [ \w. ]+)@( [ \w. ]+), s t r ) 4 i f match : 5 p r i n t ( match. group ( ) ) ## a l i c e b@google. com 6 p r i n t ( match. group ( 1 ) ) ## a l i c e b 7 p r i n t ( match. group ( 2 ) ) ## google. com March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 88

Introduction to Python for Biologists Regular Expressions RE: Group Extraction and Findall If the pattern includes a single set of parenthesis, then findall() returns a list of strings corresponding to that single group If the pattern includes 2 or more parenthesis groups, then instead of returning a list of strings, findall() returns a list of tuples. Each tuple represents one match of the pattern, and inside the tuple is the group(1), group(2)... data. 1 s t r = alice@google. com, monkey bob@abc. com dishwasher 2 t u p l e s = re. f i n d a l l ( r ( [ \w\. ]+)@( [ \w\. ]+), s t r ) 3 p r i n t ( t u p l e s ) 4 # [ ( a l i c e, google. com ), ( bob, abc. com ) ] 5 6 f o r t i n t u p l e s : 7 p r i n t ( t [ 0 ], t [ 1 ], sep= ) 8 # a l i c e google. com 9 # bob abc. com March 21, 2017 Johannes Gutenberg-Universität Mainz Taškova & Fontaine 89