Classification of regulatory sequences on the basis of motifs


1 LIFL - 3 March 2003. Classification of regulatory sequences on the basis of motifs. Jacques van Helden, Jacques.van.Helden@ulb.ac.be

2 Regulatory Sequence Analysis Regulatory regions and regulatory elements Jacques van Helden

3 The non-coding genome
[Table columns: Organism, Year, Size (Mb), Genes, Genes/Mb, Coding (%), Non-coding (%), Repetitive (%), Transcribed (%)]
Organisms listed: Mycoplasma genitalium, Haemophilus influenzae, Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens

4 Transcriptional activation
[Figure labels: transcriptional activator (activation domain, DNA-binding domain), enhancer, RNA polymerase, initiation]

5 Transcription factor-DNA interfaces
[Figure: panels A, B, C, D]

6 Regulatory sites: string description
Binding sites for the yeast Pho4p transcription factor (Source: Oshima et al. Gene 179, 1996)
[Sequences as extracted; capital C residues were lost.]
Gene    Site name   Sequence                  Affinity
PHO5    UASp2       ---ataaagtgggatag-        high
PHO84   Site D      ---TTTAGAGTGGGGGGA--      high
PHO81   UAS         ----TTATGGAGTGGAATAA--    high
PHO8    Proximal    GTGATGTGAGTGGGA---        high
PHO5    UASp3       --TAATTTGGATGTGGATT--     low
PHO84   Site C      -----AGTAGTGGAATAT--      low
PHO84   Site A      -----TTTATAGTGAATTTTT     low
group 1 consensus   gagtgggac-----            high-low
PHO5    UASp1       --TAAATTAGAGTTTTG----     medium
PHO84   Site E      ----AATAGAGTTTTTAATTA     medium
PHO84   Site B      -----TTAGAGTTGGTGTG--     low
PHO8    Distal      ---TTAGAGTTAATAT---       low
group 2 consensus   cgagttt                   med-low
Degenerate consensus GAGTKKk

7 IUPAC ambiguous nucleotide code
Code  Bases         Meaning
A     A             Adenine
C     C             Cytosine
G     G             Guanine
T     T             Thymine
R     A or G        purine
Y     C or T        pyrimidine
W     A or T        Weak hydrogen bonding
S     G or C        Strong hydrogen bonding
M     A or C        aMino group at common position
K     G or T        Keto group at common position
H     A, C or T     not G
B     G, C or T     not A
V     G, A or C     not T
D     G, A or T     not C
N     G, A, C or T  any
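A degenerate consensus written with this code can be expanded into a regular expression and searched in a sequence. The sketch below is a minimal illustration in Python; the function names `iupac_to_regex` and `find_sites` and the test sequence are hypothetical and do not come from the slides.

```python
import re

# IUPAC ambiguity codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}

def iupac_to_regex(pattern):
    """Translate an IUPAC string into a regular expression over A, C, G, T."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)

def find_sites(pattern, sequence):
    """Return the 0-based start positions of all (overlapping) matches."""
    # The lookahead makes overlapping occurrences visible to finditer.
    regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    return [match.start() for match in regex.finditer(sequence.upper())]

# Hypothetical sequence containing one CACGTG and one CACGTT site.
positions = find_sites("CACGTK", "TTCACGTGGACACGTTT")
print(positions)
```

Only one strand is searched here; a real scan of regulatory regions would usually also search the reverse complement.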

8 Regulatory sites: matrix description
Alignment matrix: binding site for the yeast Pho4p transcription factor (Source: Transfac matrix F$PHO4_01)
[Table: rows are the bases A, C, G, T; columns are the positions (Pos); consensus row as extracted: V A G T K B]

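9 Position-weight matrix
[Table: Prior and per-position (Pos) weights for residues A, C, G, T, with column Sum]

$$W_{i,j} = \ln\left(\frac{f'_{i,j}}{p_i}\right) \qquad f'_{i,j} = \frac{n_{i,j} + p_i k}{\sum_{i=1}^{A} n_{i,j} + k}$$

A: alphabet size (=4)
n_{i,j}: number of occurrences of residue i at position j
p_i: prior residue probability for residue i
f_{i,j}: relative frequency of residue i at position j
k: pseudo weight (arbitrary, 1 in this case)
f'_{i,j}: corrected frequency of residue i at position j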
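The weight formula can be sketched in a few lines of Python. This is a minimal illustration, assuming equal-length aligned sites and a dictionary of prior probabilities; the function name `position_weight_matrix` and the toy alignment are hypothetical, not the Pho4p data from the slides.

```python
import math

def position_weight_matrix(sites, priors, pseudo=1.0):
    """Return weights[residue][j] = ln(f'_ij / p_i) for a set of aligned sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    weights = {residue: [] for residue in priors}
    for j in range(width):
        column = [site[j].upper() for site in sites]
        counts = {residue: column.count(residue) for residue in priors}
        total = sum(counts.values())  # sum over residues of n_ij
        for residue, prior in priors.items():
            # Corrected frequency f'_ij = (n_ij + p_i * k) / (sum_i n_ij + k)
            corrected = (counts[residue] + prior * pseudo) / (total + pseudo)
            weights[residue].append(math.log(corrected / prior))
    return weights

# Hypothetical toy alignment with uniform priors and pseudo weight k = 1.
toy_sites = ["CACGTG", "CACGTT", "CACGTG", "CACGTG"]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
pwm = position_weight_matrix(toy_sites, uniform, pseudo=1.0)
print(round(pwm["C"][0], 3), round(pwm["A"][0], 3))
```

The pseudo weight keeps the logarithm finite for residues never observed at a position: with k = 1, an absent residue gets a corrected frequency of p_i / (n + 1) instead of 0.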

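10 Scanning a sequence with a profile matrix
Ex: sequence ...gtgagtgg..
[Figure: the position-specific scoring matrix (rows A, C, G, T) is aligned with successive positions of the sequence; at each scanning position the weights of the aligned residues are summed (SUM) to give the score]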
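The scanning step can be sketched as follows: the matrix is slid along the sequence and the weights of the aligned residues are summed at each position. The matrix `toy` below is a hypothetical 3-column example with integer weights, and the names `scan` and `top_scores` are assumptions made for this illustration; the sequence is the one quoted on the slide.

```python
def scan(sequence, weights):
    """Return (start, score) for every window of the sequence."""
    width = len(next(iter(weights.values())))
    seq = sequence.upper()
    results = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Score = sum of the weights of the residues aligned with each column.
        score = sum(weights[base][j] for j, base in enumerate(window))
        results.append((start, score))
    return results

def top_scores(sequence, weights, n=3):
    """Return the n highest scores, as used for the '3 top scores per matrix'."""
    return sorted((score for _, score in scan(sequence, weights)), reverse=True)[:n]

# Hypothetical matrix favouring the trinucleotide AGT (not a real PSSM).
toy = {
    "A": [1, -1, -1],
    "C": [-1, -1, -1],
    "G": [-1, 2, -1],
    "T": [-1, -1, 1],
}
print(scan("GTGAGTGG", toy))
print(top_scores("GTGAGTGG", toy))
```

This sketch scans a single strand and assumes the sequence contains only A, C, G and T.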

11 Differences between species
Feature: yeast / E. coli / higher organisms
location: upstream / upstream, overlapping initiation / upstream, downstream, within introns
distance range: -800 to -1 bp / -400 to +50 bp / over 100s of Kb
position effect: often irrelevant / often essential / often irrelevant
strand: insensitive / sensitive or symmetric / insensitive
most common core: ~5-8 conserved bp / spaced pair of 3 nt / ~5-8 conserved bp
repeated sites: occasional / rare / frequent
composite elements: frequent

12 Matrix search : matching positions

13 Regulatory Sequence Analysis Regulation of phosphate and methionine metabolism in the yeast Saccharomyces cerevisiae Jacques van Helden

14 Phosphate utilization in yeast
[Figure: regulatory network. Recoverable labels: PHO2 and PHO4 code for Pho2p and Pho4p, which up-regulate the expression of PHO3; PHO5, 11, 12; PHO84, 86, 87, 88, 89; PHO81; PHO8. Gene products: acid phosphatase (secreted to the extracellular space), alkaline phosphatase, Pi transporter (facilitates Pi transport). The phosphatases catalyze: orthophosphoric monoester + H2O -> alcohol + Pi. Pho81p inhibits the Pho85-Pho80 complex (cytoplasm); Pho80p and Pho85p; Pho4p-Phosphate; Pho4p is translocated to the nucleus.]

15 Pho4p binding sites
[Table columns: gene, start, end, sequence. Gene numbers and coordinates were lost; sequences as extracted, capital C residues were lost.]
PHO       GATAAGTGGGATA
PHO       GATAAGTGGGA
PHO       TGGATAAGTGGGATAGA
PHO       TGGGAGTGAGGAT
PHO       ATATTAAGGTGGGGTAA
PHO       TTATGGAGTGGAATAA
PHO       TTTAGAGTGGGGGG
PHO       TAGTTAGTGGAGTG
PHO       aaaagtgtAGTGataaaaat
PHO       TTAAAAAGTGGTATTA
PHO       TTAGAGTTGGTGTG
PHO       AATTAGAGTTTTGATA
PHO5 (?)  (?)..AAATTAGAGTTTG
PHO       TAAATTAGAGTTTTGATAGA

16 Pho4p binding specificity - matrix descriptions
[Three matrices, each with rows A, C, G, T: Pho4p, Pho4p.cacgtg, Pho4p.cacgtt]

17 Methionine biosynthesis in S. cerevisiae
[Figure: pathway diagram. Recoverable steps:]
Aspartate biosynthesis -> L-Aspartate
L-Aspartate -> L-aspartyl-4-P: Aspartate kinase (HOM3); ATP -> ADP
L-aspartyl-4-P -> L-aspartic semialdehyde: Aspartate semialdehyde dehydrogenase (HOM2); NADPH -> NADP+; Pi
L-aspartic semialdehyde -> L-Homoserine: Homoserine dehydrogenase (HOM6); NADPH -> NADP+ (branch: Threonine biosynthesis)
L-Homoserine -> O-acetyl-homoserine: Homoserine O-acetyltransferase (MET2); Acetyl-CoA -> CoA
O-acetyl-homoserine + Sulfide (from Sulfur assimilation) -> Homocysteine: O-acetylhomoserine (thiol)-lyase (MET17) (branch: Cysteine biosynthesis)
Homocysteine -> L-Methionine: Methionine synthase (vit B12-independent) (MET6); 5-methyltetrahydropteroyltri-L-glutamate -> 5-tetrahydropteroyltri-L-glutamate
L-Methionine -> S-Adenosyl-L-Methionine: S-adenosyl-methionine synthetase I (SAM1) and II (SAM2); H2O, ATP -> Pi, PPi
Regulators: Met31p, Met32p (MET31, MET32); Cbf1p/Met4p/Met28p complex (CBF1, MET4, MET28); Gcn4p (GCN4); Met30p (MET30)

18 Met4p binding sites
[Table columns: gene, start, end, sequence. Gene numbers and coordinates were lost; sequences as extracted, capital C residues were lost. The slide ends with a count matrix with rows A, C, G, T.]
MET   GAAAAGTAGTGTAATTT
MET   AAAAGGTAGTGAAGA
MET   TAATTTAGTGATAAT
MET   ATATTTAGTGGTAGT
ECM   ATTTATAGTGGTATT
ECM   TTTGTAGTGATATTT
MET   AAAGTGAGTTAT
MET   TAGAAGAGTGAAAA
MET   GTATTTTAGTGATGG
MET   TAATAATAGTGATATTT
MET   AAATGGAGTGAAGTGT
MET   TTGAGGTAATGATGA
MET   GAATAGTGAATT
MET   AATATTTAGTGATTA
SAM   TTAAGTGATATAA
SAM   TTTAATGTGATTAT

19 Met31p binding sites
[Table columns: gene, start, end, sequence. Gene numbers and coordinates were lost; sequences as extracted, capital C residues were lost. The slide ends with a count matrix with rows A, C, G, T.]
MET   TAAAAAATGTGGAATGG
MET   TGAAAAAATTGTGGATGA
MET   TATGAAAATGTGTAAATA
MET   GTGAAAATGTGGTAGTA
SAM   GTTGAAAATGTGGGTTTT
SAM   AAGGAAAATGTGGTGGG
MET   ATAAGAAATGTGGGTTAT
MUP   GGAAAAAATGTGGGTG
MET   GGAAAAAAAATGTGAAAATG
MET   ATAATAAATGTGAAGGA
MET   AAAAGAAGTTTTAAA
MET   TAAAAAGTTTTGGGG
MET   TTTGTGAGTTTTATTG
MET   GGGAAGAAGTTTGGGG
MET   TATGAATGTTTAGTG

20 Training and validation set
There is a subset of genes which can be assigned to predefined classes on the basis of prior information (e.g. known regulation). These classes will be used as the criterion variable.
Note: the sample class column might contain some errors.

Phosphate-responding genes
#    ORF        Gene name  Family
1    YBR093C    PHO5       PHO
2    YDR481C    PHO8       PHO
3    YAR071W    PHO11      PHO
4    YHR215W    PHO12      PHO
5    YOL001W    PHO80      PHO
6    YGR233C    PHO81      PHO
7    YML123C    PHO84      PHO
8    YPL031C    PHO85      PHO
9    YJL117W    PHO86      PHO
10   YCR037C    PHO87      PHO
11   YBR106W    PHO88      PHO
12   YBR296C    PHO89      PHO
13   YHR136C    SPL2       PHO

Methionine-responding genes
#    ORF        Gene name  Family
14   YBR213W    MET8       MET
15   YDR253C    MET32      MET
16   YDR502C    SAM2       MET
17   YER091C    MET6       MET
18   YFR030W    MET10      MET
19   YHL036W    MUP3       MET
20   YIL046W    MET30      MET
21   YIR017C    MET28      MET
22   YJR010W    MET3       MET
23   YJR137C    ECM17      MET
24   YKL001C    MET14      MET
25   YKR069W    MET1       MET
26   YLR180W    SAM1       MET
27   YLR303W    MET17      MET
28   YLR396C    VPS33      MET
29   YNL241C    ZWF1       MET
30   YNL277W    MET2       MET
31   YOL064C    MET22      MET
32   YPL038W    MET31      MET

Control genes
#    ORF        Gene name  Family
33   YAL038W    CDC19      CTL
34   YBL005W    PDR3       CTL
35   YBL005W-A  YBL005W-A  CTL
36   YBL005W-B  YBL005W-B  CTL
37   YBL030C    PET9       CTL
38   YBR006W    UGA5       CTL
39   YBR018C    GAL7       CTL
40   YBR020W    GAL1       CTL
41   YBR115C    LYS2       CTL
42   YBR184W    YBR184W    CTL
43   YCL018W    LEU2       CTL
44   YDL131W    LYS21      CTL
45   YDL182W    LYS20      CTL
46   YDL205C    HEM3       CTL
47   YDL210W    UGA4       CTL
48   YDR011W    SNQ2       CTL
49   YDR044W    HEM13      CTL
50   YDR234W    LYS4       CTL
51   YDR285W    ZIP1       CTL
...
     YPR065W    ROX1       CTL
113  YPR138C    MEP3       CTL
114  YPR145W    ASN1       CTL

21 Regulatory Sequence Analysis Classifying genes on the basis of upstream motifs Jacques van Helden

22 Questions Is it possible to predict the regulation of a gene on the basis of its upstream sequence? A challenging case : given the similarity of Met4p and Pho4p binding sites, can we distinguish their respective target genes? Genome-scale classification : having at hand the complete yeast genome, can we classify genes according to their predicted regulatory responses?

23 Discrimination between MET and PHO genes
On the basis of upstream motifs, can we predict the regulation of a gene?
Pho4p binds CACGTGgg and CACGTTtt (consensus CACGTkkk)
Met4p binds tCACGTGa
Met31p binds AAAACTGTGG
Clues
Combinatorial aspect: several MET genes are regulated by both Met4p and Met31p
Multiple motifs: many PHO genes have several Pho4p sites
Approach
Build position-weight matrices reflecting the specificity of each factor
For each upstream region, find the 3 top scores obtained with the different matrices
Define a training set with known PHO, MET and control genes
Apply discriminant analysis

24 Data types
Patterns can be represented either as strings or as position-specific scoring matrices (PSSM).
We collected two data sets representing the putative regulatory signals in upstream sequences:
top scores obtained with position-specific scoring matrices;
counts of occurrences for selected hexanucleotides and dyads.
These patterns were matched against each of the 6300 yeast upstream sequences.

25 Data - matrix scores
Matrix scores: top scores obtained with PSSM for Met4p, Met31p, Pho4p, Pho4p.cacgtg, and Pho4p.cacgtt; 5 matrices, 3 top scores per matrix.
[Table: one row per ORF with its gene name (YAL001C, YAL002W, YAL003W, ..., YPR203W); score columns as extracted: Pho4p.cacgtg.1-3, Pho4p.cacgtk.1-3, Pho4p.cacgtt.1-3, Met4p.1-3, Met31p.1-3]

26 Top scores - Met4p and Pho4p matrices
[Scatter plot: axes Pho4p and Met4p; classes PHO, MET, CTL]

27 Top scores - Met31p and Met4p
[Scatter plot: axes Met4p and Met31p; classes PHO, MET, CTL]

28 Met4p - first top score vs second top score
[Scatter plot: axes "Met4p - first score" and "Met4p - second score"]

29 Met31p - first top score vs second top score
[Scatter plot: axes "Met31p - first score" and "Met31p - second score"]

30 Principal Component Analysis - matrix scores
5 position-weight matrices: Met4p, Met31p, Pho4p, Pho4p.cacgtg, Pho4p.cacgtt
3 top scores per matrix
Each gene is characterized by 15 scores
The 15-dimensional score space can be projected onto a 2-dimensional space
The first two components (PC1 and PC2) are the directions with the highest variance in the 15-dimensional space
Previously characterized PHO and MET genes are labelled
The CTL genes are genes with known regulation, supposedly not regulated by MET or PHO
[Plot: PCA, PC1 vs PC2; classes: all, CTL, MET, PHO]

31 Data - pattern counts
Word counts: number of occurrences of 44 hexanucleotides involved in the regulation of the MET, PHO and NIT genes.
Some of these words are very well conserved in the core of the binding site (e.g. CACGTG, CACGTT, ...) whereas others represent partial conservation of the flanking nucleotides (e.g. CACGTGg, CACGTTt, ...).
Each pattern is written together with its reverse complement, separated by a dot.
[Table: one row per gene (TFC, VPS, EFB, YAL004W, SSA, ERP, FUN, SPO, MDM, ..., YPR204W); one column per pattern. Patterns recoverable from the header:]
cttn{0}atc.gatn{0}aag
cagn{2}cgg.ccgn{2}ctg
gatn{1}aga.tctn{1}atc
cttatc.gataag
ccgcgc.gcgcgg
cggcac.gtgccg
ataaga.tcttat
acatct.agatgt
ctgata.tatcag
agataa.ttatct
cacn{0}gtg.cacn{0}gtg
cacn{1}tga.tcan{1}gtg
cacn{2}gac.gtcn{2}gtg
ccan{0}cag.ctgn{0}tgg
acgn{0}tga.tcan{0}cgt
aacn{1}gtg.cacn{1}gtt
gcan{6}ggc.gccn{6}tgc
gccn{0}aca.tgtn{0}ggc
gggn{4}cac.gtgn{4}ccc
cagn{7}atc.gatn{7}ctg
cacgtg.cacgtg
acgtga.tcacgt
ccacag.ctgtgg
cgtgac.gtcacg
gccaca.tgtggc
atcacg.cgtgat
cgccac.gtggcg
ccggag.ctccgg
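Counting a word or a spaced dyad on both strands can be sketched as below. This is a minimal illustration, not the program used for the slides: the names `reverse_complement`, `pattern_to_regex` and `count_occurrences` and the test sequences are hypothetical, and the choice to count each location once (even when both strands match there, as for the palindrome cacgtg) is an assumption.

```python
import re

COMPLEMENT = str.maketrans("acgt", "tgca")

def reverse_complement(sequence):
    """Reverse complement of a lowercase DNA string."""
    return sequence.lower().translate(COMPLEMENT)[::-1]

def pattern_to_regex(pattern):
    """Turn 'cagn{2}cgg' into 'cag[acgt]{2}cgg'; plain words are unchanged."""
    return pattern.lower().replace("n", "[acgt]")

def count_occurrences(pattern, sequence):
    """Count matches on both strands, each location being counted once."""
    regex = re.compile("(?=(" + pattern_to_regex(pattern) + "))")
    seq = sequence.lower()
    forward = {m.start() for m in regex.finditer(seq)}
    # Matches on the reverse strand are converted to forward coordinates.
    reverse = {
        len(seq) - m.start() - len(m.group(1))
        for m in regex.finditer(reverse_complement(seq))
    }
    return len(forward | reverse)

# Hypothetical test sequences.
print(count_occurrences("cacgtg", "ttcacgtgaa"))        # palindromic word
print(count_occurrences("acgtga", "ttcacgtgaa"))        # word and its reverse complement
print(count_occurrences("cagn{2}cgg", "aacagttcggaa"))  # dyad with a 2-nt spacer
```

Applying `count_occurrences` to every pattern and every upstream region would give a genes-by-patterns count table of the kind described on this slide.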

32 Combining multiple patterns for classification
64 genes from the NIT, PHO and MET families
44 patterns obtained from oligo-analysis and dyad-analysis
Occurrence counts for all patterns in each upstream region: 64x44 multivariate table
Projection of the 44 dimensions onto a 2D space (Principal Component Analysis)
PCA is suboptimal: the axes with the highest variance are not necessarily the most discriminant
[Plot: PCA projection; classes PHO, NIT, MET, CTL]

33 Regulatory Sequence Analysis Unsupervised classification (clustering) Jacques van Helden

34 Clustering - Euclidian distance
Sequence clustering on the basis of pattern counts
Genes from different classes are intermingled
The four main clusters do not correspond to the prior functional classes
Clustering method: UPGMA (complete)
Distance metric: Euclidian
[Dendrogram: hclust (*, "complete"), euclidian.dist - complete; leaves are genes labelled by class (NIT, PHO, MET, RAND); y axis: Height]
[Confusion table: known vs predicted class (RAND, MET, NIT, PHO, SUM), with % errors and % correct]

35 Poisson-based similarity metrics
[Formula residue, not recoverable: Poisson probability (proba Poisson) of common occurrences and of distinct occurrences; d = ... occ]

36 Clustering - Poisson-based distance metrics
Sequence clustering on the basis of pattern counts
Similarity metric based on the probability of patterns found in common in two sequences
Three of the 4 main clusters correspond to the regulatory mechanism (PHO, MET, NIT)
Most errors are false negatives
[Dendrogram: hclust (*, "ward"), poisson.similarity.product.d - ward; leaves are genes labelled by class (RAND, NIT, PHO, MET); y axis: Height]
[Confusion table: known vs predicted class (RAND, MET, NIT, PHO, SUM), with % errors and % correct]

37 Sequence clustering on the basis of pattern counts
The choice of the distance metric and clustering method is crucial
Classical distance metrics give very poor results
Poisson-based metrics bring a substantial improvement
Weaknesses
Dependencies between variables are not (yet) taken into account
Unsupervised approach: we could obtain better results by taking advantage of our prior knowledge of the functional classes.
Reference
van Helden, J. (2003). Metrics for comparing regulatory sequences on the basis of pattern counts. Bioinformatics, accepted.

38 Regulatory Sequence Analysis Supervised classification (discriminant analysis) Jacques van Helden

39 Discriminant analysis
[Diagram]
Training set (objects of known class) -> Calibration -> Discriminant function
Testing set -> Prediction -> Predicted class -> Comparison -> Confusion table (for evaluation)
Objects of unknown class -> Prediction -> Predicted class

40 Approach
Extract upstream sequences for each one of the 6000 yeast genes
Use position-weight matrices to predict putative regulatory elements
Use genes with known PHO or MET regulation, plus a control group (CTL), as training set
Build a classification rule based on predicted regulatory elements
Evaluate the accuracy of the classification rule
Select the best classification method and parameters
Apply the classification to each one of the 6000 yeast genes

41 Difficulties
The training sets are very small
13 PHO genes
19 MET genes
82 control genes (supposed to respond neither to phosphate nor to methionine)
The training set may contain errors
Over-fitting
The number of variables (15 matrix scores, 44 pattern counts) is higher than the number of elements in some classes of the training set
Size of the prediction set
After training and evaluation, the discriminant function will be used to classify each one of the 6300 yeast genes. Even a small error rate (e.g. 1%) would lead to a large number of false predictions (about 60 false positives).

42 Evaluation of the discriminant function - confusion table
The results of the evaluation are summarized in a confusion table, which contains the counts of predicted versus known classes.
The confusion table can be used to calculate the accuracy of the predictions.
The same object should not be used for training and testing.
Ideally, one would split the data set into two parts: a training set and an evaluation set.
Here the training set is so small that splitting it further would lead to a substantial loss of information. In such cases, a leave-one-out evaluation is performed.
[Confusion tables: predicted vs known class (PHO, MET, CTL, Sum) for each validation mode]
Internal validation (biased): error rate 0.07, hit rate 0.93
Leave-one-out validation: error rate 0.14, hit rate 0.86
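Leave-one-out evaluation and the resulting confusion table can be sketched as follows. To stay self-contained, this illustration uses a nearest-centroid rule as a stand-in classifier rather than the discriminant analysis of the slides; the two-dimensional scores and the function names are hypothetical.

```python
from collections import Counter

def centroids(points, labels):
    """Mean vector of each class."""
    sums, sizes = {}, Counter(labels)
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for k, value in enumerate(point):
            acc[k] += value
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}

def classify(point, class_centroids):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(point, class_centroids[label]))
    return min(class_centroids, key=sq_dist)

def leave_one_out(points, labels):
    """Predict each object with a rule trained on all the other objects."""
    predictions = []
    for i, point in enumerate(points):
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        predictions.append(classify(point, centroids(rest_points, rest_labels)))
    return predictions

def confusion_table(known, predicted):
    """Count objects for each (predicted class, known class) pair."""
    return Counter(zip(predicted, known))

# Hypothetical scores: one MET gene (0.5, 0.5) looks like a control gene.
points = [(5, 0), (6, 1), (5, 1),
          (0, 5), (1, 6), (1, 5), (0.5, 0.5),
          (0, 0), (1, 0), (0, 1)]
labels = ["PHO"] * 3 + ["MET"] * 4 + ["CTL"] * 3

predicted = leave_one_out(points, labels)
table = confusion_table(labels, predicted)
hit_rate = sum(n for (p, k), n in table.items() if p == k) / len(labels)
print(table)
print("hit rate:", hit_rate, "error rate:", 1 - hit_rate)
```

Because each object is predicted by a rule that never saw it, the leave-one-out hit rate is expected to be lower (and less biased) than the one obtained by internal validation.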

43 Linear versus quadratic classification rule
Under the assumption of multivariate normality:
There is one covariance matrix per group g.
When all covariance matrices are assumed to be identical, the classification rule can be simplified to obtain a linear function -> Linear Discriminant Analysis (LDA)
When the groups do not have the same covariance matrix, Quadratic Discriminant Analysis (QDA) is more appropriate.
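For reference, the two rules can be written in their standard textbook form. The notation is an assumption, since the slide gives no formula: $\mu_g$ is the mean vector of group $g$, $\Sigma_g$ its covariance matrix, $\pi_g$ its prior probability, and an object $x$ is assigned to the group with the highest score.

```latex
% Quadratic rule: one covariance matrix per group
d_g^{Q}(x) = -\tfrac{1}{2}\ln\lvert\Sigma_g\rvert
             - \tfrac{1}{2}(x-\mu_g)^{\top}\Sigma_g^{-1}(x-\mu_g)
             + \ln\pi_g

% Linear rule: common covariance matrix \Sigma_g = \Sigma for all groups.
% The terms -\tfrac{1}{2}\ln\lvert\Sigma\rvert and
% -\tfrac{1}{2}x^{\top}\Sigma^{-1}x are identical for every group and drop out.
d_g^{L}(x) = x^{\top}\Sigma^{-1}\mu_g
             - \tfrac{1}{2}\mu_g^{\top}\Sigma^{-1}\mu_g
             + \ln\pi_g
```

The quadratic rule has to estimate one covariance matrix per group, which is demanding when a group contains fewer objects than there are variables.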

44 Variable selection When there are too many variables, the classification is less accurate. In particular, the number of variables must be much smaller than the number of elements in the training groups. In our case, we have 15 variables, but the PHO group contains only 13 genes. We select the best subset of variables via a stepwise procedure

45 Stepwise linear discriminant analysis
Genes: NIT, PHO, MET + random selection (CTL)
44 variables (pattern counts)
Optimum: 7 variables
Best variables: cttatc.gataag, acgtga.tcacgt, acgtgg.ccacgt, cggcac.gtgccg, ccgcgc.gcgcgg, aacgtg.cacgtt, ctgcac.gtgcag
[Plot: nit_met_pho_rand - Stepwise PDA - Error rates; error rate vs number of variables; curves: LDA; LDA, randomized data]
[Confusion table: known vs predicted class (CTL, MET, NIT, PHO, SUM), with % errors and % correct]

46 Estimation of the expected error rate
Even a random classification would assign some objects to the correct group.
The random rate of correct assignment depends on:
the relative size of the groups
the structure of the data
the number of variables
The expected error rate can be estimated with a permutation test: the method is applied to the real data set, but the training labels are randomly assigned.
[Plot: Error rates - stepwise LDA - matrix scores; error rate vs number of variables; curves: LDA, permuted]
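The permutation test can be sketched as below: the class labels are shuffled, the evaluation is repeated, and the resulting error rates estimate what a rule would achieve by chance on data with the same structure and group sizes. The classifier is again a nearest-centroid stand-in rather than LDA, and the data and function names are hypothetical.

```python
import random

def loo_error_rate(points, labels):
    """Leave-one-out error rate of a nearest-centroid rule (stand-in classifier)."""
    errors = 0
    for i, point in enumerate(points):
        # Group all the other objects by class.
        groups = {}
        for j, (other, label) in enumerate(zip(points, labels)):
            if j != i:
                groups.setdefault(label, []).append(other)

        def sq_dist(label):
            members = groups[label]
            center = [sum(axis) / len(members) for axis in zip(*members)]
            return sum((a - b) ** 2 for a, b in zip(point, center))

        if min(groups, key=sq_dist) != labels[i]:
            errors += 1
    return errors / len(points)

def permuted_error_rates(points, labels, n_permutations=20, seed=0):
    """Error rates obtained after randomly reassigning the training labels."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_permutations):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # same group sizes, random assignment
        rates.append(loo_error_rate(points, shuffled))
    return rates

# Hypothetical, well-separated scores for two classes.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = ["CTL"] * 4 + ["PHO"] * 4

real = loo_error_rate(points, labels)
permuted = permuted_error_rates(points, labels)
print("real error rate:", real)
print("mean error rate with permuted labels:", sum(permuted) / len(permuted))
```

A classification rule is informative only if its error rate on the real labels is clearly below the rates obtained with permuted labels.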

47 Classification with the principal components
Variable selection represents a loss of information.
A PC transformation can be applied before discriminant analysis.
This makes it possible to "concentrate" more information in a few variables.

48 Stepwise discriminant analysis - forward variable selection
PHO and MET signals, matrix scores
[Plot: PHO and MET upstream motifs - Stepwise PDA - Error rates; error rate vs number of variables; curves: LDA; LDA with PC; LDA, permuted labels; LDA with PC, permuted labels; QDA; QDA with PC; QDA, permuted labels; QDA with PC, permuted labels]

49 Optimal conditions
Pre-processed data (principal component analysis)
Linear Discriminant Analysis
4 components selected (components 1, 2, 3, and 6)
External validation with the leave-one-out method
[Confusion table: predicted vs known class (PHO, MET, CTL, SUM), with % errors and % correct]

50 Comparison of predicted and prior class
Letters indicate the predicted class
Colors indicate the prior class
[Plot: matrix_scores, cross-validation (LOO); axes LD1 and LD2; points labelled P or M]

51 "Misclassified units"
Misclassifications classified as CTL: PHO88, PHO86, SAM1, VPS33, ZWF1, PHO80, MET22, PHO85, MET
Misclassifications classified as MET: GAL

52 Choice of the prior probabilities
The classes may have different proportions in the sample and in the population.
For example, we could decide, on the basis of our biological knowledge, that 1% rather than 11% of yeast genes are likely to respond to phosphate.
[Table columns: Class, Sample, Priors from sample, Arbitrary priors (population); the TOTAL row is empty in the extraction]
Class  Priors from sample  Arbitrary priors
PHO    11%                 1%
MET    17%                 1%
CTL    72%                 98%
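The effect of the priors on a prediction can be illustrated with Bayes' rule: the posterior probability of each class is proportional to its prior multiplied by the likelihood of the observed scores under that class. The priors below are those of the table; the likelihood values and the function name `posteriors` are hypothetical.

```python
def posteriors(likelihoods, priors):
    """Posterior probability of each class: prior * likelihood, normalized."""
    joint = {cls: likelihoods[cls] * priors[cls] for cls in priors}
    total = sum(joint.values())
    return {cls: value / total for cls, value in joint.items()}

sample_priors = {"PHO": 0.11, "MET": 0.17, "CTL": 0.72}
arbitrary_priors = {"PHO": 0.01, "MET": 0.01, "CTL": 0.98}

# Hypothetical likelihoods of one gene's scores under each class.
likelihoods = {"PHO": 0.6, "MET": 0.1, "CTL": 0.05}

post_sample = posteriors(likelihoods, sample_priors)
post_arbitrary = posteriors(likelihoods, arbitrary_priors)
print(max(post_sample, key=post_sample.get), post_sample)
print(max(post_arbitrary, key=post_arbitrary.get), post_arbitrary)
```

With these hypothetical likelihoods, the gene is assigned to PHO under the sample priors but to CTL under the stricter arbitrary priors, which shows how lowering the prior reduces the number of positive predictions.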

53 Genome-scale prediction
The discriminant function is used to assign each yeast gene to a regulation class (PHO, MET or control).
97 genes predicted as methionine-regulated
103 genes predicted as phosphate-responding
The confidence of each prediction can be assessed with the posterior probability.
[Table columns: ORF, Gene, prediction, post.ctl, post.met, post.pho, Description; the posterior values were lost]
ORF      Gene     prediction  Description
YAL001C  TFC3     CTL         TFIIIC (transcription initiation factor) subunit, 138 kD
YAL002W  VPS8     PHO         vacuolar sorting protein, 134 kD
YAL003W  EFB1     CTL         translation elongation factor eEF1beta
YAL004W  YAL004w  CTL         strong similarity to A.klebsiana glutamate dehydrogenase
YAL005C  SSA1     CTL         heat shock protein of HSP70 family, cytosolic
YAL007C  ERP2     CTL         p24 protein involved in membrane trafficking
YAL008W  FUN14    CTL         hypothetical protein
YAL009W  SPO7     CTL         meiotic protein
YAL010C  MDM10    CTL         involved in mitochondrial morphology and inheritance
YAL011W  FUN36    CTL         weak similarity to Mus musculus p53-associated cellular protein
YAL012W  CYS3     MET         cystathionine gamma-lyase
YAL013W  DEP1     CTL         regulator of phospholipid metabolism

54 Phosphate predictions
[Table columns: ORF, NAME, known, predicted, post.ctl, post.met, post.pho, Description; the posterior values were lost]
ORF        NAME     known  predicted  Description
YBR093C    PHO5     PHO    PHO        repressible acid phosphatase precursor
YEL017W    YEL017w  NA     PHO        hypothetical protein
YGR233C    PHO81    PHO    PHO        cyclin-dependent kinase inhibitor
YBR296C    PHO89    PHO    PHO        Na+-coupled phosphate transport protein, high affinity
YAR071W    PHO11    PHO    PHO        secreted acid phosphatase
YHR215W    PHO12    PHO    PHO        secreted acid phosphatase
YDR303C    RSC3     NA     PHO        similarity to transcriptional regulator proteins
YJR058C    APS2     NA     PHO        AP-2 complex subunit, sigma2 subunit, 17 kD
YKR048C    NAP1     NA     PHO        nucleosome assembly protein I
YCR037C    PHO87    PHO    PHO        member of the phosphate permease family
YJR059W    PTK2     NA     PHO        involved in polyamine uptake
YML123C    PHO84    PHO    PHO        high-affinity inorganic phosphate/H+ symporter
YHR136C    SPL2     PHO    PHO        suppressor of plc1-delta
YKR050W    TRK2     NA     PHO        moderate-affinity potassium transport protein
YHR137W    ARO9     NA     PHO        aromatic amino acid aminotransferase II
YAR070C    YAR070c  NA     PHO        hypothetical protein
YJR060W    CBF1     NA     PHO        kinetochore protein
YDR281C    PHM6     NA     PHO        hypothetical protein, has a role in phosphate metabolism
YJR070C    YJR070c  NA     PHO        similarity to C.elegans hypothetical protein 14A4.1
YNL116W    YNL116w  NA     PHO        weak similarity to RING zinc finger protein from Gallus gallus
YKR047W    YKR047w  NA     PHO        questionable ORF
YEL017C-A  PMP2     NA     PHO        H+-ATPase subunit, plasma membrane
YPL088W    YPL088w  NA     PHO        similarity to aryl-alcohol dehydrogenases
YPL089C    RLM1     NA     PHO        transcription factor of the MADS box family
YDR310C    SUM1     NA     PHO        suppressor of SIR mutations
YDR311W    TFB1     NA     PHO        TFIIH subunit (transcription initiation factor), 75 kD
YMR253C    YMR253c  NA     PHO        strong similarity to YPL264c
YMR255W    GFD1     NA     PHO        protein of the nuclear pore complex
YKR045C    YKR045c  NA     PHO        hypothetical protein
YMR256C    COX7     NA     PHO        cytochrome-c oxidase, subunit VII

55 Methionine predictions
[Table columns: ORF, NAME, known, predicted, post.ctl, post.met, post.pho, Description; the posterior values were lost]
ORF        NAME     known  predicted  Description
YGR155W    CYS4     NA     MET        cystathionine beta-synthase
YGR154C    YGR154c  NA     MET        strong similarity to hypothetical proteins YKR076w and YMR251w
YKL001C    MET14    MET    MET        ATP adenosine-5'-phosphosulfate 3'-phosphotransferase
YDR502C    SAM2     MET    MET        S-adenosylmethionine synthetase 2
YLR303W    MET17    MET    MET        O-acetylhomoserine sulfhydrylase
YDR253C    MET32    MET    MET        transcriptional regulator of sulfur amino acid metabolism
YJR010W    MET3     MET    MET        sulfate adenylyltransferase
YIL074C    SER33    NA     MET        phosphoglycerate dehydrogenase
YDR254W    CHL4     NA     MET        chromosome segregation protein
YNL277W    MET2     MET    MET        homoserine O-acetyltransferase
YDL073W    YDL073w  NA     MET        weak similarity to Cyprinus carpio calcium channel protein
YDL074C    BRE1     NA     MET        weak similarity to spindle pole body protein NUF1
YOR367W    SCP1     NA     MET        similarity to mammalian smooth muscle protein SM22 and chicken calponin alpha
YOR001W    RRP6     NA     MET        similarity to human nucleolar 100K polymyositis-scleroderma protein
YLR149C    YLR149c  NA     MET        weak similarity to hypothetical protein SP4G3.03 S. pombe
YDL058W    USO1     NA     MET        intracellular protein transport protein
YBR213W    MET8     MET    MET        siroheme synthase
YLL060C    GTT2     NA     MET        glutathione S-transferase
YIL047C    SYG1     NA     MET        member of the major facilitator superfamily
YIL046W    MET30    MET    MET        involved in regulation of sulfur assimilation genes and cell cycle progression
YNL283C    WSC2     NA     MET        glucoamylase III (alpha-1,4-glucan-glucosidase)
YFL032W    YFL032w  NA     MET        questionable ORF
YHL038C    CBP2     NA     MET        apo-cytochrome b pre-mRNA processing protein 2
YHL036W    MUP3     MET    MET        very low affinity methionine permease
YLR150W    STM1     NA     MET        specific affinity for guanine-rich quadruplex nucleic acids
YFL031W    HAC1     NA     MET        transcription factor
YFL033C    RIM15    NA     MET        protein kinase involved in expression of meiotic genes
YFL018C    LPD1     NA     MET        dihydrolipoamide dehydrogenase precursor
YGR204W    ADE3     NA     MET        tetrahydrofolate synthase (trifunctional enzyme), cytoplasmic
YHR001W-A  QCR10    NA     MET        ubiquinol--cytochrome-c reductase 8.5 kDa subunit


More information

Metabolism of Sulfur Amino Acids in Saccharomyces cerevisiae

Metabolism of Sulfur Amino Acids in Saccharomyces cerevisiae MICROBIOLOGY AND MOLECULAR BIOLOGY REVIEWS, Dec. 1997, p. 503 532 Vol. 61, No. 4 1092-2172/97/$04.00 0 Copyright 1997, American Society for Microbiology Metabolism of Sulfur Amino Acids in Saccharomyces

More information

Biology I Fall Semester Exam Review 2014

Biology I Fall Semester Exam Review 2014 Biology I Fall Semester Exam Review 2014 Biomolecules and Enzymes (Chapter 2) 8 questions Macromolecules, Biomolecules, Organic Compunds Elements *From the Periodic Table of Elements Subunits Monomers,

More information

Introduction. Gene expression is the combined process of :

Introduction. Gene expression is the combined process of : 1 To know and explain: Regulation of Bacterial Gene Expression Constitutive ( house keeping) vs. Controllable genes OPERON structure and its role in gene regulation Regulation of Eukaryotic Gene Expression

More information

RNA Synthesis and Processing

RNA Synthesis and Processing RNA Synthesis and Processing Introduction Regulation of gene expression allows cells to adapt to environmental changes and is responsible for the distinct activities of the differentiated cell types that

More information

(starvation). Description a. Predicted operon members b. Gene no. a. Relative change in expression (n-fold) mutant vs. wild type.

(starvation). Description a. Predicted operon members b. Gene no. a. Relative change in expression (n-fold) mutant vs. wild type. 1 Table S1. Genes whose expression differ in the phyr mutant 8402 and/or in the ecfg mutant 8404 compared with the wild type when grown to the mid-exponential phase (OD600 0.5-0.7) in rich medium (PSY)

More information

Gene Control Mechanisms at Transcription and Translation Levels

Gene Control Mechanisms at Transcription and Translation Levels Gene Control Mechanisms at Transcription and Translation Levels Dr. M. Vijayalakshmi School of Chemical and Biotechnology SASTRA University Joint Initiative of IITs and IISc Funded by MHRD Page 1 of 9

More information

Graph Alignment and Biological Networks

Graph Alignment and Biological Networks Graph Alignment and Biological Networks Johannes Berg http://www.uni-koeln.de/ berg Institute for Theoretical Physics University of Cologne Germany p.1/12 Networks in molecular biology New large-scale

More information

Protein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation.

Protein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation. Protein Synthesis Unit 6 Goal: Students will be able to describe the processes of transcription and translation. Protein Synthesis: Protein synthesis uses the information in genes to make proteins. 2 Steps

More information

Inferring Transcriptional Regulatory Networks from Gene Expression Data II

Inferring Transcriptional Regulatory Networks from Gene Expression Data II Inferring Transcriptional Regulatory Networks from Gene Expression Data II Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday

More information

From protein networks to biological systems

From protein networks to biological systems FEBS 29314 FEBS Letters 579 (2005) 1821 1827 Minireview From protein networks to biological systems Peter Uetz a,1, Russell L. Finley Jr. b, * a Research Center Karlsruhe, Institute of Genetics, P.O. Box

More information

Introduction to Bioinformatics

Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics

More information

Regulation of Transcription in Eukaryotes

Regulation of Transcription in Eukaryotes Regulation of Transcription in Eukaryotes Leucine zipper and helix-loop-helix proteins contain DNA-binding domains formed by dimerization of two polypeptide chains. Different members of each family can

More information

BIOINFORMATICS LAB AP BIOLOGY

BIOINFORMATICS LAB AP BIOLOGY BIOINFORMATICS LAB AP BIOLOGY Bioinformatics is the science of collecting and analyzing complex biological data. Bioinformatics combines computer science, statistics and biology to allow scientists to

More information

Constraint-Based Workshops

Constraint-Based Workshops Constraint-Based Workshops 2. Reconstruction Databases November 29 th, 2007 Defining Metabolic Reactions ydbh hslj ldha 1st level: Primary metabolites LAC 2nd level: Neutral Formulas C 3 H 6 O 3 Charged

More information

GCD3033:Cell Biology. Transcription

GCD3033:Cell Biology. Transcription Transcription Transcription: DNA to RNA A) production of complementary strand of DNA B) RNA types C) transcription start/stop signals D) Initiation of eukaryotic gene expression E) transcription factors

More information

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p.110-114 Arrangement of information in DNA----- requirements for RNA Common arrangement of protein-coding genes in prokaryotes=

More information

Lecture 10: Cyclins, cyclin kinases and cell division

Lecture 10: Cyclins, cyclin kinases and cell division Chem*3560 Lecture 10: Cyclins, cyclin kinases and cell division The eukaryotic cell cycle Actively growing mammalian cells divide roughly every 24 hours, and follow a precise sequence of events know as

More information

6.096 Algorithms for Computational Biology. Prof. Manolis Kellis

6.096 Algorithms for Computational Biology. Prof. Manolis Kellis 6.096 Algorithms for Computational Biology Prof. Manolis Kellis Today s Goals Introduction Class introduction Challenges in Computational Biology Gene Regulation: Regulatory Motif Discovery Exhaustive

More information

Procedure to Create NCBI KOGS

Procedure to Create NCBI KOGS Procedure to Create NCBI KOGS full details in: Tatusov et al (2003) BMC Bioinformatics 4:41. 1. Detect and mask typical repetitive domains Reason: masking prevents spurious lumping of non-orthologs based

More information

Bioinformatics: Network Analysis

Bioinformatics: Network Analysis Bioinformatics: Network Analysis Flux Balance Analysis and Metabolic Control Analysis COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1 Flux Balance Analysis (FBA) Flux balance

More information

Laith AL-Mustafa. Protein synthesis. Nabil Bashir 10\28\ First

Laith AL-Mustafa. Protein synthesis. Nabil Bashir 10\28\ First Laith AL-Mustafa Protein synthesis Nabil Bashir 10\28\2015 http://1drv.ms/1gigdnv 01 First 0 Protein synthesis In previous lectures we started talking about DNA Replication (DNA synthesis) and we covered

More information

CGS 5991 (2 Credits) Bioinformatics Tools

CGS 5991 (2 Credits) Bioinformatics Tools CAP 5991 (3 Credits) Introduction to Bioinformatics CGS 5991 (2 Credits) Bioinformatics Tools Giri Narasimhan 8/26/03 CAP/CGS 5991: Lecture 1 1 Course Schedules CAP 5991 (3 credit) will meet every Tue

More information

From gene to protein. Premedical biology

From gene to protein. Premedical biology From gene to protein Premedical biology Central dogma of Biology, Molecular Biology, Genetics transcription replication reverse transcription translation DNA RNA Protein RNA chemically similar to DNA,

More information

Chapters 12&13 Notes: DNA, RNA & Protein Synthesis

Chapters 12&13 Notes: DNA, RNA & Protein Synthesis Chapters 12&13 Notes: DNA, RNA & Protein Synthesis Name Period Words to Know: nucleotides, DNA, complementary base pairing, replication, genes, proteins, mrna, rrna, trna, transcription, translation, codon,

More information

Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype

Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype Lecture Series 7 From DNA to Protein: Genotype to Phenotype Reading Assignments Read Chapter 7 From DNA to Protein A. Genes and the Synthesis of Polypeptides Genes are made up of DNA and are expressed

More information

Biological Process Term Enrichment

Biological Process Term Enrichment Biological Process Term Enrichment cellular protein localization cellular macromolecule localization intracellular protein transport intracellular transport generation of precursor metabolites and energy

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

Additional File 9. Gene families amplified in A. adeninivorans.

Additional File 9. Gene families amplified in A. adeninivorans. Additional File. Gene families amplified in A. adeninivorans. Figure SA Duplication rate per gene per branch of the species tree leading to A. adeninivorans. Species tree showing the evolution of the species

More information

Quantitative Measurement of Genome-wide Protein Domain Co-occurrence of Transcription Factors

Quantitative Measurement of Genome-wide Protein Domain Co-occurrence of Transcription Factors Quantitative Measurement of Genome-wide Protein Domain Co-occurrence of Transcription Factors Arli Parikesit, Peter F. Stadler, Sonja J. Prohaska Bioinformatics Group Institute of Computer Science University

More information

Transmembrane Domains (TMDs) of ABC transporters

Transmembrane Domains (TMDs) of ABC transporters Transmembrane Domains (TMDs) of ABC transporters Most ABC transporters contain heterodimeric TMDs (e.g. HisMQ, MalFG) TMDs show only limited sequence homology (high diversity) High degree of conservation

More information

Computational Genomics. Reconstructing dynamic regulatory networks in multiple species

Computational Genomics. Reconstructing dynamic regulatory networks in multiple species 02-710 Computational Genomics Reconstructing dynamic regulatory networks in multiple species Methods for reconstructing networks in cells CRH1 SLT2 SLR3 YPS3 YPS1 Amit et al Science 2009 Pe er et al Recomb

More information

(Fold Change) acetate biosynthesis ALD6 acetaldehyde dehydrogenase 3.0 acetyl-coa biosynthesis ACS2 acetyl-coa synthetase 2.5

(Fold Change) acetate biosynthesis ALD6 acetaldehyde dehydrogenase 3.0 acetyl-coa biosynthesis ACS2 acetyl-coa synthetase 2.5 Supplementary Table 1 Overexpression strains that showed resistance or hypersensitivity to rapamycin. Strains that were more than two-fold enriched on average (and at least 1.8-fold enriched in both experiment

More information

The Gene The gene; Genes Genes Allele;

The Gene The gene; Genes Genes Allele; Gene, genetic code and regulation of the gene expression, Regulating the Metabolism, The Lac- Operon system,catabolic repression, The Trp Operon system: regulating the biosynthesis of the tryptophan. Mitesh

More information

Signal recognition YKL122c An01g02800 strong similarity to signal recognition particle 68K protein SRP68 - Canis lupus

Signal recognition YKL122c An01g02800 strong similarity to signal recognition particle 68K protein SRP68 - Canis lupus Supplementary Table 16 Components of the secretory pathway Aspergillus niger A.niger orf A.niger gene Entry into ER Description of putative Aspergillus niger gene Best homolog to putative A.niger gene

More information

Cell division control protein Q02457 P40341 Q06629 P50087 MIC26 Q synthase ATP-dependent. Dicarboxylic amino Q02785 P53388

Cell division control protein Q02457 P40341 Q06629 P50087 MIC26 Q synthase ATP-dependent. Dicarboxylic amino Q02785 P53388 Supplemental Material for IAS: Interaction specific GO term associations for predicting Protein-Protein Interaction Networks Satwica Yerneni, Ishita K. Khan, Qing Wei, and Daisuke Kihara Contact: dkihara@purdue.edu

More information

Chapter 17. From Gene to Protein. Biology Kevin Dees

Chapter 17. From Gene to Protein. Biology Kevin Dees Chapter 17 From Gene to Protein DNA The information molecule Sequences of bases is a code DNA organized in to chromosomes Chromosomes are organized into genes What do the genes actually say??? Reflecting

More information

Compositional Correlation for Detecting Real Associations. Among Time Series

Compositional Correlation for Detecting Real Associations. Among Time Series Compositional Correlation for Detecting Real Associations Among Time Series Fatih DIKBAS Civil Engineering Department, Pamukkale University, Turkey Correlation remains to be one of the most widely used

More information

Adaptation of Saccharomyces cerevisiae to high hydrostatic pressure causing growth inhibition

Adaptation of Saccharomyces cerevisiae to high hydrostatic pressure causing growth inhibition FEBS 29557 FEBS Letters 579 (2005) 2847 2852 Adaptation of Saccharomyces cerevisiae to high hydrostatic pressure causing growth inhibition Hitoshi Iwahashi a, *, Mine Odani b, Emi Ishidou a, Emiko Kitagawa

More information

Big Idea 3: Living systems store, retrieve, transmit and respond to information essential to life processes. Tuesday, December 27, 16

Big Idea 3: Living systems store, retrieve, transmit and respond to information essential to life processes. Tuesday, December 27, 16 Big Idea 3: Living systems store, retrieve, transmit and respond to information essential to life processes. Enduring understanding 3.B: Expression of genetic information involves cellular and molecular

More information

Systems Biology (2) Networks: Representation & static analysis

Systems Biology (2) Networks: Representation & static analysis Systems Biology (2) Networks: Representation & static analysis David Gilbert Bioinformatics Research Centre www.brc.dcs.gla.ac.uk Department of Computing Science, University of Glasgow Module outline Putting

More information

A Re-annotation of the Saccharomyces cerevisiae Genome

A Re-annotation of the Saccharomyces cerevisiae Genome Comparative and Functional Genomics Comp Funct Genom 2001; 2: 143 154. DOI: 10.1002 / cfg.86 Research Article A Re-annotation of the Saccharomyces cerevisiae Genome V. Wood*, K. M. Rutherford, A Ivens,

More information

A genome sequence based discriminator for vancomycin intermediate Staphyolococcus aureus Supplementary Methods

A genome sequence based discriminator for vancomycin intermediate Staphyolococcus aureus Supplementary Methods 1 Journal of Bacteriology Computational Biology Section 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 Supplementary Information for: A genome sequence based discriminator

More information

Chapter 15 Active Reading Guide Regulation of Gene Expression

Chapter 15 Active Reading Guide Regulation of Gene Expression Name: AP Biology Mr. Croft Chapter 15 Active Reading Guide Regulation of Gene Expression The overview for Chapter 15 introduces the idea that while all cells of an organism have all genes in the genome,

More information

Degeneracy. Two types of degeneracy:

Degeneracy. Two types of degeneracy: Degeneracy The occurrence of more than one codon for an amino acid (AA). Most differ in only the 3 rd (3 ) base, with the 1 st and 2 nd being most important for distinguishing the AA. Two types of degeneracy:

More information

Translation. Genetic code

Translation. Genetic code Translation Genetic code If genes are segments of DNA and if DNA is just a string of nucleotide pairs, then how does the sequence of nucleotide pairs dictate the sequence of amino acids in proteins? Simple

More information

Whole-genome analysis of GCN4 binding in S.cerevisiae

Whole-genome analysis of GCN4 binding in S.cerevisiae Whole-genome analysis of GCN4 binding in S.cerevisiae Lillian Dai Alex Mallet Gcn4/DNA diagram (CREB symmetric site and AP-1 asymmetric site: Song Tan, 1999) removed for copyright reasons. What is GCN4?

More information

Introduction to Molecular and Cell Biology

Introduction to Molecular and Cell Biology Introduction to Molecular and Cell Biology Molecular biology seeks to understand the physical and chemical basis of life. and helps us answer the following? What is the molecular basis of disease? What

More information

GENETICS. Supporting Information

GENETICS. Supporting Information GENETICS Supporting Information http://www.genetics.org/cgi/content/full/genetics.110.117655/dc1 Trivalent Arsenic Inhibits the Functions of Chaperonin Complex XuewenPan,StefanieReissman,NickR.Douglas,ZhiweiHuang,DanielS.Yuan,

More information

Translation - Prokaryotes

Translation - Prokaryotes 1 Translation - Prokaryotes Shine-Dalgarno (SD) Sequence rrna 3 -GAUACCAUCCUCCUUA-5 mrna...ggagg..(5-7bp)...aug Influences: Secondary structure!! SD and AUG in unstructured region Start AUG 91% GUG 8 UUG

More information

15.2 Prokaryotic Transcription *

15.2 Prokaryotic Transcription * OpenStax-CNX module: m52697 1 15.2 Prokaryotic Transcription * Shannon McDermott Based on Prokaryotic Transcription by OpenStax This work is produced by OpenStax-CNX and licensed under the Creative Commons

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting. Genome Annotation Bioinformatics and Computational Biology Genome Annotation Frank Oliver Glöckner 1 Genome Analysis Roadmap Genome sequencing Assembly Gene prediction Protein targeting trna prediction

More information

Supplementary Figure 1. The proportion S. aureus CFU of the total CFU (S. aureus + E. faecalis CFU) per host in worms

Supplementary Figure 1. The proportion S. aureus CFU of the total CFU (S. aureus + E. faecalis CFU) per host in worms Supplementary Figure 1. The proportion S. aureus CFU of the total CFU (S. aureus + E. faecalis CFU) per host in worms alive or dead at 24 hours of exposure. Two sample t-test: t = 1.22, df = 10, P= 0.25.

More information

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution Massachusetts Institute of Technology 6.877 Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution 1. Rates of amino acid replacement The initial motivation for the neutral

More information

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology 2012 Univ. 1301 Aguilera Lecture Introduction to Molecular and Cell Biology Molecular biology seeks to understand the physical and chemical basis of life. and helps us answer the following? What is the

More information

Lecture 13: PROTEIN SYNTHESIS II- TRANSLATION

Lecture 13: PROTEIN SYNTHESIS II- TRANSLATION http://smtom.lecture.ub.ac.id/ Password: https://syukur16tom.wordpress.com/ Password: Lecture 13: PROTEIN SYNTHESIS II- TRANSLATION http://hyperphysics.phy-astr.gsu.edu/hbase/organic/imgorg/translation2.gif

More information

Molecular Biology (9)

Molecular Biology (9) Molecular Biology (9) Translation Mamoun Ahram, PhD Second semester, 2017-2018 1 Resources This lecture Cooper, Ch. 8 (297-319) 2 General information Protein synthesis involves interactions between three

More information

Gene regulation I Biochemistry 302. Bob Kelm February 25, 2005

Gene regulation I Biochemistry 302. Bob Kelm February 25, 2005 Gene regulation I Biochemistry 302 Bob Kelm February 25, 2005 Principles of gene regulation (cellular versus molecular level) Extracellular signals Chemical (e.g. hormones, growth factors) Environmental

More information

Sara Khraim. Shaymaa Alnamos ... Dr. Nafeth

Sara Khraim. Shaymaa Alnamos ... Dr. Nafeth 10 Sara Khraim Shaymaa Alnamos... Dr. Nafeth *Requirement of oxidative phosphorylation: 1- Source and target for electrons(nadh+fadh2 >> O2). 2- Electron carriers. 3- Enzymes, like oxidoreductases and

More information

9 The Process of Translation

9 The Process of Translation 9 The Process of Translation 9.1 Stages of Translation Process We are familiar with the genetic code, we can begin to study the mechanism by which amino acids are assembled into proteins. Because more

More information

What Kind Of Molecules Carry Protein Assembly Instructions From The Nucleus To The Cytoplasm

What Kind Of Molecules Carry Protein Assembly Instructions From The Nucleus To The Cytoplasm What Kind Of Molecules Carry Protein Assembly Instructions From The Nucleus To The Cytoplasm What kind of reaction produces large molecules by linking small molecules? molecules carry protein assembly

More information

Lecture 18 June 2 nd, Gene Expression Regulation Mutations

Lecture 18 June 2 nd, Gene Expression Regulation Mutations Lecture 18 June 2 nd, 2016 Gene Expression Regulation Mutations From Gene to Protein Central Dogma Replication DNA RNA PROTEIN Transcription Translation RNA Viruses: genome is RNA Reverse Transcriptase

More information

THE CELL 3/15/15 HUMAN ANATOMY AND PHYSIOLOGY I THE CELLULAR BASIS OF LIFE

THE CELL 3/15/15 HUMAN ANATOMY AND PHYSIOLOGY I THE CELLULAR BASIS OF LIFE HUMAN ANATOMY AND PHYSIOLOGY I Lecture: M 6-9:30 Randall Visitor Center Lab: W 6-9:30 Swatek Anatomy Center, Centennial Complex Required Text: Marieb 9 th edition Dr. Trevor Lohman DPT (949) 246-5357 tlohman@llu.edu

More information

GENETICS - CLUTCH CH.11 TRANSLATION.

GENETICS - CLUTCH CH.11 TRANSLATION. !! www.clutchprep.com CONCEPT: GENETIC CODE Nucleotides and amino acids are translated in a 1 to 1 method The triplet code states that three nucleotides codes for one amino acid - A codon is a term for

More information

TRANSLATION: How to make proteins?

TRANSLATION: How to make proteins? TRANSLATION: How to make proteins? EUKARYOTIC mrna CBP80 NUCLEUS SPLICEOSOME 5 UTR INTRON 3 UTR m 7 GpppG AUG UAA 5 ss 3 ss CBP20 PABP2 AAAAAAAAAAAAA 50-200 nts CYTOPLASM eif3 EJC PABP1 5 UTR 3 UTR m 7

More information

RECONSTRUCTING GENE REGULATORY NETWORKS FROM FUNGAL TRANSCRIPTOMIC DATA USING BAYESIAN NETWORK

RECONSTRUCTING GENE REGULATORY NETWORKS FROM FUNGAL TRANSCRIPTOMIC DATA USING BAYESIAN NETWORK RECONSTRUCTING GENE REGULATORY NETWORKS FROM FUNGAL TRANSCRIPTOMIC DATA USING BAYESIAN NETWORK Li Guo Fungal Comparative Genomics Laboratory Department of Biochemistry and Molecular biology University

More information

UNIT 6 PART 3 *REGULATION USING OPERONS* Hillis Textbook, CH 11

UNIT 6 PART 3 *REGULATION USING OPERONS* Hillis Textbook, CH 11 UNIT 6 PART 3 *REGULATION USING OPERONS* Hillis Textbook, CH 11 REVIEW: Signals that Start and Stop Transcription and Translation BUT, HOW DO CELLS CONTROL WHICH GENES ARE EXPRESSED AND WHEN? First of

More information

Table 5. Genes unique to G. thermodenitrificans NG80-2 Gene ID Gene name Gene product COG functional category

Table 5. Genes unique to G. thermodenitrificans NG80-2 Gene ID Gene name Gene product COG functional category Table 5. Genes unique to G. thermodenitrificans NG80-2 GT0030 gt30 Methionine--tRNA ligase/methionyl-trna synthetase COG0143, Translation GT0033 gt33 Unknown GT0106 gt106 Ribosomal protein L3 COG0087,

More information

Regulation of gene Expression in Prokaryotes & Eukaryotes

Regulation of gene Expression in Prokaryotes & Eukaryotes Regulation of gene Expression in Prokaryotes & Eukaryotes 1 The trp Operon Contains 5 genes coding for proteins (enzymes) required for the synthesis of the amino acid tryptophan. Also contains a promoter

More information

Prokaryotic Regulation

Prokaryotic Regulation Prokaryotic Regulation Control of transcription initiation can be: Positive control increases transcription when activators bind DNA Negative control reduces transcription when repressors bind to DNA regulatory

More information

1. In most cases, genes code for and it is that

1. In most cases, genes code for and it is that Name Chapter 10 Reading Guide From DNA to Protein: Gene Expression Concept 10.1 Genetics Shows That Genes Code for Proteins 1. In most cases, genes code for and it is that determine. 2. Describe what Garrod

More information

Biochemistry Prokaryotic translation

Biochemistry Prokaryotic translation 1 Description of Module Subject Name Paper Name Module Name/Title Dr. Vijaya Khader Dr. MC Varadaraj 2 1. Objectives 2. Understand the concept of genetic code 3. Understand the concept of wobble hypothesis

More information

Sequence analysis and comparison

Sequence analysis and comparison The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species

More information

INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA

INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA XIUFENG WAN xw6@cs.msstate.edu Department of Computer Science Box 9637 JOHN A. BOYLE jab@ra.msstate.edu Department of Biochemistry and Molecular Biology

More information

Written Exam 15 December Course name: Introduction to Systems Biology Course no

Written Exam 15 December Course name: Introduction to Systems Biology Course no Technical University of Denmark Written Exam 15 December 2008 Course name: Introduction to Systems Biology Course no. 27041 Aids allowed: Open book exam Provide your answers and calculations on separate

More information

Introduction to molecular biology. Mitesh Shrestha

Introduction to molecular biology. Mitesh Shrestha Introduction to molecular biology Mitesh Shrestha Molecular biology: definition Molecular biology is the study of molecular underpinnings of the process of replication, transcription and translation of

More information

CSCE555 Bioinformatics. Protein Function Annotation

CSCE555 Bioinformatics. Protein Function Annotation CSCE555 Bioinformatics Protein Function Annotation Why we need to do function annotation? Fig from: Network-based prediction of protein function. Molecular Systems Biology 3:88. 2007 What s function? The

More information