Chapter 2 Class Notes Words and Probability
- Laurence Spencer
Medical/Genetics Illustration

Reference: Bojesen et al. (2003), "Integrin β3 Leu33Pro Homozygosity and Risk of Cancer," J. NCI. Women only.

2 x 2 table (outcome status by stratification type):

    Stratification (Type)    With Cancer    Without Cancer    Total
    Non-carriers             501 (14.4%)
    Homozygotes              29 (21.5%)
    Total

The variables here are X = stratification (either non-carrier or homozygote group) and Y = cancer status at end of study. Both variables are nominal (and therefore qualitative).
For a review of basic probability (STAT-335), see the Chap3 notes.

Given a single-strand sequence such as (another is on p.37):

    ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCC
    CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGC
    CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGG
    AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCC
    CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAG
    TTTAATTACAGACCTGAA

Summarize/analyze using statistics: is this a coding region? Does this segment differ from other regions? What is its function?

Words of length:
    1 are 1-tuples (e.g., nucleotides or purines/pyrimidines)
    2 are 2-tuples (e.g., di-nucleotides)
    3 are 3-tuples or triples (e.g., codons)
    etc.

DNA sequences from different sources/regions of a genome may be distinguished from each other by their k-tuple content.

Base Composition (k=1): Consider for a moment the duplex DNA and residues A, C, G and T. Notice that fr(A) = fr(T) and fr(C) = fr(G); since fr(A) + fr(C) + fr(G) + fr(T) = 1, it follows that fr(A+T) = 1 - fr(C+G). So you only need to know fr(C+G). These are given for different organisms on p.39: the percentages range from 31.6% for Mycoplasma genitalium to 66.4% for Pseudomonas aeruginosa PAO1 (bacteria). Asymmetries can be detected, especially on the leading strand, using the …
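As a rough illustration of k-tuple content, the sketch below counts overlapping words of length k in a DNA string and computes the base composition, including fr(C+G). The helper name `count_words` is hypothetical (it is not from the text), and the input is just the first 70 bases of the sequence above. Because this is a single strand, fr(A) = fr(T) need not hold here; that identity applies to duplex DNA.

```r
# Hypothetical helper: count overlapping k-tuples ("words") in a DNA string.
count_words <- function(dna, k = 1) {
  n <- nchar(dna)
  stopifnot(n >= k)
  starts <- seq_len(n - k + 1)                    # every start position
  words <- substring(dna, starts, starts + k - 1) # overlapping words of length k
  table(words)
}

# First 70 bases of the example sequence in the notes.
dna <- "ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCC"

base_counts <- count_words(dna, 1)
fr <- base_counts / sum(base_counts)   # fr(A), fr(C), fr(G), fr(T)
gc <- as.numeric(fr["C"] + fr["G"])    # fr(C+G)

print(round(fr, 3))
print(gc)
print(count_words(dna, 2))             # dinucleotide (k = 2) counts
```

Calling `count_words(dna, 3)` gives overlapping 3-tuples; counting codons in a fixed reading frame would instead need start positions in steps of 3.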
2.3. Introduction to Probability (review)

A discrete random variable (RV) $X$ takes on values $x_1, x_2, \ldots$ with respective probabilities $p_1, p_2, \ldots$ such that $\sum_i p_i = 1$. Associating the probabilities with the realized values of $X$, whether in a table, formula or graph, gives the probability mass function of $X$. An example could be $L \in \{A, C, G, T\}$ for nucleotides, with probabilities $p_A, p_C, p_G, p_T$; we can then derive a second RV which only counts A's: it is 1 if the nucleotide is A and 0 otherwise. This is called a Bernoulli RV.

Now consider a series of Bernoulli RVs $X_1, X_2, \ldots, X_n$ corresponding to positions 1, 2, ..., n; again, each of these is 1 if the respective nucleotide is A and 0 otherwise. For each of these, $P(X_i = 1) = p_A = p$; also, $P(X_i = 0) = 1 - p$. Then $N = X_1 + X_2 + \cdots + X_n$ is a RV that counts the number of A's in the sequence of length n.

Next, we'll make a big assumption (for the moment): that what occurs at one position is independent of what occurs elsewhere (see STAT-335 notes). In the case of independence, probabilities of intersections (AND) amount to simply multiplying the unconditional probabilities.

Expected Values and Variances: the expected value (EV) of RV $X$ is just its mean: $E(X) = \sum_i x_i p_i$. Thus, for the above, $E(X_i) = 1 \cdot p + 0 \cdot (1 - p) = p$. Also, since $E(X_1 + \cdots + X_n) = E(X_1) + \cdots + E(X_n)$, we have $E(N) = n E(X) = np$ (dropping the subscript for convenience). Next, variance is defined as $Var(X) = E[(X - \mu)^2]$ with $\mu = E(X)$; the short-cut formula is then $Var(X) = E(X^2) - \mu^2$. So, by the short-cut formula (and because $X_i^2 = X_i$), $Var(X_i) = p - p^2 = p(1 - p)$. A final snippet: since we're assuming the $X_i$'s are independent, and since the variance of a sum of independent RVs is the sum of the individual variances, $Var(N) = n Var(X) = np(1 - p)$ (again dropping the subscript).

The Binomial Distribution: by now, you have noticed that $N$ is a Binomial RV with parameters $n$ and $p$. Hence,

$$P(N = j) = \binom{n}{j} p^j (1 - p)^{n - j}, \quad j = 0, 1, \ldots, n.$$

2.4. Simulating from Probability Distributions: to understand the behavior (e.g. distribution) of a RV such as $N$, simulation can be very helpful. To do so, we'd generate a large number ($m$) of $N$'s: $N_1, N_2, \ldots, N_m$. We could then calculate the sample mean ($\bar{N} = \frac{1}{m} \sum_i N_i$), the sample variance ($s^2 = \frac{1}{m - 1} \sum_i (N_i - \bar{N})^2$), and a histogram to get an idea of $E(N)$, $Var(N)$, and the distribution of $N$. To do this, pseudo-random number generators are used; these generators simulate uniform RVs on the interval $(0, 1)$, which are then converted to the desired distribution.

To illustrate, it's easy to simulate from distribution (a) for A versus rest (1 with probability 0.25, 0 with probability 0.75), and (b) for all four nucleotides (each with probability 0.25), using Minitab and R. In R, we use the sample command:

    pi <- c(0.25, 0.75)
    x <- c(1, 0)
    seq1 <- sample(x, 10000, replace = TRUE, prob = pi)
    seq1[1:30]
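Building on the A-versus-rest simulation, the following sketch checks the formulas $E(N) = np$ and $Var(N) = np(1-p)$ empirically by summing simulated Bernoulli indicators. The seed, the number of replicates `m = 2000`, and the values `n = 1000`, `p = 0.25` are illustrative assumptions, not values prescribed by the notes.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 1000                        # sequence length (assumed)
p <- 0.25                        # P(nucleotide is A) (assumed)
m <- 2000                        # number of simulated sequences (assumed)

# Each row holds the n Bernoulli indicators of one simulated sequence (1 = "A").
ind <- matrix(sample(c(1, 0), n * m, replace = TRUE, prob = c(p, 1 - p)),
              nrow = m)
N <- rowSums(ind)                # number of A's in each simulated sequence

# Compare the sample moments with the theoretical values.
print(c(sample_mean = mean(N), theory = n * p))
print(c(sample_var = var(N), theory = n * p * (1 - p)))
```

With this many replicates the sample mean and variance should land close to 250 and 187.5, though the exact values depend on the seed.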
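For the nucleotide problem:

    pi <- c(0.25, 0.25, 0.25, 0.25)
    x <- c(1, 2, 3, 4)
    seq <- sample(x, 10000, replace = TRUE, prob = pi)
    seq[1:30]
    hist(seq)

To simulate from the Binomial($n = 1000$, $p = 0.25$) distribution we use the rbinom command:

    x <- rbinom(10000, 1000, 0.25)
    x[1:10]
    mean(x)
    sd(x)^2
    hist(x, xlab = "number of successes", main = "binomial Simulation")

Now, suppose in a sequence of length $n = 1000$ we observe 280 A's, and we want to know the associated p-value in a test where the null hypothesis is $p = 0.25$ and the alternative hypothesis is $p > 0.25$; then we wish to find the probability of at least 280 successes in this binomial experiment. Ways to find this are (a) direct calculation from equation (2.20) on p.48, $P(N \ge 280) = \sum_{j=280}^{1000} \binom{1000}{j} 0.25^j \, 0.75^{1000-j}$; (b) approximation using the CLT, $P(N \ge 280) \approx P\left(Z \ge \frac{280 - 250}{\sqrt{187.5}}\right)$, where $np = 250$ and $np(1-p) = 187.5$; (c) using Minitab or a calculator; or (d) simulation (see p.63 exercise 4).

2.5. Biological Words (k=2): let $L_i$ denote the nucleotide at position $i$, so that $L_i L_{i+1}$ is the dinucleotide starting at the same position (reading of course in the 5' to 3' direction); $L_i$ can take on the values A, C, G, T, and $L_i L_{i+1}$ can equal AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT. Dinucleotides are important in part because physical parameters associated with them can describe the trajectory of the DNA helix through space (DNA bending), which may have effects on gene expression.

Next, with $p_{r_1} = P(L_i = r_1)$ and $p_{r_2} = P(L_{i+1} = r_2)$, we can test for independence, i.e. $P(L_i = r_1, L_{i+1} = r_2) = p_{r_1} p_{r_2}$ for all pairs $(r_1, r_2)$, by using a chi-square type statistic. Let $O$ be the observed number of occurrences of the dinucleotide $r_1 r_2$ and $E = (n - 1) p_{r_1} p_{r_2}$ the number expected under independence, and set $X^2 = (O - E)^2 / E$. First, we need to calculate $c$, with

$$c = \begin{cases} 1 + 2 p_{r_1} - 3 p_{r_1}^2, & r_1 = r_2 \\ 1 - 3 p_{r_1} p_{r_2}, & r_1 \ne r_2. \end{cases}$$

Then the relevant test statistic is $X^2 / c$, which is compared with the chi-square distribution with 1 degree of freedom; so, for example, if $X^2 / c > 3.84$ then we conclude that the iid model does not fit for this dinucleotide pair at the 5% level of significance. In practice, frequencies from the data, fr(A), fr(C), fr(G), fr(T), are used to estimate the probabilities. Empirical results are given in Table 2.2 on p.50 for E. coli and Mycoplasma genitalium; clearly, the first 1000 bp for E. coli deviate from the iid model.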
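The dinucleotide test can be sketched in R as follows. This is a minimal sketch under stated assumptions: the function name `dinuc_test` is hypothetical; the statistic is taken to be $X^2/c$ with $X^2 = (O-E)^2/E$, $E = (n-1)\,p_{r_1} p_{r_2}$, and $c = 1 + 2p_{r_1} - 3p_{r_1}^2$ when $r_1 = r_2$ and $c = 1 - 3p_{r_1}p_{r_2}$ otherwise; base probabilities are estimated by the observed frequencies; and all four bases are assumed to occur in the input. The example input is a simulated iid sequence, not real data.

```r
# Hypothetical helper: X^2/c statistic for each of the 16 dinucleotides.
dinuc_test <- function(dna) {
  bases <- c("A", "C", "G", "T")
  s <- strsplit(dna, "")[[1]]
  n <- length(s)
  # Estimated base probabilities fr(A), fr(C), fr(G), fr(T).
  p <- setNames(as.vector(table(factor(s, levels = bases))) / n, bases)
  # Observed dinucleotide counts: rows = first letter, columns = second letter.
  obs <- unclass(table(factor(s[-n], levels = bases),
                       factor(s[-1], levels = bases)))
  res <- expand.grid(r1 = bases, r2 = bases, stringsAsFactors = FALSE)
  p1 <- unname(p[res$r1])
  p2 <- unname(p[res$r2])
  res$O <- obs[cbind(res$r1, res$r2)]
  res$E <- (n - 1) * p1 * p2                      # expected under the iid model
  res$c <- ifelse(res$r1 == res$r2,
                  1 + 2 * p1 - 3 * p1^2,          # r1 == r2
                  1 - 3 * p1 * p2)                # r1 != r2
  res$stat <- (res$O - res$E)^2 / res$E / res$c
  res$reject <- res$stat > qchisq(0.95, df = 1)   # 5% critical value, about 3.84
  res
}

# Illustration on a simulated iid sequence (assumed, not real data).
set.seed(2)
dna <- paste(sample(c("A", "C", "G", "T"), 5000, replace = TRUE), collapse = "")
res <- dinuc_test(dna)
print(res)
```

Since 16 tests are run at once, a few rejections can occur by chance even when the iid model holds.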
2.6. Introduction to Markov Chains: As seen in Table 2.2, the iid model may not be rich enough, and a richer one may be needed; one approach is to use a Markov chain. If at position $t$ of a nucleotide sequence we observe a C, then the nucleotide at the next position may depend upon this information, and we are led to conditional probabilities; often we find that in this case the probability that the next position is G is less than its being an A, C, or T.

Conditional Probability (review): in the introductory notes, we saw that the probability of events A and B (intersection) occurring is $P(A \cap B)$ and the probability of events A or B (union) occurring is $P(A \cup B)$. Then the conditional probability of event A given that B has occurred is

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

Similarly, $P(B \mid A) = P(A \cap B) / P(A)$. An immediate consequence of this is the multiplication rule: $P(A \cap B) = P(A \mid B) P(B) = P(B \mid A) P(A)$. Bayes' Theorem follows directly:

$$P(B \mid A) = \frac{P(A \mid B) P(B)}{P(A)}. \quad (*)$$

Also, let $B_1, B_2, \ldots, B_k$ form a partition of the sample space $S$ (so that all are disjoint, $B_i \cap B_j = \emptyset$ for $i \ne j$, and exhaustive, $B_1 \cup \cdots \cup B_k = S$); then $P(A) = \sum_i P(A \mid B_i) P(B_i)$, so (*) can be rewritten

$$P(B_j \mid A) = \frac{P(A \mid B_j) P(B_j)}{\sum_i P(A \mid B_i) P(B_i)}.$$

The Markov Property: the sequence $\{X_t\}$ is called a first-order Markov chain if the probability of finding a particular character at position $t + 1$, given the preceding characters at positions $t, t - 1, \ldots, 0$, is identical to the probability of observing the character at position $t + 1$ given the character state of the immediately preceding position, $t$. Formally,

$$P(X_{t+1} = x_{t+1} \mid X_t = x_t, \ldots, X_0 = x_0) = P(X_{t+1} = x_{t+1} \mid X_t = x_t).$$

Markov chains of order 2 correspond to the case where the conditional probability of the present position depends on the previous 2 positions, and so on. We also consider only Markov chains that are homogeneous: the above property doesn't depend upon $t$ (the position along the nucleotide sequence or segment).

Next, let $p_{ij} = P(X_{t+1} = j \mid X_t = i)$ be the one-step transition probabilities for $i, j \in \mathcal{X}$; for the nucleotide problem, $\mathcal{X} = \{A, C, G, T\}$. For the nucleotide problem, these probabilities are given in the one-step transition matrix:

$$P = \begin{pmatrix} p_{AA} & p_{AC} & p_{AG} & p_{AT} \\ p_{CA} & p_{CC} & p_{CG} & p_{CT} \\ p_{GA} & p_{GC} & p_{GG} & p_{GT} \\ p_{TA} & p_{TC} & p_{TG} & p_{TT} \end{pmatrix}$$

Each row sums to one, so $\sum_j p_{ij} = 1$; this follows since after each position there must be one character from $\mathcal{X}$ next in the sequence, and it can be written in the nucleotide case as

$$P \, [1 \; 1 \; 1 \; 1]^T = [1 \; 1 \; 1 \; 1]^T.$$

Next, we have to indicate how the Markov chain begins; for this, we let the initial probability distribution $\pi$ have elements $\pi_i = P(X_0 = i)$, $i \in \mathcal{X}$.
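Combining the multiplication rule with the Markov property, the probability of a whole sequence under a first-order chain is the initial probability of its first character times the product of the one-step transition probabilities along it. The sketch below illustrates this; the transition matrix and initial distribution are made-up values chosen only so that each row sums to one, and `chain_logprob` is a hypothetical helper name.

```r
bases <- c("A", "C", "G", "T")

# Illustrative (made-up) one-step transition matrix:
# rows = current base, columns = next base.
P <- matrix(c(0.4, 0.1, 0.2, 0.3,
              0.3, 0.2, 0.1, 0.4,
              0.3, 0.2, 0.2, 0.3,
              0.2, 0.2, 0.2, 0.4),
            nrow = 4, byrow = TRUE, dimnames = list(bases, bases))
init <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)  # assumed initial distribution

stopifnot(all(abs(rowSums(P) - 1) < 1e-12))        # each row sums to one

# log P(x_0, ..., x_n) = log pi(x_0) + sum over t of log p(x_t, x_{t+1})
chain_logprob <- function(dna, init, P) {
  s <- strsplit(dna, "")[[1]]
  n <- length(s)
  lp <- log(init[[s[1]]])
  if (n > 1) lp <- lp + sum(log(P[cbind(s[-n], s[-1])]))
  lp
}

# P(ACG) = pi_A * p_AC * p_CG under the illustrative values above.
print(exp(chain_logprob("ACG", init, P)))
```

Working on the log scale avoids numerical underflow when the sequence is long.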
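Also, at time 1, we have $P(X_1 = j) = \sum_i \pi_i p_{ij}$; in matrix terms, the distribution at time 1 is $\pi P$. In the nucleotide case, the right hand side is

$$[\pi_A \; \pi_C \; \pi_G \; \pi_T] \begin{pmatrix} p_{AA} & p_{AC} & p_{AG} & p_{AT} \\ p_{CA} & p_{CC} & p_{CG} & p_{CT} \\ p_{GA} & p_{GC} & p_{GG} & p_{GT} \\ p_{TA} & p_{TC} & p_{TG} & p_{TT} \end{pmatrix}$$

The result is also a $1 \times 4$ matrix (vector), and the first element is $\pi_A p_{AA} + \pi_C p_{CA} + \pi_G p_{GA} + \pi_T p_{TA}$. On p.53 (text), the authors show that the transition matrix from time $t$ to time $t + 2$ is given by $P^2$, and so on, which means that the distribution at time $n$ is $\pi P^n$. A stationary distribution for the chain is one for which the distribution is the same at every time, so in matrix form: $\pi = \pi P$.

A Markov Chain Simulation: the notes next give the 4 x 4 table of dinucleotide frequencies of M. genitalium (rows: first nucleotide A, C, G, T; columns: second nucleotide A, C, G, T; the numerical entries did not survive transcription). Summing across the rows, we obtain empirical estimates of the base frequencies, and then dividing through by them (e.g. $p_{AA} = fr(AA)/fr(A)$), we get the estimated 4 x 4 one-step transition matrix (rows and columns again ordered A, C, G, T; entries likewise not preserved).

We can also assume that the initial distribution is (as above) $\pi = (0.342, 0.158, 0.158, 0.342)$.

Now we can simulate (see Computational Example 2.3 on p.55):

    markov1 <- function(x, pi, P, n) {
      # x: states (1 = A, 2 = C, 3 = G, 4 = T); pi: initial distribution;
      # P: one-step transition matrix; n: length of the simulated chain
      mg <- rep(0, n)
      mg[1] <- sample(x, 1, replace = TRUE, prob = pi)
      for (k in 1:(n - 1)) {
        # the next state is drawn from the row of P for the current state
        mg[k + 1] <- sample(x, 1, replace = TRUE, prob = P[mg[k], ])
      }
      return(mg)
    }
    x <- c(1:4)
    pi <- c(0.342, 0.158, 0.158, 0.342)
    # scan() reads the 16 transition probabilities from the console, row by row
    P <- matrix(scan(), ncol = 4, nrow = 4, byrow = TRUE)
    tmp <- markov1(x, pi, P, 50000)
    # proportion of each nucleotide in the simulated chain
    length(tmp[tmp[] == 1]) / length(tmp)
    length(tmp[tmp[] == 2]) / length(tmp)
    length(tmp[tmp[] == 3]) / length(tmp)
    length(tmp[tmp[] == 4]) / length(tmp)

Also, we used the above to simulate the dinucleotide distribution (not shown); since it matches the original, we conclude as in the text (p.57) that the Markov model provides a good probabilistic description of the data for M. genitalium DNA.

2.7. Biological Words with k=3 (Codons): We'll discuss two measures here relating codons to amino acids, with an eye to distinguishing coding from non-coding genetic regions. First, we'll compare observed codon frequencies to those which would be expected under independence. To illustrate using E. coli, take the base frequencies $p_A, p_C, p_G, p_T$ from Table 2.1 (p.39); under independence, $P(TTT) = p_T^3$ and $P(TTC) = p_T^2 p_C$. These are the only two codons coding for the amino acid Phe (phenylalanine), so under independence we expect the proportion of TTT among Phe codons to be $P(TTT) / (P(TTT) + P(TTC)) = p_T / (p_T + p_C)$. Similar results are given in the Predicted column in Table 2.3, and these results are compared with the empirical results for two gene classes (I and II). Médigue et al. then noted where there were big differences, investigated, and found that Class II genes were largely those, such as ribosomal proteins or translation factors, that are expressed at high levels, whereas Class I genes were mostly those that are expressed at moderate levels.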
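The steps of the Markov chain example (simulate a chain, tabulate dinucleotides, divide by row sums to estimate the transition matrix, and compare base frequencies with the stationary distribution) can be sketched end to end as below. The transition matrix is a made-up illustration, not the M. genitalium estimates, and the seed, chain length and starting state are arbitrary assumptions.

```r
set.seed(3)                                   # arbitrary seed
bases <- c("A", "C", "G", "T")

# Illustrative (made-up) transition matrix; NOT the M. genitalium estimates.
P <- matrix(c(0.4, 0.1, 0.2, 0.3,
              0.3, 0.2, 0.1, 0.4,
              0.3, 0.2, 0.2, 0.3,
              0.2, 0.2, 0.2, 0.4),
            nrow = 4, byrow = TRUE, dimnames = list(bases, bases))

# Simulate a chain of length n (states coded 1 = A, 2 = C, 3 = G, 4 = T).
n <- 20000
s <- integer(n)
s[1] <- sample(1:4, 1)                        # uniform start (assumed)
for (k in 1:(n - 1)) s[k + 1] <- sample(1:4, 1, prob = P[s[k], ])

# Dinucleotide counts, then divide each row by its row sum to estimate P.
counts <- table(factor(s[-n], levels = 1:4), factor(s[-1], levels = 1:4))
P_hat <- unclass(counts) / rowSums(counts)
dimnames(P_hat) <- list(bases, bases)

# Stationary distribution by repeatedly applying pi <- pi P.
stat <- rep(0.25, 4)
for (i in 1:200) stat <- as.vector(stat %*% P)
names(stat) <- bases

print(round(P_hat, 3))                        # should be close to P
print(round(stat, 3))                         # satisfies stat = stat %*% P
print(round(tabulate(s, 4) / n, 3))           # simulated base frequencies
```

Power iteration is used here only because it mirrors the $\pi P^n$ formula; solving $\pi = \pi P$ through an eigen-decomposition would be an equally reasonable choice.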
Another measure is the CAI (codon adaptation index):

$$CAI = \left[ \prod_{k=1}^{n} \frac{p_k}{q_k} \right]^{1/n}$$

where, for the $k$-th of the $n$ codons in the gene, $p_k$ is the probability of the codon actually used and $q_k$ is the probability of the most frequently used codon for the same amino acid, both taken from highly expressed genes. That is, the CAI is the geometric mean of the ratios of the probabilities for the codons actually used to the probabilities of the codons most frequently used in highly expressed genes. It is illustrated on p.59, again using E. coli. In E. coli, a sample of 500 protein-coding genes displayed CAI values in the range from 0.2 to … . Further, there is a correlation between the CAI and mRNA levels. In other words, the CAI for a gene sequence in genomic DNA provides a first approximation of its expression level: if the CAI is relatively large, then we would predict that the expression level is also large.
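Since the CAI is a geometric mean of ratios, it is convenient to compute it on the log scale. The sketch below assumes two equal-length vectors: `p`, the probabilities of the codons actually used, and `q`, the probabilities of the most frequently used synonymous codons in highly expressed genes. The function name `cai` and the numbers are made up for illustration and are not taken from the E. coli tables.

```r
# Hypothetical helper: codon adaptation index as a geometric mean of p_k / q_k.
cai <- function(p, q) {
  stopifnot(length(p) == length(q), all(p > 0), all(p <= q))
  exp(mean(log(p / q)))          # geometric mean, computed on the log scale
}

# Made-up probabilities for a four-codon "gene" (illustration only).
p <- c(0.20, 0.75, 0.10, 0.60)   # codons actually used
q <- c(0.80, 0.75, 0.50, 0.60)   # most frequently used synonymous codons

print(cai(p, q))
print(cai(q, q))                 # a gene using only the preferred codons
```

A gene that always uses the preferred codon has every ratio equal to 1, so its CAI is 1; rarer codon choices pull the index toward 0.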
More informationModel Building Chap 5 p251
Model Building Chap 5 p251 Models with one qualitative variable, 5.7 p277 Example 4 Colours : Blue, Green, Lemon Yellow and white Row Blue Green Lemon Insects trapped 1 0 0 1 45 2 0 0 1 59 3 0 0 1 48 4
More information15.2 Prokaryotic Transcription *
OpenStax-CNX module: m52697 1 15.2 Prokaryotic Transcription * Shannon McDermott Based on Prokaryotic Transcription by OpenStax This work is produced by OpenStax-CNX and licensed under the Creative Commons
More informationChapter 4: Probability and Probability Distributions
Chapter 4: Probability and Probability Distributions 4.1 How Probability Can Be Used in Making Inferences 4.1 a. Subjective probability b. Relative frequency c. Classical d. Relative frequency e. Subjective
More informationGene Regula*on, ChIP- X and DNA Mo*fs. Statistics in Genomics Hongkai Ji
Gene Regula*on, ChIP- X and DNA Mo*fs Statistics in Genomics Hongkai Ji (hji@jhsph.edu) Genetic information is stored in DNA TCAGTTGGAGCTGCTCCCCCACGGCCTCTCCTCACATTCCACGTCCTGTAGCTCTATGACCTCCACCTTTGAGTCCCTCCTC
More informationGlossary. The ISI glossary of statistical terms provides definitions in a number of different languages:
Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the
More informationUnit 3 - Molecular Biology & Genetics - Review Packet
Name Date Hour Unit 3 - Molecular Biology & Genetics - Review Packet True / False Questions - Indicate True or False for the following statements. 1. Eye color, hair color and the shape of your ears can
More information6.1 Moment Generating and Characteristic Functions
Chapter 6 Limit Theorems The power statistics can mostly be seen when there is a large collection of data points and we are interested in understanding the macro state of the system, e.g., the average,
More informationBioinformatics and BLAST
Bioinformatics and BLAST Overview Recap of last time Similarity discussion Algorithms: Needleman-Wunsch Smith-Waterman BLAST Implementation issues and current research Recap from Last Time Genome consists
More informationNorthwestern University Department of Electrical Engineering and Computer Science
Northwestern University Department of Electrical Engineering and Computer Science EECS 454: Modeling and Analysis of Communication Networks Spring 2008 Probability Review As discussed in Lecture 1, probability
More informationStochastic processes and
Stochastic processes and Markov chains (part II) Wessel van Wieringen w.n.van.wieringen@vu.nl wieringen@vu nl Department of Epidemiology and Biostatistics, VUmc & Department of Mathematics, VU University
More informationLinear Independence x
Linear Independence A consistent system of linear equations with matrix equation Ax = b, where A is an m n matrix, has a solution set whose graph in R n is a linear object, that is, has one of only n +
More informationStatistics and Sampling distributions
Statistics and Sampling distributions a statistic is a numerical summary of sample data. It is a rv. The distribution of a statistic is called its sampling distribution. The rv s X 1, X 2,, X n are said
More informationBE/APh161: Physical Biology of the Cell Homework 2 Due Date: Wednesday, January 18, 2017
BE/APh161: Physical Biology of the Cell Homework 2 Due Date: Wednesday, January 18, 2017 Doubt is the father of creation. - Galileo Galilei 1. Number of mrna in a cell. In this problem, we are going to
More informationLecture Notes 7 Random Processes. Markov Processes Markov Chains. Random Processes
Lecture Notes 7 Random Processes Definition IID Processes Bernoulli Process Binomial Counting Process Interarrival Time Process Markov Processes Markov Chains Classification of States Steady State Probabilities
More informationRanjit P. Bahadur Assistant Professor Department of Biotechnology Indian Institute of Technology Kharagpur, India. 1 st November, 2013
Hydration of protein-rna recognition sites Ranjit P. Bahadur Assistant Professor Department of Biotechnology Indian Institute of Technology Kharagpur, India 1 st November, 2013 Central Dogma of life DNA
More informationLecture 4: Random Variables and Distributions
Lecture 4: Random Variables and Distributions Goals Random Variables Overview of discrete and continuous distributions important in genetics/genomics Working with distributions in R Random Variables A
More informationQuantitative Methods Chapter 0: Review of Basic Concepts 0.1 Business Applications (II) 0.2 Business Applications (III)
Quantitative Methods Chapter 0: Review of Basic Concepts 0.1 Business Applications (II) 0.1.1 Simple Interest 0.2 Business Applications (III) 0.2.1 Expenses Involved in Buying a Car 0.2.2 Expenses Involved
More information