Statistics of Random DNA


Arunima Ray, Aaron Young
SUNY Geneseo Biomathematics Group

The Aim
To obtain the expectation and variance of the enthalpy change ΔH, the entropy change ΔS and the free energy change ΔG for a random n-mer of DNA.

Background: Why do we care?
Knowledge of the thermodynamic properties of DNA is essential in all applications of DNA that involve structure-dependent hybridization. These properties measure the stability of DNA, are useful for understanding biological processes, and are needed to design DNA applications such as PCR and microarrays accurately.

Background: Some thermodynamics
Enthalpy change ΔH is the heat absorbed or released by a system at constant pressure; this is sometimes referred to as the heat of a reaction. Entropy change ΔS is a measure of the change in the degree of disorder of a thermodynamic system. Gibbs free energy change ΔG is defined as ΔH - TΔS. At constant temperature and pressure it gives the maximum "useful" (non-expansion) work obtainable from a closed system, and it provides a measure of the spontaneity of a reaction and the stability of the product: the more negative the value of ΔG, the more stable the product.
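Because ΔH is tabulated in kcal/mol and ΔS in cal/(mol·K), ΔS has to be divided by 1000 before the two are combined. A minimal Python sketch of ΔG = ΔH - TΔS (the function name `gibbs_free_energy` is illustrative, not from the poster), evaluated for the AA stack (ΔH = -7.9 kcal/mol, ΔS = -22.2 cal/(mol·K)) at an assumed temperature of 310 K:

```python
def gibbs_free_energy(dh_kcal: float, ds_cal: float, temp_k: float) -> float:
    """Return dG = dH - T*dS in kcal/mol.

    dh_kcal: enthalpy change in kcal/mol
    ds_cal:  entropy change in cal/(mol*K); divided by 1000 to get kcal/(mol*K)
    temp_k:  absolute temperature in kelvin
    """
    return dh_kcal - temp_k * ds_cal / 1000.0


# AA stack at 310 K (about 37 degrees C)
dg_aa = gibbs_free_energy(-7.9, -22.2, 310.0)
print(f"dG(AA, 310 K) = {dg_aa:.3f} kcal/mol")
```

The result (about -1.018 kcal/mol) is negative, so by the criterion above this single stack is favorable at 310 K.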

Background
Duplex DNA is formed by complementary hydrogen bonds between nitrogenous bases. The stability of DNA is maintained by interactions between pairs of consecutive bases (π-stacking), which are favorable in part because of an increase in entropy as a result of stacking. This is the basis of the nearest-neighbor model.

Nearest-Neighbor Model: The Basics
Pioneered by Zimm, and by Tinoco and coworkers. The total ΔH, ΔS and ΔG of a strand are equal to the sums of the ΔH, ΔS and ΔG of the stack pairs present in the strand. For oligonucleotides, there are additional corrections for different types of ends, symmetry, etc. For details, see "A Unified View of Polymer, Dumbbell, and Oligonucleotide DNA Nearest-Neighbor Thermodynamics" by John SantaLucia Jr.

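Unified oligonucleotide parameters, 1 M NaCl (Allawi, H. T. and SantaLucia, J. Jr., 1997, Biochemistry 36, 10581-10594)

Sequence 5'-3' | ΔH (kcal/mol) | ΔS (cal/(mol·K))
AA | -7.9 | -22.2
AC | -8.4 | -22.4
AG | -7.8 | -21.0
AT | -7.2 | -20.4
CA | -8.5 | -22.7
CC | -8.0 | -19.9
CG | -10.6 | -27.2
CT | -7.8 | -21.0
GA | -8.2 | -22.2
GC | -9.8 | -24.4
GG | -8.0 | -19.9
GT | -8.4 | -22.4
TA | -7.2 | -21.3
TC | -8.2 | -22.2
TG | -8.5 | -22.7
TT | -7.9 | -22.2
Terminal GC | 0.1 | -2.8
Terminal AT | 2.3 | 4.1
Symmetry correction | 0 | -1.4
Expected value per stack, p = [0.25, 0.25, 0.25, 0.25] | -8.275 | -22.13
Expected value per stack, p = [0.7, 0.1, 0.1, 0.1] | -7.954 | -21.997
Expected value per stack, p = [0.1, 0.1, 0.7, 0.1] | -7.852 | -20.125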
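The 16 stack rows of the table carry only 10 distinct pairs of values: a stack 5'-XY-3' pairs with its reverse complement on the other strand (for example AC with GT), so both describe the same duplex step. A Python sketch, under that assumption, that rebuilds all 16 rows from the 10 unique pairs; `UNIQUE_NN` and `full_stack_table` are hypothetical names:

```python
# The 10 unique Watson-Crick nearest-neighbor stacks (unified parameters,
# 1 M NaCl): 5'->3' dinucleotide -> (dH kcal/mol, dS cal/(mol*K)).
UNIQUE_NN = {
    "AA": (-7.9, -22.2), "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "GT": (-8.4, -22.4), "CT": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9),
}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string, read 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def full_stack_table(unique: dict) -> dict:
    """Expand the unique stacks to all 16 dinucleotides.

    A stack missing from `unique` takes the parameters of its reverse
    complement, which is the same duplex step read from the other strand.
    """
    table = {}
    for x in "ACGT":
        for y in "ACGT":
            pair = x + y
            key = pair if pair in unique else reverse_complement(pair)
            table[pair] = unique[key]
    return table


NN = full_stack_table(UNIQUE_NN)
print(NN["AC"], NN["TG"])  # same values as GT and CA respectively
```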

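Nearest-Neighbor Calculations
Consider the 8-mer
5'-A-G-G-C-T-T-C-A-3'
3'-T-C-C-G-A-A-G-T-5'
The ΔH of the total strand is given by
$$\Delta H_{\text{duplex}} = \Delta H_{AG} + \Delta H_{GG} + \Delta H_{GC} + \Delta H_{CT} + \Delta H_{TT} + \Delta H_{TC} + \Delta H_{CA} = (-7.8 - 8.0 - 9.8 - 7.8 - 7.9 - 8.2 - 8.5)\ \text{kcal/mol} = -58.0\ \text{kcal/mol}$$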
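The worked example is a sum over consecutive dinucleotides of one strand. A short Python sketch that reproduces it from the stack table (stack terms only, no terminal or symmetry corrections; `stack_sum` is a hypothetical helper) and also gives the corresponding ΔS:

```python
# Unified nearest-neighbor parameters, keyed by the 5'->3' stack on the top
# strand: (dH in kcal/mol, dS in cal/(mol*K)).
NN = {
    "AA": (-7.9, -22.2), "AC": (-8.4, -22.4), "AG": (-7.8, -21.0), "AT": (-7.2, -20.4),
    "CA": (-8.5, -22.7), "CC": (-8.0, -19.9), "CG": (-10.6, -27.2), "CT": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "GC": (-9.8, -24.4), "GG": (-8.0, -19.9), "GT": (-8.4, -22.4),
    "TA": (-7.2, -21.3), "TC": (-8.2, -22.2), "TG": (-8.5, -22.7), "TT": (-7.9, -22.2),
}


def stack_sum(seq: str, index: int) -> float:
    """Sum one parameter (0 = dH, 1 = dS) over the len(seq) - 1 stacks of seq.

    Only stack terms are included; terminal and symmetry corrections are ignored.
    """
    return sum(NN[seq[i:i + 2]][index] for i in range(len(seq) - 1))


top = "AGGCTTCA"  # 5'->3' strand of the 8-mer
dh = stack_sum(top, 0)
ds = stack_sum(top, 1)
print(f"dH = {dh:.1f} kcal/mol, dS = {ds:.1f} cal/(mol*K)")
```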

At Present
We can compute ΔH, ΔS and ΔG for a single strand with some accuracy using nearest-neighbor (N-N) model calculations. However, many applications concern libraries of DNA rather than a single type of strand, and computing ΔH, ΔS and ΔG for each strand in the library is costly and time-intensive.

So
In applications involving large numbers of DNA molecules, it is desirable to have an idea of the range of ΔH, ΔS and ΔG. Such calculations should involve only the probability distribution of the four bases in DNA, and should be possible for any length of the molecule and any temperature.

How?
We know the values of ΔH and ΔS for any given stack pair. It has been shown experimentally that these values do not change appreciably with temperature over a range of more than 100 degrees.

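How?
Let X_i be the random variable representing ΔH for the stack pair at the i-th position in an n-mer of DNA (i = 1, ..., n - 1). Given the probability distribution of the bases in DNA, P = [p_A, p_C, p_G, p_T], and assuming that the bases at different positions are drawn independently, the probability of finding the stack pair 5'-XY-3' is p_X p_Y. Once we know this, we can calculate
$$E[X_i] = \sum_{X,Y \in \{A,C,G,T\}} p_X\,p_Y\,\Delta H_{XY}.$$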
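Assuming the bases at different positions are independent draws from P, the 16 stack probabilities and E[X_i] follow directly. A Python sketch (function names are illustrative; only the ΔH column of the unified table is used):

```python
from itertools import product

# dH (kcal/mol) of each 5'->3' stack, unified nearest-neighbor parameters
NN_DH = {
    "AA": -7.9, "AC": -8.4, "AG": -7.8, "AT": -7.2,
    "CA": -8.5, "CC": -8.0, "CG": -10.6, "CT": -7.8,
    "GA": -8.2, "GC": -9.8, "GG": -8.0, "GT": -8.4,
    "TA": -7.2, "TC": -8.2, "TG": -8.5, "TT": -7.9,
}


def stack_probabilities(p: dict) -> dict:
    """P(stack 5'-XY-3') = p[X] * p[Y], assuming independent bases."""
    return {x + y: p[x] * p[y] for x, y in product("ACGT", repeat=2)}


def expected_stack_value(p: dict, values: dict) -> float:
    """E[X_i] = sum over the 16 stacks of P(XY) * value(XY)."""
    probs = stack_probabilities(p)
    return sum(probs[s] * values[s] for s in probs)


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
a_rich = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}
print(expected_stack_value(uniform, NN_DH))  # about -8.275
print(expected_stack_value(a_rich, NN_DH))   # about -7.954
```

For the uniform distribution this gives -8.275 kcal/mol per stack, and for [0.7, 0.1, 0.1, 0.1] it gives -7.954 kcal/mol.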

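How?
First we compute the expectation of ΔH. For a single strand, ΔH is obtained by adding the ΔH values of the stack pairs present, ΔH = X_1 + X_2 + ... + X_{n-1}. By linearity of expectation, and because every stack has the same distribution,
$$E[\Delta H] = E[X_1 + X_2 + X_3 + \cdots + X_{n-1}] = E[X_1] + E[X_2] + E[X_3] + \cdots + E[X_{n-1}] = (n-1)\,E[X_1].$$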

Summary
E[ΔH] can be easily computed from published values of ΔH, since we know the probability of finding each stack pair. For a uniform distribution of bases, E[ΔH] for an 8-mer of DNA is (8 - 1) × (-8.275) = -57.925 kcal/mol. We have a procedure in Maple 10 that takes the distribution of bases and the length of the DNA, and computes E[ΔH], and similarly E[ΔS] and E[ΔG].
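The following is not the authors' Maple procedure but a Python sketch of the same expectation calculation under the assumptions above (independent bases, stack terms only); `expected_nmer` is a hypothetical name. E[ΔG] follows from E[ΔH] and E[ΔS] by linearity:

```python
from itertools import product

# Unified nearest-neighbor parameters: stack -> (dH kcal/mol, dS cal/(mol*K))
NN = {
    "AA": (-7.9, -22.2), "AC": (-8.4, -22.4), "AG": (-7.8, -21.0), "AT": (-7.2, -20.4),
    "CA": (-8.5, -22.7), "CC": (-8.0, -19.9), "CG": (-10.6, -27.2), "CT": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "GC": (-9.8, -24.4), "GG": (-8.0, -19.9), "GT": (-8.4, -22.4),
    "TA": (-7.2, -21.3), "TC": (-8.2, -22.2), "TG": (-8.5, -22.7), "TT": (-7.9, -22.2),
}


def expected_nmer(p: dict, n: int, temp_k: float) -> tuple:
    """Return (E[dH] kcal/mol, E[dS] cal/(mol*K), E[dG] kcal/mol) for a random n-mer.

    Assumes independent bases with probabilities p and sums only the n - 1
    stack terms (no terminal or symmetry corrections).
    """
    e_dh = e_ds = 0.0
    for x, y in product("ACGT", repeat=2):
        prob = p[x] * p[y]  # probability of the stack 5'-xy-3'
        dh, ds = NN[x + y]
        e_dh += prob * dh
        e_ds += prob * ds
    stacks = n - 1  # an n-mer has n - 1 stacks
    e_dh *= stacks
    e_ds *= stacks
    # dG = dH - T*dS is linear, so E[dG] = E[dH] - T*E[dS] (dS converted to kcal)
    e_dg = e_dh - temp_k * e_ds / 1000.0
    return e_dh, e_ds, e_dg


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
e_dh, e_ds, e_dg = expected_nmer(uniform, 8, 310.0)
print(f"E[dH] = {e_dh:.3f}, E[dS] = {e_ds:.3f}, E[dG] = {e_dg:.3f}")
```

For the uniform distribution and n = 8 this gives E[ΔH] = -57.925 kcal/mol, i.e. 7 × (-8.275).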

How?
Now we need to obtain the variance of ΔH for an n-mer of DNA. For a single strand, ΔH is obtained by adding the ΔH values of the stack pairs present, so we wish to obtain
$$\mathrm{Var}(X_1 + X_2 + X_3 + X_4 + \cdots + X_{n-1}).$$
Strictly, the calculation should also account for terminal pairs and symmetry, but these can be neglected for our purposes.

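How?
$$\mathrm{Var}(X_1 + X_2 + \cdots + X_{n-1}) = E\left[\left(\sum_{i=1}^{n-1} X_i\right)^2\right] - \left(E\left[\sum_{i=1}^{n-1} X_i\right]\right)^2 = \sum_{i=1}^{n-1}\sum_{j=1}^{n-1}\left(E[X_iX_j] - E[X_i]\,E[X_j]\right)$$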

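How?
The important step: if |i - j| > 1, then X_i and X_j are independent, because the two stacks share no base and the bases are drawn independently. Thus E[X_iX_j] = E[X_i]E[X_j], and so E[X_iX_j] - E[X_i]E[X_j] = 0.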
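The independence claim can be checked exactly for a short strand by enumerating every sequence. A Python sketch (the helper `exact_stack_covariance` is hypothetical) computing Cov(X_1, X_3) and Cov(X_1, X_2) for a random 4-mer with uniformly distributed bases: the first is zero up to floating-point rounding, the second is not.

```python
from itertools import product

# dH (kcal/mol) of each 5'->3' stack, unified nearest-neighbor parameters
NN_DH = {
    "AA": -7.9, "AC": -8.4, "AG": -7.8, "AT": -7.2,
    "CA": -8.5, "CC": -8.0, "CG": -10.6, "CT": -7.8,
    "GA": -8.2, "GC": -9.8, "GG": -8.0, "GT": -8.4,
    "TA": -7.2, "TC": -8.2, "TG": -8.5, "TT": -7.9,
}


def exact_stack_covariance(i: int, j: int, n: int, p: dict) -> float:
    """Exact Cov(X_i, X_j) for stacks i, j (1-indexed) of a random n-mer.

    Stack i covers bases i and i + 1, i.e. NN_DH[seq[i-1] + seq[i]] with
    0-indexed seq. Every one of the 4**n sequences is weighted by the
    product of its base probabilities.
    """
    e_i = e_j = e_ij = 0.0
    for seq in product("ACGT", repeat=n):
        prob = 1.0
        for base in seq:
            prob *= p[base]
        x_i = NN_DH[seq[i - 1] + seq[i]]
        x_j = NN_DH[seq[j - 1] + seq[j]]
        e_i += prob * x_i
        e_j += prob * x_j
        e_ij += prob * x_i * x_j
    return e_ij - e_i * e_j


uniform = {b: 0.25 for b in "ACGT"}
print(exact_stack_covariance(1, 3, 4, uniform))  # non-adjacent: 0 up to rounding
print(exact_stack_covariance(1, 2, 4, uniform))  # adjacent: nonzero
```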

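How?
Only the diagonal terms (i = j) and the adjacent terms (|i - j| = 1) survive:
$$\mathrm{Var}(X_1 + \cdots + X_{n-1}) = \sum_{i=1}^{n-1}\left(E[X_i^2] - E[X_i]^2\right) + 2\sum_{i=1}^{n-2}\left(E[X_iX_{i+1}] - E[X_i]\,E[X_{i+1}]\right) = \sum_{i=1}^{n-1}\mathrm{Var}(X_i) + 2\sum_{i=1}^{n-2}\mathrm{Cov}(X_i, X_{i+1})$$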

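How?
We thus have a formula. Since every stack has the same distribution, and so does every pair of adjacent stacks,
$$\mathrm{Var}(X_1 + X_2 + X_3 + X_4 + \cdots + X_{n-1}) = (n-1)\,\mathrm{Var}(X_1) + 2(n-2)\,\mathrm{Cov}(X_1, X_2).$$
We used Maple 10 to compute the variance in ΔH using the values for each stack pair as given by SantaLucia. The same procedure can be used to compute the variance of ΔS.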
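A Python sketch of the closed-form variance, with a brute-force check that enumerates all 4^n sequences for a short strand; function names are hypothetical, bases are assumed independent, and only stack terms are included:

```python
from itertools import product

BASES = "ACGT"
# dH (kcal/mol) of each 5'->3' stack, unified nearest-neighbor parameters
NN_DH = {
    "AA": -7.9, "AC": -8.4, "AG": -7.8, "AT": -7.2,
    "CA": -8.5, "CC": -8.0, "CG": -10.6, "CT": -7.8,
    "GA": -8.2, "GC": -9.8, "GG": -8.0, "GT": -8.4,
    "TA": -7.2, "TC": -8.2, "TG": -8.5, "TT": -7.9,
}


def stack_moments(p: dict, values: dict) -> tuple:
    """Return (E[X], Var(X_1), Cov(X_1, X_2)) for one per-stack quantity."""
    mean = sum(p[x] * p[y] * values[x + y] for x, y in product(BASES, repeat=2))
    second = sum(p[x] * p[y] * values[x + y] ** 2 for x, y in product(BASES, repeat=2))
    # Adjacent stacks XY and YZ share the middle base Y.
    adjacent = sum(
        p[x] * p[y] * p[z] * values[x + y] * values[y + z]
        for x, y, z in product(BASES, repeat=3)
    )
    return mean, second - mean ** 2, adjacent - mean ** 2


def variance_nmer(p: dict, values: dict, n: int) -> float:
    """Var(X_1 + ... + X_{n-1}) = (n-1) Var(X_1) + 2(n-2) Cov(X_1, X_2)."""
    _, var, cov = stack_moments(p, values)
    return (n - 1) * var + 2 * (n - 2) * cov


def variance_bruteforce(p: dict, values: dict, n: int) -> float:
    """Exact variance by enumerating all 4**n sequences (for checking only)."""
    first = second = 0.0
    for seq in product(BASES, repeat=n):
        prob = 1.0
        for base in seq:
            prob *= p[base]
        total = sum(values[seq[i] + seq[i + 1]] for i in range(n - 1))
        first += prob * total
        second += prob * total ** 2
    return second - first ** 2


uniform = {b: 0.25 for b in BASES}
print(variance_nmer(uniform, NN_DH, 8))
print(variance_nmer(uniform, NN_DH, 6), variance_bruteforce(uniform, NN_DH, 6))
```

For the uniform distribution and n = 8 the formula gives 6.598125 (kcal/mol)^2 for Var(ΔH), and the closed-form and brute-force values agree for n = 6.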

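How?
We also know that ΔG = ΔH - TΔS. Since this is a linear combination, and ΔH and ΔS are correlated (both are sums over the same stack pairs),
$$\mathrm{Var}(\Delta G) = \mathrm{Var}(\Delta H) + T^2\,\mathrm{Var}(\Delta S) - 2T\,\mathrm{Cov}(\Delta H, \Delta S),$$
where ΔS is expressed in kcal/(mol·K) so that every term is in (kcal/mol)^2. Cov(ΔH, ΔS) is obtained by the same argument as the variance: only identical and adjacent stacks contribute.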
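A convenient way to include the covariance term is to form ΔG for each stack and reuse the variance formula, since the variance of a sum of per-stack (ΔH - TΔS) values expands to exactly the three terms above. A Python sketch (names are hypothetical; T = 310 K and independent bases assumed) comparing this with the value obtained by dropping the covariance term:

```python
from itertools import product

BASES = "ACGT"
# Unified nearest-neighbor parameters: stack -> (dH kcal/mol, dS cal/(mol*K))
NN = {
    "AA": (-7.9, -22.2), "AC": (-8.4, -22.4), "AG": (-7.8, -21.0), "AT": (-7.2, -20.4),
    "CA": (-8.5, -22.7), "CC": (-8.0, -19.9), "CG": (-10.6, -27.2), "CT": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "GC": (-9.8, -24.4), "GG": (-8.0, -19.9), "GT": (-8.4, -22.4),
    "TA": (-7.2, -21.3), "TC": (-8.2, -22.2), "TG": (-8.5, -22.7), "TT": (-7.9, -22.2),
}


def variance_nmer(p: dict, values: dict, n: int) -> float:
    """(n-1) Var(X_1) + 2(n-2) Cov(X_1, X_2) for per-stack values."""
    mean = sum(p[x] * p[y] * values[x + y] for x, y in product(BASES, repeat=2))
    second = sum(p[x] * p[y] * values[x + y] ** 2 for x, y in product(BASES, repeat=2))
    adjacent = sum(p[x] * p[y] * p[z] * values[x + y] * values[y + z]
                   for x, y, z in product(BASES, repeat=3))
    return (n - 1) * (second - mean ** 2) + 2 * (n - 2) * (adjacent - mean ** 2)


def dg_variances(p: dict, n: int, temp_k: float) -> tuple:
    """Return (Var(dG) with the covariance term, Var(dH) + T^2 Var(dS) without it)."""
    t = temp_k / 1000.0  # t * dS (cal/(mol*K)) is then in kcal/mol
    dh = {s: v[0] for s, v in NN.items()}
    ds = {s: v[1] for s, v in NN.items()}
    # Per-stack dG; the variance of its sum automatically includes -2T Cov(dH, dS).
    dg = {s: dh[s] - t * ds[s] for s in NN}
    with_cov = variance_nmer(p, dg, n)
    without_cov = variance_nmer(p, dh, n) + t ** 2 * variance_nmer(p, ds, n)
    return with_cov, without_cov


uniform = {b: 0.25 for b in BASES}
with_cov, without_cov = dg_variances(uniform, 8, 310.0)
print(f"Var(dG) = {with_cov:.3f}; ignoring Cov(dH, dS): {without_cov:.3f}")
```

For a uniform 8-mer the first value is about 2.245 (kcal/mol)^2, while omitting the covariance gives about 8.652, so the correlation between ΔH and ΔS cannot be ignored.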

Results

Probability distribution [pA, pC, pG, pT] | E[ΔH] (kcal/mol) | Var ΔH | E[ΔS] (cal/(mol·K)) | Var ΔS | E[ΔG] at 310 K (kcal/mol) | Var ΔG
[0.25, 0.25, 0.25, 0.25] | -57.925 | 6.598 | -154.919 | 21.375 | -9.900 | 2.245
[0.7, 0.1, 0.1, 0.1] | -55.678 | 2.427 | -153.979 | 6.112 | -7.945 | 1.042
[0.1, 0.1, 0.7, 0.1] | -54.964 | 6.325 | -140.875 | 43.415 | -11.293 | 1.332

Table 1: The expectation and variance of ΔH, ΔS and ΔG at 310 K for a random 8-mer generated from each of the probability distributions shown.

What Now?
We are currently trying to compute the expectation and variance of the melting temperatures of a library of DNA molecules.

Summary
We have a procedure in Maple 10.
Inputs: probability distribution of bases, temperature, length of DNA strand.
Outputs: expectations of ΔG, ΔH and ΔS; variances of ΔG, ΔH and ΔS.
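A Python sketch of a procedure with the same inputs and outputs; it is not the authors' Maple code, and `random_dna_statistics` and its arguments are hypothetical. It combines the per-stack moments with the n-mer formulas (independent bases, stack terms only), treating ΔG per stack as ΔH - TΔS:

```python
from itertools import product

BASES = "ACGT"
# Unified nearest-neighbor parameters: stack -> (dH kcal/mol, dS cal/(mol*K))
NN = {
    "AA": (-7.9, -22.2), "AC": (-8.4, -22.4), "AG": (-7.8, -21.0), "AT": (-7.2, -20.4),
    "CA": (-8.5, -22.7), "CC": (-8.0, -19.9), "CG": (-10.6, -27.2), "CT": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "GC": (-9.8, -24.4), "GG": (-8.0, -19.9), "GT": (-8.4, -22.4),
    "TA": (-7.2, -21.3), "TC": (-8.2, -22.2), "TG": (-8.5, -22.7), "TT": (-7.9, -22.2),
}


def _moments(p: dict, values: dict) -> tuple:
    """E[X], Var(X_1), Cov(X_1, X_2) for one per-stack quantity."""
    mean = sum(p[x] * p[y] * values[x + y] for x, y in product(BASES, repeat=2))
    second = sum(p[x] * p[y] * values[x + y] ** 2 for x, y in product(BASES, repeat=2))
    adjacent = sum(p[x] * p[y] * p[z] * values[x + y] * values[y + z]
                   for x, y, z in product(BASES, repeat=3))
    return mean, second - mean ** 2, adjacent - mean ** 2


def random_dna_statistics(p: dict, temp_k: float, n: int) -> dict:
    """Inputs: base distribution p, temperature (K), strand length n.

    Returns {"dH": (mean, var), "dS": (mean, var), "dG": (mean, var)} for a
    random n-mer, using stack terms only (no terminal/symmetry corrections).
    """
    t = temp_k / 1000.0  # converts dS from cal to kcal when multiplied
    per_stack = {
        "dH": {s: v[0] for s, v in NN.items()},
        "dS": {s: v[1] for s, v in NN.items()},
        "dG": {s: v[0] - t * v[1] for s, v in NN.items()},
    }
    result = {}
    for name, values in per_stack.items():
        mean, var, cov = _moments(p, values)
        result[name] = ((n - 1) * mean, (n - 1) * var + 2 * (n - 2) * cov)
    return result


stats = random_dna_statistics({"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}, 310.0, 8)
for name, (mean, var) in stats.items():
    print(f"{name}: mean {mean:.3f}, variance {var:.3f}")
```

With the unified parameters, this reproduces the first two rows of the results table to the printed precision.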

Acknowledgements
We would like to acknowledge the support of Anthony Macula (Mathematics, SUNY Geneseo) and Wendy Pogozelski (Chemistry, SUNY Geneseo). This research was funded by grants from the NSF and the AFOSR.

Questions?