Practical Applications and Properties of the Exponentially Modified Gaussian (EMG) Distribution


Practical Applications and Properties of the Exponentially Modified Gaussian (EMG) Distribution

A Thesis Submitted to the Faculty of Drexel University by Scott Haney in partial fulfillment of the requirements for the degree of Doctor of Philosophy

March 3 rd, 2011

© Copyright March 3 rd, 2011 Scott Haney. All Rights Reserved.

Table of Contents

List of Tables
Abstract
1. Introduction
2. Background on Microarray Data Analysis
2.1 Gene Expression
2.2 Measuring Gene Expression
2.3 Affymetrix Microarrays
2.4 Experimental Errors and Data Preprocessing
3. Properties of the Exponentially Modified Gaussian (EMG) Distribution
3.1 Reparameterization of the EMG Distribution
3.2 EMG Quantile Bounds
3.3 Parameter Estimation
3.4 Shape Estimation
3.5 EMG Right Tail Approximation
4. Application of the EMG Distribution to Actual Affymetrix Microarray Perfect Match (PM) Probe Distributions
4.1 Comparing the Right Tail to a Shifted Exponential Distribution
Discrepancy in the Sample Quantile of the Sample Mean
Fitting the Right Tail of the Perfect Match (PM) Probe Data
Derivation of Functions That Decrease by a Common Ratio
Application of Functions that Decrease by a Common Ratio to the Right Tail
Practical Implementation of EMG Parameter Estimation
Method and Properties
Proof of Consistency
6. Practical Considerations and Alterations
Summary of Final Parameter Estimation Method
Currently Available Methods
Maximum Likelihood Estimation Method
The Silver Method
Method of Moments
Comparison of Methods on Synthetic Data
Conclusion
Appendices
A. Derivation of pdf and cdf
A.1 Derivation of the Probability Density Function and the Cumulative Distribution Function
Bibliography

List of Figures

2.1 The steps of gene expression that lead to a protein product (taken from [5])
2.2 Affymetrix Chip Design (taken from [13])
2.3 Step by step procedure of a typical Affymetrix microarray experiment (taken from [9])
2.4 Several sources of error for a microarray experiment (taken from [35])
3.1 Plots of EMG distributions for different values of k
4.1 Plots of the sample pdf histograms for the PM probe distributions from five Affymetrix microarrays along with a plot of an EMG distribution with k =
Plot of the right tail of the sample pdf histogram for the PM probe data from T01 tumor.cel fitted to a shifted version of f(x) = 3 log (x)

List of Tables

7.1 Synthetic data results for the new method
Synthetic data results for the method of moments
Synthetic data results for the Silver method


Abstract

Practical Applications and Properties of the Exponentially Modified Gaussian (EMG) Distribution
Scott Haney
Advisor: Moshe Kam, Ph.D.

The exponentially modified Gaussian (EMG) probability distribution is defined as the convolution of an exponential distribution and a Gaussian distribution which are independent of each other. Using a reparameterized form of the EMG cumulative distribution function (cdf), several properties of the EMG distribution are derived. These properties are used to test whether the distribution of the perfect match (PM) probes from five Affymetrix microarrays follows an EMG distribution and to create a new parameter estimation method. A commonly used method for preprocessing Affymetrix microarray data, known as the robust multi-array average (RMA), assumes that the distribution of the PM probes at least approximately follows an EMG distribution. Using the results derived in this thesis it is found that the EMG distribution is not a good fit for these sample data, based on differences in the right tail of the sample distribution. A new distribution that is very dissimilar to the right tail of an EMG distribution is derived that more accurately fits the right tail of the sample data. From the properties of the EMG distribution derived in this thesis it is further shown that a new parameter estimation method can be created. This new parameter estimation method is compared against two other methods from the literature, namely the method of moments and the Silver method (2009). From a theoretical perspective, the new parameter estimation method has the advantage that it is proven to be consistent and to always return valid parameter estimates (that is, estimates satisfying constraints such as the variance being positive). Neither the Silver method nor the method of moments has both of these qualities. All three methods were compared on the same synthetic data generated from EMG distributions and it was found that the performance of each method depended on the shape of the EMG distribution. It was also found that the Silver method appears not to be consistent for EMG distributions that are too close to being a Gaussian distribution.

1. Introduction

The EMG distribution is the convolution of a Gaussian distribution and an exponential distribution which are independent of each other. This distribution has found practical applications in a variety of scientific disciplines such as chromatography [17,0,3,9], cellular biology [14], radiotherapy [16], and microarray preprocessing [18,30]. Many of these practical applications focus on the problem of fitting data points to a function which is an EMG pdf multiplied by a scaling parameter. A large number of algorithms have been introduced in the literature to solve this problem [3, 11, 1, 36].

The focus of this thesis is to better understand the properties of the EMG distribution so that it can be determined whether or not the perfect match (PM) probe distributions from five Affymetrix microarrays approximately follow an EMG distribution. This is an important assumption made by a commonly used microarray preprocessing method known as the robust multi-array average (RMA) [18]. Several properties of the EMG distribution were derived and were used to show that the right tails of the sample probability density functions (pdfs) were much heavier than would be expected for an EMG distribution. By visual analysis of the sample pdf histograms it was determined that the right tails of the sample pdfs approximately reduced in height by one third whenever the value on the x-axis was doubled. This is a property that the right tail of an EMG pdf does not come close to having. A function with this property was derived and it was found to be a reasonable approximation for the right tails of the sample pdfs. These results strongly challenge the assumption used by the RMA method that the PM probes approximately follow an EMG distribution.

Using the derived properties it is also possible to create a parameter estimation method that has some very desirable properties, such as consistency and always being able to return valid parameter estimates, where valid refers to parameter estimates that satisfy all of the constraints of the original parameters. Several parameter estimation methods already exist in the literature [, 30] and the new parameter estimation method is compared to two of these. The two methods selected were the method provided in [30] (referred to as the Silver method) and the method of moments. All three methods were compared on synthetic data generated from EMG distributions. The synthetic data trials distinguished between three scenarios:

1. The EMG distribution is close to being a shifted exponential distribution
2. The EMG distribution is close to being a Gaussian distribution
3. The EMG distribution is neither close to a shifted exponential distribution nor close to a Gaussian distribution

An EMG distribution is considered to be close to a shifted exponential distribution when a large fraction of the variance of the EMG distribution is due to the variance of the exponential component; an EMG distribution is considered to be close to a Gaussian distribution when a large fraction of the variance of the EMG distribution is due to the variance of the Gaussian component.

Both the Silver method and the method of moments were found to have distinct disadvantages compared to the new parameter estimation method. The method of moments failed to return valid parameter estimates at least 10 times out of 100 and at most 61 times out of 100 in the synthetic data trials. For these failed runs the method of moments returned at least one imaginary parameter estimate. The Silver method appears to converge to incorrect parameter estimates under the second scenario. The average parameter estimates for the Silver method after applying it to 100 random samples of size 10,000 generated from a certain EMG distribution showed that the parameter estimates were off by as much as 9 standard deviations.

With respect to accuracy, the results of the synthetic data trials showed that the performance of the parameter estimation methods varied across the three scenarios. In the first scenario it was found that the accuracy of the Silver method was noticeably better in most cases than the accuracy of the new method and the accuracy of the method of moments. In the second scenario it was found that the accuracy of the method of moments and the accuracy of the new method were comparable, while in most cases the accuracy of the Silver method was noticeably lower. In the third scenario it was found that the accuracy of the method of moments and the accuracy of the new method were comparable, while in most cases the accuracy of the Silver method was noticeably lower.

The organization of this thesis is as follows:

1. Background necessary for understanding the application of the EMG distribution to Affymetrix microarray data is described.
2. Properties of the EMG distribution that will be used in improving the application of the EMG distribution in practice are derived.
3. The assumption that the PM probe data from Affymetrix microarrays approximately follow an EMG distribution is tested for data from five microarrays and it is found that this assumption is unlikely to be true.
4. A new distribution is derived to fit the right tails of the PM probe distributions from the five microarrays. This new distribution is found to visually fit the sample data well and is not close to the right tail of an EMG distribution.
5. A new parameter estimation procedure is described and is proven to be consistent.
6. The new parameter estimation method is compared to two other parameter estimation methods from the literature and is found to have several important advantages over these two methods.

2. Background on Microarray Data Analysis

Within a single human being different cell types can have exactly the same DNA yet be extraordinarily different. For example, skin cells and bone cells have the same DNA yet they are not very similar in form or function []. Although such cells have the same DNA, certain subsequences of the DNA (known as genes) affect the cellular environment in different ways. Perhaps the most commonly studied way by which a gene can affect a cell is the process of gene expression.

2.1 Gene Expression

Gene expression is a multi-step process by which a gene product is created from a gene. In humans the most common gene products are proteins, which are one or more long chains of amino acids that are folded together. For simplicity it is assumed that gene expression refers to gene expression where the gene product is a protein, since proteins are thought to be the primary reason for biological changes within the cell. The steps of gene expression for protein products [] are:

1. DNA is transcribed into a complementary mRNA copy
2. Intron sequences are removed (or spliced) from the complementary mRNA copy
3. The spliced complementary mRNA sequence is translated into a chain of amino acids
4. Posttranslational modifications are made to the chain of amino acids and the final protein product is formed

These steps are shown pictorially in Figure 2.1.

Figure 2.1: The steps of gene expression that lead to a protein product (taken from [5])

Protein gene products are typically very complex and can affect the cell in different ways depending on a variety of factors. Two common factors that impact the effect of proteins are the concentration of other proteins in the cellular environment and the folded shape of the protein. Any change starting from gene expression and ending with the final structure, form, and environment of the protein product can affect the biology of the cell [].

2.2 Measuring Gene Expression

Obtaining a meaningful measure of gene expression is not straightforward. A single change in any step of the process can lead to different biological results. In practice, the first step of the process of gene expression, where the DNA is transcribed into a complementary mRNA copy, is the portion of the process that is measured. Measuring this step provides an estimate of the total amount of protein product that can be produced. This measurement, however, does not provide any estimate as to how much of the protein is actually produced or give any idea as to the final physical form of the protein in the cell. There are several important reasons for focusing on this portion of the process:

1. Methods for measuring the presence of mRNA molecules are well established
2. Since the human genome is approximately 99.9% identical across individuals it is reasonable to assume that the same mRNA molecules are being tested for
3. It is possible to simultaneously measure the presence of a large number of mRNA molecules within the same sample

A number of testing devices are available for simultaneously measuring large numbers of mRNA molecules in a sample. One class of these testing devices, known as microarrays, is commonly used for this purpose in practice.

2.3 Affymetrix Microarrays

One of the most well known manufacturers of microarrays is Affymetrix [1]. Affymetrix microarrays are small chips that have their surfaces subdivided into a rectangular grid. Each rectangle in the grid contains a large number of 25 nucleotide base pair long DNA probes all having the same sequence. These DNA probes stand straight up on the surface of the chip, with the bottom end of the probe affixed to the surface of the chip and the top end of the probe free to move (Figure 2.2). This design allows any mRNA molecules to chemically bind to the DNA probes on the surface of the microarray.

Figure 2.2: Affymetrix Chip Design (taken from [13])

Each subgrid contains either perfect match (PM) probes or mismatch (MM) probes. A PM probe is designed to be complementary to an expected subsequence of a specific mRNA. An MM probe is designed to match a PM probe sequence with the exception that the 13th nucleotide base is switched. Every subgrid of PM probes has a corresponding subgrid of MM probes. For every gene there are typically several PM and MM probe subgrids. The entire collection of these subgrids is termed a probeset.

Affymetrix microarrays measure mRNA levels by using basic principles of chemistry. Each DNA probe on the surface will prefer to be bound to other DNA that is exactly complementary. In general, the closer a subsequence of a DNA is to being complementary to a probe sequence, the more likely it will be to bind to the corresponding probe. By using this principle it is thought that if a targeted sequence is present in solution it will bind to its corresponding probe with high probability. Of course, other DNA sequences in solution that have a subsequence which is close to being complementary can also bind. It is thought that the MM probes can be used to provide an estimate of this erroneous binding, known as cross-hybridization.

An Affymetrix microarray experiment begins by extracting mRNA from a biological sample. The extracted mRNA then goes through a number of preparation steps in which it is labeled with some molecule that can be identified using a scanner, and the labeled mRNA is then applied to the surface of the microarray. After the chemical reactions have had some time to take place the microarray is washed, and only the mRNA from the sample that is bound to probes should remain. Lastly, the microarray is put under a scanner and for each rectangular subgrid the intensity of the labeling molecule is measured. A pictorial example of this process is given in Figure 2.3.

Figure 2.3: Step by step procedure of a typical Affymetrix microarray experiment (taken from [9])

2.4 Experimental Errors and Data Preprocessing

Affymetrix microarrays are subject to technical, chemical, and human errors. An example of some of these errors can be seen in Figure 2.4. These errors have been extensively studied in the literature [33, 35, 37]; however, they still remain to be convincingly modeled in practical Affymetrix microarray experiments. An understanding of how these errors affect Affymetrix microarray data is essential for determining how reliable the data are as well as for extracting a reasonable estimate for the level of gene expression in the sample.

Figure 2.4: Several sources of error for a microarray experiment (taken from [35])

Previous work towards estimating the mRNA concentration in the presence of error has met with some success. In one publication [1], a method was developed that was capable of detecting known mRNA levels in the presence of experimental error. At least two other authors determined differential equation models that took error into account and worked well on the data tested [5, 35]. For the type of microarray experiment described in the previous section these techniques have not had a very significant impact in practice. For a different type of microarray experiment, a real-time microarray experiment [8], these techniques are much more effective and practical. Most microarray data sets that are currently available, however, are not real-time microarray experiments.

In practice it is common to handle microarray errors by using techniques that are much simpler than the methods discussed in the previous paragraph. The typical first step of microarray data analysis is to preprocess the data in order to remove error. Some of the most commonly used microarray preprocessing techniques in practice are those provided directly by Affymetrix (PLIER and MAS 5.0) [34] and the robust multi-array average (RMA) [18]. As of the time of this writing no single preprocessing method has been found to be generally preferable to the rest [19]. After the data are preprocessed it is usually assumed that the resulting data are error free. Data analysis techniques are then applied to the preprocessed data to find interesting results.

The preprocessing technique of primary interest in this thesis is RMA. At the present time the original RMA publication has been cited over 3,000 times. This technique makes the assumption that the distribution of PM probes from a microarray approximately follows an EMG distribution [18]. RMA uses this assumption to model observed values as signal (which follows an exponential distribution) plus noise (which follows a Gaussian distribution). The signal value is then estimated by solving for the expected value of signal given the value of signal plus noise. If the assumption that the distribution of the PM probes follows an EMG distribution is incorrect, then the use of the EMG distribution in RMA to estimate the signal is questionable. This assumption is shown to be unlikely to be true based on the results of comparing the sample data to certain properties of the EMG distribution.
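As a rough illustration of the signal estimate described above, the sketch below computes the expected value of the signal S given an observed value o = S + N, assuming only the textbook model stated in the text: S exponential with rate `lam`, N Gaussian with mean `mu` and standard deviation `sigma`, and S independent of N. Under that assumption the posterior of S is a Gaussian with mean a = o - mu - lam*sigma^2 and standard deviation sigma, truncated to s > 0, which gives the closed form used here. The function names are illustrative, and the published RMA implementation may differ in its details (for example in how the parameters are estimated).

```python
import math


def std_normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Standard normal cdf, written with erfc for accuracy in the left tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def expected_signal(o, mu, sigma, lam):
    """E[S | S + N = o] for S ~ Exp(rate lam), N ~ Normal(mu, sigma^2).

    Mean of a Normal(a, sigma^2) variable truncated to (0, infinity),
    with a = o - mu - lam * sigma^2.
    """
    a = o - mu - lam * sigma ** 2
    z = a / sigma
    return a + sigma * std_normal_pdf(z) / std_normal_cdf(z)


def expected_signal_numeric(o, mu, sigma, lam, upper=50.0, steps=100000):
    """Brute-force check: midpoint-rule integration of the posterior mean."""
    h = upper / steps
    num = 0.0
    den = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        weight = lam * math.exp(-lam * s) * math.exp(-0.5 * ((o - s - mu) / sigma) ** 2)
        num += s * weight
        den += weight
    return num / den
```

For an observation well above the noise level the estimate is close to a itself, while for observations below the noise mean it stays positive, which is the behavior one would expect from a conditional mean of a non-negative signal.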

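3. Properties of the Exponentially Modified Gaussian (EMG) Distribution

Due to the wide reach of the EMG distribution in practical applications [14, 18, 30], a better understanding of the EMG distribution is worth pursuing. The cumulative distribution function (cdf) and the probability density function (pdf) of an EMG distribution are given below as EMG(c; µ, σ, λ) and emg(c; µ, σ, λ) respectively (see Appendix A for derivations):

$$\mathrm{EMG}(c;\mu,\sigma,\lambda) = \frac{1}{2} - \frac{1}{2}\, e^{\lambda\left(\frac{\lambda\sigma^{2}}{2}+\mu-c\right)} \operatorname{erfc}\!\left(\frac{1}{\sqrt{2}\,\sigma}\left(\lambda\sigma^{2}+\mu-c\right)\right) + \frac{1}{2}\operatorname{erf}\!\left(\frac{1}{\sqrt{2}\,\sigma}\,(c-\mu)\right) \tag{3.1}$$

$$\mathrm{emg}(c;\mu,\sigma,\lambda) = \frac{\lambda}{2}\, e^{\lambda\left(\frac{\lambda\sigma^{2}}{2}+\mu-c\right)} \operatorname{erfc}\!\left(\frac{1}{\sqrt{2}\,\sigma}\left(\lambda\sigma^{2}+\mu-c\right)\right) \tag{3.2}$$

where

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_{0}^{x} e^{-t^{2}}\,dt, \qquad \operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}}\int_{x}^{\infty} e^{-t^{2}}\,dt = 1 - \operatorname{erf}(x).$$

In this chapter several properties of the EMG distribution are derived. The derivations predominantly rely on reparameterizing the input to the EMG cdf. These properties will later be used to challenge a current assumption that the PM probe data from Affymetrix microarrays approximately follow an EMG distribution [18] as well as to create a new parameter estimation method.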
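The cdf and pdf given above can be evaluated directly with the error functions in Python's standard library. The sketch below is only a straightforward transcription under that reading of the formulas; the function names `emg_cdf` and `emg_pdf` are illustrative, and no attempt is made to guard against overflow of the exponential factor when c lies far to the left of µ.

```python
import math

SQRT2 = math.sqrt(2.0)


def emg_cdf(c, mu, sigma, lam):
    """EMG cdf: Gaussian(mu, sigma^2) plus an independent Exp(rate lam)."""
    expo = lam * (lam * sigma ** 2 / 2.0 + mu - c)
    arg = (lam * sigma ** 2 + mu - c) / (SQRT2 * sigma)
    return 0.5 * (1.0
                  - math.exp(expo) * math.erfc(arg)
                  + math.erf((c - mu) / (SQRT2 * sigma)))


def emg_pdf(c, mu, sigma, lam):
    """EMG pdf, the derivative of emg_cdf with respect to c."""
    expo = lam * (lam * sigma ** 2 / 2.0 + mu - c)
    arg = (lam * sigma ** 2 + mu - c) / (SQRT2 * sigma)
    return 0.5 * lam * math.exp(expo) * math.erfc(arg)
```

A quick sanity check is that a central finite difference of `emg_cdf` agrees with `emg_pdf`, and that the cdf tends to 0 and 1 in the far left and far right tails.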

3.1 Reparameterization of the EMG Distribution

A carefully selected reparameterization of the input to the EMG cdf can be used to show several useful properties of the EMG distribution. This result was discovered by analyzing the mode of the EMG pdf, which occurs where the derivative of the EMG pdf is equal to zero. This condition is given by

$$\lambda\sigma = \frac{\sqrt{2}\,e^{-p^{2}}}{\sqrt{\pi}\,\operatorname{erfc}(p)} \tag{3.3}$$

where

$$p = \frac{\lambda\sigma^{2}+\mu-c}{\sqrt{2}\,\sigma}.$$

Solving for c yields the mode of the EMG pdf. The equation for the mode can be simplified somewhat by replacing c with a reparameterization $c_1$ which is given by

$$c_1 = \mu + \lambda\sigma^{2} - \sqrt{2}\,D\sigma \tag{3.4}$$

where $D \in \mathbb{R}$. Replacing c with $c_1$ in (3.3) causes the equation to become

$$\lambda\sigma = \frac{\sqrt{2}\,e^{-D^{2}}}{\sqrt{\pi}\,\operatorname{erfc}(D)}.$$

This equation shows that the mode of the EMG pdf can be written entirely in terms of D and λσ. The term λσ will be used often throughout the rest of this thesis, and from this point on this term will be denoted by k.

This reparameterization can be slightly generalized and can be used to simplify the EMG cdf for some input values. This slightly updated reparameterization is denoted by $c_2$ and is given by

$$c_2 = \mu + C\lambda\sigma^{2} + D\sigma$$

where $C, D \in \mathbb{R}$. Replacing c with $c_2$ in (3.1), the EMG cdf reduces to

$$\mathrm{EMG}_{c_2}(C, D, k) = \frac{1}{2}\left[1 - e^{\frac{k^{2}}{2} - Ck^{2} - Dk}\operatorname{erfc}\!\left(\frac{k - Ck - D}{\sqrt{2}}\right) + \operatorname{erf}\!\left(\frac{Ck + D}{\sqrt{2}}\right)\right] \tag{3.5}$$

where k = λσ. Using this equation it is possible to calculate any quantile that can be represented in terms of $c_2$ once k is known.

Several important results that will be used in this thesis are explained in the following sections. These results rely heavily on the term k and on (3.5). From the work in the following sections it will become evident that the term k provides a significant amount of information about an EMG distribution.

3.2 EMG Quantile Bounds

Analysis of specific values of $c_2$ reveals that at least some of the quantiles must lie within certain bounds. This is accomplished by combining the constraint k > 0 (which is true because both σ and λ are greater than zero) with (3.5). Two such bounds are given in the following paragraphs, both as examples and for use later in this thesis.

Perhaps the simplest example of a quantile bound is when C = D = 0. Under these conditions it follows that $c_2 = \mu$ and (3.5) reduces to a function that only depends on k, which is given by

$$\mathrm{EMG}_{c_2}(0, 0, k) = \mathrm{EMG}_{\mu}(k) = \frac{1}{2}\left[1 - e^{\frac{k^{2}}{2}}\operatorname{erfc}\!\left(\frac{k}{\sqrt{2}}\right)\right] \tag{3.6}$$
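The reduction of the cdf to a function of C, D, and k can be checked numerically. The following sketch assumes the reading of the reparameterization in which the input is µ + Cλσ² + Dσ and k = λσ; the names `emg_cdf`, `emg_c2`, and `emg_mu` are illustrative. It compares the reparameterized form against the direct cdf and evaluates the special case C = D = 0.

```python
import math

SQRT2 = math.sqrt(2.0)


def emg_cdf(c, mu, sigma, lam):
    """Direct EMG cdf in the original parameters."""
    expo = lam * (lam * sigma ** 2 / 2.0 + mu - c)
    arg = (lam * sigma ** 2 + mu - c) / (SQRT2 * sigma)
    return 0.5 * (1.0 - math.exp(expo) * math.erfc(arg)
                  + math.erf((c - mu) / (SQRT2 * sigma)))


def emg_c2(C, D, k):
    """EMG cdf evaluated at mu + C*lam*sigma^2 + D*sigma, with k = lam*sigma."""
    expo = k * k / 2.0 - C * k * k - D * k
    return 0.5 * (1.0 - math.exp(expo) * math.erfc((k - C * k - D) / SQRT2)
                  + math.erf((C * k + D) / SQRT2))


def emg_mu(k):
    """Quantile level of mu itself (the case C = D = 0)."""
    return emg_c2(0.0, 0.0, k)
```

Because only k enters the reparameterized form, two EMG distributions with different µ, σ, and λ but the same product λσ place µ at the same quantile level.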

Taking a derivative shows that the right hand side of $\mathrm{EMG}_{\mu}(k)$ is monotonically increasing for $k \in (0, \infty)$. Using this information it can be shown that

$$0 < \mathrm{EMG}_{\mu}(k) < \frac{1}{2} \tag{3.7}$$

for any EMG distribution.

It is also possible to determine a quantile bound on the mean $m = \mu + \lambda^{-1}$ of an EMG distribution. The reparameterization $c_2$ is equal to m when $C = k^{-2}$ and D = 0. Under these conditions (3.5) reduces to a function that only depends on k, which is given by

$$\mathrm{EMG}_{m}(k) = \mathrm{EMG}_{c_2}(k^{-2}, 0, k) = \frac{1}{2}\left[1 - e^{\frac{k^{2}}{2} - 1}\operatorname{erfc}\!\left(\frac{k - k^{-1}}{\sqrt{2}}\right) + \operatorname{erf}\!\left(\frac{k^{-1}}{\sqrt{2}}\right)\right] \tag{3.8}$$

Analysis of the derivative shows that the right hand side of $\mathrm{EMG}_{m}(k)$ is monotonically decreasing for $k \in (0, \infty)$. Using this information it can be shown that

$$\frac{1}{2} < \mathrm{EMG}_{m}(k) < 1 - e^{-1} \approx 0.632 \tag{3.9}$$

for any EMG distribution.

3.3 Parameter Estimation

It is possible to completely define an EMG distribution in terms of k and two quantiles rather than in terms of the three parameters µ, σ, and λ. Assuming that k is known, one such procedure for determining the parameters is as follows:

1. Determine µ from the quantile determined by the right hand side of $\mathrm{EMG}_{\mu}(k) = \frac{1}{2}\left[1 - e^{\frac{k^{2}}{2}}\operatorname{erfc}\!\left(\frac{k}{\sqrt{2}}\right)\right]$
2. Determine $m_l = \mu + \lambda^{-1}$ from the quantile determined by the right hand side of $\mathrm{EMG}_{m}(k) = \frac{1}{2}\left[1 - e^{\frac{k^{2}}{2} - 1}\operatorname{erfc}\!\left(\frac{k - k^{-1}}{\sqrt{2}}\right) + \operatorname{erf}\!\left(\frac{k^{-1}}{\sqrt{2}}\right)\right]$
3. Determine $m_s = \mu + \sigma$ from the quantile determined by the right hand side of $\mathrm{EMG}_{c_2}(0, 1, k) = \frac{1}{2}\left[1 - e^{\frac{k^{2}}{2} - k}\operatorname{erfc}\!\left(\frac{k - 1}{\sqrt{2}}\right) + \operatorname{erf}\!\left(\frac{1}{\sqrt{2}}\right)\right]$
4. Estimate λ by subtracting the estimate of µ from $m_l$ and then taking the multiplicative inverse of the result
5. Estimate σ by subtracting the estimate of µ from $m_s$

The ability to define the EMG distribution in terms of k and two quantiles opens up the possibility of a new type of parameter estimation method for an EMG distribution. Given a sample from an EMG distribution, if k can be estimated then it is possible to estimate the parameters of the EMG distribution. In practice, a simple way to estimate k is by estimating the sample quantile of the sample mean. This estimate can then be substituted for the left hand side of (3.8) and an estimate for k can be obtained by solving this equation for k. As long as the estimate for the sample quantile of the sample mean satisfies (3.9) it will be possible to solve for k.

3.4 Shape Estimation

The value of k determines the overall shape of the EMG distribution. This can be seen by analyzing the variance of an EMG distribution in terms of k, which yields the following:

$$\operatorname{Var}(\mathrm{EMG}(c;\mu,\sigma,\lambda)) = \sigma^{2} + \lambda^{-2} = \frac{k^{2}+1}{\lambda^{2}} \tag{3.10}$$

$$= \sigma^{2} + \frac{\sigma^{2}}{k^{2}} \tag{3.11}$$

As k in (3.10) approaches zero the impact of the Gaussian component on the variance becomes negligible. As k in (3.11) approaches infinity the impact of the shifted exponential component on the variance becomes negligible. As the variance of a component becomes negligible, the EMG distribution will be close to the distribution of the other component. These observations indicate that for values of k that are large the EMG distribution is close to a Gaussian distribution and that for values of k that are small the EMG distribution is close to a shifted exponential distribution.

In practice, it is likely that an EMG distribution which is very close to being either a Gaussian distribution or a shifted exponential distribution will be treated as a Gaussian distribution or a shifted exponential distribution respectively. Due to this, it seems reasonable to assume that EMG distributions which arise in practice are likely to have k values that are located within a certain bounded interval. The variance relations discussed in the previous paragraph provide a way to obtain a rough estimate for this bounded interval. By combining (3.10) and (3.11) it follows that

$$\frac{k^{2}+1}{\lambda^{2}} = \sigma^{2} + \frac{\sigma^{2}}{k^{2}}.$$

Setting k = 1 results in $\sigma^{2} = \lambda^{-2}$, which implies that the variances of the two components are equal. It seems reasonable to assume that a component will become negligible when its variance is less than a certain percentage of the other. It further seems reasonable to assume that this percentage can be set to 1%, which results in the following bounds on k:

$$k \in [0.1, 10].$$

Several plots of EMG distributions for different values of k between 0.1 and 10 are given in Figure 3.1. In practice, it seems unlikely that values of k outside of this interval will be encountered. If this is not the case then it will be very difficult to estimate the parameters of the EMG distribution. The reason for this is that the closer the EMG distribution becomes to either a Gaussian distribution or a shifted exponential distribution, the harder it will be to estimate the exact magnitude of the difference. In general, the slighter the modification to a distribution, the harder it will be to detect.

3.5 EMG Right Tail Approximation

It is possible to show that the EMG cdf is approximately the same as a shifted exponential cdf in the right tail of the distribution. The cdf of a shifted exponential distribution will be denoted by SED(c; λ, T) and is defined to be

$$\mathrm{SED}(c;\lambda,T) = 1 - e^{-\lambda(c - T)} \tag{3.12}$$

where T is the shift parameter and λ is the same shape parameter that is used in an exponential distribution. The desired approximation will be derived by considering the reparameterization

$$c_3 = \mu + D\sigma \tag{3.13}$$

29 CHAPTER 3. PROPERTIES OF THE EXPONENTIALLY MODIFIED GAUSSIAN (EMG) DISTRIBUTION 0 (a) k = 0.1 (b) k = 0.5 (c) k = 1.0 (d) k =.0 Figure 3.1: Plots of EMG distributions for different values of k. where D R. Using this new reparameterization in place of c in the EMG cdf it follows that EMG c3 (D, k) = 1 [1 e k Dk erfc( k D ) + erf( D )] To see what happens in the right tail the limit as c 3 approaches infinity is considered. This limit can not immediately be determined because the right hand side of EMG c3 (D, k) does not directly include c 3. The right hand side is written in terms of D so a relation between the limiting value of c 3 and D would allow the limit to be easily evaluated. From the constraint that EMG µ (k) (0, 1 ) (3.7) and the constraint

30 CHAPTER 3. PROPERTIES OF THE EXPONENTIALLY MODIFIED GAUSSIAN (EMG) DISTRIBUTION 1 that σ > 0 it is clear that if c 3 is greater than the median then D > 0. Thus as c 3 approaches infinity, D approaches infinity. Using this information it can be seen that lim EMG 1 c 3 (D, k) = lim c 3 c 3 [1 e k Dk erfc( k D ) + erf( D )] = lim D 1 [1 e k Dk erfc( k D ) + erf( D )] e k D = 1 [ lim = 1 lim D e k(d k ) Dk ] where the last equality is the cdf of a shifted exponential distribution with shift T = k and shape parameter λ = k. Both the erf and the erfc terms approach their limits at a much faster rate than does a term of the form e kd, hence the cdf of the EMG distribution should be approaching the cdf of a shifted exponential distribution. To show that the right tail approximation is accurate in a more quantitative manner it is first noted that if D D 0 > 0 then the following bounds hold 1 > erf( D ) 1 α 1 > erfc( k D ) α where α 1 = erfc( D 0 ) α = erfc( D 0 k ) From the fact that k > 0 it must be that α 1 < α. Using these constraints along with

31 CHAPTER 3. PROPERTIES OF THE EXPONENTIALLY MODIFIED GAUSSIAN (EMG) DISTRIBUTION the inequality between α 1 and α bounds can be put on EMG c3 (D, k). The lower bound is given by EMG c3 (D, k) 1 [1 e k Dk erfc( k D ) + 1 α 1 ] = 1 [( α 1) e k Dk erfc( k D )] > 1 [( α 1) e k Dk ] = (1 α 1 ) e k Dk = 1 e k Dk α 1 ) = 1 e k Dk erfc( D 0 and the upper bound is given by EMG c3 (D, k) 1 [1 e k Dk ( α ) + 1 α 1 ] = 1 [( α 1) ( α )e k Dk ] < 1 [ ( α )e k Dk ] = 1 e k Dk + erfc(d 0 k) e k Dk 1 e k Dk + erfc(d 0 k) e k D 0k = 1 e k Dk + erfc(w(k)) e w (k) D 0 where w(k) = (D 0 k). For these two bounds the error between EMG c3 (D, k) and the bounds are given by L e = erfc( D 0 ) (3.14) U e = erfc(w(k)) e w (k) D 0 (3.15)

32 CHAPTER 3. PROPERTIES OF THE EXPONENTIALLY MODIFIED GAUSSIAN (EMG) DISTRIBUTION 3 where L e is the error in the lower bound and U e is the error in the upper bound. Both error terms approach zero much more rapidly than does the term e k Dk so these approximations should be quite accurate as long as D is large enough relative to k. The error in approximation can also be characterized in terms of the percentage error. It is possible to show that the percentage error of the approximation monotonically decreases to zero for D > 0. The percentage error P E of approximating the value of EMG(c 3 ) at D is given by P E = EMG c 3 (D, k) 1 e k Dk EMG c3 (D, k) = 1 1 e k Dk EMG c3 (D, k) (3.16) If the percentage error is monotonically decreasing to zero for D > 0 then it must be the case that the second term in P E given by P r = 1 e k Dk EMG c3 (D, k) is monotonically increasing to one for D > 0. Clearly the limit of P r is one since both the numerator and the denominator are valid cdfs. The derivative of P r with respect to D can be shown to be positive so it follows that P r is monotonically increasing with respect to D. The denominator of the derivative is always positive since it is squared and the numerator of the derivative given by ke k Dk (erf( (D k)) erf( D)) will be positive for all D > 0. As an example of the accuracy of this approximation assume that k = 1 and

$D = 2$. Under these circumstances it follows that

$$\frac{\mathrm{EMG}_{c_3}(D, k) - \left(1 - e^{\frac{k^2}{2} - Dk}\right)}{\mathrm{EMG}_{c_3}(D, k)} \approx 0.016$$

which shows that the percentage error in the approximation is close to 1.6%. Because the percentage error is monotonically decreasing for $D > 0$, the percentage error in the approximation will be no more than approximately 1.6% for all $D \ge 2$.
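The worked example can be checked numerically. The following is a minimal sketch, assuming the closed form of $\mathrm{EMG}_{c_3}(D, k)$ that appears in the bounds of this section; the function names are illustrative and are not taken from the thesis.

```python
import math


def emg_cdf_tail(D, k):
    """EMG cdf at the tail point indexed by D, in the k parameterization.

    Assumed form:
    0.5 * [1 - exp(k^2/2 - D*k) * erfc((k - D)/sqrt(2)) + erf(D/sqrt(2))].
    """
    decay = math.exp(k * k / 2.0 - D * k)
    return 0.5 * (1.0
                  - decay * math.erfc((k - D) / math.sqrt(2.0))
                  + math.erf(D / math.sqrt(2.0)))


def shifted_exponential_tail(D, k):
    """Right-tail approximation 1 - exp(k^2/2 - D*k)."""
    return 1.0 - math.exp(k * k / 2.0 - D * k)


def percentage_error(D, k):
    """Relative error PE of the tail approximation, following (3.16)."""
    exact = emg_cdf_tail(D, k)
    return (exact - shifted_exponential_tail(D, k)) / exact


if __name__ == "__main__":
    # Print PE for k = 1 at a few values of D to see how quickly it shrinks.
    for D in (2.0, 3.0, 4.0):
        print(f"D = {D}: PE = {percentage_error(D, 1.0):.4%}")
```

Evaluating `percentage_error(2.0, 1.0)` reproduces the figure of roughly 1.6% quoted above, and larger values of `D` give smaller errors, in line with the monotonicity argument.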

4. Application of the EMG Distribution to Actual Affymetrix Microarray Perfect Match (PM) Probe Distributions

Data from five Affymetrix microarrays described in [31] were downloaded from [7]. The five data files selected were T01 tumor.cel through T05 tumor.cel. It is found that the sample distributions of the PM probes for these five microarrays are unlikely to follow an EMG distribution. First, it is shown that the right tail of the sample pdf is not what would be expected for an EMG distribution. Further, it is shown that the sample quantiles of the sample means for all five distributions are larger than would be expected for an EMG distribution.

4.1 Comparing the Right Tail to a Shifted Exponential Distribution

From the results in Section 3.5 it is clear that the EMG cdf should be well approximated by the cdf of a shifted exponential distribution in the right tail. In order to apply this approximation in practice it is necessary to know where the right tail begins. It will be shown that the start of the right tail can be reasonably approximated if an upper bound $k_{max}$ on $k$ can be assumed. Once the right tail has been located, a slightly modified ratio of two sample quantiles will be compared to the ratio that would be expected if the distribution were a shifted exponential distribution. The results of this test will show that the right tails of the PM probe distributions from the five Affymetrix microarrays described at the beginning of this chapter are very different from what would be expected for an EMG distribution.

Locating the Beginning of the Right Tail

To estimate the beginning of the right tail there must first be an estimate for the upper bound on $k$, denoted by $k_{max}$. From this estimate it is possible to determine upper bounds on $\sigma$, denoted by $\sigma_{max}$, and on $\mu$, denoted by $\mu_{max}$. Using the result from Section 3.5 that the percentage error in the right tail approximation is monotonically decreasing, it is possible to select a value for $D$ such that the percentage error is bounded. These three estimates are then used to calculate $c_3$ (3.13), which is the estimate for the beginning of the right tail.

A value for $k_{max}$ can reasonably be chosen by visual inspection of the data, given the insights from Section 3.4. From viewing the sample pdf histograms of the five PM probe distributions (Figure 4.1), $k_{max} = 1$ seems like a safe estimate. Using $k_{max}$, $\sigma_{max}$ can be obtained by rearranging (3.11) to obtain

$$\sigma_{max} \approx \frac{s}{\sqrt{1 + k_{max}^{-2}}}$$

where $s$ is the sample standard deviation. Substituting $k = k_{max}$, $C = D = 0$ into (3.5) yields an estimate for $\mu_{max}$. Lastly, a suitable value for $D$ must be chosen so that the percentage error between the actual EMG tail and the shifted exponential tail is small enough. In Section 3.5 it was shown that for $D > 2$ the percentage error in the approximation was no more than roughly 1.6%. Given that this error seems to be small enough, it is assumed that the right tail begins at $c_3 = \mu_{max} + 2\sigma_{max}$.

Testing the Right Tail

In order to test that the right tail is approximately a shifted exponential distribution it is necessary to use a test that will not be affected much by the error in the approximation. One such test is to slightly modify the ratio of two quantiles.
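The quantile-ratio idea can be illustrated on an exact shifted exponential distribution. The sketch below assumes a distribution with rate $\lambda$ and shift $T$, whose $q$-th quantile is $T - \ln(1 - q)/\lambda$; the function names and the particular values of `q`, `p`, `shift` and `rate` are hypothetical choices made only for illustration.

```python
import math


def shifted_exponential_quantile(level, rate, shift):
    """Quantile of a shifted exponential: shift - ln(1 - level) / rate."""
    return shift - math.log(1.0 - level) / rate


def shifted_quantile_ratio(x1, x2, shift):
    """Ratio of two quantiles after the shift has been removed from each."""
    return (x1 - shift) / (x2 - shift)


def expected_ratio(q, p):
    """Ratio predicted for any shifted exponential: ln(1 - q) / ln(1 - p)."""
    return math.log(1.0 - q) / math.log(1.0 - p)


if __name__ == "__main__":
    q, p, shift = 0.50, 0.90, 3.0  # illustrative values only
    for rate in (0.1, 1.0, 25.0):
        x1 = shifted_exponential_quantile(q, rate, shift)
        x2 = shifted_exponential_quantile(p, rate, shift)
        # The two printed ratios agree for every rate.
        print(rate, shifted_quantile_ratio(x1, x2, shift), expected_ratio(q, p))
```

Because the rate cancels in the ratio, a sample whose shifted quantile ratio is far from `expected_ratio(q, p)` is evidence against a shifted exponential tail, whatever the rate may be.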

Figure 4.1: Plots of the sample pdf histograms for the PM probe distributions from five Affymetrix microarrays along with a plot of an EMG distribution with k = 1. Panels: (a) T01 tumor.cel, (b) T02 tumor.cel, (c) T03 tumor.cel, (d) T04 tumor.cel, (e) T05 tumor.cel, (f) EMG with k = 1.

Estimates of quantiles tend to be fairly robust, so this test should not be greatly affected by the approximation error between $\mathrm{EMG}_{c_3}(D, k)$ and the cdf of some shifted exponential distribution. In order to derive a test for the ratio of quantiles from a shifted exponential distribution, such a test is first created for an exponential distribution. This test is then extended in a natural way to the shifted exponential distribution. The cdf for an exponential distribution, denoted by $E(c; \lambda)$, is given by

$$E(c; \lambda) = 1 - e^{-\lambda c}$$

For an exponential distribution, the ratio of any two quantiles is constant. To see this, suppose that $E(x_1; \lambda) = q$ and $E(x_2; \lambda) = p$. Then it follows that

$$\frac{x_1}{x_2} = \frac{\ln(1 - q)}{\ln(1 - p)}$$

where the ratio of the quantiles is clearly independent of $\lambda$. For a shifted exponential distribution the only change that needs to be made is to shift the input by the value of the shift parameter $T$. The shifted ratio of its quantiles, denoted by $S_r$, is given by

$$S_r = \frac{x_1 - T}{x_2 - T} \tag{4.1}$$

$$\phantom{S_r} = \frac{\ln(1 - q)}{\ln(1 - p)} \tag{4.2}$$

The ratio test just derived for a shifted exponential distribution can be directly applied to the experimental data being studied despite two possible problems. The first possible problem is that the right tail of the distribution will not be a valid probability distribution (because the area under the right tail is not equal to one). Instead the right tail will be some constant multiple of a probability distribution that

is close to being a shifted exponential distribution. Fortunately, these constants will cancel out by taking a ratio, so the test is not affected. The second possible problem is the approximation error. It is important to show that the approximation error will not cause a large error in $S_r$. Bounds on the error in $S_r$ caused by error in the approximation will be derived. In the application of this test to the PM probe distribution of Affymetrix microarrays these bounds will be used to show that the error in $S_r$ caused by approximation error is not significant.

Showing that the approximation does not significantly affect $S_r$ requires some extra work because the shift parameter $T$ is present in the ratio test. It was shown in Section 3.5 that the error in the approximation can be written in terms of percentage error and that the percentage error can be made as small as desired by moving far enough to the right. Shifting the actual quantile value along with its approximation changes the percentage error, so it is necessary to know how the percentage error changes in this case. The percentage error ($PE$ from (3.16)) and the shifted percentage error, denoted by $PE_s$, can be related as follows:

$$PE = PE_s\,\frac{\mathrm{EMG}_{c_3}(D, k) - T}{\mathrm{EMG}_{c_3}(D, k)}$$

From the last equation it follows that if the ratio of the actual quantile to the shifted quantile is not too large, then $PE_s$ will be reasonably small whenever $PE$ is reasonably small. Applying this result back to the EMG cdf, it follows that the ratio of any two quantiles $x_1 = \mathrm{EMG}(y_1; \mu, \sigma, \lambda)$ and $x_2 = \mathrm{EMG}(y_2; \mu, \sigma, \lambda)$ that are far enough into the right tail is bounded by

$$\left(\frac{1 - PE_s}{1 + PE_s}\right) < \left(\frac{\mathrm{EMG}(y_1; \mu, \sigma, \lambda) - T}{\mathrm{EMG}(y_2; \mu, \sigma, \lambda) - T}\right)\left(\frac{\mathrm{SED}(y_2) - T}{\mathrm{SED}(y_1) - T}\right) < \left(\frac{1 + PE_s}{1 - PE_s}\right)$$

This shows that the quantile ratio assuming a shifted exponential distribution will be

approximately the same as the quantile ratio assuming an EMG cdf as long as $PE_s$ is small enough.

This ratio test is now applied to the sample data from the five Affymetrix PM probe distributions using the quantiles $q = 0.50$ and $p$. For all five sample pdfs it was assumed that $k_{max} = 1$ (see Figure 4.1 for a visual comparison) and that the right tail could be assumed to start at $D = 2$ (see Section 4.1 for justification). Using these assumptions it was found that, even with the approximation error taken into account, varying the sample quantiles by as much as five standard deviations was not enough to match the ratio that would be expected. This result strongly suggests that the right tail does not follow a shifted exponential distribution, which casts doubt on the assumption that these data follow an EMG distribution.

4.2 Discrepancy in the Sample Quantile of the Sample Mean

For all five data sets the sample quantile of the sample mean was much larger than the $(1 - e^{-1})$th quantile. Since the quantile of the mean of an EMG distribution cannot be larger than the $(1 - e^{-1})$th quantile, it seems likely that the sample data are not EMG distributed. To investigate this possibility a hypothesis test is created to determine whether or not the quantile of the mean of each distribution is larger than the $(1 - e^{-1})$th quantile. To create the hypothesis test it is assumed that both the sample mean and the sample quantiles approximately follow a Gaussian distribution. Because the sample size was greater than 200,000 for all five sample distributions, these two assumptions seem reasonable in light of the central limit theorem. Given these assumptions, the paired t-test can be used to determine if it is likely that the quantile of the mean is larger than the $(1 - e^{-1})$th quantile.
After applying the paired t-test to all of the sample distributions it was found that

the difference between the sample quantile of the sample mean and the $(1 - e^{-1})$th quantile was very large. For all five data sets, the difference between the sample quantile of the sample mean and the $(1 - e^{-1})$th sample quantile was no less than 90, while the standard deviations for both estimates were less than 1. Given these values, the null hypothesis that the quantile of the mean is less than the $(1 - e^{-1})$th quantile can easily be rejected at the $\alpha = 0.01$ level for all five sample distributions.

Since the mean of the sample data occurs at such a large quantile, it seems likely that the best EMG fit for the data would be a distribution that is close to being a shifted exponential distribution (small value of $k$). From viewing Figure 4.1 it is clear that these sample pdfs are not very similar to a shifted exponential distribution. This result shows that it is unlikely that these sample distributions follow an EMG distribution.
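The quantity at the centre of this section, the sample quantile of the sample mean, is simple to compute. The sketch below is only an illustration on synthetic data: it assumes an EMG variate can be drawn as a Gaussian plus an independent exponential, and it uses a lognormal sample as a hypothetical stand-in for a heavy-tailed distribution. It does not use the microarray data and does not implement the paired t-test.

```python
import bisect
import math
import random

# Largest possible quantile of the mean for an EMG distribution (about 0.632).
EMG_MEAN_QUANTILE_LIMIT = 1.0 - math.exp(-1.0)


def quantile_of_mean(samples):
    """Fraction of the sample that is less than or equal to the sample mean."""
    ordered = sorted(samples)
    mean = sum(ordered) / len(ordered)
    return bisect.bisect_right(ordered, mean) / len(ordered)


def simulate_emg(n, mu, sigma, lam, rng):
    """Draw n EMG variates as Gaussian(mu, sigma) plus Exponential(rate lam)."""
    return [rng.gauss(mu, sigma) + rng.expovariate(lam) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    emg = simulate_emg(50_000, mu=0.0, sigma=1.0, lam=1.0, rng=rng)
    # Hypothetical heavy-tailed sample, used only for contrast.
    heavy = [rng.lognormvariate(0.0, 1.5) for _ in range(50_000)]
    print("limit:", EMG_MEAN_QUANTILE_LIMIT)
    print("EMG sample:", quantile_of_mean(emg))
    print("heavy-tailed sample:", quantile_of_mean(heavy))
```

A sample whose mean sits well above the `EMG_MEAN_QUANTILE_LIMIT` level, as the heavy-tailed sample does here, cannot be matched by any EMG distribution, which is the logic of the hypothesis test described above.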

5. Fitting the Right Tail of the Perfect Match (PM) Probe Data

Given that the sample data are unlikely to follow an EMG distribution, the next question is what distribution these data do follow. The previous chapter showed that the right tails of the sample distributions were very different from the right tail of an EMG distribution. From visual inspection (Figure 4.1) it appears that the problem is due to the right tails of the sample pdf histograms being much too heavy. In other words, the right tails of the sample pdf histograms do not go to zero as quickly as would be expected. After further visual examination, the right tails of the sample pdfs all seemed to share the property that doubling the input to the sample pdf reduced the height of the sample pdf histogram to approximately one third of its value. Taking this observation as an assumption, the problem of determining an appropriate distribution for the right tail of the sample data becomes the problem of finding a function with this property. Such a function will be derived in the next section and will be shown to fit the right tails of the sample pdf histograms closely. The derivation of this function will then be generalized to functions of a larger class.

5.1 Derivation of Functions That Decrease by a Common Ratio

It is assumed that a function $f(x)$ such that

$$\frac{f(x)}{f(2x)} = 3 \tag{5.1}$$

may be an appropriate distribution for modeling the right tails of the sample pdf histograms. In order to determine the form of $f(x)$, several common functional forms were assumed for $f(x)$ and the algebra was checked to see if the final result was valid.

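After several attempts it was found that by assuming $f(x) = g(x)^x$ it was possible to determine $f(x)$. By using the substitution $f(x) = g(x)^x$ it follows that

$$\frac{f(x)}{f(2x)} = \frac{g(x)^x}{g(2x)^{2x}} = 3 \tag{5.2}$$

It is possible to create a recurrence relation over some values of $g(x)$. This is accomplished using the following modified form of (5.2):

$$
\begin{aligned}
\frac{g(x)^x}{g(2x)^{2x}} &= 3 \\
x\log(g(x)) - 2x\log(g(2x)) &= \log(3) \\
\log\!\left(\frac{g(x)}{g(2x)^2}\right) &= \frac{\log(3)}{x} = \log\!\left(3^{\frac{1}{x}}\right) \\
\frac{g(x)}{g(2x)^2} &= 3^{\frac{1}{x}} \\
g(2x)^2 &= g(x)\,3^{-\frac{1}{x}} \\
g(2x) &= \sqrt{g(x)\,3^{-\frac{1}{x}}}
\end{aligned}
$$

If it is assumed that $g(1) = 1$ then the first six terms of the recurrence are as follows:

$$
\begin{aligned}
g(1) &= 1 \\
g(2) &= \sqrt{g(1)\,3^{-1}} = 3^{-\frac{1}{2}} \\
g(4) &= \sqrt{g(2)\,3^{-\frac{1}{2}}} = 3^{-\frac{1}{2}} \\
g(8) &= \sqrt{g(4)\,3^{-\frac{1}{4}}} = 3^{-\frac{3}{8}} \\
g(16) &= \sqrt{g(8)\,3^{-\frac{1}{8}}} = 3^{-\frac{4}{16}} \\
g(32) &= \sqrt{g(16)\,3^{-\frac{1}{16}}} = 3^{-\frac{5}{32}}
\end{aligned}
$$

The last three terms in this list show that the numerator of the exponent is the log base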

two of the input and the denominator of the exponent is the input. This suggests that the function $3^{-\frac{\log_2(x)}{x}}$ may work for $g(x)$, which suggests that $f(x) = g(x)^x = 3^{-\log_2(x)}$. To verify that $f(x) = 3^{-\log_2(x)}$ has the desired property, this function is tested in (5.1):

$$
\begin{aligned}
\frac{3^{-\log_2(x)}}{3^{-\log_2(2x)}} &= r \\
\log_2(x^{-1}) - \log_2((2x)^{-1}) &= \log_3(r) \\
\log_2(2) &= \log_3(r) \\
3 &= r
\end{aligned}
$$

The last line of the algebra shows that $f(x)$ has the desired property. The form of this function suggests that it would be possible to generate functions such that $\frac{f(x)}{f(\alpha x)} = \beta$, where $\alpha, \beta > 1$, by using the function $\beta^{-\log_\alpha(x)}$. Working out the same steps that were performed for $f(x)$ in the previous paragraph it follows that

$$
\begin{aligned}
\frac{\beta^{-\log_\alpha(x)}}{\beta^{-\log_\alpha(\alpha x)}} &= r \\
\log_\alpha(x^{-1}) - \log_\alpha((\alpha x)^{-1}) &= \log_\beta(r) \\
\log_\alpha(\alpha) &= \log_\beta(r) \\
\beta &= r
\end{aligned}
$$

This algebra shows that this class of functions has the expected property.
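The common-ratio property can also be confirmed numerically. The following is a small sketch with hypothetical function names; it relies only on the identity $\beta^{-\log_\alpha(x)} = x^{-\log_\alpha(\beta)}$, which shows that this class of functions consists of power laws.

```python
import math


def common_ratio_function(x, alpha=2.0, beta=3.0):
    """f(x) = beta ** (-log_alpha(x)); scaling x by alpha divides f by beta."""
    return beta ** (-math.log(x, alpha))


def scaling_ratio(x, alpha=2.0, beta=3.0):
    """Compute f(x) / f(alpha * x), which should equal beta for every x > 0."""
    return (common_ratio_function(x, alpha, beta)
            / common_ratio_function(alpha * x, alpha, beta))


if __name__ == "__main__":
    for x in (0.5, 1.0, 7.3, 1.0e4):
        # Default case (alpha = 2, beta = 3) and a second, arbitrary pair.
        print(x, scaling_ratio(x), scaling_ratio(x, alpha=10.0, beta=5.0))
```

Up to floating-point rounding, the first ratio is 3 and the second is 5 for every `x`, which matches the two derivations above.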

5.2 Application of Functions that Decrease by a Common Ratio to the Right Tail

Attempting to fit the right tails of the sample pdfs immediately yields encouraging results. By fitting a shifted version of the function $f(x) = 3^{-\log_2(x)}$ to the right tail of the sample pdf histogram from T01 tumor.cel, it can be seen that the shifted version of $f(x)$ and the right tail of the sample pdf histogram are very similar (Figure 5.1). It seems likely that the right tail of the pdf for the sample data approaches a function that decreases by a common ratio.

Figure 5.1: Plot of the right tail of the sample pdf histogram for the PM probe data from T01 tumor.cel fitted to a shifted version of $f(x) = 3^{-\log_2(x)}$.

Comparing the right tail of an EMG pdf to $f(x)$ shows that these two functions are very different. Both functions are concave up and constantly decreasing; however, the rate of decrease is very different. By definition the ratio $\frac{f(x)}{f(2x)} = 3$ is constant with respect to $x$. For an EMG distribution this ratio is not constant with respect to $x$

and can be very different from 3 depending on the value of $x$. As an example, when $\mu = 0$, $\sigma = 1$, $\lambda = 1$, and $x = 10$, the ratio for the right tail of an EMG distribution is approximately 5,000. In general, the right tail of an EMG pdf converges to zero much more quickly than does $f(x)$.

6. Practical Implementation of EMG Parameter Estimation Method and Properties

In Section 3.3 it was shown that once the value of the variable $k$ is known, it is possible to estimate the parameters of an EMG distribution using two sample quantiles. Using (3.6) it is possible to estimate $k$ by replacing $\mathrm{EMG}_m(k)$ with the sample quantile of the sample mean. Combining these two results constitutes a parameter estimation method, which is given by

1. Estimate $k$ with $k_e$, where $k_e$ is calculated by replacing the left hand side of the following equation with the sample quantile of the sample mean and solving for $k_e$:

$$\mathrm{EMG}_m(k_e) = \tfrac{1}{2}\left[1 - e^{\frac{k_e^2}{2} - 1}\,\mathrm{erfc}\!\left(\frac{k_e^2 - 1}{\sqrt{2}\,k_e}\right) + \mathrm{erf}\!\left(\frac{1}{\sqrt{2}\,k_e}\right)\right]$$

2. Determine $\mu$ from the quantile determined by the right hand side of

$$\mathrm{EMG}_\mu(k_e) = \tfrac{1}{2}\left[1 - e^{\frac{k_e^2}{2}}\,\mathrm{erfc}\!\left(\frac{k_e}{\sqrt{2}}\right)\right]$$

3. Determine $ml = \mu + \lambda^{-1}$ from the quantile determined by the right hand side of

$$\mathrm{EMG}_m(k_e) = \tfrac{1}{2}\left[1 - e^{\frac{k_e^2}{2} - 1}\,\mathrm{erfc}\!\left(\frac{k_e^2 - 1}{\sqrt{2}\,k_e}\right) + \mathrm{erf}\!\left(\frac{1}{\sqrt{2}\,k_e}\right)\right]$$

4. Determine $ms = \mu + \sigma$ from the quantile determined by the right hand side of

$$\mathrm{EMG}_c(0, 1, k_e) = \tfrac{1}{2}\left[1 - e^{\frac{k_e^2}{2} - k_e}\,\mathrm{erfc}\!\left(\frac{k_e - 1}{\sqrt{2}}\right) + \mathrm{erf}\!\left(\frac{1}{\sqrt{2}}\right)\right]$$

5. Estimate $\lambda$ by subtracting the estimate of $\mu$ from $ml$ and then taking the multiplicative inverse of the result

6. Estimate $\sigma$ by subtracting the estimate of $\mu$ from $ms$

Although this method will work in theory, there are several modifications that need to be made in order to make it practical. It will be shown that by performing several slight modifications to the parameter estimation procedure described above, a practical implementation will result. The final implementation is consistent and always returns valid parameter estimates, where valid parameter estimates are parameter estimates that satisfy all constraints on the original parameters (such as $\sigma > 0$). This new parameter estimation method is then compared to other parameter estimation methods for the EMG distribution from the literature. It is found that the new parameter estimation method has several advantages over other currently available methods.

6.1 Proof of Consistency

It is proved that the new parameter estimation method as introduced at the beginning of this chapter is consistent. This proof will also apply to the final implementation, as the modifications made will in no way affect consistency.

Theorem. The parameter estimation method introduced is consistent.

Proof. To prove the theorem it will first be proved that the estimate for $k$ is consistent. Applying the same techniques used to show that the estimate for $k$ is consistent, it can easily be shown that the consistency of the parameter estimates follows from the consistency of the estimate for $k$. To show that the estimate for $k$ is consistent it will be shown that the sample quantile of the sample mean is a consistent estimate for the quantile of the mean. Given the continuity of the EMG cdf it will then follow that the estimate for $k$ is consistent.
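The six-step procedure listed at the start of this chapter can be sketched in code. This is a minimal illustration of the unmodified method under stated assumptions, not the final implementation described in the thesis: the bisection bracket for $k$, the clamping at the ends of that bracket, the simple order-statistic quantile estimate, and all function names are choices made here for illustration. No guard is included for degenerate samples in which $ml$ equals the estimate of $\mu$.

```python
import bisect
import math
import random

SQRT2 = math.sqrt(2.0)


def emg_mean_quantile(k):
    """Quantile level of the mean mu + 1/lambda as a function of k (steps 1 and 3)."""
    return 0.5 * (1.0
                  - math.exp(k * k / 2.0 - 1.0) * math.erfc((k * k - 1.0) / (SQRT2 * k))
                  + math.erf(1.0 / (SQRT2 * k)))


def emg_mu_quantile(k):
    """Quantile level of mu as a function of k (step 2)."""
    return 0.5 * (1.0 - math.exp(k * k / 2.0) * math.erfc(k / SQRT2))


def emg_mu_plus_sigma_quantile(k):
    """Quantile level of mu + sigma as a function of k (step 4)."""
    return 0.5 * (1.0
                  - math.exp(k * k / 2.0 - k) * math.erfc((k - 1.0) / SQRT2)
                  + math.erf(1.0 / SQRT2))


def sample_quantile(ordered, level):
    """Order-statistic estimate of a quantile from already sorted data."""
    index = min(len(ordered) - 1, max(0, int(level * len(ordered))))
    return ordered[index]


def solve_k(target, k_low=1e-3, k_high=30.0, iterations=200):
    """Bisection for emg_mean_quantile(k) = target.

    Assumes the level decreases in k on the bracket; the bracket itself is an
    arbitrary choice that keeps exp() and erfc() within floating-point range.
    """
    if target >= emg_mean_quantile(k_low):
        return k_low
    if target <= emg_mean_quantile(k_high):
        return k_high
    for _ in range(iterations):
        k_mid = 0.5 * (k_low + k_high)
        if emg_mean_quantile(k_mid) > target:
            k_low = k_mid
        else:
            k_high = k_mid
    return 0.5 * (k_low + k_high)


def estimate_emg(samples):
    """Return (mu, sigma, lam, k_e) following steps 1 to 6."""
    ordered = sorted(samples)
    n = len(ordered)
    mean = sum(ordered) / n
    mean_level = bisect.bisect_right(ordered, mean) / n  # sample quantile of the sample mean
    k_e = solve_k(mean_level)                                       # step 1
    mu = sample_quantile(ordered, emg_mu_quantile(k_e))             # step 2
    ml = sample_quantile(ordered, emg_mean_quantile(k_e))           # step 3
    ms = sample_quantile(ordered, emg_mu_plus_sigma_quantile(k_e))  # step 4
    lam = 1.0 / (ml - mu)                                           # step 5
    sigma = ms - mu                                                 # step 6
    return mu, sigma, lam, k_e


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic EMG sample: Gaussian(1.0, 0.5) plus Exponential(rate 1.0).
    data = [rng.gauss(1.0, 0.5) + rng.expovariate(1.0) for _ in range(200_000)]
    print(estimate_emg(data))
```

On a large synthetic sample the estimates land close to the parameters used to generate it. When the sample quantile of the sample mean falls outside the range an EMG distribution can produce, `solve_k` simply clamps to the end of its bracket, which is one reason the text says the raw procedure needs modification before practical use.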


More information

Model comparison. Patrick Breheny. March 28. Introduction Measures of predictive power Model selection

Model comparison. Patrick Breheny. March 28. Introduction Measures of predictive power Model selection Model comparison Patrick Breheny March 28 Patrick Breheny BST 760: Advanced Regression 1/25 Wells in Bangladesh In this lecture and the next, we will consider a data set involving modeling the decisions

More information

Pre-Algebra (6/7) Pacing Guide

Pre-Algebra (6/7) Pacing Guide Pre-Algebra (6/7) Pacing Guide Vision Statement Imagine a classroom, a school, or a school district where all students have access to high-quality, engaging mathematics instruction. There are ambitious

More information

F X (x) = P [X x] = x f X (t)dt. 42 Lebesgue-a.e, to be exact 43 More specifically, if g = f Lebesgue-a.e., then g is also a pdf for X.

F X (x) = P [X x] = x f X (t)dt. 42 Lebesgue-a.e, to be exact 43 More specifically, if g = f Lebesgue-a.e., then g is also a pdf for X. 10.2 Properties of PDF and CDF for Continuous Random Variables 10.18. The pdf f X is determined only almost everywhere 42. That is, given a pdf f for a random variable X, if we construct a function g by

More information

Distribution Fitting (Censored Data)

Distribution Fitting (Censored Data) Distribution Fitting (Censored Data) Summary... 1 Data Input... 2 Analysis Summary... 3 Analysis Options... 4 Goodness-of-Fit Tests... 6 Frequency Histogram... 8 Comparison of Alternative Distributions...

More information

MATH Notebook 5 Fall 2018/2019

MATH Notebook 5 Fall 2018/2019 MATH442601 2 Notebook 5 Fall 2018/2019 prepared by Professor Jenny Baglivo c Copyright 2004-2019 by Jenny A. Baglivo. All Rights Reserved. 5 MATH442601 2 Notebook 5 3 5.1 Sequences of IID Random Variables.............................

More information

Measurements and Data Analysis

Measurements and Data Analysis Measurements and Data Analysis 1 Introduction The central point in experimental physical science is the measurement of physical quantities. Experience has shown that all measurements, no matter how carefully

More information

tutorial Statistical reliability analysis on Rayleigh probability distributions

tutorial Statistical reliability analysis on Rayleigh probability distributions tutorial Statistical reliability analysis on Rayleigh probability distributions These techniques for determining a six-sigma capability index for a Rayleigh distribution use sample mean and deviation.

More information

ABSTRACT. HEWITT, CHRISTINA M. Real Roots of Polynomials with Real Coefficients. (Under the direction of Dr. Michael Singer).

ABSTRACT. HEWITT, CHRISTINA M. Real Roots of Polynomials with Real Coefficients. (Under the direction of Dr. Michael Singer). ABSTRACT HEWITT, CHRISTINA M. Real Roots of Polynomials with Real Coefficients. (Under the direction of Dr. Michael Singer). Polynomial equations are used throughout mathematics. When solving polynomials

More information

Multivariate Distribution Models

Multivariate Distribution Models Multivariate Distribution Models Model Description While the probability distribution for an individual random variable is called marginal, the probability distribution for multiple random variables is

More information

Introducing the Normal Distribution

Introducing the Normal Distribution Department of Mathematics Ma 3/103 KC Border Introduction to Probability and Statistics Winter 2017 Lecture 10: Introducing the Normal Distribution Relevant textbook passages: Pitman [5]: Sections 1.2,

More information

Continuous Probability Distributions. Uniform Distribution

Continuous Probability Distributions. Uniform Distribution Continuous Probability Distributions Uniform Distribution Important Terms & Concepts Learned Probability Mass Function (PMF) Cumulative Distribution Function (CDF) Complementary Cumulative Distribution

More information

Controlling Gene Expression

Controlling Gene Expression Controlling Gene Expression Control Mechanisms Gene regulation involves turning on or off specific genes as required by the cell Determine when to make more proteins and when to stop making more Housekeeping

More information

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid. 1. A change that makes a polypeptide defective has been discovered in its amino acid sequence. The normal and defective amino acid sequences are shown below. Researchers are attempting to reproduce the

More information

9 Correlation and Regression

9 Correlation and Regression 9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the

More information

MATH4427 Notebook 4 Fall Semester 2017/2018

MATH4427 Notebook 4 Fall Semester 2017/2018 MATH4427 Notebook 4 Fall Semester 2017/2018 prepared by Professor Jenny Baglivo c Copyright 2009-2018 by Jenny A. Baglivo. All Rights Reserved. 4 MATH4427 Notebook 4 3 4.1 K th Order Statistics and Their

More information

M155 Exam 2 Concept Review

M155 Exam 2 Concept Review M155 Exam 2 Concept Review Mark Blumstein DERIVATIVES Product Rule Used to take the derivative of a product of two functions u and v. u v + uv Quotient Rule Used to take a derivative of the quotient of

More information

Seminar Microarray-Datenanalyse

Seminar Microarray-Datenanalyse Seminar Microarray- Normalization Hans-Ulrich Klein Christian Ruckert Institut für Medizinische Informatik WWU Münster SS 2011 Organisation 1 09.05.11 Normalisierung 2 10.05.11 Bestimmen diff. expr. Gene,

More information

More on Estimation. Maximum Likelihood Estimation.

More on Estimation. Maximum Likelihood Estimation. More on Estimation. In the previous chapter we looked at the properties of estimators and the criteria we could use to choose between types of estimators. Here we examine more closely some very popular

More information

Stat 516, Homework 1

Stat 516, Homework 1 Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball

More information

Chapter 13 - Inverse Functions

Chapter 13 - Inverse Functions Chapter 13 - Inverse Functions In the second part of this book on Calculus, we shall be devoting our study to another type of function, the exponential function and its close relative the Sine function.

More information

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows. Chapter 5 Two Random Variables In a practical engineering problem, there is almost always causal relationship between different events. Some relationships are determined by physical laws, e.g., voltage

More information

MTH101 Calculus And Analytical Geometry Lecture Wise Questions and Answers For Final Term Exam Preparation

MTH101 Calculus And Analytical Geometry Lecture Wise Questions and Answers For Final Term Exam Preparation MTH101 Calculus And Analytical Geometry Lecture Wise Questions and Answers For Final Term Exam Preparation Lecture No 23 to 45 Complete and Important Question and answer 1. What is the difference between

More information

Student s Printed Name: KEY_&_Grading Guidelines_CUID:

Student s Printed Name: KEY_&_Grading Guidelines_CUID: Student s Printed Name: KEY_&_Grading Guidelines_CUID: Instructor: Section # : You are not permitted to use a calculator on any portion of this test. You are not allowed to use any textbook, notes, cell

More information

Microarray Preprocessing

Microarray Preprocessing Microarray Preprocessing Normaliza$on Normaliza$on is needed to ensure that differences in intensi$es are indeed due to differen$al expression, and not some prin$ng, hybridiza$on, or scanning ar$fact.

More information

1 Degree distributions and data

1 Degree distributions and data 1 Degree distributions and data A great deal of effort is often spent trying to identify what functional form best describes the degree distribution of a network, particularly the upper tail of that distribution.

More information

Algebra Exam. Solutions and Grading Guide

Algebra Exam. Solutions and Grading Guide Algebra Exam Solutions and Grading Guide You should use this grading guide to carefully grade your own exam, trying to be as objective as possible about what score the TAs would give your responses. Full

More information

Probe-Level Analysis of Affymetrix GeneChip Microarray Data

Probe-Level Analysis of Affymetrix GeneChip Microarray Data Probe-Level Analysis of Affymetrix GeneChip Microarray Data Ben Bolstad http://www.stat.berkeley.edu/~bolstad Biostatistics, University of California, Berkeley University of Minnesota Mar 30, 2004 Outline

More information

CSC 446 Notes: Lecture 13

CSC 446 Notes: Lecture 13 CSC 446 Notes: Lecture 3 The Problem We have already studied how to calculate the probability of a variable or variables using the message passing method. However, there are some times when the structure

More information

Finite Mathematics : A Business Approach

Finite Mathematics : A Business Approach Finite Mathematics : A Business Approach Dr. Brian Travers and Prof. James Lampes Second Edition Cover Art by Stephanie Oxenford Additional Editing by John Gambino Contents What You Should Already Know

More information

CHAPTER 8 MODEL DIAGNOSTICS. 8.1 Residual Analysis

CHAPTER 8 MODEL DIAGNOSTICS. 8.1 Residual Analysis CHAPTER 8 MODEL DIAGNOSTICS We have now discussed methods for specifying models and for efficiently estimating the parameters in those models. Model diagnostics, or model criticism, is concerned with testing

More information

Random Number Generation. CS1538: Introduction to simulations

Random Number Generation. CS1538: Introduction to simulations Random Number Generation CS1538: Introduction to simulations Random Numbers Stochastic simulations require random data True random data cannot come from an algorithm We must obtain it from some process

More information

MATHEMATICS Grade 5 Standard: Number, Number Sense and Operations. Organizing Topic Benchmark Indicator

MATHEMATICS Grade 5 Standard: Number, Number Sense and Operations. Organizing Topic Benchmark Indicator Standard: Number, Number Sense and Operations Number and A. Represent and compare numbers less than 0 through 6. Construct and compare numbers greater than and less Number Systems familiar applications

More information

Linear Equations. Find the domain and the range of the following set. {(4,5), (7,8), (-1,3), (3,3), (2,-3)}

Linear Equations. Find the domain and the range of the following set. {(4,5), (7,8), (-1,3), (3,3), (2,-3)} Linear Equations Domain and Range Domain refers to the set of possible values of the x-component of a point in the form (x,y). Range refers to the set of possible values of the y-component of a point in

More information

13.7 ANOTHER TEST FOR TREND: KENDALL S TAU

13.7 ANOTHER TEST FOR TREND: KENDALL S TAU 13.7 ANOTHER TEST FOR TREND: KENDALL S TAU In 1969 the U.S. government instituted a draft lottery for choosing young men to be drafted into the military. Numbers from 1 to 366 were randomly assigned to

More information

4.4 Graphs of Logarithmic Functions

4.4 Graphs of Logarithmic Functions 590 Chapter 4 Exponential and Logarithmic Functions 4.4 Graphs of Logarithmic Functions In this section, you will: Learning Objectives 4.4.1 Identify the domain of a logarithmic function. 4.4.2 Graph logarithmic

More information

Lecture Notes in Quantitative Biology

Lecture Notes in Quantitative Biology Lecture Notes in Quantitative Biology Numerical Methods L30a L30b Chapter 22.1 Revised 29 November 1996 ReCap Numerical methods Definition Examples p-values--h o true p-values--h A true modal frequency

More information

Chapter 3 sections. SKIP: 3.10 Markov Chains. SKIP: pages Chapter 3 - continued

Chapter 3 sections. SKIP: 3.10 Markov Chains. SKIP: pages Chapter 3 - continued Chapter 3 sections 3.1 Random Variables and Discrete Distributions 3.2 Continuous Distributions 3.3 The Cumulative Distribution Function 3.4 Bivariate Distributions 3.5 Marginal Distributions 3.6 Conditional

More information

a factors The exponential 0 is a special case. If b is any nonzero real number, then

a factors The exponential 0 is a special case. If b is any nonzero real number, then 0.1 Exponents The expression x a is an exponential expression with base x and exponent a. If the exponent a is a positive integer, then the expression is simply notation that counts how many times the

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

Protein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation.

Protein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation. Protein Synthesis Unit 6 Goal: Students will be able to describe the processes of transcription and translation. Types of RNA Messenger RNA (mrna) makes a copy of DNA, carries instructions for making proteins,

More information

AP * Calculus Review. Limits, Continuity, and the Definition of the Derivative

AP * Calculus Review. Limits, Continuity, and the Definition of the Derivative AP * Calculus Review Limits, Continuity, and the Definition of the Derivative Teacher Packet Advanced Placement and AP are registered trademark of the College Entrance Examination Board. The College Board

More information

Problem 1 (20) Log-normal. f(x) Cauchy

Problem 1 (20) Log-normal. f(x) Cauchy ORF 245. Rigollet Date: 11/21/2008 Problem 1 (20) f(x) f(x) 0.0 0.1 0.2 0.3 0.4 0.0 0.2 0.4 0.6 0.8 4 2 0 2 4 Normal (with mean -1) 4 2 0 2 4 Negative-exponential x x f(x) f(x) 0.0 0.1 0.2 0.3 0.4 0.5

More information

Chemical reaction networks and diffusion

Chemical reaction networks and diffusion FYTN05 Computer Assignment 2 Chemical reaction networks and diffusion Supervisor: Adriaan Merlevede Office: K336-337, E-mail: adriaan@thep.lu.se 1 Introduction This exercise focuses on understanding and

More information

the amount of the data corresponding to the subinterval the width of the subinterval e x2 to the left by 5 units results in another PDF g(x) = 1 π

the amount of the data corresponding to the subinterval the width of the subinterval e x2 to the left by 5 units results in another PDF g(x) = 1 π Math 10A with Professor Stankova Worksheet, Discussion #42; Friday, 12/8/2017 GSI name: Roy Zhao Problems 1. For each of the following distributions, derive/find all of the following: PMF/PDF, CDF, median,

More information

numerical analysis 1

numerical analysis 1 numerical analysis 1 1.1 Differential equations In this chapter we are going to study differential equations, with particular emphasis on how to solve them with computers. We assume that the reader has

More information

Probability Distribution

Probability Distribution Economic Risk and Decision Analysis for Oil and Gas Industry CE81.98 School of Engineering and Technology Asian Institute of Technology January Semester Presented by Dr. Thitisak Boonpramote Department

More information

Bell-shaped curves, variance

Bell-shaped curves, variance November 7, 2017 Pop-in lunch on Wednesday Pop-in lunch tomorrow, November 8, at high noon. Please join our group at the Faculty Club for lunch. Means If X is a random variable with PDF equal to f (x),

More information

1. In most cases, genes code for and it is that

1. In most cases, genes code for and it is that Name Chapter 10 Reading Guide From DNA to Protein: Gene Expression Concept 10.1 Genetics Shows That Genes Code for Proteins 1. In most cases, genes code for and it is that determine. 2. Describe what Garrod

More information

Milford Public Schools Curriculum

Milford Public Schools Curriculum Milford Public Schools Curriculum Department: Mathematics Course Name: Math 07 UNIT 1 Unit Title: Operating with Rational Numbers (add/sub) Unit Description: Number System Apply and extend previous understandings

More information

Examining the accuracy of the normal approximation to the poisson random variable

Examining the accuracy of the normal approximation to the poisson random variable Eastern Michigan University DigitalCommons@EMU Master's Theses and Doctoral Dissertations Master's Theses, and Doctoral Dissertations, and Graduate Capstone Projects 2009 Examining the accuracy of the

More information

2008 Winton. Statistical Testing of RNGs

2008 Winton. Statistical Testing of RNGs 1 Statistical Testing of RNGs Criteria for Randomness For a sequence of numbers to be considered a sequence of randomly acquired numbers, it must have two basic statistical properties: Uniformly distributed

More information

Essentials of Mathematics Lesson Objectives

Essentials of Mathematics Lesson Objectives Essentials of Mathematics Lesson Unit 1: NUMBER SENSE Reviewing Rational Numbers Practice adding, subtracting, multiplying, and dividing whole numbers, fractions, and decimals. Practice evaluating exponents.

More information

Support for Michigan GLCE Math Standards, grades 4-8

Support for Michigan GLCE Math Standards, grades 4-8 Support for Michigan GLCE Math Standards, grades 4-8 Hello Michigan middle school math teachers! I hope you enjoy this little guide, designed to help you quickly and easily find support for some of the

More information

Table 2.1 presents examples and explains how the proper results should be written. Table 2.1: Writing Your Results When Adding or Subtracting

Table 2.1 presents examples and explains how the proper results should be written. Table 2.1: Writing Your Results When Adding or Subtracting When you complete a laboratory investigation, it is important to make sense of your data by summarizing it, describing the distributions, and clarifying messy data. Analyzing your data will allow you to

More information