Week 2 Statistics for bioinformatics and escience

Line Skotte, 20 November

…) Revisited. When solving these exercises, some of you tried to capture a whole open reading frame by pattern matching with a regular expression. I was very impressed by your solutions. But when using repetition, *, in regular expressions, you must remember that pattern matching is greedy by default. That means that a command like

    gregexpr("atg(.{3})*t(aa|ga|ag)", tmp)

ends the match at the last stop codon in frame with the matched start codon, not at the first stop codon in frame after the start codon. To avoid this, you have to make the regular expression non-greedy. One way to do this is to use

    gregexpr("(atg)(.{3})*?(t(aa|ga|ag))", tmp, perl=TRUE)

instead. The *? means that the match should use as few repetitions of .{3} as possible. There might be more elegant ways of achieving the same.

Notice that it is now the length of the match we are interested in, so we must access the match.length attribute of the object that this function returns. The following function generates a random sequence, finds the open reading frames and returns the length of the first open reading frame, counted in codons between the start and the stop codon.

    orffun <- function() {
      # random sequence of 1000 nucleotides
      tmp <- paste(sample(c("a","c","g","t"), 1000, replace=TRUE), sep="", collapse="")
      # non-greedy match: start codon, whole codons, first in-frame stop codon
      orf <- gregexpr("(atg)(.{3})*?(t(aa|ga|ag))", tmp, perl=TRUE)
      # codons in the first match, minus the start and stop codon
      return((attr(orf[[1]], "match.length")[1]/3) - 2)
    }

Then the following commands

    orfs <- replicate(1000, orffun())
    barplot(table(orfs), xlab="length", ylab="frequency")

give the plot below.
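The difference between the greedy and the non-greedy pattern is easiest to see on a short hand-made sequence. The following is a minimal sketch; the toy sequence `s` and the variable names are made up for illustration and are not part of the exercise.

```r
# Toy sequence (hypothetical): start codon atg, one codon, stop codon taa,
# then two more codons and a second in-frame stop codon tag.
s <- "atgaaataacccgggtag"

# Greedy: (.{3})* takes as many codons as possible, so the match
# runs to the last in-frame stop codon.
greedy <- regexpr("atg(.{3})*t(aa|ga|ag)", s, perl = TRUE)

# Non-greedy: (.{3})*? takes as few codons as possible, so the match
# ends at the first in-frame stop codon.
lazy <- regexpr("atg(.{3})*?t(aa|ga|ag)", s, perl = TRUE)

greedy_len <- attr(greedy, "match.length")
lazy_len <- attr(lazy, "match.length")
print(c(greedy = greedy_len, lazy = lazy_len))
```

The greedy match covers the whole sequence, while the non-greedy match stops after the first stop codon.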

[Figure: bar plot of the simulated open reading frame lengths; x-axis: length, y-axis: frequency.]

2.6.1) Consider for $\beta > 0$ the function

$$F(x) = 1 - \exp(-x^{\beta}), \quad x \geq 0.$$

To show that this is a distribution function, we must according to the Theorem on p. 30 show that properties (i), (ii) and (iii) of the Theorem on p. 29 are satisfied.

(i) F is increasing: Let $x_1 \leq x_2$. Then $x_1^{\beta} \leq x_2^{\beta}$ and thus $-x_1^{\beta} \geq -x_2^{\beta}$. This implies that $\exp(-x_1^{\beta}) \geq \exp(-x_2^{\beta})$. Therefore

$$F(x_1) = 1 - \exp(-x_1^{\beta}) \leq 1 - \exp(-x_2^{\beta}) = F(x_2).$$

(ii) It is understood that

$$F(x) = (1 - \exp(-x^{\beta})) 1_{[0,\infty)}(x).$$

Therefore it is obvious that $F(x) \to 0$ when $x \to -\infty$. Furthermore, when $x \to \infty$ we have that $x^{\beta} \to \infty$, which gives $\exp(-x^{\beta}) \to 0$ for $x \to \infty$. Thus it is also obvious that $F(x) = 1 - \exp(-x^{\beta}) \to 1$ for $x \to \infty$.

(iii) Finally, to show that F is right continuous at any $x \in \mathbb{R}$, note that the function $1 - \exp(-x^{\beta})$ is continuous (since it is a composition of continuous functions). That gives us that F is continuous at all $x \in \mathbb{R} \setminus \{0\}$, and in particular right continuous there. For $x = 0$, we have that $F(0) = 0$ and that $\lim_{\varepsilon \to 0, \varepsilon > 0} F(\varepsilon) = 0$. Thus for all $x \in \mathbb{R}$ we have that

$$\lim_{\varepsilon \to 0, \varepsilon > 0} F(x + \varepsilon) = F(x).$$

2.6.4) For $\lambda > 0$, let

$$f_{\lambda}(x) = \left(1 + \frac{x^2}{2\lambda}\right)^{-(\lambda + \frac{1}{2})}, \quad x \in \mathbb{R}.$$

Notice that for any $x \in \mathbb{R}$ we have that $\frac{x^2}{2\lambda} \geq 0$, thus $\left(1 + \frac{x^2}{2\lambda}\right)^{\lambda + \frac{1}{2}} > 0$, and it follows that $f_{\lambda}(x) > 0$. Now define the normalization constant

$$c(\lambda) = \int_{-\infty}^{\infty} f_{\lambda}(x) \, dx.$$

The integrate function in R requires that the function it integrates is an R function taking a numeric first argument and returning a numeric vector of the same length. So we can define $f_{\lambda}$ in R in the following way:

    flambda <- function(x, lambda) {
      1/((1 + (x^2)/(2*lambda))^(lambda + 0.5))
    }

Numerical integration with $\lambda = \frac{1}{2}$ is then carried out by writing integrate(flambda, -Inf, Inf, lambda=0.5). The integral can be calculated for several different values of $\lambda$ at the same time by

    vlambda <- c(0.25, 0.5, 1, 2, 5, 10, 20, 50, 100)
    numint <- sapply(vlambda, function(parm) {
      integrate(flambda, -Inf, Inf, lambda=parm)$value
    })

Plotting by

    plot(vlambda, numint, ylim=c(2,4), ylab="c(lambda)", xlab="lambda")
    abline(h=pi, col="red", lty=2)
    abline(h=sqrt(2*pi), col="red", lty=2)

makes the comparison with $\pi$ and $\sqrt{2\pi}$ easy. We notice that $c(0.5) = \pi$ and that $c(\lambda) \to \sqrt{2\pi}$ when $\lambda \to \infty$.
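The numerical values of c(λ) can be cross-checked against a closed form. Substituting x = √(2λ)·u turns the integral into a Beta integral, which gives c(λ) = √(2πλ)·Γ(λ)/Γ(λ + 1/2). This formula is not stated in the exercise and is used here only as a check; the names `c_exact`, `lambdas` and `c_num` are made up for this sketch.

```r
# Integrand from the exercise
flambda <- function(x, lambda) {
  1 / ((1 + x^2 / (2 * lambda))^(lambda + 0.5))
}

# Closed form of the normalization constant (an assumption of this sketch,
# obtained from the substitution described above)
c_exact <- function(lambda) {
  sqrt(2 * pi * lambda) * gamma(lambda) / gamma(lambda + 0.5)
}

lambdas <- c(0.5, 1, 2, 5, 10)
c_num <- sapply(lambdas, function(l) {
  integrate(flambda, -Inf, Inf, lambda = l)$value
})

# Numerical and closed-form values side by side
print(cbind(lambda = lambdas, numeric = c_num, exact = c_exact(lambdas)))
```

For λ = 0.5 the closed form reduces to π, in agreement with the observation above, and since Γ(λ)/Γ(λ + 1/2) behaves like λ^(-1/2) for large λ it is also consistent with the limit √(2π).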

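[Figure: numerically computed c(lambda) plotted against lambda, with dashed red lines at $\pi$ and $\sqrt{2\pi}$.]

Since $c(\lambda) > 0$, we have that $c(\lambda)^{-1} f_{\lambda}(x) > 0$ for all $x \in \mathbb{R}$, and since

$$\int_{-\infty}^{\infty} c(\lambda)^{-1} f_{\lambda}(x) \, dx = c(\lambda)^{-1} \int_{-\infty}^{\infty} f_{\lambda}(x) \, dx = 1,$$

the function $c(\lambda)^{-1} f_{\lambda}(x)$ is a density (according to page 32). To compare this t-distribution density with the density for the normal distribution, we plot both:

    plot(-50:50/10, dnorm(-50:50/10), xlab="x", ylab="f(x)")
    points(-50:50/10, 1/numint[2]*flambda(-50:50/10, 0.5), col="red")

The red curve is the t-distribution (here with $\lambda = 0.5$).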
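The normalized density can also be compared directly with R's built-in t-density. A t-distribution with ν degrees of freedom has density proportional to (1 + x²/ν)^(-(ν+1)/2), so taking ν = 2λ should reproduce c(λ)⁻¹ f_λ. The identification with `dt` is an addition of this sketch and is not made in the exercise text; `c_lambda`, `dens` and `max_diff` are illustrative names.

```r
# Integrand from the exercise
flambda <- function(x, lambda) {
  1 / ((1 + x^2 / (2 * lambda))^(lambda + 0.5))
}

x <- seq(-5, 5, by = 0.1)
lambda <- 0.5

# Normalization constant by numerical integration, as in the exercise
c_lambda <- integrate(flambda, -Inf, Inf, lambda = lambda)$value

# Normalized density versus the built-in t-density with 2 * lambda
# degrees of freedom
dens <- flambda(x, lambda) / c_lambda
max_diff <- max(abs(dens - dt(x, df = 2 * lambda)))
print(max_diff)
```

If the two curves agree up to the integration error, the red curve in the plot is the t-distribution with one degree of freedom.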

[Figure: standard normal density and, in red, the t-distribution density; x-axis: x, y-axis: f(x).]

2.6.6) According to the Example on the Gumbel distribution, its density is

$$f(x) = \exp(-x) \exp(-\exp(-x)) = \exp(-x - \exp(-x)).$$

The mean is defined if $\int_{-\infty}^{\infty} |x| f(x) \, dx < \infty$, as

$$\mu = \int_{-\infty}^{\infty} x f(x) \, dx.$$

Numerical computation of the mean in R can be done by

    fgumbel <- function(x) { exp(-x - exp(-x)) }
    mugumbel <- integrate(function(x) { x*fgumbel(x) }, -Inf, Inf)

Actually, the mean equals the Euler-Mascheroni constant (approximately 0.5772), which is related to the $\Gamma$-function. Now the variance

$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) \, dx$$

can be calculated numerically by

    vargumbel <- integrate(function(x) { (x - mugumbel$value)^2*fgumbel(x) }, -Inf, Inf)

The variance of the Gumbel distribution equals $\pi^2/6$.

…) We consider the probabilistic model of the pair of letters from the Example on p. 55. The sample space is $E = \{A,C,G,T\} \times \{A,C,G,T\}$. Let X and Y denote the random variables representing the two aligned nucleic acids and let their joint distribution be as given in the exercise.

By the Definition on p. 54, the point probabilities of the marginal distribution $P_1$ of X are given by

$$p_1(A) = P_1(\{A\}) = P(X \in \{A\}) = P((X, Y) \in \{A\} \times \{A,C,G,T\}) = P(\{A\} \times \{A,C,G,T\}) = \sum_{y \in \{A,C,G,T\}} p(A, y) = 0.20.$$

All the other point probabilities are found in the same way. Thus we get that the marginal distribution $P_1$ of X is given by the point probabilities $p_1(A) = 0.20$, $p_1(C) = 0.37$, $p_1(G) = 0.22$ and $p_1(T) = \dots$

Again by the definition, the point probabilities of the marginal distribution $P_2$ of Y are given by

$$p_2(A) = P_2(\{A\}) = P(\{A,C,G,T\} \times \{A\}) = \sum_{x \in \{A,C,G,T\}} p(x, A) = 0.21.$$

The other point probabilities are found in the same way. The marginal distribution $P_2$ of Y is given by the point probabilities $p_2(A) = 0.21$, $p_2(C) = 0.34$, $p_2(G) = 0.24$ and $p_2(T) = \dots$

Now assume that X and Y have the same marginal distributions $P_1$ and $P_2$ as above, but that X and Y are independent. Let $P^*$ denote the joint distribution that makes X and Y independent. By the definition of independence, $P^*$ must for all events $M_1 \subseteq \{A,C,G,T\}$ and $M_2 \subseteq \{A,C,G,T\}$ satisfy

$$P^*(M_1 \times M_2) = P_1(M_1) P_2(M_2).$$

Since the joint sample space $E = \{A,C,G,T\} \times \{A,C,G,T\}$ is discrete, $P^*$ is given by its point probabilities. According to the above, we have that

$$p^*(A, A) = P^*(\{A\} \times \{A\}) = P_1(\{A\}) P_2(\{A\}) = p_1(A) p_2(A)$$

and similarly for all other pairs of nucleotides. Thus we can calculate the point probabilities of the joint distribution $P^*$ simply by multiplying the appropriate point probabilities of the marginal distributions $P_1$ and $P_2$. (This is also stated in Theorem 2.9.3!) The point probabilities $p^*$ of the distribution $P^*$ that makes X and Y independent with marginal distributions $P_1$ and $P_2$ are then found to be

[Table: point probabilities $p^*(x, y)$ with rows and columns A, C, G, T; the entries are not preserved.]
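The two steps above, summing over rows and columns to get the marginals and multiplying the marginals to get the independent joint distribution, can be sketched in R as follows. The matrix `p` below is a made-up joint distribution used only for illustration; it is not the table from the exercise, and the names `p1`, `p2` and `p_indep` are likewise hypothetical.

```r
nuc <- c("A", "C", "G", "T")

# Hypothetical joint point probabilities p(x, y), NOT the values from
# the exercise; rows index X, columns index Y, entries sum to 1.
p <- matrix(c(0.12, 0.02, 0.04, 0.02,
              0.02, 0.20, 0.02, 0.06,
              0.05, 0.01, 0.20, 0.02,
              0.02, 0.05, 0.02, 0.13),
            nrow = 4, byrow = TRUE, dimnames = list(X = nuc, Y = nuc))

p1 <- rowSums(p)          # marginal point probabilities of X
p2 <- colSums(p)          # marginal point probabilities of Y
p_indep <- outer(p1, p2)  # p*(x, y) = p1(x) * p2(y)

print(round(p_indep, 4))
```

By construction `p_indep` has the same marginals as `p`, which is the defining property of the distribution that makes X and Y independent.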

The probability under $P^*$ of the event $X = Y$ is found by

$$P^*(X = Y) = P^*(\{(x, y) \in E \mid x = y\}) = \sum_{(x,y) \in E : x = y} p^*(x, y) = \dots$$

The probability that two nucleotides are equal is much smaller than in the example. When X and Y are independent, the probability of obtaining a pair of unequal nucleotides is higher.

…) We think of the data as representing independent outcomes of the random vector (X, Y); note that X and Y are dependent. The sample space is $E = E_a \times E_a$, where $E_a$ denotes the amino acid alphabet. The data is loaded into R with

    aadata <- read.table("…")

We cross tabulate the data with aafreq <- table(aadata). The matrix of relative frequencies is then obtained by division with the total number of observations:

    N <- dim(aadata)[1]
    relfreq <- aafreq/N

2.9.2) Assume that the joint distribution P of X and Y is given by the point probabilities that are the relative frequencies from above. The point probabilities $p_1$ and $p_2$ of the marginal distributions $P_1$ and $P_2$ of X and Y are by the definition calculated by

    prob_1 <- apply(relfreq, 1, sum)
    prob_2 <- apply(relfreq, 2, sum)

It follows that X and Y are not independent, since

$$p(A, A) = \dots \neq \dots = p_1(A) p_2(A).$$

The score matrix is calculated by

    score <- log(relfreq/outer(prob_1, prob_2))

Since $S_{x,y} = \log(p(x, y)) - \log(p_1(x) p_2(y))$, it can be thought of as a measure of how different the joint distribution is from the distribution making X and Y independent with marginal distributions $P_1$ and $P_2$. Or it can be thought of as a way to compare how probable it is to observe (x, y) under the joint distribution with how probable it is under the independence distribution.

$S_{X,Y}$ is simply a transformation of the random vector (X, Y), and as such it is itself a random variable. The sample space of $S_{X,Y}$ is finite, since only a finite number of values are possible. When (X, Y) has distribution P, the log is defined with probability one, since it is only with probability zero that we get a pair (x, y) for which $p(x, y) = 0$ (the problem was that $\log(0)$ is undefined). The Example on the mean of a transformed variable tells us exactly how to calculate the mean under the different distributions:

$$\mu = \sum_{x \in E} h(x) p(x).$$

This is done in R by

    score[score == -Inf] <- 0
    sum(score*relfreq)

(When the joint point probability is 0, the score function equals -Inf in R, but since this occurs with probability zero, we can change the values of the score function for these pairs of letters without changing the distribution.)
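The role of the replacement of -Inf by 0 can be seen on a small made-up table. The sketch below uses a hypothetical three-letter alphabet and hypothetical relative frequencies, one of which is zero; none of these numbers come from the course data, and `raw_score` and `mean_score` are illustrative names.

```r
aa <- c("A", "R", "N")

# Hypothetical relative frequencies (NOT the course data); one pair
# was never observed, so its relative frequency is 0.
relfreq <- matrix(c(0.30, 0.05, 0.00,
                    0.05, 0.25, 0.05,
                    0.05, 0.05, 0.20),
                  nrow = 3, byrow = TRUE, dimnames = list(aa, aa))

prob_1 <- rowSums(relfreq)  # marginal of X
prob_2 <- colSums(relfreq)  # marginal of Y

# Log-ratio score; the unobserved pair gives log(0) = -Inf
raw_score <- log(relfreq / outer(prob_1, prob_2))

# Without the replacement, -Inf * 0 is NaN and the sum is NaN.
# With it, the pair contributes 0 to the mean, as it should.
score <- raw_score
score[score == -Inf] <- 0
mean_score <- sum(score * relfreq)
print(mean_score)
```

The mean score under the joint distribution computed this way is nonnegative, and it is zero only when the joint distribution equals the independence distribution.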


{ p if x = 1 1 p if x = 0

{ p if x = 1 1 p if x = 0 Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =

More information

Lecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010

Lecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010 Hidden Lecture 4: Hidden : An Introduction to Dynamic Decision Making November 11, 2010 Special Meeting 1/26 Markov Model Hidden When a dynamical system is probabilistic it may be determined by the transition

More information

Review of Probabilities and Basic Statistics

Review of Probabilities and Basic Statistics Alex Smola Barnabas Poczos TA: Ina Fiterau 4 th year PhD student MLD Review of Probabilities and Basic Statistics 10-701 Recitations 1/25/2013 Recitation 1: Statistics Intro 1 Overview Introduction to

More information

Chapter 5. Chapter 5 sections

Chapter 5. Chapter 5 sections 1 / 43 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

Some Concepts of Probability (Review) Volker Tresp Summer 2018

Some Concepts of Probability (Review) Volker Tresp Summer 2018 Some Concepts of Probability (Review) Volker Tresp Summer 2018 1 Definition There are different way to define what a probability stands for Mathematically, the most rigorous definition is based on Kolmogorov

More information

STAT 450: Statistical Theory. Distribution Theory. Reading in Casella and Berger: Ch 2 Sec 1, Ch 4 Sec 1, Ch 4 Sec 6.

STAT 450: Statistical Theory. Distribution Theory. Reading in Casella and Berger: Ch 2 Sec 1, Ch 4 Sec 1, Ch 4 Sec 6. STAT 45: Statistical Theory Distribution Theory Reading in Casella and Berger: Ch 2 Sec 1, Ch 4 Sec 1, Ch 4 Sec 6. Basic Problem: Start with assumptions about f or CDF of random vector X (X 1,..., X p

More information

1 Probability and Random Variables

1 Probability and Random Variables 1 Probability and Random Variables The models that you have seen thus far are deterministic models. For any time t, there is a unique solution X(t). On the other hand, stochastic models will result in

More information

Intro to Probability Instructor: Alexandre Bouchard

Intro to Probability Instructor: Alexandre Bouchard www.stat.ubc.ca/~bouchard/courses/stat302-sp2017-18/ Intro to Probability Instructor: Alexandre Bouchard Announcements Graded midterm available after lecture Webwork due tonight Regrading policy IF you

More information

Expected Values, Exponential and Gamma Distributions

Expected Values, Exponential and Gamma Distributions Expected Values, Exponential and Gamma Distributions Sections 5.2-5.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 14-3339 Cathy Poliak,

More information

STT 441 Final Exam Fall 2013

STT 441 Final Exam Fall 2013 STT 441 Final Exam Fall 2013 (12:45-2:45pm, Thursday, Dec. 12, 2013) NAME: ID: 1. No textbooks or class notes are allowed in this exam. 2. Be sure to show all of your work to receive credit. Credits are

More information

Math Bootcamp 2012 Miscellaneous

Math Bootcamp 2012 Miscellaneous Math Bootcamp 202 Miscellaneous Factorial, combination and permutation The factorial of a positive integer n denoted by n!, is the product of all positive integers less than or equal to n. Define 0! =.

More information

Data Analysis and Monte Carlo Methods

Data Analysis and Monte Carlo Methods Lecturer: Allen Caldwell, Max Planck Institute for Physics & TUM Recitation Instructor: Oleksander (Alex) Volynets, MPP & TUM General Information: - Lectures will be held in English, Mondays 16-18:00 -

More information

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment Sequence Analysis 17: lecture 5 Substitution matrices Multiple sequence alignment Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of

More information

19 : Slice Sampling and HMC

19 : Slice Sampling and HMC 10-708: Probabilistic Graphical Models 10-708, Spring 2018 19 : Slice Sampling and HMC Lecturer: Kayhan Batmanghelich Scribes: Boxiang Lyu 1 MCMC (Auxiliary Variables Methods) In inference, we are often

More information

Probability review. September 11, Stoch. Systems Analysis Introduction 1

Probability review. September 11, Stoch. Systems Analysis Introduction 1 Probability review Alejandro Ribeiro Dept. of Electrical and Systems Engineering University of Pennsylvania aribeiro@seas.upenn.edu http://www.seas.upenn.edu/users/~aribeiro/ September 11, 2015 Stoch.

More information

Hidden Markov Models. Terminology, Representation and Basic Problems

Hidden Markov Models. Terminology, Representation and Basic Problems Hidden Markov Models Terminology, Representation and Basic Problems Data analysis? Machine learning? In bioinformatics, we analyze a lot of (sequential) data (biological sequences) to learn unknown parameters

More information

System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models

System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models Fatih Cavdur fatihcavdur@uludag.edu.tr March 20, 2012 Introduction Introduction The world of the model-builder

More information

Concentration Inequalities

Concentration Inequalities Chapter Concentration Inequalities I. Moment generating functions, the Chernoff method, and sub-gaussian and sub-exponential random variables a. Goal for this section: given a random variable X, how does

More information

1 Variance of a Random Variable

1 Variance of a Random Variable Indian Institute of Technology Bombay Department of Electrical Engineering Handout 14 EE 325 Probability and Random Processes Lecture Notes 9 August 28, 2014 1 Variance of a Random Variable The expectation

More information

0 otherwise. Page 100 Exercise 9: Suppose that a random variable X has a discrete distribution with the following p.m.f.: { c. 2 x. 0 otherwise.

0 otherwise. Page 100 Exercise 9: Suppose that a random variable X has a discrete distribution with the following p.m.f.: { c. 2 x. 0 otherwise. Stat 42 Solutions for Homework Set 4 Page Exercise 5: Suppose that a box contains seven red balls and three blue balls. If five balls are selected at random, without replacement, determine the p.m.f. of

More information

Spring Nikos Apostolakis

Spring Nikos Apostolakis Spring 07 Nikos Apostolakis Review of fractions Rational expressions are fractions with numerator and denominator polynomials. We need to remember how we work with fractions (a.k.a. rational numbers) before

More information

New test - November 03, 2015 [79 marks]

New test - November 03, 2015 [79 marks] New test - November 03, 05 [79 marks] Let f(x) = e x cosx, x. a. Show that f (x) = e x ( cosx sin x). correctly finding the derivative of e x, i.e. e x correctly finding the derivative of cosx, i.e. sin

More information

Notes 9 : Infinitely divisible and stable laws

Notes 9 : Infinitely divisible and stable laws Notes 9 : Infinitely divisible and stable laws Math 733 - Fall 203 Lecturer: Sebastien Roch References: [Dur0, Section 3.7, 3.8], [Shi96, Section III.6]. Infinitely divisible distributions Recall: EX 9.

More information

The Binomial distribution. Probability theory 2. Example. The Binomial distribution

The Binomial distribution. Probability theory 2. Example. The Binomial distribution Probability theory Tron Anders Moger September th 7 The Binomial distribution Bernoulli distribution: One experiment X i with two possible outcomes, probability of success P. If the experiment is repeated

More information

EXAMPLES OF PROOFS BY INDUCTION

EXAMPLES OF PROOFS BY INDUCTION EXAMPLES OF PROOFS BY INDUCTION KEITH CONRAD 1. Introduction In this handout we illustrate proofs by induction from several areas of mathematics: linear algebra, polynomial algebra, and calculus. Becoming

More information