FW 544: Computer Lab Probability basics in R
Gabriel Harmon
During this laboratory, students will be taught the properties and uses of several continuous and discrete statistical distributions that are commonly used in ecological models. The students will learn how to generate random data from each distribution using R software and how to develop simple simulation models by passing randomly generated values from one distribution to another. This will provide students with the understanding of probability distributions that will be needed to quantify uncertainty and comprehend the basics of Bayesian probability and Monte Carlo simulation. Laboratory exercises will evaluate the ability of students to build simple simulation models that use randomly generated data to approximate ecological processes, such as survival or the occurrence of a disturbance.

Overview of probability and random variables

We introduced random (stochastic) variables and statistical distributions last week. Here we will define these ideas somewhat more formally, and get into the details of some important statistical distributions. Probability (P) can be thought of as a measure of the uncertainty in a random outcome. If we say that event X occurs with P=1, then we are certain about X; if we say P=0, then we are certain that X does not occur; and if we say P=0.5, we are equally uncertain about whether X occurs or does not. The value or outcome X referred to is a random variable, as distinguished from a deterministic variable, whose values may vary but do so in a predictable or deterministic manner. A probability distribution or statistical distribution (or distribution for short) is a model that describes the relationship between values of a random variable and the probabilities of assuming these values. The basic types of distributions are discrete and continuous.
Discrete distributions model outcomes that occur in discrete classes or integer values; examples include the Bernoulli, Binomial, and Poisson, all discussed in more detail below. Continuous distributions model outcomes that take on continuous (generally, real) values and include the Uniform, Normal, Beta, and Gamma distributions, also discussed in more detail below. The probability density function for a distribution describes the probability that the random variable takes on particular values (for a discrete distribution) or is in the neighborhood of a value (for a continuous distribution). The density is often written as f(x). For example, for a discrete random variable (e.g., from a Poisson distribution) we may write f(4)=0.45, indicating that the value of 4 is taken with probability 0.45. For a discrete distribution, f(x) will sum to 1 over the support (the region of x where f(x)>0) of the distribution. For example, if we have a binomial distribution with parameters n=5 and p=0.2, the distribution has support for x = 0, 1, 2, 3, 4, 5, with f(0)+f(1)+f(2)+f(3)+f(4)+f(5) = 1. The density for continuous distributions follows a similar idea, but because the support is continuous (and thus uncountable), f(x) is not directly interpretable as a point probability. However, by analogy to the discrete distribution, f(x) integrates to 1 over the support of f(x). The probability distribution function (or cumulative distribution function) represents the probability that the random variable x is less than or equal to a particular value: F(x) = Prob(X <= x). For discrete distributions, F(x) is readily obtained by summation, e.g., for the binomial example:

F(3) = f(0)+f(1)+f(2)+f(3) = 0.328+0.410+0.205+0.051 = 0.993
Calculation of F(x) for continuous distributions is trickier and requires integration. By definition

F(x) = integral from -infinity to x of f(v) dv

where the lower limit may sometimes be higher (e.g., if the support starts at zero). Usually these computations are done by computer functions (or looked up in standard tables). Once F(x) is available (for either discrete or continuous distributions) we can easily ask questions like "what is the probability that x is between a and b?", since

Prob(a <= X <= b) = integral from a to b of f(v) dv = F(b) - F(a).

So, for example, if we have a Normal with mean 0 and standard deviation 1, F(2)=0.9772, F(1)=0.8413, and Prob(1 <= X <= 2) = F(2) - F(1) = 0.1359. We can reverse the idea of distributions and, for a given probability level of a distribution function, obtain the value of x (or quantile) associated with that value. The quantiles are essentially found by inverting the distribution function and solving for x, though for discrete distributions they can easily be gotten by examination and interpolation between values. In practice we will get the quantiles of standard distributions using built-in functions in R. To take the example of the normal distribution (mean=0, sd=1), the quantiles associated with F(x) = 0.01, 0.5, and 0.99 are -2.33, 0.00, and 2.33, respectively. You will often encounter the term moments, which refers to a number of important functions of distributions. The most important moments are the mean and the variance. The mean is formally defined in terms of the density function as

E(x) = sum over x of x f(x)

for discrete distributions and

E(x) = integral of v f(v) dv

for continuous distributions, where in both cases summation or integration is over the support of x. The variance of x follows from the definition of expectation and the relationship

V(x) = E[(x - mu)^2].

We generally estimate these population moments by their sample equivalents, the sample mean and variance,

mu_hat = x_bar = sum(x_i)/n and sigma_hat^2 = s^2 = sum((x_i - x_bar)^2)/(n - 1),

where n is the size of a random sample. The Normal Distribution is an example where the distribution's parameters (the constants that determine the behavior of the distribution and what it will predict about the data) are familiar: the parameters of the Normal are just the mean (mu) and the variance (sigma^2). We will introduce parameters for other distributions when we consider the distributions in detail, below.

Random number generation

The idea of random number generation is to produce a value (or a list/sample of values) of a random variable x, given some assumptions about the distribution of x and its parameters. For example, we might wish to obtain a simulated sample of 100 values that come from a normal distribution with mean 5 and standard deviation 10. Depending on whether the random variable is discrete or continuous and the complexity of the distribution function, there are a variety of procedures to generate random variables. Most of the common ones rely on being able to find the inverse distribution function, that is, the function F^-1(U) that, given a value for U, the cumulative probability of x, returns the value for x. The idea is then to generate a uniform random variable between 0 and
1 (the range for a probability) and then solve F^-1(U) to get x. Thus, many random number generators start from the capacity to generate a uniform random number, which can then be used to create random variables from other distributions. In practice, R goes through these steps for you, but we will illustrate them for a few simple examples so that you can see that there is often more than one way to obtain a simulated random variable. Technically, we are not generating true random numbers with these procedures, but rather computer-generated sequences of numbers that behave like random numbers, known as pseudorandom numbers. The exact means by which pseudorandom numbers are generated is an advanced topic beyond the scope of this course, and has been the subject of intensive development and refinement over the years. Suffice it to say that some pseudorandom number generators perform better (i.e., act more like "the real deal") than others, so it is important to be sure that you are using a generator that has been thoroughly tested. Fortunately for us, the developers of R and the R user community have thoroughly vetted the pseudorandom number generators in R, so you can be confident when you use these procedures that the results will be essentially random.

Probability distributions in R

R provides a very convenient way to calculate and plot many common statistical distributions and related functions and to generate random variables, so we will perform most of these tasks using built-in R functions. In a few cases we'll be able to see how to build functions from scratch (or nearly so), which may help you to generalize these principles. R code for all the examples is accumulated and saved in an R script file available on Blackboard.

Uniform Distribution

Density, distribution, and quantiles

Perhaps the simplest distribution is the continuous Uniform (or Rectangular) Distribution, which assumes that values of x over the support of f(x) occur with equal probability.
The parameters of the Uniform are simply the lower and upper bounds for x, so that x is equally likely to be anywhere inside the interval a <= x <= b, but cannot occur outside the interval (i.e., the support is entirely within the interval). Formally, the density for x is then

f(x; a, b) = 1/(b - a), a <= x <= b
           = 0, x < a or x > b

The distribution function is simply

F(x) = (x - a)/(b - a), a <= x <= b

The mean of the uniform is

mu = (a + b)/2

and is just the midpoint between the minimum (a) and maximum (b), the parameters of the distribution, while the variance is (b - a)^2/12. The density is easily implemented in R by the command

>dunif(x,a,b)

where a and b are the parameters and x is a value or list of values. So, for a simple example, we can compute and plot the density for Uniform(a=2, b=8) over the range x from 0 to 10.

#generate 1000 equally spaced values between 0 and 10
>x<-(0:1000)*0.01
#compute the uniform density for each value of x
>density<-dunif(x,2,8)

Alternatively, we could have written a short function in R to do the same thing:

#user-defined density
>my_dunif<-function(x,a,b){1/(b-a)*(x>=a & x<=b)}
>d<-my_dunif(x,2,8)

Either approach should produce a plot from

>plot(x,density)

like this
Notice what happens to the density when x<2 or x>8. Likewise, we can produce and plot a distribution for x by

>distrib<-punif(x,2,8)
>plot(x,distrib)

producing
Quantiles at specified probability levels are produced by the qunif() command, for example:

#quantiles at standard probability levels
>prob_levels<-c(0,.05,.25,.5,.75,.95,1)
>quants<-qunif(prob_levels,2,8)
>quants
[1] 2.0 2.3 3.5 5.0 6.5 7.7 8.0

Likelihood function

We are used to thinking of probability functions as describing the probability of an outcome x given the underlying model and parameter values (e.g., Uniform(2,8) above). We can turn this idea around, though, and ask the question of how likely a given
parameter value is, given the data we have and an underlying model. In this way of thinking, the data are fixed and the model parameter(s) is (are) variable. Mathematically, the calculation is the same but we are just varying different quantities. In the example we just considered, we can ask the question: how likely, given a=2 and a value of x=5, are integer values of the parameter b in the range 3 to 12?

>a<-2
>x<-5
>b<-3:12
>b
 [1]  3  4  5  6  7  8  9 10 11 12
>like<-dunif(x,a,b)
>like
[1] 0.0000000 0.0000000 0.3333333 0.2500000 0.2000000 0.1666667 0.1428571 0.1250000
[9] 0.1111111 0.1000000

We see that there is no likelihood that b is 3 or 4 (obviously ruled out by the value x=5) but that, given the single observation x=5, we can't rule out b being 6, 7, 8, or even higher. We will come back to the likelihood, and just how we use data to estimate parameters, in a later lab.

Random number generation

It is very easy to generate random uniform numbers in R using the runif() function. The first value for the function specifies the number of values you want, and the next two specify the minimum and maximum (a, b) parameters. By default a=0 and b=1, so runif(100), for example, would produce 100 uniform random numbers between 0 and 1, something that is often the starting point for simulating other, more complicated distributions. To take a specific case, suppose we want to generate 100 uniform random numbers between 5.5 and 10.4.

#generating n Uniform(a,b) random numbers
>n<-100
>a<-5.5
>b<-10.4
>x<-runif(n,a,b)

would produce a list of numbers (x) with these characteristics. You can calculate the sample mean from the simulated data

>mean(x)

and confirm that while this gets close to the distribution mean of (a+b)/2 it's not exact. Why is that?

Normal Distribution

Density, distribution, and quantiles

The Normal distribution is perhaps the most familiar statistical distribution. It is symmetric about the mean, with the familiar bell-shaped curve, and is used to model continuous, real values with theoretical range from negative to positive infinity. It is the limiting distribution of many test statistics and functions and is commonly used as an approximation, even when the data are thought to follow some other distribution, often after transformation to reduce skewness or discontinuities in the data. The normal density is determined by the parameters mu and sigma (sigma > 0) as

f(x; mu, sigma) = (1/(sigma sqrt(2 pi))) exp(-(x - mu)^2/(2 sigma^2)), -infinity < x < infinity

For example, the Normal density function over (-50, 50) for mu=5 and sigma=15 is produced by

#normal distribution
#generate equally spaced values between -50 and 50
>x<-(-5000:5000)*0.01
>mu<-5
>sigma<-15
>density<-dnorm(x,mu,sigma)
>plot(x,density)

We can produce a comparable distribution function by

#distribution function
>distrib<-pnorm(x,mu,sigma)
>plot(x,distrib)

producing
Specified probability quantiles are easily obtained from the qnorm() function, for example

>prob_levels<-c(0.001,.05,.25,.5,.75,.95,0.999)
>quants<-qnorm(prob_levels,mu,sigma)
>quants
[1] -41.35 -19.67  -5.12   5.00  15.12  29.67  51.35

Equivalently, we could say that we are 90% confident that x is between -19.67 and 29.67, with 10% probability (5% in each tail) outside this range. Notice that in the density dnorm() and distribution pnorm() functions, we passed the data as a list to the function, for scalar (1-dimensional) values of the parameters. Generally speaking, any of these function arguments can be lists, and it will make sense below (under the likelihood function) to reverse which ones are.

Likelihood function

Again, we can turn the model around and ask the question: how likely is a specific parameter value, given an observation (or a sample of observations)? To keep things simple for the normal, let's assume that we've observed the values x=5 and x=10, and assume that the standard deviation is fixed at 1. Assuming the normal model, how likely are various values of mu (say between 2 and 16)? We can compute a likelihood for each data value using dnorm(). First, let's make life simpler by introducing an R function that will generate a regular sequence at a specified interval, seq(). We use this to produce values for mu in the range of 2 to 16, at 0.5 spacing (finer if we wish), and then feed them into the likelihoods for x=5 and x=10.

#likelihood
>mu<-seq(2,16,0.5)
>like1<-dnorm(5,mu,1)
>like2<-dnorm(10,mu,1)

At this point, we can recognize that, assuming the observations of x are independent, we can multiply their likelihoods, or add them on the log scale, to get a joint (log) likelihood for the data.

>loglike<-dnorm(5,mu,1,log=TRUE)+dnorm(10,mu,1,log=TRUE)

Finally, we can examine our log likelihood, see which value is biggest, and see which value of mu produced that log likelihood. R has a nice built-in index function that will do this, which works like this:

>mu[loglike==max(loglike)]

which basically says "find the index of loglike associated with the biggest value and then tell me what the corresponding mu value is at that same index." In this example, the
result is 7.5, which (not coincidentally) is the arithmetic mean of 5 and 10. What we just did is a very crude (but sometimes effective) way to get the maximum likelihood estimate under a specified model, something we'll explore more in a later lab.

Random number generation

The easiest way to generate random Normal numbers is by using the built-in R function rnorm().

#Generate 100 random numbers for mu=5 and sigma=10
#method 1
n<-100
mu<-5
sigma<-10
x<-rnorm(n,mu,sigma)

The second (and just as valid) way is to first generate 100 random Uniform(0,1) numbers, and then treat these as probability values.

#method 2
#first generate 100 random uniform(0,1) deviates
U<-runif(100)
#now treat these as probability values in qnorm(), which functions as
#the inverse distribution function to return values of x given U.
x<-qnorm(U,mu,sigma)

You should be able to test these approaches out and convince yourself that they produce equivalent results.
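The two methods will not produce identical individual draws, but a quick numerical check (a sketch; the sample size of 10000 and the set.seed() call are our additions, not part of the lab) shows they target the same distribution by comparing sample moments:

```r
# Sketch: compare the built-in generator (method 1) with the
# inverse-distribution-function approach (method 2) for Normal(5, 10).
set.seed(1)                        # assumed seed, for reproducibility only
n <- 10000
mu <- 5
sigma <- 10
x1 <- rnorm(n, mu, sigma)          # method 1: built-in generator
x2 <- qnorm(runif(n), mu, sigma)   # method 2: uniform deviates through qnorm()
# both sample means should be near mu, and both sample sds near sigma
c(mean(x1), sd(x1))
c(mean(x2), sd(x2))
```

With n this large, both pairs of sample moments should agree with mu=5 and sigma=10 to within a few tenths.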
Poisson Distribution

The Poisson Distribution is a very important discrete distribution that models outcomes that take on non-negative integer values (0, 1, 2, ...). Examples include counts of animals, plants, or other objects, under the assumption that the process generating the counts in space is random, in the sense that counts are not clustered or separated except by chance.

Density, distribution, and quantiles

The Poisson Distribution is specified by the single parameter lambda, which is equal to both the population mean and variance. Thus, sometimes the ratio of the sample mean to the variance is used as evidence for (or against) Poisson assumptions, with values of this ratio near 1 taken as support for a Poisson count model. The density function of the Poisson is given by

f(x; lambda) = lambda^x e^(-lambda)/x!, x = 0, 1, 2, 3, ...

where e is the base of the natural logarithm, lambda > 0, and x! denotes the factorial function x(x-1)(x-2)...1. The distribution function is simply given by summation of the density over the discrete values of x:

F(x) = sum from k=0 to x of lambda^k e^(-lambda)/k!, x = 0, 1, 2, 3, ...

The Poisson density and distribution are easily implemented in R, for example for lambda=5:

#poisson distribution
#generate a sequence between 0 and 20
>x<-0:20
>lambda<-5
>density<-dpois(x,lambda)
>plot(x,density,"h")
#distribution function
>distrib<-ppois(x,lambda)
>plot(x,distrib,"h")

Likewise, standard quantiles are easily computed:

> #quantiles
> prob_levels<-c(0.001,.05,.25,.5,.75,.95,0.999)
> quants<-qpois(prob_levels,lambda)
> quants
[1]  0  2  3  5  6  9 13
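Since the Poisson mean and variance both equal lambda, the moment definitions given earlier can be checked directly from dpois() (a small sketch; truncating the support at x=100 is our choice and is harmless because f(x) is negligible there for lambda=5):

```r
# Sketch: compute E(x) and V(x) for a Poisson(5) directly from the density,
# using the moment definitions E(x) = sum(x f(x)) and V(x) = E[(x - mu)^2].
x <- 0:100                 # effectively the whole support when lambda = 5
f <- dpois(x, 5)
m <- sum(x * f)            # mean: should equal lambda = 5
v <- sum((x - m)^2 * f)    # variance: should also equal lambda = 5
c(m, v)
```

Both quantities come out equal to lambda (to numerical precision), illustrating the mean = variance property.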
Likelihood function

As with the other distributions we have considered, the likelihood function is formed from the density, but with the roles of parameters and data reversed: the latter are now fixed and the former vary over some specified range. We will cheat here, because we know that given the data lambda cannot be huge (say >15) and we know that it has to be >0, so we will only look at the likelihood in that range:

> #likelihood
> lambda<-seq(0.01,15.01,0.01)
> x<-8
> like<-dpois(x,lambda)
> loglike<-dpois(x,lambda,log=TRUE)
> plot(lambda,like)

We can also use a device similar to what we used for the normal to get a fairly good approximation of the maximum likelihood value for lambda

> #find the maximum over the list of lambdas
> lambda[loglike==max(loglike)]
[1] 8

which is not surprising: given the simplicity of this model (mean=variance) and the single observation x=8, we expect lambda to be around 8.

Random number generation

Given the discrete nature of the random variable in the Poisson distribution, there are several options for generating random variables, some of them quite simple, others not so simple but more flexible. We illustrate all 3 with the example of generating 100 random values from a Poisson(10) distribution.

Method 1 - built-in R function
First, we can of course rely on the standard, built-in R function rpois().

#Generate 100 random numbers for lambda=10
#method 1
n<-100
lambda<-10
x<-rpois(n,lambda)

Method 2 - Uniform deviate, quantile (inverse distribution) function

The second method also relies on an R function, the quantile or inverse distribution function, but performs the calculations by first computing 100 random Uniform(0,1) numbers and then transforming them with the inverse distribution (quantile) function.

#method 2
>n<-100
>lambda<-10
>U<-runif(n)
>x<-qpois(U,lambda)

Method 3 - Uniform deviate and interpolation from the distribution

In the third approach we have built a random number generator based directly on the cumulative discrete distribution F(x). As in Method 2, we generate 100 random Uniform variables and then use these and the definition of F(x) to obtain values of x (see Evans et al. 2000); this requires a user-defined function (pfun) to map the continuous values of U into discrete values of x:

#method 3
>n<-100
>lambda<-10
#generate F(x) from 0 to 50
>values<-0:50
>F<-ppois(values,lambda)
#define the function to do interpolation from F(x)
>pfun<-function(f,u,v)
> {
>  x<-array(0,dim<-c(length(u)))
>  for (j in 1:length(u))
>  {
>   for (i in 1:50)
>   {
>    if (f[i]<=u[j] & u[j]<f[i+1]) {x[j]<-v[i+1]}
>   }
>  }
>  x
> }
>#generate the values of U and x
>U<-runif(n)
>x<-pfun(F,U,values)

You can confirm that all 3 methods give similar results using large values for n, and that the last two methods give identical results for the same vector of Uniform numbers. However, Method 3 is much slower than Methods 1 or 2, which simply confirms that (usually) the built-in functions in R tend to be more computationally efficient than what beginning users can build. Building a function like this on your own, though, does illustrate that it can be done, and this can be handy in situations where no built-in function exists in R. For example, suppose we have a discrete distribution F(x) without a known mathematical form, but for which we can write out numerical values of F(x). A simple example of this is where we use the quantile function to summarize the data from a sample into an empirical distribution function and treat this as F(x). We can then use an approach such as Method 3 to simulate values from this distribution, even though we have no idea of its
mathematical form. We'll return to these ideas when we get more deeply into simulation in a later lab.

Bernoulli Distribution / Binomial Distribution

The Bernoulli Distribution is the natural distribution for modeling outcomes that can occur in 1 of 2 classes, such as success or failure, lived or died, heads or tails, male or female. The Bernoulli Distribution has a single parameter p that describes the probability of a success (however it is defined). The Binomial Distribution describes the number of successes that occur in n independent Bernoulli trials, each with the same probability of success p. The Binomial is thus based on summing Bernoulli outcomes, and has 2 parameters, n and p. Because these distributions are so closely related, we will consider them together below.

Density, distribution, and quantiles

The Bernoulli random variable x takes on 2 possible values, either 1 (indicating success) or 0 (failure), and has a single parameter, p, denoting the probability of success. The probability density function is written as

f(x; p) = p^x (1-p)^(1-x), x = 0, 1

which simplifies to f(0; p) = 1-p and f(1; p) = p. Note that we assume that there are only 2 possible outcomes, a success with probability p and a failure with probability 1-p, and that by definition the probabilities of success and failure add to 1. The mean of the Bernoulli distribution is E(x) = mu = p and the variance is Var(x) = p(1-p). The Binomial distribution is closely related, with the Binomial variable x defined as the number of successes in n independent Bernoulli trials, each with probability p of success. The Binomial thus has 2 parameters (n and p), though one of these (n) ordinarily is known and will not be estimated from data. The Binomial density function is

f(x; n, p) = [n!/(x!(n-x)!)] p^x (1-p)^(n-x), x = 0, ..., n

The Binomial distribution function is

F(x; n, p) = sum from k=0 to x of [n!/(k!(n-k)!)] p^k (1-p)^(n-k), x = 0, ..., n

The mean and variance are given by E(x) = mu = np and Var(x) = np(1-p). The Binomial density and distribution are easily implemented in R by the dbinom() and pbinom() functions (there is no separate Bernoulli function in R, the Bernoulli simply being a Binomial with a single trial, n=1), e.g., for a Bernoulli with p=0.4:

>#Bernoulli
>p<-0.4
>x<-0:1
>density<-dbinom(x,1,p)
>distrib<-pbinom(x,1,p)
>plot(x,density,"h",ylim=c(0,1))
>plot(x,distrib,"h",ylim=c(0,1))

This produces plots for the density and distribution of:
Taking a Binomial with p=0.4 and n=10 trials we have

>#Binomial
>n<-10
>p<-0.4
>x<-0:n
>density<-dbinom(x,n,p)
>distrib<-pbinom(x,n,p)
>plot(x,density,"h",ylim=c(0,1))
>plot(x,distrib,"h",ylim=c(0,1))

This produces plots
and
Quantiles at specified probability levels are easy to produce using the qbinom() function, e.g.,

> n<-10
> p<-0.4
> #quantiles
> prob_levels<-c(0.001,.05,.25,.5,.75,.95,0.999)
> quants<-qbinom(prob_levels,n,p)
> quants
[1] 0 2 3 4 5 7 9

Likelihood function

As with other distributions, we can reverse the roles of the data and the parameters and now treat the parameters as variables. In the case of either the Bernoulli or the Binomial there is generally only one parameter of interest, p, since we usually know how many trials there are. Take a case where we have 10 trials and we observe 4 successes. We can examine the likelihood over the range p = (0, 1) and try a brute-force maximization as before:

> #Likelihood
> p<-seq(0,1,0.001)
> n<-10
> x<-4
> like<-dbinom(x,n,p)
> loglike<-dbinom(x,n,p,log=TRUE)
> plot(p,like)
> plot(p,loglike)
> #find the maximum over the list of p values
> p[loglike==max(loglike)]
[1] 0.4
The results suggest that a value of p=0.4 maximizes the log likelihood. However, notice how flat the log likelihood function is, with many values of p larger and smaller than 0.4 returning similar values. This suggests that the data (4 successes in only 10 trials) provide relatively poor information about the parameter value. We will return to this point when we consider estimation in more depth later in the course.

Random number generation

Generating random numbers for the Bernoulli and Binomial is quite easy and can be accomplished with either a simple random uniform number generator or the built-in function rbinom(). The first approach computes a Bernoulli outcome by simply comparing a Uniform(0,1) random number to p: if U > p then x=0, otherwise x=1.

>#generating bernoulli random variables
>#specify p and n_reps
>p<-0.35
>n_reps<-100
>#method 1
>x<-(runif(n_reps)<p)*1
>#method 2
>x<-rbinom(n_reps,1,p)

Generating Binomial random variables can be accomplished by generating a series of n Bernoulli variables and then summing these.

#generating Binomial random variables
#specify n, p, and n_reps
n_reps<-100
n<-10
p<-0.35
#method 1
x<-array(0,c(n_reps))
for (i in 1:n_reps)
{
x[i]<-sum(runif(n)<=p)*1
}

Alternatively, you can directly use the rbinom() function in R

#method 2
x<-rbinom(n_reps,n,p)

The advantage of the first approach is that sometimes we will not want to assume that the parameter p remains constant, but instead allow it to vary from sample to sample (or even among Bernoulli trials within a sample). In such cases we can still simulate or model the data, but no longer under Binomial assumptions (which require p to be constant). We will look at an example of this in a bit.

Multinomial Distribution

The Multinomial Distribution is similar to the Binomial, but instead of modeling outcomes that occur in 2 ways ("success" or "failure"), the outcomes can occur in 3 or more ways. For example, suppose that an animal can die, and if it lives can either reproduce or not reproduce, and that these are the only possibilities. If we assign the probabilities to these events as p1 = probability of death, p2 = probability of living and reproducing, and p3 = probability of living and not reproducing, then by definition p1 + p2 + p3 = 1. Thus, if we know 2 of the 3 probabilities we know the 3rd by subtraction, e.g., p3 = 1 - p1 - p2. In general, if we have k categories of outcomes we have k-1 probabilities to describe them, with the last obtained by subtraction. Like the Binomial, the Multinomial is built from a series of n independent trials, each with the same probabilities describing the outcomes. The random variable x is now a vector, denoting the number of the n trials that fall into each category. For example, if we have
100 animals, the outcomes might be 25 die, 50 live and reproduce, and 25 live but do not reproduce. The Multinomial density is

f(x; n, p) = [n!/(x1! x2! ... xk!)] p1^x1 p2^x2 ... pk^xk

Because of its multivariate nature it is difficult to visualize the density, but density and distribution values are readily computed in R. For example, the density for a 3-category multinomial with 10 trials is calculated by

> #example
> n<-10
> p<-c(.25,.5,.25)
> x<-c(1,5,4)
> density<-dmultinom(x,n,p)
> density
[1] 0.03845215

Random Multinomial variables are generated by the rmultinom() function. For instance, to generate 20 instances of the above 10-trial trinomial we would use:

> #Random variables
> rmultinom(20,10,p)

which returns a 3 x 20 matrix of counts, with one column per instance and each column summing to 10.

Beta Distribution

The Beta Distribution is a continuous distribution that models a random variable x that can take on values in the range 0 <= x <= 1. The Beta is therefore appropriate for modeling the distribution of probability values, and in particular for modeling heterogeneity in probabilities. The parameters alpha and beta (or a and b) control the location and shape of the Beta Distribution. The Uniform(0,1) distribution is a special case of the Beta with alpha=beta=1. The Beta Distribution assumes additional importance as a natural (or conjugate) distribution for the Binomial, describing uncertainty in the Binomial parameter p before (prior) and after (posterior) data collection. Finally, the Beta and the Binomial can be combined hierarchically in a model (the Beta-Binomial) in which the random outcome is binary, but the process describing success is heterogeneous. We will return to both of these themes in later labs.

Density, distribution, and quantiles

The mathematical form of the Beta density is

f(x; alpha, beta) = [Gamma(alpha+beta)/(Gamma(alpha)Gamma(beta))] x^(alpha-1) (1-x)^(beta-1); 0 <= x <= 1; alpha, beta > 0

where Gamma(c) is the Gamma function

Gamma(c) = integral from 0 to infinity of exp(-u) u^(c-1) du

The kernel of the Beta (the part that involves the random variable x) is actually quite simple and, not coincidentally, resembles the Binomial density:

x^(alpha-1) (1-x)^(beta-1)

The mean of the Beta distribution is
E(x) = alpha/(alpha+beta)

and the variance is

V(x) = alpha*beta/[(alpha+beta)^2 (alpha+beta+1)]

We can use the relationship between the mean and variance and the parameters to solve for parameter estimates via the Method of Moments; more on this later. The Beta variable x is sometimes interpreted as modeling the probability of success based on previously observing alpha-1 successes and beta-1 failures. The Beta distribution function is obtained by integrating the density from 0 to x. Both the density and the distribution can easily be evaluated in R using the dbeta() and pbeta() functions. For example, for alpha=10 and beta=15 we can produce density and distribution values over the range of x.

#Beta distribution
>a<-10
>b<-15
>x<-seq(0,1,0.001)
>#density
>density<-dbeta(x,a,b)
>distrib<-pbeta(x,a,b)
>plot(x,density)
>plot(x,distrib)

This code will produce a plot of the density
and of the distribution
Notice that the density is centered near 0.4, but takes on a fairly wide range, indicating that Beta(10,15) would be appropriate for modeling a success probability that averages about 0.4 but exhibits heterogeneity. We will return to this theme later when we consider the Beta-Binomial distribution. Standard quantiles of the Beta are easily produced with the qbeta() function, for example

> #quantiles
> prob_levels<-c(0.001,.05,.25,.5,.75,.95,0.999)
> quants<-qbeta(prob_levels,a,b)
> quants

Likelihood function
As in our previous examples, we can consider the data (x) as fixed (observed) and treat the parameters as variables, producing a likelihood function. Note that with the Beta distribution, like the Normal, we have 2 parameters, so we have to find a combination of alpha and beta that maximizes the likelihood. The likelihood is easy to compute using R, for example if we observe x=0.4:

>#Likelihood
>a<-seq(5,25,0.001)
>b<-seq(10,30,0.001)
>x<-0.4
>like<-dbeta(x,a,b)
>loglike<-dbeta(x,a,b,log=TRUE)

It is a bit trickier to use brute-force methods to get the maximum, and instead we will use graphical methods to get an approximation. Because the parameter space is 2-dimensional, we need to display the likelihood in 3 dimensions. The scatterplot3d() function in R will produce a 3-D scatterplot:

>library(scatterplot3d)
>scatterplot3d(a,b,loglike)
The graph indicates that the log likelihood has a maximum at around alpha=20 and beta=25. Graphical methods become cumbersome (and inaccurate) for 2 or more parameters; we will consider more exact methods for maximizing the likelihood in a later chapter.

Random number generation

Random number generation is easily performed in R using the rbeta() function. For example, the following code will generate 100 Beta(10,15) random variables.

>#Random betas
>n<-100
>a<-10
>b<-15
>#Method 1
>x<-rbeta(n,a,b)

If alpha and beta are integers, the following code can also be used to generate Beta random variables from Gamma random variables, which in turn are generated by a log transformation of Uniform random variables:
>#Method 2 (if a and b are integers)
>x<-array(0,c(n))
>for (i in 1:n)
>{ #g1 and g2 are Gamma random variables
>g1<-sum(-log(runif(a)))
>g2<-sum(-log(runif(b)))
>x[i]<-g1/(g1+g2)
>}

Gamma

The Gamma distribution is a continuous distribution which has 2 parameters, b and c, and where x takes on nonnegative values (0 ≤ x < ∞). The Gamma is important in statistics because several other important distributions such as the Chi-square and Exponential are special cases. We also saw above how Gamma distributions can be used to generate Beta random variables. However, most of our interest in the Gamma will be because of its special relationship to the Poisson distribution, both for modeling heterogeneity in the Poisson parameter and as a conjugate distribution for the Poisson in Bayesian analysis.

Density, distribution, and quantiles

The density function of the Gamma is

f(x; b, c) = \frac{1}{b\,\Gamma(c)} (x/b)^{c-1} \exp(-x/b); \quad 0 \le x < \infty;\ b > 0,\ c > 0

The distribution function of the Gamma is given by integration from 0 to x:

F(x; b, c) = \int_0^x \frac{1}{b\,\Gamma(c)} (v/b)^{c-1} \exp(-v/b)\,dv; \quad 0 \le x < \infty;\ b > 0,\ c > 0

The mean and variance of the Gamma are related to the parameters in a straightforward way by E(x) = bc and
V(x) = b^2 c. As we will see, these relationships lead to easy (but not particularly optimal) parameter estimation by the Method of Moments. The density and distribution are easily generated in R using the dgamma() and pgamma() functions. For example, we can plot the density and distribution for Gamma(b=1,c=5) by

>#Gamma distribution
>c<-5
>b<-1
>x<-seq(0,10,0.001)
>#density - note R gamma functions use inverse scale (rate) = 1/b
>density<-dgamma(x,c,1/b)
>distrib<-pgamma(x,c,1/b)
>plot(x,density)
>plot(x,distrib)

This produces the density and the distribution over the range of 0 to 10 (plots not reproduced here).
Quantiles are produced by the qgamma() function. For the same parameter values we can produce several quantiles by

> #quantiles
> prob_levels<-c(0.001,.05,.25,.5,.75,.95,0.999)
> quants<-qgamma(prob_levels,c,1/b)
> quants
[1]

This indicates, for example, that the median (0.5 quantile) of Gamma(1,5) is around 4.7, and that 99.9% of the data can be expected to lie below the largest quantile shown.

Likelihood function

As with other distributions, we can form the likelihood by considering the data (x) as fixed and allowing the parameter values to vary. For example, suppose we observe x=5; we can plot the log likelihood versus values of b and c

>#likelihood
>b<-seq(0.01,1,0.001)
>c<-b*10
>x<-5
>like<-dgamma(x,c,1/b)
>loglike<-dgamma(x,c,1/b,log=TRUE)
>library(scatterplot3d)
>scatterplot3d(c,b,loglike)

By eyeballing this graphic, we can see that values of c around 6-7 and b around 0.6-0.7 appear to maximize the log likelihood.

Random number generation

We present 2 methods for producing random numbers from the Gamma distribution. The easiest is the built-in rgamma() function. For example, to generate 1000 Gamma(1,5) random variables:

>#random number generation
>#method 1
>n<-1000
>c<-5
>b<-1
>x<-rgamma(n,c,1/b)

Gamma variables can be generated directly from Uniform(0,1) random variables via a log transformation (we used this approach already for Beta random variables), if the parameter c has an integer value:

>#method 2 - if c is integer
>n<-1000
>c<-5
>b<-1
>x<-array(0,c(n))
>for (i in 1:n)
>{
>x[i]<-b*sum(-log(runif(c)))
>}

Estimation methods

Fundamentally, all estimation methods are based on considering the sample data x as known, and then using the statistical model to derive values of the parameters based on the data. We will consider 2 approaches: the Method of Moments and Maximum Likelihood, with most emphasis on the second of these.

Method of Moments

The Method of Moments is very simple and can provide reasonable estimates of parameters in some situations. The basic steps are

- Determine the population moments (expected value, variance, etc.) as functions of the parameter(s)
- Set the population moments equal to the sample (data-based) moments
- Solve for the parameter(s) as functions of the data.
To take a very simple case, consider a Binomial experiment where we have 10 independent Bernoulli trials, we observe 6 successes, and we wish to estimate p, the probability of success (assumed homogeneous among trials). The population moment is

E(x) = np

Setting the population moment equal to the sample moment (in this case, simply x) provides

x = np

and solving for p provides

\hat{p} = x/n = 6/10 = 0.6

A somewhat more complicated example involves the Beta distribution and 2 moments: the mean and the variance. Recall that for the Beta the mean and variance are

E(x) = \frac{\alpha}{\alpha+\beta}

and

V(x) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}

Because there are 2 unknowns (α, β) and 2 equations, we should be able to solve for the parameters, and we can. First, we equate the population moments with the sample moments
\bar{x} = \frac{\alpha}{\alpha+\beta}

and

s^2 = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}

Then we solve for α and β:

\hat{\alpha} = \bar{x}\{[\bar{x}(1-\bar{x})]/s^2 - 1\}

and

\hat{\beta} = (1-\bar{x})\{[\bar{x}(1-\bar{x})]/s^2 - 1\}.

We have written a small R function to provide these calculations

> beta.mom<-function(mean,sd){
+ v<-sd**2
+ x<-mean
+ a<-x*(x*(1-x)/v-1)
+ b<-(1-x)*(x*(1-x)/v-1)
+ c(a,b)
+ }

For example, if we have a sample mean of 0.5 and SD of 0.1 in our data, the program provides

> beta.mom(.5,.1)
[1] 12 12
or estimates of \hat{\alpha} = 12, \hat{\beta} = 12. We can confirm that these correspond to the moments by plugging them into the population moment formulas

> beta.stats<-function(a,b){
+ x<-a/(a+b)
+ v<-a*b/((a+b)^2*(a+b+1))
+ c(x,sqrt(v))
+ }
> beta.stats(12,12)
[1] 0.5 0.1

which returns the correct mean and SD. Unfortunately, the Method of Moments can produce bizarre results. For example

> beta.mom(.1,.4)
[1] -0.04375 -0.39375

However, both parameters of the Beta must be positive numbers, so the Method of Moments in this case does not work. The Beta Method of Moments behaves well in many cases, but can easily produce inadmissible values for the parameters, as just illustrated. The Beta example illustrates one drawback of the Method of Moments, which is that it sometimes can produce nonsensical results (outside the admissible parameter space). The method also does not necessarily provide a way to assess parameter uncertainty (variances, confidence intervals). Finally, the Method of Moments does not share some of the desirable properties of the next method, such as sufficiency, minimum variance, and asymptotic normality. For this reason, most practitioners use the Method of Moments only as a method for quick approximation, if at all.
For completeness, here's a function for estimating the Gamma parameters using the Method of Moments. Remember that this is quick and dirty and could give negative (incorrect) values.

> gamma.mom<-function(mu,sd){
+ v<-sd**2
+ c=v/mu      #this is the scale b
+ b=(mu/sd)^2 #this is the shape c
+ c(b,c)
+ }
> ## gamma MOM using mean 7 and sd of 11
> theta<-gamma.mom(7,11)
> ## again take note of use of inverse scale or rate
> ## let's see how close we were
> mean(x<-rgamma(10000,theta[1],1/theta[2]))
> sd(x)

(Note that the internal names in gamma.mom are swapped relative to the Gamma parameterization: the first value returned is the shape c and the second is the scale b, which matches how theta is used in the rgamma() call. The simulated mean and SD should come out close to 7 and 11.)

Maximum Likelihood

Maximum likelihood methods have several advantages not necessarily shared by other approaches, and therefore are favored in much of statistics. Generally speaking, maximum likelihood estimators (MLEs)

- Are asymptotically (i.e., with large samples) unbiased
- Are asymptotically Normally distributed
- Have minimum variance (i.e., variance smaller than that of any other estimator)
- Provide variance estimates directly as part of estimation

The basic idea of MLE is simple: given the data, we consider the parameter(s) to be unknown variables; the density function now behaves instead as a likelihood function. We then solve for the parameter values that maximize the likelihood function, given the data values. There are several ways to do this:
- By graphing the likelihood function against candidate parameter values
- By brute force searching over the parameter space
- By exact solution using The Calculus
- By numerical optimization methods.

We can illustrate all these approaches by taking a simple case involving the Binomial distribution. Suppose we conduct 100 Bernoulli trials and observe 40 successes. For example, the 100 trials could be 100 nests that we have discovered and have followed from initiation to success (fledging) or failure. Because we know the number of trials (n=100) we will focus on estimating the probability of success. The statistical model is

f(x; n, p) = \binom{n}{x} p^x (1-p)^{n-x}

However, we now know that n=100 and x=40, so we will recast this as a likelihood function

L(p; x=40, n=100) = \binom{100}{40} p^{40} (1-p)^{60}

Now the task is to find a value for p that maximizes this function. Usually, it will be more convenient to work with the natural logarithm of the likelihood function. Because the logarithmic transformation is monotonic, if we find the value of p that maximizes log(L(p)) we've also found the value that maximizes L(p). For this example the log of the likelihood is

\ln L(p; x=40, n=100) = \ln \binom{100}{40} + 40 \ln p + 60 \ln(1-p)

or in general (for any integers n and x ≤ n)

\ln L(p; x, n) = \ln \binom{n}{x} + x \ln p + (n-x) \ln(1-p).

As noted, there are several ways we can go about finding the maximum of this function, which we visited briefly in the previous lab. The first method is based on graphing the likelihood and log likelihood. Rather than use the built-in dbinom() function, we have binomial_likelihood R script.r to graph the likelihood and log likelihood. We do this for
2 reasons: first, we want students to see explicitly what the likelihood function and its log look like, and second, we are going to do some mathematical manipulation in a minute that would not be easy using the built-in R function.

Graphical approach

When we plot L(p; x, n) vs. p we get a curve centered about a value of p ≈ 0.4. Similarly, the log likelihood seems to peak around 0.4. So, p = 0.4 is looking to be a good candidate for the MLE.
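For readers following along in R, the two curves just described can also be generated with the built-in dbinom() function. This sketch is ours, not the lab's binomial_likelihood script, and the grid endpoints are arbitrary choices made to avoid log(0):

```r
# Binomial likelihood and log likelihood for x = 40 successes in n = 100 trials
n <- 100
x <- 40
p <- seq(0.01, 0.99, 0.001)             # avoid p = 0 and p = 1, where log(L) = -Inf
like <- dbinom(x, n, p)                 # likelihood L(p)
loglike <- dbinom(x, n, p, log = TRUE)  # log likelihood ln L(p)
par(mfrow = c(1, 2))
plot(p, like, type = "l", ylab = "L(p)")
plot(p, loglike, type = "l", ylab = "ln L(p)")
# the grid value maximizing both curves
p[which.max(loglike)]
```

Both curves peak at the same place, p = 0.4, illustrating the point above about the monotonic log transformation.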
Brute force

As we saw earlier, we can fairly easily find the maximum by brute force if 1) we have a single parameter (p in this case) and 2) the parameter is constrained over a reasonable range (0 to 1 here). Using our explicit code and the list-maximize trick, we get the following

> #Brute force
> #Likelihood
> p<-seq(0,1,0.001)
> n<-100
> x<-40
> binomial_like<-function(x,n,p_){
+ like=log(choose(n,x))+x*log(p_)+(n-x)*log(1-p_) #choose function evaluates n choose x
+ return(like)
+ }
> loglike<-binomial_like(x,n,p)
> #find the maximum over the list of p values
> p[loglike==max(loglike)]
[1] 0.4

Again, this confirms that p=0.4 appears to be viable as the MLE.

Exact approach using The Calculus

The Calculus provides an exact solution to the likelihood maximization under certain conditions. In particular, if the likelihood is continuous and twice differentiable, then a necessary condition for L(p*) to be a maximum is that the first derivative with respect to p is zero. If the second derivative is negative, this assures that L(p*) is a maximum and not a minimum. For the Binomial likelihood this is best approached by operating with the log likelihood. The first derivative of the log likelihood is

\frac{d \ln L(p; x=40, n=100)}{dp} = \frac{40}{p} - \frac{60}{1-p}

Setting this to zero yields
\frac{40}{p} = \frac{60}{1-p}

and with a little algebra

\hat{p} = 40/100 = 0.4

More generally,

\frac{d \ln L(p; x, n)}{dp} = \frac{x}{p} - \frac{n-x}{1-p} = 0

\hat{p} = x/n

We can confirm graphically that the derivative becomes zero at p=0.4.
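The derivative can also be checked numerically. This small sketch (ours, not part of the lab scripts) codes the score function g(p) = x/p - (n-x)/(1-p) and evaluates it around the candidate MLE:

```r
# Score function: first derivative of the Binomial log likelihood
score <- function(p, x = 40, n = 100) x / p - (n - x) / (1 - p)
score(0.3)   # positive: log likelihood still increasing here
score(0.5)   # negative: we are past the maximum
score(0.4)   # essentially zero at the MLE p-hat = x/n
```

The sign change from positive to negative across p = 0.4, with the score essentially zero at 0.4 itself, confirms the algebraic result.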
Direct solution of the log-likelihood equations by algebra is possible for many statistical models and their parameters. In addition to the Binomial parameter p, the Poisson parameter λ can be estimated in this way, as can the Normal parameters µ and σ, although the analysis becomes more complicated when 2 or more parameters are involved. For example, estimation of the Normal parameters µ and σ requires taking partial derivatives of the log-likelihood with respect to each parameter and setting each of these equations to zero. Solution of these equations for µ and σ then provides the estimates

\hat{\mu} = \sum_{i=1}^n x_i / n

\hat{\sigma}^2 = \sum_{i=1}^n (x_i - \bar{x})^2 / n

Astute students will notice that the second formula differs slightly from the usual sample variance

s^2 = \sum_{i=1}^n (x_i - \bar{x})^2 / (n-1)

The reason is that \hat{\sigma}^2 (the MLE) is slightly biased for small samples, and use of n-1 in the denominator reduces this bias.

Numerical methods

Explicit formulas for MLEs exist and are readily computed for many common statistical models. However, as models become more complex (more parameters and structure) it can be difficult or impossible to obtain algebraic solutions for the MLEs. Fortunately, high-speed computers are capable of solving the likelihood equations via numerical approaches. These approaches really are a special application of optimization approaches that we will consider in more detail later. They generally require the following:

- A mathematical expression (or computer code) for computing the log-likelihood for a given parameter value
- An initial guess for the parameter value (sometimes based on simple statistics from the data)
- A means of searching to see if improvements (higher log-likelihood values) can be made by changing the parameter value
- A stopping rule to determine that the parameter value has converged on the apparent MLE.

Gradient descent methods and Newton's Method are 2 of the more familiar (and simpler) optimization methods. Both require the ability to evaluate 1st and 2nd derivatives (partial derivatives if there is more than 1 parameter) with respect to each candidate parameter value (or combination of values). The derivatives can be either explicitly written (i.e., algebraic) or computed via approximations. We have Newtons method script.r that applies Newton's Method to solving for the MLE of the Binomial parameter p. The basic steps are simple:

1. Start with an initial value for p, p_0
2. Compute the gradient evaluated at the current value of p:
   g(p_i) = \frac{d \ln L(p_i)}{dp_i}
3. Compute the second derivative:
   g'(p_i) = \frac{d^2 \ln L(p_i)}{dp_i^2}
4. Update p by p_{i+1} = p_i - g(p_i)/g'(p_i)
5. Return to Step 2 and repeat until convergence

Convergence can be evaluated by examining how much (or little) p changes and/or by determining that g(p_i) is sufficiently close to zero (i.e., differs from zero by less than some specified small amount). In the example code (n=100, x=30), p is initialized at 0.1 and converges rapidly to 0.3.
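The steps above can be sketched in a few lines. This is our own minimal implementation, not the Newtons method script.r distributed with the lab, applied to the same Binomial example with n = 100 and x = 30; the starting value and tolerance are arbitrary choices:

```r
# Newton's method for the Binomial MLE of p (maximizing the log likelihood)
newton_binom <- function(x, n, p0 = 0.1, tol = 1e-8, maxit = 100) {
  p <- p0
  for (i in 1:maxit) {
    g      <- x / p - (n - x) / (1 - p)          # gradient: d lnL / dp
    gprime <- -x / p^2 - (n - x) / (1 - p)^2     # second derivative: d2 lnL / dp2
    p_new  <- p - g / gprime                     # Newton update
    if (abs(p_new - p) < tol) return(p_new)      # stopping rule: p barely changes
    p <- p_new
  }
  p
}
newton_binom(x = 30, n = 100)   # converges rapidly to 0.3
```

Because the Binomial log likelihood is smooth and concave in p, the iterations home in on p-hat = x/n in just a handful of updates.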
R also has a built-in optimization function optimize() that performs maximization or minimization of a specified function. The attached code applies this function to the above Binomial example.

MLE for higher-dimensioned problems

In principle, exactly the same approaches used for single-parameter models extend to models with multiple parameters. However, both graphical and brute force approaches become cumbersome beyond about 2 parameters (try visualizing a 4-dimensional graph!) and are generally eschewed in favor of either direct or numerical solution of a system of likelihood equations.

Example - Normal likelihood

We can take the example of the Normal likelihood and a sample x of n observations. Assuming that the data are independent, the joint likelihood is formed by the product of n likelihoods:

L(\mu, \sigma^2) = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)

and the log-likelihood is

\log L(\mu, \sigma^2) = -n \log(\sigma\sqrt{2\pi}) - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2

The partial derivatives of the log-likelihood with respect to the parameters simplify to

\frac{\partial \log L(\mu, \sigma^2)}{\partial \mu} = \frac{\sum_{i=1}^n x_i - n\mu}{\sigma^2} = 0
\frac{\partial \log L(\mu, \sigma^2)}{\partial \sigma} = -\frac{n}{\sigma} + \frac{\sum_{i=1}^n (x_i - \mu)^2}{\sigma^3} = 0

These equations can be solved directly by

\hat{\mu} = \sum_{i=1}^n x_i / n

\hat{\sigma}^2 = \sum_{i=1}^n (x_i - \hat{\mu})^2 / n

or by trial and error, gradient methods, Newton's Method, or other numerical methods. Application of Newton's Method and other derivative-based methods requires evaluation of the matrix of partial second derivatives

I = \begin{pmatrix} \dfrac{\partial^2 \ln L}{\partial \mu^2} & \dfrac{\partial^2 \ln L}{\partial \mu\,\partial \sigma} \\ \dfrac{\partial^2 \ln L}{\partial \sigma\,\partial \mu} & \dfrac{\partial^2 \ln L}{\partial \sigma^2} \end{pmatrix}

The matrix I is sometimes known as the Hessian or Information Matrix. The vector of first partial derivatives is

G = \begin{pmatrix} \partial \ln L / \partial \mu \\ \partial \ln L / \partial \sigma \end{pmatrix}

Solutions to the likelihood equations occur when G = 0; the inverse of I provides the estimated Variance-Covariance Matrix, with the variances on the diagonal and the covariances on the off-diagonal. This same approach applies to an MLE problem of any dimension, with the sizes of G and I determined by the number of parameters (k, so G is length k and I is k x k). The optim() procedure in R can be generalized to solve for the MLEs for more complicated likelihoods involving multiple parameters. In optimize.r we perform ML optimization for Binomial, Normal, and Beta examples. Note that optim() performs
by minimization, so to get maximum likelihood we compute the negative log likelihood and then find the parameter values that minimize the function. The parameter method="BFGS" specifies the use of a quasi-Newton method (similar to Newton's Method above) and hessian=TRUE specifies that the algorithm will produce the Hessian matrix, which we can then use to get the variance-covariance matrix.

R built-in functions

As you may have guessed, R users have created several packages that can be used to fit distributions to data; one example is the fitdistr() function in the MASS package.

> library(MASS)
> beta.data<- c(0.05,0.2,0.03,0.4,0.15)
> fitdistr(x=beta.data,"beta",start=list(shape1=1,shape2=1))
shape1 shape2
( ) ( )
Warning messages:
1: In densfun(x, parm[1], parm[2], ...) : NaNs produced
2: In densfun(x, parm[1], parm[2], ...) : NaNs produced
>
> gammer.dater<- c(3,.1,17,1,0.5,1.3,.01)
> fitdistr(x=gammer.dater,"gamma")
shape rate
( ) ( )

We will get more use out of these functions as the course progresses.

Writing simulation programs in R

We have already done a great deal of simulation with individual distributions in R; here we will focus on putting things together into more complicated analyses, and on efficiency.
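The lab exercises ask for simple simulation models of ecological processes such as survival. As a sketch of the general pattern of passing randomly generated values from one distribution to another (our own illustrative code: the cohort size, number of years, and Beta(10,15) parameters are arbitrary choices), annual survival probabilities can be drawn from a Beta and passed to a Binomial:

```r
# Simulate 5 years of survival for a cohort of 50 animals.
# Each year's survival probability is a Beta(10,15) draw (mean about 0.4),
# and the number of survivors is then a Binomial draw with that probability.
set.seed(544)                    # reproducibility; the seed value is arbitrary
years <- 5
N <- 50                          # initial cohort size
alive <- numeric(years + 1)
alive[1] <- N
for (t in 1:years) {
  phi <- rbeta(1, 10, 15)                    # random survival probability, year t
  alive[t + 1] <- rbinom(1, alive[t], phi)   # survivors carried into year t + 1
}
alive                            # cohort trajectory; can only stay level or decline
```

Because the survival probability itself is random, repeated runs of this simulation show more variation in the trajectory than a fixed-p Binomial would, which is exactly the heterogeneity the Beta-Binomial combination is meant to capture.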
More informationVarieties of Count Data
CHAPTER 1 Varieties of Count Data SOME POINTS OF DISCUSSION What are counts? What are count data? What is a linear statistical model? What is the relationship between a probability distribution function
More informationDS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling
DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling Due: Tuesday, May 10, 2016, at 6pm (Submit via NYU Classes) Instructions: Your answers to the questions below, including
More informationModel Estimation Example
Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions
More informationProbability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur
Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur Lecture No. # 33 Probability Models using Gamma and Extreme Value
More informationPractical Algebra. A Step-by-step Approach. Brought to you by Softmath, producers of Algebrator Software
Practical Algebra A Step-by-step Approach Brought to you by Softmath, producers of Algebrator Software 2 Algebra e-book Table of Contents Chapter 1 Algebraic expressions 5 1 Collecting... like terms 5
More informationIntroduction to Maximum Likelihood Estimation
Introduction to Maximum Likelihood Estimation Eric Zivot July 26, 2012 The Likelihood Function Let 1 be an iid sample with pdf ( ; ) where is a ( 1) vector of parameters that characterize ( ; ) Example:
More informationEstimation of Quantiles
9 Estimation of Quantiles The notion of quantiles was introduced in Section 3.2: recall that a quantile x α for an r.v. X is a constant such that P(X x α )=1 α. (9.1) In this chapter we examine quantiles
More informationSubject CS1 Actuarial Statistics 1 Core Principles
Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and
More informationHypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006
Hypothesis Testing Part I James J. Heckman University of Chicago Econ 312 This draft, April 20, 2006 1 1 A Brief Review of Hypothesis Testing and Its Uses values and pure significance tests (R.A. Fisher)
More informationParameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn!
Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn! Questions?! C. Porciani! Estimation & forecasting! 2! Cosmological parameters! A branch of modern cosmological research focuses
More informationJust Enough Likelihood
Just Enough Likelihood Alan R. Rogers September 2, 2013 1. Introduction Statisticians have developed several methods for comparing hypotheses and for estimating parameters from data. Of these, the method
More informationHuman-Oriented Robotics. Probability Refresher. Kai Arras Social Robotics Lab, University of Freiburg Winter term 2014/2015
Probability Refresher Kai Arras, University of Freiburg Winter term 2014/2015 Probability Refresher Introduction to Probability Random variables Joint distribution Marginalization Conditional probability
More informationSome general observations.
Modeling and analyzing data from computer experiments. Some general observations. 1. For simplicity, I assume that all factors (inputs) x1, x2,, xd are quantitative. 2. Because the code always produces
More informationWeek 1 Quantitative Analysis of Financial Markets Distributions A
Week 1 Quantitative Analysis of Financial Markets Distributions A Christopher Ting http://www.mysmu.edu/faculty/christophert/ Christopher Ting : christopherting@smu.edu.sg : 6828 0364 : LKCSB 5036 October
More informationCPSC 340: Machine Learning and Data Mining
CPSC 340: Machine Learning and Data Mining MLE and MAP Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. 1 Admin Assignment 4: Due tonight. Assignment 5: Will be released
More informationMixture distributions in Exams MLC/3L and C/4
Making sense of... Mixture distributions in Exams MLC/3L and C/4 James W. Daniel Jim Daniel s Actuarial Seminars www.actuarialseminars.com February 1, 2012 c Copyright 2012 by James W. Daniel; reproduction
More informationf(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain
0.1. INTRODUCTION 1 0.1 Introduction R. A. Fisher, a pioneer in the development of mathematical statistics, introduced a measure of the amount of information contained in an observaton from f(x θ). Fisher
More information3.4 Complex Zeros and the Fundamental Theorem of Algebra
86 Polynomial Functions 3.4 Complex Zeros and the Fundamental Theorem of Algebra In Section 3.3, we were focused on finding the real zeros of a polynomial function. In this section, we expand our horizons
More informationFourier and Stats / Astro Stats and Measurement : Stats Notes
Fourier and Stats / Astro Stats and Measurement : Stats Notes Andy Lawrence, University of Edinburgh Autumn 2013 1 Probabilities, distributions, and errors Laplace once said Probability theory is nothing
More informationIntroduction to Machine Learning. Lecture 2
Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for
More informationTopic 17: Simple Hypotheses
Topic 17: November, 2011 1 Overview and Terminology Statistical hypothesis testing is designed to address the question: Do the data provide sufficient evidence to conclude that we must depart from our
More informationInferring from data. Theory of estimators
Inferring from data Theory of estimators 1 Estimators Estimator is any function of the data e(x) used to provide an estimate ( a measurement ) of an unknown parameter. Because estimators are functions
More informationMaximum Likelihood Estimation
Maximum Likelihood Estimation Guy Lebanon February 19, 2011 Maximum likelihood estimation is the most popular general purpose method for obtaining estimating a distribution from a finite sample. It was
More informationEPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7
Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review
More informationBayesian Estimation An Informal Introduction
Mary Parker, Bayesian Estimation An Informal Introduction page 1 of 8 Bayesian Estimation An Informal Introduction Example: I take a coin out of my pocket and I want to estimate the probability of heads
More informationLikelihood and Bayesian Inference for Proportions
Likelihood and Bayesian Inference for Proportions September 18, 2007 Readings Chapter 5 HH Likelihood and Bayesian Inferencefor Proportions p. 1/24 Giardia In a New Zealand research program on human health
More informationQuantitative Understanding in Biology 1.7 Bayesian Methods
Quantitative Understanding in Biology 1.7 Bayesian Methods Jason Banfelder October 25th, 2018 1 Introduction So far, most of the methods we ve looked at fall under the heading of classical, or frequentist
More informationStatistical Models. David M. Blei Columbia University. October 14, 2014
Statistical Models David M. Blei Columbia University October 14, 2014 We have discussed graphical models. Graphical models are a formalism for representing families of probability distributions. They are
More informationMath 123, Week 2: Matrix Operations, Inverses
Math 23, Week 2: Matrix Operations, Inverses Section : Matrices We have introduced ourselves to the grid-like coefficient matrix when performing Gaussian elimination We now formally define general matrices
More informationProbability theory and inference statistics! Dr. Paola Grosso! SNE research group!! (preferred!)!!
Probability theory and inference statistics Dr. Paola Grosso SNE research group p.grosso@uva.nl paola.grosso@os3.nl (preferred) Roadmap Lecture 1: Monday Sep. 22nd Collecting data Presenting data Descriptive
More informationProbability. Table of contents
Probability Table of contents 1. Important definitions 2. Distributions 3. Discrete distributions 4. Continuous distributions 5. The Normal distribution 6. Multivariate random variables 7. Other continuous
More informationSTA 2201/442 Assignment 2
STA 2201/442 Assignment 2 1. This is about how to simulate from a continuous univariate distribution. Let the random variable X have a continuous distribution with density f X (x) and cumulative distribution
More informationLecture 1: Probability Fundamentals
Lecture 1: Probability Fundamentals IB Paper 7: Probability and Statistics Carl Edward Rasmussen Department of Engineering, University of Cambridge January 22nd, 2008 Rasmussen (CUED) Lecture 1: Probability
More informationDiscrete probability distributions
Discrete probability s BSAD 30 Dave Novak Fall 08 Source: Anderson et al., 05 Quantitative Methods for Business th edition some slides are directly from J. Loucks 03 Cengage Learning Covered so far Chapter
More informationExpectation, Variance and Standard Deviation for Continuous Random Variables Class 6, Jeremy Orloff and Jonathan Bloom
Expectation, Variance and Standard Deviation for Continuous Random Variables Class 6, 8.5 Jeremy Orloff and Jonathan Bloom Learning Goals. Be able to compute and interpret expectation, variance, and standard
More informationCommon ontinuous random variables
Common ontinuous random variables CE 311S Earlier, we saw a number of distribution families Binomial Negative binomial Hypergeometric Poisson These were useful because they represented common situations:
More informationCommunication Engineering Prof. Surendra Prasad Department of Electrical Engineering Indian Institute of Technology, Delhi
Communication Engineering Prof. Surendra Prasad Department of Electrical Engineering Indian Institute of Technology, Delhi Lecture - 41 Pulse Code Modulation (PCM) So, if you remember we have been talking
More informationHOMEWORK #4: LOGISTIC REGRESSION
HOMEWORK #4: LOGISTIC REGRESSION Probabilistic Learning: Theory and Algorithms CS 274A, Winter 2019 Due: 11am Monday, February 25th, 2019 Submit scan of plots/written responses to Gradebook; submit your
More information