
UNIVERZA NA PRIMORSKEM
FAKULTETA ZA MATEMATIKO, NARAVOSLOVJE IN INFORMACIJSKE TEHNOLOGIJE

Zaključna naloga (Final project paper)

O neeksaktnosti eksaktnega binomskega intervala zaupanja
(On the inexactness of the exact binomial confidence interval)

Ime in priimek: Anes Valentić
Študijski program: Matematika
Mentor: doc. dr. Rok Blagus

Koper, avgust 2017

Ključna dokumentacijska informacija

Ime in PRIIMEK: Anes VALENTIĆ
Naslov zaključne naloge: O neeksaktnosti eksaktnega binomskega intervala zaupanja
Kraj: Koper
Leto: 2017
Število listov: 38, Število slik: 8, Število tabel: 3
Število prilog: 2, Število strani prilog: 3
Število referenc: 14
Mentor: doc. dr. Rok Blagus
Ključne besede: eksaktni interval, interval zaupanja, verjetnost pokritja, binomska porazdelitev, simulacije, standardni interval
Math. Subj. Class. (2010): 62F03, 62G10, 62H17

Izvleček: Ena izmed najpogosteje uporabljenih metod za pridobitev intervalne ocene binomskega parametra je eksaktni binomski interval zaupanja. Kot pravi njegovo ime, naj bi bil eksakten, torej naj bi imel dejansko verjetnost pokritja enako nominalni verjetnosti pokritja, žal pa to ne drži. V praksi je eksaktni binomski interval zaupanja konservativen, kar pomeni, da je njegova verjetnost pokritja vedno večja ali enaka nominalni verjetnosti pokritja. Zaradi prevelike verjetnosti pokritja so dobljeni intervali zaupanja širši, kot bi morali biti pri dani nominalni stopnji. Večina alternativ eksaktnemu binomskemu intervalu zaupanja, kot sta Waldov interval ali Wilsonov interval, ni veliko boljša. Ker gre za približne intervale zaupanja, je njihova dejanska verjetnost pokritja večinoma daleč od nominalne, razen pri zelo velikih vzorcih.

Key words documentation

Name and SURNAME: Anes VALENTIĆ
Title of final project paper: On the inexactness of the exact binomial confidence interval
Place: Koper
Year: 2017
Number of pages: 38, Number of figures: 8, Number of tables: 3
Number of appendices: 2, Number of appendix pages: 3
Number of references: 14
Mentor: Assist. Prof. Rok Blagus, PhD
Keywords: exact interval, confidence interval, coverage probability, binomial distribution, simulations, standard interval
Math. Subj. Class. (2010): 62F03, 62G10, 62H17

Abstract: One of the most commonly used methods for obtaining an interval estimate of a binomial parameter is the exact binomial confidence interval. As its name suggests, it is supposed to be exact, that is, to have an actual coverage probability equal to the nominal coverage probability, but unfortunately this is not the case. In practice the exact binomial confidence interval is conservative, meaning that its coverage probability is always greater than or equal to the nominal coverage probability. As a consequence of a too high coverage probability, the obtained confidence intervals are wider than they should be at the given nominal level. Most of the alternatives to the binomial exact confidence interval, like Wald's interval or Wilson's score interval, are not much better. Being approximate confidence intervals, their actual coverage probability is mostly far from the nominal, except for very large samples.

Acknowledgement

I would like to express my deepest gratitude to my advisor, Assist. Prof. Rok Blagus, PhD, for his support and guidance throughout the research. His continued support led me in the right direction. I would like to thank the Faculty of Mathematics, Natural Sciences and Information Technologies, and the technical staff of the Faculty, for their help with all sorts of administrative procedures. Also, I am deeply grateful for the scholarship that the University of Primorska provided me with during my studies. Finally, I would like to thank my parents for their unconditional love and support, and Marija for the support provided.

Contents

1 Introduction
2 Preliminaries in Probability and Statistics
  2.1 The Binomial Distribution and Its Behavior
  2.2 Confidence Intervals
3 The Exact Binomial Confidence Intervals
  3.1 Calculating the bounds with the help of the F distribution
4 Approximate Binomial Confidence Intervals
  4.1 The standard interval
  4.2 Wilson Score Interval
5 On the inexactness of the Clopper-Pearson interval
  5.1 Theoretical background
  5.2 Exhaustive Simulation Study
  5.3 Extensive simulation study
6 On the inexactness of the Standard interval
  6.1 Coverage probability of Wilson's interval
7 Conclusion
8 Povzetek naloge v slovenskem jeziku
9 Bibliography

List of Tables

1 Theoretical CPs and simulation results for n = 10 and α = 0.05
2 Theoretical CPs and simulation results for n = 20 and α = 0.05
3 Examples of the oscillating behavior

List of Figures

1 Sample size n = 50
2 Samples of sizes n = 100 (left) and n = 200 (right)
3 Samples of sizes n = 50 (left) and n = 100 (right)
4 Samples of sizes n = 200 (left) and n = 500 (right)
5 Parameter p fixed at 0.05 (left) and 0.1 (right)
6 Parameter p fixed at 0.5 (left) and 0.7 (right)
7 Coverage probabilities of Wilson's interval for the sample sizes 50 (left) and 100 (right)
8 Coverage probabilities of Wilson's interval for the sample sizes 200 (left) and 500 (right)

Appendices

A Representing the cdf of the binomial distribution with the help of an integral
B Confidence interval tables

List of Abbreviations

i.e.    that is
e.g.    for example
i.i.d.  independent and identically distributed
LHS     left-hand side
RHS     right-hand side

1 Introduction

In this thesis we revisit one of the most basic problems, which is also among the most important concepts in statistical practice, namely interval estimation of the probability of success in a binomial trial. When mentioning interval estimates of the binomial parameter $p$, two confidence intervals first come to mind. These are the exact binomial confidence interval, also known as the Clopper-Pearson confidence interval, and the standard confidence interval, known as Wald's confidence interval. Both intervals are advertised in many textbooks and have acquired nearly universal acceptance in practice.

The Clopper-Pearson confidence interval is widely assumed to be an exact confidence interval, that is, to have an actual coverage probability exactly equal to the nominal one. Unfortunately, in practice it turns out to be very conservative. The Clopper-Pearson coverage probability is always at least as large as the nominal, but in most cases, as we will show later, it is in fact strictly bigger than the nominal coverage probability. This difference between nominal and actual coverage probabilities is directly reflected in the width of the confidence interval, which is bad, since having a wider interval than intended means we are losing valuable information.

As a possible alternative, the standard interval is always introduced. The interval is obtained from Wald's large-sample test in the binomial case, and thus it is also known as Wald's interval. For this confidence interval it is widely recognized that the actual coverage probability is very bad when $n$ is small or $p$ is near the boundaries 0 and 1, or when both occur. But what is usually not presented is that the standard interval can also perform very badly when $n$ is considerably large and $p$ is not near the boundaries. In fact, we will later see that the coverage probability of the standard interval behaves chaotically as a function of $n$, and almost the same chaotic behavior is observed when looking at the coverage probability as a function of $p$.

Finally, one very interesting alternative to the exact and standard confidence intervals is presented, namely Wilson's confidence interval. We will later see that Wilson's confidence interval actually outperforms both the standard and the exact confidence interval. However, none of the intervals can be strictly recommended for practical usage, since all are inexact; but due to the lack of alternatives, and the fact that with a big enough sample size all three intervals become close to being exact, one can still use them.

2 Preliminaries in Probability and Statistics

It goes without saying that one of the most important and most well known distributions in probability is the normal distribution. This work will heavily depend on the normal distribution and its properties, thus we now give its formal definition.

Definition 2.1. The normal distribution with mean $\mu$ and variance $\sigma^2$ is denoted by $N(\mu, \sigma^2)$ and has the probability density function
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$
The standard normal distribution is the normal distribution with mean 0 and variance 1.

One especially important result involving the normal distribution is the Central Limit Theorem. We will base our approximate confidence intervals explicitly on results obtained from the following version of the central limit theorem.

Theorem 2.2. Let $\{X_n : n \geq 0\}$ be a sequence of independent and identically distributed random variables with mean $\mu$, where $-\infty < \mu < \infty$, and variance $\sigma^2$, such that $0 < \sigma^2 < \infty$. Let us now define $S_n = \sum_{i=1}^{n} X_i$ and $\bar{X}_n = S_n / n$. Further we define $Z_n$ to be
$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma}.$$
Then $Z_n \xrightarrow{d} Z \sim N(0, 1)$ as $n \to \infty$.

Proof. We have omitted the proof here; it can be found in the reference [12].
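To see Theorem 2.2 at work for the Bernoulli variables used throughout this thesis, here is a minimal R sketch (our own illustration, not part of the original text; the choices n = 500, p = 0.3 and 2000 replications are arbitrary):

```r
# Standardized means of i.i.d. Bernoulli(p) draws: by Theorem 2.2 these
# Z_n values should be approximately standard normal.
set.seed(3)
n <- 500; p <- 0.3
z <- replicate(2000, {
  x <- rbinom(n, size = 1, prob = p)            # n i.i.d. Bernoulli trials
  sqrt(n) * (mean(x) - p) / sqrt(p * (1 - p))   # Z_n from the theorem
})
hist(z, breaks = 40, freq = FALSE, main = "Z_n for Bernoulli summands")
curve(dnorm(x), add = TRUE)                     # N(0, 1) density overlay
```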

Before we go on to explaining the binomial distribution, we will first define two modes of convergence and give the statement of Slutsky's theorem.

Definition 2.3. A sequence of random variables $\{X_n\}$ converges in probability towards the random variable $X$ if for all $\varepsilon > 0$ we have
$$\lim_{n\to\infty} P(|X_n - X| > \varepsilon) = 0.$$

Definition 2.4. A sequence $X_1, X_2, \ldots$ of real-valued random variables is said to converge in distribution to a random variable $X$ if $\lim_{n\to\infty} F_n(x) = F(x)$ for every $x \in \mathbb{R}$ at which $F$ is continuous. Here $F_n$ and $F$ are the cumulative distribution functions of the random variables $X_n$ and $X$, respectively [12].

Theorem 2.5 (Slutsky's theorem). Let $\{X_n\}$ and $\{Y_n\}$ be two sequences of random variables such that $X_n$ converges in distribution to a random variable $X$ and $Y_n$ converges in probability to a constant $c$. Then
$$X_n + Y_n \xrightarrow{d} X + c, \qquad X_n Y_n \xrightarrow{d} cX, \qquad X_n / Y_n \xrightarrow{d} X/c \ \text{ provided that } c \neq 0,$$
where $\xrightarrow{d}$ denotes convergence in distribution.

Proof. We have omitted the proof here; it can be found in the reference [4].

2.1 The Binomial Distribution and Its Behavior

The Binomial distribution is based on the idea of a succession of Bernoulli trials. A Bernoulli trial is an experiment with exactly two possible outcomes. On the principle of Bernoulli trials arises the Bernoulli distribution [10]. A random variable $X$ is said to have the Bernoulli distribution with parameter $p$, denoted $X \sim \mathrm{Bernoulli}(p)$, if
$$X = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1-p \end{cases}, \qquad 0 \leq p \leq 1.$$

The mean and variance of a Bernoulli random variable with parameter $p$ are easily seen to be
$$E(X) = 1 \cdot p + 0 \cdot (1-p) = p, \qquad \mathrm{var}(X) = (1-p)^2\, p + (0-p)^2 (1-p) = p(1-p).$$
Many events occurring in nature can be modeled as a sequence of Bernoulli trials, going from the simplest example of repeated tossing of coins, over election polls including two candidates, to incidences of a disease.

Let us have $n$ identical Bernoulli trials performed independently, where we define the events
$$A_i = \{X = 1 \text{ on the } i\text{-th trial}\}, \qquad i = 1, 2, \ldots, n.$$
We can further assume that the events $A_1, \ldots, A_n$ are a collection of independent events, as they reflect the outcomes of independent Bernoulli trials. Using the independence of the events $A_i$, it becomes easy to derive the distribution of the total number of successes in $n$ trials. We define the random variable $Y$ by
$$Y = \text{total number of successes in } n \text{ trials.}$$
The event $\{Y = y\}$ for $0 \leq y \leq n$ will occur if and only if, out of the events $A_1, \ldots, A_n$, exactly $y$ of them occur and consequently $n - y$ do not occur. This means that the event $\{Y = y\}$ is equivalent to an outcome of the form
$$A_{i_1} \cap A_{i_2} \cap \ldots \cap A_{i_y} \cap A^C_{i_{y+1}} \cap \ldots \cap A^C_{i_n}, \tag{2.1}$$
where $I = \{i_1, \ldots, i_y\}$ is a subset of the set $\{1, \ldots, n\}$. Since the order of the index set $I$ is not important, we only need the number of distinct subsets of size $y$ of the set $\{1, \ldots, n\}$ to determine the number of possibilities for the set $I$. And since we know that the number of $k$-element subsets of an $n$-element set is $\binom{n}{k}$, we can conclude directly that the number of different index sets $I$ is $\binom{n}{y}$. The probability of the outcome $A_1 \cap \ldots \cap A_y \cap A^C_{y+1} \cap \ldots \cap A^C_n$ is
$$P\left(A_1 \cap \ldots \cap A_y \cap A^C_{y+1} \cap \ldots \cap A^C_n\right) = \underbrace{(p \cdots p)}_{y \text{ times}}\ \underbrace{(1-p) \cdots (1-p)}_{n-y \text{ times}} = p^y (1-p)^{n-y}. \tag{2.2}$$
Calculating this probability made use of the independence of the events $A_i$. Notice that in the calculation it is not important which index set $I$ is chosen, but only the size of the index set, meaning that all the outcomes of the form (2.1) will have the same

probability of occurring. Since there are $\binom{n}{y}$ possibilities to choose the index set $I$ of size $y$, which completely determines the outcome (2.1), we conclude that there are also $\binom{n}{y}$ outcomes of the form (2.1), all having probability of occurring $p^y (1-p)^{n-y}$. Putting this all together, we obtain the probability of the event $\{Y = y\}$ occurring to be
$$P(Y = y \mid n, p) = \binom{n}{y} p^y (1-p)^{n-y}, \qquad y = 0, \ldots, n. \tag{2.3}$$
Thus we come to the formal definition of the Binomial distribution:

Definition 2.6. In a sequence of $n$ identical, independent Bernoulli trials, each with success probability $p$, let the random variables $X_1, \ldots, X_n$ be defined by
$$X_i = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1-p. \end{cases}$$
The random variable $Y = \sum_{i=1}^{n} X_i$ has the Binomial distribution with parameters $n$ and $p$, denoted as $\mathrm{Binom}(n, p)$, with the probability mass function
$$P(Y = k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, \ldots, n.$$

Having formally defined the Binomial distribution, we will now look at some of its properties. It is easy to compute the expectation of a random variable having a Binomial distribution, since we have that
$$E[Y] = E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} p = np;$$
similarly we have for the variance, using the independence of the $X_i$'s, that
$$\mathrm{var}[Y] = \mathrm{var}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathrm{var}[X_i] = \sum_{i=1}^{n} p(1-p) = np(1-p).$$
Going from the definition of the moment generating function, we have that
$$M_Y(t) = E\left[e^{tY}\right] = E\left[e^{t\sum_{i=1}^{n} X_i}\right] = \prod_{i=1}^{n} E\left[e^{tX_i}\right] = \prod_{i=1}^{n} \left((1-p) + pe^t\right) = \left(1 - p + pe^t\right)^n, \tag{2.4}$$
which is the moment generating function of the Binomial distribution.
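These identities are easy to check empirically. A minimal R sketch (our own addition; the number of draws is an arbitrary choice):

```r
# Checking E[Y] = np and var[Y] = np(1 - p) for Y ~ Binom(n, p) by simulation.
set.seed(42)
n <- 20; p <- 0.3
y <- rbinom(1e5, size = n, prob = p)          # 100,000 draws of Y
c(sample_mean = mean(y), theory = n * p)
c(sample_var = var(y), theory = n * p * (1 - p))
```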

With all this said, it only remains to examine the asymptotic behavior of the Binomial distribution. Let us have a Binomially distributed random variable, $Y_n \sim \mathrm{Binom}(n, p)$, for which we already know that its expectation and variance are finite. On the other hand, $Y_n = \sum_{i=1}^{n} X_i$, where the $X_i$'s are i.i.d. Bernoulli random variables with parameter $p$. Define the random variable
$$Z_n = \frac{Y_n - nE(X_n)}{\sqrt{n\,\mathrm{var}(X_n)}} = \frac{Y_n - np}{\sqrt{np(1-p)}},$$
for which we will have, directly from the Central Limit theorem, that $Z_n \xrightarrow{d} Z \sim N(0, 1)$, which implies that $Y_n \xrightarrow{d} W \sim N(np, np(1-p))$.

2.2 Confidence Intervals

While attempting to estimate an unknown parameter $\theta$, one may take two different approaches. Firstly, one may consider methods which give functions of the sample values that, for any given sample, provide a unique estimate of the value of the parameter. But one should be aware, of course, that the estimate might differ from the parameter in any particular case, thus leaving a margin of uncertainty. To capture the extent of this uncertainty one may use the sample variance of the estimator. If one wants to remove the uncertainty up to some point, intuitively it can be said that it is probable that $\theta$ lies in the range $\hat{\theta} \pm \sqrt{\mathrm{var}(\hat{\theta})}$, where $\hat{\theta}$ is the unique estimate given by the specific method. One can go even further and claim that it is very probable that $\theta$ lies in the range $\hat{\theta} \pm 2\sqrt{\mathrm{var}(\hat{\theta})}$, and so on. In short, the first type of methods, estimating a parameter at a single point $\hat{\theta}$, have a margin of uncertainty. To avoid this uncertainty up to a point, we can try to locate the parameter $\theta$ in a range around $\hat{\theta}$, while still considering $\hat{\theta}$ the best estimate of $\theta$. This brings us to our second approach to the estimation of an unknown parameter. Instead of trying to give a unique estimate of $\theta$ for a specified sample, we consider the specification of a range in which $\theta$ lies. There are a few possible methods for interval estimation, but we will be focusing only on confidence intervals.

Before we go on to giving a formal definition of confidence intervals, we will first introduce some notation. Let us again denote the population parameter by $\theta$, whose value is unknown. We will define confidence intervals for values of $\theta$ given a confidence level of $100(1-\alpha)\%$, where $\alpha$ is between 0 and 1, and the sample size $n$. Confidence intervals may have an upper limit or a lower limit, or both. A $100(1-\alpha)\%$ upper confidence limit, denoted by $U$, is a value that, under repeated random samples of size $n$, may be expected to exceed $\theta$'s true value $100(1-\alpha)\%$ of the time. On the other hand, a $100(1-\alpha)\%$ lower confidence limit, denoted by $L$, is a value that, under repeated random samples of size $n$, may be expected to fall below $\theta$'s true value $100(1-\alpha)\%$ of the time. The traditional two-sided confidence intervals use both the lower and the upper limits so that each contains $\theta$'s true value $100(1-\alpha/2)\%$ of the time, and together they contain $\theta$'s true value $100(1-\alpha)\%$ of the time.

The limits $L$ and $U$ are derived from a sample statistic, often the sample estimate of $\theta$, and a sampling distribution that specifies the probability of getting each possible value that the sample statistic can take. This implies that $L$ and $U$ are also sample statistics, meaning that they vary from one sample to another. Having set up an intuitive approach to $L$ and $U$, we can go on with the formal definitions of the most important concepts. We first start with defining the interval estimate:

Definition 2.7. A pair of functions $L(X)$ and $U(X)$ on the sample space are called interval estimates if for every $X$ from the sample space the functions satisfy $L(X) \leq U(X)$. The interval obtained from these functions, $[L(X), U(X)]$, is called an interval estimator.

Now we need to define the coverage probability and the confidence level of an interval estimator.

Definition 2.8. For a given parameter $\theta$ having true value $\theta_0$, the coverage probability of the interval estimator $[L(X), U(X)]$, denoted by CP, is the probability that the random interval $[L(X), U(X)]$ covers the true value of $\theta$; in other words,
$$CP(\theta_0) = P\{L \leq \theta_0 \leq U \mid \theta = \theta_0\}.$$

Definition 2.9. The confidence level of an interval estimator $[L(X), U(X)]$ for a parameter $\theta$ is the infimum over its coverage probabilities.

After defining an interval estimator and the coverage probability, we can finally give a formal definition of a confidence interval.

Definition 2.10. The interval estimator $[L, U]$ is called a confidence interval for $\theta$ if its coverage probability
$$CP(\theta_0) = P\{L \leq \theta_0 \leq U \mid \theta = \theta_0\}$$
is the same for all parameter values $\theta_0$.

However, it is crucial to realize two things about confidence intervals. The first one is that confidence intervals are not always symmetric around the sample statistic. The second, and far more important, thing is how we interpret confidence intervals. Namely, except in very special cases, it is not correct to conclude that a particular observed $100(1-\alpha)\%$ confidence interval $[l, u]$ for a parameter $\theta$ has a $100(1-\alpha)\%$ probability of including the true value of $\theta$. Once one has calculated the values of the sample statistics $L$ and $U$ for a given sample, obtaining the values $l$ and $u$, the interval obtained is no longer a random variable and it can either include the true value of the parameter or not [2] [8].
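The repeated-sampling reading of this interpretation can be made concrete with a small simulation. The following R sketch is our own generic illustration (a normal-mean example using t.test, not an example from the thesis; all numbers are arbitrary choices): across many samples, roughly 95% of the computed intervals cover the true parameter, while any single realized interval either covers it or does not.

```r
# Repeated-sampling interpretation of a 95% confidence interval.
set.seed(7)
mu <- 10; n <- 30; reps <- 1000                # hypothetical true mean and sizes
hits <- replicate(reps, {
  x <- rnorm(n, mean = mu)                     # draw one sample
  ci <- t.test(x)$conf.int                     # its 95% t-interval for the mean
  ci[1] <= mu && mu <= ci[2]                   # does this interval cover mu?
})
mean(hits)                                     # close to 0.95 across samples
```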

3 The Exact Binomial Confidence Intervals

Let us consider a binomial random variable $X$ with parameters $n$ and $p$, that is, $X \sim \mathrm{Binom}(n, p)$. We already know that for $k \in \{0, \ldots, n\}$ the probability of $k$ successes is
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}.$$
Having a binomial random variable $X$, we can define its lower and upper binomial tail functions in the following way:

Definition 3.1. For $y \in \{0, \ldots, n\}$ and $X \sim \mathrm{Binom}(n, p)$, the lower binomial tail function $l(y, p)$ is defined as
$$l(y, p) = P(X \leq y) = \sum_{k=0}^{y} \binom{n}{k} p^k (1-p)^{n-k}; \tag{3.1}$$
similarly, the upper binomial tail function $u(y, p)$ is defined as
$$u(y, p) = P(X \geq y) = \sum_{k=y}^{n} \binom{n}{k} p^k (1-p)^{n-k}. \tag{3.2}$$

From the upper and lower binomial tail functions we can then define the exact binomial confidence interval. First, we are going to derive the binomial confidence interval in the traditional way. Namely, given the unknown parameter $p \in [0, 1]$, the parameter $n$, a number $\alpha \in (0, 1)$ and an observation $t \in \{0, \ldots, n\}$ from a binomial random variable $X \sim \mathrm{Binom}(n, p)$, we can describe a $100(1-\alpha)\%$ confidence interval $I(n, t) \subseteq [0, 1]$. The basic idea for defining this confidence interval is that an observation like $t$ should be highly unusual for a $p$ outside of $I(n, t)$. Formally speaking, this translates into
$$p \notin I(n, t) \iff P(X \geq t) \leq \frac{\alpha}{2} \ \text{ or } \ P(X \leq t) \leq \frac{\alpha}{2}.$$

One possible, but not very useful, strategy of choosing $I(n, t)$ is to take $I(n, t) = [0, 1]$, thus reducing $\alpha$ to zero, i.e. one would never be wrong. However, accepting a certain risk level and reporting a confidence interval with an $\alpha$ in $(0, 1)$, and thus accepting that we could be wrong with an expected frequency bounded above by $\alpha$, we might as well make $I(n, t)$ as narrow as possible, as long as we keep the expected frequency of being wrong bounded above by $\alpha$. But before we go on to formally defining the exact binomial confidence interval, we will need the following two propositions about the lower and upper binomial tail functions.

Proposition 3.2. Let $y \in \{0, \ldots, n-1\}$, and let $l(y, p)$ be the lower binomial tail function. Then we have that:

(i) $l(y, 0) = 1$ and $l(y, 1) = 0$;

(ii) for $0 < p < 1$, one has
$$\frac{\partial l(y, p)}{\partial p} = -n \binom{n-1}{y} p^y (1-p)^{n-1-y} < 0;$$

(iii) $l(y, p)$ is strictly decreasing for $p \in [0, 1]$.

Proof. (i) When $p = 0$, $l(y, p)$ reduces to its first term, which is $(1-p)^n = 1$, since all terms except the first one have a positive power of $p$ in them, because $y < n$. When $p = 1$, again because $y < n$, all terms of the sum on the right hand side of the definition of $l(y, p)$ have a positive power of $1-p$ in them; since $1-p = 0$, it follows that $l(y, 1) = 0$.

(ii) Let us denote by $l'(y, p)$ the partial derivative of $l(y, p)$ with respect to $p$, for the sake of easier calculations. When $y = 0$ we have that $l(y, p) = (1-p)^n$, and then
$$l'(y, p) = -n(1-p)^{n-1} = -n \binom{n-1}{0} p^0 (1-p)^{n-1-0} < 0 \quad \text{for all } p \in (0, 1).$$
Since $l(y, p)$ is continuous in $p$, it follows that $l(y, p)$ is strictly decreasing on $[0, 1]$. Let us now suppose that $y > 0$ (which necessarily means also that $n > 0$); then we have that

$$\begin{aligned}
l'(y, p) &= -n(1-p)^{n-1} + \sum_{k=1}^{y} \binom{n}{k} \left[ k p^{k-1} (1-p)^{n-k} - p^k (n-k)(1-p)^{n-k-1} \right] \\
&= -n(1-p)^{n-1} + n \sum_{k=1}^{y} \binom{n-1}{k-1} p^{k-1} (1-p)^{n-1-(k-1)} - n \sum_{k=1}^{y} \binom{n-1}{k} p^k (1-p)^{n-1-k} \\
&= -n(1-p)^{n-1} + n \sum_{k=0}^{y-1} \binom{n-1}{k} p^k (1-p)^{n-1-k} - n \sum_{k=1}^{y} \binom{n-1}{k} p^k (1-p)^{n-1-k} \\
&= -n(1-p)^{n-1} + n(1-p)^{n-1} - n \binom{n-1}{y} p^y (1-p)^{n-1-y} \\
&= -n \binom{n-1}{y} p^y (1-p)^{n-1-y}.
\end{aligned}$$
For $p \in (0, 1)$ we have that $l'(y, p) < 0$. And since $l(y, p)$ is continuous in $p$, we have that $l(y, p)$ is strictly decreasing on $[0, 1]$.

Proposition 3.3. Let $y \in \{1, \ldots, n\}$, and let $u(y, p)$ be the upper binomial tail function. Then we have that:

(i) $u(y, 0) = 0$ and $u(y, 1) = 1$;

(ii) for $0 < p < 1$, one has
$$\frac{\partial u(y, p)}{\partial p} = n \binom{n-1}{y-1} p^{y-1} (1-p)^{n-y} > 0;$$

(iii) $u(y, p)$ is strictly increasing for $p \in [0, 1]$.

Proof. When $y > 0$ we have, from the definitions of the lower and upper binomial tail functions, that
$$u(y, p) = 1 - l(y-1, p).$$
Also, when $y \in \{1, \ldots, n\}$ we have that $y - 1 \in \{0, \ldots, n-1\}$. Applying Proposition 3.2 to $l(y-1, p)$ gives directly the results needed for the completion of the proof.

To define the exact binomial confidence interval we will distinguish two cases: first when the observation $t$ is in the set $\{1, \ldots, n-1\}$, and then when $t$ is equal to 0 or $n$. Let us examine the first case, where $t \in \{1, \ldots, n-1\}$. For any value $t$ in this set, the function $l(t, p)$ decreases from $l(t, 0) = 1$

to $l(t, 1) = 0$ by Proposition 3.2; of course $l$ is also continuous as a function of $p$. So by the intermediate value theorem [11], there is a unique $p_U \in (0, 1)$ such that if $X_{p_U} \sim \mathrm{Binom}(n, p_U)$ then
$$P(X_{p_U} \leq t) = \frac{\alpha}{2},$$
and if $q \in (p_U, 1]$ and we have $X_q \sim \mathrm{Binom}(n, q)$, then we have that
$$P(X_q \leq t) < P(X_{p_U} \leq t) = \frac{\alpha}{2}. \tag{3.3}$$
Thus we reject any $q \in (p_U, 1]$.

Similarly, for any value $t$ in the set $\{1, \ldots, n-1\}$ and for any value of $p$, the function $u(t, p)$ increases from $u(t, 0) = 0$ to $u(t, 1) = 1$ by Proposition 3.3; of course $u$ is also continuous as a function of $p$. Again, by the intermediate value theorem, there is a unique $p_L \in (0, 1)$ such that if $X_{p_L} \sim \mathrm{Binom}(n, p_L)$ then
$$P(X_{p_L} \geq t) = \frac{\alpha}{2},$$
and if $q \in [0, p_L)$ and we have $X_q \sim \mathrm{Binom}(n, q)$, then we have that
$$P(X_q \geq t) < P(X_{p_L} \geq t) = \frac{\alpha}{2}. \tag{3.4}$$
Thus we reject any $q \in [0, p_L)$.

So now we can take $I(n, t) = (p_L, p_U)$ as the $100(1-\alpha)\%$ confidence interval, and of course we saw that in this case $I(n, t) \subseteq (0, 1)$. One has to note also that it holds that $p_L < p_U$; the proof of this will be given later in the text. This is the theoretical approach to finding the exact confidence interval, but of course one is not always able to obtain $p_L$ and $p_U$ precisely, so usually these values are numerically estimated from the equations
$$P(X_{p_L} \geq t) = \frac{\alpha}{2} \qquad \text{and} \qquad P(X_{p_U} \leq t) = \frac{\alpha}{2},$$
where $X_{p_L}$ is a random variable such that $X_{p_L} \sim \mathrm{Binom}(n, p_L)$ and similarly $X_{p_U}$ is a random variable such that $X_{p_U} \sim \mathrm{Binom}(n, p_U)$.

The cases of $t = 0$ and $t = n$ are handled similarly. For $t = 0$, the traditional choice is $I(n, t) = [0, p_U]$. For $t = n$ the choice is $I(n, t) = [p_L, 1]$.
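The numerical estimation just mentioned can be done with any one-dimensional root finder applied to the two tail equations. A minimal R sketch of ours (the function name exact_ci_numeric is hypothetical), using uniroot for an observation t in {1, ..., n-1}:

```r
# Numerically solving the tail equations P(X >= t) = alpha/2 (for p_L)
# and P(X <= t) = alpha/2 (for p_U), with X ~ Binom(n, p).
exact_ci_numeric <- function(t, n, alpha = 0.05) {
  # lower bound: p such that P(X >= t | p) = 1 - P(X <= t - 1 | p) = alpha/2
  pL <- uniroot(function(p) 1 - pbinom(t - 1, n, p) - alpha / 2,
                interval = c(1e-10, 1 - 1e-10))$root
  # upper bound: p such that P(X <= t | p) = alpha/2
  pU <- uniroot(function(p) pbinom(t, n, p) - alpha / 2,
                interval = c(1e-10, 1 - 1e-10))$root
  c(lower = pL, upper = pU)
}
exact_ci_numeric(t = 3, n = 10)   # e.g. 3 successes out of 10 trials
```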

Proposition 3.4. For $t \in \{1, \ldots, n-1\}$ and $\alpha \in (0, 1)$ we have that $p_L < p_U$.

Proof. We are going to prove this by contradiction: let us assume that $p_U \leq p_L$. Under this assumption we can choose some $p$ such that $p_U \leq p \leq p_L$. Since we have that $p \leq p_L$, it follows that
$$P(X_p \geq t) \leq P(X_{p_L} \geq t) = \frac{\alpha}{2},$$
where $X_p \sim \mathrm{Binom}(n, p)$ and $X_{p_L} \sim \mathrm{Binom}(n, p_L)$. On the other hand, since $p_U \leq p$, we have that
$$P(X_p \leq t) \leq P(X_{p_U} \leq t) = \frac{\alpha}{2},$$
where $X_p \sim \mathrm{Binom}(n, p)$ and $X_{p_U} \sim \mathrm{Binom}(n, p_U)$. Therefore, combining these two, we get
$$1 = P(0 \leq X_p \leq n) \leq P(X_p \leq t) + P(X_p \geq t) \leq 2 \cdot \frac{\alpha}{2} = \alpha.$$
So we must have that $\alpha \geq 1$, which contradicts the assumption that $\alpha \in (0, 1)$; thus we must have that $p_L < p_U$.

3.1 Calculating the bounds with the help of the F distribution

Before we show that the upper and lower limits of the exact binomial confidence interval can be calculated using the F distribution, we will first give a formal definition of the F distribution.

Definition 3.5. The F distribution with parameters $d_1$ and $d_2$, denoted as $X \sim F(d_1, d_2)$, is the distribution with probability density function
$$f(x; d_1, d_2) = \frac{1}{B\!\left(\frac{d_1}{2}, \frac{d_2}{2}\right)} \left(\frac{d_1}{d_2}\right)^{\frac{d_1}{2}} x^{\frac{d_1}{2} - 1} \left(1 + \frac{d_1}{d_2}\,x\right)^{-\frac{d_1 + d_2}{2}}$$
for real numbers $x \geq 0$, where $B$ denotes the Beta function. The cumulative distribution function of the F distribution is
$$F(x; d_1, d_2) = I_{\frac{d_1 x}{d_1 x + d_2}}\!\left(\frac{d_1}{2}, \frac{d_2}{2}\right),$$
where $I$ is the regularized incomplete beta function.

As seen from above, when we showed that the exact binomial confidence interval is well defined and that $p_L$ and $p_U$ always exist, the derivation of these two values can be quite complicated. We will now derive a faster way of calculating the bounds of the exact binomial confidence interval using the F distribution. The binomial cumulative distribution function can be rewritten as an integral in the following way:
$$\sum_{k=0}^{y} \binom{n}{k} p^k (1-p)^{n-k} = n \binom{n-1}{y} \int_p^1 w^y (1-w)^{n-y-1}\, dw. \tag{3.5}$$
The proof of this identity is given in Appendix A. Now, using the fact that the incomplete beta function is given by
$$B_p(y+1, n-y) = \int_0^p w^y (1-w)^{n-y-1}\, dw$$
and the complete beta function is
$$B(y+1, n-y) = \int_0^1 w^y (1-w)^{n-y-1}\, dw,$$
we can transform (3.5) to
$$\sum_{k=0}^{y} \binom{n}{k} p^k (1-p)^{n-k} = \frac{B(y+1, n-y) - B_p(y+1, n-y)}{B(y+1, n-y)}. \tag{3.6}$$

Now let $W$ be a random variable that has an F distribution with $(d_1, d_2)$ degrees of freedom. Then the random variable $G$ defined as $G = \frac{d_1 W}{d_2 + d_1 W}$ has the following cumulative distribution function [9]:
$$F_G(g) = P(G \leq g) = \frac{B_g(d_1/2, d_2/2)}{B(d_1/2, d_2/2)}, \qquad 0 < g < 1.$$
It follows from (3.6) that for $d_1 = 2(y+1)$ and $d_2 = 2(n-y)$
$$\sum_{k=0}^{y} \binom{n}{k} p^k (1-p)^{n-k} = 1 - P(G \leq p) = 1 - P\!\left(\frac{(y+1)W}{n - y + (y+1)W} \leq p\right) = 1 - P\!\left(W \leq \left(\frac{n-y}{y+1}\right)\left(\frac{p}{1-p}\right)\right). \tag{3.7}$$
Now, as we already know, $p_L$ satisfies the following equation for $y = 1, 2, \ldots, n-1$:
$$\sum_{k=y}^{n} \binom{n}{k} p_L^k (1-p_L)^{n-k} = \frac{\alpha}{2},$$
which is equivalent to
$$\sum_{k=0}^{y-1} \binom{n}{k} p_L^k (1-p_L)^{n-k} = 1 - \frac{\alpha}{2}. \tag{3.8}$$
Combining (3.7) (applied with $y - 1$ in place of $y$, so that $d_1 = 2y$ and $d_2 = 2(n-y+1)$) and (3.8), we obtain the following equation:
$$P\!\left(V \leq \left(\frac{n-y+1}{y}\right)\left(\frac{p_L}{1-p_L}\right)\right) = \frac{\alpha}{2},$$
where the random variable $V$ has the F distribution, i.e. $V \sim F(2y, 2(n-y+1))$. Now we can express $p_L$ with a percentile of the F distribution as
$$\left(\frac{n-y+1}{y}\right)\left(\frac{p_L}{1-p_L}\right) = F_{2y,\,2(n-y+1),\,1-\alpha/2},$$
from where we get that
$$p_L = \frac{1}{1 + \dfrac{n-y+1}{y\, F_{2y,\,2(n-y+1),\,1-\alpha/2}}}.$$
On the other hand, we know that $p_U$ for $y = 1, 2, \ldots, n-1$ satisfies
$$\sum_{k=0}^{y} \binom{n}{k} p_U^k (1-p_U)^{n-k} = \frac{\alpha}{2}.$$
Combining (3.7) and this equation, we obtain
$$P\!\left(V \leq \left(\frac{n-y}{y+1}\right)\left(\frac{p_U}{1-p_U}\right)\right) = 1 - \frac{\alpha}{2},$$

where the random variable $V$ has the F distribution, i.e. $V \sim F(2(y+1), 2(n-y))$. Now we can express $p_U$ with a percentile of the F distribution as
$$\left(\frac{n-y}{y+1}\right)\left(\frac{p_U}{1-p_U}\right) = F_{2(y+1),\,2(n-y),\,\alpha/2},$$
from where we get that
$$p_U = \frac{1}{1 + \dfrac{n-y}{(y+1)\, F_{2(y+1),\,2(n-y),\,\alpha/2}}}.$$
So, to conclude, the exact binomial confidence interval $[p_L, p_U]$ can be written as
$$[p_L, p_U] = \left[ \frac{1}{1 + \dfrac{n-y+1}{y\, F_{2y,\,2(n-y+1),\,1-\alpha/2}}},\ \frac{1}{1 + \dfrac{n-y}{(y+1)\, F_{2(y+1),\,2(n-y),\,\alpha/2}}} \right].$$
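Note that $F_{d_1, d_2, \gamma}$ above denotes the critical value with upper-tail probability $\gamma$, whereas R's qf returns lower-tail quantiles; translating the closed form accordingly gives the following sketch of ours (not code from the thesis; exact_ci_f is a hypothetical name):

```r
# Clopper-Pearson bounds via F quantiles, following the closed form above,
# for y in {1, ..., n-1}.
exact_ci_f <- function(y, n, alpha = 0.05) {
  fL <- qf(alpha / 2, 2 * y, 2 * (n - y + 1))        # lower-tail alpha/2 quantile
  fU <- qf(1 - alpha / 2, 2 * (y + 1), 2 * (n - y))  # lower-tail 1 - alpha/2 quantile
  pL <- 1 / (1 + (n - y + 1) / (y * fL))
  pU <- 1 / (1 + (n - y) / ((y + 1) * fU))
  c(lower = pL, upper = pU)
}
exact_ci_f(y = 3, n = 10)
# agrees with binom.test(3, 10)$conf.int up to numerical error
```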

4 Approximate Binomial Confidence Intervals

In this part we are going to present the alternatives to the exact binomial confidence interval, which are based on asymptotic approximations of the binomial distribution.

4.1 The standard interval

The standard confidence interval for $p$, also known as the Wald interval, has gained throughout the years universal recommendation in introductory statistics textbooks and in statistical practice. Let us formally define the standard confidence interval as follows.

Definition 4.1. Let $Y \sim \mathrm{Binom}(n, p)$, and let $X_1, \ldots, X_n$ be the underlying random sample of Bernoulli trials. If we define $\hat{p} = \frac{1}{n} \sum_{i=1}^{n} X_i$ and $z_{\alpha/2}$ as the $100(1-\alpha/2)\%$ percentile of the standard normal distribution, then the interval given as
$$CI_S = \hat{p} \pm z_{\alpha/2}\, n^{-1/2} \left(\hat{p}(1-\hat{p})\right)^{1/2} \tag{4.1}$$
is known as the standard confidence interval for $p$, or the Wald standard interval, which we will denote by $CI_S$.

The derivation of the standard confidence interval is very easy and straightforward. If we have a Binomial random variable $Y$ with parameters $n$ and $p$ and its underlying random sample $X_1, \ldots, X_n$, we can estimate $p$ with $\hat{p} = \frac{1}{n} \sum_{i=1}^{n} X_i$. It can easily be shown that $E(\hat{p}) = p$, meaning that $\hat{p}$ is unbiased. Also, it can be shown that
$$\mathrm{var}(\hat{p}) = \frac{p(1-p)}{n}, \qquad SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}.$$
So we have that the random variable $\hat{p}$ has mean $p$ and variance $p(1-p)/n$. Now by

the Central limit theorem we have that
$$\begin{aligned}
1 - \alpha &= \lim_{n\to\infty} P\left( -z_{\alpha/2} \leq \frac{\sqrt{n}\,(\hat{p} - p)}{\sqrt{p(1-p)}} \leq z_{\alpha/2} \right) \\
&= \lim_{n\to\infty} P\left( -z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}} \leq \hat{p} - p \leq z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}} \right) \\
&= \lim_{n\to\infty} P\left( \hat{p} - z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}} \leq p \leq \hat{p} + z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}} \right).
\end{aligned} \tag{4.2}$$
This gives us the following approximate interval, which is guaranteed to have a coverage probability of $1 - \alpha$ as $n$ goes to infinity:
$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}. \tag{4.3}$$
Now, since in this case the standard error of $\hat{p}$ depends on $p$, we will replace it with the estimated standard error, which is
$$\hat{SE}(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$
So the standard confidence interval becomes
$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}},$$
which is equivalent to (4.1). What we can now conclude is that, since $\hat{p}$ is a consistent estimate of $p$, where consistent means that it converges in probability to the true value, i.e. $\hat{p} \xrightarrow{P} p$ as $n \to \infty$, we have that
$$z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \xrightarrow{P} z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}} \quad \text{as } n \to \infty. \tag{4.4}$$
The result in equation (4.4) follows from the fact that, by Slutsky's theorem and the consistency of $\hat{p}$, we have that $(1-\hat{p}) \xrightarrow{d} (1-p)$. Again by Slutsky's theorem we have that $\hat{p}(1-\hat{p}) \xrightarrow{d} p(1-p)$, and by the continuous mapping theorem we then have that $\sqrt{\hat{p}(1-\hat{p})} \xrightarrow{d} \sqrt{p(1-p)}$. Now, again by Slutsky's theorem, we have that equation (4.4) holds, with the slight change that the convergence is not in probability but in distribution. But since the RHS of the equation is a constant, the correct convergence follows from the fact that convergence in distribution to a constant implies convergence

in probability to the same constant. And since the interval in (4.3) has a coverage probability of $1 - \alpha$ as $n \to \infty$, and we have seen that the standard interval converges to the interval in (4.3) as $n \to \infty$, we can conclude that, as $n$ goes to infinity, the standard interval $CI_S$ will also have a coverage probability of $1 - \alpha$.

The standard interval is very easy to calculate and also heuristically appealing. It is presented along with the justification, relying on the central limit theorem, that for large $n$ and $p$ not close to zero or one, the interval will achieve an effective coverage probability that is close to $1 - \alpha$. The problem is that the justification of its coverage probability approaching $1 - \alpha$ relies solely on the central limit theorem, and thus strictly holds only for $n = \infty$; since we are not dealing with infinite samples, we have to be careful when using this interval, as we do not know how exactly the coverage probability behaves in a given situation.
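Transcribing (4.1) into R gives the following small sketch (our own; the function name wald_ci is hypothetical):

```r
# The standard (Wald) interval (4.1) for y successes in n trials.
wald_ci <- function(y, n, alpha = 0.05) {
  phat <- y / n
  z <- qnorm(1 - alpha / 2)                  # standard normal quantile z_{alpha/2}
  half <- z * sqrt(phat * (1 - phat) / n)    # estimated-SE half-width
  c(lower = phat - half, upper = phat + half)
}
wald_ci(y = 3, n = 10)
```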

4.2 Wilson Score Interval

Before we go on to the argumentation supporting the use of Wilson's score interval, we first give the formal definition of the interval.

Definition 4.2. Let $Y \sim \mathrm{Binom}(n, p)$, and let $X_1, \ldots, X_n$ be the underlying random sample of Bernoulli trials. If we define $\hat{p} = \frac{1}{n} \sum_{i=1}^{n} X_i$ and $z_{\alpha/2}$ as the $100(1-\alpha/2)\%$ percentile of the standard normal distribution, then the interval given as
$$CI_W = \left( \hat{p} + \frac{z_{\alpha/2}^2}{2n} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z_{\alpha/2}^2}{4n^2}} \right) \Bigg/ \left( 1 + \frac{z_{\alpha/2}^2}{n} \right) \tag{4.5}$$
is known as the Wilson score interval for a binomial proportion and is denoted by $CI_W$ [1].

Now the theory behind this interval again relies on the normal approximation, as with the Wald interval. We again have that
$$1 - \alpha = \lim_{n\to\infty} P\left( -z_{\alpha/2} \leq \frac{\sqrt{n}\,(\hat{p} - p)}{\sqrt{p(1-p)}} \leq z_{\alpha/2} \right). \tag{4.6}$$
We proceed as follows: let the true value of the parameter $p$ be $p_0$. The standard deviation of $\hat{p}$ under $p_0$ is $\sigma = (p_0(1-p_0)/n)^{1/2}$. From equation (4.6) it follows that, asymptotically, the probability that an observation as bad as $\hat{p}$ will occur, where $\hat{p}$ lies outside the limits $p_0 - z_{\alpha/2}\sigma$ and $p_0 + z_{\alpha/2}\sigma$, is less than or equal to $\alpha$. Rewriting this statement for the observed value $\hat{p}$, we obtain the following equation:
$$\frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} = \pm z_{\alpha/2}. \tag{4.7}$$
In fact this equation can be solved for $p_0$, and it is a straightforward calculation: simply squaring equation (4.7) on both sides we obtain
$$(\hat{p} - p_0)^2 = z_{\alpha/2}^2\, \frac{p_0(1-p_0)}{n}, \qquad \text{i.e.} \qquad p_0^2 \left(1 + \frac{z_{\alpha/2}^2}{n}\right) - p_0 \left(2\hat{p} + \frac{z_{\alpha/2}^2}{n}\right) + \hat{p}^2 = 0. \tag{4.8}$$
Solving this as a quadratic equation in $p_0$, we obtain
$$p_{1,2} = \frac{2\hat{p} + \dfrac{z_{\alpha/2}^2}{n} \pm \sqrt{\left(2\hat{p} + \dfrac{z_{\alpha/2}^2}{n}\right)^2 - 4\hat{p}^2 \left(1 + \dfrac{z_{\alpha/2}^2}{n}\right)}}{2\left(1 + \dfrac{z_{\alpha/2}^2}{n}\right)}. \tag{4.9}$$
It can easily be seen that the value under the square root is always positive, thus $p_{1,2}$ will be real numbers, and in fact equation (4.9) is equivalent to equation (4.5).
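Likewise, (4.5) translates directly into R (our own sketch with a hypothetical function name; prop.test with correct = FALSE computes the same interval and can serve as a cross-check):

```r
# The Wilson score interval (4.5) for y successes in n trials.
wilson_ci <- function(y, n, alpha = 0.05) {
  phat <- y / n
  z <- qnorm(1 - alpha / 2)
  centre <- phat + z^2 / (2 * n)
  half <- z * sqrt(phat * (1 - phat) / n + z^2 / (4 * n^2))
  c(lower = (centre - half) / (1 + z^2 / n),
    upper = (centre + half) / (1 + z^2 / n))
}
wilson_ci(y = 3, n = 10)   # cf. prop.test(3, 10, correct = FALSE)$conf.int
```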

5 On the inexactness of the Clopper-Pearson interval

The Clopper-Pearson interval, also known as the exact binomial confidence interval, is, as its name suggests, considered to be exact in the sense that its coverage probability attains exactly the prespecified nominal level. The problem is that in practice it has been shown to be inexact: in fact, the coverage probability is always greater than or equal to the nominal confidence level. In this section we are going to present the reasons for this inexactness and show, using simulations, the conservativeness of the confidence interval.

5.1 Theoretical background

Let us again remember how the exact binomial confidence interval is defined: namely, $p_L$ and $p_U$ are computed as the solutions in $p$ of the equations
$$\sum_{k=0}^{x} \binom{n}{k} p^k (1-p)^{n-k} = \frac{\alpha}{2} \qquad \text{and} \qquad \sum_{k=x}^{n} \binom{n}{k} p^k (1-p)^{n-k} = \frac{\alpha}{2} \tag{5.1}$$
when $x$ is observed (the first equation giving $p_U$ and the second giving $p_L$).

What immediately comes to attention is that the interval depends on three things only, and those are the sample size $n$, the nominal confidence level $\alpha$, and the observation. Once we fix the sample size and the nominal level, the confidence interval becomes a function of the observation only. Since, in order to be able to calculate the confidence interval, we need to fix $n$ and $\alpha$, we can assume that we are in the situation where $n$ and $\alpha$ are fixed. Now we can define the exact binomial confidence interval as a function of the observed sample only, i.e.
$$g(x) = [p_{L_x}, p_{U_x}] \qquad \text{for } x = 0, 1, \ldots, n. \tag{5.2}$$
What is important to understand here is that in this situation, where $n$ and $\alpha$ are fixed, each possible sample is, independently of the true value of the parameter, associated with one and only one confidence interval. That means that once we fix $n$ and $\alpha$ we directly obtain all the confidence intervals that arise from all the possible observed values. Let

us further assume that the true value of the parameter $p$ is $p_0$. At this point we may draw the following conclusion: one of the three following cases will happen.

1) For $x = 0, 1, \ldots, k$ we will have $p_0 \in g(x)$, and for $x = k+1, k+2, \ldots, n$ we will have $p_0 \notin g(x)$.

2) For $x = 0, \ldots, i$ we will have $p_0 \notin g(x)$, for $x = i+1, \ldots, j-1$ we will have $p_0 \in g(x)$, and for $x = j, \ldots, n$ we will have $p_0 \notin g(x)$.

3) For $x = 0, 1, \ldots, k$ we will have $p_0 \notin g(x)$, and for $x = k+1, k+2, \ldots, n$ we will have $p_0 \in g(x)$.

To compute the actual coverage probability for $p_0$ we need to compute, by definition, the following probability:
$$CP(p_0) = P\{p_L \leq p_0 \leq p_U \mid p = p_0\}. \tag{5.3}$$
And since we are in the situation in which, for fixed $n$, $\alpha$ and $p_0$, we know exactly for which values of the observations the corresponding intervals will contain $p_0$, to calculate the actual coverage probability we only need to calculate the probability of a sample appearing for which $p_0$ will be included in the corresponding confidence interval. Now we will consider the three cases listed above, to see what will happen to the coverage probability in each of them. We will first consider cases 1) and 3), since they are fairly similar, and then case 2).

Case I. Let $X \sim \mathrm{Binom}(n, p_0)$; then in this case we will have that
$$CP(p_0) = \sum_{i=0}^{k} P(X = i),$$
which is equivalent to
$$CP(p_0) = 1 - \sum_{i=k+1}^{n} P(X = i) = 1 - P(X \geq k+1).$$
But since in this case $p_0 \notin g(k+1)$, it follows that $p_0 < p_{L(k+1)}$, and thus by equation (3.4) that
$$CP(p_0) = 1 - P(X \geq k+1) > 1 - P\left(X_{p_{L(k+1)}} \geq k+1\right) = 1 - \frac{\alpha}{2} > 1 - \alpha,$$
meaning that our actual coverage probability in this case will for sure be strictly greater than $1 - \alpha$. For case 3) we only need to invert the equations, and we get the completely same result.

Case II. Again, in this case, let us have that $X \sim \mathrm{Binom}(n, p_0)$; then the coverage probability becomes
$$CP(p_0) = \sum_{t=i+1}^{j-1} P(X = t),$$
which is equivalent to
$$CP(p_0) = 1 - \sum_{t=0}^{i} P(X = t) - \sum_{t=j}^{n} P(X = t) = 1 - P(X \leq i) - P(X \geq j).$$
But since in this case we have that $p_0 \notin g(i)$ and $p_0 \notin g(j)$, it follows that $p_0 > p_{U_i}$ and that $p_0 < p_{L_j}$, so from equations (3.3) and (3.4) it follows that
$$CP(p_0) = 1 - P(X \leq i) - P(X \geq j) > 1 - P\left(X_{p_{U_i}} \leq i\right) - P\left(X_{p_{L_j}} \geq j\right) = 1 - \frac{\alpha}{2} - \frac{\alpha}{2} = 1 - \alpha.$$
So again we obtain that the coverage probability will be strictly larger than $1 - \alpha$.

So, we see that from the definition of the exact binomial confidence interval it can be concluded that it will be inexact. In fact, as we move away from 0.5 toward either end, we will eventually hit a point $q$ on both sides from which on the actual coverage probability will be greater than the nominal coverage probability by at least $\alpha/2$.

5.2 Exhaustive Simulation Study

After we showed that in theory the Clopper-Pearson confidence interval will always have an actual coverage probability higher than the nominal coverage probability, we will now show some examples of how the interval performs in practice. We will consider two different cases, $n = 10$ and $n = 20$; the nominal coverage probability will be kept at 0.95.

Case I. In the first case we have the following assumptions: $n = 10$ and $\alpha = 0.05$. We will examine the coverage probabilities for $p$ set at 0.2, 0.3, 0.5, 0.6, 0.9. The coverage probabilities will be given in two different ways: the first one will be calculated theoretically, based on the theory from Section 5.1, and the second will be obtained from a simulation study. The simulation study, done in the R language [14], simulated random samples from the binomial distribution with the given parameters $p$ and $n$; from each sample the confidence interval was computed and it was examined whether $p$ is in the confidence interval. The coverage probability in the end is given as the average number of times $p$ falls in the interval of the sample.
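The case analysis in Section 5.1 also yields a direct recipe for the theoretical coverage probability: for fixed n and α, enumerate all n + 1 possible observations, mark those whose interval contains p_0, and sum the corresponding binomial probabilities. A minimal R sketch of ours (not code from the thesis; binom.test returns the Clopper-Pearson interval):

```r
# Exact coverage probability of the Clopper-Pearson interval at a true
# value p0, by enumerating all n + 1 possible observations.
cp_coverage <- function(p0, n, alpha = 0.05) {
  covered <- sapply(0:n, function(x) {
    ci <- binom.test(x, n, conf.level = 1 - alpha)$conf.int
    ci[1] <= p0 && p0 <= ci[2]    # does the interval for observation x cover p0?
  })
  sum(dbinom(0:n, n, p0)[covered])
}
cp_coverage(p0 = 0.3, n = 10)     # cf. the calculations and Table 1 below
```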

To compute the theoretical coverage probabilities, we first need to compute the confidence intervals for the different samples. Since the sample size and the nominal coverage probability are fixed, we can calculate the confidence intervals corresponding to all samples (see appendix). Going on to calculate the theoretical coverage probabilities, we have the following.

For $p = 0.2$ we can see that $p$ is not in the confidence intervals corresponding to the samples 6, 7, 8, 9, 10, so the coverage probability will be
$$CP(0.2) = 1 - \sum_{i=6}^{10} P(X_{0.2} = i) = 1 - 0.0064 = 0.9936.$$
For $p = 0.3$ we can see that $p$ is not in the confidence intervals corresponding to the samples 7, 8, 9, 10, so the coverage probability will be
$$CP(0.3) = 1 - \sum_{i=7}^{10} P(X_{0.3} = i) = 1 - 0.0106 = 0.9894.$$
For $p = 0.5$ we can see that $p$ is not in the confidence intervals corresponding to the samples 0, 1, 9, 10, so the coverage probability will be
$$CP(0.5) = 1 - \left( \sum_{i=0}^{1} P(X_{0.5} = i) + \sum_{i=9}^{10} P(X_{0.5} = i) \right) = 1 - 0.0215 = 0.9785.$$
For $p = 0.6$ we can see that $p$ is not in the confidence intervals corresponding to the samples 0, 1, 2, 10, so the coverage probability will be
$$CP(0.6) = 1 - \left( \sum_{i=0}^{2} P(X_{0.6} = i) + P(X_{0.6} = 10) \right) = 1 - 0.0183 = 0.9817.$$
For $p = 0.9$ we can see that $p$ is not in the confidence intervals corresponding to the samples 0, 1, 2, 3, 4, 5, 6, so the coverage probability will be
$$CP(0.9) = 1 - \sum_{i=0}^{6} P(X_{0.9} = i) = 1 - 0.0128 = 0.9872.$$
Having obtained the theoretical coverage probabilities, we can now compare them with the coverage probabilities from the simulation study described above, which represent the actual coverage probabilities obtained in practice.

p      Theoretical CP   CP from simulations
0.2    0.9936
0.3    0.9894
0.5    0.9785
0.6    0.9817
0.9    0.9872

Table 1: Theoretical CPs and simulation results for n = 10 and α = 0.05

What we can see from Table 1 is that the theoretically obtained coverage probabilities are confirmed in the practical simulation, which backs up our theoretical results. Of course there are some differences in the coverage probabilities at the third or fourth decimal, but this can be expected from a simulation study including only finitely many repetitions, and these small differences are nothing but chance error in a simulation that is very close to what we expected.

Case II. For the second case we will remain with the same approach. Again we will have $\alpha$ at 0.05, we will test for the values $p = 0.2, 0.3, 0.5, 0.6, 0.9$, and we will set $n = 20$. In this case we will not give a detailed calculation of the theoretical coverage probabilities, but will only give the final results. The confidence intervals that arise in the case of $n = 20$ and $\alpha = 0.05$, and that are used for the calculations, are given in the appendix.

p      Theoretical CP   CP from simulations
0.2    0.9785
0.3    0.9752
0.5    0.9586
0.6    0.9630
0.9    0.9887

Table 2: Theoretical CPs and simulation results for n = 20 and α = 0.05

Again in this case we can come to the same conclusion; that is, the results from the simulations again back up our theoretically calculated coverage probabilities.

5.3 Extensive simulation study

In this part we present the results of our simulation study for the exact binomial confidence interval. The simulation study conducted included three different sample sizes, 50, 100 and 200, and for every sample size the coverage probabilities for 100 different values of $p$ were computed. The coverage probabilities were computed in the R language [14]. For each individual $n$ and $p$ we simulated random samples from the binomial distribution with the given parameters $p$ and $n$; from each sample the confidence interval was computed and it was examined whether $p$ is in the confidence interval. The coverage probability in the end was computed as the average number of times $p$ falls in the interval of the sample.
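The simulation procedure just described can be sketched in R as follows (our own code with a hypothetical function name; the repetition count of 10^4 is our choice, since the number used in the thesis is not stated here):

```r
# Simulation estimate of the Clopper-Pearson coverage probability.
sim_coverage <- function(p, n, reps = 1e4, alpha = 0.05) {
  x <- rbinom(reps, n, p)                        # simulated observations
  hits <- sapply(x, function(xi) {
    ci <- binom.test(xi, n, conf.level = 1 - alpha)$conf.int
    ci[1] <= p && p <= ci[2]                     # does this interval cover p?
  })
  mean(hits)                                     # proportion of covering intervals
}
set.seed(1)
sapply(c(0.2, 0.3, 0.5, 0.6, 0.9), sim_coverage, n = 10)
```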

Figure 1: Sample size n = 50

Figure 1 shows the results of the simulation study with $n = 50$. What we can immediately see is what we already saw in the theoretical part, namely that the exact binomial confidence interval does not have an actual coverage probability equal to the nominal coverage probability. This means that the confidence interval is not exact, as advertised. In fact, we have that the actual coverage probability is always bigger than the nominal, meaning that the test is conservative. The main problem with this test is that for a lot of values of $p$ it is extremely conservative, which unfortunately gives us too wide confidence intervals for some samples; in practice this is a very bad case, since from a very wide confidence interval one cannot conclude much about $p$. What makes this problem more extreme with the exact test is that with an increase of the sample size one does not get much better in terms of the coverage probability.

Figure 2: Samples of sizes n = 100 (left) and n = 200 (right)

Figure 2 shows the coverage probabilities in the cases when $n = 100$ and $n = 200$. Here again we see the same pattern as in the case of $n = 50$, and there is not much

improvement as we double the sample size from 100 to 200. Also, there is no guarantee that the actual coverage probability of this interval will ever converge to the nominal, as we have with the approximate intervals. However, even though we do not have a guarantee that it will converge, one can be confident that with a big enough sample size the actual coverage probability will come close enough to the nominal.

6 On the inexactness of the Standard interval

The Standard interval, also known as the Wald interval, defined in Definition 4.1, is most probably the easiest interval to calculate. It is also one of the most well known confidence intervals for binomial parameter estimation. It is introduced in almost all introductory statistics books, and there is a widely spread belief in two things about the standard confidence interval. The first one is that the larger the number $n$, the better the approximation, and thus the coverage probability of the interval, gets. The second is that the coverage probability will be very close to the nominal coverage probability except possibly when $n$ is small or when $p$ is near 0 or 1. Here "small" and "near" are in the literature usually either left to the interpretation of the reader, or some very bad bounds on $p$ and $n$ are given. We are going to show that both of these beliefs can be totally wrong.

Let us now take a look at how the standard interval really performs. The first thing that comes to attention with the standard interval are the nonnegligible oscillations of the actual coverage probability. The oscillations occur as $n$ changes and $p$ is fixed, and also as $p$ changes and $n$ is fixed. Furthermore, drastic changes in the coverage probability occur near particular values of $p$ for fixed $n$, and near particular values of $n$ for fixed $p$. In order to show how the actual coverage probability behaves, and to back up our claims of the drastic oscillations, we have performed some simulation studies. All the results that follow in the rest of this section rely on computations in the R language [14], where for each individual $n$ and $p$ we simulate random samples from the binomial distribution with the given parameters $p$ and $n$; from each sample the confidence interval is then computed and it is examined whether $p$ is in the confidence interval. The coverage probability in the end is computed as the average number of times $p$ falls in the interval of the sample.
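The oscillation in n claimed above can also be seen without simulation, by computing the Wald coverage exactly through enumeration. The following R sketch is ours (the thesis itself uses simulation; the ranges of n and the choice p = 0.2 are arbitrary) and plots the coverage as a function of n for a fixed p:

```r
# Exact coverage of the Wald interval as a function of n for fixed p.
wald_coverage <- function(p, n, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  phat <- (0:n) / n                             # all possible estimates
  half <- z * sqrt(phat * (1 - phat) / n)       # Wald half-widths
  covered <- (phat - half <= p) & (p <= phat + half)
  sum(dbinom(0:n, n, p)[covered])
}
cp <- sapply(10:100, wald_coverage, p = 0.2)
plot(10:100, cp, type = "l", xlab = "n", ylab = "coverage",
     main = "Wald coverage oscillates in n (p = 0.2)")
abline(h = 0.95, lty = 2)                       # nominal level
```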


More information

Review of Probability Theory

Review of Probability Theory Review of Probability Theory Arian Maleki and Tom Do Stanford University Probability theory is the study of uncertainty Through this class, we will be relying on concepts from probability theory for deriving

More information

7.1 Basic Properties of Confidence Intervals

7.1 Basic Properties of Confidence Intervals 7.1 Basic Properties of Confidence Intervals What s Missing in a Point Just a single estimate What we need: how reliable it is Estimate? No idea how reliable this estimate is some measure of the variability

More information

Figure 1: Doing work on a block by pushing it across the floor.

Figure 1: Doing work on a block by pushing it across the floor. Work Let s imagine I have a block which I m pushing across the floor, shown in Figure 1. If I m moving the block at constant velocity, then I know that I have to apply a force to compensate the effects

More information

Probability and Distributions

Probability and Distributions Probability and Distributions What is a statistical model? A statistical model is a set of assumptions by which the hypothetical population distribution of data is inferred. It is typically postulated

More information

3 The language of proof

3 The language of proof 3 The language of proof After working through this section, you should be able to: (a) understand what is asserted by various types of mathematical statements, in particular implications and equivalences;

More information

Lecture 10: Probability distributions TUESDAY, FEBRUARY 19, 2019

Lecture 10: Probability distributions TUESDAY, FEBRUARY 19, 2019 Lecture 10: Probability distributions DANIEL WELLER TUESDAY, FEBRUARY 19, 2019 Agenda What is probability? (again) Describing probabilities (distributions) Understanding probabilities (expectation) Partial

More information

Chapter 3. Estimation of p. 3.1 Point and Interval Estimates of p

Chapter 3. Estimation of p. 3.1 Point and Interval Estimates of p Chapter 3 Estimation of p 3.1 Point and Interval Estimates of p Suppose that we have Bernoulli Trials (BT). So far, in every example I have told you the (numerical) value of p. In science, usually the

More information

Introduction to Probability

Introduction to Probability LECTURE NOTES Course 6.041-6.431 M.I.T. FALL 2000 Introduction to Probability Dimitri P. Bertsekas and John N. Tsitsiklis Professors of Electrical Engineering and Computer Science Massachusetts Institute

More information

7 Random samples and sampling distributions

7 Random samples and sampling distributions 7 Random samples and sampling distributions 7.1 Introduction - random samples We will use the term experiment in a very general way to refer to some process, procedure or natural phenomena that produces

More information

Mathematical Statistics

Mathematical Statistics Mathematical Statistics MAS 713 Chapter 8 Previous lecture: 1 Bayesian Inference 2 Decision theory 3 Bayesian Vs. Frequentist 4 Loss functions 5 Conjugate priors Any questions? Mathematical Statistics

More information

STAT 830 Hypothesis Testing

STAT 830 Hypothesis Testing STAT 830 Hypothesis Testing Richard Lockhart Simon Fraser University STAT 830 Fall 2018 Richard Lockhart (Simon Fraser University) STAT 830 Hypothesis Testing STAT 830 Fall 2018 1 / 30 Purposes of These

More information

7 Estimation. 7.1 Population and Sample (P.91-92)

7 Estimation. 7.1 Population and Sample (P.91-92) 7 Estimation MATH1015 Biostatistics Week 7 7.1 Population and Sample (P.91-92) Suppose that we wish to study a particular health problem in Australia, for example, the average serum cholesterol level for

More information

STAT 830 Hypothesis Testing

STAT 830 Hypothesis Testing STAT 830 Hypothesis Testing Hypothesis testing is a statistical problem where you must choose, on the basis of data X, between two alternatives. We formalize this as the problem of choosing between two

More information

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) D. ARAPURA This is a summary of the essential material covered so far. The final will be cumulative. I ve also included some review problems

More information

Lecture 4: September Reminder: convergence of sequences

Lecture 4: September Reminder: convergence of sequences 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 4: September 6 In this lecture we discuss the convergence of random variables. At a high-level, our first few lectures focused

More information

Advanced Herd Management Probabilities and distributions

Advanced Herd Management Probabilities and distributions Advanced Herd Management Probabilities and distributions Anders Ringgaard Kristensen Slide 1 Outline Probabilities Conditional probabilities Bayes theorem Distributions Discrete Continuous Distribution

More information

University of Regina. Lecture Notes. Michael Kozdron

University of Regina. Lecture Notes. Michael Kozdron University of Regina Statistics 252 Mathematical Statistics Lecture Notes Winter 2005 Michael Kozdron kozdron@math.uregina.ca www.math.uregina.ca/ kozdron Contents 1 The Basic Idea of Statistics: Estimating

More information

Weizhen Wang & Zhongzhan Zhang

Weizhen Wang & Zhongzhan Zhang Asymptotic infimum coverage probability for interval estimation of proportions Weizhen Wang & Zhongzhan Zhang Metrika International Journal for Theoretical and Applied Statistics ISSN 006-1335 Volume 77

More information

14.30 Introduction to Statistical Methods in Economics Spring 2009

14.30 Introduction to Statistical Methods in Economics Spring 2009 MIT OpenCourseWare http://ocw.mit.edu 4.0 Introduction to Statistical Methods in Economics Spring 009 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur Lecture No. # 36 Sampling Distribution and Parameter Estimation

More information

Continuum Probability and Sets of Measure Zero

Continuum Probability and Sets of Measure Zero Chapter 3 Continuum Probability and Sets of Measure Zero In this chapter, we provide a motivation for using measure theory as a foundation for probability. It uses the example of random coin tossing to

More information

LECTURE 10: REVIEW OF POWER SERIES. 1. Motivation

LECTURE 10: REVIEW OF POWER SERIES. 1. Motivation LECTURE 10: REVIEW OF POWER SERIES By definition, a power series centered at x 0 is a series of the form where a 0, a 1,... and x 0 are constants. For convenience, we shall mostly be concerned with the

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Machine Learning Problem set 1 Solutions Thursday, September 19 What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.

More information

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis y = β 0 + β 1 x 1 + β 2 x 2 +... β k x k + u 2. Inference 0 Assumptions of the Classical Linear Model (CLM)! So far, we know: 1. The mean and variance of the OLS estimators

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Expectation is linear. So far we saw that E(X + Y ) = E(X) + E(Y ). Let α R. Then,

Expectation is linear. So far we saw that E(X + Y ) = E(X) + E(Y ). Let α R. Then, Expectation is linear So far we saw that E(X + Y ) = E(X) + E(Y ). Let α R. Then, E(αX) = ω = ω (αx)(ω) Pr(ω) αx(ω) Pr(ω) = α ω X(ω) Pr(ω) = αe(x). Corollary. For α, β R, E(αX + βy ) = αe(x) + βe(y ).

More information

Basic Probability. Introduction

Basic Probability. Introduction Basic Probability Introduction The world is an uncertain place. Making predictions about something as seemingly mundane as tomorrow s weather, for example, is actually quite a difficult task. Even with

More information

Probability Distributions Columns (a) through (d)

Probability Distributions Columns (a) through (d) Discrete Probability Distributions Columns (a) through (d) Probability Mass Distribution Description Notes Notation or Density Function --------------------(PMF or PDF)-------------------- (a) (b) (c)

More information

Chapter 5. Means and Variances

Chapter 5. Means and Variances 1 Chapter 5 Means and Variances Our discussion of probability has taken us from a simple classical view of counting successes relative to total outcomes and has brought us to the idea of a probability

More information

18.440: Lecture 28 Lectures Review

18.440: Lecture 28 Lectures Review 18.440: Lecture 28 Lectures 18-27 Review Scott Sheffield MIT Outline Outline It s the coins, stupid Much of what we have done in this course can be motivated by the i.i.d. sequence X i where each X i is

More information

An analogy from Calculus: limits

An analogy from Calculus: limits COMP 250 Fall 2018 35 - big O Nov. 30, 2018 We have seen several algorithms in the course, and we have loosely characterized their runtimes in terms of the size n of the input. We say that the algorithm

More information

Introducing the Normal Distribution

Introducing the Normal Distribution Department of Mathematics Ma 3/13 KC Border Introduction to Probability and Statistics Winter 219 Lecture 1: Introducing the Normal Distribution Relevant textbook passages: Pitman [5]: Sections 1.2, 2.2,

More information

Design of the Fuzzy Rank Tests Package

Design of the Fuzzy Rank Tests Package Design of the Fuzzy Rank Tests Package Charles J. Geyer July 15, 2013 1 Introduction We do fuzzy P -values and confidence intervals following Geyer and Meeden (2005) and Thompson and Geyer (2007) for three

More information

Recall the Basics of Hypothesis Testing

Recall the Basics of Hypothesis Testing Recall the Basics of Hypothesis Testing The level of significance α, (size of test) is defined as the probability of X falling in w (rejecting H 0 ) when H 0 is true: P(X w H 0 ) = α. H 0 TRUE H 1 TRUE

More information

ESTIMATING STATISTICAL CHARACTERISTICS UNDER INTERVAL UNCERTAINTY AND CONSTRAINTS: MEAN, VARIANCE, COVARIANCE, AND CORRELATION ALI JALAL-KAMALI

ESTIMATING STATISTICAL CHARACTERISTICS UNDER INTERVAL UNCERTAINTY AND CONSTRAINTS: MEAN, VARIANCE, COVARIANCE, AND CORRELATION ALI JALAL-KAMALI ESTIMATING STATISTICAL CHARACTERISTICS UNDER INTERVAL UNCERTAINTY AND CONSTRAINTS: MEAN, VARIANCE, COVARIANCE, AND CORRELATION ALI JALAL-KAMALI Department of Computer Science APPROVED: Vladik Kreinovich,

More information

Countability. 1 Motivation. 2 Counting

Countability. 1 Motivation. 2 Counting Countability 1 Motivation In topology as well as other areas of mathematics, we deal with a lot of infinite sets. However, as we will gradually discover, some infinite sets are bigger than others. Countably

More information

Probability. Table of contents

Probability. Table of contents Probability Table of contents 1. Important definitions 2. Distributions 3. Discrete distributions 4. Continuous distributions 5. The Normal distribution 6. Multivariate random variables 7. Other continuous

More information

HT Introduction. P(X i = x i ) = e λ λ x i

HT Introduction. P(X i = x i ) = e λ λ x i MODS STATISTICS Introduction. HT 2012 Simon Myers, Department of Statistics (and The Wellcome Trust Centre for Human Genetics) myers@stats.ox.ac.uk We will be concerned with the mathematical framework

More information

Special Theory of Relativity Prof. Shiva Prasad Department of Physics Indian Institute of Technology, Bombay. Lecture - 15 Momentum Energy Four Vector

Special Theory of Relativity Prof. Shiva Prasad Department of Physics Indian Institute of Technology, Bombay. Lecture - 15 Momentum Energy Four Vector Special Theory of Relativity Prof. Shiva Prasad Department of Physics Indian Institute of Technology, Bombay Lecture - 15 Momentum Energy Four Vector We had started discussing the concept of four vectors.

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Machine Learning Problem set 1 Due Thursday, September 19, in class What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.

More information

Robustness and Distribution Assumptions

Robustness and Distribution Assumptions Chapter 1 Robustness and Distribution Assumptions 1.1 Introduction In statistics, one often works with model assumptions, i.e., one assumes that data follow a certain model. Then one makes use of methodology

More information

2. AXIOMATIC PROBABILITY

2. AXIOMATIC PROBABILITY IA Probability Lent Term 2. AXIOMATIC PROBABILITY 2. The axioms The formulation for classical probability in which all outcomes or points in the sample space are equally likely is too restrictive to develop

More information

Probability Experiments, Trials, Outcomes, Sample Spaces Example 1 Example 2

Probability Experiments, Trials, Outcomes, Sample Spaces Example 1 Example 2 Probability Probability is the study of uncertain events or outcomes. Games of chance that involve rolling dice or dealing cards are one obvious area of application. However, probability models underlie

More information

Chapter Five Notes N P U2C5

Chapter Five Notes N P U2C5 Chapter Five Notes N P UC5 Name Period Section 5.: Linear and Quadratic Functions with Modeling In every math class you have had since algebra you have worked with equations. Most of those equations have

More information

Chapter 2 Classical Probability Theories

Chapter 2 Classical Probability Theories Chapter 2 Classical Probability Theories In principle, those who are not interested in mathematical foundations of probability theory might jump directly to Part II. One should just know that, besides

More information

Probability: Why do we care? Lecture 2: Probability and Distributions. Classical Definition. What is Probability?

Probability: Why do we care? Lecture 2: Probability and Distributions. Classical Definition. What is Probability? Probability: Why do we care? Lecture 2: Probability and Distributions Sandy Eckel seckel@jhsph.edu 22 April 2008 Probability helps us by: Allowing us to translate scientific questions into mathematical

More information

Slope Fields: Graphing Solutions Without the Solutions

Slope Fields: Graphing Solutions Without the Solutions 8 Slope Fields: Graphing Solutions Without the Solutions Up to now, our efforts have been directed mainly towards finding formulas or equations describing solutions to given differential equations. Then,

More information

Discrete Structures Proofwriting Checklist

Discrete Structures Proofwriting Checklist CS103 Winter 2019 Discrete Structures Proofwriting Checklist Cynthia Lee Keith Schwarz Now that we re transitioning to writing proofs about discrete structures like binary relations, functions, and graphs,

More information

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu

More information

Sequence convergence, the weak T-axioms, and first countability

Sequence convergence, the weak T-axioms, and first countability Sequence convergence, the weak T-axioms, and first countability 1 Motivation Up to now we have been mentioning the notion of sequence convergence without actually defining it. So in this section we will

More information

P (E) = P (A 1 )P (A 2 )... P (A n ).

P (E) = P (A 1 )P (A 2 )... P (A n ). Lecture 9: Conditional probability II: breaking complex events into smaller events, methods to solve probability problems, Bayes rule, law of total probability, Bayes theorem Discrete Structures II (Summer

More information

7.5 Partial Fractions and Integration

7.5 Partial Fractions and Integration 650 CHPTER 7. DVNCED INTEGRTION TECHNIQUES 7.5 Partial Fractions and Integration In this section we are interested in techniques for computing integrals of the form P(x) dx, (7.49) Q(x) where P(x) and

More information

DECISIONS UNDER UNCERTAINTY

DECISIONS UNDER UNCERTAINTY August 18, 2003 Aanund Hylland: # DECISIONS UNDER UNCERTAINTY Standard theory and alternatives 1. Introduction Individual decision making under uncertainty can be characterized as follows: The decision

More information

Loglikelihood and Confidence Intervals

Loglikelihood and Confidence Intervals Stat 504, Lecture 2 1 Loglikelihood and Confidence Intervals The loglikelihood function is defined to be the natural logarithm of the likelihood function, l(θ ; x) = log L(θ ; x). For a variety of reasons,

More information

Limiting Distributions

Limiting Distributions Limiting Distributions We introduce the mode of convergence for a sequence of random variables, and discuss the convergence in probability and in distribution. The concept of convergence leads us to the

More information

Mathematical Notation Math Introduction to Applied Statistics

Mathematical Notation Math Introduction to Applied Statistics Mathematical Notation Math 113 - Introduction to Applied Statistics Name : Use Word or WordPerfect to recreate the following documents. Each article is worth 10 points and should be emailed to the instructor

More information

arxiv: v1 [cs.dm] 21 Dec 2016

arxiv: v1 [cs.dm] 21 Dec 2016 UNIVERZA NA PRIMORSKEM FAKULTETA ZA MATEMATIKO, NARAVOSLOVJE IN INFORMACIJSKE TEHNOLOGIJE arxiv:1612.07113v1 [cs.dm] 21 Dec 2016 Zaključna naloga (Final project paper) Odčitljivost digrafov in dvodelnih

More information

APPM/MATH 4/5520 Solutions to Exam I Review Problems. f X 1,X 2. 2e x 1 x 2. = x 2

APPM/MATH 4/5520 Solutions to Exam I Review Problems. f X 1,X 2. 2e x 1 x 2. = x 2 APPM/MATH 4/5520 Solutions to Exam I Review Problems. (a) f X (x ) f X,X 2 (x,x 2 )dx 2 x 2e x x 2 dx 2 2e 2x x was below x 2, but when marginalizing out x 2, we ran it over all values from 0 to and so

More information

1 Normal Distribution.

1 Normal Distribution. Normal Distribution.. Introduction A Bernoulli trial is simple random experiment that ends in success or failure. A Bernoulli trial can be used to make a new random experiment by repeating the Bernoulli

More information

Lawrence D. Brown, T. Tony Cai and Anirban DasGupta

Lawrence D. Brown, T. Tony Cai and Anirban DasGupta Statistical Science 2005, Vol. 20, No. 4, 375 379 DOI 10.1214/088342305000000395 Institute of Mathematical Statistics, 2005 Comment: Fuzzy and Randomized Confidence Intervals and P -Values Lawrence D.

More information

Probability. Lecture Notes. Adolfo J. Rumbos

Probability. Lecture Notes. Adolfo J. Rumbos Probability Lecture Notes Adolfo J. Rumbos October 20, 204 2 Contents Introduction 5. An example from statistical inference................ 5 2 Probability Spaces 9 2. Sample Spaces and σ fields.....................

More information

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Ann Inst Stat Math (0) 64:359 37 DOI 0.007/s0463-00-036-3 Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Paul Vos Qiang Wu Received: 3 June 009 / Revised:

More information

CS 543 Page 1 John E. Boon, Jr.

CS 543 Page 1 John E. Boon, Jr. CS 543 Machine Learning Spring 2010 Lecture 05 Evaluating Hypotheses I. Overview A. Given observed accuracy of a hypothesis over a limited sample of data, how well does this estimate its accuracy over

More information

Statistical inference

Statistical inference Statistical inference Contents 1. Main definitions 2. Estimation 3. Testing L. Trapani MSc Induction - Statistical inference 1 1 Introduction: definition and preliminary theory In this chapter, we shall

More information

Limiting Distributions

Limiting Distributions We introduce the mode of convergence for a sequence of random variables, and discuss the convergence in probability and in distribution. The concept of convergence leads us to the two fundamental results

More information

Proving languages to be nonregular

Proving languages to be nonregular Proving languages to be nonregular We already know that there exist languages A Σ that are nonregular, for any choice of an alphabet Σ. This is because there are uncountably many languages in total and

More information

DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling

DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling Due: Tuesday, May 10, 2016, at 6pm (Submit via NYU Classes) Instructions: Your answers to the questions below, including

More information

Hypothesis Testing. A rule for making the required choice can be described in two ways: called the rejection or critical region of the test.

Hypothesis Testing. A rule for making the required choice can be described in two ways: called the rejection or critical region of the test. Hypothesis Testing Hypothesis testing is a statistical problem where you must choose, on the basis of data X, between two alternatives. We formalize this as the problem of choosing between two hypotheses:

More information

Eco517 Fall 2014 C. Sims MIDTERM EXAM

Eco517 Fall 2014 C. Sims MIDTERM EXAM Eco57 Fall 204 C. Sims MIDTERM EXAM You have 90 minutes for this exam and there are a total of 90 points. The points for each question are listed at the beginning of the question. Answer all questions.

More information

6 The normal distribution, the central limit theorem and random samples

6 The normal distribution, the central limit theorem and random samples 6 The normal distribution, the central limit theorem and random samples 6.1 The normal distribution We mentioned the normal (or Gaussian) distribution in Chapter 4. It has density f X (x) = 1 σ 1 2π e

More information

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples Bayesian inference for sample surveys Roderick Little Module : Bayesian models for simple random samples Superpopulation Modeling: Estimating parameters Various principles: least squares, method of moments,

More information